ElasticSearch 实现分词全文检索 - Restful基本操作

elasticsearch,实现,分词,全文检索,restful,基本操作 · 浏览次数 : 196

小编点评

**ElasticSearch 实现分词全文检索** **简介** ElasticSearch 是一个用于搜索和分析文本的开源搜索引擎。它可以用于创建和管理索引，并使用各种搜索操作来检索文本。 **分词** 分词是将文本转换为一系列关键字的过程。ElasticSearch 使用不同的分词器来实现分词。默认情况下，ElasticSearch 使用的是 IK 分词器，它可以处理大多数常见的语言中常见的分词语。 **实现** 1. **创建索引** ```json PUT /person{ "settings": { "number_of_shards": 5, "number_of_replicas": 1 }, "mappings": { "properties": { "name": { "type": "text", "analyzer": "ik_max_word", "index": true, "store": false }, "author": { "type": "keyword" }, "count": { "type": "long" }, "on-sale": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd" }, "descr": { "type": "text" } } } } ``` 2. **添加文档** ```json POST /book/_doc{ "name": "太极拳", "author": "陈王廷", "count": 666, "on-sale": "2023-02-23", "descr": "掤、捋、挤、按、采、列、肘、靠" } ``` **修改文档** ```json PUT /book/_doc/1/_update{ "doc": { "name": "太极拳", "author": "陈王廷", "count": 666, "on-sale": "2023-02-28", "descr": "掤、捋、挤、按、采、列、肘、靠" } } ``` **删除文档** ```json DELETE /book/_doc/YJJLfYYBGlLaT58LDoV5 ``` **总结** 分词是使用 Elasticsearch 进行文本搜索的一种重要技术。通过创建索引并使用不同的文档格式，您可以创建可以用于各种搜索的索引。

正文

Restful 语法

GET 请求:

http://ip:port/index: 查询索引信息
http://ip;port/index/type/doc_id: 查询指定的文档信息
复制

POST 请求:

http://ip;port/index/type/_search: 查询文档，可以在请求体中添加json字符串来代表查询条件
http://ip;port/index/type/doc_id/_update: 修改文档，在请求体中指定ison字符串代表修改的具体信息
复制

PUT 请求:

http://ip;port/index: 创建一个索引，需要在请求体中指定索引的信息，类型，结构
http://ip:port/index/type/_mappings: 代表创建索引时，指定索引文档存储的属性的信息
复制

DELETE 请求:

http://ip;port/index: 删除跑路
http://ip;port/index/type/doc_id: 删除指定的文档
复制

操作

创建一个索引

Kibana 操作
创建 person 索引

# 创建一个索引
PUT /person
{
  "settings": {
    "number_of_shards": 5, #分片数
    "number_of_replicas": 1 #备份数
  }
}
复制

返回值

{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "person"
}
复制

Kibana 中查看

Stack Management -> 索引管理

单机版，分片无法存放，所以 yellow 黄色

查看索引信息

# 查看索引信息
GET /person
复制

删除索引信息

# 删除索引
DELETE /person
复制

Kibana 操作

Field datatypes

String:
 text: 一般用于全文检索。将当前的Field进行分词
 keyword: 当前 Field 不会被分词 
数值类型：
 long、 integer、 short、 byte、 double、 float、
 half_float：精度比float小一半
 scaled_float:根据一个long和scaled来表达一个浮点型, long=345,scaled=100 => 3.45
时间类型：
 date：针对时间类型指定具体格式
布尔类型：
 boolean：表达true和false
二进制类型：
 binary：暂时支持Base64 encode string
范围类型（Range datatypes）：
 long_range: 赋值时，无序指定具体的内容，只需要存储一个范围即可，指定gt，此，gte，lte
 integer_range:同上
 double_range:同上
 float_range: 同上
 date_range:同上
 ip_range: 同上。
经纬度类型：
 geo_point: 用来存储经纬度，结合定位的经纬度，来计算出距离
IP类型
 ip： 可以存付IPV4、IPV6
复制

其它类型参考官网：

创建索引并指定结构

## 创建索引并指定结构
PUT /book
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  },
  "mappings": {
    // 文档存储的Field
    "properties":{
      "name":{
        // 类型
        "type":"text",
        // 指定分词器
        "analyzer":"ik_max_word",
        // 指定当前Field可以被作为查询的条件
        "index":true,
        // 是否需要额外存储
        "store":false
      },
      "author":{
        "type":"keyword"
      },
      "count":{
        "type":"long"
      },
      "on-sale":{
        "type":"date",
        // 时间类型时格式化方式
        "format":"yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
      },
      "descr":{
        "type":"text",
        "analyzer":"ik_max_word"
      }
    }
  }
}
复制

文档操作

文档在ES服务中的唯一标识，_index, _type, _id 三个内容为组合，锁定一个文档进行操作

新建文档

自动生成 _id

ID不方便使用

# 添加文档，自动生成 id
POST /book/_doc
{
  "name":"太极",
  "author":"伏羲",
  "count":888,
  "on-sale":"2023-02-23",
  "descr":"太极生两仪，两仪生四象，四象生八卦"
}
复制

{
  "_index" : "book",
  "_type" : "_doc", //创建时没指定，默认的
  "_id" : "YJJLfYYBGlLaT58LDoV5",  // 自动生成的ID
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}
复制

手动创建ID

POST /book/_doc/1
{
  "name":"太极拳",
  "author":"陈王廷",
  "count":666,
  "on-sale":"2023-02-23",
  "descr":"掤、捋、挤、按、采、列、肘、靠"
}
复制

返回值

{
  "_index" : "book",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 1,
  "_primary_term" : 1
}
复制

修改文档

覆盖式修改

PUT /book/_doc/1
{
  "name":"太极拳",
  "author":"陈王廷",
  "count":666, //如果不赋值，原来的值将被更新成 0
  "on-sale":"2023-02-28",
  "descr":"掤、捋、挤、按、采、列、肘、靠"
}
复制

doc 修改方式

Deprecation: [types removal] Specifying types in document update requests is deprecated, use the endpoint /{index}/_update/{id} instead.

POST /book/_doc/1/_update
{
  "doc":{
    // 指定Field单独修改
    "count":6666
  }  
}
复制

返回值

#! Deprecation: [types removal] Specifying types in document update requests is deprecated, use the endpoint /{index}/_update/{id} instead.
{
  "_index" : "book",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 2,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 2,
  "_primary_term" : 1
}
复制

删除文档

DELETE /book/_doc/YJJLfYYBGlLaT58LDoV5
复制