Requirements:
IK tokenizer:
https://github.com/medcl/elasticsearch-analysis-ik (the IK Analysis plugin integrates the Lucene IK analyzer into Elasticsearch and supports customized dictionaries)
Pinyin:
https://github.com/medcl/elasticsearch-analysis-pinyin
Simplified/Traditional Chinese conversion:
https://github.com/medcl/elasticsearch-analysis-stconvert
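Analysis is the process Elasticsearch applies to the text of a document before the document is added to the inverted index. For every analyzed field, Elasticsearch runs several steps before the document is indexed: character filtering, tokenization, and token filtering.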
An analyzer breaks the input character stream into tokens. This usually happens on two occasions: at indexing time, when the index is built, and at search time, when the search terms are analyzed.
Further reading: "Elasticsearch: analyzer" (Elastic China Community official blog), https://blog.csdn.net/UbuntuTouch/article/details/100392478
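The order of the analysis stages (character filters, then a tokenizer, then token filters) can be pictured as a plain function chain. The sketch below is only a toy model in Python: the functions are hypothetical stand-ins, not the stconvert, IK, or pinyin plugins, and the two-entry conversion table is made up for illustration.

```python
from typing import Callable, List

# Toy stand-ins for the three analysis stages. They are NOT the real
# Elasticsearch plugins; they only illustrate the order of the stages.

def t2s_char_filter(text: str) -> str:
    # Character filter: rewrites the raw string before tokenization.
    # Hypothetical two-entry Traditional-to-Simplified table.
    table = str.maketrans({"門": "门", "測": "测"})
    return text.translate(table)

def whitespace_tokenizer(text: str) -> List[str]:
    # Tokenizer: splits the filtered string into tokens (here, on whitespace).
    return text.split()

def lowercase_filter(tokens: List[str]) -> List[str]:
    # Token filter: transforms the token stream produced by the tokenizer.
    return [t.lower() for t in tokens]

def analyze(
    text: str,
    char_filters: List[Callable[[str], str]],
    tokenizer: Callable[[str], List[str]],
    token_filters: List[Callable[[List[str]], List[str]]],
) -> List[str]:
    for cf in char_filters:
        text = cf(text)
    tokens = tokenizer(text)
    for tf in token_filters:
        tokens = tf(tokens)
    return tokens

print(analyze("天安門 Beijing", [t2s_char_filter], whitespace_tokenizer, [lowercase_filter]))
# prints ['天安门', 'beijing']
```

A custom analyzer in Elasticsearch is configured with the same three slots: char_filter, tokenizer, and filter.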
3. Index template
PUT /_template/test_template
{
  "index_patterns": ["test-*"],
  "aliases": {
    "test_read": {}
  },
  "settings": {
    "index": {
      "max_result_window": "100000",
      "refresh_interval": "5s",
      "number_of_shards": "5",
      "translog": {
        "flush_threshold_size": "1024mb",
        "sync_interval": "30s",
        "durability": "async"
      },
      "number_of_replicas": "1"
    },
    "analysis": {
      "char_filter": {
        "tsconvert": {
          "type": "stconvert",
          "convert_type": "t2s"
        }
      },
      "analyzer": {
        "ik_t2s_pinyin_analyzer": {
          "type": "custom",
          "char_filter": ["tsconvert"],
          "tokenizer": "ik_max_word",
          "filter": ["pinyin_filter", "lowercase"]
        },
        "stand_t2s_pinyin_analyzer": {
          "type": "custom",
          "char_filter": ["tsconvert"],
          "tokenizer": "standard",
          "filter": ["pinyin_filter", "lowercase"]
        },
        "ik_t2s_analyzer": {
          "type": "custom",
          "char_filter": ["tsconvert"],
          "tokenizer": "ik_max_word",
          "filter": ["lowercase"]
        },
        "stand_t2s_analyzer": {
          "type": "custom",
          "char_filter": ["tsconvert"],
          "tokenizer": "standard",
          "filter": ["lowercase"]
        },
        "ik_pinyin_analyzer": {
          "type": "custom",
          "tokenizer": "ik_max_word",
          "filter": ["pinyin_filter", "lowercase"]
        },
        "stand_pinyin_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["pinyin_filter", "lowercase"]
        }
      },
      "filter": {
        "pinyin_filter": {
          "type": "pinyin",
          "keep_first_letter": true,
          "keep_separate_first_letter": false,
          "keep_full_pinyin": false,
          "keep_joined_full_pinyin": true,
          "keep_none_chinese": true,
          "none_chinese_pinyin_tokenize": false,
          "keep_none_chinese_in_joined_full_pinyin": true,
          "keep_original": false,
          "limit_first_letter_length": 1000,
          "lowercase": true,
          "trim_whitespace": true,
          "remove_duplicated_term": true
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "index_phrases": true,
        "analyzer": "ik_max_word",
        "index": true,
        "type": "text",
        "fields": {
          "keyword": {
            "ignore_above": 256,
            "type": "keyword"
          },
          "stand": {
            "analyzer": "standard",
            "type": "text"
          },
          "STPA": {
            "type": "text",
            "analyzer": "stand_t2s_pinyin_analyzer"
          },
          "ITPA": {
            "type": "text",
            "analyzer": "ik_t2s_pinyin_analyzer"
          }
        }
      },
      "desc": {
        "index_phrases": true,
        "analyzer": "ik_max_word",
        "index": true,
        "type": "text",
        "fields": {
          "keyword": {
            "ignore_above": 256,
            "type": "keyword"
          },
          "stand": {
            "analyzer": "standard",
            "type": "text"
          },
          "STPA": {
            "type": "text",
            "analyzer": "stand_t2s_pinyin_analyzer"
          },
          "ITPA": {
            "type": "text",
            "analyzer": "ik_t2s_pinyin_analyzer"
          }
        }
      },
      "abstr": {
        "index_phrases": true,
        "analyzer": "ik_max_word",
        "index": true,
        "type": "text",
        "fields": {
          "keyword": {
            "ignore_above": 256,
            "type": "keyword"
          },
          "stand": {
            "analyzer": "standard",
            "type": "text"
          },
          "STPA": {
            "type": "text",
            "analyzer": "stand_t2s_pinyin_analyzer"
          },
          "ITPA": {
            "type": "text",
            "analyzer": "ik_t2s_pinyin_analyzer"
          }
        }
      }
    }
  }
}
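Custom analyzers refer to char filters and token filters by name, and a name that is not defined under "analysis" is typically rejected when the template or index is created. The following is a minimal sketch of a pre-flight check in Python, run against a trimmed, illustrative copy of an analysis section. The function name, the allow-list of built-in filters, and the "broken_analyzer" entry are assumptions made for this example.

```python
from typing import Any, Dict, List

# Assumption: a tiny allow-list of built-in token filters. Elasticsearch
# ships many more (and built-in char filters too); extend as needed.
BUILTIN_FILTERS = {"lowercase"}

def undefined_references(analysis: Dict[str, Any]) -> List[str]:
    """List filter / char_filter names used by analyzers but not defined."""
    defined_filters = set(analysis.get("filter", {})) | BUILTIN_FILTERS
    defined_char_filters = set(analysis.get("char_filter", {}))
    problems = []
    for name, spec in analysis.get("analyzer", {}).items():
        for f in spec.get("filter", []):
            if f not in defined_filters:
                problems.append(f"{name} -> filter '{f}'")
        for cf in spec.get("char_filter", []):
            if cf not in defined_char_filters:
                problems.append(f"{name} -> char_filter '{cf}'")
    return problems

# Trimmed, illustrative analysis section (not the full template).
analysis = {
    "char_filter": {"tsconvert": {"type": "stconvert", "convert_type": "t2s"}},
    "filter": {"pinyin_filter": {"type": "pinyin"}},
    "analyzer": {
        "ik_t2s_pinyin_analyzer": {
            "type": "custom",
            "char_filter": ["tsconvert"],
            "tokenizer": "ik_max_word",
            "filter": ["pinyin_filter", "lowercase"],
        },
        # Hypothetical analyzer with a mistyped filter name.
        "broken_analyzer": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["pinyin_fliter"],
        },
    },
}

print(undefined_references(analysis))
# prints ["broken_analyzer -> filter 'pinyin_fliter'"]
```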
4. DSL query
GET /test_read/_search
{
  "from": 0,
  "size": 10,
  "terminate_after": 100000,
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "bj天安門 OR 測試",
            "fields": ["name.ITPA"],
            "type": "phrase",
            "default_operator": "and"
          }
        }
      ],
      "adjust_pure_negative": true,
      "boost": 1
    }
  },
  "post_filter": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "天安門"
          }
        }
      ]
    }
  },
  "highlight": {
    "fragment_size": 1000,
    "pre_tags": [""],
    "post_tags": [""],
    "fields": {
      "name.stand": {},
      "desc.stand": {},
      "abstr.stand": {},
      "name.IPA": {},
      "desc.IPA": {},
      "abstr.IPA": {},
      "name.ITPA": {},
      "desc.ITPA": {},
      "abstr.ITPA": {}
    }
  }
}
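As a rough illustration of how the request above might be assembled in client code, here is a minimal Python sketch that builds the same kind of body from a first query and an optional refinement keyword. The function name and its parameters are hypothetical. Sending the body to /test_read/_search is left to whichever HTTP or Elasticsearch client is in use.

```python
import json
from typing import Any, Dict, Optional

def build_search_body(
    query: str,
    refine: Optional[str] = None,   # keyword for the search-within-results step
    field: str = "name.ITPA",
    size: int = 10,
) -> Dict[str, Any]:
    """Build a search body: query_string for the first search, post_filter for the second."""
    body: Dict[str, Any] = {
        "from": 0,
        "size": size,
        "query": {
            "bool": {
                "must": [
                    {
                        "query_string": {
                            "query": query,
                            "fields": [field],
                            "type": "phrase",
                            "default_operator": "and",
                        }
                    }
                ]
            }
        },
    }
    if refine:
        # The refinement only narrows the hits; it is not part of the main query.
        body["post_filter"] = {"bool": {"must": [{"match": {"name": refine}}]}}
    return body

body = build_search_body("bj天安門 OR 測試", refine="天安門")
print(json.dumps(body, ensure_ascii=False, indent=2))
```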
post_filter reference: "Post Filter", Elasticsearch: The Definitive Guide (Elastic)
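PS: post_filter implements the search-within-results (second search) step. Elasticsearch highlighting does not cover post_filter matches, so those hits have to be marked up manually in code. With the DSL above as a reference, the corresponding client code is straightforward to write.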
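Manual highlighting can be as simple as wrapping each occurrence of the refinement keyword in a pair of tags. The Python sketch below assumes plain substring matching and `<em>` tags. Both are assumptions for illustration: it does not reproduce the IK, pinyin, or Traditional-to-Simplified analysis, so a pinyin input such as "bj" would not mark any Chinese text.

```python
import re
from typing import Iterable

def highlight(text: str, terms: Iterable[str],
              pre: str = "<em>", post: str = "</em>") -> str:
    """Wrap every literal occurrence of the given terms in pre/post tags."""
    # Longest terms first, so a longer term wins over one of its substrings.
    ordered = sorted({t for t in terms if t}, key=len, reverse=True)
    if not ordered:
        return text
    pattern = re.compile("|".join(re.escape(t) for t in ordered), re.IGNORECASE)
    return pattern.sub(lambda m: f"{pre}{m.group(0)}{post}", text)

print(highlight("北京天安門廣場", ["天安門"]))
# prints 北京<em>天安門</em>廣場
```

In practice the function would be applied to the `_source` fields of each hit returned by the search, using the same keyword that was sent in post_filter.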
Pinyin plugin configuration: