真实的国产乱ⅩXXX66竹夫人,五月香六月婷婷激情综合,亚洲日本VA一区二区三区,亚洲精品一区二区三区麻豆

成都創(chuàng)新互聯(lián)網(wǎng)站制作重慶分公司

Elasticsearch搜索打分計(jì)算原理淺析-創(chuàng)新互聯(lián)

搜索打分計(jì)算幾個(gè)關(guān)鍵詞

從網(wǎng)站建設(shè)到定制行業(yè)解決方案,為提供成都做網(wǎng)站、網(wǎng)站建設(shè)服務(wù)體系,各種行業(yè)企業(yè)客戶提供網(wǎng)站建設(shè)解決方案,助力業(yè)務(wù)快速發(fā)展。創(chuàng)新互聯(lián)建站將不斷加快創(chuàng)新步伐,提供優(yōu)質(zhì)的建站服務(wù)。
  • TF: token frequency ,某個(gè)搜索字段分詞后再document中字段(待搜索的字段)中出現(xiàn)的次數(shù)

  • IDF:inverse document frequency,逆文檔頻率,某個(gè)搜索的字段在所有document中出現(xiàn)的次數(shù)取反

  • TFNORM:token frequency normalized,詞頻歸一化
  • BM25:算法:(freq + k1 * (1 - b + b * dl / avgdl))

兩個(gè)文檔如下:

{
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "321697",
        "_score" : 6.6273837,
        "_source" : {
          "title" : "Steve Jobs"
      }
}
{
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "23706",
        "_score" : 6.0948296,
        "_source" : {
          "title" : "All About Steve"
      }
}

如果我們通過titlematch查詢

GET /movies/_search
{
  "query": {
    "match": {
      "title": "steve"
    }
  }
}

那么從打分結(jié)果就可以看出第一個(gè)文檔打分高于第二個(gè),這個(gè)具體原因是:

TF方面看在帶搜索字段上出現(xiàn)的頻率一致

IDF方面看在整個(gè)文檔中出現(xiàn)的頻率一致

TFNORM方面則不一樣了,第一個(gè)文檔中該詞占比為1/2,第二個(gè)文檔中該詞占比為1/3,故而第一個(gè)文檔在該搜索下打分比第二個(gè)索引高,所以ES算法時(shí)使用了TFNORM計(jì)算方式freq / (freq + k1 * (1 - b + b * dl / avgdl))

最后的ES中的TF算法融合了詞頻歸一化BM25

如果我們要查看具體Elasticsearch一個(gè)打分算法,則可以通過如下命令展示

GET /movies/_search
{
  // 和MySQL的執(zhí)行計(jì)劃類似
  "explain": true, 
  "query": {
    "match": {
      "title": "steve"
    }
  }
}

執(zhí)行結(jié)果,查看其中一個(gè)

{
    "_shard": "[movies][1]",
    "_node": "pqNhgutvQfqcLqLEzIDnbQ",
    "_index": "movies",
    "_type": "_doc",
    "_id": "321697",
    "_score": 6.6273837,
    "_source": {
        "overview": "Set backstage at three iconic product launches and ending in 1998 with the unveiling of the iMac, Steve Jobs takes us behind the scenes of the digital revolution to paint an intimate portrait of the brilliant man at its epicenter.",
        "voteAverage": 6.8,
        "keywords": [
            {
                "id": 5565,
                "name": "biography"
            },
            {
                "id": 6104,
                "name": "computer"
            },
            {
                "id": 15300,
                "name": "father daughter relationship"
            },
            {
                "id": 157935,
                "name": "apple computer"
            },
            {
                "id": 161160,
                "name": "steve jobs"
            },
            {
                "id": 185722,
                "name": "based on true events"
            }
        ],
        "releaseDate": "2015-01-01T00:00:00.000Z",
        "runtime": 122,
        "originalLanguage": "en",
        "title": "Steve Jobs",
        "productionCountries": [
            {
                "iso_3166_1": "US",
                "name": "United States of America"
            }
        ],
        "revenue": 34441873,
        "genres": [
            {
                "id": 18,
                "name": "Drama"
            },
            {
                "id": 36,
                "name": "History"
            }
        ],
        "originalTitle": "Steve Jobs",
        "popularity": 53.670525,
        "tagline": "Can a great man be a good man?",
        "spokenLanguages": [
            {
                "iso_639_1": "en",
                "name": "English"
            }
        ],
        "id": 321697,
        "voteCount": 1573,
        "productionCompanies": [
            {
                "name": "Universal Pictures",
                "id": 33
            },
            {
                "name": "Scott Rudin Productions",
                "id": 258
            },
            {
                "name": "Legendary Pictures",
                "id": 923
            },
            {
                "name": "The Mark Gordon Company",
                "id": 1557
            },
            {
                "name": "Management 360",
                "id": 4220
            },
            {
                "name": "Cloud Eight Films",
                "id": 6708
            }
        ],
        "budget": 30000000,
        "homepage": "http://www.stevejobsthefilm.com",
        "status": "Released"
    },
    -          }
                ]
            }
        ]
    }
}

此時(shí)可以看到結(jié)果多出了以下的一組數(shù)據(jù)(執(zhí)行計(jì)劃)

{
    "_explanation": {
        "value": 6.6273837,
        // title字段值steve在所有匹配的1526個(gè)文檔中的權(quán)重
        "description": "weight(title:steve in 1526) [PerFieldSimilarity], result of:",
        "details": [
            {
                // value = idf.value * tf.value * 2.2
                // 6.6273837 = 6.4412656 * 0.46767938 * 2.2
                "value": 6.6273837,
                "description": "score(freq=1.0), product of:",
                "details": [
                    {
                        "value": 2.2,
                        // 放大因子,這個(gè)數(shù)值可以在創(chuàng)建索引的時(shí)候指定,默認(rèn)值是2.2
                        "description": "boost",
                        "details": []
                    },
                    {
                        "value": 6.4412656,
                        "description": "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                        "details": [
                            {
                                "value": 2,
                                "description": "n, number of documents containing term",
                                "details": []
                            },
                            {
                                "value": 1567,
                                "description": "N, total number of documents with field",
                                "details": []
                            }
                        ]
                    },
                    {
                        "value": 0.46767938,
                        "description": "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                        "details": [
                            {
                                "value": 1,
                                "description": "freq, occurrences of term within document",
                                "details": []
                            },
                            // 這塊提現(xiàn)了BM25算法((freq + k1 * (1 - b + b * dl / avgdl)))
                            {
                                "value": 1.2,
                                "description": "k1, term saturation parameter",
                                "details": []
                            },
                            {
                                "value": 0.75,
                                "description": "b, length normalization parameter",
                                "details": []
                            },
                            // 這塊就可以提現(xiàn)出一個(gè)歸一化的操作算法
                            {
                                "value": 2,
                                "description": "dl, length of field",
                                "details": []
                            },
                            {
                                "value": 2.1474154,
                                "description": "avgdl, average length of field",
                                "details": []
                            }
                        ]
                    }
                ]
            }
        ]
    }
}

另外有需要云服務(wù)器可以了解下創(chuàng)新互聯(lián)cdcxhl.cn,海內(nèi)外云服務(wù)器15元起步,三天無理由+7*72小時(shí)售后在線,公司持有idc許可證,提供“云服務(wù)器、裸金屬服務(wù)器、高防服務(wù)器、香港服務(wù)器、美國服務(wù)器、虛擬主機(jī)、免備案服務(wù)器”等云主機(jī)租用服務(wù)以及企業(yè)上云的綜合解決方案,具有“安全穩(wěn)定、簡單易用、服務(wù)可用性高、性價(jià)比高”等特點(diǎn)與優(yōu)勢(shì),專為企業(yè)上云打造定制,能夠滿足用戶豐富、多元化的應(yīng)用場景需求。


網(wǎng)頁題目:Elasticsearch搜索打分計(jì)算原理淺析-創(chuàng)新互聯(lián)
網(wǎng)頁地址:http://weahome.cn/article/gchhi.html

其他資訊

在線咨詢

微信咨詢

電話咨詢

028-86922220(工作日)

18980820575(7×24)

提交需求

返回頂部