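Aggregations can coexist with ordinary query results, and a single query result may contain several unrelated aggregations. If you only care about the aggregation results and not the query hits, set the size of the SearchSource to 0; skipping the hits can noticeably improve performance.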
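As a rough illustration of the point above, the following sketch builds the JSON body of a search request that asks for two unrelated aggregations and no hits. It uses plain java.util maps instead of the Elasticsearch client so that it runs without dependencies; the aggregation names (avgAmount, byCountry) and the field names are assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

final class SizeZeroRequest {
    // Builds an insertion-ordered map from alternating key/value arguments.
    static Map<String, Object> obj(Object... kv) {
        Map<String, Object> m = new LinkedHashMap<>();
        for (int i = 0; i < kv.length; i += 2) {
            m.put((String) kv[i], kv[i + 1]);
        }
        return m;
    }

    // Minimal JSON writer for nested maps, strings and numbers (no escaping).
    static String toJson(Object v) {
        if (v instanceof Map) {
            return ((Map<?, ?>) v).entrySet().stream()
                .map(e -> "\"" + e.getKey() + "\":" + toJson(e.getValue()))
                .collect(Collectors.joining(",", "{", "}"));
        }
        return v instanceof String ? "\"" + v + "\"" : String.valueOf(v);
    }

    // size = 0 suppresses the hits; the two aggregations are independent of each other.
    static Map<String, Object> build() {
        return obj(
            "size", 0,
            "aggs", obj(
                "avgAmount", obj("avg", obj("field", "amount")),
                "byCountry", obj("terms", obj("field", "country.keyword"))));
    }

    public static void main(String[] args) {
        System.out.println(toJson(build()));
    }
}
```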
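Metrics:
Simple aggregation types that compute an aggregate metric over all documents in the target set. They generally have no nested sub aggregations. Examples: average (avg), sum (sum), count (count), and cardinality (cardinality). Cardinality corresponds to distinct count.
Bucketing:
Bucket aggregation types that compute aggregate metrics over a series of buckets rather than over all documents. Each bucket represents the subset of the original result set that satisfies some condition. They usually have nested sub aggregations. Typical examples are TermsAggregation and HistogramAggregation.
Matrix:
Matrix aggregations are multi-dimensional: they compute a two-dimensional or higher-dimensional table of aggregate metrics from two or more aggregation dimensions. At the moment there appears to be only one of them, matrix_stats (MatrixStatsAggregationBuilder), and it does not currently support scripting.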
Aggregation request:
Two-level structure:
Aggregation -> SubAggregation
A sub aggregation performs a further aggregation on the result computed by its parent aggregation.
Aggregation response:
Three-level structure (for bucketing aggregations): MultiBucketsAggregation -> Buckets -> Aggregations
Aggregation properties:
name: matches the name of the aggregation in the request.
buckets: each Bucket corresponds to one possible value in the aggregation result, together with the aggregated result for that value.
Bucket properties:
key: a possible value of the aggregation dimension. The concrete value depends on the aggregation type. For a terms aggregation (total amount per transaction type), the bucket keys are all the possible transaction types (credit, debit, etc.). For a DateHistogram aggregation (number of transactions per day), the bucket keys are the individual dates.
docCount: the number of documents in the bucket.
value: the computed result of the aggregate metric. Note that with multi-level aggregations the intermediate levels, such as a terms aggregation, usually have no value; only the bottom-level aggregation that actually computes a metric has one.
aggregations: the results of the sub aggregations of the current aggregation in the request (if any).
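The three-level response structure can be walked recursively. The sketch below uses two small hypothetical classes, Agg and Bucket, as stand-ins for the properties listed above (they are not the Elasticsearch client classes), and flattens a sample response into one row per metric value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class AggResponseWalk {
    // Hypothetical stand-in: a metric aggregation carries a value, a bucketing one carries buckets.
    static final class Agg {
        final String name;
        final Double value;
        final List<Bucket> buckets;
        Agg(String name, Double value, List<Bucket> buckets) {
            this.name = name;
            this.value = value;
            this.buckets = buckets;
        }
    }

    // Hypothetical stand-in for one bucket: key, document count and sub aggregation results.
    static final class Bucket {
        final String key;
        final long docCount;
        final List<Agg> aggregations;
        Bucket(String key, long docCount, List<Agg> aggregations) {
            this.key = key;
            this.docCount = docCount;
            this.aggregations = aggregations;
        }
    }

    // Depth-first walk: intermediate levels contribute bucket keys, only leaf metrics contribute values.
    static void flatten(Agg agg, String prefix, List<String> rows) {
        if (agg.value != null) {
            rows.add(prefix + agg.name + "=" + agg.value);
            return;
        }
        for (Bucket b : agg.buckets) {
            for (Agg sub : b.aggregations) {
                flatten(sub, prefix + agg.name + ":" + b.key + "/", rows);
            }
        }
    }

    // Sample data: total amount per transaction type, as in the terms aggregation example.
    static List<String> sample() {
        Agg byType = new Agg("byType", null, Arrays.asList(
            new Bucket("credit", 3, Arrays.asList(new Agg("totalAmount", 120.5, null))),
            new Bucket("debit", 2, Arrays.asList(new Agg("totalAmount", 80.0, null)))));
        List<String> rows = new ArrayList<>();
        flatten(byType, "", rows);
        return rows;
    }

    public static void main(String[] args) {
        sample().forEach(System.out::println);
    }
}
```

The same walk handles deeper nesting (for example two group by levels), because every intermediate terms level only adds its bucket key to the prefix.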
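Prerequisite for the SQL mapping: it only covers aggregate computations, that is, the SELECT part of the SQL contains columns that are aggregate functions.
The mapping process is hard to describe directly, so a few examples follow to make it easier to understand. After all, the structure of SQL is nothing more than SELECT/FROM/WHERE/GROUP BY/HAVING/ORDER BY. ORDER BY is left out for now, since the order of aggregation results rarely matters. FROM is easy to understand: it is the name of the index.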
The ES builder that corresponds to each part of the SQL statement:
SQL part | ES builder | Code to build |
---|---|---|
select column (aggregate function) | MetricsAggregationBuilder, chosen by the aggregate function of the column (e.g. MaxAggregationBuilder) | |
select column (group by field) | Bucket key | |
where | FiltersAggregationBuilder + FiltersAggregator.KeyedFilter | keyedFilter = new FiltersAggregator.KeyedFilter("combineCondition", subQueryBuilder); AggregationBuilders.filters("whereAggr", keyedFilter) |
group by | TermsAggregationBuilder | AggregationBuilders.terms("aggregation name").field(fieldName) |
having | MetricsAggregationBuilder, chosen by the aggregate function in the having condition (e.g. MaxAggregationBuilder) + BucketSelectorPipelineAggregationBuilder | PipelineAggregatorBuilders.bucketSelector(aggregationName, bucketsPathMap, script) |
The ES builders that correspond to common SQL operators and aggregate functions:
SQL element | Aggregation type | Code to build |
---|---|---|
count(field) | ValueCountAggregationBuilder | AggregationBuilders.count(metricsName).field(fieldName) |
count(distinct field) | CardinalityAggregationBuilder | AggregationBuilders.cardinality(metricsName).field(fieldName) |
sum(field) | SumAggregationBuilder | AggregationBuilders.sum(metricsName).field(fieldName) |
min(field) | MinAggregationBuilder | AggregationBuilders.min(metricsName).field(fieldName) |
max(field) | MaxAggregationBuilder | AggregationBuilders.max(metricsName).field(fieldName) |
avg(field) | AvgAggregationBuilder | AggregationBuilders.avg(metricsName).field(fieldName) |
AND | BoolQueryBuilder | QueryBuilders.boolQuery().must(subQueryBuilder) |
OR | BoolQueryBuilder | QueryBuilders.boolQuery().should(subQueryBuilder) |
NOT | BoolQueryBuilder | QueryBuilders.boolQuery().mustNot(subQueryBuilder) |
= | TermQueryBuilder | QueryBuilders.termQuery(fieldName, value) |
IN | TermsQueryBuilder | QueryBuilders.termsQuery(fieldName, values) |
LIKE | WildcardQueryBuilder | QueryBuilders.wildcardQuery(fieldName, value) |
> | RangeQueryBuilder | QueryBuilders.rangeQuery(fieldName).gt(value) |
>= | RangeQueryBuilder | QueryBuilders.rangeQuery(fieldName).gte(value) |
< | RangeQueryBuilder | QueryBuilders.rangeQuery(fieldName).lt(value) |
<= | RangeQueryBuilder | QueryBuilders.rangeQuery(fieldName).lte(value) |
1.select count(payerId) as payerCount from Payment group by country
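Note that the payerId property of the document becomes payerId.keyword in the aggregation query that is actually built. By default Elasticsearch does not support aggregating on analyzed (text) fields and fails with the error "Fielddata is disabled on text fields by default. Set fielddata=true". Aggregating through fielddata is a very costly operation and is generally not recommended. Fortunately, with the default dynamic mapping Elasticsearch indexes a string property such as payerId as two fields: payerId, an analyzed text field, and payerId.keyword, a non-analyzed keyword field.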
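A sketch of the request body that this SQL could map to, built with plain java.util maps so it runs without the Elasticsearch client. The aggregation name groupByCountry is hypothetical, and using country.keyword for the group by field is an assumption (it presumes country is a string mapped the same way as payerId).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

final class CountGroupByCountry {
    // Builds an insertion-ordered map from alternating key/value arguments.
    static Map<String, Object> obj(Object... kv) {
        Map<String, Object> m = new LinkedHashMap<>();
        for (int i = 0; i < kv.length; i += 2) {
            m.put((String) kv[i], kv[i + 1]);
        }
        return m;
    }

    // Minimal JSON writer for nested maps, strings and numbers (no escaping).
    static String toJson(Object v) {
        if (v instanceof Map) {
            return ((Map<?, ?>) v).entrySet().stream()
                .map(e -> "\"" + e.getKey() + "\":" + toJson(e.getValue()))
                .collect(Collectors.joining(",", "{", "}"));
        }
        return v instanceof String ? "\"" + v + "\"" : String.valueOf(v);
    }

    // group by country -> terms aggregation; count(payerId) -> value_count on the keyword sub-field.
    static Map<String, Object> build() {
        Map<String, Object> payerCount =
            obj("payerCount", obj("value_count", obj("field", "payerId.keyword")));
        return obj(
            "size", 0,
            "aggs", obj("groupByCountry", obj(
                "terms", obj("field", "country.keyword"),
                "aggs", payerCount)));
    }

    public static void main(String[] args) {
        System.out.println(toJson(build()));
    }
}
```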
2.select max(payerId) from Payment group by accountId, country
The two group by conditions map to two nested levels of terms aggregation.
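The nesting can be produced by folding the group by fields from the last one to the first. The sketch below does this with plain java.util maps; the helper name groupBy, the aggregation name maxPayerId and the .keyword suffix on the group by fields are assumptions. Since the max aggregation works on numeric values, the sketch also assumes payerId is mapped as a number in this example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class NestedGroupBy {
    // Builds an insertion-ordered map from alternating key/value arguments.
    static Map<String, Object> obj(Object... kv) {
        Map<String, Object> m = new LinkedHashMap<>();
        for (int i = 0; i < kv.length; i += 2) {
            m.put((String) kv[i], kv[i + 1]);
        }
        return m;
    }

    // Minimal JSON writer for nested maps, strings and numbers (no escaping).
    static String toJson(Object v) {
        if (v instanceof Map) {
            return ((Map<?, ?>) v).entrySet().stream()
                .map(e -> "\"" + e.getKey() + "\":" + toJson(e.getValue()))
                .collect(Collectors.joining(",", "{", "}"));
        }
        return v instanceof String ? "\"" + v + "\"" : String.valueOf(v);
    }

    // Wraps the metric aggregations in one terms aggregation per group by field, innermost field last.
    static Map<String, Object> groupBy(List<String> fields, Map<String, Object> metrics) {
        Map<String, Object> aggs = metrics;
        for (int i = fields.size() - 1; i >= 0; i--) {
            String field = fields.get(i);
            aggs = obj(field, obj("terms", obj("field", field + ".keyword"), "aggs", aggs));
        }
        return aggs;
    }

    static Map<String, Object> build() {
        Map<String, Object> metrics = obj("maxPayerId", obj("max", obj("field", "payerId")));
        return groupBy(Arrays.asList("accountId", "country"), metrics);
    }

    public static void main(String[] args) {
        System.out.println(toJson(build()));
    }
}
```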
3.select count(distinct payerId) as payerCount from Payment where country in ('CN', 'GE') group by accountId, country
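A where condition has been added, so the top level is now a FiltersAggregationBuilder. It has two parts: filters holds a single KeyedFilter built from all the query conditions, which in turn contains the individual sub conditions; aggregations holds the group by conditions and the aggregate functions of the select part.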
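A sketch of the resulting request body, again as plain java.util maps rather than client builders. The names whereAggr and combineCondition are taken from the builder table; the .keyword field suffixes are assumptions about the mapping.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class FilteredGroupBy {
    // Builds an insertion-ordered map from alternating key/value arguments.
    static Map<String, Object> obj(Object... kv) {
        Map<String, Object> m = new LinkedHashMap<>();
        for (int i = 0; i < kv.length; i += 2) {
            m.put((String) kv[i], kv[i + 1]);
        }
        return m;
    }

    // Minimal JSON writer for nested maps, lists, strings and numbers (no escaping).
    static String toJson(Object v) {
        if (v instanceof Map) {
            return ((Map<?, ?>) v).entrySet().stream()
                .map(e -> "\"" + e.getKey() + "\":" + toJson(e.getValue()))
                .collect(Collectors.joining(",", "{", "}"));
        }
        if (v instanceof List) {
            return ((List<?>) v).stream().map(FilteredGroupBy::toJson)
                .collect(Collectors.joining(",", "[", "]"));
        }
        return v instanceof String ? "\"" + v + "\"" : String.valueOf(v);
    }

    static Map<String, Object> build() {
        // where country in ('CN', 'GE') -> terms query wrapped in one keyed filter
        Map<String, Object> where =
            obj("terms", obj("country.keyword", Arrays.asList("CN", "GE")));
        // count(distinct payerId) -> cardinality aggregation
        Map<String, Object> metric =
            obj("payerCount", obj("cardinality", obj("field", "payerId.keyword")));
        // group by accountId, country -> two nested terms aggregations
        Map<String, Object> byCountry =
            obj("country", obj("terms", obj("field", "country.keyword"), "aggs", metric));
        Map<String, Object> byAccount =
            obj("accountId", obj("terms", obj("field", "accountId.keyword"), "aggs", byCountry));
        return obj("size", 0, "aggs", obj("whereAggr", obj(
            "filters", obj("filters", obj("combineCondition", where)),
            "aggs", byAccount)));
    }

    public static void main(String[] args) {
        System.out.println(toJson(build()));
    }
}
```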
4.select count(distinct payerId) as payerCount from Payment where withinTime(createAt, 1, 'DAY') and name like '%SH%' group by accountId, country
Multiple where conditions are combined with a BoolQueryBuilder.
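A sketch of how the two conditions might be combined into one bool query. Several things here are assumptions: withinTime is the author's custom function, and the sketch guesses that withinTime(createAt, 1, 'DAY') means "createAt within the last day", expressed with the date math value now-1d; the helper likeToWildcard is hypothetical and only translates the SQL wildcards % and _ into the * and ? used by the wildcard query, without escaping literal * or ? characters.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class WhereToBoolQuery {
    // Builds an insertion-ordered map from alternating key/value arguments.
    static Map<String, Object> obj(Object... kv) {
        Map<String, Object> m = new LinkedHashMap<>();
        for (int i = 0; i < kv.length; i += 2) {
            m.put((String) kv[i], kv[i + 1]);
        }
        return m;
    }

    // Minimal JSON writer for nested maps, lists, strings and numbers (no escaping).
    static String toJson(Object v) {
        if (v instanceof Map) {
            return ((Map<?, ?>) v).entrySet().stream()
                .map(e -> "\"" + e.getKey() + "\":" + toJson(e.getValue()))
                .collect(Collectors.joining(",", "{", "}"));
        }
        if (v instanceof List) {
            return ((List<?>) v).stream().map(WhereToBoolQuery::toJson)
                .collect(Collectors.joining(",", "[", "]"));
        }
        return v instanceof String ? "\"" + v + "\"" : String.valueOf(v);
    }

    // Hypothetical helper: SQL LIKE wildcards (% and _) to wildcard-query wildcards (* and ?).
    static String likeToWildcard(String like) {
        return like.replace('%', '*').replace('_', '?');
    }

    // AND -> bool query with one "must" clause per condition.
    static Map<String, Object> where() {
        return obj("bool", obj("must", Arrays.asList(
            obj("range", obj("createAt", obj("gte", "now-1d"))),   // assumed meaning of withinTime
            obj("wildcard", obj("name", likeToWildcard("%SH%"))))));
    }

    public static void main(String[] args) {
        System.out.println(toJson(where()));
    }
}
```

Whether the wildcard should target name or a keyword sub-field depends on the index mapping, which the example does not state.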
5.select max(amount) as maxAmt, min(amount) as minAmt from Payment where amount > 1000.00 or amount <= 50.53 group by accountId, country having count(distinct beneficiaryId) > 3 and sum(amount) > 1530.20
The most complex SQL so far! The focus here is the handling of the having part, which uses the pipeline-type BucketSelectorPipelineAggregationBuilder. Two kinds of child nodes are added under the terms aggregation of the last group by condition. The sub aggregations contain the aggregate functions of the having condition in addition to those of the select part. The pipeline aggregations contain the BucketSelectorPipelineAggregationBuilder for the having condition. Its main properties are:
bucketsPathMap: maps each path name to the aggregation it refers to.
script: describes the having condition as a script, in which the left-hand side of each comparison uses the path name instead of the property name.
Note that although the having condition logically applies on top of the previously computed aggregation results, in the ES aggregation structure the BucketSelectorPipelineAggregationBuilder and the metric aggregations used by the having condition are siblings, not parent and child!
Also note that a script path is relative to the sibling nodes rather than an absolute path from the root aggregation, and that it refers to the name given to the metric aggregation, not to the aggregation type. The path must also resolve to a single value per bucket, because BucketSelector only decides whether the current bucket is kept; if the path returned several buckets, this boolean test could not be applied.
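The sibling relationship can be seen in a sketch of the children placed under the innermost terms aggregation. The block uses plain java.util maps instead of client builders; the names havingCount, havingSum, havingFilter, p0 and p1 are hypothetical, and beneficiaryId.keyword is an assumption about the mapping.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

final class HavingAsBucketSelector {
    // Builds an insertion-ordered map from alternating key/value arguments.
    static Map<String, Object> obj(Object... kv) {
        Map<String, Object> m = new LinkedHashMap<>();
        for (int i = 0; i < kv.length; i += 2) {
            m.put((String) kv[i], kv[i + 1]);
        }
        return m;
    }

    // Minimal JSON writer for nested maps, strings and numbers (no escaping).
    static String toJson(Object v) {
        if (v instanceof Map) {
            return ((Map<?, ?>) v).entrySet().stream()
                .map(e -> "\"" + e.getKey() + "\":" + toJson(e.getValue()))
                .collect(Collectors.joining(",", "{", "}"));
        }
        return v instanceof String ? "\"" + v + "\"" : String.valueOf(v);
    }

    // Children of the last group by terms aggregation: select metrics, having metrics,
    // and the bucket_selector that filters buckets, all on the same level.
    static Map<String, Object> subAggs() {
        return obj(
            "maxAmt", obj("max", obj("field", "amount")),
            "minAmt", obj("min", obj("field", "amount")),
            "havingCount", obj("cardinality", obj("field", "beneficiaryId.keyword")),
            "havingSum", obj("sum", obj("field", "amount")),
            "havingFilter", obj("bucket_selector", obj(
                // path names map to sibling aggregation names (relative paths)
                "buckets_path", obj("p0", "havingCount", "p1", "havingSum"),
                // the script refers to the path names, not to the document fields
                "script", "params.p0 > 3 && params.p1 > 1530.20")));
    }

    public static void main(String[] args) {
        System.out.println(toJson(subAggs()));
    }
}
```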
6.select count(paymentId) from Payment group by timeRange(createdAt, '1D', 'yyyy/MM/dd')
This example uses a custom function, timeRange, which aggregates the createdAt property by day. The corresponding ES aggregation type is DateHistogramAggregation.
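A sketch of a request body this could map to, as plain java.util maps. The aggregation names byDay and paymentCount are hypothetical, and translating '1D' to the interval value 1d is an assumption about the custom function. The calendar_interval key is the one used by Elasticsearch 7.2 and later; older versions use interval instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

final class CountPerDay {
    // Builds an insertion-ordered map from alternating key/value arguments.
    static Map<String, Object> obj(Object... kv) {
        Map<String, Object> m = new LinkedHashMap<>();
        for (int i = 0; i < kv.length; i += 2) {
            m.put((String) kv[i], kv[i + 1]);
        }
        return m;
    }

    // Minimal JSON writer for nested maps, strings and numbers (no escaping).
    static String toJson(Object v) {
        if (v instanceof Map) {
            return ((Map<?, ?>) v).entrySet().stream()
                .map(e -> "\"" + e.getKey() + "\":" + toJson(e.getValue()))
                .collect(Collectors.joining(",", "{", "}"));
        }
        return v instanceof String ? "\"" + v + "\"" : String.valueOf(v);
    }

    // One bucket per day; bucket keys are rendered with the requested date format.
    static Map<String, Object> build() {
        return obj("size", 0, "aggs", obj("byDay", obj(
            "date_histogram", obj(
                "field", "createdAt",
                "calendar_interval", "1d",
                "format", "yyyy/MM/dd"),
            "aggs", obj("paymentCount",
                obj("value_count", obj("field", "paymentId.keyword"))))));
    }

    public static void main(String[] args) {
        System.out.println(toJson(build()));
    }
}
```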
Bucket count
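Distinct count: Elasticsearch uses an approximate algorithm based on HyperLogLog (HyperLogLog++), so the result of a cardinality aggregation is an estimate rather than an exact count.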
https://www.elastic.co/guide/en/elasticsearch/reference/current/fielddata.html