What Is the Difference Between HashAggregate and GroupAggregate in PostgreSQL

This article looks at the difference between HashAggregate and GroupAggregate in PostgreSQL, using two test cases to show when the optimizer picks each one.


Case 1

First, let's look at an example.
Test table:

drop table  if exists t_agg;
create table t_agg(bh varchar(20),c1 int,c2 int,c3 int,c4 int,c5 int,c6 int);
insert into t_agg select 'GZ01',col,col,col,col,col,col from generate_series(1,100000) as col;
insert into t_agg select 'GZ02',col,col,col,col,col,col from generate_series(1,100000) as col;
insert into t_agg select 'GZ03',col,col,col,col,col,col from generate_series(1,100000) as col;
insert into t_agg select 'GZ04',col,col,col,col,col,col from generate_series(1,100000) as col;
insert into t_agg select 'GZ05',col,col,col,col,col,col from generate_series(1,100000) as col;

Run the query:

testdb=# -- disable parallel query
testdb=# set max_parallel_workers_per_gather=0;
SET
testdb=# explain verbose select bh,min(c1),max(c1),min(c2),max(c2),min(c3),max(c3),min(c4),max(c4),min(c5),max(c5) from t_agg group by bh;
                                               QUERY PLAN                                               
--------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=22427.00..22427.05 rows=5 width=45)
   Output: bh, min(c1), max(c1), min(c2), max(c2), min(c3), max(c3), min(c4), max(c4), min(c5), max(c5)
   Group Key: t_agg.bh
   ->  Seq Scan on public.t_agg  (cost=0.00..8677.00 rows=500000 width=25)
         Output: bh, c1, c2, c3, c4, c5, c6
(5 rows)

The PostgreSQL optimizer chose HashAggregate.
Next, disable HashAggregate so that the optimizer falls back to GroupAggregate. Compare the total cost of the two plans: 22427.05 vs 82968.97.

testdb=# set enable_hashagg = off;
SET
testdb=# explain verbose select bh,min(c1),max(c1),min(c2),max(c2),min(c3),max(c3),min(c4),max(c4),min(c5),max(c5) from t_agg group by bh;
                                               QUERY PLAN                                               
--------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=67968.92..82968.97 rows=5 width=45)
   Output: bh, min(c1), max(c1), min(c2), max(c2), min(c3), max(c3), min(c4), max(c4), min(c5), max(c5)
   Group Key: t_agg.bh
   ->  Sort  (cost=67968.92..69218.92 rows=500000 width=25)
         Output: bh, c1, c2, c3, c4, c5
         Sort Key: t_agg.bh
         ->  Seq Scan on public.t_agg  (cost=0.00..8677.00 rows=500000 width=25)
               Output: bh, c1, c2, c3, c4, c5
(8 rows)
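
The two plans can be compared node by node. The following is a small arithmetic sketch using only the cost figures printed in the EXPLAIN outputs above; the variable names are made up for illustration. It suggests that, in these particular plans, the aggregation step itself is costed the same either way and the whole gap comes from the Sort node.

```python
# Cost figures copied from the two EXPLAIN outputs above (planner cost units).
seq_scan_total = 8677.00     # Seq Scan on t_agg, total cost
hash_agg_total = 22427.05    # HashAggregate, total cost
sort_total = 69218.92        # Sort, total cost
group_agg_total = 82968.97   # GroupAggregate, total cost

# Cost each aggregate node adds on top of its input node.
hash_agg_own = round(hash_agg_total - seq_scan_total, 2)
group_agg_own = round(group_agg_total - sort_total, 2)

# Cost the Sort node adds on top of the sequential scan.
sort_own = round(sort_total - seq_scan_total, 2)

# Overall ratio between the two plans.
ratio = round(group_agg_total / hash_agg_total, 1)

print("HashAggregate adds: ", hash_agg_own)
print("GroupAggregate adds:", group_agg_own)
print("Sort adds:          ", sort_own)
print("GroupAggregate / HashAggregate:", ratio)
```

Both aggregate nodes add about 13750 cost units, while the Sort adds about 60542, which is why the GroupAggregate plan comes out roughly 3.7 times as expensive here.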

Case 2
Next, test with a wide table: very few grouping-key values, but many aggregated columns.

drop table  if exists t_agg_width;
create table t_agg_width
(bh varchar(20)
,c1 int,c2 int,c3 int,c4 int,c5 int,c6 int,c7 int,c8 int,c9 int
,c11 int,c12 int,c13 int,c14 int,c15 int,c16 int,c17 int,c18 int,c19 int
,c21 int,c22 int,c23 int,c24 int,c25 int,c26 int,c27 int,c28 int,c29 int
,c31 int,c32 int,c33 int,c34 int,c35 int,c36 int,c37 int,c38 int,c39 int);
insert into t_agg_width 
select 'GZ01'
,col,col,col,col,col,col,col,col,col 
,col,col,col,col,col,col,col,col,col 
,col,col,col,col,col,col,col,col,col 
,col,col,col,col,col,col,col,col,col 
from generate_series(1,100000) as col;
insert into t_agg_width 
select 'GZ02'
,col,col,col,col,col,col,col,col,col 
,col,col,col,col,col,col,col,col,col 
,col,col,col,col,col,col,col,col,col 
,col,col,col,col,col,col,col,col,col 
from generate_series(1,100000) as col;
insert into t_agg_width 
select 'GZ03'
,col,col,col,col,col,col,col,col,col 
,col,col,col,col,col,col,col,col,col 
,col,col,col,col,col,col,col,col,col 
,col,col,col,col,col,col,col,col,col 
from generate_series(1,100000) as col;
insert into t_agg_width 
select 'GZ04'
,col,col,col,col,col,col,col,col,col 
,col,col,col,col,col,col,col,col,col 
,col,col,col,col,col,col,col,col,col 
,col,col,col,col,col,col,col,col,col 
from generate_series(1,100000) as col;
-- make sure hashagg is enabled (it was turned off in Case 1)
set enable_hashagg = on;
-- disable parallel query
set max_parallel_workers_per_gather=0;
select bh
,min(c1),min(c2) ,min(c3) ,min(c4) ,min(c5) ,min(c6) ,min(c7) ,min(c8) ,min(c9)
,min(c11),min(c12) ,min(c13) ,min(c14) ,min(c15) ,min(c16) ,min(c17) ,min(c18) ,min(c19)
,min(c21),min(c22) ,min(c23) ,min(c24) ,min(c25) ,min(c26) ,min(c27) ,min(c28) ,min(c29)
,min(c31),min(c32) ,min(c33) ,min(c34) ,min(c35) ,min(c36) ,min(c37) ,min(c38) ,min(c39)
from t_agg_width group by bh;

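In this case the optimizer still chooses HashAggregate. The second plan below shows the GroupAggregate alternative after hashagg is disabled: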

testdb=# explain verbose select bh
testdb-# ,min(c1),min(c2) ,min(c3) ,min(c4) ,min(c5) ,min(c6) ,min(c7) ,min(c8) ,min(c9)
testdb-# ,min(c11),min(c12) ,min(c13) ,min(c14) ,min(c15) ,min(c16) ,min(c17) ,min(c18) ,min(c19)
testdb-# ,min(c21),min(c22) ,min(c23) ,min(c24) ,min(c25) ,min(c26) ,min(c27) ,min(c28) ,min(c29)
testdb-# ,min(c31),min(c32) ,min(c33) ,min(c34) ,min(c35) ,min(c36) ,min(c37) ,min(c38) ,min(c39)
testdb-# from t_agg_width group by bh;
                                                    QUERY PLAN                                                               
----------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=49889.00..49889.04 rows=4 width=149)
   Output: bh, min(c1), min(c2), min(c3), min(c4), min(c5), min(c6), min(c7), min(c8), min(c9), min(c11), min(c12), min(c13),
 min(c14), min(c15), min(c16), min(c17), min(c18), min(c19), min(c21), min(c22), min(c23), min(c24), min(c25), min(c26), min(
c27), min(c28), min(c29), min(c31), min(c32), min(c33), min(c34), min(c35), min(c36), min(c37), min(c38), min(c39)
   Group Key: t_agg_width.bh
   ->  Seq Scan on public.t_agg_width  (cost=0.00..12889.00 rows=400000 width=149)
         Output: bh, c1, c2, c3, c4, c5, c6, c7, c8, c9, c11, c12, c13, c14, c15, c16, c17, c18, c19, c21, c22, c23, c24, c25
, c26, c27, c28, c29, c31, c32, c33, c34, c35, c36, c37, c38, c39
(5 rows)
testdb=# set enable_hashagg = off;
SET
testdb=# explain verbose select bh
,min(c1),min(c2) ,min(c3) ,min(c4) ,min(c5) ,min(c6) ,min(c7) ,min(c8) ,min(c9)
,min(c11),min(c12) ,min(c13) ,min(c14) ,min(c15) ,min(c16) ,min(c17) ,min(c18) ,min(c19)
,min(c21),min(c22) ,min(c23) ,min(c24) ,min(c25) ,min(c26) ,min(c27) ,min(c28) ,min(c29)
,min(c31),min(c32) ,min(c33) ,min(c34) ,min(c35) ,min(c36) ,min(c37) ,min(c38) ,min(c39)
from t_agg_width group by bh;
                                                    QUERY PLAN                                                               
----------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=110266.28..148266.32 rows=4 width=149)
   Output: bh, min(c1), min(c2), min(c3), min(c4), min(c5), min(c6), min(c7), min(c8), min(c9), min(c11), min(c12), min(c13),
 min(c14), min(c15), min(c16), min(c17), min(c18), min(c19), min(c21), min(c22), min(c23), min(c24), min(c25), min(c26), min(
c27), min(c28), min(c29), min(c31), min(c32), min(c33), min(c34), min(c35), min(c36), min(c37), min(c38), min(c39)
   Group Key: t_agg_width.bh
   ->  Sort  (cost=110266.28..111266.28 rows=400000 width=149)
         Output: bh, c1, c2, c3, c4, c5, c6, c7, c8, c9, c11, c12, c13, c14, c15, c16, c17, c18, c19, c21, c22, c23, c24, c25
, c26, c27, c28, c29, c31, c32, c33, c34, c35, c36, c37, c38, c39
         Sort Key: t_agg_width.bh
         ->  Seq Scan on public.t_agg_width  (cost=0.00..12889.00 rows=400000 width=149)
               Output: bh, c1, c2, c3, c4, c5, c6, c7, c8, c9, c11, c12, c13, c14, c15, c16, c17, c18, c19, c21, c22, c23, c2
4, c25, c26, c27, c28, c29, c31, c32, c33, c34, c35, c36, c37, c38, c39
(8 rows)
testdb=#

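Next, greatly increase the number of distinct grouping-key values, change the selectivity of c1 and the other columns at the same time, and test again: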

testdb=# insert into t_agg_width 
testdb-# select 'GZ'||col
testdb-# ,mod(col,100),mod(col,100),mod(col,100),mod(col,100),mod(col,100),mod(col,100),mod(col,100),mod(col,100),mod(col,100) 
testdb-# ,mod(col,100),mod(col,100),mod(col,100),mod(col,100),mod(col,100),mod(col,100),mod(col,100),mod(col,100),mod(col,100) 
testdb-# ,mod(col,100),mod(col,100),mod(col,100),mod(col,100),mod(col,100),mod(col,100),mod(col,100),mod(col,100),mod(col,100) 
testdb-# ,mod(col,100),mod(col,100),mod(col,100),mod(col,100),mod(col,100),mod(col,100),mod(col,100),mod(col,100),mod(col,100) 
testdb-# from generate_series(1,1000000) as col;
INSERT 0 1000000
testdb=# set enable_hashagg = on;
SET
testdb=# explain verbose select bh
,min(c1),min(c2) ,min(c3) ,min(c4) ,min(c5) ,min(c6) ,min(c7) ,min(c8) ,min(c9)
,min(c11),min(c12) ,min(c13) ,min(c14) ,min(c15) ,min(c16) ,min(c17) ,min(c18) ,min(c19)
,min(c21),min(c22) ,min(c23) ,min(c24) ,min(c25) ,min(c26) ,min(c27) ,min(c28) ,min(c29)
,min(c31),min(c32) ,min(c33) ,min(c34) ,min(c35) ,min(c36) ,min(c37) ,min(c38) ,min(c39)
from t_agg_width group by bh;
                                                    QUERY PLAN                                                               
----------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=440012.46..586553.52 rows=7414 width=149)
   Output: bh, min(c1), min(c2), min(c3), min(c4), min(c5), min(c6), min(c7), min(c8), min(c9), min(c11), min(c12), min(c13),
 min(c14), min(c15), min(c16), min(c17), min(c18), min(c19), min(c21), min(c22), min(c23), min(c24), min(c25), min(c26), min(
c27), min(c28), min(c29), min(c31), min(c32), min(c33), min(c34), min(c35), min(c36), min(c37), min(c38), min(c39)
   Group Key: t_agg_width.bh
   ->  Sort  (cost=440012.46..443866.86 rows=1541757 width=149)
         Output: bh, c1, c2, c3, c4, c5, c6, c7, c8, c9, c11, c12, c13, c14, c15, c16, c17, c18, c19, c21, c22, c23, c24, c25
, c26, c27, c28, c29, c31, c32, c33, c34, c35, c36, c37, c38, c39
         Sort Key: t_agg_width.bh
         ->  Seq Scan on public.t_agg_width  (cost=0.00..49681.57 rows=1541757 width=149)
               Output: bh, c1, c2, c3, c4, c5, c6, c7, c8, c9, c11, c12, c13, c14, c15, c16, c17, c18, c19, c21, c22, c23, c2
4, c25, c26, c27, c28, c29, c31, c32, c33, c34, c35, c36, c37, c38, c39
(8 rows)
testdb=#

This time the optimizer chose GroupAggregate, even though enable_hashagg is on.

HashAggregate
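With HashAggregate, the database computes a hash of the GROUP BY column values and maintains a hash table in memory keyed on them. The table has one entry per distinct group, and each entry holds a running state for every aggregate in the SELECT list, so a query with n aggregate functions keeps n such states in every entry. This approach uses more memory than GroupAggregate: memory use grows with the number of distinct values of the GROUP BY columns and with the number of aggregated columns.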
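
The idea can be sketched in a few lines of Python. This is a toy model under simplifying assumptions (one grouping key, min and max over one value column, everything in memory), not PostgreSQL's executor code; the function name `hash_aggregate` and the sample rows are made up for illustration.

```python
def hash_aggregate(rows):
    """Aggregate (key, value) rows with one hash table, one entry per group."""
    table = {}  # group key -> [running min, running max]
    for key, value in rows:
        state = table.get(key)
        if state is None:
            # First row of a new group: initialise both aggregate states.
            table[key] = [value, value]
        else:
            # Existing group: advance each aggregate state in place.
            state[0] = min(state[0], value)
            state[1] = max(state[1], value)
    return {key: tuple(state) for key, state in table.items()}


# Input order does not matter; no sort is needed.
rows = [("GZ01", 3), ("GZ02", 7), ("GZ01", 1), ("GZ02", 9), ("GZ01", 5)]
result = hash_aggregate(rows)
print(result)
```

The dictionary holds every group at once, which is why the memory footprint in this model scales with the number of distinct keys times the number of aggregate states per key.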

GroupAggregate
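With GroupAggregate, the database first sorts the rows by the GROUP BY columns, then makes a single pass over the sorted data to compute the aggregates. Because of the up-front sort, its computational cost is higher than HashAggregate's. Its advantage is that the aggregation step is insensitive to how many distinct grouping-key values there are, because only the current group's aggregate states are held at any moment. When there are many grouping-key values, many aggregated columns, and highly selective column data, it can outperform HashAggregate.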
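
A matching Python sketch of the sort-then-scan approach follows. As before, this is a simplified model rather than PostgreSQL's implementation; `group_aggregate`, the `_NO_GROUP` sentinel and the sample rows are hypothetical names chosen for the example.

```python
_NO_GROUP = object()  # sentinel meaning "no group has been started yet"


def group_aggregate(rows):
    """Sort (key, value) rows by key, then emit (key, min, max) per run of equal keys."""
    current_key = _NO_GROUP
    lo = hi = None
    for key, value in sorted(rows, key=lambda row: row[0]):
        if current_key is _NO_GROUP or key != current_key:
            # Key changed: the previous group is complete, so emit it.
            if current_key is not _NO_GROUP:
                yield current_key, lo, hi
            current_key, lo, hi = key, value, value
        else:
            lo, hi = min(lo, value), max(hi, value)
    # Emit the final group, if there was any input at all.
    if current_key is not _NO_GROUP:
        yield current_key, lo, hi


rows = [("GZ01", 3), ("GZ02", 7), ("GZ01", 1), ("GZ02", 9), ("GZ01", 5)]
result = list(group_aggregate(rows))
print(result)
```

Only one group's state (`lo`, `hi`) is alive during the scan, and results come out already ordered by the grouping key; the price is the sort that precedes the scan.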


