postgres-xl是個(gè)好東西?為什么呢?
創(chuàng)新互聯(lián)建站致力于成都網(wǎng)站制作、成都網(wǎng)站建設(shè)、外貿(mào)營(yíng)銷(xiāo)網(wǎng)站建設(shè),成都網(wǎng)站設(shè)計(jì),集團(tuán)網(wǎng)站建設(shè)等服務(wù)標(biāo)準(zhǔn)化,推過(guò)標(biāo)準(zhǔn)化降低中小企業(yè)的建站的成本,并持續(xù)提升建站的定制化服務(wù)水平進(jìn)行質(zhì)量交付,讓企業(yè)網(wǎng)站從市場(chǎng)競(jìng)爭(zhēng)中脫穎而出。 選擇創(chuàng)新互聯(lián)建站,就選擇了安全、穩(wěn)定、美觀的網(wǎng)站建設(shè)服務(wù)!
它可以搞oltp, 抗衡MySQL, mysql沒(méi)有分析函數(shù)功能
它可以搞oltp, 抗衡oracle, oracle生態(tài)弱, 沒(méi)法實(shí)時(shí)
它對(duì)json支持好, 抗衡MongoDB, mongodb用得人越來(lái)越少了。。。
它還恐怖地支持各種語(yǔ)言擴(kuò)展, java, javascript, r, python, haskell。。。
并且開(kāi)源免費(fèi),簡(jiǎn)單強(qiáng)大的沒(méi)朋友。。。
它比greenpulm版本新: 它跟greenpulm本是一家人,都是MPP架構(gòu)的postgresql集群
但是原生改造, 版本基本上與postgresql一致, greenpulm的版本升不動(dòng)啊。。。
它比oracle RAC/Teradata便宜,免費(fèi)使用
它比hadoop省資源,沒(méi)有GC,基于C語(yǔ)言資源利用率高,并且生態(tài)圈豐富,可視化方便
它出道早,版本穩(wěn)定性強(qiáng)。
postgres-xl分為以下組件:
a. gtm 負(fù)責(zé)全局事務(wù)
b. coordinator 處理分發(fā)執(zhí)行
c. datanode 負(fù)責(zé)底層處理
datanode跟coordinator都 連接到gtm,
客戶端連接到coordinator運(yùn)行sql,
coordinator使用gtm進(jìn)行一些事務(wù)功能分發(fā)給datanode執(zhí)行
大數(shù)據(jù)的發(fā)展方向
什么是MPP?
MPP (Massively Parallel Processing),即大規(guī)模并行處理,是分布式、并行、結(jié)構(gòu)化數(shù)據(jù)庫(kù)集群,具備高性能、高可用、高擴(kuò)展特性,可以為超大規(guī)模數(shù)據(jù)管理提供高性?xún)r(jià)比的通用計(jì)算平臺(tái),廣泛用于支撐各類(lèi)數(shù)據(jù)倉(cāng)庫(kù)系統(tǒng)、BI 系統(tǒng)。
MPP架構(gòu)特征:
?任務(wù)并行執(zhí)行?
?數(shù)據(jù)分布式存儲(chǔ)(本地化)?
?分布式計(jì)算?
?橫向擴(kuò)展
思考對(duì)比
Oracle集群的特點(diǎn):每個(gè)節(jié)點(diǎn)全量存儲(chǔ)
Mysql熱備的特點(diǎn):主從,每個(gè)節(jié)點(diǎn)全量存儲(chǔ)
MPP——分布式關(guān)系數(shù)據(jù)庫(kù)
GreenPlum:基于Postgres XL 8.2進(jìn)行封裝,多年沒(méi)升級(jí)過(guò)。
Mysql Cluster:剛出來(lái)兩年,穩(wěn)定性和性能比較差。
Postgres XL 與 Postgres LL ,是Postgresql數(shù)據(jù)庫(kù)兩種集群模式,Postgres XL是當(dāng)今主流MPP
Postgres XL集群架構(gòu)
GTM:Global Transaction Manager
Coordinator:協(xié)調(diào)器
Datanode:數(shù)據(jù)節(jié)點(diǎn)
GTM-Proxy:GTM代理器
組件介紹
Global Transaction Monitor (GTM)
全局事務(wù)管理器,確保群集范圍內(nèi)的事務(wù)一致性。 GTM負(fù)責(zé)發(fā)放事務(wù)ID和快照作為其多版本并發(fā)控制的一部分。
集群可選地配置一個(gè)備用GTM(GTM Standby),以改進(jìn)可用性。此外,可以在協(xié)調(diào)器間配置代理GTM, 可用于改善可擴(kuò)展性,減少GTM的通信量。
GTM Standby
GTM的備節(jié)點(diǎn),在pgxc,pgxl中,GTM控制所有的全局事務(wù)分配,如果出現(xiàn)問(wèn)題,就會(huì)導(dǎo)致整個(gè)集群不可用,為了增加可用性,增加該備用節(jié)點(diǎn)。當(dāng)GTM出現(xiàn)問(wèn)題時(shí),GTM Standby可以升級(jí)為GTM,保證集群正常工作。
GTM-Proxy
GTM需要與所有的Coordinators通信,為了降低壓力,可以在每個(gè)Coordinator機(jī)器上部署一個(gè)GTM-Proxy。
Coordinator
協(xié)調(diào)員管理用戶會(huì)話,并與GTM和數(shù)據(jù)節(jié)點(diǎn)進(jìn)行交互。協(xié)調(diào)員解析,產(chǎn)生查詢(xún)計(jì)劃,并給語(yǔ)句中的每一個(gè)組件發(fā)送下一個(gè)序列化的全局性計(jì)劃。
通常此服務(wù)和數(shù)據(jù)節(jié)點(diǎn)部署在一起。
正式安裝
Note: 其實(shí)在生產(chǎn)環(huán)境,如果你集群的數(shù)量少于20臺(tái)的話,甚至可以不需要使用gtm-proxy
#1)System?Initialization?Optimization?on?every?nodes cat?>>?/etc/security/limits.conf?</etc/hosts?<>/etc/sudoers ###################################################################### #3)Configure?ssh?authentication?to?avoid?inputing?password?for?pgxc_ctl(run?this?commad?on?every?nodes) ###################################################################### su?-?postgres ssh-keygen?-t?rsa??? cat?~/.ssh/id_rsa.pub?>>?~/.ssh/authorized_keys chmod?600?authorized_keys cat?~/.ssh/id_rsa.pub?|?ssh?neo4j01?'cat?>>?~/.ssh/authorized_keys' ############################################ #4)Install?dependency?packages?on?every?nodes# ############################################ sudo?yum?install?-y?flex?bison?readline-devel?zlib-devel?openjade?docbook-style-dsssl?gcc?bzip2?e2fsprogs-devel?uuid-devel?libuuid-devel?make?wget? wget?-c?http://download.cashalo.com/schema/postgres-xl-9.5r1.6.tar.bz2?&&?tar?jxf?postgres-xl-9.5r1.6.tar.bz2 cd?postgres-xl-9.5r1.6 ./configure?--prefix=/home/postgres/pgxl9.5?--with-uuid=ossp?--with-uuid=ossp?&&?make?&&?make?install?&&?cd?contrib/?&&?make?&&?make?install ? #5)Configuring?environment?variables?on?every?nodes? cat?>>/home/postgres/.bashrc?< ?Need?to?synchronize?when?original?changed. #----?GTM?------------------------------------------------------------------------------------ #?GTM?is?mandatory.??You?must?have?at?least?(and?only)?one?GTM?master?in?your?Postgres-XC?cluster. #?If?GTM?crashes?and?you?need?to?reconfigure?it,?you?can?do?it?by?pgxc_update_gtm?command?to?update #?GTM?master?with?others.???Of?course,?we?provide?pgxc_remove_gtm?command?to?remove?it.??This?command #?will?not?stop?the?current?GTM.??It?is?up?to?the?operator. #----?GTM?Master?----------------------------------------------- #----?Overall?---- gtmName=gtm1 gtmMasterServer=neo4j01 gtmMasterPort=6666 gtmMasterDir=$pgxlDATA/gtm1 #----?Configuration?--- gtmExtraConfig=none?????????????????????#?Will?be?added?gtm.conf?for?both?Master?and?Slave?(done?at?initilization?only) gtmMasterSpecificExtraConfig=none???????#?Will?be?added?to?Master's?gtm.conf?(done?at?initialization?only) #----?GTM?Slave?----------------------------------------------- #?Because?GTM?is?a?key?component?to?maintain?database?consistency,?you?may?want?to?configure?GTM?slave #?for?backup. #----?Overall?------ gtmSlave=y??????????????????????????????????????#?Specify?y?if?you?configure?GTM?Slave.???Otherwise,?GTM?slave?will?not?be?configured?and ????????????????????????????????????????????????????????#?all?the?following?variables?will?be?reset. gtmSlaveName=gtm2 gtmSlaveServer=neo4j02??????????#?value?none?means?GTM?slave?is?not?available.??Give?none?if?you?don't?configure?GTM?Slave. gtmSlavePort=6666???????????????????????#?Not?used?if?you?don't?configure?GTM?slave. gtmSlaveDir=$pgxlDATA/gtm2??????#?Not?used?if?you?don't?configure?GTM?slave. #?Please?note?that?when?you?have?GTM?failover,?then?there?will?be?no?slave?available?until?you?configure?the?slave #?again.?(pgxc_add_gtm_slave?function?will?handle?it) #----?Configuration?---- gtmSlaveSpecificExtraConfig=none?#?Will?be?added?to?Slave's?gtm.conf?(done?at?initialization?only) #----?GTM?Proxy?------------------------------------------------------------------------------------------------------- #?GTM?proxy?will?be?selected?based?upon?which?server?each?component?runs?on. #?When?fails?over?to?the?slave,?the?slave?inherits?its?master's?gtm?proxy.??It?should?be #?reconfigured?based?upon?the?new?location. # #?To?do?so,?slave?should?be?restarted.???So?pg_ctl?promote?->?(edit?postgresql.conf?and?recovery.conf)?->?pg_ctl?restart # #?You?don't?have?to?configure?GTM?Proxy?if?you?dont'?configure?GTM?slave?or?you?are?happy?if?every?component?connects #?to?GTM?Master?directly.??If?you?configure?GTL?slave,?you?must?configure?GTM?proxy?too. #----?Shortcuts?------ gtmProxyDir=$pgxlDATA/gtm_proxy #----?Overall?------- gtmProxy=n??????????????????????????????#?Specify?y?if?you?conifugre?at?least?one?GTM?proxy.???You?may?not?configure?gtm?proxies ????????????????????????????????????????????????#?only?when?you?dont'?configure?GTM?slaves. ????????????????????????????????????????????????#?If?you?specify?this?value?not?to?y,?the?following?parameters?will?be?set?to?default?empty?values. ????????????????????????????????????????????????#?If?we?find?there're?no?valid?Proxy?server?names?(means,?every?servers?are?specified ????????????????????????????????????????????????#?as?none),?then?gtmProxy?value?will?be?set?to?"n"?and?all?the?entries?will?be?set?to ????????????????????????????????????????????????#?empty?values. #gtmProxyNames=(gtm_pxy1?gtm_pxy2?gtm_pxy3?gtm_pxy4)????#?No?used?if?it?is?not?configured #gtmProxyServers=(neo4j01?neo4j02?neo4j03?neo4j04)??????????????????????#?Specify?none?if?you?dont'?configure?it. #gtmProxyPorts=(6660?6666?6666?6666)????????????????????????????#?Not?used?if?it?is?not?configured. #gtmProxyDirs=($gtmProxyDir?$gtmProxyDir?$gtmProxyDir?$gtmProxyDir)?????#?Not?used?if?it?is?not?configured. #----?Configuration?---- gtmPxyExtraConfig=none??????????#?Extra?configuration?parameter?for?gtm_proxy.??Coordinator?section?has?an?example. gtmPxySpecificExtraConfig=(none?none?none?none) #----?Coordinators?---------------------------------------------------------------------------------------------------- #----?shortcuts?---------- coordMasterDir=$pgxlDATA/coord coordSlaveDir=$pgxlDATA/coord_slave coordArchLogDir=$pgxlDATA/coord_archlog #----?Overall?------------ coordNames=(coord1?coord2?coord3?coord4)????????????????#?Master?and?slave?use?the?same?name coordPorts=(5432?5432?5432?5432)????????????????????????#?Master?ports poolerPorts=(6667?6667?6667?6667)???????????????????????#?Master?pooler?ports #coordPgHbaEntries=(192.168.29.0/24)????????????????????????????#?Assumes?that?all?the?coordinator?(master/slave)?accepts coordPgHbaEntries=(0.0.0.0/0) ????????????????????????????????????????????????????????????????????????????????????????????????#?the?same?connection ????????????????????????????????????????????????????????????????????????????????????????????????#?This?entry?allows?only?$pgxcOwner?to?connect. ????????????????????????????????????????????????????????????????????????????????????????????????#?If?you'd?like?to?setup?another?connection,?you?should ????????????????????????????????????????????????????????????????????????????????????????????????#?supply?these?entries?through?files?specified?below. #?Note:?The?above?parameter?is?extracted?as?"host?all?all?0.0.0.0/0?trust".???If?you?don't?want #?such?setups,?specify?the?value?()?to?this?variable?and?suplly?what?you?want?using?coordExtraPgHba #?and/or?coordSpecificExtraPgHba?variables. #coordPgHbaEntries=(::1/128)????#?Same?as?above?but?for?IPv6?addresses #----?Master?------------- coordMasterServers=(neo4j01?neo4j02?neo4j03?neo4j04)????????????#?none?means?this?master?is?not?available coordMasterDirs=($coordMasterDir?$coordMasterDir?$coordMasterDir?$coordMasterDir) coordMaxWALsernder=0????#?max_wal_senders:?needed?to?configure?slave.?If?zero?value?is?specified, ????????????????????????????????????????????????#?it?is?expected?to?supply?this?parameter?explicitly?by?external?files ????????????????????????????????????????????????#?specified?in?the?following.???If?you?don't?configure?slaves,?leave?this?value?to?zero. coordMaxWALSenders=($coordMaxWALsernder?$coordMaxWALsernder?$coordMaxWALsernder?$coordMaxWALsernder) ????????????????????????????????????????????????#?max_wal_senders?configuration?for?each?coordinator. #----?Slave?------------- coordSlave=n????????????????????#?Specify?y?if?you?configure?at?least?one?coordiantor?slave.??Otherwise,?the?following ????????????????????????????????????????????????#?configuration?parameters?will?be?set?to?empty?values. ????????????????????????????????????????????????#?If?no?effective?server?names?are?found?(that?is,?every?servers?are?specified?as?none), ????????????????????????????????????????????????#?then?coordSlave?value?will?be?set?to?n?and?all?the?following?values?will?be?set?to ????????????????????????????????????????????????#?empty?values. #coordSlaveSync=y???????????????#?Specify?to?connect?with?synchronized?mode. #coordSlaveServers=(node07?node08?node09?node06)????????????????????????#?none?means?this?slave?is?not?available #coordSlavePorts=(20004?20005?20004?20005)??????????????????????#?Master?ports #coordSlavePoolerPorts=(20010?20011?20010?20011)????????????????????????#?Master?pooler?ports #coordSlaveDirs=($coordSlaveDir?$coordSlaveDir?$coordSlaveDir?$coordSlaveDir) #coordArchLogDirs=($coordArchLogDir?$coordArchLogDir?$coordArchLogDir?$coordArchLogDir) #----?Configuration?files--- #?Need?these?when?you'd?like?setup?specific?non-default?configuration #?These?files?will?go?to?corresponding?files?for?the?master. #?You?may?supply?your?bash?script?to?setup?extra?config?lines?and?extra?pg_hba.conf?entries #?Or?you?may?supply?these?files?manually. coordExtraConfig=coordExtraConfig???????#?Extra?configuration?file?for?coordinators. ????????????????????????????????????????????????#?This?file?will?be?added?to?all?the?coordinators' ????????????????????????????????????????????????#?postgresql.conf #?Pleae?note?that?the?following?sets?up?minimum?parameters?which?you?may?want?to?change. #?You?can?put?your?postgresql.conf?lines?here. cat?>?$coordExtraConfig?< postgres-XL 下存在兩種數(shù)據(jù)表,分別是replication表和distribute表
REPLICATION復(fù)制表:各個(gè)datanode節(jié)點(diǎn)中,表的數(shù)據(jù)完全相同,也就是說(shuō),插入數(shù)據(jù)時(shí),會(huì)分別在每個(gè)datanode節(jié)點(diǎn)插入相同數(shù)據(jù)。讀數(shù)據(jù)時(shí),只需要讀任意一個(gè)datanode節(jié)點(diǎn)上的數(shù)據(jù)。小表采用。
建表語(yǔ)法:
postgres=#?create?table?rep(col1?int,col2?int)distribute?by?replication;DISTRIBUTE表 :會(huì)將插入的數(shù)據(jù),按照拆分規(guī)則,分配到不同的datanode節(jié)點(diǎn)中存儲(chǔ),也就是sharding技術(shù)。每個(gè)datanode節(jié)點(diǎn)只保存了部分?jǐn)?shù)據(jù),通過(guò)coordinate節(jié)點(diǎn)可以查詢(xún)完整的數(shù)據(jù)視圖。分布式存儲(chǔ),大表采用,默認(rèn)
postgres=#??CREATE?TABLE?dist(col1?int,?col2?int)?DISTRIBUTE?BY?HASH(col1);如何驗(yàn)證分布式存儲(chǔ)?
分別插入100行數(shù)據(jù):
postgres=#?INSERT?INTO?rep?SELECT?generate_series(1,100),?generate_series(101,?200); postgres=#?INSERT?INTO?dist?SELECT?generate_series(1,100),?generate_series(101,?200);psql -p 5432,通過(guò)Coordinater 訪問(wèn)查詢(xún)完整的數(shù)據(jù)視圖;
psql -p 5433,5433是Datanode的端口,此時(shí)只訪問(wèn)該單個(gè)節(jié)點(diǎn)
如何鏈接到指定的數(shù)據(jù)庫(kù)呢?看下面的例子
psql -p 5432 aa? ,aa為指定的庫(kù)名,不指定時(shí)默認(rèn)是postgres庫(kù),相當(dāng)于hive里的default庫(kù)
查詢(xún)這個(gè)分布表數(shù)據(jù)在每個(gè)節(jié)點(diǎn)的分布:
postgres=#?SELECT?xc_node_id,?count(*)?FROM?dist?GROUP?BY?xc_node_id; ?xc_node_id?|?count ------------+------- ?-700122826?|????19 ??352366662?|????27 ?-560021589?|????23 ??823103418?|????31 (4?rows)查詢(xún)每個(gè)節(jié)點(diǎn)的ID信息
postgres=#?select?*?from?pgxc_node;我們?cè)賮?lái)看看復(fù)制表
postgres=#?select?xc_node_id,count(*)?from?rep?group?by?xc_node_id; ?xc_node_id?|?count ------------+------- ?-560021589?|???100 (1?row)因?yàn)槲覀兪窃趎eo4j01節(jié)點(diǎn)上查詢(xún)的,所以顯示的id就是neo4j01節(jié)點(diǎn)的id;同理,如果我們是在其它節(jié)點(diǎn)查詢(xún)的話那就顯示其它節(jié)點(diǎn)的id。也就是說(shuō)當(dāng)我們查詢(xún)復(fù)制表的時(shí)候,它只會(huì)走一個(gè)節(jié)點(diǎn),不會(huì)走多個(gè)節(jié)點(diǎn)。針對(duì)這個(gè)特點(diǎn),如果未來(lái)數(shù)據(jù)量很大,我們查詢(xún)的時(shí)候,可以走負(fù)載均衡。
https://www.cnblogs.com/sfnz/p/7908380.html
Psql是PostgreSQL的一個(gè)命令行交互式客戶端工具。PostgreSQL 一些命令、用法、語(yǔ)法,在Postgres xl集群都是通用的。
分享名稱(chēng):postgres-XL集群安裝
URL地址:http://weahome.cn/article/peopph.html