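ResourceManager (RM) HA is a feature introduced in Hadoop 2.4. It provides redundancy through an Active/Standby pair of ResourceManagers, with the goal of removing the RM as a single point of failure.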
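1. Overall architecture:
2. Failover: there are two modes, manual and automatic.
Manual: to fail over by hand, use the yarn rmadmin command to first transition the Active RM to Standby, then transition the Standby RM to Active.
Automatic: the RMs use a ZooKeeper-based leader election to decide which one is active. Unlike HDFS, there is no need to deploy a separate ZKFC daemon, because this functionality is embedded in the RM.
Once RM HA is configured, every node and client must list all of the RMs. When connecting, they try the RMs in round-robin order until they find the active one. If the active RM goes down, they keep polling in the same way. This behavior is implemented by the class org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider. To change it, implement your own provider class and set its name in the yarn.client.failover-proxy-provider property.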
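The round-robin lookup described above can be sketched in a few lines of Python. This is a simplified model, not the logic of ConfiguredRMFailoverProxyProvider itself: the function name `find_active_rm`, the `get_state` callback, and the `max_attempts` limit are all assumptions made for illustration.

```python
from typing import Callable, Sequence


def find_active_rm(rm_ids: Sequence[str],
                   get_state: Callable[[str], str],
                   start: int = 0,
                   max_attempts: int = 10) -> str:
    """Try the configured RMs in round-robin order until one reports 'active'."""
    if not rm_ids:
        raise ValueError("rm_ids must not be empty")
    for attempt in range(max_attempts):
        rm_id = rm_ids[(start + attempt) % len(rm_ids)]
        try:
            if get_state(rm_id) == "active":
                return rm_id
        except ConnectionError:
            # An unreachable RM is treated like a standby: move on to the next one.
            pass
    raise RuntimeError(f"no active RM found after {max_attempts} attempts")


# Hypothetical cluster state: rm1 is down and rm2 has taken over as active.
def fake_state(rm_id: str) -> str:
    if rm_id == "rm1":
        raise ConnectionError("rm1 unreachable")
    return "active"


active = find_active_rm(["rm1", "rm2"], fake_state)
print(active)
```

A real client would also wait between attempts; the sketch leaves that out to keep the lookup order visible.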
3. Configuration
Configuration Properties | Description |
---|---|
yarn.resourcemanager.zk-address | Address of the ZK-quorum. Used both for the state-store and embedded leader-election. |
yarn.resourcemanager.ha.enabled | Enable RM HA. |
yarn.resourcemanager.ha.rm-ids | List of logical IDs for the RMs. e.g., “rm1,rm2”. |
yarn.resourcemanager.hostname.rm-id | For each rm-id, specify the hostname the RM corresponds to. Alternately, one could set each of the RM’s service addresses. |
yarn.resourcemanager.address.rm-id | For each rm-id, specify host:port for clients to submit jobs. If set, overrides the hostname set in yarn.resourcemanager.hostname.rm-id. |
yarn.resourcemanager.scheduler.address.rm-id | For each rm-id, specify scheduler host:port for ApplicationMasters to obtain resources. If set, overrides the hostname set in yarn.resourcemanager.hostname.rm-id. |
yarn.resourcemanager.resource-tracker.address.rm-id | For each rm-id, specify host:port for NodeManagers to connect. If set, overrides the hostname set in yarn.resourcemanager.hostname.rm-id. |
yarn.resourcemanager.admin.address.rm-id | For each rm-id, specify host:port for administrative commands. If set, overrides the hostname set in yarn.resourcemanager.hostname.rm-id. |
yarn.resourcemanager.webapp.address.rm-id | For each rm-id, specify the host:port of the RM web application. You do not need this if you set yarn.http.policy to HTTPS_ONLY. If set, overrides the hostname set in yarn.resourcemanager.hostname.rm-id. |
yarn.resourcemanager.webapp.https.address.rm-id | For each rm-id, specify the host:port of the RM HTTPS web application. You do not need this if you set yarn.http.policy to HTTP_ONLY. If set, overrides the hostname set in yarn.resourcemanager.hostname.rm-id. |
yarn.resourcemanager.ha.id | Identifies the RM in the ensemble. This is optional; however, if set, admins have to ensure that all the RMs have their own IDs in the config. |
yarn.resourcemanager.ha.automatic-failover.enabled | Enable automatic failover. By default, it is enabled only when HA is enabled. |
yarn.resourcemanager.ha.automatic-failover.embedded | Use embedded leader-elector to pick the Active RM, when automatic failover is enabled. By default, it is enabled only when HA is enabled. |
yarn.resourcemanager.cluster-id | Identifies the cluster. Used by the elector to ensure an RM doesn’t take over as Active for another cluster. |
yarn.client.failover-proxy-provider | The class to be used by Clients, AMs and NMs to failover to the Active RM. |
yarn.client.failover-max-attempts | The max number of times FailoverProxyProvider should attempt failover. |
yarn.client.failover-sleep-base-ms | The sleep base (in milliseconds) to be used for calculating the exponential delay between failovers. |
yarn.client.failover-sleep-max-ms | The maximum sleep time (in milliseconds) between failovers. |
yarn.client.failover-retries | The number of retries per attempt to connect to a ResourceManager. |
yarn.client.failover-retries-on-socket-timeouts | The number of retries per attempt to connect to a ResourceManager on socket timeouts. |
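To make the relationship between yarn.client.failover-sleep-base-ms and yarn.client.failover-sleep-max-ms concrete, here is a minimal sketch of a capped exponential delay. The formula is an assumption for illustration, and Hadoop's actual retry policy may differ (for example by adding random jitter). The values 500 and 15000 are hypothetical, not defaults taken from the table.

```python
def failover_delay_ms(attempt: int, base_ms: int, max_ms: int) -> int:
    """Capped exponential delay before the given failover attempt (0-based)."""
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    # Double the base delay on every attempt, but never exceed the maximum.
    return min(max_ms, base_ms * (2 ** attempt))


# Hypothetical settings: base 500 ms, maximum 15000 ms.
delays = [failover_delay_ms(n, base_ms=500, max_ms=15000) for n in range(7)]
print(delays)  # [500, 1000, 2000, 4000, 8000, 15000, 15000]
```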
4. Example (minimal configuration)
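<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>cluster1</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>master1</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>master2</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address.rm1</name>
  <value>master1:8088</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address.rm2</name>
  <value>master2:8088</value>
</property>
<property>
  <name>yarn.resourcemanager.zk-address</name>
  <value>zk1:2181,zk2:2181,zk3:2181</value>
</property>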
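A common mistake with this kind of configuration is to list an ID in yarn.resourcemanager.ha.rm-ids without giving it a matching hostname entry. The following sketch checks for that. The helper names `load_properties` and `missing_hostnames` are hypothetical, the embedded XML is a shortened copy of the example, and the check is a simplification: it ignores the alternative of setting each RM service address individually.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the example configuration, wrapped in a root element.
SAMPLE = """<configuration>
  <property><name>yarn.resourcemanager.ha.enabled</name><value>true</value></property>
  <property><name>yarn.resourcemanager.ha.rm-ids</name><value>rm1,rm2</value></property>
  <property><name>yarn.resourcemanager.hostname.rm1</name><value>master1</value></property>
  <property><name>yarn.resourcemanager.hostname.rm2</name><value>master2</value></property>
</configuration>"""


def load_properties(xml_text: str) -> dict:
    """Parse Hadoop-style <property> elements into a name -> value dict."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name").strip(): (p.findtext("value") or "").strip()
            for p in root.iter("property")}


def missing_hostnames(props: dict) -> list:
    """Return the rm-ids that have no yarn.resourcemanager.hostname.<rm-id> entry."""
    raw = props.get("yarn.resourcemanager.ha.rm-ids", "")
    rm_ids = [i.strip() for i in raw.split(",") if i.strip()]
    return [i for i in rm_ids if f"yarn.resourcemanager.hostname.{i}" not in props]


props = load_properties(SAMPLE)
print(missing_hostnames(props))  # []
```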
5. Admin commands
Check the state of each RM:
$ yarn rmadmin -getServiceState rm1
active
$ yarn rmadmin -getServiceState rm2
standby
Manual failover:
$ yarn rmadmin -transitionToStandby rm1
$ yarn rmadmin -transitionToActive rm2
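Note:
When automatic failover is enabled, the system refuses manual changes to the HA state, in order to prevent split-brain or other inconsistent states. If you are fully aware of what you are doing, you can pass the -forcemanual option to the transition command.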
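The manual failover sequence shown earlier can be wrapped in a small script so that the order of the steps is always the same. This is a minimal sketch, assuming the yarn command is on the PATH. The function names `run_yarn` and `manual_failover` are hypothetical, and the `run` parameter exists only so the sequence can be exercised without a cluster.

```python
import subprocess
from typing import Callable, List


def run_yarn(args: List[str]) -> str:
    """Run `yarn <args>` and return its trimmed standard output."""
    result = subprocess.run(["yarn", *args], capture_output=True,
                            text=True, check=True)
    return result.stdout.strip()


def manual_failover(active: str, standby: str,
                    run: Callable[[List[str]], str] = run_yarn) -> str:
    """Demote the active RM, promote the standby, and report the promoted RM's state."""
    # Order matters: demote first so that two RMs are never active at once.
    run(["rmadmin", "-transitionToStandby", active])
    run(["rmadmin", "-transitionToActive", standby])
    return run(["rmadmin", "-getServiceState", standby])
```

On a cluster matching the example configuration, `manual_failover("rm1", "rm2")` would be expected to return `active`. With automatic failover enabled, the transition commands are refused as described in the note, and `check=True` turns that refusal into a `subprocess.CalledProcessError`.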