MongoDB cluster shard in an abnormal state: RECOVERING
2018-11-28T06:46:55.783+0000 I REPL [replication-0] We are too stale to use 172.19.9.12:27003 as a sync source. Blacklisting this sync source because our last fetched timestamp: Timestamp(1542344943, 1) is before their earliest timestamp: Timestamp(1543387334, 5197) for 1min until: 2018-11-28T06:47:55.783+0000
2018-11-28T06:46:55.783+0000 I REPL [replication-0] sync source candidate: 172.19.9.11:27003
2018-11-28T06:46:55.783+0000 I REPL [replication-0] We are too stale to use 172.19.9.11:27003 as a sync source. Blacklisting this sync source because our last fetched timestamp: Timestamp(1542344943, 1) is before their earliest timestamp: Timestamp(1543387334, 5953) for 1min until: 2018-11-28T06:47:55.783+0000
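The log shows that this node's data is too stale. Because of a network or node failure it went too long without replicating, and in the meantime the oplog entries it still needed were overwritten on the other members (the oplog is a capped collection). The node is therefore considered stale and cannot resume syncing from any other member.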
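The comparison reported in the log message can be reproduced by hand. The following is only a minimal sketch of that comparison, not MongoDB's internal code: the `{ t, i }` timestamp shape and the helper names `compareTimestamps` and `isTooStale` are assumptions made for illustration. It runs under Node.js or mongosh.

```javascript
// Hypothetical model of a timestamp:
// { t: seconds since the epoch, i: ordinal within that second }.
function compareTimestamps(a, b) {
  if (a.t !== b.t) return a.t < b.t ? -1 : 1;
  if (a.i !== b.i) return a.i < b.i ? -1 : 1;
  return 0;
}

// True when our last fetched entry is older than the oldest entry
// the candidate sync source still holds in its oplog.
function isTooStale(lastFetched, sourceEarliest) {
  return compareTimestamps(lastFetched, sourceEarliest) < 0;
}

// Values copied from the log lines above (candidate 172.19.9.12:27003).
const lastFetched = { t: 1542344943, i: 1 };
const sourceEarliest = { t: 1543387334, i: 5197 };
const gapSeconds = sourceEarliest.t - lastFetched.t;

console.log(isTooStale(lastFetched, sourceEarliest)); // true
console.log(gapSeconds); // 1042391
```

A gap of 1042391 seconds is roughly twelve days, which is why neither candidate can serve as a sync source.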
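Option 1: Stop the database, delete the local data of the abnormal node (shard), then start mongo again. After startup the node goes through a full resync; how long it takes depends on data volume, network, disk performance, and similar factors.
Option 2: Stop the database, copy the data files directly from the primary, then start mongo. This avoids the sync phase. The drawback is that the data keeps changing, so a copy taken while the primary is still serving writes will inevitably miss some of it.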
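Our MongoDB cluster is run with Docker, so we used option 1.
Identify the abnormal shard node ==> find the mapped data directory ==> delete the data directory of the abnormal shard instance ==> the Docker service restarts the instance automatically ==> the cluster starts recovering the data.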
/mongo localhost:27017/admin
:PRIMARY> rs.status();
{
"_id" : 2,
"name" : "172.19.9.13:27003", <== node info
"health" : 1,
"state" : 5,
"stateStr" : "RECOVERING", <== abnormal state
"uptime" : 64,
"optime" : {
"ts" : Timestamp(0, 0),
"t" : NumberLong(-1)
},
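Scanning the `rs.status()` output by eye is fine for three members, but the same check can be scripted. This is a sketch under assumptions: `findAbnormalMembers` is a hypothetical helper, and the sample states for 172.19.9.11 and 172.19.9.12 are made up for the dry run (the logs only show them as sync source candidates).

```javascript
// States treated as normal for this check; everything else is reported.
const NORMAL_STATES = ["PRIMARY", "SECONDARY", "ARBITER"];

// Hypothetical helper: takes an object shaped like the result of rs.status()
// and returns the members that are not in a normal state.
function findAbnormalMembers(status) {
  return status.members
    .filter(function (m) { return NORMAL_STATES.indexOf(m.stateStr) === -1; })
    .map(function (m) { return { name: m.name, stateStr: m.stateStr }; });
}

// Dry run with sample data; the first two states are assumptions.
const sampleStatus = {
  members: [
    { name: "172.19.9.11:27003", stateStr: "PRIMARY" },   // assumed
    { name: "172.19.9.12:27003", stateStr: "SECONDARY" }, // assumed
    { name: "172.19.9.13:27003", stateStr: "RECOVERING" }
  ]
};

console.log(JSON.stringify(findAbnormalMembers(sampleStatus)));
// [{"name":"172.19.9.13:27003","stateStr":"RECOVERING"}]
```

Inside the shell, the live call would be `findAbnormalMembers(rs.status())`.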
Data directory: /data/shard3
ssh $HOSTNAME
cd /data/
rm -rf shard3
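Once the data directory is deleted the container fails, and the cluster automatically brings up a new Docker instance to run the shard 3 instance.
STARTUP2 means the member is initializing and syncing data; you will see the number of files in the data directory keep growing.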
/mongo localhost:27017/admin
:PRIMARY> rs.status();
{
"_id" : 2,
"name" : "172.19.9.13:27003",
"health" : 1,
"state" : 5,
"stateStr" : "STARTUP2", <== initializing and syncing data
"uptime" : 64,
"optime" : {
"ts" : Timestamp(0, 0),
"t" : NumberLong(-1)
},
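The numeric `state` field and `stateStr` describe the same thing. As a reading aid, the lookup below lists the replica set member state codes documented by MongoDB; the helper name `stateName` is an assumption for illustration.

```javascript
// Replica set member state codes as documented by MongoDB.
const MEMBER_STATES = {
  0: "STARTUP",
  1: "PRIMARY",
  2: "SECONDARY",
  3: "RECOVERING",
  5: "STARTUP2",
  6: "UNKNOWN",
  7: "ARBITER",
  8: "DOWN",
  9: "ROLLBACK",
  10: "REMOVED"
};

// Hypothetical helper: translate a numeric state into its name.
function stateName(code) {
  return Object.prototype.hasOwnProperty.call(MEMBER_STATES, code)
    ? MEMBER_STATES[code]
    : "UNRECOGNIZED(" + code + ")";
}

console.log(stateName(5)); // STARTUP2
console.log(stateName(2)); // SECONDARY
```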
/mongo localhost:27017/admin
:PRIMARY> rs.status();
"_id" : 2,
"name" : "172.19.9.13:27003",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY", <== after a while the state returns to normal
"uptime" : 945196,
"optime" : {
"ts" : Timestamp(1543401694, 1),
"t" : NumberLong(1)
},
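Rather than rerunning `rs.status()` by hand until the member reports SECONDARY, the wait can be scripted. The sketch below is one possible approach, not a required procedure: `memberState` and `waitForState` are hypothetical helpers, and the status getter and pause function are passed in so the loop can be dry-run outside the shell with canned data.

```javascript
// Return the stateStr of the named member, or null if it is not listed.
function memberState(status, name) {
  for (let k = 0; k < status.members.length; k++) {
    if (status.members[k].name === name) return status.members[k].stateStr;
  }
  return null;
}

// Poll getStatus() up to maxPolls times, calling pause() between polls.
// Returns true as soon as the member reaches the target state.
function waitForState(getStatus, name, target, maxPolls, pause) {
  for (let attempt = 0; attempt < maxPolls; attempt++) {
    if (memberState(getStatus(), name) === target) return true;
    pause();
  }
  return false;
}

// Dry run with canned responses instead of a live replica set.
const canned = ["STARTUP2", "STARTUP2", "SECONDARY"];
let calls = 0;
function fakeStatus() {
  const stateStr = canned[Math.min(calls, canned.length - 1)];
  calls++;
  return { members: [{ name: "172.19.9.13:27003", stateStr: stateStr }] };
}

const recovered = waitForState(
  fakeStatus, "172.19.9.13:27003", "SECONDARY", 10, function () {}
);
console.log(recovered, calls); // true 3
```

In the shell, the live equivalent would pass `function () { return rs.status(); }` as the getter and `function () { sleep(5000); }` as the pause; the five-second interval and the poll limit are arbitrary choices to adjust to the expected sync time.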
The resync is resource-intensive, so it is best done during the period of lowest system traffic. This keeps the node from falling far behind a large volume of updates and lowers the risk of the cluster's data recovery.