前期的gitlab 已經(jīng)開始推廣測試,最近對postgresql 做了主備,這里說下方案及在實施過程中遇到的坑。
postgresql 的具安裝不在此介紹。
創(chuàng)新互聯(lián)專注于醴陵企業(yè)網(wǎng)站建設(shè),成都響應(yīng)式網(wǎng)站建設(shè)公司,購物商城網(wǎng)站建設(shè)。醴陵網(wǎng)站建設(shè)公司,為醴陵等地區(qū)提供建站服務(wù)。全流程按需求定制網(wǎng)站,專業(yè)設(shè)計,全程項目跟蹤,創(chuàng)新互聯(lián)專業(yè)和態(tài)度為您提供的服務(wù)
primary_ip: 192.168.10.2,
standby_ip: 192.168.10.3,
PGDATA: /opt/gitlab/postgresql/data,
postgresql_version:(PostgreSQL) 9.6.8,
PGCONF_DIR: $PGDATA,
涉及修改的配置文件有:
1. 主備postgresql 版本需保持一致!
2. postgresql.conf 配置文件需保持一致!
3. 備庫提權(quán)為主庫后,切記不要直接啟動原主庫!
1.為備庫準(zhǔn)備主庫,修改配置文件
cat postgresql.conf
wal_level = hot_standby # minimal, replica, or logical
max_wal_senders = 2 # max number of walsender processes
hot_standby = on # "on" allows queries during recovery
max_connections = 300 # (change requires restart)
archive_mode = on
restore_command = ''
cat pg_hba.conf
host all all 127.0.0.1/32 trust
host all all ::1/128 trust
host replication gitlab_replicator 192.168.10.3/32 trust
cat recovery.done
restore_command = ''
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=192.168.10.3 port=5432 user=gitlab_replicator'
2.創(chuàng)建用于復(fù)制的帳號,并賦予replication 權(quán)限
postgres=#CREATE USER gitlab_replicator REPLICATION LOGIN;
3.基本備份為備庫準(zhǔn)備引導(dǎo)數(shù)據(jù)
postgres=#SELECT pg_start_backup(back_20180929);
cd /opt/gitlab/postgresql && tar zcf base_data.tar.gz data
postgres=#SELECT pg_start_stop();
1.解壓基本數(shù)據(jù)
將主庫上創(chuàng)建的base_data.tar.gz
上傳到備庫主機(jī),并解壓到數(shù)據(jù)目錄
tar zxf base_data.tar.gz -C /opt/gitlab/postgresql/
2.修改配置文件
注:postgresql.conf 文件內(nèi)此部分一定要與主庫的配置保持一致,否則可能會在主從切換恢復(fù)時產(chǎn)生錯誤
cat postgresql.conf
wal_level = hot_standby # minimal, replica, or logical
max_wal_senders = 2 # max number of walsender processes
hot_standby = on # "on" allows queries during recovery
max_connections = 300 # (change requires restart)
archive_mode = on
restore_command = ''
cat pg_hba.conf
host all all 127.0.0.1/32 trust
host all all ::1/128 trust
host replication gitlab_replicator 192.168.10.2/32 trust
cat recovery.conf
restore_command = ''
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=192.168.10.2 port=5432 user=gitlab_replicator'
3.啟動備庫,在主庫執(zhí)行sql,并在備庫驗證
主備庫的判斷是根據(jù)當(dāng)前是否存在recovery.conf文件
在將備庫提升為主庫時,會自動重命名recovery.conf文件為recovery.done。同時要將主庫降為備庫,降備方式為重命名recovery.done文件mv recover.done recovery.conf
這樣在處理完主庫故障后,才會將提升到主庫的更新數(shù)據(jù)同步過來
這里提供個簡單的思路及腳本,前提是假設(shè)主備之間不存在網(wǎng)絡(luò)故障,且不存在同時為主或備的情況
判斷主庫的狀態(tài)
1.為shut down
時
判斷備庫是否為in archive recovery
并執(zhí)行將主庫降為備庫,將備庫升為主庫,其余狀態(tài)發(fā)送報警
2.為in production
時
判斷備庫是否為in archive recovery
,其余狀態(tài)發(fā)送報警
3.為in archive recovery
時
判斷備庫是否為in production
,其余狀態(tài)發(fā)送報警
4.為shut down in recovery
時
發(fā)送報警
#!/bin/bash
PRIMARY_IP="192.168.10.2"
STANDBY_IP="192.168.10.3"
PGDATA="/DATA/postgresql/data"
SYS_USER="root"
PG_USER="postgresql"
PGPREFIX="/opt/pgsql"
pg_status()
{
ssh ${SYS_USER}@$1 /
"su - ${PG_USER} -c '${PGPREFIX}/bin/pg_controldata -D ${PGDATA} /
| grep cluster' | awk -F : '{print \$2}' | sed 's/^[ \t]*\|[ \t]*$//'"
}
# recover to primary
recovery_primary()
{
ssh ${SYS_USER}@$1 /
"su - ${PG_USER} -c '${PGPREFIX}/bin/pg_ctl promote -D ${PGDATA}'"
}
# primary to recovery
primary_recovery()
{
ssh ${SYS_USER}@$1 /
"su - ${PG_USER} -c 'cd ${PGDATA} && mv recovery.done recovery.conf'"
}
send_mail()
{
echo "send SNS"
}
case "`pg_status ${PRIMARY_IP}`" in
"shut down")
case "`pg_status ${STANDBY_IP}`" in
"in archive recovery")
primary_recovery ${PRIMARY_IP}
recovery_primary ${STANDBY_IP}
;;
"shut down in recovery"|"in production")
send_mail
;;
esac
;;
"in production")
case "`pg_status ${STANDBY_IP}`" in
"shut down in recovery"|"shut down"|"in production")
send_mail
;;
esac
echo "primary"
;;
"in archive recovery")
case "`pg_status ${STANDBY_IP}`" in
"shut down")
primary_recovery ${STANDBY_IP}
recovery_primary ${PRIMARY_IP}
;;
"shut down in recovery"|"in archive recovery")
send_mail
;;
esac
echo "recovery"
;;
"shut down in recovery")
case "`pg_status ${STANDBY_IP}`" in
"shut down in recovery"|"shut down"|"in archive recovery")
send_mail
;;
esac
echo "recovery down"
;;
esac
FATAL: no pg_hba.conf entry for replication connection from host "192.168.1.2", user "standby", SSL off
需要將用戶加入到192.168.1.2
的pg_hba.conf
文件內(nèi),并配置好認(rèn)證方式及口令
FATAL: database system identifier differs between the primary and standby
DETAIL: The primary's identifier is 6589099331306617531, the standby's identifier is 6605061381709180314
這是因為在將備庫提升為主庫后,將原先的主庫恢復(fù)為主庫時沒有完全將缺少的數(shù)據(jù)同步過來導(dǎo)致的
FATAL: number of requested standby connections exceeds max_wal_senders (currently 0)
FATAL: hot standby is not possible because max_connections = 100 is a lower setting than on the master server (its value was 200)
FATAL: hot standby is not possible because max_locks_per_transaction = 64 is a lower setting than on the master server (its value was 128)
這是因為備庫的數(shù)量超過主庫配置的允許備庫最大連接數(shù)量了
這里配置的為0
此問提出現(xiàn)在將備庫升為主庫后,將原主庫降為備庫同步數(shù)據(jù)時,因此需要注意這部分的配置主備要一致
postgresql 主主同步需要使用三方中間件實現(xiàn),有需要的可查詢相關(guān)資料
本文參考資料為postgresql 官方文檔