PostgreSQL
repmgr - both nodes act as master after a failover
I have a cluster set up with repmgr. The database topology looks like this:
db1 - 10.10.10.50 (master)
db2 - 10.10.10.60 (standby)
wit - 10.10.10.70 (witness)
Creating the cluster (replication and automatic failover) works as expected, but the problem is the following.
Say the db1 node in my cluster goes down; the expected behaviour is that the db2 node gets promoted to be the new master. That all works, and the logs confirm it:
[WARNING] connection to upstream has been lost, trying to recover... 60 seconds before failover decision
[WARNING] connection to upstream has been lost, trying to recover... 50 seconds before failover decision
[WARNING] connection to upstream has been lost, trying to recover... 40 seconds before failover decision
[WARNING] connection to upstream has been lost, trying to recover... 30 seconds before failover decision
[WARNING] connection to upstream has been lost, trying to recover... 20 seconds before failover decision
[WARNING] connection to upstream has been lost, trying to recover... 10 seconds before failover decision
[ERROR] unable to reconnect to upstream after 60 seconds...
[ERROR] connection to database failed: could not connect to server: No route to host
    Is the server running on host "10.10.10.50" and accepting TCP/IP connections on port 5432?
[ERROR] connection to database failed: could not connect to server: No route to host
    Is the server running on host "10.10.10.50" and accepting TCP/IP connections on port 5432?
[NOTICE] promoting standby
[NOTICE] promoting server using '/usr/lib/postgresql/9.3/bin/pg_ctl -D /var/lib/postgresql/9.3/main promote'
[NOTICE] STANDBY PROMOTE successful. You should REINDEX any hash indexes you have.
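The db2 node is now promoted to be the new master, and everything is fine until the db1 node comes back up. At that point I expected db1 to become the new standby, but that is not what happens: I end up with both nodes acting as master. So my question is: after a failover, how can I prevent both nodes from acting as master? The documentation says to include a third node as a witness, which I have, but it does not have the expected effect.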
Here is a sample of my repmgr.conf file:
cluster=test_cluster
node=1
node_name=db1
conninfo='host=10.10.10.50 dbname=repmgr user=repmgr'
master_response_timeout=60
reconnect_attempts=6
reconnect_interval=10
failover=automatic
promote_command='repmgr standby promote -f /etc/repmgr/repmgr.conf'
follow_command='repmgr standby follow -f /etc/repmgr/repmgr.conf'
pg_bindir=/usr/lib/postgresql/9.3/bin
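The countdown in the log above (60 seconds, in 10-second steps) appears to match reconnect_attempts multiplied by reconnect_interval from this file. A minimal sketch for computing that window, assuming the relationship holds for this repmgr version and that the file has one setting per line; conf_value and failover_window are hypothetical helper names, not part of repmgr:

```shell
#!/bin/sh
# Sketch only: conf_value and failover_window are hypothetical helpers,
# not repmgr commands.

# Print the integer value of a "key=value" line in a repmgr.conf-style file.
# $1 = key, $2 = path to the file. If the key appears twice, the last line is used.
conf_value() {
    sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p" "$2" | tail -n 1
}

# Print reconnect_attempts * reconnect_interval, in seconds.
# Assumption: this product is how long repmgrd waits before deciding to fail over.
failover_window() {
    attempts=$(conf_value reconnect_attempts "$1")
    interval=$(conf_value reconnect_interval "$1")
    if [ -z "$attempts" ] || [ -z "$interval" ]; then
        echo "reconnect_attempts or reconnect_interval not set in $1" >&2
        return 1
    fi
    echo $((attempts * interval))
}
```

With the values in the file above (6 attempts, 10 seconds apart), failover_window /etc/repmgr/repmgr.conf would print 60.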
And the cluster state after the db1 node comes back up:
repmgr -f /etc/repmgr/repmgr.conf cluster show
Role      | Connection String
* master  | host=10.10.10.50 dbname=repmgr user=repmgr
* master  | host=10.10.10.60 dbname=repmgr user=repmgr
  witness | host=10.10.10.70 dbname=repmgr user=repmgr port=5499
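The double-master state shown above can be detected from a script by counting the rows whose role is master. A minimal sketch, assuming the two-column "Role | Connection String" layout shown above; count_masters and check_single_master are hypothetical helper names, not part of repmgr:

```shell
#!/bin/sh
# Sketch only: these helpers are hypothetical, not repmgr commands.

# Read `repmgr cluster show` output on stdin and print how many rows
# have the role "master". The leading "* " marker and padding are ignored.
count_masters() {
    awk -F'|' '
        { role = $1; gsub(/[* \t]/, "", role) }
        role == "master" { n++ }
        END { print n + 0 }
    '
}

# Return non-zero (and warn on stderr) when more than one master is listed.
check_single_master() {
    n=$(count_masters)
    if [ "$n" -gt 1 ]; then
        echo "more than one master listed: $n" >&2
        return 1
    fi
    return 0
}
```

One way to use it would be: repmgr -f /etc/repmgr/repmgr.conf cluster show | check_single_master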
Many thanks,
Best regards
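A few months ago I looked into automatic failover with repmgr, and it seems repmgr is working as expected here.
IIRC repmgr will not bring the old master back up as a new standby; you need to run a --force standby clone yourself. When a failover happens, you can have the other standby nodes follow the new master (repmgr standby follow).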
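- Would you expect your master to come back up unexpectedly?
- How are you handling the failover in your application?
- Aren't you redirecting all database traffic to the new master?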
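It is often desirable to bring a failed master back into replication as a standby. First, make sure the master's PostgreSQL server is no longer running; then use repmgr standby clone to re-sync its data directory with the current master, e.g.:
repmgr -f /etc/repmgr/repmgr.conf --force --rsync-only -h node2 -d repmgr -U repmgr --verbose standby clone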
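Here it is essential to use the command line options --force, to ensure repmgr will re-use the existing data directory, and --rsync-only, which causes repmgr to use rsync rather than pg_basebackup, as the latter can only be used to clone a fresh standby. The node can then be restarted. The node will then need to be re-registered with repmgr; again the --force option is required to update the existing record:
repmgr -f /etc/repmgr/9.5/repmgr.conf --force standby register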
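The steps above could be strung together roughly as follows, run on the failed master (db1 in the question). This is a sketch under assumptions, not a tested procedure: pg_ctlcluster 9.3 main is assumed as the way to stop and start PostgreSQL (a Debian-style install, guessed from the paths in the question), 10.10.10.60 is assumed to be the current master, /etc/repmgr/repmgr.conf is the config path from the question, and run and rejoin_as_standby are hypothetical helper names. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch only: rejoin a failed master as a standby using the repmgr
# commands quoted above. Host, paths and service commands are assumptions.

DRY_RUN="${DRY_RUN:-1}"                          # 1 = print commands, 0 = execute them
NEW_MASTER_HOST="${NEW_MASTER_HOST:-10.10.10.60}" # assumed current master (db2)
REPMGR_CONF="${REPMGR_CONF:-/etc/repmgr/repmgr.conf}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

rejoin_as_standby() {
    # 1. Make sure the old master's PostgreSQL server is no longer running.
    run pg_ctlcluster 9.3 main stop || return 1
    # 2. Re-sync the existing data directory from the current master.
    run repmgr -f "$REPMGR_CONF" --force --rsync-only \
        -h "$NEW_MASTER_HOST" -d repmgr -U repmgr --verbose standby clone || return 1
    # 3. Restart the node, which should now come up as a standby.
    run pg_ctlcluster 9.3 main start || return 1
    # 4. Update the node's existing record in the repmgr metadata.
    run repmgr -f "$REPMGR_CONF" --force standby register
}
```

Calling rejoin_as_standby with the default DRY_RUN=1 prints the four commands for review; setting DRY_RUN=0 would execute them, stopping at the first one that fails.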