PostgreSQL

repmgr: both nodes act as master after a failover

  • January 31, 2018

I have a PostgreSQL cluster managed through repmgr.

The database topology is as follows:

db1 - 10.10.10.50 ( master )
db2 - 10.10.10.60 ( standby )
wit - 10.10.10.70 ( witness )

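Creating the cluster (replication plus automatic failover) works as expected, but I have the following problem.

Suppose the db1 node in my cluster fails. The expected behavior is that the db2 node is promoted to be the new master. That part works fine, as the log confirms: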

[WARNING] connection to upstream has been lost, trying to recover... 60 seconds before failover decision
[WARNING] connection to upstream has been lost, trying to recover... 50 seconds before failover decision
[WARNING] connection to upstream has been lost, trying to recover... 40 seconds before failover decision
[WARNING] connection to upstream has been lost, trying to recover... 30 seconds before failover decision
[WARNING] connection to upstream has been lost, trying to recover... 20 seconds before failover decision
[WARNING] connection to upstream has been lost, trying to recover... 10 seconds before failover decision
[ERROR] unable to reconnect to upstream after 60 seconds...
[ERROR] connection to database failed: could not connect to server: No route to host
       Is the server running on host "10.10.10.50" and accepting
       TCP/IP connections on port 5432?

[ERROR] connection to database failed: could not connect to server: No route to host
       Is the server running on host "10.10.10.50" and accepting
       TCP/IP connections on port 5432?

[NOTICE] promoting standby
[NOTICE] promoting server using '/usr/lib/postgresql/9.3/bin/pg_ctl -D /var/lib/postgresql/9.3/main promote'
[NOTICE] STANDBY PROMOTE successful.  You should REINDEX any hash indexes you have.

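The db2 node is now promoted to be the new master, and everything is fine until the db1 node comes back up.

At that point I expected db1 to become the new standby, but that is not what happens: I end up with both nodes acting as master.

So my question is: after a failover, how can I prevent both nodes from acting as master? The documentation says to include a third node as a witness, which I have, but it does not produce the expected effect.

Here is an example of my repmgr.conf file: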

cluster=test_cluster
node=1
node_name=db1
conninfo='host=10.10.10.50 dbname=repmgr user=repmgr'
master_response_timeout=60
reconnect_attempts=6
reconnect_interval=10
failover=automatic
promote_command='repmgr standby promote -f /etc/repmgr/repmgr.conf'
follow_command='repmgr standby follow -f /etc/repmgr/repmgr.conf'
pg_bindir=/usr/lib/postgresql/9.3/bin

Cluster status after the db1 node has come back up:

repmgr -f /etc/repmgr/repmgr.conf cluster show
Role      | Connection String
* master  | host=10.10.10.50 dbname=repmgr user=repmgr
* master  | host=10.10.10.60 dbname=repmgr user=repmgr
 witness | host=10.10.10.70 dbname=repmgr user=repmgr port=5499
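
A state like the one above can be spotted from a script by counting the rows whose role is master. The following is only a minimal sketch: it assumes the two-column `cluster show` layout shown here, and `count_masters` is a hypothetical helper name, not part of repmgr.

```shell
#!/usr/bin/env bash
# count_masters: reads `repmgr cluster show` output on stdin and prints
# the number of rows whose role column is "master".
# Assumes the "Role | Connection String" layout shown above.
count_masters() {
    awk -F'|' '
        { role = $1; gsub(/[* \t]/, "", role) }  # drop the "*" marker and padding
        role == "master" { n++ }
        END { print n + 0 }                      # print 0 when no row matched
    '
}

# Hypothetical usage, with the config path from the question:
#   repmgr -f /etc/repmgr/repmgr.conf cluster show | count_masters
```

Any result greater than 1 means more than one node reports itself as master.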

Many thanks,

Best regards

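A few months ago I looked into automatic failover with repmgr. It looks like repmgr is working as expected here.

IIRC repmgr does not bring the old master back as a new standby by itself; you need to run standby clone with --force. When a failover happens, you can have the other standby nodes follow the new master (repmgr standby follow).

  • Do you expect your master to come back unexpectedly?
  • How do you handle failover in your application?
  • Don't you redirect all database traffic to the new master?

It is usually desirable to bring a failed master back into replication as a standby. First, make sure the master's PostgreSQL server is no longer running; then use repmgr standby clone to resync its data directory with the current master, for example: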

repmgr -f /etc/repmgr/repmgr.conf --force --rsync-only  -h node2 -d repmgr -U repmgr --verbose  standby clone

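Here the command-line option --force is required so that repmgr reuses the existing data directory, together with --rsync-only, which makes repmgr use rsync instead of pg_basebackup, because the latter can only be used to clone a fresh standby.

The node can then be restarted. After that it needs to be re-registered with repmgr; the --force option is needed again to update the existing record: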

repmgr -f /etc/repmgr/9.5/repmgr.conf --force standby register
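
The steps quoted above could be strung together roughly as follows. This is a sketch under assumptions, not a tested procedure: the paths and the new master's address are defaults taken from the question, `rejoin_old_master` and `run` are hypothetical helper names, and stopping and starting with plain `pg_ctl` is an assumption that may not match how your distribution manages PostgreSQL. The repmgr flags are the ones quoted in the answer; check that your repmgr version supports them. By default the script only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: rejoin a failed master as a standby, run on the old master (db1).
# All defaults below are assumptions taken from the question; adjust them.
REPMGR_CONF="${REPMGR_CONF:-/etc/repmgr/repmgr.conf}"
PG_BINDIR="${PG_BINDIR:-/usr/lib/postgresql/9.3/bin}"
PGDATA_DIR="${PGDATA_DIR:-/var/lib/postgresql/9.3/main}"
NEW_MASTER="${NEW_MASTER:-10.10.10.60}"   # db2, the node that was promoted
DRY_RUN="${DRY_RUN:-1}"                   # 1 = only print the commands

# run: print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

rejoin_old_master() {
    # 1. Stop PostgreSQL on the old master if it is still running.
    if [ "$DRY_RUN" = "1" ] || "$PG_BINDIR/pg_ctl" -D "$PGDATA_DIR" status >/dev/null 2>&1; then
        run "$PG_BINDIR/pg_ctl" -D "$PGDATA_DIR" stop -m fast || return 1
    fi
    # 2. Resync the existing data directory from the current master.
    run repmgr -f "$REPMGR_CONF" --force --rsync-only \
        -h "$NEW_MASTER" -d repmgr -U repmgr --verbose standby clone || return 1
    # 3. Start the node again, now as a standby.
    run "$PG_BINDIR/pg_ctl" -D "$PGDATA_DIR" start || return 1
    # 4. Update the node's existing record in the repmgr metadata.
    run repmgr -f "$REPMGR_CONF" --force standby register
}

# Hypothetical usage: review the printed commands first, then run for real.
#   rejoin_old_master
#   DRY_RUN=0 rejoin_old_master
```

Keeping the dry run as the default makes it possible to check the exact commands before anything touches the data directory.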

Quoted from: https://serverfault.com/questions/717348