無法使用 Corosync/Pacemaker 啟動 PostgreSQL 複製資源

February 8, 2017

我正在使用 Corosync/Pacemaker 的 HA 在兩台伺服器（CentOS 6.5）上設置 PostgreSQL 複製。

我的軟體資訊：

postgresql91-9.1.19-1PGDG.rhel6.x86_64
postgresql91-server-9.1.19-1PGDG.rhel6.x86_64
postgresql91-libs-9.1.19-1PGDG.rhel6.x86_64
postgresql91-contrib-9.1.19-1PGDG.rhel6.x86_64
postgresql91-devel-9.1.19-1PGDG.rhel6.x86_64


corosynclib-1.4.7-2.el6.x86_64
corosync-1.4.7-2.el6.x86_64

pacemaker-cli-1.1.12-8.el6_7.2.x86_64
pacemaker-1.1.12-8.el6_7.2.x86_64
pacemaker-cluster-libs-1.1.12-8.el6_7.2.x86_64
pacemaker-libs-1.1.12-8.el6_7.2.x86_64

resource-agents-3.9.5-24.el6_7.1.x86_64

複製正在工作，從主伺服器我可以看到從伺服器連接：

-bash-4.1$ psql -c "select client_addr,sync_state from pg_stat_replication;" client_addr | sync_state -------------+------------ 172.16.1.10 | async (1 row)

而且我還確認在主伺服器上創建的數據被複製到從伺服器。

這是我的crm configure show：

node master                                                                  
node slave

primitive PSQL pgsql \
   params restart_on_promote=true pgctl="/usr/pgsql-9.1/bin/pg_ctl" psql="/usr/pgsql-9.1/bin/psql" pgdata="/var/lib/pgsql/9.1/data" node_list="master slave" repuser=rep rep_mode=sync restore_command="cp /var/lib/pgsql/pg_archive/%f %p" primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5" master_ip=172.16.1.100 archive_cleanup_command="/usr/pgsql-9.1/bin/pg_archivecleanup /var/lib/pgsql/pg_archive/ %r"
primitive RepIP IPaddr2 \
   params ip=172.16.1.100 nic=eth2 cidr_netmask=24 \
   op monitor interval=30s
primitive VirtualIP IPaddr2 \
   params ip=10.0.0.100 cidr_netmask=24 \
   op monitor interval=30s
group psql-ha VirtualIP RepIP \
   meta target-role=Started
property cib-bootstrap-options: \
   dc-version=1.1.11-97629de \
   cluster-infrastructure="classic openais (with plugin)" \
   expected-quorum-votes=2 \
   stonith-enabled=false \
   no-quorum-policy=ignore

但資源PSQL無法啟動。我的crm status：

Last updated: Sat Nov 28 13:09:47 2015
Last change: Sat Nov 28 12:50:21 2015
Stack: classic openais (with plugin)
Current DC: master - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
3 Resources configured


Online: [ master slave ]

Resource Group: psql-ha
    VirtualIP  (ocf::heartbeat:IPaddr2):   Started master 
    RepIP  (ocf::heartbeat:IPaddr2):   Started master 

Failed actions:
   PSQL_start_0 on slave 'not configured' (6): call=60, status=complete, last-rc-change='Sat Nov 28 12:50:21 2015', queued=0ms, exec=53ms

登錄時出現錯誤/var/log/messages：

11 月 28 日 12:50:21 從屬 pgsql (PSQL)
$$ 3387 $$：錯誤：複製（rep_mode=async 或 sync）需要主/從配置。

誰能為我解釋為什麼我會收到這個錯誤？

謝謝。

更新：

（主機名稱更改為 node1/node2）

通過@gf_ 的配置解決了問題。

注意：忘記我的舊配置，我在這個部署模型中只使用一個虛擬 IP。

目前狀態：

[root@node1 ~]# crm_mon -Af -1
Last updated: Wed Dec  2 05:13:56 2015
Last change: Wed Dec  2 05:10:06 2015
Stack: classic openais (with plugin)
Current DC: node2 - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
3 Resources configured


Online: [ node1 node2 ]

VirtualIP  (ocf::heartbeat:IPaddr2):   Started node2 
Master/Slave Set: msPSQL [PSQL]
    Masters: [ node2 ]
    Slaves: [ node1 ]

Node Attributes:
* Node node1:
   + PSQL-data-status                  : STREAMING|SYNC
   + PSQL-status                       : HS:sync   
   + master-PSQL                       : 100       
* Node node2:
   + PSQL-data-status                  : LATEST    
   + PSQL-master-baseline              : 000000000E000078
   + PSQL-status                       : PRI       
   + master-PSQL                       : 1000      

Migration summary:
* Node node1: 
* Node node2:

工作配置：

node node1 \
   attributes PSQL-data-status="STREAMING|SYNC"
node node2 \
   attributes PSQL-data-status=LATEST
primitive PSQL pgsql \
   params restart_on_promote=false pgctl="/usr/pgsql-9.1/bin/pg_ctl" psql="/usr/pgsql-9.1/bin/psql" pgdata="/var/lib/pgsql/9.1/data" node_list="node1 node2" repuser=replicate rep_mode=sync restore_command="cp /var/lib/pgsql/pg_archive/%f %p" primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5" master_ip=10.0.0.100 archive_cleanup_command="/usr/pgsql-9.1/bin/pg_archivecleanup /var/lib/pgsql/pg_archive/ %r" \
   op start timeout=60s interval=0s on-fail=restart \
   op monitor timeout=60s interval=4s on-fail=restart \
   op monitor timeout=60s interval=3s on-fail=restart role=Master \
   op promote timeout=60s interval=0s on-fail=restart \
   op demote timeout=60s interval=0s on-fail=stop \
   op stop timeout=60s interval=0s on-fail=block \
   op notify timeout=60s interval=0s
primitive VirtualIP IPaddr2 \
   params ip=10.0.0.100 nic=eth1 cidr_netmask=24 \
   op monitor interval=30s
ms msPSQL PSQL \
   meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 target-role=Started notify=true
colocation rsc_colocation-1 inf: VirtualIP msPSQL:Master
order rsc_order-1 0: msPSQL:promote VirtualIP:start symmetrical=false
order rsc_order-2 0: msPSQL:promote VirtualIP:stop symmetrical=false
property cib-bootstrap-options: \
   dc-version=1.1.11-97629de \
   cluster-infrastructure="classic openais (with plugin)" \
   expected-quorum-votes=2 \
   no-quorum-policy=ignore \
   stonith-enabled=false \
   last-lrm-refresh=1449033003
rsc_defaults rsc-options: \
   resource-stickiness=100

同時，PSQL應該在您的兩個節點上執行，master並且slave. （只是一個小提示：不確定這些術語是否適合在您的設置中選擇作為節點名稱。）
因此，您必須在配置中反映這一點，您遇到的錯誤非常清楚，並描述了缺少的內容：您必須將您的配置配置PSQL為複製（應該同時在多個節點上執行），多-state（應該在主從設置中執行）資源。如果您不知道這是關於什麼的，那麼現在是查看文件的好時機，尤其是對於Clones - Resources That Get Active on Multiple Hosts和Multi-state-Resources That Have Multiple Modes。
因此，您的擴展配置可能如下所示：
ms msPSQL PSQL \
meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
此外，您必須指定您應該在哪些節點上執行VirtualIP，並且您必須確保資源以正確的順序停止/啟動：RepIP
colocation rsc_colocation-1 inf: psql-ha msPSQL:Master
order rsc_order-1 0: msPSQL:promote psql-ha:start symmetrical=false
order rsc_order-2 0: msPSQL:demote psql-ha:stop symmetrical=false

引用自：https://serverfault.com/questions/739468

無法使用 Corosync/Pacemaker 啟動 PostgreSQL 複製資源

相關問答

由於未知主機，pcs 創建集群失敗

節點切換後 Pacemaker 不啟動服務

不支持 CIB：驗證器 ‘pacemaker-2.0’，發布 ‘3.0.9’

斷開/拔出後如何使節點自動加入集群（Pacemaker，Corosync 2節點設置）？

Corosync 和 Pacemaker 部署

如何在我的 LAN 中測試負載平衡？