Postgresql
無法使用 Corosync/Pacemaker 啟動 PostgreSQL 複製資源
我正在使用 Corosync/Pacemaker 的 HA 在兩台伺服器(CentOS 6.5)上設置 PostgreSQL 複製。
我的軟體資訊:
postgresql91-9.1.19-1PGDG.rhel6.x86_64 postgresql91-server-9.1.19-1PGDG.rhel6.x86_64 postgresql91-libs-9.1.19-1PGDG.rhel6.x86_64 postgresql91-contrib-9.1.19-1PGDG.rhel6.x86_64 postgresql91-devel-9.1.19-1PGDG.rhel6.x86_64 corosynclib-1.4.7-2.el6.x86_64 corosync-1.4.7-2.el6.x86_64 pacemaker-cli-1.1.12-8.el6_7.2.x86_64 pacemaker-1.1.12-8.el6_7.2.x86_64 pacemaker-cluster-libs-1.1.12-8.el6_7.2.x86_64 pacemaker-libs-1.1.12-8.el6_7.2.x86_64 resource-agents-3.9.5-24.el6_7.1.x86_64
複製正在工作,從主伺服器我可以看到從伺服器連接:
-bash-4.1$ psql -c "select client_addr,sync_state from pg_stat_replication;" client_addr | sync_state -------------+------------ 172.16.1.10 | async (1 row)
而且我還確認在主伺服器上創建的數據被複製到從伺服器。
這是我的
crm configure show
:node master node slave primitive PSQL pgsql \ params restart_on_promote=true pgctl="/usr/pgsql-9.1/bin/pg_ctl" psql="/usr/pgsql-9.1/bin/psql" pgdata="/var/lib/pgsql/9.1/data" node_list="master slave" repuser=rep rep_mode=sync restore_command="cp /var/lib/pgsql/pg_archive/%f %p" primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5" master_ip=172.16.1.100 archive_cleanup_command="/usr/pgsql-9.1/bin/pg_archivecleanup /var/lib/pgsql/pg_archive/ %r" primitive RepIP IPaddr2 \ params ip=172.16.1.100 nic=eth2 cidr_netmask=24 \ op monitor interval=30s primitive VirtualIP IPaddr2 \ params ip=10.0.0.100 cidr_netmask=24 \ op monitor interval=30s group psql-ha VirtualIP RepIP \ meta target-role=Started property cib-bootstrap-options: \ dc-version=1.1.11-97629de \ cluster-infrastructure="classic openais (with plugin)" \ expected-quorum-votes=2 \ stonith-enabled=false \ no-quorum-policy=ignore
但資源
PSQL
無法啟動。我的crm status
:Last updated: Sat Nov 28 13:09:47 2015 Last change: Sat Nov 28 12:50:21 2015 Stack: classic openais (with plugin) Current DC: master - partition with quorum Version: 1.1.11-97629de 2 Nodes configured, 2 expected votes 3 Resources configured Online: [ master slave ] Resource Group: psql-ha VirtualIP (ocf::heartbeat:IPaddr2): Started master RepIP (ocf::heartbeat:IPaddr2): Started master Failed actions: PSQL_start_0 on slave 'not configured' (6): call=60, status=complete, last-rc-change='Sat Nov 28 12:50:21 2015', queued=0ms, exec=53ms
登錄時出現錯誤
/var/log/messages
:11 月 28 日 12:50:21 從屬 pgsql (PSQL)
$$ 3387 $$:錯誤:複製(rep_mode=async 或 sync)需要主/從配置。
誰能為我解釋為什麼我會收到這個錯誤?
謝謝。
更新:
(主機名稱更改為 node1/node2)
通過@gf_ 的配置解決了問題。
注意:忘記我的舊配置,我在這個部署模型中只使用一個虛擬 IP。
目前狀態:
[root@node1 ~]# crm_mon -Af -1 Last updated: Wed Dec 2 05:13:56 2015 Last change: Wed Dec 2 05:10:06 2015 Stack: classic openais (with plugin) Current DC: node2 - partition with quorum Version: 1.1.11-97629de 2 Nodes configured, 2 expected votes 3 Resources configured Online: [ node1 node2 ] VirtualIP (ocf::heartbeat:IPaddr2): Started node2 Master/Slave Set: msPSQL [PSQL] Masters: [ node2 ] Slaves: [ node1 ] Node Attributes: * Node node1: + PSQL-data-status : STREAMING|SYNC + PSQL-status : HS:sync + master-PSQL : 100 * Node node2: + PSQL-data-status : LATEST + PSQL-master-baseline : 000000000E000078 + PSQL-status : PRI + master-PSQL : 1000 Migration summary: * Node node1: * Node node2:
工作配置:
node node1 \ attributes PSQL-data-status="STREAMING|SYNC" node node2 \ attributes PSQL-data-status=LATEST primitive PSQL pgsql \ params restart_on_promote=false pgctl="/usr/pgsql-9.1/bin/pg_ctl" psql="/usr/pgsql-9.1/bin/psql" pgdata="/var/lib/pgsql/9.1/data" node_list="node1 node2" repuser=replicate rep_mode=sync restore_command="cp /var/lib/pgsql/pg_archive/%f %p" primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5" master_ip=10.0.0.100 archive_cleanup_command="/usr/pgsql-9.1/bin/pg_archivecleanup /var/lib/pgsql/pg_archive/ %r" \ op start timeout=60s interval=0s on-fail=restart \ op monitor timeout=60s interval=4s on-fail=restart \ op monitor timeout=60s interval=3s on-fail=restart role=Master \ op promote timeout=60s interval=0s on-fail=restart \ op demote timeout=60s interval=0s on-fail=stop \ op stop timeout=60s interval=0s on-fail=block \ op notify timeout=60s interval=0s primitive VirtualIP IPaddr2 \ params ip=10.0.0.100 nic=eth1 cidr_netmask=24 \ op monitor interval=30s ms msPSQL PSQL \ meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 target-role=Started notify=true colocation rsc_colocation-1 inf: VirtualIP msPSQL:Master order rsc_order-1 0: msPSQL:promote VirtualIP:start symmetrical=false order rsc_order-2 0: msPSQL:promote VirtualIP:stop symmetrical=false property cib-bootstrap-options: \ dc-version=1.1.11-97629de \ cluster-infrastructure="classic openais (with plugin)" \ expected-quorum-votes=2 \ no-quorum-policy=ignore \ stonith-enabled=false \ last-lrm-refresh=1449033003 rsc_defaults rsc-options: \ resource-stickiness=100
- 同時,
PSQL
應該在您的兩個節點上執行,master
並且slave
. (只是一個小提示:不確定這些術語是否適合在您的設置中選擇作為節點名稱。)- 因此,您必須在配置中反映這一點,您遇到的錯誤非常清楚,並描述了缺少的內容:您必須將您的配置配置
PSQL
為複製(應該同時在多個節點上執行),多-state(應該在主從設置中執行)資源。如果您不知道這是關於什麼的,那麼現在是查看文件的好時機,尤其是對於Clones - Resources That Get Active on Multiple Hosts和Multi-state-Resources That Have Multiple Modes。- 因此,您的擴展配置可能如下所示:
ms msPSQL PSQL \ meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
- 此外,您必須指定您應該在哪些節點上執行
VirtualIP
,並且您必須確保資源以正確的順序停止/啟動:RepIP
colocation rsc_colocation-1 inf: psql-ha msPSQL:Master order rsc_order-1 0: msPSQL:promote psql-ha:start symmetrical=false order rsc_order-2 0: msPSQL:demote psql-ha:stop symmetrical=false