Zabbix
HA Pacemaker monitor_5000 on zabserver-a 'not running' (7): call=53, status=complete, last-rc-change='Mon Jul 13 07:51:32 2015', queue…
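I have a Zabbix active/passive cluster built with Pacemaker and cman. However, I see the following in "pcs status", and the zabbix-server service does not come up during failover. The floating IP moves over fine, though.
OS: CentOS 6.6, Zabbix 2.4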
[root@abc-zabserver-b cluster]# rpm -qa | grep cman
cman-3.0.12.1-68.el6_6.1.x86_64
[root@abc-zabserver-b cluster]# rpm -qa | grep pacemaker
pacemaker-cluster-libs-1.1.12-4.el6.x86_64
pacemaker-1.1.12-4.el6.x86_64
pacemaker-libs-1.1.12-4.el6.x86_64
pacemaker-cli-1.1.12-4.el6.x86_64
[root@abc-zabserver-b ~]# rpm -qa | grep corosync
corosync-1.4.7-1.el6.x86_64
corosynclib-1.4.7-1.el6.x86_64
Here is the error:
[root@abc-zabserver-b cluster]# pcs status
Cluster name: abc-zabvip
Last updated: Mon Jul 13 08:01:57 2015
Last change: Thu Jul 2 17:01:48 2015
Stack: cman
Current DC: abc-zabserver-a - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured
2 Resources configured

Online: [ abc-zabserver-a abc-zabserver-b ]

Full list of resources:

 Resource Group: zabbix-cluster
     ClusterIP      (ocf::heartbeat:IPaddr2):   Started abc-zabserver-a
     zabbix-server  (lsb:zabbix-server):        Stopped

Failed actions:
    zabbix-server_monitor_5000 on abc-zabserver-a 'not running' (7): call=541, status=complete, last-rc-change='Mon Jul 13 08:01:57 2015', queued=0ms, exec=0ms
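How to read the failed action: Pacemaker names a recurring operation `<resource>_<operation>_<interval in ms>`, so `zabbix-server_monitor_5000` is the 5-second monitor of the zabbix-server resource, and rc 7 is OCF_NOT_RUNNING, meaning the monitor found the service stopped rather than failing with an error. The Python sketch below splits such a line into its parts. The regular expressions are assumptions fitted to the Pacemaker 1.1 output shown here; they do not handle operation names containing underscores (e.g. `migrate_to`), and `parse_failed_action` is a hypothetical helper.

```python
import re
from typing import NamedTuple

# One "Failed actions" entry as printed by pcs status (Pacemaker 1.1.x style).
FAILED_ACTION_RE = re.compile(
    r"^\s*(?P<key>\S+) on (?P<node>\S+) '(?P<reason>[^']*)' \((?P<rc>\d+)\)"
)
# Operation key: <resource>_<operation>_<interval in ms>.
# Simplification: assumes the operation name itself has no underscore.
OP_KEY_RE = re.compile(r"^(?P<rsc>.+)_(?P<op>[a-z]+)_(?P<interval_ms>\d+)$")


class FailedAction(NamedTuple):
    resource: str
    operation: str
    interval_ms: int
    node: str
    reason: str
    rc: int  # OCF return code, e.g. 7 = OCF_NOT_RUNNING


def parse_failed_action(line: str) -> FailedAction:
    """Split one failed-action line from pcs status into its fields."""
    m = FAILED_ACTION_RE.match(line)
    if m is None:
        raise ValueError(f"not a failed-action line: {line!r}")
    k = OP_KEY_RE.match(m["key"])
    if k is None:
        raise ValueError(f"unrecognised operation key: {m['key']!r}")
    return FailedAction(
        resource=k["rsc"],
        operation=k["op"],
        interval_ms=int(k["interval_ms"]),
        node=m["node"],
        reason=m["reason"],
        rc=int(m["rc"]),
    )


line = ("zabbix-server_monitor_5000 on abc-zabserver-a 'not running' (7): "
        "call=541, status=complete, last-rc-change='Mon Jul 13 08:01:57 2015', "
        "queued=0ms, exec=0ms")
fa = parse_failed_action(line)
print(fa.resource, fa.operation, fa.interval_ms, fa.rc)
# zabbix-server monitor 5000 7
```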
Here is cluster.conf:
<cluster config_version="9" name="abc-zabvip">
  <fence_daemon/>
  <clusternodes>
    <clusternode name="abc-zabserver-a" nodeid="1">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="abc-zabserver-a"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="abc-zabserver-b" nodeid="2">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="abc-zabserver-b"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman expected_votes="1" port="5405" transport="udpu" two_node="1"/>
  <fencedevices>
    <fencedevice agent="fence_pcmk" name="pcmk"/>
  </fencedevices>
  <rm>
    <failoverdomains/>
    <resources/>
  </rm>
</cluster>
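In this cluster.conf, `two_node="1"` together with `expected_votes="1"` lets a single surviving node keep quorum, and each node's fencing is redirected to Pacemaker through the `fence_pcmk` device. A small Python sketch that checks those two properties with the standard library's ElementTree; `check_two_node` is a hypothetical helper, the checks cover only what this file contains, and the embedded XML is a trimmed copy of the file above.

```python
import xml.etree.ElementTree as ET
from typing import List


def check_two_node(conf_xml: str) -> List[str]:
    """Return problems found in a cman cluster.conf; an empty list means none."""
    root = ET.fromstring(conf_xml.strip())
    cman = root.find("cman")
    if cman is None:
        return ["no <cman> element"]
    problems = []
    # A two-node cman cluster must be told to expect a single vote.
    if cman.get("two_node") == "1" and cman.get("expected_votes") != "1":
        problems.append("two_node=1 requires expected_votes=1")
    # Map fence device names to agents, e.g. {"pcmk": "fence_pcmk"}.
    agents = {fd.get("name"): fd.get("agent") for fd in root.iter("fencedevice")}
    # Every node should reference at least one fence_pcmk device.
    for node in root.iter("clusternode"):
        devices = [d.get("name") for d in node.iter("device")]
        if not any(agents.get(name) == "fence_pcmk" for name in devices):
            problems.append(f"{node.get('name')}: no fence_pcmk redirect")
    return problems


CLUSTER_CONF = """
<cluster config_version="9" name="abc-zabvip">
  <clusternodes>
    <clusternode name="abc-zabserver-a" nodeid="1"><fence><method name="pcmk-redirect">
      <device name="pcmk" port="abc-zabserver-a"/></method></fence></clusternode>
    <clusternode name="abc-zabserver-b" nodeid="2"><fence><method name="pcmk-redirect">
      <device name="pcmk" port="abc-zabserver-b"/></method></fence></clusternode>
  </clusternodes>
  <cman expected_votes="1" port="5405" transport="udpu" two_node="1"/>
  <fencedevices><fencedevice agent="fence_pcmk" name="pcmk"/></fencedevices>
</cluster>
"""

print(check_two_node(CLUSTER_CONF))  # []
```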
Here is /etc/sysconfig/cman:
CMAN_QUORUM_TIMEOUT=0
Some other configuration I did on this cluster:
pcs resource create ClusterIP ocf:heartbeat:IPaddr2 ip=10.99.122.69 cidr_netmask=24 op monitor interval=5s
pcs property set stonith-enabled=false
pcs resource create zabbix-server lsb:zabbix-server op monitor interval=5s
pcs resource group add zabbix-cluster ClusterIP zabbix-server
pcs property set no-quorum-policy=ignore
pcs property set default-resource-stickiness="100"
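These commands put ClusterIP and zabbix-server in one group, so Pacemaker keeps them on the same node, starts them in the listed order and stops them in reverse; the `op monitor interval=5s` on zabbix-server is what appears as `zabbix-server_monitor_5000` in "pcs status". The Python sketch below reproduces both derivations. The helper names are hypothetical, and `interval_to_ms` only understands the `ms`, `s`, `min` and `h` suffixes (a bare number is taken as seconds), a simplification of Pacemaker's own interval parser.

```python
import re
from typing import List, Sequence, Tuple

# Simplified unit table (assumption): bare numbers are seconds.
UNIT_MS = {"ms": 1, "s": 1000, "min": 60_000, "h": 3_600_000}


def interval_to_ms(interval: str) -> int:
    """Convert an interval such as '5s' or '500ms' to milliseconds."""
    m = re.fullmatch(r"(\d+)(ms|s|min|h)?", interval.strip())
    if m is None:
        raise ValueError(f"unsupported interval: {interval!r}")
    return int(m.group(1)) * UNIT_MS[m.group(2) or "s"]


def op_key(resource: str, operation: str, interval: str) -> str:
    """Operation key in the <resource>_<operation>_<interval_ms> form seen in pcs status."""
    return f"{resource}_{operation}_{interval_to_ms(interval)}"


def group_order(members: Sequence[str]) -> Tuple[List[str], List[str]]:
    """A group starts its members in listed order and stops them in reverse."""
    return list(members), list(reversed(members))


print(op_key("zabbix-server", "monitor", "5s"))  # zabbix-server_monitor_5000
start, stop = group_order(["ClusterIP", "zabbix-server"])
print(start)  # ['ClusterIP', 'zabbix-server']
print(stop)   # ['zabbix-server', 'ClusterIP']
```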
Error in zabbix_server.log:
listener failed: zbx_tcp_listen() fatal error: unable to serve on any address [[-]:10051]
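This log line means the newly started zabbix_server could not bind TCP port 10051 (the Zabbix server's default ListenPort) on any address. A common cause is another process already holding the port, for example a zabbix_server instance that was never stopped. A Python sketch that tests whether the port is free by trying to bind it; `port_in_use` is a hypothetical helper, and an empty host means all IPv4 addresses, like the default ListenIP.

```python
import errno
import socket

ZABBIX_SERVER_PORT = 10051  # zabbix_server's default ListenPort


def port_in_use(port: int = ZABBIX_SERVER_PORT, host: str = "") -> bool:
    """Return True if binding host:port fails because the address is already taken."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        probe.bind((host, port))
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return True  # another process (e.g. a leftover zabbix_server) owns it
        raise  # any other failure is not a "port busy" answer
    finally:
        probe.close()
    return False


if __name__ == "__main__":
    print("10051 busy:", port_in_use())
```

Binding (rather than connecting) is the same check zabbix_server's listener effectively performs, so a True result here points at the same condition as the log message.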
The Zabbix server processes are running, but the service status reports it as stopped:
[root@abc-zabserver-b zabbix]# service zabbix-server status
zabbix_server is stopped
[root@abc-zabserver-b zabbix]# ps afx | grep -i zabbix
26835 pts/0    S+     0:00  |   \_ grep -i zabbix
 2867 ?        S      0:00 zabbix_server: poller #50 [connecting to the database]
 2926 ?        S      0:00 zabbix_server -c /etc/zabbi/zabbix_server.conf
 2962 ?        S      0:00 zabbix_server -c /etc/zabbi/zabbix_server.conf
[root@abc-zabserver-b zabbix]# service zabbix-server status
zabbix_server is stopped
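The gap between `ps` and `service zabbix-server status` is what drives the monitor result: an `lsb:` resource is monitored by running the init script's `status` action, and on CentOS 6 a "zabbix_server is stopped" message normally comes with LSB exit code 3, which Pacemaker reports as rc 7 (not running) even though zabbix_server processes are alive. The sketch below models that translation. Mapping codes 1 and 2 to "not running" and every other code to a generic error is a simplifying assumption, and `monitor_lsb` is a hypothetical wrapper, not Pacemaker code.

```python
import subprocess

OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_NOT_RUNNING = 7


def lsb_status_to_ocf(exit_code: int) -> int:
    """Map an LSB init script 'status' exit code to an OCF monitor result.

    LSB status codes: 0 = running, 1 = dead but pid file exists,
    2 = dead but lock file exists, 3 = not running, 4 = unknown.
    Simplified assumption: 1-3 count as "not running", anything else as an error.
    """
    if exit_code == 0:
        return OCF_SUCCESS
    if exit_code in (1, 2, 3):
        return OCF_NOT_RUNNING
    return OCF_ERR_GENERIC


def monitor_lsb(service: str) -> int:
    """Run 'service <name> status' the way an LSB monitor would (CentOS 6 style)."""
    result = subprocess.run(["service", service, "status"],
                            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return lsb_status_to_ocf(result.returncode)


# "zabbix_server is stopped" with exit code 3 becomes rc 7, as in pcs status.
print(lsb_status_to_ocf(3))  # 7
```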
pcs config show:
[root@abc-zabserver-b zabbix]# pcs config show
Cluster Name: abc-zabvip
Corosync Nodes:
 abc-zabserver-a abc-zabserver-b
Pacemaker Nodes:
 abc-zabserver-a abc-zabserver-b

Resources:
 Group: zabbix-cluster
  Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
   Attributes: ip=10.99.122.69 cidr_netmask=24
   Operations: start interval=0s timeout=20s (ClusterIP-start-timeout-20s)
               stop interval=0s timeout=20s (ClusterIP-stop-timeout-20s)
               monitor interval=5s (ClusterIP-monitor-interval-5s)
  Resource: zabbix-server (class=lsb type=zabbix-server)
   Operations: monitor interval=5s (zabbix-server-monitor-interval-5s)

Stonith Devices:
Fencing Levels:

Location Constraints:
Ordering Constraints:
Colocation Constraints:

Cluster Properties:
 cluster-infrastructure: cman
 dc-version: 1.1.11-97629de
 default-resource-stickiness: 100
 no-quorum-policy: ignore
 stonith-enabled: false
The cause turned out to be a configuration mismatch between the Zabbix binary and the configuration file. Worst thing ever!
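One hedged way to spot this kind of mismatch is to check which `-c` path the running zabbix_server processes were started with (the `ps` output above shows `-c /etc/zabbi/zabbix_server.conf`) and compare it with the path the init script is expected to use. In the Python sketch below, `EXPECTED_CONF` is an assumption (the usual packaged location), not something stated in the post, and the helper names are hypothetical.

```python
import os
from typing import Iterable, List, Optional

# Assumption: the config path the init script is expected to pass with -c
# (the usual packaged location; adjust to your installation).
EXPECTED_CONF = "/etc/zabbix/zabbix_server.conf"


def config_from_ps_line(line: str) -> Optional[str]:
    """Return the -c argument of a zabbix_server command line, or None."""
    tokens = line.split()
    if not any(os.path.basename(t) == "zabbix_server" for t in tokens):
        return None  # e.g. forked workers ("zabbix_server: poller #50 ...") or grep
    if "-c" not in tokens:
        return None
    i = tokens.index("-c")
    return tokens[i + 1] if i + 1 < len(tokens) else None


def unexpected_configs(ps_lines: Iterable[str], expected: str = EXPECTED_CONF) -> List[str]:
    """Config paths of running zabbix_server processes that differ from `expected`."""
    found = {config_from_ps_line(line) for line in ps_lines}
    return sorted(c for c in found if c is not None and c != expected)


PS_OUTPUT = """\
26835 pts/0 S+ 0:00 | \\_ grep -i zabbix
 2867 ? S 0:00 zabbix_server: poller #50 [connecting to the database]
 2926 ? S 0:00 zabbix_server -c /etc/zabbi/zabbix_server.conf
 2962 ? S 0:00 zabbix_server -c /etc/zabbi/zabbix_server.conf
"""
print(unexpected_configs(PS_OUTPUT.splitlines()))
# ['/etc/zabbi/zabbix_server.conf']
```

An empty list would mean every running zabbix_server uses the expected file; a non-empty one lists the paths worth comparing against the init script and the binary's compiled-in defaults.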