Glusterfs
Healing GlusterFS doesn't seem to be working?
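I recently replaced an HDD that was backing a brick in a GlusterFS cluster. I was able to map the new HDD back to the brick and then get GlusterFS to replicate to it successfully.
However, one part of this process doesn't seem to work for me. I tried running the "heal" command on the volume with the replaced brick, but I keep running into this: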
    $ gluster volume heal nova
    Locking failed on c551316f-7218-44cf-bb36-befe3d3df34b. Please check log file for details.
    Locking failed on ae62c691-ae55-4c99-8364-697cb3562668. Please check log file for details.
    Locking failed on cb78ba3c-256f-4413-ae7e-aa5c0e9872b5. Please check log file for details.
    Locking failed on 79a6a414-3569-482c-929f-b7c5da16d05e. Please check log file for details.
    Locking failed on 5f43c6a4-0ccd-424a-ae56-0492ec64feeb. Please check log file for details.
    Locking failed on c7416c1f-494b-4a95-b48d-6c766c7bce14. Please check log file for details.
    Locking failed on 6c0111fc-b5e7-4350-8be5-3179a1a5187e. Please check log file for details.
    Locking failed on 88fcb687-47aa-4921-b3ab-d6c3b330b32a. Please check log file for details.
    Locking failed on d73de03a-0f66-4619-89ef-b73c9bbd800e. Please check log file for details.
    Locking failed on 4a780f57-37e4-4f1b-9c34-187a0c7e44bf. Please check log file for details.
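The output above names ten distinct peer UUIDs. When comparing such output against the cluster's peer list, it can help to extract those UUIDs programmatically. The following is a minimal Python sketch, not part of the original post. The function name `failed_lock_peers` is hypothetical, and the regular expression assumes the message wording shown above.

```python
import re

# Matches the per-peer error lines as they appear in the CLI output above.
LOCK_FAILED = re.compile(
    r"Locking failed on ([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})"
)


def failed_lock_peers(cli_output):
    """Return the peer UUIDs named in 'Locking failed on ...' lines.

    UUIDs are returned in order of first appearance, without duplicates.
    """
    seen = []
    for uuid in LOCK_FAILED.findall(cli_output):
        if uuid not in seen:
            seen.append(uuid)
    return seen


# Two lines copied from the output in the question.
sample = (
    "Locking failed on c551316f-7218-44cf-bb36-befe3d3df34b. "
    "Please check log file for details.\n"
    "Locking failed on ae62c691-ae55-4c99-8364-697cb3562668. "
    "Please check log file for details.\n"
)

for uuid in failed_lock_peers(sample):
    print(uuid)
```

The resulting list can then be checked by hand against the UUIDs the cluster reports for its peers.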
The logs basically echo the above, in particular:
    $ tail etc-glusterfs-glusterd.vol.log
    [2015-08-03 23:08:03.289249] E [glusterd-syncop.c:562:_gd_syncop_mgmt_lock_cbk] 0-management: Could not find peer with ID d827a48e-627f-0000-0a00-000000000000
    [2015-08-03 23:08:03.289258] E [glusterd-syncop.c:111:gd_collate_errors] 0-: Locking failed on c7416c1f-494b-4a95-b48d-6c766c7bce14. Please check log file for details.
    [2015-08-03 23:08:03.289279] W [rpc-clnt-ping.c:199:rpc_clnt_ping_cbk] 0-management: socket or ib related error
    [2015-08-03 23:08:03.289827] E [glusterd-syncop.c:562:_gd_syncop_mgmt_lock_cbk] 0-management: Could not find peer with ID d827a48e-627f-0000-0a00-000000000000
    [2015-08-03 23:08:03.289858] E [glusterd-syncop.c:111:gd_collate_errors] 0-: Locking failed on d73de03a-0f66-4619-89ef-b73c9bbd800e. Please check log file for details.
    [2015-08-03 23:08:03.290509] E [glusterd-syncop.c:562:_gd_syncop_mgmt_lock_cbk] 0-management: Could not find peer with ID d827a48e-627f-0000-0a00-000000000000
    [2015-08-03 23:08:03.290529] E [glusterd-syncop.c:111:gd_collate_errors] 0-: Locking failed on 4a780f57-37e4-4f1b-9c34-187a0c7e44bf. Please check log file for details.
    [2015-08-03 23:08:03.290597] E [glusterd-syncop.c:1804:gd_sync_task_begin] 0-management: Locking Peers Failed.
    [2015-08-03 23:07:03.351603] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd already stopped
    [2015-08-03 23:07:03.351644] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub already stopped
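To separate the error entries from the warnings and informational lines in a log like this, something along these lines may help. It is a rough sketch that assumes every line follows the layout visible in the excerpt above (timestamp, one-letter level, optional `MSGID`, source location, domain, message). The names `LOG_LINE` and `parse_errors` are illustrative and do not come from GlusterFS.

```python
import re

# Layout inferred from the log excerpt above; other glusterd log lines
# may not match this pattern and are skipped.
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"(?:\[MSGID: (?P<msgid>\d+)\]\s+)?"
    r"\[(?P<source>[^\]]+)\]\s+(?P<domain>\S+):\s*(?P<message>.*)$"
)


def parse_errors(log_text):
    """Return one dict per log line whose level is 'E' (error)."""
    entries = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") == "E":
            entries.append(match.groupdict())
    return entries


# Four lines copied from the log excerpt in the question.
sample = "\n".join([
    "[2015-08-03 23:08:03.289249] E [glusterd-syncop.c:562:_gd_syncop_mgmt_lock_cbk] "
    "0-management: Could not find peer with ID d827a48e-627f-0000-0a00-000000000000",
    "[2015-08-03 23:08:03.289279] W [rpc-clnt-ping.c:199:rpc_clnt_ping_cbk] "
    "0-management: socket or ib related error",
    "[2015-08-03 23:08:03.290597] E [glusterd-syncop.c:1804:gd_sync_task_begin] "
    "0-management: Locking Peers Failed.",
    "[2015-08-03 23:07:03.351603] I [MSGID: 106132] "
    "[glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd already stopped",
])

for entry in parse_errors(sample):
    print(entry["ts"], entry["source"], entry["message"])
```

Filtered this way, the "Could not find peer with ID" entries stand out. The ID they report does not match any of the UUIDs in the "Locking failed on" lines.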
These other logs also received messages when I attempted the above:
    $ ls -ltr
    -rw------- 1 root root    41704 Aug 2 12:07 glfsheal-nova.log
    -rw------- 1 root root    15986 Aug 2 12:07 cmd_history.log-20150802
    -rw------- 1 root root   290359 Aug 3 19:07 var-lib-nova-instances.log
    -rw------- 1 root root   221829 Aug 3 19:07 glustershd.log
    -rw------- 1 root root   195472 Aug 3 19:07 nfs.log
    -rw------- 1 root root 61831116 Aug 3 19:07 var-lib-nova-mnt-92ef2ec54fd18595ed18d8e6027a1b3d.log
    -rw------- 1 root root     3504 Aug 3 19:08 cmd_history.log
    -rw------- 1 root root    89294 Aug 3 19:08 cli.log
    -rw------- 1 root root   136421 Aug 3 19:08 etc-glusterfs-glusterd.vol.log
Looking through them, it isn't clear whether any of it relates to this particular issue.
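With the setup above, I initially thought that I could only run the heal command from the master node of the GlusterFS cluster. It turned out that my real problem was that the 11 nodes in the cluster were running 2 different versions of GlusterFS.
Once I realized this, I updated all the nodes to the latest version of GlusterFS (3.7.3) and was able to run the heal from any node, as one would expect.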
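A version mismatch like this can be spotted by collecting the version banner from each node and grouping the nodes by the version they report. Below is a minimal Python sketch of the grouping step only. It assumes each node's `gluster --version` output contains text of the form `glusterfs <version>`, and it leaves out how the output is gathered from the nodes. The host names and the older version number in the sample are hypothetical, since the post does not say which older version was in use.

```python
import re
from collections import defaultdict

# Assumed banner form: "glusterfs 3.7.3 ..." somewhere in the output.
VERSION = re.compile(r"glusterfs\s+(\d+(?:\.\d+)*)")


def group_by_version(version_output_by_host):
    """Map each detected version to the sorted list of hosts reporting it.

    Hosts whose output does not match the assumed banner form are
    collected under the key 'unknown'.
    """
    groups = defaultdict(list)
    for host, output in version_output_by_host.items():
        match = VERSION.search(output)
        groups[match.group(1) if match else "unknown"].append(host)
    return {version: sorted(hosts) for version, hosts in groups.items()}


# Hypothetical host names and banners, for illustration only.
collected = {
    "node01": "glusterfs 3.7.3",
    "node02": "glusterfs 3.7.3",
    "node03": "glusterfs 3.6.2",
}

groups = group_by_version(collected)
if len(groups) > 1:
    print("Version mismatch across nodes:", groups)
else:
    print("All nodes report the same version:", groups)
```

More than one key in the result means the nodes are not all on the same release, which is the condition that had to be fixed here before the heal would run.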