Glusterfs

Healing GlusterFS doesn't seem to work?

  • August 9, 2015

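I recently replaced an HDD that was providing a brick in a GlusterFS cluster. I was able to map that HDD back to the brick and then have GlusterFS replicate to it successfully.

However, one part of the process would not work for me. I tried to run the "heal" command on the volume with the replaced brick, but kept running into this: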

$ gluster volume heal nova
Locking failed on c551316f-7218-44cf-bb36-befe3d3df34b. Please check log file for details.
Locking failed on ae62c691-ae55-4c99-8364-697cb3562668. Please check log file for details.
Locking failed on cb78ba3c-256f-4413-ae7e-aa5c0e9872b5. Please check log file for details.
Locking failed on 79a6a414-3569-482c-929f-b7c5da16d05e. Please check log file for details.
Locking failed on 5f43c6a4-0ccd-424a-ae56-0492ec64feeb. Please check log file for details.
Locking failed on c7416c1f-494b-4a95-b48d-6c766c7bce14. Please check log file for details.
Locking failed on 6c0111fc-b5e7-4350-8be5-3179a1a5187e. Please check log file for details.
Locking failed on 88fcb687-47aa-4921-b3ab-d6c3b330b32a. Please check log file for details.
Locking failed on d73de03a-0f66-4619-89ef-b73c9bbd800e. Please check log file for details.
Locking failed on 4a780f57-37e4-4f1b-9c34-187a0c7e44bf. Please check log file for details.

The logs basically echo the output above, specifically:

$ tail etc-glusterfs-glusterd.vol.log
[2015-08-03 23:08:03.289249] E [glusterd-syncop.c:562:_gd_syncop_mgmt_lock_cbk] 0-management: Could not find peer with ID d827a48e-627f-0000-0a00-000000000000
[2015-08-03 23:08:03.289258] E [glusterd-syncop.c:111:gd_collate_errors] 0-: Locking failed on c7416c1f-494b-4a95-b48d-6c766c7bce14. Please check log file for details.
[2015-08-03 23:08:03.289279] W [rpc-clnt-ping.c:199:rpc_clnt_ping_cbk] 0-management: socket or ib related error
[2015-08-03 23:08:03.289827] E [glusterd-syncop.c:562:_gd_syncop_mgmt_lock_cbk] 0-management: Could not find peer with ID d827a48e-627f-0000-0a00-000000000000
[2015-08-03 23:08:03.289858] E [glusterd-syncop.c:111:gd_collate_errors] 0-: Locking failed on d73de03a-0f66-4619-89ef-b73c9bbd800e. Please check log file for details.
[2015-08-03 23:08:03.290509] E [glusterd-syncop.c:562:_gd_syncop_mgmt_lock_cbk] 0-management: Could not find peer with ID d827a48e-627f-0000-0a00-000000000000
[2015-08-03 23:08:03.290529] E [glusterd-syncop.c:111:gd_collate_errors] 0-: Locking failed on 4a780f57-37e4-4f1b-9c34-187a0c7e44bf. Please check log file for details.
[2015-08-03 23:08:03.290597] E [glusterd-syncop.c:1804:gd_sync_task_begin] 0-management: Locking Peers Failed.
[2015-08-03 23:07:03.351603] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd already stopped
[2015-08-03 23:07:03.351644] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub already stopped
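
To see which peers are rejecting the lock, it can help to pull the UUIDs out of the messages and compare them with the peers each node knows about. The following is a minimal sketch, not part of the original post: the function name `failed_lock_peers` is hypothetical, and it assumes the message format shown above.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post).
# Reads heal output or glusterd log lines on stdin and prints each distinct
# peer UUID that appears in a "Locking failed on <UUID>." message.
failed_lock_peers() {
    sed -n 's/.*Locking failed on \([0-9a-f-]\{36\}\)\..*/\1/p' | sort -u
}

# Example usage on a cluster node (commented out so that sourcing this
# file has no side effects):
#   gluster volume heal nova 2>&1 | failed_lock_peers
#   gluster pool list      # UUIDs and hostnames known to this node
```

A UUID that shows up in the failures but not in `gluster pool list` on some node would point at a peer-membership problem rather than a problem with the brick itself.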

These other logs also received messages when I attempted the above:

$ ls -ltr
-rw-------   1 root root      41704 Aug  2 12:07 glfsheal-nova.log
-rw-------   1 root root      15986 Aug  2 12:07 cmd_history.log-20150802
-rw-------   1 root root     290359 Aug  3 19:07 var-lib-nova-instances.log
-rw-------   1 root root     221829 Aug  3 19:07 glustershd.log
-rw-------   1 root root     195472 Aug  3 19:07 nfs.log
-rw-------   1 root root   61831116 Aug  3 19:07 var-lib-nova-mnt-92ef2ec54fd18595ed18d8e6027a1b3d.log
-rw-------   1 root root       3504 Aug  3 19:08 cmd_history.log
-rw-------   1 root root      89294 Aug  3 19:08 cli.log
-rw-------   1 root root     136421 Aug  3 19:08 etc-glusterfs-glusterd.vol.log

Looking through them, it's not clear whether any of them relate to this particular issue.
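
One way to narrow down which of these logs is worth reading is to count the error-level entries in each. This is a rough sketch under the assumption that every log uses the `[timestamp] E [source] ...` layout seen in the glusterd log above; `error_counts` is a hypothetical name, and `/var/log/glusterfs` in the usage line is assumed to be the log directory.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post).
# For each log file given, print "<count> <file>", where <count> is the
# number of lines whose severity field (right after the timestamp) is "E".
error_counts() {
    for f in "$@"; do
        n=$(grep -c '^\[[^]]*] E \[' "$f")
        printf '%s %s\n' "$n" "$f"
    done
}

# Example usage (commented out; the directory is an assumption):
#   error_counts /var/log/glusterfs/*.log | sort -rn
```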

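With the above setup, I initially thought I could only run the heal command from the master node of the GlusterFS cluster, but it turned out that my real problem was that the 11 nodes in the cluster were running 2 different versions of GlusterFS.

Once I realized this, I updated all of the nodes to the latest version of GlusterFS (3.7.3) and was able to run the heal from any node, as one would expect.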
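
A version mismatch like this can be checked for before attempting a heal. The sketch below is an illustration only and rests on several assumptions that are not in the original post: passwordless SSH to every node, a host list supplied by the caller, and a first line of `glusterfs --version` that looks like `glusterfs <version> ...`. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers (not from the original post).

# Print "<host> <first line of 'glusterfs --version'>" for each host given.
# Hosts that cannot be reached are reported with the version "unknown".
collect_versions() {
    for host in "$@"; do
        # -n keeps ssh from reading stdin; BatchMode fails instead of prompting.
        ver=$(ssh -n -o BatchMode=yes "$host" 'glusterfs --version | head -n 1' 2>/dev/null)
        printf '%s %s\n' "$host" "${ver:-glusterfs unknown}"
    done
}

# Read "<host> glusterfs <version> ..." lines on stdin.
# Exit status 0 if at most one distinct version is seen, 1 otherwise.
versions_match() {
    awk '{ seen[$3] = 1 } END { n = 0; for (v in seen) n++; exit (n > 1) }'
}

# Example usage (commented out; node1..node3 are placeholder hostnames):
#   collect_versions node1 node2 node3 | tee /dev/stderr | versions_match \
#       || echo "mixed GlusterFS versions detected"
```

Only the version field is compared, since the build date on the same line may legitimately differ between nodes running the same release.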

Source: https://serverfault.com/questions/710611