Raid

How can I recover a faulted zpool when the device is fine but was temporarily offline?

  • December 15, 2015

I have a zpool made of four 2TB USB disks in a raidz configuration:

[root@chef /mnt/Chef]# zpool status farcryz1
 pool: farcryz1
state: ONLINE
scrub: none requested
config:

   NAME        STATE     READ WRITE CKSUM
   farcryz1    ONLINE       0     0     0
     raidz1    ONLINE       0     0     0
       da1     ONLINE       0     0     0
       da2     ONLINE       0     0     0
       da3     ONLINE       0     0     0
       da4     ONLINE       0     0     0

To test the pool, I simulated a drive failure by pulling the USB cable out of one of the drives without taking it offline first:

[root@chef /mnt/Chef]# zpool status farcryz1
 pool: farcryz1
state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
   attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
   using 'zpool clear' or replace the device with 'zpool replace'.
  see: http://www.sun.com/msg/ZFS-8000-9P
scrub: none requested
config:

   NAME        STATE     READ WRITE CKSUM
   farcryz1    ONLINE       0     0     0
     raidz1    ONLINE       0     0     0
       da4     ONLINE      22     4     0
       da3     ONLINE       0     0     0
       da1     ONLINE       0     0     0
       da2     ONLINE       0     0     0

errors: No known data errors

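The data is still there and the pool is still online. Great! Now let's try to recover the pool. I plugged the drive back in and issued the 'zpool replace' command as the status message instructs: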

[root@chef /mnt/Chef]# zpool replace farcryz1 da4
invalid vdev specification
use '-f' to override the following errors:
/dev/da4 is part of active pool 'farcryz1'

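Hmm... that didn't help. So I tried 'zpool clear farcryz1', but that didn't help at all either; I still couldn't replace da4. Then I tried various combinations of onlining, offlining, clearing, replacing and scrubbing. Now I'm stuck here: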

[root@chef /mnt/Chef]# zpool status -v farcryz1
 pool: farcryz1
state: DEGRADED
status: One or more devices could not be used because the label is missing or
   invalid.  Sufficient replicas exist for the pool to continue
   functioning in a degraded state.
action: Replace the device using 'zpool replace'.
  see: http://www.sun.com/msg/ZFS-8000-4J
scrub: scrub completed after 0h2m with 0 errors on Fri Sep  9 13:43:34 2011
config:

   NAME        STATE     READ WRITE CKSUM
   farcryz1    DEGRADED     0     0     0
     raidz1    DEGRADED     0     0     0
       da4     UNAVAIL      9     0     0  experienced I/O failures
       da3     ONLINE       0     0     0
       da1     ONLINE       0     0     0
       da2     ONLINE       0     0     0

errors: No known data errors
[root@chef /mnt/Chef]# zpool replace farcryz1 da4
cannot replace da4 with da4: da4 is busy

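How do I recover from this situation, where one device in my zpool was unexpectedly disconnected (but is not actually faulty) and has now come back, ready to be resilvered?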


**Edit:** As requested, the tail of dmesg:

(ses3:umass-sim4:4:0:1): removing device entry
(da4:umass-sim4:4:0:0): removing device entry
ugen3.2: <Western Digital> at usbus3
umass4: <Western Digital My Book 1140, class 0/0, rev 3.00/10.03, addr 1> on usbus3
da4 at umass-sim4 bus 4 scbus6 target 0 lun 0
da4: <WD My Book 1140 1003> Fixed Direct Access SCSI-6 device 
da4: 400.000MB/s transfers
da4: 1907697MB (3906963456 512 byte sectors: 255H 63S/T 243197C)
ses3 at umass-sim4 bus 4 scbus6 target 0 lun 1
ses3: <WD SES Device 1003> Fixed Enclosure Services SCSI-6 device 
ses3: 400.000MB/s transfers
ses3: SCSI-3 SES Device
GEOM: da4: partition 1 does not start on a track boundary.
GEOM: da4: partition 1 does not end on a track boundary.
GEOM: da4: partition 1 does not start on a track boundary.
GEOM: da4: partition 1 does not end on a track boundary.
ugen3.2: <Western Digital> at usbus3 (disconnected)
umass4: at uhub3, port 1, addr 1 (disconnected)
(da4:umass-sim4:4:0:0): lost device
(da4:umass-sim4:4:0:0): removing device entry
(ses3:umass-sim4:4:0:1): lost device
(ses3:umass-sim4:4:0:1): removing device entry
ugen3.2: <Western Digital> at usbus3
umass4: <Western Digital My Book 1140, class 0/0, rev 3.00/10.03, addr 1> on usbus3
da4 at umass-sim4 bus 4 scbus6 target 0 lun 0
da4: <WD My Book 1140 1003> Fixed Direct Access SCSI-6 device 
da4: 400.000MB/s transfers
da4: 1907697MB (3906963456 512 byte sectors: 255H 63S/T 243197C)
ses3 at umass-sim4 bus 4 scbus6 target 0 lun 1
ses3: <WD SES Device 1003> Fixed Enclosure Services SCSI-6 device 
ses3: 400.000MB/s transfers
ses3: SCSI-3 SES Device

Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'.

It looks like, after the initial temporary failure, all you probably needed was 'zpool clear' to clear the errors.
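Before choosing between 'zpool clear' and 'zpool replace', it helps to see which rows actually logged errors. The shell function below is a minimal sketch, not part of ZFS: the name rows_with_errors is hypothetical, and it assumes the NAME / STATE / READ / WRITE / CKSUM table layout shown in the question. It reads 'zpool status' output on stdin and prints every pool, vdev or disk row with a non-zero counter.

```shell
#!/bin/sh
# Hypothetical helper: print rows of `zpool status` output (read on stdin)
# whose READ, WRITE or CKSUM counter is non-zero.
# Assumes the NAME STATE READ WRITE CKSUM table layout shown in the question.
rows_with_errors() {
    awk '
        # The config table starts right after its header line...
        $1 == "NAME" && $2 == "STATE" { in_table = 1; next }
        # ...and ends at the first blank line.
        in_table && NF == 0 { in_table = 0; next }
        # Columns 3-5 are READ, WRITE and CKSUM; print name, state and counters.
        in_table && ($3 + $4 + $5) > 0 { print $1, $2, $3, $4, $5 }
    '
}
```

For example, piping the second status output from the question through it ('zpool status farcryz1 | rows_with_errors') prints only da4 ONLINE 22 4 0, the disk that was unplugged.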

If you want to treat it as a drive replacement, you may need to wipe the data on the drive first, before trying to add it back to the pool.
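One way to do that is sketched below, under several assumptions. ZFS keeps four copies of its vdev label, two in the first 512 KiB of the device and two in the last 512 KiB, so the sketch zeroes 1 MiB at each end of the disk and then asks ZFS to replace da4 with itself ('zpool replace' with no new device reuses the old device name). The function name reuse_as_new_disk and the RUN dry-run variable are hypothetical; the sector count is taken from the dmesg output in the question (3906963456 512-byte sectors); and it assumes FreeBSD lets dd write to /dev/da4 once the disk is offline in the pool. This destroys everything on that disk, so double-check the device name.

```shell
#!/bin/sh
# Hypothetical helper: wipe the ZFS labels on a disk and resilver onto it
# as if it were a brand-new replacement. DESTROYS ALL DATA ON THE DISK.
# Set RUN=echo to print the commands instead of running them.
reuse_as_new_disk() {
    pool=$1      # e.g. farcryz1
    disk=$2      # e.g. da4
    sectors=$3   # total number of 512-byte sectors on the disk
    run=${RUN:-}

    # Take the disk out of service so ZFS stops using it (assumed to be
    # enough for FreeBSD to allow writes to the raw device afterwards).
    $run zpool offline "$pool" "$disk"
    # Zero the first 1 MiB, which covers the two front labels.
    $run dd if=/dev/zero of="/dev/$disk" bs=512 count=2048
    # Zero the last 1 MiB, which covers the two labels at the end.
    $run dd if=/dev/zero of="/dev/$disk" bs=512 seek=$((sectors - 2048)) count=2048
    # Rebuild onto the now-blank disk under the same device name.
    $run zpool replace "$pool" "$disk"
}
```

A dry run with 'RUN=echo reuse_as_new_disk farcryz1 da4 3906963456' only prints the four commands, including the seek to sector 3906961408 for the end of the disk, which is a cheap way to sanity-check the arithmetic before running it for real.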

What was the output of the various commands you tried? Did you try '-f' with any of them?

Did you run 'zpool clear poolname device-name'?

In your case that is 'zpool clear farcryz1 da4', which should have started the resilver process.
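That suggestion can be sketched as a small shell function. It is only an illustration: reattach_disk and the RUN variable are hypothetical (setting RUN=echo prints the commands instead of running them), while 'zpool clear' and 'zpool status -v' are the same commands already used in this thread.

```shell
#!/bin/sh
# Hypothetical helper: clear the errors on a disk that came back after a
# brief disconnect, then show the pool so the resilver can be checked.
# Set RUN=echo to print the commands instead of running them.
reattach_disk() {
    pool=$1   # e.g. farcryz1
    disk=$2   # e.g. da4
    run=${RUN:-}

    # Reset the device's error state; as suggested above, this should let
    # ZFS reopen the disk and start resilvering it.
    $run zpool clear "$pool" "$disk"
    # Show the result: the disk should be ONLINE and a resilver reported.
    $run zpool status -v "$pool"
}
```

On the affected host, 'reattach_disk farcryz1 da4' would run exactly 'zpool clear farcryz1 da4' followed by 'zpool status -v farcryz1'; if the clear worked, da4 should show ONLINE again and the status should mention a resilver.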

Source: https://serverfault.com/questions/309859