How do I recover from a faulted zpool with a device that is OK but went temporarily offline?

December 15, 2015

I have a zpool with four 2TB USB disks in a raidz configuration:

[root@chef /mnt/Chef]# zpool status farcryz1
 pool: farcryz1
state: ONLINE
scrub: none requested
config:

   NAME        STATE     READ WRITE CKSUM
   farcryz1    ONLINE       0     0     0
     raidz1    ONLINE       0     0     0
       da1     ONLINE       0     0     0
       da2     ONLINE       0     0     0
       da3     ONLINE       0     0     0
       da4     ONLINE       0     0     0

To test the pool, I simulated a drive failure by pulling the USB cable from one of the drives without taking it offline first:

[root@chef /mnt/Chef]# zpool status farcryz1
 pool: farcryz1
state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
   attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
   using 'zpool clear' or replace the device with 'zpool replace'.
  see: http://www.sun.com/msg/ZFS-8000-9P
scrub: none requested
config:

   NAME        STATE     READ WRITE CKSUM
   farcryz1    ONLINE       0     0     0
     raidz1    ONLINE       0     0     0
       da4     ONLINE      22     4     0
       da3     ONLINE       0     0     0
       da1     ONLINE       0     0     0
       da2     ONLINE       0     0     0

errors: No known data errors

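The data is still there and the pool is still online. Great! Now let's try to restore the pool. I plugged the drive back in and issued the zpool replace command, as instructed above: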

[root@chef /mnt/Chef]# zpool replace farcryz1 da4
invalid vdev specification
use '-f' to override the following errors:
/dev/da4 is part of active pool 'farcryz1'

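Hmm... that wasn't helpful. So I tried a zpool clear farcryz1, but that didn't help at all: I still couldn't replace da4. So I tried various combinations of onlining, offlining, clearing, replacing, and scrubbing. Now I'm stuck here: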

[root@chef /mnt/Chef]# zpool status -v farcryz1
 pool: farcryz1
state: DEGRADED
status: One or more devices could not be used because the label is missing or
   invalid.  Sufficient replicas exist for the pool to continue
   functioning in a degraded state.
action: Replace the device using 'zpool replace'.
  see: http://www.sun.com/msg/ZFS-8000-4J
scrub: scrub completed after 0h2m with 0 errors on Fri Sep  9 13:43:34 2011
config:

   NAME        STATE     READ WRITE CKSUM
   farcryz1    DEGRADED     0     0     0
     raidz1    DEGRADED     0     0     0
       da4     UNAVAIL      9     0     0  experienced I/O failures
       da3     ONLINE       0     0     0
       da1     ONLINE       0     0     0
       da2     ONLINE       0     0     0

errors: No known data errors
[root@chef /mnt/Chef]# zpool replace farcryz1 da4
cannot replace da4 with da4: da4 is busy

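How can I recover from this situation, where one of the devices in my zpool was accidentally disconnected (but is not a failed device) and is now back, ready to be resilvered?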

**Edit:** As requested, a tail of dmesg:

(ses3:umass-sim4:4:0:1): removing device entry
(da4:umass-sim4:4:0:0): removing device entry
ugen3.2: <Western Digital> at usbus3
umass4: <Western Digital My Book 1140, class 0/0, rev 3.00/10.03, addr 1> on usbus3
da4 at umass-sim4 bus 4 scbus6 target 0 lun 0
da4: <WD My Book 1140 1003> Fixed Direct Access SCSI-6 device
da4: 400.000MB/s transfers
da4: 1907697MB (3906963456 512 byte sectors: 255H 63S/T 243197C)
ses3 at umass-sim4 bus 4 scbus6 target 0 lun 1
ses3: <WD SES Device 1003> Fixed Enclosure Services SCSI-6 device
ses3: 400.000MB/s transfers
ses3: SCSI-3 SES Device
GEOM: da4: partition 1 does not start on a track boundary.
GEOM: da4: partition 1 does not end on a track boundary.
GEOM: da4: partition 1 does not start on a track boundary.
GEOM: da4: partition 1 does not end on a track boundary.
ugen3.2: <Western Digital> at usbus3 (disconnected)
umass4: at uhub3, port 1, addr 1 (disconnected)
(da4:umass-sim4:4:0:0): lost device
(da4:umass-sim4:4:0:0): removing device entry
(ses3:umass-sim4:4:0:1): lost device
(ses3:umass-sim4:4:0:1): removing device entry
ugen3.2: <Western Digital> at usbus3
umass4: <Western Digital My Book 1140, class 0/0, rev 3.00/10.03, addr 1> on usbus3
da4 at umass-sim4 bus 4 scbus6 target 0 lun 0
da4: <WD My Book 1140 1003> Fixed Direct Access SCSI-6 device
da4: 400.000MB/s transfers
da4: 1907697MB (3906963456 512 byte sectors: 255H 63S/T 243197C)
ses3 at umass-sim4 bus 4 scbus6 target 0 lun 1
ses3: <WD SES Device 1003> Fixed Enclosure Services SCSI-6 device
ses3: 400.000MB/s transfers
ses3: SCSI-3 SES Device

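"Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'."
It looks like after the initial temporary failure, you probably just needed to run zpool clear to clear the errors.
If you want to pretend it's a drive replacement, you will probably need to wipe the data from the drive first, before trying to add it back to the pool.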
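A minimal sketch of the "just clear it" path, assuming the disk came back under the same device name (da4) and is healthy. The helper names run and recover_device and the DRY_RUN variable are hypothetical, introduced only for illustration. The zpool online step is an assumption added here for the case where the device is still marked UNAVAIL; the answer itself only mentions zpool clear. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: bring a reattached device back into its pool.
# DRY_RUN defaults to 1, so commands are printed rather than executed.
# Set DRY_RUN=0 to run them for real (requires root and the zpool utility).

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

recover_device() {
    pool=$1
    dev=$2
    run zpool online "$pool" "$dev"   # assumption: ask ZFS to reopen the device
    run zpool clear "$pool" "$dev"    # reset the device's error counters
    run zpool status -v "$pool"       # inspect state and any resilver progress
}

# Pool and device names from the question; override with POOL= and DEV=.
recover_device "${POOL:-farcryz1}" "${DEV:-da4}"
```

Running it as-is prints the three commands for farcryz1 and da4. Reviewing them before setting DRY_RUN=0 seems prudent, since the right sequence may differ between ZFS versions.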

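What was the output of the various commands you tried? Did you try the -f switch on any of them?
Did you run zpool clear poolname device-name?
In your case, that would be zpool clear farcryz1 da4, which should have started the resilver process.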
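To check whether a clear actually took effect, one option is to filter the zpool status config table for rows that are not ONLINE or still carry nonzero READ/WRITE/CKSUM counters. The sketch below is an illustration, not part of the original answers: the function name list_problem_rows is hypothetical, and it assumes the table layout shown in the question (a NAME/STATE header row, followed by rows, ended by a blank line). The demo feeds it a saved copy of the question's output so it runs without a pool.

```shell
#!/bin/sh
# Sketch only: read `zpool status` text on stdin and print every config-table
# row whose state is not ONLINE or whose error counters are nonzero.
list_problem_rows() {
    awk '
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }  # table header
        in_config && NF == 0          { in_config = 0 }        # blank line ends the table
        in_config && NF >= 5 {
            if ($2 != "ONLINE" || $3 + $4 + $5 > 0)
                print $1, $2, $3, $4, $5
        }
    '
}

# Demo using the status output from the question (after the cable was pulled).
list_problem_rows <<'EOF'
   NAME        STATE     READ WRITE CKSUM
   farcryz1    ONLINE       0     0     0
     raidz1    ONLINE       0     0     0
       da4     ONLINE      22     4     0
       da3     ONLINE       0     0     0
       da1     ONLINE       0     0     0
       da2     ONLINE       0     0     0

errors: No known data errors
EOF
# prints: da4 ONLINE 22 4 0
```

On a live system the same function could be fed directly, for example zpool status farcryz1 | list_problem_rows. Note that in a degraded pool the pool and raidz1 rows are reported too, because their state is not ONLINE.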

Source: https://serverfault.com/questions/309859
