Raid
How do I recover from a failed zpool where a device is fine but was temporarily offline?
I have a zpool with four 2TB USB disks in a raidz configuration:
    [root@chef /mnt/Chef]# zpool status farcryz1
      pool: farcryz1
     state: ONLINE
     scrub: none requested
    config:

            NAME        STATE     READ WRITE CKSUM
            farcryz1    ONLINE       0     0     0
              raidz1    ONLINE       0     0     0
                da1     ONLINE       0     0     0
                da2     ONLINE       0     0     0
                da3     ONLINE       0     0     0
                da4     ONLINE       0     0     0
To test the pool, I simulated a drive failure by pulling the USB cable from one of the drives without taking it offline first:
    [root@chef /mnt/Chef]# zpool status farcryz1
      pool: farcryz1
     state: ONLINE
    status: One or more devices has experienced an unrecoverable error.  An
            attempt was made to correct the error.  Applications are unaffected.
    action: Determine if the device needs to be replaced, and clear the errors
            using 'zpool clear' or replace the device with 'zpool replace'.
       see: http://www.sun.com/msg/ZFS-8000-9P
     scrub: none requested
    config:

            NAME        STATE     READ WRITE CKSUM
            farcryz1    ONLINE       0     0     0
              raidz1    ONLINE       0     0     0
                da4     ONLINE      22     4     0
                da3     ONLINE       0     0     0
                da1     ONLINE       0     0     0
                da2     ONLINE       0     0     0

    errors: No known data errors
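The data is still there and the pool is still online. Great! Now let's try to restore the pool. I plugged the drive back in and issued the `zpool replace` command, as the output above instructs:

    [root@chef /mnt/Chef]# zpool replace farcryz1 da4
    invalid vdev specification
    use '-f' to override the following errors:
    /dev/da4 is part of active pool 'farcryz1'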
Hmm... that didn't help. So I tried a `zpool clear farcryz1`, but that didn't help at all either. I still couldn't replace `da4`. So I tried combinations of `online`ing, `offline`ing, `clear`ing, `replace`ing and `scrub`ing. Now I'm stuck here:

    [root@chef /mnt/Chef]# zpool status -v farcryz1
      pool: farcryz1
     state: DEGRADED
    status: One or more devices could not be used because the label is missing
            or invalid.  Sufficient replicas exist for the pool to continue
            functioning in a degraded state.
    action: Replace the device using 'zpool replace'.
       see: http://www.sun.com/msg/ZFS-8000-4J
     scrub: scrub completed after 0h2m with 0 errors on Fri Sep  9 13:43:34 2011
    config:

            NAME        STATE     READ WRITE CKSUM
            farcryz1    DEGRADED     0     0     0
              raidz1    DEGRADED     0     0     0
                da4     UNAVAIL      9     0     0  experienced I/O failures
                da3     ONLINE       0     0     0
                da1     ONLINE       0     0     0
                da2     ONLINE       0     0     0

    errors: No known data errors
    [root@chef /mnt/Chef]# zpool replace farcryz1 da4
    cannot replace da4 with da4: da4 is busy
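How do I recover from this situation, where one device in my zpool was accidentally disconnected (but has not actually failed) and is now back, ready to be resilvered?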
**Edit:** As requested, a `tail` of `dmesg`:

    (ses3:umass-sim4:4:0:1): removing device entry
    (da4:umass-sim4:4:0:0): removing device entry
    ugen3.2: <Western Digital> at usbus3
    umass4: <Western Digital My Book 1140, class 0/0, rev 3.00/10.03, addr 1> on usbus3
    da4 at umass-sim4 bus 4 scbus6 target 0 lun 0
    da4: <WD My Book 1140 1003> Fixed Direct Access SCSI-6 device
    da4: 400.000MB/s transfers
    da4: 1907697MB (3906963456 512 byte sectors: 255H 63S/T 243197C)
    ses3 at umass-sim4 bus 4 scbus6 target 0 lun 1
    ses3: <WD SES Device 1003> Fixed Enclosure Services SCSI-6 device
    ses3: 400.000MB/s transfers
    ses3: SCSI-3 SES Device
    GEOM: da4: partition 1 does not start on a track boundary.
    GEOM: da4: partition 1 does not end on a track boundary.
    GEOM: da4: partition 1 does not start on a track boundary.
    GEOM: da4: partition 1 does not end on a track boundary.
    ugen3.2: <Western Digital> at usbus3 (disconnected)
    umass4: at uhub3, port 1, addr 1 (disconnected)
    (da4:umass-sim4:4:0:0): lost device
    (da4:umass-sim4:4:0:0): removing device entry
    (ses3:umass-sim4:4:0:1): lost device
    (ses3:umass-sim4:4:0:1): removing device entry
    ugen3.2: <Western Digital> at usbus3
    umass4: <Western Digital My Book 1140, class 0/0, rev 3.00/10.03, addr 1> on usbus3
    da4 at umass-sim4 bus 4 scbus6 target 0 lun 0
    da4: <WD My Book 1140 1003> Fixed Direct Access SCSI-6 device
    da4: 400.000MB/s transfers
    da4: 1907697MB (3906963456 512 byte sectors: 255H 63S/T 243197C)
    ses3 at umass-sim4 bus 4 scbus6 target 0 lun 1
    ses3: <WD SES Device 1003> Fixed Enclosure Services SCSI-6 device
    ses3: 400.000MB/s transfers
    ses3: SCSI-3 SES Device
> Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'.
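It looks like after the initial temporary failure you probably only needed a `zpool clear` to clear the errors. If you want to treat it as a drive replacement instead, you may need to wipe the data on the drive before trying to add it back to the pool.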
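To make the "which device needs clearing" step concrete, here is a minimal sketch that scans `zpool status` text for vdev lines with nonzero READ, WRITE or CKSUM counters. The function name `devices_with_errors` is hypothetical (it is not part of ZFS), and the sample input is an abridged copy of the status output from the question; on a live system you would pipe `zpool status farcryz1` into the function instead.

```shell
#!/bin/sh
# Hypothetical helper: reads `zpool status` text on stdin and prints the name
# of every config line whose READ, WRITE or CKSUM counter is nonzero.
devices_with_errors() {
    awk '
        # The config table starts after the "NAME STATE READ WRITE CKSUM" header.
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }
        # The "errors:" summary line ends the table.
        $1 == "errors:" { in_config = 0 }
        # Inside the table, sum the three counters and report nonzero rows.
        in_config && NF >= 5 && ($3 + $4 + $5) > 0 { print $1 }
    '
}

# Sample input: abridged from the status output shown in the question.
sample_status() {
    cat <<'EOF'
  pool: farcryz1
 state: ONLINE
config:

        NAME        STATE     READ WRITE CKSUM
        farcryz1    ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            da4     ONLINE      22     4     0
            da3     ONLINE       0     0     0
            da1     ONLINE       0     0     0
            da2     ONLINE       0     0     0

errors: No known data errors
EOF
}

sample_status | devices_with_errors
```

With the sample above this reports `da4`, the disk whose cable was pulled.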
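What was the output of the various commands you tried? Did you try the `-f` switch on any of them? Did you run `zpool clear poolname device-name`? In your case that would be `zpool clear farcryz1 da4`, which should have started the resilvering process.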
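The per-device clear suggested here could be wrapped as follows. This is only a sketch under stated assumptions: the `run` and `clear_and_check` helpers and the `DRY_RUN` variable are made up for illustration, and following the clear with `zpool status` to watch for a resilver is my addition rather than something the answer prescribes. By default the script only prints the commands, so it is safe to try without a pool.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 to actually run them (needs root and a real pool).
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical wrapper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: clear the error counters of one device,
# then show the pool status so any resilver in progress is visible.
clear_and_check() {
    pool="$1"
    dev="$2"
    run zpool clear "$pool" "$dev" || return 1
    run zpool status "$pool"
}

clear_and_check farcryz1 da4
```

Keeping the real invocation behind a dry-run switch lets you review the exact commands before touching a degraded pool.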