ZFS

Replacing a failed disk

  • December 16, 2021

One of the disks in my pool faulted (it threw too many errors).

The number of I/O errors associated with a ZFS device exceeded
acceptable levels. ZFS has marked the device as faulted.

impact: Fault tolerance of the pool may be compromised.
   eid: 52
 class: statechange
 state: FAULTED
 host: databank-a
 time: 2021-12-11 16:36:33-0500
 vpath: /dev/disk02_old
 vphys: pci-0000:00:1f.2-ata-4
 vguid: 0x73F7B0B1D1B45864
 devid: /dev/disk02_old
 pool: 0x47B3E7C1336F1F4F

So I replaced it with a brand-new disk:

  zpool replace pool /dev/foo /dev/bar
  zpool clear pool /dev/bar

 pool: DATA01
state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
       continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Wed Dec 15 11:23:57 2021
       6.83T scanned at 256M/s, 5.80T issued at 217M/s, 9.08T total
       232G resilvered, 63.85% done, 0 days 04:24:05 to go
config:

       NAME                        STATE     READ WRITE CKSUM
       DATA01                      DEGRADED     0     0     0
       raidz1-0                    DEGRADED     0     0     0
           /dev/disk01             ONLINE       0     0     0
           replacing-1             UNAVAIL      0     0     0  insufficient replicas
           8356341911383201892     UNAVAIL      0     0     0  was /dev/disk02_old
           /dev/disk02_new         FAULTED      0    81     0  too many errors  (resilvering)
           /dev/disk03             ONLINE       0     0     0
           /dev/disk04             ONLINE       0     0     0


errors: No known data errors
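To pick out the rows that matter in a status listing like the one above, a small filter can help. This is only a sketch: `zpool_problem_rows` is a hypothetical helper name, and it assumes the usual `NAME STATE READ WRITE CKSUM` column layout shown here. It reads `zpool status` text on stdin and prints every row that is not ONLINE or has a non-zero counter.

```shell
#!/bin/sh
# Hypothetical helper (not part of ZFS): reads `zpool status` text on stdin
# and prints each config row whose state is not ONLINE or whose
# READ/WRITE/CKSUM counter is non-zero.
zpool_problem_rows() {
    awk '
        # The device table starts right after the NAME/STATE header row.
        $1 == "NAME" && $2 == "STATE" { in_table = 1; next }
        # The "errors:" summary line ends the table.
        $1 == "errors:" { in_table = 0 }
        # Device rows have at least five columns: name, state, three counters.
        in_table && NF >= 5 {
            if ($2 != "ONLINE" || $3 != 0 || $4 != 0 || $5 != 0)
                printf "%s state=%s read=%s write=%s cksum=%s\n", $1, $2, $3, $4, $5
        }
    '
}

# Typical use on a live system (pool name taken from the output above):
#   zpool status DATA01 | zpool_problem_rows
```

Fed the listing above, the row for /dev/disk02_new is reported with write=81, alongside the DEGRADED and UNAVAIL parent rows.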

How likely is it that the drive is not faulty?

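The drive is probably faulty. If the error counters are accurate, dozens of errors within the first few TB of use is worse than expected. And you have already cleared the errors once, so this is not a one-off transient event.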

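The Backblaze failure data for consumer drives does not cover exactly the drive you have, but it does show that early failures still happen. Even if the early failure rate is low, you may be the unlucky one in a few thousand who received a less-than-perfect unit.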

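Start restore tests of your backups of important data, from different media, in case you need them in the worst case. Make sure you have more spare disks in stock. Once the resilver completes, check the disks again. Keep replacing them as needed.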
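As a rough sketch of what "check the disks again" might look like in practice, the helper below only prints a checklist; it runs nothing. The function name `print_recheck_plan` is hypothetical, and the `smartctl` lines assume smartmontools is installed, which the question does not mention. Review each command before running it as root.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) a follow-up checklist for a
# pool and a suspect disk. Both arguments are placeholders you supply.
print_recheck_plan() {
    pool=$1
    disk=$2
    cat <<EOF
# 1. Confirm the resilver finished and look at the per-device counters.
zpool status -v $pool
# 2. Read the drive's SMART attributes and error log (assumes smartmontools).
smartctl -a $disk
# 3. Start an extended SMART self-test on the drive.
smartctl -t long $disk
# 4. Scrub the pool so every block is read and verified.
zpool scrub $pool
EOF
}

# Example: print_recheck_plan DATA01 /dev/disk02_new
```

Printing the plan rather than executing it is deliberate: a scrub or a long self-test adds load to a pool that may still be degraded, so the timing is best left to the operator.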

Source: https://serverfault.com/questions/1086451