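ZFS
ZFS pool degraded on reboot
I have an Ubuntu server with a 14-disk ZFS raidz2 pool.
About 80% of the time, after a reboot I end up with a degraded pool in which two disks are marked as faulted. The faulted drives are not always the same ones, but it is always exactly two drives. For example: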
$ sudo zpool status
  pool: tank
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid. Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: resilvered 4K in 0h0m with 0 errors on Sun Sep 30 23:08:51 2018
config:

        NAME                      STATE     READ WRITE CKSUM
        tank                      DEGRADED     0     0     0
          raidz2-0                DEGRADED     0     0     0
            sde                   ONLINE       0     0     0
            sdc                   ONLINE       0     0     0
            sdd                   ONLINE       0     0     0
            sda                   ONLINE       0     0     0
            sdh                   ONLINE       0     0     0
            11521322863231878081  FAULTED      0     0     0  was /dev/sdf1
            15273938560620494453  FAULTED      0     0     0  was /dev/sdg1
            sdb                   ONLINE       0     0     0
            sdi                   ONLINE       0     0     0
            sdj                   ONLINE       0     0     0
            sdk                   ONLINE       0     0     0
            sdl                   ONLINE       0     0     0
            sdm                   ONLINE       0     0     0
            sdn                   ONLINE       0     0     0

errors: No known data errors
I can export and re-import the pool, and the disks no longer show as faulted. For example:
$ sudo zpool export tank
$ sudo zpool import tank
$ sudo zpool status
  pool: tank
 state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: resilvered 4K in 0h0m with 0 errors on Sun Sep 30 23:08:51 2018
config:

        NAME          STATE     READ WRITE CKSUM
        tank          ONLINE       0     0     0
          raidz2-0    ONLINE       0     0     0
            sde       ONLINE       0     0     0
            sdc       ONLINE       0     0     0
            sdd       ONLINE       0     0     0
            sda       ONLINE       0     0     0
            sdh       ONLINE       0     0     0
            sdg       ONLINE       0     0     1
            sdf       ONLINE       0     0     0
            sdb       ONLINE       0     0     0
            sdi       ONLINE       0     0     0
            sdj       ONLINE       0     0     0
            sdk       ONLINE       0     0     0
            sdl       ONLINE       0     0     0
            sdm       ONLINE       0     0     0
            sdn       ONLINE       0     0     0

errors: No known data errors
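The HBA in use previously worked fine in another server.
Is there anything else I can try to avoid these faulted drives on reboot? I have another HBA that I could swap in.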
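You should not use /dev/sdX names in the pool configuration.
Any change in SCSI enumeration (for example, plugging in a CD-ROM or USB drive) can change the device names, which leads to the error you are seeing.
You can switch to using /dev/disk/by-id names instead.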
Do this with zpool export tank followed by zpool import -d /dev/disk/by-id tank.
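Before re-importing, it may help to confirm that each pool member actually has a persistent name. The following is a minimal sketch, not part of the original answer: the helper name stable_names is made up for illustration, and it assumes a readlink that supports -f (as GNU coreutils on Ubuntu does). It prints every symlink in a given directory that resolves to a given device node.

```shell
#!/bin/sh
# Hypothetical helper: list the symlinks in a by-id style directory that
# resolve to the given device node. On a typical Linux system the directory
# would be /dev/disk/by-id; both arguments here are illustrative.
stable_names() {
    dir=$1
    # Resolve the device argument itself, in case it is also a symlink.
    dev=$(readlink -f "$2") || return 1
    for link in "$dir"/*; do
        # Skip anything that is not a symlink (including an unmatched glob).
        [ -L "$link" ] || continue
        if [ "$(readlink -f "$link")" = "$dev" ]; then
            printf '%s\n' "$link"
        fi
    done
}

# Example call (commented out so the snippet has no side effects):
# stable_names /dev/disk/by-id /dev/sdf
```

If a disk prints no by-id link at all, importing with -d /dev/disk/by-id would probably not find it, so that is worth checking before exporting the pool.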