Zfs

ZFS pool degraded on reboot

  • October 11, 2018

I have an Ubuntu server set up with a 14-disk ZFS raidz2 pool.

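About 80% of the time, after a reboot I end up with a degraded pool in which two disks are marked as faulted. The faulted drives are not always the same ones, but it is always exactly two drives. For example: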

$ sudo zpool status
 pool: tank
state: DEGRADED
status: One or more devices could not be used because the label is missing or
       invalid.  Sufficient replicas exist for the pool to continue
       functioning in a degraded state.
action: Replace the device using 'zpool replace'.
  see: http://zfsonlinux.org/msg/ZFS-8000-4J
 scan: resilvered 4K in 0h0m with 0 errors on Sun Sep 30 23:08:51 2018
config:

       NAME                      STATE     READ WRITE CKSUM
       tank                      DEGRADED     0     0     0
         raidz2-0                DEGRADED     0     0     0
           sde                   ONLINE       0     0     0
           sdc                   ONLINE       0     0     0
           sdd                   ONLINE       0     0     0
           sda                   ONLINE       0     0     0
           sdh                   ONLINE       0     0     0
           11521322863231878081  FAULTED      0     0     0  was /dev/sdf1
           15273938560620494453  FAULTED      0     0     0  was /dev/sdg1
           sdb                   ONLINE       0     0     0
           sdi                   ONLINE       0     0     0
           sdj                   ONLINE       0     0     0
           sdk                   ONLINE       0     0     0
           sdl                   ONLINE       0     0     0
           sdm                   ONLINE       0     0     0
           sdn                   ONLINE       0     0     0

errors: No known data errors

I can export and re-import the pool, and the disks no longer show as faulted. For example:

$ sudo zpool export tank
$ sudo zpool import tank
$ sudo zpool status
 pool: tank
state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
       attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
       using 'zpool clear' or replace the device with 'zpool replace'.
  see: http://zfsonlinux.org/msg/ZFS-8000-9P
 scan: resilvered 4K in 0h0m with 0 errors on Sun Sep 30 23:08:51 2018
config:

       NAME        STATE     READ WRITE CKSUM
       tank        ONLINE       0     0     0
         raidz2-0  ONLINE       0     0     0
           sde     ONLINE       0     0     0
           sdc     ONLINE       0     0     0
           sdd     ONLINE       0     0     0
           sda     ONLINE       0     0     0
           sdh     ONLINE       0     0     0
           sdg     ONLINE       0     0     1
           sdf     ONLINE       0     0     0
           sdb     ONLINE       0     0     0
           sdi     ONLINE       0     0     0
           sdj     ONLINE       0     0     0
           sdk     ONLINE       0     0     0
           sdl     ONLINE       0     0     0
           sdm     ONLINE       0     0     0
           sdn     ONLINE       0     0     0

errors: No known data errors

The HBA in use has worked fine in another server.

Is there anything else I can try to avoid these faulted drives on reboot? I have another HBA I could swap in.

You should not use /dev/sdX names in the pool configuration.

Any change in SCSI enumeration (for example, plugging in a CD-ROM or USB drive) can change the device names, which results in the errors you are seeing.
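To check whether a pool is still recorded under kernel names, one option is to filter the zpool status output for vdevs named sdX. The helper below is only a sketch: list_sdx_vdevs is a hypothetical name, and it assumes the status layout shown in the question (device name in the first column, state in the second).

```shell
# Print vdev names from `zpool status` output that are bare kernel names (sdX).
# Reads the status text on stdin. list_sdx_vdevs is a hypothetical helper name.
list_sdx_vdevs() {
    # Keep only rows whose first field looks like sda, sdb1, sdaa, ... and
    # whose second field is a vdev state, so prose lines are not matched.
    awk '$1 ~ /^sd[a-z]+[0-9]*$/ &&
         $2 ~ /^(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)$/ { print $1 }'
}

# Usage on a live system (requires ZFS to be installed):
#   zpool status tank | list_sdx_vdevs
```

Any name this prints is one that can shift when the kernel enumerates the disks in a different order. Running ls -l /dev/disk/by-id/ shows which persistent name currently points at each sdX device.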

You can use /dev/disk/by-id names instead.

Do this by running zpool export tank followed by zpool import -d /dev/disk/by-id tank.
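The two commands can be wrapped in a small function so the import only runs if the export succeeded. This is a sketch under assumptions: reimport_by_id is a hypothetical name, the pool is assumed not to be in use at the time, and the function must be run as root.

```shell
# Re-import a pool so its vdevs are recorded under /dev/disk/by-id names.
# reimport_by_id is a hypothetical wrapper; run it as root.
reimport_by_id() {
    pool=$1
    [ -n "$pool" ] || { echo "usage: reimport_by_id POOL" >&2; return 2; }
    # Export first; this fails if the pool is busy (open files, active mounts).
    zpool export "$pool" || return 1
    # -d tells zpool import which directory to search for devices.
    zpool import -d /dev/disk/by-id "$pool" || return 1
    # Show the result so the new vdev names can be inspected.
    zpool status "$pool"
}

# Usage:
#   reimport_by_id tank
```

After a successful run, zpool status should list the vdevs by their by-id names rather than sdX.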

Source: https://serverfault.com/questions/935123