Linux

Recovering an mdadm 4-disk RAID5 array with 2 out-of-date disks

  • May 6, 2020

Edit:

The scenario in this wiki, where one drive's event count is slightly below the rest of the array and another drive's is significantly below, suggests leaving out the oldest drive when assembling with --force, then adding it back (or a new drive, if the disk is actually bad) once the array has been assembled in a degraded state.

Does it make sense to do that in my case, or would it be preferable to attempt the --force assembly with all 4 drives, given that the two out-of-date drives have identical event counts?


Given my limited RAID knowledge, I figured I would ask about my specific situation before trying anything. Losing the data on these 4 drives would not be the end of the world for me, but it would still be nice to get it back.

I originally migrated the RAID5 array from an old machine to a new one without any issues. I used it for about 2 days until I noticed that 2 of the drives were not listed in the BIOS boot screen. Since the array still assembled and worked fine once inside Linux, I did not think much of it.

The next day the array stopped working, so I connected a PCI-e SATA card and replaced all of my SATA cables. After that, all 4 drives showed up in the BIOS boot screen, so I assumed my cables or SATA ports had caused the original problem.

Now I am left with a broken array. mdadm --assemble lists two of the drives as (possibly out of date), and mdadm --examine shows 22717 events on the out-of-date drives and 23199 on the other two. This wiki entry suggests that an event-count difference of <50 can be overcome by assembling with --force, but my 4 drives are 482 events apart.
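For concreteness, the numbers and the command in question can be laid out as a small shell sketch (the mdadm invocation is only echoed, not run; device names are the ones from the output below):

```shell
#!/bin/sh
# Event counts taken from `mdadm --examine` below.
stale=22717    # /dev/sdb and /dev/sdc, flagged "possibly out of date"
current=23199  # /dev/sdd and /dev/sde
echo "event gap: $((current - stale))"

# The all-4-members force-assemble under discussion (echoed for safety):
echo "mdadm --assemble --force /dev/md0 /dev/sdb /dev/sdc /dev/sdd /dev/sde"
```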

Below is all the relevant RAID information. Before the array failed, I already knew that the primary GPT table on all 4 drives was corrupt, but since everything was working at the time, I had not gotten around to fixing it.

mdadm --assemble --scan --verbose

mdadm: /dev/sde is identified as a member of /dev/md/guyyst-server:0, slot 2.
mdadm: /dev/sdd is identified as a member of /dev/md/guyyst-server:0, slot 3.
mdadm: /dev/sdc is identified as a member of /dev/md/guyyst-server:0, slot 1.
mdadm: /dev/sdb is identified as a member of /dev/md/guyyst-server:0, slot 0.
mdadm: added /dev/sdb to /dev/md/guyyst-server:0 as 0 (possibly out of date)
mdadm: added /dev/sdc to /dev/md/guyyst-server:0 as 1 (possibly out of date)
mdadm: added /dev/sdd to /dev/md/guyyst-server:0 as 3
mdadm: added /dev/sde to /dev/md/guyyst-server:0 as 2
mdadm: /dev/md/guyyst-server:0 assembled from 2 drives - not enough to start the array.

mdadm --examine /dev/sd[bcde]

/dev/sdb:
         Magic : a92b4efc
       Version : 1.2
   Feature Map : 0x1
    Array UUID : 356cd1df:3a5c992d:c9899cbc:4c01e6d9
          Name : guyyst-server:0
 Creation Time : Wed Mar 27 23:49:58 2019
    Raid Level : raid5
  Raid Devices : 4

Avail Dev Size : 7813772976 (3725.90 GiB 4000.65 GB)
    Array Size : 11720658432 (11177.69 GiB 12001.95 GB)
 Used Dev Size : 7813772288 (3725.90 GiB 4000.65 GB)
   Data Offset : 264192 sectors
  Super Offset : 8 sectors
  Unused Space : before=264112 sectors, after=688 sectors
         State : clean
   Device UUID : 7ea39918:2680d2f3:a6c3b0e6:0e815210

Internal Bitmap : 8 sectors from superblock
   Update Time : Fri May  1 03:53:45 2020
 Bad Block Log : 512 entries available at offset 24 sectors
      Checksum : 76a81505 - correct
        Events : 22717

        Layout : left-symmetric
    Chunk Size : 512K

  Device Role : Active device 0
  Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)



/dev/sdc:
         Magic : a92b4efc
       Version : 1.2
   Feature Map : 0x1
    Array UUID : 356cd1df:3a5c992d:c9899cbc:4c01e6d9
          Name : guyyst-server:0
 Creation Time : Wed Mar 27 23:49:58 2019
    Raid Level : raid5
  Raid Devices : 4

Avail Dev Size : 7813772976 (3725.90 GiB 4000.65 GB)
    Array Size : 11720658432 (11177.69 GiB 12001.95 GB)
 Used Dev Size : 7813772288 (3725.90 GiB 4000.65 GB)
   Data Offset : 264192 sectors
  Super Offset : 8 sectors
  Unused Space : before=264112 sectors, after=688 sectors
         State : clean
   Device UUID : 119ed456:cbb187fa:096d15e1:e544db2c

Internal Bitmap : 8 sectors from superblock
   Update Time : Fri May  1 03:53:45 2020
 Bad Block Log : 512 entries available at offset 24 sectors
      Checksum : d285ae78 - correct
        Events : 22717

        Layout : left-symmetric
    Chunk Size : 512K

  Device Role : Active device 1
  Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)



/dev/sdd:
         Magic : a92b4efc
       Version : 1.2
   Feature Map : 0x1
    Array UUID : 356cd1df:3a5c992d:c9899cbc:4c01e6d9
          Name : guyyst-server:0
 Creation Time : Wed Mar 27 23:49:58 2019
    Raid Level : raid5
  Raid Devices : 4

Avail Dev Size : 7813772976 (3725.90 GiB 4000.65 GB)
    Array Size : 11720658432 (11177.69 GiB 12001.95 GB)
 Used Dev Size : 7813772288 (3725.90 GiB 4000.65 GB)
   Data Offset : 264192 sectors
  Super Offset : 8 sectors
  Unused Space : before=264112 sectors, after=688 sectors
         State : clean
   Device UUID : 2670e048:4ebf581d:bf9ea089:0eae56c3

Internal Bitmap : 8 sectors from superblock
   Update Time : Fri May  1 04:12:18 2020
 Bad Block Log : 512 entries available at offset 24 sectors
      Checksum : 26662f2e - correct
        Events : 23199

        Layout : left-symmetric
    Chunk Size : 512K

  Device Role : Active device 3
  Array State : A.AA ('A' == active, '.' == missing, 'R' == replacing)



/dev/sde:
         Magic : a92b4efc
       Version : 1.2
   Feature Map : 0x1
    Array UUID : 356cd1df:3a5c992d:c9899cbc:4c01e6d9
          Name : guyyst-server:0
 Creation Time : Wed Mar 27 23:49:58 2019
    Raid Level : raid5
  Raid Devices : 4

Avail Dev Size : 7813772976 (3725.90 GiB 4000.65 GB)
    Array Size : 11720658432 (11177.69 GiB 12001.95 GB)
 Used Dev Size : 7813772288 (3725.90 GiB 4000.65 GB)
   Data Offset : 264192 sectors
  Super Offset : 8 sectors
  Unused Space : before=264112 sectors, after=688 sectors
         State : clean
   Device UUID : 093856ae:bb19e552:102c9f77:86488154

Internal Bitmap : 8 sectors from superblock
   Update Time : Fri May  1 04:12:18 2020
 Bad Block Log : 512 entries available at offset 24 sectors
      Checksum : 40917946 - correct
        Events : 23199

        Layout : left-symmetric
    Chunk Size : 512K

  Device Role : Active device 2
  Array State : A.AA ('A' == active, '.' == missing, 'R' == replacing)

mdadm --detail /dev/md0

/dev/md0:
          Version : 1.2
       Raid Level : raid0
    Total Devices : 4
      Persistence : Superblock is persistent

            State : inactive
  Working Devices : 4

             Name : guyyst-server:0
             UUID : 356cd1df:3a5c992d:c9899cbc:4c01e6d9
           Events : 23199

   Number   Major   Minor   RaidDevice

      -       8       64        -        /dev/sde
      -       8       32        -        /dev/sdc
      -       8       48        -        /dev/sdd
      -       8       16        -        /dev/sdb

fdisk -l

The primary GPT table is corrupt, but the backup appears OK, so that will be used.
Disk /dev/sdb: 3.7 TiB, 4000787030016 bytes, 7814037168 sectors
Disk model: WDC WD40EFRX-68N
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 79F4A900-C9B7-03A9-402A-7DDE6D72EA00

Device     Start        End    Sectors  Size Type
/dev/sdb1   2048 7814035455 7814033408  3.7T Microsoft basic data


The primary GPT table is corrupt, but the backup appears OK, so that will be used.
Disk /dev/sdc: 3.7 TiB, 4000787030016 bytes, 7814037168 sectors
Disk model: WDC WD40EFRX-68N
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 43B95B20-C9B1-03A9-C856-EE506C72EA00

Device     Start        End    Sectors  Size Type
/dev/sdc1   2048 7814035455 7814033408  3.7T Microsoft basic data


The primary GPT table is corrupt, but the backup appears OK, so that will be used.
Disk /dev/sdd: 3.7 TiB, 4000787030016 bytes, 7814037168 sectors
Disk model: WDC WD40EFRX-68N
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 1E276A80-99EA-03A7-A0DA-89877AE6E900


The primary GPT table is corrupt, but the backup appears OK, so that will be used.
Disk /dev/sde: 3.7 TiB, 4000787030016 bytes, 7814037168 sectors
Disk model: WDC WD40EFRX-68N
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 11BD8020-C9B5-03A9-0860-6F446D72EA00

Device     Start        End    Sectors  Size Type
/dev/sde1   2048 7814035455 7814033408  3.7T Microsoft basic data

smartctl -a -d ata /dev/sd[bcde]

As a pastebin, since it exceeds the character limit: https://pastebin.com/vMVCX9EH

Generally speaking, in this situation you have to expect data loss. Two of the four disks were kicked out of the RAID at roughly the same point in time. After assembling it back together, you will be left with a corrupted filesystem.

If at all possible, I would only experiment after dd-ing all disks as a backup, so you can start over.
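A minimal sketch of that backup step, assuming a target directory /mnt/backup with enough free space (the dd commands are echoed rather than executed here; ddrescue is a gentler alternative for disks with read errors):

```shell
#!/bin/sh
members="sdb sdc sdd sde"  # the four array members from the question
for dev in $members; do
    # conv=noerror,sync keeps going past read errors and pads unreadable
    # sectors with zeros, so the image stays aligned with the source disk
    echo "dd if=/dev/$dev of=/mnt/backup/$dev.img bs=1M conv=noerror,sync status=progress"
done
```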

Using all 4 disks would allow you to identify which blocks differ (since the checksums there will not match), but it will not help you compute the correct state. You could start a checkarray run after force-assembling all 4, then look at /sys/block/mdX/md/mismatch_cnt. Estimating how "broken" the filesystem is may or may not be interesting.
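Assuming the forced assembly with all 4 disks succeeds and the array comes up as /dev/md0, the mismatch check could be started like this (commands echoed for safety; checkarray is the wrapper script shipped with Debian-style mdadm packages, and the sysfs interface beneath it works on any distro):

```shell
#!/bin/sh
md=md0  # assumed device name after a successful forced assembly
# Debian/Ubuntu wrapper script:
echo "checkarray /dev/$md"
# Equivalent raw sysfs interface:
echo "echo check > /sys/block/$md/md/sync_action"
# When the check finishes, mismatch_cnt holds the number of 512-byte
# sectors whose data and parity did not agree:
echo "cat /sys/block/$md/md/mismatch_cnt"
```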

Rebuilding the array can only use the information from three disks to recompute parity. Since the kicked-out disks have identical event counts, using either of them will lead to recomputing the same (partially wrong) information.
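The degraded, three-disk path would then look roughly like this (echoed, not executed; since both stale twins sit at 22717 events, the choice of which one to leave out is arbitrary, so dropping sdb here is an assumption):

```shell
#!/bin/sh
dropped=sdb  # assumption: either stale twin could be the one left out
# Assemble degraded from three members: the two current disks plus one stale twin.
echo "mdadm --assemble --force --run /dev/md0 /dev/sdc /dev/sdd /dev/sde"
# After checking the filesystem, add the fourth disk back and let parity rebuild:
echo "mdadm --manage /dev/md0 --add /dev/$dropped"
echo "cat /proc/mdstat"
```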

Source: https://serverfault.com/questions/1015208