Linux

How to recover data from a software RAID1 after the MBR was lost on both drives

  • May 31, 2018

I am trying to recover a RAID1 array; both disks are NVMe flash drives.

At the end of a long, bad day I did something very stupid: I wiped the first 512 bytes of each NVMe drive, intending to disable the bootloader. It turned out that this erased the partition table along with the RAID information. I did make a backup of those 512 bytes, but guess what: I backed them up onto the same disks, so they are now inaccessible.
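For the record, the bootloader code occupies only the first 440 bytes of the MBR, while the partition table lives at bytes 446-509, so zeroing just the boot code would have left the partitions intact. A sketch of that safer variant, not what was actually run:

    dd if=/dev/zero of=/dev/nvme0n1 bs=440 count=1   # wipe boot code only, keep the partition table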

I copied the disk to another disk with dd and started trying to recover the data. Running testdisk found all the partitions:

Disk /dev/nvme0n1 - 512 GB / 476 GiB - CHS 488386 64 32
Current partition structure:
   Partition                  Start        End    Size in sectors

1 * Linux RAID               1   0  1 32737  63 32   67045376 [rescue:0]
2 P Linux RAID           32769   0  1 33280  63 32    1048576 [rescue:1]
3 P Linux RAID           33281   0  1 488257  63 32  931792896 [rescue:2]
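The disk-to-disk copy mentioned above would have been something like this (the target device is a placeholder):

    # clone the damaged drive to a spare before experimenting further
    dd if=/dev/nvme0n1 of=/dev/sdX bs=1M conv=noerror,sync status=progress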

I wrote this partition data to both disks and rebooted, but only the /boot partition, the first one, came back. I tried to assemble the root partition (the third one) with mdadm, but it failed:

[Sun May 27 11:30:40 2018] md: nvme0n1p3 does not have a valid v1.2 superblock, not importing!
[Sun May 27 11:30:45 2018] md: nvme0n1p3 does not have a valid v1.2 superblock, not importing!
[Sun May 27 13:45:32 2018] md: nvme1n1p1 does not have a valid v1.2 superblock, not importing!
[Sun May 27 13:45:32 2018] md: nvme0n1p1 does not have a valid v1.2 superblock, not importing!
[Sun May 27 13:45:32 2018] md: nvme1n1p3 does not have a valid v1.2 superblock, not importing!
[Sun May 27 13:45:32 2018] md: nvme0n1p3 does not have a valid v1.2 superblock, not importing!
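The assemble attempt was presumably along these lines (the md device name is an assumption):

    mdadm --assemble /dev/md2 /dev/nvme0n1p3 /dev/nvme1n1p3

The kernel messages suggest the recovered partition boundaries were slightly off, so the v1.2 superblock, which lives 4 KiB into each member, was not where md expected to find it.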

My plan was to somehow mount the root partition from one of the disks, grab the sector backups, and then restore everything.

But I could not mount /dev/nvme0n1p3; it failed:

# mount /dev/nvme0n1p3  /mnt/arr2
mount: unknown filesystem type 'linux_raid_member'

# mount /dev/nvme0n1p3  /mnt/arr2 -t ext4
mount: /dev/nvme0n1p3 is already mounted or /mnt/arr2 busy

How can I access the files on /dev/nvme0n1p3?

**Update:** Thanks to the advice from Peter Zhabin, I tried to recover the filesystem on one of the drives, /dev/nvme1n1, whose partitions had been restored with the help of testdisk:

I took the offset from another server with similar (but not identical) disks and partitioning:

losetup --find --show --read-only --offset $((262144*512)) /dev/nvme1n1p3 
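For context, 262144 sectors times 512 bytes is 128 MiB, a common default data offset for mdadm v1.2 members, i.e. the point inside the partition where the filesystem actually starts. On a healthy member of a comparable array the value can be read directly (the device name here is just an example):

    mdadm --examine /dev/sda3 | grep 'Data Offset'
    #     Data Offset : 262144 sectors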

fsck complained about the partition (or the superblock) and printed statistics that looked very close to the actual filesystem on the drive:

fsck.ext3 -n -v /dev/loop1

   e2fsck 1.43.3 (04-Sep-2016)
   Warning: skipping journal recovery because doing a read-only filesystem check.
   The filesystem size (according to the superblock) is 116473936 blocks
   The physical size of the device is 116441344 blocks
   Either the superblock or the partition table is likely to be corrupt!
   Abort? no

   /dev/loop1 contains a file system with errors, check forced.
   Pass 1: Checking inodes, blocks, and sizes
   Inode 26881053 extent tree (at level 2) could be narrower.  Fix? no

   Pass 2: Checking directory structure
   Pass 3: Checking directory connectivity
   Pass 4: Checking reference counts
   Pass 5: Checking group summary information
   Free blocks count wrong (20689291, counted=20689278).
   Fix? no

   Free inodes count wrong (25426857, counted=25426852).
   Fix? no


        3695703 inodes used (12.69%, out of 29122560)
          30256 non-contiguous files (0.8%)
            442 non-contiguous directories (0.0%)
                # of inodes with ind/dind/tind blocks: 0/0/0
                Extent depth histogram: 3616322/1294/3
       95784645 blocks used (82.24%, out of 116473936)
              0 bad blocks
             29 large files

        3510238 regular files
         107220 directories
              2 character device files
              0 block device files
             53 fifos
           1248 links
          78147 symbolic links (77987 fast symbolic links)
             39 sockets
   ------------
        3696947 files

However, I could not mount the filesystem:

root@rescue /mnt/backups # mount -o ro /dev/loop1 /mnt/reco/
mount: wrong fs type, bad option, bad superblock on /dev/loop1,
  missing codepage or helper program, or other error
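The fsck output above already points at the likely cause: the superblock describes 116473936 blocks, but the loop device exposes only (931792896 - 262144) / 8 = 116441344 four-KiB blocks, and ext4 refuses to mount a filesystem that extends past the end of its device. Checking the kernel log shows the real reason, and skipping journal replay is a standard trick for read-only rescue mounts (a sketch; whether it could have worked with the short partition is doubtful):

    dmesg | tail -n 5                                 # the kernel's actual complaint
    mount -t ext4 -o ro,noload /dev/loop1 /mnt/reco   # noload: do not replay the journal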

What can be done next? It feels like the data is so close...

OK, in the end I managed to recover the MBR. As I mentioned above, I had backed up the MBRs of both RAID drives onto the drives themselves. That had been done with dd:

dd if=/dev/nvme0n1 bs=512 count=1 of=nvme0n1.bootsector.backup
dd if=/dev/nvme1n1 bs=512 count=1 of=nvme1n1.bootsector.backup 

I figured it should be possible to search the drive image for those MBR backup files. I had saved the MBR sector of a similar server into the file mbrb.backup, and it contains the following string:

"GRUB\20\0Geom\0Hard\20Disk\0Read\0\20Error"

Since I could not work out how to search a 512 GB image for a string containing null bytes, I grepped for the individual substrings instead, like this on a working MBR:

#dd if=/dev/sdb of=mbrb.backup bs=512 count=1
#strings -t d mbrb.backup | grep -4 -iE 'GRUB' | grep -4 'Geom' | grep -4 'Hard Disk' | grep -4 'Read' | grep -4 'Error'
392 GRUB
398 Geom
403 Hard Disk
413 Read
418  Error
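For what it is worth, GNU grep can match NUL bytes directly when the pattern is Perl-compatible, which would avoid the chained greps (a sketch; the byte layout is inferred from the offsets above):

    # -a: treat binary as text, -b: print byte offset, -o: print only the match, -P: PCRE
    grep -aboP 'GRUB \x00Geom\x00Hard Disk\x00Read\x00 Error' /dev/nvme1n1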

I started looking for these strings on the raw drive:

#strings -t d /dev/nvme1n1 | grep -4 -iE 'GRUB' | grep -4 'Geom' | grep -4 'Hard Disk' | grep -4 'Read' | grep -4 'Error'

It found the strings at more than 20 offsets, which looked like this:

34368320904 GRUB
34368320910 Geom
34368320915 Hard Disk
34368320925 Read
34368320930  Error

34702932360 GRUB
34702932366 Geom
34702932371 Hard Disk
34702932381 Read
34702932386  Error

and some more results....

Then I saved all of them with dd, calculating the sector offsets with bc:

bc 1.06.95
Copyright 1991-1994, 1997, 1998, 2000, 2004, 2006 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'.
34368320904/512
67125626

dd if=/dev/nvme1n1 of=recovery_file.34368320904 bs=512 skip=67125626 count=2
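Note that bc truncates the integer division: 34368320904 mod 512 is 392, which is exactly the in-sector offset of the "GRUB" string found earlier, so the truncated quotient lands on the start of the candidate sector. The same check in plain shell arithmetic:

    echo $(( 34368320904 % 512 ))           # 392, matches the offset of "GRUB" in a valid MBR
    echo $(( (34368320904 - 392) / 512 ))   # 67125626, the value to pass to dd skip=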

I ended up with about 20 files, most of them identical to each other, probably copies of GRUB code. Then I started comparing them with the MBR I had saved from the working server. The last one looked very similar, so I wrote it to the MBR of the damaged disk:

dd if=recovery_file.475173835144 of=/dev/nvme1n1 bs=512 count=1
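As a sanity check, the candidates can be grouped and diffed against the reference MBR before writing anything back (a sketch using the file names from above):

    md5sum recovery_file.* | sort                             # identical candidates hash the same
    cmp -l mbrb.backup recovery_file.475173835144 | wc -l     # count of differing bytes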

I checked the result with testdisk. Interestingly, it complained about bad starting sectors, but everything else looked promising:

Disk /dev/nvme1n1 - 512 GB / 476 GiB - CHS 488386 64 32
Current partition structure:
Partition                  Start        End    Size in sectors

1 P Linux RAID               1   0  1 32768  63 32   67108864 [rescue:0]

Warning: Bad starting sector (CHS and LBA don't match)
2 P Linux RAID           32769   0  1 33280  63 32    1048576 [rescue:1]

Warning: Bad starting sector (CHS and LBA don't match)
3 P Linux RAID           33281   0  1 488385  21 16  932053680 [rescue:2]

Warning: Bad starting sector (CHS and LBA don't match)
No partition is bootable

So I took the risk and wrote the same MBR to /dev/nvme0n1, the other RAID drive. After a reboot the md devices came up, and my data was back. It felt like a miracle.
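For completeness, the usual checks after such a recovery would be along these lines (the md device name is an assumption):

    cat /proc/mdstat            # arrays should show as active, members as [UU]
    mdadm --detail /dev/md2     # confirm both members are in sync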

Source: https://serverfault.com/questions/914000