Ubuntu

DegradedArray events after removing a RAID with mdadm

  • September 12, 2022

My new server from Hetzner (Ubuntu 22) has 2 SSDs and one additional large HDD that is used only for backups. It came preinstalled with 3 RAID arrays, and most of the HDD was not accessible. I don't know why, but one of the arrays spanned all 3 disks. I removed it from the HDD, created a single /dev/sda1 partition covering 100% of the disk, and started getting mdadm errors such as:

A DegradedArray event had been detected on md device /dev/md/1.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [raid1] [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid10]
md1 : active raid1 nvme0n1p2[0] nvme1n1p2[1]
     1046528 blocks super 1.2 [3/2] [UU_]

md0 : inactive nvme0n1p1[0](S) nvme1n1p1[1](S)
     67041280 blocks super 1.2

md2 : active raid5 nvme0n1p3[0] nvme1n1p3[1]
     930740224 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [UU_]
     bitmap: 4/4 pages [16KB], 65536KB chunk

unused devices: <none>

I guess md0 was meant for the rescue file system, but I'm not sure. I removed it with mdadm --remove /dev/md0, but the errors persist. The message is now:

A DegradedArray event had been detected on md device /dev/md/1.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [raid1] [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid10]
md1 : active raid1 nvme0n1p2[0] nvme1n1p2[1]
     1046528 blocks super 1.2 [3/2] [UU_]

md2 : active raid5 nvme0n1p3[0] nvme1n1p3[1]
     930740224 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [UU_]
     bitmap: 4/4 pages [16KB], 65536KB chunk

unused devices: <none>

More output:

> lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
loop0         7:0    0  44.5M  1 loop  /snap/certbot/2344
loop1         7:1    0   114M  1 loop  /snap/core/13425
loop2         7:2    0    62M  1 loop  /snap/core20/1611
loop3         7:3    0  63.2M  1 loop  /snap/core20/1623
sda           8:0    0   5.5T  0 disk  
└─sda1        8:1    0   5.5T  0 part  /home/backup
nvme0n1     259:0    0 476.9G  0 disk  
├─nvme0n1p1 259:2    0    32G  0 part  
├─nvme0n1p2 259:3    0     1G  0 part  
│ └─md1       9:1    0  1022M  0 raid1 /boot
└─nvme0n1p3 259:4    0 443.9G  0 part  
 └─md2       9:2    0 887.6G  0 raid5 /
nvme1n1     259:1    0 476.9G  0 disk  
├─nvme1n1p1 259:5    0    32G  0 part  
├─nvme1n1p2 259:6    0     1G  0 part  
│ └─md1       9:1    0  1022M  0 raid1 /boot
└─nvme1n1p3 259:7    0 443.9G  0 part  
 └─md2       9:2    0 887.6G  0 raid5 /




> blkid
/dev/nvme0n1p3: UUID="826df9bd-accd-335f-14a1-2069a029de70" UUID_SUB="96648f8a-eaeb-28fb-d481-4106d12b8637" LABEL="rescue:2" TYPE="linux_raid_member" PARTUUID="5b5edee1-03"
/dev/nvme0n1p1: UUID="39a665ea-06f0-8360-a3d8-831610b52ca2" UUID_SUB="6bcef918-3006-6d2b-aeb8-0fa8973b86e1" LABEL="rescue:0" TYPE="linux_raid_member" PARTUUID="5b5edee1-01"
/dev/nvme0n1p2: UUID="b21423c4-a32f-b69b-5e42-6a413783d500" UUID_SUB="c77a1e86-e842-2d92-e8af-10ae88dc4c15" LABEL="rescue:1" TYPE="linux_raid_member" PARTUUID="5b5edee1-02"
/dev/md2: UUID="bd7e9969-8af6-49ae-b9a6-3ff7269bb962" BLOCK_SIZE="4096" TYPE="ext4"
/dev/nvme1n1p2: UUID="b21423c4-a32f-b69b-5e42-6a413783d500" UUID_SUB="52caf216-b553-cbfc-e7f8-50986a235537" LABEL="rescue:1" TYPE="linux_raid_member" PARTUUID="a69e312f-02"
/dev/nvme1n1p3: UUID="826df9bd-accd-335f-14a1-2069a029de70" UUID_SUB="72a04ab2-d87a-1c45-fbfb-556c3b93e758" LABEL="rescue:2" TYPE="linux_raid_member" PARTUUID="a69e312f-03"
/dev/nvme1n1p1: UUID="39a665ea-06f0-8360-a3d8-831610b52ca2" UUID_SUB="628713c9-8f69-e186-bbb8-ad352005c449" LABEL="rescue:0" TYPE="linux_raid_member" PARTUUID="a69e312f-01"
/dev/sda1: LABEL="datapartition" UUID="9b1b12b1-fcff-43b0-a2d2-d5e147f634c0" BLOCK_SIZE="4096" TYPE="ext4" PARTLABEL="primary" PARTUUID="a0de21cb-c74d-4aed-a6ed-2216c6a0ec5b"
/dev/md1: UUID="2f598097-fad2-4ee5-8e6f-e86a293730bb" BLOCK_SIZE="4096" TYPE="ext3"
/dev/loop1: TYPE="squashfs"
/dev/loop2: TYPE="squashfs"
/dev/loop0: TYPE="squashfs"
/dev/loop3: TYPE="squashfs"




> fdisk -l
Disk /dev/nvme0n1: 476.94 GiB, 512110190592 bytes, 1000215216 sectors
Disk model: SAMSUNG MZVLB512HBJQ-00000              
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x5b5edee1

Device         Boot    Start        End   Sectors   Size Id Type
/dev/nvme0n1p1          2048   67110911  67108864    32G fd Linux raid autodetect
/dev/nvme0n1p2      67110912   69208063   2097152     1G fd Linux raid autodetect
/dev/nvme0n1p3      69208064 1000213167 931005104 443.9G fd Linux raid autodetect


Disk /dev/nvme1n1: 476.94 GiB, 512110190592 bytes, 1000215216 sectors
Disk model: SAMSUNG MZVLB512HBJQ-00000              
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xa69e312f

Device         Boot    Start        End   Sectors   Size Id Type
/dev/nvme1n1p1          2048   67110911  67108864    32G fd Linux raid autodetect
/dev/nvme1n1p2      67110912   69208063   2097152     1G fd Linux raid autodetect
/dev/nvme1n1p3      69208064 1000213167 931005104 443.9G fd Linux raid autodetect


Disk /dev/sda: 5.46 TiB, 6001175126016 bytes, 11721045168 sectors
Disk model: HGST HUS726060AL
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 297D9FC7-CD48-4610-802B-ED8D6DF3DC2A

Device     Start         End     Sectors  Size Type
/dev/sda1   2048 11721043967 11721041920  5.5T Linux filesystem


Disk /dev/md2: 887.62 GiB, 953077989376 bytes, 1861480448 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 524288 bytes / 1048576 bytes


Disk /dev/md1: 1022 MiB, 1071644672 bytes, 2093056 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes




> cat /etc/mdadm/mdadm.conf
HOMEHOST <system>

# instruct the monitoring daemon where to send mail alerts
MAILADDR root

# definitions of existing MD arrays
ARRAY /dev/md/0  metadata=1.2 UUID=39a665ea:06f08360:a3d88316:10b52ca2 name=rescue:0
ARRAY /dev/md/1  metadata=1.2 UUID=b21423c4:a32fb69b:5e426a41:3783d500 name=rescue:1
ARRAY /dev/md/2  metadata=1.2 UUID=826df9bd:accd335f:14a12069:a029de70 name=rescue:2

# This configuration was auto-generated on Wed, 07 Sep 2022 21:20:22 +0200 by mkconf
root@mail:/home/logs# cat /proc/mdstat 
Personalities : [raid1] [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid10] 
md1 : active raid1 nvme0n1p2[0] nvme1n1p2[1]
     1046528 blocks super 1.2 [3/2] [UU_]
     
md2 : active raid5 nvme0n1p3[0] nvme1n1p3[1]
     930740224 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [UU_]
     bitmap: 4/4 pages [16KB], 65536KB chunk

unused devices: <none>




> mdadm --detail /dev/md1
/dev/md1:
          Version : 1.2
    Creation Time : Wed Sep  7 22:19:42 2022
       Raid Level : raid1
       Array Size : 1046528 (1022.00 MiB 1071.64 MB)
    Used Dev Size : 1046528 (1022.00 MiB 1071.64 MB)
     Raid Devices : 3
    Total Devices : 2
      Persistence : Superblock is persistent

      Update Time : Sun Sep 11 14:37:34 2022
            State : clean, degraded 
   Active Devices : 2
  Working Devices : 2
   Failed Devices : 0
    Spare Devices : 0

Consistency Policy : resync

             Name : rescue:1
             UUID : b21423c4:a32fb69b:5e426a41:3783d500
           Events : 182

   Number   Major   Minor   RaidDevice State
      0     259        3        0      active sync   /dev/nvme0n1p2
      1     259        6        1      active sync   /dev/nvme1n1p2
      -       0        0        2      removed





> mdadm --detail /dev/md2
/dev/md2:
          Version : 1.2
    Creation Time : Wed Sep  7 22:19:42 2022
       Raid Level : raid5
       Array Size : 930740224 (887.62 GiB 953.08 GB)
    Used Dev Size : 465370112 (443.81 GiB 476.54 GB)
     Raid Devices : 3
    Total Devices : 2
      Persistence : Superblock is persistent

    Intent Bitmap : Internal

      Update Time : Mon Sep 12 00:31:56 2022
            State : clean, degraded 
   Active Devices : 2
  Working Devices : 2
   Failed Devices : 0
    Spare Devices : 0

           Layout : left-symmetric
       Chunk Size : 512K

Consistency Policy : bitmap

             Name : rescue:2
             UUID : 826df9bd:accd335f:14a12069:a029de70
           Events : 373931

   Number   Major   Minor   RaidDevice State
      0     259        4        0      active sync   /dev/nvme0n1p3
      1     259        7        1      active sync   /dev/nvme1n1p3
      -       0        0        2      removed

I guess something was not removed completely, and I'm asking for help to understand what is going on.

md1

Essentially, your /dev/md1 is fine. It is a RAID1 (mirror) configured to keep three copies of the data, but you only have two. By definition that is degraded, but the data is still protected and will survive the failure of any single device. If you don't need three copies, you can reconfigure the array so it knows that two are enough:

mdadm --grow /dev/md1 -n2

This clears the degraded state, and mdadm --detail /dev/md1 should then report the array as clean.
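
For example, once the reshape has finished, a quick read-only check (nothing here modifies the array) should confirm the result:

# the State line should read "clean" and mdstat should show [2/2] [UU]
mdadm --detail /dev/md1 | grep -E 'State|Raid Devices'
cat /proc/mdstat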

md2

The situation here is worse. /dev/md2 is a three-device RAID5 with one device missing. It is degraded, and if one more device fails your data is gone. You can think of it as a RAID0, because that is essentially what it is right now. If you don't have a third device and don't plan to install one, you could switch to RAID1 instead. There are, however, some caveats.

The first is that the usable space will be half of what it is now. If more than half of the file system is in use, that data will not fit on a RAID1 built from these devices. So before you start, make sure your data fits on a single device (which is the usable capacity in the RAID1 case).
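
A quick way to check is to compare the current usage of the root file system with the size of a single member partition (about 444G here); a minimal sketch using only read-only commands:

# used space on / (reported by df) must fit within one ~444G member partition
df -h /
lsblk -o NAME,SIZE /dev/nvme0n1p3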

The second is that, unfortunately, I'm afraid there is no reasonably safe way to convert it to RAID1 in place. Some conversions between RAID levels are possible, but I would rather not play such games with a degraded RAID5. There will be some downtime, and you will need somewhere to store the data temporarily.

So the plan is to copy the contents somewhere onto spare storage, remove md2 completely and create a new RAID1 from its former components, then copy the contents back onto that array. This is Linux, so there are no requirements on the copy method other than that it fully preserves all permissions and extended attributes. Since this is the root file system, you will need to do this while booted from some external bootable medium; an Ubuntu live image will do, for example (or gparted).
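
A rough sketch of that plan, run from the live medium. The device names are taken from the outputs above; using /mnt/backup on the big /dev/sda1 disk as the temporary location is just an assumption, and the whole sequence is illustrative rather than a tested recipe. It destroys md2, so only run something like this after the backup copy has been verified:

# copy the old root away, preserving permissions, ACLs, xattrs and hard links
mkdir -p /mnt/old /mnt/backup
mount /dev/md2 /mnt/old
mount /dev/sda1 /mnt/backup
rsync -aAXH /mnt/old/ /mnt/backup/root-copy/
umount /mnt/old

# stop the degraded RAID5 and wipe the RAID metadata from its members
mdadm --stop /dev/md2
mdadm --zero-superblock /dev/nvme0n1p3 /dev/nvme1n1p3

# build a two-device RAID1 from the former RAID5 members and restore the data
mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/nvme0n1p3 /dev/nvme1n1p3
mkfs.ext4 /dev/md2
mount /dev/md2 /mnt/old
rsync -aAXH /mnt/backup/root-copy/ /mnt/old/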

Also, when everything is in place and while you are still booted from the removable medium, you will probably need to adjust the initramfs to account for the new array (this is required on Debian, from which Ubuntu is descended, so I assume the requirement still applies). You need to update the /etc/mdadm/mdadm.conf file in the root file system and then regenerate the initramfs so that it includes the updated file.
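
Still booted from the live medium, that last step could look roughly like the following. This is a sketch that assumes the new root is mounted at /mnt/old as above; the exact bind mounts needed for the chroot may differ on your live image:

# chroot into the rebuilt root file system
mount /dev/md1 /mnt/old/boot
for d in dev proc sys; do mount --bind /$d /mnt/old/$d; done
chroot /mnt/old

# inside the chroot: refresh the array definitions, then rebuild the initramfs
mdadm --detail --scan     # compare with the ARRAY lines in /etc/mdadm/mdadm.conf and update them
update-initramfs -u -k all
# if /etc/fstab refers to the old file system UUID, it has to be updated as well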

Source: https://serverfault.com/questions/1110439