DegradedArray events after removing a RAID with mdadm
My new server from Hetzner (Ubuntu 22) has 2 SSDs and one additional large HDD used only for backups. It came preinstalled with 3 RAID arrays, and most of the HDD was not accessible. I don't know why, but one of the arrays spanned all 3 drives. I removed it from the HDD, created a single /dev/sda1 covering 100% of the disk, and then started getting mdadm errors such as:
A DegradedArray event had been detected on md device /dev/md/1.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [raid1] [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid10]
md1 : active raid1 nvme0n1p2[0] nvme1n1p2[1]
      1046528 blocks super 1.2 [3/2] [UU_]

md0 : inactive nvme0n1p1[0](S) nvme1n1p1[1](S)
      67041280 blocks super 1.2

md2 : active raid5 nvme0n1p3[0] nvme1n1p3[1]
      930740224 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [UU_]
      bitmap: 4/4 pages [16KB], 65536KB chunk

unused devices: <none>
I guess md0 was meant for the rescue filesystem, but I'm not sure. I removed it with
mdadm --remove /dev/md0
, but the errors keep coming. The message is now:

A DegradedArray event had been detected on md device /dev/md/1.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [raid1] [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid10]
md1 : active raid1 nvme0n1p2[0] nvme1n1p2[1]
      1046528 blocks super 1.2 [3/2] [UU_]

md2 : active raid5 nvme0n1p3[0] nvme1n1p3[1]
      930740224 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [UU_]
      bitmap: 4/4 pages [16KB], 65536KB chunk

unused devices: <none>
More output:
> lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
loop0         7:0    0  44.5M  1 loop  /snap/certbot/2344
loop1         7:1    0   114M  1 loop  /snap/core/13425
loop2         7:2    0    62M  1 loop  /snap/core20/1611
loop3         7:3    0  63.2M  1 loop  /snap/core20/1623
sda           8:0    0   5.5T  0 disk
└─sda1        8:1    0   5.5T  0 part  /home/backup
nvme0n1     259:0    0 476.9G  0 disk
├─nvme0n1p1 259:2    0    32G  0 part
├─nvme0n1p2 259:3    0     1G  0 part
│ └─md1       9:1    0  1022M  0 raid1 /boot
└─nvme0n1p3 259:4    0 443.9G  0 part
  └─md2      9:2    0 887.6G  0 raid5 /
nvme1n1     259:1    0 476.9G  0 disk
├─nvme1n1p1 259:5    0    32G  0 part
├─nvme1n1p2 259:6    0     1G  0 part
│ └─md1       9:1    0  1022M  0 raid1 /boot
└─nvme1n1p3 259:7    0 443.9G  0 part
  └─md2      9:2    0 887.6G  0 raid5 /

> blkid
/dev/nvme0n1p3: UUID="826df9bd-accd-335f-14a1-2069a029de70" UUID_SUB="96648f8a-eaeb-28fb-d481-4106d12b8637" LABEL="rescue:2" TYPE="linux_raid_member" PARTUUID="5b5edee1-03"
/dev/nvme0n1p1: UUID="39a665ea-06f0-8360-a3d8-831610b52ca2" UUID_SUB="6bcef918-3006-6d2b-aeb8-0fa8973b86e1" LABEL="rescue:0" TYPE="linux_raid_member" PARTUUID="5b5edee1-01"
/dev/nvme0n1p2: UUID="b21423c4-a32f-b69b-5e42-6a413783d500" UUID_SUB="c77a1e86-e842-2d92-e8af-10ae88dc4c15" LABEL="rescue:1" TYPE="linux_raid_member" PARTUUID="5b5edee1-02"
/dev/md2: UUID="bd7e9969-8af6-49ae-b9a6-3ff7269bb962" BLOCK_SIZE="4096" TYPE="ext4"
/dev/nvme1n1p2: UUID="b21423c4-a32f-b69b-5e42-6a413783d500" UUID_SUB="52caf216-b553-cbfc-e7f8-50986a235537" LABEL="rescue:1" TYPE="linux_raid_member" PARTUUID="a69e312f-02"
/dev/nvme1n1p3: UUID="826df9bd-accd-335f-14a1-2069a029de70" UUID_SUB="72a04ab2-d87a-1c45-fbfb-556c3b93e758" LABEL="rescue:2" TYPE="linux_raid_member" PARTUUID="a69e312f-03"
/dev/nvme1n1p1: UUID="39a665ea-06f0-8360-a3d8-831610b52ca2" UUID_SUB="628713c9-8f69-e186-bbb8-ad352005c449" LABEL="rescue:0" TYPE="linux_raid_member" PARTUUID="a69e312f-01"
/dev/sda1: LABEL="datapartition" UUID="9b1b12b1-fcff-43b0-a2d2-d5e147f634c0" BLOCK_SIZE="4096" TYPE="ext4" PARTLABEL="primary" PARTUUID="a0de21cb-c74d-4aed-a6ed-2216c6a0ec5b"
/dev/md1: UUID="2f598097-fad2-4ee5-8e6f-e86a293730bb" BLOCK_SIZE="4096" TYPE="ext3"
/dev/loop1: TYPE="squashfs"
/dev/loop2: TYPE="squashfs"
/dev/loop0: TYPE="squashfs"
/dev/loop3: TYPE="squashfs"

> fdisk -l
Disk /dev/nvme0n1: 476.94 GiB, 512110190592 bytes, 1000215216 sectors
Disk model: SAMSUNG MZVLB512HBJQ-00000
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x5b5edee1

Device         Boot    Start        End   Sectors   Size Id Type
/dev/nvme0n1p1          2048   67110911  67108864    32G fd Linux raid autodetect
/dev/nvme0n1p2      67110912   69208063   2097152     1G fd Linux raid autodetect
/dev/nvme0n1p3      69208064 1000213167 931005104 443.9G fd Linux raid autodetect

Disk /dev/nvme1n1: 476.94 GiB, 512110190592 bytes, 1000215216 sectors
Disk model: SAMSUNG MZVLB512HBJQ-00000
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xa69e312f

Device         Boot    Start        End   Sectors   Size Id Type
/dev/nvme1n1p1          2048   67110911  67108864    32G fd Linux raid autodetect
/dev/nvme1n1p2      67110912   69208063   2097152     1G fd Linux raid autodetect
/dev/nvme1n1p3      69208064 1000213167 931005104 443.9G fd Linux raid autodetect

Disk /dev/sda: 5.46 TiB, 6001175126016 bytes, 11721045168 sectors
Disk model: HGST HUS726060AL
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 297D9FC7-CD48-4610-802B-ED8D6DF3DC2A

Device     Start         End     Sectors  Size Type
/dev/sda1   2048 11721043967 11721041920  5.5T Linux filesystem

Disk /dev/md2: 887.62 GiB, 953077989376 bytes, 1861480448 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 524288 bytes / 1048576 bytes

Disk /dev/md1: 1022 MiB, 1071644672 bytes, 2093056 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

> cat /etc/mdadm/mdadm.conf
HOMEHOST <system>

# instruct the monitoring daemon where to send mail alerts
MAILADDR root

# definitions of existing MD arrays
ARRAY /dev/md/0  metadata=1.2 UUID=39a665ea:06f08360:a3d88316:10b52ca2 name=rescue:0
ARRAY /dev/md/1  metadata=1.2 UUID=b21423c4:a32fb69b:5e426a41:3783d500 name=rescue:1
ARRAY /dev/md/2  metadata=1.2 UUID=826df9bd:accd335f:14a12069:a029de70 name=rescue:2

# This configuration was auto-generated on Wed, 07 Sep 2022 21:20:22 +0200 by mkconf

root@mail:/home/logs# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid10]
md1 : active raid1 nvme0n1p2[0] nvme1n1p2[1]
      1046528 blocks super 1.2 [3/2] [UU_]

md2 : active raid5 nvme0n1p3[0] nvme1n1p3[1]
      930740224 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [UU_]
      bitmap: 4/4 pages [16KB], 65536KB chunk

unused devices: <none>

> mdadm --detail /dev/md1
/dev/md1:
           Version : 1.2
     Creation Time : Wed Sep  7 22:19:42 2022
        Raid Level : raid1
        Array Size : 1046528 (1022.00 MiB 1071.64 MB)
     Used Dev Size : 1046528 (1022.00 MiB 1071.64 MB)
      Raid Devices : 3
     Total Devices : 2
       Persistence : Superblock is persistent

       Update Time : Sun Sep 11 14:37:34 2022
             State : clean, degraded
    Active Devices : 2
   Working Devices : 2
    Failed Devices : 0
     Spare Devices : 0

Consistency Policy : resync

              Name : rescue:1
              UUID : b21423c4:a32fb69b:5e426a41:3783d500
            Events : 182

    Number   Major   Minor   RaidDevice State
       0     259        3        0      active sync   /dev/nvme0n1p2
       1     259        6        1      active sync   /dev/nvme1n1p2
       -       0        0        2      removed

> mdadm --detail /dev/md2
/dev/md2:
           Version : 1.2
     Creation Time : Wed Sep  7 22:19:42 2022
        Raid Level : raid5
        Array Size : 930740224 (887.62 GiB 953.08 GB)
     Used Dev Size : 465370112 (443.81 GiB 476.54 GB)
      Raid Devices : 3
     Total Devices : 2
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Mon Sep 12 00:31:56 2022
             State : clean, degraded
    Active Devices : 2
   Working Devices : 2
    Failed Devices : 0
     Spare Devices : 0

            Layout : left-symmetric
        Chunk Size : 512K

Consistency Policy : bitmap

              Name : rescue:2
              UUID : 826df9bd:accd335f:14a12069:a029de70
            Events : 373931

    Number   Major   Minor   RaidDevice State
       0     259        4        0      active sync   /dev/nvme0n1p3
       1     259        7        1      active sync   /dev/nvme1n1p3
       -       0        0        2      removed
I guess something was not completely removed, and I'm asking for help understanding what is going on.
md1
Essentially, your /dev/md1 is fine. It is a RAID1 (mirror) configured to keep three copies of the data, but you only have two. By definition that is degraded, but the data is still protected against the failure of any single device. If you don't need three copies, you can reconfigure the array so it knows that two are enough:

mdadm --grow /dev/md1 -n2
This will clear the degraded state, and mdadm --detail /dev/md1 should then show the array as optimal.
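For example, a quick way to confirm the change afterwards (just a sketch; the exact field layout varies a little between mdadm versions):

mdadm --detail /dev/md1 | grep -E 'Raid Devices|State'
# should now report "Raid Devices : 2" and a clean, non-degraded State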
md2
The situation is worse here. /dev/md2 is a three-device RAID5 that is missing one device. It is degraded, and if one more device fails, your data is gone; you can think of it as essentially a RAID0 right now. If you don't have a third device and don't plan to install one, you can switch to RAID1 instead. There are some caveats, though. The first is that the usable space will be half of what it is now: if more than half of the filesystem is in use, that data will not fit on a RAID1 built from these devices. So before you start, make sure your data fits on a single device (which is what the usable space will be in the RAID1 case).
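A rough way to check that, assuming the array is mounted at / as your lsblk output shows (a sketch, not exact accounting):

df -h /     # "Used" must fit within one ~444 GiB member partition
du -shx /   # slower cross-check; -x stays on the root filesystem only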
The second is that, unfortunately, I'm afraid there is no reasonably safe way to convert it to RAID1 in place. Some conversions between RAID levels are possible, but I'd rather not play such games with an already degraded RAID5. There will be some downtime, and you will need somewhere to store the data temporarily.
So the plan is: copy the contents somewhere onto spare storage, remove md2 completely and create a new RAID1 from its former components, then copy the contents back onto the new array. This being Linux, there are no particular requirements on the copy method, other than that it fully preserves all permissions and extended attributes. Since this is the root filesystem, you will need to do this while booted from some external bootable media; an Ubuntu live image will do, for example (or GParted Live).
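A minimal sketch of that sequence, run from the live environment. The mount points, the use of the big backup disk (sda1) as temporary storage, and the ext4 mkfs are my illustrative assumptions; double-check device names before running anything destructive:

mkdir -p /mnt/old /mnt/backup
mount /dev/md2 /mnt/old                        # the existing, degraded RAID5 root
mount /dev/sda1 /mnt/backup                    # big HDD as temporary storage
rsync -aHAX /mnt/old/ /mnt/backup/rootcopy/    # -aHAX keeps permissions, hard links, ACLs, xattrs
umount /mnt/old

mdadm --stop /dev/md2                          # take the old array offline
mdadm --zero-superblock /dev/nvme0n1p3         # wipe the old RAID5 metadata
mdadm --zero-superblock /dev/nvme1n1p3
mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/nvme0n1p3 /dev/nvme1n1p3
mkfs.ext4 /dev/md2                             # new filesystem on the RAID1

mount /dev/md2 /mnt/old                        # reuse the mount point for the new array
rsync -aHAX /mnt/backup/rootcopy/ /mnt/old/    # copy everything back

Note that the new filesystem gets a new UUID, so if /etc/fstab refers to md2 by UUID, update that entry as well.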
Also, when everything is in place, you will probably need to adjust the initramfs to account for the new array, while still booted from the removable media (this is required on Debian, from which Ubuntu is derived, so I assume the requirement still applies). You need to update the /etc/mdadm/mdadm.conf file in the root filesystem and then regenerate the initramfs so that it includes the updated file.
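A sketch of that last step, still from the live environment; the chroot approach and mount points below are my assumptions rather than the only way to do it:

mount /dev/md2 /mnt/old                         # the new RAID1 root
mount /dev/md1 /mnt/old/boot
for d in dev proc sys; do mount --bind /$d /mnt/old/$d; done
chroot /mnt/old
# now inside the chroot:
mdadm --detail --scan >> /etc/mdadm/mdadm.conf  # append current ARRAY lines, then remove the stale ones by hand
update-initramfs -u -k all                      # rebuild the initramfs with the updated mdadm.conf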