Linux

Software RAID1 drive about to fail - how to degrade the array manually?

  • September 22, 2011

I have a CentOS 5.4 server in production, with 2 drives in software RAID1.

Over the last few days **/var/log/messages** has filled up with messages indicating that one of the drives is about to fail:

Sep 23 00:48:38 milkyway kernel: SCSI device sda: 1465149168 512-byte hdwr sectors (750156 MB)
Sep 23 00:48:39 milkyway kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Sep 23 00:48:39 milkyway kernel: ata1.00: irq_stat 0x40000001
Sep 23 00:48:39 milkyway kernel: ata1.00: cmd 25/00:10:31:21:8c/00:00:28:00:00/e0 tag 0 dma 8192 in
Sep 23 00:48:40 milkyway kernel:          res 51/40:00:35:21:8c/00:00:28:00:00/e0 Emask 0x9 (media error)
Sep 23 00:48:40 milkyway kernel: ata1.00: status: { DRDY ERR }
Sep 23 00:48:40 milkyway kernel: ata1.00: error: { UNC }
Sep 23 00:48:40 milkyway kernel: ata1.00: configured for UDMA/133
Sep 23 00:48:40 milkyway kernel: ata1: EH complete
Sep 23 00:48:41 milkyway kernel: sda: Write Protect is off
Sep 23 00:48:41 milkyway kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Sep 23 00:48:58 milkyway kernel: ata1.00: irq_stat 0x40000001
Sep 23 00:49:00 milkyway kernel: ata1.00: cmd 25/00:10:31:21:8c/00:00:28:00:00/e0 tag 0 dma 8192 in
Sep 23 00:49:03 milkyway kernel:          res 51/40:00:35:21:8c/00:00:28:00:00/e0 Emask 0x9 (media error)
Sep 23 00:49:03 milkyway kernel: ata1.00: status: { DRDY ERR }
Sep 23 00:49:04 milkyway kernel: ata1.00: error: { UNC }
Sep 23 00:49:04 milkyway kernel: ata1.00: configured for UDMA/133
Sep 23 00:49:04 milkyway kernel: ata1: EH complete
Sep 23 00:49:04 milkyway kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Sep 23 00:49:04 milkyway kernel: ata1.00: irq_stat 0x40000001
Sep 23 00:49:04 milkyway kernel: ata1.00: cmd 25/00:10:31:21:8c/00:00:28:00:00/e0 tag 0 dma 8192 in
Sep 23 00:49:04 milkyway kernel:          res 51/40:00:35:21:8c/00:00:28:00:00/e0 Emask 0x9 (media error)
Sep 23 00:49:04 milkyway kernel: ata1.00: status: { DRDY ERR }
Sep 23 00:49:04 milkyway kernel: ata1.00: error: { UNC }
Sep 23 00:49:04 milkyway kernel: ata1.00: configured for UDMA/133
Sep 23 00:49:05 milkyway kernel: ata1: EH complete
Sep 23 00:49:05 milkyway kernel: SCSI device sda: drive cache: write back
Sep 23 00:49:06 milkyway kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Sep 23 00:49:06 milkyway kernel: ata1.00: irq_stat 0x40000001
Sep 23 00:49:06 milkyway kernel: ata1.00: cmd 25/00:10:31:21:8c/00:00:28:00:00/e0 tag 0 dma 8192 in
Sep 23 00:49:06 milkyway kernel:          res 51/40:00:35:21:8c/00:00:28:00:00/e0 Emask 0x9 (media error)
Sep 23 00:49:06 milkyway kernel: ata1.00: status: { DRDY ERR }
Sep 23 00:49:06 milkyway kernel: ata1.00: error: { UNC }
Sep 23 00:49:06 milkyway kernel: ata1.00: configured for UDMA/133
Sep 23 00:49:08 milkyway kernel: sd 0:0:0:0: SCSI error: return code = 0x08000002

However, **/proc/mdstat** does not show any of the arrays as degraded:

Personalities : [raid1] [raid10] [raid0] [raid6] [raid5] [raid4] 
md0 : active raid1 sdb1[1] sda1[0]
     4200896 blocks [2/2] [UU]

md1 : active raid1 sdb2[1] sda2[0]
     2104448 blocks [2/2] [UU]

md2 : active raid1 sdb3[1] sda3[0]
     726266432 blocks [2/2] [UU]

unused devices: <none>

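I have already started migrating all the data to a new server. But the transfer has turned out to be very slow, and because of the failing hard drive it is almost impossible to copy everything. On top of that, the disk bottleneck makes the load spike, which leaves the server unusable.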

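Is it possible to remove the failing drive without losing data and without downtime? I don't mind the RAID1 running on a single drive for a while, as long as the transfer can finish as quickly as possible.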

You can manually mark the drive as failed with mdadm:

mdadm --manage /dev/md0 --fail /dev/sda1

This then allows you to remove the drive from the array:

mdadm --manage /dev/md0 --remove /dev/sda1

Repeat for all the arrays (going by your /proc/mdstat, /dev/sda2 in md1 and /dev/sda3 in md2).
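The repetition can be sketched as a small script. This is only a sketch under assumptions: the array/member pairs are copied from the /proc/mdstat output in the question, and the `run`, `fail_and_remove` and `drop_sda` helpers and the `DRY_RUN` switch are names made up for illustration. By default it only prints the commands, so you can review them before running anything as root.

```shell
#!/bin/sh
# Dry run by default: print the mdadm commands instead of executing them.
# Set DRY_RUN=0 (as root) to actually run them. DRY_RUN is a hypothetical
# switch for this sketch, not an mdadm feature.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Mark one member as failed, then remove it from its array.
fail_and_remove() {
    array="$1"
    member="$2"
    run mdadm --manage "$array" --fail "$member" &&
    run mdadm --manage "$array" --remove "$member"
}

# Array/member pairs as shown in /proc/mdstat in the question.
drop_sda() {
    fail_and_remove /dev/md0 /dev/sda1 &&
    fail_and_remove /dev/md1 /dev/sda2 &&
    fail_and_remove /dev/md2 /dev/sda3
}

drop_sda
```

The `&&` chaining stops at the first command that fails, so a problem on one array does not silently carry over to the next.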

This leaves each array running on only one drive, which will hopefully let you back up the data from the remaining drive.
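To confirm that each array really is down to one member, you can look at the status string at the end of the "blocks" line in /proc/mdstat: `[UU]` means both members are active, while an underscore such as `[_U]` marks a missing one. A rough sketch, assuming the RAID1 mdstat layout shown in the question (`md_status` and `any_degraded` are hypothetical helper names):

```shell
#!/bin/sh
# Read mdstat-formatted text on stdin and print each array name with its
# member-status string, e.g. "md0 [_U]".
md_status() {
    awk '
        /^md[0-9]+ :/          { name = $1 }
        /blocks/ && name != "" { print name, $NF; name = "" }
    '
}

# Exit status 0 if at least one array has a missing member.
any_degraded() {
    md_status | grep -q '_'
}

# Show the current state when run on a machine that has md arrays.
if [ -r /proc/mdstat ]; then
    md_status < /proc/mdstat
fi
```

After failing and removing the sda partitions, every array would be expected to show `[_U]`, since sda held slot 0 in each of them.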


Or

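Replace the failed/failing drive with a spare and rebuild the arrays: mirror the partition layout from the good drive onto the new one, then add the new partitions to the md devices so that the arrays resync.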
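A minimal sketch of that rebuild, under several assumptions: the replacement disk shows up as /dev/sda again, it is at least as large as /dev/sdb, and the disks use an MBR partition table (which `sfdisk -d` can dump and restore). The function names and the `DRY_RUN` switch are hypothetical; by default the script only prints what it would run.

```shell
#!/bin/sh
# Dry run by default: print the commands instead of executing them.
# Set DRY_RUN=0 (as root) to actually run them.
DRY_RUN="${DRY_RUN:-1}"

# Copy the partition layout from the healthy disk to the replacement disk.
copy_partition_table() {
    good="$1"
    new="$2"
    if [ "$DRY_RUN" = "1" ]; then
        echo "sfdisk -d $good | sfdisk $new"
    else
        sfdisk -d "$good" | sfdisk "$new"
    fi
}

# Add a partition on the new disk back into its array; md then resyncs it.
add_member() {
    array="$1"
    member="$2"
    if [ "$DRY_RUN" = "1" ]; then
        echo "mdadm --manage $array --add $member"
    else
        mdadm --manage "$array" --add "$member"
    fi
}

# Device names follow the /proc/mdstat output in the question.
rebuild_onto_new_disk() {
    copy_partition_table /dev/sdb /dev/sda &&
    add_member /dev/md0 /dev/sda1 &&
    add_member /dev/md1 /dev/sda2 &&
    add_member /dev/md2 /dev/sda3
}

rebuild_onto_new_disk
```

Double-check which device is the good one before running this for real: writing the partition table in the wrong direction would overwrite the layout of the healthy disk. Resync progress is visible in /proc/mdstat.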

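However, the usual mantra "RAID is not a backup" applies: it would have been prudent to back up the contents of the arrays long before a disk failure was imminent, although that is not much help to you right now.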

Source: https://serverfault.com/questions/314615