Raid
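Drive is failing, but the LSI MegaRAID controller does not detect it
smartmontools reports a growing number of unreadable sectors on a drive used in a RAID1 configuration. I assumed that the LSI MegaRAID controller also checks the SMART status of its disk drives, so shouldn't it have recognized the drive as failing and marked it offline?
Output of smartctl -d sat+megaraid,7 -a /dev/sda: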
    ...
    197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       69
    ...
    Error 11 occurred at disk power-on lifetime: 9704 hours (404 days + 8 hours)
      When the command that caused the error occurred, the device was active or idle.

      After command completion occurred, registers were:
      ER ST SC SN CL CH DH
      -- -- -- -- -- -- --
      40 51 11 6f cd 04 0f  Error: UNC at LBA = 0x0f04cd6f = 251972975

      Commands leading to the command that caused the error were:
      CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
      -- -- -- -- -- -- -- --  ----------------  --------------------
      60 69 38 17 cd 04 40 00   2d+11:27:29.750  READ FPDMA QUEUED
      61 10 30 98 12 55 40 00   2d+11:27:29.750  WRITE FPDMA QUEUED
      61 01 28 57 86 da 40 00   2d+11:27:29.750  WRITE FPDMA QUEUED
      60 09 20 f7 d1 04 40 00   2d+11:27:29.750  READ FPDMA QUEUED
      60 80 18 00 d2 04 40 00   2d+11:27:29.750  READ FPDMA QUEUED
    ...
    SMART Self-test log structure revision number 1
    Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
    # 1  Short offline       Completed without error        00%       9700         -
    # 2  Short offline       Completed without error        00%       9676         -
    # 3  Extended offline    Completed: read failure        90%       9673         251972659
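Since the controller's own summary may stay silent, the pending-sector count can be pulled out of the smartctl output by a script. What follows is only a minimal Python sketch, assuming the ten-column attribute row layout shown above (ID, name, flag, value, worst, threshold, type, updated, when-failed, raw value). The function name `pending_sectors` is made up for illustration, and the sample line is copied from the output above.

```python
def pending_sectors(smartctl_output: str):
    """Return the raw value of SMART attribute 197, or None if it is absent."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with the numeric ID followed by the attribute name;
        # the raw value is assumed to be the tenth whitespace-separated column.
        if len(fields) >= 10 and fields[0] == "197" and fields[1] == "Current_Pending_Sector":
            return int(fields[9])
    return None


sample = "197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 69"
print(pending_sectors(sample))  # 69

# The error log prints the failing LBA in hex and in decimal; both agree.
print(int("0f04cd6f", 16))  # 251972975
```

A monitoring job could compare this value between runs and raise an alert when it grows, which is the symptom described in the question.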
Output of MegaCli -AdpAllInfo -aAll:
    Product Name    : LSI MegaRAID SAS 9260-4i
    ...
    ================
    Virtual Drives    : 2
      Degraded        : 0
      Offline         : 0
    Physical Devices  : 5
      Disks           : 4
      Critical Disks  : 0
      Failed Disks    : 0
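To make the mismatch concrete, the counters in this summary can be read programmatically and checked for non-zero failure counts. This is a rough Python sketch, assuming the "Name : number" line format shown above. The helper name `parse_counters` and the choice of which counters to treat as failure indicators are my own assumptions, not part of MegaCli.

```python
def parse_counters(adp_all_info: str) -> dict:
    """Collect 'Name : integer' lines from MegaCli -AdpAllInfo output."""
    counters = {}
    for line in adp_all_info.splitlines():
        name, sep, value = line.partition(":")
        # Keep only lines whose value is a plain non-negative integer.
        if sep and value.strip().isdigit():
            counters[name.strip()] = int(value.strip())
    return counters


# Sample text copied from the output above.
sample = """\
Virtual Drives    : 2
  Degraded        : 0
  Offline         : 0
Physical Devices  : 5
  Disks           : 4
  Critical Disks  : 0
  Failed Disks    : 0
"""

# Counters assumed (for this sketch) to signal a problem when non-zero.
FAILURE_KEYS = ("Degraded", "Offline", "Critical Disks", "Failed Disks")

counters = parse_counters(sample)
problems = {key: counters[key] for key in FAILURE_KEYS if counters.get(key, 0) > 0}
print(problems)  # {}
```

For the output in the question this check reports nothing, even though SMART shows 69 pending sectors, so a check based only on these counters would not have caught the drive.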
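Please advise whether this RAID controller behavior is normal or whether something is misconfigured. The controller should be in its factory state; I only configured the four physical disks as two RAID1 volumes.
The bad disk will be replaced in any case.
Update: I have learned that there is in fact a way to find out about such errors (see below), but I would have expected this kind of information to appear in the more prominent status output rather than being hidden in a log file.
It seems the RAID controller did not flag the disk because it was still able to recover from the error condition.
To view the RAID controller log, run the following command: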
/opt/MegaRAID/MegaCli/MegaCli -AdpEventLog -GetLatest 1000 -f events.log -aALL
The events.log file contains entries like the following, which indicate that the disk has a problem:
    Code: 0x0000006e
    Class: 0
    Locale: 0x02
    Event Description: Corrected medium error during recovery on PD 07(e0xfc/s2) at f04cb53
    Event Data:
    ===========
    Device ID: 7
    Enclosure Index: 252
    Slot Number: 2
    LBA: 251972435

    seqNum: 0x00004f65
    Time: Wed Mar 6 05:36:48 2013

    Code: 0x00000071
    Class: 0
    Locale: 0x02
    Event Description: Unexpected sense: PD 07(e0xfc/s2) Path 4433221101000000, CDB: 28 00 0f 04 d1 f7 00 01 e0 00, Sense: 3/11/00
    Event Data:
    ===========
    Device ID: 7
    Enclosure Index: 252
    Slot Number: 2
    CDB Length: 10
    CDB Data:
    0028 0000 000f 0004 00d1 00f7 0000 0001 00e0 0000 0000 0000 0000 0000 0000 0000
    Sense Length: 18
    Sense Data:
    00f0 0000 0003 000f 0004 00d2 0046 000a 0000 0000 0000 0000 0011 0000 0000 0000
    0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
    0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
    0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000

    seqNum: 0x00004f64
    Time: Wed Mar 6 05:36:43 2013
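Because these entries are easy to miss, a small script can tally them per disk and decode the command that failed. The Python sketch below assumes the "PD nn(e0x../s..)" notation seen in the Event Description lines above. The names `events_per_disk` and `decode_read10` are hypothetical helpers, not MegaCli features. The CDB layout used is the standard SCSI READ(10) format (opcode 28h, LBA in bytes 2 to 5, transfer length in bytes 7 to 8), and sense 3/11/00 is sense key 3 (MEDIUM ERROR) with additional sense code 11h/00h (UNRECOVERED READ ERROR).

```python
import re
from collections import Counter

# Matches the "PD 07(e0xfc/s2)" token inside an Event Description line.
PD_PATTERN = re.compile(
    r"Event Description: .*?PD ([0-9a-fA-F]+)\(e(0x[0-9a-fA-F]+)/s(\d+)\)"
)


def events_per_disk(log_text: str) -> Counter:
    """Count Event Description lines per (enclosure index, slot number)."""
    counts = Counter()
    for match in PD_PATTERN.finditer(log_text):
        _device, enclosure, slot = match.groups()
        counts[(int(enclosure, 16), int(slot))] += 1
    return counts


def decode_read10(cdb_hex: str):
    """Return (start LBA, block count) from a 10-byte READ(10) CDB."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) command")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length
    return lba, blocks


# Event Description lines copied from the log excerpt above.
sample_log = """\
Event Description: Corrected medium error during recovery on PD 07(e0xfc/s2) at f04cb53
Event Description: Unexpected sense: PD 07(e0xfc/s2) Path 4433221101000000, CDB: 28 00 0f 04 d1 f7 00 01 e0 00, Sense: 3/11/00
"""

print(dict(events_per_disk(sample_log)))               # {(252, 2): 2}
print(decode_read10("28 00 0f 04 d1 f7 00 01 e0 00"))  # (251974135, 480)
print(int("f04cb53", 16))                              # 251972435
```

Both events point at enclosure 252, slot 2, and the decoded LBAs fall in the same region as the UNC error that smartctl reported, which supports the conclusion that the controller saw the medium errors but recovered from them.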