
Is this a serious RAID error?

  • April 14, 2021

If I run the following commands

/opt/MegaRAID/MegaCli/MegaCli -LDInfo -Lall -aAll -NoLog  > /tmp/tmp
/opt/MegaRAID/MegaCli/MegaCli -LDPDInfo     -aAll -NoLog >> /tmp/tmp

then I see these errors

Media Error Count: 11
Other Error Count: 5

Question

What do they mean? Are they critical?

Full output:

Adapter 0 -- Virtual Drive Information:
Virtual Disk: 0 (target id: 0)
Name:Virtual Disk 0
RAID Level: Primary-5, Secondary-0, RAID Level Qualifier-3
Size:951296MB
State: Optimal
Stripe Size: 64kB
Number Of Drives:5
Span Depth:1
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Access Policy: Read/Write
Disk Cache Policy: Disk's Default


Adapter #0

Number of Virtual Disks: 1
Virtual Disk: 0 (target id: 0)
Name:Virtual Disk 0
RAID Level: Primary-5, Secondary-0, RAID Level Qualifier-3
Size:951296MB
State: Optimal
Stripe Size: 64kB
Number Of Drives:5
Span Depth:1
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Access Policy: Read/Write
Disk Cache Policy: Disk's Default
Number of Spans: 1
Span: 0 - Number of PDs: 5
PD: 0 Information
Enclosure Device ID: N/A
Slot Number: 0
Device Id: 0
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Raw Size: 238418MB [0x1d1a94a2 Sectors]
Non Coerced Size: 237906MB [0x1d0a94a2 Sectors]
Coerced Size: 237824MB [0x1d080000 Sectors]
Firmware state: Online
SAS Address(0): 0x1221000000000000
Connected Port Number: 0 
Inquiry Data: ATA     WDC WD2500JS-75N2E04     WD-WCANK9523610

PD: 1 Information
Enclosure Device ID: N/A
Slot Number: 1
Device Id: 1
Sequence Number: 2
Media Error Count: 11
Other Error Count: 5
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Raw Size: 238418MB [0x1d1a94a2 Sectors]
Non Coerced Size: 237906MB [0x1d0a94a2 Sectors]
Coerced Size: 237824MB [0x1d080000 Sectors]
Firmware state: Online
SAS Address(0): 0x1221000001000000
Connected Port Number: 1 
Inquiry Data: ATA     WDC WD2500JS-75N2E04     WD-WCANK9507278

PD: 2 Information
Enclosure Device ID: N/A
Slot Number: 2
Device Id: 2
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Raw Size: 238418MB [0x1d1a94a2 Sectors]
Non Coerced Size: 237906MB [0x1d0a94a2 Sectors]
Coerced Size: 237824MB [0x1d080000 Sectors]
Firmware state: Online
SAS Address(0): 0x1221000002000000
Connected Port Number: 2 
Inquiry Data: ATA     WDC WD2500JS-75N2E04     WD-WCANK9504713

PD: 3 Information
Enclosure Device ID: N/A
Slot Number: 3
Device Id: 3
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Raw Size: 238418MB [0x1d1a94a2 Sectors]
Non Coerced Size: 237906MB [0x1d0a94a2 Sectors]
Coerced Size: 237824MB [0x1d080000 Sectors]
Firmware state: Online
SAS Address(0): 0x1221000003000000
Connected Port Number: 3 
Inquiry Data: ATA     WDC WD2500JS-75N2E04     WD-WCANK9503028

PD: 4 Information
Enclosure Device ID: N/A
Slot Number: 4
Device Id: 4
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Raw Size: 238418MB [0x1d1a94a2 Sectors]
Non Coerced Size: 237906MB [0x1d0a94a2 Sectors]
Coerced Size: 237824MB [0x1d080000 Sectors]
Firmware state: Online
SAS Address(0): 0x1221000004000000
Connected Port Number: 4 
Inquiry Data: ATA     WDC WD2500JS-75N2E04     WD-WCANK9503793

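There is a problem with the drive in slot 1. It is a RAID 5, so your data is protected, but you can no longer rely on its redundancy, because one disk is unreliable. Media errors mean that the drive has run out of spare sectors to remap bad sectors to (see http://kb.lsi.com/KnowledgebaseArticle15809.aspx and http://mycusthelp.info/LSI/_cs/AnswerDetail.aspx?inc=7468). If it were my data, I would be extra careful with backups, remove the drive, replace it with a new one, and resync the array. Some vendors (for example IBM) will accept an RMA based on predictive failure indicators, while others will not. If your vendor does not accept that a disk with bad, un-remappable sectors is faulty, take it out of the array and test it in a test system. It should fail within a reasonable time.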

Edit:

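Only the disk with slot ID 1 has a non-zero media error count. Every entry in the log you provided includes its slot ID. Oddly, the RAID reports its state as Optimal even though the disk has media errors. Still, I would not trust that disk.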
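Scanning the output by eye works for five drives, but the same check can be scripted. The following is a minimal sketch in Python, not part of MegaCli: the function name drives_with_errors is made up for illustration, and it assumes that each drive's "Slot Number" line comes before its "Media Error Count" and "Other Error Count" lines, as in the output above.

```python
def drives_with_errors(megacli_output):
    """Return (slot, media_errors, other_errors) for every drive
    that reports a non-zero media or other error counter."""
    flagged = []
    slot = None
    media = None
    for line in megacli_output.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without a "key: value" pair
        key, value = key.strip(), value.strip()
        if key == "Slot Number":
            slot = int(value)
            media = None  # start of a new drive entry
        elif key == "Media Error Count":
            media = int(value)
        elif key == "Other Error Count":
            other = int(value)
            if media or other:
                flagged.append((slot, media, other))
    return flagged


# Shortened excerpt of the -LDPDInfo output shown above.
sample = """\
PD: 0 Information
Slot Number: 0
Media Error Count: 0
Other Error Count: 0

PD: 1 Information
Slot Number: 1
Media Error Count: 11
Other Error Count: 5
"""

print(drives_with_errors(sample))  # [(1, 11, 5)]
```

In practice the text in /tmp/tmp from the question could be read and passed to the function in place of the excerpt.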

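A RAID 5 made of n equal-sized disks gives you the capacity of (n-1) disks, because it stores one disk's worth of redundancy data. So if you have six 250 GB disks and 1 TB of usable space, they are most likely split into a five-disk RAID 5 (giving you 4 x 250 GB usable) plus one spare disk.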
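The (n-1) rule can be checked against the posted output. Below is a small sketch in which the helper name raid5_usable_mb is hypothetical; the inputs are the "Number Of Drives" value (5) and the per-drive "Coerced Size" (237824MB) from the listing above.

```python
def raid5_usable_mb(num_drives, drive_mb):
    """Usable capacity of a RAID 5 built from equal-sized drives:
    one drive's worth of space is consumed by parity."""
    if num_drives < 3:
        raise ValueError("RAID 5 needs at least three drives")
    return (num_drives - 1) * drive_mb


# 5 drives with a coerced size of 237824 MB each, as in the output above.
print(raid5_usable_mb(5, 237824))  # 951296
```

The result, 951296 MB, matches the "Size:951296MB" reported for Virtual Disk 0, which is consistent with a five-disk RAID 5.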

Source: https://serverfault.com/questions/301520