HP P840 HDD RAID 5: lots of strange drive failures
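I have been running a RAID 5 HDD array (8x6TB) on my HP P840 for about 2 years now, and it has always had an unusually high number of drive failures. Everything was fine for half a year, but now drives are failing in strange ways. For example, 2 new drives failed a few days after being added to the RAID. I have already replaced the RAID controller as well, and I am running the latest firmware on both the motherboard and the RAID controller.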
I have also tried different drives. The RAID originally used HGST DeskStar 6TB drives; as drives failed, I have been replacing them with HGST UltraStar 6TB. The behavior is the same.
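Also, it seems that (most of) the drives have not really failed: as soon as I swapped the RAID controller, one failed drive was recognized as OK again and started rebuilding.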
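My hoster's support says the problem is that I am using RAID 5 at all and that I should switch to RAID 10. I find that hard to believe, because I have been running RAID 5 on other systems without problems (no drive failures for years).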
Can anyone give me a hint as to what the culprit might be? Is something wrong with how the RAID controller is configured?
Thanks!
*Edit:
The server is an HP DL180 G9
The drive failure reason is always "Write retries failed"*
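Update: Our hoster had us replace the hardware completely and switch to RAID 6. We did, and it has been running smoothly for a while now. Although the cause was never really investigated, shodanshok's explanation of a punctured array seems plausible to me, so I will accept that answer. Thanks, everyone!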
Smart Array P840 in Slot 1 (sn: PDNNF0ARH321GD)

   Port Name: 1I
   Port Name: 2I

   Internal Drive Cage at Port 1I, Box 2, OK
   Internal Drive Cage at Port 1I, Box 2, OK
   Internal Drive Cage at Port 2I, Box 1, OK

   array A (Solid State SATA, Unused Space: 0 MB)
      logicaldrive 1 (447.1 GB, RAID 1+0, OK)
      physicaldrive 2I:1:1 (port 2I:box 1:bay 1, Solid State SATA, 240.0 GB, OK)
      physicaldrive 2I:1:2 (port 2I:box 1:bay 2, Solid State SATA, 240.0 GB, OK)
      physicaldrive 2I:1:3 (port 2I:box 1:bay 3, Solid State SATA, 240.0 GB, OK)
      physicaldrive 2I:1:4 (port 2I:box 1:bay 4, Solid State SATA, 240.0 GB, OK)

   array B (SATA, Unused Space: 0 MB)
      logicaldrive 2 (38.2 TB, RAID 5, Interim Recovery Mode)
      physicaldrive 1I:2:1 (port 1I:box 2:bay 1, SATA, 6001.1 GB, OK)
      physicaldrive 1I:2:2 (port 1I:box 2:bay 2, SATA, 6001.1 GB, OK)
      physicaldrive 1I:2:3 (port 1I:box 2:bay 3, SATA, 6001.1 GB, OK)
      physicaldrive 1I:2:4 (port 1I:box 2:bay 4, SATA, 6001.1 GB, OK)
      physicaldrive 1I:2:5 (port 1I:box 2:bay 5, SATA, 6001.1 GB, Failed)
      physicaldrive 1I:2:6 (port 1I:box 2:bay 6, SATA, 6001.1 GB, OK)
      physicaldrive 1I:2:7 (port 1I:box 2:bay 7, SATA, 6001.1 GB, OK)
      physicaldrive 1I:2:8 (port 1I:box 2:bay 8, SATA, 6001.1 GB, OK)
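For anyone who wants to pull the non-OK drives out of a listing like the one above automatically, here is a minimal sketch. It is not an HP tool: the function name `drives_not_ok` and the embedded sample are illustrative, and it assumes the `physicaldrive <id> (port ..., <type>, <size>, <status>)` layout seen in this output, with the status as the last comma-separated field.

```python
import re

# Matches entries such as:
#   physicaldrive 1I:2:5 (port 1I:box 2:bay 5, SATA, 6001.1 GB, Failed)
# Assumption: the status is the last comma-separated field in the parentheses.
DRIVE_RE = re.compile(r"physicaldrive\s+(\S+)\s+\(([^)]*)\)")

def drives_not_ok(config_text):
    """Return (drive_id, status) for every physical drive whose status is not OK."""
    result = []
    for drive_id, fields in DRIVE_RE.findall(config_text):
        status = fields.split(",")[-1].strip()
        if status != "OK":
            result.append((drive_id, status))
    return result

# Shortened excerpt of the listing above, used only as sample input.
SAMPLE = """
array B (SATA, Unused Space: 0 MB)
  logicaldrive 2 (38.2 TB, RAID 5, Interim Recovery Mode)
  physicaldrive 1I:2:4 (port 1I:box 2:bay 4, SATA, 6001.1 GB, OK)
  physicaldrive 1I:2:5 (port 1I:box 2:bay 5, SATA, 6001.1 GB, Failed)
  physicaldrive 1I:2:6 (port 1I:box 2:bay 6, SATA, 6001.1 GB, OK)
"""

if __name__ == "__main__":
    print(drives_not_ok(SAMPLE))
```

Because the pattern does not depend on line breaks, it should also cope with output that has been pasted as a single line.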
Details:
Smart Array P840 in Slot 1
   Bus Interface: PCI
   Slot: 1
   Serial Number: PDNNF0ARH321GD
   Cache Serial Number: PEYFP0BRH323YZ
   RAID 6 (ADG) Status: Enabled
   Controller Status: OK
   Hardware Revision: B
   Firmware Version: 6.60
   Rebuild Priority: High
   Expand Priority: Medium
   Surface Scan Delay: 3 secs
   Surface Scan Mode: Idle
   Parallel Surface Scan Supported: Yes
   Current Parallel Surface Scan Count: 1
   Max Parallel Surface Scan Count: 16
   Queue Depth: Automatic
   Monitor and Performance Delay: 60 min
   Elevator Sort: Enabled
   Degraded Performance Optimization: Disabled
   Inconsistency Repair Policy: Disabled
   Wait for Cache Room: Disabled
   Surface Analysis Inconsistency Notification: Disabled
   Post Prompt Timeout: 15 secs
   Cache Board Present: True
   Cache Status: OK
   Cache Ratio: 10% Read / 90% Write
   Drive Write Cache: Enabled
   Total Cache Size: 4.0 GB
   Total Cache Memory Available: 3.2 GB
   No-Battery Write Cache: Enabled
   SSD Caching RAID5 WriteBack Enabled: True
   SSD Caching Version: 2
   Cache Backup Power Source: Batteries
   Battery/Capacitor Count: 1
   Battery/Capacitor Status: OK
   SATA NCQ Supported: True
   Spare Activation Mode: Activate on physical drive failure (default)
   Controller Temperature (C): 51
   Cache Module Temperature (C): 38
   Number of Ports: 2 Internal only
   Encryption: Disabled
   Express Local Encryption: False
   Driver Name: hpsa
   Driver Version: 3.4.16
   Driver Supports HP SSD Smart Path: True
   PCI Address (Domain:Bus:Device.Function): 0000:06:00.0
   Negotiated PCIe Data Rate: PCIe 3.0 x8 (7880 MB/s)
   Controller Mode: RAID
   Controller Mode Reboot: Not Required
   Latency Scheduler Setting: Disabled
   Current Power Mode: MaxPerformance
   Host Serial Number: CZ270500GM
   Sanitize Erase Supported: False
   Primary Boot Volume: logicaldrive 1 (600508B1001CE0F9FACF3A1358647115)
   Secondary Boot Volume: logicaldrive 1 (600508B1001CE0F9FACF3A1358647115)

   Port Name: 1I
      Port ID: 0
      Port Connection Number: 0
      SAS Address: 5001438038AD05A0
      Port Location: Internal
      Managed Cable Connected: False

   Port Name: 2I
      Port ID: 1
      Port Connection Number: 1
      SAS Address: 5001438038AD05A8
      Port Location: Internal
      Managed Cable Connected: False

   Internal Drive Cage at Port 1I, Box 2, OK
      Power Supply Status: Not Redundant
      Drive Bays: 4
      Port: 1I
      Box: 2
      Location: Internal

   Physical Drives
      physicaldrive 1I:2:1 (port 1I:box 2:bay 1, SATA, 6001.1 GB, OK)
      physicaldrive 1I:2:2 (port 1I:box 2:bay 2, SATA, 6001.1 GB, OK)
      physicaldrive 1I:2:3 (port 1I:box 2:bay 3, SATA, 6001.1 GB, OK)
      physicaldrive 1I:2:4 (port 1I:box 2:bay 4, SATA, 6001.1 GB, OK)
      None attached

   Internal Drive Cage at Port 1I, Box 2, OK
      Power Supply Status: Not Redundant
      Drive Bays: 4
      Port: 1I
      Box: 2
      Location: Internal

   Physical Drives
      physicaldrive 1I:2:1 (port 1I:box 2:bay 1, SATA, 6001.1 GB, OK)
      physicaldrive 1I:2:2 (port 1I:box 2:bay 2, SATA, 6001.1 GB, OK)
      physicaldrive 1I:2:3 (port 1I:box 2:bay 3, SATA, 6001.1 GB, OK)
      physicaldrive 1I:2:4 (port 1I:box 2:bay 4, SATA, 6001.1 GB, OK)
      None attached

   Internal Drive Cage at Port 2I, Box 1, OK
      Power Supply Status: Not Redundant
      Drive Bays: 4
      Port: 2I
      Box: 1
      Location: Internal

   Physical Drives
      physicaldrive 2I:1:1 (port 2I:box 1:bay 1, Solid State SATA, 240.0 GB, OK)
      physicaldrive 2I:1:2 (port 2I:box 1:bay 2, Solid State SATA, 240.0 GB, OK)
      physicaldrive 2I:1:3 (port 2I:box 1:bay 3, Solid State SATA, 240.0 GB, OK)
      physicaldrive 2I:1:4 (port 2I:box 1:bay 4, Solid State SATA, 240.0 GB, OK)
      None attached

   Array: A
      Interface Type: Solid State SATA
      Unused Space: 0 MB (0.0%)
      Used Space: 894.2 GB (100.0%)
      Status: OK
      MultiDomain Status: OK
      Array Type: Data
      HP SSD Smart Path: disable

      Logical Drive: 1
         Size: 447.1 GB
         Fault Tolerance: 1+0
         Heads: 255
         Sectors Per Track: 32
         Cylinders: 65535
         Strip Size: 256 KB
         Full Stripe Size: 512 KB
         Status: OK
         MultiDomain Status: OK
         Caching: Enabled
         Unique Identifier: 600508B1001CE0F9FACF3A1358647115
         Disk Name: /dev/sda
         Mount Points: / 18.6 GB Partition Number 2
         OS Status: LOCKED
         Logical Drive Label: 0216D6F9PDNNF0ARH502MC7DFA
         Mirror Group 1:
            physicaldrive 2I:1:1 (port 2I:box 1:bay 1, Solid State SATA, 240.0 GB, OK)
            physicaldrive 2I:1:2 (port 2I:box 1:bay 2, Solid State SATA, 240.0 GB, OK)
         Mirror Group 2:
            physicaldrive 2I:1:3 (port 2I:box 1:bay 3, Solid State SATA, 240.0 GB, OK)
            physicaldrive 2I:1:4 (port 2I:box 1:bay 4, Solid State SATA, 240.0 GB, OK)
         Drive Type: Data
         LD Acceleration Method: Controller Cache

      physicaldrive 2I:1:1
         Port: 2I
         Box: 1
         Bay: 1
         Status: OK
         Drive Type: Data
         Drive Interface Type: Solid State SATA
         Size: 240.0 GB
         Drive exposed to OS: False
         Native Block Size: 4096
         Firmware Revision: N2010101
         Serial Number: PHDV712004AG240AGN
         Model: ATA INTEL SSDSC2BB24
         SATA NCQ Capable: True
         SATA NCQ Enabled: True
         Current Temperature (C): 31
         Maximum Temperature (C): 39
         SSD Smart Trip Wearout: Not Supported
         PHY Count: 1
         PHY Transfer Rate: 6.0Gbps
         Drive Authentication Status: Not Authenticated. Smart Array will not control drive LEDs.
         Sanitize Erase Supported: False

      physicaldrive 2I:1:2
         Port: 2I
         Box: 1
         Bay: 2
         Status: OK
         Drive Type: Data
         Drive Interface Type: Solid State SATA
         Size: 240.0 GB
         Drive exposed to OS: False
         Native Block Size: 4096
         Firmware Revision: N2010101
         Serial Number: PHDV706303CH240AGN
         Model: ATA INTEL SSDSC2BB24
         SATA NCQ Capable: True
         SATA NCQ Enabled: True
         Current Temperature (C): 29
         Maximum Temperature (C): 36
         SSD Smart Trip Wearout: Not Supported
         PHY Count: 1
         PHY Transfer Rate: 6.0Gbps
         Drive Authentication Status: Not Authenticated. Smart Array will not control drive LEDs.
         Sanitize Erase Supported: False

      physicaldrive 2I:1:3
         Port: 2I
         Box: 1
         Bay: 3
         Status: OK
         Drive Type: Data
         Drive Interface Type: Solid State SATA
         Size: 240.0 GB
         Drive exposed to OS: False
         Native Block Size: 4096
         Firmware Revision: N2010101
         Serial Number: PHDV712003V8240AGN
         Model: ATA INTEL SSDSC2BB24
         SATA NCQ Capable: True
         SATA NCQ Enabled: True
         Current Temperature (C): 29
         Maximum Temperature (C): 35
         SSD Smart Trip Wearout: Not Supported
         PHY Count: 1
         PHY Transfer Rate: 6.0Gbps
         Drive Authentication Status: Not Authenticated. Smart Array will not control drive LEDs.
         Sanitize Erase Supported: False

      physicaldrive 2I:1:4
         Port: 2I
         Box: 1
         Bay: 4
         Status: OK
         Drive Type: Data
         Drive Interface Type: Solid State SATA
         Size: 240.0 GB
         Drive exposed to OS: False
         Native Block Size: 4096
         Firmware Revision: N2010101
         Serial Number: PHDV712004GA240AGN
         Model: ATA INTEL SSDSC2BB24
         SATA NCQ Capable: True
         SATA NCQ Enabled: True
         Current Temperature (C): 31
         Maximum Temperature (C): 37
         SSD Smart Trip Wearout: Not Supported
         PHY Count: 1
         PHY Transfer Rate: 6.0Gbps
         Drive Authentication Status: Not Authenticated. Smart Array will not control drive LEDs.
         Sanitize Erase Supported: False

   Array: B
      Interface Type: SATA
      Unused Space: 0 MB (0.0%)
      Used Space: 43.7 TB (100.0%)
      Status: Failed Physical Drive
      MultiDomain Status: OK
      Array Type: Data
      HP SSD Smart Path: disable

      Warning: One of the drives on this array have failed or has been removed.

      Logical Drive: 2
         Size: 38.2 TB
         Fault Tolerance: 5
         Heads: 255
         Sectors Per Track: 32
         Cylinders: 65535
         Strip Size: 256 KB
         Full Stripe Size: 1792 KB
         Status: Interim Recovery Mode
         MultiDomain Status: OK
         Caching: Enabled
         Parity Initialization Status: Initialization Failed
         Unique Identifier: 600508B1001CF94F84873C91FD89B549
         Disk Name: /dev/sdb
         Mount Points: None
         Logical Drive Label: 04DA1DD6PDNNF0ARH502MC546F
         Drive Type: Data
         LD Acceleration Method: Controller Cache

      physicaldrive 1I:2:1
         Port: 1I
         Box: 2
         Bay: 1
         Status: OK
         Drive Type: Data
         Drive Interface Type: SATA
         Size: 6001.1 GB
         Drive exposed to OS: False
         Native Block Size: 4096
         Rotational Speed: 7200
         Firmware Revision: APGNW7JH
         Serial Number: NAHN3UZY
         Model: ATA HGST HDN726060AL
         SATA NCQ Capable: True
         SATA NCQ Enabled: True
         Current Temperature (C): 37
         Maximum Temperature (C): 43
         PHY Count: 1
         PHY Transfer Rate: 6.0Gbps
         Drive Authentication Status: Not Authenticated. Smart Array will not control drive LEDs.
         Sanitize Erase Supported: False

      physicaldrive 1I:2:2
         Port: 1I
         Box: 2
         Bay: 2
         Status: OK
         Drive Type: Data
         Drive Interface Type: SATA
         Size: 6001.1 GB
         Drive exposed to OS: False
         Native Block Size: 4096
         Rotational Speed: 7200
         Firmware Revision: APGNT517
         Serial Number: NAHLKP0X
         Model: ATA HGST HDN726060AL
         SATA NCQ Capable: True
         SATA NCQ Enabled: True
         Current Temperature (C): 37
         Maximum Temperature (C): 56
         PHY Count: 1
         PHY Transfer Rate: 6.0Gbps
         Drive Authentication Status: Not Authenticated. Smart Array will not control drive LEDs.
         Sanitize Erase Supported: False

      physicaldrive 1I:2:3
         Port: 1I
         Box: 2
         Bay: 3
         Status: OK
         Drive Type: Data
         Drive Interface Type: SATA
         Size: 6001.1 GB
         Drive exposed to OS: False
         Native Block Size: 4096
         Rotational Speed: 7200
         Firmware Revision: T7MH
         Serial Number: NCH8E81Z
         Model: ATA HUS726060ALE610
         SATA NCQ Capable: True
         SATA NCQ Enabled: True
         Current Temperature (C): 33
         Maximum Temperature (C): 41
         PHY Count: 1
         PHY Transfer Rate: 6.0Gbps
         Drive Authentication Status: Not Authenticated. Smart Array will not control drive LEDs.
         Sanitize Erase Supported: False

      physicaldrive 1I:2:4
         Port: 1I
         Box: 2
         Bay: 4
         Status: OK
         Drive Type: Data
         Drive Interface Type: SATA
         Size: 6001.1 GB
         Drive exposed to OS: False
         Native Block Size: 4096
         Rotational Speed: 7200
         Firmware Revision: APGNW7JH
         Serial Number: NAHYMAUY
         Model: ATA HGST HDN726060AL
         SATA NCQ Capable: True
         SATA NCQ Enabled: True
         Current Temperature (C): 34
         Maximum Temperature (C): 41
         PHY Count: 1
         PHY Transfer Rate: 6.0Gbps
         Drive Authentication Status: Not Authenticated. Smart Array will not control drive LEDs.
         Sanitize Erase Supported: False

      physicaldrive 1I:2:5
         Port: 1I
         Box: 2
         Bay: 5
         Status: Failed
         Last Failure Reason: Write retries failed
         Drive Type: Data
         Drive Interface Type: SATA
         Size: 6001.1 GB
         Drive exposed to OS: False
         Native Block Size: 4096
         Rotational Speed: 7200
         Firmware Revision: T7MH
         Serial Number: K1H942MD
         Model: ATA HUS726060ALE610
         SATA NCQ Capable: True
         SATA NCQ Enabled: True
         Maximum Temperature (C): 43
         PHY Count: 1
         PHY Transfer Rate: 6.0Gbps
         Drive Authentication Status: Not Applicable
         Sanitize Erase Supported: False

      physicaldrive 1I:2:6
         Port: 1I
         Box: 2
         Bay: 6
         Status: OK
         Drive Type: Data
         Drive Interface Type: SATA
         Size: 6001.1 GB
         Drive exposed to OS: False
         Native Block Size: 4096
         Rotational Speed: 7200
         Firmware Revision: TDR2
         Serial Number: K8JM5TKN
         Model: ATA HUS726060ALE610
         SATA NCQ Capable: True
         SATA NCQ Enabled: True
         Current Temperature (C): 33
         Maximum Temperature (C): 38
         PHY Count: 1
         PHY Transfer Rate: 6.0Gbps
         Drive Authentication Status: Not Authenticated. Smart Array will not control drive LEDs.
         Sanitize Erase Supported: False

      physicaldrive 1I:2:7
         Port: 1I
         Box: 2
         Bay: 7
         Status: OK
         Drive Type: Data
         Drive Interface Type: SATA
         Size: 6001.1 GB
         Drive exposed to OS: False
         Native Block Size: 4096
         Rotational Speed: 7200
         Firmware Revision: APGNW7JH
         Serial Number: K8H9BW2N
         Model: ATA HGST HDN726060AL
         SATA NCQ Capable: True
         SATA NCQ Enabled: True
         Current Temperature (C): 34
         Maximum Temperature (C): 39
         PHY Count: 1
         PHY Transfer Rate: 6.0Gbps
         Drive Authentication Status: Not Authenticated. Smart Array will not control drive LEDs.
         Sanitize Erase Supported: False

      physicaldrive 1I:2:8
         Port: 1I
         Box: 2
         Bay: 8
         Status: OK
         Drive Type: Data
         Drive Interface Type: SATA
         Size: 6001.1 GB
         Drive exposed to OS: False
         Native Block Size: 4096
         Rotational Speed: 7200
         Firmware Revision: T7MH
         Serial Number: K1H623JD
         Model: ATA HUS726060ALE610
         SATA NCQ Capable: True
         SATA NCQ Enabled: True
         Current Temperature (C): 35
         Maximum Temperature (C): 40
         PHY Count: 1
         PHY Transfer Rate: 6.0Gbps
         Drive Authentication Status: Not Authenticated. Smart Array will not control drive LEDs.
         Sanitize Erase Supported: False
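As a sanity check, the sizes reported for array B are consistent with an 8-drive RAID 5. The sketch below reproduces them; it assumes that drive sizes are printed in decimal GB while array and logical-drive sizes are in binary units labelled "TB". That is an inference from the numbers, not something the output states.

```python
DRIVE_GB = 6001.1   # per-drive size as reported (assumed to be decimal GB)
DRIVES = 8          # physical drives in array B
STRIP_KB = 256      # "Strip Size" of logical drive 2

def to_tib(decimal_gb):
    """Convert decimal gigabytes to binary tebibytes."""
    return decimal_gb * 1e9 / 2**40

# RAID 5 spends one drive's worth of capacity on parity.
data_drives = DRIVES - 1

raw_tib = to_tib(DRIVE_GB * DRIVES)          # compare with "Used Space" of array B
usable_tib = to_tib(DRIVE_GB * data_drives)  # compare with "Size" of logical drive 2
full_stripe_kb = STRIP_KB * data_drives      # compare with "Full Stripe Size"

print(f"raw: {raw_tib:.1f}  usable: {usable_tib:.1f}  full stripe: {full_stripe_kb} KB")
```

These work out to about 43.7 and 38.2 for the capacities and 1792 KB for the full stripe, matching the reported "Used Space: 43.7 TB", "Size: 38.2 TB" and "Full Stripe Size: 1792 KB".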
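You probably have a badly punctured array which, because stripe reconstruction fails, condemns replacement disks to a premature "programmed death". You can read more here and here.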
The solution is to back up, destroy the array, recreate it, and restore from backup.
Next time, avoid RAID5 arrays with drives this large. I strongly suggest RAID6 or, even better, RAID10.
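One common argument behind this advice is the chance of hitting an unrecoverable read error (URE) while rebuilding, since a RAID5 rebuild has to read every surviving drive in full. A rough sketch of that arithmetic follows. The URE rates are assumed datasheet-style figures (1 per 10^14 and 1 per 10^15 bits read), not values measured on these drives, and the model treats bit errors as independent, which real disks do not strictly obey.

```python
import math

def p_ure_during_rebuild(surviving_drives, drive_bytes, ure_per_bit):
    """Probability of at least one URE when every surviving drive is read once."""
    bits_read = surviving_drives * drive_bytes * 8
    # 1 - (1 - p)^n, written with log1p/expm1 so tiny p does not lose precision
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))

DRIVE_BYTES = 6e12  # nominal 6 TB drive

# 8-drive RAID5 with one failed drive: 7 drives must be read completely.
for rate in (1e-14, 1e-15):  # assumed URE rates per bit read
    p = p_ure_during_rebuild(7, DRIVE_BYTES, rate)
    print(f"URE rate {rate:g}/bit -> P(at least one URE) ~ {p:.0%}")
```

Under these simplified assumptions the result comes out at roughly 97% and 29%. Real-world rates are usually better than the datasheet worst case, so read this as an illustration of why a second parity (RAID6) helps, not as a prediction for this array.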
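With the size and type of disks in your system, you should be using RAID6. However, there is nothing inherently wrong with running RAID5 on an HP Smart Array RAID controller. I think your problem is the result of using consumer disks in a setup that is not certified for the server hardware.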
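Some details about the server would help, though.
Is this an HPE server, or are you just using an HPE controller?
These do not appear to be HPE drives or HPE drive carriers. That is a bad sign.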
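The hpssacli output you provide will also show the reason for the disk failures. If you are not on an HPE server and there is a backplane issue or SATA timeouts (I note that you are on SATA disks), you may be getting false positives. Example (see the Last Failure Reason line):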
physicaldrive 2I:2:8
   Port: 2I
   Box: 2
   Bay: 8
   Status: Failed
   Last Failure Reason: Aborted Command
   Drive Type: Data
   Drive
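To collect the failure reason for every drive from a long detail dump, something like the following could be used. It is a minimal sketch with illustrative names (`failure_reasons`, `SAMPLE`); it assumes each per-drive section starts with `physicaldrive <id>` followed by `Port:`, and that a failed drive prints `Last Failure Reason:` just before `Drive Type:`, as in the examples in this thread.

```python
import re

# Start of a per-drive detail section, e.g. "physicaldrive 1I:2:5 Port: 1I ...".
# Summary entries like "physicaldrive 1I:2:5 (port 1I:box 2:bay 5, ...)" do not match.
SECTION_RE = re.compile(r"physicaldrive\s+(\S+)\s+Port:")
# Assumption: the reason runs up to the following "Drive Type:" field.
REASON_RE = re.compile(r"Last Failure Reason:\s*(.+?)\s+Drive Type:", re.S)

def failure_reasons(detail_text):
    """Map drive id -> last failure reason for drives that report one."""
    reasons = {}
    sections = list(SECTION_RE.finditer(detail_text))
    for i, match in enumerate(sections):
        # A section ends where the next drive section begins.
        end = sections[i + 1].start() if i + 1 < len(sections) else len(detail_text)
        found = REASON_RE.search(detail_text[match.end():end])
        if found:
            reasons[match.group(1)] = found.group(1)
    return reasons

# Sample input stitched together from fragments quoted in this thread.
SAMPLE = (
    "physicaldrive 1I:2:4 Port: 1I Box: 2 Bay: 4 Status: OK Drive Type: Data "
    "physicaldrive 1I:2:5 Port: 1I Box: 2 Bay: 5 Status: Failed "
    "Last Failure Reason: Write retries failed Drive Type: Data "
    "physicaldrive 2I:2:8 Port: 2I Box: 2 Bay: 8 Status: Failed "
    "Last Failure Reason: Aborted Command Drive Type: Data Drive"
)

if __name__ == "__main__":
    for drive, reason in failure_reasons(SAMPLE).items():
        print(drive, "->", reason)
```

The patterns ignore line breaks, so the same function should work on both the one-line paste and normally formatted output.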