Raid

HP P840 HDD RAID 5: lots of weird drive failures

  • October 7, 2019

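I have been running RAID5 hard disk storage (8x6TB) on my HP P840 for about 2 years, and it has always had an unusually high number of drive failures. Everything was fine for the last half year, but now drives are failing in a weird way. For example, 2 new drives failed a few days after they were added to the RAID. I have also already replaced the RAID controller, and I am running the latest firmware on both the mainboard and the RAID controller.

I have also tried different drives. The RAID originally used HGST DeskStar 6TB drives; when replacing failed drives I have been swapping in HGST UltraStar 6TB drives instead. The behaviour is the same.

It also seems that (most of) the drives did not really fail: as soon as I replaced the RAID controller, one failed drive was recognized as OK again and started rebuilding.

My hosting support says the problem is that I am using RAID5 at all and that I should use RAID10 instead. I find that hard to believe, because I have been running RAID5 on other systems without problems (no drive failures for years).

Can anybody give me a hint as to what the culprit might be? Is something wrong with the way the RAID controller is configured?

Thanks!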

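*Edit:

The server is an HP DL180 G9

The drive failure reason is always "Write retries failed"*

Update: Our hoster had us replace the hardware completely and switch to RAID6. We did that, and it has been running smoothly for a while now. Although this was never really investigated, I believe shodanshok's explanation of a punctured array is plausible, so I will accept that answer. Thanks, everyone!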

 Smart Array P840 in Slot 1                (sn: PDNNF0ARH321GD)


    Port Name: 1I

    Port Name: 2I

    Internal Drive Cage at Port 1I, Box 2, OK

    Internal Drive Cage at Port 1I, Box 2, OK

    Internal Drive Cage at Port 2I, Box 1, OK
    array A (Solid State SATA, Unused Space: 0  MB)


 logicaldrive 1 (447.1 GB, RAID 1+0, OK)

 physicaldrive 2I:1:1 (port 2I:box 1:bay 1, Solid State SATA, 240.0 GB, OK)
 physicaldrive 2I:1:2 (port 2I:box 1:bay 2, Solid State SATA, 240.0 GB, OK)
 physicaldrive 2I:1:3 (port 2I:box 1:bay 3, Solid State SATA, 240.0 GB, OK)
 physicaldrive 2I:1:4 (port 2I:box 1:bay 4, Solid State SATA, 240.0 GB, OK)

    array B (SATA, Unused Space: 0  MB)


 logicaldrive 2 (38.2 TB, RAID 5, Interim Recovery Mode)

 physicaldrive 1I:2:1 (port 1I:box 2:bay 1, SATA, 6001.1 GB, OK)
 physicaldrive 1I:2:2 (port 1I:box 2:bay 2, SATA, 6001.1 GB, OK)
 physicaldrive 1I:2:3 (port 1I:box 2:bay 3, SATA, 6001.1 GB, OK)
 physicaldrive 1I:2:4 (port 1I:box 2:bay 4, SATA, 6001.1 GB, OK)
 physicaldrive 1I:2:5 (port 1I:box 2:bay 5, SATA, 6001.1 GB, Failed)
 physicaldrive 1I:2:6 (port 1I:box 2:bay 6, SATA, 6001.1 GB, OK)
 physicaldrive 1I:2:7 (port 1I:box 2:bay 7, SATA, 6001.1 GB, OK)
 physicaldrive 1I:2:8 (port 1I:box 2:bay 8, SATA, 6001.1 GB, OK)

Details:

    Smart Array P840 in Slot 1
       Bus Interface: PCI
       Slot: 1
       Serial Number: PDNNF0ARH321GD
       Cache Serial Number: PEYFP0BRH323YZ
       RAID 6 (ADG) Status: Enabled
       Controller Status: OK
       Hardware Revision: B
       Firmware Version: 6.60
       Rebuild Priority: High
       Expand Priority: Medium
       Surface Scan Delay: 3 secs
       Surface Scan Mode: Idle
       Parallel Surface Scan Supported: Yes
       Current Parallel Surface Scan Count: 1
       Max Parallel Surface Scan Count: 16
       Queue Depth: Automatic
       Monitor and Performance Delay: 60  min
       Elevator Sort: Enabled
       Degraded Performance Optimization: Disabled
       Inconsistency Repair Policy: Disabled
       Wait for Cache Room: Disabled
       Surface Analysis Inconsistency Notification: Disabled
       Post Prompt Timeout: 15 secs
       Cache Board Present: True
    Cache Status: OK
    Cache Ratio: 10% Read / 90% Write
    Drive Write Cache: Enabled
    Total Cache Size: 4.0 GB
    Total Cache Memory Available: 3.2 GB
    No-Battery Write Cache: Enabled
    SSD Caching RAID5 WriteBack Enabled: True
    SSD Caching Version: 2
    Cache Backup Power Source: Batteries
    Battery/Capacitor Count: 1
    Battery/Capacitor Status: OK
    SATA NCQ Supported: True
    Spare Activation Mode: Activate on physical drive failure (default)
    Controller Temperature (C): 51
    Cache Module Temperature (C): 38
    Number of Ports: 2 Internal only
    Encryption: Disabled
    Express Local Encryption: False
    Driver Name: hpsa
    Driver Version: 3.4.16
    Driver Supports HP SSD Smart Path: True
    PCI Address (Domain:Bus:Device.Function): 0000:06:00.0
    Negotiated PCIe Data Rate: PCIe 3.0 x8 (7880 MB/s)
    Controller Mode: RAID
    Controller Mode Reboot: Not Required
    Latency Scheduler Setting: Disabled
    Current Power Mode: MaxPerformance
    Host Serial Number: CZ270500GM
    Sanitize Erase Supported: False
    Primary Boot Volume: logicaldrive 1 (600508B1001CE0F9FACF3A1358647115)
    Secondary Boot Volume: logicaldrive 1 (600508B1001CE0F9FACF3A1358647115)


    Port Name: 1I
          Port ID: 0
          Port Connection Number: 0
          SAS Address: 5001438038AD05A0
          Port Location: Internal
          Managed Cable Connected: False

    Port Name: 2I
          Port ID: 1
          Port Connection Number: 1
          SAS Address: 5001438038AD05A8
          Port Location: Internal
          Managed Cable Connected: False

    Internal Drive Cage at Port 1I, Box 2, OK
       Power Supply Status: Not Redundant
       Drive Bays: 4
       Port: 1I
       Box: 2
       Location: Internal

    Physical Drives
       physicaldrive 1I:2:1 (port 1I:box 2:bay 1, SATA, 6001.1 GB, OK)
       physicaldrive 1I:2:2 (port 1I:box 2:bay 2, SATA, 6001.1 GB, OK)
       physicaldrive 1I:2:3 (port 1I:box 2:bay 3, SATA, 6001.1 GB, OK)
       physicaldrive 1I:2:4 (port 1I:box 2:bay 4, SATA, 6001.1 GB, OK)
       None attached


    Internal Drive Cage at Port 1I, Box 2, OK
       Power Supply Status: Not Redundant
       Drive Bays: 4
       Port: 1I
       Box: 2
       Location: Internal

    Physical Drives
       physicaldrive 1I:2:1 (port 1I:box 2:bay 1, SATA, 6001.1 GB, OK)
       physicaldrive 1I:2:2 (port 1I:box 2:bay 2, SATA, 6001.1 GB, OK)
       physicaldrive 1I:2:3 (port 1I:box 2:bay 3, SATA, 6001.1 GB, OK)
       physicaldrive 1I:2:4 (port 1I:box 2:bay 4, SATA, 6001.1 GB, OK)
       None attached


    Internal Drive Cage at Port 2I, Box 1, OK
       Power Supply Status: Not Redundant
       Drive Bays: 4
       Port: 2I
       Box: 1
       Location: Internal

    Physical Drives
       physicaldrive 2I:1:1 (port 2I:box 1:bay 1, Solid State SATA, 240.0 GB, OK)
       physicaldrive 2I:1:2 (port 2I:box 1:bay 2, Solid State SATA, 240.0 GB, OK)
       physicaldrive 2I:1:3 (port 2I:box 1:bay 3, Solid State SATA, 240.0 GB, OK)
       physicaldrive 2I:1:4 (port 2I:box 1:bay 4, Solid State SATA, 240.0 GB, OK)
       None attached

    Array: A
       Interface Type: Solid State SATA
       Unused Space: 0  MB (0.0%)
       Used Space: 894.2 GB (100.0%)
       Status: OK
       MultiDomain Status: OK
       Array Type: Data
       HP SSD Smart Path: disable



 Logical Drive: 1
    Size: 447.1 GB
    Fault Tolerance: 1+0
    Heads: 255
    Sectors Per Track: 32
    Cylinders: 65535
    Strip Size: 256 KB
    Full Stripe Size: 512 KB
    Status: OK
    MultiDomain Status: OK
    Caching:  Enabled
    Unique Identifier: 600508B1001CE0F9FACF3A1358647115
    Disk Name: /dev/sda
    Mount Points: / 18.6 GB Partition Number 2
    OS Status: LOCKED
    Logical Drive Label: 0216D6F9PDNNF0ARH502MC7DFA
    Mirror Group 1:
       physicaldrive 2I:1:1 (port 2I:box 1:bay 1, Solid State SATA, 240.0 GB, OK)
       physicaldrive 2I:1:2 (port 2I:box 1:bay 2, Solid State SATA, 240.0 GB, OK)
    Mirror Group 2:
       physicaldrive 2I:1:3 (port 2I:box 1:bay 3, Solid State SATA, 240.0 GB, OK)
       physicaldrive 2I:1:4 (port 2I:box 1:bay 4, Solid State SATA, 240.0 GB, OK)
    Drive Type: Data
    LD Acceleration Method: Controller Cache

 physicaldrive 2I:1:1
    Port: 2I
    Box: 1
    Bay: 1
    Status: OK
    Drive Type: Data Drive
    Interface Type: Solid State SATA
    Size: 240.0 GB
    Drive exposed to OS: False
    Native Block Size: 4096
    Firmware Revision: N2010101
    Serial Number: PHDV712004AG240AGN
    Model: ATA     INTEL SSDSC2BB24
    SATA NCQ Capable: True
    SATA NCQ Enabled: True
    Current Temperature (C): 31
    Maximum Temperature (C): 39
    SSD Smart Trip Wearout: Not Supported
    PHY Count: 1
    PHY Transfer Rate: 6.0Gbps
    Drive Authentication Status: Not Authenticated. Smart Array will not control drive LEDs.
    Sanitize Erase Supported: False

 physicaldrive 2I:1:2
    Port: 2I
    Box: 1
    Bay: 2
    Status: OK
    Drive Type: Data Drive
    Interface Type: Solid State SATA
    Size: 240.0 GB
    Drive exposed to OS: False
    Native Block Size: 4096
    Firmware Revision: N2010101
    Serial Number: PHDV706303CH240AGN
    Model: ATA     INTEL SSDSC2BB24
    SATA NCQ Capable: True
    SATA NCQ Enabled: True
    Current Temperature (C): 29
    Maximum Temperature (C): 36
    SSD Smart Trip Wearout: Not Supported
    PHY Count: 1
    PHY Transfer Rate: 6.0Gbps
    Drive Authentication Status: Not Authenticated. Smart Array will not control drive LEDs.
    Sanitize Erase Supported: False

 physicaldrive 2I:1:3
    Port: 2I
    Box: 1
    Bay: 3
    Status: OK
    Drive Type: Data Drive
    Interface Type: Solid State SATA
    Size: 240.0 GB
    Drive exposed to OS: False
    Native Block Size: 4096
    Firmware Revision: N2010101
    Serial Number: PHDV712003V8240AGN
    Model: ATA     INTEL SSDSC2BB24
    SATA NCQ Capable: True
    SATA NCQ Enabled: True
    Current Temperature (C): 29
    Maximum Temperature (C): 35
    SSD Smart Trip Wearout: Not Supported
    PHY Count: 1
    PHY Transfer Rate: 6.0Gbps
    Drive Authentication Status: Not Authenticated. Smart Array will not control drive LEDs.
    Sanitize Erase Supported: False

 physicaldrive 2I:1:4
    Port: 2I
    Box: 1
    Bay: 4
    Status: OK
    Drive Type: Data Drive
    Interface Type: Solid State SATA
    Size: 240.0 GB
    Drive exposed to OS: False
    Native Block Size: 4096
    Firmware Revision: N2010101
    Serial Number: PHDV712004GA240AGN
    Model: ATA     INTEL SSDSC2BB24
    SATA NCQ Capable: True
    SATA NCQ Enabled: True
    Current Temperature (C): 31
    Maximum Temperature (C): 37
    SSD Smart Trip Wearout: Not Supported
    PHY Count: 1
    PHY Transfer Rate: 6.0Gbps
    Drive Authentication Status: Not Authenticated. Smart Array will not control drive LEDs.
    Sanitize Erase Supported: False


    Array: B
       Interface Type: SATA
       Unused Space: 0  MB (0.0%)
       Used Space: 43.7 TB (100.0%)
       Status: Failed Physical Drive
       MultiDomain Status: OK
       Array Type: Data
       HP SSD Smart Path: disable

       Warning: One of the drives on this array have failed or has been removed.




 Logical Drive: 2
    Size: 38.2 TB
    Fault Tolerance: 5
    Heads: 255
    Sectors Per Track: 32
    Cylinders: 65535
    Strip Size: 256 KB
    Full Stripe Size: 1792 KB
    Status: Interim Recovery Mode
    MultiDomain Status: OK
    Caching:  Enabled
    Parity Initialization Status: Initialization Failed
    Unique Identifier: 600508B1001CF94F84873C91FD89B549
    Disk Name: /dev/sdb
    Mount Points: None
    Logical Drive Label: 04DA1DD6PDNNF0ARH502MC546F
    Drive Type: Data
    LD Acceleration Method: Controller Cache

 physicaldrive 1I:2:1
    Port: 1I
    Box: 2
    Bay: 1
    Status: OK
    Drive Type: Data Drive
    Interface Type: SATA
    Size: 6001.1 GB
    Drive exposed to OS: False
    Native Block Size: 4096
    Rotational Speed: 7200
    Firmware Revision: APGNW7JH
    Serial Number: NAHN3UZY
    Model: ATA     HGST HDN726060AL
    SATA NCQ Capable: True
    SATA NCQ Enabled: True
    Current Temperature (C): 37
    Maximum Temperature (C): 43
    PHY Count: 1
    PHY Transfer Rate: 6.0Gbps
    Drive Authentication Status: Not Authenticated. Smart Array will not control drive LEDs.
    Sanitize Erase Supported: False

 physicaldrive 1I:2:2
    Port: 1I
    Box: 2
    Bay: 2
    Status: OK
    Drive Type: Data Drive
    Interface Type: SATA
    Size: 6001.1 GB
    Drive exposed to OS: False
    Native Block Size: 4096
    Rotational Speed: 7200
    Firmware Revision: APGNT517
    Serial Number: NAHLKP0X
    Model: ATA     HGST HDN726060AL
    SATA NCQ Capable: True
    SATA NCQ Enabled: True
    Current Temperature (C): 37
    Maximum Temperature (C): 56
    PHY Count: 1
    PHY Transfer Rate: 6.0Gbps
    Drive Authentication Status: Not Authenticated. Smart Array will not control drive LEDs.
    Sanitize Erase Supported: False

 physicaldrive 1I:2:3
    Port: 1I
    Box: 2
    Bay: 3
    Status: OK
    Drive Type: Data Drive
    Interface Type: SATA
    Size: 6001.1 GB
    Drive exposed to OS: False
    Native Block Size: 4096
    Rotational Speed: 7200
    Firmware Revision: T7MH
    Serial Number: NCH8E81Z
    Model: ATA     HUS726060ALE610
    SATA NCQ Capable: True
    SATA NCQ Enabled: True
    Current Temperature (C): 33
    Maximum Temperature (C): 41
    PHY Count: 1
    PHY Transfer Rate: 6.0Gbps
    Drive Authentication Status: Not Authenticated. Smart Array will not control drive LEDs.
    Sanitize Erase Supported: False

 physicaldrive 1I:2:4
    Port: 1I
    Box: 2
    Bay: 4
    Status: OK
    Drive Type: Data Drive
    Interface Type: SATA
    Size: 6001.1 GB
    Drive exposed to OS: False
    Native Block Size: 4096
    Rotational Speed: 7200
    Firmware Revision: APGNW7JH
    Serial Number: NAHYMAUY
    Model: ATA     HGST HDN726060AL
    SATA NCQ Capable: True
    SATA NCQ Enabled: True
    Current Temperature (C): 34
    Maximum Temperature (C): 41
    PHY Count: 1
    PHY Transfer Rate: 6.0Gbps
    Drive Authentication Status: Not Authenticated. Smart Array will not control drive LEDs.
    Sanitize Erase Supported: False

 physicaldrive 1I:2:5
    Port: 1I
    Box: 2
    Bay: 5
    Status: Failed
    Last Failure Reason: Write retries failed
    Drive Type: Data Drive
    Interface Type: SATA
    Size: 6001.1 GB
    Drive exposed to OS: False
    Native Block Size: 4096
    Rotational Speed: 7200
    Firmware Revision: T7MH
    Serial Number: K1H942MD
    Model: ATA     HUS726060ALE610
    SATA NCQ Capable: True
    SATA NCQ Enabled: True
    Maximum Temperature (C): 43
    PHY Count: 1
    PHY Transfer Rate: 6.0Gbps
    Drive Authentication Status: Not Applicable
    Sanitize Erase Supported: False

 physicaldrive 1I:2:6
    Port: 1I
    Box: 2
    Bay: 6
    Status: OK
    Drive Type: Data Drive
    Interface Type: SATA
    Size: 6001.1 GB
    Drive exposed to OS: False
    Native Block Size: 4096
    Rotational Speed: 7200
    Firmware Revision: TDR2
    Serial Number: K8JM5TKN
    Model: ATA     HUS726060ALE610
    SATA NCQ Capable: True
    SATA NCQ Enabled: True
    Current Temperature (C): 33
    Maximum Temperature (C): 38
    PHY Count: 1
    PHY Transfer Rate: 6.0Gbps
    Drive Authentication Status: Not Authenticated. Smart Array will not control drive LEDs.
    Sanitize Erase Supported: False

 physicaldrive 1I:2:7
    Port: 1I
    Box: 2
    Bay: 7
    Status: OK
    Drive Type: Data Drive
    Interface Type: SATA
    Size: 6001.1 GB
    Drive exposed to OS: False
    Native Block Size: 4096
    Rotational Speed: 7200
    Firmware Revision: APGNW7JH
    Serial Number: K8H9BW2N
    Model: ATA     HGST HDN726060AL
    SATA NCQ Capable: True
    SATA NCQ Enabled: True
    Current Temperature (C): 34
    Maximum Temperature (C): 39
    PHY Count: 1
    PHY Transfer Rate: 6.0Gbps
    Drive Authentication Status: Not Authenticated. Smart Array will not control drive LEDs.
    Sanitize Erase Supported: False

 physicaldrive 1I:2:8
    Port: 1I
    Box: 2
    Bay: 8
    Status: OK
    Drive Type: Data Drive
    Interface Type: SATA
    Size: 6001.1 GB
    Drive exposed to OS: False
    Native Block Size: 4096
    Rotational Speed: 7200
    Firmware Revision: T7MH
    Serial Number: K1H623JD
    Model: ATA     HUS726060ALE610
    SATA NCQ Capable: True
    SATA NCQ Enabled: True
    Current Temperature (C): 35
    Maximum Temperature (C): 40
    PHY Count: 1
    PHY Transfer Rate: 6.0Gbps
    Drive Authentication Status: Not Authenticated. Smart Array will not control drive LEDs.
    Sanitize Erase Supported: False
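
As a sanity check on the figures in the output above, the following sketch recomputes the size and full stripe size of logical drive 2 for an 8-drive RAID 5 set. It assumes the controller reports "GB" in decimal units and "TB" in binary units (2^40 bytes); that is an inference from the numbers, not something stated in the thread, and the function names are illustrative.

```python
# Figures copied from the controller output above.
DRIVE_SIZE_GB = 6001.1   # per physical drive, decimal gigabytes
DRIVE_COUNT = 8          # drives in array B
STRIP_SIZE_KB = 256      # per-drive strip size of logical drive 2


def raid5_data_drives(drive_count):
    """RAID 5 spends one drive's worth of capacity on parity."""
    return drive_count - 1


def gb_to_tib(gigabytes):
    """Convert decimal gigabytes to binary terabytes (2**40 bytes)."""
    return gigabytes * 1e9 / 2**40


data_drives = raid5_data_drives(DRIVE_COUNT)
usable_tib = gb_to_tib(DRIVE_SIZE_GB * data_drives)   # logical drive size
raw_tib = gb_to_tib(DRIVE_SIZE_GB * DRIVE_COUNT)      # array "Used Space"
full_stripe_kb = STRIP_SIZE_KB * data_drives          # data strips per stripe

print(f"usable: {usable_tib:.1f}  raw: {raw_tib:.1f}  stripe: {full_stripe_kb} KB")
```

Under those assumptions the results line up with the reported 38.2 TB logical drive, 43.7 TB used space, and 1792 KB full stripe size, so the array layout itself looks like an ordinary 7+1 RAID 5.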

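You probably have a badly punctured array which, because of failed stripe reconstruction, causes replacement disks to be prematurely "scheduled to death". The original answer links two pages with more information.

The solution is to back up, destroy the array, recreate the array, and restore from backup.

Next time, avoid using a RAID5 array with such big drives. I strongly suggest using RAID6 or, even better, RAID10.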
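
To illustrate why large drives make RAID5 rebuilds risky, here is a rough back-of-the-envelope sketch. It estimates the chance of hitting at least one unrecoverable read error (URE) while reading every surviving drive during a rebuild. The URE rates of 1e-14 and 1e-15 per bit are commonly quoted datasheet values used here as assumptions, not figures from this thread, and the model treats errors as independent, which real drives do not strictly obey.

```python
import math


def p_ure_during_rebuild(surviving_drives, drive_bytes, ure_per_bit):
    """Probability of at least one URE when reading all surviving drives.

    Simplified model: every bit read fails independently with
    probability ure_per_bit, so P = 1 - (1 - p) ** bits_read.
    """
    bits_read = surviving_drives * drive_bytes * 8
    # expm1/log1p keep the result accurate for very small p.
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))


DRIVE_BYTES = 6001.1e9   # drive size reported by the controller
SURVIVORS = 7            # 8-drive RAID5 with one drive failed

for rate in (1e-14, 1e-15):   # assumed datasheet URE rates
    p = p_ure_during_rebuild(SURVIVORS, DRIVE_BYTES, rate)
    print(f"URE rate {rate:g} per bit -> P(at least one URE) = {p:.2f}")
```

With these assumed inputs the model gives roughly 0.97 and 0.29. Datasheet rates are worst-case bounds, so real-world odds are usually better, but the estimate shows why a second level of redundancy (RAID6) or mirroring (RAID10) is commonly recommended for arrays of this size.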

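You should be using RAID6 with the size and type of disks you have in the system. However, there is nothing inherently wrong with running RAID5 on HP Smart Array RAID controllers. I think your issues are a result of using consumer disks in a setup that is not certified with the server hardware.

Some details about the server may be helpful, though.

Is this an HPE server, or are you just using an HPE controller?

These don't appear to be HPE drives or HPE drive carriers. That's a bad sign.

The hpssacli output you provided will also show the reason for the disk failure. If you're not on an HPE server and have a backplane problem or SATA timeouts (I note that you're on SATA disks), you may be getting false positives.

Example (see the Last Failure Reason line):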

 physicaldrive 2I:2:8
    Port: 2I
    Box: 2
    Bay: 8
    Status: Failed
    Last Failure Reason: Aborted Command
    Drive Type: Data Drive
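
When the report is as long as the one in the question, it can help to pull out just the failed drives and their failure reasons. The sketch below is one way to do that, assuming the detailed report has been saved as plain text in the same layout as quoted in this thread; the function names and the trimmed sample are illustrative and not part of any HPE tool.

```python
import re

# A detail block starts with a bare "physicaldrive <port>:<box>:<bay>" line.
HEADER = re.compile(r"^\s*physicaldrive\s+(\S+)\s*$")
# Inside a block every line has the form "Key: value".
FIELD = re.compile(r"^\s*([^:]+):\s*(.*)$")


def parse_physical_drives(report):
    """Map each drive id to a dict of its 'Key: value' fields."""
    drives = {}
    current = None
    for line in report.splitlines():
        header = HEADER.match(line)
        if header:
            current = drives.setdefault(header.group(1), {})
            continue
        if not line.strip():
            current = None   # a blank line ends the current block
            continue
        field = FIELD.match(line)
        if current is not None and field:
            current[field.group(1).strip()] = field.group(2).strip()
    return drives


def failed_drives(report):
    """Return (drive id, last failure reason) for drives marked Failed."""
    return [
        (drive_id, fields.get("Last Failure Reason", "unknown"))
        for drive_id, fields in parse_physical_drives(report).items()
        if fields.get("Status") == "Failed"
    ]


# Trimmed sample taken from the output in the question.
SAMPLE = """
 physicaldrive 1I:2:5
    Port: 1I
    Status: Failed
    Last Failure Reason: Write retries failed

 physicaldrive 1I:2:6
    Port: 1I
    Status: OK
"""

print(failed_drives(SAMPLE))   # [('1I:2:5', 'Write retries failed')]
```

The one-line summary entries such as "physicaldrive 1I:2:5 (port 1I:box 2:bay 5, ...)" are deliberately ignored, because the header pattern only matches a bare drive id.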

Source: https://serverfault.com/questions/958650