SATA

SATA disk failing, but the errors occur periodically?

  • June 14, 2022

I have a Seagate ST2000DM001 2TB Barracuda SATA3 disk that produces errors like the following:

[Tue Jun 14 10:02:06 2022] ata2.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
[Tue Jun 14 10:02:06 2022] ata2.00: failed command: WRITE FPDMA QUEUED
[Tue Jun 14 10:02:06 2022] ata2.00: cmd 61/00:00:00:48:9f/02:00:b2:00:00/40 tag 0 ncq 262144 out
[Tue Jun 14 10:02:06 2022]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[Tue Jun 14 10:02:06 2022] ata2.00: status: { DRDY }
[Tue Jun 14 10:02:06 2022] ata2: hard resetting link
[Tue Jun 14 10:02:16 2022] ata2: softreset failed (1st FIS failed)
[Tue Jun 14 10:02:16 2022] ata2: hard resetting link
[Tue Jun 14 10:02:26 2022] ata2: softreset failed (1st FIS failed)
[Tue Jun 14 10:02:26 2022] ata2: hard resetting link
[Tue Jun 14 10:02:42 2022] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[Tue Jun 14 10:02:42 2022] ata2.00: configured for UDMA/133
[Tue Jun 14 10:02:42 2022] ata2.00: device reported invalid CHS sector 0
[Tue Jun 14 10:02:42 2022] ata2: EH complete

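I have tested the disk with different cables on different machines, but the errors persist. It looks like a clear-cut case of a failing disk, but there is a twist. A very long mkfs.ext4 -c -c run on the disk shows a periodic pattern in the errors: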

[Mon Jun 13 10:47:02 2022] ata2.00: failed command: WRITE FPDMA QUEUED
[Mon Jun 13 11:51:08 2022] ata2.00: failed command: WRITE FPDMA QUEUED
[Mon Jun 13 12:55:14 2022] ata2.00: failed command: WRITE FPDMA QUEUED
[Mon Jun 13 14:01:21 2022] ata2.00: failed command: READ FPDMA QUEUED
[Mon Jun 13 15:08:27 2022] ata2.00: failed command: READ FPDMA QUEUED
[Mon Jun 13 16:15:33 2022] ata2.00: failed command: READ FPDMA QUEUED
[Mon Jun 13 17:22:39 2022] ata2.00: failed command: WRITE FPDMA QUEUED
[Mon Jun 13 18:29:43 2022] ata2.00: failed command: WRITE FPDMA QUEUED
[Mon Jun 13 19:36:49 2022] ata2.00: failed command: WRITE FPDMA QUEUED
[Mon Jun 13 20:43:55 2022] ata2.00: failed command: WRITE FPDMA QUEUED
[Mon Jun 13 21:50:02 2022] ata2.00: failed command: READ FPDMA QUEUED
[Mon Jun 13 22:57:08 2022] ata2.00: failed command: READ FPDMA QUEUED
[Tue Jun 14 00:04:14 2022] ata2.00: failed command: READ FPDMA QUEUED
[Tue Jun 14 01:11:17 2022] ata2.00: failed command: WRITE FPDMA QUEUED
[Tue Jun 14 02:15:24 2022] ata2.00: failed command: WRITE FPDMA QUEUED
[Tue Jun 14 03:19:30 2022] ata2.00: failed command: WRITE FPDMA QUEUED
[Tue Jun 14 04:26:36 2022] ata2.00: failed command: READ FPDMA QUEUED
[Tue Jun 14 05:33:42 2022] ata2.00: failed command: READ FPDMA QUEUED
[Tue Jun 14 06:40:48 2022] ata2.00: failed command: READ FPDMA QUEUED
[Tue Jun 14 07:47:54 2022] ata2.00: failed command: WRITE FPDMA QUEUED
[Tue Jun 14 08:55:00 2022] ata2.00: failed command: WRITE FPDMA QUEUED
[Tue Jun 14 10:02:06 2022] ata2.00: failed command: WRITE FPDMA QUEUED

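There is one almost every 1 hour 7 minutes. I thought it might be related to smartd, but smartd is not running. So I am stuck: what kind of hardware failure produces periodic errors with a period of 1 hour 7 minutes? Any ideas would be much appreciated.

Regards,

Nicolas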

That is almost 4000 seconds (1 hour 7 minutes is 4020 seconds), which is within the accuracy of a cheap oscillator.
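To check the period against the quoted log rather than by eye, the gaps between consecutive "failed command" timestamps can be computed directly. The following is only a minimal sketch: the timestamps are copied from the question, the names TIMESTAMPS and intervals_seconds are made up for illustration, and it assumes the default C locale so that the English day and month abbreviations parse.

```python
from datetime import datetime
from statistics import median

# Timestamps of the "failed command" lines quoted in the question.
TIMESTAMPS = [
    "Mon Jun 13 10:47:02 2022", "Mon Jun 13 11:51:08 2022",
    "Mon Jun 13 12:55:14 2022", "Mon Jun 13 14:01:21 2022",
    "Mon Jun 13 15:08:27 2022", "Mon Jun 13 16:15:33 2022",
    "Mon Jun 13 17:22:39 2022", "Mon Jun 13 18:29:43 2022",
    "Mon Jun 13 19:36:49 2022", "Mon Jun 13 20:43:55 2022",
    "Mon Jun 13 21:50:02 2022", "Mon Jun 13 22:57:08 2022",
    "Tue Jun 14 00:04:14 2022", "Tue Jun 14 01:11:17 2022",
    "Tue Jun 14 02:15:24 2022", "Tue Jun 14 03:19:30 2022",
    "Tue Jun 14 04:26:36 2022", "Tue Jun 14 05:33:42 2022",
    "Tue Jun 14 06:40:48 2022", "Tue Jun 14 07:47:54 2022",
    "Tue Jun 14 08:55:00 2022", "Tue Jun 14 10:02:06 2022",
]


def intervals_seconds(stamps):
    """Return the gaps, in whole seconds, between consecutive timestamps."""
    times = [datetime.strptime(s, "%a %b %d %H:%M:%S %Y") for s in stamps]
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


gaps = intervals_seconds(TIMESTAMPS)
print("gaps (s):", gaps)
print("min / median / max (s):", min(gaps), median(gaps), max(gaps))
```

For this log the gaps range from 3846 to 4026 seconds, with a median of 4026 seconds, so the period is close to, though not exactly, 4000 seconds.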

This suggests that something in the firmware of the SATA drive or of the SATA controller may be doing this automatically.

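Basically, the cause could be anything. For example, the drive firmware might reset every 4000 seconds because some component-check subroutine fails. Or the SATA controller firmware might reset every 4000 seconds because it tries to renegotiate the link and fails, or whatever else (these two examples are no more likely than anything else).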

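The only thing the timing tells you is that software is deciding to do this, whether that software runs as the operating system, the controller firmware, or the drive firmware. It could be a software bug, or a genuine detection of a hardware fault.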

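So, it is really hard to diagnose. If the controller and the drive are both already on their latest firmware versions (fwupdmgr get-updates is your friend for both), then ...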

Quoted from: https://serverfault.com/questions/1103236