Memory
Fedora Server 34 crashes randomly every few minutes on an HP ProLiant DL380e G8
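Unfortunately, my HP ProLiant DL380e G8 server is having problems running Fedora Server 34. I suspect memory errors, or a DIMM that is failing or has already gone bad, but I am not sure.
Any feedback is very welcome!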
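I have run journalctl -r, which returned the output in the following PasteBin link (an excerpt of the part that looked out of the ordinary): https://pastebin.com/KPUZHceD
Thanks for any help and ideas!
Kind regards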
Edit: In response to @Michael Hampton's comment, here is the output posted inline:
<27>Sep 7 17:03:51 mcelog: Location: SOCKET:0 CHANNEL:3 DIMM:1 []
Sep 07 17:03:51 turbo mcelog[1304]: Location: SOCKET:0 CHANNEL:3 DIMM:1 []
Sep 07 17:03:51 turbo mcelog[1303]: <27>Sep 7 17:03:51 mcelog: corrected DIMM memory error count exceeded threshold: 10 in 24h
Sep 07 17:03:51 turbo mcelog[1303]: corrected DIMM memory error count exceeded threshold: 10 in 24h
Sep 07 17:03:51 turbo mcelog[1304]: <27>Sep 7 17:03:51 mcelog: Location: SOCKET:0 CHANNEL:3 DIMM:1 []
Sep 07 17:03:51 turbo mcelog[1304]: Location: SOCKET:0 CHANNEL:3 DIMM:1 []
Sep 07 17:03:51 turbo mcelog[1303]: <27>Sep 7 17:03:51 mcelog: corrected DIMM memory error count exceeded threshold: 10 in 24h
Sep 07 17:03:51 turbo mcelog[1303]: corrected DIMM memory error count exceeded threshold: 10 in 24h
Sep 07 17:03:51 turbo mcelog[1067]: CPUID Vendor Intel Family 6 Model 45 Step 7
Sep 07 17:03:51 turbo mcelog[1067]: MICROCODE 71a
Sep 07 17:03:51 turbo mcelog[1067]: MCGCAP 1000812 APICID 2 SOCKETID 0
Sep 07 17:03:51 turbo mcelog[1067]: STATUS c80000c400800093 MCGSTATUS 0
Sep 07 17:03:51 turbo mcelog[1067]: MemCtrl:
Sep 07 17:03:51 turbo mcelog[1067]: Transaction: Memory read error
Sep 07 17:03:51 turbo mcelog[1067]: MCA: MEMORY CONTROLLER RD_CHANNEL3_ERR
Sep 07 17:03:51 turbo mcelog[1067]: MCi_MISC register valid
Sep 07 17:03:51 turbo mcelog[1067]: Corrected error
Sep 07 17:03:51 turbo mcelog[1067]: Error overflow
Sep 07 17:03:51 turbo mcelog[1067]: MCi status:
Sep 07 17:03:51 turbo mcelog[1067]: MCG status:
Sep 07 17:03:51 turbo mcelog[1067]: TIME 1631027031 Tue Sep 7 17:03:51 2021
Sep 07 17:03:51 turbo mcelog[1067]: MISC d22131295c834800
Sep 07 17:03:51 turbo mcelog[1067]: CPU 1 BANK 11
Sep 07 17:03:51 turbo mcelog[1067]: MCE 7
Sep 07 17:03:51 turbo mcelog[1067]: Hardware event. This is not a software error.
Sep 07 17:03:51 turbo mcelog[1067]: CPUID Vendor Intel Family 6 Model 45 Step 7
Sep 07 17:03:51 turbo mcelog[1067]: MICROCODE 71a
Sep 07 17:03:51 turbo mcelog[1067]: MCGCAP 1000812 APICID 3 SOCKETID 0
Sep 07 17:03:51 turbo mcelog[1067]: STATUS c80000c400800093 MCGSTATUS 0
Sep 07 17:03:51 turbo mcelog[1067]: MemCtrl:
Sep 07 17:03:51 turbo mcelog[1067]: Transaction: Memory read error
Sep 07 17:03:51 turbo mcelog[1067]: MCA: MEMORY CONTROLLER RD_CHANNEL3_ERR
Sep 07 17:03:51 turbo mcelog[1067]: MCi_MISC register valid
Sep 07 17:03:51 turbo mcelog[1067]: Corrected error
Sep 07 17:03:51 turbo mcelog[1067]: Error overflow
Sep 07 17:03:51 turbo mcelog[1067]: MCi status:
Sep 07 17:03:51 turbo mcelog[1067]: MCG status:
Sep 07 17:03:51 turbo mcelog[1067]: TIME 1631027031 Tue Sep 7 17:03:51 2021
Sep 07 17:03:51 turbo mcelog[1067]: MISC d22131295c834800
Sep 07 17:03:51 turbo mcelog[1067]: CPU 13 BANK 11
Sep 07 17:03:51 turbo mcelog[1067]: MCE 6
Sep 07 17:03:51 turbo mcelog[1067]: Hardware event. This is not a software error.
Sep 07 17:03:51 turbo mcelog[1067]: CPUID Vendor Intel Family 6 Model 45 Step 7
Sep 07 17:03:51 turbo mcelog[1067]: MICROCODE 71a
Sep 07 17:03:51 turbo mcelog[1067]: MCGCAP 1000812 APICID 0 SOCKETID 0
Sep 07 17:03:51 turbo mcelog[1067]: STATUS c80000c400800093 MCGSTATUS 0
Sep 07 17:03:51 turbo mcelog[1067]: MemCtrl:
Sep 07 17:03:51 turbo mcelog[1067]: Transaction: Memory read error
Sep 07 17:03:51 turbo mcelog[1067]: MCA: MEMORY CONTROLLER RD_CHANNEL3_ERR
Sep 07 17:03:51 turbo mcelog[1067]: MCi_MISC register valid
Sep 07 17:03:51 turbo mcelog[1067]: Corrected error
Sep 07 17:03:51 turbo mcelog[1067]: Error overflow
Sep 07 17:03:51 turbo mcelog[1067]: MCi status:
Sep 07 17:03:51 turbo mcelog[1067]: MCG status:
Sep 07 17:03:51 turbo mcelog[1067]: TIME 1631027031 Tue Sep 7 17:03:51 2021
Sep 07 17:03:51 turbo mcelog[1067]: MISC d22131295c834800
Sep 07 17:03:51 turbo mcelog[1067]: CPU 0 BANK 11
Sep 07 17:03:51 turbo mcelog[1067]: MCE 5
Sep 07 17:03:51 turbo mcelog[1067]: Hardware event. This is not a software error.
Sep 07 17:03:51 turbo mcelog[1067]: Running trigger `dimm-error-trigger' (reporter: memdb)
Sep 07 17:03:51 turbo mcelog[1067]: CPUID Vendor Intel Family 6 Model 45 Step 7
Sep 07 17:03:51 turbo mcelog[1067]: MICROCODE 71a
Sep 07 17:03:51 turbo mcelog[1067]: MCGCAP 1000812 APICID 6 SOCKETID 0
Sep 07 17:03:51 turbo mcelog[1067]: STATUS c80000c400800093 MCGSTATUS 0
Sep 07 17:03:51 turbo mcelog[1067]: MemCtrl:
Sep 07 17:03:51 turbo mcelog[1067]: Transaction: Memory read error
Sep 07 17:03:51 turbo mcelog[1067]: MCA: MEMORY CONTROLLER RD_CHANNEL3_ERR
Sep 07 17:03:51 turbo mcelog[1067]: MCi_MISC register valid
Sep 07 17:03:51 turbo mcelog[1067]: Corrected error
Sep 07 17:03:51 turbo mcelog[1067]: Error overflow
Sep 07 17:03:51 turbo mcelog[1067]: MCi status:
Sep 07 17:03:51 turbo mcelog[1067]: MCG status:
Sep 07 17:03:51 turbo mcelog[1067]: TIME 1631027031 Tue Sep 7 17:03:51 2021
Sep 07 17:03:51 turbo mcelog[1067]: MISC d22131295c834800
Sep 07 17:03:51 turbo mcelog[1067]: CPU 3 BANK 11
Sep 07 17:03:51 turbo mcelog[1067]: MCE 4
Sep 07 17:03:51 turbo mcelog[1067]: Hardware event. This is not a software error.
Sep 07 17:03:51 turbo mcelog[1067]: CPUID Vendor Intel Family 6 Model 45 Step 7
Sep 07 17:03:51 turbo mcelog[1067]: MICROCODE 71a
Sep 07 17:03:51 turbo mcelog[1067]: MCGCAP 1000812 APICID a SOCKETID 0
Sep 07 17:03:51 turbo mcelog[1067]: STATUS c801c00400800093 MCGSTATUS 0
Sep 07 17:03:51 turbo mcelog[1067]: MemCtrl:
Sep 07 17:03:51 turbo mcelog[1067]: Transaction: Memory read error
Sep 07 17:03:51 turbo mcelog[1067]: MCA: MEMORY CONTROLLER RD_CHANNEL3_ERR
Sep 07 17:03:51 turbo mcelog[1067]: MCi_MISC register valid
Sep 07 17:03:51 turbo mcelog[1067]: Corrected error
Sep 07 17:03:51 turbo mcelog[1067]: Error overflow
Sep 07 17:03:51 turbo mcelog[1067]: MCi status:
Sep 07 17:03:51 turbo mcelog[1067]: MCG status:
Sep 07 17:03:51 turbo mcelog[1067]: TIME 1631027031 Tue Sep 7 17:03:51 2021
Sep 07 17:03:51 turbo mcelog[1067]: MISC d2213fa689118800
Sep 07 17:03:51 turbo mcelog[1067]: CPU 5 BANK 11
Sep 07 17:03:51 turbo mcelog[1067]: MCE 3
Sep 07 17:03:51 turbo mcelog[1067]: Hardware event. This is not a software error.
Sep 07 17:03:51 turbo mcelog[1067]: CPUID Vendor Intel Family 6 Model 45 Step 7
Sep 07 17:03:51 turbo mcelog[1067]: MICROCODE 71a
Sep 07 17:03:51 turbo mcelog[1067]: MCGCAP 1000812 APICID 5 SOCKETID 0
Sep 07 17:03:51 turbo mcelog[1067]: STATUS c801bd8400800093 MCGSTATUS 0
Sep 07 17:03:51 turbo mcelog[1067]: MemCtrl:
Sep 07 17:03:51 turbo mcelog[1067]: Transaction: Memory read error
Sep 07 17:03:51 turbo mcelog[1067]: MCA: MEMORY CONTROLLER RD_CHANNEL3_ERR
Sep 07 17:03:51 turbo mcelog[1067]: MCi_MISC register valid
Sep 07 17:03:51 turbo mcelog[1067]: Corrected error
Sep 07 17:03:51 turbo mcelog[1067]: Error overflow
Sep 07 17:03:51 turbo mcelog[1067]: MCi status:
Sep 07 17:03:51 turbo mcelog[1067]: MCG status:
Sep 07 17:03:51 turbo mcelog[1067]: TIME 1631027031 Tue Sep 7 17:03:51 2021
Sep 07 17:03:51 turbo mcelog[1067]: MISC d2213f0649118800
Sep 07 17:03:51 turbo mcelog[1067]: CPU 14 BANK 11
Sep 07 17:03:51 turbo mcelog[1067]: MCE 2
Sep 07 17:03:51 turbo mcelog[1067]: Hardware event. This is not a software error.
Sep 07 17:03:51 turbo mcelog[1067]: CPUID Vendor Intel Family 6 Model 45 Step 7
Sep 07 17:03:51 turbo mcelog[1067]: MICROCODE 71a
Sep 07 17:03:51 turbo mcelog[1067]: MCGCAP 1000812 APICID 1 SOCKETID 0
Sep 07 17:03:51 turbo mcelog[1067]: STATUS c801bec400800093 MCGSTATUS 0
Sep 07 17:03:51 turbo mcelog[1067]: MemCtrl:
Sep 07 17:03:51 turbo mcelog[1067]: Transaction: Memory read error
Sep 07 17:03:51 turbo mcelog[1067]: MCA: MEMORY CONTROLLER RD_CHANNEL3_ERR
Sep 07 17:03:51 turbo mcelog[1067]: MCi_MISC register valid
Sep 07 17:03:51 turbo mcelog[1067]: Corrected error
Sep 07 17:03:51 turbo mcelog[1067]: Error overflow
Sep 07 17:03:51 turbo mcelog[1067]: MCi status:
Sep 07 17:03:51 turbo mcelog[1067]: MCG status:
Sep 07 17:03:51 turbo mcelog[1067]: TIME 1631027031 Tue Sep 7 17:03:51 2021
Sep 07 17:03:51 turbo mcelog[1067]: MISC d221196e09118800
Sep 07 17:03:51 turbo mcelog[1067]: CPU 12 BANK 11
Sep 07 17:03:51 turbo mcelog[1067]: MCE 1
Sep 07 17:03:51 turbo mcelog[1067]: Hardware event. This is not a software error.
Sep 07 17:03:51 turbo mcelog[1067]: CPUID Vendor Intel Family 6 Model 45 Step 7
Sep 07 17:03:51 turbo mcelog[1067]: MICROCODE 71a
Sep 07 17:03:51 turbo mcelog[1067]: MCGCAP 1000812 APICID 0 SOCKETID 0
Sep 07 17:03:51 turbo mcelog[1067]: STATUS c0107b4000010093 MCGSTATUS 0
Sep 07 17:03:51 turbo mcelog[1067]: Transaction: Memory read error
Sep 07 17:03:51 turbo mcelog[1067]: MCA: MEMORY CONTROLLER RD_CHANNEL3_ERR
Sep 07 17:03:51 turbo mcelog[1067]: Corrected error
Sep 07 17:03:51 turbo mcelog[1067]: Error overflow
Sep 07 17:03:51 turbo mcelog[1067]: STATUS c0107b4000010093 MCGSTATUS 0
Sep 07 17:03:51 turbo mcelog[1067]: Transaction: Memory read error
Sep 07 17:03:51 turbo mcelog[1067]: MCA: MEMORY CONTROLLER RD_CHANNEL3_ERR
Sep 07 17:03:51 turbo mcelog[1067]: Corrected error
Sep 07 17:03:51 turbo mcelog[1067]: Error overflow
Sep 07 17:03:51 turbo mcelog[1067]: MCi status:
Sep 07 17:03:51 turbo mcelog[1067]: MCG status:
Sep 07 17:03:51 turbo mcelog[1067]: TIME 1631027031 Tue Sep 7 17:03:51 2021
Sep 07 17:03:51 turbo mcelog[1067]: CPU 0 BANK 5
Sep 07 17:03:51 turbo mcelog[1067]: MCE 0
Sep 07 17:03:51 turbo mcelog[1067]: Hardware event. This is not a software error.
Sep 07 17:03:51 turbo mcelog[1067]: mcelog: mcelog read: Input/output error
Sep 07 17:03:51 turbo kernel: ERST: [Firmware Warn]: Firmware does not respond in time.
Sep 07 17:03:51 turbo kernel: mce: [Hardware Error]: Machine check events logged
Sep 07 17:03:51 turbo kernel: mce: [Hardware Error]: Machine check events logged
Sep 07 17:03:51 turbo kernel: mce_notify_irq: 6 callbacks suppressed
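For anyone reading a log like the one above, here is a rough Python sketch of how the STATUS values and the Location lines could be checked by hand. It is only an illustration, not part of mcelog: the function names decode_status and tally_locations are made up for this example, and the bit positions are my reading of the IA32_MCi_STATUS layout in the Intel SDM (flag bits 63 to 57, MCA error code in bits 15:0, memory controller codes of the form 0000 0000 1MMM CCCC). The tally counts matching log lines, not distinct hardware events, since the journal repeats each message.

```python
import re
from collections import Counter

# Architectural flag bits of IA32_MCi_STATUS (assumed from the Intel SDM).
FLAGS = {"VAL": 63, "OVER": 62, "UC": 61, "EN": 60,
         "MISCV": 59, "ADDRV": 58, "PCC": 57}

# Matches the "Location:" lines that mcelog writes to the journal.
LOCATION = re.compile(r"Location: SOCKET:(\d+) CHANNEL:(\d+) DIMM:(\d+)")


def decode_status(status: int) -> dict:
    """Split a 64-bit MCi_STATUS value into a few readable fields."""
    fields = {name: bool((status >> bit) & 1) for name, bit in FLAGS.items()}
    code = status & 0xFFFF  # MCA error code, bits 15:0
    fields["mca_code"] = code
    # Memory controller errors follow the pattern 0000 0000 1MMM CCCC.
    if code & 0xFF80 == 0x0080:
        fields["mem_op"] = (code >> 4) & 0x7  # 1 = read, 2 = write
        fields["channel"] = code & 0xF        # 0xF = channel not specified
    return fields


def tally_locations(lines) -> Counter:
    """Count log lines per (socket, channel, dimm) location."""
    counts = Counter()
    for line in lines:
        match = LOCATION.search(line)
        if match:
            counts[tuple(int(g) for g in match.groups())] += 1
    return counts


if __name__ == "__main__":
    # STATUS value copied from the log above.
    print(decode_status(0xC80000C400800093))
```

Applied to the first STATUS value in the log, UC comes out clear (a corrected error), OVER and MISCV come out set, and the error code decodes to a read on channel 3, which agrees with the text mcelog printed. To tally locations on a live system, the lines could be fed in from the output of journalctl -r.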
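The problem in this post has been fixed by removing 2 faulty RAM sticks from the server and reseating the CPU, which was also not making good contact.
Thanks for all the help!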