Linux
Machine check events logged
This error appeared in /var/log/messages:
Sep 19 13:18:15 wdc kernel: [2772302.630416] Machine check events logged
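Shortly afterwards, the whole server became unresponsive. This is the Dom0 log of a Xen server (running the latest version on Debian Squeeze).
Can anyone shed some light on what this error means? Should I order new hardware?
Edit: Also, the message implies that something was logged. Where can I find it?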
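For more information, check the log file below, which should contain a detailed description of the problem (the file may or may not exist, depending on how logging is configured in /etc/mcelog/mcelog.conf):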
/var/log/mcelog
Or just run the command
mcelog
mcelog decodes the kernel machine check log on x86 machines. From man mcelog:
X86 CPUs report errors detected by the CPU as machine check events (MCEs). These can be data corruption detected in the CPU caches, in main memory by an integrated memory controller, data transfer errors on the front side bus or CPU interconnect or other internal errors. Possible causes can be cosmic radiation, instable power supplies, cooling problems, broken hardware, or bad luck. Most errors can be corrected by the CPU by internal error correction mechanisms. Uncorrected errors cause machine check exceptions which may panic the machine. When a corrected error happens the x86 kernel writes a record describing the MCE into a internal ring buffer available through the /dev/mcelog device. mcelog retrieves errors from /dev/mcelog, decodes them into a human readable format and prints them on the standard output or optionally into the system log.
You can find more information about mcelog and its configuration, errors, and triggers on the mcelog project web page.
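If you want to see how often the kernel has reported machine checks before digging into the decoded log, something like the following Python sketch may help. It is only an illustration, not part of mcelog: the function names find_mce_notices and mcelog_file_status are made up for this example, and the pattern assumes syslog lines shaped like the one quoted in the question.

```python
import os
import re

# Matches the kernel notice quoted in the question. The bracketed number is
# the kernel timestamp (seconds since boot). The line format is an assumption
# based on that single sample and may differ with other syslog configurations.
MCE_PATTERN = re.compile(r"kernel: \[\s*(\d+\.\d+)\] Machine check events logged")


def find_mce_notices(lines):
    """Return (line, seconds_since_boot) for each line reporting logged MCEs."""
    hits = []
    for line in lines:
        match = MCE_PATTERN.search(line)
        if match:
            hits.append((line.rstrip("\n"), float(match.group(1))))
    return hits


def mcelog_file_status(path="/var/log/mcelog"):
    """Say whether the decoded log file is missing, empty, or has entries."""
    if not os.path.exists(path):
        return "missing"
    return "empty" if os.path.getsize(path) == 0 else "has entries"
```

You could pass it an open handle on /var/log/messages (reading that file usually needs root) and then call mcelog_file_status() to see whether there is anything decoded to read. A notice that recurs over days or weeks is probably a stronger hint of failing hardware than a single isolated event, but the decoded mcelog output is what tells you which component reported the error.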