Unable to start kdump

May 8, 2013

My system keeps crashing, so I decided to enable kdump to find out why, since I can't spot any likely errors in the log files.

I followed the steps from the site here to set up kdump. My server runs CentOS 5.8 with 16GB of RAM. These are the steps I took to configure kdump:

1. Install kexec-tools (`yum install kexec-tools`) and follow the installation steps.
2. Edit /boot/grub/grub.conf to reserve memory for kdump.
3. Edit /etc/kdump.conf to set the dump path to /var/crash/ and configure the core_collector.
4. Enable kdump with `chkconfig kdump on`.
5. Reboot the server.
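Before rebooting in step 5, it may be worth confirming that step 2 actually put `crashkernel=` on the `kernel` line itself. The following is a minimal sketch assuming GRUB legacy syntax (as used by CentOS 5); `check_grub_crashkernel` is a hypothetical helper, not part of kexec-tools.

```bash
#!/bin/bash
# check_grub_crashkernel: hypothetical helper, not part of kexec-tools.
# Returns 0 only if every GRUB legacy "kernel" line carries crashkernel=
# and no crashkernel= token sits on a line of its own.
check_grub_crashkernel() {
    local conf=${1:-/boot/grub/grub.conf}
    awk '
        /^[ \t]*kernel[ \t]/ {
            kernels++
            if ($0 ~ /crashkernel=/) ok++
            else print "missing crashkernel=: " $0
        }
        /^[ \t]*crashkernel=/ {
            stray++
            print "crashkernel= on its own line " NR ": " $0
        }
        END {
            # Non-zero exit on any problem, or if no kernel line was found.
            if (kernels == 0 || ok < kernels || stray > 0) exit 1
        }
    ' "$conf"
}
```

Usage: `check_grub_crashkernel /boot/grub/grub.conf && echo "grub.conf looks OK"`. It only inspects the file; it does not prove the kernel will manage to reserve the memory.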

When I run `service kdump status`, it says `Kdump is not operational`. What should I do to get kdump running? Did I miss something in the configuration? I have included the contents of /boot/grub/grub.conf and /etc/kdump.conf below.

Here are the contents of /boot/grub/grub.conf:

# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/sda3
#          initrd /initrd-version.img
#boot=/dev/sda
default=0
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title CentOS (2.6.18-308.el5)
       root (hd0,0)
       kernel /vmlinuz-2.6.18-308.el5 ro root=LABEL=/
crashkernel=128M
       initrd /initrd-2.6.18-308.el5.img

Here are the contents of /etc/kdump.conf:

# Configures where to put the kdump /proc/vmcore files
#
# This file contains a series of commands to perform (in order) when a
# kernel crash has happened and the kdump kernel has been loaded.  Directives in
# this file are only applicable to the kdump initramfs, and have no effect if
# the root filesystem is mounted and the normal init scripts are processed
#
# Currently only one dump target and path may be configured at once
# if the configured dump target fails, the default action will be preformed
# the default action may be configured with the default directive below.  If the
# configured dump target succedes
#
# For filesystem based dump, it's recommended to use UUID and LABEL
# instead of device name in dump target.
#
# See the kdump.conf(5) man page for details of configuration directives

#raw /dev/sda5
#ext3 /dev/sda3
#ext3 LABEL=/boot
#ext3 UUID=03138356-5e61-4ab3-b58e-27507ac41937
#net my.server.com:/export/tmp
#net user@my.server.com
path /var/crash
core_collector makedumpfile -c --message-level 1
#core_collector cp --sparse=always
#link_delay 60
#kdump_post /var/crash/scripts/kdump-post.sh
#extra_bins /usr/bin/lftp
#disk_timeout 30
#extra_modules gfs2
#options modulename options
#default shell
#sshkey /root/.ssh/kdump_id_rsa

I also noticed that my /boot/grub/grub.conf differs from the sample grub.conf in the tutorial. They differ in two lines:

From the tutorial:
  kernel /vmlinuz-2.6.32-220.el6.x86_64 ro root=/dev/sda3
  initrd /initramfs-2.6.32-220.el6.x86_64.img

From my own conf:
  kernel /vmlinuz-2.6.18-308.el5 ro root=LABEL=/
  initrd /initrd-2.6.18-308.el5.img

Could these lines be what prevents kdump from starting?

$$ EDIT 1 $$ Contents of /var/log/messages:

   Feb 25 02:18:28 61540 kernel: Command line: ro root=LABEL=/ crashkernel=128M
   Feb 25 02:18:28 61540 kernel: BIOS-provided physical RAM map:
   Feb 25 02:18:28 61540 kernel:  BIOS-e820: 0000000000010000 - 000000000009a000 (usable)
   Feb 25 02:18:28 61540 kernel:  BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)
   Feb 25 02:18:28 61540 kernel:  BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
   Feb 25 02:18:28 61540 kernel:  BIOS-e820: 0000000000100000 - 00000000cfda0000 (usable)
   Feb 25 02:18:28 61540 kernel:  BIOS-e820: 00000000cfda0000 - 00000000cfdd1000 (ACPI NVS)
   Feb 25 02:18:28 61540 kernel:  BIOS-e820: 00000000cfdd1000 - 00000000cfe00000 (ACPI data)
   Feb 25 02:18:28 61540 kernel:  BIOS-e820: 00000000cfe00000 - 00000000cff00000 (reserved)
   Feb 25 02:18:28 61540 kernel:  BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
   Feb 25 02:18:28 61540 kernel:  BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
   Feb 25 02:18:28 61540 kernel:  BIOS-e820: 0000000100000000 - 000000042f000000 (usable)
   Feb 25 02:18:28 61540 kernel: DMI 2.4 present.
   Feb 25 02:18:28 61540 kernel: No NUMA configuration found
   Feb 25 02:18:28 61540 kernel: Faking a node at 0000000000000000-000000042f000000
   Feb 25 02:18:28 61540 kernel: Bootmem setup node 0 0000000000000000-000000042f000000
   Feb 25 02:18:28 61540 kernel: Memory for crash kernel (0x0 to 0x0) notwithin permissible range
   Feb 25 02:18:28 61540 kernel: disabling kdump
   Feb 25 02:44:39 61540 kdump: No crashkernel parameter was specified or crashkernel memory reservation failed
   Feb 25 02:44:39 61540 kdump: failed to start up
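To pull only the relevant lines out of /var/log/messages, a simple grep filter is enough. In the excerpt above, the `Command line:` entry already contains `crashkernel=128M`, yet the kernel reports the crash region as `(0x0 to 0x0)` and disables kdump, which suggests the reservation itself failed rather than the parameter being absent. `kdump_log_lines` is a hypothetical helper name used only for this sketch.

```bash
#!/bin/bash
# kdump_log_lines: hypothetical helper that filters a syslog file down to
# the lines about the kernel command line, crash-kernel memory and kdump.
kdump_log_lines() {
    local log=${1:-/var/log/messages}
    # -E: extended regex; -i: case-insensitive ("Crash kernel", "crashkernel", "kdump")
    grep -Ei 'command line:|crash ?kernel|kdump' "$log"
}
```

Usage: `kdump_log_lines /var/log/messages | tail -n 20` shows the most recent matches, which is usually the last boot.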

$$ EDIT 2 $$ Or should I change `ro root=LABEL=` to `ro root=/dev/sda3`?

title CentOS (2.6.18-308.el5)
       root (hd0,0)
       kernel /vmlinuz-2.6.18-308.el5 ro root=LABEL=/
crashkernel=128M
       initrd /initrd-2.6.18-308.el5.img

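It looks like you put the `crashkernel` parameter on a new line. That is why you get the message `Kdump is not operational`. All kernel parameters must be on the same line as `kernel`: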

title CentOS (2.6.18-308.el5)
       root (hd0,0)
       kernel /vmlinuz-2.6.18-308.el5 ro root=LABEL=/ crashkernel=128M
       initrd /initrd-2.6.18-308.el5.img
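The fix above can be applied mechanically with sed. This is a minimal sketch assuming GRUB legacy syntax and GNU sed (for `-i` and `[[:space:]]`); `fix_grub_crashkernel` is a hypothetical helper. It deletes lines that hold nothing but a `crashkernel=` token, then strips any existing `crashkernel=` from each `kernel` line and appends the requested value, keeping a `.bak` copy of the original.

```bash
#!/bin/bash
# fix_grub_crashkernel: hypothetical helper (not part of kexec-tools).
# Usage: fix_grub_crashkernel CONF SIZE   e.g. fix_grub_crashkernel /boot/grub/grub.conf 128M
# Assumes GRUB legacy syntax and GNU sed.
fix_grub_crashkernel() {
    local conf=$1 size=$2
    if [ -z "$conf" ] || [ -z "$size" ]; then
        echo "usage: fix_grub_crashkernel CONF SIZE" >&2
        return 2
    fi
    # Keep a backup next to the original before editing in place.
    cp -p "$conf" "$conf.bak" || return 1
    # 1) drop lines consisting only of a crashkernel= token (the wrapped-line mistake)
    # 2) on every kernel line, remove any existing crashkernel= and append the new one
    sed -i \
        -e '/^[[:space:]]*crashkernel=[^[:space:]]*[[:space:]]*$/d' \
        -e '/^[[:space:]]*kernel[[:space:]]/{s/[[:space:]]crashkernel=[^[:space:]]*//g;s/$/ crashkernel='"$size"'/;}' \
        "$conf"
}
```

Usage (as root): `fix_grub_crashkernel /boot/grub/grub.conf 128M`, then review with `diff /boot/grub/grub.conf.bak /boot/grub/grub.conf` before rebooting. SIZE is inserted literally into the sed replacement, so plain values such as `128M` or `128M@16M` are assumed.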

After a reboot, check /var/log/messages; you should see something like this:

localhost kdump: kexec: loaded kdump kernel
localhost kdump: started up

and:

# /etc/init.d/kdump start
Starting kdump:                                            [  OK  ]
# /etc/init.d/kdump status
Kdump is operational
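Besides the service status, one can check the two preconditions directly after the reboot: whether `crashkernel=` reached the running kernel's command line, and whether the kernel actually reserved a region for it. The sketch below reads the standard procfs files /proc/cmdline and /proc/iomem; it assumes that, as on x86 kernels, a successful reservation appears in /proc/iomem under the label `Crash kernel`. `kdump_reserved` is a hypothetical helper.

```bash
#!/bin/bash
# kdump_reserved: hypothetical post-reboot check.
# Arg 1: kernel command line file (default /proc/cmdline)
# Arg 2: physical memory map (default /proc/iomem)
kdump_reserved() {
    local cmdline=${1:-/proc/cmdline} iomem=${2:-/proc/iomem}
    if ! grep -q 'crashkernel=' "$cmdline"; then
        echo "crashkernel= is not on the running kernel's command line"
        return 1
    fi
    # Assumption: x86 kernels list the reserved area as "Crash kernel" here.
    if ! grep -q 'Crash kernel' "$iomem"; then
        echo "crashkernel= was given but no 'Crash kernel' region is reserved"
        return 1
    fi
    echo "crash kernel memory is reserved"
}
```

If the first check passes and the second fails, the parameter is in place but the reservation failed, which matches the `disabling kdump` case in the question's log.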

kdump: No crashkernel parameter was specified or crashkernel memory reservation failed
kdump: failed to start up

According to this documentation, try this:

crashkernel=128M@16M
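The `SIZE@OFFSET` form asks the kernel to reserve SIZE bytes starting at physical address OFFSET instead of choosing the location itself; in the question's log the automatic placement came back as `(0x0 to 0x0)`. For `128M@16M`, 16 MiB is 16 × 1048576 = 16777216 = 0x1000000, and a 128 MiB region (0x8000000 bytes) starting there ends at 0x8FFFFFF. The sketch below just does that arithmetic; `to_bytes` and `crash_range` are hypothetical helpers for illustration and do not touch the system.

```bash
#!/bin/bash
# to_bytes: convert a size with an optional K/M/G suffix to bytes.
to_bytes() {
    local v=$1 n=${1%[KkMmGg]}
    case $v in
        *[Kk]) echo $(( n * 1024 )) ;;
        *[Mm]) echo $(( n * 1024 * 1024 )) ;;
        *[Gg]) echo $(( n * 1024 * 1024 * 1024 )) ;;
        *)     echo $(( n )) ;;
    esac
}

# crash_range: print the physical address range that a
# crashkernel=SIZE@OFFSET value requests (hypothetical helper).
crash_range() {
    local spec=$1
    case $spec in
        *@*) ;;
        *) echo "no @OFFSET given; the kernel picks the location" >&2; return 1 ;;
    esac
    local size offset
    size=$(to_bytes "${spec%@*}")
    offset=$(to_bytes "${spec#*@}")
    printf '0x%x-0x%x\n' "$offset" $(( offset + size - 1 ))
}
```

Usage: `crash_range 128M@16M` prints `0x1000000-0x8ffffff`, which is the range to look for next to `Crash kernel` in /proc/iomem after rebooting.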

Source: https://serverfault.com/questions/482151
