Systemd

Ubuntu server with systemd - how to get a backtrace or core dump?

  • June 13, 2019

I am running an Ubuntu 18.04 server with systemd. Recently a program developed by my department crashed twice in one day with the following error:

Jun 07 06:33:07 xxx systemd[1]: xxx.service: Main process exited, code=killed, status=11/SEGV
Jun 07 06:33:07 xxx systemd[1]: xxx.service: Failed with result 'signal'.

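I think the next step is to get a backtrace or core dump, but I am not sure how to do that on Ubuntu Server with systemd.

I am not sure whether I should be using systemd-coredump, coredumpctl, or some other utility.

I am also not sure which commands to issue. There is plenty of documentation on the various features of these utilities, but I could not find a concise example like this: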

sudo apt-get install xyz

(run x, y, z commands to get core dump)

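Can anyone provide a concise example, or a tutorial site that explains this well? I do not need or want the various complex features; I just want a basic core dump.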

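Take a relatively simple service, the chrony NTP daemon, as an example.

Install debug symbols from the dbgsym packages. Unfortunately, the ddebs repository is not in the apt sources by default. There is also no good script for finding the right package, so start by appending -dbgsym to the package name.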

sudo apt install chrony-dbgsym
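
If apt cannot find the -dbgsym package, the ddebs repository probably still needs to be enabled. The following is a minimal sketch of that step, assuming the archive layout described in Ubuntu's debug symbol package instructions (host ddebs.ubuntu.com, the release, -updates and -proposed pockets, and the ubuntu-dbgsym-keyring package). The function name ddebs_sources_lines is made up for this example.

```shell
#!/bin/sh
# Print apt source lines for the Ubuntu debug-symbol archive (ddebs).
# Argument: release codename, e.g. "bionic" for Ubuntu 18.04.
# Assumption: the pockets and components below match your release.
ddebs_sources_lines() {
    codename="$1"
    for pocket in "" "-updates" "-proposed"; do
        printf 'deb http://ddebs.ubuntu.com %s%s main restricted universe multiverse\n' \
            "$codename" "$pocket"
    done
}

# Typical use (needs root, so it is left commented out here):
#   ddebs_sources_lines "$(lsb_release -cs)" | sudo tee /etc/apt/sources.list.d/ddebs.list
#   sudo apt install ubuntu-dbgsym-keyring
#   sudo apt update
```

Printing the lines first makes it easy to check them before anything is written under /etc/apt.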

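You may want to look at "How to deal with core dumps on modern Linux servers"; in that case the asker was considering using plain core dump files. Personally, I have no use for apport on servers, but I do find coredumpctl useful. So here is the systemd method on Ubuntu 18.04: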

sudo systemctl stop apport
sudo systemctl mask --now apport
sudo apt install systemd-coredump
# Verify this changed the core pattern to a pipe to systemd-coredump
sysctl kernel.core_pattern

# Reproduce the crash.
sudo killall -s SIGSEGV chronyd

# List collected dumps.
coredumpctl

# Invoke debugger on the latest one.
sudo coredumpctl gdb
# On systemd >= 239 the gdb verb was renamed to debug. You can also select a core by PID.
sudo coredumpctl debug 5809

# In GDB, the basic thing to get is a stack trace. Ask the developer what else they want.
(gdb) thread apply all bt
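
The core_pattern check in the listing above can also be scripted, which helps when several servers need the same setup. This is only a sketch: the helper name pipes_to_systemd_coredump is invented here, and it assumes that a pattern which starts with a pipe and mentions systemd-coredump is enough evidence that the handler is installed.

```shell
#!/bin/sh
# Succeed if a core_pattern value pipes dumps to systemd-coredump.
# Argument: the pattern string, e.g. the output of `sysctl -n kernel.core_pattern`.
pipes_to_systemd_coredump() {
    case "$1" in
        "|"*systemd-coredump*) return 0 ;;   # piped to the systemd handler
        *) return 1 ;;                       # plain file, apport, or something else
    esac
}

# Typical use:
#   pipes_to_systemd_coredump "$(sysctl -n kernel.core_pattern)" \
#       && echo "systemd-coredump is handling core dumps"
```

If the check fails after installing systemd-coredump, another handler such as apport may still own the pattern.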

Starting a debugger session might look like this:

John@coredump:~$ coredumpctl
TIME                            PID   UID   GID SIG COREFILE  EXE
Sat 2019-06-08 12:55:16 UTC    5809   111   115  11 error     /usr/sbin/chronyd
John@coredump:~$ sudo coredumpctl gdb
          PID: 5809 (chronyd)
          UID: 111 (_chrony)
          GID: 115 (_chrony)
       Signal: 11 (SEGV)
    Timestamp: Sat 2019-06-08 12:55:16 UTC (1h 19min ago)
 Command Line: /usr/sbin/chronyd
   Executable: /usr/sbin/chronyd
Control Group: /system.slice/chrony.service
         Unit: chrony.service
        Slice: system.slice
      Boot ID: c9a0a69a73d245c1ae5dfe7d491ead0a
   Machine ID: d2934a6e67f81ae0097be31003da0b31
     Hostname: coredump
      Storage: /var/lib/systemd/coredump/core.chronyd.111.c9a0a69a73d245c1ae5dfe7d491ead0a.5809.1559998516000000.lz4
      Message: Process 5809 (chronyd) of user 111 dumped core.

               Stack trace of thread 5809:
               #0  0x00007eff1ce1403f __GI___select (libc.so.6)
               #1  0x00005597867eb3be n/a (chronyd)
               #2  0x00005597867e1071 n/a (chronyd)
               #3  0x00007eff1cd1eb97 __libc_start_main (libc.so.6)
               #4  0x00005597867e127a n/a (chronyd)

GNU gdb (Ubuntu 8.1-0ubuntu3) 8.1.0.20180409-git
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/sbin/chronyd...Reading symbols from /usr/lib/debug/.build-id/89/dcd398c87777f4c869bfd0831215eeb8b6c7fe.debug...done.
done.
[New LWP 5809]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/sbin/chronyd'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007eff1ce1403f in __GI___select (nfds=4, readfds=readfds@entry=0x7ffc0fd73c80,
   writefds=writefds@entry=0x0, exceptfds=exceptfds@entry=0x0, timeout=timeout@entry=0x7ffc0fd73be0)
   at ../sysdeps/unix/sysv/linux/select.c:41
41      ../sysdeps/unix/sysv/linux/select.c: No such file or directory.
(gdb) thread apply all bt

Thread 1 (Thread 0x7eff1df14740 (LWP 5809)):
#0  0x00007eff1ce1403f in __GI___select (nfds=4, readfds=readfds@entry=0x7ffc0fd73c80,
   writefds=writefds@entry=0x0, exceptfds=exceptfds@entry=0x0, timeout=timeout@entry=0x7ffc0fd73be0)
   at ../sysdeps/unix/sysv/linux/select.c:41
#1  0x00005597867eb3be in SCH_MainLoop () at sched.c:747
#2  0x00005597867e1071 in main (argc=<optimized out>, argv=0x7ffc0fd73fb8) at main.c:605

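In this contrived example, I rudely sent a signal to a task that was waiting on I/O, so it was caught in select().

More complex software may be missing further symbols; install those, and possibly the source code, and continue debugging.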
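
To hand a trace to the developers without an interactive session, the same backtrace can be captured to a file. The sketch below assumes the dump is still stored, that gdb is installed, and that sudo is available. The function name save_backtrace and the output file names are hypothetical.

```shell
#!/bin/sh
# Write a backtrace of every thread for one collected dump to a text file.
# Arguments: PID as listed by coredumpctl, path of the crashed executable.
save_backtrace() {
    if [ "$#" -ne 2 ]; then
        echo "usage: save_backtrace PID EXECUTABLE" >&2
        return 2
    fi
    pid="$1"
    exe="$2"
    core="core.$pid"
    # Extract the (possibly compressed) dump to a plain core file.
    sudo coredumpctl --output="$core" dump "$pid" || return 1
    # Run GDB non-interactively and capture the stack traces.
    sudo gdb -batch -ex "thread apply all bt" "$exe" "$core" > "backtrace.$pid.txt" 2>&1
}

# Typical use, with the PID and executable from the session above:
#   save_backtrace 5809 /usr/sbin/chronyd
```

The resulting backtrace.PID.txt holds the same output as typing thread apply all bt at the (gdb) prompt.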

Quoted from: https://serverfault.com/questions/970617