Ubuntu

Ubuntu 負載平均峰值,但 CPU 處於空閒狀態

  • May 24, 2011

我們在云網路上有一台伺服器,由第三方提供。我們正在執行 Ubuntu 10.04 伺服器版。

問題發生在看似隨機的時間,每天大約 1 到 3 次。頂部的平均負載通常在 2 左右,伺服器執行良好,但在這些隨機時間,平均負載飆升至 30-35 左右,一切都停止了。無法訪問我們的網站,無法在伺服器上執行命令,無法執行任何操作。如果您尚未登錄,甚至無法登錄。

我們能夠看到高負載平均值的唯一方法是不斷執行 top 以便在問題發生時它已經在執行。似乎如果它已經在執行,它將繼續正常工作,但如果它沒有執行,則無法啟動它。當它進入這種狀態時無法執行任何命令使我們難以診斷問題……另外,我們不認為自己是伺服器專家。

對我來說看起來很奇怪的是,平均負載峰值如此之高,但處理器保持空閒狀態,並且有大量可用記憶體。同樣,我根本不是專家,但我非常基本的理解是,如果記憶體可用並且處理器沒有被最大化,那麼就不應該有程序在等待(很可能我錯了)。

當我輸入這個時,我抓住了它,因為它開始飆升並設法在一切鎖定之前執行了一些命令。輸出如下:

unname -a

Linux <server name> 2.6.32-308-ec2 #16-Ubuntu SMP Thu Sep 16 14:28:38 UTC 2010 i686 GNU/Linux

最佳

top - 10:55:08 up 15:28,  4 users,  load average: 12.29, 7.01, 3.89
Tasks: 313 total,   3 running, 308 sleeping,   0 stopped,   2 zombie
Cpu(s):  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   4210704k total,  2163024k used,  2047680k free,   162320k buffers
Swap:  2096440k total,        0k used,  2096440k free,  1690464k cached

 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
   1 root      20   0  2804 1644 1204 S    0  0.0   0:00.18 init
   2 root      20   0     0    0    0 S    0  0.0   0:00.00 kthreadd
   3 root      RT   0     0    0    0 R    0  0.0   0:00.08 migration/0
   4 root      20   0     0    0    0 S    0  0.0   0:00.01 ksoftirqd/0
   5 root      RT   0     0    0    0 R    0  0.0   0:00.01 watchdog/0
   6 root      20   0     0    0    0 S    0  0.0   0:00.06 events/0

ps axf

 PID TTY      STAT   TIME COMMAND
   2 ?        S      0:00 [kthreadd]
   3 ?        R      0:00  \_ [migration/0]
   4 ?        S      0:00  \_ [ksoftirqd/0]
   5 ?        R      0:00  \_ [watchdog/0]
   6 ?        S      0:00  \_ [events/0]
   7 ?        S      0:00  \_ [cpuset]
   8 ?        S      0:00  \_ [khelper]

<removed a bunch of processes to save space in the post, all had status S>

   1 ?        Ss     0:00 /sbin/init
 245 ?        S      0:00 upstart-udev-bridge --daemon
 251 ?        S /var/run/motd.new
25554 ?        S      0:00          \_ run-parts --lsbsysinit /etc/update-motd.d
25558 ?        S      0:00              \_ /bin/sh /etc/update-motd.d/10-help-text
25560 ?        D      0:00                  \_ /bin/sh /etc/update-motd.d/10-help-text
 852 ?        Ss     0:00 cron
1374 ?        S      0:00  \_ CRON
1377 ?        Ss     0:00  |   \_ /bin/sh -c /var/www/secure/caddy2_prod/scripts/main.pl
1379 ?        S      0:02  |       \_ /usr/bin/perl /var/www/secure/caddy2_prod/scripts/main.pl
1385 ?        Z      0:00  |           \_ [check.pl] 
1375 ?        S      0:00  \_ CRON
1376 ?        Ss     0:00      \_ /bin/sh -c /var/www/secure/caddy2_test/scripts/main.pl
1378 ?        S      0:00          \_ /usr/bin/perl /var/www/secure/caddy2_test/scripts/main.pl
1384 ?        Z      0:00              \_ [check.pl] 
 855 ?        Ss     0:00 atd
 868 ?        Ssl    6:36 /usr/sbin/mysqld
 890 ?        S      0:00 /bin/bash /usr/sbin/xe-daemon -p /var/run/xe-daemon.pid
25563 ?        S      0:00  \_ /bin/sh /usr/sbin/xe-update-guest-attrs --memory
25564 ?        D      0:00      \_ /bin/sh /usr/sbin/xe-update-guest-attrs --memory
1161 ?        Ss     0:00 /usr/lib/postfix/master
3102 ?        S      0:00  \_ qmgr -l -t fifo -u
22013 ?        S      0:00  \_ pickup -l -t fifo -u -c
1181 ?        Ssl    3:17 /usr/sbin/asterisk -p -U asterisk
1182 ?        S      0:00  \_ astcanary /var/run/asterisk/alt.asterisk.canary.tweet.tweet.tweet
1222 ?        Ss     0:00 /usr/sbin/apache2 -k start
31682 ?        S      0:01  \_ /usr/sbin/apache2 -k start
31716 ?        S      0:01  \_ /usr/sbin/apache2 -k start
13548 ?        S      0:00  \_ /usr/sbin/apache2 -k start
25593 ?        S      0:00  |   \_ /usr/bin/perl -w /usr/lib/cgi-bin/caddy2/patch.pl
25594 ?        D      0:00  |       \_ /usr/bin/perl -w /usr/lib/cgi-bin/caddy2/patch.pl
13637 ?        S      0:00  \_ /usr/sbin/apache2 -k start
16061 ?        S      0:00  \_ /usr/sbin/apache2 -k start
23116 ?        S      0:00  \_ /usr/sbin/apache2 -k start
25565 ?        D      0:00  |   \_ /usr/sbin/apache2 -k start
23117 ?        S      0:00  \_ /usr/sbin/apache2 -k start
23118 ?        S      0:00  \_ /usr/sbin/apache2 -k start
23119 ?        S      0:00  \_ /usr/sbin/apache2 -k start
23121 ?        S      0:00  \_ /usr/sbin/apache2 -k start
1268 tty1     Ss+    0:00 /sbin/getty -8 38400 tty1
1396 ?        S      0:00 /usr/local/caddy2/servers/test/caddy2serverd localhost caddy2test 1981

<removed a bunch of processes like the one above to save space in the post, there were about 100, all with status S>

25590 ?        S      0:00  \_ /usr/local/caddy2/servers/prod/caddy2serverd localhost caddy2prod 1991
25538 ?        D      0:00 /bin/bash ./impsys-snap.sh nohup
25596 ?        Ss     0:00 /sbin/getty -L hvc0 9600 linux

我確實注意到有幾個程序處於 D 狀態,我認為這表明它是一個殭屍程序。我不知道這些是否是問題的原因,或者 D 狀態的程序與 Z 狀態的程序之間有什麼區別。

如果我們認為這些是導致問題的原因,我能做些什麼呢?我不知道是什麼導致程序進入 D 狀態,因此不知道如何防止它發生。

非常感謝您的幫助。謝謝!

更新:

我查看了我們的 kern.log,發現它充滿了這樣的消息:

<removed to clean up post, further detail added below>

其中一些的時間戳似乎與伺服器鎖定我們的時間一致,所以我認為這與它有關。我們也將此資訊傳遞給了我們的伺服器提供商,但這是否表明有什麼用處?如果是這樣,這是否表明我的問題或我的伺服器提供商的問題?

更新 2:

這是似乎相關的整個時間的kern.log。10:52:24,平均負載開始上升。我在大約 10:54:02得到了**ps axf的輸出。**一兩分鐘後(可能正好兩分鐘,如果 10:56:02 表示它),系統變得無響應,我無法執行命令。這是日誌:


Mar 25 08:08:57 cloud kernel: [45483.026983] INFO: task apache2:9642 blocked for more than 120 seconds.
Mar 25 08:08:57 cloud kernel: [45483.026986] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 25 08:08:57 cloud kernel: [45483.026989] apache2       D ea63de60     0  9642   9068 0x00000000
Mar 25 08:08:57 cloud kernel: [45483.026992]  ea63de78 00000282 ea63ded4 ea63de60 c01d9096 00000000 00000000 00000000
Mar 25 08:08:57 cloud kernel: [45483.026996]  c06f61c0 ea1a6fac c06f61c0 c06f61c0 c06f61c0 ea1a6fac c06f61c0 c06f61c0
Mar 25 08:08:57 cloud kernel: [45483.027000]  c1305740 0000296f ea1a6d00 ec844d00 9a352c09 0003c456 ea63dea0 c0106c51
Mar 25 08:08:57 cloud kernel: [45483.027003] Call Trace:
Mar 25 08:08:57 cloud kernel: [45483.027006]  [] ? __link_path_walk+0x626/0xc20
Mar 25 08:08:57 cloud kernel: [45483.027010]  [] ? sched_clock+0x21/0x80
Mar 25 08:08:57 cloud kernel: [45483.027013]  [] schedule_timeout+0x175/0x250
Mar 25 08:08:57 cloud kernel: [45483.027018]  [] ? sched_clock_cpu+0x14d/0x190
Mar 25 08:08:57 cloud kernel: [45483.027021]  [] ? find_idlest_group+0xa8/0x1b0
Mar 25 08:08:57 cloud kernel: [45483.027023]  [] wait_for_common+0xc6/0x180
Mar 25 08:08:57 cloud kernel: [45483.027026]  [] ? default_wake_function+0x0/0x10
Mar 25 08:08:57 cloud kernel: [45483.027028]  [] wait_for_completion+0x12/0x20
Mar 25 08:08:57 cloud kernel: [45483.027031]  [] sched_migrate_task+0xe4/0x100
Mar 25 08:08:57 cloud kernel: [45483.027033]  [] sched_exec+0x3b/0x50
Mar 25 08:08:57 cloud kernel: [45483.027036]  [] do_execve+0xc4/0x360
Mar 25 08:08:57 cloud kernel: [45483.027038]  [] sys_execve+0x28/0x60
Mar 25 08:08:57 cloud kernel: [45483.027041]  [] syscall_call+0x7/0xb
Mar 25 09:27:03 cloud kernel: [50344.466167] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
Mar 25 09:27:03 cloud kernel: [50344.466452] CONFIG_NF_CT_ACCT is deprecated and will be removed soon. Please use
Mar 25 09:27:03 cloud kernel: [50344.466454] nf_conntrack.acct=1 kernel parameter, acct=1 nf_conntrack module option or
Mar 25 09:27:03 cloud kernel: [50344.466455] sysctl net.netfilter.nf_conntrack_acct=1 to enable it.
Mar 25 10:52:24 cloud kernel: [55167.785176] BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0]
Mar 25 10:52:24 cloud kernel: [55167.785202] Modules linked in: nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipv6 scsi_mod lp parport
Mar 25 10:52:24 cloud kernel: [55167.785217] 
Mar 25 10:52:24 cloud kernel: [55167.785221] Pid: 0, comm: swapper Not tainted (2.6.32-308-ec2 #16-Ubuntu) 
Mar 25 10:52:24 cloud kernel: [55167.785224] EIP: 0061:[] EFLAGS: 00000246 CPU: 0
Mar 25 10:52:24 cloud kernel: [55167.785228] EIP is at 0xc01013a7
Mar 25 10:52:24 cloud kernel: [55167.785230] EAX: 00000000 EBX: 00000001 ECX: 00000000 EDX: c0689f58
Mar 25 10:52:24 cloud kernel: [55167.785232] ESI: c06beb08 EDI: a8b0d3bc EBP: c0689f78 ESP: c0689f70
Mar 25 10:52:24 cloud kernel: [55167.785235]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0069
Mar 25 10:52:24 cloud kernel: [55167.785242] CR0: 8005003b CR2: b6620000 CR3: 2a65e000 CR4: 00002660
Mar 25 10:52:24 cloud kernel: [55167.785247] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
Mar 25 10:52:24 cloud kernel: [55167.785250] DR6: ffff0ff0 DR7: 00000400
Mar 25 10:52:24 cloud kernel: [55167.785252] Call Trace:
Mar 25 10:52:24 cloud kernel: [55167.785259]  [] ? xen_safe_halt+0x15/0x40
Mar 25 10:52:24 cloud kernel: [55167.785264]  [] xen_idle+0x29/0x80
Mar 25 10:52:24 cloud kernel: [55167.785267]  [] cpu_idle+0x8f/0xc0
Mar 25 10:52:24 cloud kernel: [55167.785272]  [] rest_init+0x53/0x60
Mar 25 10:52:24 cloud kernel: [55167.785278]  [] start_kernel+0x379/0x37f
Mar 25 10:52:24 cloud kernel: [55167.785282]  [] ? unknown_bootoption+0x0/0x1a0
Mar 25 10:52:24 cloud kernel: [55167.785286]  [] i386_start_kernel+0x67/0x6e
Mar 25 10:53:30 cloud kernel: [55233.281412] BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0]
Mar 25 10:53:30 cloud kernel: [55233.281421] Modules linked in: nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipv6 scsi_mod lp parport
Mar 25 10:53:30 cloud kernel: [55233.281444] 
Mar 25 10:53:30 cloud kernel: [55233.281449] Pid: 0, comm: swapper Not tainted (2.6.32-308-ec2 #16-Ubuntu) 
Mar 25 10:53:30 cloud kernel: [55233.281453] EIP: 0061:[] EFLAGS: 00000246 CPU: 0
Mar 25 10:53:30 cloud kernel: [55233.281457] EIP is at 0xc01013a7
Mar 25 10:53:30 cloud kernel: [55233.281460] EAX: 00000000 EBX: 00000001 ECX: 00000000 EDX: c0689f58
Mar 25 10:53:30 cloud kernel: [55233.281463] ESI: c06beb08 EDI: a8b0d3bc EBP: c0689f78 ESP: c0689f70
Mar 25 10:53:30 cloud kernel: [55233.281466]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0069
Mar 25 10:53:30 cloud kernel: [55233.281474] CR0: 8005003b CR2: 09827024 CR3: 013c4000 CR4: 00002660
Mar 25 10:53:30 cloud kernel: [55233.281480] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
Mar 25 10:53:30 cloud kernel: [55233.281484] DR6: ffff0ff0 DR7: 00000400
Mar 25 10:53:30 cloud kernel: [55233.281487] Call Trace:
Mar 25 10:53:30 cloud kernel: [55233.281498]  [] ? xen_safe_halt+0x15/0x40
Mar 25 10:53:30 cloud kernel: [55233.281504]  [] xen_idle+0x29/0x80
Mar 25 10:53:30 cloud kernel: [55233.281509]  [] cpu_idle+0x8f/0xc0
Mar 25 10:53:30 cloud kernel: [55233.281516]  [] rest_init+0x53/0x60
Mar 25 10:53:30 cloud kernel: [55233.281524]  [] start_kernel+0x379/0x37f
Mar 25 10:53:30 cloud kernel: [55233.281529]  [] ? unknown_bootoption+0x0/0x1a0
Mar 25 10:53:30 cloud kernel: [55233.281535]  [] i386_start_kernel+0x67/0x6e
Mar 25 10:54:36 cloud kernel: [55298.785478] BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0]
Mar 25 10:54:36 cloud kernel: [55298.785538] Modules linked in: nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipv6 scsi_mod lp parport
Mar 25 10:54:36 cloud kernel: [55298.785551] 
Mar 25 10:54:36 cloud kernel: [55298.785554] Pid: 0, comm: swapper Not tainted (2.6.32-308-ec2 #16-Ubuntu) 
Mar 25 10:54:36 cloud kernel: [55298.785556] EIP: 0061:[] EFLAGS: 00000246 CPU: 0
Mar 25 10:54:36 cloud kernel: [55298.785560] EIP is at 0xc01013a7
Mar 25 10:54:36 cloud kernel: [55298.785561] EAX: 00000000 EBX: 00000001 ECX: 00000000 EDX: c0689f58
Mar 25 10:54:36 cloud kernel: [55298.785563] ESI: c06beb08 EDI: a8b0d3bc EBP: c0689f78 ESP: c0689f70
Mar 25 10:54:36 cloud kernel: [55298.785565]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0069
Mar 25 10:54:36 cloud kernel: [55298.785570] CR0: 8005003b CR2: 08ddb00c CR3: 28e7a000 CR4: 00002660
Mar 25 10:54:36 cloud kernel: [55298.785573] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
Mar 25 10:54:36 cloud kernel: [55298.785575] DR6: ffff0ff0 DR7: 00000400
Mar 25 10:54:36 cloud kernel: [55298.785576] Call Trace:
Mar 25 10:54:36 cloud kernel: [55298.785584]  [] ? xen_safe_halt+0x15/0x40
Mar 25 10:54:36 cloud kernel: [55298.785588]  [] xen_idle+0x29/0x80
Mar 25 10:54:36 cloud kernel: [55298.785591]  [] cpu_idle+0x8f/0xc0
Mar 25 10:54:36 cloud kernel: [55298.785596]  [] rest_init+0x53/0x60
Mar 25 10:54:36 cloud kernel: [55298.785602]  [] start_kernel+0x379/0x37f
Mar 25 10:54:36 cloud kernel: [55298.785605]  [] ? unknown_bootoption+0x0/0x1a0
Mar 25 10:54:36 cloud kernel: [55298.785608]  [] i386_start_kernel+0x67/0x6e
Mar 25 10:54:57 cloud kernel: [55318.911014] INFO: task impsys-snap.sh:25538 blocked for more than 120 seconds.
Mar 25 10:54:57 cloud kernel: [55318.911056] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 25 10:54:57 cloud kernel: [55318.911061] impsys-snap.s D e8ac5e60     0 25538      1 0x00000004
Mar 25 10:54:57 cloud kernel: [55318.911065]  e8ac5e78 00000282 ec53bb64 e8ac5e60 c01d9498 00000000 00000000 00000000
Mar 25 10:54:57 cloud kernel: [55318.911072]  c06f61c0 c13088dc c06f61c0 c06f61c0 c06f61c0 c13088dc c06f61c0 c06f61c0
Mar 25 10:54:57 cloud kernel: [55318.911077]  ea1d5040 00003221 c1308630 c068d280 c012abc8 0003cd53 0004ae99 e8ac5eac
Mar 25 10:54:57 cloud kernel: [55318.911083] Call Trace:
Mar 25 10:54:57 cloud kernel: [55318.911092]  [] ? __link_path_walk+0xa28/0xc20
Mar 25 10:54:57 cloud kernel: [55318.911098]  [] ? inc_rt_group+0xf8/0x110
Mar 25 10:54:57 cloud kernel: [55318.911103]  [] ? update_curr+0x169/0x2c0
Mar 25 10:54:57 cloud kernel: [55318.911114]  [] schedule_timeout+0x175/0x250
Mar 25 10:54:57 cloud kernel: [55318.911117]  [] ? check_preempt_wakeup+0x152/0x370
Mar 25 10:54:57 cloud kernel: [55318.911120]  [] wait_for_common+0xc6/0x180
Mar 25 10:54:57 cloud kernel: [55318.911122]  [] ? default_wake_function+0x0/0x10
Mar 25 10:54:57 cloud kernel: [55318.911125]  [] wait_for_completion+0x12/0x20
Mar 25 10:54:57 cloud kernel: [55318.911128]  [] sched_migrate_task+0xe4/0x100
Mar 25 10:54:57 cloud kernel: [55318.911130]  [] sched_exec+0x3b/0x50
Mar 25 10:54:57 cloud kernel: [55318.911134]  [] do_execve+0xc4/0x360
Mar 25 10:54:57 cloud kernel: [55318.911137]  [] sys_execve+0x28/0x60
Mar 25 10:54:57 cloud kernel: [55318.911139]  [] syscall_call+0x7/0xb
Mar 25 10:54:57 cloud kernel: [55318.911142] INFO: task lesspipe:25544 blocked for more than 120 seconds.
Mar 25 10:54:57 cloud kernel: [55318.911145] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 25 10:54:57 cloud kernel: [55318.911148] lesspipe      D e9487e60     0 25544  25543 0x00000000
Mar 25 10:54:57 cloud kernel: [55318.911151]  e9487e78 00000286 e9487ed4 e9487e60 c01d9096 c01e1204 00000000 00000000
Mar 25 10:54:57 cloud kernel: [55318.911155]  c06f61c0 ec3bb4ec c06f61c0 c06f61c0 c06f61c0 ec3bb4ec c06f61c0 c06f61c0
Mar 25 10:54:57 cloud kernel: [55318.911159]  ec269900 00003221 ec3bb240 c068d280 6d9860b9 0003cd53 e9487ea0 c0106c51
Mar 25 10:54:57 cloud kernel: [55318.911163] Call Trace:
Mar 25 10:54:57 cloud kernel: [55318.911165]  [] ? __link_path_walk+0x626/0xc20
Mar 25 10:54:57 cloud kernel: [55318.911169]  [] ? dput+0x84/0x160
Mar 25 10:54:57 cloud kernel: [55318.911172]  [] ? sched_clock+0x21/0x80
Mar 25 10:54:57 cloud kernel: [55318.911175]  [] schedule_timeout+0x175/0x250
Mar 25 10:54:57 cloud kernel: [55318.911179]  [] ? sched_clock_cpu+0x14d/0x190
Mar 25 10:54:57 cloud kernel: [55318.911181]  [] ? find_idlest_group+0xa8/0x1b0
Mar 25 10:54:57 cloud kernel: [55318.911184]  [] wait_for_common+0xc6/0x180
Mar 25 10:54:57 cloud kernel: [55318.911187]  [] ? default_wake_function+0x0/0x10
Mar 25 10:54:57 cloud kernel: [55318.911189]  [] wait_for_completion+0x12/0x20
Mar 25 10:54:57 cloud kernel: [55318.911192]  [] sched_migrate_task+0xe4/0x100
Mar 25 10:54:57 cloud kernel: [55318.911194]  [] sched_exec+0x3b/0x50
Mar 25 10:54:57 cloud kernel: [55318.911197]  [] do_execve+0xc4/0x360
Mar 25 10:54:57 cloud kernel: [55318.911199]  [] sys_execve+0x28/0x60
Mar 25 10:54:57 cloud kernel: [55318.911201]  [] syscall_call+0x7/0xb
Mar 25 10:54:57 cloud kernel: [55318.911204] INFO: task 10-help-text:25560 blocked for more than 120 seconds.
Mar 25 10:54:57 cloud kernel: [55318.911206] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 25 10:54:57 cloud kernel: [55318.911209] 10-help-text  D e8995e60     0 25560  25558 0x00000000
Mar 25 10:54:57 cloud kernel: [55318.911211]  e8995e78 00000282 e8995ed4 e8995e60 c01d9096 00000000 00000000 00000000
Mar 25 10:54:57 cloud kernel: [55318.911215]  c06f61c0 e8af122c c06f61c0 c06f61c0 c06f61c0 e8af122c c06f61c0 c06f61c0
Mar 25 10:54:57 cloud kernel: [55318.911219]  ec295e40 00003229 e8af0f80 c068d280 3d76c004 0003cd5c e8995ea0 c0106c51
Mar 25 10:54:57 cloud kernel: [55318.911223] Call Trace:
Mar 25 10:54:57 cloud kernel: [55318.911225]  [] ? __link_path_walk+0x626/0xc20
Mar 25 10:54:57 cloud kernel: [55318.911228]  [] ? sched_clock+0x21/0x80
Mar 25 10:54:57 cloud kernel: [55318.911231]  [] schedule_timeout+0x175/0x250
Mar 25 10:54:57 cloud kernel: [55318.911233]  [] ? sched_clock_cpu+0x14d/0x190
Mar 25 10:54:57 cloud kernel: [55318.911236]  [] ? find_idlest_group+0xa8/0x1b0
Mar 25 10:54:57 cloud kernel: [55318.911238]  [] wait_for_common+0xc6/0x180
Mar 25 10:54:57 cloud kernel: [55318.911241]  [] ? default_wake_function+0x0/0x10
Mar 25 10:54:57 cloud kernel: [55318.911243]  [] wait_for_completion+0x12/0x20
Mar 25 10:54:57 cloud kernel: [55318.911246]  [] sched_migrate_task+0xe4/0x100
Mar 25 10:54:57 cloud kernel: [55318.911248]  [] sched_exec+0x3b/0x50
Mar 25 10:54:57 cloud kernel: [55318.911251]  [] do_execve+0xc4/0x360
Mar 25 10:54:57 cloud kernel: [55318.911253]  [] sys_execve+0x28/0x60
Mar 25 10:54:57 cloud kernel: [55318.911256]  [] syscall_call+0x7/0xb
Mar 25 10:54:57 cloud kernel: [55318.911258] INFO: task xe-update-guest:25564 blocked for more than 120 seconds.
Mar 25 10:54:57 cloud kernel: [55318.911260] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 25 10:54:57 cloud kernel: [55318.911263] xe-update-gue D e9535e60     0 25564  25563 0x00000000
Mar 25 10:54:57 cloud kernel: [55318.911265]  e9535e78 00000286 e9535ed4 e9535e60 c01d9096 c01e1204 00000000 00000000
Mar 25 10:54:57 cloud kernel: [55318.911269]  c06f61c0 ea09685c c06f61c0 c06f61c0 c06f61c0 ea09685c c06f61c0 c06f61c0
Mar 25 10:54:57 cloud kernel: [55318.911273]  ec269ac0 0000322b ea0965b0 c068d280 f58f849c 0003cd5d e9535ea0 c0106c51
Mar 25 10:54:57 cloud kernel: [55318.911277] Call Trace:
Mar 25 10:54:57 cloud kernel: [55318.911279]  [] ? __link_path_walk+0x626/0xc20
Mar 25 10:54:57 cloud kernel: [55318.911282]  [] ? dput+0x84/0x160
Mar 25 10:54:57 cloud kernel: [55318.911284]  [] ? sched_clock+0x21/0x80
Mar 25 10:54:57 cloud kernel: [55318.911287]  [] schedule_timeout+0x175/0x250
Mar 25 10:54:57 cloud kernel: [55318.911290]  [] ? sched_clock_cpu+0x14d/0x190
Mar 25 10:54:57 cloud kernel: [55318.911292]  [] ? find_idlest_group+0xa8/0x1b0
Mar 25 10:54:57 cloud kernel: [55318.911294]  [] wait_for_common+0xc6/0x180
Mar 25 10:54:57 cloud kernel: [55318.911297]  [] ? default_wake_function+0x0/0x10
Mar 25 10:54:57 cloud kernel: [55318.911299]  [] wait_for_completion+0x12/0x20
Mar 25 10:54:57 cloud kernel: [55318.911302]  [] sched_migrate_task+0xe4/0x100
Mar 25 10:54:57 cloud kernel: [55318.911305]  [] sched_exec+0x3b/0x50
Mar 25 10:54:57 cloud kernel: [55318.911307]  [] do_execve+0xc4/0x360
Mar 25 10:54:57 cloud kernel: [55318.911310]  [] sys_execve+0x28/0x60
Mar 25 10:54:57 cloud kernel: [55318.911312]  [] syscall_call+0x7/0xb
Mar 25 10:54:57 cloud kernel: [55318.911314] INFO: task apache2:25565 blocked for more than 120 seconds.
Mar 25 10:54:57 cloud kernel: [55318.911316] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 25 10:54:57 cloud kernel: [55318.911319] apache2       D e899be60     0 25565  23116 0x00000000
Mar 25 10:54:57 cloud kernel: [55318.911321]  e899be78 00000282 e899bed4 e899be60 c01d9096 c068afa0 00000000 00000000
Mar 25 10:54:57 cloud kernel: [55318.911325]  c06f61c0 e949522c c06f61c0 c06f61c0 c06f61c0 e949522c c06f61c0 c06f61c0
Mar 25 10:54:57 cloud kernel: [55318.911329]  ec31f040 0000322c e9494f80 c068d280 1fcfd913 0003cd5f e899bea0 c0106c51
Mar 25 10:54:57 cloud kernel: [55318.911333] Call Trace:
Mar 25 10:54:57 cloud kernel: [55318.911336]  [] ? __link_path_walk+0x626/0xc20
Mar 25 10:54:57 cloud kernel: [55318.911338]  [] ? sched_clock+0x21/0x80
Mar 25 10:54:57 cloud kernel: [55318.911341]  [] schedule_timeout+0x175/0x250
Mar 25 10:54:57 cloud kernel: [55318.911343]  [] ? sched_clock_cpu+0x14d/0x190
Mar 25 10:54:57 cloud kernel: [55318.911346]  [] ? find_idlest_group+0xa8/0x1b0
Mar 25 10:54:57 cloud kernel: [55318.911348]  [] wait_for_common+0xc6/0x180
Mar 25 10:54:57 cloud kernel: [55318.911351]  [] ? default_wake_function+0x0/0x10
Mar 25 10:54:57 cloud kernel: [55318.911353]  [] wait_for_completion+0x12/0x20
Mar 25 10:54:57 cloud kernel: [55318.911356]  [] sched_migrate_task+0xe4/0x100
Mar 25 10:54:57 cloud kernel: [55318.911359]  [] sched_exec+0x3b/0x50
Mar 25 10:54:57 cloud kernel: [55318.911361]  [] do_execve+0xc4/0x360
Mar 25 10:54:57 cloud kernel: [55318.911364]  [] sys_execve+0x28/0x60
Mar 25 10:54:57 cloud kernel: [55318.911366]  [] syscall_call+0x7/0xb
Mar 25 10:56:02 cloud kernel: [55383.610034] BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0]
Mar 25 10:56:02 cloud kernel: [55383.610056] Modules linked in: nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipv6 scsi_mod lp parport
Mar 25 10:56:02 cloud kernel: [55383.610073] 
Mar 25 10:56:02 cloud kernel: [55383.610077] Pid: 0, comm: swapper Not tainted (2.6.32-308-ec2 #16-Ubuntu) 
Mar 25 10:56:02 cloud kernel: [55383.610080] EIP: 0061:[] EFLAGS: 00000246 CPU: 0
Mar 25 10:56:02 cloud kernel: [55383.610084] EIP is at 0xc01013a7
Mar 25 10:56:02 cloud kernel: [55383.610086] EAX: 00000000 EBX: 00000001 ECX: 00000000 EDX: c0689f58
Mar 25 10:56:02 cloud kernel: [55383.610089] ESI: c06beb08 EDI: a8b0d3bc EBP: c0689f78 ESP: c0689f70
Mar 25 10:56:02 cloud kernel: [55383.610091]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0069
Mar 25 10:56:02 cloud kernel: [55383.610097] CR0: 8005003b CR2: 098c801c CR3: 2a6d2000 CR4: 00002660
Mar 25 10:56:02 cloud kernel: [55383.610104] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
Mar 25 10:56:02 cloud kernel: [55383.610106] DR6: ffff0ff0 DR7: 00000400
Mar 25 10:56:02 cloud kernel: [55383.610107] Call Trace:
Mar 25 10:56:02 cloud kernel: [55383.610115]  [] ? xen_safe_halt+0x15/0x40
Mar 25 10:56:02 cloud kernel: [55383.610119]  [] xen_idle+0x29/0x80
Mar 25 10:56:02 cloud kernel: [55383.610122]  [] cpu_idle+0x8f/0xc0
Mar 25 10:56:02 cloud kernel: [55383.610127]  [] rest_init+0x53/0x60
Mar 25 10:56:02 cloud kernel: [55383.610133]  [] start_kernel+0x379/0x37f
Mar 25 10:56:02 cloud kernel: [55383.610136]  [] ? unknown_bootoption+0x0/0x1a0
Mar 25 10:56:02 cloud kernel: [55383.610139]  [] i386_start_kernel+0x67/0x6e

D是不間斷的等待,曾幾何時是Disk等待,但現在通常更多地與等待網路文件系統有關。這些程序計入平均負載,這可能是您的問題。

從輸出中跳出來的事情ps是倒數第二行:正在對文件系統(NAS?)進行快照,並且可能在快照期間對該文件系統的所有磁碟活動都被阻止。處理這個問題的方法包括但不限於使用cachefsover it 和更頻繁地進行快照,以便它們執行得更快;使用哪個取決於 NAS(無論如何,它們中的一些在快照期間很糟糕)、可用磁碟空間和您的需求。我會首先尋找 NAS 的支持社區(不一定是供應商支持),看看其他使用者想出了什麼技巧來最小化快照延遲。

引用自:https://serverfault.com/questions/251891