Ubuntu load average spikes, but the CPU is idle
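We have a server on a cloud network, provided by a third party. We are running Ubuntu 10.04 Server Edition.
The problem occurs at seemingly random times, roughly 1 to 3 times a day. The load average shown in top is usually around 2 and the server runs fine, but at these random times the load average spikes to around 30-35 and everything stops. We cannot reach our websites, cannot run commands on the server, cannot do anything. If you are not already logged in, you cannot even log in.
The only way we have been able to see the high load average is to keep top running constantly, so that it is already running when the problem occurs. It seems that if it is already running it keeps working, but if it is not, it cannot be started. Being unable to run any commands while the server is in this state makes the problem hard for us to diagnose... and besides, we do not consider ourselves server experts.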
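Since an already-running top keeps working during a spike, the same trick may apply to a small sampler that is started ahead of time and never launches another program. The following is a minimal sketch, assuming Python is available on the server and that reading `/proc/loadavg` keeps working during the lockup (not verified here). The names `parse_loadavg` and `sample_forever`, the log path, and the interval are illustrative choices, not part of the setup described above.

```python
import time


def parse_loadavg(text):
    """Parse the contents of /proc/loadavg into a dict.

    Format: "<1min> <5min> <15min> <runnable>/<total> <last_pid>".
    """
    fields = text.split()
    runnable, total = fields[3].split("/")
    return {
        "load1": float(fields[0]),
        "load5": float(fields[1]),
        "load15": float(fields[2]),
        "runnable": int(runnable),
        "total": int(total),
    }


def sample_forever(log_path="/var/tmp/loadavg.log", interval=5.0):
    """Append one timestamped sample per interval until the process is killed.

    Nothing is exec'ed inside the loop; the process only reads /proc and
    writes to a file it already has open. log_path and interval are
    hypothetical defaults.
    """
    with open(log_path, "a", buffering=1) as log:  # line-buffered text file
        while True:
            with open("/proc/loadavg") as f:
                s = parse_loadavg(f.read())
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            log.write("%s load1=%.2f load5=%.2f runnable=%d total=%d\n"
                      % (stamp, s["load1"], s["load5"], s["runnable"], s["total"]))
            time.sleep(interval)
```

Calling `sample_forever()` from a session started before a spike (for example under nohup) would leave a record to read afterwards. If the disk holding the log is itself blocked, the write may stall too, so the log is best kept on a local file system.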
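What looks strange to me is that the load average spikes so high while the processor stays idle and there is plenty of free memory. Again, I am no expert at all, but my very basic understanding is that if memory is available and the processor is not maxed out, there should be no processes waiting (I may well be wrong).
As I was typing this, I caught it just as it started to spike and managed to run a few commands before everything locked up. The output follows: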
uname -a
Linux <server name> 2.6.32-308-ec2 #16-Ubuntu SMP Thu Sep 16 14:28:38 UTC 2010 i686 GNU/Linux
top
top - 10:55:08 up 15:28, 4 users, load average: 12.29, 7.01, 3.89
Tasks: 313 total, 3 running, 308 sleeping, 0 stopped, 2 zombie
Cpu(s): 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 4210704k total, 2163024k used, 2047680k free, 162320k buffers
Swap: 2096440k total, 0k used, 2096440k free, 1690464k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 0 2804 1644 1204 S 0 0.0 0:00.18 init
2 root 20 0 0 0 0 S 0 0.0 0:00.00 kthreadd
3 root RT 0 0 0 0 R 0 0.0 0:00.08 migration/0
4 root 20 0 0 0 0 S 0 0.0 0:00.01 ksoftirqd/0
5 root RT 0 0 0 0 R 0 0.0 0:00.01 watchdog/0
6 root 20 0 0 0 0 S 0 0.0 0:00.06 events/0
ps axf
PID TTY STAT TIME COMMAND
2 ? S 0:00 [kthreadd]
3 ? R 0:00 \_ [migration/0]
4 ? S 0:00 \_ [ksoftirqd/0]
5 ? R 0:00 \_ [watchdog/0]
6 ? S 0:00 \_ [events/0]
7 ? S 0:00 \_ [cpuset]
8 ? S 0:00 \_ [khelper]
<removed a bunch of processes to save space in the post, all had status S>
1 ? Ss 0:00 /sbin/init
245 ? S 0:00 upstart-udev-bridge --daemon
251 ? S /var/run/motd.new
25554 ? S 0:00 \_ run-parts --lsbsysinit /etc/update-motd.d
25558 ? S 0:00 \_ /bin/sh /etc/update-motd.d/10-help-text
25560 ? D 0:00 \_ /bin/sh /etc/update-motd.d/10-help-text
852 ? Ss 0:00 cron
1374 ? S 0:00 \_ CRON
1377 ? Ss 0:00 | \_ /bin/sh -c /var/www/secure/caddy2_prod/scripts/main.pl
1379 ? S 0:02 | \_ /usr/bin/perl /var/www/secure/caddy2_prod/scripts/main.pl
1385 ? Z 0:00 | \_ [check.pl]
1375 ? S 0:00 \_ CRON
1376 ? Ss 0:00 \_ /bin/sh -c /var/www/secure/caddy2_test/scripts/main.pl
1378 ? S 0:00 \_ /usr/bin/perl /var/www/secure/caddy2_test/scripts/main.pl
1384 ? Z 0:00 \_ [check.pl]
855 ? Ss 0:00 atd
868 ? Ssl 6:36 /usr/sbin/mysqld
890 ? S 0:00 /bin/bash /usr/sbin/xe-daemon -p /var/run/xe-daemon.pid
25563 ? S 0:00 \_ /bin/sh /usr/sbin/xe-update-guest-attrs --memory
25564 ? D 0:00 \_ /bin/sh /usr/sbin/xe-update-guest-attrs --memory
1161 ? Ss 0:00 /usr/lib/postfix/master
3102 ? S 0:00 \_ qmgr -l -t fifo -u
22013 ? S 0:00 \_ pickup -l -t fifo -u -c
1181 ? Ssl 3:17 /usr/sbin/asterisk -p -U asterisk
1182 ? S 0:00 \_ astcanary /var/run/asterisk/alt.asterisk.canary.tweet.tweet.tweet
1222 ? Ss 0:00 /usr/sbin/apache2 -k start
31682 ? S 0:01 \_ /usr/sbin/apache2 -k start
31716 ? S 0:01 \_ /usr/sbin/apache2 -k start
13548 ? S 0:00 \_ /usr/sbin/apache2 -k start
25593 ? S 0:00 | \_ /usr/bin/perl -w /usr/lib/cgi-bin/caddy2/patch.pl
25594 ? D 0:00 | \_ /usr/bin/perl -w /usr/lib/cgi-bin/caddy2/patch.pl
13637 ? S 0:00 \_ /usr/sbin/apache2 -k start
16061 ? S 0:00 \_ /usr/sbin/apache2 -k start
23116 ? S 0:00 \_ /usr/sbin/apache2 -k start
25565 ? D 0:00 | \_ /usr/sbin/apache2 -k start
23117 ? S 0:00 \_ /usr/sbin/apache2 -k start
23118 ? S 0:00 \_ /usr/sbin/apache2 -k start
23119 ? S 0:00 \_ /usr/sbin/apache2 -k start
23121 ? S 0:00 \_ /usr/sbin/apache2 -k start
1268 tty1 Ss+ 0:00 /sbin/getty -8 38400 tty1
1396 ? S 0:00 /usr/local/caddy2/servers/test/caddy2serverd localhost caddy2test 1981
<removed a bunch of processes like the one above to save space in the post, there were about 100, all with status S>
25590 ? S 0:00 \_ /usr/local/caddy2/servers/prod/caddy2serverd localhost caddy2prod 1991
25538 ? D 0:00 /bin/bash ./impsys-snap.sh nohup
25596 ? Ss 0:00 /sbin/getty -L hvc0 9600 linux
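I did notice that several processes are in state D, which I thought indicated a zombie process. I do not know whether these are the cause of the problem, or what the difference is between a process in state D and one in state Z.
If we conclude that these are causing the problem, what can I do about it? I do not know what makes a process enter state D, so I do not know how to prevent it.
Any help is much appreciated. Thanks!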
Update:
I looked at our kern.log and found it full of messages like this:
<removed to clean up post, further detail added below>
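Some of the timestamps seem to coincide with the times the server locked us out, so I assume the messages are related. We have also passed this information on to our server provider, but does it tell us anything useful? If so, does it point to a problem on my side or on my server provider's side?
Update 2:
Here is the kern.log for the whole period that seems relevant. At 10:52:24 the load average started to rise. I captured the output of ps axf at about 10:54:02. A minute or two later (possibly exactly two minutes, if the 10:56:02 entry marks it), the system became unresponsive and I could not run commands. Here is the log: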
Mar 25 08:08:57 cloud kernel: [45483.026983] INFO: task apache2:9642 blocked for more than 120 seconds. Mar 25 08:08:57 cloud kernel: [45483.026986] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Mar 25 08:08:57 cloud kernel: [45483.026989] apache2 D ea63de60 0 9642 9068 0x00000000 Mar 25 08:08:57 cloud kernel: [45483.026992] ea63de78 00000282 ea63ded4 ea63de60 c01d9096 00000000 00000000 00000000 Mar 25 08:08:57 cloud kernel: [45483.026996] c06f61c0 ea1a6fac c06f61c0 c06f61c0 c06f61c0 ea1a6fac c06f61c0 c06f61c0 Mar 25 08:08:57 cloud kernel: [45483.027000] c1305740 0000296f ea1a6d00 ec844d00 9a352c09 0003c456 ea63dea0 c0106c51 Mar 25 08:08:57 cloud kernel: [45483.027003] Call Trace: Mar 25 08:08:57 cloud kernel: [45483.027006] [] ? __link_path_walk+0x626/0xc20 Mar 25 08:08:57 cloud kernel: [45483.027010] [] ? sched_clock+0x21/0x80 Mar 25 08:08:57 cloud kernel: [45483.027013] [] schedule_timeout+0x175/0x250 Mar 25 08:08:57 cloud kernel: [45483.027018] [] ? sched_clock_cpu+0x14d/0x190 Mar 25 08:08:57 cloud kernel: [45483.027021] [] ? find_idlest_group+0xa8/0x1b0 Mar 25 08:08:57 cloud kernel: [45483.027023] [] wait_for_common+0xc6/0x180 Mar 25 08:08:57 cloud kernel: [45483.027026] [] ? default_wake_function+0x0/0x10 Mar 25 08:08:57 cloud kernel: [45483.027028] [] wait_for_completion+0x12/0x20 Mar 25 08:08:57 cloud kernel: [45483.027031] [] sched_migrate_task+0xe4/0x100 Mar 25 08:08:57 cloud kernel: [45483.027033] [] sched_exec+0x3b/0x50 Mar 25 08:08:57 cloud kernel: [45483.027036] [] do_execve+0xc4/0x360 Mar 25 08:08:57 cloud kernel: [45483.027038] [] sys_execve+0x28/0x60 Mar 25 08:08:57 cloud kernel: [45483.027041] [] syscall_call+0x7/0xb Mar 25 09:27:03 cloud kernel: [50344.466167] nf_conntrack version 0.5.0 (16384 buckets, 65536 max) Mar 25 09:27:03 cloud kernel: [50344.466452] CONFIG_NF_CT_ACCT is deprecated and will be removed soon. 
Please use Mar 25 09:27:03 cloud kernel: [50344.466454] nf_conntrack.acct=1 kernel parameter, acct=1 nf_conntrack module option or Mar 25 09:27:03 cloud kernel: [50344.466455] sysctl net.netfilter.nf_conntrack_acct=1 to enable it. Mar 25 10:52:24 cloud kernel: [55167.785176] BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0] Mar 25 10:52:24 cloud kernel: [55167.785202] Modules linked in: nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipv6 scsi_mod lp parport Mar 25 10:52:24 cloud kernel: [55167.785217] Mar 25 10:52:24 cloud kernel: [55167.785221] Pid: 0, comm: swapper Not tainted (2.6.32-308-ec2 #16-Ubuntu) Mar 25 10:52:24 cloud kernel: [55167.785224] EIP: 0061:[] EFLAGS: 00000246 CPU: 0 Mar 25 10:52:24 cloud kernel: [55167.785228] EIP is at 0xc01013a7 Mar 25 10:52:24 cloud kernel: [55167.785230] EAX: 00000000 EBX: 00000001 ECX: 00000000 EDX: c0689f58 Mar 25 10:52:24 cloud kernel: [55167.785232] ESI: c06beb08 EDI: a8b0d3bc EBP: c0689f78 ESP: c0689f70 Mar 25 10:52:24 cloud kernel: [55167.785235] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0069 Mar 25 10:52:24 cloud kernel: [55167.785242] CR0: 8005003b CR2: b6620000 CR3: 2a65e000 CR4: 00002660 Mar 25 10:52:24 cloud kernel: [55167.785247] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 Mar 25 10:52:24 cloud kernel: [55167.785250] DR6: ffff0ff0 DR7: 00000400 Mar 25 10:52:24 cloud kernel: [55167.785252] Call Trace: Mar 25 10:52:24 cloud kernel: [55167.785259] [] ? xen_safe_halt+0x15/0x40 Mar 25 10:52:24 cloud kernel: [55167.785264] [] xen_idle+0x29/0x80 Mar 25 10:52:24 cloud kernel: [55167.785267] [] cpu_idle+0x8f/0xc0 Mar 25 10:52:24 cloud kernel: [55167.785272] [] rest_init+0x53/0x60 Mar 25 10:52:24 cloud kernel: [55167.785278] [] start_kernel+0x379/0x37f Mar 25 10:52:24 cloud kernel: [55167.785282] [] ? 
unknown_bootoption+0x0/0x1a0 Mar 25 10:52:24 cloud kernel: [55167.785286] [] i386_start_kernel+0x67/0x6e Mar 25 10:53:30 cloud kernel: [55233.281412] BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0] Mar 25 10:53:30 cloud kernel: [55233.281421] Modules linked in: nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipv6 scsi_mod lp parport Mar 25 10:53:30 cloud kernel: [55233.281444] Mar 25 10:53:30 cloud kernel: [55233.281449] Pid: 0, comm: swapper Not tainted (2.6.32-308-ec2 #16-Ubuntu) Mar 25 10:53:30 cloud kernel: [55233.281453] EIP: 0061:[] EFLAGS: 00000246 CPU: 0 Mar 25 10:53:30 cloud kernel: [55233.281457] EIP is at 0xc01013a7 Mar 25 10:53:30 cloud kernel: [55233.281460] EAX: 00000000 EBX: 00000001 ECX: 00000000 EDX: c0689f58 Mar 25 10:53:30 cloud kernel: [55233.281463] ESI: c06beb08 EDI: a8b0d3bc EBP: c0689f78 ESP: c0689f70 Mar 25 10:53:30 cloud kernel: [55233.281466] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0069 Mar 25 10:53:30 cloud kernel: [55233.281474] CR0: 8005003b CR2: 09827024 CR3: 013c4000 CR4: 00002660 Mar 25 10:53:30 cloud kernel: [55233.281480] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 Mar 25 10:53:30 cloud kernel: [55233.281484] DR6: ffff0ff0 DR7: 00000400 Mar 25 10:53:30 cloud kernel: [55233.281487] Call Trace: Mar 25 10:53:30 cloud kernel: [55233.281498] [] ? xen_safe_halt+0x15/0x40 Mar 25 10:53:30 cloud kernel: [55233.281504] [] xen_idle+0x29/0x80 Mar 25 10:53:30 cloud kernel: [55233.281509] [] cpu_idle+0x8f/0xc0 Mar 25 10:53:30 cloud kernel: [55233.281516] [] rest_init+0x53/0x60 Mar 25 10:53:30 cloud kernel: [55233.281524] [] start_kernel+0x379/0x37f Mar 25 10:53:30 cloud kernel: [55233.281529] [] ? unknown_bootoption+0x0/0x1a0 Mar 25 10:53:30 cloud kernel: [55233.281535] [] i386_start_kernel+0x67/0x6e Mar 25 10:54:36 cloud kernel: [55298.785478] BUG: soft lockup - CPU#0 stuck for 61s! 
[swapper:0] Mar 25 10:54:36 cloud kernel: [55298.785538] Modules linked in: nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipv6 scsi_mod lp parport Mar 25 10:54:36 cloud kernel: [55298.785551] Mar 25 10:54:36 cloud kernel: [55298.785554] Pid: 0, comm: swapper Not tainted (2.6.32-308-ec2 #16-Ubuntu) Mar 25 10:54:36 cloud kernel: [55298.785556] EIP: 0061:[] EFLAGS: 00000246 CPU: 0 Mar 25 10:54:36 cloud kernel: [55298.785560] EIP is at 0xc01013a7 Mar 25 10:54:36 cloud kernel: [55298.785561] EAX: 00000000 EBX: 00000001 ECX: 00000000 EDX: c0689f58 Mar 25 10:54:36 cloud kernel: [55298.785563] ESI: c06beb08 EDI: a8b0d3bc EBP: c0689f78 ESP: c0689f70 Mar 25 10:54:36 cloud kernel: [55298.785565] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0069 Mar 25 10:54:36 cloud kernel: [55298.785570] CR0: 8005003b CR2: 08ddb00c CR3: 28e7a000 CR4: 00002660 Mar 25 10:54:36 cloud kernel: [55298.785573] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 Mar 25 10:54:36 cloud kernel: [55298.785575] DR6: ffff0ff0 DR7: 00000400 Mar 25 10:54:36 cloud kernel: [55298.785576] Call Trace: Mar 25 10:54:36 cloud kernel: [55298.785584] [] ? xen_safe_halt+0x15/0x40 Mar 25 10:54:36 cloud kernel: [55298.785588] [] xen_idle+0x29/0x80 Mar 25 10:54:36 cloud kernel: [55298.785591] [] cpu_idle+0x8f/0xc0 Mar 25 10:54:36 cloud kernel: [55298.785596] [] rest_init+0x53/0x60 Mar 25 10:54:36 cloud kernel: [55298.785602] [] start_kernel+0x379/0x37f Mar 25 10:54:36 cloud kernel: [55298.785605] [] ? unknown_bootoption+0x0/0x1a0 Mar 25 10:54:36 cloud kernel: [55298.785608] [] i386_start_kernel+0x67/0x6e Mar 25 10:54:57 cloud kernel: [55318.911014] INFO: task impsys-snap.sh:25538 blocked for more than 120 seconds. Mar 25 10:54:57 cloud kernel: [55318.911056] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
Mar 25 10:54:57 cloud kernel: [55318.911061] impsys-snap.s D e8ac5e60 0 25538 1 0x00000004 Mar 25 10:54:57 cloud kernel: [55318.911065] e8ac5e78 00000282 ec53bb64 e8ac5e60 c01d9498 00000000 00000000 00000000 Mar 25 10:54:57 cloud kernel: [55318.911072] c06f61c0 c13088dc c06f61c0 c06f61c0 c06f61c0 c13088dc c06f61c0 c06f61c0 Mar 25 10:54:57 cloud kernel: [55318.911077] ea1d5040 00003221 c1308630 c068d280 c012abc8 0003cd53 0004ae99 e8ac5eac Mar 25 10:54:57 cloud kernel: [55318.911083] Call Trace: Mar 25 10:54:57 cloud kernel: [55318.911092] [] ? __link_path_walk+0xa28/0xc20 Mar 25 10:54:57 cloud kernel: [55318.911098] [] ? inc_rt_group+0xf8/0x110 Mar 25 10:54:57 cloud kernel: [55318.911103] [] ? update_curr+0x169/0x2c0 Mar 25 10:54:57 cloud kernel: [55318.911114] [] schedule_timeout+0x175/0x250 Mar 25 10:54:57 cloud kernel: [55318.911117] [] ? check_preempt_wakeup+0x152/0x370 Mar 25 10:54:57 cloud kernel: [55318.911120] [] wait_for_common+0xc6/0x180 Mar 25 10:54:57 cloud kernel: [55318.911122] [] ? default_wake_function+0x0/0x10 Mar 25 10:54:57 cloud kernel: [55318.911125] [] wait_for_completion+0x12/0x20 Mar 25 10:54:57 cloud kernel: [55318.911128] [] sched_migrate_task+0xe4/0x100 Mar 25 10:54:57 cloud kernel: [55318.911130] [] sched_exec+0x3b/0x50 Mar 25 10:54:57 cloud kernel: [55318.911134] [] do_execve+0xc4/0x360 Mar 25 10:54:57 cloud kernel: [55318.911137] [] sys_execve+0x28/0x60 Mar 25 10:54:57 cloud kernel: [55318.911139] [] syscall_call+0x7/0xb Mar 25 10:54:57 cloud kernel: [55318.911142] INFO: task lesspipe:25544 blocked for more than 120 seconds. Mar 25 10:54:57 cloud kernel: [55318.911145] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
Mar 25 10:54:57 cloud kernel: [55318.911148] lesspipe D e9487e60 0 25544 25543 0x00000000 Mar 25 10:54:57 cloud kernel: [55318.911151] e9487e78 00000286 e9487ed4 e9487e60 c01d9096 c01e1204 00000000 00000000 Mar 25 10:54:57 cloud kernel: [55318.911155] c06f61c0 ec3bb4ec c06f61c0 c06f61c0 c06f61c0 ec3bb4ec c06f61c0 c06f61c0 Mar 25 10:54:57 cloud kernel: [55318.911159] ec269900 00003221 ec3bb240 c068d280 6d9860b9 0003cd53 e9487ea0 c0106c51 Mar 25 10:54:57 cloud kernel: [55318.911163] Call Trace: Mar 25 10:54:57 cloud kernel: [55318.911165] [] ? __link_path_walk+0x626/0xc20 Mar 25 10:54:57 cloud kernel: [55318.911169] [] ? dput+0x84/0x160 Mar 25 10:54:57 cloud kernel: [55318.911172] [] ? sched_clock+0x21/0x80 Mar 25 10:54:57 cloud kernel: [55318.911175] [] schedule_timeout+0x175/0x250 Mar 25 10:54:57 cloud kernel: [55318.911179] [] ? sched_clock_cpu+0x14d/0x190 Mar 25 10:54:57 cloud kernel: [55318.911181] [] ? find_idlest_group+0xa8/0x1b0 Mar 25 10:54:57 cloud kernel: [55318.911184] [] wait_for_common+0xc6/0x180 Mar 25 10:54:57 cloud kernel: [55318.911187] [] ? default_wake_function+0x0/0x10 Mar 25 10:54:57 cloud kernel: [55318.911189] [] wait_for_completion+0x12/0x20 Mar 25 10:54:57 cloud kernel: [55318.911192] [] sched_migrate_task+0xe4/0x100 Mar 25 10:54:57 cloud kernel: [55318.911194] [] sched_exec+0x3b/0x50 Mar 25 10:54:57 cloud kernel: [55318.911197] [] do_execve+0xc4/0x360 Mar 25 10:54:57 cloud kernel: [55318.911199] [] sys_execve+0x28/0x60 Mar 25 10:54:57 cloud kernel: [55318.911201] [] syscall_call+0x7/0xb Mar 25 10:54:57 cloud kernel: [55318.911204] INFO: task 10-help-text:25560 blocked for more than 120 seconds. Mar 25 10:54:57 cloud kernel: [55318.911206] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
Mar 25 10:54:57 cloud kernel: [55318.911209] 10-help-text D e8995e60 0 25560 25558 0x00000000 Mar 25 10:54:57 cloud kernel: [55318.911211] e8995e78 00000282 e8995ed4 e8995e60 c01d9096 00000000 00000000 00000000 Mar 25 10:54:57 cloud kernel: [55318.911215] c06f61c0 e8af122c c06f61c0 c06f61c0 c06f61c0 e8af122c c06f61c0 c06f61c0 Mar 25 10:54:57 cloud kernel: [55318.911219] ec295e40 00003229 e8af0f80 c068d280 3d76c004 0003cd5c e8995ea0 c0106c51 Mar 25 10:54:57 cloud kernel: [55318.911223] Call Trace: Mar 25 10:54:57 cloud kernel: [55318.911225] [] ? __link_path_walk+0x626/0xc20 Mar 25 10:54:57 cloud kernel: [55318.911228] [] ? sched_clock+0x21/0x80 Mar 25 10:54:57 cloud kernel: [55318.911231] [] schedule_timeout+0x175/0x250 Mar 25 10:54:57 cloud kernel: [55318.911233] [] ? sched_clock_cpu+0x14d/0x190 Mar 25 10:54:57 cloud kernel: [55318.911236] [] ? find_idlest_group+0xa8/0x1b0 Mar 25 10:54:57 cloud kernel: [55318.911238] [] wait_for_common+0xc6/0x180 Mar 25 10:54:57 cloud kernel: [55318.911241] [] ? default_wake_function+0x0/0x10 Mar 25 10:54:57 cloud kernel: [55318.911243] [] wait_for_completion+0x12/0x20 Mar 25 10:54:57 cloud kernel: [55318.911246] [] sched_migrate_task+0xe4/0x100 Mar 25 10:54:57 cloud kernel: [55318.911248] [] sched_exec+0x3b/0x50 Mar 25 10:54:57 cloud kernel: [55318.911251] [] do_execve+0xc4/0x360 Mar 25 10:54:57 cloud kernel: [55318.911253] [] sys_execve+0x28/0x60 Mar 25 10:54:57 cloud kernel: [55318.911256] [] syscall_call+0x7/0xb Mar 25 10:54:57 cloud kernel: [55318.911258] INFO: task xe-update-guest:25564 blocked for more than 120 seconds. Mar 25 10:54:57 cloud kernel: [55318.911260] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
Mar 25 10:54:57 cloud kernel: [55318.911263] xe-update-gue D e9535e60 0 25564 25563 0x00000000 Mar 25 10:54:57 cloud kernel: [55318.911265] e9535e78 00000286 e9535ed4 e9535e60 c01d9096 c01e1204 00000000 00000000 Mar 25 10:54:57 cloud kernel: [55318.911269] c06f61c0 ea09685c c06f61c0 c06f61c0 c06f61c0 ea09685c c06f61c0 c06f61c0 Mar 25 10:54:57 cloud kernel: [55318.911273] ec269ac0 0000322b ea0965b0 c068d280 f58f849c 0003cd5d e9535ea0 c0106c51 Mar 25 10:54:57 cloud kernel: [55318.911277] Call Trace: Mar 25 10:54:57 cloud kernel: [55318.911279] [] ? __link_path_walk+0x626/0xc20 Mar 25 10:54:57 cloud kernel: [55318.911282] [] ? dput+0x84/0x160 Mar 25 10:54:57 cloud kernel: [55318.911284] [] ? sched_clock+0x21/0x80 Mar 25 10:54:57 cloud kernel: [55318.911287] [] schedule_timeout+0x175/0x250 Mar 25 10:54:57 cloud kernel: [55318.911290] [] ? sched_clock_cpu+0x14d/0x190 Mar 25 10:54:57 cloud kernel: [55318.911292] [] ? find_idlest_group+0xa8/0x1b0 Mar 25 10:54:57 cloud kernel: [55318.911294] [] wait_for_common+0xc6/0x180 Mar 25 10:54:57 cloud kernel: [55318.911297] [] ? default_wake_function+0x0/0x10 Mar 25 10:54:57 cloud kernel: [55318.911299] [] wait_for_completion+0x12/0x20 Mar 25 10:54:57 cloud kernel: [55318.911302] [] sched_migrate_task+0xe4/0x100 Mar 25 10:54:57 cloud kernel: [55318.911305] [] sched_exec+0x3b/0x50 Mar 25 10:54:57 cloud kernel: [55318.911307] [] do_execve+0xc4/0x360 Mar 25 10:54:57 cloud kernel: [55318.911310] [] sys_execve+0x28/0x60 Mar 25 10:54:57 cloud kernel: [55318.911312] [] syscall_call+0x7/0xb Mar 25 10:54:57 cloud kernel: [55318.911314] INFO: task apache2:25565 blocked for more than 120 seconds. Mar 25 10:54:57 cloud kernel: [55318.911316] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
Mar 25 10:54:57 cloud kernel: [55318.911319] apache2 D e899be60 0 25565 23116 0x00000000 Mar 25 10:54:57 cloud kernel: [55318.911321] e899be78 00000282 e899bed4 e899be60 c01d9096 c068afa0 00000000 00000000 Mar 25 10:54:57 cloud kernel: [55318.911325] c06f61c0 e949522c c06f61c0 c06f61c0 c06f61c0 e949522c c06f61c0 c06f61c0 Mar 25 10:54:57 cloud kernel: [55318.911329] ec31f040 0000322c e9494f80 c068d280 1fcfd913 0003cd5f e899bea0 c0106c51 Mar 25 10:54:57 cloud kernel: [55318.911333] Call Trace: Mar 25 10:54:57 cloud kernel: [55318.911336] [] ? __link_path_walk+0x626/0xc20 Mar 25 10:54:57 cloud kernel: [55318.911338] [] ? sched_clock+0x21/0x80 Mar 25 10:54:57 cloud kernel: [55318.911341] [] schedule_timeout+0x175/0x250 Mar 25 10:54:57 cloud kernel: [55318.911343] [] ? sched_clock_cpu+0x14d/0x190 Mar 25 10:54:57 cloud kernel: [55318.911346] [] ? find_idlest_group+0xa8/0x1b0 Mar 25 10:54:57 cloud kernel: [55318.911348] [] wait_for_common+0xc6/0x180 Mar 25 10:54:57 cloud kernel: [55318.911351] [] ? default_wake_function+0x0/0x10 Mar 25 10:54:57 cloud kernel: [55318.911353] [] wait_for_completion+0x12/0x20 Mar 25 10:54:57 cloud kernel: [55318.911356] [] sched_migrate_task+0xe4/0x100 Mar 25 10:54:57 cloud kernel: [55318.911359] [] sched_exec+0x3b/0x50 Mar 25 10:54:57 cloud kernel: [55318.911361] [] do_execve+0xc4/0x360 Mar 25 10:54:57 cloud kernel: [55318.911364] [] sys_execve+0x28/0x60 Mar 25 10:54:57 cloud kernel: [55318.911366] [] syscall_call+0x7/0xb Mar 25 10:56:02 cloud kernel: [55383.610034] BUG: soft lockup - CPU#0 stuck for 61s! 
[swapper:0] Mar 25 10:56:02 cloud kernel: [55383.610056] Modules linked in: nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipv6 scsi_mod lp parport Mar 25 10:56:02 cloud kernel: [55383.610073] Mar 25 10:56:02 cloud kernel: [55383.610077] Pid: 0, comm: swapper Not tainted (2.6.32-308-ec2 #16-Ubuntu) Mar 25 10:56:02 cloud kernel: [55383.610080] EIP: 0061:[] EFLAGS: 00000246 CPU: 0 Mar 25 10:56:02 cloud kernel: [55383.610084] EIP is at 0xc01013a7 Mar 25 10:56:02 cloud kernel: [55383.610086] EAX: 00000000 EBX: 00000001 ECX: 00000000 EDX: c0689f58 Mar 25 10:56:02 cloud kernel: [55383.610089] ESI: c06beb08 EDI: a8b0d3bc EBP: c0689f78 ESP: c0689f70 Mar 25 10:56:02 cloud kernel: [55383.610091] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0069 Mar 25 10:56:02 cloud kernel: [55383.610097] CR0: 8005003b CR2: 098c801c CR3: 2a6d2000 CR4: 00002660 Mar 25 10:56:02 cloud kernel: [55383.610104] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 Mar 25 10:56:02 cloud kernel: [55383.610106] DR6: ffff0ff0 DR7: 00000400 Mar 25 10:56:02 cloud kernel: [55383.610107] Call Trace: Mar 25 10:56:02 cloud kernel: [55383.610115] [] ? xen_safe_halt+0x15/0x40 Mar 25 10:56:02 cloud kernel: [55383.610119] [] xen_idle+0x29/0x80 Mar 25 10:56:02 cloud kernel: [55383.610122] [] cpu_idle+0x8f/0xc0 Mar 25 10:56:02 cloud kernel: [55383.610127] [] rest_init+0x53/0x60 Mar 25 10:56:02 cloud kernel: [55383.610133] [] start_kernel+0x379/0x37f Mar 25 10:56:02 cloud kernel: [55383.610136] [] ? unknown_bootoption+0x0/0x1a0 Mar 25 10:56:02 cloud kernel: [55383.610139] [] i386_start_kernel+0x67/0x6e
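To line up log timestamps with the times the server locked up, the two kinds of message seen above ("blocked for more than N seconds" and "soft lockup") can be pulled out of kern.log mechanically. This is a minimal sketch, assuming the file on disk has one message per line in the syslog format shown in this post; the function name `summarize` and the regular expressions are illustrative and only cover these two message types.

```python
import re

# Prefix shared by both message types: "Mar 25 10:54:57 cloud kernel: [55318.911014] "
PREFIX = r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ kernel: \[\s*\d+\.\d+\] "

HUNG = re.compile(PREFIX + r"INFO: task (.+):(\d+) blocked for more than (\d+) seconds")
LOCKUP = re.compile(PREFIX + r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s!")


def summarize(lines):
    """Return a list of (timestamp, kind, detail) tuples for matching lines."""
    events = []
    for line in lines:
        m = HUNG.match(line)
        if m:
            # detail is "taskname[pid]"
            events.append((m.group(1), "hung_task",
                           "%s[%s]" % (m.group(2), m.group(3))))
            continue
        m = LOCKUP.match(line)
        if m:
            events.append((m.group(1), "soft_lockup", "CPU#%s" % m.group(1 + 1)))
    return events
```

A possible use is `summarize(open("/var/log/kern.log"))`, printing each tuple and comparing the timestamps against the moments the load average started to climb.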
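D is uninterruptible wait. Once upon a time it meant Disk wait, but these days it more often has to do with waiting on a network file system. Processes in this state count toward the load average, which is probably your problem. What jumps out of the ps output is the second-to-last line: a snapshot is being taken of a file system (a NAS?), and all disk activity on that file system is probably blocked while the snapshot runs. Ways of dealing with this include, but are not limited to, putting cachefs over it and taking snapshots more often so that each one finishes faster. Which to use depends on the NAS (some of them simply behave badly during snapshots), the available disk space, and your needs. I would start by looking for a support community for the NAS (not necessarily vendor support) to see what tricks other users have come up with to minimize snapshot delays.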
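To see which processes are in uninterruptible wait at a given moment without launching ps, the state can be read straight from `/proc`. This is a minimal sketch, assuming a Linux `/proc` file system; the function names `parse_stat` and `uninterruptible_tasks` are illustrative, and only the main thread of each process is inspected.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' instead of on whitespace.
    left, right = text.rsplit(")", 1)
    pid, comm = left.split("(", 1)
    return int(pid), comm, right.split()[0]


def uninterruptible_tasks(proc="/proc"):
    """List (pid, comm) for every process whose state is D."""
    found = []
    for name in os.listdir(proc):
        if not name.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc, name, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # process exited mid-scan, or the line was unreadable
        if state == "D":
            found.append((pid, comm))
    return sorted(found)
```

Called periodically from a process that is already running, `uninterruptible_tasks()` would show whether the D-state count rises together with the load average, which is what the explanation above predicts.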