apache2 掛在“發送回复”/“優雅完成”上
在我的 Ubuntu 10.04.4 伺服器上,apache2 2.2.14 會耗盡所有記憶體,直到沒有更多記憶體為止,因為它沒有關閉舊實例。
伺服器提供一個小網頁,每 3 秒左右收到一個請求(來自客戶端、Google機器人、腳本等)。
如果我查看程序列表,有很多 apache2 實例:
附言 | grep apache2
www-data 1324 0.0 6.4 580688 135600 ? Sl 07:44 0:00 /usr/sbin/apache2 -k start www-data 1407 0.0 6.2 518800 131748 ? Sl 07:49 0:00 /usr/sbin/apache2 -k start www-data 1464 0.0 6.0 518800 127740 ? S 07:52 0:00 /usr/sbin/apache2 -k start www-data 1492 0.0 6.1 580688 129852 ? S 07:56 0:00 /usr/sbin/apache2 -k start www-data 2043 0.0 6.2 501384 130864 ? Sl 08:46 0:00 /usr/sbin/apache2 -k start www-data 3207 0.0 6.1 501384 128260 ? S 08:57 0:00 /usr/sbin/apache2 -k start www-data 5711 0.0 6.2 509580 130888 ? Sl 10:48 0:00 /usr/sbin/apache2 -k start www-data 5779 0.0 6.0 509580 126120 ? S 10:52 0:00 /usr/sbin/apache2 -k start www-data 5796 0.0 6.1 491140 128660 ? Sl 10:54 0:00 /usr/sbin/apache2 -k start www-data 5878 0.0 6.2 510604 130144 ? Sl 10:59 0:00 /usr/sbin/apache2 -k start www-data 7854 0.0 6.1 492556 129712 ? Sl 11:55 0:00 /usr/sbin/apache2 -k start www-data 7925 0.0 5.9 492556 125592 ? S 12:00 0:00 /usr/sbin/apache2 -k start www-data 8018 0.0 6.3 519204 132132 ? Sl 12:07 0:00 /usr/sbin/apache2 -k start www-data 8019 0.0 6.5 512448 136388 ? Sl 12:07 0:00 /usr/sbin/apache2 -k start www-data 8020 0.0 5.8 484340 122532 ? S 12:07 0:00 /usr/sbin/apache2 -k start root 8159 0.0 0.0 7644 908 ttyp0 R+ 12:16 0:00 grep --color=auto apache2 root 23949 0.0 6.3 483316 132176 ? Ss 00:16 0:15 /usr/sbin/apache2 -k start root 23951 0.0 0.0 16956 512 ? Ss 00:16 0:00 /usr/sbin/htcacheclean -n -d120 -i -p/var/cache/apache2/mod_disk_cache -l300M www-data 24064 0.0 6.6 590932 138868 ? Sl 00:20 0:01 /usr/sbin/apache2 -k start www-data 24065 0.0 6.5 533592 137812 ? Sl 00:20 0:00 /usr/sbin/apache2 -k start www-data 24068 0.0 6.3 500408 132372 ? Sl 00:20 0:00 /usr/sbin/apache2 -k start www-data 24357 0.0 6.5 525444 138352 ? Sl 00:47 0:00 /usr/sbin/apache2 -k start www-data 24389 0.0 6.3 533592 133604 ? S 00:50 0:00 /usr/sbin/apache2 -k start www-data 24527 0.0 6.5 525396 137028 ? Sl 01:06 0:00 /usr/sbin/apache2 -k start www-data 26006 0.0 6.3 525396 132996 ? S 01:48 0:00 /usr/sbin/apache2 -k start www-data 26023 0.0 6.3 525396 133952 ? S 01:50 0:00 /usr/sbin/apache2 -k start www-data 26304 0.0 6.6 528468 140504 ? Sl 02:18 0:00 /usr/sbin/apache2 -k start www-data 26305 0.0 6.4 523348 135396 ? Sl 02:18 0:00 /usr/sbin/apache2 -k start www-data 27902 0.0 6.4 590932 134240 ? S 02:49 0:00 /usr/sbin/apache2 -k start www-data 27912 0.0 6.2 523348 130988 ? S 02:51 0:00 /usr/sbin/apache2 -k start www-data 27942 0.0 6.4 523348 134744 ? Sl 02:54 0:00 /usr/sbin/apache2 -k start www-data 28508 0.0 6.3 523348 132292 ? S 03:51 0:00 /usr/sbin/apache2 -k start www-data 28518 0.0 6.4 528468 134736 ? S 03:52 0:00 /usr/sbin/apache2 -k start www-data 29919 0.0 6.3 518800 132240 ? Sl 04:28 0:00 /usr/sbin/apache2 -k start www-data 30164 0.0 6.1 518800 128652 ? S 04:52 0:00 /usr/sbin/apache2 -k start www-data 30168 0.0 6.1 518800 129576 ? S 04:53 0:00 /usr/sbin/apache2 -k start www-data 30418 0.0 6.7 528468 140668 ? Sl 05:17 0:00 /usr/sbin/apache2 -k start www-data 30463 0.0 6.3 518800 132128 ? Sl 05:21 0:00 /usr/sbin/apache2 -k start www-data 31799 0.0 6.1 518800 129268 ? S 05:52 0:00 /usr/sbin/apache2 -k start www-data 31813 0.0 6.4 528468 134808 ? S 05:54 0:00 /usr/sbin/apache2 -k start www-data 31881 0.0 6.4 515152 135252 ? Sl 06:00 0:00 /usr/sbin/apache2 -k start www-data 32005 0.0 6.2 517776 131220 ? Sl 06:12 0:00 /usr/sbin/apache2 -k start www-data 32405 0.0 6.0 517776 127832 ? S 06:52 0:00 /usr/sbin/apache2 -k start www-data 32435 0.0 6.1 515152 129456 ? S 06:55 0:00 /usr/sbin/apache2 -k start
**Apache 伺服器狀態 (ExtendedStatus OFF)**給出:
Server Version: Apache/2.2.14 (Ubuntu) mod_ssl/2.2.14 OpenSSL/0.9.8k mod_perl/2.0.4 Perl/v5.10.1 Server Built: Mar 8 2013 16:46:35 Current Time: Wednesday, 05-Jun-2013 12:17:22 CEST Restart Time: Wednesday, 05-Jun-2013 00:16:19 CEST Parent Server Generation: 2 Server uptime: 12 hours 1 minute 3 seconds 19 requests currently being processed, 2 idle workers GGGGGGGGGGGGGG.GWG_G_..W........................................ ................................................................ ................................................................ ................................................................ Scoreboard Key: "_" Waiting for Connection, "S" Starting up, "R" Reading Request, "W" Sending Reply, "K" Keepalive (read), "D" DNS Lookup, "C" Closing connection, "L" Logging, "G" Gracefully finishing, "I" Idle cleanup of worker, "." Open slot with no current process PID Key: 24527 in state: G , 27942 in state: G , 5796 in state: G 26304 in state: G , 26305 in state: G , 30418 in state: G 24064 in state: G , 24065 in state: G , 31881 in state: G 32005 in state: G , 24068 in state: G , 29919 in state: G 1324 in state: G , 30463 in state: G , 5711 in state: G 8018 in state: W , 1407 in state: G , 8019 in state: _ 2043 in state: G , 8020 in state: _ , 7854 in state: W To obtain a full report with current status information you need to use the ExtendedStatus On directive. SSL/TLS Session Cache Status: cache type: SHMCB, shared memory: 512000 bytes, current sessions: 0 subcaches: 32, indexes per subcache: 133 index usage: 0%, cache usage: 0% total sessions stored since starting: 4 total sessions expired since starting: 4 total (pre-expiry) sessions scrolled out of the cache: 0 total retrieves since starting: 0 hit, 0 miss total removes since starting: 0 hit, 0 miss
所以你可以看到大部分程序都掛在
G
狀態(Gracefullyfinishing)。更新 1: 使用帶有“ ExtendedStatus On ”的 apache 伺服器狀態,我們看到程序卡在
W
狀態(發送回复):(我用 ipX.domain.com 替換了真實 IP)Server Version: Apache/2.2.14 (Ubuntu) mod_ssl/2.2.14 OpenSSL/0.9.8k mod_perl/2.0.4 Perl/v5.10.1 Server Built: Mar 8 2013 16:46:35 Current Time: Wednesday, 05-Jun-2013 18:49:48 CEST Restart Time: Wednesday, 05-Jun-2013 14:31:17 CEST Parent Server Generation: 0 Server uptime: 4 hours 18 minutes 31 seconds Total accesses: 3129 - Total Traffic: 111.9 MB CPU Usage: u2.29 s.18 cu0 cs0 - .0159% CPU load .202 requests/sec - 7.4 kB/second - 36.6 kB/request 6 requests currently being processed, 2 idle workers WWW.W._WW..._................................................... ................................................................ ................................................................ ................................................................ Scoreboard Key: "_" Waiting for Connection, "S" Starting up, "R" Reading Request, "W" Sending Reply, "K" Keepalive (read), "D" DNS Lookup, "C" Closing connection, "L" Logging, "G" Gracefully finishing, "I" Idle cleanup of worker, "." Open slot with no current process Srv PID Acc M CPU SS Req Conn Child Slot Client VHost Request 0-0 7926 1/28/390 W 0.00 2913 0 0.2 1.15 11.63 ip1.domain.com www.domain.com GET /de HTTP/1.1 1-0 5432 0/10/183 W 0.00 10005 0 0.0 0.59 7.51 ip6.domain.com def.domain.com HEAD / HTTP/1.1 2-0 7878 1/44/403 W 0.00 2710 0 0.2 2.29 13.22 ip1.domain.com www.domain.com GET /de HTTP/1.1 3-0 - 0/0/467 . 0.00 203 0 0.0 0.00 19.51 ip2.domain.com def.domain.com NULL 4-0 5854 1/58/223 W 0.01 6137 0 0.2 1.74 12.29 ip1.domain.com www.domain.com GET /de HTTP/1.1 5-0 - 0/0/359 . 1.40 204 0 0.0 0.00 10.42 ip2.domain.com def.domain.com NULL 6-0 9534 0/60/244 _ 0.63 69 0 0.0 1.60 6.70 ip1.domain.com def.domain.com NULL 7-0 5861 1/32/84 W 0.00 6746 0 0.2 1.55 3.18 ip1.domain.com www.domain.com GET /de HTTP/1.1 8-0 11401 98/106/440 W 0.17 0 0 189.2 0.38 15.68 ip5.domain.com def.domain.com GET /server-status HTTP/1.1 9-0 - 0/0/152 . 0.00 207 1608 0.0 0.00 8.06 ip2.domain.com def.domain.com NULL 10-0 - 0/0/31 . 0.00 206 1639 0.0 0.00 0.79 ip2.domain.com def.domain.com NULL 11-0 - 0/0/99 . 0.24 205 0 0.0 0.00 1.83 ip2.domain.com def.domain.com NULL 12-0 11404 0/4/24 _ 0.02 104 18 0.0 0.00 0.38 ip3.domain.com www.domain.com GET /robots.txt HTTP/1.1 13-0 - 0/0/8 . 0.00 208 0 0.0 0.00 0.06 ip1.domain.com www.domain.com GET /de HTTP/1.1 14-0 - 0/0/16 . 0.00 920 47 0.0 0.00 0.59 ip4.domain.com def.domain.com NULL 15-0 - 0/0/2 . 0.00 5661 3905 0.0 0.00 0.08 ip2.domain.com def.domain.com NULL 16-0 - 0/0/3 . 0.00 5686 0 0.0 0.00 0.00 ip2.domain.com def.domain.com NULL 17-0 - 0/0/1 . 0.00 5694 1 0.0 0.00 0.00 127.0.0.1 def.domain.com GET / HTTP/1.0 Srv Child Server number - generation PID OS process ID Acc Number of accesses this connection / this child / this slot M Mode of operation CPU CPU usage, number of seconds SS Seconds since beginning of most recent request Req Milliseconds required to process most recent request Conn Kilobytes transferred this connection Child Megabytes transferred this child Slot Total megabytes transferred this slot SSL/TLS Session Cache Status: cache type: SHMCB, shared memory: 512000 bytes, current sessions: 0 subcaches: 32, indexes per subcache: 133 index usage: 0%, cache usage: 0% total sessions stored since starting: 37 total sessions expired since starting: 37 total (pre-expiry) sessions scrolled out of the cache: 0 total retrieves since starting: 8 hit, 0 miss total removes since starting: 0 hit, 0 miss
正如您在此處看到的,有一些程序
SS
(自上次請求以來的時間)值非常高,這意味著它們被卡住了。這些請求針對不同的虛擬主機,也來自不同的客戶端。如果我查看 acces.log 文件並蒐索此類程序的條目(我將 PID 添加到日誌中),他們服務的最後一個請求大多數時候是由 apache 直接製作的簡單圖像文件。
apache2 配置的重要部分是:
<IfModule mpm_prefork_module> StartServers 1 MinSpareServers 1 MaxSpareServers 3 MaxClients 30 MaxRequestsPerChild 0 </IfModule> KeepAlive On MaxKeepAliveRequests 100 KeepAliveTimeout 15
更新 3: 所有啟用的 apache2 模組:
root@ser:~# apache2ctl -M Loaded Modules: core_module (static) log_config_module (static) logio_module (static) mpm_prefork_module (static) http_module (static) so_module (static) alias_module (shared) auth_basic_module (shared) authn_file_module (shared) authz_default_module (shared) authz_groupfile_module (shared) authz_host_module (shared) authz_user_module (shared) autoindex_module (shared) cache_module (shared) cgi_module (shared) deflate_module (shared) dir_module (shared) disk_cache_module (shared) env_module (shared) expires_module (shared) filter_module (shared) headers_module (shared) mem_cache_module (shared) mime_module (shared) negotiation_module (shared) version_module (shared) pagespeed_module (shared) perl_module (shared) php5_module (shared) proxy_module (shared) proxy_http_module (shared) reqtimeout_module (shared) rewrite_module (shared) setenvif_module (shared) ssl_module (shared) status_module (shared) unique_id_module (shared) Syntax OK
更新 2: 這是一個掛起程序的堆棧跟踪。卡住的所有堆棧跟踪都完全相同:
5861:/usr/sbin/apache2-kstart #0 0x00002ad420a4448d in /lib/libpthread-2.11.1.so: waitpid #1 0x00002ad4258dc507 in /usr/lib/libperl.so.5.10.1: Perl_wait4pid #2 0x00002ad4258dc760 in /usr/lib/libperl.so.5.10.1: Perl_my_pclose #3 0x00002ad425949b1d in /usr/lib/libperl.so.5.10.1: Perl_io_close #4 0x00002ad425949c89 in /usr/lib/libperl.so.5.10.1: Perl_do_close #5 0x00002ad42593d922 in /usr/lib/libperl.so.5.10.1: Perl_pp_close #6 0x00002ad4258f3346 in /usr/lib/libperl.so.5.10.1: Perl_runops_standard #7 0x00002ad42589a8cf in /usr/lib/libperl.so.5.10.1: Perl_call_sv #8 0x00002ad4256202f6 in /usr/lib/apache2/modules/mod_perl.so: modperl_callback #9 0x00002ad425620a3a in /usr/lib/apache2/modules/mod_perl.so: modperl_callback_run_handlers #10 0x00002ad42562101f in /usr/lib/apache2/modules/mod_perl.so: modperl_callback_per_dir #11 0x00002ad42561a99f in /usr/lib/apache2/modules/mod_perl.so: - #12 0x00002ad42561ad06 in /usr/lib/apache2/modules/mod_perl.so: modperl_response_handler #13 0x00002ad41ff58320 in /usr/lib/apache2/mpm-prefork/apache2: ap_run_handler (/build/buildd/apache2-2.2.14/server/config.c:159) #14 0x00002ad41ff5bc88 in /usr/lib/apache2/mpm-prefork/apache2: ap_invoke_handler (/build/buildd/apache2-2.2.14/server/config.c:377) #15 0x00002ad41ff69878 in /usr/lib/apache2/mpm-prefork/apache2: ap_process_request (/build/buildd/apache2-2.2.14/modules/http/http_request.c:282) #16 0x00002ad41ff66728 in /usr/lib/apache2/mpm-prefork/apache2: ap_process_http_connection (/build/buildd/apache2-2.2.14/modules/http/http_core.c:190) #17 0x00002ad41ff5fed8 in /usr/lib/apache2/mpm-prefork/apache2: ap_run_process_connection (/build/buildd/apache2-2.2.14/server/connection.c:43) #18 0x00002ad41ff6e847 in /usr/lib/apache2/mpm-prefork/apache2: child_main (/build/buildd/apache2-2.2.14/server/mpm/prefork/prefork.c:663) #19 0x00002ad41ff6eb5a in /usr/lib/apache2/mpm-prefork/apache2: make_child (/build/buildd/apache2-2.2.14/server/mpm/prefork/prefork.c:758) #20 0x00002ad41ff6f7e4 in /usr/lib/apache2/mpm-prefork/apache2: perform_idle_server_maintenance (/build/buildd/apache2-2.2.14/server/mpm/prefork/prefork.c:883) #21 0x00002ad41ff44350 in /usr/lib/apache2/mpm-prefork/apache2: main (/build/buildd/apache2-2.2.14/server/main.c:742) #22 0x00002ad420c70c4d in /lib/libc-2.11.1.so: __libc_start_main@@GLIBC_2.2.5 #23 0x00002ad41ff432e9 in /usr/lib/apache2/mpm-prefork/apache2: _start
對於沒有掛起的程序的堆棧跟踪:
9741:/usr/sbin/apache2-kstart #0 0x00002ad420d3d7d7 in /lib/libc-2.11.1.so: semop@@GLIBC_2.2.5 #1 0x00002ad42081a0c3 in /usr/lib/libapr-1.so.0.3.8: - #2 0x00002ad42081ae47 in /usr/lib/libapr-1.so.0.3.8: apr_proc_mutex_lock #3 0x00002ad41ff6e754 in /usr/lib/apache2/mpm-prefork/apache2: accept_mutex_on (/build/buildd/apache2-2.2.14/server/mpm/prefork/prefork.c:206) #4 0x00002ad41ff6eb5a in /usr/lib/apache2/mpm-prefork/apache2: make_child (/build/buildd/apache2-2.2.14/server/mpm/prefork/prefork.c:758) #5 0x00002ad41ff6f7e4 in /usr/lib/apache2/mpm-prefork/apache2: perform_idle_server_maintenance (/build/buildd/apache2-2.2.14/server/mpm/prefork/prefork.c:883) #6 0x00002ad41ff44350 in /usr/lib/apache2/mpm-prefork/apache2: main (/build/buildd/apache2-2.2.14/server/main.c:742) #7 0x00002ad420c70c4d in /lib/libc-2.11.1.so: __libc_start_main@@GLIBC_2.2.5 #8 0x00002ad41ff432e9 in /usr/lib/apache2/mpm-prefork/apache2: _start
這是我的問題:
如何找出阻止 apache 實例成功關閉的原因?
更新 4 Google搜尋“apache 發送回复掛起”會導致一些文章出現錯誤腳本/mysql 連接問題。從stacktrace中可以看出,卡住的程序確實使用了mod_perl,也許有問題……會深入研究
經過測試,我想我發現了問題:
這是(正如已經預料到的)我的 Perl 程式碼中的一個問題。
我使用了 Log4Perl。在我的 Log4Perl 文件中,有一個設置可以發送錯誤郵件。刪除此設置後,所有 perl 實例都成功終止,並且不再有任何掛起的 apache 程序。
我的猜測是您的程序可能無法寫入日誌?您的日誌記錄配置是什麼樣的?
- 您是否通過某些過程管道日誌?
- 通過先進先出?
- 與 SYSLOG 的 TCP 連接?
- 寫入可能有問題的磁碟分區?
如果這不能讓您獲得任何幫助,請開始使用 strace 找出工作程序在掛起之前所做的最後一件事。