Apache-2.2
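Server suddenly stops responding, then recovers an hour later
My FreeBSD server ran flawlessly for more than 2 years with no major changes to the system. I recently installed an SSL certificate using Apache's mod_ssl, and after 10 days of normal operation the server suddenly started crashing.
When the server crashes:
- HTTPS and SSH become unresponsive immediately
- PING slows to thousands of milliseconds before it stops responding
After 15-60 minutes of being unreachable:
- The server suddenly comes back and runs at full speed, as if nothing had happened
- It then crashes again within 15-60 minutes and the cycle repeats
What I have checked:
- Rebooting the server changes nothing - it is still unreachable
- CPU / RAM / HDD usage - normal (< 50%, including peak hours)
- Traffic makes no difference - it happens at any time of day, including 4 AM
- Disabling the firewall does not help
In httpd-error.log I found: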
[notice] Digest: generating secret for digest authentication ...
[notice] Digest: done
[notice] Apache/2.2.23 (FreeBSD) mod_ssl/2.2.23 OpenSSL/0.9.8q DAV/2 configured -- resuming normal operations
[error] server reached MaxClients setting, consider raising the MaxClients setting
I tried enabling KeepAlive and raising MaxClients substantially (4x), but that did not solve the problem:
Timeout 120
KeepAlive On
KeepAliveTimeout 5
MaxKeepAliveRequests 1000
<IfModule mpm_prefork_module>
    StartServers 50
    MinSpareServers 128
    MaxSpareServers 1024
    ServerLimit 1024
    MaxClients 1024
    MaxRequestsPerChild 1000
</IfModule>
In /var/log/messages, just before the first crash, I found:
kernel: mfi0: 228755 (454057919s/0x0008/FATAL) - Battery needs replacement - SOH Bad
kernel: mfi0: 228756 (454057984s/0x0008/FATAL) - Battery needs replacement - SOH Bad
kernel: mfi0: 228757 (454058049s/0x0008/FATAL) - Battery needs replacement - SOH Bad
kernel: arp: 176.31.237.254 moved from 00:07:b4:00:00:01 to 00:07:b4:00:00:03 on ix0
kernel: arp: 176.31.237.251 moved from 00:25:90:02:08:fc to 00:07:b4:00:00:01 on ix0
kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:01 to 00:07:b4:00:00:03 on ix0
kernel: mfi0: 228758 (454058114s/0x0008/FATAL) - Battery needs replacement - SOH Bad
kernel: mfi0: 228759 (454058179s/0x0008/FATAL) - Battery needs replacement - SOH Bad
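The "Battery needs replacement" warnings disappeared after the first reboot, but the arp messages keep appearing in the log, at roughly the same intervals as the server crashes: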
May 23 05:00:00 ns228407 kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:03 to 00:07:b4:00:00:01 on ix0
May 23 05:00:02 ns228407 kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:01 to 00:25:90:02:08:fc on ix0
May 23 05:20:00 ns228407 kernel: arp: 176.31.237.251 moved from 00:25:90:02:08:fc to 00:07:b4:00:00:01 on ix0
May 23 05:20:00 ns228407 kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:01 to 00:07:b4:00:00:03 on ix0
May 23 05:32:44 ns228407 kernel: arp: 176.31.237.254 moved from 00:07:b4:00:00:03 to 00:07:b4:00:00:01 on ix0
May 23 05:40:01 ns228407 kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:03 to 00:25:90:02:08:fc on ix0
May 23 05:40:01 ns228407 kernel: arp: 176.31.237.251 moved from 00:25:90:02:08:fc to 00:07:b4:00:00:01 on ix0
May 23 05:40:01 ns228407 kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:01 to 00:07:b4:00:00:03 on ix0
May 23 05:52:40 ns228407 kernel: arp: 176.31.237.254 moved from 00:07:b4:00:00:01 to 00:07:b4:00:00:03 on ix0
May 23 06:00:00 ns228407 kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:03 to 00:25:90:02:08:fc on ix0
May 23 06:00:00 ns228407 kernel: arp: 176.31.237.251 moved from 00:25:90:02:08:fc to 00:07:b4:00:00:01 on ix0
May 23 06:00:00 ns228407 kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:01 to 00:07:b4:00:00:03 on ix0
May 23 06:00:02 ns228407 kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:03 to 00:25:90:02:08:fc on ix0
May 23 06:20:01 ns228407 kernel: arp: 176.31.237.251 moved from 00:25:90:02:08:fc to 00:07:b4:00:00:03 on ix0
May 23 06:20:01 ns228407 kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:03 to 00:07:b4:00:00:01 on ix0
May 23 06:30:02 ns228407 kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:01 to 00:25:90:02:08:fc on ix0
May 23 06:32:36 ns228407 kernel: arp: 176.31.237.254 moved from 00:07:b4:00:00:03 to 00:07:b4:00:00:01 on ix0
May 23 06:50:01 ns228407 kernel: arp: 176.31.237.251 moved from 00:25:90:02:08:fc to 00:07:b4:00:00:01 on ix0
May 23 06:50:01 ns228407 kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:01 to 00:07:b4:00:00:03 on ix0
May 23 07:00:02 ns228407 kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:03 to 00:25:90:02:08:fc on ix0
May 23 07:12:28 ns228407 kernel: arp: 176.31.237.254 moved from 00:07:b4:00:00:01 to 00:07:b4:00:00:03 on ix0
May 23 07:20:00 ns228407 kernel: arp: 176.31.237.251 moved from 00:25:90:02:08:fc to 00:07:b4:00:00:01 on ix0
May 23 07:20:00 ns228407 kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:01 to 00:07:b4:00:00:03 on ix0
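What should I do next to find and fix the problem?
The last thing you should be doing right now is increasing MaxClients.
It's very difficult to say. The slowdown and the MaxClients warning suggest that there is more demand on your server than it can cope with. Unless you're running a lot of AJAX/COMET stuff on the server, you really should reduce the keepalive timeout (to, say, 2 seconds initially).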
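To illustrate why raising MaxClients can backfire with the prefork MPM (one process per connection), here is a rough sketch of a memory budget. Every figure in it is hypothetical: the thread does not state the server's RAM or the size of an httpd child, so measure the real average resident size (for example with ps or top) before trusting any result.

```python
def max_clients_budget(total_ram_mb, reserved_mb, avg_child_rss_mb):
    """Rough upper bound on prefork children that fit in RAM without swapping.

    total_ram_mb     -- physical memory in the machine
    reserved_mb      -- memory set aside for the OS, database, caches, etc.
    avg_child_rss_mb -- measured average resident size of one httpd child
    """
    if avg_child_rss_mb <= 0:
        raise ValueError("avg_child_rss_mb must be positive")
    usable_mb = total_ram_mb - reserved_mb
    if usable_mb <= 0:
        return 0
    return int(usable_mb // avg_child_rss_mb)


# Hypothetical example: 16 GB of RAM, 4 GB reserved, 40 MB per child.
budget = max_clients_budget(16384, 4096, 40)
print(budget)
```

If the budget comes out well below the configured MaxClients, a burst of connections can push the machine into swap, which makes everything slower rather than faster.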
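The "Battery needs replacement" message is not just a reminder to do some maintenance. On a BBWC (battery-backed write cache) it means the controller no longer attempts to cache writes, and if your system is set up correctly, neither your OS nor your disks will be caching writes either.
Both of these indicate that the performance of your system should be very bad, yet the first thing you report is that it appears to be unavailable. Indeed, you make no mention of performance at all. Knowing how to measure performance and capturing the data should be your top priority.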
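One way to start capturing data is to sample Apache's own worker counts. The sketch below parses the machine-readable text that mod_status serves at /server-status?auto. It assumes mod_status is enabled and reachable, which the thread does not confirm, and the sample text is made up for illustration; it is not output from this server.

```python
def parse_status_auto(text):
    """Parse 'Key: value' lines from mod_status ?auto output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def worker_usage(fields):
    """Return (busy, idle, busy_fraction) from the parsed status fields."""
    busy = int(fields["BusyWorkers"])
    idle = int(fields["IdleWorkers"])
    total = busy + idle
    return busy, idle, (busy / total if total else 0.0)


# Made-up sample in the ?auto format, not taken from the server in question.
SAMPLE = """Total Accesses: 1200
BusyWorkers: 96
IdleWorkers: 32
"""

busy, idle, fraction = worker_usage(parse_status_auto(SAMPLE))
print(busy, idle, fraction)
```

Logging these numbers once a minute alongside load average and disk latency would show whether the workers fill up before the server becomes unreachable or only afterwards.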
I'm not sure why the addresses keep moving (I presume these are local interfaces) - it may be a consequence of load elsewhere.
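To check whether the ARP moves really line up with the outages, the log lines can be grouped per IP address and the gaps between moves computed. This is a minimal sketch: the regular expression is written against the line format quoted in the question, the year argument is an assumption because syslog timestamps carry no year, and month parsing with %b assumes an English/C locale.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches lines such as:
# May 23 05:32:44 ns228407 kernel: arp: 176.31.237.254 moved from <mac> to <mac> on ix0
ARP_MOVE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+arp:\s+"
    r"(?P<ip>\S+)\s+moved from\s+(?P<old>\S+)\s+to\s+(?P<new>\S+)\s+on\s+\S+"
)


def arp_moves(lines, year=2000):
    """Group ARP 'moved from' events by IP. The year is an assumed placeholder."""
    moves = defaultdict(list)
    for line in lines:
        m = ARP_MOVE.match(line.strip())
        if not m:
            continue  # skip mfi0 and any other unrelated messages
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        moves[m["ip"]].append((ts, m["old"], m["new"]))
    return moves


def gaps_minutes(events):
    """Minutes between consecutive moves for one IP address."""
    times = sorted(ts for ts, _, _ in events)
    return [(b - a).total_seconds() / 60 for a, b in zip(times, times[1:])]


# Three of the 176.31.237.254 lines quoted in the question.
LOG = [
    "May 23 05:32:44 ns228407 kernel: arp: 176.31.237.254 moved from 00:07:b4:00:00:03 to 00:07:b4:00:00:01 on ix0",
    "May 23 05:52:40 ns228407 kernel: arp: 176.31.237.254 moved from 00:07:b4:00:00:01 to 00:07:b4:00:00:03 on ix0",
    "May 23 06:32:36 ns228407 kernel: arp: 176.31.237.254 moved from 00:07:b4:00:00:03 to 00:07:b4:00:00:01 on ix0",
]

for ip, events in arp_moves(LOG).items():
    print(ip, [round(g, 2) for g in gaps_minutes(events)])
```

Comparing these gaps with the timestamps of the outages (from an external ping monitor, for instance) would show whether the moves coincide with the crashes or follow an unrelated schedule.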
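This is one sick puppy. You're going to have to fix one thing at a time until you get a clearer picture of what is going wrong.
Start by replacing the battery, tuning your Apache install, and recording performance metrics.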