Apache-2.2

Server suddenly stops responding, then recovers after an hour

  • May 23, 2014

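My FreeBSD server has been running flawlessly for more than 2 years, without any major changes to the system. I recently installed an SSL certificate using Apache's mod_ssl, and after 10 days of normal operation the server suddenly started crashing.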

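When the server crashes:

  • HTTPS and SSH become unresponsive immediately
  • PING slows down to thousands of milliseconds before it stops responding

After 15-60 minutes of being unreachable:

  • The server suddenly recovers and runs at full speed, as if nothing had happened
  • Then it crashes again within 15-60 minutes and the cycle repeats

What I have checked:

  • Rebooting the server changes nothing; it remains unreachable
  • CPU / RAM / HDD usage: normal (< 50%, including peak hours)
  • Traffic has no influence: it happens at any time of day, including 4 AM
  • Disabling the firewall does not help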

In httpd-error.log I found:

[notice] Digest: generating secret for digest authentication ...
[notice] Digest: done
[notice] Apache/2.2.23 (FreeBSD) mod_ssl/2.2.23 OpenSSL/0.9.8q DAV/2 configured -- resuming normal operations
[error] server reached MaxClients setting, consider raising the MaxClients setting

I tried enabling KeepAlive and substantially increasing MaxClients (4x), but this did not solve the problem:

Timeout 120
KeepAlive On
KeepAliveTimeout 5
MaxKeepAliveRequests 1000

<IfModule mpm_prefork_module>
   StartServers          50
   MinSpareServers       128
   MaxSpareServers      1024
   ServerLimit          1024
   MaxClients           1024
   MaxRequestsPerChild   1000
</IfModule>

In /var/log/messages, just before the first crash, I found:

kernel: mfi0: 228755 (454057919s/0x0008/FATAL) - Battery needs replacement - SOH Bad
kernel: mfi0: 228756 (454057984s/0x0008/FATAL) - Battery needs replacement - SOH Bad
kernel: mfi0: 228757 (454058049s/0x0008/FATAL) - Battery needs replacement - SOH Bad
kernel: arp: 176.31.237.254 moved from 00:07:b4:00:00:01 to 00:07:b4:00:00:03 on ix0
kernel: arp: 176.31.237.251 moved from 00:25:90:02:08:fc to 00:07:b4:00:00:01 on ix0
kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:01 to 00:07:b4:00:00:03 on ix0
kernel: mfi0: 228758 (454058114s/0x0008/FATAL) - Battery needs replacement - SOH Bad
kernel: mfi0: 228759 (454058179s/0x0008/FATAL) - Battery needs replacement - SOH Bad

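The "Battery needs replacement" warnings disappeared after the first reboot, but the arp messages keep appearing in the log, at roughly the same intervals as the server crashes: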

May 23 05:00:00 ns228407 kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:03 to 00:07:b4:00:00:01 on ix0
May 23 05:00:02 ns228407 kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:01 to 00:25:90:02:08:fc on ix0
May 23 05:20:00 ns228407 kernel: arp: 176.31.237.251 moved from 00:25:90:02:08:fc to 00:07:b4:00:00:01 on ix0
May 23 05:20:00 ns228407 kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:01 to 00:07:b4:00:00:03 on ix0
May 23 05:32:44 ns228407 kernel: arp: 176.31.237.254 moved from 00:07:b4:00:00:03 to 00:07:b4:00:00:01 on ix0
May 23 05:40:01 ns228407 kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:03 to 00:25:90:02:08:fc on ix0
May 23 05:40:01 ns228407 kernel: arp: 176.31.237.251 moved from 00:25:90:02:08:fc to 00:07:b4:00:00:01 on ix0
May 23 05:40:01 ns228407 kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:01 to 00:07:b4:00:00:03 on ix0
May 23 05:52:40 ns228407 kernel: arp: 176.31.237.254 moved from 00:07:b4:00:00:01 to 00:07:b4:00:00:03 on ix0
May 23 06:00:00 ns228407 kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:03 to 00:25:90:02:08:fc on ix0
May 23 06:00:00 ns228407 kernel: arp: 176.31.237.251 moved from 00:25:90:02:08:fc to 00:07:b4:00:00:01 on ix0
May 23 06:00:00 ns228407 kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:01 to 00:07:b4:00:00:03 on ix0
May 23 06:00:02 ns228407 kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:03 to 00:25:90:02:08:fc on ix0
May 23 06:20:01 ns228407 kernel: arp: 176.31.237.251 moved from 00:25:90:02:08:fc to 00:07:b4:00:00:03 on ix0
May 23 06:20:01 ns228407 kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:03 to 00:07:b4:00:00:01 on ix0
May 23 06:30:02 ns228407 kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:01 to 00:25:90:02:08:fc on ix0
May 23 06:32:36 ns228407 kernel: arp: 176.31.237.254 moved from 00:07:b4:00:00:03 to 00:07:b4:00:00:01 on ix0
May 23 06:50:01 ns228407 kernel: arp: 176.31.237.251 moved from 00:25:90:02:08:fc to 00:07:b4:00:00:01 on ix0
May 23 06:50:01 ns228407 kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:01 to 00:07:b4:00:00:03 on ix0
May 23 07:00:02 ns228407 kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:03 to 00:25:90:02:08:fc on ix0
May 23 07:12:28 ns228407 kernel: arp: 176.31.237.254 moved from 00:07:b4:00:00:01 to 00:07:b4:00:00:03 on ix0
May 23 07:20:00 ns228407 kernel: arp: 176.31.237.251 moved from 00:25:90:02:08:fc to 00:07:b4:00:00:01 on ix0
May 23 07:20:00 ns228407 kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:01 to 00:07:b4:00:00:03 on ix0 

What should I do next to find and fix the problem?

The last thing you should do right now is increase MaxClients.

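It's difficult to say. The slowdown and the MaxClients warning suggest that there is more demand on your server than it can cope with. Unless you are running a lot of AJAX/COMET stuff on the server, you really should reduce the keepalive timeout (say, to 2 initially).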
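To make the keepalive point concrete: under prefork, a connection holds a worker for the time spent serving requests plus any idle keepalive wait. The sketch below applies Little's law to estimate how many workers are held at once. The function name and the load figures (50 connections per second, 0.2 s of work each) are hypothetical and not measured from this server, and the estimate assumes the worst case in which every connection idles for the full timeout.

```python
def busy_workers(conn_per_sec: float, service_sec: float, keepalive_sec: float) -> float:
    """Little's law estimate of prefork workers held at the same time.

    Assumes every connection sits idle for the full keepalive timeout
    after its last request, which is the worst case.
    """
    return conn_per_sec * (service_sec + keepalive_sec)


if __name__ == "__main__":
    # Hypothetical load, for illustration only.
    for timeout in (5, 2, 0):
        estimate = busy_workers(50, 0.2, timeout)
        print(f"KeepAliveTimeout {timeout}: about {estimate:.0f} workers held")
```

With these made-up numbers, the idle wait dominates the service time, which is why lowering the timeout frees far more workers than raising MaxClients adds.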

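"Battery needs replacement" is not just a reminder to do some maintenance. On a BBWC it means the controller is no longer attempting to cache writes, and if your system is set up correctly, neither your operating system nor your disks will be caching writes either.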

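Both suggest that your system's performance should be very poor, yet the first thing you report is that it appears to be unavailable; you don't actually mention performance. Knowing how to measure performance and capture data should be your first priority.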
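As one possible starting point for capturing data, here is a minimal sketch that appends timestamped load averages and TCP connect latency to a CSV file. The file name, host, port, and sampling interval are all assumptions chosen for illustration; `os.getloadavg()` is available on Unix-like systems such as FreeBSD.

```python
import csv
import os
import socket
import time

FIELDS = ["time", "load1", "load5", "load15", "connect_sec"]


def connect_latency(host: str, port: int, timeout: float = 5.0):
    """Return the seconds taken to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None


def sample(host: str, port: int) -> dict:
    """Collect one timestamped sample: load averages plus connect latency."""
    load1, load5, load15 = os.getloadavg()
    return {
        "time": time.strftime("%Y-%m-%d %H:%M:%S"),
        "load1": load1,
        "load5": load5,
        "load15": load15,
        "connect_sec": connect_latency(host, port),
    }


def record(path: str, host: str, port: int, count: int, interval: float) -> None:
    """Append `count` samples to a CSV file, one every `interval` seconds."""
    new_file = not os.path.exists(path)
    with open(path, "a", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        for i in range(count):
            writer.writerow(sample(host, port))
            fh.flush()  # keep data on disk even if the run is interrupted
            if i < count - 1:
                time.sleep(interval)


if __name__ == "__main__":
    # Hypothetical target and file name; adjust to the service being watched.
    record("perf_samples.csv", "127.0.0.1", 443, count=60, interval=60)
```

Running the same sampler from a second machine as well would help separate "the server is slow" from "the server is unreachable", since a local check can keep succeeding while the network path is down.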

I'm not sure why the addresses keep moving (I presume these are local interfaces); this may be a consequence of load elsewhere.
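To see how often each address moves and between which MACs, the kernel's "arp: ... moved from ... to ..." lines can be summarized. The sketch below is one way to do that; the function name is hypothetical, and the regular expression assumes the exact message format shown in the question's log excerpt.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:03 to 00:07:b4:00:00:01 on ix0
ARP_MOVE = re.compile(
    r"arp: (?P<ip>\d+\.\d+\.\d+\.\d+) moved from "
    r"(?P<old>[0-9a-f:]{17}) to (?P<new>[0-9a-f:]{17}) on (?P<iface>\S+)"
)


def summarize_arp_moves(lines):
    """Map each IP to its move count and the set of MACs it was seen at."""
    summary = defaultdict(lambda: {"moves": 0, "macs": set()})
    for line in lines:
        match = ARP_MOVE.search(line)
        if match is None:
            continue  # not an "arp: ... moved" line
        entry = summary[match["ip"]]
        entry["moves"] += 1
        entry["macs"].update((match["old"], match["new"]))
    return dict(summary)


if __name__ == "__main__":
    # Lines copied from the log excerpt in the question.
    log_lines = [
        "May 23 05:00:00 ns228407 kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:03 to 00:07:b4:00:00:01 on ix0",
        "May 23 05:00:02 ns228407 kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:01 to 00:25:90:02:08:fc on ix0",
        "May 23 05:32:44 ns228407 kernel: arp: 176.31.237.254 moved from 00:07:b4:00:00:03 to 00:07:b4:00:00:01 on ix0",
    ]
    for ip, info in sorted(summarize_arp_moves(log_lines).items()):
        print(ip, info["moves"], sorted(info["macs"]))
```

Feeding it the whole of /var/log/messages would show whether the moves line up with the outage windows recorded by other means.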

This is one sick puppy; you are going to have to start fixing one thing at a time until you have a clearer picture of what is wrong.

Start by replacing the battery, tuning the Apache installation, and recording performance metrics.

Source: https://serverfault.com/questions/598021