F5-Big-Ip

F5 LTM frequently kills processes with SIGKILL

  • December 31, 2015

We have a BIG-IP 6400 LTM appliance that is killing processes at an alarming rate. CPU utilization stays steady at around 23%, so load is not the problem.

Here is a sample from /var/log/ltm:

Oct  7 08:21:55 local/pri-4600 info bigd[3471]: reap_child: child process PID = 25338 exited with signal = 9
Oct  7 08:22:15 local/pri-4600 info bigd[3471]: reap_child: child process PID = 25587 exited with signal = 9
Oct  7 08:22:34 local/pri-4600 info bigd[3471]: reap_child: child process PID = 25793 exited with signal = 9
Oct  7 08:23:10 local/pri-4600 info bigd[3471]: reap_child: child process PID = 26260 exited with signal = 9
Oct  7 08:23:36 local/pri-4600 info bigd[3471]: reap_child: child process PID = 26584 exited with signal = 9
Oct  7 08:23:40 local/pri-4600 info bigd[3471]: reap_child: child process PID = 26647 exited with signal = 9
Oct  7 08:23:45 local/pri-4600 info bigd[3471]: reap_child: child process PID = 26699 exited with signal = 9
Oct  7 08:23:55 local/pri-4600 info bigd[3471]: reap_child: child process PID = 26805 exited with signal = 9
Oct  7 08:25:36 local/pri-4600 info bigd[3471]: reap_child: child process PID = 28079 exited with signal = 9
Oct  7 08:27:15 local/pri-4600 info bigd[3471]: reap_child: child process PID = 29286 exited with signal = 9
Oct  7 08:27:16 local/pri-4600 info bigd[3471]: reap_child: child process PID = 29307 exited with signal = 9
Oct  7 08:27:56 local/pri-4600 info bigd[3471]: reap_child: child process PID = 29793 exited with signal = 9
Oct  7 08:29:20 local/pri-4600 info bigd[3471]: reap_child: child process PID = 30851 exited with signal = 9
Oct  7 08:33:00 local/pri-4600 info bigd[3471]: reap_child: child process PID = 1122 exited with signal = 9
Oct  7 08:33:16 local/pri-4600 info bigd[3471]: reap_child: child process PID = 1299 exited with signal = 9
Oct  7 08:34:15 local/pri-4600 info bigd[3471]: reap_child: child process PID = 2054 exited with signal = 9
Oct  7 08:35:16 local/pri-4600 info bigd[3471]: reap_child: child process PID = 2784 exited with signal = 9
Oct  7 08:35:16 local/pri-4600 info bigd[3471]: reap_child: child process PID = 2807 exited with signal = 9
Oct  7 08:35:35 local/pri-4600 info bigd[3471]: reap_child: child process PID = 3015 exited with signal = 9
Oct  7 08:36:15 local/pri-4600 info bigd[3471]: reap_child: child process PID = 3601 exited with signal = 9
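To gauge how often this happens, the reap_child lines can be tallied by signal and the spacing between SIGKILL reaps measured. The following is a minimal Python sketch, assuming the exact log format shown above; summarize and REAP_RE are hypothetical names for illustration, not part of any F5 tooling. Because syslog timestamps omit the year, a year is assumed.

```python
import re
from collections import Counter
from datetime import datetime

# Matches bigd reap_child lines in the format shown above (assumed format), e.g.
# "Oct  7 08:21:55 local/pri-4600 info bigd[3471]: reap_child: child process PID = 25338 exited with signal = 9"
REAP_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+\S+\s+"
    r"bigd\[\d+\]: reap_child: child process PID = (?P<pid>\d+) "
    r"exited with signal = (?P<sig>\d+)"
)


def summarize(lines, year=2015):
    """Return (count of reaps per signal, gaps in seconds between SIGKILL reaps)."""
    by_signal = Counter()
    kill_times = []
    for line in lines:
        m = REAP_RE.match(line)
        if not m:
            continue  # ignore anything that is not a reap_child line
        sig = int(m.group("sig"))
        by_signal[sig] += 1
        if sig == 9:  # SIGKILL
            # syslog omits the year, so the caller-supplied year is prepended
            ts = datetime.strptime(f"{year} {m.group('ts')}", "%Y %b %d %H:%M:%S")
            kill_times.append(ts)
    gaps = [(b - a).total_seconds() for a, b in zip(kill_times, kill_times[1:])]
    return by_signal, gaps


# Usage (hypothetical):
# with open("/var/log/ltm") as f:
#     counts, gaps = summarize(f)
```

Run over the 20-line excerpt above, this would report 20 reaps with signal 9; a steady stream of short gaps points to monitors that routinely overrun rather than to an occasional one-off.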

Is this normal? If not, what is causing it?

This turned out to be a known bug in BIG-IP 10.2.4, the software version we are running.

From F5 Support:

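…you are hitting a known issue that is tracked internally: Bug ID 539130, "bigd may deadlock while handling SIGCHLD, causing a bigd heartbeat failure and SIGABRT". -=Condition=- An external monitor that runs for a long time and is killed by the next iteration of the monitor can cause bigd to crash and dump core, which temporarily disables health monitoring.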

The fix is to install Hotfix-BIGIP-10.2.4-HF12-866.11-ENG.
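The condition F5 describes, an external monitor that overruns and is killed before the next iteration, matches ordinary POSIX parent/child handling: the supervisor sends SIGKILL to the overrunning child, and when it later reaps that child with waitpid() the exit status reports termination by signal 9, which is what each reap_child line records. The sketch below reproduces that pattern in Python on a POSIX system only; it is not bigd's implementation, and run_and_kill is a hypothetical name.

```python
import os
import signal
import time


def run_and_kill(timeout=0.2):
    """Fork a child that hangs, SIGKILL it after `timeout` seconds, then reap it.

    Returns (pid, terminating signal number or None).
    """
    pid = os.fork()
    if pid == 0:
        # Child: stand-in for a long-running external monitor script.
        time.sleep(60)
        os._exit(0)
    # Parent: give the "monitor" a deadline, then kill it unconditionally.
    time.sleep(timeout)
    os.kill(pid, signal.SIGKILL)
    # Reap the child, loosely analogous to what a reap_child handler does.
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return pid, os.WTERMSIG(status)
    return pid, None


if __name__ == "__main__":
    pid, sig = run_and_kill()
    print(f"reap_child: child process PID = {pid} exited with signal = {sig}")
```

The bug itself lies in how bigd handles SIGCHLD while doing this, which user code cannot reproduce; the sketch only shows why every killed monitor leaves a "signal = 9" entry in the log.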

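bigd is the health-monitoring daemon on the BIG-IP, so this suggests that the monitors in use are crashing. You should open a case with F5 Support and upload your qkview to ihealth.f5.com. Here is the solution article for that error message: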

https://support.f5.com/kb/en-us/solutions/public/17000/000/sol17092.html

Source: https://serverfault.com/questions/727343