CentOS 7
munin-node was working fine, now "start operation timed out"
I have had munin-node running successfully on my machine for a while, but recently it stopped starting. There is no munin-node log for me to check, and systemctl status munin-node does not give much useful information either:

    [root@host /]# systemctl status munin-node
    ● munin-node.service - Munin Node
       Loaded: loaded (/usr/lib/systemd/system/munin-node.service; enabled; vendor preset: disabled)
       Active: failed (Result: timeout) since Tue 2021-05-18 23:35:16 CEST; 1h 8min ago
         Docs: man:munin-node(1)
               http://guide.munin-monitoring.org/en/latest/node/index.html
      Process: 7710 ExecStart=/usr/sbin/munin-node --foreground (code=exited, status=0/SUCCESS)
     Main PID: 7710 (code=exited, status=0/SUCCESS)

    May 18 23:33:44 host systemd[1]: Starting Munin Node...
    May 18 23:35:14 host systemd[1]: munin-node.service start operation timed out. Terminating.
    May 18 23:35:16 host systemd[1]: Failed to start Munin Node.
    May 18 23:35:16 host systemd[1]: Unit munin-node.service entered failed state.
    May 18 23:35:16 host systemd[1]: munin-node.service failed.
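The problem turned out to be that plugins were taking too long, in particular the nvidia_gpu_* plugins, since this is a multi-GPU machine. Nothing clearly indicated that a plugin was causing the timeout. To speed up the nvidia_gpu_* plugins, I used the following command, based on https://forums.developer.nvidia.com/t/nvidia-smi-is-slow-on-ubuntu-16-04/50416:

    nvidia-smi --persistence-mode=1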
To test the effect, just run the nvidia-smi command: because the GPUs no longer have to be woken up first, it loads faster.
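Since nothing in the systemd output points at a slow plugin, one way to look for one is to time each plugin by hand. The following is only a rough sketch, not part of the original fix: `time_cmd` is a hypothetical helper, `/etc/munin/plugins` is assumed to be the plugin directory (the usual default, but check your own setup), and `munin-run` is assumed to be installed alongside munin-node. It probably needs to run as root.

```shell
#!/bin/sh
# time_cmd (hypothetical helper): run a command, discard its output,
# and print how many whole seconds it took.
time_cmd() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

# Assumed plugin directory; override with PLUGIN_DIR=... if yours differs.
PLUGIN_DIR="${PLUGIN_DIR:-/etc/munin/plugins}"

# Time every enabled plugin via munin-run and print one line per plugin.
for plugin in "$PLUGIN_DIR"/*; do
    [ -e "$plugin" ] || continue   # skip if the directory is empty or missing
    name=$(basename "$plugin")
    echo "$name: $(time_cmd munin-run "$name")s"
done
```

The same helper can be pointed at `nvidia-smi` before and after enabling persistence mode (`time_cmd nvidia-smi`) to see whether the change makes a difference on your machine.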