CentOS 7

munin-node was working fine, now "start operation timed out"

  • May 19, 2021

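I have had munin-node running successfully on my machine for a while, but recently it stopped starting. There is no munin-node log for me to check, and systemctl status munin-node does not give much useful information either: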

[root@host /]# systemctl status munin-node
● munin-node.service - Munin Node
  Loaded: loaded (/usr/lib/systemd/system/munin-node.service; enabled; vendor preset: disabled)
  Active: failed (Result: timeout) since Tue 2021-05-18 23:35:16 CEST; 1h 8min ago
    Docs: man:munin-node(1)
          http://guide.munin-monitoring.org/en/latest/node/index.html
 Process: 7710 ExecStart=/usr/sbin/munin-node --foreground (code=exited, status=0/SUCCESS)
Main PID: 7710 (code=exited, status=0/SUCCESS)

May 18 23:33:44 host systemd[1]: Starting Munin Node...
May 18 23:35:14 host systemd[1]: munin-node.service start operation timed out. Terminating.
May 18 23:35:16 host systemd[1]: Failed to start Munin Node.
May 18 23:35:16 host systemd[1]: Unit munin-node.service entered failed state.
May 18 23:35:16 host systemd[1]: munin-node.service failed.

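The problem turned out to be that the plugins were taking too long, in particular the nvidia_gpu_* plugins, since this is a multi-GPU machine. Nothing in the output explicitly indicated that plugins were causing the timeout.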
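Since nothing in the systemd output points at a particular plugin, one way to find a slow one is to time each plugin individually. The following is only a sketch, not part of the original answer: it assumes the plugins are linked in /etc/munin/plugins and that munin-run is on the PATH; the function name time_plugins and the PLUGIN_DIR and RUNNER variables are made up for illustration.

```shell
# Sketch: time every munin plugin and list the slowest first.
# Assumptions (not from the original post): plugins live in
# /etc/munin/plugins and munin-run is available. PLUGIN_DIR, RUNNER
# and time_plugins are hypothetical names.
PLUGIN_DIR="${PLUGIN_DIR:-/etc/munin/plugins}"
RUNNER="${RUNNER:-munin-run}"

time_plugins() {
    for plugin in "$PLUGIN_DIR"/*; do
        # Skip the unexpanded glob when the directory is empty or missing.
        [ -e "$plugin" ] || continue
        name=$(basename "$plugin")
        start=$(date +%s)
        # Run the plugin once; its output is not needed, only the duration.
        "$RUNNER" "$name" >/dev/null 2>&1
        end=$(date +%s)
        printf '%s %d\n' "$name" "$((end - start))"
    done | sort -k2,2nr   # sort by elapsed whole seconds, slowest first
}
```

Running time_plugins as root prints one line per plugin with its run time in whole seconds; on a machine like the one described here, the nvidia_gpu_* entries would be expected near the top.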

To speed up the nvidia_gpu_* plugins, I used the following command, based on https://forums.developer.nvidia.com/t/nvidia-smi-is-slow-on-ubuntu-16-04/50416:

nvidia-smi --persistence-mode=1

Simply running nvidia-smi is enough to test the effect: because the GPUs no longer have to be woken up first, it loads noticeably faster.
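To put a number on the difference, the command can be timed before and after enabling persistence mode. This is a minimal sketch, assuming GNU date (as shipped with CentOS 7) for sub-second timestamps; the helper name elapsed is hypothetical.

```shell
# Sketch: print the wall-clock seconds a command takes.
# Assumes GNU date, which supports %N (nanoseconds).
# "elapsed" is a hypothetical helper name.
elapsed() {
    start=$(date +%s.%N)
    # Run the given command; discard its output, keep only the timing.
    "$@" >/dev/null 2>&1
    end=$(date +%s.%N)
    awk -v s="$start" -v e="$end" 'BEGIN { printf "%.2f\n", e - s }'
}
```

For example, running `elapsed nvidia-smi` once before and once after the persistence-mode change should show whether the start-up delay has gone away.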

Source: https://serverfault.com/questions/1064123