Icinga2
Adding service checks to agents in an Icinga master-satellite-agent infrastructure
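I have the following environment:
- a master
- some satellites assigned to the master
- many agents assigned to the satellites, and some agents assigned directly to the master (no satellite).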
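All systems are up and running, and the PKI setup is complete. Most of the default checks (apt, disk, cpu) are also running, and I can see their current state on the master. Now I have started implementing custom checks (e.g. check_eth to monitor network traffic). I have deployed the script to all hosts and defined the command on all hosts: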
```
object CheckCommand "check_eth" {
  import "plugin-check-command"

  command = [ "/usr/bin/sudo", PluginDir + "/check_eth" ]

  arguments = {
    "-w" = {
      value = "$eth_warning$"
      description = "Percent free/used when to warn"
      required = true
    }
    "-c" = {
      value = "$eth_critical$"
      description = "Percent free/used when critical"
      required = true
    }
    "-i" = {
      value = "$eth_interface$"
      description = "Given network interface"
      required = true
    }
  }

  vars.eth_interface = "enp0s31f6"
  vars.eth_warning = "2048G"
  vars.eth_critical = "4096G"
}
```
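I can run the script on every host. On the master I can see the check results for the satellites and for all hosts assigned directly to the master. On every host with parent = satellite, however, the state is UNKNOWN. That is my problem... why?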
The zone and endpoint objects on the master look like this:
```
# master: /etc/icinga2/zones.conf
object Endpoint "monitor.domain" {
}

object Zone "master" {
  endpoints = [ "monitor.domain" ]
}

object Endpoint "satellite1.domain" {
  host = "<ip>"
  port = "<port>"
}

object Zone "satellite1.domain" {
  parent = "master"
  endpoints = [ "satellite1.domain" ]
}
```
The satellite configuration looks like this:
```
# master: /etc/icinga2/zones.d/satellite1.domain/hosts.conf
object Host "satellite1.domain" {
  import "generic-host"
  check_command = "hostalive"
  zone = "master"
  address = "<ipv4>"
  address6 = "<ipv6>"
  vars.agent_endpoint = name
  ...
}

object Host "agent1.domain" {
  import "generic-host"
  check_command = "hostalive"
  zone = "satellite1.domain"
  address = "<ipv4>"
  address6 = "<ipv6>"
  vars.agent_endpoint = name
  ...
}
...
```
The zones and endpoints of the agents inside the satellite zone are also defined on the master:
```
# master: /etc/icinga2/zones.d/satellite1.domain/zones.conf
object Zone "agent1.domain" {
  parent = "satellite1.domain"
  endpoints = [ "agent1.domain" ]
}

object Endpoint "agent1.domain" {
  host = "<ip>"
  port = "<port>"
}
```
Now the command is applied to the hosts (also defined on the master):
```
# master: /etc/icinga2/zones.d/satellite1.domain/services.conf
apply Service "Network Traffic" {
  import "generic-service"
  check_command = "check_eth"
  command_endpoint = host_name
  assign where host.name == "satellite1.domain"
}

apply Service "Network Traffic" {
  import "generic-service"
  check_command = "check_eth"
  command_endpoint = host_name
  assign where host.name == "agent1.domain"
}
```
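As a side note, the two apply rules above differ only in their assign condition. A sketch of a single equivalent rule, assuming (as the original rules already do via command_endpoint = host_name) that each matched host has an Endpoint object with the same name, could use the Icinga 2 DSL "in" operator on an array of host names:

```
# master: /etc/icinga2/zones.d/satellite1.domain/services.conf (sketch)
# One rule for both hosts; the host list is taken from the two rules above.
apply Service "Network Traffic" {
  import "generic-service"
  check_command = "check_eth"
  # Run the check on the endpoint named like the host (agent or satellite).
  command_endpoint = host_name
  assign where host.name in [ "satellite1.domain", "agent1.domain" ]
}
```

This is only a tidy-up; it does not change which interface the check queries.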
What am I missing?
Ah, now I've found the problem.
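The check command definition contains a default value for eth_interface that matches the interface on the satellites and the master, but the virtual machines have a different interface. If I remove that default variable from the check command and assign the variable to each host object instead, everything works fine.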
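A minimal sketch of that fix, based on the objects from the question. The interface name for agent1.domain ("eth0") is a placeholder for whatever interface the VM really has, and the remaining host attributes from the question are left out. Keeping the thresholds as command-level defaults is an assumption; only eth_interface moves to the hosts. Icinga 2 resolves a macro such as $eth_interface$ from the service first, then the host, then the command, so a host value would also override a command default; removing the default additionally means a host without vars.eth_interface should surface as a missing required macro for "-i" instead of silently checking the wrong interface.

```
# Sketch: no default for eth_interface, so every host must provide it.
object CheckCommand "check_eth" {
  import "plugin-check-command"

  command = [ "/usr/bin/sudo", PluginDir + "/check_eth" ]

  arguments = {
    "-w" = {
      value = "$eth_warning$"
      description = "Percent free/used when to warn"
      required = true
    }
    "-c" = {
      value = "$eth_critical$"
      description = "Percent free/used when critical"
      required = true
    }
    "-i" = {
      value = "$eth_interface$"
      description = "Given network interface"
      required = true
    }
  }

  # Thresholds kept as defaults (assumption); eth_interface removed.
  vars.eth_warning = "2048G"
  vars.eth_critical = "4096G"
}

object Host "satellite1.domain" {
  import "generic-host"
  check_command = "hostalive"
  zone = "master"
  address = "<ipv4>"
  address6 = "<ipv6>"
  vars.agent_endpoint = name
  # Interface that previously came from the command default.
  vars.eth_interface = "enp0s31f6"
}

object Host "agent1.domain" {
  import "generic-host"
  check_command = "hostalive"
  zone = "satellite1.domain"
  address = "<ipv4>"
  address6 = "<ipv6>"
  vars.agent_endpoint = name
  # Placeholder: set to the VM's actual interface name.
  vars.eth_interface = "eth0"
}
```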