Icinga2

Add a service check to agents in an Icinga Master-Satellite-Agent infrastructure

  • November 11, 2020

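I have the following environment:

  • Master
  • Some satellites assigned to the master
  • Many agents assigned to satellites, and some agents assigned directly to the master (without a satellite).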

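All systems are ready and the PKI setup is done. Most of the default checks (apt, disk, cpu) are running as well, and I can see their current state on the master. Now I have started to implement custom checks (such as check_eth to monitor network traffic). I have deployed the script to all hosts and defined the command on all hosts: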

object CheckCommand "check_eth" {
  import "plugin-check-command"
  command = [ "/usr/bin/sudo", PluginDir + "/check_eth" ]

  arguments = {
    "-w" = {
      value = "$eth_warning$"
      description = "Percent free/used when to warn"
      required = true
    }
    "-c" = {
      value = "$eth_critical$"
      description = "Percent free/used when critical"
      required = true
    }
    "-i" = {
      value = "$eth_interface$"
      description = "Given network interface"
      required = true
    }
  }

  vars.eth_interface = "enp0s31f6"
  vars.eth_warning = "2048G"
  vars.eth_critical = "4096G"
}

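I can run the script on all hosts. On the master, I can see the check results for the satellites and for all hosts assigned directly to the master. On all hosts with parent=satellite, the state is UNKNOWN. That is my problem... why?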

The zone and endpoint objects on the master look like this:

# master: /etc/icinga2/zones.conf

object Endpoint "monitor.domain" {
}

object Zone "master" {
 endpoints = [ "monitor.domain" ]
}

object Endpoint "satellite1.domain" {
   host = "<ip>"
   port = "<port>"
}

object Zone "satellite1.domain" {
   parent = "master"
   endpoints = [ "satellite1.domain" ]
}

The satellite configuration looks like this:

# master: /etc/icinga2/zones.d/satellite1.domain/hosts.conf

object Host "satellite1.domain" {
   import "generic-host"
   check_command = "hostalive"
   zone = "master"

   address = "<ipv4>"
   address6 = "<ipv6>"
   
   vars.agent_endpoint = name
   ...
}

object Host "agent1.domain" {
   import "generic-host"
   check_command = "hostalive"
   zone = "satellite1.domain"

   address = "<ipv4>"
   address6 = "<ipv6>"
   
   vars.agent_endpoint = name
   ...
}
...

The zones, including the endpoints inside the satellite zone, are also defined on the master:

# master: /etc/icinga2/zones.d/satellite1.domain/zones.conf
object Zone "agent1.domain" {
   parent = "satellite1.domain"
   endpoints = [ "agent1.domain" ]
}

object Endpoint "agent1.domain" {
   host = "<ip>"
   port = "<port>"
}

Now the command is applied to the hosts (also defined on the master):

# master: /etc/icinga2/zones.d/satellite1.domain/services.conf

apply Service "Network Traffic" {
 import "generic-service"

 check_command = "check_eth"
 command_endpoint = host_name

 assign where host.name == "satellite1.domain"
}

apply Service "Network Traffic" {
 import "generic-service"

 check_command = "check_eth"
 command_endpoint = host_name

 assign where host.name == "agent1.domain"
}

What am I missing?

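Ah, now I have found the problem. The check command definition contains a default value for eth_interface, and that interface exists on the satellite and the master. The virtual machines, however, have a different interface, so the check fails there. After I removed the default variables from the check command and assigned the variable on each host object, everything worked.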
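
A minimal sketch of that fix, assuming the agent's interface is called ens18 (a hypothetical name; use whatever the host actually has). The interface is set on the host object instead of in the CheckCommand. As far as I know, Icinga 2 resolves a macro such as $eth_interface$ from the service's variables first, then the host's, and only then from the command's own vars, so a host-level value should also win over a default left in the CheckCommand:

# master: /etc/icinga2/zones.d/satellite1.domain/hosts.conf

object Host "agent1.domain" {
   import "generic-host"
   check_command = "hostalive"
   zone = "satellite1.domain"

   address = "<ipv4>"
   address6 = "<ipv6>"

   vars.agent_endpoint = name
   # hypothetical interface name for this host, replaces the CheckCommand default
   vars.eth_interface = "ens18"
}

Since "-i" is marked as required in the CheckCommand, every host that gets the "Network Traffic" service needs its own vars.eth_interface once the default is removed.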

Quoted from: https://serverfault.com/questions/1041847