Configure Nagios notification settings to be very frequent

March 6, 2015

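I have set up a Proxmox VE cluster with three nodes. Each node runs a number of virtual machines. I am using the PVE Monitor plugin to set up the hosts and services, and it works well.
My problem is that Nagios behaves a bit strangely when sending emails. Ideally, I would like each node, and all the services running on each node, to be checked every minute.
My configuration file looks like this: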
# Define the cluster itself as a host
# the command check_pve_cluster_nodes gives us info
# on the member's cluster state
define host {
    host_name pve-cluster
    max_check_attempts 10
    check_command check_pve_cluster_nodes
    check_interval 1
    contact_groups admins
    notifications_enabled 1
}

# define openvz, qemu and storages as services of the cluster
define service {
    use generic-service
    host_name pve-cluster
    service_description OpenVZ VMs
    check_command check_pve_cluster_openvz
    check_interval 1
    contact_groups admins
    notifications_enabled 1
}

define service {
    use generic-service
    host_name pve-cluster
    service_description Qemu VMs
    check_command check_pve_cluster_qemu
    check_interval 1
    contact_groups admins
    notifications_enabled 1
}

define service {
    use generic-service
    host_name pve-cluster
    service_description Storages
    check_command check_pve_cluster_storage
    check_interval 1
    contact_groups admins
    notifications_enabled 1
}
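I have not changed the time unit setting, so checks should run every minute. The Nagios web UI shows the host as down, but the email notification is only sent several minutes later. In addition, the email body lacks the most important information: which node or service is in the critical state.
Node down: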
***** Nagios *****

Notification Type: PROBLEM
Host: pve-cluster
State: DOWN
Address: pve-cluster
Info: NODES CRITICAL  2 / 3 working nodes

Date/Time: Fri Mar 6 10:48:25 CET 2015
VM down:
***** Nagios *****

Notification Type: PROBLEM

Service: Qemu VMs
Host: pve-cluster
Address: pve-cluster
State: CRITICAL

Date/Time: Fri Mar 6 10:40:44 CET 2015

Additional Info:

QEMU CRITICAL 2 / 3 working VMs
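How do I set up the configuration so that the hosts and services (i.e. the VMs) are checked at one-minute intervals? Ideally, a repeat notification for that state should then be sent every 15 minutes.
Is this even the best workflow? Or is there a better way to schedule notifications and acknowledge them?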

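Nagios only sends an email once a host or service has entered a "hard" state. To answer your question at a basic level: a hard state is reached once the host or service has been checked the number of times specified by max_check_attempts. By default, this is 4. Note that your host definition sets max_check_attempts to 10, so the host needs ten consecutive failed checks before the first email goes out, which would account for a delay of several minutes.
Information on soft/hard states: http://nagios.sourceforge.net/docs/3_0/statetypes.html
Information on max_check_attempts: http://nagios.sourceforge.net/docs/3_0/objectdefinitions.html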
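As a rough, untested sketch of how these directives could fit together for one of your services: the values below are assumptions to adapt, not recommendations, and anything not listed is assumed to be inherited from generic-service. With the default time unit of one minute, this would check every minute, notify after the third consecutive failed check, and repeat the email every 15 minutes while the problem persists.

```
define service {
    use                    generic-service
    host_name              pve-cluster
    service_description    Qemu VMs
    check_command          check_pve_cluster_qemu
    check_interval         1     ; minutes between checks while the state is stable
    retry_interval         1     ; minutes between re-checks while the state is still soft
    max_check_attempts     3     ; assumed value: hard state (first email) on the 3rd failed check
    notification_interval  15    ; minutes between repeat emails while the problem persists
    contact_groups         admins
    notifications_enabled  1
}
```

The same four directives (check_interval, retry_interval, max_check_attempts, notification_interval) are also valid in a host definition. As for acknowledging: once a problem is acknowledged in the web UI, Nagios stops sending the repeat notifications for it until the state changes again, so the 15-minute reminders only continue while nobody has taken ownership of the problem.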
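It looks like the plugin definitely intends to return the details, but for whatever reason it does not. Unfortunately, I do not have an environment to test this in, so I may have to leave you hanging on this part of the question.
The relevant parts of the Perl: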
print "NODES $rstatus{$statusScore}  $workingNodes / " .
         scalar(@monitoredNodes) . " working nodes" . $br . $reportSummary;
print "STORAGE $rstatus{$statusScore} $workingStorages / " .
         scalar(@monitoredStorages) . " working storages" . $br . $reportSummary;
print "OPENVZ $rstatus{$statusScore} $workingVms / " .
         scalar(@monitoredOpenvz) . " working VMs" . $br . $reportSummary;
print "QEMU $rstatus{$statusScore} $workingVms / " .
         scalar(@monitoredQemus) . " working VMs" . $br .
         $reportSummary;
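$reportSummary is populated with the details in the problem-handling sections higher up in the code, but it does not appear to be returned for you.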
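One possible explanation, which I have not verified against this plugin: Nagios splits plugin output into the first line ($SERVICEOUTPUT$ or $HOSTOUTPUT$) and any further lines ($LONGSERVICEOUTPUT$ or $LONGHOSTOUTPUT$). If $br is a newline, $reportSummary would end up in the long output, and a notification command that only references $SERVICEOUTPUT$ would drop it. A sketch of a service notification command that includes both follows. The command name is hypothetical, the layout mirrors the email shown in the question, and the printf and mail paths are assumptions about your system.

```
# Hypothetical command name; paths to printf and mail are assumptions.
# The only substantive addition is $LONGSERVICEOUTPUT$ after $SERVICEOUTPUT$.
define command {
    command_name    notify-service-by-email-long
    command_line    /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$\n$LONGSERVICEOUTPUT$\n" | /usr/bin/mail -s "** $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
}
```

For this to take effect, the contacts in the admins group would need to reference the command in service_notification_commands. The host email could be extended the same way by appending $LONGHOSTOUTPUT$ after $HOSTOUTPUT$ in the host notification command.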

Source: https://serverfault.com/questions/673456
