Debian

Since a Debian upgrade, a check_nrpe command no longer works on the Nagios server

  • September 5, 2020

Yesterday I upgraded a server from Debian 9 to Debian 10. This server is monitored by Nagios. Since the upgrade, I have been getting an alert with status UNKNOWN:

Volumegroup array03-0 wasn't valid or wasn't specified with "-v Volumegroup", bye. false

The service is "VG array03-0 usage", and its command is check_nrpe!check_vgs_array03-0. Its purpose is to raise an alert when the storage on the array is almost full.

The check_nrpe command is the standard one:

# 'check_NRPE' command definition
define command{
       command_name check_nrpe
       command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
       }

If I remember correctly, this means there is a check_vgs_array03-0 command in /etc/nagios/nrpe.cfg on the monitored server. Let's have a look; here it is:

command[check_vgs_array03-0]=/usr/lib/nagios/plugins/check_vg_size -w 20 -c 10 -v array03-0

If I simply type this command on the monitored server, I get no error and it works:

VG array03-0 OK Available space is 805 GB;| array03-0=805GB;20;10;0;19155

If I pass the name of a volume group that does not exist, for example, I do get the error message.

The check_vg_size plugin script is this:

#!/bin/bash
#check_vg_size
#set -x
# Plugin for Nagios
# Written by M. Koettenstorfer (mko@lihas.de)
# Some additions by J. Schoepfer (jsc@lihas.de)
# Major changes into functions and input/output values J. Veverka (veverka.kuba@gmail.com)
# Last Modified: 2012-11-06
#
# Description:
#
# This plugin will check howmany space in volume groups is free

# Nagios return codes
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3
STATE_DEPENDENT=4

SERVICEOUTPUT=""
SERVICEPERFDATA=""

PROGNAME=$(basename $0)

vgs_bin=`/usr/bin/whereis -b -B /sbin /bin /usr/bin /usr/sbin -f vgs | awk '{ print $2 }'`
_vgs="$vgs_bin --units=g"

bc_bin=`/usr/bin/whereis -b -B /sbin /bin /usr/bin /usr/sbin -f bc | awk '{ print $2 }'`

exitstatus=$STATE_OK #default
declare -a volumeGroups;
novg=0; #number of volume groups
allVG=false; #Will we use all volume groups we can find on system?
inPercent=false; #Use percentage for comparison?

unitsGB="GB"
unitsPercent="%"
units=$unitsGB

########################################################################
### DEFINE FUNCTIONS
########################################################################

print_usage() {
       echo "Usage: $PROGNAME  -w <min size warning level in gb> -c <min size critical level in gb> -v <volumegroupname> [-a] [-p]"
       echo "If '-a' and '-v' are specified: all volumegroups defined by -v will be ommited and the remaining groups which are found on system are checked"
       echo "If '-p' is specified: the warning and critical levels are represented as the percent space left on device"
   echo ""
}

print_help() {
       print_usage
       echo ""
       echo "This plugin will check how much space is free in volume groups"
       echo "usage: "
       exit $STATE_UNKNOWN
}


checkArgValidity () {
# Check arguments for validity
       if [[ -z $critlevel || -z $warnlevel ]] # Did we get warn and crit values?
       then
               echo "You must specify a warning and critical level"
               print_usage
               exitstatus=$STATE_UNKNOWN
               exit $exitstatus
       elif [ $warnlevel -le $critlevel ] # Do the warn/crit values make sense?
       then
       if [ $inPercent != 'true' ]
       then
           echo "CRITICAL value of $critlevel GB is less than WARNING level of $warnlevel GB"
           print_usage
           exitstatus=$STATE_UNKNOWN
           exit $exitstatus
       else
           echo "CRITICAL value of $critlevel % is higher than WARNING level of $warnlevel %"
           print_usage
           exitstatus=$STATE_UNKNOWN
           exit $exitstatus
       fi
       fi
}

#Does volume group actually exist?
volumeGroupExists () {
       local volGroup="$@"
       VGValid=$($_vgs 2>/dev/null | grep "$volGroup" | wc -l )

       if [[  -z "$volGroup" ||  $VGValid = 0 ]]
       then
               echo "Volumegroup $volGroup wasn't valid or wasn't specified"
               echo "with \"-v Volumegroup\", bye."
               echo false
               return 1
       else
               #The volume group exists
               echo true
               return 0
       fi
}

getNumberOfVGOnSystem () {
       local novg=$($_vgs 2>/dev/null | wc -l)
       let novg--
       echo $novg
}

getAllVGOnSystem () {
       novg=$(getNumberOfVGOnSystem)
       local found=false;
       for (( i=0; i < novg; i++)); do
               volumeGroups[$i]=$($_vgs | tail -n  $(($i+1)) | head -n 1 | awk '{print $1}')
               found=true;
       done
       if ( ! $found ); then
               echo "$found"
               echo "No Volumegroup wasn't valid or wasn't found"
               exit $STATE_UNKNOWN
       fi
}

getColumnNoByName () {
       columnName=$1
       result=$($_vgs 2>/dev/null | head -n1 | awk -v name=$columnName '
               BEGIN{}
                       { for(i=1;i<=NF;i++){
                             if ($i ~ name)
                                 {print i } }
                       }')

       echo $result
}

convertToPercent () {
#$1 = xx%
#$2 = 100%
   # Make values numbers only
       local input="$(echo $1 | sed 's/g//i')"
       local max="$(echo $2 | sed 's/g//i')"
       local onePercent='';
       local freePercent='';
       if [ -x "$bc_bin" ] ; then
               onePercent=$( echo "scale=2; $max / 100" | bc );
               freePercent=$( echo "$input / $onePercent" | bc );
       else
               freePercent=$(perl -e "print int((($max-$input)*100/$max))")
       fi
       echo $freePercent;
       return 0;
}

getSizesOfVolume () {
       volumeName="$1";
       #Check the actual sizes
       cnFree=`getColumnNoByName "VFree"`;
       cnSize=`getColumnNoByName "VSize"`;
       freespace=`$_vgs $volumeName 2>/dev/null | awk -v n=$cnFree '/[0-9]/{print $n}' | sed -e 's/[\.,\,].*//'`;
       fullspace=`$_vgs $volumeName 2>/dev/null | awk -v n=$cnSize '/[0-9]/{print $n}' | sed -e 's/[\.,\,].*//'`;

       if ( $inPercent ); then
       #Convert to Percents
               freespace="$(convertToPercent $freespace $fullspace)"
       fi
}

setExitStatus () {
       local status=$1
       local volGroup="$2"
       local formerStatus=$exitstatus

       if [ $status -gt $formerStatus ]
       then
               formerStatus=$status
       fi

       if [ $status = $STATE_UNKNOWN ] ; then
               SERVICEOUTPUT="${volGroup}"
               exitstatus=$STATE_UNKNOWN
               return
       fi

       if [ "$freespace" -le "$critlevel" ]
       then
               SERVICEOUTPUT=$SERVICEOUTPUT" VG $volGroup CRITICAL Available space is $freespace $units;"
               exitstatus=$STATE_CRITICAL
       elif [ "$freespace" -le "$warnlevel" ]
       then
               SERVICEOUTPUT=$SERVICEOUTPUT"VG $volGroup WARNING Available space is $freespace $units;"
               exitstatus=$STATE_WARNING
       else
               SERVICEOUTPUT=$SERVICEOUTPUT"VG $volGroup OK Available space is $freespace $units;"
               exitstatus=$STATE_OK
       fi

       SERVICEPERFDATA="$SERVICEPERFDATA $volGroup=$freespace$units;$warnlevel;$critlevel"
       if [ $inPercent != 'true' ] ; then

               SERVICEPERFDATA="${SERVICEPERFDATA};0;$fullspace"
       fi

       if [ $formerStatus -gt $exitstatus ]
       then
               exitstatus=$formerStatus
       fi
}


checkVolumeGroups () {
checkArgValidity
       for (( i=0; i < novg; i++ )); do
               local status="$STATE_OK"
               local currentVG="${volumeGroups[$i]}"

               local groupExists="$(volumeGroupExists "$currentVG" )"

               if [ "$groupExists" = 'true' ]; then
                       getSizesOfVolume "$currentVG"
                       status=$STATE_OK
               else
                       status=$STATE_UNKNOWN
                       setExitStatus $status "${groupExists}"
                       break
               fi

               setExitStatus $status "$currentVG"
       done
}

########################################################################
### RUN PROGRAM
########################################################################


########################################################################
#Read input values
while getopts ":w:c:v:h:ap" opt ;do
       case $opt in
               h)
                       print_help;
                       exit $exitstatus;
                       ;;
               w)
                       warnlevel=$OPTARG;
                       ;;
               c)
                       critlevel=$OPTARG;
                       ;;
               v)
                       if ( ! $allVG ); then
                               volumeGroups[$novg]=$OPTARG;
                               let novg++;
                       fi
                       ;;
               a)
                       allVG=true;
                       getAllVGOnSystem;
                       ;;
               p)
                       inPercent=true;
                       units=$unitsPercent
                       ;;
               \?)
                       echo "Invalid option: -$OPTARG" >&2
                       ;;
       esac
done

checkVolumeGroups


echo $SERVICEOUTPUT"|"$SERVICEPERFDATA
exit $exitstatus

If I use another argument (another script) with the check_nrpe command, it works.

For example:

root@nagiosserver:/usr/local/nagios# /usr/local/nagios/libexec/check_nrpe -H srv-supervised04 -c check_load
OK - load average: 3.79, 2.99, 1.83|load1=3.790;25.000;30.000;0; load5=2.990;20.000;25.000;0; load15=1.830;15.000;20.000;0;

The VG array03-0 does exist:

root@srv-supervised04:/usr/lib/nagios/plugins# vgdisplay
  --- Volume group ---
  VG Name               array03-0
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  34
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                5
  Open LV               4
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               <18,71 TiB
  PE Size               4,00 MiB
  Total PE              4903887
  Alloc PE / Size       4697600 / <17,92 TiB
  Free  PE / Size       206287 / <805,81 GiB
  VG UUID               OgzAMF-DGbW-3t3L-Wk7k-gY1g-s6fH-zYEKad

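So: the VG does exist. The check_vg_size plugin works when run locally, and the check_nrpe command works from the Nagios server with another plugin, but check_vg_size does not work from the Nagios server. The error message apparently claims that array03-0 does not exist when it does. I have not changed anything in any of these files. The problem appeared with the Debian upgrade from 9 to 10 (during the installation I chose to keep my modified nrpe.cfg file).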

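Does anyone know where this could come from? The Debian version? Maybe the new bash version? An incompatibility between the Nagios server (still Debian 9) and the monitored server (Debian 10)?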

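Well, I think we are hitting the common problem: NRPE, Nagios and similar tools run as the unprivileged user nagios, while you are testing the plugin and the commands as root.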

I am not sure whether anything about access to LVM data changed between Debian 9 and 10, but on newer systems you definitely need root to view LVM information:

$ /sbin/lvs
 WARNING: Running as a non-root user. Functionality may be unavailable.
 /run/lock/lvm/P_global:aux: open failed: Permission denied
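
This would also explain the misleading message. The plugin's volumeGroupExists function discards the stderr output of vgs and only counts the lines that match the volume group name, so a permission failure looks exactly like a missing volume group. Below is a minimal sketch of that logic. The two functions vgs_as_root and vgs_as_nagios are hypothetical stand-ins with made-up output, not the real vgs binary, and grep -c is used in place of the plugin's grep | wc -l:

```shell
#!/bin/bash
# Hypothetical stand-in for `vgs --units=g` run as root (illustrative output only).
vgs_as_root() {
    echo "  VG        #PV #LV #SN Attr   VSize  VFree"
    echo "  array03-0   1   5   0 wz--n- 19155g 805g"
}

# Hypothetical stand-in for `vgs` run as an unprivileged user:
# the warning goes to stderr, nothing is written to stdout, and the call fails.
vgs_as_nagios() {
    echo "  WARNING: Running as a non-root user. Functionality may be unavailable." >&2
    return 1
}

# Same idea as volumeGroupExists in the plugin: throw stderr away,
# then count the lines that mention the volume group.
count_matches() {
    local vgs_cmd="$1" vg="$2"
    "$vgs_cmd" 2>/dev/null | grep -c -- "$vg"
}

root_count=$(count_matches vgs_as_root array03-0)
nagios_count=$(count_matches vgs_as_nagios array03-0)

echo "as root:   $root_count match(es)"
echo "as nagios: $nagios_count match(es)"
```

With a count of 0 the plugin takes the "wasn't valid or wasn't specified" branch, even though the volume group exists.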

People usually solve this by allowing the nagios user to run certain commands through sudo:

nagios ALL=(root) NOPASSWD: /usr/lib/nagios/plugins/check_vg_size

Please test the plugin as the nagios user, and then try it with sudo.
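
As a sketch of how this could be wired up, assuming the sudoers rule above is in place and that sudo is installed at /usr/bin/sudo (an assumption about this particular system), the NRPE command definition on the monitored server would call the plugin through sudo. The commented command is one way to check the rule by hand first:

```
# /etc/nagios/nrpe.cfg on the monitored server (sketch; the sudo path is an assumption)
# Before changing the definition, check the sudoers rule by hand as root:
#   sudo -u nagios sudo /usr/lib/nagios/plugins/check_vg_size -w 20 -c 10 -v array03-0
command[check_vgs_array03-0]=/usr/bin/sudo /usr/lib/nagios/plugins/check_vg_size -w 20 -c 10 -v array03-0
```

The NRPE daemon has to be restarted after the change so that it reads the new definition.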

Quoted from: https://serverfault.com/questions/1032661