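Cacti is very strong at traffic monitoring, but in other areas its monitoring capabilities fall somewhat short of Nagios. Most IT companies that run an open-source monitoring system use Nagios, and Nagios has its own companion graphing add-on, pnp4nagios. Like most traffic-graphing software, however, such setups usually collect their data over SNMP and store it in rrdtool format. SNMP is certainly powerful, but Nagios users are probably more likely to be running NRPE. The two do not conflict when used alongside Nagios, of course. So is there no way to draw graphs with pnp4nagios through check_nrpe? Of course there is.
Read the traffic counters from ifconfig at two points in time, take the difference, and divide by the elapsed seconds to get the average rate, then print the result in the format pnp4nagios expects. The script is as follows: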
#!/bin/sh
########################################################
#                                                      #
#                   www.361way.com                     #
#  Usage: check_traffic -i Interface -w warn -c crit   #
#                                                      #
########################################################
while getopts ":i:c:w:h" optname
do
    case "$optname" in
        "i")
            INT=$OPTARG
            ;;
        "c")
            CRIT=$OPTARG
            ;;
        "w")
            WARN=$OPTARG
            ;;
        "h")
            echo "Usage: check_traffic -i Interface -w warn -c crit"
            exit
            ;;
        "?")
            echo "Unknown option $OPTARG"
            exit 3
            ;;
        ":")
            echo "No argument value for option $OPTARG"
            exit 3
            ;;
        *)
            # Should not occur
            echo "Unknown error while processing options"
            exit 3
            ;;
    esac
done

# Exit code 3 is the Nagios UNKNOWN state, so usage errors are not reported as OK
[ -z "$INT" ] && echo "Please input Device!" && exit 3
if ! ifconfig "$INT" >/dev/null 2>&1; then
    echo "error: no device $INT"
    exit 3
fi
DEVICE=$INT

# Default thresholds, in bits per second
[ -z "$WARN" ] && WARN=1048576
[ -z "$CRIT" ] && CRIT=2097152

# State file holding the previous sample: time, RX bytes, TX bytes, In rate, Out rate
DIR=/App/nagios/tmp
FILE=$DIR/.network-$DEVICE.tmp
[ -e "$DIR" ] || mkdir -p "$DIR"
chown -R nagios:nagios "$DIR"
[ -e "$FILE" ] || : >"$FILE"

# Current sample. This parsing expects the older ifconfig output format:
#   RX bytes:1234 (1.2 KB)  TX bytes:5678 (5.6 KB)
New_Time=`date +%s`
New_In=`ifconfig $DEVICE | grep "RX bytes" | awk '{print $2}' | awk -F: '{print $NF}'`
New_Out=`ifconfig $DEVICE | grep "RX bytes" | awk '{print $6}' | awk -F: '{print $NF}'`

if [ ! -s "$FILE" ]; then
    # First run: only record the counters, there is nothing to compare against yet
    echo "$New_Time $New_In $New_Out 0 0" >"$FILE"
    echo "This is first run"
else
    Old_Time=`awk '{print $1}' "$FILE"`
    Old_In=`awk '{print $2}' "$FILE"`
    Old_Out=`awk '{print $3}' "$FILE"`
    Diff_Time=`echo "$New_Time-$Old_Time" | bc`
    [ $Diff_Time -le 5 ] && echo "less 5s" && exit

    # Average rate over the interval, converted from bytes to bits
    Diff_In=`echo "scale=0;($New_In-$Old_In)*8/$Diff_Time" | bc`
    Diff_Out=`echo "scale=0;($New_Out-$Old_Out)*8/$Diff_Time" | bc`

    # If a counter did not grow (reset or idle link), reuse the previous rate
    [ $Diff_In -le 0 ] && Diff_In=`awk '{print $4}' "$FILE"`
    [ $Diff_Out -le 0 ] && Diff_Out=`awk '{print $5}' "$FILE"`
    echo "$New_Time $New_In $New_Out $Diff_In $Diff_Out" >"$FILE"

    # Performance data items are separated by a space so that pnp4nagios
    # sees two data sources (In and Out)
    PERF="In=${Diff_In};${WARN};${CRIT};0;0 Out=${Diff_Out};${WARN};${CRIT};0;0"

    # Only the inbound rate is compared against the thresholds
    if [ $Diff_In -ge $CRIT ]; then
        echo "CRIT - $Diff_In|$PERF"
        exit 2
    fi
    if [ $Diff_In -ge $WARN ]; then
        echo "WARN - $Diff_In|$PERF"
        exit 1
    fi
    echo "OK - $Diff_In|$PERF"
    exit 0
fi
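To make the arithmetic easier to follow, here is a minimal sketch of the core calculation on its own, using made-up counter values. The helper names rate_bps and perf_item are hypothetical and do not appear in the script above; the thresholds are the script's defaults.

```shell
#!/bin/sh
# Illustrative sketch only: helper names and sample numbers are assumptions.

# rate_bps OLD_BYTES NEW_BYTES SECONDS -> average rate in bit/s (integer division)
rate_bps() {
    echo $(( ($2 - $1) * 8 / $3 ))
}

# perf_item LABEL VALUE WARN CRIT -> one performance data item with min and max set to 0
perf_item() {
    printf '%s=%s;%s;%s;0;0' "$1" "$2" "$3" "$4"
}

WARN=1048576
CRIT=2097152

# Two samples taken 30 seconds apart (hypothetical byte counters)
in_rate=$(rate_bps 1000000 4750000 30)
out_rate=$(rate_bps 500000 1250000 30)

# Status text, then "|", then the performance data items separated by a space
echo "OK - $in_rate|$(perf_item In "$in_rate" $WARN $CRIT) $(perf_item Out "$out_rate" $WARN $CRIT)"
```

With these numbers the inbound rate is 1000000 bit/s, just under the warning threshold, so the OK status in the sample line is consistent with the thresholds.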
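Alternatively, the same figures can be taken from the output of cat /proc/net/dev. The script below can be adapted for that purpose; the principle is the same as computing the average traffic from ifconfig above: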
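#!/bin/bash
echo -n "which nic?"
read eth
echo "the nic is $eth"
echo -n "how many seconds:"
read sec
echo "duration is $sec seconds, wait please..."

# Replace the colon after the interface name with a space, so that the RX and TX
# byte counters are always fields 2 and 10, and match the interface name exactly
# (so that eth1 does not also match eth10).
infirst=$(awk -v nic="$eth" '{sub(/:/," ")} $1==nic {print $2}' /proc/net/dev)
outfirst=$(awk -v nic="$eth" '{sub(/:/," ")} $1==nic {print $10}' /proc/net/dev)
sumfirst=$(($infirst+$outfirst))

sleep $sec

inend=$(awk -v nic="$eth" '{sub(/:/," ")} $1==nic {print $2}' /proc/net/dev)
outend=$(awk -v nic="$eth" '{sub(/:/," ")} $1==nic {print $10}' /proc/net/dev)
sumend=$(($inend+$outend))

# Total bytes (in + out) transferred during the interval, and the average per second
sum=$(($sumend-$sumfirst))
echo "$sec seconds total: $sum bytes"
aver=$(($sum/$sec))
echo "average: $aver bytes/sec"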
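As a rough sketch of how the /proc/net/dev parsing could be reused in a non-interactive check, the function below returns the RX and TX byte counters for one interface. The name nic_bytes and the sample data are hypothetical; the sample is fed on standard input so the parsing can be tried without touching the real file.

```shell
#!/bin/sh
# Illustrative sketch only: nic_bytes and the sample counters are assumptions.

# nic_bytes INTERFACE [FILE] -> prints "RX_BYTES TX_BYTES"
# FILE defaults to /proc/net/dev; pass "-" to read from standard input.
nic_bytes() {
    awk -v nic="$1" '{ sub(/:/, " ") } $1 == nic { print $2, $10 }' "${2:-/proc/net/dev}"
}

# Sample in /proc/net/dev layout, including a line where the interface name
# and the first counter are not separated by a space
nic_bytes eth0 - <<'EOF'
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:  123456     100    0    0    0     0          0         0   123456     100    0    0    0     0       0          0
  eth0:98765432    5000    0    0    0     0          0         0  1234567    4000    0    0    0     0       0          0
EOF
```

For the sample above this prints the eth0 counters 98765432 and 1234567. Calling nic_bytes eth0 with no second argument reads the live counters instead.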
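Note: the second script reports its results in bytes (B). That is a factor of 8 away from the figures shown by traffic monitoring tools such as iptraf, which report in bits (b), for example kb/s or Mb/s. To match what IDC providers usually mean by traffic, I converted the results in the first script to bits.
The pnp4nagios template used for traffic monitoring is: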
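A small worked example of that factor of 8, as a sketch with an assumed input value: 131072 bytes/sec from the second script corresponds to 1048576 bit/s, which happens to equal the first script's default warning threshold. The helper name to_bits is hypothetical.

```shell
#!/bin/sh
# Illustrative sketch only: to_bits and the sample rate are assumptions.

# to_bits BYTES_PER_SEC -> bits per second
to_bits() {
    echo $(( $1 * 8 ))
}

bytes_per_sec=131072
bits_per_sec=$(to_bits "$bytes_per_sec")
echo "$bytes_per_sec bytes/sec = $bits_per_sec bit/s"
```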
<?php
$colors = array(
    'red'     => '#FF0000',
    'green'   => '#00FF00',
    'blue'    => '#0000FF',
    'yellow'  => '#FFFF00',
    'black'   => '#000000',
    'deepred' => '#330000',
);

$def[1]  = "DEF:var1=$rrdfile:$DS[1]:AVERAGE " ;
$def[1] .= "DEF:var2=$rrdfile:$DS[2]:AVERAGE " ;
$def[1] .= "HRULE:$WARN[1]#FFFF00 ";
$def[1] .= "HRULE:$CRIT[1]#FF0000 ";
$def[1] .= "AREA:var1$colors[green]:\"In \" " ;
$def[1] .= "GPRINT:var1:LAST:\"%6.2lf last\" " ;
$def[1] .= "GPRINT:var1:AVERAGE:\"%6.2lf avg\" " ;
$def[1] .= "GPRINT:var1:MAX:\"%6.2lf max\\n\" ";
$def[1] .= "LINE:var2$colors[blue]:\"Out \" " ;
$def[1] .= "GPRINT:var2:LAST:\"%6.2lf last\" " ;
$def[1] .= "GPRINT:var2:AVERAGE:\"%6.2lf avg\" " ;
$def[1] .= "GPRINT:var2:MAX:\"%6.2lf Total\\n\" " ;
/*
$def[1] .= "CDEF:total=var1,var2,+ " ;
$def[1] .= "LINE1:total$colors[black]:\"Total \" " ;
*/
?>
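Finally, the resulting monitoring graph looks like this:
I also found a plugin on http://exchange.nagios.org/ that claims to work through check_nrpe as well and to achieve the same result, although it uses a different template. See its page on Nagios Exchange for details: http://exchange.nagios.org/directory/Plugins/Network-Connections%2C-Stats-and-Bandwidth/check_iftraffic_nrpe/details . Those interested can give it a try.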