DIY Nagios API: automatically adding monitored hosts

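Nagios is a solid monitoring tool, but compared with Zabbix it has almost no API, and many companies still use Nagios as their primary monitoring system. In environments where large numbers of hosts must be put under monitoring automatically, the lack of an API seriously hampers automation. Still, tools are fixed but people are flexible. This post works from a real production requirement and uses PHP to write a simple API that updates the Nagios configuration files and reloads them automatically, so that the basic monitoring block for a host can be added without manual work. For large-scale deployments it can be combined with automation tools such as Puppet, CFEngine, or SaltStack.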

I. The updatecfg.php file

The API file that updates the Nagios configuration:

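<?php
// ... (the opening of the script is missing from this copy; only the end of the template read loop survives) ...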
    $ncont.=$buffer;
}
fclose($ofp);
$wfp=fopen($wfile,'w');
fwrite($wfp,$ncont);
fclose($wfp);
echo exec("/etc/init.d/nagios3 reload");
echo "Update config file success!";
?>

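The principle is simple. The script defines three values that must be passed in: the host name, the alias, and the IP address. When it receives the parameters POSTed by the client, it first compares a KEY value; if the KEY matches, it substitutes the parameters into the template file and saves the new file to the corresponding path. Finally it reloads Nagios so the configuration takes effect (the advantage of reload is that even if a configuration file contains errors, Nagios still loads the files that parse correctly, so monitoring as a whole is not affected).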
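The substitution step described above can be sketched in shell. This is only an illustration of the idea, not the author's PHP: the function names `escape_sed` and `render_host_cfg` and the output path in the usage comment are assumptions, while the three placeholders (DY-HOSTNAME, INSTANCEID, IPADDRESS) are the ones used in the TEMPLATE file.

```shell
#!/bin/bash
# Sketch only: escape_sed and render_host_cfg are hypothetical helper names.

# Escape the characters that are special in a sed replacement
# (backslash, ampersand, and the | delimiter used below).
escape_sed() {
    printf '%s' "$1" | sed -e 's/[\\&|]/\\&/g'
}

# Read a template on stdin and write the host configuration to stdout,
# replacing the three placeholders with the values passed as arguments:
#   $1 = host name, $2 = alias (instance id), $3 = IP address
render_host_cfg() {
    local host inst addr
    host=$(escape_sed "$1")
    inst=$(escape_sed "$2")
    addr=$(escape_sed "$3")
    sed -e "s|DY-HOSTNAME|${host}|g" \
        -e "s|INSTANCEID|${inst}|g" \
        -e "s|IPADDRESS|${addr}|g"
}

# Possible usage (the target path is an assumption, adjust to your layout):
#   render_host_cfg "$hostname" "$iid" "$pip" < TEMPLATE > "/etc/nagios3/conf.d/${hostname}.cfg"
```

Because the host name ends up in a file name, a real implementation would probably also want to reject values containing `/` or whitespace before writing anything.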

II. The client POST script

A shell script is used here, with the following content:

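#!/bin/bash
# Shared secret that updatecfg.php compares before accepting the request
key="31d860f7-6f7f-48d3-97b3-8407d5083f34"
# The host name comes from the instance user-data JSON; -F makes grep match ["hostname"] literally
hostname=$(curl -s http://169.254.169.254/latest/user-data | json.sh | grep -F -m1 '["hostname"]' | awk '{print $2}' | sed 's/"//g')
pip=$(curl -s http://169.254.169.254/latest/meta-data/public-ipv4)
iid=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
postdata="hostname=${hostname}&pip=${pip}&iid=${iid}&key=${key}"
# Quote the payload so the shell passes it to curl as a single argument
curl -u "nagios:abc123" -sk -d "${postdata}" http://nagios.361way.com/nagios3/cgi-bin/updatecfg.php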

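Note: my hosts are on AWS, so the three values can be read from the URLs above (on non-AWS hosts, shell commands can be used to collect the local information instead). The same values can also be obtained with the ec2metadata command, as shown below: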

# ec2metadata
ami-id: ami-3b879652
ami-launch-index: 0
ami-manifest-path: (unknown)
ancestor-ami-ids: unavailable
availability-zone: us-east-1a
block-device-mapping: ami
ebs1
ephemeral0
root
instance-action: none
instance-id: i-05b3e152
instance-type: c3.xlarge
local-hostname: ip-10-19-255-21.ec2.internal
local-ipv4: 10.19.255.21
kernel-id: aki-88aa75e1
mac: unavailable
profile: default-paravirtual
product-codes: unavailable
public-hostname: ec2-58-85-121-145.compute-1.amazonaws.com
public-ipv4: 58.85.121.145
// Don't get any ideas about my SSH public key: the IP is fake and so is the key, hehe
public-keys: ['ssh-rsa YDUNjq8WYsDh685BSofB4v4Kq+CjsXs3QF+5lERjVjet0PBwibk/Gs0tG3oBE1+HqMtiZaOTeifehnxMQUn4RsFCLqfGy US_E_VPC']
ramdisk-id: unavailable
reserveration-id: unavailable
security-groups: vpc_norules
user-data: {"hostname":"AMZ-IAD-Zabbix-255-21"}
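If the ec2metadata output is used instead of the metadata URLs, the three values for the POST can be picked out of it with a little text processing. The sketch below assumes the `key: value` layout shown in the sample output above; `metadata_field` and `user_data_hostname` are hypothetical helper names, not part of any tool.

```shell
#!/bin/bash
# Sketch only: both helpers are hypothetical and assume the "key: value"
# layout of the ec2metadata sample output.

# Print the value of one field from ec2metadata-style text read on stdin.
#   $1 = field name, for example instance-id or public-ipv4
metadata_field() {
    awk -v key="$1" -F': ' '$1 == key { print substr($0, length(key) + 3); exit }'
}

# Extract the "hostname" value from a one-line JSON user-data string on stdin.
user_data_hostname() {
    sed -n 's/.*"hostname"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Possible usage on an instance where ec2metadata is installed:
#   meta=$(ec2metadata)
#   iid=$(printf '%s\n' "$meta" | metadata_field instance-id)
#   pip=$(printf '%s\n' "$meta" | metadata_field public-ipv4)
#   hostname=$(printf '%s\n' "$meta" | metadata_field user-data | user_data_hostname)
```

The sed-based JSON extraction only handles the simple one-key document shown in the sample; for anything more complex a real JSON parser such as the json.sh used in the client script is the safer choice.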

III. The TEMPLATE file

The TEMPLATE file used by updatecfg.php contains the following:

##### host define
define host{
    use                     generic-host,host-pnp4nagios
    host_name               DY-HOSTNAME
    alias                   INSTANCEID
    address                 IPADDRESS
}
##### common service
define service{
    use                             hour-service,service-pnp4nagios
    host_name                       DY-HOSTNAME
    service_description             Disk Space
    check_command                   check_nrpe!check_disk!check_disk -w 20% -c 10%
}
define service{
    use                             generic-service,service-pnp4nagios
    host_name                       DY-HOSTNAME
    service_description             Current Users
    check_command                   check_nrpe!check_users!check_users -w 8 -c 10
}
define service{
    use                             minute-service,service-pnp4nagios
    host_name                       DY-HOSTNAME
    service_description             Total Processes
    check_command                   check_nrpe!check_procs!check_perf check_procs -w 250 -c 400
}
define service{
    use                             minute-service,service-pnp4nagios
    host_name                       DY-HOSTNAME
    service_description             Zombie Processes
    check_command                   check_nrpe!check_procs!check_perf check_procs -w 5 -c 10 -s Z
}
define service{
    use                             minute-service,service-pnp4nagios
    host_name                       DY-HOSTNAME
    service_description             CF-Execd Processes
    check_command                   check_nrpe!check_procs!check_perf check_procs -w 1: -c 1: -C cf-execd
}
define service{
    use                             minute-service,service-pnp4nagios
    host_name                       DY-HOSTNAME
    service_description             Current Load
    check_command                   check_nrpe!check_load!check_load -w 5.0,5.0,5.0 -c 8.0,8.0,8.0
}
define service{
    use                             minute-service
    host_name                       DY-HOSTNAME
    service_description             SSH
    check_command                   check_ssh
}
define service{
    use                             minute-service,service-pnp4nagios
    host_name                       DY-HOSTNAME
    service_description             CPU Usage
    check_command                   check_nrpe!check_cpu!check_cpu -w 80 -c 90
}
define service{
    use                             minute-service,service-pnp4nagios
    host_name                       DY-HOSTNAME
    service_description             Open FD
    check_command                   check_nrpe!check_open_fds!check_open_fds -W 80 -C 90
}
define service{
    use                             minute-service,service-pnp4nagios
    host_name                       DY-HOSTNAME
    service_description             Context Switches
    check_command                   check_nrpe!check_context_switches!check_context_switches -w 20000 -c 25000
}
define service{
    use                             minute-service,service-pnp4nagios
    host_name                       DY-HOSTNAME
    service_description             Eth0 Traffic
    check_command                   check_nrpe!check_iftraffic!check_iftraffic -i eth0 -b 100 -u m
}
define service{
    use                             minute-service,service-pnp4nagios
    host_name                       DY-HOSTNAME
    service_description             Memory Usage
    check_command                   check_nrpe!check_mem!check_mem -w 90 -c 95
}
##### unique service
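Since every block in the template depends on the placeholders being replaced, one cheap safeguard before reloading Nagios is to check that none of them survived in the generated file. A minimal sketch, assuming the three placeholder strings used in the template above; `has_no_placeholders` and the file path in the usage comment are hypothetical.

```shell
#!/bin/bash
# Sketch only: has_no_placeholders is a hypothetical helper name.

# Succeed (exit status 0) only if the text on stdin contains none of the
# three template placeholders.
has_no_placeholders() {
    ! grep -Eq 'DY-HOSTNAME|INSTANCEID|IPADDRESS'
}

# Possible usage (the path is an assumption):
#   if has_no_placeholders < /etc/nagios3/conf.d/web01.cfg; then
#       /etc/init.d/nagios3 reload
#   else
#       echo "unreplaced placeholder found, not reloading" >&2
#   fi
```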

The code above has been uploaded to my GitHub.
