Nagios templates, hosts, services, and notifications

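Monitoring serves two purposes: it lets you review historical or current data to see how a host has been running and how loaded it has been over a period of time, and it sends timely notifications when something goes wrong so the right people can deal with it. This post focuses on the latter. In the Nagios configuration there are three main ways to wire up notifications for host and service states: through contacts or contact_groups, through a template reference to a define contact, and through a define host template reference.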

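This post follows on from the earlier post on Nagios grouping (nagios分组相关), which ended by noting how flexible Nagios configuration references are. Here that flexibility is illustrated through the ways notification contacts can be referenced.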

I. Contact reference method 1 (through contacts or contact_groups)

First define the contact and its notification method with define contact; the reference in a host or service then looks like this:

define service{
        use                     windows-service    ; reference the service template (spelled as defined in templates.cfg)
        host_name               jjh
        service_description     PING
        check_command           check_ping!100.0,20%!500.0,60%
        contacts                admin1             ; must already be defined
        }

Note: use above refers to a template, i.e. what is usually kept in templates.cfg, while contacts refers to an entry in contacts.cfg.
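The admin1 contact referenced above is not shown in this post. A minimal sketch of what contacts.cfg might contain, assuming the generic-contact template from templates.cfg is available; the alias, the email address and the group membership are placeholders:

```
define contact{
        contact_name        admin1              ; the name used by "contacts admin1"
        use                 generic-contact     ; inherit notification periods, options and commands
        alias               Admin One           ; hypothetical display name
        email               admin1@example.com  ; placeholder address
        }

define contactgroup{
        contactgroup_name   admins              ; group referenced by generic-service
        alias               Nagios Administrators
        members             admin1              ; assumed membership
        }
```

With a group defined like this, a service can use contact_groups admins instead of listing individual contacts.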

II. Contact reference method 2 (through a template reference to define contact)

1. Define the contact first

define  contact {
        contact_name                    ZheJiang
        use                             generic-contact ; the contact inherits from a template
        alias                           ZheJiang_Mobile
        service_notification_commands   notify-service-by-email,notify-service-by-sms
        email                           abc@361way.com,def@361way.com
        pager                           "1366XXXXXXX,13819XXXXXX"
        }

2. Reference it with use

define  service{
    use                    ZheJiang   ; reference the contact
    host_name              ZJ-ZJ-App
    service_description    CPU Load
    low_flap_threshold     0
    high_flap_threshold    0.999
    check_command          check_nrpe!check_load
}
define service {
    use                    ZheJiang   ; reference the contact
    host_name              ZJ-ZJ-App
    service_description    Check_Disk
    check_command          check_nrpe!check_disk
}

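Note: here the contact definition is pulled in directly through use. use works much like include in programming: it takes something defined earlier and applies it as-is. The contact defined above in turn uses a definition from templates.cfg, which typically sets the notification trigger conditions, time periods and so on. One caveat: Nagios resolves use against the name directive of a template of the same object type, so a service cannot inherit from a contact object directly. For the services above to pass the configuration check, a service template named ZheJiang that carries the contact presumably has to exist as well.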
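Since use in a service looks up a service template by name, the definition below is one way the ZheJiang reference could be satisfied. It is a sketch, not the original configuration: the template itself and its generic-service base are assumptions.

```
define service{
        name        ZheJiang          ; hypothetical service template that "use ZheJiang" resolves to
        use         generic-service   ; assumed base template from templates.cfg
        contacts    ZheJiang          ; the contact defined in step 1
        register    0                 ; template only, not a real service
        }
```

A service template and a contact can share a name because Nagios keeps names separate per object type.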

III. Contact reference method 3 (through a define host template reference)

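This method is really method 2 turned around: define the contacts first, reference them in templates.cfg through contacts or contact_groups, and then have host-xxxx.cfg use the templates from templates.cfg. The contact definitions in contacts.cfg were already covered under method 2, so they are skipped here. Below are a few common definitions from templates.cfg: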

# Contact template
define contact{
        name                            generic-contact         ; The name of this contact template
        service_notification_period     24x7                    ; service notifications can be sent anytime
        host_notification_period        24x7                    ; host notifications can be sent anytime
        service_notification_options    w,u,c,r,f,s             ; notification trigger conditions (same for the options lines below)
        host_notification_options       d,u,r,f,s               ; send notifications for all host states, flapping events, and scheduled downtime events
        service_notification_commands   notify-service-by-email ; send service notifications via email
        host_notification_commands      notify-host-by-email    ; send host notifications via email
        register                        0                       ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL CONTACT, JUST A TEMPLATE!
        }
# Host templates
define host{
        name                            generic-host    ; The name of this host template
        notifications_enabled           1               ; Host notifications are enabled
        event_handler_enabled           1               ; Host event handler is enabled
        flap_detection_enabled          1               ; Flap detection is enabled
        failure_prediction_enabled      1               ; Failure prediction is enabled
        process_perf_data               1               ; Process performance data
        retain_status_information       1               ; Retain status information across program restarts
        retain_nonstatus_information    1               ; Retain non-status information across program restarts
        notification_period             24x7            ; Send host notifications at any time
        register                        0               ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
        }
define host{
        name                            JJH-server      ; The name of this host template
        use                             generic-host    ; This template inherits other values from the generic-host template
        check_period                    24x7            ; By default, Linux hosts are checked round the clock
        check_interval                  5               ; Actively check the host every 5 minutes
        retry_interval                  1               ; Schedule host check retries at 1 minute intervals
        max_check_attempts              10              ; Check each Linux host 10 times (max)
        check_command                   check-host-alive ; Default command to check Linux hosts
        notification_period             workhours
        notification_interval           120             ; Resend notifications every 2 hours
        notification_options            d,u,r           ; Only send notifications for specific host states
        contact_groups                  admins-jjh      ; contact group to notify
        hostgroups                      JJH-servers
        register                        0               ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
        }
# Service templates
define service{
        name                            generic-service         ; The 'name' of this service template
        active_checks_enabled           1                       ; Active service checks are enabled
        passive_checks_enabled          1                       ; Passive service checks are enabled/accepted
        parallelize_check               1                       ; Active service checks should be parallelized (disabling this can lead to major performance problems)
        obsess_over_service             1                       ; We should obsess over this service (if necessary)
        check_freshness                 0                       ; Default is to NOT check service 'freshness'
        notifications_enabled           1                       ; Service notifications are enabled
        event_handler_enabled           1                       ; Service event handler is enabled
        flap_detection_enabled          1                       ; Flap detection is enabled
        failure_prediction_enabled      1                       ; Failure prediction is enabled
        process_perf_data               1                       ; Process performance data
        retain_status_information       1                       ; Retain status information across program restarts
        retain_nonstatus_information    1                       ; Retain non-status information across program restarts
        is_volatile                     0                       ; The service is not volatile
        check_period                    24x7                    ; The service can be checked at any time of the day
        max_check_attempts              3                       ; Re-check the service up to 3 times in order to determine its final (hard) state
        normal_check_interval           10                      ; Check the service every 10 minutes under normal conditions
        retry_check_interval            2                       ; Re-check the service every two minutes until a hard state can be determined
        contact_groups                  admins                  ; Notifications get sent out to everyone in the 'admins' group
        notification_options            w,u,c,r                 ; Send notifications about warning, unknown, critical, and recovery events
        notification_interval           60                      ; Re-notify about service problems every hour
        notification_period             24x7                    ; Notifications can be sent out at any time
        register                        0                       ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
        }
# Notifications are turned off by setting notifications_enabled to 0
define service{
        name                            no-notice-service           ; The name of this service template
        use                             generic-service         ; Inherit default values from the generic-service definition
        max_check_attempts              4                       ; Re-check the service up to 4 times in order to determine its final (hard) state
        normal_check_interval           5                       ; Check the service every 5 minutes under normal conditions
        notifications_enabled           0                       ; Service notifications are disabled
        event_handler_enabled           0
        retry_check_interval            1                       ; Re-check the service every minute until a hard state can be determined
        register                        0                       ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
        }
# The service templates below specify the contact group to notify
define service{
        name                            windows-service           ; The name of this service template
        use                             generic-service         ; Inherit default values from the generic-service definition
        max_check_attempts              4                       ; Re-check the service up to 4 times in order to determine its final (hard) state
        normal_check_interval           5                       ; Check the service every 5 minutes under normal conditions
        retry_check_interval            1                       ; Re-check the service every minute until a hard state can be determined
        register                        0                       ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
        contact_groups                  admins-win              ; contact group to notify
        }
define service{
        name                            JJH-service           ; The name of this service template
        use                             generic-service         ; Inherit default values from the generic-service definition
        max_check_attempts              4                       ; Re-check the service up to 4 times in order to determine its final (hard) state
        normal_check_interval           5                       ; Check the service every 5 minutes under normal conditions
        retry_check_interval            1                       ; Re-check the service every minute until a hard state can be determined
        register                        0                       ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
        contact_groups                  admins-jjh              ; contact group to notify
        }

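Note: as the templates above show, the syntax is very flexible: you can set whether to notify, which contact group to notify, the notification frequency, the trigger conditions and so on. Templates exist so that xxxhost.cfg can pull them in with use and cut down on what has to be written. This is again similar to variables in programming.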
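The JJH-server template refers to a workhours time period and an admins-jjh contact group that are not shown in this post. The following is only a sketch of what those definitions might look like; the hours, the alias texts and the group member are assumptions:

```
define timeperiod{
        timeperiod_name     workhours
        alias               Normal Work Hours
        monday              09:00-17:00         ; assumed working hours
        tuesday             09:00-17:00
        wednesday           09:00-17:00
        thursday            09:00-17:00
        friday              09:00-17:00
        }

define contactgroup{
        contactgroup_name   admins-jjh
        alias               JJH Administrators  ; hypothetical alias
        members             admin1              ; hypothetical member, must be a defined contact
        }
```

Days left out of a timeperiod (here Saturday and Sunday) are treated as outside the period, so host notifications from JJH-server would be held back on those days.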

In a host file such as 361way.cfg, the templates are then referenced as follows:

# Template references
define host{
        use                     JJH-server     ; use the host template
        host_name               jjh-cc
        parents                 aliyun
        statusmap_image         linux40.gd2
        alias                   jjh-cc
        address                 115.29.161.54
        notification_interval   0
        process_perf_data       1
        action_url              /pnp4nagios/graph?host=$HOSTNAME$
        }
define service{
        use                             JJH-service,srv-pnp         ; Name of service template to use
        host_name                       jjh-cc
        service_description             PING
        check_command                   check_ping!100.0,20%!500.0,60%
        }
define service{
        use                             JJH-service,srv-pnp
        host_name                       jjh-cc
        service_description             check_cpu
        check_command                   check_nrpe!check_cpu
        }
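
To make the inheritance concrete, the PING service above ends up roughly equivalent to the expanded definition below. This is an illustration assembled from the JJH-service and generic-service templates listed earlier; whatever srv-pnp contributes is left out because that template is not shown in this post, and only the notification-related and scheduling directives are listed.

```
define service{
        host_name               jjh-cc
        service_description     PING
        check_command           check_ping!100.0,20%!500.0,60%
        max_check_attempts      4               ; from JJH-service (overrides 3)
        normal_check_interval   5               ; from JJH-service (overrides 10)
        retry_check_interval    1               ; from JJH-service (overrides 2)
        contact_groups          admins-jjh      ; from JJH-service (overrides admins)
        check_period            24x7            ; from generic-service
        notifications_enabled   1               ; from generic-service
        notification_options    w,u,c,r         ; from generic-service
        notification_interval   60              ; from generic-service
        notification_period     24x7            ; from generic-service
        }
```

A value set in the object itself wins over its templates, and with several templates in use the one listed first takes precedence.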

IV. Summary

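The examples above are meant to clarify the flexible reference relationships among contacts.cfg, templates.cfg and XXXhost.cfg in Nagios. One file has been left out: timeperiods.cfg, which defines the time ranges for notifications (for example working hours versus off hours, or China time versus US time). Reading the configuration or the three methods above directly may leave you more confused the longer you look, so the following summary points may help.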

1. Start from the most naive approach: when you define a check in hostxxx.cfg, you can add notification_options, notification_period, notification_interval, contact_groups and similar directives directly to it. That meets your notification needs just as well.

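2. To simplify the naive approach, you treat those directives as a variable, give it a name, define it in templates.cfg, and then call it from hostxxx.cfg with use plus the name defined in templates.cfg. The directives above now live in the template and can be left out of the host file.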

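3. When there are many contacts and different applications and hosts must notify different people, a separate contacts.cfg file is introduced, mainly to define and group the people to notify. Whether contacts use templates or templates reference contacts from contacts.cfg, in the end the configuration is simply gathered together for hostxxx.cfg to use.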

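4. How many configuration files there are and what they are called does not matter; if you like, you can use a single file. Multiple named files just make things easier to tell apart and to find, and simplify the work. As long as they are loaded from nagios.cfg (through cfg_file or cfg_dir), Nagios handles them fine.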

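5. You can think of define as defining a variable, and of use as referencing a variable or including a configuration file. contacts, contact_groups and the like are Nagios directives, which you can think of as built-in functions.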
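
As a sketch of point 4, the object files can be loaded from nagios.cfg with cfg_file and cfg_dir. The paths below are assumptions based on a default source install under /usr/local/nagios, and the hosts directory is a hypothetical location for files such as 361way.cfg:

```
# nagios.cfg (paths are assumptions, adjust to your install)
cfg_file=/usr/local/nagios/etc/objects/templates.cfg
cfg_file=/usr/local/nagios/etc/objects/contacts.cfg
cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg

# Load every *.cfg file found under this directory
cfg_dir=/usr/local/nagios/etc/hosts
```

After changing any of these files, running nagios -v against nagios.cfg checks that every use, contacts and contact_groups reference resolves before the daemon is restarted.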

Reference: Nagios online manual
