Showing posts with label nagios. Show all posts

Monday, February 11, 2013

Nagios, maintenance windows and puppet

Stardate: 90721.31
Nagios is an excellent network monitoring service, and we use it in production where I work. We wanted to be able to create maintenance windows (scheduled downtime) using the web GUI. (Actually we wanted a command line utility, but that's a separate post.) Turning off the default read-only mode turned out to be a real pain and is poorly documented. It means turning on external commands in nagios.cfg, authorizing a user for host and service commands in cgi.cfg, and letting the web server write to Nagios's command pipe. I ended up following the recommendations of this blog as well as some of the comments by readers. I also took the time to write a snippet of puppet code you can put in your manifests to make it easy to turn the command mode on. Note that the puppet code uses the default 'nagiosadmin' user and that it uses file_line from the puppetlabs stdlib module. The code is available with syntax highlighting here.
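  # Enable write access (external commands) from the web GUI.
  # $readonly_web is expected to be a parameter of the enclosing class.
  if $readonly_web == false {

    # file_line comes from the puppetlabs stdlib module; 'match' replaces
    # an existing setting instead of appending a second, conflicting line.
    file_line {
      '/etc/nagios3/nagios.cfg-external-commands-yes':
        line    => 'check_external_commands=1',
        match   => '^check_external_commands=',
        path    => '/etc/nagios3/nagios.cfg',
        notify  => Service['nagios3'],
        require => Package['nagios3'];
      '/etc/nagios3/cgi.cfg-all_host_commands':
        line    => 'authorized_for_all_host_commands=nagiosadmin',
        match   => '^authorized_for_all_host_commands=',
        path    => '/etc/nagios3/cgi.cfg',
        notify  => Service['nagios3'],
        require => Package['nagios3'];
      '/etc/nagios3/cgi.cfg-all_service_commands':
        line    => 'authorized_for_all_service_commands=nagiosadmin',
        match   => '^authorized_for_all_service_commands=',
        path    => '/etc/nagios3/cgi.cfg',
        notify  => Service['nagios3'],
        require => Package['nagios3'];
    }

    # Add nagios to www-data without removing any of its other groups.
    user { 'nagios':
      groups     => ['nagios', 'www-data'],
      membership => minimum,
      require    => Package['nagios3'];
    }

    # Directory holding the command pipe. The setgid bit (2xxx) makes the
    # pipe Nagios creates here inherit the www-data group.
    file { '/var/lib/nagios3/rw':
      ensure  => directory,
      owner   => 'nagios',
      group   => 'www-data',
      mode    => '2710',
      require => Package['nagios3'];
    }

    # The parent directory must be traversable (o+x) so the web server can reach rw/.
    file { '/var/lib/nagios3':
      ensure  => directory,
      owner   => 'nagios',
      group   => 'nagios',
      mode    => '0751',
      require => Package['nagios3'];
    }

  }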
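The snippet assumes it lives inside a class that takes a $readonly_web parameter and already declares Package['nagios3'] and Service['nagios3']. Here is a rough sketch of what that wrapper and its use on the Nagios server might look like. The class name nagios_server and the node name are made up for illustration and are not part of any module.

```puppet
# Hypothetical wrapper class; the name is illustrative only.
class nagios_server (
  $readonly_web = true   # keep the stock read-only web GUI unless told otherwise
) {
  # The snippet above refers to these two resources by name.
  package { 'nagios3':
    ensure => installed,
  }

  service { 'nagios3':
    ensure  => running,
    enable  => true,
    require => Package['nagios3'],
  }

  if $readonly_web == false {
    # Paste the file_line, user and file resources from the snippet here.
  }
}

# Hypothetical node: turn write access on for the Nagios server only.
node 'nagios.example.com' {
  class { 'nagios_server':
    readonly_web => false,
  }
}
```

Defaulting to true means any other node that picks up the class keeps the safe read-only behaviour.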

Monday, January 28, 2013

PuppetDB/Storeconfigs Cache expiry

Stardate: 90682.98
After a couple of weeks of getting frustrated with puppet's Storeconfigs/puppetdb features, I have emerged victorious. PuppetDB is the newer, better, postgressier backend for puppet Storeconfigs. PuppetDB sports some really nice features including a fancy status/metrics web dashboard:
As you can see, this is some interesting and potentially useful feedback. It updates live and works in mobile browsers. Personally, I'm happy to get graphs of this data any way I can, but I would prefer not to be locked into their dashboard. I would rather be able to pull the data out of a frequently updated file or a UDP port so that I could send it to Graphite for real-time graphing and correlation with other metrics. I also don't see the point of making it mobile friendly, since almost everyone will have their puppetmaster/PuppetDB server heavily firewalled, and mobile devices have no business on the internal network. Some of the metrics can lead to real tuning and performance gains, mostly by increasing the number of threads and the maximum JVM heap size.
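If you want to manage those two knobs with puppet itself, something like the following might work. Treat it as a sketch under several assumptions: the puppetlabs-inifile module (for ini_setting) and stdlib (for file_line) are installed, PuppetDB reads its thread count from the [command-processing] section of /etc/puppetdb/conf.d/config.ini, and the heap is set through JAVA_ARGS in /etc/default/puppetdb as on Debian-style packages. The class name and both values are placeholders, not recommendations.

```puppet
# Hypothetical class; names and default values are placeholders.
class puppetdb_tuning (
  $threads  = 4,       # size to the CPU count of the PuppetDB box
  $max_heap = '1024m'  # size to the number of nodes reporting in
) {
  # Number of threads PuppetDB uses to process incoming commands.
  ini_setting { 'puppetdb command-processing threads':
    ensure  => present,
    path    => '/etc/puppetdb/conf.d/config.ini',
    section => 'command-processing',
    setting => 'threads',
    value   => $threads,
    notify  => Service['puppetdb'],
  }

  # Maximum JVM heap. This replaces the whole JAVA_ARGS line, so add back
  # any other JVM flags your install already had.
  file_line { 'puppetdb max heap':
    path   => '/etc/default/puppetdb',
    line   => "JAVA_ARGS=\"-Xmx${max_heap}\"",
    match  => '^JAVA_ARGS=',
    notify => Service['puppetdb'],
  }

  # Drop this if Service['puppetdb'] is already declared elsewhere.
  service { 'puppetdb':
    ensure => running,
    enable => true,
  }
}
```

Both changes only take effect after PuppetDB restarts, which is why each resource notifies the service.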
The punchline here is that with
 storeconfigs = true
in puppet.conf you can do exported/collected resource magic. When doing this with nagios resources, I've been able to export and collect resources flawlessly. The problems came up when I tried to modify a resource. Since we make heavy use of dynamic git environments with puppet, I was running something like
 puppet agent --test --environment=nagios 
on a host at random and
 puppet agent --test --environment=nagios 
on the nagios server, hoping to collect the exported resources. The problem was that they were not changing. As it turns out, PuppetDB can cache old exported resources for up to an hour. My advice for others having problems getting nagios or other exported resources to change or purge is to give it time. Run a big ssh for loop or use mcollective to hit all your boxes, and hit the coffee cart for a quick pick-me-up. Chances are good you just need to give it time.
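For anyone who hasn't tried exported resources yet, the pattern looks roughly like this. It's a sketch: the class names, the 'generic-host' template and the target path are assumptions based on a stock Debian/Ubuntu nagios3 install, and it needs storeconfigs turned on in the master's puppet.conf (with PuppetDB, storeconfigs_backend = puppetdb as well).

```puppet
# Hypothetical class applied to every monitored node. The @@ prefix exports
# the resource to PuppetDB instead of applying it locally.
class monitored_host {
  @@nagios_host { $::fqdn:
    ensure  => present,
    address => $::ipaddress,
    use     => 'generic-host',
    target  => "/etc/nagios3/conf.d/puppet_host_${::fqdn}.cfg",
    tag     => 'nagios',
  }
}

# Hypothetical class applied only to the Nagios server. The <<| |>> collector
# realizes every exported nagios_host tagged 'nagios' and reloads Nagios.
# Assumes Service['nagios3'] is declared elsewhere on this node.
class nagios_collect {
  Nagios_host <<| tag == 'nagios' |>> {
    notify => Service['nagios3'],
  }
}
```

A change made on the exporting side only reaches the Nagios server after the exporting node has checked in and the server has run again.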