Splunk: integrating Nagios and OSSEC
Posted by lucas on June 18, 2012
I've started with my bachelor's test and decided to have a look at Splunk. We are building our own cloud using OpenStack, and I'm in charge of monitoring and securing all of our machines and instances. The problem these days is that a lot of apps come with their own web interface. It's OK to have 3, 4 or maybe even 5 interfaces, but imagine you have 20 or even 30 different web applications, all with their own interface. This is where Splunk comes in.
What is Splunk
Splunk is built to analyze machine-generated data. This means that it can search through logs... a lot of logs. Splunk is built to digest and analyze "big data", that is, data that grows rapidly in volume and complexity, and to present you with something a lot more comprehensible. You just feed data to Splunk and it will munch through it. You can run queries on it and generate graphs. This is useful for identifying problems in your IT infrastructure, for example when you have an application that generates tons of logs. With Splunk you write a query that looks for a specific keyword, extract some keywords from the results, and turn them into a nice graph. For instance, if you have a distributed application that generates errors, you can quickly generate a graph to see where and when the errors occur. For a system administrator this makes it a whole lot easier to find and resolve problems. The nice part is that it can also generate weekly reports or send you alerts when it sees something specific pass by in the logs.
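To make the "where and when do the errors occur" idea concrete, here is a minimal sketch in plain Python (not Splunk's own query language) that counts error lines per host and per hour. The log lines and their "timestamp host level message" layout are made up for illustration; real application logs will look different.

```python
from collections import Counter

# Made-up log lines in a hypothetical "timestamp host level message" layout.
LOG_LINES = [
    "2012-06-18T10:02:11 node1 ERROR connection refused",
    "2012-06-18T10:15:42 node2 INFO request served",
    "2012-06-18T10:47:03 node1 ERROR connection refused",
    "2012-06-18T11:05:27 node3 ERROR disk full",
]

def errors_per_host_and_hour(lines, keyword="ERROR"):
    """Count lines whose level equals the keyword, grouped by (host, hour)."""
    counts = Counter()
    for line in lines:
        timestamp, host, level, _message = line.split(" ", 3)
        if level == keyword:
            hour = timestamp[:13]  # keep "YYYY-MM-DDTHH"
            counts[(host, hour)] += 1
    return counts

for (host, hour), count in sorted(errors_per_host_and_hour(LOG_LINES).items()):
    print(host, hour, count)
```

This is the kind of grouping a Splunk search and chart do for you, except that Splunk does it across all of your indexed logs at once.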
There is just one big downside to all this awesomeness: Splunk isn't free. You can use it for free for about 500 MB of logs per day (the free version is a bit more limited than the Enterprise version); if you go over that amount you will have to dig deep into your pockets. For pricing, have a look here.
Nagios integration
There is a plugin for Nagios, and it is pretty easy to integrate with Splunk: just go to Apps -> Find more apps and search for "Splunk for Nagios". Install it and restart Splunk. You will need to add some settings to your nagios.cfg:
Configuring Nagios
    perfdata_timeout=5
    process_performance_data=1
    host_perfdata_command=nagios-process-host-perfdata
    service_perfdata_command=nagios-process-service-perfdata
    host_perfdata_file_mode=a
    service_perfdata_file_mode=a
    host_perfdata_file_processing_interval=86400
    service_perfdata_file_processing_interval=86400
    host_perfdata_file_processing_command=nagios-process-host-perfdata-file
    service_perfdata_file_processing_command=nagios-process-service-perfdata-file
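Before editing a live configuration it can help to see which of these settings are still missing. The following is a minimal sketch, assuming nagios.cfg uses plain key=value lines with # comments; the helper names and the inline example text are hypothetical, and only a few of the settings above are checked.

```python
# A few of the settings listed above; extend the dict as needed.
REQUIRED = {
    "process_performance_data": "1",
    "host_perfdata_command": "nagios-process-host-perfdata",
    "service_perfdata_command": "nagios-process-service-perfdata",
}

def parse_nagios_cfg(text):
    """Collect key=value pairs, skipping blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings

def missing_settings(text, required=REQUIRED):
    """Return the required keys that are absent or have a different value."""
    settings = parse_nagios_cfg(text)
    return sorted(k for k, v in required.items() if settings.get(k) != v)

example = """
# excerpt of a nagios.cfg
process_performance_data=1
host_perfdata_command=nagios-process-host-perfdata
"""
print(missing_settings(example))
```

For a real check you would pass the contents of your own nagios.cfg instead of the inline example.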
Add these lines to your commands.cfg (check first that they are not already present):
    # 'nagios-process-host-perfdata' command definition
    define command {
        command_name nagios-process-host-perfdata
        command_line /usr/bin/printf "%b" "$TIMET$ src_host=\"$HOSTNAME$\" perfdata=\"HOSTPERFDATA\" hoststate=\"$HOSTSTATE$\" attempt=\"$HOSTATTEMPT$\" statetype=\"$HOSTSTATETYPE$\" executiontime=\"$HOSTEXECUTIONTIME$\" reason=\"$HOSTOUTPUT$\" result=\"$HOSTPERFDATA$\"\n" >> /opt/nagios/var/host-perfdata
    }

    # 'nagios-process-service-perfdata' command definition
    define command {
        command_name nagios-process-service-perfdata
        command_line /usr/bin/printf "%b" "$TIMET$ src_host=\"$HOSTNAME$\" perfdata=\"SERVICEPERFDATA\" name=\"$SERVICEDESC$\" severity=\"$SERVICESTATE$\" attempt=\"$SERVICEATTEMPT$\" statetype=\"$SERVICESTATETYPE$\" executiontime=\"$SERVICEEXECUTIONTIME$\" latency=\"$SERVICELATENCY$\" reason=\"$SERVICEOUTPUT$\" result=\"$SERVICEPERFDATA$\"\n" >> /opt/nagios/var/service-perfdata
    }

    # 'nagios-process-host-perfdata-file' command definition
    define command {
        command_name nagios-process-host-perfdata-file
        command_line /bin/cat /dev/null > /opt/nagios/var/host-perfdata
    }

    # 'nagios-process-service-perfdata-file' command definition
    define command {
        command_name nagios-process-service-perfdata-file
        command_line /bin/cat /dev/null > /opt/nagios/var/service-perfdata
    }
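The printf commands above write one line per check: an epoch timestamp followed by key="value" pairs. As a rough illustration of what ends up in the perfdata files, here is a small Python sketch that splits such a line into fields. The sample line and its values are invented, and this is not how Splunk itself extracts the fields; values containing a double quote would need a more careful parser.

```python
import re

# Invented sample in the shape written by nagios-process-service-perfdata.
SAMPLE = (
    '1340000000 src_host="web01" perfdata="SERVICEPERFDATA" '
    'name="CPU Usage" severity="OK" attempt="1" statetype="HARD" '
    'executiontime="0.012" latency="0.101" reason="CPU OK" '
    'result="cpu=3%;20;10"'
)

# Matches key="value" where the value contains no double quote.
KV_PATTERN = re.compile(r'(\w+)="([^"]*)"')

def parse_perfdata_line(line):
    """Return (epoch_timestamp, fields) for one perfdata line."""
    timestamp, _, rest = line.partition(" ")
    fields = dict(KV_PATTERN.findall(rest))
    return int(timestamp), fields

timestamp, fields = parse_perfdata_line(SAMPLE)
print(timestamp, fields["name"], fields["result"])
```

Running a parser like this against a few lines of your own host-perfdata or service-perfdata file is a quick way to confirm that Nagios writes what you expect before pointing Splunk at it.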
The /opt/nagios/var/* paths should be replaced with the location where your Nagios logs are stored. For me (Ubuntu 12.04 LTS) this is /var/log/nagios3/, for example /var/log/nagios3/service-perfdata. Now add this parameter to your templates.cfg:

    process_perf_data 1

Now reload your Nagios server with /etc/init.d/nagios reload.
Configuring Splunk
If Nagios is not running on the same machine as Splunk, the official tutorial has you rsync the log files over every 5 minutes. I personally feel 5 minutes can be a bit of a delay if you want to use real-time data, so in my opinion it is preferable to set up a Splunk Universal Forwarder.
The next step to add sources is almost the same as you find on the Splunk for Nagios configuration page.
- a/ nagios.log:
  - Click Manager > Data inputs > Files & Directories > New
  - Specify the source: Continuously index data from a file or directory this Splunk instance can access
  - Full path to your data: eg. /log/nagios/nagios.log
  - Tick More settings
  - Set host: constant value
  - Host field value: eg. hostname.abc.com.au
  - Set the source type: Manual
  - Source type: nagios
  - Index: nagios
  - Click Save
- b/ host-perfdata:
  - Click Manager > Data inputs > Files & Directories > New
  - Specify the source: Continuously index data from a file or directory this Splunk instance can access
  - Full path to your data: eg. /log/nagios/host-perfdata
  - Tick More settings
  - Set host: constant value
  - Host field value: eg. hostname.abc.com.au
  - Set the source type: Manual
  - Source type: nagioshostperf
  - Index: nagios
  - Click Save
- c/ service-perfdata:
  - Click Manager > Data inputs > Files & Directories > New
  - Specify the source: Continuously index data from a file or directory this Splunk instance can access
  - Full path to your data: eg. /log/nagios/service-perfdata
  - Tick More settings
  - Set host: constant value
  - Host field value: eg. hostname.abc.com.au
  - Set the source type: Manual
  - Source type: nagiosserviceperf
  - Index: nagios
  - Click Save
These UI steps are not available when using the forwarder. If you want to use the Universal Forwarder, create the index on your Splunk server and define the monitors in the forwarder's inputs.conf. When defining the monitor for each file you need to make sure you are using the correct index and sourcetype names:
    [monitor:///var/log/nagios3/nagios.log]
    index=nagios
    sourcetype=nagios

    [monitor:///var/log/nagios3/host-perfdata]
    index=nagios
    sourcetype=nagioshostperf

    [monitor:///var/log/nagios3/service-perfdata]
    index=nagios
    sourcetype=nagiosserviceperf
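Because a misspelled sourcetype fails silently (the data is indexed, but the app does not find it), a quick sanity check of the monitor stanzas can save some head-scratching. The sketch below uses Python's standard configparser and assumes the app expects the index and sourcetype names from the UI steps above (nagios, nagioshostperf, nagiosserviceperf); the check_inputs helper is hypothetical and the sample deliberately contains one misspelled sourcetype.

```python
import configparser

# Names taken from the UI steps earlier in this post.
EXPECTED_SOURCETYPES = {"nagios", "nagioshostperf", "nagiosserviceperf"}
EXPECTED_INDEX = "nagios"

# Sample stanzas; the second one has a deliberately wrong sourcetype.
INPUTS_CONF = """
[monitor:///var/log/nagios3/nagios.log]
index=nagios
sourcetype=nagios

[monitor:///var/log/nagios3/host-perfdata]
index=nagios
sourcetype=hostperfadata
"""

def check_inputs(text):
    """Return (stanza, setting, value) tuples for unexpected names."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    problems = []
    for section in parser.sections():
        if not section.startswith("monitor://"):
            continue
        sourcetype = parser.get(section, "sourcetype", fallback=None)
        index = parser.get(section, "index", fallback=None)
        if sourcetype not in EXPECTED_SOURCETYPES:
            problems.append((section, "sourcetype", sourcetype))
        if index != EXPECTED_INDEX:
            problems.append((section, "index", index))
    return problems

for problem in check_inputs(INPUTS_CONF):
    print(problem)
```

Note that configparser is only an approximation of Splunk's .conf syntax; it is good enough for simple stanzas like these.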
Configuring performance graphs
When you open the performance graphs via Apps -> Splunk for Nagios -> Performance Data -> Nagios Linux Performance Data (select a host from the drop-down menu in the left corner), you will normally see some of the graphs. If for some reason you do not see CPU or memory performance data, you will need to change the description of the corresponding check in your commands.cfg on your Nagios server, or in the Splunk script. The supported plugins are:
You can use these scripts with NRPE. The important part here is to use the correct service_description when defining a service; if you do not, Splunk for Nagios will not detect the data. My services are configured like this:
    define service{
        use openstack-service
        hostgroup_name openstack
        service_description CPU Usage
        check_command timeout_nrpe!check_cpu_perf!20
    }

    define service{
        use openstack-service
        hostgroup_name openstack
        service_description Memory Usage
        check_command timeout_nrpe!check_mem!20
    }
I created a new command for the NRPE calls because I saw that Nagios did not wait long enough for the remote scripts to finish (/etc/commands.cfg):
    define command {
        command_name timeout_nrpe
        command_line /usr/lib/nagios/plugins/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ -t $ARG2$
    }
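To see how a check_command such as timeout_nrpe!check_cpu_perf!20 maps onto this command line: Nagios splits the value on "!", uses the first part as the command name and passes the remaining parts as $ARG1$, $ARG2$ and so on. The Python sketch below imitates that substitution for illustration only; the function name and the host address 10.0.0.5 are made up, and real Nagios supports many more macros.

```python
COMMAND_LINE = (
    "/usr/lib/nagios/plugins/check_nrpe "
    "-H $HOSTADDRESS$ -c $ARG1$ -t $ARG2$"
)

def expand_check_command(check_command, command_line, host_address):
    """Imitate Nagios' $HOSTADDRESS$ / $ARGn$ substitution for one check."""
    name, *args = check_command.split("!")
    expanded = command_line.replace("$HOSTADDRESS$", host_address)
    for position, value in enumerate(args, start=1):
        expanded = expanded.replace(f"$ARG{position}$", value)
    return name, expanded

name, expanded = expand_check_command(
    "timeout_nrpe!check_cpu_perf!20", COMMAND_LINE, "10.0.0.5"
)
print(name)
print(expanded)
# prints:
# timeout_nrpe
# /usr/lib/nagios/plugins/check_nrpe -H 10.0.0.5 -c check_cpu_perf -t 20
```

So the trailing 20 in the service definitions is the value handed to check_nrpe's -t option, which is what gives the remote scripts more time to answer.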
If you want to use this approach you will need to download the scripts to every host. Then define a command for each script on every host in /etc/nagios/nrpe.cfg:

    command[check_cpu_perf]=/usr/lib/nagios/plugins/check_cpu_perf.sh 20 10
    command[check_mem]=/usr/lib/nagios/plugins/check_mem -w 10 -c 5 -f
OSSEC Integration
Integrating OSSEC with Splunk is very easy: just go to Apps and search for "Splunk for OSSEC". Install it, restart Splunk and off you go. If your OSSEC server is running on a separate machine you will need to configure syslog_output on OSSEC.
- Inside ossec.conf add a syslog_output block specifying your Splunk system IP address and the port it is listening on:

      <syslog_output>
        <server>172.10.2.3</server>
        <port>10002</port>
      </syslog_output>

- Now you need to enable the syslog_output module and restart OSSEC:

      # /var/ossec/bin/ossec-control enable client-syslog
      # /var/ossec/bin/ossec-control restart

- On the Splunk side, add this stanza to inputs.conf:

      $SPLUNK_HOME/etc/system/local/inputs.conf

      [udp://172.10.2.4:10002]
      #IP address of OSSEC server
      disabled = false
      sourcetype = ossec
By setting the sourcetype to ossec you're ready to take advantage of the Splunk for OSSEC app, which can be found here.
Make sure you update any local or network firewalls that this communication is traversing and then restart Splunk.
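When checking the network path it can be handy to push a single test datagram at the Splunk UDP input by hand. Here is a minimal Python sketch, assuming the same UDP transport as the [udp://...] stanza above; the function names and the test message are made up. The loopback_check helper only demonstrates the send/receive round trip on localhost, so it can be run anywhere.

```python
import socket

def send_test_event(host, port, message):
    """Send one UDP datagram and return the number of bytes sent."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        return sender.sendto(message.encode("utf-8"), (host, port))

def loopback_check(message="ossec-to-splunk connectivity test"):
    """Open a temporary UDP listener on localhost and send the message to it."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as listener:
        listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        listener.settimeout(2.0)
        port = listener.getsockname()[1]
        send_test_event("127.0.0.1", port, message)
        data, _addr = listener.recvfrom(4096)
        return data.decode("utf-8")

if __name__ == "__main__":
    print(loopback_check())
    # Against the setup in this post you would instead run, on the OSSEC server:
    # send_test_event("172.10.2.3", 10002, "test event")
```

Keep in mind that UDP is fire-and-forget: a successful send does not prove the datagram arrived, so the real confirmation is finding the test event in a Splunk search.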
Remote Agent Management
Now you can add all your OSSEC agents to Splunk. This means the Splunk service account can log into your OSSEC server and run commands, so be careful: this has security implications for your environment. The following guide was made by southeringtonp.
- Remote Access Configuration: First, you will need to make sure that the Splunk server can log into the OSSEC server to run management commands.
  - On the OSSEC server, create a new login account for the Splunk server to use when connecting.

        root@ossec_server$ useradd splunk

  - On the Splunk server, create an SSH keypair for the root user (or whichever account splunkd is running as), and copy the public key to the OSSEC server.

        user@splunk_server$ sudo su -
        root@splunk_server# ssh-keygen
        root@splunk_server# scp .ssh/id_rsa.pub splunk@ossec_server:authorized_keys

  - On the OSSEC server, log in as the splunk account and configure the authorized_keys file to allow SSH logins without a password:

        splunk@ossecserver$ mkdir .ssh
        splunk@ossecserver$ mv authorized_keys .ssh/
        splunk@ossecserver$ chmod -R go-rwx .ssh

  - Verify that the Splunk server can log into the OSSEC server without a password prompt. You MUST do this at least once and say `yes` to the SSH key prompt. The second run should not prompt.

        root@splunkserver# ssh splunk@ossec_server
        splunk@ossecserver$ exit
        root@splunkserver# ssh splunk@ossec_server
        splunk@ossecserver$ exit

  - On the OSSEC server, configure `sudo` to allow the splunk login account to run agent management commands without prompting.

        root@ossecserver# /usr/sbin/visudo

    (Add the following two lines):

        splunk ALL=NOPASSWD: /var/ossec/bin/agent_control -l
        splunk ALL=NOPASSWD: /var/ossec/bin/manage_agents

  - On the OSSEC server, verify that the new `splunk` account can run the agent management commands without prompting. If either of the following commands prompts for a password, you may have made a mistake in the previous step:

        splunk@ossec_server$ sudo /var/ossec/bin/agent_control -l
        splunk@ossec_server$ sudo /var/ossec/bin/manage_agents

  - On the Splunk server, verify that you can remotely run the commands without a password:

        root@splunk-server$ ssh ossec-server -t -l splunk sudo /var/ossec/bin/agent_control -l
        root@splunk-server$ ssh ossec-server -t -l splunk sudo /var/ossec/bin/manage_agents
- App Configuration:
  - All of the following steps are performed on the Splunk server.
  - Check to see if you already have a local copy of ossec_servers.conf:

        root@splunk_server# cd /opt/splunk/etc/apps/ossec
        root@splunk_server# ls -l local

  - Create the `local` directory and `ossec_servers.conf` file if they are missing:

        root@splunk_server# mkdir local
        root@splunk_server# cp default/ossec_servers.conf local/

  - Edit `local/ossec_servers.conf` and disable the local machine if you do not have an OSSEC server on the local machine.

        [_local]
        DISABLED = True

  - In `local/ossec_servers.conf`, add your new server (if your SSH key is in the default path, the '-i' parameter used in some examples is not required):

        [ossec_server]
        AGENT_CONTROL = ssh ossec-server -t -l splunk sudo /var/ossec/bin/agent_control
        MANAGE_AGENTS = ssh ossec-server -t -l splunk sudo /var/ossec/bin/manage_agents

  - Restart Splunkd

        /opt/splunk/bin/splunk restart splunkd
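To double-check what the ossec_servers.conf entries amount to, the following Python sketch reads such a file, skips disabled servers, and turns each AGENT_CONTROL string into an argument list with -l appended (the same command that was verified by hand above). This is only an illustration: how the Splunk for OSSEC app actually reads the file and runs the command is not covered in this post, and the helper name is hypothetical.

```python
import configparser
import shlex

# The two stanzas from the steps above.
CONF = """
[_local]
DISABLED = True

[ossec_server]
AGENT_CONTROL = ssh ossec-server -t -l splunk sudo /var/ossec/bin/agent_control
MANAGE_AGENTS = ssh ossec-server -t -l splunk sudo /var/ossec/bin/manage_agents
"""

def enabled_agent_control_commands(text):
    """Map each enabled server stanza to its 'agent_control -l' argument list."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names such as AGENT_CONTROL as written
    parser.read_string(text)
    commands = {}
    for section in parser.sections():
        if parser.getboolean(section, "DISABLED", fallback=False):
            continue
        command = parser.get(section, "AGENT_CONTROL", fallback=None)
        if command:
            commands[section] = shlex.split(command) + ["-l"]
    return commands

for server, argv in enabled_agent_control_commands(CONF).items():
    print(server, argv)
```

The resulting list could be handed to subprocess.run to repeat the manual SSH test from a script, running as the same account as splunkd.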
Final Word
I only worked with Splunk for a few weeks to see how it works and what it can do, and I really am quite impressed with this piece of software: you can feed it tons of data and easily extract the relevant information. I think this can significantly increase an administrator's productivity when troubleshooting a problem.