Splunk and POSIX capabilities

I seem to catch myself talking about this a lot in Slack, so I’m just going to write it all down here and refer people to it.

A common issue for Splunk deployments is how to securely deploy the Universal Forwarder. Best practice says “don’t run anything as root that doesn’t need to”, but there’s a counterargument: maintaining filesystem permissions so that an unprivileged process can read “all the log files” on a given system is hard.

The Unix permissions model gets a little hairy here. You wind up either making the splunk user a member of a bunch of different groups and enforcing that all of the log files have the group-read bit (oh, and that every directory from the root down has either g+rx or o+rx), or you dive into the abyss of setfacl for each file that you want Splunk to read. One challenge is that permissions like these tend to be highly customized to a single server or a small group of servers running the same workload, which makes them hard to maintain across disparate apps. The apps change, the way an app handles log rotation changes, and one way or another the permissions eventually change too. You have to stay on top of it.
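To make the "every directory from the root down" point concrete, here is a rough Python sketch that walks a path and reports the first component that the classic owner/group/other bits would block for a given user. The function names (mode_allows, first_blocker) are made up for this post, and it deliberately ignores ACLs, capabilities and root's special treatment, so treat it as an illustration rather than an audit tool.

```python
import os
import stat


def mode_allows(mode, owner_uid, owner_gid, uid, gids, want):
    """Classic Unix check for one path component.

    want is "r" (read a file) or "x" (search a directory).
    Ignores ACLs, capabilities and the root user on purpose.
    """
    if uid == owner_uid:
        # Owner class wins, even if group/other bits are more generous.
        bit = stat.S_IRUSR if want == "r" else stat.S_IXUSR
    elif owner_gid in gids:
        bit = stat.S_IRGRP if want == "r" else stat.S_IXGRP
    else:
        bit = stat.S_IROTH if want == "r" else stat.S_IXOTH
    return bool(mode & bit)


def first_blocker(path, uid, gids):
    """Return the first component of path that would deny uid, or None."""
    path = os.path.abspath(path)
    # Build "/", "/var", "/var/log", ... up to the file itself.
    components = [os.sep]
    for part in path.strip(os.sep).split(os.sep):
        if part:
            components.append(os.path.join(components[-1], part))
    for index, component in enumerate(components):
        st = os.stat(component)  # may itself fail if *we* cannot traverse
        is_last = index == len(components) - 1
        # Directories on the way need search (x); the file itself needs read.
        want = "r" if is_last else "x"
        if not mode_allows(st.st_mode, st.st_uid, st.st_gid, uid, gids, want):
            return component
    return None
```

You would call it with the forwarder account's uid and group ids, something like first_blocker("/var/log/myapp/app.log", splunk_uid, splunk_gids), where the path and the two variables are placeholders. The fact that a single chmod anywhere along that chain can flip the answer is exactly why this is painful to maintain.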

Consider the case where we are collecting logs for security monitoring. In this use case, messed-up permissions equal a loss of visibility. You don’t want to lose visibility, but you also don’t want to run as root. So you’re left with two honestly poor choices. Either you:

  • Run as root, be guaranteed you’ll always be able to read all the logs and won’t ever have a loss of visibility for your security monitoring.  You accept that a compromise of splunkd gives the attacker root access.

Or

  • Run Splunk as an unprivileged user and configure filesystem access to allow this unprivileged user to see the files it needs to see.  You accept that any permissions errors result in a loss of visibility that could give an attacker the ability to exist in your environment undetected.

 

I tried to find a third way.  Unfortunately, it doesn’t work.  If you were looking for a great solution to this, I’ve let you read this far only to let you down.  Sorry.  Unless you’re on Solaris, but I’ll get to that in a minute!

The third way I tried to make work was POSIX capabilities. The idea of capabilities is “let’s take all of the things that make root, well… root, and export them as granular items.” For example, “only root can make a process listen on a port < 1024” becomes CAP_NET_BIND_SERVICE. Only a process with CAP_NET_BIND_SERVICE can listen on a port < 1024. Or, “only root can kill a process owned by another user” becomes CAP_KILL. Only a process with CAP_KILL can send signals to a process owned by any user. In Linux, file capabilities are assigned to a binary on disk. So you could run setcap cap_net_bind_service+ep /usr/local/bin/myprogram, and any instance of myprogram started by any user should be able to listen on a port < 1024.
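If you want to see which capabilities a running process actually ended up with, Linux exposes them as hex bitmasks in /proc/<pid>/status (the CapEff line is the effective set). Here is a small Python sketch that decodes such a mask. The bit numbers come from the kernel's capability header, I've only listed the few that matter for this post, and the function names are my own invention.

```python
# Bit positions as defined in <linux/capability.h>; this is a partial list.
CAP_BITS = {
    0: "CAP_CHOWN",
    1: "CAP_DAC_OVERRIDE",
    2: "CAP_DAC_READ_SEARCH",
    5: "CAP_KILL",
    10: "CAP_NET_BIND_SERVICE",
}


def decode_caps(hex_mask):
    """Turn a hex capability mask (as shown in /proc) into a list of names."""
    mask = int(hex_mask, 16)
    names = []
    bit = 0
    while mask >> bit:
        if (mask >> bit) & 1:
            # Bits missing from the partial table are reported by number.
            names.append(CAP_BITS.get(bit, f"bit {bit}"))
        bit += 1
    return names


def effective_caps(status_text):
    """Find the CapEff line in the text of /proc/<pid>/status and decode it."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return decode_caps(line.split()[1])
    return []


if __name__ == "__main__":
    # Linux only: show the effective capabilities of this Python process.
    with open("/proc/self/status") as handle:
        print(effective_caps(handle.read()))
```

For an ordinary unprivileged process the list should come back empty; for a binary that had a capability applied with setcap, you'd expect to see that capability's name show up.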

The capability that a Splunk UF really needs is called CAP_DAC_READ_SEARCH.  A program granted CAP_DAC_READ_SEARCH can read any file on disk without permissions checks, just like root.  It cannot change them, but it can read them.  It’s like a capability purpose-built for a log collection agent.
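A quick way to sanity-check the "read but not change" behaviour is to probe a file for read and write access separately. The Python sketch below just tries to open a path both ways and reports what happened; probe_access is a name I made up, and it is only meant for regular files (opening something like a FIFO this way can block).

```python
import os


def probe_access(path):
    """Try opening path read-only and write-only; report which ones succeed.

    Nothing is read or written: the file is opened and closed straight away.
    """
    result = {}
    for label, flags in (("read", os.O_RDONLY), ("write", os.O_WRONLY)):
        try:
            fd = os.open(path, flags)
        except OSError:
            result[label] = False
        else:
            os.close(fd)
            result[label] = True
    return result
```

Going by the description above, a non-root process that really holds CAP_DAC_READ_SEARCH should report read as True and write as False for a root-owned file with mode 0600, whereas the same process without the capability should report False for both. I haven't included output here because it depends entirely on how the process was launched.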

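I did some testing of this on both Solaris and Linux. Good news for Solaris people: if you launch Splunk using an SMF manifest, then SMF can pass the equivalent of CAP_DAC_READ_SEARCH onward to splunkd (Solaris calls these privileges rather than capabilities, and the read-side ones are named file_dac_read and file_dac_search), and it works great.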

For Linux folks, the news is not as good. There are a couple of open issues with Splunk, SPL-115155 and SPL-112588, that cover known problems with making Splunk work properly with CAP_DAC_READ_SEARCH. If you have a support contract and would like to run Splunk as a non-root user, using CAP_DAC_READ_SEARCH so it can read log files without you having to set and maintain granular permissions, then you should open a case asking for it.

 

There!  Now I can just link people here when this comes up.  Thanks for reading.