Splunk 7.2.2 and systemd

Consider this a draft.  I’ll update it as I have time, but I’m posting now because it may help someone.

Splunk 7.2.2 brought along new features (which previously didn’t happen in a “maintenance release”, but that’s another topic for another time). One of the new features is “systemd support”. It didn’t take long before folks were on Splunk Answers wondering where their cheese had been moved to. Some workarounds were provided, some of which work in some cases but not others. So, @automine and I dug into it a little more late today. (Not done yet, though.)

When Splunk 7.2.2 is installed on a systemd-compatible system and you use splunk enable boot-start to create the systemd unit file, the Splunk CLI changes its mode of operation for the start, stop, and restart commands.   Specifically, it passes them through as calls to systemctl.  Below is a snippet of an strace capture of me running splunk stop as the splunk user.

29384 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fc4f84fb000
29384 write(1, "Stopping splunkd...\n", 20) = 20
29384 write(1, "Shutting down.  Please wait, as "..., 61) = 61
29384 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fc4f84f2a10) = 29417
29384 wait4(29417,  <unfinished ...>
29417 set_robust_list(0x7fc4f84f2a20, 24) = 0
29417 execve("/opt/splunk/bin/systemctl", ["systemctl", "stop", "Splunkd"], [/* 30 vars */]) = -1 ENOENT (No such file or directory)
29417 execve("/usr/local/bin/systemctl", ["systemctl", "stop", "Splunkd"], [/* 30 vars */]) = -1 ENOENT (No such file or directory)
29417 execve("/bin/systemctl", ["systemctl", "stop", "Splunkd"], [/* 30 vars */]) = 0
29417 brk(NULL)                         = 0x55c9c4485000

 

We see it fork a new child and exec “systemctl stop Splunkd”. Notice there is no call to sudo or anything similar here. In a lot of the customer environments I see and work in, the “Splunk Team” and the “OS team” sit on opposite sides of an organizational wall. In Splunk 7.2.1, you could easily use the splunk user as a service account and issue stop/start/restart commands to your heart’s content, and it mostly just worked. In 7.2.2, those commands no longer work for an unprivileged splunk user, because Splunk MUST ask systemd to handle the stops and starts for it, so that systemd knows what is happening and can do process restarts and so forth.
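If you want to check this on your own box, a small filter pulls the successful exec out of a capture like the one above. This is only a sketch: exec_hits is a made-up helper name, the trace path is a placeholder, and the strace invocation is left commented out so that nothing gets stopped by accident.

```shell
#!/bin/sh
# Hypothetical helper: print only the execve() calls in an strace log
# that succeeded (returned 0), skipping the ENOENT misses along $PATH.
exec_hits() {
    grep -E 'execve\(.*\) = 0$' "$1"
}

# Example capture (placeholder path; this really stops Splunk, so it is commented out):
#   strace -f -o /tmp/splunk-stop.trace /opt/splunk/bin/splunk stop
#   exec_hits /tmp/splunk-stop.trace
```

In a real capture the first hit will be the splunk binary itself; the systemctl line is the one to look for.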

One reasonable workaround here is adding sudo rules, and retraining the Splunk Team to use them.  Some sudo rules like these (courtesy of automine) make it possible for the splunk service account to issue the needful commands to systemd in order to stop/start/restart splunk:

splunk ALL=(root) NOPASSWD: /usr/bin/systemctl restart Splunkd.service
splunk ALL=(root) NOPASSWD: /usr/bin/systemctl stop Splunkd.service
splunk ALL=(root) NOPASSWD: /usr/bin/systemctl start Splunkd.service 
splunk ALL=(root) NOPASSWD: /usr/bin/systemctl status Splunkd.service

These don’t help without retraining though!  If your Splunk Admins continue to try to use the classic bin/splunk restart command that worked before, they will continue to be asked to authenticate as a wheel user each time.
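What the retraining amounts to can be sketched as a tiny shell function. Note that splunkctl and SPLUNKCTL_DRY_RUN are hypothetical names, not anything Splunk ships; the function just builds the same four commands the sudo rules above allow. Because those rules spell out the arguments, sudo only matches when the arguments are the same, so "Splunkd.service" needs its suffix, and using the path exactly as written in the rule is the safe choice.

```shell
#!/bin/sh
# Hypothetical wrapper: maps the familiar verbs onto the sudo-permitted
# systemctl calls. Set SPLUNKCTL_DRY_RUN=1 to print the command instead
# of running it.
splunkctl() {
    case "$1" in
        start|stop|restart|status)
            # ${VAR:+echo} expands to "echo" only when VAR is set and non-empty
            ${SPLUNKCTL_DRY_RUN:+echo} sudo /usr/bin/systemctl "$1" Splunkd.service
            ;;
        *)
            echo "usage: splunkctl {start|stop|restart|status}" >&2
            return 2
            ;;
    esac
}

# Usage:
#   splunkctl restart
```

This does nothing for anyone who keeps typing bin/splunk restart; it only gives the team a short command to retrain on.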

Another workaround, provided on Splunk Answers by twinspop, adds rules to polkit so that systemd allows the splunk user to make these calls without issue. In this way, the classic bin/splunk restart would be transparently proxied to systemctl restart Splunkd, and systemctl would say “oh cool I don’t have to authenticate for this” and it would just happen. Sadly, this workaround does not work on RHEL or CentOS (tested at 7.6) because the version of systemd there is too old to provide the context that the policy needs. Neither does it work on Ubuntu 18.04, because the version of polkit on 18.04 is (best I can tell) too old to support JavaScript polkit rules.

This workaround may work amazingly on other distributions; I’ve not tried them all yet.
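For reference, a polkit rule of this kind looks roughly like the one printed by the sketch below. It is written from memory of the polkit rule format, not copied from twinspop's post. The helper names and the 10-splunkd.rules file name are made up, and the 226 threshold is my understanding of when systemd started passing the unit name to polkit, so check the release notes for your distribution before relying on it.

```shell
#!/bin/sh
# Sketch only; both function names are hypothetical.

# Print a polkit rule that lets the splunk user manage Splunkd.service.
# Needs a polkit that supports JavaScript rules and a systemd that
# exposes the unit name via action.lookup("unit").
print_splunkd_rule() {
    cat <<'EOF'
polkit.addRule(function(action, subject) {
    if (action.id == "org.freedesktop.systemd1.manage-units" &&
        action.lookup("unit") == "Splunkd.service" &&
        subject.user == "splunk") {
        return polkit.Result.YES;
    }
});
EOF
}

# Succeed if the given systemd version number is new enough.
# ASSUMPTION: 226 is the first version that passes unit details to polkit.
systemd_has_unit_details() {
    [ "$1" -ge 226 ]
}

# Usage (as root):
#   ver=$(systemctl --version | awk 'NR==1 {print $2}')
#   systemd_has_unit_details "$ver" &&
#       print_splunkd_rule > /etc/polkit-1/rules.d/10-splunkd.rules
```

On a system where the version check fails, the rule would never see a unit name to match, which fits the RHEL/CentOS behavior described above.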

 

Things you can do

  1. Use the sudo rules and retrain yourself to always use systemctl to manage your Splunk processes
  2. Harass Splunk to add a capability to have their behind-the-scenes calls to systemctl be prefaced with sudo
  3. Harass Red Hat to backport the needful systemd chunks to the RHEL version of systemd
  4. Harass Ubuntu to adopt a more modern polkit
  5. Use some other Linux distribution
  6. Stay on the Splunk 7.1 release train for the foreseeable future

I would not advise upgrading to 7.2.0 or 7.2.1 and “parking” there. Any future 7.2 maintenance release is going to have this in it (unless Splunk takes it out further down the road, and I hope they don’t).