Proving a Negative

I’ve got this Foo Fighters lyric stuck in my head …

All my life I’ve been searching for something.  Something never comes, never leads to nothing.

This seems relevant, given my career focus on search technologies.  Today, I’m going to talk about proving a negative.  That is, I’m going to talk about searching for something that does not exist.  This is a problem that seems to come up all the time – how do I find the thing that didn’t happen?  In the Splunk Usergroups Slack or on Splunk Answers, it usually shows up disguised as “find my missing <X>”, where <X> is “host”, “server”, “application”, or something similar.

George and I talked about this a long long time ago at a Splunk conference in the context of a lookup talk.  At the time we called it a “sentinel lookup”, but the term really didn’t catch on anywhere.  I’m going to revisit that approach, and maybe improve on it a little.

Last night, I set up a new install of Splunk Enterprise 8.0.0.0, along with Clint’s excellent gogen tool.  I used an out-of-the-box gogen configuration to send in logs from my 5 example webservers.  I can see this using the search:

earliest=-1d index=main sourcetype=access_combined | stats count by host

Oh dear .. one of my webservers is missing.  I can see this, just looking at it – but what if I had 5,000 webservers?  Scrolling through that list to eyeball the missing one would take some effort.

Why don’t I have a line for web-04.bar.com with a count of zero?

Because Splunk CANNOT USE SEARCH TO FIND WHAT DOES NOT EXIST.  This is important.  There are no events for web-04.bar.com, so no search over the events can return it.  We need an enumeration of all possible webservers in order to identify the ones that did not come out in our search results.

What is the best way of making an enumeration?  I believe it’s a lookup.  So let’s make a lookup file, in /opt/splunk/etc/apps/search/lookups/webservers.csv:

host
web-01.bar.com
web-02.bar.com
web-03.bar.com
web-04.bar.com
web-05.bar.com

Yes, it’s a CSV with exactly one column.  I could have made it more complex, but I didn’t.  We can test it with the inputlookup search command like so:

| inputlookup webservers.csv

There we go – all 5 are listed in my lookup.  Now let’s marry these two objects together in a search.

earliest=-1d index=main sourcetype=access_combined 
| stats count by host 
| inputlookup append=true webservers.csv 
| fillnull count 
| stats sum(count) as count by host

Hey, now I have a line for web-04.bar.com with a count of zero.  I can alert on that!  I threw this together without a lot of explanation, so let’s talk through it in pieces…

earliest=-1d index=main sourcetype=access_combined 
| stats count by host

This part you know all about.  We’re finding the events that do exist in the search results.

| inputlookup append=true webservers.csv
| fillnull count

Now we’re appending the contents of our enumeration lookup, then using fillnull to set the count column to 0 on the appended rows, which arrive without a count.  Now, every host in the lookup will exist at least once, with a zero count.  And some hosts – the ones where we have data in the search results – will have a second row with a count of their actual events seen in the data.

| stats sum(count) as count by host

Here we are taking a little advantage of basic elementary school math.  Anything plus zero is itself.  So we do a second stats to sum up the counts by host.  Now, we are left with only one row per value of host – it will either be the original count from the search results (for a host that was in the original search results), or it will be zero (for a host that was not).
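If the goal is an alert rather than a full report, one possible next step is to keep only the zero rows.  This is a minimal sketch that reuses the same index, sourcetype, and webservers.csv lookup from above; the only addition is the final where clause.

```
earliest=-1d index=main sourcetype=access_combined
| stats count by host
| inputlookup append=true webservers.csv
| fillnull count
| stats sum(count) as count by host
| where count=0
```

Saved as an alert that triggers when the number of results is greater than zero, this should fire only when some host in the lookup sent nothing in the time window.  Also note that the final stats keeps only host and count, so any extra columns in the lookup are simply dropped at that step.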

So it turns out it really is easy to search for something that doesn’t exist – you just have to know all of the possible values of what could or could not exist…

How do I make my lookup of all possible hosts, though?  Well, the easiest way of doing that is “know your environment”.  There is a reason why “Inventory of hardware assets” is CIS Control #1.
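If you don’t have a curated inventory yet, one possible starting point is to seed the lookup from hosts that Splunk has already seen, using outputlookup.  This is a sketch under assumptions: the 7 day window is arbitrary, and it assumes webservers.csv already exists (on a very first run, inputlookup will likely complain about the missing file, so drop the inputlookup and dedup lines that one time).

```
earliest=-7d index=main sourcetype=access_combined
| stats count by host
| fields host
| inputlookup append=true webservers.csv
| dedup host
| outputlookup webservers.csv
```

Run on a schedule, this keeps every host already in the lookup and adds any new host that shows up in the data.  The catch is the same one as before: a host that has never sent a single event will not be in the lookup either, so this complements a real inventory rather than replacing it.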

Until next time..

4 thoughts on “Proving a Negative”

  1. This is great. But as I add instances of many things to many servers, do I really want to log onto the splunk> server and modify the various csv files for all the things I’d want to know “went zero”? No.

    Instead of the csv, can I have splunk> find all the log files (matching a set of criteria) which existed in the past 7 days, and use that as the list to discover “Hey, this log used to be current, but it ain’t logging anymore!”?

    This way, the alert learns on its own over time of all new instances, AND THEN tugs my shirt-tail if one of them stops.

    • OH wow .. I guess I’m not getting notifications on comments here.. But yeah, sure, you can do that. Use a search w/ outputlookup to make your “tracking lookup” based on what data has already been seen. “Self learning” and all that. Sorry for the year later response!

  2. Thank you for documenting the solution to the problem.
    In my lookup table, there are multiple columns, thus, in your SPL can you please suggest how do I read only from column host?
