Effect of kernel filesystem caching on Splunk performance

Unlike a traditional relational DBMS, Splunk does not use an in-process buffering or caching mechanism.  That is to say, there is no such thing as an SGA for the Oracle folks, and DB2 DBAs may be disappointed to find there’s no bufferpool.  Instead, Splunk relies on the operating system’s native file caching to cache its data.
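As a quick aside (my own sketch, not from the original post), you can watch the page cache at work with nothing more than a scratch file and two timed reads. The file name and size here are arbitrary; to see a real difference between the two reads you’d want to drop the page cache first (as the post does later with /proc/sys/vm/drop_caches), since a freshly written file is already cached.

```shell
# Sketch: time two reads of the same file. After a cache drop, the first
# read comes off disk and the second is normally served from the page cache.
f=$(mktemp)                                    # arbitrary scratch file
dd if=/dev/zero of="$f" bs=1M count=64 2>/dev/null
t1=$( { time -p cat "$f" >/dev/null; } 2>&1 | awk '/^real/ {print $2}' )
t2=$( { time -p cat "$f" >/dev/null; } 2>&1 | awk '/^real/ {print $2}' )
echo "first read: ${t1}s  second read: ${t2}s"
rm -f "$f"
```

The same idea, scaled up to Splunk bucket files, is exactly what the SystemTap measurements below quantify.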

This can make it harder to judge how much an indexer’s memory is contributing to search performance.  Fortunately, there are some very nice tools that can surface more information.  One of these is a SystemTap script, https://sourceware.org/systemtap/wiki/WSCacheHitRate.  It gives us visibility into the Linux kernel’s VFS layer, showing how often the kernel can satisfy reads from the page cache versus having to issue I/O against the actual block device.  I made a three- or four-line change to the script from the SystemTap site to add a timestamp to each output line, but that’s all.

So let’s look at an example of a very dense search:

index=* splunk_server=splunkidx01.myplace.com | stats count

I’m running this from my search head and limiting it to a single indexer so that, while the search runs, I can accurately measure overall cache effectiveness and CPU usage on that one host. I’ll manually finalize the search after approximately 10,000,000 events. But before we start, let’s drop the kernel’s cache and confirm it worked:

[dwaddle@splunkidx01 ~]$ sudo -i bash -c "echo 1 > /proc/sys/vm/drop_caches"
[sudo] password for dwaddle: 
[dwaddle@splunkidx01 ~]$ free -g
             total       used       free     shared    buffers     cached
Mem:            94          2         91          0          0          0
-/+ buffers/cache:          2         92
Swap:            1          0          1
[dwaddle@splunkidx01 ~]$

Now we can run our search in one window, with the SystemTap script running in a second and a top command in a third.  When the search has finished, we have (thanks, Search Inspector!):

This search has completed and has returned 1 result by scanning 10,272,984 events in 116.506 seconds.
The following messages were returned by the search subsystem:

DEBUG: Disabling timeline and fields picker for reporting search due to adhoc_search_level=smart
DEBUG: [splunkidx01.myplace.com] search context: user="dwaddle", app="search", bs-pathname="/opt/splunk/var/run/searchpeers/searchhead.myplace.com-1397608437"
DEBUG: base lispy: [ AND index::* splunk_server::splunkidx01.myplace.com ]
DEBUG: search context: user="dwaddle", app="search", bs-pathname="/opt/splunk/etc"
INFO: Search finalized.

A little math says we scanned about 88,000 events per second. Running the same search again immediately afterward shows slightly improved performance in events scanned per second.

This search has completed and has returned 1 result by scanning 10,194,402 events in 101.391 seconds.

The following messages were returned by the search subsystem:

DEBUG: Disabling timeline and fields picker for reporting search due to adhoc_search_level=smart
DEBUG: [splunkidx01.myplace.com] search context: user="dwaddle", app="search", bs-pathname="/opt/splunk/var/run/searchpeers/searchhead.myplace.com-1397608920"
DEBUG: base lispy: [ AND index::* splunk_server::splunkidx01.myplace.com ]
DEBUG: search context: user="dwaddle", app="search", bs-pathname="/opt/splunk/etc"
INFO: Search finalized.

Now we’re up closer to 100,000 events scanned per second. So the cache helped, but not as much as you might expect given the speed difference between memory and disk. If I look at the output of my two other data captures, we see moderately higher CPU usage during the second (cached) search:
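To check the back-of-the-envelope numbers, here’s the arithmetic, using the event counts and durations taken straight from the two Search Inspector results above:

```shell
# Events scanned per second for each run, and the relative speedup.
awk 'BEGIN {
    r1 = 10272984 / 116.506     # first run: cold cache
    r2 = 10194402 / 101.391     # second run: warm cache
    printf "run 1: %.0f events/sec\n", r1
    printf "run 2: %.0f events/sec\n", r2
    printf "speedup: %.2fx\n", r2 / r1
}'
```

That works out to roughly 88K and 100K events per second, a speedup of only about 1.14x from a near-perfect cache hit rate.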

First search:

21994 splunk    20   0  276m 115m 8260 S 90.4  0.1   1:33.54 [splunkd pid=2268] search 
21994 splunk    20   0  276m 115m 8260 R 89.5  0.1   1:36.24 [splunkd pid=2268] search 
21994 splunk    20   0  292m 121m 8260 R 82.6  0.1   1:38.73 [splunkd pid=2268] search 
21994 splunk    20   0  292m 120m 8260 S 92.1  0.1   1:41.51 [splunkd pid=2268] search

Second search:

 1087 splunk    20   0  280m 115m 8260 R 99.0  0.1   1:31.59 [splunkd pid=2268] search 
 1087 splunk    20   0  280m 118m 8260 R 100.1  0.1   1:34.62 [splunkd pid=2268] search 
 1087 splunk    20   0  288m 118m 8260 R 100.0  0.1   1:37.64 [splunkd pid=2268] search 
 1087 splunk    20   0  288m 117m 8260 R 100.0  0.1   1:40.66 [splunkd pid=2268] search

These are just samples, but you get the idea. In the first run, the search process had to wait longer for data to come off disk, so its (instantaneous) CPU usage was lower. In the second, the faster reads coming out of cache exposed a CPU bottleneck instead.

Now if we look at the SystemTap cache hit data from the first run:

Timestamp                        Total Reads (KB)   Cache Reads (KB)    Disk Reads (KB)  Miss Rate   Hit Rate
Tue Apr 15 20:39:59 2014 EDT                10681               6665               4016     37.59%     62.40%
Tue Apr 15 20:40:04 2014 EDT                40341              19025              21316     52.83%     47.16%
Tue Apr 15 20:40:09 2014 EDT                12593               3033               9560     75.91%     24.08%
Tue Apr 15 20:40:14 2014 EDT                22348                  0              22416    100.00%      0.00%
Tue Apr 15 20:40:19 2014 EDT                47870              25754              22116     46.19%     53.80%
Tue Apr 15 20:40:24 2014 EDT                42429              19069              23360     55.05%     44.94%
Tue Apr 15 20:40:29 2014 EDT                38192              18080              20112     52.65%     47.34%
Tue Apr 15 20:40:34 2014 EDT                30952              15860              15092     48.75%     51.24%
Tue Apr 15 20:40:39 2014 EDT                29566              16098              13468     45.55%     54.44%
Tue Apr 15 20:40:44 2014 EDT                31857              16389              15468     48.55%     51.44%
Tue Apr 15 20:40:49 2014 EDT                38048              23796              14252     37.45%     62.54%
Tue Apr 15 20:40:54 2014 EDT                31849              18397              13452     42.23%     57.76%
Tue Apr 15 20:40:59 2014 EDT                39369              23689              15680     39.82%     60.17%
Tue Apr 15 20:41:04 2014 EDT                67282              49902              17380     25.83%     74.16%
Tue Apr 15 20:41:09 2014 EDT                45992              25052              20940     45.52%     54.47%
Tue Apr 15 20:41:14 2014 EDT                32761              17581              15180     46.33%     53.66%

And from the second run:

Timestamp                        Total Reads (KB)   Cache Reads (KB)    Disk Reads (KB)  Miss Rate   Hit Rate
Tue Apr 15 20:44:21 2014 EDT                46380              46380                  0      0.00%    100.00%
Tue Apr 15 20:44:26 2014 EDT                37688              37308                380      1.00%     98.99%
Tue Apr 15 20:44:31 2014 EDT                38865              38861                  4      0.01%     99.98%
Tue Apr 15 20:44:36 2014 EDT                35688              35656                 32      0.08%     99.91%
Tue Apr 15 20:44:41 2014 EDT                37148              36876                272      0.73%     99.26%
Tue Apr 15 20:44:46 2014 EDT                45258              36758               8500     18.78%     81.21%
Tue Apr 15 20:44:51 2014 EDT                44852              44424                428      0.95%     99.04%
Tue Apr 15 20:44:56 2014 EDT                43691              43123                568      1.30%     98.69%
Tue Apr 15 20:45:01 2014 EDT                31629              31357                272      0.85%     99.14%
Tue Apr 15 20:45:06 2014 EDT                87306              79490               7816      8.95%     91.04%
Tue Apr 15 20:45:11 2014 EDT                52173              51497                676      1.29%     98.70%
Tue Apr 15 20:45:16 2014 EDT                33108              32784                324      0.97%     99.02%
Tue Apr 15 20:45:21 2014 EDT                35159              34915                244      0.69%     99.30%
Tue Apr 15 20:45:26 2014 EDT                38391              37887                504      1.31%     98.68%
Tue Apr 15 20:45:31 2014 EDT                29253              29133                120      0.41%     99.58%

So in the end, what does this tell you?

  1. The kernel cache can help make Splunk searches faster up to the limit of a single core.  Because (as of current releases of Splunk) the search process is single-threaded, you can’t expect single searches to be sped up dramatically by RAM alone.
  2. You can use SystemTap to help tell you whether or not your indexer has “enough” RAM (a low cache hit rate suggests adding a little more).  This will of course be most useful to those who are I/O throughput-starved on their indexers.
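As a small follow-on (my own sketch, not from the original post), the SystemTap output is easy to summarize with awk once you’ve captured it to a log file. This assumes lines shaped exactly like the tables above, with the hit rate percentage as the last field; the sample lines in the here-document are taken from the first run’s table.

```shell
# Average the Hit Rate column from captured SystemTap output.
# Lines whose last field isn't a percentage (e.g. the header) are skipped.
awk '$NF ~ /%$/ { gsub(/%/, "", $NF); sum += $NF; n++ }
     END { if (n) printf "average hit rate over %d samples: %.1f%%\n", n, sum/n }' <<'EOF'
Tue Apr 15 20:39:59 2014 EDT    10681    6665    4016    37.59%    62.40%
Tue Apr 15 20:40:04 2014 EDT    40341   19025   21316    52.83%    47.16%
EOF
```

On a real capture you’d replace the here-document with the log file name; an average hit rate that stays low across many searches is the "add more RAM" signal from point 2.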

Comments

scott: Does your comment mean that we can expect multi-threaded Splunk searches in the future? That would certainly be nice…

Duane (in reply): Hi Scott, I don’t work for Splunk, so I can’t even guess whether it’s on their roadmap. But for certain types of searches and workloads it would be nice indeed.