Splunk Charts And Tables

Oct 4th, 2019 - written by Kimserey

Splunk is a log aggregator, much in the same way Elasticsearch with Kibana can be used. When I started using Splunk I immediately recognized its capabilities, but my usage was largely limited by my own knowledge of writing queries (which is still very limited). Every now and then I would find myself needing to compose the same query I wrote the week before, having forgotten how to. So today we’ll explore some nice Splunk functionalities.

Timechart

The function I use the most is timechart. It provides a way to plot a time series where we can specify a span for the precision, an aggregation function for the events falling in each bucket, and a split clause to group events.

... | timechart span=5m p99(upstream_response_time)

This will get us the p99 of upstream_response_time over 5-minute buckets across all our events, which is useful to monitor the overall latency of our service.

... | timechart span=5m p99(upstream_response_time) by host

Specifying a split clause by host will generate multiple time series, one per host, which is useful to monitor the latency on specific instances and identify potential issues specific to a particular host.

We can only specify a single split clause, but if we want to split on two fields, we can use eval, which creates a new property on the event, and use that property in our split clause.

...
| eval host_method=host+"@"+method
| timechart span=5m p99(upstream_response_time) by host_method

This will add a host_method property to each event, combining the host and the method, and allowing a split on the combination.

Formatting the query over multiple lines is useful when we want to debug it, as we are able to comment out part of the query using the comment macro:

...
| eval host_method=host+"@"+method
`comment("| timechart span=5m p99(upstream_response_time) by host_method")`

Eval can also be used to construct new properties using if or case.

...
| eval stats_str=case(status like "2%", "OK", status like "5%", "ERROR")
| search stats_str!=""
| timechart span=5m count by stats_str

This will filter out the 4xx status codes, tag the 2xx events with OK and the 5xx events with ERROR, then produce a timechart on the result.
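
Similarly, a minimal sketch of the same tagging using if instead of case, assuming the same status field; like() is the eval-function form of the like operator:

...
| eval is_ok=if(like(status, "2%"), "OK", "NOT_OK")
| timechart span=5m count by is_ok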

Splunk limits the number of split values and puts the rest into an OTHER bucket. We can lift that limit by specifying limit=0.

...
| eval stats_str=case(status like "2%", "OK", status like "5%", "ERROR")
| search stats_str!=""
| timechart span=5m limit=0 count by stats_str

The other aspect of timechart is that it produces a table of split values, indexed by the time. For example, when we split by stats_str, we end up with a table whose first column is the time and whose remaining columns are the stats_str values.

Knowing that, we can compute the overall availability of our service by using the stats_str:

...
| eval stats_str=case(status like "2%", "OK", status like "5%", "ERROR")
| search stats_str!=""
| timechart span=5m limit=0 count by stats_str
| eval success_rate = round((OK / (OK + ERROR)) * 100, 2)
| fields - ERROR OK

Once we generate the table with timechart, we use eval to compute the success rate, then use fields - [fields] to remove the ERROR and OK fields from the table, leaving only the success rate, which we can then visualize directly.
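
Equivalently, a quick sketch using the whitelist form of fields, relying on fields keeping internal fields such as _time by default:

...
| eval success_rate = round((OK / (OK + ERROR)) * 100, 2)
| fields success_rate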

Another useful functionality is filling empty values: fillnull and filldown can be used to fill missing values. For example, if a value were missing in a bucket, we could use:

...
| timechart span=1m p99(upstream_response_time) as p99
| fillnull value=1000 p99

This will fill the null values in p99 with 1000. Alternatively we can use filldown, which will reuse the previous value for the missing values:

...
| timechart span=1m p99(upstream_response_time) as p99
| filldown

Chart

Timechart can be seen as a shortcut to generate charts indexed by the time. Chart can be used to create different charts where the row index isn’t the time.

Just to understand how chart works, we will be recreating the timechart using chart.

Chart allows us to construct a table indexed by the first property provided after the by directive,

[ BY <row-split> <column-split> ]

this means that the first property given will be the row split and the next value will be the column split.
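
For instance, a minimal sketch splitting the rows by status and the columns by host, assuming the same access log fields as before:

...
| chart count by status host

This yields one row per status and one column per host, with the count of events in each cell.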

Having that, we can combine it with bin, which gives us the possibility of replacing the _time value,

| bin _time span=10m

this will replace the _time property of each event with its respective bin, with a span of 10 minutes; for example an event with a time of 8:23:24.227 AM will be changed to 8:20:00.000 AM, effectively making all events fit into bins.

We can then use chart to split the rows by the bins and specify the column split as the stats_str we created earlier:

...
| eval stats_str=case(status like "2%", "OK", status like "5%", "ERROR")
| search stats_str!=""
| bin _time span=10m
| chart count by _time stats_str

We end up with a table:

_time                ERROR  OK
2019-10-01 07:00:00      0   5
2019-10-01 07:10:00      1   4
2019-10-01 07:20:00      1   4

This is essentially the same as:

...
| timechart span=10m count by stats_str

Table

Another useful functionality is table, which allows us to display a table with the specified fields.

...
| table _time, status, upstream_response_time

Although quite limited, table is very useful to display data in a readable way in a dashboard, removing all the noise from the events.
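
For instance, a small sketch pairing table with sort to surface the slowest requests first (the - prefix orders descending):

...
| table _time, status, upstream_response_time
| sort - upstream_response_time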

Stats

Lastly, stats is used to group events and apply aggregations. By using by we can group the aggregation by specific fields; it also accepts multiple fields to group by, separated by commas.

...
| stats count, p99(upstream_response_time) as p99 by status, host, request

In comparison to chart, stats uses the aggregation functions as columns and indexes the rows by the split fields. We end up with the following table:

status  host   request           count  p99
200     host1  POST /api/values  10     2
200     host2  POST /api/values  2      1
200     host3  POST /api/values  5      2
500     host1  POST /api/values  1      5
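
As a final sketch, stats accepts multiple aggregation functions at once; avg, max and dc (distinct count) are standard stats functions, applied here to the same assumed fields as above:

...
| stats count, avg(upstream_response_time) as avg_rt, max(upstream_response_time) as max_rt, dc(host) as hosts by status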

Conclusion

Today we looked at different Splunk displays. We started by looking at timechart, exploring the different possibilities when combined with eval and search. We then moved on to chart and saw how we could replicate timechart using bin. We completed this post by looking into table and stats, where we saw that stats provides us with a way to apply aggregation functions on top of groupings of events. I hope you liked this post and I’ll see you in the next one!
