GETTING STARTED WITH RBA IN SPLUNK® ENTERPRISE SECURITY

A guide for Splunk users to begin their journey in Risk Based Alerting and build a winning security program.

NOTE: This guide was published in 2021 for ES version 6.6, one of the first ES versions to support RBA functions after Stuart McIntosh’s 2018 .conf talk introduced Risk Based Alerting in Splunk ES to the world (and to Splunk, for that matter). Outpost Security has since developed the ZERO to ONE Splunk App for ES that accomplishes everything you will read in this guide and more in one calendar week.

How do I get started with RBA in Splunk ES 6.6?

When talking with security professionals and Splunkers, this is the #1 question we get at Outpost Security. Risk Based Alerting is a powerful way to take your security alerting to the next level, but we can’t sugarcoat the reality that RBA is a significant shift in how you operate as a security organization. We address some of these issues in separate posts, but for now we are going to focus on the practical and tactical steps you can take to demonstrate the potential of RBA in your organization’s security practices, using just the data you already have and Splunk ES 6.6.

 

Before you Begin - Set your Expectations & Goals

Implementing Risk Based Alerting is a journey, and it affects multiple areas of your security organization. Splunk ES 6.6 has a lot of powerful features out of the box, including built-in RBA capabilities. However, we’ve observed that once you reach 100,000 risk events per week (that is, your risk rules are finding and recording over 100,000 log events per week), you’ve hit the upper limit of what Splunk ES is natively capable of handling. The complexity and scale of large environments simply require advanced techniques and specialized programming to allow your security team to fully leverage the capabilities of Risk Based Alerting.

The reality is, if you are a smaller organization, RBA in Splunk ES and this guide should get you well down the path of implementing Risk Based Alerting. In larger organizations, however, the steps in this guide will just get you started on your RBA journey. You will have a very solid proof of concept and a demonstration of the impact that an RBA implementation would deliver when fully complete. (Spoiler: when you reach that point, we are here to help you go the last mile and fully operationalize RBA with you.)

For larger organizations, even with just an RBA POC, you will see things that you haven’t seen before: things like pen test objects and attacks, recurring infections, and threat intel matches correlated with endpoints.

 

Step 1 – Update Assets & Identities Framework

The assets and identities framework is a fundamental feature of ES and a foundational element of a successful RBA implementation.  When we review a Splunk customer’s Assets & Identities we are looking for two things so we can be confident in building out the rest of RBA.

  1. Completeness – by this we mean all your LDAP data (quick review searches below)

              Identities: | `identities`
              Assets: | `assets`
  2. Inclusion of individual criticality or priority identification in the asset & identity data

Pro Tip: Your Vuln scanner discovery data is a GREAT way to enhance the “completeness” of your asset framework.
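
The criticality check in item 2 can be sketched outside Splunk as well. The Python below is a minimal illustration, not part of ES: it assumes you have exported asset rows to dictionaries with an nt_host and a priority field (the sample rows and the VALID_PRIORITIES list are hypothetical) and lists the assets that would carry no usable priority into RBA.

```python
# Priority values assumed for this sketch; adjust to whatever your lookups use.
VALID_PRIORITIES = {"low", "medium", "high", "critical"}

def missing_priority(records, key_field="nt_host"):
    """Return the key of every record whose priority is absent or unrecognized."""
    missing = []
    for record in records:
        priority = (record.get("priority") or "").strip().lower()
        if priority not in VALID_PRIORITIES:
            missing.append(record[key_field])
    return missing

# Hypothetical export of three asset rows.
assets = [
    {"nt_host": "dc01", "priority": "critical"},
    {"nt_host": "laptop-114", "priority": ""},
    {"nt_host": "printer-7"},
]
unprioritized = missing_priority(assets)
print(unprioritized)
```

Anything this returns is a candidate for enrichment before you rely on priority-based risk factors later in the guide.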

Step 2 – Configure Data Models

Consistent configuration and frequent use of data models in Splunk ES is simply a game changer. When we analyze data models in Splunk customer environments, we look for three things.

  1. Performance – specifically the acceleration time frames

    Great performance dashboard:

    https://<SPLUNK>/en-US/app/SplunkEnterpriseSecuritySuite/datamodel_audit

  2. Accuracy – namely the correct indexes feeding the appropriate data models with clear and consistent tagging

    Index Configuration:

    https://<SPLUNK>/en-US/app/SplunkEnterpriseSecuritySuite/cim_setup

  3. Data Diversity – REQUIRED for RBA – network traffic, web, IDS, email, & authentication. NICE TO HAVE for RBA – endpoint, network sessions, network resolutions.

 

Step 3 – Turn on Three Risk Rules (Yes only Three)

This is probably where we see the most inconsistency with Splunk customers who’ve started down the RBA path. The best approach that we’ve found is counter-intuitive. The first thing you need to do is TURN EVERYTHING OFF. All the risk rules, all the risk notable rules: just hit the giant (proverbial) pause button.

Then you can walk through the risk rules included in ES 6.6 and choose your favorite three. (Hint: if you aren’t sure, pick the ones that will tell the best “story” when they hit. We write more about this in another paper.) Another way to think about it is to choose risk rules that ONLY work with RBA (e.g. traditionally noisy detections or data sources).

Now, with your three favorite risk rules, work through the configuration of each, focused on the following:

  1. Create a dynamic risk message for each – be descriptive, but concise and consistent

  2. Decide on an “intelligent” risk score – see the commentary to follow

  3. Attribute risk to multiple objects – this is where RBA really starts to become powerful. Common pairings are: user/source, source/dest, & user/source/dest.

About setting risk scores.  We see customers hung up on this ALL THE TIME.  The first thing we think is helpful to remember is that you are scoring individual events – and those event scores are aggregated over time.  As a result – the score of a single event matters less than the total number of events related to an individual object. 

(By the way, this is probably the single most “uncomfortable” mental transition to make when moving from traditional detections/alerts to Risk Based Alerts).

So when we say “intelligent”, what we really mean is don’t try to be too smart here. Trust the RBA framework to do its job.

What does this mean in practice?  To start we like to use a base of 100.  That means if a risk event is “medium” severity – then it gets a base score of around 50.  High? Something like 80.  Low can come in at 20.

Keep it simple.  Keep it consistent.  Remember to trust the system in aggregate (one more aside, you’ll begin to strengthen your trust loop in step 6  – “tuning”).
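
To make the aggregation point concrete, here is a minimal Python sketch (an illustration only, not how ES stores or sums risk). It uses the base scores suggested above; the object names and events are hypothetical.

```python
from collections import defaultdict

# Base scores on a 100-point scale, using the values suggested in this guide.
BASE_SCORES = {"low": 20, "medium": 50, "high": 80}

def aggregate_risk(events):
    """Sum base scores per risk object; events are (risk_object, severity) pairs."""
    totals = defaultdict(int)
    for risk_object, severity in events:
        totals[risk_object] += BASE_SCORES[severity]
    return dict(totals)

# Hypothetical events: one high-severity hit versus five low-severity hits.
events = [("host-a", "high")] + [("user-b", "low")] * 5
totals = aggregate_risk(events)
print(totals)
```

Five low-severity events on user-b add up to more risk than the single high-severity event on host-a, which is why the exact score of any one event matters less than the volume of events tied to an object.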

We cannot talk risk rules without discussing search window vs schedule. Our preferred default is:

Earliest: -1h@h

Latest: @h

Schedule Cron: 07 * * * *

Yes, this means the event is not recorded immediately, but it lets you balance search performance against data lag, and it prepares you for the longer windows needed when dealing with behaviors. If you choose something like this:

Earliest: -1h@h

Latest: @h

Schedule Cron: */5 * * * *

Then you need to throttle the searches to prevent duplicate risk events, which would otherwise inflate the risk scores. This is explored in step 6 below.
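
The duplication problem can be shown with a short simulation. This Python sketch is an assumption-laden model, not Splunk itself: it treats earliest=-1h@h and latest=@h as the previous full clock hour (start inclusive, end exclusive) and counts how many scheduled runs in a day would pick up the same hypothetical event.

```python
from datetime import datetime, timedelta

def search_window(run_time):
    """Window for earliest=-1h@h, latest=@h: the full clock hour before run_time."""
    latest = run_time.replace(minute=0, second=0, microsecond=0)
    return latest - timedelta(hours=1), latest

def times_recorded(event_time, run_times):
    """Count how many scheduled runs would find the event in their window."""
    count = 0
    for run_time in run_times:
        earliest, latest = search_window(run_time)
        if earliest <= event_time < latest:
            count += 1
    return count

day_start = datetime(2021, 6, 1)
event = datetime(2021, 6, 1, 9, 30)  # hypothetical log event

# Cron "07 * * * *": once an hour, at seven minutes past.
hourly = [day_start + timedelta(hours=h, minutes=7) for h in range(24)]
# Cron "*/5 * * * *": every five minutes.
every_5 = [day_start + timedelta(minutes=m) for m in range(0, 24 * 60, 5)]

print(times_recorded(event, hourly), times_recorded(event, every_5))
```

With the hourly schedule the event is picked up once. With the five-minute schedule, every run during the following hour searches the same window, so without throttling the same event would be written as a risk event repeatedly.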

  

Step 4 – Configure Risk Notable Rules

The Risk Notable Rules are the searches that mine the Risk Index and aggregate risk around objects.  When the risk of an object reaches a certain threshold, an alert is generated.

ES 6.6 ships with two main Risk Notable Rules:

  1. MITRE ATT&CK® Tactic Count

  2. Composite Risk Score

ATT&CK Tactic Count – this is searching the risk index for diversity, as defined by MITRE ATT&CK framework.  Note that ES allows you to apply a framework of your choosing, but you will need to configure each risk rule to tag the risk events with the appropriate tactic or technique.  There are multiple schools of thought regarding the best way to do this – Outpost Security has our preferred approach, but that is a discussion topic for yet another paper.

We recommend that the ATT&CK Tactic Count Risk Notable Rule be set to search a 7-day window of events, which is the default setting in 6.6.

Composite Risk Score – this is searching the risk index, aggregating risk scores by object.  As a simple example, if an object has 8 related events, each with a calculated risk score, this search just adds the 8 scores together.

The search has a default setting of 24 hours, but we suggest reducing that to a 12-hour window initially.   We also like to add dynamic severity, to tag per the ES severity ratings.  Something like: 

Medium – Score > 100 over 12 hours

High – Score > 150 over 12 hours

Critical – Score > 200 over 12 hours

For both of these it is important to evaluate your throttling fields to prevent duplicate notables from being created and introducing confusion into the SOC. This is explored in step 6 below.
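
As an illustration of the composite score and the dynamic severity tiers above, here is a minimal Python sketch. It is not the ES correlation search; the event tuples, object name, and helper names are hypothetical, and the thresholds are the ones suggested in this guide.

```python
from datetime import datetime, timedelta

# (threshold, severity) pairs, checked from highest to lowest.
THRESHOLDS = [(200, "critical"), (150, "high"), (100, "medium")]

def composite_score(risk_events, risk_object, now, window_hours=12):
    """Sum the scores of one object's risk events inside the look-back window."""
    earliest = now - timedelta(hours=window_hours)
    return sum(
        score
        for timestamp, obj, score in risk_events
        if obj == risk_object and earliest <= timestamp < now
    )

def notable_severity(score):
    """Map an aggregated score to a severity, or None if no notable is warranted."""
    for threshold, severity in THRESHOLDS:
        if score > threshold:
            return severity
    return None

now = datetime(2021, 6, 1, 12, 0)
# Eight hypothetical risk events for one object, 20 points each, one per hour.
risk_events = [(now - timedelta(hours=h), "user-b", 20) for h in range(1, 9)]
score = composite_score(risk_events, "user-b", now)
print(score, notable_severity(score))
```

Eight low-scoring events inside the 12-hour window total 160, which lands in the High tier even though no single event would have raised an alert on its own.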

 

Step 5 – Understand Risk Factors

A key element in calculating risk scores is the idea of Risk Factors, or Risk Modifiers.  These are in essence multipliers of risk based on the characteristics of the individual user or asset.  To start, we like to use three initial sources of risk factors:

  1. Watchlist Users: watchlist=true, multiply by 1.2

  2. Critical Priority – Asset: priority=critical, multiply by 1.2

  3. Critical Priority – User: priority=critical, multiply by 1.2

Recall that in step 1 we made sure criticality was included in the Asset & Identities framework in Splunk ES.

Pro Tip:  Risk Factor calculations are only shown in the Risk Data Model – NOT in the Risk Index – as Splunk ES 6.6 calculates the total risk score dynamically.   (FYI, Outpost Security takes a different approach to calculating risk scores, which we believe is better suited for scaling RBA in large environments).
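
A short Python sketch can show what these multipliers do to a base score. This is an illustration under assumptions, not the ES implementation: in particular, it assumes that when several factors match the same object their multipliers compound, and the field names on the sample objects are hypothetical stand-ins for your asset and identity data.

```python
import math

# (name, match condition, multiplier) for the three starting risk factors.
RISK_FACTORS = [
    ("Watchlist Users",
     lambda o: o.get("watchlist") == "true", 1.2),
    ("Critical Priority - Asset",
     lambda o: o.get("risk_object_type") == "system" and o.get("priority") == "critical", 1.2),
    ("Critical Priority - User",
     lambda o: o.get("risk_object_type") == "user" and o.get("priority") == "critical", 1.2),
]

def calculated_risk_score(base_score, risk_object):
    """Apply every matching multiplier to the base score (assumed to compound)."""
    multiplier = math.prod(m for _, matches, m in RISK_FACTORS if matches(risk_object))
    return base_score * multiplier

# Hypothetical objects.
watched_admin = {"risk_object_type": "user", "watchlist": "true", "priority": "critical"}
intern = {"risk_object_type": "user", "watchlist": "false", "priority": "low"}

print(calculated_risk_score(50, watched_admin), calculated_risk_score(50, intern))
```

Under that compounding assumption, a medium event (50) against a watchlisted, critical-priority user comes out at roughly 72, while the same event against an ordinary user stays at 50.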

  

Progress Check-in & Expectation Calibration

By now you should have the fundamentals of RBA running, with three risk rules writing events to the risk index. You can search the Risk Data Model to see these events and start to see some RBA notables being generated.

It is at this point we need to calibrate your expectations.  We have seen multiple Splunk customer environments, some of them very large, and like any technology or security product, in order to maximize the effectiveness of the tool for your organization, additional customization is needed.

The first example of this is Notable review.  The power of RBA is in the relation and aggregation of risk and objects.  As of this writing, Splunk ES 6.6.2 does not ship with a dashboard or GUI that demonstrates this clearly. 

But fear not dear security professional – we have provided you with a drop-in Notable drill down search:

| from datamodel:"Risk.All_Risk"
| search risk_object="$risk_object$"
| table _time, risk_object, risk_object_type, source, annotations.mitre_attack.mitre_tactic_id, annotations.mitre_attack.mitre_technique_id, dest, src, user, risk_message, calculated_risk_score, risk_factor*
| rename annotations.mitre_attack.mitre_tactic_id as mitre_tactic_id, annotations.mitre_attack.mitre_technique_id as mitre_technique_id
| eval risk_event_type="primary_object"
| append
    [| from datamodel:"Risk.All_Risk"
    | search risk_object!="$risk_object$" (dest="$risk_object$" OR src="$risk_object$" OR user="$risk_object$")
    | table _time, risk_object, risk_object_type, source, annotations.mitre_attack.mitre_tactic_id, annotations.mitre_attack.mitre_technique_id, dest, src, user, risk_message, calculated_risk_score, risk_factor*
    | rename annotations.mitre_attack.mitre_tactic_id as mitre_tactic_id, annotations.mitre_attack.mitre_technique_id as mitre_technique_id
    | eval risk_event_type="related_object" ]

Place this in the drill-down search field of the Notable adaptive response action on your RBA notable correlation search.

So what does this search show?  

  • All relevant risk events – with risk message, src, dest, user, and the risk factors

  • Any MITRE ATT&CK Annotations associated

  • List of related objects (appended)

    • risk_event_type="primary_object" – The direct risk events applied to that risk object

    • risk_event_type="related_object" – The related objects of risk events

Step 6 – Tuning

Tuning is a critical step in generating high fidelity notables and reducing false positives.  However, while RBA has incredibly powerful potential here – the reality is that we have a few new layers of complexity – so we need to be very deliberate about how we tune in ES 6.6.

Out of the box, Splunk ES has some tuning features we can leverage, and they are available at two layers.  Here is a breakdown:

Tuning Risk Rules – We can tune individual risk rules to reduce impact when writing to the risk index.  The methods we suggest are:

1.     Throttling – on risk object, risk score & risk message

2.     Search filtering – adding filters in the risk rule search to prevent known false positives from being written to the risk index  (e.g. signature!="", URL!="*.google.com*" ).

Tuning Risk Notable Rules – We can also tune the searches that aggregate risk from the risk index events. The methods we suggest are:

1.     Throttling – on risk object, source/risk rules

2.     Notable suppression – this is more of a “break glass in case of emergency” tuning – designed to prevent a sudden flood of new notables, most common when a new risk rule is deployed
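
The throttling idea can be sketched in a few lines of Python. This is a simplified model, not Splunk’s scheduler: it assumes that once an event with a given combination of field values is kept, later events with the same combination are suppressed until the throttle window has elapsed. The event dictionaries and the 3600-second window are hypothetical.

```python
def throttle(events, key_fields, window_seconds):
    """Keep the first event per key, then suppress repeats until the window elapses."""
    last_kept = {}
    kept = []
    for event in sorted(events, key=lambda e: e["_time"]):
        key = tuple(event[field] for field in key_fields)
        previous = last_kept.get(key)
        if previous is None or event["_time"] - previous >= window_seconds:
            kept.append(event)
            last_kept[key] = event["_time"]
    return kept

def risk_event(time, obj, score, message):
    """Build a hypothetical risk event; _time is in epoch seconds."""
    return {"_time": time, "risk_object": obj, "risk_score": score, "risk_message": message}

events = [
    risk_event(0, "user-b", 20, "Excessive failed logins"),
    risk_event(300, "user-b", 20, "Excessive failed logins"),   # duplicate, suppressed
    risk_event(300, "host-a", 80, "IDS signature match"),       # different key, kept
    risk_event(3600, "user-b", 20, "Excessive failed logins"),  # window elapsed, kept
]
kept = throttle(events, ("risk_object", "risk_score", "risk_message"), 3600)
print(len(kept))
```

For a risk rule the key would be the risk object, risk score, and risk message; for a risk notable rule, swapping the key to the risk object and source models the second list above.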

Pro Tip:  While you are building out RBA you may want to force the notables to informational so as not to impact your SOC. Also, work very closely with the SOC on these so you DO NOT negatively impact their metrics during this phase.

  

Step 7 – Turn on Risk Rules #4 through #15

When you have your initial three rules running and dialed in, now is the time to enable a few more risk rules.  Running through the configuration, deployment, and tuning cycle for each will give your team an opportunity to develop and learn the differences with RBA style alerts – from data configuration to detection writing to incident response playbooks.

A word of caution here.  In our experience, the total number of risk rules doesn’t really matter.  It’s not a one-to-one relationship anymore – as dynamic rules can produce exponential amounts of visibility.  The real measure of your visibility is data source diversity and the number of risk events being recorded to your risk index on a weekly basis. These are much more important to keep an eye on.
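
If you export risk events for review, the two measures above are easy to track. The Python sketch below is illustrative only: it assumes each exported event is a (timestamp, source) pair, where source is the name of the risk rule or data source that produced it, and the sample rows are hypothetical.

```python
from datetime import datetime

def weekly_summary(events):
    """Return {(iso_year, iso_week): (event_count, distinct_source_count)}."""
    counts = {}
    sources = {}
    for timestamp, source in events:
        iso = timestamp.isocalendar()
        week = (iso[0], iso[1])
        counts[week] = counts.get(week, 0) + 1
        sources.setdefault(week, set()).add(source)
    return {week: (counts[week], len(sources[week])) for week in counts}

# Hypothetical exported risk events.
events = [
    (datetime(2021, 6, 1, 9, 0), "web_proxy_rule"),
    (datetime(2021, 6, 2, 14, 0), "ids_rule"),
    (datetime(2021, 6, 3, 8, 0), "web_proxy_rule"),
    (datetime(2021, 6, 8, 10, 0), "auth_rule"),
]
summary = weekly_summary(events)
print(summary)
```

Watching the weekly count also tells you how close you are to the 100,000 events per week mark discussed at the start of this guide.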

 

The Payoff of RBA in Splunk ES 6.6.2

Limitations aside, Splunk ES 6.6 is a major asset in taking your security program to the next level.  Fully leveraged, it lets you finally start to close the gap between the attackers’ advancement and your ability to defend against them.

Using this guide, and the Notable drill down provided, you will be able to demonstrate the power and potential of RBA – using real data from your environment.  This will be a key asset in getting buy-in from your peers and leadership – giving them confidence in the future of investing in RBA to change the game for your defensive security practice.

Finally, know that when you reach these natural limits in Splunk ES, Outpost Security is here to help you to the finish line.  We have solved all of these challenges and more. Most importantly, we can train your entire security organization, arming them with our years of expertise in deploying and running RBA in Splunk ES.