A SIEM solution's anomaly detection capability, user and entity behavior analytics (UEBA), is powered by machine learning algorithms that identify deviations from the expected behavior of users and entities. These users and entities can then be assigned a dynamic risk score (for example, from 0 to 100) based on the degree of deviation. However, you can enhance the accuracy of this scoring if you take into account peer group analysis and seasonality.
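To make the idea of a deviation-based score concrete, here is a minimal sketch. The function name, the `max_deviations` cap, and the linear mapping are all assumptions made for illustration; real UEBA products use their own scoring models.

```python
def risk_score(observed: float, expected: float, spread: float,
               max_deviations: float = 5.0) -> int:
    """Map how far `observed` sits above `expected` onto a 0-100 scale.

    `spread` is any positive measure of normal variation (for example, a
    standard deviation). Only upward deviations raise the score in this sketch.
    """
    if spread <= 0:
        raise ValueError("spread must be positive")
    deviations = max(0.0, (observed - expected) / spread)
    # Clamp at `max_deviations` so the score never exceeds 100.
    return round(min(deviations / max_deviations, 1.0) * 100)
```

With an expected value of 100 and a spread of 20, an observation of 150 sits 2.5 deviations out and lands halfway up the scale.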

Peer group analysis

Peer group analysis is a technique wherein statistical models are employed to categorize users and hosts that share similar characteristics as one group. The idea behind peer grouping is that, by comparing a user's behavior with that of a relevant peer group, the risk scoring accuracy will increase. You can learn about the inner workings of peer group analysis here.
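As a rough illustration of the comparison step only (not the grouping step, which relies on statistical models), the sketch below measures how far one user sits from the mean of an already-formed peer group. The function name and the sample counts are hypothetical.

```python
from statistics import mean, pstdev

def peer_deviation(user_count: float, peer_counts: list[float]) -> float:
    """Standard deviations between a user's count and the peer-group mean."""
    center = mean(peer_counts)
    spread = pstdev(peer_counts)
    if spread == 0:
        # All peers behave identically; any difference is maximally unusual.
        return 0.0 if user_count == center else float("inf")
    return (user_count - center) / spread

# Hypothetical daily file-download counts for users in one peer group
finance_peers = [40, 45, 50, 55, 60]
```

A user who matches the group mean scores 0, while a user with 120 downloads sits far outside the group and would contribute more to the risk score.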

Seasonality

An activity is considered seasonal if it occurs with a specific degree of regularity, such as hourly, daily, weekly, or monthly. Your UEBA solution should be able to tag seasonal activities as non-anomalous. If an activity occurs out of its seasonal routine, it should be considered an anomaly, and your UEBA solution should be able to detect it.

If seasonality is not factored in, you may miss vital clues that could have helped detect and stop an attack, or your security analysts may be inundated with false alerts, resulting in alert fatigue. Taking seasonality into account in your UEBA solution will enhance your risk scoring accuracy and reduce false positives. This will give your analysts time to prioritize and respond to genuine threats. Now that you know why seasonality is important, let's take a look at how it works.

How does seasonality work?

Seasonality is fueled by machine learning algorithms that study the past behavior of users and entities to determine if an activity is anomalous. The following four time-based parameters are looked at to determine if activities follow a seasonal pattern:

  • Time of the day (ToD)
  • Date of the month (DoM)
  • Day of the week (DoW)
  • Week of the month (WoM)

To fully understand what seasonality means and how it improves your risk scoring accuracy, let's walk through an example and compare how it plays out with a UEBA solution that offers seasonality and with one that doesn't.

Consider an organization called Anthem. On September 30, Anthem's payroll server is accessed multiple times in one-hour time intervals between 9am and 2pm (see Table 1).

Time period      Number of accesses to the payroll server
09:00 - 10:00    100
10:00 - 11:00    125
11:00 - 12:00    150
12:00 - 13:00    250
13:00 - 14:00    750

Table 1: Payroll server access data

In this example, the ToD is the time interval in which the server was accessed (for example, 09:00-10:00), the DoM is 30, the DoW is 6 (assuming that the algorithm indexes the first day of the week, Sunday, as "1"), and the WoM is 5 (as September 30 is the fifth Friday of the month).
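The four parameters can be derived from a timestamp as sketched below. This assumes the indexing described above (Sunday as 1) and reads WoM as the occurrence of that weekday within the month; the function name is hypothetical. The article gives no year, so the usage note picks 2022, a year in which September 30 falls on a Friday.

```python
from datetime import datetime

def seasonal_features(ts: datetime) -> dict[str, int]:
    """Derive the four time-based parameters from a timestamp."""
    return {
        "ToD": ts.hour,                  # start hour of the one-hour interval
        "DoM": ts.day,                   # 1-31
        "DoW": ts.isoweekday() % 7 + 1,  # Sunday=1 ... Saturday=7
        "WoM": (ts.day - 1) // 7 + 1,    # nth occurrence of this weekday
    }
```

For the 09:00 interval on September 30, 2022, this yields a ToD of 9, a DoM of 30, a DoW of 6, and a WoM of 5, matching the values in the example.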

Let's understand how these accesses will be treated under two scenarios: 1) with a UEBA solution that does not have seasonality, and 2) with a UEBA solution that does have seasonality.

Scenario 1: UEBA solution without seasonality

When a UEBA solution doesn't have seasonality, it sets the time interval as one hour (this might differ depending on the algorithm being used) and compares the number of accesses in each one-hour time period to a dynamically calculated threshold value. This threshold value is calculated after considering the number of accesses in all previous one-hour intervals. This means that the algorithm does not look at the history of the number of accesses made during a particular time period, say, between 9 and 10am. It treats that period as just another one-hour interval, no different from the one before it or the one after.

Simply put, a UEBA solution with no seasonality does not consider the time interval in which the access happens when deciding whether to trigger an anomaly. That's why the access counts for 12-1pm and 1-2pm are considered anomalous (see Table 2). The number of accesses during these time periods (250 and 750) must have far exceeded the threshold value.

Time period      Number of accesses to the server    Counts vs. time interval    Result
09:00 - 10:00    100                                 ✓                           Normal
10:00 - 11:00    125                                 ✓                           Normal
11:00 - 12:00    150                                 ✓                           Normal
12:00 - 13:00    250                                 ✗                           Anomaly
13:00 - 14:00    750                                 ✗                           Anomaly

Table 2: UEBA without seasonality
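Here is a minimal sketch of how a threshold with no notion of seasonality might behave. The threshold formula (mean plus three standard deviations over every hour seen so far) and the `past_hours` values are assumptions chosen for illustration; the actual algorithm and history are not specified here.

```python
from statistics import mean, pstdev

def flag_without_seasonality(history: list[int], new_counts: list[int],
                             k: float = 3.0) -> list[str]:
    """Compare each hourly count to a threshold built from ALL earlier hours."""
    seen = list(history)
    results = []
    for count in new_counts:
        # Every previous one-hour interval is treated alike, whatever its time.
        threshold = mean(seen) + k * pstdev(seen)
        results.append("Anomaly" if count > threshold else "Normal")
        seen.append(count)  # the threshold is recalculated for the next hour
    return results

# Hypothetical hourly access counts observed before September 30
past_hours = [90, 110, 120, 100, 130, 140, 105, 115, 125, 135, 95, 145]
table_1 = [100, 125, 150, 250, 750]
print(flag_without_seasonality(past_hours, table_1))
# ['Normal', 'Normal', 'Normal', 'Anomaly', 'Anomaly']
```

Because the history is one undifferentiated pool of hours, the lunchtime spike has nothing comparable to be measured against, so it is flagged.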

Scenario 2: UEBA solution with seasonality

In this case, the algorithm will first compare the number of accesses on the payroll server with the time of the day to decide if it's normal or not. For example, if we consider the data in Table 1, then the first check the algorithm will perform is to see if 100 accesses between 9am and 10am is normal or not. The threshold will be a function of the number of accesses between 9 and 10am across all previous days for which data is available.

If an anomaly is not triggered, the algorithm proceeds with the second check, where it determines if the number of accesses (100) is normal for a particular date of the month—in this case, the 30th—for the same time period, i.e., 9 to 10am. This means that the algorithm will check the number of accesses made between 9 and 10am on the 30th of every month for as long as the data exists. If this check also yields normal results, it proceeds with the third check. If not, it's identified as an anomaly.

Assuming that the access count is normal so far, the third check is performed, which involves taking into account the day of the week. In this case, it's the sixth day of the week, Friday. Again, this means that the algorithm compares the number of accesses between 9 and 10am on the Friday of interest to that of all the previous Fridays to arrive at a conclusion.

If this again does not yield an anomaly, the final check is done where the count is considered along with the week of the month. So, the algorithm checks if 100 accesses between 9 and 10am on the Fridays of the fifth week of a month are normal or anomalous. The same steps are followed for the other time periods as well, as shown in Table 3.

Time period      Number of accesses to the server    ToD    DoM    DoW    WoM    Result
09:00 - 10:00    100                                 ✓      ✓      ✓      ✓      Normal
10:00 - 11:00    125                                 ✓      ✓      ✓      ✓      Normal
11:00 - 12:00    150                                 ✓      ✓      ✓      ✓      Normal
12:00 - 13:00    250                                 ✓      ✓      ✓      ✓      Normal
13:00 - 14:00    750                                 ✓      ✓      ✓      ✓      Normal

Table 3: Seasonality in action
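The four checks can be sketched as a cascade in which each step compares the count with a narrower slice of history. Everything specific in this sketch is an assumption: the threshold function (the historical maximum for the slice, scaled by a tolerance) is a placeholder because the real one is not specified, a check is skipped when its slice has no history, and the history values and the year 2022 are hypothetical.

```python
from datetime import datetime

def same_hour(past, now):
    return past.hour == now.hour

def same_friday_slot(past, now):
    return same_hour(past, now) and past.weekday() == now.weekday()

# Checks run in this order; each narrows the history to a seasonal slice.
CHECKS = {
    "ToD": same_hour,
    "DoM": lambda past, now: same_hour(past, now) and past.day == now.day,
    "DoW": same_friday_slot,
    "WoM": lambda past, now: same_friday_slot(past, now)
    and (past.day - 1) // 7 == (now.day - 1) // 7,
}

def seasonal_verdict(history, now, count, tolerance=1.25):
    """history: list of (datetime, count). Returns (verdict, failed_check)."""
    for name, matches in CHECKS.items():
        slice_counts = [c for ts, c in history if matches(ts, now)]
        if not slice_counts:
            continue  # no comparable history for this slice; skip the check
        if count > max(slice_counts) * tolerance:
            return "Anomaly", name
    return "Normal", None

# Hypothetical 13:00-14:00 access counts from earlier days
history = [
    (datetime(2022, 6, 30, 13), 720),  # month end
    (datetime(2022, 7, 29, 13), 700),  # month end, fifth Friday
    (datetime(2022, 8, 13, 13), 100),  # ordinary day
    (datetime(2022, 8, 31, 13), 710),  # month end
    (datetime(2022, 9, 6, 13), 110),   # ordinary Tuesday
    (datetime(2022, 9, 23, 13), 120),  # ordinary Friday
]
```

With this history, 750 accesses at 13:00 on September 30 pass all four checks, while the same count at 13:00 on September 13 fails the DoM check because the only comparable day saw 100 accesses.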

Now, you might wonder: Why is the algorithm considering 750 as normal when it clearly looks like a case of heightened activity on the payroll server?

The answer to that lies in the historical data. In this case, the historical data would have shown that employees typically access the payroll server to download their pay stubs during their lunch hours (between 12 and 2pm) on the last working day of the month (usually the 30th). So, when the algorithm does its usual checks, it's able to identify the 750 accesses as normal.

Digging deeper

By now, you probably have a good grasp of how seasonality works. Yet, you might still have a few questions, such as:

  • Is seasonality applicable to users as well?
  • Will the threshold calculated by the algorithm differ for different time intervals?

Let's address these questions one at a time.

Is seasonality applicable to users?

Absolutely! Every user could perform a task that's seasonal in nature and specific only to that user. The algorithm will identify this behavioral pattern and alert you if there's any deviation from it. For example, Stacey, a senior marketing associate, updates the new leads generated by her team on the consolidated marketing database on the last Friday of every month. So, the algorithm expects this behavior from her. However, if Stacey were to not only access the database, but access it multiple times on a Tuesday, then the algorithm would identify this as an anomaly. Stacey's risk score increases, and an alert is triggered to notify the security analyst.
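A per-user pattern like Stacey's could be learned and checked along these lines. This is only a sketch: the function names are hypothetical, the past access dates are invented, and a real solution would also weigh access counts rather than just the calendar slot.

```python
from datetime import date, timedelta

def is_last_weekday_of_month(d: date) -> bool:
    """True if the same weekday does not occur again in this month."""
    return (d + timedelta(days=7)).month != d.month

def slot(d: date) -> tuple[int, bool]:
    return d.weekday(), is_last_weekday_of_month(d)  # Monday=0 ... Sunday=6

def is_unusual_for_user(d: date, past_dates: list[date]) -> bool:
    """Flag an access whose calendar slot never appears in the user's history."""
    return slot(d) not in {slot(p) for p in past_dates}

# Hypothetical history: Stacey's past updates, each on the last Friday
stacey_history = [date(2022, 6, 24), date(2022, 7, 29), date(2022, 8, 26)]
```

An access on Friday, September 30, 2022 fits the learned slot, whereas one on Tuesday, September 27 does not and would be treated as a deviation.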

Does the algorithm calculate a different threshold for different time intervals?

Yes. The number of activities performed differs from one hour, day, or even month to the next. As we saw in the Anthem example above, it's normal for the payroll server to be accessed 100 times between 9-10am and 750 times between 1-2pm on the same day. Since the number of times an activity is performed varies from one time period to another, your algorithm must be capable of calibrating the threshold dynamically. Only then will your risk scoring be accurate.
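One simple way to picture per-interval calibration is to keep a separate threshold for each hour of the day. The formula (mean plus three standard deviations) and the sample history below are assumptions for illustration only.

```python
from collections import defaultdict
from statistics import mean, pstdev

def thresholds_by_hour(history, k=3.0):
    """history: iterable of (hour, count). Returns one threshold per hour."""
    by_hour = defaultdict(list)
    for hour, count in history:
        by_hour[hour].append(count)
    return {hour: mean(counts) + k * pstdev(counts)
            for hour, counts in by_hour.items()}

# Hypothetical month-end history for the 09:00 and 13:00 intervals
past = [(9, 95), (9, 100), (9, 105), (13, 700), (13, 750), (13, 800)]
limits = thresholds_by_hour(past)
```

Here the 09:00 threshold stays close to 100 while the 13:00 threshold sits well above 750, so the same count can be normal in one interval and anomalous in another.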

Now, you know the inner workings of seasonality and how it reduces false positives and improves risk scoring accuracy. To learn more about machine learning in SIEM, watch this webinar series. To personally evaluate how a unified SIEM solution like ManageEngine Log360 with DLP and CASB capabilities can improve user and entity risk scoring, sign up for a personalized demo and talk to our solution experts. Thanks for reading, folks!

© 2021 Zoho Corporation Pvt. Ltd. All rights reserved.