Scrutinizing seasonality in anomaly detection in depth

User and entity behavior analytics (UEBA), a SIEM solution's anomaly detection engine, uses machine learning algorithms to identify deviations from the expected behavior of users and entities. UEBA identifies time, count, and pattern anomalies. Based on this behavioral analysis, each user and entity is assigned a dynamic risk score (for example, between zero and 100) that reflects the degree of deviation. You can make this scoring more accurate by taking peer group analysis and seasonality into account.

What is seasonality in cybersecurity?

An activity is considered seasonal if it occurs with a specific degree of regularity, such as hourly, daily, weekly, or monthly. For effective cybersecurity, your UEBA solution should be able to tag seasonal activities as non-anomalous. If an activity occurs out of its seasonal routine, it should be considered an anomaly, and your anomaly detection tool should be able to detect it.

If seasonality is not factored in, you may miss vital clues that could have helped you detect and stop an attack, or your security analysts may be inundated with false alerts, resulting in alert fatigue. Taking seasonality into account in your UEBA solution will enhance your risk scoring accuracy and reduce false positives. This will give your analysts time to prioritize and respond to genuine threats. You can learn more about the importance of seasonality in anomaly detection in cybersecurity here.

Now that you know why seasonality in anomaly detection is important, let's take a look at how it works.

How does seasonality work?

Seasonality is fueled by machine learning algorithms that study the past behavior of users and entities to determine if an activity is anomalous. How the anomaly detection algorithm identifies deviations in seasonal activities will differ from vendor to vendor. However, in a UEBA-integrated SIEM solution like ManageEngine Log360, the anomaly detection model looks at the following four time-based parameters to determine if activities follow a seasonal pattern:

Time of the day (ToD)
Date of the month (DoM)
Day of the week (DoW)
Week of the month (WoM)

To fully understand what seasonality means and how it improves your risk scoring accuracy, let's take a look at the following example and make a comparison between how it would play out if our UEBA solution offered seasonality and if it didn't.

Consider an organization called Anthem. On September 30, Anthem's payroll server is accessed multiple times in one-hour time intervals between 9am and 2pm (see Table 1).

Time period	Number of accesses in the payroll server
09:00 - 10:00	100
10:00 - 11:00	125
11:00 - 12:00	150
12:00 - 13:00	250
13:00 - 14:00	750

Table 1: Payroll server access data

In this example, the ToD is the time interval in which the server was accessed (for example, 09:00-10:00), the DoM is 30, the DoW is 6 (assuming that the algorithm indexes the first day of the week, Sunday, as "1"), and the WoM is 5 (as September 30 is the fifth Friday of the month).
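To make the four parameters concrete, here is a minimal Python sketch that derives them from a timestamp. It is only an illustration, not how any particular product computes them. The function name seasonal_parameters is hypothetical, the year 2022 is an assumption (September 30 fell on a Friday that year), and WoM is assumed to count which occurrence of the weekday it is in the month, which matches the "fifth Friday" reading above.

```python
from datetime import datetime

def seasonal_parameters(ts: datetime) -> dict:
    """Derive the four time-based parameters for the one-hour interval containing ts."""
    return {
        # One-hour interval label in the same style as Table 1.
        "ToD": f"{ts.hour:02d}:00 - {(ts.hour + 1) % 24:02d}:00",
        "DoM": ts.day,
        # isoweekday() is Monday=1..Sunday=7; shift so that Sunday=1..Saturday=7.
        "DoW": ts.isoweekday() % 7 + 1,
        # Assumption: WoM counts which occurrence of this weekday it is in the month.
        "WoM": (ts.day - 1) // 7 + 1,
    }

# Assumed year: 2022, when September 30 fell on a Friday.
params = seasonal_parameters(datetime(2022, 9, 30, 9, 15))
print(params)  # {'ToD': '09:00 - 10:00', 'DoM': 30, 'DoW': 6, 'WoM': 5}
```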

Let's understand how these accesses will be treated under two scenarios: 1) with a UEBA solution that does not have seasonality, and 2) with a UEBA solution that does have seasonality.

UEBA solution without seasonality

When a UEBA solution doesn't have seasonality, it sets the time interval as one hour (this might differ depending on the algorithm being used) and compares the number of accesses in each one-hour time period to a dynamically calculated threshold value. This threshold value is calculated after considering the number of accesses in all previous one-hour intervals. This means that the algorithm does not look at the history of the number of accesses made in a particular time period, say, between 9 and 10am. It treats that period as just another one-hour interval, no different from the one before it or the one after.

Simply put, a UEBA solution with no seasonality does not consider the time interval in which an access happens when deciding whether to trigger an anomaly. That's why the numbers of accesses made between 12 and 1pm and between 1 and 2pm are considered anomalous (see Table 2). The counts for these time periods (250 and 750) must have far exceeded the threshold value.

Time period	Number of accesses in the payroll server	Result
09:00 - 10:00	100	Normal
10:00 - 11:00	125	Normal
11:00 - 12:00	150	Normal
12:00 - 13:00	250	Anomaly
13:00 - 14:00	750	Anomaly

Table 2: UEBA without seasonality
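The non-seasonal approach can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the algorithm any vendor actually ships: the threshold rule (mean plus three standard deviations of all earlier intervals), the function name flag_without_seasonality, and the earlier_intervals values are all hypothetical. Only the five counts being tested come from Table 1.

```python
from statistics import mean, pstdev

def flag_without_seasonality(history, new_counts, k=3.0):
    """Label each hourly count against a threshold built from all earlier intervals."""
    seen = list(history)
    results = []
    for count in new_counts:
        # Assumed rule: threshold = mean + k standard deviations of every
        # previous one-hour interval, regardless of the time of day.
        threshold = mean(seen) + k * pstdev(seen)
        results.append("Anomaly" if count > threshold else "Normal")
        seen.append(count)  # the threshold moves as new intervals arrive
    return results

# Hypothetical counts from earlier one-hour intervals (not from the article).
earlier_intervals = [90, 110, 120, 130, 100, 140, 95, 105]
table_1_counts = [100, 125, 150, 250, 750]
labels = flag_without_seasonality(earlier_intervals, table_1_counts)
print(labels)  # ['Normal', 'Normal', 'Normal', 'Anomaly', 'Anomaly']
```

Notice that the function never looks at which hour a count belongs to. Every interval is pooled into the same history, which is exactly why the lunchtime spike stands out.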

UEBA solution with seasonality

In this case, the algorithm will first compare the number of accesses on the payroll server with the time of the day to decide if it's normal or not. For example, if we consider the data in Table 1, then the first check the algorithm will perform is to see if 100 accesses between 9am and 10am is normal or not. The threshold will be a function of the number of accesses between 9 and 10am across all previous days for which data is available.

If an anomaly is not triggered, the algorithm proceeds with the second check, where it determines if the number of accesses (100) is normal for a particular date of the month—in this case, the 30th—for the same time period, i.e., 9 to 10am. This means that the algorithm will check the number of accesses made between 9 and 10am on the 30th of every month for as long as the data exists. If this check also yields normal results, it proceeds with the third check. If not, it's identified as an anomaly.

Assuming that the access count is normal so far, the third check is performed, which involves taking into account the day of the week. In this case, it's the sixth day of the week, Friday. Again, this means that the algorithm compares the number of accesses between 9 and 10am on the Friday of interest to that of all the previous Fridays to arrive at a conclusion.

If this again does not yield an anomaly, the final check is done where the count is considered along with the week of the month. So, the algorithm checks if 100 accesses between 9 and 10am on the Fridays of the fifth week of a month are normal or anomalous. The same steps are followed for the other time periods as well, as shown in Table 3.

Time period	Number of accesses in the payroll server	Result
09:00 - 10:00	100	Normal
10:00 - 11:00	125	Normal
11:00 - 12:00	150	Normal
12:00 - 13:00	250	Normal
13:00 - 14:00	750	Normal

Table 3: Seasonality in action
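The four checks described above can be sketched as a simple cascade in Python. Treat this as one possible reading of the steps rather than the product's actual implementation. The HourlyCount type, the seasonal_verdict function, the mean-plus-three-standard-deviations threshold, the min_samples rule for skipping a check, and every history value below are assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean, pstdev

@dataclass(frozen=True)
class HourlyCount:
    tod: int    # hour at which the one-hour interval starts (13 means 13:00 - 14:00)
    dom: int    # date of the month
    dow: int    # day of the week, Sunday = 1
    wom: int    # week of the month
    count: int  # number of accesses in the interval

# The four checks in the order described above; each narrows the slice of history.
CHECKS = [
    ("ToD", lambda h, o: h.tod == o.tod),
    ("DoM", lambda h, o: h.tod == o.tod and h.dom == o.dom),
    ("DoW", lambda h, o: h.tod == o.tod and h.dow == o.dow),
    ("WoM", lambda h, o: h.tod == o.tod and h.dow == o.dow and h.wom == o.wom),
]

def seasonal_verdict(history, obs, k=3.0, min_samples=2):
    """Return ("Normal", None) or ("Anomaly", name of the first check that failed)."""
    for name, matches in CHECKS:
        counts = [h.count for h in history if matches(h, obs)]
        if len(counts) < min_samples:
            continue  # assumption: skip a check when there is too little history
        if obs.count > mean(counts) + k * pstdev(counts):
            return ("Anomaly", name)
    return ("Normal", None)

# Hypothetical history for the 13:00 - 14:00 interval (not from the article).
history = [
    HourlyCount(13, 30, 6, 5, 700),  # earlier month-end Friday
    HourlyCount(13, 30, 6, 5, 780),  # earlier month-end Friday
    HourlyCount(13, 15, 4, 3, 140),
    HourlyCount(13, 15, 2, 3, 160),
    HourlyCount(13, 8, 6, 2, 150),
    HourlyCount(13, 22, 6, 4, 150),
    HourlyCount(13, 10, 3, 2, 150),
    HourlyCount(13, 24, 5, 4, 150),
]

month_end = seasonal_verdict(history, HourlyCount(13, 30, 6, 5, 750))
mid_month = seasonal_verdict(history, HourlyCount(13, 15, 5, 3, 700))
print(month_end, mid_month)
```

With this made-up history, 750 accesses on a month-end Friday pass all four checks, while 700 accesses at the same hour on the 15th pass the time-of-day check but fail the date-of-the-month check.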

Now, you might wonder: Why is the algorithm considering 750 as normal when it clearly looks like a case of heightened activity on the payroll server?

The answer to that lies in the historical data. In this case, the historical data would have shown that employees typically access the payroll server to download their pay stubs during their lunch hours, between 12 and 2pm, on the last working day of the month (usually the 30th). So, when the algorithm does its usual checks, it's able to identify the 750 accesses as normal.

Digging deeper into seasonality

By now, you probably have a good grasp of how seasonality works. Yet, you might still have a few questions, such as:

Is seasonality applicable to users as well?
Will the threshold calculated by the algorithm differ for different time intervals?

Let's address these questions one at a time.

Is seasonality applicable to users?

Absolutely! Any user could perform a task that's seasonal in nature and specific only to that user. The algorithm will identify this behavioral pattern and alert you if there's any deviation from it. For example, Stacey, a senior marketing associate, updates the consolidated marketing database with the new leads generated by her team on the last Friday of every month. So, the algorithm expects this behavior from her. However, if Stacey were to access the database on a Tuesday, and to do so multiple times, the algorithm would identify this as an anomaly. Stacey's risk score would increase, and an alert would be triggered to notify the security analyst.
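As a toy illustration of Stacey's case, the Python sketch below hard-codes her "last Friday of the month" routine and counts the accesses that fall outside it. In a real UEBA engine this pattern would be learned from her history, not written by hand. The function names and the 2022 dates are assumptions.

```python
from collections import Counter
from datetime import date, timedelta

def is_last_friday(d: date) -> bool:
    # weekday() == 4 is Friday; it is the last one if the next Friday is in another month.
    return d.weekday() == 4 and (d + timedelta(days=7)).month != d.month

def off_pattern_accesses(access_dates):
    """Count accesses per day that fall outside the 'last Friday' routine."""
    return Counter(d for d in access_dates if not is_last_friday(d))

# Hypothetical access log: three accesses on a Tuesday, one on the last Friday.
accesses = [date(2022, 9, 27)] * 3 + [date(2022, 9, 30)]
flagged = off_pattern_accesses(accesses)
print(flagged)
```

Here only the three Tuesday accesses are flagged. The access on Friday, September 30 matches the expected routine and is left alone.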

Does the algorithm calculate a different threshold for different time intervals?

Yes. The number of activities performed differs from one hour, day, or even month to the next. As we saw in the Anthem example above, it's normal for the payroll server to be accessed 100 times between 9 and 10am and 750 times between 1 and 2pm on the same day. Because the count for a given activity varies with the time period, your algorithm must be capable of calibrating its thresholds dynamically. Only then will your risk scoring be accurate.
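Here is a small Python sketch of per-interval thresholds. It is illustrative only: the function name thresholds_by_interval, the mean-plus-three-standard-deviations rule, and the history values are assumptions and are not taken from any product.

```python
from collections import defaultdict
from statistics import mean, pstdev

def thresholds_by_interval(history, k=3.0):
    """Compute one threshold per one-hour interval from (start_hour, count) pairs."""
    by_hour = defaultdict(list)
    for start_hour, count in history:
        by_hour[start_hour].append(count)
    return {hour: mean(c) + k * pstdev(c) for hour, c in by_hour.items()}

# Hypothetical month-end history (not from the article).
history = [(9, 90), (9, 100), (9, 110), (13, 700), (13, 750), (13, 800)]
thresholds = thresholds_by_interval(history)
print({hour: round(t, 1) for hour, t in thresholds.items()})
```

With these numbers, 750 accesses sit below the 13:00 threshold but far above the 09:00 one, so the same count is normal in one interval and anomalous in the other.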

Now you know the inner workings of seasonality and how it reduces false positives and improves risk scoring accuracy. To learn more about machine learning in SIEM, watch this webinar series. To personally evaluate how a unified SIEM solution like ManageEngine Log360 with UEBA capabilities can improve user and entity risk scoring, sign up for a personalized demo and talk to our solution experts.
