Adaptive thresholds let users reduce alert noise by dynamically adjusting threshold values for critical monitors using OpManager's machine-learning-based predictive algorithms. The feature analyzes data patterns and adjusts thresholds without manual intervention, minimizing false alerts while ensuring that critical issues are still detected.
Over time, it learns to recognize hourly, daily, weekly, and even monthly cycles and automatically adapts thresholds to match these recurring patterns. This ensures that predictable fluctuations, such as daily traffic spikes, weekly maintenance activities, or month-end processing loads, do not generate unnecessary alerts while still highlighting genuine anomalies.
Once Adaptive Thresholds is enabled, OpManager collects the necessary performance data from all the monitors and feeds it into its predictive algorithms. This data is collected over a minimum period of 14 days.
OpManager uses the last 14 days of data to start generating alerts, so alerts might be slightly delayed when the Adaptive Thresholds feature is enabled for the first time. As OpManager runs over a longer period, it gathers enough historical data to detect and adapt to recurring weekly and monthly patterns. This enables OpManager to automatically adjust thresholds for activities that occur on a regular schedule, such as weekly maintenance windows or end-of-month transaction spikes.
Example: In enterprises, network usage often varies throughout the week, with lower activity during weekends and higher loads on Monday mornings. Initially, these fluctuations might be flagged as anomalies and trigger false alerts. After observing this historical data, OpManager automatically adjusts the thresholds to match these predictable changes.
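The idea of learning a weekly cycle can be illustrated with a very simple baseline: average the observed values for each (weekday, hour) slot. This is only a sketch under stated assumptions. It is not OpManager's actual algorithm, which is not described here; the function name `hour_of_week_baseline` and the synthetic sample data are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def hour_of_week_baseline(samples):
    """Average the observed values per (weekday, hour) slot.

    samples: iterable of (datetime, value) pairs.
    Returns a dict mapping (weekday, hour) to the mean value,
    where weekday 0 is Monday.
    """
    buckets = defaultdict(list)
    for timestamp, value in samples:
        buckets[(timestamp.weekday(), timestamp.hour)].append(value)
    return {slot: mean(values) for slot, values in buckets.items()}


# Synthetic data (assumption): 14 days of hourly samples with quiet
# weekends and a load spike every Monday at 09:00.
start = datetime(2024, 1, 1)  # a Monday
samples = []
for offset in range(14 * 24):
    timestamp = start + timedelta(hours=offset)
    if timestamp.weekday() >= 5:
        value = 20  # weekend
    elif timestamp.weekday() == 0 and timestamp.hour == 9:
        value = 80  # Monday morning spike
    else:
        value = 40  # ordinary weekday hour
    samples.append((timestamp, value))

baseline = hour_of_week_baseline(samples)
print("Monday 09:00 baseline:", baseline[(0, 9)])
print("Saturday 09:00 baseline:", baseline[(5, 9)])
```

With a per-slot baseline like this, a high value on Monday morning is compared against what is normal for Monday morning rather than against a single fixed number.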
For each hour, OpManager's predictive algorithms provide a forecast value based on previously observed data patterns and behavior, and the deviation values configured by the user are applied on top of that forecast. Note that the deviation can be specified either as values or as percentages. For example, consider the following deviation values.
| Attention | Trouble | Critical |
|---|---|---|
| 5 | 8 | 15 |
The examples below show how these deviations are applied in each case.
1. Deviation in terms of value: If the forecast value for the CPU utilization of a device is 34 for the first hour of the day (0:00 - 1:00), then the corresponding value for raising an alert with severity "Attention" would be 34 + 5 = 39 (forecast value + Attention deviation). The Trouble and Critical values are calculated every hour in the same way. The calculated values for 5 consecutive hours with different forecast values would be as follows:
| Hour | Forecast value | Attention value | Trouble value | Critical value |
|---|---|---|---|---|
| 0:00 - 1:00 | 34 | 39 | 42 | 49 |
| 1:00 - 2:00 | 36 | 41 | 44 | 51 |
| 2:00 - 3:00 | 44 | 49 | 52 | 59 |
| 3:00 - 4:00 | 58 | 63 | 66 | 73 |
| 4:00 - 5:00 | 54 | 59 | 62 | 69 |
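The value-based calculation above can be reproduced with a few lines of Python. This is a minimal sketch of the arithmetic only; the names `VALUE_DEVIATIONS` and `value_thresholds` are hypothetical and are not part of OpManager.

```python
# Deviation values from the example (absolute units).
VALUE_DEVIATIONS = {"Attention": 5, "Trouble": 8, "Critical": 15}


def value_thresholds(forecast, deviations=VALUE_DEVIATIONS):
    """Return the alert threshold per severity: forecast + deviation."""
    return {severity: forecast + deviation
            for severity, deviation in deviations.items()}


# Forecast values for the five hours shown in the table.
hourly_forecasts = [34, 36, 44, 58, 54]
for hour, forecast in enumerate(hourly_forecasts):
    print(f"{hour}:00 - {hour + 1}:00", forecast, value_thresholds(forecast))
```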
2. Deviation in terms of percentage: If the forecast value for the CPU utilization of a device is 34 for the first hour of the day (0:00 - 1:00), then the corresponding value for raising an alert with severity "Attention" would be 34 + (5% of 34) = 35.7, shown as 36 after rounding (forecast value + Attention deviation percentage of the forecast value). The Trouble and Critical values are calculated every hour in the same way. The calculated values, rounded to the nearest whole number, for 5 consecutive hours with different forecast values would be as follows:
| Hour | Forecast value | Attention value | Trouble value | Critical value |
|---|---|---|---|---|
| 0:00 - 1:00 | 34 | 36 | 37 | 39 |
| 1:00 - 2:00 | 36 | 38 | 39 | 41 |
| 2:00 - 3:00 | 44 | 46 | 48 | 51 |
| 3:00 - 4:00 | 58 | 61 | 63 | 67 |
| 4:00 - 5:00 | 54 | 57 | 58 | 62 |
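The percentage-based figures in the table can be checked with the short sketch below. Rounding to the nearest whole number is an assumption inferred from the table values, and the names `PERCENT_DEVIATIONS` and `percent_thresholds` are hypothetical rather than OpManager identifiers.

```python
# Deviation percentages from the example.
PERCENT_DEVIATIONS = {"Attention": 5, "Trouble": 8, "Critical": 15}


def percent_thresholds(forecast, deviations=PERCENT_DEVIATIONS):
    """Return the alert threshold per severity.

    Each threshold is forecast + (deviation % of forecast), rounded to the
    nearest whole number (assumption: this matches the table above).
    """
    return {severity: round(forecast + forecast * percent / 100)
            for severity, percent in deviations.items()}


# Forecast values for the five hours shown in the table.
hourly_forecasts = [34, 36, 44, 58, 54]
for hour, forecast in enumerate(hourly_forecasts):
    print(f"{hour}:00 - {hour + 1}:00", forecast, percent_thresholds(forecast))
```

Because the deviation scales with the forecast, the gap between the forecast and each threshold widens during busier hours and narrows during quieter ones.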
3. Advanced Configuration: In addition to deviation values, OpManager offers the following options to tailor alert behavior.
Suppress limits: Configure a value below which alerts will be automatically suppressed, preventing unnecessary alarms for minor deviations.
Example: If the configured adaptive threshold for CPU utilization is set to 50, and you configure a suppress limit of 52, any actual value below 52 will not trigger an alert.
Static limits: Define a fixed upper threshold that, when crossed, will always trigger an alert regardless of the configured adaptive threshold values.
Example: If you set a static upper limit of 90 for CPU utilization, an alert will be triggered immediately when the usage reaches or exceeds 90, even if the adaptive threshold value is higher.
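The way suppress limits and static limits interact with an adaptive threshold can be sketched as a small decision function. This is an illustration under assumptions, not OpManager's implementation: the function name `evaluate_alert`, the returned labels, the choice to check the static limit first, and the use of "greater than or equal to" for the adaptive comparison are all hypothetical.

```python
from typing import Optional


def evaluate_alert(actual: float,
                   adaptive_threshold: float,
                   suppress_limit: Optional[float] = None,
                   static_limit: Optional[float] = None) -> Optional[str]:
    """Return the reason an alert fires, or None if no alert is raised."""
    # Static limit: reaching or exceeding it always raises an alert
    # (assumption: it takes precedence over the suppress limit).
    if static_limit is not None and actual >= static_limit:
        return "static limit"
    # Suppress limit: values below it never raise an alert.
    if suppress_limit is not None and actual < suppress_limit:
        return None
    # Otherwise fall back to the adaptive threshold
    # (assumption: an alert fires at or above the threshold).
    if actual >= adaptive_threshold:
        return "adaptive threshold"
    return None


# Adaptive threshold 50 with suppress limit 52: 51 is suppressed, 53 alerts.
print(evaluate_alert(51, adaptive_threshold=50, suppress_limit=52))
print(evaluate_alert(53, adaptive_threshold=50, suppress_limit=52))
# Static limit 90 fires even though the adaptive threshold is higher.
print(evaluate_alert(90, adaptive_threshold=95, static_limit=90))
```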
To configure:
Before enabling the Adaptive Thresholds option, note that:
Adaptive thresholds can be enabled globally across OpManager from Settings -> Monitoring -> Adaptive Threshold. Navigate to this page and enable the "Enable Adaptive Threshold" option. You can also enable adaptive thresholds on an individual level from the respective performance monitor, perf group, or device template, and define the deviation levels in either value or percentage.
Once it has been enabled, it can be controlled on various levels based on your requirements: