High availability monitoring in OpManager

Organizations rely heavily on business-critical networks and their underlying infrastructure for seamless service delivery. Apart from the hardware deployed on their networks, they also have a huge array of software-defined network components at their disposal, such as databases, applications, firewalls, and controllers.

With an increased dependence on these digital assets, it is critical that these resources be continuously available. Organizations cannot afford major failures or downtime in these resources because, if they go down, the associated services remain unavailable until the failure is fixed, costing the organization heavily.

One way to remedy this is to use high availability systems. High availability is a design approach that aims for a high level of dependability or operational performance (in this case, availability) and builds in fault tolerance so that the network infrastructure keeps running without downtime.

Learn more on how to configure devices for high availability monitoring in OpManager.

Why high availability?

Even with highly resilient network components, no system is 100% fail-proof. Failures can occur for any number of unexpected reasons, including overloads, power failures, network disconnections, and unscheduled maintenance. The goal of high availability is to ensure minimal or zero downtime so that these failures do not affect the bottom line of business operations through data and monetary losses.

IT admins use high availability together with fault resolution metrics like mean time to repair (MTTR) and mean time between failures (MTBF) to minimize downtime and maintain high overall network availability.
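To make the relationship between these metrics concrete, here is a minimal sketch of the standard steady-state availability calculation, MTBF / (MTBF + MTTR). The function names and the sample figures are illustrative assumptions, not values or APIs taken from OpManager.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the expected fraction of time spent up."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def annual_downtime_hours(avail: float, hours_per_year: float = 8760.0) -> float:
    """Expected downtime per (non-leap) year for a given availability."""
    return (1.0 - avail) * hours_per_year


# Hypothetical figures: one failure every 999 hours on average, 1 hour to repair.
a = availability(mtbf_hours=999.0, mttr_hours=1.0)
print(f"availability: {a:.3%}")
print(f"expected downtime: {annual_downtime_hours(a):.2f} hours/year")
```

With these sample numbers, the component is available 99.9% of the time, which still leaves roughly 8.76 hours of expected downtime per year. Shortening MTTR or lengthening MTBF both push that figure down.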

Principles of high availability

To design and build systems with continuous availability, you must consider these critical principles:

Eliminate single points of failure: A single point of failure is a component that, if it fails, could cause the entire network or system to fail (like a database server, a firewall, or a switch). A high availability implementation builds in redundancy to eliminate single points of failure and, by extension, substantial amounts of downtime. Whenever a network component fails, another instance of the failed component takes over, providing uninterrupted service delivery.
Ensure crossover reliability: In the event of a node failure, the backup node has to take over from the primary node, without data loss and without affecting the operational performance of the network. In other words, there should be reliable crossover configuration between the redundant systems.
Implement failure detection: Most importantly, whenever there is a failure in any of the components of a high availability cluster, admins need to know right away. The network monitoring systems used by organizations must be capable of delivering alert-based fault indications, to help the admin teams fix issues before they affect the overall network performance.
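The effect of removing a single point of failure can be estimated with a short calculation. The sketch below assumes that redundant nodes fail independently and that crossover is perfectly reliable, which are simplifying assumptions rather than guarantees; the function name is hypothetical.

```python
def redundant_availability(node_availability: float, nodes: int) -> float:
    """Availability of a group that stays up while at least one node is up.

    Assumes independent failures and flawless crossover between nodes.
    """
    if not 0.0 <= node_availability <= 1.0:
        raise ValueError("node_availability must be between 0 and 1")
    if nodes < 1:
        raise ValueError("at least one node is required")
    # The group is down only when every node is down at the same time.
    return 1.0 - (1.0 - node_availability) ** nodes


# Hypothetical example: a single 99%-available firewall versus a redundant pair.
print(f"single node:    {redundant_availability(0.99, 1):.4%}")
print(f"redundant pair: {redundant_availability(0.99, 2):.4%}")
```

In practice, correlated failures (a shared power feed, for example) and imperfect crossover reduce the gain, which is why the second and third principles matter as much as the first.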

High availability monitoring in OpManager

Continuously monitoring and managing business-critical network components is crucial to achieving and sustaining high availability. ManageEngine OpManager, an integrated network monitoring tool, helps you deploy and monitor high availability systems, and offers a wide range of other network monitoring features. High availability systems in OpManager can be configured with either the active-active mode or the active-passive mode.

Active-active configuration: An active-active system is one that consists of two instances that deliver the same type of service at the same time. In the event of a failure, traffic intended for the failed node is redirected towards another node.
Active-passive configuration: In this system, as the name suggests, not all instances are active all the time. An active-passive configuration consists of a completely duplicate, idle instance of the main node. This secondary (passive) node remains on standby, coming online only when the associated primary (active) node fails.
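The difference between the two modes can be sketched as a choice of which nodes serve traffic at any moment. This is a conceptual illustration only; the `Node` type and both functions are hypothetical and do not represent how OpManager or any particular device implements failover.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    healthy: bool


def active_active_targets(nodes: List[Node]) -> List[str]:
    """Every healthy node serves traffic; a failed node's share goes to the rest."""
    return [n.name for n in nodes if n.healthy]


def active_passive_target(primary: Node, secondary: Node) -> Optional[str]:
    """The secondary serves traffic only when the primary is down."""
    if primary.healthy:
        return primary.name
    if secondary.healthy:
        return secondary.name
    return None  # both nodes are down, so nothing can serve traffic


# Hypothetical pair of firewalls with the first one failed.
fw_a, fw_b = Node("fw-a", healthy=False), Node("fw-b", healthy=True)
print(active_active_targets([fw_a, fw_b]))
print(active_passive_target(primary=fw_a, secondary=fw_b))
```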

With the latest version of OpManager, you can effortlessly configure, monitor, and manage the following categories of devices with high availability:

Servers (and derived categories)
VM servers
Domain controllers
Firewalls
Routers
Switches
Unified computing systems (UCS)
Load balancers
WAN accelerators
Wireless LAN controllers

OpManager uses availability monitoring protocols to check the availability of network components, including devices configured for high availability. The devices are polled using ICMP, and based on the responses received, OpManager keeps track of the uptime of these devices. For an active-active configuration, both devices are polled separately and monitored for availability and performance independently of each other. In an active-passive configuration, however, the secondary device is monitored only for availability since it is on standby.
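As a rough illustration of how uptime can be derived from poll responses, the sketch below counts successful and total polls per device. The `UptimeTracker` class is a hypothetical helper, and the poll results are hard-coded rather than produced by real ICMP requests; it is not OpManager's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UptimeTracker:
    polls: int = 0
    successes: int = 0

    def record(self, responded: bool) -> None:
        """Record the outcome of one availability poll."""
        self.polls += 1
        if responded:
            self.successes += 1

    @property
    def uptime_percent(self) -> Optional[float]:
        """Share of polls that got a response, or None before the first poll."""
        if self.polls == 0:
            return None
        return 100.0 * self.successes / self.polls


# Active-active: each device gets its own tracker and is polled separately.
trackers = {"node-1": UptimeTracker(), "node-2": UptimeTracker()}
sample_polls = {
    "node-1": [True, True, True, True],
    "node-2": [True, False, True, True],
}
for device, results in sample_polls.items():
    for responded in results:
        trackers[device].record(responded)
    print(device, trackers[device].uptime_percent)
```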

Alarm-based failure detection with high availability

Being able to identify failures as they occur is critical for IT admins. This helps them take the necessary steps to rectify failures before substantial damage is done. OpManager's alert-based fault indication gives you insights into the availability of the devices configured with high availability by raising an alarm whenever any device goes down. The architecture of alarm generation varies based on the configuration mode used to set up the devices.

Alarms in an active-active setup

In an active-active configuration, both devices are polled for availability, and an alarm is raised if either device fails to respond to a ping. With each subsequent failed ping, another alarm is raised with increasing severity, notifying the administrator of the unavailability of the corresponding device.
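The escalation described above can be sketched as a counter of consecutive failed pings mapped onto a severity ladder. The ladder below, and the choice to cap at the top level and reset on a successful ping, are assumptions made for illustration; they are not OpManager's actual thresholds.

```python
from typing import Iterable, List, Optional

# Hypothetical severity ladder, lowest to highest.
SEVERITY_LADDER = ["attention", "trouble", "critical"]


def severity_for(consecutive_failures: int) -> Optional[str]:
    """Map a failure streak to a severity, capping at the highest level."""
    if consecutive_failures <= 0:
        return None
    index = min(consecutive_failures, len(SEVERITY_LADDER)) - 1
    return SEVERITY_LADDER[index]


def alarms_for(ping_results: Iterable[bool]) -> List[Optional[str]]:
    """Return the severity raised after each failed ping, in order.

    A successful ping resets the streak (an assumption of this sketch).
    """
    streak = 0
    raised: List[Optional[str]] = []
    for responded in ping_results:
        if responded:
            streak = 0
            continue
        streak += 1
        raised.append(severity_for(streak))
    return raised


# Each device in an active-active pair is evaluated on its own ping history.
print(alarms_for([True, False, False, False, False]))
```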

Alarms in an active-passive setup

Unlike an active-active configuration, in an active-passive setup one node is always active while the secondary node is always on standby. In this case, if the secondary node goes down, the primary node is left with no backup, so an alarm is raised for the primary device to notify the administrator about the unavailability of the secondary device.

Likewise, if the primary goes down, an alert is raised, notifying the admin of the device's unavailability, and polling is resumed through the configured secondary IP. If, however, both the primary and the secondary devices are down, alerts are raised with increasing severity until one of them is available for service.
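The three failure cases above can be summarized as a small decision function. This is a sketch of the described behavior only: the `Decision` type, its field names, and the message strings are hypothetical, and attaching the both-down alarm to the primary device is an assumption, since the text does not say which device carries it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Decision:
    alarm: Optional[str]     # alarm message, or None when everything is up
    alarm_on: Optional[str]  # device the alarm is attached to
    poll_via: Optional[str]  # address used for polling: "primary" or "secondary"
    escalate: bool           # whether severity keeps increasing on each poll


def evaluate(primary_up: bool, secondary_up: bool) -> Decision:
    if primary_up and secondary_up:
        return Decision(None, None, "primary", False)
    if primary_up:
        # Secondary is down: the alarm is raised on the primary device.
        return Decision("secondary unavailable; primary has no backup",
                        "primary", "primary", False)
    if secondary_up:
        # Primary is down: polling resumes through the secondary IP.
        return Decision("primary unavailable", "primary", "secondary", False)
    # Both are down: alerts escalate until one of them is available again.
    # Assumption: the alarm is attached to the primary device.
    return Decision("primary and secondary unavailable", "primary", None, True)


for state in [(True, True), (True, False), (False, True), (False, False)]:
    print(state, evaluate(*state))
```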

Notification profiles for high availability

With the latest update to OpManager, you can configure notification profiles to help with high availability monitoring. Notification profiles are an advanced class of alarm systems that let you choose the type of alarm raised and predefine the action to be performed when a notification profile is triggered. With this feature, high availability clusters can be integrated with an email- and SMS-based notification system that alerts the user in case of a failure. This helps you maintain true high availability and minimal downtime.

Note: During the configuration of a notification profile, you must add both the primary and secondary devices to monitor an active-active setup. For an active-passive setup, adding just the primary device is enough.
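The rule in this note can be expressed as a simple selection. The function below is a hypothetical helper for illustration; notification profiles themselves are configured in the OpManager interface, not through this code.

```python
from typing import List


def devices_for_profile(mode: str, primary: str, secondary: str) -> List[str]:
    """Devices to add to a notification profile for a given HA mode."""
    if mode == "active-active":
        return [primary, secondary]  # both devices must be added
    if mode == "active-passive":
        return [primary]             # the primary device alone is enough
    raise ValueError(f"unknown high availability mode: {mode!r}")


# Hypothetical device names.
print(devices_for_profile("active-active", "core-sw-1", "core-sw-2"))
print(devices_for_profile("active-passive", "fw-main", "fw-standby"))
```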

Try OpManager's demo to check out its high availability monitoring capabilities yourself, or download a 30-day, free trial to explore other features of this top-of-the-line network monitoring software.
