Availability monitoring in OpManager allows you to continuously monitor the availability of network devices, servers, applications, and services. By monitoring key metrics such as uptime, response times, and outage durations, OpManager helps IT teams maintain high availability and reliability across their network infrastructure. With proactive alerts and real-time monitoring capabilities, OpManager ensures that potential issues are addressed swiftly, minimizing impact on business operations and maintaining optimal performance levels.
This help document covers the steps to troubleshoot the errors encountered in Availability monitoring.
This alert message is generated when the OpManager server fails to contact the monitored device during its periodic availability status poll. This error generally appears in VM environments where the virtual devices run a Windows OS and cannot reach hosts outside their network due to any of the following causes.
This error can occur in your VM when the WinSock and WinSock2 settings are corrupted.
To check these settings, navigate to the following registry paths:
This issue is caused by a duplicate Security Identifier (SID) in a Windows 2008 or Windows 2012 virtual machine that was deployed from a template or cloned from another virtual machine without the guest customization option selected.
To resolve the issue, run the sysprep tool to generate a new security identifier for the virtual machine. To do this:
If you are unable to ping the loopback address (127.0.0.1) or the local machine, the TCP/IP stack may be corrupted.
Turn off User Account Control (UAC) and log in with the domain admin account. Follow the steps below to reset TCP/IP to its original state:
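As a complementary check before resetting anything, the loopback path can also be exercised from a script. The following is a minimal sketch, not part of OpManager: it assumes Python is available on the affected VM, and the function name `loopback_tcp_ok` is hypothetical. It tests TCP over 127.0.0.1 rather than ICMP, so it supplements a loopback ping rather than replacing it.

```python
import socket


def loopback_tcp_ok(timeout=2.0):
    """Return True if a TCP round trip over 127.0.0.1 succeeds."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
            # Port 0 asks the OS to pick any free port on the loopback interface.
            server.bind(("127.0.0.1", 0))
            server.listen(1)
            server.settimeout(timeout)
            port = server.getsockname()[1]

            with socket.create_connection(("127.0.0.1", port), timeout=timeout) as client:
                conn, _ = server.accept()
                with conn:
                    conn.settimeout(timeout)
                    client.sendall(b"ping")
                    return conn.recv(4) == b"ping"
    except OSError:
        # Any socket-level failure (including timeouts) counts as a failed check.
        return False


if __name__ == "__main__":
    print("loopback TCP OK" if loopback_tcp_ok() else "loopback TCP FAILED")
```

If this check fails on a machine where no firewall or security software interferes with loopback traffic, that is consistent with a damaged TCP/IP stack.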
netsh int ip reset resetlog.txt
When you run the reset command, it overwrites the following registry keys, both of which are used by TCP/IP:
The Uptime column in the 'Device Availability' report shows an incorrect value even when the availability is 100%.
The availability data in the reports is fetched from either the hourly or the daily archive table, depending on the time period specified while generating the report. The default time period is 'Last 24 hours'. In the device snapshot page, by contrast, the availability data is fetched from the raw table. If there is a value mismatch in the 'Uptime' column of the generated report, the root cause could be a missing entry for that time period in the respective archive table.
For example, if the availability monitoring interval is 15 minutes, the raw table will have 4 entries for each hour. The average of those 4 values is calculated and pushed to the hourly table. If the OpManager service is down or the database is disconnected (in the case of MSSQL) at the time the data is archived, the update to the hourly table fails. This leads to incorrect data in the generated report.
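The effect of a missed archive run can be illustrated with a small sketch. It is a simplified model, not OpManager's actual schema or calculation: the dictionaries standing in for the raw and hourly tables, the function names, and the assumption that uptime is summed from the hourly rows are all hypothetical.

```python
from statistics import mean

POLLS_PER_HOUR = 4  # 15-minute availability monitoring interval


def archive_hourly(raw, failed_hours=frozenset()):
    """Average each hour's raw samples into an hourly row.

    Hours listed in failed_hours model an archive run that did not complete
    (service down or database disconnected), so no hourly row is written.
    """
    hourly = {}
    for hour, samples in raw.items():
        if hour in failed_hours:
            continue
        hourly[hour] = mean(samples)
    return hourly


def uptime_hours(hourly):
    """Sum the uptime represented by the archived hourly rows (assumed model)."""
    return sum(avg / 100.0 for avg in hourly.values())


# 24 hours of raw data with every poll reporting 100% availability.
raw = {hour: [100.0] * POLLS_PER_HOUR for hour in range(24)}

complete = archive_hourly(raw)
with_gap = archive_hourly(raw, failed_hours={13})

print(uptime_hours(complete))  # 24.0
print(uptime_hours(with_gap))  # 23.0
```

In this model every raw sample is 100%, yet the report built from the hourly rows comes up one hour short because the row for hour 13 was never written.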
This is an environment-specific issue. To avoid it, monitor your database for downtime and the OpManager server for unavailability. Please contact our support team at opmanager-support@manageengine.com for further assistance.
This issue occurs when the monitored device is reachable from the network, but OpManager continues to display it as down. Follow the steps below to verify the polling and status sync conditions: