Self-Monitoring


Self-monitoring helps you detect issues across the components of Applications Manager's own services and keeps them healthy and performing well, so that monitoring continues uninterrupted. When a problem occurs, you are given critical information about it to help you drill down to the root cause and prevent possible service outages.

Currently, Applications Manager periodically checks the health of the following components:

  • Server Monitoring
  • Database Monitoring
  • JVM Monitoring
  • Load-specific performance attributes

Diagnostic Detail Configuration

You can modify the poll interval, consecutive poll count and threshold value for each attribute as follows:

  • Under the Settings tab, click Self Monitoring under Tools.
  • The Diagnostic Details table is displayed with a description of each diagnostic.
  • Configure the Poll Interval, Consecutive Polls and Threshold Value by clicking the edit icon.

Diagnostics Alerts

Diagnostics alerts and their current status are displayed in a band at the top of the Applications Manager window. You can also view them under the Alarms tab: click the Diagnostics Alert button to see a list of diagnostic alerts with their status, time of generation and description. Click an alert message to view its message history and add comments.

  • All users with the ADMIN role receive mail notifications whenever a problem is raised and when it is cleared or discarded.
  • When a problem is detected, it is shown in the Error state [Red].
  • When corrective action is taken, either manually or automatically, the alert moves from the Error state to the Clear state [Green].

The supported attributes are categorized and described below:

Server Monitoring

Attribute name | Description
CPU Usage | Monitors the CPU utilization of the server running Applications Manager. By default, an alert is raised when CPU usage exceeds the threshold value of 90% for the last 15 minutes (polling interval 5 minutes, consecutive polls count 3). The alert lists the top 10 processes consuming the most CPU.
Memory Usage | Monitors the memory utilization of the server running Applications Manager. By default, an alert is raised when memory usage exceeds the threshold value of 90% for the last 15 minutes (polling interval 5 minutes, consecutive polls count 3). The alert lists the top 10 processes consuming the most memory.
Disk Usage | Monitors the utilization of the disk on which Applications Manager is installed. By default, an alert is raised when disk usage exceeds the threshold value of 90% for the last 60 minutes (polling interval 60 minutes, consecutive polls count 1).
Disk I/O Usage | Monitors the busy time of the physical disk on which Applications Manager is installed. By default, an alert is raised when disk busy time exceeds the threshold value of 90% for the last 15 minutes (polling interval 5 minutes, consecutive polls count 3).
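The "polling interval and consecutive polls count" rule used by these attributes can be illustrated with a minimal sketch. It assumes that an alert is raised only when every one of the last N polled values is above the threshold; the class and method names are hypothetical and are not part of Applications Manager.

```python
from collections import deque


class ConsecutivePollCheck:
    """Hypothetical helper: alert only after N consecutive polls exceed the threshold."""

    def __init__(self, threshold_percent=90.0, consecutive_polls=3):
        self.threshold_percent = threshold_percent
        self.consecutive_polls = consecutive_polls
        # Keep only the most recent N polled values.
        self.samples = deque(maxlen=consecutive_polls)

    def record(self, value_percent):
        """Store one polled value and return True if an alert should be raised."""
        self.samples.append(value_percent)
        return (
            len(self.samples) == self.consecutive_polls
            and all(v > self.threshold_percent for v in self.samples)
        )


# CPU Usage defaults from the table: 90% threshold, 3 consecutive polls
# (one poll every 5 minutes, so 15 minutes in total).
cpu_check = ConsecutivePollCheck(threshold_percent=90.0, consecutive_polls=3)
results = [cpu_check.record(v) for v in (95.0, 92.0, 88.0, 91.0, 93.0, 97.0)]
print(results)
```

With these sample values, only the last poll triggers an alert, because it is the first point at which three polls in a row are above 90%. Disk Usage, with a consecutive polls count of 1, would alert on the first value above the threshold.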

Database Monitoring

Attribute name | Description
DB Connectivity | Monitors connectivity to the database.
  • Alerts cannot be raised for DB Connectivity, because the DB connection itself is lost. The diagnostic message is written to the logs ( /logs/diagnostics/selfdiagnostics.txt).
  • This attribute is not shown in the Diagnostic configuration details page.
  • An entry is made in the logs when the DB has been down for 5 minutes and the default setting is enabled. This is supported for both MSSQL and PGSQL.
Database File Size | Monitors the DB file size. By default, an alert is raised if the file size exceeds 90% of the total size.
Database Log Size | Monitors the DB log size. By default, an alert is raised if the log size exceeds 90% of the total size.

 

Note:
  • The attribute "DB Connectivity" is supported for both MSSQL and PGSQL, while Database File Size and Database Log Size are supported only for MSSQL.
  • If the total size of the DB file and log is unlimited (MSSQL v12 and above), the total size of the disk on which the DB is installed is used instead, and an alert is raised if the used size of that disk exceeds the threshold (90% by default).
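The fallback described in the note can be sketched as follows. This is only an illustration of the rule as stated here, assuming an unlimited file size is represented by `None`; the function names and sample figures are hypothetical.

```python
def size_usage_percent(used_bytes, max_bytes, disk_used_bytes, disk_total_bytes):
    """Return usage as a percentage of the applicable total size.

    max_bytes is None when the DB file or log has no size limit. In that
    case the disk hosting the database is measured instead (assumption
    based on the note above).
    """
    if max_bytes is None:
        return 100.0 * disk_used_bytes / disk_total_bytes
    return 100.0 * used_bytes / max_bytes


def should_alert(usage_percent, threshold_percent=90.0):
    """Alert when usage exceeds the threshold (90% by default)."""
    return usage_percent > threshold_percent


GB = 1024 ** 3

# File limited to 50 GB and currently 46 GB: measured against its own limit.
limited = size_usage_percent(46 * GB, 50 * GB, 300 * GB, 500 * GB)

# Same file with no size limit: measured against the disk (300 GB of 500 GB used).
unlimited = size_usage_percent(46 * GB, None, 300 * GB, 500 * GB)

print(should_alert(limited), should_alert(unlimited))
```

In this example the limited file raises an alert, while the unlimited one does not, because the disk itself is well below the threshold.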

JVM Monitoring

Attribute name | Description
JVM Memory Usage | Monitors JVM memory usage. By default, an alert is raised when JVM memory usage exceeds the threshold value of 90% for the last 15 minutes (polling interval 5 minutes, consecutive polls count 3).
JVM Thread Blocked | Monitors blocked JVM threads. By default, an alert is raised when blocked JVM threads exceed the threshold value of 50% for the last 15 minutes (polling interval 5 minutes, consecutive polls count 3).

Load-specific performance attributes

Attribute name | Description
Polling Delay | Based on a load factor calculation. The top 50 server monitors that take the most time for data collection are selected, and their polled values for the last 1 hour are checked against the polling interval. If the values actually polled come to less than 70% of those expected, an alert is raised indicating a delay in the polling interval for that particular monitor. This check happens 1 hour after the build has started.
Polling Stops | An alert is raised when polling has stopped for the past 1 hour for a particular monitor. This check happens 1 hour after the build has started.
Syncing Delay | By default, an alert is raised in the Admin Server when a particular managed server has not synced data for the past 30 minutes.
Id Usage | A warning alert is raised when Id usage crosses the set threshold value for the tables (AM_ManagedObject, AM_PARENTCHILDMAPPER, Alert, Event), and a critical alert is raised when Id usage crosses the range allocated to the managed server. The Admin is also automatically prompted to disable sync for the managed server, since the Id is out of range. This check happens 1 hour after the build starts.
Child Monitor Count on Monitors | An alert is raised for a monitor when its child monitor count exceeds the configured threshold value.
Data Collection time exceeds Polling Interval | An alert is raised for a monitor when its data collection time exceeds the configured polling interval.
Port Usage | An alert is raised if port creation exceeds the threshold value of 60% for the last 15 minutes.
System Requirements | An alert is raised if the added monitors do not satisfy the recommended system requirements.
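One way to read the Polling Delay check is sketched below. It assumes that the number of polls completed in the last hour is compared with the number the polling interval would produce, and that a ratio below 70% counts as a delay. The function names and the monitor data are hypothetical, and the product's actual load factor calculation may differ.

```python
def slowest_monitors(monitors, limit=50):
    """Return the monitors with the longest data collection time, slowest first.

    monitors is a list of (name, collection_seconds) tuples.
    """
    return sorted(monitors, key=lambda m: m[1], reverse=True)[:limit]


def expected_polls(polling_interval_minutes, window_minutes=60):
    """Number of polls the interval should produce within the window."""
    return window_minutes // polling_interval_minutes


def is_polling_delayed(actual_polls, polling_interval_minutes,
                       window_minutes=60, min_ratio=0.70):
    """True when fewer than min_ratio of the expected polls completed."""
    expected = expected_polls(polling_interval_minutes, window_minutes)
    if expected == 0:
        # Interval is longer than the window, so there is nothing to compare.
        return False
    return actual_polls / expected < min_ratio


# A monitor polled every 5 minutes should produce 12 values per hour.
print(expected_polls(5))          # 12
print(is_polling_delayed(8, 5))   # 8 of 12 is about 67%, so delayed
print(is_polling_delayed(9, 5))   # 9 of 12 is 75%, so not delayed
```

Under this reading, a monitor on a 5-minute interval is flagged once it completes 8 or fewer polls in the hour.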

diagnosticconfig.properties - This properties file is used to add the diagnostics entries to the AM_DIAGNOSTICS_CONF table.