Linux Server Monitoring


Overview

ManageEngine® Applications Manager provides out-of-the-box Linux server monitoring. It helps the operations team ensure that servers are up (ping) and running at peak performance by monitoring CPU usage, memory utilization, processes, disk utilization and disk I/O statistics.

Creating a new Linux monitor

Prerequisites for monitoring Linux server metrics: Click here

Using the REST API to add a new Linux server monitor: Click here

Follow the steps given below to create a new Linux server monitor:

  1. Select the Mode of Monitoring (Telnet, SSH or SNMP). For IBM AIX, HP Unix, Tru64 Unix, only Telnet and SSH are supported. For Novell, only SNMP is supported.
  2. If Telnet, provide the port number (default is 23) and user name and password information of the server.
  3. If SSH, provide the port number (default is 22) and user name and password information of the server. Alternatively, you can use Public Key Authentication (user name and private key) and supply a passphrase if the private key is protected with one.

    Note: To locate the public/private key files, open a command prompt, run cd .ssh/ and then open id_dsa.pub or id_rsa.pub (public key) or id_dsa or id_rsa (private key) to get the keys.

  4. If SNMP, provide the port at which it is running (default is 161) and SNMP Community String (default is 'public'). This requires no user name and password information.
  5. For Telnet/SSH mode of monitoring, specify the command prompt value, which is the last character in your command prompt. Default value is $ and possible values are >, #, etc.

    Note: On the server that you are trying to monitor through SSH, the PasswordAuthentication variable must be set to 'yes' for data collection to happen. To verify this, open the file /etc/ssh/sshd_config and check the value of the PasswordAuthentication variable. If it is set to 'no', change it to 'yes' and restart the SSH daemon using the command /etc/rc.d/sshd restart.
  6. Choose the Monitor Group from the combo box to which you want to associate the Monitor (optional).
  7. Click Add Monitor(s). This discovers the host or server from the network and starts monitoring them.
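
The mode rules and the PasswordAuthentication check described in the steps above can be sketched in Python. This is a minimal illustration and not part of Applications Manager: the function names and the OS labels used as dictionary keys are hypothetical, and the sshd_config parsing is deliberately simplified (it ignores Match blocks and Include files, and assumes the OpenSSH default of 'yes' when the keyword is absent).

```python
# Default ports per monitoring mode, as listed in the steps above.
DEFAULT_PORTS = {"TELNET": 23, "SSH": 22, "SNMP": 161}

# Modes allowed for specific OS types (step 1). Key spellings are hypothetical labels.
RESTRICTED_MODES = {
    "IBM AIX": {"TELNET", "SSH"},
    "HP Unix": {"TELNET", "SSH"},
    "Tru64 Unix": {"TELNET", "SSH"},
    "Novell": {"SNMP"},
}


def resolve_port(mode, os_type="Linux", port=None):
    """Validate the mode for the OS type and return the port to use."""
    mode = mode.upper()
    if mode not in DEFAULT_PORTS:
        raise ValueError(f"unknown monitoring mode: {mode}")
    allowed = RESTRICTED_MODES.get(os_type, set(DEFAULT_PORTS))
    if mode not in allowed:
        raise ValueError(f"{mode} is not supported for {os_type}")
    return port if port is not None else DEFAULT_PORTS[mode]


def password_auth_enabled(sshd_config_text):
    """Return True if password logins are enabled in the given sshd_config text."""
    for raw_line in sshd_config_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        # sshd keywords are case-insensitive; the first occurrence takes effect.
        if len(parts) >= 2 and parts[0].lower() == "passwordauthentication":
            return parts[1].lower() == "yes"
    return True  # keyword absent: assume the OpenSSH default


if __name__ == "__main__":
    print(resolve_port("ssh"))
    print(password_auth_enabled("PasswordAuthentication no\n"))
```

On a real server, the text passed to password_auth_enabled would be the contents of /etc/ssh/sshd_config.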

Monitored Parameters

Applications Manager monitors the key performance indicators of Linux servers to detect any performance problems. These indicators include CPU, memory, disk, etc.

  • Availability tab shows the availability history of the Linux server for the past 24 hours or 30 days.
  • Performance tab shows some key performance indicators of the Linux server such as physical memory utilization, CPU utilization, response time and swap memory utilization along with heat charts for these attributes. This tab also shows the health status and events for the past 24 hours or 30 days.
  • List view tab lists all the Linux servers monitored by Applications Manager along with their overall availability and health status. It enables you to perform bulk admin configurations.

Click on the individual monitors listed to view detailed performance metrics. The performance metrics have been categorized into 6 different tabs:

* Network Interfaces details are monitored only for Linux Monitors added in SNMP mode.

Overview

This tab provides a high-level overview of the health and performance of the Linux server along with information pertaining to the processes and services running on the system.

Parameter Description
Monitor Information
Name The name of the Linux server monitor.
System Health Denotes the health status of the Linux server (Clear, Critical or Warning).
Type Denotes the type you are monitoring.
Host Name The host name of the Linux system.
Host OS The main OS installed on the system.
Last Polled at Specifies the time at which the last poll was performed.
Next Poll at Specifies the time at which the next poll is scheduled.
Today's Availability Shows the overall availability status of the server for the day. You can also view 7/30 reports and the current availability status of the server.

 

Parameter Description
Thread Count The number of threads running on the Linux machine.
Process Count The number of processes. Too many open processes can degrade server performance. A warning that the process count is increasing lets users take corrective action before an issue arises.
Zombie Process Count The number of zombie processes. Zombie processes can hold ports open with no control. Seeing when a zombie process is spawned lets you deal with it before any issues arise.
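
As a rough illustration of how the process and zombie counts above can be derived on a Linux host, the following Python sketch parses the output of `ps -eo pid,stat,comm`. It is not how Applications Manager collects the data (the document does not describe that); the function name and the sample output are made up for the example.

```python
def summarize_processes(ps_output):
    """Count processes and zombies in the output of `ps -eo pid,stat,comm`."""
    lines = ps_output.strip().splitlines()
    # The first line is the header (PID STAT COMMAND); the rest are processes.
    rows = [line.split(None, 2) for line in lines[1:]]
    states = [row[1] for row in rows if len(row) >= 2]
    return {
        "process_count": len(states),
        # A process state starting with "Z" marks a zombie (defunct) process.
        "zombie_count": sum(1 for state in states if state.startswith("Z")),
    }


SAMPLE_OUTPUT = """\
  PID STAT COMMAND
    1 Ss   systemd
  412 S    sshd
  977 Z    worker
 1203 R+   ps
"""

if __name__ == "__main__":
    print(summarize_processes(SAMPLE_OUTPUT))
```

On a live host, the input could come from `subprocess.run(["ps", "-eo", "pid,stat,comm"], capture_output=True, text=True).stdout`.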

You can use the Custom Fields option in the 'Monitor Information' section to configure additional fields for the monitor.

The Overview tab shows dials for CPU, memory and disk utilization. You can click on these dials to view detailed graphs and charts for these attributes. The graphs available are History report, hour of day report, day of week report and heat chart. These graphs can be generated for both real time and historical data.

The CPU and memory utilization - last six hours graph shows the memory usage and CPU usage values for the last six hours. The attributes shown here are Swap Memory Utilization, Physical Memory Utilization (in % and MB) and CPU utilization (%).

The Breakup of CPU Utilization graph provides a break up of performance metrics for the entire system processor with attributes such as run queue, blocked process, user time(%), system time(%), I/O wait(%), idle time(%) and interrupts/sec.

The System Load graph gives you an idea of the amount of work that the system performs. The system load during the last one-, five- and fifteen-minute periods is represented by the parameters Jobs in Minute, Jobs in 5 minutes and Jobs in 15 minutes.
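
On Linux, the one-, five- and fifteen-minute load averages are exposed as the first three fields of /proc/loadavg. The sketch below, with a hypothetical function name and dictionary keys chosen to mirror the parameter names above, shows how such a line maps onto the three values. Whether Applications Manager reads this file or uses another source is not stated here.

```python
def parse_loadavg(text):
    """Map the first three fields of /proc/loadavg to the three load values."""
    one, five, fifteen = (float(value) for value in text.split()[:3])
    return {
        "jobs_in_minute": one,
        "jobs_in_5_minutes": five,
        "jobs_in_15_minutes": fifteen,
    }


if __name__ == "__main__":
    # Example line in the /proc/loadavg format.
    print(parse_loadavg("0.20 0.18 0.12 1/80 11206\n"))
```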

The Process Details section shows information about the processes running on the Linux server. You can add processes for monitoring using the Add New Process option. You can also delete unwanted processes and enable/disable reports for specific processes. Click on any of the attributes listed to view more details.

The Monitors in this System section shows the availability and health of the monitors configured in this server. To add new monitors for monitoring, use the Add Monitors option.

CPU

This tab provides the CPU usage statistics of the Linux server. The tab includes two graphs - one that displays the CPU utilization by CPU Cores and another that shows the Breakup of CPU utilization - by CPU cores. You can view additional reports by clicking the graphs present in the Breakup of CPU Utilization - by CPU cores section. These reports include Break up of CPU Utilization (%) Vs Time, User Time (%) Vs Time, System Time (%) Vs Time, I/O Wait Time (%) Vs Time, Idle Time (%) Vs Time, CPU Utilization (%) Vs Time and Interrupts/sec Vs Time for all the CPU cores.

The CPU tab also shows the following performance metrics:

Parameter Description Monitoring Mode
Telnet/SSH SNMP
Core The name of the CPU core    
User Time(%) The percentage of time that the processor spends on User mode operations. This generally means application code.
System Time(%) The percentage of time that the processor spends on kernel (system) mode operations.
I/O Wait Time(%) The percentage of time that the processor spends waiting for I/O to complete.
Idle Time(%) The time when the CPU is idle (not being used by any program)
CPU Utilization(%) Specifies the total CPU used by the system
Interrupts/sec The rate at which CPU handles interrupts from applications or hardware each second. If the value for Interrupts/sec is high over a sustained period of time, there could be hardware issues.

You can also view graphs for these attributes by selecting the necessary CPU core and then choosing the appropriate attribute.
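
The percentages in the table above are typically derived from the difference between two samples of the per-core counters in /proc/stat. The Python sketch below illustrates that calculation under stated assumptions: user time includes nice time, system time includes irq and softirq time, and CPU utilization is taken as 100 minus idle time. These groupings and the function names are choices made for the example, not the documented behaviour of Applications Manager.

```python
CPU_FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Parse one 'cpu' or 'cpuN' line from /proc/stat into (core, counters)."""
    parts = line.split()
    counters = [int(value) for value in parts[1:1 + len(CPU_FIELDS)]]
    # Older kernels expose fewer columns; treat the missing ones as zero.
    counters += [0] * (len(CPU_FIELDS) - len(counters))
    return parts[0], dict(zip(CPU_FIELDS, counters))


def cpu_breakup(previous_line, current_line):
    """Compute percentage breakup between two samples of the same core."""
    core, previous = parse_cpu_line(previous_line)
    _, current = parse_cpu_line(current_line)
    delta = {name: current[name] - previous[name] for name in CPU_FIELDS}
    total = sum(delta.values())
    if total <= 0:
        raise ValueError("samples must be taken at different times")

    def pct(*names):
        return 100.0 * sum(delta[name] for name in names) / total

    idle = pct("idle")
    return {
        "core": core,
        "user_time_pct": pct("user", "nice"),
        "system_time_pct": pct("system", "irq", "softirq"),
        "io_wait_time_pct": pct("iowait"),
        "idle_time_pct": idle,
        "cpu_utilization_pct": 100.0 - idle,
    }


if __name__ == "__main__":
    print(cpu_breakup("cpu0 100 0 50 800 50 0 0 0", "cpu0 200 0 100 1600 100 0 0 0"))
```

In practice, the two lines would be read from /proc/stat a few seconds apart for the same core.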

Disk

This tab displays disk usage and disk I/O statistics of the Linux server.

Parameters Description
Disk Utilization
Disk The name of the disk drive.
Used (%) Denotes how much disk space out of the total disk space has actually been used (in percentage)
Used (MB) The disk space used, in megabytes.
Free (%) The percentage of total usable space on the disk that is free.
Free (MB) The unallocated space on the disk, in megabytes.
Disk I/O Statistics
Transfers/sec The number of read/write operations on the disk that occur each second.
Writes/sec The number of write operations on the disk that occur each second.
Reads/sec The number of read operations on the disk that occur each second.
% Busy Time The percentage of time the disk was busy.
Average Queue Length The average number of both read and write requests that were queued for the disk during the sample interval.
Inode Usage
Inode The name of the Inode.
Total The total number of Inodes available in that particular disk.
Used The number of Inodes used in that particular disk.
Free The remaining number of Inodes that are available in that particular disk.
Used (%) The number of Inodes used in that particular disk, in percentage.
Free (%) The remaining number of Inodes that are available in that particular disk, in percentage.
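
The relationship between the used/free figures in the Disk Utilization and Inode Usage tables can be sketched as follows. The helper names and dictionary keys are hypothetical, and read_filesystem uses the standard os.statvfs call only as one possible data source on a Linux host; the document does not say how Applications Manager gathers these values.

```python
import os

MB = 1024 * 1024


def disk_utilization(total_bytes, free_bytes):
    """Return used/free space in MB and as percentages of the total."""
    used_bytes = total_bytes - free_bytes
    used_pct = 100.0 * used_bytes / total_bytes if total_bytes else 0.0
    return {
        "used_mb": used_bytes / MB,
        "free_mb": free_bytes / MB,
        "used_pct": used_pct,
        "free_pct": 100.0 - used_pct if total_bytes else 0.0,
    }


def inode_usage(total_inodes, free_inodes):
    """Return total/used/free inode counts and percentages."""
    used = total_inodes - free_inodes
    used_pct = 100.0 * used / total_inodes if total_inodes else 0.0
    return {
        "total": total_inodes,
        "used": used,
        "free": free_inodes,
        "used_pct": used_pct,
        "free_pct": 100.0 - used_pct if total_inodes else 0.0,
    }


def read_filesystem(path="/"):
    """Collect both summaries for the filesystem that contains `path`."""
    stats = os.statvfs(path)
    total = stats.f_blocks * stats.f_frsize
    free = stats.f_bfree * stats.f_frsize
    return disk_utilization(total, free), inode_usage(stats.f_files, stats.f_ffree)


if __name__ == "__main__":
    print(disk_utilization(1000 * MB, 250 * MB))
    print(inode_usage(65536, 16384))
```

Some filesystems report zero total inodes, which is why both helpers guard against division by zero.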

You can also delete disks that have been physically removed using the Delete Orphaned Disk option.

Note: Data collection for Disk I/O statistics and Inode statistics can be enabled from 'Disk I/O Statistics Monitoring' and 'Inode Monitoring' options under Admin → Performance Polling → Servers tab.

Cron Job

Cron jobs are used for scheduling tasks like backups, emails, status checks, etc. in Linux and can have a major impact on the performance of your web servers and applications. Applications Manager makes it easy by continuously monitoring them and helps you gain insight into the execution of important jobs in the back-end systems.

Adding a Cron job monitor

Prerequisites: Click here

  1. Go to the Cron Job tab and click on Add Cron Job.
  2. Enter the following details:
    • Display Name - A user-friendly name for identification.
    • Cron Expression - Expression used for scheduling the cron job.
    • Time Zone - The time zone configured on the remote Linux machine. Select it from the drop-down.
    • Job Script Path - The complete script path that needs to be executed in the cron job.
    • Cron Job Period - The amount of time within which the job should run (in Minutes). If it exceeds the configured time, then the status will be updated as EXCEEDJOBTIME.
  3. After you add a cron job monitor, the curl details for your cron job are displayed. Click on the curl details to copy them, then close the curl details window. You are redirected to the Cron Job tab of Applications Manager automatically.
  4. On the remote Linux machine, open a command prompt and run crontab -e to open the crontab in edit mode. Paste the curl details that you copied earlier, then save and close the crontab.
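
As a reminder of what goes into the Cron Expression field, the sketch below splits a standard five-field crontab expression into its named parts. It assumes the conventional crontab format (minute, hour, day of month, month, day of week); whether Applications Manager accepts other forms is not stated here, and the function name is hypothetical.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_cron_expression(expression):
    """Split a five-field crontab expression into a field-name mapping."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(
            f"expected {len(CRON_FIELDS)} fields, got {len(parts)}: {expression!r}"
        )
    return dict(zip(CRON_FIELDS, parts))


if __name__ == "__main__":
    # "0 2 * * *" runs at 02:00 every day.
    print(split_cron_expression("0 2 * * *"))
```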

The table below describes the details shown for Cron jobs running on the Linux server.

Parameters Description
Cron Job Details:
Cron Name Name of the Cron job.
Cron Expression The Cron expression for the corresponding Cron job.
Job Start Time Time and date at which the Cron job started.
Job End Time Time and date at which the Cron job ended.
Next Run Time Time and date at which the next Cron job is scheduled to run.
Elapsed Time The amount of time elapsed since the Cron job started (in Minutes).
Exit Code Denotes the exit code of the Cron job.
Missed Runs The number of times the Cron job failed to start at the scheduled time.
Status Status of the Cron job. Possible values are:
  • PASSED - Job has run successfully with exit code equal to 0.
  • RUNNING - Job is running currently.
  • FAILED - Job has failed with exit code greater than 0.
  • EXCEEDJOBTIME - Job has been running for longer than the configured Cron Job Period.
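
The status values listed above can be read as a simple decision rule. The following Python sketch is one plausible interpretation, not the product's actual logic: the function and parameter names are hypothetical, and it assumes that EXCEEDJOBTIME applies only while a job is still running.

```python
def cron_job_status(running, exit_code=None, elapsed_minutes=0.0, job_period_minutes=None):
    """Derive a status string from the state of a single cron job run."""
    if running:
        # A running job that has outlasted the configured Cron Job Period.
        if job_period_minutes is not None and elapsed_minutes > job_period_minutes:
            return "EXCEEDJOBTIME"
        return "RUNNING"
    if exit_code == 0:
        return "PASSED"
    if exit_code is not None and exit_code > 0:
        return "FAILED"
    raise ValueError("a finished job needs a non-negative exit code")


if __name__ == "__main__":
    print(cron_job_status(running=False, exit_code=0))
    print(cron_job_status(running=True, elapsed_minutes=12, job_period_minutes=10))
```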

Note: Once the Cron job is added, it will be in discovery state until we receive the first response from the remote server.

Updating cron jobs

To update a Cron job,

  1. Click on the Edit icon for the required cron job.
  2. Enter the required display name and the Cron Job Period for that cron job.
  3. Click Update.

Deleting cron jobs

To delete Cron jobs,

  1. Select the cron jobs that need to be deleted.
  2. Click on Delete Cron Jobs. This will delete the cron jobs from Applications Manager.
  3. Finally, make sure you remove the curl appended to the cron jobs in the remote server using the crontab -e command.

Note: Cron jobs can be added, updated and deleted only on managed servers, and only by the administrator user.

Network (only for SNMP mode)

Network Interfaces

Parameters Description
NETWORK INTERFACE
Name The name of the network interface present in the Linux system.
Speed(Mbps) The estimate of the current bandwidth in Mbps
Input Traffic(Kbps) The rate at which packets are received on the interface, in kilo bytes per second.
Output Traffic(Kbps) The rate at which packets are sent on the interface, in kilo bytes per second.
Errors Number of packets that could not be sent or received.
 
Note:
  • Network Interfaces details are monitored only for Linux Monitors added in SNMP mode.
  • You can also delete interfaces that have been physically removed using the Delete Orphaned Interface option.

Configuration

This tab contains information about system configuration attributes.

Parameters Description
System Information
Host Name The name of the system.
Domain The name of the domain to which the system belongs.
OS Information
OS Name The name of the operating system instance.
OS Version Version number of the operating system.
OS Release The Linux distribution
Memory Information
Total Physical Memory (MB) Total amount of physical memory as available to the operating system.
Total Swap Memory (MB) Total amount of swap memory available.
Processor Information
Id Unique identifier of a processor on the system
Model The processor model type
Implementation The processor family type.
Manufacturer Name of the processor manufacturer
Speed(MHz) Current speed of the processor
Cache (KB) Size of the processor cache. A cache is an external memory area that has a faster access time than the main memory.
Network Interface Settings
Name The name of the network adapter.
IP Address The IP address configured for this network interface
MTU The maximum transmission unit of the network interface, that is, the largest packet size it can transmit.
Type The type of network adapter.
Mac Address The Media access control address for this network adapter. A MAC address is a unique 48-bit number assigned to the network adapter by the manufacturer. It uniquely identifies this network adapter and is used for mapping TCP/IP network communications.
Status The current status of the network adapter.
Broadcast Address The IP address to which messages are broadcast.
Printer Settings
Name Name of the printer.
Device The name of the server that controls the printer.
Default Indicates whether the printer is the default one. Values are either True or False.
Status Current status of the printer.

Note: The data present in the configuration tab is not updated during every poll. So if you make any changes to the server configuration, you need to restart Applications Manager for those changes to be reflected in the 'Configuration' tab.

Hardware Metrics

The following are metrics pertaining to the hardware of Dell and HP servers:

Category Attribute Description DELL HP
SNMP Mode WMI Mode SNMP Mode WMI Mode
Temperature Sensor The name of the temperature sensor.
Temperature Reading (deg C) The current temperature reading.
Status The temperature status - Critical, Warning, Clear
Fan Sensor Name of the fan sensor.
Fan Speed (RPM) The fan speed values displayed in RPM.
Status The fan status - Critical, Warning, Clear
Power Sensor Name of the power supply.
Reading (Watts) The power supply reading values displayed in Watts.
Status The power status - Critical, Warning, Clear
Voltages Sensor Name of the voltage supply.
Reading (Volts) The voltage reading values displayed in Volts.
Status The voltage status - Critical, Warning, Clear
Battery Sensor Name of the Battery sensor.
Status The battery status - Critical, Warning, Clear
Memory Sensor Name of the Memory sensor.
Memory Device Type The type of memory device
Size (MB) The amount of memory currently installed in MB.
Status The memory status - Critical, Warning, Clear
Disk Sensor Identifies the disk's label
Device Name The device name configured for the disk
Size (MB) The allocated size in MB
Status The disk status - Critical, Warning, Clear.
Array Sensor The name of the array disk
Bus protocol The bus type of the array disk
Size (MB) The amount in MB of the used space on the array disk.
Status The array status - Critical, Warning, Clear
Chassis Sensor The user-assigned name of the chassis.
Model The system model type for this chassis
Status The chassis status - Critical, Warning, Clear
Processor Sensor The location name of the processor device status probe
Processor Brand The brand of the processor device.
Processor Current Speed The current speed of the processor device in MHz
Processor Core Count The number of processor cores detected for the processor device.
Status The processor status - Critical, Warning, Clear
  • If a component is functioning normally, the status indicator is green.
  • The status indicator changes to orange or red if a system component violates a performance threshold or is not functioning properly. Generally, an orange indicator signifies degraded performance.
  • A red indicator signifies that a component stopped operating or exceeded the highest threshold.
  • If the status is blank, then the health monitoring service cannot determine the status of the component.

Note: Currently hardware performance monitoring is supported in SNMP and WMI monitoring mode.

Hardware Device-Level Configuration

The Hardware Configuration option, available under Host Details on the right-hand side of the details page, lets you choose which hardware components to monitor. You can also do this from the Performance Polling option under the Admin tab, which configures hardware statistics globally.

Advanced Settings

By clicking the Advanced Settings option available under Host Details on the right-hand side of the details page, you can go to the Performance Data Collection page for Servers.

Here you can use the Hardware Health monitoring option to enable or disable hardware monitoring in servers. You can also select the hardware components (such as power, fan and disk) to be monitored by checking the corresponding options. These settings apply to hardware monitoring globally. You can also configure the health status by defining values in the respective text boxes:

  • Critical Severity: If the status matches with any of the values defined in the Critical Severity text box, then Applications Manager displays the status of the hardware device as Critical. The values defined by default are failed, error, failure, nonRecoverable, criticalUpper, criticalLower, nonRecoverableLower and critical.
  • Warning Severity: If the status matches with any of the values defined in the Warning Severity text box, then Applications Manager displays the status of the hardware device as Warning. The values defined by default are degraded, warning, nonCritical, nonCriticalUpper, nonRecoverableUpper and nonCriticalLower.
  • Clear Severity: If the status matches with any of the values defined in the Clear Severity text box, then Applications Manager displays the status of the hardware device as clear. The value defined by default is 'ok'.
  • Note: If the status of the device does not match with any of the values defined in the severity text box, the device status is displayed as unknown. Status values defined within the severity text boxes are comma-separated and case-insensitive.
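
The severity matching described above (comma-separated, case-insensitive values, with Unknown as the fallback) can be sketched in Python as follows. The default value lists are taken from the bullets above; the function names are hypothetical, and the precedence used when a value appears in more than one list (Critical, then Warning, then Clear) is an assumption.

```python
# Default values for each severity text box, as listed above.
DEFAULT_SEVERITY_VALUES = {
    "Critical": "failed, error, failure, nonRecoverable, criticalUpper, "
                "criticalLower, nonRecoverableLower, critical",
    "Warning": "degraded, warning, nonCritical, nonCriticalUpper, "
               "nonRecoverableUpper, nonCriticalLower",
    "Clear": "ok",
}


def build_lookup(severity_values):
    """Turn the comma-separated text box contents into a lowercase lookup table."""
    lookup = {}
    for severity, csv_values in severity_values.items():
        for value in csv_values.split(","):
            value = value.strip().lower()
            if value:
                # Keep the first severity seen for a value (assumed precedence).
                lookup.setdefault(value, severity)
    return lookup


def classify_status(status, severity_values=DEFAULT_SEVERITY_VALUES):
    """Return Critical, Warning, Clear, or Unknown for a reported device status."""
    return build_lookup(severity_values).get(status.strip().lower(), "Unknown")


if __name__ == "__main__":
    for status in ("OK", "nonRecoverableUpper", "Failed", "absent"):
        print(status, "->", classify_status(status))
```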