Server Monitoring


Overview

In network-level management, the focus is the overall status and connectivity of the network. It is equally important to know the status of the individual machines in the network, how loaded (or overloaded) they are, and how efficiently they are utilized, so that corrective administrative action can be taken on overloaded or poorly performing systems. Server-level management is a hands-on activity that involves considerable manual intervention, human resources, and administrative work. Applications Manager provides server-level monitoring to achieve these goals and to ease the configuration management of hosts.

Supported Operating Systems

Creating a new server monitor

To create any of the above server monitors, follow the steps given below:

  1. Go to New Monitor and click Add New Monitor. Choose the required server under Servers section.
  2. Enter the IP Address or hostname of the host.
  3. Enter the subnet mask of the network.
  4. Select the Mode of Monitoring. (WMI/SNMP/SSH/TELNET)
  5. Enter the credential details like username and password for authentication, or select the required credentials from the Credential Manager list by enabling the Select from Credential List option.
  6. Enter the polling interval time in minutes.
  7. If you are adding a new monitor from an Admin Server, select a Managed Server.
  8. Provide the monitor-specific authentication information. Choose the OS type: Windows (2000, 2003, 2003 R2, 2008, 2008 R2, 2012, 2012 R2, XP, NT, Vista, 7, 8 and 10), Linux, Sun OS, IBM AIX, IBM AS400 / iSeries, HP Unix, Tru64 Unix, FreeBSD, Mac OS, Novell, or Windows Clusters (2008, 2008 R2). Based on the type of OS, the 'Mode of Monitoring' information changes.
  9. Choose the Monitor Group with which you want to associate the server monitor from the combo box (optional). You can associate your monitor with multiple groups.
  10. Click Add Monitor(s). This discovers the required server from the network and starts monitoring it.
Note:

If a server monitor cannot be added because an input detail was entered incorrectly, you can diagnose the issue by clicking the Diagnose the Problem link. It shows information associated with the server, such as the ping test, host details, and monitoring modes, along with the list of tables that have stray entries for the same host. This option is not applicable to the WMI mode of monitoring.
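
The fields collected in the steps above can be checked before submitting the form. The following is a minimal sketch, assuming a hypothetical ServerMonitorRequest class; it is not part of Applications Manager, and the product may apply different validation rules.

```python
import ipaddress
from dataclasses import dataclass
from typing import List

# Modes of monitoring named in step 4 of the procedure.
MONITORING_MODES = {"WMI", "SNMP", "SSH", "TELNET"}


@dataclass
class ServerMonitorRequest:
    """Hypothetical container for the form fields; not a product API."""
    host: str
    subnet_mask: str
    mode: str
    username: str
    password: str
    polling_interval_minutes: int

    def problems(self) -> List[str]:
        """Return readable problems; an empty list means the input looks sane."""
        found = []
        if not self.host.strip():
            found.append("host is empty")
        try:
            # IPv4Network accepts a dotted mask after the slash and rejects malformed ones.
            ipaddress.IPv4Network(f"0.0.0.0/{self.subnet_mask}")
        except ValueError:
            found.append(f"invalid subnet mask: {self.subnet_mask}")
        if self.mode.upper() not in MONITORING_MODES:
            found.append(f"unknown mode of monitoring: {self.mode}")
        if self.polling_interval_minutes <= 0:
            found.append("polling interval must be a positive number of minutes")
        return found
```

Checking inputs this way catches the kind of wrongly entered details that the Diagnose the Problem link is meant to surface after the fact.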

Monitored Parameters

  • Availability tab gives the availability history for the past 24 hours or 30 days.
  • Performance tab gives the health status and events for the past 24 hours or 30 days.
  • List view enables you to perform bulk admin configurations.

Click on the individual monitors listed to view the following information:

Parameter: Description
System Load: Specifies the number of jobs handled by the system in 1/5/15 minutes with its peak and current value, and current status.
Average System Load: Specifies the average amount of work that the system performs per core in 1/5/15 minutes with its peak value, current value, and current status, along with the total number of CPU cores present.
Disk Utilization: Specifies the hard disk space utilized by the system and updates with the peak and current value, and current status of the Disk Partition parameter. (The parameter includes C, D, E, F drives, etc. in Windows, and /home, etc. in Linux.)
Memory Utilization:
  • Swap Memory Utilization: Specifies the swap space or the virtual memory utilized by the system with peak and current value, and current status of the parameter.
  • Physical Memory Utilization: Specifies the amount of physical memory utilized by the system with peak and current value, and current status of the parameter.
Disk I/O Stats: Specifies reads/writes per second and transfers per second for each device.
Network Interface: Specifies details about the network interface in the system, the rate of traffic transferred, and the status of physical and logical network ports.

Note: Network Interface monitoring (for both IBM AIX and Linux) is possible in SSH and Telnet modes.

Network Adapter: Specifies the name, SCSI link status, etc. of the Fibre Channel device.
CPU Utilization: Specifies the total CPU used by the system with its peak and current value, and current status.
LPAR CPU Stats:
  • CPU Entitlement: Assigned capacity entitlement for a shared partition.
  • No. of Virtual CPUs: Number of virtual processors that are currently assigned.
  • Type: Indicates whether the LPAR is using a dedicated or shared CPU resource.
  • Utilization in CPU cores: Represents the consumed capacity entitlement. This value helps determine how much capacity should be allocated for the environment and workload.
  • CPU Entitlement Utilization (%): The percentage of the physical processor consumed in the assigned capacity entitlement for a shared partition.

Note: The Utilization in CPU cores and CPU Entitlement Utilization attributes are shown only when the value of the Type attribute is shared.
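
The relationship between some of these parameters can be expressed as simple arithmetic. The sketch below follows a plain reading of the descriptions above (load divided by core count, consumed cores divided by entitlement); the function names are hypothetical, and the exact formulas used by Applications Manager are not stated in this document.

```python
def average_load_per_core(system_load: float, cpu_cores: int) -> float:
    """Assumed reading of Average System Load: system load divided by CPU cores."""
    if cpu_cores <= 0:
        raise ValueError("cpu_cores must be positive")
    return system_load / cpu_cores


def entitlement_utilization_percent(cores_used: float, entitlement: float) -> float:
    """Assumed reading of CPU Entitlement Utilization (%) for a shared LPAR."""
    if entitlement <= 0:
        raise ValueError("entitlement must be positive")
    return cores_used / entitlement * 100.0
```

For example, a 1-minute load of 4.0 on an 8-core server gives 0.5 per core, and 0.5 consumed cores against an entitlement of 2.0 gives 25 percent.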

 

Note: You can exclude specific disk drives in a server from monitoring. Open the AMServer.properties file in <AppManager Home>/Conf and add the drive that you do not want to monitor to the am.disks.ignore property. For example:

# The drives beginning with the characters given below will not be monitored in the server monitor.
am.disks.ignore=C:

Here, the C: drive is not monitored. You can list further disks as comma-separated values (for example, C:,D:,/home).
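
The comment in the properties file says that drives beginning with the listed characters are skipped. A minimal sketch of that prefix matching follows; the function names are hypothetical, and details such as case sensitivity are assumptions rather than documented behavior.

```python
from typing import List


def parse_ignore_list(value: str) -> List[str]:
    """Split the comma-separated am.disks.ignore value into prefixes."""
    return [part.strip() for part in value.split(",") if part.strip()]


def monitored_disks(disks: List[str], ignore_value: str) -> List[str]:
    """Keep only the disks that do not begin with any ignored prefix."""
    prefixes = parse_ignore_list(ignore_value)
    return [d for d in disks if not any(d.startswith(p) for p in prefixes)]
```

Because the match is on the beginning of the name, an entry such as /home would also exclude /home/data under this reading.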

The following table lists the monitored parameters and the modes of monitoring in which they are available.

Note: If the server monitor is added in Telnet or SSH mode, you can access the Telnet client directly by clicking the 'Execute Commands on this server' link found below the Today's Availability pie chart. This option is disabled by default.

To enable it, the admin or operator must be given permission to use this Telnet client. The permission can be given from the Settings → User Management → Permissions link.

 

Operating System    Telnet    SSH    SNMP    WMI
Windows     (only if Applications Manager is installed on a Windows machine)
Linux  
Solaris  
HP-UX / Tru64 Unix    
FreeBSD  
Mac OS  
IBM AIX    
Novell      
Attributes        
CPU Utilization (all types except Windows NT)
Disk Utilization (all types)
Physical Memory Utilization (IBM AIX - only for the root user, Windows - WMI mode, all other types)
Swap Memory Utilization (IBM AIX - only for the root user, FreeBSD, Linux, Solaris, Windows, Novell)
Computational Memory Utilization (IBM AIX)
Network Interface (all types) available available available available [status attribute data is not available]
Network Adapter (IBM AIX) available available    
Connection Stats (IBM AIX)
LPAR CPU Stats (IBM AIX)    
Average System Load (IBM AIX and Linux)    
NTP Stats (IBM AIX and Linux)    
Process Monitoring (all types)
Process Monitoring - Memory Utilization (all types)
Process Monitoring - CPU Utilization (IBM AIX, FreeBSD, Linux, Mac OS, Solaris, HP Unix / Tru64)
Process Monitoring - Zombie Process Count (IBM AIX and Linux)
Service Monitoring (only for Windows, Linux and AIX)
Event log (only for Windows)
System Load (IBM AIX, FreeBSD, Linux, Mac OS, HP-Unix, Solaris, Novell)
Disk I/O Stats (only for IBM AIX, Linux, Solaris, Novell)
Hardware monitoring (Dell & HP)
Server Uptime (IBM AIX, FreeBSD, Linux, Mac OS, HP-Unix, Solaris, Novell, Windows)
Firewall monitoring (only for Windows)
Note: To know more about the configuration details required while discovering the host resource, click here.

Avg. Queue Length and % Busy Time in Disk I/O Statistics for AIX are not supported for physical and fibre cards.

When it comes to choosing the mode of monitoring for servers, we recommend Telnet/SSH over SNMP.

Page Space in AIX Servers:

To get in-depth details on Page Space in AIX servers, you can use the command "lsps -a".

The command "lsps -a" lists the location of the paging space logical volumes as they were, not as they are.

Page spaces are normally used when a process running in the system has used its entire allocated memory and has run out of memory space. The system then moves code or data that is not currently referenced by the running process into the page space area, and moves it back to primary memory when the running process references it again.

While trying to monitor the AIX server, if you get "No data available" for Page Space, you can troubleshoot it by following the steps given below:

First, make sure that the connection is established only through Telnet or SSH mode.

Second, check whether the lsps -a command exists in the system, and then execute it.

Displaying Paging Space Characteristics

The "lsps" command displays the characteristics of paging spaces, such as the paging space name, physical volume name, volume group name, size, percentage of the paging space used, whether space is active or inactive, and whether the paging space is set to automatic. The paging space parameter specifies the paging space whose characteristics are to be shown.

The following example shows the lsps command with the -a flag, which displays the characteristics of all paging spaces. The "-c" flag displays the information in colon format, with the paging space size given in physical partitions.

# lsps -a

Page Space   Physical Volume   Volume Group   Size    %Used   Active   Auto   Type
paging00     hdisk1            rootvg         80MB    1       yes      yes    lv
hd6          hdisk1            rootvg         256MB   1       yes      yes    lv
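
Output like the example above can be turned into records for further checks. This is a minimal sketch, assuming the eight-column layout shown here; the function and field names are hypothetical, and some AIX versions print additional columns that this parser would skip.

```python
from typing import Dict, List

# Field names chosen for this sketch, in the column order of the example.
FIELDS = ["page_space", "physical_volume", "volume_group", "size",
          "percent_used", "active", "auto", "type"]

SAMPLE = """\
Page Space   Physical Volume   Volume Group   Size    %Used   Active   Auto   Type
paging00     hdisk1            rootvg         80MB    1       yes      yes    lv
hd6          hdisk1            rootvg         256MB   1       yes      yes    lv
"""


def parse_lsps(output: str) -> List[Dict[str, object]]:
    """Parse `lsps -a` text into one dictionary per paging space."""
    rows = []
    for line in output.splitlines()[1:]:  # the first line is the header
        parts = line.split()
        if len(parts) != len(FIELDS):
            continue  # skip blank lines or rows in an unexpected layout
        row: Dict[str, object] = dict(zip(FIELDS, parts))
        row["percent_used"] = int(parts[4])
        rows.append(row)
    return rows
```

An empty result from a parser like this corresponds to the "No data available" case described above, which is why checking that lsps -a runs on the server is the second troubleshooting step.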

Adding and Activating a Paging Space

To make a paging space available to the operating system, you must add the paging space and then make it available. The total space available to the system for paging is the sum of the sizes of all active paging-space logical volumes.

Note: You should not add paging space to volume groups on portable disks because removing a disk with an active paging space will cause the system to crash.

You can get more details about the command here: https://www.ibm.com/support/knowledgecenter/en/ssw_aix_72/devicemanagement/pscpag_space_config.html

Apart from the above-mentioned parameters, you can also monitor the following:

To monitor processes in a server

  1. In the Server Monitor page under Process Details, click Add New Process.
  2. All the running processes are displayed along with their CPU and memory utilization statistics. (Only memory statistics are shown for Windows and for the SNMP mode of monitoring.)
  3. Select the processes that you want to monitor.

After configuring the processes, they are listed under the Process Details section of the Server Monitor page. By clicking on the process, you can view its availability graph. You can also configure alarms for a particular process.

You can edit the Display Name, Process Name, Commands and Arguments of the particular process by clicking on the Edit Process icon.

To monitor Windows, Linux and IBM AIX services

  1. In the Monitor page, under Service Details, click on Add New Service.
  2. All the services that are running will be displayed along with the service name and its status.
  3. Select the services that you want to monitor.

After configuring the services, they are listed under the Service Details section of the Monitor page. By clicking on the service, you can view its availability graph and also configure alarms for the availability of that particular service.

Apart from monitoring the availability of a service, you can manage services by using the start, stop, and restart options. You can also configure the 'Restart the Service' action, along with other actions, to be executed when the service goes down.

Note:
  • Windows Services monitoring is possible only in the WMI mode of monitoring.
  • Linux and AIX Services monitoring is possible only in the SSH/Telnet mode of monitoring.

To monitor Network Interfaces

In the Server Monitor page, under Network Interfaces, all the network interfaces will be listed. The various attributes that can be monitored are:

  • Interface Traffic - Input traffic (bits received), Output Traffic (bits transmitted). You can set alarm thresholds for these attributes.
  • Interface utilization - Input Utilization %, Output Utilization %. You can set alarm thresholds for these attributes.
  • Packets received - Packets received per second
  • Packets transmitted - Packets transmitted per second
  • Error packets - No. of packets in error per second after receiving the packets
  • Discarded packets - No. of packets discarded per second after receiving the packets
  • Health - the health of the interface based on the attributes
  • Status - whether the interface is up or down (shown only in SNMP mode of monitoring)

Note: Network Interface monitoring is possible only in the SNMP and WMI modes of monitoring.
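
The utilization attributes relate traffic to the capacity of the interface. The sketch below uses the common formula of traffic rate divided by interface speed; this is an assumption, since the document does not state how Applications Manager computes Input and Output Utilization %, and the function names are hypothetical.

```python
def interface_utilization_percent(traffic_bps: float, speed_bps: float) -> float:
    """Utilization as a percentage of the interface speed (both in bits per second)."""
    if speed_bps <= 0:
        raise ValueError("speed_bps must be positive")
    return traffic_bps / speed_bps * 100.0


def breaches_threshold(value: float, threshold: float) -> bool:
    """True when a monitored value exceeds its alarm threshold."""
    return value > threshold
```

For example, 250 Mbps of input traffic on a 1 Gbps interface gives 25 percent, which would not breach an alarm threshold of 80 percent.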

Associating Scripts and URLs to the Host Resource

By associating a script or a URL with a Host resource, its attributes become part of the attributes of the Host, and its data is shown under Host Details. The health of the Host resource then depends on the health of the associated scripts and URLs as well.

For example, if you wish to monitor RequestExecutionTime, RequestsCurrent, and RequestsDisconnected of an ASP.NET application, you can use WMI scripts to get the statistics (this information is not available in Applications Manager by default). Write a script that fetches these details, configure it in Applications Manager, and then associate it with the host monitor. The attributes of the script then behave like the other attributes of the host monitor, so the health of the script directly affects the health of the host.

Likewise, you may want to monitor a website hosted on a system in such a way that any change in the health of the website is reflected in the health of the server. In this case, configure a URL monitor and then associate that URL with the host. If the website goes down, the health of the Host resource is affected.

  • Associate/Remove Scripts: Click on the 'Associate/Remove Scripts' link in Host Details. Scripts that are associated and that are not associated with the Host would be listed. Accordingly, you can then select the scripts that you want to associate or remove.
  • Associate/Remove URLs: Click on the 'Associate/Remove URLs' link in Host Details. URLs that are associated and that are not associated with the Host would be listed. Accordingly, you can then select the URLs that you want to associate or remove.
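
The dependency described in this section can be pictured as a roll-up in which the host takes the worst health among its own attributes and its associated scripts and URLs. This is only a sketch of that idea: the severity names and the worst-of rule are assumptions for illustration, not the documented behavior of Applications Manager.

```python
from enum import IntEnum
from typing import Iterable


class Health(IntEnum):
    """Hypothetical severity levels, ordered from best to worst."""
    CLEAR = 0
    WARNING = 1
    CRITICAL = 2


def host_health(own: Iterable[Health], associated: Iterable[Health]) -> Health:
    """Worst health across host attributes and associated scripts/URLs."""
    return max([*own, *associated], default=Health.CLEAR)
```

Under this reading, a host whose own attributes are all clear still becomes critical when an associated URL monitor reports that the website is down.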

Mode of Monitoring - SSH/Telnet vs SNMP

We recommend Telnet or SSH mode of monitoring because the following attributes are not available through SNMP:

  • Disk I/O Stats
  • Process Monitoring - CPU Utilization
  • Swap Memory Utilization

Please check this link for more details.

System administrators generally check system resources with commands, and it is easier to compare those results with the output of the SSH/Telnet mode than to run an SNMP walk. Also, having an SSH connection to the Linux servers makes it easier to configure the same connection for script monitors or 'execute program' actions if required.

Commands used for server monitoring

Here is a list of commands used by Applications Manager for Windows, Linux, and Unix servers:

Windows:

Parameter: Command
Disk Utilization: disk.vbs
Win Physical Disk Stats: diskio.vbs
Network Interface: NetworkInterface.vbs
Network Adapter: NetworkAdapter.vbs
Memory Utilization: memory.vbs
CPU Utilization: cpu.vbs
CPU Core Utilization: cpucore.vbs
Services: services.vbs
Process: PhyMemCpuImportProduct.vbs
Server Uptime: uptime.vbs

Linux:

Parameter: Command
Memory Utilization: free -b
Memory Utilization in the Memory tab: LANG=C cat /proc/meminfo;echo '-----FREE_MEM_STATS-----';LANG=C free -m
System reboot: date +%s;/bin/cat /proc/uptime | cut -d "." -f1
Current Date and Time: LANG=C date
ThreadCount: ps -eo nlwp | awk '{ threadcount += $1 } END { print threadcount }'
Disk Utilization: /bin/df -Pm
Disk IO Stats: export S_COLORS='never';LANG=C iostat -d;echo '-----DISK_EXTENDED_STATS-----';iostat -d -x 1 3
Inode Usage: /bin/df -Pi
System Load: uptime
CPU Utilization: /usr/bin/vmstat 1 3
CPU Core Utilization: export S_COLORS='never';mpstat -P ALL 1 3
Kernel Statistics: export S_COLORS='never';LANG=C sar -B -w 1 3 | awk '{if(NR>2)print}'
Server Uptime: uptime|cut -d ',' -f1,2|tr -s ' ' '^'|cut -d '^' -f 2-
Network Interface: LANG=C ip -s -j link (if json format is supported) (or) LANG=C ip -s link
Network State: LANG=C netstat -nat | awk '{if(NR>1)print}' | awk '{print $6}' | sort | uniq -c | sort -n
NTP Monitoring: ntpstat (or) chronyc tracking
NTP Status: yes N | LANG=C ntpstat (or) LANG=C chronyc tracking (if Chronyc installed)
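
When comparing monitored values with the raw command output, it helps to see how that output maps to the parameters. The sketch below parses the output of /bin/df -Pm and derives the boot time from the System reboot command; the function names and sample values are hypothetical, and the parsing done inside Applications Manager may differ.

```python
from typing import Dict, List

DF_SAMPLE = """\
Filesystem     1048576-blocks  Used Available Capacity Mounted on
/dev/sda1               50000 20000     30000      40% /
tmpfs                    2000     0      2000       0% /dev/shm
"""


def parse_df_pm(output: str) -> List[Dict[str, object]]:
    """Parse `df -Pm` output: one record per file system, sizes in megabytes."""
    disks = []
    for line in output.splitlines()[1:]:  # the first line is the header
        parts = line.split(None, 5)
        if len(parts) < 6:
            continue
        disks.append({
            "filesystem": parts[0],
            "total_mb": int(parts[1]),
            "used_mb": int(parts[2]),
            "used_percent": int(parts[4].rstrip("%")),  # df's own Capacity column
            "mount": parts[5],
        })
    return disks


def boot_epoch(command_output: str) -> int:
    """The System reboot command prints the current epoch, then uptime in seconds."""
    now, uptime_seconds = (int(value) for value in command_output.split()[:2])
    return now - uptime_seconds
```

The -P flag forces the POSIX single-line format, which is what makes a whitespace split like this reliable even for long device names.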

Unix:

Parameter: Command
Memory Utilization: export UNIX95;top -d 1 -n 2
Disk Utilization: /bin/df -m
System Load: uptime
CPU Utilization: /usr/bin/vmstat 1 3
CPU Core Utilization: /usr/bin/vmstat -n 0 -P 1 3
Server Uptime: uptime|cut -d ',' -f1,2|tr -s ' ' '^'|cut -d '^' -f 2-
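
The vmstat 1 3 command prints three samples, and CPU utilization can be read from the idle column. The following sketch locates the "id" column by name and uses the last sample; which sample Applications Manager actually uses is an assumption, the sample shows a Linux layout, and other Unix variants arrange the columns differently.

```python
VMSTAT_SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 800000  50000 900000    0    0     5    10  100  200  3  1 95  1  0
 0  0      0 799000  50000 900000    0    0     0     0  120  210 10  5 85  0  0
 0  0      0 798000  50000 900000    0    0     0     8  130  220 20 10 70  0  0
"""


def cpu_utilization_from_vmstat(output: str) -> int:
    """Return 100 minus the idle percentage of the last vmstat sample."""
    rows = [line.split() for line in output.splitlines() if line.strip()]
    # The column header is the first row that contains the "id" (idle) label.
    header_index = next(i for i, cols in enumerate(rows) if "id" in cols)
    idle_column = rows[header_index].index("id")
    return 100 - int(rows[-1][idle_column])
```

Looking up the column by its label rather than by a fixed position keeps the sketch usable when the number of columns changes between systems.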

 

Note:
1.) For Windows, open a command prompt as administrator, go to the <Applications Manager Home>/working/conf/applications/scripts directory, and execute the script in the following format: cscript command hostname domain\username password
Replace command with the script listed above for the parameter, hostname with the actual hostname of the server, and username and password with the credentials.

2.) For Windows 2008 and Windows 2000 only, the command for CPU Utilization changes to "cpu_2008.vbs" and "cpu_2000.vbs" respectively.
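
The command format in the note can be assembled programmatically when the scripts are run by hand for troubleshooting. This is a minimal sketch with hypothetical function names; it only builds the argument list in the documented order and matches the version strings "2008" and "2000" exactly, which is an assumption.

```python
from typing import List


def cpu_script_for(windows_version: str) -> str:
    """Pick the CPU Utilization script; only 2008 and 2000 differ per the note."""
    overrides = {"2008": "cpu_2008.vbs", "2000": "cpu_2000.vbs"}
    return overrides.get(windows_version, "cpu.vbs")


def build_cscript_command(script: str, hostname: str, domain: str,
                          username: str, password: str) -> List[str]:
    """Arguments in the documented order: cscript command hostname domain\\username password."""
    return ["cscript", script, hostname, f"{domain}\\{username}", password]
```

Passing the result as a list to a process launcher avoids quoting problems with the backslash in domain\username.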