IBM AIX Monitoring


Creating a new AIX monitor

You can also add a new AIX monitor using the REST API.

Follow the steps given below to create a new AIX monitor:

  1. Click the Add New Monitor link under New Monitor.
  2. Select AIX under Servers category.
  3. Specify the Display Name of the AIX server.
  4. Enter the Hostname/IP Address of the server on which AIX is running.
  5. Enter the Subnet Mask of the network.
  6. Select the Mode of Monitoring (Telnet or SSH).
    • If Telnet is selected, provide the user name, password, and port number (default is 23) of the server.
    • If SSH is selected, provide the user name, password, and port number (default is 22) of the server. Alternatively, you can use Public Key Authentication (user name and private key). You can also give a passphrase if the private key is protected with one.

      Note: To locate the public/private key, open a command prompt, go to your home directory, and run cd .ssh/. From the files listed there, open id_dsa.pub or id_rsa.pub (public key) or id_dsa or id_rsa (private key) to get the keys.

  7. Enter the credential details like user name and password for authentication, or select the required credentials from the Credential Manager list after enabling the Select from Credential list option.
  8. Specify the Port number at which the AIX server is running. Default value is 23.
  9. Specify the command prompt value, which is the last character in your command prompt. Default value is $ and possible values are >, #, etc.
    Note: On the server that you are trying to monitor through SSH, the PasswordAuthentication variable must be set to 'yes' for data collection to happen. To ensure this, open the file /etc/ssh/sshd_config and verify the value of the PasswordAuthentication variable. If it is set to 'no', change it to 'yes' and restart the SSH daemon using the command /etc/rc.d/sshd restart.
  10. Enter the Timeout value in seconds.
  11. Specify the Polling interval in minutes.
  12. Choose the Monitor Group from the combo box to which you want to associate the Monitor (optional).
  13. If you are adding a new monitor from an Admin Server, select a Managed Server.
  14. Click Add Monitor(s). This discovers the AIX servers from the network and starts monitoring them.
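
The PasswordAuthentication check described in the note under step 9 can be sketched in Python. This is a minimal illustration, not part of Applications Manager: the function name password_authentication and the sample text are hypothetical, and the parser assumes the common "keyword value" layout of sshd_config, ignoring Match blocks.

```python
from typing import Optional


def password_authentication(sshd_config_text: str) -> Optional[bool]:
    """Return True/False for the first PasswordAuthentication entry, or None if unset.

    Hypothetical helper: handles only plain "keyword value" lines and
    skips blank lines and comments. Match blocks are not interpreted.
    """
    for raw_line in sshd_config_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or commented-out directive
        tokens = line.split()
        # Keywords are matched case-insensitively here; the first match wins.
        if len(tokens) >= 2 and tokens[0].lower() == "passwordauthentication":
            return tokens[1].strip('"').lower() == "yes"
    return None  # directive not present; the daemon's default applies


# Made-up fragment for illustration only.
SAMPLE = """\
# Example sshd_config fragment
Port 22
#PasswordAuthentication yes
PasswordAuthentication no
"""

if __name__ == "__main__":
    print(password_authentication(SAMPLE))
```

To check a real server, read the contents of /etc/ssh/sshd_config and pass them to the function. A result of False means the value must be changed to 'yes' before SSH-based data collection with a password can work.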

Monitored Parameters

Go to the Monitors Category View by clicking the Monitors tab. Click AIX under the Servers table. This displays the bulk configuration view of the AIX monitors, distributed across three tabs:

  • Availability tab displays the availability history for the past 24 hours or 30 days.
  • Performance tab displays the health status and events for the past 24 hours or 30 days.
  • List view tab enables you to perform bulk admin configuration.

Click any of the monitors listed to view detailed AIX server performance metrics. The performance metrics are categorized into the following tabs: Overview, CPU, Disk, and Network.

Overview

The Overview tab displays basic monitor information along with dials for CPU, Memory, and Disk utilization (in percentage). Click these dials to view detailed graphs and charts for these attributes. The available graphs are the history report, hour of day report, day of week report, and heat chart. These graphs can be generated for both real-time and historical data.

Parameter - Description
Monitor Information
Name - The name of the AIX server monitor.
System Health - The health status of the AIX server (Critical, Warning, or Clear).
Type - The type of server that you are monitoring.
Host Name - The hostname of the AIX system.
Host OS - The main OS installed on the system.
Last Polled at - The time at which the last poll was performed.
Next Poll at - The time at which the next poll is scheduled.
Today's Availability - The overall availability status of the server for the day. You can also view the 7-day and 30-day reports and the current availability status of the server.
Response Time - Amount of time taken by the server to respond (in ms).
Server Uptime - The uptime of the AIX server.
Server Time - Current date and time of the AIX server with its time zone.
Time Difference - Time difference between the monitored server's time and the Applications Manager server's time (in minutes).
Zombie Process Count - The number of zombie processes. Zombie processes can hold ports open with no control. It is helpful to see when a zombie process is spawned so that it can be dealt with before any issues arise.
Context Switches/s - Total number of context switches per second.
Major Page Faults/s - Number of major page faults per second, that is, faults that required loading a memory page from disk.
CPU and memory utilization - last six hours - Displays a graphical representation of CPU and memory utilization values (in percentage) for the last six hours. The attributes shown here are Computational Memory Utilization (in % and MB), Swap Memory Utilization (in % and MB), Physical Memory Utilization (in % and MB) and CPU utilization (in %).
Learn more about Page Spaces in AIX

Note: Computational Memory Utilization and Swap Memory Utilization attributes are displayed only for the root user in AIX.

Breakup of CPU Utilization - Displays a graphical representation of the break up of CPU performance metrics for the entire system processor (in percentage) with attributes such as Run Queue, Blocked Process, User Time (%), System Time (%), I/O Wait Time (%), Idle Time (%), and Interrupts/sec.
System Load - Displays a graphical representation of the amount of work that the system performs in terms of number of jobs. The system load during the last one-, five-, and fifteen-minute periods is represented by the parameters Jobs in Minute, Jobs in 5 minutes, and Jobs in 15 minutes, along with their current and peak values.
The Average System Load graph gives you an idea of the amount of work that the system performs per core. The average system load during the last one-, five-, and fifteen-minute periods is represented by the parameters Average Load in Minute, Average Load in 5 minutes, and Average Load in 15 minutes.
Page Space - Displays information about the Paging Space details of the AIX server. The Page Space parameter specifies the paging space name along with corresponding metrics such as Size (in MB), Used (in % and MB), and Free (% and MB).
Process Details - Displays information about the processes running on the server. You can add processes for monitoring using the Add New Process option. You can also delete unwanted processes and enable/disable reports for specific processes. Click on any of the attributes listed to view more details.
Monitors in this System - Displays information about the availability and health of the monitors configured in this server. To add new monitors for monitoring, use the Add Monitors option.

You can use the Custom Fields option in the 'Monitor Information' section to configure additional fields for the monitor.
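
The relationship between System Load and Average System Load described above can be illustrated with a short Python sketch. It assumes that the per-core figure is simply the load value divided by the number of CPU cores; the function name and the sample numbers are hypothetical and are not taken from the product.

```python
from typing import Dict


def average_load_per_core(jobs: Dict[str, float], cores: int) -> Dict[str, float]:
    """Divide each load value by the core count (assumed definition of 'per core')."""
    if cores <= 0:
        raise ValueError("cores must be a positive integer")
    return {period: load / cores for period, load in jobs.items()}


# Made-up load values for the 1-, 5-, and 15-minute periods on a 4-core system.
sample_jobs = {"1 min": 4.0, "5 min": 2.0, "15 min": 1.0}
per_core = average_load_per_core(sample_jobs, cores=4)

if __name__ == "__main__":
    for period, value in per_core.items():
        print(f"Average load in {period}: {value:.2f}")
```

Under this assumption, a per-core value that stays above 1 suggests that more jobs are waiting than the cores can run at once.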

CPU

This tab provides the CPU usage statistics of the AIX server. The tab includes two graphs: one that displays the CPU Utilization - by CPU Cores and another that shows the Breakup of CPU Utilization - by CPU Cores. You can view additional reports by clicking the graphs in the Breakup of CPU Utilization - by CPU Cores section. These reports include the following graphs for all the CPU cores:

  • Break up of CPU Utilization (%) vs Time
  • User Time (%) vs Time
  • System Time (%) vs Time
  • I/O Wait Time (%) vs Time
  • Idle Time (%) vs Time
  • Steal Time (%) vs Time
  • CPU Utilization (%) vs Time
  • Interrupts/sec vs Time

The CPU tab also shows the following performance metrics:

Parameter - Description
CPU Utilization - by CPU Cores
Core - The name of the CPU core.
User Time (%) - The percentage of time that the processor spends on user mode operations. This generally means application code.
System Time (%) - The percentage of time that the processor spends on kernel (system) mode operations.
Idle Time (%) - The percentage of time that the CPU is idle (not being used by any program).
I/O Wait Time (%) - The percentage of time that the processor spends waiting for I/O to complete.
CPU Utilization (%) - The total CPU used by the system.
Interrupts/sec - The rate at which the CPU handles interrupts from applications or hardware each second. If the value for Interrupts/sec is high over a sustained period of time, there could be hardware issues.
LPAR CPU Stats
CPU Entitlement - Assigned capacity entitlement for a shared partition.
No. of Virtual CPUs - Number of virtual processors that are currently assigned.
Type - Indicates whether the LPAR is using dedicated or shared CPU resources.
Utilization in CPU cores - The consumed capacity entitlement. This value helps in determining how much capacity should be allocated for the environment and workload.
CPU Entitlement Utilization (%) - The percentage of the assigned capacity entitlement that is consumed by a shared partition.

You can also view graphs for these attributes by selecting the necessary CPU core and then choosing the appropriate attribute from the dropdown.

Note: Utilization in CPU cores and CPU Entitlement Utilization attributes will be shown only when the value for Type attribute is shared.
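
As a rough illustration of how the two shared-LPAR attributes relate, the sketch below derives CPU Entitlement Utilization (%) from Utilization in CPU cores and CPU Entitlement. The formula (consumed cores divided by entitled cores, times 100) is an assumption based on the descriptions above, and the function name and sample values are hypothetical.

```python
def entitlement_utilization_percent(consumed_cores: float, entitled_cores: float) -> float:
    """Assumed formula: consumed capacity as a percentage of the entitlement."""
    if entitled_cores <= 0:
        raise ValueError("entitled_cores must be greater than zero")
    return consumed_cores / entitled_cores * 100.0


# Made-up values: a shared LPAR entitled to 2.0 cores that is consuming 0.5 cores.
sample_percent = entitlement_utilization_percent(consumed_cores=0.5, entitled_cores=2.0)

if __name__ == "__main__":
    print(f"CPU Entitlement Utilization: {sample_percent:.1f}%")
```

Depending on how the partition is configured, a shared LPAR may consume more than its entitlement, so under this assumption a value above 100 is possible.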

Disk

This tab displays disk usage statistics of the AIX server.

Parameter - Description
Disk Utilization
Disk - The name of the disk drive.
Total Size (MB) - The total storage capacity of that particular disk drive (in MB).
Used (%) - The percentage of the total disk space that has been used.
Used (MB) - Amount of disk space used (in MB).
Free (%) - The percentage of total usable space on the disk that is free.
Free (MB) - The unallocated space on the disk (in MB).
Inode Usage
Inode - The name of the Inode.
Total - The total number of Inodes available in that particular disk.
Used - The number of Inodes that are used in that particular disk.
Free - The remaining number of Inodes that are available in that particular disk.
Used (%) - The percentage of Inodes that are used in that particular disk.
Free (%) - The percentage of Inodes that are still available in that particular disk.
Disk I/O Statistics
Device - The name of the disk device.
Transfers/sec - The number of read/write operations that occur on the disk each second.
Writes/sec - The number of write operations that occur on the disk each second.
Reads/sec - The number of read operations that occur on the disk each second.
% Busy Time - The percentage of elapsed time that the disk device was busy servicing read/write requests.
Average Queue Length - The average number of both read and write requests that were queued for the disk during the sample interval.
Note: Average Queue Length in Disk I/O Statistics is not supported in AIX.

Note: Data collection for Disk I/O statistics and Inode statistics can be enabled from 'Disk I/O Statistics Monitoring' and 'Inode Monitoring' options under Settings → Performance Polling → Servers tab.
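
The Used and Free columns in the Disk Utilization and Inode Usage sections are related by simple arithmetic, which the following Python sketch shows. It assumes that free is total minus used and that percentages are taken against the total; the function name and the sample numbers are hypothetical.

```python
from typing import Dict


def usage_breakdown(total: float, used: float) -> Dict[str, float]:
    """Derive free amount and used/free percentages from total and used.

    Works for any unit (MB for disk space, a plain count for Inodes).
    """
    if total <= 0:
        raise ValueError("total must be greater than zero")
    if not 0 <= used <= total:
        raise ValueError("used must be between 0 and total")
    free = total - used
    return {
        "used": used,
        "free": free,
        "used_pct": used / total * 100.0,
        "free_pct": free / total * 100.0,
    }


# Made-up example: a 1000 MB file system with 250 MB in use.
sample_disk = usage_breakdown(total=1000.0, used=250.0)

if __name__ == "__main__":
    print(sample_disk)
```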

Network

Parameter - Description
Network Interface
Name - The name of the network interface present in the system.
Speed - The estimated current bandwidth (in Mbps).
MTU - Maximum Transmission Unit (MTU) is the size of the largest data packet that a network-connected device can accept.
Input Traffic (Kbps) - The rate at which data is received on the interface (in Kbps).
Output Traffic (Kbps) - The rate at which data is sent on the interface (in Kbps).
Input Discards - Number of received packets dropped.
Output Discards - Number of transmitted packets dropped.
Input Errors - Number of received damaged packets.
Output Errors - Number of transmitted damaged packets.
Output Queue Length - Number of network packets in the output packet queue.
Connection Stats
Socket State - The state of the sockets. The following socket states are shown:
  • ESTABLISHED - The socket has an established connection.
  • FIN_WAIT1 - The socket is closed, and the connection is shutting down.
  • FIN_WAIT2 - Connection is closed, and the socket is waiting for a shutdown from the remote end.
  • LISTEN - The socket is listening for incoming connections.
  • TIME_WAIT - The socket is waiting after close to handle packets still in the network.
No. of Connections - Number of connections in the particular socket state.
NTP Stats
NTP Status - Indicates whether the client is synchronized with the server or not.
Server Name - The hostname of the server to which the client is synchronized.
Stratum Level - The stratum level at which the client is located.
NTP Time correct to within - The time offset value (in milliseconds) displayed for 'time correct to within' after executing the ntpstat/chrony command. It is calculated as follows:

Time correct to within = (Root dispersion + Root Delay) / 2
Poll Interval - The polling time interval between each sync (in seconds).
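
The 'Time correct to within' formula given in the NTP Stats section can be written as a small Python function. This sketch follows the formula exactly as stated in this document and assumes both inputs are in milliseconds. The function name and sample values are hypothetical, and this is not necessarily how the underlying NTP tools compute the figure.

```python
def time_correct_to_within_ms(root_dispersion_ms: float, root_delay_ms: float) -> float:
    """Apply the documented formula: (Root dispersion + Root Delay) / 2."""
    if root_dispersion_ms < 0 or root_delay_ms < 0:
        raise ValueError("root dispersion and root delay must be non-negative")
    return (root_dispersion_ms + root_delay_ms) / 2.0


# Made-up values: 30 ms root dispersion and 10 ms root delay.
sample_bound_ms = time_correct_to_within_ms(root_dispersion_ms=30.0, root_delay_ms=10.0)

if __name__ == "__main__":
    print(f"Time correct to within: {sample_bound_ms} ms")
```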