IBM AIX Monitoring


Creating a new AIX monitor

Using the REST API to add a new AIX monitor: Click here

Follow the steps given below to create a new AIX Unix monitor:

  1. Click on Add New Monitor link under New Monitor.
  2. Select AIX under Servers category.
  3. Specify the Display Name of the AIX server.
  4. Enter the Hostname/IP Address of the server on which AIX is running.
  5. Enter the Subnet Mask of the network.
  6. Select the Mode of Monitoring (Telnet or SSH).
    • If Telnet is selected, provide user name, password, and port number (default is 23) of the server.
    • If SSH is selected, provide the user name, password, and port number (default is 22) of the server. Alternatively, you can use Public Key Authentication (user name and private key). You can also give a Passphrase if the private key is protected with one.

      Note: To locate the public/private key, open a command prompt, type cd .ssh/, and then open the file <id_dsa.pub> or <id_rsa.pub> (public key) or <id_dsa> or <id_rsa> (private key) to get the key.

  7. Enter the credential details like user name and password for authentication, or select the required credentials from the Credential Manager list after enabling the Select from Credential list option.
  8. Specify the Port number at which the AIX server is running. Default value is 23.
  9. Specify the command prompt value, which is the last character in your command prompt. Default value is $ and possible values are >, #, etc.
    Note: On a server that you are trying to monitor through SSH, the PasswordAuthentication variable should be set to 'yes' for data collection to happen. To verify this, open the file /etc/ssh/sshd_config and check the value of the PasswordAuthentication variable. If it is set to 'no', change it to 'yes' and restart the SSH daemon using the command /etc/rc.d/sshd restart.
  10. Enter the Timeout value in seconds.
  11. Specify the Polling interval in minutes.
  12. Choose the Monitor Group from the combo box to which you want to associate the Monitor (optional).
  13. If you are adding a new monitor from an Admin Server, select a Managed Server.
  14. Click Add Monitor(s). This discovers the AIX servers from the network and starts monitoring them.
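
The PasswordAuthentication check described in the note under step 9 can be scripted. The following is a minimal sketch, not part of Applications Manager: the function name password_auth_enabled is hypothetical, and the parser deliberately stops at the first Match block and does not follow Include directives. It relies on two OpenSSH behaviours: the first value found for a keyword wins, and the keyword defaults to 'yes' when absent.

```python
def password_auth_enabled(config_text: str) -> bool:
    """Return True if PasswordAuthentication is effectively 'yes'.

    Simplifications of this sketch: parsing stops at the first Match
    block, and Include directives are not followed.
    """
    for raw_line in config_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        parts = line.replace("=", " ").split()
        if not parts:
            continue
        keyword = parts[0].lower()  # sshd_config keywords are case-insensitive
        if keyword == "match":
            break  # conditional blocks are out of scope here
        if keyword == "passwordauthentication" and len(parts) > 1:
            # sshd uses the first value it obtains for a keyword
            return parts[1].lower() == "yes"
    return True  # OpenSSH defaults to 'yes' when the keyword is absent


sample = """
# Example only
Port 22
PasswordAuthentication no
"""
print(password_auth_enabled(sample))  # False
```

To check a real server, read /etc/ssh/sshd_config on that server and pass its contents to the function.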

Monitored Parameters

Go to the Monitors Category View by clicking the Monitors tab. Click on the AIX monitors under the Servers table. Displayed is the bulk configuration view of the AIX monitors distributed into three tabs:

  • Availability tab displays the availability history for the past 24 hours or 30 days.
  • Performance tab displays the health status and events for the past 24 hours or 30 days.
  • List view tab enables you to perform bulk admin configuration.

Click on the individual monitors listed to view detailed AIX server performance metrics. The performance metrics have been categorized into 6 different tabs:

Overview

The Overview tab displays basic monitor information along with the dials for CPU, Memory and Disk utilizations (in percentage). You can click on these dials to view detailed graphs and charts for these attributes. The graphs available are History report, hour of day report, day of week report and heat chart. These graphs can be generated for both real time and historical data.

Parameters | Description
Monitor Information
Name | The name of the AIX server monitor.
System Health | Denotes the health status of the AIX server (Critical, Warning, or Clear).
Type | Denotes the type of server being monitored.
Host Name | The hostname of the AIX system.
Host OS | The main OS installed on the system.
Last Polled at | Specifies the time at which the last poll was performed.
Next Poll at | Specifies the time at which the next poll is scheduled.
Today's Availability | Shows the overall availability status of the server for the day. You can also view 7/30 reports and the current availability status of the server.
Response Time | Amount of time taken by the server to respond (in ms).
Server Uptime | Indicates the server uptime of the AIX monitor.
Server Time | Current date and time of the AIX server with its timezone.
Time Difference | Time difference between the monitored server's time and the Applications Manager server's time (in minutes).
Zombie Process Count | The number of zombie processes. Zombie processes can hold ports open with no control. It is helpful to see when a zombie process is spawned so that it can be dealt with before any issues arise.
Context Switches/s | Total number of context switches per second.
Major Page Faults/s | Number of major page faults per second, that is, faults that required loading a memory page from disk.
CPU and memory utilization - last six hours - Displays a graphical representation of CPU and memory utilization values (in percentage) for the last six hours. The attributes shown here are Computational Memory Utilization (in % and MB), Swap Memory Utilization (in % and MB), Physical Memory Utilization (in % and MB) and CPU utilization (in %).
Learn more about Page Spaces in AIX

Note: Computational Memory Utilization and Swap Memory Utilization attributes are displayed only for the root user in AIX.

Breakup of CPU Utilization - Displays a graphical representation of the break up of CPU performance metrics for the entire system processor (in percentage) with attributes such as Run Queue, Blocked Process, User Time (%), System Time (%), I/O Wait Time (%), Idle Time (%), and Interrupts/sec.
The System Load graph indicates the average system load on the central processing unit (CPU) over predefined time intervals. The system load during the last one-, five-, and fifteen-minute periods is represented by the parameters Load average in minute, Load average in 5 minutes, and Load average in 15 Minutes, along with their current and peak values.
Note: The attributes are displayed differently for Applications Manager versions below 170500:
  • Load average in minute → Jobs in Minute
  • Load average in 5 Minutes → Jobs in 5 Minutes
  • Load average in 15 Minutes → Jobs in 15 Minutes
The Average System Load graph indicates the average system load per core over predefined time intervals. The average system load during the last one-, five-, and fifteen-minute periods is represented by the parameters Average load per core in minute, Average load per core in 5 minutes, and Average load per core in 15 minutes.
Note: The attributes are displayed differently for Applications Manager versions below 170500:
  • Average load per core in minute → Average Load in Minute
  • Average load per core in 5 minutes → Average Load in 5 Minutes
  • Average load per core in 15 minutes → Average Load in 15 Minutes
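
As a rough illustration of how the two load graphs relate, the sketch below divides each load average by the number of CPU cores. This is an assumption based on the attribute names; the product may compute the per-core values differently. The function name load_per_core is hypothetical.

```python
def load_per_core(load_averages, core_count):
    """Divide each load average (1, 5, 15 minutes) by the number of cores."""
    if core_count <= 0:
        raise ValueError("core_count must be positive")
    return tuple(round(load / core_count, 2) for load in load_averages)


# Load averages of 2.0, 4.0 and 6.0 on a 4-core server
print(load_per_core((2.0, 4.0, 6.0), 4))  # (0.5, 1.0, 1.5)
```

A per-core value above 1.0 suggests that, on average, more work was queued than the cores could run at once.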
Page Space - Displays information about the Paging Space details of the AIX server. The Page Space parameter specifies the paging space name along with corresponding metrics such as Size (in MB), Used (in % and MB), and Free (% and MB).
Process Details - Displays information about the processes running on the server. You can add processes for monitoring using the Add New Process option. You can also delete unwanted processes and enable/disable reports for specific processes. Click on any of the attributes listed to view more details.
Service Details - Displays information about the services configured in the server. Users can choose to delete, restart, start, stop, manage, unmanage, and unmanage & reset the status of selected services. To add new services for monitoring, use the Add New Service option.
Monitors in this System - Displays information about the availability and health of the monitors configured in this server. To add new monitors for monitoring, use the Add Monitors option.

You can use the Custom Fields option in the 'Monitor Information' section to configure additional fields for the monitor.

CPU

This tab provides the CPU usage statistics of the AIX server. The tab includes two graphs: one that displays the CPU utilization by CPU cores and another that shows the breakup of CPU utilization by CPU cores. You can view additional reports by clicking the graphs in the Breakup of CPU Utilization - by CPU cores section. These reports include the following graphs for all the CPU cores:

  • Break up of CPU Utilization (%) vs Time
  • User Time (%) vs Time
  • System Time (%) vs Time
  • I/O Wait Time (%) vs Time
  • Idle Time (%) vs Time
  • Steal Time (%) vs Time
  • CPU Utilization (%) vs Time
  • Interrupts/sec vs Time

The CPU tab also shows the following performance metrics:

Parameter | Description
CPU Utilization - by CPU Cores
Core | The name of the CPU core.
User Time(%) | The percentage of time that the processor spends on user mode operations. This generally means application code.
System Time(%) | The percentage of time that the processor spends on kernel (system) mode operations.
Idle Time(%) | The percentage of time that the CPU is idle (not being used by any program).
I/O Wait Time(%) | The percentage of time that the processor spends waiting for I/O to complete.
CPU Utilization(%) | Specifies the total CPU used by the system.
Interrupts/sec | The rate at which the CPU handles interrupts from applications or hardware each second. If the value for Interrupts/sec is high over a sustained period of time, there could be hardware issues.
LPAR CPU Stats
CPU Entitlement | Assigned capacity entitlement for a shared partition.
No. of Virtual CPUs | Number of virtual processors that are currently assigned.
Type | Indicates whether the LPAR is using dedicated or shared CPU resources.
Utilization in CPU cores | Utilization represents consumed capacity entitlement. This utilization aids in determining the amount of permissible capacity that ought to be allocated for the environment and workload.
CPU Entitlement Utilization (%) | The percentage of the physical processor consumed in the assigned capacity entitlement for a shared partition.

You can also view graphs for these attributes by selecting the necessary CPU core and then choosing the appropriate attribute from the dropdown.

Note: Utilization in CPU cores and CPU Entitlement Utilization attributes will be shown only when the value for Type attribute is shared.
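
The relationship between the LPAR attributes can be sketched as follows, assuming CPU Entitlement Utilization (%) is the consumed cores divided by the entitled capacity. That is how the AIX lparstat command derives its %entc column from physc, but whether Applications Manager uses the same calculation is an assumption. The function name is hypothetical.

```python
def entitlement_utilization(consumed_cores: float, entitlement: float) -> float:
    """Return consumed physical cores as a percentage of entitled capacity."""
    if entitlement <= 0:
        raise ValueError("entitlement must be positive")
    return consumed_cores / entitlement * 100.0


# A shared partition entitled to 0.5 cores that consumes 0.75 cores
print(entitlement_utilization(0.75, 0.5))  # 150.0
```

Under this assumption, an uncapped shared partition can report more than 100% when it borrows idle capacity from the shared pool.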

Disk

This tab displays disk usage statistics of the AIX server.

Parameters | Description
Disk Utilization
Disk | The name of the disk drive.
Total Size (MB) | The total amount of storage space available in that particular disk drive (in MB).
Used (%) | Denotes how much disk space out of the total disk space has actually been used (in percentage).
Used (MB) | Amount of disk space used (in MB).
Free (%) | The percentage of total usable space on the disk that is free.
Free (MB) | The unallocated space on the disk (in MB).
Inode Usage
Inode | The name of the Inode.
Total | The total number of Inodes available in that particular disk.
Used | The number of Inodes that are used in that particular disk.
Free | The remaining number of Inodes that are available in that particular disk.
Used (%) | The percentage of the number of Inodes that are used in that particular disk.
Free (%) | The percentage of the remaining number of Inodes that are available in that particular disk.
Disk I/O Statistics
Device | The name of the disk device.
Transfers/sec | The number of read/write operations that occur on the disk each second.
Writes/sec | The number of write operations that occur on the disk each second.
Reads/sec | The number of read operations that occur on the disk each second.
Throughput (Kbps) | The rate at which data is read from and written to the disk (in Kbps).
% Busy Time | The percentage of elapsed time that the disk device was busy servicing read/write requests.
  Note: Not applicable for physical card and fibre card.
Average Queue Length | The average number of both read and write requests that were queued for the disk during the sample interval.
  Note: Not applicable for physical card and fibre card.

Note: Data collection for Disk I/O statistics and Inode statistics can be enabled from 'Disk I/O Statistics Monitoring' and 'Inode Monitoring' options under Settings → Performance Polling → Servers tab.
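
The percentage columns in the Disk Utilization and Inode Usage tables follow from the total and used counts. The sketch below shows the arithmetic only; the function name usage_summary is hypothetical, and real file systems may reserve space, so the figures reported by the operating system can differ slightly.

```python
def usage_summary(total, used):
    """Derive free and percentage values from total and used counts."""
    if total <= 0 or used < 0 or used > total:
        raise ValueError("expected 0 <= used <= total and total > 0")
    used_pct = used / total * 100
    return {
        "used_pct": round(used_pct, 2),
        "free": total - used,
        "free_pct": round(100 - used_pct, 2),
    }


# A 20480 MB disk with 5120 MB used
print(usage_summary(20480, 5120))
# {'used_pct': 25.0, 'free': 15360, 'free_pct': 75.0}
```

The same function applies to inode counts: pass the total and used number of inodes instead of sizes in MB.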

Network

Parameter | Description
Network Interface
Note: Applications Manager supports the Network Interface parameters for AIX in SSH/Telnet mode.
Name | The name of the network interface present in the system.
Speed | The estimate of the current bandwidth (in Mbps).
MTU | Maximum Transmission Unit (MTU) is a measurement of the largest data packet that a network-connected device can accept.
Input Traffic (Kbps) | The rate at which packets are received on the interface (in Kbps).
Output Traffic (Kbps) | The rate at which packets are sent on the interface (in Kbps).
Input Discards | Number of received packets dropped.
Output Discards | Number of transmitted packets dropped.
Input Errors | Number of received damaged packets.
Output Errors | Number of transmitted damaged packets.
Output Queue Length | Number of network packets in the output packet queue.
DMA Overrun | Number of DMA overrun errors. If the count is high, it indicates that the network interface card is not able to keep up with the data transfer rate.
Align Errors | Number of alignment errors in the network. If the count is high, it indicates a problem with the network interface card or the network cable.
CRC Errors | Number of CRC errors in the network. If the count is high, it indicates a problem with the network interface card or the network cable.
Physical Port Status | The status of physical network ports. Possible values: UP or DOWN.
Logical Port Status | The status of logical network ports. Possible values: UP or DOWN.
Network Adapter
Fibre Channel Name | Name of the Fibre Channel device.
Fibre Channel Status | The status of the Fibre Channel device. Possible values: DEFINED, AVAILABLE, and STOPPED.
Fibre Channel SCSI Link Status | The SCSI link status of the Fibre Channel device. Possible values: NONE, SWITCH, and AL modes.
Connection Stats
Socket State | The state of the sockets. The following socket states are shown:
  • ESTABLISHED - The socket has an established connection.
  • FIN_WAIT1 - The socket is closed, and the connection is shutting down.
  • FIN_WAIT2 - Connection is closed, and the socket is waiting for a shutdown from the remote end.
  • LISTEN - The socket is listening for incoming connections.
  • TIME_WAIT - The socket is waiting after close to handle packets still in the network.
No. of Connections | Number of connections in the particular socket state.
NTP Stats
NTP Status | Indicates whether the client is synchronized with the server or not.
Server Name | Indicates the hostname of the server to which the client is synchronized.
Stratum Level | Indicates the stratum level at which the client is located.
NTP Time correct to within | Indicates the time offset value (in milliseconds) displayed for 'time correct to within' after executing the ntpstat/chrony command.

  Time correct to within = (Root dispersion + Root Delay) / 2
Poll Interval | Indicates the polling time interval between each sync (in seconds).
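
The Connection Stats rows above pair each socket state with a connection count. A minimal sketch of that tally is shown below, assuming netstat-style text where TCP rows end with the state. The sample output is invented for illustration, and the exact spelling of state names (for example FIN_WAIT1 versus FIN_WAIT_1) varies by platform. How Applications Manager collects these values is not stated in this document.

```python
from collections import Counter

# State names as listed in the table above
TCP_STATES = {"ESTABLISHED", "FIN_WAIT1", "FIN_WAIT2", "LISTEN", "TIME_WAIT"}


def count_socket_states(netstat_output: str) -> Counter:
    """Count TCP connections per socket state."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows end with the connection state; other rows are skipped.
        if fields and fields[0].startswith("tcp") and fields[-1] in TCP_STATES:
            counts[fields[-1]] += 1
    return counts


sample = """\
Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
tcp4       0      0  10.0.0.5.22            10.0.0.9.51234         ESTABLISHED
tcp4       0      0  *.22                   *.*                    LISTEN
tcp4       0      0  10.0.0.5.80            10.0.0.7.40112         TIME_WAIT
tcp4       0      0  10.0.0.5.80            10.0.0.8.40113         TIME_WAIT
udp4       0      0  *.123                  *.*
"""
counts = count_socket_states(sample)
print(counts["TIME_WAIT"])  # 2
print(counts["LISTEN"])     # 1
```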

Errpt

Parameters | Description
Hardware Errors | Displays hardware error information.
Error Log Time | Log time of the error.
Identifier | Unique identifier value of the error.
Name | Name of the error.
Error Type | Type of the error.
Hardware Error Message | Displays error message of hardware.
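
For reference, hardware entries can be picked out of the summary output of the AIX errpt command with a short script. This is a sketch under stated assumptions: the columns are IDENTIFIER, TIMESTAMP (in MMDDhhmmYY form), T (type), C (class, where H means hardware), RESOURCE_NAME, and DESCRIPTION. The sample lines are illustrative, and how these fields map onto the table above is an assumption rather than something this document states.

```python
from datetime import datetime
from typing import NamedTuple


class ErrptEntry(NamedTuple):
    identifier: str
    logged_at: datetime
    error_type: str
    error_class: str
    resource: str
    description: str


def parse_errpt(output: str) -> list:
    """Parse errpt summary lines, skipping the header and malformed rows."""
    entries = []
    for line in output.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 6 or fields[0] == "IDENTIFIER":
            continue
        ident, stamp, etype, eclass, resource, desc = fields
        try:
            logged_at = datetime.strptime(stamp, "%m%d%H%M%y")
        except ValueError:
            continue  # timestamp not in the assumed MMDDhhmmYY form
        entries.append(
            ErrptEntry(ident, logged_at, etype, eclass, resource, desc.strip())
        )
    return entries


sample = """\
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
BFE4C025   0219144324 P H sysplanar0     UNDETERMINED ERROR
A6DF45AA   0218101024 I O RMCdaemon      The daemon is started.
"""
hardware = [e for e in parse_errpt(sample) if e.error_class == "H"]
for entry in hardware:
    print(entry.identifier, entry.logged_at, entry.description)
```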

Configuration

Parameters | Description
System Information
Model | Model of the system.
Serial Number | Serial number of the system.
OS Information
OS Version | Version of the OS.
OS Release | Indicates the Technology Level (TL) of the AIX system.
Technology Level | Technology level of the OS.
Service Pack | The service pack value.
Memory Information
Memory Size (MB) | Total size of the memory.
Good Memory Size (MB) | Total size of the good memory.
Processor Information
Processor Type | Type of the processor.
Processor Implementation Mode | Implementation mode of the processor.
Processor Version | Version of the processor.
Number Of Processors | Total number of processors used.
Processor Clock Speed (MHz) | Clock speed of the processor.
CPU Type | Type of the CPU.
Firmware Information
Platform Firmware level | The current firmware level (often referred to as System Firmware) on an AIX system.
Firmware Version | Version of the firmware.
Paging Space Information
Total Paging Space (MB) | Total paging space available.
Percent Used (%) | Total percentage of the paging space used.
Volume Group Information
Volume Group | Name of the Volume Group.
Volume Group State | State of the Volume Group.
Volume Group Permission | Indicates the permission enabled for the volume group. Possible values: READ/WRITE.
Total Physical Partitions | Total number of Physical partitions in the Volume group.
Free Physical Partitions | Total number of Free Physical partitions in the Volume group.
Used Physical Partitions | Total number of Used Physical partitions in the Volume group.
Max Physical Partitions per Volume Group | The maximum physical partitions per Volume group.
Total Logical Volumes | The total number of Logical volumes in the group.
Open Logical Volumes | The total number of Open Logical volumes in the group.
Total Physical Volumes | The total number of Physical volumes in the group.
Stale Physical Volumes | The number of stale physical volumes in the group.
Physical volumes Restriction | Indicates whether there are any restrictions imposed on the physical volume.
Total Quorum | Total number of Quorum in the volume group.
Physical Volume Group Information
Volume Name | Name of the Volume.
Volume State | State of the Volume.
Total Physical Partitions | Total number of physical partitions.
Free Physical Partitions | Total number of free physical partitions.
Free Distribution | The number of Free distributions in the physical volume group.
Logical Volume Group Information
Volume Name | Name of the Volume.
Volume Type | Type of the Volume.
Total Logical Partitions | Total number of logical partitions.
Total Physical Partitions | Total number of physical partitions.
Total Physical Volumes | Total number of physical volumes.
Volume State | State of the Volume.
Mount Point | Mount point of the logical volume group.
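
The OS Information values in the table above correspond to the parts of an AIX level string. The sketch below assumes the string comes from the oslevel -s command in the form base-TL-SP-build, for example 7200-05-03-2148. This document does not say which command Applications Manager runs, and the function name parse_oslevel is hypothetical.

```python
def parse_oslevel(level: str) -> dict:
    """Split an 'oslevel -s' style string into its components."""
    base, tl, sp, build = level.strip().split("-")
    return {
        "os_version": f"{base[0]}.{base[1]}",  # '7200' -> '7.2'
        "technology_level": int(tl),
        "service_pack": int(sp),
        "build": build,  # build identifier, kept as text
    }


print(parse_oslevel("7200-05-03-2148"))
# {'os_version': '7.2', 'technology_level': 5, 'service_pack': 3, 'build': '2148'}
```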