Monitoring GPU with OpManager

From version 12.8.658, OpManager supports NVIDIA GPU monitoring through 12 CLI monitors that cover key metrics across GPU performance, power and thermal management, device status, and memory utilization. These monitors give sysadmins, DevOps teams, and AI engineers direct visibility into GPU resource consumption and health on Linux-based infrastructure, where NVIDIA GPUs are most commonly deployed for compute-intensive workloads.

Associating NVIDIA GPU Monitors at Device Level

To associate an NVIDIA GPU monitor, go to the respective Linux-based device's Snapshot page → Monitors → Performance Monitors → Actions → Add Performance Monitors.

The monitors are listed in the "NVIDIA GPU Monitors" section. Select the required monitors and click Add.

Associating NVIDIA GPU Monitors to Multiple Devices

Performance monitors can be associated with multiple devices in bulk from the Settings page. The steps below cover associating the 12 NVIDIA GPU monitors, available under the Net-SNMP vendor, with one or more devices. A device is associated with the monitors only if its CLI credentials are configured.

  1. Navigate to Settings → Monitoring → Performance Monitors.
  2. Click the Associate button at the top right corner of the page.
  3. In the Associate Monitors panel, locate the Vendors dropdown and select Net-SNMP. The monitor list refreshes to display monitors available under the Net-SNMP vendor.
  4. From the filtered list, select the required monitors from the 12 NVIDIA GPU monitors displayed. To include all 12, select the checkbox at the top of the list. Click Next.
  5. In the Monitor Association panel, browse the available devices, select the devices requiring GPU monitors, and move them to the Selected Devices section.
  6. Click Apply to save the configuration.

Associating NVIDIA GPU monitors to devices through Settings in OpManager enables continuous performance tracking across GPU-enabled hosts from a single configuration point. Once associated, the collected metrics are available on the Device Snapshot page, where thresholds can be configured per device, or at the Settings level to apply uniform threshold values across multiple devices simultaneously.
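The relationship between uniform and per-device thresholds can be pictured with a small sketch. It is illustrative only: the severity names, the limit values, the device names, and the rule that a per-device setting takes precedence over the uniform one are all assumptions made for the example, not a description of how OpManager stores or evaluates thresholds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Ascending limits for one monitor (hypothetical names and values)."""
    attention: float
    trouble: float
    critical: float


def classify(value: float, limits: Thresholds) -> str:
    """Return the highest severity whose limit the value has reached."""
    if value >= limits.critical:
        return "Critical"
    if value >= limits.trouble:
        return "Trouble"
    if value >= limits.attention:
        return "Attention"
    return "Clear"


def limits_for(device: str, per_device: dict, uniform: Thresholds) -> Thresholds:
    """Assumed rule: a per-device threshold, if present, wins over the uniform one."""
    return per_device.get(device, uniform)


if __name__ == "__main__":
    # Hypothetical temperature limits in degrees Celsius.
    uniform = Thresholds(attention=70, trouble=80, critical=90)
    per_device = {"gpu-node-01": Thresholds(attention=60, trouble=75, critical=85)}

    for device, temperature in [("gpu-node-01", 78.0), ("gpu-node-02", 78.0)]:
        severity = classify(temperature, limits_for(device, per_device, uniform))
        print(f"{device}: {temperature} C -> {severity}")
```

In this sketch the same 78 °C reading is rated differently on the two hosts because only the first has its own limits, which is the practical difference between configuring thresholds per device and applying one set of values to many devices.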

NVIDIA GPU Monitors (Linux-Based)

| Category | Monitor | Description |
| --- | --- | --- |
| GPU Performance | NVIDIA GPU Utilization | Percentage of GPU resources engaged in processing workloads |
| GPU Performance | NVIDIA GPU Memory Utilization | Percentage of the total GPU memory (VRAM) that is actively allocated and being used by running applications and processes |
| GPU Performance | NVIDIA GPU Clock Speed in Percent | Percentage of GPU core clock speed relative to its maximum rated frequency |
| GPU Performance | NVIDIA GPU Memory Clock Speed in Percent | Percentage of GPU memory clock speed relative to its maximum rated frequency |
| GPU Status | NVIDIA GPU Availability | Indicates whether the NVIDIA GPU is currently available and accessible by the system |
| GPU Status | NVIDIA GPU Compute Mode | Indicates whether the NVIDIA GPU compute mode is set to a specific configuration |
| GPU Status | NVIDIA GPU Display Status | Indicates whether the GPU's display output is currently active or inactive |
| GPU Status | NVIDIA GPU Persistence Mode | Indicates whether NVIDIA GPU Persistence Mode is enabled, allowing the GPU to remain initialized between sessions and reducing initialization delay |
| GPU Power / Thermal Management | NVIDIA GPU Temperature in Celsius | Current operating temperature of the GPU in degrees Celsius |
| GPU Power / Thermal Management | NVIDIA GPU Power Draw in Watts | Amount of electrical power currently consumed by the GPU, measured in watts |
| GPU Power / Thermal Management | NVIDIA GPU Power Draw in Percent | Percentage of GPU power usage relative to its maximum rated power limit |
| GPU Power / Thermal Management | NVIDIA GPU Fan Speed in Percent | Percentage of GPU fan speed relative to its maximum rated speed |
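To cross-check the percentage-based monitors on a host, comparable figures can be derived by hand from NVIDIA's `nvidia-smi` utility. The sketch below is a rough approximation under stated assumptions: this page does not say which commands the CLI monitors run, so the choice of `nvidia-smi`, the selected query fields, and the formulas (current value divided by maximum) are assumptions, and the sample line is made-up data rather than real output.

```python
import csv
import io

# Query properties understood by `nvidia-smi --query-gpu` (choice of fields is an assumption).
QUERY_FIELDS = [
    "utilization.gpu", "memory.used", "memory.total",
    "clocks.gr", "clocks.max.gr", "temperature.gpu",
    "power.draw", "power.limit", "fan.speed",
]


def build_command():
    """Command line that prints one CSV row per GPU, without header or units."""
    return ["nvidia-smi", "--query-gpu=" + ",".join(QUERY_FIELDS),
            "--format=csv,noheader,nounits"]


def parse_gpu_rows(csv_text):
    """Turn CSV text into one dict per GPU; non-numeric cells such as [N/A] become None."""
    rows = []
    for record in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if not record:
            continue
        values = {}
        for name, raw in zip(QUERY_FIELDS, record):
            try:
                values[name] = float(raw)
            except ValueError:
                values[name] = None
        rows.append(values)
    return rows


def percent(part, whole):
    """Percentage of part relative to whole, or None when it cannot be computed."""
    if part is None or whole is None or whole == 0:
        return None
    return round(100.0 * part / whole, 1)


def derive_metrics(row):
    """Map raw readings to values resembling the monitors in the table (assumed formulas)."""
    return {
        "gpu_utilization_pct": row["utilization.gpu"],
        "memory_utilization_pct": percent(row["memory.used"], row["memory.total"]),
        "clock_speed_pct": percent(row["clocks.gr"], row["clocks.max.gr"]),
        "temperature_c": row["temperature.gpu"],
        "power_draw_w": row["power.draw"],
        "power_draw_pct": percent(row["power.draw"], row["power.limit"]),
        "fan_speed_pct": row["fan.speed"],
    }


if __name__ == "__main__":
    # Made-up sample line in the same column order as QUERY_FIELDS.
    sample = "45, 4096, 16384, 1200, 1500, 62, 150.00, 300.00, 40\n"
    for gpu in parse_gpu_rows(sample):
        print(derive_metrics(gpu))
```

On a GPU host, the list returned by `build_command()` can be passed to `subprocess.run(..., capture_output=True, text=True)` and its standard output fed to `parse_gpu_rows`. Values computed this way may differ slightly from what OpManager reports, since the monitors' exact definitions are not documented here.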

 
