From version 12.8.658, NVIDIA GPU monitoring is supported through 12 CLI monitors that track key metrics across GPU performance, power and thermal management, device status, and memory utilization. These monitors give sysadmins, DevOps teams, and AI engineers direct visibility into GPU resource consumption and health on Linux-based infrastructure, where NVIDIA GPUs are most commonly deployed for compute-intensive workloads.
To associate an NVIDIA GPU monitor with a device, go to the Linux-based device's Snapshot page → Monitors → Performance Monitors → Actions → Add Performance Monitors.
The monitors are listed in the "NVIDIA GPU Monitors" section. Select the required monitors and click Add.
Performance monitors can also be associated with multiple devices in bulk from the Settings page. The steps below cover associating the 12 NVIDIA GPU monitors, available under the net-snmp vendor, with one or more devices. A monitor is associated with a device only if CLI credentials are configured for that device.
Associating NVIDIA GPU monitors with devices through Settings in OpManager enables continuous performance tracking across GPU-enabled hosts from a single configuration point. Once associated, the collected metrics are available on the device's Snapshot page. Thresholds can be configured there per device, or at the Settings level to apply uniform threshold values across multiple devices at once.
| Category | Monitor | Description |
|---|---|---|
| GPU Performance | NVIDIA GPU Utilization | Percentage of GPU resources engaged in processing workloads |
| | NVIDIA GPU Memory Utilization | Percentage of the total GPU memory (VRAM) that is actively allocated and used by running applications and processes |
| | NVIDIA GPU Clock Speed in Percent | GPU core clock speed as a percentage of its maximum rated frequency |
| | NVIDIA GPU Memory Clock Speed in Percent | GPU memory clock speed as a percentage of its maximum rated frequency |
| GPU Status | NVIDIA GPU Availability | Indicates whether the NVIDIA GPU is currently available and accessible to the system |
| | NVIDIA GPU Compute Mode | Indicates whether the NVIDIA GPU compute mode is set to a specific configuration |
| | NVIDIA GPU Display Status | Indicates whether the GPU's display output is currently active or inactive |
| | NVIDIA GPU Persistence Mode | Indicates whether NVIDIA GPU Persistence Mode is enabled, which keeps the GPU initialized between sessions and reduces initialization delay |
| GPU Power / Thermal Management | NVIDIA GPU Temperature in Celsius | Current operating temperature of the GPU, in degrees Celsius |
| | NVIDIA GPU Power Draw in Watts | Electrical power currently consumed by the GPU, in watts |
| | NVIDIA GPU Power Draw in Percent | GPU power usage as a percentage of its maximum rated power limit |
| | NVIDIA GPU Fan Speed in Percent | GPU fan speed as a percentage of its maximum rated speed |