Prerequisites for Hardware Monitoring

Monitoring the hardware components of the critical devices in your network is essential to ensure continuous service availability and network uptime. OpManager monitors the hardware status of the servers and network devices in your environment from vendors such as Cisco, Juniper, HP and Dell. It tracks important hardware parameters such as voltage, temperature, power, fan speed and processors via SNMP for network and server devices, and via vSphere for VMware ESX/ESXi hosts.

Prerequisites for HP/Dell Servers:

HP:

If Hardware Sensor Monitors are not displayed, then please make sure that these tools are installed on that server:

  • HP Insight Server Agents
  • HP Insight Foundation Agents
  • HP Insight Storage Agents

 

Dell:

If Hardware Sensor Monitors are not displayed, then please make sure that Dell OpenManage has been installed on that server.

Where are the hardware tabs?

If you find the hardware tabs missing, follow the steps below:

1. If the device is a VMware ESX/ESXi host:

OpManager uses the methods hardwareStatusInfo and numericSensorInfo from the VMware API to poll the hardware status and stats of devices in the VMware environment. To make sure hardware monitoring works properly, check whether sensor information is available in the Managed Object Browser (MOB) using the following links:

  • In case of ESX discovery:
    • For numericSensorInfo:

      https://<<hostname/IPAddress>>/mob/?moid=ha-host&doPath=runtime.healthSystemRuntime.systemHealthInfo.numericSensorInfo

    • For hardwareStatusInfo (cpuStatusInfo / memoryStatusInfo / storageStatusInfo):

      https://<<hostname/IPAddress>>/mob/?moid=ha-host&doPath=runtime.healthSystemRuntime.hardwareStatusInfo

  • In case of vCenter discovery:

    https://<<vCenterName/IPAddress>>/mob/?

    After logging into the MOB, navigate to the paths given below and check whether values are being populated for both methods:
    • For numericSensorInfo: content → rootFolder → childEntity → hostFolder → childEntity [select appropriate host] → host → runtime → healthSystemRuntime → systemHealthInfo → numericSensorInfo
    • For hardwareStatusInfo: content → rootFolder → childEntity → hostFolder → childEntity [select appropriate host] → host → runtime → healthSystemRuntime → hardwareStatusInfo → cpuStatusInfo (or) memoryStatusInfo (or) storageStatusInfo
    Note that OpManager raises alerts based on the colour value available (alerts are raised if the colour is anything other than "green").

If the sensors are not available, install VMware tools on that host.
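
The ESX MOB links above differ only in the doPath value, so they can be generated for any host. The following is a minimal sketch, not part of OpManager: the function names and the host name "esx01.example.com" are hypothetical, and the colour check simply restates the rule given above (anything other than "green" raises an alert).

```python
from urllib.parse import urlencode

# Property paths copied from the MOB links above.
NUMERIC_SENSOR_PATH = "runtime.healthSystemRuntime.systemHealthInfo.numericSensorInfo"
HARDWARE_STATUS_PATH = "runtime.healthSystemRuntime.hardwareStatusInfo"


def esx_mob_url(host: str, do_path: str) -> str:
    """Build the MOB link for a directly discovered ESX/ESXi host."""
    query = urlencode({"moid": "ha-host", "doPath": do_path})
    return f"https://{host}/mob/?{query}"


def raises_alert(colour: str) -> bool:
    """Restate the rule above: any colour other than green raises an alert."""
    return colour.strip().lower() != "green"


if __name__ == "__main__":
    # "esx01.example.com" is a placeholder, not a real host.
    print(esx_mob_url("esx01.example.com", NUMERIC_SENSOR_PATH))
    print(esx_mob_url("esx01.example.com", HARDWARE_STATUS_PATH))
```

Opening the printed links in a browser (after logging in to the host) should show whether the sensor values are populated.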

2. If the device is HP/Dell/Cisco/Juniper:

Query the OIDs below and check whether the device responds to all of them. If it does, rediscover the device. If it does not, OpManager won't show the hardware tabs.

  • HP:

    OID Parameter
    .1.3.6.1.4.1.232.11.2.2.1.0 Operating System
    .1.3.6.1.4.1.232.11.2.2.2.0 OS Version
    .1.3.6.1.4.1.232.2.2.4.2.0 Model
    .1.3.6.1.4.1.232.2.2.2.6.0 Service tag
    .1.3.6.1.4.1.232.2.2.2.1.0 Serial number
  • Dell:

    OID Parameter
    .1.3.6.1.4.1.674.10892.1.300.10.1.8.1 Manufacturer
    .1.3.6.1.4.1.674.10892.1.300.10.1.9.1 Model
    .1.3.6.1.4.1.674.10892.1.300.10.1.11.1 Service Tag
    .1.3.6.1.4.1.674.10892.1.400.10.1.6.1 Operating System
    .1.3.6.1.4.1.674.10892.1.400.10.1.7.1 OS Version
  • Cisco:

    OID Parameter
    .1.3.6.1.2.1.47.1.1.1.1.13.1 Hardware Model
    .1.3.6.1.2.1.47.1.1.1.1.11.1 Serial Number
  • Juniper:

    OID Parameter
    .1.3.6.1.4.1.2636.3.1.2.0 Model
    .1.3.6.1.4.1.2636.3.1.3.0 Serial Number
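
The check described in this step can be scripted. The sketch below assumes the Net-SNMP command-line tools are installed and that the device accepts SNMPv2c with a community string. The function names are hypothetical and this is not how OpManager itself performs the query. The OIDs are copied from the tables above.

```python
import subprocess
from typing import Callable, Dict, List

# Identification OIDs copied from the tables above.
IDENTIFICATION_OIDS: Dict[str, List[str]] = {
    "HP": [
        ".1.3.6.1.4.1.232.11.2.2.1.0",  # Operating System
        ".1.3.6.1.4.1.232.11.2.2.2.0",  # OS Version
        ".1.3.6.1.4.1.232.2.2.4.2.0",   # Model
        ".1.3.6.1.4.1.232.2.2.2.6.0",   # Service tag
        ".1.3.6.1.4.1.232.2.2.2.1.0",   # Serial number
    ],
    "Dell": [
        ".1.3.6.1.4.1.674.10892.1.300.10.1.8.1",   # Manufacturer
        ".1.3.6.1.4.1.674.10892.1.300.10.1.9.1",   # Model
        ".1.3.6.1.4.1.674.10892.1.300.10.1.11.1",  # Service Tag
        ".1.3.6.1.4.1.674.10892.1.400.10.1.6.1",   # Operating System
        ".1.3.6.1.4.1.674.10892.1.400.10.1.7.1",   # OS Version
    ],
    "Cisco": [
        ".1.3.6.1.2.1.47.1.1.1.1.13.1",  # Hardware Model
        ".1.3.6.1.2.1.47.1.1.1.1.11.1",  # Serial Number
    ],
    "Juniper": [
        ".1.3.6.1.4.1.2636.3.1.2.0",  # Model
        ".1.3.6.1.4.1.2636.3.1.3.0",  # Serial Number
    ],
}


def snmpget_command(host: str, community: str, oid: str) -> List[str]:
    """Argument list for Net-SNMP's snmpget (SNMPv2c is an assumption)."""
    return ["snmpget", "-v2c", "-c", community, "-Oqv", host, oid]


def responds_via_snmpget(host: str, community: str, oid: str) -> bool:
    """True if snmpget exits cleanly and returns an actual value."""
    try:
        result = subprocess.run(snmpget_command(host, community, oid),
                                capture_output=True, text=True, timeout=10)
    except subprocess.TimeoutExpired:
        return False
    # SNMPv2c agents answer a missing OID with "No Such Object/Instance".
    return result.returncode == 0 and "No Such" not in result.stdout


def unresponsive_oids(vendor: str, responds: Callable[[str], bool]) -> List[str]:
    """Return the identification OIDs for `vendor` that did not respond."""
    return [oid for oid in IDENTIFICATION_OIDS[vendor] if not responds(oid)]
```

For example, `unresponsive_oids("HP", lambda oid: responds_via_snmpget("192.0.2.10", "public", oid))` uses a placeholder address and community string. An empty result suggests the device is ready to be rediscovered.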

3. Check whether Hardware monitoring is enabled under Settings → Monitoring → Monitor Settings → Hardware.

4. Check if Hardware monitoring is enabled for the individual devices in the Device snapshot → Hardware tab.

5. Suppress Hardware Alarms:

  1. Check if the hardware alarms for the respective devices have been suppressed in OpManager.
  2. To suppress all hardware alarms for all devices, go to Settings → Monitoring → Monitor Settings → Hardware tab and click Suppress Alarms under the Hardware section.
  3. You can also go to the Hardware tab in the Device Snapshot page and suppress the hardware alarm for a particular device.

6. Check if Hardware status is not updated:

For OpManager to monitor the hardware of your devices, check if the following OIDs are responding properly.

  • For Cisco devices:

    Supported MIBs: CISCO-ENVMON-MIB | ENTITY-MIB
    (All Cisco devices that use these MIBs can be monitored using OpManager)

    .1.3.6.1.2.1.47.1.1.1.1.13.1 - Hardware model
    .1.3.6.1.2.1.47.1.1.1.1.11.1 - Hardware serial number

    Metric type | OID of corresponding metric name | OID of corresponding metric status | OID of corresponding metric value
    Temperature | .1.3.6.1.4.1.9.9.13.1.3.1.2 (TemperatureStatusDescr) | .1.3.6.1.4.1.9.9.13.1.3.1.6 (TemperatureState) | .1.3.6.1.4.1.9.9.13.1.3.1.3 (TemperatureStatusValue)
    Voltage | .1.3.6.1.4.1.9.9.13.1.2.1.2 (VoltageStatusDescr) | .1.3.6.1.4.1.9.9.13.1.2.1.7 (VoltageState) | .1.3.6.1.4.1.9.9.13.1.2.1.3 (VoltageStatusValue)
    Fan | .1.3.6.1.4.1.9.9.13.1.4.1.2 (FanStatusDescr) | .1.3.6.1.4.1.9.9.13.1.4.1.3 (FanState) | NA
    Power | .1.3.6.1.4.1.9.9.13.1.5.1.2 (SupplyStatusDescr) | .1.3.6.1.4.1.9.9.13.1.5.1.3 (SupplyState) | NA
  • For Cisco Nexus devices:

    Supported MIB: CISCO-ENTITY-FRU-CONTROL-MIB
    (All Cisco Nexus devices that use this MIB can be monitored using OpManager)

    Metric type | OID
    Power | .1.3.6.1.4.1.9.9.117.1.1.2.1.1 (FRUPowerAdminStatus)
    Power | .1.3.6.1.4.1.9.9.117.1.1.2.1.2 (FRUPowerOperStatus)
    Power | .1.3.6.1.4.1.9.9.117.1.1.2.1.3 (FRUCurrent)
    Fan | .1.3.6.1.4.1.9.9.117.1.4.1.1.1 (FanTrayOperStatus)
  • For Checkpoint devices:

    Supported MIB: CHECKPOINT-MIB
    (All Checkpoint devices that use this MIB can be monitored using OpManager)

    Metric type | OID of corresponding metric name | OID of corresponding metric status | OID of corresponding metric value
    Voltage | .1.3.6.1.4.1.2620.1.6.7.8.3.1.2 (voltageSensorName) | .1.3.6.1.4.1.2620.1.6.7.8.3.1.6 (voltageSensorStatus) | .1.3.6.1.4.1.2620.1.6.7.8.3.1.3 (voltageSensorValue)
    Fan | .1.3.6.1.4.1.2620.1.6.7.8.2.1.2 (fanSpeedSensorName) | .1.3.6.1.4.1.2620.1.6.7.8.2.1.6 (fanSpeedSensorStatus) | .1.3.6.1.4.1.2620.1.6.7.8.2.1.3 (fanSpeedSensorValue)
    Temperature | .1.3.6.1.4.1.2620.1.6.7.8.1.1.2 (tempertureSensorName) | .1.3.6.1.4.1.2620.1.6.7.8.1.1.6 (tempertureSensorStatus) | .1.3.6.1.4.1.2620.1.6.7.8.1.1.3 (tempertureSensorValue)
  • For HP servers:

    Supported MIBs: CPQHOST-MIB | CPQHLTH-MIB | CPQSINFO-MIB
    (All HP servers that use these MIBs can be monitored using OpManager)

    Metric type | OID of corresponding metric name | OID of corresponding metric status | OID of corresponding metric value
    Temperature | .1.3.6.1.4.1.232.6.2.6.8.1.8 (TemperatureHwLocation) (or) .1.3.6.1.4.1.232.6.2.6.8.1.3 (TemperatureLocale) | .1.3.6.1.4.1.232.6.2.6.8.1.6 | .1.3.6.1.4.1.232.6.2.6.8.1.4
    Fan | .1.3.6.1.4.1.232.6.2.6.7.1.11 (FanHwLocation) (or) .1.3.6.1.4.1.232.6.2.6.7.1.3 (FanLocale) | .1.3.6.1.4.1.232.6.2.6.7.1.9 (FanCondition) | .1.3.6.1.4.1.232.6.2.6.7.1.12 (FanCurrentSpeed)
    Processors | .1.3.6.1.4.1.232.1.2.2.1.1.3 (CpuName) | .1.3.6.1.4.1.232.1.2.2.1.1.6 (CpuStatus) | .1.3.6.1.4.1.232.1.2.2.1.1.4 (CpuSpeed)
    Power | .1.3.6.1.4.1.232.6.2.9.3.1.11 (PowerSupplySerialNumber) | .1.3.6.1.4.1.232.6.2.9.3.1.4 (PowerSupplyCondition) | .1.3.6.1.4.1.232.6.2.9.3.1.8 (PowerSupplyCapacityMaximum)
    Partition details | .1.3.6.1.4.1.232.11.2.4.1.1.2 (FileSysDesc) | .1.3.6.1.4.1.232.11.2.4.1.1.8 (FileSysStatus) | .1.3.6.1.4.1.232.11.2.4.1.1.5 (FileSysPercentSpaceUsed)
    Memory | .1.3.6.1.4.1.232.6.2.14.12.1.3 (BoardCpuNum) | .1.3.6.1.4.1.232.6.2.14.12.1.11 (BoardCondition) | .1.3.6.1.4.1.232.6.2.14.12.1.9 (BoardOsMemSize)
  • For Dell servers:

    Supported MIBs: DELL-RAC-Mib | StorageManagement-MIB.mib | MIB-Dell-10892.mib
    (All Dell servers that use these MIBs can be monitored using OpManager)

    Metric type | OID of corresponding metric name | OID of corresponding metric status | OID of corresponding metric value
    Temperature | .1.3.6.1.4.1.674.10892.1.700.20.1.8 (ProbeLocationName) | .1.3.6.1.4.1.674.10892.1.700.20.1.5 (ProbeStatus) | .1.3.6.1.4.1.674.10892.1.700.20.1.6 (ProbeReading)
    Fan | .1.3.6.1.4.1.674.10892.1.700.12.1.8 (DeviceLocationName) | .1.3.6.1.4.1.674.10892.1.700.12.1.5 (DeviceStatus) | .1.3.6.1.4.1.674.10892.1.700.12.1.6 (DeviceReading)
    Processors | .1.3.6.1.4.1.674.10892.1.1100.30.1.23 (DeviceBrandName) | .1.3.6.1.4.1.674.10892.1.1100.30.1.5 (DeviceStatus) | .1.3.6.1.4.1.674.10892.1.1100.30.1.11 (DeviceMaximumSpeed)
    Power | .1.3.6.1.4.1.674.10892.1.600.60.1.6 (EntityName) | .1.3.6.1.4.1.674.10892.1.600.60.1.5 (Status) | .1.3.6.1.4.1.674.10892.1.600.60.1.9 (PeakWatts)
    Voltage | .1.3.6.1.4.1.674.10892.1.600.20.1.8 (ProbeLocationName) | .1.3.6.1.4.1.674.10892.1.600.20.1.5 (ProbeStatus) | .1.3.6.1.4.1.674.10892.1.600.20.1.6 (ProbeReading)
    Disk Array Data | .1.3.6.1.4.1.674.10893.1.20.130.4.1.2 (arrayDiskName) | .1.3.6.1.4.1.674.10893.1.20.130.4.1.4 (arrayDiskStatus) | .1.3.6.1.4.1.674.10893.1.20.130.4.1.17 (arrayDiskUsedSpaceInMB)
    Battery | .1.3.6.1.4.1.674.10892.1.600.50.1.7 (LocationName) | .1.3.6.1.4.1.674.10892.1.600.50.1.5 (Status) | .1.3.6.1.4.1.674.10892.1.600.50.1.4 (StateSettings)
  • For Juniper devices:

    Supported MIB: JUNIPER-MIB
    (All Juniper devices that use this MIB can be monitored using OpManager)

    • For Juniper devices, performing a walk on the OID 1.3.6.1.4.1.2636.3.1.15.1.6 gives us a list of all hardware components or 'Field-Replaceable Units' (FRUs) present in the Juniper device(s). OpManager primarily monitors Power, Temperature and Fan speed, and these are the responses for the corresponding FRU types:

      Temperature - 6 | Power - 7 | Fan - 13

    • The instances that respond with these values are noted, and the suffix for the instance can be used to obtain data for that FRU.

      For example, suppose an SNMP walk on the FruType OID (1.3.6.1.4.1.2636.3.1.15.1.6) of a Juniper device returns the following response:

      1.3.6.1.4.1.2636.3.1.15.1.6.A → 13
      1.3.6.1.4.1.2636.3.1.15.1.6.B → 6
      1.3.6.1.4.1.2636.3.1.15.1.6.C → 7
      1.3.6.1.4.1.2636.3.1.15.1.6.D → 2
      1.3.6.1.4.1.2636.3.1.15.1.6.E → 6

      Note: The values of A, B, C, D and E can be anywhere from one to four octets, i.e., they can take the form 'z', 'z.y', 'z.y.x' or 'z.y.x.w'.

       

    • Now we take the instances that returned 6 (or) 7 (or) 13 as the response, and we note down their instance IDs. Here, A, B, C and E are the instances that provided the required responses. Therefore, these are the instances that OpManager should be able to query to perform hardware monitoring on that device.

    • Now that we know the instance IDs, we can use them to check if we can query the required parameters from that instance.
      OpManager queries the name, status and value of each instance. So, if you want to perform hardware monitoring on the given Juniper device, the following OIDs must respond when queried:

      Response for FruType | Metric type | Instance ID | OID of corresponding metric identifier (OperatingDescr) | OID of corresponding metric status (OperatingState) | OID of corresponding metric value (OperatingTemp)
      6 | Temperature | B | .1.3.6.1.4.1.2636.3.1.13.1.5.B | .1.3.6.1.4.1.2636.3.1.13.1.6.B | .1.3.6.1.4.1.2636.3.1.13.1.7.B
      6 | Temperature | E | .1.3.6.1.4.1.2636.3.1.13.1.5.E | .1.3.6.1.4.1.2636.3.1.13.1.6.E | .1.3.6.1.4.1.2636.3.1.13.1.7.E
      7 | Power | C | .1.3.6.1.4.1.2636.3.1.13.1.5.C | .1.3.6.1.4.1.2636.3.1.13.1.6.C | NA
      13 | Fan | A | .1.3.6.1.4.1.2636.3.1.13.1.5.A | .1.3.6.1.4.1.2636.3.1.13.1.6.A | NA
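
The Juniper procedure above (walk FruType, keep the instances that return 6, 7 or 13, then build the name, status and value OIDs from each instance suffix) can be sketched as follows. This is an illustration under stated assumptions, not OpManager's implementation: the function name is hypothetical, the walk result is passed in as a plain dictionary, and the instance suffixes used in the example are made up.

```python
from typing import Dict, List, Optional

FRU_TYPE_OID = "1.3.6.1.4.1.2636.3.1.15.1.6"
OPERATING_BASE = ".1.3.6.1.4.1.2636.3.1.13.1"
# FruType responses that OpManager monitors, as listed above.
MONITORED_FRU_TYPES = {6: "Temperature", 7: "Power", 13: "Fan"}


def juniper_oids_to_check(fru_walk: Dict[str, int]) -> List[Dict[str, Optional[str]]]:
    """Map a FruType walk result to the OIDs that must respond."""
    prefix = FRU_TYPE_OID + "."
    rows = []
    for oid, fru_type in fru_walk.items():
        metric = MONITORED_FRU_TYPES.get(fru_type)
        normalized = oid.lstrip(".")
        if metric is None or not normalized.startswith(prefix):
            continue  # not a monitored FRU type, or not a FruType instance
        instance = normalized[len(prefix):]
        rows.append({
            "metric": metric,
            "instance": instance,
            "name_oid": f"{OPERATING_BASE}.5.{instance}",
            "status_oid": f"{OPERATING_BASE}.6.{instance}",
            # Only temperature instances have a value OID in the table above.
            "value_oid": f"{OPERATING_BASE}.7.{instance}" if metric == "Temperature" else None,
        })
    return rows


if __name__ == "__main__":
    # Made-up instance suffixes standing in for A, B, C and D above.
    example_walk = {
        FRU_TYPE_OID + ".4.1.1.0": 13,
        FRU_TYPE_OID + ".9.1.0.0": 6,
        FRU_TYPE_OID + ".2.1.0.0": 7,
        FRU_TYPE_OID + ".1.1.0.0": 2,
    }
    for row in juniper_oids_to_check(example_walk):
        print(row)
```

Each returned row lists the OIDs to query for one instance. Instances whose FruType is not 6, 7 or 13 are skipped, as in the worked example.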
    Note:

    The following are the Hardware sensor status responses for devices from various supported vendors (N/A for VMware Hosts):

    HP: 1 - Unknown | 2 - Clear | 3 - Trouble | 4 - Critical

    Dell: 1 - Unknown | 2 - Unknown | 3 - Clear | 4 - Trouble | 5 - Critical | 6 - Service Down

    Cisco: 1 - Clear | 2 - Trouble | 3 - Critical | 4 - Service Down | 5 - Unknown | 6 - Unknown

    Cisco Nexus: 2 - Clear | 3 - Critical | 4 - Trouble (Any other response is considered as 'Unknown')

    Checkpoint: 1 - Clear | 2 - Trouble | 3 - Critical | 4 - Service Down | 5 - Unknown | 6 - Unknown

    Juniper: 1 - Unknown | 2 - Clear | 3 - Clear | 4 - Clear | 5 - Clear | 6 - Critical | 7 - Attention
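
When checking status OIDs by hand, the numeric responses can be translated with a small lookup table. The sketch below copies the mappings from the note above. The function name is hypothetical, and treating any unlisted code as 'Unknown' is stated above only for Cisco Nexus, so applying it to the other vendors is an assumption.

```python
from typing import Dict

# Status responses per vendor, copied from the note above.
STATUS_MAP: Dict[str, Dict[int, str]] = {
    "HP": {1: "Unknown", 2: "Clear", 3: "Trouble", 4: "Critical"},
    "Dell": {1: "Unknown", 2: "Unknown", 3: "Clear", 4: "Trouble",
             5: "Critical", 6: "Service Down"},
    "Cisco": {1: "Clear", 2: "Trouble", 3: "Critical", 4: "Service Down",
              5: "Unknown", 6: "Unknown"},
    "Cisco Nexus": {2: "Clear", 3: "Critical", 4: "Trouble"},
    "Checkpoint": {1: "Clear", 2: "Trouble", 3: "Critical", 4: "Service Down",
                   5: "Unknown", 6: "Unknown"},
    "Juniper": {1: "Unknown", 2: "Clear", 3: "Clear", 4: "Clear", 5: "Clear",
                6: "Critical", 7: "Attention"},
}


def sensor_status(vendor: str, code: int) -> str:
    """Translate a numeric status response into its severity label.

    Unlisted codes fall back to "Unknown" (an assumption for every
    vendor except Cisco Nexus, where the note above states it).
    """
    return STATUS_MAP[vendor].get(code, "Unknown")


if __name__ == "__main__":
    print(sensor_status("Dell", 3))
    print(sensor_status("Cisco Nexus", 1))
```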

7. Check if SNMP is installed:

SNMP must be enabled on the corresponding devices, since OpManager primarily uses SNMP to query device status and metrics. To install an SNMP agent on a Linux device, follow these steps.