Prerequisites for Hardware Monitoring

It is essential to monitor the hardware components of critical devices in your network to ensure continuous service availability and network uptime. OpManager, an advanced hardware monitoring solution, supports monitoring the hardware status of servers and network devices from vendors such as Cisco, Juniper, HP and Dell. It monitors important hardware parameters such as voltage, temperature, power, fan speed and processors via SNMP for network and server devices (including Hyper-V hosts), and via vSphere for VMware ESX/ESXi hosts. OpManager thus offers in-depth server and hardware monitoring for your network.

 

Prerequisites for HP/Dell Servers:

HP:

If hardware sensor monitors are not displayed, make sure the following agents are installed on the server:

  • HP Insight Server Agents
  • HP Insight Foundation Agents
  • HP Insight Storage Agents

Dell:

If hardware sensor monitors are not displayed, make sure Dell OpenManage is installed on the server.

Where are the hardware tabs?

If the hardware tabs are missing, follow the steps below:

1. If the device is a VMware ESX/ESXi host:

OpManager uses the hardwareStatusInfo and numericSensorInfo properties of the VMware API to poll the hardware status and statistics of devices in the VMware environment. To make sure hardware monitoring works properly, check whether sensor information is available in the Managed Object Browser (MOB) using the following links:

  • In case of ESX discovery:
    • For numericSensorInfo:

      https://<<hostname/IPAddress>>/mob/?moid=ha-host&doPath=runtime.healthSystemRuntime.systemHealthInfo.numericSensorInfo

    • For hardwareStatusInfo (cpuStatusInfo / memoryStatusInfo / storageStatusInfo):

      https://<<hostname/IPAddress>>/mob/?moid=ha-host&doPath=runtime.healthSystemRuntime.hardwareStatusInfo

  • In case of vCenter discovery:

    https://<<vCenterName/IPAddress>>/mob/?

    After logging in to the MOB, navigate to the paths below and check whether values are populated for both numericSensorInfo and hardwareStatusInfo:
    • For numericSensorInfo: content → rootFolder → childEntity → hostFolder → childEntity [select appropriate host] → host → runtime → healthSystemRuntime → systemHealthInfo → numericSensorInfo
    • For hardwareStatusInfo: content → rootFolder → childEntity → hostFolder → childEntity [select appropriate host] → host → runtime → healthSystemRuntime → hardwareStatusInfo → cpuStatusInfo (or) memoryStatusInfo (or) storageStatusInfo
    Note: OpManager raises alerts based on the reported colour value; an alert is raised if the colour is anything other than "green".

If the sensors are not available, install VMware tools on that host.
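The checks in step 1 can be scripted. The sketch below uses hypothetical helper names (`esx_mob_urls`, `raises_alert`) to build the two ESX/ESXi MOB links from a host name and to apply the documented colour rule. It only prepares the URLs for a browser or HTTP client; it does not log in to the MOB or call the vSphere API.

```python
# Property paths from the MOB links above (standalone ESX/ESXi discovery).
NUMERIC_SENSOR_PATH = "runtime.healthSystemRuntime.systemHealthInfo.numericSensorInfo"
HARDWARE_STATUS_PATH = "runtime.healthSystemRuntime.hardwareStatusInfo"


def esx_mob_urls(host: str) -> dict[str, str]:
    """Build the two MOB URLs used to check sensor data on an ESX/ESXi host."""
    base = f"https://{host}/mob/?moid=ha-host&doPath="
    return {
        "numericSensorInfo": base + NUMERIC_SENSOR_PATH,
        "hardwareStatusInfo": base + HARDWARE_STATUS_PATH,
    }


def raises_alert(colour: str) -> bool:
    """Documented rule: any colour other than "green" raises an alert."""
    return colour.strip().lower() != "green"


# Example usage with a hypothetical host name.
if __name__ == "__main__":
    for label, url in esx_mob_urls("esx01.example.com").items():
        print(label, url)
    print(raises_alert("green"), raises_alert("yellow"))  # False True
```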

2. If the device is HP/Dell/Cisco/Juniper:

Query the OIDs below and check whether the device responds to all of them. If it does, rediscover the device. If it does not, OpManager will not show the hardware tabs.

  • HP (Only servers):

    OID Parameter
    .1.3.6.1.4.1.232.11.2.2.1.0 Operating System
    .1.3.6.1.4.1.232.11.2.2.2.0 OS Version
    .1.3.6.1.4.1.232.2.2.4.2.0 Model
    .1.3.6.1.4.1.232.2.2.2.6.0 Service tag
    .1.3.6.1.4.1.232.2.2.2.1.0 Serial number
  • Dell:

    OID Parameter
    .1.3.6.1.4.1.674.10892.1.300.10.1.8.1 Manufacturer
    .1.3.6.1.4.1.674.10892.1.300.10.1.9.1 Model
    .1.3.6.1.4.1.674.10892.1.300.10.1.11.1 Service Tag
    .1.3.6.1.4.1.674.10892.1.400.10.1.6.1 Operating System
    .1.3.6.1.4.1.674.10892.1.400.10.1.7.1 OS Version
  • Cisco / HP switches:

    OID Parameter
    .1.3.6.1.2.1.47.1.1.1.1.13.1 Hardware Model
    .1.3.6.1.2.1.47.1.1.1.1.11.1 Serial Number
  • Juniper:

    OID Parameter
    .1.3.6.1.4.1.2636.3.1.2.0 Model
    .1.3.6.1.4.1.2636.3.1.3.0 Serial Number
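A minimal sketch of the check in step 2, assuming the Net-SNMP command-line tools and SNMP v2c with a community string (the `public` default is a placeholder). The helper names and vendor keys are hypothetical; the OIDs are the ones listed above. To run the check, pass the command to `subprocess.run(cmd, capture_output=True, text=True)` and hand its `stdout` to `all_oids_respond`.

```python
# Identification OIDs per vendor, copied from the tables above.
IDENTIFICATION_OIDS = {
    "hp_server": [
        ".1.3.6.1.4.1.232.11.2.2.1.0", ".1.3.6.1.4.1.232.11.2.2.2.0",
        ".1.3.6.1.4.1.232.2.2.4.2.0", ".1.3.6.1.4.1.232.2.2.2.6.0",
        ".1.3.6.1.4.1.232.2.2.2.1.0",
    ],
    "dell": [
        ".1.3.6.1.4.1.674.10892.1.300.10.1.8.1", ".1.3.6.1.4.1.674.10892.1.300.10.1.9.1",
        ".1.3.6.1.4.1.674.10892.1.300.10.1.11.1", ".1.3.6.1.4.1.674.10892.1.400.10.1.6.1",
        ".1.3.6.1.4.1.674.10892.1.400.10.1.7.1",
    ],
    "cisco_or_hp_switch": [".1.3.6.1.2.1.47.1.1.1.1.13.1", ".1.3.6.1.2.1.47.1.1.1.1.11.1"],
    "juniper": [".1.3.6.1.4.1.2636.3.1.2.0", ".1.3.6.1.4.1.2636.3.1.3.0"],
}

# With SNMP v2c, Net-SNMP prints one of these instead of a value for a missing OID.
MISSING_MARKERS = ("No Such Object", "No Such Instance")


def snmpget_command(host: str, vendor: str, community: str = "public") -> list[str]:
    """Build an snmpget command line that queries all OIDs for one vendor."""
    # -On prints numeric OIDs so the output lines are easy to match.
    return ["snmpget", "-v2c", "-c", community, "-On", host, *IDENTIFICATION_OIDS[vendor]]


def all_oids_respond(output: str, vendor: str) -> bool:
    """True if the snmpget output has a real value for every queried OID."""
    lines = [line for line in output.splitlines() if line.strip()]
    if len(lines) < len(IDENTIFICATION_OIDS[vendor]):
        return False  # e.g. a timeout leaves stdout empty
    return not any(marker in line for line in lines for marker in MISSING_MARKERS)
```

If `all_oids_respond` returns True, rediscover the device; otherwise the hardware tabs will not appear.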

3. Check whether Hardware monitoring is enabled under Settings → Monitoring → Monitor Settings → Hardware.

4. Check whether hardware monitoring is enabled for the individual device under Device Snapshot → Hardware tab (navigate to Inventory → Devices and click a device to open its snapshot page).

5. Suppress Hardware Alarms:

  1. Check if the hardware alarms for the respective devices have been suppressed in OpManager.
  2. To suppress hardware alarms for all devices, go to Settings → Monitoring → Monitor Settings → Hardware tab and click Suppress Alarms under the Hardware section.
  3. To suppress the hardware alarms for a particular device, go to the Hardware tab on its Device Snapshot page.

6. If the hardware status is not updated:

For OpManager to monitor the hardware of your devices, check whether the following OIDs respond correctly.

  • For Cisco devices:

    Supported MIBs: CISCO-ENVMON-MIB | ENTITY-MIB
    (All Cisco devices that use these MIBs can be monitored using OpManager)

    .1.3.6.1.2.1.47.1.1.1.1.13.1 - HW_MODEL
    .1.3.6.1.2.1.47.1.1.1.1.11.1 - HW Serial num

    Metric type | OID of metric name | OID of metric status | OID of metric value
    Temperature | .1.3.6.1.4.1.9.9.13.1.3.1.2 (TemperatureStatusDescr) | .1.3.6.1.4.1.9.9.13.1.3.1.6 (TemperatureState) | .1.3.6.1.4.1.9.9.13.1.3.1.3 (TemperatureStatusValue)
    Voltage | .1.3.6.1.4.1.9.9.13.1.2.1.2 (VoltageStatusDescr) | .1.3.6.1.4.1.9.9.13.1.2.1.7 (VoltageState) | .1.3.6.1.4.1.9.9.13.1.2.1.3 (VoltageStatusValue)
    Fan | .1.3.6.1.4.1.9.9.13.1.4.1.2 (FanStatusDescr) | .1.3.6.1.4.1.9.9.13.1.4.1.3 (FanState) | NA
    Power | .1.3.6.1.4.1.9.9.13.1.5.1.2 (SupplyStatusDescr) | .1.3.6.1.4.1.9.9.13.1.5.1.3 (SupplyState) | NA
  • For Cisco Nexus devices:

    Supported MIB: CISCO-ENTITY-FRU-CONTROL-MIB
    (All Cisco Nexus devices that use this MIB can be monitored using OpManager)

    Metric type | OID
    Power | .1.3.6.1.4.1.9.9.117.1.1.2.1.1 (FRUPowerAdminStatus), .1.3.6.1.4.1.9.9.117.1.1.2.1.2 (FRUPowerOperStatus), .1.3.6.1.4.1.9.9.117.1.1.2.1.3 (FRUCurrent)
    Fan | .1.3.6.1.4.1.9.9.117.1.4.1.1.1 (FanTrayOperStatus)

    Temperature in Cisco Nexus devices: temperature data comes from a different MIB, CISCO-ENTITY-SENSOR-MIB.

    To check if the temperature sensors are responding properly, follow these steps:

    1. Perform an SNMP walk on the following OID: .1.3.6.1.4.1.9.9.91.1.1.1.1.1 (entSensorType)
    2. In the list of responses, find the OIDs that responded with "Celsius(8)" and note them down. The suffix of each such OID is the instance ID of a temperature sensor. For example, consider that the OID .1.3.6.1.4.1.9.9.91.1.1.1.1.1.X has responded with "Celsius(8)".
    3. The instance ID X can now be used to query temperature-related data from the device:
      1. .1.3.6.1.2.1.47.1.1.1.1.7.X - entPhysicalName (ENTITY-MIB)
      2. .1.3.6.1.4.1.9.9.91.1.1.1.1.5.X - entSensorStatus (CISCO-ENTITY-SENSOR-MIB)
      3. .1.3.6.1.4.1.9.9.91.1.1.1.1.4.X - entSensorValue (CISCO-ENTITY-SENSOR-MIB)
    4. Example:
        1. A walk is performed on .1.3.6.1.4.1.9.9.91.1.1.1.1.1 (entSensorType).
        2. The OID .1.3.6.1.4.1.9.9.91.1.1.1.1.1.A has responded with "Celsius(8)", so A is the instance ID.
        3. This instance ID can now be used to get the corresponding instance's data from the device:

      OID | Description | MIB | Obtained response
      .1.3.6.1.2.1.47.1.1.1.1.7.A | entPhysicalName | ENTITY-MIB | module-1 FRONT
      .1.3.6.1.4.1.9.9.91.1.1.1.1.5.A | entSensorStatus | CISCO-ENTITY-SENSOR-MIB | ok(1)
      .1.3.6.1.4.1.9.9.91.1.1.1.1.4.A | entSensorValue | CISCO-ENTITY-SENSOR-MIB | 37
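    The Nexus temperature procedure can be expressed as a short sketch. It assumes the walk results have already been collected into a dictionary mapping each full OID to its integer value (for example, by parsing `snmpwalk -On` output); the function names and instance numbers are hypothetical.

```python
ENT_SENSOR_TYPE = ".1.3.6.1.4.1.9.9.91.1.1.1.1.1"    # entSensorType
ENT_PHYSICAL_NAME = ".1.3.6.1.2.1.47.1.1.1.1.7"      # entPhysicalName (ENTITY-MIB)
ENT_SENSOR_STATUS = ".1.3.6.1.4.1.9.9.91.1.1.1.1.5"  # entSensorStatus
ENT_SENSOR_VALUE = ".1.3.6.1.4.1.9.9.91.1.1.1.1.4"   # entSensorValue
CELSIUS = 8                                          # "celsius(8)"


def temperature_instances(walk: dict[str, int]) -> list[str]:
    """Return the instance IDs whose entSensorType value is celsius(8)."""
    prefix = ENT_SENSOR_TYPE + "."
    return [oid[len(prefix):] for oid, value in walk.items()
            if oid.startswith(prefix) and value == CELSIUS]


def temperature_oids(instance: str) -> dict[str, str]:
    """OIDs to query for one temperature sensor instance."""
    return {
        "name": f"{ENT_PHYSICAL_NAME}.{instance}",
        "status": f"{ENT_SENSOR_STATUS}.{instance}",
        "value": f"{ENT_SENSOR_VALUE}.{instance}",
    }


if __name__ == "__main__":
    # Hypothetical walk result: one celsius sensor and one truthvalue(12) sensor.
    walk = {ENT_SENSOR_TYPE + ".21590": 8, ENT_SENSOR_TYPE + ".21591": 12}
    for instance in temperature_instances(walk):
        print(instance, temperature_oids(instance))
```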
  • For Cisco ASA devices:

    Supported MIBs: ENTITY-MIB, ENTITY-SENSOR-MIB

    For Cisco ASA devices, OpManager primarily monitors Power, Temperature and Fan speed. The type OID (.1.3.6.1.2.1.99.1.1.1.1) returns the type of the sensor and the following are the responses for the corresponding types:

    Temperature - 8 | Power - 6 | Fan - 10

    The instance ID X obtained from walking the type OID (.1.3.6.1.2.1.99.1.1.1.1.X) can then be used to get the values of the other hardware metrics:

    • .1.3.6.1.2.1.47.1.1.1.1.7.X - Name
    • .1.3.6.1.2.1.99.1.1.1.5.X - Status
    • .1.3.6.1.2.1.99.1.1.1.4.X - Value

    For example, consider an SNMP walk performed on a Cisco ASA device on the type OID (.1.3.6.1.2.1.99.1.1.1.1) that returns the following responses:

    .1.3.6.1.2.1.99.1.1.1.1.A → 10
    .1.3.6.1.2.1.99.1.1.1.1.B → 2
    .1.3.6.1.2.1.99.1.1.1.1.C → 6
    .1.3.6.1.2.1.99.1.1.1.1.D → 8
    .1.3.6.1.2.1.99.1.1.1.1.E → 6

    Since OpManager's hardware monitoring supports only Fan, Power and Temperature sensors on Cisco ASA devices, note the instance IDs that returned 6, 8 or 10. These are the instances that must be queried to retrieve the data: OpManager queries each of them for its name, status and value. To perform hardware monitoring on the given Cisco ASA device, the following OIDs must respond when queried:

    Response | Metric type | Instance ID | OID of metric identifier | OID of metric status | OID of metric value
    6 | Power | C | .1.3.6.1.2.1.47.1.1.1.1.7.C | .1.3.6.1.2.1.99.1.1.1.5.C | .1.3.6.1.2.1.99.1.1.1.4.C
    6 | Power | E | .1.3.6.1.2.1.47.1.1.1.1.7.E | .1.3.6.1.2.1.99.1.1.1.5.E | .1.3.6.1.2.1.99.1.1.1.4.E
    8 | Temperature | D | .1.3.6.1.2.1.47.1.1.1.1.7.D | .1.3.6.1.2.1.99.1.1.1.5.D | .1.3.6.1.2.1.99.1.1.1.4.D
    10 | Fan | A | .1.3.6.1.2.1.47.1.1.1.1.7.A | .1.3.6.1.2.1.99.1.1.1.5.A | .1.3.6.1.2.1.99.1.1.1.4.A
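    The ASA selection logic can be sketched as follows, assuming the walk of the type OID has already been reduced to a dictionary of instance ID to sensor type. `sensor_rows` and `SensorRow` are hypothetical names; running the example walk from the text through it yields the same four rows as the table above.

```python
from typing import NamedTuple

NAME_OID = ".1.3.6.1.2.1.47.1.1.1.1.7"    # entPhysicalName (ENTITY-MIB)
STATUS_OID = ".1.3.6.1.2.1.99.1.1.1.5"    # entPhySensorOperStatus (ENTITY-SENSOR-MIB)
VALUE_OID = ".1.3.6.1.2.1.99.1.1.1.4"     # entPhySensorValue (ENTITY-SENSOR-MIB)

# Sensor types monitored on Cisco ASA devices; anything else is skipped.
MONITORED_TYPES = {6: "Power", 8: "Temperature", 10: "Fan"}


class SensorRow(NamedTuple):
    response: int
    metric: str
    instance: str
    name_oid: str
    status_oid: str
    value_oid: str


def sensor_rows(type_walk: dict[str, int]) -> list[SensorRow]:
    """Turn {instance ID: type} from a walk of the type OID into OIDs to poll."""
    rows = []
    for instance, sensor_type in type_walk.items():
        metric = MONITORED_TYPES.get(sensor_type)
        if metric is None:  # e.g. type 2 ("unknown") is not monitored
            continue
        rows.append(SensorRow(sensor_type, metric, instance,
                              f"{NAME_OID}.{instance}",
                              f"{STATUS_OID}.{instance}",
                              f"{VALUE_OID}.{instance}"))
    # Sort by response, then instance, to match the table layout.
    return sorted(rows, key=lambda row: (row.response, row.instance))


if __name__ == "__main__":
    # The example walk from the text, with letters standing in for instance IDs.
    for row in sensor_rows({"A": 10, "B": 2, "C": 6, "D": 8, "E": 6}):
        print(row.response, row.metric, row.instance, row.status_oid)
```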
  • For Cisco ASR/ISR devices:

    Supported MIBs: ENTITY-MIB, CISCO-ENTITY-SENSOR-MIB

    For Cisco ASR/ISR devices, OpManager primarily monitors Power, Temperature and Fan speed. The type OID (.1.3.6.1.4.1.9.9.91.1.1.1.1.1) returns the type of the sensor, and the following are the responses for the corresponding types:

    Temperature - 8 | Power - 6 | Fan - 10

    The instance ID X obtained from walking the type OID (.1.3.6.1.4.1.9.9.91.1.1.1.1.1.X) can then be used to get the values of the other hardware metrics:

    • .1.3.6.1.2.1.47.1.1.1.1.7.X - Name
    • .1.3.6.1.4.1.9.9.91.1.1.1.1.5.X - Status
    • .1.3.6.1.4.1.9.9.91.1.1.1.1.4.X - Value

    For example, consider an SNMP walk performed on a Cisco ASR/ISR device on the type OID (.1.3.6.1.4.1.9.9.91.1.1.1.1.1) that returns the following responses:

    .1.3.6.1.4.1.9.9.91.1.1.1.1.1.A → 10
    .1.3.6.1.4.1.9.9.91.1.1.1.1.1.B → 2
    .1.3.6.1.4.1.9.9.91.1.1.1.1.1.C → 6
    .1.3.6.1.4.1.9.9.91.1.1.1.1.1.D → 8
    .1.3.6.1.4.1.9.9.91.1.1.1.1.1.E → 6

    To perform hardware monitoring on the given Cisco ASR/ISR device, the following OIDs must respond when queried:

    Response | Metric type | Instance ID | OID of metric identifier | OID of metric status | OID of metric value
    6 | Power | C | .1.3.6.1.2.1.47.1.1.1.1.7.C | .1.3.6.1.4.1.9.9.91.1.1.1.1.5.C | .1.3.6.1.4.1.9.9.91.1.1.1.1.4.C
    6 | Power | E | .1.3.6.1.2.1.47.1.1.1.1.7.E | .1.3.6.1.4.1.9.9.91.1.1.1.1.5.E | .1.3.6.1.4.1.9.9.91.1.1.1.1.4.E
    8 | Temperature | D | .1.3.6.1.2.1.47.1.1.1.1.7.D | .1.3.6.1.4.1.9.9.91.1.1.1.1.5.D | .1.3.6.1.4.1.9.9.91.1.1.1.1.4.D
    10 | Fan | A | .1.3.6.1.2.1.47.1.1.1.1.7.A | .1.3.6.1.4.1.9.9.91.1.1.1.1.5.A | .1.3.6.1.4.1.9.9.91.1.1.1.1.4.A
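    Before instances can be selected, the raw walk output has to be parsed. A minimal sketch, assuming Net-SNMP's `snmpwalk -On` output format (one `OID = INTEGER: value` line per instance; with the MIB loaded the value may print as, for example, `celsius(8)`). The function names are hypothetical. The input could come from `subprocess.run(["snmpwalk", "-v2c", "-c", community, "-On", host, TYPE_OID], capture_output=True, text=True).stdout`.

```python
import re

TYPE_OID = ".1.3.6.1.4.1.9.9.91.1.1.1.1.1"  # entSensorType (CISCO-ENTITY-SENSOR-MIB)
MONITORED_TYPES = {6: "Power", 8: "Temperature", 10: "Fan"}

# Matches Net-SNMP lines such as
#   .1.3.6.1.4.1.9.9.91.1.1.1.1.1.1006 = INTEGER: 8
#   .1.3.6.1.4.1.9.9.91.1.1.1.1.1.1006 = INTEGER: celsius(8)   (MIB loaded)
LINE_RE = re.compile(r"^(\.?[\d.]+)\s*=\s*INTEGER:\s*(?:\w+\()?(-?\d+)\)?\s*$")


def parse_integer_walk(output: str, base_oid: str) -> dict[str, int]:
    """Map instance ID -> integer value for lines that fall under base_oid."""
    base = "." + base_oid.lstrip(".") + "."
    result = {}
    for line in output.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # blank lines and non-INTEGER values are ignored
        oid = "." + match.group(1).lstrip(".")
        if oid.startswith(base):
            result[oid[len(base):]] = int(match.group(2))
    return result


def monitored_instances(output: str) -> dict[str, str]:
    """Keep only Power, Temperature and Fan instances from an entSensorType walk."""
    types = parse_integer_walk(output, TYPE_OID)
    return {inst: MONITORED_TYPES[t] for inst, t in types.items() if t in MONITORED_TYPES}
```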
  • For HP switches:

    Data is retrieved from HP switches in one of the following two ways:

    Flow - I) Using HP-ICF-CHASSIS MIB

    Cisco devices and HP switches use the same OIDs for static information such as serial number and model.

    Supported MIB: HP-ICF-CHASSIS (provides only the state of each sensor, not its values, so no graphs can be plotted)

    For HP switches, OpManager primarily monitors Power, Temperature and Fan speed. The type OID (.1.3.6.1.4.1.11.2.14.11.1.2.6.1.2) returns the type of the sensor, and the following are the values contained in the responses for the corresponding types:

    "icfTemperatureSensor" or "2.3.7.8.3.3" - Temperature
    "icfPowerSupplySensor" or "2.3.7.8.3.1" - Power
    "icfFanSensor" or "2.3.7.8.3.2" - Fan

    The instance ID X obtained from walking the type OID (.1.3.6.1.4.1.11.2.14.11.1.2.6.1.2.X) can then be used to get the values of the other hardware metrics:

    • .1.3.6.1.4.1.11.2.14.11.1.2.6.1.7.X - Name
    • .1.3.6.1.4.1.11.2.14.11.1.2.6.1.4.X - Status

    For example, consider an SNMP walk performed on an HP switch on the type OID (.1.3.6.1.4.1.11.2.14.11.1.2.6.1.2) whose responses contain the following values:

    .1.3.6.1.4.1.11.2.14.11.1.2.6.1.2.A → .1.3.6.1.4.1.11.2.3.7.8.3.2
    .1.3.6.1.4.1.11.2.14.11.1.2.6.1.2.B → .1.3.6.1.4.1.11.2.3.7.8.3.3
    .1.3.6.1.4.1.11.2.14.11.1.2.6.1.2.C → .1.3.6.1.4.1.11.2.3.7.8.3.1

    The instance IDs that returned the above responses are noted and queried in order to retrieve the data. OpManager allows you to query the instance IDs to get the name and status for each instance. To perform hardware monitoring on the given HP switch, the following OIDs must respond when queried:

    Response contains | Metric type | Instance ID | OID of metric identifier | OID of metric status
    "icfPowerSupplySensor" or "2.3.7.8.3.1" | Power | C | .1.3.6.1.4.1.11.2.14.11.1.2.6.1.7.C | .1.3.6.1.4.1.11.2.14.11.1.2.6.1.4.C
    "icfTemperatureSensor" or "2.3.7.8.3.3" | Temperature | B | .1.3.6.1.4.1.11.2.14.11.1.2.6.1.7.B | .1.3.6.1.4.1.11.2.14.11.1.2.6.1.4.B
    "icfFanSensor" or "2.3.7.8.3.2" | Fan | A | .1.3.6.1.4.1.11.2.14.11.1.2.6.1.7.A | .1.3.6.1.4.1.11.2.14.11.1.2.6.1.4.A

    Flow - II) Using OIDs from different MIBs for each sensor category:

    Supported MIBs: FAN-MIB, HP-ICF-CHASSIS-MIB (hpSystemAirTempEntry tree), POWERSUPPLY-MIB

    Metric type | OID of metric name | OID of metric status | OID of metric value
    Power | .1.3.6.1.4.1.11.2.14.11.5.1.55.1.1.1.5 | .1.3.6.1.4.1.11.2.14.11.5.1.55.1.1.1.2 | .1.3.6.1.4.1.11.2.14.11.5.1.55.1.1.1.6
    Temperature | .1.3.6.1.4.1.11.2.14.11.1.2.8.1.1.2 | .1.3.6.1.4.1.11.2.14.11.1.2.8.1.1.6 | .1.3.6.1.4.1.11.2.14.11.1.2.8.1.1.3
    Fan | .1.3.6.1.4.1.11.2.14.11.5.1.54.2.1.1.3 | .1.3.6.1.4.1.11.2.14.11.5.1.54.2.1.1.4 | NA
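    For Flow I, the step of matching each type-OID response to a sensor category can be sketched as follows. It assumes the walk has been reduced to a dictionary of instance ID to response string; `classify` and `hp_switch_oids` are hypothetical names. The numeric form is matched only at the end of the response so that, for instance, a hypothetical `...8.3.10` is not mistaken for `...8.3.1`.

```python
from typing import Optional

NAME_OID = ".1.3.6.1.4.1.11.2.14.11.1.2.6.1.7"    # sensor name (Flow I)
STATUS_OID = ".1.3.6.1.4.1.11.2.14.11.1.2.6.1.4"  # sensor status (Flow I)

# Symbolic name and numeric suffix that identify each sensor category.
CATEGORY_MARKERS = {
    "Power": ("icfPowerSupplySensor", "2.3.7.8.3.1"),
    "Temperature": ("icfTemperatureSensor", "2.3.7.8.3.3"),
    "Fan": ("icfFanSensor", "2.3.7.8.3.2"),
}


def classify(response: str) -> Optional[str]:
    """Return the metric type whose marker appears in a type-OID response."""
    value = response.strip()
    for metric, (name, numeric) in CATEGORY_MARKERS.items():
        # The numeric form must end the OID, so ...8.3.1 does not match ...8.3.10.
        if name in value or value == numeric or value.endswith("." + numeric):
            return metric
    return None


def hp_switch_oids(type_walk: dict[str, str]) -> dict[str, dict[str, str]]:
    """Map instance ID -> metric type plus the name and status OIDs to query."""
    result = {}
    for instance, response in type_walk.items():
        metric = classify(response)
        if metric is not None:
            result[instance] = {
                "metric": metric,
                "name": f"{NAME_OID}.{instance}",
                "status": f"{STATUS_OID}.{instance}",
            }
    return result
```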
  • For Checkpoint devices:

    Supported MIB: CHECKPOINT-MIB
    (All Checkpoint devices that use this MIB can be monitored using OpManager)

    Metric type | OID of metric name | OID of metric status | OID of metric value
    Voltage | .1.3.6.1.4.1.2620.1.6.7.8.3.1.2 (voltageSensorName) | .1.3.6.1.4.1.2620.1.6.7.8.3.1.6 (voltageSensorStatus) | .1.3.6.1.4.1.2620.1.6.7.8.3.1.3 (voltageSensorValue)
    Fan | .1.3.6.1.4.1.2620.1.6.7.8.2.1.2 (fanSpeedSensorName) | .1.3.6.1.4.1.2620.1.6.7.8.2.1.6 (fanSpeedSensorStatus) | .1.3.6.1.4.1.2620.1.6.7.8.2.1.3 (fanSpeedSensorValue)
    Temperature | .1.3.6.1.4.1.2620.1.6.7.8.1.1.2 (tempertureSensorName) | .1.3.6.1.4.1.2620.1.6.7.8.1.1.6 (tempertureSensorStatus) | .1.3.6.1.4.1.2620.1.6.7.8.1.1.3 (tempertureSensorValue)
  • For HP servers:

    Supported MIBs: CPQHOST-MIB | CPQHLTH-MIB | CPQSINFO-MIB
    (All HP servers that use these MIBs can be monitored using OpManager)

    Metric type | OID of metric name | OID of metric status | OID of metric value
    Temperature | .1.3.6.1.4.1.232.6.2.6.8.1.8 (TemperatureHwLocation) or .1.3.6.1.4.1.232.6.2.6.8.1.3 (TemperatureLocale) | .1.3.6.1.4.1.232.6.2.6.8.1.6 | .1.3.6.1.4.1.232.6.2.6.8.1.4
    Fan | .1.3.6.1.4.1.232.6.2.6.7.1.11 (FanHwLocation) or .1.3.6.1.4.1.232.6.2.6.7.1.3 (FanLocale) | .1.3.6.1.4.1.232.6.2.6.7.1.9 (FanCondition) | .1.3.6.1.4.1.232.6.2.6.7.1.12 (FanCurrentSpeed)
    Processors | .1.3.6.1.4.1.232.1.2.2.1.1.3 (CpuName) | .1.3.6.1.4.1.232.1.2.2.1.1.6 (CpuStatus) | .1.3.6.1.4.1.232.1.2.2.1.1.4 (CpuSpeed)
    Power | .1.3.6.1.4.1.232.6.2.9.3.1.11 (PowerSupplySerialNumber) | .1.3.6.1.4.1.232.6.2.9.3.1.4 (PowerSupplyCondition) | .1.3.6.1.4.1.232.6.2.9.3.1.8 (PowerSupplyCapacityMaximum)
    Partition details | .1.3.6.1.4.1.232.11.2.4.1.1.2 (FileSysDesc) | .1.3.6.1.4.1.232.11.2.4.1.1.8 (FileSysStatus) | .1.3.6.1.4.1.232.11.2.4.1.1.5 (FileSysPercentSpaceUsed)
    Memory | .1.3.6.1.4.1.232.6.2.14.12.1.3 (BoardCpuNum) | .1.3.6.1.4.1.232.6.2.14.12.1.11 (BoardCondition) | .1.3.6.1.4.1.232.6.2.14.12.1.9 (BoardOsMemSize)
  • For Dell servers:

    Supported MIBs: DELL-RAC-MIB | StorageManagement-MIB.mib | MIB-Dell-10892.mib
    (All Dell servers that use these MIBs can be monitored using OpManager)

    Metric type | OID of metric name | OID of metric status | OID of metric value
    Temperature | .1.3.6.1.4.1.674.10892.1.700.20.1.8 (ProbeLocationName) | .1.3.6.1.4.1.674.10892.1.700.20.1.5 (ProbeStatus) | .1.3.6.1.4.1.674.10892.1.700.20.1.6 (ProbeReading)
    Fan | .1.3.6.1.4.1.674.10892.1.700.12.1.8 (DeviceLocationName) | .1.3.6.1.4.1.674.10892.1.700.12.1.5 (DeviceStatus) | .1.3.6.1.4.1.674.10892.1.700.12.1.6 (DeviceReading)
    Processors | .1.3.6.1.4.1.674.10892.1.1100.30.1.23 (DeviceBrandName) | .1.3.6.1.4.1.674.10892.1.1100.30.1.5 (DeviceStatus) | .1.3.6.1.4.1.674.10892.1.1100.30.1.11 (DeviceMaximumSpeed)
    Power | .1.3.6.1.4.1.674.10892.1.600.60.1.6 (EntityName) | .1.3.6.1.4.1.674.10892.1.600.60.1.5 (Status) | .1.3.6.1.4.1.674.10892.1.600.60.1.9 (PeakWatts)
    Voltage | .1.3.6.1.4.1.674.10892.1.600.20.1.8 (ProbeLocationName) | .1.3.6.1.4.1.674.10892.1.600.20.1.5 (ProbeStatus) | .1.3.6.1.4.1.674.10892.1.600.20.1.6 (ProbeReading)
    Disk Array Data | .1.3.6.1.4.1.674.10893.1.20.130.4.1.2 (arrayDiskName) | .1.3.6.1.4.1.674.10893.1.20.130.4.1.4 (arrayDiskStatus) | .1.3.6.1.4.1.674.10893.1.20.130.4.1.17 (arrayDiskUsedSpaceInMB)
    Battery | .1.3.6.1.4.1.674.10892.1.600.50.1.7 (LocationName) | .1.3.6.1.4.1.674.10892.1.600.50.1.5 (Status) | .1.3.6.1.4.1.674.10892.1.600.50.1.4 (StateSettings)
  • For Juniper devices:

    Supported MIB: JUNIPER-MIB
    (All Juniper devices that use this MIB can be monitored using OpManager)

    • For Juniper devices, performing a walk on the OID 1.3.6.1.4.1.2636.3.1.15.1.6 (jnxFruType) returns a list of all hardware components, or Field-Replaceable Units (FRUs), present in the device. OpManager primarily monitors Power, Temperature and Fan speed, and these are the responses for the corresponding FRU types:

      Temperature - 6 | Power - 7 | Fan - 13

    • The instances that respond with these values are noted, and the suffix for the instance can be used to obtain data for that FRU.

      For example, consider an SNMP walk being performed on a Juniper device, on the FruType OID (1.3.6.1.4.1.2636.3.1.15.1.6) and it returns the following response:

      1.3.6.1.4.1.2636.3.1.15.1.6.A → 13
      1.3.6.1.4.1.2636.3.1.15.1.6.B → 6
      1.3.6.1.4.1.2636.3.1.15.1.6.C → 7
      1.3.6.1.4.1.2636.3.1.15.1.6.D → 2
      1.3.6.1.4.1.2636.3.1.15.1.6.E → 6

      Note: The values of A, B, C, D and E can consist of one to four sub-identifiers, i.e., they can take the form 'z', 'z.y', 'z.y.x' or 'z.y.x.w'.

    • Next, take the instances that returned 6, 7 or 13 and note their instance IDs. Here, A, B, C and E returned the required responses, so these are the instances OpManager must be able to query to perform hardware monitoring on the device.

    • With the instance IDs known, check whether the required parameters can be queried for each instance.
      OpManager queries the name, status and value of each instance. To perform hardware monitoring on the given Juniper device, the following OIDs must respond when queried:

      Response for FruType | Metric type | Instance ID | OID of metric identifier (OperatingDescr) | OID of metric status (OperatingState) | OID of metric value (OperatingTemp)
      6 | Temperature | B | .1.3.6.1.4.1.2636.3.1.13.1.5.B | .1.3.6.1.4.1.2636.3.1.13.1.6.B | .1.3.6.1.4.1.2636.3.1.13.1.7.B
      6 | Temperature | E | .1.3.6.1.4.1.2636.3.1.13.1.5.E | .1.3.6.1.4.1.2636.3.1.13.1.6.E | .1.3.6.1.4.1.2636.3.1.13.1.7.E
      7 | Power | C | .1.3.6.1.4.1.2636.3.1.13.1.5.C | .1.3.6.1.4.1.2636.3.1.13.1.6.C | NA
      13 | Fan | A | .1.3.6.1.4.1.2636.3.1.13.1.5.A | .1.3.6.1.4.1.2636.3.1.13.1.6.A | NA
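    The Juniper procedure differs from the Cisco ones in two details: the instance suffix can have up to four parts, and only temperature FRUs have a value OID. A minimal sketch under the assumption that the jnxFruType walk is available as a dictionary of full OID to integer; `juniper_poll_plan` is a hypothetical name, and `None` stands for "NA".

```python
FRU_TYPE_OID = "1.3.6.1.4.1.2636.3.1.15.1.6"        # jnxFruType
OPERATING_DESCR = ".1.3.6.1.4.1.2636.3.1.13.1.5"    # metric identifier (name)
OPERATING_STATE = ".1.3.6.1.4.1.2636.3.1.13.1.6"    # metric status
OPERATING_TEMP = ".1.3.6.1.4.1.2636.3.1.13.1.7"     # metric value (temperature only)

FRU_TYPES = {6: "Temperature", 7: "Power", 13: "Fan"}


def juniper_poll_plan(fru_walk: dict[str, int]) -> list[dict]:
    """Build the OIDs to query from a walk of jnxFruType.

    Keys are full OIDs, with or without a leading dot; the instance suffix
    after the jnxFruType OID may have one to four parts.
    """
    prefix = FRU_TYPE_OID + "."
    plan = []
    for oid, fru_type in fru_walk.items():
        oid = oid.lstrip(".")
        if not oid.startswith(prefix) or fru_type not in FRU_TYPES:
            continue
        instance = oid[len(prefix):]  # e.g. "9.1.0.0"
        metric = FRU_TYPES[fru_type]
        plan.append({
            "metric": metric,
            "instance": instance,
            "name": f"{OPERATING_DESCR}.{instance}",
            "status": f"{OPERATING_STATE}.{instance}",
            # Power and Fan have no value OID ("NA" in the table).
            "value": f"{OPERATING_TEMP}.{instance}" if metric == "Temperature" else None,
        })
    return plan
```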
  • For Supermicro devices (supported from OpManager v12.5.216):

    Supported MIB: SUPERMICRO-SSM-MIB

    Prerequisite: Supermicro's SuperDoctor agent must be installed for OpManager to monitor hardware metrics.

    Hardware Manufacturer - .1.3.6.1.4.1.10876.100.1.6.1.10.1
    OS - .1.3.6.1.4.1.10876.100.1.7.1.6.1
    OS Version - .1.3.6.1.4.1.10876.100.1.7.1.7.1

    • For Supermicro devices, the process is similar to the one described above for Juniper devices.
    • First, perform an SNMP walk on the OID .1.3.6.1.4.1.10876.2.1.1.1.1.3 (smHealthMonitorType) and note the OIDs that return any of these responses:

      0 - Fan | 1 - Voltage | 2 - Temperature | 8 - Power

    • The instance ID X from each OID that returned one of these responses (.1.3.6.1.4.1.10876.2.1.1.1.1.3.X) can then be used to get the values of that hardware metric:
      • .1.3.6.1.4.1.10876.2.1.1.1.1.2.X - smHealthMonitorName - Name
      • .1.3.6.1.4.1.10876.2.1.1.1.1.4.X - smHealthMonitorReading - Value
      • .1.3.6.1.4.1.10876.2.1.1.1.1.10.X - smHealthMonitorMonitor - Status
      • .1.3.6.1.4.1.10876.2.1.1.1.1.5.X - smHealthMonitorHighLimit - Max threshold
      • .1.3.6.1.4.1.10876.2.1.1.1.1.6.X - smHealthMonitorLowLimit - Min threshold
    Example:
    • Consider an SNMP walk performed on the smHealthMonitorType OID (.1.3.6.1.4.1.10876.2.1.1.1.1.3). The following responses are received:
      • .1.3.6.1.4.1.10876.2.1.1.1.1.3.A → 0
      • .1.3.6.1.4.1.10876.2.1.1.1.1.3.B → 8
      • .1.3.6.1.4.1.10876.2.1.1.1.1.3.C → 7
      • .1.3.6.1.4.1.10876.2.1.1.1.1.3.D → 2
      • .1.3.6.1.4.1.10876.2.1.1.1.1.3.E → 1
    • The OIDs that responded with either 0 (Fan), 1 (Voltage), 2 (Temperature) or 8 (Power) are taken, and their instance IDs are noted. In this case, the instances are A (for Fan), B (for Power), D (for Temperature) and E (for Voltage).
    • Now these instance IDs can be used to poll the related information for that sensor from the device.

      Response / Metric type / Instance ID | OID of metric name | OID of metric value | OID of metric status | OID of metric's Max threshold | OID of metric's Min threshold
      0 / Fan / A | .1.3.6.1.4.1.10876.2.1.1.1.1.2.A | .1.3.6.1.4.1.10876.2.1.1.1.1.4.A | .1.3.6.1.4.1.10876.2.1.1.1.1.10.A | .1.3.6.1.4.1.10876.2.1.1.1.1.5.A | .1.3.6.1.4.1.10876.2.1.1.1.1.6.A
      8 / Power / B | .1.3.6.1.4.1.10876.2.1.1.1.1.2.B | .1.3.6.1.4.1.10876.2.1.1.1.1.4.B | .1.3.6.1.4.1.10876.2.1.1.1.1.10.B | .1.3.6.1.4.1.10876.2.1.1.1.1.5.B | .1.3.6.1.4.1.10876.2.1.1.1.1.6.B
      2 / Temperature / D | .1.3.6.1.4.1.10876.2.1.1.1.1.2.D | .1.3.6.1.4.1.10876.2.1.1.1.1.4.D | .1.3.6.1.4.1.10876.2.1.1.1.1.10.D | .1.3.6.1.4.1.10876.2.1.1.1.1.5.D | .1.3.6.1.4.1.10876.2.1.1.1.1.6.D
      1 / Voltage / E | .1.3.6.1.4.1.10876.2.1.1.1.1.2.E | .1.3.6.1.4.1.10876.2.1.1.1.1.4.E | .1.3.6.1.4.1.10876.2.1.1.1.1.10.E | .1.3.6.1.4.1.10876.2.1.1.1.1.5.E | .1.3.6.1.4.1.10876.2.1.1.1.1.6.E

      For Power and Voltage, OpManager divides the obtained values by 1000 to display the correct values.
    • The status metric usually returns only one of two values, 1 (Manage/Clear) or 2 (Unmanaged/Unknown), so OpManager cannot tell from it alone whether a sensor is critical. To display a critical status, OpManager uses the Max Threshold and Min Threshold values to determine whether a reading is abnormal. The threshold violation criteria for the different sensor types are as follows:
      1. Fan: If the status is 1 (Manage) AND the fan sensor value is less than the Minimum Threshold Value, the status is considered Critical. For example, if FV is the current value of the fan:

        if (smHealthMonitorMonitor == 1 && (FV < smHealthMonitorLowLimit) )
        {
          Status = "Critical"
        }
        else
        {
          Status = "Clear"
        }

      2. Temperature: If the status is 1 (Manage) AND the temperature sensor value is greater than the Maximum Threshold Value, the status is considered Critical. For example, if TV is the current value of the temperature:

        if (smHealthMonitorMonitor == 1 && (TV > smHealthMonitorHighLimit) )
        {
          Status = "Critical"
        }
        else
        {
          Status = "Clear"
        }

      3. Voltage and power: If the status is 1 (Manage) AND the sensor value is less than the Minimum Threshold Value OR greater than the Maximum Threshold Value, the status is considered Critical. For example, if PV is the current value of power/voltage:

        if (smHealthMonitorMonitor == 1 && ((PV < smHealthMonitorLowLimit) || (PV > smHealthMonitorHighLimit)) )
        {
          Status = "Critical"
        }
        else
        {
          Status = "Clear"
        }
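    The three threshold rules can be combined into one runnable sketch. It assumes the reading and both limits are raw agent values in the same units, so the comparison is made before the divide-by-1000 display scaling for Power and Voltage; that ordering, the hypothetical function names, and treating a missing limit (`None`) as never violated are assumptions, not documented OpManager behaviour.

```python
from typing import Optional

MANAGED = 1                            # smHealthMonitorMonitor: 1 = Manage/Clear
SCALED_METRICS = {"Power", "Voltage"}  # displayed after dividing by 1000


def supermicro_status(metric: str, monitor: int, reading: float,
                      low: Optional[float], high: Optional[float]) -> str:
    """Apply the Fan/Temperature/Voltage-Power threshold rules to one instance.

    reading, low and high are the raw smHealthMonitorReading,
    smHealthMonitorLowLimit and smHealthMonitorHighLimit values; a missing
    limit is passed as None and never triggers Critical (an assumption).
    """
    below = low is not None and reading < low
    above = high is not None and reading > high
    if metric == "Fan":
        violated = below
    elif metric == "Temperature":
        violated = above
    elif metric in SCALED_METRICS:
        violated = below or above
    else:
        raise ValueError(f"unsupported metric: {metric}")
    # As in the pseudocode, anything other than a managed violation is Clear.
    return "Critical" if monitor == MANAGED and violated else "Clear"


def display_value(metric: str, reading: float) -> float:
    """Power and Voltage readings are divided by 1000 for display."""
    return reading / 1000 if metric in SCALED_METRICS else reading
```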

    Note:

    The following are the Hardware sensor status responses for devices from various supported vendors (N/A for VMware Hosts):

    HP: 1 - Unknown | 2 - Clear | 3 - Trouble | 4 - Critical

    Dell: 1 - Unknown | 2 - Unknown | 3 - Clear | 4 - Trouble | 5 - Critical | 6 - Service Down

    Cisco: 1 - Clear | 2 - Trouble | 3 - Critical | 4 - Service Down | 5 - Unknown | 6 - Unknown

    Cisco Nexus: 2 - Clear | 3 - Critical | 4 - Trouble (Any other response is considered as 'Unknown')

    Cisco Nexus (temperature): 1 - Clear | 2 - Attention (unavailable) | 3 - Critical (not operational) | Any other response is considered as 'Unknown'

    Cisco ASA/ ISR/ ASR: 1 - Clear | 2 - Trouble | 3 - Critical

    HP Switches: 

    Flow - I) 1 - Unknown | 2 - Critical | 3 - Attention | 4 - Clear | 5 - Unknown

    Flow - II) a) Fan: 0 - Critical | 1 - Unknown | 2 - Service Down | 4 - Attention | 6 - Trouble | Remaining values - Clear

    b) Temperature: 1 - Critical | 2 - Clear

    c) Power: 1 - Unknown | 2 - Unknown | 4 - Critical | 5 - Attention | 6 - Trouble | Remaining values - Clear

    Checkpoint: 1 - Clear | 2 - Trouble | 3 - Critical | 4 - Service Down | 5 - Unknown | 6 - Unknown

    Juniper: 1 - Unknown | 2 - Clear | 3 - Clear | 4 - Clear | 5 - Clear | 6 - Critical | 7 - Attention

    Supermicro: 1 - Manage/Clear | 2 - Unmanaged/Unknown status
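    The status tables above amount to per-vendor lookups. The sketch below transcribes a subset of them as a reading aid; the vendor keys and the `severity` helper are hypothetical. Where the note says how unlisted codes are treated (Cisco Nexus: Unknown; HP switch Flow II fan: Clear), that default is used; for the other vendors, falling back to "Unknown" is an assumption.

```python
# Status code -> severity tables transcribed from the note above (a subset).
# Each entry is (mapping, default for codes the note does not list).
STATUS_MAPS = {
    "HP": ({1: "Unknown", 2: "Clear", 3: "Trouble", 4: "Critical"}, "Unknown"),
    "Dell": ({1: "Unknown", 2: "Unknown", 3: "Clear", 4: "Trouble",
              5: "Critical", 6: "Service Down"}, "Unknown"),
    "Cisco": ({1: "Clear", 2: "Trouble", 3: "Critical", 4: "Service Down",
               5: "Unknown", 6: "Unknown"}, "Unknown"),
    # The note states that any other Nexus response is 'Unknown'.
    "Cisco Nexus": ({2: "Clear", 3: "Critical", 4: "Trouble"}, "Unknown"),
    "Juniper": ({1: "Unknown", 2: "Clear", 3: "Clear", 4: "Clear", 5: "Clear",
                 6: "Critical", 7: "Attention"}, "Unknown"),
    # HP switches, Flow II fan: the note states remaining values are 'Clear'.
    "HP Switch Flow II Fan": ({0: "Critical", 1: "Unknown", 2: "Service Down",
                               4: "Attention", 6: "Trouble"}, "Clear"),
}


def severity(vendor: str, code: int) -> str:
    """Translate a raw sensor status code into a severity label."""
    mapping, default = STATUS_MAPS[vendor]
    return mapping.get(code, default)
```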

7. Check if SNMP is installed:

SNMP must be enabled on the monitored devices, since OpManager primarily uses SNMP to query device status and metrics. To install the SNMP agent on a Linux device, follow these steps.

8. Hardware status alerts:

The hardware status alert responses were previously hard-coded and could not be customized. With the introduction of the HardwareStatus.xml file under "/conf/opmanager" (released in version 125583), users can configure the alert severities for all supported hardware vendors. In this XML, status severities can be configured based on the vendor, device type and sensor category.

Note: Hardware status alerts for VMware host hardware are hard-coded and cannot be customized.