IPMI-based monitoring with OpManager

IPMI (Intelligent Platform Management Interface) is a standard for an autonomous computer subsystem that lets you monitor the hardware health of your servers regardless of the state of the host system's CPU, firmware, or operating system. With IPMI, you can continuously monitor the hardware health of your devices.

Supported vendors and protocols

For now, OpManager supports the following vendor-protocol combinations for IPMI monitoring:

  1. Dell iDRAC:
    1. SNMP: using MIB IDRAC-MIB-SMIv2.mib
    2. API: using version 8 or above
  2. HP iLO: using API
  3. IBM IMM: using SNMP MIB IMM.mib
  4. Supermicro: using API

OIDs/APIs used for data collection:

  1. Dell iDRAC:
    • SNMP:

      These are the OIDs from the IDRAC-MIB-SMIv2.mib that are used for hardware data collection:

      • Manufacturer - .1.3.6.1.4.1.674.10892.5.1.1.4
      • Model - .1.3.6.1.4.1.674.10892.5.1.3.12
      • ServiceTag - .1.3.6.1.4.1.674.10892.5.1.3.2
      • OS - .1.3.6.1.4.1.674.10892.5.1.3.6
      • OS Version - .1.3.6.1.4.1.674.10892.5.1.3.14

      Category | Units | Sensor Name | Sensor Status | Sensor Value
      Fan | RPM | .1.3.6.1.4.1.674.10892.5.4.700.12.1.8 | .1.3.6.1.4.1.674.10892.5.4.700.12.1.5 | .1.3.6.1.4.1.674.10892.5.4.700.12.1.6
      Temperature | degree C | .1.3.6.1.4.1.674.10892.5.4.700.20.1.8 | .1.3.6.1.4.1.674.10892.5.4.700.20.1.5 | .1.3.6.1.4.1.674.10892.5.4.700.20.1.6
      Power | Watts | .1.3.6.1.4.1.674.10892.5.4.600.12.1.8 | .1.3.6.1.4.1.674.10892.5.4.600.12.1.5 | .1.3.6.1.4.1.674.10892.5.4.600.12.1.6
      Voltage | Volts | .1.3.6.1.4.1.674.10892.5.4.600.20.1.8 | .1.3.6.1.4.1.674.10892.5.4.600.20.1.5 | NIL
      Processors | MHz | .1.3.6.1.4.1.674.10892.5.4.1100.30.1.8 | .1.3.6.1.4.1.674.10892.5.4.1100.30.1.5 | .1.3.6.1.4.1.674.10892.5.4.1100.30.1.12
      Memory | MB | .1.3.6.1.4.1.674.10892.5.4.1100.50.1.8 | .1.3.6.1.4.1.674.10892.5.4.1100.50.1.5 | .1.3.6.1.4.1.674.10892.5.4.1100.50.1.14
      Battery | -NA- | .1.3.6.1.4.1.674.10892.5.4.600.50.1.7 | .1.3.6.1.4.1.674.10892.5.4.600.50.1.5 | .1.3.6.1.4.1.674.10892.5.4.600.50.1.6
      Disk array data | MB | .1.3.6.1.4.1.674.10892.5.5.1.20.130.4.1.55 | .1.3.6.1.4.1.674.10892.5.5.1.20.130.4.1.4 | .1.3.6.1.4.1.674.10892.5.5.1.20.130.4.1.17
    • API:

      The base API call used for monitoring iDRAC devices is:

      /redfish/v1/Systems/System.Embedded.1/

      The resource path changes depending on which set of sensors is monitored:

      Category | Units | API to get sensor details
      Fan, Temperature | RPM (fan), degree C (temperature) | /redfish/v1/Chassis/System.Embedded.1/Thermal/
      Power, Voltage | Watts (power), Volts (voltage) | /redfish/v1/Chassis/System.Embedded.1/Power/
      Processors | MHz | /redfish/v1/Systems/System.Embedded.1/Processors/
      Memory | MB | /redfish/v1/Systems/System.Embedded.1/Memory/
      Disk array data | MB | /redfish/v1/Systems/System.Embedded.1/Storage/
  2. HP iLO (using API):

    IPMI hardware monitoring of HP iLO devices is performed using the base API call:

    /redfish/v1/Systems/1/

    The APIs used for each hardware category on HP devices are listed below:

    Category | Units | APIs to get sensor details
    Fan, Temperature | Percentage (fan), degree C (temperature) | /redfish/v1/Chassis/1/Thermal/
    Power | Watts | /redfish/v1/Chassis/1/Power/
    Processors | MHz | /redfish/v1/Systems/1/Processors/
    Memory | MB | /redfish/v1/Systems/1/Memory/
    Disk array data | MB | /redfish/v1/Systems/1/Storage/ (to get SSD details); /redfish/v1/Systems/1/SmartStorage/ArrayControllers/ (to get HDD details)
  3. IBM IMM (using SNMP):

    The OIDs used for IPMI-based hardware monitoring under the IMM.mib are as follows:

    1. Model - .1.3.6.1.4.1.2.3.51.3.1.5.2.1.2
    2. Serialnumber - .1.3.6.1.4.1.2.3.51.3.1.5.2.1.3
    3. UUID - .1.3.6.1.4.1.2.3.51.3.1.5.2.1.4

    Category | Units | Sensor Name | Sensor Status | Sensor Value
    Fan | Percentage | .1.3.6.1.4.1.2.3.51.3.1.3.2.1.2 | .1.3.6.1.4.1.2.3.51.3.1.3.2.1.10 | .1.3.6.1.4.1.2.3.51.3.1.3.2.1.3
    Temperature | degree C | .1.3.6.1.4.1.2.3.51.3.1.1.2.1.2 | .1.3.6.1.4.1.2.3.51.3.1.1.2.1.11 | .1.3.6.1.4.1.2.3.51.3.1.1.2.1.3
    Power | Watts | .1.3.6.1.4.1.2.3.51.3.1.11.2.1.2 | .1.3.6.1.4.1.2.3.51.3.1.11.2.1.6 | NIL
    Voltage | Volts | .1.3.6.1.4.1.2.3.51.3.1.2.2.1.2 | .1.3.6.1.4.1.2.3.51.3.1.2.2.1.11 | .1.3.6.1.4.1.2.3.51.3.1.2.2.1.3
    Processors | MHz | .1.3.6.1.4.1.2.3.51.3.1.5.20.1.2 | .1.3.6.1.4.1.2.3.51.3.1.5.20.1.11 | .1.3.6.1.4.1.2.3.51.3.1.5.20.1.3
    Memory | MB | .1.3.6.1.4.1.2.3.51.3.1.5.21.1.2 | .1.3.6.1.4.1.2.3.51.3.1.5.21.1.8 | .1.3.6.1.4.1.2.3.51.3.1.5.21.1.7
    Disk array data | MB | .1.3.6.1.4.1.2.3.51.3.1.12.2.1.2 | .1.3.6.1.4.1.2.3.51.3.1.12.2.1.3 | NIL
  4. Supermicro (using API):

    Supermicro devices are monitored using the base API call: /redfish/v1/Systems/1

    The resource path changes depending on which set of sensors is monitored:

    Category | Units | APIs to get sensor details
    Fan, Temperature | RPM (fan), degree C (temperature) | /redfish/v1/Chassis/1/Thermal
    Power, Voltage | Watts (power), Volts (voltage) | /redfish/v1/Chassis/1/Power
    Processors | Watts | /redfish/v1/Systems/1/Processors
    Memory | MB | /redfish/v1/Systems/1/Memory (for higher versions); /redfish/v1/Systems/1 (for lower versions)
    Disk array data | MB | /redfish/v1/Systems/1/SimpleStorage
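
To illustrate how the API-based lookups above fit together, here is a minimal Python sketch that maps a vendor and sensor category to its Redfish path and flattens a Thermal response into sensor rows. It is not OpManager's implementation. The vendor and category keys, the function names `sensor_url` and `parse_thermal`, and the sample payload are hypothetical; the property names (`Fans`, `Temperatures`, `Reading`, `ReadingCelsius`, `Status.Health`) are assumed from the DMTF Redfish Thermal schema and may differ between firmware versions.

```python
# Redfish resource paths copied from the tables above.
# The vendor and category keys are illustrative names chosen for this sketch.
SENSOR_PATHS = {
    "dell_idrac": {
        "thermal": "/redfish/v1/Chassis/System.Embedded.1/Thermal/",
        "power": "/redfish/v1/Chassis/System.Embedded.1/Power/",
        "processors": "/redfish/v1/Systems/System.Embedded.1/Processors/",
        "memory": "/redfish/v1/Systems/System.Embedded.1/Memory/",
        "storage": "/redfish/v1/Systems/System.Embedded.1/Storage/",
    },
    "hp_ilo": {
        "thermal": "/redfish/v1/Chassis/1/Thermal/",
        "power": "/redfish/v1/Chassis/1/Power/",
        "processors": "/redfish/v1/Systems/1/Processors/",
        "memory": "/redfish/v1/Systems/1/Memory/",
        "storage": "/redfish/v1/Systems/1/Storage/",
    },
    "supermicro": {
        "thermal": "/redfish/v1/Chassis/1/Thermal",
        "power": "/redfish/v1/Chassis/1/Power",
        "processors": "/redfish/v1/Systems/1/Processors",
        "memory": "/redfish/v1/Systems/1/Memory",
        "storage": "/redfish/v1/Systems/1/SimpleStorage",
    },
}


def sensor_url(host, vendor, category):
    """Return the HTTPS URL for one sensor category on a management controller."""
    return "https://" + host + SENSOR_PATHS[vendor][category]


def parse_thermal(payload):
    """Flatten a Thermal response (already decoded from JSON) into sensor rows.

    Assumes DMTF Redfish Thermal property names; actual firmware may differ.
    """
    rows = []
    for fan in payload.get("Fans", []):
        rows.append({
            "category": "Fan",
            "name": fan.get("Name"),
            "value": fan.get("Reading"),
            "status": (fan.get("Status") or {}).get("Health"),
        })
    for temp in payload.get("Temperatures", []):
        rows.append({
            "category": "Temperature",
            "name": temp.get("Name"),
            "value": temp.get("ReadingCelsius"),
            "status": (temp.get("Status") or {}).get("Health"),
        })
    return rows


# Hypothetical payload, trimmed to the fields this sketch reads.
sample = {
    "Fans": [{"Name": "Fan 1", "Reading": 4200, "Status": {"Health": "OK"}}],
    "Temperatures": [
        {"Name": "CPU 1", "ReadingCelsius": 41, "Status": {"Health": "OK"}}
    ],
}
for row in parse_thermal(sample):
    print(row)
```

In practice the payload would come from an authenticated HTTPS GET against the URL returned by `sensor_url`; that request is left out here because credentials and certificate handling depend on the environment.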

Alerts based on hardware status codes

The status codes for each vendor-protocol combination, and the alert severity that each code triggers, are listed below:

  1. Dell iDRAC (SNMP)

    Disk array data:

    • Unknown - 1, 4, 8
    • Clear - 2, 3, 10
    • Attention - 5, 9
    • Trouble - 6
    • Critical - 7

    Other sensors:

    • Unknown - 1, 2
    • Clear - 3, 4
    • Critical - 5
    • Trouble - 6
  2. Dell iDRAC (API)
    • Unknown - if the status string contains "null"
    • Clear - if the status string contains "OK"
    • Critical - if the status string contains "Critical"
    • Attention - if the status string contains "Warning"
  3. IBM IMM (SNMP)

    Since status messages are passed as strings in IMM, only the following criticalities can be raised in the related alerts:

    • Unknown - if the status string contains "Unknown"
    • Clear - if the status string contains "Normal"
  4. HP iLO (API)
    • Unknown - if the status string contains "null"
    • Clear - if the status string contains "OK"
    • Critical - if the status string contains "Critical"
    • Attention - if the status string contains "Warning"
    HP iLO4 (DIMM)

    HP iLO4 is the fourth generation in the iLO series. The following status strings correspond to its memory (DIMM) status:

    • Unknown - if the status string contains "Other"
    • Clear - if the status string contains any of the following:
      • "GoodInUse"
      • "AddedButUnused"
      • "GoodPartiallyInUse"
      • "PresentSpare"
      • "PresentUnused"
      • "UpgradedButUnused"
    • Warning - if the status string contains any of the following:
      • "ConfigurationError"
      • "NotPresent"
      • "NotSupported"
      • "DoesNotMatch"
    • Attention - if the status string contains any of the following:
      • "Degraded"
      • "ExpectedButMissing"
  5. Supermicro (API)
    • Unknown - if the status string contains "null"
    • Clear - if the status string contains "OK"
    • Critical - if the status string contains "Critical"
    • Attention - if the status string contains "Warning"
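
The mappings above can be expressed as lookup tables. The Python sketch below is an illustration under stated assumptions, not OpManager's actual logic: the names `dell_snmp_severity` and `string_severity` are hypothetical, the order in which substrings are checked is an assumption (the lists above do not specify one), and falling back to "Unknown" for unlisted codes or strings is also an assumption.

```python
# Dell iDRAC over SNMP: numeric status code -> alert severity (from the lists above).
DELL_SNMP_DISK = {
    1: "Unknown", 4: "Unknown", 8: "Unknown",
    2: "Clear", 3: "Clear", 10: "Clear",
    5: "Attention", 9: "Attention",
    6: "Trouble",
    7: "Critical",
}
DELL_SNMP_OTHER = {
    1: "Unknown", 2: "Unknown",
    3: "Clear", 4: "Clear",
    5: "Critical",
    6: "Trouble",
}

# String-based rules: (substring to look for, severity to raise).
# Shared by Dell iDRAC (API), HP iLO (API), and Supermicro.
REDFISH_RULES = [
    ("Critical", "Critical"),
    ("Warning", "Attention"),
    ("OK", "Clear"),
    ("null", "Unknown"),
]
IMM_RULES = [("Unknown", "Unknown"), ("Normal", "Clear")]
ILO4_DIMM_RULES = [
    ("Other", "Unknown"),
    ("GoodInUse", "Clear"), ("AddedButUnused", "Clear"),
    ("GoodPartiallyInUse", "Clear"), ("PresentSpare", "Clear"),
    ("PresentUnused", "Clear"), ("UpgradedButUnused", "Clear"),
    ("ConfigurationError", "Warning"), ("NotPresent", "Warning"),
    ("NotSupported", "Warning"), ("DoesNotMatch", "Warning"),
    ("Degraded", "Attention"), ("ExpectedButMissing", "Attention"),
]


def dell_snmp_severity(code, disk_array=False):
    """Map a Dell iDRAC SNMP status code to a severity."""
    table = DELL_SNMP_DISK if disk_array else DELL_SNMP_OTHER
    # Assumption: codes missing from the table are reported as "Unknown".
    return table.get(code, "Unknown")


def string_severity(status, rules):
    """Return the severity of the first rule whose substring occurs in status."""
    # A JSON null decodes to None in Python; treat it as the string "null".
    text = "null" if status is None else str(status)
    for needle, severity in rules:
        if needle in text:
            return severity
    # Assumption: strings that match no rule are reported as "Unknown".
    return "Unknown"


print(dell_snmp_severity(5))                      # non-disk sensor
print(dell_snmp_severity(5, disk_array=True))     # disk array sensor
print(string_severity("Warning", REDFISH_RULES))
print(string_severity("GoodInUse", ILO4_DIMM_RULES))
```

Note that the same numeric code can mean different things on Dell iDRAC over SNMP: code 5 is Critical for most sensors but Attention for disk array data, which is why the sketch keeps two separate tables.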