Monitoring the performance of business-critical applications is essential, and so is monitoring the status of the hardware on which these applications run. Servers often develop critical problems without warning, resulting in expensive hardware repairs. Keeping an eye on the status and performance of your servers' hardware components is therefore critical to minimizing server downtime.
Applications Manager proactively monitors details such as processor health, physical and logical drive failures, cooling units and fans, and current voltage for hardware servers (HP ProLiant and Dell PowerEdge) and VMware ESX/ESXi servers, to identify issues caused by a malfunctioning hardware component.
Any kind of degradation in server performance is closely linked to the status of the following parameters.
Power supply rating: Voltages or power ratings outside the permissible range can damage electrical components or cause system failure. Monitor the voltage and wattage readings of servers to make sure that they stay within safe operating limits.
CPU fan speed: It is critical to check that the fans are working, so that overheating caused by prolonged temperature spikes can be detected. For instance, if a fan stops working, server components can overheat, suffer severe damage, and fail.
Temperature: If the temperature rises beyond the operational range, the processor can burn out. Monitor temperatures at the server CPU as well as at the system board inlet to determine whether the components are operating within safe limits.
Disk and Array: View disk details (physical and logical) and ensure data availability by detecting disk failures or file system corruption before the damage becomes non-recoverable.
Memory: Get statistics on the type and size of installed memory modules. Detect faulty or improper installation or configuration of memory modules and rectify it.
Processor: View CPU configuration details and ensure proper functioning of CPUs by monitoring the status (speed, core count, etc.) of processor devices.
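The idea of keeping readings "within safe limits" can be illustrated with a small range check. This is only a sketch and not how Applications Manager works internally: the reading names, the `SafeRange` type, the `out_of_range` helper, and every limit value below are hypothetical, chosen for illustration. Real limits come from the hardware vendor's specifications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SafeRange:
    """Inclusive lower and upper bound for one sensor reading."""
    low: float
    high: float


# Hypothetical limits for illustration only; use the vendor's published values.
SAFE_RANGES = {
    "voltage_v": SafeRange(11.4, 12.6),
    "fan_rpm": SafeRange(1500.0, 12000.0),
    "cpu_temp_c": SafeRange(5.0, 85.0),
    "inlet_temp_c": SafeRange(10.0, 35.0),
}


def out_of_range(readings):
    """Return the sorted names of readings that fall outside their safe range."""
    violations = []
    for name, value in readings.items():
        limits = SAFE_RANGES.get(name)
        if limits is None:
            continue  # no limit defined for this reading, so nothing to check
        if not (limits.low <= value <= limits.high):
            violations.append(name)
    return sorted(violations)


# A stopped fan and a hot CPU are flagged; voltage and inlet temperature pass.
sample = {"voltage_v": 12.0, "fan_rpm": 0.0, "cpu_temp_c": 91.0, "inlet_temp_c": 24.0}
flagged = out_of_range(sample)
```

Keeping the limits in a single table separates policy (what counts as safe) from the check itself, so limits can be adjusted per server model without touching the logic.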
Configure alarms to receive proactive alerts when the state of a component falls into one of these categories: failed, error, non-recoverable, warning, degraded, or critical, so that corrective action can be taken quickly.
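As a rough illustration of alerting on these categories, the sketch below filters a set of component states down to those that would warrant an alarm. It does not use any Applications Manager API: the `components_needing_alert` function and the component names are hypothetical, and only the six state names are taken from the list above.

```python
# The six alert-worthy states named in the text, stored in lower case.
ALERT_STATES = {"failed", "error", "non-recoverable", "warning", "degraded", "critical"}


def components_needing_alert(component_states):
    """Return the sorted names of components whose state is alert-worthy.

    `component_states` maps a component name to its reported state string.
    Matching ignores case and surrounding whitespace.
    """
    return sorted(
        name
        for name, state in component_states.items()
        if state.strip().lower() in ALERT_STATES
    )


# Hypothetical component names and states for illustration.
states = {"fan1": "OK", "psu1": "Degraded", "disk0": "failed", "cpu0": "ok"}
to_alert = components_needing_alert(states)
```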