Hadoop Monitoring


Overview

Hadoop is an open source software framework designed for distributed storage and distributed processing of big data (very large data sets). Its architecture consists mainly of a storage part and a processing part. Hadoop splits files into large blocks and distributes them among the nodes in the cluster. The processing part transfers tasks to the nodes so that they run in parallel, taking advantage of data locality (each node works on the data it already holds) for faster and more efficient processing.

Applications Manager's Hadoop monitor supports both Hadoop 1.x and Hadoop 2.x. It helps you maintain the overall health of your distributed Hadoop cluster, ensure its availability, and process tasks faster and more accurately.

Creating a new Hadoop monitor

Prerequisites for monitoring Hadoop metrics: Click here

Using the REST API to add a new Hadoop monitor: Click here

To create a Hadoop monitor, follow the steps given below:

  1. Click the New Monitor link and select Hadoop under Services.
  2. Enter a Display Name for the monitor.
  3. Choose the Mode of Monitoring (REST API or JMX).

For REST API mode:

  1. Specify the Version of Hadoop to be monitored.
  2. Specify the host of the NameNode.
  3. Specify the web port of the NameNode.
  4. Choose YES or NO to indicate whether SSL is enabled on the NameNode.
  5. Select the Authentication type for the NameNode. If you select Simple Authentication, specify a username.
  6. Specify the host of the ResourceManager.
  7. Specify the web port of the ResourceManager.
  8. Choose YES or NO to indicate whether SSL is enabled on the ResourceManager.
  9. Select the Authentication type for the ResourceManager. If you select Simple Authentication, specify a username.
  10. Specify the Polling Interval.
  11. From the combo box, choose the Monitor Group with which you want to associate the monitor (optional). You can associate the monitor with multiple groups.
  12. Click Add Monitor(s). This discovers the monitor from the network and starts monitoring it.
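
Before adding the monitor in REST API mode, it can help to confirm that the NameNode web port answers over HTTP. The sketch below is only an illustration and does not describe the requests Applications Manager itself sends. It assumes a Hadoop 2.x NameNode that serves the standard /jmx servlet on its web port; the host name, the port 50070, and the sample response body are hypothetical values chosen for the example.

```python
import json
from urllib.parse import urlencode


def build_jmx_url(host, web_port, use_ssl=False, bean_query=None, user_name=None):
    """Build the URL of the /jmx servlet on a Hadoop daemon's web port."""
    scheme = "https" if use_ssl else "http"
    params = {}
    if bean_query:
        params["qry"] = bean_query        # restrict the output to matching MBeans
    if user_name:
        params["user.name"] = user_name   # user name for Hadoop simple authentication
    url = f"{scheme}://{host}:{web_port}/jmx"
    return f"{url}?{urlencode(params)}" if params else url


def find_bean(payload, bean_name):
    """Return the MBean with the given name from a /jmx JSON response, or None."""
    for bean in json.loads(payload).get("beans", []):
        if bean.get("name") == bean_name:
            return bean
    return None


# Hypothetical host and port; substitute the values entered in the form.
FS_STATE = "Hadoop:service=NameNode,name=FSNamesystemState"
url = build_jmx_url("namenode.example.com", 50070, use_ssl=False, bean_query=FS_STATE)

# Trimmed, made-up response body used here instead of a live request.
sample = '{"beans": [{"name": "Hadoop:service=NameNode,name=FSNamesystemState", "NumLiveDataNodes": 3}]}'
bean = find_bean(sample, FS_STATE)
print(url)
print(bean["NumLiveDataNodes"])
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return a JSON document if the host, port, and SSL setting are correct.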

For JMX mode:

  1. Specify the Version of Hadoop to be monitored.
  2. Specify the host of the NameNode.
  3. Specify the JMX port of the NameNode.
  4. Enter a Username and Password.
  5. Enter a JNDIPath for the NameNode.
  6. Specify the host of the ResourceManager.
  7. Specify the JMX port of the ResourceManager.
  8. Enter the Username and Password for the ResourceManager.
  9. Enter the JNDIPath for the ResourceManager.
  10. Specify the Polling Interval.
  11. From the combo box, choose the Monitor Group with which you want to associate the monitor (optional). You can associate the monitor with multiple groups.
  12. Click Add Monitor(s). This discovers the monitor from the network and starts monitoring it.

Note: If you are unable to add the monitor even after enabling JMX, try providing the following JVM argument to the Hadoop process:
     -Djava.rmi.server.hostname=[YOUR_IP]
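
To show how the argument above fits with the usual remote JMX settings, here is a small sketch that assembles the JVM options as one string. The `com.sun.management.jmxremote` properties are the standard JVM ones; the helper name `jmx_jvm_options`, the port 8004, and the address 192.0.2.10 are hypothetical. Where the resulting string goes depends on your distribution (for example, a variable such as HADOOP_NAMENODE_OPTS in hadoop-env.sh), so treat this as an assumption to verify against your setup.

```python
def jmx_jvm_options(port, rmi_hostname=None, authenticate=True, ssl=False):
    """Assemble JVM flags that enable remote JMX on the given port."""
    opts = [
        "-Dcom.sun.management.jmxremote",
        f"-Dcom.sun.management.jmxremote.port={port}",
        f"-Dcom.sun.management.jmxremote.authenticate={str(authenticate).lower()}",
        f"-Dcom.sun.management.jmxremote.ssl={str(ssl).lower()}",
    ]
    if rmi_hostname:
        # Address that the RMI stubs advertise to remote JMX clients.
        opts.append(f"-Djava.rmi.server.hostname={rmi_hostname}")
    return " ".join(opts)


# Hypothetical port and IP address, for illustration only.
options = jmx_jvm_options(8004, rmi_hostname="192.0.2.10", authenticate=False)
print(options)
```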

Hadoop Server - Monitored Parameters

Go to the Monitors Category View by clicking the Monitors tab. Click Hadoop under the Services table. The Hadoop bulk configuration view is displayed, divided into three tabs:

  • The Availability tab gives the availability history for the past 24 hours or 30 days.
  • The Performance tab gives the health status and events for the past 24 hours or 30 days.
  • The List view enables you to perform bulk admin configurations.

Click on the monitor name to see all the server details listed under the following tabs.


Hadoop 1.x

Overview:

SAFEMODE
Safemode status Current safemode status. Possible values:
  • Operational
  • Safemode
DFS  
Total DFS Capacity (in GB) Total storage capacity of HDFS.
NonDFS Used Space (in GB) Storage space on the datanodes used by non-HDFS data.
DFS Used Space (in GB) Storage space used by HDFS data.
DFS Used (in %) Percentage of HDFS capacity used.
DFS Free Space (in GB) Free storage space in HDFS.
DFS Free (in %) Percentage of HDFS capacity that is free.
BLOCKS  
Block Capacity Total block capacity of Hadoop.
Total Blocks Total number of blocks in Hadoop.
Missing Blocks Number of missing blocks in Hadoop.
Corrupt Blocks Number of corrupt blocks in Hadoop.
Excess Blocks Number of excess blocks in Hadoop.
UnderReplicated Blocks Number of under-replicated blocks in Hadoop.
Pending Deletion Blocks Number of blocks pending deletion in Hadoop.
Pending Replication Blocks Number of blocks pending replication in Hadoop.
FILES
Total Files and Directories Total number of files and directories in HDFS.
Files and Directories created per sec Number of files and directories created per second.
LOAD  
Total Load Total load over the Hadoop service.
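
The DFS percentages above are simple ratios of the capacity figures. The following sketch shows one way they relate, assuming that free space equals total capacity minus DFS and non-DFS usage and that both percentages are taken against total capacity. The monitor's exact formula is not stated here, and reserved space on the datanodes can make real figures differ slightly. The function name and the sample numbers are hypothetical.

```python
def dfs_usage(total_gb, dfs_used_gb, non_dfs_used_gb):
    """Derive free space and usage percentages from capacity figures in GB."""
    if total_gb <= 0:
        raise ValueError("total capacity must be positive")
    free_gb = total_gb - dfs_used_gb - non_dfs_used_gb
    return {
        "dfs_used_pct": round(100 * dfs_used_gb / total_gb, 2),
        "dfs_free_gb": free_gb,
        "dfs_free_pct": round(100 * free_gb / total_gb, 2),
    }


# Made-up cluster: 1000 GB total, 250 GB of HDFS data, 150 GB of other data.
usage = dfs_usage(total_gb=1000, dfs_used_gb=250, non_dfs_used_gb=150)
print(usage)
```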

HDFS:

NameNode JVM  
NonHeap Memory Committed Non-heap memory currently committed for use.
NonHeap Memory Used Non-heap memory currently in use.
Heap Memory Committed Heap memory currently committed for use.
Heap Memory Used Heap memory currently in use.
NameNode OS
Total Physical Memory (in GB) Total RAM of the namenode.
Free Physical Memory (in GB) Free RAM of the namenode.
Total Swap Space (in GB) Total swap space available in the namenode OS.
Free Swap Space (in GB) Free swap space available in the namenode OS.
Maximum File Descriptor Count Maximum number of file descriptors that can be open.
Open File Descriptor Count Number of file descriptors currently open.
Average System Load Average system load of the namenode OS.
DataNodes  
Node Name Name of the datanode.
State Current state of the datanode:
  • Live
  • Dead
  • Decommissioned
Used Space (in GB) Used space in HDFS.

MapReduce:

Tracker Summary  
Total TaskTracker Total number of tasktrackers.
Alive Tasktracker Number of tasktrackers in alive state.
Blacklisted TaskTracker Number of tasktrackers in blacklisted state.
Graylisted TaskTracker Number of tasktrackers in graylisted state.
Total Number of Jobs Total number of jobs executed in MapReduce.
Slots Summary  
Total Map Slots Total map slots capacity in mapreduce.
Used Map Slots Number of map slots used currently.
Total Reduce Slots Total reduce slots capacity in mapreduce.
Used Reduce Slots Number of reduce slots used currently.
TaskTrackers  
TaskTracker Name Name of the tasktracker.
State Current state of tasktracker:
  • Alive
  • Blacklisted
  • Graylisted
  • Dead
Health Current health state of tasktracker:
  • OK
  • <health error message>
Failure Count Number of failures in the tasktracker.
Queue  
Queue Name Name of the queue.
State Current state of queue.
Info Any error information that is thrown from queue. 

Job:

Jobs Summary
Jobs Submitted Number of jobs in submitted state.
Jobs Preparing Number of jobs in preparing state.
Jobs Running Number of jobs in running state.
Jobs Failed Number of jobs in failed state.
Jobs Killed Number of jobs in killed state.
Jobs Completed Number of jobs in completed state.
Completed Percent (in %) Percentage of completed jobs.
Killed Percent (in %) Percentage of killed jobs.
Failed Percent (in %) Percentage of failed jobs.
Jobs Stats (in last polling interval)
Submitted jobs count Number of jobs submitted in last polling interval.
Failed jobs count Number of jobs failed in last polling interval.
Killed jobs count Number of jobs killed in last polling interval.
Completed jobs count Number of jobs completed in last polling interval.
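
The "in last polling interval" counts can be thought of as the difference between two consecutive readings of cumulative counters. The sketch below illustrates that idea only; it is an assumption about how such figures may be derived, not a description of the monitor's internals. The function name and the sample readings are hypothetical.

```python
def interval_counts(previous, current):
    """Return per-counter increases between two polls of cumulative counters."""
    deltas = {}
    for name, value in current.items():
        delta = value - previous.get(name, 0)
        # A negative delta suggests the counter was reset (for example, after
        # a service restart), so count everything seen since the reset.
        deltas[name] = value if delta < 0 else delta
    return deltas


# Made-up readings from two consecutive polls.
earlier = {"submitted": 120, "failed": 4, "killed": 1, "completed": 110}
later = {"submitted": 135, "failed": 5, "killed": 1, "completed": 126}
last_interval = interval_counts(earlier, later)
print(last_interval)
```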

Hadoop 2.x

Overview:

SAFEMODE
Safemode status Current safemode status. Possible values:
  • Operational
  • Safemode
DFS  
Total DFS Capacity (in GB) Total storage capacity of HDFS.
NonDFS Used Space (in GB) Storage space on the datanodes used by non-HDFS data.
DFS Used Space (in GB) Storage space used by HDFS data.
DFS Used (in %) Percentage of HDFS capacity used.
DFS Free Space (in GB) Free storage space in HDFS.
DFS Free (in %) Percentage of HDFS capacity that is free.
BLOCKS  
Block Capacity Total block capacity of Hadoop.
Total Blocks Total number of blocks in Hadoop.
Missing Blocks Number of missing blocks in Hadoop.
Corrupt Blocks Number of corrupt blocks in Hadoop.
Excess Blocks Number of excess blocks in Hadoop.
UnderReplicated Blocks Number of under-replicated blocks in Hadoop.
Pending Deletion Blocks Number of blocks pending deletion in Hadoop.
Pending Replication Blocks Number of blocks pending replication in Hadoop.
FILES
Total Files and Directories Total number of files and directories in HDFS.
Files and Directories created per sec Number of files and directories created per second.
LOAD  
Total Load Total load over the Hadoop service.

HDFS:

DataNode Summary  
Live Datanodes Number of datanodes in live state.
Dead Datanodes Number of datanodes in dead state.
Live-Decommissioned Datanodes Number of datanodes that are live but decommissioned.
Dead-Decommissioned Datanodes Number of datanodes that are dead and decommissioned.
Decommissioning Datanodes Number of datanodes in decommissioning state.
Stale Datanodes Number of datanodes in stale state.
Live Datanode Percent (in %) Percentage of datanodes in live state.
Dead Datanode Percent (in %) Percentage of datanodes in dead state.
DataNodes  
Node Name Name of datanode.
State Current state of the datanode:
  • Live
  • Decommission In Progress
  • Live - Decommissioned
  • Dead - Decommissioned
  • Dead
Total Capacity (in GB) Total capacity of the HDFS.
NonDFS Used (in GB) Storage space used by non-HDFS data.
DFS Used (in GB) Storage space used by HDFS data.
DFS Used Percent (in %) Percentage of storage space used by HDFS data.
DFS Free (in GB) Free storage space in HDFS.
DFS Free Percent (in %) Percentage of free storage space in HDFS.

YARN:

NodeManager Summary
Active NodeManagers Number of nodemanagers in active state.
Decommissioned NodeManagers Number of nodemanagers in decommissioned state.
Lost NodeManagers Number of nodemanagers in lost state.
UnHealthy NodeManagers Number of nodemanagers in unhealthy state.
Rebooted NodeManagers Number of nodemanagers in rebooted state.
Active NodeManager Percent (in %) Percentage of nodemanagers in active state.
Lost NodeManager Percent (in %) Percentage of nodemanagers in lost state.
UnHealthy NodeManager Percent (in %) Percentage of nodemanagers in unhealthy state.
NodeManager  
HostName Hostname of nodemanager.
Rack Rack to which this nodemanager belongs.
State Current state of the nodemanager:
  • Running
  • Unhealthy
  • Dead
Memory used (in %) Percentage of main memory used by nodemanager.
Version Version of nodemanager.

Applications:

Applications  
Apps Submitted Number of applications in submitted state.
Apps Completed Number of applications in completed state.
Apps Pending Number of applications in pending state.
Apps Running Number of applications in running state.
Apps Failed Number of applications in failed state.
Apps Killed Number of applications in killed state.
Percent Completed (in %) Percentage of completed applications.
Percent Killed (in %) Percentage of killed applications.
Percent Failed (in %) Percentage of failed applications.
Applications Stats (in last polling interval)
Submitted apps count Number of applications submitted in last polling interval.
Failed apps count Number of applications failed in last polling interval.
Killed apps count Number of applications killed in last polling interval.
Completed apps count Number of applications completed in last polling interval.
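
Many of the YARN and Applications counters above have close counterparts in the ResourceManager REST API of Hadoop 2.x, which returns a `clusterMetrics` object from /ws/v1/cluster/metrics on the ResourceManager web port. The sketch below parses such a response and derives percentages. It assumes the percentages are taken against the submitted application count and the total node count, which may not match the monitor's own calculation. The function names and the sample payload are hypothetical.

```python
import json


def percent(part, whole):
    """Return part as a percentage of whole, or 0.0 when whole is zero."""
    return round(100 * part / whole, 2) if whole else 0.0


def summarize_cluster_metrics(payload):
    """Derive application and nodemanager percentages from clusterMetrics JSON."""
    m = json.loads(payload)["clusterMetrics"]
    apps, nodes = m["appsSubmitted"], m["totalNodes"]
    return {
        "percent_completed": percent(m["appsCompleted"], apps),
        "percent_killed": percent(m["appsKilled"], apps),
        "percent_failed": percent(m["appsFailed"], apps),
        "active_nodemanager_percent": percent(m["activeNodes"], nodes),
        "lost_nodemanager_percent": percent(m["lostNodes"], nodes),
        "unhealthy_nodemanager_percent": percent(m["unhealthyNodes"], nodes),
    }


# Trimmed, made-up response body used here instead of a live request.
sample_metrics = json.dumps({"clusterMetrics": {
    "appsSubmitted": 200, "appsCompleted": 150, "appsPending": 5,
    "appsRunning": 15, "appsFailed": 10, "appsKilled": 20,
    "totalNodes": 10, "activeNodes": 8, "lostNodes": 1, "unhealthyNodes": 1,
}})
summary = summarize_cluster_metrics(sample_metrics)
print(summary)
```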