Setting up Hadoop Monitoring with Applications Manager


Overview

Hadoop is an open source software framework for distributed storage and distributed processing of very large data sets (big data). Its core architecture consists of a storage part and a processing part. The storage part splits files into large blocks and distributes them among the nodes in the cluster. The processing part ships tasks to the nodes so that they run in parallel, taking advantage of data locality (each node works on the data it already holds) for faster, more efficient processing.

Applications Manager's Hadoop Monitor offers extensive monitoring support for Hadoop versions 1.x and above. It plays a critical role in overseeing the health and performance of your distributed Hadoop cluster, ensuring its availability, and optimizing Hadoop task processes.

Creating a new Hadoop monitor

Prerequisites for monitoring Hadoop metrics: Click here

Using the REST API to add a new Hadoop monitor: Click here

To create a Hadoop monitor, follow the steps given below:

  1. Click the New Monitor link and select Hadoop under Services.
  2. Enter a Display Name for the monitor.
  3. Choose the Mode of Monitoring (REST API or JMX).

For REST API mode:

  1. Specify the Version of Hadoop to be monitored.
  2. Enter the host of the NameNode.
  3. Enter the web port of the NameNode.
  4. Choose Yes or No to indicate whether SSL is enabled for the NameNode.
  5. Select your preferred Authentication type:
    • Simple Authentication - Enter a username
    • Kerberos Authentication - Enter the corresponding Key Distribution Center (KDC), NameNode keytab location, and NameNode Service Principal Name
  6. Specify the ResourceManager host.
  7. Specify the ResourceManager web port.
  8. Choose Yes or No to indicate whether SSL is enabled for the ResourceManager.
  9. Select the Authentication type for the ResourceManager:
    • Simple Authentication - Enter a username
    • Kerberos Authentication - Enter the corresponding JobTracker/ResourceManager keytab location and JobTracker/ResourceManager Service Principal Name
  10. Specify the Polling Interval.
  11. Optionally, choose from the combo box the Monitor Group to associate the monitor with. You can select multiple groups.
  12. Click Add Monitor(s). This discovers the monitor from the network and starts monitoring it.
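
Before adding the monitor in REST API mode, it can help to confirm that the NameNode and ResourceManager web ports entered above are reachable. The sketch below only builds the URLs you might probe in a browser or HTTP client. It assumes the stock Hadoop 2.x web endpoints (`/jmx` on the NameNode web UI and `/ws/v1/cluster/metrics` on the ResourceManager) and the `user.name` query parameter that Hadoop's simple authentication accepts. Whether Applications Manager issues exactly these requests is an assumption, and the function names, hosts, and ports are placeholders.

```python
from urllib.parse import urlencode


def namenode_jmx_url(host, web_port, ssl=False, user=None,
                     bean="Hadoop:service=NameNode,name=FSNamesystem"):
    """Build the URL of the NameNode JMX-over-HTTP servlet for one MBean."""
    scheme = "https" if ssl else "http"
    params = {"qry": bean}
    if user:
        # Simple (pseudo) authentication: the username travels as user.name.
        params["user.name"] = user
    return f"{scheme}://{host}:{web_port}/jmx?{urlencode(params)}"


def resourcemanager_metrics_url(host, web_port, ssl=False):
    """Build the URL of the ResourceManager cluster metrics REST resource."""
    scheme = "https" if ssl else "http"
    return f"{scheme}://{host}:{web_port}/ws/v1/cluster/metrics"
```

For example, `namenode_jmx_url("namenode.example.com", 50070, user="hdfs")` yields a URL that should return JSON if the NameNode host and web port are correct.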

For JMX mode:

  1. Specify the Version of Hadoop to be monitored.
  2. Enter the host of the NameNode.
  3. Enter the JMX port of the NameNode.
  4. Select your preferred Authentication type:
    • Simple Authentication - Enter a username
    • Kerberos Authentication - Enter the corresponding Key Distribution Center (KDC), NameNode keytab location, and NameNode Service Principal Name
  5. Enter a JNDIPath for the NameNode.
  6. Specify the ResourceManager host.
  7. Specify the ResourceManager JMX port.
  8. Select the Authentication type for the ResourceManager:
    • Simple Authentication - Enter a username
    • Kerberos Authentication - Enter the corresponding JobTracker/ResourceManager keytab location and JobTracker/ResourceManager Service Principal Name
  9. Set a ResourceManager JNDIPath.
  10. Set the Polling Interval.
  11. Optionally, choose from the combo box the Monitor Group to associate the monitor with. You can select multiple groups.
  12. Click Add Monitor(s). This discovers the monitor from the network and starts monitoring it.
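
The host, JMX port, and JNDIPath entered above correspond to the pieces of a standard JMX service URL. A minimal sketch of how they combine, assuming the usual RMI connector form and a JNDI path of `/jmxrmi` (the common JVM default, which may differ on your cluster). The function name is illustrative only.

```python
def jmx_service_url(host, jmx_port, jndi_path="/jmxrmi"):
    """Assemble a JMX-over-RMI service URL from host, port and JNDI path."""
    if not jndi_path.startswith("/"):
        jndi_path = "/" + jndi_path
    return f"service:jmx:rmi:///jndi/rmi://{host}:{jmx_port}{jndi_path}"
```

A URL built this way can be pasted into a generic JMX client such as JConsole to check connectivity before adding the monitor.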

Note: If you are unable to add the monitor even after enabling JMX, try providing the following JVM argument:
-Djava.rmi.server.hostname=[YOUR_IP]
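
For context, the sketch below assembles the JVM flags typically involved in exposing remote JMX on a Java process, including the argument from the note. The `com.sun.management.jmxremote.*` properties are standard JVM options. Turning authentication and SSL off by default is an assumption suitable only for a test setup, and the IP address and port are placeholders. Where the resulting string belongs (for example, a daemon-specific options variable in `hadoop-env.sh`) depends on your Hadoop version.

```python
def jmx_jvm_args(ip, port, authenticate=False, ssl=False):
    """Return a space-separated string of JVM flags that enable remote JMX."""
    flags = [
        "-Dcom.sun.management.jmxremote",
        f"-Dcom.sun.management.jmxremote.port={port}",
        f"-Dcom.sun.management.jmxremote.authenticate={str(authenticate).lower()}",
        f"-Dcom.sun.management.jmxremote.ssl={str(ssl).lower()}",
        # The argument from the note: the address advertised to remote RMI clients.
        f"-Djava.rmi.server.hostname={ip}",
    ]
    return " ".join(flags)
```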

Hadoop Server - Monitored Parameters

Go to the Monitors Category View by clicking the Monitors tab. Click Hadoop under the Services table. The Hadoop bulk configuration view is displayed, divided into three tabs:

  • Availability tab gives the availability history for the past 24 hours or 30 days.
  • Performance tab gives the health status and events for the past 24 hours or 30 days.
  • List view enables you to perform bulk admin configurations.

Click on the monitor name to see all the server details listed under the following tabs.

Hadoop 2.x & above

Overview:

SAFEMODE
Safemode status - Safemode status of the NameNode. Possible values:
  • Operational
  • Safemode

DFS
Total DFS Capacity (in GB) - Total capacity of the HDFS.
NonDFS Used Space (in GB) - Used space of the HDFS that is consumed by non-DFS data (not written through DFS commands).
DFS Used Space (in GB) - Used space of the HDFS that is consumed by DFS data (written through DFS commands).
DFS Used (in %) - Percentage of HDFS space used.
DFS Free Space (in GB) - Free space in the HDFS.
DFS Free (in %) - Percentage of free space in the HDFS.

BLOCKS
Block Capacity - Total block capacity of Hadoop.
Total Blocks - Total number of blocks in Hadoop.
Missing Blocks - Number of missing blocks in Hadoop.
Corrupt Blocks - Number of corrupt blocks in Hadoop.
Excess Blocks - Number of excess blocks in Hadoop.
UnderReplicated Blocks - Number of under-replicated blocks in Hadoop.
Pending Deletion Blocks - Number of blocks pending deletion in Hadoop.
Pending Replication Blocks - Number of blocks pending replication in Hadoop.

FILES
Total Files and Directories - Total number of files and directories in HDFS.
Files and Directories created per sec - Number of files and directories created per second.

LOAD
Total Load - Total load on the Hadoop service.
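
The GB and percentage figures in the DFS group above are related by simple arithmetic. A minimal sketch, assuming raw byte counts as input and 1 GB = 1024^3 bytes (the unit convention the product uses is not stated here). The function and key names are illustrative, not product identifiers.

```python
GB = 1024 ** 3


def dfs_usage(capacity_bytes, dfs_used_bytes, dfs_free_bytes):
    """Derive GB and percentage figures from raw HDFS byte counts."""
    if capacity_bytes <= 0:
        raise ValueError("capacity must be positive")
    # Whatever is neither DFS-used nor free is taken up by non-DFS data.
    non_dfs_used_bytes = capacity_bytes - dfs_used_bytes - dfs_free_bytes
    return {
        "total_gb": capacity_bytes / GB,
        "dfs_used_gb": dfs_used_bytes / GB,
        "non_dfs_used_gb": non_dfs_used_bytes / GB,
        "dfs_free_gb": dfs_free_bytes / GB,
        "dfs_used_pct": 100.0 * dfs_used_bytes / capacity_bytes,
        "dfs_free_pct": 100.0 * dfs_free_bytes / capacity_bytes,
    }
```

Note that the used and free percentages do not add up to 100 whenever non-DFS data occupies part of the capacity.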

HDFS:

DataNode Summary
Live Datanodes - Number of datanodes in live state.
Dead Datanodes - Number of datanodes in dead state.
Live-Decommissioned Datanodes - Number of datanodes that are live but decommissioned.
Dead-Decommissioned Datanodes - Number of datanodes that are dead and decommissioned.
Decommissioning Datanodes - Number of datanodes in decommissioning state.
Stale Datanodes - Number of datanodes in stale state.
Live Datanode Percent (in %) - Percentage of datanodes in live state.
Dead Datanode Percent (in %) - Percentage of datanodes in dead state.

DataNodes
Node Name - Name of the datanode.
State - Current state of the datanode:
  • Live
  • Decommission In Progress
  • Live - Decommissioned
  • Dead - Decommissioned
  • Dead
Total Capacity (in GB) - Total HDFS capacity of the datanode.
NonDFS Used (in GB) - Amount of space used in HDFS by non-HDFS commands.
DFS Used (in GB) - Amount of space used in HDFS by HDFS commands.
DFS Used Percent (in %) - Percentage of space used in HDFS by HDFS commands.
DFS Free (in GB) - Amount of free space in HDFS.
DFS Free Percent (in %) - Percentage of free space in HDFS.

YARN:

NodeManager Summary
Active NodeManagers - Number of nodemanagers in active state.
Decommissioned NodeManagers - Number of nodemanagers in decommissioned state.
Lost NodeManagers - Number of nodemanagers in lost state.
UnHealthy NodeManagers - Number of nodemanagers in unhealthy state.
Rebooted NodeManagers - Number of nodemanagers in rebooted state.
Active NodeManager Percent (in %) - Percentage of nodemanagers in active state.
Lost NodeManager Percent (in %) - Percentage of nodemanagers in lost state.
UnHealthy NodeManager Percent (in %) - Percentage of nodemanagers in unhealthy state.

NodeManager
HostName - Hostname of the nodemanager.
Rack - Rack to which the nodemanager belongs.
State - Current state of the nodemanager:
  • Running
  • Unhealthy
  • Dead
Memory used (in %) - Percentage of main memory used by the nodemanager.
Version - Version of the nodemanager.
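
As an illustration of where NodeManager counts like those above can come from, the sketch below parses a response shaped like the ResourceManager's `/ws/v1/cluster/metrics` REST resource and derives the percentage values. The sample payload is made up, the output key names are illustrative, and whether Applications Manager reads this exact resource is an assumption.

```python
import json

# Made-up sample shaped like a ResourceManager cluster metrics response.
SAMPLE = """
{"clusterMetrics": {"activeNodes": 8, "lostNodes": 1, "unhealthyNodes": 1,
                    "decommissionedNodes": 0, "rebootedNodes": 0}}
"""

STATES = ("activeNodes", "decommissionedNodes", "lostNodes",
          "unhealthyNodes", "rebootedNodes")


def nodemanager_summary(payload):
    """Return per-state NodeManager counts plus active/lost/unhealthy percentages."""
    metrics = json.loads(payload)["clusterMetrics"]
    counts = {state: int(metrics.get(state, 0)) for state in STATES}
    total = sum(counts.values())

    def pct(n):
        # Guard against an empty cluster to avoid dividing by zero.
        return 100.0 * n / total if total else 0.0

    summary = dict(counts)
    summary["active_pct"] = pct(counts["activeNodes"])
    summary["lost_pct"] = pct(counts["lostNodes"])
    summary["unhealthy_pct"] = pct(counts["unhealthyNodes"])
    return summary
```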

Applications:

Applications
Apps Submitted - Number of applications in submitted state.
Apps Completed - Number of applications in completed state.
Apps Pending - Number of applications in pending state.
Apps Running - Number of applications in running state.
Apps Failed - Number of applications in failed state.
Apps Killed - Number of applications in killed state.
Percent Completed (in %) - Percentage of completed applications.
Percent Killed (in %) - Percentage of killed applications.
Percent Failed (in %) - Percentage of failed applications.

Applications stat (in last polling interval)
Submitted apps count - Number of applications submitted in the last polling interval.
Failed apps count - Number of applications failed in the last polling interval.
Killed apps count - Number of applications killed in the last polling interval.
Completed apps count - Number of applications completed in the last polling interval.
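
The per-interval figures above can be understood as differences between successive polls of cumulative counters. A minimal sketch under that assumption (the product's actual method is not documented here). The function and counter names are illustrative.

```python
def interval_counts(previous, current):
    """Compute per-interval counts from two polls of cumulative counters."""
    deltas = {}
    for key, now in current.items():
        before = previous.get(key, 0)
        # A value lower than the previous poll suggests the counter was reset
        # (for example after a restart), so count from zero in that case.
        deltas[key] = now - before if now >= before else now
    return deltas
```

For example, polls of `{"submitted": 10, "failed": 2}` followed by `{"submitted": 14, "failed": 2}` give four submitted and zero failed applications for the interval.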

Hadoop 1.x

Overview:

SAFEMODE
Safemode status - Safemode status of the NameNode. Possible values:
  • Operational
  • Safemode

DFS
Total DFS Capacity (in GB) - Total capacity of the HDFS.
NonDFS Used Space (in GB) - Used space of the HDFS that is consumed by non-DFS data (not written through DFS commands).
DFS Used Space (in GB) - Used space of the HDFS that is consumed by DFS data (written through DFS commands).
DFS Used (in %) - Percentage of HDFS space used.
DFS Free Space (in GB) - Free space in the HDFS.
DFS Free (in %) - Percentage of free space in the HDFS.

BLOCKS
Block Capacity - Total block capacity of Hadoop.
Total Blocks - Total number of blocks in Hadoop.
Missing Blocks - Number of missing blocks in Hadoop.
Corrupt Blocks - Number of corrupt blocks in Hadoop.
Excess Blocks - Number of excess blocks in Hadoop.
UnderReplicated Blocks - Number of under-replicated blocks in Hadoop.
Pending Deletion Blocks - Number of blocks pending deletion in Hadoop.
Pending Replication Blocks - Number of blocks pending replication in Hadoop.

FILES
Total Files and Directories - Total number of files and directories in HDFS.
Files and Directories created per sec - Number of files and directories created per second.

LOAD
Total Load - Total load on the Hadoop service.

HDFS:

NameNode JVM
NonHeap Memory Committed - Non-heap memory currently committed for use.
NonHeap Memory Used - Non-heap memory currently in use.
Heap Memory Committed - Heap memory currently committed for use.
Heap Memory Used - Heap memory currently in use.

NameNode OS
Total Physical Memory (in GB) - Total RAM of the namenode.
Free Physical Memory (in GB) - Free RAM of the namenode.
Total Swap Space (in GB) - Total swap space available in the namenode OS.
Free Swap Space (in GB) - Free swap space available in the namenode OS.
Maximum File Descriptor Count - Total file descriptor capacity.
Open File Descriptor Count - Number of file descriptors in open state.
Average System Load - Average load on the namenode OS.

DataNodes
Node Name - Name of the datanode.
State - Current state of the datanode:
  • Live
  • Dead
  • Decommissioned
Used Space (in GB) - Used space in HDFS.

MapReduce:

Tracker Summary
Total TaskTracker - Total number of tasktrackers.
Alive TaskTracker - Number of tasktrackers in alive state.
Blacklisted TaskTracker - Number of tasktrackers in blacklisted state.
Graylisted TaskTracker - Number of tasktrackers in graylisted state.
Total Number of Jobs - Total number of jobs executed in MapReduce.

Slots Summary
Total Map Slots - Total map slot capacity in MapReduce.
Used Map Slots - Number of map slots currently in use.
Total Reduce Slots - Total reduce slot capacity in MapReduce.
Used Reduce Slots - Number of reduce slots currently in use.

TaskTrackers
TaskTracker Name - Name of the tasktracker.
State - Current state of the tasktracker:
  • Alive
  • Blacklisted
  • Graylisted
  • Dead
Health - Current health state of the tasktracker:
  • OK
  • <health error message>
Failure Count - Number of failures in the tasktracker.

Queue
Queue Name - Name of the queue.
State - Current state of the queue.
Info - Any error information thrown from the queue.

Job:

Jobs Summary
Jobs Submitted - Number of jobs in submitted state.
Jobs Preparing - Number of jobs in preparing state.
Jobs Running - Number of jobs in running state.
Jobs Failed - Number of jobs in failed state.
Jobs Killed - Number of jobs in killed state.
Jobs Completed - Number of jobs in completed state.
Completed Percent (in %) - Percentage of completed jobs.
Killed Percent (in %) - Percentage of killed jobs.
Failed Percent (in %) - Percentage of failed jobs.

Jobs Stats (in last polling interval)
Submitted jobs count - Number of jobs submitted in the last polling interval.
Failed jobs count - Number of jobs failed in the last polling interval.
Killed jobs count - Number of jobs killed in the last polling interval.
Completed jobs count - Number of jobs completed in the last polling interval.
