Apache Spark Monitoring

Apache Spark - An Overview

Apache Spark is an open-source big data processing framework built for speed, with built-in modules for streaming, SQL, machine learning, and graph processing. It has an advanced DAG execution engine that supports acyclic data flow and in-memory computing. Spark runs on Hadoop, Mesos, standalone, or in the cloud, and it can access diverse data sources including HDFS, Cassandra, HBase, and S3.

Monitoring Apache Spark - What we do

Let’s take a look at what you need to get real-time operational visibility into Spark applications, which performance metrics to gather, and how Applications Manager's Apache Spark Monitoring helps you ensure that your Spark cluster is up and operating as expected:

Resource Utilization Details - Applications Manager automatically discovers your Spark components, shows key metrics of Apache Spark clusters (master and worker nodes), monitors memory and CPU, and notifies you of changes in memory pool resource consumption.
Real-Time Data - Track garbage collection and memory usage across the cluster for each component, specifically the executors and the driver. Get details about running applications and CPU core usage.
Fix Performance Problems Faster - Get instant notifications when there are performance issues. Become aware of performance bottlenecks and take quick remedial actions before your end users experience issues.

Apache Spark - Adding a new monitor

Prerequisites for monitoring Apache Spark metrics: Click here

Using the REST API to add a new Apache Spark monitor: Click here

To create an Apache Spark monitor, follow the steps given below:

Click the New Monitor link and choose Apache Spark.
Specify the Display Name of the Apache Spark monitor.
Enter the HostName or IP Address of the host where the Apache Spark Master runs.
Enter the Port of the Apache Spark Master web UI. The default is 8080.
Enter the polling interval in minutes.
Click the Test Credentials button if you want to test access to the Spark server.
Optionally, choose from the combo box the Monitor Group with which you want to associate the Spark monitor. You can choose multiple groups.
Click Add Monitor(s). This discovers Spark from the network and starts monitoring.
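
If Test Credentials fails, a quick check from the Applications Manager host can show whether the Master's web UI port answers at all. The sketch below is illustrative only and is not how Applications Manager tests credentials: it assumes the standalone Master serves a JSON status page at /json on its web UI port (8080 by default) with a "status" field such as ALIVE. The helper names master_json_url, fetch_master_status and is_master_alive are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict


def master_json_url(host: str, port: int = 8080) -> str:
    """URL of the standalone Master's JSON status page on its web UI port."""
    return f"http://{host}:{port}/json"


def fetch_master_status(host: str, port: int = 8080, timeout: float = 5.0) -> Dict[str, Any]:
    """Download and decode the Master's status; raises URLError/HTTPError if unreachable."""
    with urllib.request.urlopen(master_json_url(host, port), timeout=timeout) as resp:
        return json.load(resp)


def is_master_alive(status: Dict[str, Any]) -> bool:
    """True when the Master reports itself as ALIVE (a standby Master reports STANDBY)."""
    return status.get("status") == "ALIVE"
```

For example, is_master_alive(fetch_master_status("spark-master.example.com")) returns True for a reachable, active Master; the host name here is a placeholder.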

Note:
To collect the metrics, uncomment the following lines in SPARK_HOME/conf/metrics.properties.template, save the file as metrics.properties, and restart the Apache Spark instances:

master.source.jvm.class=org.apache.spark.metrics.source.JvmSource
worker.source.jvm.class=org.apache.spark.metrics.source.JvmSource
driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource
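
The edit described in the note can be scripted. This is a minimal sketch, assuming each of the four entries appears in the template as a single line commented out with a leading #; enable_jvm_sources and write_metrics_properties are hypothetical names. Note that it overwrites an existing metrics.properties, and the Spark instances must still be restarted afterwards.

```python
import re
from pathlib import Path

# Matches a commented-out JvmSource entry such as
# "#master.source.jvm.class=org.apache.spark.metrics.source.JvmSource"
JVM_SOURCE_LINE = re.compile(
    r"^#\s*((?:master|worker|driver|executor)\.source\.jvm\.class=\S+)\s*$"
)


def enable_jvm_sources(template_text: str) -> str:
    """Uncomment the four JvmSource lines; every other line is copied unchanged."""
    lines = []
    for line in template_text.splitlines():
        match = JVM_SOURCE_LINE.match(line)
        lines.append(match.group(1) if match else line)
    return "\n".join(lines) + "\n"


def write_metrics_properties(spark_home: str) -> Path:
    """Create SPARK_HOME/conf/metrics.properties from the bundled template."""
    conf_dir = Path(spark_home) / "conf"
    template = (conf_dir / "metrics.properties.template").read_text()
    target = conf_dir / "metrics.properties"
    target.write_text(enable_jvm_sources(template))
    return target
```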

To also monitor the worker nodes under the specified Apache Spark Master, select the Discover All Nodes option.

Monitored Parameters

Click the Monitors tab to open the Monitors Category View, then click the Apache Spark Master or Apache Spark Worker monitors under the Web Server/Services table. The Apache Spark bulk configuration view is displayed, divided into three tabs:

Availability tab displays the Availability history for the past 24 hours or 30 days.
Performance tab displays the Health Status and events for the past 24 hours or 30 days.
List view tab enables you to perform bulk admin configurations.

Click on the monitor name to see all the server details listed under the following tabs:

Apache Spark Master

Overview

Parameter	Description
NODE DETAILS
Node Name	The name of the Apache Spark worker node.
Used Memory (%)	The percentage of total memory that the Spark worker node uses on the machine.
Free Memory (%)	The percentage of total free memory on the machine.
MEMORY UTILIZATION
Used Memory (%)	The percentage of total memory that the Spark Master node uses on the machine.
Free Memory (%)	The percentage of total free memory of the Spark Master node.
Total Memory	The total amount of memory that Spark applications are allowed to use on the machine.
Used Memory	The total amount of memory used by Spark applications.
MASTER OVERVIEW
Alive Workers	The number of alive workers in the Spark cluster. A worker in the ALIVE state can accept applications.
Active Applications	The number of active applications that run on the Spark infrastructure.
Waiting Applications	The number of waiting applications.
Completed Applications	The number of completed applications.
Used Cores	The number of CPU cores in use across the cluster, as reported by the Apache Spark Master.
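
To see how the Master Overview values relate to one another, the sketch below derives them from the standalone Master's /json payload. It is an assumption that Applications Manager reads this endpoint; the field names (workers, activeapps, completedapps, coresused, memory, memoryused, with memory in MB) are those of the Spark standalone Master's JSON page, and counting WAITING entries within activeapps is also an assumption. master_overview is a hypothetical helper.

```python
from typing import Any, Dict


def master_overview(status: Dict[str, Any]) -> Dict[str, Any]:
    """Summarise a standalone Master's /json payload; memory fields are in MB."""
    workers = status.get("workers", [])
    active_apps = status.get("activeapps", [])
    total_mb = status.get("memory", 0)
    used_mb = status.get("memoryused", 0)
    used_pct = 100.0 * used_mb / total_mb if total_mb else 0.0
    return {
        "alive_workers": sum(1 for w in workers if w.get("state") == "ALIVE"),
        "active_applications": len(active_apps),
        # Assumption: applications still waiting for resources appear in
        # activeapps with state WAITING.
        "waiting_applications": sum(1 for a in active_apps if a.get("state") == "WAITING"),
        "completed_applications": len(status.get("completedapps", [])),
        "used_cores": status.get("coresused", 0),
        "used_memory_pct": round(used_pct, 1),
        "free_memory_pct": round(100.0 - used_pct, 1),
    }
```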

Workers

In standalone mode, the workers are processes running on individual nodes that manage resource allocation requests for that node and also monitor the executors.

The number of CPU cores used by the particular Worker node.

Parameter	Description
WORKER DETAILS
Web UI Address	The URL of the worker's Web UI, which you can open in a web browser to inspect the worker's status and the executors running on it.
ID	The unique ID of the worker node.
Cores Used	The number of CPU cores used.
Cores Free	The number of unused CPU cores.
Used Memory (GB)	The total memory used by the Worker node.
Free Memory (GB)	The total free memory in the Worker node.
Used Memory (%)	Percentage of memory used by the Worker node.
Time Since Last Heart Beat (seconds)	The time elapsed since the last heartbeat, that is, since the Worker node last contacted the Master node.
State	The current state of the Worker node, for example, ALIVE or DEAD.
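
The Worker Details columns can be reproduced from the per-worker entries in the Master's /json payload. This sketch assumes the Spark standalone field names webuiaddress, id, cores, coresused, memory and memoryused (memory in MB), and lastheartbeat (epoch milliseconds); whether Applications Manager computes the columns this way is an assumption, and worker_rows is a hypothetical helper. The now_ms parameter exists so the calculation can be checked with fixed timestamps.

```python
import time
from typing import Any, Dict, List, Optional


def worker_rows(status: Dict[str, Any], now_ms: Optional[int] = None) -> List[Dict[str, Any]]:
    """Derive Worker table values from the Master's /json 'workers' list (memory in MB)."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    rows = []
    for w in status.get("workers", []):
        total_mb = w.get("memory", 0)
        used_mb = w.get("memoryused", 0)
        rows.append({
            "web_ui_address": w.get("webuiaddress"),
            "id": w.get("id"),
            "cores_used": w.get("coresused", 0),
            "cores_free": w.get("cores", 0) - w.get("coresused", 0),
            "used_memory_gb": round(used_mb / 1024, 2),
            "free_memory_gb": round((total_mb - used_mb) / 1024, 2),
            "used_memory_pct": round(100.0 * used_mb / total_mb, 1) if total_mb else 0.0,
            # lastheartbeat is a timestamp in epoch milliseconds
            "seconds_since_heartbeat": max(0, (now_ms - w.get("lastheartbeat", now_ms)) // 1000),
            "state": w.get("state"),
        })
    return rows
```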

Applications

Parameter	Description
APPLICATION DETAILS
Application Name	The name of your application.
ID	The unique ID of the application.
User	The user associated with the particular application.
Memory Allocated Per Slave (GB)	The amount of memory allocated for each worker.
Running Duration (min)	The time the application has been running since it started.
State	The current state of the particular application, for example, WAITING or RUNNING.
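
The unit conversions behind the Applications table can be sketched from the Master's /json activeapps entries. This assumes the Spark standalone fields name, id, user, memoryperslave (MB), duration (milliseconds) and state; mapping them to these columns, and truncating to whole minutes, are assumptions. application_rows is a hypothetical helper.

```python
from typing import Any, Dict, List


def application_rows(status: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Map the Master's /json 'activeapps' entries to the Applications table columns."""
    rows = []
    for app in status.get("activeapps", []):
        rows.append({
            "application_name": app.get("name"),
            "id": app.get("id"),
            "user": app.get("user"),
            # memoryperslave is reported in MB
            "memory_per_slave_gb": round(app.get("memoryperslave", 0) / 1024, 2),
            # duration is reported in milliseconds; whole minutes are kept
            "running_duration_min": app.get("duration", 0) // 60000,
            "state": app.get("state"),
        })
    return rows
```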

Memory

The maximum heap memory that Spark can use.

Parameter	Description
HEAP MEMORY
Used Heap	The amount of heap memory used, in percentage.
Free Heap	The amount of heap memory that is free, in percentage.
Max Heap Size	The maximum heap memory that Spark can use, in MB.
Init Heap Size	The initial heap memory that the JVM requests from the operating system, in MB.
Committed Heap Size	The total amount of committed heap memory, in MB.
Used Heap Size	The total used heap memory, in MB.
NON HEAP MEMORY
Used Non Heap	The amount of non-heap memory used, in percentage.
Free Non Heap	The amount of non-heap memory that is free, in percentage.
Max Non Heap Size	The maximum non-heap memory that Spark can use, in MB.
Initial Non Heap Size	The initial non-heap memory that the JVM requests from the operating system, in MB.
Committed Non Heap Size	The total amount of committed non-heap memory, in MB.
Used Non Heap Size	The total used non-heap memory, in MB.
JVM
Used JVM	The amount of JVM memory used, in percentage.
Free JVM	The amount of JVM memory that is free, in percentage.
Max JVM Size	The maximum amount of heap that can be used for memory management, in GB.
Initial JVM Size	The amount of heap that the Java virtual machine initially requests from the operating system, in MB.
Committed JVM Size	The total amount of committed JVM memory, in MB.
Used JVM Size	The total amount of used JVM memory, in MB.
MARKSWEEP AND SCAVENGE
MarkSweep Count	The number of garbage collections performed by the MarkSweep collector.
MarkSweep Time	The time spent on garbage collection by the MarkSweep collector.
Scavenge Count	The number of garbage collections performed by the Scavenge collector.
Scavenge Time	The time spent on garbage collection by the Scavenge collector.
MEMORY POOL DETAILS
Memory Pool	The name of the memory pool.
Maximum (MB)	The maximum pool memory allocated, in MB.
Committed (MB)	The total amount of committed pool memory, in MB.
Initial (MB)	The amount of pool memory initially requested from the operating system, in MB.
Used (MB)	The total amount of used pool memory.
Utilization (%)	The percentage of used pool memory.
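
The heap, non-heap and memory pool tables all report init, used, committed and max values, the same four values a JVM exposes through java.lang.management.MemoryUsage. The sketch below shows one way the percentage columns can be derived from them. It assumes percentages are taken against max, and falls back to committed when the JVM reports max as -1 (undefined, which is common for non-heap memory); Applications Manager's exact formula is not documented here. memory_summary is a hypothetical helper that works on byte counts.

```python
from typing import Dict

MB = 1024 * 1024


def memory_summary(init: int, used: int, committed: int, max_bytes: int) -> Dict[str, float]:
    """Turn MemoryUsage-style byte counts into MB values plus used/free percentages.

    The JVM reports max as -1 when no limit is defined; committed memory is then
    used as the denominator (an assumption, not a documented formula).
    """
    limit = max_bytes if max_bytes > 0 else committed
    used_pct = 100.0 * used / limit if limit > 0 else 0.0
    return {
        "init_mb": round(init / MB, 1),
        "used_mb": round(used / MB, 1),
        "committed_mb": round(committed / MB, 1),
        "max_mb": round(max_bytes / MB, 1) if max_bytes > 0 else -1.0,
        "used_pct": round(used_pct, 1),
        "free_pct": round(100.0 - used_pct, 1),
    }
```

The same function applies to a single memory pool, where used_pct corresponds to the Utilization (%) column under the stated assumption.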

RDD Details

Parameter	Description
COMPILATION DETAILS
Compilation Time (Mean)	The mean time taken to compile source code text.
Compilation Count	The total number of compilations that occurred while loading the files.
COMPILATION DETAILS
Generated Class Size (Mean)	The mean size of the generated classes.
Generated Method Size (Mean)	The mean size of the methods in the generated classes.
Source Code Size (Mean)	The mean size of the compiled source code text.
Generated Class Count	The number of classes generated.
Generated Method Count	The number of methods in the generated classes.
Source Code Count	The total number of source code files that were loaded into the node for compilation.
COUNTERS
File Cache Hits	The total number of file-level cache hits.
Files Discovered	The total number of files discovered.
Hive Client Calls	The total number of client calls sent to Hive for query processing.
Parallel Listing Job Count	The total number of jobs launched to list files in parallel.
Partitions Fetched	The total number of partitions fetched.

Configuration

Parameter	Description
CONFIGURATION DETAILS
Master URL	The URL of the master node.
Total Workers	The total number of workers provisioned in the cluster.
Available Cores	The number of CPU cores to allow Spark applications to use on the machine.
Total Memory	Total memory allocated for the Spark Master node.

Apache Spark Worker

Overview

Parameter	Description
MEMORY UTILIZATION
Used Memory Percentage	The percentage of total memory that the Spark worker node uses on the machine.
Free Memory Percentage	The percentage of total free memory on the machine.
Used Memory	The amount of the available memory used by the Worker node.
Free Memory	The total free memory available for the Worker node.
WORKER OVERVIEW
Active Executors	The number of active executors.
Finished Executors	The number of finished executors. An executor finishes either when it fails or when its associated application exits.
Free Cores	The total number of cores free and available for the particular Worker.
Used Cores	The total number of cores used by the particular Worker.
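
Each worker also publishes its own JSON status page on its web UI port (8081 by default), and the Worker Overview values can be derived from it. This sketch assumes the Spark standalone Worker fields cores, coresused, memory, memoryused (memory in MB), executors and finishedexecutors; that Applications Manager uses this endpoint is an assumption, and worker_overview is a hypothetical helper.

```python
from typing import Any, Dict


def worker_overview(state: Dict[str, Any]) -> Dict[str, Any]:
    """Summarise a Worker's /json payload; memory fields are in MB."""
    total_mb = state.get("memory", 0)
    used_mb = state.get("memoryused", 0)
    used_pct = 100.0 * used_mb / total_mb if total_mb else 0.0
    return {
        "active_executors": len(state.get("executors", [])),
        "finished_executors": len(state.get("finishedexecutors", [])),
        "used_cores": state.get("coresused", 0),
        "free_cores": state.get("cores", 0) - state.get("coresused", 0),
        "used_memory_pct": round(used_pct, 1),
        "free_memory_pct": round(100.0 - used_pct, 1),
    }
```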

Executors

Parameter	Description
EXECUTOR DETAILS
Executor ID	The unique ID for the particular Executor.
Executor Memory (GB)	The total memory available for the particular Executor.
Application ID	The unique ID for the application associated with the Executor.
Application Name	The name of the particular Application.
User	The user associated with the particular Application.
Memory Allocated Per Slave (GB)	The amount of memory allocated for each worker.
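
The Executor Details columns map naturally onto the executors list of a Worker's /json page. This sketch assumes each entry carries id, memory (MB), appid and an appdesc object with name, user and memoryperslave (MB), as in Spark's standalone Worker JSON; the mapping to these columns is an assumption, and executor_rows is a hypothetical helper.

```python
from typing import Any, Dict, List


def executor_rows(state: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Map a Worker's /json 'executors' entries to the Executor Details columns."""
    rows = []
    for ex in state.get("executors", []):
        desc = ex.get("appdesc", {})
        rows.append({
            "executor_id": ex.get("id"),
            # memory values are reported in MB and shown here in GB
            "executor_memory_gb": round(ex.get("memory", 0) / 1024, 2),
            "application_id": ex.get("appid"),
            "application_name": desc.get("name"),
            "user": desc.get("user"),
            "memory_per_slave_gb": round(desc.get("memoryperslave", 0) / 1024, 2),
        })
    return rows
```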

Memory

Parameter	Description
HEAP MEMORY
Used Heap	The percentage of total used heap memory.
Free Heap	The percentage of free heap memory.
Max Heap Size	The maximum heap memory that Spark can use, in MB.
Init Heap Size	The initial heap memory that the JVM requests from the operating system, in MB.
Committed Heap Size	The total amount of committed heap memory, in MB.
Used Heap Size	The total used heap memory, in MB.
NON-HEAP MEMORY
Used Non Heap Memory	The percentage of total used non-heap memory.
Free Non Heap Memory	The percentage of free non-heap memory.
Max Non Heap Size	The maximum non-heap memory that Spark can use, in MB.
Initial Non Heap Size	The initial non-heap memory that the JVM requests from the operating system, in MB.
Committed Non Heap Size	The total amount of committed non-heap memory, in MB.
Used Non Heap Size	The total used non-heap memory, in MB.
JVM
Used JVM Memory	The amount of used JVM memory, in percentage.
Free JVM Memory	The amount of memory available for the JVM, in percentage.
Max JVM Size	The maximum amount of heap that can be used for memory management, in GB.
Initial JVM Size	The amount of heap that the Java virtual machine initially requests from the operating system, in MB.
Committed JVM Size	The total amount of committed JVM memory, in MB.
Used JVM Size	The total amount of used JVM memory, in MB.
MARKSWEEP AND SCAVENGE
MarkSweep Count	The number of garbage collections performed by the MarkSweep collector.
MarkSweep Time	The time spent on garbage collection by the MarkSweep collector.
Scavenge Count	The number of garbage collections performed by the Scavenge collector.
Scavenge Time	The time spent on garbage collection by the Scavenge collector.
MEMORY POOL DETAILS
Maximum (MB)	The maximum pool memory allocated, in MB.
Initial (MB)	The amount of pool memory initially requested from the operating system, in MB.
Committed (MB)	The total amount of committed pool memory, in MB.
Used (MB)	The total amount of used pool memory, in MB.
Utilization (%)	The percentage of used pool memory.
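
The JVM reports garbage collection count and time cumulatively since the JVM started (time in milliseconds), so the trend between two polls is often more telling than the raw totals. The sketch below computes per-interval deltas for one collector, such as MarkSweep or Scavenge. GcSample and gc_interval are hypothetical names, and treating a decrease in the counters as a JVM restart is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class GcSample:
    count: int    # cumulative number of collections since the JVM started
    time_ms: int  # cumulative time spent collecting, in milliseconds


def gc_interval(prev: GcSample, curr: GcSample, interval_s: float) -> Dict[str, Any]:
    """Collections and share of wall-clock time spent in GC between two polls."""
    collections = curr.count - prev.count
    gc_ms = curr.time_ms - prev.time_ms
    if collections < 0 or gc_ms < 0:
        # The counters went backwards, so the JVM restarted between polls.
        return {"collections": 0, "gc_ms": 0, "gc_time_pct": 0.0, "restarted": True}
    return {
        "collections": collections,
        "gc_ms": gc_ms,
        "gc_time_pct": round(100.0 * gc_ms / (interval_s * 1000), 2),
        "restarted": False,
    }
```

With a 5-minute polling interval (300 seconds), three collections that took 3,000 ms in total amount to 1% of the interval spent in garbage collection.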

RDD Details

Parameter	Description
COMPILATION DETAILS
Compilation Time (Mean)	The mean time taken to compile source code text.
Compilation Count	The total number of compilations that occurred while loading the files.
COMPILATION DETAILS
Generated Class Size (Mean)	The mean size of the generated classes.
Generated Method Size (Mean)	The mean size of the methods in the generated classes.
Source Code Size (Mean)	The mean size of the compiled source code text.
Generated Class Count	The number of classes generated.
Generated Method Count	The number of methods in the generated classes.
Source Code Count	The total number of source code files that were loaded into the node for compilation.
COUNTERS
File Cache Hits	The total number of file-level cache hits.
Files Discovered	The total number of files discovered.
Hive Client Calls	The total number of client calls sent to Hive for query processing.
Parallel Listing Job Count	The total number of jobs launched to list files in parallel.
Partitions Fetched	The total number of partitions fetched.

Configuration

Parameter	Description
CONFIGURATION DETAILS
Worker ID	The unique ID of the worker.
Master URL	The URL of the master node.
Master Web UI URL	The URL of the master node's Web UI.
Total Memory	The total memory allocated and available for the particular Worker node.
