Apache Spark Monitoring

Apache Spark - An Overview

Apache Spark is an open source big data processing framework built for speed, with built-in modules for streaming, SQL, machine learning and graph processing. Apache Spark has an advanced DAG execution engine that supports acyclic data flow and in-memory computing. Spark runs on Hadoop, Mesos, standalone, or in the cloud. It can access diverse data sources including HDFS, Cassandra, HBase, and S3. 

Monitoring Apache Spark - What we do

Let's take a look at what you need to get real-time operational visibility into Spark applications, which performance metrics to gather, and how you can ensure that your Spark cluster is up and operating as expected with Applications Manager:

  • Resource Utilization Details - Applications Manager automatically discovers your Spark components and shows key metrics of Apache Spark clusters (master and worker nodes). It monitors memory and CPU usage and notifies you of changes in memory pool consumption.
  • Real-Time Data - Track garbage collection and memory usage across the cluster on each component, specifically the executors and the driver. Get useful information about applications and cores.
  • Fix Performance Problems Faster - Get instant notifications when there are performance issues. Become aware of performance bottlenecks and take quick remedial actions before your end users experience issues.

 

Apache Spark - Adding a new monitor

Note:
To collect the metrics, uncomment the following lines in the file SPARK_HOME/conf/metrics.properties.template, save the file as metrics.properties, and restart the Apache Spark instances:
master.source.jvm.class=org.apache.spark.metrics.source.JvmSource
worker.source.jvm.class=org.apache.spark.metrics.source.JvmSource
driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource

You can monitor the Worker Nodes under the given Apache Spark Master by checking the option Discover All Nodes.
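
The note above can also be scripted. The following is a minimal sketch in Python, assuming the template contains the four JvmSource lines in commented form (prefixed with "#"). The function names enable_jvm_sources and write_metrics_properties are hypothetical helpers introduced for illustration and are not part of Spark or Applications Manager.

```python
from pathlib import Path

# The four instances named in the note above.
JVM_SOURCE_INSTANCES = ("master", "worker", "driver", "executor")
JVM_SOURCE_CLASS = "org.apache.spark.metrics.source.JvmSource"


def enable_jvm_sources(template_text: str) -> str:
    """Return the template text with the four JvmSource lines uncommented.

    All other lines (including other comments) are left untouched.
    """
    wanted = {
        f"{instance}.source.jvm.class={JVM_SOURCE_CLASS}"
        for instance in JVM_SOURCE_INSTANCES
    }
    result = []
    for line in template_text.splitlines():
        # Strip leading '#', spaces, and tabs to see what the line would enable.
        candidate = line.lstrip("# \t").rstrip()
        if line.lstrip().startswith("#") and candidate in wanted:
            result.append(candidate)
        else:
            result.append(line)
    return "\n".join(result) + "\n"


def write_metrics_properties(spark_home: str) -> Path:
    """Read conf/metrics.properties.template and write conf/metrics.properties."""
    conf_dir = Path(spark_home) / "conf"
    template = (conf_dir / "metrics.properties.template").read_text()
    target = conf_dir / "metrics.properties"
    target.write_text(enable_jvm_sources(template))
    return target
```

Calling write_metrics_properties with the path of your Spark installation writes the new file. The Spark instances still have to be restarted afterwards, as described in the note.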

 

Steps to create a new monitor

To create an Apache Spark Monitor, follow the steps given below: 

  • Click the New Monitor link and choose Apache Spark.
  • Specify the Display Name of the Apache Spark monitor.
  • Enter the HostName or IP Address of the host where the Apache Spark Master runs.
  • Enter the Port of the Apache Spark Master. The default is 8080.
  • Enter the polling interval in minutes.
  • Click the Test Credentials button if you want to test access to the Spark server.
  • Choose the Monitor Group from the combo box with which you want to associate Spark Monitor (optional). You can choose multiple groups to associate your monitor.
  • Click Add Monitor(s). This discovers Spark from the network and starts monitoring.

 

Use the AddMonitor API to add an Apache Spark Monitor

Syntax of the REST API for adding a monitor:
http://[Host]:[Port]/AppManager/xml/AddMonitor?apikey=[APIKEY]&type=ApacheSparkMaster&displayname=[DISPLAYNAME]&host=[HOST]&port=[PORT]&SSL=[TRUE/FALSE]&DiscoverAllNodes=[YES/NO]

 

Request Parameters:

The parameters involved in the API request are described below. Also, refer to the list of common Request Parameters.

Field Description
apikey The key generated from the Generate API Key option in the 'Admin' tab.
type The type of the monitor you want to add. Value should be ApacheSparkMaster.
displayname The display name of the Apache Spark monitor.
host The name of the host on which the Apache Spark server is running.
port The port number on which the Apache Spark server is running.
SSL Specifies whether SSL is enabled. Value can be either true or false.
DiscoverAllNodes Specifies whether you wish to discover all worker nodes. Value can be either YES or NO.
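
As an illustration of the syntax and parameters above, here is a minimal Python sketch that assembles the AddMonitor request URL. The function name build_add_monitor_url and the host names and API key in the usage note are placeholders introduced for this example. Whether the SSL and DiscoverAllNodes values are case-sensitive is not stated above, so the casing used here is an assumption.

```python
from urllib.parse import urlencode


def build_add_monitor_url(am_host, am_port, api_key, display_name,
                          spark_host, spark_port=8080,
                          ssl=False, discover_all_nodes=True):
    """Build the AddMonitor URL for an Apache Spark Master monitor.

    am_host/am_port identify the Applications Manager server;
    spark_host/spark_port identify the Spark Master to be monitored.
    """
    params = {
        "apikey": api_key,
        "type": "ApacheSparkMaster",          # fixed value for this monitor type
        "displayname": display_name,
        "host": spark_host,
        "port": spark_port,
        "SSL": "true" if ssl else "false",    # assumed casing
        "DiscoverAllNodes": "YES" if discover_all_nodes else "NO",
    }
    # urlencode escapes spaces and special characters in the values.
    query = urlencode(params)
    return f"http://{am_host}:{am_port}/AppManager/xml/AddMonitor?{query}"
```

For example, build_add_monitor_url("am.example.com", 9090, "KEY123", "Spark Master 1", "spark-master.example.com") returns a URL in which the display name is encoded as Spark+Master+1. The URL can then be requested with any HTTP client.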

 

Monitored Parameters

Go to the Monitors Category View by clicking the Monitors tab. Click the Apache Spark Master or Apache Spark Worker monitors under the Web Server/Services table. This displays the Apache Spark bulk configuration view, which is divided into three tabs:

  • Availability tab displays the Availability history for the past 24 hours or 30 days.
  • Performance tab displays the Health Status and events for the past 24 hours or 30 days.
  • List view enables you to perform bulk admin configurations.

 

Click on the monitor name to see all the server details listed under the following tabs:

 

Apache Spark Master

Overview

Parameter Description
NODE DETAILS
Node Name The name of the Apache Spark worker node.
Used Memory (%) The percentage of total memory that the Spark worker node uses on the machine.
Free Memory (%) The percentage of total free memory on the machine.
MEMORY UTILIZATION
Used Memory (%) The percentage of total memory that the Spark Master node uses on the machine.
Free Memory (%) The percentage of total free memory of the Spark Master node.
Total Memory The total amount of memory that Spark applications are allowed to use on the machine.
Used Memory The total amount of memory used by Spark applications.
MASTER OVERVIEW
Alive Workers The number of alive workers in the Spark cluster. A worker in the ALIVE state can accept applications.
Active Applications The number of active applications that run on the Spark infrastructure.
Waiting Applications The number of waiting applications.
Completed Applications The number of completed applications.
Used Cores The number of used CPU cores on the Apache Spark Master.
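
To make the percentage and count metrics above concrete, the following Python sketch derives them from a dictionary describing the master state. The field names memory, memoryused, workers and state are assumptions modelled on the JSON view that a standalone Spark Master serves alongside its web UI. This document does not confirm that Applications Manager reads these fields, so treat the sketch as an illustration of the arithmetic only.

```python
def memory_utilization(master_state: dict) -> dict:
    """Return used and free memory as percentages of total memory."""
    total = master_state["memory"]        # assumed field: total memory
    used = master_state["memoryused"]     # assumed field: memory in use
    if total <= 0:
        # Avoid division by zero when no worker has registered yet.
        return {"used_pct": 0.0, "free_pct": 0.0}
    used_pct = round(100.0 * used / total, 2)
    return {"used_pct": used_pct, "free_pct": round(100.0 - used_pct, 2)}


def alive_workers(master_state: dict) -> int:
    """Count the workers whose state is ALIVE."""
    return sum(
        1 for worker in master_state.get("workers", [])
        if worker.get("state") == "ALIVE"
    )
```

With a total of 16384 and 4096 in use, memory_utilization reports 25.0 percent used and 75.0 percent free.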

 

Workers

In standalone mode, the workers are processes running on individual nodes that manage resource allocation requests for that node and also monitor the executors.

Parameter Description
WORKER DETAILS
Web UI Address The URL of the worker's Web UI. The Web UI is the web interface of a running Spark application to monitor and inspect Spark job executions in a web browser.
ID The ID that uniquely identifies the particular worker node.
Cores Used The number of CPU cores used by the particular Worker node.
Cores Free The number of free CPU cores, which are unused.
Used Memory (GB) The total memory used by the Worker node.
Free Memory (GB) The total free memory in the Worker node.
Used Memory (%) The percentage of memory used by the Worker node.
Time Since Last Heart Beat (seconds) The time elapsed since the last heartbeat, that is, since the Worker node last contacted the Master node.
State The current state of the Worker node, for example, ALIVE or DEAD.
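
The Time Since Last Heart Beat metric above is a simple difference between the current time and the last contact time. The sketch below shows that calculation in Python, assuming each worker record carries an id and a lastheartbeat timestamp in epoch milliseconds. Both field names are assumptions, and the 60-second threshold is only an illustrative value.

```python
import time


def seconds_since_heartbeat(last_heartbeat_ms, now_ms=None):
    """Return the seconds elapsed since a worker last contacted the master."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    # Clamp at zero in case of small clock differences between hosts.
    return max(0.0, (now_ms - last_heartbeat_ms) / 1000.0)


def stale_workers(workers, threshold_s=60, now_ms=None):
    """Return the IDs of workers whose last heartbeat is older than threshold_s."""
    return [
        worker["id"]
        for worker in workers
        if seconds_since_heartbeat(worker["lastheartbeat"], now_ms) > threshold_s
    ]
```

Passing now_ms explicitly keeps the functions deterministic, which is convenient when checking stored snapshots rather than live data.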

 

Applications

Parameter Description
APPLICATION DETAILS
Application Name The name of your application.
ID The unique ID by which the application is referenced.
User The user associated with the particular application.
Memory Allocated Per Slave (GB) The amount of memory allocated for each worker.
Running Duration (min) The total running duration of the application since it started.
State The current state of the particular application, for example, WAITING or RUNNING.

 

Memory

Parameter Description
HEAP MEMORY
Used Heap The percentage of total used heap memory.
Free Heap The percentage of free heap memory.
Max Heap Size The maximum heap memory that Spark can use.
Init Heap Size The minimum heap memory allocated.
Committed Heap Size The total amount of committed heap memory.
Used Heap Size The total used heap memory.
NON-HEAP MEMORY
Used Non Heap The percentage of total used non-heap memory.
Free Non Heap The percentage of free non-heap memory.
Max Non Heap Size The maximum non-heap memory that Spark can use.
Initial Non Heap Size The minimum non-heap memory allocated.
Committed Non Heap Size The total amount of committed non-heap memory.
Used Non Heap Size The total used non-heap memory.
JVM
Used JVM The amount of used JVM memory, in MB.
Free JVM The amount of memory available for the JVM, in MB.
Max JVM Size The maximum amount of heap that can be used for memory management, in GB.
Initial JVM Size The amount of heap that the Java virtual machine initially requests from the operating system, in MB.
Committed JVM Size The total amount of committed JVM memory.
Used JVM Size The total amount of used JVM memory.
MARKSWEEP AND SCAVENGE
MarkSweep Count The number of garbage collections that have occurred in the MarkSweep GC.
MarkSweep Time The time taken by the garbage collections that have occurred in the MarkSweep GC.
Scavenge Count The number of garbage collections that have occurred in the Scavenge GC.
Scavenge Time The time taken by the garbage collections that have occurred in the Scavenge GC.
MEMORY POOL DETAILS
Memory Pool The name of the memory pool.
Maximum (MB) The maximum pool memory allocated, in MB.
Committed (MB) The total amount of committed pool memory.
Initial (MB) The amount of pool memory initially requested from the operating system, in MB.
Used (MB) The total amount of used pool memory.
Utilization (%) The percentage of used pool memory.
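
The percentage metrics in the table above (Used Heap, Used Non Heap, and pool Utilization) follow from the used and maximum sizes. Below is a small Python sketch of that arithmetic. The function name utilization_pct is hypothetical. The handling of a missing maximum reflects the fact that a JVM can report -1 when no maximum is defined for a memory area.

```python
def utilization_pct(used, maximum):
    """Return used/maximum as a percentage, or None if no maximum is defined.

    A JVM reports a maximum of -1 for memory areas without a defined limit,
    in which case a utilization percentage cannot be calculated.
    """
    if maximum is None or maximum <= 0:
        return None
    return round(100.0 * used / maximum, 2)


def free_pct(used, maximum):
    """Return the free share as a percentage, or None if no maximum is defined."""
    used_share = utilization_pct(used, maximum)
    return None if used_share is None else round(100.0 - used_share, 2)
```

The units of used and maximum only need to match each other (for example, both in MB), because the result is a ratio.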

 

RDD Details

Parameter Description
COMPILATION DETAILS
Compilation Time (Mean) The time it took to compile source code text.
Compilation Count The total number of compilations that occurred while loading the files.
COMPILATION DETAILS
Generated Class Size (Mean) The size of the class generated.
Generated Method Size (Mean) The size of each method in classes generated.
Source Code Size (Mean) The size of the compiled source code text.
Generated Class Count The number of classes generated.
Generated Method Count The number of methods in classes generated.
Source Code Count The total number of source code files that were loaded into the node for compilation.
COUNTERS
File Cache Hits The total number of file-level cache hits that occurred.
Files Discovered The total number of files discovered.
Hive Client Calls The total number of client calls sent to Hive for query processing.
Parallel Listing Job Count The total number of Spark jobs launched for parallel file listing.
Partitions Fetched The total number of partitions fetched.

 

Configuration

Parameter Description
CONFIGURATION DETAILS
Master URL The URL of the master node.
Total Workers The total number of workers provisioned in the cluster.
Available Cores The number of CPU cores that Spark applications are allowed to use on the machine.
Total Memory The total memory allocated for the Spark Master node.

 

Apache Spark Worker

Overview

Parameter Description
MEMORY UTILIZATION
Used Memory Percentage The percentage of total memory that the Spark worker node uses on the machine.
Free Memory Percentage The percentage of total free memory on the machine.
Used Memory The total memory used by the Worker node, from the available memory.
Free Memory The total free memory available for the Worker node.
WORKER OVERVIEW
Active Executors The number of active executors.
Finished Executors The number of finished executors. A Spark executor exits either on failure or when the associated application has exited.
Free Cores The total number of cores free and available for the particular Worker.
Used Cores The total number of cores used by the particular Worker.

 

Executors

Parameter Description
EXECUTOR DETAILS
Executor ID The unique ID for the particular Executor.
Executor Memory (GB) The total memory available for the particular Executor.
Application ID The unique ID for the application associated with the Executor.
Application Name The name of the particular Application.
User The user associated with the particular Application.
Memory Allocated Per Slave (GB) The amount of memory allocated for each worker.

 

Memory

Parameter Description
HEAP MEMORY
Used Heap The percentage of total used heap memory.
Free Heap The percentage of free heap memory.
Max Heap Size The maximum heap memory that Spark can use.
Init Heap Size The minimum heap memory allocated.
Committed Heap Size The total amount of committed heap memory.
Used Heap Size The total used heap memory.
NON-HEAP MEMORY
Used Non Heap The percentage of total used non-heap memory.
Free Non Heap The percentage of free non-heap memory.
Max Non Heap Size The maximum non-heap memory that Spark can use.
Initial Non Heap Size The minimum non-heap memory allocated.
Committed Non Heap Size The total amount of committed non-heap memory.
Used Non Heap Size The total used non-heap memory.
JVM
Used JVM The amount of used JVM memory, in MB.
Free JVM The amount of memory available for the JVM, in MB.
Max JVM Size The maximum amount of heap that can be used for memory management, in GB.
Initial JVM Size The amount of heap that the Java virtual machine initially requests from the operating system, in MB.
Committed JVM Size The total amount of committed JVM memory.
Used JVM Size The total amount of used JVM memory.
MARKSWEEP AND SCAVENGE
MarkSweep Count The number of garbage collections that have occurred in the MarkSweep GC.
MarkSweep Time The time taken by the garbage collections that have occurred in the MarkSweep GC.
Scavenge Count The number of garbage collections that have occurred in the Scavenge GC.
Scavenge Time The time taken by the garbage collections that have occurred in the Scavenge GC.
MEMORY POOL DETAILS
Maximum (MB) The maximum pool memory allocated, in MB.
Initial (MB) The amount of pool memory initially requested from the operating system, in MB.
Committed (MB) The total amount of committed pool memory.
Used (MB) The total amount of used pool memory.
Utilization (%) The percentage of used pool memory.
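
When reading the MarkSweep and Scavenge counts and times above, it often helps to look at the change between two polls rather than the raw values. The Python sketch below assumes the reported counts and times are cumulative since the JVM started, which is how the JVM's garbage collector beans report them, and that time is in milliseconds. The function name gc_delta and the dictionary keys count and time_ms are hypothetical.

```python
def gc_delta(previous, current):
    """Return the GC activity between two polls of cumulative counters.

    previous and current are dicts with 'count' and 'time_ms' keys,
    holding the cumulative collection count and collection time.
    """
    collections = current["count"] - previous["count"]
    time_ms = current["time_ms"] - previous["time_ms"]
    if collections < 0 or time_ms < 0:
        # Counters went backwards: the JVM was most likely restarted,
        # so the current values already describe the new interval.
        collections, time_ms = current["count"], current["time_ms"]
    average_ms = time_ms / collections if collections else 0.0
    return {
        "collections": collections,
        "time_ms": time_ms,
        "avg_pause_ms": average_ms,
    }
```

A rising average between polls suggests that individual collections are taking longer, which is usually worth checking against the heap utilization figures in the same table.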

 

RDD Details

Parameter Description
COMPILATION DETAILS
Compilation Time (Mean) The time it took to compile source code text.
Compilation Count The total number of compilations that occurred while loading the files.
COMPILATION DETAILS
Generated Class Size (Mean) The size of the class generated.
Generated Method Size (Mean) The size of each method in classes generated.
Source Code Size (Mean) The size of the compiled source code text.
Generated Class Count The number of classes generated.
Generated Method Count The number of methods in classes generated.
Source Code Count The total number of source code files that were loaded into the node for compilation.
COUNTERS
File Cache Hits The total number of file-level cache hits that occurred.
Files Discovered The total number of files discovered.
Hive Client Calls The total number of client calls sent to Hive for query processing.
Parallel Listing Job Count The total number of Spark jobs launched for parallel file listing.
Partitions Fetched The total number of partitions fetched.

 

Configuration

Parameter Description
CONFIGURATION DETAILS
Worker ID The unique ID by which the worker is referenced.
Master URL The URL of the master node.
Master Web UI URL The URL of the master node's Web UI.
Total Memory The total memory allocated and available for the particular Worker node.