Apache Spark Monitoring


Apache Spark - An Overview

Apache Spark is an open-source big data processing framework built for speed, with built-in modules for streaming, SQL, machine learning, and graph processing. Apache Spark has an advanced DAG execution engine that supports acyclic data flow and in-memory computing. Spark runs on Hadoop, Mesos, standalone, or in the cloud. It can access diverse data sources including HDFS, Cassandra, HBase, and S3.

Monitoring Apache Spark - What we do

Let's take a look at what you need to get real-time operational visibility into Spark applications, which performance metrics to gather, and how you can ensure that your Spark cluster is up and operating as expected with Applications Manager's Apache Spark Monitoring:

  • Resource Utilization Details - Applications Manager automatically discovers your Spark components and shows key metrics of Apache Spark clusters (master and worker nodes). It monitors memory and CPU and notifies you of changes in memory pool resource consumption.
  • Real-Time Data - Track garbage collection and memory usage on each component across the cluster, specifically the executors and the driver. Get useful information about applications and cores.
  • Fix Performance Problems Faster - Get instant notifications when there are performance issues. Become aware of performance bottlenecks and take quick remedial actions before your end users experience issues.

Apache Spark - Adding a new monitor

Prerequisites for monitoring Apache Spark metrics: Click here

Using the REST API to add a new Apache Spark monitor: Click here

To create an Apache Spark monitor, follow the steps given below: 

  1. Click the New Monitor link and choose Apache Spark.
  2. Specify the Display Name of the Apache Spark monitor.
  3. Enter the HostName or IP Address of the host where the Apache Spark Master runs.
  4. Enter the Port of the Apache Spark Master. The default is 8080.
  5. Enter the polling interval in minutes.
  6. Click the Test Credentials button if you want to test access to the Spark server.
  7. From the combo box, choose the Monitor Group with which you want to associate the Spark monitor (optional). You can associate the monitor with multiple groups.
  8. Click Add Monitor(s). This discovers Spark from the network and starts monitoring.
Note:
To collect the metrics, uncomment the following lines in the file SPARK_HOME/conf/metrics.properties.template, save it as metrics.properties, and restart the Apache Spark instances:
master.source.jvm.class=org.apache.spark.metrics.source.JvmSource
worker.source.jvm.class=org.apache.spark.metrics.source.JvmSource
driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource
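
The edit described in the note can also be scripted. The following is a minimal Python sketch, not part of Applications Manager: it assumes the template contains the four entries as commented-out lines starting with "#", and the names enable_jvm_sources and write_metrics_properties are hypothetical helpers introduced here for illustration.

```python
from pathlib import Path

# The four JVM source entries named in the note.
JVM_SOURCE_KEYS = (
    "master.source.jvm.class",
    "worker.source.jvm.class",
    "driver.source.jvm.class",
    "executor.source.jvm.class",
)


def enable_jvm_sources(template_text: str) -> str:
    """Return the template text with the four JvmSource lines uncommented.

    Assumption: each entry appears as a line such as
    '#master.source.jvm.class=...'. All other lines are left untouched.
    """
    out = []
    for line in template_text.splitlines():
        stripped = line.lstrip("# \t")
        key = stripped.split("=", 1)[0].strip()
        if line.lstrip().startswith("#") and key in JVM_SOURCE_KEYS:
            out.append(stripped)  # drop the leading comment marker
        else:
            out.append(line)
    return "\n".join(out) + "\n"


def write_metrics_properties(conf_dir: str) -> Path:
    """Read metrics.properties.template in conf_dir and write metrics.properties."""
    conf = Path(conf_dir)
    template = (conf / "metrics.properties.template").read_text()
    target = conf / "metrics.properties"
    target.write_text(enable_jvm_sources(template))
    return target
```

Calling write_metrics_properties with the path to SPARK_HOME/conf writes the new file next to the template. The Spark instances still need to be restarted afterwards, as the note says.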

To monitor the Worker nodes under the given Apache Spark Master, select the Discover All Nodes option.

Monitored Parameters

Go to the Monitors Category View by clicking the Monitors tab. Click the Apache Spark Master or Apache Spark Worker monitors under the Web Server/Services table. The Apache Spark bulk configuration view is displayed, organized into three tabs:

  • Availability tab displays the availability history for the past 24 hours or 30 days.
  • Performance tab displays the health status and events for the past 24 hours or 30 days.
  • List view tab enables you to perform bulk admin configurations.

Click on the monitor name to see all the server details listed under the following tabs:

Apache Spark Master

Overview

Parameter | Description
NODE DETAILS
Node Name | The name of the Apache Spark worker node.
Used Memory (%) | The percentage of total memory that the Spark worker node uses on the machine.
Free Memory (%) | The percentage of total free memory on the machine.
MEMORY UTILIZATION
Used Memory | The percentage of total memory that the Spark Master node uses on the machine.
Free Memory | The percentage of total free memory of the Spark Master node.
Total Memory | The total amount of memory that Spark applications are allowed to use on the machine.
Used Memory | The total amount of memory used by Spark applications.
MASTER OVERVIEW
Alive Workers | The number of alive workers in the Spark cluster. A worker in the ALIVE state can accept applications.
Active Applications | The number of active applications that run on the Spark infrastructure.
Waiting Applications | The number of waiting applications.
Completed Applications | The number of completed applications.
Used Cores | The number of used CPU cores on the Apache Spark Master.
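
To illustrate how figures like these relate to the data a standalone Master reports, here is a minimal Python sketch. It is an assumption-laden illustration, not a description of how Applications Manager collects its data: it assumes the Master's web UI serves a JSON status document at /json/, and that the document uses the field names workers, activeapps, completedapps, memory, memoryused, coresused and state. Counting RUNNING entries as active and WAITING entries as waiting is also an assumption. The function names are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_master_state(host: str, port: int = 8080, timeout: float = 5.0) -> dict:
    """Fetch the Master's JSON status document (assumed to live at /json/)."""
    with urlopen(f"http://{host}:{port}/json/", timeout=timeout) as resp:
        return json.load(resp)


def summarize_master(state: dict) -> dict:
    """Derive overview figures from a Master status document.

    All field names read from `state` are assumptions about the payload.
    """
    workers = state.get("workers", [])
    apps = state.get("activeapps", [])
    total = state.get("memory", 0)
    used = state.get("memoryused", 0)
    used_pct = round(100.0 * used / total, 2) if total else 0.0
    return {
        "alive_workers": sum(1 for w in workers if w.get("state") == "ALIVE"),
        "active_applications": sum(1 for a in apps if a.get("state") == "RUNNING"),
        "waiting_applications": sum(1 for a in apps if a.get("state") == "WAITING"),
        "completed_applications": len(state.get("completedapps", [])),
        "used_cores": state.get("coresused", 0),
        "used_memory_pct": used_pct,
        "free_memory_pct": round(100.0 - used_pct, 2) if total else 0.0,
    }
```

For example, summarize_master(fetch_master_state("spark-master.example.com")) would return the summary for a Master reachable under that hypothetical host name on the default port 8080.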

Workers

In standalone mode, the workers are processes running on individual nodes that manage resource allocation requests for that node and also monitor the executors.

Parameter | Description
WORKER DETAILS
Web UI Address | The URL of the worker's Web UI. The Web UI is the web interface of a running Spark application, used to monitor and inspect Spark job executions in a web browser.
ID | The ID that uniquely identifies the particular worker node.
Cores Used | The number of CPU cores used by the particular Worker node.
Cores Free | The number of free (unused) CPU cores.
Used Memory (GB) | The total memory used by the Worker node.
Free Memory (GB) | The total free memory in the Worker node.
Used Memory (%) | The percentage of memory used by the Worker node.
Time Since Last Heart Beat (seconds) | The time elapsed since the last heartbeat, that is, since the Worker node last contacted the Master node.
State | The current state of the Worker node, for example ALIVE or DEAD.
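
The derived columns above (memory in GB, memory percentage, and time since the last heartbeat) are simple arithmetic on per-worker values. The Python sketch below shows one way to compute them. It assumes a worker entry shaped like those in a standalone Master's JSON status, with memory values in MB and lastheartbeat as an epoch timestamp in milliseconds. These field names and units, and the helper name worker_row, are assumptions made for illustration.

```python
import time
from typing import Optional


def worker_row(worker: dict, now_ms: Optional[int] = None) -> dict:
    """Map one worker entry (assumed field names and units) to table columns."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    total_mb = worker.get("memory", 0)
    used_mb = worker.get("memoryused", 0)
    free_mb = worker.get("memoryfree", total_mb - used_mb)
    last_heartbeat_ms = worker.get("lastheartbeat", now_ms)
    return {
        "web_ui_address": worker.get("webuiaddress"),
        "id": worker.get("id"),
        "cores_used": worker.get("coresused", 0),
        "cores_free": worker.get("coresfree", 0),
        "used_memory_gb": round(used_mb / 1024, 2),
        "free_memory_gb": round(free_mb / 1024, 2),
        "used_memory_pct": round(100.0 * used_mb / total_mb, 2) if total_mb else 0.0,
        "seconds_since_heartbeat": round((now_ms - last_heartbeat_ms) / 1000, 1),
        "state": worker.get("state"),
    }
```

Passing now_ms explicitly keeps the heartbeat calculation deterministic, which is convenient when replaying a saved status document.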

Applications

Parameter | Description
APPLICATION DETAILS
Application Name | The name of your application.
ID | The application ID by which the application is referenced.
User | The user associated with the particular application.
Memory Allocated Per Slave (GB) | The amount of memory allocated for each worker.
Running Duration (min) | The total running duration of the application since it was started.
State | The current state of the particular application, for example WAITING or RUNNING.

Memory

Parameter | Description
HEAP MEMORY
Used Heap | The amount of heap memory used, in percentage.
Free Heap | The amount of heap memory that is free, in percentage.
Max Heap Size | The maximum heap memory that Spark can use, in MB.
Init Heap Size | The heap memory initially allocated, in MB.
Committed Heap Size | The total amount of committed heap memory, in MB.
Used Heap Size | The total used heap memory, in MB.
NON HEAP MEMORY
Used Non Heap | The amount of non-heap memory used, in percentage.
Free Non Heap | The amount of non-heap memory that is free, in percentage.
Max Non Heap Size | The maximum non-heap memory that Spark can use, in MB.
Initial Non Heap Size | The non-heap memory initially allocated, in MB.
Committed Non Heap Size | The total amount of committed non-heap memory, in MB.
Used Non Heap Size | The total used non-heap memory, in MB.
JVM
Used JVM | The amount of JVM memory used, in percentage.
Free JVM | The amount of JVM memory that is free, in percentage.
Max JVM Size | The maximum amount of heap that can be used for memory management, in GB.
Initial JVM Size | The amount of heap that the Java virtual machine initially requests from the operating system, in MB.
Committed JVM Size | The total amount of committed JVM memory, in MB.
Used JVM Size | The total amount of used JVM memory, in MB.
MARKSWEEP AND SCAVENGE
MarkSweep Count | The number of garbage collections that have occurred in the MarkSweep GC.
MarkSweep Time | The time taken by the garbage collections that have occurred in the MarkSweep GC.
Scavenge Count | The number of garbage collections that have occurred in the Scavenge GC.
Scavenge Time | The time taken by the garbage collections that have occurred in the Scavenge GC.
MEMORY POOL DETAILS
Memory Pool | The name of the memory pool.
Maximum (MB) | The maximum pool memory allocated, in MB.
Committed (MB) | The total amount of committed pool memory, in MB.
Initial (MB) | The amount of memory the pool initially requests from the operating system, in MB.
Used (MB) | The total amount of used pool memory, in MB.
Utilization (%) | The percentage of used pool memory.
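
The percentage and MB figures in the memory tables follow from the raw byte values that the JvmSource publishes. The Python sketch below shows the arithmetic under stated assumptions: that the gauges arrive as a mapping shaped like the "gauges" section of a metrics JSON document, with names such as jvm.heap.used and jvm.heap.max and values in bytes. These names, the payload shape, and the helper name memory_area are assumptions for illustration. A maximum that is missing or negative (the JVM reports -1 when no maximum is defined) yields None for the percentages.

```python
from typing import Optional

MB = 1024 * 1024  # bytes per megabyte


def memory_area(gauges: dict, prefix: str) -> dict:
    """Compute percentage and MB figures for one memory area.

    `prefix` is an assumed gauge name prefix such as 'jvm.heap' or
    'jvm.non-heap'; each gauge is assumed to look like {'value': <bytes>}.
    """
    def read(name: str) -> Optional[float]:
        entry = gauges.get(f"{prefix}.{name}")
        return None if entry is None else entry.get("value")

    def to_mb(value: Optional[float]) -> Optional[float]:
        return None if value is None or value < 0 else round(value / MB, 2)

    used, maximum = read("used"), read("max")
    has_pct = used is not None and maximum is not None and maximum > 0
    used_pct = round(100.0 * used / maximum, 2) if has_pct else None
    return {
        "used_pct": used_pct,
        "free_pct": round(100.0 - used_pct, 2) if has_pct else None,
        "max_mb": to_mb(maximum),
        "init_mb": to_mb(read("init")),
        "committed_mb": to_mb(read("committed")),
        "used_mb": to_mb(used),
    }
```

The same function can be applied to a per-pool prefix to obtain the utilization of an individual memory pool, provided the pool's gauges follow the same used/max/init/committed naming.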

RDD Details

Parameter | Description
COMPILATION DETAILS
Compilation Time (Mean) | The time it took to compile source code text.
Compilation Count | The total number of compilations that occurred while loading the files.
COMPILATION DETAILS
Generated Class Size (Mean) | The size of the class generated.
Generated Method Size (Mean) | The size of each method in classes generated.
Source Code Size (Mean) | The size of the compiled source code text.
Generated Class Count | The number of classes generated.
Generated Method Count | The number of methods in classes generated.
Source Code Count | The total number of source code files that were loaded into the node for compilation.
COUNTERS
File Cache Hits | The total number of file-level cache hits that occurred.
Files Discovered | The total number of files discovered.
Hive Client Calls | The total number of client calls sent to Hive for query processing.
Parallel Listing Job Count | The total number of jobs running in parallel.
Partitions Fetched | The total number of partitions fetched.

Configuration

Parameter | Description
CONFIGURATION DETAILS
Master URL | The URL of the master node.
Total Workers | The total number of workers provisioned in the cluster.
Available Cores | The number of CPU cores that Spark applications are allowed to use on the machine.
Total Memory | The total memory allocated for the Spark Master node.

Apache Spark Worker

Overview

Parameter | Description
MEMORY UTILIZATION
Used Memory Percentage | The percentage of total memory that the Spark worker node uses on the machine.
Free Memory Percentage | The percentage of total free memory on the machine.
Used Memory | The total memory used by the Worker node, from the available memory.
Free Memory | The total free memory available for the Worker node.
WORKER OVERVIEW
Active Executors | The number of active executors.
Finished Executors | The number of finished executors. (A Spark executor exits either on failure or when the associated application has also exited.)
Free Cores | The total number of cores free and available for the particular Worker.
Used Cores | The total number of cores used by the particular Worker.

Executors

Parameter | Description
EXECUTOR DETAILS
Executor ID | The unique ID for the particular Executor.
Executor Memory (GB) | The total memory available for the particular Executor.
Application ID | The unique ID for the application associated with the Executor.
Application Name | The name of the particular Application.
User | The user associated with the particular Application.
Memory Allocated Per Slave (GB) | The amount of memory allocated for each worker.

Memory

Parameter | Description
HEAP MEMORY
Used Heap | The percentage of total used heap memory.
Free Heap | The percentage of free heap memory.
Max Heap Size | The maximum heap memory that Spark can use, in MB.
Init Heap Size | The heap memory initially allocated, in MB.
Committed Heap Size | The total amount of committed heap memory, in MB.
Used Heap Size | The total used heap memory, in MB.
NON-HEAP MEMORY
Used Non Heap Memory | The percentage of total used non-heap memory.
Free Non Heap Memory | The percentage of free non-heap memory.
Max Non Heap Size | The maximum non-heap memory that Spark can use, in MB.
Initial Non Heap Size | The non-heap memory initially allocated, in MB.
Committed Non Heap Size | The total amount of committed non-heap memory, in MB.
Used Non Heap Size | The total used non-heap memory, in MB.
JVM
Used JVM Memory | The amount of used JVM memory, in percentage.
Free JVM Memory | The amount of memory available for the JVM, in percentage.
Max JVM Size | The maximum amount of heap that can be used for memory management, in GB.
Initial JVM Size | The amount of heap that the Java virtual machine initially requests from the operating system, in MB.
Committed JVM Size | The total amount of committed JVM memory, in MB.
Used JVM Size | The total amount of used JVM memory, in MB.
MARKSWEEP AND SCAVENGE
MarkSweep Count | The number of garbage collections that have occurred in the MarkSweep GC.
MarkSweep Time | The time taken by the garbage collections that have occurred in the MarkSweep GC.
Scavenge Count | The number of garbage collections that have occurred in the Scavenge GC.
Scavenge Time | The time taken by the garbage collections that have occurred in the Scavenge GC.
MEMORY POOL DETAILS
Maximum (MB) | The maximum pool memory allocated, in MB.
Initial (MB) | The amount of memory the pool initially requests from the operating system, in MB.
Committed (MB) | The total amount of committed pool memory, in MB.
Used (MB) | The total amount of used pool memory, in MB.
Utilization (%) | The percentage of used pool memory.

RDD Details

Parameter | Description
COMPILATION DETAILS
Compilation Time (Mean) | The time it took to compile source code text.
Compilation Count | The total number of compilations that occurred while loading the files.
COMPILATION DETAILS
Generated Class Size (Mean) | The size of the class generated.
Generated Method Size (Mean) | The size of each method in classes generated.
Source Code Size (Mean) | The size of the compiled source code text.
Generated Class Count | The number of classes generated.
Generated Method Count | The number of methods in classes generated.
Source Code Count | The total number of source code files that were loaded into the node for compilation.
COUNTERS
File Cache Hits | The total number of file-level cache hits that occurred.
Files Discovered | The total number of files discovered.
Hive Client Calls | The total number of client calls sent to Hive for query processing.
Parallel Listing Job Count | The total number of jobs running in parallel.
Partitions Fetched | The total number of partitions fetched.

Configuration

Parameter | Description
CONFIGURATION DETAILS
Worker ID | The worker ID by which the worker is referenced.
Master URL | The URL of the master node.
Master Web UI URL | The URL of the master node's Web UI.
Total Memory | The total memory allocated and available for the particular Worker node.
