Elasticsearch Monitoring


Elasticsearch - An Overview

Elasticsearch is a highly scalable, distributed, open-source RESTful search and analytics engine. It is multitenant-capable, with an HTTP web interface and schema-free JSON documents. Built on Apache Lucene, Elasticsearch is one of the most popular enterprise search engines today and addresses a growing number of use cases such as log analytics, real-time application monitoring, and clickstream analytics.

Monitoring Elasticsearch - What we do

Here is what Applications Manager's Elasticsearch monitoring lets you see, the performance metrics it gathers, and how it helps you ensure that your search server is up and operating as expected:

  • Resource Utilization Details - Applications Manager automatically discovers Elasticsearch servers, monitors memory and CPU usage, and notifies you of changes in resource consumption, including thread pool queues.
  • Real-Time Data - You get up-to-the-second insight into cluster runtime metrics, individual cluster nodes, real-time threads and configurations.
  • Cluster and Node Monitoring - Stay on top of your cluster and node health in real time with fine-grained performance statistics, from disk I/O and Java (JVM) metrics to memory usage.
  • Search and Indexing Performance - Gain complete control of your indexes and mappings. Monitor query latency, file system cache usage and request rates, and take action if any of them crosses a threshold.
  • Fix Performance Problems Faster - Get instant notifications when there are performance issues. Become aware of performance bottlenecks and take quick remedial actions before your end users experience issues.

Creating a new Elasticsearch monitor

To add a new Elasticsearch monitor using the REST API, refer to the Applications Manager REST API documentation.

To create an Elasticsearch Monitor, follow the steps given below:

  • Click the New Monitor link and choose ElasticsearchCluster.
  • Specify the Display Name of the Elasticsearch monitor.
  • Enter the HostName or IP Address of the host where the Elasticsearch cluster runs.
  • Enter the Port of the Elasticsearch cluster. The default is 9200.
  • Enter the polling interval time in minutes.
  • Click the Test Credentials button if you want to test access to the Elasticsearch server.
  • Optionally, choose the Monitor Group from the combo box to associate the Elasticsearch monitor with. You can choose multiple groups.
  • Click Add Monitor(s). This discovers Elasticsearch from the network and starts monitoring.
Note:
  • Security/Firewall Requirements - The Elasticsearch cluster host and port should be accessible from the machine where Applications Manager is installed.
  • User Privilege - Provide the user credentials required to access the cluster.
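One way to confirm both requirements from the Applications Manager host, independently of the product, is to call the Elasticsearch cluster health API directly. The sketch below is a minimal Python example, assuming the cluster accepts HTTP Basic authentication; the helper names build_health_request and check_cluster are hypothetical and are not part of Applications Manager.

```python
import base64
import json
import urllib.request


def build_health_request(host, port=9200, username=None, password=None, use_https=False):
    """Build a GET request for the cluster health API.

    host, port, username and password mirror the New Monitor form fields;
    this helper is hypothetical and only illustrates the connectivity check.
    """
    scheme = "https" if use_https else "http"
    req = urllib.request.Request(f"{scheme}://{host}:{port}/_cluster/health", method="GET")
    if username is not None:
        # HTTP Basic authentication: base64("user:password")
        token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
        req.add_header("Authorization", f"Basic {token}")
    return req


def check_cluster(req, timeout=10):
    """Send the request and return the parsed JSON body.

    Raises urllib.error.URLError if the host or port is unreachable, and
    urllib.error.HTTPError (for example code 401) if the credentials are rejected.
    """
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, check_cluster(build_health_request("es-host", 9200, "monitor_user", "secret")) returns the parsed health document when the host, port and credentials are all valid.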


Monitored Parameters

Go to the Monitors Category View by clicking the Monitors tab. Click Elasticsearch or ElasticsearchCluster under the Web Server/Services table. The bulk configuration view for the selected monitor type is displayed, distributed into three tabs:

  • Availability tab displays the Availability history for the past 24 hours or 30 days.
  • Performance tab displays the Health Status and events for the past 24 hours or 30 days.
  • List view enables you to perform bulk admin configurations.

Click on the monitor name to see all the server details listed under the following tabs:

Elasticsearch Cluster

Overview

PARAMETER DESCRIPTION
NODE DETAILS
Node Name The name of the node.
Node Type The type of the node (Client, Data, Master-Eligible, or Data-Master).
Avg Query Time The average time taken by the query phase, the first phase of a search operation, in which the query is processed across all shards.
Avg Fetch Time The average time taken by the fetch phase, the second phase of a search operation, in which the query results are retrieved only from the shards that hold the requested data.
CLUSTER OVERVIEW
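Cluster Status The health status of the cluster, based on shard allocation: green (all primary and replica shards are allocated), yellow (all primary shards are allocated, but one or more replicas are not), or red (one or more primary shards are unassigned).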
Total Nodes The total number of nodes in the cluster.
Total Indices The total number of indices in the cluster.
Total Shards The total number of shards in the cluster.
Total Docs The total number of documents present in the cluster.
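The Overview fields map closely onto two Elasticsearch REST APIs. As a minimal sketch, assuming Elasticsearch 7.x or later response shapes, the code below reads the cluster-wide counts from a GET _cluster/stats response and derives a node type from the roles list that GET _nodes reports for each node. The function names, and the rule that a node with neither a master nor a data role counts as a Client node, are assumptions for illustration; Applications Manager's own logic may differ.

```python
def cluster_overview(stats):
    """Map a GET _cluster/stats response to the CLUSTER OVERVIEW fields."""
    indices = stats["indices"]
    return {
        "Cluster Status": stats["status"],
        "Total Nodes": stats["nodes"]["count"]["total"],
        "Total Indices": indices["count"],
        # "shards" can be an empty object when the cluster has no indices.
        "Total Shards": indices["shards"].get("total", 0),
        "Total Docs": indices["docs"]["count"],
    }


def node_type(roles):
    """Classify a node from the "roles" list returned by GET _nodes (assumed mapping)."""
    roles = set(roles)
    is_master = "master" in roles
    # Elasticsearch 7.10+ also has tiered data roles such as data_hot and data_content.
    is_data = any(r == "data" or r.startswith("data_") for r in roles)
    if is_master and is_data:
        return "Data-Master"
    if is_master:
        return "Master-Eligible"
    if is_data:
        return "Data"
    # Neither master-eligible nor data: a coordinating-only (client) node.
    return "Client"


def nodes_splitup(nodes_response):
    """Count nodes of each type in a GET _nodes response."""
    counts = {"Client": 0, "Data": 0, "Master-Eligible": 0, "Data-Master": 0}
    for node in nodes_response["nodes"].values():
        counts[node_type(node.get("roles", []))] += 1
    return counts
```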

Cluster Details

PARAMETER DESCRIPTION
NODES SPLITUP
Client Node The total number of Client Nodes (coordinating-only nodes that hold no data and are not master-eligible) in the cluster.
Data Node The total number of Data Nodes in the cluster.
Master Node The total number of Master-Eligible Nodes in the cluster.
Data-Master Node The total number of Data Nodes that also act as Master-Eligible Nodes in the cluster.
SHARDS COUNT
Active Shards The number of Active Shards present in the cluster.
Active Primary Shards The number of Primary Shards that are Active in the cluster.
Relocating Shards The number of Relocating Shards present in the cluster.
Initializing Shards The number of Initializing Shards present in the cluster.
Unassigned Shards The number of Unassigned Shards present in the cluster.
Delayed Unassigned Shards The number of Delayed Unassigned Shards present in the cluster.
Total Shards The total number of shards in the cluster.
TOP 20 PENDING TASKS BY PRIORITY
Insert Order The order in which the pending task was inserted into the queue.
Priority The priority assigned to the task.
Source The source of the pending task.
Wait Time by Priority The total time the pending task has waited in the queue, listed by priority (in milliseconds).
TOP 20 PENDING TASKS BY WAIT TIME
Insert Order The order in which the pending task was inserted into the queue.
Priority by Wait Time The priority assigned to the task, listed by wait time.
Source The source of the pending task.
Wait Time The total time the pending task has waited in the queue (in milliseconds).
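Most of the Cluster Details counts can be read from GET _cluster/health, and the pending task tables from GET _cluster/pending_tasks, whose entries carry insert_order, priority, source and time_in_queue_millis fields. The sketch below assumes those response shapes; the alert wording, the function names and the tie-breaking rule for tasks with equal priority (longest wait first) are hypothetical choices, not documented Applications Manager behavior.

```python
# SHARDS COUNT labels mapped to GET _cluster/health response fields.
SHARD_FIELDS = {
    "Active Shards": "active_shards",
    "Active Primary Shards": "active_primary_shards",
    "Relocating Shards": "relocating_shards",
    "Initializing Shards": "initializing_shards",
    "Unassigned Shards": "unassigned_shards",
    "Delayed Unassigned Shards": "delayed_unassigned_shards",
}

# Elasticsearch task priorities, from highest to lowest.
PRIORITY_ORDER = ["IMMEDIATE", "URGENT", "HIGH", "NORMAL", "LOW", "LANGUID"]
PRIORITY_RANK = {name: rank for rank, name in enumerate(PRIORITY_ORDER)}


def shard_counts(health):
    return {label: health.get(field, 0) for label, field in SHARD_FIELDS.items()}


def shard_alerts(health):
    """Hypothetical alert rules based on cluster status and unassigned shards."""
    alerts = []
    status = health.get("status")
    if status == "red":
        alerts.append("red: at least one primary shard is unassigned")
    elif status == "yellow":
        alerts.append("yellow: at least one replica shard is unassigned")
    # Delayed unassigned shards are waiting on purpose, so only flag the rest.
    waiting = health.get("unassigned_shards", 0) - health.get("delayed_unassigned_shards", 0)
    if waiting > 0:
        alerts.append(f"{waiting} unassigned shard(s) not covered by delayed allocation")
    return alerts


def top_by_priority(tasks, n=20):
    # Highest priority first; within one priority, the longest-waiting task first.
    return sorted(
        tasks,
        key=lambda t: (PRIORITY_RANK.get(t["priority"], len(PRIORITY_ORDER)),
                       -t["time_in_queue_millis"]),
    )[:n]


def top_by_wait_time(tasks, n=20):
    # Longest-waiting tasks first, regardless of priority.
    return sorted(tasks, key=lambda t: t["time_in_queue_millis"], reverse=True)[:n]
```

Pass the parsed health response to shard_counts and shard_alerts, and the "tasks" list from the pending tasks response to the two top_ functions.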

Indices

PARAMETER DESCRIPTION
INDICES OVERVIEW
Index Name The name of the index, which represents a collection of documents.
Documents The number of documents in the index.
Indexing Latency The time taken to index a document in the index (in milliseconds).
Indexing Rate The number of documents indexed per second.
Query Latency The time taken to process a query in the index (in milliseconds).
Query Rate The number of queries processed by the index per second.
Fetch Latency The time taken to retrieve the matching documents from the index (in milliseconds).
Fetch Rate The number of fetch operations performed by the index per second.
Current Merges The number of segment merges currently in progress in the index.
Merge Time The time taken to merge segments in the index (in milliseconds).
Flush Time The time taken to flush the index to disk (in milliseconds).
Refresh Time The time taken to refresh the index (in milliseconds).
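The counters Elasticsearch reports for an index (for example indexing.index_total and indexing.index_time_in_millis in GET /<index>/_stats) are cumulative, so per-second rates and per-operation latencies have to be derived from two samples. The sketch below shows one common way to do this; it assumes both samples come from the "total" section of the same index, and the function name is hypothetical. Applications Manager may compute these values differently.

```python
def index_rates(prev, curr, interval_seconds):
    """Derive rates (per second) and latencies (ms per operation) for one index.

    prev and curr are the "total" section of the same index taken from two
    GET /<index>/_stats responses, sampled interval_seconds apart.
    """

    def delta(section, field):
        # Counters can drop, for example after a node restart; clamp to 0.
        return max(curr[section][field] - prev[section][field], 0)

    def latency(section, count_field, time_field):
        ops = delta(section, count_field)
        # Average milliseconds per operation in the interval; 0 if idle.
        return delta(section, time_field) / ops if ops else 0.0

    return {
        "Indexing Rate": delta("indexing", "index_total") / interval_seconds,
        "Indexing Latency": latency("indexing", "index_total", "index_time_in_millis"),
        "Query Rate": delta("search", "query_total") / interval_seconds,
        "Query Latency": latency("search", "query_total", "query_time_in_millis"),
        "Fetch Rate": delta("search", "fetch_total") / interval_seconds,
        "Fetch Latency": latency("search", "fetch_total", "fetch_time_in_millis"),
    }
```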

Configuration

PARAMETER DESCRIPTION
CONFIGURATION DETAILS
Cluster Name The name of the cluster.
Total Nodes The total number of nodes in the cluster.
Master Node Name The name of the Master Node in the cluster.
Master Node Port The port on which the Master node of Elasticsearch runs.
Master Node IP The IP address on which the Master Node runs.
Publish Port The publish port of the cluster.

Elasticsearch

Overview

PARAMETER DESCRIPTION
AVERAGE SYSTEM LOAD
Avg. System Load The system load average over the last 1 minute, 5 minutes, and 15 minutes.
CPU UTILIZATION
CPU Utilization Amount of CPU currently being utilized by the node (in %). 
SEARCH TIME
Average Query Time The average time taken by the query phase, the first phase of a search operation, in which the query is processed across all shards.
Average Fetch Time The average time taken by the fetch phase, the second phase of a search operation, in which the query results are retrieved only from the shards that hold the requested data.
SEGMENT TIME
Average Merge Time The average time taken for segment merging in a node. (A shard in Elasticsearch is a Lucene index, broken down into segments. Segments are periodically merged into larger segments to keep the index size in check and to expunge deleted documents.)
Average Refresh Time The average time spent refreshing an index. (A refresh makes recently indexed documents visible to search; refresh time increases with the number of file operations on the Lucene index.)
INDEXING TIME
Average Index Time The average time taken to index a document. (Indexing a document stores it and makes it searchable.)
Average Delete Time The average time taken to delete a document.
Indexed Count The number of documents indexed.
Deleted Count The number of deleted documents.
Indexing Rate The number of documents that are indexed per second.
GET TIME
Average Get Time The average time taken by get requests, which retrieve a document by its ID.
Existing Count The number of get requests for which the document existed.
Missing Count The number of get requests for which the document was missing.
FLUSH TIME
Average Flush Time The average time taken to flush one or more indices to disk. (Flushing an index frees memory by writing data to index storage and clearing the internal transaction log.)
WARMER TIME
Average Warmer Time The average time taken to perform a warmup search on an index. (Index warming runs registered search requests to warm up the index before it is available for search.)
PERCOLATE TIME
Average Percolate Time The average time spent running percolator queries. (One of Elasticsearch's core features is the ability to search in reverse with the percolator: instead of running a query against documents, it matches a document against stored queries. The percolator indexes the query terms of the registered percolator queries, which lets it percolate documents more quickly.)
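Each "Average ... Time" value above can be read as a cumulative time counter divided by the matching operation counter in the indices section of GET _nodes/stats. The mapping below is an assumption based on the node stats field names, not a documented Applications Manager formula, and percolate is left out to keep the sketch short. Because the counters accumulate from node start, these are averages since the node started; differencing two samples gives per-interval values instead.

```python
# (label, section under "indices", operation counter, time counter in ms)
AVERAGES = [
    ("Average Query Time", "search", "query_total", "query_time_in_millis"),
    ("Average Fetch Time", "search", "fetch_total", "fetch_time_in_millis"),
    ("Average Merge Time", "merges", "total", "total_time_in_millis"),
    ("Average Refresh Time", "refresh", "total", "total_time_in_millis"),
    ("Average Index Time", "indexing", "index_total", "index_time_in_millis"),
    ("Average Delete Time", "indexing", "delete_total", "delete_time_in_millis"),
    ("Average Get Time", "get", "total", "time_in_millis"),
    ("Average Flush Time", "flush", "total", "total_time_in_millis"),
    ("Average Warmer Time", "warmer", "total", "total_time_in_millis"),
]


def node_averages(node_stats):
    """node_stats is one entry of the "nodes" object in GET _nodes/stats."""
    indices = node_stats["indices"]
    result = {}
    for label, section, count_field, time_field in AVERAGES:
        stats = indices.get(section, {})
        count = stats.get(count_field, 0)
        # Milliseconds per operation; 0.0 when no operations were recorded.
        result[label] = stats.get(time_field, 0) / count if count else 0.0
    return result
```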

Memory Details


PARAMETER DESCRIPTION
HEAP MEMORY
Used Heap Percent The percentage of JVM heap currently in use.
Free Heap Percent The percentage of JVM heap currently free.
NON-HEAP MEMORY
Used Non-Heap Percent The percentage of non-heap memory currently in use.
Free Non-Heap Percent The percentage of non-heap memory currently free.
GARBAGE COLLECTION
GC Time - Young The total time spent on young-generation garbage collections.
GC Time - Old The total time spent on old-generation garbage collections.
GC Count - Young The total number of young-generation garbage collections.
GC Count - Old The total number of old-generation garbage collections.
BUFFER POOLS
Direct Buffer Space Used The total space used in the Direct Buffer pool.
Mapped Buffer Space Used The total space used in the Mapped Buffer pool.
Direct Buffer Connection Count The number of buffers in the Direct Buffer pool.
Mapped Buffer Connection Count The number of buffers in the Mapped Buffer pool.
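The Memory Details values correspond to the jvm section of each node in GET _nodes/stats. In this sketch the non-heap percentage is computed against committed non-heap memory, because node stats report no fixed non-heap maximum; that choice, and the function name, are assumptions rather than Applications Manager's documented method.

```python
def jvm_memory(jvm):
    """Map the "jvm" section of one node in GET _nodes/stats to Memory Details."""
    mem = jvm["mem"]
    heap_used = mem["heap_used_percent"]
    committed = mem["non_heap_committed_in_bytes"]
    # Assumption: non-heap usage is expressed relative to committed non-heap memory.
    non_heap_used = 100.0 * mem["non_heap_used_in_bytes"] / committed if committed else 0.0
    young = jvm["gc"]["collectors"]["young"]
    old = jvm["gc"]["collectors"]["old"]
    pools = jvm.get("buffer_pools", {})
    direct = pools.get("direct", {})
    mapped = pools.get("mapped", {})
    return {
        "Used Heap Percent": heap_used,
        "Free Heap Percent": 100 - heap_used,
        "Used Non-Heap Percent": non_heap_used,
        "Free Non-Heap Percent": 100.0 - non_heap_used,
        "GC Time - Young": young["collection_time_in_millis"],
        "GC Time - Old": old["collection_time_in_millis"],
        "GC Count - Young": young["collection_count"],
        "GC Count - Old": old["collection_count"],
        "Direct Buffer Space Used": direct.get("used_in_bytes", 0),
        "Mapped Buffer Space Used": mapped.get("used_in_bytes", 0),
        # "count" is the number of buffers in each pool.
        "Direct Buffer Connection Count": direct.get("count", 0),
        "Mapped Buffer Connection Count": mapped.get("count", 0),
    }
```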

I/O Details

PARAMETER DESCRIPTION
DISK I/O COUNT
Disk Read Count The number of read requests (from the disk) made by Elasticsearch.
Disk Write Count The number of write requests (to the disk) made by Elasticsearch.
DISK I/O SIZE
Disk Read Size The total amount of data read from the disk by Elasticsearch.
Disk Write Size The total amount of data written to the disk by Elasticsearch.
CACHE DETAILS
Cache Name The name of the cache.
Total Size (MB) The size of the cache.
Evictions The number of evictions from the cache.
BREAKER DETAILS
Breaker Name The name of the circuit breaker. (Circuit breakers deal with situations in which processing a request needs more memory than is available, which would otherwise cause an OutOfMemoryError (OOM). It is better to fail a single request than to hit an OOM, because the JVM can become unresponsive when one occurs.)
Limit Size (MB) The memory limit of the breaker.
Used Size (MB) The memory currently used by the breaker.
Tripped The total number of times the breaker has tripped.
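Breaker values come from the breakers section of GET _nodes/stats, where each breaker reports limit_size_in_bytes, estimated_size_in_bytes and tripped. The sketch converts sizes to MB as in the table and adds a hypothetical Warn flag (estimated size at 90% or more of the limit, or at least one trip); the threshold and the function name are illustrative, not Applications Manager defaults.

```python
def breaker_rows(breakers, warn_ratio=0.9):
    """breakers is the "breakers" section of one node in GET _nodes/stats."""
    mb = 1024 * 1024
    rows = []
    for name, b in breakers.items():
        limit = b["limit_size_in_bytes"]
        used = b["estimated_size_in_bytes"]
        rows.append({
            "Breaker Name": name,
            "Limit Size (MB)": round(limit / mb, 2),
            "Used Size (MB)": round(used / mb, 2),
            "Tripped": b["tripped"],
            # Hypothetical alert rule: close to the limit, or has tripped before.
            "Warn": b["tripped"] > 0 or (limit > 0 and used >= warn_ratio * limit),
        })
    return rows
```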

Thread Pools

PARAMETER DESCRIPTION
THREAD DETAILS
Thread Name The name of the thread pool.
Configured Threads The number of threads in the pool.
Queue The number of tasks waiting in the pool's queue.
Active The number of active threads in the pool.
Rejected The number of tasks rejected by the pool (typically because its queue was full).
Largest The largest number of active threads the pool has had.
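Thread pool values come from the thread_pool section of GET _nodes/stats. Because rejected is a cumulative counter, a count that rises between two polls is usually more actionable than its absolute value. The sketch below computes that difference, assuming a counter that goes backwards means the node restarted; the mapping of Configured Threads to the threads field and both function names are assumptions.

```python
def thread_pool_rows(thread_pools):
    """Map the "thread_pool" section of one node in GET _nodes/stats to THREAD DETAILS rows."""
    return {
        name: {
            "Configured Threads": tp["threads"],
            "Queue": tp["queue"],
            "Active": tp["active"],
            "Rejected": tp["rejected"],
            "Largest": tp["largest"],
        }
        for name, tp in thread_pools.items()
    }


def new_rejections(prev_pools, curr_pools):
    """Return {pool name: tasks rejected since the previous poll} for pools with new rejections."""
    result = {}
    for name, tp in curr_pools.items():
        before = prev_pools.get(name, {}).get("rejected", 0)
        diff = tp["rejected"] - before
        if diff < 0:
            # Counter went backwards: assume the node restarted and counted from zero.
            diff = tp["rejected"]
        if diff > 0:
            result[name] = diff
    return result
```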

Network

PARAMETER DESCRIPTION
TRANSPORT
Transmitted Bytes The number of bytes sent by the node over the transport layer (cluster communication).
Received Bytes The number of bytes received by the node over the transport layer (cluster communication).
Transmitted Packets The number of packets sent by the node over the transport layer (cluster communication).
Received Packets The number of packets received by the node over the transport layer (cluster communication).
TCP CONNECTOR
Active Connections The number of active TCP connections.
Passive Connections The number of passive TCP connections.
HTTP CONNECTOR
Current Connections The number of HTTP connections currently open.
Total Connections The total number of HTTP connections opened.
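The transport and HTTP values come from the transport and http sections of GET _nodes/stats. Byte and packet counters are cumulative, so this sketch turns two samples into per-second rates; the function name and the choice to clamp a negative difference (for example after a node restart) to zero are assumptions.

```python
def network_rates(prev, curr, interval_seconds):
    """prev and curr are one node from GET _nodes/stats, sampled interval_seconds apart."""

    def rate(section, field):
        diff = curr[section][field] - prev[section][field]
        # Counters restart from zero when the node restarts; clamp to 0.
        return max(diff, 0) / interval_seconds

    return {
        "Transmitted Bytes/s": rate("transport", "tx_size_in_bytes"),
        "Received Bytes/s": rate("transport", "rx_size_in_bytes"),
        "Transmitted Packets/s": rate("transport", "tx_count"),
        "Received Packets/s": rate("transport", "rx_count"),
        # HTTP connection values are reported as-is from the latest sample.
        "Current Connections": curr["http"]["current_open"],
        "Total Connections": curr["http"]["total_opened"],
    }
```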

Configuration

PARAMETER DESCRIPTION
CONFIGURATION DETAILS
Cluster Name The name of the cluster.
Node Name The name of the node in the cluster.
Node Type The type of the node (Client/Data/Master-Eligible/Data-Master).
Host The IP address of the host.
ElasticSearch Version The installed version of Elasticsearch.
Port The port on which Elasticsearch runs.
ElasticSearch Home The home directory of Elasticsearch.
Total Processors The total number of processors in the current node.
Java Version The version of Java running in the node.
Java Vendor The Java vendor.