Apache Kafka Monitoring


Apache Kafka - An Overview

Apache Kafka is an open-source, fault-tolerant publish-subscribe messaging system originally developed at LinkedIn. A distributed log service, Kafka is often used in place of traditional message brokers because of its higher throughput, scalability, reliability and built-in replication.

Monitoring Apache Kafka - What we do

An attractive option for data integration, Apache Kafka is fast and highly scalable. Kafka nodes can be added and removed elastically, with a single node handling hundreds of reads and writes from thousands of clients in real time. Data streams are split into partitions and spread across different brokers. Although simple at a high level, Kafka has considerable technical depth, which is why robust Kafka monitoring software is essential for troubleshooting issues and optimizing performance.

Applications Manager's Kafka monitoring helps administrators collect Kafka metrics, manage clusters and receive automatic alerts on potential issues. The Applications Manager Kafka monitor covers the following areas and performance metrics:

  • Resource utilization details - Automatically discover Kafka servers, monitor memory and CPU, and get alerted to changes in resource consumption.
  • Thread and JVM usage - Track thread usage with metrics like Daemon, Peak and Live Thread Count. Ensure that started threads don’t overload the server's memory.
  • Broker, Controller and Replication Statistics - Gauge active controllers and see if brokers are up with the number of unavailable partitions. Monitor broker stats like log flush latency (to make sure longer flushes don’t back up the pipeline) and under-replicated partitions (indicating replication is not going as fast as configured).
  • Network and Topic Details - Pinpoint the request segment causing a slowdown. Keep an eye on network usage on your host to rule out the network as the cause of degraded performance. Use the Broker Topic byte-rate metrics to ensure disk throughput does not become a performance bottleneck.
  • Fix Performance Problems Faster - Get instant notifications when there are performance issues with the components of Apache Kafka. Become aware of performance bottlenecks and take quick remedial actions before your end users experience issues.

Creating a new Kafka monitor

Supported versions: 0.7.0 to 3.4.1

Prerequisites for monitoring Apache Kafka: JMX support must be enabled in order to monitor Apache Kafka in Applications Manager. To learn more about enabling JMX in Kafka, click here.

Using the REST API to create a new Kafka monitor: Click here

To create an Apache Kafka Monitor, follow the steps given below: 

  1. Click the New Monitor link and choose Apache Kafka.
  2. Enter the Display Name of the monitor.
  3. Enter the IP Address or hostname of the host on which Kafka is running.
  4. Enter the JMX port in the JMX Port field.
  5. Enter the credential details (user name, password and JNDIPath), or select credentials from the Credential Manager list.
  6. Enter the polling interval in minutes.
  7. Click the Test Credentials button if you want to test access to the Apache Kafka server.
  8. Optionally, choose the Monitor Group with which you want to associate the Apache Kafka monitor from the combo box. You can select multiple groups.
  9. Click Add Monitor(s). This discovers Apache Kafka from the network and starts monitoring.

Note:
If you are unable to add the monitor even after enabling JMX, try adding the following JVM argument to the Kafka startup options:
 -Djava.rmi.server.hostname=[YOUR_IP]
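
The host, JMX port and JNDIPath entered in the steps above correspond to the parts of a standard JMX-over-RMI service URL. The sketch below shows how those three values fit together. It is an illustration only: the helper name, the default path /jmxrmi and the port 9999 are assumptions, and how Applications Manager builds its connection internally is not documented here.

```python
def jmx_service_url(host: str, port: int, jndi_path: str = "/jmxrmi") -> str:
    """Build a JMX-over-RMI service URL from host, JMX port and JNDI path.

    Hypothetical helper for illustration; '/jmxrmi' is the path commonly
    used by the JVM's built-in JMX agent, assumed here as the default.
    """
    if not 0 < port < 65536:
        raise ValueError(f"invalid JMX port: {port}")
    # Accept the path with or without a leading slash.
    path = jndi_path if jndi_path.startswith("/") else "/" + jndi_path
    return f"service:jmx:rmi:///jndi/rmi://{host}:{port}{path}"


if __name__ == "__main__":
    # 10.0.0.5 and 9999 are placeholder values.
    print(jmx_service_url("10.0.0.5", 9999))
```

If a JMX client such as JConsole cannot connect with the resulting URL from the Applications Manager host, the monitor is unlikely to connect either, which is a quick way to separate JMX configuration problems from monitor configuration problems.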

Monitored Parameters

Go to the Monitors Category View by clicking the Monitors tab, then click Apache Kafka under the Middleware/Portal table. The Apache Kafka bulk configuration view is displayed, organized into three tabs:

  • Availability tab gives the Availability history for the past 24 hours or 30 days.
  • Performance tab gives the Health Status and events for the past 24 hours or 30 days.
  • List view enables you to perform bulk admin configurations.

Applications Manager's Kafka performance monitoring provides complete visibility into your Kafka servers based on the metrics listed in the following tabs:

Overview

Parameter - Description
Memory Details
Total Physical Memory Size - The total amount of physical memory, in megabytes.
Free Physical Memory Size - The amount of free physical memory, in megabytes.
Committed Virtual Memory Size - The amount of virtual memory guaranteed to be available to the running process, in megabytes.
Total Swap Space Size - The total amount of swap space.
Free Swap Space Size - The amount of free swap space.
Thread Details
Daemon Thread Count - The number of daemon threads currently running.
Peak Thread Count - The peak live thread count since the Java virtual machine started or the peak was reset.
Live Thread Count - The number of live threads currently running.
Total Started Thread Count - The total number of threads created and started since the Java virtual machine started.
Heap and Non Heap Memory Details
NonHeapMemoryUsage - The non-heap memory currently in use.
HeapMemoryUsage - The heap memory currently in use.

Controller Details

In a Kafka cluster, one of the brokers serves as the controller, which is responsible for managing the states of partitions and replicas and for performing administrative tasks like reassigning partitions.

Parameter - Description
Kafka Controller Details
Active Controller Count - The number of active controllers in the cluster.
Offline Partitions Count - The number of unavailable partitions.
Leader Election Rate - The rate of leader elections. (When a partition leader dies, an election for a new leader is triggered.)
Unclean Leader Election Rate - The rate of unclean leader elections. (When a broker that is the leader for a partition goes offline, a new leader is normally elected from the partition's set of in-sync replicas (ISRs). An unclean leader election is the special case in which none of the available replicas are in sync, so no qualified leader can be found among the Kafka brokers.)
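
As a rough illustration of how the controller metrics above are usually interpreted, the sketch below flags the three conditions that generally indicate trouble: the active controller count summed across brokers is not exactly 1, there are offline partitions, or unclean leader elections are occurring. The class and function names are hypothetical, and the rules are general Kafka operating guidance rather than Applications Manager's built-in thresholds.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ControllerSnapshot:
    """One poll of the controller metrics (hypothetical container)."""
    active_controller_counts: List[int]  # Active Controller Count, one value per broker
    offline_partitions: int              # Offline Partitions Count
    unclean_elections_per_min: float     # Unclean Leader Election Rate


def controller_alerts(snapshot: ControllerSnapshot) -> List[str]:
    """Return human-readable alerts for a snapshot; an empty list means healthy."""
    alerts = []
    total_controllers = sum(snapshot.active_controller_counts)
    if total_controllers != 1:
        alerts.append(f"expected exactly 1 active controller, found {total_controllers}")
    if snapshot.offline_partitions > 0:
        alerts.append(f"{snapshot.offline_partitions} partition(s) offline")
    if snapshot.unclean_elections_per_min > 0:
        alerts.append("unclean leader elections occurring (possible data loss)")
    return alerts


if __name__ == "__main__":
    healthy = ControllerSnapshot([0, 1, 0], 0, 0.0)
    print(controller_alerts(healthy))
```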

Broker Details

Parameter - Description
Log Details
Log Flush Rate - The asynchronous disk log flush rate.
Broker Topic Metrics
Bytes In / Min - The aggregate incoming byte rate (amount of data written to topics on this broker) per minute.
Bytes Out / Min - The aggregate outgoing byte rate per minute.
Bytes Rejected / Min - The amount of data in messages rejected by the broker per minute.
Failed Fetch Requests / Min - The number of data read requests from consumers that the broker failed to process per minute.
Failed Produce Requests / Min - The number of produce requests that failed per minute.
Messages In / Min - The number of messages received by the Kafka broker per minute.
Replication Manager
IsrExpands / Min - The number of in-sync replica (ISR) expansions per minute. (If a broker goes down, the ISR for some partitions shrinks. When that broker comes back up, the ISR expands once its replicas have fully caught up.)
IsrShrinks / Min - The number of in-sync replica (ISR) shrinks per minute. (If a broker goes down, the ISR for some partitions shrinks. When that broker comes back up, the ISR expands once its replicas have fully caught up.)
Leader Count - The number of partitions for which this broker is the leader.
Partition Count - The number of partitions hosted on this broker.
Under Replicated Partitions - The number of partitions that are under-replicated, meaning they have fewer in-sync replicas than configured replicas.
Request Handler Avg Idle Percent - The average fraction of time the request handler threads are idle.
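
Kafka's meter MBeans expose a cumulative Count attribute, so one way to arrive at a "/ Min" figure like those above is to take the difference between two polls and normalize it to a minute. The sketch below shows that arithmetic. It assumes this delta approach for illustration; the function name is hypothetical and the product may derive its per-minute values differently.

```python
def per_minute_rate(prev_count: int, curr_count: int, interval_seconds: float) -> float:
    """Convert two cumulative counter readings into an events-per-minute rate.

    prev_count / curr_count: the counter value at the previous and current poll.
    interval_seconds: time elapsed between the two polls.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = curr_count - prev_count
    if delta < 0:
        # The counter went backwards, most likely because the broker
        # restarted and the count began again from zero.
        delta = curr_count
    return delta * 60.0 / interval_seconds


if __name__ == "__main__":
    # 600 new messages over a 5-minute polling interval.
    print(per_minute_rate(1000, 1600, 300))  # 120.0
```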

Network Details

Parameter - Description
Requests Process Rate
Request Produce / Min - The number of produce requests (writes to topics) handled by this broker per minute.
Request Fetch Consumer / Min - The number of fetch requests from consumers handled by this broker per minute.
Request Fetch Follower / Min - The number of fetch requests per minute from brokers that are followers of a partition and are requesting new data.
Time Taken For Requests
Total Time Produce / Min - The total time taken to serve produce requests.
Total Time Fetch Consumer / Min - The total time taken to serve fetch requests from consumers.
Total Time Fetch Follower / Min - The total time taken to serve fetch requests from the followers of a partition.
Network Processor Rate
Network Processor Avg Idle Percent / Min - The average fraction of time the network processor threads are idle.
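
The idle-percent metrics are easier to reason about when read as "how busy are the threads". The sketch below converts an idle fraction (0 to 1) into a busy percentage and a simple status. The 0.3 warning threshold follows the commonly cited Kafka guidance that the idle ratio should ideally stay above 0.3; it is an assumption here, not an Applications Manager default, and the function name is hypothetical.

```python
from typing import Tuple


def idle_status(idle_fraction: float, warn_below: float = 0.3) -> Tuple[str, float]:
    """Classify a thread-pool idle fraction.

    Returns (status, busy_percent), where status is 'ok' or 'warning'.
    """
    if not 0.0 <= idle_fraction <= 1.0:
        raise ValueError("idle_fraction must be between 0 and 1")
    busy_percent = round((1.0 - idle_fraction) * 100, 1)
    status = "warning" if idle_fraction < warn_below else "ok"
    return status, busy_percent


if __name__ == "__main__":
    print(idle_status(0.85))
    print(idle_status(0.2))
```

The same check can be applied to Request Handler Avg Idle Percent, since both values describe the fraction of time a thread pool spends idle.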

Topics Details

Parameter - Description
Topic Details
Topic Name - The name of the topic.
Bytes In / Min - The aggregate incoming byte rate (amount of data written to the topic on this broker) per minute.
Bytes Out / Min - The aggregate outgoing byte rate per minute.
Failed Fetch Requests / Min - The total number of failed fetch requests per minute.
Failed Produce Requests / Min - The total number of failed produce requests per minute.
Messages In / Min - The number of messages received by the Kafka broker for the topic per minute.
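
For readers who want to cross-check these values in a JMX browser, Kafka publishes per-topic figures under the BrokerTopicMetrics MBeans, with a topic key added to the object name. The sketch below builds those object names for a given topic. The mapping from the labels in this table to MBean names is an assumption based on Kafka's standard metric names, and the topic "orders" is a placeholder.

```python
from typing import Dict, Optional

# Assumed mapping from the labels above to Kafka's BrokerTopicMetrics names.
TOPIC_METRICS = {
    "Bytes In / Min": "BytesInPerSec",
    "Bytes Out / Min": "BytesOutPerSec",
    "Failed Fetch Requests / Min": "FailedFetchRequestsPerSec",
    "Failed Produce Requests / Min": "FailedProduceRequestsPerSec",
    "Messages In / Min": "MessagesInPerSec",
}


def topic_object_names(topic: Optional[str] = None) -> Dict[str, str]:
    """Return JMX object names per label.

    With a topic, the names point at that topic's metrics; without one,
    they point at the broker-wide aggregates.
    """
    suffix = f",topic={topic}" if topic else ""
    return {
        label: f"kafka.server:type=BrokerTopicMetrics,name={name}{suffix}"
        for label, name in TOPIC_METRICS.items()
    }


if __name__ == "__main__":
    for label, object_name in topic_object_names("orders").items():
        print(f"{label}: {object_name}")
```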

Configurations

Parameter - Description
Storage Details
Boot Class Path - The boot class path that is used by the bootstrap class loader to search for class files.
Class Path - The Java class path that is used by the system class loader to search for class files.
Spec Vendor - The vendor of the JMX specification implemented by this product.
Spec Version - The version of the JMX specification implemented by this product.
VM Name - The Java virtual machine name.
VM Vendor - The Java virtual machine implementation vendor.