Apache Kafka Monitoring

Apache Kafka is an open-source, fault-tolerant publish-subscribe messaging system originally developed at LinkedIn. As a distributed log service, Kafka is often used in place of traditional message brokers because of its higher throughput, scalability, reliability and built-in replication.

Kafka's cluster-centric design offers strong durability and fault tolerance. Because Kafka is a distributed system, topics are partitioned and replicated across multiple nodes, so there is a lot of technical detail to keep track of once you dig deeper. With meaningful performance monitoring and prompt alerting, Kafka can be a highly attractive option for data integration. Applications Manager collects the performance metrics that help when troubleshooting Kafka issues and alerts you on those that require corrective action.

Track System Resource Utilization

Automatically discover Kafka servers, and track resource utilization details such as memory, CPU and disk growth over time so you don't run out of resources. Make sure your Apache Kafka server is up and operating as expected at all times. Get notified quickly whenever there are sudden surges in resource consumption or unusual usage patterns.

Kafka Memory Utilization
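
As a rough illustration of what tracking disk growth over time can mean in practice, the sketch below projects how many days remain before a disk fills. It is a minimal example, not how Applications Manager computes anything: the function name, the sample format and the straight-line growth assumption are all hypothetical.

```python
def days_until_full(samples, capacity_gb):
    """Estimate days until a disk fills, from (day, used_gb) samples.

    Assumption: growth is roughly linear, so a straight line through the
    first and last sample is used. Returns None when usage is flat or
    shrinking, since no fill date can be projected.
    """
    (first_day, first_used), (last_day, last_used) = samples[0], samples[-1]
    elapsed_days = last_day - first_day
    if elapsed_days <= 0:
        raise ValueError("samples must span a positive time range")
    growth_per_day = (last_used - first_used) / elapsed_days
    if growth_per_day <= 0:
        return None
    return (capacity_gb - last_used) / growth_per_day


# Hypothetical readings: 100 GB used on day 0, 150 GB on day 10, 500 GB disk.
estimate = days_until_full([(0, 100.0), (10, 150.0)], capacity_gb=500.0)
print(estimate)
```

A real deployment would likely smooth over more than two samples, since log retention and compaction make disk usage on a broker rise and fall.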

Keep Tabs on Threads and JVM Usage

Because Kafka runs in the Java Virtual Machine (JVM), it relies on Java garbage collection to free up memory. The more activity in your Kafka cluster, the more often garbage collection runs. Track JVM heap sizes and ensure that started threads don’t overload the server's memory. Track thread usage with metrics such as Daemon, Peak and Live Thread Count to prevent performance bottlenecks in your system.

Kafka Thread Usage
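
The heap and thread metrics described above could feed a simple check like the one sketched here. This is only an illustration: the `JvmSnapshot` fields, the function name and both thresholds are assumptions chosen for the example, not values recommended by Kafka or used by Applications Manager.

```python
from dataclasses import dataclass


@dataclass
class JvmSnapshot:
    """One reading of JVM metrics (hypothetical field names)."""
    heap_used_bytes: int
    heap_max_bytes: int
    live_threads: int
    peak_threads: int
    daemon_threads: int


def jvm_warnings(snap, heap_pct_limit=85.0, thread_limit=500):
    """Return human-readable warnings for one snapshot.

    Both limits are illustrative defaults, not tuned recommendations.
    """
    warnings = []
    heap_pct = 100.0 * snap.heap_used_bytes / snap.heap_max_bytes
    if heap_pct >= heap_pct_limit:
        warnings.append(f"heap at {heap_pct:.1f}% of max")
    if snap.live_threads >= thread_limit:
        warnings.append(f"{snap.live_threads} live threads (peak {snap.peak_threads})")
    return warnings


# Hypothetical reading: 900 MB of a 1000 MB heap in use, 120 live threads.
snapshot = JvmSnapshot(900_000_000, 1_000_000_000, 120, 150, 90)
for warning in jvm_warnings(snapshot):
    print(warning)
```

Checking a single snapshot is the simplest option; because heap usage naturally climbs between garbage collections, alerting on a sustained trend over several readings would produce fewer false alarms.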

Understand Broker, Controller and Replication Statistics

In a Kafka cluster, one of the brokers serves as the controller, which is responsible for managing the states of partitions and replicas and for performing administrative tasks like reassigning partitions. Monitor the active controller count to see which broker was the controller when an issue occurred, and the offline partitions count to prevent service interruptions. Monitor the broker's log flush latency: the longer it takes to flush the log to disk, the more the pipeline backs up. Track under-replicated partitions to know whether replication is keeping pace with what you configured.

Kafka Replication Statistics
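
The controller and replication checks described above can be sketched as a small function over per-broker readings. The dictionary keys and function name are hypothetical; the values are assumed to come from the broker JMX metrics `ActiveControllerCount`, `OfflinePartitionsCount` and `UnderReplicatedPartitions`, and how they are collected is left out of the example.

```python
def check_cluster(brokers):
    """Check controller and replication health.

    brokers maps a broker id to a dict with the (hypothetical) keys
    'active_controller_count', 'offline_partitions_count' and
    'under_replicated_partitions'.
    Returns (controller_ids, problems).
    """
    problems = []
    controllers = sorted(
        broker_id
        for broker_id, metrics in brokers.items()
        if metrics["active_controller_count"] == 1
    )
    # A healthy cluster is expected to have exactly one active controller.
    if len(controllers) != 1:
        problems.append(
            f"expected exactly 1 active controller, found {len(controllers)}"
        )
    for broker_id, metrics in sorted(brokers.items()):
        offline = metrics["offline_partitions_count"]
        under_replicated = metrics["under_replicated_partitions"]
        if offline > 0:
            problems.append(f"broker {broker_id}: {offline} offline partitions")
        if under_replicated > 0:
            problems.append(
                f"broker {broker_id}: {under_replicated} under-replicated partitions"
            )
    return controllers, problems


# Hypothetical two-broker cluster where broker 2 has fallen behind.
readings = {
    1: {"active_controller_count": 1, "offline_partitions_count": 0,
        "under_replicated_partitions": 0},
    2: {"active_controller_count": 0, "offline_partitions_count": 0,
        "under_replicated_partitions": 3},
}
controller_ids, issues = check_cluster(readings)
print(controller_ids, issues)
```

Recording `controller_ids` alongside each reading is what makes it possible to answer, after the fact, which broker was the controller when an issue occurred.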


Monitor Network and Topic Details

Get the full picture of network usage on your host. Track network throughput, or aggregate the incoming and outgoing byte rates on your broker topics, to see where potential bottlenecks lie. Make informed decisions, such as whether to enable end-to-end compression of your messages.

Kafka Network Usage
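
To make the idea of aggregating incoming and outgoing byte rates concrete, here is a minimal sketch that derives rates from two readings of cumulative byte counters. The function names and sample format are hypothetical; the counters are assumed to be cumulative totals such as those behind Kafka's `BytesInPerSec` and `BytesOutPerSec` broker topic metrics.

```python
def byte_rate(earlier, later):
    """Bytes per second between two (timestamp_seconds, cumulative_bytes) samples."""
    (t0, c0), (t1, c1) = earlier, later
    if t1 <= t0:
        raise ValueError("later sample must have a greater timestamp")
    return (c1 - c0) / (t1 - t0)


def topic_throughput(bytes_in_samples, bytes_out_samples):
    """Combine incoming and outgoing rates for one topic.

    Each argument is a pair of samples: (earlier, later).
    """
    rate_in = byte_rate(*bytes_in_samples)
    rate_out = byte_rate(*bytes_out_samples)
    return {"in": rate_in, "out": rate_out, "total": rate_in + rate_out}


# Hypothetical counters sampled 10 seconds apart.
result = topic_throughput(
    bytes_in_samples=((0, 0), (10, 5000)),
    bytes_out_samples=((0, 0), (10, 15000)),
)
print(result)
```

An outgoing rate well above the incoming rate, as in this made-up sample, usually just reflects several consumers reading the same data. It is the kind of pattern worth looking at when deciding whether compression would relieve the network.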

Fix Performance Problems Faster

Get instant notifications when there are performance issues with the components of Apache Kafka. Become aware of performance bottlenecks and find out which application is causing the excessive load. Take quick remedial actions before your end users experience issues.