Apache Kafka is an open-source, fault-tolerant distributed event streaming platform originally developed at LinkedIn and now maintained by the Apache Software Foundation. At its core a distributed log service, Kafka is often used in place of traditional message brokers because of its higher throughput, scalability, reliability, and built-in replication. Because Kafka is a distributed system, topics are partitioned and replicated across multiple nodes. Kafka deployments have grown considerably in both volume and complexity over the years, and because Kafka is often a crucial component of the IT infrastructure, it is necessary to monitor its operations and performance. Applications Manager's Kafka monitoring tool collects the performance metrics that help when troubleshooting Kafka issues and shows you which ones require corrective action.
Important metrics to look for while performing Kafka monitoring include:
With Applications Manager's Kafka monitoring feature, you can automatically discover Kafka servers and track resource utilization details, such as memory, CPU, and disk growth, over time, so that you don't run out of resources. Alerts sent out whenever there are sudden surges in resource consumption or unusual patterns help you make sure your Kafka server keeps operating as expected.
Because it runs in the Java Virtual Machine (JVM), Kafka relies on Java garbage collection processes to free up memory. The more activity in your Kafka cluster, the more often the garbage collection will run. With Applications Manager's Kafka monitoring tool, it's easy to track JVM heap sizes and ensure that started threads don't overload the server's memory. You can also track thread usage with metrics like daemon, peak, and live thread count to prevent performance bottlenecks in your system.
In a Kafka cluster, the broker that serves as the controller manages the states of partitions and replicas, in addition to performing administrative tasks like reassigning partitions. With Applications Manager's Kafka monitoring capabilities, you can monitor active controllers to see which broker was the controller when an issue occurred, as well as what the offline partitions count was at the time. You can also monitor the broker's log flush latency; the longer it takes to flush logs to disk, the more the pipeline backs up. Track under-replicated partitions to see whether follower replicas are keeping up with their partition leaders.
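The controller, partition, and flush checks described above correspond to MBeans that Kafka brokers expose over JMX. The following is a minimal Python sketch of how such values might be evaluated; it is not Applications Manager's implementation. It assumes the values have already been collected into a dictionary by some JMX bridge, and the function name `check_cluster`, the sample layout, and the `max_flush_ms` default are hypothetical. The MBean names are the ones listed in Kafka's own monitoring documentation.

```python
# MBean names as documented by Apache Kafka for broker monitoring.
ACTIVE_CONTROLLER = "kafka.controller:type=KafkaController,name=ActiveControllerCount"
OFFLINE_PARTITIONS = "kafka.controller:type=KafkaController,name=OfflinePartitionsCount"
UNDER_REPLICATED = "kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions"
LOG_FLUSH = "kafka.log:type=LogFlushStats,name=LogFlushRateAndTimeMs"


def check_cluster(samples, max_flush_ms=500.0):
    """Return (controller_ids, findings) for one round of samples.

    samples: {broker_id: {mbean_name: numeric_value}} (hypothetical layout).
    For LOG_FLUSH the value is assumed to be a latency percentile in ms.
    max_flush_ms is an illustrative threshold, not a recommended setting.
    """
    findings = []

    # In a healthy cluster exactly one broker reports ActiveControllerCount == 1.
    controllers = sorted(
        broker for broker, metrics in samples.items()
        if metrics.get(ACTIVE_CONTROLLER, 0) == 1
    )
    if len(controllers) != 1:
        findings.append(f"expected 1 active controller, found {len(controllers)}")

    for broker in sorted(samples):
        metrics = samples[broker]
        offline = metrics.get(OFFLINE_PARTITIONS, 0)
        if offline > 0:
            findings.append(f"broker {broker}: {offline} offline partitions")
        under = metrics.get(UNDER_REPLICATED, 0)
        if under > 0:
            findings.append(f"broker {broker}: {under} under-replicated partitions")
        flush = metrics.get(LOG_FLUSH, 0.0)
        if flush > max_flush_ms:
            findings.append(f"broker {broker}: log flush took {flush} ms")

    return controllers, findings
```

Recording the returned controller IDs alongside each round of findings is what makes it possible to answer, after the fact, which broker was the controller when an issue occurred.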
Get a full picture of the network usage on your host and track network throughput (or the aggregate incoming and outgoing byte rate on your broker topics) to understand where potential bottlenecks lie. Make informed decisions like whether you should enable end-to-end compression of your messages.
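As a rough illustration of the byte-rate tracking described above, the sketch below derives a rate from two readings of a cumulative byte counter and compares it with link capacity. Kafka brokers expose such counters through the `Count` attribute of the `kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec` and `BytesOutPerSec` MBeans. The function names and the way the samples are obtained are assumptions made for this example, not part of any product API.

```python
def byte_rate(prev_count, curr_count, interval_s):
    """Average bytes per second between two cumulative counter samples."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # A counter that went backwards suggests a broker restart. As an
    # approximation, treat the new reading as the bytes seen in the interval.
    if curr_count >= prev_count:
        delta = curr_count - prev_count
    else:
        delta = curr_count
    return delta / interval_s


def link_utilization(bytes_in_rate, bytes_out_rate, link_bits_per_s):
    """Fraction of a full-duplex link used by the busier direction."""
    busiest = max(bytes_in_rate, bytes_out_rate)
    return busiest * 8 / link_bits_per_s
```

A broker whose busier direction stays close to 1.0 for long periods is a candidate for the kind of decision mentioned above, such as enabling message compression.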
With its powerful fault management system, Applications Manager gathers data on the faults that occur in the system, as well as drill-down data on the origins of those faults. This speeds up the fault analysis and troubleshooting process considerably. It's easy to configure thresholds for various performance attributes and raise alarms whenever those thresholds are breached.
You can also associate actions, such as email/SMS escalation, Windows service action, and MBean operation, with thresholds. These can be performed automatically when thresholds are violated. You can also set up anomaly profiles with dynamic baselines to investigate gradual performance degradation which might otherwise go unnoticed.
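To illustrate why a baseline can catch degradation that a fixed threshold misses, here is a minimal Python sketch of one common approach: derive a band from a reference period of normal behaviour and check whether the recent average falls outside it. This is an assumption-laden example, not the algorithm Applications Manager uses; the function names, the three-sigma default, and the sample values are all hypothetical.

```python
from statistics import mean, stdev


def baseline_band(reference, sigmas=3.0):
    """(low, high) band derived from a reference period of normal behaviour."""
    if len(reference) < 2:
        raise ValueError("need at least two reference samples")
    centre = mean(reference)
    spread = stdev(reference)
    return centre - sigmas * spread, centre + sigmas * spread


def has_drifted(reference, recent, sigmas=3.0):
    """True if the average of the recent samples falls outside the band."""
    low, high = baseline_band(reference, sigmas)
    return not (low <= mean(recent) <= high)


# Hypothetical request latencies in milliseconds.
last_week = [50, 52, 51, 49, 50, 51, 50]
today = [58, 60, 61]
drifted = has_drifted(last_week, today)
```

With these illustrative numbers, a static alarm threshold of 100 ms would stay silent, while the comparison against last week's band reports that latency has moved well outside its usual range.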
Applications Manager's Kafka monitoring tool provides extensive reports on all important performance attributes. With these reports, you can analyze the historical trends of various metrics to make informed decisions. Along with Kafka monitoring, Applications Manager also enables you to predict growth and utilization trends using machine learning techniques, which helps you during capacity planning.
Apart from Kafka monitoring, Applications Manager also offers monitoring for the following middleware servers: