ClickHouse is a high-performance, open-source columnar database management system designed for real-time analytics and large-scale data processing. It is optimized for fast query execution on massive datasets, making it ideal for use cases such as log analysis, business intelligence, and monitoring. With its efficient data compression, distributed architecture, and support for complex analytical queries, ClickHouse enables organizations to process and analyze large volumes of data with high speed and scalability.
Applications Manager provides comprehensive monitoring support for ClickHouse databases by collecting key performance metrics across multiple areas such as server health, query activity, resource utilization, replication, and background operations. These metrics help administrators analyze performance trends, detect anomalies, and ensure optimal database functioning.
ClickHouse monitoring in Applications Manager is supported only through the Prometheus mode of monitoring, so you must configure the Prometheus integration in Applications Manager before adding a ClickHouse monitor. Learn how to configure Prometheus integration.
Go to the Monitors Category View by clicking the Monitors tab, then click ClickHouse under the Database table. The ClickHouse bulk configuration view is displayed, divided into three tabs.
The metrics monitored under each tab of the monitor are described below:
This tab provides a high-level snapshot of the ClickHouse server's identity, availability, and client connectivity.
| Parameter | Description |
|---|---|
| SERVER SUMMARY | |
| Server Name | The hostname or identifier of the ClickHouse server instance as reported by the server itself. |
| Version | The version number of the ClickHouse server software currently running (e.g., 24.8.1.2684). |
| Database Count | The total number of databases that currently exist on the ClickHouse server, including system databases. |
| Table Count | The total number of tables across all databases on the ClickHouse server. |
| Uptime | The elapsed time since the ClickHouse server process was last started. |
| RESPONSE TIME | |
| Response Time | The round-trip time (in milliseconds) taken by the APM data collector to execute the Prometheus queries and receive a response from the Prometheus server for this ClickHouse instance. |
| CONNECTION SUMMARY | |
| TCP Connections | The number of currently active TCP connections to the ClickHouse native protocol interface (default port: 9000). |
| HTTP Connections | The number of currently active HTTP connections to the ClickHouse HTTP interface (default port: 8123). |
| MySQL Connections | The number of currently active connections via the MySQL wire protocol compatibility interface (default port: 9004). |
| Interserver Connections | The number of currently active connections between ClickHouse nodes for internal cluster communication (default port: 9009). |
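
Because these values are collected through Prometheus, it can help to see how the connection counts might look in a raw scrape. The sketch below parses a small sample in the Prometheus text exposition format. The `ClickHouseMetrics_` names follow the prefix used by ClickHouse's Prometheus endpoint, but the mapping to the parameters above is an assumption made for illustration, not Applications Manager's documented mapping, and the sample values are made up.

```python
# Sample scrape text (values are illustrative only).
SAMPLE = """\
# TYPE ClickHouseMetrics_TCPConnection gauge
ClickHouseMetrics_TCPConnection 12
ClickHouseMetrics_HTTPConnection 3
ClickHouseMetrics_MySQLConnection 0
ClickHouseMetrics_InterserverConnection 1
"""

# Assumed (hypothetical) mapping from table parameter to scraped metric name.
CONNECTION_METRICS = {
    "TCP Connections": "ClickHouseMetrics_TCPConnection",
    "HTTP Connections": "ClickHouseMetrics_HTTPConnection",
    "MySQL Connections": "ClickHouseMetrics_MySQLConnection",
    "Interserver Connections": "ClickHouseMetrics_InterserverConnection",
}


def parse_exposition(text):
    """Return {metric_name: value} for unlabeled samples in exposition text."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, HELP/TYPE comments, and labeled samples (not needed here).
        if not line or line.startswith("#") or "{" in line:
            continue
        parts = line.split()
        if len(parts) >= 2:
            values[parts[0]] = float(parts[1])
    return values


def connection_summary(text):
    """Build the connection summary, defaulting missing metrics to 0."""
    values = parse_exposition(text)
    return {
        label: int(values.get(metric, 0))
        for label, metric in CONNECTION_METRICS.items()
    }


print(connection_summary(SAMPLE))
```

Missing metrics default to 0 in this sketch; a real collector would more likely report them as unavailable.
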
This tab tracks query execution activity, including throughput, currently running operations, failures, and insert data volume.
| Parameter | Description |
|---|---|
| TOTAL QUERIES | |
| Total Queries | The cumulative total number of queries (of all types) that have been executed since the ClickHouse server started. |
| Total Select Queries | The cumulative total number of SELECT queries executed since server startup. |
| Total Insert Queries | The cumulative total number of INSERT queries executed since server startup. |
| CURRENT QUERIES | |
| Current Running Queries | The number of queries currently being executed at the instant of measurement. |
| Current Merges | The number of background merge operations currently in progress. |
| Current Mutations | The number of mutation operations (ALTER TABLE UPDATE/DELETE) currently being processed. |
| FAILED QUERIES | |
| Failed Queries | The cumulative total number of queries (all types) that failed with an error since server startup. |
| Failed Select Queries | The cumulative total number of SELECT queries that failed since server startup. |
| Failed Insert Queries | The cumulative total number of INSERT queries that failed since server startup. |
| Queries Preempted | The number of queries that are currently stopped and waiting because of the query priority setting. |
| INSERT THROUGHPUT | |
| Inserted Rows | The cumulative total number of rows that have been successfully inserted into all tables since server startup. |
| Inserted Bytes | The cumulative total volume of data (in GB) that has been inserted into all tables since server startup, measured at the uncompressed level. |
| Delayed Inserts | The number of INSERT queries currently being throttled (delayed) because the target table has too many active data parts. |
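
Several parameters in this tab are cumulative totals since server startup, so a throughput figure for a given interval has to be derived from two successive samples. The following is a minimal sketch of one way to do that; the function names are hypothetical, and treating a decreasing counter as a server restart is an assumption of this example rather than documented Applications Manager behavior.

```python
def per_second_rate(prev_value, curr_value, interval_seconds):
    """Average per-second increase of a cumulative counter between two samples."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = curr_value - prev_value
    if delta < 0:
        # Assumption: the counter went backwards because the server restarted,
        # so everything counted so far happened after the restart.
        delta = curr_value
    return delta / interval_seconds


def failure_ratio(failed_delta, total_delta):
    """Share of queries that failed in an interval (0.0 when nothing ran)."""
    if total_delta == 0:
        return 0.0
    return failed_delta / total_delta


# Example: Total Queries went from 1000 to 1600 over a 60-second interval.
print(per_second_rate(1000, 1600, 60))  # 10.0 queries per second
print(failure_ratio(3, 600))            # 0.005
```
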
This tab provides visibility into the server's memory consumption and disk storage utilization.
| Parameter | Description |
|---|---|
| MEMORY UTILIZATION | |
| Memory Utilization | The percentage of total operating system memory currently in use. Memory Utilization (%) = (Used Memory / Total Memory) × 100 |
| DISK UTILIZATION | |
| Disk Utilization | The percentage of total disk space currently consumed on the default storage disk. Disk Utilization (%) = (Disk Used Space / Disk Total Space) × 100 |
| MEMORY DETAILS | |
| Total Memory | The total amount of physical RAM (in GB) available on the operating system where ClickHouse is running. |
| Used Memory | The amount of physical RAM (in GB) currently in use at the OS level. |
| Free Memory | The amount of physical RAM (in GB) currently available for new allocations at the OS level. |
| Memory Tracking | The amount of memory (in MB) currently allocated and tracked by ClickHouse's internal memory allocator. |
| DISK DETAILS | |
| Disk Total Space | The total capacity (in GB) of the default storage disk configured for ClickHouse. |
| Disk Used Space | The amount of disk space (in GB) currently consumed on the default storage disk. |
| Disk Available Space | The amount of free disk space (in GB) remaining on the default storage disk. |
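
The two utilization formulas in the table above can be worked through with a short sketch. The function name and sample figures are illustrative assumptions; only the formulas come from the table.

```python
def utilization_percent(used, total):
    """Utilization (%) = (used / total) x 100, as defined in the table above."""
    if total <= 0:
        raise ValueError("total must be positive")
    return used * 100 / total


# Illustrative values in GB (not taken from a real server).
memory_utilization = utilization_percent(used=24.0, total=32.0)
disk_utilization = utilization_percent(used=350.0, total=500.0)

print(f"Memory Utilization: {memory_utilization:.1f}%")  # 75.0%
print(f"Disk Utilization: {disk_utilization:.1f}%")      # 70.0%
```
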
This tab provides insight into the MergeTree storage engine, which is the core table engine in ClickHouse responsible for data storage, indexing, and background merge operations.
| Parameter | Description |
|---|---|
| MERGETREE SUMMARY | |
| Merge Tree Data Size | The total compressed data size (in GB) stored across all MergeTree-family tables on the server. |
| Merge Tree Total Rows | The total number of rows stored across all MergeTree-family tables on the server. |
| Merge Tree Total Parts | The total number of active data parts across all MergeTree-family tables. |
| Max Part Count For Partition | The highest number of active data parts in any single partition across all MergeTree-family tables. |
| Merged Rows | The cumulative total number of rows processed by background merge operations since server startup. |
| Merged Bytes | The cumulative total volume of data (in GB, uncompressed) processed by background merge operations since server startup. |
| BACKGROUND OPERATIONS | |
| Background Merge Pool Tasks | The number of merge and mutation tasks currently active in the background merge thread pool. |
| Background Merge Pool Size | The configured maximum number of threads in the background merge and mutations thread pool. |
| Background Schedule Pool Tasks | The number of tasks currently active in the background schedule thread pool. |
| Background Schedule Pool Size | The configured maximum number of threads in the background schedule thread pool, set by the `background_schedule_pool_size` server setting (default value is 128). |
This tab monitors ClickHouse's data replication health and ZooKeeper/Keeper coordination activity, which are critical for high-availability cluster deployments.
| Parameter | Description |
|---|---|
| REPLICATION SUMMARY | |
| Readonly Replicas | The number of ReplicatedMergeTree tables currently in read-only mode on this server. |
| Replicas Max Queue Size | The maximum replication queue length across all replicated tables on this server. |
| Replicated Part Fetches | The cumulative number of data part fetch operations performed from other replicas since server startup. |
| Replicated Part Merges | The cumulative number of merge operations performed on replicated tables since server startup. |
| Replicated Data Loss | The cumulative number of data loss events detected in replicated tables since server startup. |
| ZOOKEEPER SUMMARY | |
| ZooKeeper Sessions | The number of active sessions between this ClickHouse server and the ZooKeeper/ClickHouse Keeper ensemble. |
| ZooKeeper Requests | The number of ZooKeeper/Keeper requests currently in flight (pending a response). |
| ZooKeeper Transactions | The cumulative total number of ZooKeeper/Keeper multi-operation transactions executed since server startup. |
This tab covers network throughput, disk I/O, and the number of files open for reading and writing.
| Parameter | Description |
|---|---|
| NETWORK IO | |
| Network Receive Bytes | The cumulative total volume of data (in MB) received over the network by the ClickHouse server since startup. |
| Network Send Bytes | The cumulative total volume of data (in MB) sent over the network by the ClickHouse server since startup. |
| DISK IO | |
| Disk Read Bytes | The cumulative total volume of data (in GB) read from disk (file descriptors) by the ClickHouse server since startup. |
| Disk Write Bytes | The cumulative total volume of data (in GB) written to disk (file descriptors) by the ClickHouse server since startup. |
| OPEN FILES | |
| Open Files For Read | The number of files currently open for reading by the ClickHouse server process. |
| Open Files For Write | The number of files currently open for writing by the ClickHouse server process. |
This tab monitors ClickHouse's internal thread pool utilization and distributed table activity.
| Parameter | Description |
|---|---|
| THREAD UTILIZATION | |
| Thread Utilization | The percentage of ClickHouse's global thread pool that is currently active (executing work). Thread Utilization (%) = (Active Threads / Total Threads) × 100 |
| THREAD DETAILS | |
| Total Threads | The total number of threads in ClickHouse's global thread pool, including both active and idle threads. |
| Active Threads | The number of threads in the global thread pool currently executing work. |
| Idle Threads | The number of threads in the global thread pool that are currently idle (waiting for work). Idle Threads = Total Threads - Active Threads |
| DISTRIBUTED TABLES | |
| Distributed Send | The number of active connections currently sending data to remote shards for distributed INSERT operations. |
| Distributed Files To Insert | The number of pending files queued for asynchronous distributed INSERT operations. |
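
As a worked example of the thread formulas in the table above, the sketch below derives Idle Threads and Thread Utilization from the two reported counts. The function name, the validation rules, and the sample numbers are assumptions made for illustration.

```python
def thread_stats(total_threads, active_threads):
    """Derive idle threads and utilization from the total and active counts."""
    if total_threads <= 0:
        raise ValueError("total_threads must be positive")
    if not 0 <= active_threads <= total_threads:
        raise ValueError("active_threads must be between 0 and total_threads")
    return {
        # Idle Threads = Total Threads - Active Threads
        "idle_threads": total_threads - active_threads,
        # Thread Utilization (%) = (Active Threads / Total Threads) x 100
        "thread_utilization_percent": active_threads * 100 / total_threads,
    }


# Illustrative values: 200 threads in the pool, 50 of them busy.
print(thread_stats(200, 50))
# {'idle_threads': 150, 'thread_utilization_percent': 25.0}
```
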