ClickHouse Monitoring


ClickHouse - Overview

ClickHouse is a high-performance, open-source columnar database management system designed for real-time analytics and large-scale data processing. It is optimized for fast query execution on massive datasets, making it ideal for use cases such as log analysis, business intelligence, and monitoring. With its efficient data compression, distributed architecture, and support for complex analytical queries, ClickHouse enables organizations to process and analyze large volumes of data with high speed and scalability.

Applications Manager provides comprehensive monitoring support for ClickHouse databases by collecting key performance metrics across multiple areas such as server health, query activity, resource utilization, replication, and background operations. These metrics help administrators analyze performance trends, detect anomalies, and ensure optimal database functioning.

Creating a new ClickHouse monitor

ClickHouse monitoring in Applications Manager is supported only through the Prometheus mode of monitoring. Before adding a ClickHouse monitor, you must first configure the Prometheus integration in Applications Manager. Learn how to configure Prometheus integration
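As a rough illustration of what Prometheus-mode collection works with, the sketch below parses a small sample in the Prometheus text exposition format. The metric names are assumed to follow the `ClickHouseMetrics_` and `ClickHouseAsyncMetrics_` prefixes that ClickHouse uses on its Prometheus endpoint. The sample values and the `parse_metrics` helper are hypothetical and do not represent how Applications Manager itself collects data.

```python
def parse_metrics(text):
    """Parse unlabelled samples from Prometheus text exposition format.

    Returns a dict mapping metric name to float value. Comment lines
    (# HELP, # TYPE) and labelled samples are skipped in this sketch.
    """
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "{" in line:
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        try:
            metrics[parts[0]] = float(parts[1])
        except ValueError:
            continue  # ignore lines whose value is not numeric
    return metrics


# Hypothetical sample; real output contains many more metrics.
SAMPLE = """\
# TYPE ClickHouseMetrics_TCPConnection gauge
ClickHouseMetrics_TCPConnection 4
# TYPE ClickHouseMetrics_HTTPConnection gauge
ClickHouseMetrics_HTTPConnection 2
# TYPE ClickHouseAsyncMetrics_Uptime gauge
ClickHouseAsyncMetrics_Uptime 86400
"""

metrics = parse_metrics(SAMPLE)
print(metrics["ClickHouseMetrics_TCPConnection"])
```

Gauges such as connection counts can be read directly from a single sample, while cumulative counters need two samples to be turned into a rate.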

Monitored Parameters

Go to the Monitors Category View by clicking the Monitors tab, then click ClickHouse under the Database table. The ClickHouse bulk configuration view is displayed, organized into three tabs:

  • The Availability tab shows the availability history for the past 24 hours or 30 days.
  • The Performance tab shows the health status and events for the past 24 hours or 30 days.
  • The List view lets you perform bulk admin configurations.

The metrics monitored under each tab are listed below:

Overview

This tab provides a high-level snapshot of the ClickHouse server's identity, availability, and client connectivity.

Parameter | Description
SERVER SUMMARY
Server Name | The hostname or identifier of the ClickHouse server instance as reported by the server itself.
Version | The version number of the ClickHouse server software currently running (e.g., 24.8.1.2684).
Database Count | The total number of databases that currently exist on the ClickHouse server, including system databases.
Table Count | The total number of tables across all databases on the ClickHouse server.
Uptime | The elapsed time since the ClickHouse server process was last started.
RESPONSE TIME
Response Time | The round-trip time (in milliseconds) taken by the APM data collector to execute the Prometheus queries and receive a response from the Prometheus server for this ClickHouse instance.
CONNECTION SUMMARY
TCP Connections | The number of currently active TCP connections to the ClickHouse native protocol interface (default port: 9000).
HTTP Connections | The number of currently active HTTP connections to the ClickHouse HTTP interface (default port: 8123).
MySQL Connections | The number of currently active connections via the MySQL wire protocol compatibility interface (default port: 9004).
Interserver Connections | The number of currently active connections between ClickHouse nodes for internal cluster communication (default port: 9009).

Queries

This tab tracks query execution activity, including throughput, currently running operations, failures, and insert data volume.

Parameter | Description
TOTAL QUERIES
Total Queries | The cumulative total number of queries (of all types) that have been executed since the ClickHouse server started.
Total Select Queries | The cumulative total number of SELECT queries executed since server startup.
Total Insert Queries | The cumulative total number of INSERT queries executed since server startup.
CURRENT QUERIES
Current Running Queries | The number of queries currently being executed at the instant of measurement.
Current Merges | The number of background merge operations currently in progress.
Current Mutations | The number of mutation operations (ALTER TABLE UPDATE/DELETE) currently being processed.
FAILED QUERIES
Failed Queries | The cumulative total number of queries (all types) that failed with an error since server startup.
Failed Select Queries | The cumulative total number of SELECT queries that failed since server startup.
Failed Insert Queries | The cumulative total number of INSERT queries that failed since server startup.
Queries Preempted | The number of queries currently waiting in the preemption queue.
INSERT THROUGHPUT
Inserted Rows | The cumulative total number of rows that have been successfully inserted into all tables since server startup.
Inserted Bytes | The cumulative total volume of data (in gigabytes) that has been inserted into all tables since server startup, measured at the uncompressed level.
Delayed Inserts | The number of INSERT queries currently being throttled (delayed) because the target table has too many active data parts.
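Several of the parameters above are cumulative since server startup, so a single reading says little about current load. The sketch below shows one possible way to turn two successive readings into a per-second rate and a failure percentage. The function names, the sample values, and the treatment of a counter reset (server restart) are assumptions for illustration, not part of Applications Manager.

```python
def counter_rate(prev_value, curr_value, interval_seconds):
    """Per-second rate between two readings of a cumulative counter."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    if curr_value < prev_value:
        # The counter went backwards, which suggests the server restarted
        # and the counter began again from zero. Treat the current value
        # as the whole increase for this interval.
        delta = curr_value
    else:
        delta = curr_value - prev_value
    return delta / interval_seconds


def failure_percent(failed_delta, total_delta):
    """Share of queries that failed during an interval, as a percentage."""
    if total_delta <= 0:
        return 0.0
    return 100.0 * failed_delta / total_delta


# Hypothetical readings of Total Queries taken 60 seconds apart.
print(counter_rate(120_000, 126_000, 60))  # 100.0 queries per second
print(failure_percent(30, 6_000))          # 0.5
```

The same calculation applies to any other cumulative counter, such as Inserted Rows or Failed Insert Queries.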

Memory

This tab provides visibility into the server's memory consumption and disk storage utilization.

Parameter | Description
MEMORY UTILIZATION
Memory Utilization | The percentage of total operating system memory currently in use. Memory Utilization (%) = (Used Memory / Total Memory) × 100
DISK UTILIZATION
Disk Utilization | The percentage of total disk space currently consumed on the default storage disk. Disk Utilization (%) = (Disk Used Space / Disk Total Space) × 100
MEMORY DETAILS
Total Memory | The total amount of physical RAM (in GB) available on the operating system where ClickHouse is running.
Used Memory | The amount of physical RAM (in GB) currently in use at the OS level.
Free Memory | The amount of physical RAM (in GB) currently available for new allocations at the OS level.
Memory Tracking | The amount of memory (in MB) currently allocated and tracked by ClickHouse's internal memory allocator.
DISK DETAILS
Disk Total Space | The total capacity (in GB) of the default storage disk configured for ClickHouse.
Disk Used Space | The amount of disk space (in GB) currently consumed on the default storage disk.
Disk Available Space | The amount of free disk space (in GB) remaining on the default storage disk.
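The two utilization formulas above share the same shape, so a single helper can express both. This is a minimal sketch with hypothetical sample values; the `utilization_percent` name is an assumption for illustration.

```python
def utilization_percent(used, total):
    """Apply the (used / total) x 100 formula, guarding against a zero total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * used / total


# Hypothetical values in GB.
memory_utilization = utilization_percent(used=24, total=64)
disk_utilization = utilization_percent(used=350, total=500)

print(memory_utilization)  # 37.5
print(disk_utilization)    # 70.0
```

Both arguments must use the same unit (GB here), otherwise the percentage is meaningless.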

MergeTree

This tab provides insight into the MergeTree storage engine, which is the core table engine in ClickHouse responsible for data storage, indexing, and background merge operations.

Parameter | Description
MERGETREE SUMMARY
Merge Tree Data Size | The total compressed data size (in GB) stored across all MergeTree-family tables on the server.
Merge Tree Total Rows | The total number of rows stored across all MergeTree-family tables on the server.
Merge Tree Total Parts | The total number of active data parts across all MergeTree-family tables.
Max Part Count For Partition | The highest number of active data parts in any single partition across all MergeTree-family tables.
Merged Rows | The cumulative total number of rows processed by background merge operations since server startup.
Merged Bytes | The cumulative total volume of data (in GB, uncompressed) processed by background merge operations since server startup.
BACKGROUND OPERATIONS
Background Merge Pool Tasks | The number of merge and mutation tasks currently active in the background merge thread pool.
Background Merge Pool Size | The configured maximum number of threads in the background merge and mutations thread pool.
Background Schedule Pool Tasks | The number of tasks currently active in the background schedule thread pool.
Background Schedule Pool Size | The configured maximum number of threads in the background schedule thread pool, determined by the Background Merge Pool Size (Default value is 128).
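The task and size parameters for each background pool can be read together to judge how busy the pool is. The ratio below is not a metric that this page defines; it is a hedged sketch of one way to combine the two values, and the 90% threshold is an arbitrary, hypothetical choice.

```python
def pool_usage_percent(active_tasks, pool_size):
    """Share of a background pool's configured threads that are busy."""
    if pool_size <= 0:
        raise ValueError("pool_size must be positive")
    return 100.0 * active_tasks / pool_size


def is_pool_saturated(active_tasks, pool_size, threshold_percent=90.0):
    """True when pool usage is at or above the (hypothetical) threshold."""
    return pool_usage_percent(active_tasks, pool_size) >= threshold_percent


# Hypothetical readings: 12 merge tasks running in a pool of 16 threads.
print(pool_usage_percent(12, 16))  # 75.0
print(is_pool_saturated(12, 16))   # False
print(is_pool_saturated(16, 16))   # True
```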

Replication

This tab monitors ClickHouse's data replication health and ZooKeeper/Keeper coordination activity, which are critical for high-availability cluster deployments.

Parameter | Description
REPLICATION SUMMARY
Readonly Replicas | The number of ReplicatedMergeTree tables currently in read-only mode on this server.
Replicas Max Queue Size | The maximum replication queue length across all replicated tables on this server.
Replicated Part Fetches | The cumulative number of data part fetch operations performed from other replicas since server startup.
Replicated Part Merges | The cumulative number of merge operations performed on replicated tables since server startup.
Replicated Data Loss | The cumulative number of data loss events detected in replicated tables since server startup.
ZOOKEEPER SUMMARY
ZooKeeper Sessions | The number of active sessions between this ClickHouse server and the ZooKeeper/ClickHouse Keeper ensemble.
ZooKeeper Requests | The number of ZooKeeper/Keeper requests currently in flight (pending a response).
ZooKeeper Transactions | The cumulative total number of ZooKeeper/Keeper multi-operation transactions executed since server startup.
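The replication parameters above lend themselves to simple health checks. The following sketch shows one possible set of rules; the function name, the queue limit of 100, and the warning messages are hypothetical and are not thresholds defined by Applications Manager.

```python
def replication_warnings(readonly_replicas, max_queue_size,
                         data_loss_delta, queue_limit=100):
    """Return a list of warning strings for one polling interval.

    data_loss_delta is the increase in Replicated Data Loss since the
    previous poll, because that parameter is cumulative.
    """
    warnings = []
    if readonly_replicas > 0:
        warnings.append(
            f"{readonly_replicas} replicated table(s) in read-only mode")
    if max_queue_size > queue_limit:
        warnings.append(
            f"replication queue length {max_queue_size} exceeds {queue_limit}")
    if data_loss_delta > 0:
        warnings.append(
            f"{data_loss_delta} new data loss event(s) detected")
    return warnings


# Hypothetical readings for a healthy and an unhealthy server.
print(replication_warnings(0, 12, 0))   # []
print(replication_warnings(2, 250, 1))  # three warnings
```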

IO

This tab covers network throughput, disk I/O, and the number of files currently open for reading and writing.

Parameter | Description
NETWORK IO
Network Receive Bytes | The cumulative total volume of data (in MB) received over the network by the ClickHouse server since startup.
Network Send Bytes | The cumulative total volume of data (in MB) sent over the network by the ClickHouse server since startup.
DISK IO
Disk Read Bytes | The cumulative total volume of data (in GB) read from disk (file descriptors) by the ClickHouse server since startup.
Disk Write Bytes | The cumulative total volume of data (in GB) written to disk (file descriptors) by the ClickHouse server since startup.
OPEN FILES
Open Files For Read | The number of files currently open for reading by the ClickHouse server process.
Open Files For Write | The number of files currently open for writing by the ClickHouse server process.

Threads

This tab monitors ClickHouse's internal thread pool utilization and distributed table activity.

Parameter | Description
THREAD UTILIZATION
Thread Utilization | The percentage of ClickHouse's global thread pool that is currently active (executing work). Thread Utilization (%) = (Active Threads / Total Threads) × 100
THREAD DETAILS
Total Threads | The total number of threads in ClickHouse's global thread pool, including both active and idle threads.
Active Threads | The number of threads in the global thread pool currently executing work.
Idle Threads | The number of threads in the global thread pool that are currently idle (waiting for work). Idle Threads = Total Threads - Active Threads
DISTRIBUTED TABLES
Distributed Send | The number of active connections currently sending data to remote shards for distributed INSERT operations.
Distributed Files To Insert | The number of pending files queued for asynchronous distributed INSERT operations.
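The two derived thread parameters above follow directly from Total Threads and Active Threads. A minimal sketch of both formulas, using a hypothetical `ThreadPoolSnapshot` class and made-up sample values:

```python
from dataclasses import dataclass


@dataclass
class ThreadPoolSnapshot:
    """One reading of the global thread pool counters."""
    total_threads: int
    active_threads: int

    @property
    def idle_threads(self):
        # Idle Threads = Total Threads - Active Threads
        return self.total_threads - self.active_threads

    @property
    def utilization_percent(self):
        # Thread Utilization (%) = (Active Threads / Total Threads) x 100
        if self.total_threads == 0:
            return 0.0  # an empty pool is treated as 0% utilized
        return 100.0 * self.active_threads / self.total_threads


snapshot = ThreadPoolSnapshot(total_threads=200, active_threads=50)
print(snapshot.idle_threads)         # 150
print(snapshot.utilization_percent)  # 25.0
```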
