Ceph Storage Monitoring


Ceph is an open source software platform designed to provide highly scalable object, block and file-based storage from a single distributed computer cluster. Ceph's main goals are to be completely distributed without a single point of failure, scalable to the exabyte level, and freely available. The Ceph Storage monitor in Applications Manager tracks performance and helps you maintain the overall health of your distributed Ceph cluster. It ensures the availability of OSD nodes and proactively tracks the status of Placement Groups and storage availability.

Ceph Monitor

Ceph Storage Versions Supported: v0.66 and above. (Applications Manager runs the ceph status command and reads its output in JSON format. JSON output is supported from Ceph release v0.66.)
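
As a rough illustration of the data collection described above, the following Python sketch runs ceph status with JSON output and parses the result. It assumes the ceph CLI is installed and on the PATH of the user running the script; the function names are hypothetical, and this is not necessarily how Applications Manager itself issues the command.

```python
import json
import subprocess


def parse_ceph_status(raw_json):
    """Parse the JSON text printed by `ceph status` into a dict."""
    status = json.loads(raw_json)
    if not isinstance(status, dict):
        raise ValueError("expected a JSON object from ceph status")
    return status


def fetch_ceph_status(timeout=30):
    """Run `ceph status --format json` and return the parsed output.

    Assumes the `ceph` CLI is available and the caller can read the keyring.
    """
    result = subprocess.run(
        ["ceph", "status", "--format", "json"],
        capture_output=True,
        text=True,
        check=True,
        timeout=timeout,
    )
    return parse_ceph_status(result.stdout)
```

The keys present in the returned dictionary vary between Ceph releases, so it is worth inspecting the output of your own cluster before relying on a particular field.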

Prerequisites for monitoring Ceph Storage Clusters: To collect performance stats for the Ceph Storage monitor, the user must be given read privilege on the ceph.keyring file.
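
A quick way to verify this prerequisite is to check, as the monitoring user on the Ceph host, whether the keyring file is readable. The sketch below is a minimal example; the function name is hypothetical and the path used in the usage note is an assumption, since the location of the keyring depends on your installation.

```python
import os


def can_read_keyring(path):
    """Return True if `path` is an existing file the current user can read."""
    return os.path.isfile(path) and os.access(path, os.R_OK)
```

For example, can_read_keyring("/etc/ceph/ceph.keyring") should return True when run as the user whose credentials are entered in the monitor configuration (the path here is illustrative only).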

Attributes Monitored: Refer to Ceph Storage Parameters to learn more about the attributes monitored.

To create a Ceph Storage monitor, follow the steps given below:

  1. Specify the Display Name of the Ceph Storage monitor.
  2. Enter the HostName or IP Address of the host where the Ceph storage cluster runs.
  3. Specify the command prompt value, which is the last character in your command prompt. Default value is $ and possible values are >, #, etc.
  4. Enter the Username and Password.
  5. Provide the Polling interval for monitoring the Ceph Storage monitor.
  6. If you are adding a new monitor from an Admin Server, select a Managed Server.
  7. Choose the Monitor Group from the combo box to which you want to associate the Monitor (optional). You can choose multiple groups to associate your monitor.
  8. Click Add Monitor(s). This discovers the monitor from the network and starts monitoring it.
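
If you are unsure which command prompt value to enter in step 3, it can be read off a sample prompt line copied from a terminal session on the Ceph host. The following sketch shows the idea; the function name is hypothetical.

```python
def prompt_character(prompt_line):
    """Return the last non-whitespace character of a shell prompt line."""
    stripped = prompt_line.rstrip()
    if not stripped:
        raise ValueError("prompt line is empty")
    return stripped[-1]
```

For a prompt such as "user@ceph-node:~$ " this returns $, and for "[root@host ~]# " it returns #.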

Ceph Server - Monitored Parameters

Go to the Monitors Category View by clicking the Monitors tab. Click on Ceph Storage under the Services Table. The Ceph Storage bulk configuration view is displayed, organized into three tabs:

  • Availability tab gives the Availability history for the past 24 hours or 30 days.
  • Performance tab gives the Health Status and events for the past 24 hours or 30 days.
  • List view enables you to perform bulk admin configurations.

Click on the monitor name to see all the server details listed under the following tabs:

Performance Overview

Parameter Description
PG Status
PGS (Placement groups) The total number of Placement Groups.
Active PGs The total number of Active Placement Groups. (Ceph processes requests to the placement group.)
Active+Clean PGs The total number of Active and Clean Placement Groups.
  • Active PGs - Ceph processes requests to the placement group.
  • Clean PGs - Ceph replicates all objects in the placement group the correct number of times.
Active+Remapped PGs The total number of Active and Remapped Placement Groups.
  • Active PGs - Ceph processes requests to the placement group.
  • Remapped PGs - The placement group is temporarily mapped to a different set of OSDs from what CRUSH specified.
Active+Degraded PGs The total number of Active and Degraded Placement Groups.
  • Active PGs - Ceph processes requests to the placement group.
  • Degraded PGs - Ceph has not replicated some objects in the placement group the correct number of times yet.
Down+Remapped+Peering The total number of Down, Remapped and Peering Placement Groups.
  • Down PGs - A replica with necessary data is down, so the placement group is offline.
  • Remapped PGs - The placement group is temporarily mapped to a different set of OSDs from what CRUSH specified.
  • Peering PGs - The placement group is undergoing the peering process.
Active+Clean+Scrubbing+Deep The total number of Active, Clean, Scrubbing and Deep Placement Groups.
  • Active PGs - Ceph processes requests to the placement group.
  • Clean PGs - Ceph replicates all objects in the placement group the correct number of times.
  • Scrubbing PGs - Ceph is checking the placement group for inconsistencies.
  • Deep PGs - The scrub in progress is a deep scrub. Ceph automatically deep-scrubs all placement groups periodically.
Down The number of placement groups that are offline because a replica with necessary data is down.
Degraded The number of placement groups in which Ceph has not yet replicated some objects the correct number of times.
Peering The number of placement groups undergoing the peering process.
Incomplete The number of placement groups in the Incomplete state, i.e. PGs that are missing information about writes that may have occurred, or that do not have any healthy copies.
Stale The number of placement groups in an unknown state. The monitors have not received an update for them since the placement group mapping changed.
OSD Status
OSDS Number of OSDs present.
OSDUP Number of OSDs up and running.
OSDIN Number of OSDs in the cluster.
OSDOUT Number of OSDs out of the cluster.
OSDs In and Down The number of OSDs that are in the cluster but down. If an OSD is down and in, there is a problem and the cluster will not be in a healthy state.
FULL Indicates whether the OSD is full.
NEARFULL Indicates whether the OSD is nearly full.
Time Checks
Monitor Name The name of the monitor in the cluster.
Severity The health severity message of the monitor.
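
The PG Status and OSD Status counts above can be approximated from Ceph's JSON output. The sketch below is illustrative only. It assumes a list of placement group states shaped like the pgs_by_state entries in the ceph status JSON (each with state_name and count) and a list of OSD entries carrying numeric up and in flags, as in the ceph osd dump JSON. The function names are hypothetical, and the exact way Applications Manager derives its attributes may differ.

```python
from collections import Counter


def count_pg_states(pgs_by_state):
    """Tally placement groups per combined state and per individual state flag.

    `pgs_by_state` is assumed to be a list of
    {"state_name": "active+clean", "count": 120} style entries.
    """
    combined = Counter()
    flags = Counter()
    for entry in pgs_by_state:
        name, count = entry["state_name"], entry["count"]
        combined[name] += count
        # A combined state such as "down+remapped+peering" counts toward
        # each of its individual flags: down, remapped and peering.
        for flag in name.split("+"):
            flags[flag] += count
    return combined, flags


def count_in_and_down(osds):
    """Count OSDs that are in the cluster but not up.

    `osds` is assumed to be a list of dicts with 0/1 values under "up" and "in".
    """
    return sum(1 for osd in osds if osd["in"] and not osd["up"])
```

With this approach, combined["active+clean"] corresponds to the Active+Clean PGs row, while flags["degraded"] counts every placement group whose state includes degraded, whatever other flags it carries.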

Monitor Details

Parameter Description
Monitor Health Summary
Monitor Name Name of the monitor.
Total (GB) The total disk space in GB.
Used (GB) The used disk space in GB.
Available (GB) The available free disk space in GB.
Available In (%) The percentage of free disk space available.
Last Updated The time at which the monitor status was last updated.
Severity The health severity of the monitor.
Rank The rank of the Ceph monitor in the cluster. Ranks are (re)calculated whenever you add or remove a monitor; the lower the value, the higher the rank. The monitor with the lowest value is the lead (admin) monitor. Clients try to connect to the lead first and, when the lead is down, connect to the monitor with the next rank.
Monitor address The address required for monitors to discover each other using the monitor map.
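
The rank-based selection described for the Rank parameter can be sketched as follows. This is a simplified model of the behaviour stated above, not Ceph's actual client or election logic; the function name and the shape of the input list (name, rank and an up flag per monitor) are hypothetical.

```python
def pick_monitor(monitors):
    """Return the reachable monitor with the lowest rank value, or None.

    `monitors` is a hypothetical list of
    {"name": str, "rank": int, "up": bool} entries.
    """
    reachable = [mon for mon in monitors if mon["up"]]
    if not reachable:
        return None
    # A lower rank value means a higher rank, so the minimum wins.
    return min(reachable, key=lambda mon: mon["rank"])
```

If the monitor with rank 0 is down, the function falls through to the monitor with rank 1, matching the failover order described in the table.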

Storage Availability

Parameter Description
Read Bytes The number of bytes read per second.
Write Bytes The number of bytes written per second.
Data Size The total size of the stored data in GB.
Total Bytes The total storage capacity in GB.
Available The total free storage space available in GB.
Used The total amount of used storage space in GB.
Available % The percentage of free storage space.
Used % The percentage of used storage space.
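
The relationship between the storage values above can be illustrated with a small calculation. The sketch below assumes that 1 GB is 1024^3 bytes and that available space equals total minus used; both are assumptions, since the product may use a different unit convention and Ceph reports available space as its own figure. The function name is hypothetical.

```python
BYTES_PER_GB = 1024 ** 3  # assumption: binary gigabytes


def storage_summary(total_bytes, used_bytes):
    """Return total, used and available space in GB plus percentages."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    available_bytes = total_bytes - used_bytes
    return {
        "total_gb": total_bytes / BYTES_PER_GB,
        "used_gb": used_bytes / BYTES_PER_GB,
        "available_gb": available_bytes / BYTES_PER_GB,
        "used_pct": 100.0 * used_bytes / total_bytes,
        "available_pct": 100.0 * available_bytes / total_bytes,
    }
```

Under these assumptions, Available % and Used % always add up to 100.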