Ceph Storage Monitoring


Ceph Storage Monitoring - An Overview

Ceph is an open source software platform that provides highly scalable object, block, and file-based storage from a single distributed computer cluster. Its main goals are to be completely distributed with no single point of failure, scalable to the exabyte level, and freely available.

Applications Manager's Ceph Storage monitor helps you monitor the performance and maintain the overall health of your distributed Ceph cluster. It helps ensure the availability of OSD nodes and proactively tracks the status of placement groups and storage availability.

Creating a new Ceph Storage monitor

Ceph Storage Versions Supported: v0.66 and above, including Luminous version 12.2.0 onwards. (Applications Manager uses the Ceph status command and reads its output in JSON format. JSON output is supported from Ceph release v0.66.)

Prerequisites for monitoring Ceph Storage Clusters: To collect performance statistics for the Ceph Storage monitor, the monitoring user must have read privilege on the ceph.keyring file. Read more
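
As a quick pre-check for the prerequisite above, the following minimal Python sketch tests whether the current user can read a keyring file. The helper name `can_read_keyring` is hypothetical, and the keyring path is passed in as a parameter because its location depends on your installation. Run it on the Ceph host as the user that Applications Manager will log in with.

```python
import os


def can_read_keyring(path):
    """Return True if `path` is an existing file the current user can read.

    `path` is supplied by the caller; this sketch does not assume any
    particular keyring location.
    """
    return os.path.isfile(path) and os.access(path, os.R_OK)
```

If the function returns False, grant the monitoring user read access to the keyring file before adding the monitor.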

Using the REST API to add a new Ceph Storage monitor: Click here

To create a Ceph Storage monitor, follow the steps given below:

  1. Specify the Display Name of the Ceph Storage monitor.
  2. Enter the HostName or IP Address of the host where the Ceph storage cluster runs.
  3. Select the Mode of Monitoring you want (Telnet or SSH). For SSH, provide the port number (22 by default) and the username and password of the server. You can also use Public Key Authentication (User name and Public Key).
  4. Under Credential Details, if you choose the "Use below Credential" option, provide credentials for the selected mode. To use preconfigured credentials from the Credentials Manager, choose the "Select from Credential list" option.
  5. Specify the command prompt value, which is the last character in your command prompt. The default value is $; other possible values include > and #.
  6. Enter the Username and Password.
  7. Provide the Polling interval for monitoring the Ceph Storage monitor.
  8. If you are adding a new monitor from an Admin Server, select a Managed Server.
  9. (Optional) From the combo box, choose the Monitor Group with which you want to associate the monitor. You can associate the monitor with multiple groups.
  10. Click Add Monitor(s). This discovers the monitor from the network and starts monitoring it.

Ceph Server - Monitored Parameters

Go to the Monitors Category View by clicking the Monitors tab. Click Ceph Storage under the Services Table. The Ceph Storage bulk configuration view is displayed, organized into three tabs:

  • The Availability tab gives the availability history for the past 24 hours or 30 days.
  • The Performance tab gives the health status and events for the past 24 hours or 30 days.
  • The List view enables you to perform bulk admin configurations.

Click on the monitor name to see all the server details listed under the following tabs:

Performance Overview

Parameter: Description
PG Status
PGS (Placement groups): The total number of placement groups.
Active PGs: The total number of active placement groups. (Ceph processes requests to the placement group.)
Active+Clean PGs: The total number of placement groups that are both active and clean.
  • Active PGs - Ceph processes requests to the placement group.
  • Clean PGs - Ceph has replicated all objects in the placement group the correct number of times.
Active+Remapped PGs: The total number of placement groups that are both active and remapped.
  • Active PGs - Ceph processes requests to the placement group.
  • Remapped PGs - The placement group is temporarily mapped to a different set of OSDs from what CRUSH specified.
Active+Degraded PGs: The total number of placement groups that are both active and degraded.
  • Active PGs - Ceph processes requests to the placement group.
  • Degraded PGs - Ceph has not yet replicated some objects in the placement group the correct number of times.
Down+Remapped+Peering: The total number of placement groups that are down, remapped, and peering.
  • Down PGs - A replica with necessary data is down, so the placement group is offline.
  • Remapped PGs - The placement group is temporarily mapped to a different set of OSDs from what CRUSH specified.
  • Peering PGs - The placement group is undergoing the peering process.
Active+Clean+Scrubbing+Deep: The total number of placement groups that are active, clean, and undergoing a deep scrub.
  • Active PGs - Ceph processes requests to the placement group.
  • Clean PGs - Ceph has replicated all objects in the placement group the correct number of times.
  • Scrubbing PGs - Ceph is checking the placement group for inconsistencies.
  • Deep PGs - The scrub in progress is a deep scrub. Ceph automatically deep-scrubs all placement groups periodically.
Down: The number of placement groups that are offline because a replica with necessary data is down.
Degraded: The number of placement groups in which some objects have not yet been replicated the correct number of times.
Peering: The number of placement groups undergoing the peering process.
Incomplete: The number of placement groups in the Incomplete state, that is, PGs that are missing information about writes that may have occurred or that do not have any healthy copies.
Stale: The number of placement groups in an unknown state. The monitors have not received an update for these placement groups since the placement group mapping changed.
OSD Status
OSDS: The number of OSDs present.
OSDUP: The number of OSDs up and running.
OSDIN: The number of OSDs in the cluster.
OSDOUT: The number of OSDs out of the cluster.
OSDs In and Down: The number of OSDs that are in the cluster but down. An OSD that is in and down indicates a problem, and the cluster will not be in a healthy state.
FULL: Indicates whether the OSD is full.
NEARFULL: Indicates whether the OSD is nearly full.
Time Checks [Overall Monitor Details]
Monitor Name: The name of the monitor in the cluster.
Severity: The health severity message of the monitor.
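
To illustrate how the compound placement group states and the "OSDs In and Down" counter above relate to raw cluster data, here is a minimal Python sketch. It assumes PG entries shaped like the `pgs_by_state` list in the JSON output of `ceph status --format json` (a `state_name` string such as `active+clean` plus a `count`) and OSD entries carrying `up` and `in` flags as 0 or 1, as in `ceph osd dump --format json`. The exact JSON layout varies across Ceph releases, the sample data is invented, and the function names are hypothetical. This is not necessarily how Applications Manager derives its values.

```python
def count_pgs_with_state(pgs_by_state, state):
    """Sum the counts of all entries whose compound state includes `state`."""
    return sum(
        entry["count"]
        for entry in pgs_by_state
        if state in entry["state_name"].split("+")
    )


def count_pgs_in_exact_state(pgs_by_state, *states):
    """Sum the counts of entries whose compound state is exactly `states`."""
    wanted = set(states)
    return sum(
        entry["count"]
        for entry in pgs_by_state
        if set(entry["state_name"].split("+")) == wanted
    )


def count_osds_in_and_down(osds):
    """Count OSDs that are marked in the cluster but are not up."""
    return sum(1 for osd in osds if osd["in"] == 1 and osd["up"] == 0)


# Invented sample data, for illustration only.
sample_pgs = [
    {"state_name": "active+clean", "count": 120},
    {"state_name": "active+clean+scrubbing+deep", "count": 2},
    {"state_name": "active+degraded", "count": 5},
    {"state_name": "down+remapped+peering", "count": 1},
]
sample_osds = [
    {"osd": 0, "up": 1, "in": 1},
    {"osd": 1, "up": 0, "in": 1},
    {"osd": 2, "up": 0, "in": 0},
]

print(count_pgs_with_state(sample_pgs, "active"))               # 127
print(count_pgs_in_exact_state(sample_pgs, "active", "clean"))  # 120
print(count_osds_in_and_down(sample_osds))                      # 1
```

Splitting the state name on `+` keeps the two views separate: a PG in `active+clean+scrubbing+deep` contributes to the Active total but not to the exact Active+Clean count.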

Monitor Details

Parameter: Description
Monitor Health Summary
Monitor Name: The name of the monitor.
Total (GB): The total disk space in GB.
Used (GB): The used disk space in GB.
Available (GB): The free disk space available in GB.
Available In (%): The percentage of free disk space available.
Last Updated: The time at which the monitor status was last updated.
Severity: The health severity of the monitor.
Rank: The rank of the Ceph monitor in the cluster. Ranks are (re)calculated whenever you add or remove a monitor. The lower the value, the higher the rank. The Ceph monitor with the lowest value is the lead (admin). Clients try to connect to the lead first and, when the lead is down, connect to the monitor with the next rank.
Monitor address: The address that monitors use to discover each other through the monitor map.
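
The rank rule described above can be sketched in a few lines of Python. This only mirrors the description in the table (lowest rank value first, then the next rank if the lead is down); it is not Ceph's actual election or client connection logic, and the function name, monitor names, and data shape are hypothetical.

```python
def pick_monitor(monitors, down=()):
    """Return the name of the reachable monitor with the lowest rank value.

    `monitors` is a list of {"name": ..., "rank": ...} dicts and `down` is a
    collection of monitor names that are currently unreachable.
    Returns None if no monitor is reachable.
    """
    reachable = [m for m in monitors if m["name"] not in down]
    if not reachable:
        return None
    return min(reachable, key=lambda m: m["rank"])["name"]


# Invented sample data, for illustration only.
monitors = [
    {"name": "mon-b", "rank": 1},
    {"name": "mon-a", "rank": 0},
    {"name": "mon-c", "rank": 2},
]

print(pick_monitor(monitors))                  # mon-a
print(pick_monitor(monitors, down={"mon-a"}))  # mon-b
```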

Storage Availability

Parameter: Description
Read Bytes: The number of bytes read per second.
Write Bytes: The number of bytes written per second.
Data Size: The total size of the stored data in GB.
Total Bytes: The total storage capacity in GB.
Available: The total free storage space in GB.
Used: The total used storage space in GB.
Available %: The percentage of free storage space.
Used %: The percentage of used storage space.
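
The relationship between the capacity values above can be shown with a small worked example. The Python sketch below derives the GB and percentage figures from raw byte counts. It assumes 1 GB = 1024**3 bytes, which may differ from the unit convention the product uses, and the function name and sample numbers are hypothetical.

```python
BYTES_PER_GB = 1024 ** 3  # assumption: binary gigabytes


def storage_summary(total_bytes, used_bytes):
    """Derive GB and percentage figures from total and used byte counts."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    available_bytes = total_bytes - used_bytes
    return {
        "total_gb": round(total_bytes / BYTES_PER_GB, 2),
        "used_gb": round(used_bytes / BYTES_PER_GB, 2),
        "available_gb": round(available_bytes / BYTES_PER_GB, 2),
        "used_pct": round(100 * used_bytes / total_bytes, 2),
        "available_pct": round(100 * available_bytes / total_bytes, 2),
    }


# Invented sample: a 100 GB cluster with 25 GB in use.
summary = storage_summary(100 * BYTES_PER_GB, 25 * BYTES_PER_GB)
print(summary["available_gb"], summary["available_pct"])  # 75.0 75.0
```

Used % and Available % always sum to 100, so an alert threshold on either one covers both.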

OSD Details (For Luminous versions)

Parameter: Description
OSD Storage Information
Id: The OSD monitor ID.
OSD Monitor Name: The OSD monitor name.
Disk Usage: A graphical representation of the disk storage used.
Used Storage (GB): The disk storage used in GB.
Available Storage (GB): The free disk storage available in GB.
Total Storage (GB): The total disk storage in GB.
Available Storage (%): The percentage of free disk storage available.
Last Down Time: The date and time at which the OSD last went down.