RGW Grafana Dashboard Reporting No Data on Reef

Summary

On a Ceph cluster running Reef (v18.2.x), all RGW metrics in Grafana reported "no data". The root cause is a change introduced in Reef that, by default, disables the export of perf counters via the MGR Prometheus module. A single configuration change works around the problem.


Environment

Detail                      Value
Affected Ceph version       Reef (18.2.x)
Working reference cluster   Quincy
Deployment method           Cephadm

Symptoms

  • The entire RGW dashboard in Grafana reported no data.
  • Prometheus did not expose RGW metrics.
  • The RGW section of the Ceph Dashboard functioned normally (e.g. creating and editing buckets worked).
  • No relevant configuration options were found that would obviously prevent metric collection.
  • The same setup worked correctly on a Quincy cluster.

Root Cause

A change was introduced in Reef that disables the export of Ceph daemon perf counters as Prometheus metrics by default in the MGR Prometheus module.

Reference: https://docs.ceph.com/en/reef/mgr/prometheus/#id1

In Reef, perf counters are intended to be exposed by ceph-exporter instead. However, if the ceph-exporter deployment is incomplete, there may be no ceph-exporter instances running on the cluster to expose RGW perf counter metrics (such as ceph_rgw_op_get_obj_lat_sum), which leaves those counters unavailable to Prometheus.

Most RGW metrics are perf counters, which is why the entire RGW dashboard showed no data.
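
To see which of these conditions applies, it can help to check the ceph-exporter deployment and the current MGR setting together. The following is a minimal sketch, assuming a cephadm-managed cluster and a shell with an admin keyring. The function name rgw_metrics_diag is hypothetical and only groups the commands.

```shell
# Hypothetical helper: show ceph-exporter placement and the current MGR setting.
rgw_metrics_diag() {
    # Is a ceph-exporter service defined, and how many daemons does it report?
    ceph orch ls --service_type ceph-exporter
    # Which hosts actually run a ceph-exporter daemon, and in what state?
    ceph orch ps --daemon_type ceph-exporter
    # Current value of the MGR Prometheus option
    # (true means perf counters are excluded from the MGR endpoint).
    ceph config get mgr mgr/prometheus/exclude_perf_counters
}
```

Running rgw_metrics_diag on an affected cluster would be expected to show no running ceph-exporter daemons (or none on the RGW hosts) while the option is still at its default.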


Resolution

  1. Investigate why ceph-exporters are not deployed or, if they are deployed, why they are not providing metrics for necessary daemons. Repair them if possible.

  2. If the ceph-exporter daemons cannot be repaired, set the MGR option mgr/prometheus/exclude_perf_counters to false as a workaround:

    Re-enable perf counter export in the MGR Prometheus module by running:

    ceph config set mgr mgr/prometheus/exclude_perf_counters false

    Then restart the MGR.
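
The workaround can be sketched as one sequence. This is an illustration rather than a prescribed procedure. It assumes that failing over the active MGR with ceph mgr fail is an acceptable way to restart it and that a standby MGR exists. The function name enable_mgr_perf_counters is hypothetical.

```shell
# Hypothetical helper wrapping the workaround; run from a node with an admin keyring.
enable_mgr_perf_counters() {
    # Re-enable perf counter export in the MGR Prometheus module.
    ceph config set mgr mgr/prometheus/exclude_perf_counters false || return 1
    # Read the value back to confirm it was stored.
    ceph config get mgr mgr/prometheus/exclude_perf_counters || return 1
    # Assumption: fail over the active MGR so a standby takes over
    # with the new setting, instead of restarting the daemon in place.
    ceph mgr fail
}
```

The early returns keep the failover from happening if the option could not be set or read back.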

Verification

After applying the fix, confirm the metrics are available:

  1. Check that RGW perf counters (e.g. ceph_rgw_op_get_obj_lat_sum) appear in the ceph-exporter or Prometheus endpoint.
  2. Reload the RGW Grafana dashboards and confirm data is now being populated.
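
The first check can be done from a shell by counting RGW samples in the metrics output. The sketch below assumes the endpoints serve the Prometheus text format. The function name count_rgw_samples, the host names, and the ports (9283 for the MGR module, 9926 for ceph-exporter, believed to be the defaults) are assumptions to adjust for the cluster at hand.

```shell
# Count RGW perf counter samples in Prometheus text-format metrics read from stdin.
count_rgw_samples() {
    # Sample lines start with the metric name; "# HELP" and "# TYPE" lines do not match.
    # grep exits non-zero when nothing matches, so "|| true" keeps the status clean.
    grep -c '^ceph_rgw_' || true
}

# Example usage (placeholder hosts, assumed default ports):
#   curl -s http://mgr-host.example:9283/metrics | count_rgw_samples
#   curl -s http://rgw-host.example:9926/metrics | count_rgw_samples
```

A count of 0 from every endpoint suggests RGW perf counters are still not being exported; a non-zero count means Prometheus has something to scrape.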