RGW Grafana Dashboard Reporting No Data on Reef
Summary
On a Ceph cluster running Reef (v18.2.x), every RGW panel in Grafana reported "no data". The root cause is a change introduced in Reef that, by default, disables the export of perf counters via the MGR Prometheus module. A single configuration change serves as a workaround.
Environment
| Detail | Value |
|---|---|
| Affected Ceph version | Reef (18.2.x) |
| Working reference cluster | Quincy |
| Deployment method | Cephadm |
Symptoms
- The entire RGW dashboard in Grafana reported no data.
- Prometheus did not expose RGW metrics.
- The RGW section of the Ceph Dashboard functioned normally (e.g. creating and editing buckets worked).
- No relevant configuration options were found that would obviously prevent metric collection.
- The same setup worked correctly on a Quincy cluster.
Root Cause
A change was introduced in Reef that disables the export of Ceph daemon perf counters as Prometheus metrics by default in the MGR Prometheus module.
Reference: https://docs.ceph.com/en/reef/mgr/prometheus/#id1
In Reef, perf counters are intended to be exposed by ceph-exporter instead.
However, if the ceph-exporter deployment is incomplete, there may be no
ceph-exporter instances running on the cluster to expose RGW perf counter
metrics (such as ceph_rgw_op_get_obj_lat_sum), and the counters are then
unavailable to Prometheus.
Most RGW metrics are perf counters, which is why the entire RGW dashboard showed no data.
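One way to confirm this condition on a cephadm-managed cluster is to count the running ceph-exporter daemons reported by the orchestrator. The following is a minimal sketch, not a definitive check: the helper name count_running_exporters is hypothetical, and it assumes that daemon names in the first column start with "ceph-exporter." and that a healthy daemon's status contains the word "running".

```shell
# Hypothetical helper: reads plain `ceph orch ps` output on stdin and prints
# the number of ceph-exporter daemons whose line contains "running".
# Assumption: daemon names look like "ceph-exporter.<host>" in column 1.
count_running_exporters() {
    awk '$1 ~ /^ceph-exporter\./ && /running/ { n++ } END { print n + 0 }'
}
```

Usage on an admin node would look like `ceph orch ps | count_running_exporters`. A result of 0 is consistent with the situation described above, where nothing exposes the RGW perf counters to Prometheus.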
Resolution
- Investigate why ceph-exporters are not deployed or, if they are deployed, why they are not providing metrics for the necessary daemons. Repair them if possible.
- If there is no way to fix the ceph-exporters, use the following procedure to set the MGR option exclude_perf_counters to false as a workaround:
  - Re-enable perf counter export in the MGR Prometheus module by running:
    ceph config set mgr mgr/prometheus/exclude_perf_counters false
  - Restart the MGR.
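The workaround can be wrapped in a small script so that it is only applied when needed. This is a sketch under stated assumptions: the function name enable_perf_counter_export and the CEPH override variable are hypothetical, `ceph config get` is assumed to print the current value as true or false, and `ceph mgr fail` (failing over the active MGR to a standby) is assumed to be an acceptable way to restart the MGR on this cluster.

```shell
# Hypothetical wrapper around the workaround described above.
# CEPH may be overridden (e.g. CEPH="sudo ceph"); it defaults to "ceph".
enable_perf_counter_export() {
    epc_ceph="${CEPH:-ceph}"
    epc_opt="mgr/prometheus/exclude_perf_counters"

    # Read the current value; $epc_ceph is left unquoted on purpose so that
    # a multi-word command such as "sudo ceph" is split into words.
    epc_current="$($epc_ceph config get mgr "$epc_opt")" || return 1
    if [ "$epc_current" = "false" ]; then
        echo "perf counter export is already enabled; nothing to do"
        return 0
    fi

    # Re-enable perf counter export in the MGR Prometheus module.
    $epc_ceph config set mgr "$epc_opt" false || return 1

    # Assumption: failing over the active MGR counts as restarting it.
    $epc_ceph mgr fail || return 1
    echo "perf counter export re-enabled; MGR failover requested"
}
```

Calling `enable_perf_counter_export` on an admin node runs the two steps in order and stops at the first failing command.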
Verification
After applying the fix, confirm the metrics are available:
- Check that RGW perf counters (e.g. ceph_rgw_op_get_obj_lat_sum) appear in the ceph-exporter or Prometheus endpoint.
- Reload the RGW Grafana dashboards and confirm data is now being populated.
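The first check can be scripted by searching a metrics endpoint's text output for a sample line of the counter. The helper below is a minimal sketch with a hypothetical name, has_metric; it assumes the endpoint returns the Prometheus text exposition format, in which a sample line begins with the metric name followed by either a space or an opening brace.

```shell
# Hypothetical helper: succeeds if stdin (Prometheus text format) contains at
# least one sample line for the metric named in $1. Comment lines such as
# "# HELP <name> ..." start with "#" and are therefore not matched.
has_metric() {
    grep -q "^$1[{ ]"
}
```

For example, `curl -s http://<mgr-host>:9283/metrics | has_metric ceph_rgw_op_get_obj_lat_sum && echo present` tests the MGR Prometheus endpoint. The host is a placeholder, and port 9283 is assumed to be the module's default; a ceph-exporter endpoint can be tested the same way, assuming its default port of 9926.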