CLYSO Monitoring Module
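Currently, it supports:
- pushing Prometheus metrics to a gateway;
- an interface for running the CLYSO diagnostics script;
- health checks reported in the Ceph cluster health status;
- an interface for running CLYSO recovery tools;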
Enable the Clyso MGR module and Prometheus
By default, the clyso module is disabled (in this example, the prometheus module is disabled as well):
$ ceph mgr module ls
MODULE
balancer      on (always on)
crash         on (always on)
cephadm       on
clyso         off
prometheus    off
To enable the module, run the following command:
ceph mgr module enable clyso
To enable the prometheus module:
ceph mgr module enable prometheus
Pushing Prometheus metrics
The [CLYSO_GATEWAY] URL is provided by Clyso Support. It is the gateway to which the module pushes the metrics, and it looks similar to: https://metrics.clyso.com/metrics/test
Configure the push gateway:
ceph config set mgr mgr/clyso/gateway [CLYSO_GATEWAY]
If needed, set the proxy:
ceph config set mgr mgr/clyso/proxy https://localhost:8443
Note that the HTTP_PROXY and HTTPS_PROXY environment variables are also respected.
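The proxy selection can be sketched in Python as follows. This is a minimal illustration, not the module's code: the helper resolve_proxy is hypothetical, and the precedence it implements (an explicit mgr/clyso/proxy value first, then the lowercase, then the uppercase environment variable) is an assumption this document does not confirm. NO_PROXY handling is omitted.

```python
import os
from typing import Mapping, Optional


def resolve_proxy(url: str,
                  configured_proxy: Optional[str] = None,
                  environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Hypothetical helper: choose the proxy for a push to `url`.

    Assumption: an explicit mgr/clyso/proxy value wins over the
    environment; the real module's precedence may differ.
    """
    if configured_proxy:
        return configured_proxy
    scheme = url.split("://", 1)[0].lower()  # "http" or "https"
    # Conventional lookup: lowercase variable first, then uppercase.
    return environ.get(f"{scheme}_proxy") or environ.get(f"{scheme.upper()}_PROXY")
```

For example, resolve_proxy("https://metrics.clyso.com/metrics/test", "https://localhost:8443") returns the configured proxy regardless of the environment.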
To push only the metrics that match a regular expression (e.g. '^ceph_pool_.*'):
ceph config set mgr mgr/clyso/metrics_filter_in '^ceph_pool_.*'
To exclude the metrics that match a regular expression from the push (e.g. '^ceph_pool_.*'):
ceph config set mgr mgr/clyso/metrics_filter_out '^ceph_pool_.*'
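A minimal sketch of how the metrics_filter_in and metrics_filter_out options could be applied to Prometheus text-format lines. The function names are hypothetical, and several details are assumptions: the regexps are matched with re.search against the metric name, "# HELP"/"# TYPE" lines are filtered by the metric name they describe, and filter_in is applied before filter_out.

```python
import re
from typing import Iterable, List, Optional


def metric_name(line: str) -> Optional[str]:
    """Return the metric name of an exposition-format line, or None."""
    if line.startswith("#"):
        parts = line.split()
        # "# HELP <name> <text>" and "# TYPE <name> <type>" name a metric
        if len(parts) >= 3 and parts[1] in ("HELP", "TYPE"):
            return parts[2]
        return None
    stripped = line.strip()
    if not stripped:
        return None
    # A sample is "<name>{labels} <value>" or "<name> <value>"
    return re.split(r"[{\s]", stripped, maxsplit=1)[0]


def filter_metrics(lines: Iterable[str],
                   filter_in: Optional[str] = None,
                   filter_out: Optional[str] = None) -> List[str]:
    """Hypothetical sketch of the in/out filters (assumed semantics)."""
    keep_re = re.compile(filter_in) if filter_in else None
    drop_re = re.compile(filter_out) if filter_out else None
    result = []
    for line in lines:
        name = metric_name(line)
        if name is not None:
            if keep_re and not keep_re.search(name):
                continue  # not selected by metrics_filter_in
            if drop_re and drop_re.search(name):
                continue  # excluded by metrics_filter_out
        result.append(line)  # lines without a metric name pass through
    return result
```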
To add an fsid label with the cluster FSID to every pushed metric, for example:
ceph_pg_active{fsid="90694361-4ac5-40b6-9bca-636ee7b0e2d9",pool_id="1"}
run:
ceph config set mgr mgr/clyso/add_fsid_to_metric true
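A sketch of the add_fsid_to_metric transformation on a single sample line, producing the form shown above with fsid as the first label. The helper add_fsid is hypothetical, and its regular expression is a simplification that does not handle a "}" inside a quoted label value.

```python
import re

# name, optional {labels}, then whitespace and the rest (value, timestamp)
_SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?P<labels>\{[^}]*\})?(?P<rest>\s.*)$')


def add_fsid(line: str, fsid: str) -> str:
    """Hypothetical helper: prepend an fsid label to one sample line."""
    if line.startswith("#") or not line.strip():
        return line  # comments and blank lines are left unchanged
    m = _SAMPLE_RE.match(line)
    if m is None:
        return line  # not a sample line this simple parser understands
    label = f'fsid="{fsid}"'
    labels = m.group("labels")
    if labels and labels != "{}":
        new_labels = "{" + label + "," + labels[1:]  # keep existing labels
    else:
        new_labels = "{" + label + "}"
    return m.group("name") + new_labels + m.group("rest")
```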
By default, the module will not send "# HELP " strings to the gateway. This may be changed with:
ceph config set mgr mgr/clyso/disable_metric_help false
To compress the pushed data with gzip (a "Content-Encoding: gzip" header is then added to the request):
ceph config set mgr mgr/clyso/compress_metrics true
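What compress_metrics implies for the HTTP request can be sketched with the Python standard library. The helper build_push_request, the POST method and the Content-Type value (the Prometheus text exposition format type) are assumptions for illustration; the request is only built, not sent.

```python
import gzip
import urllib.request


def build_push_request(gateway: str, body: str, compress: bool) -> urllib.request.Request:
    """Hypothetical sketch: build (but do not send) a metrics push request."""
    data = body.encode("utf-8")
    headers = {"Content-Type": "text/plain; version=0.0.4"}  # assumed content type
    if compress:
        data = gzip.compress(data)
        headers["Content-Encoding"] = "gzip"  # tells the gateway to gunzip the body
    # Assumption: POST; the real module may use a different method.
    return urllib.request.Request(gateway, data=data, headers=headers, method="POST")
```

Note that urllib.request.Request normalizes header names with str.capitalize(), so the header is read back with req.get_header("Content-encoding").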
If a gateway is configured, the module periodically pushes Prometheus metrics to it; the interval is set by the mgr/clyso/push_interval config option.
Additionally, the "push" CLI command can be used to push once:
ceph clyso push [CLYSO_GATEWAY]
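The periodic push can be pictured as a loop that pushes, then waits on a threading.Event for push_interval (assumed here to be seconds), so that it can be stopped promptly. This is a hypothetical sketch of a common background-loop pattern, not the module's actual implementation; push_loop, the logger name and the push-then-wait order are assumptions.

```python
import logging
import threading
from typing import Callable

log = logging.getLogger("clyso_push_sketch")  # hypothetical logger name


def push_loop(push: Callable[[], None], interval: float, stop: threading.Event) -> None:
    """Hypothetical sketch: call push() every `interval` seconds until stop is set.

    Assumption: push first, then wait; the real module's scheduling may differ.
    """
    while not stop.is_set():
        try:
            push()
        except Exception:  # a failed push must not end the loop
            log.exception("metrics push failed")
        # Event.wait() sleeps up to `interval` but returns early once stop is set.
        stop.wait(interval)
```

In practice such a loop would run in a background thread, for example threading.Thread(target=push_loop, args=(push, 60, stop), daemon=True).start() with an example interval of 60, and stop.set() ends it.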
Diagnostics collection
The module provides an interface for running the CLYSO diagnostics script, which collects cluster information into a tarball that can be uploaded to the CLYSO support team.
To start a diagnostics collection:
ceph clyso diagnostics collect start [--all-osd-asok-stats] [--query-inactive-pg] [--timeout <int>] [--uncensored] [--debug]
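For scripting, the start command can be assembled from the options listed above. collect_start_cmd is a hypothetical helper that only builds the argument list; the meaning of each flag is defined by the diagnostics script and is not described here.

```python
from typing import List, Optional


def collect_start_cmd(all_osd_asok_stats: bool = False,
                      query_inactive_pg: bool = False,
                      timeout: Optional[int] = None,
                      uncensored: bool = False,
                      debug: bool = False) -> List[str]:
    """Hypothetical helper: argv for 'ceph clyso diagnostics collect start'."""
    cmd = ["ceph", "clyso", "diagnostics", "collect", "start"]
    if all_osd_asok_stats:
        cmd.append("--all-osd-asok-stats")
    if query_inactive_pg:
        cmd.append("--query-inactive-pg")
    if timeout is not None:
        cmd += ["--timeout", str(timeout)]  # passed through as an integer string
    if uncensored:
        cmd.append("--uncensored")
    if debug:
        cmd.append("--debug")
    return cmd
```

For example, subprocess.run(collect_start_cmd(query_inactive_pg=True), check=True) starts a collection with --query-inactive-pg.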
To check the status and view the collection log:
ceph clyso diagnostics collect status
ceph clyso diagnostics collect log
When the collection is complete, the status shows the path to the resulting file.
To abort the diagnostics collection:
ceph clyso diagnostics collect abort
Security-sensitive config options
In "config report" and "config log" output, the values of security-sensitive config options are replaced with their MD5 digest. The module considers a config option security-sensitive if its name matches the case-insensitive regular expression defined in mgr/clyso/config_sensitive_options (default: 'ACCESS_KEY|SECRET_KEY|PASSWORD'). To change it, use a command like the following:
ceph config set mgr mgr/clyso/config_sensitive_options 'key|pass'
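The redaction rule can be sketched as follows: option names are matched case-insensitively against the configured regular expression, and matching values are replaced with their MD5 digest. censor_config is a hypothetical name, and matching with re.search (rather than a full match) and rendering the digest in hex are assumptions.

```python
import hashlib
import re
from typing import Dict

DEFAULT_SENSITIVE = "ACCESS_KEY|SECRET_KEY|PASSWORD"


def censor_config(options: Dict[str, str], pattern: str = DEFAULT_SENSITIVE) -> Dict[str, str]:
    """Hypothetical sketch: replace sensitive option values with an MD5 hex digest."""
    sensitive = re.compile(pattern, re.IGNORECASE)  # case-insensitive, as documented
    censored = {}
    for name, value in options.items():
        if sensitive.search(name):  # assumption: search, not fullmatch
            censored[name] = hashlib.md5(value.encode("utf-8")).hexdigest()
        else:
            censored[name] = value
    return censored
```

Replacing a value with a digest rather than a fixed mask still lets support staff tell whether two censored values are identical, or whether a value changed between "config log" entries, without seeing the value itself.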
Health Check
The module checks its own operation, so if a problem is detected, it is reported in the Ceph cluster health status. Additionally, it provides some cluster health checks that are not available in upstream Ceph.
Manually check the health and display the result:
ceph clyso health check
Enable health monitoring (enabled by default):
ceph clyso health check on
While health monitoring is enabled, the module periodically runs the health checks and updates the Ceph health status; the interval is set by the mgr/clyso/health_check_interval config option.
Disable health monitoring:
ceph clyso health check off
Recovery Tools
The module provides an interface for running the CLYSO recovery tools. To list the available commands:
ceph --help | grep 'clyso recover'
The recovery tools are disabled by default. To enable them:
ceph config set mgr mgr/clyso/enable_recovery_tools true
To start CephFS metadata recovery:
ceph clyso recover fs metadata start <fs> [<nranks:int>] [<nworkers:int>]
To check the status and view the recovery log:
ceph clyso recover fs metadata status
ceph clyso recover fs metadata log
To abort CephFS metadata recovery:
ceph clyso recover fs metadata abort
To start CephFS journal recovery:
ceph clyso recover fs journal start <fs> [<ranks>]
To check the status and view the recovery log:
ceph clyso recover fs journal status
ceph clyso recover fs journal log
To abort CephFS journal recovery:
ceph clyso recover fs journal abort
If clyso health check is on, a warning about any in-progress recovery is displayed in the Ceph cluster health status.
Troubleshooting
Much useful information can be found in the mgr log after enabling debug logging:
ceph config set mgr mgr/clyso/log_level debug
See the output of ceph mgr dump for all supported config options.
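To list the module's options programmatically, the JSON form of ceph mgr dump can be parsed. The available_modules and module_options field names follow the upstream MgrMap dump; treat them as an assumption and verify them against your Ceph release. list_module_options and load_mgr_dump are hypothetical helper names.

```python
import json
import subprocess
from typing import Dict


def list_module_options(mgr_dump: dict, module: str = "clyso") -> Dict[str, str]:
    """Hypothetical helper: map option name -> description for one mgr module."""
    # Assumption: upstream MgrMap layout (available_modules[].module_options{}).
    for mod in mgr_dump.get("available_modules", []):
        if mod.get("name") == module:
            return {name: opt.get("desc", "")
                    for name, opt in mod.get("module_options", {}).items()}
    return {}  # module not present in the dump


def load_mgr_dump() -> dict:
    """Run 'ceph mgr dump --format json' and parse its output (needs a cluster)."""
    out = subprocess.run(["ceph", "mgr", "dump", "--format", "json"],
                         check=True, capture_output=True, text=True).stdout
    return json.loads(out)
```

For example, print(list_module_options(load_mgr_dump())) on a node with an admin keyring prints the clyso options and their descriptions.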