
CLYSO Monitoring Module

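Currently, it supports:

  • pushing Prometheus metrics to a gateway;
  • an interface for running the CLYSO diagnostics script;
  • health checks reported in the Ceph cluster health status;
  • an interface for running CLYSO recovery tools.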

Enable the Clyso MGR module and Prometheus

By default, the clyso module is disabled:

$ ceph mgr module ls
MODULE
balancer on (always on)
crash on (always on)
cephadm on
clyso off
prometheus off

To enable the module, run the following command:

ceph mgr module enable clyso

To enable the prometheus module:

ceph mgr module enable prometheus
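To confirm the result, a small shell helper can parse the module listing. This is a minimal sketch: the function names module_state and enable_and_verify are hypothetical, and the parser assumes the plain-text column layout of the example output above (module name first, then on/off), which may differ between Ceph releases.

```shell
# module_state: print the state ("on"/"off") of one module, reading the
# plain-text "ceph mgr module ls" listing on stdin.
# Assumed layout: module name in column 1, state in column 2.
# Exits non-zero if the module is not listed.
module_state() {
    awk -v m="$1" '$1 == m { print $2; found = 1 } END { exit !found }'
}

# enable_and_verify (hypothetical helper): enable both modules, then check
# that the listing reports each of them as "on".
enable_and_verify() {
    local listing m state
    ceph mgr module enable clyso || return 1
    ceph mgr module enable prometheus || return 1
    listing=$(ceph mgr module ls) || return 1
    for m in clyso prometheus; do
        state=$(printf '%s\n' "$listing" | module_state "$m")
        if [ "$state" != "on" ]; then
            echo "module $m is not on (state: ${state:-missing})" >&2
            return 1
        fi
    done
}
```

For example, ceph mgr module ls | module_state clyso should print "on" once the module has been enabled.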

Pushing Prometheus metrics

Clyso Support provides the [CLYSO_GATEWAY] value: the URL of the gateway to which the module pushes the metrics. It looks like https://metrics.clyso.com/metrics/test.

Configure the push gateway:

ceph config set mgr mgr/clyso/gateway [CLYSO_GATEWAY]

If needed, set a proxy:

ceph config set mgr mgr/clyso/proxy https://localhost:8443

Note that the HTTP_PROXY and HTTPS_PROXY environment variables are also respected.
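The two settings above can be applied in one step. The sketch below is a hypothetical wrapper (configure_push and is_http_url are not part of the module); the check that both values start with http:// or https:// is a local convenience added here, not something the module is documented to enforce.

```shell
# is_http_url: accept only values that start with http:// or https://
# followed by at least one character (local sanity check only).
is_http_url() {
    case "$1" in
        http://?*|https://?*) return 0 ;;
        *) return 1 ;;
    esac
}

# configure_push (hypothetical helper): validate both values first, then
# set the gateway and, if given, the proxy via the documented options.
configure_push() {
    local gateway=$1 proxy=${2:-}
    if ! is_http_url "$gateway"; then
        echo "gateway must be an http(s) URL: $gateway" >&2
        return 1
    fi
    if [ -n "$proxy" ] && ! is_http_url "$proxy"; then
        echo "proxy must be an http(s) URL: $proxy" >&2
        return 1
    fi
    ceph config set mgr mgr/clyso/gateway "$gateway" || return 1
    if [ -n "$proxy" ]; then
        ceph config set mgr mgr/clyso/proxy "$proxy" || return 1
    fi
}
```

For example, configure_push [CLYSO_GATEWAY] https://localhost:8443, with the gateway URL supplied by Clyso Support. Validating both values before writing either avoids leaving a half-applied configuration.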

To push only the metrics that match a regular expression (e.g. '^ceph_pool_.*'):

ceph config set mgr mgr/clyso/metrics_filter_in '^ceph_pool_.*'

To exclude the metrics that match a regular expression (e.g. '^ceph_pool_.*') from the push:

ceph config set mgr mgr/clyso/metrics_filter_out '^ceph_pool_.*'
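Before setting a filter, it can help to preview locally which metrics a pattern selects. This is a sketch under two assumptions: that the module matches the regexp against the metric name, and that grep -E behaves like the module's regexp engine for simple patterns such as '^ceph_pool_.*'. The helper names metric_names and preview_filter are hypothetical.

```shell
# metric_names: reduce Prometheus text exposition (stdin) to the sorted,
# de-duplicated list of metric names, skipping comments and blank lines.
metric_names() {
    grep -v -e '^#' -e '^$' | sed -e 's/[{ ].*//' | sort -u
}

# preview_filter (hypothetical helper): show which metric names a
# metrics_filter_in ("in") or metrics_filter_out ("out") regexp would keep.
preview_filter() {
    case "$1" in
        in)  metric_names | grep -E -- "$2" ;;
        out) metric_names | grep -Ev -- "$2" ;;
        *)   echo "usage: preview_filter in|out REGEXP" >&2; return 2 ;;
    esac
}
```

For example, curl -s http://<mgr-host>:9283/metrics | preview_filter in '^ceph_pool_.*' lists the names the "in" filter would keep, using the prometheus module's default port 9283.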

To add an fsid label to every pushed metric (e.g. ceph_pg_active{fsid="90694361-4ac5-40b6-9bca-636ee7b0e2d9",pool_id="1"}):

ceph config set mgr mgr/clyso/add_fsid_to_metric true
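To illustrate the resulting format, the sed transform below inserts an fsid label the way the example above shows it: as the first label of metrics that already have labels, or as a new label set on metrics without any. It is an illustration only, not the module's code; the helper name add_fsid_label is hypothetical, and it assumes a UUID-style fsid containing no characters that are special to sed.

```shell
# add_fsid_label (illustration only): add fsid="$1" to each sample line
# read from stdin. Comment lines (# HELP / # TYPE) pass through unchanged.
# An empty label set "{}" is not special-cased.
add_fsid_label() {
    sed -E \
        -e '/^#/b' \
        -e "s/^([a-zA-Z_:][a-zA-Z0-9_:]*)\\{/\\1{fsid=\"$1\",/" \
        -e 't' \
        -e "s/^([a-zA-Z_:][a-zA-Z0-9_:]*) /\\1{fsid=\"$1\"} /"
}
```

For example, curl -s http://<mgr-host>:9283/metrics | add_fsid_label "$(ceph fsid)" shows roughly what the labelled metrics would look like.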

By default, the module does not send "# HELP " lines to the gateway. To change this:

ceph config set mgr mgr/clyso/disable_metric_help false

To compress the pushed data (the module then adds a "Content-Encoding: gzip" header):

ceph config set mgr mgr/clyso/compress_metrics true
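To estimate what the two payload options save on a given cluster, the sizes can be approximated from a saved copy of the metrics. This sketch assumes that dropping "# HELP " lines and gzip-compressing the text is all that disable_metric_help and compress_metrics do to the payload; payload_bytes is a hypothetical helper name.

```shell
# payload_bytes (hypothetical helper): compare the size of a saved
# exposition file as-is, with "# HELP " lines removed (the default
# disable_metric_help=true), and additionally gzip-compressed
# (an approximation of compress_metrics=true).
payload_bytes() {
    local raw no_help gz
    raw=$(wc -c < "$1")
    no_help=$(grep -v '^# HELP ' "$1" | wc -c)
    gz=$(grep -v '^# HELP ' "$1" | gzip -c | wc -c)
    # $(( )) strips the padding some wc implementations add.
    echo "raw=$((raw)) no_help=$((no_help)) gzip=$((gz))"
}
```

For example: curl -s http://<mgr-host>:9283/metrics > /tmp/metrics.txt && payload_bytes /tmp/metrics.txt.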

If a gateway is configured, the module periodically pushes Prometheus metrics to it; the interval is set by the mgr/clyso/push_interval config option.

Additionally, the CLI "push" command performs a single push on demand:

ceph clyso push [CLYSO_GATEWAY]
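Before a manual push, it can be useful to review the push-related settings in one place. The helper below (show_push_settings, a hypothetical name) simply runs ceph config get for each option named in this section.

```shell
# show_push_settings (hypothetical helper): print the current value of each
# push-related option documented above, using "ceph config get".
show_push_settings() {
    local opt
    for opt in gateway proxy push_interval metrics_filter_in metrics_filter_out \
               add_fsid_to_metric disable_metric_help compress_metrics; do
        printf '%s = %s\n' "$opt" "$(ceph config get mgr "mgr/clyso/$opt")"
    done
}
```

Running show_push_settings and then ceph clyso push [CLYSO_GATEWAY] confirms the configuration before data leaves the cluster.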

Diagnostics collection

The module provides an interface for running the CLYSO diagnostics script, which collects cluster information into a tarball that can then be uploaded to the CLYSO support team.

To start a diagnostics collection:

ceph clyso diagnostics collect start [--all-osd-asok-stats] [--query-inactive-pg] [--timeout <int>] [--uncensored] [--debug]

To check the status and view the collection log:

ceph clyso diagnostics collect status
ceph clyso diagnostics collect log

When the collection is complete, the status output shows the path to the result file.

To abort a running collection:

ceph clyso diagnostics collect abort
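For unattended use, the start and status commands can be combined into a polling loop. This is a sketch with a hypothetical helper name (wait_for_diagnostics); because the exact status text is not documented here, the regexp that signals completion is an assumption the caller must supply after inspecting real status output. The collection is started with default options.

```shell
# wait_for_diagnostics (hypothetical helper): start a collection, then poll
# "collect status" until its output matches the regexp in $1.
# $2 = max polls (default 60), $3 = seconds between polls (default 10).
wait_for_diagnostics() {
    local done_re=$1 tries=${2:-60} delay=${3:-10} i=0 status
    ceph clyso diagnostics collect start || return 1
    while [ "$i" -lt "$tries" ]; do
        status=$(ceph clyso diagnostics collect status) || return 1
        if printf '%s\n' "$status" | grep -Eq -- "$done_re"; then
            printf '%s\n' "$status"    # includes the result file path
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "not finished after $tries polls; see: ceph clyso diagnostics collect log" >&2
    return 1
}
```

For example, wait_for_diagnostics '\.tar' 60 10 assumes the reported result path contains ".tar"; adjust the pattern to what the status command actually prints.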

Security-sensitive config options

In "config report" and "config log" output, the values of security-sensitive config options are replaced with their MD5 digests. The module treats a config option as security-sensitive if its name matches, case-insensitively, the regular expression defined in mgr/clyso/config_sensitive_options (default: 'ACCESS_KEY|SECRET_KEY|PASSWORD'). To change it, use a command like the following:

ceph config set mgr mgr/clyso/config_sensitive_options 'key|pass'
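To check a candidate pattern before setting it, the masking rule described above can be approximated locally. This is a sketch, not the module's code: it assumes the regexp is searched anywhere in the option name and that the MD5 digest is taken over the raw value without a trailing newline. SENSITIVE_RE, is_sensitive and masked_value are hypothetical names; md5sum comes from GNU coreutils.

```shell
# Pattern under test; defaults to the documented default value.
: "${SENSITIVE_RE:=ACCESS_KEY|SECRET_KEY|PASSWORD}"

# is_sensitive: succeed if option name $1 matches SENSITIVE_RE, ignoring case.
is_sensitive() {
    printf '%s\n' "$1" | grep -Eiq -- "$SENSITIVE_RE"
}

# masked_value: print value $2 of option $1, or its MD5 digest if sensitive.
masked_value() {
    if is_sensitive "$1"; then
        printf '%s' "$2" | md5sum | cut -d' ' -f1
    else
        printf '%s\n' "$2"
    fi
}
```

Setting SENSITIVE_RE='key|pass' before calling is_sensitive reproduces the example command above, so option names can be tested against it in advance.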

Health Check

The module checks that it is functioning properly, so any problem it detects is reported in the Ceph cluster health status. It also provides some cluster health checks that are not available in upstream Ceph.

Manually check the health and display the result:

ceph clyso health check

Enable health monitoring (enabled by default):

ceph clyso health check on

The module periodically runs the health checks and updates the Ceph health status; the interval is set by the mgr/clyso/health_check_interval config option.

Disable health monitoring:

ceph clyso health check off
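For an external monitoring script, an on-demand check can be combined with the overall cluster status. In this sketch the helper names health_status and check_cluster are hypothetical; the status is read from ceph health --format json, and extracting one field with grep/sed instead of a JSON parser is a simplification. The exit codes follow the common monitoring-plugin convention (0 OK, 1 warning, 2 error, 3 unknown), a design choice for this example only.

```shell
# health_status: print the top-level status (HEALTH_OK, HEALTH_WARN or
# HEALTH_ERR) from "ceph health --format json" output on stdin.
health_status() {
    grep -o '"status": *"HEALTH_[A-Z]*"' | head -n 1 | sed 's/.*"\(HEALTH_[A-Z]*\)"$/\1/'
}

# check_cluster (hypothetical helper): run the clyso checks on demand,
# then print the overall status and map it to an exit code.
check_cluster() {
    local status
    ceph clyso health check >/dev/null || return 3
    status=$(ceph health --format json | health_status)
    echo "${status:-UNKNOWN}"
    case "$status" in
        HEALTH_OK) return 0 ;;
        HEALTH_WARN) return 1 ;;
        HEALTH_ERR) return 2 ;;
        *) return 3 ;;
    esac
}
```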

Recovery Tools

The module provides an interface for running CLYSO recovery tools. To see the list of available commands:

ceph --help | grep 'clyso recover'

The recovery tools are disabled by default. To enable them:

ceph config set mgr mgr/clyso/enable_recovery_tools true

To start CephFS metadata recovery:

ceph clyso recover fs metadata start <fs> [<nranks:int>] [<nworkers:int>]

To check the status and view the recovery log:

ceph clyso recover fs metadata status
ceph clyso recover fs metadata log

To abort CephFS metadata recovery:

ceph clyso recover fs metadata abort

To start CephFS journal recovery:

ceph clyso recover fs journal start <fs> [<ranks>]

To check the status and view the recovery log:

ceph clyso recover fs journal status
ceph clyso recover fs journal log

To abort CephFS journal recovery:

ceph clyso recover fs journal abort

If the clyso health check is on, a warning about any in-progress recovery is displayed in the Ceph cluster health status.
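Because recovery operations are disruptive, a wrapper can refuse to start one unless the tools were enabled deliberately. This is a sketch with a hypothetical helper name (start_fs_recovery); it only reads the documented mgr/clyso/enable_recovery_tools option and then runs the documented start command, passing any extra arguments (nranks/nworkers or ranks) through unchanged.

```shell
# start_fs_recovery (hypothetical helper):
# $1 = metadata|journal, $2 = file system name, further args passed through.
start_fs_recovery() {
    local kind=$1 fs=$2 enabled
    case "$kind" in
        metadata|journal) ;;
        *) echo "first argument must be metadata or journal" >&2; return 2 ;;
    esac
    if [ -z "$fs" ]; then
        echo "file system name required" >&2
        return 2
    fi
    shift 2
    enabled=$(ceph config get mgr mgr/clyso/enable_recovery_tools) || return 1
    if [ "$enabled" != "true" ]; then
        echo "recovery tools are disabled; enable them first with:" >&2
        echo "  ceph config set mgr mgr/clyso/enable_recovery_tools true" >&2
        return 1
    fi
    ceph clyso recover fs "$kind" start "$fs" "$@"
}
```

For example, start_fs_recovery journal cephfs starts journal recovery for a file system named cephfs (an example name) only if the tools are enabled.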

Troubleshooting

A lot of useful information can be found in the mgr log after enabling debug logging:

ceph config set mgr mgr/clyso/log_level debug
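To avoid leaving debug logging on, the level can be raised only for the duration of one command. In this sketch (with_clyso_debug is a hypothetical name) the override is removed afterwards with ceph config rm, so the option falls back to its default; if a non-default level was set before, record it first with ceph config get.

```shell
# with_clyso_debug (hypothetical helper): run one command with the clyso
# module's log level at debug, then remove the override. The command's
# exit status is preserved.
with_clyso_debug() {
    local rc
    ceph config set mgr mgr/clyso/log_level debug || return 1
    "$@"
    rc=$?
    ceph config rm mgr mgr/clyso/log_level
    return "$rc"
}
```

For example, with_clyso_debug ceph clyso diagnostics collect status, then inspect the mgr log.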

Run ceph mgr dump to see all supported config options.