Ceph Copilot
Installation
The CES image includes the Copilot CLI tool, which is installed by default. Installing through Cephadm is the easiest way to get started with Copilot.
Enter the cephadm shell with `cephadm shell`; you should then be able to run `ceph-copilot --help` and see the help menu.
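For example, assuming cephadm is already bootstrapped on the host you are logged into (host names and prompts will vary):
$ cephadm shell
$ ceph-copilot --help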
The Clyso Ceph Copilot Assistant
Ceph Copilot is a CLI assistant designed to help administrators manage their Ceph clusters more efficiently. The tool provides a variety of features to help validate cluster health, simplify complex maintenance tasks, and optimize configurations for improved performance and stability.
What does Copilot do?
- Cluster Validation: Ceph Copilot checks the health of your Ceph cluster and validates its configuration to ensure optimal performance and reliability.
- Advanced Monitoring and Advising: Future versions of Ceph Copilot will include agents that monitor OSDs, MDSs, RGWs, and other cluster daemons, providing real-time insights and advice for improved configurations.
Usage
Help Command
$ ceph-copilot --help
usage: copilot [command]
Ceph Copilot: Your Expert Ceph Assistant.
optional arguments:
-h, --help show this help message and exit
--version, -v, -V show program's version number and exit
subcommands:
valid subcommands
{help,cluster,pools,toolkit}
help Show this help message and exit
cluster List of commands related to the cluster
pools Operations and management of Ceph pools
toolkit A selection of useful Ceph Tools
If you encounter any bugs, please report them at https://ticket.clyso.com/
Cluster Command
Checkup Command
The checkup command performs an overall health and safety check on the cluster, validating its configuration to ensure optimal performance and reliability.
$ ceph-copilot cluster checkup
Running tests: ...!.!.X.!.!..............X...X!..!
Overall score: 29 out of 35 (B-)
- WARN in Version/Check for Known Issues in Running Version: Info: Found 1 low severity issue(s) in running version 17.2.7-1
- WARN in Operating System/OS Support: Operating System is Unknown
- FAIL in Pools/Recommended Flags: Some pools have missing flags
- WARN in Pools/Pool Autoscale Mode: pg_autoscaler is on which may cause unexpected data movement
- WARN in Pools/Pool CRUSH Failure Domain Buckets: Not enough crush failure domain buckets for some pools
- FAIL in OSD Health/Check BlueFS DB/Journal is on Flash: All OSDs have bluefs db/wal or journal on rotational device
- FAIL in OSD Health/OSD host memory: All OSD hosts have insufficient memory
- WARN in OSD Health/OSD host swap: Some OSD hosts have swap enabled
- WARN in OSD Health/Dedicated Cluster Network: Public and Cluster Networks are Shared
Use --verbose for details and recommendations
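The summary suggests re-running with --verbose; assuming the flag is accepted by the checkup subcommand as the hint implies, the detailed findings and recommendations can be requested like this:
$ ceph-copilot cluster checkup --verbose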
Toolkit Command
The toolkit command provides a selection of useful Ceph tools.
$ ceph-copilot toolkit --help
usage: copilot toolkit [-h] {list,run} ...
positional arguments:
{list,run}
list List the included Ceph tools
run Run an included Ceph tool
optional arguments:
-h, --help show this help message and exit
Example of the toolkit list command:
$ ceph-copilot toolkit list
Ceph Tools are installed to /usr/libexec/ceph-copilot/tools
Tools:
clyso-cephfs-recover-metadata
clyso-rgw-find-missing
clyso-ceph-diagnostics-collect
clyso-rados-bulk
contrib/jj_ceph_balancer
cern/upmap-remapped.py
Toolkit run example: contrib/jj_ceph_balancer
The contrib/jj_ceph_balancer tool is a Ceph balancer optimized for equal OSD storage utilization and PG placements across all pools. This can be run with the following command:
ceph-copilot toolkit run contrib/jj_ceph_balancer -h
Running tool: contrib/jj_ceph_balancer -h
usage: jj-ceph-balancer [-h] [-v] [-q] [--osdsize {device,weighted,crush}]
{gather,show,showremapped,balance,poolosddiff,repairstats,test,osdmap}
...
Ceph balancer optimized for equal OSD storage utilization and PG placements across all pools.
positional arguments:
{gather,show,showremapped,balance,poolosddiff,repairstats,test,osdmap}
gather only gather cluster information, i.e. generate a state file
repairstats which OSDs repaired their stored data?
test test internal stuff
osdmap compatibility with ceph osd maps
optional arguments:
-h, --help show this help message and exit
-v, --verbose increase program verbosity
-q, --quiet decrease program verbosity
--osdsize {device,weighted,crush}
what parameter to take for determining the osd size. default: crush. device=device_size, weighted=devsize*weight, crush=crushweight*weight
This is an adaptation of JJ's Ceph Balancer (https://github.com/TheJJ/ceph-balancer).
This balancer doesn't change your cluster in any way; it only prints the commands you can run to generate the movements.
Example: for at most 10 PG movements:
`jj-ceph-balancer -v balance --max-pg-moves 10 | tee /tmp/balance-upmaps`
If you're satisfied, run: $ bash /tmp/balance-upmaps
To get a pool and OSD usage overview:
`jj-ceph-balancer show --osds --per-pool-count --sort-utilization`
Check out more options with `jj-ceph-balancer --help`.
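Since the toolkit wrapper passes arguments straight through to the tool (as with -h above), the same usage overview can presumably also be obtained via Copilot:
$ ceph-copilot toolkit run contrib/jj_ceph_balancer show --osds --per-pool-count --sort-utilization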
Pools Command
Example of the pools pg distribution command:
$ ceph-copilot pools pg distribution
# NumSamples = 270; Min = 86.00; Max = 160.00
# Mean = 120.044444; Variance = 119.583210; SD = 10.935411; Median 124.000000
# each # represents a count of 2
86.0000 - 93.4000 [ 18]: #########
93.4000 - 100.8000 [ 0]:
100.8000 - 108.2000 [ 29]: ##############
108.2000 - 115.6000 [ 7]: ###
115.6000 - 123.0000 [ 48]: ########################
123.0000 - 130.4000 [ 166]: ###################################################################################
130.4000 - 137.8000 [ 0]:
137.8000 - 145.2000 [ 0]:
145.2000 - 152.6000 [ 0]:
152.6000 - 160.0000 [ 2]: #
This example cluster has 270 OSDs (NumSamples = 270); each sample is the number of PGs placed on one OSD. The histogram shows how the PGs are distributed across the OSDs.
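A spread this wide (from 86 to 160 PGs per OSD) is the kind of imbalance the contrib/jj_ceph_balancer tool above targets. Assuming the toolkit wrapper passes arguments through as shown earlier, a dry run that only prints the proposed remapping commands could look like:
$ ceph-copilot toolkit run contrib/jj_ceph_balancer -v balance --max-pg-moves 10 | tee /tmp/balance-upmaps
$ bash /tmp/balance-upmaps   # only after reviewing the proposed movements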