Ceph Day NYC 2023
The Ceph Community hosted its first post-pandemic event at the Bloomberg offices in New York City. Ceph Day NYC was a great success!
Clyso's Mark Nelson has written the first part in a series looking at performance testing of the upcoming Ceph Reef release vs the previous Quincy release. See the blog post here!
Please feel free to contact us if you are interested in Ceph support or performance consulting!
After more than four years of development, mClock is the default scheduler in Ceph Quincy (version 17). If you don't want to use the mClock scheduler, you can switch to another one with the osd_op_queue option (see the example below).
WPQ was the default scheduler before Ceph Quincy, and changing osd_op_queue requires a restart of the OSDs.
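A minimal sketch of switching back to the WPQ scheduler; the restart commands assume a cephadm-managed cluster, and the OSD id is only an example:

```bash
# Check the currently configured scheduler
ceph config get osd osd_op_queue

# Switch back to the pre-Quincy default WPQ scheduler
ceph config set osd osd_op_queue wpq

# The change only takes effect after restarting the OSDs,
# e.g. per daemon on a cephadm cluster (osd.0 is an example):
ceph orch daemon restart osd.0
```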
Source:
https://docs.ceph.com/en/quincy/rados/configuration/osd-config-ref/#confval-osd_op_queue
https://docs.ceph.com/en/quincy/rados/configuration/osd-config-ref/#qos-based-on-mclock
After a reboot of the MDS server, it can happen that the CephFS file system becomes read-only:
HEALTH_WARN 1 MDSs are read only
[WRN] MDS_READ_ONLY: 1 MDSs are read only
mds.XXX(mds.0): MDS in read-only mode
https://tracker.ceph.com/issues/58082
In the MDS log you will find the following entries:
log_channel(cluster) log [ERR] : failed to commit dir 0x1 object, errno -22
mds.0.11963 unhandled write error (22) Invalid argument, force readonly...
mds.0.cache force file system read-only
log_channel(cluster) log [WRN] : force file system read-only
mds.0.server force_clients_readonly
This is a known upstream issue, though the fix is not yet merged.
As a workaround, you can use the following steps:
ceph config set mds mds_dir_max_commit_size 80
ceph fs fail <fs_name>
ceph fs set <fs_name> joinable true
If this is not successful, you may need to increase mds_dir_max_commit_size further, e.g. to 160, and repeat the steps as in the sketch below.
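A minimal sketch of retrying the workaround with the larger value; <fs_name> is a placeholder for your file system name:

```bash
# Retry the workaround with a larger commit size
ceph config set mds mds_dir_max_commit_size 160
ceph fs fail <fs_name>
ceph fs set <fs_name> joinable true
```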
Our bugfix from earlier this year was published in the Ceph Quincy release 17.2.4.
Trimming of PGLog dups is now controlled by size instead of the version. This fixes the PGLog inflation issue that was happening when online (in OSD) trimming jammed after a PG split operation. Also, a new offline mechanism has been added: ceph-objectstore-tool now has a trim-pg-log-dups op that targets situations where an OSD is unable to boot due to those inflated dups. If that is the case, in OSD logs the “You can be hit by THE DUPS BUG” warning will be visible. Relevant tracker: https://tracker.ceph.com/issues/53729
OSDs with unlimited RAM growth
How to identify OSDs affected by the PG dup bug
https://docs.ceph.com/en/latest/releases/quincy/#v17-2-4-quincy
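If an OSD cannot boot because of inflated dups, the release notes point to the new offline op. A rough sketch of its use, assuming a non-containerized OSD and using an example data path and PG id:

```bash
# Run only while the affected OSD is stopped; OSD id, data path and
# PG id below are examples and must be adapted to the affected OSD/PG.
systemctl stop ceph-osd@0

ceph-objectstore-tool \
  --data-path /var/lib/ceph/osd/ceph-0 \
  --op trim-pg-log-dups \
  --pgid 2.7

systemctl start ceph-osd@0
```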
When the MDS cache runs full, the process must clear inodes from its cache. This also means that the MDS will prompt some clients to clear some inodes from their caches as well.
The MDS asks the CephFS client several times to release the inodes. If the client does not respond to this cache recall request, Ceph will log this warning.
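To see how much cache the MDS is allowed to use and how full it currently is, commands along these lines can help; the daemon name is an example:

```bash
# Configured MDS cache size (mds_cache_memory_limit, 4 GiB by default)
ceph config get mds mds_cache_memory_limit

# Current cache usage of one MDS daemon (daemon name is an example)
ceph tell mds.cephfs-a cache status
```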
If you had to recreate the device_health or .mgr pool, the devicehealth module is missing its SQLite3 database structure. You have to recreate the structure manually.
"backtrace": [
    " File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 373, in serve\n self.scrape_all()",
    " File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 425, in scrape_all\n self.put_device_metrics(device, data)",
    " File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 500, in put_device_metrics\n self._create_device(devid)",
    " File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 487, in _create_device\n cursor = self.db.execute(SQL, (devid,))",
    "sqlite3.InternalError: unknown operation"
]
apt install libsqlite3-mod-ceph libsqlite3-mod-ceph-dev
clyso@compute-21:~$ sqlite3 -cmd '.load libcephsqlite.so' -cmd '.open file:///.mgr:devicehealth/main.db?vfs=ceph'
SQLite version 3.39.1 2022-07-13 19:41:41
Enter ".help" for usage hints.
sqlite>
clyso@compute-21:~$ sqlite3 -cmd '.load libcephsqlite.so' -cmd '.databases'
main: "" r/w
SQLite version 3.39.1 2022-07-13 19:41:41
Enter ".help" for usage hints.
sqlite>
clyso@compute-21:~$ sqlite3 -cmd '.load libcephsqlite.so' -cmd '.open file:///.mgr:devicehealth/main.db?vfs=ceph'
SQLite version 3.39.1 2022-07-13 19:41:41
Enter ".help" for usage hints.
sqlite> CREATE TABLE IF NOT EXISTS MgrModuleKV (
key TEXT PRIMARY KEY,
value NOT NULL
) WITHOUT ROWID;
sqlite> INSERT OR IGNORE INTO MgrModuleKV (key, value) VALUES ('__version', 0);
sqlite> .tables
Device DeviceHealthMetrics MgrModuleKV
sqlite>
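After recreating the structure, one way to verify that the devicehealth module works again is to fail over the active mgr and check the device listing; this is a sketch, not an official procedure, and on older releases `ceph mgr fail` may require the mgr name as an argument:

```bash
# Fail over to a standby mgr so the devicehealth module reconnects
# to the freshly created database
ceph mgr fail

# The module should now be able to scrape and list devices again
ceph device ls
ceph health detail
```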
Sources:
https://ceph.io/en/news/blog/2021/new-in-pacific-sql-on-ceph
https://docs.ceph.com/en/latest/rados/api/libcephsqlite/
https://docs.ceph.com/en/latest/rados/api/libcephsqlite/#usage
https://github.com/ceph/ceph/blob/main/src/pybind/mgr
https://github.com/ceph/ceph/blob/main/src/pybind/mgr/devicehealth/module.py
We have intensively tested Ceph S3 together with OpenStack Swift before. We were interested in the behavior of the radosgw stack in Ceph. We paid particular attention to the size and number of objects in relation to the resource consumption of the radosgw process. The effects on radosgw response latencies were also important to us, so that we could plan the right sizing of the physical and virtual environments.
From a technical point of view, we were interested in the behavior of radosgw in the following areas.
When choosing the right tool, it was important for us to be able to test both small and large Ceph clusters with several thousand OSDs.
We wanted to use the test results as files for evaluation as well as have a graphical representation as time series data.
For time series data we rely on the standard stack with Grafana, Prometheus and Thanos.
The main Prometheus exporters we use are ceph-mgr-exporter and node-exporter.
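For reference, a minimal way to expose the Ceph metrics that Prometheus and Grafana consume is the built-in mgr prometheus module; port 9283 is the module's default, and the host name below is a placeholder:

```bash
# Enable the built-in Prometheus exporter of the ceph-mgr
ceph mgr module enable prometheus

# Metrics are served by the active mgr on port 9283 by default
curl -s http://<active-mgr-host>:9283/metrics | head
```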
CBT is a testing harness written in Python.
https://github.com/ceph/cbt
s3-tests is a set of unofficial Amazon AWS S3 compatibility tests.
https://github.com/ceph/s3-tests
COSBench is a benchmarking tool to measure the performance of Cloud Object Storage services.
https://github.com/intel-cloud/cosbench
Gosbench is the Golang reimplementation of COSBench. It is a distributed S3 performance benchmark tool with a Prometheus exporter, leveraging the official Golang AWS SDK.
https://github.com/mulbc/gosbench
hsbench is an S3 compatible benchmark originally based on wasabi-tech/s3-benchmark.
https://github.com/markhpc/hsbench
Minio - S3 benchmarking tool.
getput can be run individually on a test client.
gpsuite is responsible for synchronization and scaling across any number of test clients. Communication takes place via SSH keys, and the simultaneous start of all S3 test clients is synchronized over a common time base.
Installation on Linux as a script or as a container is supported.
If you don't want to set flags like noout or noup for the whole cluster, you can use ceph osd set-group and ceph osd unset-group to set the appropriate flag for a group of OSDs or even whole hosts.
ceph osd set-group <flags> <who>
ceph osd unset-group <flags> <who>
For example, set noout for a whole host with OSDs:
ceph osd set-group noout clyso-ceph-node3
```bash
root@clyso-ceph-node1:~# ceph health detail
HEALTH_WARN 1 OSDs or CRUSH {nodes, device-classes} have {NOUP,NODOWN,NOIN,NOOUT} flags set
[WRN] OSD_FLAGS: 1 OSDs or CRUSH {nodes, device-classes} have {NOUP,NODOWN,NOIN,NOOUT} flags set
host clyso-ceph-node3 has flags noout
root@clyso-ceph-node1:~# ceph osd unset-group noout clyso-ceph-node3
root@clyso-ceph-node1:~# ceph health detail
HEALTH_OK
root@clyso-ceph-node1:~#
```
Sources:
https://docs.ceph.com/en/quincy/rados/operations/health-checks/#osd-flags
Unlock a Ceph Dashboard user via the command line.
ceph dashboard ac-user-enable <username>
Example with the admin user:
ceph dashboard ac-user-enable admin
https://docs.ceph.com/en/quincy/mgr/dashboard/#enable-a-locked-user