101 posts tagged with "ceph"

Ceph Reef vs Quincy RBD Performance

March 27, 2023 · One min read

Managing Director at Clyso

Clyso's Mark Nelson has written the first part in a series looking at performance testing of the upcoming Ceph Reef release vs the previous Quincy release. See the blog post here!

Please feel free to contact us if you are interested in Ceph support or performance consulting!

ceph - how do disable mclock scheduler

March 22, 2023 · One min read

Joachim Kraftmayer

Managing Director at Clyso

After more than 4 years of development, mclock is the default scheduler for ceph quincy (version 17).If you don't want to use the scheduler, you can disable it with the option osd_op_queue.

WPQ was the default before Ceph Quincy and the change requires a restart of the OSDs.

Source:

https://docs.ceph.com/en/quincy/rados/configuration/osd-config-ref/#confval-osd_op_queue"

https://docs.ceph.com/en/quincy/rados/configuration/osd-config-ref/#qos-based-on-mclock>"

Fix CephFS Filesystem Read-Only

January 6, 2023 · One min read

Joachim Kraftmayer

Managing Director at Clyso

After a reboot of the MDS Server it can happen that the CephFS Filesystem becomes read-only:

HEALTH_WARN 1 MDSs are read only
[WRN] MDS_READ_ONLY: 1 MDSs are read only
    mds.XXX(mds.0): MDS in read-only mode
[https://tracker.ceph.com/issues/58082](https://tracker.ceph.com/issues/58082)

In the MDS log you will find following entry

log_channel(cluster) log [ERR] : failed to commit dir 0x1 object, errno -22 mds.0.11963 unhandled write error (22) Invalid argument, force readonly... mds.0.cache force file system read-only log_channel(cluster) log [WRN] : force file system read-only mds.0.server force_clients_readonly

https://tracker.ceph.com/issues/58082

This is a known upstream issue thought the fix is still not merged

As a workaround you can use following steps:

ceph config set mds mds_dir_max_commit_size 80
ceph fs fail <fs_name>
ceph fs set <fs_name> joinable true

If not successful you may need to increase the mds_dir_max_commit_size, e.g. to 160

ceph Quincy release with bugfix for PGLog dups

November 5, 2022 · One min read

Joachim Kraftmayer

Managing Director at Clyso

Our bugfix from earlier this year was published in the ceph quincy release 17.2.4.

Trimming of PGLog dups is now controlled by size instead of the version. This fixes the PGLog inflation issue that was happening when online (in OSD) trimming jammed after a PG split operation. Also, a new offline mechanism has been added: ceph-objectstore-tool now has a trim-pg-log-dups op that targets situations where an OSD is unable to boot due to those inflated dups. If that is the case, in OSD logs the “You can be hit by THE DUPS BUG” warning will be visible. Relevant tracker: https://tracker.ceph.com/issues/53729"

osds with unlimited ram growth

how to identify osds affected by pg dup bug

Sources

https://docs.ceph.com/en/latest/releases/quincy/#v17-2-4-quincy

\[WRN\] clients failing to respond to cache pressure

November 5, 2022 · One min read

Joachim Kraftmayer

Managing Director at Clyso

At the time when the MDS cache runs full, the process must clear inodes from its cache. This also means that the MDS will prompt some clients to also clear some inodes from their cache.

The MDS asks the cephfs client several times to release the inodes. If the client does not respond to this cache recall request, Ceph will log this warning.

ceph-mgr recreate sqlite database for healthdevice module

August 2, 2022 · One min read

Joachim Kraftmayer

Managing Director at Clyso

if you had to recreate the device_health or .mgr pool, the healthdevice module is missing his sqlite3 database structure. You have recreate the structure manually.

crash events

backtrace": &#91;
    "  File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 373, in serve\n    self.scrape_all()",
    "  File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 425, in scrape_all\n    self.put_device_metrics(device, data)",
    "  File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 500, in put_device_metrics\n    self._create_device(devid)",
    "  File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 487, in _create_device\n    cursor = self.db.execute(SQL, (devid,))",
    "sqlite3.InternalError: unknown operation"

apt install libsqlite3-mod-ceph libsqlite3-mod-ceph-dev

create database

clyso@compute-21:~$ sqlite3 -cmd '.load libcephsqlite.so' -cmd '.open file:///.mgr:devicehealth/main.db?vfs=ceph'
main: "" r/w
SQLite version 3.39.1 2022-07-13 19:41:41
Enter ".help" for usage hints.
sqlite>

list databases

clyso@compute-21:~$ sqlite3 -cmd '.load libcephsqlite.so' -cmd '.databases'
main: "" r/w
SQLite version 3.39.1 2022-07-13 19:41:41
Enter ".help" for usage hints.
sqlite>

create table

clyso@compute-21:~$ sqlite3 -cmd '.load libcephsqlite.so' -cmd '.open file:///.mgr:devicehealth/main.db?vfs=ceph'
SQLite version 3.39.1 2022-07-13 19:41:41
Enter ".help" for usage hints.
sqlite> CREATE TABLE IF NOT EXISTS MgrModuleKV (
key TEXT PRIMARY KEY,
value NOT NULL
) WITHOUT ROWID;
sqlite> INSERT OR IGNORE INTO MgrModuleKV (key, value) VALUES ('__version', 0);
sqlite> .tables
Device               DeviceHealthMetrics  MgrModuleKV
sqlite>

sources

https://ceph.io/en/news/blog/2021/new-in-pacific-sql-on-ceph https://docs.ceph.com/en/latest/rados/api/libcephsqlite/ https://docs.ceph.com/en/latest/rados/api/libcephsqlite/#usage https://github.com/ceph/ceph/blob/main/src/pybind/mgr https://github.com/ceph/ceph/blob/main/src/pybind/mgr/devicehealth/module.py

Ceph S3 load and performance test

August 1, 2022 · 2 min read

Joachim Kraftmayer

Managing Director at Clyso

motivation

we have tested ceph s3 in openstack swift intensively before. We were interested in the behavior of the radosgw stack in ceph. We paid particular attention to the size and number of objects in relation to the resource consumption of the radosgw process. Effects on response latencies of radosgw were also important to us. To be able to plan the right sizing of the physical and virtual environments.

technical topics

From a technical point of view, we were interested in the behavior of radosgw in the following topics.

dynamic bucket sharding
http frontend difference between Civetweb and Beast
index pool io pattern and latencies
data pool io pattern and latencies with erasure-coded and replicated pools
fast_read vs. standard read for workloads with large and small objects.

requirements

when choosing the right tool, it was important for us to be able to test both small and large ceph clusters with several thousand osds.

We want to use the test results as a file for evaluation as well as have a graphical representation as timeseries data.

For timeseries data we rely on the standard stack with Grafana, Prometheus and Thanos.

the main prometheus exporters we use are ceph-mgr-exporter and node-exporter.

load and performance tools

CBT - The Ceph Benchmarking Tool

CBT is a testing harness written in python

https://github.com/ceph/cbt

s3 - tests

This is a set of unofficial Amazon AWS S3 compatibility tests

https://github.com/ceph/s3-tests

COSBench - Cloud Object Storage Benchmark

COSBench is a benchmarking tool to measure the performance of Cloud Object Storage services.

https://github.com/intel-cloud/cosbench

Gosbench

Gosbench is the Golang reimplementation of Cosbench. It is a distributed S3 performance benchmark tool with Prometheus exporter leveraging the official Golang AWS SDK

https://github.com/mulbc/gosbench

hsbench

hsbench is an S3 compatable benchmark originally based on wasabi-tech/s3-benchmark.

https://github.com/markhpc/hsbench

Warp

Minio - S3 benchmarking tool.

https://github.com/minio/warp

the tool of our choice

getput

getput can be run individually on a test client.

gpsuite is responsible for synchronization and scaling across any number of test clients. Communication takes place via ssh keys and the simultaneous start of all s3 test clients is synchronized over a common time base.

Installation on linux as script or as container is supported.

https://github.com/markseger/getput

ceph osd set-group

July 20, 2022 · One min read

Joachim Kraftmayer

Managing Director at Clyso

If you don't want to set flags for the whole cluster, like noout or noup. Then you can also use ceph osd set-group and ceph osd unset-group to set the appropriate flag for a group of osds or even whole hosts.

ceph osd set-group <flags> <who>
ceph osd unset-group <flags> <who>

for example set noout for a whole host with osds

ceph osd set-group noout clyso-ceph-node3
``

```bash
root@clyso-ceph-node1:~# ceph health detail
HEALTH_WARN 1 OSDs or CRUSH {nodes, device-classes} have {NOUP,NODOWN,NOIN,NOOUT} flags set
[WRN] OSD_FLAGS: 1 OSDs or CRUSH {nodes, device-classes} have {NOUP,NODOWN,NOIN,NOOUT} flags set
    host clyso-ceph-node3 has flags noout

ceph osd unset-group noout clyso-ceph-node3

root@clyso-ceph-node1:~# ceph health detail
HEALTH_OK
root@clyso-ceph-node1:

Sources:

docs.ceph.com/en/quincy/rados/operations/health-checks/#osd-flags

ceph unlock/enable a locked dashboard user

July 15, 2022 · One min read

Joachim Kraftmayer

Managing Director at Clyso

unlock dashboard user

Unlock a ceph dashboard user via commandline.

ceph dashboard ac-user-enable <username>

example with admin user

ceph dashboard ac-user-enable admin

sources

https://docs.ceph.com/en/quincy/mgr/dashboard/#enable-a-locked-user

ceph mds increase memory without downtime

July 13, 2022 · One min read

Joachim Kraftmayer

Managing Director at Clyso

get config (default: 4G)

ceph daemon mds.&lt;mds-id&gt; config get mds_cache_memory_limit

ceph daemon /var/run/ceph/<fsid>/<mds-id> config get mds_cache_memory_limit

ceph tell mds.storefs-a config show |grep mds_cache_memory_limit

set config on the fly not persistent (to 64 GB)

ceph daemon mds.<mds.id> config set mds_cache_memory_limit 68719476736

ceph daemon /var/run/ceph/<fsid>/<mds-id> set mds_cache_memory_limit 68719476736

ceph tell mds.storefs-a injectargs --mds_cache_memory_limit 68719476736

persist config ( to 64 GB)

ceph config set mds mds_cache_memory_limit 68719476736

related posts​

Sources​

crash events​

create database​

list databases​

create table​

sources​

motivation​

technical topics​​

requirements​

load and performance tools​​

CBT - The Ceph Benchmarking Tool​

s3 - tests​

COSBench - Cloud Object Storage Benchmark​

Gosbench​

hsbench​

Warp​

the tool of our choice​​

getput​

unlock dashboard user​

sources​

get config (default: 4G)​

set config on the fly not persistent (to 64 GB)​

persist config ( to 64 GB)​

related posts

Sources

crash events

create database

list databases

create table

sources

motivation

technical topics

requirements

load and performance tools

CBT - The Ceph Benchmarking Tool

s3 - tests

COSBench - Cloud Object Storage Benchmark

Gosbench

hsbench

Warp

the tool of our choice

getput

unlock dashboard user

sources

get config (default: 4G)

set config on the fly not persistent (to 64 GB)

persist config ( to 64 GB)