Skip to main content

rook ceph validate that the RBD cache is active inside the k8s pod

· 2 min read
Joachim Kraftmayer
Managing Director at Clyso

validate if the RBD Cache is active on your client

By default the cache is enabled, since Version 0.87.

To enable the cache on the client side you have to add following config /etc/ceph/ceph.conf:

[client]
rbd cache = true
rbd cache writethrough until flush = true

add local admin socket

So that you can also verify the status on the client side, you must add the following two parameters:

[client]
admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok
log file = /var/log/ceph/

configure permissions and security

Both paths must be writable by the user who uses the RBD library. Applications such as SELinux or AppArmor must be properly configured.

request infos via admin socket

Once this is done, run your application that is supposed to use librbd (kvm, docker, podman, ...) and request the information via the admin daemon socket:

$ sudo ceph --admin-daemon /var/run/ceph/ceph-client.admin.66606.140190886662256.asok config show | grep rbd_cache "rbd_cache": "true", "rbd_cache_writethrough_until_flush": "true", "rbd_cache_size": "33554432", "rbd_cache_max_dirty": "25165824", "rbd_cache_target_dirty": "16777216", "rbd_cache_max_dirty_age": "1", "rbd_cache_max_dirty_object": "0", "rbd_cache_block_writes_upfront": "false",

Verify the cache behaviour

To compare the performance difference you can test the cache, you can deactivate it in the [client] section in your ceph.conf as follows:

[client]
rbd cache = false

Then run a fio benchmark with the following command:

fio --ioengine=rbd --pool=<pool-name> --rbdname=rbd1 --direct=1 --fsync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_base

Finally, run this test with RBD client cache enabled and disabled and you should notice a significant difference.

Sources

https://www.sebastien-han.fr/blog/2015/09/02/ceph-validate-that-the-rbd-cache-is-active/

ceph-mgr recreate sqlite database for healthdevice module

· One min read
Joachim Kraftmayer
Managing Director at Clyso

if you had to recreate the device_health or .mgr pool, the healthdevice module is missing his sqlite3 database structure. You have recreate the structure manually.

crash events

backtrace": &#91;
" File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 373, in serve\n self.scrape_all()",
" File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 425, in scrape_all\n self.put_device_metrics(device, data)",
" File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 500, in put_device_metrics\n self._create_device(devid)",
" File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 487, in _create_device\n cursor = self.db.execute(SQL, (devid,))",
"sqlite3.InternalError: unknown operation"
apt install libsqlite3-mod-ceph libsqlite3-mod-ceph-dev

create database

clyso@compute-21:~$ sqlite3 -cmd '.load libcephsqlite.so' -cmd '.open file:///.mgr:devicehealth/main.db?vfs=ceph'
main: "" r/w
SQLite version 3.39.1 2022-07-13 19:41:41
Enter ".help" for usage hints.
sqlite>

list databases

clyso@compute-21:~$ sqlite3 -cmd '.load libcephsqlite.so' -cmd '.databases'
main: "" r/w
SQLite version 3.39.1 2022-07-13 19:41:41
Enter ".help" for usage hints.
sqlite>

create table

clyso@compute-21:~$ sqlite3 -cmd '.load libcephsqlite.so' -cmd '.open file:///.mgr:devicehealth/main.db?vfs=ceph'
SQLite version 3.39.1 2022-07-13 19:41:41
Enter ".help" for usage hints.
sqlite> CREATE TABLE IF NOT EXISTS MgrModuleKV (
key TEXT PRIMARY KEY,
value NOT NULL
) WITHOUT ROWID;
sqlite> INSERT OR IGNORE INTO MgrModuleKV (key, value) VALUES ('__version', 0);
sqlite> .tables
Device DeviceHealthMetrics MgrModuleKV
sqlite>

sources

https://ceph.io/en/news/blog/2021/new-in-pacific-sql-on-ceph https://docs.ceph.com/en/latest/rados/api/libcephsqlite/ https://docs.ceph.com/en/latest/rados/api/libcephsqlite/#usage https://github.com/ceph/ceph/blob/main/src/pybind/mgr https://github.com/ceph/ceph/blob/main/src/pybind/mgr/devicehealth/module.py

Ceph S3 load and performance test

· 2 min read
Joachim Kraftmayer
Managing Director at Clyso

motivation

we have tested ceph s3 in openstack swift intensively before. We were interested in the behavior of the radosgw stack in ceph. We paid particular attention to the size and number of objects in relation to the resource consumption of the radosgw process. Effects on response latencies of radosgw were also important to us. To be able to plan the right sizing of the physical and virtual environments.

technical topics​

From a technical point of view, we were interested in the behavior of radosgw in the following topics.

  • dynamic bucket sharding
  • http frontend difference between Civetweb and Beast
  • index pool io pattern and latencies
  • data pool io pattern and latencies with erasure-coded and replicated pools
  • fast_read vs. standard read for workloads with large and small objects.

requirements

when choosing the right tool, it was important for us to be able to test both small and large ceph clusters with several thousand osds.

We want to use the test results as a file for evaluation as well as have a graphical representation as timeseries data.

For timeseries data we rely on the standard stack with Grafana, Prometheus and Thanos.

the main prometheus exporters we use are ceph-mgr-exporter and node-exporter.

load and performance tools​

CBT - The Ceph Benchmarking Tool

CBT is a testing harness written in python

https://github.com/ceph/cbt

s3 - tests

This is a set of unofficial Amazon AWS S3 compatibility tests

https://github.com/ceph/s3-tests

COSBench - Cloud Object Storage Benchmark

COSBench is a benchmarking tool to measure the performance of Cloud Object Storage services.

https://github.com/intel-cloud/cosbench

Gosbench

Gosbench is the Golang reimplementation of Cosbench. It is a distributed S3 performance benchmark tool with Prometheus exporter leveraging the official Golang AWS SDK

https://github.com/mulbc/gosbench

hsbench

hsbench is an S3 compatable benchmark originally based on wasabi-tech/s3-benchmark.

https://github.com/markhpc/hsbench

Warp

Minio - S3 benchmarking tool.

https://github.com/minio/warp

the tool of our choice​

getput

getput can be run individually on a test client.

gpsuite is responsible for synchronization and scaling across any number of test clients. Communication takes place via ssh keys and the simultaneous start of all s3 test clients is synchronized over a common time base.

Installation on linux as script or as container is supported.

https://github.com/markseger/getput

ceph osd set-group

· One min read
Joachim Kraftmayer
Managing Director at Clyso

If you don't want to set flags for the whole cluster, like noout or noup. Then you can also use ceph osd set-group and ceph osd unset-group to set the appropriate flag for a group of osds or even whole hosts.

ceph osd set-group <flags> <who>
ceph osd unset-group <flags> <who>

for example set noout for a whole host with osds

ceph osd set-group noout clyso-ceph-node3
``

```bash
root@clyso-ceph-node1:~# ceph health detail
HEALTH_WARN 1 OSDs or CRUSH {nodes, device-classes} have {NOUP,NODOWN,NOIN,NOOUT} flags set
[WRN] OSD_FLAGS: 1 OSDs or CRUSH {nodes, device-classes} have {NOUP,NODOWN,NOIN,NOOUT} flags set
host clyso-ceph-node3 has flags noout
ceph osd unset-group noout clyso-ceph-node3
root@clyso-ceph-node1:~# ceph health detail
HEALTH_OK
root@clyso-ceph-node1:

Sources:

docs.ceph.com/en/quincy/rados/operations/health-checks/#osd-flags

ceph mds increase memory without downtime

· One min read
Joachim Kraftmayer
Managing Director at Clyso

get config (default: 4G)

ceph daemon mds.&lt;mds-id&gt; config get mds_cache_memory_limit
ceph daemon /var/run/ceph/<fsid>/<mds-id> config get mds_cache_memory_limit
ceph tell mds.storefs-a config show |grep mds_cache_memory_limit

set config on the fly not persistent (to 64 GB)

ceph daemon mds.<mds.id> config set mds_cache_memory_limit 68719476736
ceph daemon /var/run/ceph/<fsid>/<mds-id> set mds_cache_memory_limit 68719476736
ceph tell mds.storefs-a injectargs --mds_cache_memory_limit 68719476736

persist config ( to 64 GB)

ceph config set mds mds_cache_memory_limit 68719476736

ceph iscsi - get the global gateway.conf

· One min read
Joachim Kraftmayer
Managing Director at Clyso

We have two options to get the gateway.conf:

gwcli

gwcli export mode=copy

or

rados

rados -p iscsi get gateway.conf /root/gateway.conf

At the moment there is no way to update or write the gateway.conf via gwcli command. So the only option is to use the rados command line tool.

note

Be careful when editing the content manually, it requires care and expertise.

sources

docs.ceph.com/en/latest/man/8/rados

manpages.ubuntu.com/manpages/jammy/man8/gwcli.8.html

ceph warning - HEALTH_WARN pools have many more objects per pg than average

· 2 min read
Joachim Kraftmayer
Managing Director at Clyso

You might have wondered how to get rid of the warning "ceph warning - pools have many more objects per pg than average", because you want to see your cluster in HEALTH_OK status. The option to change the thresholds for this warning is: mon_pg_warn_max_object_skew

Especially for start in production of a ceph cluster or a new pool you can set the threshold high. After a certain time you should always check the value and adjust it if necessary.

An important note for the option is that it must be set on ceph mgr, often you can find posts that set the option on ceph mon and see no effect on ceph status.

The cluster status commands 

ceph status

or

ceph health detail

shows the following warning:

[WRN] MANY_OBJECTS_PER_PG: 1 pools have many more objects per pg than average pool test objects per pg (2079) is more than 11.6798 times cluster average (178)

note

To disable the warning completely the value of mon_pg_warn_max_object_skew must be set to 0 or a negative number.

Verify the default value

ceph config get mgr mon_pg_warn_max_object_skew10.000000

inject the value:

ceph tell mgr.a injectargs '--mon_pg_warn_max_object_skew 50'

verify the value:

ceph tell mgr.a config get mon_pg_warn_max_object_skew
{
"mon_pg_warn_max_object_skew": "50.000000"
}

set the value persistent, e.g. 50 times higher

ceph config set mgr mon_pg_warn_max_object_skew 50
ceph config get mgr mon_pg_warn_max_object_skew
50.000000

cloudland conference 2022

· One min read
Joachim Kraftmayer
Managing Director at Clyso

We were speakers at the first edition of Cloudland

Cloudland is the festival of the German-speaking Cloud Native Community (DCNC), with the aim of communicating the current status quo in the use of cloud technologies and focusing in particular on future challenges.

Our contribution on Multi Cloud Deployment met with great interest at the "Container & Cloud Technologies" theme day. cloudland

cloudland

cloudland

cloudland

ceph-volume - create WAL/DB on separate device for existing OSD

· 2 min read
Joachim Kraftmayer
Managing Director at Clyso

ceph-volume can be used to create for a existing OSD a new WAL/DB on a faster device without the need to recreate the OSD.

ceph-volume lvm new-db --osd-id 15 --osd-fsid FSID --target cephdb/cephdb1
--> NameError: name 'get_first_lv' is not defined
this is a bug in ceph-volume v16.2.7 that will be fixed in v16.2.8
[https://github.com/ceph/ceph/pull/44209](https://github.com/ceph/ceph/pull/44209)

First create a new Logical Volume on the Device that will hold the new WAL/DB

vgcreate cephdb /dev/sdb
Volume group "cephdb" successfully created
lvcreate -L 100G -n cephdb1 cephdb
Logical volume "cephdb1" created.

Now stop running OSD and if it was deactivated ( cephadm ) then activate it on the host

systemctl stop ceph-FSID@osd.0.service
ceph-volume lvm activate --all --no-systemd

Create new WAL/DB on new Device

ceph-volume lvm new-db --osd-id 0 --osd-fsid OSD-FSID --target cephdb/cephdb1
--> Making new volume at /dev/cephdb/cephdb1 for OSD: 0 (/var/lib/ceph/osd/ceph-0)
Running command: /bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-0/block.db
Running command: /bin/chown -R ceph:ceph /dev/dm-1
--> New volume attached.

Migrate existing WAL/DB to new Device

ceph-volume lvm migrate --osd-id 0 --osd-fsid OSD-FSID --from data --target cephdb/cephdb1
--> Migrate to existing, Source: ['--devs-source', '/var/lib/ceph/osd/ceph-0/block'] Target: /var/lib/ceph/osd/ceph-0/block.db
--> Migration successful.

Deactivate OSD and start it

ceph-volume lvm deactivate 0
Running command: /bin/umount -v /var/lib/ceph/osd/ceph-0
stderr: umount: /var/lib/ceph/osd/ceph-0 unmounted
systemctl start ceph-FSID@osd.0.service