12 posts tagged with "tooling"

Ceph S3 load and performance test

August 1, 2022 · 2 min read

Managing Director at Clyso

motivation

we have tested ceph s3 in openstack swift intensively before. We were interested in the behavior of the radosgw stack in ceph. We paid particular attention to the size and number of objects in relation to the resource consumption of the radosgw process. Effects on response latencies of radosgw were also important to us. To be able to plan the right sizing of the physical and virtual environments.

technical topics

From a technical point of view, we were interested in the behavior of radosgw in the following topics.

dynamic bucket sharding
http frontend difference between Civetweb and Beast
index pool io pattern and latencies
data pool io pattern and latencies with erasure-coded and replicated pools
fast_read vs. standard read for workloads with large and small objects.

requirements

when choosing the right tool, it was important for us to be able to test both small and large ceph clusters with several thousand osds.

We want to use the test results as a file for evaluation as well as have a graphical representation as timeseries data.

For timeseries data we rely on the standard stack with Grafana, Prometheus and Thanos.

the main prometheus exporters we use are ceph-mgr-exporter and node-exporter.

load and performance tools

CBT - The Ceph Benchmarking Tool

CBT is a testing harness written in python

https://github.com/ceph/cbt

s3 - tests

This is a set of unofficial Amazon AWS S3 compatibility tests

https://github.com/ceph/s3-tests

COSBench - Cloud Object Storage Benchmark

COSBench is a benchmarking tool to measure the performance of Cloud Object Storage services.

https://github.com/intel-cloud/cosbench

Gosbench

Gosbench is the Golang reimplementation of Cosbench. It is a distributed S3 performance benchmark tool with Prometheus exporter leveraging the official Golang AWS SDK

https://github.com/mulbc/gosbench

hsbench

hsbench is an S3 compatable benchmark originally based on wasabi-tech/s3-benchmark.

https://github.com/markhpc/hsbench

Warp

Minio - S3 benchmarking tool.

https://github.com/minio/warp

the tool of our choice

getput

getput can be run individually on a test client.

gpsuite is responsible for synchronization and scaling across any number of test clients. Communication takes place via ssh keys and the simultaneous start of all s3 test clients is synchronized over a common time base.

Installation on linux as script or as container is supported.

https://github.com/markseger/getput

ceph osd set-group

July 20, 2022 · One min read

Joachim Kraftmayer

Managing Director at Clyso

If you don't want to set flags for the whole cluster, like noout or noup. Then you can also use ceph osd set-group and ceph osd unset-group to set the appropriate flag for a group of osds or even whole hosts.

ceph osd set-group <flags> <who>
ceph osd unset-group <flags> <who>

for example set noout for a whole host with osds

ceph osd set-group noout clyso-ceph-node3
``

```bash
root@clyso-ceph-node1:~# ceph health detail
HEALTH_WARN 1 OSDs or CRUSH {nodes, device-classes} have {NOUP,NODOWN,NOIN,NOOUT} flags set
[WRN] OSD_FLAGS: 1 OSDs or CRUSH {nodes, device-classes} have {NOUP,NODOWN,NOIN,NOOUT} flags set
    host clyso-ceph-node3 has flags noout

ceph osd unset-group noout clyso-ceph-node3

root@clyso-ceph-node1:~# ceph health detail
HEALTH_OK
root@clyso-ceph-node1:

Sources:

docs.ceph.com/en/quincy/rados/operations/health-checks/#osd-flags

ceph-volume - create WAL/DB on separate device for existing OSD

May 13, 2022 · 2 min read

Joachim Kraftmayer

Managing Director at Clyso

ceph-volume can be used to create for a existing OSD a new WAL/DB on a faster device without the need to recreate the OSD.

ceph-volume lvm new-db --osd-id 15 --osd-fsid FSID --target cephdb/cephdb1
--> NameError: name 'get_first_lv' is not defined
this is a bug in ceph-volume v16.2.7 that will be fixed in v16.2.8 
[https://github.com/ceph/ceph/pull/44209](https://github.com/ceph/ceph/pull/44209)

First create a new Logical Volume on the Device that will hold the new WAL/DB

vgcreate cephdb /dev/sdb
Volume group "cephdb" successfully created
lvcreate -L 100G -n cephdb1 cephdb
Logical volume "cephdb1" created.

Now stop running OSD and if it was deactivated ( cephadm ) then activate it on the host

systemctl stop ceph-FSID@osd.0.service
ceph-volume lvm activate --all --no-systemd

Create new WAL/DB on new Device

ceph-volume lvm new-db --osd-id 0 --osd-fsid OSD-FSID --target cephdb/cephdb1
--> Making new volume at /dev/cephdb/cephdb1 for OSD: 0 (/var/lib/ceph/osd/ceph-0)
Running command: /bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-0/block.db
Running command: /bin/chown -R ceph:ceph /dev/dm-1
--> New volume attached.

Migrate existing WAL/DB to new Device

ceph-volume lvm migrate --osd-id 0 --osd-fsid OSD-FSID --from data --target cephdb/cephdb1
--> Migrate to existing, Source: ['--devs-source', '/var/lib/ceph/osd/ceph-0/block'] Target: /var/lib/ceph/osd/ceph-0/block.db
--> Migration successful.

Deactivate OSD and start it

ceph-volume lvm deactivate 0
Running command: /bin/umount -v /var/lib/ceph/osd/ceph-0
 stderr: umount: /var/lib/ceph/osd/ceph-0 unmounted
systemctl start ceph-FSID@osd.0.service

Kubernetes resize pvc – cephfs

February 2, 2021 · One min read

Joachim Kraftmayer

Managing Director at Clyso

the mysql database was running out of space and we have to increase the pvc size for it.

nothing simple than that. just verified if we have enough free space left.

edited the pvc defintion of the mysql statefulset and set the spec storage size to 20 Gi.

a few seconds later the mysql database space was doubled.

**ceph version 15.2.8

**ceph-csi version: 3.2.1

kubectl edit pvc data-mysql-0 -n mysql

Windows drivers for RBD and maybe soon for cephfs

December 1, 2020 · One min read

Joachim Kraftmayer

Managing Director at Clyso

presentation from 2017 was presented again at SUSECON Digital 2020.

www.youtube.com/watch?v=BWZIwXLcNts

download

beta.suse.com/private/SLE15/SP2/download/SES7/SES4Win/?_ga=2.220676824.150232264.1610460395-709272379.1610460395

sources

suse.com/betaprogram/suse-enterprise-storage-windows-driver-beta/

Search for certificates for a specific domain

July 30, 2020 · One min read

Joachim Kraftmayer

Managing Director at Clyso

Rather by chance, I came across the following website with the service for searching for certificates with the history for a domain.

If you are wondering how many and which certificates you have used for a domain, you can use this link.

crt.sh

ceph bug of the year 2020 - CERN

June 29, 2020 · One min read

Joachim Kraftmayer

Managing Director at Clyso

interesting insights into how dependency on external operating system libraries can affect the operation of Ceph.

https://www.youtube.com/watch?v=_4HUR00oCGo

https://codimd.web.cern.ch/p/rkNZH4JR8?print-pdf#/

speed up or slow down ceph recovery

June 12, 2020 · One min read

Joachim Kraftmayer

Managing Director at Clyso

osd max backfills: This is the maximum number of backfill operations allowed to/from OSD. The higher the number, the quicker the recovery, which might impact overall cluster performance until recovery finishes.
osd recovery max active: This is the maximum number of active recover requests. Higher the number, quicker the recovery, which might impact the overall cluster performance until recovery finishes.
osd recovery op priority: This is the priority set for recovery operation. Lower the number, higher the recovery priority. Higher recovery priority might cause performance degradation until recovery completes.

ceph tell 'osd.*' injectargs --osd-max-backfills=2 --osd-recovery-max-active=2

Recommendation

Start in small steps, observe the Ceph status, client IOPs and throughput and then continue to increase in small steps.

In the producton with regard to the applications and hardware infrastructure, we recommend setting these settings back to default as soon as possible.

Sources

https://www.suse.com/support/kb/doc/?id=000019693

RocksDB - Leveled Compaction

February 24, 2019 · One min read

Joachim Kraftmayer

Managing Director at Clyso

Bluestore/RocksDB will only put the next level up size of DB on flash if the whole size will fit. These sizes are roughly 3GB,30GB,300GB. Anything in-between those sizes are pointless. Only ~3GB of SSD will ever be used out of a 28GB partition. Likewise a 240GB partition is also pointless as only ~30GB will be used.

How do I find the right SSD/NVMe partition size for hot DB.

https://github.com/facebook/rocksdb/wiki/Leveled-Compaction

ceph tell osd.* bench

October 12, 2018 · One min read

Joachim Kraftmayer

Managing Director at Clyso

When commissioning a cluster, it is always advisable to log and evaluate the ceph osd bench results.

The values can also be helpful for performance analysis in a productive Ceph cluster.

ceph tell osd.<int|*> bench {<int>} {<int>} {<int>}

OSD benchmark: write <count> <size> -byte objects, (default 1G size 4MB)

osd_bench_max_block_size=65536 kB

Example:

1G size 4MB (default)

ceph tell osd.* bench

1G size 64MB

ceph tell osd.* bench 1073741824 67108864

motivation​

technical topics​​

requirements​

load and performance tools​​

CBT - The Ceph Benchmarking Tool​

s3 - tests​

COSBench - Cloud Object Storage Benchmark​

Gosbench​

hsbench​

Warp​

the tool of our choice​​

getput​

download​

sources​

Recommendation​

Sources​

motivation

technical topics

requirements

load and performance tools

CBT - The Ceph Benchmarking Tool

s3 - tests

COSBench - Cloud Object Storage Benchmark

Gosbench

hsbench

Warp

the tool of our choice

getput

download

sources

Recommendation

Sources