In his presentation at FOSDEM 2019 in Brussels, Sage Weil gave an update on the status of the Ceph Nautilus release, planned for the end of February 2019.
verify ceph osd DB and WAL setup
When configuring OSDs in a mixed setup with the DB and WAL colocated on a flash device (SSD or NVMe), there has repeatedly been confusion about where the DB and the WAL really end up. This can be checked with a simple test.
The location of the DB for the respective OSD can be verified via
ceph osd metadata osd.<id>
and the variable "bluefs_dedicated_db": "1".
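For example, on a single OSD (a quick sketch; osd.0 is just an example id, and the bluefs_db_partition_path field only shows up when a dedicated DB device exists):
ceph osd metadata osd.0 | grep -E 'bluefs_dedicated_db|bluefs_db_partition_path'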
In earlier Ceph versions the WAL was created separately; in later versions it is automatically placed on the same device as the DB.
The WAL can easily be tested using the ceph tell osd.<id> bench command.
First, you check larger write operations with the command:
ceph tell osd.0 bench 65536 409600
Second, you check with objects smaller than bluestore_prefer_deferred_size_hdd (64k):
ceph tell osd.0 bench 65536 4096
If you compare the IOPS of the two tests, one result should correspond to the IOPS of an SSD, while the other should be in the much lower range of an HDD. From this you can tell whether the WAL is on the HDD or on the flash device.
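A minimal sketch of this comparison, assuming a release (Nautilus or later) whose bench output includes an iops field and that jq is available:
# hypothetical osd.0: writes above vs. below the deferred-write threshold
large=$(ceph -f json tell osd.0 bench 65536 409600 | jq '.iops')
small=$(ceph -f json tell osd.0 bench 65536 4096 | jq '.iops')
echo "large writes: $large IOPS, small writes: $small IOPS"
# small-write IOPS far above the large-write IOPS indicate the WAL is on flash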
RocksDB - Leveled Compaction
BlueStore/RocksDB will only put the next RocksDB level onto the flash device if the whole level fits. These level sizes are roughly 3 GB, 30 GB, and 300 GB, so anything in between is pointless: of a 28 GB partition only ~3 GB will ever be used, and likewise a 240 GB partition is pointless because only ~30 GB will be used.
How do I find the right SSD/NVMe partition size for the hot DB?
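To see how much of a DB partition is actually in use, and whether metadata has already spilled over to the slow device, the BlueFS perf counters can be checked on the OSD host (a sketch assuming osd.0 and the counter names exposed by BlueStore since Luminous):
ceph daemon osd.0 perf dump bluefs | grep -E 'db_total_bytes|db_used_bytes|slow_used_bytes'
# a non-zero slow_used_bytes means DB data has spilled over from flash to the HDD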
Openstack Summit 2018 – Sage Weil presentation
In his presentation at the Openstack Summit 2018 in Berlin, Sage Weil presents the roadmap for the upcoming features in Nautilus and the following releases, with a strong focus on the trend towards multi- and hybrid-cloud environments and on scaling Ceph clusters across data center boundaries.
www.slideshare.net/sageweil1/ceph-data-services-in-a-multi-and-hybrid-cloud-world
ceph tell osd.* bench
When commissioning a cluster, it is always advisable to log and evaluate the ceph osd bench results.
The values can also be helpful for performance analysis in a productive Ceph cluster.
ceph tell osd.<int|*> bench {<int>} {<int>} {<int>}
OSD benchmark: write <count> <size>-byte objects (default 1G, size 4MB)
osd_bench_max_block_size=65536 kB
Example:
1G size 4MB (default)
ceph tell osd.* bench
1G size 64MB
ceph tell osd.* bench 1073741824 67108864
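A simple way to keep these results for later comparison (a sketch; the log path is only an example):
# quote osd.* so the shell does not glob-expand it
ceph tell 'osd.*' bench | tee -a /var/log/ceph/osd-bench-$(date +%F).log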
ceph rbd better distribute small write operations
When creating an RBD image, you can pass the stripe unit and the stripe count.
A smaller stripe unit means that smaller write operations are better distributed across the Ceph cluster with its OSDs.
rbd -p benchpool create image-su-64kb --size 102400 --stripe-unit 65536 --stripe-count 16
RBD images are striped over many objects, which are then stored by the Ceph distributed object store (RADOS). As a result, read and write requests for the image are distributed across many nodes in the cluster, generally preventing any single node from becoming a bottleneck when individual images get large or busy.
The striping is controlled by three parameters:
order The size of objects we stripe over is a power of two, specifically 2^[order] bytes. The default is 22, or 4 MB.
stripe_unit Each [stripe_unit] contiguous bytes are stored adjacently in the same object, before we move on to the next object.
stripe_count After we write [stripe_unit] bytes to [stripe_count] objects, we loop back to the initial object and write another stripe, until the object reaches its maximum size (as specified by [order]). At that point, we move on to the next [stripe_count] objects. By default, [stripe_unit] is the same as the object size and [stripe_count] is 1. Specifying a different [stripe_unit] requires that the STRIPINGV2 feature be supported (added in Ceph v0.53) and that format 2 images be used.
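The effect is easy to check on the image created above, and a short calculation shows why small writes spread better (a sketch using the example image from this post):
# show object size, stripe unit and stripe count
rbd -p benchpool info image-su-64kb
# with 4 MB objects, a 64 KiB stripe unit and a stripe count of 16,
# one full stripe is 16 x 64 KiB = 1 MiB, so a 1 MiB write already touches
# 16 different objects instead of staying inside a single 4 MB object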
ceph deep-scrub monitoring and distribution
for date in $(ceph pg dump | grep active | awk '{print $20}'); do date +%A -d $date; done | sort | uniq -c
19088 Monday
1752 Saturday
54296 Sunday
for date in $(ceph pg dump | grep active | awk '{print $21}'); do date +%H -d $date; done | sort | uniq -c
dumped all
3399 00
3607 01
2449 02
2602 03
6145 04
4907 05
4986 06
3777 07
2421 08
2429 09
2478 10
2546 11
2523 12
2614 13
2661 14
2722 15
2669 16
2649 17
2656 18
2751 19
2780 20
2893 21
3157 22
3315 23
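The awk column numbers above depend on the plain-text layout of ceph pg dump, which can shift between releases; a more robust variant reads the JSON output (a sketch assuming the Nautilus-style layout where the stats live under pg_map.pg_stats, and jq):
for date in $(ceph pg dump -f json 2>/dev/null | jq -r '.pg_map.pg_stats[].last_deep_scrub_stamp' | cut -d' ' -f1); do date +%A -d $date; done | sort | uniq -c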
min_compat_client and set-require-min-compat-client
min_compat_client shows the oldest client release that can still connect to this cluster.
set-require-min-compat-client makes the cluster enforce this minimum client release.
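Both values can be checked and set on the command line, for example (luminous is only an example release name):
# show the current values
ceph osd dump | grep min_compat_client
# enforce a minimum client release
ceph osd set-require-min-compat-client luminous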
multisite environment - ceph bucket index dynamic resharding
Dynamic resharding is not supported in multisite environments. It has been disabled by default there since Ceph 12.2.2, but we recommend double-checking the setting.
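One way to double check it on a running gateway is via the admin socket (a sketch; the socket path differs per deployment):
ceph daemon /var/run/ceph/ceph-client.rgw.<name>.asok config show | grep rgw_dynamic_resharding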
Ceph radosgw-admin create S3 access_key and secret
radosgw-admin key create --uid=clyso-user-id --key-type=s3 --gen-access-key --gen-secret
...
"keys": [
{
"user": "clyso-user-id",
"access_key": "VO8C17LBI9Y39FSODOU5",
"secret_key": "zExCLO1bLQJXoY451ZiKpeoePLSQ1khOJG4CcT3N"
}
],
...
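The generated key pair can then be used with any S3 client, for example the AWS CLI (endpoint URL and region are placeholders for your RGW setup):
export AWS_ACCESS_KEY_ID=VO8C17LBI9Y39FSODOU5
export AWS_SECRET_ACCESS_KEY=zExCLO1bLQJXoY451ZiKpeoePLSQ1khOJG4CcT3N
export AWS_DEFAULT_REGION=default   # the CLI needs a region; RGW usually does not care which
aws --endpoint-url http://rgw.example.com:8080 s3 ls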