
48 posts tagged with "operation"


· One min read
Joachim Kraftmayer
  • osd max backfills: The maximum number of backfill operations allowed to or from a single OSD. The higher the number, the quicker the recovery, which might impact overall cluster performance until recovery finishes.
  • osd recovery max active: The maximum number of active recovery requests per OSD. The higher the number, the quicker the recovery, which might impact overall cluster performance until recovery finishes.
  • osd recovery op priority: The priority assigned to recovery operations relative to client I/O (a higher value means a higher recovery priority). A higher recovery priority might cause performance degradation until recovery completes.
ceph tell 'osd.*' injectargs --osd-max-backfills=2 --osd-recovery-max-active=2

Recommendation

Start with small steps, observe the Ceph status, client IOPS and throughput, and then continue to increase in small steps.

In production, with regard to the applications and the hardware infrastructure, we recommend setting these options back to their defaults as soon as possible.
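Reverting can be done the same way the values were injected. A minimal sketch, assuming the pre-Quincy defaults (osd_max_backfills=1, osd_recovery_max_active=3); verify the defaults of your own release first, for example with ceph config help on recent versions:

ceph config help osd_max_backfills

ceph config help osd_recovery_max_active

ceph tell 'osd.*' injectargs --osd-max-backfills=1 --osd-recovery-max-active=3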

Sources

https://www.suse.com/support/kb/doc/?id=000019693

· One min read
Joachim Kraftmayer

supported by kernel version 4.13

ceph features - wrong display

Ceph tries to determine the Ceph client version based on the feature flags. However, the kernel Ceph client does not follow the same code stream.

So the output is not always correct.
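To cross-check, the monitor session list shows the connected clients together with the features they report, and the kernel version itself is only reliably available on the client host. A sketch, assuming a monitor with the id mon.a; on older releases, ceph daemon mon.a sessions on the monitor host gives the same list as ceph tell, and uname -r must be run on the client host:

ceph features

ceph tell mon.a sessions

uname -r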

· One min read
Joachim Kraftmayer

When configuring OSDs in a mixed setup with DB and WAL colocated on a flash device (SSD or NVMe), there has always been confusion about where the DB and the WAL are really located. This can be checked with a simple test: the location of the DB for the respective OSD can be verified via ceph osd metadata osd.<id> and the variable "bluefs_dedicated_db": "1".

The WAL was created separately in earlier Ceph versions and automatically on the same device as the DB in later Ceph versions. The WAL can be easily tested by using the ceph tell osd.<id> bench command.

First, check larger write operations with the command:

ceph tell osd.0 bench 65536 409600

Second, check with smaller writes that are below bluestore_prefer_deferred_size_hdd (64K):

ceph tell osd.0 bench 65536 4096

If you compare the IOPS of the two tests, one result should correspond to the IOPS of an SSD, while the other should be quite low, matching the HDD. From this you can tell whether the WAL is on the HDD or on the flash device.
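The metadata check mentioned above can be combined into one command; a small sketch, assuming osd.0 and the bluefs_dedicated_db / bluefs_dedicated_wal keys reported by ceph osd metadata:

ceph osd metadata osd.0 | grep -E '"bluefs_dedicated_(db|wal)"'

A value of "1" for bluefs_dedicated_db confirms a separate DB device; bluefs_dedicated_wal typically stays "0" when the WAL simply shares the DB device, which is exactly the case the bench test above is meant to distinguish.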

· One min read
Joachim Kraftmayer

BlueStore/RocksDB will only put the next level of the DB on flash if the whole level fits. These sizes are roughly 3 GB, 30 GB and 300 GB. Anything in between those sizes is pointless: only ~3 GB of SSD will ever be used out of a 28 GB partition. Likewise, a 240 GB partition is also pointless, as only ~30 GB will be used.

How do I find the right SSD/NVMe partition size for the hot DB?

https://github.com/facebook/rocksdb/wiki/Leveled-Compaction
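To check how much of the DB device a given OSD actually uses, and whether RocksDB has spilled over to the slow device, the bluefs perf counters can be inspected on the OSD host; a sketch, assuming osd.0, jq being installed and the counter names of recent releases:

ceph daemon osd.0 perf dump bluefs | jq '.bluefs | {db_total_bytes, db_used_bytes, slow_used_bytes}'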

· One min read
Joachim Kraftmayer

When commissioning a cluster, it is always advisable to log and evaluate the ceph osd bench results.

The values can also be helpful for performance analysis in a productive Ceph cluster.

ceph tell osd.<int|*> bench {<int>} {<int>} {<int>}

OSD benchmark: write <count> bytes in <size>-byte objects (defaults: count = 1 GB, size = 4 MB)

osd_bench_max_block_size=65536 kB

Example:

1G size 4MB (default)

ceph tell osd.* bench

1G size 64MB

ceph tell osd.* bench 1073741824 67108864
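To log the results in a form that can be evaluated later, the bench output can be requested as JSON; a small sketch, assuming jq is installed and that the field names in the bench output (such as bytes_per_sec and iops) match your release:

for osd in $(ceph osd ls); do
  ceph tell osd.$osd bench --format json \
    | jq -c --arg osd "$osd" '. + {osd: $osd, date: (now | todate)}' \
    >> osd-bench-$(date +%F).jsonl
done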

· One min read
Joachim Kraftmayer
for date in $(ceph pg dump | grep active | awk '{print $20}'); do date +%A -d $date; done | sort | uniq -c

19088 Monday
1752 Saturday
54296 Sunday
for date in $(ceph pg dump | grep active | awk '{print $21}'); do date +%H -d $date; done | sort | uniq -c

dumped all
3399 00
3607 01
2449 02
2602 03
6145 04
4907 05
4986 06
3777 07
2421 08
2429 09
2478 10
2546 11
2523 12
2614 13
2661 14
2722 15
2669 16
2649 17
2656 18
2751 19
2780 20
2893 21
3157 22
3315 23
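The awk column numbers ($20, $21) shift between Ceph releases. A more robust variant reads the JSON dump instead; a sketch, assuming the timestamps analysed here are the deep-scrub stamps (last_deep_scrub_stamp) and that pg_stats lives under pg_map in your release:

ceph pg dump -f json 2>/dev/null \
  | jq -r '(.pg_map.pg_stats // .pg_stats)[].last_deep_scrub_stamp' \
  | cut -d' ' -f1 \
  | xargs -I{} date +%A -d {} \
  | sort | uniq -c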

· One min read
Joachim Kraftmayer

List of users:

radosgw-admin metadata list user

List of buckets:

radosgw-admin metadata list bucket

List of bucket instances:

radosgw-admin metadata list bucket.instance

All necessary information:

  • user-id = Output from the list of users
  • bucket-id = Output from the list of bucket instances
  • bucket-name = Output from the list of buckets or bucket instances
  • Change of user for this bucket instance:
radosgw-admin bucket link --bucket <bucket-name> --bucket-id <default-uuid>.267207.1 --uid=<user-uid>

Example:

radosgw-admin bucket link --bucket test-clyso-test --bucket-id aa81cf7e-38c5-4200-b26b-86e900207813.267207.1 --uid=c19f62adbc7149ad9d19-8acda2dcf3c0

If you compare the buckets before and after the change (see the metadata dump sketch after this list), the following values are changed:

  • ver: is increased
  • mtime: will be updated
  • owner: is set to the new uid
  • user.rgw.acl: the permissions stored under the user.rgw.acl key are reset for the new owner
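One way to do that before/after comparison is to dump the bucket and bucket instance metadata and diff the output; a sketch, reusing the bucket name and instance id from the example above:

radosgw-admin metadata get bucket:test-clyso-test

radosgw-admin metadata get bucket.instance:test-clyso-test:aa81cf7e-38c5-4200-b26b-86e900207813.267207.1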