interesting insights into how dependency on external operating system libraries can affect the operation of Ceph.
speed up or slow down ceph recovery
- osd max backfills: The maximum number of concurrent backfill operations allowed to or from an OSD. The higher the number, the quicker the recovery, which might impact overall cluster performance until recovery finishes.
- osd recovery max active: The maximum number of active recovery requests per OSD. The higher the number, the quicker the recovery, which might impact overall cluster performance until recovery finishes.
- osd recovery op priority: The priority set for recovery operations relative to client I/O. A higher recovery priority might cause performance degradation until recovery completes.
ceph tell 'osd.*' injectargs --osd-max-backfills=2 --osd-recovery-max-active=2
Recommendation
Start with small steps, observe the Ceph status, client IOPS and throughput, and then continue to increase in small steps.
In production, depending on the applications and the hardware infrastructure, we recommend setting these values back to their defaults as soon as possible.
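Once recovery pressure is acceptable again, the values can be reverted the same way. A minimal sketch, assuming the pre-Quincy defaults of osd_max_backfills=1 and osd_recovery_max_active=3:
ceph tell 'osd.*' injectargs --osd-max-backfills=1 --osd-recovery-max-active=3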
Ceph Client Sessions on the Ceph Monitor (ceph-mon)
ceph daemon mon.clyso-mon1 sessions
This is useful if you are looking for the IP addresses behind the output of ceph features.
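A minimal sketch for extracting just the client addresses from the sessions output (the grep/sort pipeline is an assumption; the exact session format varies between releases):
ceph daemon mon.clyso-mon1 sessions | grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}' | sort -u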
ceph balancer up-map
The upmap mode is supported by the kernel client starting with kernel version 4.13.
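A minimal sketch of enabling the upmap balancer, assuming all connected clients are recent enough to be declared luminous-compatible:
ceph osd set-require-min-compat-client luminous
ceph balancer mode upmap
ceph balancer on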
ceph features - wrong display
Ceph tries to determine the client version based on the feature flags it reports. However, the kernel Ceph client is a separate codestream, so the output is not always correct.
verify ceph osd DB and WAL setup
When configuring OSDs in a mixed setup with the DB and WAL colocated on a flash device (SSD or NVMe), there is recurring confusion about where the DB and the WAL really end up.
This can be checked with a simple test:
The location of the DB for the respective OSD can be verified via
ceph osd metadata osd.<id>
and the variable "bluefs_dedicated_db": "1".
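A minimal sketch for checking this directly, assuming jq is installed (the key bluefs_db_devices may differ between releases):
ceph osd metadata osd.0 | jq '{dedicated_db: .bluefs_dedicated_db, db_devices: .bluefs_db_devices}'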
In earlier Ceph versions the WAL was created separately; in later versions it is automatically placed on the same device as the DB.
The WAL can easily be tested with the ceph tell osd.<id> bench command.
First you check larger write operations with the command:
ceph tell osd.0 bench 65536 409600
Second, you check with blocks that are smaller than bluestore_prefer_deferred_size_hdd (64k), so that the writes are deferred through the WAL.
ceph tell osd.0 bench 65536 4096
If you compare the IOPS of the two tests, one result should correspond to the IOPS of an SSD and the other should be quite low, matching the HDD. From this you can tell whether the WAL is on the HDD or on the flash device.
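As a sketch, the two results can be compared directly; the iops field in the JSON output and the jq calls are assumptions based on recent releases:
ceph tell osd.0 bench 65536 409600 | jq .iops   # large blocks, written directly to the data device
ceph tell osd.0 bench 65536 4096 | jq .iops     # small blocks, deferred via the WAL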
RocksDB - Leveled Compaction
BlueStore/RocksDB will only put the next level of the DB on flash if the whole level fits. These sizes are roughly 3 GB, 30 GB and 300 GB, and anything in between those sizes is pointless: only ~3 GB of SSD will ever be used out of a 28 GB partition, and likewise a 240 GB partition is pointless as only ~30 GB will be used.
How do I find the right SSD/NVMe partition size for the hot DB?
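One way to approach this is to look at how much of the DB device is actually used. A minimal sketch, assuming the bluefs perf counters db_total_bytes and db_used_bytes are available on your release:
ceph daemon osd.0 perf dump bluefs | jq '{db_total: .bluefs.db_total_bytes, db_used: .bluefs.db_used_bytes}'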
ceph tell osd.* bench
When commissioning a cluster, it is always advisable to log and evaluate the ceph osd bench results.
The values can also be helpful for performance analysis in a production Ceph cluster.
ceph tell osd.<int|*> bench {<int>} {<int>} {<int>}
OSD benchmark: write <count> <size>-byte objects (default: count 1G, size 4MB).
The maximum block size is limited by osd_bench_max_block_size = 65536 kB.
Example:
1G size 4MB (default)
ceph tell osd.* bench
1G size 64MB
ceph tell osd.* bench 1073741824 67108864
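When commissioning, the results can be collected per OSD, for example with a loop like the following sketch (the jq filter assumes the JSON output of recent releases):
for osd in $(ceph osd ls); do
  echo -n "osd.$osd "
  ceph tell osd.$osd bench | jq -c '{bytes_per_sec, iops}'
done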
ceph deep-scrub monitoring and distribution
for date in $(ceph pg dump | grep active | awk '{print $20}'); do date +%A -d $date; done | sort | uniq -c
19088 Monday
1752 Saturday
54296 Sunday
for date in $(ceph pg dump | grep active | awk '{print $21}'); do date +%H -d $date; done | sort | uniq -c
dumped all
3399 00
3607 01
2449 02
2602 03
6145 04
4907 05
4986 06
3777 07
2421 08
2429 09
2478 10
2546 11
2523 12
2614 13
2661 14
2722 15
2669 16
2649 17
2656 18
2751 19
2780 20
2893 21
3157 22
3315 23
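If the distribution is too uneven, deep scrubs can also be triggered manually on the PGs with the oldest deep-scrub timestamps. A sketch, assuming the timestamp sits in column 20 as above and limiting the run to 10 PGs:
ceph pg dump 2>/dev/null | grep active | sort -k20 | head -10 | awk '{print $1}' | while read pg; do ceph pg deep-scrub $pg; done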
multisite environment - ceph bucket index dynamic resharding
Dynamic resharding is not supported in a multisite environment. It is disabled by default since Ceph 12.2.2, but we recommend double-checking the setting.
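A sketch for double-checking it on a running RGW via the admin socket (the socket path and daemon name are assumptions):
ceph daemon /var/run/ceph/ceph-client.rgw.<name>.asok config get rgw_dynamic_resharding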
Assign RadosGW Bucket to another user
List of users:
radosgw-admin metadata list user
List of buckets:
radosgw-admin metadata list bucket
List of bucket instances:
radosgw-admin metadata list bucket.instance
All necessary information
- user-id = Output from the list of users
- bucket-id = Output from the list of bucket instances
- bucket-name = Output from the list of buckets or bucket instances
Change the user for this bucket instance:
radosgw-admin bucket link --bucket <bucket-name> --bucket-id <default-uuid>.267207.1 --uid=<user-uid>
Example:
radosgw-admin bucket link --bucket test-clyso-test --bucket-id aa81cf7e-38c5-4200-b26b-86e900207813.267207.1 --uid=c19f62adbc7149ad9d19-8acda2dcf3c0
If you compare the bucket instance metadata before and after the change, the following values are changed:
- ver: is increased
- mtime: is updated
- owner: is set to the new uid
- user.rgw.acl: the permissions under this key are reset for the new owner
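A sketch for comparing the metadata before and after, using the bucket from the example above:
radosgw-admin metadata get bucket:test-clyso-test
radosgw-admin metadata get bucket.instance:test-clyso-test:aa81cf7e-38c5-4200-b26b-86e900207813.267207.1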