## Multisite environment: Ceph bucket index dynamic resharding
Dynamic resharding is not supported in multisite environments. It has been disabled by default since Ceph 12.2.2, but we recommend double-checking the setting.
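As a minimal sketch, the effective value can be read from a running RadosGW via its admin socket (the socket path and client name below are assumptions; adjust them to your deployment):

```bash
# Read the effective value of rgw_dynamic_resharding from a running RGW.
# The admin socket path is an assumption; adjust it to your rgw client name.
ceph daemon /var/run/ceph/ceph-client.rgw.$(hostname -s).asok config get rgw_dynamic_resharding
```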
List of users:
radosgw-admin metadata list user
List of buckets:
radosgw-admin metadata list bucket
List of bucket instances:
radosgw-admin metadata list bucket.instance
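If you only know the bucket name, the owner uid and the bucket ID can be read from the bucket metadata. A sketch, using the example bucket name from below:

```bash
# Look up the owner uid and the bucket ID for the bucket "test-clyso-test"
# (the bucket name is taken from the example below; replace it with your own)
radosgw-admin metadata get bucket:test-clyso-test | grep -E '"owner"|"bucket_id"'
```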
With the owner uid and the bucket ID, you can link the bucket to the user:
radosgw-admin bucket link --bucket <bucket-name> --bucket-id <default-uuid>.267207.1 --uid=<user-uid>
Example:
radosgw-admin bucket link --bucket test-clyso-test --bucket-id aa81cf7e-38c5-4200-b26b-86e900207813.267207.1 --uid=c19f62adbc7149ad9d19-8acda2dcf3c0
If you compare the bucket metadata before and after the change, you can see which values have changed.
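A simple way to capture the metadata for such a comparison (a sketch; the bucket name is the one from the example above):

```bash
# Dump the bucket metadata before and after the relink and diff the two
radosgw-admin metadata get bucket:test-clyso-test > bucket-before.json
# ... run the radosgw-admin bucket link command ...
radosgw-admin metadata get bucket:test-clyso-test > bucket-after.json
diff bucket-before.json bucket-after.json
```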
osd max object size
Description: The maximum size of a RADOS object in bytes.
Type: 32-bit Unsigned Integer
Default: 128MB
Before the Ceph Luminous release, the default value was 100 GB. It has since been reduced to 128 MB, which helps prevent unpleasant performance problems right from the start.
github.com/ceph/ceph/pull/15520
docs.ceph.com/docs/master/releases/luminous/
docs.ceph.com/docs/master/rados/configuration/osd-config-ref/
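To check the value that is actually in effect on your cluster, you can query an OSD's admin socket; a sketch (osd.0 is just an example, any OSD on the local host works):

```bash
# Show the osd_max_object_size currently in effect on osd.0
ceph daemon osd.0 config get osd_max_object_size
```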
We encountered the first large omap objects in one of our Luminous Ceph clusters in Q3 2018 and worked with a couple of Ceph Core developers on the solution for internal management of RadosGW objects. This included topics such as large omap objects, dynamic resharding, multisite, deleting old object instances in the RadosGW index pool, and many small changes that were included in the Luminous, Mimic, and subsequent versions.
Here is a step-by-step guide on how to identify large omap objects and the affected buckets, and then manually reshard them.
ceph -s
cluster:
id: 52296cfd-d6c6-3129-bf70-db16f0e4423d
health: HEALTH_WARN
1 large omap object
ceph health detail
HEALTH_WARN 1 large omap objects
1 large objects found in pool 'clyso-test-sin-1.rgw.buckets.index'
Search the cluster log for 'Large omap object found' for more details.
Search the ceph.log of the Ceph cluster:
2018-09-26 12:10:38.440682 mon.clyso1-mon1 mon.0 192.168.130.20:6789/0 77104 : cluster [WRN] Health check failed: 1 large omap objects (LARGE_OMAP_OBJECTS)
2018-09-26 12:10:35.037753 osd.1262 osd.1262 192.168.130.31:6836/10060 152 : cluster [WRN] Large omap object found. Object: 28:18428495:::.dir.143112fc-1178-40e1-b209-b859cd2c264c.38511450.376:head Key count: 2928429 Size (bytes): 861141085
2018-09-26 13:00:00.000103 mon.clyso1-mon1 mon.0 192.168.130.20:6789/0 77505 : cluster [WRN] overall HEALTH_WARN 1 large omap objects
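If the corresponding log entry is no longer available, the index pool can also be scanned directly for objects with many omap keys. A rough sketch (the pool name is taken from the health warning above; the loop can take a while on large pools):

```bash
# Count the omap keys of every object in the bucket index pool and
# print the largest ones first (slow on big pools; run during quiet hours)
pool=clyso-test-sin-1.rgw.buckets.index
for obj in $(rados -p "$pool" ls); do
    printf '%8d %s\n' "$(rados -p "$pool" listomapkeys "$obj" | wc -l)" "$obj"
done | sort -rn | head
```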
From the ceph.log we extract the bucket instance ID, in this case 143112fc-1178-40e1-b209-b859cd2c264c.38511450.376, and look for it in the RadosGW metadata:
root@salt-master1.clyso.test:~ # radosgw-admin metadata list "bucket.instance" | egrep "143112fc-1178-40e1-b209-b859cd2c264c.38511450.376"
"b1868d6d-9d61-49b0-b101-c89207009b16:143112fc-1178-40e1-b209-b859cd2c264c.38511450.376"
root@salt-master1.clyso.test:~ #
The instance exists, so we check its metadata:
root@salt-master1.clyso.test:~ # radosgw-admin metadata get bucket.instance:b1868d6d-9d61-49b0-b101-c89207009b16:143112fc-1178-40e1-b209-b859cd2c264c.38511450.376
{
"key": "bucket.instance:b1868d6d-9d61-49b0-b101-c89207009b16:143112fc-1178-40e1-b209-b859cd2c264c.38511450.376",
"ver": {
"tag": "_Ehz5PYLhHBxpsJ_s39lePnX",
"ver": 7
},
"mtime": "2018-04-24 10:02:32.362129Z",
"data": {
"bucket_info": {
"bucket": {
"name": "b1868d6d-9d61-49b0-b101-c89207009b16",
"marker": "143112fc-1178-40e1-b209-b859cd2c264c.38511450.376",
"bucket_id": "143112fc-1178-40e1-b209-b859cd2c264c.38511450.376",
"tenant": "",
"explicit_placement": {
"data_pool": "",
"data_extra_pool": "",
"index_pool": ""
}
},
"creation_time": "2018-02-20 20:58:51.125791Z",
"owner": "d7a84e1aed9144919f8893b7d6fc5b02",
"flags": 0,
"zonegroup": "1c44aba5-fe64-4db3-9ef7-f0eb30bf5d80",
"placement_rule": "default-placement",
"has_instance_obj": "true",
"quota": {
"enabled": true,
"check_on_raw": true,
"max_size": 54975581388800,
"max_size_kb": 53687091200,
"max_objects": -1
},
"num_shards": 0,
"bi_shard_hash_type": 0,
"requester_pays": "false",
"has_website": "false",
"swift_versioning": "false",
"swift_ver_location": "",
"index_type": 0,
"mdsearch_config": [],
"reshard_status": 0,
"new_bucket_instance_id": ""
},
"attrs": [
{
"key": "user.rgw.acl",
"val": "AgK4A.....AAAAAAA="
},
{
"key": "user.rgw.idtag",
"val": ""
},
{
"key": "user.rgw.x-amz-read",
"val": "aW52YWxpZAA="
},
{
"key": "user.rgw.x-amz-write",
"val": "aW52YWxpZAA="
}
]
}
}
root@salt-master1.clyso.test:~ #
root@salt-master1.clyso.test:~ # radosgw-admin metadata get bucket:b1868d6d-9d61-49b0-b101-c89207009b16
{
"key": "bucket:b1868d6d-9d61-49b0-b101-c89207009b16",
"ver": {
"tag": "_WaSWh9mb21kEjHCisSzhWs8",
"ver": 1
},
"mtime": "2018-02-20 20:58:51.152766Z",
"data": {
"bucket": {
"name": "b1868d6d-9d61-49b0-b101-c89207009b16",
"marker": "143112fc-1178-40e1-b209-b859cd2c264c.38511450.376",
"bucket_id": "143112fc-1178-40e1-b209-b859cd2c264c.38511450.376",
"tenant": "",
"explicit_placement": {
"data_pool": "",
"data_extra_pool": "",
"index_pool": ""
}
},
"owner": "d7a84e1aed9144919f8893b7d6fc5b02",
"creation_time": "2018-02-20 20:58:51.125791Z",
"linked": "true",
"has_bucket_info": "false"
}
}
root@salt-master1.clyso.test:~ #
root@salt-master1.clyso.test:~ # rados -p clyso-test-sin-1.rgw.buckets.index ls | egrep "143112fc-1178-40e1-b209-b859cd2c264c.38511450.376" | wc -l
1
root@salt-master1.clyso.test:~ #
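With the bucket identified, the index can be resharded manually. The following is only a sketch: the bucket name used here is a placeholder (resolve it from the instance ID via the bucket metadata), and the shard count is derived from the commonly recommended target of roughly 100,000 keys per shard:

```bash
# Manually reshard the bucket index while client traffic to the bucket is
# quiesced. "clyso-backup-bucket" is a placeholder for the bucket that owns
# the instance found above. 2928429 keys / ~100k per shard -> at least 30
# shards; 31 is used here.
radosgw-admin bucket reshard --bucket=clyso-backup-bucket --num-shards=31

# The old bucket instance and its index object are not removed automatically;
# newer releases can list and clean them up with
#   radosgw-admin reshard stale-instances list
#   radosgw-admin reshard stale-instances rm
```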
size & size_kb: total size of all objects in the bucket/container = output of swift stat <bucket/container> | grep Bytes
size_actual & size_kb_actual: account for compression and encryption, rounded up to the nearest 4k alignment = output of swift stat <bucket/container> | grep X-Container-Bytes-Used-Actual
num_objects: number of objects = output of swift stat <bucket/container> | grep Objects
size_utilized & size_kb_utilized: total size of the compressed data in bytes and kilobytes => we don't use compression, so size = size_utilized
The sizes do not include the overhead of the underlying replication (e.g. size 3) or erasure coding.
ceph-rgw4:~/clyso# radosgw-admin bucket stats --bucket=size-container
{
"bucket": "size-container",
"zonegroup": "226fe09d-0ebf-4f30-a93b-d136f24a04d3",
"placement_rule": "default-placement",
"explicit_placement": {
"data_pool": "",
"data_extra_pool": "",
"index_pool": ""
},
"id": "d667b6f1-5737-4f5e-bad0-fc030f0a4e94.11750341.561",
"marker": "d667b6f1-5737-4f5e-bad0-fc030f0a4e94.11750341.561",
"index_type": "Normal",
"owner": "0fdfa377cd56439ab3e3e65c69787e92",
"ver": "0#7",
"master_ver": "0#0",
"mtime": "2018-09-03 12:37:37.744221",
"max_marker": "0#",
"usage": {
"rgw.main": {
"size": 4149,
"size_actual": 16384,
"size_utilized": 4149,
"size_kb": 5,
"size_kb_actual": 16,
"size_kb_utilized": 5,
"num_objects": 3
}
},
"bucket_quota": {
"enabled": false,
"check_on_raw": true,
"max_size": -1,
"max_size_kb": 0,
"max_objects": -1
}
}
ceph-rgw4:~/clyso # swift stat size-container
Account: v1
Container: size-container
Objects: 3
Bytes: 4149
Read ACL:
Write ACL:
Sync To:
Sync Key:
Accept-Ranges: bytes
X-Storage-Policy: default-placement
X-Container-Bytes-Used-Actual: 16384
X-Timestamp: 1535967792.05717
X-Trans-Id: tx00000000000000002378a-005b8e218c-b2faf1-eu-de-997-1
Content-Type: text/plain; charset=utf-8
X-Openstack-Request-Id: tx00000000000000002378a-005b8e218c-b2faf1-eu-de-997-1
The size fields after uploading one, two, and three objects looked as follows:
"size": 26,
"size_actual": 4096,
"size_utilized": 26,
"size_kb": 1,
"size_kb_actual": 4,
"size_kb_utilized": 1,
"num_objects": 1
"size": 52,
"size_actual": 8192,
"size_utilized": 52,
"size_kb": 1,
"size_kb_actual": 8,
"size_kb_utilized": 1,
"num_objects": 2
"size": 4149,
"size_actual": 16384,
"size_utilized": 4149,
"size_kb": 5,
"size_kb_actual": 16,
"size_kb_utilized": 5,
"num_objects": 3
Example: removing capabilities from a user with radosgw-admin
Remove the buckets=read capability from the user clyso-user-id:
radosgw-admin caps rm --uid=clyso-user-id --caps="buckets=read"
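Conversely, a capability can be granted again with caps add, and the current set can be checked in the user info. A sketch using the same example uid:

```bash
# Grant the capability again and verify the user's current caps
radosgw-admin caps add --uid=clyso-user-id --caps="buckets=read"
radosgw-admin user info --uid=clyso-user-id | grep -A4 '"caps"'
```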
The Ceph cluster has recognized that a placement group (PG) is missing important information: either information about write operations that have already taken place is missing, or there is no error-free copy of the data.
The recommendation is to bring all OSDs that are in the down or out state back into the Ceph cluster, as they could contain the required information. In the case of an Erasure Coding (EC) pool, temporarily reducing min_size can enable recovery; however, min_size cannot be smaller than the number of data chunks (k) defined for the pool.
https://docs.ceph.com/docs/master/rados/operations/pg-states/
https://docs.ceph.com/docs/master/rados/operations/erasure-code/
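A sketch of temporarily lowering min_size on a hypothetical EC pool created with k=4, m=2 (the pool name is an example):

```bash
# Show the current min_size of the EC pool (default for k=4, m=2 is 5)
ceph osd pool get clyso.ec-pool min_size
# Temporarily lower it to k so that PGs with only k shards left can recover
ceph osd pool set clyso.ec-pool min_size 4
# After recovery has finished, restore the original value
ceph osd pool set clyso.ec-pool min_size 5
```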
root@master.qa.cloud.clyso.com:~ # radosgw-admin user list
[
...
"57574cda626b45fba1cd96e68a57ced2",
...
"admin",
...
]
radosgw-admin user info --uid=57574cda626b45fba1cd96e68a57ced2
{
"user_id": "57574cda626b45fba1cd96e68a57ced2",
"display_name": "qa-clyso-backup",
"email": "",
"suspended": 0,
"max_buckets": 1000,
"auid": 0,
"subusers": [],
"keys": [],
"swift_keys": [],
"caps": [],
"op_mask": "read, write, delete",
"default_placement": "",
"placement_tags": [],
"bucket_quota": {
"enabled": false,
"check_on_raw": false,
"max_size": -1,
"max_size_kb": 0,
"max_objects": -1
},
"user_quota": {
"enabled": false,
"check_on_raw": false,
"max_size": -1,
"max_size_kb": 0,
"max_objects": -1
},
"temp_url_keys": [],
"type": "keystone"
}
root@master.qa.cloud.clyso.com:~ # radosgw-admin quota set --quota-scope=user --uid=57574cda626b45fba1cd96e68a57ced2 --max-size=32985348833280
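The --max-size value is given in bytes; the number used here corresponds to 30 TiB:

```bash
# 30 TiB expressed in bytes, as passed to --max-size above
echo $((30 * 1024**4))   # -> 32985348833280
```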
## Verify the set quota max_size and max_size_kb
root@master.qa.cloud.clyso.com:~ # radosgw-admin user info --uid=57574cda626b45fba1cd96e68a57ced2
{
"user_id": "57574cda626b45fba1cd96e68a57ced2",
"display_name": "qa-clyso-backup",
"email": "",
"suspended": 0,
"max_buckets": 1000,
"auid": 0,
"subusers": [],
"keys": [],
"swift_keys": [],
"caps": [],
"op_mask": "read, write, delete",
"default_placement": "",
"placement_tags": [],
"bucket_quota": {
"enabled": false,
"check_on_raw": false,
"max_size": -1,
"max_size_kb": 0,
"max_objects": -1
},
"user_quota": {
"enabled": false,
"check_on_raw": false,
"max_size": 32985348833280,
"max_size_kb": 32212254720,
"max_objects": -1
},
"temp_url_keys": [],
"type": "keystone"
}
root@master.qa.cloud.clyso.com:~ # radosgw-admin quota enable --quota-scope=user --uid=57574cda626b45fba1cd96e68a57ced2
root@master.qa.cloud.clyso.com:~ # radosgw-admin user info --uid=57574cda626b45fba1cd96e68a57ced2
{
"user_id": "57574cda626b45fba1cd96e68a57ced2",
"display_name": "qa-clyso-backup",
"email": "",
"suspended": 0,
"max_buckets": 1000,
"auid": 0,
"subusers": [],
"keys": [],
"swift_keys": [],
"caps": [],
"op_mask": "read, write, delete",
"default_placement": "",
"placement_tags": [],
"bucket_quota": {
"enabled": false,
"check_on_raw": false,
"max_size": -1,
"max_size_kb": 0,
"max_objects": -1
},
"user_quota": {
"enabled": true,
"check_on_raw": false,
"max_size": 32985348833280,
"max_size_kb": 32212254720,
"max_objects": -1
},
"temp_url_keys": [],
"type": "keystone"
}
root@master.qa.cloud.clyso.com:~ # radosgw-admin user stats --uid=57574cda626b45fba1cd96e68a57ced2 --sync-stats
{
"stats": {
"total_entries": 10404,
"total_bytes": 54915680,
"total_bytes_rounded": 94674944
},
"last_stats_sync": "2017-08-21 07:09:58.909073Z",
"last_stats_update": "2017-08-21 07:09:58.906372Z"
}
You can delete buckets and their contents either with S3 tools or with Ceph's own built-in tools.
With the popular command line tool s3cmd, you can delete a bucket including its contents via the S3 API as follows:
s3cmd rb --recursive s3://clyso_bucket
radosgw-admin talks directly to the Ceph cluster, does not require a running radosgw process, and is the faster way to delete a bucket and its contents from the Ceph cluster:
radosgw-admin bucket rm --bucket=clyso_bucket --purge-objects
If you want to delete an entire user and his or her data from the system, you can do so with the following command:
radosgw-admin user rm --uid=<username> --purge-data
Use this command wisely!
First you need access to the boot screen.
Then reboot your server; when the boot loader screen appears, select the recovery mode entry.
Type e for edit and add the following option to the kernel boot options:
init=/bin/bash
Press Enter to exit the edit mode and boot into single-user mode.
This will boot the kernel with /bin/bash instead of the standard init.
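Once the shell is up, the root filesystem is typically still mounted read-only; a common follow-up (for example to reset a lost root password) looks like this:

```bash
# Remount the root filesystem read-write before making changes
mount -o remount,rw /
# Example change: reset a lost root password
passwd root
# Flush changes, remount read-only again, and force a reboot
sync
mount -o remount,ro /
reboot -f
```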