13 posts tagged with "rgw"

· One min read
Joachim Kraftmayer

Commvault has been in use as a data protection solution for years, and the customer is now looking to replace the existing storage backend (EMC) for its entire customer environments.

Commvault provides data backup through a single interface. Through the gradual deployment of Ceph S3 in several expansion stages, the customer built confidence in Ceph as a storage technology and more and more backups are gradually being transferred to the new backend.

In the first phase, Ceph S3 was able to demonstrate its performance and scalability capabilities.

In the following phases, the focus will be on flexibility and use as unified storage for cloud computing and Kubernetes.

For all these scenarios, the customer relies on Ceph as an extremely scalable, high-performance and cost-effective storage backend.

Ceph S3 easily handles over 1 PB of backup data and more than 500 GB per hour of backup throughput, and it is ready to grow even further with future requirements.

After in-depth consultation, we were able to exceed the customer’s expectations for the Ceph cluster in production.

· One min read
Joachim Kraftmayer

The customer uses Commvault as a data backup solution for their entire customer environments.

Wherever the data resides, Commvault provides the backup of the data through a single interface. The customer thus avoids costly data loss scenarios, disconnected data silos, lack of recovery SLAs and inefficient scaling.

For all these scenarios, the customer relies on Ceph as a powerful and cost-effective storage backend for Commvault.

Ceph easily handles over 2 PB of backup data and more than 1 TB per hour of backup throughput, and it is ready to grow even further with future requirements.

In conclusion, we were able to clearly exceed the customer's expectations of the Ceph cluster already in the test phase.

· 2 min read
Joachim Kraftmayer

motivation

We have tested Ceph S3 in OpenStack Swift environments intensively before. We were interested in the behavior of the radosgw stack in Ceph and paid particular attention to the size and number of objects in relation to the resource consumption of the radosgw process. The effects on radosgw response latencies were also important to us, so that we could plan the right sizing of the physical and virtual environments.

technical topics

From a technical point of view, we were interested in the behavior of radosgw in the following areas:

  • dynamic bucket sharding
  • HTTP frontend: differences between Civetweb and Beast
  • index pool I/O pattern and latencies
  • data pool I/O pattern and latencies with erasure-coded and replicated pools
  • fast_read vs. standard read for workloads with large and small objects (see the example after this list)
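
For example, whether reads are answered with fast_read can be toggled per pool, and the HTTP frontend is selected via the rgw frontends option; a minimal sketch (pool name and port are illustrative):

ceph osd pool set default.rgw.buckets.data fast_read 1
ceph osd pool get default.rgw.buckets.data fast_read
# switching the radosgw HTTP frontend from Civetweb to Beast in ceph.conf:
# rgw frontends = beast port=8080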

requirements

When choosing the right tool, it was important for us to be able to test both small and large Ceph clusters with several thousand OSDs.

We wanted to store the test results as files for evaluation and also have a graphical representation as time-series data.

For time-series data we rely on the standard stack of Grafana, Prometheus and Thanos.

The main Prometheus exporters we use are ceph-mgr-exporter and node-exporter.
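
The Ceph metrics side, for instance, only requires the Prometheus manager module to be enabled on the cluster; a minimal sketch (by default the active mgr then exposes metrics on port 9283):

ceph mgr module enable prometheus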

load and performance tools

CBT - The Ceph Benchmarking Tool

CBT is a testing harness written in Python.

https://github.com/ceph/cbt

s3 - tests

This is a set of unofficial Amazon AWS S3 compatibility tests

https://github.com/ceph/s3-tests

COSBench - Cloud Object Storage Benchmark

COSBench is a benchmarking tool to measure the performance of Cloud Object Storage services.

https://github.com/intel-cloud/cosbench

Gosbench

Gosbench is a Golang reimplementation of COSBench. It is a distributed S3 performance benchmark tool with a Prometheus exporter, leveraging the official Golang AWS SDK.

https://github.com/mulbc/gosbench

hsbench

hsbench is an S3-compatible benchmark originally based on wasabi-tech/s3-benchmark.

https://github.com/markhpc/hsbench

Warp

MinIO's S3 benchmarking tool.

https://github.com/minio/warp

the tool of our choice

getput

getput can be run individually on a test client.

gpsuite is responsible for synchronization and scaling across any number of test clients. Communication takes place via SSH keys, and the simultaneous start on all S3 test clients is synchronized over a common time base.

Installation on Linux as a script or as a container is supported.

https://github.com/markseger/getput
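
Because gpsuite drives the test clients over SSH, password-less key-based access from the coordinating host to every S3 test client has to be set up first; a minimal sketch (hostnames are illustrative):

ssh-keygen -t ed25519 -f ~/.ssh/gpsuite_key -N ''
ssh-copy-id -i ~/.ssh/gpsuite_key.pub root@s3-client1
ssh-copy-id -i ~/.ssh/gpsuite_key.pub root@s3-client2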

· One min read
Joachim Kraftmayer

radosgw-admin key create --uid=clyso-user-id --key-type=s3 --gen-access-key --gen-secret

...

"keys": [
{
"user": "clyso-user-id",
"access_key": "VO8C17LBI9Y39FSODOU5",
"secret_key": "zExCLO1bLQJXoY451ZiKpeoePLSQ1khOJG4CcT3N"
}
],

...
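
The generated key pair can be used directly by any S3 client against the RadosGW endpoint; a minimal sketch with the AWS CLI (the endpoint hostname is illustrative):

AWS_ACCESS_KEY_ID=VO8C17LBI9Y39FSODOU5 \
AWS_SECRET_ACCESS_KEY=zExCLO1bLQJXoY451ZiKpeoePLSQ1khOJG4CcT3N \
aws --endpoint-url http://rgw.clyso.test s3 ls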

https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html/object_gateway_guide_for_red_hat_enterprise_linux/administration_cli#create_a_key

· One min read
Joachim Kraftmayer

List of users:

radosgw-admin metadata list user

List of buckets:

radosgw-admin metadata list bucket

List of bucket instances:

radosgw-admin metadata list bucket.instance

All necessary information:

  • user-id = output from the list of users
  • bucket-id = output from the list of bucket instances
  • bucket-name = output from the list of buckets or bucket instances

Change the user for this bucket instance:

radosgw-admin bucket link --bucket <bucket-name> --bucket-id <default-uuid>.267207.1 --uid=<user-uid>

Example:

radosgw-admin bucket link --bucket test-clyso-test --bucket-id aa81cf7e-38c5-4200-b26b-86e900207813.267207.1 --uid=c19f62adbc7149ad9d19-8acda2dcf3c0

If you compare the bucket metadata before and after the change (see the check after this list), the following values are changed:

  • ver: is incremented
  • mtime: is updated
  • owner: is set to the new uid
  • user.rgw.acl: the permissions stored under the user.rgw.acl key are reset
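
For this comparison, the bucket metadata can be dumped before and after the link operation; a minimal sketch using the bucket from the example above (the ACL itself is part of the bucket.instance metadata):

radosgw-admin metadata get bucket:test-clyso-test > bucket-before.json
# run the radosgw-admin bucket link command shown above
radosgw-admin metadata get bucket:test-clyso-test > bucket-after.json
diff bucket-before.json bucket-after.json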

· 3 min read
Joachim Kraftmayer

We encountered the first large omap objects in one of our Luminous Ceph clusters in Q3 2018 and worked with a couple of Ceph Core developers on the solution for internal management of RadosGW objects. This included topics such as large omap objects, dynamic resharding, multisite, deleting old object instances in the RadosGW index pool, and many small changes that were included in the Luminous, Mimic, and subsequent versions.

Here is a step-by-step guide on how to identify large omap objects and buckets and then manually reshard the affected objects.

output ceph status

ceph -s

  cluster:
    id:     52296cfd-d6c6-3129-bf70-db16f0e4423d
    health: HEALTH_WARN
            1 large omap object

output ceph health detail

ceph health detail
HEALTH_WARN 1 large omap objects
1 large objects found in pool 'clyso-test-sin-1.rgw.buckets.index'
Search the cluster log for 'Large omap object found' for more details.
search the ceph.log of the Ceph cluster:
2018-09-26 12:10:38.440682 mon.clyso1-mon1 mon.0 192.168.130.20:6789/0 77104 : cluster [WRN] Health check failed: 1 large omap objects (LARGE_OMAP_OBJECTS)
2018-09-26 12:10:35.037753 osd.1262 osd.1262 192.168.130.31:6836/10060 152 : cluster [WRN] Large omap object found. Object: 28:18428495:::.dir.143112fc-1178-40e1-b209-b859cd2c264c.38511450.376:head Key count: 2928429 Size (bytes): 861141085
2018-09-26 13:00:00.000103 mon.clyso1-mon1 mon.0 192.168.130.20:6789/0 77505 : cluster [WRN] overall HEALTH_WARN 1 large omap objects

From the ceph.log we extract the bucket instance, in this case:

143112fc-1178-40e1-b209-b859cd2c264c.38511450.376, and look for it in the RadosGW metadata:

root@salt-master1.clyso.test:~ # radosgw-admin metadata list "bucket.instance" | egrep "143112fc-1178-40e1-b209-b859cd2c264c.38511450.376"
"b1868d6d-9d61-49b0-b101-c89207009b16:143112fc-1178-40e1-b209-b859cd2c264c.38511450.376"
root@salt-master1.clyso.test:~ #

The instance exists, so we checked the metadata of the instance.

root@salt-master1.clyso.test:~ # radosgw-admin metadata get bucket.instance:b1868d6d-9d61-49b0-b101-c89207009b16:143112fc-1178-40e1-b209-b859cd2c264c.38511450.376
{
"key": "bucket.instance:b1868d6d-9d61-49b0-b101-c89207009b16:143112fc-1178-40e1-b209-b859cd2c264c.38511450.376",
"ver": {
"tag": "_Ehz5PYLhHBxpsJ_s39lePnX",
"ver": 7
},
"mtime": "2018-04-24 10:02:32.362129Z",
"data": {
"bucket_info": {
"bucket": {
"name": "b1868d6d-9d61-49b0-b101-c89207009b16",
"marker": "143112fc-1178-40e1-b209-b859cd2c264c.38511450.376",
"bucket_id": "143112fc-1178-40e1-b209-b859cd2c264c.38511450.376",
"tenant": "",
"explicit_placement": {
"data_pool": "",
"data_extra_pool": "",
"index_pool": ""
}
},
"creation_time": "2018-02-20 20:58:51.125791Z",
"owner": "d7a84e1aed9144919f8893b7d6fc5b02",
"flags": 0,
"zonegroup": "1c44aba5-fe64-4db3-9ef7-f0eb30bf5d80",
"placement_rule": "default-placement",
"has_instance_obj": "true",
"quota": {
"enabled": true,
"check_on_raw": true,
"max_size": 54975581388800,
"max_size_kb": 53687091200,
"max_objects": -1
},
"num_shards": 0,
"bi_shard_hash_type": 0,
"requester_pays": "false",
"has_website": "false",
"swift_versioning": "false",
"swift_ver_location": "",
"index_type": 0,
"mdsearch_config": [],
"reshard_status": 0,
"new_bucket_instance_id": ""
},
"attrs": [
{
"key": "user.rgw.acl",
"val": "AgK4A.....AAAAAAA="
},
{
"key": "user.rgw.idtag",
"val": ""
},
{
"key": "user.rgw.x-amz-read",
"val": "aW52YWxpZAA="
},
{
"key": "user.rgw.x-amz-write",
"val": "aW52YWxpZAA="
}
]
}
}
root@salt-master1.clyso.test:~ #

get the metadata info for the bucket

root@salt-master1.clyso.test:~ # radosgw-admin metadata get bucket:b1868d6d-9d61-49b0-b101-c89207009b16
{
"key": "bucket:b1868d6d-9d61-49b0-b101-c89207009b16",
"ver": {
"tag": "_WaSWh9mb21kEjHCisSzhWs8",
"ver": 1
},
"mtime": "2018-02-20 20:58:51.152766Z",
"data": {
"bucket": {
"name": "b1868d6d-9d61-49b0-b101-c89207009b16",
"marker": "143112fc-1178-40e1-b209-b859cd2c264c.38511450.376",
"bucket_id": "143112fc-1178-40e1-b209-b859cd2c264c.38511450.376",
"tenant": "",
"explicit_placement": {
"data_pool": "",
"data_extra_pool": "",
"index_pool": ""
}
},
"owner": "d7a84e1aed9144919f8893b7d6fc5b02",
"creation_time": "2018-02-20 20:58:51.125791Z",
"linked": "true",
"has_bucket_info": "false"
}
}
root@salt-master1.clyso.test:~ #

grep for the bucket_id in the radosgw index pool

root@salt-master1.clyso.test:~ # rados -p eu-de-200-1.rgw.buckets.index ls | egrep "143112fc-1178-40e1-b209-b859cd2c264c.38511450.376" | wc -l
1
root@salt-master1.clyso.test:~ #

the bucket rados object that has to be resharded:

143112fc-1178-40e1-b209-b859cd2c264c.38511450.376
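
The affected bucket can then be resharded manually with radosgw-admin; a minimal sketch (the target shard count of 101 is illustrative, sized towards roughly 100,000 objects per shard):

radosgw-admin bucket reshard --bucket=b1868d6d-9d61-49b0-b101-c89207009b16 --num-shards=101
# on Luminous 12.2.11+ the old, now unused index instances can be listed and removed afterwards
radosgw-admin reshard stale-instances list
radosgw-admin reshard stale-instances rm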

· One min read
Joachim Kraftmayer

The aim is to achieve a scaling of the rgw instances for the production system so that 10,000 active connections are possible.

As a result of various test runs, the following configuration emerged for our setup:

[client.rgw.<id>]
keyring = /etc/ceph/ceph.client.rgw.keyring
rgw content length compat = true
rgw dns name = <rgw.hostname.clyso.com>
rgw enable ops log = false
rgw enable usage log = false
rgw frontends = civetweb port=80 error_log_file=/var/log/radosgw/civetweb.error.log
rgw num rados handles = 8
rgw swift url = http://<rgw.hostname.clyso.com>
rgw thread pool size = 512

Notes on the configuration

rgw thread pool size is the default value for num_threads of the civetweb web server.

Line 54: https://github.com/ceph/ceph/blob/master/src/rgw/rgw_civetweb_frontend.cc

set_conf_default(conf_map, "num_threads",
                 std::to_string(g_conf->rgw_thread_pool_size));

[client.radosgw]
keyring = /etc/ceph/ceph.client.radosgw.keyring
rgw content length compat = true
rgw dns name = <fqdn hostname>
rgw enable ops log = false
rgw enable usage log = false
rgw frontends = civetweb port=8080 num_threads=512 error_log_file=/var/log/radosgw/civetweb.error.log
rgw num rados handles = 8
rgw swift url = http://<fqdn hostname>
rgw thread pool size = 512
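
Whether the values are actually applied can be checked at runtime through the admin socket of the running radosgw daemon; a minimal sketch (the socket path depends on the client name and may differ on your system):

ceph --admin-daemon /var/run/ceph/ceph-client.rgw.<id>.asok config show | grep rgw_thread_pool_size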

sources

https://github.com/ceph/ceph/blob/master/doc/radosgw/config-ref.rst

http://docs.ceph.com/docs/master/radosgw/config-ref/

https://github.com/ceph/ceph/blob/master/src/rgw/rgw_civetweb_frontend.cc

https://indico.cern.ch/event/578974/contributions/2695212/attachments/1521538/2377177/Ceph_pre-gdb_2017.pdf

http://www.osris.org/performance/rgw.html

https://www.swiftstack.com/docs/integration/python-swiftclient.html

https://github.com/civetweb/civetweb/tree/master/docs

· One min read
Joachim Kraftmayer

In case you quickly need the syntax for the radosgw-admin object stat command:

clyso-ceph-rgw-client:~/clyso # radosgw-admin object stat --bucket=size-container --object=clysofile


{
"name": "clysofile",
"size": 26,
"policy": {
"acl": {
"acl_user_map": [
{
"user": "clyso-user",
"acl": 15
}
],
"acl_group_map": [],
"grant_map": [
{
"id": "clyso-user",
"grant": {
"type": {
"type": 0
},
"id": "clyso-user",
"email": "",
"permission": {
"flags": 15
},
"name": "clyso-admin",
"group": 0,
"url_spec": ""
}
}
]
},
"owner": {
"id": "clyso-user",
"display_name": "clyso-admin"
}
},
"etag": "clyso-user",
"tag": "d667b6f1-5737-4f5e-bad0-fc030f0a4e94.11729649.143382",
"manifest": {
"objs": [],
"obj_size": 26,
"explicit_objs": "false",
"head_size": 26,
"max_head_size": 4194304,
"prefix": ".ZQzVc6phBAMCv3lSbiHBo0fftkpXmjm_",
"rules": [
{
"key": 0,
"val": {
"start_part_num": 0,
"start_ofs": 4194304,
"part_size": 0,
"stripe_max_size": 4194304,
"override_prefix": ""
}
}
],
"tail_instance": "",
"tail_placement": {
"bucket": {
"name": "size-container",
"marker": "d667b6f1-5737-4f5e-bad0-fc030f0a4e94.11750341.561",
"bucket_id": "d667b6f1-5737-4f5e-bad0-fc030f0a4e94.11750341.561",
"tenant": "",
"explicit_placement": {
"data_pool": "",
"data_extra_pool": "",
"index_pool": ""
}
},
"placement_rule": "default-placement"
}
},
"attrs": {
"user.rgw.pg_ver": "��",
"user.rgw.source_zone": "eR[�\u0011",
"user.rgw.tail_tag": "d667b6f1-5737-4f5e-bad0-fc030f0a4e94.11729649.143382",
"user.rgw.x-amz-meta-mtime": "1535100720.157102"
}
}