14 posts tagged with "s3"

Critical Known Bugs in Ceph Quincy, Reef and Squid Versions

April 7, 2025 · 4 min read

Software Engineer at Clyso

Below is a list of known bugs in Ceph versions that we want to highlight. At the time of writing, the latest versions of each release are:

reef: 18.2.7
squid: 19.2.3

Ceph has more known bugs than the ones listed here; we have highlighted a few that we wanted to share.

Cross-version Issues

These critical bugs affect multiple major versions of Ceph.

RadosGW --bypass-gc Data Loss Bug

Severity: Critical
Affected Versions: Quincy (17.2.x), Reef (18.2.x), Squid (19.2.x)
Bug Tracker: https://tracker.ceph.com/issues/73348

Description

An RGW bug that affects all current major Ceph versions: running bucket removal with --bypass-gc deletes the data of copied objects.

When using radosgw-admin bucket rm --bypass-gc, if any of the deleted objects had been copied to/from other buckets using the server-side S3 CopyObject operation, the command deletes the data of those copies as well. As a result, the copied objects remain visible in ListObjects requests but GetObject requests fail with NoSuchKey errors, leading to silent data loss.

This bug affects all major versions: Quincy (v17), Reef (v18) and Squid (v19). Backports are still in progress for the Reef, Squid and Tentacle releases.

Recommendation

Do not use the --bypass-gc flag: use radosgw-admin bucket rm without --bypass-gc, which correctly handles copied objects
If you've used --bypass-gc in the past, audit your buckets for objects with missing data
Follow the bug tracker at https://tracker.ceph.com/issues/73348 for fix and backport updates
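For the audit, a minimal sketch, assuming the AWS CLI is configured with credentials for the RGW endpoint; S3_ENDPOINT, check_object and audit_bucket are hypothetical names. Since the bug shows up as GetObject failing while ListObjects still lists the key, the check performs a full GetObject for each key and discards the body.

```shell
# Hypothetical RGW endpoint; override it via the environment.
S3_ENDPOINT="${S3_ENDPOINT:-http://localhost:8080}"

# check_object BUCKET KEY
# Succeeds if a full GetObject works. The body is written to /dev/null,
# so every byte of the object is read: expect this to be slow on big buckets.
check_object() {
  aws --endpoint-url "$S3_ENDPOINT" s3api get-object \
    --bucket "$1" --key "$2" /dev/null >/dev/null 2>&1
}

# audit_bucket BUCKET
# Reads object keys (one per line) on stdin and prints every key whose
# data can no longer be read.
audit_bucket() {
  local bucket=$1 key
  while IFS= read -r key; do
    [ -n "$key" ] || continue
    if ! check_object "$bucket" "$key"; then
      printf '%s\n' "$key"
    fi
  done
}
```

Example, with a hypothetical bucket name and keys that contain no tabs or newlines:
aws --endpoint-url "$S3_ENDPOINT" s3api list-objects-v2 --bucket mybucket --query 'Contents[].Key' --output text | tr '\t' '\n' | audit_bucket mybucket
Any failed read is reported, not only NoSuchKey, so re-check flagged keys by hand before drawing conclusions.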

Ceph Squid (v19) Known Issues

Squid Deployed OSDs Are Crashing

Severity: Critical
Affected Versions: 19.2.*
Bug Tracker: https://tracker.ceph.com/issues/70390

Description

OSDs created in v19 may crash unexpectedly. This issue affects only newly deployed OSDs using Squid, while previously deployed OSDs continue to run fine. The root cause is likely the Elastic Shared Blob implementation introduced in PR #53178. Unfortunately, ceph-bluestore-tool repair cannot fix affected OSDs.

Recommendation

Before deploying new OSDs: Run ceph config set osd bluestore_elastic_shared_blobs 0 on any Squid cluster
For already affected OSDs: The only known fix is complete redeployment of the OSD
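A small guard for the first recommendation, as a sketch: it reads the centrally configured value with ceph config get and only sets it when it is not already off. The function names are hypothetical, and the accepted spellings of "false" are an assumption about how the value is printed.

```shell
# is_false VALUE: succeeds for common spellings of a false boolean (assumed).
is_false() {
  case "$1" in
    0|false|False|FALSE) return 0 ;;
    *) return 1 ;;
  esac
}

# disable_elastic_shared_blobs: turn elastic shared blobs off for all OSDs
# in the central config database, before any new Squid OSD is created.
disable_elastic_shared_blobs() {
  local current
  current=$(ceph config get osd bluestore_elastic_shared_blobs) || return 1
  if is_false "$current"; then
    echo "bluestore_elastic_shared_blobs already disabled"
  else
    ceph config set osd bluestore_elastic_shared_blobs 0 &&
      echo "bluestore_elastic_shared_blobs disabled"
  fi
}
```

Run it once per cluster before creating OSDs; ceph config show osd.<id> bluestore_elastic_shared_blobs can then confirm what a given running OSD sees. The option presumably only takes effect when an OSD is created, which would explain why already affected OSDs still need redeployment.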

S3 DeleteBucketLifecycle Does Not Delete Config

Severity: Medium
Affected Versions: 19.2.3
Bug Tracker: https://tracker.ceph.com/issues/71083

Description

The S3 DeleteBucketLifecycle API call fails to actually delete the lifecycle configuration from the bucket, causing S3 compatibility issues. This is a regression introduced specifically in squid 19.2.3.

Recommendation

Upgrade to a fixed version when available
Avoid using the DeleteBucketLifecycle API in affected versions
Consider using alternative methods to manage lifecycle configurations
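One way to detect the regression, sketched under assumptions: delete the configuration through the regular API, then read it back with GetBucketLifecycleConfiguration, which should fail once no configuration is left. The AWS CLI, the S3_ENDPOINT variable and the delete_lifecycle_verified name are assumptions, not part of the original advice.

```shell
# Hypothetical RGW endpoint; override it via the environment.
S3_ENDPOINT="${S3_ENDPOINT:-http://localhost:8080}"

# delete_lifecycle_verified BUCKET
# Returns 0 if the lifecycle configuration is gone afterwards, 1 if the
# delete call itself failed, and 2 if the configuration can still be read
# back (the behaviour described in tracker issue 71083).
delete_lifecycle_verified() {
  aws --endpoint-url "$S3_ENDPOINT" s3api delete-bucket-lifecycle \
    --bucket "$1" || return 1
  if aws --endpoint-url "$S3_ENDPOINT" s3api \
      get-bucket-lifecycle-configuration --bucket "$1" >/dev/null 2>&1; then
    echo "lifecycle configuration still present on bucket $1" >&2
    return 2
  fi
  return 0
}
```

Note that a read-back failing for unrelated reasons (network, permissions) is also counted as a successful delete.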

Ceph Reef (v18) Known Issues

BlueStore Potential Corruption

Severity: Critical
Affected Versions: 18.2.5, 18.2.6
Bug Tracker: https://tracker.ceph.com/issues/69764

Description

Versions 18.2.5 and 18.2.6 of Ceph Reef were released with a bug that may cause OSDs to crash and corrupt on-disk data. This is a data integrity issue that poses serious risks to cluster stability and data safety.

Recommendation

Upgrade to version 18.2.7. Do not deploy or upgrade to versions 18.2.5 or 18.2.6.
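To find daemons still running an affected build, a sketch that scans the output of ceph versions, which groups daemons by their full version string. The function name affected_versions is hypothetical; the pattern also matches 17.2.8, the affected Quincy release.

```shell
# affected_versions: read "ceph versions" output on stdin and print each
# affected release (17.2.8, 18.2.5, 18.2.6) found in it, once.
affected_versions() {
  grep -oE 'ceph version (17\.2\.8|18\.2\.[56]) ' | awk '{print $3}' | sort -u
}
```

Usage: ceph versions | affected_versions. Empty output means no daemon reports one of the affected versions.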

Ceph Quincy (v17) Known Issues

BlueStore Potential Corruption

Severity: Critical
Affected Versions: 17.2.8
Bug Tracker: https://tracker.ceph.com/issues/69764

Description

Version 17.2.8 of Ceph Quincy was released with a bug that may cause OSDs to crash and corrupt on-disk data. This is a data integrity issue that poses serious risks to cluster stability and data safety.

Recommendation

Upgrade to version 17.2.9 or 18.2.7. Do not deploy or remain on version 17.2.8.
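On clusters managed by cephadm, the upgrade can be started with ceph orch upgrade start. The sketch below assumes cephadm (other deployment tools need their own procedure) and uses a hypothetical upgrade_to wrapper that refuses to target any release listed above as affected.

```shell
# Releases affected by tracker issue 69764, per the sections above.
BAD_VERSIONS="17.2.8 18.2.5 18.2.6"

# is_bad_version VERSION: succeeds if VERSION is in BAD_VERSIONS.
is_bad_version() {
  case " $BAD_VERSIONS " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# upgrade_to VERSION: start a cephadm upgrade unless VERSION is affected.
upgrade_to() {
  if is_bad_version "$1"; then
    echo "refusing to upgrade to $1: affected by tracker issue 69764" >&2
    return 1
  fi
  ceph orch upgrade start --ceph-version "$1"
}
```

Usage: upgrade_to 17.2.9, then follow progress with ceph orch upgrade status.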

Additional Resources

Please file a ticket in our support portal if there are any other questions or concerns.

As of now, we recommend Reef v18.2.7 as a stable version for production use. This recommendation changes over time as we gain more confidence in later Ceph releases.

For more detailed information and updates on these issues, refer to our knowledge base articles:

S3 migration with chorus

March 26, 2025 · 10 min read

Artem Torubarov

Software Engineer at Clyso

An S3 system holds data—call it source. It keeps applications running, but migration is needed, maybe due to scale limits or costs creeping up. A new S3 setup, target, is set to replace it. The challenge is to move all data from source to target with no downtime, no data lost, and no breaks for the apps using source. What can get this done?

What Are the Challenges?

Migrating S3 data brings several key difficulties:

Data and Metadata Consistency: It’s essential to make sure that all data is copied correctly—no objects lost or corrupted. Besides this, applications relying on metadata (e.g., ACLs, versions, timestamps) need to keep working as expected, so that metadata has to be spot-on too.
Ongoing Writes: Applications don’t stop writing to source storage during migration, and all that new data needs to reach target storage too. Synchronous replication sounds good but can get slow and messy—what’s the right response to a user if a PUT works on source storage but flops on target? The alternative is downtime: pause writes to source storage and copy everything at once. For some applications, though, downtime just isn’t an option.

These issues don’t stand alone—they tangle together. Verifying data integrity gets tricky when ongoing writes shift the dataset mid-migration.

Regarding Tools and Strategies

Two high-level approaches can tackle these challenges:

Do It Bucket by Bucket: This approach leans on a canary deployment strategy. Copy one bucket, switch the application to target storage, and check if everything runs as expected. If it does, move on to the next bucket; if not, flip back to source storage and dig into the problem. It cuts downtime too—copy a bucket at a time and switch the application only when all data’s in place.
Do It in Two Phases: Say a bucket holds 10 million objects, and copying takes a day. During that time, ongoing writes mean about 5% of objects get added, updated, or removed. So, copy all the data once without stopping writes. Then, use a short downtime to figure out which objects changed and copy just those. Call the first pass initial replication and the second event replication.
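The two-phase idea can be illustrated for a single bucket with plain Rclone. This is a sketch under stated assumptions, not how Chorus implements it: the rclone remote names source and target are assumed to exist in your rclone configuration, and pause_writes/resume_writes are hypothetical placeholders for your own application tooling. Unlike event replication, the second pass here does not know which objects changed; it lets rclone sync compare both sides again.

```shell
# Hypothetical hooks: replace with whatever stops and restarts application
# writes to one bucket (a maintenance flag, a proxy rule, ...).
pause_writes()  { echo "pausing writes to $1"; }
resume_writes() { echo "resuming writes to $1"; }

# migrate_bucket BUCKET, assuming rclone remotes named "source" and "target".
migrate_bucket() {
  local bucket=$1
  # Phase 1, initial replication: copy everything while writes continue.
  rclone sync "source:$bucket" "target:$bucket" --transfers 32 --checkers 64 || return 1
  # Phase 2, short downtime: sync again. rclone compares both sides, copies
  # objects added or changed since phase 1 and deletes removed ones.
  pause_writes "$bucket"
  if ! rclone sync "source:$bucket" "target:$bucket" --transfers 32 --checkers 64; then
    resume_writes "$bucket"   # fall back to source if the final pass fails
    return 1
  fi
  echo "$bucket is in sync: switch the application to target"
}
```

Because phase 2 still lists the whole bucket, the downtime for 10 million objects is bounded by listing and comparison time; tracking change events is what removes that cost.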

These ideas sound simple, but execution isn’t. Questions arise:

If applications expect one URL for all buckets, how does bucket-by-bucket work?
How can writes to one bucket be stopped for downtime?
How are changes (aka events) tracked for event replication?
How can 10 million objects be copied fast enough?
How does this scale to 10,000 buckets automatically?

Now, let’s see how Chorus handles these challenges and strategies.

Opensourcing Chorus project

January 24, 2024 · 4 min read

Artem Torubarov

Software Engineer at Clyso

Today, we're excited to share that we've released the Chorus project under the Apache 2.0 License. In this blog post, let's talk about what Chorus is and why we made it.

At Clyso, we frequently assist our customers in migrating infrastructure, whether to or from the cloud, or between different cloud providers. Our focus often centers around storage, particularly S3.

Like many others in the field, we initially relied on the fantastic Rclone tool, which excelled at the task. However, as we encountered challenges while attempting to migrate a 100 TB bucket with 100M objects, we recognized the need for an additional layer of automation. Migrating large buckets within a reasonable timeframe requires a machine with substantial RAM and network bandwidth to take advantage of the parallelism options provided by Rclone.

Yet, even with powerful machines, the risk of network problems or VM restarts interrupting the synchronization process remained. While Rclone handles restarts admirably by comparing object size, ETag, and modification time, the process becomes time-consuming and incurs additional costs for cloud-based S3, especially with very large buckets.

The missing piece in our puzzle was the ability to run Rclone on multiple machines for improved hardware utilization and the ability to track and store progress on remote persistent storage. With these goals in mind, we developed Chorus - a vendor-agnostic S3 backup, replication, and routing software. Written in Go, Chorus uses Rclone for S3 object copying, Redis for progress tracking, and Asynq work queue for load distribution across multiple machines.

Commvault Backup with Ceph S3

July 27, 2023 · One min read

Joachim Kraftmayer

Managing Director at Clyso

The customer has used Commvault as its data protection solution for years and is now replacing its existing storage solution (EMC) across all of its customer environments.

Commvault provides data backup through a single interface. Through the gradual deployment of Ceph S3 in several expansion stages, the customer built confidence in Ceph as a storage technology and more and more backups are gradually being transferred to the new backend.

In the first phase, Ceph S3 was able to demonstrate its performance and scalability.

In the following phases, the focus will be on flexibility and use as unified storage for cloud computing and Kubernetes.

For all these scenarios, the customer relies on Ceph as an extremely scalable, high-performance and cost-effective storage backend.

Ceph S3 easily handles over 1 PB of backup data and more than 500 GB per hour of backup throughput, and it is ready to grow further as requirements increase.

After in-depth consultation, we were able to exceed the customer’s expectations for the Ceph cluster in production.

Productive Ceph Cluster in Microsoft Azure with AKS

July 27, 2023 · One min read

Joachim Kraftmayer

Managing Director at Clyso

The customer uses Commvault as the data backup solution for all of its customer environments.

Wherever the data resides, Commvault provides the backup of the data through a single interface. The customer thus avoids costly data loss scenarios, disconnected data silos, lack of recovery SLAs and inefficient scaling.

For all these scenarios, the customer relies on Ceph as a powerful and cost-effective storage backend for Commvault.

Ceph easily handles over 2 PB of backup data and more than 1 TB per hour of backup throughput, and it is ready to grow further as requirements increase.

In conclusion, we were able to clearly exceed the customer’s expectations of the Ceph Cluster already in the test phase.

Ceph S3 load and performance test

August 1, 2022 · 2 min read

Joachim Kraftmayer

Managing Director at Clyso

motivation

We had already tested Ceph intensively via OpenStack Swift. This time we were interested in the behavior of the RadosGW stack in Ceph. We paid particular attention to the size and number of objects in relation to the resource consumption of the radosgw process, as well as to the effects on radosgw response latencies. Our goal was to plan the right sizing of the physical and virtual environments.

technical topics

From a technical point of view, we were interested in the behavior of radosgw in the following topics.

dynamic bucket sharding
http frontend difference between Civetweb and Beast
index pool io pattern and latencies
data pool io pattern and latencies with erasure-coded and replicated pools
fast_read vs. standard read for workloads with large and small objects.
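For the fast_read comparison, the pool flag can be switched between runs with ceph osd pool set. A sketch, assuming an erasure-coded RGW data pool (fast_read only applies to erasure-coded pools; the name default.rgw.buckets.data is merely the usual default), plain-text output of the form "fast_read: 0" from ceph osd pool get, and a hypothetical run_benchmark placeholder for the actual load test.

```shell
# Assumed data pool name; override it via the environment.
POOL="${POOL:-default.rgw.buckets.data}"

# Placeholder: replace with the real load test invocation.
run_benchmark() { echo "benchmark run with fast_read=$1"; }

# compare_fast_read: run the benchmark with fast_read off, then on, and
# restore the pool's original setting afterwards.
compare_fast_read() {
  local original value
  original=$(ceph osd pool get "$POOL" fast_read | awk '{print $2}')
  for value in 0 1; do
    ceph osd pool set "$POOL" fast_read "$value" >/dev/null || return 1
    run_benchmark "$value"
  done
  ceph osd pool set "$POOL" fast_read "${original:-0}" >/dev/null
}
```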

requirements

When choosing the right tool, it was important for us to be able to test both small and large Ceph clusters with several thousand OSDs.

We wanted the test results both as files for evaluation and as time-series data for graphical representation.

For timeseries data we rely on the standard stack with Grafana, Prometheus and Thanos.

The main Prometheus exporters we use are ceph-mgr-exporter and node-exporter.

load and performance tools

CBT - The Ceph Benchmarking Tool

CBT is a testing harness written in Python.

https://github.com/ceph/cbt

s3 - tests

This is a set of unofficial Amazon AWS S3 compatibility tests

https://github.com/ceph/s3-tests

COSBench - Cloud Object Storage Benchmark

COSBench is a benchmarking tool to measure the performance of Cloud Object Storage services.

https://github.com/intel-cloud/cosbench

Gosbench

Gosbench is the Golang reimplementation of Cosbench. It is a distributed S3 performance benchmark tool with Prometheus exporter leveraging the official Golang AWS SDK

https://github.com/mulbc/gosbench

hsbench

hsbench is an S3-compatible benchmark originally based on wasabi-tech/s3-benchmark.

https://github.com/markhpc/hsbench

Warp

MinIO's S3 benchmarking tool.

https://github.com/minio/warp

the tool of our choice

getput

getput can be run individually on a test client.

gpsuite is responsible for synchronization and scaling across any number of test clients. Communication takes place via ssh keys and the simultaneous start of all s3 test clients is synchronized over a common time base.

Installation on Linux as a script or as a container is supported.

https://github.com/markseger/getput

multisite environment - ceph bucket index dynamic resharding

September 9, 2018 · One min read

Joachim Kraftmayer

Managing Director at Clyso

Dynamic resharding is not supported in a multisite environment. It has been disabled by default since Ceph 12.2.2, but we recommend double-checking the setting.
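To double-check the setting on running gateways, a sketch that queries each daemon over its admin socket with ceph daemon ... config get. The daemon names passed in are deployment-specific assumptions, the function names are hypothetical, and the command has to run on the host where each daemon's admin socket lives.

```shell
# dynamic_resharding_enabled DAEMON
# Succeeds if the running RGW daemon reports rgw_dynamic_resharding as true.
# The admin socket answers with JSON such as {"rgw_dynamic_resharding": "false"};
# matching on "true" works whether or not the value is quoted.
dynamic_resharding_enabled() {
  ceph daemon "$1" config get rgw_dynamic_resharding | grep -q true
}

# check_multisite_rgw DAEMON...: warn for every gateway that still has it on;
# returns 1 if at least one does.
check_multisite_rgw() {
  local status=0 daemon
  for daemon in "$@"; do
    if dynamic_resharding_enabled "$daemon"; then
      echo "WARNING: $daemon has rgw_dynamic_resharding enabled"
      status=1
    fi
  done
  return "$status"
}
```

Usage, with hypothetical daemon names: check_multisite_rgw client.rgw.gw1 client.rgw.gw2.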

Sources

https://www.suse.com/documentation/suse-enterprise-storage-5/book_storage_admin/data/ogw_bucket_sharding.html

Ceph performance tests via Openstack Swift Backend

August 14, 2018 · One min read

Joachim Kraftmayer

Managing Director at Clyso

There are several testing frameworks for Openstack Swift, but most of them are no longer maintained or have some other problems to test larger Ceph clusters properly under load.

At this point we decided to use getput by Mark Seger.

Test Setup

Ceph Cluster

Our Ceph test cluster consisted of 108 Ceph nodes with a total of 2592 OSDs. The network used a spine-leaf architecture.

Openstack Compute Nodes

12 compute nodes with 56 HT cores and 100 GBit/s network connectivity.

Test setup

gpsuite

Sources:

github.com/markseger/getput

ceph health HEALTH_WARN 1 large omap objects

May 9, 2018 · 3 min read

Joachim Kraftmayer

Managing Director at Clyso

We encountered the first large omap objects in one of our Luminous Ceph clusters in Q3 2018 and worked with a couple of Ceph Core developers on the solution for internal management of RadosGW objects. This included topics such as large omap objects, dynamic resharding, multisite, deleting old object instances in the RadosGW index pool, and many small changes that were included in the Luminous, Mimic, and subsequent versions.

Here is a step by step guide on how to identify large omap objects and buckets and then manually reshard the affected objects.

output ceph status

ceph -s

  cluster:
    id:     52296cfd-d6c6-3129-bf70-db16f0e4423d
    health: HEALTH_WARN
            1 large omap object

output ceph health detail

ceph health detail
HEALTH_WARN 1 large omap objects
1 large objects found in pool 'clyso-test-sin-1.rgw.buckets.index'
Search the cluster log for 'Large omap object found' for more details.

We then search the ceph.log of the Ceph cluster:

2018-09-26 12:10:38.440682 mon.clyso1-mon1 mon.0 192.168.130.20:6789/0 77104 : cluster [WRN] Health check failed: 1 large omap objects (LARGE_OMAP_OBJECTS)
2018-09-26 12:10:35.037753 osd.1262 osd.1262 192.168.130.31:6836/10060 152 : cluster [WRN] Large omap object found. Object: 28:18428495:::.dir.143112fc-1178-40e1-b209-b859cd2c264c.38511450.376:head Key count: 2928429 Size (bytes): 861141085
2018-09-26 13:00:00.000103 mon.clyso1-mon1 mon.0 192.168.130.20:6789/0 77505 : cluster [WRN] overall HEALTH_WARN 1 large omap objects
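Extracting the bucket instance can be scripted. A sketch, assuming log lines in the format shown above, where the omap object name has the form .dir.<bucket instance>; the function name is hypothetical, and this simple version does not strip the shard suffix (.dir.<id>.<n>) that sharded indexes carry.

```shell
# large_omap_instances: read ceph.log lines on stdin and print, once per
# object, the bucket instance (the part after ".dir.") and its key count.
large_omap_instances() {
  grep 'Large omap object found' |
    sed -n 's/.*Object: [^ ]*:::\.dir\.\([^: ]*\):head Key count: \([0-9]*\).*/\1 \2/p' |
    sort -u
}
```

For example: large_omap_instances < /var/log/ceph/ceph.log (the usual cluster log location on monitor hosts; adjust for your deployment).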

From the ceph.log we extract the bucket instance, which is the part of the omap object name after .dir., in this case:

143112fc-1178-40e1-b209-b859cd2c264c.38511450.376

and look for it in the RadosGW metadata:

root@salt-master1.clyso.test:~ # radosgw-admin metadata list "bucket.instance" | egrep "143112fc-1178-40e1-b209-b859cd2c264c.38511450.376"
"b1868d6d-9d61-49b0-b101-c89207009b16:143112fc-1178-40e1-b209-b859cd2c264c.38511450.376"
root@salt-master1.clyso.test:~ #

The instance exists, so we check its metadata:

root@salt-master1.clyso.test:~ # radosgw-admin metadata get bucket.instance:b1868d6d-9d61-49b0-b101-c89207009b16:143112fc-1178-40e1-b209-b859cd2c264c.38511450.376
{
    "key": "bucket.instance:b1868d6d-9d61-49b0-b101-c89207009b16:143112fc-1178-40e1-b209-b859cd2c264c.38511450.376",
    "ver": {
        "tag": "_Ehz5PYLhHBxpsJ_s39lePnX",
        "ver": 7
    },
    "mtime": "2018-04-24 10:02:32.362129Z",
    "data": {
        "bucket_info": {
            "bucket": {
                "name": "b1868d6d-9d61-49b0-b101-c89207009b16",
                "marker": "143112fc-1178-40e1-b209-b859cd2c264c.38511450.376",
                "bucket_id": "143112fc-1178-40e1-b209-b859cd2c264c.38511450.376",
                "tenant": "",
                "explicit_placement": {
                    "data_pool": "",
                    "data_extra_pool": "",
                    "index_pool": ""
                }
            },
            "creation_time": "2018-02-20 20:58:51.125791Z",
            "owner": "d7a84e1aed9144919f8893b7d6fc5b02",
            "flags": 0,
            "zonegroup": "1c44aba5-fe64-4db3-9ef7-f0eb30bf5d80",
            "placement_rule": "default-placement",
            "has_instance_obj": "true",
            "quota": {
                "enabled": true,
                "check_on_raw": true,
                "max_size": 54975581388800,
                "max_size_kb": 53687091200,
                "max_objects": -1
            },
            "num_shards": 0,
            "bi_shard_hash_type": 0,
            "requester_pays": "false",
            "has_website": "false",
            "swift_versioning": "false",
            "swift_ver_location": "",
            "index_type": 0,
            "mdsearch_config": [],
            "reshard_status": 0,
            "new_bucket_instance_id": ""
        },
        "attrs": [
            {
                "key": "user.rgw.acl",
                "val": "AgK4A.....AAAAAAA="
            },
            {
                "key": "user.rgw.idtag",
                "val": ""
            },
            {
                "key": "user.rgw.x-amz-read",
                "val": "aW52YWxpZAA="
            },
            {
                "key": "user.rgw.x-amz-write",
                "val": "aW52YWxpZAA="
            }
        ]
    }
}
root@salt-master1.clyso.test:~ #

get the metadata info of the bucket

root@salt-master1.clyso.test:~ # radosgw-admin metadata get bucket:b1868d6d-9d61-49b0-b101-c89207009b16
{
    "key": "bucket:b1868d6d-9d61-49b0-b101-c89207009b16",
    "ver": {
        "tag": "_WaSWh9mb21kEjHCisSzhWs8",
        "ver": 1
    },
    "mtime": "2018-02-20 20:58:51.152766Z",
    "data": {
        "bucket": {
            "name": "b1868d6d-9d61-49b0-b101-c89207009b16",
            "marker": "143112fc-1178-40e1-b209-b859cd2c264c.38511450.376",
            "bucket_id": "143112fc-1178-40e1-b209-b859cd2c264c.38511450.376",
            "tenant": "",
            "explicit_placement": {
                "data_pool": "",
                "data_extra_pool": "",
                "index_pool": ""
            }
        },
        "owner": "d7a84e1aed9144919f8893b7d6fc5b02",
        "creation_time": "2018-02-20 20:58:51.125791Z",
        "linked": "true",
        "has_bucket_info": "false"
    }
}
root@salt-master1.clyso.test:~ #

grep for the bucket_id in the radosgw index pool

root@salt-master1.clyso.test:~ # rados -p eu-de-200-1.rgw.buckets.index ls | egrep "143112fc-1178-40e1-b209-b859cd2c264c.38511450.376" | wc -l
1
root@salt-master1.clyso.test:~ #
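The single match fits "num_shards": 0 in the instance metadata: an unsharded bucket index is one rados object, .dir.<bucket_id>, so all 2,928,429 keys from the log sit in one omap. A sketch (hypothetical function name) that counts the index objects of a bucket ID in rados ls output, assuming sharded indexes use a .<shard number> suffix:

```shell
# index_objects BUCKET_ID: read "rados ls" output of the index pool on
# stdin and count the index objects of BUCKET_ID. An unsharded index is
# the single object .dir.<id>; sharded ones are .dir.<id>.0, .dir.<id>.1, ...
index_objects() {
  local id_re
  id_re=$(printf '%s' "$1" | sed 's/[.]/\\./g')   # escape dots for the regex
  grep -cE "^\.dir\.${id_re}(\.[0-9]+)?\$"
}
```

For example, rados -p eu-de-200-1.rgw.buckets.index ls | index_objects 143112fc-1178-40e1-b209-b859cd2c264c.38511450.376 corresponds to the count shown above.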

the bucket rados object that has to be resharded

143112fc-1178-40e1-b209-b859cd2c264c.38511450.376
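Resharding needs a target shard count. As a worked example, a sketch assuming the common guideline of about 100,000 index entries per shard (the default of rgw_max_objs_per_shard; check your release) and optional rounding up to a prime, which some operators prefer; the helper names are hypothetical. For the 2,928,429 keys above: ceil(2928429 / 100000) = 30, and the next prime is 31.

```shell
# shards_for KEYS [PER_SHARD]: shards needed so that no shard holds more
# than PER_SHARD entries (integer ceiling division).
shards_for() {
  local per_shard=${2:-100000}
  echo $(( ($1 + per_shard - 1) / per_shard ))
}

# next_prime N: smallest prime >= N (trial division; fine for shard counts).
next_prime() {
  local n=$1 d
  if [ "$n" -lt 2 ]; then n=2; fi
  while :; do
    d=2
    while [ $((d * d)) -le "$n" ] && [ $((n % d)) -ne 0 ]; do
      d=$((d + 1))
    done
    if [ $((d * d)) -gt "$n" ]; then echo "$n"; return 0; fi
    n=$((n + 1))
  done
}
```

With those values, the manual reshard would look like radosgw-admin bucket reshard --bucket=b1868d6d-9d61-49b0-b101-c89207009b16 --num-shards=31. In multisite environments, check the resharding limitations of your release first.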

ceph radosgw-admin remove rights from users

March 15, 2018 · One min read

Joachim Kraftmayer

Managing Director at Clyso

Example of removing a user's capabilities with radosgw-admin

Remove the buckets=read capability from the user clyso-user-id.

radosgw-admin caps rm --uid=clyso-user-id --caps="buckets=read"
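Several capabilities can be removed in one call by separating them with a semicolon in --caps. A small wrapper as a sketch; remove_caps and the RGW_ADMIN override are hypothetical names.

```shell
# remove_caps UID CAP...: remove every CAP (e.g. buckets=read) from UID in a
# single radosgw-admin call; the caps are joined with ";".
# RGW_ADMIN can point at another command, e.g. for a dry run.
remove_caps() {
  local uid caps
  uid=$1
  shift
  caps=$(IFS=';'; printf '%s' "$*")
  "${RGW_ADMIN:-radosgw-admin}" caps rm --uid="$uid" --caps="$caps"
}
```

Usage: remove_caps clyso-user-id buckets=read users=read, then check the remaining caps with radosgw-admin user info --uid=clyso-user-id.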
