
3 posts tagged with "quincy"


Critical Known Bugs in Ceph Quincy, Reef and Squid Versions

· 4 min read
Joshua Blanch
Software Engineer at Clyso

Below is a list of known bugs in Ceph versions that we want to highlight. At the time of writing, the latest versions of each release are:

  • reef: 18.2.7
  • squid: 19.2.3

Ceph has more bugs than the ones listed here; we've highlighted a few that we wanted to share.

Cross-version Issues

These critical bugs affect multiple major versions of Ceph.

RadosGW --bypass-gc Data Loss Bug

Severity: Critical
Affected Versions: Quincy (17.2.x), Reef (18.2.x), Squid (19.2.x)
Bug Tracker: https://tracker.ceph.com/issues/73348

Description

This RGW bug affects all Ceph versions: using --bypass-gc can delete the data of copied objects.

When using radosgw-admin bucket rm --bypass-gc, if any of the deleted objects had been copied to/from other buckets using the server-side S3 CopyObject operation, the command deletes the data of those copies as well. As a result, the copied objects remain visible in ListObjects requests but GetObject requests fail with NoSuchKey errors, leading to silent data loss.

This bug affects all major versions: Quincy (v17), Reef (v18), and Squid (v19). Backports are still in progress for the Reef, Squid, and Tentacle releases.

Recommendation

  • Do not use the --bypass-gc flag: use radosgw-admin bucket rm without --bypass-gc, which correctly handles copied objects
  • If you've used --bypass-gc in the past, audit your buckets for objects with missing data
  • Follow the bug tracker at https://tracker.ceph.com/issues/73348 for fix and backport updates
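For the audit step, here is a minimal sketch in Python. It lists every object in a bucket and flags keys that ListObjects still returns but that GetObject rejects with NoSuchKey, which is the symptom described above. It assumes a boto3-style S3 client pointed at your RGW endpoint. The function names, the endpoint and the bucket name in the example comment are illustrative and not part of Ceph or this advisory.

To keep the scan cheap, it requests only the first byte of each object. It treats InvalidRange as "object exists but is empty", which is an assumption about how RGW answers ranged reads on zero-length objects. A flagged key is a candidate for this bug, not proof: a concurrent delete during the scan would look the same.

```python
from typing import Any, Dict, List, Optional


def _error_code(exc: Exception) -> Optional[str]:
    """Extract the S3 error code from a botocore-style ClientError, if present."""
    response = getattr(exc, "response", None)
    if isinstance(response, dict):
        return response.get("Error", {}).get("Code")
    return None


def find_unreadable_objects(s3: Any, bucket: str) -> List[str]:
    """Return keys that ListObjectsV2 reports but GetObject answers with NoSuchKey."""
    missing: List[str] = []
    kwargs: Dict[str, str] = {"Bucket": bucket}
    while True:
        page = s3.list_objects_v2(**kwargs)
        for obj in page.get("Contents", []):
            key = obj["Key"]
            try:
                # Read a single byte; we only care whether the data is reachable.
                resp = s3.get_object(Bucket=bucket, Key=key, Range="bytes=0-0")
                resp["Body"].close()
            except Exception as exc:
                code = _error_code(exc)
                if code == "NoSuchKey":
                    missing.append(key)  # listed but unreadable: candidate data loss
                elif code != "InvalidRange":
                    raise  # assumption: InvalidRange only means a zero-length object
        if not page.get("IsTruncated"):
            return missing
        # Follow ListObjectsV2 pagination until the listing is complete.
        kwargs["ContinuationToken"] = page["NextContinuationToken"]


# Example (hypothetical endpoint and bucket name):
#   import boto3
#   s3 = boto3.client("s3", endpoint_url="http://rgw.example.com:8080")
#   print(find_unreadable_objects(s3, "my-bucket"))
```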

Ceph Squid (v19) Known Issues

Squid Deployed OSDs Are Crashing

Severity: Critical
Affected Versions: 19.2.*
Bug Tracker: https://tracker.ceph.com/issues/70390

Description

OSDs created in v19 may crash unexpectedly. This issue affects only newly deployed OSDs using Squid, while previously deployed OSDs continue to run fine. The root cause is likely the Elastic Shared Blob implementation introduced in PR #53178. Unfortunately, ceph-bluestore-tool repair cannot fix affected OSDs.

Recommendation

  • Before deploying new OSDs: Run ceph config set osd bluestore_elastic_shared_blobs 0 on any Squid cluster
  • For already affected OSDs: The only known fix is complete redeployment of the OSD
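Here is a minimal sketch of the pre-deployment step in Python. It assumes the ceph CLI is on PATH with admin credentials. It sets the option from the recommendation above, then reads it back with ceph config get so you can confirm the mitigation is in place before creating OSDs.

The helper names are ours. Whether a disabled boolean option is echoed back as "0" or "false" is an assumption, so both are accepted. The command runner is injectable, so the logic can be exercised without a cluster.

```python
import subprocess
from typing import Callable, List

# A runner takes ceph CLI arguments (without the leading "ceph") and returns stdout.
Runner = Callable[[List[str]], str]

OPTION = "bluestore_elastic_shared_blobs"


def run_ceph(args: List[str]) -> str:
    """Run `ceph <args>` and return stripped stdout; raises CalledProcessError on failure."""
    result = subprocess.run(["ceph", *args], check=True, capture_output=True, text=True)
    return result.stdout.strip()


def disable_elastic_shared_blobs(run: Runner = run_ceph) -> bool:
    """Apply the Squid workaround and report whether the osd section now has it disabled."""
    run(["config", "set", "osd", OPTION, "0"])
    value = run(["config", "get", "osd", OPTION]).strip().lower()
    # Assumption: a disabled bool option may be echoed back as "0" or "false".
    return value in ("0", "false")
```

Keep in mind that this only protects OSDs created after the setting is applied; per the recommendation above, OSDs that are already affected still need to be redeployed.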

S3 DeleteBucketLifecycle Does Not Delete Config

Severity: Medium
Affected Versions: 19.2.3
Bug Tracker: https://tracker.ceph.com/issues/71083

Description

The S3 DeleteBucketLifecycle API call fails to actually delete the lifecycle configuration from the bucket, causing S3 compatibility issues. This is a regression introduced specifically in Squid 19.2.3.

Recommendation

  • Upgrade to a fixed version when available
  • Avoid using the DeleteBucketLifecycle API in affected versions
  • Consider using alternative methods to manage lifecycle configurations
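To check whether a given RGW endpoint has this regression, here is a sketch of a probe in Python. It writes a throwaway lifecycle rule to a scratch bucket, calls DeleteBucketLifecycle, and then expects GetBucketLifecycleConfiguration to fail with NoSuchLifecycleConfiguration, as S3 does once the configuration is gone. It assumes a boto3-style client and a bucket you can modify freely. The rule ID, prefix and function names are made up for illustration.

Do not point it at a bucket whose lifecycle rules you need. On an affected version, the probe rule will remain on the scratch bucket afterwards.

```python
from typing import Any, Optional

# Hypothetical throwaway rule: expire objects under probe/ after one day.
PROBE_RULES = {
    "Rules": [
        {
            "ID": "delete-lifecycle-probe",
            "Filter": {"Prefix": "probe/"},
            "Status": "Enabled",
            "Expiration": {"Days": 1},
        }
    ]
}


def _error_code(exc: Exception) -> Optional[str]:
    """Extract the S3 error code from a botocore-style ClientError, if present."""
    response = getattr(exc, "response", None)
    if isinstance(response, dict):
        return response.get("Error", {}).get("Code")
    return None


def lifecycle_delete_works(s3: Any, scratch_bucket: str) -> bool:
    """True if DeleteBucketLifecycle really removes the config, False if it survives."""
    s3.put_bucket_lifecycle_configuration(
        Bucket=scratch_bucket, LifecycleConfiguration=PROBE_RULES
    )
    s3.delete_bucket_lifecycle(Bucket=scratch_bucket)
    try:
        s3.get_bucket_lifecycle_configuration(Bucket=scratch_bucket)
    except Exception as exc:
        if _error_code(exc) == "NoSuchLifecycleConfiguration":
            return True  # expected S3 behaviour after a successful delete
        raise
    return False  # configuration still readable: the regression is present
```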

Ceph Reef (v18) Known Issues

BlueStore Potential Corruption

Severity: Critical
Affected Versions: 18.2.5, 18.2.6
Bug Tracker: https://tracker.ceph.com/issues/69764

Description

Versions 18.2.5 and 18.2.6 of Ceph Reef were released with a bug that may cause OSDs to crash and corrupt on-disk data. This is a data integrity issue that poses serious risks to cluster stability and data safety.

Recommendation

Upgrade to version 18.2.7. Do not deploy or upgrade to versions 18.2.5 or 18.2.6.

Ceph Quincy (v17) Known Issues

BlueStore Potential Corruption

Severity: Critical
Affected Versions: 17.2.8
Bug Tracker: https://tracker.ceph.com/issues/69764

Description

Version 17.2.8 of Ceph Quincy was released with a bug that may cause OSDs to crash and corrupt on-disk data. This is a data integrity issue that poses serious risks to cluster stability and data safety.

Recommendation

Upgrade to version 17.2.9 or 18.2.7. Do not deploy or remain on version 17.2.8.
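The per-release entries above can be condensed into a quick lookup. Here is a sketch in Python that parses the string printed by ceph version (for example "ceph version 18.2.7 (...) reef (stable)") and returns the issues from this post that apply.

The table covers only the bugs listed here and reflects the state at the time of writing. For example, the --bypass-gc entry flags every 17.2, 18.2 and 19.2 release because backports were still pending. Treat an empty result as "none of these", not "bug-free". The function names are ours.

```python
import re
from typing import Callable, List, Tuple

Version = Tuple[int, int, int]

# (issue, tracker URL, affected-version predicate), transcribed from this post only.
KNOWN_BUGS: List[Tuple[str, str, Callable[[Version], bool]]] = [
    ("RadosGW --bypass-gc data loss", "https://tracker.ceph.com/issues/73348",
     lambda v: v[:2] in ((17, 2), (18, 2), (19, 2))),
    ("Squid-deployed OSDs crashing (new OSDs only)", "https://tracker.ceph.com/issues/70390",
     lambda v: v[:2] == (19, 2)),
    ("S3 DeleteBucketLifecycle does not delete config", "https://tracker.ceph.com/issues/71083",
     lambda v: v == (19, 2, 3)),
    ("BlueStore potential corruption", "https://tracker.ceph.com/issues/69764",
     lambda v: v in ((18, 2, 5), (18, 2, 6), (17, 2, 8))),
]


def parse_ceph_version(text: str) -> Version:
    """Extract (major, minor, patch) from output such as 'ceph version 18.2.7 (...) reef (stable)'."""
    match = re.search(r"ceph version (\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no Ceph version found in {text!r}")
    major, minor, patch = (int(g) for g in match.groups())
    return (major, minor, patch)


def known_bugs_for(text: str) -> List[Tuple[str, str]]:
    """List (issue, tracker URL) pairs from this post that apply to the given version."""
    version = parse_ceph_version(text)
    return [(title, url) for title, url, affected in KNOWN_BUGS if affected(version)]
```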

Additional Resources

Please file a ticket in our support portal if there are any other questions or concerns.

As of now, we recommend Reef v18.2.7 as a stable version for production use. This recommendation will change over time as we gain more confidence in later Ceph releases.

For more detailed information and updates on these issues, refer to our knowledge base articles:

Ceph Quincy release with bugfix for PGLog dups

· One min read
Joachim Kraftmayer
Managing Director at Clyso

Our bugfix from earlier this year was published in the Ceph Quincy release 17.2.4.

Trimming of PGLog dups is now controlled by size instead of by version. This fixes the PGLog inflation issue that occurred when online (in-OSD) trimming stalled after a PG split operation. A new offline mechanism has also been added: ceph-objectstore-tool now has a trim-pg-log-dups op for situations where an OSD is unable to boot because of those inflated dups. In that case, the “You can be hit by THE DUPS BUG” warning will be visible in the OSD logs. Relevant tracker: https://tracker.ceph.com/issues/53729
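To find OSDs that hit this condition, here is a sketch in Python that scans OSD log files for the warning quoted above. It assumes file-based logging under /var/log/ceph/ with the default ceph-osd.<id>.log naming. cephadm deployments log to journald by default; in that case, pass the journal output to count_dups_warnings instead. The function names are ours, and the exact log line may differ between releases.

An OSD that shows the warning and cannot boot is the case that the offline trim-pg-log-dups op in ceph-objectstore-tool is meant for.

```python
import glob
import os
from typing import Dict, Iterable

# Warning text as quoted in the post; the exact log line may vary between releases.
DUPS_WARNING = "You can be hit by THE DUPS BUG"


def count_dups_warnings(lines: Iterable[str]) -> int:
    """Count lines that contain the DUPS BUG warning."""
    return sum(1 for line in lines if DUPS_WARNING in line)


def scan_osd_logs(pattern: str = "/var/log/ceph/ceph-osd.*.log") -> Dict[str, int]:
    """Map each matching OSD log file name to its number of DUPS BUG warnings."""
    hits: Dict[str, int] = {}
    for path in sorted(glob.glob(pattern)):
        # errors="replace" keeps the scan going over any non-UTF-8 bytes in the log.
        with open(path, errors="replace") as fh:
            count = count_dups_warnings(fh)
        if count:
            hits[os.path.basename(path)] = count
    return hits
```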

OSDs with unlimited RAM growth

How to identify OSDs affected by the PG dup bug

Sources

https://docs.ceph.com/en/latest/releases/quincy/#v17-2-4-quincy