Ceph Quincy (v17)
This document lists known critical bugs affecting Ceph Quincy (v17) releases.
PG Splitting/Merging Causes OSD Out-Of-Memory
Severity: High
Affected Versions: 17.2.0, 17.2.1, 17.2.2, 17.2.3
Bug Report: https://tracker.ceph.com/issues/53729
Description
A bug in the PG splitting and merging code can cause an OSD to run out of memory, and the condition persists even after the OSD is restarted. Fixed releases include offline tools to work around the issue.
Recommendation
- Do not change pg_num for any pool until after upgrade to a fixed release
- Disable the pg autoscaler
- Upgrade to v17.2.4 or later, which contains the fix
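The autoscaler recommendation above can be scripted. The following is a minimal sketch, not an official procedure. It assumes the ceph CLI is on the PATH with admin credentials, that `ceph osd pool ls --format json` prints a JSON array of pool names, and that the autoscaler is switched off per pool through the pg_autoscale_mode pool setting. The helper names are hypothetical.

```python
import json
import subprocess


def autoscaler_off_commands(pools):
    """Build one 'ceph osd pool set <pool> pg_autoscale_mode off' argv per pool."""
    return [["ceph", "osd", "pool", "set", pool, "pg_autoscale_mode", "off"]
            for pool in pools]


def list_pools():
    """Return pool names; assumes the JSON output is a plain list of strings."""
    result = subprocess.run(["ceph", "osd", "pool", "ls", "--format", "json"],
                            check=True, capture_output=True, text=True)
    return json.loads(result.stdout)


def disable_autoscaler_everywhere():
    """Run the generated commands one by one, stopping on the first failure."""
    for argv in autoscaler_off_commands(list_pools()):
        print("running:", " ".join(argv))
        subprocess.run(argv, check=True)
```

Calling disable_autoscaler_everywhere() only changes existing pools. Pools created afterwards take their mode from the osd_pool_default_pg_autoscale_mode option, so that default may need to be set to off as well.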
BlueStore Potential Corruption
Severity: Critical
Affected Versions: 17.2.8
Bug Report: https://tracker.ceph.com/issues/69764
Description
Some versions of Ceph, including 17.2.8, were released with a bug that may cause OSDs to crash and corrupt their on-disk data.
Recommendation
Upgrade to a fixed version (17.2.9 or 18.2.7) as soon as possible.
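To see whether any OSD in a cluster still runs the affected release, the output of `ceph versions` can be inspected. The sketch below assumes that output is a JSON object with an "osd" section mapping version banners such as "ceph version 17.2.8 (...) quincy (stable)" to daemon counts. The function name and the AFFECTED set are illustrative and cover only the Quincy version listed above.

```python
import json
import re

# Taken from the "Affected Versions" field of this entry (Quincy only).
AFFECTED = {"17.2.8"}


def affected_osd_counts(versions_json):
    """Map each affected version to the number of OSDs reporting it."""
    osd_section = json.loads(versions_json).get("osd", {})
    counts = {}
    for banner, count in osd_section.items():
        # Banner is assumed to start with "ceph version X.Y.Z".
        match = re.match(r"ceph version (\d+\.\d+\.\d+)", banner)
        if match and match.group(1) in AFFECTED:
            version = match.group(1)
            counts[version] = counts.get(version, 0) + count
    return counts
```

An empty result means no OSD reported an affected version at the time the command ran. It says nothing about data already written while an affected version was in use.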
RadosGW --bypass-gc Data Loss Bug
Severity: Critical
Affected Versions: 17.2.x
Bug Report: https://tracker.ceph.com/issues/73348
Description
A long-standing data loss bug in --bypass-gc deletes the data of copied objects. If any of the deleted objects had been copied to or from other buckets, --bypass-gc deletes the data of those copies too. As a result, the copies remain visible to ListObjects requests, but GetObject requests for them fail with NoSuchKey.
Note: This bug also affects Ceph Reef (v18) and Squid (v19). See the Reef known bugs and Squid known bugs pages for details.
Recommendation
- Use radosgw-admin bucket rm without --bypass-gc, which correctly handles copied objects
- Follow the bug tracker for fix and backport updates
- If you've used --bypass-gc in the past, audit your buckets for objects with missing data
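One way to audit a bucket is to list every object and then try to read it, since affected copies still appear in listings but fail on GetObject. The following is a rough sketch under stated assumptions, not a tool shipped with Ceph. It expects an S3 client object with the boto3 interface (get_paginator and get_object), reads each object in full because missing data may only surface partway through a read, and treats a short read the same as NoSuchKey. The function name is hypothetical, and only current object versions returned by list_objects_v2 are checked.

```python
def find_unreadable_objects(client, bucket, chunk_size=1 << 20):
    """Yield (key, reason) for objects that are listed but cannot be fully read."""
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        for entry in page.get("Contents", []):
            key, size = entry["Key"], entry["Size"]
            try:
                body = client.get_object(Bucket=bucket, Key=key)["Body"]
            except Exception as exc:
                # botocore's ClientError keeps the S3 error code in .response;
                # anything that is not NoSuchKey is re-raised unchanged.
                response = getattr(exc, "response", None)
                code = (response.get("Error", {}).get("Code")
                        if isinstance(response, dict) else None)
                if code != "NoSuchKey":
                    raise
                yield key, "NoSuchKey"
                continue
            read = 0
            while True:
                chunk = body.read(chunk_size)
                if not chunk:
                    break
                read += len(chunk)
            if read != size:
                yield key, f"short read: {read} of {size} bytes"


# Example wiring (assumes boto3 is installed; the endpoint URL and bucket
# name are placeholders):
#   import boto3
#   s3 = boto3.client("s3", endpoint_url="http://rgw.example.com:8080")
#   for key, reason in find_unreadable_objects(s3, "my-bucket"):
#       print(key, reason)
```

Reading every object in full is slow and generates significant traffic on large buckets, so a scan like this is best limited to buckets known to have been copy sources or targets.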