Ceph Quincy (v17)
This document lists known critical bugs affecting Ceph Quincy (v17) releases.
PG Splitting/Merging Causes OSD Out-Of-Memory
Severity: High
Affected Versions: 17.2.0, 17.2.1, 17.2.2, 17.2.3
Bug Report: https://tracker.ceph.com/issues/53729
Description
A bug in the PG splitting and merging code can cause OSDs to run out of memory, a condition that persists even after a restart. Fixed releases include offline tools to work around the issue.
Recommendation
- Do not change pg_num for any pool until you have upgraded to a fixed release
- Disable the PG autoscaler
- Upgrade to v17.2.4, where the bug is fixed
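As a rough illustration of the autoscaler recommendation, the sketch below builds one "ceph osd pool set <pool> pg_autoscale_mode off" command per pool and prints it for review. The helper name autoscale_off_commands and the pool names are hypothetical, and nothing is executed against a cluster.

```python
import shlex


def autoscale_off_commands(pools):
    """Return one argv list per pool that turns the PG autoscaler off."""
    return [
        ["ceph", "osd", "pool", "set", pool, "pg_autoscale_mode", "off"]
        for pool in pools
    ]


# Hypothetical pool names; on a real cluster, list them with `ceph osd pool ls`.
example_pools = ["rbd", "cephfs_data"]

for argv in autoscale_off_commands(example_pools):
    # Print each command for review instead of running it.
    print(shlex.join(argv))
```

Printing the commands first makes it easy to check the pool list before anything is changed on the cluster.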
BlueStore Potential Corruption
Severity: Critical
Affected Versions: 17.2.8
Bug Report: https://tracker.ceph.com/issues/69764
Description
Ceph 17.2.8 was released with a bug that may cause OSDs to crash and corrupt on-disk data.
Recommendation
Upgrade to a fixed version (17.2.9 or 18.2.7) as soon as possible.
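To see whether any OSDs still run the affected release, one option is to inspect the JSON printed by "ceph versions". The sketch below is a minimal example, assuming the output has an "osd" map keyed by version banners of the form "ceph version X.Y.Z (...)"; the sample data and the function name affected_osd_versions are made up for illustration.

```python
import json
import re

# Taken from the "Affected Versions" line for this bug.
AFFECTED = {"17.2.8"}


def affected_osd_versions(ceph_versions_json):
    """Return {version: daemon_count} for OSDs running an affected release."""
    report = json.loads(ceph_versions_json)
    hits = {}
    for banner, count in report.get("osd", {}).items():
        match = re.search(r"ceph version (\d+\.\d+\.\d+)", banner)
        if match and match.group(1) in AFFECTED:
            version = match.group(1)
            hits[version] = hits.get(version, 0) + count
    return hits


# Made-up sample in the shape `ceph versions` is assumed to print.
sample = json.dumps({
    "osd": {
        "ceph version 17.2.8 (0000000) quincy (stable)": 12,
        "ceph version 17.2.9 (0000000) quincy (stable)": 4,
    }
})
print(affected_osd_versions(sample))
```

An empty result means no OSD in the report matches an affected version.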
RadosGW --bypass-gc Data Loss Bug
Severity: Critical
Affected Versions: 17.2.x
Bug Report: https://tracker.ceph.com/issues/73348
Description
A long-standing data loss bug with --bypass-gc causes deletion of copied object data. If any of the deleted objects had been copied to/from other buckets, --bypass-gc deletes the data of those copies too. As a result, the copies are still visible to ListObjects requests but GetObject requests fail with NoSuchKey.
Note: This bug also affects Ceph Reef (v18) and Squid (v19). See the Reef known bugs and Squid known bugs pages for details.
Recommendation
- Use radosgw-admin bucket rm without --bypass-gc, which correctly handles copied objects
- Follow the bug tracker for fix and backport updates
- If you've used --bypass-gc in the past, audit your buckets for objects with missing data
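One possible way to audit a bucket follows from the symptom described above: list every object, then try to read it and record the keys that fail with NoSuchKey. The sketch below assumes a boto3-style S3 client (get_paginator, get_object, and errors that carry a response["Error"]["Code"] field); the function names and the endpoint in the usage comment are hypothetical. It reads each object in full, so it can be slow and expensive on large buckets.

```python
def is_no_such_key(exc):
    """True if the exception looks like an S3 NoSuchKey error."""
    response = getattr(exc, "response", None) or {}
    return response.get("Error", {}).get("Code") == "NoSuchKey"


def find_unreadable_objects(s3, bucket):
    """Return keys that are listed in the bucket but fail to read with NoSuchKey."""
    missing = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        for entry in page.get("Contents", []):
            key = entry["Key"]
            try:
                body = s3.get_object(Bucket=bucket, Key=key)["Body"]
                # Drain the body in 1 MiB chunks so all of the data is fetched.
                while body.read(1024 * 1024):
                    pass
            except Exception as exc:
                if is_no_such_key(exc):
                    missing.append(key)
                else:
                    raise  # unrelated failures should not be hidden
    return missing


# Usage sketch (assumes boto3 is installed; the endpoint is a placeholder):
#   import boto3
#   s3 = boto3.client("s3", endpoint_url="https://rgw.example.com")
#   print(find_unreadable_objects(s3, "my-bucket"))
```

Passing the client in as an argument keeps the function independent of how credentials and the RGW endpoint are configured.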
After a host reboot, cephadm refuses to exit maintenance mode
Severity: Minor
Affected Versions: 17.2.7
Bug Report: https://tracker.ceph.com/issues/67905
Description
Under certain circumstances, SSH throws an error to the user and drops its connection.
For example, when attempting to use cephadm to exit maintenance mode, you
might see this:
~~~
#Exiting from maintenance mode - Failed
root@mcw-ceph017-04:/# ceph orch host maintenance exit mcw-ceph018-32
Error EINVAL: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/mgr_module.py", line 1757, in _handle_command
    return self.handle_command(inbuf, cmd)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 171, in handle_command
    return dispatch[cmd['prefix']].call(self, cmd, inbuf)
  File "/usr/share/ceph/mgr/mgr_module.py", line 462, in call
    return self.func(mgr, **kwargs)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 107, in <lambda>
    wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args, **l_kwargs)  # noqa: E731
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 96, in wrapper
    return func(*args, **kwargs)
  File "/usr/share/ceph/mgr/orchestrator/module.py", line 575, in _host_maintenance_exit
    raise_if_exception(completion)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 225, in raise_if_exception
    e = pickle.loads(c.serialized_exception)
TypeError: __init__() missing 2 required positional arguments: 'hostname' and 'addr'
~~~
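The final TypeError is raised inside pickle.loads, which matches a general Python behavior: an exception whose __init__ requires extra arguments, but which passes only the message to Exception.__init__, cannot be rebuilt by pickle. The sketch below reproduces that behavior with a hypothetical HostError class. It is not taken from the cephadm source, and the __reduce__ variant only shows one common way to make such an exception picklable, not necessarily the upstream fix.

```python
import pickle


class HostError(Exception):
    """Hypothetical exception whose __init__ needs more than the message."""

    def __init__(self, message, hostname, addr):
        super().__init__(message)  # only `message` ends up in self.args
        self.hostname = hostname
        self.addr = addr


class PicklableHostError(HostError):
    """Same exception, but it tells pickle which arguments to rebuild it with."""

    def __reduce__(self):
        return (type(self), (self.args[0], self.hostname, self.addr))


def round_trip(exc):
    """Serialize and deserialize an exception, as a caller across processes would."""
    return pickle.loads(pickle.dumps(exc))


try:
    round_trip(HostError("ssh failed", "host1", "192.0.2.10"))
except TypeError as err:
    # pickle calls HostError("ssh failed"), so hostname and addr are missing.
    print("unpickle failed:", err)

restored = round_trip(PicklableHostError("ssh failed", "host1", "192.0.2.10"))
print(restored.hostname, restored.addr)
```

Under this reading, the traceback shows a failure to deserialize the underlying error rather than the error itself, so the original SSH problem is hidden behind the TypeError.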
Recommendation