Ceph Quincy (v17)

This document lists known critical bugs affecting Ceph Quincy (v17) releases.

PG Splitting/Merging Causes OSD Out-Of-Memory

Severity: High
Affected Versions: 17.2.0, 17.2.1, 17.2.2, 17.2.3
Bug Report: https://tracker.ceph.com/issues/53729

Description

A bug in the PG splitting and merging code can cause OSDs to run out of memory, a condition that persists even after a restart. Fixed releases include offline tools to work around the issue.

Recommendation

  • Do not change pg_num for any pool until you have upgraded to a fixed release
  • Disable the pg autoscaler
  • Upgrade to v17.2.4 or later, which contains the fix
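
As a rough aid for the second recommendation, the sketch below finds pools whose autoscaler is not off and builds the matching `ceph osd pool set <pool> pg_autoscale_mode off` commands. It assumes the input is the JSON printed by `ceph osd pool ls detail --format json` and that each entry carries `pool_name` and `pg_autoscale_mode` fields; the function names are hypothetical, and the commands are only built, not executed.

```python
import json


def pools_with_autoscaler_enabled(pool_detail_json):
    """Return names of pools whose pg_autoscale_mode is anything but 'off'.

    Assumes the JSON layout of `ceph osd pool ls detail --format json`
    (a list of objects with 'pool_name' and 'pg_autoscale_mode').
    """
    pools = json.loads(pool_detail_json)
    # A missing field is treated as "needs attention" rather than as "off".
    return [p["pool_name"] for p in pools
            if p.get("pg_autoscale_mode", "unknown") != "off"]


def disable_commands(pool_names):
    """Build one argv list per pool; run them yourself after reviewing."""
    return [["ceph", "osd", "pool", "set", name, "pg_autoscale_mode", "off"]
            for name in pool_names]
```

Review the generated commands before running them, for example by passing each list to `subprocess.run(cmd, check=True)` on a node with admin credentials.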

BlueStore Potential Corruption

Severity: Critical
Affected Versions: 17.2.8
Bug Report: https://tracker.ceph.com/issues/69764

Description

In the Quincy series, 17.2.8 was released with a BlueStore bug that may cause OSDs to crash and corrupt on-disk data.

Recommendation

Upgrade to a fixed version (17.2.9 or 18.2.7) as soon as possible.
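
To see which daemons still run an affected release, one option is to inspect the output of `ceph versions`. The sketch below assumes that output is JSON keyed by daemon type, with version strings of the form `ceph version 17.2.8 (...) quincy (stable)` mapped to daemon counts; the function name and the `AFFECTED` set are illustrative.

```python
import json
import re

# Affected release for this bug, taken from the entry above.
AFFECTED = {"17.2.8"}


def daemons_on_affected_release(versions_json, affected=AFFECTED):
    """Return {daemon_type: count} for daemons running an affected release.

    Assumes the JSON layout printed by `ceph versions`.
    """
    report = json.loads(versions_json)
    hits = {}
    for daemon_type, by_version in report.items():
        if daemon_type == "overall":
            continue  # summary section would double-count every daemon
        for version_string, count in by_version.items():
            match = re.search(r"ceph version (\d+\.\d+\.\d+)", version_string)
            if match and match.group(1) in affected:
                hits[daemon_type] = hits.get(daemon_type, 0) + count
    return hits
```

An empty result suggests no daemon reports an affected version. Because this is a BlueStore bug, any `osd` entry in the result deserves the most urgent attention.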

RadosGW --bypass-gc Data Loss Bug

Severity: Critical
Affected Versions: 17.2.x
Bug Report: https://tracker.ceph.com/issues/73348

Description

A long-standing data loss bug in radosgw-admin bucket rm --bypass-gc deletes the data of copied objects. If any of the deleted objects had been copied to or from other buckets, --bypass-gc deletes the data of those copies too. As a result, the copies remain visible to ListObjects requests, but GetObject requests for them fail with NoSuchKey.

Note: This bug also affects Ceph Reef (v18) and Squid (v19). See the Reef known bugs and Squid known bugs pages for details.

Recommendation

  • Use radosgw-admin bucket rm without --bypass-gc, which correctly handles copied objects
  • Follow the bug tracker for fix and backport updates
  • If you've used --bypass-gc in the past, audit your buckets for objects with missing data
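
One way to approach the audit in the last bullet is to list every key in a bucket and try to read it, collecting the keys that are listed but fail with NoSuchKey. The sketch below assumes an S3 client with the boto3 interface (`get_paginator("list_objects_v2")` and `get_object`); the function name is hypothetical, and reading every object in full can be slow and costly on large buckets.

```python
def find_unreadable_keys(s3_client, bucket):
    """Return keys that ListObjects reports but GetObject cannot serve.

    `s3_client` is assumed to behave like a boto3 S3 client pointed at
    the RadosGW endpoint.
    """
    missing = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        for entry in page.get("Contents", []):
            key = entry["Key"]
            try:
                body = s3_client.get_object(Bucket=bucket, Key=key)["Body"]
                # Drain the body in 1 MiB chunks so the data is actually read.
                while body.read(1 << 20):
                    pass
            except Exception as exc:
                # botocore's ClientError carries the S3 error code in .response
                code = getattr(exc, "response", {}).get("Error", {}).get("Code")
                if code != "NoSuchKey":
                    raise  # unrelated failure: do not hide it
                missing.append(key)
    return missing
```

With boto3 installed, the client would typically come from `boto3.client("s3", endpoint_url=...)`, where the endpoint and credentials are specific to your deployment. Run the function once per bucket that may hold copies of objects from a bucket removed with --bypass-gc.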

After a host reboot, cephadm refuses to exit maintenance mode

Severity: Minor
Affected Versions: 17.2.7
Bug Report: https://tracker.ceph.com/issues/67905

Description

Under certain circumstances, SSH throws an error to the user and drops its connection.

For example, when attempting to use cephadm to exit maintenance mode, you might see this:

~~~
# Exiting from maintenance mode - Failed
root@mcw-ceph017-04:/# ceph orch host maintenance exit mcw-ceph018-32
Error EINVAL: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/mgr_module.py", line 1757, in _handle_command
    return self.handle_command(inbuf, cmd)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 171, in handle_command
    return dispatch[cmd['prefix']].call(self, cmd, inbuf)
  File "/usr/share/ceph/mgr/mgr_module.py", line 462, in call
    return self.func(mgr, **kwargs)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 107, in <lambda>
    wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args, **l_kwargs) # noqa: E731
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 96, in wrapper
    return func(*args, **kwargs)
  File "/usr/share/ceph/mgr/orchestrator/module.py", line 575, in _host_maintenance_exit
    raise_if_exception(completion)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 225, in raise_if_exception
    e = pickle.loads(c.serialized_exception)
TypeError: __init__() missing 2 required positional arguments: 'hostname' and 'addr'
~~~
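
The final TypeError is raised by pickle.loads, not by SSH itself. A plausible reading, offered here as an assumption rather than a confirmed diagnosis, is that the pickled exception class requires extra constructor arguments that pickle does not record. The sketch below reproduces that general Python behaviour with a hypothetical class, `ExampleHostError`, which is not the actual cephadm code.

```python
import pickle


class ExampleHostError(Exception):
    """Hypothetical exception class, used only to illustrate the failure."""

    def __init__(self, message, hostname, addr):
        # Only `message` reaches Exception.args, so that is the only
        # constructor argument pickle records for this instance.
        super().__init__(message)
        self.hostname = hostname
        self.addr = addr


def unpickle_error_message():
    """Pickle and unpickle the exception; return the TypeError text, if any."""
    payload = pickle.dumps(ExampleHostError("ssh failed", "host-a", "192.0.2.10"))
    try:
        # pickle rebuilds the exception as ExampleHostError("ssh failed"),
        # which lacks the two other required positional arguments.
        pickle.loads(payload)
    except TypeError as exc:
        return str(exc)
    return None
```

Calling `unpickle_error_message()` returns a message that names `hostname` and `addr` as missing, which matches the shape of the error in the traceback above. If this is the mechanism, the original SSH error is masked by the failure to deserialize it.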

### Recommendation