Ceph Quincy (v17)

This document lists known critical bugs affecting Ceph Quincy (v17) releases.

PG Splitting/Merging Causes OSD Out-Of-Memory

Severity: High
Affected Versions: 17.2.0, 17.2.1, 17.2.2, 17.2.3
Bug Report: https://tracker.ceph.com/issues/53729

Description

A bug in the PG splitting and merging code can cause OSDs to run out of memory, and the condition persists across OSD restarts. Fixed releases include offline tools to work around the issue.

Recommendation

Do not change pg_num for any pool until after upgrading to a fixed release
Disable the PG autoscaler, which would otherwise change pg_num automatically
Fixed in v17.2.4
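The autoscaler step can be scripted. The following is a minimal sketch, assuming the `ceph` CLI is on PATH with admin credentials. It turns the autoscaler off on every existing pool with `ceph osd pool set <pool> pg_autoscale_mode off` and sets `osd_pool_default_pg_autoscale_mode` so that pools created later also start with it off. The helper names are hypothetical. With `dry_run=True` the script only returns the commands it would run.

```python
import json
import subprocess
from typing import List


def list_pools() -> List[str]:
    """Return pool names reported by `ceph osd pool ls` (JSON array of names)."""
    out = subprocess.run(
        ["ceph", "osd", "pool", "ls", "--format", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def autoscaler_off_commands(pools: List[str]) -> List[List[str]]:
    """Build commands that disable the PG autoscaler (hypothetical helper)."""
    cmds = [
        ["ceph", "osd", "pool", "set", pool, "pg_autoscale_mode", "off"]
        for pool in pools
    ]
    # Pools created after this point should also default to autoscaling off.
    cmds.append(
        ["ceph", "config", "set", "global", "osd_pool_default_pg_autoscale_mode", "off"]
    )
    return cmds


def disable_autoscaler(dry_run: bool = True) -> List[List[str]]:
    """Run (or, with dry_run=True, only return) the commands for the current pools."""
    cmds = autoscaler_off_commands(list_pools())
    if not dry_run:
        for cmd in cmds:
            subprocess.run(cmd, check=True)
    return cmds
```

Run `disable_autoscaler()` first to review the commands, then `disable_autoscaler(dry_run=False)` to apply them. Re-enable the autoscaler only after the whole cluster runs 17.2.4 or later.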

BlueStore Potential Corruption

Severity: Critical
Affected Versions: 17.2.8
Bug Report: https://tracker.ceph.com/issues/69764

Description

Some Ceph releases, including 17.2.8, contain a bug that may cause OSDs to crash and corrupt their on-disk data.

Recommendation

Upgrade to a fixed version (17.2.9 or 18.2.7) as soon as possible.
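To check whether any daemon still runs a release listed as affected on this page, you can parse the output of `ceph versions`. The following is a minimal sketch. It assumes the usual JSON layout, in which each section (including `overall`) maps a version banner such as `ceph version 17.2.8 (...) quincy (stable)` to a daemon count. The `AFFECTED` table is a hypothetical copy of the "Affected Versions" lines above. The RadosGW bug affects every 17.2.x release, so a version check cannot rule it out, and it is left out of the table.

```python
import json
import re
import subprocess
from typing import Dict, List, Tuple

# Hypothetical table copied from the "Affected Versions" lines on this page.
AFFECTED: Dict[str, set] = {
    "PG splitting/merging OOM": {"17.2.0", "17.2.1", "17.2.2", "17.2.3"},
    "BlueStore potential corruption": {"17.2.8"},
}

VERSION_RE = re.compile(r"ceph version (\d+\.\d+\.\d+)")


def running_versions(versions_json: Dict) -> Dict[str, int]:
    """Map release number -> daemon count from the 'overall' section."""
    counts: Dict[str, int] = {}
    for banner, n in versions_json.get("overall", {}).items():
        m = VERSION_RE.search(banner)
        if m:
            counts[m.group(1)] = counts.get(m.group(1), 0) + n
    return counts


def affected_by(versions_json: Dict) -> List[Tuple[str, str, int]]:
    """Return (bug, release, daemon_count) for each running release listed as affected."""
    hits = []
    for release, n in sorted(running_versions(versions_json).items()):
        for bug, releases in AFFECTED.items():
            if release in releases:
                hits.append((bug, release, n))
    return hits


def check_cluster() -> List[Tuple[str, str, int]]:
    """Query the live cluster (requires the ceph CLI and admin credentials)."""
    out = subprocess.run(
        ["ceph", "versions", "--format", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return affected_by(json.loads(out))
```

An empty list from `check_cluster()` means no daemon reports an affected release. Daemons that have not been restarted since the upgrade still report their old version.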

RadosGW --bypass-gc Data Loss Bug

Severity: Critical
Affected Versions: 17.2.x
Bug Report: https://tracker.ceph.com/issues/73348

Description

A long-standing bug in the --bypass-gc option of radosgw-admin bucket rm can delete data that still backs other objects. If any object in the bucket being removed had been copied to or from another bucket, --bypass-gc also deletes the data of those copies. The copies remain visible to ListObjects requests, but GetObject requests for them fail with NoSuchKey.

Note: This bug also affects Ceph Reef (v18) and Squid (v19). See the Reef known bugs and Squid known bugs pages for details.

Recommendation

Use radosgw-admin bucket rm without --bypass-gc, which handles copied objects correctly
Follow the bug tracker for fix and backport updates
If you've used --bypass-gc in the past, audit your buckets for objects with missing data

After a host reboot, cephadm refuses to exit maintenance mode

Severity: Minor
Affected Versions: 17.2.7
Bug Report: https://tracker.ceph.com/issues/67905

Description

Under certain circumstances, for example after a host reboot, the SSH connection that cephadm uses to reach the host fails, and the command returns an error to the user.

For example, when attempting to use cephadm to exit maintenance mode, you might see this:

~~~
#Exiting from maintenance mode - Failed
root@mcw-ceph017-04:/# ceph orch host maintenance exit mcw-ceph018-32
Error EINVAL: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/mgr_module.py", line 1757, in _handle_command
    return self.handle_command(inbuf, cmd)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 171, in handle_command
    return dispatch[cmd['prefix']].call(self, cmd, inbuf)
  File "/usr/share/ceph/mgr/mgr_module.py", line 462, in call
    return self.func(mgr, **kwargs)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 107, in <lambda>
    wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args, **l_kwargs)  # noqa: E731
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 96, in wrapper
    return func(*args, **kwargs)
  File "/usr/share/ceph/mgr/orchestrator/module.py", line 575, in _host_maintenance_exit
    raise_if_exception(completion)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 225, in raise_if_exception
    e = pickle.loads(c.serialized_exception)
TypeError: __init__() missing 2 required positional arguments: 'hostname' and 'addr'
~~~
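The TypeError on the last line is raised inside `pickle.loads(c.serialized_exception)`, while the orchestrator rebuilds the exception that was raised for the host, so it masks the original error. The following minimal sketch reproduces that failure mode with a hypothetical class, `HostErrorLike`. Its `__init__` requires `hostname` and `addr` but passes only the message to `Exception.__init__`. Pickle recreates exceptions by calling the class with `self.args`, so unpickling calls the constructor with a single argument. The `__reduce__` override in the hypothetical `HostErrorFixed` is one generic Python remedy and is not necessarily how the upstream fix works.

```python
import pickle


class HostErrorLike(Exception):
    """Hypothetical stand-in for an orchestrator error that carries host details."""

    def __init__(self, message: str, hostname: str, addr: str) -> None:
        super().__init__(message)  # self.args == (message,) only
        self.hostname = hostname
        self.addr = addr


class HostErrorFixed(HostErrorLike):
    """Same error, but tells pickle to call __init__ with all three arguments."""

    def __reduce__(self):
        return (type(self), (self.args[0], self.hostname, self.addr))


def roundtrip(exc: Exception) -> Exception:
    """Serialize and deserialize an exception, as raise_if_exception does."""
    return pickle.loads(pickle.dumps(exc))


try:
    roundtrip(HostErrorLike("SSH connection failed", "host1", "10.0.0.1"))
except TypeError as err:
    # Message ends with: missing 2 required positional arguments: 'hostname' and 'addr'
    print(err)

restored = roundtrip(HostErrorFixed("SSH connection failed", "host1", "10.0.0.1"))
print(restored.hostname, restored.addr)  # host1 10.0.0.1
```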

Recommendation
