Laggy Placement Groups When Stopping OSDs

Problem

A customer running a v16.2.7 cluster reports that whenever they stop an OSD, PGs go into the laggy state and cluster IO pauses for several seconds.
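You can confirm the diagnosis while the OSD is being stopped, since laggy is a queryable PG state. A minimal check (output depends on the cluster and on timing):

# ceph pg ls laggy
# ceph health detail

During the IO pause, ceph pg ls should list the affected PGs with laggy among their states, and the health detail output may additionally show slow ops.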

Solution

This is a known issue with the implementation of the osd_fast_shutdown feature in early pacific (v16) releases. As a workaround, enable the notify option:

# ceph config set osd osd_fast_shutdown_notify_mon true

It is then recommended to upgrade to the latest v16 release (v16.2.13 at the time of writing). Note that osd_fast_shutdown_notify_mon = true has been the default in Ceph releases since summer 2023.
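To verify that the workaround is in effect, the option can be read back from the cluster configuration database and from a running daemon (osd.0 is a placeholder id; the output shown is illustrative):

# ceph config get osd osd_fast_shutdown_notify_mon
true
# ceph config show osd.0 osd_fast_shutdown_notify_mon
true

ceph config show reports the value the daemon is actually running with, which helps catch cases where a local ceph.conf override is masking the setting.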

Discussion

The osd_fast_shutdown feature was added in pacific as a quicker way to shut down an OSD. Previously, a stopping OSD would run the destructors for all of its classes and cleanly close everything it had open, including the RocksDB database and object files. With osd_fast_shutdown, the OSD simply aborts its own process. The reasoning is that an OSD must already be able to recover cleanly from a power loss, so this kind of abrupt stop is safe and much faster. The problem is that the mon can take a long time to notice that an OSD has gone away like this. The osd_fast_shutdown_notify_mon option was therefore added so that the OSD sends the mon a message announcing that it is stopping, which lets the affected PGs re-peer quickly and avoids a long IO pause.
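One way to see the notify message at work is to watch the cluster log while stopping an OSD. A sketch, assuming a systemd-managed osd.0 (unit names differ under cephadm, where ceph orch daemon stop osd.0 is the equivalent; the log line is illustrative):

# ceph -W cluster &
# systemctl stop ceph-osd@0
... cluster [INF] osd.0 marked itself down ...

With the option enabled, the marked-down event should appear almost immediately and the PGs re-peer right away; without it, the mon only marks the OSD down once its heartbeat failure reports accumulate, which is the multi-second IO pause described above.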