OSD - PGs Stuck Activating
Problem
A user reports that whenever they change the weight of an OSD (e.g., by marking an OSD in), several PGs get stuck in the activating state, and ceph status reports many slow ops warnings.
Solution
Increase the hard limit on the number of PGs per OSD using:
ceph config set osd osd_max_pg_per_osd_hard_ratio 10
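To confirm that the setting took effect and that no PGs remain stuck, a check along these lines may help. This is a minimal sketch: it assumes the ceph CLI is on the PATH with an admin keyring, and show_pg_limits is a hypothetical helper name, not part of Ceph.

```shell
# show_pg_limits is a hypothetical helper name, not a Ceph command.
show_pg_limits() {
    # Values of the two options as configured for the OSDs.
    ceph config get osd mon_max_pg_per_osd
    ceph config get osd osd_max_pg_per_osd_hard_ratio
    # PGs currently in the activating state, if any.
    ceph pg ls activating
    # Per-OSD usage; the PGS column shows how many PGs each OSD holds.
    ceph osd df
}
```

Running show_pg_limits before and after a weight change gives a rough picture of whether any OSD's PG count is approaching the limit.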
Discussion
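Changes to the CRUSH map or to OSD weights can temporarily push the number of PGs mapped to an OSD above its hard limit, which is mon_max_pg_per_osd multiplied by osd_max_pg_per_osd_hard_ratio. While an OSD is over that limit, it refuses to instantiate new PGs, so they remain in the activating state. Setting osd_max_pg_per_osd_hard_ratio to a large value raises the limit far enough that the OSD does not block PGs from activating in this transient case.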
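As a rough illustration of how the ratio scales the per-OSD limit, the sketch below multiplies the two settings. The helper name pg_hard_limit and the input values 250 and 3 are assumptions chosen for the example; read the real values from your own cluster with ceph config get.

```shell
# pg_hard_limit is a hypothetical helper: it prints
# mon_max_pg_per_osd * osd_max_pg_per_osd_hard_ratio as an integer.
# awk is used because the ratio may be a non-integer value.
pg_hard_limit() {
    awk -v m="$1" -v r="$2" 'BEGIN { printf "%d\n", m * r }'
}

# Example values only (assumed, not read from a cluster).
pg_hard_limit 250 3     # limit with a ratio of 3
pg_hard_limit 250 10    # limit after raising the ratio to 10
```

With these example inputs, the limit rises from 750 to 2500 PGs per OSD, which leaves far more headroom for the temporary overlap during rebalancing.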