RADOS
CES and Ceph KB articles related to RADOS, covering the MON, MGR, and OSDs.
Cluster - Pause Whole Cluster For Maintenance
Problem
The admin wants to shut down the whole cluster (all nodes) for maintenance.
Solution
ceph osd set noout
ceph osd set norecover
ceph osd set norebalance
ceph osd set nobackfill
ceph osd set nodown
ceph osd set noscrub
ceph osd set nodeep-scrub
ceph osd set pause
Discussion
To restart the cluster, unset the flags in reverse order.
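A minimal sketch of the corresponding unset sequence (adjust it to the flags that were actually set on your cluster):
ceph osd unset pause
ceph osd unset nodeep-scrub
ceph osd unset noscrub
ceph osd unset nodown
ceph osd unset nobackfill
ceph osd unset norebalance
ceph osd unset norecover
ceph osd unset noout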
OSD - PGs Stuck Activating
Problem
A user reports that whenever they change the weight of an OSD (e.g. marking an OSD in), several PGs get stuck in the activating state, and ceph status reports many slow ops warnings.
Solution
Increase the hard limit on the number of PGs per OSD using:
# ceph config set osd osd_max_pg_per_osd_hard_ratio 10
Discussion
Changes to the CRUSH map or OSD weights may temporarily cause the number of PGs mapped to an OSD to exceed the mon_max_pg_per_osd limit. By using a large value for osd_max_pg_per_osd_hard_ratio, we can configure the OSD to not block PGs from activating in this transient case.
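To gauge whether a cluster is close to this limit, one rough check (a sketch; output columns vary by release) is to compare the per-OSD PG counts against the configured values, keeping in mind that the effective hard limit is mon_max_pg_per_osd multiplied by osd_max_pg_per_osd_hard_ratio:
# ceph osd df tree
# ceph config get mon mon_max_pg_per_osd
# ceph config get osd osd_max_pg_per_osd_hard_ratio
The PGS column of ceph osd df tree shows how many PGs are currently mapped to each OSD.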
OSD - Laggy Placement Groups
Problem
A customer running a v16.2.7 cluster reports that whenever they stop an OSD, PGs go into the laggy state and cluster IO stalls for several seconds.
Solution
This is a known issue related to the implementation of the osd_fast_shutdown feature in early Pacific (v16) releases.
As a workaround, use:
# ceph config set osd osd_fast_shutdown_notify_mon true
It is then recommended to upgrade to the latest v16 release (v16.2.13 at the time of writing). Note that osd_fast_shutdown_notify_mon = true is now the default in current Ceph releases as of summer 2023.
Discussion
The osd_fast_shutdown feature was added in Pacific as a quicker way to shut down the OSD. In the previous approach, the OSD would call the destructors for all OSD classes and cleanly close everything it had open, such as RocksDB and the object files. With osd_fast_shutdown, the OSD simply aborts its process. The thinking is that the OSD can already cleanly recover from a power loss, so this type of abrupt stop is preferable. The problem is that the mon takes a long time to notice that an OSD has shut down like this, so the osd_fast_shutdown_notify_mon option was added to send a message to the mon, letting it know that the OSD is stopping. This allows the PGs to re-peer quickly and avoid a long IO pause.
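As a quick sanity check (a sketch, assuming osd.0 is a running OSD daemon), the installed Ceph versions and the running value of the option can be inspected with:
# ceph versions
# ceph config show osd.0 osd_fast_shutdown_notify_mon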
OSD - Repairing Inconsistent PG
Problem
Ceph is warning about inconsistent PGs.
Solution
Users are advised to refer to the upstream documentation: Repairing Inconsistent PGs.
If users notice that deep-scrub is discovering inconsistent objects with regular frequency, and those errors coincide with SCSI Medium Errors on the underlying drives, it is recommended to enable automatic repair of damaged objects detected during scrub:
# ceph config set osd osd_scrub_auto_repair true
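For a one-off inconsistency, a sketch of the usual workflow (the PG ID 2.5 below is only a placeholder) is to identify the affected PG, inspect the inconsistent objects, and then trigger a repair:
# ceph health detail
# rados list-inconsistent-obj 2.5 --format=json-pretty
# ceph pg repair 2.5
The repair is queued and carried out as a scrub of the PG, so it may take some time to complete.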