Recreating cephadm-managed OSDs

Problem

An OSD must be recreated from scratch. This is useful when the OSD device should be fully zapped and redeployed, or when you need Ceph to rebalance data after replacing one or more OSDs.

This cephadm-specific procedure uses upmap-remapped.py before rebalancing data to reduce misplaced objects while recreating OSDs. For the general capacity-add procedure that uses the same approach, see Adding Hosts or OSDs - Improved.

When the OSD is removed with --replace --zap, cephadm drains the OSD, marks it for replacement, zaps the device, and then recreates the OSD if the device still matches an OSD service specification. Ceph then backfills and rebalances data to return the cluster to a healthy state.

This procedure marks the OSDs out manually and runs upmap-remapped.py first, then uses cephadm to zap the OSDs and recreate them.

Link to upmap-remapped script: https://github.com/clyso/otto/blob/main/otto/src/clyso/ceph/otto/tools/cern/upmap-remapped.py

Procedure

High-level steps:

  1. Set norebalance and turn the balancer off.
  2. Set the OSDs out.
  3. Run upmap-remapped.py twice.
  4. Unset norebalance, turn on the balancer, and wait for rebalancing to finish.
  5. Set norebalance and turn the balancer off again.
  6. Let cephadm destroy the OSDs for replacement and recreate them.
  7. Set the recreated OSDs in.
  8. Run upmap-remapped.py twice.
  9. Unset norebalance, turn on the balancer, and wait for rebalancing to finish.
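
The steps above can be sketched as a dry-run helper that only prints the commands in order, so the sequence can be reviewed before anything is run against a cluster. The function name print_recreate_plan is hypothetical, the ./upmap-remapped.py path assumes the script sits in the current directory, and the set-unmanaged / set-managed lines mirror the detailed version of step 6 with the service name left as a placeholder.

```shell
# Dry run only: prints the command sequence for the given OSD IDs.
# Nothing in this function contacts the cluster.
print_recreate_plan() {
    if [ "$#" -eq 0 ]; then
        echo "usage: print_recreate_plan <osd_id> [<osd_id> ...]" >&2
        return 1
    fi
    osd_ids="$*"
    cat <<EOF
# 1. Pause data movement
ceph osd set norebalance
ceph balancer off
# 2. Mark the OSDs out
ceph osd out $osd_ids
# 3. Reduce misplaced objects (run twice)
./upmap-remapped.py | sh -x
./upmap-remapped.py | sh -x
# 4. Rebalance, then wait for backfill to finish
ceph osd unset norebalance
ceph balancer on
# 5. Pause data movement again
ceph osd set norebalance
ceph balancer off
# 6. Destroy and recreate through cephadm
ceph orch set-unmanaged <osd_service_name_here>
ceph orch osd rm $osd_ids --replace --zap
ceph orch set-managed <osd_service_name_here>
# 7. Mark the recreated OSDs in
ceph osd in $osd_ids
# 8. Reduce misplaced objects again (run twice)
./upmap-remapped.py | sh -x
./upmap-remapped.py | sh -x
# 9. Final rebalance
ceph osd unset norebalance
ceph balancer on
EOF
}
```

For example, print_recreate_plan 12 13 prints the plan for OSDs 12 and 13. The output is meant for reading, not for piping into a shell, because steps 4 and 9 require waiting for the cluster to settle.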

Detailed Steps

Keep watch ceph -s running in another terminal during the whole procedure:

watch ceph -s

1. Set norebalance and Turn the Balancer Off

Prevent Ceph from immediately moving data while you prepare the replacement:

ceph osd set norebalance
ceph balancer off
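
Before continuing, it can help to confirm that the flag actually took effect. The following is a minimal sketch, assuming the output of ceph osd dump contains a line beginning with "flags" followed by a comma-separated list; the function name osd_flag_is_set is hypothetical.

```shell
# Reads `ceph osd dump` text on stdin.
# Succeeds (exit 0) if the named flag appears on the "flags" line.
osd_flag_is_set() {
    awk -v flag="$1" '
        $1 == "flags" {
            n = split($2, flags, ",")
            for (i = 1; i <= n; i++) {
                if (flags[i] == flag) found = 1
            }
        }
        END { exit (found ? 0 : 1) }
    '
}

# Usage against a live cluster:
#   ceph osd dump | osd_flag_is_set norebalance && echo "norebalance is set"
#   ceph balancer status    # shows whether the balancer is active
```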

2. Set the OSDs Out

Mark the OSDs that will be recreated as out:

ceph osd out <osd_id>

For multiple OSDs:

ceph osd out <osd_id_1> <osd_id_2>

Misplaced objects should now appear in ceph -s, but their number should not go down while norebalance is set.

3. Run upmap-remapped.py Twice

Run upmap-remapped.py twice to reduce misplaced objects before allowing the cluster to rebalance:

./upmap-remapped.py | sh -x
./upmap-remapped.py | sh -x

The number of misplaced objects in ceph -s should decrease.
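
To compare the figure before and after each run, the misplaced percentage can be pulled out of the status text. This is a sketch only: it assumes ceph -s reports misplaced data on a line of the form "N/M objects misplaced (P%)", and the function name misplaced_percent is hypothetical.

```shell
# Reads `ceph -s` text on stdin and prints the misplaced percentage.
# Prints 0 when no "objects misplaced" line is present.
misplaced_percent() {
    awk '
        /objects misplaced/ {
            # Match "(2.173%)" and strip the parentheses and percent sign.
            if (match($0, /\([0-9.]+%\)/)) {
                pct = substr($0, RSTART + 1, RLENGTH - 3)
            }
        }
        END { print (pct == "" ? 0 : pct) }
    '
}

# Usage against a live cluster:
#   ceph -s | misplaced_percent
```

Recording the value before the first run and after the second run gives a quick check that the upmaps reduced the amount of misplaced data.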

4. Unset norebalance, Turn the Balancer On, and Wait

Allow the cluster to rebalance after the OSDs are out:

ceph osd unset norebalance
ceph balancer on

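Misplaced objects will start to increase again, up to the limit set by target_max_misplaced_ratio. This setting is a ratio, not a percentage: 5% is written as 0.05 (the Ceph default), while 0.005 corresponds to 0.5%. Setting it to a small value in this range is recommended.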
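
Because the option takes a fraction, a quick conversion before applying it avoids setting a value ten times larger or smaller than intended. A minimal sketch follows; the function names are hypothetical, and ceph config set mgr target_max_misplaced_ratio is the usual way to change the option.

```shell
# Converts a target_max_misplaced_ratio value (a fraction) to a percentage.
ratio_to_percent() {
    awk -v ratio="$1" 'BEGIN { printf "%g\n", ratio * 100 }'
}

# Hypothetical wrapper: shows the percentage, then applies the setting.
set_misplaced_ratio() {
    echo "target_max_misplaced_ratio $1 = $(ratio_to_percent "$1")%"
    ceph config set mgr target_max_misplaced_ratio "$1"
}

# Usage against a live cluster:
#   set_misplaced_ratio 0.05
```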

Wait until rebalancing and backfill finish before continuing:

ceph -s
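
The wait can also be scripted. The sketch below polls ceph -s and treats the cluster as settled once none of a set of keywords appears in the output; the keyword list and both function names are assumptions, and a clean result here is not a substitute for reading the status yourself.

```shell
# Reads `ceph -s` text on stdin.
# Succeeds when no data-movement or degradation keywords remain.
cluster_settled() {
    ! grep -Eq 'misplaced|degraded|backfill|recover|remapped|peering'
}

# Polls every $1 seconds (default 30) until the cluster has settled.
# Returns 1 if `ceph -s` itself fails, so an error is not mistaken
# for a clean status.
wait_until_settled() {
    interval="${1:-30}"
    while :; do
        status="$(ceph -s)" || return 1
        if printf '%s\n' "$status" | cluster_settled; then
            return 0
        fi
        sleep "$interval"
    done
}
```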

5. Set norebalance and Turn the Balancer Off Again

After the cluster is stable, stop automatic data movement again before asking cephadm to recreate the OSDs:

ceph osd set norebalance
ceph balancer off

6. Let cephadm Destroy and Recreate the OSDs

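List the OSD services and confirm that the target devices are still selected by a cephadm OSD service specification. Then set that service to unmanaged: while the service is unmanaged, cephadm zaps the devices but does not recreate the OSDs automatically, and leaves them marked as destroyed until the service is managed again.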

ceph orch ls --service_type osd
ceph orch set-unmanaged <osd_service_name_here>
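
To find the value for <osd_service_name_here>, the service names can be read from the first column of the ceph orch ls table. The sketch below also reports whether each service is managed, assuming that an unmanaged service shows <unmanaged> in the last (PLACEMENT) column; that layout and the function name osd_service_states are assumptions.

```shell
# Reads `ceph orch ls --service_type osd` table output on stdin.
# Prints "<service_name> managed" or "<service_name> unmanaged" per service,
# skipping the header row.
osd_service_states() {
    awk 'NR > 1 && $1 ~ /^osd/ {
        state = ($NF == "<unmanaged>") ? "unmanaged" : "managed"
        print $1, state
    }'
}

# Usage against a live cluster:
#   ceph orch ls --service_type osd | osd_service_states
```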

Then remove the OSDs with --replace --zap:

ceph orch osd rm <osd_id> --replace --zap

For multiple OSDs:

ceph orch osd rm <osd_id_1> <osd_id_2> --replace --zap

Monitor the replacement process:

ceph orch osd rm status
ceph orch ps --daemon_type osd
ceph osd tree
ceph log last cephadm
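
When several OSDs are being replaced, it can be convenient to list only the ones that have reached the destroyed state. A minimal sketch, assuming ceph osd tree prints the status in the column immediately after the osd.N name; the function name destroyed_osds is hypothetical.

```shell
# Reads `ceph osd tree` text on stdin and prints the names of OSDs
# whose status column reads "destroyed".
destroyed_osds() {
    awk '{
        for (i = 1; i < NF; i++) {
            if ($i ~ /^osd\.[0-9]+$/ && $(i + 1) == "destroyed") print $i
        }
    }'
}

# Usage against a live cluster:
#   ceph osd tree | destroyed_osds
```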

While the OSD service is unmanaged, the OSDs should remain marked 'destroyed'. Set the service back to managed so that cephadm recreates them:

ceph orch ls --service_type osd
ceph orch set-managed <osd_service_name_here>

After this step, the OSDs should be recreated and marked up.

7. Set the Recreated OSDs In

After cephadm recreates the OSDs and the daemons are running, mark the OSDs back in:

ceph osd in <osd_id>

For multiple OSDs:

ceph osd in <osd_id_1> <osd_id_2>

Verify that the recreated OSDs are up and in:

ceph osd tree
ceph -s

Marking the OSDs in should create misplaced objects.

8. Run upmap-remapped.py Twice Again

Run upmap-remapped.py twice again to reduce misplaced objects caused by the recreated OSDs before allowing the final rebalance:

./upmap-remapped.py | sh -x
./upmap-remapped.py | sh -x

9. Unset norebalance, Turn the Balancer On, and Wait

Allow the cluster to complete the final rebalance:

ceph osd unset norebalance
ceph balancer on

Verify that the replacement OSDs are up and in, and that rebalancing finishes:

ceph osd tree
ceph orch osd rm status
ceph -s

Expected results:

  • The replacement OSD daemons are running.
  • 'ceph osd df tree' shows how many PGs and how much data each OSD holds.
  • cephadm has recreated the OSDs that were marked as destroyed.