Recreating cephadm-managed OSDs
Problem
One or more OSDs must be recreated from scratch. This applies when an OSD device should be fully zapped and redeployed, or when Ceph needs to rebalance data after one or more OSDs are replaced.
This cephadm-specific procedure uses upmap-remapped.py before rebalancing data, to reduce misplaced objects while the OSDs are recreated. For the general capacity-add procedure that uses the same approach, see Adding Hosts or OSDs - Improved.
When the OSD is removed with --replace --zap, cephadm drains the OSD, marks it
for replacement, zaps the device, and then recreates the OSD if the device still
matches an OSD service specification. Ceph then backfills and rebalances data to
return the cluster to a healthy state.
This procedure instead sets the OSDs out manually so that upmap-remapped.py can be run and data moved off them under balancer control, and only then uses cephadm to zap and recreate the OSDs.
The upmap-remapped.py script is available at: https://github.com/clyso/otto/blob/main/otto/src/clyso/ceph/otto/tools/cern/upmap-remapped.py
Procedure
High-level steps:
- Set norebalance and turn the balancer off.
- Set the OSDs out.
- Run upmap-remapped.py twice.
- Unset norebalance, turn on the balancer, and wait for rebalancing to finish.
- Set norebalance and turn the balancer off again.
- Let cephadm destroy the OSDs for replacement and recreate them.
- Set the recreated OSDs in.
- Run upmap-remapped.py twice.
- Unset norebalance, turn on the balancer, and wait for rebalancing to finish.
Detailed Steps
Keep watch ceph -s running in another terminal during the whole procedure:
watch ceph -s
1. Set norebalance and Turn the Balancer Off
Prevent Ceph from immediately moving data while you prepare the replacement:
ceph osd set norebalance
ceph balancer off
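A minimal POSIX sh sketch (not part of the original procedure) that runs the two commands above and then checks that both took effect. The function name hold_data_movement is hypothetical, and the checks assume that ceph osd dump prints a line starting with "flags" and that ceph balancer status prints JSON containing an "active" field.

```shell
# POSIX sh sketch. Assumes 'ceph osd dump' prints a "flags ..." line and
# 'ceph balancer status' prints JSON containing an "active" field.

hold_data_movement() {
    ceph osd set norebalance || return 1
    ceph balancer off || return 1

    # The osdmap flags line should now include norebalance.
    if ! ceph osd dump | grep -q '^flags .*norebalance'; then
        echo "norebalance not found in 'ceph osd dump' flags" >&2
        return 1
    fi

    # The balancer should no longer report itself as active.
    if ! ceph balancer status | grep -Eq '"active": *false'; then
        echo "balancer still reports active" >&2
        return 1
    fi

    echo "norebalance is set and the balancer is off"
}
```

If hold_data_movement returns non-zero, stop and investigate before setting any OSDs out.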
2. Set the OSDs Out
Mark the OSDs that will be recreated as out:
ceph osd out <osd_id>
For multiple OSDs:
ceph osd out <osd_id_1> <osd_id_2>
ceph -s should now report misplaced objects, but their number should not decrease because norebalance is set.
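To confirm that the misplaced count is holding steady, a sketch like the following samples it twice. It assumes that ceph pg stat reports misplaced objects in the form "N/M objects misplaced"; the function names and the default 60-second interval are illustrative choices, not part of the original procedure.

```shell
# POSIX sh sketch. Assumes 'ceph pg stat' reports "N/M objects misplaced".

# Print the number of misplaced objects (0 if none are reported).
misplaced_objects() {
    n=$(ceph pg stat | sed -n 's/.*[^0-9]\([0-9][0-9]*\)\/[0-9][0-9]* objects misplaced.*/\1/p')
    echo "${n:-0}"
}

# Sample twice, $1 seconds apart; fail if the count went down, which would
# suggest that norebalance is not in effect.
check_misplaced_not_falling() {
    interval=${1:-60}
    before=$(misplaced_objects)
    sleep "$interval"
    after=$(misplaced_objects)
    echo "misplaced objects: $before -> $after"
    if [ "$after" -lt "$before" ]; then
        echo "WARNING: misplaced objects are decreasing; check that norebalance is set" >&2
        return 1
    fi
}
```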
3. Run upmap-remapped.py Twice
Run upmap-remapped.py twice to reduce misplaced objects before allowing the
cluster to rebalance:
./upmap-remapped.py | sh -x
./upmap-remapped.py | sh -x
The number of misplaced objects in ceph -s should decrease.
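The two runs can also be scripted so that ceph pg stat is printed after each pass, which makes the drop in misplaced objects easy to follow. This is a sketch under stated assumptions: UPMAP_REMAPPED is a hypothetical variable holding the script path, and the generated commands are captured before being piped to sh -x so that a failing script run stops the loop.

```shell
# POSIX sh sketch. UPMAP_REMAPPED is a hypothetical variable for the script path.
UPMAP_REMAPPED=${UPMAP_REMAPPED:-./upmap-remapped.py}

run_upmap_remapped() {
    passes=${1:-2}
    i=1
    while [ "$i" -le "$passes" ]; do
        echo "== upmap-remapped pass $i =="
        # Capture the generated commands first, so a failing script aborts
        # the loop instead of feeding partial output to sh.
        cmds=$("$UPMAP_REMAPPED") || return 1
        printf '%s\n' "$cmds" | sh -x || return 1
        # Show the PG summary, including any misplaced objects, after each pass.
        ceph pg stat
        i=$((i + 1))
    done
}
```

For example, run_upmap_remapped 2 performs the two passes described in this step.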
4. Unset norebalance, Turn the Balancer On, and Wait
Allow the cluster to rebalance after the OSDs are out:
ceph osd unset norebalance
ceph balancer on
Misplaced objects will start to increase again; the balancer limits them according to the mgr option target_max_misplaced_ratio. A value of 0.005 (0.5%) is recommended; the default is 0.05 (5%).
Wait until rebalancing and backfill finish before continuing:
ceph -s
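A sketch for setting the recommended ratio and for waiting on the rebalance without watching the terminal. The grep pattern over ceph pg stat is a heuristic, and requiring several consecutive quiet checks is an assumption meant to avoid stopping during a short pause between balancer rounds; the function names and defaults are hypothetical.

```shell
# POSIX sh sketch. The state pattern below is a heuristic over 'ceph pg stat'.

limit_misplaced_ratio() {
    # 0.005 = at most 0.5% of objects misplaced at a time (mgr default is 0.05).
    ceph config set mgr target_max_misplaced_ratio "${1:-0.005}"
}

wait_for_rebalance() {
    interval=${1:-60}   # seconds between checks
    needed=${2:-3}      # consecutive quiet checks required
    quiet=0
    while [ "$quiet" -lt "$needed" ]; do
        stat=$(ceph pg stat) || return 1
        if printf '%s\n' "$stat" | grep -Eq 'misplaced|degraded|backfill|recover|peering'; then
            quiet=0
            printf '%s\n' "$stat"
        else
            quiet=$((quiet + 1))
        fi
        if [ "$quiet" -lt "$needed" ]; then
            sleep "$interval"
        fi
    done
    echo "no misplaced, degraded, backfilling or recovering PGs in $needed consecutive checks"
}
```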
5. Set norebalance and Turn the Balancer Off Again
After the cluster is stable, stop automatic data movement again before asking cephadm to recreate the OSDs:
ceph osd set norebalance
ceph balancer off
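One way to make "after the cluster is stable" explicit is to gate this step on ceph health. The sketch below treats anything other than HEALTH_OK as not stable, which is a conservative assumption; a cluster with known, unrelated warnings would need a looser check. hold_if_healthy is a hypothetical name.

```shell
# POSIX sh sketch. Treating anything but HEALTH_OK as "not stable" is an assumption.

hold_if_healthy() {
    health=$(ceph health) || return 1
    case "$health" in
        HEALTH_OK*) ;;
        *)
            echo "not pausing data movement, cluster reports: $health" >&2
            return 1
            ;;
    esac
    ceph osd set norebalance || return 1
    ceph balancer off
}
```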
6. Let cephadm Destroy and Recreate the OSDs
Confirm that the target devices are still selected by a cephadm OSD service; otherwise cephadm will not recreate the OSDs. Then set that service unmanaged: while it is unmanaged, cephadm zaps the devices but does not recreate the OSDs automatically, leaving them marked destroyed.
ceph orch ls --service_type osd
ceph orch set-unmanaged <osd_service_name_here>
Then remove the OSDs with --replace --zap:
ceph orch osd rm <osd_id> --replace --zap
For multiple OSDs:
ceph orch osd rm <osd_id_1> <osd_id_2> --replace --zap
Monitor the replacement process:
ceph orch osd rm status
ceph orch ps --daemon_type osd
ceph osd tree
ceph log last cephadm
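To wait for the removals to finish before setting the service back to managed, the OSD STATUS column in ceph osd tree can be polled. A sketch, assuming the plain ceph osd tree layout in which the status word follows osd.<id>; the helper names and the INTERVAL variable are hypothetical.

```shell
# POSIX sh sketch. Assumes the STATUS column follows osd.<id> in 'ceph osd tree'.

# $1 = OSD ID, $2 = status word (for example destroyed or up).
# The whitespace after the ID keeps osd.3 from matching osd.30.
osd_has_status() {
    ceph osd tree | grep -Eq "osd\.$1[[:space:]]+$2([[:space:]]|\$)"
}

wait_for_destroyed() {
    interval=${INTERVAL:-30}
    for id in "$@"; do
        until osd_has_status "$id" destroyed; do
            echo "waiting for osd.$id to be marked destroyed"
            sleep "$interval"
        done
        echo "osd.$id is destroyed"
    done
}
```

For example: wait_for_destroyed 3 7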
When the removal finishes, the OSDs should be marked destroyed in ceph osd tree, because the OSD service is unmanaged. Set the OSD service back to managed so that cephadm recreates the OSDs:
ceph orch ls --service_type osd
ceph orch set-managed <osd_service_name_here>
cephadm should now recreate the OSDs, reusing their IDs, and the OSDs should be marked up.
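Before marking the OSDs in, a sketch that waits until cephadm reports a running daemon for each recreated OSD ID. It assumes the plain ceph orch ps table, where the daemon name (osd.<id>) is the first column and the STATUS column contains "running"; the function name and INTERVAL variable are hypothetical.

```shell
# POSIX sh sketch. Assumes 'ceph orch ps' lists daemons as "osd.<id> <host> ... running".

wait_for_osd_daemons() {
    interval=${INTERVAL:-30}
    for id in "$@"; do
        until ceph orch ps --daemon_type osd | grep -Eq "^osd\.$id[[:space:]].*[[:space:]]running"; do
            echo "waiting for osd.$id daemon to be running"
            sleep "$interval"
        done
        echo "osd.$id daemon is running"
    done
}
```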
7. Set the Recreated OSDs In
After cephadm recreates the OSDs and the daemons are running, mark the OSDs back in:
ceph osd in <osd_id>
For multiple OSDs:
ceph osd in <osd_id_1> <osd_id_2>
Verify that the recreated OSDs are up and in:
ceph osd tree
ceph -s
Marking the OSDs in should create misplaced objects; because norebalance is set, they are not moved yet.
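The in and verify commands of this step can be combined as in the sketch below. It assumes ceph osd dump lists each OSD as "osd.<id> up in weight ..."; mark_in_and_verify is a hypothetical name.

```shell
# POSIX sh sketch. Assumes 'ceph osd dump' lists OSDs as "osd.<id> up in weight ...".

mark_in_and_verify() {
    ceph osd in "$@" || return 1
    rc=0
    for id in "$@"; do
        if ceph osd dump | grep -Eq "^osd\.$id[[:space:]]+up[[:space:]]+in[[:space:]]"; then
            echo "osd.$id is up and in"
        else
            echo "osd.$id is not up and in" >&2
            rc=1
        fi
    done
    return "$rc"
}
```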
8. Run upmap-remapped.py Twice Again
Run upmap-remapped.py twice again to reduce misplaced objects caused by the
recreated OSDs before allowing the final rebalance:
./upmap-remapped.py | sh -x
./upmap-remapped.py | sh -x
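For an audit trail of which upmap entries were applied, each pass can write the generated commands to a file before executing it. A sketch: UPMAP_REMAPPED and the file naming are hypothetical, and counting lines that contain pg-upmap-items assumes that is how the script's generated commands are spelled.

```shell
# POSIX sh sketch. UPMAP_REMAPPED and the output file names are hypothetical.
UPMAP_REMAPPED=${UPMAP_REMAPPED:-./upmap-remapped.py}

run_upmap_remapped_logged() {
    dir=${1:-.}
    stamp=$(date +%Y%m%d%H%M%S)
    for pass in 1 2; do
        file="$dir/upmap-remapped.$stamp.pass$pass.sh"
        "$UPMAP_REMAPPED" > "$file" || return 1
        # Count generated upmap commands (assumes they contain "pg-upmap-items").
        count=$(grep -c 'pg-upmap-items' "$file")
        echo "pass $pass: $count upmap command(s) saved to $file"
        sh -x "$file" || return 1
    done
}
```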
9. Unset norebalance, Turn the Balancer On, and Wait
Allow the cluster to complete the final rebalance:
ceph osd unset norebalance
ceph balancer on
Verify that the replacement OSDs are up and in, and that rebalancing finishes:
ceph osd tree
ceph orch osd rm status
ceph -s
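A final check sketch, assuming that ceph osd dump prints a "flags" line, ceph balancer status prints JSON with an "active" field, and ceph health output starts with the health state. The function name final_check is hypothetical.

```shell
# POSIX sh sketch. Output-format assumptions are described in the text above.

final_check() {
    rc=0
    if ceph osd dump | grep -q '^flags .*norebalance'; then
        echo "norebalance is still set" >&2
        rc=1
    fi
    if ! ceph balancer status | grep -Eq '"active": *true'; then
        echo "balancer is not active" >&2
        rc=1
    fi
    health=$(ceph health)
    case "$health" in
        HEALTH_OK*) ;;
        *) echo "cluster health: $health" >&2; rc=1 ;;
    esac
    [ "$rc" -eq 0 ] && echo "norebalance cleared, balancer active, HEALTH_OK"
    return "$rc"
}
```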
Expected results:
- The replacement OSD daemons are running.
- ceph osd df tree shows how many PGs and how much data each OSD holds, which can be used to confirm that the recreated OSDs are receiving data.
- cephadm has recreated the OSDs that were marked destroyed.