Before the Luminous Release
- The Ceph cluster is in HEALTH_OK status
- Add all new OSDs to the Ceph cluster with a weight of 0
- Gradually increase the weight of all new OSDs in steps of 0.1 up to 1.0, depending on the base load of the cluster
- Wait until the Ceph cluster has reached HEALTH_OK again or all PGs are in the active+clean state
- Repeat the weight increase for the new OSDs until the desired weight is reached (a command sketch follows after this list)
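A minimal command sketch for this procedure, assuming a single new OSD with the ID osd.10 and a target weight of 1.0 (both are examples); setting osd_crush_initial_weight is one way to make new OSDs join with weight 0:

# in ceph.conf on the OSD hosts, so that new OSDs join the CRUSH map with weight 0:
# [osd]
# osd crush initial weight = 0
# raise the weight of the new OSD step by step, e.g. in increments of 0.1
ceph osd crush reweight osd.10 0.1
# check that the cluster is HEALTH_OK or all PGs are active+clean before the next step
ceph health
ceph pg stat
# repeat with 0.2, 0.3, ... until the target weight is reached
ceph osd crush reweight osd.10 1.0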
Since the Luminous Release
- The Ceph cluster is in HEALTH_OK status
- Set the norebalance flag (and usually also the nobackfill flag)
- Add the new OSDs to the cluster
- Wait until the PGs have peered with each other (this can take a few minutes)
- Remove the norebalance and nobackfill flags
- Wait until the Ceph cluster has reached HEALTH_OK again (a command sketch follows after this list)
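A minimal command sketch for this sequence; the flag commands below are the standard Ceph CLI calls, while the way the OSDs themselves are added (e.g. with ceph-volume) depends on the deployment:

ceph osd set norebalance
ceph osd set nobackfill
# add the new OSDs on the OSD hosts, then watch the peering progress, e.g. with
ceph -s
# once the PGs have peered, remove the flags again
ceph osd unset norebalance
ceph osd unset nobackfill
# wait until the cluster reports HEALTH_OK
ceph health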
Since the Nautilus Release
With the Nautilus release, PG merging was introduced in addition to the existing PG splitting, and the following default values were set:
"osd_pool_default_pg_num": "8"
"osd_pool_default_pgp_num": "0"
Furthermore, the osd_pool_default_pg_num should be set to a value that makes sense for the respective Ceph cluster.
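These defaults can be checked and, if required, overridden in the central configuration database; the value 128 below is only an example and has to be chosen to match the cluster:

# show the current default used when creating new pools
ceph config get mon osd_pool_default_pg_num
# override the default cluster-wide
ceph config set global osd_pool_default_pg_num 128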
A value of 0 for osd_pool_default_pgp_num indicates that pgp_num is now monitored and adjusted automatically by the Ceph cluster, as described in the release announcement:
"Starting in Nautilus, this second step is no longer necessary: as long as pgp_num and pg_num currently match, pgp_num will automatically track any pg_num changes. More importantly, the adjustment of pgp_num to migrate data and (eventually) converge to pg_num is done gradually to limit the data migration load on the system based on the new target_max_misplaced_ratio config option (which defaults to .05, or 5%). That is, by default, Ceph will try to have no more than 5% of the data in a "misplaced" state and queued for migration, limiting the impact on client workloads." (Source: ceph.com/rados/new-in-nautilus-pg-merging-and-autotuning/)
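A short sketch of what this looks like on an existing pool; the pool name testpool and the new pg_num of 256 are assumptions:

# increase pg_num; pgp_num follows automatically and gradually
ceph osd pool set testpool pg_num 256
ceph osd pool get testpool pgp_num
# the allowed share of misplaced data can be tuned (default 0.05 = 5%)
ceph config set mgr target_max_misplaced_ratio 0.05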
Note: Before the Nautilus release, the number of PGs had to be adjusted manually for the respective pools. Since Nautilus, the Ceph Manager module pg_autoscaler can take over this task.
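A minimal sketch of enabling the autoscaler; the pool name testpool is again only an example:

ceph mgr module enable pg_autoscaler
ceph osd pool set testpool pg_autoscale_mode on
# show the autoscaler's view of all pools
ceph osd pool autoscale-status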