
Migrating to a New Backbone Router

Problem

Administrators are often unsure what to expect when a backbone router is migrated or other network maintenance is performed. Should the cluster be taken completely offline? Should all client connections to the cluster, such as S3 connections and ceph-fuse mounts, be disconnected before the migration?

Solution

Expected cluster_network behavior

Internal heartbeats between OSDs travel over the cluster_network. As long as the cluster_network stays online, Monitors and OSDs will not be impacted. But if the cluster_network is interrupted (for example, if OSD-to-OSD heartbeat or replication traffic is blocked), the OSDs will report their unreachable peers to the Monitors, and the Monitors will mark those OSDs down.
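
Before the maintenance window, it can be useful to confirm which subnet is configured as the cluster_network. A quick check, assuming the network was set through the configuration database (ceph config) rather than only in ceph.conf:

  • ceph config get osd cluster_network

If this returns an empty value, no separate cluster_network is configured and OSD replication traffic shares the public_network.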

Expected public_network behavior

The public_network carries (a) communications between RGW daemons and their clients and (b) communications between OSDs and Monitors. The public_network does not carry OSD-to-OSD replication traffic, which travels over the cluster_network. As long as the public_network remains online, RGWs will not be impacted by the outage. But if there is an outage on the public_network, communications between RGWs and OSDs will be blocked for the duration of the intervention.
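
To double-check which subnet serves as the public_network and which addresses each OSD has actually bound on the two networks, the following read-only commands can be used (a quick sketch; the ceph osd dump output, which lists each OSD's public and cluster addresses, varies slightly between releases):

  • ceph config get osd public_network
  • ceph osd dump | grep '^osd'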

General Guidelines for this Kind of Migration

  1. A few minutes before the outage, run the following commands:

    • ceph osd set nodown
    • ceph osd set noout

    These flags instruct the Monitors to ignore missed heartbeats and to ignore any OSD DOWN reports. You will observe SLOW_OPS warnings (when OSDs cannot communicate, requests pile up), but these are not problematic. You can confirm that the flags are set with the commands shown after this procedure.

  2. After the intervention, when the network is fully restored, run the following command:

    • ceph osd unset nodown

    If after 5 minutes the cluster is stable and everything looks normal, run the following command:

    • ceph osd unset noout
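
Before and after the intervention, the following read-only commands can be used to confirm the flag state and overall cluster health (a quick verification sketch; the exact wording of the health messages, such as the OSDMAP_FLAGS warning, varies between Ceph releases):

  • ceph osd dump | grep flags
  • ceph status
  • ceph health detail

While nodown and noout are set, the flags line in the ceph osd dump output will list them and ceph status will report an OSDMAP_FLAGS health warning. After both flags are unset and peering settles, the cluster should return to HEALTH_OK.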

Discussion

Clyso's advice regarding nodown

Clyso provides the following insights regarding setting nodown:

There are at least two reasons to set nodown during network maintenance:

  1. Osdmap churn and peering churn: repeated OSD down/up transitions during the maintenance are a stress test on peering, and if that code isn't rock solid, there will be stuck PGs after the maintenance.

  2. Max markdown: if an OSD gets marked down 6 times in 1 hour, the OSD process will exit. This is inevitable during network maintenance, and the result is that when the network is back online, few (if any) of the OSDs are still running, and the administrator has to scramble to find all of the OSD processes that are not running and restart them (one way to do this is sketched after this list).
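
If OSDs did exit because of the markdown limit, the down daemons can be located and restarted once the network is restored. The sketch below assumes a cephadm-managed cluster, with <ID> standing in for the numeric ID of a down OSD; on clusters managed directly with systemd, the equivalent restart command would be systemctl restart ceph-osd@<ID> on the relevant host. The markdown thresholds themselves are governed by the osd_max_markdown_count and osd_max_markdown_period options:

  • ceph osd tree down
  • ceph orch daemon restart osd.<ID>
  • ceph config get osd osd_max_markdown_count
  • ceph config get osd osd_max_markdown_period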

Additional Resources

  1. Warning: the following blogpost is provided only for context. It differs from Clyso's advice by suggesting that nodown not be set when shutting down a Ceph cluster, which is the opposite of what Clyso recommends.

    This Croit blogpost explains what not to do when shutting down a Ceph cluster.

  2. Clyso's CephFS Clean Power Off Procedure