Ceph Time Synchronization with Chrony
Overview
Ceph clusters require precise time synchronization across all nodes, particularly Monitor (MON) nodes. Clock skew between monitors can prevent quorum formation, cause election storms, and lead to cluster instability. This guide explains how to configure Chrony for Ceph deployments.
Understanding NTP Stratum Architecture
Before diving into Ceph-specific configurations, it's important to understand how time synchronization works through the NTP stratum hierarchy. This hierarchical architecture ensures that accurate time flows from authoritative sources down through your network to every device.
What is Stratum?
Stratum is a measure of distance (in terms of synchronization hops) from the ultimate time source. The lower the stratum number, the closer the device is to an authoritative time reference.
Stratum Levels Explained
Stratum 0 - Reference Clocks
These are the most accurate timekeeping devices and represent the root of the entire time distribution tree:
- Atomic clocks: Cesium, rubidium, or hydrogen maser atomic clocks (accuracy: nanoseconds)
- GPS/GNSS receivers: Receive time signals from GPS, GLONASS, Galileo, or BeiDou satellites (accuracy: microseconds)
- Radio clocks: Receive time broadcasts from national time services like WWVB, DCF77, or MSF
Stratum 0 devices do not directly connect to computer networks. They provide time signals (typically via pulse-per-second or PPS) to Stratum 1 servers through dedicated hardware connections like serial ports.
Stratum 1 - Primary Time Servers
These are computers directly connected to Stratum 0 devices:
- Synchronized to within microseconds of UTC (Coordinated Universal Time)
- Act as the first network-accessible time sources
- Also called "primary time servers"
- Often peer with other Stratum 1 servers for sanity checking and backup
- Typical accuracy: 1-10 microseconds from UTC
Examples: time.nist.gov (NIST servers), GPS-equipped NTP servers in data centers
Stratum 2 - Secondary Time Servers
These are computers synchronized over a network to Stratum 1 servers:
- Query multiple Stratum 1 servers for redundancy
- Often peer with other Stratum 2 servers for stability
- Act as servers for Stratum 3 clients
- Typical accuracy: 1-10 milliseconds from UTC
- Most public NTP servers on the internet are Stratum 2
Examples: Public pool servers (pool.ntp.org), ISP-provided NTP servers
Stratum 3 through 15 - Client Servers
- Stratum 3 devices sync from Stratum 2 servers
- Each subsequent level adds one stratum number
- Most enterprise clients operate at Stratum 3 or 4
- Accuracy degrades slightly at each level due to network latency and jitter
- Typical accuracy at Stratum 3: 10-100 milliseconds from UTC
Stratum 16 - Unsynchronized
- Special value indicating a device is not synchronized
- Device has lost contact with all time sources
- Should never be used as a time source by other devices
- Chrony/NTP clients will reject Stratum 16 sources
How the Hierarchy Works
The NTP protocol uses this hierarchical structure to prevent timing loops and distribute load efficiently:
- Bellman-Ford algorithm: The NTP subnet self-organizes into a shortest-path spanning tree (using a variant of the Bellman-Ford algorithm) that minimizes synchronization distance to Stratum 1 servers
- Reference ID tracking: Each server knows which upstream server it's synchronized to, preventing circular dependencies
- Load distribution: Higher stratum servers reduce load on primary time sources
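To see this chain on a live system, Chrony can report both the server this host follows and what each configured source is itself synchronized to. A minimal check using standard chronyc subcommands (ntpdata is available in chrony 3.0 and later; run as root):
# Which source this host follows, and at what stratum
chronyc tracking | grep -E 'Reference ID|Stratum'
# Per-source detail, including each source's own reference ID and stratum
chronyc ntpdata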
Why Multiple Strata Instead of Everyone Using Stratum 1?
Load distribution: If every device on the internet queried Stratum 1 servers directly, these primary servers would be overwhelmed with requests and unable to function properly.
Efficient scaling: The hierarchical model allows thousands of Stratum 2 servers to serve millions of Stratum 3+ clients without overloading the limited number of Stratum 1 sources.
Network efficiency: Clients should use time sources close to them in network topology. A Stratum 3 server on your local network will give better accuracy than a distant Stratum 1 server due to lower network latency.
Cost considerations: Operating a Stratum 1 server requires expensive hardware (GPS receivers, atomic clocks) and is unnecessary for most use cases.
Stratum and Accuracy: Not Always Correlated
Important: Stratum number indicates distance from reference, not quality of time. A well-configured Stratum 3 server on your local network can provide more accurate time than a poorly-configured or distant Stratum 1 server.
Factors affecting accuracy regardless of stratum:
- Network latency: Variable delays between client and server
- Symmetric vs asymmetric paths: Different delays in each direction
- Server load: Overloaded servers respond inconsistently
- Clock stability: Quality of the local oscillator
- Peering relationships: Servers that peer can validate each other's time
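To judge sources by measured quality rather than by stratum alone, compare their offsets and jitter directly. A quick sketch using standard chronyc subcommands:
# Estimated offset, offset standard deviation (jitter), and frequency skew per source
chronyc sourcestats -v
# Reachability, stratum, and last measured offset per source
chronyc sources -v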
Stratum in Ceph Deployments
For a typical Ceph cluster:
Stratum 1: Public primary time servers (e.g., time.nist.gov or GPS-backed servers)
↓
Stratum 2: Your Ceph Monitor nodes (synced to Stratum 1 + peered)
↓
Stratum 3: Your Ceph OSD/MDS/RGW nodes (synced to MON nodes)
↓
Stratum 4: Other infrastructure (synced to Ceph nodes)
For Ceph specifically:
- Monitor nodes should be Stratum 2 or 3 (synced to external sources)
- OSD/MDS/RGW nodes will be one stratum higher than MON nodes
- The exact stratum number matters less than maintaining tight synchronization between monitors
- Sub-millisecond skew between monitors is readily achievable at Stratum 3 when syncing over a local network, far below Ceph's 50ms threshold
Verifying Your Stratum Level
Check your current stratum with Chrony:
# View stratum in tracking output
chronyc tracking | grep Stratum
# View stratum of your time sources
chronyc sources -v
Example output:
Reference ID : C0A80001 (ntp1.example.com)
Stratum : 3
This shows that the system is at Stratum 3, synchronized to a Stratum 2 source.
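For scripting, chronyc can also emit machine-readable output. A small sketch that assumes the CSV field order of recent chrony releases (stratum is the third field); verify against the human-readable output on your version:
# Print just the stratum number (CSV output: refid,name,stratum,...)
chronyc -c tracking | cut -d, -f3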
Why Ceph Demands Precise Time Synchronization
Ceph monitors use a Paxos-based consensus protocol and rely on closely synchronized clocks to maintain a stable quorum:
- Default tolerance: Ceph warns when clock skew exceeds 50ms (mon_clock_drift_allowed = 0.05)
- Monitor behavior: Monitors with excessive clock skew cannot reliably participate in quorum
- Critical operations affected: Monitor elections, client connections, and cluster state updates
- Check frequency: Ceph evaluates time synchronization every 5 minutes
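You can confirm both values on a running cluster; this assumes a release with the centralized ceph config store (Mimic or later):
# Allowed skew between monitors, in seconds (default 0.05)
ceph config get mon mon_clock_drift_allowed
# How often monitors compare clocks, in seconds (default 300)
ceph config get mon mon_timecheck_interval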
Clock skew symptoms:
- HEALTH_WARN clock skew detected messages
- Monitors stuck in probing, electing, or synchronizing states
- Monitors failing to join quorum
- Client authentication failures
Why Chrony for Ceph?
Chrony is recommended over legacy ntpd for Ceph clusters:
- Better accuracy: Achieves sub-millisecond synchronization (well below Ceph's 50ms threshold)
- Faster convergence: Synchronizes clocks more quickly after system boot or network outages
- Handles network issues better: More resilient to intermittent connectivity
- Smooth time adjustments: Uses clock slewing instead of sudden jumps (critical for Ceph)
Important: Ceph does not tolerate sudden time jumps. Never use ntpdate or similar tools that set time abruptly.
Ceph Cluster Architecture for Time Sync
Basic Setup (Most Common)
External NTP sources (pool.ntp.org, etc.)
|
v
Ceph MON nodes (peer with each other)
|
v
OSD/MDS/RGW nodes (sync from MONs)
Recommended Configuration
- All MON nodes:
  - Sync to multiple external NTP sources
  - Peer with each other (critical for Ceph)
  - Act as NTP servers for other cluster nodes
- OSD/MDS/RGW nodes:
  - Sync to all MON nodes
  - No need to peer with each other
- Network considerations:
  - Use a local/internal NTP server if available
  - Avoid a single network path to external sources
Chrony Configuration for Ceph
Monitor Node Configuration
Configure MON nodes to sync externally and peer with each other:
# /etc/chrony/chrony.conf - Ceph Monitor Node
# External time sources - use multiple for redundancy
pool pool.ntp.org iburst
server 0.pool.ntp.org iburst
server 1.pool.ntp.org iburst
# Peer with other Ceph monitor nodes (CRITICAL for Ceph)
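# Note: on each monitor, list only the other monitors - omit this node's own hostname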
peer mon1.cluster.local
peer mon2.cluster.local
peer mon3.cluster.local
# Allow other cluster nodes to query this server
allow 10.0.0.0/8
# Drift file location
driftfile /var/lib/chrony/drift
# Keep the hardware clock (RTC) in step with the system clock
rtcsync
# Allow stepping the clock initially, then only slew
# First number: step threshold in seconds (1.0 = 1 second)
# Second number: step limit (3 = allow stepping for first 3 updates)
makestep 1.0 3
# Log files
logdir /var/log/chrony
Critical notes:
- The peer directives ensure MON nodes sync with each other - this is MORE important than syncing to external sources
- makestep 1.0 3 allows initial time steps but switches to slewing afterwards
- Never remove the peering between monitors
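To confirm the peering is active on a monitor, check that the other MONs show up as peers. In chronyc sources output, peers are marked with '=' in the mode column and servers with '^':
# Peers (other MONs) appear with mode '='; external servers with '^'
chronyc sources
# Confirm this monitor is answering NTP queries from other cluster nodes
chronyc serverstats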
OSD/MDS/RGW Node Configuration
Configure non-monitor cluster nodes to sync from MONs:
# /etc/chrony/chrony.conf - Ceph OSD/MDS/RGW Node
# Use Ceph monitor nodes as time sources
server mon1.cluster.local iburst
server mon2.cluster.local iburst
server mon3.cluster.local iburst
# Drift file location
driftfile /var/lib/chrony/drift
# Keep the hardware clock (RTC) in step with the system clock
rtcsync
# Allow stepping the clock initially, then only slew
makestep 1.0 3
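After restarting chronyd on an OSD/MDS/RGW node, all three monitor hostnames should appear as sources, with one of them selected (marked '*') as the current synchronization source:
# Expect mon1/mon2/mon3 listed; '*' marks the selected source
chronyc sources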
Configuration with Local NTP Server
If you have a dedicated internal NTP server:
# /etc/chrony/chrony.conf - Ceph Monitor with local NTP
# Internal NTP server (primary source)
server ntp.company.local iburst prefer
# External sources (backup)
server 0.pool.ntp.org iburst
server 1.pool.ntp.org iburst
# Peer with other monitors
peer mon1.cluster.local
peer mon2.cluster.local
peer mon3.cluster.local
allow 10.0.0.0/8
driftfile /var/lib/chrony/drift
rtcsync
makestep 1.0 3
Deployment and Verification
Install and Enable Chrony
# Install (Debian/Ubuntu)
apt-get install chrony
# Install (RHEL/CentOS)
yum install chrony
# Enable and start (the unit is chronyd on RHEL/CentOS; on Debian/Ubuntu it may be named chrony)
systemctl enable chronyd
systemctl start chronyd
# Verify service is running
systemctl status chronyd
Disable Conflicting Time Services
# Stop and disable systemd-timesyncd (conflicts with Chrony)
systemctl stop systemd-timesyncd
systemctl disable systemd-timesyncd
# Stop and disable ntpd if present
systemctl stop ntpd
systemctl disable ntpd
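To verify that nothing else is still bound to the NTP port after the cleanup, a quick check with ss (part of iproute2):
# Only chronyd should be listening on UDP port 123
ss -lunp | grep ':123'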
Verify Chrony Synchronization
# Check time sources
chronyc sources -v
# Check tracking status
chronyc tracking
# Check server statistics (on MON nodes)
chronyc serverstats
Expected output from chronyc tracking:
Reference ID : C0A80001 (mon1.cluster.local)
Stratum : 3
Ref time (UTC) : Mon Feb 10 15:45:32 2026
System time : 0.000001234 seconds fast of NTP time
Last offset : +0.000000987 seconds
RMS offset : 0.000002145 seconds
Frequency : 3.456 ppm slow
Residual freq : +0.001 ppm
Skew : 0.012 ppm
Root delay : 0.001234567 seconds
Root dispersion : 0.000123456 seconds
Update interval : 64.5 seconds
Leap status : Normal
Key indicators of good sync:
- System time offset < 1ms (0.001 seconds)
- Root dispersion < 10ms
- Stratum value reasonable for your setup
- Update interval regular (typically 64 seconds)
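To compare offsets across the cluster in one pass, a minimal sketch assuming passwordless SSH to the monitor hostnames used in the examples above:
# Report the current chrony offset on each monitor
for host in mon1.cluster.local mon2.cluster.local mon3.cluster.local; do
    echo -n "$host: "
    ssh "$host" "chronyc tracking | grep 'System time'"
done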
Verify Ceph Clock Synchronization
# Check Ceph cluster health
ceph -s
# Check time sync status (Ceph Luminous and later)
ceph time-sync-status
# Check for clock skew warnings
ceph health detail
Healthy output:
cluster:
id: a1b2c3d4-e5f6-7890-abcd-ef1234567890
health: HEALTH_OK
services:
mon: 3 daemons, quorum mon1,mon2,mon3 (age 2d)
mgr: mon1(active, since 2d), standbys: mon2, mon3
osd: 12 osds: 12 up (since 2d), 12 in (since 2w)
Problem output:
cluster:
health: HEALTH_WARN
clock skew detected on mon.2, mon.3
mon.2 addr 10.0.1.2:6789/0 clock skew 0.085s > max 0.05s (latency 0.001s)
mon.3 addr 10.0.1.3:6789/0 clock skew 0.076s > max 0.05s (latency 0.001s)
Troubleshooting Clock Skew
Check Current Skew
# Ceph's perspective on time sync
ceph time-sync-status
# Chrony's perspective
chronyc sources -v
chronyc tracking
Force Immediate Sync
If clocks are significantly out of sync:
# Stop chronyd
systemctl stop chronyd
# Force sync (one-time step)
chronyd -q 'server pool.ntp.org iburst'
# Restart chronyd
systemctl start chronyd
Warning: Only do this when the skew has already degraded the cluster (for example, a monitor has dropped out of quorum). During normal operation, let Chrony gradually correct drift.
Restart Chrony on All Nodes
# Restart Chrony service
systemctl restart chronyd
# Wait 5-15 minutes for Ceph to re-evaluate sync
# Ceph checks time sync every 5 minutes
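Rather than polling by hand, you can watch the warning clear; a simple sketch using watch:
# Re-check once a minute; the warning should clear within one or two Ceph time checks
watch -n 60 'ceph time-sync-status; ceph health detail | grep -i skew'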
Common Issues and Solutions
Issue: Clock skew persists despite Chrony showing good sync
Solution:
- Verify that all MON nodes have identical Chrony configuration
- Check that MON nodes are peering with each other
- Ensure that there is no firewall blocking NTP (UDP port 123)
- Restart ceph-mon services on the affected nodes:
systemctl restart ceph-mon@<hostname>
# or
ceph orch daemon restart mon.<hostname>
Issue: Virtual machines show persistent clock skew
Solution:
- VM clocks tend to drift more than physical hardware
- Ensure that the VM host itself has accurate time
- Enable VM guest time synchronization if available
- Consider running MON nodes on physical hardware
- Use the hpet clocksource instead of tsc (see the persistence note after this example):
# Check current clocksource
cat /sys/devices/system/clocksource/clocksource0/current_clocksource
# Set to hpet
echo hpet > /sys/devices/system/clocksource/clocksource0/current_clocksource
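Note that writing to sysfs does not survive a reboot. To make the clocksource persistent, set it on the kernel command line; a sketch assuming a GRUB-based distribution:
# Add clocksource=hpet to GRUB_CMDLINE_LINUX in /etc/default/grub, then regenerate the config:
grub2-mkconfig -o /boot/grub2/grub.cfg    # RHEL/CentOS
update-grub                               # Debian/Ubuntu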
Issue: High jitter or unstable sync
Solution:
- Use local/internal NTP server on same network as Ceph cluster
- Add more NTP sources for redundancy
- Check network connectivity and latency to NTP sources
- Reduce the polling interval by adding minpoll 4 maxpoll 7 to the server lines (see the example below)
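For example, polling every 16-128 seconds instead of chrony's defaults (64-1024 seconds), shown here against the internal server name used earlier:
# Poll between 2^4 (16s) and 2^7 (128s)
server ntp.company.local iburst minpoll 4 maxpoll 7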
Adjust Clock Skew Tolerance (Not Recommended)
If absolutely necessary, increase Ceph's tolerance:
# Increase from default 0.05s to 0.1s (100ms)
ceph config set mon mon_clock_drift_allowed 0.1
# Check current value
ceph config get mon mon_clock_drift_allowed
Warning: Increase this only as a last resort. The default value exists to prevent serious cluster problems. Fix the underlying time synchronization issue instead.
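If you do raise the tolerance temporarily, revert to the default once the underlying problem is fixed:
# Remove the override so the default (0.05s) applies again
ceph config rm mon mon_clock_drift_allowed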
Best Practices for Ceph + Chrony
- MON node peering is mandatory: Always configure peer between all MON nodes
- Never use time jumps: Avoid ntpdate, chronyd -q during normal operation, or makestep without limits
- Use multiple sources: Configure at least 3 external NTP sources
- Local NTP preferred: Deploy an internal NTP server for better stability
- Monitor continuously: Set up alerts for MON_CLOCK_SKEW warnings
- Physical hardware for MONs: Run monitors on bare metal when possible
- Consistent configuration: Use identical Chrony config across all MON nodes
- Wait after changes: Give Ceph 5-15 minutes to re-evaluate after fixing time sync
Monitoring Time Sync
Add monitoring for:
- Ceph health status (ceph -s)
- Chrony tracking offset (chronyc tracking | grep "System time")
- Clock skew warnings (ceph health detail | grep skew)
Example monitoring script:
#!/bin/bash
# Check Ceph clock skew
OFFSET=$(chronyc tracking | grep "System time" | awk '{print $4}')
HEALTH=$(ceph health detail | grep -i skew)
if [ ! -z "$HEALTH" ]; then
echo "WARNING: Ceph clock skew detected"
echo "$HEALTH"
exit 1
fi
# Alert if offset > 10ms
if (( $(echo "$OFFSET > 0.010" | bc -l) )); then
echo "WARNING: Time offset ${OFFSET}s exceeds 10ms threshold"
exit 1
fi
echo "OK: Time sync healthy (offset: ${OFFSET}s)"
Summary
For Ceph clusters:
- Use Chrony instead of ntpd
- Configure MON nodes to peer with each other (critical)
- Never allow sudden time jumps
- Keep clock skew under 50ms (ideally < 10ms)
- Monitor continuously and address warnings immediately
- Use local NTP server when possible
Proper time synchronization is not optional for Ceph - it's a fundamental requirement for cluster stability.
See Also
- ntp.org guidance on providing NTP services for huge networks
- Upstream Ceph documentation: mon_clock_skew