Ceph RGW: Objects with Expired TTL Remain Visible (Stale Objects Issue)
Issue Summary
Objects with expired TTL (Time To Live) in a Ceph RGW bucket remain visible and listable but cannot be downloaded. The objects should have been deleted but are stuck in a stale state.
Root Cause
This is a known bug (https://tracker.ceph.com/issues/39495) that occurs when buckets are resharded before objects with TTL expire. The resharding operation causes the object expiry metadata to be lost or orphaned, preventing proper cleanup by the garbage collection process.
Affected Versions
- Reproduced on: Ceph 17.2.6 and 17.2.8 (Quincy)
- Fixed in: Reef release and later
Key Symptoms
- Objects past their delete_at timestamp remain visible in bucket listings
- Downloading expired objects returns 404 errors
- Objects can be detected with radosgw-admin objects expire-stale list
- Discrepancy between Swift stat and radosgw-admin object counts
Example Behavior
# Object is visible in listing
$ swift list STRCSTOR-17233
files/24mb.img
# But attempting to download returns 404
$ swift stat STRCSTOR-17233 files/24mb.img
Object HEAD failed: ... 404 Not Found
# Objects still present at RADOS layer after expiry
$ radosgw-admin bucket radoslist --bucket STRCSTOR-17233
f3a3ea75-3c40-4065-8073-c9d05477ebd0.178427027.5_files/24mb.img
[...shadow objects...]
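The listing-vs-HEAD mismatch above can be checked programmatically. Here is a minimal pure-Python sketch of that diff; in a real script the two inputs would come from `swift list` and per-object `swift stat` calls, and the function name is illustrative, not part of any Ceph or Swift API.

```python
# Sketch: flag "stale" objects that appear in a bucket listing but fail HEAD.
# Operates on sample data mirroring the swift output shown above.

def find_stale(listed_objects, head_status):
    """Return objects that are listable but return 404 on HEAD."""
    return [name for name in listed_objects
            if head_status.get(name) == 404]

# The stale object is listed, but a HEAD request against it returns 404.
listing = ["files/24mb.img", "files/ok.img"]
statuses = {"files/24mb.img": 404, "files/ok.img": 200}

print(find_stale(listing, statuses))  # ['files/24mb.img']
```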
Reproduction Steps
1. Upload object with TTL:
   swift upload foobar foo -H "X-Delete-After: 100"
2. Reshard bucket before object expires:
   radosgw-admin bucket reshard --bucket foobar --num-shards 47
3. Wait for object to expire
4. Object becomes stale:
   - Listing shows object
   - Download/stat returns 404
   - Object visible with radosgw-admin objects expire-stale list
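The timing condition in the steps above can be sketched as a small check: an object uploaded with X-Delete-After gets a delete_at deadline, and it only goes stale if the reshard happens before that deadline passes. This is a pure illustration of the bug window, not a Ceph API.

```python
from datetime import datetime, timedelta, timezone

def hits_bug_window(uploaded_at, ttl_seconds, resharded_at):
    """True if the bucket was resharded before the object's delete_at."""
    delete_at = uploaded_at + timedelta(seconds=ttl_seconds)
    return resharded_at < delete_at

t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(hits_bug_window(t0, 100, t0 + timedelta(seconds=30)))   # True: reshard before expiry -> stale object
print(hits_bug_window(t0, 100, t0 + timedelta(seconds=200)))  # False: object already expired cleanly
```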
Workarounds
1. Clean Existing Stale Objects
Use the expire-stale command to remove orphaned objects:
# List stale objects
radosgw-admin objects expire-stale list --bucket <bucket-name>
# Remove stale objects (safe for production)
radosgw-admin objects expire-stale rm --bucket <bucket-name>
Notes:
- Safe to run in production
- Can be interrupted with Ctrl+C and resumed
- Monitor progress with ceph df to see the object count decrease
- Tested successfully with ~10,000 objects
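Monitoring the cleanup can be scripted by reading `ceph df --format json` and tracking per-pool object counts. A minimal sketch, assuming the JSON shape of recent Ceph releases (pools[].stats.objects); verify the field names against your cluster's actual output.

```python
import json

def pool_object_counts(ceph_df_json):
    """Map pool name -> object count from `ceph df --format json` output."""
    data = json.loads(ceph_df_json)
    return {pool["name"]: pool["stats"]["objects"] for pool in data["pools"]}

# Sample output trimmed to the fields this sketch reads.
sample = '''
{"pools": [
  {"name": "default.rgw.buckets.data", "id": 7, "stats": {"objects": 10342}},
  {"name": "default.rgw.buckets.index", "id": 8, "stats": {"objects": 293}}
]}
'''
print(pool_object_counts(sample)["default.rgw.buckets.data"])  # 10342
```

Polling this in a loop while expire-stale rm runs gives the same signal as watching ceph df by hand.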
2. Prevent Future Occurrences (Quincy Clusters)
For clusters running Quincy until upgrade to Reef:
A. Disable Dynamic Resharding
# Disable dynamic resharding cluster-wide
ceph config set global rgw_dynamic_resharding false
B. Pre-shard Buckets with Higher Default
# Set default shard count to prime number (recommended: 293)
ceph config set global rgw_override_bucket_index_max_shards 293
# With the override set, new buckets created via the S3/Swift API
# default to 293 shards; existing buckets can be resharded manually
radosgw-admin bucket reshard --bucket=mybucket --num-shards=293
Why prime numbers? RGW places each index entry on a shard by hashing the object name modulo the shard count; a prime modulus spreads patterned key sets more evenly, while a composite modulus can concentrate them on a fraction of the shards.
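The effect is easy to demonstrate with a toy model: synthetic hash values that share a common factor (as patterned key sets can) land on far fewer shards under a composite modulus than under a prime one. The hash values here are made up for illustration, not real RGW hashes.

```python
def shard_spread(hashes, num_shards):
    """Number of distinct shards hit by the given hash values."""
    return len({h % num_shards for h in hashes})

# Hash values that all share the factor 4 (e.g. keys generated in a pattern).
hashes = [4 * i for i in range(1000)]

print(shard_spread(hashes, 64))   # composite modulus: only 16 of 64 shards used
print(shard_spread(hashes, 293))  # prime modulus: all 293 shards used
```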
C. Set Bucket Object Limits
- Recommend max ~20 million objects per bucket
- Split larger datasets across multiple buckets
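Sizing the split is simple ceiling division against the cap. A small sketch; the ~20M figure is the operational guideline above, not a hard RGW limit.

```python
import math

def buckets_needed(total_objects, cap=20_000_000):
    """Minimum number of buckets to keep each under the object cap."""
    return math.ceil(total_objects / cap)

print(buckets_needed(75_000_000))  # 4
```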
D. Apply GC Configuration Tuning
# Improve garbage collection performance
ceph config set global rgw_gc_max_concurrent_io 20
ceph config set global rgw_gc_max_trim_chunk 64
Trade-offs of Higher Default Shard Counts
Pros:
- Prevents stale object issue
- Better prepared for large buckets
Cons:
- Slower listing performance on small buckets
- More objects in index pool
- Slightly higher metadata overhead
Recommendation: Test listing performance with 293 shards on a new bucket before applying cluster-wide.
Diagnostic Commands
# Check for stale objects
radosgw-admin objects expire-stale list --bucket <bucket-name>
# Check bucket stats and shard count
radosgw-admin bucket stats --bucket <bucket-name>
# List objects at RADOS layer
radosgw-admin bucket radoslist --bucket <bucket-name>
# Check bucket index consistency
radosgw-admin bucket check --bucket <bucket-name> --check-objects --fix
# Check lifecycle configuration
radosgw-admin lc list
# View object metadata including delete_at timestamp
radosgw-admin object stat --bucket=<bucket> --object=<object-name>
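The bucket stats output can be summarized in a script to spot buckets worth pre-sharding. A minimal sketch over `radosgw-admin bucket stats --bucket <name> --format json`; the field names (num_shards, usage["rgw.main"]["num_objects"]) match recent Ceph releases but should be verified against your version's output.

```python
import json

def bucket_summary(stats_json):
    """Extract shard and object counts from radosgw-admin bucket stats JSON."""
    stats = json.loads(stats_json)
    objects = stats.get("usage", {}).get("rgw.main", {}).get("num_objects", 0)
    return {"bucket": stats["bucket"],
            "num_shards": stats["num_shards"],
            "num_objects": objects}

# Sample output trimmed to the fields this sketch reads.
sample = '''
{"bucket": "STRCSTOR-17233", "num_shards": 47,
 "usage": {"rgw.main": {"num_objects": 10000}}}
'''
print(bucket_summary(sample))
```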
Long-term Solution
Upgrade to Reef or later where this bug is fixed. After upgrade:
- Existing stale objects will still need to be cleaned with expire-stale rm
- New objects will not experience this issue
- Dynamic resharding can be safely re-enabled
Related Tracker Issues
- Primary bug: https://tracker.ceph.com/issues/39495
- Related: https://tracker.ceph.com/issues/63935
Summary of Recommendations
Immediate Actions (Production Quincy Clusters):
- Run radosgw-admin objects expire-stale rm on affected buckets
- Disable dynamic resharding
- Pre-shard new buckets with ~293 shards
- Set bucket object limits (~20M objects)
- Apply GC tuning configuration
Long-term:
- Plan upgrade to Reef or Squid release
- After upgrade, clean remaining stale objects
- Re-evaluate dynamic resharding settings