Ceph RGW: Objects with Expired TTL Remain Visible (Stale Objects Issue)

Issue Summary

Objects with expired TTL (Time To Live) in a Ceph RGW bucket remain visible and listable but cannot be downloaded. The objects should have been deleted but are stuck in a stale state.

Root Cause

This is a known bug (https://tracker.ceph.com/issues/39495) that occurs when buckets are resharded before objects with TTL expire. The resharding operation causes the object expiry metadata to be lost or orphaned, preventing proper cleanup by the garbage collection process.

Affected Versions

  • Reproduced on: Ceph 17.2.6 and 17.2.8 (Quincy)
  • Fixed in: Reef release and later

Key Symptoms

  1. Objects past their delete_at timestamp remain visible in bucket listings
  2. Downloading expired objects returns 404 errors
  3. Objects can be detected with radosgw-admin objects expire-stale list
  4. Discrepancy between the object counts reported by swift stat and radosgw-admin bucket stats

Example Behavior

# Object is visible in listing
$ swift list STRCSTOR-17233
files/24mb.img

# But attempting to download returns 404
$ swift stat STRCSTOR-17233 files/24mb.img
Object HEAD failed: ... 404 Not Found

# Objects still present at RADOS layer after expiry
$ radosgw-admin bucket radoslist --bucket STRCSTOR-17233
f3a3ea75-3c40-4065-8073-c9d05477ebd0.178427027.5_files/24mb.img
[...shadow objects...]

Reproduction Steps

  1. Upload object with TTL:

    swift upload foobar foo -H "X-Delete-After: 100"
  2. Reshard bucket before object expires:

    radosgw-admin bucket reshard --bucket foobar --num-shards 47
  3. Wait for object to expire

  4. Object becomes stale:

    • Listing shows object
    • Download/stat returns 404
    • Object visible with radosgw-admin objects expire-stale list
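
The steps above can be combined into one script for convenience. This is a hedged sketch: it assumes working swift credentials and radosgw-admin access, the bucket name, TTL, and shard count are arbitrary examples, and the SWIFT/RGW_ADMIN variables are a wrapper convention introduced here (not something either tool reads) so the flow can be dry-run with stubs.

```shell
# Sketch of the full reproduction against a test cluster.
reproduce_stale() {
  local swift=${SWIFT:-swift} admin=${RGW_ADMIN:-radosgw-admin}
  local bucket=foobar ttl=100

  $swift upload "$bucket" foo -H "X-Delete-After: $ttl"      # 1. upload with TTL
  $admin bucket reshard --bucket "$bucket" --num-shards 47   # 2. reshard before expiry
  sleep $(( ttl + 10 ))                                      # 3. wait past the TTL
  $swift list "$bucket"                                      # 4a. object is still listed
  $swift stat "$bucket" foo || echo "stat failed: object is stale"  # 4b. HEAD returns 404
}
```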

Workarounds

1. Clean Existing Stale Objects

Use the expire-stale command to remove orphaned objects:

# List stale objects
radosgw-admin objects expire-stale list --bucket <bucket-name>

# Remove stale objects (safe for production)
radosgw-admin objects expire-stale rm --bucket <bucket-name>

Notes:

  • Safe to run in production
  • Can be interrupted with Ctrl+C and resumed
  • Monitor progress with ceph df to see object count decrease
  • Tested with ~10,000 objects successfully
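
With many buckets, the cleanup can be looped. A minimal sketch, assuming radosgw-admin bucket list emits a JSON array of bucket names; the crude tr-based parsing is for illustration (jq -r '.[]' is more robust), and RGW_ADMIN is an overridable wrapper introduced here for dry runs, not a variable either tool reads.

```shell
# Remove stale expired objects from every bucket in the cluster.
# Interruptible and resumable, like the underlying command.
clean_all_stale() {
  local admin=${RGW_ADMIN:-radosgw-admin}
  # bucket list prints a JSON array; strip brackets/quotes/commas
  $admin bucket list | tr -d '[]", ' | grep -v '^$' | while read -r bucket; do
    echo "cleaning $bucket"
    $admin objects expire-stale rm --bucket "$bucket"
  done
}
```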

2. Prevent Future Occurrences (Quincy Clusters)

Until the cluster can be upgraded to Reef, apply the following mitigations on Quincy:

A. Disable Dynamic Resharding

# Disable dynamic resharding cluster-wide
ceph config set global rgw_dynamic_resharding false

B. Pre-shard Buckets with Higher Default

# Set default shard count to a prime number (recommended: 293)
ceph config set global rgw_override_bucket_index_max_shards 293

# New buckets created after setting the override inherit this shard count;
# existing buckets can be resharded to the target manually
radosgw-admin bucket reshard --bucket=mybucket --num-shards=293

Why prime numbers? Index entries are assigned to shards by hashing the object name modulo the shard count, and a prime shard count spreads patterned object names more evenly across shards.
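
The effect can be illustrated with a toy model: take 100 keys whose hash values happen to step by a fixed stride, and count how many distinct shards a composite versus a prime shard count actually uses. This is purely illustrative arithmetic, not Ceph's actual hash.

```shell
# Count distinct shards hit by 100 keys whose hashes step by `stride`.
# With 12 shards and stride 4 (sharing a factor), only 3 shards are used;
# with a prime 13, all 13 shards are used.
shards_hit() {
  local num_shards=$1 stride=$2
  for i in $(seq 0 99); do
    echo $(( (i * stride) % num_shards ))
  done | sort -un | wc -l
}

echo "12 shards, stride 4: $(shards_hit 12 4) shards used"
echo "13 shards, stride 4: $(shards_hit 13 4) shards used"
```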

C. Set Bucket Object Limits

  • Recommend max ~20 million objects per bucket
  • Split larger datasets across multiple buckets

D. Apply GC Configuration Tuning

# Improve garbage collection performance
ceph config set global rgw_gc_max_concurrent_io 20
ceph config set global rgw_gc_max_trim_chunk 64

Trade-offs of Higher Default Shard Counts

Pros:

  • Prevents stale object issue
  • Better prepared for large buckets

Cons:

  • Slower listing performance on small buckets
  • More objects in index pool
  • Slightly higher metadata overhead

Recommendation: Test listing performance with 293 shards on a new bucket before applying cluster-wide.

Diagnostic Commands

# Check for stale objects
radosgw-admin objects expire-stale list --bucket <bucket-name>

# Check bucket stats and shard count
radosgw-admin bucket stats --bucket <bucket-name>

# List objects at RADOS layer
radosgw-admin bucket radoslist --bucket <bucket-name>

# Check bucket index consistency
radosgw-admin bucket check --bucket <bucket-name> --check-objects --fix

# Check lifecycle configuration
radosgw-admin lc list

# View object metadata including delete_at timestamp
radosgw-admin object stat --bucket=<bucket> --object=<object-name>
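
For fleet-wide monitoring, the stale listing can be turned into a per-bucket report. A hedged sketch: it assumes radosgw-admin bucket list emits a JSON array of bucket names, RGW_ADMIN is a hypothetical overridable wrapper for dry runs, and the tally simply counts lines containing a "name" field in the expire-stale list output (verify the exact output shape on your version first).

```shell
# Report a rough count of stale expired objects per bucket.
stale_report() {
  local admin=${RGW_ADMIN:-radosgw-admin}
  $admin bucket list | tr -d '[]", ' | grep -v '^$' | while read -r bucket; do
    n=$($admin objects expire-stale list --bucket "$bucket" | grep -c '"name"')
    echo "$bucket: $n stale"
  done
}
```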

Long-term Solution

Upgrade to Reef or later where this bug is fixed. After upgrade:

  • Existing stale objects will still need to be cleaned with expire-stale rm
  • New objects will not experience this issue
  • Dynamic resharding can be safely re-enabled

Summary of Recommendations

Immediate Actions (Production Quincy Clusters):

  1. Run radosgw-admin objects expire-stale rm on affected buckets
  2. Disable dynamic resharding
  3. Pre-shard new buckets with ~293 shards
  4. Set bucket object limits (~20M objects)
  5. Apply GC tuning configuration

Long-term:

  • Plan upgrade to Reef or Squid release
  • After upgrade, clean remaining stale objects
  • Re-evaluate dynamic resharding settings