Ceph RGW: Objects with Expired TTL Remain Visible (Stale Objects Issue)
Issue Summary
Objects with expired TTL (Time To Live) in a Ceph RGW bucket remain visible and listable but cannot be downloaded. The objects should have been deleted but are stuck in a stale state.
Root Cause
This is a known bug (https://tracker.ceph.com/issues/39495) that occurs when buckets are resharded before objects with TTL expire. The resharding operation causes the object expiry metadata to be lost or orphaned, preventing proper cleanup by the garbage collection process.
Affected Versions
- Reproduced on: Ceph 17.2.6 and 17.2.8 (Quincy)
- Fixed in: Reef release and later
Key Symptoms
- Objects past their delete_at timestamp remain visible in bucket listings
- Downloading expired objects returns 404 errors
- Objects can be detected with radosgw-admin objects expire-stale list
- Discrepancy between Swift stat and radosgw-admin object counts
Example Behavior
# Object is visible in listing
$ swift list STRCSTOR-17233
files/24mb.img
# But attempting to download returns 404
$ swift stat STRCSTOR-17233 files/24mb.img
Object HEAD failed: ... 404 Not Found
# Objects still present at RADOS layer after expiry
$ radosgw-admin bucket radoslist --bucket STRCSTOR-17233
f3a3ea75-3c40-4065-8073-c9d05477ebd0.178427027.5_files/24mb.img
[...shadow objects...]
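The listing-vs-HEAD mismatch above can be checked programmatically. Here is a minimal pure-Python sketch of that diff; in a real script the two inputs would come from `swift list` and per-object `swift stat` calls, and the function name is illustrative, not part of any Ceph or Swift API.

```python
# Sketch: flag "stale" objects that appear in a bucket listing but fail HEAD.
# Operates on sample data mirroring the swift output shown above.

def find_stale(listed_objects, head_status):
    """Return objects that are listable but return 404 on HEAD."""
    return [name for name in listed_objects
            if head_status.get(name) == 404]

# The stale object is listed, but a HEAD request against it returns 404.
listing = ["files/24mb.img", "files/ok.img"]
statuses = {"files/24mb.img": 404, "files/ok.img": 200}

print(find_stale(listing, statuses))  # ['files/24mb.img']
```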
Reproduction Steps
1. Upload object with TTL:
   swift upload foobar foo -H "X-Delete-After: 100"
2. Reshard bucket before object expires:
   radosgw-admin bucket reshard --bucket foobar --num-shards 47
3. Wait for object to expire
4. Object becomes stale:
   - Listing shows object
   - Download/stat returns 404
   - Object visible with radosgw-admin objects expire-stale list
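The timing condition in the steps above can be sketched as a small check: an object uploaded with X-Delete-After gets a delete_at deadline, and it only goes stale if the reshard happens before that deadline passes. This is a pure illustration of the bug window, not a Ceph API.

```python
from datetime import datetime, timedelta, timezone

def hits_bug_window(uploaded_at, ttl_seconds, resharded_at):
    """True if the bucket was resharded before the object's delete_at."""
    delete_at = uploaded_at + timedelta(seconds=ttl_seconds)
    return resharded_at < delete_at

t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(hits_bug_window(t0, 100, t0 + timedelta(seconds=30)))   # True: reshard before expiry -> stale object
print(hits_bug_window(t0, 100, t0 + timedelta(seconds=200)))  # False: object already expired cleanly
```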
Workarounds
1. Clean Existing Stale Objects
Use the expire-stale command to remove orphaned objects:
# List stale objects
radosgw-admin objects expire-stale list --bucket <bucket-name>
# Remove stale objects (safe for production)
radosgw-admin objects expire-stale rm --bucket <bucket-name>
Notes:
- Safe to run in production
- Can be interrupted with Ctrl+C and resumed
- Monitor progress with ceph df to see the object count decrease
- Tested successfully with ~10,000 objects
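Monitoring the cleanup can be scripted by reading `ceph df --format json` and tracking per-pool object counts. A minimal sketch, assuming the JSON shape of recent Ceph releases (pools[].stats.objects); verify the field names against your cluster's actual output.

```python
import json

def pool_object_counts(ceph_df_json):
    """Map pool name -> object count from `ceph df --format json` output."""
    data = json.loads(ceph_df_json)
    return {pool["name"]: pool["stats"]["objects"] for pool in data["pools"]}

# Sample output trimmed to the fields this sketch reads.
sample = '''
{"pools": [
  {"name": "default.rgw.buckets.data", "id": 7, "stats": {"objects": 10342}},
  {"name": "default.rgw.buckets.index", "id": 8, "stats": {"objects": 293}}
]}
'''
print(pool_object_counts(sample)["default.rgw.buckets.data"])  # 10342
```

Polling this in a loop while expire-stale rm runs gives the same signal as watching ceph df by hand.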
2. Prevent Future Occurrences (Quincy Clusters)
For clusters running Quincy until upgrade to Reef:
A. Disable Dynamic Resharding
# Disable dynamic resharding cluster-wide
ceph config set global rgw_dynamic_resharding false
B. Pre-shard Buckets with Higher Default
# Set default shard count to prime number (recommended: 293)
ceph config set global rgw_override_bucket_index_max_shards 293
# With the override set, new buckets created via the S3/Swift API
# default to 293 shards; existing buckets can be resharded manually
radosgw-admin bucket reshard --bucket=mybucket --num-shards=293
Why prime numbers? RGW places each index entry on a shard by hashing the object name modulo the shard count; a prime modulus spreads patterned key sets more evenly, while a composite modulus can concentrate them on a fraction of the shards.
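The effect is easy to demonstrate with a toy model: synthetic hash values that share a common factor (as patterned key sets can) land on far fewer shards under a composite modulus than under a prime one. The hash values here are made up for illustration, not real RGW hashes.

```python
def shard_spread(hashes, num_shards):
    """Number of distinct shards hit by the given hash values."""
    return len({h % num_shards for h in hashes})

# Hash values that all share the factor 4 (e.g. keys generated in a pattern).
hashes = [4 * i for i in range(1000)]

print(shard_spread(hashes, 64))   # composite modulus: only 16 of 64 shards used
print(shard_spread(hashes, 293))  # prime modulus: all 293 shards used
```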
C. Set Bucket Object Limits
- Recommend max ~20 million objects per bucket
- Split larger datasets across multiple buckets
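Sizing the split is simple ceiling division against the cap. A small sketch; the ~20M figure is the operational guideline above, not a hard RGW limit.

```python
import math

def buckets_needed(total_objects, cap=20_000_000):
    """Minimum number of buckets to keep each under the object cap."""
    return math.ceil(total_objects / cap)

print(buckets_needed(75_000_000))  # 4
```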
D. Apply GC Configuration Tuning
# Improve garbage collection performance
ceph config set global rgw_gc_max_concurrent_io 20
ceph config set global rgw_gc_max_trim_chunk 64
Trade-offs of Higher Default Shard Counts
Pros:
- Prevents stale object issue
- Better prepared for large buckets
Cons:
- Slower listing performance on small buckets
- More objects in index pool
- Slightly higher metadata overhead
Recommendation: Test listing performance with 293 shards on a new bucket before applying cluster-wide.
Diagnostic Commands
# Check for stale objects
radosgw-admin objects expire-stale list --bucket <bucket-name>
# Check bucket stats and shard count
radosgw-admin bucket stats --bucket <bucket-name>
# List objects at RADOS layer
radosgw-admin bucket radoslist --bucket <bucket-name>
# Check bucket index consistency
radosgw-admin bucket check --bucket <bucket-name> --check-objects --fix
# Check lifecycle configuration
radosgw-admin lc list
# View object metadata including delete_at timestamp
radosgw-admin object stat --bucket=<bucket> --object=<object-name>
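The bucket stats output can be summarized in a script to spot buckets worth pre-sharding. A minimal sketch over `radosgw-admin bucket stats --bucket <name> --format json`; the field names (num_shards, usage["rgw.main"]["num_objects"]) match recent Ceph releases but should be verified against your version's output.

```python
import json

def bucket_summary(stats_json):
    """Extract shard and object counts from radosgw-admin bucket stats JSON."""
    stats = json.loads(stats_json)
    objects = stats.get("usage", {}).get("rgw.main", {}).get("num_objects", 0)
    return {"bucket": stats["bucket"],
            "num_shards": stats["num_shards"],
            "num_objects": objects}

# Sample output trimmed to the fields this sketch reads.
sample = '''
{"bucket": "STRCSTOR-17233", "num_shards": 47,
 "usage": {"rgw.main": {"num_objects": 10000}}}
'''
print(bucket_summary(sample))
```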
Long-term Solution
Upgrade to Reef or later where this bug is fixed. After upgrade:
- Existing stale objects will still need to be cleaned with expire-stale rm
- New objects will not experience this issue
- Dynamic resharding can be safely re-enabled
Related Tracker Issues
- Primary bug: https://tracker.ceph.com/issues/39495
- Related: https://tracker.ceph.com/issues/63935
Summary of Recommendations
Immediate Actions (Production Quincy Clusters):
- Run radosgw-admin objects expire-stale rm on affected buckets
- Disable dynamic resharding
- Pre-shard new buckets with ~293 shards
- Set bucket object limits (~20M objects)
- Apply GC tuning configuration
Long-term:
- Plan upgrade to Reef or Squid release
- After upgrade, clean remaining stale objects
- Re-evaluate dynamic resharding settings