Large OMAP Objects on the .log Pool
Issue Overview
Ceph clusters running the RADOS Gateway (RGW) may report a LARGE_OMAP_OBJECTS
health warning against the .rgw.log pool (the exact name varies by deployment,
e.g. default.rgw.log or ceph-objectstore.rgw.log). The warning indicates that
usage log objects have accumulated a large number of OMAP keys, which can
degrade performance.
Typical Warning:
HEALTH_WARN 1 large omap objects
[WRN] LARGE_OMAP_OBJECTS: 1 large omap objects
1 large objects found in pool '<pool-name>.rgw.log'
Search the cluster log for 'Large omap object found' for more details.
Example from Cluster Logs:
osd.<osd-id> [WRN] Large omap object found. Object: <pool-id>:<object-hash>:usage::usage.<shard-id>:head PG: <pool-id>.<pg-hash> (<pool-id>.<pg-short>) Key count: 203714 Size (bytes): 56605688
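The key count reported by the scrub can be cross-checked directly against the object. A sketch using the rados CLI; substitute the placeholders with your pool and shard, and note that usage objects live in the 'usage' namespace (visible in the object locator in the warning):

```shell
# Count the OMAP keys on the reported usage shard object
# (the 'usage' namespace comes from the object locator in the warning)
rados -p <pool-name>.rgw.log -N usage listomapkeys usage.<shard-id> | wc -l
```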
Understanding the Problem
What is the .log Pool?
The .rgw.log pool stores RGW usage logs, which track bandwidth and operations
for billing and monitoring purposes. Each usage log object accumulates OMAP
entries over time as users perform operations.
Why Do OMAP Objects Grow Large?
Usage log objects accumulate OMAP keys as they record:
- Bandwidth usage per user/bucket
- Operation counts (GET, PUT, DELETE, etc.)
- Request statistics
The object in the warning (usage::usage.<shard-id>:head) indicates a usage
log object that has grown to over 200,000 OMAP keys and ~54 MB in size.
Impact
- Performance degradation: Large OMAP objects can slow down OSD operations
- Health warnings: Cluster health status shows as WARN
- Increased memory usage: OSDs may consume more memory handling large OMAP objects
Root Cause
The most common cause is usage log trimming not being configured or functioning properly. By default, RGW does not automatically trim (delete) old usage logs, causing them to accumulate indefinitely.
Resolution
Step 1: Verify Current Usage Log Status
Check the current usage logs to see how much data has accumulated:
# List usage logs for the current month
radosgw-admin usage show
# Check for a specific time period
radosgw-admin usage show --start-date=YYYY-MM-DD --end-date=YYYY-MM-DD
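When you only need totals, the summary flags avoid dumping every individual entry (both are standard radosgw-admin usage show options):

```shell
# Aggregate totals only, without individual log entries
radosgw-admin usage show --show-log-entries=false
# Restrict the report to a single user for a given period
radosgw-admin usage show --uid=<user-id> --start-date=YYYY-MM-DD --end-date=YYYY-MM-DD
```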
Step 2: Manually Trim Existing Usage Logs
To immediately clean up accumulated usage logs:
# Trim all usage logs older than a specific date
# IMPORTANT: Always specify both --start-date and --end-date to avoid bug #72593
radosgw-admin usage trim --start-date=YYYY-MM-DD --end-date=YYYY-MM-DD
# Trim the entire usage history up to today (use with caution - all historical
# data is lost; pick a start date earlier than your oldest log entries)
radosgw-admin usage trim --start-date=YYYY-MM-DD --end-date=$(date -u +%Y-%m-%d)
Critical Note: In certain Ceph versions, the trim command fails unless the
--start-date parameter is specified explicitly. See tracker issue 72593.
Important: Make sure to export any usage data you need for billing or compliance purposes before trimming.
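One way to combine the export and the trim, assuming GNU date for the retention arithmetic (the archive file name, the 90-day window, and the 2000-01-01 sentinel start date are all examples, not requirements):

```shell
# Archive the full usage report before deleting anything
radosgw-admin usage show > "usage-archive-$(date -u +%Y-%m-%d).json"

# Trim everything older than a 90-day retention window; both dates are
# passed explicitly because of the --start-date issue noted above
CUTOFF=$(date -u -d '90 days ago' +%Y-%m-%d)
radosgw-admin usage trim --start-date=2000-01-01 --end-date="$CUTOFF"
```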
Step 3: Manually Deep Scrub the Affected PG
After trimming usage logs, the LARGE_OMAP_OBJECTS warning will not
automatically clear. You must manually trigger a deep scrub on the affected
placement group:
# Deep scrub the specific PG (use the PG ID from the warning message)
ceph pg deep-scrub <pool-id>.<pg-hash>
Replace <pool-id>.<pg-hash> with the actual PG ID from your warning message.
The deep scrub will recalculate OMAP statistics and clear the warning if the
object is no longer considered large.
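If other usage shards in the pool are also oversized, every PG of the pool can be deep scrubbed in one pass. A sketch assuming jq is installed; the pg_stats JSON layout may vary slightly between releases:

```shell
# Deep scrub every PG in the log pool
for pg in $(ceph pg ls-by-pool <pool-name>.rgw.log -f json | jq -r '.pg_stats[].pgid'); do
    ceph pg deep-scrub "$pg"
done
```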
Step 4: Verify the Warning is Resolved
After trimming and deep scrubbing, monitor the cluster health:
# Check cluster health status
ceph health detail
# Verify per-PG OMAP usage has decreased (OMAP_BYTES*/OMAP_KEYS* columns)
ceph pg ls-by-pool <pool-name>.rgw.log
# Check the .log pool size
ceph df | grep -i "\.log"
The LARGE_OMAP_OBJECTS warning should clear within a few minutes after the
deep scrub completes.
Prevention
Configure Usage Log Retention Policy
RGW has no configuration option that expires usage log entries after a set age;
retention must be enforced by trimming (see Step 2 above). Choose a retention
period that meets your billing and compliance requirements (90 days is a common
choice) and run radosgw-admin usage trim on that schedule, e.g. from a daily
cron job.
Note: the rgw_usage_log_flush_threshold and rgw_usage_log_tick_interval options
control how usage entries are buffered and flushed to RADOS, not how long they
are retained.
Disable Usage Logging (If Not Needed)
If you don't need usage statistics for billing or monitoring:
# Disable usage logging entirely
ceph config set client.rgw rgw_enable_usage_log false
# Restart RGW daemons
ceph orch restart rgw.<service-name>
Note: Disable usage logging only if you're certain you don't need the data for billing, monitoring, or compliance.
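After the restart, it is worth confirming the option actually took effect:

```shell
# Should print: false
ceph config get client.rgw rgw_enable_usage_log
```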
Set Up Monitoring
The warning itself is driven by two OSD thresholds that are checked during deep
scrub; tune them so large OMAP objects surface before they become problematic:
# Thresholds that trigger LARGE_OMAP_OBJECTS (the values shown are the
# defaults: 200,000 keys and 1 GiB of total key data; lower them for an
# earlier warning rather than raising them to silence it)
ceph config set osd osd_deep_scrub_large_omap_object_key_threshold 200000
ceph config set osd osd_deep_scrub_large_omap_object_value_sum_threshold 1073741824
Spread Usage Log Entries Across More Shards
As noted under Root Cause, RGW does not trim usage logs on its own; trimming
has to be driven externally (see Configure Usage Log Retention Policy). You
can, however, keep individual usage log objects smaller by spreading entries
across more shard objects:
# Increase usage log sharding (example values; the defaults are 32 and 1)
ceph config set client.rgw rgw_usage_max_shards 64
ceph config set client.rgw rgw_usage_max_user_shards 8
# Restart RGW daemons for settings to take effect
ceph orch restart rgw.<service-name>
Configuration Parameters:
- rgw_usage_max_shards: Total number of shard objects for usage logs (default: 32)
- rgw_usage_max_user_shards: Number of shard objects per user (default: 1)
Note: resharding only affects where new entries land; existing oversized
objects still need to be trimmed.
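If your release does not trim usage logs on its own, an external scheduler can enforce the retention window. A minimal cron sketch, assuming GNU date, a 90-day retention period, and example paths (2000-01-01 is just an arbitrary early start date):

```shell
# /etc/cron.d/rgw-usage-trim (example path) - trim usage logs daily at 02:15
# Note: % must be escaped as \% inside crontab entries
15 2 * * * root radosgw-admin usage trim --start-date=2000-01-01 --end-date="$(date -u -d '90 days ago' +\%Y-\%m-\%d)" >> /var/log/rgw-usage-trim.log 2>&1
```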
Additional Notes
- Usage log trimming removes only the entries inside the specified date range; data outside the range is untouched
- The trimming process may take several minutes for large datasets
- Consider scheduling manual trims during maintenance windows for very large accumulations
- Regular trimming prevents the accumulation of large OMAP objects
- Usage logs are separate from bucket index OMAP objects and require different maintenance procedures
Related Issues
Large OMAP objects can also occur on:
- Bucket index pools (rgw.buckets.index) - caused by large numbers of objects in a single bucket
- Metadata pools (rgw.meta) - less common, but possible with many users/buckets
Each pool type requires different resolution strategies.
References
- Ceph RGW Admin Guide: https://docs.ceph.com/en/latest/radosgw/admin/
- Usage Logging Documentation: https://docs.ceph.com/en/latest/radosgw/admin/#usage
- OMAP Object Management: https://docs.ceph.com/en/latest/rados/operations/health-checks/#large-omap-objects