Increased IOPS on RGW Meta Pool after Upgrading to Pacific

Problem

A customer noticed that after upgrading from Nautilus to Pacific, read IOPS on the .rgw pool (also known as the RGW metadata pool) increased by a large factor. This caused a performance problem, with the underlying disks at nearly 100% I/O utilization.

Raising the log level with debug_rgw=10 revealed that the RGW LRU cache was thrashing: entries were being inserted and almost immediately evicted again:

2023-10-06T14:21:38.749-0700 7f88225f0700 10 req 13873390272645627794 115.349494934s :get_bucket_info cache put: name=.rgw++scorpio-9B31 info.flags=0x6
2023-10-06T14:21:38.749-0700 7f88225f0700 10 removing entry: name=.rgw++convention823 from cache LRU
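A quick way to gauge how often this is happening is to count the eviction messages in the RGW log. The sketch below runs against a sample file built from the two lines above; on a real deployment, point grep at your radosgw log (the path varies by setup) and compare counts over time:

```shell
# Build a sample log from the lines above (for demonstration only).
cat > /tmp/rgw-sample.log <<'EOF'
2023-10-06T14:21:38.749-0700 7f88225f0700 10 req 13873390272645627794 115.349494934s :get_bucket_info cache put: name=.rgw++scorpio-9B31 info.flags=0x6
2023-10-06T14:21:38.749-0700 7f88225f0700 10 removing entry: name=.rgw++convention823 from cache LRU
EOF

# Count LRU evictions; a rapidly growing count indicates cache thrashing.
grep -c 'removing entry' /tmp/rgw-sample.log   # prints 1
```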

Solution

Pacific makes heavier use of the internal RGW bucket instance cache, and in a customer environment with many tens of thousands of buckets, the default cache size of 10,000 entries is too small. The RGW cache size can be increased as follows:

# ceph config set global rgw_cache_lru_size 100000
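To confirm the new value is stored in the cluster configuration database, it can be read back with ceph config dump (depending on the Ceph release, running radosgw daemons may need a restart before the larger cache takes effect; verify against your version's documentation):

# ceph config dump | grep rgw_cache_lru_size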