MDS - Prevent MDS out-of-memory shortly after restart

Problem

A customer reports that the MDS uses much more memory than configured and occasionally runs out of memory (OOM), causing a service disruption.

Solution

Increase the mds_cache_trim_threshold option from its default of 64k (65536) to 512k (524288):

# ceph config set mds mds_cache_trim_threshold 524288
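
To confirm that the change was stored, the value can be read back from the cluster configuration. The following is a minimal sketch, assuming the ceph CLI is available with admin credentials and that the value is printed as a plain integer (the formatting may differ between releases). The helper name check_trim_threshold is hypothetical and not part of Ceph.

```shell
# Hypothetical helper: read back mds_cache_trim_threshold and compare it
# with the expected value (defaults to 524288, i.e. 512k).
check_trim_threshold() {
    expected="${1:-524288}"
    # 'ceph config get <who> <option>' prints the value stored for that section.
    actual="$(ceph config get mds mds_cache_trim_threshold)"
    if [ "$actual" = "$expected" ]; then
        echo "OK: mds_cache_trim_threshold=$actual"
    else
        echo "MISMATCH: expected $expected, got $actual"
        return 1
    fi
}
```

Calling check_trim_threshold without arguments checks for 524288; pass a different number as the first argument to check for another value.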

Discussion

The MDS maintains its LRU cache size by periodically trimming entries. It trims up to mds_cache_trim_threshold entries per tick. With the default setting of 64k entries, a single highly active client can easily hammer the MDS and force its cache to grow faster than it can be trimmed. With the option increased to 512k, the MDS trims the LRU more aggressively, which keeps the cache size under the configured limit.
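
The arithmetic behind this can be illustrated with a toy model. This is only a sketch of the reasoning above, not the actual MDS trimming code: it assumes that on each tick a client adds a fixed number of entries and the MDS then removes at most the threshold number of entries above the limit. The function name simulate_cache and the workload numbers (200000 new entries per tick, a limit of 1000000 entries) are hypothetical.

```shell
# Toy model: each tick the client adds `rate` entries, then the MDS trims
# at most `threshold` entries of whatever exceeds `limit`.
simulate_cache() {
    threshold="$1"; rate="$2"; limit="$3"; ticks="$4"
    size="$limit"          # start with a cache that is exactly at its limit
    i=0
    while [ "$i" -lt "$ticks" ]; do
        size=$((size + rate))
        excess=$((size - limit))
        # Trim the excess, capped at the per-tick threshold.
        if [ "$excess" -gt "$threshold" ]; then
            trim="$threshold"
        else
            trim="$excess"
        fi
        size=$((size - trim))
        i=$((i + 1))
    done
    echo "$size"
}

# Hypothetical workload: 200000 new entries per tick, limit of 1000000 entries.
simulate_cache 65536 200000 1000000 10    # threshold 64k
simulate_cache 524288 200000 1000000 10   # threshold 512k
```

Under these assumed numbers, the first call prints 2344640 (the cache is more than twice its limit after 10 ticks), while the second prints 1000000 (the cache stays at its limit).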