MDS - Prevent MDS out-of-memory shortly after restart
Problem
A customer reports that the MDS uses much more memory than configured and occasionally runs out of memory (OOM), causing a service disruption.
Solution
Increase the mds_cache_trim_threshold option from its default of 64k to 512k:
# ceph config set mds mds_cache_trim_threshold 524288
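The value 524288 is simply 512 × 1024. The following is a minimal sketch, not part of the original procedure, that derives that number and, when the ceph CLI is available, prints the trim threshold and the cache memory limit stored in the cluster configuration. It assumes a release with the centralized configuration database (the same one used by `ceph config set`) and a client keyring that is allowed to read it.

```shell
#!/bin/sh
# Sketch: derive the 512k value and, if the ceph CLI is present,
# show the values currently stored in the cluster configuration.

# 512k entries expressed as a plain integer.
threshold=$((512 * 1024))
echo "target mds_cache_trim_threshold: $threshold"

if command -v ceph >/dev/null 2>&1; then
    # Trim threshold configured for all MDS daemons.
    ceph config get mds mds_cache_trim_threshold
    # Cache memory limit that trimming is meant to enforce.
    ceph config get mds mds_cache_memory_limit
else
    echo "ceph CLI not found; skipping cluster queries"
fi
```

To compare actual cache usage against the limit, running `ceph daemon mds.<id> cache status` on the MDS host should report it, where `<id>` is a placeholder for your daemon's name.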
Discussion
The MDS maintains its LRU cache size by periodically trimming entries. It trims up to mds_cache_trim_threshold entries per tick. With the default setting of 64k entries, a single highly active client can easily force the MDS to grow its cache faster than it can be trimmed. Increasing this option to 512k lets the MDS trim the LRU more aggressively, keeping the cache size under the configured limit.
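The arithmetic behind this can be illustrated with a toy model. It takes the per-tick description above at face value and is not the MDS implementation. The function name `net_growth` and the figure of 200000 new entries per tick are hypothetical and chosen only for illustration.

```shell
#!/bin/sh
# Toy model: entries the cache gains in one tick when a client adds
# "added" entries and the MDS can trim at most "limit" entries.
net_growth() {
    added=$1   # hypothetical number of entries a busy client adds per tick
    limit=$2   # value of mds_cache_trim_threshold
    if [ "$added" -gt "$limit" ]; then
        echo $((added - limit))
    else
        echo 0
    fi
}

# Default threshold (64k): the cache grows every tick.
net_growth 200000 65536    # prints 134464

# Raised threshold (512k): trimming keeps pace with the client.
net_growth 200000 524288   # prints 0
```

In this simplified picture, any workload that adds more entries per tick than the threshold allows to be trimmed makes the cache grow without bound, which matches the memory growth described in the Problem section.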