When and How to Enable Client Auto Eviction

Problem

In some customer environments, an unidentified workload regularly causes MDS_CLIENT_RECALL, MDS_SLOW_REQUEST, and MDS_CLIENT_OLDEST_TID warnings. Operators have been evicting the affected clients manually as a workaround and would like to automate the procedure.

Solution

Use automatic client eviction sparingly, and only after all of the following conditions have been satisfied:

  1. The CephFS cluster is seeing MDS_CLIENT_RECALL warnings lasting many hours, with MDS_SLOW_REQUEST ops also lasting many hours.
  2. Manual client eviction is confirmed to resolve the MDS_SLOW_REQUEST warnings fully.
  3. Manual client eviction is confirmed with the client/user to not have an adverse impact on their workload or data consistency.
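
Conditions 2 and 3 are usually checked by evicting the suspect client by hand and watching the result. The commands below are a minimal sketch, assuming the affected MDS is rank 0 and that 4305 is a placeholder for the session ID reported by client ls; substitute the values from your own cluster:

# ceph health detail
# ceph tell mds.0 client ls
# ceph tell mds.0 client evict id=4305
# ceph health detail

If the MDS_SLOW_REQUEST warning is gone from the second health check and the client owner reports no adverse effect, the conditions can be considered met for that workload.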

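If all of the above are true, you may configure automatic client eviction so that a client is evicted once it has failed to respond to a capability (caps) revoke request for a set period, e.g. 15 minutes (900 seconds). The first two settings below turn off blocklisting, so an evicted client is not blocklisted and can attempt to reconnect; the third sets the eviction timeout in seconds. Run the following commands to configure automatic client eviction: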

# ceph config set mds mds_session_blocklist_on_evict false
# ceph config set mds mds_session_blocklist_on_timeout false
# ceph config set mds mds_cap_revoke_eviction_timeout 900
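
To confirm that the settings were stored, they can be read back with ceph config get. This is a sketch for checking the values set above, not a required step:

# ceph config get mds mds_session_blocklist_on_evict
# ceph config get mds mds_session_blocklist_on_timeout
# ceph config get mds mds_cap_revoke_eviction_timeout

If automatic eviction later needs to be turned off, removing the overrides with ceph config rm should return each option to its default:

# ceph config rm mds mds_cap_revoke_eviction_timeout
# ceph config rm mds mds_session_blocklist_on_timeout
# ceph config rm mds mds_session_blocklist_on_evict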