When and How to Enable Client Auto Eviction
Problem
In some customer environments, an unknown workload regularly causes MDS_CLIENT_RECALL, MDS_SLOW_REQUEST, and MDS_CLIENT_OLDEST_TID warnings. Operators are using the client eviction procedure above as a workaround and would like to automate it.
Solution
Automatic client eviction should be used sparingly, and only after all of the following conditions have been satisfied:
- The CephFS cluster is seeing MDS_CLIENT_RECALL warnings lasting many hours, with MDS_SLOW_REQUEST ops also lasting many hours.
- Manual client eviction is confirmed to resolve the MDS_SLOW_REQUEST warnings fully.
- Manual client eviction is confirmed with the client/user to not have an adverse impact on their workload or data consistency.
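As a rough aid for the first condition, the sketch below reports which of the health codes named in this article appear in health output piped to it. The function name `check_mds_warnings` is hypothetical, and the check only detects presence: it cannot tell how long a warning has lasted, so the "many hours" criterion and the two manual-eviction conditions still need to be confirmed by an operator.

```shell
#!/bin/sh
# Hypothetical helper: reads health output on stdin and reports which of the
# MDS health codes discussed in this article are present.
check_mds_warnings() {
    input=$(cat)
    found=0
    for code in MDS_CLIENT_RECALL MDS_SLOW_REQUEST MDS_CLIENT_OLDEST_TID; do
        # -w matches the code as a whole word, so longer codes that merely
        # start with the same text are not counted.
        if printf '%s\n' "$input" | grep -qw "$code"; then
            echo "present: $code"
            found=1
        else
            echo "absent:  $code"
        fi
    done
    # Exit status is 0 only if at least one of the codes was found.
    [ "$found" -eq 1 ]
}
```

One way to use it is `ceph health detail | check_mds_warnings`, run repeatedly over the period in question.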
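If all of the above are true, you may configure automatic client eviction. The following example evicts a client after it has failed to respond to a cap revoke request for 15 minutes (900 seconds), without adding the evicted client to the blocklist: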
# ceph config set mds mds_session_blocklist_on_evict false
# ceph config set mds mds_session_blocklist_on_timeout false
# ceph config set mds mds_cap_revoke_eviction_timeout 900
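After applying the settings, it can be useful to confirm what is stored and to have a way to undo the change. The sketch below wraps `ceph config get` and `ceph config rm` in two helper functions; the function names and the `EVICTION_OPTIONS` variable are hypothetical, and the commands are assumed to run as a user with admin access to the cluster.

```shell
#!/bin/sh
# The three options set in this article.
EVICTION_OPTIONS="mds_session_blocklist_on_evict mds_session_blocklist_on_timeout mds_cap_revoke_eviction_timeout"

# Print the value currently configured for each option.
show_eviction_settings() {
    for opt in $EVICTION_OPTIONS; do
        printf '%s = ' "$opt"
        ceph config get mds "$opt"
    done
}

# Remove the overrides so that each option falls back to its default.
revert_eviction_settings() {
    for opt in $EVICTION_OPTIONS; do
        ceph config rm mds "$opt"
    done
}
```

Run `show_eviction_settings` to review the current values, and `revert_eviction_settings` to turn automatic eviction back off if it proves disruptive.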