When and How to Enable Client Auto Eviction
Problem
In some customer environments, an unknown workload regularly causes
MDS_CLIENT_RECALL, MDS_SLOW_REQUEST, and MDS_CLIENT_OLDEST_TID warnings.
Operators are using the client eviction procedure as a workaround and
would like to automate it.
Solution
Automatic client eviction should be used only sparingly, after the following conditions have been satisfied:
- The CephFS cluster is seeing MDS_CLIENT_RECALL warnings lasting many hours, with MDS_SLOW_REQUEST ops also lasting many hours.
- Manual client eviction is confirmed to resolve the MDS_SLOW_REQUEST warnings fully.
- Manual client eviction is confirmed with the client/user to not have an adverse impact on their workload or data consistency.
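The first two conditions can be checked from the command line before any automation is configured. The following is a minimal sketch, not a definitive procedure: the MDS rank (mds.0) and the client session ID (4305) are placeholders and must be replaced with the values reported by your own cluster.

```shell
# Show the active health warnings, including the MDS daemons and client IDs involved.
ceph health detail

# List the client sessions held by an MDS; "mds.0" (rank 0) is a placeholder.
ceph tell mds.0 client ls

# Manually evict a single client; 4305 is a placeholder session ID taken from the listing.
ceph tell mds.0 client evict id=4305

# Re-check health to confirm whether the MDS_SLOW_REQUEST warnings have cleared.
ceph health detail
```

If the warnings do not clear after a manual eviction, automatic eviction is unlikely to help, and the workload itself should be investigated instead.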
If all of the conditions above are true, you may configure automatic client eviction, for example so that a client is evicted once it has failed to respond to a capability (caps) revoke request for 15 minutes (900 seconds). Run the following commands to configure automatic client eviction:
# ceph config set mds mds_session_blocklist_on_evict false
# ceph config set mds mds_session_blocklist_on_timeout false
# ceph config set mds mds_cap_revoke_eviction_timeout 900
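After applying the settings, it may be useful to confirm that they are in effect and to know how to undo them. The sketch below assumes the options were set at the mds level in the central configuration database, as in the commands above. With both blocklist options set to false, an evicted client is not added to the blocklist.

```shell
# Confirm the values stored for the MDS daemons.
ceph config get mds mds_session_blocklist_on_evict
ceph config get mds mds_session_blocklist_on_timeout
ceph config get mds mds_cap_revoke_eviction_timeout

# Roll back: remove the overrides so each option returns to its built-in default.
ceph config rm mds mds_cap_revoke_eviction_timeout
ceph config rm mds mds_session_blocklist_on_evict
ceph config rm mds mds_session_blocklist_on_timeout
```

Removing the mds_cap_revoke_eviction_timeout override first stops automatic evictions before the blocklist behavior is restored.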