CephFS Pool Data Usage Growth Without Explanation

Problem

The CephFS data pool usage is increasing, even though users are deleting their CephFS files.

Solution

Deleted files are added to a Purge Queue, which is processed sequentially. If users delete files faster than the purge queue can be processed, the data pool usage will increase over time.

Internally the MDS has a few options to throttle the processing of the purge queue:

  • mds_max_purge_ops (default 8192)
  • mds_max_purge_ops_per_pg (default 0.5)
  • filer_max_purge_ops (default 10)

The defaults for the mds_max_purge_ops related options are normally good. The default filer_max_purge_ops (10) is, however, too small for CephFS file systems holding very large files (e.g. 1 TB+): with the default 4 MiB CephFS object size, a 1 TiB file maps to roughly 262,000 RADOS objects, so removing only 10 objects at a time frees space very slowly.

Increase filer_max_purge_ops to 40 so that space can be freed up more quickly.
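
How to change the option depends on the deployment; the following is a minimal sketch using the ceph CLI, assuming a cluster with the centralized configuration database and an MDS daemon named mds.a (adjust the names for your environment):

    # Check the current value stored in the cluster configuration
    ceph config get mds filer_max_purge_ops

    # Raise the limit for all MDS daemons
    ceph config set mds filer_max_purge_ops 40

    # Confirm what the running MDS is actually using (run on the MDS host)
    ceph daemon mds.a config get filer_max_purge_ops

If a running daemon does not pick up the new value, it can also be injected at runtime, for example with ceph tell mds.a injectargs '--filer_max_purge_ops=40'.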

Discussion

Internally the MDS records the status of the Purge Queue in perf counters, which can be queried using perf dump.
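
A sketch of how to query them, assuming an MDS daemon named mds.a (the purge queue counters are the pq_* entries in the output):

    # Run on the host where the MDS is running, via its admin socket
    ceph daemon mds.a perf dump

With the default settings and a backlog of deletions, the pq_* counters look like this: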

{
    "pq_executing_ops": 44814,
    "pq_executing_ops_high_water": 524321,
    "pq_executing": 1,
    "pq_executing_high_water": 64,
    "pq_executed": 93799,
    "pq_item_in_journal": 40967
}

After setting filer_max_purge_ops to 40, the Purge Queue clears out:

{
    "pq_executing_ops": 0,
    "pq_executing_ops_high_water": 524321,
    "pq_executing": 0,
    "pq_executing_high_water": 64,
    "pq_executed": 133469,
    "pq_item_in_journal": 0
}

In the first example above, there are 40967 files waiting to be removed (pq_item_in_journal), 1 file is currently being purged (pq_executing), and that purge has 44814 RADOS objects to be removed (pq_executing_ops). With the default configuration, only 10 objects are removed at a time, so pq_item_in_journal will continue to grow, leading to unbounded space usage.
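
To confirm that the change is effective, the purge queue counters can be sampled periodically until pq_item_in_journal and pq_executing_ops trend toward zero; a sketch, again assuming an MDS daemon named mds.a:

    # Print the purge queue counters every 30 seconds
    while true; do
        date
        ceph daemon mds.a perf dump | grep '"pq_'
        sleep 30
    done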