CephFS: No space left on device

Problem

Removing a file on CephFS fails with the error message No space left on device, even though the file system still has free space.

Attempt to remove a file called test1:

[root@mds ~]# rm test1
rm: remove regular empty file ‘test1’? y
rm: cannot remove ‘test1’: No space left on device

Running df -h shows the actual file system consumption, which is well below capacity:

[root@mds ~]# df -h /cephfs/
Filesystem      Size  Used Avail Use% Mounted on
ceph-fuse       115T   87T   28T  76% /cephfs

Explanation

The mds_bal_fragment_size_max option sets a hard limit on the size of a directory fragment. When a fragment reaches this limit and CephFS tries to add a new entry to it, the operation fails with a No space left on device message. This is rarely observed for ordinary directories, because their fragment size is controlled by the mds_bal_split_size soft limit, which by default is ten times smaller than the hard limit mds_bal_fragment_size_max. The soft limit can still be exceeded when splitting is delayed, so reaching the hard limit is unlikely but possible, and this case should be checked as well. Note that for an ordinary directory the error appears when adding a file, not when removing one.

A more likely scenario is that mds_bal_fragment_size_max is reached for a special "stray" directory, which keeps the entries of deleted (unlinked) files. When a file is deleted, its entry is unlinked from its directory and added to the stray directory. Later, when the data for this file has been deleted, the entry is removed from the stray directory. This removal can be delayed if the file is still in use (for example, it is still opened by a client or remains in the MDS cache) or if many files were deleted in bulk, which takes time to process. The stray directory can therefore grow (its size correlates with a large MDS cache), and because it is never split dynamically, it may reach the mds_bal_fragment_size_max limit. In that case, removing a file fails with the No space left on device error, because CephFS cannot add the entry of the file being removed to the stray directory.

This case can be confirmed by checking the MDS num_strays counter, for example by running the command ceph tell mds.X perf dump | grep strays. Each MDS keeps ten stray directories, so if num_strays is close to 10 times the value of mds_bal_fragment_size_max, the limit has been reached.

You can try to decrease the number of strays by forcing the MDS to free deleted inodes from its cache. For example, running ls -R /cephfs, running the mds scrub command, or temporarily reducing the MDS cache size might help. If this does not help, or helps only temporarily, and you need a permanent solution, the workaround is to increase the hard limit mds_bal_fragment_size_max. Be careful when increasing this value, because it may lead to oversized directory fragment objects in the metadata pool, which the OSDs may not be able to handle. We therefore recommend increasing it gradually: double it in the first step, check whether that is enough (num_strays does not reach the new limit), and increase it further by repeating the process only if the previous step did not fix the problem.

The default value of mds_bal_fragment_size_max is 100,000, meaning the error will occur when num_strays approaches 1,000,000. If mds_cache_memory_limit has been increased, mds_bal_fragment_size_max should be increased accordingly to raise this threshold.

For more on mds_bal_fragment_size_max, see the "Size Thresholds" section of the "Configuring Directory Fragmentation" page of the upstream Ceph documentation.

Diagnosing the issue

Run the following commands on the active MDS node to check the relevant values:
```
ceph tell mds.{id} config get mds_bal_fragment_size_max
```
Check the current num_strays value:
```
ceph tell mds.{id} perf dump | grep strays
```
If num_strays is approaching 10x the value of mds_bal_fragment_size_max, the issue is likely to occur or is already occurring. For example, with the default mds_bal_fragment_size_max of 100,000, a num_strays value approaching 1,000,000 indicates the threshold is being reached.
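As a rough sketch, the comparison described above can be scripted. The helper names extract_num_strays and stray_usage_percent are hypothetical, and the sketch assumes that perf dump prints the counter as a "num_strays": <number> pair, which may differ between Ceph releases.

```shell
#!/bin/sh
# Hypothetical helpers; a sketch, not an official Ceph tool.

# Read `perf dump` text on stdin and print the value of the exact
# "num_strays" key (related keys such as num_strays_delayed are ignored,
# because the closing quote must follow the key name directly).
extract_num_strays() {
    sed -n 's/.*"num_strays": *\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Print, as a whole percentage, how much of the aggregate stray limit
# (10 x mds_bal_fragment_size_max) is currently used.
#   $1 = num_strays, $2 = mds_bal_fragment_size_max
stray_usage_percent() {
    num_strays=$1
    frag_max=$2
    echo $(( num_strays * 100 / (frag_max * 10) ))
}
```

To use it, pipe the output of the perf dump command shown above into extract_num_strays, then pass the result and the configured mds_bal_fragment_size_max to stray_usage_percent. A result close to 100 suggests that the limit is being reached.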

Solution

Double mds_bal_fragment_size_max from the default of 100,000 to 200,000. The following command sets the option for all MDS daemons in the cluster configuration:

ceph config set mds mds_bal_fragment_size_max 200000

Monitor num_strays to ensure that it stays below 2,000,000 (10x the new value). If it approaches the limit again, increase mds_bal_fragment_size_max further in increments of 100,000. Be careful when raising this value, because it can lead to oversized directory fragment objects in the metadata pool that the OSDs might not be able to handle.
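The monitoring step can be sketched as a small check that decides whether another increase is due. The function names are hypothetical, and the 90% warning level is an assumption chosen for illustration; the article only says "approaches the limit", so pick a level that suits your environment.

```shell
#!/bin/sh
# Hypothetical helpers; a sketch under the assumptions stated above.

# Succeed (exit status 0) when num_strays has reached warn_pct percent
# of the aggregate limit, 10 x mds_bal_fragment_size_max.
#   $1 = num_strays, $2 = mds_bal_fragment_size_max,
#   $3 = warning level in percent (assumed default: 90)
strays_near_limit() {
    num_strays=$1
    frag_max=$2
    warn_pct=${3:-90}
    [ $(( num_strays * 100 )) -ge $(( frag_max * 10 * warn_pct )) ]
}

# Print the next value to try: the current limit plus one 100,000 step.
next_fragment_max() {
    echo $(( $1 + 100000 ))
}
```

If strays_near_limit succeeds for the current counter and limit, the value printed by next_fragment_max is the one to pass to ceph config set mds mds_bal_fragment_size_max. Apply it manually and review the result, rather than raising the limit automatically in a loop.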

Remember that if mds_cache_memory_limit has been increased, mds_bal_fragment_size_max may need to be increased accordingly to raise this threshold.
