Ceph Version: Luminous 12.2.2 with Filestore and XFS
After more than two years of operation, several OSDs on the production Ceph cluster reported the error message:
** ERROR: osd init failed: (28) No space left on device
and terminated. Attempts to restart the OSDs always ended with the same error message.
The Ceph cluster changed from HEALTH_OK to HEALTH_ERR with the warnings:
ceph osd near full
ceph pool near full
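Which OSDs and pools are behind these warnings can be seen with the usual status commands, for example:
ceph health detail      # names the nearfull/full OSDs and pools
ceph osd df tree        # per-OSD utilization and available space
ceph df                 # per-pool usage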
A superficial check with df -h showed between 71% and 89% used disk space, yet no new files could be created on the file system. Neither a remount nor an unmount followed by a mount changed the situation.
The first suspicion was that the inode64 mount option for XFS might be missing, but this option was set. We then took a closer look at the internal statistics of the XFS file system:
xfs_db -r -c "freesp -s" /dev/sdd1
df -h
df -i
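The same checks can be run for every Filestore OSD on a host with a small loop; this is only a sketch and assumes the default data directories under /var/lib/ceph/osd/:
for osd in /var/lib/ceph/osd/ceph-*; do
    dev=$(findmnt -n -o SOURCE --target "$osd")   # block device behind the OSD mount
    echo "== $osd ($dev) =="
    df -h "$osd"                                  # used disk space
    df -i "$osd"                                  # used inodes
    xfs_db -r -c "freesp -s" "$dev"               # free-space fragmentation summary
done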
Based on these checks, we chose the following solution:
First, so as not to fill the remaining OSDs any further, we stopped the recovery by preventing the failed OSDs from being marked out:
ceph osd set noout
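Whether the flag is active can be verified in the OSD map; Ceph also offers stricter flags to pause data movement, which were not used here and are listed only for completeness:
ceph osd dump | grep flags    # the flags line should now include noout
# optional, harder stops for data movement (not used in this case):
# ceph osd set norebalance
# ceph osd set nobackfill
# ceph osd set norecover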
We then redistributed the data across the remaining OSDs according to their utilization with
ceph osd reweight-by-utilization
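For reference, the command also has a dry-run variant, and the overload threshold (default 120) can be passed explicitly; the value 110 below is only an example:
ceph osd test-reweight-by-utilization      # dry run: shows which OSDs would be reweighted and by how much
ceph osd reweight-by-utilization 110       # reweight OSDs more than 10% above the average utilization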
Next, we moved a single PG (important: always a different PG per OSD) from each affected OSD's file system to /root to gain some free space, and started the OSDs again, roughly as sketched below.
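On a Filestore OSD the PGs are plain directories under current/ in the OSD data directory. The OSD id 12 and the PG 3.1a below are placeholders, and the chosen PG must still have healthy replicas on other OSDs:
systemctl stop ceph-osd@12        # the OSD is already down after the error; this is just to be safe
mv /var/lib/ceph/osd/ceph-12/current/3.1a_head /root/osd-12-pg-3.1a_head
systemctl start ceph-osd@12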
In the next step, we deleted virtual machine images that were no longer required from our cloud environment.
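Assuming the images are stored as RBD images, unused ones can also be listed and removed directly with the rbd CLI; the pool and image names below are placeholders:
rbd -p vms ls -l                 # list images in the (hypothetical) vms pool
rbd rm vms/obsolete-image        # delete a single unused image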
It took some time for the blocked requests to clear and the system to resume normal operation.
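The recovery progress and the remaining blocked requests can be followed on the monitor, and once the cluster is back to HEALTH_OK the flag set at the beginning should be removed again:
ceph -s                  # recovery progress and blocked requests
ceph health detail
ceph osd unset noout     # re-enable normal behaviour once the cluster is healthy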
Unfortunately, we were not able to determine the cause conclusively. However, as we are currently in the process of switching from Filestore to BlueStore, we will soon no longer need XFS anyway.