RGW Crashes on multi-object deletes
Problem
RGW daemons crash when clients issue multi-object delete requests.
Solution
- Set the configuration variable rgw_multi_obj_del_max_aio to 1. Use the following command to do this:
  # ceph config set client.rgw rgw_multi_obj_del_max_aio 1
- Restart all RadosGW daemons:
  # ceph orch daemon restart <rgw>
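On a cluster with many RGW daemons, the restart step can be scripted. The following is a minimal sketch, not an official procedure. The function names list_rgw_daemons and restart_rgw_daemons are hypothetical, and the parsing assumes that "ceph orch ps" prints a plain-text table with one header row and the daemon name in the first column.

```shell
# Print the name of every RGW daemon known to the orchestrator.
# Assumption: the output of "ceph orch ps" is a table whose first row
# is a header and whose first column is the daemon name.
list_rgw_daemons() {
    ceph orch ps --daemon_type rgw | awk 'NR > 1 && NF > 0 { print $1 }'
}

# Restart each RGW daemon in turn, stopping at the first failure.
restart_rgw_daemons() {
    for daemon in $(list_rgw_daemons); do
        ceph orch daemon restart "$daemon" || return 1
    done
}
```

After the configuration variable has been set, calling restart_rgw_daemons from a shell that has access to the ceph command restarts the daemons one at a time.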
Discussion
When multisite sync is enabled, any bulk-delete operation will deadlock. See https://tracker.ceph.com/issues/63373 for details.
The configuration variable rgw_multi_obj_del_max_aio controls the concurrency
of the underlying RADOS delete operations when a client issues an S3
Multi-object Delete request. Limiting the value of this variable to 1
eliminates concurrency and thereby avoids the situation that causes this error.
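To confirm that the limit is in place, the stored value can be read back with "ceph config get". The following is a minimal sketch. The function name check_rgw_del_aio is hypothetical, and the check covers only the value stored for client.rgw in the cluster configuration, not what an RGW daemon that has not yet been restarted is using.

```shell
# Hypothetical helper: report whether the stored value of
# rgw_multi_obj_del_max_aio for client.rgw is 1.
check_rgw_del_aio() {
    current=$(ceph config get client.rgw rgw_multi_obj_del_max_aio) || return 1
    if [ "$current" = "1" ]; then
        echo "rgw_multi_obj_del_max_aio=1 (deletes are serialized)"
    else
        echo "rgw_multi_obj_del_max_aio=$current (deletes may run concurrently)"
        return 1
    fi
}
```

The function returns a non-zero status when the value is anything other than 1, so it can be used as a guard in a larger script.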
This bug was fixed in Reef, but may be present in earlier releases of Ceph.
The fix was made in upstream pull request https://github.com/ceph/ceph/pull/49362,
in commit https://github.com/ceph/ceph/commit/998ee313d4d306737b6ab851d101122693ab84c0.