RGW Crashes on multi-object deletes

Problem

RADOS Gateway (RGW) daemons crash when they receive S3 multi-object delete requests.

Solution

  1. Set the configuration variable rgw_multi_obj_del_max_aio to 1:

    # ceph config set client.rgw rgw_multi_obj_del_max_aio 1

  2. Restart all RadosGW daemons. Run the following command once per daemon, replacing <rgw> with the daemon's name:

    # ceph orch daemon restart <rgw>
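
The two steps above can be sketched as a single shell function. This is only an illustration, not part of the upstream procedure: it assumes a cephadm-managed cluster, a shell with admin access to the ceph command, and that the first column of the plain `ceph orch ps` output is the daemon name. The function name apply_rgw_workaround is hypothetical.

```shell
# Sketch: apply the workaround, then restart every RGW daemon.
# Assumes cephadm orchestration and admin access to the ceph CLI.
apply_rgw_workaround() {
    # Step 1: serialize the RADOS deletes behind Multi-object Delete requests.
    ceph config set client.rgw rgw_multi_obj_del_max_aio 1 || return 1

    # Read the value back to confirm that it was stored.
    ceph config get client.rgw rgw_multi_obj_del_max_aio || return 1

    # Step 2: list the RGW daemons (skipping the header row) and
    # restart them one at a time.
    ceph orch ps --daemon_type rgw \
        | awk 'NR > 1 { print $1 }' \
        | while read -r daemon; do
              # </dev/null keeps the command from consuming the daemon list.
              ceph orch daemon restart "$daemon" </dev/null
          done
}
```

Call apply_rgw_workaround from an admin node. Restarting the daemons one at a time, as the loop does, leaves the other gateways available while each one restarts.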

Discussion

When multisite sync is enabled, any bulk-delete operation deadlocks. See https://tracker.ceph.com/issues/63373 for details.

The configuration variable rgw_multi_obj_del_max_aio controls the concurrency of the underlying RADOS delete operations when a client issues an S3 Multi-object Delete request. Limiting the value of this variable to 1 eliminates concurrency and thereby avoids the situation that causes this error.
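
For context, a request of the kind described above can be sketched with the AWS CLI, whose `s3api delete-objects` command sends one S3 Multi-object Delete request for a list of keys. This is a minimal sketch under stated assumptions: the AWS CLI is installed and configured with RGW credentials, keys contain no double quotes or backslashes (the JSON is built without escaping), and the function name s3_multi_delete, the endpoint URL, and the bucket and key names are hypothetical placeholders.

```shell
# Sketch: send one S3 Multi-object Delete request to an RGW endpoint.
# Usage: s3_multi_delete <endpoint-url> <bucket> <key>...
# Assumes keys need no JSON escaping (no double quotes or backslashes).
s3_multi_delete() {
    local endpoint="$1"
    local bucket="$2"
    shift 2

    # Build the comma-separated list of {"Key": ...} entries.
    local objects=""
    local key
    for key in "$@"; do
        objects="${objects}${objects:+,}{\"Key\":\"${key}\"}"
    done

    # All keys go out in a single request; RGW then performs the
    # underlying RADOS deletes with the configured concurrency.
    aws s3api delete-objects \
        --endpoint-url "$endpoint" \
        --bucket "$bucket" \
        --delete "{\"Objects\":[${objects}],\"Quiet\":true}"
}
```

For example, `s3_multi_delete http://rgw.example.com:8080 mybucket a.txt b.txt` would ask RGW to delete both keys in one request.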

This bug was fixed in Reef, but may be present in earlier releases of Ceph.

The upstream pull request in which this bug was fixed is https://github.com/ceph/ceph/pull/49362.

The commit in which this bug was fixed is https://github.com/ceph/ceph/commit/998ee313d4d306737b6ab851d101122693ab84c0.