Overview

Chorus is data replication software designed for Object Storage systems. For exact information on supported Object Storage APIs and vendors, see:

When to Use Chorus

Migrate to a new S3 storage with zero downtime

You need to move from AWS S3 to self-hosted Ceph (or any S3-to-S3 migration) without stopping your application. Chorus Proxy sits in front of your storage, replicates data in the background, and switches traffic when ready.

Keep a synchronized backup of your S3 data

You want a live copy of your data in another storage for disaster recovery or compliance. Set up replication without the switch—Chorus continuously syncs changes to the follower storage.

Migrate from a managed S3 service where you can't deploy a proxy

Your data is in AWS S3 or another managed service where you can't intercept requests. Use Chorus Agent to receive S3 bucket notifications and replicate changes to your destination storage.

Verify migration integrity

After migration, you need to confirm all objects were copied correctly. Use the consistency check to compare buckets and identify any discrepancies.
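
As a rough illustration of what such a comparison involves, the sketch below diffs two bucket listings by object key and ETag. It is not Chorus's implementation: `compare_listings`, `Diff`, and the dict-based listings are hypothetical stand-ins for real bucket listings.

```python
from typing import Dict, List, NamedTuple


class Diff(NamedTuple):
    missing_in_dest: List[str]  # keys present only in the source
    extra_in_dest: List[str]    # keys present only in the destination
    mismatched: List[str]       # keys present in both, with different ETags


def compare_listings(source: Dict[str, str], dest: Dict[str, str]) -> Diff:
    """Compare two {object_key: etag} listings (hypothetical helper)."""
    missing = sorted(k for k in source if k not in dest)
    extra = sorted(k for k in dest if k not in source)
    mismatched = sorted(
        k for k in source if k in dest and source[k] != dest[k]
    )
    return Diff(missing, extra, mismatched)


# Example listings; in practice these would come from listing each bucket.
source = {"a.txt": "etag-1", "b.txt": "etag-2", "c.txt": "etag-3"}
dest = {"a.txt": "etag-1", "b.txt": "etag-X", "d.txt": "etag-4"}
report = compare_listings(source, dest)
```

Note that ETags are only a proxy for content equality: for multipart uploads they can differ between storages even when the object bytes match, so a real check may need additional comparison criteria.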

How It Works

Chorus works as follows:

  • You add storage credentials to the Chorus configuration.
  • You select one storage endpoint as the main; the others become followers.
  • Once Chorus's S3 API is configured and running, clients use it instead of the main storage endpoint's API.
  • Chorus proxies requests to the main storage endpoint and asynchronously replicates the data to the follower storage endpoints.
  • Chorus replicates all existing data from the main storage endpoint to the follower storage endpoints in the background.
  • You can configure, pause, and resume data replication per bucket through the web admin UI or the CLI.

Components

Chorus is structured around these main services:

  1. Chorus Proxy - S3 proxy that captures changes
  2. Chorus Worker - Processes replication tasks
  3. Chorus Agent - Alternative to Proxy using bucket notifications

chorus-diagram.png

The Chorus Proxy operates as an intermediary for the main S3 storage endpoint. This means that Chorus also provides an S3 API.

The Chorus Agent is an alternative to Proxy for environments where deploying a proxy is not feasible (e.g., managed S3 services like AWS S3). Instead of intercepting requests, Agent receives S3 bucket notifications via webhook to capture changes. Agent is S3 only and supports bucket-level replication only. See Agent Configuration for setup.
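
To make the notification-based approach concrete, here is a minimal sketch of turning an S3 event notification body into a list of changes to replicate. The JSON layout (`Records`, `eventName`, `s3.bucket.name`, `s3.object.key`) follows the AWS S3 event message structure; the function name `changes_from_notification` and the `"sync"`/`"delete"` labels are assumptions for illustration and do not describe how Chorus Agent is implemented.

```python
import json
from typing import List, Tuple
from urllib.parse import unquote_plus


def changes_from_notification(body: str) -> List[Tuple[str, str, str]]:
    """Extract (action, bucket, key) tuples from an S3 event notification.

    Hypothetical helper: "sync" means copy the object to the destination,
    "delete" means remove it there.
    """
    changes = []
    for record in json.loads(body).get("Records", []):
        event = record["eventName"]
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded, with spaces encoded as "+".
        key = unquote_plus(record["s3"]["object"]["key"])
        if event.startswith("ObjectCreated:"):
            changes.append(("sync", bucket, key))
        elif event.startswith("ObjectRemoved:"):
            changes.append(("delete", bucket, key))
    return changes


# A trimmed-down example payload with one upload and one deletion.
sample = json.dumps({
    "Records": [
        {"eventName": "ObjectCreated:Put",
         "s3": {"bucket": {"name": "photos"},
                "object": {"key": "2024/my+cat.jpg"}}},
        {"eventName": "ObjectRemoved:Delete",
         "s3": {"bucket": {"name": "photos"},
                "object": {"key": "old.jpg"}}},
    ]
})
changes = changes_from_notification(sample)
```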

Here is the workflow for using Chorus Proxy (text such as -1- refers to the numbered arrows in the above diagram):

  1. A request is sent to the Chorus S3 API -1-.
  2. The Chorus Proxy routes the request to the main storage according to the routing policy in the config -2-3-4-7-.
  3. For write requests (POST, PUT, DELETE), the proxy creates a task that directs the changes to be copied from the main storage endpoint to the follower storage endpoints according to the replication policy -5-6-.
  4. The Chorus Worker retrieves the task and syncs changes from the main to the follower -8-9-10-.

All changes generated by the proxy are stored in an event queue.
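
The workflow above can be sketched with in-memory stand-ins. This is an illustrative model only, assuming plain dicts in place of S3 storages and a `deque` in place of the event queue; the `Proxy` class and `run_worker` function are hypothetical names, not Chorus APIs.

```python
from collections import deque

WRITE_METHODS = {"POST", "PUT", "DELETE"}


class Proxy:
    """Forwards requests to the main storage and queues replication tasks."""

    def __init__(self, main: dict, queue: deque):
        self.main = main
        self.queue = queue

    def handle(self, method: str, bucket: str, key: str, body: bytes = None):
        obj = (bucket, key)
        if method == "GET":
            return self.main.get(obj)
        if method in ("PUT", "POST"):
            self.main[obj] = body
        elif method == "DELETE":
            self.main.pop(obj, None)
        if method in WRITE_METHODS:
            # The task records only which object changed; the worker later
            # copies whatever state the main storage holds at that time.
            self.queue.append(obj)
        return None


def run_worker(queue: deque, main: dict, follower: dict) -> int:
    """Drain the queue, syncing each changed object from main to follower."""
    processed = 0
    while queue:
        obj = queue.popleft()
        if obj in main:
            follower[obj] = main[obj]
        else:
            follower.pop(obj, None)  # object was deleted on main
        processed += 1
    return processed


main, follower, queue = {}, {}, deque()
proxy = Proxy(main, queue)
proxy.handle("PUT", "photos", "a.jpg", b"1")
proxy.handle("PUT", "photos", "b.jpg", b"2")
proxy.handle("DELETE", "photos", "a.jpg")
pending = len(queue)      # writes are queued, follower not yet updated
processed = run_worker(queue, main, follower)
```

Because the proxy answers the client before the worker runs, the follower lags behind the main storage until the queue is drained, which is why replication here is asynchronous.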

Chorus also has an initial replication feature for cases where the main S3 storage endpoint isn't initially empty. This allows Chorus to transfer existing data to followers in the background. The initial replication process works as follows:

  1. All buckets in the main storage endpoint are listed.
  2. All objects for all listed buckets in the main storage endpoint are listed.
  3. A task is created for each object to sync it from the main storage endpoint to the follower storage endpoint.
  4. The worker processes tasks in the background, copying or updating files as needed.
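
The four steps above can be modeled in a few lines. The sketch below assumes nested dicts (`{bucket: {key: content}}`) in place of real storages; `plan_initial_replication` and `process_tasks` are hypothetical names used only for illustration.

```python
from collections import deque


def plan_initial_replication(main: dict) -> deque:
    """Steps 1-3: list buckets, list their objects, create one task each."""
    tasks = deque()
    for bucket in sorted(main):            # step 1: list buckets
        for key in sorted(main[bucket]):   # step 2: list objects
            tasks.append((bucket, key))    # step 3: one task per object
    return tasks


def process_tasks(tasks: deque, main: dict, follower: dict) -> int:
    """Step 4: copy or update objects as needed; return how many changed."""
    copied = 0
    while tasks:
        bucket, key = tasks.popleft()
        dest = follower.setdefault(bucket, {})
        if dest.get(key) != main[bucket][key]:
            dest[key] = main[bucket][key]
            copied += 1
    return copied


main = {
    "logs": {"jan.txt": b"a", "feb.txt": b"b"},
    "photos": {"cat.jpg": b"c"},
}
# The follower already holds an up-to-date copy of one object.
follower = {"logs": {"jan.txt": b"a"}}

tasks = plan_initial_replication(main)
task_count = len(tasks)
copied = process_tasks(tasks, main, follower)
```

In this model an object that already matches on the follower still gets a task but is skipped by the worker, which mirrors the "copying or updating files as needed" wording in step 4.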

Features

  • S3 and OpenStack Swift storage support
  • User-level and bucket-level replication policies
  • Routing policies and migration switch (Proxy only)
  • Pausing and resuming replication
  • Defining custom S3 credentials for Chorus Proxy
  • Dynamic credentials management via API
  • Syncing object/bucket metadata, content, tags, ACL
  • Migrating existing data in background
  • Tracking replication lag
  • Data consistency check between storages
  • Per-storage rate limiting (requests per minute)
  • Worker rate limiting (concurrent object copies)