
Overview

Chorus is data replication software designed for multiple S3 storage systems. It works as follows:

  • The user adds the credentials for each storage to the Chorus configuration.
  • One storage is selected as the main storage, and the others become followers.
  • Once Chorus is configured and started, clients use the Chorus S3 API instead of the main storage's API.
  • Chorus proxies requests to the main storage and asynchronously replicates the data to the follower storages.
  • All existing data is replicated from the main storage to the followers in the background.
  • Replication can be configured, paused, and resumed per bucket through the web admin UI or the CLI.
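To make the main/follower split concrete, here is a minimal Go sketch. The Storage type, its fields, and the splitRoles helper are hypothetical illustrations, not Chorus's configuration schema; the sketch only assumes, as the list above states, that exactly one storage is the main storage and all others are followers.

```go
package main

import (
	"errors"
	"fmt"
)

// Storage describes one S3 backend. Hypothetical fields: Chorus's real
// configuration file may use different names and structure.
type Storage struct {
	Name      string
	Address   string
	AccessKey string
	SecretKey string
	IsMain    bool
}

// splitRoles returns the main storage and the followers, rejecting
// configurations that do not mark exactly one storage as main.
func splitRoles(storages []Storage) (Storage, []Storage, error) {
	var primary Storage
	var followers []Storage
	mains := 0
	for _, s := range storages {
		if s.IsMain {
			primary = s
			mains++
			continue
		}
		followers = append(followers, s)
	}
	if mains != 1 {
		return Storage{}, nil, errors.New("exactly one storage must be marked as main")
	}
	return primary, followers, nil
}

func main() {
	cfg := []Storage{
		{Name: "one", Address: "s3.one.example:9000", IsMain: true},
		{Name: "two", Address: "s3.two.example:9000"},
	}
	primary, followers, err := splitRoles(cfg)
	if err != nil {
		panic(err)
	}
	fmt.Println("main:", primary.Name, "followers:", len(followers))
}
```

Rejecting a configuration with zero or several main storages up front is a design choice of this sketch; it keeps the proxy's routing target unambiguous.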

Components

Chorus is structured around two main web services:

  1. Chorus Proxy
  2. Chorus Worker

[Figure: Chorus architecture diagram (chorus-diagram.png)]

The Chorus Proxy acts as an intermediary in front of the main S3 storage, so Chorus itself exposes an S3 API.

Here is the request workflow through the Chorus Proxy (markers such as -1- refer to the numbered arrows in the diagram above):

  1. A client sends a request to the Chorus S3 API -1-.
  2. The Chorus Proxy forwards the request to the main storage according to the routing policy in the configuration -2-3-4-7-.
  3. For write requests (POST, PUT, DELETE), the proxy creates a task to copy the changes from the main storage to the follower storages according to the replication policy -5-6-.
  4. The Chorus Worker picks up the task and syncs the changes from the main storage to the followers -8-9-10-.

All changes generated by the proxy are stored in an event queue.
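The worker side (step 4) can be pictured as a loop draining that queue. This is a hypothetical sketch: a buffered Go channel stands in for Chorus's event queue, in-memory maps stand in for the S3 storages, and syncObject and runWorker are invented names. Copying the object's current state from the main storage, rather than replaying the original request, is an assumption of the sketch; it also lets deletions propagate.

```go
package main

import "fmt"

// Task is a hypothetical replication task: sync Object in Bucket to Follower.
type Task struct {
	Bucket, Object, Follower string
}

// store is an in-memory stand-in for one S3 storage: bucket -> object -> content.
type store map[string]map[string]string

// syncObject makes the follower's copy match the main storage's current state,
// deleting the object on the follower if it no longer exists on main.
func syncObject(mainStore, follower store, t Task) {
	data, ok := mainStore[t.Bucket][t.Object]
	if !ok {
		delete(follower[t.Bucket], t.Object) // no-op if the bucket map is nil
		return
	}
	if follower[t.Bucket] == nil {
		follower[t.Bucket] = map[string]string{}
	}
	follower[t.Bucket][t.Object] = data
}

// runWorker processes tasks until the queue is closed. A single worker is
// used here because plain Go maps are not safe for concurrent writes.
func runWorker(queue <-chan Task, mainStore store, followers map[string]store) {
	for t := range queue {
		f, ok := followers[t.Follower]
		if !ok {
			continue // unknown follower: drop the task in this sketch
		}
		syncObject(mainStore, f, t)
	}
}

func main() {
	mainStore := store{"photos": {"cat.jpg": "v2"}}
	followers := map[string]store{"follower-1": {}}
	queue := make(chan Task, 8)
	done := make(chan struct{})

	go func() {
		runWorker(queue, mainStore, followers)
		close(done)
	}()

	queue <- Task{Bucket: "photos", Object: "cat.jpg", Follower: "follower-1"}
	close(queue)
	<-done
	fmt.Println(followers["follower-1"]["photos"]["cat.jpg"]) // v2
}
```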

Chorus also has an initial migration feature for cases where the main S3 storage already contains data. It transfers the existing data to the followers in the background. The initial migration works as follows:

  1. All buckets in the main storage are listed.
  2. All objects in each listed bucket are listed.
  3. A task is created for each object to sync it from the main storage to the follower.
  4. The worker processes these tasks in the background, copying or updating objects as needed.
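Steps 1 to 3 map onto two nested listing loops that emit tasks. This sketch assumes an in-memory map in place of the S3 ListBuckets and ListObjects calls; listBuckets, listObjects, and planMigration are hypothetical helpers, and names are sorted only to make the output deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// Task is a hypothetical sync task: copy Object in Bucket to Follower.
type Task struct {
	Bucket, Object, Follower string
}

// store is an in-memory stand-in for the main storage: bucket -> object -> content.
type store map[string]map[string]string

// listBuckets stands in for an S3 ListBuckets call (step 1).
func listBuckets(s store) []string {
	names := make([]string, 0, len(s))
	for b := range s {
		names = append(names, b)
	}
	sort.Strings(names)
	return names
}

// listObjects stands in for an S3 ListObjects call on one bucket (step 2).
func listObjects(s store, bucket string) []string {
	names := make([]string, 0, len(s[bucket]))
	for o := range s[bucket] {
		names = append(names, o)
	}
	sort.Strings(names)
	return names
}

// planMigration creates one task per object (step 3); a worker would then
// process these tasks in the background (step 4).
func planMigration(mainStore store, follower string) []Task {
	var tasks []Task
	for _, b := range listBuckets(mainStore) {
		for _, o := range listObjects(mainStore, b) {
			tasks = append(tasks, Task{Bucket: b, Object: o, Follower: follower})
		}
	}
	return tasks
}

func main() {
	mainStore := store{
		"photos": {"cat.jpg": "...", "dog.jpg": "..."},
		"docs":   {"cv.pdf": "..."},
	}
	for _, t := range planMigration(mainStore, "follower-1") {
		fmt.Printf("%s/%s -> %s\n", t.Bucket, t.Object, t.Follower)
	}
}
```

Against a real S3 endpoint, object listings are paginated, so a production version would page through the results and enqueue tasks as it goes instead of building the whole slice in memory.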

Features

  • per-bucket routing and replication, with PAUSE and RESUME
  • custom S3 credentials for the Chorus Proxy
  • syncing object and bucket metadata, content, tags, and ACLs
  • migrating existing data in the background
  • tracking replication lag
  • worker rate limiting
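Per-bucket PAUSE/RESUME and replication lag tracking can be pictured with a small state object shared by the proxy and the worker. This is a hypothetical Go sketch: the replication type and its methods are not Chorus's API, and measuring lag as the number of pending tasks per bucket is an assumption made for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// replication holds hypothetical per-bucket replication state.
type replication struct {
	mu      sync.Mutex
	paused  map[string]bool
	pending map[string]int // tasks created but not yet processed (lag, in tasks)
}

func newReplication() *replication {
	return &replication{paused: map[string]bool{}, pending: map[string]int{}}
}

// Pause stops the worker from processing tasks for the bucket.
func (r *replication) Pause(bucket string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.paused[bucket] = true
}

// Resume lets the worker process the bucket's tasks again.
func (r *replication) Resume(bucket string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.paused, bucket)
}

// Enqueued records a new task created by the proxy or the initial migration.
func (r *replication) Enqueued(bucket string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.pending[bucket]++
}

// TryProcess is called by the worker before handling a task. It returns false
// while the bucket is paused (the task stays queued); otherwise it counts the
// task as done and returns true.
func (r *replication) TryProcess(bucket string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.paused[bucket] {
		return false
	}
	if r.pending[bucket] > 0 {
		r.pending[bucket]--
	}
	return true
}

// Lag reports how many tasks for the bucket are still pending.
func (r *replication) Lag(bucket string) int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.pending[bucket]
}

func main() {
	r := newReplication()
	r.Enqueued("photos")
	r.Enqueued("photos")
	r.Pause("photos")
	fmt.Println(r.TryProcess("photos"), r.Lag("photos")) // false 2
	r.Resume("photos")
	fmt.Println(r.TryProcess("photos"), r.Lag("photos")) // true 1
}
```

In this sketch pausing leaves tasks queued rather than discarding them, so lag keeps growing while a bucket is paused and drains after RESUME.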