Overview

Chorus is data replication software for multiple S3 storage systems. It works as follows:

  • Users enter their storage credentials into the Chorus configuration.
  • One storage volume is selected as the main; the others become followers.
  • Once Chorus's S3 API has been configured and started, clients use it instead of the main storage volume's API.
  • Chorus proxies requests to the main storage volume and asynchronously replicates the data to the follower storage volumes.
  • Chorus replicates all existing data from the main storage volume to the follower storage volumes in the background.
  • Users can configure, pause, and resume data replication on a per-bucket basis with the web admin UI or the CLI.
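
The role assignment described above can be pictured with a small model. This is only a sketch: `StorageConfig`, `select_roles`, the storage names, and the endpoints are hypothetical and do not reflect Chorus's actual configuration format. Credentials are left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageConfig:
    """Hypothetical entry for one S3 storage system known to Chorus."""
    name: str
    endpoint: str
    is_main: bool = False


def select_roles(storages):
    """Return (main, followers); exactly one storage must be marked as main."""
    mains = [s for s in storages if s.is_main]
    if len(mains) != 1:
        raise ValueError(f"expected exactly one main storage, found {len(mains)}")
    followers = [s for s in storages if not s.is_main]
    return mains[0], followers


# Illustrative values only; these endpoints are placeholders.
storages = [
    StorageConfig("one", "https://s3.one.example.com", is_main=True),
    StorageConfig("two", "https://s3.two.example.com"),
    StorageConfig("three", "https://s3.three.example.com"),
]
main, followers = select_roles(storages)
```

Requiring exactly one main keeps the routing target unambiguous: every request has a single storage volume to go to, and every other volume is a replication target.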

Components

Chorus is structured around two main web services:

  1. Chorus Proxy
  2. Chorus Worker

[Diagram: chorus-diagram.png]

The Chorus Proxy operates as an intermediary for the main S3 storage volume. This means that Chorus also provides an S3 API.

Here is the workflow for using Chorus Proxy (text such as -1- refers to the numbered arrows in the above diagram):

  1. A request is sent to the Chorus S3 API -1-.
  2. The Chorus Proxy forwards the request to the main storage volume according to the routing policy in the configuration -2-3-4-7-.
  3. For write requests (POST, PUT, DELETE), the proxy creates a task directing that the changes be copied from the main storage volume to the follower storage volumes according to the replication policy -5-6-.
  4. The Chorus Worker retrieves the task and syncs the changes from the main storage volume to the followers -8-9-10-.

All changes generated by the proxy are stored in an event queue.
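
The proxy and worker flow above can be simulated in memory. The sketch below is illustrative only: `Storage`, `Proxy`, and `run_worker` are hypothetical names, storage volumes are plain dictionaries, and the queue is a `collections.deque` rather than whatever queue Chorus actually uses.

```python
from collections import deque

WRITE_METHODS = {"POST", "PUT", "DELETE"}


class Storage:
    """In-memory stand-in for one S3 storage volume: {(bucket, key): bytes}."""

    def __init__(self):
        self.objects = {}

    def handle(self, method, bucket, key, body=None):
        if method in ("PUT", "POST"):
            self.objects[(bucket, key)] = body
        elif method == "DELETE":
            self.objects.pop((bucket, key), None)
        return self.objects.get((bucket, key))


class Proxy:
    """Hypothetical proxy: forwards to the main storage, queues tasks for writes."""

    def __init__(self, main, queue):
        self.main = main
        self.queue = queue

    def request(self, method, bucket, key, body=None):
        # Every request goes to the main storage first.
        response = self.main.handle(method, bucket, key, body)
        # Only writes change state, so only writes produce a replication task.
        if method in WRITE_METHODS:
            self.queue.append((bucket, key))
        return response


def run_worker(queue, main, followers):
    """Drain the queue, making each follower match the main for every task."""
    while queue:
        bucket, key = queue.popleft()
        for follower in followers:
            if (bucket, key) in main.objects:
                follower.objects[(bucket, key)] = main.objects[(bucket, key)]
            else:
                # The object was deleted on the main, so remove it here too.
                follower.objects.pop((bucket, key), None)


main, follower = Storage(), Storage()
queue = deque()
proxy = Proxy(main, queue)
proxy.request("PUT", "photos", "cat.jpg", b"meow")  # write: queued for replication
proxy.request("GET", "photos", "cat.jpg")           # read: served by main, not queued
run_worker(queue, main, [follower])
```

In this sketch a task carries only the bucket and key, and the worker reads the current state from the main storage when it runs. That makes repeated tasks for the same object harmless, although it is an assumption of the sketch rather than a documented property of Chorus.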

Chorus also has an initial migration feature for cases where the main S3 storage volume isn't initially empty. This allows Chorus to transfer existing data to followers in the background. The initial migration process works as follows:

  1. All buckets in the main storage volume are listed.
  2. All objects for all listed buckets in the main storage volume are listed.
  3. A task is created for each object to sync it from the main storage volume to the follower storage volume.
  4. The worker processes tasks in the background, copying or updating files as needed.
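
The four migration steps above can be sketched with nested dictionaries standing in for storage volumes. `plan_migration` and `process_tasks` are hypothetical helpers written for this illustration, and comparing object content directly is an assumption about how "copying or updating files as needed" might be decided.

```python
from collections import deque


def plan_migration(main):
    """Create one sync task per object found in the main storage.

    `main` maps bucket name -> {object key: content}.
    """
    tasks = deque()
    for bucket in sorted(main):            # step 1: list buckets
        for key in sorted(main[bucket]):   # step 2: list objects in each bucket
            tasks.append((bucket, key))    # step 3: one task per object
    return tasks


def process_tasks(tasks, main, follower):
    """Step 4: copy missing objects and update stale ones; return the number written."""
    written = 0
    while tasks:
        bucket, key = tasks.popleft()
        if key not in main.get(bucket, {}):
            continue  # object disappeared from the main after the task was created
        source = main[bucket][key]
        target_bucket = follower.setdefault(bucket, {})
        if key not in target_bucket or target_bucket[key] != source:
            target_bucket[key] = source
            written += 1
    return written


main = {"logs": {"a.txt": b"1", "b.txt": b"2"}, "empty": {}}
follower = {"logs": {"a.txt": b"1", "b.txt": b"old"}}
tasks = plan_migration(main)
written = process_tasks(tasks, main, follower)
```

Here only `b.txt` is rewritten, because `a.txt` already matches on the follower. Running the plan a second time would write nothing.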

Features

  • per-bucket routing and replication, with pause and resume
  • custom S3 credentials for the Chorus Proxy
  • syncing of object and bucket metadata, content, tags, and ACLs
  • migration of existing data in the background
  • replication lag tracking
  • worker rate limiting
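
As a rough illustration of how per-bucket pause and resume could interact with replication lag, consider the sketch below. `ReplicationState` is a hypothetical class, and defining lag as the age of the oldest unreplicated event is an assumption made for this example, not Chorus's documented metric.

```python
class ReplicationState:
    """Hypothetical per-bucket replication switchboard with a simple lag metric."""

    def __init__(self):
        self.paused = set()
        self.pending = {}  # bucket -> timestamps of events awaiting replication

    def pause(self, bucket):
        self.paused.add(bucket)

    def resume(self, bucket):
        self.paused.discard(bucket)

    def record_event(self, bucket, timestamp):
        self.pending.setdefault(bucket, []).append(timestamp)

    def drain(self, bucket):
        """Replicate all pending events for a bucket unless it is paused.

        Returns the number of events replicated.
        """
        if bucket in self.paused:
            return 0
        return len(self.pending.pop(bucket, []))

    def lag(self, bucket, now):
        """Age in seconds of the oldest event not yet replicated (0 if none)."""
        events = self.pending.get(bucket)
        return now - min(events) if events else 0


state = ReplicationState()
state.record_event("logs", timestamp=100)
state.record_event("logs", timestamp=105)
state.pause("logs")
skipped = state.drain("logs")             # paused: nothing is replicated
lag_while_paused = state.lag("logs", now=110)
state.resume("logs")
replicated = state.drain("logs")          # resumed: both events are replicated
```

While a bucket is paused, its events accumulate and its lag grows. After resuming, draining the backlog brings the lag back to zero.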