Overview

Chorus is data replication software for multiple S3 storage systems. It works as follows:

  • Users enter their storage credentials in the Chorus configuration.
  • One storage is selected as the main; the others become followers.
  • Once configured and started, Chorus's S3 API can be used in place of the main storage's API.
  • Chorus proxies requests to the main storage and asynchronously replicates the data to the follower storages.
  • All existing data is also replicated from the main to the followers in the background.
  • Replication can be configured, paused, and resumed per bucket through the web admin UI or the CLI.
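
To make the roles concrete, here is a minimal in-memory sketch of the model described above: one main storage, several followers, and a per-bucket switch that pauses or resumes replication. The class, field, and storage names (ReplicationPlan, paused, storage-a) are hypothetical and chosen for illustration; they are not Chorus's configuration schema or API.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicationPlan:
    """Hypothetical model: one main storage, N followers, per-bucket pause state."""

    main: str
    followers: list[str]
    # (bucket, follower) pairs whose replication is currently paused.
    paused: set[tuple[str, str]] = field(default_factory=set)

    def __post_init__(self) -> None:
        if self.main in self.followers:
            raise ValueError("the main storage cannot also be a follower")

    def pause(self, bucket: str, follower: str) -> None:
        self.paused.add((bucket, follower))

    def resume(self, bucket: str, follower: str) -> None:
        self.paused.discard((bucket, follower))

    def targets(self, bucket: str) -> list[str]:
        """Followers that should currently receive changes for this bucket."""
        return [f for f in self.followers if (bucket, f) not in self.paused]


plan = ReplicationPlan(main="storage-a", followers=["storage-b", "storage-c"])
plan.pause("photos", "storage-c")
active_while_paused = plan.targets("photos")
plan.resume("photos", "storage-c")
active_after_resume = plan.targets("photos")
```

Pausing affects only the named bucket and follower, so other buckets keep replicating to every follower.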

Components

Chorus is structured around two main web services: Chorus Proxy and Chorus Worker.

chorus-diagram.png

The Chorus Proxy operates as an intermediary for the main S3 storage, which also means that Chorus provides an S3 API. A request through the Chorus Proxy is handled as follows:

  1. A client sends a request to the Chorus S3 API -1-.
  2. The Chorus Proxy forwards the request to the main storage according to the routing policy in the config -2-3-4-7-.
  3. For write requests (POST, PUT, DELETE), the proxy creates a task to copy the changes from the main to the follower storages according to the replication policy -5-6-.
  4. The Chorus Worker retrieves the task and syncs the changes from the main to the follower -8-9-10-.

All changes generated by the proxy are stored in an event queue.
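
The request flow above can be sketched as a toy simulation. Plain dictionaries stand in for the S3 storages and a deque stands in for the event queue. The names Proxy, run_worker, and WRITE_METHODS are hypothetical, and the assumption that a task carries only the bucket and key (not the data) is made for illustration, not taken from Chorus.

```python
from collections import deque

WRITE_METHODS = {"POST", "PUT", "DELETE"}


class Proxy:
    """Toy proxy: applies every request to main and queues a task for writes."""

    def __init__(self, main: dict, queue: deque) -> None:
        self.main = main
        self.queue = queue

    def handle(self, method: str, bucket: str, key: str, body: bytes = b""):
        result = None
        if method == "GET":
            result = self.main.get((bucket, key))
        elif method == "PUT":
            self.main[(bucket, key)] = body
        elif method == "DELETE":
            self.main.pop((bucket, key), None)
        # POST is queued like any other write but not modeled further here.
        if method in WRITE_METHODS:
            # Assumption: the task names what changed, not the data itself.
            self.queue.append((bucket, key))
        return result


def run_worker(queue: deque, main: dict, follower: dict) -> None:
    """Drain the queue, making the follower match main for each changed key."""
    while queue:
        obj = queue.popleft()
        if obj in main:
            follower[obj] = main[obj]
        else:
            follower.pop(obj, None)  # the object was deleted on main


main, follower, queue = {}, {}, deque()
proxy = Proxy(main, queue)
proxy.handle("PUT", "photos", "cat.jpg", b"v1")
proxy.handle("PUT", "photos", "dog.jpg", b"v1")
proxy.handle("DELETE", "photos", "dog.jpg")
read_back = proxy.handle("GET", "photos", "cat.jpg")
pending_before = len(queue)
run_worker(queue, main, follower)
```

Because the worker reads the current state of the main storage when it processes a task, replaying several tasks for the same key is harmless in this sketch: the follower simply converges on whatever the main holds.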

Chorus also has an initial migration feature for cases where the main S3 storage isn't initially empty. This allows Chorus to transfer existing data to followers in the background. The initial migration process involves:

  1. Listing all buckets in the main.
  2. Listing all objects for all listed buckets in the main.
  3. Creating a task for each object to sync it from the main to the follower.
  4. Processing the tasks in the background with the worker, which copies or updates files as needed.
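
The four migration steps can be sketched as follows. Nested dictionaries (bucket, then key, then an (etag, data) pair) stand in for the storages. The function names are hypothetical, and deciding between "copy", "update", and "skip" by comparing ETags is an assumption for illustration; the source does not say how Chorus detects that an object needs updating.

```python
from collections import deque


def plan_migration(main: dict) -> deque:
    """Steps 1-3: list buckets, list their objects, create one task per object."""
    tasks = deque()
    for bucket in sorted(main):              # step 1: list all buckets
        for key in sorted(main[bucket]):     # step 2: list all objects
            tasks.append((bucket, key))      # step 3: one sync task per object
    return tasks


def run_migration(tasks: deque, main: dict, follower: dict) -> dict:
    """Step 4: process tasks, copying or updating objects as needed."""
    stats = {"copied": 0, "updated": 0, "skipped": 0}
    while tasks:
        bucket, key = tasks.popleft()
        src = main.get(bucket, {}).get(key)
        if src is None:
            continue  # object disappeared from main after the listing
        dst = follower.setdefault(bucket, {}).get(key)
        if dst is None:
            stats["copied"] += 1
        elif dst[0] != src[0]:  # assumption: differing ETag means stale copy
            stats["updated"] += 1
        else:
            stats["skipped"] += 1
            continue
        follower[bucket][key] = src
    return stats


main = {
    "photos": {"a.jpg": ("e1", b"A"), "b.jpg": ("e2", b"B2")},
    "docs": {"r.txt": ("e3", b"R")},
}
follower = {
    "photos": {"b.jpg": ("e0", b"B1")},
    "docs": {"r.txt": ("e3", b"R")},
}
tasks = plan_migration(main)
task_count = len(tasks)
stats = run_migration(tasks, main, follower)
```

Splitting the listing from the copying mirrors the description above: the listing only produces tasks, so the copying can proceed in the background at the worker's own pace.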

Features

  • Per-bucket routing and replication, with pause and resume
  • Custom S3 credentials for the Chorus Proxy
  • Sync of object and bucket metadata, content, tags, and ACLs
  • Background migration of existing data
  • Replication lag tracking
  • Worker rate limiting
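
The source does not say how the worker rate limit is implemented. As one common way to build such a limit, the sketch below uses a token bucket; the TokenBucket class and its parameters are hypothetical and not part of Chorus. The clock is passed in explicitly so the behavior is easy to follow and to test.

```python
class TokenBucket:
    """Allow up to `rate` operations per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so an initial burst is allowed
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, never above capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(rate=2.0, capacity=2.0)
burst = [bucket.allow(0.0) for _ in range(3)]  # third call exceeds the burst
later = bucket.allow(0.5)                      # half a second refills one token
```

A worker using this sketch would call allow() before each task and wait or requeue the task when it returns False.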