Usage

S3 migration

See the S3 Migration blog post for more detail on use cases and the Chorus implementation.

Disclaimer

This document provides repeatable, executable, step-by-step instructions for migrating data between S3 storage systems using Chorus.

The goal is to enable users to follow these steps on a local machine, inspect configurations, and test APIs. The Chorus standalone binary has been chosen to provide a straightforward and accessible option for this purpose.

Alternatively, the Docker Compose setup can be used to gain insight into running Chorus across multiple hosts in a production-like environment.

Prerequisites

  1. Install the S3 CLI client s3cmd.
  2. Install the Chorus standalone binary chorus: installation instructions.
  3. Install the Chorus management CLI chorctl: installation instructions.

Step 1: Set up Chorus

The standalone binary is a playground version of Chorus. It hosts all required components on different ports and stores all data in memory.

Run the chorus binary to start the standalone version of Chorus and inspect the output:

$ chorus
_________ .__
\_ ___ \| |__ ___________ __ __ ______
/ \ \/| | \ / _ \_ __ \ | \/ ___/
\ \___| Y ( <_> ) | \/ | /\___ \
\______ /___| /\____/|__| |____//____ >
\/ \/ \/


S3 Proxy URL: http://127.0.0.1:9669
S3 Proxy Credentials (AccessKey|SecretKey):
- user1: [testKey1|testSecretKey1]
- user2: [testKey2|testSecretKey2]

GRPC mgmt API: 127.0.0.1:9670
HTTP mgmt API: http://127.0.0.1:9671
Redis URL: 127.0.0.1:59156

Storage list:
- [FAKE] one: http://127.0.0.1:9680 < MAIN
- [FAKE] two: http://127.0.0.1:9681
  • The Chorus S3 proxy is on port 9669, with two users: user1 and user2. By default, the Chorus proxy forwards all S3 requests to [MAIN] storage (one).
  • The Chorus management GRPC API is on port 9670 and the HTTP API is on port 9671. The GRPC API is used by the chorctl CLI; the HTTP API is used by the Web UI. The Web UI can be hosted separately with Docker. For more on the Web UI, see the Chorus GitHub repo.
  • Redis is on a random port (59156 in this run).
  • Two in-memory S3 storage volumes named one and two are on ports 9680 and 9681. Storage one is marked as MAIN. The fake storage volumes can be called directly with any access key and secret key.

The Chorus process will keep running in the foreground and print logs to the console.

tip

Print help with chorus -h to learn how to view and modify the standalone configuration.

It is not necessary to change the standalone configuration to follow this guide, but it can be instructive to inspect the full configuration and experiment with it.

Step 2: Set up Clients

By default, chorctl expects the GRPC API on localhost:9670. Run chorctl storage to see the list of storages and their statuses:

$ chorctl storage
NAME ADDRESS PROVIDER USERS
one [MAIN] http://127.0.0.1:9680 Other user1,user2
two http://127.0.0.1:9681 Other user1,user2
tip

Print help with chorctl -h to learn about available commands, flags, and configs.

The following command creates a config file called proxy.conf that s3cmd will use to call the Chorus proxy:

cat << EOF > proxy.conf
use_https = false
host_base = 127.0.0.1:9669
host_bucket = 127.0.0.1:9669
access_key = testKey1
secret_key = testSecretKey1
EOF

Once this file has been created, s3cmd will use it to interact with the Chorus proxy:

# list buckets
$ s3cmd ls -c proxy.conf

# create bucket
$ s3cmd mb s3://test -c proxy.conf
Bucket 's3://test/' created

# List again to see the new bucket
$ s3cmd ls -c proxy.conf
2025-03-26 17:16 s3://test
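
The verification steps below call the fake storages directly using config files one.conf and two.conf. These files are not created by the steps above; here is a minimal sketch, mirroring proxy.conf and pointing at the fake storage ports from Step 1 (the credentials are placeholders, since the fake storages accept any access key and secret key):

# one.conf and two.conf point s3cmd directly at the fake storages;
# the key values below are placeholders, as any credentials are accepted
cat << EOF > one.conf
use_https = false
host_base = 127.0.0.1:9680
host_bucket = 127.0.0.1:9680
access_key = testKey1
secret_key = testSecretKey1
EOF

cat << EOF > two.conf
use_https = false
host_base = 127.0.0.1:9681
host_bucket = 127.0.0.1:9681
access_key = testKey1
secret_key = testSecretKey1
EOF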

By default, the proxy forwards all requests to the MAIN storage (one). Verify that the bucket test was created on storage one only:

# exists in one
$ s3cmd ls -c one.conf
2025-03-26 17:16 s3://test

# missing in two
$ s3cmd ls -c two.conf

PUT a file in the test bucket:

echo "some content" | s3cmd put - s3://test/file.txt -c proxy.conf

Verify the GET output for all three S3 endpoints:

# proxy returns object from main
$ s3cmd get s3://test/file.txt --quiet -c proxy.conf -
some content

# the same content is on main
$ s3cmd get s3://test/file.txt --quiet -c one.conf -
some content

# the bucket does not exist on two
$ s3cmd get s3://test/file.txt --quiet -c two.conf -
ERROR: Parameter problem: Source object 's3://test/file.txt' does not exist.
tip

Use the command s3cmd sync to upload local files to the S3 bucket quickly:

s3cmd sync ./local-dir/ s3://your-bucket/remote-dir/

Step 3: Start bucket replication

The proxy is configured to route requests to storage one. The storage has the bucket test, which contains the file file.txt.

Use chorctl to start the process of replicating data in the bucket test from storage one to storage two:

# no replication exists:
$ chorctl repl
NAME PROGRESS SIZE OBJECTS EVENTS PAUSED AGE HAS_SWITCH

# check which buckets are available for replication for the given source and destination:
$ chorctl repl buckets --user=user1 --from=one --to=two
test

# start replication for bucket test from one to two
$ chorctl repl add --user=user1 --from=one --to=two --bucket=test

# check replication status
$ chorctl repl
NAME PROGRESS SIZE OBJECTS EVENTS PAUSED AGE HAS_SWITCH
user1:test:one->two [##########] 100.0 % 13 B/13 B 1/1 0/0 false 57s false

The output of this last command indicates that the initial replication is done and that the object was replicated from storage one to storage two.

tip

Use the command chorctl dash to view the live replication dashboard in the terminal.

Verify that the object is present and available on storage two:

$ s3cmd get s3://test/file.txt --quiet -c two.conf -
some content

From now on, all changes to the test bucket must be made through the Chorus proxy. Otherwise, the changes will not be replicated to storage two.

Let's see what happens when we PUT a new file to the test bucket using the Chorus proxy:

echo "new content" | s3cmd put - s3://test/new.txt -c proxy.conf

Check whether the new object was detected in the replication status:

$ chorctl repl
NAME PROGRESS SIZE OBJECTS EVENTS PAUSED AGE HAS_SWITCH
user1:test:one->two [##########] 100.0 % 13 B/13 B 1/1 3/3 false 11m false

The output shows 3/3 events because Chorus created extra tasks to sync object metadata and tags. The fields PROGRESS, SIZE, and OBJECTS show the status of the initial replication, that is, the replication of objects that existed in the source bucket before replication was started. The EVENTS column shows the number of detected and processed events that were created during replication.

According to the output, the new object should already be available on storage two:

# access directly from storage one
$ s3cmd get s3://test/new.txt --quiet -c one.conf -
new content

# access directly from storage two
$ s3cmd get s3://test/new.txt --quiet -c two.conf -
new content
tip

Try to update or remove objects in the test bucket by using the proxy and see how changes are replicated to storage two.
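
For example, a minimal sketch of removing the object new.txt through the proxy and checking that the deletion propagates (file and config names as used above):

# delete the object through the proxy
$ s3cmd del s3://test/new.txt -c proxy.conf

# once the deletion event is processed, the object should also be gone from storage two
$ s3cmd ls s3://test -c two.conf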

Step 4: Switch from storage one to storage two

Here's a short recap of the current state:

  • Storage one is set as MAIN, which makes the proxy forward all requests to it.
  • Bucket replication is enabled for the bucket test, and data is replicated from storage one to storage two.
  • During replication, all existing bucket data was copied to storage two as the initial migration phase.
  • All new changes made to the bucket test via the proxy are automatically replicated to storage two in the background.

To migrate from storage one to storage two, the replicated bucket must be "switched". After the switch, the proxy redirects all requests for the bucket test to storage two, and the migration is complete.

note

This guide covers a switch with a downtime window. For other available strategies, see the S3 migration blog post.

The previous call to chorctl repl shows that the replication does not have a switch enabled (see the HAS_SWITCH column):

$ chorctl repl
NAME PROGRESS SIZE OBJECTS EVENTS PAUSED AGE HAS_SWITCH
user1:test:one->two [##########] 100.0 % 13 B/13 B 1/1 3/3 false 11m false

# list existing switches
$ chorctl repl switch
USER BUCKET FROM TO STATUS LAST_STARTED DONE

Create a replication switch with a downtime window for the bucket test from storage one to storage two:

chorctl repl switch downtime --user=user1 --from=one --to=two --bucket=test --cron='* * * * *'

The switch must be created for an existing replication and must use the same parameters as the replication itself (user, from, to, bucket).

Chorus will attempt to start downtime every minute, as specified by the cron expression * * * * *. If Chorus determines that it can process all remaining events during the downtime window and verifies that the migration was successful, it completes the switch.
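
In a real migration, the downtime window would typically be scheduled for off-peak hours rather than every minute. A hedged example reusing the --cron flag shown above (the schedule is illustrative, not a recommendation):

# attempt to start the downtime window only at 02:00 each night
chorctl repl switch downtime --user=user1 --from=one --to=two --bucket=test --cron='0 2 * * *'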

tip

Chorus provides a rich API for planning downtime. Run chorctl repl switch downtime -h to explore downtime switch options and examples. The OpenAPI specification also contains detailed information about the switch API.

During downtime, the proxy will return 404 for all requests to the bucket test. After the downtime completes, the proxy starts redirecting requests to storage two. Let's verify this:

# the --wide flag provides details about downtime status transitions
$ chorctl repl switch --wide
USER BUCKET FROM TO STATUS LAST_STARTED DONE
user1 test one two Done 9m 8m
2025-03-26T19:36:08+01:00 | not_started -> in_progress: downtime window started
2025-03-26T19:36:18+01:00 | in_progress -> check_in_progress: queue is drained, start buckets content check
2025-03-26T19:36:28+01:00 | check_in_progress -> done: switch done

Verify that the proxy redirects requests to storage two:

# create new object using proxy
echo "very new content" | s3cmd put - s3://test/very.txt -c proxy.conf

# verify that the object landed only on storage two
$ s3cmd get s3://test/very.txt --quiet -c two.conf -
very new content

$ s3cmd get s3://test/very.txt --quiet -c one.conf -
ERROR: Parameter problem: Source object 's3://test/very.txt' does not exist.

No new replication events appeared:

$ chorctl repl
NAME PROGRESS SIZE OBJECTS EVENTS PAUSED AGE HAS_SWITCH
user1:test:one->two [##########] 100.0 % 13 B/13 B 1/1 3/3 false 43m true

Migration is now done. Proceed to the next bucket or decommission storage one.

Admin UI

TODO: describe admin web UI