

S3 migration

Please read the S3 Migration blog post for additional details on the use case and Chorus implementation.

Disclaimer

This document provides repeatable, executable, step-by-step instructions for migrating data between S3 storage systems using Chorus.

The goal is to enable users to follow these steps on a local machine, inspect configurations, and test APIs. The Chorus standalone binary is chosen as a straightforward and accessible option for this purpose.

Alternatively, the Docker Compose setup can be used to gain insight into running Chorus across multiple hosts in a production-like environment.

Prerequisites

  1. Install S3 CLI client s3cmd.
  2. Install Chorus standalone binary chorus: installation instructions.
  3. Install Chorus management CLI chorctl: installation instructions.

Step 1: Setup Chorus

The standalone binary is a playground version of Chorus. It hosts all required components on different ports and stores all data in memory.

Run the chorus binary to start the standalone version of Chorus and inspect the output.

$ chorus
_________ .__
\_ ___ \| |__ ___________ __ __ ______
/ \ \/| | \ / _ \_ __ \ | \/ ___/
\ \___| Y ( <_> ) | \/ | /\___ \
\______ /___| /\____/|__| |____//____ >
\/ \/ \/


S3 Proxy URL: http://127.0.0.1:9669
S3 Proxy Credentials (AccessKey|SecretKey):
- user1: [testKey1|testSecretKey1]
- user2: [testKey2|testSecretKey2]

GRPC mgmt API: 127.0.0.1:9670
HTTP mgmt API: http://127.0.0.1:9671
Redis URL: 127.0.0.1:59156

Storage list:
- [FAKE] one: http://127.0.0.1:9680 < MAIN
- [FAKE] two: http://127.0.0.1:9681
  • Chorus S3 proxy is on port 9669 with two users, user1 and user2. By default, the proxy forwards all S3 requests to the MAIN storage (one).
  • Chorus management GRPC API is on port 9670 and the HTTP API is on port 9671. The GRPC API is used by the chorctl CLI, the HTTP API by the Web UI. The Web UI can be hosted separately with Docker; for details, refer to the GitHub repo.
  • Redis is on a random port (59156 in this run).
  • Two in-memory S3 storages named one and two are on ports 9680 and 9681. Storage one is marked as MAIN. The fake storages can be called directly with any access key and secret key.

The chorus process will keep running in the foreground and print logs to the console.

tip

Print help with chorus -h to learn how to view and modify the standalone configuration.

Changing the configuration is not required to follow this guide, but it might be interesting to inspect the full config and experiment with it.

Step 2: Setup Clients

By default, chorctl expects GRPC API on localhost:9670. Run chorctl storage to see the list of storages and their status:

$ chorctl storage
NAME ADDRESS PROVIDER USERS
one [MAIN] http://127.0.0.1:9680 Other user1,user2
two http://127.0.0.1:9681 Other user1,user2
tip

Print help with chorctl -h to learn about available commands, flags, and configs.

The following command will create a config file proxy.conf for s3cmd to call the Chorus proxy:

cat << EOF > proxy.conf
use_https = false
host_base = 127.0.0.1:9669
host_bucket = 127.0.0.1:9669
access_key = testKey1
secret_key = testSecretKey1
EOF

Now s3cmd with the config file can be used to interact with the Chorus proxy:

# list buckets
$ s3cmd ls -c proxy.conf

# create bucket
$ s3cmd mb s3://test -c proxy.conf
Bucket 's3://test/' created

# List again to see the new bucket
$ s3cmd ls -c proxy.conf
2025-03-26 17:16 s3://test
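The verification steps below read directly from the backing storages using s3cmd config files one.conf and two.conf. These files are not created by Chorus; here is a sketch for generating them, assuming the default standalone ports (9680 for one, 9681 for two) and relying on the fact that the fake storages accept any credentials:

```shell
# Generate s3cmd configs for calling the fake storages directly,
# bypassing the proxy. The fake storages accept any access/secret key;
# the ports match the defaults printed by the standalone binary.
for storage in one:9680 two:9681; do
  name=${storage%%:*}   # "one" or "two"
  port=${storage##*:}   # "9680" or "9681"
  cat << EOF > "${name}.conf"
use_https = false
host_base = 127.0.0.1:${port}
host_bucket = 127.0.0.1:${port}
access_key = anyKey
secret_key = anySecret
EOF
done
```

If your chorus output shows different storage ports, adjust the port numbers accordingly.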

By default, the proxy forwards all requests to the MAIN storage (one). Let's verify that bucket test was created only on storage one:

# exists in one
$ s3cmd ls -c one.conf
2025-03-26 17:16 s3://test

# missing in two
$ s3cmd ls -c two.conf

Now, let's PUT a file in the test bucket:

echo "some content" | s3cmd put - s3://test/file.txt -c proxy.conf

Verify the GET output for all three S3 endpoints:

# proxy returns object from main
$ s3cmd get s3://test/file.txt --quiet -c proxy.conf -
some content

# the same content is on main
$ s3cmd get s3://test/file.txt --quiet -c one.conf -
some content

# bucket does not exist on two
$ s3cmd get s3://test/file.txt --quiet -c two.conf -
ERROR: Parameter problem: Source object 's3://test/file.txt' does not exist.
tip

Use s3cmd sync to quickly upload local files to an S3 bucket:

s3cmd sync ./local-dir/ s3://your-bucket/remote-dir/

Step 3: Start bucket replication

The proxy is configured to route requests to storage one, which contains the bucket test with the file file.txt.

Use chorctl to start data replication for bucket test from storage one to two:

# no replication exists:
$ chorctl repl
NAME PROGRESS SIZE OBJECTS EVENTS PAUSED AGE HAS_SWITCH

# check buckets available for replication from storage for given source and destination:
$ chorctl repl buckets --user=user1 --from=one --to=two
test

# start replication for bucket test from one to two
$ chorctl repl add --user=user1 --from=one --to=two --bucket=test

# check replication status
$ chorctl repl
NAME PROGRESS SIZE OBJECTS EVENTS PAUSED AGE HAS_SWITCH
user1:test:one->two [##########] 100.0 % 13 B/13 B 1/1 0/0 false 57s false

The last command's output indicates that the initial replication is done and our object was replicated to storage two.

tip

Use chorctl dash to view a live replication dashboard in the terminal.

Now, verify that the object is available on storage two:

$ s3cmd get s3://test/file.txt --quiet -c two.conf -
some content

From now on, all changes to the test bucket must be made through the Chorus proxy. Otherwise, the changes will not be replicated to storage two.

Let's see what happens when we PUT a new file into the test bucket using the Chorus proxy:

echo "new content" | s3cmd put - s3://test/new.txt -c proxy.conf

See if the new object was detected in replication status:

$ chorctl repl
NAME PROGRESS SIZE OBJECTS EVENTS PAUSED AGE HAS_SWITCH
user1:test:one->two [##########] 100.0 % 13 B/13 B 1/1 3/3 false 11m false

The output shows 3/3 events because Chorus created extra tasks to sync object metadata and tags. The PROGRESS, SIZE, and OBJECTS fields show the status of the initial replication, i.e. the replication of objects that existed in the source bucket before replication was started. The EVENTS column shows the number of detected and processed events that were created during replication.

According to the output, the new object should already be available on storage two:

# access directly from storage one
$ s3cmd get s3://test/new.txt --quiet -c one.conf -
new content

# access directly from storage two
$ s3cmd get s3://test/new.txt --quiet -c two.conf -
new content
tip

Try to update or remove objects in test bucket using proxy and see how changes are replicated to storage two.

Step 4: Switch from storage one to two

Short recap of the current state:

  • Storage one is set as MAIN, making the proxy forward all requests to it.
  • Bucket replication is enabled for bucket test from storage one to two.
  • During replication, all existing bucket data was copied to storage two as the initial migration phase.
  • All new changes made to bucket test via the proxy are automatically replicated to storage two in the background.

To migrate from storage one to two, the replicated bucket must be "switched". After the switch, the proxy will redirect all requests for bucket test to storage two and the migration will be complete.

note

This guide covers a switch with a downtime window. For other available strategies, please read the S3 migration blog post.

The previous call to chorctl repl showed that the replication does not have a switch enabled (see the HAS_SWITCH column):

$ chorctl repl
NAME PROGRESS SIZE OBJECTS EVENTS PAUSED AGE HAS_SWITCH
user1:test:one->two [##########] 100.0 % 13 B/13 B 1/1 3/3 false 11m false

# list existing switches
$ chorctl repl switch
USER BUCKET FROM TO STATUS LAST_STARTED DONE

Now, create a replication switch with a downtime window for bucket test from storage one to two:

chorctl repl switch downtime --user=user1 --from=one --to=two --bucket=test --cron='* * * * *'

A switch must be created for an existing replication and takes the same parameters as replication creation (user, from, to, bucket). Chorus will attempt to start a downtime window every minute, according to the cron expression * * * * *. If it is able to process all events during the downtime and verify that the migration was successful, it will complete the switch.
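For a real migration, an every-minute schedule is usually replaced with an off-peak window. As an illustrative sketch (the schedule is an assumption, not from the original guide; the flags are the same ones used above), a nightly attempt at 02:00 would look like:

```shell
# Attempt to start the downtime window at 02:00 every night instead of
# every minute. Standard five-field cron: minute hour day-of-month month day-of-week.
chorctl repl switch downtime --user=user1 --from=one --to=two \
  --bucket=test --cron='0 2 * * *'
```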

tip

Chorus provides a rich API for planning downtime. Run chorctl repl switch downtime -h to explore downtime switch options and examples. The OpenAPI spec also contains detailed information about the switch API.

During the downtime window, the proxy will return 404 for all requests to bucket test. After the downtime is complete, the proxy will start redirecting requests to storage two. Let's verify:

# the --wide flag provides details about downtime status transitions
$ chorctl repl switch --wide
USER BUCKET FROM TO STATUS LAST_STARTED DONE
user1 test one two Done 9m 8m
2025-03-26T19:36:08+01:00 | not_started -> in_progress: downtime window started
2025-03-26T19:36:18+01:00 | in_progress -> check_in_progress: queue is drained, start buckets content check
2025-03-26T19:36:28+01:00 | check_in_progress -> done: switch done

Verify that proxy redirects requests to storage two:

# create new object using proxy
echo "very new content" | s3cmd put - s3://test/very.txt -c proxy.conf

# verify that object landed only on storage two
$ s3cmd get s3://test/very.txt --quiet -c two.conf -
very new content

$ s3cmd get s3://test/very.txt --quiet -c one.conf -
ERROR: Parameter problem: Source object 's3://test/very.txt' does not exist.

No new events appeared for replication:

$ chorctl repl
NAME PROGRESS SIZE OBJECTS EVENTS PAUSED AGE HAS_SWITCH
user1:test:one->two [##########] 100.0 % 13 B/13 B 1/1 3/3 false 43m true

At this point, the migration is done. Proceed to the next bucket or decommission storage one.

Admin UI

TODO: describe admin web UI