Usage
S3 migration
See the S3 Migration blog post for more detail on use cases and the Chorus implementation.
This document provides repeatable, executable, step-by-step instructions for migrating data between S3 storage systems using Chorus.
The goal is to enable users to follow these steps on a local machine, inspect configurations, and test APIs. The Chorus standalone binary has been chosen to provide a straightforward and accessible option for this purpose.
Alternatively, the Docker Compose setup can be used to gain insight into running Chorus across multiple hosts in a production-like environment.
Prerequisites
- Install the S3 CLI client s3cmd.
- Install the Chorus standalone binary chorus: installation instructions.
- Install the Chorus management CLI chorctl: installation instructions.
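A quick way to confirm that all three tools are installed and on your PATH is to print their version or help output (a minimal sanity check; the exact output will vary by version):
# s3cmd prints its version, chorus and chorctl print their help text
$ s3cmd --version
$ chorus -h
$ chorctl -h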
Step 1: Set up Chorus
The standalone binary is a playground version of Chorus. It hosts all required components on different ports and stores all data in memory.
Run the chorus binary to start the standalone version of Chorus and inspect the output.
$ chorus
_________ .__
\_ ___ \| |__ ___________ __ __ ______
/ \ \/| | \ / _ \_ __ \ | \/ ___/
\ \___| Y ( <_> ) | \/ | /\___ \
\______ /___| /\____/|__| |____//____ >
\/ \/ \/
S3 Proxy URL: http://127.0.0.1:9669
S3 Proxy Credentials (AccessKey|SecretKey):
- user1: [testKey1|testSecretKey1]
- user2: [testKey2|testSecretKey2]
GRPC mgmt API: 127.0.0.1:9670
HTTP mgmt API: http://127.0.0.1:9671
Redis URL: 127.0.0.1:59156
Storage list:
- [FAKE] one: http://127.0.0.1:9680 < MAIN
- [FAKE] two: http://127.0.0.1:9681
- The Chorus S3 proxy is on port 9669, with two users: user1 and user2. By default, the Chorus proxy forwards all S3 requests to the [MAIN] storage (one).
- The Chorus management GRPC API is on port 9670 and the HTTP API is on port 9671. The GRPC API is used by chorctl; the HTTP API is used by the Web UI. The Web UI can be hosted separately with Docker. For more on the Web UI, see the Chorus GitHub repo.
- Redis is on the random port 59156.
- Two in-memory S3 storage volumes named one and two are on ports 9680 and 9681. Storage one is marked as MAIN. Fake storage volumes can be called directly with any secret key and access key (see the example below).
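For example, fake storage one can be queried directly with s3cmd command-line flags instead of a config file. Because the storage is fake, the access and secret key values below are arbitrary placeholders:
# any credentials are accepted by the in-memory fake storage
$ s3cmd ls --host=127.0.0.1:9680 --host-bucket=127.0.0.1:9680 \
    --access_key=anyKey --secret_key=anySecret --no-ssl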
The Chorus process will keep running in the foreground and print logs to the console.
Print help with chorus -h to learn how to view and modify the standalone
configuration.
It is not necessary to change the standalone configuration to follow this guide, but it can be instructive to inspect the full configuration and experiment with it.
Step 2: Set up Clients
By default, chorctl expects the GRPC API on localhost:9670. Run the command
chorctl storage to see the list of storages and their statuses:
$ chorctl storage
NAME ADDRESS PROVIDER USERS
one [MAIN] http://127.0.0.1:9680 Other user1,user2
two http://127.0.0.1:9681 Other user1,user2
Print help with chorctl -h to learn about available commands, flags, and
configs.
The following commands create three s3cmd config files: proxy.conf for the Chorus proxy, and one.conf and two.conf for the fake storages one and two:
cat << EOF > proxy.conf
use_https = false
host_base = 127.0.0.1:9669
host_bucket = 127.0.0.1:9669
access_key = testKey1
secret_key = testSecretKey1
EOF
cat << EOF > one.conf
use_https = false
host_base = 127.0.0.1:9680
host_bucket = 127.0.0.1:9680
access_key = testKey1
secret_key = testSecretKey1
EOF
cat << EOF > two.conf
use_https = false
host_base = 127.0.0.1:9681
host_bucket = 127.0.0.1:9681
access_key = testKey1
secret_key = testSecretKey1
EOF
With these config files in place, s3cmd can interact with the Chorus proxy:
# list buckets
$ s3cmd ls -c proxy.conf
# create bucket
$ s3cmd mb s3://test -c proxy.conf
Bucket 's3://test/' created
# List again to see the new bucket
$ s3cmd ls -c proxy.conf
2025-03-26 17:16 s3://test
By default, the proxy forwards all requests to the Main storage (one).
Verify that the bucket called test was created only on the storage called one:
# exists in one
$ s3cmd ls -c one.conf
2025-03-26 17:16 s3://test
# missing in two
$ s3cmd ls -c two.conf
PUT a file in the test bucket:
echo "some content" | s3cmd put - s3://test/file.txt -c proxy.conf
Verify the GET output for all three S3 endpoints:
# proxy returns object from main
$ s3cmd get s3://test/file.txt --quiet -c proxy.conf -
some content
# the same content is on main
$ s3cmd get s3://test/file.txt --quiet -c one.conf -
some content
# the bucket does not exist on two
$ s3cmd get s3://test/file.txt --quiet -c two.conf -
ERROR: Parameter problem: Source object 's3://test/file.txt' does not exist.
Use the s3cmd sync command to quickly upload many local files to a bucket through the proxy:
s3cmd sync ./local-dir/ s3://your-bucket/remote-dir/ -c proxy.conf
Step 3: Start bucket replication
The proxy is configured to route requests to storage one. The storage has
the bucket test, which contains the file file.txt.
Use chorctl to start the process of replicating data in the bucket test
from storage one to storage two:
# no replication exists:
$ chorctl repl
NAME PROGRESS SIZE OBJECTS EVENTS PAUSED AGE HAS_SWITCH
# check buckets available for replication for the given source and destination:
$ chorctl repl buckets --user=user1 --from=one --to=two
test
# start replication for bucket test from one to two
$ chorctl repl add --user=user1 --from=one --to=two --bucket=test
# check replication status
$ chorctl repl
NAME PROGRESS SIZE OBJECTS EVENTS PAUSED AGE HAS_SWITCH
user1:test:one->two [##########] 100.0 % 13 B/13 B 1/1 0/0 false 57s false
The output of this last command indicates that initial replication is done and that the object was replicated from storage one to storage two.
Use the command chorctl dash to view the live replication dashboard in the terminal.
Verify that the object is present and available on storage two:
$ s3cmd get s3://test/file.txt --quiet -c two.conf -
some content
From now on, all changes to the test bucket must be made through the Chorus
proxy. Otherwise, the changes will not be replicated to storage two.
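To see this caveat in action, you could write an object directly to storage one, bypassing the proxy, and observe that it never appears on storage two. If you try this, delete the object again afterwards: a content mismatch between the storages may interfere with the final check during the switch in Step 4. The object name below is only an illustration:
# written directly to storage one, bypassing the proxy
echo "unreplicated" | s3cmd put - s3://test/direct.txt -c one.conf
# the object does not appear on storage two
$ s3cmd get s3://test/direct.txt --quiet -c two.conf -
# clean up so both storages stay in sync before the switch
$ s3cmd del s3://test/direct.txt -c one.conf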
Let's see what happens when we PUT a new file to the test bucket using the Chorus proxy:
echo "new content" | s3cmd put - s3://test/new.txt -c proxy.conf
See if the new object was detected in replication status:
$ chorctl repl
NAME PROGRESS SIZE OBJECTS EVENTS PAUSED AGE HAS_SWITCH
user1:test:one->two [##########] 100.0 % 13 B/13 B 1/1 3/3 false 11m false
The output shows 3/3 events because Chorus created extra tasks to sync object metadata and tags. The PROGRESS, SIZE, and OBJECTS fields show the status of initial replication, that is, the replication of objects that existed in the source bucket before replication was started. The EVENTS column shows the number of detected and processed events that were created during replication.
According to the output, the new object should already be available on storage
two:
# access directly from storage one
$ s3cmd get s3://test/new.txt --quiet -c one.conf -
new content
# access directly from storage two
$ s3cmd get s3://test/new.txt --quiet -c two.conf -
new content
Try to update or remove objects in the test bucket by using the proxy and see
how changes are replicated to storage two.
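For example, overwriting an existing object through the proxy should be replicated in the same way; note that doing so will also increase the EVENTS counters shown in later chorctl output:
# overwrite the object through the proxy
echo "updated content" | s3cmd put - s3://test/new.txt -c proxy.conf
# once the change is replicated, both storages should return the updated content
$ s3cmd get s3://test/new.txt --quiet -c one.conf -
$ s3cmd get s3://test/new.txt --quiet -c two.conf -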
Step 4: Switch from storage one to storage two
Here's a short recap of the current state:
- Storage one is set as Main, which makes the Proxy forward all requests to it.
- Bucket replication is enabled for the bucket test, and replication is occurring from storage one to two.
- During replication, all existing bucket data was copied to storage two as the initial migration phase.
- All new changes made to the bucket test via the Proxy are automatically replicated to storage two in the background.
To migrate from storage one to storage two, the replicated bucket must be
"switched". After the switch, the proxy redirects all requests for the bucket
test to storage two and migration is then completed.
This guide covers a switch with a downtime window. For other available strategies, see the S3 Migration blog post.
The earlier call to chorctl repl shows that the replication does not have a switch enabled (see the HAS_SWITCH column).
$ chorctl repl
NAME PROGRESS SIZE OBJECTS EVENTS PAUSED AGE HAS_SWITCH
user1:test:one->two [##########] 100.0 % 13 B/13 B 1/1 3/3 false 11m false
# list existing switches
$ chorctl repl switch
USER BUCKET FROM TO STATUS LAST_STARTED DONE
Create a replication switch with a downtime window for bucket test from
storage one to two.
chorctl repl switch downtime --user=user1 --from=one --to=two --bucket=test --cron='* * * * *'
The switch must be created for an existing replication and must use the same parameters as the replication itself (user, from, to, bucket).
Chorus will attempt to start the downtime window every minute, as specified by the cron expression * * * * *. If Chorus determines that it can process all pending events during the downtime window and the check confirms that migration was successful, it will complete the switch.
Chorus provides a rich API for planning downtime. Run chorctl repl switch downtime -h to explore downtime switch options and examples.
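For example, assuming the --cron flag accepts any standard five-field cron expression, the downtime attempt could be scheduled for 02:00 every night instead of every minute:
chorctl repl switch downtime --user=user1 --from=one --to=two --bucket=test --cron='0 2 * * *'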
The OpenAPI specification also contains detailed information about the switch API.
During downtime, the proxy will return 404 for all requests to the bucket
test. After downtime is completed, the proxy starts redirecting requests
to storage two. Let's verify this:
# the --wide flag provides details about downtime status transitions
$ chorctl repl switch --wide
USER BUCKET FROM TO STATUS LAST_STARTED DONE
user1 test one two Done 9m 8m
2025-03-26T19:36:08+01:00 | not_started -> in_progress: downtime window started
2025-03-26T19:36:18+01:00 | in_progress -> check_in_progress: queue is drained, start buckets content check
2025-03-26T19:36:28+01:00 | check_in_progress -> done: switch done
Verify that the proxy redirects requests to storage two:
# create a new object using the proxy
echo "very new content" | s3cmd put - s3://test/very.txt -c proxy.conf
# verify that the object landed only on storage two
$ s3cmd get s3://test/very.txt --quiet -c two.conf -
very new content
$ s3cmd get s3://test/very.txt --quiet -c one.conf -
ERROR: Parameter problem: Source object 's3://test/very.txt' does not exist.
No new events appeared for replication:
$ chorctl repl
NAME PROGRESS SIZE OBJECTS EVENTS PAUSED AGE HAS_SWITCH
user1:test:one->two [##########] 100.0 % 13 B/13 B 1/1 3/3 false 43m true
Migration is now done. Proceed to the next bucket or decommission storage
one.
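As a final check before decommissioning, the bucket listing returned by the proxy should match a listing taken directly from storage two:
$ s3cmd ls s3://test/ -c proxy.conf
$ s3cmd ls s3://test/ -c two.conf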
Admin UI
TODO: describe admin web UI