# S3 migration
Please read the S3 Migration blog post for additional details on the use case and Chorus implementation.
This document provides repeatable, executable, step-by-step instructions for migrating data between S3 storage systems using Chorus.
The goal is to enable users to follow these steps on a local machine, inspect configurations, and test APIs. The Chorus standalone binary is chosen as a straightforward and accessible option for this purpose.
Alternatively, the Docker Compose setup can be used to gain insight into running Chorus across multiple hosts in a production-like environment.
## Prerequisites

- Install the S3 CLI client `s3cmd`.
- Install the Chorus standalone binary `chorus` (see installation instructions).
- Install the Chorus management CLI `chorctl` (see installation instructions).
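Once installed, a quick sanity check (a sketch, assuming all three binaries are on your `PATH`) can confirm the tools are available:

```shell
# confirm all three tools are installed and on PATH
s3cmd --version
chorus -h
chorctl -h
```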
## Step 1: Setup Chorus

The standalone binary is a playground version of Chorus. It hosts all required components on different ports and stores all data in memory.

Run the `chorus` binary to start the Chorus standalone version and inspect the output:
```shell
$ chorus
 _________ .__
 \_   ___ \|  |__   ___________ __ __  ______
 /    \  \/|  |  \ /  _ \_  __ \  |  \/  ___/
 \     \___|   Y  (  <_> )  |  \/  |  /\___ \
  \______  /___|  /\____/|__|   |____//____  >
         \/     \/                         \/

S3 Proxy URL: http://127.0.0.1:9669
S3 Proxy Credentials (AccessKey|SecretKey):
- user1: [testKey1|testSecretKey1]
- user2: [testKey2|testSecretKey2]
GRPC mgmt API: 127.0.0.1:9670
HTTP mgmt API: http://127.0.0.1:9671
Redis URL: 127.0.0.1:59156
Storage list:
- [FAKE] one: http://127.0.0.1:9680 < MAIN
- [FAKE] two: http://127.0.0.1:9681
```
- The Chorus S3 proxy is on port `9669` with two users, `user1` and `user2`. By default, the Chorus proxy forwards all S3 requests to the `MAIN` storage (`one`).
- The Chorus management GRPC API is on port `9670` and the HTTP API is on port `9671`. The GRPC API is used by the `chorctl` CLI, the HTTP API by the Web UI. The Web UI can be hosted separately with Docker; please refer to the GitHub repo.
- Redis is on a random port (`59156` in this example).
- Two in-memory S3 storages named `one` and `two` are on ports `9680` and `9681`. Storage `one` is marked as `MAIN`. Fake storages can be called directly with any access key and secret key.
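Since the fake storages accept any credentials, they can be queried directly; for example, listing buckets on storage `one` with made-up keys (a sketch assuming the default ports from the output above):

```shell
# list buckets on fake storage "one" using arbitrary credentials
s3cmd ls --no-ssl --host=127.0.0.1:9680 --host-bucket=127.0.0.1:9680 \
  --access_key=anyKey --secret_key=anySecret
```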
The `chorus` process will keep running in the foreground and print logs to the console.

Print help with `chorus -h` to learn how to view and modify the standalone configuration. Changing the configuration is not required to follow this guide, but it may be interesting to inspect the full config and experiment with it.
## Step 2: Setup Clients

By default, `chorctl` expects the GRPC API on `localhost:9670`. Run `chorctl storage` to see the list of storages and their status:
```shell
$ chorctl storage
NAME         ADDRESS                  PROVIDER   USERS
one [MAIN]   http://127.0.0.1:9680    Other      user1,user2
two          http://127.0.0.1:9681    Other      user1,user2
```
Print help with `chorctl -h` to learn about available commands, flags, and configs.

The following commands create config files `proxy.conf`, `one.conf`, and `two.conf` for `s3cmd` to call the Chorus proxy and each fake storage directly:
```shell
cat << EOF > proxy.conf
use_https = false
host_base = 127.0.0.1:9669
host_bucket = 127.0.0.1:9669
access_key = testKey1
secret_key = testSecretKey1
EOF

cat << EOF > one.conf
use_https = false
host_base = 127.0.0.1:9680
host_bucket = 127.0.0.1:9680
access_key = testKey1
secret_key = testSecretKey1
EOF

cat << EOF > two.conf
use_https = false
host_base = 127.0.0.1:9681
host_bucket = 127.0.0.1:9681
access_key = testKey1
secret_key = testSecretKey1
EOF
```
Now `s3cmd` with a config file can be used to interact with the Chorus proxy:
```shell
# list buckets
$ s3cmd ls -c proxy.conf

# create a bucket
$ s3cmd mb s3://test -c proxy.conf
Bucket 's3://test/' created

# list again to see the new bucket
$ s3cmd ls -c proxy.conf
2025-03-26 17:16  s3://test
```
By default, the proxy forwards all requests to the `MAIN` storage (`one`). Let's verify that bucket `test` was created only on storage `one`:
```shell
# exists in one
$ s3cmd ls -c one.conf
2025-03-26 17:16  s3://test

# missing in two
$ s3cmd ls -c two.conf
```
Now, let's `PUT` a file into the `test` bucket:
```shell
echo "some content" | s3cmd put - s3://test/file.txt -c proxy.conf
```
Verify the `GET` output for all three S3 endpoints:
```shell
# proxy returns the object from main
$ s3cmd get s3://test/file.txt --quiet -c proxy.conf -
some content

# the same content is on main
$ s3cmd get s3://test/file.txt --quiet -c one.conf -
some content

# the bucket does not exist on two
$ s3cmd get s3://test/file.txt --quiet -c two.conf -
ERROR: Parameter problem: Source object 's3://test/file.txt' does not exist.
```
Use `s3cmd sync` to quickly upload local files to an S3 bucket:

```shell
s3cmd sync ./local-dir/ s3://your-bucket/remote-dir/
```
## Step 3: Start bucket replication

The proxy is configured to route requests to storage `one`. The storage has bucket `test` with file `file.txt`.

Use `chorctl` to start data replication for bucket `test` from storage `one` to `two`:
```shell
# no replication exists yet:
$ chorctl repl
NAME   PROGRESS   SIZE   OBJECTS   EVENTS   PAUSED   AGE   HAS_SWITCH

# check buckets available for replication for the given source and destination:
$ chorctl repl buckets --user=user1 --from=one --to=two
test

# start replication for bucket test from one to two
$ chorctl repl add --user=user1 --from=one --to=two --bucket=test

# check replication status
$ chorctl repl
NAME                  PROGRESS               SIZE        OBJECTS   EVENTS   PAUSED   AGE   HAS_SWITCH
user1:test:one->two   [##########] 100.0 %   13 B/13 B   1/1       0/0      false    57s   false
```
The last command's output indicates that the initial replication is done and our object was replicated to storage `two`.

Use `chorctl dash` to view a live replication dashboard in the terminal.
Now, verify that the object is available on storage `two`:

```shell
$ s3cmd get s3://test/file.txt --quiet -c two.conf -
some content
```
From now on, all changes to the `test` bucket must be made through the Chorus proxy. Otherwise, the changes will not be replicated to storage `two`.
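To illustrate, an object uploaded directly to storage `one` (bypassing the proxy) is invisible to replication; the object name `direct.txt` here is just for illustration:

```shell
# upload directly to storage one, bypassing the Chorus proxy
echo "unreplicated" | s3cmd put - s3://test/direct.txt -c one.conf

# the object exists on one but will never appear on two,
# because the proxy never saw the request
s3cmd ls s3://test -c one.conf
s3cmd ls s3://test -c two.conf
```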
Let's see what happens when we `PUT` a new file to the `test` bucket using the Chorus proxy:
```shell
echo "new content" | s3cmd put - s3://test/new.txt -c proxy.conf
```
Check whether the new object was detected in the replication status:
```shell
$ chorctl repl
NAME                  PROGRESS               SIZE        OBJECTS   EVENTS   PAUSED   AGE   HAS_SWITCH
user1:test:one->two   [##########] 100.0 %   13 B/13 B   1/1       3/3      false    11m   false
```
The output shows `3/3` events because Chorus created extra tasks to sync object metadata and tags.

The `PROGRESS`, `SIZE`, and `OBJECTS` fields show the status of the initial replication, i.e. the replication of objects that existed in the source bucket before replication was started. The `EVENTS` column shows the number of events that were detected and processed during replication.

According to the output, the new object should already be available on storage `two`:
```shell
# access directly from storage one
$ s3cmd get s3://test/new.txt --quiet -c one.conf -
new content

# access directly from storage two
$ s3cmd get s3://test/new.txt --quiet -c two.conf -
new content
```
Try updating or removing objects in the `test` bucket using the proxy and see how the changes are replicated to storage `two`.
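For example, a delete issued through the proxy should also disappear from storage `two` shortly after (replication is asynchronous, so allow a moment before checking):

```shell
# delete an object through the proxy
s3cmd del s3://test/new.txt -c proxy.conf

# after replication catches up, the object is gone from both storages
s3cmd ls s3://test -c one.conf
s3cmd ls s3://test -c two.conf
```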
## Step 4: Switch from storage one to two

A short recap of the current state:

- Storage `one` is set as `MAIN`, making the proxy forward all requests to it.
- Bucket replication is enabled for bucket `test` from storage `one` to `two`.
- During replication, all existing bucket data was copied to storage `two` in the initial migration phase.
- All new changes made to bucket `test` via the proxy are automatically replicated to storage `two` in the background.
To migrate from storage `one` to `two`, the replicated bucket must be "switched". After the switch, the proxy will redirect all requests for bucket `test` to storage `two`, and the migration will be complete.

This guide covers a switch with a downtime window. For other available strategies, please read the S3 migration blog post.
The previous call to `chorctl repl` shows that the replication does not have a switch enabled (see column `HAS_SWITCH`):

```shell
$ chorctl repl
NAME                  PROGRESS               SIZE        OBJECTS   EVENTS   PAUSED   AGE   HAS_SWITCH
user1:test:one->two   [##########] 100.0 %   13 B/13 B   1/1       3/3      false    11m   false

# list existing switches
$ chorctl repl switch
USER   BUCKET   FROM   TO   STATUS   LAST_STARTED   DONE
```
Now, create a replication switch with a downtime window for bucket `test` from storage `one` to `two`:

```shell
chorctl repl switch downtime --user=user1 --from=one --to=two --bucket=test --cron='* * * * *'
```
A switch must be created for an existing replication, with the same parameters used for the replication creation (user, from, to, bucket).

Chorus will attempt to start downtime every minute according to the cron expression `* * * * *`. If it is able to process all events during the downtime and confirm that the migration was successful, it will complete the switch.
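Any standard cron expression can be used, so downtime attempts can be limited to off-peak hours. For example, to attempt the switch only at 02:00 each night (same flags as above, only the schedule differs):

```shell
# attempt the downtime switch only at 02:00 every night
chorctl repl switch downtime --user=user1 --from=one --to=two \
  --bucket=test --cron='0 2 * * *'
```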
Chorus provides a rich API for planning downtime. Run `chorctl repl switch downtime -h` to explore downtime switch options and examples. The OpenAPI spec also contains detailed information about the switch API.
During the downtime, the proxy will return `404` for all requests to bucket `test`. After the downtime is completed, the proxy will start redirecting requests to storage `two`. Let's verify:
```shell
# the --wide flag provides details about downtime status transitions
$ chorctl repl switch --wide
USER    BUCKET   FROM   TO    STATUS   LAST_STARTED   DONE
user1   test     one    two   Done     9m             8m
  2025-03-26T19:36:08+01:00 | not_started -> in_progress: downtime window started
  2025-03-26T19:36:18+01:00 | in_progress -> check_in_progress: queue is drained, start buckets content check
  2025-03-26T19:36:28+01:00 | check_in_progress -> done: switch done
```
Verify that the proxy redirects requests to storage `two`:

```shell
# create a new object using the proxy
echo "very new content" | s3cmd put - s3://test/very.txt -c proxy.conf

# verify that the object landed only on storage two
$ s3cmd get s3://test/very.txt --quiet -c two.conf -
very new content
$ s3cmd get s3://test/very.txt --quiet -c one.conf -
ERROR: Parameter problem: Source object 's3://test/very.txt' does not exist.
```
No new events appeared for the replication:

```shell
$ chorctl repl
NAME                  PROGRESS               SIZE        OBJECTS   EVENTS   PAUSED   AGE   HAS_SWITCH
user1:test:one->two   [##########] 100.0 %   13 B/13 B   1/1       3/3      false    43m   true
```
At this point the migration is done. Proceed to the next bucket or decommission storage `one`.
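The same flow can be repeated for each remaining bucket; the bucket name `photos` below is hypothetical:

```shell
# list buckets still available for replication
chorctl repl buckets --user=user1 --from=one --to=two

# start replication and, once it catches up, schedule the switch
chorctl repl add --user=user1 --from=one --to=two --bucket=photos
chorctl repl switch downtime --user=user1 --from=one --to=two \
  --bucket=photos --cron='* * * * *'
```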
## Admin UI

TODO: describe admin web UI