2 posts tagged with "chorus"

S3 migration with Chorus

· 10 min read
Artem Torubarov
Software Engineer at Clyso

An S3 system holds data; call it source. It keeps applications running, but migration is needed, maybe due to scale limits or costs creeping up. A new S3 setup, call it target, is meant to replace it. The challenge is to move all data from source to target with no downtime, no data lost, and no breaks for the apps using source. What can get this done?

What Are the Challenges?

Migrating S3 data brings several key difficulties:

  • Data and Metadata Consistency: It’s essential to make sure that all data is copied correctly: no objects lost or corrupted. Besides this, applications relying on metadata (e.g., ACLs, versions, timestamps) need to keep working as expected, so the metadata has to be spot-on too.
  • Ongoing Writes: Applications don’t stop writing to source storage during migration, and all that new data needs to reach target storage too. Synchronous replication sounds good but can get slow and messy: what’s the right response to a user if a PUT works on source storage but flops on target? (A minimal sketch of that dilemma follows this list.) The alternative is downtime: pause writes to source storage and copy everything at once. For some applications, though, downtime just isn’t an option.
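
To make that dilemma concrete, here is a minimal Go sketch of a synchronous dual write using the AWS SDK for Go v2. The names (dualWrite, src, dst) are made up for illustration, and this is not how Chorus approaches the problem; it only shows why neither answer to a failed target PUT feels right.

```go
// A sketch of the synchronous dual-write dilemma, not Chorus code.
package migrate

import (
	"bytes"
	"context"
	"fmt"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

// dualWrite replicates a PUT to both backends before answering the caller.
func dualWrite(ctx context.Context, src, dst *s3.Client, bucket, key string, body []byte) error {
	put := func(c *s3.Client) error {
		_, err := c.PutObject(ctx, &s3.PutObjectInput{
			Bucket: aws.String(bucket),
			Key:    aws.String(key),
			Body:   bytes.NewReader(body),
		})
		return err
	}

	if err := put(src); err != nil {
		// Source failed: reporting an error is safe, nothing has diverged.
		return err
	}
	if err := put(dst); err != nil {
		// Source succeeded but target failed. Returning an error claims a
		// write "failed" that is actually durable on source; returning nil
		// silently leaves the two stores out of sync. Neither answer is
		// right, which is what makes synchronous replication messy.
		return fmt.Errorf("source and target diverged for %s/%s: %w", bucket, key, err)
	}
	return nil
}
```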

These issues don’t stand alone—they tangle together. Verifying data integrity gets tricky when ongoing writes shift the dataset mid-migration.

Regarding Tools and Strategies

Two high-level approaches can tackle these challenges:

  • Do It Bucket by Bucket: This approach leans on a canary deployment strategy. Copy one bucket, switch the application to target storage, and check if everything runs as expected. If it does, move on to the next bucket; if not, flip back to source storage and dig into the problem. It cuts downtime too—copy a bucket at a time and switch the application only when all data’s in place.
  • Do It in Two Phases: Say a bucket holds 10 million objects, and copying takes a day. During that time, ongoing writes mean about 5% of objects get added, updated, or removed. So, copy all the data once without stopping writes. Then, use a short downtime to figure out which objects changed and copy just those. Call the first pass initial replication and the second event replication (a rough sketch follows this list).
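
To make the two-phase idea concrete, here is a rough sketch of the initial replication pass using the AWS SDK for Go v2. The function and client names are hypothetical, objects are buffered in memory for brevity, and Chorus itself hands the actual copying off to Rclone; the point is only the shape of the loop and where event replication picks up.

```go
// A sketch of the "initial replication" pass, not Chorus code.
package migrate

import (
	"bytes"
	"context"
	"io"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

// initialReplication copies every object currently in the bucket while
// applications keep writing to the source.
func initialReplication(ctx context.Context, src, dst *s3.Client, bucket string) error {
	pager := s3.NewListObjectsV2Paginator(src, &s3.ListObjectsV2Input{
		Bucket: aws.String(bucket),
	})
	for pager.HasMorePages() {
		page, err := pager.NextPage(ctx)
		if err != nil {
			return err
		}
		for _, obj := range page.Contents {
			got, err := src.GetObject(ctx, &s3.GetObjectInput{
				Bucket: aws.String(bucket),
				Key:    obj.Key,
			})
			if err != nil {
				return err
			}
			// Buffered in memory for brevity; a real tool streams, copies in
			// parallel, and carries metadata, ACLs, and versions across too.
			data, err := io.ReadAll(got.Body)
			got.Body.Close()
			if err != nil {
				return err
			}
			if _, err := dst.PutObject(ctx, &s3.PutObjectInput{
				Bucket: aws.String(bucket),
				Key:    obj.Key,
				Body:   bytes.NewReader(data),
			}); err != nil {
				return err
			}
		}
	}
	// Event replication would then re-copy only the keys that changed while
	// this pass was running, ideally during a short write pause.
	return nil
}
```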

These ideas sound simple, but execution isn’t. Questions arise:

  • If applications expect one URL for all buckets, how does bucket-by-bucket work?
  • How can writes to one bucket be stopped for downtime?
  • How are changes (aka events) tracked for event replication?
  • How can 10 million objects be copied fast enough?
  • How does this scale to 10,000 buckets automatically?

Now, let’s see how Chorus handles these challenges and strategies.

Open-sourcing the Chorus project

· 4 min read
Artem Torubarov
Software Engineer at Clyso

Today, we're excited to share that we've released the Chorus project under the Apache 2.0 License. In this blog post, let's talk about what Chorus is and why we made it.

At Clyso, we frequently assist our customers in migrating infrastructure, whether to or from the cloud, or between different cloud providers. Our focus often centers around storage, particularly S3.

Like many others in the field, we initially relied on the fantastic Rclone tool, which excelled at the task. However, as we encountered challenges while attempting to migrate a 100TB bucket with 100M objects, we recognized the need for an additional layer of automation. Migrating large buckets within a reasonable timeframe requires a machine with substantial RAM and network bandwidth to take advantage of the parallelism options provided by Rclone.

Yet, even with powerful machines, the risk of network problems or VM restarts interrupting the synchronization process remained. While Rclone handles restarts admirably by comparing object size, ETag, and modification time, the process becomes time-consuming and incurs additional costs for cloud-based S3, especially with very large buckets.

The missing piece in our puzzle was the ability to run Rclone on multiple machines for improved hardware utilization, and the ability to track and store progress in remote persistent storage. With these goals in mind, we developed Chorus: vendor-agnostic S3 backup, replication, and routing software. Written in Go, Chorus uses Rclone for S3 object copying, Redis for progress tracking, and the Asynq work queue for load distribution across multiple machines.
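
To give a feel for how those pieces fit together, here is a rough sketch (not Chorus's actual code) of the work-distribution pattern: per-object copy jobs go onto an Asynq queue backed by Redis, and any number of worker machines consume them. The task name, payload, and Redis address are made up for illustration, and the copy step itself is left as a stub where an Rclone-style routine would do the real work.

```go
// A sketch of Redis-backed work distribution with Asynq, not Chorus code.
package main

import (
	"context"
	"encoding/json"
	"log"

	"github.com/hibiken/asynq"
)

type copyPayload struct {
	Bucket string `json:"bucket"`
	Key    string `json:"key"`
}

const taskTypeCopy = "migration:copy_object"

var redisOpt = asynq.RedisClientOpt{Addr: "localhost:6379"}

// enqueueCopy is called by the process that lists the source bucket; it only
// records the work to be done, so it needs little RAM or bandwidth itself.
func enqueueCopy(client *asynq.Client, bucket, key string) error {
	payload, err := json.Marshal(copyPayload{Bucket: bucket, Key: key})
	if err != nil {
		return err
	}
	_, err = client.Enqueue(asynq.NewTask(taskTypeCopy, payload), asynq.MaxRetry(5))
	return err
}

// handleCopy runs on every worker machine; the actual byte shuffling would be
// delegated to an Rclone-style copy routine.
func handleCopy(ctx context.Context, t *asynq.Task) error {
	var p copyPayload
	if err := json.Unmarshal(t.Payload(), &p); err != nil {
		return err
	}
	log.Printf("copying %s/%s from source to target", p.Bucket, p.Key)
	return nil // a returned error makes Asynq retry the task later
}

func main() {
	srv := asynq.NewServer(redisOpt, asynq.Config{Concurrency: 20})
	mux := asynq.NewServeMux()
	mux.HandleFunc(taskTypeCopy, handleCopy)
	if err := srv.Run(mux); err != nil {
		log.Fatal(err)
	}
}
```

Because the queue and its state live in Redis, a crashed worker or restarted VM loses at most the task it was processing, which Asynq retries; the listing process does not have to start from scratch.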