It’s no secret: moving data at scale can be a nightmare! Copying S3 buckets that hold millions of existing objects, or petabytes of data, has traditionally meant deploying and managing multiple compute clusters just to get the job done.
But it seems those days are numbered as AWS roll out the red carpet for AWS DataSync advancements.
Tell me more about AWS DataSync
You got it! AWS DataSync makes it easy to move data over the network between on-premises storage and AWS storage services (as well as between AWS storage services themselves). It also provides end-to-end security, including encryption and integrity validation, as part of every transfer.
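If you want to confirm that the integrity validation actually passed, here’s a minimal Python sketch. It assumes a boto3 DataSync client (`boto3.client("datasync")`) and a task whose verification mode hasn’t been switched off; the helper name `wait_and_check_verification` and the 30-second polling interval are my own illustrative choices, not part of DataSync.

```python
# Minimal sketch: poll a DataSync task execution until it finishes, then check
# that the integrity-verification phase reported success.
# Assumption: `client` behaves like boto3.client("datasync").
import time
from typing import Any

# Execution states after which DataSync does no further work on this run.
TERMINAL_STATES = {"SUCCESS", "ERROR"}


def wait_and_check_verification(client: Any, execution_arn: str,
                                poll_seconds: float = 30.0) -> bool:
    """Return True only if the run succeeded and its verify step reported SUCCESS."""
    while True:
        desc = client.describe_task_execution(TaskExecutionArn=execution_arn)
        if desc["Status"] in TERMINAL_STATES:
            break
        # Still in progress (e.g. LAUNCHING, TRANSFERRING, VERIFYING): wait and re-check.
        time.sleep(poll_seconds)
    result = desc.get("Result", {})
    return desc["Status"] == "SUCCESS" and result.get("VerifyStatus") == "SUCCESS"
```

Polling keeps the sketch dependency-free; in a real pipeline you might prefer reacting to CloudWatch events instead of looping.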
For larger migrations, transfer times can vary drastically! Fortunately, DataSync uses a purpose-built network protocol and a parallel, multi-threaded architecture, making chunky data migrations speedy, secure and, most importantly, reliable!
Here’s a typical data transfer between on-premises storage and AWS using DataSync.
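In API terms, that on-premises-to-S3 flow boils down to four DataSync calls: create a source location (here an NFS share read through an on-premises agent), create an S3 destination location, create a task linking the two, then start a task execution. A minimal Python sketch, assuming a boto3 DataSync client; the hostname, paths, ARNs and helper function names are all hypothetical placeholders.

```python
# Minimal sketch of the on-premises -> Amazon S3 DataSync workflow.
# Assumptions: `client` behaves like boto3.client("datasync"); every hostname,
# path, ARN and helper name here is a hypothetical placeholder.
from typing import Any, Dict, List


def nfs_source_params(host: str, export_path: str,
                      agent_arns: List[str]) -> Dict[str, Any]:
    """CreateLocationNfs arguments: an on-prem NFS export read through a DataSync agent."""
    return {
        "ServerHostname": host,
        "Subdirectory": export_path,
        "OnPremConfig": {"AgentArns": agent_arns},
    }


def s3_destination_params(bucket_arn: str, role_arn: str,
                          prefix: str = "/") -> Dict[str, Any]:
    """CreateLocationS3 arguments: the bucket DataSync writes to, via an IAM role."""
    return {
        "S3BucketArn": bucket_arn,
        "Subdirectory": prefix,
        "S3Config": {"BucketAccessRoleArn": role_arn},
    }


def run_migration(client: Any, source: Dict[str, Any],
                  destination: Dict[str, Any], name: str) -> str:
    """Create both locations and a task, start one execution, return its ARN."""
    src_arn = client.create_location_nfs(**source)["LocationArn"]
    dst_arn = client.create_location_s3(**destination)["LocationArn"]
    task_arn = client.create_task(
        SourceLocationArn=src_arn,
        DestinationLocationArn=dst_arn,
        Name=name,
    )["TaskArn"]
    return client.start_task_execution(TaskArn=task_arn)["TaskExecutionArn"]
```

Keeping the parameter builders separate from the API calls makes them easy to review (or unit-test) before anything touches your account.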
Migration vs Archival
Whilst migration is DataSync’s main use case, it’s also handy for archiving cold data that often sits on expensive on-premises systems. AWS DataSync’s latest features allow data to be moved to durable, secure long-term storage such as Amazon S3 Glacier or Amazon S3 Glacier Deep Archive.
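For archival, the storage class is set on the S3 destination location itself, so objects land directly in the cold tier. Here’s a minimal sketch, assuming the output feeds a boto3 DataSync client’s `create_location_s3` and `create_task` calls; the bucket and role ARNs and the helper names are hypothetical. One caveat I’m taking from the DataSync docs (worth double-checking for your setup): full point-in-time verification isn’t supported when writing to the Glacier classes, so the options below only verify the files that were transferred.

```python
# Minimal sketch: an S3 destination that writes straight into an archive tier.
# Assumptions: arguments feed boto3.client("datasync").create_location_s3(...) and
# create_task(Options=...); bucket/role ARNs are hypothetical placeholders.
from typing import Any, Dict

# DataSync S3StorageClass values for S3 Glacier (Flexible Retrieval) and Deep Archive.
ARCHIVE_CLASSES = {"GLACIER", "DEEP_ARCHIVE"}


def archive_destination_params(bucket_arn: str, role_arn: str,
                               storage_class: str = "DEEP_ARCHIVE") -> Dict[str, Any]:
    """CreateLocationS3 arguments that store transferred objects in an archive class."""
    if storage_class not in ARCHIVE_CLASSES:
        raise ValueError(f"not an archive storage class: {storage_class}")
    return {
        "S3BucketArn": bucket_arn,
        "S3StorageClass": storage_class,
        "S3Config": {"BucketAccessRoleArn": role_arn},
    }


def archive_task_options() -> Dict[str, str]:
    """Task options for archive runs.

    Assumption: point-in-time verification of the whole destination isn't supported
    for Glacier classes, so only files transferred in this run are verified.
    """
    return {"VerifyMode": "ONLY_FILES_TRANSFERRED", "TransferMode": "CHANGED"}
```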
The key migration features have become well known since DataSync launched in 2018, but here’s a quick rundown if you’ve yet to get hands-on experience with it:
You can use DataSync to move active datasets rapidly over the network into Amazon S3, Amazon EFS or Amazon FSx for Windows File Server.
Use DataSync to make an initial copy of your entire dataset and to schedule subsequent incremental transfers of changing data until the final cut-over to AWS.
Finally, DataSync preserves metadata between storage systems that have similar metadata structures, enabling a smooth transition of end users and applications to your target AWS Storage service.
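The initial-copy-then-incremental pattern, plus metadata preservation, can be expressed as a single scheduled task. A minimal sketch of the task arguments, assuming they’re passed to a boto3 DataSync client as `client.create_task(**params)`; the task name and nightly schedule are hypothetical, and which metadata options actually take effect depends on what your source and destination support.

```python
# Minimal sketch: CreateTask arguments for an initial full copy followed by
# scheduled incremental runs until cut-over.
# Assumptions: passed as boto3.client("datasync").create_task(**params); the
# task name and nightly schedule are hypothetical.
from typing import Any, Dict


def incremental_task_params(source_arn: str, destination_arn: str,
                            schedule: str = "cron(0 2 * * ? *)") -> Dict[str, Any]:
    """Build a scheduled task that only moves what changed since the last run."""
    return {
        "SourceLocationArn": source_arn,
        "DestinationLocationArn": destination_arn,
        "Name": "incremental-until-cutover",
        "Schedule": {"ScheduleExpression": schedule},  # default: daily at 02:00 UTC
        "Options": {
            # Copy only data/metadata that differs from the destination.
            "TransferMode": "CHANGED",
            # Check the whole destination against the source after each run.
            "VerifyMode": "POINT_IN_TIME_CONSISTENT",
            # Carry POSIX metadata across where both locations support it.
            "Mtime": "PRESERVE",
            "Atime": "BEST_EFFORT",  # DataSync requires Mtime=PRESERVE for this
            "Uid": "INT_VALUE",
            "Gid": "INT_VALUE",
            "PosixPermissions": "PRESERVE",
        },
    }
```

Because `TransferMode` is `CHANGED`, the first run copies everything (the destination starts empty) and each scheduled run after that moves only the deltas, which is what lets you keep syncing right up to cut-over.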
Find out more
Be sure to visit the official AWS Migration blog for a step-by-step configuration of a DataSync task that copies objects from one S3 bucket to another without deploying an agent on EC2. Senior Solutions Architect Joe Viggiano goes into great detail on the process and provides additional steps to configure migration tasks for cross-Region and cross-account use cases. 👍
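For a taste of that agentless setup, here’s a minimal sketch assuming a boto3 DataSync client. Both S3 locations are created without any `AgentArns`, so no EC2 agent is involved. The bucket ARNs, the single IAM role (assumed to have access to both buckets) and the helper names are hypothetical, and cross-Region or cross-account copies need the extra configuration covered in the post.

```python
# Minimal sketch of an agentless S3 -> S3 DataSync copy.
# Assumptions: `client` behaves like boto3.client("datasync"); one IAM role can
# access both buckets; all ARNs and helper names are hypothetical.
from typing import Any, Dict


def s3_location_params(bucket_arn: str, role_arn: str,
                       prefix: str = "/") -> Dict[str, Any]:
    # No "AgentArns" key: DataSync reaches in-cloud S3 buckets without an agent.
    return {
        "S3BucketArn": bucket_arn,
        "Subdirectory": prefix,
        "S3Config": {"BucketAccessRoleArn": role_arn},
    }


def copy_bucket(client: Any, source_bucket_arn: str,
                dest_bucket_arn: str, role_arn: str) -> str:
    """Create two S3 locations and a task, start it, and return the execution ARN."""
    src = client.create_location_s3(**s3_location_params(source_bucket_arn, role_arn))
    dst = client.create_location_s3(**s3_location_params(dest_bucket_arn, role_arn))
    task = client.create_task(
        SourceLocationArn=src["LocationArn"],
        DestinationLocationArn=dst["LocationArn"],
        Name="s3-to-s3-copy",
    )
    return client.start_task_execution(TaskArn=task["TaskArn"])["TaskExecutionArn"]
```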