AWS DataSync is an online data transfer service that simplifies, automates, and accelerates moving data between on-premises storage systems and AWS Storage services, as well as between AWS Storage services. You can use DataSync to migrate active datasets to AWS, archive data to free up on-premises storage capacity, replicate data to AWS for business continuity, or transfer data to the cloud for analysis and processing.

Writing, maintaining, monitoring, and troubleshooting scripts to move large amounts of data can burden your IT operations and slow migration projects. DataSync eliminates or automatically handles this work for you. DataSync provides built-in security capabilities such as encryption of data in-transit, and data integrity verification in-transit and at-rest. It optimizes use of network bandwidth, and automatically recovers from network connectivity failures. In addition, DataSync provides control and monitoring capabilities such as data transfer scheduling and granular visibility into the transfer process through Amazon CloudWatch metrics, logs, and events.

DataSync can copy data between Network File System (NFS) shares, Server Message Block (SMB) shares, self-managed object storage, AWS Snowcone, Amazon Simple Storage Service (Amazon S3) buckets, Amazon Elastic File System (Amazon EFS) file systems, and Amazon FSx for Windows File Server file systems.

In this lab, we are going to make use of AWS DataSync. A typical deployment has the following architecture:

But for this lab, to make it easier, I'm going to run all the elements in the AWS Cloud. This means the NFS server and the AWS DataSync agent will be provided by AWS, so the communication between the DataSync agent and the DataSync service will happen entirely within the same AWS infrastructure instead of between AWS and on-premises.

Objective

We are going to upload some demo files to an S3 bucket and create an AWS DataSync task to copy the content to the NFS share.

Steps

  • Create a S3 bucket and upload some demo files
  • Create an Amazon Elastic File System (EFS)
  • Launch an AWS DataSync Agent in EC2
  • Associate the DataSync Agent to DataSync
  • Create a DataSync task
  • Create an EC2 instance to see the results of the task
  • Run the DataSync Task

Hands-On

  1. In the AWS Management Console, go to S3

2. Click on Create bucket

3. Give it a name, scroll down, and click on Create bucket

4. Click on the name of the bucket to open its details

5. Click on Upload

6. Click on Add files

7. Add some image files from your computer (any random files will do), scroll down, and click on Upload

8. If everything is OK, you will see a success message
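If you prefer the command line, steps 2 through 8 can also be sketched with the AWS CLI. The bucket name below is a placeholder and must be globally unique:

```shell
# Create the bucket (the name is a placeholder; S3 bucket names must be globally unique)
aws s3 mb s3://datasync-lab-demo-bucket

# Create a couple of demo files and upload them
echo "hello datasync" > demo1.txt
echo "hello again"    > demo2.txt
aws s3 cp demo1.txt s3://datasync-lab-demo-bucket/
aws s3 cp demo2.txt s3://datasync-lab-demo-bucket/

# Verify the upload
aws s3 ls s3://datasync-lab-demo-bucket/
```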

9. Go to EFS

10. Click on Create file system

11. Give it a name and click on Customize

12. Disable Automatic backups as this is for testing, and click on Next

13. In the next window, leave the defaults and click Next; copy the security group ID (you will need it later), then click Next

14. You can leave the Policy editor blank and click on Next

15. Review and click on Create
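Steps 10 through 15 have a CLI equivalent as well; this is a minimal sketch, where the name tag is a placeholder:

```shell
# Create the file system; the creation token makes the call idempotent
# (automatic backups can be disabled afterwards from the console)
aws efs create-file-system \
    --creation-token datasync-lab-efs \
    --tags Key=Name,Value=datasync-lab-efs
```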

16. Go to EC2

17. Go to Security groups

18. Select the security group associated with the EFS, open the Inbound rules tab, then click on Edit inbound rules

19. Click on Add rule

20. Add a rule allowing All traffic, select your VPC CIDR block as the source, and click on Save rules
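The same rule can be added from the CLI. The security group ID and CIDR block below are placeholders; use the EFS security group and your own VPC's CIDR block (strictly speaking, NFS only needs TCP port 2049, but the lab opens all traffic for simplicity):

```shell
# --protocol -1 means "all protocols"; group ID and CIDR are placeholders
aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol -1 \
    --cidr 172.31.0.0/16
```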

21. Now that our NFS server is up, it's time to create the AWS DataSync agent. There are several ways to create this agent; go to the page Deploy an AWS DataSync Agent and, in the section on launching a DataSync agent from EC2, you will find a command to look up the agent AMI for your AWS Region. Copy this command

22. In the AWS Management Console, click on the CloudShell icon, between the search bar and the bell icon

23. This will open a browser terminal in a new tab; wait until it is ready, then paste the command to get the AMI ID
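The command from that page reads the AMI ID from an SSM public parameter; here it is for the us-east-2 Region (swap in your own Region):

```shell
# Look up the latest DataSync agent AMI ID for your Region
aws ssm get-parameter \
    --name /aws/service/datasync/ami \
    --region us-east-2
```

The AMI ID appears in the `Value` field of the output.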

24. Go to EC2

25. Go to Instances

26. Click on Launch instances

27. Click on Community AMIs, paste the AMI ID in the search bar, and press Enter

28. Select the AMI

29. Select t2.micro and click on Next

30. In the next window you can leave the defaults and click Next

31. Leave the defaults here as well and click on Next

32. Do the same for Tags and click on Next

33. In Security groups, add the port 80 (HTTP) and click on Review and Launch

34. Review the information and click on Launch

35. In the next window, choose Create a new key pair, give it a name, download it and click on Launch Instances

36. If everything is ok, you will see a success message, click on View Instances
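Alternatively, the agent instance can be launched from the CLI; in this sketch, the AMI ID and security group ID are placeholders for the values from your own account:

```shell
# AMI ID and security group ID are placeholders;
# the AMI ID comes from the SSM lookup in step 23
aws ec2 run-instances \
    --image-id ami-0123456789abcdef0 \
    --instance-type t2.micro \
    --key-name datasync-keypair \
    --security-group-ids sg-0123456789abcdef0
```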

37. Select the instance, and copy the Public IP

38. If you see this message, it means the DataSync agent is waiting to be activated

39. Go to DataSync

40. In the menu on the left, click on Agents

41. Click on Create agent

42. Select Amazon EC2; under Service endpoint choose your Region (the whole lab needs to be in the same Region: EFS, DataSync agent, and DataSync), enter the instance's public IP as the Agent address, and click on Get key

43. If everything is OK, you will see the activation key; give the agent a name and click on Create agent
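Registering the agent can also be done in one CLI call; the activation key and agent name below are placeholders:

```shell
# Activation key and name are placeholders; the key is obtained
# from the agent's public IP address, as in step 42
aws datasync create-agent \
    --activation-key AAAAA-BBBBB-CCCCC-DDDDD-EEEEE \
    --agent-name datasync-lab-agent
```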

44. In another window, go to EFS, click on the EFS file system we just created, go to the Network tab, and copy one of the IP addresses

45. Now click on Create a task

46. On the next window, select Amazon S3 as Location type, select the bucket, and on IAM role click on Autogenerate, then click on Next

47. Under Location type select NFS, select the agent, paste the IP address of the NFS server, set the mount path to the root /, and click on Next

48. In the next window, give it a name

49. Scroll down to Task logging, select Do not send logs to CloudWatch, and click on Next

50. Review and click on Create task
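Steps 45 through 50 can be sketched with the CLI as well. Every ARN, account ID, and IP address below is a placeholder for the resources created earlier in the lab:

```shell
# Source: the S3 bucket (bucket ARN and IAM role ARN are placeholders)
SRC=$(aws datasync create-location-s3 \
    --s3-bucket-arn arn:aws:s3:::datasync-lab-demo-bucket \
    --s3-config BucketAccessRoleArn=arn:aws:iam::111122223333:role/datasync-s3-role \
    --query LocationArn --output text)

# Destination: the NFS share, reached through the agent
# (IP address and agent ARN are placeholders)
DST=$(aws datasync create-location-nfs \
    --server-hostname 172.31.10.25 \
    --subdirectory / \
    --on-prem-config AgentArns=arn:aws:datasync:us-east-2:111122223333:agent/agent-0123456789abcdef0 \
    --query LocationArn --output text)

# Tie both locations together in a task
aws datasync create-task \
    --source-location-arn "$SRC" \
    --destination-location-arn "$DST" \
    --name s3-to-efs-lab
```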

51. If everything goes well, you will see your task become Available in a few moments. All you need to do now is click on Start to run the task and copy the files from S3 to EFS. Before we do this, let's create an EC2 instance and mount the EFS. In a new window, open the AWS Management Console and go to EC2

52. Click on Instances

53. Before we create a new instance, give the DataSync agent instance a name, so we don't connect to the wrong instance later

54. Click on Launch instances

55. Select Amazon Linux 2 AMI 64-bit

56. Select t2.micro and click on Review and Launch

57. Review the configuration and click on Launch

58. Select the same key pair, check the acknowledgment checkbox, and click on Launch Instances

59. Click on View instances

60. Select the instance we just created (the one without a name) and click on Connect

61. Select the SSH client and copy the command

62. Open a terminal, go to where you downloaded the .pem file, and use chmod 400 to restrict its permissions

cd ~/Downloads
chmod 400 datasync-keypair.pem

63. Now connect to the instance with the command we copied

ssh -i "datasync-keypair.pem" ec2-user@ec2-3-16-112-71.us-east-2.compute.amazonaws.com

64. In the AWS Management Console go to EFS

65. Select the EFS file system we created for this lab

66. Click on Attach at the top right

67. Copy the command to mount your EFS file system

68. In the terminal, on the EC2 instance, create the efs folder

mkdir efs

69. Install the amazon-efs-utils

sudo yum -y install amazon-efs-utils

70. Now mount the EFS

sudo mount -t efs -o tls fs-03c3847b:/ efs

71. Enter the directory

cd efs

72. Now, in the DataSync tab, go and start the task

73. Click on Start

74. After it finishes, you will see the files in the efs folder

ls
concept_diagram.jpeg  decrypt.png
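The run can also be triggered from the CLI; the task ARN below is a placeholder (list yours with `aws datasync list-tasks`):

```shell
# Kick off one execution of the task (ARN is a placeholder)
aws datasync start-task-execution \
    --task-arn arn:aws:datasync:us-east-2:111122223333:task/task-0123456789abcdef0
```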

Great, now you know how to use AWS DataSync between NFS and S3. In the next post, we are going to go deeper into AWS services.

Clean-Up

  1. Go to DataSync

2. Select the task and click on Actions, then Delete

3. Confirm

4. Go to Locations, select all the locations and click on Delete

5. Confirm

6. Go to Agents, select the Agent and click on Delete

7. Confirm

8. Go to EFS

9. Select the EFS system and click on Delete

10. Confirm

11. Go to EC2

12. Go to Instances

13. Select the two instances we used for this lab, click on Instance state, then Terminate instance

14. Confirm

15. Go to Security Groups, select the security groups we created in this lab, and click on Actions, then Delete security group

16. Confirm

17. Select the default security group (if you used it with the EFS), click on the Inbound rules tab, then Edit inbound rules

18. Delete the rule that accepts all traffic from the VPC CIDR block (be careful, as this is the default security group)

19. Save the rules

20. Go to Key Pairs, select the key pair we created in this lab, and click on Actions, then Delete

21. Confirm

22. Go to S3

23. Select the S3 bucket and click on Empty

24. Confirm

25. Go back, select the bucket again, and now click on Delete

26. Confirm
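For reference, most of this clean-up can also be scripted. Every ARN, ID, and name below is a placeholder for the corresponding lab resource:

```shell
# DataSync resources (delete the task before its locations)
aws datasync delete-task     --task-arn arn:aws:datasync:us-east-2:111122223333:task/task-0123456789abcdef0
aws datasync delete-location --location-arn arn:aws:datasync:us-east-2:111122223333:location/loc-0123456789abcdef0
aws datasync delete-agent    --agent-arn arn:aws:datasync:us-east-2:111122223333:agent/agent-0123456789abcdef0

# EFS (from the CLI, mount targets must be deleted first;
# the console handles that for you)
aws efs delete-file-system --file-system-id fs-03c3847b

# EC2 instances and key pair
aws ec2 terminate-instances --instance-ids i-0123456789abcdef0 i-0fedcba9876543210
aws ec2 delete-key-pair --key-name datasync-keypair

# S3: --force empties the bucket, then deletes it
aws s3 rb s3://datasync-lab-demo-bucket --force
```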

Resources