This topic is resolved
- Thu, May 2 2019 at 6:43 am #1161326 Bargavi (Participant)
I would like to launch multiple Amazon EC2 spot instances (fleet?) using a custom AMI (Docker?) to run a deep-learning training task. I would like all the instances to share a common set of files for training the model.
The idea is to avoid losing the training history by keeping a backup on EBS (network drive?) when AWS terminates a spot instance because of the price limit or demand. The task state can be saved to a file, and training can resume once instances are available again.
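A minimal sketch of the save-and-resume idea, assuming the training state fits in a small JSON file. `CHECKPOINT_PATH`, `save_state`, `load_state` and `train` are hypothetical names, and the path would have to point at storage that outlives the spot instance. Writing to a temporary file and then swapping it in with `os.replace` keeps a termination in the middle of a write from corrupting the last good checkpoint.

```python
import json
import os
import tempfile

# Hypothetical location: point this at storage that survives instance
# termination (the EBS/network drive mentioned above), not ephemeral disk.
CHECKPOINT_PATH = "checkpoint.json"


def save_state(state, path=CHECKPOINT_PATH):
    """Write the training state so a mid-write termination never leaves
    a truncated checkpoint behind."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before swapping
        os.replace(tmp_path, path)  # swap in the new checkpoint in one rename
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_state(path=CHECKPOINT_PATH):
    """Return the last saved state, or a fresh one on the first run."""
    if not os.path.exists(path):
        return {"epoch": 0, "step": 0}
    with open(path) as f:
        return json.load(f)


def train(total_epochs, steps_per_epoch=100, path=CHECKPOINT_PATH):
    """Resume from the last checkpoint and run the remaining epochs."""
    state = load_state(path)
    for epoch in range(state["epoch"], total_epochs):
        # Placeholder: a real job would run one epoch here and also
        # save the model weights next to this state file.
        state["step"] += steps_per_epoch
        state["epoch"] = epoch + 1
        save_state(state, path)  # checkpoint after every completed epoch
    return state
```

If an instance is terminated after finishing two of five epochs, a replacement instance that calls `train(5)` skips the two completed epochs and continues from there.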
Is it possible to launch all instances and let them work cooperatively to complete the training task? What kind of setup could accomplish this?
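One simple way for the instances to divide the common files, assuming each instance is told its own index (`rank`) and the total number of instances (`world_size`), for example through user data or an environment variable. `shard_for_instance` is a hypothetical helper. This only splits the data; combining the model updates that each instance produces is a separate problem, usually handled by a distributed-training framework, and is not shown here.

```python
from typing import List


def shard_for_instance(files: List[str], rank: int, world_size: int) -> List[str]:
    """Return the part of the shared file list this instance should train on.

    Every instance sorts the same list and keeps every world_size-th file
    starting at its own rank, so the shards are disjoint and together cover
    all files without the instances talking to each other.
    """
    if world_size < 1 or not 0 <= rank < world_size:
        raise ValueError("need world_size >= 1 and 0 <= rank < world_size")
    ordered = sorted(files)  # same order on every instance
    return ordered[rank::world_size]
```

Because spot instances come and go, the instance count can change during a run; a replacement instance that reuses a terminated instance's `rank` picks up exactly the same shard.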
- Thu, May 2 2019 at 7:28 am #1161342 Michael Pietroforte (Keymaster)
It depends on the size of the project. With AWS CloudFormation, you can coordinate the provisioning of multiple AWS resources.
I am curious. What deep-learning task is this? Which software are you using?
- Thu, May 2 2019 at 9:19 am #1161352 Swapnil Kambli (Moderator)
Spot Fleet can launch the instances for you and keep the desired number of machines running.
Here is a step-by-step tutorial for a requirement similar to yours.
Please reply if you need to tweak or customize the solution.
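A minimal sketch of the Spot Fleet approach using boto3's `request_spot_fleet`, assuming an IAM role for Spot Fleet already exists. `build_spot_fleet_config` and `request_fleet` are hypothetical helpers, and the AMI ID, instance type and role ARN below are placeholders. Setting `Type` to `"maintain"` asks Spot Fleet to replace interrupted instances so the fleet stays at `TargetCapacity`.

```python
def build_spot_fleet_config(ami_id, instance_type, target_capacity,
                            fleet_role_arn, key_name=None):
    """Build a SpotFleetRequestConfig dict for identical training instances."""
    launch_spec = {"ImageId": ami_id, "InstanceType": instance_type}
    if key_name is not None:
        launch_spec["KeyName"] = key_name  # optional SSH key pair
    return {
        "IamFleetRole": fleet_role_arn,     # lets Spot Fleet launch/terminate instances
        "TargetCapacity": target_capacity,  # capacity units (one per instance here)
        "Type": "maintain",                 # replace instances that AWS interrupts
        "AllocationStrategy": "lowestPrice",
        "LaunchSpecifications": [launch_spec],
    }


def request_fleet(config, region="us-east-1"):
    """Submit the request; needs boto3 installed and AWS credentials."""
    import boto3  # imported here so the builder above has no dependency

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.request_spot_fleet(SpotFleetRequestConfig=config)
    return response["SpotFleetRequestId"]


# Placeholder values; substitute your own AMI, instance type and role ARN.
config = build_spot_fleet_config(
    "ami-0123456789abcdef0", "p3.2xlarge", 4,
    "arn:aws:iam::123456789012:role/example-spot-fleet-role",
)
# fleet_id = request_fleet(config)  # uncomment to actually launch (incurs cost)
```

Keeping the config builder separate from the API call makes it easy to inspect or version the request before anything is launched.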