This topic is resolved
- Thu, May 2 2019 at 6:43 am #1161326 | Bargavi | Participant | Member Points: 125 | Rank: Level 1
I would like to launch multiple Amazon EC2 spot instances (fleet?) using a custom AMI (docker?) for performing a deep-learning training task. I would like all the instances to share a common set of files for the purposes of training the model.
The idea is to avoid losing training history by keeping a backup on EBS (network drive?) when a spot instance is terminated by AWS because of the price limit or demand. The task state can be written to a file and training resumed when instances become available again.
Is it possible to launch all instances and let them work cooperatively to complete the training task? What kind of setup could accomplish this?
- Thu, May 2 2019 at 9:19 am #1161352 | Swapnil Kambli | Moderator | Post count: 48 | Member Points: 4,750 | Rank: Level 1
Spot Fleet should do the trick: it launches the instances for you and maintains the number of machines running.
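To make that concrete, here is a rough sketch of what a Spot Fleet request for your custom AMI could look like with boto3. The helper names `build_spot_fleet_config` and `request_fleet`, the AMI ID, the role ARN, and the instance types are all placeholders I made up for illustration; only the `request_spot_fleet` call and its `SpotFleetRequestConfig` keys come from the EC2 API. Treat it as a starting point under those assumptions, not a finished setup.

```python
def build_spot_fleet_config(ami_id, fleet_role_arn, instance_types, target_capacity):
    """Build a SpotFleetRequestConfig dict (hypothetical helper, not an AWS function).

    Type "maintain" asks the fleet to replace interrupted instances so the
    target capacity is restored when spot capacity becomes available again.
    """
    return {
        "IamFleetRole": fleet_role_arn,
        "TargetCapacity": target_capacity,
        "Type": "maintain",
        # Spread instances across the listed pools instead of one cheapest pool.
        "AllocationStrategy": "diversified",
        # One launch specification per instance type, all using the same AMI.
        "LaunchSpecifications": [
            {"ImageId": ami_id, "InstanceType": instance_type}
            for instance_type in instance_types
        ],
    }


def request_fleet(config):
    """Submit the request. Needs boto3 installed and AWS credentials configured."""
    import boto3  # imported here so the config builder works without boto3

    ec2 = boto3.client("ec2")
    response = ec2.request_spot_fleet(SpotFleetRequestConfig=config)
    return response["SpotFleetRequestId"]


# Placeholder values only; substitute your own AMI, role, and instance types.
example_config = build_spot_fleet_config(
    ami_id="ami-00000000",
    fleet_role_arn="arn:aws:iam::123456789012:role/example-fleet-role",
    instance_types=["p2.xlarge", "p3.2xlarge"],
    target_capacity=4,
)
```

Calling `request_fleet(example_config)` would submit the request; it is left out of the sketch so nothing is launched by accident.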
Here is a step-by-step tutorial for a requirement similar to yours.
Kindly reply back if you need to tweak or customize the solution.
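On the part of the question about updating the task state in a file and resuming later, a minimal sketch of that pattern is below. It assumes every instance can see the same directory (for example a shared network file system mount, since a standard EBS volume is normally attached to one instance at a time). The function names `save_state`, `load_state`, and `train`, and the `stop_after` parameter used to simulate an interruption, are hypothetical and only there for illustration.

```python
import json
import os
import tempfile


def save_state(path, state):
    """Write the state to a temp file, then rename it over the real file.

    The rename means a termination in the middle of a write leaves the
    previous checkpoint intact instead of a half-written file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_state(path, default):
    """Return the saved state, or a copy of the default if no checkpoint exists."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return dict(default)


def train(state_path, total_epochs, stop_after=None):
    """Resume from the last saved epoch and checkpoint after every epoch."""
    state = load_state(state_path, {"epoch": 0})
    epochs_run = 0
    while state["epoch"] < total_epochs:
        if stop_after is not None and epochs_run >= stop_after:
            break  # stands in for the instance being reclaimed
        # ... one epoch of real training and saving model weights goes here ...
        state["epoch"] += 1
        save_state(state_path, state)
        epochs_run += 1
    return state
```

A replacement instance that starts later simply calls `train` with the same `state_path` and picks up from the last completed epoch. In a real job you would store the model weights alongside the epoch counter using your framework's own checkpoint format.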