This topic is resolved


This topic contains 2 replies, has 3 voices, and was last updated by Swapnil Kambli 4 months, 2 weeks ago.

  • Author
  • #1161326
    Member Points: 125
    Rank: Level 1

    I would like to launch multiple Amazon EC2 Spot Instances (a Spot Fleet?) using a custom AMI (Docker?) to run a deep-learning training task. All the instances should share a common set of files for training the model.

    The idea is to avoid losing training history by keeping a backup in EBS (network drive?) when AWS terminates a Spot Instance because of the price limit or demand. The task state can be saved to a file, and training can resume when instances become available again.

    Is it possible to launch all instances and let them work cooperatively to complete the training task? What kind of setup could accomplish this?
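
The checkpoint-and-resume idea described above can be sketched roughly as follows. This is only a minimal illustration, assuming the state file lives in a directory that survives instance termination and is mounted on the instance; the function names (`save_state`, `load_state`, `train`) and the `{"epoch": N}` state layout are hypothetical and not taken from any particular framework.

```python
import json
import os
import tempfile


def save_state(path, state):
    """Write the state atomically so an interruption never leaves a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
        handle.flush()
        os.fsync(handle.fileno())
    # Rename over the old checkpoint; readers see either the old or the new file.
    os.replace(tmp_path, path)


def load_state(path):
    """Return the saved state, or a fresh one if no checkpoint exists yet."""
    if not os.path.exists(path):
        return {"epoch": 0}
    with open(path) as handle:
        return json.load(handle)


def train(path, total_epochs, run_epoch):
    """Resume from the last completed epoch and checkpoint after each one."""
    state = load_state(path)
    for epoch in range(state["epoch"], total_epochs):
        run_epoch(epoch)  # hypothetical callback that trains one epoch
        state["epoch"] = epoch + 1
        save_state(path, state)
    return state
```

If an instance is terminated partway through, the next instance that calls `train` with the same path skips the epochs already recorded in the file. A real setup would also need to save the model weights, which this sketch leaves out.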


  • #1161342
     Michael Pietroforte 
    Post count: 1766
    Member Points: 21,663
    Author of the year 2018
    Rank: Level 1

    It depends on the size of the project. With AWS CloudFormation, you can coordinate the provisioning of multiple AWS resources.

    I am curious. What deep learning task is this? Which software are you using?

  • #1161352
     Swapnil Kambli 
    Post count: 48
    Member Points: 4,750
    Rank: Level 1

    Hi Bargavi,

    Spot Fleet would do the trick: it launches the instances and maintains the number of machines running.

    Here is a step-by-step tutorial for a requirement similar to yours.

    Please reply if you need to tweak or customize the solution.
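
As a rough illustration of the Spot Fleet suggestion, the request configuration could be assembled like this. It is a sketch under assumptions: the helper name `build_spot_fleet_config` is hypothetical, and the AMI ID, role ARN, and instance types shown are placeholders to be replaced with real values. The dictionary keys follow the `SpotFleetRequestConfig` structure used by the EC2 `RequestSpotFleet` API.

```python
def build_spot_fleet_config(ami_id, fleet_role_arn, instance_types, target_capacity):
    """Build a SpotFleetRequestConfig dict for a fleet that keeps its capacity steady."""
    return {
        "IamFleetRole": fleet_role_arn,
        "TargetCapacity": target_capacity,
        # "maintain" asks the fleet to replace interrupted instances.
        "Type": "maintain",
        # One launch specification per candidate instance type, all using the custom AMI.
        "LaunchSpecifications": [
            {"ImageId": ami_id, "InstanceType": instance_type}
            for instance_type in instance_types
        ],
    }


# Placeholder values only; none of these identifiers refer to real resources.
config = build_spot_fleet_config(
    ami_id="ami-0123456789abcdef0",
    fleet_role_arn="arn:aws:iam::123456789012:role/example-spot-fleet-role",
    instance_types=["p2.xlarge", "p3.2xlarge"],
    target_capacity=4,
)
```

With boto3 installed and credentials configured, the dictionary would be passed as `boto3.client("ec2").request_spot_fleet(SpotFleetRequestConfig=config)`. A real request would normally also set networking, key pair, and security group options in each launch specification.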




© 4sysops 2006 - 2019

