This article demonstrates how to orchestrate the failover of multiple Azure virtual machines (VMs) using recovery plans in Azure Site Recovery (ASR), a disaster recovery service for cloud or on-premises servers. Protecting a single VM is a straightforward process I covered in my previous article. A common requirement is to fail over multiple VMs in a specific order and modify server and application settings as part of the process. Recovery plans in ASR are designed for this.

Travis Roberts

Travis Roberts is the Manager of Data Center Services at a Minnesota-based Credit Union. Travis has 20 years of IT experience in the legal, pharmaceutical and marketing industries, and has worked with IT hardware manufacturers and managed service providers. Travis has held numerous technical certifications over the span of his career from Microsoft, VMware, Citrix and Cisco.

Recovery plans define a step-by-step process for VM failover. They use either manual intervention or a script to execute steps during a failover. A manual action prompts the administrator running the plan to take an action, such as checking a service or changing a setting. A scripted action uses Azure Automation to automate the process. Although more complex, automation makes the process more reliable.

The following example consists of two Azure VMs in the same resource group. A Recovery Services vault protects both. The failover sequence for the recovery plan is as follows:

  • The first VM starts
  • A manual check verifies a service
  • The second VM starts
  • Azure Automation runs a script to update a text file and restart a service

Create the recovery plan

The first step is to create the recovery plan. Open the Recovery Services vault that protects the VMs; it resides in the region the VMs will fail over to. In this example, a West US vault is protecting servers in the Central US region. Go to Recovery Plans (Site Recovery) under Manage and create a recovery plan.

Create a recovery plan

Give the recovery plan a name and select the Source region. Once you select it, options for the Destination and the Deployment model appear. Select the failover destination and Resource Manager for the deployment model. Next, select the protected items. Once finished, it should look similar to the image below.

New recovery plan

Click OK from the Selected items in the Create recovery plan window to finish creating the recovery plan.

Next, go to the new recovery plan and select Customize.

Customize the recovery plan

Notice there is one failover group labeled Group 1: Start. The two protected VMs are in this one group with the start action. The next step will create a second group and add one VM to it.

New recovery group

Remove the second VM from Group 1.

Delete a VM from a group

Create a new group.

Create Group 2

Then add protected items.

Add protected items

Next, select the second VM.

Select the protected server

Click OK from the Selected items window and Save the recovery plan when finished.

Save the recovery plan

Manual failover action

Now there are two groups with the start action, starting each VM in sequence. The next step will create a manual action step. This step will prompt the administrator running the failover to perform an action before proceeding to the next step.

Select Add post action at Group 1: Start.

Add a manual post action

In the Insert action window, change it to Manual action. Give it a name, add a message, and select the boxes for both Test failover and Failover.

Manual Post Action

Click OK and Save to finish creating a manual action for Group 1. The user implementing the failover will get a prompt after the first server starts, requiring a response before moving to the next step.

Automated failover action

Manual intervention is useful in some circumstances, but it's not practical with complex recovery plans. The better option is an automated process that runs checks or makes configuration changes at failover. A scripted recovery action accomplishes this. Listed below are the pieces to make this work.

  • Configuration script: a PowerShell script that runs in the VM to make configuration changes
  • Azure Automation runbook: a PowerShell runbook referenced by the recovery plan scripted action
  • Storage account: location of the configuration script—this should be in a region other than the replication source because that may not be available at recovery time
  • Storage account key: the key Azure Automation uses to access the configuration script
  • Automation Run As account: the account used to authenticate the runbook to Azure

When the recovery plan starts, it passes specific job details to the Azure Automation runbook. These details include the resource group, failover direction, and computer name. The Azure Automation runbook interacts directly with Azure resources at the point of failover. From the runbook, you can take actions such as starting or stopping VMs or modifying an App Service.

Azure Automation cannot interact directly with the operating system inside an Azure infrastructure-as-a-service (IaaS) VM. Changes inside the guest OS require a configuration script instead. At runtime, the runbook passes relevant information to the script, which is hosted in an Azure Storage account.

The Automation runbook uses the Set-AzureRMVMCustomScriptExtension command to run the configuration script inside the recovered VM. The script has logic to determine which recovery group is calling the script, as well as the VM and region. This allows for the creation of a single script to use in multiple failover and failback scenarios.

The overall process is:

  • The recovery plan calls the Azure Automation runbook, passing job details to the runbook as a PowerShell custom object.
  • The runbook starts, interacting directly with any Azure resources if required.
  • If changes are required inside the VM, a custom script extension retrieves the configuration script from the Storage account and runs it locally on the VM.
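To make the first step more concrete, here is a minimal sketch of how a runbook might read the job details. The field names (RecoveryPlanName, FailoverType, FailoverDirection, GroupId, VmMap, ResourceGroupName, RoleName) follow the recovery plan context that Site Recovery documents, but every value below is made-up sample data, and the real object may carry additional fields.

```powershell
# Sample data only: the values are invented for illustration.
$sampleJson = @'
{
  "RecoveryPlanName": "demo-plan",
  "FailoverType": "Test",
  "FailoverDirection": "PrimaryToSecondary",
  "GroupId": "Group2",
  "VmMap": {
    "00000000-0000-0000-0000-000000000001": {
      "ResourceGroupName": "demo-rg-asr",
      "RoleName": "web02-test"
    }
  }
}
'@
$RecoveryPlanContext = $sampleJson | ConvertFrom-Json

# VmMap has one property per VM in the calling group. The property names
# are IDs, so list the NoteProperty members in order to walk them.
$vmIds = $RecoveryPlanContext.VmMap |
    Get-Member -MemberType NoteProperty |
    Select-Object -ExpandProperty Name

# Collect the details a runbook typically needs for each VM.
$vms = foreach ($id in $vmIds) {
    $vm = $RecoveryPlanContext.VmMap.$id
    [pscustomobject]@{
        ResourceGroupName = $vm.ResourceGroupName
        VMName            = $vm.RoleName
        GroupId           = $RecoveryPlanContext.GroupId
        Direction         = $RecoveryPlanContext.FailoverDirection
    }
}
$vms | Format-Table -AutoSize
```

In a real runbook, the recovery plan supplies $RecoveryPlanContext as a parameter, so only the enumeration part would remain.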

The example below demonstrates how to make a configuration change on a server via custom script extensions. This script modifies the default web page to indicate the region the VM is running in. After that, it restarts the IIS service.

Create a runbook

Protecting a VM creates an Azure Automation account in the recovery vault resource group. Create a new runbook by going to the Azure Automation account > Runbooks and selecting Create a runbook. Enter a name and description; select PowerShell as the type. Click Create to finish. The new, blank runbook will open.

Create a new runbook

The code below starts the configuration script. It's broken down into the following sections:

  • Parameters: A Recovery Services vault passes job-specific information to the Automation runbook when it starts; $recoveryPlanContext captures this information, which the rest of the runbook consumes.
  • Connect with the Run As account: The runbook authenticates with the Run As account to run the custom script extension command.
  • Command variables: These collect information from $recoveryPlanContext to pass to the custom script extension command.
  • Set-AzureRmVMCustomScriptExtension: This command runs the script in the target server OS. The example uses the full storage account key. In production, it would be better to use Key Vault or an encrypted Automation variable. Also, the parameters passed are specific to the configuration script. The configuration script details are below.

Add and update the code (below) for your environment. Once finished, click Save and Publish.
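As a rough sketch only, and not necessarily the exact listing used for this walkthrough, a runbook with the four sections above could look like the following. The storage account, container, script, extension, and Automation variable names are placeholders I made up. The sketch also reads the storage key from an encrypted Automation variable rather than embedding the full key, and it assumes the AzureRM modules and the default AzureRunAsConnection are available in the Automation account.

```powershell
param (
    # Parameters: populated by Site Recovery when the recovery plan calls the runbook.
    [Parameter(Mandatory = $false)]
    [Object]$RecoveryPlanContext
)

# Placeholder names (hypothetical); replace them with your own values.
$storageAccountName = 'asrscriptsdemo'
$containerName      = 'scripts'
$scriptName         = 'Set-RegionPage.ps1'
$extensionName      = 'AsrConfigScript'

# Builds the argument string handed to the configuration script.
function New-ScriptArgument {
    param([string]$Location, [string]$GroupId)
    "-Location $Location -GroupID $GroupId"
}

if ($RecoveryPlanContext) {
    # Connect with the Run As account.
    $conn = Get-AutomationConnection -Name 'AzureRunAsConnection'
    Add-AzureRmAccount -ServicePrincipal -TenantId $conn.TenantId `
        -ApplicationId $conn.ApplicationId `
        -CertificateThumbprint $conn.CertificateThumbprint | Out-Null

    # Encrypted Automation variable holding the storage key (hypothetical name).
    $storageKey = Get-AutomationVariable -Name 'ScriptStorageKey'

    # Command variables: VmMap holds one entry per VM in the calling group.
    $vmIds = $RecoveryPlanContext.VmMap |
        Get-Member -MemberType NoteProperty |
        Select-Object -ExpandProperty Name

    foreach ($id in $vmIds) {
        $vmInfo = $RecoveryPlanContext.VmMap.$id
        $vm = Get-AzureRmVM -ResourceGroupName $vmInfo.ResourceGroupName -Name $vmInfo.RoleName

        # Run the configuration script inside the recovered VM.
        $params = @{
            ResourceGroupName  = $vmInfo.ResourceGroupName
            VMName             = $vmInfo.RoleName
            Location           = $vm.Location
            Name               = $extensionName
            StorageAccountName = $storageAccountName
            StorageAccountKey  = $storageKey
            ContainerName      = $containerName
            FileName           = $scriptName
            Run                = $scriptName
            Argument           = New-ScriptArgument -Location $vm.Location -GroupId $RecoveryPlanContext.GroupId
        }
        Set-AzureRmVMCustomScriptExtension @params
    }
}
```

The guard around the main body means the runbook does nothing when started without a recovery plan context, for example when someone runs it by hand from the portal.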

Save and publish the runbook

Create the configuration script

Below is the code for the configuration script. The actions are intentionally simple with the goal of demonstrating how the process works. The runbook passes the Location and GroupID to the script. The first statement evaluates the group. If it equals Group2, the script runs that block. The script can apply to multiple groups in the recovery plan if needed. For example, this script could include logic to automate the manual process configured earlier with Group1.

Within the Group2 code block, it evaluates the location to set text variables. The script updates the contents of a text file, in this case, the contents of the default.htm file. The last step restarts the IIS service. Modify the code for your environment and save it as a .ps1 file.
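A minimal sketch of a configuration script along these lines follows. It is an illustration under assumptions rather than a definitive version: the region strings (centralus, westus), the page text, the default.htm path, the W3SVC service name, and the helper function Get-RegionText are all my own choices for the example.

```powershell
param (
    [string]$Location,   # Azure region of the recovered VM, passed by the runbook
    [string]$GroupID,    # recovery plan group that triggered the script
    [string]$PagePath = 'C:\inetpub\wwwroot\default.htm'   # assumed default IIS path
)

# Maps a region name to the text shown on the default page.
# The region strings are assumptions based on the Central US / West US example.
function Get-RegionText {
    param([string]$Location)
    switch ($Location) {
        'centralus' { 'Running in Central US' }
        'westus'    { 'Running in West US' }
        default     { "Running in $Location" }
    }
}

# Only act when Group 2 calls the script; other groups could get their own blocks.
if ($GroupID -eq 'Group2') {
    # Update the contents of the default web page.
    Set-Content -Path $PagePath -Value (Get-RegionText -Location $Location)

    # Restart IIS (World Wide Web Publishing Service) to pick up the change.
    Restart-Service -Name 'W3SVC'
}
```

Saved under a hypothetical name such as Set-RegionPage.ps1, it could be tried locally on a test server with .\Set-RegionPage.ps1 -Location westus -GroupID Group2 before uploading it.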

Upload the script to a Storage account

Now that the script is ready, upload it to an Azure Storage account. Be sure the Storage account is not in the source region as it may not be available in the event of a regional outage. Make a note of the Storage account name, Storage container name, Script name, and Storage account key. As noted above, the runbook needs all of these.

Configure the scripted action

All the pieces are now in place to configure the scripted action in the recovery plan. Go back to the recovery plan set up previously and select Customize. Go to Group 2: Start and select Add post action.

Scripted post action

Give the action a name. Select the Automation account where you saved the runbook and add the runbook. Click OK and save the changes to the recovery plan when finished.

Scripted post action runbook

Test the failover

The next step is to test the failover. The test brings VMs online in the recovery region and runs the actions assigned in the recovery plan. Start by going into the Recovery Services vault and selecting the recovery plan created previously. The plan will indicate that no test has been performed.

Recovery plan test status

Click on Test failover to begin the test.

Start test failover

The Test failover window will appear next. Leave the Recovery Point on the latest and select the destination network.

Start test failover details

Click Start to start the failover test. Click on CURRENT JOB to see details of the failover test.

Failover test details

The Group 1: Post-step will display a "User input required" warning.

User input required

Click on the three dots in the Manual action and complete the manual action. It may be necessary to refresh the view to see the Complete manual action option.

Complete manual action

Add any relevant notes to Manual action details and select the Manual actions completed box. Click OK to continue. The failover test will move on to the next steps.

Manual action details

The test will show success for all steps once finished. The recovery plan will also indicate a successful test. Verify the configuration script ran by viewing the default web page on both the source VM and the recovery test VM. The source site displays Central US, while the web page on the test failover VM has been updated to show West US, indicating the script ran successfully.

Source default web page

Recovery default web page

Clean up the test failover after completing the test and verifying the actions. Go into the recovery plan and select Cleanup test failover.

Cleanup test failover

Add notes as needed and select the option to delete the test failover VMs.

Test failover cleanup notes

Click OK to finish the cleanup. This will remove the recovered test VMs.

Conclusion

The steps above outline the process to create an ASR recovery plan, including a custom script extension that makes configuration changes directly on the recovered VM. A test failover verified the recovery plan, followed by the steps to clean up the test. The custom script extension and configuration script are powerful tools for automating the failover process. In addition to the failover test, I recommend running a full failover and failback to verify the logic in the configuration script works in both directions.


© 4sysops 2006 - 2019
