This article walks through protecting and testing an Azure VM with Azure Site Recovery (ASR), and then failing the VM over and back again should the primary region become temporarily unavailable.
By Travis Roberts

This example uses a single VM deployed to the East US region. The VM is protected in the West US region. Each region has a virtual network configured with a server subnet. The West US virtual network has an additional recovery subnet, which is used for static IP mappings when the protected server fails over.

Azure Site Recovery Network

Set up Azure Site Recovery

Deploy a Recovery Services vault by searching for the "Backup and Site Recovery" resource in the Azure portal and creating it.

Backup and Site Recovery

Give the new Recovery Services vault a name, select the subscription and resource group, and select a location. A Recovery Services vault cannot protect VMs that run in the same region as the vault. This is by design: if the protected VM and the vault shared a region, both would be unavailable during a regional outage. Click Create to deploy the vault.

Recovery Vault setup

Protecting an Azure VM

Now that the Recovery Services vault is in place, the next step is to protect the VM. Protection can be configured from within the vault or from the VM blade. The example below configures protection from the VM blade.

Go to the protected VM and select Disaster Recovery under Operations. This takes you to the Configure disaster recovery blade. From there, select the Target region. The target region does not need to match the region of the Recovery Services vault, but it cannot be the region of the source VM.
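
The region rules described so far can be written down as a small pre-flight check. This is only a sketch in plain Python and does not call any Azure API; the function names `normalize` and `validate_regions` are hypothetical, and the vault region used in the example call is an assumption, since the article does not state which region the vault was created in.

```python
def normalize(region: str) -> str:
    """Compare regions loosely so that 'East US' and 'eastus' count as equal."""
    return region.replace(" ", "").lower()


def validate_regions(source: str, target: str, vault: str) -> None:
    """Raise ValueError if the region choices break the rules in this article."""
    if normalize(target) == normalize(source):
        raise ValueError("The target region cannot be the source VM's region")
    if normalize(vault) == normalize(source):
        raise ValueError("The vault cannot be in the source VM's region")
    # The target region and the vault region are allowed to differ.


# Regions from this example: VM in East US, protected in West US.
# The vault region here is an assumed value for illustration.
validate_regions(source="East US", target="West US", vault="West US")
```

A call that reuses the source region for the target, such as `validate_regions("East US", "eastus", "West US")`, raises a `ValueError`.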

Select the Subscription type and VM resource group. The example below uses the default target resource group. Set the Virtual network to the destination VNet.  A new target VNet is created unless an existing one is selected.  The example below uses a network already set up in the West US region. Availability set is left as "Single instance."

Configure Disaster Recovery

The Storage, Replication, and Extension settings are left at their defaults for this example. Click Enable Replication to finish. Notice that the bottom of the Configure Disaster Recovery window shows an image of the replication path.

Replication path

Azure creates several new resources for the deployment, including destination resource groups, an Automation Account, and a Service Principal. Initial replication may take several minutes to finish, depending on the size of the VM. Once finished, the VM is protected.

IP Mapping

One last step before moving on to testing is IP mapping. The server has a static private IP of 10.101.1.25 in the source subnet. By default, ASR creates a new VNet at the destination with the same IP schema as the source, named after the source VNet with "-asr" appended. To provide more control over the IP space in this example, the default behavior is modified to assign a static IP on a designated failover VNet in the recovery region.

Go to the protected server, Operations, Disaster Recovery blade, and select Compute and Network. Click the Edit button at the top of the page.

Edit network subnet

Click on the Source NIC; this opens the Network interface configuration.  Change the subnet and add the static IP address to the Private IP Address Target field.  As shown below, the subnet changes to the recovery subnet created in the destination VNet and the IP is set to a static address.  Click OK and Save once finished.

Edit network interface
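
Before saving a static target address, it can help to confirm that the address actually fits the recovery subnet. The sketch below uses Python's standard `ipaddress` module. The recovery subnet `10.201.2.0/24` and the target IP `10.201.2.25` are hypothetical values, because the article does not list the recovery subnet's address range. The check also excludes the five addresses Azure reserves in every subnet (the first four and the last one).

```python
import ipaddress


def is_usable_static_ip(ip: str, subnet_cidr: str) -> bool:
    """Return True if ip lies inside subnet_cidr and is not one of the
    five addresses Azure reserves per subnet (first four and last)."""
    subnet = ipaddress.ip_network(subnet_cidr)
    address = ipaddress.ip_address(ip)
    if address not in subnet:
        return False
    offset = int(address) - int(subnet.network_address)
    last_offset = subnet.num_addresses - 1
    return 4 <= offset < last_offset


# Hypothetical recovery subnet and target address (not taken from the article).
RECOVERY_SUBNET = "10.201.2.0/24"
print(is_usable_static_ip("10.201.2.25", RECOVERY_SUBNET))

# The source address from the article does not belong to this recovery subnet.
print(is_usable_static_ip("10.101.1.25", RECOVERY_SUBNET))
```

The first call reports that the address is usable; the second reports that the source IP cannot simply be carried over, which is why the target subnet and address are set explicitly in the step above.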

Verify settings

Before moving on, take a look at what was created when replication was enabled. ASR created a new resource group that has the same name as the protected server's resource group with an "-asr" suffix. This resource group contains two storage accounts or two disks, depending on whether Managed Disks were used. One is located in the source region and caches changes to the source disk prior to replication. The other is in the destination region and becomes the server disk when the VM fails over.

New Storage Accounts

There is also a new Automation account in the Recovery Services Vault resource group. This is used for failover automation tasks.

New Automation Account

Failover Test

The VM is now replicated, but there is still one step left before it is considered protected. Go to Operations, Disaster Recovery on the protected server.  The warning message below indicates that a test failover has not been completed.  Therefore, let's test the failover.

Start the failover test by selecting Test failover at the top of the Disaster Recovery blade. A window will open allowing you to choose the recovery point and the Virtual Network.  The example below uses the most recent recovery point and the VNet in the recovery region "WestNetwork."  Click OK when finished.

Test failover warning

The test takes a few minutes to finish. Once the process is complete, there are new objects in the recovery resource group. Below, the test failover VM and NIC are shown; both have the "-test" suffix.

Failover test server and NIC

The properties of the NIC show that it is attached to the subnet and uses the IP we defined in an earlier step.

Test NIC details

The test worked, but the subscription is now getting charged for two VMs. These test objects are removed by cleaning up the test failover.  Cleanup removes the items created for the test, such as the VM and NIC, and returns the VM to a protected state.

Go back to the source VM and into the Disaster Recovery blade. At the top of the window is a Cleanup test failover option. Click that to start the cleanup.

Cleanup test failover

A box will appear where you can add notes about the failover. Enter notes as needed and select the confirmation box to delete the test VMs. Click OK when finished.

Test failover notes

The Disaster Recovery blade now shows a "Healthy" status with a Last successful Test Failover date and timestamp as shown below. If any problems occurred, any applicable errors will also be shown here.

Successful test failover
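
Once cleanup has finished, it can be worth confirming that no test objects remain in the recovery resource group, since a leftover test VM keeps generating charges. The sketch below only filters a list of names in plain Python. It assumes the "-test" suffix observed earlier in this article, and the resource names are hypothetical; in practice the list would come from whatever inventory of the resource group you have on hand.

```python
from typing import Iterable, List

TEST_SUFFIX = "-test"  # suffix this article observes on test failover objects


def leftover_test_resources(resource_names: Iterable[str]) -> List[str]:
    """Return the names that still end with the test suffix (case-insensitive)."""
    return [name for name in resource_names if name.lower().endswith(TEST_SUFFIX)]


# Hypothetical contents of the recovery resource group before and after cleanup.
names_before_cleanup = ["server01-test", "server01-nic-test", "server01-replica-disk"]
names_after_cleanup = ["server01-replica-disk"]

print(leftover_test_resources(names_before_cleanup))
print(leftover_test_resources(names_after_cleanup))
```

An empty result for the second list indicates that the cleanup removed the test VM and NIC.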

Failover

A failover test is useful, but it is better to do a true failover to understand the steps and verify that the process works. This allows you to go through the process beforehand so you will be prepared in case an actual disaster occurs.

Start by going back to the Disaster Recovery blade of the protected server and selecting Failover at the top of the window. This starts the failover process.

Start failover

You will have two options available.  The first allows you to select the recovery point.  Use the latest one or go to a previous point in time if needed. The other option allows you to shut down the server before failover.  Check this box to shut down the servers.

Failover Recovery Point

* This is a disruptive test. Use caution when implementing in production.

Once the failover is complete, the VM and NIC appear in the recovery resource group in the failover region, just as in the test, but this time without "-test" appended to their names.

Failover server and NIC

The next step commits the failover. Before committing, verify that the VM works as expected at the chosen recovery point; once the failover is committed, it is no longer possible to switch to another recovery point. Holding off on the commit is a safety net in case a different recovery point is needed. For example, if the server experienced data corruption at the latest recovery point as part of the outage, you may need to go back to an earlier one. Changing recovery points is only an option prior to committing the failover.

Change Recovery Point
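
The reasoning for picking an earlier recovery point can be sketched as a simple selection: take the newest recovery point that predates the moment the data went bad. The code below is plain Python and not an ASR API call; the function name `latest_point_before` and all timestamps are hypothetical.

```python
from datetime import datetime
from typing import Optional, Sequence


def latest_point_before(points: Sequence[datetime], cutoff: datetime) -> Optional[datetime]:
    """Return the newest recovery point taken strictly before cutoff, or None."""
    candidates = [point for point in points if point < cutoff]
    return max(candidates) if candidates else None


# Hypothetical recovery points and a hypothetical time at which corruption began.
recovery_points = [
    datetime(2023, 5, 1, 8, 0),
    datetime(2023, 5, 1, 9, 0),
    datetime(2023, 5, 1, 10, 0),
]
corruption_started = datetime(2023, 5, 1, 9, 30)

print(latest_point_before(recovery_points, corruption_started))
```

Here the 10:00 point is skipped because it was taken after the corruption began, so the 09:00 point is selected. If no point predates the cutoff, the function returns `None`.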

Once the recovery is verified, commit the failover. Click OK at the verification window to continue.

Re-Protect the VM

The VM is now available in the recovery region. The next step is to re-protect and prepare the VM to move back to the original region.  Start this process by re-protecting the VM. This is done by going into the server, Disaster Recovery and clicking Re-protect at the top of the screen.  This will start the replication process back to the original region. All options can stay as default for this step.

Re protect

Re protect options

The re-protection process can take some time, depending on the size of the VM. You can monitor the status of the process under Health and Status, Status.

Health and status

A new resource group is created as part of the re-protection process.  This resource group holds the cache disk for the recovery (source) region.

Re protect disk cache

Once re-protection is finished, the VM will be in a state similar to that at the end of the "Protecting an Azure VM" section above. The remaining steps follow the same process outlined above, except that the source and destination regions are swapped. To complete re-protection, repeat the IP Mapping and Failover Test steps with the settings described below.

IP Mapping – Configure source NIC with the destination subnet and static IP address, as shown below.

Re protect network settings

Test failover – Select the virtual network at the recovery region to test the failover, as shown below.

Re protect test failover

Failback the VM

As with re-protection, failing the VM back to its original region follows the same process outlined in the "Failover" section above, only in the reverse direction, so the VM ends up back where it started.

The complete steps to fail back are:

  1. Failover VM
  2. Commit failover
  3. Re-protect the VM
  4. Failover Test
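
The order of these steps matters, and it can be modeled as a tiny state machine. This is a simplified sketch for reasoning about the sequence, not a representation of ASR's internal states: the state names, the `TRANSITIONS` table, and the helper functions are all hypothetical, and the model leaves out details such as running a test failover on an already protected VM.

```python
from enum import Enum
from typing import Iterable


class State(Enum):
    PROTECTED = "protected"        # replicating and tested, ready to fail over
    FAILED_OVER = "failed over"    # running in the other region, not yet committed
    COMMITTED = "committed"        # recovery point locked in, not replicating
    REPROTECTED = "re-protected"   # replicating in the reverse direction, untested


# Action name -> (state required beforehand, state afterwards).
TRANSITIONS = {
    "failover": (State.PROTECTED, State.FAILED_OVER),
    "commit": (State.FAILED_OVER, State.COMMITTED),
    "reprotect": (State.COMMITTED, State.REPROTECTED),
    "test_failover": (State.REPROTECTED, State.PROTECTED),
}

FAILBACK_STEPS = ["failover", "commit", "reprotect", "test_failover"]


def apply(state: State, action: str) -> State:
    """Apply one action, refusing it if the VM is not in the required state."""
    required, result = TRANSITIONS[action]
    if state is not required:
        raise ValueError(f"Cannot {action} while {state.value}; expected {required.value}")
    return result


def run(state: State, actions: Iterable[str]) -> State:
    """Apply a sequence of actions in order and return the final state."""
    for action in actions:
        state = apply(state, action)
    return state


print(run(State.PROTECTED, FAILBACK_STEPS))
```

Running the four steps from a protected VM ends in the protected state again, now in the opposite direction. Attempting a step out of order, for example `apply(State.PROTECTED, "commit")`, raises a `ValueError`.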

You may notice that the recovery server and NIC are left behind in the recovery resource group.  The server will not incur a charge if left in the deallocated state.  These objects will be reused in the event of another failover.

Conclusion

Azure Site Recovery is a valuable tool for any high availability strategy. The steps above outline the process for protecting an Azure VM to a second region.  Always test it before you trust it.  I recommend testing a failover with non-production servers prior to implementing in production.

© 4sysops 2006 - 2023
