This example uses a single VM deployed to the East US region. The VM is protected in the West US region. Each region has a virtual network configured with a server subnet. The West US virtual network has an additional recovery subnet configured. The recovery network is used for static IP mappings when the protected server fails over.
Set up Azure Site Recovery
Deploy a Recovery Services Vault by searching for "Backup and Site Recovery" in the Azure portal and creating the resource.
Give the new Recovery Services Vault a name, select the subscription and resource group, and select a location. A Recovery Services Vault cannot replicate VMs located in the same region as the vault. This is by design: if the protected VM and the vault were in the same region, both would be unavailable in the event of a regional outage. Click Create to deploy the Recovery Services Vault.
Protecting an Azure VM
Now that the Recovery Vault is in place, the next step is to protect the VM. This is done from within the Recovery Vault or from Properties on the VM blade. The example below will configure protection from the VM blade.
Go to the protected VM and select Disaster Recovery under Operations. This takes you to the Configure disaster recovery blade. From there, select the Target region. The target region does not need to match the region of the Recovery Vault, but it cannot be the same as the region of the source VM.
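The region rules described so far can be captured in a small pre-flight check. This is only a sketch in plain Python, not a call into any Azure SDK; the function name `validate_asr_regions` and the choice of West US for the vault are assumptions made for illustration.

```python
from typing import List


def _norm(region: str) -> str:
    """Normalize a region name so "East US" and "eastus" compare equal."""
    return region.replace(" ", "").lower()


def validate_asr_regions(source_region: str, vault_region: str, target_region: str) -> List[str]:
    """Return a list of problems with the chosen regions (empty if none).

    Hypothetical helper: it only encodes the two rules from the text above.
    """
    problems = []
    # Rule 1: the vault cannot replicate VMs located in its own region.
    if _norm(vault_region) == _norm(source_region):
        problems.append("vault is in the same region as the source VM")
    # Rule 2: the target region cannot be the source VM's region.
    if _norm(target_region) == _norm(source_region):
        problems.append("target region is the same as the source VM region")
    return problems


# The layout used in this example: VM in East US, protected in West US.
# The vault region (West US) is an assumption for this sketch.
print(validate_asr_regions("East US", "West US", "West US"))
```

An empty list means the combination passes both rules; the portal enforces the same constraints, so a check like this is mainly useful when planning many VMs at once.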
Select the Subscription type and VM resource group. The example below uses the default target resource group. Set the Virtual network to the destination VNet. A new target VNet is created unless an existing one is selected. The example below uses a network already set up in the West US region. Availability set is left as "Single instance."
The Storage, Replication, and Extension settings are left default for this example. Click on Enable Replication to finish. Notice that the bottom of the Configure Disaster Recovery window shows an image of the replication path.
Azure creates several new resources for the deployment, including destination resource groups, an Automation Account, and a Service Principal. Initial replication may take several minutes to finish, depending on the size of the VM. Once finished, the VM is protected.
IP Mapping
One last step before moving on to testing is IP mapping. The server has a static private IP, 10.101.1.25, in the source subnet. The default behavior is to create a new VNet at the destination with the same IP address scheme as the source. The new VNet takes the name of the source VNet with "-asr" appended. To provide more control over the IP space in this example, the default behavior is modified to assign a static IP on a designated failover VNet in the recovery region.
Go to the protected server, Operations, Disaster Recovery blade, and select Compute and Network. Click the Edit button at the top of the page.
Click on the Source NIC; this opens the Network interface configuration. Change the subnet and add the static IP address to the Private IP Address Target field. As shown below, the subnet changes to the recovery subnet created in the destination VNet and the IP is set to a static address. Click OK and Save once finished.
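Before typing a static address into the Private IP Address Target field, it can help to confirm that the address actually belongs to the recovery subnet and is not one of the addresses Azure reserves in every subnet (the first four and the last). The sketch below uses only Python's standard `ipaddress` module. The recovery subnet `10.102.2.0/24` and the target address `10.102.2.25` are hypothetical values, since the article does not state them; only the source IP `10.101.1.25` comes from the text.

```python
import ipaddress


def is_usable_static_ip(ip: str, subnet_cidr: str) -> bool:
    """Check that ip lies inside subnet_cidr and is not an Azure-reserved address.

    Hypothetical helper for illustration; it does not query Azure.
    """
    network = ipaddress.ip_network(subnet_cidr, strict=True)
    address = ipaddress.ip_address(ip)
    if address not in network:
        return False
    # Azure reserves the first four addresses and the last address of a subnet.
    reserved = {network[0], network[1], network[2], network[3], network[-1]}
    return address not in reserved


# Hypothetical recovery subnet and target IP (not taken from the article).
print(is_usable_static_ip("10.102.2.25", "10.102.2.0/24"))
# The source-side IP from the article does not belong to that subnet.
print(is_usable_static_ip("10.101.1.25", "10.102.2.0/24"))
```

This does not detect an address that is already in use by another NIC; that still has to be checked in the portal.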
Verify settings
Before moving on, take a look at what was created when replication was enabled. ASR created a new resource group named after the protected server's resource group with an "-asr" suffix. This resource group contains either two storage accounts or two disks, depending on whether Managed Disks are used. One is located in the source region and caches changes to the source disk prior to replication. The other is in the destination region and becomes the server disk when the VM fails over.
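The "-asr" naming convention makes it easy to predict, or later recognize, the objects ASR creates by default. A minimal sketch follows; the helper names and the resource group name `rg-servers` are hypothetical, and the suffix rule is simply the behavior described in this article.

```python
ASR_SUFFIX = "-asr"


def default_asr_name(source_name: str) -> str:
    """Return the default name described above: the source name plus "-asr"."""
    return source_name + ASR_SUFFIX


def looks_asr_created(resource_name: str) -> bool:
    """Heuristic only: true if the name ends with the "-asr" suffix."""
    return resource_name.lower().endswith(ASR_SUFFIX)


# "rg-servers" is a made-up resource group name for illustration.
print(default_asr_name("rg-servers"))
print(looks_asr_created("rg-servers-asr"))
```

Because this is a name-based heuristic, a resource that someone manually named with the same suffix would also match.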
There is also a new Automation account in the Recovery Services Vault resource group. This is used for failover automation tasks.
Failover Test
The VM is now replicated, but there is still one step left before it is considered protected. Go to Operations, Disaster Recovery on the protected server. The warning message below indicates that a test failover has not been completed. Therefore, let's test the failover.
Start the failover test by selecting Test failover at the top of the Disaster Recovery blade. A window will open allowing you to choose the recovery point and the Virtual Network. The example below uses the most recent recovery point and the VNet in the recovery region "WestNetwork." Click OK when finished.
The test will take a few minutes to finish. Once the process is complete, there will be new objects in the recovery resource group. Below, the test failover VM and NIC are shown. Both have the "-test" suffix.
The properties of the NIC show that it is attached to the subnet and uses the IP we defined in an earlier step.
The test worked, but the subscription is now getting charged for two VMs. These test objects are removed by cleaning up the test failover. Cleanup removes the items created for the test, such as the VM and NIC, and returns the VM to a protected state.
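Because forgotten test objects keep costing money, a quick name-based check for leftovers can be useful. The sketch below assumes you already have a list of resource names from the recovery resource group (how you obtain it is up to you); the function name and the sample names are hypothetical, and the "-test" suffix is the convention described above.

```python
from typing import Iterable, List

TEST_SUFFIX = "-test"


def find_test_failover_leftovers(resource_names: Iterable[str]) -> List[str]:
    """Return the names that end in "-test", sorted alphabetically."""
    return sorted(
        name for name in resource_names if name.lower().endswith(TEST_SUFFIX)
    )


# Made-up contents of a recovery resource group during a test failover.
names = ["server01", "server01-nic", "server01-test", "server01-nic-test"]
print(find_test_failover_leftovers(names))
```

An empty result after running the cleanup suggests the test VM and NIC were removed.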
Go back to the source VM and into the Disaster Recovery blade. At the top of the window is a Cleanup test failover option. Click that to start the cleanup.
A box will appear where you can add notes about the failover. Enter notes as needed and select the confirmation box to delete the test VMs. Click OK when finished.
The Disaster Recovery blade now shows a "Healthy" status with a Last successful Test Failover date and timestamp as shown below. If any problems occurred, any applicable errors will also be shown here.
Failover
A failover test is useful, but it is better to do a true failover to understand the steps and verify that the process works. This allows you to go through the process beforehand so you will be prepared in case an actual disaster occurs.
Start by going back to the Disaster Recovery blade of the protected server and select Failover at the top of the window. This will start the failover process.
You will have two options available. The first allows you to select the recovery point. Use the latest one or go to a previous point in time if needed. The other option allows you to shut down the server before failover. Check this box to shut down the source VM before the failover starts.
* This is a disruptive test. Use caution when implementing in production.
Once the failover is complete, notice that the VM and NIC appear in the recovery resource group and the failover region, just as they did in the test, but this time without "-test" appended to the name.
The next step commits the failover. Be sure to verify the recovered VM and its recovery point before committing; once the failover is committed, it is not possible to switch to another recovery point. Keeping the commit as a separate step is a safety feature in case you need a different recovery point. For example, if the server experienced data corruption at the last recovery point as part of the outage, you may need to go back to an earlier one. Changing recovery points is only an option prior to committing the failover.
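The reasoning behind choosing a recovery point can be sketched in a few lines: take the latest point, unless corruption is known to have started at some time, in which case take the latest point before that time. The function name and the timestamps below are hypothetical; in practice the list of recovery points comes from the portal.

```python
from datetime import datetime
from typing import List, Optional


def pick_recovery_point(
    points: List[datetime], corrupted_at: Optional[datetime] = None
) -> Optional[datetime]:
    """Return the latest recovery point, or the latest one before corrupted_at.

    Returns None if no recovery point qualifies.
    """
    candidates = [p for p in points if corrupted_at is None or p < corrupted_at]
    return max(candidates) if candidates else None


# Made-up recovery points, one per hour.
points = [
    datetime(2021, 11, 1, 9, 0),
    datetime(2021, 11, 1, 10, 0),
    datetime(2021, 11, 1, 11, 0),
]
# No known corruption: use the most recent point.
print(pick_recovery_point(points))
# Corruption assumed to have started at 10:30: fall back to the 10:00 point.
print(pick_recovery_point(points, corrupted_at=datetime(2021, 11, 1, 10, 30)))
```

Once the failover is committed this choice can no longer be revisited, which is why the check belongs before the commit.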
Once the recovery is verified, commit the failover. Click OK at the verification window to continue.
Re-Protect the VM
The VM is now available in the recovery region. The next step is to re-protect and prepare the VM to move back to the original region. Start this process by re-protecting the VM. This is done by going into the server, Disaster Recovery and clicking Re-protect at the top of the screen. This will start the replication process back to the original region. All options can stay as default for this step.
The re-protection process can take some time, depending on the size of the VM. You can monitor the status of the process under Health and Status, Status.
A new resource group is created as part of the re-protection process. This resource group holds the cache disk for the recovery (source) region.
Once re-protection is finished, the VM will be in a state similar to that at the end of the "Protecting an Azure VM" step above. The remaining steps follow the same process outlined above, except that the source and destination regions are flipped. To complete re-protection, follow the IP Mapping and Failover Test steps, adjusting the settings as described below.
IP Mapping – Configure source NIC with the destination subnet and static IP address, as shown below.
Test failover – Select the virtual network at the recovery region to test the failover, as shown below.
Failback the VM
Like re-protecting the VM, failing the VM back to its original region follows the same process outlined in the "Failover" section above, only in the reverse direction, so the VM is replicated back to where we started.
The complete steps to fail back are:
- Failover VM
- Commit failover
- Re-protect the VM
- Failover Test
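The order of these steps matters, so a runbook or checklist script may want to enforce it. The following is a minimal sketch of such a guard in plain Python; the step names mirror the list above, and `next_failback_step` is a hypothetical helper, not part of any Azure tooling.

```python
from typing import List, Optional

# Step names taken from the list above, in the required order.
FAILBACK_STEPS = [
    "Failover VM",
    "Commit failover",
    "Re-protect the VM",
    "Failover Test",
]


def next_failback_step(completed: List[str]) -> Optional[str]:
    """Return the next step to run, or None when all steps are done.

    Raises ValueError if the completed steps are not a prefix of the
    expected sequence, for example if a step was skipped.
    """
    if completed != FAILBACK_STEPS[: len(completed)]:
        raise ValueError("steps were completed out of order: %r" % (completed,))
    if len(completed) == len(FAILBACK_STEPS):
        return None
    return FAILBACK_STEPS[len(completed)]


print(next_failback_step([]))
print(next_failback_step(["Failover VM", "Commit failover"]))
```

Raising on an out-of-order list is a deliberate choice here: re-protecting before the failover is committed, for instance, should stop the runbook rather than be silently reordered.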
You may notice that the recovery server and NIC are left behind in the recovery resource group. The server will not incur compute charges while it remains in the deallocated state. These objects will be reused in the event of another failover.
Conclusion
Azure Site Recovery is a valuable tool for any high availability strategy. The steps above outline the process for protecting an Azure VM to a second region. Always test it before you trust it. I recommend testing a failover with non-production servers prior to implementing in production.