When it comes to providing availability, you need to be aware of certain failure scenarios before moving your production environment, even in the cloud.
Types of failures in Azure ^
In an Azure cloud environment, we can talk about two types of failures you may have to face:
- Planned downtime: This type of outage happens due to planned maintenance tasks on the Azure infrastructure. These maintenance tasks include updates to the Hyper-V hosts, which may require rebooting your VMs.
- Unplanned downtime: These outages happen due to a fault in the underlying infrastructure for Azure VMs. They are unexpected and usually last longer than planned downtime. In such cases, Azure automatically migrates the affected VMs to a new physical host, which may cause your VMs to reboot. In addition, users cannot access your applications during that time.
Yes, even for Azure, some planned downtimes may result in rebooting your VMs due to a critical patch. You need to design your applications and infrastructure for protection against both planned and unplanned failures that may happen in Azure. To this end, you need to make sure that at least one member from each tier is available at all times to serve your clients.
An availability set for Azure logically groups a minimum of two or more VMs. If you place your VMs within an availability set, Azure will make sure to distribute them automatically across separate update and fault domains.
Fault domain vs. update domain ^
Let's talk quickly about update and fault domains, which also relate to planned and unplanned downtimes:
- Fault domain: A fault domain (FD) defines a set of Hyper-V hosts that could be affected by a single physical failure, such as the loss of a power source or a network switch. Virtual machines in the same fault domain share the same underlying infrastructure, power source, and network switch. This means that if a failure hits this shared infrastructure, it will affect all VMs running in the same FD.
- Update domain: In the Azure infrastructure, every update domain (UD) represents a set of physical hosts that Azure can update and reboot at the same time. Microsoft and Azure Service Fabric ensure that the update process (planned downtime) occurs for each UD separately; only after completing the update for one UD does Azure Service Fabric start updating the next. You should therefore distribute your VMs across multiple UDs to prevent a planned downtime from affecting all your VMs at once.
Using Azure availability sets ^
The availability set feature makes your Azure VM environment resilient to failures, and it works in alignment with UDs and FDs. When you put your virtual machines into a logical availability set, Azure makes sure to distribute those VMs across different UDs and FDs to prevent planned or unplanned VM reboots.
Azure Service Fabric is responsible here for ensuring distribution of VMs in the same availability set across up to three fault domains when using Azure Resource Manager (AzureRM). For multi-tiered applications, the recommended approach is to put them in functionality-based availability sets to protect each service from failures. You can also use Traffic Manager or Azure Load Balancer for additional failover mechanisms.
If you follow the above design and put each functional tier into a separate availability set, you ensure that at least one VM per tier remains available in case of planned or unplanned failures.
An availability set can also consist of up to 20 update domains. Let's say you set the UD count to seven for an availability set and then put eight VMs into this logical group. This would distribute seven of the VMs across different UDs and place the eighth VM into the same UD as the first VM. In case of maintenance or an update in the Azure infrastructure, Azure Service Fabric will update the hosts in only one of these seven UDs at a time; hosts in the other UDs will remain online.
Thus, to gain this protection for your Azure VMs, you need at least two VMs within an availability set. That's also a requirement of the Azure service-level agreement (SLA): Microsoft guarantees 99.95% monthly uptime only if you have two or more instances deployed in the same availability set. For single-instance VMs, that number falls to 99.9%.
Creating an Azure availability set in the Portal ^
When it comes to configuring availability sets, Azure Service Fabric governs most of the configuration. To place your VM in an availability set, you have two options to start with.
- Create an availability set and VM at the same time.
- Create an availability set and use it while creating a new VM.
You can separate VMs across up to three fault domains here. The Portal assigns five update domains by default, which you can increase to a maximum of 20. Choosing the right number of UDs and FDs is crucial and usually depends on your VM count. If you plan to place more VMs in this tier than the specified number of UDs, Azure will place the additional VMs into already used UDs, starting with the first one. So in case of Azure planned maintenance, this could reboot a whole UD and all VMs in it at once.
Creating an Azure availability set with PowerShell ^
You can also use AzureRM commands to create availability sets.
New-AzureRmAvailabilitySet -ResourceGroupName "WEBTIER" -Name "AvailabilitySet01" -Location "West Europe"
Get-AzureRmAvailabilitySet -ResourceGroupName "WEBTIER"
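If the defaults don't fit your VM count, New-AzureRmAvailabilitySet also accepts explicit domain counts. A short sketch (the set name and the counts below are just example values):

```powershell
# Create an availability set with two fault domains and seven update domains
# (example values; AvailabilitySet02 is a hypothetical name).
New-AzureRmAvailabilitySet -ResourceGroupName "WEBTIER" -Name "AvailabilitySet02" `
    -Location "West Europe" -PlatformFaultDomainCount 2 -PlatformUpdateDomainCount 7
```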
One important point to mention here is that you can only assign a VM to an availability set when you create the virtual machine. To move an existing VM in or out of an availability set, you need to recreate it.
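The second option from the list above can be sketched as follows with PowerShell. This assumes the AvailabilitySet01 set created earlier and a hypothetical VM name; the OS, image, and network settings you would normally add to the VM configuration are omitted here:

```powershell
# Look up the existing availability set.
$avSet = Get-AzureRmAvailabilitySet -ResourceGroupName WEBTIER -Name AvailabilitySet01

# Reference the availability set at VM creation time; a VM cannot join one later.
$vmConfig = New-AzureRmVMConfig -VMName WEB-VM1 -VMSize Standard_DS1_v2 `
    -AvailabilitySetId $avSet.Id

# ...add OS, image, and NIC settings to $vmConfig, then create the VM:
# New-AzureRmVM -ResourceGroupName WEBTIER -Location 'West Europe' -VM $vmConfig
```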
Here is a simple example of one availability set and distributed VMs across UDs and FDs:
As I have only two fault domains, Azure placed my third VM in the same FD as the first VM. It also distributed all VMs across different update domains.
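To verify this distribution yourself, you can read each VM's fault and update domain from its instance view (the VM name below is hypothetical):

```powershell
# -Status returns the instance view, which includes the FD and UD placement.
$vmStatus = Get-AzureRmVM -ResourceGroupName WEBTIER -Name WEB-VM1 -Status
$vmStatus.PlatformFaultDomain
$vmStatus.PlatformUpdateDomain
```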
Also, it's important to check AzureRM availability set limits. You can place a maximum of 100 VMs into each availability set.
Distributing traffic between VMs ^
The above configuration provides protection for your VMs and applications in case of planned or unplanned downtime. But to get maximum application resiliency, you need to place a load balancer in front of your internal- or external-facing tiers to distribute traffic among the multiple VMs in an availability set. In case of planned maintenance, this load balancer plus availability set combo will serve traffic continuously.
There are three options in Azure to distribute traffic:
- Azure Load Balancer: This basic load balancer works at Open Systems Interconnection (OSI) Layer 4 and provides transport-level distribution.
- Application Gateway: This load balancer works at OSI Layer 7 and can act as a reverse proxy, do SSL offloading, and perform URL path-based forwarding.
- Traffic Manager: This allows you to direct client DNS requests to the closest Azure distribution point; it works at the DNS level.
In the example below, I have three web servers placed in an availability set. Now I would like to introduce an internet-facing load balancer that receives traffic from the internet and distributes it to my web tier VM pool.
Creating an Azure load balancer ^
You can create a load balancer using AzureRM PowerShell commands or the Azure Portal. In the example below, I will do all the configuration with PowerShell.
I'm logging into my AzureRM account, getting the details for my existing virtual network (VNet) called WEBTIER, and then adding a new subnet called LB-Subnet-BE.
Get-AzureRmResourceGroup -Name WEBTIER
$ExistingVNET = Get-AzureRmVirtualNetwork -Name WEBTIER-vnet -ResourceGroupName WEBTIER
Add-AzureRmVirtualNetworkSubnetConfig -Name LB-Subnet-BE -VirtualNetwork $ExistingVNET -AddressPrefix 10.0.1.0/24
Set-AzureRmVirtualNetwork -VirtualNetwork $ExistingVNET
The next step is to request a new static public IP address and assign a new DNS label.
$publicIP = New-AzureRmPublicIpAddress -Name LBPIP -ResourceGroupName WEBTIER -Location 'West Europe' -AllocationMethod Static -DomainNameLabel loadbalancerwebtier
Get-AzureRmPublicIpAddress -Name LBPIP -ResourceGroupName WEBTIER
Now I need to create the load balancer front-end configuration, configure a health check, configure rules, and then create a load balancer using all of these settings.
$frontendIP = New-AzureRmLoadBalancerFrontendIpConfig -Name LB-Frontend -PublicIpAddress $publicIP
$beAddressPool = New-AzureRmLoadBalancerBackendAddressPoolConfig -Name LB-backend
$healthProbe = New-AzureRmLoadBalancerProbeConfig -Name HealthProbe -Protocol Tcp -Port 80 -IntervalInSeconds 15 -ProbeCount 2
$lbrule = New-AzureRmLoadBalancerRuleConfig -Name HTTP -FrontendIpConfiguration $frontendIP -BackendAddressPool $beAddressPool -Probe $healthProbe -Protocol Tcp -FrontendPort 80 -BackendPort 80
$NRPLB = New-AzureRmLoadBalancer -ResourceGroupName WEBTIER -Name WEBTIER-LB -Location 'West Europe' -FrontendIpConfiguration $frontendIP -LoadBalancingRule $lbrule -BackendAddressPool $beAddressPool -Probe $healthProbe
And finally, we can create network interfaces (NICs) for the web tier VMs and assign them to the load balancer's back-end pool.
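As a sketch (the NIC name is hypothetical, and this assumes the $ExistingVNET and $NRPLB variables from the steps above), you could create one NIC per web server and register it in the back-end pool at creation time:

```powershell
# Get the back-end subnet created earlier in this walkthrough.
$beSubnet = Get-AzureRmVirtualNetworkSubnetConfig -Name LB-Subnet-BE -VirtualNetwork $ExistingVNET

# Create a NIC in that subnet and add it to the load balancer's back-end pool.
$nic1 = New-AzureRmNetworkInterface -ResourceGroupName WEBTIER -Name WEB-VM1-NIC `
    -Location 'West Europe' -Subnet $beSubnet `
    -LoadBalancerBackendAddressPool $NRPLB.BackendAddressPools[0]
```

You would then attach each NIC to its VM configuration (for example with Add-AzureRmVMNetworkInterface) before creating the VM.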
In this post, we looked at using availability sets to place our VMs in a highly available Azure infrastructure. To design scalable and highly available applications in the cloud, we need to make sure that our applications and VMs are using key features like availability sets and load balancers.