What can we do to reduce the impact of a host failing without prior notice? VMware’s answer is its High Availability (HA) technology. There are two main reasons you’ll want to build a VMware vSphere cluster and activate HA and DRS:
- You want to ensure high availability of your VMs. VMware HA will restart the VMs on the remaining hosts in the cluster in case of hardware failure of one of your hosts.
- You want to distribute your VMs evenly across the resources of the cluster. The Distributed Resource Scheduler (DRS), which we'll discuss later, balances memory and CPU utilization across the hosts. If the memory or CPU load on a host rises (or falls), DRS uses vMotion to live-migrate VMs to other hosts in the cluster so that utilization stays balanced. In fully automated mode it performs these migrations itself; otherwise it only recommends them.
In other words, HA protects your VMs from host hardware failure, while DRS keeps resource utilization balanced across the cluster. So far, so simple.
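To make the balancing idea more concrete, here is a toy sketch in Python. It is not DRS's actual algorithm. Real DRS weighs both CPU and memory, uses its own metrics, and respects the configured automation level. This sketch assumes memory is the only resource and uses a hypothetical `threshold` on the spread between the busiest and idlest host. On each pass it moves the single VM that most reduces that spread, which mimics one vMotion at a time.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    capacity_mb: int  # memory available for VMs on this host
    vms: dict = field(default_factory=dict)  # VM name -> memory demand in MB

    def load(self) -> float:
        """Fraction of this host's memory consumed by its VMs."""
        return sum(self.vms.values()) / self.capacity_mb


def imbalance(hosts):
    """Spread between the most and least loaded hosts (0.0 means perfectly even)."""
    loads = [h.load() for h in hosts]
    return max(loads) - min(loads)


def rebalance(hosts, threshold=0.10):
    """Greedy sketch: repeatedly move the single VM that most reduces the spread
    from the busiest host to the idlest one, until the spread is within
    `threshold` or no single move helps. Returns (vm, source, destination) tuples."""
    migrations = []
    while imbalance(hosts) > threshold:
        src = max(hosts, key=Host.load)
        dst = min(hosts, key=Host.load)
        others = [h.load() for h in hosts if h is not src and h is not dst]
        best_vm, best_spread = None, imbalance(hosts)
        for vm, mem in src.vms.items():
            # Loads the two hosts would have if this VM moved.
            new_src = (sum(src.vms.values()) - mem) / src.capacity_mb
            new_dst = (sum(dst.vms.values()) + mem) / dst.capacity_mb
            loads = others + [new_src, new_dst]
            spread = max(loads) - min(loads)
            if spread < best_spread:
                best_vm, best_spread = vm, spread
        if best_vm is None:
            break  # no single migration improves the balance
        dst.vms[best_vm] = src.vms.pop(best_vm)  # stands in for a vMotion
        migrations.append((best_vm, src.name, dst.name))
    return migrations


# Hypothetical three-host cluster where esx01 carries most of the load.
hosts = [
    Host("esx01", 64000, {"web1": 8000, "web2": 8000, "db1": 16000, "app1": 8000}),
    Host("esx02", 64000, {"app2": 4000}),
    Host("esx03", 64000),
]
migrations = rebalance(hosts)
print(migrations)  # [('db1', 'esx01', 'esx03'), ('web1', 'esx01', 'esx02')]
```

The loop stops once the hosts are within 10 percentage points of each other. Stopping at a threshold, rather than aiming for exact equality, avoids an endless series of small migrations. The same trade-off explains why DRS aims for balance rather than identical utilization.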
Shared vs hyper-converged storage
VMware HA works well in traditional architectures with shared storage, but what about hyper-converged storage?
Shared storage means a NAS or SAN populated with hard disks or SSDs. All hosts in the cluster can access the shared storage and run VMs from a shared datastore. This architecture has been commonplace for about 15 years and has proven reliable and efficient. However, it reaches its limits in large clusters, where the performance of the storage controller becomes the bottleneck. Once you hit that limit, you typically end up creating another silo: a separate group of hosts with its own new shared storage.
Recently, hyper-converged architectures have become popular, and several vendors offer them, including VMware with VSAN. In a hyper-converged cluster, the shared storage devices are replaced by the local disks of the hosts. These local disks are pooled across the cluster to create a single virtual shared datastore that is visible to all hosts.
This is a software-only solution that can use high-speed flash devices, optionally combined with spinning disks. To save storage space, it uses deduplication and compression together with erasure coding (RAID 5/6) across the cluster; these space-saving features require an all-flash configuration. We’ll look at VMware VSAN in a future post.
I remember that my first demo of VMware HA used only two servers, plus a third device: a small NAS box that held a few VMs. This shows that even very small businesses can benefit from this easy-to-use technology.
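To see why erasure coding saves space, we can do a little arithmetic. The sketch below assumes vSAN's layouts. RAID-1 mirroring keeps a full second copy. RAID-5 stripes 3 data components plus 1 parity component, so it needs at least four hosts. RAID-6 stripes 4 data components plus 2 parity components, so it needs at least six hosts. Deduplication and compression are not modeled because their savings depend on the data.

```python
def raw_capacity_needed(usable_gb, data, redundancy):
    """Raw capacity consumed to store `usable_gb` of VM data when every
    `data` units are protected by `redundancy` extra units."""
    return usable_gb * (data + redundancy) / data


# Layouts assumed here (vSAN's): RAID-1 keeps a full second copy,
# RAID-5 stripes 3 data + 1 parity, RAID-6 stripes 4 data + 2 parity.
SCHEMES = {
    "RAID-1 mirror": (1, 1),
    "RAID-5 (3+1)": (3, 1),
    "RAID-6 (4+2)": (4, 2),
}

for name, (data, redundancy) in SCHEMES.items():
    print(f"{name}: {raw_capacity_needed(1200, data, redundancy):.0f} GB raw for 1200 GB of VMs")
```

RAID-1 and RAID-5 both tolerate one failure. In this example, RAID-5 needs 1600 GB of raw capacity instead of 2400 GB for the same level of protection. RAID-6 tolerates two failures and still uses less raw capacity (1800 GB) than mirroring.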
HA configuration options
You configure VMware HA through a wizard that lets you specify several options. You’ll need a VMware vCenter Server running in your environment; VMware ESXi alone is not enough. For SMBs, the vSphere Essentials Plus kit is sufficient. It covers up to three ESXi hosts (with up to two CPUs each) and one vCenter Server.
Let’s have a look at the different options that VMware HA offers.
Host Monitoring – Enable this to let the hosts in the cluster exchange network heartbeats and to let vSphere HA take action when it detects a failure. Note that host monitoring is required for the vSphere Fault Tolerance (FT) recovery process to work properly. FT is another advanced technology that protects your workloads in case of hardware failure. Unlike HA, however, it does so in real time, without downtime and without restarting the VM, because it keeps a continuously synchronized secondary VM running on another host.
Admission Control – You can enable or disable admission control for the vSphere HA cluster. If you enable it, you must choose the policy that enforces it: for example, the number of host failures the cluster tolerates, a percentage of cluster resources reserved for failover, or dedicated failover hosts. Admission control then prevents VMs from powering on if doing so would leave the cluster without enough spare CPU or memory capacity to restart VMs after a host failure.
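As an illustration, here is a simplified sketch of the percentage-based policy. It treats CPU and memory separately. A power-on is allowed only if, after the new VM's reservation is added, at least the configured percentage of each resource remains unreserved. The cluster sizes and reservations below are hypothetical. This is not vSphere's exact calculation, which also accounts for default minimum reservations and VM memory overhead.

```python
from dataclasses import dataclass


@dataclass
class Resources:
    cpu_mhz: int
    mem_mb: int


def can_power_on(cluster_total, reserved, new_vm, failover_pct):
    """Return True if reserving `new_vm` still leaves at least `failover_pct`
    percent of the cluster's CPU and memory unreserved for failover."""
    cpu_free = cluster_total.cpu_mhz - reserved.cpu_mhz - new_vm.cpu_mhz
    mem_free = cluster_total.mem_mb - reserved.mem_mb - new_vm.mem_mb
    # Integer comparison (x * 100 >= pct * total) avoids floating-point rounding.
    return (cpu_free * 100 >= failover_pct * cluster_total.cpu_mhz
            and mem_free * 100 >= failover_pct * cluster_total.mem_mb)


# Hypothetical 4-host cluster; reserving 25% roughly covers one host failure.
total = Resources(cpu_mhz=80000, mem_mb=524288)
already_reserved = Resources(cpu_mhz=50000, mem_mb=300000)

print(can_power_on(total, already_reserved, Resources(4000, 16384), 25))   # True
print(can_power_on(total, already_reserved, Resources(12000, 16384), 25))  # False: CPU spare would drop below 25%
```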
Virtual Machine Options – What happens when a failure occurs? The VM options allow you to set the VM restart priority and the host isolation response.
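To show what restart priority means in practice, the sketch below plans restarts after a host failure. It is a hypothetical model, not HA's real placement logic. It assumes the priorities high, medium, low and disabled, and memory as the only constraint. VMs are handled in priority order, and each one goes to the surviving host with the most free memory. When capacity runs short, the lower-priority VMs are the ones left without a slot.

```python
PRIORITY_ORDER = {"high": 0, "medium": 1, "low": 2}


def plan_restarts(failed_vms, free_mem):
    """failed_vms: list of (name, priority, mem_mb) from the failed host;
    priority "disabled" means the VM is not restarted.
    free_mem: dict of surviving host -> free memory in MB (not modified).
    Returns (placements, unplaced): [(vm, host), ...] and [vm, ...]."""
    free = dict(free_mem)
    placements, unplaced = [], []
    candidates = [vm for vm in failed_vms if vm[1] != "disabled"]
    # sorted() is stable, so VMs with equal priority keep their original order.
    for name, priority, mem in sorted(candidates, key=lambda vm: PRIORITY_ORDER[vm[1]]):
        host = max(free, key=free.get)  # surviving host with the most free memory
        if free[host] >= mem:
            free[host] -= mem
            placements.append((name, host))
        else:
            unplaced.append(name)
    return placements, unplaced


failed = [
    ("print01", "low", 6144),
    ("db01", "high", 16384),
    ("web01", "medium", 8192),
    ("test01", "disabled", 2048),
    ("app01", "high", 8192),
]
placements, unplaced = plan_restarts(failed, {"esx02": 20480, "esx03": 16384})
print(placements)  # [('db01', 'esx02'), ('app01', 'esx03'), ('web01', 'esx03')]
print(unplaced)    # ['print01']
```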
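VM Monitoring – Lets you enable VM monitoring and/or application monitoring. VM monitoring resets an individual VM when its VMware Tools heartbeats stop arriving. Application monitoring does the same when heartbeats from an application running inside the VM stop.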
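The decision behind VM monitoring can be sketched as a simple check. Besides missing heartbeats, vSphere HA also looks for recent disk or network I/O before resetting a VM, so that a guest which is busy but not sending heartbeats is not restarted needlessly. The function name and the 30 s and 120 s intervals below are assumptions chosen for illustration. Check your cluster's monitoring sensitivity settings for the real values.

```python
def should_reset(now, last_heartbeat, last_io, failure_interval=30, io_stats_interval=120):
    """Simplified VM-monitoring check; all times are in seconds.
    Reset only if VMware Tools heartbeats have been missing longer than
    `failure_interval` AND the VM has shown no disk/network I/O for
    longer than `io_stats_interval` (both intervals are illustrative)."""
    heartbeat_lost = now - last_heartbeat > failure_interval
    io_idle = now - last_io > io_stats_interval
    return heartbeat_lost and io_idle


print(should_reset(now=1000, last_heartbeat=900, last_io=800))  # True: no heartbeat, no I/O
print(should_reset(now=1000, last_heartbeat=900, last_io=950))  # False: still doing I/O
```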
Datastore Heartbeating – This provides a secondary communication channel that vSphere HA can use to verify whether a host is really down. With this option, heartbeats also travel through one or more datastores, which gives HA an additional way to determine whether a host has failed.
Suppose the master agent, which runs on one host in the cluster, stops receiving network heartbeats from a slave host but still sees that host's datastore heartbeats. HA then knows that the host is still running and has only lost connectivity over the network. In this case, we say the host is partitioned (or, if it has lost its management network entirely, isolated).
In other words, when network heartbeats stop, the datastore heartbeat determines whether the host is declared dead or alive. This helps greatly in distinguishing a host that has failed from one that has merely been isolated from the others.
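The master's reasoning can be summarized in a short, simplified sketch. The names `HostState` and `classify_host` are hypothetical. The three inputs mirror the signals vSphere HA uses: network heartbeats, datastore heartbeats, and whether the host's management address still answers a ping. In this model, a host is declared dead only when all three signals are missing.

```python
from enum import Enum


class HostState(Enum):
    ALIVE = "alive"
    ISOLATED_OR_PARTITIONED = "running, but unreachable over the heartbeat network"
    FAILED = "failed"


def classify_host(network_heartbeat: bool, datastore_heartbeat: bool, answers_ping: bool) -> HostState:
    """Simplified master-side view of a slave host."""
    if network_heartbeat:
        return HostState.ALIVE
    if datastore_heartbeat or answers_ping:
        # Still writing to the heartbeat datastore (or reachable by ping):
        # the host is running, so it is not treated as failed.
        return HostState.ISOLATED_OR_PARTITIONED
    return HostState.FAILED


print(classify_host(False, True, False))   # HostState.ISOLATED_OR_PARTITIONED
print(classify_host(False, False, False))  # HostState.FAILED
```

Without the datastore check, the first case would look exactly like the second. HA could then restart VMs that are in fact still running on the cut-off host.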
In my next post, we’ll start with the network configuration of our VMware High Availability (HA) cluster, which is perhaps the toughest part of a VMware HA setup.