High Availability
VMware High Availability (HA) is the oldest of the availability technologies covered in this post. It automatically restarts virtual machines (VMs) on other hosts when a host fails. HA pools hosts and VMs into a single cluster in which all hosts are monitored.
A host failure can be caused by a failed CPU, motherboard, storage controller, or network interface card (NIC). When one occurs, HA can trigger different actions so that the VMs running on the failed host are restarted elsewhere.
A host is declared failed only when it is unreachable both over the management network and over a second communication channel, the storage network. This is why HA requires shared storage: all hosts must be connected to the same datastores, and all VMs must run from those shared datastores.
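The detection rule above can be summarized as a small decision table. The following Python sketch is a simplified model, not VMware's implementation: `classify_host` and `HostState` are hypothetical names, and the real HA master performs additional checks (such as pinging the host's management address) before declaring a host failed.

```python
from enum import Enum


class HostState(Enum):
    ALIVE = "alive"
    ISOLATED_OR_PARTITIONED = "isolated or partitioned"
    FAILED = "failed"


def classify_host(network_heartbeat: bool, datastore_heartbeat: bool) -> HostState:
    """Simplified model of how the HA master judges a monitored host.

    network_heartbeat: heartbeats still arrive over the management network.
    datastore_heartbeat: the host still updates its heartbeat on shared storage.
    """
    if network_heartbeat:
        # The management network alone proves the host is up.
        return HostState.ALIVE
    if datastore_heartbeat:
        # Still writing to storage: the host is running but cut off from the
        # network, so it is treated as isolated or partitioned, not failed.
        return HostState.ISOLATED_OR_PARTITIONED
    # Both channels are silent: the host is declared failed and its VMs restart elsewhere.
    return HostState.FAILED
```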
When you enable vSphere HA, one of the hosts is elected master and all the other hosts become slaves. The master host keeps the list of protected VMs and communicates securely with the vCenter Server.
HA requires hosts to have static IP addresses or persistent DHCP reservations, because the hosts communicate with each other over the management network.
When a host fails, HA restarts its VMs according to the restart priority and order you configure.
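As an illustration, the ordering could be sketched like this, assuming the five priority levels vSphere 7 exposes (Highest to Lowest) plus a Disabled setting. `Vm` and `restart_order` are hypothetical names, and the sketch ignores VM dependencies and the conditions HA waits for before moving on to the next priority group.

```python
from dataclasses import dataclass

# Lower rank means restarted earlier; "disabled" VMs are never restarted.
PRIORITY_RANK = {"highest": 0, "high": 1, "medium": 2, "low": 3, "lowest": 4}


@dataclass
class Vm:
    name: str
    priority: str  # "highest" .. "lowest", or "disabled"


def restart_order(vms: list[Vm]) -> list[str]:
    """Return VM names in the order HA would attempt to restart them.

    VMs with restart priority "disabled" are skipped. Within one priority
    level, the original list order is kept because sorted() is stable.
    """
    candidates = [vm for vm in vms if vm.priority != "disabled"]
    candidates = sorted(candidates, key=lambda vm: PRIORITY_RANK[vm.priority])
    return [vm.name for vm in candidates]
```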
There is also a VM Monitoring feature that tells vSphere HA to restart a VM if it stops receiving heartbeats from VMware Tools, which is installed inside the VM.
A more granular option, Application Monitoring, does the same using heartbeats sent by an application.
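Here is a toy model of VM Monitoring for a single VM. The 30-second failure interval and the limit of three resets are assumptions chosen for illustration (roughly in line with the high sensitivity preset, but check your cluster settings), and `VmMonitor` is a hypothetical class. The real feature also checks the VM's I/O activity and counts resets within a time window, which this sketch leaves out.

```python
from dataclasses import dataclass


@dataclass
class VmMonitor:
    """Toy model of vSphere HA VM Monitoring for one VM.

    failure_interval: seconds without a VMware Tools heartbeat before HA acts (assumed).
    max_resets: how many times HA may reset the VM before giving up (assumed).
    """
    failure_interval: int = 30
    max_resets: int = 3
    resets: int = 0

    def check(self, seconds_since_heartbeat: float) -> bool:
        """Return True if HA should reset the VM now."""
        if seconds_since_heartbeat < self.failure_interval:
            return False  # heartbeats are arriving on time
        if self.resets >= self.max_resets:
            return False  # reset limit reached; leave the VM alone
        self.resets += 1
        return True
```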
Finally, VM Component Protection (VMCP) detects datastore accessibility failures, such as All Paths Down (APD) and Permanent Device Loss (PDL), and restarts the affected VMs on a host that can still reach the datastore.
vSphere 7 HA and various configuration options
There are several options in HA that can be configured. Once you enable HA, the defaults are good for most environments.
One such option is Proactive HA, which receives health updates from a hardware vendor's provider plug-in (Dell, HP, etc.). When the plug-in reports a degraded component, vSphere 7 HA can migrate VMs off that host before it fails completely. The host might still be able to run VMs, but vendor-level hardware monitoring gives you finer-grained control for mitigating risks.
Proactive HA offers two automation levels:
- Manual: DRS suggests recommendations for VMs and hosts, which you apply yourself.
- Automated: VMs are migrated to healthy hosts, and degraded hosts are put into quarantine or maintenance mode, depending on the configured remediation.
Once the VMs have been migrated to other hosts in the cluster, the remediation setting determines what happens to the degraded host:
- Quarantine mode: Balances performance and availability by avoiding the use of partially degraded hosts as long as VM performance is unaffected.
- Mixed mode: Applies quarantine mode to moderate failures and maintenance mode to severe failures.
- Maintenance mode: Ensures VMs do not run on partially failed hosts.
Selecting either quarantine or maintenance mode applies the same response to both moderate and severe failures.
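The remediation rules above boil down to a simple mapping. This Python sketch uses a hypothetical `remediation_action` function to encode exactly the behavior described in the list.

```python
def remediation_action(mode: str, severity: str) -> str:
    """Map a Proactive HA remediation setting and failure severity to a host action.

    mode: "quarantine", "maintenance", or "mixed"
    severity: "moderate" or "severe"
    """
    if severity not in ("moderate", "severe"):
        raise ValueError(f"unknown severity: {severity}")
    if mode in ("quarantine", "maintenance"):
        # Both single modes apply the same response regardless of severity.
        return mode
    if mode == "mixed":
        # Mixed mode escalates only severe failures to maintenance mode.
        return "quarantine" if severity == "moderate" else "maintenance"
    raise ValueError(f"unknown mode: {mode}")
```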
We're not done with vSphere HA yet. Next comes Failures and responses, a list of possible host failure scenarios and how you want vSphere to respond to each of them.
vSphere 7 HA Admission Control: Ensures that the cluster keeps enough resources to restart your VMs in the event of a host failure. You can configure admission control and resource availability in several ways.
The default (and preferred) policy is Cluster resource percentage. It reserves a percentage of the cluster's total CPU and memory capacity for failover.
You can also use dedicated failover hosts or the slot policy, but these tend to waste resources. Imagine a dedicated spare host that sits idle in the data center, waiting for a failure. That's quite expensive, isn't it?
The slot policy takes the largest CPU reservation and the largest memory reservation of any powered-on VM and uses them to define a slot. It then calculates how many slots each host, and therefore the cluster, can hold. The default, cluster resource percentage, is simpler: it compares the total resources needed with the total available in the cluster and keeps enough capacity free to tolerate the specified number of host failures.
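To make the slot policy concrete, here is a minimal Python sketch of the calculation. It assumes that slot size is defined only by VM reservations and that the hosts holding the most slots are the ones that fail; real vSphere HA also applies a minimum CPU slot size and adds memory overhead. `Host` and `slot_failover_capacity` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Host:
    cpu_mhz: int
    mem_mb: int


def slot_failover_capacity(hosts: list[Host], vm_reservations: list[tuple[int, int]]) -> int:
    """Return how many host failures the cluster tolerates under the slot policy.

    vm_reservations holds the (cpu_mhz, mem_mb) reservations of powered-on VMs.
    """
    # Slot size: the largest CPU and the largest memory reservation of any VM.
    slot_cpu = max(cpu for cpu, _ in vm_reservations)
    slot_mem = max(mem for _, mem in vm_reservations)
    if slot_cpu <= 0 or slot_mem <= 0:
        raise ValueError("this sketch needs non-zero reservations")
    # A host holds as many slots as its scarcer resource allows; largest first.
    slots = sorted(
        (min(h.cpu_mhz // slot_cpu, h.mem_mb // slot_mem) for h in hosts),
        reverse=True,
    )
    needed = len(vm_reservations)
    # Remove the largest hosts one by one while the rest still hold every VM.
    tolerated = 0
    while tolerated < len(slots) and sum(slots[tolerated + 1:]) >= needed:
        tolerated += 1
    return tolerated
```

For example, if the hosts hold 8, 8, and 4 slots and 7 VMs are powered on, the function returns 1: losing one large host still leaves 12 slots, but losing two leaves only 4.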
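Admission control enforces the reserved capacity by refusing operations, such as powering on a VM, that would eat into it. If the cluster cannot guarantee failover resources for a new VM, that VM is simply not admitted; hence the name, admission control.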
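The cluster resource percentage policy can be sketched with simple arithmetic. Assuming identical hosts, tolerating one host failure in a four-host cluster means reserving 1/4, or 25%, of capacity. The hypothetical `admit_power_on` function then refuses a power-on that would dip into that reserve; real admission control evaluates CPU and memory separately and works with reservations plus overhead.

```python
def failover_percentage(num_hosts: int, failures_to_tolerate: int) -> float:
    """Percentage of cluster capacity to reserve, assuming identical hosts."""
    if not 0 < failures_to_tolerate < num_hosts:
        raise ValueError("need 0 < failures_to_tolerate < num_hosts")
    return 100.0 * failures_to_tolerate / num_hosts


def admit_power_on(capacity: float, reserved_in_use: float, request: float,
                   reserve_pct: float) -> bool:
    """Return True if powering on a VM leaves the failover reserve untouched.

    capacity: total cluster capacity for one resource (e.g., GHz or GB).
    reserved_in_use: amount already reserved by running VMs.
    request: reservation of the VM being powered on.
    """
    usable = capacity * (1 - reserve_pct / 100.0)
    return reserved_in_use + request <= usable
```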
Heartbeat Datastores: As mentioned at the beginning of this post, if a host's management network fails, HA falls back on the storage network to check whether the host is still alive. vSphere HA can tell whether a host or VM is still running by looking for lock files on the heartbeat datastores. By default, vSphere HA selects two heartbeat datastores per host.
Advanced Options: Some advanced options help determine whether a host is isolated from the network. When a host stops receiving HA heartbeats, it pings an isolation address, by default its default gateway, to decide whether it is isolated. You can replace or supplement that address with two advanced options, das.usedefaultisolationaddress and das.isolationaddressX (where X is 0 to 9).
Setting das.usedefaultisolationaddress to false stops HA from using the default gateway, and each das.isolationaddressX option adds an isolation address, such as a second gateway.
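The following Python sketch builds the key/value pairs you would enter as vSphere HA advanced options. The function name `isolation_options` and the example addresses are placeholders; the numbered suffix and the limit of ten addresses reflect VMware's documentation for das.isolationaddressX, so verify them against your vSphere version.

```python
def isolation_options(addresses: list[str], use_default_gateway: bool = False) -> dict[str, str]:
    """Build vSphere HA advanced options for custom isolation addresses.

    Returns option names mapped to the values to enter in the cluster's
    vSphere HA advanced options (assumed limit: 10 addresses, suffixes 0-9).
    """
    if not 1 <= len(addresses) <= 10:
        raise ValueError("specify between 1 and 10 isolation addresses")
    # "false" tells HA not to ping the default gateway.
    options = {"das.usedefaultisolationaddress": str(use_default_gateway).lower()}
    for index, address in enumerate(addresses):
        options[f"das.isolationaddress{index}"] = address
    return options
```

For example, `isolation_options(["192.168.10.1", "192.168.20.1"])` yields `das.usedefaultisolationaddress = false`, `das.isolationaddress0 = 192.168.10.1`, and `das.isolationaddress1 = 192.168.20.1`.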
Fault Tolerance
With Fault Tolerance (FT), your VMs keep running even if the underlying host fails. FT creates a secondary VM that runs as a shadow copy of the primary VM. Both VMs run in sync on two different hosts.
If the primary VM fails, the secondary VM takes over as the new primary, and vSphere creates a new shadow VM. If the secondary VM fails, vSphere likewise creates a new shadow VM.
Requirements and limits: vSphere FT supports up to four FT VMs per host, with no more than eight FT vCPUs in total on that host.
Each FT VM can have a maximum of 8 vCPUs and 128 GB of RAM, and each host needs a VMkernel adapter with the Fault Tolerance logging checkbox enabled.
If you are using DRS, you must enable Enhanced vMotion Compatibility (EVC) mode.
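The limits listed above can be expressed as a pre-flight check. This is a hypothetical helper (`can_protect_with_ft`) with the limit values taken from this post; the per-host limits apply to every host running a primary or secondary FT VM, and actual placement is decided by vSphere, not by a script like this.

```python
from dataclasses import dataclass

# Limits as stated in this post; check them against your vSphere edition.
MAX_FT_VMS_PER_HOST = 4
MAX_FT_VCPUS_PER_HOST = 8
MAX_FT_VM_VCPUS = 8
MAX_FT_VM_MEMORY_GB = 128


@dataclass
class FtHostUsage:
    ft_vms: int    # FT VMs (primary or secondary) already on the host
    ft_vcpus: int  # vCPUs used by those FT VMs


def can_protect_with_ft(host: FtHostUsage, vm_vcpus: int, vm_memory_gb: int) -> bool:
    """Check whether one more FT VM fits the per-VM and per-host limits."""
    if vm_vcpus > MAX_FT_VM_VCPUS or vm_memory_gb > MAX_FT_VM_MEMORY_GB:
        return False
    if host.ft_vms + 1 > MAX_FT_VMS_PER_HOST:
        return False
    return host.ft_vcpus + vm_vcpus <= MAX_FT_VCPUS_PER_HOST
```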
FT uses a technology called fast checkpointing, which takes checkpoints of the source VM every 10 milliseconds. Those checkpoints are sent to the shadow VM via the VMkernel port with the Fault Tolerance logging checkbox enabled.
vSphere Replication
vSphere Replication is an add-on product installed as a virtual appliance. It lets you replicate VMs to remote data centers. Combined with Site Recovery Manager (SRM), it lets you automate disaster recovery (DR) plans and keep a failure under control.
vSphere Replication is configured on a per-VM basis. It copies VMs from a primary site to a secondary site, where the replicas wait in standby. The first replication is a full copy; subsequent ones transfer only the changes.
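The full-then-incremental behavior can be illustrated with a toy replication cycle. This sketch detects changed blocks by comparing SHA-256 hashes, a simplification chosen for clarity rather than the way vSphere Replication tracks changes; `replication_cycle` and the 4 KB block size are assumptions.

```python
import hashlib
from typing import Optional


def replication_cycle(disk: bytes, previous_hashes: Optional[list[str]],
                      block_size: int = 4096) -> tuple[dict[int, bytes], list[str]]:
    """Return (blocks_to_send, current_hashes) for one replication cycle.

    previous_hashes is None on the first cycle, which forces a full copy.
    """
    # Split the "disk" into fixed-size blocks and fingerprint each one.
    blocks = [disk[i:i + block_size] for i in range(0, len(disk), block_size)]
    hashes = [hashlib.sha256(block).hexdigest() for block in blocks]
    to_send = {}
    for index, (block, digest) in enumerate(zip(blocks, hashes)):
        # First cycle, or a block the replica has never seen before.
        is_new = previous_hashes is None or index >= len(previous_hashes)
        if is_new or previous_hashes[index] != digest:
            to_send[index] = block
    return to_send, hashes
```

The caller keeps the returned hashes and passes them to the next cycle, so only blocks that changed since the last run are sent.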
You can have one or more primary sites that are replicated to the secondary (DR) site. It uses a server–client model with appliances on both sides.
You can configure the recovery point objective (RPO), the maximum amount of data, measured in time, that you can afford to lose; vSphere Replication replicates the VM's disks often enough to meet it. The RPO can be set as low as 5 minutes or as high as 24 hours.
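An RPO check is simple time arithmetic. In this sketch, the 5-minute and 24-hour bounds come from the text above, while `validate_rpo` and `rpo_violated` are hypothetical helpers, for example for a monitoring script that compares the age of the newest replica against the configured RPO.

```python
from datetime import datetime, timedelta

MIN_RPO = timedelta(minutes=5)
MAX_RPO = timedelta(hours=24)


def validate_rpo(rpo: timedelta) -> timedelta:
    """Reject RPO values outside the 5-minute to 24-hour range."""
    if not MIN_RPO <= rpo <= MAX_RPO:
        raise ValueError(f"RPO must be between {MIN_RPO} and {MAX_RPO}, got {rpo}")
    return rpo


def rpo_violated(last_replica: datetime, now: datetime, rpo: timedelta) -> bool:
    """True if the newest replica is older than the RPO allows."""
    return now - last_replica > validate_rpo(rpo)
```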
You can also set up replication with third-party backup software, such as Veeam Backup & Replication or Nakivo Backup & Replication. Both products can create VM replicas, among other options.
Final words
As we have seen, vSphere 7 offers several functions, products, and options that support a robust DR strategy, one in which any data loss stays within the limits defined by your RPO.