Note that if a High Availability (HA) cluster protects a virtual machine (VM), the VM has to restart on another host after a failure. So even though the whole process is automatic, there is still downtime. This is not the case with VMware FT, where the secondary VM becomes the primary within a fraction of a second; FT then automatically spawns a new secondary VM from the new primary.
In the past, admins used FT sparingly because it imposed significant overhead, and its network latency interfered with certain applications. In addition, FT was limited to VMs configured with a single virtual CPU (vCPU). vSphere 6.5, however, not only raised the limit to four vCPUs for FT-enabled VMs but also completely changed the technology under the covers and significantly reduced network latency.
Previous releases of vSphere used vLockstep technology, which kept the primary and secondary VMs in sync via Record/Replay. As of the 6.5 release, vSphere uses a technology called "Fast Checkpointing."
How vSphere Fault Tolerance works
To put it simply, vSphere FT works by continuously replicating an entire running VM from one physical server to another. The FT-enabled VM has two replicas:
- Primary VM
- Secondary VM
Each VM is running on a different ESXi host. The replicas are logically identical; they represent a single VM state and a single network identity, but they are physically distinct.
Each replica has its own virtual machine files, such as configuration files (VMX) and virtual machine disk files (VMDK).
After activation of FT, the first synchronization of the virtual machine disk files (VMDKs) happens using vSphere Storage vMotion. Subsequently, vSphere FT will mirror VMDK writes between the primary and secondary VM over the FT network.
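The mirroring step above can be illustrated with a toy sketch. This is not VMware code; it is a minimal Python model of synchronous write mirroring, where a write is acknowledged only after both replicas have committed it, which is why the primary and secondary VMDKs never diverge:

```python
class MirroredDisk:
    """Simplified stand-in for a pair of FT-mirrored VMDKs (illustrative only)."""

    def __init__(self):
        self.primary = {}    # block number -> data on the primary's VMDK
        self.secondary = {}  # block number -> data on the secondary's VMDK

    def write(self, block, data):
        # Commit the write to the primary's disk first...
        self.primary[block] = data
        # ...then mirror it over the FT network before acknowledging.
        self.secondary[block] = data
        return "ack"

    def in_sync(self):
        # Both replicas hold identical data at all times.
        return self.primary == self.secondary


disk = MirroredDisk()
disk.write(0, b"boot sector")
disk.write(7, b"app data")
print(disk.in_sync())  # → True
```

The key property the sketch captures: because the acknowledgment comes only after both copies are committed, a failover never loses an acknowledged write.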
To check the VM's FT state, you can also consult the dashboard widget, which shows detailed FT log bandwidth usage.
When the physical server running the primary VM fails, the secondary VM takes over immediately, preserving the VM's state, network identity, and all active network connections; VMware HA then restores redundancy by starting a new secondary VM on another host. If the host running the secondary VM fails, VMware HA likewise starts a new secondary VM on a different host.
Another improvement of VMware FT is that you can now configure FT networks to use multiple network interface controllers (NICs) to increase the overall bandwidth for FT logging traffic. This works similarly to Multi-NIC vMotion and provides more bandwidth for the FT network.
Testing VMware Fault Tolerance
You can test FT from the right-click menu of the FT-protected VM. Several options are available:
Turn off vSphere Fault Tolerance
- Turn Off Fault Tolerance: destroys the secondary VM and turns off FT for the selected VM.
- Suspend Fault Tolerance: suspends FT protection but keeps the secondary VM, its configuration, and history.
- Migrate Secondary: allows you to manually migrate the secondary VM to another host.
- Test Failover: simulates a failure of the primary VM so you can verify that the secondary VM takes over.
What are the VMware FT limits?
VMware vSphere 6.5 has a few technical FT limits, and there are also licensing limits. I'll cover both.
These are the vSphere FT maximums:
- Virtual disks: 16
- Disk size: 2 TB
- Virtual CPUs per VM: 4
- RAM per FT VM: 64 GB
- FT VMs per host: 4
- Virtual CPUs per host: 8
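The per-VM maximums above can be expressed as a simple validation helper. This is a hypothetical snippet (not part of any VMware SDK); the `FT_LIMITS` keys and the `ft_violations` function are names I made up for illustration:

```python
# Per-VM vSphere 6.5 FT maximums from the list above.
FT_LIMITS = {
    "virtual_disks": 16,  # maximum virtual disks
    "disk_size_tb": 2,    # maximum size of any one disk, in TB
    "vcpus": 4,           # maximum virtual CPUs per FT VM
    "ram_gb": 64,         # maximum RAM per FT VM, in GB
}


def ft_violations(vm_config):
    """Return the list of FT limits that a planned VM configuration exceeds."""
    return [key for key, limit in FT_LIMITS.items()
            if vm_config.get(key, 0) > limit]


# A VM with 8 vCPUs and 128 GB RAM exceeds two FT maximums:
print(ft_violations({"virtual_disks": 4, "disk_size_tb": 1,
                     "vcpus": 8, "ram_gb": 128}))  # → ['vcpus', 'ram_gb']
```

Note that the per-host limits (4 FT VMs and 8 FT vCPUs per host) would need a cluster-wide check rather than a per-VM one.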
As for VMware FT licensing, vSphere Standard includes FT, but only for VMs with up to two vCPUs. To protect VMs with four vCPUs, you need Enterprise Plus or vSphere with Operations Management.
VMware vSphere 6.5 FT also improves integration with vSphere Distributed Resource Scheduler (DRS), allowing better placement decisions. DRS now ranks hosts based on available network bandwidth and datastore latency when placing the secondary VM and its VMDKs. As mentioned above, you can use multiple port groups for FT logging traffic to add network bandwidth, much like Multi-NIC vMotion.