Latest posts by Mohammed Raffic (see all)
- If vMotion fails - Thu, Oct 31 2019
- vMotion phases
- 1. CPU incompatibility
- 2. vMotion not enabled on VMkernel Interfaces
- 3. Misconfigured ESXi network settings
- 4. Non-Shared storage between hosts in the cluster
- 5. Inaccessible CD/DVD or ISO image
- 6. Anti-Affinity rules
- 7. Resource starvation at target ESXi host
- 8. vCenter server is hung
- 9. VMware Tools installation is in progress
- 10. Check ESXi advanced settings
VMware vMotion is a zero-downtime live migration of workloads from one ESXi host to another. During the migration, the application on the virtual machine keeps running, and end users continue to access the system without noticing any interruption.
The virtual machine retains its network identity and connections, ensuring a seamless migration process. vMotion avoids application downtime while you perform maintenance on the underlying ESXi hardware and software. It is not limited to hosts within a single ESXi cluster; migration is also possible across clusters, vSwitches, vCenter Servers, and even to the cloud.
vMotion phases
The steps for migrating a virtual machine from one ESXi host to another with vMotion are:
- vMotion request is sent to the vCenter Server
- vCenter Server sends the vMotion request to the destination ESXi host
- vCenter Server computes the specifications of the virtual machine to migrate
- vCenter Server sends the vMotion request to the source ESXi host to prepare the virtual machine for migration
- vCenter Server initiates the destination virtual machine
- vCenter Server initiates the source virtual machine
- vCenter Server switches the virtual machine's ESXi host from the source to the destination
- vCenter Server completes the vMotion task
Any of these steps can fail. Let's have a look at the top 10 reasons for vMotion failures.
1. CPU incompatibility
VMware vMotion transfers the running state of a virtual machine between underlying VMware ESXi servers. vMotion compatibility requires that the processors of the target ESXi host be able to resume execution using instructions equivalent to those the processors of the source ESXi host were using when the virtual machine was suspended. Processor clock speeds, cache sizes, and the number of processor cores may vary, but processors must come from the same vendor class (Intel or AMD) and the same processor family to be compatible for migration with vMotion.
When scaling the cluster, it is often not possible to add hosts with the same CPU generation because of technology advances and new features. In that case, you might run into this error message:
The target host does not support the virtual machine's current hardware requirements.
Use a cluster with Enhanced vMotion Compatibility (EVC) enabled to create a uniform set of CPU features across the cluster, or use per-VM EVC for a consistent set of CPU features for a virtual machine and allow the virtual machine to be moved to a host capable of supporting that set of CPU features. See KB article 1003212 for cluster EVC information.
In this case, we can make use of vSphere EVC (Enhanced vMotion Compatibility). EVC ensures that virtual machines can be migrated live with vMotion between ESXi hosts in a cluster that runs different CPU generations. It provides uniform vMotion compatibility by enforcing a CPUID (instruction) baseline for the virtual machines running on the ESXi hosts. That means EVC exposes to the virtual machines only the CPU instruction sets that the chosen compatibility level supports. EVC is a cluster-level feature. Since the release of vSphere 6.7, Per-VM EVC adds more flexibility.
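To make the idea of a CPUID baseline concrete, here is a minimal Python sketch that models it as a set intersection. The host names and feature flags are hypothetical, and real EVC picks from predefined per-vendor baselines rather than computing an arbitrary intersection, so treat this as an illustration of the principle only.

```python
# Simplified model of an EVC-style baseline.
# Host names and feature flags are hypothetical; real EVC uses predefined
# vendor baselines instead of an ad hoc intersection.

def common_baseline(host_features):
    """Return the CPU features that every host in the cluster supports."""
    feature_sets = list(host_features.values())
    if not feature_sets:
        return set()
    return set.intersection(*feature_sets)


def can_migrate(vm_features, target_features):
    """The target must offer every feature that was exposed to the VM."""
    return vm_features <= target_features


cluster = {
    "esxi-old": {"sse4.2", "aes", "avx"},
    "esxi-new": {"sse4.2", "aes", "avx", "avx2", "avx512f"},
}

baseline = common_baseline(cluster)

# Without a baseline, a VM powered on at the newer host sees all of its features.
vm_without_evc = cluster["esxi-new"]
# With a baseline, the VM sees only the common subset.
vm_with_evc = baseline

print(can_migrate(vm_without_evc, cluster["esxi-old"]))  # False
print(can_migrate(vm_with_evc, cluster["esxi-old"]))     # True
```

The second call succeeds because the virtual machine was never shown the features that the older host lacks.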
2. vMotion not enabled on VMkernel interfaces
One of the prerequisites for vMotion is to have at least one VMkernel interface configured and enabled for vMotion. This vMotion network is used to securely transfer the data during vMotion operations. In many cases, vMotion fails simply because vMotion is not enabled on a VMkernel interface of the source or target ESXi host.
It is very important that both the source and the destination host have vMotion VMkernel interfaces configured with unique IP addresses.
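Both conditions are easy to audit once the VMkernel configuration is available as data. The following Python sketch assumes a hypothetical inventory list (the field names `host`, `device`, `ip`, and `vmotion` are my own, not a VMware API) and reports hosts without a vMotion-enabled interface as well as vMotion IP addresses used by more than one host.

```python
from collections import defaultdict

# Hypothetical inventory, one entry per VMkernel interface.
# The field names are assumptions for this sketch, not a VMware API.
vmk_inventory = [
    {"host": "esxi-01", "device": "vmk0", "ip": "10.0.0.11", "vmotion": False},
    {"host": "esxi-01", "device": "vmk1", "ip": "10.0.1.11", "vmotion": True},
    {"host": "esxi-02", "device": "vmk0", "ip": "10.0.0.12", "vmotion": False},
    {"host": "esxi-02", "device": "vmk1", "ip": "10.0.1.11", "vmotion": True},
    {"host": "esxi-03", "device": "vmk0", "ip": "10.0.0.13", "vmotion": False},
]


def hosts_without_vmotion(inventory):
    """Hosts that have no VMkernel interface enabled for vMotion."""
    all_hosts = {nic["host"] for nic in inventory}
    enabled = {nic["host"] for nic in inventory if nic["vmotion"]}
    return sorted(all_hosts - enabled)


def duplicate_vmotion_ips(inventory):
    """vMotion IP addresses that are configured on more than one host."""
    owners = defaultdict(list)
    for nic in inventory:
        if nic["vmotion"]:
            owners[nic["ip"]].append(nic["host"])
    return {ip: sorted(hosts) for ip, hosts in owners.items() if len(hosts) > 1}


print(hosts_without_vmotion(vmk_inventory))  # ['esxi-03']
print(duplicate_vmotion_ips(vmk_inventory))  # {'10.0.1.11': ['esxi-01', 'esxi-02']}
```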
3. Misconfigured ESXi network settings
Many network misconfigurations at the ESXi host level can also cause vMotion failures. During vMotion, the source host transfers the memory pages of the virtual machine to the destination host. If the destination host does not receive any data from the source host for a default period of 120 seconds, vMotion fails.
Some common settings are:
- Misconfigured Jumbo Frames
It is very important to have the same MTU settings configured between the ESXi hosts and across the network layer (port groups, virtual switches, and physical switches). Different jumbo frame settings will cause the vMotion to fail.
- IP Conflict for vMotion interface
Each ESXi host must have a unique IP address on its vMotion VMkernel interface. If two hosts share the same vMotion IP address, the destination host refuses the source's initial handshake message, which suggests that the source is not talking to the correct destination host over the vMotion network. This is normally caused by an IP address conflict within the vMotion network, where another host is using the destination's IP address.
- vSwitch security settings
It is important to have the same security settings configured across the ESXi hosts if you are using standard switches. Distributed virtual switches help maintain network consistency across the connected ESXi hosts.
- Packet loss and Network Latency
Packet loss and network latency are major concerns in any network. They may be caused by the network adapter driver, the network adapter firmware, or physical faults such as bad cabling, a faulty SFP, or a problem on the physical switch. Heavy packet loss on the vMotion VMkernel interface may cause vMotion to time out and fail.
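For the jumbo frame case, a small calculation helps when testing the path. The Python sketch below uses a hypothetical list of path components and their MTU values; it finds the effective MTU of the path and computes the largest ICMP payload that fits into a single unfragmented IPv4 packet (MTU minus 20 bytes of IPv4 header and 8 bytes of ICMP header).

```python
IP_HEADER_BYTES = 20    # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo header


def max_ping_payload(mtu):
    """Largest ICMP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IP_HEADER_BYTES - ICMP_HEADER_BYTES


def effective_mtu(path_mtus):
    """The smallest MTU on the path limits the whole path."""
    return min(path_mtus.values())


def undersized_components(path_mtus, expected_mtu):
    """Components configured with an MTU below the expected value."""
    return sorted(name for name, mtu in path_mtus.items() if mtu < expected_mtu)


# Hypothetical path between two vMotion VMkernel interfaces.
path = {
    "source vmk1": 9000,
    "source vSwitch": 9000,
    "physical switch port": 1500,
    "target vSwitch": 9000,
    "target vmk1": 9000,
}

print(effective_mtu(path))                # 1500
print(undersized_components(path, 9000))  # ['physical switch port']
print(max_ping_payload(9000))             # 8972
```

The payload size is the value to pass to a do-not-fragment ping. On an ESXi host this is commonly done with something like `vmkping -I vmk1 -d -s 8972 <target vMotion IP>`, where the interface name is an assumption for this example.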
4. Non-shared storage between hosts in the cluster
Shared storage was a hard requirement prior to vSphere 5.1. The hosts in the cluster need shared storage so that virtual machines are accessible to both the source and the target host. During vMotion, the migrated virtual machine must be on storage accessible to both the source and target ESXi hosts. This is especially important when DRS migrates VMs automatically within the cluster.
Since vSphere 5.1, we can migrate virtual machines using vMotion without shared storage. In that case, we must select the option "Change both compute resource and storage". This option performs both vMotion and Storage vMotion. The migration takes longer than a plain vMotion because the VM data must be migrated as well.
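The choice between the two migration types comes down to whether the target host can see every datastore the virtual machine uses. Here is a minimal Python sketch of that decision; the datastore names are hypothetical and the function is only a model of the reasoning, not a vSphere API.

```python
COMPUTE_ONLY = "Change compute resource only"
COMPUTE_AND_STORAGE = "Change both compute resource and storage"


def choose_migration_type(vm_datastores, target_host_datastores):
    """Pick the migration type from the datastores visible to the target host."""
    if set(vm_datastores) <= set(target_host_datastores):
        # Every datastore of the VM is reachable from the target: plain vMotion.
        return COMPUTE_ONLY
    # At least one datastore is not reachable: the VM data must move as well.
    return COMPUTE_AND_STORAGE


# Hypothetical datastore names.
print(choose_migration_type({"shared-ds01"}, {"shared-ds01", "local-esxi02"}))
# Change compute resource only
print(choose_migration_type({"local-esxi01"}, {"shared-ds01", "local-esxi02"}))
# Change both compute resource and storage
```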
5. Inaccessible CD/DVD or ISO image
If a virtual machine has a mounted ISO image that resides on storage the target ESXi host cannot access, vMotion will fail. To fix this, detach the ISO image and change the CD/DVD device to a client device.
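A pre-check for this condition can be sketched in a few lines of Python. The example assumes the ISO backing is given in the usual `[datastore] path` notation and that a client device is represented by `None`; the datastore names and the helper functions are hypothetical.

```python
import re

# Matches backing paths of the form "[datastore] folder/image.iso".
_DATASTORE_PATH = re.compile(r"^\[(?P<datastore>[^\]]+)\]\s*(?P<path>.+)$")


def iso_datastore(backing):
    """Return the datastore name of an ISO backing, or None for a client device."""
    if backing is None:
        return None
    match = _DATASTORE_PATH.match(backing)
    return match.group("datastore") if match else None


def iso_blocks_migration(backing, target_datastores):
    """True if the mounted ISO sits on a datastore the target host cannot see."""
    datastore = iso_datastore(backing)
    return datastore is not None and datastore not in target_datastores


target = {"shared-ds01"}
print(iso_blocks_migration("[local-esxi01] iso/win2019.iso", target))  # True
print(iso_blocks_migration("[shared-ds01] iso/win2019.iso", target))   # False
print(iso_blocks_migration(None, target))                              # False
```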
6. Anti-affinity rules
Affinity and anti-affinity rules are DRS rules. While affinity rules keep a group of virtual machines together, anti-affinity rules do the opposite. You can create an anti-affinity rule to spread a specific group of VMs across the cluster so that each VM in the group runs on a different ESXi host. This improves redundancy.
An anti-affinity rule prevents a virtual machine from migrating to any ESXi host that already runs another VM from the same anti-affinity group. This matters most when you place an ESXi host into maintenance mode and the VMs have to be evacuated automatically. A manually initiated vMotion usually still works.
The workaround, if you are planning ESXi maintenance activities in the cluster, is to temporarily disable the anti-affinity rule that is restricting vMotion and then migrate the virtual machine.
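The following Python sketch models why evacuation can stall. The VM names, host names, and the function are hypothetical; it simply lists the hosts a VM could move to without landing next to a peer from its anti-affinity group.

```python
def allowed_targets(vm, hosts, placement, anti_affinity_groups):
    """Hosts the VM may move to without sharing a host with a group peer."""
    peers = set()
    for group in anti_affinity_groups:
        if vm in group:
            peers |= group - {vm}
    blocked = {placement[peer] for peer in peers if peer in placement}
    current = placement.get(vm)
    return sorted(h for h in hosts if h not in blocked and h != current)


# Hypothetical three-host cluster with one VM of the group on each host.
hosts = ["esxi-01", "esxi-02", "esxi-03"]
placement = {"web-01": "esxi-01", "web-02": "esxi-02", "web-03": "esxi-03"}
rules = [{"web-01", "web-02", "web-03"}]

# With the rule active, web-01 has nowhere to go when esxi-01 needs maintenance.
print(allowed_targets("web-01", hosts, placement, rules))  # []

# With the rule temporarily disabled, both other hosts become valid targets.
print(allowed_targets("web-01", hosts, placement, []))     # ['esxi-02', 'esxi-03']
```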
7. Resource starvation at target ESXi host
Imagine your target host is already running at more than 95% CPU or memory utilization. At that level, the ESXi host cannot operate correctly. In another case, consider a virtual machine configured with a memory reservation. A memory reservation is the amount of memory guaranteed to be available to the virtual machine. If the target host does not have enough memory to satisfy the reservation of the virtual machine, vMotion will fail.
To fix this, migrate the virtual machine to another ESXi host that can provide the guaranteed memory for the VM or reduce the memory reservation of the virtual machine.
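The reservation check can be expressed as simple arithmetic. This Python sketch uses hypothetical hosts and numbers and ignores details such as per-VM memory overhead, so it is a rough model of the admission decision rather than what ESXi computes internally.

```python
def can_admit(vm_reservation_mb, host_capacity_mb, host_reserved_mb):
    """True if the host still has enough unreserved memory for the VM."""
    unreserved_mb = host_capacity_mb - host_reserved_mb
    return vm_reservation_mb <= unreserved_mb


def hosts_that_fit(vm_reservation_mb, hosts):
    """Names of hosts that can satisfy the VM's memory reservation."""
    return sorted(
        name
        for name, (capacity_mb, reserved_mb) in hosts.items()
        if can_admit(vm_reservation_mb, capacity_mb, reserved_mb)
    )


# Hypothetical hosts: name -> (capacity in MB, memory already reserved in MB).
candidate_hosts = {
    "esxi-02": (262144, 258048),  # only 4096 MB left unreserved
    "esxi-03": (262144, 131072),  # 131072 MB left unreserved
}

print(hosts_that_fit(8192, candidate_hosts))  # ['esxi-03']
print(hosts_that_fit(4096, candidate_hosts))  # ['esxi-02', 'esxi-03']
```

The second call shows the other fix from the text: lowering the reservation makes more hosts eligible.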
8. vCenter Server is hung
vCenter Server is a key component in vMotion. Without vCenter Server, vMotion is not possible. A vMotion request is first sent to vCenter Server, which then coordinates between the source and target ESXi hosts to complete the migration. If vCenter Server is hung or not responding, vMotion might fail. Consider restarting the vCenter services or rebooting the vCenter Server to fix the issue.
9. VMware Tools installation is in progress
If a VMware Tools installation is in progress for the virtual machine, we will not be able to migrate it, and vCenter shows this error: “The virtual machine is installing VMware Tools and cannot initiate a migration operation.” The installation may not actually be running, though. Often the VMware Tools installation has already completed, but we have forgotten to unmount the VMware Tools ISO from the virtual machine, and the mounted ISO is what prevents vMotion.
We must unmount the VMware Tools ISO image from the virtual machine before it can be migrated using vMotion.
10. Check ESXi advanced settings
In some instances, backup software sets the advanced setting "Migrate.Enabled" to 0 (disabled) to ensure that its backup jobs complete. Usually, the backup software reverts the setting once the backup jobs are finished, but after a host or network outage it may leave the setting unchanged. Ensure the value of "Migrate.Enabled" is set to 1 (enabled) for vMotion to work.
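If you collect the advanced settings of all hosts, for example after a backup window, a short audit can flag hosts where the value was left at 0. The Python sketch below works on a hypothetical dictionary of settings per host and treats a missing entry as the default of 1; how the settings are collected is outside its scope.

```python
def hosts_with_vmotion_disabled(advanced_settings):
    """Hosts whose Migrate.Enabled setting is not 1 (missing counts as default 1)."""
    return sorted(
        host
        for host, settings in advanced_settings.items()
        if settings.get("Migrate.Enabled", 1) != 1
    )


# Hypothetical settings export: host name -> advanced settings.
collected = {
    "esxi-01": {"Migrate.Enabled": 1},
    "esxi-02": {"Migrate.Enabled": 0},  # left disabled after an interrupted backup
    "esxi-03": {},
}

print(hosts_with_vmotion_disabled(collected))  # ['esxi-02']
```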