VMware vMotion allows live migration of virtual machines (VMs) from one host to another without downtime. It was a revolutionary technology back in its day. This post gives tips on how to improve the speed of VMware vMotion.

Moving VMs from one location to another or from one cloud to another is a common task nowadays. VMware has evolved and optimized the underlying technology over the years; however, admins still have to configure vMotion properly and follow some good practices and optimizations. This post will highlight some of them.

VMware first introduced vMotion in 2003, and by 2012, shared storage was no longer required for vMotion to work. This is useful for performing cross-cluster migrations when the target cluster machines might not have access to the source cluster's storage. Applications and users can continue to work inside the VMs while those are migrated to another cluster without any use of shared storage.

A year before that, VMware introduced multi-network interface card (multi-NIC) vMotion, allowing faster speeds and shorter VM migration times.

The latest 6.7 release of vSphere also brought encrypted vMotion and virtual GPU (vGPU) support with NVIDIA GRID technology. So now you can have VMs with vGPUs and still be able to move them around within your cluster or data center.

Let's look at some vMotion optimization tips, which will allow us to use more VMkernel adapters and create more vMotion streams.

What is a VMkernel adapter?

VMware calls this VMkernel port a "VMkernel networking interface." It is basically a virtual network device that allows vSphere and ESXi to communicate with the outside world. VMkernel adapters are always named "vmk" followed by a number (e.g., vmk0, vmk1, etc.). A VMkernel adapter can provide services such as vMotion traffic, management traffic, iSCSI traffic, and others.
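
If you prefer the ESXi command line, you can inspect and create VMkernel adapters with esxcli. A minimal sketch follows; the adapter name, port group name, and IP address are example values you would adjust to your environment:

```shell
# List existing VMkernel adapters and the port groups they sit on
esxcli network ip interface list

# Add a new VMkernel adapter (vmk1) on an existing port group (example name)
esxcli network ip interface add --interface-name=vmk1 --portgroup-name="vMotion-PG"

# Give the new adapter a static IPv4 address (example address)
esxcli network ip interface ipv4 set --interface-name=vmk1 \
  --ipv4=192.168.50.11 --netmask=255.255.255.0 --type=static

# Tag the adapter for vMotion traffic
esxcli network ip interface tag add -i vmk1 -t VMotion
```

The same result can be achieved in the vSphere Client under the host's networking configuration.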

Use hardware with a higher speed and throughput

VMware's minimum requirement is only a 1 GbE NIC; however, 1 Gb NICs can become saturated quickly with vMotion traffic. If you can, use 10 Gb NICs. This is easy to say, but not every environment can afford 10 Gb NICs and a 10 Gb switch.

As time goes by, 10 Gb networking will become mainstream even for very small businesses. Large businesses are already moving up to 40 Gb or 100 Gb network speeds.

If you cannot afford a 10Gb switch and NICs for your hosts, don't be sad; we have other tips that might help and give better vMotion speed times.

One of them is to use several 1 Gb NICs or NICs that have several RJ45 ports.

Add more physical NICs or NICs with multiple 1Gb ports

You can add multiple physical NICs to your host and use the aggregated bandwidth to speed up the vMotion process. There are single-, dual-, or even quad-port NIC models on the market that will fit this scenario.

Boost your vMotion network with quad port NICs


We won't go into configuration details, as we have already written a blog post on that: How to configure multi-NIC vMotion in VMware vSphere.

You can follow the steps there and configure multiple physical NICs or one physical NIC with several ports for vMotion operations.

The calculation is simple: more physical NICs dedicated to vMotion traffic mean more parallel channels, which results in faster vMotion times.
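
As a rough command-line sketch of such a multi-NIC setup (assuming a standard vSwitch; the port group, vSwitch, and vmnic names are examples), each vMotion port group is pinned to a different active uplink, so each VMkernel adapter really drives its own physical NIC:

```shell
# Two port groups on the same vSwitch, one per vMotion uplink
esxcli network vswitch standard portgroup add --portgroup-name=vMotion-01 --vswitch-name=vSwitch1
esxcli network vswitch standard portgroup add --portgroup-name=vMotion-02 --vswitch-name=vSwitch1

# Pin each port group to a different active physical NIC (the other as standby)
esxcli network vswitch standard portgroup policy failover set \
  --portgroup-name=vMotion-01 --active-uplinks=vmnic2 --standby-uplinks=vmnic3
esxcli network vswitch standard portgroup policy failover set \
  --portgroup-name=vMotion-02 --active-uplinks=vmnic3 --standby-uplinks=vmnic2

# One vMotion-tagged VMkernel adapter per port group
esxcli network ip interface add --interface-name=vmk1 --portgroup-name=vMotion-01
esxcli network ip interface add --interface-name=vmk2 --portgroup-name=vMotion-02
esxcli network ip interface tag add -i vmk1 -t VMotion
esxcli network ip interface tag add -i vmk2 -t VMotion
```

The active/standby pinning (rather than a LAG) is what lets vMotion balance its own traffic across both NICs.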

Multiple VMkernel interfaces per physical NIC

If your environment already has 10, 25, or 100 Gb NICs and you still find the vMotion process taking a long time, other tweaks can speed up and optimize vMotion further. We're talking about accelerating an environment that already has great hardware.

Before showing how, we will talk a bit about streams, which have been a part of vMotion since version 4.1 and have continuously improved since then.

vMotion uses a VMkernel adapter that is enabled for vMotion. Each time you add another VMkernel adapter and enable it for vMotion, you create a new stream. More streams mean greater bandwidth utilization.

Each stream has three helpers (threads), each with a different role: the Completion helper, the Crypto helper, and the Stream helper.

Even if your physical NIC infrastructure already runs at, say, 100 Gb, you can still improve bandwidth utilization by creating more streams (with more helpers) on that single physical NIC and use the available bandwidth more efficiently.

Multiple VMkernel adapters per physical NIC


According to VMware, one stream has an average bandwidth utilization capacity of about 15 GbE. Correlating the physical NIC capacity with the number of streams gives us this:

  • 25 GbE: 1 stream = 15 GbE
  • 40 GbE: 2 streams = 30 GbE
  • 50 GbE: 3 streams = 45 GbE
  • 100 GbE: 6 streams = 90 GbE
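
The table above is simple integer math. A tiny Python sketch of the sizing rule, using the article's ~15 GbE-per-stream figure (real-world throughput varies with workload and hardware):

```python
# Rough vMotion stream sizing based on ~15 GbE average utilization per stream.
STREAM_CAPACITY_GBE = 15

def streams_needed(nic_speed_gbe: int) -> int:
    """How many full streams fit within the NIC's line rate."""
    return nic_speed_gbe // STREAM_CAPACITY_GBE

def expected_utilization_gbe(nic_speed_gbe: int) -> int:
    """Approximate usable bandwidth with that many streams."""
    return streams_needed(nic_speed_gbe) * STREAM_CAPACITY_GBE

for speed in (25, 40, 50, 100):
    print(f"{speed} GbE: {streams_needed(speed)} streams = "
          f"{expected_utilization_gbe(speed)} GbE")
```

Running it reproduces the table: a 100 GbE NIC needs six vMotion-enabled VMkernel adapters (streams) to approach 90 GbE of utilization.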

Note: There is another (advanced) method described in the source article at VMware. It consists of editing the advanced host settings property called Migrate.VMotionStreamHelpers.

By modifying this property (on a per-host basis), you can change the default behavior from dynamic stream allocation to a fixed number of streams. But you really have to know what you are doing; that's why it's hidden and buried in the advanced host settings.
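
For illustration, advanced host settings like this can be inspected and changed with esxcli; the value 4 below is only an example, and you should consult the source article before changing the default:

```shell
# Show the current value; a value of 0 means dynamic stream allocation
esxcli system settings advanced list -o /Migrate/VMotionStreamHelpers

# Pin the host to a fixed number of vMotion stream helpers (example: 4)
esxcli system settings advanced set -o /Migrate/VMotionStreamHelpers -i 4
```

The same property appears as Migrate.VMotionStreamHelpers in the vSphere Client under the host's advanced system settings.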

We won't go into the full details because you can follow the detailed steps in the source article.

Licensing vMotion

You may or may not know that different kinds of vMotion require different kinds of licensing.

As vMotion gained new "high-end" features, allowing cross-vCenter or long-distance vMotions, VMware placed those in the "high-end" licensing packages as well.

The traditional vMotion we all know, where you move a VM from one ESXi host to another, is available in the lowest-end package, called vSphere Essentials Plus.

In fact, from the licensing perspective, there are two different sorts of vMotion:

  1. vMotion, Storage vMotion, and Cross vSwitch vMotion: these need vSphere Essentials Plus and higher.
  2. Cross-vCenter and long-distance vMotion: these need vSphere Enterprise Plus or vSphere Platinum.

Final words

Configuring vMotion is easy: create at least one vMotion-enabled VMkernel adapter per host. Beyond that, there are two ways to improve performance, depending on the networking hardware you have available.

What was enough two or three years ago is not enough now. Workload mobility within the data center is one of the key elements.


A proper design or upgrade path is necessary. When planning ahead, keep in mind that NICs with 25, 50, or 100 Gb speeds will soon become more affordable.

  1. Dor 2 years ago

    Why did you configure separate port groups per vmk?
    Can I use all the vMotion vmks on the same port group?

    Thanks in advance

  2. Nicholas Kulkarni 2 months ago

    Hi, been having issues with my setup and doing the rounds I came across this article.

    I am a little concerned by one statement.

    “You can add multiple physical NICs to your host and use the aggregated bandwidth to speed up the vMotion process.”

    I am concerned that there are a number of caveats to this idea.

    My research with Veeam forums shows none other than Gostev himself pointing to the known ESXi bug where any VMK can use no more than 40% of the available bandwidth and that VMWare is aware of this because Veeam asked them about it. Sadly the reply was at least 10 years ago and subject to a NDA. Safe to say the bug still exists.

    It is a common misconception in networking that LAG configurations increase bandwidth. The short but complicated answer is yes and no. ESXi switches are not Layer 3 aware even if the external network hardware is. Unless you have specified custom TCP stacks and implemented them in the VMK you are more than likely to see your backup LAG traffic routed via the Management Interface because there is only a single default gateway on an ESXi host and that is the management gateway.

    Even if you do implement custom TCP stacks and apply them to the VMK which is connected to an external LAG the fact that ESXi switches are not Layer 3 aware means that you are on a MAC to MAC link.

    I have watched my systems during backup using the vSphere monitor. The VMK TCP stack means that the correct NIC and custom gateway are used but the vSwitch with its LAG still hits the external switch with nearly all the packet traffic exiting the ESXi host on a single NIC despite having two or more to choose from.

    That said I can see how setting up multiple VMK might create more streams and get you round the 40% of bandwidth limit but even with custom TCP stacks and gateways I cannot see how you are going to get round the fact that the external NIC doesn’t load balance in the LAG and your traffic is effectively going over a single NIC interface no matter how many are on the card or in the LAG.

    I could well be wrong and have missed something in my configuration that needs turning on so it would be good to hear your thoughts.



© 4sysops 2006 - 2023

