VMware says that vCLS uses agent virtual machines (vCLS VMs) to maintain the health of cluster services, even if the vCenter Server is not available. The vCLS VMs are created when you add hosts to clusters.
There is a maximum of three such VMs, even if your cluster has more than three hosts. During normal operation, there is no way to disable vCLS agent VMs and the vCLS service. It is a mandatory service that is required for DRS to function normally. vSphere DRS depends on the health of the vSphere Cluster Services starting with vSphere 7.0 Update 1.
If the agent VMs are missing or not running, the cluster shows a warning message.
vSphere DRS functionality was impacted due to unhealthy state vSphere Cluster Services caused by the unavailability of vSphere Cluster Service VMs. vSphere Cluster Service VMs are required to maintain the health of vSphere DRS.
It is possible to manually disable vCLS on a vSphere cluster via Retreat Mode, but some of the cluster's services, such as DRS, will be affected. While the cluster is in Retreat Mode, the VMs running inside it are not load-balanced, and a VM will not be migrated to a different host if its current host is running out of resources.
What is retreat mode?
Retreat mode should only be used when you need to put your datastore into maintenance mode. If your datastore has a vCLS VM running, you must manually evacuate this VM via storage vMotion to a new location or put the cluster in retreat mode.
When you first activate DRS on your cluster and the vCLS agent VMs are created and deployed, the datastores that will host the VMs are selected automatically. The selection is based on a ranking of the datastores connected to the hosts inside your cluster.
A datastore is usually selected to host a vCLS VM if the host connected to the datastore has free reserved DRS slots. A datastore with more free space is preferred, and the algorithm tries not to place more than one vCLS VM on the same datastore.
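The ranking described above can be illustrated with a small Python sketch. This is not VMware's actual algorithm, which is not published in this article; the `Datastore` class, its fields, and `pick_datastores` are hypothetical names used only to show the stated preferences: hosts with free DRS slots first, then more free space, and no more than one vCLS VM per datastore where that is possible.

```python
from dataclasses import dataclass


@dataclass
class Datastore:
    # Hypothetical model of a datastore; these fields are assumptions for illustration.
    name: str
    free_gb: float
    host_has_free_drs_slot: bool


def pick_datastores(datastores, vm_count=3):
    """Return one datastore name per vCLS VM, spreading VMs out where possible."""
    # Rank: datastores whose host has a free DRS slot first, then by free space.
    ranked = sorted(
        datastores,
        key=lambda d: (d.host_has_free_drs_slot, d.free_gb),
        reverse=True,
    )
    if not ranked:
        return []
    # Walk the ranking in order; a datastore is reused only when there are
    # fewer datastores than VMs.
    return [ranked[i % len(ranked)].name for i in range(vm_count)]


candidates = [
    Datastore("ds-small", 100, True),
    Datastore("ds-big", 500, True),
    Datastore("ds-noslot", 900, False),
]
print(pick_datastores(candidates))
```

In this sketch, a datastore without a free DRS slot is only ranked lower rather than excluded, which is a simplifying assumption.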
Starting with vSphere 7.0 Update 2, a new anti-affinity rule is created and applied automatically. Every 3 minutes, this rule checks whether multiple vCLS VMs reside on the same datastore. If they do, it triggers a storage vMotion operation and redistributes those VMs to different datastores.
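To make the check concrete, here is a minimal Python sketch of the detection step only. It assumes you already have a mapping of vCLS VM names to datastore names (for example, copied from the vSphere Client); the function name `find_colocated` and the sample names are hypothetical, and the sketch does not talk to vCenter or trigger any migration.

```python
from collections import defaultdict


def find_colocated(placement):
    """Given {vm_name: datastore_name}, return datastores holding more than one vCLS VM."""
    by_datastore = defaultdict(list)
    for vm, datastore in placement.items():
        by_datastore[datastore].append(vm)
    # Keep only datastores that would violate the anti-affinity rule.
    return {ds: sorted(vms) for ds, vms in by_datastore.items() if len(vms) > 1}


# Hypothetical placement: two vCLS VMs share one datastore.
placement = {"vCLS-1": "lun-a", "vCLS-2": "lun-a", "vCLS-3": "lun-b"}
print(find_colocated(placement))
```

An empty result means every vCLS VM sits on its own datastore, so the rule would have nothing to redistribute.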
When a datastore hosting vCLS VMs is placed in maintenance mode, you must manually apply storage vMotion to the vCLS VMs to move them to a new location or put the cluster in retreat mode. A warning message is displayed.
Note: The task to enter maintenance mode will start but cannot finish while a virtual machine still resides on the datastore. To move forward, cancel the task under Recent Tasks and then evacuate the VM.
vCLS Retreat Mode advanced configuration
- Log in to the vSphere client and navigate to the cluster on which you want to disable vCLS.
- Copy the cluster domain ID from the URL of the browser. It should be similar to domain-c(number).
The URL will look something like this:
https://<fqdn-of-vCenter-server>/ui/app/cluster;nav=h/urn:vmomi:ClusterComputeResource:domain-c10001:eef257af-fa50-455a-af7a-6899324fabe6/summary
Copy the part that follows ClusterComputeResource: and ends before the next colon. In this case, that is domain-c10001.
- Then select your vCenter Server and navigate to the vCenter Server Configure tab.
- Under Advanced Settings, click the Edit Settings button.
- Add a new entry, config.vcls.clusters.domain-c(number).enabled. Use the domain ID you copied from the URL earlier.
- Set the Value to False.
- Click Save.
VMware has a detailed KB on this here.
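If you prefer not to pick the domain ID out of the URL by hand, the following Python sketch extracts it and builds the advanced setting name used in the steps above. The function names and the example host name `vcenter.example.com` are hypothetical; the script only builds strings and does not change anything in vCenter.

```python
import re


def cluster_domain_id(url):
    """Extract the domain-c(number) part from a vSphere Client cluster URL."""
    match = re.search(r"ClusterComputeResource:(domain-c\d+)", url)
    if match is None:
        raise ValueError("no cluster domain ID found in URL")
    return match.group(1)


def retreat_mode_setting(domain_id, retreat=True):
    """Return the (name, value) pair to enter in the vCenter Advanced Settings."""
    name = f"config.vcls.clusters.{domain_id}.enabled"
    # False puts the cluster in Retreat Mode; True takes it out again.
    value = "False" if retreat else "True"
    return name, value


url = (
    "https://vcenter.example.com/ui/app/cluster;nav=h/"
    "urn:vmomi:ClusterComputeResource:domain-c10001:"
    "eef257af-fa50-455a-af7a-6899324fabe6/summary"
)
domain_id = cluster_domain_id(url)
print(retreat_mode_setting(domain_id))
```

Calling `retreat_mode_setting(domain_id, retreat=False)` gives the pair to use when you want vCLS re-enabled on the cluster.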
The vCLS monitoring service runs every 30 seconds, so after about 1 minute, you'll see that all the vCLS VMs in the cluster are cleaned up and the Cluster Services health will be set to Degraded.
If the cluster has DRS enabled, it stops functioning. You'll see some additional warnings displayed in the Cluster Summary.
Note: DRS is not functional, even if you enable it within the UI. It stays this way until vCLS is reconfigured by removing it from Retreat Mode.
Besides vSphere DRS, vSphere High Availability (HA) is affected: it will not perform optimal placement during a host failure. vSphere HA depends on DRS for placement recommendations, so optimal placement requires Retreat Mode to be deactivated.
However, vSphere HA will still be able to power on the VMs if there is a host failure. These VMs will be powered on, but not necessarily on the host with the best available resources.
To remove Retreat Mode from the cluster, change the value of the config.vcls.clusters.domain-c(number).enabled entry you added earlier to True.
How to log in to the VMware vCLS virtual machine
Do you want to log in to the vCLS VM? You can. In fact, I have found a small procedure in the VMware documentation that enables you to do so.
- Use SSH to log in to the vCenter Server Appliance.
- Run the following Python script, which you can find at this location: /usr/lib/vmware-wcp/decrypt_clustervm_pw.py
- Read the output for the password.
Example output of the password script:
Read key from file
Connected to PSQL
PWD: (password displayed here)
You might need it if you want to perform an additional check for the cluster's health. However, it is meant to be used only for detailed diagnostics of the vCLS agent VMs. You won't need it for normal operations.
The link to the source doc is here.
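If you capture the script output in a file or a variable, a few lines of Python can pull out the password line. This is a minimal sketch that assumes the output has the shape shown above, with the password on a line starting with `PWD:`; the function name `extract_password` and the sample value are hypothetical.

```python
def extract_password(output):
    """Return the text after 'PWD:' in the script output, or None if it is absent."""
    for line in output.splitlines():
        if line.startswith("PWD:"):
            return line[len("PWD:"):].strip()
    return None


# Sample text modeled on the output shown above; the password is a placeholder.
sample = "Read key from file\nConnected to PSQL\nPWD: example-password\n"
print(extract_password(sample))
```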
Final words
The vCLS agent VMs might be created on a datastore where you need to do maintenance. In this case, to make sure that HA stays operational, you might need to configure Retreat Mode. However, in all normal situations, you should not use it. If you're using vSAN in your environment, you should consult this VMware KB article, which explains the particular case of vCLS VMs placed on a vSAN datastore and vSAN deactivation or host maintenance mode.
Good to know, thank you.
I was in a situation where I couldn't vMotion the vCLS VMs no matter what I did. So I started to look into the retreat mode of the cluster and found your article. I'm running 7.0U3g and all my vCLS VMs were located on the same LUN, the biggest one for some reason. I'm not sure about the anti-affinity rule in 7.0 U2, but it didn't seem to work properly in my environment.
I ended up not using the cluster retreat mode after all. A better and simpler solution if you are on 7.0 U3 is to go to the cluster, open the Configure tab, and click Datastores. There you can select which datastores the vCLS VMs are allowed to reside on. When I selected the datastores I wanted, the cluster did an automatic vMotion of the vCLS VMs and I was able to put my datastore in maintenance mode without disturbing HA and DRS. Hope that helps others as an alternative.