This article explains how to troubleshoot orphaned VMware Consolidate Helper-0 snapshots caused by failed VMware backups.
If you are using any of the snapshot-based backup solutions for your virtual machines (Veeam Backup & Replication and vRanger being the two most popular), then from time to time you are probably going to see a failed backup. Failed backups are common even with traditional backup solutions, but with snapshot-based backups a failure can have a drastic effect on your production environment.
I can only speak directly to the effect with Veeam, as it is the only one I’ve used, but as I understand it the situation described here is common to all of the VMware-centric backup systems. It is specific to situations where bandwidth is constrained, such as replicating over the WAN to an off-site location. Backup job A runs and reaches its last stage, removing the temporary snapshot that was used to create the backup; this can be a time-intensive process depending on how large the backup (and thus the snapshot) is. While that removal is still in progress, backup job B creates a new snapshot and begins backing up the same virtual machine. When this happens, the VMware helper that was trying to merge the original snapshot (<disk>-delta.vmdk) back into the disk file (.vmdk) becomes orphaned. Furthermore, VMware still believes the snapshot to be locked by the original process.
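To make the overlap concrete, here is a minimal Python sketch that flags pairs of job runs where a second snapshot was created on a VM before the first one had finished merging. The JobRun record, its field names, and the sample times are all hypothetical; they are not a Veeam or VMware data structure, and you would have to fill them in from your own job logs.

```python
from datetime import datetime
from typing import NamedTuple


class JobRun(NamedTuple):
    """One backup job run against one VM (hypothetical record, not a Veeam type)."""
    job: str
    vm: str
    snapshot_created: datetime
    snapshot_removed: datetime  # when the delta finished merging back


def find_overlapping_runs(runs):
    """Return (earlier_job, later_job, vm) for runs whose snapshots overlap."""
    ordered = sorted(runs, key=lambda r: r.snapshot_created)
    overlaps = []
    for i, earlier in enumerate(ordered):
        for later in ordered[i + 1:]:
            same_vm = later.vm == earlier.vm
            # The later job took its snapshot while the earlier one was still merging.
            if same_vm and later.snapshot_created < earlier.snapshot_removed:
                overlaps.append((earlier.job, later.job, earlier.vm))
    return overlaps


# Illustrative data only: job B starts on web01 while job A is still merging.
runs = [
    JobRun("A", "web01", datetime(2024, 1, 1, 1, 0), datetime(2024, 1, 1, 4, 30)),
    JobRun("B", "web01", datetime(2024, 1, 1, 4, 0), datetime(2024, 1, 1, 5, 0)),
    JobRun("C", "db01", datetime(2024, 1, 1, 4, 0), datetime(2024, 1, 1, 4, 45)),
]
print(find_overlapping_runs(runs))
```

Any pair this reports is a candidate for the orphaned-helper situation described above; it does not prove the helper was orphaned, only that the two jobs collided on the same VM.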
In my experience, there are four steps to troubleshooting this issue and getting the delta files to roll back up into the disk files. You hopefully will not have to complete all four, as each one is a fallback for when the previous fix did not work. The first two can be done on a running VM, so you incur no downtime; the last two require you to take down the VM.
Step 1: Unlock the snapshot
The first step is to get the snapshot unlocked. The trick to this is to migrate the VM from the host it is on to another and then back again. Those of you using a vCenter-based infrastructure should know all about this. Just vMotion the machine, not the datastore, from one host to another.
It is important to vMotion the machine back to the original host. If you don’t, you will run into a different error. Once the process is finished, you should be able to go into Snapshot Manager and click the “Delete All” button to clear out all of the snapshots. Don’t be alarmed if the “Remove all snapshots” task seems to hang at 95%; this is normal operation and, depending on the size and quantity of the snapshots, it can take many, many hours.
Step 2: Trick vCenter
In some cases, after you perform the migration and open Snapshot Manager, it appears that the VM has magically gotten rid of all of its snapshots. If you browse the datastore where the VM resides, using either the GUI VI Client or the CLI via SSH, you will still see the delta files there. If this is the case, you can trick vCenter into showing you the snapshots again by creating another snapshot manually (right-click the VM, choose Snapshot, then “Take Snapshot”). When that is done, all of your Consolidate Helpers will reappear. After that, try Delete All from the Snapshot Manager again.
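If you would rather not eyeball the datastore listing, a small Python sketch like the one below can pick the delta files out of a list of file names. It assumes the deltas are named either <disk>-delta.vmdk, as written above, or <disk>-NNNNNN-delta.vmdk with a six-digit sequence number; the second pattern is my assumption about how numbered snapshots appear on disk, and the sample listing is made up. The function name find_delta_files is hypothetical.

```python
import re
from collections import defaultdict

# Assumed naming: "<disk>-delta.vmdk" or "<disk>-NNNNNN-delta.vmdk".
DELTA_RE = re.compile(r"^(?P<disk>.+?)(?:-\d{6})?-delta\.vmdk$")


def find_delta_files(filenames):
    """Group snapshot delta files by the base disk name they belong to."""
    deltas = defaultdict(list)
    for name in filenames:
        match = DELTA_RE.match(name)
        if match:
            deltas[match.group("disk")].append(name)
    return {disk: sorted(names) for disk, names in deltas.items()}


# Made-up listing, e.g. pasted from an SSH session on the host.
listing = [
    "web01.vmx",
    "web01.vmdk",
    "web01-flat.vmdk",
    "web01-000001.vmdk",
    "web01-000001-delta.vmdk",
    "web01-000002-delta.vmdk",
]
for disk, names in find_delta_files(listing).items():
    print(disk, len(names), names)
```

If this finds deltas while Snapshot Manager shows none, you are in the situation this step is meant to fix.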
Step 3: Migrate the datastore
If you have gotten to this point, I hate to tell you, but you are looking at some downtime from here on out. If the snapshots still haven’t rolled up, the next step in my process is to shut down the virtual machine and migrate the datastore. Yes, I know, you are probably screaming at the screen that, with ESX 4, you don’t have to shut down to migrate the data any more. However, if you have reached this step, there are most likely more delta files than the VI client can handle before reaching the timeout limit, and shutting down makes the process faster and more robust. If the migration completes successfully, it has the effect of rolling the deltas back up into the disk files.
Step 4: Convert the virtual machine
This step is for those of you (myself included) who ignore the problem too long. ESX is only capable of handling 32 snapshots for any given VM. Beyond that, trying to Delete All will not work. Neither will migrating the datastore. In this case, you need to install VMware’s standalone Converter tool on the VM and perform what’s referred to as a v2v, or virtual-to-virtual, conversion. I’ve seen references to just using the VI client to clone the VM, but this process was described to me by VMware support, so I’ll trust their judgment.
I have only reached this step once, and I hope to avoid it from here on. Current Windows activation will not survive this process, so at the least you will have to reactivate. At worst, you might actually have to call Microsoft and have them manually activate your server.
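The escalation path through the four steps can be summarized in a few lines of Python. This is only a sketch of the decision order described in this article, not a VMware tool: the next_step function and its parameters are hypothetical, the limit of 32 is the figure quoted above, and it assumes a single-disk VM with one delta file per snapshot. It also treats step 2 as a plain fallback, even though it really only applies when the snapshots have vanished from Snapshot Manager.

```python
MAX_SNAPSHOTS = 32  # per-VM limit quoted in this article


def next_step(delta_count, steps_tried=0, limit=MAX_SNAPSHOTS):
    """Suggest which of the four steps to try next.

    delta_count: number of delta files still on the datastore
    steps_tried: how many of steps 1-3 have already failed
    """
    if delta_count == 0:
        return "done: no deltas left"
    if delta_count > limit or steps_tried >= 3:
        return "step 4: v2v conversion (VM offline)"
    if steps_tried == 0:
        return "step 1: vMotion away and back, then Delete All (VM online)"
    if steps_tried == 1:
        return "step 2: take a manual snapshot, then Delete All (VM online)"
    return "step 3: shut down and migrate the datastore (VM offline)"


print(next_step(5))          # few deltas, nothing tried yet
print(next_step(5, 2))       # steps 1 and 2 already failed
print(next_step(40))         # past the limit, skip straight to conversion
```

The point of the ordering is simply to exhaust the no-downtime options before touching the ones that take the VM offline.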