What benefits are provided by Windows Server deduplication? How is the feature added? How do you enable deduplication, check its status, and pause or stop the deduplication process? Let's take a look at Windows Server deduplication.
What is data deduplication in Windows Server?
When you store many files on a Windows Server volume, the same data blocks often recur across those files. This is especially true when the files are similar in content or structure. A departmental file server is a good example: in a large file share, end users may store many copies of the same or similar files. These redundant copies reduce storage efficiency.
Instead of storing multiple copies of data, as in traditional storage environments, deduplication provides the means to store the data once and create intelligent pointers to the actual data location. In this way, the storage environment does not house duplicated information. Microsoft keeps improving the features of deduplication as well. In Windows Server 2019, Data Deduplication can now deduplicate both NTFS and ReFS volumes. Prior to Windows Server 2019, ReFS deduplication was not possible.
How does Windows Server data deduplication work?
Microsoft uses two principles to implement data deduplication in Windows Server:
- Optimization is post-process: The deduplication process runs on data by using a post-processing model, so it does not interfere with the performance of the write process. When data is written to the storage, it is not optimized. Afterward, the deduplication optimization process runs and deduplicates the data.
- Optimization is transparent: Deduplication in Windows Server is entirely transparent. End users are unaware that they may be working with deduplicated data.
To accomplish the successful deduplication of data in accordance with the principles listed above, Windows Server uses the following process:
- The file system scans storage to find files matching the deduplication optimization policy.
- The system breaks the files into chunks.
- Unique chunks of file data are identified.
- These file chunks are placed in the chunk store.
- Pointers to the chunk store are created to allow redirecting file reads to the appropriate file chunks.
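The steps above can be illustrated with a small conceptual sketch. This is not how the Windows engine is implemented: it uses tiny fixed-size chunks and an in-memory hashtable purely to show the idea of storing each unique chunk once and keeping pointers to it. The function names `Add-FileToChunkStore` and `Get-FileFromChunkStore` and all variable names are hypothetical.

```powershell
# Conceptual sketch only: fixed-size chunks and an in-memory hashtable
# stand in for the real chunk store. All names are hypothetical.
$ChunkSize  = 4      # bytes per chunk; tiny on purpose so the demo is readable
$ChunkStore = @{}    # hash -> chunk bytes (each unique chunk is stored once)
$sha = [System.Security.Cryptography.SHA256]::Create()

function Add-FileToChunkStore {
    param([byte[]]$Data)
    $pointers = @()
    for ($offset = 0; $offset -lt $Data.Length; $offset += $ChunkSize) {
        # Break the file into chunks.
        $end = [Math]::Min($offset + $ChunkSize, $Data.Length) - 1
        [byte[]]$chunk = $Data[$offset..$end]
        # Identify the chunk by its hash and store it only if it is new.
        $hash = [System.BitConverter]::ToString($sha.ComputeHash($chunk))
        if (-not $ChunkStore.ContainsKey($hash)) { $ChunkStore[$hash] = $chunk }
        # The "file" becomes an ordered list of pointers into the store.
        $pointers += $hash
    }
    return ,$pointers
}

function Get-FileFromChunkStore {
    param([string[]]$Pointers)
    # Follow each pointer to rebuild the original byte stream.
    [byte[]]$bytes = foreach ($hash in $Pointers) { $ChunkStore[$hash] }
    return ,$bytes
}

$enc   = [System.Text.Encoding]::ASCII
$fileA = Add-FileToChunkStore -Data ($enc.GetBytes('AAAABBBBCCCC'))
$fileB = Add-FileToChunkStore -Data ($enc.GetBytes('AAAABBBBDDDD'))

"Chunks referenced: $($fileA.Count + $fileB.Count); chunks stored: $($ChunkStore.Count)"
$restored = $enc.GetString((Get-FileFromChunkStore -Pointers $fileB))
$restored
```

The two sample files share their first two chunks, so six chunk references resolve to only four stored chunks, and reading a file back through its pointers returns the original content.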
Strong use cases for data deduplication
Specific use cases lend themselves well to data deduplication. Which workloads typically show massive benefits from using data deduplication? Let's list these in the order of the most significant benefits.
- 80–95% space savings—Virtualization environments, especially VDI workloads and ISOs for deployment.
- 70–80% space savings—Deployment shares contain massively duplicated data stores of software binaries, cab files, and other operation-specific files.
- 50–60% space savings—General file shares can contain monolithic repositories of files that can include a tremendous amount of duplicated data.
- 30–50% space savings—User documents can contain standard user files that may include photos, music, and videos.
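To put these percentages in concrete terms, the short sketch below applies the ranges from the list to a hypothetical 2,000 GB of data. The data size and variable names are assumptions for illustration; actual savings depend entirely on the data stored.

```powershell
# Savings ranges (percent) taken from the list above.
$savingsRanges = [ordered]@{
    'Virtualization and ISOs' = 80, 95
    'Deployment shares'       = 70, 80
    'General file shares'     = 50, 60
    'User documents'          = 30, 50
}
$logicalDataGB = 2000   # hypothetical amount of data before deduplication

$estimates = foreach ($workload in $savingsRanges.Keys) {
    $low, $high = $savingsRanges[$workload]
    [pscustomobject]@{
        Workload    = $workload
        SavedGBLow  = $logicalDataGB * $low / 100
        SavedGBHigh = $logicalDataGB * $high / 100
    }
}
$estimates | Format-Table -AutoSize
```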
Installing Windows Server Data Deduplication
The process to install the Windows Server Data Deduplication feature is straightforward. Administrators can install it using Server Manager, Windows Admin Center, or PowerShell. Data Deduplication is part of the File and Storage Services role in Windows Server. Below is a screenshot from Windows Server 2019.
Using PowerShell, you can install the Data Deduplication feature using the following cmdlet:
Install-WindowsFeature -Name FS-Data-Deduplication
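If you script the installation, it can help to check whether the feature is already present and to confirm the result afterward. The following is a minimal sketch that assumes an elevated PowerShell session on the target server:

```powershell
# Install the feature only if it is not already installed.
$feature = Get-WindowsFeature -Name FS-Data-Deduplication
if (-not $feature.Installed) {
    Install-WindowsFeature -Name FS-Data-Deduplication -IncludeManagementTools
}

# Confirm the install state.
Get-WindowsFeature -Name FS-Data-Deduplication | Select-Object Name, InstallState
```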
Data Deduplication installation in Windows Admin Center is carried out by visiting the Roles and Features menu and placing a check next to Data Deduplication, which is found under the File and Storage Services role.
Enabling Data Deduplication on a Windows Server volume
Once you have installed Data Deduplication, enabling it on a volume is straightforward. In Server Manager, navigate to File and Storage Services > Volumes > Disks and click the disk. Then right-click the volume on that disk that you want to deduplicate and choose Configure Data Deduplication.
Under Deduplication Settings, you can configure several options. These include:
- The age of the files to be deduplicated
- Custom file extensions to exclude
- Custom excluded folders
- Configuration of the deduplication schedule
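The same settings can also be applied from PowerShell. The sketch below is an example only: the drive letter, file extensions, and folder path are hypothetical placeholders, and the exact extension format expected by `-ExcludeFileType` is worth confirming against the cmdlet help before use.

```powershell
$volume = 'E:'   # hypothetical data volume

# Enable deduplication for a general-purpose file server workload.
Enable-DedupVolume -Volume $volume -UsageType Default

# Only optimize files older than 3 days, and skip some extensions and a folder.
# The extensions and folder below are placeholders, not recommendations.
Set-DedupVolume -Volume $volume `
    -MinimumFileAgeDays 3 `
    -ExcludeFileType 'tmp', 'bak' `
    -ExcludeFolder 'E:\Scratch'

# Review the resulting configuration.
Get-DedupVolume -Volume $volume | Format-List
```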
The Deduplication Schedule configuration provides interesting options to customize the background process used to run the data deduplication. You can further customize the deduplication schedule and resource utilization using the throughput optimization options. It also allows for multiple schedules.
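An additional schedule can be created with the `New-DedupSchedule` cmdlet. The following is a sketch under assumed values: the schedule name, start time, duration, days, and memory percentage are all illustrative and should be adapted to your own maintenance window.

```powershell
# Hypothetical off-hours optimization window on weeknights.
New-DedupSchedule -Name 'NightlyOptimization' `
    -Type Optimization `
    -Start '23:00' `
    -DurationHours 6 `
    -Days Monday, Tuesday, Wednesday, Thursday, Friday `
    -Memory 50 `
    -Priority Normal

# List all schedules, including the built-in ones.
Get-DedupSchedule
```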
Running Data Deduplication Scheduled Tasks
You may wonder how the background tasks run. When you install Data Deduplication, Windows creates scheduled tasks to take care of the background optimization process, garbage collection, and data scrubbing. You can also run these tasks manually. By default, the background deduplication process runs once every hour, with no end date.
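You can inspect these schedules and trigger a run yourself from PowerShell. In the sketch below, the drive letter is a placeholder, and the Task Scheduler path is an assumption about where Windows registers the tasks, so verify it on your own server.

```powershell
# Show the built-in and custom deduplication schedules.
Get-DedupSchedule

# View the underlying scheduled tasks (task path is an assumption; verify locally).
Get-ScheduledTask -TaskPath '\Microsoft\Windows\Deduplication\'

# Start an optimization run manually instead of waiting for the schedule.
Start-DedupJob -Volume 'E:' -Type Optimization   # 'E:' is a hypothetical volume
```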
Using PowerShell for status and management
PowerShell provides many great controls and options for interacting with Windows Server Data Deduplication. Let's take note of a few of these cmdlets. The Get-DedupStatus cmdlet displays the status of the deduplication operations and the deduplication percentage.
Immediately after Data Deduplication is installed and enabled, the volume shows no space savings. Once the optimization process begins to run, space savings start to appear on the volume.
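A quick way to watch the savings grow is to query the status before and after an optimization run. This is a minimal sketch; the drive letter is a hypothetical placeholder.

```powershell
$volume = 'E:'   # hypothetical deduplicated volume

# Detailed status, including optimized file counts and saved space.
Get-DedupStatus -Volume $volume | Format-List

# Compact view of the savings for the volume.
Get-DedupVolume -Volume $volume | Select-Object Volume, SavedSpace, SavingsRate

# Recompute the savings information if the reported values look stale.
Update-DedupStatus -Volume $volume
```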
If you want to disable and get rid of Data Deduplication, you can do this easily with a couple of PowerShell cmdlets:
Disable-DedupVolume -Volume <volume letter>
Start-DedupJob -Type Unoptimization -Volume <volume letter>
What other types of DedupJobs can you kick off from PowerShell?
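Besides Unoptimization, the `-Type` parameter of `Start-DedupJob` accepts Optimization, GarbageCollection, and Scrubbing. The sketch below shows each one, along with the cmdlets for checking on and stopping jobs. The drive letter is a hypothetical placeholder, and you would normally start only the job type you need.

```powershell
$volume = 'E:'   # hypothetical deduplicated volume

# Deduplicate and compress files that match the volume policy.
Start-DedupJob -Volume $volume -Type Optimization

# Reclaim space from chunks that are no longer referenced.
Start-DedupJob -Volume $volume -Type GarbageCollection

# Check the chunk store for corruption and attempt repairs.
Start-DedupJob -Volume $volume -Type Scrubbing

# See running and queued jobs with their progress.
Get-DedupJob

# Cancel running or queued jobs on the volume.
Stop-DedupJob -Volume $volume
```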
Wrapping up
Windows Server Data Deduplication is a great way to reclaim storage space efficiently in your Windows Server environment. With each Windows release, the deduplication capabilities continue to improve. It provides tremendous space-saving benefits with specific workloads, especially for general file servers and VDI virtualization environments. For virtualization environments, space savings can be as much as 80–95%.
The Data Deduplication subcomponent of File and Storage Services is easy to add and enable on a specific storage volume. You can take advantage of many options to control the deduplication schedule, file types, and exclusions. PowerShell provides several cmdlets that allow interacting with, managing, and controlling Windows Server Data Deduplication.