Storage technologies have certainly evolved over the past several years and provide many powerful tools to allow the most efficient use of provisioned space. One of the technologies available in Windows Server is deduplication. Microsoft continues to add new capabilities to the deduplication feature with each Windows release.
By Brandon Lee

What benefits are provided by Windows Server deduplication? How is the feature added? How do you enable deduplication, check its status, and pause or stop the deduplication process? Let's take a look at Windows Server deduplication.

What is data deduplication in Windows Server?

When you store many files on a Windows Server volume, the same data blocks often appear in more than one file. This is especially true when the files on a volume are similar in content or structure. A departmental file server is a good example of how much duplicated data can accumulate: in a large file share, end users may store many copies of the same or similar files. These redundant copies reduce storage efficiency.

Instead of storing multiple copies of the same data, as traditional storage does, deduplication stores the data once and uses pointers to reference that single copy. As a result, the volume no longer holds duplicate information. Microsoft keeps improving the feature as well: starting with Windows Server 2019, Data Deduplication can deduplicate both NTFS and ReFS volumes, whereas Windows Server 2016 supported deduplication only on NTFS.

How does Windows Server data deduplication work?

Microsoft uses two principles to implement data deduplication in Windows Server:

  1. Post-processing: deduplication does not run inline as data is written. New data lands on the volume unoptimized, and the optimization job deduplicates it afterward, so deduplication does not slow down writes.
  2. Transparency: users and applications are unaware that they may be working with deduplicated data. Deduplicated files are accessed exactly like any other file.

To deduplicate data according to these principles, Windows Server uses the following process:

  1. The optimization job scans the file system for files that match the deduplication policy.
  2. The files are broken into chunks.
  3. Unique chunks are identified.
  4. The unique chunks are placed in the chunk store.
  5. Each optimized file's data is replaced with pointers to its chunks in the chunk store, so file reads are redirected to the appropriate chunks.
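
To make steps 2 through 5 concrete, here is a toy PowerShell model of the idea; it is not how Windows implements it. It assumes fixed-size chunks and SHA-256 hashes for simplicity, whereas Microsoft documents the real feature as using variable-size chunks. The function names New-ChunkStoreModel and Read-ChunkedFile are hypothetical, and step 1 (the policy scan) is left out.

```powershell
# Toy model only: fixed-size chunks and SHA-256 hashes are simplifying assumptions.
# New-ChunkStoreModel splits each file into chunks (step 2), keeps each unique chunk
# once (steps 3 and 4), and records an ordered list of chunk hashes per file (step 5).
function New-ChunkStoreModel {
    param(
        [Parameter(Mandatory)] [hashtable] $Files,          # file name -> byte[] content
        [ValidateRange(1, 1048576)] [int] $ChunkSize = 4
    )
    $sha = [System.Security.Cryptography.SHA256]::Create()
    $chunkStore = @{}   # chunk hash -> chunk bytes
    $pointers = @{}     # file name -> ordered chunk hashes
    foreach ($name in $Files.Keys) {
        $data = [byte[]] $Files[$name]
        $list = [System.Collections.Generic.List[string]]::new()
        for ($offset = 0; $offset -lt $data.Length; $offset += $ChunkSize) {
            $length = [Math]::Min($ChunkSize, $data.Length - $offset)
            $chunk = [byte[]]::new($length)
            [Array]::Copy($data, $offset, $chunk, 0, $length)
            $hash = [BitConverter]::ToString($sha.ComputeHash($chunk))
            # Store the chunk only the first time its hash is seen.
            if (-not $chunkStore.ContainsKey($hash)) { $chunkStore[$hash] = $chunk }
            $list.Add($hash)
        }
        $pointers[$name] = $list
    }
    $sha.Dispose()
    [pscustomobject]@{ ChunkStore = $chunkStore; Pointers = $pointers }
}

# Rebuilds a file's content by following its pointers into the chunk store.
function Read-ChunkedFile {
    param([Parameter(Mandatory)] $Model, [Parameter(Mandatory)] [string] $Name)
    $bytes = [System.Collections.Generic.List[byte]]::new()
    foreach ($hash in $Model.Pointers[$Name]) { $bytes.AddRange([byte[]] $Model.ChunkStore[$hash]) }
    , $bytes.ToArray()   # leading comma keeps the byte[] from being unrolled
}

# Example: two files that share their first eight bytes.
$files = @{
    'a.txt' = [System.Text.Encoding]::ASCII.GetBytes('AAAABBBBCCCC')
    'b.txt' = [System.Text.Encoding]::ASCII.GetBytes('AAAABBBBDDDD')
}
$model = New-ChunkStoreModel -Files $files -ChunkSize 4
$model.ChunkStore.Count   # 4 unique chunks are stored instead of 6
```

Reading b.txt back with Read-ChunkedFile follows its three pointers and returns the original bytes, even though two of those chunks are shared with a.txt.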

Strong use cases for data deduplication

Some workloads lend themselves especially well to data deduplication. Which workloads typically see the biggest benefits? Here they are, ordered from the largest typical space savings to the smallest.

  • 80–95% space savings: virtualization environments, especially VDI workloads and ISOs for deployment.
  • 70–80% space savings: deployment shares, which contain heavily duplicated software binaries, cab files, and similar files.
  • 50–60% space savings: general file shares, which can hold large repositories of files with a great deal of duplicated data.
  • 30–50% space savings: user documents, including common user files such as photos, music, and videos.
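
Those percentages translate directly into disk usage: the space data occupies after deduplication is roughly its original size multiplied by (100 - savings %) / 100. The hypothetical helper below only does that arithmetic; real savings depend on your data, so treat its output as a rough estimate.

```powershell
# Hypothetical helper: estimated on-disk size = logical size x (100 - savings %) / 100.
function Get-EstimatedDedupFootprint {
    param(
        [Parameter(Mandatory)] [double] $LogicalSizeGB,
        [Parameter(Mandatory)] [ValidateRange(0, 100)] [double] $SavingsPercent
    )
    $LogicalSizeGB * (100 - $SavingsPercent) / 100
}

# Example: 2,000 GB of VDI data at the low (80%) and high (95%) ends of the range above.
Get-EstimatedDedupFootprint -LogicalSizeGB 2000 -SavingsPercent 80   # 400
Get-EstimatedDedupFootprint -LogicalSizeGB 2000 -SavingsPercent 95   # 100
```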

Installing Windows Server Data Deduplication

Installing the Data Deduplication feature is straightforward. You can use Server Manager, Windows Admin Center, or PowerShell. Data Deduplication is part of the File and Storage Services role in Windows Server. Below is a screenshot from Windows Server 2019.

Installing the Data Deduplication File and Storage Services role in Windows Server 2019

Using PowerShell, you can install Data Deduplication with the following cmdlet:

Install-WindowsFeature -Name FS-Data-Deduplication
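
If you script server builds, you may want the install step to be safe to re-run. A minimal sketch, using a hypothetical Install-DedupFeature wrapper, checks the Installed property that Get-WindowsFeature reports and calls Install-WindowsFeature only when the feature is missing:

```powershell
# Hypothetical wrapper: installs Data Deduplication only if it is missing,
# then returns $true when the feature reports as installed.
function Install-DedupFeature {
    $feature = Get-WindowsFeature -Name FS-Data-Deduplication
    if (-not $feature.Installed) {
        Install-WindowsFeature -Name FS-Data-Deduplication | Out-Null
        # Query again so the return value reflects the state after installation.
        $feature = Get-WindowsFeature -Name FS-Data-Deduplication
    }
    $feature.Installed
}

# Example: Install-DedupFeature
```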

To install Data Deduplication in Windows Admin Center, open the Roles and Features menu and place a check next to Data Deduplication, which is listed under the File and Storage Services role.

Using Windows Admin Center to install Data Deduplication in Windows Server 2019

Enabling Data Deduplication on a Windows Server volume

Once you have installed Data Deduplication, enabling it on a volume is straightforward. In Server Manager, navigate to File and Storage Services > Volumes > Disks and click the disk. Then right-click the volume on that disk that you want to deduplicate and choose Configure Data Deduplication.

Enabling Data Deduplication for a Windows Server 2019 volume

Choose the type of files stored on the volume to be deduplicated
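
The same configuration can be done from PowerShell with Enable-DedupVolume. Its -UsageType parameter corresponds to the file-type choice in the wizard: Default for a general purpose file server, HyperV for VDI, and Backup for virtualized backup servers. The wrapper name and the E: volume in the example are placeholders.

```powershell
# Hypothetical wrapper: the PowerShell counterpart of the Server Manager steps.
# -UsageType mirrors the wizard's file-type choice (Default, HyperV, or Backup).
function Enable-DedupForVolume {
    param(
        [Parameter(Mandatory)] [string] $Volume,
        [ValidateSet('Default', 'HyperV', 'Backup')] [string] $UsageType = 'Default'
    )
    Enable-DedupVolume -Volume $Volume -UsageType $UsageType
}

# Example (E: is a placeholder volume):
# Enable-DedupForVolume -Volume 'E:' -UsageType HyperV
```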

Under Deduplication Settings, you can configure several options. These include:

  • The age of the files to be deduplicated
  • Custom file extensions to exclude
  • Custom excluded folders
  • Configuration of the deduplication schedule

Configuring Windows Server deduplication settings
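
Several of these settings are also exposed as parameters of Set-DedupVolume (-MinimumFileAgeDays, -ExcludeFileType, -ExcludeFolder). A minimal sketch, assuming a hypothetical Set-DedupVolumePolicy wrapper that forwards only the settings you actually specify; the volume, extensions, and folder in the example are placeholders.

```powershell
# Hypothetical wrapper around Set-DedupVolume: passes along only the settings
# supplied by the caller, so settings you leave out are not changed.
function Set-DedupVolumePolicy {
    param(
        [Parameter(Mandatory)] [string] $Volume,
        [uint32] $MinimumFileAgeDays,
        [string[]] $ExcludeFileType,
        [string[]] $ExcludeFolder
    )
    $settings = @{ Volume = $Volume }
    foreach ($name in 'MinimumFileAgeDays', 'ExcludeFileType', 'ExcludeFolder') {
        if ($PSBoundParameters.ContainsKey($name)) { $settings[$name] = $PSBoundParameters[$name] }
    }
    Set-DedupVolume @settings
}

# Example (placeholder values):
# Set-DedupVolumePolicy -Volume 'E:' -MinimumFileAgeDays 5 -ExcludeFileType 'tmp', 'bak' -ExcludeFolder 'E:\Scratch'
```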

The Deduplication Schedule settings let you customize the background process that runs data deduplication. You can further tailor the schedule and resource usage with the throughput optimization options, and you can define multiple schedules.

Setting the deduplication schedule in Windows Server 2019
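
Schedules can also be created in PowerShell with New-DedupSchedule, and Get-DedupSchedule lists the ones that exist. The sketch below adds one extra weekday-night optimization window; the wrapper name, schedule name, start time, and duration are illustrative assumptions rather than recommended values.

```powershell
# Hypothetical wrapper: adds a Monday-Friday optimization window that starts at
# $Start and runs for $DurationHours, alongside any existing schedules.
function Add-NightlyDedupSchedule {
    param(
        [string] $Name = 'NightlyOptimization',
        [datetime] $Start = '23:00',
        [uint32] $DurationHours = 6
    )
    $schedule = @{
        Name          = $Name
        Type          = 'Optimization'
        Start         = $Start
        DurationHours = $DurationHours
        Days          = 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday'
    }
    New-DedupSchedule @schedule
}

# Example: Add-NightlyDedupSchedule, then Get-DedupSchedule to review all schedules
```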

Running Data Deduplication Scheduled Tasks

You may wonder how the background tasks run. When you install Data Deduplication, Windows creates scheduled tasks that handle background optimization, garbage collection, and data scrubbing. You can also run these tasks manually. By default, the background optimization task repeats every hour indefinitely.

Windows Deduplication Scheduled Tasks in Windows Server 2019
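
To list these tasks from PowerShell, or start one on demand, the standard ScheduledTasks cmdlets work as usual. The default task path below is an assumption based on where the deduplication tasks appear in Task Scheduler, and both wrapper names are hypothetical; check the Get-ScheduledTask output on your own server.

```powershell
# Hypothetical wrappers; the default task path is an assumption, so adjust it
# if the deduplication tasks live elsewhere on your server.
function Get-DedupMaintenanceTask {
    param([string] $TaskPath = '\Microsoft\Windows\Deduplication\')
    Get-ScheduledTask -TaskPath $TaskPath | Select-Object TaskName, State
}

function Start-DedupMaintenanceTask {
    param(
        [Parameter(Mandatory)] [string] $TaskName,
        [string] $TaskPath = '\Microsoft\Windows\Deduplication\'
    )
    # Runs the task immediately; its regular schedule is unchanged.
    Start-ScheduledTask -TaskPath $TaskPath -TaskName $TaskName
}

# Example: Get-DedupMaintenanceTask, then Start-DedupMaintenanceTask -TaskName <a name from that list>
```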

Using PowerShell for status and management

PowerShell provides many great controls and options for interacting with Windows Server Data Deduplication. Let's take note of a few of these cmdlets. The Get-DedupStatus cmdlet displays the status of the deduplication operations and the deduplication percentage.

As you can see, at first, we have no space savings after Data Deduplication is installed and enabled. However, after the process begins to run, we start to see space savings on the volume.

Getting the status of Data Deduplication for a storage volume in Windows Server
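
For a compact report, you can combine Get-DedupStatus with Get-DedupVolume, which also reports saved space and a savings rate. The property names used here (OptimizedFilesCount, InPolicyFilesCount, SavedSpace, SavingsRate) are assumed from the cmdlets' output; confirm them with Get-DedupStatus | Format-List * and Get-DedupVolume | Format-List * before relying on this hypothetical Get-DedupSummary helper.

```powershell
# Hypothetical helper: one summary object per volume. Property names are
# assumptions; verify them against Format-List * output on your server.
function Get-DedupSummary {
    param([Parameter(Mandatory)] [string] $Volume)
    $status = Get-DedupStatus -Volume $Volume
    $info = Get-DedupVolume -Volume $Volume
    [pscustomobject]@{
        Volume         = $Volume
        OptimizedFiles = $status.OptimizedFilesCount
        InPolicyFiles  = $status.InPolicyFilesCount
        SavedSpaceGB   = [Math]::Round($info.SavedSpace / 1GB, 2)   # SavedSpace is in bytes
        SavingsRate    = $info.SavingsRate
    }
}

# Example: Get-DedupSummary -Volume 'E:'
```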

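If you want to turn off Data Deduplication and rehydrate the files that were already deduplicated, you can do this with a couple of PowerShell cmdlets. Before you start, make sure the volume has enough free space to hold all of its data in undeduplicated form; otherwise the Unoptimization job fails.

# Turn off deduplication for the volume (E: is an example; use your volume letter)
Disable-DedupVolume -Volume E:
# Rehydrate the deduplicated files back to their original, full-size form
Start-DedupJob -Type Unoptimization -Volume E: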
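
Unoptimization can take a while on a large volume. Get-DedupJob lists queued and running jobs, so a simple polling loop can report progress until the job finishes. This is a sketch: the Wait-DedupJobCompletion name and polling interval are hypothetical, and it assumes Get-DedupJob returns nothing once no jobs remain and exposes Type and Progress properties.

```powershell
# Hypothetical helper: polls Get-DedupJob until no jobs remain queued or running
# on the volume, printing each job's type and progress on every poll.
function Wait-DedupJobCompletion {
    param(
        [Parameter(Mandatory)] [string] $Volume,
        [ValidateRange(0, 3600)] [int] $PollSeconds = 60
    )
    do {
        $jobs = @(Get-DedupJob -Volume $Volume)
        foreach ($job in $jobs) {
            Write-Host "$($job.Type) job on ${Volume}: $($job.Progress)% complete"
        }
        if ($jobs.Count -gt 0) { Start-Sleep -Seconds $PollSeconds }
    } while ($jobs.Count -gt 0)
}

# Example: Wait-DedupJobCompletion -Volume 'E:' -PollSeconds 30
```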

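What other types of dedup jobs can you start from PowerShell? Besides Unoptimization, the -Type parameter of Start-DedupJob accepts Optimization, GarbageCollection, and Scrubbing.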

Looking at the Start-DedupJob -Type options in Windows Server 2019
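
As an illustration, the hypothetical helper below runs the Optimization, GarbageCollection, and Scrubbing job types back to back on one volume, using -Wait so each job finishes before the next one starts. The order is a reasonable choice for a manual maintenance pass, not a Microsoft-prescribed sequence. If you need to stop deduplication work that is already running, Stop-DedupJob cancels jobs on a volume.

```powershell
# Hypothetical helper: runs optimization, then garbage collection, then scrubbing,
# waiting for each job to complete before starting the next.
function Invoke-DedupFullPass {
    param([Parameter(Mandatory)] [string] $Volume)
    foreach ($type in 'Optimization', 'GarbageCollection', 'Scrubbing') {
        Start-DedupJob -Volume $Volume -Type $type -Wait
    }
}

# Example: Invoke-DedupFullPass -Volume 'E:'
# To cancel dedup jobs on the volume instead: Stop-DedupJob -Volume 'E:'
```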

Wrapping up

Windows Server Data Deduplication is a great way to reclaim storage space efficiently in your Windows Server environment. With each Windows release, the deduplication capabilities continue to improve. It provides tremendous space-saving benefits with specific workloads, especially for general file servers and VDI virtualization environments. For virtualization environments, space savings can be as much as 80–95%.

The Data Deduplication subcomponent of File and Storage Services is easy to add and enable on a specific storage volume. You can take advantage of many options to control the deduplication schedule, file types, and exclusions. PowerShell provides several cmdlets that allow interacting with, managing, and controlling Windows Server Data Deduplication.

© 4sysops 2006 - 2023
