The PowerShell filter resembles a function, but it allows you to process input from the pipeline faster and more efficiently.

Michael Pietroforte

Michael Pietroforte is the founder and editor in chief of 4sysops. He has more than 35 years of experience in IT management and system administration.

The syntax of a PowerShell filter looks very much like that of a PowerShell function:
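In its general form (Verb-Noun stands for whatever name you choose):

```powershell
filter Verb-Noun {
    # The statements here run once for every item that arrives from the pipeline
}
```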

Here is an example:
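A minimal sketch of such a filter (Write-Pipeline is a placeholder name) might simply pass each pipeline item through:

```powershell
filter Write-Pipeline {
    # $_ holds the current pipeline item
    $_
}
```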

In the example, the only difference from a common PowerShell function is the filter keyword. However, the more interesting difference is how a filter processes input from the pipeline. In the next example, we pipe an array with three elements to our filter:
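A self-contained sketch (Write-Pipeline is a placeholder name) could look like this:

```powershell
filter Write-Pipeline {
    $_
}

"one", "two", "three" | Write-Pipeline
```

The filter outputs each of the three strings on its own line.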

The automatic variable $_ is used to capture the input from the pipeline.

You can also pipe data to a function, but the syntax is different. One way of piping to a function is demonstrated in the next example:
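A sketch with a placeholder function name:

```powershell
function Write-Pipeline {
    # $Input collects everything that arrives from the pipeline
    $Input
}

"one", "two", "three" | Write-Pipeline
```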

Instead of $_, we used the automatic variable $Input to access the pipeline. However, even though the output in the two examples is the same, two very different things are happening here.

First of all, if you want to access different elements of the array in the function, you need to work with a loop:
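For example (Add-Suffix is a placeholder name):

```powershell
function Add-Suffix {
    # Loop over the collected pipeline input to reach the individual elements
    foreach ($item in $Input) {
        "$item processed"
    }
}

"one", "two", "three" | Add-Suffix
```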

In the function, we used a ForEach loop to cycle through the $Input variable. Things look a bit simpler in a filter:
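An equivalent filter (Add-Suffix is a placeholder name) gets by without the loop:

```powershell
filter Add-Suffix {
    "$_ processed"
}

"one", "two", "three" | Add-Suffix
```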

The filter doesn’t require an explicit ForEach loop because the loop functionality is already built in.

However, the main difference between the two examples here is the way the pipeline is processed. In the function, the ForEach loop starts running only after the entire array is stored in the $Input variable.

In contrast, the filter accesses the items in the pipeline one after the other as they arrive from the source. Therefore, the filter can already start producing output while the pipeline is still filled with new items. Because the array in the example is small, you don’t notice the difference; however, if you have to deal with large chunks of data, the difference is significant.

To get a feeling for the difference, try the following two scripts:
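The first script pipes the output of Get-ChildItem into a function that reads the whole pipeline through $Input (Show-FileName is a placeholder name):

```powershell
function Show-FileName {
    foreach ($file in $Input) {
        $file.Name
    }
}

Get-ChildItem C:\ -Recurse -ErrorAction SilentlyContinue | Show-FileName
```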

I added the -ErrorAction parameter here because, otherwise, PowerShell would complain that access to some folders is denied. After you launch the above script, nothing happens for quite a while until, suddenly, all the files on drive C: are displayed. You'll notice the difference from a filter if you run this example:
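A filter version of the same script (Show-FileName is a placeholder name):

```powershell
filter Show-FileName {
    $_.Name
}

Get-ChildItem C:\ -Recurse -ErrorAction SilentlyContinue | Show-FileName
```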

The filter instantly produces output. This example demonstrates that, in some cases, a filter can handle large amounts of data much faster than a function can because you can continue processing the filter output right away.

Your code will be even faster if you can end the loop when the data in the pipeline meets a certain condition. For instance, if you are searching for a certain file, you can terminate the filter as soon as the file is found, like this:
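A sketch of this technique; the file name example.txt and the filter name are placeholders. Note that break aborts not just the filter but the entire pipeline, which is exactly what we want here:

```powershell
filter Find-File {
    if ($_.Name -eq "example.txt") {
        $_.FullName
        break   # stops the entire pipeline once the file is found
    }
}

Get-ChildItem C:\ -Recurse -ErrorAction SilentlyContinue | Find-File
```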

However, speed is not the only issue here. Another problem with piping to a function is that the $Input variable has to store all the data that comes through the pipeline. This can consume a lot of memory on your computer. Consider the following example:
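A sketch of such a function; the path C:\Temp\big.iso stands in for any multi-gigabyte file on your machine:

```powershell
function Show-Content {
    # $Input has to buffer the entire file contents before anything is output
    $Input
}

Get-Content C:\Temp\big.iso | Show-Content
```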

I piped a 3GB ISO file to the function. When you run this code, you won't see any output at first, but if you check the Task Manager, you'll notice that PowerShell quickly eats up all the memory on your computer. Even though the file is only 3GB, the 8GB of RAM on my test machine were fully occupied within a couple of seconds. After that, Windows started swapping to disk furiously, which means the computer was essentially dead. This is how you can easily crash a server.

PowerShell function depletes memory.

Now, if we do the same thing with a filter, PowerShell will make more careful use of our resources:
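A filter version; the path C:\Temp\big.iso again stands in for any large file:

```powershell
filter Show-Content {
    # Each chunk of the file is passed on as soon as it arrives
    $_
}

Get-Content C:\Temp\big.iso | Show-Content
```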

Not only will you immediately see the contents of the file, but the RAM consumption will be relatively moderate as well.

This makes functions look like a bad choice for processing pipeline data. However, you can use PowerShell functions as filters. The following function will do exactly the same thing as the filter above:
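A sketch (Show-Content is a placeholder name, and the path is a stand-in for any large file); each pipeline item lands in $_ inside the process block, just as it does in a filter:

```powershell
function Show-Content {
    process {
        $_
    }
}

Get-Content C:\Temp\big.iso | Show-Content
```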

By using the Process keyword, you can make a function behave just like a filter. When I was talking about functions in the text above, I was actually referring to functions where the $Input variable is used to access the pipeline.

The fact that functions can be used as filters makes the filter keyword obsolete, which means that Microsoft could remove it from a future PowerShell version. I would therefore avoid the keyword, even though using a function as a filter requires a little more typing. Another reason for avoiding the filter keyword is that not every admin is familiar with it. Thus, your scripts become easier to read if you just stick with the more common function keyword.

You are probably wondering why PowerShell functions offer the option to read data from the pipeline through the $Input variable if doing so can cause performance issues. The reason is that scenarios exist where you have to process the entire output of a cmdlet with a comparably complex algorithm. Then, you might prefer to store the entire data set from the pipeline in an array so you can leverage PowerShell’s array manipulation features.

However, I think that, in the vast majority of cases when you work with the pipeline, you would just want to access a subset of the data in the stream. Or, in other words, you’d want to filter the output of a cmdlet. In all those cases, I would use a function as a filter and avoid using the $Input variable.

Actually, the main point about the pipeline is that it allows you to process data items without storing the entire data set in your precious memory. And it is the nature of functions that you can’t really know in advance what kind of data you will process with them in the future. Hence, if you use a function like a filter, you will always be on the safe side.

PowerShell comes with mighty features that allow you to filter data for relatively complex tasks. The example below compares the files of the folders you pass to the function with the current folder to find all files with duplicate names. If you use the -Recurse parameter in the Get-ChildItem cmdlet, the function will compare your current folder with the passed folder and all its subfolders.
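A sketch of such a function, assuming we compare by file name only; the function name and the folder path D:\Backup are placeholders:

```powershell
function Find-DuplicateName {
    begin {
        # Runs once before the pipeline: remember the file names of the current folder
        $localNames = (Get-ChildItem -File).Name
    }
    process {
        # Runs once per pipeline item: $_ is one of the folders you passed in
        # Add -Recurse to Get-ChildItem to include the folder's subfolders
        Get-ChildItem -Path $_ -File |
            Where-Object { $localNames -contains $_.Name }
    }
}

Get-Item D:\Backup | Find-DuplicateName
```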

The script block after the Begin keyword is executed before the function starts to read the pipeline in the Process block. In the example, I used the Begin block to store the file names of the current folder in a variable. This improves the performance of the script because, if you load the file names in the Process block, PowerShell will read the folder again every time it processes an item from the pipeline. As mentioned above, the Process block works essentially like a ForEach loop.

In the End block, you can execute commands after the function has read the entire pipeline. Note that the Begin and End keywords are required if you want to run commands before or after the Process block.

