Filter file downloads from AWS S3 with PowerShell

In this article, I want to talk about filtering and downloading files from an Amazon Web Services (AWS) Simple Storage Service (S3) bucket. A recent job requirement called for downloading files uploaded to an S3 bucket within a certain time range. I want to share two of the functions from the module I wrote and look at how they work together.

Recently I received a work assignment to automate downloading files uploaded to an S3 bucket within a four-hour window. I deployed the finished solution as a scheduled task running a wrapper script that calls functions from the module. I'm going to share two of those functions, Get-AWSFilesByDate and Invoke-AWSFileDownload. The functions in this article are compatible with PowerShell Core using the AWSPowerShell.NetCore module (the AWS Tools for PowerShell Core). To install the module, type:
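# Installs the AWS Tools for PowerShell Core from the PowerShell Gallery
Install-Module -Name AWSPowerShell.NetCore -Scope CurrentUser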

I'm going to start with the Get-AWSFilesByDate function. Both functions are actually filters. With a function, you get the Begin, Process, and End blocks. With a filter, the entire body acts as the process block, so it accepts input from the pipeline without you having to define a process block yourself. See Michael Pietroforte's excellent article for more details.
This is the filter in full:
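Since the original listing is reconstructed here from the walkthrough below, treat it as a sketch; the parameter names and the default date filter are assumptions:

filter Get-AWSFilesByDate {
    param (
        [Parameter(Mandatory)]
        [string]$BucketName,

        [Parameter(Mandatory)]
        [string]$Prefix,

        # Default value reconstructed from the date discussion later on
        [scriptblock]$Filter = {
            $_.LastModified -le (Get-Date) -and
            $_.LastModified -ge (Get-Date).AddHours(-4)
        }
    )

    # Proceed only if the bucket actually exists
    if (Test-S3Bucket -BucketName $BucketName) {
        Get-S3Bucket -BucketName $BucketName |
            Get-S3Object -KeyPrefix $Prefix |
            Where-Object { [System.IO.Path]::GetExtension($_.Key) } |
            Where-Object -FilterScript $Filter
    }
}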

The filter takes three parameters: the bucket name, the prefix (the path inside the bucket), and a filter of the scriptblock type. To show how the code works, I'm going to break it down and build it back up, showing how it all fits together.
The first line in the code tests whether the bucket actually exists before proceeding. Test-S3Bucket takes the name of the bucket as a string and returns a Boolean result.
The next four lines of code pipe into one another. The first step gets an instance of the bucket:
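# 'mybucket' is a placeholder for the bucket name
Get-S3Bucket -BucketName 'mybucket'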

Getting the S3 bucket with the AWS cmdlet

To add the second part of the pipeline, Get-S3Object lists your Amazon S3 objects from the bucket:
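# Placeholder bucket name; 'Testing' is the prefix used later in the article
Get-S3Bucket -BucketName 'mybucket' | Get-S3Object -KeyPrefix 'Testing'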

Without file filtering on the S3 bucket

Get-S3Object has returned two objects, but I only need file information and the ability to filter on the file date and time.
The Key property of the objects returned to the pipeline is of the string type. I can check this by using the GetType() method on the object:
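# Placeholder bucket name; inspect the type of the first object's Key property
$Objects = Get-S3Bucket -BucketName 'mybucket' | Get-S3Object -KeyPrefix 'Testing'
$Objects[0].Key.GetType()   # System.String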

Type of key object returned from the S3 object

To filter out the files, I use the .NET static method GetExtension from the System.IO.Path class. Checking the overload of the method, I can see it takes a string:
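# Typing the static method without parentheses lists its overload definitions,
# e.g., static string GetExtension(string path)
[System.IO.Path]::GetExtension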

Overload methods for GetExtension
Adding the .NET method to a Where clause filters for keys that have a file extension:
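# Keep only the objects whose key has an extension (i.e., files)
Get-S3Bucket -BucketName 'mybucket' | Get-S3Object -KeyPrefix 'Testing' |
    Where-Object { [System.IO.Path]::GetExtension($_.Key) }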

File filtering on the S3 bucket with a Where clause

The final part of this filter is to be able to filter out files within a certain date and time range. Using Get-Member, I can see the object type returned and the properties:
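# Inspect the type and properties of the returned objects
Get-S3Bucket -BucketName 'mybucket' | Get-S3Object -KeyPrefix 'Testing' |
    Where-Object { [System.IO.Path]::GetExtension($_.Key) } | Get-Member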

Objects returned from the filter with Get-Member

From the returned objects, I see the LastModified property uses the datetime type. I've hardcoded a default filter value in the filter, but since the filter is a parameter, I can change this value when calling it.
Here is the filter for the Where clause:
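# Reconstructed scriptblock: keep files modified within the last four hours
{ $_.LastModified -le (Get-Date) -and $_.LastModified -ge (Get-Date).AddHours(-4) }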

The first part compares the LastModified property against the current datetime, and the second part, after the -and operator, against the current time minus four hours. This allows my search to look for files written to the S3 bucket within the last four hours.
Let's switch our attention to the download function.
The Invoke-AWSFileDownload filter takes the files that Get-AWSFilesByDate outputs. Note the type of the objects Get-AWSFilesByDate returns to the pipeline:
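# Placeholder names again; the parameter names are the assumed ones from above
Get-AWSFilesByDate -BucketName 'mybucket' -Prefix 'Testing' | Get-Member
# TypeName: Amazon.S3.Model.S3Object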

Object type returned from the AWSFilesByDate filter

We can see the object returned for each file found in the S3 bucket is of the Amazon.S3.Model.S3Object type. The Invoke-AWSFileDownload parameter AWSFiles uses this type, which accepts pipeline input. Here is the filter in full:
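Again, this is a sketch reconstructed from the walkthrough; the variable names inside the foreach block are assumptions:

filter Invoke-AWSFileDownload {
    param (
        [Parameter(Mandatory, ValueFromPipeline)]
        [Amazon.S3.Model.S3Object[]]$AWSFiles,

        [Parameter(Mandatory)]
        [string]$Destination
    )

    foreach ($File in $AWSFiles) {
        # Build the Read-S3Object parameters; split the key on '/' so only
        # the file name, not the bucket prefix, lands in the destination
        $Download = @{
            BucketName = $File.BucketName
            Key        = $File.Key
            File       = '{0}\{1}' -f $Destination, (($File.Key -split '/')[-1])
        }
        Read-S3Object @Download
    }
}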
The filter takes two parameters: AWSFiles, which receives the files Get-AWSFilesByDate returns through the pipeline, and Destination. The string type defined for Destination reflects the parameter's use as the download location.
Each file passes through a foreach block, which builds a download variable holding the parameters for the AWS cmdlet Read-S3Object. The value for the File parameter is a string formatted from the destination directory and the key, which is the file. The key needs splitting because it includes the S3 bucket prefix. A demonstration will make this a bit clearer. The following captures the object information that Get-AWSFilesByDate returns in a variable:
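# Placeholder bucket name; 'Testing' is the prefix from earlier
$File = Get-AWSFilesByDate -BucketName 'mybucket' -Prefix 'Testing'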

I only have a single file in my S3 bucket, so I'm working with a single object in this example.
Now that we have our variable, take a look at the key value:
$File.Key

Key information returned

The prefix, if you remember from earlier, is Testing. We'll need to remove this since the file name is the only part we want. First, we are just going to split the value by the forward slash symbol:
$File.Key -split '/'

Split without indexing

In the above example, the split returns only two values, but a longer prefix path would produce more. The split returns an object of the Array type (a string array in this case). To get the last element regardless of how many values the array holds, use index -1:
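# Index -1 returns the last element of the array, i.e., the file name
($File.Key -split '/')[-1]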

Splitting of the key
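
Inside the filter's foreach block, the download variable collects the Read-S3Object parameters, reusing this split (a sketch with the assumed names from the filter above):

# $File and $Destination are bound inside the filter
$Download = @{
    BucketName = $File.BucketName
    Key        = $File.Key
    File       = '{0}\{1}' -f $Destination, (($File.Key -split '/')[-1])
}
Read-S3Object @Download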

This splats the download variable (created for each file processed) to the AWS cmdlet Read-S3Object. As the AWS documentation for the Read-S3Object cmdlet states, it "Downloads one or more objects from an S3 bucket to the local file system."
The final working of the two filters together looks like this:
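# Example invocation; bucket name, prefix, and destination are placeholders
Get-AWSFilesByDate -BucketName 'mybucket' -Prefix 'Testing' |
    Invoke-AWSFileDownload -Destination 'C:\Temp'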

The two filters we have covered in this article really show the value of the PowerShell pipeline. Binding objects by type as they pass between functions and filters lets us chain pipelines together to perform tasks. I've also demonstrated how viewing object properties helped create the filter for files in S3.
I have uploaded the files in this article to a GitHub repository.
