PowerShell’s Invoke-WebRequest is a powerful cmdlet that allows you to download, parse, and scrape web pages.

Michael Pietroforte

Michael Pietroforte is the founder and editor of 4sysops. He is a Microsoft Most Valuable Professional (MVP) with more than 30 years of experience in IT management and system administration.

In a previous post, I outlined the options you have to download files with different Internet protocols. You can use Invoke-WebRequest to download files from the web via HTTP and HTTPS. However, the cmdlet enables you to do much more than just download files; you can use it to analyze the contents of web pages and use the information in your scripts.

The HtmlWebResponseObject object

If you pass a URI to Invoke-WebRequest, it won’t just display the HTML code of the web page. Instead, it will show you formatted output of various properties of the corresponding web request. For example:
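Something like the following, with any reachable URL (the 4sysops home page is just an example):

```powershell
# Download a page and store the response object in a variable
$WebResponse = Invoke-WebRequest -Uri "https://4sysops.com"

# Displaying the variable shows the formatted properties of the web request
$WebResponse
```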

Storing HtmlWebResponseObject in a variable


Like most cmdlets, Invoke-WebRequest returns an object. If you execute the object’s GetType method, you will learn that the object is of the type HtmlWebResponseObject.

As usual, you can pipe the object to Get-Member to get an overview of the object’s properties:
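For example (the URL is again just a stand-in):

```powershell
$WebResponse = Invoke-WebRequest -Uri "https://4sysops.com"

# Reveal the type of the returned object
$WebResponse.GetType().FullName

# Get an overview of the object's properties
$WebResponse | Get-Member -MemberType Property
```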

Parse an HTML page

Properties such as Links or ParsedHtml indicate that the main purpose of the cmdlet is to parse web pages. If you just want to access the plain content of the downloaded page, you can do so through the Content property:
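For instance, assuming the response from above is stored in $WebResponse:

```powershell
$WebResponse = Invoke-WebRequest -Uri "https://4sysops.com"

# The page's HTML source, without the HTTP headers
$WebResponse.Content
```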

There is also a RawContent property, which additionally includes the HTTP header fields that the web server returned. Of course, you can also read only the HTTP header fields:
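For example:

```powershell
$WebResponse = Invoke-WebRequest -Uri "https://4sysops.com"

# All HTTP response header fields as a dictionary
$WebResponse.Headers

# Read a single header field
$WebResponse.Headers["Content-Type"]
```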

Headers of a web request


It may also be useful to have easy access to the HTTP response status codes and their descriptions:
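Both are available as properties of the response object:

```powershell
$WebResponse = Invoke-WebRequest -Uri "https://4sysops.com"

$WebResponse.StatusCode         # numeric code, such as 200
$WebResponse.StatusDescription  # text, such as "OK"
```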

The Links property is an array of objects that contain all the hyperlinks in the web page. The most interesting properties of a link object are innerHTML, innerText, outerHTML, and href.

The URL that the hyperlink points to is stored in href. To get a list of all links in the web page, you could use this command:
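For example:

```powershell
$WebResponse = Invoke-WebRequest -Uri "https://4sysops.com"

# The target URLs of all hyperlinks in the page
$WebResponse.Links | Select-Object -ExpandProperty href
```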

Displaying a web page’s links


outerHTML refers to the entire link as it appears together with the <a> tag: <a href="http://contoso.com">Contoso</a>. Of course, other elements can appear here, such as additional attributes of the <a> element or additional HTML elements after the start tag (<a>), such as image tags. In contrast, the innerHTML property only stores the content between the start tag and the end tag (</a>) together with enclosed additional HTML elements.

The innerText property strips all HTML code from the innerHTML property. You can use this property to read the anchor text of a hyperlink. However, if additional HTML elements exist inside the <a> element, you will get their text as well.

Note that the Link object also has an outerText property, but its contents will always be identical to the innerText property if you read a web page. The difference between outerText and innerText only matters if you write HTML code, which we don’t do here.
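Inspecting a single element of the Links array makes the differences between these properties tangible:

```powershell
$WebResponse = Invoke-WebRequest -Uri "https://4sysops.com"

# Compare the link properties for the first hyperlink in the page
$WebResponse.Links[0] | Select-Object innerHTML, innerText, outerHTML, href
```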

The Images property can be handled in a similar way to the Links property. It does not, of course, contain the images themselves. Instead, it stores objects with properties that contain the HTML code that refers to the images. The most interesting properties are width, height, alt, and src. If you know a little HTML, you will know how to deal with these attributes.

The following example downloads all images from a web page:
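A minimal sketch (the URL is only an example; note that src attributes containing relative URLs would first need to be prefixed with the site's base URL):

```powershell
$WebResponse = Invoke-WebRequest -Uri "https://4sysops.com"
ForEach ($Image in $WebResponse.Images) {
    # Get the file name from the URL in the src attribute
    $FileName = Split-Path $Image.src -Leaf
    # Save the image to the current folder
    Invoke-WebRequest -Uri $Image.src -OutFile $FileName
}
```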

$WebResponse.Images stores an array of image objects from which we extract the src attribute of the <img> element, which refers to the location of the image. With the help of the Split-Path cmdlet, we get the file name from the URL, which we use to store the image in the current folder.

The properties that you see when you pipe an HtmlWebResponseObject object to Get-Member are those that you need most often when you have to parse an HTML page. If you are looking for other HTML elements, you can use the AllElements and ParsedHtml properties.

AllElements (you guessed it already) contains all the HTML elements that the page contains:
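For example:

```powershell
$WebResponse = Invoke-WebRequest -Uri "https://4sysops.com"

# Every HTML element of the page as an object
$WebResponse.AllElements
```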

Of course, this also includes <a> and <img> elements, which means that you can also access them through the AllElements property. For instance, the command below, which displays all the links in a web page, is a somewhat more long-winded alternative to $WebResponse.Links:
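For instance:

```powershell
$WebResponse = Invoke-WebRequest -Uri "https://4sysops.com"

# A more long-winded alternative to $WebResponse.Links
$WebResponse.AllElements | Where-Object {$_.TagName -eq "a"} | Select-Object -ExpandProperty href
```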

ParsedHtml gives you access to the Document Object Model (DOM) of the web page. One difference from AllElements is that ParsedHtml also includes empty attributes of HTML elements. More interesting is that you can easily retrieve additional information about the web page. For example, the following command tells you when the page was last modified:
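For example:

```powershell
$WebResponse = Invoke-WebRequest -Uri "https://4sysops.com"

# Read the date of the last modification from the DOM
$WebResponse.ParsedHtml.lastModified
```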

Determining when a web page was last modified


Submit an HTML form

Invoke-WebRequest also allows you to fill out form fields. Many websites use the HTTP method GET for forms, in which case you simply have to submit a URL that contains the form field entries. If you use a web browser to submit a form, you usually see how the URL is constructed. For instance, the next command searches for PowerShell on 4sysops:
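The parameter name used here is "s", WordPress's default for search; other sites use different names:

```powershell
# The form field entry travels in the URL's query string
$WebResponse = Invoke-WebRequest -Uri "https://4sysops.com/?s=PowerShell"
```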

If the website uses the POST method, things get a bit more complicated. The first thing you have to do is find out which method is used by displaying the forms objects:
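For example:

```powershell
$WebResponse = Invoke-WebRequest -Uri "https://4sysops.com"

# Show the page's forms together with their methods, actions, and fields
$WebResponse.Forms
```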

Displaying the forms in a web page


A web page sometimes has multiple forms using different methods. Usually you recognize the form you need by inspecting the Fields column. If the column is cut off, you can display all the form fields with this command:
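For instance:

```powershell
$WebResponse = Invoke-WebRequest -Uri "https://4sysops.com"

# Display all fields of the first form
$WebResponse.Forms[0].Fields
```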

Let’s have a look at a more concrete example. Our goal is to scrape the country code of a particular IP address from a Whois website. We first have to find out how the form field is structured. Because we are working on the PowerShell console, it is okay to use the alias of Invoke-WebRequest:
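For example:

```powershell
# iwr is a built-in alias of Invoke-WebRequest
(iwr "https://who.is").Forms | Format-List
```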

Determining the form field of a Whois website


We see that the website uses the POST method, that the URL to be called to process the query is https://who.is/domains/search, and that two form fields are required. The default value of the Search_type field is “Whois” and the query field is most likely the field for the IP address. We are now ready to scrape the country code of the IP address from the result page:
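A sketch of the query (the example IP address 8.8.8.8 and the exact spelling of the field names are assumptions based on the form fields described above):

```powershell
# Hash table with the two form fields and the values to submit
$Fields = @{"search_type" = "Whois"; "query" = "8.8.8.8"}
$WebResponse = Invoke-WebRequest -Uri "https://who.is/domains/search" -Method Post -Body $Fields
# The result page returns the Whois record within a <pre> element
$Pre = ($WebResponse.AllElements | Where-Object {$_.TagName -eq "pre"}).innerText
If ($Pre -match "Country:\s+(\w{2})") {
    Write-Host "Country code:" $Matches[1]
}
```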

Update: The example no longer works because the web page now uses a different form field. You can use this field variable instead:

In the first line, we define a hash table that contains the names of our two form fields and the values we want to submit. In line 2, we store the result page of the query in a variable. The web page returns the result within a <pre> element, and we extract its content in the next line.

We then use the -match operator with a regular expression to search for the country code. "\s+" matches any whitespace character, and "\w{2}" is supposed to match the country code, which consists of two characters. The parentheses group the country code, which allows us to access the result through the automatic variable $Matches.



Comments

  1. Kris 2 years ago

    Good stuff. Note that I got a message asking me to accept cookies every time I tried to do anything with the page contents (e.g. searching for a tag). I got round this by using -UseBasicParsing on Invoke-WebRequest, which uses PowerShell's built-in parser rather than Internet Explorer's.

    I used this to build a proof of concept to download Dilbert strips from the archive - download a page, find the appropriate image tag, download that image, add 1 to the date and do the same. Obviously not using it to download en masse, probably get blocked for that but very pleased it worked 🙂

  2. Michael Pietroforte 2 years ago

    Kris, thanks. I think I saw the cookie request only once. Maybe this is an IE setting? As to downloading en masse, you have no idea how many crawlers are out there and it is really hard to block them. Every minute or so another crawler hits 4sysops.

  3. Schorschi 1 year ago

    If not a webpage, but a file for download, how would you get the file information without actually downloading the file contents?  Web response method will actually pull the entire file, in effect downloading or reading the file in total when all that is desired is just the file information, like size of the file.

    • Michael Pietroforte (Author)

      The file properties are stored in the filesystem on the host. Web servers usually don't transmit this information. So if you want to read the file metadata without downloading the file, you need an API on the host that offers this data.

      If the remote host is a Windows machine you can use PowerShell remoting to read the file size:

      invoke-command -computername RemoteComputerName -scriptblock {(get-item c:\windows\notepad.exe).length}

  4. Caroline 1 year ago

    Thanks for the informative article on Invoke-WebRequest. Just what I was looking for.

  5. Oleg 10 months ago

    I need to download a file from https://raw.githubusercontent.com/h5bp/html5-boilerplate/master/src/index.html and modify it (for example add some tags... bootstrapGridSystem.css). In PowerShell it looks like:
    $results = irm -uri "https://raw.githubusercontent.com/h5bp/html5-boilerplate/master/src/index.html"
    $html = $results.ParsedHtml
    But how can I modify the object? Is it possible to modify it?
    For example, add <link href="bootstrapGridSystem.css"> and <link href="foundationGridSystem">
    Because this code:
    $linkBoot = "css/bootstrapGridSystem.css"
    $headTag.appendChild("link") didn't modify the object $results.content?

  6. tejanagios 10 months ago


    I am using your script and leveraging it to download image files from a list of URLs. The script loops through each URL, invokes a web request, and downloads the images from it. The problem I am facing is that the images are downloaded at 320x240 by default, whereas on the actual site the image, when opened in a new tab and downloaded via right-click, gives me a 960x720 file, which is what I am after.

    here is the script.


    $url = Get-Content "urls.txt"

    $j = $url.count

    for ($i = 0; $i -lt $j; $i++) {

        $WebResponse = Invoke-WebRequest -Uri $url[$i]

        ForEach ($Image in $WebResponse.Images) {

            $FileName = Split-Path $Image.src -Leaf

            $d = Invoke-WebRequest $Image.src
        }
    }




    • Michael Pietroforte (Author) 10 months ago

      The problem is that the src attribute of the image tag only points to the image that you see on the web page. The URL of the image that is displayed when you click an image is in an <a> tag before the image tag. Thus, you have to retrieve all the links in the web page (as explained in the article) and then get all the URLs that point to images. Those URLs have image extensions such as .jpg or .png. You could work with a regular expression to sort out these URLs.

  7. teja 10 months ago

    Thank you, after looping through all the URLs I've got the final output.

    Here is the working code, although it can be improved:

    $source = Invoke-WebRequest -Uri "<enter URL here>" `
    | Select-Object -ExpandProperty links | Select-Object -ExpandProperty href | Select-String "part"

    $j = $source.count

    for ($k = 0; $k -lt $j; $k++) {

        #write-host $source[$k].Line

        $links = Invoke-WebRequest -Uri $source[$k].Line `
        | Select-Object -ExpandProperty links | Select-Object -ExpandProperty href | Select-String ".PNG"

        foreach ($link in $links) {

            $filename = Split-Path $link.line -Leaf

            Invoke-WebRequest -Uri $link.Line -OutFile "C:\users\admin\Desktop\images\$k$filename"
        }
    }



    • Michael Pietroforte (Author) 10 months ago

      Thanks for sharing. If the web page not only contains links to PNGs but also to JPGs, you could use this: 
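      Something along these lines (a reconstruction, since the original snippet was not preserved) matches both extensions case-insensitively:

      ```powershell
      # Keep only links whose URL ends in .png or .jpg/.jpeg
      $hrefs = "a.PNG", "b.jpg", "c.gif"
      $hrefs | Select-String -Pattern "\.(png|jpe?g)$"
      ```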

  8. ron 6 months ago

    Does this still work?  Perhaps the website has changed its search/query methods. I am not seeing a <pre> tag.

    • Michael Pietroforte (Author) 6 months ago

      The code no longer works because they changed the form field. This should work now:

      $Fields = @{"searchString" = ""}

      $WebResponse = Invoke-WebRequest -Uri "https://who.is/domains/search" -Method Post -Body $Fields

      $Pre = $WebResponse.AllElements | Where {$_.TagName -eq "pre"}

      If ($Pre -match "Country:\s+(\w{2})") {

          Write-Host "Country code:" $Matches[1]
      }

  9. Steve Giovanni 1 week ago

    I was trying to use this on a website I visit to see a list they post there weekly.  There is no RSS feed or anything so you have to manually go to the site. I thought it would be fun to automate scraping the weekly options and emailing them to myself, which is where your article came in very handy, thank you!

    The problem is that while I can pull back the URL, it looks like they are embedding the stuff I want not in the actual page but are pulling it in from a frame (I think).

    I'm decent at basic PowerShell scripting but haven't looked at HTML since the late 90s. Any ideas?  I also tried RawContent and AllElements to no avail.

    • Michael Pietroforte (Author)

      If it is an iframe, you can just load the iframe's URL. In most browsers, you can right-click the element in the web page and then click "Inspect." You should then be able to see the URL where the content that interests you is coming from.

  10. Steve Giovanni 1 week ago

    I tried to Inspect Element and this is what I see:

    <div style="left: 496px; width: 475px; position: absolute; top: 98px;" class="txtNew" id="WRchTxtd-17bd" data-reactid=".0.$SITE_ROOT.$desktop_siteRoot.$PAGES_CONTAINER.1.1.$SITE_PAGES.$c1a73_DESKTOP.1.$WRchTxtd-17bd"><p class="font_8" style="font-size:28px; text-align:center;">

    Not much help there from what I can discern so I guess my question is:  is there a way to tell PowerShell to just download/render the page as a browser would then I can parse it from there?

    • Michael Pietroforte (Author)

      I guess the div box is filled by JavaScript. Where should PowerShell render the page? In the console? And why would that help with parsing? PowerShell creates objects of the HTML elements in the web page. However, PowerShell doesn't understand JavaScript. I suppose you are better off with a web scraping tool that has a GUI.

  11. Steve Giovanni 5 days ago

    Michael thank you for your reply.  I was able to get a bit further, but it still isn't working correctly for some reason.  Would you mind taking a look and letting me know if you see what I'm doing wrong?

    $site = Invoke-WebRequest -Uri "https://www.localfarefarmbagsouth.com/about_us"
    ($site.ParsedHtml.getElementsByTagName('p') | Where {$_.className -eq 'font_8'}).innerText

    • Michael Pietroforte (Author)

      Try this:

      You will see there is no HTML between the body tags. This is all JavaScript. You need a scraping tool with an engine that can execute JavaScript.

  12. Steve Giovanni 5 days ago

    Understood, thanks!



© 4sysops 2006 - 2017
