Too often, IT administrators rely purely on ticket submissions to determine whether Windows clients work reliably. Here I want to show you how to use PowerShell to monitor system stability on multiple computers by tracking reliability indicators such as application crashes, hanging applications, and blue screens of death (BSODs).

With the little PowerShell script discussed below, you can remotely retrieve reliability information and visualize the data with the free PowerBI Desktop tool.

Visualizing system stability with PowerBI Desktop

However, before I detail the PowerShell solution, let's look at the method that admins typically use when they want to monitor the stability of a Windows computer. This will help us to understand what kind of reliability data is available.

Reliability Monitor

Reliability Monitor is a handy little tool built into Windows since Vista. The tool contains a whole lot of helpful information when it comes to troubleshooting a Windows computer. It can be a bit overwhelming when you look at it the first time. I will break it down here for you briefly.

Reliability Monitor

Blue line across the top (top row): This is your system stability index. It is basically a scoring system based on how often your computer experiences failures. The score ranges from 1 to 10. The more often your computer fails, the lower your score. The longer you go without a system or application failure, the higher your score will be.

Application failures: Every time you have an application failure, which can be an application crashing or hanging, it will show a red "x" in that column.

Windows failures: This column will get a red "x" when you have a BSOD.

Miscellaneous failures: These occur when the system unexpectedly loses power, for example when someone forces a shutdown with the power button or the battery runs out completely.

Warnings: These do not impact your stability score but provide good information. They will show when an application installation/removal, Windows Update, or driver update was unsuccessful.

Information: This column will get a blue "i" when there are system changes you should be aware of. Driver installation, Windows Updates, and software installations will all appear in this column. This information can be very handy when troubleshooting what caused a failure.

Clicking on any of the columns will give you more detailed information about the abovementioned events. This information is great. The problem is that I cannot remotely log in to every computer in an enterprise environment and check every one of these PCs. I could try to figure out the events that trigger these reliability records and pull them in, but I would have to recreate the scoring system.

The solution? Read on.

Win32_ReliabilityStabilityMetrics, Win32_ReliabilityRecords

A couple of WMI classes store all the scorings and records discussed above. You will want to collect the following properties from the WMI classes:

Win32_ReliabilityStabilityMetrics

  • TimeGenerated: The system calculates the stability index score every hour the computer is on and will record the associated timestamp in this property
  • SystemStabilityIndex: The stability index score calculated at that time

Win32_ReliabilityRecords

  • EventIdentifier: The ID for the event in the Windows Event Log
  • Message: The body of the Windows event associated with the failure or change
  • ProductName: The product name or executable associated with the failure
  • SourceName: This designates what type of event we are looking at and will always be one of the following:
    • Application Error: Application stops responding and crashes
    • Application Hang: Application stops responding but recovers
    • Application-Add-On-Event-Provider: Add-ons were enabled for Internet Explorer
    • EventLog: The only event I have seen from this source is "The system was shut down unexpectedly"
    • Microsoft-Windows-Setup: Occurs when Windows is first installed
    • Microsoft-Windows-StartupRepair: Windows failed to boot and a startup repair was attempted
    • Microsoft-Windows-UserPnp: Driver-related events
    • Microsoft-Windows-WER-SystemErrorReporting: Blue screen of death
    • Microsoft-Windows-WindowsUpdateClient: Windows Updates
    • MsiInstaller: Application installations and removals
  • TimeGenerated: See above
  • User: The user account active during the event
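
The SourceName values listed above can be rolled up into the Reliability Monitor columns described earlier. The following is only a minimal sketch: the function name Get-ReliabilityCategory and the sample objects are hypothetical, and the mapping is an assumption drawn from the column descriptions in this article, not from Microsoft documentation.

```powershell
# Hypothetical helper: maps a SourceName to the Reliability Monitor column
# it most likely belongs to. The mapping is an assumption based on the
# descriptions in this article.
function Get-ReliabilityCategory {
    param([Parameter(Mandatory)][string]$SourceName)

    switch ($SourceName) {
        'Application Error'                          { 'Application failure'; break }
        'Application Hang'                           { 'Application failure'; break }
        'Microsoft-Windows-WER-SystemErrorReporting' { 'Windows failure'; break }
        'EventLog'                                   { 'Miscellaneous failure'; break }
        default                                      { 'Warning or information' }
    }
}

# Made-up sample records standing in for Win32_ReliabilityRecords output
$sample = @(
    [pscustomobject]@{ SourceName = 'Application Error'; ProductName = 'excel.exe' }
    [pscustomobject]@{ SourceName = 'Application Hang';  ProductName = 'excel.exe' }
    [pscustomobject]@{ SourceName = 'MsiInstaller';      ProductName = 'SnagIt' }
)

# Count the sample records per category
$summary = $sample |
    Group-Object { Get-ReliabilityCategory $_.SourceName } |
    Select-Object Name, Count
$summary
```

On a real client, the same grouping could be applied to the objects returned by Get-CimInstance for Win32_ReliabilityRecords instead of the sample array.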

Enterprise client management systems such as Microsoft's System Center Configuration Manager (SCCM) or Symantec's Symantec Management Platform can inventory these classes. However, if such tools are not available in your environment, you can use PowerShell, a CSV file on a network share, and PowerBI Desktop to collect and analyze the data. You could easily adapt the reporting process to use a database as a source in place of the CSV.

Collecting system stability data with PowerShell

You can use the simple script below to collect data from a list of computers over the last 30 days.

# Only look at the last 30 days
$30DaysAgo = (Get-Date).AddDays(-30)

# Computers to query
$Computers = @("Computer1","Computer2","Computer3")

# Stability index scores
$ReliabilityStabilityMetrics = Get-CimInstance -ClassName Win32_ReliabilityStabilityMetrics -Filter "TimeGenerated > '$30DaysAgo'" -ComputerName $Computers | Select-Object PSComputerName, SystemStabilityIndex, TimeGenerated

# Failure and change records
$ReliabilityRecords = Get-CimInstance -ClassName Win32_ReliabilityRecords -Filter "TimeGenerated > '$30DaysAgo'" -ComputerName $Computers | Select-Object PSComputerName, EventIdentifier, LogFile, Message, ProductName, RecordNumber, SourceName, TimeGenerated

# Export both result sets to the Documents folder (parameters use plain ASCII hyphens)
$ReliabilityStabilityMetrics | Export-Csv "$env:USERPROFILE\Documents\ReliabilityStabilityMetrics.csv" -Encoding ASCII -NoTypeInformation
$ReliabilityRecords | Export-Csv "$env:USERPROFILE\Documents\ReliabilityRecords.csv" -Encoding ASCII -NoTypeInformation

The script uses the Get-CimInstance cmdlet to query the two WMI classes remotely on the computers stored in the array. It then exports the stability metrics and the reliability records to two separate CSV files in the Documents folder.
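
A date embedded as plain text in a WQL filter may be interpreted differently depending on regional settings. One possible workaround, sketched here under that assumption, is to pass the cutoff in the DMTF datetime layout that WMI uses internally. The helper name ConvertTo-DmtfDateTime is hypothetical, and you should verify the resulting filter against your own clients before relying on it.

```powershell
# Hypothetical helper: formats a DateTime as a DMTF datetime string in UTC,
# i.e. yyyyMMddHHmmss.ffffff followed by a signed three-digit UTC offset.
function ConvertTo-DmtfDateTime {
    param([Parameter(Mandatory)][datetime]$Date)

    $utc = $Date.ToUniversalTime()
    $utc.ToString('yyyyMMddHHmmss.ffffff', [cultureinfo]::InvariantCulture) + '+000'
}

# Build a filter for the last 30 days
$cutoff = (Get-Date).AddDays(-30)
$filter = "TimeGenerated > '$(ConvertTo-DmtfDateTime $cutoff)'"
$filter

# The filter could then be used like this (Windows only, not run here):
# Get-CimInstance -ClassName Win32_ReliabilityRecords -Filter $filter
```

Converting to UTC first keeps the offset fixed at +000, so the string does not depend on the time zone of the machine that builds it.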

Building the PowerBI Report

Next, you have to download PowerBI Desktop, the report builder portion of the PowerBI product. You can use this for free without registering for an account. After installing it, you can then import both CSVs created from your PowerShell commands using Get Data in PowerBI Desktop.

Now you can start creating charts based on the data you have collected. I will walk you through creating a couple of easier ones I have found useful. The procedure becomes even more useful when you relate this data to hardware and operating system inventory information.

Average of System Stability Index

I use the System Stability Index to spot major drops in stability in the environment and watch it to make sure "fixes" pushed to clients are making a difference.

Average of System Stability
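
If you want to check the numbers behind this chart without PowerBI, the daily average can also be computed in PowerShell. This is a rough sketch with a hypothetical function name (Get-DailyStabilityAverage) and made-up sample values. It assumes TimeGenerated and SystemStabilityIndex convert cleanly to [datetime] and [double]; values read back from a CSV are strings and may need culture-aware parsing first.

```powershell
# Hypothetical helper: averages SystemStabilityIndex per calendar day.
function Get-DailyStabilityAverage {
    param([Parameter(Mandatory)][object[]]$Metrics)

    $Metrics |
        Group-Object { ([datetime]$_.TimeGenerated).ToString('yyyy-MM-dd', [cultureinfo]::InvariantCulture) } |
        ForEach-Object {
            # Average all index values recorded on this day
            $stats = $_.Group |
                ForEach-Object { [double]$_.SystemStabilityIndex } |
                Measure-Object -Average
            [pscustomobject]@{
                Date    = $_.Name
                Average = [math]::Round($stats.Average, 2)
            }
        } |
        Sort-Object Date
}

# Made-up sample data for three hourly measurements on two days
$sampleMetrics = @(
    [pscustomobject]@{ PSComputerName = 'Computer1'; TimeGenerated = [datetime]'2023-01-01T08:00:00'; SystemStabilityIndex = 8.0 }
    [pscustomobject]@{ PSComputerName = 'Computer2'; TimeGenerated = [datetime]'2023-01-01T09:00:00'; SystemStabilityIndex = 6.0 }
    [pscustomobject]@{ PSComputerName = 'Computer1'; TimeGenerated = [datetime]'2023-01-02T08:00:00'; SystemStabilityIndex = 9.5 }
)

$daily = @(Get-DailyStabilityAverage -Metrics $sampleMetrics)
$daily
```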

Trending Events

Trending Events are useful for correlating changes to failures.

Trending Events
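
The data behind a trend chart like this is essentially a count of records per day and per SourceName. The sketch below shows one way to produce that table in PowerShell. The function name Get-DailyEventCount and the sample records are hypothetical, and the code assumes TimeGenerated converts cleanly to [datetime].

```powershell
# Hypothetical helper: counts reliability records per day and SourceName.
function Get-DailyEventCount {
    param([Parameter(Mandatory)][object[]]$Records)

    # Group on two keys: the calendar day and the SourceName property
    $keys = @(
        { ([datetime]$_.TimeGenerated).ToString('yyyy-MM-dd', [cultureinfo]::InvariantCulture) },
        'SourceName'
    )

    $Records |
        Group-Object -Property $keys |
        ForEach-Object {
            [pscustomobject]@{
                Date       = $_.Values[0]
                SourceName = $_.Values[1]
                Count      = $_.Count
            }
        } |
        Sort-Object Date, SourceName
}

# Made-up sample records
$sampleRecords = @(
    [pscustomobject]@{ TimeGenerated = [datetime]'2023-01-01T08:00:00'; SourceName = 'Application Error' }
    [pscustomobject]@{ TimeGenerated = [datetime]'2023-01-01T10:00:00'; SourceName = 'Application Error' }
    [pscustomobject]@{ TimeGenerated = [datetime]'2023-01-01T07:30:00'; SourceName = 'MsiInstaller' }
)

$trend = @(Get-DailyEventCount -Records $sampleRecords)
$trend
```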

With just three PCs it's hard to tell, but I will show you an example from the dashboard I have built for our environment. We recently rolled out an update to SnagIt that is causing issues. The correlation between the failures and the installation events from MsiInstaller is clearly visible.

Trending SnagIt issue root cause

We also use this indicator to rule out updates as a possible cause of crashing. The chart shows that Excel was crashing just as often before the latest update installation events as it was after.

Trending issue Excel root cause
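
The same before-and-after comparison can be approximated in PowerShell by counting a product's failures on either side of an installation time. This is a sketch only: Compare-FailuresAroundChange is a hypothetical name, the sample records are invented, and treating only "Application Error" and "Application Hang" as failures is an assumption.

```powershell
# Hypothetical helper: counts application failures for one product before
# and after a given change time (for example, an update installation).
function Compare-FailuresAroundChange {
    param(
        [Parameter(Mandatory)][object[]]$Records,
        [Parameter(Mandatory)][string]$ProductName,
        [Parameter(Mandatory)][datetime]$ChangeTime
    )

    $failureSources = 'Application Error', 'Application Hang'
    $failures = @($Records | Where-Object {
        $_.ProductName -eq $ProductName -and $failureSources -contains $_.SourceName
    })

    [pscustomobject]@{
        ProductName = $ProductName
        Before      = @($failures | Where-Object { [datetime]$_.TimeGenerated -lt $ChangeTime }).Count
        After       = @($failures | Where-Object { [datetime]$_.TimeGenerated -ge $ChangeTime }).Count
    }
}

# Made-up sample: two crashes before the update and two after it
$sampleFailures = @(
    [pscustomobject]@{ ProductName = 'excel.exe'; SourceName = 'Application Error'; TimeGenerated = [datetime]'2023-01-03T09:00:00' }
    [pscustomobject]@{ ProductName = 'excel.exe'; SourceName = 'Application Hang';  TimeGenerated = [datetime]'2023-01-05T14:00:00' }
    [pscustomobject]@{ ProductName = 'excel.exe'; SourceName = 'Application Error'; TimeGenerated = [datetime]'2023-01-12T11:00:00' }
    [pscustomobject]@{ ProductName = 'excel.exe'; SourceName = 'Application Error'; TimeGenerated = [datetime]'2023-01-14T16:00:00' }
    [pscustomobject]@{ ProductName = 'SnagIt';    SourceName = 'MsiInstaller';      TimeGenerated = [datetime]'2023-01-10T08:00:00' }
)

$comparison = Compare-FailuresAroundChange -Records $sampleFailures -ProductName 'excel.exe' -ChangeTime ([datetime]'2023-01-10T00:00:00')
$comparison
```

For the counts to be comparable, the records should cover roughly the same number of days on each side of the change time.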

Conclusion

It is important to keep an eye on reliability and stability indicators so you can fix problems before end users start reporting them. With the help of PowerShell, you can quickly get an overview of the troubles that are building up in your network.

22 Comments
  1. Matt D. 6 years ago

    Great article…can’t wait to give this a try and be more proactive!

    • Author

      Thanks Matt! I’m really curious about what reliability score other people are averaging at. We seem to be between 7.5 and 8.

  2. YouTube the heck out of this. Trying to use it myself now and it is working really well.

    • Author

      Thanks for the comment William. Glad it’s working out well for you! I have not really done much on YouTube but I’m curious to know what else you would like to see.

      • I haven’t gotten this to work the way you did but man is this an awesome script and BI is really cool to use. I can’t wait to learn more on how to get it working in the same manner.

  3. CMI 6 years ago

    Hello Micah,

    Excellent article, very informative. Thank you for writing this. Do you have thoughts how does this scale? Would it scale to around 200 machines or more?

    Thank you.

    CMI

     

    • Author

      Hey CMI,

      Great question! It will definitely scale. The queries are pretty low impact on the clients they are requesting info from and the reporting of course is pretty simple data.

      We moved to using our client management solution, Symantec’s SMP, to collect the information from the WMI classes because we needed to scale it globally from 5k clients and collect info when off the corporate network. SCCM would also let you inventory the two WMI classes you need.

      As long as the machines are pingable from the computer you are running this script from it should scale fine to at least 200.

       

  4. Matt 6 years ago

    @('Computer1','Computer2','Computer3') is an array, not a hash table.

    Other than being nick picky about that, I want to say thanks for sharing this.

  5. Nice article. I have to try it.

  6. Aries 6 years ago

    This is fabulous!!

    My various W10 machines only seem to collect 30 days worth of data but my older OS machines collect a year of data. Is that configurable?

    • Author

      Hi Aries,

      Glad you liked it! I have not been able to find any documentation on how to change how long it keeps that data, if you find it let me know though!

      We ended up just working with the 30 days of data to monitor existing issues more than trend over-time. If management gets more interested the plan was to begin archiving the data off somewhere so we could trace overtime better.

  7. Danny Nilsson 4 years ago

    For me to get this working i had to change the first line. else the time format would not work with the format from my logs

    so first converting it to datetime to make the getdate know how the conversion should work, and the string is needed to have the non supported dash time format that was needed.

    [datetime]$30DaysAgo = (Get-Date).AddDays(-30)
    [String]$30DaysAgo = $30DaysAgo.ToString("dd-MM-yyyy HH:mm:ss")

  8. Brent 4 years ago

    Can you provide the Power Bi file for this?

  9. Bill Hamann (Rank 1) 4 years ago

    Hello Micah,

    Very creative! I was really impressed. I’ll second the request for providing the PowerBI file.

    Thanks much,
    Bill

  10. Albert Dutra (Rank 1) 4 years ago

    Great Article!

    I like the idea of being able to dump to .CSV files and look at it later. I’ve implemented that portion into our helpdesk ticket submission system so that whenever a ticket is sent in it also sends in these logs so we can do some investigative work (if necessary) without further requirements to connect to the PC.

    I’m curious on the BI Report you made, as others have mentioned, would you be willing to share your BI Report file with us? Thanks!

    Albert

  11. Felipe Pereira 4 years ago

    Hey My friend, thanks for the excellent article, could u send the PowerBi Files for me?

    feliperocp@yahoo.com.br

  12. Selvin 3 years ago

    is there a walk through video on this ?

  13. Al 2 years ago

    Another vote for more info on the PowerBI assistance.

  15. Fivesoul 1 year ago

    I got this error when run that script. Please help

    Export-Csv : Cannot bind parameter ‘Delimiter’. Cannot convert value “NoTypeInformation” to type “System.Char”. Error: “String must be exactly one
    character long

  16. amry 1 year ago

    Please help. I got below error when run script

    Export-Csv : Cannot bind parameter ‘Delimiter’. Cannot convert value “Encoding” to type “System.Char”. Error: “String must be exactly one character
    long.”
    At D:\Myself\PSTools\CollectingSystemStability&ProblemOri.ps1:5 char:102
    + … ERPROFILE\Documents\ReliabilityStabilityMetrics.csv ‑Encoding ASCII – …
    + ~~~~~~~~~
    + CategoryInfo : InvalidArgument: (:) [Export-Csv], ParameterBindingException
    + FullyQualifiedErrorId : CannotConvertArgumentNoMessage,Microsoft.PowerShell.Commands.ExportCsvCommand

    Export-Csv : Cannot bind parameter ‘Delimiter’. Cannot convert value “Encoding” to type “System.Char”. Error: “String must be exactly one character
    long.”
    At D:\Myself\PSTools\CollectingSystemStability&ProblemOri.ps1:6 char:84
    + … V $env:USERPROFILE\Documents\ReliabilityRecords.csv ‑Encoding ASCII – …
    + ~~~~~~~~~
    + CategoryInfo : InvalidArgument: (:) [Export-Csv], ParameterBindingException
    + FullyQualifiedErrorId : CannotConvertArgumentNoMessage,Microsoft.PowerShell.Commands.ExportCsvCommand

© 4sysops 2006 - 2023
