Too often, IT administrators rely purely on ticket submissions to determine whether Windows clients work reliably. Here I want to show you how to use PowerShell to monitor system stability on multiple computers by tracking reliability indicators such as application crashes, hanging applications, and blue screens of death (BSODs).

With the little PowerShell script discussed below, you can remotely retrieve reliability information and visualize the data with the free PowerBI Desktop tool.

Visualizing system stability with PowerBI Desktop

However, before I detail the PowerShell solution, let's look at the method that admins typically use when they want to monitor the stability of a Windows computer. This will help us to understand what kind of reliability data is available.

Reliability Monitor

Reliability Monitor is a handy little tool built into Windows since Vista. The tool contains a whole lot of helpful information when it comes to troubleshooting a Windows computer. It can be a bit overwhelming when you look at it the first time. I will break it down here for you briefly.

Reliability Monitor

Blue line across the top (top row): This is your system stability index, a score based on how often your computer experiences failures. The score ranges from 1 to 10. The more often your computer fails, the lower your score; the longer you go without a system or application failure, the higher it climbs.

Application failures: Every time you have an application failure, which can be an application crashing or hanging, it will show a red "x" in that column.

Windows failures: This column will get a red "x" when you have a BSOD.

Miscellaneous failures: This column records unexpected shutdowns, for example when someone forces a shutdown with the power button or the battery runs out completely.

Warnings: These do not impact your stability score but provide good information. They will show when an application installation/removal, Windows Update, or driver update was unsuccessful.

Information: This column will get a blue "i" when there are system changes you should be aware of. Driver installation, Windows Updates, and software installations will all appear in this column. This information can be very handy when troubleshooting what caused a failure.

Clicking on any of the columns will give you more detailed information about the abovementioned events. This information is great. The problem is that I cannot remotely log in to every computer in an enterprise environment and check every one of these PCs. I could try to figure out the events that trigger these reliability records and pull them in, but I would have to recreate the scoring system.

The solution? Read on.

Win32_ReliabilityStabilityMetrics, Win32_ReliabilityRecords

A couple of WMI classes store all the scorings and records discussed above. You will want to collect the following properties from the WMI classes:

Win32_ReliabilityStabilityMetrics

  • TimeGenerated: The system calculates the stability index score every hour the computer is on and will record the associated timestamp in this property
  • SystemStabilityIndex: This is the stability score index calculated

Win32_ReliabilityRecords

  • EventIdentifier: The ID for the event in the Windows Event Log
  • Message: The body of the Windows event associated with the failure or change
  • ProductName: The product name or executable associated with the failure
  • SourceName: This designates what type of event we are looking at and will always be one of the following:
    • Application Error: Application stops responding and crashes
    • Application Hang: Application stops responding but recovers
    • Application-Add-On-Event-Provider: Add-ons were enabled for Internet Explorer
    • EventLog: The only event I have seen from this source is "The system was shut down unexpectedly"
    • Microsoft-Windows-Setup: Occurs when Windows is first installed
    • Microsoft-Windows-StartupRepair: Windows failed to boot and a startup repair was attempted
    • Microsoft-Windows-UserPnp: Driver-related events
    • Microsoft-Windows-WER-SystemErrorReporting: Blue screen of death
    • Microsoft-Windows-WindowsUpdateClient: Windows Updates
    • MsiInstaller: Application installations and removals
  • TimeGenerated: See above
  • User: The user account active during the event
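
If you want to rebuild Reliability Monitor's columns in your own reports, one option is to translate SourceName into a category. The following is a minimal sketch, not something Windows provides: the function name Get-ReliabilityCategory and the category labels are hypothetical, and the mapping simply follows the list above. Sources I could not place with confidence fall through to "Other".

```powershell
# Hypothetical helper: maps a SourceName value to a reporting category.
# The labels are illustrative and only loosely mirror Reliability Monitor's columns.
function Get-ReliabilityCategory {
    param(
        [Parameter(Mandatory)]
        [string]$SourceName
    )

    # Hashtable literals compare string keys case-insensitively
    $map = @{
        'Application Error'                          = 'Application failure'
        'Application Hang'                           = 'Application failure'
        'Microsoft-Windows-WER-SystemErrorReporting' = 'Windows failure'
        'EventLog'                                   = 'Miscellaneous failure'
        'Application-Add-On-Event-Provider'          = 'Change'
        'Microsoft-Windows-Setup'                    = 'Change'
        'Microsoft-Windows-UserPnp'                  = 'Change'
        'Microsoft-Windows-WindowsUpdateClient'      = 'Change'
        'MsiInstaller'                               = 'Change'
    }

    if ($map.ContainsKey($SourceName)) { $map[$SourceName] } else { 'Other' }
}

# Example, run locally on a Windows client:
# Get-CimInstance -ClassName Win32_ReliabilityRecords |
#     Group-Object -Property { Get-ReliabilityCategory $_.SourceName } |
#     Select-Object Name, Count
```

The "Change" category covers both successful changes (information) and failed ones (warnings), because the source name alone does not tell them apart.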

Enterprise client management systems such as Microsoft's System Center Configuration Manager (SCCM) or Symantec's Symantec Management Platform can inventory these classes. However, if such tools are not available in your environment, you can use PowerShell, a CSV file on a network share, and PowerBI Desktop to collect and analyze the data. You could easily adapt the reporting process to use a database as a source in place of the CSV.

Collecting system stability data with PowerShell

You can use the simple script below to collect data from a list of computers over the last 30 days.

# Only collect entries generated during the last 30 days
$30DaysAgo = (Get-Date).AddDays(-30)

# Computers to query remotely
$Computers = @("Computer1","Computer2","Computer3")

# Hourly stability index scores
$ReliabilityStabilityMetrics = Get-CimInstance -ClassName Win32_ReliabilityStabilityMetrics -Filter "TimeGenerated > '$30DaysAgo'" -ComputerName $Computers |
    Select-Object PSComputerName, SystemStabilityIndex, TimeGenerated

# Failure, warning, and information records
$ReliabilityRecords = Get-CimInstance -ClassName Win32_ReliabilityRecords -Filter "TimeGenerated > '$30DaysAgo'" -ComputerName $Computers |
    Select-Object PSComputerName, EventIdentifier, LogFile, Message, ProductName, RecordNumber, SourceName, TimeGenerated

# Write each result set to its own CSV file in the Documents folder
$ReliabilityStabilityMetrics | Export-Csv -Path "$env:USERPROFILE\Documents\ReliabilityStabilityMetrics.csv" -Encoding ASCII -NoTypeInformation
$ReliabilityRecords | Export-Csv -Path "$env:USERPROFILE\Documents\ReliabilityRecords.csv" -Encoding ASCII -NoTypeInformation

The script uses the Get-CimInstance cmdlet to query the WMI classes remotely on the computers stored in an array. It then exports the stability metrics and the reliability records to two separate CSV files in the Documents folder.
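
In a larger environment, some computers will be switched off or unreachable when the script runs. Here is a rough sketch of one way to keep track of them, assuming the same class and filter as in the script. The function names Get-SilentComputer and Get-StabilityMetric, their parameters, and the output properties are all hypothetical.

```powershell
# Sketch only: function names, parameters, and output properties are hypothetical.

# Returns the requested computer names that do not appear in the results.
function Get-SilentComputer {
    param(
        [string[]]$Requested,
        [object[]]$Results
    )

    $answered = @($Results | ForEach-Object { $_.PSComputerName })
    $Requested | Where-Object { $_ -notin $answered }
}

# Collects stability metrics and reports which computers returned nothing.
function Get-StabilityMetric {
    param(
        [Parameter(Mandatory)]
        [string[]]$ComputerName,
        [int]$Days = 30
    )

    $since = (Get-Date).AddDays(-$Days)
    $query = @{
        ClassName     = 'Win32_ReliabilityStabilityMetrics'
        Filter        = "TimeGenerated > '$since'"
        ComputerName  = $ComputerName
        # Keep going when a computer cannot be reached, but keep the errors
        ErrorAction   = 'SilentlyContinue'
        ErrorVariable = 'cimErrors'
    }

    $metrics = @(Get-CimInstance @query |
        Select-Object PSComputerName, SystemStabilityIndex, TimeGenerated)

    [pscustomobject]@{
        Metrics = $metrics
        NoData  = @(Get-SilentComputer -Requested $ComputerName -Results $metrics)
        Errors  = $cimErrors
    }
}
```

Calling Get-StabilityMetric with your list of computer names returns the metrics together with a NoData list. Note that a computer also lands in NoData if it answered but had no entries in the time window, so check the Errors property before assuming it was offline.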

Building the PowerBI Report

Next, you have to download PowerBI Desktop, the report builder portion of the PowerBI product. You can use this for free without registering for an account. After installing it, you can then import both CSVs created from your PowerShell commands using Get Data in PowerBI Desktop.

Now you can start creating charts based on the data you have collected. I will walk you through creating a couple of easier ones I have found useful. The procedure becomes even more useful when you relate this data to hardware and operating system inventory information.

Average of System Stability Index

I use the System Stability Index to spot major drops in stability in the environment and watch it to make sure "fixes" pushed to clients are making a difference.

Average of System Stability
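
If you want a quick look at the same numbers before opening the report, you can summarize the metrics in PowerShell as well. This is only a sketch: Get-StabilitySummary and its output properties are hypothetical names, and it assumes the index values use a period as the decimal separator when they are read back from the CSV.

```powershell
# Hypothetical helper: average and lowest stability index per computer.
function Get-StabilitySummary {
    param(
        [Parameter(Mandatory)]
        [object[]]$Metric
    )

    $Metric | Group-Object -Property PSComputerName | ForEach-Object {
        # Values read back from CSV are strings, so cast before doing arithmetic
        $stats = $_.Group |
            ForEach-Object { [double]$_.SystemStabilityIndex } |
            Measure-Object -Average -Minimum

        [pscustomobject]@{
            ComputerName = $_.Name
            AverageIndex = [math]::Round($stats.Average, 2)
            LowestIndex  = $stats.Minimum
        }
    } | Sort-Object -Property AverageIndex
}
```

Passing the output of Import-Csv for ReliabilityStabilityMetrics.csv to the -Metric parameter lists the least stable computers first.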

Trending Events

Trending Events are useful for correlating changes to failures.

Trending Events

With just three PCs it's hard to tell, so here is an example from the dashboard I have built for our environment. We recently rolled out an update to SnagIt that is causing issues. The correlation between the failures and the installation events from MsiInstaller is clearly visible.

Trending SnagIt issue root cause

We also use this indicator to rule out updates as a possible cause of crashing. The chart shows that Excel was crashing just as often before the latest update installation events as it was after.

Trending issue Excel root cause
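
The same before-and-after comparison can be approximated without a chart. The sketch below counts crashes and hangs for one product in equal windows on either side of a change date. Compare-FailureRate, its parameters, and the seven-day default are hypothetical, and it assumes the records still carry TimeGenerated as a DateTime, as returned by Get-CimInstance.

```powershell
# Hypothetical helper: counts application failures for one product in equal
# time windows before and after a change, such as an update rollout.
function Compare-FailureRate {
    param(
        [Parameter(Mandatory)]
        [object[]]$Record,
        [Parameter(Mandatory)]
        [string]$ProductName,
        [Parameter(Mandatory)]
        [datetime]$ChangeDate,
        [int]$WindowDays = 7
    )

    $windowStart = $ChangeDate.AddDays(-$WindowDays)
    $windowEnd   = $ChangeDate.AddDays($WindowDays)

    # Only crashes and hangs count as failures; installs and updates are ignored
    $failureSources = 'Application Error', 'Application Hang'
    $failures = @($Record | Where-Object {
            $_.SourceName -in $failureSources -and $_.ProductName -like "*$ProductName*"
        })

    $before = @($failures | Where-Object {
            $_.TimeGenerated -ge $windowStart -and $_.TimeGenerated -lt $ChangeDate
        })
    $after = @($failures | Where-Object {
            $_.TimeGenerated -ge $ChangeDate -and $_.TimeGenerated -lt $windowEnd
        })

    [pscustomobject]@{
        ProductName    = $ProductName
        WindowDays     = $WindowDays
        FailuresBefore = $before.Count
        FailuresAfter  = $after.Count
    }
}
```

Similar counts on both sides suggest the change is not the cause, while a jump in FailuresAfter points toward it. Treat the result as a hint, since it does not account for how many computers were in use during each window.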

Conclusion

It is important to keep an eye on reliability and stability indicators so you can fix problems before end users start reporting them. With the help of PowerShell, you can quickly get an overview of the troubles that are building up in your network.
