With the little PowerShell script discussed below, you can remotely retrieve reliability information and visualize the data with the free PowerBI Desktop tool.
However, before I detail the PowerShell solution, let's look at the method that admins typically use when they want to monitor the stability of a Windows computer. This will help us to understand what kind of reliability data is available.
Reliability Monitor
Reliability Monitor is a handy little tool built into Windows since Vista. The tool contains a whole lot of helpful information when it comes to troubleshooting a Windows computer. It can be a bit overwhelming when you look at it the first time. I will break it down here for you briefly.
Blue line across the top (top row): This is your system stability index. It is basically a scoring system based on how often your computer experiences failures. The scoring system ranges from 1 to 10. The more often your computer fails, the lower your score. The longer you go without a system or application failure, the higher your score will be.
Application failures: Every time you have an application failure, which can be an application crashing or hanging, it will show a red "x" in that column.
Windows failures: This column will get a red "x" when you have a BSOD.
Miscellaneous failures: These occur when the system shuts down unexpectedly, for example when someone forces a shutdown with the power button or the battery runs out completely.
Warnings: These do not impact your stability score but provide good information. They will show when an application installation/removal, Windows Update, or driver update was unsuccessful.
Information: This column will get a blue "i" when there are system changes you should be aware of. Driver installation, Windows Updates, and software installations will all appear in this column. This information can be very handy when troubleshooting what caused a failure.
Clicking on any of the columns will give you more detailed information about the abovementioned events. This information is great. The problem is that I cannot remotely log in to every computer in an enterprise environment and check every one of these PCs. I could try to figure out the events that trigger these reliability records and pull them in, but I would have to recreate the scoring system.
The solution? Read on.
Win32_ReliabilityStabilityMetrics and Win32_ReliabilityRecords
A couple of WMI classes store all the scorings and records discussed above. You will want to collect the following properties from the WMI classes:
Win32_ReliabilityStabilityMetrics
- TimeGenerated: The system calculates the stability index score every hour the computer is on and records the associated timestamp in this property
- SystemStabilityIndex: The calculated stability index score
Win32_ReliabilityRecords
- EventIdentifier: The ID for the event in the Windows Event Log
- Message: The body of the Windows event associated with the failure or change
- ProductName: The product name or executable associated with the failure
- SourceName: This designates what type of event we are looking at and will always be one of the following:
  - Application Error: Application stops responding and crashes
  - Application Hang: Application stops responding but recovers
  - Application-Add-On-Event-Provider: Add-ons were enabled for Internet Explorer
  - EventLog: The only event I have seen from this source is "The system was shut down unexpectedly"
  - Microsoft-Windows-Setup: Occurs when Windows is first installed
  - Microsoft-Windows-StartupRepair: Windows failed to boot and a startup repair was attempted
  - Microsoft-Windows-UserPnp: Driver-related events
  - Microsoft-Windows-WER-SystemErrorReporting: Blue screen of death
  - Microsoft-Windows-WindowsUpdateClient: Windows Updates
  - MsiInstaller: Application installations and removals
- TimeGenerated: The timestamp of the event
- User: The user account active during the event
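When charting the records later, it can help to tag each one as either a failure or a change. The sketch below does that from the SourceName values listed above. The function name Get-ReliabilityEventType and the grouping itself are assumptions drawn from the descriptions in the list, not something Windows defines, so adjust the list to taste.

```powershell
# Sources treated as failures. This grouping is an assumption based on
# the descriptions above; every other source counts as a change.
$FailureSources = @(
    'Application Error',
    'Application Hang',
    'EventLog',
    'Microsoft-Windows-StartupRepair',
    'Microsoft-Windows-WER-SystemErrorReporting'
)

# Hypothetical helper: returns 'Failure' or 'Change' for a SourceName.
function Get-ReliabilityEventType {
    param([Parameter(Mandatory)][string]$SourceName)

    # -contains compares case-insensitively by default
    if ($FailureSources -contains $SourceName) { 'Failure' } else { 'Change' }
}

Get-ReliabilityEventType -SourceName 'Application Hang'   # Failure
Get-ReliabilityEventType -SourceName 'MsiInstaller'       # Change
```

You could add the result as an extra column before exporting, which makes it easy to color failures and changes differently in a chart.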
Enterprise client management systems such as Microsoft's System Center Configuration Manager (SCCM) or Symantec's Symantec Management Platform can inventory these classes. However, if such tools are not available in your environment, you can use PowerShell, a CSV file on a network share, and PowerBI Desktop to collect and analyze the data. You could easily adapt the reporting process to use a database as a source in place of the CSV.
Collecting system stability data with PowerShell
You can use the simple script below to collect data from a list of computers over the last 30 days.
# Only collect data from the last 30 days
$30DaysAgo = (Get-Date).AddDays(-30)
$Computers = @("Computer1","Computer2","Computer3")

# Query the stability index and the reliability records on each computer
$ReliabilityStabilityMetrics = Get-CimInstance -ClassName Win32_ReliabilityStabilityMetrics -Filter "TimeGenerated > '$30DaysAgo'" -ComputerName $Computers |
    Select-Object PSComputerName, SystemStabilityIndex, TimeGenerated
$ReliabilityRecords = Get-CimInstance -ClassName Win32_ReliabilityRecords -Filter "TimeGenerated > '$30DaysAgo'" -ComputerName $Computers |
    Select-Object PSComputerName, EventIdentifier, LogFile, Message, ProductName, RecordNumber, SourceName, TimeGenerated

# Export both result sets to CSV files in the Documents folder
$ReliabilityStabilityMetrics | Export-Csv $env:USERPROFILE\Documents\ReliabilityStabilityMetrics.csv -Encoding ASCII -NoTypeInformation
$ReliabilityRecords | Export-Csv $env:USERPROFILE\Documents\ReliabilityRecords.csv -Encoding ASCII -NoTypeInformation
The script uses the Get-CimInstance cmdlet to query the WMI classes remotely on the computers stored in an array. It then exports the stability metrics and the reliability records to two CSV files in the Documents folder.
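The filter in the script embeds the date as plain text, and whether WMI parses that text correctly can depend on regional settings. WMI also accepts timestamps in the DMTF format, which avoids the problem. Here is a minimal sketch of building such a filter; ConvertTo-DmtfDateTime is a hypothetical helper name, and I convert to UTC so the offset suffix can always be +000.

```powershell
# Hypothetical helper: formats a date as a DMTF timestamp in UTC.
function ConvertTo-DmtfDateTime {
    param([Parameter(Mandatory)][datetime]$Date)

    # DMTF layout: yyyyMMddHHmmss.ffffff plus a signed UTC offset in minutes.
    # The value is converted to UTC first, so the offset is +000.
    $Date.ToUniversalTime().ToString('yyyyMMddHHmmss.ffffff', [cultureinfo]::InvariantCulture) + '+000'
}

$Since  = ConvertTo-DmtfDateTime -Date (Get-Date).AddDays(-30)
$Filter = "TimeGenerated > '$Since'"

# Example use, assuming $Computers holds the computer names as in the script:
# Get-CimInstance -ClassName Win32_ReliabilityRecords -Filter $Filter -ComputerName $Computers
```

If a query returns nothing even though Reliability Monitor shows recent events, the date filter is the first thing I would check.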
Building the PowerBI Report
Next, you have to download PowerBI Desktop, the report builder portion of the PowerBI product. You can use this for free without registering for an account. After installing it, you can then import both CSVs created from your PowerShell commands using Get Data in PowerBI Desktop.
Now you can start creating charts based on the data you have collected. I will walk you through creating a couple of easier ones I have found useful. The procedure becomes even more useful when you relate this data to hardware and operating system inventory information.
Average of System Stability Index
I use the System Stability Index to spot major drops in stability in the environment and watch it to make sure "fixes" pushed to clients are making a difference.
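If you want to sanity check the numbers before charting them, you can calculate the same daily average in PowerShell. This is only a sketch: Get-DailyStabilityAverage is a hypothetical function name and the sample rows are made up. With real data you would pass in the output of Import-Csv, in which case the date text has to be something the [datetime] cast can parse.

```powershell
# Hypothetical helper: averages SystemStabilityIndex per calendar day.
function Get-DailyStabilityAverage {
    param([Parameter(Mandatory)][object[]]$Metrics)

    $Metrics |
        Group-Object -Property { ([datetime]$_.TimeGenerated).Date } |
        ForEach-Object {
            [pscustomobject]@{
                Date         = ([datetime]$_.Group[0].TimeGenerated).Date
                AverageIndex = ($_.Group | Measure-Object -Property SystemStabilityIndex -Average).Average
            }
        } |
        Sort-Object -Property Date
}

# Made-up sample rows in the shape the collection script produces
$SampleMetrics = @(
    [pscustomobject]@{ PSComputerName = 'Computer1'; SystemStabilityIndex = 8.0; TimeGenerated = [datetime]'2017-07-01T01:00:00' }
    [pscustomobject]@{ PSComputerName = 'Computer2'; SystemStabilityIndex = 6.0; TimeGenerated = [datetime]'2017-07-01T02:00:00' }
    [pscustomobject]@{ PSComputerName = 'Computer1'; SystemStabilityIndex = 9.0; TimeGenerated = [datetime]'2017-07-02T01:00:00' }
)

# Yields an average of 7 for July 1 and 9 for July 2
Get-DailyStabilityAverage -Metrics $SampleMetrics
```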
Trending Events
Trending Events are useful for correlating changes to failures.
With just three PCs it's hard to tell, but I will show you an example from the dashboard I have built for our environment. We recently rolled out an update to SnagIt that is causing issues. The correlation of failures to the installation events from MsiInstaller is pretty clearly visible.
We also use this indicator to rule out updates as a possible cause of crashing. The chart shows that Excel was crashing just as often before the latest update installation events as it was after.
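The same kind of trend can be roughed out in PowerShell by counting events per day, source, and product. Again, this is a sketch under assumptions: Get-DailyEventCount is a hypothetical function name, and the sample records are invented to mimic an installation followed by crashes.

```powershell
# Hypothetical helper: counts reliability records per day, source, and product.
function Get-DailyEventCount {
    param([Parameter(Mandatory)][object[]]$Records)

    $Records |
        Group-Object -Property @({ ([datetime]$_.TimeGenerated).Date }, 'SourceName', 'ProductName') |
        ForEach-Object {
            $First = $_.Group[0]
            [pscustomobject]@{
                Date        = ([datetime]$First.TimeGenerated).Date
                SourceName  = $First.SourceName
                ProductName = $First.ProductName
                Count       = $_.Count
            }
        } |
        Sort-Object -Property Date, SourceName, ProductName
}

# Invented sample records: one installation and two crashes on the same day
$SampleRecords = @(
    [pscustomobject]@{ SourceName = 'MsiInstaller';      ProductName = 'SnagIt'; TimeGenerated = [datetime]'2017-07-01T08:00:00' }
    [pscustomobject]@{ SourceName = 'Application Error'; ProductName = 'SnagIt'; TimeGenerated = [datetime]'2017-07-01T09:30:00' }
    [pscustomobject]@{ SourceName = 'Application Error'; ProductName = 'SnagIt'; TimeGenerated = [datetime]'2017-07-01T14:15:00' }
)

Get-DailyEventCount -Records $SampleRecords
```

A rising Application Error count right after an MsiInstaller event for the same product is the pattern to look for.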
Conclusion
It is important to keep an eye on reliability and stability indicators to fix problems before end users start reporting them. With the help of PowerShell, you can quickly get an overview of the troubles that are building up in your network.
Great article…can’t wait to give this a try and be more proactive!
Thanks Matt! I’m really curious about what reliability score other people are averaging at. We seem to be between 7.5 and 8.
YouTube the heck out of this. Trying to use it myself now and it is working really well.
Thanks for the comment William. Glad it’s working out well for you! I have not really done much on YouTube but I’m curious to know what else you would like to see.
I haven’t gotten this to work the way you did but man is this an awesome script and BI is really cool to use. I can’t wait to learn more on how to get it working in the same manner.
Hello Micah,
Excellent article, very informative. Thank you for writing this. Do you have thoughts how does this scale? Would it scale to around 200 machines or more?
Thank you.
CMI
Hey CMI,
Great question! It will definitely scale. The queries are pretty low impact on the clients they are requesting info from and the reporting of course is pretty simple data.
We moved to using our client management solution, Symantec's SMP, to collect the information from the WMI classes because we needed to scale it globally from 5k clients and collect info when off the corporate network. SCCM would also let you inventory the two WMI classes you need.
As long as the machines are pingable from the computer you are running this script from it should scale fine to at least 200.
@('Computer1','Computer2','Computer3') is an array, not a hash table.
Other than being nitpicky about that, I want to say thanks for sharing this.
I think this was my bad. I corrected the article. Thanks for the hint.
Nice article. I have to try it.
This is fabulous!!
My various W10 machines only seem to collect 30 days worth of data but my older OS machines collect a year of data. Is that configurable?
Hi Aries,
Glad you liked it! I have not been able to find any documentation on how to change how long it keeps that data. If you find it, let me know though!
We ended up just working with the 30 days of data to monitor existing issues rather than trends over time. If management gets more interested, the plan is to begin archiving the data off somewhere so we can trace it over time better.
To get this working, I had to change the first line. Otherwise the time format would not match the format from my logs.
So I first cast the value to datetime so the conversion is unambiguous, and then convert it to a string to get the dash-separated time format that was needed.
[datetime]$30DaysAgo = (Get-Date).AddDays(-30)
[String]$30DaysAgo = $30DaysAgo.tostring("dd-MM-yyyy HH:mm:ss")
Can you provide the Power Bi file for this?
Hello Micah,
Very creative! I was really impressed. I’ll second the request for providing the PowerBI file.
Thanks much,
Bill
Great Article!
I like the idea of being able to dump to .CSV files and look at it later. I've implemented that portion into our helpdesk ticket submission system so that whenever a ticket is sent in it also sends in these logs so we can do some investigative work (if necessary) without further requirements to connect to the PC.
I’m curious on the BI Report you made, as others have mentioned, would you be willing to share your BI Report file with us? Thanks!
Albert
Hey My friend, thanks for the excellent article, could u send the PowerBi Files for me?
feliperocp@yahoo.com.br
is there a walk through video on this ?
Another vote for more info on the PowerBI assistance.
I got this error when run that script. Please help
Export-Csv : Cannot bind parameter ‘Delimiter’. Cannot convert value “NoTypeInformation” to type “System.Char”. Error: “String must be exactly one
character long
Please help. I got below error when run script
Export-Csv : Cannot bind parameter ‘Delimiter’. Cannot convert value “Encoding” to type “System.Char”. Error: “String must be exactly one character
long.”
At D:\Myself\PSTools\CollectingSystemStability&ProblemOri.ps1:5 char:102
+ … ERPROFILE\Documents\ReliabilityStabilityMetrics.csv ‑Encoding ASCII – …
+ ~~~~~~~~~
+ CategoryInfo : InvalidArgument: (:) [Export-Csv], ParameterBindingException
+ FullyQualifiedErrorId : CannotConvertArgumentNoMessage,Microsoft.PowerShell.Commands.ExportCsvCommand
Export-Csv : Cannot bind parameter ‘Delimiter’. Cannot convert value “Encoding” to type “System.Char”. Error: “String must be exactly one character
long.”
At D:\Myself\PSTools\CollectingSystemStability&ProblemOri.ps1:6 char:84
+ … V $env:USERPROFILE\Documents\ReliabilityRecords.csv ‑Encoding ASCII – …
+ ~~~~~~~~~
+ CategoryInfo : InvalidArgument: (:) [Export-Csv], ParameterBindingException
+ FullyQualifiedErrorId : CannotConvertArgumentNoMessage,Microsoft.PowerShell.Commands.ExportCsvCommand