You're familiar with the shared responsibility model of cloud computing, correct? This means that the cloud service provider, for instance Microsoft Azure, provides the "on tap" physical infrastructure and is responsible for its availability and security. You, the customer, are responsible for the services and data you consume in the cloud provider's environment.
Stated another way, the shared responsibility model says that Microsoft is responsible for the security of its cloud and that you are responsible for security inside that cloud.
To uphold their responsibilities, Microsoft needs to perform planned maintenance on its Azure infrastructure. How can you stay ahead of these events and plan against a possible service outage?
Conversely, how can you quickly determine whether an outage you experience in your Azure subscription falls on your side or Microsoft's side of the shared responsibility model?
If you've had those questions, then you're in the right place. Let me give you a tour of the three Microsoft health services that give you those insights.
Azure Status dashboard
The Azure Status dashboard (https://status.azure.com) is a public webpage that enables you to review service availability across all Azure regions. Because Azure comprises nearly 200 services, this dashboard can be cumbersome to navigate. I recommend pressing CTRL+F and searching for the service for which you need status information.
If you're an old dog like me, you'll appreciate that you can subscribe to the Azure Status page by using Really Simple Syndication (RSS). In my opinion, the main advantage of the Azure Status dashboard is its easy accessibility. Its main disadvantage is that its display isn't specific to the particular Azure regions you're using. That's where Azure Service Health comes in.
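If you'd rather poll the RSS feed from a script than from a feed reader, here is a minimal sketch that uses only the Python standard library. It assumes the feed is ordinary RSS 2.0 with `title`, `link`, and `pubDate` elements under each `item`. The feed address isn't given above, so `feed_url` is a placeholder you'd copy from the RSS link on the status page, and the function names are my own.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import Dict, List, Union


def parse_status_feed(rss_xml: Union[str, bytes]) -> List[Dict[str, str]]:
    """Extract title, link, and publication date from each RSS <item>."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "published": (item.findtext("pubDate") or "").strip(),
        })
    return items


def fetch_status_feed(feed_url: str) -> List[Dict[str, str]]:
    """Download and parse the feed. feed_url is supplied by the caller
    (assumption: copy it from the RSS link on the status page)."""
    with urllib.request.urlopen(feed_url, timeout=10) as response:
        return parse_status_feed(response.read())


def matching_incidents(items: List[Dict[str, str]], keyword: str) -> List[Dict[str, str]]:
    """Keep only items whose title mentions the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [item for item in items if needle in item["title"].lower()]
```

Filtering by keyword is a rough stand-in for the CTRL+F approach mentioned earlier; it still won't narrow results to your own regions the way Service Health does.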
Azure Service Health
Azure Service Health is a personalized Azure status dashboard accessible from within the Azure portal.
What I love about the Service issues blade (shown in the previous screenshot) is that it represents a filtered status view only of the regions in which you've actually deployed Azure services. You want to see "No service issues found" whenever you visit this page. Note that you can click Health history to view historical advisories that affected your resources.
The Planned maintenance blade, shown in the following screenshot, gives you notice of any Microsoft-side maintenance scheduled that could affect your resources. Note that you can download an incident summary as a PDF file for inclusion in your organizational issue-tracking system.
Health advisories and Security advisories provide filtered views of Azure bulletins that affect your service health and security hygiene, respectively. For instance, the next screenshot shows a recent security advisory I (and many, many other customers) received regarding Cosmos DB.
The Service Health dashboard is also a convenient place to review whether Microsoft violated its Azure service-level agreements (SLAs) with you as a result of a planned or unplanned outage. If so, you'll be provided with contact information and an issue number to track.
Service Health alerts
On each Active events blade, you'll see a toolbar button to define a corresponding alert. Please don't forget about Azure alerts. By setting a Service Health alert, for example, you and your team no longer have to rely upon good old human memory to remember to check the Service Health dashboard periodically.
Azure alerts are tied to action groups, which enable notifications (email, SMS, push, and voice) and code execution (webhook, function, logic app, or automation runbook). Azure alerts and action groups are powerful tools to have in your Azure governance arsenal.
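To make the webhook option a little more concrete, here is a rough sketch of what the receiving code might do with the alert body. It assumes the action group is set to send the common alert schema, and the field names (`essentials`, `alertRule`, `monitorCondition`, `monitoringService`, `alertContext.properties.title`) reflect my understanding of that schema, so check them against a real payload before relying on them. The function names are hypothetical.

```python
import json


def summarize_alert(payload: dict) -> str:
    """Build a one-line summary from an alert payload.

    Assumes the common alert schema layout; every lookup falls back
    to a default so an unexpected payload doesn't raise.
    """
    data = payload.get("data", {})
    essentials = data.get("essentials", {})
    properties = data.get("alertContext", {}).get("properties", {})

    source = essentials.get("monitoringService", "unknown source")
    rule = essentials.get("alertRule", "unknown rule")
    condition = essentials.get("monitorCondition", "unknown")
    title = properties.get("title", "no title")
    return f"[{source}] {rule}: {condition} - {title}"


def handle_webhook_body(body: bytes) -> str:
    """Decode the raw HTTP request body and summarize it."""
    return summarize_alert(json.loads(body.decode("utf-8")))
```

A summary line like this is the sort of thing you might forward to a chat channel or ticketing system from a function or runbook.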
Azure Resource Health
The most granular Azure status tool is called Resource Health, which you can find in the Support + Troubleshooting settings section for individual Azure resources.
The idea with Resource Health is that you can spot (and alert on) times when the Azure management backplane is unable to communicate with your resource for whatever reason.
Azure-side events and interruptions are called "platform events." In contrast, your own (mis)configurations can result in the resource losing its connectivity to Azure; these "non-platform events" are recorded here as well.
Azure Resource Health also lists events that degrade your resource performance even if there has not been a complete outage.
One important point I haven't mentioned explicitly so far: Azure Service Health and Resource Health both provide remediation advice in addition to the issue's root cause analysis (RCA) results.
Azure Resource Graph
You can query Azure Service Health and Resource Health data by using the Azure Resource Graph. Resource Graph is a fully managed, high-performance database of all your Azure subscription resources. For instance, when you search in the Azure portal, the results you see are provided to you by the Azure Resource Graph.
Azure Resource Graph Explorer is a Resource Graph client available to you in the Azure portal. Use Kusto Query Language (KQL) to define your queries. As an example, the following screenshot answers the question, "What is the current availability state of my Azure virtual machines?"
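The same kind of question can also be asked from code. The sketch below is not the exact query from the screenshot; it follows the pattern of Microsoft's published Resource Health sample queries against the `HealthResources` table, with a filter for virtual machines that I've added. It assumes the `azure-identity` and `azure-mgmt-resourcegraph` packages are installed, and `subscription_id` is a placeholder you supply. You can also paste the KQL string on its own into Resource Graph Explorer.

```python
from collections import Counter
from typing import Dict, List

# KQL: one row per VM with its current availability state.
# The VM filter on the resource ID is an assumption added for this sketch.
VM_AVAILABILITY_QUERY = """
HealthResources
| where type =~ 'microsoft.resourcehealth/availabilitystatuses'
| summarize by ResourceId = tolower(tostring(properties.targetResourceId)),
    AvailabilityState = tostring(properties.availabilityState)
| where ResourceId contains '/providers/microsoft.compute/virtualmachines/'
"""


def run_query(subscription_id: str) -> List[dict]:
    """Run the query with the Resource Graph SDK and return rows as dicts."""
    # Imported here so the rest of the module works without the Azure SDK.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.resourcegraph import ResourceGraphClient
    from azure.mgmt.resourcegraph.models import QueryRequest, QueryRequestOptions

    client = ResourceGraphClient(DefaultAzureCredential())
    request = QueryRequest(
        subscriptions=[subscription_id],
        query=VM_AVAILABILITY_QUERY,
        # Ask for a list of dicts rather than a columns/rows table.
        options=QueryRequestOptions(result_format="objectArray"),
    )
    return list(client.resources(request).data)


def count_by_state(rows: List[dict]) -> Dict[str, int]:
    """Tally how many VMs are in each availability state."""
    return dict(Counter(row.get("AvailabilityState", "Unknown") for row in rows))
```

Calling `count_by_state(run_query(subscription_id))` gives a quick tally per state, which is handy for a scheduled check alongside the portal view.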
Takeaways
As always, I'd like to leave you with a number of hand-selected learning resources. I hope you now have a better grasp of how to stay on top of platform- and non-platform-related changes throughout your Azure infrastructure.
- Azure Resource Health docs
- Azure Service Health docs
- Azure Resource Graph docs
- Creating Azure Service Health alerts
- Kusto Query Language reference
- Azure Health Resource Graph sample queries