Have you, as a Windows systems administrator, grown “numb” to the flood of IT service alerts you receive every day in your inbox? How do you separate the proverbial wheat from the chaff to figure out the root cause? What metrics do you use to track mean time to resolution (MTTR) in your IT service organization?
Nowadays, we’re in the throes of the DevOps/continuous integration evolution. With new code and features shipping daily, you have even more application and service alerts to aggregate, triage, and resolve. It’s a daunting task, no doubt.
BigPanda tackles this problem by correlating the alerts from your monitoring tools into incidents. It then automatically opens tickets or sends notifications to your collaboration tools of choice. BigPanda claims average correlation rates of 90 percent or better. Let’s learn more about their software-as-a-service (SaaS) tool.
Get started with BigPanda
Visit the BigPanda website to register for a free trial account. The trial provides 21 days of full access, with no credit card required. BigPanda is a SaaS app hosted in the Amazon Web Services (AWS) cloud. Go ahead and log in to your personal BigPanda portal, and we’ll proceed.
Not surprisingly, BigPanda built integrations with most of the major monitoring, deployment, and collaboration platforms, including (but not limited to) the following:
- AWS CloudWatch
If BigPanda doesn’t have an integration for your specific product, don’t worry: BigPanda offers full Representational State Transfer (REST) application programming interfaces (APIs) for both alerts and deployments, which you can use to write your own custom integrations.
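To give a feel for what a custom integration involves, here is a minimal Python sketch that builds (but does not send) an alert request. The endpoint URL, the `app_key`/`status`/`host`/`check`/`description` field names, the accepted status values, and the bearer-token header are my assumptions based on BigPanda's public Alerts API documentation; verify them against the current docs before use. `build_alert_request` is a hypothetical helper name, and the token and app key are placeholders.

```python
import json
import urllib.request

# Assumption: endpoint and field names modeled on BigPanda's Alerts API;
# confirm against the current BigPanda documentation before relying on them.
ALERTS_URL = "https://api.bigpanda.io/data/v2/alerts"

# Assumed set of status values the API accepts.
VALID_STATUSES = {"ok", "critical", "warning", "acknowledged"}


def build_alert_request(token, app_key, host, check, status, description=""):
    """Build (but do not send) an HTTP POST that carries a single alert."""
    if status not in VALID_STATUSES:
        raise ValueError(f"unsupported status: {status!r}")
    payload = {
        "app_key": app_key,          # identifies the integration
        "status": status,
        "host": host,                # which infrastructure element raised it
        "check": check,              # which check fired
        "description": description,
    }
    return urllib.request.Request(
        ALERTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_alert_request(
        token="YOUR_ORG_TOKEN",    # placeholder
        app_key="YOUR_APP_KEY",    # placeholder
        host="web-01",
        check="CPU load",
        status="critical",
        description="CPU above 95% for 5 minutes",
    )
    # To actually deliver the alert: urllib.request.urlopen(req)
    print(req.get_method(), req.full_url)
```

Keeping request construction separate from `urlopen()` makes it easy to inspect or unit-test the payload before pointing it at a live tenant.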
Some BigPanda integrations involve agent software; others don’t. Typically, you end up defining a user account for BigPanda in your monitoring software, which then forwards its alert traffic to the BigPanda cloud.
I already have Datadog monitoring enabled on my laptop (in fact, I wrote a blog post about Datadog), so it was a no-brainer for me to configure that integration. Here’s what I needed to do in BigPanda:
- Create an app key for my monitoring software (just a button click to do this).
- Define a webhook notification to BigPanda in the Datadog web portal.
- Specify the BigPanda webhook as a monitor target in Datadog.
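Conceptually, the Datadog side of this setup turns a monitor notification into an alert that BigPanda can correlate. In practice the BigPanda webhook endpoint does this parsing for you; the Python sketch below only illustrates the idea. It assumes a custom webhook payload built from Datadog template variables such as `$EVENT_TITLE`, `$HOSTNAME`, and `$ALERT_TRANSITION`; the JSON key names, the transition-to-status mapping, and the `datadog_to_alert` function are hypothetical.

```python
# Assumption: the Datadog webhook sends a custom JSON payload such as
#   {"title": "$EVENT_TITLE", "host": "$HOSTNAME", "transition": "$ALERT_TRANSITION"}
# The key names ("title", "host", "transition") are hypothetical choices.

# Assumed mapping from Datadog transition strings to alert statuses.
TRANSITION_TO_STATUS = {
    "Triggered": "critical",
    "Re-Triggered": "critical",
    "Warn": "warning",
    "Re-Warn": "warning",
    "Recovered": "ok",
}


def datadog_to_alert(webhook_body):
    """Translate one Datadog-style webhook body into a flat alert dict."""
    transition = webhook_body.get("transition", "")
    # Any transition not in the table (e.g. "No Data") is treated as a warning.
    status = TRANSITION_TO_STATUS.get(transition, "warning")
    return {
        "host": webhook_body.get("host") or "unknown-host",
        "check": webhook_body.get("title", "untitled monitor"),
        "status": status,
    }


if __name__ == "__main__":
    print(datadog_to_alert({
        "title": "Disk full on /var",
        "host": "laptop-01",
        "transition": "Triggered",
    }))
```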
You can view and manage your integrations through the Integrations tab in the BigPanda dashboard, as shown below:
Automated alert correlation
Once you have your monitoring software reporting its alerts and notifications to BigPanda, simply sit back and let the BigPanda correlation engine do its work. You’ll be able to view your automatically correlated incidents through the BigPanda dashboard.
The BigPanda platform correlates alerts along a variety of parameters, including:
- Topology: the host, host group, service, application, cloud, or other infrastructure element that emits the alerts. Alerts are more likely to be related when they come from the same area in your infrastructure.
- Time: the rate at which related alerts occur. Alerts occurring around the same time are more likely to be related than alerts occurring far apart.
- Context: the check types of the alerts. Some alert types imply a relationship between them; others don't.
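BigPanda doesn't describe its correlation engine at this level of detail, so the following is only a toy Python illustration of how the three parameters above can combine. It assumes an alert joins an existing cluster when it comes from the same host (topology), belongs to the same check family (context), and arrives within a fixed window of that cluster's most recent alert (time). The `CHECK_FAMILY` table and the 300-second window are invented for the example.

```python
from dataclasses import dataclass, field

# Hypothetical grouping of check types that tend to be related (context).
CHECK_FAMILY = {
    "cpu_load": "compute",
    "memory": "compute",
    "disk_space": "storage",
    "disk_io": "storage",
    "http_5xx": "service",
    "latency": "service",
}


@dataclass
class Alert:
    ts: int      # seconds since epoch (time)
    host: str    # emitting element (topology)
    check: str   # check type (context)


@dataclass
class Cluster:
    host: str
    family: str
    last_ts: int
    alerts: list = field(default_factory=list)


def correlate(alerts, window=300):
    """Group alerts that share a host and check family and arrive within
    `window` seconds of the cluster's most recent alert."""
    clusters = []
    for a in sorted(alerts, key=lambda x: x.ts):
        # Checks missing from the table form their own one-member family.
        family = CHECK_FAMILY.get(a.check, a.check)
        match = None
        for c in clusters:
            if c.host == a.host and c.family == family and a.ts - c.last_ts <= window:
                match = c
                break
        if match is None:
            match = Cluster(a.host, family, a.ts)
            clusters.append(match)
        match.alerts.append(a)
        match.last_ts = a.ts
    return clusters


if __name__ == "__main__":
    sample = [
        Alert(0, "db-01", "cpu_load"),
        Alert(60, "db-01", "memory"),
        Alert(90, "web-01", "http_5xx"),
        Alert(4000, "db-01", "cpu_load"),
    ]
    print([len(c.alerts) for c in correlate(sample)])
```

Here the CPU and memory alerts on `db-01` merge into one cluster, while the web alert and the much later CPU alert each start their own.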
As you can see in the following screenshot, incoming incidents fall into five categories:
- Active: Incidents that are unsnoozed.
- Unhandled: Incidents that are active but not shared or snoozed.
- Shared: Active and shared with other users (for instance, through a HipChat, JIRA, or Slack integration).
- Snoozed: Incidents that were snoozed. When the snooze period elapses, the incident is moved to Active status.
- Resolved (24h): Incidents that were marked as resolved within the past 24 hours.
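Note that these views overlap: Unhandled and Shared are both subsets of Active. The Python sketch below encodes the definitions above as a small function. The `Incident` fields, the rule that a resolved incident appears only under Resolved (24h), and the rule that a snoozed incident appears only under Snoozed are my simplifying assumptions, not BigPanda's documented data model.

```python
from dataclasses import dataclass
from typing import Optional

DAY = 24 * 60 * 60  # seconds


@dataclass
class Incident:
    # Hypothetical fields; all times are epoch seconds.
    resolved_at: Optional[int] = None    # None while the incident is open
    snoozed_until: Optional[int] = None  # None if never snoozed
    shared: bool = False                 # shared via HipChat, JIRA, Slack, etc.


def views(incident: Incident, now: int) -> set:
    """Return the dashboard views an incident appears in at time `now`."""
    if incident.resolved_at is not None:
        # Resolved incidents stay visible for 24 hours, then drop off.
        return {"Resolved (24h)"} if now - incident.resolved_at <= DAY else set()
    # Once the snooze period elapses, the incident counts as Active again.
    if incident.snoozed_until is not None and now < incident.snoozed_until:
        return {"Snoozed"}
    # Active incidents are further split into Shared and Unhandled.
    return {"Active", "Shared" if incident.shared else "Unhandled"}


if __name__ == "__main__":
    print(sorted(views(Incident(), now=1_000)))
    print(sorted(views(Incident(shared=True), now=1_000)))
    print(sorted(views(Incident(snoozed_until=2_000), now=1_000)))
```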
It’s important to understand that BigPanda doesn’t simply summarize and triage your incoming alerts. It does that, yes – but an incident is actually a rollup of all correlated alert and notification messages.
This means that as an administrator, you can drill into the individual alerts related to each incident to see the full context. As you can see in the following screenshot, clicking an incident gives you various administrative options:
- A: Examine all the individual alerts that were rolled up into this incident.
- B: Leave comments on the incident.
- C: View the incident life cycle in summary format.
- D: Visualize the incident life cycle in a time chart view.
- E: Snooze the incident; this “freezes” a lower-priority incident for a predefined period.
- F: Share the incident via SMS, e-mail, or an integration like PagerDuty.
The BigPanda incident activity feed. Image credit: BigPanda.
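Because an incident is a rollup of its correlated alerts, its overall state can be derived from theirs. The Python sketch below assumes that an incident takes the most severe status among its alerts and counts as resolved only when every alert has returned to `ok`. That rule, the `SEVERITY` ordering, and the `RolledUpIncident` class are illustrative assumptions rather than BigPanda's published logic.

```python
from dataclasses import dataclass, field

# Assumed severity order; a higher number means more severe.
SEVERITY = {"ok": 0, "warning": 1, "critical": 2}


@dataclass
class RolledUpIncident:
    alerts: dict = field(default_factory=dict)  # alert id -> latest status

    def update(self, alert_id, status):
        """Record the latest status reported for one correlated alert."""
        self.alerts[alert_id] = status

    @property
    def status(self):
        """Roll up: the incident takes the most severe status among its alerts."""
        if not self.alerts:
            return "ok"
        return max(self.alerts.values(), key=lambda s: SEVERITY[s])

    @property
    def resolved(self):
        """Resolved once every rolled-up alert is back to ok (vacuously true if empty)."""
        return all(s == "ok" for s in self.alerts.values())


if __name__ == "__main__":
    inc = RolledUpIncident()
    inc.update("db-01/cpu", "critical")
    inc.update("db-01/memory", "warning")
    print(inc.status, inc.resolved)
    inc.update("db-01/cpu", "ok")
    print(inc.status, inc.resolved)
```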
BigPanda automatically creates tickets and notifications based on clustered IT alerts and keeps those tickets updated in real time. The AutoShare feature lets you define rules that instantly share incident notifications with recipients via SMS, e-mail, or an integration with tools like Slack, Jira, or ServiceNow. Two-way synchronization between BigPanda and your ticketing platform ensures that you always see the latest status updates and actions. Tickets are enriched with supporting information such as runbooks, metrics, configuration management database (CMDB) data, and related incidents. You can manage your AutoShare rules from the AutoShare page, as shown below:
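AutoShare rules are essentially condition-to-recipient mappings. As a minimal sketch of that idea, the Python below matches an incident's tags against a list of rules and returns every channel and recipient that should be notified. The `ShareRule` structure, tag names, channels, and recipients are hypothetical; real rules are defined on the AutoShare page, and this only models the matching logic.

```python
from dataclasses import dataclass


@dataclass
class ShareRule:
    """Hypothetical AutoShare-style rule: match incidents, then notify a target."""
    name: str
    conditions: dict   # tag -> required value; all must match
    channel: str       # e.g. "email", "sms", "slack"
    recipient: str


def matching_shares(incident_tags, rules):
    """Return (channel, recipient) pairs for every rule whose conditions all match."""
    shares = []
    for rule in rules:
        if all(incident_tags.get(k) == v for k, v in rule.conditions.items()):
            shares.append((rule.channel, rule.recipient))
    return shares


if __name__ == "__main__":
    rules = [
        ShareRule("critical DB", {"service": "database", "severity": "critical"},
                  "email", "dba-oncall@example.com"),
        ShareRule("all web", {"service": "web"}, "slack", "#web-ops"),
    ]
    tags = {"service": "database", "severity": "critical", "host": "db-01"}
    print(matching_shares(tags, rules))
```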
Okay. So, thus far, we’ve seen how BigPanda correlates incoming alerts from your monitoring tools and presents incidents in an easily actionable format. Cool. Now, let’s take a look at the Analytics feature, which gives administrators a more holistic view of incidents.
For example, IT shops that use frameworks such as the IT Infrastructure Library (ITIL) or another IT Service Management (ITSM) framework care about metrics. What are your top issues? What is your mean time to resolution?
The BigPanda analytics dashboard allows you to identify key trends, such as the top misbehaving applications, checks, or hosts, and to create custom views that drill down by criteria such as location, microservice, application, customer, or team. You can also track MTTR, which means you can measure performance by team, application, or geography. The reports are customizable and can be shared as a snapshot or hyperlink.
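MTTR itself is simple arithmetic: the total time from opening to resolution, divided by the number of resolved incidents, computed for each group you care about. Here is a minimal Python sketch; the incident dictionaries and their `opened`/`resolved`/`team` keys are hypothetical, and open incidents are skipped because they have no resolution time yet.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def mttr_by(incidents, key):
    """Mean time to resolution per group; unresolved incidents are skipped."""
    totals = defaultdict(timedelta)  # group -> summed resolution time
    counts = defaultdict(int)        # group -> number of resolved incidents
    for inc in incidents:
        if inc.get("resolved") is None:
            continue
        totals[inc[key]] += inc["resolved"] - inc["opened"]
        counts[inc[key]] += 1
    return {group: totals[group] / counts[group] for group in totals}


if __name__ == "__main__":
    t0 = datetime(2016, 5, 1, 9, 0)
    incidents = [
        {"team": "DBA", "opened": t0, "resolved": t0 + timedelta(minutes=30)},
        {"team": "DBA", "opened": t0, "resolved": t0 + timedelta(minutes=90)},
        {"team": "Web", "opened": t0, "resolved": t0 + timedelta(minutes=20)},
        {"team": "Web", "opened": t0, "resolved": None},  # still open
    ]
    for team, mttr in mttr_by(incidents, "team").items():
        print(team, mttr)
```

With these sample records, the DBA team averages one hour ((30 + 90) / 2 minutes) and the Web team 20 minutes. Grouping by an application or region key instead works the same way, provided the records carry that key.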
The analytics dashboard can also help you identify flapping. Flapping occurs when a monitored object repeatedly changes state, often because of a configuration problem. By default, a BigPanda incident is switched to a flapping state when one or more of its correlated alerts changes state more than four times in one hour.
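The default flapping rule above is concrete enough to express directly. This Python sketch keeps a sliding one-hour window of state-change timestamps for each alert and flags the incident if any alert records more than four changes inside that window. The function names and the input format (a dict mapping alert names to lists of epoch seconds) are assumptions for illustration.

```python
from collections import deque

FLAP_WINDOW = 3600  # seconds (one hour)
FLAP_LIMIT = 4      # more than this many changes inside the window => flapping


def is_flapping(change_times, window=FLAP_WINDOW, limit=FLAP_LIMIT):
    """True if more than `limit` state changes fall within any `window`-second span."""
    recent = deque()
    for t in sorted(change_times):
        recent.append(t)
        # Discard changes that are a full window or more older than this one.
        while t - recent[0] >= window:
            recent.popleft()
        if len(recent) > limit:
            return True
    return False


def incident_flapping(alert_changes):
    """An incident flaps when one or more of its correlated alerts is flapping."""
    return any(is_flapping(times) for times in alert_changes.values())


if __name__ == "__main__":
    changes = {
        "db-01/cpu": [0, 600, 1200, 1800, 2400],  # five changes in 40 minutes
        "db-01/memory": [0],
    }
    print(incident_flapping(changes))
```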
Like most SaaS tools, BigPanda is licensed either monthly or annually. You should check out their pricing page for details, but I’ll give you a nutshell summary of their three subscription plans, current as of May 2016:
- Standard: Two users; one connected system (integration); sharing via e-mail, HipChat, Slack, and PagerDuty; $449/month (billed annually).
- Pro: Four users; two connected systems; higher level of support; $829/month (billed annually).
- Enterprise: Get a custom quote to suit your organization’s needs; includes full reporting functionality.
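Since both published prices are billed annually, it can help to translate them into a yearly commitment and a per-user figure. A quick Python calculation using only the May 2016 numbers above (`plan_costs` is just a helper name for this example):

```python
# Prices as quoted above (May 2016): dollars per month, billed annually.
PLANS = {
    "Standard": {"monthly": 449, "users": 2},
    "Pro": {"monthly": 829, "users": 4},
}


def plan_costs(monthly, users):
    """Return (annual cost, cost per user per month) for one plan."""
    return monthly * 12, monthly / users


if __name__ == "__main__":
    for name, plan in PLANS.items():
        annual, per_user = plan_costs(plan["monthly"], plan["users"])
        print(f"{name}: ${annual:,}/year, ${per_user:.2f} per user per month")
```

That works out to $5,388 per year for Standard ($224.50 per user per month) and $9,948 per year for Pro ($207.25 per user per month). Per-user cost is only one lens, of course; the plans also differ in connected systems and support level.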
You can contact BigPanda directly at firstname.lastname@example.org or 1-888-256-1244.