By Timothy Warner
Don't get me wrong: I'm not putting down data! After all, without quantitative metrics, reports, and alerts, your business would stand a much lower likelihood of success in the medium to long term.
The advent of cloud computing makes our work as systems administrators slightly more complicated, because we have not only our on-premises servers to manage, but also platform-as-a-service (PaaS) web apps and/or Infrastructure-as-a-Service (IaaS) virtual machines running in the public cloud.
Nowadays, the majority of my fellow systems administrators have one or more of the following technologies as part of their daily work lives:
Clicking any of the above hyperlinks takes you not to the vendor's website, but to their corresponding Datadog integration page. Today I'd like to show you how you can use Datadog to report and alert on key metrics for your on-prem and cloud-based services. Let's begin!
Getting started with Datadog
Point your web browser to the Datadog website and sign up for the free 14-day trial. You don't need to provide Datadog with a credit card, and you can monitor up to 5 hosts during the evaluation period.
Initial Datadog configuration involves only one step: deploying the agent software to your managed hosts. As you can see in the following screenshot, Datadog supports every major server OS in use today, including Windows Server, Apple OS X, and Linux distros. Each agent has its own automated and/or manual installation process. My screenshot below shows the instructions for installing the Mac agent:
The Windows Server agent uses the common .msi file format, so you could conceivably deploy the agent by using Group Policy, Windows PowerShell Desired State Configuration, or another configuration management solution.
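If you go the scripted route, the silent install boils down to a single msiexec call. Here is a minimal sketch that only assembles the command line; it assumes the agent .msi accepts an `APIKEY` property (check the current agent documentation before relying on it), and the .msi path and function name are my own placeholders.

```python
from typing import List


def build_agent_install_command(msi_path: str, api_key: str) -> List[str]:
    """Return an msiexec argument list for a silent Datadog agent install."""
    if not api_key:
        raise ValueError("a Datadog API key is required")
    return [
        "msiexec",
        "/qn",                 # quiet mode, no user interface
        "/i", msi_path,        # install the given package
        f"APIKEY={api_key}",   # assumed MSI property name; verify in the agent docs
    ]


if __name__ == "__main__":
    # Hypothetical path and placeholder key, for illustration only.
    cmd = build_agent_install_command(r"C:\Temp\datadog-agent.msi", "<YOUR_API_KEY>")
    print(" ".join(cmd))
```

On a Windows host you could hand the resulting list to `subprocess.run(cmd, check=True)`, or paste the printed command into whatever configuration management tool you already use.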
One thing that all your agents have in common is your personal Datadog application programming interface (API) key. This unique value is what ties the Datadog agent to your Datadog user account in the cloud.
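Because every agent depends on that key, it can be worth sanity-checking it before you roll anything out. The sketch below uses Datadog's key validation endpoint (`/api/v1/validate`) with the key passed in the `DD-API-KEY` header; the function names are mine, and the site URL assumes the default US region.

```python
import json
import urllib.error
import urllib.request

# Assumes the default US site; other Datadog regions use a different hostname.
VALIDATE_URL = "https://api.datadoghq.com/api/v1/validate"


def build_validate_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) the GET request that checks an API key."""
    return urllib.request.Request(
        VALIDATE_URL,
        headers={"DD-API-KEY": api_key},
        method="GET",
    )


def api_key_is_valid(api_key: str, timeout: float = 10.0) -> bool:
    """Send the request; treat any HTTP error status as an invalid key."""
    try:
        with urllib.request.urlopen(build_validate_request(api_key), timeout=timeout) as resp:
            return bool(json.load(resp).get("valid"))
    except urllib.error.HTTPError:
        return False
```

Calling `api_key_is_valid("<YOUR_API_KEY>")` makes a real network request, so run it from a machine that is allowed to reach Datadog.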
You'll find that installed agents begin reporting to Datadog almost immediately.
I was pleased to see that Datadog published their agent directly to the Microsoft Azure extension gallery. Check out the following screenshot that shows me integrating Datadog into a new cloud VM I deployed:
The Datadog agent user interface varies somewhat, depending on your host OS platform. Let me show you the Windows version; it's pretty "bread and butter" in my humble opinion:
According to the (open source!) Datadog Agent documentation, all traffic is initiated by the agent—never by Datadog—and it runs over SSL on the traditional TCP port 443.
Working with dashboards
The Datadog Web console consists of the following global navigation sections:
- Events: Works similarly to your Facebook "wall" or LinkedIn newsfeed. View events raised by your managed hosts and optionally communicate with your teammates directly in Datadog.
- Dashboards: TimeBoards are "rollups" of key metrics and events, formatted in a way that facilitates correlation and troubleshooting. ScreenBoards also provide at-a-glance data, but the display is optimized for large system status displays in your network operations center (NOC).
- Infrastructure: View and check the status of your managed hosts in list or map form.
- Monitors: Set metric thresholds and receive alerts when your hosts exceed them.
- Metrics: Dive into the specific performance counters available for your managed hosts and integrations.
- Integrations: Learn how to obtain Datadog metrics from your operating systems, line-of-business (LOB) applications, and services.
The integrations are particularly cool, if for no other reason than the fact that there are so dadgum many of them! Datadog exposes their REST/JSON API to the community, so you and your developers can pull metrics from your apps that aren't currently included in a built-in integration.
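To give you a feel for the API, here is a rough sketch of submitting a custom gauge metric to the v1 series endpoint. The metric name `myapp.queue.depth` and the tag are made up for illustration, the helper names are mine, and the code only builds the request rather than sending it.

```python
import json
import time
import urllib.request
from typing import List, Optional

# Assumes the default US site.
SERIES_URL = "https://api.datadoghq.com/api/v1/series"


def build_gauge_payload(metric: str, value: float, host: str,
                        tags: Optional[List[str]] = None,
                        timestamp: Optional[int] = None) -> dict:
    """Return the JSON body for one gauge data point."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return {
        "series": [{
            "metric": metric,
            "points": [[ts, float(value)]],  # list of [POSIX timestamp, value] pairs
            "type": "gauge",
            "host": host,
            "tags": list(tags or []),
        }]
    }


def build_series_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request carrying the payload."""
    return urllib.request.Request(
        SERIES_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "DD-API-KEY": api_key},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical metric and tag; sqlbox704 is the test host from this article.
    body = build_gauge_payload("myapp.queue.depth", 42, "sqlbox704", ["env:test"])
    print(json.dumps(body, indent=2))
```

To actually submit the point, pass the request to `urllib.request.urlopen`. Your developers would more likely use one of Datadog's client libraries, but the raw payload shows how little is involved.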
The following screenshot shows the instruction page for adding support for Microsoft Azure-based virtual machines. As of April 2016, the Azure integration is in beta status, and most metrics are available only for VMs deployed with Azure's Service Management (ASM) model instead of the current Azure Resource Manager (ARM) method.
You'll have to perform a few light configuration tasks to integrate services with Datadog.
Navigate to Infrastructure > Infrastructure List to see your hosts and access a quick update with regard to their status. As you can see below, my test environment consists of the following servers and services:
- Azure-based Windows Server 2012 R2 host running SQL Server 2014
- On-prem Apple OS X Server host
- Azure-based Windows Server 2012 R2 host running IIS and SQL Server 2014
Of course, these dashboards are fully interactive, so clicking any of the app links opens a dedicated sub-dashboard. Below, let me show you the SQL Server integration metrics on my sqlbox704 host:
When monitoring a database, you'd expect to see metrics concerning elements such as locks, waits, and memory buffers, and that's exactly what Datadog includes in its SQL Server integration.
Datadog allows you to add taxonomic tags to your assets as well; this feature increases discoverability.
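Datadog tags are plain strings, usually written as `key:value`. The toy sketch below shows why that helps discoverability: once hosts carry consistent tags, finding "all my SQL boxes" is a simple filter. The inventory is invented for illustration (only sqlbox704 comes from my test environment), and this is local Python, not a Datadog API call.

```python
from typing import Dict, List, Tuple


def parse_tag(tag: str) -> Tuple[str, str]:
    """Split a 'key:value' tag; a bare tag gets an empty value."""
    key, sep, value = tag.partition(":")
    return (key, value) if sep else (tag, "")


def hosts_with_tag(inventory: Dict[str, List[str]], wanted: str) -> List[str]:
    """Return the sorted names of hosts that carry the exact tag."""
    return sorted(host for host, tags in inventory.items() if wanted in tags)


if __name__ == "__main__":
    # Hypothetical inventory; host names other than sqlbox704 are placeholders.
    inventory = {
        "sqlbox704": ["role:sql", "location:azure"],
        "macserver": ["role:file", "location:onprem"],
        "webbox": ["role:web", "role:sql", "location:azure"],
    }
    print(hosts_with_tag(inventory, "role:sql"))
    print(parse_tag("location:azure"))
```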
Although customizing and creating new dashboards is a topic beyond our scope today, I do want to tell you about overlaying events for correlation purposes. For example, it might be nice to correlate, say, high CPU consumption with a particular service or series of Web requests.
As shown in the following screenshot, you (1) clone your target host's existing dashboard, which gives you the ability to overlay events; (2) write a search query to find your target event (the syntax can be a bit wonky; see the documentation); (3) customize and review any on-screen metric graphs; and finally, (4) profit!
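Overlays are only as useful as the events you have to overlay. One option is to post your own events, such as a deployment marker, through the v1 events endpoint so they show up in the event stream. The sketch below only builds the JSON body; the title, text, and tag are examples I made up, and the function name is mine.

```python
import json
import time
from typing import List, Optional

# Assumes the default US site; POST the body here with your API key.
EVENTS_URL = "https://api.datadoghq.com/api/v1/events"


def build_event(title: str, text: str, host: str,
                tags: Optional[List[str]] = None,
                happened_at: Optional[int] = None) -> dict:
    """Return the JSON body for an informational event."""
    return {
        "title": title,
        "text": text,
        "host": host,
        "tags": list(tags or []),
        "alert_type": "info",  # other accepted values include error, warning, success
        "date_happened": int(time.time() if happened_at is None else happened_at),
    }


if __name__ == "__main__":
    # Hypothetical event content, for illustration only.
    event = build_event("Index rebuild started", "Nightly maintenance job",
                        "sqlbox704", ["maintenance"])
    print(json.dumps(event, indent=2))
```

With markers like this in place, a CPU spike that lines up with a maintenance job becomes obvious at a glance.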
Setting alerts
To create an alert trigger, navigate to Monitors > New Monitor and complete the form. Here are the four actions you'll need to complete:
- Define the metric: In my screenshot, I wrote a monitor that alerts on high CPU utilization.
- Set alert conditions: Specify your warning and alert threshold values.
- Say what's happening: Write an informative event message using Markdown.
- Notify your team: You can use e-mail notification or notification within your Datadog portal, or integrate messaging with third-party services like Slack and HipChat.
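The same four steps map neatly onto the JSON you would send to the monitor API (`POST /api/v1/monitor`, which needs both an API key and an application key). What follows is a sketch of a CPU monitor definition, not an export of the monitor in my screenshot: the five-minute window, the `system.cpu.user` metric, and the notification handles are assumptions for illustration.

```python
import json
from typing import List


def build_cpu_monitor(host: str, warning: int, critical: int,
                      notify: List[str]) -> dict:
    """Return a metric-alert monitor definition for high CPU on one host."""
    if warning >= critical:
        raise ValueError("warning threshold must be below the critical threshold")
    # 1. Define the metric (the query threshold must match the critical value).
    query = f"avg(last_5m):avg:system.cpu.user{{host:{host}}} > {critical}"
    # 3. Say what's happening (Markdown), and 4. notify your team (@-handles).
    message = f"**High CPU** on `{host}`. Please investigate. " + " ".join(notify)
    return {
        "name": f"High CPU on {host}",
        "type": "metric alert",
        "query": query,
        "message": message,
        # 2. Set alert conditions.
        "options": {"thresholds": {"warning": warning, "critical": critical}},
    }


if __name__ == "__main__":
    # Hypothetical e-mail and Slack handles.
    monitor = build_cpu_monitor("sqlbox704", 80, 90, ["@ops@example.com", "@slack-ops"])
    print(json.dumps(monitor, indent=2))
```

Keeping monitor definitions in code like this also makes them easy to review and version alongside the rest of your infrastructure scripts.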
Finishing up
Datadog is a software-as-a-service (SaaS), or technically, an infrastructure monitoring-as-a-service (IMaaS) solution. As a result, you subscribe to Datadog on either a yearly or month-to-month basis. Here's the CliffsNotes version from their website:
- Free tier: Track up to 5 hosts; 1-day data retention; no company support
- Pro tier: Track up to 500 hosts; 13-month data retention; e-mail company support; $15/host/month (prepaid yearly)
- Enterprise tier: Track over 500 hosts; customized data retention; e-mail and telephone company support; contact Datadog for a price quote
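To put numbers on those tiers, here is a quick back-of-the-envelope calculator. It uses only the figures listed above ($15 per host per month on the prepaid yearly Pro plan, with tier boundaries at 5 and 500 hosts) and picks a tier by host count alone, ignoring retention and support needs; the function names are mine.

```python
FREE_MAX_HOSTS = 5
PRO_MAX_HOSTS = 500
PRO_PRICE_PER_HOST_MONTH = 15  # US dollars, prepaid yearly


def pick_tier(hosts: int) -> str:
    """Return the smallest tier that covers the given number of hosts."""
    if hosts < 0:
        raise ValueError("host count cannot be negative")
    if hosts <= FREE_MAX_HOSTS:
        return "Free"
    if hosts <= PRO_MAX_HOSTS:
        return "Pro"
    return "Enterprise"


def pro_annual_cost(hosts: int) -> int:
    """Return the yearly Pro cost in dollars for the given host count."""
    return hosts * PRO_PRICE_PER_HOST_MONTH * 12


if __name__ == "__main__":
    for count in (3, 20, 750):
        print(count, pick_tier(count))
    print(pro_annual_cost(20))  # 20 hosts * $15 * 12 months = 3600
```

Bear in mind that the Free tier keeps only one day of data, so you may want Pro even for a handful of hosts.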
Datadog isn't the least expensive monitoring software on the market, but if deep data analytics, wide integrations, and an open API appeal to you, then you may have just discovered the perfect remote monitoring and management (RMM) solution.