In my previous post, I covered Amazon’s content distribution. Today, I will discuss the analytics services Elastic MapReduce (EMR), Redshift, and Kinesis, which businesses use to extract important information from huge data sets.

Michael Pietroforte

Michael Pietroforte is the founder and editor of 4sysops. He is a Microsoft Most Valuable Professional (MVP) with more than 30 years of experience in IT management and system administration.

Elastic MapReduce (EMR) ^

Elastic MapReduce (EMR) is a service based on the open source framework Apache Hadoop that allows you to process large data sets of all kinds in Amazon’s cloud. The Amazon EMR website lists a few examples: log analysis, web indexing, data warehousing, machine learning, financial analysis, scientific simulation, and bioinformatics.

Elastic MapReduce

The Getting Started guide demonstrates how to import data from Twitter and analyze how often the word “Kindle” was used in a positive or negative way in tweets. (Not surprisingly, the vast majority of tweets in the sample were positive: 479 to 13.)

Amazon EMR stores the data on S3 (in petabytes, if necessary), and the data analysis runs on clusters of EC2 instances. You can manage the service through the AWS Management Console. Developers can write EMR programs in a variety of popular programming languages, for example Java or, via Hadoop Streaming, scripting languages such as Python and Ruby.
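To give an idea of what such a job looks like, here is a minimal sketch of the tweet example in the MapReduce style, written in plain Python. It runs locally and does not use Hadoop or any EMR API. The word lists, the function names, and the sample tweets are my own assumptions for illustration, not the code from the Getting Started guide.

```python
from collections import defaultdict

# Hypothetical word lists; a real job would use a proper sentiment lexicon.
POSITIVE = {"love", "great", "awesome"}
NEGATIVE = {"hate", "broken", "terrible"}


def map_tweet(tweet):
    """Map step: emit one (sentiment, 1) pair for a tweet that mentions 'kindle'."""
    # Naive whitespace tokenization; punctuation is not handled.
    words = set(tweet.lower().split())
    if "kindle" not in words:
        return []
    if words & POSITIVE:
        return [("positive", 1)]
    if words & NEGATIVE:
        return [("negative", 1)]
    return [("neutral", 1)]


def reduce_counts(pairs):
    """Reduce step: sum the counts for each sentiment key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def run_job(tweets):
    """Run the map step over all tweets, then reduce the emitted pairs."""
    pairs = [pair for tweet in tweets for pair in map_tweet(tweet)]
    return reduce_counts(pairs)


# Made-up sample input, not real Twitter data.
SAMPLE_TWEETS = [
    "I love my new Kindle",
    "My Kindle is broken again",
    "Great battery life on the Kindle",
    "Reading on the train today",
]

if __name__ == "__main__":
    print(run_job(SAMPLE_TWEETS))
```

On a real cluster, the map and reduce steps would run in parallel on many nodes, which is what makes the approach scale to very large inputs.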

Redshift ^

Redshift is Amazon’s data warehouse service that allows you to analyze vast amounts of structured data. The main difference from EMR is that it supports SQL, so you can keep using your existing business intelligence (BI) tools. It is also possible to process unstructured data in EMR first and then load the results into Redshift for further analysis with your BI tools. Redshift is also the better choice for long-term data storage.
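The kind of question a BI tool sends to a data warehouse is usually a plain SQL aggregate. The sketch below shows one such query. Note that it uses Python’s built-in SQLite module as a local stand-in, not Redshift itself; the table name, the columns, and the sample rows are assumptions made up for illustration.

```python
import sqlite3


def top_pages(rows, limit=2):
    """Load (page, views) rows into an in-memory table and return the top pages."""
    conn = sqlite3.connect(":memory:")  # SQLite stand-in, not a Redshift cluster
    conn.execute("CREATE TABLE page_views (page TEXT, views INTEGER)")
    conn.executemany("INSERT INTO page_views VALUES (?, ?)", rows)
    query = """
        SELECT page, SUM(views) AS total_views
        FROM page_views
        GROUP BY page
        ORDER BY total_views DESC
        LIMIT ?
    """
    result = conn.execute(query, (limit,)).fetchall()
    conn.close()
    return result


# Made-up sample data.
SAMPLE_ROWS = [("/home", 120), ("/pricing", 45), ("/home", 80), ("/docs", 60)]

if __name__ == "__main__":
    print(top_pages(SAMPLE_ROWS))
```

Against a real warehouse, only the connection would change; the point is that the analysis itself is ordinary SQL.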

Redshift

Data warehousing is an old discipline. What’s new is the amount of data that has to be processed. Conventional data warehouse solutions allow you to analyze data on multiple machines, but only the cloud has the resources for really big data. However, Redshift is also interesting for small organizations that can’t afford expensive, on-premises data warehouse solutions.

Probably the biggest advantage is that you don’t have to manage the cluster yourself and can focus on data analysis. In addition, you get the typical pay-as-you-go pricing and can leverage the elasticity of the cloud as your data grows.

Kinesis ^

Amazon just launched Kinesis, another data analytics service, in November of 2013. The main difference from EMR and Redshift is that Kinesis is for real-time processing.

Kinesis

You can pull large data streams into Kinesis, analyze the data, and store it in S3 or DynamoDB. It is also possible to forward the data to EMR or Redshift for further analysis. However, the main point of Kinesis is that you can react in real time to certain events that are hidden in large amounts of data.

A typical example is real-time log file analysis of hundreds or thousands of servers to filter and emit important data to a dashboard or to trigger alerts. Another example is clickstream analysis of large websites to dynamically change advertising strategies.
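The log analysis example boils down to watching a stream of records and raising an alert when a pattern shows up within a short time window. Here is a minimal sketch of that idea in plain Python. It does not call the Kinesis API; the record format (timestamp in seconds, server name, log level), the function name, and the thresholds are assumptions for illustration, and the records are assumed to arrive in time order.

```python
from collections import deque


def detect_error_bursts(records, window_seconds=60, threshold=3):
    """Yield (timestamp, server, error_count) when a server reaches
    `threshold` errors within the last `window_seconds` seconds."""
    recent = {}  # server name -> deque of recent error timestamps
    for timestamp, server, level in records:
        if level != "ERROR":
            continue
        window = recent.setdefault(server, deque())
        window.append(timestamp)
        # Drop errors that have fallen out of the time window.
        while window and timestamp - window[0] > window_seconds:
            window.popleft()
        if len(window) >= threshold:
            yield (timestamp, server, len(window))


# Made-up sample stream: (seconds, server, level).
SAMPLE_RECORDS = [
    (0, "web-1", "ERROR"),
    (10, "web-2", "INFO"),
    (20, "web-1", "ERROR"),
    (30, "web-2", "ERROR"),
    (45, "web-1", "ERROR"),
    (200, "web-1", "ERROR"),
]

if __name__ == "__main__":
    for alert in detect_error_bursts(SAMPLE_RECORDS):
        print("ALERT:", alert)
```

Because the function is a generator, it handles each record as it arrives instead of waiting for the whole data set, which is the essential difference from a batch job.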

This post concludes my AWS series. Please let me know how you liked it. I hope you now have a basic understanding of Amazon’s cloud. Rest assured that this won’t be the last time I blog about this fascinating new world for IT pros.

© 4sysops 2006 - 2018