Database services are an essential part of many applications. Not surprisingly, Amazon offers a rich set of powerful, cloud-based database services. Whereas some of these services have counterparts in conventional on-premises IT, others have been designed specifically for the cloud.
Relational Database Service (RDS)
Amazon’s Relational Database Service (RDS) allows you to run MySQL, Microsoft SQL Server, and Oracle in the cloud. RDS runs on special EC2 instance types and supports multi-instance deployments with master/slave arrangements. Perhaps the biggest advantage over on-premises deployments of relational database systems is that you can seamlessly increase the amount of storage. RDS uses EBS volumes for the database and the log storage and creates automated backups on S3 of database snapshots and transaction logs for point-in-time recovery.
Amazon RDS - SQL Server
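To make the storage point more concrete, here is a minimal sketch of how a storage increase could be requested through Boto3. The instance name `example-db` and the sizes are made up for illustration, and the helper functions are my own, not part of the SDK. The only SDK call is `modify_db_instance`, and it sits in a separate function because it needs Boto3 and valid AWS credentials.

```python
def build_storage_increase_request(instance_id, current_gb, new_gb):
    """Return keyword arguments for an RDS storage increase (hypothetical helper)."""
    # Allocated storage of an RDS instance can grow but not shrink.
    if new_gb <= current_gb:
        raise ValueError("new size must be larger than the current size")
    return {
        "DBInstanceIdentifier": instance_id,
        "AllocatedStorage": new_gb,  # size in GiB
        "ApplyImmediately": True,    # do not wait for the maintenance window
    }


def apply_storage_increase(request):
    """Send the request to RDS. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above works without the SDK

    return boto3.client("rds").modify_db_instance(**request)


# "example-db" and the sizes are assumptions for this sketch.
request = build_storage_increase_request("example-db", current_gb=100, new_gb=200)
print(request)
```

Keeping the request builder separate from the API call makes it easy to review or log the change before anything is sent to AWS.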
DynamoDB is Amazon’s NoSQL database service. NoSQL database systems find use in real-time web applications and for very large and complex data sets. As a key-value store system, DynamoDB is schema-less in the sense that the data items in a table don’t require the same attributes, which makes it very flexible if a new type of data has to be stored in the database. Data records are accessible through a primary key, and each record can have different fields.
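The schema-less idea is easier to see with two items that share a primary key attribute but otherwise carry different fields. The sketch below converts plain Python dictionaries into the typed attribute format that the low-level DynamoDB API expects (`S` for strings, `N` for numbers, `BOOL` for booleans). The table name `catalog`, the attribute names, and both helper functions are assumptions for illustration; only `put_item` is an SDK call.

```python
def to_attribute_value(value):
    """Wrap a Python value in DynamoDB's typed attribute format."""
    if isinstance(value, bool):  # check bool first, since bool is a subclass of int
        return {"BOOL": value}
    if isinstance(value, (int, float)):
        return {"N": str(value)}  # numbers are sent as strings
    if isinstance(value, str):
        return {"S": value}
    raise TypeError(f"unsupported type in this sketch: {type(value).__name__}")


def to_item(record):
    """Convert a flat dict into a DynamoDB item (hypothetical helper)."""
    return {name: to_attribute_value(value) for name, value in record.items()}


def store(item, table_name="catalog"):
    """Write the item. Requires boto3, credentials, and an existing table."""
    import boto3  # imported lazily so the conversion works without the SDK

    return boto3.client("dynamodb").put_item(TableName=table_name, Item=item)


# Two items in the same table with different attributes; only "id" is shared.
book = to_item({"id": "item-1", "title": "Cloud Basics", "pages": 320})
video = to_item({"id": "item-2", "title": "Intro to AWS", "minutes": 45, "hd": True})
print(sorted(book), sorted(video))
```

Neither item had to be declared in advance; apart from the key, the table imposes no column list.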
Another major advantage of DynamoDB over an RDBMS is that it can span hundreds or thousands of servers and is therefore perfectly suitable for the cloud. It is even possible to run a DynamoDB store on multiple Availability Zones (data centers). Thus, if one of Amazon’s data centers goes down, your database system is still online. Furthermore, performance levels can be changed dynamically, the amount of storage can be increased and decreased easily, the storage is highly redundant, and a DynamoDB table can be resized without downtime.
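Changing the performance level of a table boils down to one API call that sets new read and write capacity units. Here is a rough sketch with Boto3, assuming a table with provisioned throughput; the table name `catalog`, the capacity values, and the helper functions are hypothetical.

```python
def build_throughput_update(table_name, read_units, write_units):
    """Return keyword arguments for a throughput change (hypothetical helper)."""
    if read_units < 1 or write_units < 1:
        raise ValueError("capacity units must be at least 1")
    return {
        "TableName": table_name,
        "ProvisionedThroughput": {
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        },
    }


def apply_throughput_update(request):
    """Send the change to DynamoDB. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without the SDK

    return boto3.client("dynamodb").update_table(**request)


# Raise capacity ahead of an expected traffic peak; the values are assumptions.
request = build_throughput_update("catalog", read_units=50, write_units=20)
print(request)
```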
The main downside of a NoSQL database is that most existing applications are written for relational database systems and cannot use it without changes. Thus, DynamoDB is mostly used to build new applications for the cloud. Amazon provides SDKs for Java, .NET, Python, and PHP and offers a REST/SOAP API.
Like DynamoDB, SimpleDB is a scalable, non-relational data store. Each SimpleDB domain only supports 10GB of data, and you can spread data across multiple domains.
SimpleDB was introduced in 2007 and is still in beta. Some people believe it is only for educational purposes and, since it has disappeared recently from the AWS product page, it probably has no future.
ElastiCache is a Redis and Memcached protocol-compliant service that allows you to cache the results of database queries in RAM to improve the performance of your application. Memcached is a popular open-source memory caching system that is used by many large websites, such as Twitter and Wikipedia. Redis support in ElastiCache is relatively new.
You can launch a Cache Cluster in the AWS Management Console, where you have to configure the node type and the number of nodes. The node type determines the amount of memory on each node; with the number of nodes, you set the total amount of memory your cache will have. Once your Cache Cluster is running, you can access the cache through a so-called endpoint identifier (an Internet domain name).
ElastiCache - Launch Cache Cluster
Developers can integrate caching in their applications, and many content management systems support Memcached. For instance, you can use the W3 Total Cache plugin for WordPress to use ElastiCache to improve the performance of your website.
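If you integrate caching yourself, the usual approach is to ask the cache first and query the database only on a miss. The sketch below shows that pattern with a small in-memory stand-in instead of a real Memcached client, so it runs without a Cache Cluster. The class, the function names, and the key `users:all` are all made up for this example; in a real application, a Memcached client library pointed at the ElastiCache endpoint would take the place of `InMemoryCache`.

```python
class InMemoryCache:
    """Stand-in for a Memcached client; offers only get and set."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)  # None signals a cache miss

    def set(self, key, value):
        self._data[key] = value


def cached_query(cache, key, run_query):
    """Return the cached result, running the query only on a miss."""
    result = cache.get(key)
    if result is None:
        result = run_query()
        cache.set(key, result)
    return result


calls = []


def slow_query():
    """Pretend database query that records how often it runs."""
    calls.append(1)
    return ["alice", "bob"]


cache = InMemoryCache()
first = cached_query(cache, "users:all", slow_query)
second = cached_query(cache, "users:all", slow_query)
print(len(calls))  # prints 1: the second call was served from the cache
```

A production version would also set an expiry time on each entry so that stale results eventually drop out of the cache.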
As with many cloud services, the advantage of ElastiCache is that you don’t have to install and manage Memcached yourself and, when your application grows, you can easily add memory with a few clicks. Other nice features of ElastiCache are that failed cache nodes are automatically replaced, and you can use CloudWatch to monitor performance metrics.
Data Pipeline is one of those innovative Amazon services that are hard to map onto conventional on-premises IT tools. Data Pipeline allows you to perform various activities according to schedules in order to transfer data between so-called data nodes, which can be AWS services and on-premises data sources. For instance, with CopyActivity, you can copy data from an on-premises MySQL database to RDS or export a DynamoDB database to S3.
AWS Data Pipeline
A Pipeline definition can include preconditions (for instance, the existence of an input file) that are required before the processing starts. Once the Pipeline completes, you receive a message through the Amazon Simple Notification Service (SNS), which I will discuss in a later post. You can manage Data Pipelines through the AWS Management Console, an API, or the Command Line Interface.
Data Pipeline CopyActivity
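To give an idea of what a Pipeline definition looks like, the following sketch builds one as a Python dictionary and prints it as JSON. It wires a schedule, a precondition that waits for an input file, two data nodes, a CopyActivity, and an SNS notification together through `ref` links. For simplicity, it copies between two S3 locations rather than from MySQL to RDS. The object types and field names reflect my understanding of the definition file syntax; the bucket paths, the topic ARN, and all object IDs are invented; and required pieces such as IAM roles and the compute resource are left out, so treat it as an outline rather than a deployable definition.

```python
import json


def build_copy_pipeline(input_path, output_path, topic_arn):
    """Build a simplified pipeline definition (hypothetical helper)."""
    schedule = {"ref": "DailySchedule"}
    return {
        "objects": [
            {"id": "DailySchedule", "type": "Schedule",
             "period": "1 day", "startDateTime": "2022-01-01T00:00:00"},
            # Precondition: processing starts only if the input file exists.
            {"id": "InputReady", "type": "S3KeyExists", "s3Key": input_path},
            {"id": "InputData", "type": "S3DataNode", "filePath": input_path,
             "schedule": schedule, "precondition": {"ref": "InputReady"}},
            {"id": "OutputData", "type": "S3DataNode", "filePath": output_path,
             "schedule": schedule},
            {"id": "NotifyDone", "type": "SnsAlarm", "topicArn": topic_arn,
             "subject": "Copy finished", "message": "The copy completed."},
            {"id": "CopyData", "type": "CopyActivity",
             "input": {"ref": "InputData"}, "output": {"ref": "OutputData"},
             "schedule": schedule, "onSuccess": {"ref": "NotifyDone"}},
        ]
    }


def unresolved_refs(definition):
    """Return every referenced ID that no object in the definition defines."""
    known = {obj["id"] for obj in definition["objects"]}
    missing = set()
    for obj in definition["objects"]:
        for value in obj.values():
            if isinstance(value, dict) and "ref" in value and value["ref"] not in known:
                missing.add(value["ref"])
    return sorted(missing)


# The paths and the topic ARN below are placeholders for this sketch.
definition = build_copy_pipeline(
    "s3://example-bucket/in/data.csv",
    "s3://example-bucket/out/data.csv",
    "arn:aws:sns:us-east-1:123456789012:example-topic",
)
print(json.dumps(definition, indent=2))
print(unresolved_refs(definition))
```

The small `unresolved_refs` check is handy because a mistyped `ref` is an easy mistake to make when editing definitions by hand.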
In the next part of my AWS series, I will write about the tools that Amazon provides to manage large-scale cloud deployments: Auto Scaling, Elastic Load Balancing (ELB), CloudFormation, OpsWorks, and Identity and Access Management (IAM).