Managing Volumes of Information with Elastic Block Storage (EBS)
Elastic Block Storage (EBS) is volume-based storage that isn’t associated with any particular instance; rather, it’s attached to instances to provide additional storage. A different way to say this is that an EBS volume is independent and has a life span separate from EC2 instances. It can be attached to any instance to provide storage for that instance, but is detached from the instance when the instance terminates. (If you’ve ever worked with SAN storage, you’re familiar with the concept. If you haven’t worked with SAN storage, don’t worry: EBS is simple to understand.) In any case, you’ll almost certainly work with EBS, because it’s extremely useful and it addresses some significant limitations in AWS.
The network-based EBS storage service is delivered in volumes, which can be attached to an EC2 instance and used just like a disk drive. Because a volume is delivered unformatted, it must have a file system installed (formatted) on it before it can be used. For example, if you want to use an EBS volume on a Linux machine, you attach the volume, format it in one of the many Linux file system formats, and then mount it to the instance file system, which allows the operating system to access the EBS volume and to read and write to the volume.
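To make the format-and-mount step concrete, here is a minimal sketch that builds the Linux commands involved. The device name /dev/xvdf, the mount point /data, and the ext4 file system are assumptions chosen for illustration; your instance may expose the volume under a different name. The commands are only printed, not executed, because formatting wipes whatever is on the device.

```python
import shlex


def format_and_mount_commands(device, mount_point, fs_type="ext4"):
    """Return the shell commands that format a new volume and mount it.

    The commands are only built here, not executed, because mkfs
    destroys any data already on the device.
    """
    return [
        ["mkfs", "-t", fs_type, device],   # install a file system
        ["mkdir", "-p", mount_point],      # make sure the mount point exists
        ["mount", device, mount_point],    # expose the volume to the OS
    ]


# Hypothetical device and mount point, for illustration only.
commands = format_and_mount_commands("/dev/xvdf", "/data")
for cmd in commands:
    print(shlex.join(cmd))
```

You format a volume only once, the first time you use it. On later attachments you skip the mkfs step and go straight to mounting.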
Because an EBS volume is network-based, it can be longer-lived than any specific instance. Consequently, an EBS volume offers persistent storage that’s safe from being lost when an instance is terminated or crashes. The most common (though certainly not only) EBS use case is the file system for a database server.
The database storage is placed on the EBS volume, which must be attached to an instance that is running the database software so that the software can read and write to the EBS-based database storage. This process is a bit more complicated than using the instance’s own storage, but it has a great virtue: By using EBS, the application owner can ensure that data isn’t subject to loss caused by instance interruption. Even if the instance crashes, the EBS volume is safe from data loss. A new instance can be started, the EBS volume can be attached to it, and the instance can begin database operations again.
The size of an EBS volume can be configured by the user and can range from 1GB to 1TB. Volumes are associated with accounts and limited by default to 20 per account. What if your very large database needs more than 1TB of storage? You can attach multiple EBS volumes to the instance and stripe your file system across the volumes. (Stripe here refers to placing portions of a file system onto multiple volumes to increase overall read and write speed, increasing performance because all the reads and writes are spread across multiple hard drives.)
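The striping arithmetic can be sketched in a few lines. This example assumes the 1TB per-volume ceiling quoted above and a simple round-robin (RAID 0 style) layout; the function names are made up for illustration, and a real setup would rely on the operating system's volume manager rather than code like this.

```python
import math

MAX_VOLUME_GB = 1024  # the 1TB per-volume ceiling quoted above


def volumes_needed(total_gb, max_volume_gb=MAX_VOLUME_GB):
    """Smallest number of volumes whose combined size covers total_gb."""
    return math.ceil(total_gb / max_volume_gb)


def locate_chunk(chunk_index, volume_count):
    """Map a logical chunk to (volume number, chunk offset on that volume).

    Chunks are dealt out round-robin, RAID 0 style, so consecutive
    chunks land on different volumes and can be read in parallel.
    """
    return chunk_index % volume_count, chunk_index // volume_count


count = volumes_needed(2500)  # a 2.5TB database
layout = [locate_chunk(i, count) for i in range(6)]
print(count, layout)
```

Because consecutive chunks sit on different volumes, a large sequential read touches all the volumes at once, which is where the speed gain comes from.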
EBS reliability
EBS can make your applications more reliable, because the storage is separate from any specific instance. No matter what happens to an instance, your data stays nice and safe. That said, EBS itself has not been immune to outages. In any new and different product, failure inevitably occurs. If you’re concerned about outages, compare AWS reliability with that of your own data center. This comparison usually helps put AWS outages into perspective and portrays them as less alarming.
EBS scope
AWS as a whole is organized into regions, each of which contains one or more availability zones (AZs). With EBS, volumes reside in a single AZ within a particular region. When you create an EBS volume, you choose the one AZ, within a given region, in which to locate it. Any EC2 instance that needs to mount and use this EBS volume must be located within the same AZ.
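One way to respect the same-AZ rule is to look up where the instance lives before creating the volume. The sketch below assumes you pass in a boto3 EC2 client (for example, boto3.client("ec2")); the function name is hypothetical, and error handling is left out to keep the idea visible.

```python
def create_volume_near_instance(ec2, instance_id, size_gb):
    """Create a volume in the same AZ as the instance that will use it.

    `ec2` is expected to be a boto3 EC2 client, e.g. boto3.client("ec2").
    Returns the new volume ID and the AZ it was created in.
    """
    reply = ec2.describe_instances(InstanceIds=[instance_id])
    instance = reply["Reservations"][0]["Instances"][0]
    zone = instance["Placement"]["AvailabilityZone"]
    # Size is given in GB; the volume lands in the instance's own AZ.
    volume = ec2.create_volume(AvailabilityZone=zone, Size=size_gb)
    return volume["VolumeId"], zone
```

Reading the AZ from the instance, rather than typing it in, removes one easy way to end up with a volume you can't attach.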
EBS use
To use EBS, you simply create the volume with the help of the AWS API or (more likely) by using either the AWS Management Console or a third-party tool. As mentioned earlier in this chapter, before you can begin using the volume, you must attach it to an appropriate operating system device on a running EC2 instance and then format it with a file system that’s appropriate for the operating system. The volume is then ready for use. It’s already attached to a running EC2 instance as part of your prep work, and you can start using it immediately.
When you decide to terminate the EC2 instance to which you’ve attached the volume, you simply detach the volume (again, via the AWS API or Management Console or a third-party tool you’re using). The EBS volume moves into a quiescent state, ready to be attached to a new EC2 instance whenever you choose. Actually, it’s even easier than that: AWS detaches the volume for you when you terminate an EC2 instance, although best practices suggest not relying on the automatic detachment.
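If you go the API route, attaching and explicitly detaching a volume might look like the following sketch. It assumes a boto3 EC2 client is passed in; the helper names and the default device name /dev/sdf are illustrative choices, not fixed requirements.

```python
def attach_volume(ec2, volume_id, instance_id, device="/dev/sdf"):
    """Attach a volume and block until AWS reports it as in use.

    `ec2` is expected to be a boto3 EC2 client.
    """
    ec2.attach_volume(VolumeId=volume_id, InstanceId=instance_id, Device=device)
    ec2.get_waiter("volume_in_use").wait(VolumeIds=[volume_id])


def detach_volume(ec2, volume_id):
    """Detach a volume explicitly and wait until it is available again."""
    ec2.detach_volume(VolumeId=volume_id)
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
```

Before you call detach_volume, unmount the file system inside the instance so that no writes are still in flight. Detaching explicitly, rather than counting on termination to do it, lets you confirm that the volume is back in its available state before the instance goes away.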
Many people avoid the manual attachment/detachment effort altogether and implement an automated approach instead, by configuring the EC2 AMI launch process to automate the EBS attachment process. (AMI refers to Amazon Machine Image, which is the format EC2 stores instances in when they are not actively running.) Alternatively, many tools (from Amazon or from third parties) do this work and avoid the need to implement it within the AMI. These tools start an AMI and then execute the API commands to attach the volume.
EBS performance
Obviously, if EBS volumes are used for important application resources, such as databases, their performance is critical. How do they rank? Typical EBS performance is around 100 IOPS (I/O operations per second); that’s what EBS is designed for. The question is, what is the real-world performance of EBS? Well, it depends. (You may not like that answer, but it’s true. Here’s why.) As I note earlier in this chapter, EBS is network-based storage: It’s remote from the instance that attaches to it. Therefore, all data reads and writes to the volume must pass across the AWS network, and this is where things get tricky.
Any time data must pass across a shared resource like a network, it’s subject to delays and interruptions caused by traffic from other applications. (This is true, by the way, of all data center environments, not just AWS.) The traditional way to deal with this issue is to create a dedicated storage network (thus the term storage-area network, or SAN). Amazon, true to its roots as a low-cost company, did not implement a dedicated network for its EBS service, leading to the major complaint about EBS: spotty performance. Overall, EBS performance wasn’t that great, but even worse, it tended to be extremely inconsistent because of the issue of network congestion caused by other applications.
AWS addressed this shortcoming by extending the EBS service in mid-2012 with Provisioned IOPS for EBS, which is designed to provide fast, predictable EBS performance. Provisioned IOPS delivers between 500 IOPS and 4000 IOPS of guaranteed throughput to EBS volumes. It requires the use of EBS-optimized instances, which provide dedicated throughput, presumably via the use of a dedicated storage network. The same strategy of volume striping across multiple EBS volumes can be used with Provisioned IOPS volumes to increase performance well beyond the 4000 IOPS limit.
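As a rough planning aid, you can estimate how many Provisioned IOPS volumes a striped set needs to reach a target. This sketch assumes the 500-to-4000 per-volume range quoted above and perfectly linear scaling across volumes, which real workloads rarely achieve; treat the result as an upper bound, not a promise.

```python
import math

MIN_IOPS, MAX_IOPS = 500, 4000  # per-volume range quoted above


def striped_iops_plan(target_iops, per_volume_iops=MAX_IOPS):
    """Return (volume count, ideal aggregate IOPS) for a striped set.

    Assumes IOPS add up linearly across volumes, which is the best case.
    """
    if not MIN_IOPS <= per_volume_iops <= MAX_IOPS:
        raise ValueError("per-volume IOPS outside the provisionable range")
    count = math.ceil(target_iops / per_volume_iops)
    return count, count * per_volume_iops


plan = striped_iops_plan(10000)
print(plan)
```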
EBS snapshots
You may recall that EBS volumes are always associated with a single availability zone (AZ), which can present a challenge if a major goal is being able to create highly available applications. You may also recall that I hinted at a way to work around the challenge. I’ll let the other shoe drop here and tell you all about the workaround.
In addition to EBS’s persistent storage, AWS offers another function within EBS: the snapshot. It’s a point-in-time backup of the data within an EBS volume. The snapshot is stored in S3 in the same region in which the EBS volume resides.
After an initial snapshot of an EBS volume is created, subsequent snapshots store only the modified bits of the volume. So if you have a 10GB volume and create an initial snapshot, all of the data on the volume is in the snapshot. Snapshots of the volume that are created later only store bits that have changed since the previous snapshot. In this way, an EBS snapshot is a highly efficient way to ensure the durability of EBS data, even if the EBS volume itself were to somehow be lost or damaged. A snapshot can be used to create a new volume, so instead of starting with an empty volume, you create a new volume via a snapshot, and when it’s attached to a running instance, all of the data in the original volume is available to you.
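The incremental idea is easy to see in a toy model. The sketch below represents a volume as a dictionary of numbered blocks; this is an illustration of the concept only, not how AWS stores snapshots internally, and the function names are invented for the example.

```python
def take_snapshot(volume, previous_state=None):
    """Return only the blocks that differ from the previous state.

    `volume` maps block numbers to contents. With no previous state,
    every block is stored, which matches the initial full snapshot.
    """
    previous_state = previous_state or {}
    return {n: data for n, data in volume.items()
            if previous_state.get(n) != data}


def restore(snapshots):
    """Rebuild a volume by layering snapshots from oldest to newest."""
    volume = {}
    for snap in snapshots:
        volume.update(snap)
    return volume


volume = {0: "boot", 1: "table-a", 2: "table-b"}
first = take_snapshot(volume)                   # initial snapshot: all blocks
state_at_first = dict(volume)
volume[1] = "table-a-v2"                        # one block changes
second = take_snapshot(volume, state_at_first)  # stores only the changed block
rebuilt = restore([first, second])
```

The second snapshot holds a single block, yet layering it over the first reproduces the whole volume as it stood at the later point in time.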
Snapshots can also be transferred between AWS regions so that you can easily create a new volume in an entirely different region, attach it to an instance running in an availability zone within that region, and run your application in a location completely different from the original one. By using EBS volumes and snapshots, you can make highly persistent data available throughout the entire AWS environment.
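Moving a snapshot to another region and turning it into a volume there can be sketched as follows. The code assumes you pass in a boto3 EC2 client created for the destination region (the copy is requested from the destination side); the helper names are hypothetical, and the region and AZ you supply are up to you.

```python
def copy_snapshot_to_region(dest_ec2, source_region, snapshot_id):
    """Copy a snapshot into the region that `dest_ec2` is bound to.

    `dest_ec2` is expected to be a boto3 EC2 client created for the
    destination region. Returns the ID of the new snapshot copy.
    """
    reply = dest_ec2.copy_snapshot(
        SourceRegion=source_region,
        SourceSnapshotId=snapshot_id,
        Description=f"Copy of {snapshot_id} from {source_region}",
    )
    new_id = reply["SnapshotId"]
    # Wait until the copy is complete before building volumes from it.
    dest_ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[new_id])
    return new_id


def volume_from_snapshot(ec2, snapshot_id, zone):
    """Create a volume from a snapshot in the given availability zone."""
    reply = ec2.create_volume(SnapshotId=snapshot_id, AvailabilityZone=zone)
    return reply["VolumeId"]
```

The copy gets its own snapshot ID in the destination region, so use the returned ID, not the original one, when you create the new volume.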
A snapshot is, in effect, a picture of the EBS volume at a given time. The snapshot can be used to re-create an EBS volume. EBS snapshots aren’t backups of the data residing on the volume. You must understand the difference between snapshots and backups for dealing with databases (the most common uses of EBS volumes). When an EBS volume is re-created, it reflects the bits that were residing on it. A database backup, on the other hand, is a file dump of the data residing in the database; the backup can be used to re-create the database on AWS, but also on another cloud service or even in your own data center. EBS snapshots are useful if you want to re-create storage in AWS; database backups are useful if you want to restore a database either in AWS or somewhere else.
A further twist on this topic is your restore time (or, if you need to re-create a database, how long that takes). Creating a new database from a backup can cost you an hour or more. (The process typically takes several hours because the entire backup has to be read into the database before it’s ready.) The EBS approach provides restoration more quickly. After you tell AWS to create a new volume from a snapshot, it returns almost immediately with the volume ID, which you can attach to an instance. The data is then loaded into the volume in the background, and you can request data from anywhere in the volume after it’s mounted. If the data isn’t yet available, AWS requests the necessary blocks; when they’re available, it returns from the request. Though extremely convenient, this process can negatively impact performance until all the data is available on the volume.
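The load-on-demand behavior can be pictured with a small toy model. This sketch is a conceptual illustration only: the class and its fields are invented for the example, it ignores the background loading that AWS also performs, and it says nothing about how the service is actually built.

```python
class LazyVolume:
    """Toy model of a volume that is filled from a snapshot on demand."""

    def __init__(self, snapshot_blocks):
        self._snapshot = snapshot_blocks  # stands in for the stored snapshot
        self._local = {}                  # blocks already loaded
        self.fetches = 0                  # slow first-touch reads so far

    def read(self, block_number):
        if block_number not in self._local:
            # First touch: pull the block from the snapshot (the slow path).
            self._local[block_number] = self._snapshot[block_number]
            self.fetches += 1
        return self._local[block_number]


vol = LazyVolume({0: "header", 1: "row-1", 2: "row-2"})
first_read = vol.read(2)   # fetched from the snapshot
second_read = vol.read(2)  # served from the local copy
```

Only the first read of a block pays the fetch penalty, which is why performance on a freshly restored volume improves as more of it is touched.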
EBS pricing
EBS pricing follows the standard AWS practice of paying for what you use and is relatively straightforward, although you should understand the “what you use” part of the equation.