Scaling Key-Value Data with DynamoDB
DynamoDB is Amazon’s latest AWS storage offering. This key-value storage service is designed to provide very high scalability and performance for the most demanding applications. When you call DynamoDB to insert a row of data, it hashes the index value (the data item used to organize the table, such as the customer’s last name used as the index for the customer table) and places the row at the location in the storage pool that the hash identifies, which spreads rows effectively at random across the pool. When you request that row, DynamoDB again hashes the index value, goes to the location that the hash identifies, retrieves the data, and gives it back to you.
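The insert-and-retrieve cycle described above can be sketched in a few lines of Python. This is only an illustration of hash-based placement: the `TinyKeyValueStore` class, the `node_for_key` helper, the use of MD5, and the modulo step are all assumptions made for the example, not a description of how DynamoDB is implemented internally.

```python
import hashlib


def node_for_key(key: str, node_count: int) -> int:
    """Map a key to one of node_count storage nodes using a stable hash.

    MD5 plus modulo is an assumption for this sketch, not DynamoDB's algorithm.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


class TinyKeyValueStore:
    """Hypothetical in-memory stand-in for a pool of storage nodes."""

    def __init__(self, node_count: int = 4):
        self.nodes = [dict() for _ in range(node_count)]

    def put(self, key: str, row: dict) -> None:
        # Hash the index value, then store the row on the node it points to.
        self.nodes[node_for_key(key, len(self.nodes))][key] = row

    def get(self, key: str):
        # Hash again, go straight to the same node, and return the row.
        return self.nodes[node_for_key(key, len(self.nodes))].get(key)


store = TinyKeyValueStore()
store.put("Jones", {"first_name": "Ada", "city": "Los Angeles"})
print(store.get("Jones"))
```

Because the same hash is computed on both the write and the read, no search across nodes is needed to find the row.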

Key-value versus relational databases

Key-value storage can manage much larger data pools and operate with much higher performance than traditional relational databases because it can span a larger number of servers to store its data. You face some trade-offs from using it, however, because it doesn’t support the following features:

✓ Range retrieval: Using this feature, you can say, “Give me all records from the customer table where the customer last name equals Jones.” Relational databases excel at these kinds of queries.

✓ Joined queries: This type of query lets you say, “Give me a customer name from the customer table and a corresponding address from the address table where the customer name equals Jones and the address city equals Los Angeles.” Joins can be extremely useful in applications, but they harm database performance, particularly when overall storage is very large.

Both key-value and relational databases have their uses, and in fact it’s not uncommon for applications to use both, with each applied to the portions of the application for which it’s best suited.

DynamoDB characteristics

Amazon has designed DynamoDB to be high-performing, extremely flexible, and highly available, all based on these characteristics:

✓ The total amount of storage can be increased (or decreased) at any time. You’re not forced to forecast how much storage is required for an application. For many of the webscale applications that would have a natural affinity for DynamoDB-type storage, this is a strong selling point. Such applications are unpredictable in terms of how much storage they will ultimately require, so the flexible scalability of DynamoDB can be a real benefit.

✓ No downtime is required in order to resize a DynamoDB table. AWS automatically adds additional servers to a DynamoDB table pool and redistributes the table data across the pool. This task is performed in the background and requires no application downtime, making it possible to continue running applications, even while resizing the underlying DynamoDB table pool to support necessary throughput.

✓ The schema is flexible. Relational databases require you to define the items you’ll manage, along with their types (string or integer, for example) and sizes, all before using the system; this definition is referred to as the database schema. What happens if you need to store additional information in your database? You have to alter the original database schema, which, if a large amount of data is already in the database, can take (literally) days to execute. DynamoDB, by contrast, has a flexible schema: you can add items to a record at any time without requiring an Alter operation. Moreover, if you add, say, a second address to an individual customer’s record, no other customer’s record needs to be changed, and no additional storage needs to be allocated for all those other customers’ potential second addresses. This makes DynamoDB very easy to use, extremely flexible in the type of data that can be stored, and highly efficient in its use of storage.

✓ Solid-state drives are used instead of disk drives. DynamoDB avoids the dreaded latency of data lookups that require seeks across spinning disks by using solid-state drives that incorporate flash storage, to increase data throughput and increase DynamoDB performance.

✓ Performance levels can be changed dynamically while in operation. If you realize that you need more (or less) performance capability from your DynamoDB database, you can adjust it on the fly, without needing to take DynamoDB down. This allows you to dynamically tune your database performance while your application is still in production.

✓ Storage is redundant to ensure high availability. DynamoDB stores multiple copies of each record, thereby avoiding outages caused by hardware failure.

✓ Storage is dispersed across multiple availability zones. By dispersing DynamoDB tables across multiple availability zones, AWS ensures that even a large-scale outage, such as the loss of an entire data center, doesn’t affect the availability of DynamoDB.
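The flexible-schema characteristic in the list above can be pictured with a small Python sketch. Plain dictionaries stand in for a table here; the `put_item` helper and the attribute names are hypothetical and chosen only to mirror the second-address example.

```python
customers = {}


def put_item(table: dict, key: str, **attributes) -> None:
    """Store an item; each item carries only the attributes it was given."""
    table[key] = dict(attributes)


# Two customers written with different sets of attributes.
put_item(customers, "C1001", last_name="Jones", address="1 Main St")
put_item(
    customers,
    "C1002",
    last_name="Smith",
    address="9 Oak Ave",
    second_address="22 Beach Rd",  # extra attribute, no schema change needed
)

for key, item in customers.items():
    print(key, sorted(item))
```

Adding `second_address` to one customer leaves the other record untouched and reserves no space for it there, which is the storage-efficiency point the list makes.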

Using DynamoDB

You can easily create a DynamoDB table via the AWS Management Console, by following this process (in broad terms):

1. Define the table name. Note: When you name a table, your character pool is limited to a–z, A–Z, 0–9, and the underscore, hyphen, and period; no other characters are allowed.

2. Name and define the primary key, which is the index for the table. You can choose the type of data you’ll use as the index: string, number, or binary.

3. Define how much read and write capacity you want for your DynamoDB table. The amount of capacity affects your DynamoDB table performance, so your choices here are important to overall application performance. You can have up to ten read capacity units and five write capacity units for free each month. (Read and write units are a measurement of performance and represent throughput in these operations; see the DynamoDB cost section below for details.) Don’t worry if you’re unsure about how much you’ll ultimately need, because you can dynamically adjust these figures; DynamoDB supports performance levels from tens to hundreds of thousands of capacity units per table.

4. Decide whether to have throughput alarms sent to you. A throughput alarm indicates whether your table’s request rate is consistently above a certain level for an hour. (The default level is 80 percent.) It’s the mechanism that tells you when to increase your table’s read and write capacity.

5. Press the Create button to create the DynamoDB table. A couple of minutes later, your DynamoDB table is ready.

Easy, eh? It certainly is in comparison with provisioning your own instances, loading a key-value product onto each of them, arranging for redundancy, and so on. DynamoDB hasn’t been around long, but I predict that it will be a huge hit as more and more webscale Internet sites adopt it as a more attractive alternative to “rolling their own.”
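The same choices made in the console steps above (name, primary key and its type, read and write capacity) can also be collected into a request for the API. The sketch below only builds and validates the request; the `build_create_table_request` helper is hypothetical, and the table and key names are made up. The dictionary keys follow the keyword arguments of boto3's `create_table` call, which is shown only in a comment because it needs boto3 and AWS credentials.

```python
import re

# Character pool from step 1: letters, digits, underscore, hyphen, period.
TABLE_NAME_PATTERN = re.compile(r"[A-Za-z0-9_.-]+")

# Key types from step 2, mapped to DynamoDB's one-letter type codes.
KEY_TYPES = {"string": "S", "number": "N", "binary": "B"}


def build_create_table_request(table_name, key_name, key_type,
                               read_units, write_units):
    """Hypothetical helper: validate the inputs and build the request."""
    if not TABLE_NAME_PATTERN.fullmatch(table_name):
        raise ValueError(f"invalid characters in table name: {table_name!r}")
    if key_type not in KEY_TYPES:
        raise ValueError(f"key type must be one of {sorted(KEY_TYPES)}")
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": key_name, "AttributeType": KEY_TYPES[key_type]}
        ],
        "KeySchema": [{"AttributeName": key_name, "KeyType": "HASH"}],
        "ProvisionedThroughput": {
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        },
    }


request = build_create_table_request("customer_table", "customer_id",
                                     "number", 10, 5)
print(request)

# With boto3 installed and credentials configured, this could be sent as:
#   import boto3
#   boto3.client("dynamodb").create_table(**request)
```

The capacity figures used here are the free allowance mentioned in step 3; throughput alarms from step 4 are configured separately and are not part of this request.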

Here’s a deeper dive into the “why and what” of DynamoDB indexes.

First, and vitally important, it’s crucial to use an appropriate index for key-value storage. A key-value product performs a hash on the index value to determine where in the storage pool to place the data associated with the index. The hash implements an algorithm to create unique values for different indexes, which places the data randomly throughout the storage pool. The key (excuse the pun) issue with an index is to choose a value that is highly variable. For example, if you have millions of customers, it’s a bad idea to index them by zip code: many customers share each zip code, so their data would pile up in the same few locations, and that duplication would choke performance. When creating a DynamoDB table, be sure to choose your index carefully.
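A quick experiment makes the zip-code warning concrete. The sketch below hashes 1,000 made-up customers into eight partitions twice, once keyed by customer number and once keyed by zip code, and reports the busiest partition in each case. The partition count, the MD5-plus-modulo hash, and the sample data are all assumptions for illustration.

```python
import hashlib
from collections import Counter

PARTITIONS = 8  # assumed pool size for this sketch


def partition_for(key: str) -> int:
    """Stable hash of the key, reduced to a partition number."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % PARTITIONS


# 1,000 hypothetical customers spread evenly over only five zip codes.
zip_codes = ["90001", "90210", "10001", "60601", "73301"]
customers = [(f"C{n:06d}", zip_codes[n % len(zip_codes)]) for n in range(1000)]

load_by_customer_number = Counter(partition_for(cid) for cid, _ in customers)
load_by_zip_code = Counter(partition_for(zc) for _, zc in customers)

print("busiest partition, keyed by customer number:",
      max(load_by_customer_number.values()))
print("busiest partition, keyed by zip code:",
      max(load_by_zip_code.values()))
print("partitions used, keyed by zip code:", len(load_by_zip_code))
```

With only five distinct zip codes, at most five partitions ever receive data, and each of them holds at least 200 rows, whereas the customer-number key spreads the same rows across the whole pool.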

DynamoDB also offers a hash-and-range index. You can create a customer table index using sequential customer numbers as the hash, with zip code as the associated range. That way, DynamoDB would use the highly variable customer number to spread the data randomly across the entire table resource pool but keep pointers to the zip code values that it can use in queries on that range. With hash-and-range, users can gain the full benefit of key-value storage along with a limited amount of the benefit available to relational database storage.
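One way to picture a hash-and-range index is as a dictionary keyed by the hash value, where each entry keeps its rows sorted by the range value. The `HashAndRangeTable` class below is a hypothetical in-memory model, not DynamoDB's implementation, and it assumes that a range condition is evaluated among the items that share a single hash key value. The customer numbers, zip codes, and addresses are made up.

```python
import bisect
from collections import defaultdict


class HashAndRangeTable:
    """Hypothetical model: hash key picks a bucket, range key orders it."""

    def __init__(self):
        # hash key -> list of (range key, item), kept sorted by range key
        self._buckets = defaultdict(list)

    def put(self, hash_key, range_key, item):
        rows = self._buckets[hash_key]
        keys = [rk for rk, _ in rows]
        pos = bisect.bisect_left(keys, range_key)
        if pos < len(rows) and rows[pos][0] == range_key:
            rows[pos] = (range_key, item)  # overwrite an existing key pair
        else:
            rows.insert(pos, (range_key, item))

    def query(self, hash_key, low, high):
        """Return items for hash_key whose range key lies in [low, high]."""
        rows = self._buckets.get(hash_key, [])
        keys = [rk for rk, _ in rows]
        start = bisect.bisect_left(keys, low)
        end = bisect.bisect_right(keys, high)
        return [item for _, item in rows[start:end]]


table = HashAndRangeTable()
table.put(1001, "10001", {"street": "5 Broadway"})
table.put(1001, "90001", {"street": "1 Main St"})
table.put(1001, "90210", {"street": "22 Beach Rd"})
table.put(1002, "90001", {"street": "9 Oak Ave"})

# All of customer 1001's addresses with zip codes from 90000 through 90999.
print(table.query(1001, "90000", "90999"))
```

The hash key still decides where the data lives, so placement stays evenly spread, while the sorted range keys make the limited range lookups possible.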

DynamoDB read consistency

Because DynamoDB stores multiple copies of each record, a read that arrives right after a write could reach a copy that hasn’t been updated yet. Amazon takes a different tack from forcing a single behavior on you: it provides two types of reads, consistent and eventually consistent. The former performs a read only after DynamoDB is certain that it reflects the latest version of data, and the latter returns data immediately with no guarantee that it reflects the latest-and-greatest version. This choice offers a trade-off: The consistent method is simpler but may provide lower performance, whereas the eventually consistent option is less certain but offers the highest possible level of performance.
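The difference between the two read types can be illustrated with a toy model of one record held in several copies. Everything in this sketch is an assumption made for illustration: the `ReplicatedItem` class, the rule that a write reaches only the first copy immediately, and the explicit `propagate` step standing in for background replication.

```python
class ReplicatedItem:
    """Toy model of a single record stored as several copies."""

    def __init__(self, value, replica_count=3):
        self.latest = value
        self.replicas = [value] * replica_count

    def write(self, value):
        # Assumption: only the first copy is updated at write time.
        self.latest = value
        self.replicas[0] = value

    def propagate(self):
        # Stand-in for background replication catching up.
        self.replicas = [self.latest] * len(self.replicas)

    def read(self, consistent=False, replica=-1):
        if consistent:
            # Consistent read: always reflects the latest write.
            return self.latest
        # Eventually consistent read: whatever the chosen copy holds now.
        return self.replicas[replica]


item = ReplicatedItem("v1")
item.write("v2")
print("consistent read:", item.read(consistent=True))
print("eventually consistent read:", item.read(replica=2))
item.propagate()
print("after replication:", item.read(replica=2))
```

In boto3, the choice is made per request: read calls such as `get_item` accept a `ConsistentRead` flag.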

DynamoDB scope and availability

DynamoDB tables are AWS region-scoped. The servers that make up the table resource pool are spread among availability zones within the region in which the table lives. Amazon publishes no projection of the expected level of DynamoDB availability. Given its use of redundancy, you should expect extremely high availability from DynamoDB.

DynamoDB cost

DynamoDB has three separate and distinct cost variables:

✓ The size of the server pool, defined as read and write capacity: As you’d expect, larger read and write capacity requires spreading the table across larger numbers of servers, with an accompanying increase in cost. The first 10 units of read capacity and the first 5 units of write capacity per month are free. Above that level, however, the cost is $.01 per hour for every 10 units of write capacity, and $.01 per hour for every 50 units of read capacity. 1 unit of write capacity enables you to perform 1 write per second for items as large as 1KB. Similarly, 1 unit of read capacity enables you to perform one strongly consistent read per second (or two eventually consistent reads per second) of items as large as 1KB.

✓ The storage associated with the DynamoDB table: You get 100MB of storage for free every month; above that level, storage is priced at $.25 per gigabyte. The total amount of storage within DynamoDB is a little larger than the size of the data being stored; DynamoDB adds 100 bytes of indexing information to each item stored in DynamoDB, which is added to the total storage in DynamoDB.

✓ Data transfer, which is the same price and conditions as for all AWS offerings: The first gigabyte of transfer per month is free, and above that the cost of data transfer varies between $.12 and $.05 per gigabyte, depending on total traffic.
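The first two cost variables in the list above lend themselves to a small worked calculation. The sketch below uses only the prices quoted in this section and makes several assumptions that the text does not settle: capacity above the free allowance is billed pro rata rather than in whole blocks of 10 or 50 units, a month has 720 hours, the storage price applies per month, and 1GB equals 1,024MB. Data transfer is left out because its tiers are not spelled out here. The function names are hypothetical.

```python
import math

HOURS_PER_MONTH = 720  # assumption: 30 days


def read_units_needed(reads_per_second: int, strongly_consistent: bool) -> int:
    """Capacity units for items up to 1KB, per the definitions above."""
    if strongly_consistent:
        return reads_per_second
    # One unit covers two eventually consistent reads per second.
    return math.ceil(reads_per_second / 2)


def throughput_cost(read_units: int, write_units: int) -> float:
    """Monthly cost of provisioned capacity beyond the free allowance."""
    billable_reads = max(0, read_units - 10)   # first 10 read units free
    billable_writes = max(0, write_units - 5)  # first 5 write units free
    hourly = billable_reads / 50 * 0.01 + billable_writes / 10 * 0.01
    return hourly * HOURS_PER_MONTH


def storage_cost(item_count: int, avg_item_bytes: int) -> float:
    """Monthly storage cost, including 100 bytes of index data per item."""
    stored_gb = item_count * (avg_item_bytes + 100) / 1024 ** 3
    billable_gb = max(0.0, stored_gb - 100 / 1024)  # first 100MB free
    return billable_gb * 0.25


units = read_units_needed(220, strongly_consistent=False)
print("read units:", units)
print("throughput per month: $%.2f" % throughput_cost(units, 55))
print("storage per month: $%.4f" % storage_cost(1024 ** 2, 924))
```

In this example, 220 eventually consistent reads per second need 110 read units; with 55 write units, the billable capacity is 100 read units and 50 write units, or $.07 per hour in total.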
