Managing Archive Material with the Glacier Storage Service
Glacier, released in August 2012, is a storage service targeted at a critical (yet often poorly managed) IT requirement: archival storage. Simply stated, archival storage is backup data of any sort. The best-known use of archival storage involves server backups — complete dumps of all data on the server’s drive.
Glacier leverages the AWS infrastructure to provide archival storage that addresses the shortcomings of both tape and disk solutions:
✓ It’s inexpensive: Glacier costs start at less than $.02 per gigabyte of archival storage. That’s significantly less expensive than disk archive, and even less expensive than tape archive, the previous low-cost archive solution.
✓ It’s durable: Glacier uses the S3 infrastructure, which means it can offer the same 99.999999999-percent durability as the S3 service. That’s a lot more reliable than the previous archive solutions.
✓ It’s convenient: You just send and retrieve archive files over the Internet, making it simple to extend your current backup solution to Glacier. Many of today’s newer, commercial backup solutions provide deduplication functionality, so if you use one of those, you can be sure that it will soon have an Archive to Glacier option.
✓ It’s highly scalable: An archive file can be as large as 40TB, which should be big enough for anyone.
✓ It’s secure: Data is transmitted to and from Glacier over SSL-encrypted connections, and the archives themselves are encrypted while in storage.
✓ It’s fast: Data can be pulled from Glacier in as little as five hours, making it significantly faster than tape archive solutions, which require schlepping out to the archive storage facility.
And although Glacier, like disk archive, requires you to send data over the Internet, AWS has a couple of solutions to this issue:
• AWS Import/Export is a service that lets you send Amazon physical disk drives with your data on them. At the Amazon end, an Amazon employee copies the data from the drive into AWS.
• AWS Direct Connect is a service offered by Amazon in partnership with network service providers that places a high-bandwidth connection between their facilities (or, indeed, your own data center) and AWS. The connection can be 1 Gbps or 10 Gbps, making it possible to transmit or receive very large volumes of data quickly.
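To get a feel for what those Direct Connect speeds mean, here is a rough back-of-the-envelope calculation. It is only a sketch: the function name transfer_hours is made up for illustration, it assumes decimal terabytes (1TB = 10^12 bytes), and it assumes the link runs at its full rated speed the whole time, which real connections rarely do.

```python
def transfer_hours(size_tb: float, link_gbps: float) -> float:
    """Idealized hours needed to move size_tb terabytes over a link_gbps link.

    Assumes decimal units and a link that is fully used with no protocol
    overhead, so real transfers will take longer.
    """
    bits = size_tb * 1e12 * 8            # terabytes -> bytes -> bits
    seconds = bits / (link_gbps * 1e9)   # gigabits per second -> bits per second
    return seconds / 3600


# Compare the two Direct Connect speeds for one maximum-size (40TB) archive.
for gbps in (1, 10):
    print(f"40TB at {gbps} Gbps: about {transfer_hours(40, gbps):.1f} hours")
```

Under these idealized assumptions, a full 40TB archive needs roughly 89 hours at 1 Gbps and roughly 9 hours at 10 Gbps.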
Glacier in action
Glacier is straightforward, conceptually. The idea is for you to create Glacier vaults within your AWS account and then store archives in those vaults. The conceptual similarity between this arrangement and S3 buckets and objects is obvious. Each AWS account can have 1,000 vaults, and each vault can contain an unlimited number of archives. As previously noted, an archive can be as large as 40TB.
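The vault-and-archive arrangement can be sketched in a few lines of Python. This example assumes you use boto3, the AWS SDK for Python, which the text above doesn't mention; the helper name archive_file and the vault and file names in the usage comment are hypothetical. The glacier argument is expected to behave like the object returned by boto3.client("glacier").

```python
def archive_file(glacier, vault_name, path, description=""):
    """Store one local file as an archive in a Glacier vault.

    `glacier` is assumed to be a boto3 Glacier client (or something with the
    same create_vault and upload_archive methods). Returns the archive ID
    that Glacier assigns, which you need later to retrieve the archive.
    """
    # Make sure the vault exists before uploading into it.
    glacier.create_vault(vaultName=vault_name)

    # Upload the file contents as a single archive.
    with open(path, "rb") as body:
        response = glacier.upload_archive(
            vaultName=vault_name,
            archiveDescription=description,
            body=body,
        )
    return response["archiveId"]


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   archive_id = archive_file(boto3.client("glacier"), "server-backups",
#                             "/backups/web01.tar.gz", "web01 nightly dump")
```

Keep the returned archive ID somewhere safe, because it is how you refer to the archive afterward. A single upload request like this one suits only smaller files; very large archives have to be sent in parts using Glacier's multipart upload operations, which this sketch doesn't cover.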
Two ways exist to create an archive:
✓ Archive S3 objects into Glacier by setting S3 retention policies for the object. You may, for example, set a retention period of 90 days; after the 90 days are up, S3 migrates the object into Glacier. To retrieve the object, execute an S3 Restore command, and a few hours later the object is back in S3, ready to be accessed. S3 maintains a mapping between S3 object IDs and Glacier archive IDs and takes care of all archiving management.
✓ Use the Glacier API to manage the creation and retrieval of archives, and let Glacier take care of storing them securely and robustly. If you need to retrieve an archive, you issue a command (again via the API), specifying the file location you want the retrieved archive placed in, and five hours or so later, the archive is available on the server on which the file location exists. The server can be located in EC2 or in some other, non-AWS data center. You can set AWS to notify you when the archive is available by using the Simple Notification Service (SNS).
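Both approaches can be sketched with boto3, the AWS SDK for Python. Treat this as an illustration under stated assumptions rather than the one right way to do it: every function name here is made up for the example, and the s3 and glacier arguments are assumed to be boto3 clients. Note that with boto3, retrieval through the Glacier API is a two-step affair: you start a retrieval job, and then download the job's output after the job completes.

```python
from typing import Optional


def glacier_transition_rule(prefix: str, days: int = 90) -> dict:
    """S3 lifecycle configuration that moves objects under `prefix` to Glacier."""
    return {
        "Rules": [
            {
                "ID": f"archive-after-{days}-days",   # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


def retrieval_job(archive_id: str, sns_topic_arn: Optional[str] = None) -> dict:
    """Job parameters asking Glacier to make one archive available."""
    params = {"Type": "archive-retrieval", "ArchiveId": archive_id}
    if sns_topic_arn:
        # Glacier publishes to this SNS topic when the job finishes.
        params["SNSTopic"] = sns_topic_arn
    return params


def enable_archiving(s3, bucket: str, prefix: str, days: int = 90) -> None:
    """Approach 1: let S3 migrate objects into Glacier after `days` days."""
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=glacier_transition_rule(prefix, days),
    )


def restore_from_s3(s3, bucket: str, key: str, days_available: int = 7) -> None:
    """Approach 1, retrieval: ask S3 to restore an archived object."""
    s3.restore_object(
        Bucket=bucket, Key=key, RestoreRequest={"Days": days_available}
    )


def start_retrieval(glacier, vault_name: str, archive_id: str,
                    sns_topic_arn: Optional[str] = None) -> str:
    """Approach 2, retrieval: start a Glacier job and return its job ID."""
    response = glacier.initiate_job(
        vaultName=vault_name,
        jobParameters=retrieval_job(archive_id, sns_topic_arn),
    )
    return response["jobId"]
```

After the SNS notification arrives (or the job otherwise shows as complete), calling get_job_output on the Glacier client with the vault name and job ID returns the archive's contents, which you can then write to whatever file location you choose.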