This article details how cold storage works and the key differences between two leading cold storage providers: Google Nearline and Amazon Glacier.
What is cold storage?
Cold storage is a mode of data storage designed for inactive data. In a cold storage deployment, data retrieval times are usually longer than for online or production application data, a trade-off made in order to achieve significant capital and operational savings.
Amazon Glacier
Amazon Glacier is the market-leading cold storage service, optimized for infrequently accessed data, or “cold data.” The service provides durable and extremely low-cost storage with security features for data archiving and backup. With Amazon Glacier, you can store your data cost-effectively for months, years, or even decades. Amazon Glacier also lets you offload the administrative burdens of operating and scaling storage to AWS, so you don’t have to worry about capacity planning, hardware provisioning, data replication, hardware failure detection and recovery, or time-consuming hardware migrations.
Google Nearline
Google announced its Nearline archival storage product this year, and it was quickly seen as a disruptive solution in the market. Why? It came with the direct promise of very quick retrieval, on the order of a few seconds, which is fast compared to market leader AWS Glacier. According to Google, Nearline offers slightly lower availability and slightly higher latency than the company’s standard storage product, but at a lower cost. Nearline’s “time to first byte” is about 2 – 5 seconds, which, next to other solutions, can be seen as a real game-changer.
Glacier vs Nearline
However, digging into the “slightly higher latency” claim, we quickly discover some significant issues. Google Nearline limits data retrieval to 4 MB/s for every TB stored, and this throughput scales linearly with storage consumption: storing 3 TB of data guarantees 12 MB/s of throughput, while storing 100 TB provides 400 MB/s. So if a customer stores 1 TB of data in Nearline, their download will start within 2 – 5 seconds, and then take roughly 73 hours to complete (downloading 1 TB at 4 MB/s).
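The arithmetic behind those numbers can be sketched in a few lines of Python. This is only an illustration of the 4 MB/s-per-TB throughput rule described above, using binary units (1 TB = 1,048,576 MB), which is what makes the 1 TB retrieval work out to about 73 hours:

```python
def nearline_throughput_mb_s(stored_tb: float) -> float:
    """Nearline provisions 4 MB/s of retrieval throughput per TB stored."""
    return 4.0 * stored_tb

def retrieval_hours(data_tb: float, throughput_mb_s: float) -> float:
    """Hours to retrieve data_tb terabytes at the given throughput.

    Uses binary units: 1 TB = 1024 * 1024 MB.
    """
    return (data_tb * 1024 * 1024) / throughput_mb_s / 3600

print(nearline_throughput_mb_s(3))    # 3 TB stored  -> 12.0 MB/s
print(nearline_throughput_mb_s(100))  # 100 TB stored -> 400.0 MB/s

# 1 TB stored gives 4 MB/s, so retrieving that 1 TB takes ~72.8 hours
print(round(retrieval_hours(1, nearline_throughput_mb_s(1)), 1))
```

At 4 MB/s, 1,048,576 MB takes 262,144 seconds, or about 72.8 hours, matching the "roughly 73 hours" figure above.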
Compare the same 1 TB case with Amazon Glacier: AWS will have that object available to customers in approximately 3 – 5 hours. Four hours into their download, a Google Nearline customer would be only about 5% of the way through their 1 TB of data, with approximately 69 hours to completion.
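The four-hour checkpoint in that comparison follows directly from the same throughput cap. A quick sanity check of the "about 5% complete, roughly 69 hours remaining" claim:

```python
# How far along is a Nearline customer 4 hours into a 1 TB download
# at the 4 MB/s per-TB throughput cap?
TOTAL_MB = 1024 * 1024   # 1 TB in binary MB
RATE_MB_S = 4.0          # throughput for 1 TB stored
elapsed_s = 4 * 3600     # four hours, in seconds

downloaded_mb = RATE_MB_S * elapsed_s
pct_complete = 100 * downloaded_mb / TOTAL_MB
hours_remaining = (TOTAL_MB - downloaded_mb) / RATE_MB_S / 3600

print(f"{pct_complete:.1f}% complete, {hours_remaining:.0f} hours remaining")
```

This prints roughly 5.5% complete with about 69 hours remaining, by which point the Glacier customer's archive is already available and downloading at whatever bandwidth they have.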
To Summarize
Today, cloud service providers are working to create a new kind of storage platform for cold storage archival. As more organizations look to cloud cold storage solutions, one of the biggest considerations after understanding the model itself is management. How do you control what sits in your cold storage environment? How do you label and create policies around specific repositories? What about data encryption, data purging, and even API integration?