Archive Storage: DIY or Outsource?
Content will increasingly need to be preserved for longer periods of time--decades or more. Image courtesy Visual Insight.net.
Of all the requirements commonly cited for storing data, permanence is the most important when it comes to archival storage.
In my previous storage article, "On-Line, near-Line or archive. Just how should data be stored?", I reviewed some of the keys to an effective and reliable storage system.
To recap: the criteria for data storage are Permanence, Availability, Scalability and Security, which gives us the acronym P.A.S.S.
- Permanence means that data is never lost.
- Availability means that the user/application requirements for access/performance are met.
- Scalability defines the ease of meeting changing requirements.
- Security defines the granularity and durability of access privileges.
Permanence is the most important criterion for archive storage: the data should never be lost or modified. Current archive solutions achieve this through active intervention, i.e. copying the data before its current carrier deteriorates or becomes obsolete.
Availability and security go hand in hand: the more available the data is to users and applications, the less secure it is. This is a trade-off usually decided by the value of the data being stored. Resources can be spent making data more available (on-line rather than on-shelf), but then more resources must be spent to maintain the same level of security, which in turn reduces availability again.
How tightly integrated should archive storage be with a media workflow? The answer is that it must be integrated in a way that meets the needs of both creative and management teams. And it must be automatic: if archiving something requires multiple human decisions and manual effort, it will not reliably happen, and data will be lost.
To put it another way: if part of your contract is to keep the assets for a specific period, that storage gets billed separately from the production process. A well-designed workflow, however, can use that storage during production, thus reducing the total cost of ownership (TCO).
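As a rough illustration of what "automatic" can mean here, the sketch below watches a hypothetical "completed projects" folder and copies anything that lands there to a near-line archive target without any human decision in the loop. The paths and the hourly polling interval are assumptions for illustration, not a description of any particular MAM or library product.

```python
# A minimal sketch, assuming a watch-folder style workflow: any project folder
# that appears in COMPLETED is copied to the archive exactly once, with no
# human decision required. Paths and polling interval are hypothetical.
import shutil
import time
from pathlib import Path

COMPLETED = Path("/san/projects/completed")   # hypothetical "finished work" folder
ARCHIVE = Path("/archive/nearline")           # hypothetical near-line target

def archive_finished_projects() -> None:
    for project in COMPLETED.iterdir():
        if not project.is_dir():
            continue
        target = ARCHIVE / project.name
        if target.exists():
            continue                          # already archived, skip
        shutil.copytree(project, target)
        print(f"archived {project.name}")

if __name__ == "__main__":
    while True:
        archive_finished_projects()
        time.sleep(3600)                      # re-check every hour
```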
Where to store data?
What belongs on-site, in the cloud or at Iron Mountain (or some other bunker)? Accessibility, quantity and cost all need to be considered. For me, the determining factor was cost. When the price of a 6TB disk dropped to $200, we integrated the writing of these disks (physical duplication) into the workflow and treated them like tapes within the library system. This meant the interval for spooling and checking tapes was the same as for the disks, and removal to off-site storage followed the same procedure when the library got full.
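As a sketch of what that periodic "checking" can look like in software, each removable volume can carry a checksum manifest written at archive time and be re-verified on the same schedule as the tape checks. The manifest name and layout below are my own assumptions for illustration, not the system we actually used.

```python
# A minimal sketch, assuming each archive volume (tape image or disk) carries
# a JSON manifest of SHA-256 checksums written when the volume is filled.
# The periodic check re-reads every file and reports anything that no longer
# matches its recorded checksum.
import hashlib
import json
from pathlib import Path

def sha256_of(path: Path, chunk: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def write_manifest(volume_root: Path) -> None:
    """Record checksums for every file on the volume (run once, at archive time)."""
    manifest = {str(p.relative_to(volume_root)): sha256_of(p)
                for p in volume_root.rglob("*") if p.is_file()}
    (volume_root / "manifest.json").write_text(json.dumps(manifest, indent=2))

def verify_volume(volume_root: Path) -> list[str]:
    """Return the files whose contents no longer match the manifest."""
    manifest = json.loads((volume_root / "manifest.json").read_text())
    return [name for name, digest in manifest.items()
            if sha256_of(volume_root / name) != digest]
```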
Admittedly, the original material was available elsewhere and the metadata for recreating the finished product was stored separately. Cloud storage was not an option then, but given that the cost (not price) of a petabyte in the cloud is around $2,500 per month, it might now be worth considering. The downside is accessibility, but you no longer have the hassle of duplication and migration.
Choosing a cloud storage provider is not simple. There are many variables that must be weighed before making a decision. Image courtesy Cloudstoragebest.
Today, if cloud storage were cost effective for me, I would keep the data on-site for a reasonable period after production, then move it to the cloud and reuse the local storage. In this case the equation for what kind of on-site storage to use changes, as the amount of local storage needed is greatly reduced. It may then make sense to have a small near-line system on-site to facilitate automatic transfer into the cloud. This is where quantity comes into play.
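A minimal sketch of that tiering policy is below; `upload_to_cloud()` is a hypothetical stand-in for whichever provider API or sync tool is actually used, and the 90-day retention window is an assumption for illustration.

```python
# A minimal sketch, assuming finished assets sit on a small near-line tier and
# are pushed to cloud storage once a retention window expires, after which the
# local space is reclaimed. All paths and values here are hypothetical.
import time
from pathlib import Path

NEARLINE = Path("/nearline/finished")   # hypothetical local near-line tier
RETENTION_DAYS = 90                     # assumed on-site retention window

def upload_to_cloud(path: Path) -> bool:
    """Placeholder for a real provider upload; return True only on verified success."""
    raise NotImplementedError

def tier_out_old_assets() -> None:
    cutoff = time.time() - RETENTION_DAYS * 86400
    for asset in NEARLINE.iterdir():
        if asset.is_file() and asset.stat().st_mtime < cutoff:
            if upload_to_cloud(asset):  # only delete after a confirmed upload
                asset.unlink()
```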
Bandwidth to move content to off-site storage costs about $2 per Mbps per month. To keep things simple, let’s say you already have a real gigabit uplink and you move a petabyte per year. The bandwidth will cost approximately an extra $500 per month and use about a quarter of your gigabit uplink.
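The arithmetic behind those figures, assuming a petabyte spread evenly over a year and the $2 per Mbps per month price quoted above, looks roughly like this:

```python
# Back-of-envelope check: 1 PB per year as a sustained rate, priced at
# $2 per Mbps per month. Decimal units (10^15 bytes) are assumed.
PETABYTE_BITS = 1e15 * 8
SECONDS_PER_YEAR = 365 * 24 * 3600

sustained_mbps = PETABYTE_BITS / SECONDS_PER_YEAR / 1e6   # ~254 Mbps
monthly_cost = sustained_mbps * 2                         # ~$500 per month

print(f"sustained rate: {sustained_mbps:.0f} Mbps (about a quarter of a 1 Gbps uplink)")
print(f"bandwidth cost: ${monthly_cost:.0f} per month")
```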
If you do not already have that kind of bandwidth, you may want to think about finding a storage provider who will ingest from disk. But that creates another problem: the on-site near-line system makes less sense, because the data would have to be written back to disk before shipping.
Storing data is tough
Archiving data is hard work, so hard that thousands of assets are lost every year, simply because archiving requires too much effort and the process is not automatic.
Major physical archives practice triage: we cannot save everything, and in many cases we do not need to. Automatically purging your storage of multiple copies is called deduplication, or de-dup. This would be another advantage of an on-site near-line system, as the deduplication could be done before the transfer to the cloud. Integrating data management into the workflow to ensure that only the data that needs to be archived actually is would therefore mitigate the problem at the source.
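A minimal sketch of content-hash deduplication before a cloud transfer is shown below; the near-line path is hypothetical, and it needs Python 3.11 or later for `hashlib.file_digest`.

```python
# Files with identical content produce identical SHA-256 digests, so only one
# copy per digest needs to leave the building; the rest can be recorded as
# references to the retained copy. The root path is an assumption.
import hashlib
from pathlib import Path

def unique_files(root: Path) -> dict[str, Path]:
    """Map each distinct content digest to the first file that carries it."""
    seen: dict[str, Path] = {}
    for path in root.rglob("*"):
        if path.is_file():
            with path.open("rb") as f:
                digest = hashlib.file_digest(f, "sha256").hexdigest()
            seen.setdefault(digest, path)
    return seen

# Only the values of unique_files(Path("/nearline/finished")) need uploading.
```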
Today's DVDs hold 4.7GB (single layer) to 8.5GB (double layer) of data. A single-layer Blu-ray can store 25GB and a double-layer Blu-ray can store 50GB of data. When it comes to video, especially 4K imagery, creatives will need more storage options.
Traditional optical storage
I worked at Sony DADC for 10 years. We made the discs, quite literally: plastic would arrive in giant trucks and metal pellets in big sacks, and out of the manufacturing process came the common silver discs we call DVDs. To say we were interested in the durability of the storage medium is an understatement. Read-only discs are made by pressing, not laser writing. This requires the substrate to be malleable, and unfortunately that means it will deteriorate over time.
WORM (Write Once, Read Many) discs rely on a protective coating that can be ablated with a laser. But this limits not only the size of the bits, and thus the amount of data, but also the kind of adhesives that can be used. Today, the best optical discs may last about 100 years.
Sony also had a tape restoration facility in France, DAX. A damaged tape could often be restored, but the process was intensive. First the tape had to be ironed back onto a substrate to make the surface stable enough to survive passing through a tape machine. Even then, the tape could only be played once. As I recall, some of the tapes we were tasked to restore were not even 10 years old!
What other storage options do we have? When it comes to film, the images can be printed back to colour separations and stored in climate-controlled vaults. But that too is an expensive option. So the search for the perfect storage medium remains one of the major concerns of media professionals.
This new optical storage process uses high-density five-dimensional data storage with ultrafast laser writing, providing up to 360TB of data capacity per disc. Courtesy Optoelectronics Research Centre, University of Southampton, SO17 1BJ, United Kingdom.
A new optical solution
There is an organization dedicated to helping solve the long-term storage issue: the Long Now Foundation. Meanwhile, scientists in Southampton, UK have made real progress on an idea that has been kicking around since the ‘90s.
They are focused on using a femtosecond laser, similar to those used in eye surgery, which can write data at very high densities. What‘s new is not the laser but the nanotechnology used to create the recording medium: the process allows three bits to be stored in the space of one laser-imprinted spot.
The result is an optical disc with a storage capacity of 360TB per disc, thermal stability up to 1000°C and a practically unlimited lifetime.
You can read the entire paper, “5D Data Storage by Ultrafast Laser Nanostructuring in Glass” here.
Let’s see who buys the patent rights and hope it is not the NSA.
Broadcasters will need to build integrated, connected and often cloud-centric, workflows. Long-term storage will need to be part of that solution.