Storing Data for the Ages

The differences between archive and backup technologies are significant. Be sure you know which solution you really need.

All media companies exist in a data-centric universe today. From the largest media organization to a single freelancer, we all work with computers and generate a trail of data behind us each day. For laymen, how we archive that data for the future can be a complex and daunting subject.

A common misunderstanding concerns the difference between data backups and data archives. Many media industry professionals think they are one and the same. That confusion signals a problem, so it’s important for even non-IT experts to know the difference.

Data backups are used to restore data in case it’s lost, corrupted or destroyed. It’s like the Time Machine backup on a Macintosh personal computer or a Drobo backup drive in a small media company. If a hard drive crashes, one restores it with a data backup.

Data archives are different animals. Archives are used to protect older information not needed every day. Such data can be easily searched so that specific information can be located and accessed. Archival storage is normally associated with more extensive metadata than a basic backup.

Data backups are not data archives. The software that controls each kind of data is different, as is the type of media used for the storage. Every person working in entertainment media — including the freelancer working from home — needs both types of storage for their media assets.

W. Curtis Preston notes that there are two types of data. One is for disaster recovery; the other is for reviewing old files. Users must understand this key difference.

W. Curtis Preston, an independent data backup and recovery expert, said one way to view the difference between the two types of data is to think of data backups for disaster retrieval and archives for discovery of older files.

Backups allow users to quickly replace data after it is lost, he said, but archives are used for eDiscovery, including situations where documentation is needed and when the user is being sued or investigated by a government entity.

“Archives are for discovery purposes where backups are not. With archives, users might be asked to provide all the email with certain words in them, or email between two people or to people outside of your company,” Preston said.

“Or you may need to show the history of some project and you need all the files to trace it over time. Backup systems are not very good at providing this kind of information. We are talking ten orders of magnitude difference in pain and difficulty trying to satisfy archive requests with backup files.”

Archival data is also stored differently than backup, Preston said. It includes certain metadata that backups do not store. It can allow for different kinds of searches and can also be packaged by project, where all records are included in a single package. He compared the packages used in archives to the TV show, Cold Case, where all the physical records of an old court case are stored together in a single box.
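To make Preston's point concrete, here is a minimal sketch of why metadata matters for discovery. It is not taken from any particular archive product: the `ArchiveRecord` fields, the `search` and `package_by_project` helpers, and the sample file names are all assumptions invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ArchiveRecord:
    """One archived item plus descriptive metadata (hypothetical fields)."""
    path: str
    project: str
    author: str
    keywords: set = field(default_factory=set)


def search(records, project=None, keyword=None):
    """Return the records that match every criterion that was supplied."""
    hits = []
    for rec in records:
        if project is not None and rec.project != project:
            continue
        if keyword is not None and keyword.lower() not in rec.keywords:
            continue
        hits.append(rec)
    return hits


def package_by_project(records):
    """Group records so each project can be pulled as a single package."""
    packages = {}
    for rec in records:
        packages.setdefault(rec.project, []).append(rec)
    return packages


# Sample data invented for this sketch only.
records = [
    ArchiveRecord("ep101/final_cut.mov", "pilot", "editor_a", {"master", "final"}),
    ArchiveRecord("ep101/notes.txt", "pilot", "producer_b", {"notes"}),
    ArchiveRecord("promo/teaser.mov", "promo", "editor_a", {"final"}),
]

pilot_finals = search(records, project="pilot", keyword="final")
packages = package_by_project(records)
```

A plain backup typically knows only file paths and dates, so a request such as "every final deliverable for the pilot" would mean restoring and inspecting files by hand. With metadata recorded at archive time, the same request becomes a simple query.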

The Storage Networking Industry Association (SNIA) defines an archive as a collection of data objects, perhaps with associated metadata, in a storage system whose primary purpose is the long-term preservation and retention of that data. The main media choices for archiving are magnetic tape, hard drives or the cloud.

Last September, entertainment media professionals got a new benefit for archiving. The Society of Motion Picture and Television Engineers (SMPTE) published a standard that codifies the Archive eXchange Format (AXF), a file container that can encapsulate any number and type of files in a fully self-contained and self-describing package. AXF supports interoperability among different content storage systems and ensures the content’s long-term availability, no matter how storage or file system technology evolves.

Merrill Weiss, chair of the SMPTE Working Group on AXF, notes that because AXF Objects are essentially immune to changes in technology and formats, they can be transferred from one archive system into remote storage and later retrieved by a different archive system without the loss of any essence or metadata.

“When faced with the need either to consolidate archives or to migrate them to new generations of storage technology, media companies traditionally have been forced to perform the costly and time-consuming integration of archive systems and other systems,” said S. Merrill Weiss, chair of the SMPTE Working Group on AXF.

“Now, by abstracting the underlying technology of digital storage, AXF not only supports interoperability among discrete storage systems regardless of the operating and file systems used, but also future-proofs digital storage so that content remains available despite changing formats and storage technologies.”

Designed for operational storage, transport and long-term preservation, AXF was formulated as a wrapper — or container — capable of holding virtually unlimited collections of files and metadata related to one another in any combination.

Known as “AXF Objects,” such containers can package, in different ways, all the specific information different kinds of systems would need in order to restore the content data. The format relies on the Extensible Markup Language (XML) to define the information in a way that can be read and recovered by any modern computer system to which the data is downloaded.

Since AXF Objects are essentially immune to changes in technology and formats, they can be transferred from one archive system into remote storage — geographically remote or in the cloud — and later retrieved and read by different archive systems without the loss of any essence or metadata.

By automatically segmenting, storing on multiple media and reassembling AXF Objects when necessary, “spanned sets” enable storage of AXF Objects on more than one medium. Consequently, AXF Objects may be considerably larger than the individual media on which they are stored.
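The general idea behind spanning can be sketched in a few lines. This is only an illustration of segmenting and reassembling a large object across fixed-size media; it does not reproduce the AXF on-media layout. The `Segment` structure and both function names are hypothetical.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    """One piece of an object, sized to fit a single medium (hypothetical)."""
    index: int      # position of this piece within the object
    total: int      # how many pieces the object was split into
    payload: bytes


def span_object(data, medium_capacity):
    """Split data into segments no larger than medium_capacity bytes."""
    if medium_capacity <= 0:
        raise ValueError("medium_capacity must be positive")
    chunks = [data[i:i + medium_capacity]
              for i in range(0, len(data), medium_capacity)]
    return [Segment(i, len(chunks), chunk) for i, chunk in enumerate(chunks)]


def reassemble(segments):
    """Rebuild the object, in whatever order the media were loaded."""
    ordered = sorted(segments, key=lambda s: s.index)
    if not ordered:
        return b""
    if [s.index for s in ordered] != list(range(ordered[0].total)):
        raise ValueError("missing or duplicate segment")
    return b"".join(s.payload for s in ordered)


# A 10-byte object spread over media that each hold 4 bytes.
obj = bytes(range(10))
segments = span_object(obj, 4)
restored = reassemble(reversed(segments))
```

Because each segment records its own position and the total count, the pieces can be read back from the media in any order, and a missing cartridge is detected instead of silently producing a truncated object.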

This scalability helps to ensure that AXF Objects may be stored on any type or generation of media. The use of “collected sets” permits archive operators to make changes to AXF Objects or files within them, while preserving all earlier versions, even when write-once storage is used.
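Versioning on write-once storage can be approximated with an append-only log, where a "change" is recorded as a new entry and nothing already written is touched. The sketch below illustrates that general principle only; it is not the AXF collected-set mechanism, and the `WriteOnceStore` class is an invented stand-in.

```python
class WriteOnceStore:
    """Append-only store: entries can be added, never altered or removed."""

    def __init__(self):
        self._entries = []  # list of (name, content) in write order

    def append(self, name, content):
        """Write a new version and return its position in the log."""
        self._entries.append((name, content))
        return len(self._entries) - 1

    def versions(self, name):
        """Return every stored version of name, oldest first."""
        return [content for n, content in self._entries if n == name]

    def latest(self, name):
        """Return the most recently written version of name."""
        found = self.versions(name)
        if not found:
            raise KeyError(name)
        return found[-1]


store = WriteOnceStore()
store.append("cut.mov", b"first edit")
store.append("cut.mov", b"second edit")
current = store.latest("cut.mov")
history = store.versions("cut.mov")
```

Readers see the latest version by default, while every earlier version stays retrievable because it was never overwritten.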

The nature of AXF makes it possible for equipment manufacturers and content owners to move content from their current archive systems into the AXF domain in a strategic way that does not require content owners to abandon existing hardware unless or until they are ready to do so.

AXF already has been employed around the world to help businesses store, protect, preserve and transport many petabytes of file-based content, and the format is proving fundamental to many of the cloud-based storage, preservation and IP-based transport services available today.

Currently, the most frequently used archiving medium for media and entertainment is magnetic tape: both LTO-6 tape cartridges and a more expensive enterprise-grade system from Oracle called StorageTek T10000 cartridge technology. These tapes are almost as fast as hard drives, but are lower in cost.

This StorageTek T10000 cartridge has a life expectancy of 30 years and can hold up to 8.5TB of data.

LTO and T10000 drives differ in reliability, speed, performance and capacity. LTO cartridges are not considered enterprise grade, while T10000 cartridges are. An LTO cart, with a life of 15 to 30 years, holds up to 6.25 terabytes of compressed data, with a compressed transfer rate of up to 400 MB/sec.

A single T10000 cart, with a life expectancy of 30 years, holds 8.5 terabytes with a maximum compressed data rate of up to 800 MB/sec. The T10000 technology is twice as fast as LTO technology.
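As a rough sense of scale, the quoted figures can be turned into a best-case time to fill one cartridge. The calculation below takes the capacity and rate numbers above at face value and assumes decimal units (1 TB = 1,000,000 MB) and a sustained peak rate, which real workloads rarely achieve. The function name is purely illustrative.

```python
def hours_to_fill(capacity_tb, rate_mb_per_s):
    """Best-case hours to write capacity_tb at a constant rate_mb_per_s."""
    capacity_mb = capacity_tb * 1_000_000  # decimal units, an assumption
    seconds = capacity_mb / rate_mb_per_s
    return seconds / 3600


lto_hours = hours_to_fill(6.25, 400)   # LTO figures quoted in the article
t10k_hours = hours_to_fill(8.5, 800)   # T10000 figures quoted in the article

print(f"LTO:    {lto_hours:.2f} hours")
print(f"T10000: {t10k_hours:.2f} hours")
```

On these numbers an LTO cartridge takes a little over four hours to fill and a T10000 cartridge just under three, even though the T10000 holds more data.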

Hard drives are also used for archiving, but cost an average of 26 times more than tape-based archiving systems. Flash media is even more expensive. Cloud storage relies on the same underlying media, but moves archiving responsibility to third parties.

A clear understanding of the difference between backup strategies and data archiving is essential for anyone who works with today's IT-centric media systems.
