Disaster recovery strategies

We’ve all heard the phrase Disaster Recovery (DR), but what does it actually mean for broadcasters and content owners, and what constitutes a disaster? DR is a broad term that encompasses a range of scenarios, from catastrophic disaster (for instance, the complete destruction of a whole facility) to operational disaster, such as a transmission server failing. The ideal outcome of any DR strategy is seamless business continuity under all circumstances, with no assets lost.

In the past, DR strategies involved staff picking up boxes of tapes and equipment, jumping in a car, driving to another facility and getting back on air as quickly as possible. These days, being off air for longer than a few seconds, or a minute at most, is a disaster in itself, and with content now being delivered globally, broadcasters have an even greater responsibility to ensure that channels stay on air. They often also face contracts with their channel partners that specify 99.999 per cent airtime, and there are rules on compensation for lost or clipped ads, so there are strong financial drivers for staying on air. If a catastrophe occurs at a facility in one part of the world, a DR strategy needs to be in place that will automatically kick in from another. With the increased availability of wide area bandwidth and dark fibre, it’s much easier to share content globally, but it is still not cheap.
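To put the 99.999 per cent figure in perspective, the short Python sketch below converts an availability target into the off-air time it allows. It assumes the target is measured over a 365-day year of continuous transmission; real contracts may define the measurement window differently, and the function name is purely illustrative.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: non-leap year, 24/7 channel


def downtime_budget_seconds(availability_percent: float,
                            period_seconds: float = SECONDS_PER_YEAR) -> float:
    """Return the off-air time (in seconds) permitted by an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 per cent")
    return period_seconds * (1 - availability_percent / 100)


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999):
        budget = downtime_budget_seconds(target)
        print(f"{target}% -> {budget:.1f} seconds ({budget / 60:.1f} minutes) per year")
```

Under these assumptions, a 99.999 per cent target leaves a budget of roughly 315 seconds, or a little over five minutes, across an entire year, which is why a manual "tapes in the car" recovery no longer fits.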

So how do broadcasters ensure that their DR strategies are robust enough to continue broadcasting in any situation? The DR utopia includes multi-layered safeguards against the unexpected, using automated content replication systems to provide synchronised, mirrored or like-for-like asset duplication, either within the same site or at geographically disparate locations.

At its most straightforward, this can be accomplished by duplicating tapes in the main archive and then moving those tapes to remote DR storage. LTFS works well in this environment as any LTFS-capable system can read a tape created by any other, and can identify and retrieve the files stored on it. This means there is no requirement for a second archive system to simply read those files.

At the other end of the functionality scale, a fully automated DR-configured archive can be connected to a remote facility with either a robotic tape library or disk storage. In this configuration, media assets can be automatically copied across the network and synchronised with the remote site. This model is ideal for broadcasters whose main and DR archives are separated by many hundreds of kilometres.
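As a rough illustration of what "synchronised with the remote site" involves, the Python sketch below compares a manifest of the main archive with one from the DR archive and lists the assets that still need to be copied. The manifest layout (asset ID mapped to a SHA-256 checksum) and the function names are assumptions made for this example, not a description of how any particular archive product works.

```python
import hashlib


def checksum(data: bytes) -> str:
    """SHA-256 digest used to tell whether two copies of an asset are identical."""
    return hashlib.sha256(data).hexdigest()


def plan_sync(main: dict[str, str], remote: dict[str, str]) -> list[str]:
    """Return the IDs of assets to copy from the main site to the remote site.

    An asset needs copying when it is missing at the remote site or when
    its remote checksum differs from the one held at the main site.
    """
    return sorted(asset_id for asset_id, digest in main.items()
                  if remote.get(asset_id) != digest)


if __name__ == "__main__":
    # Hypothetical manifests: asset ID -> checksum of the stored file.
    main_site = {
        "promo_001": checksum(b"promo essence"),
        "news_0730": checksum(b"news essence"),
    }
    dr_site = {
        "promo_001": checksum(b"promo essence"),
    }
    print(plan_sync(main_site, dr_site))
```

Comparing checksums rather than file names alone means a copy that was truncated or corrupted in transit is picked up and scheduled for transfer again.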

As we can see, automated site redundancy is an important factor for broadcasters and can be achieved by using rules-based implementations, providing fully-automated data duplication across multiple storage layers and locations. Disaster Recovery systems enable multi-site operations to be mirrored and data synchronised across the globe. If one site becomes inoperative, it can be rebuilt entirely from data that has been replicated to other sites.

Remote site WAN based SGL FlashNet archive

The more sophisticated archive management systems are able to offer completely customisable rules-based data duplication, through which content can be automatically copied as it is archived across disk and tape layers and, where required, different locations. In single-site scenarios, duplicate tapes can be easily externalised from the storage system, singly or in content-based groups, and removed to safe locations. Once a DR strategy is in place, it’s also important to periodically practise scenarios and test equipment.
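To make the idea of rules-based duplication concrete, here is a minimal Python sketch in which each rule names a content group and the storage targets that should receive a copy when matching content is archived. The rule format, the group names and the target names are all hypothetical and chosen only for illustration; they do not represent FlashNet's actual configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    asset_id: str
    group: str  # hypothetical content grouping, e.g. "news" or "movies"


@dataclass(frozen=True)
class Rule:
    group: str                # content group the rule applies to; "*" matches all
    targets: tuple[str, ...]  # storage layers or locations that receive a copy


def copies_for(asset: Asset, rules: list[Rule]) -> list[str]:
    """Collect every target named by a matching rule, in order, without repeats."""
    targets: list[str] = []
    for rule in rules:
        if rule.group in ("*", asset.group):
            for target in rule.targets:
                if target not in targets:
                    targets.append(target)
    return targets


# Hypothetical policy: everything lands on local disk and tape,
# news is also mirrored to disk at a second site,
# movies get an extra tape copy at the second site.
RULES = [
    Rule("*", ("site_a_disk", "site_a_tape")),
    Rule("news", ("site_b_disk",)),
    Rule("movies", ("site_b_tape", "site_a_tape")),
]

if __name__ == "__main__":
    print(copies_for(Asset("bulletin_0730", "news"), RULES))
```

Keeping the policy as data rather than code means an operator could add a location or a content group by editing the rule list, without touching the copy logic.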

SGL has many archives installed around the world where Disaster Recovery workflows are either in use or can be made DR-capable quickly and easily. Its scalable FlashNet architecture provides broadcasters and content owners with a clustered system of multiple servers, or nodes, each in constant communication. Each cluster node has identical software installed, and each is connected via Fibre Channel to the archive devices, generally disk storage and one or more tape libraries. At the heart of the cluster is a Microsoft SQL Server database, which is usually installed across two servers running a Microsoft cluster for automatic failover.

Successful DR projects around the world will encourage more multi-site operations to adopt a distributed approach to their content management. By ensuring sufficient time is spent on collaborative design, broadcasters can achieve an efficient DR strategy that keeps a channel on air regardless of any disaster, without losing a single frame.

Quite simply, assets equal value, which equals revenue. Can you afford to lose them?

Paul Moran is CTO & joint MD at SGL and Lee Sheppard is director of product management at SGL
