Vendor Content.

Resilience Is… When The Essence Keeps Coming

Our partner Lawo discusses how software-defined broadcast infrastructure can bring true resilience to production systems.

Redundancy has been a major consideration for broadcast infrastructures, especially those that depend on advertising dollars and good viewership ratings. The show needs to go on despite a defective power supply or any other isolated failure. With all eggs in one basket (i.e. in one place and close to one another), this approach certainly has its merits, and has saved quite a few shows. Calling such an approach resilient would nevertheless be a stretch.

Conventional baseband solutions purchased before open-standards-based IP came along may be redundant up to a point, but that doesn’t make the operation as a whole resilient. And while most infrastructures are able to exchange control, audio and/or video data over an on-prem network, connectivity to a wide-area network (WAN) spanning several cities, or even continents, is definitely not on the cards. Yet, the ability to leverage a processing device in a different location may mean that fewer spare units are required on-prem.

Doubling Down

A WAN-based IP workflow has enabled operators to leverage processing resources in off-premise data centers, of which there should be two, for redundancy. Whilst enjoying access to those processing resources—whether dedicated hardware or hardware-agnostic processing apps—from just about anywhere is an indisputable advantage of an IP setup, a lot more is required to make an operation redundant, let alone resilient.

Separating the mixing console from the DSP processing unit and the I/O stageboxes was an important step to reduce cable runs and make it easier to replace one defective part without having to dismantle the entire audio mixing setup. Plus, the introduction of Pooling slices means that consoles in several locations can share the processing capability of one and the same DSP set up in yet another location.

Building WAN communication into all of these separate devices through ST2110 and RAVENNA/AES67 compliance not only made a lot of sense from a network topology point of view: it also triggered great expectations regarding resilience, which were confirmed by a recent PoC (proof of concept) that was subsequently rolled out for almost daily use. The aim was to show the prospective client that immersive audio mixing remains possible even if one of the two processing cores is down. It involved one A__UHD Core in a sporting arena in Hamburg, and a second one, serving as a redundant unit, at the production facility near Frankfurt.

The team successfully demonstrated that if the preferred core at the arena, controlled from an mc² console near Frankfurt for the live broadcast mix, becomes unavailable, the core closer to the console immediately and reliably takes over. The physical location of the second core is irrelevant, by the way. It only needs to be connected to the same (private) network.
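The takeover logic described above can be pictured with a minimal sketch. It is an illustration only, not Lawo's implementation: the `Core` class, the `reachable` flag (standing in for whatever health check a real system performs) and the core names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Core:
    name: str        # hypothetical identifier
    location: str    # informational only; not used for selection
    reachable: bool  # stand-in for the result of a real health check


def select_active_core(cores):
    """Return the first reachable core in priority order, or None."""
    for core in cores:
        if core.reachable:
            return core
    return None


# Preferred core at the arena, standby core near the console.
arena = Core("core-a", "Hamburg arena", reachable=True)
standby = Core("core-b", "Frankfurt facility", reachable=True)
priority = [arena, standby]

first_choice = select_active_core(priority).name   # "core-a"

arena.reachable = False                            # preferred core drops out
after_failure = select_active_core(priority).name  # "core-b"

print(first_choice, after_failure)
```

Note that `location` plays no part in the selection, which mirrors the point made above: the standby core only has to be reachable on the same network.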

WAN-based redundancy is an important building block of a robust resilience strategy, even though, according to the experts manning Lawo’s customer service hotline, the unavailability of an audio processing core is the most unlikely incident in a series of plausible failures from which operators can recover automatically.

This degree of redundancy involves so-called “air-gapped units”, i.e. hardware in two separate locations, to ensure continuity if the “red” data center is flooded or catches fire. In this case, the redundant, “blue” data center automatically takes over.

Strictly speaking, the five likeliest glitches—control connection loss, routing failure, media connection failure, control system failure, and power supply failure—require no hardware redundancy, i.e. a spare unit, when the audio infrastructure is built around an A__UHD Core or .edge unit. That said, having a spare unit online somewhere is always a good idea. It is indeed also required as fail-over for incident number six, DSP/FPGA failure.

At this year’s NAB Show, Lawo announced HOME mc² DSP, a processing app with the same functionality and feel as an A__UHD Core, but in a completely redesigned CPU-based app package that runs on the same standard servers as Lawo’s video HOME Apps. Among other things, HOME mc² DSP is easy and quick to spin up—and can pick up where its hardware audio processing sibling left off. Some manual adjustments will be required, but your audio stack remains resilient despite the relocation from one “planet” to another.

Explode To Reinforce

A second important aspect is to decentralize what used to be in one box. Even certain IP-savvy solutions are still supplied as a single unit that handles both control and processing. For maximum resilience, one device should do the processing, while a standard server or dockerized container transmits the control commands it receives from a mixing console to the processing core, and a switch fabric does the routing.

The same is true of Lawo’s HOME Apps that are, in fact, a collection of microservices: some of these specialize in a specific kind of audio or video processing, while others act as “paraphernalia”, supplying format, compression and transport protocol conversions, the required number of inputs and outputs, etc. Here again, due care has been taken to allow one HOME App instance to stand in for another should the need arise, and to do so at breathtaking speed.

Separating control, processing and routing, and making all three redundant, minimizes the risk of downtime. Plus, except for at least one switch close to each required component, all processing devices or CPU services can be in different geographic locations.

Your Humble Server

And it doesn’t stop there. A redundant IP network with red and blue paths is built around a switch fabric. Without going into too much detail, certain management protocols (PIM and IGMP) may cause issues that could seriously affect broadcast workflows or even bring them to a grinding halt.

The first is related to situations where the red and blue paths are routed to the same spine switch. An issue with that switch means that this part of the network not only ceases to be redundant but may stop working altogether: it is a single point of failure.
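To make the single-point-of-failure condition concrete, here is a minimal sketch that checks whether a red and a blue path have any switch in common. The switch names and the list-of-hops representation are assumptions made for illustration; a real network controller would derive the paths from its own topology data.

```python
def shared_hops(red_path, blue_path):
    """Return the switches that appear on both paths, sorted by name."""
    return sorted(set(red_path) & set(blue_path))


# Hypothetical hop lists for one media flow.
red = ["leaf-r1", "spine-1", "leaf-r2"]
blue_same_spine = ["leaf-b1", "spine-1", "leaf-b2"]
blue_own_spine = ["leaf-b1", "spine-2", "leaf-b2"]

print(shared_hops(red, blue_same_spine))  # ['spine-1']
print(shared_hops(red, blue_own_spine))   # []
```

An empty result means the two paths are disjoint; any switch in the result would take down both red and blue if it failed.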

The second issue is related to how switches distribute multicast streams over the available number of ports if they are not bandwidth-aware. In a non-SDN network, this may lead to situations where one port is oversubscribed, i.e. asked to transmit more gigabits per second than it can muster. This may cause errors at the receiving end.
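The arithmetic behind oversubscription can be shown with a small sketch that compares a placement that ignores bandwidth with one that takes it into account. The port capacity, the stream rates (in Gb/s) and both placement functions are illustrative assumptions, not a description of how any particular switch or SDN controller works.

```python
def round_robin(streams, n_ports):
    """Place streams on ports in turn, ignoring their bandwidth."""
    loads = [0] * n_ports
    for i, rate in enumerate(streams):
        loads[i % n_ports] += rate
    return loads


def bandwidth_aware(streams, n_ports, capacity):
    """Place each stream on the currently least-loaded port."""
    loads = [0] * n_ports
    for rate in streams:
        port = min(range(n_ports), key=lambda p: loads[p])
        if loads[port] + rate > capacity:
            raise RuntimeError("no port has enough headroom")
        loads[port] += rate
    return loads


CAPACITY = 10            # Gb/s per port (assumed)
streams = [6, 1, 6, 1]   # Gb/s per multicast stream (assumed)

naive = round_robin(streams, 2)
aware = bandwidth_aware(streams, 2, CAPACITY)
oversubscribed = [p for p, load in enumerate(naive) if load > CAPACITY]

print(naive, oversubscribed)  # [12, 2] [0]
print(aware)                  # [7, 7]
```

With the same four streams and the same two ports, the naive placement asks port 0 to carry 12 Gb/s on a 10 Gb/s link, while the bandwidth-aware placement keeps both ports at 7 Gb/s.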

These and other topics are being addressed by companies like Arista and Lawo via a Multi Control Service routine and the VSM studio manager’s direct influence on traffic shaping. The goal is to avoid failures and the oversubscription of network ports, and to allow operators of large installations to immediately confirm the status of their switching and routing operations.

Combining the above with the HOME management platform for IP infrastructures adds yet another building block. HOME assists operators with automatic discovery and registration, and with controlling processing cores by hosting the control software for mc² consoles on networked standard servers. It also allows them to switch dynamically from one processing core to the other, one console surface to the next, or one XCS mixing control instance to another if the need arises.

Stay In Control

Resilience necessarily also includes control. VSM achieves seamless control redundancy with two pairs of COTS servers stationed in two different locations and automatic fail-over routines. Hardware control panels are not forgotten: if one stops working, connecting a spare, or firing up a software panel, and assigning it the same ID—which takes less than a minute—restores interactive control. And just so you know: the control status as such is not affected by control hardware failures.

As installs migrate towards a private cloud/data center infrastructure, provisioning two (or in HOME’s case, three) geographically distanced standard servers with permanent status updates between the main and the redundant units allows users to remain in control. If the underlying software architecture is cloud-ready, those who wish can ultimately move from hardware servers to service-based infrastructures in the cloud. Technologies like Kubernetes and AWS Load Balancer can then be called upon to provide elastic compute capacity that instantly grows and shrinks in line with changing workflow requirements. A welcome side effect of this is that no new hardware servers need to be purchased to achieve this kind of instant, high-level resilience.

After experiencing the benefits of resilient, elastic and thoroughly redundant control, some operators may wonder whether a similar strategy is also possible for Lawo’s audio and video HOME Apps. The short answer is: “Yes.” Quite a few operators are wary of the “intangible cloud” and may be relieved to learn that the ability to architect private data centers using standard servers in a redundant configuration already allows them to achieve a high degree of “private” resilience.

One Leap Closer

A genuinely resilient broadcast or AV network is a self-healing architecture that always finds a way to get essences from A to B in a secure way. Users may not know, or care, where those locations are; but the tools they use to control them do. They are even good at quickly finding alternatives to keep the infrastructure humming.

The only remaining snag was to provide operators with an almost failsafe infrastructure. A lot has been achieved in this area to make broadcast and AV infrastructures resilient by design while keeping them intuitive to operate.