Building Software Defined Infrastructure: Shifting Data

The fundamental principles of how data flows through local and remote processing systems are central to designing software defined infrastructure.

To fully appreciate the complexity of moving large amounts of data in software defined infrastructures, we need to look more closely at the underlying hardware and how we overcome the challenges it presents.

The first point to note is that traditional SDI/AES infrastructures are primarily designed to move large amounts of data with the smallest possible delay whilst maintaining the highest data integrity. Synchronous distribution removes the need for packet headers so that most of the available capacity on the datalink can be dedicated to delivering the user data, which in the context of television is video and audio. Maintaining data integrity means that as much data as possible must be delivered to the receiver free from loss and distortion.

IP networks and their associated routers, switches and other infrastructure equipment, such as servers and file storage, exhibit relatively high latency when compared to traditional broadcast equipment. This is a direct consequence of the asynchronous nature of IT equipment. With a few exceptions, such as the safety-critical applications found in aircraft and medical systems, virtually all IT-COTS infrastructures rely on asynchronous data exchange and processing. This is by design, to keep systems as simple and flexible as possible.

Asynchronous By Design

Most IT infrastructures used in web applications operate in a transactional manner. For example, a web browser requests a web page, or a string of text is sent to the server to request a response. These request-reply exchanges are transactional, and therefore asynchronous by design. This is why synchronous data exchange is rarely found in IT-COTS applications: the latency simply defines the user response time, and if a web page responds within a few hundred milliseconds then the person using it doesn’t care too much. The same cannot be said for video and audio, especially in the studio and playout environment where consistent and predictable low latency is critical.
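To make this transactional pattern concrete, the sketch below (in C, using standard POSIX sockets; the host name and request string are placeholders) issues a single blocking request and waits for the reply. The caller simply idles through whatever latency the network imposes, which is tolerable for a web page but unacceptable for a studio video stream.

```c
/* Minimal request-reply transaction over POSIX sockets.
 * The host and request are illustrative placeholders. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netdb.h>
#include <sys/socket.h>

int main(void) {
    struct addrinfo hints = { .ai_socktype = SOCK_STREAM }, *res;
    if (getaddrinfo("example.com", "80", &hints, &res) != 0) return 1;

    int fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    if (fd < 0 || connect(fd, res->ai_addr, res->ai_addrlen) < 0) return 1;

    /* The transaction: one request... */
    const char *req = "GET / HTTP/1.1\r\nHost: example.com\r\n"
                      "Connection: close\r\n\r\n";
    send(fd, req, strlen(req), 0);

    /* ...then block until the reply arrives; latency only stretches
     * the user's wait, so nothing here depends on precise timing. */
    char buf[4096];
    ssize_t n;
    while ((n = recv(fd, buf, sizeof buf, 0)) > 0)
        fwrite(buf, 1, (size_t)n, stdout);

    close(fd);
    freeaddrinfo(res);
    return 0;
}
```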

Broadcasters often migrate to IT-COTS infrastructures to take advantage of the inherent resilience, reliability and flexibility that they deliver. The downside of this change is that we must make our synchronous video and audio media streams operate on an underlying asynchronous network and server infrastructure with minimal and predictable latency.

CPU Architecture

A further challenge occurs when we dig deep into the server architecture to understand data processing. The fundamental components of a computer server are the CPU and memory. The Von Neumann architecture has stood the test of time and is still the prevalent design for IT-COTS processing systems. In essence, code is loaded into the memory and the CPU then fetches the instructions from the memory and processes them sequentially. The CPU is a hardware instruction processor that relies on a system of registers and a program counter to provide the conditional logic that makes the computer programmable.
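As an illustration of this fetch-execute cycle, here is a toy sketch in C: instructions and data share one memory, a program counter selects the next instruction, and a small register file plus a conditional jump make the machine programmable. The three-opcode instruction set is invented purely for this example.

```c
#include <stdio.h>
#include <stdint.h>

enum { OP_LOADI, OP_ADD, OP_JNZ, OP_HALT };

typedef struct { uint8_t op, a, b; int16_t imm; } Instr;

int main(void) {
    /* "Code loaded into the memory": count r0 down from 3 to 0. */
    Instr mem[] = {
        { OP_LOADI, 0, 0,  3 },   /* r0 = 3             */
        { OP_LOADI, 1, 0, -1 },   /* r1 = -1            */
        { OP_ADD,   0, 1,  0 },   /* r0 = r0 + r1       */
        { OP_JNZ,   0, 0,  2 },   /* if r0 != 0 goto 2  */
        { OP_HALT,  0, 0,  0 },
    };
    int16_t reg[4] = {0};
    size_t pc = 0;                          /* program counter */

    for (;;) {
        Instr in = mem[pc++];               /* fetch, advance the counter */
        switch (in.op) {                    /* decode and execute */
        case OP_LOADI: reg[in.a] = in.imm;                  break;
        case OP_ADD:   reg[in.a] += reg[in.b];              break;
        case OP_JNZ:   if (reg[in.a]) pc = (size_t)in.imm;  break;
        case OP_HALT:  printf("r0 = %d\n", reg[0]);         return 0;
        }
    }
}
```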

As well as the code instructions residing in the memory, the data the CPU is processing also needs to be held in that memory. In a traditional design this data arrives through devices such as the ethernet NIC, or resides on disk drives, and it is the responsibility of the CPU to load the data from these devices into its local memory for processing.

The act of moving the data in traditional server architectures places a huge burden on the CPU, which in turn causes potentially massive latency, much of which is unpredictable. In a typical signal flow, the video and audio media come into the server via the ethernet NIC, and from there the CPU copies the streams into its local memory for processing. When processing is complete, the video and audio media is either copied back to the ethernet NIC for transfer to the next device or copied locally to the server’s hard disk drive. Due to the huge amounts of data involved in streaming video and audio, the latencies in this sort of workflow quickly compound and make the whole server architecture virtually unusable.
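This copy-heavy path can be sketched as follows, under some assumptions: a hypothetical relay_frame() helper reads one video frame from a socket attached to the NIC and writes it toward a disk file descriptor, so the CPU touches every byte twice. The frame size is an illustrative 1080p 8-bit 4:2:2 frame.

```c
#include <unistd.h>
#include <sys/socket.h>

enum { FRAME_BYTES = 4147200 };  /* illustrative 1080p 8-bit 4:2:2 frame */

/* Hypothetical helper: relay one frame from NIC to disk.
 * Returns 0 on success, -1 on error. */
int relay_frame(int nic_fd, int disk_fd) {
    static unsigned char frame[FRAME_BYTES];

    /* Copy 1: the kernel moves packet payloads into our buffer;
     * the CPU handles every byte on the way. */
    size_t got = 0;
    while (got < FRAME_BYTES) {
        ssize_t n = recv(nic_fd, frame + got, FRAME_BYTES - got, 0);
        if (n <= 0) return -1;
        got += (size_t)n;
    }

    /* Copy 2: the CPU pushes the same bytes back out toward the disk.
     * At 25 frames/s this is over 100 MB/s of pure copying per stream,
     * in each direction, before any actual processing happens. */
    size_t put = 0;
    while (put < FRAME_BYTES) {
        ssize_t n = write(disk_fd, frame + put, FRAME_BYTES - put);
        if (n <= 0) return -1;
        put += (size_t)n;
    }
    return 0;
}
```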

Kernel Bypass

Building on the success of other industries, broadcasters can take advantage of techniques such as kernel bypass. This is a form of direct memory access (DMA) where a hardware accelerator transfers the data directly from the ethernet NIC’s memory into the CPU’s local system memory, thus negating the need for the CPU to copy the data to and from the system memory itself.

Figure 1 – The image on the left shows a traditional transfer relying on CPU and operating system data copying, resulting in excessive latency. The image on the right shows the kernel bypass approach using RDMA, which requires very little CPU overhead, resulting in high-bandwidth signal transfer with very low latency.


Employing such a strategy speeds up the transfer by orders of magnitude as the data transfer becomes a dedicated hardware task that only briefly involves the CPU. Instead of copying data from one device to another, which is highly wasteful of resources, the CPU sets up a series of registers so that the DMA hardware knows where to copy the data from and where to send it to. When the transfer is complete, the DMA engine sets a flag in one of its control registers to let the CPU know, allowing it to process the data. This method of kernel bypass using the processor’s DMA subsystem effectively synchronizes the data transfer with the CPU to keep latency to a minimum within an asynchronous environment.
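A simplified sketch of this register handshake follows. The register layout is entirely hypothetical, as real DMA engines differ widely, but the pattern is representative: the CPU programs the source, destination and length, starts the engine, and later tests a completion flag rather than copying any data itself.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical memory-mapped DMA engine registers. */
typedef struct {
    volatile uint64_t src_addr;  /* where to copy from (e.g. NIC buffer) */
    volatile uint64_t dst_addr;  /* where to copy to (system memory)     */
    volatile uint32_t length;    /* bytes to transfer                    */
    volatile uint32_t control;   /* bit 0: start transfer                */
    volatile uint32_t status;    /* bit 0: transfer complete             */
} dma_regs;

/* Programming the engine is a handful of register writes,
 * after which the CPU is free to do useful work. */
static void dma_start(dma_regs *dma, uint64_t src, uint64_t dst, uint32_t len) {
    dma->src_addr = src;
    dma->dst_addr = dst;
    dma->length   = len;
    dma->control  = 1u;          /* kick off the hardware transfer */
}

/* The CPU only briefly re-enters the picture to check the flag. */
static bool dma_done(const dma_regs *dma) {
    return (dma->status & 1u) != 0;
}
```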

A modern COTS server employs PCIe buses as a method of transferring high speed data from one device to another within the server. DMA engines are employed within the PCIe subsystem to transfer data to and from many different devices so that the CPU doesn’t have to. These devices not only include ethernet NICs and disk drives but can also include GPU graphics cards and math coprocessor cards. The PCIe controller, working alongside the DMA controller, makes sure that there are no data clashes on the PCIe buses so that data integrity is maintained and data throughput is as high as possible, hence keeping latency low.

Extending DMA To Networks

Although the DMA mechanism resides locally within a server architecture, it can be extended to a much greater domain through RDMA (Remote Direct Memory Access). RDMA effectively expands the concept of DMA to exchange data between physically separate devices via the IP network.

RDMA facilitates the transfer of data from one device to another via the IP network such that the data is sent from the sender’s memory directly to the receiver’s memory via the RDMA protocol. In this context, when we speak of devices, we mean other servers or microservice software defined processes.

In traditional IT-COTS systems, this type of transfer would be CPU resource intensive, as the data would have to be physically copied from the sender’s memory to the ethernet NIC, then from the receiver’s ethernet NIC into the system memory for processing. The burden on the sender’s and receiver’s CPUs would be so extensive that the overall processing would be greatly delayed, with latency that is at best unpredictable and at worst excessive.

The RDMA protocol is effectively abstracted from the general operation through the concept of APIs. The API software interfaces provide a method of allowing the controlling software to set up the source and destination end points for the data. If we extend the concept of “the data” to a signal flow, then it can be seen that RDMA forms the basis of a signal flow from the source to the destination, whether this is occurring locally within one physical server, or across a network to multiple servers.
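As one concrete example of such an API, the sketch below uses libibverbs, a widely deployed RDMA programming interface (one possibility among several). It opens the first available RDMA device and registers a frame buffer so the NIC can transfer data directly to and from it without CPU copies; the rkey it prints is the token a remote peer would use to address this memory. Error handling is trimmed for brevity.

```c
#include <stdio.h>
#include <stdlib.h>
#include <infiniband/verbs.h>

enum { FRAME_BYTES = 4147200 };  /* illustrative 1080p 8-bit 4:2:2 frame */

int main(void) {
    int n = 0;
    struct ibv_device **devs = ibv_get_device_list(&n);
    if (!devs || n == 0) { fprintf(stderr, "no RDMA device\n"); return 1; }

    struct ibv_context *ctx = ibv_open_device(devs[0]);
    struct ibv_pd *pd = ibv_alloc_pd(ctx);       /* protection domain */

    /* Register the buffer: pin it and hand its address translation to
     * the NIC so a remote peer can write frames straight into it. */
    void *frame = malloc(FRAME_BYTES);
    struct ibv_mr *mr = ibv_reg_mr(pd, frame, FRAME_BYTES,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_WRITE);

    /* A peer needs this address and rkey to target our memory. */
    printf("buffer %p registered, rkey=0x%x\n", mr->addr, mr->rkey);

    ibv_dereg_mr(mr);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    free(frame);
    return 0;
}
```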

RDMA For Signal Flow

If we extrapolate the concept of data transfer to that of signal flow, then it doesn’t take much of an intellectual leap to think of RDMA in terms of signal flow. Each device, whether it is a physical server, virtual machine, or microservice, can be thought of as an end point in a data exchange. By employing RDMA, the server’s CPU no longer has to be directly involved in the transfer of data and can instead focus on processing the video and audio media streams.

The signal flow through RDMA requires the controller to establish the source and destination end points via an API call, which will facilitate the video and audio media transfer. Upon completion, the destination device, virtual machine, or microservice will be able to process the signal as if it had arrived as a synchronous video or audio signal.
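One way this endpoint setup can look in practice is sketched below using the librdmacm connection manager; this is a common approach rather than the only one, and the destination address and port are placeholders. The subsequent completion events, queue pair creation, route resolution and connect calls are omitted to keep the sketch short.

```c
#include <stdio.h>
#include <netdb.h>
#include <rdma/rdma_cma.h>

int main(void) {
    struct rdma_event_channel *ch = rdma_create_event_channel();
    struct rdma_cm_id *id = NULL;
    if (!ch || rdma_create_id(ch, &id, NULL, RDMA_PS_TCP)) return 1;

    /* Placeholder destination: the receiving server or microservice. */
    struct addrinfo *ai;
    if (getaddrinfo("192.0.2.10", "7471", NULL, &ai)) return 1;

    /* Control plane: resolve the destination address. The result is
     * delivered asynchronously as an event on 'ch' (handling not shown);
     * rdma_resolve_route() and rdma_connect() would then follow. */
    if (rdma_resolve_addr(id, NULL, ai->ai_addr, 2000)) return 1;

    printf("endpoint created; address resolution started\n");
    freeaddrinfo(ai);
    rdma_destroy_id(id);
    rdma_destroy_event_channel(ch);
    return 0;
}
```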

There are many other variables that need to be considered when transferring large amounts of data, such as data link latency, bottlenecks, and packet loss, but employing strategies such as RDMA greatly improves video and audio signal flow through microservice and software defined architectures.
