Building Software Defined Infrastructure: Shifting Data

The fundamental principles of how data flows through local and remote processing systems are central to designing software defined infrastructure.

To fully appreciate the complexity of moving large amounts of data in software defined infrastructures, we need to look more closely at the underlying hardware and how we overcome the challenges it presents.

The first point to note is that traditional SDI/AES infrastructures are primarily designed to move large amounts of data with the smallest possible delay whilst maintaining the highest data integrity. Synchronous distribution removes the need for packet headers so that most of the available capacity on the datalink can be dedicated to delivering the user data, which in the context of television is video and audio. Maintaining data integrity means that as much data as possible must be delivered to the receiver free from loss and distortion.

IP networks and their associated routers, switches and other infrastructure equipment, such as servers and file storage, exhibit relatively high latency when compared to traditional broadcast equipment. This is a direct consequence of the asynchronous nature of IT equipment. With a few exceptions, such as the safety-critical applications found in aircraft and medical systems, virtually all IT-COTS infrastructures rely on asynchronous data exchange and processing. This is by design, to keep systems as simple and flexible as possible.

Asynchronous By Design

Most IT infrastructures used in web applications operate in a transactional manner. For example, a web browser requests a web page, or a string of text is sent to the server to request a response. These request-reply exchanges are transactional, and therefore asynchronous by design. This is why synchronous data exchange is rarely found in IT-COTS applications: the latency simply defines the user response time, and if a web page responds within a few hundred milliseconds then the person using it doesn’t care too much. The same cannot be said for video and audio, especially in the studio and playout environment where consistent and predictable low latency is critical.
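To make this transactional pattern concrete, the sketch below (in C, using standard POSIX sockets; the host name and request string are placeholders) issues a single blocking request and waits for the reply. The caller simply idles through whatever latency the network imposes, which is tolerable for a web page but unacceptable for a studio video stream.

```c
/* Minimal request-reply transaction over POSIX sockets.
 * The host and request are illustrative placeholders. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netdb.h>
#include <sys/socket.h>

int main(void) {
    struct addrinfo hints = { .ai_socktype = SOCK_STREAM }, *res;
    if (getaddrinfo("example.com", "80", &hints, &res) != 0) return 1;

    int fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    if (fd < 0 || connect(fd, res->ai_addr, res->ai_addrlen) < 0) return 1;

    /* The transaction: one request... */
    const char *req = "GET / HTTP/1.1\r\nHost: example.com\r\n"
                      "Connection: close\r\n\r\n";
    send(fd, req, strlen(req), 0);

    /* ...then block until the reply arrives; latency only stretches
     * the user's wait, so nothing here depends on precise timing. */
    char buf[4096];
    ssize_t n;
    while ((n = recv(fd, buf, sizeof buf, 0)) > 0)
        fwrite(buf, 1, (size_t)n, stdout);

    close(fd);
    freeaddrinfo(res);
    return 0;
}
```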

Broadcasters often migrate to IT-COTS infrastructures to take advantage of the inherent resilience, reliability and flexibility that they deliver. The downside of this change is that we must make our synchronous video and audio media streams operate on an underlying asynchronous network and server infrastructure with minimal and predictable latency.

CPU Architecture

A further challenge occurs when we dig deep into the server architecture to understand data processing. The fundamental components of a computer server are the CPU and memory. The Von Neumann architecture has stood the test of time and is still the prevalent design for IT-COTS processing systems. In essence, code is loaded into the memory and the CPU then fetches the instructions from the memory and processes them sequentially. The CPU is a hardware instruction processor that relies on a system of registers and a program counter to provide the conditional logic that makes the computer programmable.
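As an illustration of this fetch-execute cycle, here is a toy sketch in C: instructions and data share one memory, a program counter selects the next instruction, and a small register file plus a conditional jump make the machine programmable. The three-opcode instruction set is invented purely for this example.

```c
#include <stdio.h>
#include <stdint.h>

enum { OP_LOADI, OP_ADD, OP_JNZ, OP_HALT };

typedef struct { uint8_t op, a, b; int16_t imm; } Instr;

int main(void) {
    /* "Code loaded into the memory": count r0 down from 3 to 0. */
    Instr mem[] = {
        { OP_LOADI, 0, 0,  3 },   /* r0 = 3             */
        { OP_LOADI, 1, 0, -1 },   /* r1 = -1            */
        { OP_ADD,   0, 1,  0 },   /* r0 = r0 + r1       */
        { OP_JNZ,   0, 0,  2 },   /* if r0 != 0 goto 2  */
        { OP_HALT,  0, 0,  0 },
    };
    int16_t reg[4] = {0};
    size_t pc = 0;                          /* program counter */

    for (;;) {
        Instr in = mem[pc++];               /* fetch, advance the counter */
        switch (in.op) {                    /* decode and execute */
        case OP_LOADI: reg[in.a] = in.imm;                  break;
        case OP_ADD:   reg[in.a] += reg[in.b];              break;
        case OP_JNZ:   if (reg[in.a]) pc = (size_t)in.imm;  break;
        case OP_HALT:  printf("r0 = %d\n", reg[0]);         return 0;
        }
    }
}
```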

As well as the code instructions residing in the memory, the data the CPU is processing also needs to be held in that memory. In a traditional design this data arrives through devices such as the ethernet NIC, or resides on disk drives, and it is the responsibility of the CPU to load the data from these devices into its local memory for processing.

The act of moving the data in traditional server architectures places a huge burden on the CPU, which in turn causes potentially massive latency, much of which is unpredictable. In a typical signal flow, the video and audio media come into the server via the ethernet NIC, and from there the CPU copies the streams into its local memory for processing. When processing is complete, the video and audio media is either copied back to the ethernet NIC for transfer to the next device or copied locally to the server’s hard disk drive. Due to the huge amounts of data involved in streaming video and audio, the latencies in this sort of workflow quickly compound and make the whole server architecture virtually unusable.
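This copy-heavy path can be sketched as follows, under some assumptions: a hypothetical relay_frame() helper reads one video frame from a socket attached to the NIC and writes it toward a disk file descriptor, so the CPU touches every byte twice. The frame size is an illustrative 1080p 8-bit 4:2:2 frame.

```c
#include <unistd.h>
#include <sys/socket.h>

enum { FRAME_BYTES = 4147200 };  /* illustrative 1080p 8-bit 4:2:2 frame */

/* Hypothetical helper: relay one frame from NIC to disk.
 * Returns 0 on success, -1 on error. */
int relay_frame(int nic_fd, int disk_fd) {
    static unsigned char frame[FRAME_BYTES];

    /* Copy 1: the kernel moves packet payloads into our buffer;
     * the CPU handles every byte on the way. */
    size_t got = 0;
    while (got < FRAME_BYTES) {
        ssize_t n = recv(nic_fd, frame + got, FRAME_BYTES - got, 0);
        if (n <= 0) return -1;
        got += (size_t)n;
    }

    /* Copy 2: the CPU pushes the same bytes back out toward the disk.
     * At 25 frames/s this is over 100 MB/s of pure copying per stream,
     * in each direction, before any actual processing happens. */
    size_t put = 0;
    while (put < FRAME_BYTES) {
        ssize_t n = write(disk_fd, frame + put, FRAME_BYTES - put);
        if (n <= 0) return -1;
        put += (size_t)n;
    }
    return 0;
}
```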

Kernel Bypass

Building on the success of other industries, broadcasters can take advantage of techniques such as kernel bypass. This is a form of direct memory access (DMA) where a hardware accelerator transfers the data directly from the ethernet NIC’s memory into the CPU’s local system memory, thus negating the need for the CPU to copy the data to and from the system memory itself.

Figure 1 – The image on the left shows a traditional transfer relying on CPU and operating system data copying, resulting in excessive latency. The image on the right shows the kernel bypass approach using RDMA, which requires very little CPU overhead, resulting in high-bandwidth signal transfer with very low latency.


Employing such a strategy speeds up the transfer by orders of magnitude as the data transfer becomes a dedicated hardware task that only briefly involves the CPU. Instead of copying data from one device to another, which is highly wasteful of resources, the CPU sets up a series of registers so that the DMA hardware knows where to copy the data from and where to send it to. When the transfer is complete, the DMA engine sets a flag in one of its control registers to let the CPU know, allowing it to process the data. This method of kernel bypass using the processor’s DMA subsystem effectively synchronizes the data transfer with the CPU to keep latency to a minimum within an asynchronous environment.
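A simplified sketch of this register handshake follows. The register layout is entirely hypothetical, as real DMA engines differ widely, but the pattern is representative: the CPU programs the source, destination and length, starts the engine, and later tests a completion flag rather than copying any data itself.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical memory-mapped DMA engine registers. */
typedef struct {
    volatile uint64_t src_addr;  /* where to copy from (e.g. NIC buffer) */
    volatile uint64_t dst_addr;  /* where to copy to (system memory)     */
    volatile uint32_t length;    /* bytes to transfer                    */
    volatile uint32_t control;   /* bit 0: start transfer                */
    volatile uint32_t status;    /* bit 0: transfer complete             */
} dma_regs;

/* Programming the engine is a handful of register writes,
 * after which the CPU is free to do useful work. */
static void dma_start(dma_regs *dma, uint64_t src, uint64_t dst, uint32_t len) {
    dma->src_addr = src;
    dma->dst_addr = dst;
    dma->length   = len;
    dma->control  = 1u;          /* kick off the hardware transfer */
}

/* The CPU only briefly re-enters the picture to check the flag. */
static bool dma_done(const dma_regs *dma) {
    return (dma->status & 1u) != 0;
}
```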

A modern COTS server employs PCIe buses as a method of transferring high speed data from one device to another within the server. DMA engines are employed within the PCIe subsystem to transfer data to and from many different devices so that the CPU doesn’t have to. These devices not only include ethernet NICs and disk drives but can also include GPU graphics cards and math coprocessor cards. The PCIe controller, working alongside the DMA controller, makes sure that there are no data clashes on the PCIe buses so that data integrity is maintained and data throughput is as high as possible, hence keeping latency low.

Extending DMA To Networks

Although the DMA mechanism resides locally within a server architecture, it can be extended to a much greater domain through RDMA (Remote Direct Memory Access). RDMA effectively expands the concept of DMA to exchange data between physically separate devices via the IP network.

RDMA facilitates the transfer of data from one device to another via the IP network such that the data is sent from the sender’s memory directly to the receiver’s memory via the RDMA protocol. In this context, when we speak of devices, we mean other servers or microservice software defined processes.

In traditional IT-COTS systems, this type of transfer would be CPU resource intensive, as the data would have to be physically copied from the sender’s memory to the ethernet NIC, then from the receiver’s ethernet NIC into the system memory for processing. The burden on the sender’s and receiver’s CPUs would be so extensive that the overall processing would be greatly delayed, with latency that is at best unpredictable and at worst excessive.

The RDMA protocol is effectively abstracted from the general operation through the concept of APIs. The API software interfaces provide a method of allowing the controlling software to set up the source and destination end points for the data. If we extend the concept of “the data” to a signal flow, then it can be seen that RDMA forms the basis of a signal flow from the source to the destination, whether this is occurring locally within one physical server, or across a network to multiple servers.
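As one concrete example of such an API, the sketch below uses libibverbs, a widely deployed RDMA programming interface (one possibility among several). It opens the first available RDMA device and registers a frame buffer so the NIC can transfer data directly to and from it without CPU copies; the rkey it prints is the token a remote peer would use to address this memory. Error handling is trimmed for brevity.

```c
#include <stdio.h>
#include <stdlib.h>
#include <infiniband/verbs.h>

enum { FRAME_BYTES = 4147200 };  /* illustrative 1080p 8-bit 4:2:2 frame */

int main(void) {
    int n = 0;
    struct ibv_device **devs = ibv_get_device_list(&n);
    if (!devs || n == 0) { fprintf(stderr, "no RDMA device\n"); return 1; }

    struct ibv_context *ctx = ibv_open_device(devs[0]);
    struct ibv_pd *pd = ibv_alloc_pd(ctx);       /* protection domain */

    /* Register the buffer: pin it and hand its address translation to
     * the NIC so a remote peer can write frames straight into it. */
    void *frame = malloc(FRAME_BYTES);
    struct ibv_mr *mr = ibv_reg_mr(pd, frame, FRAME_BYTES,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_WRITE);

    /* A peer needs this address and rkey to target our memory. */
    printf("buffer %p registered, rkey=0x%x\n", mr->addr, mr->rkey);

    ibv_dereg_mr(mr);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    free(frame);
    return 0;
}
```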

RDMA For Signal Flow

If we extrapolate the concept of data transfer to that of signal flow, then it doesn’t take much of an intellectual leap to think of RDMA in terms of signal flow. Each device, whether it is a physical server, virtual machine, or microservice, can be thought of as an end point in a data exchange. By employing RDMA, the server’s CPU no longer has to be directly involved in the transfer of data and can instead focus on processing the video and audio media streams.

The signal flow through RDMA requires the controller to establish the source and destination end points via an API call, which will facilitate the video and audio media transfer. Upon completion, the destination device, virtual machine, or microservice will be able to process the signal as if it had arrived as a synchronous video or audio signal.
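One way this endpoint setup can look in practice is sketched below using the librdmacm connection manager; this is a common approach rather than the only one, and the destination address and port are placeholders. The subsequent completion events, queue pair creation, route resolution and connect calls are omitted to keep the sketch short.

```c
#include <stdio.h>
#include <netdb.h>
#include <rdma/rdma_cma.h>

int main(void) {
    struct rdma_event_channel *ch = rdma_create_event_channel();
    struct rdma_cm_id *id = NULL;
    if (!ch || rdma_create_id(ch, &id, NULL, RDMA_PS_TCP)) return 1;

    /* Placeholder destination: the receiving server or microservice. */
    struct addrinfo *ai;
    if (getaddrinfo("192.0.2.10", "7471", NULL, &ai)) return 1;

    /* Control plane: resolve the destination address. The result is
     * delivered asynchronously as an event on 'ch' (handling not shown);
     * rdma_resolve_route() and rdma_connect() would then follow. */
    if (rdma_resolve_addr(id, NULL, ai->ai_addr, 2000)) return 1;

    printf("endpoint created; address resolution started\n");
    freeaddrinfo(ai);
    rdma_destroy_id(id);
    rdma_destroy_event_channel(ch);
    return 0;
}
```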

There are many other variables that need to be considered when transferring large amounts of data, such as data link latency, bottlenecks, and packet loss, but employing strategies such as RDMA greatly improves video and audio signal flow through microservice and software defined architectures.
