PTP Explained - Part 4 - Requirements For Virtualisation Of ST 2110 COTS Infrastructures

In the fourth and final part of this series, we wrap up with an explanation of how PTP is used to support SMPTE ST 2110 based services, and we dive into the timing constraints related to using COTS (Commercial Off-The-Shelf) hardware, i.e. servers.

Up front, we must point out that we are referring to the use of appropriate hardware for the task at hand: datacentre class servers. These are designed to support the processing load that media workflows impose, which is what allows such standard hardware to be used. These systems can be designed to run as a bare-metal solution, with an application on top of an operating system on the hardware, or as one virtualised resource amongst others on that same hardware. The actual performance will depend on the software vendor(s) and the capabilities of the application(s). Nevertheless, at the core there are the same timing and accuracy requirements, set out by the SMPTE ST 2110-21 standard, related to the pacing of the IP packets.

SMPTE ST 2110-21 Traffic Model Requirements

The traffic model defined in SMPTE ST 2110-21 specifies a timing model for senders and receivers of video streams encapsulated using the Real-time Transport Protocol (RTP). Shaping the traffic (i.e. sending data packets at equidistant intervals) minimises Packet Delay Variation (PDV), which could otherwise lead to increased latency or, even worse, dropped packets. The closer to the physical hardware the sender operates, the easier it is to create perfectly shaped traffic (e.g. FPGA based implementations). If the sender is a piece of software installed on an Operating System (OS), whether in a virtualised environment or not, it is abstracted to a certain extent from the network card driver and therefore has less control over the traffic shaping. At times, the OS must handle and prioritise other tasks besides the video application. In a virtualised system, this control is removed even further from the hardware.
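As a rough illustration of what "equidistant intervals" means in numbers, the sketch below computes ideal send times for evenly spaced packets and the peak-to-peak deviation of a set of measured send times from that ideal. It is a simplified, uniformly spaced model only: the frame rate, the packets-per-frame figure and the measured timestamps are hypothetical, and the compliance measurements defined in ST 2110-21 itself are not reproduced here.

```python
from fractions import Fraction


def ideal_send_times_ns(frame_rate, packets_per_frame, count):
    """Ideal send times in nanoseconds for evenly spaced packets, starting at 0.

    Assumes packets are spread uniformly over the whole frame period,
    which is a simplification of the standard's sender models.
    """
    interval_ns = Fraction(1_000_000_000) / (frame_rate * packets_per_frame)
    return [float(i * interval_ns) for i in range(count)]


def peak_to_peak_deviation_ns(actual_ns, ideal_ns):
    """Spread between the latest and the earliest packet relative to the ideal."""
    errors = [a - i for a, i in zip(actual_ns, ideal_ns)]
    return max(errors) - min(errors)


# Hypothetical stream: 50 frames per second, 4000 packets per frame.
ideal = ideal_send_times_ns(50, 4000, 4)

# Hypothetical measured send times from a software sender.
measured = [0.0, 5200.0, 9900.0, 15000.0]

print(ideal)
print(peak_to_peak_deviation_ns(measured, ideal))
```

With these assumed numbers the ideal spacing is 5 microseconds, so a scheduling delay of only a few hundred nanoseconds is already a noticeable fraction of the packet interval.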

Diagram 1 - Packet pacing distribution.

The system-wide level of accuracy required to meet the targeted packet pacing requirements can only be reached with specific hardware support in the end node, so that the timestamps for the PTP messages carrying time information, as well as for the RTP messages within an ST 2110 flow, are taken as close as possible to the physical channel. If the accuracy requirement is violated, the integrity of an ST 2110 flow can be disrupted transiently or even permanently. In a COTS environment, the Network Interface Card (NIC) must therefore provide hardware timestamping support. Unfortunately, this alone isn’t sufficient to maintain the required accuracy. “Intelligent” NICs, with onboard acceleration engines for switching and packet processing that offload a number of network related functions from the host CPU, are also required. This becomes more critical as the interface speed increases and the number of functions performed by the host CPU forms a performance bottleneck. These cards allow specific queues to be assigned to specific CPU cores, improving performance whilst reducing latency to sub-microsecond levels with the right class of server hardware. They provide the basis for the high-performance packet pacing that enables the ST 2110-21 Narrow profile to be achieved. Further details can be found in the “The art of conforming to SMPTE 2110-21 traffic model” NAB 2018 BEITC paper [1].
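On a Linux host, one way to check whether a NIC reports hardware timestamping is the `ethtool -T <interface>` command. The sketch below parses that kind of output. The sample text is illustrative rather than captured from a real device, the interface name is hypothetical, and the exact output wording can differ between ethtool versions, so the string matching here should be treated as an assumption.

```python
import re
import subprocess


def parse_ethtool_timestamping(text):
    """Extract hardware timestamping hints from `ethtool -T` style output."""
    match = re.search(r"PTP Hardware Clock:\s*(\d+)", text)
    return {
        "hw_tx": "hardware-transmit" in text,
        "hw_rx": "hardware-receive" in text,
        # None when no PTP Hardware Clock index is reported.
        "phc_index": int(match.group(1)) if match else None,
    }


def query_interface(iface):
    """Run ethtool for one interface (requires Linux and ethtool installed)."""
    result = subprocess.run(
        ["ethtool", "-T", iface], capture_output=True, text=True, check=True
    )
    return parse_ethtool_timestamping(result.stdout)


# Illustrative sample only, not taken from a real NIC.
SAMPLE = """Time stamping parameters for eth0:
Capabilities:
    hardware-transmit
    software-transmit
    hardware-receive
    software-receive
PTP Hardware Clock: 0
"""

print(parse_ethtool_timestamping(SAMPLE))
```

As the text notes, a positive result here is necessary but not sufficient: it says nothing about the offload and pacing capabilities of the card.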

While these NICs may provide high throughput, this Input/Output (I/O) performance also needs to be passed on to the application(s) sitting on top of the Operating System (OS). This requires a capability called “kernel bypass”, which reduces the OS kernel overhead by enabling the applications to directly access the network adapter resources. The multiple kernel bypass implementations developed by the industry focus on high throughput, reduced latency and reduced CPU usage.

Diagram 2 – kernel bypass mode.

The last piece is the Virtual NIC or vNIC capability, which isolates the PCI Express resources so that the NIC can be shared in a virtual environment. This allows multiple Virtual Machines (VMs) to access the same NIC in a secure manner, as if each VM owned its own NIC, and can be enabled by Single Root Input/Output Virtualisation (SR-IOV). The vNICs are presented to the firmware and operating system as separate devices. Combined with the hardware timestamping and intelligent offload capabilities described above, the vNICs allow these features to be shared amongst multiple VMs.

Server And OS Requirements

Unlike many other types of workloads, the real-time nature of ST 2110 media flows requires tuning of the default settings that are typical of general purpose COTS servers. The servers must therefore be specifically configured with this in mind. Such configuration settings are common for high-performance workloads across multiple industries and are not specific to broadcast media.

It is particularly important to disable any power saving mode(s) that would otherwise impact the processing of network traffic at the lowest levels of the hardware and system by sending the CPU to “sleep” and/or enabling energy saving modes. Additionally, in multi-core systems, it is key to make sure that the application processes running on the host are tied to the CPU and memory that are on the same Non-Uniform Memory Access (NUMA) node as the PCIe bus where the NIC is located. The same applies to memory interleaving settings. Otherwise, resources may be taken from another NUMA node, degrading the performance of the overall real-time scheduler. Further information about these configurations can be found in the “Measurement Methodology and Real-World Compliance Results for ST 2110-21 Devices” NAB 2019 BEITC paper [2].
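The NUMA alignment described above can be sketched on a Linux host using the sysfs entries that report which NUMA node a NIC is attached to and which CPUs belong to that node. This is a minimal sketch under stated assumptions: the helper names are hypothetical, it only sets CPU affinity (memory placement would need a separate mechanism such as numactl), and real deployments usually apply these settings through the application or the orchestration layer.

```python
import os
from pathlib import Path


def parse_cpulist(text):
    """Parse a Linux cpulist string such as '0-3,8-11' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def nic_numa_node(iface):
    """NUMA node of the NIC as reported by sysfs (-1 means not reported)."""
    return int(Path(f"/sys/class/net/{iface}/device/numa_node").read_text())


def pin_to_nic_node(iface, pid=0):
    """Restrict a process (0 = the caller) to the CPUs local to the NIC."""
    node = nic_numa_node(iface)
    if node < 0:
        raise RuntimeError("no NUMA node reported for this device")
    cpulist = Path(f"/sys/devices/system/node/node{node}/cpulist").read_text()
    cpus = parse_cpulist(cpulist)
    os.sched_setaffinity(pid, cpus)
    return cpus


print(sorted(parse_cpulist("0-3,8-11")))
```

Calling `pin_to_nic_node("eth0")`, with a hypothetical interface name, would restrict the calling process to the cores that share a NUMA node with that NIC.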

Virtualised Environments

The key element to keep in mind when designing PTP in a virtualised environment is to avoid having concurrent instances of PTP running in the different virtual machines. Concurrent instances cause additional and unnecessary load on the system and, more to the point, impact accuracy: multiple instances, possibly with different configurations or, even worse, multiple PTP stacks with different implementations (servo design, filters), will wreak havoc on system performance.

There should always be one, and only one, PTP instance per system, accessed directly or indirectly by all the different applications sitting across the different virtualised instances hosted on that system. Whatever the specifics of each virtualised environment, operating system and hypervisor, the system owner must keep in mind that a single representation of the PTP Hardware Clock (PHC) is the key to stability and accuracy.
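To illustrate how an application can consume time from a single PHC without running its own PTP stack, the sketch below reads a PHC on Linux through the kernel's dynamic POSIX clock convention, in which a clock id is derived from an open file descriptor. The device path `/dev/ptp0` is an assumption, and how (or whether) a PHC is exposed inside a guest depends on the hypervisor and NIC in use.

```python
import os
import time

# Marker used by the Linux kernel for file-descriptor based (dynamic) clocks.
CLOCKFD = 3


def fd_to_clockid(fd):
    """Derive a POSIX clock id from an open PHC file descriptor."""
    return (~fd << 3) | CLOCKFD


def read_phc_ns(device="/dev/ptp0"):
    """Read the current PHC time in nanoseconds (Linux only).

    The device path is an assumption; the PHC index differs per system.
    """
    fd = os.open(device, os.O_RDONLY)
    try:
        return time.clock_gettime_ns(fd_to_clockid(fd))
    finally:
        os.close(fd)


print(fd_to_clockid(3))
```

In this arrangement, a single PTP instance disciplines the PHC, while applications only read from it, which keeps to the one-instance-per-system rule described above.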

Each of the virtualised solutions offers different means of exposing the NIC resources, the PHC and the interface to the PTP stack so that the system and, more to the point, the applications can access the required timing information.

A certain complexity may therefore be perceived, but none of it should be unfamiliar to system architects and administrators who deal with virtualised systems in high performance workflows. Fortunately, to ease some of this work, there are industry solutions that provide the required framework to address part or all of this complexity. They allow application vendors to build cross platform (i.e. Windows and Linux) solutions that can leverage these high performance NICs, address the accuracy constraints and deliver IP packets according to the ST 2110-21 traffic profile requirements, even matching the most stringent “Narrow” profile definition.

Industry Validation Via The JT-NM Tested Program

Earlier this year, the Joint Taskforce on Networked Media (JT-NM) orchestrated a testing event during which a number of vendors submitted their equipment to demonstrate ST 2110 related capabilities [3]. Amongst the numerous devices tested, there was a software-based solution built around a COTS NIC. The Mellanox Rivermax software library running on top of the ConnectX-5 NIC passed the relevant tests, demonstrating that building ST 2110 solutions on top of COTS hardware is possible. Additionally, a live demonstration of the virtualised version was given during NAB 2019.

Conclusion

Timing is of the essence for media flows. A detailed understanding of PTP, COTS hardware capabilities, and server and OS parameters is required. None of these are specific to the media industry; they are all principles that are applied across many industries that require high performance compute and networking capabilities.

References:

[1] “The art of conforming to SMPTE 2110-21 traffic model”, NAB BEITC 2018

[2] “Measurement Methodology and Real-World Compliance Results for ST 2110-21 Devices”, NAB BEITC 2019
