Designing IP Broadcast Systems: System Glue

When we think of glue in broadcast infrastructures, we tend to think of the interface equipment that connects different protocols and systems together. However, IP infrastructures add another level of complexity to our concept of glue.

Color converters, HD-to-SD converters, and frame rate synchronizers are just a few examples of glue products in broadcasting. IP glue is concerned not only with video and audio essence conversion and processing, but also with the data link and transport layer exchanges that we have historically taken for granted.

IP systems are powerful because the IP packet is data link layer agnostic. That is, an IP packet can traverse many different data link types without the user knowing and without human intervention. However, to be able to take advantage of this, broadcasters must transfer SDI, AES and MADI into IP in the first place.

There is a school of thought that suggests once a media signal has been transferred into IP, it can stay in that format. In a utopian world there is an element of truth in this, as cloud computing can now process video and audio in real time and deliver it to the viewer over the internet. But even with this in mind, the media essence must be converted into IP in the first place.

Timing Matters

Within the context of a broadcast facility or tier-1 datacenter, IP packets are distributed throughout the network asynchronously. Although ST2110 distributes packets evenly, it still does this asynchronously with respect to all the other video and audio sources in the facility. Therefore, the data link layer can no longer be used as the timing reference plane, unlike SDI, AES and MADI. The sampling clock is built into the data link layer of the SDI, AES and MADI signals, keeping them synchronous and low latency. Because the data samples are guaranteed to arrive in a timely fashion, traditional SDI/AES/MADI infrastructures do not need to be too concerned with latency.

The abstraction of the media essence from the timing plane is a great bonus for IP as it means we do not have to impose synchronous methodologies onto asynchronous networks at the data link layer, which in the context of broadcast infrastructures is usually Ethernet. However, both video and audio are time-invariant sampled systems, which means the samples that make up the images and sound must be played back at exactly the sample rate with which they were acquired. In today’s digital networks we don’t have to be too concerned with pixel accuracy, as there is an abundance of available and affordable memory that allows receive buffers to iron out packet jitter and reordering, but we do need to be concerned with frame accuracy for both video and groups of audio samples.
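To make the buffering idea concrete, the sketch below shows a minimal receive-side jitter buffer: packets arrive out of order with variable delay, and a small buffer indexed by sequence number releases them in order, trading a fixed amount of latency for immunity to jitter. This is an illustrative Python sketch, not a real ST2110 receiver; the class and parameter names are invented for the example.

import heapq

class JitterBuffer:
    """Minimal jitter buffer sketch: holds a few packets so they can be
    released in sequence order at a steady cadence."""
    def __init__(self, depth=8):
        self.depth = depth      # packets held back to absorb network jitter
        self.heap = []          # min-heap ordered by sequence number

    def push(self, seq, payload):
        heapq.heappush(self.heap, (seq, payload))

    def pop_ready(self):
        # Release the oldest packet only once the buffer is deep enough;
        # the depth sets the fixed latency paid for jitter immunity.
        if len(self.heap) >= self.depth:
            return heapq.heappop(self.heap)
        return None

The deeper the buffer, the more jitter and reordering it can absorb, but every extra packet of depth adds latency, which is exactly the trade-off discussed below.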

SMPTE have solved the timing plane challenges with PTP and by timestamping frames of video and audio within the studio. This isn’t meant to impose any form of timing constraint on the processing, but is designed to allow the playback system to reconstruct the temporal spacing of the time-invariant samples so that reliable digital-to-analog conversion can be achieved. After all, the human visual and auditory systems require analog signals for our eyes to see and our ears to hear.
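As an illustration of how those timestamps work, the sketch below maps an ST2110-style RTP timestamp back to PTP wall-clock time. In ST2110, RTP timestamps for video are derived from the PTP epoch using a 90 kHz media clock and truncated to 32 bits; the function name below is invented, and the unwrap logic is a simplified sketch rather than a production implementation.

RATE = 90_000      # ST2110 video media clock rate in Hz
WRAP = 2 ** 32     # RTP timestamps are 32 bits, wrapping roughly every 13 hours

def rtp_to_ptp_seconds(rtp_ts, local_ptp_seconds):
    """Reconstruct the full sender timestamp nearest to the local PTP clock."""
    local_ticks = int(local_ptp_seconds * RATE)
    # Undo the modulo-2^32 truncation by choosing the candidate closest to now.
    base = local_ticks - (local_ticks % WRAP)
    candidates = (base + rtp_ts - WRAP, base + rtp_ts, base + rtp_ts + WRAP)
    full_ticks = min(candidates, key=lambda t: abs(t - local_ticks))
    return full_ticks / RATE

Because every device shares the same PTP clock, a receiver can schedule the reconstructed timestamp for playout and recover the original sample timing.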

When converting SDI/AES/MADI signals to IP within the studio, the embedded sample clock provided by the respective data link layers is replaced by systems such as PTP, and supporting this is an inherent requirement of SDI/AES/MADI-to-IP converters.

One completely valid question is: why bother with the complexity of PTP at all? There is an argument to suggest that we no longer need pixel accurate timing but can rely on frame alignment instead. If we don’t need pixel accurate timing and are more focused on frame alignment, then other protocols that encapsulate the timing plane could also be considered; these include NDI, WebRTC, RTMP, RIST, and SRT.

Latency Matters

Converting to IP from SDI/AES/MADI still requires the timing plane to be maintained, but not necessarily as accurately as PTP demands. One of the key reasons for this is latency. Within a studio environment, latency and packet loss are of great concern to us. Some would argue that packet loss is the greater problem to solve, as the need to resend packets increases latency. To a certain extent, this becomes a decision for the broadcaster and their use-case.

There are many boxes in the marketplace that allow conversion from SDI/AES/MADI to IP, but the devil is in the detail. Is the broadcaster able to accept some latency? If so, how much? Do they really need the sub-microsecond accuracy that PTP and ST2110 deliver, or are they able to accept milliseconds of latency and compression? Understanding which glue boxes to buy is secondary to making these decisions.

Controlling Packet Loss

Another challenge that has a major impact on latency, and therefore on the glue components needed, is the transport protocol employed. Is the broadcaster using UDP or TCP? TCP will guarantee packet delivery, but at the expense of intermittent and variable latency. UDP achieves significantly lower latency but cannot guarantee packet delivery. Therefore, the broadcaster must use some method of Forward Error Correction (FEC) or packet resend to overcome the problems of packet loss.
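To show what FEC does at its simplest, the sketch below implements XOR parity, the principle behind 1-D FEC schemes such as SMPTE ST 2022-5: one parity packet protects a group of media packets, so any single lost packet in the group can be rebuilt without a resend. This is a minimal Python illustration assuming equal-length packets; the function names are invented for the example.

def make_parity(packets):
    """XOR all payloads together to form the parity packet."""
    parity = bytearray(len(packets[0]))
    for pkt in packets:
        for i, b in enumerate(pkt):
            parity[i] ^= b
    return bytes(parity)

def recover(received, parity):
    """Rebuild the single missing packet (marked None) from the parity."""
    missing = bytearray(parity)
    for pkt in received:
        if pkt is not None:
            for i, b in enumerate(pkt):
                missing[i] ^= b
    return bytes(missing)

group = [b"pkt-A---", b"pkt-B---", b"pkt-C---"]
p = make_parity(group)
assert recover([group[0], None, group[2]], p) == group[1]

The cost is bandwidth (one extra packet per group) and a small delay while the group is assembled, which is why FEC suits short error bursts while resends handle longer outages.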

ST2110 flows are typically protected with FEC, which guards against short bursts of errors, but any long-term errors will result in loss of data and hence video and audio corruption. Protocols such as SRT and RIST use UDP but employ a method called ARQ (Automatic Repeat Request), which is designed to maintain low latency even when packets are lost. Groups of packets are sent containing the video, audio, and metadata essence, and if any are lost during transfer the receiver requests them to be resent, thus maintaining data integrity. So, if ARQ keeps data integrity high, this leads to the question of why use SRT/RIST instead of just TCP? The main reason is that ARQ protocols such as SRT/RIST incur less latency than TCP, and SRT/RIST further add a layer of encryption and transport timing to the video and audio essence, thus providing a complete solution.
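The receiver side of ARQ can be reduced to a few lines: watch the sequence numbers and, whenever a gap appears, ask for just the missing packets. The sketch below is a simplified illustration of this NACK-style mechanism, not SRT’s or RIST’s actual implementation; send_nack stands in for the protocol’s real retransmission request.

def on_packet(seq, state, send_nack):
    expected = state["next_seq"]
    if seq > expected:
        # A gap means one or more packets were lost in transit: ask the
        # sender to repeat just those, rather than stalling the whole
        # stream as TCP's strict in-order delivery would.
        for missing in range(expected, seq):
            send_nack(missing)
    state["next_seq"] = max(expected, seq + 1)

state = {"next_seq": 0}
nacks = []
for s in [0, 1, 4, 5]:           # packets 2 and 3 were lost
    on_packet(s, state, nacks.append)
print(nacks)                     # -> [2, 3]

Unlike TCP, the rest of the stream keeps flowing while the resends are in flight, which is how ARQ keeps latency low and bounded.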

Figure 1 – When a network link is congested, the data throughput rapidly decreases even when the data rate is high.

This still doesn’t explain why ARQ-type protocols generally deliver lower latency than TCP. Fundamentally, it comes down to TCP’s adoption of congestion control, which solves one problem at the expense of creating variable and intermittent latency. Back in the 1980s, researchers discovered that data flows between two buildings on a university campus showed high data rates but low data throughput; to all intents and purposes the data link looked as if it was working, but the actual data, such as an FTP file transfer, wasn’t being delivered. After some detailed investigation (which was difficult at the time due to the lack of data analysis tools), the researchers discovered that the TCP instances in several servers had effectively synchronized and were resending lost packets simultaneously. Lost packets resulted in resends, which added to the congestion, generating high data rates but low data throughput, as seen in Figure 1. This was fixed by adding a randomized, ramping resend backoff algorithm as part of congestion control, which meant that no one TCP instance could try to claim all the bandwidth of the link. The data rate stayed the same, but the data throughput returned to its optimum level. However, the cure also resulted in increased and indeterminate latency.
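The randomized, ramping backoff is easy to sketch: each failed send doubles the wait and adds jitter, so competing senders desynchronize rather than retrying in lockstep and deepening the congestion. The parameters below (base delay and cap) are illustrative, not values from any particular TCP implementation.

import random

def backoff_delay(attempt, base=0.05, cap=5.0):
    """Exponential backoff with full jitter, returned in seconds."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))

# Each retry may wait up to twice as long as the last, capped at 5 seconds.
for attempt in range(5):
    print(f"retry {attempt}: wait up to {min(5.0, 0.05 * 2 ** attempt):.2f}s")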

Congestion control is a major component of TCP, but it does have the potential to add massive and variable latency to data flows. ARQ-type protocols do not necessarily have congestion control built into them and consequently have the potential to overflow a link if not used carefully. In managed networks, such as those found in studios, this isn’t so much of a problem; however, when using the public internet or shared services, great care should be exercised in the absence of congestion control, and rate shaping should be considered.
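Rate shaping is commonly done with a token bucket, sketched below: tokens accrue at the target bit rate and a packet may only be sent when enough tokens are available, so an ARQ sender without congestion control cannot flood the link. The class and parameter names are invented for this illustration.

import time

class TokenBucket:
    def __init__(self, rate_bps, burst_bytes):
        self.rate = rate_bps / 8          # refill rate in bytes per second
        self.capacity = burst_bytes       # maximum burst size
        self.tokens = burst_bytes
        self.last = time.monotonic()

    def try_send(self, packet_len):
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= packet_len:
            self.tokens -= packet_len     # spend tokens; caller may transmit
            return True
        return False                      # caller should wait and retry

The burst size controls how far the sender may briefly exceed the average rate, a useful knob when video frames arrive as bursts of packets.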

Other protocols such as NDI are making a big impact on professional broadcasting; although NDI is a proprietary solution, it is free to use. NDI provides an SDK (Software Development Kit) which defines an API that allows vendors to build their own products and use NDI as a transport mechanism. Unfortunately, this means the detail of how NDI operates is hidden from the user. However, there is an argument to suggest this is a great strength, as nobody can tinker with the code.

When deciding which glue components to procure for an IP broadcast facility, whether on-prem managed datacenters, public cloud, or a hybrid of the two, the broadcast engineer needs to first decide on the details of how their infrastructure will use IP. Decisions such as how much latency and packet loss can be tolerated need to be agreed, as does whether video and audio compression can be used. Only when these fundamental parameters have been decided can the question of which type of glue components to buy be answered.
