Standards: Part 7 - ST 2110 - A Review Of The Current Standard
Of all of the broadcast standards, it is perhaps SMPTE ST 2110 which has had the greatest impact on production & distribution infrastructure in recent years, but much has changed since its 2017 release.
This article is part of our growing series on Standards.
There is an overview of all 26 articles in Part 1 - An Introduction To Standards.
ST 2110 facilitates the transport of live streams of media content around a broadcast production environment as a replacement for SDI cabling and to enable entirely new production workflows.
ST 2110 is targeted at that part of a broadcast production architecture that needs to send and receive live video feeds. These often come from external sources such as sports, entertainment or news events. Outgoing feeds from playout systems are delivered to distribution outlets. The quality of the content is paramount. Available network bandwidth should not be a limiting factor. Streaming services operate under a different regime, where video may be compressed to a lower quality so it can work within a limited network capacity.
This time we examine the individual parts of ST 2110 that are currently available. There isn't sufficient space to delve deeply into them all but an overview will cover the more important characteristics. We will come back to deal with them individually later on.
There Are Lots Of Moving Parts
A search for information about ST 2110 reveals many articles that were published immediately after it was introduced in 2017. That early material focuses on the original four core parts of the standard. This was a starting point and some observers suggested the standard was incomplete at that time. From the perspective of 2017, that may be a fair comment because they could not predict how things would evolve over the next 7 years. The situation is quite different now, with many additional parts available:
Document | Description |
---|---|
OV 2110-0 | Overview. |
ST 2110-10 | System Timing. |
ST 2110-20 | Uncompressed Video. |
ST 2110-21 | Traffic Shaping for Video. |
ST 2110-22 | Constant Bit-Rate Compressed Video. |
RP 2110-23 | Single Video Transport split over Multiple Streams. |
RP 2110-24 | Special Considerations for SD Video. |
RP 2110-25 | Measurement Practices. |
ST 2110-30 | PCM Digital Audio. |
ST 2110-31 | AES3 Transport. |
ST 2110-40 | SDI Ancillary Data transport. |
ST 2110-41 | Fast Metadata Transport. |
ST 2110-42 | Fast Metadata Formatting. |
ST 2110-43 | Transport of Timed Text Captions and Subtitles. |
SMPTE standards are generally free of any patent encumbrance. That is not to say there are no patents involved, but they are very rare compared with ISO and MPEG standards for example.
Useful Standards From Other Organisations
To judge any standard to be incomplete on its own is unreasonable. None of the standards we routinely use exist in isolation. Combining standards from different organisations is often necessary. This is evident when you read the ST 2110 overview document (OV 2110-0). Standards are constantly evolving as their working groups discover shortcomings. These are then addressed and the standards are republished with enhancements.
SMPTE manages many other standards that we also need. Other organisations such as AMWA look after NMOS which supplements ST 2110. Precise timing is based on work by IEEE and ISO contributes MPEG and JPEG based video coding tools. Additional help comes from the EBU, IETF and W3C.
Standards bodies don't compete with each other. In fact, they collaborate very closely and share their expertise through liaisons at the working group level. Each standards body has a unique pool of experienced engineers and scientists who specialise in a particular area.
The Networked Media Open Specifications (NMOS) add discoverability, registering of devices, connection and metadata management within the ST 2110 network.
The video and audio formats are not hardwired. Different image sizes and frame-rates including HD and UHD can be supported.
Part 22 describes how to use other video codecs to compress the media. Part 31 supports arbitrary compressed audio formats carried as AES-3 data payloads.
Separated Elementary Streams
The ST 2110 standard mandates that all the elementary streams are de-embedded and delivered individually. This is useful within a production environment. The content is assembled into RTP packets that are time-stamped and synchronised to a precise system-wide clock.
Each stream is delivered to a different IP address. Arguably, this is very wasteful of IP addresses, which may be at a premium depending on the size of your sub-net. The streams might also need to be mapped to the same physical destination hardware device, which may be awkward when using DNS.
By default, RTP can use any UDP port number between 16384 and 32767. Technical details for port number configuration in an ST 2110 context are hard to find. This is something that the working groups could address at some point to avoid the need for using so many IP addresses.
Transport Protocol
The networking uses the IETF Real Time Transport Protocol (RTP) with the packets delivered via UDP. UDP reduces latency because there is no connection handshake or retransmission buffering; TCP would be unsuitable in this context. The trade-off is that dropped or out-of-sequence packets are not automatically recovered.
ST 2110 systems typically manage this potential UDP packet loss with seamless protection switching, described in ST 2022-7: the same stream is sent over two redundant network paths and the receiver reconstructs an error-free stream by taking each packet from whichever copy arrives intact. Forward Error Correction (described in ST 2022-5) is an alternative approach that sends additional data as a small overhead, which is sufficient to fix minor errors in the UDP packet sequence.
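The recovery idea behind ST 2022-7 can be shown in a few lines. This is a minimal illustrative sketch, not a real implementation: it merges two redundant copies of an RTP stream by sequence number, so a packet lost on one path is recovered from the other.

```python
# Minimal sketch (illustrative only, not a full ST 2022-7 implementation)
# of merging two redundant RTP packet streams by sequence number. The
# receiver keeps each sequence number from whichever path delivered it
# first, so a packet lost on one path is recovered from the other.

def merge_redundant(path_a, path_b):
    """path_a / path_b: iterables of (sequence_number, payload) tuples."""
    merged = {}
    for seq, payload in list(path_a) + list(path_b):
        merged.setdefault(seq, payload)   # keep the first copy seen
    # Deliver in sequence order, regardless of arrival order or path.
    return [merged[seq] for seq in sorted(merged)]

# Path A dropped packet 2; path B dropped packet 3 - the merge is complete.
a = [(1, "p1"), (3, "p3"), (4, "p4")]
b = [(1, "p1"), (2, "p2"), (4, "p4")]
print(merge_redundant(a, b))  # ['p1', 'p2', 'p3', 'p4']
```

A real receiver does this packet by packet within a small alignment buffer rather than over complete lists, but the selection logic is the same.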
OV 2110-0 - Overview
Consult the overview document to see the ST 2110 architectural design. It describes the provenance of the standards you need to use:
- ST 2110 parts.
- Other SMPTE standards.
- Standards from other organisations.
The latest version was released in 2018 and needs to be revised to cover the new parts.
ST 2110-10 - System Timing And Definitions
Any system introduces some latency when transmitting material between a source and a destination. ST 2110 is designed to minimise the delay. The Precision Time Protocol (PTP) is fundamental to making this all work; it is based on the pre-existing IEEE 1588 standard, developed some 10 years before ST 2110, and profiled for broadcast use by SMPTE ST 2059-2.
Part 10 covers the timing and synchronisation of the elementary streams. It covers:
• PTP - Precision Time Protocol based on IEEE 1588, which is accurate to within a microsecond across the network.
• RTP - Real Time Protocol clock and timestamps are referenced to a common clock.
• SDP - Session Description Protocol signalling requirements.
• UDP - User Datagram Protocol size limits.
• SIP - Session Initiation Protocol.
When converting an SDI stream to ST 2110 IP, time stamps are derived from the frame information in the original source SDI feed and stored in each packet transmitted in the elementary streams. The streams might travel by different routes that have different latency.
Synchronisation of the incoming streams happens at the receiver. PTP provides the common clock so the streams can be aligned regardless of which route was used, while RTP sequence numbers and timestamps allow the receiver to reassemble packets into the correct order. Late packet arrivals imply that the minimum latency is governed by the slowest route, so although the sequence is reassembled correctly, it might run late as a whole.
Read the annex in ST 2022-7 which describes offsetting the streams to maintain synchronisation.
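The relationship between the PTP clock and RTP timestamps can be sketched numerically. The sketch below assumes the common convention that the 32-bit RTP timestamp is the media clock tick count since the PTP epoch, using the 90 kHz clock rate that ST 2110-10 specifies for video; the specific time values are illustrative.

```python
# Hedged sketch: ST 2110-10 derives RTP timestamps from the common PTP
# clock. For video the media clock runs at 90 kHz and the 32-bit RTP
# timestamp wraps, so two devices sharing the PTP reference compute
# identical timestamps for the same instant.

RTP_WRAP = 2 ** 32

def rtp_timestamp(ptp_seconds, ptp_nanoseconds=0, clock_rate=90_000):
    """Map an absolute PTP time (seconds + nanoseconds since the PTP
    epoch) to a 32-bit RTP timestamp at the given media clock rate."""
    ticks = ptp_seconds * clock_rate
    ticks += (ptp_nanoseconds * clock_rate) // 1_000_000_000
    return ticks % RTP_WRAP

# Two frames of 50 fps video are 20 ms apart: 90_000 / 50 = 1800 ticks.
t0 = rtp_timestamp(1_700_000_000)
t1 = rtp_timestamp(1_700_000_000, 20_000_000)
print(t1 - t0)  # 1800
```

Because both timestamps come from the same shared clock, a receiver can compare streams that travelled by different routes and realign them to the instant the media was sampled.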
The Joint Task Force on Networked Media (JT-NM) suggests these key criteria are important:
- Accurate timing and synchronisation within a single site is mandatory.
- Provided an incoming feed from an external source can be re-synchronised to the reference clock the individual sites do not need to be so closely time-locked with each other.
ST 2110-20 - Uncompressed Active Video
Part 20 is based on these VSF technical recommendations:
• TR-03 - Technical Recommendation for Transport of Uncompressed Elementary Stream Media Over IP.
• TR-04 - Utilisation of ST-2022-6 Media Flows within a VSF TR-03 Environment.
Uncompressed (lossless) video is delivered using RTP with additional signalling sent via SDP (Session Description Protocol) to tell the destination how to interpret the incoming stream. This is based on earlier work in ST 2022-6. The stream requires enough bandwidth to support a large payload.
The latency depends on how much work is required to convert the incoming source into the elementary streams and unpacking it at the receiver. Audio and Ancillary data are delivered separately.
The RTP payload format is described by IETF RFC 4175 with additional colour sampling modes described by RFC 4421.
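The RFC 4175 payload layout is compact enough to sketch. This is a simplified illustrative parser, assuming the standard header fields (2-byte extended sequence number, then per-segment length, field bit + line number, and continuation bit + pixel offset); a real receiver would add validation and handle the pixel data itself.

```python
import struct

# Hedged sketch of the RFC 4175 payload header layout used by ST 2110-20.
# After a 2-byte extended sequence number, each 6-byte segment header
# gives the segment length, the field bit + line number, and the
# continuation bit + pixel offset; the C bit says another header follows.

def parse_rfc4175_headers(payload: bytes):
    (ext_seq,) = struct.unpack_from("!H", payload, 0)
    segments, pos = [], 2
    while True:
        length, line_word, offset_word = struct.unpack_from("!HHH", payload, pos)
        pos += 6
        segments.append({
            "length": length,
            "field": line_word >> 15,      # F bit (interlace field)
            "line": line_word & 0x7FFF,
            "offset": offset_word & 0x7FFF,
        })
        if not (offset_word >> 15):        # C bit clear: last header
            break
    return ext_seq, segments, pos          # pos = where pixel data begins

# One segment: 1200 bytes of line 21 starting at pixel offset 0.
hdr = struct.pack("!HHHH", 0, 1200, 21, 0)
print(parse_rfc4175_headers(hdr + b"\x00" * 1200)[1])
```

The line number and offset fields are what allow a receiver to place each packet's pixels directly into the correct position in the frame buffer.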
ST 2110-21 - Traffic Shaping And Delivery Timing For Video
Traffic shaping is important for managing the available bandwidth where multiple elementary streams are delivered via the same route. Massive bursts of packets could overwhelm a receiver or saturate the available network bandwidth. Wide Area Networks are also prone to jitter, which can affect the PTP resilience.
Part 21 also describes the SDP parameters that signal the timing properties of each RTP stream. There are two alternative scenarios:
- Network Compatibility Model - The network will cope with many streams without overflowing the switch buffers or queues.
- Virtual Receiver Buffer Model - The sequence of packets will not underflow nor overflow receiver buffers.
The RTP stream senders are defined as one of three basic types. The narrow senders adhere to a more accurate timing resolution and tend to be hardware based. Wide senders are software driven.
Narrow Gapped senders (Type N) may suspend transmission during the vertical blanking interval. Data derived from the vertical blanking period may be transmitted on ANC streams during those gaps.
Narrow Linear senders (Type NL) transmit their packets at a steady and constant cadence.
Wide senders (Type W) are used for software implementations that have more flexible timing than the hardware-based type N and NL senders. Packet times may be earlier or later than a linear sender.
The receivers are designed to handle bursts of packets in different ways. This will reduce the jitter that could disrupt the PTP:
• N - Narrow - Supports bursts of up to 4 packets and is compatible with type N or NL senders.
• W - Wide Synchronous - Supports bursts of up to 20 packets and is compatible with type N, NL and W senders provided they are locked to the same common time-source used by the receiver.
• A - Wide Asynchronous - Similar to the type W receiver but does not need to be locked to the same time-source as the sender.
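The pacing that distinguishes these sender types is simple arithmetic. The following is an illustrative calculation rather than the normative ST 2110-21 maths, and the packet count is a rough assumption: a narrow linear (type NL) sender spreads its packets evenly across the frame period.

```python
# Illustrative calculation (not the normative ST 2110-21 model): a narrow
# linear (type NL) sender transmits at a constant cadence, so the
# inter-packet spacing is simply the frame time divided by the number of
# packets needed to carry one frame.

def nl_packet_spacing_ns(frame_rate, packets_per_frame):
    frame_time_ns = 1_000_000_000 / frame_rate
    return frame_time_ns / packets_per_frame

# Roughly 4000 packets per 1080p50 frame (an assumed figure) gives one
# packet every 5 microseconds.
spacing = nl_packet_spacing_ns(50, 4000)
print(round(spacing))  # 5000 ns
```

A gapped (type N) sender follows the same cadence but only during the active picture, pausing over the vertical blanking interval, while wide (type W) senders are allowed to drift earlier or later around this ideal schedule.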
ST 2110-22 - Constant Bit-rate Compressed Video
Originally ST 2110 only supported the uncompressed video described in part 20. Part 22 adds constant bit-rate compressed video using one of the registered codecs. Other acceptably fast codecs may be registered in due course:
- VC-2 (AKA Dirac).
- JPEG-XS.
The compressed output is lossy but visually lossless in practice. The losses are small enough that chroma-keying is unaffected. These are fast codecs with a coding latency of less than 1 ms, which is equivalent to just a few lines of video. The compression ratio, and hence the reduced network bandwidth, depends on the complexity of the content. VC-2 has a quoted compression ratio of better than 5:1 and JPEG-XS might be between 10:1 and 20:1.
If these compression ratios are achievable, then it resolves one of the major bandwidth limitation criticisms levelled at the circa 2017 (four-part) ST 2110 design.
The latency needs of ST 2110 probably rule out using AVC and HEVC codecs due to the time it takes to compress video with them, although advanced hardware solutions may reduce their latency to an acceptable level.
RP 2110-23 - Uncompressed Active Parallel Video
Part 23 extends part 20 to describe how to split a high-bandwidth video stream into multiple lower bandwidth streams. This is similar to how a high speed 'Supermotion' camera operates. The output streams have lower frame-rates and are created according to several different interleaving decompositions. The same net amount of data is moved via several alternative routes.
These points are addressed:
- Several alternative decomposition methods.
- Grouping of multiple streams.
- Signalling the stream availability.
- SDP declarations.
- Addressing conventions.
- RTP time-stamping constraints.
The receiver composes the frames from the incoming streams to recreate the footage at the original frame-rate.
Note that this only works with uncompressed streams. It would be impractical to use this with previously compressed video streams.
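One of the simplest decompositions, a temporal interleave, can be sketched directly. RP 2110-23 defines several decomposition methods; this illustrative sketch shows only the basic idea of splitting and recombining frames.

```python
# Hedged sketch of one RP 2110-23 style decomposition: a temporal
# interleave. Frame i is sent on stream i % n, each stream running at
# 1/n of the source frame rate; the receiver merges the streams back
# into the original order. (RP 2110-23 defines several decomposition
# methods - this shows only the general idea.)

def decompose(frames, n):
    streams = [[] for _ in range(n)]
    for i, frame in enumerate(frames):
        streams[i % n].append(frame)
    return streams

def recompose(streams):
    frames = []
    for group in zip(*streams):   # one frame from each stream per cycle
        frames.extend(group)
    return frames

frames = [f"f{i}" for i in range(8)]
streams = decompose(frames, 4)    # four quarter-rate streams
assert recompose(streams) == frames
print(streams[0])  # ['f0', 'f4']
```

The RTP time-stamping constraints in the recommendation exist precisely so that the receiver can perform this reassembly unambiguously even when the streams arrive over routes with different latencies.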
RP 2110-24 - Special Considerations For SD Video
Part 24 refines the definition of the pixel sampling in a line of Standard Definition (SD) video in the source material. It accounts for the following:
• 525 vs 625 lines.
• 4:3 vs 16:9 pixel aspect ratios.
• Horizontal image size.
• Additions to ST 125 that describe the original analogue source video timings.
Part 20 should be reviewed before studying this document as it relates to the formatting of the line-by-line video content into RTP packets.
RP 2110-25 - Measurement Practices
The European Broadcasting Union (EBU) has contributed testing protocols to facilitate the integration of systems from different manufacturers. Use these protocols to monitor and analyze the traffic to indicate failures. This is particularly helpful when analyzing how the content is buffered.
Part 25 describes this EBU research and provides insights about the internal structures and processes within ST 2110. It also provides example code.
Probing tools must not disturb or slow down the flow of the network packets.
ST 2110-30 - PCM Digital Audio
The AES67 standard describes the transport of Pulse Code Modulated (PCM) audio over IP networks. Part 30 describes how to package uncompressed PCM samples into AES67-compatible RTP packets for transport on the IP network.
The audio is extracted from the Horizontal Blanking data space which carries audio signals and other ancillary data as well. Only the audio is encapsulated in the Part 30 elementary stream.
- The number of supported channels can range from 1 to 64 depending on the timing of the packets and the sample rate. SDI can only accommodate 16 channels.
- A default sample-rate of 48 kHz is mandated. The only other alternatives are 44.1 kHz and 96 kHz.
- Sample sizes of 16 or 24 bits are supported.
- A packet time of 1 ms or 125 µs is allowed.
- Packet time and sample-rate determine how many channels can be carried.
The conformance level described by the standard determines how the incoming packets will be reassembled into channels.
For example, conformance level CX can reconstruct 32 channels of 96 kHz audio when the packets are spaced at 125 µs intervals, but if the sample rate is reduced to 48 kHz, it can then support 64 channels. Packet spacing at 1 ms intervals is exactly 8 times slower and the maximum number of channels should therefore be divided by 8 to compensate.
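The scaling behind these channel counts is straightforward. This is simplified illustrative arithmetic; the actual limits come from the part 30 conformance levels and the RTP payload size.

```python
# Illustrative arithmetic (simplified - the real limits come from the
# part 30 conformance levels and the RTP payload size): halving the
# packet time or the sample rate halves the data carried per packet,
# so proportionally more channels fit in the same packet.

def samples_per_packet(sample_rate_hz, packet_time_s):
    return round(sample_rate_hz * packet_time_s)

# At a 125 microsecond packet time:
print(samples_per_packet(96_000, 125e-6))   # 12 samples per channel
print(samples_per_packet(48_000, 125e-6))   # 6 - twice the channels fit
# At 1 ms each packet carries 8x the samples, so 1/8th of the channels fit.
print(samples_per_packet(48_000, 1e-3))     # 48
```

This is why the standard pairs the shorter 125 µs packet time with the higher channel counts: the per-packet payload stays within sensible UDP datagram limits.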
Multi-channel mapping is managed by the Channel Order Convention delivered via SDP. The following types can be specified:
Symbol | Description of group | Channels |
---|---|---|
M | Single Mono channel. | 1 |
DM | Dual Mono. | 2 |
ST | Standard Stereo pair. | 2 |
LtRt | Matrix Stereo. | 2 |
51 | 5.1 Surround. | 6 |
71 | 7.1 Surround. | 8 |
222 | 22.2 Surround. | 24 |
SGRP | One SDI audio group. | 4 |
U01…U64 | Undefined group of {nn} channels. | 1 to 64 |
Note that because an incoming SDI feed can only deliver 16 audio channels, the 222 symbol cannot be used. Neither can undefined group values U17 to U64.
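A receiver needs to turn a channel-order grouping string into a channel count. The sketch below assumes a grouping string of the form `SMPTE2110.(ST,M,U02)` as carried in SDP, and uses the group sizes from the table above; it is an illustration of the convention, not a full SDP parser.

```python
import re

# Hedged sketch: count the channels declared by an ST 2110-30
# channel-order grouping string (as carried in SDP), using the group
# sizes from the Channel Order Convention table. The Unn symbol
# encodes its own channel count in its two digits.

GROUP_CHANNELS = {"M": 1, "DM": 2, "ST": 2, "LtRt": 2,
                  "51": 6, "71": 8, "222": 24, "SGRP": 4}

def channel_count(order: str) -> int:
    # e.g. "SMPTE2110.(ST,M,U02)" -> a stereo pair, a mono channel and
    # an undefined group of 2 channels = 5 channels in total.
    symbols = re.search(r"\((.*)\)", order).group(1).split(",")
    total = 0
    for sym in symbols:
        m = re.fullmatch(r"U(\d{2})", sym)
        total += int(m.group(1)) if m else GROUP_CHANNELS[sym]
    return total

print(channel_count("SMPTE2110.(ST,M,U02)"))  # 5
```

The total must of course agree with the channel count implied by the stream's sample rate and packet time, otherwise the SDP declaration is inconsistent.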
ST 2110-31 - AES3 Transparent Transport
Part 31 extends part 30 to allow the carriage of AES3 payloads in RTP streams. SMPTE standard ST 337 is a general-purpose encapsulation format for AES3 payloads. There are 32 basic data-types described by ST 338. The ST 337 payload format can be extended to support additional data-types if you need them. The payloads can carry audio or other kinds of data.
Here are a few of the supported data-types:
Index | Data Type |
---|---|
1 | ST 340 compliant AC-3 audio. |
2 | Timestamp. |
5 | MPEG audio (including MP3). |
10 | MPEG-4 AAC audio. |
11 | MPEG-4 HE-AAC audio. |
27 | SMPTE KLV formatted data. |
29 | Caption data. |
Note that you may need to deploy hardware coding in order to reduce the latency sufficiently to use MP3, AAC and HE-AAC codecs for the audio.
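Detecting an ST 337 data burst in the audio samples is a matter of finding its sync pattern. This hedged sketch assumes the 16-bit mode sync words Pa (0xF872) and Pb (0x4E1F), with the ST 338 data type in the low 5 bits of Pc and the payload length in bits in Pd; a real de-embedder also handles the 20- and 24-bit modes.

```python
# Hedged sketch of spotting an ST 337 data burst in a 16-bit AES3
# channel: the sync words Pa (0xF872) and Pb (0x4E1F) mark the burst,
# Pc carries the ST 338 data type in its low 5 bits and Pd gives the
# payload length in bits.

PA, PB = 0xF872, 0x4E1F

def find_burst(words):
    """words: sequence of 16-bit AES3 subframe data words."""
    for i in range(len(words) - 3):
        if words[i] == PA and words[i + 1] == PB:
            pc, pd = words[i + 2], words[i + 3]
            return {"data_type": pc & 0x1F, "length_bits": pd, "at": i}
    return None

# A burst of data type 1 (ST 340 / AC-3 audio), 1536 bits long.
stream = [0x0000, PA, PB, 0x0001, 1536]
print(find_burst(stream))  # {'data_type': 1, 'length_bits': 1536, 'at': 1}
```

The 5-bit data type field is what maps onto the 32 basic ST 338 data-types listed above.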
ST 2110-40 - Transport Of Ancillary Data (Conforming To ST 291-1)
Part 40 describes how to carry ST 291 compliant ancillary data transmitted during the horizontal and vertical blanking intervals:
- Closed captioning.
- Subtitles.
- Timecode.
- Metadata.
- Slate information.
- Teletext.
- Copy prevention signals.
- Vertical Interval Test Signals.
There are two parts of the ST 291 standard:
• ST 291-1 - Ancillary data packet and Space Formatting
• RP 291-2 - Ancillary data packet payload formats
Vertical blanking is organised as lines of TV signals carrying a variety of test and other data. It is located before and after the active-image area. A safe area reserves some lines for special purposes. Vertical blanking does not include the horizontal blanking time which is located at the start of all lines whether they are visible or not.
Any audio found in the HANC area will be delivered according to part 30. The remaining ancillary data will be delivered in the part 40 ANC stream. If there is no ancillary data to send, an empty 'keep-alive' heartbeat packet will be transmitted at least once per field, frame or segment. ANC packets should be transmitted within 1 ms of the data appearing in the SDI feed.
The data arriving from an SDI feed should be identifiable using the identifying codes in ST 352. That can be used to determine the mapping of RP 291-2 data-types for the ANC packets.
Consult IETF RFC 8331 for the RTP payload definition for ST 291 ancillary data.
Note that an errata document has been published for ST 2110-40 (ed 2023) which describes some typographical errors.
ST 2110-41 - Fast Metadata eXpress (FMX) Transport
Part 41 specifies how to transport Fast Metadata (FMX) that did not originate from an ST 291 compliant source. Obtaining signalling data from a separate data stream is easier than unpacking it from video streams.
This document is still being worked on but is due to be released very soon. There is some outstanding work required to define the payload formats to be carried by this transport.
The advantages of FMX over ST 2110-40 + ST 291 are:
- Allows more sophisticated metadata schemas than SDI.
- Supports larger amounts of metadata than SDI.
- Single level of encapsulation.
- Uses a Key-Length-Value (KLV) packaging model.
- Extensible beyond what we currently think we need.
- More easily parsed.
- Can be delivered independently of an elementary stream.
- Can be synchronised with the system clock more easily.
- Payload can be encoded as XML, JSON or any other format.
- More efficient use of IP address space.
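The Key-Length-Value packaging model mentioned above is easy to illustrate. This hedged sketch uses the BER-style length field employed by SMPTE KLV (ST 336): a single byte for lengths under 128, and a long form (0x80 | byte count, then the length bytes) above that; the 16-byte key here is a placeholder, not a real universal label.

```python
# Hedged sketch of the Key-Length-Value packaging model, with a
# BER-style length field as used in SMPTE KLV (ST 336): short form for
# lengths < 128, long form (0x80 | byte count, then the length) above.

def ber_length(n: int) -> bytes:
    if n < 128:
        return bytes([n])                          # short form
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body        # long form

def klv_pack(key: bytes, value: bytes) -> bytes:
    return key + ber_length(len(value)) + value

key = bytes(16)                     # placeholder 16-byte universal label
packet = klv_pack(key, b"\x01" * 200)
print(packet[16:18].hex())  # 81c8 - long form: 1 length byte, value 200
```

Because the length is explicit, a parser can skip over any KLV item it does not understand, which is what makes the model so extensible.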
ST 2110-42 - Fast Metadata eXpress (FMX) Payload Format
Part 42 was intended to describe the object data format and parameter values being carried by the FMX transport defined in part 41. For example, these metadata items have been suggested:
Metadata item | Defined in Part |
---|---|
SDP format-specific (FMTP) parameters for the stream. | 20
Packetisation time. | 30 & 31 |
Number of channels. | 30 & 31 |
Video format tag. | 40 |
Video Payload ID (VPID). | 40 |
AMWA Sender ID. | All |
AMWA Flow ID. | All |
According to the quarterly reports from SMPTE, this document was hibernated after the draft was completed. The eventual release strategy has not been publicised.
ST 2110-43 - Timed Text Mark-up Language
Part 43 describes how captions and subtitles are carried using Timed Text Markup Language version 2 (TTML2). These are related to the WebVTT captions used in Internet video streaming, which are described by W3C recommendations. The content might be based on ST 291 ANC data extracted from the vertical blanking interval.
Conclusion
It is important to keep an open mind and use the standards pragmatically where they are best suited to your architectural design. You don't need to implement all of ST 2110, nor should you exclude standards from other organisations.
You should be able to cherry-pick the best solutions. In 2018, the BBC R&D engineers suggested that a hybrid approach was called for when you need to interoperate with cloud-based systems.
Media identification, timing and synchronisation data needs to be preserved when content is moved outside of an ST 2110 environment and reliably restored when it is brought back in. This is no different to any other integration scenario where information loss is always undesirable.
Although there is a lot of helpful information on the Internet, be wary of taking everything at face value. A lot of ST 2110 coverage was written when the standard was first introduced and less complete than it is now. Always refer back to the latest editions of the original standards documents as an authoritative source. Wikipedia can also be useful, but it depends on industry experts to bring it up to date; their time is precious and updates can take a while to be edited in. Try to check your research against several sources. One source often reveals information that is missing from the others.
Obtaining the SMPTE standards has become much easier lately. The standards documents can be downloaded free of charge if you are a member. Joining is very simple and the cost of membership is equivalent to purchasing several standards as a non-member. Obtain all the SMPTE standards you need for the price of a single year's membership.