Standards: Part 15 - ST2110-2x - Video Coding Standards For Video Transport

SMPTE 2110 and its related standards help to construct workflows and broadcast systems. They coexist with standards from other organizations and incorporate them where necessary. In an earlier article we looked at the ST 2110 standard as a whole. This time we focus on the parts of ST 2110 that are concerned with video transport.


This article is part of our growing series on Standards.
There is an overview of all 26 articles in Part 1 - An Introduction To Standards.


The ST 2110 Standards Family

The ST 2110 standards are organized into these groups:

  • ST 2110-1x - Systems layer.
  • ST 2110-2x - Video content.
  • ST 2110-3x - Audio content.
  • ST 2110-4x - Ancillary data, metadata and synchronized text.

ST 2110 reserves the range 20 to 29 for video topics. Part 20 is the most complex and the others enhance it for specific situations.

In addition to the main standards which are prefixed with ST, there are some helpful recommended practices which often go unnoticed. These are numbered consistently with the ST documents but have an RP prefix. Draft standards are occasionally published first as RP documents and then promoted to ST status when they are ratified.

Document Latest Description
ST 2110-20 2022 Uncompressed Active Video.
ST 2110-21 2022 Traffic Shaping and Delivery Timing for Video.
ST 2110-22 2022 Constant Bit-Rate Compressed Video.
RP 2110-23 2019 Single Video Essence Transport over Multiple ST 2110-20 Streams.
RP 2110-24 2023 Special Considerations for Standard Definition Video Using SMPTE ST 2110-20.
RP 2110-25 2023 Measurement Practices.

 

The SMPTE standards documents have a familiar structure with a glossary of terms followed by textual conventions and notation before the main body.


SMPTE standards are less complex than MPEG standards and consequently easier to read and understand.


ST 2110-20 - Uncompressed Active Video

Production workflows prioritize quality over bandwidth efficiency. Hence, ST 2110 part 20 describes the delivery of uncompressed video via RTP streams over an IP network.

The 2017 edition of ST 2110-20 has been superseded by the 2022 edition. A side-by-side comparison of both editions reveals a few differences:

  • Alpha channel support has been added.
  • Signaling for the ST 2115 LOGS3 transfer characteristic is introduced.
  • Some bibliographic references have been updated.
  • Improved descriptions based on the Protocol Implementation Conformance Statement (PICS) for ST 2110 (which is a work in progress).
  • Some details from ST 2110-21 have been incorporated.

Note: The document with a footer showing the approval date 28-03-2022 contains typographical errors and incorrect bibliographic references. These were corrected in the document republished on 14-12-2022, which is the definitive version.


The Video Services Forum have published additional useful guidance as Technical Recommendation TR-05. This was released in 2018 and is based on the 2017 edition of ST 2110-20. Read it carefully because it pre-dates the revised 2022 edition of ST 2110-20.

RTP Packet Structures

The RTP packet structure is described in detail in section 6. Refer to IETF RFC 3550 to see this in context. The transmission is affected by traffic shaping considerations covered in ST 2110-21. The Virtual Receiver Buffer model was covered in an earlier article in this series. To recap, the senders will be one of these three types:

  • Type N - Narrow Senders.
  • Type NL - Narrow Linear Senders.
  • Type W - Wide Senders.

They are also further described in Appendix A of VSF TR-05.

Recall that for ST 2110-20, only the active picture area data is transmitted. Audio and Ancillary data are delivered separately.


The RTP payload format is described by IETF RFC 4175.


Some signaling information directly relevant to the essence is carried with it in the RTP packets. The packets are timestamped so the essence can be reconstructed reliably from an ordered sequence of packets. Other supporting metadata is transmitted separately as Session Description Protocol (SDP) streams.

The start and end of each progressive frame or interlaced field are marked in the RTP packet header. The receiver can detect these boundary markers and initiate a picture reconstruction. These must be interpreted carefully when dealing with progressive vs interlaced content as their behavior is context dependent.

For progressive video frames, the Field ID marker acts as a Frame Start indicator. The last packet carries an End of Frame marker.

The structure of an interlaced frame is more complex and the two fields are marked individually.

Restoring the field dominance correctly is important to avoid introducing jitter when the streams are processed back into video frames.
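The boundary markers live partly in the RTP header (the marker bit) and partly in the RFC 4175 payload header. As a rough sketch, the payload header can be unpacked as follows; the field layout follows RFC 4175, and the packet bytes here are fabricated purely for illustration:

```python
import struct

def parse_4175_payload_header(payload: bytes):
    """Unpack the RFC 4175 payload header: a 16-bit extended sequence
    number followed by one or more 6-byte sample row headers."""
    (ext_seq,) = struct.unpack_from("!H", payload, 0)
    rows, pos = [], 2
    while True:
        length, line, offset = struct.unpack_from("!HHH", payload, pos)
        pos += 6
        rows.append({
            "length": length,               # octets of sample data for this row
            "field_id": line >> 15,         # 0 = progressive/first field, 1 = second field
            "line_no": line & 0x7FFF,       # sample row number
            "offset": offset & 0x7FFF,      # pixel offset into the row
        })
        if not (offset & 0x8000):           # continuation bit clear: last row header
            break
    return ext_seq, rows, pos               # pos marks the start of the pixel data

# Fabricated packet fragment: one row header for field 0, line 21,
# offset 0, carrying 10 octets of sample data.
packet = struct.pack("!HHHH", 0, 10, 21, 0)
ext_seq, rows, data_start = parse_4175_payload_header(packet)
```

A receiver combines the field ID bits seen here with the RTP header marker bit to detect the end of each frame or field before triggering picture reconstruction.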

Progressive segmented Frame (PsF) transport carries the picture as if it were interlaced, but the receiver can reconstruct it as a progressive image. This allows progressive material to pass through processes that only understand interlaced formats.

The RTP payload is a collection of Pixel Group (pgroup) data structures. A pgroup is a run of pixels managed as a block. It allows picture areas to be defined as non-rectangular shapes. This is possibly the most complex part of ST 2110-20. Read about the RTP payload in section 6.2 then read TR-05 for additional insights and example calculations. The construction of a pgroup is somewhat complex and depends on the chroma-sampling model being used.

The header of the payload describes where in the picture rectangle the pgroup should be placed. The line number and offset describes a pixel accurate location. This is most often used to insert a channel-identifying graphic. Using a pgroup significantly reduces the amount of data being transmitted if the stream only contains a sub-raster (overlay) with most of its picture area unoccupied.
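To make the pgroup arithmetic concrete, here is a small sketch that computes the payload octets needed to carry one full pixel row. The table values follow the RFC 4175 pgroup definitions for a few common sampling and depth combinations; verify them against the edition you deploy before relying on them:

```python
# pgroup sizes as (octets, pixels covered) for common sampling/depth
# combinations, per the RFC 4175 tables referenced by ST 2110-20.
PGROUPS = {
    ("YCbCr-4:2:2", 8):  (4, 2),
    ("YCbCr-4:2:2", 10): (5, 2),
    ("YCbCr-4:2:2", 12): (6, 2),
    ("YCbCr-4:4:4", 8):  (3, 1),
    ("YCbCr-4:4:4", 10): (15, 4),
    ("RGB", 8):          (3, 1),
    ("RGB", 10):         (15, 4),
}

def row_payload_octets(sampling: str, depth: int, width: int) -> int:
    """Octets of pgroup data needed to carry one full pixel row."""
    octets, pixels = PGROUPS[(sampling, depth)]
    if width % pixels:
        raise ValueError("row width must be a whole number of pgroups")
    return width // pixels * octets

# 1920 pixels of 4:2:2 10-bit video: 960 pgroups of 5 octets each.
print(row_payload_octets("YCbCr-4:2:2", 10, 1920))  # 4800
```

Worked examples like this also appear in TR-05, which is the best place to check your own calculations.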

ST 2110-21 - Traffic Shaping & Delivery Timing

Traffic shaping manages the available bandwidth where multiple elementary streams are delivered via the same route. If all the streams burst at the same time, the network would become saturated and the streams would break down due to packet loss.

Traffic shaping also helps to avoid packet jitter where the packet delivery times become irregular. Jitter causes problems with the precision timing of packets. Part 21 describes two alternative scenarios:

  • Network Compatibility Model - The network will cope with many streams without overflowing the switch buffers or queues. This relies on a receiver buffer being drained at least as fast as new packets are arriving.
  • Virtual Receiver Buffer Model - The sequence of packets will not underflow nor overflow receiver buffers because they enter the buffer and leave it immediately. This also relies on packets arriving in a timely manner so the buffer is never empty when the next packet needs to be accessed. Because packets are taken from the buffer at regularly timed intervals, the jitter is eliminated.
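The effect of regular draining can be demonstrated with a toy model. This is an illustrative simulation only, not the normative VRX model from ST 2110-21: packets arrive with random jitter but are read out at exact intervals, so the read side never sees irregular timing provided the buffer is primed deeply enough before reading starts.

```python
import random

def simulate_vrx(n_packets=1000, interval=10.0, jitter=3.0, prime=5):
    """Toy receiver buffer model: packets arrive with random jitter but
    are drained at exact 'interval' ticks. 'prime' packets are buffered
    before draining starts. Returns (min, max) buffer occupancy."""
    random.seed(1)
    arrivals = sorted(i * interval + random.uniform(-jitter, jitter)
                      for i in range(n_packets))
    start = arrivals[prime - 1]        # draining begins once primed
    lo, hi = n_packets, 0
    for k in range(n_packets):
        read_time = start + (k + 1) * interval          # k-th regular read
        arrived = sum(1 for a in arrivals if a <= read_time)
        occupancy = arrived - (k + 1)                   # packets still buffered
        if occupancy < 0:
            raise RuntimeError("buffer underflow: increase priming depth")
        lo, hi = min(lo, occupancy), max(hi, occupancy)
    return lo, hi
```

Raising the jitter relative to the priming depth eventually trips the underflow check, which is exactly the trade-off between latency and resilience that the model captures.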

ST 2110-22 - Constant Bit-rate Compression

ST 2110-22 constant bit-rate delivery is broadly similar to ST 2110-20. Some parameter settings are specific to this format and the video codec is configured for constant rather than variable bit rate.

Although it is designed to be format agnostic, ST 2110-22 expects to carry video coded with one of the known registered codecs. Other codecs are likely to be registered in due course.

Be careful to use the correct version of the standard when deploying compatible equipment or software. Specifying ST 2110-22 alone is not enough without indicating which edition is being deployed. The ST 2110-22:2019 and ST 2110-22:2022 documents are different editions of the same standard with slightly different coding strategies.

As well as a few corrections, the 2022 revision clarifies the requirements for traffic shaping and delivery timing. The sender type N option is also added. The SMPTE Standard Number (SSN) is now included in the SDP signaling metadata to distinguish between these two editions.

RP 2110-23 - Transport Over Multiple Streams

If a single network link has insufficient capacity, RP 2110-23 describes how to split an uncompressed ST 2110-20 video stream across multiple streams. Several different decomposition methods are available. The same net amount of data is transmitted, but each route carries a smaller bandwidth stream.

The Session Description Protocol (SDP) describes the decomposition method being used so that the receiver can correctly reconstruct the picture.

This recommended practice document may only be of interest to organizations that are working with very high-resolution images. Image sizes of 4K and higher may require such a huge bandwidth that a single network link is overwhelmed. 


The HEVC codec is designed for 8K and can operate up to 16K. Prototype 32K cameras are already being designed that will require very high-performance network support.


ST 425-5 and ST 2082-12 describe how to interleave samples. This allows a lower resolution preview image to be reconstructed from any single stream. All four streams are required to reconstruct the complete image. ST 2082-12 has a detailed explanation of the Sample Interleaving decomposition technique.

Four stream decomposition.
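As a sketch of the idea (using hypothetical helper names, with a frame modelled simply as a list of rows of samples), a two-way interleave in each axis yields four sub-images, any one of which is a half-resolution preview:

```python
def sample_interleave(frame):
    """Split a frame into four sub-images by 2-sample interleaving:
    sub-stream s carries the samples where s == (x % 2) + 2 * (y % 2).
    Each sub-image is a half-resolution preview of the original."""
    return [
        [row[x0::2] for row in frame[y0::2]]
        for y0 in (0, 1) for x0 in (0, 1)
    ]

def reassemble(subs):
    """Inverse of sample_interleave: weave the four sub-images back."""
    h, w = 2 * len(subs[0]), 2 * len(subs[0][0])
    frame = [[None] * w for _ in range(h)]
    for s, sub in enumerate(subs):
        y0, x0 = divmod(s, 2)
        for y, row in enumerate(sub):
            for x, v in enumerate(row):
                frame[2 * y + y0][2 * x + x0] = v
    return frame

# A tiny 4x4 frame of numbered samples round-trips losslessly.
frame = [[y * 4 + x for x in range(4)] for y in range(4)]
subs = sample_interleave(frame)
```

Real implementations interleave pgroups of component samples rather than single values, but the decomposition and reassembly logic has this shape.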

RP 2110-23 also offers a simpler solution which divides an image into quadrants. This is called Square Division decomposition.

Square Division decomposition.

High data rates also arise from lower definition video running at very high frame rates. Super-motion cameras are used at sporting events for slow-motion action replays. They divide the video along the time axis (Temporal Division decomposition) and deliver the frames in a cyclic manner across several streams. Observing a single stream previews the video at a lower frame-rate.

Temporal Division decomposition.
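Temporal Division decomposition is simply a cyclic distribution of whole frames across the streams. A minimal sketch:

```python
def temporal_split(frames, n_streams=4):
    """Distribute frames cyclically across n_streams; stream k carries
    frames k, k + n, k + 2n, ... so any single stream plays back at
    1/n of the original frame rate."""
    return [frames[k::n_streams] for k in range(n_streams)]

def temporal_merge(streams):
    """Re-interleave the per-stream frame lists into display order."""
    out = []
    for group in zip(*streams):
        out.extend(group)
    return out

# Eight frames across four streams: stream 1 carries frames 1 and 5.
frames = list(range(8))
streams = temporal_split(frames)
```

Observing any one stream gives the lower frame-rate preview described above; merging all of them restores the full-rate sequence.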


Note: This only works with uncompressed streams. It would be impractical to use this with previously compressed video streams. However, we might be able to compress the decomposed streams with a fast enough coder. Special attention would need to be paid to avoid frame-to-frame artefacts in the uncompressed result.


RP 2110-24 - Special Considerations For SD Video

Part 24 refines the definition of the pixel sampling for Standard Definition (SD) video. It accounts for the following:

  • 525 vs 625 lines.
  • 4:3 vs 16:9 pixel aspect ratios.
  • Horizontal image size.
  • Additions to ST 125 that describe the original analogue source video timings.

RP 2110-24 describes how to map the Image Format Line Numbers specified in ST 125 to correspond with the Sample Row numbers defined by ST 2110-20.

The media type parameters for height and width in ST 2110-20 are derived from the video area defined in ST 125.

Refer to RP 202 which describes how interlaced video must be properly (vertically) aligned prior to compression. This will affect the height value.

RP 2110-25 - Measurement Practices

Diagnostic techniques are based on the same core principles of measurement, observation and divide-and-conquer, regardless of the type of architecture being inspected. Insert test probes and analyze network traffic in the same way that you might once have used an oscilloscope to analyze analogue video signals. The tools may be different but the process is the same.

Validate the ingest data at the front of the workflow. Then if it is not arriving at the destination, observe the network traffic at various points along the route to isolate the fault. Tools such as traceroute will present a list of 'hops' which should correspond with the route that you expect.

The European Broadcasting Union (EBU) contributed their testing protocols to the RP 2110-25 authors. These tests expose the internal structures and processes within ST 2110.

RP 2110-25 illustrates how to analyze traffic down to the IP datagram level where the dispatch and arrival times for each packet are visible.

Study this document carefully and use it to develop automated monitoring tools to watch the system continuously. If any measurements depart from the expected nominal values, trigger an alert to warn the ops team so they can intervene. Monitoring tools must run separately from the main workflow and cannot interrupt or delay the packets being transmitted.
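As an illustration of the kind of check such a monitor might run (the function name and thresholds here are hypothetical, and it operates on captured arrival timestamps rather than sitting in the live packet path):

```python
from statistics import mean

def packet_interval_report(arrival_times, nominal, tolerance=0.1):
    """Given packet arrival timestamps (seconds) and the nominal
    inter-packet interval, report the mean interval, the peak deviation
    from nominal, and whether any gap strays beyond tolerance * nominal."""
    gaps = [b - a for a, b in zip(arrival_times, arrival_times[1:])]
    peak = max(abs(g - nominal) for g in gaps)
    return {
        "mean_interval": mean(gaps),
        "peak_deviation": peak,
        "alert": peak > tolerance * nominal,
    }

# Regular 1 ms cadence with one late packet: the monitor raises an alert.
times = [0.0, 0.001, 0.002, 0.0035, 0.0045]
report = packet_interval_report(times, nominal=0.001)
```

In practice the timestamps would come from a capture taken off a mirror port or tap, so the measurement never delays the packets being transmitted.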

Section 4 describes the measurements as they apply to various ST 2110 parts:

Applies to Section Measurements described
ST 2110-10 4.7 Systems layer: Since this is an abstract foundation for timing and delivery, there are no specific measurements applicable.
ST 2110-20 4.8 Network Compatibility Model, Virtual Receive Buffer Model, First Packet Time, RTP Offset, Video Latency, Margin between first packet and VRX buffer read, GAP time between fields/frames.
ST 2110-21 4.9 Network Compatibility Model, Virtual Receive Buffer Model.
ST 2110-22 4.10 Network Compatibility Model, Virtual Receive Buffer Model.
RP 2110-23 - Stream splitting: Additional work needed.
RP 2110-24 - SD TV: Covered by ST 2110-20.
ST 2110-30 4.11 AES67 Audio: Audio Delay Variance, Packet Interval Time, Audio Latency, Audio Video Differential Latency.
ST 2110-31 - AES3 Audio: Derive this from the ST 2110-30 measurements.
ST 2110-40 4.12 Ancillary data: First Packet Time, RTP Offset, ANC Latency, ANC Video Differential Latency, Relative RTP Offset, Metadata Margin for reinsertion in SDI.
ST 2110-41 - Fast metadata: Additional work needed.
ST 2110-42 - Fast metadata: Additional work needed.
ST 2110-43 - Timed Text: Derive this from the ST 2110-40 measurements.

 

Example testing code written in the Python language is provided in Appendix B. This will help to analyze how the content is buffered.

Rewrite the tests in a more efficient language for networks with higher throughput. Systems-level code is ideally written in ANSI Standard C and compiled to machine code (perhaps with the GNU C compiler). Languages such as Python, Perl, JavaScript and Java execute via an interpreter or virtual machine, which adds a performance overhead that compiled C avoids.

Session Description Protocol (SDP)

An SDP-based signaling protocol is defined for technical metadata necessary to receive and interpret the streams. ST 2110-20 section 7 describes this SDP metadata which is transmitted separately to the essence. Refer also to RFC 4566. Additional SDP metadata is described in ST 2110-21 as it applies to traffic shaping. This metadata applies to all samples, rows, fields, and frames in a stream.

The metadata is delivered in name-value pairs or just as names which are treated as flags having a default value associated with them:

<name>=<value>
<name>

Here are a few examples. Some of these are mandatory and others depend on the format being delivered. Read section 7 of the standard for details:

Parameter Description
sampling Describes the color difference sub-sampling structure. Section 7.4.1 describes the range of alternatives, which includes RGB, XYZ and KEY. See SMPTE RP 157 for a description of Key and Fill. Note also that when colorimetry is set to ALPHA, the SSN value must be set to ST2110-20:2022.
depth The number of bits per sample. This can be 8, 10, 12 or 16 bits. Using 16f indicates a 16-bit floating-point value.
width The number of pixels per row (1 - 32767).
height The number of pixel rows per frame (1 - 32767).
exactframerate Frames per second. This value is complex. Values such as 24 or 25 fps are described as simple integers. Non-integer values are described as one integer divided by another. Thus 29.97 fps is described as 30000/1001.
colorimetry Describes the system colorimetry used by the sample pixels. If this is set to ALPHA then the SSN value must be set to indicate ST2110-20:2022.
PM The packing mode as defined in section 6.3.
SSN The SMPTE Standard Number indicates which version of ST 2110-20 is used. This should be set to ST2110-20:2017 unless colorimetry is set to ALPHA or the TCS value is set to ST2115LOGS3 in which case the value ST2110-20:2022 should be used.
interlace An optional parameter with no associated value. If present, the video is either interlaced or PsF. Omitting this parameter flag entirely indicates progressive video.
segmented This modifies the interlace flag to signify PsF rather than interlaced video. It cannot be used on its own. The Interlace flag must always be present as well.
TCS The Transfer Characteristic System describes color conversion profiles. If the value ST2115LOGS3 is used then the SSN value must be set to indicate ST2110-20:2022.
RANGE Describes the coding range of the samples when used in combination with the colorimetry value.
MAXUDP Describes the maximum size of the UDP packets as specified in ST 2110-10.
PAR The pixel aspect ratio is expressed as two integer values separated by a colon. A default 1:1 square pixel aspect ratio is assumed if this parameter is absent. The 16:9 value would stretch the pixels horizontally to fill a wide screen display.

 


Observe that the height and width parameters have a range already capable of dealing with 32K video.


The SDP parameters relating to traffic shaping are:

Parameter Description
TP Describes the type of sender. The value will be one of N (Narrow), NL (Narrow Linear) or W (Wide).
TROFF Describes a number of milliseconds offset of the frame start from a reference time.
CMAX Describes the capacity of the receiver's buffer when the Network Compatibility Model is used.

 

The N, NL and W sender types are described in an earlier article.

This example was given in the standard to illustrate 10-bit video, 1280x720 @59.94 fps conforming to BT709:

m=video 30000 RTP/AVP 112
a=rtpmap:112 raw/90000
a=fmtp:112 sampling=YCbCr-4:2:2; width=1280; height=720;
exactframerate=60000/1001; depth=10; TCS=SDR; colorimetry=BT709;
PM=2110GPM; SSN=ST2110-20:2017
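Parsing such an fmtp line is straightforward because every parameter is either a name=value pair or a bare flag. This sketch is illustrative rather than normative:

```python
from fractions import Fraction

def parse_fmtp(line: str) -> dict:
    """Parse the parameter list of an SDP a=fmtp line into a dict.
    Bare names (e.g. 'interlace') become boolean flags."""
    _, _, params = line.partition(" ")      # drop the "a=fmtp:112" prefix
    out = {}
    for item in params.split(";"):
        item = item.strip()
        if not item:
            continue
        name, sep, value = item.partition("=")
        out[name] = value if sep else True
    return out

# The example from the standard, folded onto one line.
fmtp = ("a=fmtp:112 sampling=YCbCr-4:2:2; width=1280; height=720; "
        "exactframerate=60000/1001; depth=10; TCS=SDR; "
        "colorimetry=BT709; PM=2110GPM; SSN=ST2110-20:2017")
params = parse_fmtp(fmtp)
fps = Fraction(*map(int, params["exactframerate"].split("/")))
```

Note how exactframerate carries 59.94 fps as the exact rational 60000/1001, just as the parameter table above describes.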

Other Relevant Standards

Obtain copies of the documents listed in the Normative References section of each standard. IETF RFC documents and ITU recommendations should be easy to acquire. The VSF technical recommendations are also free to access.

The edition column includes a release date where a specific revision of an associated document is referred to. This is not always the very latest version and some problems with integration between systems from different manufacturers may be due to different revisions of the standards being used to design their products.

Org Standard Edition Description
AES AES-67 2018 AES standard for audio applications of networks - High-performance streaming audio-over-IP interoperability.
AMWA BCP-002-01 - Natural grouping of NMOS resources.
ANSI CTA-608-E R2014 - Line 21 Data Services for closed captioning and Teletext. Note that the latest version is free but earlier revisions are not.
IETF RFC 3550 - RTP: A Transport Protocol for Real-Time Applications.
IETF RFC 4175 - RTP Payload Format for Uncompressed Video.
IETF RFC 4421 - Additional Color Sampling Modes.
IETF RFC 4566 - SDP: Session Description Protocol.
IETF RFC 4855 - Media Type Registration of RTP Payload Formats.
IETF RFC 8285 - A General Mechanism for RTP Header Extensions.
ISO ISO 11664-1 2007 CIE standard colorimetric observers.
ITU Rec 601-7 - Studio encoding parameters of digital television for standard 4:3 and wide screen 16:9 aspect ratios.
ITU Rec 709-6 - Parameter values for the HDTV standards for production and international program exchange.
ITU Rec 1700-0 - Characteristics of composite video signals for conventional analogue television systems.
ITU Rec 1886 - Reference electro-optical transfer function for flat panel displays used in HDTV studio production.
ITU Rec 2020-2 - Parameter values for ultra-high-definition television systems for production and international program exchange.
ITU Rec 2100-2 - Image Parameter Values for High Dynamic Range Television for use in Production and International Program Exchange.
SMPTE ST 125 2013 SDTV Component Video Signal Coding 4:4:4 and 4:2:2 for 13.5 MHz and 18 MHz Systems.
SMPTE RP 157 - Key and Alpha Signals.
SMPTE RP 186 2008 Video Index Information Coding for 525- and 625-Line Television Systems.
SMPTE RP 187 1995 Centre, Aspect Ratio and Blanking of Video Images.
SMPTE RP 202 2008 Video Alignment for Compression Coding.
SMPTE ST 266 2012 SD Digital Component Systems - Digital Vertical Interval Time Code.
SMPTE ST 291 2011 Ancillary Data Packet and Space Formatting.
SMPTE ST 425-5 - Image Format and Ancillary Data Mapping for the Quad Link 3 Gb/s Serial Interface.
SMPTE ST 435-1 2012 10 Gb/s Serial Signal / Data Interface - Part 1: Basic Stream Derivation.
SMPTE ST 428-1 2006 D-Cinema Distribution Master - Image Characteristics.
SMPTE RP 2077 2013 Full-Range Image Mapping.
SMPTE ST 2022-6 2012 Transport of High Bit Rate Media Signals over IP Networks (HBRMT).
SMPTE ST 2022-7 - Seamless Protection Switching of RTP Datagrams.
SMPTE ST 2065-1 2012 Academy Color Encoding Specification (ACES).
SMPTE ST 2065-3 2012 Academy Density Exchange Encoding (ADX) - Encoding Academy Printing Density (APD) Values.
SMPTE ST 2082-12 2019 4320-line and 2160-line Source Image and Ancillary Data Mapping for Quad-link 12G-SDI.
SMPTE ST 2110-10 2022 Professional Media over Managed IP Networks: System Timing and Definitions.
SMPTE ST 2115 2019 Free Scale Gamut and Free Scale Log Characteristics of Camera Signals.
VSF TR-03 - Transport of Uncompressed Elementary Stream Media over IP.
VSF TR-04 - Utilization of ST 2022-6 Media Flows within a VSF TR-03 Environment.

 

Conclusion

In addition to the SMPTE ST standards documents, there are many Recommended Practice (RP) documents that support them. Joining the SMPTE organization as an associate member is not expensive and gives access to all of their standards documents free-of-charge. This is a very worthwhile benefit and SMPTE should be applauded for that gesture. The AES offers a similar incentive.

ST 2110 also relies on many other standards bodies and expert groups. Most of these (other than ISO) also offer their standards documents free of charge.

Looking ahead, there are royalty free open-source codecs on the horizon that will likely be registered within ST 2110. If these encode the video with sufficient performance, they could be used with ST 2110-20. If they provide constant bit-rate variants, they will augment ST 2110-22.
