Standards: Part 15 - ST2110-2x - Video Coding Standards For Video Transport
SMPTE 2110 and its related standards help to construct workflows and broadcast systems. They coexist with standards from other organizations and incorporate them where necessary. In an earlier article we looked at the ST 2110 standard as a whole. This time we focus on the parts of ST 2110 that are concerned with video transport.
This article is part of our growing series on Standards.
There is an overview of all 26 articles in Part 1 - An Introduction To Standards.
The ST 2110 Standards Family
The ST 2110 standards are organized into these groups:
- ST 2110-1x - Systems layer.
- ST 2110-2x - Video content.
- ST 2110-3x - Audio content.
- ST 2110-4x - Ancillary data, metadata and synchronized text.
ST 2110 reserves the range 20 to 29 for video topics. Part 20 is the most complex and the others enhance it for specific situations.
In addition to the main standards which are prefixed with ST, there are some helpful recommended practices which often go unnoticed. These are numbered consistently with the ST documents but have an RP prefix. Draft standards are occasionally published first as RP documents and then promoted to ST status when they are ratified.
Document | Latest | Description |
---|---|---|
ST 2110-20 | 2022 | Uncompressed Active Video. |
ST 2110-21 | 2022 | Traffic Shaping and Delivery Timing for Video. |
ST 2110-22 | 2022 | Constant Bit-Rate Compressed Video. |
RP 2110-23 | 2019 | Single Video Essence Transport over Multiple ST 2110-20 Streams. |
RP 2110-24 | 2023 | Special Considerations for Standard Definition Video Using SMPTE ST 2110-20. |
RP 2110-25 | 2023 | Measurement Practices. |
The SMPTE standards documents have a familiar structure with a glossary of terms followed by textual conventions and notation before the main body.
SMPTE standards are less complex than MPEG standards and consequently easier to read and understand.
ST 2110-20 - Uncompressed Active Video
In production workflows, quality takes priority over bandwidth efficiency. Hence, ST 2110-20 describes the delivery of uncompressed video via RTP streams over an IP network.
The 2017 edition of ST 2110-20 has been superseded by the 2022 edition. A side-by-side comparison of both editions reveals a few differences:
- Alpha channel support has been added.
- Signaling for the ST 2115 LOGS3 transfer characteristic is introduced.
- Some bibliographic references have been updated.
- Improved descriptions based on the Protocol Implementation Conformance Statement (PICS) for ST 2110 (which is a work in progress).
- Some details from ST 2110-21 have been incorporated.
Note: The document with a footer showing the approval date 28-03-2022 contains typographical errors and incorrect bibliographic references. These have been corrected in the document republished on 14-12-2022, which is the definitive version.
The Video Services Forum has published additional useful guidance as Technical Recommendation TR-05. This was released in 2018 and is based on the 2017 edition of ST 2110-20. Read it carefully because it pre-dates the revised 2022 edition of ST 2110-20.
RTP Packet Structures
The RTP packet structure is described in detail in section 6. Refer to IETF RFC 3550 to see this in context. The transmission is affected by traffic shaping considerations covered in ST 2110-21. The Virtual Receiver Buffer model was covered in an earlier article in this series. To recap, the senders will be one of these three types:
- Type N - Narrow Senders.
- Type NL - Narrow Linear Senders.
- Type W - Wide Senders.
They are also further described in Appendix A of VSF TR-05.
Recall that for ST 2110-20, only the active picture area data is transmitted. Audio and Ancillary data are delivered separately.
The RTP payload format is described by IETF RFC 4175.
Some signaling information directly relevant to the essence is carried with it in the RTP packets. The packets are timestamped so the essence can be reconstructed reliably from an ordered sequence of packets. Other supporting metadata is conveyed separately using the Session Description Protocol (SDP).
The start and end of each progressive frame or interlaced field are marked in the RTP packet header. The receiver can detect these boundary markers and initiate a picture reconstruction. These must be interpreted carefully when dealing with progressive vs interlaced content as their behavior is context dependent.
For progressive video frames, the Field ID marker acts as a Frame Start indicator and the last packet carries an End of Frame marker.
The structure of an interlaced frame is more complex and the two fields are marked individually.
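To see where those signals live, here is a minimal sketch of reading the boundary markers from a single packet. It assumes a packet with no CSRC entries or RTP header extension; the fixed header layout comes from RFC 3550 and the per-line payload headers from RFC 4175. It is an illustration only, not a conformant receiver.

```python
import struct

def parse_boundary_markers(packet: bytes):
    """Return (marker_bit, field_bits) for one RTP packet."""
    # RTP fixed header (RFC 3550): V/P/X/CC, M/PT, sequence number, timestamp, SSRC.
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    marker = (b1 >> 7) & 0x1           # set on the last packet of a frame or field
    payload = packet[12:]              # assumes no CSRC list or header extension

    # RFC 4175 payload: a 2-octet extended sequence number, then one or more
    # 6-octet line headers (Length, F bit + Line No, C bit + Offset).
    offset = 2
    field_bits = []
    while offset + 6 <= len(payload):
        length, line_word, offset_word = struct.unpack("!HHH", payload[offset:offset + 6])
        field_bits.append((line_word >> 15) & 0x1)   # F bit: 0 = frame or first field, 1 = second field
        continuation = (offset_word >> 15) & 0x1     # C bit: another line header follows
        offset += 6
        if not continuation:
            break
    return marker, field_bits
```

The marker bit flags the last packet of a frame or field, while the F bit in each line header distinguishes the first and second fields of interlaced material.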
Restoring the field dominance correctly is important to avoid introducing jitter when the streams are processed back into video frames.
Progressive segmented Frame (PsF) transport carries the picture as if it were interlaced but it can be reconstructed as a progressive image in the receiver. This allows progressive material to pass through equipment and processes that only understand interlaced formats.
The RTP payload is a collection of Pixel Group (pgroup) data structures. A pgroup is a run of pixels managed as a block, which allows picture areas to be defined as non-rectangular shapes. This is possibly the most complex part of ST 2110-20. Read about the RTP payload in section 6.2, then read TR-05 for additional insights and example calculations. The construction of a pgroup depends on the chroma-sampling model being used.
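The octets-per-pgroup and pixels-per-pgroup values follow from the sampling system and bit depth. The sketch below uses a few illustrative values from the RFC 4175 tables to estimate the payload size of one complete sample row; check the standard for the full list before relying on these numbers.

```python
# Illustrative pgroup sizes (octets, pixels covered) for a few common
# sampling/depth combinations, as tabulated in RFC 4175.
PGROUP = {
    ("YCbCr-4:2:2", 8):  (4, 2),
    ("YCbCr-4:2:2", 10): (5, 2),
    ("YCbCr-4:2:2", 12): (6, 2),
    ("RGB", 8):          (3, 1),
    ("RGB", 10):         (15, 4),
}

def row_payload_octets(sampling: str, depth: int, width: int) -> int:
    """Approximate octets needed for one full sample row of active video."""
    octets, pixels = PGROUP[(sampling, depth)]
    if width % pixels:
        raise ValueError("width must be a whole number of pgroups")
    return (width // pixels) * octets

# Example: one 1920-pixel row of 10-bit 4:2:2 video occupies 4800 octets,
# which is why a single row is usually split across several RTP packets.
print(row_payload_octets("YCbCr-4:2:2", 10, 1920))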
The payload header describes where in the picture rectangle the pgroup should be placed; the line number and offset give a pixel-accurate location. Transmitting only a sub-raster (overlay) with most of its picture area unoccupied significantly reduces the amount of data sent, which is useful, for example, when inserting a channel-identifying graphic.
ST 2110-21 - Traffic Shaping & Delivery Timing
Traffic shaping manages the available bandwidth where multiple elementary streams are delivered via the same route. If all the streams burst at the same time, the network would become saturated and the streams would break down due to packet loss.
Traffic shaping also helps to avoid packet jitter where the packet delivery times become irregular. Jitter causes problems with the precision timing of packets. Part 21 describes two alternative scenarios:
- Network Compatibility Model - The network will cope with many streams without overflowing the switch buffers or queues. This relies on a receiver buffer being drained at least as fast as new packets are arriving.
- Virtual Receiver Buffer Model - The sequence of packets will neither underflow nor overflow the receiver buffer because packets enter the buffer and are read out as soon as they are needed. This relies on packets arriving in a timely manner so the buffer is never empty when the next packet must be read. Because packets are taken from the buffer at regularly timed intervals, the jitter is eliminated. A minimal sketch of this model follows below.
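To picture the Virtual Receiver Buffer idea, this sketch simulates buffer occupancy from a list of packet arrival times against a constant read-out schedule and reports underflow or overflow. The buffer capacity and timing values are arbitrary illustrative numbers, not parameters taken from ST 2110-21.

```python
# A toy model of a virtual receiver buffer: packets are queued as they
# arrive and read out on a fixed schedule. Capacity and timings are
# illustrative only; ST 2110-21 defines the real VRX parameters.
def check_vrx(arrival_times, first_read_time, read_interval, capacity):
    """Return 'ok', 'underflow' or 'overflow' for a stream of packets."""
    arrivals = sorted(arrival_times)
    occupancy, next_arrival = 0, 0
    for n in range(len(arrivals)):
        read_time = first_read_time + n * read_interval
        # Admit every packet that has arrived by this read instant.
        while next_arrival < len(arrivals) and arrivals[next_arrival] <= read_time:
            occupancy += 1
            next_arrival += 1
            if occupancy > capacity:
                return "overflow"
        if occupancy == 0:
            return "underflow"     # nothing available when a read was due
        occupancy -= 1             # one packet drained per read interval
    return "ok"

# Packets arriving slightly irregularly, drained every 10 time units.
print(check_vrx([0, 9, 21, 29, 41], first_read_time=5, read_interval=10, capacity=3))
```

The regular read-out schedule is what removes the arrival jitter: as long as the buffer never runs dry or overflows, the downstream process sees perfectly timed packets.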
ST 2110-22 - Constant Bit-rate Compression
ST 2110-22 constant bit-rate delivery is broadly similar to ST 2110-20. Some parameter settings are specific to this format and the video codec is configured for constant rather than variable bit rate.
Although it is designed to be format agnostic, ST 2110-22 expects to carry video coded with one of the registered codecs; others are likely to be registered in due course.
Be careful to use the correct version of the standard when deploying compatible equipment or software. Specifying ST 2110-22 alone is not enough without indicating which edition is being deployed. The ST 2110-22:2019 and ST 2110-22:2022 documents are different editions of the same standard with slightly different coding strategies.
As well as a few corrections, the 2022 revision clarifies the requirements for traffic shaping and delivery timing, and adds the sender type N option. The SMPTE Standard Number (SSN) is now included in the SDP signaling metadata to distinguish between the two editions.
RP 2110-23 - Transport Over Multiple Streams
If a single network link has insufficient capacity, RP 2110-23 describes how to split an uncompressed ST 2110-20 video signal across multiple streams. Several different decomposition methods are available. The same net amount of data is transmitted but each route carries a smaller bandwidth stream.
The Session Description Protocol (SDP) describes the decomposition method being used so that the receiver can correctly reconstruct the picture.
This recommended practice document may only be of interest to organizations that are working with very high-resolution images. Image sizes of 4K and higher may require such a huge bandwidth that a single network link is overwhelmed.
The HEVC codec is designed for 8K and can operate up to 16K. Prototype 32K cameras are already being designed that will require very high-performance network support.
ST 425-5 and ST 2082-12 describe how to interleave samples. This allows a lower resolution preview image to be reconstructed from any single stream, while all four streams are required to reconstruct the complete image. ST 2082-12 has a detailed explanation of the Sample Interleaving decomposition technique.
RP 2110-23 also offers a simpler solution which divides an image into quadrants. This is called Square Division decomposition.
High data rates also arise from lower definition video running at very high frame rates. Super-motion cameras are used at sporting events for slow-motion action replays. The video is divided along the time axis (Temporal Division decomposition) and the frames are delivered cyclically across several streams. Observing a single stream previews the video at a lower frame rate.
Note: This only works with uncompressed streams. It would be impractical to use this with previously compressed video streams. However, we might be able to compress the decomposed streams with a fast enough coder. Special attention would need to be paid to avoid frame-to-frame artefacts in the uncompressed result.
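To make the decomposition methods described above concrete, the sketch below assigns each pixel, or frame, to one of four streams. These mappings are simplified illustrations of the general idea; the normative sample-interleave pattern is defined in ST 425-5 and ST 2082-12, and the number of streams need not be four.

```python
# Simplified illustrations of the decomposition styles described in RP 2110-23.
def sample_interleave_stream(x: int, y: int) -> int:
    """Two-sample interleave: alternate pairs of samples on alternate lines."""
    return ((x // 2) % 2) + 2 * (y % 2)

def square_division_stream(x: int, y: int, width: int, height: int) -> int:
    """Square division: the picture is split into four quadrants."""
    return int(x >= width // 2) + 2 * int(y >= height // 2)

def temporal_division_stream(frame_number: int, streams: int = 4) -> int:
    """Temporal division: successive frames cycle around the streams."""
    return frame_number % streams

# Any single sample-interleaved stream contains every other sample pair on
# every other line, so it can be displayed as a quarter-resolution preview.
print(sample_interleave_stream(5, 3), square_division_stream(1000, 200, 1920, 1080))
```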
RP 2110-24 - Special Considerations For SD Video
Part 24 refines the definition of the pixel sampling for Standard Definition (SD) video. It accounts for the following:
- 525 vs 625 lines.
- 4:3 vs 16:9 pixel aspect ratios.
- Horizontal image size.
- Additions to ST 125 that describe the original analogue source video timings.
RP 2110-24 describes how to map the Image Format Line Numbers specified in ST 125 to correspond with the Sample Row numbers defined by ST 2110-20.
The media type parameters for height and width in ST 2110-20 are derived from the video area defined in ST 125.
Refer to RP 202 which describes how interlaced video must be properly (vertically) aligned prior to compression. This will affect the height value.
RP 2110-25 - Measurement Practices
Diagnostic techniques are based on the same core principles of measurement, observation and divide-and-conquer, regardless of the type of architecture being inspected. Insert test probes and analyze network traffic in the same way that you might once have used an oscilloscope to analyze analogue video signals. The tools may be different but the process is the same.
Validate the ingest data at the front of the workflow. Then if it is not arriving at the destination, observe the network traffic at various points along the route to isolate the fault. Tools such as traceroute will present a list of 'hops' which should correspond with the route that you expect.
The European Broadcasting Union (EBU) contributed their testing protocols to the RP 2110-25 authors. These tests expose the internal structures and processes within ST 2110.
RP 2110-25 illustrates how to analyze traffic down to the IP datagram level where the dispatch and arrival times for each packet are visible.
Study this document carefully and use it to develop automated monitoring tools that watch the system continuously. If any measurements depart from the expected nominal values, trigger an alert to warn the ops team so they can intervene. Monitoring tools must run separately from the main workflow and must not interrupt or delay the packets being transmitted.
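As a starting point for that kind of automation, the sketch below watches the inter-arrival times of packets on a UDP socket and prints a warning when they drift outside a tolerance band. The multicast group, port and thresholds are placeholder values; a production monitor would use hardware timestamping and a mirrored (tapped) copy of the traffic so the main flow is never disturbed.

```python
# A minimal passive monitor: measure packet inter-arrival times on a UDP
# socket and warn when they drift outside a tolerance band. The address,
# port and thresholds below are placeholders, not values from RP 2110-25.
import socket, time

GROUP, PORT = "239.1.1.1", 5004          # placeholder multicast group and port
NOMINAL, TOLERANCE = 0.000134, 0.000050  # expected gap and allowed deviation (seconds)

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("", PORT))
mreq = socket.inet_aton(GROUP) + socket.inet_aton("0.0.0.0")
sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

previous = None
while True:
    sock.recv(2048)                      # packet contents are not inspected here
    now = time.monotonic()
    if previous is not None:
        gap = now - previous
        if abs(gap - NOMINAL) > TOLERANCE:
            print(f"warning: inter-arrival gap {gap * 1e6:.1f} us outside nominal range")
    previous = now
```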
Section 4 describes the measurements as they apply to various ST 2110 parts:
Applies to | Section | Measurements described |
---|---|---|
ST 2110-10 | 4.7 | Systems layer: Since this is an abstract foundation for timing and delivery, there are no specific measurements applicable. |
ST 2110-20 | 4.8 | Network Compatibility Model, Virtual Receive Buffer Model, First Packet Time, RTP Offset, Video Latency, Margin between first packet and VRX buffer read, GAP time between fields/frames. |
ST 2110-21 | 4.9 | Network Compatibility Model, Virtual Receive Buffer Model. |
ST 2110-22 | 4.10 | Network Compatibility Model, Virtual Receive Buffer Model. |
RP 2110-23 | - | Stream splitting: Additional work needed. |
RP 2110-24 | - | SD TV: Covered by ST 2110-20. |
ST 2110-30 | 4.11 | AES67 Audio: Audio Delay Variance, Packet Interval Time, Audio Latency, Audio Video Differential Latency. |
ST 2110-31 | - | AES3 Audio: Derive this from the ST 2110-30 measurements. |
ST 2110-40 | 4.12 | Ancillary data: First Packet Time, RTP Offset, ANC Latency, ANC Video Differential Latency, Relative RTP Offset, Metadata Margin for reinsertion in SDI. |
ST 2110-41 | - | Fast metadata: Additional work needed. |
ST 2110-42 | - | Fast metadata: Additional work needed. |
ST 2110-43 | - | Timed Text analysis. Derive this from ST 2110-40 measurements. |
Example testing code written in the Python language is provided in Appendix B. This will help to analyze how the content is buffered.
For networks with higher throughput, rewrite the tests in a more efficient language. Systems-level programming is ideally coded in ANSI Standard C and compiled down to machine code (perhaps with the GNU C Compiler). Languages such as Python, Perl, JavaScript and Java run on an interpreter or virtual machine, which adds a performance overhead that compiled C code avoids.
Session Description Protocol (SDP)
An SDP-based signaling protocol is defined for the technical metadata necessary to receive and interpret the streams. ST 2110-20 section 7 describes this SDP metadata, which is transmitted separately from the essence. Refer also to RFC 4566. Additional SDP metadata relating to traffic shaping is described in ST 2110-21. This metadata applies to all samples, rows, fields, and frames in a stream.
The metadata is delivered in name-value pairs or just as names which are treated as flags having a default value associated with them:
<name>=<value>
<name>
Here are a few examples. Some of these are mandatory and others depend on the format being delivered. Read section 7 of the standard for details:
Parameter | Description |
---|---|
sampling | Describes the color difference sub-sampling structure. Section 7.4.1 describes the range of alternatives and includes RGB, XYZ and KEY. See SMPTE RP 157 for a description of Key and Fill. Note also that colorimetry can be set to ALPHA, in which case ST2110-20:2022 should be used for the SSN value. |
depth | The number of bits per sample. This can be 8, 10, 12 or 16 bits. Using 16f indicates a 16-bit floating-point value. |
width | The number of pixels per row (1 - 32767). |
height | The number of pixel rows per frame (1 - 32767). |
exactframerate | Frames per second. This value is complex. Values such as 24 or 25 fps are described as simple integers. Non-integer values are described as one integer divided by another. Thus 29.97 fps is described as 30000/1001. |
colorimetry | Describes the system colorimetry used by the sample pixels. If this is set to ALPHA then the SSN value must be set to indicate ST2110-20:2022. |
PM | The packing mode as defined in section 6.3. |
SSN | The SMPTE Standard Number indicates which version of ST 2110-20 is used. This should be set to ST2110-20:2017 unless colorimetry is set to ALPHA or the TCS value is set to ST2115LOGS3 in which case the value ST2110-20:2022 should be used. |
interlace | An optional parameter with no associated value. If present, the video is either interlaced or PsF. Omitting this parameter flag entirely indicates progressive video. |
segmented | This modifies the interlace flag to signify PsF rather than interlaced video. It cannot be used on its own. The Interlace flag must always be present as well. |
TCS | The Transfer Characteristic System describes the transfer characteristic of the samples, such as SDR, PQ or HLG. If the value ST2115LOGS3 is used then the SSN value must be set to indicate ST2110-20:2022. |
RANGE | Describes the coding range of the samples when used in combination with the colorimetry value. |
MAXUDP | Describes the maximum size of the UDP packets as specified in ST 2110-10. |
PAR | The pixel aspect ratio is expressed as two integer values separated by a colon. A default 1:1 square pixel aspect ratio is assumed if this parameter is absent. A value wider than 1:1 stretches the pixels horizontally to fill a wide screen display. |
Observe that the height and width parameters have a range already capable of dealing with 32K video.
The SDP parameters relating to traffic shaping are:
Parameter | Description |
---|---|
TP | Describes the type of sender. The value will be one of N (Narrow), NL (Narrow Linear) or W (Wide). |
TROFF | Describes a number of milliseconds offset of the frame start from a reference time. |
CMAX | Describes the capacity of the receiver's buffer when the Network Compatibility Model is used. |
The N, NL and W sender types are described in an earlier article.
This example was given in the standard to illustrate 10-bit video, 1280x720 @59.94 fps conforming to BT709:
m=video 30000 RTP/AVP 112
a=rtpmap:112 raw/90000
a=fmtp:112 sampling=YCbCr-4:2:2; width=1280; height=720;
exactframerate=60000/1001; depth=10; TCS=SDR; colorimetry=BT709;
PM=2110GPM; SSN=ST2110-20:2017
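As a quick illustration of how these parameters might be consumed, the sketch below parses the fmtp attribute from the example above into a dictionary and applies the SSN consistency rule described earlier (colorimetry ALPHA or TCS ST2115LOGS3 require the 2022 SSN). It is a simplified parser for illustration, not a full SDP implementation.

```python
# Parse the fmtp parameters from an ST 2110-20 SDP media description into a
# dictionary and check the SSN rule described above.
EXAMPLE_FMTP = (
    "sampling=YCbCr-4:2:2; width=1280; height=720; "
    "exactframerate=60000/1001; depth=10; TCS=SDR; colorimetry=BT709; "
    "PM=2110GPM; SSN=ST2110-20:2017"
)

def parse_fmtp(text: str) -> dict:
    params = {}
    for item in text.split(";"):
        item = item.strip()
        if not item:
            continue
        name, _, value = item.partition("=")
        params[name] = value if value else True   # bare names act as flags
    return params

def ssn_is_consistent(params: dict) -> bool:
    """ALPHA colorimetry or the ST 2115 LOGS3 transfer characteristic need the 2022 SSN."""
    needs_2022 = params.get("colorimetry") == "ALPHA" or params.get("TCS") == "ST2115LOGS3"
    return params.get("SSN") == "ST2110-20:2022" if needs_2022 else True

params = parse_fmtp(EXAMPLE_FMTP)
print(params["exactframerate"], ssn_is_consistent(params))
```

The same approach handles the flag-style parameters such as interlace, which have no value and simply appear as bare names.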
Other Relevant Standards
Obtain copies of the documents listed in the Normative References section of each standard. IETF RFC documents and ITU recommendations should be easy to acquire. The VSF technical recommendations are also free to access.
The edition column includes a release date where a specific revision of an associated document is referred to. This is not always the very latest version and some problems with integration between systems from different manufacturers may be due to different revisions of the standards being used to design their products.
Org | Standard | Edition | Description |
---|---|---|---|
AES | AES-67 | 2018 | AES standard for audio applications of networks - High-performance streaming audio-over-IP interoperability. |
AMWA | BCP-002-01 | - | Natural grouping of NMOS resources. |
ANSI | CTA-608-E R2014 | - | Line 21 Data Services for closed captioning and Teletext. Note that the latest version is free but earlier revisions are not. |
IETF | RFC 3550 | - | RTP: A Transport Protocol for Real-Time Applications. |
IETF | RFC 4175 | - | RTP Payload Format for Uncompressed Video. |
IETF | RFC 4421 | - | Additional Color Sampling Modes. |
IETF | RFC 4566 | - | SDP: Session Description Protocol. |
IETF | RFC 4855 | - | Media Type Registration of RTP Payload Formats. |
IETF | RFC 8285 | - | A General Mechanism for RTP Header Extensions. |
ISO | ISO 11664-1 | 2007 | CIE standard colorimetric observers. |
ITU | Rec 601-7 | - | Studio encoding parameters of digital television for standard 4:3 and wide screen 16:9 aspect ratios. |
ITU | Rec 709-6 | - | Parameter values for the HDTV standards for production and international program exchange. |
ITU | Rec 1700-0 | - | Characteristics of composite video signals for conventional analogue television systems. |
ITU | Rec 1886 | - | Reference electro-optical transfer function for flat panel displays used in HDTV studio production. |
ITU | Rec 2020-2 | - | Parameter values for ultra-high-definition television systems for production and international program exchange. |
ITU | Rec 2100-2 | - | Image Parameter Values for High Dynamic Range Television for use in Production and International Program Exchange. |
SMPTE | ST 125 | 2013 | SDTV Component Video Signal Coding 4:4:4 and 4:2:2 for 13.5 MHz and 18 MHz Systems. |
SMPTE | RP 157 | - | Key and Alpha Signals. |
SMPTE | RP 186 | 2008 | Video Index Information Coding for 525- and 625-Line Television Systems. |
SMPTE | RP 187 | 1995 | Centre, Aspect Ratio and Blanking of Video Images. |
SMPTE | RP 202 | 2008 | Video Alignment for Compression Coding. |
SMPTE | ST 266 | 2012 | SD Digital Component Systems - Digital Vertical Interval Time Code. |
SMPTE | ST 291 | 2011 | Ancillary Data Packet and Space Formatting. |
SMPTE | ST 425-5 | - | Image Format and Ancillary Data Mapping for the Quad Link 3 Gb/s Serial Interface. |
SMPTE | ST 435-1 | 2012 | 10 Gb/s Serial Signal / Data Interface - Part 1: Basic Stream Derivation. |
SMPTE | ST 428-1 | 2006 | D-Cinema Distribution Master - Image Characteristics. |
SMPTE | RP 2077 | 2013 | Full-Range Image Mapping. |
SMPTE | ST 2022-6 | 2012 | Transport of High Bit Rate Media Signals over IP Networks (HBRMT). |
SMPTE | ST 2022-7 | - | Seamless Protection Switching of RTP Datagrams. |
SMPTE | ST 2065-1 | 2012 | Academy Color Encoding Specification (ACES). |
SMPTE | ST 2065-3 | 2012 | Academy Density Exchange Encoding (ADX) - Encoding Academy Printing Density (APD) Values. |
SMPTE | ST 2082-12 | 2019 | 4320-line and 2160-line Source Image and Ancillary Data Mapping for Quad-link 12G-SDI. |
SMPTE | ST 2110-10 | 2022 | Professional Media over Managed IP Networks: System Timing and Definitions. |
SMPTE | ST 2115 | 2019 | Free Scale Gamut and Free Scale Log Characteristics of Camera Signals. |
VSF | TR-03 | - | Transport of Uncompressed Elementary Stream Media over IP. |
VSF | TR-04 | - | Utilization of ST 2022-6 Media Flows within a VSF TR03 Environment. |
Conclusion
In addition to the SMPTE ST standards documents, there are many Recommended Practice (RP) documents that support them. Joining the SMPTE organization as an associate member is not expensive and gives access to all of their standards documents free-of-charge. This is a very worthwhile benefit and SMPTE should be applauded for that gesture. The AES offers a similar incentive.
ST 2110 also relies on many other standards bodies and expert groups. Most of these (other than ISO) also offer their standards documents free of charge.
Looking ahead, there are royalty free open-source codecs on the horizon that will likely be registered within ST 2110. If these encode the video with sufficient performance, they could be used with ST 2110-20. If they provide constant bit-rate variants, they will augment ST 2110-22.