Standards: SMPTE ST 2110 - ST 2110-3x Audio Transport
The ST 2110-3x suite of standards is all about audio. Here we explain how it handles multi-channel PCM audio, why conformance levels matter for channel capacity, and what the AES has got to do with it all.
The ST 2110-3x Documents
The ST 2110-3x group of standards describes how to deliver digital audio within an IP studio environment. There are currently only two parts relating to uncompressed audio. Their payloads are described by AES3 and AES67, hence the SMPTE standards themselves are quite brief.
- AES3 – Describes a simple two-channel stereo streamed transmission. Deployed as ST 2110 Part 31.
- AES67 – Adds more channels and improves the performance. Deployed as ST 2110 Part 30.
AES3 and AES67 are based on and refer to many other AES documents. Study them all to understand the complete picture. These are the most important but there are others:
- AES5 – Preferred sample frequencies.
- AES11 – Synchronization of digital audio equipment.
- AES18 – Ancillary user data channel format.
SDI Audio Embedding
ST 2110-2x standards accommodate SDI conforming to the ST 292 standard as a source format. SDI has the capacity to carry up to 16 channels of audio depending on the sample size and frequency.
The SDI format is derived from classic analog TV services, having a space in the line structure where the video is blanked. Each horizontal line has a space at the start for ancillary data (HANC). Lines are reserved at the top and bottom of the frame for more ancillary data (VANC).
Digital audio is stored in the HANC space and is extracted for conversion to MPEG or ST 2110 compatible formats.
ST 2110-30 – RTP Streamed AES67 Audio
The AES67 standard describes Pulse Code Modulated (PCM) audio. ST 2110 Part 30 describes how to package uncompressed AES67 samples into RTP packets for transport on the IP network.
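As a minimal sketch of what "packaging samples" involves, the Python below interleaves 24-bit PCM into an RTP payload and unpacks it again. It assumes the L24 layout from RFC 3190 (signed 24-bit samples in network byte order, one sample per channel for each sample period, channels in ascending order). The function names are hypothetical and the RTP header itself is not included.

```python
def pack_l24(frames: list[list[int]]) -> bytes:
    """Interleave samples into an L24 payload; frames[i][c] is channel c at period i."""
    out = bytearray()
    for frame in frames:
        for sample in frame:
            if not -(1 << 23) <= sample < (1 << 23):
                raise ValueError(f"sample {sample} does not fit in 24 bits")
            # Two's-complement 24-bit value, most significant byte first.
            out += (sample & 0xFFFFFF).to_bytes(3, "big")
    return bytes(out)


def unpack_l24(payload: bytes, channels: int) -> list[list[int]]:
    """Split an L24 payload back into per-period lists of channel samples."""
    frame_size = 3 * channels
    if len(payload) % frame_size:
        raise ValueError("payload is not a whole number of sample frames")
    frames = []
    for offset in range(0, len(payload), frame_size):
        frame = []
        for c in range(channels):
            start = offset + 3 * c
            frame.append(int.from_bytes(payload[start:start + 3], "big", signed=True))
        frames.append(frame)
    return frames
```

Eight channels of 48 samples each (1 ms at 48 kHz) produce a 1152-byte payload with this layout.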
The delivery is supported by signaling metadata delivered using the Session Description Protocol (SDP). This metadata is necessary to receive and correctly interpret the stream.
ST 2110-30 can be seen as a subset of AES67. Most of the requirements for stream transport, packet setup, and signaling are common to ST 2110-30 and AES67. ST 2110-30 profiles AES67 with these constraints to ensure more reliable interoperability:
- Support of the PTP profile defined in SMPTE ST 2059-2 instead of that defined in AES67.
- An offset value of zero between the media clock and the RTP stream clock.
- Mandatory signaling to force a device to operate in PTP follower-only mode.
- Support of IGMPv3 for multicasting. Refer to RFC 3376 for details.
- RTCP messaging is tolerated but not required.
- Receivers need not support the Session Initiation Protocol (SIP) or any other connection management protocol.
- The size of UDP datagrams (packets) is specified in ST 2110-10 and supersedes the limits in AES67.
- SDP channel ordering syntax must follow RFC 3190 conventions.
- The maximum number of channels is limited depending on the conformance level.
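To show several of these constraints together, here is a hedged sketch that assembles the media-level SDP lines for a hypothetical ST 2110-30 stream. The rtpmap line carries the encoding, clock rate and channel count, the fmtp line carries the SMPTE2110 channel order, `a=mediaclk:direct=0` expresses the zero offset between the media clock and the RTP clock, and `a=ts-refclk` points at a PTP reference (RFC 7273). The port, payload type 97 and PTP grandmaster identity are placeholders, and a real session description also needs session-level and connection lines that are omitted here.

```python
def st2110_30_media_sdp(
    channels: int,
    channel_order: str,
    sample_rate: int = 48000,
    bits: int = 24,
    ptime_ms: float = 1.0,
    port: int = 5004,                 # placeholder port
    payload_type: int = 97,           # placeholder dynamic payload type
    ptp_clock: str = "IEEE1588-2008:00-00-00-FF-FE-00-00-00:127",  # placeholder GM identity and domain
) -> str:
    """Return the media-level SDP lines (CRLF-terminated) for one audio stream."""
    lines = [
        f"m=audio {port} RTP/AVP {payload_type}",
        # Encoding name, RTP clock rate (equal to the sample rate) and channel count.
        f"a=rtpmap:{payload_type} L{bits}/{sample_rate}/{channels}",
        # Channel ordering using the SMPTE2110 convention.
        f"a=fmtp:{payload_type} channel-order=SMPTE2110.({channel_order})",
        # Packet time in milliseconds, e.g. "1" or "0.125".
        f"a=ptime:{ptime_ms:g}",
        # PTP reference clock for the RTP timestamps.
        f"a=ts-refclk:ptp={ptp_clock}",
        # Zero offset between the media clock and the RTP clock.
        "a=mediaclk:direct=0",
    ]
    return "\r\n".join(lines) + "\r\n"
```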
The AIMS Alliance has published a helpful white paper that describes how ST 2110-30 and AES67 interact. Download a copy of “AES67-SMPTE-ST-2110-Commonalities-and-Constraints” here:
https://aimsalliance.org/white papers/
The audio is extracted from the SDI horizontal blanking (HANC) data space, which also carries other ancillary data. Only the audio is encapsulated in the Part 30 elementary stream.
The number of supported channels can range from 1 to 64 depending on the timing of the packets and the sample rate:
- SDI can only accommodate 16 channels.
- A default sample-rate of 48 kHz is mandated. The only other alternatives are 44.1 kHz and 96 kHz.
- Sample sizes of 16 or 24 bits are supported.
- A packet time of 1 ms or 125 µs is allowed.
- Packet time and sample-rate determine how many channels can be carried.
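The relationship in the last bullet is simple arithmetic. Sample rate multiplied by packet time gives the samples per channel in each packet, and multiplying that by the channel count and the bytes per sample gives the payload size. Because the RTP clock runs at the sample rate, the RTP timestamp also advances by the samples-per-packet figure from one packet to the next. The helper below is a hypothetical sketch; it uses `Fraction` so that 125 µs is represented exactly.

```python
from fractions import Fraction


def packet_geometry(sample_rate_hz: int, packet_time_s: Fraction,
                    channels: int, bits_per_sample: int = 24) -> tuple[int, int]:
    """Return (samples per channel per packet, payload bytes per packet).

    The first value is also the RTP timestamp increment between packets,
    because the RTP clock runs at the audio sample rate.
    """
    samples = sample_rate_hz * packet_time_s
    if samples.denominator != 1:
        raise ValueError("packet time does not span a whole number of samples")
    samples_per_packet = int(samples)
    payload = samples_per_packet * channels * (bits_per_sample // 8)
    return samples_per_packet, payload
```

For example, `packet_geometry(48000, Fraction(1, 1000), 8)` gives 48 samples and 1152 bytes, while `packet_geometry(48000, Fraction(1, 8000), 64)` gives 6 samples and the same 1152 bytes.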
Channel Ordering
Channel ordering is described in the Session Description Protocol (SDP) message using symbolic names defined by the Channel Order Convention. The receiver uses them to work out how to unpack the received samples and reconstruct the correct channel mapping. If the channel ordering is omitted, the receiver will assume that all channels are of an undefined type. Signaling the correct channel configuration makes it possible to deliver multi-channel surround-sound audio. The following types are supported:
| Symbol | Description | Channels |
|---|---|---|
| M | Single Mono channel. | 1 |
| DM | Dual Mono. | 2 |
| ST | Standard Stereo pair. | 2 |
| LtRt | Matrix Stereo. | 2 |
| 51 | 5.1 Surround. | 6 |
| 71 | 7.1 Surround. | 8 |
| 222 | 22.2 Surround. | 24 |
| SGRP | One SDI audio group. | 4 |
| U01…U64 | Undefined group of numbered channels (1 to 64). | 1 to 64 |
When the source is an SDI feed, which can deliver at most 16 audio channels, the 222 symbol cannot be used. Neither can undefined group values U17 to U64.
Note that the channel ordering can be stacked. This example SDP fragment describes the first six channels as 5.1 surround format and the next two as a stereo-pair delivered alongside them:
channel-order=SMPTE2110.(51,ST)
This second example SDP fragment describes the first four channels as separate monophonic channels, the next two as a stereo-pair and the last two as an undefined type:
channel-order=SMPTE2110.(M,M,M,M,ST,U02)
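A receiver-side sketch of interpreting these strings: the Python below parses a `channel-order` value into symbol groups and the channel numbers each one occupies, using the channel counts from the table, and checks whether the total fits within the 16 channels an SDI source can supply. The parser and its function names are hypothetical and only handle the symbols listed above.

```python
import re

# Channel counts per symbol, transcribed from the channel-ordering table.
SYMBOL_CHANNELS = {"M": 1, "DM": 2, "ST": 2, "LtRt": 2, "51": 6, "71": 8, "222": 24, "SGRP": 4}


def parse_channel_order(value: str) -> list[tuple[str, range]]:
    """Map each symbol in 'SMPTE2110.(...)' to the 1-based channel numbers it occupies."""
    prefix = "SMPTE2110."
    if not value.startswith(prefix):
        raise ValueError("not an SMPTE2110 channel-order convention")
    body = value[len(prefix):]
    if not (body.startswith("(") and body.endswith(")")):
        raise ValueError("channel list must be enclosed in parentheses")
    groups = []
    next_channel = 1
    for symbol in body[1:-1].split(","):
        undefined = re.fullmatch(r"U(\d{2})", symbol)
        if undefined and 1 <= int(undefined.group(1)) <= 64:
            count = int(undefined.group(1))  # U01..U64: numbered undefined group
        elif symbol in SYMBOL_CHANNELS:
            count = SYMBOL_CHANNELS[symbol]
        else:
            raise ValueError(f"unknown channel symbol {symbol!r}")
        groups.append((symbol, range(next_channel, next_channel + count)))
        next_channel += count
    return groups


def fits_sdi(groups: list[tuple[str, range]]) -> bool:
    """True if the total channel count is within the 16 channels SDI can carry."""
    return sum(len(channels) for _, channels in groups) <= 16
```

With this sketch, the first example above maps 5.1 to channels 1 to 6 and the stereo pair to channels 7 and 8.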
Conformance Levels
The ST 2110-30 standard defines receiver conformance levels that specify which combinations of sample rate, packet time and channel count a receiver must be able to accept and reassemble into channels.
The product of the sample rate and the packet time gives the number of samples per channel in each packet, and this determines how many channels can be carried within the available capacity. If the chosen level limits the number of channels to fewer than you need, multiple AES67 links can be delivered with the channels sensibly divided between them. If all 16 channels are arriving in an SDI stream but your receiver only supports Level A (as specified in the table), you will need to transmit eight channels in each of two separate AES67 links running side-by-side.
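A naive way to plan that split is to cut the channel list into consecutive blocks no larger than the receiver's limit. This is a sketch only; the `plan_links` name is hypothetical, and a real plan would also keep related groups, such as a 5.1 bed, within a single link.

```python
def plan_links(total_channels: int, max_per_link: int) -> list[range]:
    """Split channels 1..total_channels into consecutive blocks, one per AES67 link."""
    if total_channels < 1 or max_per_link < 1:
        raise ValueError("channel counts must be positive")
    return [
        range(first, min(first + max_per_link, total_channels + 1))
        for first in range(1, total_channels + 1, max_per_link)
    ]
```

For the 16-channel SDI case with a Level A receiver, `plan_links(16, 8)` yields channels 1 to 8 on the first link and 9 to 16 on the second.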
There are three basic conformance levels. Extended variants sacrifice the maximum channel count to allow higher sample rates:
| Level | Receiver Must Support |
|---|---|
| A | 48 kHz incoming streams. 1 to 8 audio channels at packet times of 1 ms. |
| B | 48 kHz incoming streams. 1 to 8 audio channels at packet times of 1 ms. 1 to 8 audio channels at packet times of 125 µs. |
| C | 48 kHz incoming streams. 1 to 8 audio channels at packet times of 1 ms. 1 to 64 audio channels at packet times of 125 µs. |
Level A is mandatory and must be supported by all receivers. This is also defined in AES67 as the minimum support. These are all based on 24-bit samples.
Levels B and C add shorter packet times to reduce latency. Level C also raises the channel count to 64 for interoperability with MADI (AES10) systems.
Levels AX, BX and CX add support for 96 kHz sample rates but reduce the number of supported channels where necessary.
For example, conformance level CX can reconstruct 32 channels of 96 kHz audio when the packets are spaced at 125 µs intervals, and 64 channels if the sample rate is reduced to 48 kHz. At 1 ms intervals each packet carries eight times as many samples per channel, so the maximum number of channels is divided by eight: 4 channels at 96 kHz and 8 channels at 48 kHz.
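The Python below transcribes the conformance table and the CX example into a lookup, so a stream configuration can be checked against a receiver level. AX and BX are left out because their limits are not listed here. It also computes the payload size for each CX limit. Every combination comes out at 1152 bytes (for example 48 samples × 8 channels × 3 bytes), which is one way to see why the channel count falls as the samples per packet rise. Treat it as an illustrative sketch under those assumptions, not a conformance test.

```python
from fractions import Fraction

MS_1 = Fraction(1, 1000)    # 1 ms packet time
US_125 = Fraction(1, 8000)  # 125 µs packet time

# Maximum channels per (sample rate in Hz, packet time in s), transcribed from
# the conformance-level table and the CX example. AX and BX are not listed.
LEVELS = {
    "A": {(48000, MS_1): 8},
    "B": {(48000, MS_1): 8, (48000, US_125): 8},
    "C": {(48000, MS_1): 8, (48000, US_125): 64},
    "CX": {(48000, MS_1): 8, (48000, US_125): 64, (96000, MS_1): 4, (96000, US_125): 32},
}


def receiver_accepts(level: str, sample_rate: int, packet_time: Fraction, channels: int) -> bool:
    """True if a receiver at `level` must support this stream configuration."""
    limit = LEVELS[level].get((sample_rate, packet_time))
    return limit is not None and 1 <= channels <= limit


def payload_bytes(sample_rate: int, packet_time: Fraction, channels: int,
                  bytes_per_sample: int = 3) -> int:
    """Audio payload size of one packet; 24-bit (3-byte) samples by default."""
    return int(sample_rate * packet_time) * channels * bytes_per_sample
```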
ST 2110-31 – AES3 Transparent Transport Over RTP
The ST 2110-31 standard extends ST 2110-30 to describe real-time, RTP-based transport of any audio format that can be encapsulated into AES3.
The RTP packet header and payload format are described in Section 5 of ST 2110-31. Packets are synchronized to a network reference clock described in ST 2110-10. Session Description Protocol (SDP) messages, described in Section 6 of the standard, tell the receiver how the payload is constructed.
This is all based on the original Ravenna AM824 specification and registered with the IANA as RTP Media Type ‘AM824’. Refer to RFC 4855 and RFC 6838 for details.
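Both ST 2110-30 and ST 2110-31 packets begin with the fixed RTP header defined in RFC 3550. A minimal parsing sketch follows. It reads the version, marker, payload type, sequence number, timestamp and SSRC, then skips any CSRC list and header extension to find where the payload starts. It does not strip RTP padding, and the payload type is dynamic, so it must be matched against the value assigned in the SDP. The class and function names are hypothetical.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int
    payload_offset: int  # byte offset where the payload begins


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the RFC 3550 fixed header, skipping CSRCs and any header extension."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the 12-byte RTP fixed header")
    b0, b1, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    version = b0 >> 6
    if version != 2:
        raise ValueError(f"unsupported RTP version {version}")
    csrc_count = b0 & 0x0F
    extension = bool(b0 & 0x10)
    offset = 12 + 4 * csrc_count
    if extension:
        # Extension: 16-bit profile-defined field, then its length in 32-bit words.
        if len(packet) < offset + 4:
            raise ValueError("truncated RTP header extension")
        (ext_words,) = struct.unpack("!H", packet[offset + 2:offset + 4])
        offset += 4 + 4 * ext_words
    if len(packet) < offset:
        raise ValueError("truncated RTP header")
    return RtpHeader(version, bool(b0 & 0x20), extension, csrc_count,
                     bool(b1 & 0x80), b1 & 0x7F, sequence, timestamp, ssrc, offset)
```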
Annex A of the standard describes how AES3 and AES10 (MADI) protocols interact. They are broadly compatible but some data needs to be correctly framed and some flag-bits must be adjusted within the packets as they move between the two environments.
SMPTE standard ST 337 is an earlier but still relevant general-purpose encapsulation format for AES3 payloads. There are 32 basic data-types described by ST 338. The ST 337 payload format can be extended to support additional data-types. The payloads can carry audio and other kinds of data.
Here are a few of the supported data-types:
| Index | Data type |
|---|---|
| 1 | ST 340 compliant AC-3 audio. |
| 2 | Timestamp. |
| 5 | MPEG audio (including MP3). |
| 10 | MPEG-4 AAC audio. |
| 11 | MPEG-4 HE-AAC audio. |
| 27 | SMPTE KLV formatted data. |
| 29 | Caption data. |
Note that you may need to deploy hardware coding in order to reduce the latency sufficiently to use MP3, AAC and HE-AAC codecs for the audio.
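To show how a receiver might recognise one of these payloads, the sketch below scans a sequence of 16-bit words for an ST 337 burst preamble and looks up the data type. It assumes the 16-bit mode sync words Pa = 0xF872 and Pb = 0x4E1F, and that the five least significant bits of the burst-info word Pc hold the data type (which gives the 32 values mentioned above). The burst-length word Pd is returned uninterpreted, the lookup covers only the data types listed in the table, and `find_burst` is a hypothetical name.

```python
from typing import Optional, Tuple

PA_16 = 0xF872  # assumed 16-bit mode sync word Pa
PB_16 = 0x4E1F  # assumed 16-bit mode sync word Pb

# Subset of data-type indices, taken from the data-type table.
DATA_TYPES = {
    1: "AC-3 (ST 340)",
    2: "Timestamp",
    5: "MPEG audio (including MP3)",
    10: "MPEG-4 AAC",
    11: "MPEG-4 HE-AAC",
    27: "SMPTE KLV",
    29: "Caption data",
}


def find_burst(words: list) -> Optional[Tuple[int, str, int]]:
    """Return (data_type, description, Pd) for the first burst found, else None."""
    for i in range(len(words) - 3):
        if words[i] == PA_16 and words[i + 1] == PB_16:
            pc, pd = words[i + 2], words[i + 3]
            data_type = pc & 0x1F  # five low bits of Pc
            return data_type, DATA_TYPES.get(data_type, "other/unknown"), pd
    return None
```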
Relevant Standards
These are the relevant standards you need to fully explore ST 2110-30 and ST 2110-31. The vintage column lists the most recent edition, amendment or revision of each standard or, in the case of AES standards, the date it was last reaffirmed as being up to date. Note that the AES standards refer to some IEC standards that are more often identified as ISO standards. Where standards are superseded, the original reference (from AES or SMPTE) and the replacement standard are listed:
| Document | Vintage | Description |
|---|---|---|
| AES3 | - | A specification for 2-channel digital audio interconnection. Commonly known as AES/EBU. |
| AES3-1 | 2009 | Part 1 – Audio Content. |
| AES3-2 | 2009 | Part 2 – Metadata and Sub-code. |
| AES3-3 | 2009 | Part 3 – Transport. |
| AES3-4 | 2009 | Part 4 – Physical and electrical. |
| AES5 | 2018 | Preferred sampling frequencies for applications employing pulse-code modulation. |
| AES10 | 2020 | Serial Multichannel Audio Digital Interface (MADI). |
| AES11 | 2020 | Synchronization of digital audio equipment in studio operations. |
| AES14-1 | 1992 | Part 1 – Analog XLR pin-out polarity and gender. |
| AES31 | - | AES standard for network and file transfer of audio – Audio-file transfer and exchange. |
| AES31-1 | 2001 | Part 1 – Disk format. |
| AES31-2 | 2019 | Part 2 – File format for transferring digital audio data between systems of different type and manufacture. |
| AES31-3 | 2021 | Part 3 – Simple project interchanges. |
| AES31-4 | 2015 | Part 4 – XML Implementation of Audio Decision Lists. |
| AES42 | 2019 | Digitally interfaced microphones. |
| AES47 | 2017 | Transmission of digital audio over asynchronous transfer mode (ATM) networks. |
| AES51 | 2017 | Transmission of ATM cells over Ethernet physical layer. |
| AES53 | 2018 | Sample-accurate timing in AES47. |
| AES67 | 2018 | High-performance streaming audio-over-IP interoperability. |
| AES70 | - | Open Control Architecture. |
| AES70-1 | 2018 | Part 1 – Framework. |
| AES70-2 | 2018 | Part 2 – Class structure. |
| AES70-3 | 2018 | Part 3 – OCP.1 Protocol for IP Networks. |
| AES74 | 2019 | Requirements for Media Network Directories and Directory Services. |
| RP 168 | 2009 | SMPTE Definition of Vertical Interval Switching Point for Synchronous Video Switching. |
| ST 318 | 2015 | Synchronization of 59.94 or 50 Hertz related video and audio systems in analog and digital areas. |
| ST 337 | 2015 | Format for Non-PCM Audio and Data in an AES3 Serial Digital Audio Interface. |
| ST 338 | 2019 | Format for Non-PCM Audio and Data in AES3 – Data Types. |
| ST 339 | 2015 | Format for Non-PCM Audio and Data in AES3 – Generic Data Types. |
| ST 340 | 2015 | Format for Non-PCM Audio and Data in AES3 – ATSC A/52B Digital Audio Compression Standard for AC-3 and Enhanced AC-3 Data Types. |
| ST 2036-2 | 2013 | Ultra-High-Definition-Television – Audio Characteristics and Audio Channel Mapping for Program production. |
| ST 2059-2 | 2021 | SMPTE Profile for Use of IEEE-1588 Precision Time Protocol in Professional Broadcast Applications. |
| ST 2067-8 | 2013 | Interoperable Master Format – Common Audio Labels. |
| ST 2110-30 | 2017 | PCM Digital Audio. |
| ST 2110-31 | 2022 | AES3 Transparent Transport. |
| ST 2116 | 2019 | Format for Non-PCM Audio and Data in AES3 – Carriage of Metadata of Serial ADM (Audio Definition Model). |
| ISO 3309 | - | Information processing systems, Data communications High-level data link control procedures and Frame structure. Referred to by AES18 but now withdrawn and replaced by ISO 13239. |
| ISO 13239 | 2002 | High-level data link control (HDLC) procedures. |
| ISO 9314-1 | 1989 | Part 1 – Token Ring Physical Layer Protocol (PHY). |
| ISO 9314-3 | 1990 | Fiber Distributed Data Interface (FDDI) – Part 3 – Physical Layer Medium Dependent (PMD). |
| ISO 10646 | 2023 | Information technology — Universal coded character set (UCS). Currently being revised. |
| IEC 60169-8 | 1978 | Radio-frequency connectors. Part 8 – R.F. coaxial connectors with inner diameter of outer conductor 6.5 mm (0.256 in) with bayonet lock (BNC). Replaced by 61169-8. |
| IEC 60958 | - | Two-channel digital audio data format used by S/PDIF and AES3. |
| IEC 60958-1 | 2021 | Digital audio interface – Part 1 General. |
| IEC 60958-3 | 2006 | Part 3 – Consumer applications – Sony/Philips consumer optical digital interface (S/PDIF) based on AES3. |
| IEC 60958-4 | 2016 | Digital audio interface – Part 4 Professional applications. |
| IEC 61169-8 | 2007 | RF coaxial connectors. |
| IEC 61937 | 2021 | Digital audio – Interface for non-linear PCM encoded audio bitstreams applying IEC 60958 – Surround sound digital audio data format. |
| IEC 61883-6 | 2014 | Part 6 – Audio and music data transmission protocol. |
| IEEE 1588 | 2019 | PTP – Precision Clock Synchronization Protocol for Networked Measurement and Control Systems. |
| RFC 3190 | 2002 | RTP Payload Format for 12-bit DAT Audio and 20- and 24-bit Linear Sampled Audio. |
| RFC 3376 | 2002 | Internet Group Management Protocol, Version 3. |
| RFC 3550 | 2003 | RTP: A Transport Protocol for Real-Time Applications. |
| RFC 3551 | 2003 | RTP Profile for Audio and Video Conferences with Minimal Control. |
| RFC 3629 | 2003 | UTF-8, a transformation format of ISO 10646. |
| RFC 4566 | 2006 | SDP – Session Description Protocol. |
| RFC 4855 | 2007 | Media Type Registration of RTP Payload Formats. |
| RFC 6838 | 2013 | Media Type Specifications and Registration Procedures. |
| RFC 7273 | 2014 | RTP Clock Source Signaling. |
| EBU T3250 | 2017 | Specification of the digital audio interface (the AES/EBU interface). |
| ITU-R BS.450-3 | 2001 | Transmission standards for FM sound broadcasting at VHF (aka CCIR Rec 450-1). |
| ITU-R BS.647 | 2011 | A digital audio interface for broadcasting studios. |
| ITU-T J.17 | 1988 | Pre-emphasis used on sound-program circuits. |
| ITU-T J.53 | 2000 | Sampling frequency to be used for the digital transmission of studio-quality and high-quality sound-program signals. |
| Ravenna AM824 | 2012 | RTP Payload Format for AES3. |
| Rane Note 149 | 2014 | Describes the differences between AES3 and S/PDIF and how to interface them correctly. |
| BBC WHP 074 | IBC 2003 | BBC White Paper – The development of ATM network technology for live production infrastructure. |
| IS-04 | v1.3.2 | NMOS Discovery & Registration. |
| IGMPv3 | 2002 | See RFC 3376. |
Applying AES & ST 2110-3x Standards
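The ST 2110 and AES standards described here convey uncompressed digital audio around an IP network. Compressed audio formats such as AAC are only carried indirectly, encapsulated in AES3 using ST 337 and transported transparently by ST 2110-31; native transport of compressed audio is currently outside the scope of these standards.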
Although AES67 is an open standard, patent licensing may be necessary if you are designing a commercial product based on it. The principal patent holder is an Australian company called Audinate Pty Ltd. There may be other relevant patents that AES are unaware of.
The AES and SMPTE standards are easier to read and understand than MPEG standards. They tend to be much shorter and focused on a specific topic. They are very dependent on other earlier documents. SMPTE, AES and IETF RFC documents frequently refer to each other.
IETF standards are available online to download free of charge. SMPTE and AES standards are free downloads for subscribing members. Joining both societies is easy and worth the subscription fee if you intend purchasing more than a couple of their standards.
These Appendix articles contain additional information you may find useful: