Standards: Part 19 - ST 2110-30/31 & AES Standards For Audio

Our series continues with the ST 2110-3x standards which deploy AES3 and AES67 digital audio in an IP networked studio. Many other AES standards are important as the foundations on which AES3 and AES67 are constructed.


This article is part of our growing series on Standards.
There is an overview of all 26 articles in Part 1 - An Introduction To Standards.


The ST 2110-3x series of standards describes how to deliver digital audio within an IP studio. There are currently only two parts relating to uncompressed audio. Their payloads are described by AES3 and AES67, hence the SMPTE standards themselves are quite brief.

  • AES3 - Describes a simple two-channel (typically stereo) streamed transmission. Deployed as ST 2110 Part 31.
  • AES67 - Adds more channels and improves the performance. Deployed as ST 2110 Part 30.

Those standards are based on many other AES documents on related topics. Study them all to understand the complete picture. These are the most important but they do refer to others:

  • AES5 - Preferred sample frequencies.
  • AES11 - Synchronization of digital audio equipment.
  • AES18 - Ancillary user data channel format.

The Audio Engineering Society (AES)

The Audio Engineering Society was established in 1948. They produce standards and guidelines for audio engineers. Conventions are currently organized every year in Europe, the USA and Latin America. The technical papers are collected into a proceedings volume and selected papers are published in the members' journal. Individual papers are easily obtained from the AES Electronic Library.

The AES works closely with the European Broadcasting Union (EBU) and provides input to the ISO/IEC standards bodies.

AES3 - Serial Transmission For 2-channel Digital Audio

This standard describes how to transmit 2 channels of digital audio over a variety of different media. The supported audio format is linear Pulse Code Modulation (PCM), which is an uncompressed stream of samples. Sample sizes between 16 and 24 bits are supported. Other formats are possible but not described by AES3. See AES5 for the list of acceptable sample rates.

AES3 comprises four separate parts:

Part Description
1 Audio content semantics. Describing the sampling frequency based on AES5.
2 Metadata and sub-code data transmitted with the audio content such as channel-status, user and ancillary data. The use of pre-emphasis to enhance the audio is indicated in the channel status.
3 Unidirectional transport link framing and channel co-ordination. This also embeds a recoverable clock signal.
4 Physical and electrical signal levels and wiring.
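Part 3 of AES3 embeds a recoverable clock by using biphase-mark channel coding. The sketch below, assuming the usual AES3 frame of two 32-slot subframes sent once per sample period, works out the resulting bit rate and encodes a short bit sequence. The function names are hypothetical, and the preambles (which deliberately break the coding rule so that a receiver can find subframe boundaries) are left out.

```python
# Sketch: AES3 frame arithmetic and biphase-mark channel coding.
# Assumption: one AES3 frame (two 32-slot subframes) is sent per sample
# period. Preambles, which intentionally break the coding rule, are ignored.

SLOTS_PER_SUBFRAME = 32
SUBFRAMES_PER_FRAME = 2


def aes3_bit_rate(sample_rate_hz: int) -> int:
    """Data rate in bit/s: 2 x 32 = 64 time slots per sample period."""
    return sample_rate_hz * SUBFRAMES_PER_FRAME * SLOTS_PER_SUBFRAME


def biphase_mark_encode(bits: list[int], start_level: int = 0) -> list[int]:
    """Return two half-cell line levels (0 or 1) for each input bit.

    Every bit cell starts with a level change, so the receiver can recover
    the clock from the transitions. A 1 adds a second change mid-cell.
    Only transitions carry information, so polarity does not matter.
    """
    level = start_level
    halves: list[int] = []
    for bit in bits:
        level ^= 1              # transition at the start of every cell
        halves.append(level)
        if bit:
            level ^= 1          # extra mid-cell transition encodes a 1
        halves.append(level)
    return halves


if __name__ == "__main__":
    rate = aes3_bit_rate(48_000)
    print(rate)          # 3072000 bit/s
    print(2 * rate)      # 6144000 half-cells per second on the line
    print(biphase_mark_encode([1, 0, 1, 1]))  # [1, 0, 1, 1, 0, 1, 0, 1]
```

At 48 kHz this gives 3.072 Mbit/s of data and twice that many half-cell symbols per second on the cable, which is why the clock can be recovered even during long runs of identical bits.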


The use of abbreviations in audio/visual contexts is sometimes ambiguous and overloaded with hidden meaning. For example, when the interface is described as AES rather than AES/EBU, the means of electrical connection might be different. This quotation from Ray Arthur Rayburn, a highly respected audio engineer in the AES community, explains why:

“AES3 allows the use of transformer or transformerless interfaces, while the corresponding EBU standard requires the use of transformers. Therefore, it has become a common shorthand to say AES/EBU when the interface is transformer coupled, and AES3 when it is not or if the interface type is unknown.”

AES/EBU is described in the third edition of the EBU Tech 3250 document.

AES67 - High-Performance Streaming Audio-Over-IP Interoperability

The original intent for AES67 was to deliver professional quality audio over a high-performance IP network with less than 10 ms latency. Bridging diverse pre-existing audio networking systems to provide interoperability is also a core goal. This is suitable for sound reinforcement at live events.

High performance is feasible on existing local area networks (LAN). If suitable switching hardware is available, it can be supported widely across an enterprise.

These are the main features:

  • Based on existing and well-known IT standards described in IETF RFC documents.
  • Synchronization via IEEE 1588 PTP, using boundary clocks in the network switches.
  • Streaming transport via RTP.
  • Session description with SDP.
  • Low-latency delivery of uncompressed audio.
  • Ideal for live, studio and broadcast situations.
  • Decentralized configuration and management of devices.
  • Coexists with other IT data traffic on the same network.

Prior to AES67, the available audio networking solutions were incompatible with one another. AES67 is designed to reconcile the needs of architectures designed by different manufacturers and facilitates interoperability between:

  • Dante.
  • Ravenna.
  • Q-LAN.
  • WheatNet-IP.
  • Livewire.

These topics are addressed by the standard:

Transport synchronization - A variety of techniques are discussed in Section 4 of the standard.

Media profiles - Standard IP networks must adhere to a PTP media profile (see Annex A) to ensure accurate synchronization and timely delivery of packets.

Boundary clock converters - Networks using switching hardware that supports IEEE 1588 PTP can provide boundary clock conversion and should provide adequate performance for audio delivery.

AVB - Enhanced Ethernet networks that conform to IEEE 802.1Q are described as Audio Video Bridging (AVB) and provide synchronization based on IEEE 802.1AS, a profile of IEEE 1588 PTP. This is covered in Annexes C and D.

Media clocks - These are described in Section 5 and provide synchronization at the sample level. A media clock advances in sync with the sample rate. The same frequency should be used for the RTP clock.
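As an illustration of a media clock that advances with the sample rate, the following sketch converts a PTP time into an RTP timestamp. It assumes, rather than quotes from the standard, that the media clock counts samples from the PTP epoch (1970-01-01 TAI, as used by SMPTE ST 2059) and that the RTP clock has a zero offset from it, which matches the ST 2110-30 constraint listed later in this article. The function names are hypothetical.

```python
# Sketch: derive a 32-bit RTP timestamp from a PTP time value.
# Assumptions: the media clock counts sample periods since the PTP epoch,
# the RTP clock runs at the sample rate, and the offset between them is zero.

NS_PER_SECOND = 1_000_000_000
RTP_TIMESTAMP_MODULUS = 2 ** 32


def media_clock_ticks(ptp_seconds: int, ptp_nanoseconds: int, sample_rate_hz: int) -> int:
    """Whole sample periods elapsed since the PTP epoch (exact integer maths)."""
    total_ns = ptp_seconds * NS_PER_SECOND + ptp_nanoseconds
    return total_ns * sample_rate_hz // NS_PER_SECOND


def rtp_timestamp(ptp_seconds: int, ptp_nanoseconds: int, sample_rate_hz: int) -> int:
    """Media clock truncated to the 32-bit RTP timestamp field (zero offset)."""
    ticks = media_clock_ticks(ptp_seconds, ptp_nanoseconds, sample_rate_hz)
    return ticks % RTP_TIMESTAMP_MODULUS


if __name__ == "__main__":
    print(rtp_timestamp(0, 500_000_000, 48_000))   # 24000
    print(rtp_timestamp(89_479, 0, 48_000))         # 24704, after wrapping
```

Integer arithmetic keeps the result exact. The second call shows the 32-bit timestamp wrapping, which happens after roughly 24.9 hours at 48 kHz.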

Payload encoding - This is described in Section 7, which reiterates the limited range of three preferred sample rates from AES5 with two possible sample sizes. Packet sizes are determined primarily by how long the data in them would play for the given sample rate. AES67 describes these sample rates (derived from AES5):

  • 48 kHz.
  • 96 kHz.
  • 44.1 kHz.
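Because packet size follows from how long the audio in each packet plays, the number of samples per channel in a packet is simply the sample rate multiplied by the packet time. A minimal sketch with a hypothetical function name; exact fractions are used so that a combination which does not produce a whole number of samples is rejected rather than silently rounded.

```python
from fractions import Fraction


def samples_per_packet(sample_rate_hz: int, packet_time_ms: str) -> int:
    """Samples per channel in one packet: sample rate x packet time.

    The packet time is given as a decimal string (e.g. "0.125") so it can be
    converted exactly; floating point would hide non-integer results.
    """
    samples = Fraction(sample_rate_hz) * Fraction(packet_time_ms) / 1000
    if samples.denominator != 1:
        raise ValueError(
            f"{sample_rate_hz} Hz for {packet_time_ms} ms gives {float(samples)} samples"
        )
    return int(samples)


if __name__ == "__main__":
    for rate in (48_000, 96_000, 44_100):
        for ptime in ("1", "0.125"):
            try:
                print(rate, ptime, samples_per_packet(rate, ptime))
            except ValueError as err:
                print(err)
```

At 48 kHz a 1 ms packet holds 48 samples per channel and a 0.125 ms packet holds 6. At 44.1 kHz a 1 ms packet would hold 44.1 samples, so a different packet time is needed at that rate.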

The standardized sample sizes and formats are defined in great detail in these IETF RFC documents:

  • L16 - 16-bit linear format as defined in RFC 3551 clause 4.5.11.
  • L24 - 24-bit linear format as defined in RFC 3190 clause 4.
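Both formats carry signed two's complement samples in network (big-endian) byte order, interleaved channel by channel within each sample instant. The following is a minimal sketch with hypothetical helper names that packs and unpacks such a payload; it handles only the audio bytes, not the RTP header.

```python
def pack_linear(frames: list[list[int]], sample_bytes: int) -> bytes:
    """Pack interleaved PCM as L16 (sample_bytes=2) or L24 (sample_bytes=3).

    Each inner list is one sample instant holding one value per channel, so
    the output runs ch1, ch2, ... chN, then ch1 of the next sample, and so on.
    Samples are signed two's complement in network (big-endian) byte order.
    """
    bits = 8 * sample_bytes
    min_val, max_val = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    out = bytearray()
    for frame in frames:
        for sample in frame:
            if not min_val <= sample <= max_val:
                raise ValueError(f"{sample} does not fit in {bits} bits")
            out += sample.to_bytes(sample_bytes, "big", signed=True)
    return bytes(out)


def unpack_linear(payload: bytes, channels: int, sample_bytes: int) -> list[list[int]]:
    """Inverse of pack_linear for a payload with a known channel count."""
    frame_size = channels * sample_bytes
    if len(payload) % frame_size:
        raise ValueError("payload is not a whole number of sample frames")
    return [
        [
            int.from_bytes(
                payload[i + c * sample_bytes : i + (c + 1) * sample_bytes], "big", signed=True
            )
            for c in range(channels)
        ]
        for i in range(0, len(payload), frame_size)
    ]


if __name__ == "__main__":
    frames = [[1, -1], [0x123456, -0x800000]]   # two stereo sample instants
    payload = pack_linear(frames, 3)
    print(payload.hex())    # 000001ffffff123456800000
    assert unpack_linear(payload, 2, 3) == frames
```

The receiver needs the channel count from the SDP description to undo the interleaving, because the payload itself does not say where one sample frame ends.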

Channel count - Up to 120 channels of audio can be carried in a generic AES67 link. ST 2110-30 limits the number of channels depending on the conformance level of the receiving device. This may be as low as 4 channels at Level AX and not more than 64 at Level C.

SDP - Session Description Protocol provides discovery and connection management support. This includes keep-alive heartbeats to maintain connections. The discovery systems are described in Annex E. These include the AMWA NMOS IS-04 specification used by ST 2110.

IETF RFC references - Because this is a standard describing IP network transmission, there are many RFC documents cited in the normative references in Section 2 of the standard and more references are included in the bibliography in Annex H. Using the IETF specifications ensures compatibility with the rest of the IP network traffic.

AES5 - Preferred Sample Frequencies

This standard describes various sample rates and recommends 48 kHz at the outset because it is numerically easier to convert this to other sample rates. See Section 4.2 of the standard for an explanation. Sample rates at 96 and 44.1 kHz are also described.
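One way to see why some conversions are simpler than others is to reduce the ratio between two rates to lowest terms: a simple rational resampler upsamples by the numerator and downsamples by the denominator, so small integers mean a simpler filter structure. This sketch illustrates that arithmetic only; it is not the reasoning given in Section 4.2 of AES5, and the function name is hypothetical.

```python
from fractions import Fraction


def conversion_ratio(from_hz: int, to_hz: int) -> Fraction:
    """Output/input rate ratio in lowest terms (upsample by the numerator,
    then downsample by the denominator in a simple rational resampler)."""
    return Fraction(to_hz, from_hz)


if __name__ == "__main__":
    for src, dst in [(48_000, 96_000), (48_000, 32_000), (48_000, 24_000), (44_100, 48_000)]:
        ratio = conversion_ratio(src, dst)
        print(f"{src} -> {dst}: {ratio.numerator}/{ratio.denominator}")
```

The 44.1 kHz to 48 kHz ratio of 160/147 is far less convenient than the 2/1, 2/3 and 1/2 ratios within the 48 kHz family.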

There is an interesting paragraph on bandwidth (see Section 4.1) based on the Nyquist-Shannon sampling theorem.

Derived sample rates ranging from half to 8 times the basic sample rate are also described. There are tables listing the number of audio samples per video frame for different video frame rates and audio sample rates.
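When the frame rate does not divide the sample rate exactly, the number of samples per video frame varies from frame to frame in a repeating cycle. This sketch, with a hypothetical function name, derives such a cycle with exact fractions. The order of the counts within the cycle may differ from the sequences tabulated in the standards; only the totals per cycle are intended to match.

```python
from fractions import Fraction
from math import floor


def samples_per_frame_cycle(sample_rate_hz: int, frame_rate: Fraction) -> list[int]:
    """Whole audio samples in each video frame over one repeating cycle.

    The cycle length is the denominator of samples-per-frame in lowest terms;
    each frame gets the difference of the cumulative sample counts, rounded down.
    """
    per_frame = Fraction(sample_rate_hz) / frame_rate
    cycle = per_frame.denominator
    return [floor((k + 1) * per_frame) - floor(k * per_frame) for k in range(cycle)]


if __name__ == "__main__":
    print(samples_per_frame_cycle(48_000, Fraction(25)))            # [1920]
    ntsc = samples_per_frame_cycle(48_000, Fraction(30_000, 1001))
    print(ntsc, sum(ntsc))                                           # 5 frames, 8008 samples
```

At 25 fps every frame holds exactly 1920 samples, whereas at 29.97 fps the count alternates between 1601 and 1602 so that five frames hold exactly 8008 samples.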

This is an important foundational standard referred to by AES3 and AES67 and other related documents.

AES11 - Synchronization Of Digital Audio Equipment

Multiple channels of audio must be carefully synchronized. The sample clocks governing when the source audio is captured must be accurately regulated. Any downstream processing needs to maintain the phase relationships between channels to avoid introducing unwanted audible artefacts. This is a complex topic and there are many solutions.

Equipment having an internal sample clock must be locked to an external source. AES11 describes this source as a Digital Audio Reference Signal (DARS), which is usually delivered over a connection separate from the audio content. AES5 describes multiples of the basic sample rate (up to 8 times). The internal sample clock must be capable of reliably locking to all of these.

Alternative synchronization techniques can be used instead of DARS:

  • Embedded timing derived from the packet header timestamps. This may drift out of sync with other streams.
  • Video reference syncing to frame start events.
  • GPS locked. This requires a separate receiver device and locks to real-world time.

AES11 describes the word clock (see Annex B). This synchronizes hardware devices (such as digital tape machines or CD players). The word clock governs the timing of each sample passing through the system and is derived from a centralized reference. This will be familiar to broadcast engineers who ensure that video across an enterprise is frame synchronous by distributing sync pulses from a reliable source.

The word clock is not the same as timecode. The word clock is integral to the sampling process and transmission of the digital audio, whereas timecode is a separate metadata service that describes the media being transmitted.

AES11 refers to AES5 and augments the sample rate descriptions with advice pertaining to video reference timing.

AES18 - Ancillary User Data Channel Format

Ancillary user-specified metadata can be embedded within an AES3 audio stream. Messages can be any length. The only limitation is the maximum bitrate, which caps the amount of data that can be inserted in addition to the audio payload. A long message could describe the entire asset with an abstract for display in an EPG. Shorter messages provide synchronous data such as:

  • Subtitle text.
  • Script cues.
  • Editing information.
  • Copyright assertions.
  • Performer credits.
  • Downstream switching instructions.

This is managed carefully to avoid delaying the audio content. Messages can be split and portions deferred to accommodate the bitrate capping limit.

AES18 ancillary data adapts the High-level Data Link Control (HDLC) protocol originally defined in ISO 3309, which AES18 cites. That standard has now been withdrawn and replaced by ISO 13239. HDLC is bi-directional, but in the context of AES3, the messages only travel one way with no handshaking.
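HDLC delimits each frame with the flag pattern 01111110 and uses bit stuffing so that the flag can never appear inside a frame: the sender inserts a 0 after any five consecutive 1s and the receiver removes it. This is a minimal sketch of that generic ISO 13239 mechanism with hypothetical function names; the exact AES18 framing details are assumed rather than checked, and abort sequences are not handled.

```python
FLAG = [0, 1, 1, 1, 1, 1, 1, 0]   # 0x7E opening/closing flag


def bit_stuff(bits: list[int]) -> list[int]:
    """Insert a 0 after every run of five consecutive 1s."""
    out: list[int] = []
    run = 0
    for bit in bits:
        out.append(bit)
        if bit:
            run += 1
            if run == 5:
                out.append(0)      # stuffed bit breaks up the run
                run = 0
        else:
            run = 0
    return out


def bit_unstuff(bits: list[int]) -> list[int]:
    """Remove the 0 that follows every five consecutive 1s.

    Assumes valid stuffed data; a sixth 1 (flag or abort) is not handled here.
    """
    out: list[int] = []
    run = 0
    skip = False
    for bit in bits:
        if skip:
            skip = False
            run = 0
            continue               # drop the stuffed zero
        out.append(bit)
        if bit:
            run += 1
            if run == 5:
                skip = True
        else:
            run = 0
    return out


def hdlc_frame(bits: list[int]) -> list[int]:
    """Wrap stuffed payload bits between opening and closing flags."""
    return FLAG + bit_stuff(bits) + FLAG


if __name__ == "__main__":
    print(bit_stuff([1, 1, 1, 1, 1, 1]))   # [1, 1, 1, 1, 1, 0, 1]
```

Because the stuffed data never contains six 1s in a row, a receiver can find frame boundaries in a one-way stream without any handshaking.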

Error resilience helps detect data corruption at the receiver. If necessary, important data could be delivered in a carousel-like structure and repeated periodically.
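HDLC as defined in ISO 13239 commonly protects each frame with a 16-bit frame check sequence (FCS), a cyclic redundancy check computed over the frame contents. Whether AES18 uses exactly this variant is an assumption here, so treat the following sketch (hypothetical function names) as an illustration of how a receiver detects corruption.

```python
def fcs16(data: bytes) -> int:
    """16-bit HDLC frame check sequence (the CRC-16/X-25 variant).

    Bits are processed least-significant first, hence the bit-reversed
    polynomial 0x8408 (x^16 + x^12 + x^5 + 1). The register starts at 0xFFFF
    and the result is complemented.
    """
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0x8408 if crc & 1 else crc >> 1
    return crc ^ 0xFFFF


def frame_is_intact(data: bytes, received_fcs: int) -> bool:
    """Receiver-side check: recompute the FCS and compare."""
    return fcs16(data) == received_fcs


if __name__ == "__main__":
    print(hex(fcs16(b"123456789")))   # 0x906e, the published check value
```

A mismatch tells the receiver to discard the message; with no return path, the carousel-style repetition described above is what eventually delivers an intact copy.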

The standard lists many external references in the Annex C Bibliography. These date from the mid-1980s to the 1990s and cover radio text services which are now deployed worldwide. Because of the vintage, the specified character sets do not yet use Unicode. Text is constrained to 8-bit character codes as defined in ISO 4873. UTF-8 character encoding of Unicode text is briefly mentioned in the AES67 standard.

ST 2110-30 - RTP Streamed PCM AES67 Audio

The ST 2110-30 standard describes how to deliver uncompressed AES67 audio via RTP streams in an IP based studio. The delivery is supported by signaling metadata delivered using the Session Description Protocol (SDP). This metadata is necessary to receive and correctly interpret the stream.

SMPTE ST 2110-30 can be seen as a subset of AES67. Most of the requirements for stream transport, packet setup, and signaling are common to ST 2110-30 and AES67. ST 2110-30 profiles AES67 with these constraints to ensure more reliable interoperability:

  • Support of the PTP profile defined in SMPTE ST 2059-2 instead of that defined in AES67.
  • An offset value of zero between the media clock and the RTP stream clock.
  • Mandatory signaling to force a device to operate in PTP slave-only mode.
  • Support of IGMPv3 for multicasting. Refer to RFC 3376 for details.
  • RTCP messaging is tolerated but not required.
  • Receivers need not support the Session Initiation Protocol (SIP) or other connection management protocols.
  • The size of UDP datagrams (packets) is specified in ST 2110-10 and supersedes those in AES67.
  • SDP channel ordering syntax must follow RFC 3190 conventions.
  • The maximum number of channels is limited depending on the conformance level.
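Several of these constraints appear directly in a stream's SDP description. The hypothetical helper below builds a minimal description for one L24 stream: the "a=mediaclk:direct=0" line uses the RFC 7273 syntax to signal a zero offset between the media clock and the RTP clock, and "a=ts-refclk" identifies the PTP grandmaster. The addresses, port, payload type and grandmaster identity are placeholders, and a production ST 2110 SDP normally carries further attributes (for example source filters), so check ST 2110-10 and ST 2110-30 before relying on it.

```python
def st2110_30_sdp(
    multicast_ip: str,
    port: int,
    channels: int,
    channel_order: str,
    ptp_grandmaster: str,
    ptp_domain: int = 0,
    sample_rate_hz: int = 48_000,
    ptime_ms: str = "1",
    payload_type: int = 97,
    origin_ip: str = "192.0.2.10",
) -> str:
    """Build a minimal SDP description for one L24 stream (hypothetical helper)."""
    lines = [
        "v=0",
        f"o=- 0 0 IN IP4 {origin_ip}",
        "s=Example ST 2110-30 stream",
        "t=0 0",
        f"m=audio {port} RTP/AVP {payload_type}",
        f"c=IN IP4 {multicast_ip}/32",                 # /32 is the multicast TTL
        f"a=rtpmap:{payload_type} L24/{sample_rate_hz}/{channels}",
        f"a=fmtp:{payload_type} channel-order={channel_order}",
        f"a=ptime:{ptime_ms}",
        f"a=ts-refclk:ptp=IEEE1588-2008:{ptp_grandmaster}:{ptp_domain}",
        "a=mediaclk:direct=0",                          # zero media clock offset
    ]
    return "\r\n".join(lines) + "\r\n"                  # SDP lines end in CRLF


if __name__ == "__main__":
    print(st2110_30_sdp("239.69.1.1", 5004, 8, "SMPTE2110.(51,ST)", "08-00-11-FF-FE-21-E1-B0"))
```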

The AIMS Alliance has published a helpful white paper that describes how ST 2110-30 and AES67 interact. Download a copy of AES67-SMPTE-ST-2110-Commonalities-and-Constraints here:

https://aimsalliance.org/white-papers/

Channel Ordering

Channel ordering is described in a Session Description Protocol message with symbolic names. The receiver can use them to deduce how to unpack the received samples and reconstruct the correct channel mapping. If this SDP description is omitted, the receiver will assume all channels are of an undefined type:

Symbol Channels Description
M 1 Single Monophonic.
DM 2 Dual Monophonic.
ST 2 Stereo-pair.
LtRt 2 Matrix Stereo.
51 6 Surround 5.1.
71 8 Surround 7.1.
222 24 Surround 22.2.
SGRP 4 SDI audio group.
U{xx} 1 - 64 An arbitrary number of channels of an undefined type, indicated by the value {xx}, which must be between 01 and 64. Also subject to the overall 64 channel maximum in an ST 2110-30 implementation of an AES67 link.


Note that the channel ordering can be stacked. This example SDP fragment describes the first six channels as 5.1 surround format and the next two as a stereo-pair delivered alongside them:

channel-order=SMPTE2110.(51,ST)

This second example SDP fragment describes the first four channels as separate monophonic channels, and the next two channels as a stereo-pair and the last two channels as an undefined type:

channel-order=SMPTE2110.(M,M,M,M,ST,U02)
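A receiver can turn a channel-order value into a channel map by walking the symbols in order and assigning consecutive channel numbers. This is a minimal sketch with a hypothetical function name, using the channel counts from the table above; it assumes the value has already been extracted from the SDP "a=fmtp" line and does not cover every syntax detail of the standard.

```python
import re

# Channel counts for the fixed symbols in the channel ordering table.
SYMBOL_CHANNELS = {"M": 1, "DM": 2, "ST": 2, "LtRt": 2, "51": 6, "71": 8, "222": 24, "SGRP": 4}


def parse_channel_order(value: str) -> list[tuple[str, range]]:
    """Map each symbol in an ST 2110 channel-order value to its channel indices."""
    match = re.fullmatch(r"SMPTE2110\.\((.*)\)", value)
    if not match:
        raise ValueError(f"not an ST 2110 channel-order value: {value!r}")
    groups: list[tuple[str, range]] = []
    next_channel = 0
    for symbol in match.group(1).split(","):
        if symbol in SYMBOL_CHANNELS:
            count = SYMBOL_CHANNELS[symbol]
        elif re.fullmatch(r"U\d\d", symbol) and 1 <= int(symbol[1:]) <= 64:
            count = int(symbol[1:])            # U01..U64: undefined channels
        else:
            raise ValueError(f"unknown channel-order symbol: {symbol!r}")
        groups.append((symbol, range(next_channel, next_channel + count)))
        next_channel += count
    if next_channel > 64:
        raise ValueError("more than 64 channels in total")
    return groups


if __name__ == "__main__":
    for symbol, chans in parse_channel_order("SMPTE2110.(M,M,M,M,ST,U02)"):
        print(symbol, list(chans))
```

Channel indices here start at 0; the last group of the second example covers channels 6 and 7, the two undefined channels.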

Conformance Levels

The ST 2110-30 standard describes receiver conformance levels that mandate which stream formats a receiver must support. Each level combines a sample rate with one or more packet times: their product sets the number of samples per channel in each packet, which in turn limits how many channels fit within the available packet capacity. If the chosen level limits the number of channels to fewer than you need, multiple AES67 links can be delivered with the channels sensibly divided between them. For example, if 16 channels arrive embedded in an SDI stream but your receiver only supports Level A, you will need to transmit 8 channels in each of two separate AES67 links running side-by-side.
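The arithmetic of the 16-channel example can be sketched as a simple split into consecutive groups. This hypothetical helper ignores the "sensibly divided" part: a real planner should also avoid splitting a stereo pair or a 5.1 bed across two streams.

```python
def split_channels(total_channels: int, max_per_stream: int) -> list[range]:
    """Divide channels 0..total-1 into consecutive groups of at most max_per_stream."""
    if total_channels < 1 or max_per_stream < 1:
        raise ValueError("channel counts must be positive")
    return [
        range(start, min(start + max_per_stream, total_channels))
        for start in range(0, total_channels, max_per_stream)
    ]


if __name__ == "__main__":
    # 16 SDI channels for a Level A receiver (at most 8 channels per stream).
    for stream, chans in enumerate(split_channels(16, 8), start=1):
        print(f"stream {stream}: channels {chans.start + 1}-{chans.stop}")
```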

These are the three basic conformance levels:

Level Receiver must support
A 48 kHz incoming streams with 1 to 8 audio channels at a packet time of 1.0 ms.
B 48 kHz incoming streams with 1 to 8 audio channels at a packet time of 1.0 ms and 1 to 8 audio channels at a packet time of 0.125 ms.
C 48 kHz incoming streams with 1 to 8 audio channels at a packet time of 1.0 ms and 1 to 64 audio channels at a packet time of 0.125 ms.

Level A is mandatory and must be supported by all receivers. This is also defined in AES67 as the minimum support. These are all based on 24-bit samples.

Levels B and C support shorter packet times to reduce latency. Level C also supports up to 64 channels for interoperability with MADI (AES10) systems.

Levels AX, BX and CX add support for 96 kHz sample rates but reduce the number of supported channels where necessary.
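The three basic levels in the table above are balanced so that their largest configurations produce the same amount of audio per packet. At 48 kHz a 1 ms packet carries 48 samples per channel and a 0.125 ms packet carries 6, so with 3-byte L24 samples, 8 channels at 1 ms and 64 channels at 0.125 ms both give 1152 bytes. The sketch below checks this and compares the result with a 1460-byte UDP payload limit, which is an assumed value for the ST 2110-10 limit rather than a figure taken from this article; the names are hypothetical.

```python
from fractions import Fraction

L24_BYTES = 3                     # bytes per 24-bit sample
RTP_HEADER_BYTES = 12             # fixed RTP header, no CSRCs or extensions (RFC 3550)
UDP_PAYLOAD_LIMIT = 1460          # assumption: ST 2110-10 standard UDP size limit

# (sample rate Hz, packet time ms, maximum channels) for each basic level.
LEVELS = {
    "A": [(48_000, Fraction(1), 8)],
    "B": [(48_000, Fraction(1), 8), (48_000, Fraction(1, 8), 8)],
    "C": [(48_000, Fraction(1), 8), (48_000, Fraction(1, 8), 64)],
}


def payload_bytes(rate_hz: int, packet_ms: Fraction, channels: int) -> int:
    """Audio bytes in one packet for the given configuration."""
    samples = Fraction(rate_hz) * packet_ms / 1000   # samples per channel per packet
    if samples.denominator != 1:
        raise ValueError("packet time does not hold a whole number of samples")
    return int(samples) * channels * L24_BYTES


def largest_packet(level: str) -> int:
    """Largest RTP packet (header + audio) a receiver at this level must accept."""
    return max(RTP_HEADER_BYTES + payload_bytes(*entry) for entry in LEVELS[level])


if __name__ == "__main__":
    for name in LEVELS:
        size = largest_packet(name)
        print(name, size, size <= UDP_PAYLOAD_LIMIT)   # 1164 bytes for A, B and C
```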

ST 2110-31 - AES3 Transparent Transport Over RTP

The ST 2110-31 standard describes real-time, RTP-based transport of any audio format that can be encapsulated into AES3.

The RTP packet header and payload format is described in Section 5. Packets are synchronized to a network reference clock described in ST 2110-10. Session Description Protocol (SDP) messages describe how the payload is constructed for the benefit of the receiver. See Section 6 of the standard.

This is all based on the original Ravenna AM824 specification and registered with the IANA as RTP Media Type 'AM824'. Refer to RFC 4855 and RFC 6838 for details.
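In the AM824 format each AES3 subframe travels as a 32-bit word: an 8-bit label followed by 24 data bits. The sketch below packs such words and sizes a payload. The label is treated as an opaque byte because the assignment of its individual bits (block start and the AES3 V, U, C and P bits) is defined in ST 2110-31 and is not reproduced here; the function names are hypothetical.

```python
def am824_word(label: int, audio24: int) -> bytes:
    """One 32-bit AM824 word: an 8-bit label followed by 24 data bits.

    The meaning of individual label bits is NOT modelled; the caller supplies
    the label as an opaque byte. Signed PCM must first be masked to 24 bits
    (value & 0xFFFFFF).
    """
    if not 0 <= label <= 0xFF:
        raise ValueError("label must fit in 8 bits")
    if not 0 <= audio24 <= 0xFFFFFF:
        raise ValueError("audio word must be an unsigned 24-bit field")
    return ((label << 24) | audio24).to_bytes(4, "big")


def am824_payload_bytes(sample_rate_hz: int, packet_time_us: int, channels: int) -> int:
    """Payload size when each subframe (channel) uses one 4-byte word per sample."""
    if (sample_rate_hz * packet_time_us) % 1_000_000:
        raise ValueError("packet time does not hold a whole number of samples")
    samples = sample_rate_hz * packet_time_us // 1_000_000
    return samples * channels * 4


if __name__ == "__main__":
    print(am824_word(0x00, (-1) & 0xFFFFFF).hex())   # 00ffffff
    print(am824_payload_bytes(48_000, 1000, 2))      # 384 bytes for one AES3 pair
```

Each subframe costs 4 bytes rather than the 3 bytes of an L24 sample, so one AES3 pair at 48 kHz with 1 ms packets needs 384 bytes of payload instead of 288. The extra byte is what lets non-PCM data and the AES3 side-channel bits pass through transparently.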

Annex A of the standard describes how AES3 and AES10 (MADI) protocols interact. They are broadly compatible but some data needs to be correctly framed and some flag-bits must be adjusted within the packets as they move between the two environments.

Relevant Standards

These are the relevant standards you need to fully explore ST 2110-30 and ST 2110-31. Note that the AES standards refer to some IEC standards that are more often identified as ISO standards. Where standards are superseded, the original reference (from AES or SMPTE) and the replacement standard are listed. The version column lists the most recent edition, amendment or reaffirmation of each standard:

Standard Version Description
AES3 - A specification for 2-channel digital audio interconnection. Commonly known as AES/EBU.
AES3-1 2009 Part 1: Audio Content.
AES3-2 2009 Part 2: Metadata and Sub-code.
AES3-3 2009 Part 3: Transport.
AES3-4 2009 Part 4: Physical and electrical.
AES5 2018 Preferred sampling frequencies for applications employing pulse-code modulation.
AES10 2020 Serial Multichannel Audio Digital Interface (MADI).
AES11 2020 Synchronization of digital audio equipment in studio operations.
AES14-1 1992 Part 1: Analog XLR pin-out polarity and gender.
AES31 - AES standard for network and file transfer of audio - Audio-file transfer and exchange.
AES31-1 2001 Part 1: Disk format.
AES31-2 2019 Part 2: File format for transferring digital audio data between systems of different type and manufacture.
AES31-3 2021 Part 3: Simple project interchanges.
AES31-4 2015 Part 4: XML Implementation of Audio Decision Lists.
AES42 2019 Digitally interfaced microphones.
AES47 2017 Transmission of digital audio over asynchronous transfer mode (ATM) networks.
AES51 2017 Transmission of ATM cells over Ethernet physical layer.
AES53 2018 Sample-accurate timing in AES47.
AES67 2018 High-performance streaming audio-over-IP interoperability.
AES70 - Open Control Architecture.
AES70-1 2018 Part 1: Framework.
AES70-2 2018 Part 2: Class structure.
AES70-3 2018 Part 3: OCP.1: Protocol for IP Networks.
AES74 2019 Requirements for Media Network Directories and Directory Services.
RP 168 2009 SMPTE Definition of Vertical Interval Switching Point for Synchronous Video Switching.
ST 318 2015 Synchronization of 59.94 or 50 Hertz related video and audio systems in analogue and digital areas.
ST 337 2015 Format for Non-PCM Audio and Data in an AES3 Serial Digital Audio Interface.
ST 338 2019 Format for Non-PCM Audio and Data in AES3 - Data Types.
ST 339 2015 Format for Non-PCM Audio and Data in AES3 - Generic Data Types.
ST 340 2015 Format for Non-PCM Audio and Data in AES3 - ATSC A/52B Digital Audio Compression Standard for AC-3 and Enhanced AC-3 Data Types.
ST 2036-2 2013 Ultra-High-Definition-Television - Audio Characteristics and Audio Channel Mapping for Program production.
ST 2059-2 2021 SMPTE Profile for Use of IEEE-1588 Precision Time Protocol in Professional Broadcast Applications.
ST 2067-8 2013 Interoperable Master Format - Common Audio Labels.
ST 2110-30 2017 PCM Digital Audio.
ST 2110-31 2022 AES3 Transparent Transport.
ST 2116 2019 Format for Non-PCM Audio and Data in AES3 - Carriage of Metadata of Serial ADM (Audio Definition Model).
ISO 3309 - Information processing systems, Data communications High-level data link control procedures and Frame structure. Referred to by AES18 but now withdrawn and replaced by ISO 13239.
ISO 13239 2002 High-level data link control (HDLC) procedures.
ISO 9314-1 1989 Fibre Distributed Data Interface (FDDI) - Part 1: Token Ring Physical Layer Protocol (PHY).
ISO 9314-3 1990 Fibre Distributed Data Interface (FDDI) - Part 3: Physical Layer Medium Dependent (PMD).
ISO 10646 2023 Information technology — Universal coded character set (UCS). Currently being revised.
IEC 60169-8 1978 Radio-frequency connectors. Part 8: R.F. coaxial connectors with inner diameter of outer conductor 6.5 mm (0.256 in) with bayonet lock (BNC). Replaced by 61169-8.
IEC 60958   Two channel digital audio data format used by S/PDIF and AES3.
IEC 60958-1 2021 Digital audio interface - Part 1: General.
IEC 60958-3 2006 Part 3: Consumer applications - Sony/Philips consumer optical digital interface (S/PDIF) based on AES3.
IEC 60958-4 2016 Digital audio interface - Part 4: Professional applications.
IEC 61169-8 2007 RF coaxial connectors.
IEC 61937 2021 Digital audio - Interface for non-linear PCM encoded audio bitstreams applying IEC 60958 - Surround sound digital audio data format.
IEC 61883-6 2014 Part 6: Audio and music data transmission protocol.
IEEE 1588-2008 2008 PTP - Precision Clock Synchronization Protocol for Networked Measurement and Control Systems.
RFC 3190 2002 RTP Payload Format for 12-bit DAT Audio and 20- and 24-bit Linear Sampled Audio.
RFC 3376 2002 Internet Group Management Protocol, Version 3.
RFC 3550 2003 RTP: A Transport Protocol for Real-Time Applications.
RFC 3551 2003 RTP Profile for Audio and Video Conferences with Minimal Control.
RFC 3629 2003 UTF-8, a transformation format of ISO 10646.
RFC 4566 2006 SDP: Session Description Protocol.
RFC 4855 2007 Media Type Registration of RTP Payload Formats.
RFC 6838 2013 Media Type Specifications and Registration Procedures.
RFC 7273 2014 RTP Clock Source Signaling.
EBU Tech 3250 2017 Specification of the digital audio interface (the AES/ EBU interface).
ITU-R BS.450-3 2001 Transmission standards for FM sound broadcasting at VHF (aka CCIR Rec 450-1).
ITU-R BS.647 2011 A digital audio interface for broadcasting studios.
ITU-T J.17 1988 Pre-emphasis used on sound-program circuits.
ITU-T J.53 2000 Sampling frequency to be used for the digital transmission of studio-quality and high-quality sound-program signals.
Ravenna AM824 2012 RTP Payload Format for AES3.
Rane Note 149 2014 Describes the differences between AES3 and S/PDIF and how to interface them correctly.
BBC WHP 074 IBC 2003 BBC White Paper - The development of ATM network technology for live production infrastructure.
IS-04 Version 1.3.2 NMOS Discovery & Registration.
IGMPv3 2002 See RFC 3376.

Refer to the appendices for a complete list of AES standards. Guideline documents have an 'id' suffix.

Conclusion

The ST 2110 and AES standards described here convey uncompressed digital audio around an IP network. Compressed audio formats such as AAC are outside the scope of these standards, except where they are wrapped in AES3 as non-PCM data (see ST 337) and carried transparently by ST 2110-31.

Although AES67 is an open standard, patent licensing may be necessary if you are designing a commercial product based on it. The principal patent holder is an Australian company called Audinate Pty Ltd. There may be other relevant patents that AES are unaware of.

The AES and SMPTE standards are easier to read and understand than MPEG standards. They tend to be much shorter and focused on a specific topic. They are very dependent on other earlier documents. SMPTE, AES and IETF RFC documents frequently refer to each other.

IETF standards are available online to download free of charge. SMPTE and AES standards are free downloads for subscribing members. Joining both societies is easy and worth the subscription fee if you intend to purchase more than a couple of their standards.
