Standards: Part 5 - Standards For Audio Coding

This article describes the various AES, MPEG, Proprietary and Open Standards that pertain to audio.


This article is part of our growing series on Standards.
There is an overview of all 26 articles in Part 1 - An Introduction To Standards.


Audio production follows a similar workflow concept to video but the tools and container files are slightly different. The necessary computing and storage capacity is also reduced. Within broadcast workflows the management of audio content can be approached as additional tracks within the video container or separately in a specialized audio container. In a radio or podcast production workflow, there is no accompanying video.

Some file formats that store audio efficiently are useful when you ingest and file new recordings in a digital librarian system. The audio samples should be uncompressed to avoid artefacts. The files will support some metadata tagging of the content. Additional metadata goes into the content management database.
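As a rough illustration of an ingest check, the sketch below reads the technical parameters from a WAV file using Python's standard `wave` module, which only handles uncompressed PCM. The function names and the returned dictionary keys are hypothetical, chosen for this example, and the silent test file is generated in memory so nothing needs to exist on disk.

```python
import io
import wave


def describe_wav(source):
    """Return basic technical metadata for an uncompressed PCM WAV source.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "bit_depth": wav.getsampwidth() * 8,
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }


def make_silent_wav(seconds=1, rate=48000, channels=2, sample_width=2):
    """Build a short silent WAV in memory so the example needs no real file."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)   # bytes per sample, e.g. 2 = 16-bit
        wav.setframerate(rate)
        wav.writeframes(bytes(seconds * rate * channels * sample_width))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(describe_wav(make_silent_wav()))
```

Values like these are the kind of technical metadata that could be copied into the content management database at ingest, alongside the descriptive tags.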

In addition to the summaries below you will find a far more comprehensive listing of the AES Standards & Recommended Practices, AES Information Documents and AES Project Reports in Appendix H.

Useful Standards For Audio Recording & Production

There are several sources of international standards for recording audio which benefit from the knowledge and experience of many industry experts:

  • AES - Audio Engineering Society
  • EBU - European Broadcasting Union
  • MPEG - Moving Picture Experts Group
  • SMPTE - Society of Motion Picture and Television Engineers

The MPEG standards are published by ISO and can be purchased through the ISO online store. AES standards are available directly from the society, where members receive a discounted price.

Proprietary standards are embedded in the production tools. These will store their project assets in a more compact form but need exporting for more portable use downstream in the workflow.

License-free open-source standards and tools are a very attractive solution.

Relevant AES Standards

The Audio Engineering Society (AES) was established in 1948 and has been publishing standards since 1977.

AES strives to avoid the use of patented technologies or requires the patent holder to allow their use on a minimal or zero fee basis. The society also collaborates with other standards bodies such as the SMPTE, ISO, IEC, BSI and EBU.

The SMPTE ST 2110 specification deploys AES standards in the context of an IP-driven studio workflow. Data formats and transmission are covered by AES while ST 2110 describes how to apply them in a practical situation.

These AES standards are particularly relevant to an IP based workflow but you may find some of the others are useful too:

Number Description
AES 3 Used for digital audio interconnection and also known as AES/EBU.
AES 10 Describes multi-channel digital audio interconnection and generically referred to as MADI.
AES 11 Describes digital audio synchronisation.
AES 31 A file format for exchanging audio data between systems and applications.
AES 50 Multi-channel audio over Ethernet.
AES 52 Describes how to insert unique identifiers into AES 3 digital audio content.
AES 67 Interoperability of Audio over IP networks.
AES 70 Open Control Architecture.

 

Earlier AES standards are based on Asynchronous Transfer Mode (ATM) networks. An ATM network can carry voice and data simultaneously. Ethernet can only carry data but Voice over IP (VoIP) supports telephony applications as well.


AES 47 & 51 describe how to transmit audio over ATM networks.


Relevant MPEG Standards

This is a short list of the individual parts of the MPEG standards that are directly related to Audio processing. Some of these will have had contributions from AES and EBU experts. Some standards define audio modelling strategies where the audio is described algorithmically rather than as direct samples of recorded sound. The MPEG standards focus on coding techniques and storage container formats.


The MPEG standards may require patent license fees to be paid.


Standard ISO Part No. Description
MPEG-1 ISO 11172-3 Audio - Layers 1, 2 & 3 (mp1, mp2, mp3).
MPEG-2 ISO 13818-3 Audio - Adds lower bit rates and Multi-channel support to MPEG-1.
MPEG-2 ISO 13818-7 AAC - Advanced Audio Coding.
MPEG-4 ISO 14496-14 MP4 File Format.
MPEG-4 ISO 14496-15 AVC File Format.
MPEG-4 ISO 14496-23 Symbolic Music Representation (SMR).
MPEG-4 ISO 14496-24 Audio and systems interaction.
MPEG-4 ISO 14496-26 Audio Conformance.
MPEG-4 ISO 14496-3 Audio (Many subparts describing complex audio coding strategies).
MPEG-4 ISO 14496-8 Carriage over IP.
MPEG-7 ISO 15938-4 Specification for audio descriptors in a multimedia content description interface.
MPEG-A ISO 23000-12 Interactive music application format.
MPEG-A ISO 23000-2 MPEG music player application format.
MPEG-A ISO 23000-4 Musical slide show application format.
MPEG-D ISO 23003-1 MPEG Surround.
MPEG-D ISO 23003-2 Spatial Audio Object Coding (SAOC).
MPEG-D ISO 23003-3 Unified speech and audio coding.
MPEG-D ISO 23003-4 Dynamic range control.
MPEG-D ISO 23003-5 Uncompressed audio in MPEG-4 file format.
MPEG-D ISO 23003-6 Unified speech and audio coding reference software.
MPEG-D ISO 23003-7 Unified speech and audio coding conformance testing.
MPEG-H ISO 23008-3 3D Audio.
MPEG-H ISO 23008-6 3D Audio Reference Software.
MPEG-H ISO 23008-9 3D Audio Conformance Testing.
MPEG-CICP ISO 23091-3 Coding Independent Code Point descriptions for audio content.

 

Proprietary Standards

These are some proprietary container formats, listed here by file-type extension. The license fees depend on how the formats are used and deployed and on the target platforms. They are usually included in the purchase price of the tools or hardware used to create the files. Some of these formats are platform-specific, which makes them less portable. They might be designed to carry combined video and audio but can also be used in audio-only scenarios.

Extension Format
ac3 Dolby AC3 surround sound file.
aif See AIFF.
aifc Compressed AIFF file.
aiff Audio Interchange File Format, often used for audio extracted from a CD. Designed by Apple and based on IFF.
alac Apple Lossless Audio Codec.
asf Advanced Systems Format (alternative to wmv).
avi Audio Video Interleave.
caf Apple Core Audio Format container, which can carry Apple Lossless Audio (ALAC) content (uncommon).
dts Digital Theater Systems sound file.
evo Enhanced VOB.
f4v Flash Video file with H.264 video & AAC audio.
flv Flash Video file containing SWF encoded content. Deprecated and should not be used for new projects.
iff Electronic Arts Interchange File Format.
mov QuickTime File Format.
qt Early QuickTime File Format (rarely used now).
rmvb RealMedia Variable Bitrate file.
vob DVD Video Object.

 

Open Standards

Open-source codecs and storage container files offer many advantages. They are supported by a community of enthusiastic developers and perform well. They are ported to virtually every platform. Because the supporting source-code is available, you can customize them or port them to new platforms very easily. Open-source projects actively seek to avoid patents and license-fees so they are also attractive commercially.


If you benefit from their technology, then an occasional donation to support them would be good. This will help ensure the project continues to thrive. Open does not mean the work costs nothing to produce, but paying for it is optional.


Extension Format
ape Monkey Audio file.
flac Free Lossless Audio Codec (FLAC) coded audio.
mka Matroška audio.
mpc Musepack audio file.
mxf Material Exchange Format.
ofr OptimFROG lossless coded audio.
oga Ogg audio file.
ogg Ogg audio/video file.
ogm Ogg media file.
opus An Ogg format container containing Opus coded audio.
wav WAV audio file. These are often used in radio broadcasting.
wave See wav.
webm WebM based on the Matroška format.

 

Tools & Software Apps

There is a diverse and sophisticated array of audio production tools available and many of the most popular tools are platform-specific. Digital Audio Workstations and other tools aimed primarily at music production can be used very effectively for broadcast audio editing and post-production. Most of the main video post-production platforms offer increasingly sophisticated, tightly integrated audio editing and production tools. Most professional software supports most of the commonly used standards, but if you need to use or deliver specific file formats it is wise to ensure that the tools you select are compatible before embarking on any project. Many tools are supported on macOS and Windows but not on Linux.

Deploying open-source audio tools instead is appropriate in these scenarios:

  • Editing on Linux workstations
  • Portability across all the major platforms
  • Conversion between formats
  • Detailed analysis of the content
  • Diagnosis of problems

The ffmpeg tool is often thought of as a video-conversion tool but it also has powerful support for audio file conversions. Useful analysis tools are built in and accessible from the command-line interface.
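A minimal sketch of driving these tools from a script is shown here. It only builds argument lists for `ffmpeg` and its companion `ffprobe`, so the file names are placeholders and the helper function names are hypothetical. The options used (`-i`, `-vn`, `-c:a`, `-ar`, `-show_entries`, `-of json`) are standard ones, but check them against the version installed on your system.

```python
import json
import subprocess


def convert_command(src, dst, codec=None, sample_rate=None):
    """Build an ffmpeg argument list for an audio conversion.

    With no codec given, ffmpeg picks a default encoder based on
    the output file extension.
    """
    args = ["ffmpeg", "-i", src, "-vn"]  # -vn drops any video stream
    if codec:
        args += ["-c:a", codec]          # choose the audio encoder
    if sample_rate:
        args += ["-ar", str(sample_rate)]  # resample to this rate in Hz
    return args + [dst]


def probe_command(src):
    """Build an ffprobe argument list that reports container and codec as JSON."""
    return [
        "ffprobe", "-v", "error",
        "-show_entries",
        "format=format_name:stream=codec_name,sample_rate,channels",
        "-of", "json", src,
    ]


def probe(src):
    """Run ffprobe (which must be installed) and return the parsed JSON report."""
    result = subprocess.run(
        probe_command(src), capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    # Print the commands rather than running them; file names are placeholders.
    print(" ".join(convert_command("interview.wav", "interview.flac", codec="flac")))
    print(" ".join(probe_command("interview.flac")))
```

Keeping the command construction separate from the execution makes it easy to log or review exactly what will be run before committing to a batch conversion.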

Choosing An Appropriate Sample-rate

Harry Nyquist's sampling criterion says the sample-rate should be at least twice the highest audible frequency to capture all the content. The bit-depth is also important because it sets the size of the quantisation steps and so limits the staircase effect in the reconstructed waveform. That staircase introduces harmonics that are well above the audible range and are removed by a post-processing filter on playback.
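To put numbers on these two ideas, here is a small sketch. The first function returns the highest frequency a given sample-rate can represent. The second uses the common rule of thumb for an ideal converter driven by a full-scale sine wave, roughly 6.02 dB per bit plus 1.76 dB. That rule is an addition to this article and real converters fall short of it; the function names are hypothetical.

```python
def nyquist_frequency(sample_rate_hz):
    """Highest frequency that can be represented at this sample-rate."""
    return sample_rate_hz / 2


def quantisation_snr_db(bit_depth):
    """Theoretical signal-to-noise ratio of an ideal converter.

    Rule of thumb for a full-scale sine wave: 6.02 dB per bit + 1.76 dB.
    """
    return 6.02 * bit_depth + 1.76


if __name__ == "__main__":
    for rate in (44100, 48000, 96000):
        print(rate, "Hz captures up to", nyquist_frequency(rate), "Hz")
    for bits in (16, 24):
        print(bits, "bit:", round(quantisation_snr_db(bits), 2), "dB")
```

On this estimate, each extra bit of depth buys about 6 dB of headroom above the quantisation noise, which is why 24-bit recording is popular for production even when the deliverable is 16-bit.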

If the sample-rate is too low, then ghost frequencies that were not in the original recording will appear when the samples are rendered as output audio. This is called aliasing and should be avoided by choosing a high enough sample-rate.
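The ghost frequency can be predicted with a little arithmetic. This sketch assumes a pure tone sampled with no anti-aliasing filter in front of the converter; the function name is hypothetical. The tone folds back around multiples of the sample-rate and lands somewhere between zero and half the sample-rate.

```python
def alias_frequency(tone_hz, sample_rate_hz):
    """Frequency at which a pure tone appears after sampling.

    Tones below half the sample-rate are returned unchanged; anything
    higher folds back into the range 0 .. sample_rate_hz / 2.
    """
    folded = tone_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


if __name__ == "__main__":
    # A 30 kHz tone is inaudible, but its alias at 44.1 kHz sampling is not.
    print(alias_frequency(30000, 44100))
    # A 10 kHz tone is below half the sample-rate and is left alone.
    print(alias_frequency(10000, 48000))
```

In this example an ultrasonic 30 kHz tone sampled at 44.1 kHz would reappear at 14.1 kHz, squarely inside the audible range.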

The deployment sample-rate should be 44.1 kHz for audio only projects and 48 kHz if you want to embed the audio into a video project. Stick to one of these throughout your project to avoid unnecessary sample-rate conversions.
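One way to see why switching between the two rates is worth avoiding is to reduce the conversion ratio to its lowest terms. The sketch below uses Python's standard `fractions` module; the function name is hypothetical.

```python
from fractions import Fraction


def resample_ratio(source_rate_hz, target_rate_hz):
    """Return the exact target/source rate ratio in lowest terms."""
    return Fraction(target_rate_hz, source_rate_hz)


if __name__ == "__main__":
    # 44.1 kHz to 48 kHz is an awkward ratio.
    print(resample_ratio(44100, 48000))
    # Halving 96 kHz to 48 kHz is a simple one.
    print(resample_ratio(96000, 48000))
```

Going from 44.1 kHz to 48 kHz gives a ratio of 160/147, so every output sample has to be interpolated between input samples. A conversion such as 96 kHz to 48 kHz reduces to exactly 1/2.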

If you have sufficient storage and fast enough computers, work at twice or four times your deployment sample-rate. Filter, mix and process your audio with effects, then down-sample the finished recording. Because the ratio is a whole number, this final step can be done without degradation.
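Whether the storage is sufficient is easy to estimate for uncompressed PCM audio, since the data rate is simply sample-rate times bytes per sample times channel count. The sketch below ignores file headers and metadata, and the function names are hypothetical.

```python
def pcm_bytes_per_second(sample_rate_hz, bit_depth, channels):
    """Data rate of uncompressed PCM audio in bytes per second."""
    return sample_rate_hz * (bit_depth // 8) * channels


def pcm_bytes(duration_s, sample_rate_hz, bit_depth, channels):
    """Approximate size in bytes of a recording, excluding headers."""
    return duration_s * pcm_bytes_per_second(sample_rate_hz, bit_depth, channels)


if __name__ == "__main__":
    one_hour = 60 * 60
    for rate in (48000, 96000, 192000):
        size_gb = pcm_bytes(one_hour, rate, 24, 2) / 1e9
        print(f"{rate} Hz, 24-bit stereo, 1 hour: {size_gb:.2f} GB")
```

An hour of 24-bit stereo at 48 kHz comes to a little over 1 GB, and each doubling of the sample-rate doubles that figure for every track in the project.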

Quality Control

Use uncompressed formats for editing and preparing content. Compressing the audio introduces artefacts which are not obvious at first, but multiple compress-uncompress cycles will compound them and distort the output.

The sound playback must be continuous and consistent. Your customers can tolerate dropouts in the visual content far more easily than losing the sound momentarily. A programme with perfect video and intermittent sound is far less enjoyable than lower quality video with robust soundtracks.

The compression algorithm in MP3 discards sounds that the human ear will theoretically not hear, for example soft violins during a loud cymbal crash. The human ear is very good at reconstructing that missing sound component so theoretically we don’t notice it. At the highest bit rates the result can be quite effective, but reducing the bit rate will degrade the quality of the audio. Do not use MP3 as a production or archival format. Retain the original raw, uncompressed source material in the archives.

Deployment

At the very final deployment stage, you will want to compress the audio for streaming or downloading.

For an audio-only deployment, MP3 is very widely supported, but AAC delivers higher quality at the same bitrate. HE-AAC is even better if your target devices support it.
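As a hedged sketch of that final encoding step, the helper below builds an ffmpeg command for a lossy deliverable from an uncompressed master. The preset names, bitrates and file names are illustrative assumptions, not recommendations. `aac` is ffmpeg's built-in AAC encoder, while `libmp3lame` is only available if your ffmpeg build includes the LAME library.

```python
# Hypothetical delivery presets: (ffmpeg audio encoder, target bitrate).
DELIVERY_PRESETS = {
    "aac": ("aac", "128k"),
    "mp3": ("libmp3lame", "192k"),
}


def delivery_command(master, output, preset):
    """Build an ffmpeg argument list that encodes a lossy deliverable."""
    encoder, bitrate = DELIVERY_PRESETS[preset]
    return [
        "ffmpeg", "-i", master,
        "-vn",             # audio only, ignore any video stream
        "-c:a", encoder,   # audio encoder
        "-b:a", bitrate,   # target audio bitrate
        output,
    ]


if __name__ == "__main__":
    # Always encode from the uncompressed master, never from another lossy file.
    print(" ".join(delivery_command("master.wav", "episode.m4a", "aac")))
    print(" ".join(delivery_command("master.wav", "episode.mp3", "mp3")))
```

Encoding each deliverable directly from the master avoids stacking one lossy compression cycle on top of another.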

Conclusion

Consider the codec and container formats for your audio content as different choices. Some combinations are mandated or prohibited so choose carefully.
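The separation between codec and container can be sketched as a simple lookup. The table below is a deliberately small, non-exhaustive illustration of commonly seen pairings, and it is an assumption added for this example rather than a statement of what any specification permits. Check the relevant standard and your target players before relying on a combination.

```python
# Illustrative, non-exhaustive pairings of container extension -> audio codecs.
# These entries are assumptions for the example, not a normative list.
TYPICAL_PAIRINGS = {
    "wav": {"pcm"},
    "aiff": {"pcm"},
    "flac": {"flac"},
    "ogg": {"vorbis", "opus", "flac"},
    "webm": {"vorbis", "opus"},
    "mp4": {"aac", "alac", "mp3"},
}


def is_typical_pairing(container, codec):
    """Return True if the codec is listed for the container in the table above."""
    return codec.lower() in TYPICAL_PAIRINGS.get(container.lower(), set())


if __name__ == "__main__":
    for container, codec in [("webm", "opus"), ("webm", "aac"), ("ogg", "flac")]:
        print(container, codec, is_typical_pairing(container, codec))
```

A check like this could sit at the start of an export script so that an unsupported combination is caught before any encoding time is spent.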

For optimum portability, FLAC or Matroška containers are useful in a production workflow. On the other hand, if your workstations are all Apple based, the ALAC or AIFF formats are appropriate.
