Standards: Part 17 - About AAC Audio Coding

Advanced Audio Coding improves on the MP3 Perceptual Coding solution to achieve higher compression ratios and better playback quality.

This article is part of our growing series on Broadcast Standards.
The first 26 articles are now available in Broadcast Standards – The Book.

The MP3 audio codec published in the MPEG-1 and MPEG-2 standards has been very successful. Subsequent research explored how to reduce the bitrate and deliver better quality. Fraunhofer IIS were deeply involved again, this time with a larger cohort of collaborators.

MPEG-2 part 7 introduces the Advanced Audio Coding (AAC) standard, which supersedes MP3. Coding algorithms have been improved and new tools have been introduced to achieve a better compression ratio. It can be used at a higher bit-rate for better quality but is still a lossy codec.

About The ISO Standards

MPEG audio compression is described by a series of standards published and updated over several decades:

ISO 11172-3 - The origins of MPEG-1 Audio.
ISO 13818-3 - Enhancements with MPEG-2. MP1, MP2 & MP3 are backwards compatible.
ISO 13818-7 - Advanced Audio Coding (AAC) introduced as a non-backwards compatible codec.
ISO 14496-3 - Sub-part 4 introduces Object based General Audio Coding (GA) including AAC, TwinVQ and BSAC. TwinVQ is an alternative quantization tool. Bitrate Scalable Audio Coding (BSAC) is optional for reducing the size of the output bitstream.
ISO 15938 - Describes MPEG-7 metadata for tagging.
ISO 23091-3 - Coding Independent Code Points for Audio. Describes metadata relating to loudspeaker performance and playback characteristics.
ISO 23003 - MPEG-D describes how to apply the MPEG-Audio described in earlier standards. Consult this standard for additional background information.

The bitstream syntax must be reconciled across all of these standards to remove any ambiguity.

Improvements Over MP3

The AAC codec improves on MP3 in several important areas:

Compression algorithms.
Sample rates.
Multi-channel support.
Base and enhancement layer encoding.
Combining different encoders in MPEG-4.
Additional profiles.

Compression Algorithms

MP3 provided two different but related kinds of compression:

DCT - The Discrete Cosine Transform applied to a single frequency band.
MDCT - The Modified Discrete Cosine Transformation applied to overlapping groups of frequency bands.

AAC is implemented purely with MDCT, which significantly improves compression efficiency but is not backwards compatible with MP3. The number of frequency sub-bands (spectral lines) is increased to 1024 per block, up from 576 in MP3. Calculating the energy level for masking threshold control is now much more accurate and fine-grained. This algorithm is explored in the previous article about MP3.
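To put the 1024-way split of the spectrum into numbers, the sketch below divides the usable bandwidth (half the sample rate) evenly across the transform outputs. It assumes the uniform spacing that an MDCT produces. The function name is hypothetical and only serves the illustration.

```python
def mdct_line_spacing_hz(sample_rate_hz: int, lines_per_block: int = 1024) -> float:
    """Return the width in Hz of one spectral line.

    Assumes the band from 0 Hz up to the Nyquist frequency (half the
    sample rate) is divided evenly between the transform outputs.
    """
    return (sample_rate_hz / 2) / lines_per_block


# Compare the resolution at a few of the sample rates AAC supports.
for rate in (8000, 44100, 48000, 96000):
    print(f"{rate} Hz sampling: {mdct_line_spacing_hz(rate):.2f} Hz per line")
```

At 48kHz each line covers a little over 23Hz, which gives a sense of how finely the masking thresholds can be applied.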

Sample Rates

Additional sample rates from 8kHz up to 96kHz extend the 16kHz to 48kHz range previously supported by MP3. Very low sample rates reduce the amount of data to be coded and are suitable for speech and telephony applications.

Multi-Channel Support

Where MP3 supports only 6 channels of audio, AAC supports many more:

48 full-audio channels.
16 channels of low frequency effects below 120Hz.
16 dialogue channels.
16 data streams.

Base & Enhancement Layered Encoding

New concepts introduce the idea of base layer encoding with optional enhancement layers using other codecs. For example, CELP (speech coding) can be improved by adding an AAC enhancement layer.

Combining Different Encoders

The General Audio (GA) coding environment described in MPEG-4 Part 3 adds two new encoders which can be used interchangeably with AAC:

TwinVQ - Suitable for very low bitrates.
BSAC Encoder - A scalable bitrate encoder with an error resilient bitstream.

The coding algorithms are particularly well described in the ISO 14496 Part 3 standard. Refer to sub-part 4 (around page 487) for a breakdown and description of each tool. The block diagram showing how these tools work in the encoder is especially helpful.

Profile Support

AAC is a modular codec with a variety of tools that can be optionally switched on when needed. MPEG-2 part 7 defines several profiles which configure the AAC encoder:

Low-Complexity Audio.
Main Audio.
Scalable Sampling Rate.

MPEG-4 introduces more profiles to address a wider range of applications. The coding tools configured by each profile are now redesigned and described as Audio Objects. Some profiles combine other codecs with AAC using the layered support and a few do not use AAC at all:

Main (updated).
AAC profile.
Long Term Prediction.
Scalable Audio.
Speech Audio.
Synthetic audio.
High quality audio.
Low delay audio.
Low Delay AAC.
Low Delay AAC V2.
Mobile Audio Inter Networking.
Natural audio.
High Definition AAC.
ALS Simple.

The entire menagerie of 42 audio objects (tools) is described in sub-section 1.5.1.2 of the standard. Study these objects to better understand the profiles. Audio objects are mapped to the profiles in Table 1.3.

Storing AAC Content In Files

MPEG standards describe these alternatives for storing AAC coded audio content in file containers:

ADIF - The Audio Data Interchange Format stores AAC coded audio on its own in a .aac file. It is initially defined in ISO 13818 part 7 and also discussed in ISO 14496 part 3. This format places all the data that controls the decoder into a single header that precedes the content stream. This is optimal for file exchange since the decoder configuration is available as soon as the file is opened. Randomly seeking to different points in the stream is not supported during playback. The same standards define the Audio Data Transport Stream (ADTS), which repeats a short header in front of every frame and is the framing normally found in .aac files. In both cases the content is a raw encoded audio elementary stream, so metadata tagging is not supported.
MP4FF - The MPEG-4 File Format is described in ISO 14496 part 12. This format does support metadata tagging and is stored in .mp4 or .m4a files.
3GP - AAC audio can be carried in .3gp files since they are derived from the MPEG-4 Part-12 standard. Mobile applications require low bit rates and the AAC content should be coded accordingly.

Other containers such as QuickTime and Matroska can also be used.
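The ADTS framing carried in .aac files puts a short header in front of every frame, and a workflow tool can read it to find the sample rate, channel count and frame size without decoding any audio. The following is a minimal sketch, assuming the fixed 7-byte header layout (without the optional CRC) and the sampling frequency index table from the AAC standards. The class and function names are hypothetical, and the example bytes are hand-built for illustration rather than taken from a real file.

```python
from dataclasses import dataclass

# Sampling frequency index table used by the ADTS header.
SAMPLE_RATES = (96000, 88200, 64000, 48000, 44100, 32000,
                24000, 22050, 16000, 12000, 11025, 8000, 7350)


@dataclass
class AdtsHeader:
    audio_object_type: int      # 2 means AAC Low Complexity
    sample_rate_hz: int
    channel_configuration: int
    frame_length: int           # bytes, header included


def parse_adts_header(data: bytes) -> AdtsHeader:
    """Decode the first 7 bytes of an ADTS frame."""
    if len(data) < 7:
        raise ValueError("an ADTS header needs at least 7 bytes")
    # 12-bit syncword of all ones, and the 2-bit layer field must be zero.
    if data[0] != 0xFF or (data[1] & 0xF6) != 0xF0:
        raise ValueError("ADTS syncword not found")
    index = (data[2] >> 2) & 0x0F
    if index >= len(SAMPLE_RATES):
        raise ValueError("reserved sampling frequency index")
    return AdtsHeader(
        audio_object_type=((data[2] >> 6) & 0x03) + 1,
        sample_rate_hz=SAMPLE_RATES[index],
        channel_configuration=((data[2] & 0x01) << 2) | (data[3] >> 6),
        frame_length=((data[3] & 0x03) << 11) | (data[4] << 3) | (data[5] >> 5),
    )


# Hand-built header: AAC LC, 44.1kHz, stereo, 371-byte frame.
example = bytes([0xFF, 0xF1, 0x50, 0x80, 0x2E, 0x7F, 0xFC])
header = parse_adts_header(example)
print(header)
```

Because the frame length includes the header itself, adding it to the current offset gives the position of the next frame, which is how a tool can walk through a stream frame by frame.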

File Extensions

These file extensions are relevant when coding MPEG AAC Audio:

File type	Details
.aac	Contains an ADTS stream of raw AAC coded content.
.mp4	A general-purpose digital media container to carry videos, images, timed text and subtitles. Based on MPEG-4 part 12 and derived from the Apple QuickTime .mov file format.
.m4a	Describes an MPEG-4 audio-only file. Originally created by Apple for use with iTunes.
.m4b	Designed for use with Audio Book content.
.m4p	This is an .m4a AAC file that has been copy-protected with a proprietary Digital Rights Management (DRM) technology created by Apple for iTunes.
.m4r	An Apple iPhone ringtone container.
.m4v	An MPEG-4 video file which may also contain AAC audio.
.mpg	One of several file types used for MPEG-1 or MPEG-2 audio and video content. This describes an MPEG-1 or 2 program stream or an MPEG-2 transport stream. Audio coded with AAC can be stored in .mpg files but this is uncommon and not recommended.
.mov	QuickTime media platform container file. Typically contains a movie but could be an interactive multimedia presentation.
.3gp	Based on MPEG-4 Part 12. Originally designed for early mobile (feature) phones. This is the preferred file extension.
.3g2	A second-generation file format for low bitrate content.
.3ga	A variation of .3gp for audio only.
.3gpa	A variation of .3gp for audio only.
.3gpp	Mixed media format for mobile phone use.
.3gpp2	Mixed media format for mobile phone use.
.3gp2	Mixed media format for mobile phone use.

MIME Types

MIME types are registered for many different kinds of content. AAC coded audio should be delivered with the audio/aac MIME type so the receiving player can correctly determine the payload format.

MIME type	Status	Description
audio/aac	Preferred	The preferred default MIME type. Defined in ISO 13818-7 and ISO 14496-3.
audio/aacp	Next generation	Describes AAC Plus (HE-AAC).
audio/3gpp	Legacy	Used with feature phones and defined in RFC 3839.
audio/3gpp2	Legacy	Used with feature phones and defined in RFC 4393.
audio/mp4	Current	Described in RFC 4337 and updated in RFC 6381 to add ISO file containers.
audio/mp4a-latm	Current	RTP payload format suitable for teleconferencing. Described in RFC 3016 and updated in RFC 6416.
audio/mpeg4-generic	Current	RFC 3640 describes the RTP Payload Format for Transport of MPEG-4 Elementary Streams. Updated by RFC 6295.
audio/x-aac	Proprietary	Deprecated for use in new projects. Use audio/aac instead. Not registered with IANA.

The 'X-' MIME types are sometimes used to introduce new features into web browsers and other software. The prefix marks them as experimental, keeps them out of general use and allows experimentation with the features until they are confirmed to work. At that point the prefix is removed. They are not canonical and are never registered with the IANA.
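A delivery script usually has to choose the MIME type from the file it is about to serve. The sketch below is one way to do that with a lookup table. The pairing of extensions to MIME types is an assumption based on the two tables above, not a mapping given by the standards, and the names used are hypothetical.

```python
from pathlib import Path

# Assumed pairing of file extensions with the registered MIME types.
AAC_MIME_BY_EXTENSION = {
    ".aac": "audio/aac",
    ".m4a": "audio/mp4",
    ".mp4": "audio/mp4",
    ".3gp": "audio/3gpp",
    ".3g2": "audio/3gpp2",
}


def mime_type_for(filename: str, default: str = "application/octet-stream") -> str:
    """Look up the MIME type from the file extension, ignoring case."""
    return AAC_MIME_BY_EXTENSION.get(Path(filename).suffix.lower(), default)


for name in ("jingle.aac", "Podcast.M4A", "clip.3gp", "notes.txt"):
    print(name, "->", mime_type_for(name))
```

Unknown extensions fall back to a generic binary type rather than guessing, so the receiving player is never told that a payload is AAC when it might not be.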

Tagging AAC Files

The .aac files are simple binary containers carrying raw encoded audio in an ADTS elementary stream. These cannot be tagged with metadata without breaking the bitstream syntax. To add metadata, encapsulate the AAC elementary streams inside an .mp4 or .m4a file and then add the tags. The conversion to MPEG-4 files breaks the ADTS stream into segments, which allows the essence and metadata packets to be interleaved.

Do not perform this conversion simply by renaming the file to change the file extension! The internal content will not be changed and the file now has an incorrect extension to describe the content.
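Since a file extension can be wrong, a workflow can inspect the first few bytes of a file instead of trusting its name. This is a minimal sketch that assumes three signatures: the ASCII marker "ADIF" at the start of an ADIF file, the 12-bit ADTS syncword, and the "ftyp" box type at byte offset 4 that MPEG-4 files normally begin with. The function name is hypothetical and the check ignores any tag data that some tools prepend to a file.

```python
def sniff_aac_framing(first_bytes: bytes) -> str:
    """Guess the packaging of an audio file from its leading bytes."""
    if first_bytes[:4] == b"ADIF":
        return "ADIF"
    # ADTS: 12 sync bits set to one, followed by a layer field of zero.
    if (len(first_bytes) >= 2 and first_bytes[0] == 0xFF
            and (first_bytes[1] & 0xF6) == 0xF0):
        return "ADTS"
    # MPEG-4 files normally open with a box whose type is "ftyp".
    if first_bytes[4:8] == b"ftyp":
        return "MP4"
    return "unknown"


print(sniff_aac_framing(bytes([0xFF, 0xF1, 0x50, 0x80])))
print(sniff_aac_framing(b"\x00\x00\x00\x20ftypM4A "))
```

In practice the bytes would come from reading the first 8 or so bytes of the file opened in binary mode.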

Convert an .aac file to an MPEG container with the ffmpeg command-line tool like this:

ffmpeg -i input.aac -c:a copy output.m4a

The audio is rewrapped (remuxed) into the new container without being decoded first. This avoids introducing additional lossy artefacts from a recompression.

Then add the metadata tagging to the MPEG container file with ffmpeg or ExifTool.
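For automation, the rewrapping and tagging can be driven from a script. The sketch below only builds the ffmpeg argument list. It reuses the stream copy option shown earlier and adds ffmpeg's -metadata key=value option for each tag. The function name, tag names and tag values are hypothetical, and how the tags are finally stored depends on the ffmpeg build in use.

```python
import shlex


def build_remux_command(src, dst, tags=None):
    """Build an ffmpeg command that copies the audio and sets tags."""
    cmd = ["ffmpeg", "-i", src, "-c:a", "copy"]   # copy, do not re-encode
    for key, value in (tags or {}).items():
        cmd += ["-metadata", f"{key}={value}"]
    cmd.append(dst)
    return cmd


command = build_remux_command(
    "input.aac", "output.m4a",
    {"title": "Evening News", "artist": "Newsroom"},
)
print(shlex.join(command))
```

Passing the list to subprocess.run(command, check=True) would execute it. Keeping the arguments as a list avoids shell quoting problems when tag values contain spaces.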

Two alternative ID3 metadata tagging dictionaries are in popular use:

Generic (vanilla) ID3 metadata tags described by the de-facto specification.
Proprietary Apple iTunes extended ID3 metadata tags that enhance the generic specification.

The ISO 15938 (MPEG-7) standard describes an alternative tagging scheme for use with MPEG content. Where ID3 tags are simple name-value pairs, MPEG-7 is a bulkier XML structured format. ID3 is more popular.

AAC Implementations

Coding tools are often presented via a Graphical User Interface (GUI) wrapper for easier access. It may not be obvious at first, but the encoders are also accessible from the shell scripting environment. Command-line tools integrate more easily with workflow automation than graphical user interface applications.

Encoders supporting 8 channels are necessary for 7.1 surround-sound content. The 5.1 format can be encoded with 6 channels. This table is arranged in descending order of performance and functionality.

Project	Chan	Description
Apple AAC	8	Part of QuickTime and iTunes but can be called to action with the afconvert command on a macOS system. It also integrates with the ffmpeg tool. This is thought to be the best performing encoder for general use.
Fraunhofer FDK AAC	8	Released as part of the Android project. It is an open-source library but may require license fees. This is a low latency version of the encoder. FDK can be integrated with the ffmpeg tool. It is recommended as a good quality encoder and is widely supported on different OS platforms.
fdkaac	8	A command-line tool built on top of the Fraunhofer FDK AAC software library.
ffmpeg/Libav fork	8	The ffmpeg project has incorporated improvements to make this a more stable coder. The VBR support is reckoned to be poor and some of the more sophisticated audio object types are unsupported.
Fraunhofer FhG AAC	6	Embedded inside Winamp on Windows but can be called to action from the command-line with the fhgaacenc command. This is developed by an entirely different team and uses a different mathematical technique compared to the FDK encoder.
Nero AAC	6	Free for non-commercial use. Only available on Windows and Linux. Unsupported since 2010. The neroAacEnc command-line tool converts .wav files into .mp4 files containing AAC audio.
FAAC	6	Partly open-sourced with proprietary components. The CBR support is reckoned to be inadequate. Based on the MPEG reference code published as part of the ISO standard.
Microsoft MFT AAC	6	The supported channel count varies depending on the Windows OS this is hosted on.
Libav	2	Stereo only. Not as up to date as the ffmpeg fork of this project. Can be used as a foundation to build command-line tools.
VisualOn AAC	2	Poor CBR performance and no support for VBR. This project is declared to be open-source but that is unconfirmed from a patents and legal perspective.

Note that ffmpeg is both a command-line tool (ffmpeg) and an open-source project (FFmpeg).
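When a workflow has to choose an encoder automatically, the channel counts in the table can be turned into a simple filter. This sketch copies the names and channel limits from the table in the same order, so the first match is also the highest ranked. The function name is hypothetical.

```python
# (project, maximum channels) in the same order as the table.
ENCODERS = [
    ("Apple AAC", 8),
    ("Fraunhofer FDK AAC", 8),
    ("fdkaac", 8),
    ("ffmpeg/Libav fork", 8),
    ("Fraunhofer FhG AAC", 6),
    ("Nero AAC", 6),
    ("FAAC", 6),
    ("Microsoft MFT AAC", 6),
    ("Libav", 2),
    ("VisualOn AAC", 2),
]


def encoders_for(channels_needed: int) -> list:
    """Return the encoders that can carry the requested channel count."""
    return [name for name, limit in ENCODERS if limit >= channels_needed]


print("7.1 surround:", encoders_for(8))
print("5.1 surround:", encoders_for(6))
```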

The Apple AAC encoder is considered to be the best implementation for medium bit rate scenarios. The CBR and VBR support is exceedingly good. It was originally part of the QuickTime media framework but has now been moved into the AV Foundation and AudioToolbox. This is all part of what Apple calls CoreAudio. For many workflow automation situations, a macOS based processing node running this encoder will be an optimal solution.

Fraunhofer codecs are very versatile and technically the best. This is because Fraunhofer was one of the key developers of the psychoacoustic approach to audio coding. Deploying the Fraunhofer FDK AAC encoder on a Linux platform would be a very good solution but be careful to investigate and pay the licensing fees if necessary.

The Nero encoder is also highly recommended but is not being actively developed any further. It has not been revised since 2010. Whilst it is good for niche situations it is not recommended for new project deployments.

The rest of the AAC implementations are somewhat lacking in their support for all the different modes of operation. There are a few implementations based on the libraries developed by Coding Technologies. They collaborated with Fraunhofer on the AAC research. These may incur license fees when deployed.

Deploying The Apple AAC Encoder In A Workflow

Deploy a processing node in your workflow based on a single Macintosh computer to call the Apple AAC encoder to action from the command-line. Running a scheduled task to check a watch folder for input files is easy to implement. The macOS environment supports all the tools you need to pass the output to the next stage of the workflow.

In more recent versions of macOS the afconvert command has a very simple syntax:

afconvert {options} {input file} {output file}

Open the Terminal app and type this to list the file and data formats supported on your macOS platform:

afconvert -hf

This will provide some additional explanations about the options:

afconvert -h
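The watch folder idea can be sketched as a script that a scheduled task runs at intervals. It lists the .wav files that do not yet have a matching output file and builds one afconvert command for each. The folder names, the function name and the 128kbps bitrate are hypothetical, and the -f m4af, -d aac and -b options are assumptions that should be checked against the afconvert help output on your system. The demonstration runs against a temporary folder and only prints the commands.

```python
import tempfile
from pathlib import Path


def pending_encodes(watch_dir, out_dir, bitrate_bps=128_000):
    """Build an afconvert command for every input not yet encoded."""
    jobs = []
    for src in sorted(Path(watch_dir).glob("*.wav")):
        dst = Path(out_dir) / f"{src.stem}.m4a"
        if dst.exists():
            continue  # already encoded on an earlier pass
        jobs.append(["afconvert", "-f", "m4af", "-d", "aac",
                     "-b", str(bitrate_bps), str(src), str(dst)])
    return jobs


# Dry run against a throwaway folder so nothing is actually encoded.
with tempfile.TemporaryDirectory() as tmp:
    inbox, outbox = Path(tmp, "in"), Path(tmp, "out")
    inbox.mkdir()
    outbox.mkdir()
    (inbox / "news.wav").touch()
    (inbox / "promo.wav").touch()
    (outbox / "promo.m4a").touch()   # pretend this one is already done
    demo_jobs = pending_encodes(inbox, outbox)
    for job in demo_jobs:
        print(" ".join(job))
```

On a real processing node each job would be handed to subprocess.run(job, check=True), and the script would be scheduled with cron or launchd.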

The ffmpeg tool can invoke the Apple AAC encoder when it is built with AudioToolbox support and run on a macOS platform, where it is selected as the aac_at encoder. This may be useful for adding ID3 metadata tags after encoding.

The Apple encoder library is present on a Windows platform if it has been installed as part of a QuickTime or iTunes installation. The QAAC open-source command-line tool calls it to action. Use this instead of afconvert which is not supported on Windows.

Alternatively, deploy the Apple Compressor application on a macOS system configured as a server node in your workflow infrastructure.

Related Standards

Consult these standards documents for background information:

Standard	Version	Description
ISO 11172-3	1996	MPEG-1 Part 3 - Audio is the foundation on which the earliest MPEG audio coding is built. The latest version is dated 1993 with a corrigendum published in 1996.
ISO 13818-1	2023	MPEG-2 Systems. Describes packaging and stream structures. An amendment to the codec parameters is in progress.
ISO 13818-3	1998	MPEG-2 Audio. This is definitive for Layers I, II and III (MP1, MP2 and MP3).
ISO 13818-7	2010	Describes MPEG-2 Advanced Audio Coding (AAC). Published in 2006 with revisions added in 2010.
ISO 14496-1	2014	MPEG-4 Systems and original container format. Published in 2010 with corrections added in 2014. A new version is under development.
ISO 14496-3	2019	MPEG-4 Audio. Describes how to combine AAC with other codecs.
ISO 14496-6	2000	MPEG-4 Delivery Multimedia Interface Format (DMIF).
ISO 14496-12	2022	MPEG-4 file format. A new version is under development.
ISO 14496-14	2020	MPEG-4 version 2 file format.
ISO 15938-1	2006	MPEG-7 Systems. Originally published in 2002 and updated in 2006.
ISO 15938-2	2002	MPEG-7 Descriptions Definition Language (DDL).
ISO 15938-4	2006	MPEG-7 Metadata for audio. Originally published in 2002 and updated in 2006.
ISO 15938-8	2011	Extraction and use of MPEG-7 metadata descriptions. Originally published in 2002 and updated in 2011.
ISO 15938-9	2012	MPEG-7 Profiles & Levels. Originally published in 2005 and updated in 2012.
ISO 15938-10	2007	MPEG-7 Schema definition. Originally published in 2005 and updated in 2007.
ISO 15938-11	2012	MPEG-7 Profile schemas. Originally published in 2005 and updated in 2012.
ISO 15938-12	2012	MPEG-7 Query format.
ISO 21000	Various	MPEG-21 describes mechanisms for access control for multimedia content. This set of standards is under review and a new version is expected to be published.
ISO 23001-8	Withdrawn	Coding-independent code points. This is superseded by ISO 23091.
ISO 23003-1	2017	MPEG Surround for multi-channel audio. Originally published in 2007 and updated in 2017.
ISO 23003-2	2008	Spatial Audio Object Coding (SAOC).
ISO 23003-4	2020	Dynamic Range Control. A new version is being prepared.
ISO 23091-3	2022	Coding Independent Code Points for Audio. Playback controlling metadata. Originally published in 2018 and updated in 2022.
ITU-T Rec. H.222.0	2022	See ISO 13818-1.
ETSI TS 126 244	2008	Defines the .3gp container file format. Freely available to download from the ETSI.org web site.

Conclusion

AAC coding will be subject to patent licensing fees until 2030. For the time being you will need to contact a patent pool for a license if you build and distribute an encoder or player implemented in hardware or software. The coded bitstreams transmitted to end-users are free of any licensing obligations.

This is not the whole AAC story. High Efficiency AAC was developed to improve the performance still further. That is sometimes called AAC+.

There is also more to study and understand in the MPEG-4 (ISO 14496) standard. MPEG is also reorganizing the collection of standards. MPEG-D (ISO 23003) has some relevant material and ISO 23091-3 is helpful for player design.
