Standards: Part 17 - About AAC Audio Coding
Advanced Audio Coding improves on the MP3 Perceptual Coding solution to achieve higher compression ratios and better playback quality.
This article is part of our growing series on Standards.
There is an overview of all 26 articles in Part 1 - An Introduction To Standards.
The MP3 audio codec published in the MPEG-1 and MPEG-2 standards has been very successful. Subsequent research explored how to reduce the bitrate and deliver better quality. Fraunhofer IIS were deeply involved again, this time with a larger cohort of collaborators.
MPEG-2 part 7 introduces the Advanced Audio Coding (AAC) standard, which supersedes MP3. Coding algorithms have been improved and new tools have been introduced to achieve a better compression ratio. It can be used at a higher bit-rate for better quality but is still a lossy codec.
About The ISO Standards
MPEG audio compression is described by a series of standards published and updated over several decades:
- ISO 11172-3 - The origins of MPEG-1 Audio.
- ISO 13818-3 - Enhancements with MPEG-2. MP1, MP2 & MP3 are backwards compatible.
- ISO 13818-7 - Advanced Audio Coding (AAC) introduced as a non-backwards compatible codec.
- ISO 14496-3 - Sub-part 4 introduces Object based General Audio Coding (GA) including AAC, TwinVQ and BSAC. TwinVQ is an alternative quantization tool. Bitrate Scalable Audio Coding (BSAC) is optional for reducing the size of the output bitstream.
- ISO 15938 - Describes MPEG-7 metadata for tagging.
- ISO 23091-3 - Coding Independent Code Points for Audio. Describes metadata relating to loudspeaker performance and playback characteristics.
- ISO 23003 - MPEG-D describes how to apply the MPEG-Audio described in earlier standards. Consult this standard for additional background information.
The bitstream syntax must be reconciled across all of these standards to remove any ambiguity.
Improvements Over MP3
The AAC codec improves on MP3 in several important areas:
- Compression algorithms.
- Sample rates.
- Multi-channel support.
- Base and enhancement layer encoding.
- Combining different encoders in MPEG-4.
- Additional profiles.
Compression Algorithms
MP3 provided two different but related kinds of compression:
- DCT - The Discrete Cosine Transform applied to a single frequency band.
- MDCT - The Modified Discrete Cosine Transform applied to overlapping groups of frequency bands.
AAC is implemented purely with MDCT which significantly improves compression efficiency but is not backwards compatible with MP3. The number of frequency sub-bands is increased to 1024. Calculating the energy level for masking threshold control is now much more accurate and fine-grained. This algorithm is explored in the previous article about MP3.
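The overlapping-block behaviour of the MDCT can be illustrated with a small sketch. This is a direct, unoptimised form of the textbook transform with a sine window, not the filter bank from any particular encoder. The function names and the block size of 8 are chosen purely for illustration; the 1024 figure quoted above would correspond to `n = 1024` in this notation.

```python
import math


def sine_window(length):
    """Sine window; satisfies w[t]^2 + w[t + length/2]^2 = 1."""
    return [math.sin(math.pi * (t + 0.5) / length) for t in range(length)]


def mdct(block):
    """Transform 2n time samples into n frequency coefficients."""
    n = len(block) // 2
    n0 = 0.5 + n / 2
    return [
        sum(block[t] * math.cos(math.pi / n * (t + n0) * (k + 0.5))
            for t in range(2 * n))
        for k in range(n)
    ]


def imdct(coeffs):
    """Transform n coefficients back into 2n time samples (with aliasing)."""
    n = len(coeffs)
    n0 = 0.5 + n / 2
    return [
        2.0 / n * sum(coeffs[k] * math.cos(math.pi / n * (t + n0) * (k + 0.5))
                      for k in range(n))
        for t in range(2 * n)
    ]


def roundtrip(signal, n):
    """Window, transform, invert and overlap-add blocks that overlap by 50%."""
    window = sine_window(2 * n)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n + 1, n):
        block = [signal[start + t] * window[t] for t in range(2 * n)]
        rebuilt = imdct(mdct(block))
        for t in range(2 * n):
            out[start + t] += rebuilt[t] * window[t]
    return out
```

Each block of 2n samples yields only n coefficients, yet the aliasing in one block is cancelled by its neighbours when they are overlap-added. Every sample covered by two blocks is therefore recovered, while the first and last n samples of the signal are covered by only one block and are not. A real encoder quantizes the coefficients between the two transforms, which is where the loss occurs.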
Sample Rates
Additional sample rates from 8kHz up to 96kHz extend the 16kHz to 48kHz range previously supported by MP3. Very low sample rates improve coding latency and are suitable for speech and telephony applications.
Multi-Channel Support
Where MP3 supports only 6 channels of audio, AAC supports many more:
- 48 full-audio channels.
- 16 channels of low frequency effects below 120Hz.
- 16 dialogue channels.
- 16 data streams.
Base & Enhancement Layered Encoding
New concepts introduce the idea of base layer encoding with optional enhancement layers using other codecs. For example, CELP (speech coding) can be improved by adding an AAC enhancement layer.
Combining Different Encoders
The General Audio (GA) coding environment described in MPEG-4 Part 3 adds two new encoders which can be used interchangeably with AAC:
- TwinVQ - Suitable for very low bitrates.
- BSAC Encoder - A scalable bitrate encoder with an error resilient bitstream.
The coding algorithms are particularly well described in the ISO 14496 Part 3 standard. Refer to sub-part 4 (around page 487) for a breakdown and description of each tool. The block diagram showing how these tools work in the encoder is especially helpful.
Profile Support
AAC is a modular codec with a variety of tools that can be optionally switched on when needed. MPEG-2 part 7 defines several profiles which configure the AAC encoder:
- Low-Complexity Audio.
- Main Audio.
- Scalable Sampling Rate.
MPEG-4 introduces more profiles to address a wider range of applications. The coding tools configured by each profile are now redesigned and described as Audio Objects. Some profiles combine other codecs with AAC using the layered support and a few do not use AAC at all:
- Main (updated).
- AAC profile.
- Long Term Prediction.
- Scalable Audio.
- Speech Audio.
- Synthetic audio.
- High quality audio.
- Low delay audio.
- Low Delay AAC.
- Low Delay AAC V2.
- Mobile Audio Inter Networking.
- Natural audio.
- High Definition AAC.
- ALS Simple.
The entire menagerie of 42 audio objects (tools) is described in sub-section 1.5.1.2 of the standard. Study these objects to better understand the profiles. Audio objects are mapped to the profiles in Table 1.3.
Storing AAC Content In Files
MPEG standards describe these alternatives for storing AAC coded audio content in file containers:
- ADIF - The Audio Data Interchange Format stores AAC coded audio on its own in a .aac file. It was first defined in ISO 13818 part 7 and is also discussed in ISO 14496 part 3. ADIF places all the data that controls the decoder into a single header that precedes the content stream. This suits file exchange because the decoder configuration is available right away, but random seeking to different points in the stream is not supported during playback. The related Audio Data Transport Stream (ADTS) format, defined in the same standards, instead repeats a short header in front of every frame, and this is the layout usually found in .aac files. In either case the content is a raw encoded audio elementary stream, so metadata tagging is not supported.
- MP4FF - The MPEG-4 File Format is described in ISO 14496 part 12. This format does support metadata tagging and is stored in .mp4 or .m4a files.
- 3GP - AAC audio can be carried in .3gp files since they are derived from the MPEG-4 Part-12 standard. Mobile applications require low bit rates and the AAC content should be coded accordingly.
Other containers such as QuickTime and Matroska can also be used.
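To make the raw .aac case more concrete, here is a minimal sketch that decodes the fixed fields at the start of one ADTS frame. The bit layout follows the ADTS header as commonly documented from ISO 13818-7 and ISO 14496-3. The function name and the dictionary keys are invented for this illustration, and the sketch ignores the optional CRC and the remaining header fields.

```python
# Sampling frequency index table used by ADTS headers (indices 0 to 12).
ADTS_SAMPLE_RATES = (96000, 88200, 64000, 48000, 44100, 32000,
                     24000, 22050, 16000, 12000, 11025, 8000, 7350)


def parse_adts_header(data):
    """Decode the main fields from the first 7 bytes of an ADTS frame."""
    if len(data) < 7:
        raise ValueError("an ADTS header needs at least 7 bytes")
    # 12-bit syncword (all ones) followed by a 2-bit layer field that is 0.
    if data[0] != 0xFF or (data[1] & 0xF6) != 0xF0:
        raise ValueError("ADTS syncword not found")
    sf_index = (data[2] >> 2) & 0x0F
    return {
        "mpeg_version": 2 if data[1] & 0x08 else 4,
        "has_crc": not (data[1] & 0x01),
        # The 2-bit profile field holds the audio object type minus one.
        "audio_object_type": (data[2] >> 6) + 1,
        "sample_rate": (ADTS_SAMPLE_RATES[sf_index]
                        if sf_index < len(ADTS_SAMPLE_RATES) else None),
        "channel_configuration": ((data[2] & 0x01) << 2) | (data[3] >> 6),
        # 13-bit length of the whole frame, header included, in bytes.
        "frame_length": ((data[3] & 0x03) << 11) | (data[4] << 3) | (data[5] >> 5),
    }
```

For example, the bytes `FF F1 50 80 2E 7F FC` describe an AAC Low Complexity frame (audio object type 2) at 44.1kHz with two channels and a frame length of 371 bytes. A player can step through the file by adding each frame length to the current offset.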
File Extensions
These file extensions are relevant when coding MPEG AAC Audio:
File type | Details |
---|---|
.aac | Contains an ADTS stream of raw AAC coded content. |
.mp4 | A general-purpose digital media container to carry videos, images, timed text and subtitles. Based on MPEG-4 part 12 and derived from the Apple QuickTime .mov file format. |
.m4a | Describes an MPEG4 Audio only file. Originally created by Apple for use with iTunes. |
.m4b | Designed for use with Audio Book content. |
.m4p | This is an .m4a AAC file that has been copy-protected with a proprietary Digital Rights Management (DRM) technology created by Apple for iTunes. |
.m4r | An Apple iPhone ringtone container. |
.m4v | An MPEG-4 video file which may also contain AAC audio. |
.mpg | One of several file types used for MPEG-1 or MPEG-2 audio and video content. This describes an MPEG-1 or 2 program stream or an MPEG-2 transport stream. Audio coded with AAC can be stored in .mpg files but this is uncommon and not recommended. |
.mov | QuickTime media platform container file. Typically contains a movie but could be an interactive multimedia presentation. |
.3gp | Based on MPEG-4 Part 12. Originally designed for early mobile (feature) phones. This is the preferred file extension. |
.3g2 | A second-generation file format for low bitrate content. |
.3ga | A variation of .3gp for audio only. |
.3gpa | A variation of .3gp for audio only. |
.3gpp | Mixed media format for mobile phone use. |
.3gpp2 | Mixed media format for mobile phone use. |
.3gp2 | Mixed media format for mobile phone use. |
MIME Types
MIME types are registered for many different kinds of content. AAC coded audio should be delivered with the audio/aac MIME type so the receiving player can correctly determine the payload format.
MIME type | Status | Description |
---|---|---|
audio/aac | Preferred | The preferred default MIME type. Defined in ISO 13818-7 and ISO 14496-3. |
audio/aacp | Next generation | Describes AAC Plus (HE-AAC). |
audio/3gpp | Legacy | Used with feature phones and defined in RFC 3839. |
audio/3gpp2 | Legacy | Used with feature phones and defined in RFC 4393. |
audio/mp4 | Current | Described in RFC 4337 and updated in RFC 6381 to add ISO file containers. |
audio/mp4a-latm | Current | RTP payload format suitable for teleconferencing. Described in RFC 3016 and updated in RFC 6416. |
audio/mpeg4-generic | Current | RFC 3640 describes the RTP Payload Format for Transport of MPEG-4 Elementary Streams. Updated by RFC 6295. |
audio/x-aac | Proprietary | Deprecated for use in new projects. Use audio/aac instead. Not registered with IANA. |
The 'x-' MIME types are sometimes used to introduce new features into web browsers and other software. The prefix keeps them out of general use and allows experimentation with the features until they are confirmed to work. At that point the prefix is removed. They are not canonical and are never registered with the IANA.
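A delivery script might choose the MIME type with a simple lookup like the sketch below. The pairing of extensions to types is an assumption made for audio-only content based on the two tables above, not a mapping defined by any standard, and the function names are hypothetical.

```python
import os

# Assumed extension-to-type pairing for audio-only files (illustrative).
AAC_MIME_TYPES = {
    ".aac": "audio/aac",
    ".m4a": "audio/mp4",
    ".mp4": "audio/mp4",
    ".3gp": "audio/3gpp",
    ".3g2": "audio/3gpp2",
}

# Unregistered alias mapped to its preferred replacement.
DEPRECATED_MIME_TYPES = {"audio/x-aac": "audio/aac"}


def mime_type_for(filename, default="application/octet-stream"):
    """Pick a MIME type from the file extension, ignoring letter case."""
    extension = os.path.splitext(filename)[1].lower()
    return AAC_MIME_TYPES.get(extension, default)


def normalise_mime_type(mime_type):
    """Replace a deprecated 'x-' type with the preferred registered one."""
    cleaned = mime_type.strip().lower()
    return DEPRECATED_MIME_TYPES.get(cleaned, cleaned)
```

Falling back to a generic binary type for unknown extensions is a design choice for this sketch. A stricter service might reject the file instead.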
Tagging AAC Files
The .aac files are simple binary containers carrying raw encoded audio in an ADTS elementary stream. These cannot be tagged with metadata without breaking the bitstream syntax. To add metadata, encapsulate the AAC elementary streams inside an .mp4 or .m4a file and then add the tags. The conversion to MPEG-4 files breaks the ADTS stream into segments, which allows the essence and metadata packets to be interleaved.
Do not perform this conversion simply by renaming the file to change the file extension! The internal content will not be changed and the file now has an incorrect extension to describe the content.
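Because the extension cannot be trusted, a workflow can inspect the first few bytes of a file instead. The sketch below is a heuristic under stated assumptions: MPEG-4 family files normally start with an `ftyp` box whose type code sits at byte offset 4, ADIF files start with the characters `ADIF`, and ADTS streams start with a 12-bit syncword. The function name and return values are invented for this example, and it does not handle files that carry other data ahead of the audio.

```python
def sniff_audio_container(head):
    """Guess the packaging of an AAC file from its leading bytes."""
    # MPEG-4 family (.mp4, .m4a, .3gp): 4-byte box size, then the box type.
    if len(head) >= 8 and head[4:8] == b"ftyp":
        return "mp4"
    # ADIF: a single header that begins with a four-character identifier.
    if head[:4] == b"ADIF":
        return "adif"
    # ADTS: twelve set bits followed by a layer field of zero.
    if len(head) >= 2 and head[0] == 0xFF and (head[1] & 0xF6) == 0xF0:
        return "adts"
    return "unknown"
```

A renamed .aac file still reports `adts` here, which is a quick way to catch files whose extension does not match their content.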
Convert an .aac file to an MPEG container with the ffmpeg command-line tool like this:
ffmpeg -i input.aac -c:a copy output.m4a
The audio is repackaged into segments without being decoded first. The -c:a copy option copies the coded audio unchanged rather than transcoding it, which avoids introducing additional lossy artefacts from a recompression.
Then add the metadata tagging to the MPEG container file with ffmpeg or ExifTool.
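In an automated workflow the two steps can be assembled as argument lists and handed to the tool. The sketch below assumes ffmpeg is on the search path. The helper names are hypothetical, and the `title` and `artist` keys are only examples of tags passed with ffmpeg's `-metadata` option.

```python
import shutil
import subprocess


def remux_command(src, dst):
    """Repackage an .aac stream into an MPEG-4 container without re-encoding."""
    return ["ffmpeg", "-i", src, "-c:a", "copy", dst]


def tag_command(src, dst, tags):
    """Copy all streams unchanged and attach key=value metadata tags."""
    cmd = ["ffmpeg", "-i", src, "-c", "copy"]
    for key, value in tags.items():
        cmd += ["-metadata", f"{key}={value}"]
    return cmd + [dst]


def run(cmd):
    """Run a prepared command, failing clearly if the tool is not installed."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} was not found on the search path")
    subprocess.run(cmd, check=True)
```

A typical sequence would be `run(remux_command("input.aac", "output.m4a"))` followed by `run(tag_command("output.m4a", "tagged.m4a", {"title": "Example"}))`. ffmpeg writes a new file rather than editing in place, so the tagging step needs a different output name.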
Two alternative ID3 metadata tagging dictionaries are in popular use:
- Generic (vanilla) ID3 metadata tags described by the de-facto specification.
- Proprietary Apple iTunes extended ID3 metadata tags that enhance the generic specification.
The ISO 15938 (MPEG-7) standard describes an alternative tagging scheme for use with MPEG content. Where ID3 tags are simple name-value pairs, MPEG-7 is a bulkier XML structured format. ID3 is more popular.
AAC Implementations
Coding tools are often presented via a Graphical User Interface (GUI) wrapper for easier access. It may not be obvious at first, but the encoders are also accessible from the shell scripting environment. Command-line tools integrate more easily with workflow automation than graphical user interface applications.
Encoders supporting 8 channels are necessary for 7.1 surround-sound content. The 5.1 format can be encoded with 6 channels. This table is arranged in performance and functionality order.
Project | Channels | Description |
---|---|---|
Apple AAC | 8 | Part of QuickTime and iTunes but can be called to action with the afconvert command on a macOS system. It also integrates with the ffmpeg tool. This is thought to be the best performing encoder for general use. |
Fraunhofer FDK AAC | 8 | Released as part of the Android project. It is an open-source library but may require license fees. This is a low latency version of the encoder. FDK can be integrated with the ffmpeg tool. It is recommended as a good quality encoder and is widely supported on different OS platforms. |
fdkaac | 8 | A command-line tool built on top of the Fraunhofer FDK AAC software library. |
ffmpeg/Libav fork | 8 | The ffmpeg project has incorporated improvements to make this a more stable coder. The VBR support is reckoned to be poor and some of the more sophisticated audio object types are unsupported. |
Fraunhofer FhG AAC | 6 | Embedded inside Winamp on Windows but can be called to action from the command-line with the fhgaacenc command. This is developed by an entirely different team and uses a different mathematical technique compared to the FDK encoder. |
Nero AAC | 6 | Free for non-commercial use. Only available on Windows and Linux. Unsupported since 2010. The neroAacEnc command-line tool converts .wav files into .mp4 files containing AAC audio. |
FAAC | 6 | Partly open-sourced with proprietary components. The CBR support is reckoned to be inadequate. Based on the MPEG reference code published as part of the ISO standard. |
Microsoft MFT AAC | 6 | The supported channel count varies depending on the Windows OS this is hosted on. |
Libav | 2 | Stereo only. Not as up to date as the ffmpeg fork of this project. Can be used as a foundation to build command-line tools. |
VisualOn AAC | 2 | Poor CBR performance and no support for VBR. This project is declared to be open-source but that is unconfirmed from a patents and legal perspective. |
Note that ffmpeg is the name of both a command-line tool (ffmpeg) and the open-source project (FFmpeg) that maintains it.
The Apple AAC encoder is considered to be the best implementation for medium bit rate scenarios. The CBR and VBR support is exceedingly good. It was originally part of the QuickTime media framework but has now been moved into the AV Foundation and AudioToolbox. This is all part of what Apple calls CoreAudio. For many workflow automation situations, a macOS based processing node running this encoder will be an optimal solution.
Fraunhofer codecs are very versatile and technically the best. This is because Fraunhofer was one of the key developers of the psychoacoustic approach to audio coding. Deploying the Fraunhofer FDK AAC encoder on a Linux platform would be a very good solution but be careful to investigate and pay the licensing fees if necessary.
The Nero encoder is also highly recommended but is not being actively developed any further. It has not been revised since 2010. Whilst it is good for niche situations it is not recommended for new project deployments.
The rest of the AAC implementations are somewhat lacking in their support for all the different modes of operation. There are a few implementations based on the libraries developed by Coding Technologies. They collaborated with Fraunhofer on the AAC research. These may incur license fees when deployed.
Deploying The Apple AAC Encoder In A Workflow
Deploy a processing node in your workflow based on a single Macintosh computer to call the Apple AAC encoder to action from the command-line. Running a scheduled task to check a watch folder for input files is easy to implement. The macOS environment supports all the tools you need to pass the output to the next stage of the workflow.
In more recent versions of macOS the afconvert command has a very simple syntax:
afconvert {options} {input file} {output file}
Open the Terminal app and type this to list the audio file and data formats supported on your macOS platform:
afconvert -hf
This prints the general usage summary with a short explanation of each option:
afconvert -h
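The watch folder idea can be sketched as follows. This is an outline under assumptions rather than a finished service: the folder layout, the helper names and the 128000 bits per second default are invented for the example. The `-f m4af -d aac -b` options reflect common afconvert usage and should be checked against the help output on your own system.

```python
import os


def afconvert_command(src, dst, bitrate=128000):
    """Build an afconvert call: -f file format, -d data format, -b bit rate."""
    return ["afconvert", "-f", "m4af", "-d", "aac", "-b", str(bitrate), src, dst]


def pending_jobs(watch_dir, out_dir, extensions=(".wav", ".aif", ".aiff")):
    """List (source, destination) pairs that have not been encoded yet."""
    jobs = []
    for name in sorted(os.listdir(watch_dir)):
        stem, ext = os.path.splitext(name)
        if ext.lower() not in extensions:
            continue  # ignore anything that is not an expected input type
        dst = os.path.join(out_dir, stem + ".m4a")
        if not os.path.exists(dst):
            jobs.append((os.path.join(watch_dir, name), dst))
    return jobs
```

A scheduled task could call `pending_jobs()` every minute or so and pass each resulting command to `subprocess.run()`. A production version would also need to confirm that an input file has finished copying before encoding it.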
The ffmpeg tool will invoke the Apple AAC encoder when it is installed and run on a macOS platform. This may be useful for adding ID3 metadata tags after encoding.
The Apple encoder library is present on a Windows platform if it has been installed as part of a QuickTime or iTunes installation. The QAAC open-source command-line tool calls it to action. Use this instead of afconvert which is not supported on Windows.
Alternatively, deploy the Apple Compressor application on a macOS system configured as a server node in your workflow infrastructure.
Related Standards
Consult these standards documents for background information:
Standard | Version | Description |
---|---|---|
ISO 11172-3 | 1996 | MPEG-1 Part 3 - Audio is the foundation on which the earliest MPEG audio coding is built. The latest version is dated 1993 with a corrigendum published in 1996. |
ISO 13818-1 | 2023 | MPEG-2 Systems. Describes packaging and stream structures. An amendment to the codec parameters is in progress. |
ISO 13818-3 | 1998 | MPEG-2 Audio. This is definitive for Layers I, II and III (MP1, MP2 and MP3). |
ISO 13818-7 | 2010 | Describes MPEG-2 Advanced Audio Coding (AAC). Published in 2006 with revisions added in 2010. |
ISO 14496-1 | 2014 | MPEG-4 Systems and original container format. Published in 2010 with corrections added in 2014. A new version is under development. |
ISO 14496-3 | 2019 | MPEG-4 Audio. Describes how to combine AAC with other codecs. |
ISO 14496-6 | 2000 | MPEG-4 Delivery Multimedia Interface Format (DMIF). |
ISO 14496-12 | 2022 | MPEG-4 file format. A new version is under development. |
ISO 14496-14 | 2020 | MPEG-4 version 2 file format. |
ISO 15938-1 | 2006 | MPEG-7 Systems. Originally published in 2002 and updated in 2006. |
ISO 15938-2 | 2002 | MPEG-7 Descriptions Definition Language (DDL). |
ISO 15938-4 | 2006 | MPEG-7 Metadata for audio. Originally published in 2002 and updated in 2006. |
ISO 15938-8 | 2011 | Extraction and use of MPEG-7 metadata descriptions. Originally published in 2002 and updated in 2011. |
ISO 15938-9 | 2012 | MPEG-7 Profiles & Levels. Originally published in 2005 and updated in 2012. |
ISO 15938-10 | 2007 | MPEG-7 Schema definition. Originally published in 2005 and updated in 2007. |
ISO 15938-11 | 2012 | MPEG-7 Profile schemas. Originally published in 2005 and updated in 2012. |
ISO 15938-12 | 2012 | MPEG-7 Query format. |
ISO 21000 | Various | MPEG-21 describes mechanisms for access control for multimedia content. This set of standards is under review and a new version is expected to be published. |
ISO 23001-8 | Withdrawn | Coding-independent code points. This is superseded by ISO 23091. |
ISO 23003-1 | 2017 | MPEG Surround for multi-channel audio. Originally published in 2007 and updated in 2017. |
ISO 23003-2 | 2008 | Spatial Audio Object Coding (SAOC). |
ISO 23003-4 | 2020 | Dynamic Range Control. A new version is being prepared. |
ISO 23091-3 | 2022 | Coding Independent Code Points for Audio. Playback controlling metadata. Originally published in 2018 and updated in 2022. |
ITU-T Rec. H.222.0 | 2022 | See ISO 13818-1. |
ETSI TS 126 244 | 2008 | Defines the .3gp container file format. Freely available to download from the ETSI.org web site. |
Conclusion
AAC coding will be subject to patent licensing fees until 2030. For the time being you will need to contact a patent pool for a license if you build and distribute an encoder or player implemented in hardware or software. The coded bitstreams transmitted to end-users are free of any licensing obligations.
This is not the whole AAC story. High Efficiency AAC was developed to improve the performance still further. That is sometimes called AAC+.
There is also more to study and understand in the MPEG-4 (ISO 14496) standard. MPEG is also reorganizing the collection of standards. MPEG-D (ISO 23003) has some relevant material, and ISO 23091-3 is helpful for player design.