Standards: Audio - High Efficiency Audio Codecs (HE-AAC)

HE-AAC builds on the foundations of AAC to deliver near CD-quality audio at bitrates as low as 32 kbps, making it the codec of choice for mobile TV, digital radio and low-bandwidth streaming. This guide unpacks the key technologies behind its efficiency gains.

High Efficiency AAC Audio Coding

High Efficiency AAC (HE-AAC) refines the original AAC coding techniques but remains compatible with MPEG-2 Part 7. It can deliver near CD-quality sound at 32 kbps.

HE-AAC is known by a variety of other names:

Alias	Canonical name
AAC+	HE-AAC v1
aacPlus	HE-AAC v1
aacPlus v2	HE-AAC v2
eAAC+	HE-AAC v2

These are some important features that HE-AAC introduces:

Spectral Band Replication (SBR).
Parametric Stereo (PS).
Perceptual Noise Substitution (PNS).
Long Term Predictor (LTP).
Low Delay (AAC-LD).
MPEG-4 Scalable to Lossless (SLS).
AAC Scalable Sample Rate (SSR).
Structured Audio (SA).
Text To Speech (TTSI).
Error Resilience (ER).

Structured Audio and Text To Speech are extremely compact because they describe a sound that is rendered entirely in the receiving client. This can bring the bitrate down as low as 100 bps.

There are many historical versions. Each additional profile will add more variants:

Version	Description
AAC (Original)	Described in ISO 13818-7:1997.
AAC (Version 1)	Described in ISO 14496-3:1999.
AAC (Version 2)	Described in the ISO 14496-3:2000 revision.
AAC (Current)	Described in ISO 14496-3:2009.
AAC+ (Version 2)	Version 2 of aacPlus is described in ETSI TS 102 005:2010.
HE-AAC v1 (AAC+) Profile	Described in the ISO 14496-3:2001 revision. Combines AAC LC with Spectral Band Replication (SBR).
HE-AAC v2 (aacPlus v2) Profile	Described in the ISO 14496-3:2005 revision. Adds Parametric Stereo (PS) to the version 1 features to achieve lower bitrates. Sometimes described as eAAC+ (Extended AAC plus).
xHE-AAC	Fraunhofer introduced Loudness Control and adaptive streaming around 2016 (See MPEG-DASH). Well supported by many players including iOS and Android.
Extended HE-AAC	ISO 23003-3:2020 adds USAC coding to HE-AAC version 2, extending the tool set.

AAC Tools & Technologies

The AAC coding tools were reorganized in MPEG-4 to make them more flexible. New tools were added and older tools were refactored into separate items. They are now described as Audio Objects, each one having a specific identity and purpose. Some of them are containers for descriptive information. This is an additional layer of abstraction facilitating profile definitions.

These are the basic AAC Audio Objects and the technologies that are used inside them. The numbered sub-parts of the official ISO MPEG-4 Part 3 standard give functional descriptions of these tools and show how they are mapped into the profiles. Unless another standard is named, the sub-part references below are to MPEG-4 Part 3:

Terminology	Description
AAC Main	Based on AAC LC.
AAC LC	The Low Complexity Audio Object combines the MPEG-2 Part 7 Low Complexity profile (LC) with Perceptual Noise Substitution (PNS). See sub-part 4.
AAC SSR	Scalable Sample Rate is based on the MPEG-2 Part 7 Scalable Sampling Rate profile (SSR) combined with Perceptual Noise Substitution (PNS). See sub-part 4.
AAC LTP	Long Term Prediction introduces a forward predictor with lower computational complexity. Also uses AAC LC.
AAC LD	Low Delay, used with CELP, HVXC, and TTSI in the Low Delay Profile. Suitable for real-time conversation applications.
AAC ELD	Enhanced Low Delay improves the bitrate and latency at the expense of a small increase in computational workload.
SBR	Spectral Band Replication used with AAC LC in the HE-AAC Profile version 1.
TwinVQ	Transform-domain Weighted Interleave Vector Quantization is designed for coding audio at extremely low bitrates (8 kbps). See sub-part 4.
CELP	Speech coding with Code Excited Linear Prediction operates at low bitrates. TwinVQ may be more efficient. Not suited for use with music. See sub-part 3.
HVXC	Speech coding with Harmonic Vector eXcitation Coding works well with low sample rates around 8 kHz delivering coded output at 1.6 kbps. Latency is very low making it suitable for telephony applications. See sub-part 2.
SSC	SinuSoidal Coding. The technical underpinnings of Parametric Stereo coding for high quality audio. See sub-part 8.
PS	Parametric Stereo used with AAC LC and SBR in the HE-AAC v2 Profile. The implementation uses SinuSoidal Coding (SSC). Stereo audio is coded as a monaural channel with two differential channels for the left and right signals. See sub-part 8.
MP1, MP2, MP3	MPEG-1/MPEG-2 Audio Layers 1, 2 & 3 in MPEG-4. See sub-part 9.
USAC	Unified Speech and Audio Coding switches the coding strategy between low bitrate CELP (for speech) and HE-AAC (for music) mid-stream as it determines which is more efficient for each segment. See ISO 23003-3.
BSAC	Bit Sliced Arithmetic Coding is an alternative scalable noiseless coding mechanism providing almost perfect quality at 64 kbps. Used for Digital Multimedia Broadcasting (DMB) services. See sub-part 4.
HILN	Parametric audio coding with Harmonic and Individual Line plus Noise. Sound can be coded as various harmonics of a sine wave plus a noise component described as a spectral envelope. See sub-part 7.
PNS	Perceptual Noise Substitution improves efficiency by representing noise-like signal components with a parametric representation instead of coding the exact waveform. The decoder synthesizes the noise component based on the description.
DST	Lossless coding of oversampled audio with Direct Stream Transfer. Popularized by Super Audio CDs. See sub-part 10.
ALS	Audio Lossless Coding uses short and long-term predictors to encode sounds that are rich in harmonics. See sub-part 11.
SLS	Scalable Lossless Coding is based on a layered approach which implements a lossy coding component in AAC with an additional correction layer that enhances it to provide the lossless result. SLS and ALS are not related to one another. See sub-part 12.
SLS non-core	A lossless audio coder with a single coding stream without the lossy General Audio base layer.
MPEG Surround	Also known as MPEG Spatial Audio Coding (SAC). Not the same as SAOC.
SAOC	Spatial Audio Object Coding. See ISO 23003-2.
SAOC-DE	Spatial Audio Object Coding Dialogue Enhancement.
LD MPEG Surround	Low Delay MPEG Surround coding. The side channel information is described in ISO 23003-2.
Audio Sync	Audio synchronization maintains the coherence of multiple content streams in multiple devices. See sub-part 13.
TTSI	Text to Speech Interface that synthesizes the audio. See sub-part 6.
SA	Structured Audio describes the audio as components or algorithms. The top level is a scheduler for controlling the construction and playback. See sub-part 5.
Wavetable synthesis	Uses combinations of waveforms to create virtual instrument sounds.
Sample based synthesis	Sampled natural sound fragments are combined and mixed to create a track. Based on SoundFont technologies.
Algorithmic synthesis	Converts a description of a sound with instructions for how to play it into a compiled source code form (such as C Language). Then an application can be created to generate the sound.
Audio effects	Part of the structured audio toolset.
SMR Simple	Simplified version of Symbolic Music Representation. See ISO 14496-23.
SMR Main	Main version of Symbolic Music Representation. See ISO 14496-23.
SAOL	Structured Audio Orchestra Language. Derived from the earlier MUSIC-N language.
SASL	Structured Audio Score Language.
SASBF	Structured Audio Sample Bank Format.
MIDI	Musical Instrument Digital Interface describes sound (predominantly music based) as a series of events (notes), sounds (patches) and modulations (controls).
General MIDI	A standard set of sounds defined by Roland Corp to provide instrument sound (patch) compatibility across multiple MIDI devices.
DLS	Downloadable Sounds are standardized digital musical instrument sound banks which can be used with data-driven sound tracks such as MIDI or SAOL.

Spectral Band Replication (SBR)

Spectral Band Replication discards redundant harmonic components in the encoder but reconstructs them by replicating the lower frequencies to derive suitable replacements in the player. This can be used with any codec.

A typical stream of audio might be coded to a target maximum bitrate of 128 kbps. This would reproduce all frequencies up to 15 kHz with a small reduction in the frequency response at the top end.

SBR cuts off the incoming frequencies at around 7.5 kHz. This loses a lot of the detail but reduces the bitrate to 64 kbps.

The higher band from 7.5 kHz to 15 kHz is processed through a more aggressive compression tool. This generates a description of the high frequency sounds that can be used in the decoder to reconstruct them from the lower order harmonics. The description is carried in auxiliary segments within the stream and only adds 1.5 kbps to the bitstream (65.5 kbps in total).

The player transposes the lower frequencies into the upper band where it can filter and mix them in using the descriptions in the auxiliary segments derived from the higher frequencies. This is practical because the upper frequencies are likely to be harmonics of the lower band with a different amplitude envelope.
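The transpose-and-shape idea can be illustrated with a toy sketch. This is not the standardized SBR algorithm; it only shows the principle of sending a coarse envelope instead of the high band itself. The function names, the use of plain magnitude lists and the two-band envelope are all assumptions made for the illustration, and both bands are assumed to have the same length, divisible by the number of sub-bands.

```python
import math

def rms(values):
    """Root-mean-square level of a list of magnitudes."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def encode_envelope(high_band, bands):
    """Encoder side: summarize the high band as one RMS level per sub-band."""
    size = len(high_band) // bands
    return [rms(high_band[i * size:(i + 1) * size]) for i in range(bands)]

def reconstruct_high_band(low_band, envelope):
    """Decoder side: copy the low band upward, then rescale each sub-band
    so that its RMS level matches the transmitted envelope."""
    size = len(low_band) // len(envelope)
    rebuilt = []
    for i, target in enumerate(envelope):
        patch = low_band[i * size:(i + 1) * size]
        level = rms(patch)
        gain = target / level if level > 0 else 0.0
        rebuilt.extend(v * gain for v in patch)
    return rebuilt

# Hypothetical magnitudes: the high band has the same shape as the low band
# but a much lower level, as harmonics often do.
low = [8.0, 6.0, 4.0, 2.0]
high = [0.8, 0.6, 0.2, 0.1]

envelope = encode_envelope(high, bands=2)        # only two numbers are sent
rebuilt = reconstruct_high_band(low, envelope)

# Bitrate tally using the figures quoted in the text.
total_kbps = 64.0 + 1.5
```

In this contrived case the rebuilt band matches the original closely because the high band really is a scaled copy of the low band. Real material will only approximate that, which is why the technique is perceptual rather than exact.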

Perceptual Noise Substitution (PNS)

The bitrate gains from using PNS are often not worth the computational workload when the audio is of a high quality.

For noisy audio sources, the noise can be filtered out and described as control parameters for a pseudorandom noise generator in the player, where it can be recreated.
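A minimal sketch of that idea follows, assuming the only control parameter is the RMS level of a noise-like band. The function names and the single-parameter description are hypothetical; a real coder works per frequency band and decides band by band whether substitution is appropriate.

```python
import math
import random

def encode_noise_band(samples):
    """Encoder side: keep only the RMS level of a noise-like band."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def synthesize_noise_band(level, length, seed=0):
    """Decoder side: generate pseudorandom noise and scale it to the level."""
    rng = random.Random(seed)
    noise = [rng.uniform(-1.0, 1.0) for _ in range(length)]
    rms = math.sqrt(sum(n * n for n in noise) / length)
    if rms == 0:
        return [0.0] * length
    return [n * level / rms for n in noise]

# Hypothetical noisy source: 256 samples of low-level Gaussian noise.
source = random.Random(42)
original = [source.gauss(0.0, 0.1) for _ in range(256)]

level = encode_noise_band(original)            # one number replaces 256 samples
rebuilt = synthesize_noise_band(level, len(original))
```

The rebuilt samples differ from the originals, but they carry the same energy, which is what the listener perceives in a noise-like region.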

Parametric Stereo (PS) & SinuSoidal Coding (SSC)

Parametric stereo exploits the similarity between the left and right channels to code them more efficiently.

The two channels are mixed down into a single monophonic channel and coded at full resolution. This is a base from which two differential channels can be derived. Those differences can be coded to a 3-kbps bitrate using SinuSoidal Coding.

The player decodes the mono channel and applies the differences to make the left and right outputs.

Instead of delivering two full bitrate channels, the encoder delivers one full bitrate channel and two very low bitrate differential channels.
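The downmix-and-reconstruct flow can be sketched as follows. This toy version assumes the side information is a single level parameter per channel per frame, which is a simplification chosen for the illustration and not the coding defined in the standard. All names are hypothetical.

```python
import math

def energy(samples):
    """Sum of squared sample values."""
    return sum(s * s for s in samples)

def ps_encode(left, right):
    """Downmix to mono and describe each channel by one gain for the frame."""
    mono = [(l + r) / 2 for l, r in zip(left, right)]
    mono_energy = energy(mono)
    if mono_energy == 0:
        return mono, (0.0, 0.0)
    gains = (
        math.sqrt(energy(left) / mono_energy),
        math.sqrt(energy(right) / mono_energy),
    )
    return mono, gains

def ps_decode(mono, gains):
    """Rebuild left and right by applying the per-channel gains to the mono signal."""
    gain_left, gain_right = gains
    return [m * gain_left for m in mono], [m * gain_right for m in mono]

# Hypothetical frame in which the right channel is a quieter copy of the left.
left = [1.0, -1.0, 0.5, -0.5]
right = [0.5, -0.5, 0.25, -0.25]

mono, gains = ps_encode(left, right)           # one channel plus two numbers
out_left, out_right = ps_decode(mono, gains)
```

The channels are recovered exactly here only because they differ in level and nothing else. Where the channels differ in other ways, a single gain per channel can only approximate the stereo image.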

Scalable Sample Rate (SSR)

Scaling the audio coding by splitting at the sample level is an interesting alternative to using base and enhancement layers.

If we de-interleave CD audio (sampled at 44.1 kHz) into three scalable streams, then stream one carries the first sample, stream two the second, and stream three the third and fourth. The fifth sample is added to stream one and so on. This yields two sample streams at roughly 11 kHz and one at roughly 22 kHz, which can be used by the target device in any combination.

A low-quality service can be reconstructed from one stream or all of them can be combined to reconstruct the original sample stream.
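The split and merge described above can be sketched in a few lines. This follows the sample-level description in the text, not the bitstream syntax of the standard. The function names are hypothetical and the input length is assumed to be a multiple of four.

```python
def split_streams(samples):
    """De-interleave in groups of four: one sample, one sample, two samples."""
    stream_one = samples[0::4]
    stream_two = samples[1::4]
    stream_three = [s for i, s in enumerate(samples) if i % 4 in (2, 3)]
    return stream_one, stream_two, stream_three

def merge_streams(stream_one, stream_two, stream_three):
    """Re-interleave the three streams into the original sample order."""
    merged = []
    for i in range(len(stream_one)):
        merged.append(stream_one[i])
        merged.append(stream_two[i])
        merged.extend(stream_three[2 * i:2 * i + 2])
    return merged

samples = list(range(8))                       # stand-in for PCM sample values
one, two, three = split_streams(samples)
restored = merge_streams(one, two, three)

# Resulting sample rates for a 44.1 kHz source.
rates = (44100 // 4, 44100 // 4, 44100 // 2)
```

Streams one and two each hold a quarter of the samples and stream three holds half, which is where the approximate 11 kHz and 22 kHz figures come from.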

Error Resilience (ER)

Some audio objects have Error Resilient counterparts which are indicated with the ‘ER’ prefix. This is useful for transmitting coded audio over unreliable and error-prone network links.

Additional error resilience is possible with checksums and Forward Error Correction introduced as the payload is segmented into network packets.
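As an illustration of that packet-level protection, the sketch below adds a CRC-32 checksum to each packet and one XOR parity packet as a very simple form of Forward Error Correction. The packet layout, the function names and the single-parity scheme are assumptions for the example and are not taken from any of the standards discussed here. The payload is assumed to be non-empty.

```python
import zlib
from functools import reduce

def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def packetize(payload, size):
    """Split into fixed-size chunks (zero padded), append one XOR parity
    chunk, and add a 4-byte CRC-32 to the end of every chunk."""
    chunks = [payload[i:i + size].ljust(size, b"\x00")
              for i in range(0, len(payload), size)]
    chunks.append(reduce(xor_bytes, chunks))
    return [c + zlib.crc32(c).to_bytes(4, "big") for c in chunks]

def verify(packet):
    """Return the chunk if its CRC-32 matches, otherwise None."""
    body, crc = packet[:-4], packet[-4:]
    return body if zlib.crc32(body).to_bytes(4, "big") == crc else None

def recover(packets):
    """Rebuild the data chunks when at most one packet is lost or corrupt."""
    bodies = [verify(p) if p is not None else None for p in packets]
    missing = [i for i, b in enumerate(bodies) if b is None]
    if len(missing) > 1:
        raise ValueError("a single parity packet can repair only one loss")
    if missing:
        survivors = [b for b in bodies if b is not None]
        bodies[missing[0]] = reduce(xor_bytes, survivors)
    return bodies[:-1]                         # drop the parity chunk

payload = b"HE-AAC access unit"                # hypothetical coded audio bytes
packets = packetize(payload, 4)
packets[2] = None                              # simulate one lost packet
restored = b"".join(recover(packets)).rstrip(b"\x00")
```

The checksum detects damage and the parity packet repairs a single loss. Stripping the zero padding this way only works because the example payload does not end in zero bytes; a real packet format would carry the payload length explicitly.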

Patent Licenses

Patents for MPEG-4 Audio coding are managed by Via Licensing. Contact them for a license if you design and sell an Encoder or Decoder (Player) of your own.

Content owners do not need a license to distribute their MPEG Audio content. They have implicitly paid for it when purchasing the encoder or decoder.

Patents for AAC baseline technologies expire in 2028 and some newer extensions will have active patents until 2031.

Profiles & Audio Objects

A profile could use a single Audio Object while other profiles stack the tools hierarchically to make more efficient and sophisticated coders. Complexity requires more computational effort and increases the latency:

MPEG-2 AAC-LC profile only uses the Low Complexity AAC-LC audio object.
MPEG-4 AAC-LC adds Perceptual Noise Substitution (PNS).
MPEG-4 HE-AAC v1 adds Spectral Band Replication (SBR).
MPEG-4 HE-AAC v2 adds Parametric Stereo (PS).

Because the specification is hierarchical, HE-AAC v2 players can decode any of the lower stacked levels.
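The stacking can be modeled as sets of tools, with a player able to decode a stream when it supports every tool the stream uses. This is a minimal sketch built from the four levels listed above; the dictionary and function names are hypothetical.

```python
# Tools used at each stacked level, taken from the list above.
PROFILE_TOOLS = {
    "MPEG-2 AAC-LC": {"AAC-LC"},
    "MPEG-4 AAC-LC": {"AAC-LC", "PNS"},
    "HE-AAC v1": {"AAC-LC", "PNS", "SBR"},
    "HE-AAC v2": {"AAC-LC", "PNS", "SBR", "PS"},
}

def can_decode(player_profile, stream_profile):
    """True when the player supports every tool the stream relies on."""
    return PROFILE_TOOLS[stream_profile] <= PROFILE_TOOLS[player_profile]

# An HE-AAC v2 player handles every lower level.
v2_handles_all = all(can_decode("HE-AAC v2", p) for p in PROFILE_TOOLS)

# A plain MPEG-4 AAC-LC player lacks SBR, so it cannot fully decode HE-AAC v1.
lc_handles_v1 = can_decode("MPEG-4 AAC-LC", "HE-AAC v1")
```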

These are the standardized profiles. Organizations such as Fraunhofer create their own proprietary profiles.

The Fraunhofer Scalable Lossless Coding (HD-AAC) is not the same as the SLS support defined by the MPEG-4 standard.

Refer to section 1.5 of the MPEG-4 Part 3 standard for a detailed description of the Audio Objects and how they are mapped to the profiles.

Profile	Introduced by
Low-Complexity	MPEG-2
Main	MPEG-2
Scalable Sampling Rate	MPEG-2
AAC	MPEG-4
High Efficiency AAC (v1)	MPEG-4
HE-AAC v2	MPEG-4
Main Audio	MPEG-4
Scalable Audio	MPEG-4
Speech Audio	MPEG-4
Synthetic Audio	MPEG-4
High Quality Audio	MPEG-4
Low Delay Audio	MPEG-4
Low Delay v2 Audio	MPEG-4
Natural Audio	MPEG-4
Mobile Audio Inter-networking	MPEG-4
HD-AAC	MPEG-4
ALS Simple	MPEG-4
Extended High Efficiency AAC	MPEG-D
(Limited) Scalable Lossless Coding	Fraunhofer HD-AAC

Media Type Identifiers

Because HE-AAC is coded differently to classic AAC, a new media type is needed so that browsers can distinguish between the two formats:

Media type	Description
audio/aac	Use this for Standard AAC format content. This is the most widely supported.
audio/aacp	Describes AAC+ content but is not as widely supported by web browsers.
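A server-side helper for choosing between the two identifiers might look like the sketch below. It assumes that every HE-AAC alias listed earlier in this article should be labeled audio/aacp and that everything else is classic AAC. That mapping and the function name are assumptions for illustration, so check what your target players accept before relying on it.

```python
# Names that this sketch treats as HE-AAC (aliases from the table in this article).
HE_AAC_NAMES = {"HE-AAC v1", "HE-AAC v2", "AAC+", "aacPlus", "aacPlus v2", "eAAC+"}

def media_type_for(codec_name):
    """Pick a media type string for a codec name (hypothetical helper)."""
    return "audio/aacp" if codec_name in HE_AAC_NAMES else "audio/aac"

classic = media_type_for("AAC LC")
high_efficiency = media_type_for("eAAC+")
```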

Relevant Standards

The vintage column indicates the most recent base standard, corrigenda or amendment. Although the latest versions are indicated, earlier versions may contain relevant information that is removed from later standards. Some devices may be compatible only with an earlier version and you should use that if necessary when developing your services for them.

There is a gradual refactoring of the MPEG standards underway so they can benefit from reusing supporting technologies without needing to repeat them. The MPEG-D and Coding Independent Code Points standards are examples of that as are the ISO 23XXX group of MPEG relevant standards, which provide additional infrastructural support outside of the individual coding specifications.

Standard	Version	Description
ISO 11172-3	1996	MPEG-1 Part 3 - Audio.
ISO 13818-1	2023	MPEG-2 Part 1 - Systems.
ISO 13818-3	1998	MPEG-2 Part 3 - Audio.
ISO 13818-7	2007	MPEG-2 Part 7 - Advanced Audio Coding (AAC).
ISO 14496-1	2014	MPEG-4 Part 1 - Systems. Currently being revised.
ISO 14496-3	2009	MPEG-4 Part 3 - Audio coding. Released in 2001 & amended in 2003 & 2004.
ISO 14496-4	2019	MPEG-4 Part 4 - Conformance bit-streams specification.
ISO 14496-5	2019	MPEG-4 Part 5 - Reference Software.
ISO 14496-11	2015	MPEG-4 Part 11 - Scene description & application engine.
ISO 14496-23	2008	Symbolic Music Representation.
ISO 23091-3	2022	MPEG-CICP - Coding Independent Code Points for delivering out of band metadata.
ISO 23001-8	n/a	Withdrawn & replaced by ISO 23091.
ISO 23003	n/a	MPEG-D is a group of standards for audio coding.
ISO 23003-1	2017	MPEG-D Part 1 - MPEG Surround (a.k.a. Spatial Audio Coding).
ISO 23003-2	2018	MPEG-D Part 2 - Spatial Audio Object Coding (SAOC).
ISO 23003-3	2021	MPEG-D Part 3 - Unified speech & audio coding (USAC).
ISO 23003-4	2023	MPEG-D Part 4 - Dynamic Range Control. Currently being revised.
ISO 23003-5	2020	MPEG-D Part 5 - Uncompressed audio in MPEG-4 File Format.
ISO 23003-6	ISO 23003-6	MPEG-D Part 6 - USAC Reference Software.
ISO 23003-7	2022	MPEG-D Part 7 - USAC Conformance specification.
DVB-H	2004	Handheld mobile TV services.
DVB-SH	2008	Handheld mobile TV services delivered via a satellite link.
ETSI TS 101 154	2019	HE-AAC & HE-AAC v2 audio coding for DVB applications.
ETSI TS 102 005	2010	Video & Audio Coding in DVB services delivered directly over IP protocols.
ETSI TR 102 377	2009	DVB-H Implementation guidelines.
ETSI TS 103 466	2019	DAB audio coding (MPEG Layer II).
ETSI TS 126 401	2017	Enhanced aacPlus general audio codec.
ETSI EN 302 304	2004	Describes DVB-H.
3GPP TS 26.401	2024	Describes the use of Enhanced AAC+ for mobile services.
General MIDI	1999	Developed by Roland to allow MIDI devices to sound similar when music sequences are played through them.
DLS	1998	The MIDI Downloadable Sounds Specification by the MIDI Manufacturers Association.
MIDI 1.0	1996	The Complete MIDI 1.0 Detailed Specification by the MIDI Manufacturers Association.
MIDI 2.0	2020	Extends MIDI 1.0 with additional capabilities.
ITU Rec H.223	1998	Annex C describes a Multiplexing Protocol For Low Bitrate Multimedia Communication Over Highly Error-Prone Channels.
ITU Rec H.222.0	1995	See ISO/IEC 13818-1 - Systems.

Applying HE-AAC

Audio and video compression is a complex subject. We balance it here at a level sufficient to explain the fundamentals whilst avoiding a deep dive down the rabbit hole. Consult the MPEG-4 Part 3 standard if you need to explore MPEG Audio coding in greater detail.

The AAC audio standard is increasingly being used with High-Definition TV services (HDTV). This is supported by the DVB standards that are distributed by ETSI. HE-AAC is particularly relevant for mobile TV using DVB-H.

Digital radio services such as DAB+ and Digital Radio Mondiale are also adopting HE-AAC.
