Standards: Audio - MPEG Layer 3 Audio Coding (MP3)
Launched in 1995, MP3 remains one of the most ubiquitous audio formats in the world. This guide explains how psychoacoustic compression works, describes the differences between the MPEG-1 and MPEG-2 implementations, and identifies where MP3 works and where it doesn’t.
About MP3 Audio Coding
The MP3 audio format has been around for over thirty years, and while it has been largely superseded by other codecs, it continues to be used in many situations.
MP3 audio is widely supported on almost every computing platform, mobile device, TV receiver and tablet. Its metadata is carried in ID3 tags, which are usually embedded inside the MP3 file itself. MP3 is well suited to distribution to consumers even though there are higher quality alternatives such as AAC, HE-AAC and MPEG-H.
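MP3 is a lossy codec. The decoded audio is never mathematically identical to the source, although at adequate bit rates the loss is designed to be difficult to hear. It is never an optimal solution for creating, editing and archiving purposes. It is optimized for distributing content to end-users.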
It is highly recommended that MP3 is NOT used in production workflows. Arguably, contributions delivered in MP3 format should be rejected and requested again in a lossless format instead. MP3 may be adequate for spoken content but not for music delivery.
MP3 is NOT an abbreviation for MPEG-3.
Why Did We Need The MP3 Codec?
The MP3 compression technology was devised because early Internet connections were unable to deliver audio files in a timely manner. A three-minute raw audio track extracted from a CD creates a 32MB file:
- Duration - 3 minutes (180 seconds).
- Sample rate - 44100 samples per second.
- Sample size - 16 bits (2 bytes).
- Number of tracks - 2 (Stereo).
- Resulting file size - 180 × 44100 × 2 × 2 = 31,752,000 bytes (31.752 MB).
MP3 achieves a compression ratio of between 10:1 and 14:1 whilst still preserving (audibly) almost CD quality. The encoded output file would be approximately 3MB, depending on the content.
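The arithmetic above can be checked with a few lines of Python. This is only a sketch of the calculation using the figures already quoted; the variable names are illustrative and megabytes are taken as decimal (1,000,000 bytes).

```python
# Raw (uncompressed) PCM size for the three-minute CD example above.
duration_s = 180          # 3 minutes
sample_rate_hz = 44_100   # samples per second, per channel
bytes_per_sample = 2      # 16-bit samples
channels = 2              # stereo

raw_bytes = duration_s * sample_rate_hz * bytes_per_sample * channels
raw_mb = raw_bytes / 1_000_000  # decimal megabytes

# Equivalent raw bit rate in kilobits per second.
raw_kbps = sample_rate_hz * bytes_per_sample * 8 * channels / 1000

# Estimated MP3 sizes at the quoted 10:1 and 14:1 compression ratios.
size_at_10_to_1_mb = raw_mb / 10
size_at_14_to_1_mb = raw_mb / 14

print(f"Raw PCM: {raw_mb:.3f} MB at {raw_kbps:.1f} kbps")
print(f"MP3 estimate: {size_at_14_to_1_mb:.2f} to {size_at_10_to_1_mb:.2f} MB")
```

Seen the other way round, a 128 kbps MP3 of the same track corresponds to a ratio of roughly 11:1 against the 1411.2 kbps raw stream, which sits inside the quoted range.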
Audio delivered on DVD discs for movies and TV programs is sampled at a higher rate of 48 kHz. This creates slightly larger files.
Audio compression was originally developed for the MPEG-1 standard. It was then inherited into MPEG-2 and enhanced.
MPEG-1 and MPEG-2 happily coexist and are not mutually exclusive. MPEG-2 does not replace MPEG-1 but inherits capabilities from it and augments them.
The bitstream syntax definition must be carefully reconciled across the MPEG-1 and MPEG-2 standards to remove any ambiguity.
How It Works
It’s not necessary to fully understand the inner workings of MP3 to utilize it. A brief overview here will suffice.
The MPEG Audio coding is organized into three layers which offer different levels of complexity and compression ratios.
Much of this work was done by Fraunhofer IIS collaborating with other international experts. Fraunhofer subsequently developed a (non-canonical) variant which they called MPEG 2.5 Layer III. This adds even lower sample rates (8, 11.025 and 12 kHz) for very low bit-rate use but is not widely adopted.
The next generation AAC encoder refines these techniques to improve the quality and compression ratios.
MP3 players are expected to decode and play all MPEG content but may present an error message when encountering MPEG 2.5 Audio.
The original research was based on Psychoacoustic Analysis. This is the study of sound perception to compile statistics based on listener feedback about quality. A mathematical model is constructed from those statistical results.
MP3 Perceptual Encoding is driven by a statistical model that removes components the human ear would not perceive when they are masked by other louder components. This immediately makes MP3 a lossy codec. It reduces the complexity of the content to yield a better compression ratio and that discarded information cannot be restored by uncompressing the file.
Coding for Layers I and II splits the audio into 32 discrete sub-bands, which are then analyzed individually. This is rather like the way a spectrum analyzer displays a snapshot of the frequency content.
Layer III goes further. It passes overlapping blocks of samples from each sub-band through a Modified Discrete Cosine Transform (MDCT), which yields a much finer frequency resolution (576 frequency lines rather than 32 sub-bands). This trades complexity and computing load for better compression ratios but introduces more latency and artefacts that need to be reduced by additional processing.
The MDCT yields a list of coefficients which gradually decay towards zero. A quantization/entropy cut-off truncates the coefficients at the point where they are all at zero (and expected to remain so) for lossless compression.
Truncating earlier, when they are nearly zero, causes lossy compression.
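The effect of the cut-off can be illustrated with a toy example. The sketch below is not the MP3 quantizer (which is non-uniform and followed by Huffman coding); it assumes a simple uniform quantizer and a made-up list of decaying coefficients, purely to show how a coarser step size moves the cut-off earlier and loses information.

```python
def quantize(coefficients, step):
    """Map each coefficient to the index of the nearest multiple of `step`."""
    return [round(c / step) for c in coefficients]


def dequantize(indices, step, length):
    """Rebuild coefficients, padding the truncated tail with zeros."""
    values = [q * step for q in indices]
    return values + [0.0] * (length - len(values))


def trim_trailing_zeros(indices):
    """Drop the run of zeros at the end; these need not be stored."""
    end = len(indices)
    while end > 0 and indices[end - 1] == 0:
        end -= 1
    return indices[:end]


# Hypothetical coefficients that decay towards zero.
coefficients = [40.0, -21.0, 9.0, -4.0, 1.5, -0.5, 0.0, 0.0]

# A fine step keeps every non-zero coefficient: nothing is lost here.
fine = trim_trailing_zeros(quantize(coefficients, step=0.5))

# A coarse step rounds the nearly-zero tail to zero: fewer values, some error.
coarse = trim_trailing_zeros(quantize(coefficients, step=4.0))

print(len(fine), dequantize(fine, 0.5, len(coefficients)))
print(len(coarse), dequantize(coarse, 4.0, len(coefficients)))
```

With the fine step, six values are stored and the original list is recovered exactly. With the coarse step, only four values are stored and the reconstruction differs from the original, which is the quantization noise the psychoacoustic model tries to hide.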
The energy level of each band is measured and compared to a masking threshold value. This is fed back into the Quantizer which moves the entropy coder cut-off point earlier in the series of coefficients. Louder passages are coded less accurately and can withstand a higher level of quantization.
The entropy cut-off point determines the amount of quantization noise introduced as an audible artefact. It is constrained within the limits determined by the psychoacoustic statistical analysis to hide the artefacts.
MP3 Joint Stereo processing looks for similarities between the audio channels. Differences depend on how the sound sources are panned across the soundscape during mix-down. Coding a single channel plus the differences needed to recreate a second channel improves the compression ratio.
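The idea of coding one channel plus a difference signal can be sketched as a mid/side transform. The example below is a minimal illustration and assumes the commonly used 1/√2 normalization; the sample values are invented and a real encoder applies this to frequency-domain data rather than raw samples.

```python
import math

SQRT2 = math.sqrt(2)


def encode_mid_side(left, right):
    """Convert left/right samples to mid (sum) and side (difference) signals."""
    mid = [(l + r) / SQRT2 for l, r in zip(left, right)]
    side = [(l - r) / SQRT2 for l, r in zip(left, right)]
    return mid, side


def decode_mid_side(mid, side):
    """Recover left/right samples from mid and side signals."""
    left = [(m + s) / SQRT2 for m, s in zip(mid, side)]
    right = [(m - s) / SQRT2 for m, s in zip(mid, side)]
    return left, right


# Hypothetical, nearly identical channels (a source panned close to center).
left = [0.50, 0.25, -0.10]
right = [0.48, 0.27, -0.10]

mid, side = encode_mid_side(left, right)
restored_left, restored_right = decode_mid_side(mid, side)

print("mid: ", [round(m, 4) for m in mid])
print("side:", [round(s, 4) for s in side])
```

When the two channels are similar, the side signal is close to zero and needs far fewer bits than a full second channel, which is where the improvement in compression ratio comes from.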
Audio Layers
Layers I, II and III were all defined during the MPEG-1 standardization (ISO 11172-3). MPEG-2 (ISO 13818-3) then extended them with lower sample rates, lower bit rates and multi-channel support. The layer names use Roman numerals which may be confusing.
The term MP3 is an abbreviation of MPEG Audio Layer III and describes MPEG-1 and MPEG-2 content.
The term MP2 describes MPEG Audio Layer II and is sometimes incorrectly used to describe other kinds of MPEG-2 Audio.
- Layer I - This is simpler than Layer II. The frame sizes are smaller which reduces coding delay (latency). It is useful for tele-conferencing applications and was designed for real-time encoding on early hardware systems. Layer I is now deemed to be obsolete.
- Layer II - Layer II performs well with orchestral content and delivers results nearly as good as AAC. Players decode this with less computational effort than Layer III. It is more complex than Layer I but yields a better compression ratio.
- Layer III - This is designed to operate at a lower bitrate than Layer II. It works quite differently with a much larger number of sub-bands which are processed in overlapping groups with a Modified Discrete Cosine Transform (MDCT) algorithm, as detailed above. Layer III does not handle transients quite as well as Layer II and needs additional pre-echo detection to increase the available bit rate during difficult passages. Additional post processing techniques are necessary to reduce artefacts which increases the computational workload.
For some content, Layer II performs better than Layer III even though it is less efficient.
These are the preferred file type extensions appropriate for the three layers:
| File type | Content |
|---|---|
| .mp1 | MPEG Audio - Layer I. |
| .mp2 | MPEG Audio - Layer II. |
| .mp3 | MPEG Audio - Layer III. |
Supported Bit Rates
MPEG-2 adds bit rates in the lower range that MPEG-1 does not already support. These are useful for implementing compression for spoken-word content rather than music. Higher bit-rates are undefined in MPEG-2 and are inherited from MPEG-1:
| MPEG-1 Layer III | MPEG-2 Layer III |
|---|---|
| - | 8 kbps |
| - | 16 kbps |
| - | 24 kbps |
| 32 kbps | 32 kbps |
| 40 kbps | 40 kbps |
| 48 kbps | 48 kbps |
| 56 kbps | 56 kbps |
| 64 kbps | 64 kbps |
| 80 kbps | 80 kbps |
| 96 kbps | 96 kbps |
| 112 kbps | 112 kbps |
| 128 kbps | 128 kbps |
| - | 144 kbps |
| 160 kbps | 160 kbps |
| 192 kbps | 192 kbps |
| 224 kbps | 224 kbps |
| 256 kbps | 256 kbps |
| 320 kbps | 320 kbps |
Supported Sample Rates
The lower sample rates added by MPEG-2 support more efficient speech encoding. The higher sample rates are undefined in MPEG-2 and are inherited from MPEG-1:
| MPEG-1 Layer III | MPEG-2 Layer III |
|---|---|
| - | 16 kHz |
| - | 22.05 kHz |
| - | 24 kHz |
| 32 kHz | 32 kHz |
| 44.1 kHz | 44.1 kHz |
| 48 kHz | 48 kHz |
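The two tables above lend themselves to a simple lookup when validating encoder settings. The sketch below transcribes the values exactly as listed in this article, including the higher rates shown as inherited from MPEG-1; the function name and structure are illustrative assumptions, and it checks bit rate and sample rate independently rather than testing every combination an encoder might reject.

```python
# Values transcribed from the bit rate and sample rate tables above.
SUPPORTED_BITRATES_KBPS = {
    "MPEG-1": {32, 40, 48, 56, 64, 80, 96, 112, 128,
               160, 192, 224, 256, 320},
    "MPEG-2": {8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128,
               144, 160, 192, 224, 256, 320},
}

SUPPORTED_SAMPLE_RATES_HZ = {
    "MPEG-1": {32_000, 44_100, 48_000},
    "MPEG-2": {16_000, 22_050, 24_000, 32_000, 44_100, 48_000},
}


def is_supported(version, bitrate_kbps, sample_rate_hz):
    """Return True if both settings appear in the tables for this version."""
    if version not in SUPPORTED_BITRATES_KBPS:
        raise ValueError(f"unknown MPEG version: {version!r}")
    return (bitrate_kbps in SUPPORTED_BITRATES_KBPS[version]
            and sample_rate_hz in SUPPORTED_SAMPLE_RATES_HZ[version])


print(is_supported("MPEG-1", 128, 44_100))  # typical music setting
print(is_supported("MPEG-1", 16, 22_050))   # speech setting, MPEG-1
print(is_supported("MPEG-2", 16, 22_050))   # speech setting, MPEG-2
```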
Channel Encoding Modes
The MPEG-1 standard only supports two channels. These can be configured for mono or stereo applications in one of the following modes:
- Mono - Only one single channel is required.
- Stereo - Two channels.
- Joint stereo - intensity encoded.
- Joint stereo - Mid/side encoded (Layer III only).
- Dual mono - Two uncorrelated mono channels.
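A player discovers the MPEG version, layer and channel mode by reading the four-byte header at the start of each audio frame. This article does not describe that header, so the bit positions below are an assumption based on the commonly documented MPEG audio frame header layout. The sketch does not distinguish between the two joint stereo variants and ignores every other header field.

```python
# Two-bit field values, assuming the commonly documented header layout.
MPEG_VERSIONS = {0b00: "MPEG 2.5", 0b10: "MPEG-2", 0b11: "MPEG-1"}
LAYERS = {0b01: "Layer III", 0b10: "Layer II", 0b11: "Layer I"}
CHANNEL_MODES = {
    0b00: "Stereo",
    0b01: "Joint stereo",
    0b10: "Dual mono",
    0b11: "Mono",
}


def parse_frame_header(header):
    """Extract version, layer and channel mode from a 4-byte frame header."""
    if len(header) < 4:
        raise ValueError("a frame header is four bytes long")
    # The header starts with eleven sync bits that are all set to 1.
    if header[0] != 0xFF or (header[1] & 0xE0) != 0xE0:
        raise ValueError("frame sync not found")
    version_bits = (header[1] >> 3) & 0b11
    layer_bits = (header[1] >> 1) & 0b11
    mode_bits = (header[3] >> 6) & 0b11
    return {
        "version": MPEG_VERSIONS.get(version_bits, "reserved"),
        "layer": LAYERS.get(layer_bits, "reserved"),
        "channel_mode": CHANNEL_MODES[mode_bits],
    }


# Hypothetical header bytes for illustration.
print(parse_frame_header(bytes([0xFF, 0xFB, 0x90, 0x40])))
print(parse_frame_header(bytes([0xFF, 0xF3, 0x80, 0xC0])))
```

A decoder that does not implement the Fraunhofer extension could use the version field in this way to report MPEG 2.5 content cleanly instead of attempting to play it.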
MPEG-2 adds four more channels (six in all) to potentially carry 5.1 surround-sound content. Backwards compatible two channel stereo uses a sub-set of two out of the six available channels in the same way as MPEG-1:
- Front-Left - Used when two channel stereo is encoded.
- Front-Right - Used when two channel stereo is encoded.
- Front-Center - For dialog.
- Low Frequency Effects (LFE) - The ".1" channel in a 5.1 configuration.
- Surround (rear) - Left.
- Surround (rear) - Right.
Surround systems with more channels than the 5.1 configuration cannot be delivered with MPEG-2.
File Name Extensions
MP3 streams can be packaged into any file type that carries binary data. For example, a .wav file could contain MP3 coded audio. Using file types other than the default .mp3 is not recommended.
| File ext | Disposition | Description |
|---|---|---|
| .mp1 | Obsolete | MPEG Audio Layer I encoding. |
| .m1a | Obsolete | MPEG Audio Layer I encoding. Alternative but little used. |
| .mp2 | Default | MPEG Audio Layer II encoding. |
| .m2a | | MPEG Audio Layer II encoding. Alternative but little used. |
| .mpa | | MPEG Audio Layer II encoding. |
| .mp2a | | MPEG Audio Layer II encoding. Rarely used. |
| .mp3 | Preferred | MPEG Audio Layer III encoding. This is the optimal choice. |
| .m3a | | MPEG Audio Layer III encoding. Alternative but little used. |
| .mpga | | MPEG Audio Layer III encoding. Only used by MPEG-1. |
| .aa | Proprietary | Special file extension used by Audio Book files. |
| .mpe | | Less popular abbreviation for the .mpeg file format. |
| .mpg | | MPEG content stored in a Program Stream format. Superseded by the .mp4 file type. |
| .mpeg | | Normally expected to carry video and audio but can carry audio only. Superseded by the .mp4 file type. |
| .bit | | Very old legacy archive collections may contain files created before 1995 which use the .bit file extension. Superseded by the .mp3 file type. |
| .m3u | | Playlist file originally intended for use by MP3 and WAV players. Now supports MP4, TS and WEBM files. |
Media Type Identifiers
When serving files with HTTP or uploading them from a browser, the media type header in the transaction will indicate the essence format.
The preferred media type for MP3 is audio/mpeg which is described in IETF RFC 3003. Inspect the embedded metadata to determine other characteristics of the encoded audio.
Other media type identifiers may be used by proprietary systems when they deliver content:
| Media type | Status | Description |
|---|---|---|
| audio/mpeg | Preferred | This is the preferred default media type defined in IETF RFC 3003. It can also identify MPEG Audio Layer II and other content. |
| audio/mpeg3 | Deprecated | MPEG Audio Layer III files. |
| audio/MPA | Obsoleted | MPEG Audio Layer I and II content. |
| audio/mpa-robust | For RTP only | A More Loss-Tolerant RTP Payload Format for MP3 Audio. Described in IETF RFC 5219. |
| audio/mp3 | Deprecated | Recognized by Google Chrome and Opera browsers. |
| application/octet-stream | Ambiguous | Generic binary stream which would not invoke special MP3 handling and is not recommended as it could be carrying a rogue .exe file instead. |
| audio/mpg | Ambiguous | Not recommended. Might also carry video content. |
| audio/x-mpeg-3 | Proprietary | No longer used but should be recognized when handling incoming feeds. Not registered with IANA. |
| audio/x-mpg | Proprietary | Not registered with IANA. |
| audio/x- | Proprietary | Not registered with IANA. |
| audio/x-mpeg | Proprietary | Not registered with IANA. |
| audio/x-mp3 | Proprietary | Not registered with IANA. |
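When handling incoming feeds, the deprecated and proprietary identifiers in the table can be mapped onto the preferred type. The sketch below shows one way to do that; the function and constant names are hypothetical, the alias list is taken only from complete entries in the table above, and the identifiers marked as ambiguous are deliberately left untouched.

```python
PREFERRED_MEDIA_TYPE = "audio/mpeg"

# Deprecated or proprietary MP3 identifiers listed in the table above.
LEGACY_MP3_MEDIA_TYPES = {
    "audio/mpeg3",
    "audio/mp3",
    "audio/x-mpeg-3",
    "audio/x-mpg",
    "audio/x-mpeg",
    "audio/x-mp3",
}


def normalize_media_type(value):
    """Map known MP3 aliases to audio/mpeg; return anything else unchanged."""
    # Strip parameters such as "; charset=..." and ignore letter case.
    base = value.split(";", 1)[0].strip().lower()
    if base in LEGACY_MP3_MEDIA_TYPES:
        return PREFERRED_MEDIA_TYPE
    return base


print(normalize_media_type("audio/x-mp3"))
print(normalize_media_type("Audio/MP3; charset=binary"))
print(normalize_media_type("application/octet-stream"))
```

Because audio/mpeg can also identify Layer II and other content, the embedded metadata still needs to be inspected after normalization.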
Relevant Standards
These are the relevant standards that you should acquire to support your use of MP3:
| Document | Vintage | Description |
|---|---|---|
| Unicode | 15.1.0 | A unified character set that brings together all international glyph characters for use in ID3 tags. |
| ISO 639 | 2023 | Language names and codes used in ID3 tags. |
| ISO 8859 | Various | There are 16 different parts describing alphabets and character-glyphs for international use for ID3 tags. |
| ISO 11172-1 | 1999 | MPEG-1 Part 1 - Systems. Describes packaging and stream structures. The latest version is dated 1993 with additional corrigenda published in 1996 and 1999. |
| ISO 11172-3 | 1996 | MPEG-1 Part 3 - Audio is the foundation on which the earliest MPEG audio coding is built. The latest version is dated 1993 with a corrigendum published in 1996. |
| ISO 11172-5 | 1998 | MPEG-1 Software simulation for reference. |
| ISO 13818-1 | 2023 | MPEG-2 Systems. Describes packaging and stream structures. An amendment to codec parameters is in progress. |
| ISO 13818-3 | 1998 | MPEG-2 Audio. This is definitive for Layers I, II and III (MP1, MP2 and MP3). |
| ISO 13818-5 | 2005 | MPEG-2 Software simulation for reference. |
| RFC 3003 | 2000 | The audio/mpeg media type. |
| RFC 3555 | 2003 | Registration of RTP Payload Media Type Formats. |
| RFC 3625 | 2003 | The QCP File Format and media type identifiers for Speech Data. |
| RFC 4855 | 2007 | Media Type Registration of RTP Payload Formats. |
| RFC 4856 | 2007 | Media Type Registration of Payload Formats in the RTP Profile for Audio and Video Conferences. Obsoletes RFC 3555. |
| RFC 5219 | 2008 | A More Loss-Tolerant RTP Payload Format for MP3 Audio. Obsoletes RFC 3119. |
| MUSICAM | - | MPEG Audio Layer II is sometimes described as MUSICAM which is a proprietary brand name and was initially used during the standards development process. |
Applying MP3
MPEG-2 does not replace MPEG-1 but augments it with additional features.
The MP3 audio file format is widely used and supported on most platforms and devices. Content creation is easy and metadata editing is a feature of many compatible applications.
The patents relevant to MP3 all expired in 2017. This facilitates the use of open-source libraries such as LAME without risk of litigation.