Standards: Part 16 - About MP3 Audio Coding & ID3 Metadata
The MP3 audio format has been around for thirty years and has been superseded by several other codecs – so here we discuss why it still has a very strong position in broadcast. We also discuss ID3 metadata tags which often accompany MP3 files.
This article is part of our growing series on Standards.
There is an overview of all 26 articles in Part 1 - An Introduction To Standards.
The MP3 format is widely supported on almost every computing platform, mobile device, TV receiver and tablet. It is ideal for distribution to consumers even though there are higher quality alternatives such as AAC and MPEG-H.
MP3 is a mathematically lossy but perceptually lossless codec so it is never an optimal solution for mastering, editing and archiving purposes. It is optimized for distributing content to end-users.
MP3 is NOT an abbreviation for MPEG-3.
Why Do We Need The MP3 Codec?
The MP3 compression technology was devised because early Internet connections were unable to deliver audio files in a timely manner. A three-minute raw audio track extracted from a CD creates a 32MB file:
- Duration - 3 minutes (180 seconds).
- Sample rate - 44100 samples per second.
- Sample size - 16-bits.
- Number of tracks - 2 (Stereo - Left and Right).
- Resulting file size - 31.752 MB.
MP3 achieves a compression ratio of between 10:1 and 14:1 whilst still preserving almost CD quality. The encoded output file would be approximately 3MB, depending on the content.
Audio delivered on DVD discs for movies and TV programs is sampled at a higher rate of 48kHz. This creates slightly larger files.
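The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name is illustrative, and it counts raw sample data without any file header.

```python
def raw_pcm_bytes(seconds, sample_rate_hz, bits_per_sample, channels):
    """Size in bytes of uncompressed PCM audio (sample data only, no header)."""
    return seconds * sample_rate_hz * (bits_per_sample // 8) * channels


cd_bytes = raw_pcm_bytes(180, 44_100, 16, 2)    # three-minute CD track
dvd_bytes = raw_pcm_bytes(180, 48_000, 16, 2)   # same duration at 48 kHz

print(f"44.1 kHz source: {cd_bytes / 1e6:.3f} MB")
print(f"48 kHz source:   {dvd_bytes / 1e6:.3f} MB")

# Apply the quoted compression ratios to the CD-rate source.
for ratio in (10, 14):
    print(f"MP3 at {ratio}:1 is roughly {cd_bytes / ratio / 1e6:.2f} MB")
```

Both ends of the quoted compression range land close to the 3MB figure given above.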
Audio compression was originally developed for the MPEG-1 standard. It was then inherited into MPEG-2 and enhanced.
MPEG-1 and 2 happily coexist and are not mutually exclusive. MPEG-2 does not replace MPEG-1 but inherits capabilities from it and augments them with additional features.
How It Works
It is not necessary to fully understand the inner workings of MP3 to utilize it in your workflow. A brief overview here will suffice.
The MPEG Audio coding is organized into three layers which offer different levels of complexity and compression ratios.
Much of this work was done by Fraunhofer IIS collaborating with other international experts. Later, Fraunhofer developed a (non-canonical) variant which they called MPEG 2.5 Layer III. This supports additional bit rates but is not widely adopted.
The next generation AAC encoder refines these techniques to improve the quality and compression ratios.
MP3 players are expected to decode and play all MPEG content but may present an error message when encountering MPEG 2.5 Audio.
The original research was based on Psychoacoustic Analysis. This is the study of sound perception to compile statistics based on listener feedback about quality. A mathematical model is constructed from those statistical results.
The MP3 Perceptual Encoding driven by the statistical model removes components that the human ear would not perceive when they are masked by other louder components. This immediately makes MP3 a lossy codec. It reduces the complexity of the content to yield a better compression ratio. That discarded information cannot be restored by uncompressing the file.
Coding for Layers I and II splits the audio into discrete sub-bands which are then analysed individually, rather like a spectrum analyzer displaying a snapshot of the frequency response.
Layer III splits the audio into many more sub-bands and passes overlapping groups of them through a Modified Discrete Cosine Transform (MDCT) to apply a more fine-grained process. This trades complexity and computing load for better compression ratios but introduces more latency and artefacts which need to be removed by post-processing.
The MDCT yields a series of coefficients which gradually decay down to zero. A quantization/entropy cut-off truncates the coefficients at the point where they are all zero for lossless compression or earlier for lossy compression.
The energy level of each band is measured and compared to a masking threshold value. This is fed back into the Quantizer which moves the entropy coder cut-off point earlier in the series of coefficients. Louder passages are coded less accurately and can withstand a higher level of quantization. This is why MP3 coding is lossy.
The entropy cut-off point determines the amount of quantization noise introduced as an audible artefact. It is constrained within the limits determined by the psychoacoustic statistical analysis to hide the artefacts.
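The link between quantization and noise can be illustrated with a toy example. This sketch is not the MP3 algorithm: it uses a plain DCT rather than the overlapped MDCT, has no psychoacoustic model, and the `step` parameter is a hypothetical stand-in for the quantizer setting. It only shows that a coarser step leaves fewer non-zero coefficients to store and adds more error to the decoded signal.

```python
import math


def dct(x):
    """Orthonormal DCT-II (naive O(N^2) version, adequate for a demo)."""
    n = len(x)
    return [
        math.sqrt((1 if k == 0 else 2) / n)
        * sum(x[i] * math.cos(math.pi / n * (i + 0.5) * k) for i in range(n))
        for k in range(n)
    ]


def idct(c):
    """Inverse of dct() above."""
    n = len(c)
    return [
        sum(
            math.sqrt((1 if k == 0 else 2) / n)
            * c[k]
            * math.cos(math.pi / n * (i + 0.5) * k)
            for k in range(n)
        )
        for i in range(n)
    ]


def code_block(samples, step):
    """Quantize the DCT coefficients with the given step, then decode again.

    Returns (number of non-zero coefficients, RMS error of the decoded block).
    """
    levels = [round(c / step) for c in dct(samples)]
    decoded = idct([q * step for q in levels])
    kept = sum(1 for q in levels if q != 0)
    rms = math.sqrt(
        sum((a - b) ** 2 for a, b in zip(samples, decoded)) / len(samples)
    )
    return kept, rms


# Hypothetical test block: a loud tone plus a much quieter one.
N = 32
signal = [
    math.sin(2 * math.pi * 3 * i / N) + 0.05 * math.sin(2 * math.pi * 11 * i / N)
    for i in range(N)
]

for step in (0.01, 0.1, 0.5):
    kept, rms = code_block(signal, step)
    print(f"step={step}: {kept} non-zero coefficients, RMS error {rms:.4f}")
```

In a real encoder the step size is not fixed by hand. It is steered per band by the masking threshold so that the added noise stays below what a listener would notice.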
MP3 Joint-stereo processing looks for similarities between the audio channels. Differences depend on how the sound sources are panned across the soundscape during mix-down. Coding a single channel plus the differences compared with a second channel improves the compression ratio.
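The idea of coding one channel plus a difference signal can be sketched as a mid/side transform. The function names are illustrative, and the 1/√2 scaling is one common convention assumed here rather than taken from this article.

```python
import math

SQRT2 = math.sqrt(2.0)


def to_mid_side(left, right):
    """Convert left/right samples into a sum (mid) and difference (side) pair."""
    mid = [(l + r) / SQRT2 for l, r in zip(left, right)]
    side = [(l - r) / SQRT2 for l, r in zip(left, right)]
    return mid, side


def to_left_right(mid, side):
    """Invert to_mid_side(); the transform itself loses nothing."""
    left = [(m + s) / SQRT2 for m, s in zip(mid, side)]
    right = [(m - s) / SQRT2 for m, s in zip(mid, side)]
    return left, right


# A source panned dead centre appears identically in both channels,
# so the side signal is all zeros and costs almost nothing to code.
centre = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5]
mid, side = to_mid_side(centre, centre)
print(side)
```

The more the two channels resemble each other, the smaller the side signal becomes, which is where the improved compression ratio comes from.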
Audio Layers
Layers I, II and III were all defined during the MPEG-1 standardization. MPEG-2 extended them with additional lower bit rates and sample rates. The layer names use Roman numerals which may be confusing.
The term MP3 is an abbreviation of MPEG Audio layer III and describes MPEG-1 and MPEG-2 content.
The term MP2 describes MPEG Audio Layer II and is often incorrectly used to describe other kinds of MPEG-2 Audio.
Layer I - This is simpler than Layer II. The frame sizes are smaller which reduces coding delay (latency). It is useful for tele-conferencing applications and was designed for real-time encoding on early hardware systems. Layer I is now deemed to be obsolete.
Layer II - Layer II performs well with orchestral content and delivers results nearly as good as AAC. Players decode this with less computational effort than Layer III. It is more complex than Layer I but yields a better compression ratio.
Layer III - This is designed to operate at a lower bit rate than Layer II. It works quite differently with a much larger number of sub-bands which are processed in overlapping groups with a Modified Discrete Cosine Transform (MDCT) algorithm. Layer III does not handle transients quite as well as Layer II and needs additional pre-echo detection to increase the available bit rate during difficult passages. Additional post processing techniques are necessary to reduce artefacts which increases the computational workload.
For some content, Layer II performs better than Layer III even though it is less efficient.
These are the preferred file type extensions appropriate for the three layers:
File type | Content |
---|---|
.mp1 | MPEG Audio - Layer I. |
.mp2 | MPEG Audio - Layer II. |
.mp3 | MPEG Audio - Layer III. |
Supported Bit Rates
MPEG-2 adds bit rates in the lower range that MPEG-1 does not already support. These are useful for implementing compression for spoken-word content rather than music. Higher bit rates are undefined in MPEG-2 and are inherited from MPEG-1:
MPEG-1 Layer III | MPEG-2 Layer III |
---|---|
- | 8 kbps |
- | 16 kbps |
- | 24 kbps |
32 kbps | 32 kbps |
40 kbps | 40 kbps |
48 kbps | 48 kbps |
56 kbps | 56 kbps |
64 kbps | 64 kbps |
80 kbps | 80 kbps |
96 kbps | 96 kbps |
112 kbps | 112 kbps |
128 kbps | 128 kbps |
- | 144 kbps |
160 kbps | 160 kbps |
192 kbps | 192 kbps |
224 kbps | 224 kbps |
256 kbps | 256 kbps |
320 kbps | 320 kbps |
Supported Sample Rates
The lower sample rates added by MPEG-2 support more efficient speech encoding. The higher sample rates are undefined in MPEG-2 and are inherited from MPEG-1:
MPEG-1 Layer III | MPEG-2 Layer III |
---|---|
- | 16 kHz |
- | 22.05 kHz |
- | 24 kHz |
32 kHz | 32 kHz |
44.1 kHz | 44.1 kHz |
48 kHz | 48 kHz |
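A decoder discovers the bit rate and sample rate of each frame from index fields in its 4-byte header. The sketch below shows one way to read them for Layer III. The function name is illustrative and MPEG 2.5 is deliberately not handled. Note that the lookup tables are keyed on the version bits in the header, so a frame carrying one of the higher rates is flagged as MPEG-1.

```python
# Layer III lookup tables, indexed by the header's bit-rate and sample-rate fields.
# Index 0 of the bit-rate table means "free format", index 15 is invalid.
BITRATES_KBPS = {
    "MPEG-1": [None, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320],
    "MPEG-2": [None, 8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160],
}
SAMPLE_RATES_HZ = {
    "MPEG-1": [44100, 48000, 32000],
    "MPEG-2": [22050, 24000, 16000],
}
VERSIONS = {0b11: "MPEG-1", 0b10: "MPEG-2"}  # 0b00 (MPEG 2.5) not handled here


def parse_layer3_header(header: bytes) -> dict:
    """Decode version, bit rate and sample rate from a Layer III frame header."""
    if len(header) < 4:
        raise ValueError("need at least 4 bytes")
    word = int.from_bytes(header[:4], "big")
    if (word >> 21) & 0x7FF != 0x7FF:
        raise ValueError("frame sync pattern not found")
    version = VERSIONS.get((word >> 19) & 0b11)
    if version is None:
        raise ValueError("unsupported or reserved version")
    if (word >> 17) & 0b11 != 0b01:
        raise ValueError("not a Layer III frame")
    bitrate_index = (word >> 12) & 0xF
    rate_index = (word >> 10) & 0b11
    if bitrate_index == 15 or rate_index == 3:
        raise ValueError("invalid bit-rate or sample-rate index")
    return {
        "version": version,
        "bitrate_kbps": BITRATES_KBPS[version][bitrate_index],
        "sample_rate_hz": SAMPLE_RATES_HZ[version][rate_index],
    }


print(parse_layer3_header(bytes([0xFF, 0xFB, 0x90, 0x00])))
```

The example header bytes are a commonly seen pattern for a 128 kbps, 44.1 kHz MPEG-1 Layer III frame.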
Channel Encoding Modes
The MPEG-1 standard only supports 2 channels. These can be configured for mono or stereo applications.
- Mono - Only one single channel is required.
- Joint stereo – intensity encoded.
- Joint stereo – Mid/side encoded (Layer III only).
- Dual mono - Two uncorrelated mono channels.
MPEG-2 adds four more channels (six in all) to potentially carry 5.1 surround-sound content. Backwards compatible two-channel stereo uses a sub-set of 2 out of the 6 available channels in the same way as MPEG-1:
- Front - Left - Used when two channel stereo is encoded.
- Front - Right - Used when two channel stereo is encoded.
- Front - Centre.
- Low frequency.
- Surround (rear) - Left.
- Surround (rear) - Right.
File Name Extensions
MP3 streams can be packaged into any file type that carries binary data. For example, a .wav file could contain MP3 coded audio. Using inappropriate file types other than the default .mp3 is not recommended.
File ext | Disposition | Description |
---|---|---|
.mp1 | Obsolete | MPEG Audio Layer I encoding. |
.m1a | Obsolete | MPEG Audio Layer I encoding. Alternative but little used. |
.mp2 | Default | MPEG Audio Layer II encoding. |
.m2a | - | MPEG Audio Layer II encoding. Alternative but little used. |
.mpa | - | MPEG Audio Layer II encoding. |
.mp2a | - | MPEG Audio Layer II encoding. Rarely used. |
.mp3 | Preferred | MPEG Audio Layer III encoding. This is the optimal choice. |
.m3a | - | MPEG Audio Layer III encoding. Alternative but little used. |
.mpga | - | MPEG Audio Layer III encoding. Only used by MPEG-1. |
.aa | Proprietary | Special file extension used by Audio Book files. |
.mpe | - | Less popular abbreviation for the .mpeg file format. |
.mpg | - | MPEG content stored in a Program Stream format. Superseded by the .mp4 file type. |
.mpeg | - | Normally expected to carry video and audio but can carry audio only. Superseded by the .mp4 file type. |
.bit | - | Very old legacy archive collections may contain files created before 1995 which use the .bit file extension. Superseded by the .mp3 file type. |
.m3u | - | Playlist file originally intended for use by MP3 and WAV players. Now supports MP4, TS and WEBM files. |
MIME Types
When serving files with HTTP or uploading them from a browser, the MIME type header in the transaction will indicate the essence format.
The preferred MIME type for MP3 is audio/mpeg which is described in IETF RFC 3003. Inspect the embedded metadata to determine other characteristics of the encoded audio.
Other MIME types may be used by proprietary systems when they deliver content:
Mime type | Status | Description |
---|---|---|
audio/mpeg | Preferred | This is the preferred default MIME type defined in IETF RFC 3003. It can also identify MPEG Audio Layer II and other content. |
audio/mpeg3 | Deprecated | MPEG Audio Layer III files. |
audio/MPA | Obsoleted | MPEG Audio Layer I and II content. |
audio/mpa-robust | For RTP only | A More Loss-Tolerant RTP Payload Format for MP3 Audio. Described in IETF RFC 5219. |
audio/mp3 | Deprecated | Recognized by Google Chrome and Opera browsers. |
application/octet-stream | Ambiguous | Generic binary stream which would not invoke special MP3 handling and is not recommended as it could be carrying a rogue .exe file instead. |
audio/mpg | Ambiguous | Not recommended. Might also carry video content. |
audio/x-mpeg-3 | Proprietary | No longer used but should be recognized when handling incoming feeds. Not registered with IANA. |
audio/x-mpg | Proprietary | Not registered with IANA. |
audio/x-mpegaudio | Proprietary | Not registered with IANA. |
audio/x-mpeg | Proprietary | Not registered with IANA. |
audio/x-mp3 | Proprietary | Not registered with IANA. |
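An ingest process that receives feeds from many sources may want to fold the deprecated and proprietary variants into the preferred type. A minimal sketch, assuming the aliases from the table above; the function name is illustrative, and the ambiguous types (audio/mpg and application/octet-stream) are deliberately left unmapped so they can be inspected instead.

```python
PREFERRED_MP3_TYPE = "audio/mpeg"

# Deprecated or unregistered variants taken from the table above.
MP3_ALIASES = {
    "audio/mpeg3",
    "audio/mp3",
    "audio/x-mpeg-3",
    "audio/x-mpg",
    "audio/x-mpegaudio",
    "audio/x-mpeg",
    "audio/x-mp3",
}


def normalize_mp3_mime(content_type):
    """Return 'audio/mpeg' for a recognized MP3 type, otherwise None.

    Parameters after ';' are ignored and matching is case-insensitive.
    """
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == PREFERRED_MP3_TYPE or media_type in MP3_ALIASES:
        return PREFERRED_MP3_TYPE
    return None


print(normalize_mp3_mime("audio/x-mp3"))
print(normalize_mp3_mime("application/octet-stream"))
```

A `None` result signals that the file content needs checking before it is treated as MP3.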
ID3 Metadata Support
Workflow processes often need to access metadata about the files they are working on. MP3 files support embedded tagging using the ID3 convention. This is an informal standard but is widely used and can embed metadata into several kinds of files other than MP3:
- AIFF
- WAV
- MP4
- OGG
- FLAC
- APE
- MPC
- RealAudio
ID3 is not part of the MPEG standards and is managed independently. The informal specification is maintained at the ID3 website:
https://id3.org
Each ID3 tag is stored in one or more frames in the file. Encoded audio is also stored in frames which contain a synchronization pattern that decoders detect to access playable content. ID3 describes a way to ensure it never spuriously triggers that synchronization by avoiding that bit-pattern and thereby hides the metadata from the stream player. Client player apps can access the content in other ways to extract the ID3 metadata by looking specifically for its signature independently of the streaming process.
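One place where the bit-pattern avoidance is visible is the 10-byte header at the start of a version 2 tag. The tag size is stored as four bytes that each carry only seven bits, so no byte can be 0xFF. The sketch below looks for the "ID3" signature and decodes that size. The function names are illustrative, and the optional footer and extended header are ignored.

```python
def synchsafe_to_int(data: bytes) -> int:
    """Decode a size stored as 7 bits per byte (top bit always clear)."""
    value = 0
    for byte in data:
        if byte & 0x80:
            raise ValueError("top bit set: not a valid synchsafe byte")
        value = (value << 7) | byte
    return value


def read_id3v2_header(blob: bytes):
    """Return basic details of an ID3v2 tag at the start of blob, or None."""
    if len(blob) < 10 or blob[:3] != b"ID3":
        return None
    major, revision, flags = blob[3], blob[4], blob[5]
    return {
        "version": f"2.{major}.{revision}",
        "flags": flags,
        # Size of the tag body, excluding this 10-byte header.
        "tag_size": synchsafe_to_int(blob[6:10]),
    }


# Hypothetical header bytes: version 2.3.0, no flags, body of 257 bytes.
print(read_id3v2_header(b"ID3\x03\x00\x00\x00\x00\x02\x01"))
```

A player that does not understand ID3 never sees a false frame sync in this header. A metadata tool reads the size and jumps straight past the tag to the audio frames.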
ID3 tags were originally designed for annotating tracks imported from music CDs. Typical and obvious tags are:
- Song title.
- Artist name.
- Album name.
- Track number.
There are many other tags described in the ID3 specification and more have been added as proprietary and de-facto extensions.
ID3 Versions
Over several revisions, the ID3 metadata structures have evolved and there are several ways in which the metadata might be optionally embedded in the file. New versions must be backwards compatible and not break earlier implementations. ID3 metadata tags conforming to version 1 are always placed at the tail end of the file. Version 2 tags are placed at the front.
The version 2.4 specification allows the tag metadata to be placed at the end of the file. It must precede the version 1 metadata to avoid breaking older players. Version 2.3 is the most popular kind of tagging and places the metadata only at the front of the file.
Version | Disposition | Description |
---|---|---|
1 | Obsolete | Fixed format suffix appended to the end of the file. Carries the title, artist, album, and a short comment. These are all limited to 30 characters. A year number is added and a value representing a genre from an indexed list. |
1.1 | Obsolete | Track numbers added by shortening the comment field. |
1.2 | Obsolete | Text fields increased in length and a sub-genre field added. Backwards compatibility with earlier versions was maintained but this version was never widely adopted. |
2 | Obsolete | The format and structure is completely revised. It is constructed from multiple frames that can each grow to 16MB within a total capacity of 256MB. This metadata is now placed at the front of the file so it is immediately available when streaming the MP3 content. Unicode compatible text strings. |
2.2 | Obsolete | Tag identifiers limited to three characters. |
2.3 | Most popular | Added album sleeve artwork images and disc numbers for boxed sets. Tag names are four characters. |
2.3+ | Current | Chapter marks added with support for displaying synchronized slide show images. Very useful for podcasts. |
2.4 | Latest | Additional frame types and text frames can contain multiple NULL separated values. Tags can be stored at the start or end of the file. |
2.4+ | Latest | The same chapter mark support is added as per 2.3+. |
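The fixed layout of the version 1 and 1.1 suffix makes it easy to read. This sketch assumes the commonly documented 128-byte layout (a "TAG" marker followed by fixed-width fields) and Latin-1 text. The function name is illustrative.

```python
def parse_id3v1(blob: bytes):
    """Read an ID3v1 / v1.1 tag from the last 128 bytes of blob, or return None."""
    if len(blob) < 128:
        return None
    tag = blob[-128:]
    if tag[:3] != b"TAG":
        return None

    def text(raw: bytes) -> str:
        # Fields are padded with NULs (or sometimes spaces) to a fixed width.
        return raw.split(b"\x00", 1)[0].decode("latin-1").rstrip()

    comment = tag[97:127]
    track = None
    # Version 1.1: a zero byte followed by a non-zero byte at the end of
    # the comment field means the last byte holds the track number.
    if comment[28] == 0 and comment[29] != 0:
        track = comment[29]
        comment = comment[:28]

    return {
        "title": text(tag[3:33]),
        "artist": text(tag[33:63]),
        "album": text(tag[63:93]),
        "year": text(tag[93:97]),
        "comment": text(comment),
        "track": track,
        "genre_index": tag[127],
    }
```

Because the tag sits at the tail of the file, only the last 128 bytes need to be read, but the whole file must have arrived before they are available.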
ID3 Tag Names
From version 2.3 onwards, tag names have four letters instead of the three used in earlier versions. Tag translation is necessary when converting the metadata. Where the tags are localized for international use, an additional three-letter ISO 639 language code is added. A non-definitive list of these codes is also available on Wikipedia.
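Tag translation can be as simple as a lookup table. The sketch below covers a handful of common identifiers only. The function name is illustrative and the mapping is far from complete, so anything unrecognized is returned separately rather than guessed.

```python
# Partial mapping of three-letter (v2.2) to four-letter (v2.3) identifiers.
V22_TO_V23 = {
    "TT2": "TIT2",  # title
    "TP1": "TPE1",  # lead artist
    "TAL": "TALB",  # album
    "TRK": "TRCK",  # track number
    "TYE": "TYER",  # year
    "TCO": "TCON",  # genre (content type)
    "COM": "COMM",  # comment
    "PIC": "APIC",  # attached picture
}


def translate_frame_ids(frames: dict):
    """Rename v2.2 frame identifiers; return (translated, unmapped) dicts."""
    translated, unmapped = {}, {}
    for frame_id, value in frames.items():
        if frame_id in V22_TO_V23:
            translated[V22_TO_V23[frame_id]] = value
        else:
            unmapped[frame_id] = value
    return translated, unmapped


# Hypothetical tag contents; "XYZ" stands for any identifier not in the table.
old_tag = {"TT2": "Example Title", "TP1": "Example Artist", "XYZ": "?"}
print(translate_frame_ids(old_tag))
```

Renaming the identifier is not always enough on its own. Some frames, such as attached pictures, also differ in their internal layout between versions.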
Version 2.3 facilitates image embedding for various purposes such as album cover artworks. The tag describes how the image is to be used. PNG is the optimal image type but JPEG and GIF are also supported.
Some players and metadata browsing systems may have difficulty in rendering a PNG file if it has an alpha channel to cookie cut the image to a non-rectangular shape. You might do that to display an image of a scanned CD or vinyl album disc.
Here is an informal (third-party) description of the version 2.3 standard which enumerates the tags and describes how they all work. This supplements the id3.org documentation:
https://mutagen-specs.readthedocs.io/en/latest/id3/id3v2.3.0.html
Support For Lyrics & Subtitles
Lyric tags were defined before ID3v2 and are always placed at the end of the file. They must be located prior to the ID3v1 tag metadata if it is included. The disadvantage is that an entire file needs to be delivered before the lyrics are available.
Work around this in web streaming by delivering separate VTT text tracks along with the audio stream.
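A VTT text track is a plain text file of timed cues. As a rough sketch, timed lyric lines could be written out like this; the function names and the cue data are hypothetical.

```python
def vtt_timestamp(seconds: float) -> str:
    """Format a time in seconds as hh:mm:ss.ttt."""
    millis = round(seconds * 1000)
    hours, rem = divmod(millis, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def lyrics_to_vtt(cues) -> str:
    """Build a WebVTT document from (start_seconds, end_seconds, text) tuples."""
    lines = ["WEBVTT", ""]
    for start, end, text in cues:
        lines.append(f"{vtt_timestamp(start)} --> {vtt_timestamp(end)}")
        lines.append(text)
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)


# Hypothetical lyric timings.
print(lyrics_to_vtt([(0, 2.5, "First line"), (2.5, 6, "Second line")]))
```

The resulting file is served alongside the audio so the player can display each line as playback reaches its start time.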
Useful Tools
In an automated workflow context, the metadata can be extracted with command-line tools wrapped inside shell scripts that are called to action by the scheduler. There are several free and open-source tools which are highly recommended:
Tool | Description |
---|---|
ExifTool | Although ID3 tags are not strictly EXIF metadata, they serve a similar purpose. ExifTool is written in Perl which is easily supported on any OS platform. This is an astonishingly complete metadata extraction tool for over a hundred different metadata schemas. It also documents all the ID3 tags that it supports. Download it here: https://exiftool.org/ |
ffmpeg | This is an open-source tool for processing and converting video and audio. It has ID3 and MP3 support built in. Install a downloadable binary or acquire the open-source code and compile it directly for your OS platform. MP3 encoding support is provided by the LAME library. Download it here: https://www.ffmpeg.org/ |
LAME | This is a library for building MP3 applications. It is supplied with a command line tool and is used by ffmpeg. LAME is considered to be the best available MP3 encoder for moderately high bit rates. Acquire the open-source code and compile it directly for your OS platform. Download it here: https://lame.sourceforge.io/ |
Use these tools to write new or modified metadata tags back into a file based on the workflow processing results.
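As an illustration of wrapping one of these tools, the sketch below calls ExifTool with its `-json` option and parses the result. It assumes `exiftool` is installed and on the PATH. The function names are illustrative, and the tag names in the output depend on the file and the ExifTool version.

```python
import json
import shutil
import subprocess


def exiftool_command(path: str) -> list:
    """Command line asking ExifTool for JSON output on one file."""
    return ["exiftool", "-json", path]


def parse_exiftool_json(output: str) -> dict:
    """ExifTool's JSON output is a list with one object per input file."""
    records = json.loads(output)
    return records[0] if records else {}


def read_tags(path: str) -> dict:
    """Run ExifTool on path and return its metadata as a dictionary."""
    if shutil.which("exiftool") is None:
        raise RuntimeError("exiftool was not found on the PATH")
    result = subprocess.run(
        exiftool_command(path), capture_output=True, text=True, check=True
    )
    return parse_exiftool_json(result.stdout)
```

A scheduler-driven script could call `read_tags()` for each incoming file and route it based on the values returned.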
Related Standards
These are the relevant standards that you should acquire to support your use of MP3:
Standard | Version | Description |
---|---|---|
Unicode | 15.1.0 | A unified character set that brings together all previous glyph code sets. |
ISO 639 | 2023 | Language names and codes. |
ISO 8859 | Various | There are 16 different parts describing alphabets and character-glyphs for international use. |
ISO 11172-1 | 1999 | MPEG-1 Part 1 - Systems. Describes packaging and stream structures. The latest version is dated 1993 with additional corrigenda published in 1996 and 1999. |
ISO 11172-3 | 1996 | MPEG-1 Part 3 - Audio is the foundation on which the earliest MPEG audio coding is built. The latest version is dated 1993 with a corrigendum published in 1996. |
ISO 11172-5 | 1998 | MPEG-1 Software simulation for reference. |
ISO 13818-1 | 2023 | MPEG-2 Systems. Describes packaging and stream structures. An amendment to codec parameters is in progress. |
ISO 13818-3 | 1998 | MPEG-2 Audio. This is definitive for Layers I, II and III (MP1, MP2 and MP3). |
ISO 13818-5 | 2005 | MPEG-2 Software simulation for reference. |
RFC 3003 | 2000 | The audio/mpeg Media MIME Type. |
RFC 3555 | 2003 | MIME Type Registration of RTP Payload Formats. |
RFC 3625 | 2003 | The QCP File Format and Media Types for Speech Data. |
RFC 4855 | 2007 | Media Type Registration of RTP Payload Formats. |
RFC 4856 | 2007 | Media Type Registration of Payload Formats in the RTP Profile for Audio and Video Conferences. Obsoletes RFC 3555. |
RFC 5219 | 2008 | A More Loss-Tolerant RTP Payload Format for MP3 Audio. Obsoletes RFC 3119. |
MUSICAM | - | MPEG Audio Layer II is sometimes described as MUSICAM which is a proprietary brand name and was initially used during the standards development process. |
Conclusion
MPEG-2 does not replace MPEG-1 but augments it with additional features.
The MP3 audio file format is widely used and supported on most platforms and devices. Content creation is easy and metadata editing is a feature of many compatible applications.
The ID3 metadata version 2.3 is an informal standard and widely supported for MP3 and many other media file types.
The patents relevant to MP3 all expired in 2017 and this facilitates the use of open-source libraries such as LAME without risk of litigation.