Standards: Part 16 - About MP3 Audio Coding & ID3 Metadata

The MP3 audio format has been around for thirty years and has been superseded by several other codecs – so here we discuss why it still has a very strong position in broadcast. We also discuss ID3 metadata tags which often accompany MP3 files.


This article is part of our growing series on Standards.
There is an overview of all 26 articles in Part 1 - An Introduction To Standards.


The MP3 format is widely supported on almost every computing platform, mobile device, TV receiver and tablet. It is ideal for distribution to consumers even though there are higher quality alternatives such as AAC and MPEG-H.

MP3 is a mathematically lossy codec designed to be perceptually transparent at adequate bit rates, so it is never an optimal solution for mastering, editing and archiving purposes, where repeated decoding and re-encoding accumulates losses. It is optimized for distributing content to end-users.


MP3 is NOT an abbreviation for MPEG-3.


Why Do We Need The MP3 Codec?

The MP3 compression technology was devised because early Internet connections were unable to deliver audio files in a timely manner. A three-minute raw audio track extracted from a CD creates a 32MB file:

  • Duration - 3 minutes (180 seconds).
  • Sample rate - 44100 samples per second.
  • Sample size - 16-bits.
  • Number of channels - 2 (Stereo - Left and Right).
  • Resulting file size - 31.752 MB.

MP3 achieves a compression ratio of between 10:1 and 14:1 whilst still preserving almost CD quality. The encoded output file would be approximately 3MB, depending on the content and the chosen bit rate.
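The file sizes quoted above follow from simple arithmetic. This Python sketch assumes constant bit rate encoding and ignores the ID3 tags and frame padding that make real files slightly larger.

```python
def raw_pcm_bytes(seconds: int, sample_rate: int, bits_per_sample: int, channels: int) -> int:
    """Size of uncompressed linear PCM audio in bytes, excluding any file header."""
    return seconds * sample_rate * (bits_per_sample // 8) * channels


def cbr_bytes(seconds: int, bit_rate_kbps: int) -> int:
    """Approximate size of a constant bit rate stream (no tags, no padding)."""
    return seconds * bit_rate_kbps * 1000 // 8


# The CD example from the text: 3 minutes, 44.1 kHz, 16-bit samples, stereo.
raw = raw_pcm_bytes(180, 44100, 16, 2)
print(raw)  # 31752000 bytes, i.e. 31.752 MB

for kbps in (112, 128):
    encoded = cbr_bytes(180, kbps)
    print(f"{kbps} kbps: {encoded} bytes, ratio {raw / encoded:.1f}:1")
```

At 128 kbps the ratio works out at roughly 11:1, inside the 10:1 to 14:1 range quoted above.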


Audio delivered on DVD discs for movies and TV programs is sampled at a higher rate of 48kHz. This creates slightly larger files.


Audio compression was originally developed for the MPEG-1 standard. It was then inherited into MPEG-2 and enhanced.

MPEG-1 and 2 happily coexist and are not mutually exclusive. MPEG-2 does not replace MPEG-1 but inherits capabilities from it and augments them with additional features.

How It Works

It is not necessary to fully understand the inner workings of MP3 to utilize it in your workflow. A brief overview here will suffice.

The MPEG Audio coding is organized into three layers which offer different levels of complexity and compression ratios.

Much of this work was done by Fraunhofer IIS collaborating with other international experts. Later, Fraunhofer developed an unofficial (non-canonical) extension which they called MPEG 2.5 Layer III. This supports additional, lower sample rates (8, 11.025 and 12 kHz) for very low bit rate applications but is not widely adopted.

The next generation AAC encoder refines these techniques to improve the quality and compression ratios.

MP3 players are expected to decode and play all MPEG-1 and MPEG-2 Layer III content but may present an error message when encountering MPEG 2.5 Audio.

The original research was based on Psychoacoustic Analysis. This is the study of sound perception to compile statistics based on listener feedback about quality. A mathematical model is constructed from those statistical results.

The MP3 Perceptual Encoding driven by the statistical model removes components that the human ear would not perceive when they are masked by other louder components. This immediately makes MP3 a lossy codec. It reduces the complexity of the content to yield a better compression ratio. That discarded information cannot be restored by uncompressing the file.

Coding for Layers I and II splits the audio into 32 equal-width sub-bands which it then analyses individually, rather like a spectrum analyzer displaying a snapshot of the frequency response.

Layer III passes the output of those sub-bands through a Modified Discrete Cosine Transform (MDCT), which works on overlapping blocks of samples and splits the audio into many more frequency lines (576 per granule for long blocks). This finer-grained process trades complexity and computing load for better compression ratios, but it introduces more latency and artefacts which need to be reduced by post-processing.
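To make the MDCT step more concrete, here is a deliberately naive Python sketch of the transform itself. It assumes the Layer III long-block case, where 36 windowed samples from one sub-band produce 18 coefficients (18 lines for each of the 32 sub-bands gives 576), and uses the sine-shaped normal window. The polyphase filterbank, block switching to short windows and alias reduction are all omitted, and real codecs use fast algorithms rather than this O(N²) loop.

```python
import math


def mdct(block: list[float]) -> list[float]:
    """Naive MDCT: 2N input samples -> N coefficients.

    X[k] = sum over n of x[n] * cos(pi / N * (n + 0.5 + N / 2) * (k + 0.5))
    """
    if len(block) % 2:
        raise ValueError("block length must be even")
    n_out = len(block) // 2
    return [
        sum(
            x * math.cos(math.pi / n_out * (n + 0.5 + n_out / 2) * (k + 0.5))
            for n, x in enumerate(block)
        )
        for k in range(n_out)
    ]


N = 18  # Layer III long block: 36 inputs -> 18 frequency lines per sub-band
window = [math.sin(math.pi / (2 * N) * (n + 0.5)) for n in range(2 * N)]
# Hypothetical sub-band signal: a slow cosine standing in for real filterbank output.
signal = [math.cos(2 * math.pi * 3 * n / (2 * N)) for n in range(2 * N)]
coeffs = mdct([w * s for w, s in zip(window, signal)])
print(len(coeffs))  # 18
```

Consecutive blocks overlap by half (18 samples), which is what lets the inverse transform cancel the time-domain aliasing that a single block would leave behind.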

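The MDCT yields a series of coefficients which, for typical content, decay towards zero at higher frequencies. After quantization, the trailing run of zero-valued coefficients does not need to be coded at all, so the series is effectively truncated at the last non-zero value. Coarser quantization pushes that cut-off point earlier and improves the compression ratio, at the cost of accuracy.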
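The cut-off idea can be sketched in Python. MP3 does quantize coefficient magnitudes after raising them to the power 3/4 (decoders restore them with the power 4/3), but this sketch assumes a single fixed step size and plain rounding; the real encoder derives step sizes from the global gain and scale factors, uses a tuned rounding offset and then Huffman codes the result.

```python
def quantize(coeffs: list[float], step: float) -> list[int]:
    """Power-law quantizer in the style of MP3 (simplified: fixed step, plain rounding)."""
    out = []
    for c in coeffs:
        q = round((abs(c) / step) ** 0.75)  # compress magnitude, keep the sign separately
        out.append(-q if c < 0 else q)
    return out


def coded_length(quantized: list[int]) -> int:
    """Number of values that must be coded: everything after the last
    non-zero value is an implicit run of zeros and is not transmitted."""
    for i in range(len(quantized) - 1, -1, -1):
        if quantized[i] != 0:
            return i + 1
    return 0


# Hypothetical coefficients that decay towards high frequencies.
coeffs = [40.0, -22.0, 9.0, 4.0, 1.5, 0.6, 0.2, 0.05]
for step in (0.01, 0.5, 4.0):
    q = quantize(coeffs, step)
    print(step, coded_length(q), q)
```

With a coarse step of 4.0 the last four coefficients quantize to zero and only the first four need to be coded; with a finer step more of the series survives.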

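The energy level of each band is measured and compared to a masking threshold value derived from the psychoacoustic model. The result is fed back into the quantizer: bands whose content falls below the threshold can be quantized to zero, which moves the entropy coder cut-off point earlier in the series of coefficients. Where a loud component raises the masking threshold, nearby components can be coded less accurately because the extra quantization noise stays inaudible. This is why MP3 coding is lossy.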
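The comparison between band energy and masking threshold can be illustrated as a signal-to-mask ratio (SMR) per band. The band edges, coefficients and thresholds below are made-up numbers; in a real encoder the thresholds come from the psychoacoustic model and the bands follow the scale factor band tables in the standard.

```python
import math


def band_energies(coeffs: list[float], band_edges: list[int]) -> list[float]:
    """Energy (sum of squares) of the coefficients in each band [edges[i], edges[i+1])."""
    return [sum(c * c for c in coeffs[a:b]) for a, b in zip(band_edges, band_edges[1:])]


def signal_to_mask_db(energies: list[float], thresholds: list[float]) -> list[float]:
    """Signal-to-mask ratio per band in dB; <= 0 dB means the band is fully masked."""
    return [
        10 * math.log10(e / t) if e > 0 else float("-inf")
        for e, t in zip(energies, thresholds)
    ]


def bands_to_code(smr_db: list[float]) -> list[int]:
    """Indices of bands that rise above their masking threshold and need bits."""
    return [i for i, smr in enumerate(smr_db) if smr > 0]


# Hypothetical coefficients, band edges and masking thresholds for illustration only.
coeffs = [30.0, 25.0, 8.0, 6.0, 0.5, 0.4, 0.1, 0.0]
edges = [0, 2, 4, 6, 8]
thresholds = [10.0, 20.0, 1.0, 0.5]
smr = signal_to_mask_db(band_energies(coeffs, edges), thresholds)
print(bands_to_code(smr))  # [0, 1]
```

The two upper bands fall below their thresholds, so an encoder could quantize them to zero without an audible difference.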

The entropy cut-off point determines the amount of quantization noise introduced as an audible artefact. It is constrained within the limits determined by the psychoacoustic statistical analysis to hide the artefacts.

MP3 joint stereo processing looks for similarities between the audio channels. How similar they are depends on how the sound sources were panned across the soundscape during mix-down. Coding a sum (mid) channel plus a difference (side) channel, rather than coding left and right independently, improves the compression ratio because the difference signal is often small.
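Mid/side coding in Layer III transmits M = (L + R)/√2 and S = (L - R)/√2, and the decoder reverses this. The Python sketch below shows the round trip on plain sample lists; it is a simplification, since a real encoder applies the transform to the MDCT coefficients and decides frame by frame whether mid/side coding helps.

```python
import math

SQRT2 = math.sqrt(2.0)


def ms_encode(left: list[float], right: list[float]) -> tuple[list[float], list[float]]:
    """Convert left/right values to mid (sum) and side (difference) signals."""
    mid = [(lv + rv) / SQRT2 for lv, rv in zip(left, right)]
    side = [(lv - rv) / SQRT2 for lv, rv in zip(left, right)]
    return mid, side


def ms_decode(mid: list[float], side: list[float]) -> tuple[list[float], list[float]]:
    """Recover left/right values from mid and side signals."""
    left = [(m + s) / SQRT2 for m, s in zip(mid, side)]
    right = [(m - s) / SQRT2 for m, s in zip(mid, side)]
    return left, right


# A centre-panned source is identical in both channels, so the side signal is silent.
left = [0.5, -0.25, 0.75]
right = [0.5, -0.25, 0.75]
mid, side = ms_encode(left, right)
print(side)  # [0.0, 0.0, 0.0]
```

A silent or near-silent side channel quantizes to almost nothing, which is where the compression gain comes from.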

Audio Layers

Layers I, II and III were all defined during the MPEG-1 standardization. MPEG-2 extended them with lower sample rates and bit rates and with multichannel support. The layer names use Roman numerals which may be confusing.

The term MP3 is an abbreviation of MPEG Audio Layer III and describes both MPEG-1 and MPEG-2 content.


The term MP2 describes MPEG Audio Layer II and is often incorrectly used to describe other kinds of MPEG-2 Audio.


Layer I - This is simpler than Layer II. The frame sizes are smaller which reduces coding delay (latency). It is useful for tele-conferencing applications and was designed for real-time encoding on early hardware systems. Layer I is now deemed to be obsolete.

Layer II - Layer II performs well with orchestral content and delivers results nearly as good as AAC. Players decode this with less computational effort than Layer III. It is more complex than Layer I but yields a better compression ratio.

Layer III - This is designed to operate at a lower bit rate than Layer II. It works quite differently with a much larger number of sub-bands which are processed in overlapping groups with a Modified Discrete Cosine Transform (MDCT) algorithm. Layer III does not handle transients quite as well as Layer II and needs additional pre-echo detection to increase the available bit rate during difficult passages. Additional post processing techniques are necessary to reduce artefacts which increases the computational workload.


For some content, Layer II performs better than Layer III even though it is less efficient.
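The latency difference between the layers follows from the number of samples in each frame: 384 for Layer I, 1152 for Layers II and III, and 576 for Layer III at the MPEG-2 low sample rates. This sketch computes frame duration and frame length using the commonly quoted formulas (the length in bytes includes the 4-byte frame header); free-format streams are not handled.

```python
def samples_per_frame(layer: int, mpeg1: bool) -> int:
    """PCM samples per channel carried by one frame."""
    if layer == 1:
        return 384
    if layer == 2:
        return 1152
    return 1152 if mpeg1 else 576  # Layer III at the MPEG-2 low sample rates


def frame_length_bytes(layer: int, mpeg1: bool, bit_rate_bps: int,
                       sample_rate: int, padding: bool = False) -> int:
    """Length of one frame in bytes, including the 4-byte header."""
    if layer == 1:
        # Layer I is counted in 4-byte slots: 384 samples / 8 bits / 4 bytes = 12.
        return (12 * bit_rate_bps // sample_rate + int(padding)) * 4
    coefficient = samples_per_frame(layer, mpeg1) // 8  # 144, or 72 for MPEG-2 Layer III
    return coefficient * bit_rate_bps // sample_rate + int(padding)


def frame_duration_ms(layer: int, mpeg1: bool, sample_rate: int) -> float:
    return 1000 * samples_per_frame(layer, mpeg1) / sample_rate


print(round(frame_duration_ms(1, True, 48000), 1))  # 8.0
print(round(frame_duration_ms(3, True, 44100), 1))  # 26.1
print(frame_length_bytes(3, True, 128000, 44100))   # 417
```

The padding bit adds one byte (or one slot in Layer I) to some frames so that the average bit rate comes out exactly right when the frame length is not a whole number of bytes.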


These are the preferred file type extensions appropriate for the three layers:

File type Content
.mp1 MPEG Audio - Layer I.
.mp2 MPEG Audio - Layer II.
.mp3 MPEG Audio - Layer III.

 

Supported Bit Rates

MPEG-2 adds bit rates in the lower range that MPEG-1 does not already support. These are useful for implementing compression for spoken-word content rather than music. Higher bit rates are undefined in MPEG-2 and are inherited from MPEG-1:

MPEG-1 Layer III MPEG-2 Layer III
n/a 8 kbps
n/a 16 kbps
n/a 24 kbps
32 kbps 32 kbps
40 kbps 40 kbps
48 kbps 48 kbps
56 kbps 56 kbps
64 kbps 64 kbps
80 kbps 80 kbps
96 kbps 96 kbps
112 kbps 112 kbps
128 kbps 128 kbps
n/a 144 kbps
160 kbps 160 kbps
192 kbps 192 kbps
224 kbps 224 kbps
256 kbps 256 kbps
320 kbps 320 kbps

 

Supported Sample Rates

The lower sample rates added by MPEG-2 support more efficient speech encoding. The higher sample rates are undefined in MPEG-2 and are inherited from MPEG-1:

MPEG-1 Layer III MPEG-2 Layer III
n/a 16 kHz
n/a 22.05 kHz
n/a 24 kHz
32 kHz 32 kHz
44.1 kHz 44.1 kHz
48 kHz 48 kHz

 

Channel Encoding Modes

The MPEG-1 standard supports at most 2 channels. These can be configured for mono or stereo applications:

  • Mono - A single channel.
  • Stereo - Two channels coded independently.
  • Joint stereo – intensity encoded.
  • Joint stereo – Mid/side encoded (Layer III only).
  • Dual mono - Two uncorrelated mono channels.

MPEG-2 adds four more channels (six in all) to potentially carry 5.1 surround-sound content. Backwards compatible two-channel stereo uses a sub-set of 2 out of the 6 available channels in the same way as MPEG-1:

  • Front - Left - Used when two channel stereo is encoded.
  • Front - Right - Used when two channel stereo is encoded.
  • Front - Centre.
  • Low frequency.
  • Surround (rear) - Left.
  • Surround (rear) - Right.
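
The bit rates, sample rates and channel modes described above are signalled in the 4-byte header at the start of every audio frame. This Python sketch decodes a Layer III header, based on the header layout defined in ISO 11172-3 and 13818-3; MPEG-2.5 is included only because many decoders accept it, and error handling is minimal.

```python
# Layer III bit rate tables indexed by the 4-bit header field.
# Index 0 means "free format" and 15 is invalid; both are rejected here.
BITRATES_KBPS = {
    "MPEG-1": [None, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, None],
    "MPEG-2": [None, 8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160, None],
}
# 2-bit version field: 3 = MPEG-1, 2 = MPEG-2, 0 = MPEG-2.5 (unofficial), 1 = reserved.
VERSIONS = {3: "MPEG-1", 2: "MPEG-2", 0: "MPEG-2.5"}
SAMPLE_RATES_HZ = {
    "MPEG-1": [44100, 48000, 32000],
    "MPEG-2": [22050, 24000, 16000],
    "MPEG-2.5": [11025, 12000, 8000],
}
CHANNEL_MODES = ["stereo", "joint stereo", "dual channel", "mono"]


def parse_layer3_header(header: bytes) -> dict:
    """Decode the 4-byte header at the start of an MPEG Layer III audio frame."""
    if len(header) < 4:
        raise ValueError("need at least 4 bytes")
    h = int.from_bytes(header[:4], "big")
    if ((h >> 21) & 0x7FF) != 0x7FF:  # 11-bit frame sync, all ones
        raise ValueError("no frame sync")
    version = VERSIONS.get((h >> 19) & 0x3)
    if version is None:
        raise ValueError("reserved version")
    if ((h >> 17) & 0x3) != 0b01:  # 01 = Layer III, 10 = Layer II, 11 = Layer I
        raise ValueError("not Layer III")
    sr_index = (h >> 10) & 0x3
    if sr_index == 3:
        raise ValueError("reserved sample rate")
    # MPEG-2.5 reuses the MPEG-2 low bit rate table.
    table = BITRATES_KBPS["MPEG-1" if version == "MPEG-1" else "MPEG-2"]
    bit_rate = table[(h >> 12) & 0xF]
    if bit_rate is None:
        raise ValueError("free-format or invalid bit rate")
    return {
        "version": version,
        "bit_rate_kbps": bit_rate,
        "sample_rate_hz": SAMPLE_RATES_HZ[version][sr_index],
        "crc_protected": ((h >> 16) & 1) == 0,  # protection bit 0 means a CRC follows
        "padding": bool((h >> 9) & 1),
        "channel_mode": CHANNEL_MODES[(h >> 6) & 0x3],
    }


print(parse_layer3_header(bytes([0xFF, 0xFB, 0x90, 0x64])))
# MPEG-1, 128 kbps, 44100 Hz, joint stereo, no CRC, no padding
```

The 5.1 surround channels added by MPEG-2 are carried in an extension to the frame data rather than in this header, so they are not visible here.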

File Name Extensions

MP3 streams can be packaged into any file type that carries binary data. For example, a .wav file could contain MP3 coded audio. Using file types other than the default .mp3 is not recommended.
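Because the extension cannot be trusted, ingest workflows often check the first few bytes instead. This is a minimal sketch with hypothetical category names; a single frame-sync pattern can appear by chance, so a robust check would also confirm that a second valid header follows one frame length later.

```python
def sniff_audio(data: bytes) -> str:
    """Classify a file by its leading bytes instead of trusting the extension."""
    if data[:3] == b"ID3":
        return "id3v2"       # ID3v2 tag; MPEG audio frames normally follow it
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"         # RIFF/WAVE container; the fmt chunk names the codec
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
        return "mpeg-audio"  # 11-bit frame sync at the start of the data
    return "unknown"


with_tag = b"ID3\x03\x00\x00\x00\x00\x00\x00"
print(sniff_audio(with_tag))                        # id3v2
print(sniff_audio(bytes([0xFF, 0xFB, 0x90, 0x64])))  # mpeg-audio
```

A .wav file carrying MP3 audio would be reported as "wav" here; the codec then has to be read from the format chunk inside the container.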

File ext Disposition Description
.mp1 Obsolete MPEG Audio Layer I encoding.
.m1a Obsolete MPEG Audio Layer I encoding. Alternative but little used.
.mp2 Default MPEG Audio Layer II encoding.
.m2a   MPEG Audio Layer II encoding. Alternative but little used.
.mpa   MPEG Audio Layer II encoding.
.mp2a   MPEG Audio Layer II encoding. Rarely used.
.mp3 Preferred MPEG Audio Layer III encoding. This is the optimal choice.
.m3a   MPEG Audio Layer III encoding. Alternative but little used.
.mpga   MPEG Audio Layer III encoding. Only used by MPEG-1.
.aa Proprietary Special file extension used by Audio Book files.
.mpe   Less popular abbreviation for the .mpeg file format.
.mpg   MPEG content stored in a Program Stream format. Superseded by the .mp4 file type.
.mpeg   Normally expected to carry video and audio but can carry audio only. Superseded by the .mp4 file type.
.bit   Very old legacy archive collections may contain files created before 1995 which use the .bit file extension. Superseded by the .mp3 file type.
.m3u   Playlist file originally intended for use by MP3 and WAV players. Now supports MP4, TS and WEBM files.

 

MIME Types

When serving files over HTTP or uploading them from a browser, the Content-Type header in the transaction carries the MIME type that indicates the essence format.

The preferred MIME type for MP3 is audio/mpeg which is described in IETF RFC 3003. Inspect the embedded metadata to determine other characteristics of the encoded audio.

Other MIME types may be used by proprietary systems when they deliver content:

Mime type Status Description
audio/mpeg Preferred This is the preferred default MIME type defined in IETF RFC 3003. It can also identify MPEG Audio Layer II and other content.
audio/mpeg3 Deprecated MPEG Audio Layer III files.
audio/MPA Obsoleted MPEG Audio Layer I and II content.
audio/mpa-robust For RTP only A More Loss-Tolerant RTP Payload Format for MP3 Audio. Described in IETF RFC 5219.
audio/mp3 Deprecated Recognized by Google Chrome and Opera browsers.
application/octet-stream Ambiguous Generic binary stream which would not invoke special MP3 handling and is not recommended as it could be carrying a rogue .exe file instead.
audio/mpg Ambiguous Not recommended. Might also carry video content.
audio/x-mpeg-3 Proprietary No longer used but should be recognized when handling incoming feeds. Not registered with IANA.
audio/x-mpg Proprietary Not registered with IANA.
audio/x-mpegaudio Proprietary Not registered with IANA.
audio/x-mpeg Proprietary Not registered with IANA.
audio/x-mp3 Proprietary Not registered with IANA.

 

ID3 Metadata Support

Workflow processes often need to access metadata about the files they are working on. MP3 files support embedded tagging using the ID3 convention. This is an informal standard but is widely used and can embed metadata into several kinds of files other than MP3:

  • AIFF
  • WAV
  • MP4
  • OGG
  • FLAC
  • APE
  • MPC
  • RealAudio

ID3 is not part of the MPEG standards and is managed independently. The informal specification is maintained at the ID3 website:

https://id3.org

An ID3 version 2 tag is stored as a header followed by one or more frames, each holding one item of metadata. Encoded audio is also stored in frames, and each audio frame begins with a synchronization pattern that decoders detect to find playable content. ID3 avoids spuriously triggering that synchronization: the tag header stores its size as "syncsafe" integers (7 bits per byte), and an optional "unsynchronisation" scheme inserts a zero byte wherever the tag data would otherwise contain a false sync pattern. This hides the metadata from the stream decoder. Client player apps extract the ID3 metadata separately by looking specifically for its "ID3" signature, independently of the decoding process.
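The two sync-avoidance mechanisms can be sketched in Python, along with reading the 10-byte tag header ("ID3", two version bytes, a flags byte and a 4-byte syncsafe size). This is based on the published ID3v2.3 and 2.4 descriptions but simplified: extended headers, per-frame unsynchronisation in version 2.4 and footers are not handled.

```python
from typing import Optional


def syncsafe_decode(b: bytes) -> int:
    """Decode a 4-byte syncsafe integer: 7 useful bits per byte, top bit always 0."""
    if len(b) != 4 or any(x & 0x80 for x in b):
        raise ValueError("not a syncsafe integer")
    return (b[0] << 21) | (b[1] << 14) | (b[2] << 7) | b[3]


def syncsafe_encode(n: int) -> bytes:
    if not 0 <= n < 1 << 28:
        raise ValueError("out of range for 28 bits")
    return bytes([(n >> 21) & 0x7F, (n >> 14) & 0x7F, (n >> 7) & 0x7F, n & 0x7F])


def unsynchronise(data: bytes) -> bytes:
    """Insert 0x00 after any 0xFF that precedes 0x00 or a byte >= 0xE0.
    A trailing 0xFF is also followed by 0x00 here, so it cannot combine
    with whatever follows the tag."""
    out = bytearray()
    for i, b in enumerate(data):
        out.append(b)
        if b == 0xFF:
            nxt = data[i + 1] if i + 1 < len(data) else None
            if nxt is None or nxt == 0x00 or nxt >= 0xE0:
                out.append(0x00)
    return bytes(out)


def resynchronise(data: bytes) -> bytes:
    """Reverse unsynchronisation by dropping every 0x00 that follows 0xFF."""
    out = bytearray()
    prev = None
    for b in data:
        if not (prev == 0xFF and b == 0x00):
            out.append(b)
        prev = b
    return bytes(out)


def parse_id3v2_header(data: bytes) -> Optional[dict]:
    """Read the 10-byte ID3v2 header at the start of a file, if present."""
    if len(data) < 10 or data[:3] != b"ID3":
        return None
    major, revision, flags = data[3], data[4], data[5]
    return {
        "version": f"2.{major}.{revision}",
        "unsynchronised": bool(flags & 0x80),
        # The stored size excludes the 10-byte header (and a v2.4 footer, if flagged).
        "tag_bytes": 10 + syncsafe_decode(data[6:10]),
    }
```

Knowing the total tag length lets a player skip straight to the first audio frame.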

ID3 tags were originally designed for annotating tracks imported from music CDs. Typical and obvious tags are:

  • Song title.
  • Artist name.
  • Album name.
  • Track number.

There are many other tags described in the ID3 specification and more have been added as proprietary and de facto extensions.

ID3 Versions

Over several revisions, the ID3 metadata structures have evolved and there are several ways in which the metadata might be optionally embedded in the file. New versions must be backwards compatible and not break earlier implementations. ID3 metadata tags conforming to version 1 are always placed at the tail end of the file. Version 2 tags are placed at the front.

The version 2.4 specification allows the tag metadata to be placed at the end of the file. It must precede the version 1 metadata to avoid breaking older players. Version 2.3 is the most popular kind of tagging and places the metadata only at the front of the file.

Version Disposition Description
1 Obsolete Fixed format suffix appended to the end of the file. Carries the title, artist, album, and a short comment. These are all limited to 30 characters. A year number is added and a value representing a genre from an indexed list.
1.1 Obsolete Track numbers added by shortening the comment field.
1.2 Obsolete Text fields increased in length and a sub-genre field added. Backwards compatibility with earlier versions was maintained but this version was never widely adopted.
2 Obsolete The format and structure are completely revised. It is constructed from multiple frames that can each grow to 16MB within a total capacity of 256MB. This metadata is now placed at the front of the file so it is immediately available when streaming the MP3 content. Text strings can be Unicode encoded.
2.2 Obsolete Tag identifiers limited to three characters.
2.3 Most popular Tag names are four characters. Supports album sleeve artwork images (APIC frames) and disc numbers for boxed sets (TPOS frames).
2.3+ Current Chapter marks added with support for displaying synchronized slide show images. Very useful for podcasts.
2.4 Latest Additional frame types and text frames can contain multiple NULL separated values. Tags can be stored at the start or end of the file.
2.4+ Latest The same chapter mark support is added as per 2.3+.
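
Version 1 and 1.1 tags occupy the last 128 bytes of the file: "TAG", 30-byte title, artist and album fields, a 4-byte year, a 30-byte comment and a 1-byte genre index. In version 1.1 the last two comment bytes become a zero byte and a track number. This sketch assumes ISO-8859-1 text (as version 1 specifies) and returns the genre as its raw index rather than looking it up in the genre list.

```python
from typing import Optional


def parse_id3v1(data: bytes) -> Optional[dict]:
    """Read an ID3v1/v1.1 tag from the last 128 bytes of a file, if present."""
    if len(data) < 128:
        return None
    tag = data[-128:]
    if tag[:3] != b"TAG":
        return None

    def text(field: bytes) -> str:
        # Fields are fixed width, padded with zero bytes (sometimes spaces).
        return field.split(b"\x00", 1)[0].decode("latin-1").rstrip()

    comment = tag[97:127]
    track = None
    if comment[28] == 0 and comment[29] != 0:  # v1.1: zero byte, then track number
        track = comment[29]
        comment = comment[:28]
    return {
        "title": text(tag[3:33]),
        "artist": text(tag[33:63]),
        "album": text(tag[63:93]),
        "year": text(tag[93:97]),
        "comment": text(comment),
        "track": track,
        "genre_index": tag[127],
    }
```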

 

ID3 Tag Names

From version 2.3 onwards, tag names are described with four characters instead of the three used in earlier versions, so tag translation is necessary when converting the metadata. Where the tags carry localized text for international use, such as comments and lyrics, a three-letter ISO 639-2 language code is added. A non-definitive list of language codes is also available on Wikipedia.
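Tag translation from version 2.2 to 2.3 is largely a lookup of frame identifiers. The mapping below is partial, and some frames (such as PIC to APIC) also change their body layout, which this sketch ignores. The second function shows where the three-letter language code sits in a version 2.3 comment (COMM) frame; it assumes ISO-8859-1 text and an empty description by default.

```python
# Partial mapping of ID3v2.2 three-character IDs to ID3v2.3 four-character IDs.
V22_TO_V23 = {
    "TT2": "TIT2",  # title
    "TP1": "TPE1",  # lead artist
    "TAL": "TALB",  # album
    "TRK": "TRCK",  # track number
    "TPA": "TPOS",  # part of a set (disc number)
    "TYE": "TYER",  # year (replaced by TDRC in v2.4)
    "TCO": "TCON",  # content type (genre)
    "COM": "COMM",  # comment
    "ULT": "USLT",  # unsynchronised lyrics
    "PIC": "APIC",  # attached picture (body layout also changes)
}


def translate_frame_id(v22_id: str) -> str:
    try:
        return V22_TO_V23[v22_id]
    except KeyError:
        raise ValueError(f"no mapping for {v22_id!r} in this sketch") from None


def comm_frame_v23(text: str, language: str = "eng", description: str = "") -> bytes:
    """Build an ID3v2.3 COMM frame using ISO-8859-1 text (encoding byte 0x00)."""
    if len(language) != 3:
        raise ValueError("language must be a three-letter ISO 639-2 code")
    body = (b"\x00" + language.encode("ascii")
            + description.encode("latin-1") + b"\x00"
            + text.encode("latin-1"))
    # v2.3 frame header: 4-byte ID, 4-byte big-endian size (not syncsafe), 2 flag bytes.
    return b"COMM" + len(body).to_bytes(4, "big") + b"\x00\x00" + body
```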

Version 2.3 facilitates image embedding for various purposes such as album cover artworks. The tag describes how the image is to be used. PNG is the optimal image type but JPEG and GIF are also supported.

Some players and metadata browsing systems may have difficulty in rendering a PNG file if it has an alpha channel to cookie cut the image to a non-rectangular shape. You might do that to display an image of a scanned CD or vinyl album disc.

Here is an informal (third-party) description of the version 2.3 standard which enumerates the tags and describes how they all work. This supplements the id3.org documentation:

https://mutagen-specs.readthedocs.io/en/latest/id3/id3v2.3.0.html

Support For Lyrics & Subtitles

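Lyrics3 tags were defined before ID3v2 and are always placed at the end of the file. They must be located prior to the ID3v1 tag metadata if it is included. The disadvantage is that the entire file needs to be delivered before the lyrics are available. ID3v2 also defines lyrics frames (USLT for unsynchronized lyrics and SYLT for synchronized lyrics) which travel inside the tag at the front of the file.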

Work around this in web streaming by delivering separate WebVTT text tracks along with the audio stream.
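
A WebVTT file is plain text: a WEBVTT header line followed by timed cues. This sketch writes cues from (start, end, text) tuples; the cue text is hypothetical, and the file would normally be served separately with the text/vtt MIME type.

```python
def vtt_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT hh:mm:ss.ttt timestamp."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


def to_webvtt(cues: list[tuple[float, float, str]]) -> str:
    """Serialize (start, end, text) cues as a WebVTT document."""
    lines = ["WEBVTT", ""]
    for start, end, text in cues:
        lines.append(f"{vtt_timestamp(start)} --> {vtt_timestamp(end)}")
        lines.append(text)
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)


print(to_webvtt([(0.0, 2.0, "First line of the lyric"), (2.0, 4.5, "Second line")]))
```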

Useful Tools

In an automated workflow context, the metadata can be extracted with command-line tools wrapped inside shell scripts that are invoked by the scheduler. There are several free and open-source tools which are highly recommended:

Tool Description
ExifTool Although ID3 tags are not strictly EXIF metadata, they serve a similar purpose. ExifTool is written in Perl which is easily supported on any OS platform. This is an astonishingly complete metadata extraction tool for over a hundred different metadata schemas. It also documents all the ID3 tags that it supports. Download it here: https://exiftool.org/
ffmpeg This is an open-source tool for processing and converting video and audio. It has ID3 and MP3 support built in. Install a downloadable binary or acquire the open-source code and compile it directly for your OS platform. MP3 decoding is native, while MP3 encoding is provided by the LAME library (libmp3lame). Download it here: https://www.ffmpeg.org/
LAME This is a library for building MP3 applications. It is supplied with a command line tool and is used by ffmpeg. LAME is considered to be the best available MP3 encoder for moderately high bit rates. Acquire the open-source code and compile it directly for your OS platform. Download it here: https://lame.sourceforge.io/
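
As an example of wrapping one of these tools, this Python sketch runs ffprobe (installed alongside ffmpeg) and pulls the container-level tags out of its JSON output. The key names reflect ffmpeg's own mapping from ID3 frames (for example TIT2 appears as title), so check them against your build; ExifTool's -json option is an alternative with its own naming.

```python
import json
import subprocess


def tags_from_ffprobe_json(text: str) -> dict:
    """Extract container-level tags from ffprobe JSON output, lower-casing the keys."""
    info = json.loads(text)
    tags = info.get("format", {}).get("tags", {})
    return {key.lower(): value for key, value in tags.items()}


def read_tags(path: str) -> dict:
    """Run ffprobe on a file and return its tags (ffprobe must be on the PATH)."""
    cmd = ["ffprobe", "-v", "quiet", "-print_format", "json", "-show_format", path]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return tags_from_ffprobe_json(result.stdout)
```

Keeping the JSON parsing separate from the subprocess call makes the parsing easy to test without ffprobe installed.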

 

Use these tools to write new or modified metadata tags back into a file based on the workflow processing results.
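
Writing tags back can be sketched the same way with ffmpeg, copying the audio stream untouched and setting new metadata. The function names are hypothetical; -id3v2_version is an option of ffmpeg's MP3 muxer (3 or 4), and ffmpeg must write to a new file rather than modify the input in place.

```python
import subprocess


def ffmpeg_retag_command(src: str, dst: str, tags: dict, id3v2_version: int = 3) -> list:
    """Build an ffmpeg command that rewrites tags without re-encoding the audio."""
    cmd = ["ffmpeg", "-y", "-i", src, "-map", "0", "-c", "copy"]
    for key, value in tags.items():
        cmd += ["-metadata", f"{key}={value}"]
    cmd += ["-id3v2_version", str(id3v2_version), dst]
    return cmd


def retag(src: str, dst: str, tags: dict) -> None:
    """Run the command; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(ffmpeg_retag_command(src, dst, tags), check=True)


print(ffmpeg_retag_command("in.mp3", "out.mp3", {"title": "New title"}))
```

Defaulting to version 3 matches the observation above that ID3v2.3 is the most widely supported. After a successful run, the workflow can replace the original file with the output.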

Related Standards

These are the relevant standards that you should acquire to support your use of MP3:

Standard Version Description
Unicode 15.1.0 A unified character set that brings together all previous glyph code sets.
ISO 639 2023 Codes for the representation of names of languages. ID3 uses the three-letter codes.
ISO 8859 Various Parts numbered 1 to 16 (part 12 was abandoned) describing alphabets and character-glyphs for international use.
ISO 11172-1 1999 MPEG-1 Part 1 - Systems. Describes packaging and stream structures. The latest version is dated 1993 with additional corrigenda published in 1996 and 1999.
ISO 11172-3 1996 MPEG-1 Part 3 - Audio is the foundation on which the earliest MPEG audio coding is built. The latest version is dated 1993 with a corrigendum published in 1996.
ISO 11172-5 1998 MPEG-1 Software simulation for reference.
ISO 13818-1 2023 MPEG-2 Systems. Describes packaging and stream structures. An amendment to codec parameters is in progress.
ISO 13818-3 1998 MPEG-2 Audio. This is definitive for Layers I, II and III (MP1, MP2 and MP3).
ISO 13818-5 2005 MPEG-2 Software simulation for reference.
RFC 3003 2000 The audio/mpeg Media MIME Type.
RFC 3555 2003 MIME Type Registration of RTP Payload Formats.
RFC 3625 2003 The QCP File Format and Media Types for Speech Data.
RFC 4855 2007 Media Type Registration of RTP Payload Formats.
RFC 4856 2007 Media Type Registration of Payload Formats in the RTP Profile for Audio and Video Conferences. Obsoletes RFC 3555.
RFC 5219 2008 A More Loss-Tolerant RTP Payload Format for MP3 Audio. Obsoletes RFC 3119.
MUSICAM - MPEG Audio Layer II is sometimes described as MUSICAM which is a proprietary brand name and was initially used during the standards development process.

 

Conclusion

MPEG-2 does not replace MPEG-1 but augments it with additional features.

The MP3 audio file format is widely used and supported on most platforms and devices. Content creation is easy and metadata editing is a feature of many compatible applications.

The ID3 metadata version 2.3 is an informal standard and widely supported for MP3 and many other media file types.

The patents relevant to MP3 all expired in 2017 and this facilitates the use of open-source libraries such as LAME without risk of litigation.
