Standards: Part 16 - About MP3 Audio Coding & ID3 Metadata

The MP3 audio format has been around for thirty years and has been superseded by several other codecs – so here we discuss why it still has a very strong position in broadcast. We also discuss ID3 metadata tags which often accompany MP3 files.

This article is part of our growing series on Broadcast Standards.
The first 26 articles are now available in Broadcast Standards – The Book.

The MP3 format is widely supported on almost every computing platform, mobile device, TV receiver and tablet. It is ideal for distribution to consumers even though there are higher quality alternatives such as AAC and MPEG-H.

MP3 is a mathematically lossy but perceptually lossless codec so it is never an optimal solution for mastering, editing and archiving purposes. It is optimized for distributing content to end-users.

MP3 is NOT an abbreviation for MPEG-3.

Why Do We Need The MP3 Codec?

The MP3 compression technology was devised because early Internet connections were unable to deliver audio files in a timely manner. A three-minute raw audio track extracted from a CD creates a 32MB file:

Duration - 3 minutes (180 seconds).
Sample rate - 44100 samples per second.
Sample size - 16 bits.
Number of tracks - 2 (Stereo - Left and Right).
Resulting file size - 31.752 MB.

MP3 achieves a compression ratio of between 10:1 and 14:1 whilst still preserving almost CD quality. The encoded output file would be approximately 3MB, depending on the content.
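The arithmetic behind these figures can be checked with a short sketch. The function name is illustrative only, and the loop simply applies the 10:1 to 14:1 range quoted above to estimate the encoded size.

```python
def pcm_size_bytes(seconds, sample_rate, bits_per_sample, channels):
    """Size of uncompressed PCM audio in bytes (ignores any file header)."""
    return seconds * sample_rate * (bits_per_sample // 8) * channels


raw = pcm_size_bytes(180, 44_100, 16, 2)
print(raw)  # 31752000 bytes, i.e. 31.752 MB

# Estimated MP3 size in MB across the quoted range of compression ratios.
for ratio in (10, 12, 14):
    print(f"{ratio}:1 ->", round(raw / ratio / 1_000_000, 2), "MB")
```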

Audio delivered on DVD discs for movies and TV programs is sampled at a higher rate of 48kHz, which creates slightly larger files.

Audio compression was originally developed for the MPEG-1 standard. It was then inherited into MPEG-2 and enhanced.

MPEG-1 and 2 happily coexist and are not mutually exclusive. MPEG-2 does not replace MPEG-1 but inherits capabilities from it and augments them with additional features.

How It Works

It is not necessary to fully understand the inner workings of MP3 to utilize it in your workflow. A brief overview here will suffice.

The MPEG Audio coding is organized into three layers which offer different levels of complexity and compression ratios.

Much of this work was done by Fraunhofer IIS collaborating with other international experts. Later, Fraunhofer developed a (non-canonical) variant which they called MPEG 2.5 Layer III. This supports additional bit rates but is not widely adopted.

The next generation AAC encoder refines these techniques to improve the quality and compression ratios.

MP3 players are expected to decode and play all MPEG content but may present an error message when encountering MPEG 2.5 Audio.

The original research was based on Psychoacoustic Analysis. This is the study of sound perception to compile statistics based on listener feedback about quality. A mathematical model is constructed from those statistical results.

The MP3 Perceptual Encoding driven by the statistical model removes components that the human ear would not perceive when they are masked by other louder components. This immediately makes MP3 a lossy codec. It reduces the complexity of the content to yield a better compression ratio. That discarded information cannot be restored by uncompressing the file.

Coding for Layers I and II splits the audio into discrete sub-bands which are then analysed individually, rather like a spectrum analyzer displaying a snapshot of the frequency response.

Layer III splits the audio into many more sub-bands and passes overlapping groups of them through a Modified Discrete Cosine Transform (MDCT) to apply a more fine-grained process. This trades complexity and computing load for better compression ratios but introduces more latency and artefacts which need to be removed by post-processing.

The MDCT yields a series of coefficients which gradually decay down to zero. A quantization/entropy cut-off truncates the coefficients at the point where they are all zero for lossless compression or earlier for lossy compression.

The energy level of each band is measured and compared to a masking threshold value. This is fed back into the Quantizer which moves the entropy coder cut-off point earlier in the series of coefficients. Louder passages are coded less accurately and can withstand a higher level of quantization. This is why MP3 coding is lossy.

The entropy cut-off point determines the amount of quantization noise introduced as an audible artefact. It is constrained within the limits determined by the psychoacoustic statistical analysis to hide the artefacts.
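The trade-off between coarser quantization and audible noise can be illustrated with a deliberately simplified sketch. It assumes a plain uniform quantizer and a made-up set of decaying coefficients; MP3 itself uses a non-uniform quantizer followed by Huffman coding, neither of which is reproduced here.

```python
def quantize(coeffs, step):
    """Uniform quantizer: index of the nearest multiple of `step`."""
    return [round(c / step) for c in coeffs]


def dequantize(indices, step):
    """Rebuild approximate coefficients from quantizer indices."""
    return [i * step for i in indices]


def noise_energy(coeffs, step):
    """Sum of squared errors introduced by quantizing with `step`."""
    restored = dequantize(quantize(coeffs, step), step)
    return sum((c - r) ** 2 for c, r in zip(coeffs, restored))


# Hypothetical decaying transform coefficients for one band.
coeffs = [9.7, -6.2, 3.9, -2.4, 1.3, -0.7, 0.4, -0.2]

for step in (0.5, 2.0, 4.0):
    indices = quantize(coeffs, step)
    print(step, indices, "zeros:", indices.count(0),
          "noise:", round(noise_energy(coeffs, step), 2))
```

A larger step turns more of the small trailing coefficients into zeros, which are cheap to code, at the cost of more quantization noise. The psychoacoustic model decides how much of that noise each band can hide.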

MP3 Joint-stereo processing looks for similarities between the audio channels. Differences depend on how the sound sources are panned across the soundscape during mix-down. Coding a single channel plus the differences compared with a second channel improves the compression ratio.
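The idea of coding one channel plus the differences can be sketched as a mid/side transform. This is a minimal illustration on a few time-domain sample values with hypothetical function names; a real Layer III encoder applies the equivalent operation to spectral values inside the codec.

```python
import math

K = 1 / math.sqrt(2)  # orthogonal scaling so the transform is its own inverse


def ms_encode(left, right):
    """Convert left/right samples into mid (sum) and side (difference)."""
    mid = [(l + r) * K for l, r in zip(left, right)]
    side = [(l - r) * K for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Recover left/right samples from mid/side."""
    left = [(m + s) * K for m, s in zip(mid, side)]
    right = [(m - s) * K for m, s in zip(mid, side)]
    return left, right


# A source panned almost to the centre: the two channels are nearly identical.
left = [0.5, 0.8, -0.3]
right = [0.5, 0.7, -0.3]
mid, side = ms_encode(left, right)
print(mid)
print(side)  # values at or near zero, so they need very few bits
```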

Audio Layers

Layers I, II and III were all defined during the MPEG-1 standardization. MPEG-2 extended them with additional lower bit rates and sample rates. The layer names use Roman numerals which may be confusing.

The term MP3 is an abbreviation of MPEG Audio Layer III and describes both MPEG-1 and MPEG-2 content.

The term MP2 describes MPEG Audio Layer II and is often incorrectly used to describe other kinds of MPEG-2 Audio.

Layer I - This is simpler than Layer II. The frame sizes are smaller which reduces coding delay (latency). It is useful for tele-conferencing applications and was designed for real-time encoding on early hardware systems. Layer I is now deemed to be obsolete.

Layer II - Layer II performs well with orchestral content and delivers results nearly as good as AAC. Players decode this with less computational effort than Layer III. It is more complex than Layer I but yields a better compression ratio.

Layer III - This is designed to operate at a lower bit rate than Layer II. It works quite differently with a much larger number of sub-bands which are processed in overlapping groups with a Modified Discrete Cosine Transform (MDCT) algorithm. Layer III does not handle transients quite as well as Layer II and needs additional pre-echo detection to increase the available bit rate during difficult passages. Additional post processing techniques are necessary to reduce artefacts which increases the computational workload.

For some content, Layer II performs better than Layer III even though it is less efficient.

These are the preferred file type extensions appropriate for the three layers:

File type	Content
.mp1	MPEG Audio - Layer I.
.mp2	MPEG Audio - Layer II.
.mp3	MPEG Audio - Layer III.

Supported Bit Rates

MPEG-2 adds bit rates in the lower range that MPEG-1 does not already support. These are useful for implementing compression for spoken-word content rather than music. Higher bit rates are undefined in MPEG-2 and are inherited from MPEG-1:

MPEG-1 Layer III	MPEG-2 Layer III
-	8 kbps
-	16 kbps
-	24 kbps
32 kbps	32 kbps
40 kbps	40 kbps
48 kbps	48 kbps
56 kbps	56 kbps
64 kbps	64 kbps
80 kbps	80 kbps
96 kbps	96 kbps
112 kbps	112 kbps
128 kbps	128 kbps
-	144 kbps
160 kbps	160 kbps
192 kbps	192 kbps
224 kbps	224 kbps
256 kbps	256 kbps
320 kbps	320 kbps

Supported Sample Rates

The lower sample rates added by MPEG-2 support more efficient speech encoding. The higher sample rates are undefined in MPEG-2 and are inherited from MPEG-1:

MPEG-1 Layer III	MPEG-2 Layer III
-	16 kHz
-	22.05 kHz
-	24 kHz
32 kHz	32 kHz
44.1 kHz	44.1 kHz
48 kHz	48 kHz
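The bit rate and sample rate together determine the size of each encoded frame. The sketch below assumes the commonly documented Layer III figures of 1152 samples per frame for MPEG-1 and 576 samples per frame at the lower MPEG-2 sample rates. The function names are illustrative.

```python
def layer3_frame_bytes(bitrate_kbps, sample_rate_hz, mpeg1=True, padding=0):
    """Approximate Layer III frame length in bytes.

    Assumes 1152 samples per frame for MPEG-1 and 576 for the lower
    MPEG-2 sample rates. `padding` is 0 or 1 (the header padding bit).
    """
    samples_per_frame = 1152 if mpeg1 else 576
    return (samples_per_frame // 8) * bitrate_kbps * 1000 // sample_rate_hz + padding


def layer3_frame_ms(sample_rate_hz, mpeg1=True):
    """Playing time of one frame in milliseconds."""
    samples_per_frame = 1152 if mpeg1 else 576
    return samples_per_frame / sample_rate_hz * 1000


print(layer3_frame_bytes(128, 44_100))            # 417
print(layer3_frame_bytes(128, 44_100, padding=1)) # 418
print(round(layer3_frame_ms(44_100), 2))          # 26.12
```

At 44.1 kHz the frame length is not a whole number of bytes, which is why encoders set the padding bit on some frames to keep the average bit rate exact.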

Channel Encoding Modes

The MPEG-1 standard only supports 2 channels. These can be configured for mono or stereo applications.

Mono - Only a single channel is required.
Stereo - Two channels coded independently.
Joint stereo - Intensity encoded.
Joint stereo - Mid/side encoded (Layer III only).
Dual mono - Two uncorrelated mono channels.
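The version, bit rate, sample rate and channel mode are all signalled in the four-byte header at the start of every audio frame. The following is a minimal sketch of decoding those fields for Layer III only, assuming the commonly documented header layout. The "MPEG-2" tables hold the values used with the lower sample rates, and MPEG 2.5 is deliberately rejected. All names are illustrative.

```python
BITRATES_L3 = {  # kbps, indexed by the 4-bit bit rate field
    "MPEG-1": [None, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, None],
    "MPEG-2": [None, 8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160, None],
}
SAMPLE_RATES = {  # Hz, indexed by the 2-bit sample rate field
    "MPEG-1": [44100, 48000, 32000, None],
    "MPEG-2": [22050, 24000, 16000, None],
}
VERSIONS = {0b11: "MPEG-1", 0b10: "MPEG-2"}
MODES = ["stereo", "joint stereo", "dual mono", "mono"]


def parse_layer3_header(header: bytes) -> dict:
    """Decode the main fields of a Layer III frame header (first 4 bytes)."""
    if len(header) < 4:
        raise ValueError("need at least 4 bytes")
    word = int.from_bytes(header[:4], "big")
    if word >> 21 != 0x7FF:  # 11 sync bits, all set
        raise ValueError("sync pattern not found")
    version = VERSIONS.get((word >> 19) & 0b11)
    if version is None or (word >> 17) & 0b11 != 0b01:  # 0b01 means Layer III
        raise ValueError("not an MPEG-1 or MPEG-2 Layer III frame")
    return {
        "version": version,
        "bitrate_kbps": BITRATES_L3[version][(word >> 12) & 0xF],
        "sample_rate_hz": SAMPLE_RATES[version][(word >> 10) & 0b11],
        "mode": MODES[(word >> 6) & 0b11],
    }


print(parse_layer3_header(bytes.fromhex("fffb9064")))
```

The example bytes decode as MPEG-1 Layer III at 128 kbps, 44.1 kHz, joint stereo. A `None` value in the result indicates a reserved or free-format field that this sketch does not interpret.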

MPEG-2 adds four more channels (six in all) to potentially carry 5.1 surround-sound content. Backwards compatible two-channel stereo uses a sub-set of 2 out of the 6 available channels in the same way as MPEG-1:

Front - Left - Used when two channel stereo is encoded.
Front - Right - Used when two channel stereo is encoded.
Front - Centre.
Low frequency.
Surround (rear) - Left.
Surround (rear) - Right.

File Name Extensions

MP3 streams can be packaged into any file type that carries binary data. For example, a .wav file could contain MP3 coded audio. Using inappropriate file types other than the default .mp3 is not recommended.

File ext	Disposition	Description
.mp1	Obsolete	MPEG Audio Layer I encoding.
.m1a	Obsolete	MPEG Audio Layer I encoding. Alternative but little used.
.mp2	Default	MPEG Audio Layer II encoding.
.m2a		MPEG Audio Layer II encoding. Alternative but little used.
.mpa		MPEG Audio Layer II encoding.
.mp2a		MPEG Audio Layer II encoding. Rarely used.
.mp3	Preferred	MPEG Audio Layer III encoding. This is the optimal choice.
.m3a		MPEG Audio Layer III encoding. Alternative but little used.
.mpga		MPEG Audio Layer III encoding. Only used by MPEG-1.
.aa	Proprietary	Special file extension used by Audio Book files.
.mpe		Less popular abbreviation for the .mpeg file format.
.mpg		MPEG content stored in a Program Stream format. Superseded by the .mp4 file type.
.mpeg		Normally expected to carry video and audio but can carry audio only. Superseded by the .mp4 file type.
.bit		Very old legacy archive collections may contain files created before 1995 which use the .bit file extension. Superseded by the .mp3 file type.
.m3u		Playlist file originally intended for use by MP3 and WAV players. Now supports MP4, TS and WEBM files.

MIME Types

When serving files with HTTP or uploading them from a browser, the MIME type header in the transaction will indicate the essence format.

The preferred MIME type for MP3 is audio/mpeg which is described in IETF RFC 3003. Inspect the embedded metadata to determine other characteristics of the encoded audio.

Other MIME types may be used by proprietary systems when they deliver content:

MIME type	Status	Description
audio/mpeg	Preferred	This is the preferred default MIME type defined in IETF RFC 3003. It can also identify MPEG Audio Layer II and other content.
audio/mpeg3	Deprecated	MPEG Audio Layer III files.
audio/MPA	Obsoleted	MPEG Audio Layer I and II content.
audio/mpa-robust	For RTP only	A More Loss-Tolerant RTP Payload Format for MP3 Audio. Described in IETF RFC 5219.
audio/mp3	Deprecated	Recognized by Google Chrome and Opera browsers.
application/octet-stream	Ambiguous	Generic binary stream which would not invoke special MP3 handling and is not recommended as it could be carrying a rogue .exe file instead.
audio/mpg	Ambiguous	Not recommended. Might also carry video content.
audio/x-mpeg-3	Proprietary	No longer used but should be recognized when handling incoming feeds. Not registered with IANA.
audio/x-mpg	Proprietary	Not registered with IANA.
audio/x-mpegaudio	Proprietary	Not registered with IANA.
audio/x-mpeg	Proprietary	Not registered with IANA.
audio/x-mp3	Proprietary	Not registered with IANA.
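When ingesting feeds from mixed sources, one option is to normalize the legacy and proprietary values from the table to the preferred type. This is only a sketch: the function name is hypothetical, and the choice to reject the ambiguous types rather than guess is an assumption about how a cautious workflow might behave.

```python
CANONICAL = "audio/mpeg"

# Deprecated and proprietary aliases from the table above that specifically
# indicate MP3 content. Ambiguous types are deliberately left out.
LEGACY_MP3_TYPES = {
    "audio/mpeg3",
    "audio/mp3",
    "audio/x-mpeg-3",
    "audio/x-mpg",
    "audio/x-mpegaudio",
    "audio/x-mpeg",
    "audio/x-mp3",
}


def normalize_mp3_mime(content_type: str):
    """Return 'audio/mpeg' for a recognized MP3 MIME type, otherwise None."""
    # Drop any parameters (e.g. '; charset=...') and compare case-insensitively.
    base = content_type.split(";", 1)[0].strip().lower()
    if base == CANONICAL or base in LEGACY_MP3_TYPES:
        return CANONICAL
    return None


print(normalize_mp3_mime("Audio/X-MP3"))               # audio/mpeg
print(normalize_mp3_mime("application/octet-stream"))  # None
```

A `None` result signals that the file content itself should be inspected before it is treated as MP3.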

ID3 Metadata Support

Workflow processes often need to access metadata about the files they are working on. MP3 files support embedded tagging using the ID3 convention. This is an informal standard but is widely used and can embed metadata into several kinds of files other than MP3:

AIFF
WAV
MP4
OGG
FLAC
APE
MPC
RealAudio

ID3 is not part of the MPEG standards and is managed independently. The informal specification is maintained at the ID3 website:

https://id3.org

Each ID3 tag is stored in one or more frames in the file. Encoded audio is also stored in frames which contain a synchronization pattern that decoders detect to access playable content. ID3 describes a way to ensure it never spuriously triggers that synchronization by avoiding that bit-pattern and thereby hides the metadata from the stream player. Client player apps can access the content in other ways to extract the ID3 metadata by looking specifically for its signature independently of the streaming process.
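One part of that scheme is the "syncsafe" integer used for the ID3v2 tag size: only the lower seven bits of each byte carry data, so a size field can never contain the 0xFF byte that begins an audio sync pattern. The sketch below shows the encoding and a minimal reader for the 10-byte ID3v2 header. The function names are illustrative.

```python
def syncsafe_decode(data: bytes) -> int:
    """Decode a syncsafe integer (7 significant bits per byte)."""
    value = 0
    for byte in data:
        if byte & 0x80:
            raise ValueError("top bit set: not a syncsafe integer")
        value = (value << 7) | byte
    return value


def syncsafe_encode(value: int, length: int = 4) -> bytes:
    """Encode an integer as `length` syncsafe bytes."""
    if not 0 <= value < 1 << (7 * length):
        raise ValueError("value out of range")
    return bytes((value >> (7 * i)) & 0x7F for i in reversed(range(length)))


def read_id3v2_header(data: bytes):
    """Return version, flags and tag size from an ID3v2 header, or None."""
    if len(data) < 10 or data[:3] != b"ID3":
        return None
    major, revision, flags = data[3], data[4], data[5]
    return {
        "version": f"2.{major}.{revision}",
        "flags": flags,
        "tag_size": syncsafe_decode(data[6:10]),  # excludes the 10-byte header
    }


print(syncsafe_encode(257).hex())               # 00000201
print(read_id3v2_header(b"ID3\x03\x00\x00" + syncsafe_encode(257)))
```

Four syncsafe bytes give 28 usable bits, which is where the 256MB limit on the total tag size comes from.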

ID3 tags were originally designed for annotating tracks imported from music CDs. Typical and obvious tags are:

Song title.
Artist name.
Album name.
Track number.

There are many other tags described in the ID3 specification and more have been added as proprietary and de-facto extensions.

ID3 Versions

Over several revisions, the ID3 metadata structures have evolved and there are several ways in which the metadata might be optionally embedded in the file. New versions must be backwards compatible and not break earlier implementations. ID3 metadata tags conforming to version 1 are always placed at the tail end of the file. Version 2 tags are placed at the front.

The version 2.4 specification allows the tag metadata to be placed at the end of the file. It must precede the version 1 metadata to avoid breaking older players. Version 2.3 is the most popular kind of tagging and places the metadata only at the front of the file.

Version	Disposition	Description
1	Obsolete	Fixed format suffix appended to the end of the file. Carries the title, artist, album, and a short comment. These are all limited to 30 characters. A year number is added and a value representing a genre from an indexed list.
1.1	Obsolete	Track numbers added by shortening the comment field.
1.2	Obsolete	Text fields increased in length and a sub-genre field added. Backwards compatibility with earlier versions was maintained but this version was never widely adopted.
2	Obsolete	The format and structure is completely revised. It is constructed from multiple frames that can each grow to 16MB within a total capacity of 256MB. This metadata is now placed at the front of the file so it is immediately available when streaming the MP3 content. Unicode compatible text strings.
2.2	Obsolete	Tag identifiers limited to three characters.
2.3	Most popular	Added album sleeve artwork images and a disc number tag for boxed sets. Tag names are four characters.
2.3+	Current	Chapter marks added with support for displaying synchronized slide show images. Very useful for podcasts.
2.4	Latest	Additional frame types and text frames can contain multiple NULL separated values. Tags can be stored at the start or end of the file.
2.4+	Latest	The same chapter mark support is added as per 2.3+.
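Because the version 1 format is a fixed-size suffix, it is straightforward to read. The sketch below assumes the widely documented 128-byte layout that begins with the marker TAG, and it assumes ISO 8859-1 text, which is common but not guaranteed in practice. The function name is illustrative.

```python
def parse_id3v1(data: bytes):
    """Parse an ID3v1 or ID3v1.1 tag from the last 128 bytes, or return None."""
    if len(data) < 128:
        return None
    tag = data[-128:]
    if tag[:3] != b"TAG":
        return None

    def text(raw: bytes) -> str:
        # Fields are NUL or space padded. ISO 8859-1 is an assumption.
        return raw.split(b"\x00", 1)[0].decode("latin-1").strip()

    comment = tag[97:127]
    track = None
    # Version 1.1: a zero byte followed by a non-zero byte ends the comment.
    if comment[28] == 0 and comment[29] != 0:
        track = comment[29]
        comment = comment[:28]

    return {
        "title": text(tag[3:33]),
        "artist": text(tag[33:63]),
        "album": text(tag[63:93]),
        "year": text(tag[93:97]),
        "comment": text(comment),
        "track": track,
        "genre": tag[127],  # index into the predefined genre list
    }


# Build a small synthetic file: fake audio bytes followed by a v1.1 tag.
sample = (
    b"\xff\xfb\x90\x64" * 4
    + b"TAG"
    + b"Song".ljust(30, b"\x00")
    + b"Artist".ljust(30, b"\x00")
    + b"Album".ljust(30, b"\x00")
    + b"1999"
    + b"Note".ljust(28, b"\x00") + b"\x00\x07"
    + bytes([17])
)
print(parse_id3v1(sample))
```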

ID3 Tag Names

From version 2.3 onwards, tag names are described with four letters instead of the three which were defined in the earlier versions. Tag translation is necessary when converting the metadata. Where the tags are localized for international use, an additional three-letter ISO 639 language code is added. A non-definitive list of language codes is also available on Wikipedia.
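Tag translation between the three-letter and four-letter names can be driven by a lookup table. The sketch below covers only a handful of common identifiers, and the function name and dictionary-based input are assumptions for illustration. Note that it renames identifiers only: some frames, such as embedded pictures, also differ in their internal layout between versions and would need their content converted too.

```python
# A small subset of ID3v2.2 to ID3v2.3 identifier mappings.
V22_TO_V23 = {
    "TT2": "TIT2",  # title
    "TP1": "TPE1",  # lead artist
    "TAL": "TALB",  # album
    "TRK": "TRCK",  # track number
    "TPA": "TPOS",  # part of a set (disc number)
    "TYE": "TYER",  # year
    "TCO": "TCON",  # genre
    "COM": "COMM",  # comment
    "PIC": "APIC",  # attached picture (content layout also differs)
}


def translate_frame_ids(frames: dict):
    """Rename v2.2 identifiers to v2.3; return (translated, unknown_ids)."""
    translated, unknown = {}, []
    for frame_id, value in frames.items():
        new_id = V22_TO_V23.get(frame_id)
        if new_id is None:
            unknown.append(frame_id)  # leave for manual review
        else:
            translated[new_id] = value
    return translated, unknown


old = {"TT2": "Song", "TP1": "Artist", "TRK": "3/12", "XYZ": "?"}
print(translate_frame_ids(old))
```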

Version 2.3 facilitates image embedding for various purposes such as album cover artwork. The tag describes how the image is to be used. PNG is the optimal image type but JPEG and GIF are also supported.

Some players and metadata browsing systems may have difficulty in rendering a PNG file if it has an alpha channel to cookie cut the image to a non-rectangular shape. You might do that to display an image of a scanned CD or vinyl album disc.

Here is an informal (third-party) description of the version 2.3 standard which enumerates the tags and describes how they all work. This supplements the id3.org documentation:

https://mutagen-specs.readthedocs.io/en/latest/id3/id3v2.3.0.html

Support For Lyrics & Subtitles

Lyric tags were defined before ID3v2 and are always placed at the end of the file. They must be located prior to the ID3v1 tag metadata if it is included. The disadvantage is that an entire file needs to be delivered before the lyrics are available.

Work round this in web streaming by delivering separate VTT text tracks along with the audio stream.
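Generating such a text track from timed lyrics is simple. The sketch below assumes the lyrics are already available as (start, end, text) tuples with times in seconds. The function names and the sample lines are hypothetical.

```python
def vtt_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (hh:mm:ss.mmm)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def lyrics_to_vtt(cues) -> str:
    """Build a WebVTT document from (start, end, text) tuples."""
    lines = ["WEBVTT", ""]
    for start, end, text in cues:
        lines.append(f"{vtt_timestamp(start)} --> {vtt_timestamp(end)}")
        lines.append(text)
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)


cues = [
    (12.0, 15.5, "First line of the lyric"),
    (15.5, 19.25, "Second line of the lyric"),
]
print(lyrics_to_vtt(cues))
```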

Useful Tools

In an automated workflow context, the metadata can be extracted with command-line tools wrapped inside shell scripts that are called to action by the scheduler. There are several free and open-source tools which are highly recommended:

Tool	Description
ExifTool	Although ID3 tags are not strictly EXIF metadata, they serve a similar purpose. ExifTool is written in Perl which is easily supported on any OS platform. This is an astonishingly complete metadata extraction tool for over a hundred different metadata schemas. It also documents all the ID3 tags that it supports. Download it here: https://exiftool.org/
ffmpeg	This is an open-source tool for processing and converting video and audio. It has ID3 and MP3 support built in. Install a downloadable binary or acquire the open-source code and compile it directly for your OS platform. MP3 encoding support is provided by the LAME library. Download it here: https://www.ffmpeg.org/
LAME	This is a library for building MP3 applications. It is supplied with a command line tool and is used by ffmpeg. LAME is considered to be the best available MP3 encoder for moderately high bit rates. Acquire the open-source code and compile it directly for your OS platform. Download it here: https://lame.sourceforge.io/

Use these tools to write new or modified metadata tags back into a file based on the workflow processing results.
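As a hedged illustration of writing tags back from a script, the sketch below builds an ffmpeg command that copies the audio without re-encoding and sets new metadata values. It assumes ffmpeg is installed and on the PATH, and the option names are given as I understand them from the ffmpeg documentation, so check them against your installed version. The function names and file names are hypothetical.

```python
import subprocess


def build_retag_command(src: str, dst: str, tags: dict) -> list:
    """Build an ffmpeg command that copies the audio and sets metadata.

    Assumes the ffmpeg options -c copy, -id3v2_version and -metadata.
    """
    cmd = ["ffmpeg", "-i", src, "-c", "copy", "-id3v2_version", "3"]
    for key, value in tags.items():
        cmd += ["-metadata", f"{key}={value}"]
    cmd.append(dst)
    return cmd


def retag(src: str, dst: str, tags: dict) -> None:
    """Run the command; raises CalledProcessError if ffmpeg reports failure."""
    subprocess.run(build_retag_command(src, dst, tags), check=True)


# Inspect the command without running it.
print(build_retag_command("in.mp3", "out.mp3", {"title": "Song", "artist": "Artist"}))
```

Passing the arguments as a list avoids shell quoting problems when titles contain spaces or punctuation.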

Related Standards

These are the relevant standards that you should acquire to support your use of MP3:

Standard	Version	Description
Unicode	15.1.0	A unified character set that brings together all previous glyph code sets.
ISO 639	2023	Language names and codes.
ISO 8859	Various	There are 16 different parts describing alphabets and character-glyphs for international use.
ISO 11172-1	1999	MPEG-1 Part 1 - Systems. Describes packaging and stream structures. The latest version is dated 1993 with additional corrigenda published in 1996 and 1999.
ISO 11172-3	1996	MPEG-1 Part 3 - Audio is the foundation on which the earliest MPEG audio coding is built. The latest version is dated 1993 with a corrigendum published in 1996.
ISO 11172-5	1998	MPEG-1 Software simulation for reference.
ISO 13818-1	2023	MPEG-2 Systems. Describes packaging and stream structures. An amendment to codec parameters is in progress.
ISO 13818-3	1998	MPEG-2 Audio. This is definitive for Layers I, II and III (MP1, MP2 and MP3).
ISO 13818-5	2005	MPEG-2 Software simulation for reference.
RFC 3003	2000	The audio/mpeg Media MIME Type.
RFC 3555	2003	MIME Type Registration of RTP Payload Formats.
RFC 3625	2003	The QCP File Format and Media Types for Speech Data.
RFC 4855	2007	Media Type Registration of RTP Payload Formats.
RFC 4856	2007	Media Type Registration of Payload Formats in the RTP Profile for Audio and Video Conferences. Obsoletes RFC 3555.
RFC 5219	2008	A More Loss-Tolerant RTP Payload Format for MP3 Audio. Obsoletes RFC 3119.
MUSICAM	-	MPEG Audio Layer II is sometimes described as MUSICAM which is a proprietary brand name and was initially used during the standards development process.

Conclusion

MPEG-2 does not replace MPEG-1 but augments it with additional features.

The MP3 audio file format is widely used and supported on most platforms and devices. Content creation is easy and metadata editing is a feature of many compatible applications.

The ID3 metadata version 2.3 is an informal standard and widely supported for MP3 and many other media file types.

The patents relevant to MP3 all expired in 2017 and this facilitates the use of open-source libraries such as LAME without risk of litigation.
