Standards: Part 22 - Inside AIFF Files


This article is part of our growing series on Standards.
There is an overview of all 26 articles in Part 1 - An Introduction To Standards.


Compared with other popular standards in use, Apple's Audio Interchange File Format (AIFF) is ancient. The core functionality was stabilized over 30 years ago and has not changed since then. Given the pace of change in our industry that is a remarkable achievement.

AIFF files contain raw audio samples. This is optimum for production use. Most editing tools can open and save AIFF content after editing.

The embedded metadata describes the sample rates, track configuration and bit-depths for the audio essence.

Despite its lack of complexity, AIFF is a versatile format and is not likely to go out of use soon, not least because of the massive legacy of archived recordings in this format.

Compressed Audio Storage With AIFF-C

The original classic AIFF files contain only uncompressed data. The AIFF-C specification was published as a replacement for version 1.3 of the original AIFF document and allows any kind of sound data to be carried. It has become the de facto default format, so a file that appears to be a classic AIFF file is likely to be an AIFF-C file.

Searching online for documentation reveals only a draft document. Additional supporting information can be found in the following developer documentation from Apple:

• Original AIFF Version 1.3 specification
• Sound Manager
• Core Audio
• QuickTime version 4 to 7
• Source code header files AIFF.h and Sound.h in the QuickTime developer kit
• AV Foundation

There are a few important differences between AIFF and AIFF-C files:

Element | AIFF | AIFF-C
FORM type identifier | 'AIFF' | 'AIFC'
FVER chunk | Never present. | Always present.
COMM chunk | Four properties describing the sampling structure. | Six properties describing sampling and the compression codec wrapper used for the SSND chunk.
SSND chunk | Always Big-Endian data. | Can be any format, compressed or uncompressed.
Preferred file extension | .aiff | .aifc

File Extensions

Do not rely on the file extension to determine the exact format of an AIFF or AIFF-C file. The extension is helpful for invoking an AIFF parser or player, but the software should then inspect the FORM and FVER chunks to identify the file content properly.

Extension | Description
.aiff | Preferred for classic AIFF files but also used for AIFF-C files, which can be identified from the FORM and FVER chunks inside the file.
.aif | Less common file extension for platforms that cannot support more than 3-character file extensions.
.aifc | Preferred for AIFF-C files.
.caf | This is a Core Audio File but AIFF data may be carried inside it when used for sample loops in GarageBand and Logic Pro.
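
As a minimal sketch of that advice, the form type can be read from the first 12 bytes of the file (the 'FORM' identifier, a 32-bit size and the form type). The function name identify_aiff and the hand-built header are illustrative assumptions, not part of any specification or library.

```python
import struct

def identify_aiff(header: bytes) -> str:
    """Classify a file from its first 12 bytes: 'FORM', size, form type."""
    if len(header) < 12:
        return "not AIFF"
    chunk_id, _size, form_type = struct.unpack(">4sL4s", header[:12])
    if chunk_id != b"FORM":
        return "not AIFF"
    # The form type, not the file extension, says which flavour this is.
    return {b"AIFF": "AIFF", b"AIFC": "AIFF-C"}.get(form_type, "unknown FORM type")

# Hand-built header standing in for the start of a file named "loop.aiff".
header = b"FORM" + struct.pack(">L", 4) + b"AIFC"
print(identify_aiff(header))  # AIFF-C
```

With a real file, the same function could be given the result of reading the first 12 bytes in binary mode.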

 

Multi-byte Data Formats

Once CPU architectures grew beyond 8 bits to 16, then 32 and 64 bits, a single value occupied several bytes and the ordering of those bytes became important. The two principal CPU manufacturers of the time (Motorola and Intel) chose opposing byte arrangements.

Since Apple based the original Macintosh design on Motorola 68000 and then PowerPC microprocessors, the data was organized with the highest ordered byte first. This is called Big-Endian.

Conversely, Intel arranged their data with the lowest ordered byte first. This is called Little-Endian. Since Windows PCs were based on the Intel architecture, this required some format conversion when moving content between the two platforms.
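
The difference between the two byte orders can be illustrated with Python's standard struct module. This is only a sketch using an arbitrary 16-bit value chosen for the example.

```python
import struct

value = 0x1234  # an arbitrary 16-bit sample value

big = struct.pack(">h", value)     # Big-Endian: highest-order byte first
little = struct.pack("<h", value)  # Little-Endian: lowest-order byte first

print(big.hex())     # 1234
print(little.hex())  # 3412

# Converting one order to the other is a byte swap within each sample.
assert big[::-1] == little
```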

Apple later adopted Intel processors and has now migrated to its own CPU design, Apple Silicon, which is based on the ARM architecture. This is the third CPU architecture migration that Apple has undertaken with the Macintosh, and its implementation of AIFF-C support ensured that the format kept working seamlessly.

Multiple Channel Support

AIFF imposes no practical limit on the number of channels. The samples are interleaved so that the samples for a stereo pair (left and right channels) are stored adjacently. A group of samples stored like this, one for each channel, is called a Sample Frame.

Monaural sound is just a sequence of single samples so the sample framing is implied.
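
Interleaving can be sketched in a few lines of Python. The helper names interleave and deinterleave are assumptions made for this example, and the samples are plain integers rather than packed bytes.

```python
def interleave(channels):
    """Merge per-channel sample lists into a flat list of sample frames."""
    return [sample for frame in zip(*channels) for sample in frame]

def deinterleave(samples, num_channels):
    """Split a flat interleaved list back into one list per channel."""
    return [samples[ch::num_channels] for ch in range(num_channels)]

left = [10, 11, 12]
right = [20, 21, 22]

frames = interleave([left, right])
print(frames)  # [10, 20, 11, 21, 12, 22]
```

Each consecutive pair in the output is one stereo sample frame. With a single channel the list is returned unchanged, which matches the implied framing of monaural sound.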

Multiple channels are mapped to numbers according to this grid (extracted from the specification). A sample frame spans all of the channels in use.

Other channel mapping arrangements exist. For example, the specification does not include 7.1 surround configurations.

Sample Sizes

AIFF supports sample sizes from 1 to 32 bits. Interestingly, music can be produced with a simple square wave by adjusting the on and off duty cycle in the time domain. So, specifying 1-bit audio is not such a crazy idea after all.

Realistically you would probably need at least 5 bits for low-quality speech coding. Sampling music at 8 bits delivers poor quality and 16 bits is considered a minimum for most applications. Larger sample sizes are useful in studio and production environments.

Sample size | Storage arrangement
1 to 8 bits | One byte per sample.
9 to 16 bits | Two bytes per sample.
17 to 24 bits | Three bytes per sample.
25 to 32 bits | Four bytes per sample.

If the sample size does not fully occupy the bytes in the storage arrangement, it is left justified and right padded with zero bits. For example, a 12-bit sample placed into a 16-bit word occupies the upper 12 bits and the lower 4 bits are set to zero.
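
The storage rule can be sketched as follows, assuming signed samples stored Big-Endian as in classic AIFF. The function names are illustrative and not taken from the specification.

```python
def bytes_per_sample(bits: int) -> int:
    """Round the sample size up to a whole number of bytes."""
    if not 1 <= bits <= 32:
        raise ValueError("AIFF sample sizes range from 1 to 32 bits")
    return (bits + 7) // 8

def pack_sample(value: int, bits: int) -> bytes:
    """Left-justify a signed sample and emit it as Big-Endian bytes."""
    nbytes = bytes_per_sample(bits)
    shift = nbytes * 8 - bits  # number of zero padding bits on the right
    return (value << shift).to_bytes(nbytes, "big", signed=True)

def unpack_sample(data: bytes, bits: int) -> int:
    """Reverse pack_sample by shifting the padding bits back out."""
    shift = len(data) * 8 - bits
    return int.from_bytes(data, "big", signed=True) >> shift

print(pack_sample(0x7FF, 12).hex())  # 7ff0
```

The largest positive 12-bit value, 0x7FF, becomes 0x7FF0 in a 16-bit word: the sample sits in the upper 12 bits and the low 4 bits are zero.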

Sample Rates

The AIFF specification does not mandate any sample rates. The sample frames are delivered at a rate determined by an 80-bit IEEE 754 extended-precision floating point value in the COMM chunk, which accounts for 10 of the 18 bytes in the classic COMM payload. The range of values this format can represent far exceeds any sample rate we might ever encounter.
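
The 18-byte classic COMM payload leaves 10 bytes for the sample rate, which the AIFF 1.3 specification defines as an 80-bit IEEE 754 extended-precision value. Python's struct module has no format code for that type, so a small decoder is needed. The sketch below is one way to do it. The name decode_extended is an assumption for this example, and the result is rounded to an ordinary Python float.

```python
import math

def decode_extended(data: bytes) -> float:
    """Decode a 10-byte Big-Endian IEEE 754 extended-precision value."""
    if len(data) != 10:
        raise ValueError("expected exactly 10 bytes")
    sign = -1.0 if data[0] & 0x80 else 1.0
    exponent = ((data[0] & 0x7F) << 8) | data[1]   # 15-bit biased exponent
    mantissa = int.from_bytes(data[2:], "big")     # 64 bits, explicit integer bit
    if exponent == 0x7FFF:
        # All-ones exponent encodes infinity or NaN.
        fraction = mantissa & 0x7FFFFFFFFFFFFFFF
        return math.nan if fraction else sign * math.inf
    # value = mantissa * 2^(exponent - bias - 63), with a bias of 16383.
    # Values outside the range of a Python float raise OverflowError.
    return sign * math.ldexp(mantissa, exponent - 16383 - 63)

print(decode_extended(bytes.fromhex("400EAC44000000000000")))  # 44100.0
```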

Chunk Format Details

AIFF files are organized into chunks of data. Each chunk has a header and a payload body.

A chunk header starts with a 32-bit long word containing a FourCC chunk type value. This determines how the application software should interpret the payload. The size of the payload is another 32-bit value. This size value does not include the 8 bytes of header data.

The chunk type ID is composed of four printable ASCII compatible characters packaged into a 32-bit long-word. The space character and other punctuation symbols are permitted. This was described in a previous article as a FourCC code.

In some scenarios, non-printing characters are used which require special software to interpret and present in a human readable form.

The entire chunk must be constructed with an even number of bytes. A zero padding byte must be added to variable-length chunks such as the SSND sound data chunk when the payload length is odd. The pad byte is not counted in the chunk size value.

The chunk size value facilitates a rapid walk through the AIFF file. Adding the 8 bytes of header data, the chunk size and any pad byte to the current index position in the file locates the start of the next chunk.
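
A chunk walk along these lines might look like the sketch below. It assumes the whole file is already in memory as bytes and that the first 12 bytes are the FORM identifier, size and form type. The name walk_chunks and the tiny hand-built file are illustrative only.

```python
import struct

def walk_chunks(data: bytes, start: int = 12):
    """Yield (chunk_id, payload_offset, size) for each chunk after the FORM header."""
    pos = start
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from(">4sL", data, pos)
        yield chunk_id, pos + 8, size
        # Skip the 8-byte header, the payload, and the pad byte after an odd-sized payload.
        pos += 8 + size + (size & 1)

# Hand-built example: a 3-byte NAME chunk (plus pad byte) and a 2-byte ANNO chunk.
body = b"AIFF"
body += b"NAME" + struct.pack(">L", 3) + b"abc" + b"\x00"
body += b"ANNO" + struct.pack(">L", 2) + b"hi"
data = b"FORM" + struct.pack(">L", len(body)) + body

for chunk_id, offset, size in walk_chunks(data):
    print(chunk_id, offset, size)
```

Without the pad-byte adjustment, the walker would land one byte early after the NAME chunk and misread the following header.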

File Format Details

AIFF files are based on the IFF format originally designed by Electronic Arts in 1985. The first chunk in an AIFF file is always a container whose FourCC type code is 'FORM'. The rest of the chunks in the file are semantically nested inside it.

This chunk defines the file type more reliably than the file extension. The length value in its header covers everything that follows it, which is the total data length minus the 8-byte header of this containing chunk.

After that, the chunks can be presented in any order. The second and subsequent chunks are described as Local data.

The maximum file size is 4 Gigabytes. This is the only constraint on the number of channels or the run-time of the sampled sound data.

These chunk types are mandatory and must be present:

FourCC ID | Description
FORM | Form chunks describe the file format and act as a container for the rest of the chunks.
FVER | Format Version descriptor describing the AIFF-C revision date. Only present in AIFF-C files and not present in older AIFF files.
COMM | Common data describing the fundamental attributes of the sampled sound.
SSND | Sound sample frames containing the essence data. Only present if the COMM chunk describes the presence of sample data.

Here is an illustration showing a simple AIFF-C file with four chunks. The brackets indicate the example value stored in each property. The shaded boxes represent the individual bytes.

  • 1 box = 8-bit byte.
  • 2 boxes = 16-bit word.
  • 4 boxes = 32-bit long-word.
  • 8 boxes = 64-bit double-precision value.
  • 16 boxes = Illustrates a container for a text string in this example.

FORM - File Format Descriptor Chunk

The FORM chunk describes the file format as AIFF or AIFF-C. This is the outermost container and must always be present. It will always be the first chunk in the AIFF file. The foundational IFF specification allows for a range of nested containers similar to the ISOBMFF file structure. The AIFF specification profiles this behaviour to constrain it to one single top-level FORM container.

The payload of the FORM chunk is a single FourCC code that identifies the file type as one of these values:

Type code | Description
AIFF | Generic Big-Endian classic AIFF file.
AIFC | AIFF-C files containing compressed or Little-Endian uncompressed data.
AIFS | A deprecated file type used during initial development of the format. A few files of this type have escaped into the wild. File readers should reject these files as not being compatible with the AIFF specification.

This chunk accurately describes the overall length of the data in the file. This is more reliable than deriving it from the value provided by the OS file system manager.
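
One use of this value is a quick integrity check: the declared length is the FORM size plus the 8-byte FORM header, and it can be compared with the number of bytes actually present. The sketch below assumes the file content is held in memory, and the function name is hypothetical.

```python
import struct

def declared_file_length(data: bytes) -> int:
    """Return the total length implied by the FORM header."""
    chunk_id, size = struct.unpack_from(">4sL", data, 0)
    if chunk_id != b"FORM":
        raise ValueError("not an IFF FORM container")
    return 8 + size  # 8-byte FORM header plus everything inside the container

body = b"AIFF" + b"ANNO" + struct.pack(">L", 2) + b"hi"
data = b"FORM" + struct.pack(">L", len(body)) + body
truncated = data[:-1]

print(declared_file_length(data) == len(data))            # True
print(declared_file_length(truncated) == len(truncated))  # False
```

A mismatch suggests a truncated transfer or trailing data appended after the container.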

The form type value will either be 'AIFF' or 'AIFC'. Note that this may not be consistent with the file extension on the physical file.

Refer to Annex A in the AIFF-C specification for examples of how a FORM container is constructed.

FVER - Format Version Chunk

Format Version (FVER) chunks have a 4-byte payload containing a version descriptor for the applicable version of the specification. The value is unique to each revision and rendered as a timestamp. This timestamp is the release date of the specification that the file conforms to.

An FVER chunk will only be present in an AIFF-C file and was not supported in the earlier classic AIFF files.

Note that this timestamp is not related to the creation or modification date of the physical file in any way. The FVER chunk is mandatory in AIFF-C files and exactly one must be present.
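
As an illustration, the draft AIFF-C specification gives the Version 1 timestamp as 0xA2805140 and counts seconds from 1 January 1904, the classic Macintosh epoch. Both details come from that draft rather than from this article. A minimal sketch of the conversion, with an illustrative function name:

```python
from datetime import datetime, timedelta

# Version 1 value as given in the draft AIFF-C specification.
AIFC_VERSION_1 = 0xA2805140

def fver_to_datetime(timestamp: int) -> datetime:
    """Convert an FVER timestamp (seconds since 1904-01-01) to a datetime."""
    return datetime(1904, 1, 1) + timedelta(seconds=timestamp)

print(fver_to_datetime(AIFC_VERSION_1))  # 1990-05-23 14:40:00
```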

COMM - Common Data Chunk

The Common chunk describes fundamental parameters for the sampled data. This chunk must be present. Without it, the player cannot unpack and stream the sample frames.

The basic format for classic AIFF files carries an 18-byte payload with the following properties:

• Number of channels which affects the size of the sample frame structure.
• Total number of sample frames in the SSND chunk.
• Number of bits-per-sample. This describes how to extract the sample data after stripping off the padding zero-bits.
• The sample rate of the audio. It is stored as an 80-bit IEEE 754 extended-precision floating point value, which is why the four properties occupy 18 bytes. It describes the number of sample frames per second, where each sample frame delivers one sample for every channel simultaneously.

The Extended Common chunk in AIFF-C files has two additional parameters that increase the size of the payload. Most AIFF files now use this format:

• A FourCC compression type that describes the audio codec being used.
• A human readable name for the compression type.
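
The COMM layout can be sketched with Python's struct module. The field widths used here (16-bit channel count, 32-bit frame count, 16-bit sample size, 10-byte rate) and the Pascal-style name string (a length byte followed by the text) follow the AIFF and AIFF-C specifications. The function and dictionary key names are assumptions, and the 10-byte rate is returned undecoded to keep the example short.

```python
import struct

def parse_comm(payload: bytes, is_aifc: bool) -> dict:
    """Unpack the COMM payload of an AIFF file, or of an AIFF-C file if is_aifc."""
    channels, frames, bits, rate_raw = struct.unpack_from(">hLh10s", payload, 0)
    info = {
        "channels": channels,
        "sample_frames": frames,
        "bits_per_sample": bits,
        "sample_rate_raw": rate_raw,  # 10-byte extended-precision value
    }
    if is_aifc:
        info["compression_type"] = payload[18:22]
        name_len = payload[22]  # Pascal string: length byte, then the text
        info["compression_name"] = payload[23:23 + name_len].decode("ascii", "replace")
    return info

# Hand-built AIFF-C payload: stereo, 1000 frames, 16 bits, 44.1 kHz, type 'NONE'.
payload = (struct.pack(">hLh", 2, 1000, 16)
           + bytes.fromhex("400EAC44000000000000")
           + b"NONE" + bytes([14]) + b"not compressed" + b"\x00")

info = parse_comm(payload, is_aifc=True)
print(info["channels"], info["compression_type"], info["compression_name"])
# 2 b'NONE' not compressed
```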

Note that the compression type value is case sensitive, and some codecs appear with both upper-case and lower-case variants. Consult the relevant codec specifications to ascertain whether the variants are otherwise identical.

Many earlier codecs that were used with AIFF-C are now obsolete and have been superseded by MPEG and other standards. These remaining codec types are still relevant and useful:

Compression type | SSND format | Description
NONE | Big-Endian | Raw uncompressed samples.
sowt | Little-Endian | Raw uncompressed samples, byte-swapped when compared with 'NONE'.
fl32 | IEEE-32 | 32-bit floating point.
fl64 | IEEE-64 | 64-bit floating point.
alaw | 8-bit samples | ITU-T G.711 A-law 2:1.
ulaw | 8-bit samples | ITU-T G.711 μ-law 2:1.

SSND - Sampled Sound Data Chunk

The audio essence is stored in a Sampled Sound data chunk. Although the chunks can appear in any order, this chunk is normally placed at the end of the file. There will only be one SSND chunk with all the sample frames contained within it.

An SSND chunk must be present if the number of sample frames described in the COMM chunk is non-zero.

The SSND header contains these properties:

  • Size of the sound data chunk (not including the header data).
  • Offset to the first sample frame at the start of the playable sound. This could adjust the in-point for playback. It is normally set to zero to play sample frames from the beginning.
  • Size of alignment blocks. This indicates the size in bytes of the blocks that the audio data is packaged into. It is used in conjunction with the offset value. Most applications do not use this and it is usually set to zero. Block alignment speeds up disk access for real-time recording applications.

The sampled sound data immediately follows the block-size value in the header.

The sound data must contain an even number of bytes. A padding zero-byte might be added to the end to ensure the samples finish on a word boundary. Additional zero-byte padding may be necessary if alignment blocks are used.
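
A minimal sketch of reading the SSND payload, assuming the 8-byte chunk header (FourCC and size) has already been consumed and that the offset and block size are both 32-bit Big-Endian values. The function name parse_ssnd is illustrative.

```python
import struct

def parse_ssnd(payload: bytes):
    """Split an SSND payload into (offset, block_size, sound_data)."""
    offset, block_size = struct.unpack_from(">LL", payload, 0)
    # Sample frames begin 'offset' bytes after the two 32-bit fields.
    return offset, block_size, payload[8 + offset:]

# Two 16-bit stereo sample frames, Big-Endian, with offset and block size zero.
frames = struct.pack(">4h", 100, -100, 200, -200)
payload = struct.pack(">LL", 0, 0) + frames

offset, block_size, sound = parse_ssnd(payload)
print(struct.unpack(">4h", sound))  # (100, -100, 200, -200)
```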

Other Optional AIFF Chunk Types

These chunk types are optional and may not be present in an AIFF file. The list is compiled from a variety of sources and includes some non-standard items. It may help when disassembling AIFF files. Some chunk types may only be supported by specific implementations.

FourCC ID | Description
MARK | Markers point to uncompressed sample frame locations. They are used by instruments to define loop points or as chapter marks and cue points for UI controls.
INST | Instrument description that configures the sound generation.
MIDI | MIDI Data containing system exclusive data, note on/off signals and controller instructions.
AESD | Recording device configuration.
APPL | Application specific information.
SAXL | Hardware sound accelerator configuration and parameters. This was experimental when the AIFF-C specification was being compiled. Refer to Annex D for more information.
COMT | Comment texts that describe the file content.
NAME | Name of the sampled audio.
AUTH | Author/creator of the recording.
(c){space-character} | Copyright notice and date. Note the use of punctuation characters and the trailing space-character.
ANNO | Annotation carrying an additional commentary text.
ID3{space-character} | Non-standard extension for carrying ID3 tag data. Note the trailing space-character.

Tagging & Metadata

Searching online for information about metadata storage in AIFF files is challenging. Some commentators state that AIFF files cannot carry metadata. Others tell us that ID3 is incompatible with AIFF containers. Neither of these assertions is true.

AIFF is flexible and extensible and carries data chunks other than audio samples. This is evident from the use of AIFF in iTunes, where artist, track information and cover artwork survive when AIFF files are moved from one library to another.

Any arbitrary metadata can be stored in the text chunks within an AIFF file. Since that data does not affect the sound playback, it could be ID3 structured tag data.

There are three embedded metadata/tagging approaches suitable for AIFF files. Externally maintained metadata is always a possibility too:

  • AIFF standardized Native chunks - The AIFF specification describes Name, Author, Comment, Annotation, and Copyright text chunks. These are well supported by applications running on macOS. The specification is completely open and available for other platform developers to implement.
  • ID3 tags - These are widely supported by many tools. Some tools are free and others have commercial fee-based licenses. All platforms are supported to some extent. You must use ID3v2 structured tag data in AIFF files.
  • XMP - The Adobe Extensible Metadata Platform (XMP) was devised for use in JPEG images but can also be used with AIFF files. Even though XMP has been standardized as ISO 16684-1:2012, the tools to support it are predominantly provided as proprietary solutions by Adobe.
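
As a sketch of the first approach, a native text chunk such as NAME, AUTH, ANNO or '(c) ' is a FourCC, a 32-bit Big-Endian size and the text, with a zero pad byte when the text length is odd. The helper name make_text_chunk is an assumption for this example, and it assumes plain ASCII text.

```python
import struct

def make_text_chunk(chunk_id: bytes, text: str) -> bytes:
    """Build a text chunk: FourCC, 32-bit size, ASCII text, optional pad byte."""
    if len(chunk_id) != 4:
        raise ValueError("a FourCC must be exactly four bytes")
    payload = text.encode("ascii")
    chunk = chunk_id + struct.pack(">L", len(payload)) + payload
    if len(payload) % 2:
        chunk += b"\x00"  # pad byte, not counted in the size field
    return chunk

name = make_text_chunk(b"NAME", "Intro")
notice = make_text_chunk(b"(c) ", "2024 Example")
print(len(name), name[-1])  # 14 0
```

When a chunk like this is added to an existing file, the size in the FORM header must also be increased by the length of the new chunk, including any pad byte.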

There are occasional references online to an 'ID3 ' chunk (note the embedded space character). This is not mentioned in any AIFF specifications. Theoretically, it should not cause any problems because players are supposed to ignore chunks that they do not recognize. Non-standard chunks may not survive an edit cycle though.

Conclusion

Although AIFF is an old format, it is a good solution for storing uncompressed audio, especially for long-term archiving purposes.

Whilst this is an ancient and no longer actively developed format, files of this type will be widely used in libraries and archives. It has been sufficiently popular that media libraries will very likely continue to hold AIFF and AIFF-C files for hundreds of years into the future.
