The Sponsors Perspective: An Ambisonics Primer

Sennheiser examines the theory, implementation, and uses of the Ambisonic soundfield, and its important role in the immersive audio world.


This article was first published as part of Essential Guide: Immersive Audio Pt 2 - Immersive Audio Compatibility

Ambisonics is probably the original speaker-agnostic immersive format, and it’s been waiting a while for everyone to catch up. If you’re familiar with the Mid-Side microphone technique, that gives you an idea of how this format works - in those terms, ‘first-order’ Ambisonics is essentially a central omni-directional ‘mid’ or pressure component (W), plus three different ‘side’ figure-of-eights: orientated front-back (X), left-right (Y), and up-down (Z). These four signals make up the so-called ‘B-Format’ first-order Ambisonic format.
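To make the Mid-Side analogy concrete, here is a minimal sketch of how a single mono sample could be panned into the four first-order components. The function name and the angle conventions (azimuth measured anticlockwise from straight ahead, elevation measured upwards) are assumptions made for illustration, as is the traditional -3 dB gain on W; other conventions exist.

```python
import math

def encode_first_order(sample, azimuth_deg, elevation_deg):
    """Pan one mono sample into first-order B-format (W, X, Y, Z).

    Assumptions for this sketch: azimuth is anticlockwise from the front,
    elevation is upwards from the horizontal plane, and W carries the
    traditional 1/sqrt(2) gain.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2)                  # omni 'mid' (pressure) component
    x = sample * math.cos(az) * math.cos(el)   # front-back figure-of-eight
    y = sample * math.sin(az) * math.cos(el)   # left-right figure-of-eight
    z = sample * math.sin(el)                  # up-down figure-of-eight
    return w, x, y, z

# A source straight ahead excites only W and X; one hard left only W and Y.
front = encode_first_order(1.0, 0.0, 0.0)
left = encode_first_order(1.0, 90.0, 0.0)
```

Note that no speaker position appears anywhere in the calculation, which is the sense in which the format is speaker-agnostic.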

This is not an object-based format like Atmos. In fact, if you tried to split Ambisonics up into individual objects with position you would defeat one of its most useful features. All components, together, form the entire soundfield and are, as such, inseparable. However, it is a speaker-agnostic immersive format as it does describe a full 360-degree sound field without referencing speaker positions.

Because of the way this format stores the soundfield, it can easily be ‘decoded’ to any type of speaker set-up with any number of speakers. Panning and effects can also be implemented directly in B-format, which maintains that speaker-agnostic status. This explains its starring role in the upcoming 360-degree video boom - especially with live head tracking, which enables audio sources to effectively remain static in the space while the video reflects the viewing angle. It can also be relatively easily encoded with environment and/or HRTF information at the replay end if required to enhance the soundfield for headphones (see Essential Guide “Immersive Audio – Part 1” on binaural audio and the personalised HRTF).
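As a rough illustration of both ideas, the sketch below rotates a first-order soundfield to compensate for head yaw and then decodes it to an arbitrary horizontal speaker ring. It is a simplified example under stated assumptions rather than a production decoder: the function names are hypothetical, W is assumed to carry the traditional 1/sqrt(2) gain, angles are anticlockwise from the front, and each speaker is fed by a basic ‘virtual cardioid’ pointed at it.

```python
import math

def rotate_yaw(w, x, y, z, yaw_deg):
    """Rotate the soundfield anticlockwise about the vertical axis.
    Only X and Y change; W and Z are unaffected by yaw."""
    a = math.radians(yaw_deg)
    return (w,
            x * math.cos(a) - y * math.sin(a),
            x * math.sin(a) + y * math.cos(a),
            z)

def compensate_head_yaw(w, x, y, z, head_yaw_deg):
    """Counter-rotate the field so sources stay fixed in the room
    while the listener's head turns (head tracking)."""
    return rotate_yaw(w, x, y, z, -head_yaw_deg)

def decode_horizontal(w, x, y, speaker_azimuths_deg):
    """Feed each speaker from a virtual cardioid aimed in its direction.
    Assumes W has a 1/sqrt(2) gain, hence the sqrt(2) make-up factor."""
    feeds = []
    for az_deg in speaker_azimuths_deg:
        az = math.radians(az_deg)
        feeds.append(0.5 * (math.sqrt(2) * w
                            + x * math.cos(az)
                            + y * math.sin(az)))
    return feeds

# A source straight ahead, listener turns 90 degrees to the left:
# after compensation the source sits to the listener's right (negative Y).
w, x, y, z = compensate_head_yaw(1 / math.sqrt(2), 1.0, 0.0, 0.0, 90.0)

# The same B-format signal can feed a square, a hexagon, or any other ring.
square = decode_horizontal(1 / math.sqrt(2), 1.0, 0.0, [45, 135, -135, -45])
```

The rotation touches only the B-format signals, and the speaker layout enters only at the final decode step, so the same mix can be monitored on different systems.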

The Sennheiser AMBEO A-B converter for transforming A-Format channels into B-Format Ambisonics components.

Getting High

Higher Order Ambisonics (HOA) - anything above 1st order - effectively increases the number of ‘sides’ in our virtual Ambisonic microphone (except they’re no longer figure-of-eights) - a mathematical idea termed spherical harmonics. As you work your way up the ‘orders’ of Ambisonics, the effective resolution of the sound field increases, the sweet-spot gets bigger, and the number of channels required goes up too: for second-order Ambisonics you need nine channels, for third-order you need 16.
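Those channel counts follow a simple pattern: a full-sphere soundfield of order N uses (N + 1)² components. A small sketch, with a hypothetical function name, makes the growth easy to check:

```python
def ambisonic_channel_count(order):
    """Number of channels for full-sphere Ambisonics of the given order."""
    if order < 0:
        raise ValueError("order must be zero or greater")
    return (order + 1) ** 2

for order in range(1, 4):
    print(order, ambisonic_channel_count(order))
# prints:
# 1 4
# 2 9
# 3 16
```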

You don’t need a microphone to create and work with an Ambisonic sound field. There are plenty of Ambisonic panning and processing tools available for different platforms, DAWs, phones, and so on, including headphone encoders for working on Ambisonics when you don’t have the luxury of lots of speakers, along with head tracking options so you can effectively monitor your head-tracking-enabled VR 360 mixes.

Ambisonic audio is specified as an option for both MPEG-H Audio and for DTS-UHD, and therefore can be part of DVB-MPEG/UHD or ATSC 3.0 broadcasts.

Slightly confusingly, standard formats and files for higher order Ambisonics are rather fraught with variations, mainly because there are different options for the derivation and ordering of the spherical harmonic components. The main sequences are ACN and Furse-Malham (FuMa). ACN starts with WYZX for 1st order while FuMa starts with WXYZ. It’s important to be aware of the potential for mixing up the order, which will definitely lead to a disappointing, or disorientating, Ambisonic experience. There are also different options for the normalisation of those components, such as maxN (for FuMa ordering), SN3D, N3D, and more. Of the proposed file formats, AmbiX seems to be the most popular option and is scalable to any order. It uses ACN ordering, SN3D normalisation, and the Core Audio Format (.caf) container.
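For first order, moving between the two conventions is a small operation, sketched below with a hypothetical function name. It assumes the usual FuMa convention in which W is stored 3 dB down, so W is scaled by sqrt(2) and the remaining components are simply reordered. Higher orders need per-channel gains as well and are not covered here.

```python
import math

def fuma_to_ambix_first_order(w, x, y, z):
    """Convert one first-order frame from FuMa (W, X, Y, Z order) to
    AmbiX (ACN order W, Y, Z, X with SN3D normalisation).

    Assumption: FuMa W carries a 1/sqrt(2) gain, so it is restored here;
    X, Y and Z keep their level at first order.
    """
    return (w * math.sqrt(2), y, z, x)

# Distinct values make the reordering visible.
ambix_frame = fuma_to_ambix_first_order(1 / math.sqrt(2), 0.1, 0.2, 0.3)
# ambix_frame is approximately (1.0, 0.2, 0.3, 0.1)
```

Feeding a FuMa file into an AmbiX decoder without this step swaps the front-back axis with left-right and up-down, which is the disorientating result described above.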

YouTube and Facebook now support 360 video and Ambisonic audio. In fact, there is a free software suite called Facebook 360 Spatial Workstation for designing spatial audio for Facebook, which is also compatible with YouTube 360 spatial audio metadata. YouTube’s encoding process specifies the Spatial Media Metadata Injector.

Ambisonic Microphones

The standard way of recording Ambisonics has always been a tetrahedral array of cardioid capsules. This was first seen in the Soundfield Microphone, brought to market in the 70s by Calrec. More recently, a good number of tetrahedral array mics have come to market, made economically viable by the upsurge of interest in immersive audio and probably, in particular, the 360 video trend.

The Sennheiser AMBEO VR microphone for Ambisonic recording.

The raw audio from a tetrahedral array of cardioid microphones is normally termed ‘A-format’. This can then be transformed into the B-Format 1st-order Ambisonic components of W, X, Y, and Z.
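At its core, that transformation is a set of sums and differences of the four capsule signals. The sketch below shows the idealised matrix only, assuming a capsule layout of front-left-up, front-right-down, back-left-down, and back-right-up; the function name is hypothetical. A real converter would also apply gain scaling and frequency-dependent correction filters to account for the spacing between capsules, which this example leaves out.

```python
def a_to_b_format(flu, frd, bld, bru):
    """Idealised A-format to first-order B-format conversion for one sample.

    Assumed capsule layout: front-left-up (flu), front-right-down (frd),
    back-left-down (bld), back-right-up (bru). No gain scaling or
    capsule-spacing correction is applied.
    """
    w = flu + frd + bld + bru   # all capsules summed: omni pressure
    x = flu + frd - bld - bru   # front minus back
    y = flu - frd + bld - bru   # left minus right
    z = flu - frd - bld + bru   # up minus down
    return w, x, y, z

# Sound arriving equally at all capsules has no directional component.
print(a_to_b_format(1.0, 1.0, 1.0, 1.0))   # (4.0, 0.0, 0.0, 0.0)
# Sound reaching only the two front capsules shows up in W and X.
print(a_to_b_format(1.0, 1.0, 0.0, 0.0))   # (2.0, 2.0, 0.0, 0.0)
```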

The Sennheiser AMBEO VR microphone is one such product and fits into the Sennheiser AMBEO immersive technology landscape along with products like the free AMBEO Orbit plug-in for mixing various sources into binaural audio, plug-ins from its partner in VR, Dear Reality, the Neumann KU 100 dummy head microphone, and - for the end-user - the high-end Sennheiser AMBEO Soundbar.

The AMBEO VR microphone uses four matched KE 14 capsules and outputs four corresponding audio channels for the A-Format feed. It also comes with the A-B converter tool for getting the A-Format signal into a DAW in B-Format with various adjustments, such as FuMa or AmbiX ordering / normalisation, microphone position, and filters.

Ambisonic Potential

The rise of Ambisonics has been a long time coming. The very fact that people are waking up to the advantages of speaker-agnostic immersive audio, and that the consumer now has the technology and every opportunity to experience it in many convenient forms, is driving this boost.

It fits very nicely into the grand immersive scheme along with object-based audio, channel-based beds with height, and binaural audio for headphones, which is why it’s included in the MPEG-H Audio and DTS-UHD specs. A-format capture is well-suited to encoding into a channel-based bed as well, so even if you didn’t want to include the raw Ambisonic channels, the techniques and technology can be the basis of a high-quality ambience feed for sports broadcast and so on.

Ambisonics should be a valuable part of your immersive audio toolbox. 
