The Sponsor's Perspective: The Personal HRTF - An Aural Fingerprint

HRTF stands for Head Related Transfer Function and, simply put, is a catch-all term for the characteristics a human head imparts on sound before it enters the ear canal. Everything from the level and tonal changes caused by our head, shoulders, and pinna (the external parts of the ear) to the arrival-time differences between the two ears (the Interaural Time Difference, or ITD) affects our perception of the direction and distance of sources.
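The ITD component can be approximated with a simple model. The sketch below uses Woodworth's spherical-head formula, ITD = (a/c)(sin θ + θ), where a is the head radius and c the speed of sound; the formula and the default head radius are textbook approximations, not something taken from the article.

```python
import math

def itd_woodworth(azimuth_deg, head_radius_m=0.0875, speed_of_sound=343.0):
    """Approximate the Interaural Time Difference (ITD), in seconds, for a
    source at the given azimuth, using Woodworth's spherical-head model:
        ITD = (a / c) * (sin(theta) + theta)
    where a is the head radius and c is the speed of sound."""
    theta = math.radians(azimuth_deg)
    return (head_radius_m / speed_of_sound) * (math.sin(theta) + theta)

# A source directly ahead produces no interaural delay; a source at 90
# degrees to one side produces the maximum delay, roughly 0.66 ms for an
# average-sized head.
print(f"{itd_woodworth(0) * 1e6:.1f} us")
print(f"{itd_woodworth(90) * 1e6:.1f} us")
```

A renderer that gets this delay (and the matching level and spectral differences) right for each direction is, in effect, reproducing one slice of the listener's HRTF.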


This article was first published as part of Essential Guide: Immersive Audio Pt 1 - An Immersive Audio Primer

It’s a concept that explains why binaural sound needs headphones, for example. If you record a source by sticking two microphones in your ears, that recording incorporates your HRTF, applied to both the direct sound and the room reflections. If you then play that back through speakers, the HRTF effect becomes a disadvantage: the sound is colored a second time, and new room reflections clash with those already in the recording. The recording has to be replayed through headphones to avoid the HRTF effect being imparted twice.

A Place of Your Own

Part of the issue with immersive audio reproduced through speakers in a space is the effective localization of sources within that space. With object-based reproduction or with wave field synthesis you can approximate actual source position, but in the end it all gets injected into your ear canal after processing by your own HRTF. Therefore, a binaural source over headphones should be capable of producing the ultimate immersive experience.

However, everyone has their own personal HRTF. Our aural perception filter is as personal as a fingerprint. A generic binaural signal such as might be recorded with a ‘dummy head’ microphone will be a good approximation, but to a certain extent it will always be like looking through someone else’s spectacles.

What if you could easily measure and define your own HRTF? That could then be used by rendering engines to produce a personalized binaural feed from any source – including the most extreme object- and scene-based immersive formats. Set-top boxes, sound cards, games, mixing console monitoring sections, and DAWs could all incorporate rendering engines based on personalized HRTFs.

Enter SOFA

The SOFA file, or ‘Spatially Oriented Format for Acoustics’, is a general-purpose file format for storing spatial acoustic data, standardized by the AES as ‘AES69’. The data does not have to be an HRTF: it could equally describe a specific listening position in a room, or model the full acoustic response of a concert hall at various positions, for example.

The data is made up of multiple impulse responses: each is a representation of how a given input is changed at an output. In the case of an HRTF measurement, each impulse response represents one ear’s response to a source from a particular direction, defined by elevation and azimuth. Therefore, to measure an HRTF with microphones you need to take enough responses to adequately represent the full sphere of source positions around the test subject.
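Once you have an impulse response per ear for a direction, rendering a source from that direction is a convolution. The sketch below uses toy, hand-made impulse responses purely to illustrate the mechanism; real HRIRs from a SOFA file would be measured data with far richer structure.

```python
import numpy as np

def binaural_render(mono, hrir_left, hrir_right):
    """Convolve a mono source with a pair of head-related impulse
    responses (HRIRs) to produce a two-channel binaural signal."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=0)

# Toy HRIRs standing in for real measurements: the right ear hears the
# source slightly later (interaural time difference) and quieter (head
# shadow) than the left, as for a source off to the listener's left.
fs = 48000
hrir_l = np.zeros(64)
hrir_l[0] = 1.0    # direct arrival at the near ear
hrir_r = np.zeros(64)
hrir_r[30] = 0.6   # ~0.6 ms later and attenuated at the far ear

mono = np.random.default_rng(0).standard_normal(fs // 10)  # 100 ms of noise
stereo = binaural_render(mono, hrir_l, hrir_r)
print(stereo.shape)  # (2, 4863): 4800 samples + 64-tap HRIR - 1
```

A full renderer repeats this for every source, using the impulse-response pair whose measured direction matches where the source should appear.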

How many responses is enough? Well, this method of modelling and quantifying HRTFs is not new, and the University of California, Davis’ CIPIC Interface Laboratory HRTF Database has been in existence for some time: a compiled library of HRTFs in which each one is made up of 1,250 directional readings for each ear of the subject. However, counts of around 200 readings are more common, as in the LISTEN library, a joint project between microphone and headphone manufacturer AKG and IRCAM (the Institute for Research and Coordination in Acoustics/Music).
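Those totals come from crossing a set of azimuth steps with a set of elevation steps around the subject. The grid below is purely hypothetical, chosen only to show how the direction counts multiply up; real databases use their own, often non-uniform, sampling schemes.

```python
import numpy as np

# Hypothetical measurement grid: azimuth steps around the head crossed
# with elevation steps from below ear level to directly overhead. Each
# (azimuth, elevation) pair needs one impulse-response measurement per ear.
azimuths = np.arange(-180, 180, 15)   # every 15 degrees around the head
elevations = np.arange(-40, 91, 10)   # -40 to +90 degrees in 10-degree steps

directions = [(az, el) for el in elevations for az in azimuths]
print(len(directions))  # 24 azimuths x 14 elevations = 336 directions
```

Halving either step size roughly doubles the measurement count, which is why sessions with real microphones in real ears take hours.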

Aural ID

Thankfully, an alternative to sitting in an anechoic chamber for several hours has arrived. Genelec recently announced its new Aural ID process, which models an individual’s HRTF and compiles it into a SOFA file without sticking microphones in anyone’s ears.

The idea is to create each model from a 360-degree video of the customer’s head and shoulders, which can be captured simply on a high-quality mobile phone.

Simplified HRTF: a couple of the HRTF aspects that help determine source direction. The real HRTF is more complicated than this: it involves the entire upper torso and acts in three dimensions, where both elevation and azimuth are relevant.

That video is uploaded to the Genelec web-based calculation service, which builds a virtual 3D model, including especially detailed modelling of the pinna. This model is then put through a full-wave analysis using many virtual sources from many angles, which in turn generates the full HRTF data and the SOFA file.

Once you have your own personal HRTF data, a rendering engine can personalize any sound reproduction specifically for your headphones, bringing stereo and immersive content straight to your ear canals, and missing out those pesky monitors.
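A rendering engine working from a measured set like this has a finite list of directions to draw on, so for each source it must pick (or interpolate between) the nearest measured directions. The nearest-neighbour lookup below is a deliberately minimal sketch of that step; production renderers interpolate between neighbouring responses rather than snapping to one.

```python
import numpy as np

def to_unit_vector(azimuth_deg, elevation_deg):
    """Convert an (azimuth, elevation) direction into a 3D unit vector."""
    az, el = np.radians(azimuth_deg), np.radians(elevation_deg)
    return np.array([np.cos(el) * np.cos(az),
                     np.cos(el) * np.sin(az),
                     np.sin(el)])

def nearest_direction(target, measured):
    """Index of the measured direction closest to the target direction,
    judged by the angle between their unit vectors (largest dot product)."""
    t = to_unit_vector(*target)
    dots = [to_unit_vector(az, el) @ t for az, el in measured]
    return int(np.argmax(dots))

# Five measured directions: front, left, back, right, and overhead.
measured = [(0, 0), (90, 0), (180, 0), (270, 0), (0, 90)]
print(nearest_direction((80, 10), measured))  # 1 -> the (90, 0) measurement
```

The renderer would then convolve the source with the impulse-response pair stored at that index, exactly as in the earlier convolution step.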

Of course, the monitors themselves, the room they are in, head movements, and other people listening with you have such a significant effect on a social listening experience that Aural ID is unlikely to spell the end of monitors just yet (something Genelec is no doubt pleased about), but this technology does have some significant practical applications and advantages in both consumer and professional worlds.

Immersive games should get a big reality boost for a start, and if mixing on headphones is necessary, it won’t be such a hit-and-miss affair if your DAW or console headphone output can model stereo, surround, and immersive experiences comparable to loudspeaker reproduction at the touch of a button.

The Aural ID service should be available from Genelec very soon.

The SOFA file format is already in use in game development: it is the specified format for Steam Audio from Valve Corporation, for example, a solution for developers that integrates environment and listener simulation.

Personalized HRTFs can be loaded into the Unity, FMOD, Unreal, and C environments, so expect to be able to load your Aural ID into your favorite VR game in the not-too-distant future.

A Head Related Future

In the creative space, you could argue that awareness of HRTF and its effects could inform mixers and engineers to an extent, particularly in narrative audio and effects for film and TV. But because of the issues around headphones versus monitors, and the complications of generating content for every eventuality, history has generally settled on ignoring HRTF principles, choosing to mix on monitors and leave everything else to take care of itself. Binaural productions have tended to be niche products because translation has been best assured using in-room monitoring.

However, listening habits are changing, and more people are putting on headsets and consuming content as a personal experience. Real-time rendering of a binaural experience from immersive source material is already happening, and it will be completely relevant to how we approach broadcast audio production in the future.
