Microphones: Part 3 - Human Auditory System

To get the best out of a microphone it is important to understand how it differs from the human ear.


Superficially a microphone resembles an ear. There is a diaphragm that vibrates with the sound in both cases, and some sort of transducer, but that's as far as it goes. The human auditory system (HAS) follows the transducer with a highly evolved and sophisticated signal processor, which the microphone simply doesn't have.

The HAS is an integral part of the human senses and evolved as a survival tool in humans and other animals alike. It is the nature of evolution that the HAS as we now find it is the best-adapted system for survival. Living things with inferior hearing would have a reduced chance of survival.

Long before speech or music evolved, the function of the HAS could be defined relatively simply as extending consciousness into the realm of sound. Consciousness is what defines a living being. It is no more and no less than an awareness of the surroundings and the ability to make and execute decisions that improve the chances of survival. Avoid threats; find sustenance and a mate.

From a survival standpoint, the most useful information that might be gained from a sound is to establish the place where it was generated and the likely size of the source. In the real world, a genuine sound source is seldom heard alone. Real sound sources may generate sound in all kinds of directions and the listener is in the position shown in Fig.1, in which reflections approach from many directions, none of which is the true direction.

Establishing the direction and preventing the reflections from causing confusion are done by one and the same mechanism, which in evolutionary terms is extremely old. This mechanism necessarily works in the time domain.

In vision we are accustomed to the idea of our peripheral vision looking out for any movement that would indicate a change to our surroundings, be that a threat, a benefit or neither. Movements also result in sounds, so it is not unreasonable to augment peripheral vision with sonic information.

Fig.1 - In the real world any sound source will be accompanied by reflections that come from anywhere but the true location of the source.

The majority of sounds directly resulting from movement are transient in nature. The slamming of a door; the breaking of a twig underfoot; the horseshoe against stone. There is no periodicity in a single event and the concept of frequency or pitch does not exist in the transient with which a new sonic event begins. Much later the structure of whatever received the blow reveals itself in the existence of resonances. In most cases the resonances are incidental. Only in the case of musical instruments are they deliberate and controlled.

Given the two-stage creation of real sounds, it is hardly surprising that the HAS evolved a two-stage system of dealing with them. The first stage works on the time of arrival of transients. The HAS employs two ears that are spaced apart by the head. A sound from dead ahead would arrive at both ears simultaneously. A sound arriving at an angle would reach one ear sooner. Reflected sounds must have traveled by a longer path and will arrive later. Reliance on the first version of a new sonic event is called the precedence effect.
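The scale of the arrival-time differences involved is easy to estimate from geometry. As a rough sketch in Python (the ear spacing and speed of sound are assumed typical values, and diffraction around the head is ignored), the delay is simply the extra path length divided by the speed of sound:

```python
import math

# Rough interaural time difference from simple geometry.
# Head diffraction is ignored; values are assumed, not measured.
EAR_SPACING_M = 0.17      # typical distance between the ears
SPEED_OF_SOUND = 343.0    # m/s in air at room temperature

def itd_ms(angle_deg):
    """Delay between the ears for a source at the given angle off dead ahead."""
    return 1000 * EAR_SPACING_M * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND

for angle in (0, 30, 60, 90):
    print(f"{angle:>2} deg off axis: {itd_ms(angle):.2f} ms")
# 0.00, 0.25, 0.43 and 0.50 ms: always well under a millisecond
```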

The HAS contains memory and is capable of inserting controlled delays into the signals in order to find those that result in strong correlation. For example, in a real acoustic environment an earlier sonic event will continue to reverberate for some time, and the correlation process will recognize the reflections. A new sonic event, however, will not correlate with anything that has gone before; instead there will be correlation between the signals from the two ears once the appropriate delay has been found, a delay of less than a millisecond that corresponds to the angle of arrival.
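As an illustration of that correlation search, here is a minimal numerical sketch in Python with NumPy. The signals, sample rate and true delay are invented for the example; the sketch simply tries every plausible delay and keeps the one at which the two ear signals correlate most strongly:

```python
import numpy as np

# Toy sketch: find the interaural delay of a transient by trying
# candidate delays and keeping the strongest correlation.
fs = 48000                                # sample rate, Hz (assumed)
rng = np.random.default_rng(0)
transient = rng.standard_normal(256)      # noise burst standing in for a transient

true_itd = 20                             # true offset in samples (~0.4 ms)
left = np.concatenate([transient, np.zeros(true_itd)])
right = np.concatenate([np.zeros(true_itd), transient])

# The interaural delay never exceeds about a millisecond, so only
# that range of lags needs to be searched.
max_lag = int(0.001 * fs)
lags = np.arange(-max_lag, max_lag + 1)
scores = [float(np.dot(left, np.roll(right, -lag))) for lag in lags]
best = int(lags[int(np.argmax(scores))])
print(f"estimated delay: {best} samples = {1000 * best / fs:.2f} ms")
```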

A reflection from the new sonic event will be correlated with the first version when the appropriate delays have been found.

Fig.2a) shows the timing of the direct sound and two early reflections at the ears whereas Fig.2b) shows the delay structure created in the HAS such that the six different versions of the sound all come out at the same time and can be added up to form one coherent sound.

Fig.2 - At a) direct sound and two reflections result in received sound having three different timings. At b) the Haas effect as if by magic time-aligns the direct sound and the reflections to allow it to be heard clearly. The poor microphone can't do that.

This mechanism was first explained by Helmut Haas in 1949 and is known in his honor as the Haas effect. The sound energy in early reflections is time-aligned and used to augment the direct sound, and is not heard as a reflection.
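The benefit of that time alignment can be made concrete with a small sketch in Python with NumPy (the delays and gains of the reflections are assumed values, purely for demonstration). Delaying the earlier arrivals so that all three coincide lets them add coherently instead of smearing the sound:

```python
import numpy as np

# A direct sound and two reflections, modeled as delayed, attenuated
# copies of a unit impulse. Delays (in samples) and gains are assumed.
pulse = np.zeros(1024)
pulse[0] = 1.0
arrivals = [(0, 1.0), (120, 0.6), (300, 0.4)]   # (delay, gain)
versions = [g * np.roll(pulse, d) for d, g in arrivals]

unaligned = sum(versions)                       # what a simple microphone picks up

# Insert a compensating delay into each earlier arrival so that all
# three coincide with the latest one, then sum them coherently.
latest = max(d for d, _ in arrivals)
aligned = sum(np.roll(v, latest - d) for v, (d, _) in zip(versions, arrivals))

print("peak without alignment:", unaligned.max())   # 1.0, three separate spikes
print("peak with alignment:   ", aligned.max())     # 2.0, one coherent spike
```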

The HAS is also equipped to concentrate on sounds arriving from a particular direction by inserting a suitable delay in the leading ear such that the sound from the trailing ear time aligns. This is the basis of attentional selectivity, also known colloquially as the cocktail party effect, until someone concludes that such a term is not politically correct.

The HAS is thus well equipped to function in reverberant environments without losing the ability to locate sound sources and with relatively undiminished intelligibility. The poor microphone has none of this signal processing ability and in comparison with the HAS it is primitive. One of the key things to remember about microphones is that they don't hear as well as we do: they are unable to distinguish between the real sound and a reflection, and they are unable to concentrate on a wanted sound. To use microphones effectively it is necessary to learn to listen as they do, and then to compensate if the result isn't good enough.

One learns as a general rule to keep microphones away from reflecting surfaces, especially if such surfaces are hard and flat and produce specular reflections. Tabletops are notorious reflectors and there are some interesting solutions to the problem. One of these is the pressure-zone microphone, which is intended to be mounted essentially flush with a tabletop.

Mounting the microphone at the reflecting surface means that there can be no time delay between the arriving sound and the reflection. As far as the microphone is concerned there is no reflection to damage the time-domain waveform. Being located where the reflection takes place, the pressure-zone microphone experiences a 6dB boost in level, because the arriving sound and the reflected sound add coherently.
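The 6dB figure is simple arithmetic: at the boundary the incident and reflected waves are in phase, so the sound pressure doubles, and a doubling of pressure corresponds to 20 log10(2) decibels:

```python
import math

# Pressure doubles where incident and reflected sound add coherently.
print(f"{20 * math.log10(2.0):.2f} dB")   # 6.02 dB
```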

Fig.3 - In difficult locations it may be better to aim the null of the microphone at the unwanted sound.

Spaces used for recording purposes may receive acoustic treatment such that reflections become beneficial, but such solutions are not available out in the real world, where ignorance about acoustics and hostility to the microphone are almost total.

In a real room, a source of sound builds up a sound field that is the result of countless reflections in countless directions. That reverberant sound field doesn't differ much in level over large parts of the room. On the other hand the level of the direct sound from the actual source falls with distance according to the inverse square law.
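A rough numerical sketch of that behavior in Python (the 1m reference level and the reverberant level are assumed values, not measurements) shows the direct sound falling 6dB for every doubling of distance while the reverberant field holds steady:

```python
import math

REVERBERANT_DB = 70.0    # assumed roughly uniform across the room
LEVEL_AT_1M_DB = 94.0    # assumed direct level 1m from the source

def direct_level_db(distance_m):
    # Inverse square law: level falls 6dB for every doubling of distance.
    return LEVEL_AT_1M_DB - 20 * math.log10(distance_m)

for d in (1, 2, 4, 8, 16):
    print(f"{d:>2} m: direct {direct_level_db(d):5.1f} dB, reverberant {REVERBERANT_DB:.1f} dB")
```

The distance at which the two contributions are equal is known as the critical distance; beyond it the microphone picks up mostly reverberation.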

The relationship between the amount of direct sound and the amount of reverberant sound depends on two inter-related factors: the position of the microphone and its directivity. For a given microphone type, moving the microphone physically towards or away from the sound source changes the mix between direct and reverberant sound. For a given microphone location, changing the directivity has the same effect.

Using an omni-directional microphone as a reference, it can be shown by calculation or measurement that cardioid and figure-of-eight microphones pick up only one third as much reverberant sound. This is known as their random energy efficiency (REE). In a given location a directional microphone will sound drier than an omni.

However, it can also be shown by calculation or measurement that the directional microphone can be made to display the same relationship between direct and reverberant sound simply by moving it further away from the source than the omni microphone. The source-to-microphone distance is increased by the distance factor (DF), which for cardioids and figure-of-eights is about 1.7.
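The figure of 1.7 follows from the inverse square law. Picking up one third of the reverberant energy allows the direct energy to fall by the same factor, and since energy falls with the square of distance the permissible distance ratio is the square root of three. A minimal check in Python:

```python
import math

# Distance factor from random energy efficiency (REE): direct energy may
# fall by the same factor as the reverberant pickup, and energy falls with
# the square of distance, so DF = sqrt(1 / REE).
ree = 1.0 / 3.0                       # cardioid or figure-of-eight
df = math.sqrt(1.0 / ree)
print(f"distance factor: {df:.2f}")   # 1.73, usually quoted as 1.7
```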

The frequently heard claims regarding the superior performance of the omni-directional microphone are not based on any science, as directional microphones can offer exactly the same balance between direct and diffuse sound simply by siting them at an appropriate distance. As will be seen, there is much to be said for siting microphones at a reasonable distance when it is possible, and little to be said for having them too close.

Fig.4 - The cardioid microphone in a live performance is deaf to the fold-back speaker behind it.

There are too many applications of microphones to make generalizations. In music recording in an auditorium optimized for the purpose, picking up reverberation will enhance the recording. On the other hand, trying to interview someone who has just stepped off a train represents a completely different problem, where ambient sounds are not in any way correlated with the wanted speech and must effectively be suppressed.

In such cases the directivity of the microphone can be used to good effect. Fig.3 shows that directional microphones often have nulls in their polar diagrams. Instead of pointing the microphone at the wanted sound source, better results may be obtained if the null of the microphone is aimed at the unwanted sound source.

Fig.4 shows why the cardioid microphone has had universal success in live performances. The null in the response at the rear of the cardioid allows the fold-back speaker for the performer to be placed in line with the back of the microphone, which will not be sensitive to it.
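That null is built into the cardioid's polar equation. A small sketch (using the standard first-order cardioid law; the sample angles are arbitrary) shows the sensitivity falling smoothly to exactly zero at the rear:

```python
import math

# First-order cardioid polar response: unity on axis, zero at the rear.
def cardioid_gain(theta_deg):
    return 0.5 * (1.0 + math.cos(math.radians(theta_deg)))

for angle in (0, 45, 90, 135, 180):
    print(f"{angle:>3} deg: gain {cardioid_gain(angle):.2f}")
# 1.00, 0.85, 0.50, 0.15, 0.00: the fold-back speaker sits at 180 deg
```

Any source placed on that rear axis is effectively inaudible to the microphone, which is exactly what the fold-back arrangement exploits.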
