Microphones: Part 7 - Microphones For Stereophony

Once the basic requirements for reproducing sound were in place, the most significant next step was to reproduce to some extent the spatial attributes of sound. Stereophony, using two channels, was the first successful system.




The human auditory system (HAS) begins with a pair of ears that pick up two versions of any ambient sound. Some highly complex processing follows before we are allowed to hear anything. The processing concentrates on the differences between what the two ears pick up. For an off-center sound source, such differences include timing and frequency response.

The difference in timing can readily be explained by the difference in path length. The presence of the head causes shading effects to the distant ear that mostly affect high frequencies, as at low frequencies the head is acoustically small and essentially not there. The shape of the external ear also causes some direction-dependent frequency response shaping, again restricted to wavelengths small enough to be affected.
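The path-length argument can be put into numbers with a minimal sketch. It assumes a distant source, an ear spacing of 0.18 m and a speed of sound of 343 m/s, and it deliberately ignores the head itself. The function name and both constants are illustrative assumptions, not measured values.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature
EAR_SPACING = 0.18      # m, assumed illustrative distance between the ears


def interaural_time_difference(azimuth_deg, ear_spacing=EAR_SPACING, c=SPEED_OF_SOUND):
    """Extra travel time to the far ear for a distant source.

    azimuth_deg is 0 straight ahead and 90 fully to one side.
    The head is treated as acoustically absent (path length only).
    """
    extra_path = ear_spacing * math.sin(math.radians(azimuth_deg))
    return extra_path / c


if __name__ == "__main__":
    for azimuth in (0, 30, 60, 90):
        itd_us = interaural_time_difference(azimuth) * 1e6
        print(f"{azimuth:2d} deg -> {itd_us:6.1f} microseconds")
```

Under these assumptions the largest difference, for a source fully to one side, is about half a millisecond, which gives a feel for the time scale the HAS works with.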

As the variety of sound the ear can detect is practically without limit, the HAS evolved to obtain the best directional information possible using a variety of mechanisms and selecting the ones that appear best using fuzzy logic. Direction sensing with low frequency sound is quite poor because little shading takes place on the distant ear and the phase difference between the two ears is small on account of the long wavelength. However, it is quite easy to determine the location of a double bass because of the presence of harmonics.

Sustained sounds, ones that have stationary statistics, are typically harder to locate than transient sounds. Some of this is down to basics. All cycles of a sinusoid are the same, so the same interaural phase difference could result from a number of different time differences, causing ambiguity. A further problem with sine waves is that when reproduced from a loudspeaker, a sine wave will excite a pattern of standing waves in the room that makes the sound appear to come from almost anywhere except the loudspeaker.
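The ambiguity can be illustrated with a short sketch. It lists every time difference that would produce the same interaural phase for a steady sine wave, limited to the range the head geometry allows. The function name and the 0.6 ms limit are assumptions chosen for illustration.

```python
import math


def candidate_delays(freq_hz, true_delay_s, max_delay_s):
    """Delays within +/- max_delay_s that give the same interaural phase.

    For a steady sine wave, shifting the delay by a whole number of
    periods leaves the phase difference between the ears unchanged.
    """
    period = 1.0 / freq_hz
    k_min = math.ceil((-max_delay_s - true_delay_s) / period)
    k_max = math.floor((max_delay_s - true_delay_s) / period)
    return [true_delay_s + k * period for k in range(k_min, k_max + 1)]


if __name__ == "__main__":
    true_delay = 0.0002  # 0.2 ms, assumed example
    limit = 0.0006       # 0.6 ms, assumed largest delay the head allows
    for freq in (500.0, 3000.0):
        options = candidate_delays(freq, true_delay, limit)
        print(f"{freq:6.0f} Hz: {len(options)} candidate delay(s)")
```

With these assumed numbers the 500 Hz tone leaves only one candidate, whereas the 3 kHz tone leaves several that the phase alone cannot tell apart. A transient has no such repetition, so the problem does not arise.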

That is consistent with the fact that pure tones have no bandwidth and thus carry no information, so the HAS did not, and could not, evolve to utilize them. Fortunately sine waves are very rare in naturally occurring sound. Everyday sound due to footfalls, closing doors, bats hitting balls and so on is transient and as a result contains timing information as well as a broad spectrum. As a transient is unique, there is no ambiguity in the difference in arrival time at the two ears.

Fig.1 - a) Both ears hear both speakers, but with time differences due to the geometry. b) A summing process takes place at both ears. The apparent source of sound is a function of the relative level of the signals supplied to the speakers.


The HAS accordingly has different amounts of success in localizing sound sources, depending strongly on their transient content. The wailing siren of the traditional emergency vehicle is practically free of transients and although it can be heard well enough, the listener cannot discern where it is, often until the vehicle is visible.

A completely unsuitable warning sound on an emergency vehicle that remains in use for decades simply reinforces the unfortunate conclusion that acoustics is a Cinderella subject about which practically nothing is known in the world at large.

One way of capturing a stereophonic sound image is to use a so-called dummy head, an acoustic replica of the human head fitted with two microphones that pick up essentially what a real person's ears would have heard. It is, however, important to appreciate that the binaural signals from the microphones can only be heard correctly using headphones, which deliver the sound from each microphone only to the appropriate ear. If reproduced on loudspeakers the realism will be lost because both ears can hear both loudspeakers.

Binaural signals intended for headphone reproduction will differ in time, phase, and frequency response and monophonic compatibility is poor. Signals intended for loudspeaker reproduction should differ only in level. These are called intensity stereo signals and mono compatibility is good. Binaural and intensity stereo are essentially two different and incompatible coding schemes, somewhat like TV standards, and interchange between them requires a form of standards conversion.
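One reason for the difference in mono compatibility can be sketched numerically. When the two channels are added to make mono, a pair that differs only in level simply adds, whereas a pair that differs in timing cancels at some frequencies. The sketch below models the right channel as a scaled and delayed copy of the left for a single sine wave. That model, the function name and the 0.5 ms delay are simplifying assumptions, and real binaural signals differ in frequency response as well.

```python
import cmath
import math


def mono_sum_gain(freq_hz, level_left=1.0, level_right=1.0, delay_right_s=0.0):
    """Magnitude of L + R for a sine wave when R is a scaled, delayed copy of L."""
    right = level_right * cmath.exp(-2j * math.pi * freq_hz * delay_right_s)
    return abs(level_left + right)


if __name__ == "__main__":
    for freq in (250.0, 500.0, 1000.0, 2000.0):
        level_only = mono_sum_gain(freq, 1.0, 0.5)
        time_only = mono_sum_gain(freq, 1.0, 1.0, delay_right_s=0.0005)
        print(f"{freq:6.0f} Hz  level-only sum {level_only:.3f}  delayed sum {time_only:.3f}")
```

The level-only pair gives the same mono gain at every frequency. The delayed pair gives a gain that swings between reinforcement and cancellation as frequency changes, which is heard as coloration.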

For related reasons, stereophonic sound intended for loudspeaker reproduction is generally unsatisfactory when heard on headphones. Although the tonal or timbral characteristic may be reproduced well enough, the directional or imaging information will not be and a central sound appears to be inside the head rather than in front of the listener.

One approach to the problem of stereophonic sound is first to consider how a pair of loudspeakers can give the illusion of directionality to the listener. Once that is understood, the characteristics of a suitable stereophonic microphone practically define themselves.

It has been found by experience that the best results are obtained when the loudspeakers and the listener are on the vertices of a more-or-less equilateral triangle. Such an arrangement can reproduce convincing virtual sound sources anywhere between the speakers, but nowhere else.

The central mechanism is that, unlike the use of headphones, both ears hear sound from both loudspeakers. Fig.1 shows that two summing processes take place, one at each ear. Those of us who remember NTSC will be familiar with the idea of creating a chroma signal of any desired phase by adding together in various proportions a pair of signals in quadrature. Something very similar happens at each ear. 

Fig.2 - The pan pot produces a pair of signals that differ only in level, and which steer a virtual source between the speakers.


It can be seen that the geometry of the system causes sound from one speaker to be received at different times by the two ears. The left ear, for example, receives direct sound from the left speaker and delayed sound from the right speaker. If the level emitted from both speakers is the same, the summing process results in identical signals at each ear, which the HAS interprets as coming from a sound source halfway between the speakers.

However, should one speaker be louder than the other, the summation processes will result in waveforms that are earlier in the ear on the louder side and later in the ear on the quieter side. The HAS interprets the time difference normally and places the sound towards the louder speaker.
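The summing process can be checked with a minimal phasor sketch. It assumes a steady low-frequency sine wave, ignores head shading, and treats the sound from the far speaker as arriving at each ear later by a fixed amount. The function name, the 300 Hz test frequency and the 0.25 ms cross delay are assumptions made for illustration.

```python
import cmath
import math


def ear_time_difference(left_level, right_level, freq_hz=300.0, cross_delay_s=0.00025):
    """Time by which the left-ear waveform leads the right-ear waveform.

    Each ear receives the near speaker directly and the far speaker
    delayed by cross_delay_s. The two contributions are added as phasors.
    """
    omega = 2.0 * math.pi * freq_hz
    cross = cmath.exp(-1j * omega * cross_delay_s)
    left_ear = left_level + right_level * cross
    right_ear = right_level + left_level * cross
    # A less negative phase means an earlier waveform.
    return (cmath.phase(left_ear) - cmath.phase(right_ear)) / omega


if __name__ == "__main__":
    for left, right in ((1.0, 1.0), (1.0, 0.5), (1.0, 0.0), (0.5, 1.0)):
        lead_us = ear_time_difference(left, right) * 1e6
        print(f"L={left:.1f} R={right:.1f} -> left ear leads by {lead_us:7.1f} microseconds")
```

Equal levels give no time difference at all. Making the left speaker louder makes the left ear lead, and with only the left speaker driven the lead equals the full cross delay, so a level difference at the speakers has become a time difference at the ears.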

It can now be seen why this system is called intensity stereo. The position of a virtual sound source is determined solely by the difference in level of a pair of signals that are otherwise identical. The fundamental linearity of audio systems means that many such pairs of signals can be superimposed to reproduce many different sound sources at one and the same time.

Fig.2 shows how a pair of intensity stereo signals that places the virtual sound source anywhere between the speakers can be created from a single microphone. The device that does this is the panoramic potentiometer, universally abbreviated to pan pot, which will be found on most audio mixers.
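The action of a pan pot can be sketched as a pair of gains applied to one signal. The sine/cosine (constant-power) law used here is an assumption; the article does not specify a law and real pan pots vary. The function name and the position scale from -1 to +1 are likewise hypothetical.

```python
import math


def pan_gains(position):
    """Return (left_gain, right_gain) for a pan position.

    position runs from -1.0 (fully left) through 0.0 (center) to +1.0
    (fully right). An assumed constant-power law keeps the sum of the
    squared gains equal to one.
    """
    if not -1.0 <= position <= 1.0:
        raise ValueError("position must lie between -1.0 and 1.0")
    angle = (position + 1.0) * math.pi / 4.0
    return math.cos(angle), math.sin(angle)


if __name__ == "__main__":
    for position in (-1.0, -0.5, 0.0, 0.5, 1.0):
        left, right = pan_gains(position)
        print(f"pan {position:+.1f}: left {left:.3f}  right {right:.3f}")
```

With this assumed law both gains are about 0.707 at the center, which is 3 dB down on each channel. Only the levels change; the waveform and its timing are the same in both outputs.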

Fig.3 shows how three different signals can be made to occupy three locations in a stereo mix using pan pots. Many pop records are made in this way. The virtual sound source from a pan pot has zero width, and between the virtual sources the sound stage is empty. The solution is to use artificial reverberation, which produces sound sources across the entire stereo image.
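A mix of this kind is simply the sum of several panned signals, which works because the system is linear. The sketch below is self-contained, so it repeats a hypothetical `pan_gains` helper using an assumed constant-power law. The function names and the three one-sample "sources" are illustrative only.

```python
import math


def pan_gains(position):
    """Assumed constant-power pan law: -1.0 is left, +1.0 is right."""
    angle = (position + 1.0) * math.pi / 4.0
    return math.cos(angle), math.sin(angle)


def mix(sources):
    """Sum panned mono sources onto a stereo bus.

    sources is a list of (samples, position) pairs, where samples is a
    list of floats. Returns (left, right) lists of equal length.
    """
    length = max(len(samples) for samples, _ in sources)
    left = [0.0] * length
    right = [0.0] * length
    for samples, position in sources:
        left_gain, right_gain = pan_gains(position)
        for i, value in enumerate(samples):
            left[i] += left_gain * value
            right[i] += right_gain * value
    return left, right


if __name__ == "__main__":
    # Three hypothetical sources: one left, one center, one right.
    left, right = mix([([1.0], -1.0), ([1.0], 0.0), ([1.0], 1.0)])
    print("left bus:", left, "right bus:", right)
```

Each source keeps its own level ratio within the sum, which is why the three virtual sources remain at three distinct points on the sound stage.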

Fig.3 - A typical pop mix using pan pots to create a stereo sound stage.


As a pan pot is simply a couple of variable resistors, it should be clear that all it can do is to change the relative level or intensity of the output signals. It cannot change the waveform or the timing. If instead of using mono microphones and pan pots we wish to capture stereophonic sound directly, then this reasoning tells us how to do it.

What we need is to produce a pair of signals. Obviously we need two microphones. The signals must not differ in timing, so the two microphones must be coincident, meaning acoustically they are in the same place. Equally obviously if the signals from two microphones in the same place are required to differ, that can only be achieved if they are directional and their axes are separated by some angle.
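The reasoning can be illustrated with a minimal sketch of two coincident microphones whose axes are turned outwards by equal amounts. It assumes ideal first-order polar patterns and a default half angle of 45 degrees. The function names, the `omni_fraction` parameter and the angle convention (positive azimuth towards the left) are all assumptions made for illustration.

```python
import math


def pattern_gain(angle_off_axis_deg, omni_fraction=0.5):
    """Ideal first-order polar pattern.

    omni_fraction of 1.0 is omni-directional, 0.5 is a cardioid and
    0.0 is a figure-of-eight.
    """
    directional = math.cos(math.radians(angle_off_axis_deg))
    return omni_fraction + (1.0 - omni_fraction) * directional


def coincident_pair_gains(source_azimuth_deg, half_angle_deg=45.0, omni_fraction=0.5):
    """Return (left, right) gains for a source at the given azimuth.

    Both microphones are in the same place, so the outputs differ in
    level only. The left microphone points to +half_angle_deg and the
    right microphone to -half_angle_deg.
    """
    left = pattern_gain(source_azimuth_deg - half_angle_deg, omni_fraction)
    right = pattern_gain(source_azimuth_deg + half_angle_deg, omni_fraction)
    return left, right


if __name__ == "__main__":
    for azimuth in (-45.0, 0.0, 45.0):
        left, right = coincident_pair_gains(azimuth)
        print(f"source at {azimuth:+5.1f} deg: left {left:.3f}  right {right:.3f}")
```

A central source gives equal outputs, and a source to one side favors the microphone pointing that way. Setting `omni_fraction` to 1.0 makes the two outputs identical for every direction, which is the point made above: coincident microphones must be directional if their signals are to differ.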

Alan Blumlein was the first to work this out, prior to WW II. He also worked out that the phase differences between signals from a pair of omni-directional microphones could be processed in a device he called a shuffler (a kind of standards converter) to create intensity stereo signals. All of Blumlein's work on stereo was patented and using it required royalties to be paid.

As a result, broadcasters and record companies went to a great deal of trouble to try to find a microphone technique that did not violate Blumlein's patents in order to avoid paying royalties. Any number of these techniques were dreamed up, and they have four things in common. The first is that they could not use either coincidence or shuffling. The second is that their imaging performance is questionable. The third is that there is no theory that explains how they produce a virtual image. Finally, their mono compatibility is poor.

These royalty-avoidance techniques are best described as not-mono, having some spatial attributes, but failing to reproduce a recognizable image. With the mediocre loudspeaker technology of the day, their inferiority was somewhat disguised.
