Microphones: Part 1 - Basic Principles

This 11-part series by John Watkinson looks at the scientific theory of microphone design and use, to create a technical reference resource for professional broadcast audio engineers. It begins with the basic principles of what a microphone is and does.

This series of articles was originally published in 2021. It was very well read at the time and has continued to draw visitors, so we are re-publishing it for those who may have missed it first time around.

There are 11 articles in the series.

The microphone is a transducer, which means that it converts energy from one form to another; specifically, acoustic energy is converted to electrical energy. Unlike a loudspeaker, which has to transduce significant amounts of power in order to generate sound, the microphone has it easier because the amount of power being handled is tiny. It is probably better to think of the microphone as transducing information rather than power.
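To put a rough number on how tiny that power is, the following Python sketch estimates the acoustic power crossing a diaphragm-sized area in a plane wave. The air properties, the 94 dB SPL level and the 12 mm diaphragm diameter are all illustrative assumptions, not figures from this article. The result is the acoustic power arriving at the diaphragm, not the microphone's electrical output.

```python
import math

# Assumed conditions (illustrative only, not taken from the article):
RHO = 1.204      # density of air at about 20 degrees C, kg/m^3
C = 343.0        # speed of sound at about 20 degrees C, m/s
P_REF = 20e-6    # reference pressure for dB SPL, Pa


def rms_pressure(spl_db: float) -> float:
    """Convert a sound pressure level in dB SPL to RMS pressure in pascals."""
    return P_REF * 10 ** (spl_db / 20)


def plane_wave_intensity(p_rms: float) -> float:
    """Acoustic intensity (W/m^2) of a plane progressive wave: I = p^2 / (rho * c)."""
    return p_rms ** 2 / (RHO * C)


def intercepted_power(spl_db: float, diaphragm_diameter_m: float) -> float:
    """Acoustic power (W) crossing a circular area the size of the diaphragm."""
    area = math.pi * (diaphragm_diameter_m / 2) ** 2
    return plane_wave_intensity(rms_pressure(spl_db)) * area


# 94 dB SPL is roughly 1 Pa RMS; the 12 mm diaphragm is a hypothetical size.
power = intercepted_power(94.0, 0.012)
print(f"{power * 1e6:.2f} microwatts")
```

Under these assumptions the figure comes out at a fraction of a microwatt, whereas a loudspeaker is commonly driven with watts or more of electrical power.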

It follows immediately that the average microphone is likely to be rather better than the average loudspeaker. Most people make recordings of a quality that they will never hear, because their loudspeakers won't be as good. Present someone with a new microphone and the first thing they will do is to listen to it through their mediocre loudspeakers, in the mistaken belief that they are testing it. Some aspects of microphone performance, such as phase linearity, will have to be pretty terrible before any problem is heard, because the loudspeaker is typically so much worse.

There are a number of parallels between cameras and microphones that are helpful. For example, the Human Visual System (HVS) is superficially similar to a camera in that it has a lens, an iris and so on, but human vision neither works nor sees like a camera. Successful camera operators know the differences and can compensate for them.

In the same way the Human Auditory System (HAS) is superficially like a microphone in that there is a diaphragm, but human hearing does not work like a microphone and successful microphone use requires knowledge of the differences.

Cameras may have different fields of view and microphones have different acceptance angles, although in the case of the microphone no sound source is ever out of the frame.

Where the parallel breaks down is that one can see mistakes made with cameras and potentially deal with them, whereas one cannot see sound and more mistakes tend to be made. As the human is primarily a visual animal, anything that cannot be seen is likely to be misunderstood and become a source of fear and poor decision-making. Sound shares some of the attributes of nuclear power, electricity, global warming and, topically, viruses in that much of what is said and done in connection with those subjects is highly questionable.

We are all aware of buildings that look nice but are acoustic nightmares. On the other hand, the popularity of preserved steam locomotives is assured because most of the workings are clearly visible.

I was lucky enough to study under one of the world's leading acousticians, Philip Doak. He had been thrown out of MIT by McCarthyism and found a home in England. Their loss was my gain. Doak would regularly refer to acoustics as a "Cinderella subject", one that was mostly noted for its neglect. He was right then and what he said remains true.

Fig.1 - Sound progresses as a series of compressions and rarefactions depicted here by lines getting closer and further apart. There is movement shown by the arrows as well as pressure change and either or both can be sensed. Pressure is non-directional whereas velocity has associated direction.

Ultimately the goal of the microphone is to be part of a system that to some extent takes the human listener to the source of the sound. In some cases we want to increase the extent, to achieve realism for want of a better word. In other cases we may simply want to improve the intelligibility of a message without being bothered by fidelity. In both cases it is necessary to consider not just how the sound interacts with the microphone and its surroundings, but also how the same sound would have interacted with the HAS, for the two are rather different.

Sound is carried by variations above and below ambient pressure, excited by some moving object. Sound waves are longitudinal, meaning that the air moves along the direction of propagation, as opposed to the transverse waves in a guitar string, for example. As Fig.1 shows, at a point where the pressure has risen, air is moving inwards towards the high pressure. Sound therefore consists of both pressure variations and physical movement of the air.
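The link between the pressure variation and the air movement can be made concrete for the simplest case. Assuming a plane progressive wave (a long way from the source), pressure and particle velocity are in phase and related by p = ρcu, where ρ is the density of air and c the speed of sound. The Python sketch below uses assumed room-temperature values for both and shows how small the air movement is at a moderate level.

```python
import math

RHO = 1.204   # assumed density of air at about 20 degrees C, kg/m^3
C = 343.0     # assumed speed of sound at about 20 degrees C, m/s


def particle_velocity(p_rms: float) -> float:
    """RMS particle velocity (m/s) in a plane progressive wave, from u = p / (rho * c)."""
    return p_rms / (RHO * C)


def particle_displacement(p_rms: float, freq_hz: float) -> float:
    """RMS particle displacement (m) for a sinusoid: x = u / (2 * pi * f)."""
    return particle_velocity(p_rms) / (2 * math.pi * freq_hz)


p = 1.0  # 1 Pa RMS, roughly 94 dB SPL
u = particle_velocity(p)
for f in (100.0, 1000.0, 10000.0):
    x = particle_displacement(p, f)
    print(f"{f:>7.0f} Hz: velocity {u * 1e3:.2f} mm/s, displacement {x * 1e9:.1f} nm")
```

For a given pressure, the displacement falls by a factor of ten for every tenfold rise in frequency, because the same velocity is reached in a shorter time.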

Microphones can transduce either or both. One important point to note is that pressure is a scalar quantity: it is not associated with any direction. The pressure in a party balloon acts outwards equally in all directions. It follows that any microphone that measures only pressure cannot help but respond to sound arriving from any direction. Such a microphone is termed omni-directional.

On the other hand, velocity is a vector quantity; it has a magnitude and an associated direction. Any microphone that is in some way directional must be taking notice of the velocity of the air. A microphone that measures nothing else is called a velocity microphone. Velocity microphones and pressure microphones represent the extremes of a scale, and between them can be found all manner of devices that respond to some combination of both.
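The distinction between a scalar sensor and a vector sensor can be sketched in a few lines of Python. This is an idealized model, not a description of any real capsule. The pressure sensor returns the same value whatever the arrival direction, the velocity sensor returns the projection of the air motion onto its axis, and `combined_output` is a hypothetical weighted sum of the two, normalized so that both give unit output on axis.

```python
import math


def pressure_output(p: float, source_deg: float) -> float:
    """Ideal pressure sensor: pressure is a scalar, so direction is ignored."""
    return p


def velocity_output(u: float, source_deg: float, axis_deg: float = 0.0) -> float:
    """Ideal velocity sensor: only the component of air motion along its axis counts.

    The air moves back and forth along the line to the source, so the output
    is u times the dot product of that line's unit vector with the axis.
    """
    line = (math.cos(math.radians(source_deg)), math.sin(math.radians(source_deg)))
    axis = (math.cos(math.radians(axis_deg)), math.sin(math.radians(axis_deg)))
    return u * (line[0] * axis[0] + line[1] * axis[1])


def combined_output(weight: float, source_deg: float) -> float:
    """Hypothetical blend of normalized pressure and velocity outputs.

    weight = 1 gives pure pressure (omni), weight = 0 gives pure velocity
    (figure-of-eight); values in between give intermediate patterns.
    """
    return weight * pressure_output(1.0, source_deg) + (1 - weight) * velocity_output(1.0, source_deg)


for angle in (0, 90, 180):
    print(angle, pressure_output(1.0, angle), round(velocity_output(1.0, angle), 3))
```

With a weight of 0.5 the sum is 0.5 + 0.5 cos θ, the pattern usually called cardioid, which falls to zero at the rear. The sketch assumes the two outputs are perfectly matched in level and phase at every frequency, which is what the combination process must achieve in practice.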

That combination must be done extremely well if other aspects of performance are not to suffer. In many cases it is not done well, which partly explains why many sound engineers prefer the realism of omni-directional and velocity microphones. However, it is not possible for a pair of omni-directional microphones to produce a realistic stereophonic image, because the necessary directionality is simply not there.

Fig.2 - The ideal omni microphone has a circular polar diagram shown at a). Output of a velocity microphone b) shows a cosine effect with direction. At c) the effect of b) is transformed to polar co-ordinates and results in the familiar figure-of-eight response.

Perhaps the most important aspect of any microphone is its polar diagram. A polar diagram resembles the display of a vectorscope used in video production: it is a graph with a center point or origin, where the distance from the origin to the trace is proportional to some quantity that is a function of angle. For a microphone, that quantity is typically the output level.
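A polar diagram can be generated from any response function. The Python sketch below (the function name is illustrative) steps through the angle and places each point at a distance from the origin equal to the response at that angle, so a constant response traces a circle, as for the ideal omni of Fig.2a). It assumes the response is a non-negative level.

```python
import math
from typing import Callable, List, Tuple


def polar_trace(response: Callable[[float], float], steps: int = 360) -> List[Tuple[float, float]]:
    """Sample a polar diagram as (x, y) points.

    For each angle theta (radians), the point lies at distance response(theta)
    from the origin. The response is assumed to be a non-negative level.
    """
    points = []
    for i in range(steps):
        theta = 2 * math.pi * i / steps
        r = response(theta)
        points.append((r * math.cos(theta), r * math.sin(theta)))
    return points


# Ideal omni: the same level at every angle, so the trace is a circle.
omni = polar_trace(lambda theta: 1.0)
```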

Fig.2a) shows that the polar response of an ideal omni-directional microphone is circular.

Fig.2b) shows the response of a velocity microphone in Cartesian co-ordinates. When the diaphragm is parallel to the wave fronts, the level is maximized, but when the diaphragm is turned through 90 degrees to the wave fronts, the air moves across the diaphragm and doesn't affect it, so the response is nulled. Further rotation causes the response to have opposite polarity.

Fig.2c) shows the velocity microphone response of b) converted to a polar diagram. Immediately it can be seen why these devices are often called figure-of-eight microphones. As the polar diagram cannot display negative quantities, the reversed polarity area carries a minus sign.
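The conversion of Fig.2b) into Fig.2c) can be sketched as follows. This is an idealized Python model assuming a pure cos θ response: the signed output is split into a magnitude, which sets the distance from the origin, and a polarity, which is the sign the diagram carries on the reversed lobe.

```python
import math
from typing import Tuple


def figure_of_eight(theta_deg: float) -> Tuple[float, str]:
    """Ideal velocity microphone with signed output cos(theta).

    A polar plot can only show distance from the origin, so the magnitude is
    plotted and the polarity is carried as a separate label.
    """
    signed = math.cos(math.radians(theta_deg))
    return abs(signed), ("-" if signed < 0 else "+")


for angle in (0, 45, 90, 135, 180):
    level, polarity = figure_of_eight(angle)
    print(f"{angle:>3} deg: level {level:.3f}, polarity {polarity}")
```

Close to the 90 degree nulls the output is essentially zero, so the polarity label there is meaningless and is decided only by floating-point rounding.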

The description of sound as pressure and velocity variations is correct but insufficient, as it does not consider the requirements of the HAS. Hearing evolved long before speech emerged. The purpose of hearing was survival, and it had two benefits: the improved avoidance of threats and the increased chances of locating food and/or a partner.

On an evolutionary time scale, the development of telephones and audio systems has occurred in the last microsecond and the results of all of this technology are essentially being absorbed by the auditory system of a cave man.

The overriding requirements of a survival system based on hearing are to be able to locate the source of the sound and to estimate its size, followed by an attempt to identify it if the sonic signature corresponds to something remembered from a previous encounter. The HAS can do all of these things, and how it does them will be considered in due course.

In the real world, sounds reaching the human ear and the microphone consist of a mixture of the direct sound from the source, muddied by any number of reflections. Our cave man would not benefit from running away from a reflection, so the HAS evolved to be able to determine where the true sound source was despite reflections.

Regrettably, a microphone does not have the same ability. The microphone is just a transducer. It may be hearing after a fashion, but it's not listening. This means that in many locations where a human listener can hear adequately well, a microphone will be overwhelmed by reflections, because it doesn't know which is the sound it is meant to capture.

It follows that the successful use of microphones requires the user to listen to the location not as a human, but instead to pretend to be a microphone and listen to what the microphone would hear. If the additional sound that the HAS would reject is more than the microphone could cope with, something will have to be done, either by moving to a different location or by installing some acoustic treatment.

Although the HAS allows us to concentrate on a wanted sound and reject reflections, we are still aware of them and they allow us to draw conclusions about the acoustic of the space we are in. That aspect of sound is generally known as ambience, or air. It follows that if we go too far and shut out that ambience, the resulting sound is going to be dry and unnatural. It may be necessary artificially to add reverberation to a dry recording to make it sound acceptable.
