Microphones: Part 10 - Mid-Side (M-S) Recording And Processing

M-S techniques provide useful sound-field positioning and a convenient way to check mono compatibility. We explain the hard science behind this often misunderstood technique.


This series of articles was originally published in 2021. It was very well read at the time and has continued to draw visitors, so we are re-publishing it for those who may have missed it first time around.

There are 11 articles in the series.


The use of a pair of loudspeakers driven by coincident microphones is also called intensity stereo, which is actually a good name because intensity is a vector quantity having a direction.

In stereophony we are trying to transmit directions as well as the timbral qualities of the sounds. Direction is transmitted as differences in level between the left and right signals corresponding to each sound. It follows immediately that a pair of signals carrying a stereo image must contain more information than two unrelated monophonic signals and must therefore be delivered with greater precision if that information is not to be impaired.

The absolute specification of the end-to-end audio channels is not as important as the requirement that they should be matched in gain, frequency response and phase response. With the advent of digital audio, making a pair of matched audio channels became trivial, whereas matched microphones were a bit harder and matched loudspeakers were harder still.

In acoustics, intensity is defined as the average sound power per unit of area flowing in a specific direction. Power is the product of pressure and velocity and is therefore a vector quantity. Fig.1a) shows a vector describing the intensity of sound leaving a source at one particular location, which happens to be where we put our stereo microphone.

In the left, right signal format the vector would be resolved into the components on the left and right axes as shown in Fig.1b), but the vector itself remains unchanged. It is the direction and length of the vector that matters, not the arbitrary co-ordinate system we built around it. Accordingly we can build any co-ordinate system we like around a vector and move between co-ordinate systems using transforms.

Fig.1c) shows an alternative co-ordinate system that has axes turned 45 degrees compared to the left, right format. The new axes are called M and S, which are abbreviations for Mid and Side. The M axis points to the center of the stereo image and the S axis is at right angles to M.

The transform between Left, Right and M, S is simple. M = L + R and S = L - R.
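As a minimal sketch, the forward transform can be written directly from those two equations (the sample values below are illustrative assumptions, not from the article):

```python
# L,R to M,S transform: M = L + R, S = L - R
def lr_to_ms(left, right):
    mid = left + right    # M points to the center of the image
    side = left - right   # S is at right angles to M
    return mid, side

# A hard-left signal (L = 1, R = 0) gives M = S = 1.
print(lr_to_ms(1.0, 0.0))  # (1.0, 1.0)
```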

Fig.1 a) A sound described by a vector pointing away from the source along the direction of motion of the air particles, b) The vector can be resolved into left and right components for reproduction on loudspeakers, c) The vector can also be resolved into M and S components.


In the good old days, the summing and differencing could be done with transformers as in Fig.2a) whereas later it would be done with operational amplifiers as in Fig.2b). Now that everything is digital, the samples would be added and subtracted in software.

Fig.2 a) Conversion between L, R and M, S was traditionally done with transformers, b) The conversion can also be implemented with operational amplifiers.


Returning to the L, R format for monitoring or transmission is just a matter of repeating the sum and difference process.

In practice it is necessary to compensate for the effect on level of adding left and right channels. If those signals have come from a pan pot, they will be coherent. If they have come from a coincident microphone they will be nearly so. Adding two coherent signals results in a level increase of 6dB, so in many cases reducing the level of M by that amount will do the trick.

In the case of spaced microphones, left and right will not be coherent and instead the signal powers add, giving a level increase of 3dB. It could therefore be argued that reducing the level of M by 3dB would be correct for spaced microphones. In practice it doesn't matter a whole lot. Ultimately when returning to L, R format all that matters is to use the correct gains in the inverse transform so that L and R do not suffer an inadvertent level change due to passing through the M, S domain.
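A sketch of the complete round trip makes the point about gains concrete. The forward transform doubles coherent levels, so the inverse here carries a factor of one half; the placement of the scale factors is a common convention, assumed for illustration rather than mandated by the text:

```python
# Round trip through the M,S domain with gain compensation.
def lr_to_ms(left, right):
    return left + right, left - right

def ms_to_lr(mid, side):
    # The factor of 1/2 undoes the level increase of the forward
    # transform, so L and R emerge at their original levels.
    return (mid + side) / 2.0, (mid - side) / 2.0

l, r = 0.75, 0.25
m, s = lr_to_ms(l, r)
assert ms_to_lr(m, s) == (l, r)  # no inadvertent level change
```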

Traditionally the color-coding for professional analog stereo signals has followed the same standard as maritime and aircraft navigation lights, namely red for port or left and green for starboard or right. This should not be confused with the consumer standard in which red represents right. M is represented by white and S by yellow.

Many two-channel digital formats, such as the AES/EBU interface and various recording formats, have such good crosstalk performance that they can handle completely unrelated sounds on the two channels, such as a soundtrack in two languages, without difficulty. The two channels would usually be labeled A and B and if used for stereo the convention is that channel A would carry the left signal.

In the case that L = R, which is true for a central image of zero width, then L - R, which is S, must be zero. The case of S = 0 corresponds to the monophonic condition. In the M, S format, the M signal alone corresponds to the monophonic condition, in other words S = 0 = Mono.

Using a meter that reads the level of the S signal can be helpful because the size of the S signal gives an impression of how wide a sound source is. The opposite case, where there is only an S signal and M = 0, corresponds to the mono condition where Left and Right are out of phase. This is not good news, as the stereo listener will hear a peculiar image along with bass cancellation, whereas the mono listener will hear nothing. The fully left and fully right conditions correspond to M = S and M = -S respectively. Any time S gets bigger than M, there is an anti-phase component.
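An S-level meter of this kind can be sketched as below, assuming per-block RMS levels are an acceptable measure (the block size and sample values are illustrative assumptions). S at zero indicates mono; S exceeding M flags an anti-phase component:

```python
import math

def ms_levels(samples_lr):
    """RMS levels of M and S for a block of (left, right) sample pairs."""
    m_sq = s_sq = 0.0
    for l, r in samples_lr:
        m_sq += (l + r) ** 2   # M = L + R
        s_sq += (l - r) ** 2   # S = L - R
    n = len(samples_lr)
    return math.sqrt(m_sq / n), math.sqrt(s_sq / n)

# L and R out of phase: M vanishes and S dominates.
block = [(0.5, -0.5), (0.3, -0.3)]
m_rms, s_rms = ms_levels(block)
print(s_rms > m_rms)  # True: anti-phase warning
```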

One can think of an M, S stereo signal as the M signal determining the timbre and the S signal steering the virtual image to the correct place in the sound stage. Given that S = 0 = mono, it should be reasonably clear that reducing the level of the S signal reduces the width of the reproduced image, which is the same thing as increasing the acceptance angle of the microphone.

That is one of the reasons for adopting the M, S format for production: it allows simple width control of the stereo image. Temporarily reducing the width to zero also allows a check to be made for monophonic compatibility.
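The width control described above amounts to scaling S before the inverse transform. A minimal sketch, in which the `width` parameter name is an assumption for illustration: 0.0 collapses the image to mono and 1.0 leaves it unchanged:

```python
# Width control in the M,S domain: scale S, then transform back to L,R.
def set_width(left, right, width):
    mid = left + right
    side = (left - right) * width  # width < 1 narrows the image
    return (mid + side) / 2.0, (mid - side) / 2.0

# width = 0 is the mono condition: both channels carry the M signal.
print(set_width(0.75, 0.25, 0.0))  # (0.5, 0.5)
```

Setting width to zero is also the mono-compatibility check mentioned above, since the output is then the M signal alone.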

Attentional selectivity, popularly known as the cocktail party effect, allows, for example, speech to be discerned in the presence of other noises and effects if the latter are in a different place in the stereo image. However, in the monophonic version, all of the sounds come from the same place and attentional selectivity is not possible. It follows that a perfectly intelligible stereo sound mix might become unintelligible in mono, hence the need for compatibility checks.

Reducing S gain increases the acceptance angle and so narrows the reproduced image, and one would expect that increasing the S gain would have the opposite effect. That is true if the original stereo image is narrow, but if the stereo image already fills the sound stage between the speakers, increasing S gain will not make it any wider. Instead it makes S greater than M, which is an anti-phase condition. Increasing S level should be done with great care.

Trying to zoom in to a distant sound source using S gain generally will not work, as increased S gain amplifies ambience, which then masks the wanted sound. The only condition in which it does work is in anechoic surroundings where there are no sound sources other than the wanted source. Such conditions are rarely met in real life.

Fig.3 A crossed-8 M, S microphone. The axis on which the M microphone produces its best quality is directed to the center of the sound stage


If it is essential to capture a distant sound source, the only possibility is to use a highly directional rifle or shotgun microphone, which is effectively producing an M signal directly. Instead of using a conventional coincident microphone and creating M, S signals in a processor, there are some advantages to building a microphone that generates M, S signals from the outset.

The S capsule will always be mounted sideways and have a figure-of-eight polar diagram. The M capsule can have a number of different directivities according to the application. Fig.3 shows a crossed-eight M, S microphone, which illustrates that the L, R, to M, S transform is simply a rotation of the co-ordinate system through 45 degrees.

It can be seen that the fully left and right conditions are where M = S and M = -S and that any time S is bigger than M there is an anti-phase condition. In an ideal world with ideal microphones it would make no difference whether the microphone was physically L, R or M, S. In the real world microphones have non-ideal directivity and the M, S configuration puts a central sound source on axis to the M capsule in which location the best frequency response will be obtained.

Given that much television has central sound sources such as reporters, newsreaders and so on, the popularity of the M, S microphone for television use is unsurprising. The poor news-gathering videographer has no control over the location and in the presence of emergency sirens, chanting demonstrators and helicopters the only hope of capturing intelligible dialog is to use a highly directional microphone. If that is the M capsule of an M, S microphone, a small amount of S can be faded up to give an impression of what is happening nearby without drowning the dialog.
