Audio Levels - Part 5

There is level and then there is loudness. Neither can be measured absolutely, but by adopting standardized approaches it is possible to have measurements that are useful.

In digital audio a single sample has a numerical value that is discrete and therefore absolute. The precision of that single sample depends upon the wordlength. For example, in 16-bit audio there are 65,536 possible levels, so the single sample is incredibly precise, far more precise than any human listener.
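
As a quick check on that arithmetic, the number of levels is simply two raised to the power of the wordlength. The function name below is only illustrative.

```python
def quantizing_levels(wordlength_bits):
    """Number of distinct values a sample of the given wordlength can take."""
    return 2 ** wordlength_bits

for bits in (16, 24):
    print(f"{bits}-bit audio: {quantizing_levels(bits):,} levels")
```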

However, a single audio sample is totally meaningless, and audio only becomes meaningful when a series of such samples are taken from an audio waveform provided, perhaps, by a microphone. Those samples are measured at uniformly spaced points.

Were the audio waveform not band-limited before sampling, we would have no idea what the waveform did between samples and our samples might not represent the waveform uniquely. However, if we adhere to sampling theory there is only one waveform that can result from a series of samples, and that is the input waveform.

The original continuous waveform can be reconstructed from the samples by filtering. As the sampling instants come typically from a crystal oscillator, they are precise in time; far more precise than any real audio waveform will ever be. As a result, when an audio signal contains a recognizable pitch, or frequency, there is no fixed time relationship between the sample instants and the waveform to be sampled.

Fig.1a) shows a contrived instance in which a sample coincided with an audio peak. Clearly the level of the resulting sample truly is the peak level of the waveform. Fig.1b) shows another contrived instance in which the waveform peak occurred exactly between two samples. Now the level of those samples is not the peak level of the waveform.

We are already aware that the PPM and the VU meter under-read audio level because of the finite time constants of their ballistics. As digital audio is band limited, we expect it to react to peaks with a finite time constant, so the under-reading of peaks is no surprise.

However, the under-reading of a digital meter based on sample values is of a completely different kind and order to the under-reading of PPMs and VUs. For example, PPMs and VUs always under-read by the same amount on the same waveform, whereas the under-reading of sample values is statistical: it depends on the random timing relationship between the sampling clock and the audio waveform.

Fig.1 - At a) the peak of a waveform coincides with a sample instant so the sample measures the true level. At b) the sample does not coincide with the peak and so it must under-read the peak. In practice it is a minor problem.

The under-reading of a digital meter is a function of frequency. In order to contrive the worst-case under-read, the sample phase must be controlled by making the frequency of the sampled waveform an integer sub-multiple of the sampling rate. The case of half the sampling rate will not occur because such a signal cannot pass the anti-aliasing filter, and only dogs, bats and hi-fi enthusiasts can hear it anyway.

A sine wave is one dimension of a rotation and Fig.2a) shows the case of a sine wave at one quarter the sampling rate, which is 12kHz in a 48kHz system. There are four samples per cycle and the worst case is where they are at +/- 45 degrees from the peak. The cosine of 45 degrees is about 0.707, which the mathematically inclined will recognize as the reciprocal of the square root of two, or -3dB; the worst-case under-read.

Fig.2b) shows the case of a sine wave at 6kHz, where the samples are 22.5 degrees from the peak and the under-read is a good deal smaller at 0.7dB. Fig.2c) shows the case of 3kHz, where the samples are at 11.25 degrees from the peak and the under-read is less than 0.2dB. Down at 1.5kHz, the samples are at 5.625 degrees from the peak and the under-read is 0.04dB.
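
The figures above can be checked with a few lines of Python. This is a minimal sketch that assumes a sine wave locked to the sampling clock, with the two nearest samples straddling the peak symmetrically, so each sits half a sample period away from it. The function name is illustrative only.

```python
import math

def worst_case_under_read_db(freq_hz, sample_rate_hz=48000.0):
    """Worst-case under-read in dB (a negative number) for a synchronous sine.

    The nearest samples are assumed to sit half a sample period either side
    of the peak, which is a phase offset of pi * f / fs radians.
    """
    phase_offset = math.pi * freq_hz / sample_rate_hz
    return 20.0 * math.log10(math.cos(phase_offset))

for f in (12000, 6000, 3000, 1500):
    print(f"{f:>6} Hz: {worst_case_under_read_db(f):.2f} dB")
```

The under-read shrinks rapidly as frequency falls because the cosine is almost flat near its peak.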

These examples are contrived by having the sampled waveform synchronous with the sampling clock. Were this not the case - if for example the sine wave had a frequency of 5.9kHz instead of 6.0 - the samples would slide through the waveform and some would be at the peak. If the meter had some finite decay time, the peaks would be held and there would be no under-read.
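
The difference between the locked and the sliding case can be sketched as follows. The code assumes a full-scale sine wave, one second of samples at 48kHz, and a meter that simply holds the largest sample value seen, which stands in for a long decay time. All names are illustrative.

```python
import math

def held_peak_db(freq_hz, phase_rad=0.0, sample_rate_hz=48000, num_samples=48000):
    """Largest absolute sample of a full-scale sine, in dB relative to its true peak."""
    step = 2.0 * math.pi * freq_hz / sample_rate_hz
    peak = max(abs(math.sin(phase_rad + step * n)) for n in range(num_samples))
    return 20.0 * math.log10(peak)

# Synchronous case: 6 kHz, with the samples straddling the peak by 22.5 degrees.
locked_db = held_peak_db(6000, phase_rad=math.radians(22.5))

# Asynchronous case: 5.9 kHz, so the sampling phase slides through the waveform.
sliding_db = held_peak_db(5900)

print(f"6.0 kHz locked : {locked_db:.2f} dB")
print(f"5.9 kHz sliding: {sliding_db:.2f} dB")
```

In the locked case every cycle is sampled at the same unfavorable phase, so holding the peak does not help. In the sliding case some sample eventually lands on, or very close to, the true peak.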

Nevertheless the spectrum of typical audio signals is dominated by low frequencies. A typical violin doesn't output much above 5kHz and a vocalist peaks at about 1.5kHz. The worst-case under-read is then inaudible.

Audio waveforms can exceed the level of the samples carrying them. The theory is correct. The practice is that the amount of the under-reading is so small and so infrequent on typical audio signals that the effects on everyday audio procedures are insignificant.

Given the logarithmic nature of human senses, there is generally no need for fanatical precision in audiovisual material. The exception is where litigation is involved, where a binary decision has to be made about whether or not some limit was exceeded.

Television sound went through a regrettable phase in which every psychoacoustic trick in the book was used to make a given commercial stand out from the rest of the output. The problem was, of course, that when every commercial stands out, none of them does and the viewer gets irritated. Inevitably regulations had to be introduced, and that is why the seemingly insignificant under-reading of a sample-value based level meter suddenly became important.

Where it matters, any under-read can be essentially eliminated by placing an oversampling filter between the source of samples and the meter. In the case of Fig.1b) the oversampling filter digitally simulates the analog reconstruction filter and replicates the peak of the waveform.
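
A minimal sketch of the idea follows. It interpolates between the samples with a Hann-windowed sinc function, which is one simple way to approximate a reconstruction filter and is not necessarily what any particular meter uses. The oversampling factor of 4 and the filter half-width of 16 samples are assumptions chosen for illustration. The test signal is the worst case described earlier: a sine at one quarter of the sampling rate with the samples 45 degrees from the peak.

```python
import math

def oversampled_peak(samples, factor=4, half_width=16):
    """Estimate the peak of the reconstructed waveform by windowed-sinc interpolation."""
    peak = 0.0
    # Skip the edges so that every interpolated point has a full set of neighbors.
    for n in range(half_width, len(samples) - half_width):
        for k in range(factor):
            t = n + k / factor  # interpolation instant, in sample periods
            value = 0.0
            for m in range(n - half_width + 1, n + half_width + 1):
                x = t - m
                if x == 0.0:
                    weight = 1.0
                else:
                    sinc = math.sin(math.pi * x) / (math.pi * x)
                    hann = 0.5 * (1.0 + math.cos(math.pi * x / half_width))
                    weight = sinc * hann
                value += samples[m] * weight
            peak = max(peak, abs(value))
    return peak

# Sine at fs/4, sampled 45 degrees away from its peaks (the worst case).
samples = [math.sin(math.radians(45.0) + 0.5 * math.pi * n) for n in range(64)]
sample_peak = max(abs(s) for s in samples)
true_peak = oversampled_peak(samples)

print(f"sample peak     : {20 * math.log10(sample_peak):.2f} dBFS")
print(f"oversampled peak: {20 * math.log10(true_peak):.2f} dBFS")
```

The sample values alone read about 3dB low, whereas the interpolated points between them come very close to the true peak of the waveform.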

Fig.2 - A sine wave is shown as a constant angular velocity rotation so that sampling instants at various frequencies can be seen. The case of one quarter the sampling rate with 45 degree sampling phase is shown at a), where the samples under-read by 3dB. For one eighth of the sampling rate b) and 22.5 degree phase the under-read is 0.7dB, and at one sixteenth c), with 11.25 degree phase, it is less than 0.2dB.

Such an oversampling level measurement forms part of a loudness meter by providing precise level information. Measuring loudness is not easy, especially as any regulation tends to result in loopholes being found.

As the response of human hearing is level dependent, some form of weighting is needed. The traditional A-weighting reflects hearing near its threshold, where very low levels are concerned, so that was not appropriate. The B-weighting is more representative, and the low-frequency curve of B-weighting was adopted to produce the RLB filter (Revised Low-frequency B weighting).

The weighting was also modified with a shelving response to allow for the acoustic effect of the head, around which low-frequency sound diffracts. Fig.3 shows the combined head effect and RLB curve, which is known as K-weighting.

Fig.3 - The combined head effect and RLB curve known as K-weighting.

The K-weighted sample stream, typically at 48kHz, is then oversampled to eliminate under-reading, and the samples are squared, which has the effect of rectifying them. Running averaging then takes place to produce a mean square figure, which is proportional to the sound power in the channel concerned.

This is important because the loudness of multiple sound sources can only be obtained by adding the power. All of the mean square values are summed to produce a combined mean square signal. In, for example, a five channel surround system, the front channels are added equally, whereas the two surround channels are given 1.5dB of gain since they are likely to be facing the listener's ears.

If averaged over an entire audio segment, a single number results and this can be expressed in dB relative to full scale, so all real readings are negative. This is called the LKFS value (Loudness, K-weighted, relative to Full Scale) for that audio segment.
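
The summing and conversion to dB can be sketched as below. This is a simplified illustration, not a compliant meter: it assumes the samples have already been K-weighted, that full scale is 1.0, and it leaves out any fixed calibration offset that a published standard may specify. The channel names and the dictionary layout are illustrative.

```python
import math

# Channel weights as described in the text: front channels at unity,
# surround channels given about 1.5 dB of gain.
CHANNEL_GAIN_DB = {"L": 0.0, "R": 0.0, "C": 0.0, "Ls": 1.5, "Rs": 1.5}

def mean_square(samples):
    """Average of the squared samples, proportional to the power in the channel."""
    return sum(s * s for s in samples) / len(samples)

def combined_level_db(channels):
    """channels maps a channel name to its list of (already weighted) samples."""
    total = 0.0
    for name, samples in channels.items():
        power_gain = 10.0 ** (CHANNEL_GAIN_DB[name] / 10.0)
        total += power_gain * mean_square(samples)
    return 10.0 * math.log10(total)

# One cycle of a full-scale sine, whose mean square is 0.5.
sine = [math.sin(2.0 * math.pi * n / 48) for n in range(48)]

print(f"L only : {combined_level_db({'L': sine}):.2f} dB")
print(f"L + R  : {combined_level_db({'L': sine, 'R': sine}):.2f} dB")
print(f"Ls only: {combined_level_db({'Ls': sine}):.2f} dB")
```

Because powers rather than sample values are added, two identical channels read 3dB higher than one, not 6dB.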

Clearly the LKFS value is not known until the entire segment has been processed. Where it exists as a file on a server, LKFS can be obtained by processing the file, but this is not useful in a live situation. Instead a moving average based on a three second window can produce a quasi-real time loudness reading. This is called short-term loudness or just S. Shortening the window to 400 milliseconds produces a reading somewhat like that of a VU meter, and this is called momentary loudness or M.
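
A moving average of this kind can be sketched with a running sum of squared samples. The class below is illustrative only: it handles a single channel, assumes the samples have already been weighted, and uses a deliberately low sample rate of 1kHz in the demonstration to keep it quick.

```python
import math
from collections import deque

class SlidingLoudness:
    """Mean-square level over the most recent window, expressed in dB."""

    def __init__(self, window_seconds, sample_rate_hz=48000):
        self.length = int(round(window_seconds * sample_rate_hz))
        self.squares = deque()
        self.total = 0.0

    def push(self, sample):
        square = sample * sample
        self.squares.append(square)
        self.total += square
        if len(self.squares) > self.length:
            self.total -= self.squares.popleft()  # drop the oldest value
        mean = self.total / len(self.squares)
        return 10.0 * math.log10(mean) if mean > 0.0 else float("-inf")

RATE = 1000  # low rate, for demonstration only
momentary = SlidingLoudness(0.4, RATE)   # M: 400 millisecond window
short_term = SlidingLoudness(3.0, RATE)  # S: three second window

# Three seconds of steady signal followed by 0.4 seconds of silence.
signal = [0.5] * (3 * RATE) + [0.0] * int(0.4 * RATE)
for sample in signal:
    m_reading = momentary.push(sample)
    s_reading = short_term.push(sample)

print(f"M after the silence: {m_reading} dB")
print(f"S after the silence: {s_reading:.2f} dB")
```

After 0.4 seconds of silence the momentary window contains nothing but silence, whereas the short-term reading has fallen by less than 1dB.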

The measurement of loudness adopted had to leave limited scope for circumvention. For example, if simple averaging were used, a segment consisting of a long period of very low level and one short, massive peak could yield the same average as a segment of steady moderate level, despite an obviously different subjective loudness.

The solution is to prevent periods of low level from lowering the average. Two averaging processes are used. One of these is continuous and forms a reference. When the short-term loudness is more than 10dB below the continuous average, the main averaging process simply stops, so quiet parts of the segment do not cause the LKFS value to measure lower.
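
The effect of such a gate can be sketched on a list of block powers, meaning mean square values that have already been measured over short windows. This is a simplified offline illustration of the principle, computing the reference average over the whole segment in one pass, and it is not a compliant implementation of any standard. The function names and the example figures are illustrative.

```python
import math

def to_db(power):
    return 10.0 * math.log10(power) if power > 0.0 else float("-inf")

def ungated_loudness_db(block_powers):
    """Simple average over every block: the continuous reference."""
    return to_db(sum(block_powers) / len(block_powers))

def gated_loudness_db(block_powers, gate_db=10.0):
    """Average only the blocks that are no more than gate_db below the reference."""
    threshold_db = ungated_loudness_db(block_powers) - gate_db
    kept = [p for p in block_powers if to_db(p) >= threshold_db]
    return to_db(sum(kept) / len(kept))

# Ten loud blocks at -10 dB and ninety very quiet blocks at -50 dB.
blocks = [0.1] * 10 + [1e-5] * 90

print(f"simple average: {ungated_loudness_db(blocks):.2f} dB")
print(f"gated average : {gated_loudness_db(blocks):.2f} dB")
```

With simple averaging the long quiet passage drags the figure down by about 10dB. With the gate in place the quiet blocks are ignored and the result reflects the loud material.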

Loudness information can also be processed to give an indication of the dynamic range of the loudness in a segment. This compares the highest loudness of the segment with extended periods of low loudness to produce a loudness range or LRA reading.
