Digital Audio: Part 6 - Noise Shaping

Noise shaping plays an important role in digital audio because it allows hardware to be made at lower cost without sacrificing performance, and in some cases it allows performance to be improved.

There seem to be two categories of noise shaping in digital audio. The first shapes noise primarily within the audio band, does not require oversampling and is designed using psychoacoustics. The second does require oversampling, works over a wide frequency band and relies on information theory rather than on psychoacoustics.

In most audio production processes, the performance of the production equipment traditionally exceeds the performance of the final delivery format. When making a recording, one is never sure what the actual sound level is going to be, so some headroom is needed to prevent clipping. That headroom is no longer necessary after production.

A recording of production word length may pass through some processing stages, such as EQ and mixing, which in the digital domain results in word length extension. The problem is then how to reduce the word length to match the requirements of the final delivery medium.

One way of reducing word length is to simulate in the digital domain the operation of a dithered ADC of the target word length. Digital dither is added to the input samples at the appropriate level and these samples are then quantized to the target word length. The dithering process linearizes the quantization and simply raises the noise floor to a level that would have been present if an ADC of the target word length had been used in the first place.
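
As a minimal sketch of the dithered word length reduction described above, the following Python function shortens 24-bit integer samples to 16 bits. The function name, the word lengths and the choice of triangular (TPDF) dither spanning plus or minus one target LSB are assumptions made for illustration; the text only calls for dither "at the appropriate level".

```python
import random

def requantize_with_dither(samples, in_bits=24, out_bits=16, seed=0):
    """Shorten integer samples from in_bits to out_bits using TPDF dither.

    Hypothetical helper: word lengths and dither shape are assumptions.
    """
    rng = random.Random(seed)
    shift = in_bits - out_bits            # number of low-order bits discarded
    step = 1 << shift                     # one target LSB, in input units
    lo = -(1 << (out_bits - 1))
    hi = (1 << (out_bits - 1)) - 1
    out = []
    for x in samples:
        # The sum of two uniform values gives triangular dither
        # with a peak of +/- one target LSB.
        dither = (rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)) * step
        q = round((x + dither) / step)    # quantize to the target word length
        out.append(max(lo, min(hi, q)))   # clip to the target range
    return out
```

A constant input of a quarter of a target LSB shows the linearizing effect: without dither every output would be zero, whereas with dither the outputs vary randomly and their long-term average stays close to the input value.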

Fig.1 - The quantizing error due to shortening or truncating word length is fed back through a perceptual filter, shaping the noise floor to suit human hearing.

A conventional digital dithering system produces a flat noise floor, meaning that the average noise level is the same at all frequencies. For audio applications this is not optimal, because the response of the human auditory system to noise is not flat. If the system is good enough that the noise floor is low, then we can assume the human sensitivity to noise will follow the same curve as our threshold of hearing.

Whilst the noise power that is present is a deterministic function of the word length in use, and nothing can reduce it, what we can do is to alter the spectral distribution of the noise power without changing the total power. As the threshold of hearing rises at low and high frequencies, if the noise power is diverted away from the mid-range into the ends of the audio spectrum the noise level will appear to have gone down to a human listener.

Fig.1 shows how it's done. The word length reduction process calculates the quantizing error due to shortening a given sample, and feeds it back through a filter to be added to subsequent samples in such a way that the error is minimized. If the filter is designed to have a peak in the mid-range of human hearing, the noise due to shortening the word length will be reduced in the mid-range, because the filter is in a feedback loop.

It is important not to push the idea too far, as incorporating the complete threshold curve of the HAS would produce excessive noise levels at low and high frequencies. The filtering is constrained to prevent that happening, with very little loss of performance.
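
The error feedback loop of Fig.1 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a perceptual design: the function name and the `coeffs` parameter are hypothetical, the default single coefficient gives only a first-order shape that tilts noise towards high frequencies, and a real system would use a filter derived from the threshold of hearing and would also add dither.

```python
def shape_requantize(samples, step, coeffs=(1.0,)):
    """Requantize to multiples of `step`, feeding past quantizing errors
    back through an FIR filter (hypothetical coefficients, newest first)."""
    history = [0.0] * len(coeffs)         # most recent errors, newest first
    out = []
    for x in samples:
        # Subtract the filtered past errors before quantizing.
        v = x - sum(c * e for c, e in zip(coeffs, history))
        y = round(v / step) * step        # shorten the word length
        history = [y - v] + history[:-1]  # remember this sample's error
        out.append(y)
    return out
```

With the default coefficient and a constant input of 0.3 of a step, plain rounding would return zero for every sample. The shaped output instead takes the values 0 and 1 in a pattern whose average stays close to the input, because the accumulated error is never allowed to grow beyond half a step.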

When applied correctly, a subjective improvement of two to three bits is obtained. The first widespread application of the technique was in the Compact Disc, whose performance was effectively raised from a 16-bit to an 18-bit medium. In 16-bit form the medium was already lower in noise than any listening environment a consumer might reasonably have, so in 18-bit form it was essentially blameless.

Fig.2 - If the noise floor of an oversampled convertor is not flat, a disproportionate reduction in noise occurs when the sampling rate is returned to normal.

The idea of using feedback around a convertor can be extended if oversampling is used. If the convertor noise floor is not flat, as shown in Fig.2, but is arranged to rise at high frequencies, most of the noise energy is then outside the audio band and upon decimating down to the required sampling rate the lowering of the in-band noise floor can then be much greater.

The improvement in the noise floor can be enough that alternative convertor architectures that previously did not have enough resolution can be used. These may allow a reduction in cost. One of the difficulties in digital audio conversion is in making every quantizing interval exactly the same. Noise shaping helps with that goal, because the physical quantizer is relatively coarse and the finer quantizing is done by signal processing that can be arbitrarily accurate. In addition to costing less, a noise-shaping convertor might also offer excellent performance.

The relaxed requirements of the physical convertor mean that it can be implemented using the flash convertor architecture widely adopted in digital video, which works at high speed but is limited in dynamic range.

When designing systems that are to be implemented in an integrated circuit, the constraints are somewhat different to those of a discrete circuit. In discrete circuitry, complexity raises cost, whereas in an IC complexity is nearly irrelevant, especially if it is in digital processing. IC based convertors often go to great computational lengths to eliminate external components such as capacitors and inductors, which add to the cost.

Fig.3 - The Sigma-DPCM convertor is a DPCM convertor with an integration stage. It has a noise floor that rises with frequency.

The accuracy requirements of a digital audio convertor put stringent tolerances on a number of critical components. Tight tolerances are an enemy of mass production, since they require extra production steps or reduce yield. An architecture that delivers the same performance using lower tolerance parts may be preferable, even if the complexity is increased.

In a conventional ADC, each sample is quantized independently, and is intended to take no notice of any previous conversion. Alternative convertor architectures take a different approach. A differential coder quantizes the difference between successive samples. The range of the quantizer is small compared to the range of the audio signal. The quantizer is kept within its operating range by a feedback loop that runs from earlier conversions so the quantizer is shifted up or down as the input varies.

This convertor can be thought of as a multi-bit version of the classic delta modulator that has a one-bit ADC. As the difference between one sample and the next is calculated, differential PCM (DPCM) data can be converted to PCM by a simple digital integrator.
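
A minimal Python sketch of differential coding and its conversion back to PCM follows. The function names and the `step` and `levels` parameters are hypothetical; the quantizer is modelled as a simple rounding operation with a limited number of levels.

```python
def dpcm_encode(samples, step, levels):
    """Quantize the difference between each sample and a running prediction."""
    half = levels // 2
    prediction = 0.0
    codes = []
    for x in samples:
        code = round((x - prediction) / step)
        code = max(-half, min(half, code))  # the coarse quantizer has a small range
        codes.append(code)
        prediction += code * step           # feedback from earlier conversions
    return codes

def dpcm_decode(codes, step):
    """Convert DPCM to PCM with a digital integrator (a running sum)."""
    total = 0.0
    out = []
    for c in codes:
        total += c * step
        out.append(total)
    return out
```

As long as the input changes slowly enough that the quantizer does not clip, the decoded output stays within half a step of the input. A sudden jump larger than the quantizer range produces the largest available code and the decoder takes several samples to catch up.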

The DPCM convertor has a constant noise floor that comes from the quantizing steps. On the other hand the differential action means that as the input frequency rises the differences between successive samples must get larger until the greatest possible difference is reached and the convertor clips. The noise floor may be flat, but the available amplitude falls with frequency. Obviously the higher the clock rate the better the performance could be.
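
The fall in available amplitude with frequency can be estimated for a sine wave input. For a sine of amplitude A at frequency f sampled at rate fs, the largest change between successive samples is 2A sin(pi f / fs). The hypothetical helper below simply rearranges that expression, assuming `max_step` is the largest difference the quantizer can represent.

```python
import math

def max_sine_amplitude(freq_hz, sample_rate_hz, max_step):
    """Largest sine amplitude whose sample-to-sample change stays within
    max_step (the quantizer's largest representable difference)."""
    return max_step / (2.0 * math.sin(math.pi * freq_hz / sample_rate_hz))
```

Under this model, at frequencies well below the sampling rate the available amplitude roughly halves for each doubling of input frequency and roughly doubles for each doubling of the clock rate.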

Fig.3 shows the result of a simple rearrangement of a DPCM convertor that has the effect of integrating everything. The result is that the signal amplitude is now independent of frequency but the noise floor rises at high frequency. This is the Sigma-DPCM convertor, which has been found to be extremely useful for audio conversion.
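
One way to sketch this arrangement in Python is to integrate the difference between the input and the previous output, then quantize the integrator. The function name and parameters are hypothetical and the structure is an assumption based on the description, since the details of Fig.3 are not reproduced here.

```python
def sigma_dpcm(samples, step, levels):
    """First-order loop: integrate (input - previous output), then quantize.

    The output codes follow the input itself rather than its differences,
    so no integrator is needed at the decoder.
    """
    half = levels // 2
    integrator = 0.0
    fed_back = 0.0
    codes = []
    for x in samples:
        integrator += x - fed_back          # integration stage inside the loop
        code = round(integrator / step)
        code = max(-half, min(half, code))  # coarse quantizer range
        fed_back = code * step
        codes.append(code)
    return codes
```

In this sketch each output equals the input plus the difference between the current and previous quantizing errors. The signal passes through unchanged at all frequencies, while the error is differentiated, which is what makes the noise floor rise with frequency.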

Fig.4 - Increasing the order of the noise shaping increases the reduction in noise when the data are decimated.

In a practical noise-shaped ADC, the performance obtained will be a function of the accuracy of the core quantizer, the oversampling factor and the order of the noise shaping filter. The effects multiply together, so a small improvement in each of the three gives a large overall improvement.

Fig.4 shows how increasing the order of the filtering steepens the slope of the noise and allows the convertor performance to be enhanced with a lower oversampling factor. Note that the noise is still shaped even without additional filters, because the delay in the feedback acts like a comb filter.
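
The trade-off between order and oversampling factor can be explored numerically. The sketch below rests on idealized assumptions that are not taken from the article: the quantizing error is treated as white noise, the noise is shaped by a response of the form (1 - z^-1) raised to the power of the order, and the decimation filter removes everything above the audio band. The function name and parameters are hypothetical.

```python
import math

def in_band_noise_db(order, osr, steps=10000):
    """In-band noise power in dB relative to the quantizer's total
    unshaped noise, for a given shaping order and oversampling factor."""
    edge = math.pi / osr                  # audio band edge, radians per sample
    dw = edge / steps
    total = 0.0
    for k in range(steps):
        w = (k + 0.5) * dw                # midpoint rule integration
        # Squared magnitude of (1 - z^-1)^order on the unit circle.
        total += (2.0 * math.sin(w / 2.0)) ** (2 * order) * dw
    return 10.0 * math.log10(total / math.pi)
```

With an order of zero the result reduces to the plain oversampling gain of 10 log10 of the oversampling factor. Raising the order at the same oversampling factor gives a progressively lower in-band figure, which is the behavior Fig.4 illustrates.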

Fig.5 shows a simplified version of a high-order noise shaping ADC. The cascaded integrators increase the order of the filtering and concentrate the noise at high frequencies so that the later decimation process can filter it out.
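
As the smallest example of cascading, a second-order loop with two integrators can be sketched as follows. This is an illustrative structure with a hypothetical function name, not a reproduction of Fig.5, and the quantizer is left unclipped for simplicity, whereas a real flash convertor has a fixed number of levels.

```python
def second_order_shaper(samples, step):
    """Two cascaded integrators, each fed with the previous output.

    Hypothetical sketch: the quantizer is an unclipped rounding operation.
    """
    first = 0.0
    second = 0.0
    fed_back = 0.0
    codes = []
    for x in samples:
        first += x - fed_back             # first integrator
        second += first - fed_back        # second integrator in cascade
        code = round(second / step)       # coarse quantizer
        fed_back = code * step
        codes.append(code)
    return codes
```

In this arrangement the quantizing error appears at the output differenced twice, so individual output samples can differ from the input by up to two steps while the average still follows the input closely.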

In the context of high-order noise shaping, stability has to be considered carefully. For example, if not well engineered, the high-order filtering could easily turn the convertor into an uncontrolled oscillator, howling round like an over-amplified microphone. On the other hand, total stability isn't a goal either.

Fig.5 - A high-order noise shaping convertor using a highly oversampled 4-bit flash convertor.

With a fixed input, the convertor should run through a pseudo-random series of states that average to a constant output. This is the so-called idle pattern of a noise-shaped convertor, which is the equivalent of dither in a regular convertor. There is some art involved in tuning the convertor for the right balance of stability and idle pattern. Once these problems were solved, the noise shaping audio convertor became almost universal.

As the word length used in digital audio increased, the accuracy constraints became such that even if economic factors were set aside, conventional convertor architectures were simply impracticable.
