Digital Audio: Part 12 - Sampling Rate Conversion
In real systems the issue of sampling rate conversion arises frequently, but fortunately there are plenty of solutions.
If a digital audio recording exists on some medium or other and the goal is to transfer it or copy it to a different medium, perhaps in another location, the goal will be met if the copied data are identical to the original. There is no need to transfer the samples in real time, but there is economic pressure to complete the transfer quickly. In that case the problem of sampling rate conversion does not arise.
Rate conversion will be required for three dominant reasons. There is not a single digital audio sampling rate, and recordings exist that were made at a great variety of rates. If those recordings are to be heard on systems using different sampling rates, then conversion will be necessary, as replaying at the wrong rate would change the pitch. The same problem occurs in reverse if we actually wish to change the pitch of a recording, as the sampling rate will change in proportion to the pitch change.
Fig.1 - The easiest form of rate conversion is shown at a) in which the rate is changed by an integer, so samples at the lower rate always coincide in time with samples at the higher rate. Next hardest is b), in which a fractional relationship means that samples coincide periodically and a finite number of phases of conversion are needed. Hardest of all is c) where there is no relationship between the rates and an infinite number of phases is needed.
Another reason is where two systems have adopted the same nominal sampling rate, but they are not synchronized. The frequency error will be small and it is unlikely any pitch problem will be heard, but lack of synchronism will cause problems like buffers over- or under-flowing and loud transients due to sample values being corrupted.
Finally, if we wish to use oversampling convertors, then there will be a rate conversion to move from the temporarily high rates used in the convertor down to the standard rates used in systems.
From the viewpoint of the equipment designer, there are different categories of rate conversion, shown in Fig.1. In the first, there is a simple integer ratio between the two sampling rates. This is the easiest to implement and is the reason why oversampling convertors run at simple multiples of the system rate.
In Fig.1b) is the next most complex problem, which is where the two rates are related by the ratio of two integers: the so-called fractional ratio conversion.
Finally, in c) is the hardest rate conversion, where there is no simple relationship between the input and output rates; indeed the relationship may change due to drift or the use of pitch changing.
Odd comments are sometimes heard about rate conversion with suggestions that the samples at the output are simply guessed. Nothing could be further from the truth. Sampling and filtering are inseparable, and when a waveform has been bandwidth limited, samples taken according to Shannon's theory record the entire waveform, not just the sample instants.
Fig.2 - At a) is a band limited audio waveform. At b) it has been sampled at a constant rate. At c) the same sampling rate was used, but with a different starting point. The entire waveform is stored in the samples of b) and the same waveform is stored in the samples of c). Even though the samples are all different, b) and c) sound exactly the same.
Fig. 2a) shows a bandwidth limited waveform. At b) it has been sampled at a certain sampling rate. At c) it has been sampled at the same rate but with a different phase. Both of those sample sets store the same waveform, but using completely different sample values. There is an infinity of such possible phases.
If we know the cut-off frequency of the filter, we know the whole waveform from samples b) or samples c), which means that from b) we could calculate samples c) or vice versa. It follows that sampling rate conversion can be reduced to the problem of finding a sample value at a specific place in a known waveform described by samples in other places. Quite obviously finding the sample value will require a filter.
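As a simple illustration of that idea, the following Python sketch (function name and signal values purely illustrative) uses ideal sinx/x reconstruction to calculate the samples of Fig.2c) from the samples of Fig.2b). A practical convertor would of course use a finite, windowed filter rather than summing over every available sample.

```python
import numpy as np

def reconstruct_at(t, samples, fs):
    """Estimate the waveform value at time t (seconds) from samples taken at
    rate fs (Hz), using Shannon/sinc reconstruction.  In practice the sum is
    windowed to a finite number of taps; here it simply uses every sample."""
    n = np.arange(len(samples))                       # sample indices
    # np.sinc(x) = sin(pi*x)/(pi*x), the ideal reconstruction kernel
    return np.sum(samples * np.sinc(t * fs - n))

# Example: samples of a 1 kHz sine taken at 48 kHz with two different phases
fs = 48000.0
n = np.arange(64)
set_b = np.sin(2 * np.pi * 1000 * n / fs)             # Fig.2b: one sampling phase
t_c = (n + 0.37) / fs                                  # Fig.2c: offset sampling instants
set_c_direct = np.sin(2 * np.pi * 1000 * t_c)          # what an ADC would capture
set_c_calc = np.array([reconstruct_at(t, set_b, fs) for t in t_c])
# Away from the ends of this finite block, set_c_calc matches set_c_direct closely.
```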
Fig.3 shows another way of looking at the problem. Here a digital audio sample stream is feeding a DAC that contains a low-pass filter to reconstruct the continuous waveform. That signal is then fed to an ADC that samples at a different rate. The ADC has its own anti-aliasing filter. Clearly one of those filters is redundant; it's the one with the widest pass band.
A sampling rate convertor is simply a digital simulation of Fig.3. Instead of using an analog low pass filter, we use a digital filter and instead of computing the whole waveform, we only compute the waveform at the point where we want an output sample.
Fig.3 - A basic way of rate converting is to connect a DAC and an ADC in series as shown here. One of the low-pass filters is redundant. Modern convertors are so good this works surprisingly well. A sampling rate convertor is simply a digital simulation of this figure.
The filter is required to have a low-pass characteristic where the cut-off frequency is one half of the lower of the two sampling rates. In order to avoid linear distortion of the waveform the filter must be phase-linear, meaning that its impulse response must be symmetrical. The impulse response of an ideal phase-linear low-pass filter is a sinx/x function, which is in theory infinite in extent and for practical purposes needs to be windowed, or tapered off at the ends, to produce a finite impulse response (FIR) characteristic.
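A minimal Python sketch of such a design is shown below. The function name and the choice of a Hamming window are illustrative rather than definitive, and a real design would place the cut-off somewhat below the theoretical limit to allow for the transition band.

```python
import numpy as np

def windowed_sinc_lowpass(num_taps, cutoff, fs):
    """Phase-linear low-pass FIR: an ideal sinx/x impulse response, truncated
    and tapered with a Hamming window.  num_taps is odd so that the impulse
    response is exactly symmetrical about the centre tap."""
    n = np.arange(num_taps) - (num_taps - 1) / 2      # centre the sinc on zero
    h = (2 * cutoff / fs) * np.sinc(2 * cutoff / fs * n)
    h *= np.hamming(num_taps)                          # window to finite length
    return h / np.sum(h)                               # unity gain at DC

# Example: halving a 96 kHz rate to 48 kHz, so the cut-off must not exceed
# half of the lower rate, i.e. 24 kHz.
coeffs = windowed_sinc_lowpass(num_taps=63, cutoff=24000.0, fs=96000.0)
```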
The FIR filter is simple to understand because it creates its impulse response graphically. Fig.4 shows the concept. As a single non-zero sample (the impulse) shifts across the register, at each stage it is multiplied by a coefficient having the correct magnitude and polarity to draw the sampled impulse response.
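The following few lines (coefficient values invented for illustration) show that behaviour: presenting a single unit impulse to an FIR filter simply reads out the coefficients, i.e. the sampled impulse response.

```python
import numpy as np

coeffs = np.array([-0.05, 0.0, 0.3, 0.5, 0.3, 0.0, -0.05])  # illustrative, symmetrical
impulse = np.zeros(16)
impulse[0] = 1.0                                             # the single non-zero sample
response = np.convolve(impulse, coeffs)
# response[:7] equals coeffs: the impulse has "drawn" the impulse response.
```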
The filter calculations can be implemented in a number of ways. In a consumer device the filter may be incorporated in an LSI chip made in quantity. In a workstation the filter calculations may be carried out by software. Other possibilities include the use of a DSP module or an FPGA.
Consider the simple case of a 2x oversampling ADC, in which the sampling rate needs to be halved. This cannot be done by the omission of every other sample for two reasons. Firstly, any frequencies above one half of the lower sampling rate will alias. Secondly, the information in the omitted samples cannot be used to increase the resolution of the samples that are retained.
Fig.4 - A transversal or finite impulse response filter creates a sampled impulse response as the impulse shifts across the stages.
If the samples were first passed through a low-pass filter then it would be possible to discard alternate samples, but this is an extremely inefficient way of going about things, as the results of half of the calculations are thrown away. It makes better sense to compute only the values of the samples that will be retained. Fig.5 shows the idea. Input samples at the high sampling rate are shifted across the transversal register at the input sampling rate. However, the multiplications and summing required for computation of the output sample are performed at the output sampling rate.
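A sketch of that structure in Python might look like the following. The function name and the use of a dot product in place of explicit multiplier/adder hardware are simply for illustration; the coefficients are assumed to be symmetrical, so their orientation does not matter.

```python
import numpy as np

def decimate_by_2(x, coeffs):
    """Halve the sampling rate.  The input samples shift through the
    (conceptual) transversal register at the input rate, but the
    multiply/accumulate is only performed for every other position, so no
    effort is wasted computing samples that would then be discarded."""
    taps = len(coeffs)
    out = []
    for i in range(0, len(x) - taps + 1, 2):          # step of 2 = output rate
        out.append(np.dot(x[i:i + taps], coeffs))     # one output per two inputs
    return np.array(out)
```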
In practice another saving may be made when the impulse response is perfectly symmetrical because a sample shifting across the register will find itself multiplied by the same coefficient twice. It is more efficient to perform the multiplication once and to store the product to be used a second time. This is known as folding the filter.
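In software the same saving is often made by pre-adding the pair of samples that share a coefficient, which is equivalent to storing and re-using the product; the sketch below (function name illustrative) shows the idea for an odd-length symmetrical filter.

```python
def folded_fir_output(window, coeffs):
    """One output sample of a symmetrical ('folded') FIR filter.  Samples that
    share a coefficient are added first, so each coefficient is used in only
    one multiplication, roughly halving the multiplier count."""
    taps = len(coeffs)
    half = taps // 2
    acc = 0.0
    for k in range(half):
        acc += coeffs[k] * (window[k] + window[taps - 1 - k])  # shared coefficient
    if taps % 2:
        acc += coeffs[half] * window[half]                     # centre tap
    return acc
```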
If instead we want to use a 2x oversampling DAC, then the sampling rate will have to be doubled. Newly computed samples will have to be inserted between the existing samples, which remain unchanged. Calculating such samples is known as interpolation. The same type of transversal or FIR filter is used, with the same impulse response, but the coefficients will be different because they will be calculated for half the previous sample spacing.
Fig.5 - A reduction in sampling rate by a factor of 2. Input data shift across the transversal filter at the high sampling rate, but an output sample is only calculated on every other input clock.
Fig.6 shows what happens. This simplified diagram considers only four input samples, A, B, C and D and is calculating the value of the new sample mid-way between B and C. In a DAC, the samples would find a low-pass filter having the impulse response shown. The output would be the sum of the impulses due to each input sample. The interpolator does the same thing. The contributions from samples A and D are negative and smaller because they are further away in time. The contributions from B and C are larger and positive.
Fig.6 - Digital interpolation, where the values of samples halfway between existing samples are calculated. See text for details.
All four contributions are added to produce the value of the interpolated sample. In practice the number of input samples to be considered (the window) would be much larger, but the principle remains the same.
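With an ideal sinx/x impulse response and only the four samples of the simplified diagram, the arithmetic looks like this (the sample values are invented purely for illustration):

```python
import numpy as np

# Distances, in input sample periods, from the new mid-way sample to A, B, C, D.
offsets = np.array([-1.5, -0.5, 0.5, 1.5])
weights = np.sinc(offsets)                  # ideal sinx/x values at those distances
# weights ~ [-0.212, 0.637, 0.637, -0.212]: B and C contribute large positive
# amounts, A and D small negative ones, exactly as in Fig.6.

samples_ABCD = np.array([0.2, 0.8, 0.9, 0.3])      # hypothetical input values
interpolated = float(np.dot(weights, samples_ABCD))
# With only four taps the weights do not quite sum to unity; a practical
# interpolator uses a much wider window of samples and windowed coefficients.
```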
Changing the sampling rate up or down by an integer factor is easy because the samples to be computed are always in the same locations. In fractional ratio conversion the samples are in a finite number of locations. The hardest conversion of all is where there is no simple relationship between the two rates. This means that output samples can occur absolutely anywhere between the input samples.
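Before considering the hardest case, it is worth noting that in fractional ratio conversion the finite number of phases means one fixed set of coefficients can be pre-computed for each phase. The short sketch below (a minimal illustration, not a complete convertor) shows that converting, say, 32 kHz material to 48 kHz only ever requires three such phases.

```python
from fractions import Fraction

# Each output sample advances 32000/48000 = 2/3 of an input sample period.
step = Fraction(32000, 48000)
phases = sorted({(k * step) % 1 for k in range(48)})
# phases == [0, 1/3, 2/3]: only three interpolation phases, each of which can
# have its own pre-computed coefficient set.
```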
In the case of an ADC or DAC, the slightest amount of jitter will move a sample in time and result in the noise floor rising. In a rate convertor, failure to establish the correct instant for which a sample should be computed is the equivalent of jittering that sample. In a variable rate convertor the computation of the output sample phase to the required accuracy is probably more complex than the actual interpolation.
The rate convertor needs to compare the wanted timing of the samples with the timing of the available samples in order to establish the phase of the interpolation. This needs to be done individually for each output sample. To improve accuracy the two sample-rate clocks need to be smoothed using phase-locked loops. As the phase relationship of the samples is continuous, holding coefficients in a look-up table may not be practicable and they may have to be calculated dynamically.
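A minimal sketch of that dynamic calculation is shown below, assuming a short windowed-sinc interpolator. The function name, the eight-tap length and the Hann taper are illustrative choices; a production convertor would use many more taps and a far more carefully designed response.

```python
import numpy as np

def sample_at_phase(window, phase):
    """One output sample at an arbitrary fractional position 'phase' (0..1)
    between the two centre samples of 'window'.  Because the phase is
    continuously variable, the coefficients are calculated on the fly rather
    than read from a look-up table."""
    taps = len(window)
    centre = taps // 2 - 1                       # output lies between centre and centre+1
    dist = np.arange(taps) - (centre + phase)    # distance of each input sample from the output instant
    coeffs = np.sinc(dist) * np.hanning(taps + 2)[1:-1]   # tapered ideal response
    coeffs /= np.sum(coeffs)                     # hold unity gain at every phase
    return float(np.dot(coeffs, window))

# Example: x holds eight consecutive input samples and the wanted output sample
# lies 0.3718 of a sample period after the fourth one:
#   y = sample_at_phase(x, 0.3718)
```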