Digital Audio: Part 9 - Representing Data

The advantages of digital audio for recording purposes are clear, but once in the digital domain, production steps also need to be carried out. Recorders don’t care about the encoding method, so it can instead be optimized for production purposes.

As has been shown earlier in this series, the type of encoding used for digital audio is fundamentally linear, which means that the size of the digital number representing the audio voltage is strictly proportional to that voltage. Linearity means that audio processes such as gain and attenuation are performed simply by multiplying the sample values by factors respectively larger or smaller than one.
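Because the coding is linear, a gain change really is just one multiplication per sample. The sketch below is a minimal illustration, assuming samples held as Python floats normalized to ±1.0; the helper names `db_to_factor` and `apply_gain_db` are hypothetical, and the rounding and overflow handling that a real processor needs are deliberately left out.

```python
def db_to_factor(gain_db: float) -> float:
    """Convert a gain in dB to the linear factor applied to every sample."""
    return 10.0 ** (gain_db / 20.0)


def apply_gain_db(samples: list[float], gain_db: float) -> list[float]:
    """Multiply each sample by the same factor: >1 amplifies, <1 attenuates.

    Assumes normalized float samples; no rounding or clipping is modelled.
    """
    factor = db_to_factor(gain_db)
    return [s * factor for s in samples]


quiet = apply_gain_db([0.5, -0.25, 0.0], -6.0)  # each value scaled by about 0.501
```

Silence stays at zero and the polarity of every sample is preserved, which is exactly what a linear code requires.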

This linearity requirement is the reason for the uniform quantizing that characterizes linear PCM as well as floating-point representations of audio.

Sound propagates through air as pressure variations above and below the average pressure. The velocity of the air movement is alternately positive and negative. Audio signals from microphones are electrical signals going above and below zero volts. Color difference signals in television are also bipolar.

What we mean by level in audio is the magnitude of the signal, the extent to which it deviates from the center. All audio waveform manipulation is performed with respect to the center of the signal range. This approach requires a different coding scheme from pure binary, which works from one end of the signal range.

Imagine a pure binary counter having a finite number of bits that is being endlessly clocked. It will count up until it has a value of all ones, and then it will overflow to all zeros and start again. Equally, if the counter counted down, when it reached zero it would underflow and start again from all ones. Fig.1 shows that effectively the infinite range of whole numbers, positive and negative, is being mapped onto a circle, like magnetic tape wrapped around a spool. Only the position around the circle is known from the state of the counter.

Fig.1 The repeated overflow or underflow of a counter having finite word length, three bits in this example, causes the range of all numbers to be wrapped repeatedly around a circle.

The number of times the count wrapped around the circle is not known, because there are no bits counting the number of overflows. The binary number in Fig.1 could therefore represent a range of numbers as shown. Mathematically speaking, the counter is implementing modulo arithmetic: with n bits every count is reduced modulo 2ⁿ, so the three-bit counter of Fig.1 works modulo 8.
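The wrap-around of Fig.1 can be mimicked with Python's modulo operator. This is a sketch of a three-bit counter as in the figure; `counter_state` is a hypothetical name used only for illustration.

```python
BITS = 3
MODULUS = 1 << BITS  # 8 states for a three-bit counter


def counter_state(count: int) -> int:
    """State of a BITS-bit counter after `count` net clocks from zero.

    The counter has no bits recording how many times it overflowed, so
    every count that differs by a multiple of MODULUS gives the same state.
    """
    return count % MODULUS


# Counting up from 7 overflows to 0; counting down from 0 underflows to 7.
print(counter_state(7 + 1))  # 0
print(counter_state(0 - 1))  # 7
# Many different numbers share one position around the circle:
print([n for n in range(-16, 17) if counter_state(n) == 5])  # [-11, -3, 5, 13]
```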

In order to represent bipolar waveforms such as audio, the circular finite set of numbers from the binary counter is rotated one half turn, as shown in Fig.2, so that the count of zero is in the center of the signal range, corresponding to audio silence. Counts below that point represent negative audio voltages and counts above it represent positive audio voltages.
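Rotating the circle amounts to reinterpreting the upper half of the codes as negative values. A minimal sketch for the four-bit scale of Fig.2, with hypothetical helper names `decode` and `encode`, which prints the whole scale from most positive to most negative:

```python
BITS = 4
MODULUS = 1 << BITS   # 16 codes
HALF = MODULUS >> 1   # 8: codes at or above this are negative


def decode(code: int) -> int:
    """Interpret a BITS-bit pattern as a two's complement value."""
    return code - MODULUS if code >= HALF else code


def encode(value: int) -> int:
    """Return the BITS-bit pattern for a value in the range -HALF..HALF-1."""
    if not -HALF <= value < HALF:
        raise ValueError("value out of range")
    return value % MODULUS


# Print the scale from most positive to most negative, as in a coding table.
for value in range(HALF - 1, -HALF - 1, -1):
    print(f"{encode(value):0{BITS}b}  {value:+d}")
```

Zero sits in the middle of the printed table, with 0111 (+7) above it and 1000 (-8) at the bottom.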

In the four-bit example of Fig.2, the sum of any positive number added to its negative will be 2⁴, or 16, which in four bits overflows to zero. The negative number is the two's complement of the positive number. The sequence of the numbers is unchanged on passing through zero. Simply inverting the bits of a number gives a result that is out by one step, so to form the two's complement of a pure binary number, all of the bits must be inverted and then one must be added.
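The invert-and-add-one rule is easy to check in a few lines. This sketch assumes four-bit codes held in Python integers and masks the result back to four bits, which stands in for the carry falling off the end of the word; `twos_complement` is an illustrative name.

```python
BITS = 4
MASK = (1 << BITS) - 1  # 0b1111


def twos_complement(code: int) -> int:
    """Negate a BITS-bit code: invert every bit, then add one (wrapping)."""
    return ((code ^ MASK) + 1) & MASK


three = 0b0011
minus_three = twos_complement(three)
print(f"{minus_three:04b}")     # 1101
print(three + minus_three)      # 16, i.e. 2**BITS, which is 0000 in four bits
print(f"{(three ^ MASK):04b}")  # 1100, i.e. -4: inversion alone is one step out
```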

This procedure can be checked against Fig.2, which shows a simple example of a two's complement coding scale with its decimal equivalents. The tremendous advantage of two's complement is that it allows bipolar signals such as audio to be handled correctly in binary logic, for example when performing audio mixing.

Fig.2 In two's complement coding, the point where the overflow is deemed to occur is rotated so that zero is in the center of the scale and half of the scale can represent negative numbers.

Fig.3 shows a two's complement ADC. The analog input signal is given a DC offset such that silence sits in the center of the quantizing range. The unipolar or pure binary ADC now produces samples that are referred to one end of the scale. To convert to two's complement, the MSB of the convertor output is simply inverted. In most audio convertors the offset and inversion processes are integral.
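The digital half of Fig.3 is a single bit operation. In this sketch, which assumes a four-bit unipolar convertor so that the codes line up with Fig.2, flipping the MSB moves the mid-scale code 1000 to zero and shifts every other code by half-scale. The function name is hypothetical.

```python
BITS = 4
MSB = 1 << (BITS - 1)  # 0b1000


def offset_binary_to_twos(code: int) -> int:
    """Convert a unipolar ADC code (silence at mid-scale) to two's complement.

    Inverting the MSB is equivalent to subtracting the half-scale offset
    modulo 2**BITS.
    """
    return code ^ MSB


# A four-bit unipolar convertor: 0000 is the most negative input,
# 1000 is silence (mid-scale) and 1111 is the most positive input.
for adc_code in (0b0000, 0b1000, 0b1111):
    print(f"{adc_code:04b} -> {offset_binary_to_twos(adc_code):04b}")
```

The three lines printed are `0000 -> 1000`, `1000 -> 0000` and `1111 -> 0111`, that is -8, silence and +7 on the scale of Fig.2.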

Whilst a small error in the DC offset of Fig.3, due perhaps to drift, would not ordinarily be audible, audio clips having different offsets could result in an audible thump if edited together. There are various solutions available. One is to insert a digital high pass filter after the convertor, which will remove any DC component from the data. Another is to drive a low-pass filter from the audio data that will remove the audio and leave only the offset. This is then fed back to the input stage in a sense that reduces the offset.
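As an illustration of the first approach, the sketch below uses a common one-pole DC-blocking filter, y[n] = x[n] - x[n-1] + r*y[n-1]. This is one well-known form of digital high-pass filter, not necessarily what any particular convertor uses, and the coefficient r = 0.995 is an assumed value; the closer r is to one, the lower the cut-off frequency.

```python
def remove_dc(samples: list[float], r: float = 0.995) -> list[float]:
    """One-pole DC-blocking high-pass filter: y[n] = x[n] - x[n-1] + r*y[n-1].

    The zero at DC removes any constant offset; r (an assumed value just
    below 1) sets how low the cut-off is, and so how little audio is affected.
    """
    out = []
    prev_x = 0.0
    prev_y = 0.0
    for x in samples:
        y = x - prev_x + r * prev_y
        out.append(y)
        prev_x, prev_y = x, y
    return out


# A clip that is silent apart from a constant offset of 0.1 decays towards zero.
offset_clip = [0.1] * 5000
filtered = remove_dc(offset_clip)
print(abs(filtered[-1]) < 1e-6)  # True
```

Because the output settles to zero for any constant input, clips with different offsets can be edited together after filtering without the step that causes the thump.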

In floating-point coding, two's complement cannot be used because the shifting that results from the gain-ranging conflicts with the meaning of the MSB that two's complement relies on. Instead, floating point uses a sign bit, and the associated binary number is proportional to the voltage above or below silence. This coding scheme is known as signed binary.

Fortunately, conversion between two's complement and signed binary is easy. The MSB of two's complement is effectively a sign bit, where 1 represents a negative number. In the case of a positive number, the remaining bits are kept unchanged. In the case of a negative number, the remaining bits are inverted and one is added. This can be checked with reference to Fig.2.
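The conversion rule can be sketched as follows, again using four-bit codes so the results can be checked against Fig.2. The function name is hypothetical. Note that the most negative code, 1000, gives a magnitude of 8, which needs one more bit than the remaining three, so a real system must allow for that case.

```python
BITS = 4
MSB = 1 << (BITS - 1)  # sign bit position, 0b1000
LOW = MSB - 1          # mask for the remaining bits, 0b0111


def twos_to_sign_magnitude(code: int) -> tuple[int, int]:
    """Split a BITS-bit two's complement code into (sign, magnitude).

    Positive codes keep their remaining bits; for negative codes the
    remaining bits are inverted and one is added. The most negative code
    (1000, i.e. -8) yields magnitude 8, returned here as a plain int.
    """
    sign = 1 if code & MSB else 0
    rest = code & LOW
    magnitude = ((rest ^ LOW) + 1) if sign else rest
    return sign, magnitude


print(twos_to_sign_magnitude(0b0101))  # (0, 5)
print(twos_to_sign_magnitude(0b1101))  # (1, 3)
```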

Fig.3 A two's complement convertor is implemented by placing a pure binary convertor in between two half-scale offsets, one analog and one digital.

The use of two's complement allows a great simplification in the use of logic because it is not necessary to have subtraction hardware. Instead, adding the complement performs a subtraction.
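A sketch of subtraction performed with nothing but an adder, assuming four-bit words so that the results can be read off Fig.2. The helper names `add` and `subtract` are illustrative; the masking in `add` models the carry out of the MSB being discarded.

```python
BITS = 4
MASK = (1 << BITS) - 1  # 0b1111


def add(a: int, b: int) -> int:
    """BITS-bit binary adder: the carry out of the MSB is discarded."""
    return (a + b) & MASK


def subtract(a: int, b: int) -> int:
    """Subtract using only the adder: a - b == a + (two's complement of b)."""
    return add(a, (b ^ MASK) + 1)


print(f"{subtract(0b0101, 0b0011):04b}")  # 0010: 5 - 3 = 2
print(f"{subtract(0b0011, 0b0101):04b}")  # 1110: 3 - 5 = -2
```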

In electronics, mixing is a process of multiplication that alters the signal spectrum. In audio and video the term is used with the meaning it has in cookery, where the ingredients are added.

Fig.4 shows some examples of the addition of two's complement numbers, where the correct result is obtained for both positive and negative values. The binary additions can be followed with reference to Fig.2. The ability to add bipolar audio signals makes possible much more than audio mixers: the same mechanism will be found in sample rate convertors, in oversampling and in the digital filters used for equalizing.

Care is needed to ensure that the system correctly handles the case where a sum would go out of range. It must be borne in mind that two's complement arithmetic takes place on a finite, circular set of numbers. A pure binary calculation adding two positive samples that each exceed half of the positive range will produce a sum that wraps around the number circle, so that two positive numbers add to an erroneous negative result.
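The following sketch reproduces both behaviours using the four-bit codes of Fig.2: additions whose true result lies within -8 to +7 come out correctly whatever the polarities, while two large positive samples wrap around to a negative value. The names `decode` and `mix` are hypothetical.

```python
BITS = 4
MODULUS = 1 << BITS  # 16
HALF = MODULUS >> 1  # 8


def decode(code: int) -> int:
    """Two's complement value of a BITS-bit code."""
    return code - MODULUS if code >= HALF else code


def mix(a: int, b: int) -> int:
    """Add two BITS-bit sample codes in pure binary, discarding the carry."""
    return (a + b) % MODULUS


# In-range sums are correct for any mix of polarities:
print(decode(mix(0b0011, 0b1110)))  # 3 + (-2) -> 1
print(decode(mix(0b1101, 0b1110)))  # (-3) + (-2) -> -5
# Two positive samples each above half of the positive range wrap around:
print(decode(mix(0b0101, 0b0110)))  # 5 + 6 should be 11, but prints -5
```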

All audio processing needs additional circuitry or software to catch this overflow and replace it with a clipping process. Where the two samples to be added have opposite polarity, an overflow cannot occur, but where both have the same polarity, wrapping around past the ends of the scale must be prevented.

Fig.4 Two sample streams are added (mixed in audio parlance) by treating the sample values as pure binary numbers. The four-bit codes of Fig.2 are used here.

This is done by looking for overflows not from the end of the word, but into the MSB. In Fig.2, the maximum positive code is 0111 corresponding to +7. Adding one in pure binary would give 1000, and the overflow into the MSB reveals that the value of -8 is incorrect.
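One way to express this check in software is sketched below, assuming four-bit codes as in Fig.2. Overflow is flagged when both inputs share a sign bit and the MSB of the sum differs from it, and the sum is then replaced by the largest code of that polarity. The function name `clipped_mix` is illustrative; hardware often detects the same condition by comparing the carries into and out of the MSB.

```python
BITS = 4
MODULUS = 1 << BITS    # 16
MSB = 1 << (BITS - 1)  # 0b1000
MAX_CODE = MSB - 1     # 0b0111, +7
MIN_CODE = MSB         # 0b1000, -8


def clipped_mix(a: int, b: int) -> int:
    """Add two BITS-bit two's complement codes, clipping instead of wrapping.

    Overflow is only possible when both inputs share a polarity; it shows up
    as the MSB of the sum differing from that shared sign bit.
    """
    total = (a + b) % MODULUS
    same_sign = (a & MSB) == (b & MSB)
    if same_sign and (total & MSB) != (a & MSB):
        return MIN_CODE if a & MSB else MAX_CODE
    return total


print(f"{clipped_mix(0b0111, 0b0001):04b}")  # 0111: +7 + 1 clips at +7
print(f"{clipped_mix(0b1000, 0b1111):04b}")  # 1000: -8 + (-1) clips at -8
print(f"{clipped_mix(0b0011, 0b1110):04b}")  # 0001: 3 + (-2) is unaffected
```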

The hard end of the range in PCM audio led to it being used as one of the references for level measurement. A digital audio level of 0dB(Fs) is the level of a signal that is just failing to clip. In the case of a sine wave, the positive peaks of the signal would reach the largest positive two's complement code of zero followed by all ones. In a sixteen-bit system, the largest positive code is 32767 decimal, so a sine wave at this level has an RMS value of 32767/√2, or about 23170 decimal.
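The sine wave figures can be checked with a short calculation. This sketch assumes a peak-reading measurement referenced to the largest positive sixteen-bit code, 32767; the function names are hypothetical.

```python
import math

FULL_SCALE = 32767  # largest positive 16-bit two's complement code (0111...1)


def sine_rms(peak: float) -> float:
    """RMS value of a sine wave with the given peak amplitude."""
    return peak / math.sqrt(2.0)


def peak_dbfs(peak: float, full_scale: float = FULL_SCALE) -> float:
    """Peak level relative to the largest code; 0.0 means just failing to clip."""
    return 20.0 * math.log10(peak / full_scale)


print(round(sine_rms(FULL_SCALE)))          # 23170
print(peak_dbfs(FULL_SCALE))                # 0.0
print(round(peak_dbfs(FULL_SCALE / 2), 2))  # -6.02
```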

A proper audio level meter must measure both positive and negative peaks, and in the analog world this was done using a full-wave rectifier. The digital equivalent of a rectifier is calculation of the signal magnitude, which is the distance from zero. For a positive sample value, nothing needs to be done. For a negative sample value, the sample needs to be inverted by reversing all the bits and adding one. Given that level meters are logarithmic, the error in negative levels caused by omitting the added one is negligible.
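A sketch of the digital rectifier for sixteen-bit codes, which also measures how little the meter reading changes when the added one is omitted. The names are illustrative, and the level is referenced to the largest positive code, 32767, as in the previous paragraph.

```python
import math

BITS = 16
MASK = (1 << BITS) - 1   # 0xFFFF
MSB = 1 << (BITS - 1)    # sign bit
FULL_SCALE = MSB - 1     # 32767


def magnitude(code: int, add_one: bool = True) -> int:
    """Digital full-wave rectifier for a BITS-bit two's complement code."""
    if (code & MSB) == 0:
        return code                    # positive: nothing to do
    inverted = code ^ MASK             # reverse all the bits
    return inverted + 1 if add_one else inverted


def level_db(mag: int) -> float:
    """Level in dB relative to the largest positive code."""
    return 20.0 * math.log10(mag / FULL_SCALE)


minus_1000 = (-1000) & MASK            # 16-bit code for -1000
exact = level_db(magnitude(minus_1000))
approx = level_db(magnitude(minus_1000, add_one=False))
print(round(exact - approx, 4))        # 0.0087
```

For a sample of -1000 the discrepancy is under a hundredth of a dB. It grows only for the very smallest samples, far below the range a level meter normally displays.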

When dB(Fs) is used, all meaningful audio levels are negative. Unlike legacy media, digital audio remains perfectly linear up to the point of clipping and so has no natural headroom. As a result, if it is desired to continue using traditional working practices where signals can go above 0dB, it is necessary to introduce artificial headroom. This is achieved simply by recalibrating the level meters so that they read 0dB at some arbitrary number of dB below clipping.

The amount of artificial headroom varies, and if a digital recording made with one amount is sent to a place that assumes another, the level will appear to have changed. The numbers representing the waveform have not changed; the same waveform simply gives different readings on differently calibrated meters. Meters calibrated in dB(Fs) will always read the same on the same data.
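Since a recalibrated meter simply adds a fixed offset to the dB(Fs) value, the effect can be sketched in a couple of lines. The headroom figures used here are hypothetical examples, not values taken from any standard, and `meter_reading` is an illustrative name.

```python
def meter_reading(level_dbfs: float, headroom_db: float) -> float:
    """What a meter shows when recalibrated to read 0dB at headroom_db below clipping.

    headroom_db = 0 gives a plain dB(Fs) meter; the data itself never changes.
    """
    return level_dbfs + headroom_db


level = -18.0  # the same recording: peaks 18dB below clipping
for headroom in (0.0, 18.0, 20.0):  # hypothetical calibrations at different facilities
    print(f"{headroom:4.0f}dB headroom -> reads {meter_reading(level, headroom):+.1f}dB")
```

The same recording reads -18dB, 0dB and +2dB on the three meters, while the dB(Fs) figure of -18 is unchanged throughout.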
