Audio Levels - Part 3
The traditional level standards were based on electrical signals of specified power. When these signals are recorded on media, or transmitted in other ways, these definitions no longer apply.
Our traditional definitions of audio level were completely dependent upon a specific medium, which was an electrical transmission line having a characteristic impedance. Level was referenced to the delivery of a milliWatt into 600 Ohms, later to the Voltage that would have delivered that power.
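The relationship between the power reference and the later voltage reference can be worked through numerically. The following is a minimal sketch, assuming the usual shorthand of dBm for the power-referenced level and dBu for the voltage-referenced one; the function names are made up for illustration.

```python
import math

REF_POWER_W = 1e-3         # one milliwatt, the traditional power reference
REF_IMPEDANCE_OHM = 600.0  # characteristic impedance of the line

# Voltage that delivers 1 mW into 600 ohms, from P = V^2 / R.
REF_VOLTAGE = math.sqrt(REF_POWER_W * REF_IMPEDANCE_OHM)  # about 0.775 V RMS


def dbm_from_power(power_w):
    """Level relative to 1 mW (hypothetical helper name)."""
    return 10.0 * math.log10(power_w / REF_POWER_W)


def dbu_from_voltage(v_rms):
    """Level relative to the voltage that would give 1 mW in 600 ohms."""
    return 20.0 * math.log10(v_rms / REF_VOLTAGE)


if __name__ == "__main__":
    print(f"reference voltage: {REF_VOLTAGE:.4f} V RMS")
    print(f"1.228 V RMS is {dbu_from_voltage(1.228):+.2f} dBu")
```

Note that the voltage form makes no reference to the impedance actually present, which is why it survived after 600 Ohm lines fell out of use.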
However, on analog magnetic tape there is no impedance and no Voltage. The same is true of vinyl discs and Compact Discs alike. In FM radio, the waveform is carried by a change in frequency and the power does not reflect the audio level. Any level defined in the traditional electrical way is no longer meaningful or useful in these media and a different approach is required in each case.
The approach taken with analog magnetic tape is correctly based on the characteristics of that medium. Magnetic recording depends totally on hysteresis, which is fundamentally non-linear. If magnetism were linear the tape would be unable to remember anything. Practical analog tape relies on the use of bias to linearize the mechanism, which it more or less does except at high levels where the bias mechanism fails.
Fig.1 shows the characteristics of analog tape distortion as a function of level, where there is a linear region, followed by a region in which the distortion gets progressively worse. The maximum continuous level was located where some small agreed level of distortion was reached at the start of the non-linear region.
Fig.1 - The characteristics of analog audio tape show steadily rising distortion as saturation is approached. The non-linear region could be used as headroom.
Psychoacoustically, the effect of the non-linear region was not too bad. Short transients, lasting a few milliseconds, could be recorded there without obvious effect. Longer sounds would be subject to a kind of soft clipping. As early tapes had relatively poor noise performance, it was necessary to use the highest practical levels to keep the signal above the noise. This meant recording peaks in the non-linear region, which came to be called the headroom.
Analog tape thus had three meaningful levels: the noise floor, the maximum level for negligible distortion and the level of saturation. These levels, of course, could not be expressed electrically; they had to be defined using units of magnetic flux strength, namely nanoWebers per meter (nWb/m).
Those magnetic levels could not be standardized, because they were a function of the tape formulation and would change as different brands and types appeared.
In practice the tape machine would be fitted with level meters that could only show an electrical level. During the calibration of the machine for the type of tape in use, the gain of the recording amplifier would have to be set so that when a standardized number of dB(u) appeared on the meter, the correct number of nWb/m appeared on the tape. Equally the gain of the replay amplifier would have to be calibrated so that the appropriate number of nWb/m on a tape would result in the original electrical level.
That approach worked best when the same machine played back its own tapes, but when tapes were interchanged, a tape recorded at one magnetic level might produce the wrong electrical level when played on a machine set up for a different tape. The solution was to precede the wanted recording with a section of tone, typically 1 kHz. A different replay machine would then have its replay gain adjusted until the tone coming out had the right electrical level.
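The line-up tone procedure amounts to simple decibel arithmetic. The sketch below assumes purely hypothetical meter readings and function names; it only illustrates how the required replay gain change follows from the measured tone level.

```python
def replay_gain_trim_db(measured_tone_dbu, reference_dbu):
    """Gain change needed so the line-up tone reads the reference level.

    Both arguments are electrical levels in dB(u); the values used in the
    example below are invented for illustration.
    """
    return reference_dbu - measured_tone_dbu


def gain_factor(trim_db):
    """Convert a gain change in dB to a linear voltage ratio."""
    return 10.0 ** (trim_db / 20.0)


if __name__ == "__main__":
    # Hypothetical case: the tone on a foreign tape replays 2 dB low.
    trim = replay_gain_trim_db(measured_tone_dbu=-2.0, reference_dbu=0.0)
    print(f"raise replay gain by {trim:+.1f} dB "
          f"(voltage ratio {gain_factor(trim):.3f})")
```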
The natural occurrence of headroom in magnetic tape fitted well with the production procedures used by broadcasters, which also required headroom to allow the occasional excursion above a usual level. Unlike tape and disc, where the replay level could be adjusted for each recording, broadcasters had the further requirement that their output should not contain obvious jumps in level, or else the poor listener would have to keep adjusting the volume.
A similar approach was needed with disc cutting, where the amplitude (actually the velocity) of the stylus had to be limited to make the disc playable as well as to allow a reasonable recording time. The disc cutter was calibrated so that an agreed electrical level in dB(u) would produce a certain stylus velocity.
All of these approaches worked reasonably well, even if many users of tape recorders had no idea what was happening inside; provided they kept the meters somewhere reasonable, things would generally be fine.
On to this scene of composure burst digital audio sometime in the 1980s. At a simple level, an audio waveform in the digital domain is expressed by the values of a series of numbers. It does not matter how those numbers are transmitted or recorded, or if the numbers are faithfully copied from one medium to another. Numbers on magnetic discs have the same meaning as numbers written on paper.
The great advantage of digital audio is that the sound quality is independent of the medium and depends instead on the sampling rate, which limits the bandwidth, and the word length, which limits the dynamic range. Those limits can, of course, be far beyond the limits imposed by analog media.
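The two limits can be estimated with a few lines of arithmetic. This is a rough sketch, assuming the familiar rules of thumb that the bandwidth is limited to half the sampling rate and that each bit of word length is worth about 6.02dB of dynamic range; it ignores the small correction terms that apply to specific test signals, and the Compact Disc sampling rate of 44.1kHz is supplied here as an example.

```python
import math


def bandwidth_limit_hz(sample_rate_hz):
    """Highest audio frequency that can be conveyed: half the sampling rate."""
    return sample_rate_hz / 2.0


def dynamic_range_db(word_length_bits):
    """Ratio of the full number range to one quantizing step, in dB.

    Each extra bit doubles the range, adding 20*log10(2), about 6.02 dB.
    """
    return 20.0 * math.log10(2.0) * word_length_bits


if __name__ == "__main__":
    # Compact Disc figures: 44.1 kHz sampling, 16-bit words.
    print(f"bandwidth limit: {bandwidth_limit_hz(44100):.0f} Hz")
    print(f"dynamic range:   {dynamic_range_db(16):.1f} dB")
```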
When audio freed itself from the characteristics of a medium by going digital, it also freed itself from all of the traditional definitions of level. It was necessary to do three things to resolve that. Firstly, a way of measuring level based on the properties of numbers needed to be defined. Secondly, it was necessary to standardize the relationship between that numerical level standard and the traditional standards so that when an analog signal is applied to an ADC we have some idea what is going to happen. Thirdly, as digital audio, unlike analog tape, doesn't have headroom, artificial headroom would have to be invented.
Let us first explore the properties of binary numbers. Digital audio expresses the waveform as a series of binary numbers, except that the numbers are not pure binary because audio waveforms are bipolar: silence or muting is in the center of the range and values go both above and below it.
A binary number typically has a fixed word length. For example, in the Compact Disc it is 16 bits. There is then a hard limit on the number of combinations possible: 65,536, and we want to put muting halfway up that range. That hard limit helps us, because when a 16-bit pure binary number reaches all ones, or FFFF in IT-speak, adding another one causes the number to overflow to all zeros.
The number range becomes circular and it is possible to re-arrange things into so-called two's complement binary, so that all zeros corresponds to muting, and the most significant bit (MSB) becomes a sign bit, zero for positive. In analog audio, the level of a signal is the distance from zero Volts. In two's complement the level of the signal becomes the distance from all zeros.
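The circular number range and the two's complement interpretation can be tried out directly. The following is a small sketch for 16-bit words; the helper names are invented for illustration.

```python
BITS = 16
MODULUS = 1 << BITS   # 65,536 possible combinations
MASK = MODULUS - 1    # 0xFFFF, all ones


def wrap_add(code, increment):
    """Add within a fixed 16-bit word; overflow wraps around the circle."""
    return (code + increment) & MASK


def to_signed(code):
    """Interpret a 16-bit code as two's complement (MSB set means negative)."""
    return code - MODULUS if code & 0x8000 else code


def to_code(value):
    """Encode a signed sample value as a 16-bit two's complement code."""
    if not -0x8000 <= value <= 0x7FFF:
        raise ValueError("value does not fit in 16 bits")
    return value & MASK


if __name__ == "__main__":
    print(f"FFFF + 1 -> {wrap_add(0xFFFF, 1):04X}")
    for code in (0x0000, 0x7FFF, 0x8000, 0xFFFF):
        print(f"{code:04X} means {to_signed(code):+d}")
```

All zeros decodes to zero, so muting sits in the middle of the signed range, with 7FFF as the most positive value and 8000 as the most negative.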
Two's complement audio displays a phenomenon that is level related and known as sign extension. The idea is shown in Fig.2. The biggest possible positive value will be 7FFF and the biggest possible negative value will be 8000. In both cases the MSB or sign bit is different from the next bit down.
If the level is reduced by 6.02dB, the greatest positive value will now be 3FFF and the greatest negative will be C000. As Fig.2 shows, the top two bits are now the same. If a further 6.02dB attenuation is applied, the greatest positive value will be 1FFF and the greatest negative will be E000, and the top three bits are now the same.
Fig.2 - In PCM two's complement the phenomenon of sign extension means that as level goes down, more high order bits copy the MSB which is the sign bit.
As 6.02dB corresponds to a factor of two, every attenuation step halves the size of the sample, which needs one less bit to describe it. In two's complement as the level goes down increasing numbers of high order bits copy the sign bit.
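The hexadecimal values quoted above can be reproduced by repeatedly halving the extreme sample values. This is a minimal sketch, assuming that each 6.02dB step is modeled as an arithmetic shift right by one bit; the function name is hypothetical.

```python
def attenuation_steps(value, steps, bits=16):
    """Halve a signed sample repeatedly, returning each result as a hex code.

    Python's >> on a negative integer is an arithmetic shift, so the sign
    is preserved exactly as it would be in two's complement hardware.
    """
    mask = (1 << bits) - 1
    codes = []
    for _ in range(steps):
        codes.append(f"{value & mask:04X}")
        value >>= 1  # one halving, about 6.02 dB of attenuation
    return codes


if __name__ == "__main__":
    for start in (0x7FFF, -0x8000):
        for code in attenuation_steps(start, 3):
            print(f"{code}  {int(code, 16):016b}")
        print()
```

Printing the binary form alongside the hexadecimal makes the copied high-order bits visible.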
It follows that an indication of the level of a digital audio signal can be obtained simply by looking at the amount of sign extension. It also follows that lossless audio compression is possible simply by omitting the copied bits.
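Both ideas can be illustrated by counting the copied bits. The sketch below is an assumption-laden illustration, not a description of any real meter or codec: it counts the high-order bits that duplicate the sign bit, turns that count into a coarse level estimate in 6.02dB steps, and reports how many bits a block of samples would need if the copies were omitted.

```python
def sign_extension_bits(value, bits=16):
    """Count the bits directly below the MSB that copy the sign bit."""
    code = value & ((1 << bits) - 1)
    sign = (code >> (bits - 1)) & 1
    count = 0
    for position in range(bits - 2, -1, -1):
        if (code >> position) & 1 != sign:
            break
        count += 1
    return count


def approx_level_db(value, bits=16):
    """Coarse level estimate relative to full scale, in 6.02 dB steps."""
    return -6.02 * sign_extension_bits(value, bits)


def bits_needed_for_block(samples, bits=16):
    """Word length sufficient for every sample in a block once the copied
    bits are dropped; the loudest sample in the block sets the figure."""
    return bits - min(sign_extension_bits(s, bits) for s in samples)


if __name__ == "__main__":
    block = [0x0123, -0x0200, 0x01FF, -0x0010]
    print(f"bits needed: {bits_needed_for_block(block)}")
    print(f"approx peak level: {approx_level_db(-0x0200):.1f} dB")
```

A decoder would restore the omitted bits by copying the sign bit back into the vacant positions, so nothing is lost.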
The onset of clipping determines the greatest level that can be sent through a digital audio system without distortion. However, it is a characteristic of digital audio that there is no corresponding lower limit. With the use of dither, a digital system remains linear at all levels below clipping and as signals are attenuated, they simply slide further into the noise floor.
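The linearizing effect of dither can be seen with a signal smaller than one quantizing step. The following is a simplified sketch, assuming rectangular dither one step wide and a constant input; practical systems normally use other dither distributions, and all names here are invented for illustration.

```python
import math
import random


def quantize(x):
    """Round to the nearest whole quantizing step."""
    return math.floor(x + 0.5)


def quantize_with_dither(x, rng):
    """Add rectangular dither spanning +/- half a step before quantizing."""
    return quantize(x + rng.uniform(-0.5, 0.5))


def average_output(x, count, rng):
    """Mean of many dithered quantizations of the same input value."""
    return sum(quantize_with_dither(x, rng) for _ in range(count)) / count


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so the run is repeatable
    level = 0.3             # input of 0.3 of a quantizing step
    print("without dither:", quantize(level))
    print("with dither, averaged:", average_output(level, 100_000, rng))
```

Without dither the small signal vanishes entirely. With dither the individual samples are noisy, but their average tracks the input, which is what is meant by the signal sliding into the noise rather than being lost.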
Techniques such as noise shaping are able to reduce the perceived noise simply by reducing it where the ear is most sensitive and placing it where the ear is less sensitive. There is thus no natural low limit that we might use as a reference, leaving us with the onset of clipping as the only simply observable reference.
That was behind the adoption of the dB(Fs) or full scale decibel, whose reference is the level of a sine wave in a digital audio system (strictly speaking a PCM linear two's complement system) that just fails to clip. In other words the greatest sample values are reached without any deviation from the purity of the waveform. It follows that all meaningful values of dB(Fs) will be zero or negative.
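A level relative to full scale can be computed from sample values alone. This is a minimal sketch for 16-bit samples, assuming the level is judged from the peak sample of a sine wave, with 7FFF taken as full scale; the function names are hypothetical.

```python
import math

FULL_SCALE = 0x7FFF  # greatest positive 16-bit two's complement value


def sine_samples(peak, samples_per_cycle=48):
    """One cycle of a sine wave with the given peak sample value."""
    return [round(peak * math.sin(2.0 * math.pi * k / samples_per_cycle))
            for k in range(samples_per_cycle)]


def peak_dbfs(samples):
    """Level of the largest sample relative to full scale, in dB."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return float("-inf")  # digital silence has no finite level
    return 20.0 * math.log10(peak / FULL_SCALE)


if __name__ == "__main__":
    for peak in (0x7FFF, 0x3FFF, 0x1FFF):
        level = peak_dbfs(sine_samples(peak))
        print(f"peak {peak:04X}: {level:.2f} dB relative to full scale")
```

The sine wave that just reaches 7FFF reads zero, and every halving of the peak lowers the reading by about 6.02dB.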