Digital Audio: Part 7 - Debunking The Myths Around Hi-Fi Audio

It’s interesting to compare the quality that can be obtained using digital audio with legacy media such as the vinyl disk and magnetic tape.

I have expressed before the opinion that analog was the last audio technology the consumer was able to understand. My case rests on the raft of nonsense that surrounds anything to do with digital audio, where commonly held views violate the laws of physics or of communications theory, or in some cases seek to negate ordinary reality.

Any attempt to cut through all that may be a waste of time, because the people who most need to understand do not have enough fundamental knowledge, or confidence in that knowledge, to be able to judge what is correct. That is almost the definition of an enthusiast.

I remember a time when it was possible to single out the hi-fi enthusiast as someone who would believe anything, and the hi-fi journalist as someone who would say anything the reader would believe. I did not then realize that these people were in fact trendsetters and that the unselfconscious propagation of nonsense they pioneered would subsequently expand to embrace politics and the media.

Fig.1 shows the famous nursery explanation of digital audio in which the waveform is described using toy bricks, preferably brightly colored. It is implied by this explanation that the digital version is inferior to the analog original because those ugly steps have destroyed the nice smooth analog waveform. Unfortunately this comparison is so far from the reality that it seems inadequate merely to say it is wrong. 

Fig.1 - The well-known nursery comparison of analog and digital audio shown here fails catastrophically because neither is represented correctly. On the same scale the analog waveform would have tremendous noise, and the vertical steps of the digital signal are not possible in a system having a bandwidth limit.

Sampling theory requires all sampling processes to be preceded and followed by filters that determine the bandwidth. The vertical sides of these toy bricks represent infinite frequency, so such a waveform cannot exist at all, let alone after a filter. Sampling theory also requires samples to be taken at an instant, not extended over the entire sample period in what is known mathematically as a zero-order hold process.
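
The difference between holding a sample for the whole period and treating it as an instantaneous value that is then filtered can be sketched numerically. The Python below is only an illustration under stated assumptions: the test tone at 0.1 cycles per sample period, the 1,001-sample window and the function names are all hypothetical choices, and the ideal reconstruction filter is approximated by a truncated sum of sinc functions.

```python
import math

def sinc(x):
    """Normalised sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)

def signal(t, freq=0.1):
    """Band-limited test signal (assumed): a sine at 0.1 cycles per sample period."""
    return math.sin(2.0 * math.pi * freq * t)

def zero_order_hold(samples, first_index, t):
    """'Toy brick' model: hold the most recent sample for the whole period."""
    return samples[math.floor(t) - first_index]

def sinc_reconstruct(samples, first_index, t):
    """Sinc interpolation, truncated to the samples that are available."""
    return sum(s * sinc(t - (first_index + k)) for k, s in enumerate(samples))

HALF_SPAN = 500          # assumed window: samples at n = -500 .. 500
FIRST = -HALF_SPAN
samples = [signal(n) for n in range(-HALF_SPAN, HALF_SPAN + 1)]

# Compare both models with the true waveform between two sampling instants.
test_times = [0.25, 0.5, 0.75]
zoh_error = max(abs(zero_order_hold(samples, FIRST, t) - signal(t)) for t in test_times)
sinc_error = max(abs(sinc_reconstruct(samples, FIRST, t) - signal(t)) for t in test_times)

print(f"zero-order hold worst error: {zoh_error:.4f}")
print(f"sinc reconstruction worst error: {sinc_error:.6f}")
```

With these assumptions the held value misses the waveform between samples by a large fraction of full scale, whereas the filtered reconstruction lands on it to within the small error left by truncating the sum.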

Although sampling appeared new to audio, it had been developed and perfected in other disciplines years earlier. One of those disciplines was shipbuilding. Traditionally, a ship hull would be made in model form. Then a special machine would sample it and allow the cross sections at various places along the hull to be transferred to paper and enlarged. Fig.2a) shows the lines of a yacht hull sampled in this way in the 1930s.

Fig.2 a) The lines of a 1930s yacht are samples of the hull outline.

One of the earliest requirements to understand sampling theory was to find the minimum number of lines that would still accurately describe the curvature of a ship.

The lines would then be sent to the shipyard. If the shipyard was run by hi-fi journalists, the resulting ship would look something like Fig.2b) and onlookers would die laughing.

Fig.2 b) A yacht built by hi-fi journalists who think Fig.1 is correct.

However, in the real world, the shipyard would erect a series of frames, representing samples taken at a point. The skinning of the hull would then be carried out by interpolation between the frames and the hull of Fig.2c) would emerge and everyone would be happy.

The interpolation between the frames would be done with the help of a lead bar that could be formed to a fair shape and used as a pattern. This was known as a spline. Mathematicians subsequently adopted the term to describe an interpolating algorithm.

Fig.2 c) A yacht hull built by people who know what they are doing.

The incorrect nursery brick explanation leads to further odd conclusions. I was told recently that audio must use higher sampling rates because transients can only be located at multiples of the sample period. Again this neglects the presence of filters.

In any system having a bandwidth limit, including human hearing, a waveform at 90 degrees to the time axis is completely impossible. All real sound waveforms in the air and in the ears and all audio waveforms, analog or digital, have a finite slewing rate.

Fig.3 shows a pair of transients with finite slope plotted across some sampling instants. The waveform of the transients is the same, but one of them has been shifted by a fraction of the sample period by the simple expedient of changing the value of the samples. As the samples in a 16-bit system have a resolution of one part in 65,000, it should be clear that a transient can be located in time with fantastic resolution.
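
The same idea can be tried numerically. The following is a minimal sketch, not a reproduction of Fig.3: the Gaussian pulse shape, its width of two sample periods, the shift of one tenth of a sample period and the centroid estimator are all assumptions chosen for illustration. The pulse is not strictly band-limited, but at this width its energy above half the sampling rate is negligible.

```python
import math

FULL_SCALE = 32767  # largest positive 16-bit sample value

def transient(t, centre, sigma=2.0):
    """Smooth pulse (assumed shape); sigma is in sample periods."""
    return math.exp(-0.5 * ((t - centre) / sigma) ** 2)

def sample_16bit(centre, indices):
    """Sample at integer instants and round to 16-bit integer codes."""
    return [round(FULL_SCALE * transient(n, centre)) for n in indices]

def centroid(samples, indices):
    """Estimate the pulse position as the amplitude-weighted mean sample index."""
    total = sum(samples)
    return sum(n * s for n, s in zip(indices, samples)) / total

indices = range(-20, 21)
reference = sample_16bit(0.0, indices)
shifted = sample_16bit(0.1, indices)  # moved by one tenth of a sample period

# The sampling instants are identical; only the sample values have changed.
estimated_shift = centroid(shifted, indices) - centroid(reference, indices)
print(f"estimated shift: {estimated_shift:.4f} sample periods")
```

Both pulses are sampled at exactly the same instants, yet the position recovered from the sample values alone follows the sub-sample shift closely. The centroid is only one convenient estimator; any method that uses the sample amplitudes would make the same point.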

The result of Fig.3 simply confirms the theory of perfect reconstruction on which sampling rests. Done properly, a sampled system reconstructs the original waveform plus or minus nothing. A digital system reconstructs the original waveform with the addition of a tiny amount of noise that is usually less than the noise in the original signal.

Another one of the myths that circulates in audiophile circles is that legacy audio technology uses continuously variable signals. Nice smooth waves are drawn alongside those horrible jagged digital bricks to show that analog is superior.

Fig.3 - Sampled bandwidth-limited transients shifted in time by a fraction of the sampling period by the simple expedient of allowing the samples to have different levels. The accuracy with which a transient can be located far exceeds the ability of the ear to discern it.

Unfortunately the comparisons are flawed. That's not just my opinion; instead it follows from the existence of electrons, molecules and magnetic domains, along with quantum theory. Electrons are charge quanta, all exactly the same.

Tape is coated with a magnetic layer containing tiny discrete magnetic domains. A magnetic domain is at all times fully magnetized, due to those pesky discrete electrons orbiting and acting like little discrete solenoids. In order to demagnetize a material, we have to arrange that the domains are pointing at random so they cancel out. To have some net magnetization, there must be more domains pointing one way than any other.

However, as domains are discrete, the overall magnetization goes in steps. That's where tape hiss comes from. It's magnetic quantizing noise, because analog tape isn't continuously variable.

Let's suppose we have 16 magnetic domains on an "analog" medium and we could rotate them N-S or S-N. If half of them were in each state, the result would be zero or 8:8. If we rotated the domains one at a time, the result would be 9:7, 10:6, 11:5, 12:4, 13:3, 14:2, 15:1 and 16:0, so it is possible to have nine different magnetizations from zero to maximum. We could get another eight magnetizations in the other direction, making 17 altogether.

Then some smartass comes along and allows each of the sixteen domains to represent a bit. Instead of 17 possible levels there are now 65,536 combinations. In the case of tape, the fundamentally quantized nature of magnetic media gives significantly better results when used to represent data. The discrete bipolar nature of magnetic domains goes hand in hand with binary numbers.
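
The two counts are easy to check by brute force. This short sketch simply enumerates every orientation of sixteen idealized domains; representing N-S and S-N as +1 and -1 is an assumption of the illustration, not a model of real tape.

```python
from itertools import product

DOMAINS = 16

# Every possible orientation pattern: each domain is +1 (N-S) or -1 (S-N).
patterns = list(product((+1, -1), repeat=DOMAINS))

# "Analog" use: only the net magnetization (the sum) can be told apart.
net_levels = {sum(p) for p in patterns}

# "Digital" use: each domain is read individually, so every pattern is distinct.
print(f"distinct net magnetizations: {len(net_levels)}")
print(f"distinct bit patterns: {len(patterns)}")
```

The set of sums collapses the patterns into 17 net magnetizations, while reading the domains individually keeps all 65,536 patterns distinct.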

Magnetic tape uses hysteresis in order to remember what was recorded. It's fundamentally non-linear and bias has to be used to try to linearize it. On the other hand dithered digital audio is fundamentally linear.
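
The linearity claim for dither can be illustrated with a signal smaller than one quantizing step. This is a sketch, not a description of any particular converter: the input level of 0.3 of a step, the triangular dither made from two uniform random values, the fixed seed and the number of trials are all assumptions chosen for the illustration.

```python
import math
import random

def quantize(x):
    """Round to the nearest integer step (one step = one LSB)."""
    return math.floor(x + 0.5)

def tpdf_dither(rng):
    """Triangular-PDF dither spanning +/-1 LSB: the sum of two uniform values."""
    return rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)

rng = random.Random(1)  # fixed seed so the run is repeatable
level = 0.3             # assumed input: 0.3 of one quantizing step
trials = 100_000

# Without dither the quantizer never moves, so the input is lost entirely.
undithered_mean = sum(quantize(level) for _ in range(trials)) / trials

# With dither the output toggles between steps and its average tracks the input.
dithered_mean = sum(quantize(level + tpdf_dither(rng)) for _ in range(trials)) / trials

print(f"mean output without dither: {undithered_mean:.4f}")
print(f"mean output with dither: {dithered_mean:.4f}")
```

Without dither the sub-step input vanishes; with dither the average output follows the input, at the cost of a small amount of added noise.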

Then we consider vinyl disks. Vinyl is a polymer, which means that its molecules are large, because they are made up of many atoms, hence the poly in polymer. Like magnetic domains, molecules are discrete, so the grooves of vinyl disks are approximated by an integer number of discrete molecules. If we did a fair comparison between digital audio in the shape of a Compact Disc and a vinyl disk, we would have to draw the vinyl molecules on the same scale as the quantizing steps of the CD.

Making a fair comparison like that reveals that properly dithered 16-bit digital, let alone noise shaped digital, causes significantly less damage to the waveform. On the same scale, the vinyl disk is dragging its stylus through a field of giant boulders. The math is simple. Take the amplitude of a typical vinyl groove, divide it by 65,000, the number of steps on a CD, and the result suggests how big the vinyl molecule needs to be. Unfortunately real vinyl molecules are bigger by orders of magnitude, hence the poorer SNR of vinyl.
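
The division described above can be written out as follows. The groove amplitude used here is a placeholder assumed purely for illustration, not a measured or quoted figure, and the function name is hypothetical; substitute a real value to repeat the sum.

```python
CD_STEPS = 2 ** 16  # 65,536 quantizing steps in a 16-bit system

def required_step_nm(groove_amplitude_um, steps=CD_STEPS):
    """Size of one step, in nanometres, if the groove amplitude were divided into `steps`."""
    return groove_amplitude_um * 1000.0 / steps

# ASSUMPTION: 50 micrometres is a hypothetical groove amplitude, for illustration only.
assumed_amplitude_um = 50.0
step_nm = required_step_nm(assumed_amplitude_um)
print(f"step size needed to match 16 bits: {step_nm:.3f} nm")
```

Under that assumed amplitude the step comes out at less than a nanometre, which is the size against which the granularity of the vinyl itself has to be compared.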

Compared fairly, the vinyl medium is outperformed by PCM audio in every parameter that can be measured. Digital audio doesn't rumble, doesn't have crosstalk or wow or flutter and doesn't have tracing distortion, so when people claim that vinyl is a superior medium to digital audio, there is no theoretical or practically measured basis for their claim.

What I find especially amusing is the enthusiast who plays a vinyl disk to show how good it sounds without realizing that the recorded signal passed inaudibly through a digital delay in the variable groove pitch system.

However, that is not to say that a preference for a given vinyl disk is wrong, because I think it is true that a lot of what is carried over today's digital media is inferior to what can be heard on a good vinyl disk. Many modern recordings are over processed in production and are lacking in dynamic range and ambience. The medium gets the blame for what is put on it.

Another salient point is that a lot of digital audio heard today has been compressed, not in the dynamic range sense but in the bit-rate-reduction sense, and the degree of compression used often exceeds the capabilities of the algorithm. A vinyl disk made using traditional techniques may not suffer from either of those problems.
