Broadcast For IT - Part 15 - Digital Audio

Audio is arguably the most complex aspect of broadcast television. The human auditory system is extremely sensitive to distortion and noise. For IT engineers to progress in broadcast television, they must understand the sampling rates and formats of sound, and in this article we delve into digital audio.

Analogue audio was prevalent in broadcast television up to the 1990s, when digital audio started to emerge in the professional arena. A combination of integrated circuit innovation, the adoption of digital audio in telecommunications, and the real advantages of digital processing drove the adoption of digital audio.

Noise, distortion, and interference are the enemies of any audio system. Noise manifests itself as hiss. Distortion makes the audio sound “buzzy” or “crackly”. And interference can be anything from low frequency hum caused by electrical mains feeds to high frequency spikes caused by a faulty fluorescent lighting circuit.

Digital Audio Benefits

Digitally distributing and processing audio negates many of these problems, especially interference caused by external factors. In the perfect digital audio chain, the signal is kept in its digital format wherever possible. The only time when the audio should be analogue is at the microphone and loudspeaker.

The two most common microphones used in broadcasting are moving coil and condenser; both generate analogue signals and need to be converted to digital at the earliest possible opportunity. The device that amplifies the microphone’s output and converts it to a digital signal is the ADC (Analogue to Digital Converter).

Digitize Early

ADCs are available in various formats with an array of functionality. Single-channel units will take one or two mic inputs and provide a digital output that connects directly to the sound console. Others have up to sixty-four mic inputs and provide a time-division-multiplexed data stream called MADI (Multichannel Audio Digital Interface).

Diagram 1 – Analogue audio signals are converted to digits and synchronized as soon as possible using the master-clock to enable time division multiplexing into AES3 format.

Converting an analogue signal to digital using an ADC is also referred to as PCM (Pulse Code Modulation). The analogue audio is sampled at regular intervals, and each measurement produces a digital number proportional to the voltage amplitude of the audio signal.

Digital audio is normally distributed as discrete integer values, hence the use of PCM. However, audio processing equipment works internally with longer floating point values to reduce the risk of rounding errors accumulating as processes are concatenated.
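
To make the distinction concrete, the minimal sketch below (Python, with illustrative function names only) samples a 1 kHz sine at 48 kHz, quantizes it to 16-bit integer PCM for distribution, and converts it back to floating point as a console might do internally.

```python
# Minimal sketch of PCM: sample a 1 kHz sine at 48 kHz, quantize to
# 16-bit integers for distribution, then convert back to floating point
# for internal processing. Names are illustrative only.
import math

SAMPLE_RATE = 48_000                     # samples per second
BIT_DEPTH = 16                           # bits per sample
FULL_SCALE = 2 ** (BIT_DEPTH - 1) - 1    # 32767 for signed 16-bit PCM

def sample_sine(freq_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Measure the analogue sine wave at regular intervals."""
    n_samples = int(duration_s * sample_rate)
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]

def quantise(samples):
    """Map each analogue value (-1.0 .. +1.0) to a signed 16-bit integer."""
    return [max(-FULL_SCALE - 1, min(FULL_SCALE, round(s * FULL_SCALE)))
            for s in samples]

def to_float(pcm):
    """Convert integer PCM back to floating point for processing."""
    return [s / FULL_SCALE for s in pcm]

analogue = sample_sine(1_000, 0.001)   # 1 ms of a 1 kHz tone = 48 samples
pcm = quantise(analogue)               # discrete integer values (PCM)
processing = to_float(pcm)             # longer floats inside the console
```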

Digitization Parameters

Two parameters are used to describe the ADC function: sampling rate and bit depth.

Sampling rate is the number of times per second that the audio signal is measured to provide the resulting PCM output. Harry Nyquist (1889 – 1976) was a Swedish-born American engineer who made outstanding contributions to the theories of communications. One of these became known as the “Nyquist sampling rate” and defines the minimum rate at which a signal must be sampled to provide PCM output with no aliasing; that is, so it can be turned back into analogue without error or distortion.

Nyquist determined the minimum sampling rate to be just over twice the highest frequency being sampled. The human auditory system has an upper frequency limit of approximately 20 kHz (this is an average and varies according to the age and physical condition of the listener). Television assumes 20 kHz as the upper limit of the human hearing range, and so two rates are commonly used: 44.1 kHz and 48 kHz.

Diagram 2 – The top diagram shows an audio sine wave sampled in accordance with Nyquist’s theorem; the bottom diagram shows what happens when Nyquist isn’t obeyed: the blue signal will be terribly distorted and unusable.
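
The aliasing shown in Diagram 2 can also be demonstrated numerically. In the sketch below, assuming a 48 kHz sample rate, a 30 kHz tone (above the 24 kHz Nyquist limit) produces exactly the same samples as an 18 kHz tone, so it cannot be recovered correctly.

```python
# Minimal sketch of aliasing, assuming a 48 kHz sample rate. A 30 kHz tone
# is above the Nyquist limit (24 kHz), so its samples are indistinguishable
# from those of an 18 kHz tone (48 kHz - 30 kHz).
import math

SAMPLE_RATE = 48_000
NYQUIST = SAMPLE_RATE / 2          # 24 kHz: the highest recoverable frequency

def sampled_tone(freq_hz, n_samples, sample_rate=SAMPLE_RATE):
    """Sample a cosine tone at regular intervals (instantaneous values only)."""
    return [math.cos(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]

legal = sampled_tone(18_000, 48)    # below Nyquist: recoverable without error
illegal = sampled_tone(30_000, 48)  # above Nyquist: Nyquist is not obeyed

# The two sets of samples are identical, so the 30 kHz tone reappears
# as a spurious 18 kHz tone when converted back to analogue.
assert all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(legal, illegal))
```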

CDs use 44.1 kHz. This was chosen as a common audio sampling rate for the 30 fps (pre-color NTSC) and 25 fps television systems. However, when color NTSC was broadcast the frame rate was reduced to 30/1.001 fps, and 48 kHz was chosen as the nearest common denominator between European and US based television systems. Using 48 kHz meets the Nyquist criterion as the 48 kHz sample rate is greater than twice 20 kHz.
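
The relationship between 48 kHz and the two frame rates can be checked with a little arithmetic, as in the illustrative snippet below: 48 kHz divides exactly into 25 fps frames, while at 30/1.001 fps the sample count only comes out whole over a five-frame sequence.

```python
# Quick arithmetic behind the choice of 48 kHz: how many samples fit in one
# video frame for PAL (25 fps) and NTSC (30/1.001 fps).
from fractions import Fraction

SAMPLE_RATE = Fraction(48_000)

pal_frame_rate = Fraction(25)
ntsc_frame_rate = Fraction(30_000, 1001)          # 30/1.001 fps

samples_per_pal_frame = SAMPLE_RATE / pal_frame_rate     # exactly 1920
samples_per_ntsc_frame = SAMPLE_RATE / ntsc_frame_rate   # 8008/5 = 1601.6

print(samples_per_pal_frame)        # 1920   -> in phase every frame
print(samples_per_ntsc_frame)       # 8008/5 -> in phase every five frames
print(samples_per_ntsc_frame * 5)   # 8008 samples in a five-frame sequence
```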

Greater Bit Depth Needed

Bit depth describes the resolution of the data used to define the digitally converted audio. Each extra bit lowers the quantization noise floor by approximately 6 dB, so a bit depth of 16 bits gives a dynamic range of around 96 dB, 20 bits gives 120 dB, and 24 bits gives 144 dB. Professional studios use depths of 16, 20, or 24 bits.

There is always a compromise between quality and cost of implementation. The higher the bit depth, the better the sound resolution and signal-to-noise ratio. However, as bit depths and sampling rates increase, so does the bandwidth required to distribute them, and the capacity needed for storage.
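
The sketch below puts rough numbers on that trade-off, using the common approximation of about 6 dB of dynamic range per bit and the raw PCM payload rate before any framing overhead; the figures are indicative only.

```python
# Back-of-envelope figures for the quality/cost trade-off: dynamic range
# grows by roughly 6 dB per bit, while the raw bit rate grows linearly
# with bit depth, sample rate, and channel count.
def dynamic_range_db(bit_depth):
    """Approximate dynamic range of linear PCM (about 6.02 dB per bit)."""
    return 6.02 * bit_depth

def bit_rate_bps(sample_rate, bit_depth, channels):
    """Raw PCM payload rate before any framing overhead."""
    return sample_rate * bit_depth * channels

for bits in (16, 20, 24):
    print(f"{bits} bits -> {dynamic_range_db(bits):.0f} dB dynamic range, "
          f"{bit_rate_bps(48_000, bits, 2) / 1e6:.3f} Mbit/s per stereo pair")
# Approximate output:
# 16 bits ->  96 dB, 1.536 Mbit/s
# 20 bits -> 120 dB, 1.920 Mbit/s
# 24 bits -> 144 dB, 2.304 Mbit/s
```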

AES3 and MADI Distribution

The two fundamental digital distribution systems used in the audio control room of a television studio are AES3 (Audio Engineering Society) and MADI (Multichannel Audio Digital Interface). A third method uses SDI (Serial Digital Interface), where the audio is embedded into the HANC (Horizontal Ancillary Data) space of the SDI signal. But this is only used when distributing the sound and vision together outside of the studio.

MADI and AES3 are usually used to distribute digital audio within studios, whereas SDI is generally used to distribute audio with video to remote studios or playout centers.

AES3 describes the format, electrical layer, and physical connectivity of the standard. Two channels are defined, Channel A and Channel B. Each sample of 16, 20, or 24 bits is carried in a 32-bit sub-frame; two sub-frames, one for channel A and one for channel B, form a frame, and 192 consecutive frames make up one audio block.
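
As an illustration of that structure, the sketch below packs a 24-bit sample into the 28 data time slots of a sub-frame. The four-slot preamble is a sync pattern added at the line-coding stage, so it is omitted here, and the helper name is hypothetical.

```python
# A sketch of how one 24-bit sample might be packed into the 28 data time
# slots of an AES3 sub-frame (slots 4-31). The 4-slot preamble (X, Y or Z)
# is a sync pattern handled at the line-coding stage and is omitted.
def pack_subframe(sample_24bit, validity=0, user=0, channel_status=0):
    """Return a 28-bit integer: 24 sample bits plus V, U, C and parity bits."""
    bits = [(sample_24bit >> i) & 1 for i in range(24)]   # LSB first
    bits += [validity, user, channel_status]
    parity = sum(bits) & 1          # even parity over time slots 4-31
    bits.append(parity)
    word = 0
    for i, b in enumerate(bits):
        word |= b << i
    return word

# One frame = a channel A sub-frame followed by a channel B sub-frame;
# 192 consecutive frames form one audio block.
frame = (pack_subframe(0x123456), pack_subframe(0x654321))
block = [frame] * 192
```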

Diagram 3 – The 48 kHz sample rate was chosen for professional audio as it can be synchronized to both NTSC and PAL frame rates. NTSC is in phase every five frames and PAL every frame.

The data signals can be distributed over unbalanced, balanced, and optical cabling.

More Than Just Audio

User data is available in the auxiliary parts of the frames to carry private data and timecode. This allows frame-accurate information to be sent with the audio samples.
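
As a rough illustration, and assuming the metadata is carried in the user (U) bit of each sub-frame, the 192 frames of one audio block yield 192 user bits, or 24 bytes, per channel; the bit ordering shown is illustrative rather than normative.

```python
# Illustrative sketch only: the user (U) bit of each sub-frame contributes
# one bit per frame per channel, so a 192-frame audio block yields 192 user
# bits (24 bytes) per channel for private data or timecode-style metadata.
def user_bits_to_bytes(user_bits):
    """Pack 192 user bits (one per frame for a channel) into 24 bytes."""
    assert len(user_bits) == 192, "one AES3 audio block spans 192 frames"
    out = bytearray()
    for i in range(0, 192, 8):
        byte = 0
        for bit in user_bits[i:i + 8]:
            byte = (byte << 1) | (bit & 1)
        out.append(byte)
    return bytes(out)

payload = user_bits_to_bytes([0] * 192)   # 24 bytes of user data per block
```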

One of the challenges of AES3 cabling is that it is bulky and not very efficient: a 64-input desk would need between 64 and 128 cables, which take up a lot of space and are heavy and expensive.

The introduction of MADI maintained high quality 48 kHz sampling and depths of up to 24 bits, but significantly increased the number of channels that can be distributed across one cable to 64. Coax and fiber optic cables are the main distribution media for MADI, which is particularly useful when distributing many audio channels along a single cable.

Even Higher Sampling

MADI facilitates an increase in the sampling rate to 96 kHz at the expense of reducing the number of channels distributed. This is useful when exceptionally high precision is required, often when significant post processing is needed.
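
The channel count trade-off can be sketched as a simple bandwidth budget, assuming the common 64-channel MADI payload where each channel occupies a 32-bit slot per sample period; exact figures depend on the MADI variant in use.

```python
# Rough capacity check, assuming a 64-channel MADI payload in which each
# channel occupies a 32-bit slot per sample period. Doubling the sample rate
# to 96 kHz doubles the data per channel, so the channel count halves.
BITS_PER_CHANNEL_SLOT = 32          # sample plus V, U, C and parity bits
PAYLOAD_BUDGET_BPS = 64 * 48_000 * BITS_PER_CHANNEL_SLOT   # 98.304 Mbit/s

def max_channels(sample_rate, payload_bps=PAYLOAD_BUDGET_BPS):
    """Channels that fit in the payload budget at a given sample rate."""
    return payload_bps // (sample_rate * BITS_PER_CHANNEL_SLOT)

print(max_channels(48_000))   # 64 channels at the standard rate
print(max_channels(96_000))   # 32 channels when sampling at 96 kHz
```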

It’s important to note that both AES3 and MADI are synchronous distribution systems. They require their own specific networks and specialist equipment to process and distribute the audio. Taking one audio channel from an AES3 circuit and inserting it into a MADI circuit is difficult, problematic, and requires a great deal of specialist knowledge from the sound engineers operating the systems.

Synchronize Audio

Furthermore, AES3 and MADI networks require a master pulse generator to keep the respective networks, along with the terminal equipment attached to them, synchronous. Failure to do so will result in lost samples and audio distortion.

Digital audio is a complex subject to master, and even the smallest error or timing loss can result in lost samples leading to distortion. One of the challenges to overcome as we move to IP is to distribute synchronous audio and video streams over asynchronous IP networks with no packet loss.
