Audio Over IP - Making It Work - Part 2

To fully leverage the benefits of IP networks, we need to think in IT terms. Simply replacing the acronym MADI or AES with IP is insufficient, as all we end up with is a very complex, poorly utilized, static network.

Point-to-point connectivity has provided broadcasters with guaranteed bandwidth and exceptionally low latency. With their telephony background, telcos have provided analogue and digital connectivity to route audio and video all over the world. The price is increased complexity and cost.


Ironically, telcos have been distributing broadcast signals over IP networks for many years without us being fully aware of it. Gateways at the interface to broadcasters provided the necessary format translation from SDI, AES, or analogue to IP, and hid the underlying network from us.

Understanding how Layer-2 switching and Layer-3 routing operate, and how they relate to the Open Systems Interconnection (OSI) model, is key to understanding computer networks and their implications for latency and timing.

Confused Terminology

It doesn’t help that broadcast engineers refer to signal switching devices as routers, such as an SDI router or audio router. The terminology becomes even more confusing because some Layer-2 switches also incorporate Layer-3 routing functionality; these are often referred to as multilayer switches.

Networks are commonly divided into two geographical scales: LANs (Local Area Networks) and WANs (Wide Area Networks). A LAN is generally associated with a single building, office or studio, and a WAN interconnects many LANs to build a much bigger network. The largest of all networks is the Internet.

Solve Congestion

One of the simplest LANs consists of a single Ethernet hub. Data received on any port of the hub is repeated to all other ports, with no consideration for security or frame filtering. This works well for a small home or office network with a few connected computers and printers, but is completely unworkable for broadcast infrastructures requiring high-speed, low-latency, and secure links. Congestion and collisions would soon occur, resulting in highly distorted audio and video.

The Ethernet switch was introduced to solve the congestion issue. A switch automatically learns the Media Access Control (MAC) addresses of the devices connected to each port by reading the source address of every frame it receives. Using this information, the switch forwards each frame only to the port where its destination device is connected, greatly reducing the potential for network congestion and collisions.
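The learning behaviour described above can be sketched in a few lines. This is a simplified model under stated assumptions, not how any particular switch is built: the class name `LearningSwitch` is hypothetical, MAC addresses are plain strings, and entry ageing and VLANs are ignored. Frames to an address the switch has not yet learned (including the broadcast address) are flooded to every other port.

```python
from typing import Dict, List


class LearningSwitch:
    """Minimal, hypothetical model of a MAC-learning Ethernet switch."""

    def __init__(self, num_ports: int) -> None:
        self.ports = list(range(1, num_ports + 1))
        # MAC address table: source MAC -> port on which it was last seen
        self.mac_table: Dict[str, int] = {}

    def receive(self, in_port: int, src_mac: str, dst_mac: str) -> List[int]:
        """Return the egress ports for a frame arriving on in_port."""
        # Learn: the sender is reachable through the ingress port
        self.mac_table[src_mac] = in_port
        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Unknown destination: flood to every port except the ingress port
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            # Destination is on the same segment: filter (drop) the frame
            return []
        # Known destination: forward to exactly one port
        return [out_port]
```

The first frame from a microphone on port 1 to a console on port 2 is flooded; once the console replies, both addresses are learned and later frames go only to the one port that needs them.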

Diagram 1 – An Ethernet switch forwards each frame only to the port where its destination device is connected.

An Ethernet frame is the basic unit of data transfer on a layer-2 network. Frames are exchanged within the same LAN and provide a well-defined structure used for error detection and data link control. Each frame contains source and destination MAC addresses and, when IP is used, encapsulates an IP packet.
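To make the frame structure concrete, here is a minimal sketch of an Ethernet II header (destination MAC, source MAC, EtherType) wrapped around a payload. Assumptions: the helper names are hypothetical, the preamble and FCS are omitted, the payload is padded to the 46-byte minimum, and 0x0800 is the EtherType that marks an IPv4 packet.

```python
import struct
from typing import Tuple

ETHERTYPE_IPV4 = 0x0800  # EtherType value indicating an encapsulated IPv4 packet


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes(int(octet, 16) for octet in mac.split(":"))


def build_frame(dst: str, src: str, ethertype: int, payload: bytes) -> bytes:
    """Build an Ethernet II frame (no preamble, no FCS)."""
    # Pad to the 46-byte minimum payload so the frame is 64 bytes once the FCS is added
    payload = payload.ljust(46, b"\x00")
    header = struct.pack("!6s6sH", mac_to_bytes(dst), mac_to_bytes(src), ethertype)
    return header + payload


def parse_header(frame: bytes) -> Tuple[str, str, int]:
    """Return (destination MAC, source MAC, EtherType) from the first 14 bytes."""
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return dst.hex(":"), src.hex(":"), ethertype
```

The placeholder bytes `b"ip-packet"` stand in for a real IP packet in the checks; a switch only needs the first 14 bytes to make its forwarding decision.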

Broadcast Domain

Layer-2 networks, such as IEEE 802.3 Ethernet, use three types of delivery: unicast, multicast and broadcast. Unicast sends a frame from one device to a single other device. Multicast creates a “one to many” mapping from one device to a group of others. Broadcast transmits frames to all devices on the network, which is also known as flooding the network.
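The delivery type is encoded in the destination MAC address itself. A minimal sketch (the function name is hypothetical): the all-ones address is broadcast, and otherwise the least significant bit of the first octet (the individual/group bit) distinguishes multicast from unicast.

```python
def delivery_type(dst_mac: str) -> str:
    """Classify a destination MAC address as unicast, multicast or broadcast."""
    octets = [int(x, 16) for x in dst_mac.split(":")]
    if all(o == 0xFF for o in octets):
        return "broadcast"
    # Individual/group bit: least significant bit of the first octet
    if octets[0] & 0x01:
        return "multicast"
    return "unicast"
```

For example, `01:00:5e:...` addresses (used for IPv4 multicast) have the group bit set, so a switch treats them as multicast.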

A “broadcast domain” is a logical division of a network where all devices can be reached by a layer-2 broadcast message. This gives rise to the concept of a LAN being restricted to a building, office or studio.

WANs join many LANs together using layer-3 routers. Not all LANs are Ethernet networks, so the function of a router is to join different networks and different network technologies together to form a secure, cohesive, and manageable system.

Layer-2 Detects Errors

In practice, a layer-2 Ethernet switch checks each frame’s CRC, carried in the Frame Check Sequence (FCS), to determine whether any errors have occurred in the source and destination MAC addresses, the length/type field, or the data payload. If the switch detects a CRC error, it simply drops the frame, so the data never arrives and the loss has to be dealt with further up the IP stack.
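The CRC check can be sketched with Python's standard `zlib.crc32`, which uses the same CRC-32 polynomial as the Ethernet FCS. This models a store-and-forward switch that receives the whole frame before checking it (cut-through switches may start forwarding before the FCS arrives); the function names are hypothetical, and the FCS is appended least significant byte first, as it appears in a captured frame.

```python
import struct
import zlib


def append_fcs(frame: bytes) -> bytes:
    """Append the 4-byte FCS computed over destination MAC through payload."""
    return frame + struct.pack("<I", zlib.crc32(frame))


def switch_accepts(frame_with_fcs: bytes) -> bool:
    """Recompute the CRC and compare it with the received FCS."""
    body, fcs = frame_with_fcs[:-4], frame_with_fcs[-4:]
    # On mismatch a switch drops the frame; nothing is forwarded or repaired
    return struct.pack("<I", zlib.crc32(body)) == fcs
```

Flipping a single bit anywhere in the frame changes the CRC, so the frame is discarded rather than passed on corrupted.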

IP is a data transfer protocol, and no transmission medium is defined in the standard. The IP specification (RFC 791) explicitly states that the IP protocol calls on local network protocols to carry the internet datagram to the next gateway or destination host. In other words, IP datagrams are defined independently of whatever underlying medium transports them.

Diagram 2 – When designing a network, data rates must be calculated and provisioned adequately to prevent unacceptable latency. Here, four microphones, each connected to a different switch port (left), are multiplexed onto a single output port with insufficient bandwidth (right), so the frames queue and start to lag behind real time.
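The provisioning calculation behind Diagram 2 can be sketched as follows. The stream format is an assumption chosen for illustration (48 kHz, 24-bit linear PCM, mono, one packet per millisecond, IPv4 without options, no VLAN tag), and the 5 Mbit/s egress port is a deliberately undersized hypothetical link. Per-packet overhead includes RTP, UDP, IPv4 and Ethernet headers plus the FCS, preamble and inter-frame gap that occupy the wire.

```python
# Assumed stream format: 48 kHz, 24-bit (3-byte) samples, mono, 1 ms packets
SAMPLE_RATE = 48_000
BYTES_PER_SAMPLE = 3
PACKET_TIME_MS = 1

# Per-packet overhead in bytes
RTP, UDP, IPV4 = 12, 8, 20
ETH_HEADER, FCS, PREAMBLE_SFD, INTERFRAME_GAP = 14, 4, 8, 12


def wire_bits_per_packet(channels: int = 1) -> int:
    samples = SAMPLE_RATE * PACKET_TIME_MS // 1000
    payload = samples * BYTES_PER_SAMPLE * channels
    total = payload + RTP + UDP + IPV4 + ETH_HEADER + FCS + PREAMBLE_SFD + INTERFRAME_GAP
    return total * 8


def stream_bps(channels: int = 1) -> int:
    packets_per_second = 1000 // PACKET_TIME_MS
    return wire_bits_per_packet(channels) * packets_per_second


def queue_delay_ms(streams: int, link_bps: int, duration_ms: int) -> float:
    """Queuing delay after duration_ms when `streams` mono senders share one egress port."""
    arriving = streams * wire_bits_per_packet()  # bits offered per packet time
    served = link_bps * PACKET_TIME_MS // 1000   # bits the port can send per packet time
    backlog = 0
    for _ in range(duration_ms // PACKET_TIME_MS):
        # Whatever cannot be sent this interval stays queued for the next one
        backlog = max(0, backlog + arriving - served)
    return backlog / link_bps * 1000             # time to drain the queue, in ms
```

Under these assumptions each microphone needs about 1.78 Mbit/s on the wire, so four need about 7.1 Mbit/s. On a 100 Mbit/s port the queue never builds, but on the hypothetical 5 Mbit/s port the backlog grows every millisecond and, after 100 ms, new packets wait roughly 42 ms: the lag shown in the diagram.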

Independence from the underlying network is one of IP’s greatest strengths, but it also presents some very interesting challenges. Throughout the history of broadcasting, video and audio signals have been intrinsically tied to, and reliant upon, the underlying transport medium. For example, AES3 supports a range of sample rates and uses biphase mark code (BMC) to encode the clock together with the data directly onto the wire, guaranteeing audio sample timing.
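A minimal sketch of biphase mark coding shows why the clock travels with the data: every bit cell begins with a transition, and a 1 adds a second transition mid-cell. The function names are hypothetical, and the sketch covers only the line code itself; AES3 also uses preambles that deliberately break the BMC rules for synchronization, which are not modelled here.

```python
from typing import List


def bmc_encode(bits: List[int], level: int = 0) -> List[int]:
    """Return two half-cell line levels for each data bit."""
    halves = []
    for bit in bits:
        level ^= 1              # transition at the start of every cell carries the clock
        halves.append(level)
        if bit:
            level ^= 1          # extra mid-cell transition encodes a 1
        halves.append(level)
    return halves


def bmc_decode(halves: List[int]) -> List[int]:
    """A bit is 1 when the two halves of its cell differ."""
    return [int(halves[i] != halves[i + 1]) for i in range(0, len(halves), 2)]
```

Because decoding only compares the two halves of each cell, the signal survives polarity inversion, and the guaranteed transition at every cell boundary lets the receiver recover the sample clock from the wire.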

Timing is Lost

However, when using IP networks, the direct relationship between the audio and video data and the sample clock is lost. We must adopt other strategies to reconstruct the video and audio signals, such as RTP (Real-time Transport Protocol).
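RTP restores timing information by putting it in every packet: the fixed header defined in RFC 3550 carries a sequence number and a media timestamp counted in sample-clock ticks. The sketch below packs and parses that 12-byte header. Assumptions: the helper names are hypothetical, payload type 96 (from the dynamic range) is arbitrary, and a 48 kHz clock with 48 samples per packet is used, so the timestamp advances by 48 each packet.

```python
import struct


def pack_rtp_header(seq: int, timestamp: int, ssrc: int, payload_type: int = 96) -> bytes:
    """12-byte RTP fixed header: version 2, no padding/extension/CSRCs, marker clear."""
    first_byte = 2 << 6
    second_byte = payload_type & 0x7F
    return struct.pack("!BBHII", first_byte, second_byte, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def unpack_rtp_header(data: bytes) -> dict:
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", data[:12])
    return {"version": b0 >> 6, "payload_type": b1 & 0x7F,
            "sequence": seq, "timestamp": ts, "ssrc": ssrc}
```

A receiver uses the sequence number to put packets back in order and the timestamp to place each block of samples on its own clock, e.g. a timestamp of 96 at 48 kHz is 2 ms after a timestamp of 0.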

Ethernet networks rely on VLANs (Virtual Local Area Networks) to provide security. VLANs split a physical network into logical segments, each identified by a unique number. For example, VLAN-1 may consist of devices connected to ports 1, 3, 4, and 5, and VLAN-2 of devices connected to ports 2, 6 and 7. A device on VLAN-1 cannot reach devices on VLAN-2 at layer 2, even with a broadcast, because each VLAN is a separate broadcast domain.
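Using the port assignment from the example above, a minimal sketch shows how a switch confines a broadcast to the ingress port's VLAN. Assumptions: simple port-based (untagged) VLANs on a single switch; tagging between switches is not modelled, and `PORT_VLAN` and the function name are hypothetical.

```python
from typing import Dict, List

# Port -> VLAN assignment from the example in the text
PORT_VLAN: Dict[int, int] = {1: 1, 3: 1, 4: 1, 5: 1, 2: 2, 6: 2, 7: 2}


def broadcast_egress_ports(in_port: int, port_vlan: Dict[int, int] = PORT_VLAN) -> List[int]:
    """Ports that receive a layer-2 broadcast arriving on in_port."""
    vlan = port_vlan[in_port]
    # Flood only within the ingress port's VLAN, i.e. its broadcast domain
    return sorted(p for p, v in port_vlan.items() if v == vlan and p != in_port)
```

A broadcast entering port 1 reaches only ports 3, 4 and 5; the devices on VLAN-2 never see it.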

Securely Route VLAN’s

A LAN may consist of many interconnected layer-2 switches, and if VLANs were not used, every device connected to any port within the network would be able to reach every other device: a clear security issue.

One application of Layer-3 routing is to connect VLANs together. Assigning each VLAN its own IP subnet makes routing easier to administer and more secure. If we use VLAN-1 for studio 1 and VLAN-2 for studio 2, then we can route the microphones from studio 1 to studio 2 simply by creating an entry in the layer-3 routing table. This is a greatly simplified example, as there will be many VLANs within a studio.
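With one subnet per VLAN, deciding whether traffic needs the router reduces to a subnet membership test. The sketch uses Python's standard `ipaddress` module; the addressing plan (10.0.1.0/24 for studio 1, 10.0.2.0/24 for studio 2) and the function names are hypothetical.

```python
import ipaddress

# Hypothetical addressing plan: one IPv4 subnet per VLAN
VLAN_SUBNETS = {
    1: ipaddress.ip_network("10.0.1.0/24"),  # VLAN-1, studio 1
    2: ipaddress.ip_network("10.0.2.0/24"),  # VLAN-2, studio 2
}


def vlan_of(address: str) -> int:
    ip = ipaddress.ip_address(address)
    for vlan, subnet in VLAN_SUBNETS.items():
        if ip in subnet:
            return vlan
    raise ValueError(f"{address} is not in any configured VLAN subnet")


def needs_router(src: str, dst: str) -> bool:
    """Traffic between different subnets (and so different VLANs) must be routed."""
    return vlan_of(src) != vlan_of(dst)
```

A microphone at 10.0.1.10 can reach a console at 10.0.1.20 directly, but sending to 10.0.2.20 in studio 2 requires the layer-3 route.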

Sources of Delay

Moving frames from one port to another in a switch requires lookup tables, so that each frame’s destination MAC address can be associated with the correct destination port. Routers use a similar technique but look up the destination IP address, rather than the MAC address, in routing tables to determine where data packets are sent. Although this method has become extremely efficient using content-addressable memory (CAM) techniques, an inherent delay is still introduced in the process.
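Whereas a switch needs an exact match on the MAC address, a router selects the most specific routing-table entry that contains the destination address (longest prefix match). Hardware routers perform this lookup in specialised memory such as ternary CAM; the linear scan below only illustrates the rule. The routes and interface names are hypothetical.

```python
import ipaddress
from typing import List, Tuple

# Hypothetical routing table: (destination prefix, egress interface)
ROUTES: List[Tuple[ipaddress.IPv4Network, str]] = [
    (ipaddress.ip_network("0.0.0.0/0"), "wan0"),     # default route
    (ipaddress.ip_network("10.0.0.0/16"), "core1"),
    (ipaddress.ip_network("10.0.2.0/24"), "vlan2"),
]


def lookup(dst: str) -> str:
    ip = ipaddress.ip_address(dst)
    # Longest prefix match: the most specific route containing the address wins
    matches = [(net.prefixlen, iface) for net, iface in ROUTES if ip in net]
    return max(matches)[1]
```

10.0.2.5 matches all three entries but leaves via `vlan2`, the /24; an address outside 10.0.0.0/16 falls through to the default route.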

Statically routed networks use manually configured routing tables to tell the router which port to send each IP datagram to. These tables tend to become bloated and difficult to administer, and in the event of a device failure they require manual intervention to redirect traffic to a known good link. Dynamic routing addresses these problems.

Diagram 3 – The top diagram shows a constant, evenly spaced stream of IP packets sent from a host device such as a microphone; the bottom diagram shows the same packets with variable delay and re-ordering after traversing switches and routers in a LAN or WAN.

If a WAN is designed to be resilient, with extra routing paths provided to compensate for link or device failure, then dynamic routing protocols such as the Routing Information Protocol (RIP) and Open Shortest Path First (OSPF) are used. If a link or router fails, these protocols detect the failure and re-route data around it, effectively healing the network.
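OSPF routers compute their routes with a shortest-path-first (Dijkstra) calculation over link costs; RIP instead counts hops with a distance-vector approach, which is not shown here. The sketch below covers only the path calculation, not OSPF's messaging or failure detection, and the three-router topology, link costs and function names are hypothetical. It finds the best path from R1 (studio 1 edge) to R2 (studio 2 edge), then recomputes after the direct link fails.

```python
import heapq
from typing import Dict, List, Optional

Graph = Dict[str, Dict[str, int]]

# Hypothetical WAN: direct R1-R2 link (cost 1) plus a backup path via R3
TOPOLOGY: Graph = {
    "R1": {"R2": 1, "R3": 2},
    "R2": {"R1": 1, "R3": 2},
    "R3": {"R1": 2, "R2": 2},
}


def shortest_path(graph: Graph, src: str, dst: str) -> Optional[List[str]]:
    """Dijkstra's algorithm: return the lowest-cost path, or None if unreachable."""
    queue = [(0, src, [src])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, link_cost in graph.get(node, {}).items():
            if neighbour not in visited:
                heapq.heappush(queue, (cost + link_cost, neighbour, path + [neighbour]))
    return None


def fail_link(graph: Graph, a: str, b: str) -> Graph:
    """Return a copy of the topology with the a-b link removed."""
    return {n: {m: c for m, c in nbrs.items() if {n, m} != {a, b}}
            for n, nbrs in graph.items()}
```

Traffic normally takes R1 to R2 directly; once that link is removed, recomputing gives R1, R3, R2, which is the self-healing behaviour described above.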

Adding Jitter

Unlike baseband MADI and AES, every single packet passing through a switch or router is processed: switches examine the source and destination MAC addresses and check the frame CRC, while routers also process fields in the IP header such as the “Time to Live” (TTL) counter. This processing time varies from packet to packet, adding further jitter to frames and packets.
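The effect of variable per-hop processing can be illustrated with a small simulation and the interarrival jitter estimator defined in RFC 3550, which smooths the change in transit time between consecutive packets with a gain of 1/16. The hop model (a uniform random delay per hop) and all names and values are hypothetical; times are in milliseconds rather than RTP timestamp units.

```python
import random
from typing import List


def interarrival_jitter(send_times: List[float], recv_times: List[float]) -> float:
    """RFC 3550-style interarrival jitter estimate, in the same units as the inputs."""
    jitter = 0.0
    for i in range(1, len(send_times)):
        # D: difference in transit time between consecutive packets
        d = (recv_times[i] - recv_times[i - 1]) - (send_times[i] - send_times[i - 1])
        jitter += (abs(d) - jitter) / 16
    return jitter


def simulate_hops(send_times: List[float], hops: int, min_ms: float, max_ms: float,
                  seed: int = 1) -> List[float]:
    """Hypothetical model: each hop adds a random processing delay to each packet."""
    rng = random.Random(seed)
    return [t + sum(rng.uniform(min_ms, max_ms) for _ in range(hops)) for t in send_times]
```

With a fixed delay on every packet the estimate stays at zero, however long the delay; it is the packet-to-packet variation added at each hop that produces jitter.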

The combined effect of frame delays in switches, packet delays in routers, table lookups, and dynamic routing is that IP packets develop temporal jitter and latency. Data buffers in switches, microphones, sound consoles and all other IP host equipment are used to bring order back to the system. However, every buffer adds delay, and too many concatenated buffers have a detrimental effect on the audio and video.
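The trade-off a receive buffer makes can be sketched as a fixed-delay jitter buffer: each packet is scheduled for playout at a fixed offset after its nominal send slot, so packets are put back in sequence order, but any packet arriving after its deadline is too late to use. Assumptions: sequence numbers start counting from the first packet received, the playout clock is anchored to that packet's arrival, and the names and numbers are hypothetical.

```python
from typing import List, Tuple


def playout(packets: List[Tuple[int, float]], packet_time_ms: float,
            buffer_ms: float) -> Tuple[List[int], List[int]]:
    """Fixed-delay jitter buffer.

    packets: (sequence number, arrival time in ms), in arrival order.
    Returns (sequence numbers played in order, sequence numbers that arrived too late).
    """
    first_seq, first_arrival = packets[0]
    # Anchor the playout clock to the first packet received
    base = first_arrival - first_seq * packet_time_ms
    on_time, late = [], []
    for seq, arrival in packets:
        deadline = base + seq * packet_time_ms + buffer_ms
        (on_time if arrival <= deadline else late).append(seq)
    return sorted(on_time), sorted(late)
```

With packets arriving as (0, 10.0), (2, 11.5), (1, 12.2), (3, 13.1), (4, 19.0) ms and 1 ms packets, a 3 ms buffer re-orders packets 0 to 3 but loses packet 4, while a 10 ms buffer saves packet 4 at the cost of 7 ms more latency. Cascade several such buffers and the added delays accumulate.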

IP networks provide unprecedented flexibility for broadcasters, and broadcast manufacturers are conducting an incredible amount of research and development to make IP systems work with the same reliability and quality of service that broadcast engineers have come to demand and expect.
