Audio Over IP Primer For Broadcast - Part 2

In Part 1 we introduced the benefits of Audio over IP and investigated some of the subtleties that make it the ideal choice for modern broadcast facilities. In Part 2, we look at the practicalities of making AoIP work in a real-time television environment.

This article was first published as part of Essential Guide: Audio Over IP Primer For Broadcast


Plug-and-play has been available in IT systems for many years. When we walk into a café the WiFi just works, and when we insert a USB drive into a computer it appears in the file viewer almost instantly, with any drivers downloaded and installed automatically if needed. Adding a video card or external GPU requires little input from the user; again, it just works.

It hasn’t always been this way. When home computing and IT were in their infancy, even building a simple home network seemed incredibly difficult. Drivers had to be manually downloaded and hours were spent editing configuration files to recognize USB and network devices. Adding a new video card seemed like an impossible job.

Thirty Years of Experience

As audio over IP has been around for nearly thirty years, vendors have had plenty of opportunity to fine-tune their systems and make interoperability much easier to achieve.

Just like IT, audio over IP has network and open standards solutions.

Audio has a multitude of configurations: the number of channels, the sampling rate, the bit depth, and the endianness of the data format, and that’s before we even start considering compression and coding types.

A microphone connected to an IP network stream will need to be configured so that any device receiving its IP packets can make sense of the audio. An IP stream of audio, by default, doesn’t have any knowledge of the audio sampling rate, bit depth, or channel configuration. The act of streaming audio at an IP packet level only occurs at the transport level.
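To illustrate why a receiver cannot infer the audio format from the packets alone, here is a minimal Python sketch. It assumes uncompressed linear PCM, and the example values (48 kHz, 1 ms of audio per packet) are assumptions chosen for illustration; the function name is hypothetical and not part of any standard or product.

```python
def pcm_payload_bytes(sample_rate_hz: int, bit_depth: int,
                      channels: int, packet_time_ms: float) -> int:
    """Return the audio payload size, in bytes, of one packet of linear PCM."""
    samples_per_packet = round(sample_rate_hz * packet_time_ms / 1000)
    bytes_per_sample = bit_depth // 8
    return samples_per_packet * channels * bytes_per_sample


# Assumed example: 48 kHz, 24-bit, stereo, 1 ms of audio per packet.
stereo_24bit = pcm_payload_bytes(48_000, 24, 2, 1.0)

# A different layout that happens to produce the same number of bytes:
# 48 kHz, 16-bit, three channels, 1 ms of audio per packet.
three_ch_16bit = pcm_payload_bytes(48_000, 16, 3, 1.0)

print(stereo_24bit, three_ch_16bit)
```

Both layouts produce payloads of the same size, so the byte count alone tells the receiver nothing about how to interpret the samples. The format has to be communicated separately.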

Plug-n-Play Solution

To provide a complete plug-and-play system, further features are required, such as discovery, control, and security. As well as telling a downstream device how the audio is configured in the IP stream, there must be a method of determining the stream’s existence on the network as well as establishing its IP address.

Broadcast systems tend to work within managed networks. This provides us with some assurances such as minimal latency and ringfenced security, but some administration is still needed to allocate IP addresses, typically through systems such as DHCP.

AES67 is an open standard. It mandates a specific RTP payload format for delivering audio over IP as well as methods of exchanging parameters about the audio stream. However, it does not specify discovery or control.

Diagram 1 – Providing an open transport standard is often only the beginning, the whole solution includes discovery, control, and security.


Discovery is a complex service within the network that makes the whole configuration of the system dynamic and much easier to manage. DHCP (Dynamic Host Configuration Protocol) is used in IT systems to issue IP addresses to devices as they connect to the network and to reclaim them when the devices are removed. The system administrator provides the DHCP server with a range of IP addresses that it can allocate.
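The idea of issuing addresses from an administrator-defined range and taking them back can be sketched in a few lines of Python. This is a toy model only: it is not the DHCP protocol itself (there is no message exchange or lease timer), and the class name, device identifiers, and address range are all hypothetical.

```python
import ipaddress


class AddressPool:
    """Toy lease pool: issues addresses from a range and takes them back."""

    def __init__(self, first: str, last: str) -> None:
        start = int(ipaddress.IPv4Address(first))
        end = int(ipaddress.IPv4Address(last))
        self._free = [ipaddress.IPv4Address(n) for n in range(start, end + 1)]
        self._leases = {}  # device identifier -> leased address

    def issue(self, device_id: str) -> ipaddress.IPv4Address:
        # A device that already holds a lease gets the same address back.
        if device_id in self._leases:
            return self._leases[device_id]
        if not self._free:
            raise RuntimeError("address pool exhausted")
        address = self._free.pop(0)
        self._leases[device_id] = address
        return address

    def release(self, device_id: str) -> None:
        # Return the address to the pool when the device leaves the network.
        address = self._leases.pop(device_id, None)
        if address is not None:
            self._free.append(address)


# Hypothetical range supplied by the system administrator.
pool = AddressPool("192.168.10.100", "192.168.10.102")
mic = pool.issue("mic-presenter-a")
console = pool.issue("console-1")
pool.release("mic-presenter-a")  # the microphone is unplugged
print(mic, console)
```

Once the microphone's address is released it goes back into the pool and can be issued to another device.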

In audio over IP, the challenge of providing IP addresses is further complicated by the addition of multicasting. Not only does a management system have to issue IP addresses, but it must also manage the allocation of multicast addresses. Although there are hundreds of millions of addresses available within the IPv4 multicast range (224.0.0.0 to 239.255.255.255), downstream devices must know the multicast IP address to connect to the service.
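As a rough illustration of the address space involved, the following Python sketch uses the standard library to count the IPv4 multicast range (224.0.0.0/4) and to classify a few addresses. The function name and the test addresses are hypothetical. The 239.0.0.0/8 block is the administratively scoped part of the range, intended for use within an organization's own network.

```python
import ipaddress

# The whole IPv4 multicast range.
MULTICAST_RANGE = ipaddress.ip_network("224.0.0.0/4")
# Administratively scoped block, intended for use inside private networks.
ADMIN_SCOPED = ipaddress.ip_network("239.0.0.0/8")


def classify(address: str) -> str:
    """Say whether an IPv4 address falls inside the multicast range."""
    ip = ipaddress.ip_address(address)
    if ip in ADMIN_SCOPED:
        return "administratively scoped multicast"
    if ip in MULTICAST_RANGE:
        return "multicast"
    return "not multicast"


print(MULTICAST_RANGE.num_addresses)
print(classify("224.0.10.0"))
print(classify("239.1.2.3"))
print(classify("192.168.10.100"))
```

The size of the range is not the difficulty. The management task is making sure every receiver knows which of those addresses carries which stream.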

For example, if Presenter-A’s microphone uses multicast address 224.0.10.0, then a sound console requiring Presenter-A’s microphone stream must first know it exists, then know the format of the audio stream, and then indicate to the Ethernet switch that it requires a copy of the stream. This could all be administered manually with a spreadsheet keeping a record of the parameters, but even the smallest system soon runs into many hundreds, if not thousands, of active multicast streams in a network.
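The last step, asking the network for a copy of the stream, is typically done with IGMP, which the operating system sends on the application's behalf when it joins a multicast group. The Python sketch below shows one way a receiver might do this with the standard socket module. The function names are hypothetical, the port number is left to the caller, and the sketch is not drawn from any vendor's implementation.

```python
import socket
import struct


def membership_request(group: str, interface: str = "0.0.0.0") -> bytes:
    """Build the 8-byte request used with IP_ADD_MEMBERSHIP: group, then interface."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(interface))


def open_receiver(group: str, port: int) -> socket.socket:
    """Open a UDP socket and join the multicast group to receive the stream."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # Joining the group prompts the host to send an IGMP membership report,
    # which is how the switch learns that this port wants the stream.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    return sock


# Build (but do not send) the request for the example address in the text.
request = membership_request("224.0.10.0")
print(request.hex())
```

Calling open_receiver with the group address and the stream's port would perform the join on a real network. The receiver still has to learn the address and port from somewhere, which is the discovery problem.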

Although plug-n-play solutions deliver fantastic ease of use, interoperability with other formats such as MADI, AES3 and AES67 is still required. Plug-n-play systems allow manual configuration of the management software to accommodate other formats that don’t strictly adhere to their protocols.

Increased Flexibility

The upside of this type of configuration is that the facility becomes incredibly flexible, but the price we pay is significant complexity. Even if we assume a multicast source is known, either through an SDP (Session Description Protocol) file or because it is fixed, entering every multicast stream IP address into a device is an incredibly difficult task fraught with potential for error. This is further complicated when using assignable sound consoles where the parameter configuration may be buried deep in a menu structure.
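To show the kind of information an SDP file carries, here is a minimal Python sketch that pulls the stream address and audio format out of a small SDP description. The SDP text is a hypothetical example written for this sketch (it reuses the 224.0.10.0 address from the earlier example, with an assumed port and origin address), and the parser handles only the few line types shown, not the full SDP grammar.

```python
# Hypothetical SDP description for a stereo, 24-bit, 48 kHz stream.
SAMPLE_SDP = """\
v=0
o=- 1 1 IN IP4 192.168.10.100
s=Presenter-A microphone
c=IN IP4 224.0.10.0/32
t=0 0
m=audio 5004 RTP/AVP 96
a=rtpmap:96 L24/48000/2
a=ptime:1
"""


def parse_sdp(text: str) -> dict:
    """Extract the address, port, and audio format from a simple SDP description."""
    info = {}
    for line in text.splitlines():
        key, _, value = line.partition("=")
        if key == "c":
            # c=<network type> <address type> <address>[/<ttl>]
            info["address"] = value.split()[2].split("/")[0]
        elif key == "m":
            # m=<media> <port> <protocol> <payload type>
            info["port"] = int(value.split()[1])
        elif key == "a" and value.startswith("rtpmap:"):
            # a=rtpmap:<payload type> <encoding>/<clock rate>[/<channels>]
            encoding, rate, *channels = value.split()[1].split("/")
            info["encoding"] = encoding
            info["sample_rate_hz"] = int(rate)
            info["channels"] = int(channels[0]) if channels else 1
        elif key == "a" and value.startswith("ptime:"):
            # a=ptime:<milliseconds of audio per packet>
            info["packet_time_ms"] = float(value.split(":", 1)[1])
    return info


stream = parse_sdp(SAMPLE_SDP)
print(stream)
```

Reading one description is straightforward. The operational burden described above comes from keeping hundreds or thousands of these in step with the devices that need them.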

Even with this very simple example, we can see automated discovery and interoperability is a very difficult task to get right, and open standards bodies such as the AES seem to have stayed away from specifying it in the AES67 standard.

Interoperability Testing

One of the biggest challenges in achieving discovery and interoperability is testing many different vendors’ software implementations so they work together. Many vendors put massive amounts of time, energy and resource into making their products work, and they simply don’t have the time to release highly skilled R&D teams to go on interop days to see how reliably their equipment connects with a competing vendor’s. This is before we even start considering the potential for commercially sensitive source code to be exposed to multiple vendors.

This is one of the reasons vendors like to provide their own discovery and interoperability management systems. SDI, AES3 and MADI are all effectively transport layer distribution systems. Yes, we can connect an AES3 disk player to an AES3 sound console and we know it will work, but this is an extremely rigid way of working as the broadcaster is limited to a very small subset of audio standards. If we want to take full advantage of the flexibility and scalability IP offers, then as users we must make some compromises. For example, we should accept that vendors will provide vendor-specific solutions, especially as we move into automated management of networks. In fact, if we want reliable systems, we should encourage this.

Diagram 2 – sample SDP (Session Description Protocol) file showing the audio and IP parameters.


With AES67 at the transport level and vendor-specific management and control for discovery and security, we have the best of all worlds. It would certainly be possible for multiple vendors to get together to build open discovery, control and security protocols, but it would take an age to deliver such complexity.

Furthermore, today’s agile development demands that software versions be released quickly to deliver maximum functionality and features for users, and this would be almost impossible for multiple vendors jointly developing new management layers. It is possible, but very inefficient.

Delivering Security

Using vendor specific network management for audio over IP gives vendors much more freedom to deliver reliable and secure systems with a host of regular new feature sets. Software can be developed and tested within the confines of a well understood system, and bugs can be detected quickly and dealt with efficiently.

Also, standards tend to be designed by committees. Although these groups usually consist of dedicated and talented CTOs, each will want to move the specification toward their own point of view. This is not to suggest any CTO is better than another; it’s just that they have different ways of thinking and approach problems from different frames of reference. Consequently, open standards, if left unchecked, have the potential to balloon and fill with inefficient compromises to keep everybody around the table happy.

Committees are, however, very good at forming new well-defined standards. A typical example of this is the ST2110 and AES67 suite of specifications. There is just enough specified to make them efficient, but not so much that they become overweight and cumbersome.

Best of All Worlds

Solutions providers are constantly developing and supporting the critical discovery, routing and security tools needed by the broadcast industry. While “raw” interoperability standards such as AES67 can be configured manually and used in less complicated systems, this is not the intended use case and is unlikely to be sustainable or scalable as discussed earlier. Fortunately, most third-party developers of integrated and complete AoIP systems incorporate support for these standards, freeing broadcasters to choose the components they prefer while maintaining complete transport flexibility.

This operational method has been well established by vendors working with audio over IP for over twenty years. They seem to have reached a compromise where they provide all the advantages of discovery, management, control, and security, while simultaneously delivering interoperability and connectivity to licensed third-party devices. These partner-vendors have direct access to the R&D teams to help them provide fast and efficient solutions, as well as identifying any anomalies along the way. All to the benefit of the broadcaster.

In Part 3, we look at system management and network security. 
