An Introduction To Network Observability

The more complex and intricate IP networks and cloud infrastructures become, the greater the potential for unwelcome dynamics in the system, and the greater the need for rich, reliable, real-time data about performance and error rates.


This article was first published as part of Essential Guide: Network Observability.

The idea of network observability is deceptively simple: having detailed insight into how data is flowing across an entire IP network and cloud infrastructure significantly increases the chances of understanding the general system health of the network. It improves the chances of becoming aware of impending disruption, or of failures as they happen, and of identifying the root causes of issues and resolving them quickly. It also has the potential to enable users to analyze predicted future network performance so that they can design and deploy systems and projects much more effectively.

In practice, network observability is both a philosophy and a suite of software tools that monitor various aspects of a network for performance and errors, and provide dashboards and alerts.

Network observability sits alongside existing content monitoring systems. The idea of observability is to implement tools that focus on the networks carrying video, audio and data flows. A healthy network is far more likely to deliver healthy video, audio and data, but engineers still need to be able to examine actual content to verify quality and compliance throughout their infrastructure, using traditional monitoring tools.

Network observability should also not be confused with network orchestration. Network orchestration tools are there to help users design, plan, spin up, manage in real time, and spin down their network. Most orchestration systems will also indicate whether the various elements of the network (switches, routers etc) are up and running when in use. The idea of network observability is to look at the health of the data as it is traversing the network.

The Key Challenge

Within the confines of a facility production network, broadcasters are in control at all times. They specify, deploy and configure their network to fit specific requirements. The network configuration is relatively fixed and their streams are the only traffic competing for bandwidth.

Once broadcasters step outside of that comfort zone to live streams traversing the public internet, telcos, CDNs and cloud services, they are no longer in control. They cross over into shared network infrastructure which is controlled by telcos, CDNs and cloud service providers.

These systems were primarily conceived for industrial IT services and are managed by IT teams whose SLAs are very different from what is required for video. In IT, latency and jitter are not key factors, and if packets go missing they can simply be sent again. With a video stream, missed packets mean black frames, and latency and jitter are toxic. The challenge is simply stated: 5% packet retransmission is acceptable in IT; with video it is fatal.

Shared Systems Are Dynamic

In simplified terms, with live broadcast production that leverages remote contribution, broadcasters have ingress points in the form of sports stadia, live events or remotely located studios. They have egress points in the form of Network Operations Centres or teams operating across multiple facilities. Some form of transport (MPLS, the internet etc) moves streams between these data access points, and there is a two-way data path between them. Increasingly, cloud-based services are also part of broadcast production systems and contribution infrastructure, with streams being passed to and from the cloud and between cloud-based data centers.

The various elements of this network infrastructure are sourced from different providers, and the infrastructure is often carrying other data streams which are themselves variable in nature. Within shared network infrastructure the routes taken, and therefore the number of hops, are dynamically managed. All of which means broadcasters live with a managed risk of packet loss, data flows can become bursty, and latency and jitter are potentially variable. When problems arise in such dynamic infrastructure, built from a chain of service providers, it can be challenging and time consuming to identify the cause.

The traditional approach to keeping control of backhaul networks is to use MPLS lines, but this can be costly, hence the increasing use of managed service providers seeking to provide lower cost controlled paths across the internet. Often the outcome is a complex combination of services to achieve primary and backup systems, which can be difficult to analyze and configure for best performance. ARQ-based protocols like RIST and SRT certainly help bring more predictability and control to this fundamentally dynamic network environment, but they need to be configured and optimized for the prevailing network infrastructure. When it comes to cloud services, part of the attraction is their scalability and flexibility, but with that comes the potential for dynamic changes in the host environment.
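To make that configuration point concrete, the sketch below shows one common way of reasoning about an ARQ receive buffer such as SRT's latency setting: size it as a multiple of the measured round-trip time, with a larger multiple on lossier links so there is time for repeated retransmission attempts. The specific multipliers and the 120 ms floor are assumptions chosen for this example, not figures taken from this article or from any protocol specification; real deployments should be tuned against the vendor's guidance and tested on the actual path.

```python
# Illustrative only: a rough sizing of an SRT-style receive latency from
# measured network conditions. The multipliers and floor value below are
# assumptions for the sketch, not vendor-recommended figures.

def suggested_latency_ms(rtt_ms: float, loss_percent: float) -> int:
    """Return a candidate receive-buffer latency in milliseconds.

    The buffer must allow several retransmission round trips; lossier
    links need more headroom because a packet may need to be
    re-requested more than once.
    """
    if loss_percent < 1:
        multiplier = 3      # near-clean link: a few RTTs of headroom
    elif loss_percent < 3:
        multiplier = 6
    elif loss_percent < 7:
        multiplier = 12
    else:
        multiplier = 20     # very lossy link: generous headroom

    return max(120, int(rtt_ms * multiplier))   # 120 ms assumed floor


if __name__ == "__main__":
    # Example: 45 ms round-trip time with roughly 2% packet loss
    print(suggested_latency_ms(45, 2.0))        # -> 270
```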

So how do broadcasters combat so many variables to create network infrastructure that meets the performance requirements for live video, audio and data transport across countries and continents?

Plan, Test, Tweak & Fingers Crossed

The most obvious answer to this challenge is careful planning and preparatory testing. Broadcast engineering teams spend months planning and preparing for major events, rigorously defining production requirements and the network infrastructure and bandwidth that will be required to meet them. Equipment and people are sent on site in advance to install systems and test the capacity, performance and resilience of the network infrastructure. This is essentially an offline philosophy. Engineers plan, test and tweak to get their network performing as they want it. They do their best to predict and avoid any dynamic influences. They build in redundancy to combat the unexpected… and then on the day, hope it all performs as it did in rehearsal. The managed risk of going live on game day has always been part of life in broadcast, but it can be very stressful. It seems obvious, therefore, that the more information engineers have about their network, the better their chances of maintaining QoE objectives, and the less stressful their lives become.

What Is Being Observed?

The simplest network observation tool there is, the ubiquitous Speedtest, measures the network bandwidth at a specific location, on a specific network connection, at a given time. Speedtest is TCP based, and measures the average time taken to deliver a given amount of data to a public test server. Run it repeatedly and the result will fluctuate. It doesn’t give information about the type of data, the route it took or what happened to it along the way. It’s useful consumer level information.

Stepping up one level is the trusty open source IPERF network performance testing tool, which is run from the command line. It sets up a client and server at either end of a network connection, establishes a data stream (TCP or UDP) to test throughput, and delivers a report. For those less confident with the command line, JPERF provides a graphical front end for IPERF. Again, this doesn’t tell you much about the route taken or how the network performed along the way.
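As a small illustration of how such a test can be folded into an automated observability workflow, the Python sketch below drives an iperf3 UDP test and pulls out the statistics that matter most for video transport: throughput, jitter and packet loss. It assumes iperf3 is installed and that a server is already listening at the hypothetical address iperf.example.net; exact JSON field names can vary between iperf3 versions, so treat the parsing as illustrative rather than definitive.

```python
# A minimal sketch: run an iperf3 UDP test and extract throughput, jitter
# and packet loss from its JSON report. The server address is hypothetical
# and the JSON field names may differ between iperf3 versions.

import json
import subprocess

SERVER = "iperf.example.net"   # hypothetical test server

def run_udp_test(bitrate: str = "10M", duration: int = 10) -> dict:
    """Run a UDP throughput test and return the summary statistics."""
    cmd = [
        "iperf3", "-c", SERVER,
        "-u",                  # UDP test
        "-b", bitrate,         # target bitrate
        "-t", str(duration),   # test duration in seconds
        "--json",              # machine-readable output
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    report = json.loads(result.stdout)
    summary = report["end"]["sum"]
    return {
        "mbps": summary["bits_per_second"] / 1e6,
        "jitter_ms": summary["jitter_ms"],
        "lost_percent": summary["lost_percent"],
    }

if __name__ == "__main__":
    stats = run_udp_test()
    print(f"{stats['mbps']:.1f} Mb/s, "
          f"jitter {stats['jitter_ms']:.2f} ms, "
          f"loss {stats['lost_percent']:.2f}%")
```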

Of course, there are now a great many commercially available software tools and services which build upon this conceptual foundation of observing the performance of a network. As they grow in sophistication and scale, such tools introduce much more powerful feature sets. It isn’t difficult to find tools which allow engineers to specify different routes, to measure the comparative performance of different network configurations, and to compare services. Much as one might expect from traditional broadcast monitoring tools, most of these IT network observability tools include the capacity to configure alarms that notify administrators of problems. With hundreds, and often thousands, of streams to be observed and managed, these systems also bring streamlined dashboards and reports, and are frequently delivered on a SaaS model.

Most of these network observability tools, however, come from the world of IT where, as we have said, the focus is very different. IT tools tend to be more focused on the hardware of the network, the switches and routers, and IT is more concerned with overall point-to-point performance. As long as the data as a whole gets there within a given SLA window, all is well.

The focus in broadcast is the integrity of live video, audio and data, but this is intrinsically linked to the network infrastructure it is moving across. What broadcasters really need to be able to observe for live video is error rates as content passes through the network. Is any packet loss occurring? Is latency fluctuating? Is jitter occurring as the timing of packets fluctuates? All of these things can degrade broadcast video, audio and data streams and diminish QoE. If engineers are evaluating different network configurations, different buffer settings, different routes, alternative service providers or combinations of these things, or deciding which data centers to locate cloud services in, what is needed is the ability to quickly evaluate where in the network any disruption is occurring, and what kind of disruption it is.
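Jitter is the least intuitive of these measurements, so a minimal sketch may help. The Python class below implements the running interarrival jitter estimator described in RFC 3550 (the RTP specification), which is the kind of per-stream calculation an observability probe performs. In a real RTP receiver the send times would come from RTP timestamps; here both timestamps are simply floats in seconds for clarity.

```python
# A minimal sketch of a per-stream jitter measurement: the running
# interarrival jitter estimator described in RFC 3550. Timestamps are
# plain seconds here rather than RTP timestamp units.

class JitterEstimator:
    """Running estimate of interarrival jitter, per RFC 3550."""

    def __init__(self) -> None:
        self.jitter = 0.0
        self._prev_transit = None

    def update(self, send_time: float, recv_time: float) -> float:
        """Feed one packet's timestamps; return the current jitter (seconds)."""
        transit = recv_time - send_time
        if self._prev_transit is not None:
            d = abs(transit - self._prev_transit)
            # Exponential smoothing with gain 1/16, as in RFC 3550
            self.jitter += (d - self.jitter) / 16.0
        self._prev_transit = transit
        return self.jitter


if __name__ == "__main__":
    est = JitterEstimator()
    # Packets sent every 20 ms, arriving with small timing variations
    arrivals = [0.021, 0.043, 0.062, 0.085, 0.104]
    for i, recv in enumerate(arrivals):
        est.update(send_time=i * 0.020, recv_time=recv)
    print(f"jitter ~ {est.jitter * 1000:.2f} ms")
```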

Being able to measure data throughput and error rates across every hop of the network, throughout the chain of service providers, gives an opportunity to identify precisely where in the network data errors are occurring, helping to troubleshoot the root cause much more efficiently and, hopefully, to optimize network performance. It speeds up network planning and testing significantly and provides insight into the best system configuration.
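The logic of that localization step is simple enough to sketch. The example below assumes per-hop loss statistics have already been gathered by an mtr/traceroute-style probe; the hop data and the 1% threshold are invented purely for illustration. The point is that once loss is measured hop by hop, the first hop where it appears identifies the segment, and therefore the provider, to investigate.

```python
# A conceptual sketch of per-hop fault localization. The hop statistics
# below are invented for illustration; in practice they would come from
# a traceroute/mtr-style probe run along the contribution path.

# (hop number, provider/segment name, packet loss percent)
HOPS = [
    (1, "venue LAN",          0.0),
    (2, "access provider",    0.0),
    (3, "transit provider A", 0.1),
    (4, "transit provider B", 4.8),   # loss first appears here
    (5, "cloud ingress",      5.0),
]

LOSS_THRESHOLD = 1.0  # percent; assumed alerting threshold for the example

def first_lossy_hop(hops, threshold=LOSS_THRESHOLD):
    """Return the first hop whose loss exceeds the threshold, or None."""
    for hop, segment, loss in hops:
        if loss >= threshold:
            return hop, segment, loss
    return None

if __name__ == "__main__":
    culprit = first_lossy_hop(HOPS)
    if culprit:
        hop, segment, loss = culprit
        print(f"Loss first exceeds {LOSS_THRESHOLD}% at hop {hop} ({segment}): {loss}%")
```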

For the transport of live video, broadcasters need to observe the throughput and error rates across their entire network in real time during show time. If the route taken by a stream changes, or services are switched from one data center to another, users need to be able to monitor the new configuration. This is where the concept of real-time broadcast network observability becomes incredibly powerful, and why it seems inevitable that it will become a key tool for the industry. Unexpected peaks in network traffic caused by an increase in retransmission requests can trigger an alarm indicating an impending service disruption before it becomes critical. Knowing precisely where this is happening across an ecosystem of providers and services could allow engineers to take evasive action and maintain quality of service.
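A minimal sketch of such an early-warning check is shown below: it watches the retransmission (ARQ request) rate over a short rolling window and flags the stream when the rate climbs above a threshold, well before the link actually fails. The ten-sample window and 2% threshold are assumptions chosen for the example, not recommended operational values.

```python
# A minimal sketch of an early-warning check: track the retransmission
# rate over a rolling window of per-interval counters and flag the stream
# when the rate exceeds a threshold. Window size and threshold are
# assumptions for the example.

from collections import deque

class RetransmissionAlert:
    def __init__(self, window: int = 10, threshold_percent: float = 2.0):
        self.window = deque(maxlen=window)   # last N one-second samples
        self.threshold = threshold_percent

    def sample(self, packets_sent: int, retransmissions: int) -> bool:
        """Record one interval's counters; return True if an alert should fire."""
        self.window.append((packets_sent, retransmissions))
        sent = sum(s for s, _ in self.window)
        resent = sum(r for _, r in self.window)
        rate = 100.0 * resent / sent if sent else 0.0
        return rate >= self.threshold

if __name__ == "__main__":
    alert = RetransmissionAlert()
    # A stream that starts clean and then begins asking for retransmissions
    for sent, resent in [(1000, 2), (1000, 3), (1000, 10), (1000, 45), (1000, 60)]:
        if alert.sample(sent, resent):
            print("Warning: retransmission rate rising; investigate this path")
```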

Knowledge is power. Network observability should give broadcasters a suite of tools designed to provide the real-time information needed to troubleshoot extremely quickly within a multi-vendor ecosystem, and to evaluate the performance and error rates of an entire network, so that they can make highly informed decisions and get the best performance and value from their network infrastructure.
