Monitoring & Compliance In Broadcast: Monitoring Cloud Infrastructure

If we take cloud infrastructures to their extreme, that is, where their physical location is unknown to us, then monitoring them becomes a whole new ball game, especially as dispersed teams use them for production.
The definition of cloud computing is a little fluid and often open to interpretation. For the purposes of broadcasting, cloud infrastructures can exist within the broadcaster’s premises or in a remote location that may or may not be known. Even if a cloud infrastructure is on-prem, when we speak of cloud computing we’re talking about a resource that cannot be touched or physically accessed; consequently, we need new methods of monitoring these systems.
On- & Off-Prem
Some might argue that an on-prem cloud infrastructure can be touched and prodded, which is essentially true; however, this mode of thinking doesn’t take into consideration the spirit of cloud computing. If the infrastructure has been adequately designed, then the available resource can be scaled to off-prem systems, thus embracing the full potential of cloud infrastructures. If a broadcaster is never going to want to scale, then a static infrastructure will suffice, which negates the need for a dynamic system, so they’re probably best staying away from cloud systems completely.
In traditional SDI/AES type broadcast infrastructures we have focused on monitoring video and audio signal parameters such as video and audio levels. The underlying transport stream wasn’t really a concern as it generally consisted of wire and distribution amplifiers. If you were unlucky, a cable or DA could go intermittent, but with a bit of logical thinking and well-established working practices these issues could generally be fixed relatively quickly. The same cannot be said of cloud infrastructures.
As broadcasters have adopted IP more and more, it has become apparent that we can no longer take the transport stream for granted. Although IP provides incredible flexibility, it does so at the expense of complexity. A network, in computing terms, can be thought of as the transport stream, but as a broadcaster’s core ambition is to generate continuous and bandwidth-thirsty data streams, we suddenly find ourselves looking slightly higher up the resource hierarchy to take into consideration servers and storage.
Media Is Synchronous
Media signals are synchronous by nature, hence the reason we have standards such as SDI and AES, which contain embedded clocks that align the sending and receiving devices. Not only does this significantly reduce latency, but it also guarantees bandwidth and data throughput. Deep inside standards converters and other video processing equipment it is quite likely that the video data is processed asynchronously, especially if memory DIMMs such as those found in PCs are employed, as they burst data to store and retrieve it from the memory chips.
Compute resource within servers certainly does not respond linearly to demands for processing. Its response times are almost random in nature, and systems that stream continuous data employ buffers to impose queuing methods that hold back data until the processor becomes available. Optimizing these buffers becomes as much of an art as a science: if the buffer is too long then unacceptable latencies can occur, and if it’s too short then video and audio packets will be lost, resulting in unacceptable picture and sound disturbance.
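That trade-off can be made concrete with a toy model. The sketch below is purely illustrative and not a real media pipeline: packets arrive at a fixed cadence, the processor’s service time is random, and a bounded buffer absorbs the jitter. Shrinking the buffer loses packets; growing it adds latency. All numbers are invented for the example.

```python
import random
from collections import deque

def simulate(buffer_len, n_packets=20000, interval_ms=1.0, seed=42):
    """Toy model: packets arrive every interval_ms; service time is random.
    A bounded FIFO buffer absorbs the jitter; overflow means a dropped packet."""
    rng = random.Random(seed)
    queue = deque()
    server_free_at = 0.0          # time the processor next becomes available
    dropped = 0
    latencies = []

    for i in range(n_packets):
        now = i * interval_ms                     # synchronous arrival
        # Drain any packets the processor can finish before 'now'
        while queue and server_free_at <= now:
            arrived = queue.popleft()
            start = max(server_free_at, arrived)
            service = rng.expovariate(1.0 / 0.8)  # mean 0.8 ms, highly variable
            server_free_at = start + service
            latencies.append(server_free_at - arrived)
        if len(queue) >= buffer_len:
            dropped += 1                          # buffer too short: packet lost
        else:
            queue.append(now)

    return dropped / n_packets, sum(latencies) / max(len(latencies), 1)

for size in (2, 8, 32, 128):
    loss, latency = simulate(size)
    print(f"buffer={size:4d}  loss={loss:6.2%}  mean latency={latency:6.2f} ms")
```

Running it shows the art-versus-science balance the paragraph describes: the smallest buffers lose packets during bursts of slow processing, while the largest buffers trade that loss for extra queuing delay.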
As we climb the transport stream hierarchy, it becomes apparent that the servers must be included in our reliability measurements if we are to be certain that the video and audio streams have an error-free path. This is especially true when we don’t have any control over the physical hardware, as is the case with cloud infrastructures.
Building Software Toolkits
There are tools within the open-source community that go some way to helping with this, especially on Linux-type systems. Htop is an open-source application that should be at the forefront of every engineer’s software solution toolkit. It is a very simple and reliable application that provides a view of how the resource is being used within a server. It lists the number of processors, the amount of memory available, and which applications are using the processors and memory. It gives us a helicopter view of how the server is coping with the users’ demands, whether the server is physical hardware or virtualized in a cloud infrastructure.
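For a scripted equivalent of that helicopter view, for example to log resource headroom from an unattended job, something like the following sketch can be used. It is an assumption on my part rather than part of the article’s toolkit, and it relies on the third-party psutil package (pip install psutil).

```python
# A minimal sketch of htop's "helicopter view" in script form.
# Assumes the third-party psutil package (pip install psutil) is installed.
import psutil

per_cpu = psutil.cpu_percent(interval=1, percpu=True)   # % busy per core over 1s
mem = psutil.virtual_memory()

print(f"cores: {len(per_cpu)}, busiest core: {max(per_cpu):.0f}%, "
      f"average: {sum(per_cpu) / len(per_cpu):.0f}%")
print(f"memory: {mem.used / 2**30:.1f} GiB used of {mem.total / 2**30:.1f} GiB "
      f"({mem.percent:.0f}%)")

# Top five processes by CPU, roughly what htop sorts by default.
# The first sample of cpu_percent can read 0.0; a periodic logger settles quickly.
procs = [(p.info['cpu_percent'] or 0.0, p.info['name'] or '?')
         for p in psutil.process_iter(['name', 'cpu_percent'])]
for cpu, name in sorted(procs, reverse=True)[:5]:
    print(f"{cpu:5.1f}%  {name}")
```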
If htop is showing that all thirty-two of its processors are maxed out, then the server is clearly under incredibly high load, but this doesn’t necessarily mean there is an issue. Whether there is a problem or not depends on the application and the context of how the server is being used. If it’s a file-based application, such as transcoding a list of VOD files, then the high usage will just result in delayed processing. This may be important if delivery times are part of a broadcaster’s SLA, but it’s unlikely there will be any data loss, even under heavy usage. The server will just slow down when reading from and writing to the file storage.
The same cannot be said of the real-time video and audio processing found within studios or playout systems. With such applications, the video and audio streams will be relentlessly streamed to the server regardless of whether it is ready or not. So, when we look at the results of htop, knowing whether the server is processing file-based or live media is very important.
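On a Linux server, one low-cost way of spotting that a live stream is arriving faster than the application can drain it is to watch the kernel’s own UDP drop counters. The sketch below is my own suggestion rather than anything from the article: it samples /proc/net/snmp twice and reports how much the RcvbufErrors counter has climbed, which indicates datagrams discarded because a socket receive buffer was full. Field names can vary slightly between kernel versions.

```python
# Hedged sketch: read the kernel's UDP counters on a Linux host.
# RcvbufErrors climbing while a live stream is running usually means the
# receiving application cannot drain its socket buffer fast enough.
import time

def udp_counters(path="/proc/net/snmp"):
    with open(path) as f:
        lines = [line.split() for line in f if line.startswith("Udp:")]
    header, values = lines[0][1:], lines[1][1:]   # first line is the header row
    return dict(zip(header, (int(v) for v in values)))

before = udp_counters()
time.sleep(10)                                    # sample over ten seconds
after = udp_counters()
for key in ("InDatagrams", "InErrors", "RcvbufErrors"):
    delta = after.get(key, 0) - before.get(key, 0)
    print(f"{key}: +{delta} in 10s")
```

Because it only reads counters the kernel is already keeping, this kind of check adds almost no load of its own, which matters for the reasons discussed next.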
Monitoring Influencing Results
Another indicator of processing reliability is packet loss. If a media stream is losing packets inside the server, then there is something wrong. However, when monitoring servers, especially those found in off-prem cloud infrastructures, we must rely on the host processor to run the monitoring tools. If a server is suspected of losing media packets, then the go-to solution is the open-source application Wireshark, or its smaller command-line version, tshark. These run on the server and record packets to pcap files for later analysis. It’s possible to view the incoming streams in near-real-time on the server, but doing so highlights another problem: the monitoring software may well interfere with, and influence, the ability of the server to stream the media.
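A headless capture of the kind described might look like the sketch below, which simply drives tshark from a script and writes a pcap file for offline analysis. The interface name, port number, duration and output path are all placeholders to be adjusted for the system in question.

```python
# Minimal sketch of a headless capture with tshark (no GUI, lower overhead).
# Interface, filter, duration and output path are placeholders.
import subprocess

cmd = [
    "tshark",
    "-i", "eth0",                     # capture interface (placeholder)
    "-f", "udp port 5004",            # BPF capture filter: only the media stream
    "-a", "duration:30",              # stop automatically after 30 seconds
    "-w", "/tmp/media-capture.pcap",  # write raw packets for later analysis
]
subprocess.run(cmd, check=True)
```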
If the network feeding a server is heavily utilized, this will already place a high demand on the server; adding a monitoring application such as Wireshark, with its GUI, puts further load on it, potentially resulting in the server overloading and dropping packets. The very application that we’re using to provide a monitoring solution could heavily impact the reliability of the system.
Life becomes even more interesting when we think about how we’re going to access the cloud servers in the first place. SSH’ing in with command-line access is relatively trivial in terms of compute demand. But if we start RDP’ing (Remote Desktop Protocol) into the servers to get the full GUI experience, as would be needed for near-real-time monitoring and analysis with Wireshark, then this could in itself have a serious impact on the processing ability of the server, potentially stopping it from processing streamed video and audio media.
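One way of keeping the heavy lifting off the server, sketched below as an assumption rather than a prescription, is to run the capture remotely over SSH and stream the pcap back to the operator’s workstation, where Wireshark’s GUI analysis can happen offline. The hostname, interface and filter are placeholders, and the remote account is assumed to have capture privileges.

```python
# Hedged sketch: capture remotely over SSH, analyze locally and offline.
# "mediasrv01", the interface and the filter are placeholders.
import subprocess

remote = "tshark -i eth0 -f 'udp port 5004' -a duration:30 -w -"  # -w - writes to stdout
with open("remote-capture.pcap", "wb") as out:
    subprocess.run(["ssh", "mediasrv01", remote], stdout=out, check=True)
# The resulting file can then be opened in Wireshark on the operator's machine.
```

The only load left on the server is the capture itself; the GUI, dissection and filtering all run somewhere else.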
Question The Results
This is especially true within cloud infrastructures, as we have no other way of finding out what is going on inside the box, if server boxes even exist in the cloud. They might not: cloud servers could be a massive array of motherboards in hermetically sealed rooms. Who knows? The point is, we shouldn’t care. At least, that’s what the book tells us: cloud infrastructures, whether on- or off-prem, should be treated as a massive array of compute resource that can be dynamically switched on and off to meet the demands of the broadcaster. The reality is quite different. We need to have a deep understanding of how the cloud infrastructure has been configured and how it is being operated. Is it just a VM farm? Are there microservices involved? Has the vendor done a proper job of virtualization, or have they just resorted to lift-and-shift?
Monitoring any remote system is difficult, but the challenges escalate when the very monitoring applications that we are relying on have the potential not only to influence the results but, even worse, to damage the video and audio streams being monitored. The good news is that every broadcast engineer learns very early in their career to always question their monitoring equipment; the same is true of cloud infrastructures, whether on- or off-prem.