Scalable Dynamic Software For Broadcasters: Part 10 - Monitoring Usage And Improving Efficiency

Operating a broadcast facility using microservices and containers may well deliver exceptional flexibility, scalability, and resilience. However, the hardware that the microservices architecture runs on will always have its limits, whether in terms of physical resource or cost. Monitoring not only improves our understanding of those limits but also helps us build more efficient infrastructures that make the best of the available resource and budget.

On-prem datacenters are much more flexible than the traditional broadcast workflows but are still limited by the amount of physical hardware available. The flexibility we speak of is brought about by the assumption that the datacenter can be built to accommodate the average workflows and then scale to public cloud providers to meet peak demand, or dynamically repurpose existing resource in their datacenter.

Therefore, there are two challenges system administrators must address when scaling workflows: when to scale, and by how much? Furthermore, we also need to know when a system is misbehaving or a fault may be developing. Both of these can be solved using intelligent monitoring.

Monitoring makes order out of apparent chaos. Whether measuring the voltage of a camera sensor or the loudness of an audio feed, the function of monitoring allows us to take a deeper look into the system to make sense of how it is operating. And this is particularly important for dynamic and highly scalable systems.

DevOps describes both a system of working and the people who carry out the functions. It’s like a bridge that joins the technology, architecture, and business operations all under one umbrella. DevOps encourages personal responsibility so that individuals can react quickly using agile methodologies while at the same time encouraging team collaboration to build and manage dynamic systems, especially for cloud, virtualized, and microservice architectures.

Deep monitoring helps DevOps understand how a system is performing so that they can both maintain reliability and understand which areas can be automated. Allowing virtualized, cloud, and microservice architectures to automatically scale up and down is key to building infrastructures that increase and decrease to meet the needs of the business. Consequently, monitoring must be built into the infrastructure from the ground up and not as an afterthought when the particular function has been designed and implemented.

Datacenter system administrators are used to monitoring metrics such as server uptime and storage capacity. A whole host of open-source tools such as Prometheus and Nagios provide good insight into how systems are performing in terms of CPU allocation, memory usage, and available storage. But to allow microservice architectures to make much more efficient use of the underlying hardware and available budget, we must go several levels deeper in terms of monitoring.

Fig 1 - Monitoring software can be run as a pod on a node within the microservice architecture. A large broadcast infrastructure may contain many monitoring agents distributed all over the world.

Maintaining Resource

One of the advantages of microservices is that we can enable the function on a need-to-use basis. Although this is achievable with virtualized servers, where virtual machine instances are spun up as required, the spin-up latency can run into several minutes, whereas the spin-up latency of the microservice is often less than a few seconds. Therefore, virtualized servers are often left running in the background unused, which in a highly optimized system is wasteful of resource.

Physical and virtualized servers form the nodes of the microservice architecture where the high-level monitoring takes place. As the node can run on a virtualized server, the configuration of the virtualization may form another level of resilience. For example, several physical machines could be grouped into clusters, with each cluster allocated to a node. This level of abstraction provides much greater resilience. However, the virtualization will need to be monitored to confirm the nodes are working within the limits of the available resource and that all servers are functioning correctly.

Although the node isn’t exactly a server, the mapping of a server to a node helps us understand the resource allocation. An IP port, server memory, CPUs, and even GPUs are assigned to the node, which itself must be monitored. The orchestration system will provide this, but at this level the DevOps team may start building their own monitoring systems to check the allocation. If a node starts running short of resource, such as memory, the DevOps team will need to be alerted so they can take the necessary action. The preference is for the orchestration and management software to perform this automatically and then inform the DevOps team that more nodes have been allocated.
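As a rough illustration of the kind of check a DevOps team might build at node level, the following Python sketch flags nodes whose memory use has crossed a threshold. The `NodeUsage` type, the node names, the figures, and the 85% threshold are all hypothetical; in practice the readings would come from the orchestration system or a monitoring agent.

```python
from dataclasses import dataclass


@dataclass
class NodeUsage:
    """Hypothetical snapshot of one node's memory, as reported by a monitoring agent."""
    name: str
    memory_used_mib: int
    memory_total_mib: int


def nodes_short_of_memory(nodes, threshold=0.85):
    """Return the names of nodes whose memory use meets or exceeds the threshold."""
    return [
        n.name
        for n in nodes
        if n.memory_used_mib / n.memory_total_mib >= threshold
    ]


# Illustrative readings only; real values would be collected from the nodes.
nodes = [
    NodeUsage("node-a", 28_000, 32_000),  # 87.5% used
    NodeUsage("node-b", 12_000, 32_000),  # 37.5% used
]

alerts = nodes_short_of_memory(nodes)
print(alerts)
```

The returned list could either feed an alert to the DevOps team or, preferably, trigger the orchestration software to allocate more nodes automatically.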

Containers are subcomponents of the nodes and have resource allocated to them from the node. Multiple containers will see the node resource shared between them, and this allocation will also need to be monitored. If a proc-amp microservice and container are running on the same node as a color corrector container and microservice, the resource allocation doesn’t necessarily need to be divided equally. It might be that the proc-amp needs more memory than the color corrector.
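The unequal split described above can be sketched as a simple weighted division. This is only an illustration: the `split_memory` function, the 3:1 weighting, and the 8192 MiB node size are assumptions, not figures from any real deployment.

```python
def split_memory(node_memory_mib, weights):
    """Divide a node's memory between containers in proportion to their weights.

    Integer division rounds each share down, so the shares may total
    slightly less than the node's memory.
    """
    total_weight = sum(weights.values())
    return {
        name: node_memory_mib * weight // total_weight
        for name, weight in weights.items()
    }


# Hypothetical weighting: the proc-amp is given three times the memory
# of the color corrector.
shares = split_memory(8192, {"proc-amp": 3, "color-corrector": 1})
print(shares)
```

Comparing each container's measured usage against its share is one way to spot when the weighting no longer reflects what the microservices actually need.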

The monitoring needed at this level will certainly require DevOps input. When the microservice was designed, its API would have been built to make a significant amount of monitoring data available.

Monitoring the whole resource allocation from the servers to the nodes and then down into the containers and microservices will provide the necessary insight to not only maintain reliability, but also scale to meet the microservices’ resource needs.

Message and Queue Monitoring

Messages and queues not only provide valuable data exchange between microservices and the orchestration and management systems, but also act as an indicator to assist scaling.

The number of jobs users create is often proportional to the amount of resource a microservice system will need. This demand doesn’t necessarily follow a fixed, deterministic pattern, which makes timed scheduling of resource difficult. Measuring the number of jobs in a queue provides the first indication of the total resource allocation needed.
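To show how queue depth might answer both "when to scale" and "by how much", here is a minimal Python sketch that converts the number of queued jobs into a replica count. The function name, the assumed capacity of ten jobs per replica, and the minimum and maximum limits are all hypothetical; the maximum stands in for whatever physical or budget ceiling the facility has.

```python
import math


def replicas_for_queue(queue_depth, jobs_per_replica, min_replicas=1, max_replicas=10):
    """Estimate how many microservice replicas a queue of jobs calls for.

    The result is clamped between min_replicas and max_replicas so the
    system never scales to zero or beyond the assumed ceiling.
    """
    wanted = math.ceil(queue_depth / jobs_per_replica)
    return max(min_replicas, min(max_replicas, wanted))


# Illustrative queue depths, assuming each replica can work through 10 jobs.
quiet = replicas_for_queue(0, 10)
busy = replicas_for_queue(45, 10)
peak = replicas_for_queue(500, 10)
print(quiet, busy, peak)
```

A production system would normally smooth the queue measurement over time before acting on it, so that a short burst of jobs does not cause replicas to be started and stopped repeatedly.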

Process commands within the microservice architecture are queued in buffers. This stops any important commands being lost if network congestion occurs or a server develops a fault. There may be tens, and potentially hundreds, of message queues around the microservice architecture, all providing insight into how the system is performing.

One of the challenges DevOps teams have with monitoring is not only knowing what to monitor, but also knowing what not to monitor. Retrieving potentially millions of metrics from datacenters all over the world, and saving them to logs and traces, is a whole job in itself. The microservices themselves may well have logs and these need to be stored. But a broadcaster cannot store every metric within the monitoring system as they run the risk of impacting the way the system is operating due to the excessive workload on the servers and traffic on the networks.
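One simple way to decide what not to monitor is to keep only the metrics whose names match an agreed list of prefixes and discard the rest before they reach storage. The sketch below assumes metrics arrive as a name-to-value mapping; the metric names and prefixes are invented for illustration.

```python
def select_metrics(samples, keep_prefixes):
    """Keep only the metrics whose names start with one of the given prefixes."""
    prefixes = tuple(keep_prefixes)
    return {name: value for name, value in samples.items() if name.startswith(prefixes)}


# Hypothetical metric names; a real system would define its own naming scheme.
samples = {
    "queue.transcode.depth": 45,
    "node.node-a.memory_used_mib": 28_000,
    "debug.heartbeat.count": 1_204_332,
}

stored = select_metrics(samples, ["queue.", "node."])
print(stored)
```

Filtering close to the source reduces both the storage required and the network traffic generated by the monitoring system itself.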

Metrics, logs, and traces need to be stored so that the broadcaster can conduct some form of forensic analysis. This is not only to find out what went wrong should a failure arise, but also to provide evidence should there be any litigation from third party media owners, especially when considering security and cyber theft.

It’s worth remembering that monitoring isn’t new to broadcasting as waveform monitors, vectorscopes, and audio meters are the bread and butter of every broadcast infrastructure. However, what is new is the need to provide higher levels of resilience and scalability within the microservice architecture, especially when reliability and forensic audits are considered. And this is built into the microservice apps and management systems from day one.

Every broadcast facility has its limits, even when hybrid on-prem and public cloud infrastructures are adopted. In such a case the limits might not be physical infrastructure as the cloud will meet the needs of any peak demand, but there will be limits in terms of budgets and how much can be spent on scaling. As well as keeping systems reliable, monitoring also helps broadcasters maintain the efficiency of their microservice infrastructures. 
