Machine Learning (ML) For Broadcasters: Part 5 - Datasets And GPUs

In the final article in this series, we look at datasets, their importance and why GPUs are critical for machine learning.


In the previous article in this series, we learned that forward propagation is the process in machine learning that facilitates prediction and classification. In mathematical and computational terms this is a relatively straightforward process, albeit highly recursive and resource hungry. However, the learning process requires backward propagation, which uses complex mathematical functions to find the global minimum of the model's error (or loss) function. It is this process that is computationally demanding and benefits greatly from GPU acceleration.
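To make the two processes concrete, the sketch below uses PyTorch (discussed later in this article) to run one forward pass, measure the error, and back-propagate gradients towards a minimum. The layer sizes, learning rate and random data are illustrative assumptions only, not values taken from any real broadcast model.

```python
# A minimal, illustrative PyTorch training step (all sizes and data are assumptions).
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

x = torch.randn(32, 16)          # a batch of example inputs
y = torch.randn(32, 1)           # their target values

prediction = model(x)            # forward propagation: make a prediction
loss = loss_fn(prediction, y)    # measure how wrong the prediction is
loss.backward()                  # backward propagation: compute the gradients
optimizer.step()                 # nudge the weights towards a minimum of the loss
optimizer.zero_grad()            # clear the gradients ready for the next step
```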

In machine learning, we do not use the GPU to render images; instead we use its hardware-accelerated mathematical functions and high-speed memory to provide forward and backward propagation. Critically, these GPU processes rely on dividing an array of data into smaller sub-arrays that match the GPU's memory map, and then providing a processing thread for each sub-array. In effect, one processing unit is associated with each sub-array, allowing thousands of computations to take place simultaneously.

In an image, an array of 1920 x 1080 pixels may be split into 8 x 8 blocks to give 240 x 135 sub-arrays. Each of these would have a processing unit associated with it, allowing 32,400 simultaneous parallel processes. If we substitute the pixels in an image for neurons in a neural network, then thousands of neurons can be processed in parallel with their associated data.
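The arithmetic of that decomposition can be sketched in a few lines of Python. NumPy is assumed here purely for illustration; on real hardware the mapping of tiles to processing units is handled by the GPU driver, not by user code.

```python
# Illustrative sketch of splitting an HD frame into 8 x 8 tiles.
import numpy as np

frame = np.zeros((1080, 1920), dtype=np.uint8)   # one plane of an HD frame
tile = 8                                          # 8 x 8 sub-array size

tiles_x = frame.shape[1] // tile                  # 1920 / 8 = 240
tiles_y = frame.shape[0] // tile                  # 1080 / 8 = 135
print(tiles_x * tiles_y)                          # 32400 independent tiles

# Reshape the frame into its 32,400 separate 8 x 8 tiles, each of which could
# be handed to its own processing unit and computed in parallel.
tiles = frame.reshape(tiles_y, tile, tiles_x, tile).swapaxes(1, 2)
print(tiles.shape)                                # (135, 240, 8, 8)
```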

Figure 1 – GPUs are used to accelerate machine learning as the thousands of processing cores, each with associated memory, allow for massive parallel processing.


The GPU functionality is abstracted away from the hardware using libraries such as NVIDIA’s CUDA. NVIDIA provide both the hardware and the software, so they are able to tune the two closely together, leading to highly efficient massive parallel processing. The CUDA library is a generic solution that facilitates all kinds of parallel processing, from the high-performance computing found in finance to the image processing found in medical and broadcast applications.

A further software abstraction takes place using machine learning libraries to provide the necessary models. PyTorch and Keras are two such libraries, and they deliver convenient interfaces to many of the models needed for machine learning.

A data scientist working to build machine learning solutions spends most of their time preparing the dataset to meet the needs of the PyTorch and Keras models. This allows models such as LSTMs or CNNs to be standardized, enabling the data scientist to configure the model rather than design it from the ground up. Furthermore, the libraries provide convenient methods for transferring and processing the data in the GPU.
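As an illustration of how little code that transfer takes, the sketch below moves a tiny, arbitrary PyTorch model and a batch of random data onto the GPU with a single call each. The network layout, image size and two-class output are assumptions chosen only to keep the example short.

```python
# A minimal sketch of how PyTorch hides the GPU transfer (model and data are illustrative).
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # a small, standardised CNN layer
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 2),                             # e.g. a two-class (pass/fail) output
).to(device)                                      # one call moves the model to the GPU

frames = torch.randn(8, 3, 224, 224).to(device)   # a batch of placeholder images
scores = model(frames)                            # forward propagation runs on the GPU
```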

As alluded to in previous articles, datasets are incredibly important, especially when they are labelled by humans, as this presents another challenge: data bias. Humans making decisions in the present are really making decisions based on their previous experiences. This may sound controversial, but if we accept that we are a product of our experiences then the observation does make sense. If two people witness an incident, they usually recall it with slightly different detail.

Our brain is constantly being bombarded with millions of bits of information from our senses every minute of every day, and it cannot hope to process it all simultaneously. Instead, we filter out much of the information and process only the data we need. The information we filter out is based on our past experiences, which are different for everybody. Once again we risk disappearing down a philosophical rabbit hole, but to reinforce this idea, watch the famous Simons and Chabris selective attention tests on YouTube. You’ll understand my point when you’ve watched them.

Figure 2 – Fifteen samples of a dataset of TCP/IP flows; these could just as easily be video or audio samples.


Machine learning relies almost entirely on accurately labelled datasets, but if the labels are wrong, then the whole model is wrong, and we are presented with incorrect or even biased outcomes. In television, we have the opportunity to have many of our datasets labelled by industry professionals. For example, somebody working in subjective QC will be able to label many hours of video as either pass or fail. But how do we know they were correct?
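To show where those human verdicts end up, here is a hedged sketch of how pass/fail QC labels might be packaged as a PyTorch dataset ready for training. The class name, feature vectors and labels are all invented for illustration; a real system would load measurements extracted from the clips themselves.

```python
# Illustrative only: human QC verdicts (1 = pass, 0 = fail) paired with per-clip features.
import torch
from torch.utils.data import Dataset

class QCLabelledClips(Dataset):
    """Pairs each clip's features with the QC professional's verdict."""
    def __init__(self, samples):
        self.samples = samples                      # list of (features, label) pairs

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        features, label = self.samples[idx]
        return features, torch.tensor(label)

# Placeholder data standing in for real per-clip measurements and labels.
dataset = QCLabelledClips([(torch.randn(16), 1), (torch.randn(16), 0)])
```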

Key to overcoming bias in data classification is first of all being aware of the phenomenon. Any engineer or technologist learns early in their career to question everything and validate their assumptions. The same is true in data classification. Furthermore, we can mitigate against bias by both increasing the size of our datasets and increasing the diversity of the humans classifying the data. The last thing we want is machine-classified data being used to classify more data, as the bias is then amplified and skews the results. Alas, there are numerous examples of this having already happened.
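One simple way to put labeller diversity to work, sketched below with invented labels, is to collect several independent verdicts per sample and flag any sample where the labellers disagree for review rather than trusting a single opinion.

```python
# Illustrative sketch: majority vote and agreement across several human labellers.
from collections import Counter

labels_per_sample = [
    ["pass", "pass", "pass"],   # all three labellers agree
    ["pass", "fail", "pass"],   # majority pass, but the disagreement is flagged
    ["fail", "fail", "fail"],
]

for i, votes in enumerate(labels_per_sample):
    winner, count = Counter(votes).most_common(1)[0]
    agreement = count / len(votes)
    print(f"sample {i}: label={winner}, agreement={agreement:.0%}")
```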

Another challenge is determining who owns the data. For example, facial recognition systems are well established, and a robotic camera connected to a suitable machine learning system could find specific people in a crowd and zoom in on them. One fantastic application of this is in sports, where multiple robotic cameras could be used to frame shots of a specific player using facial recognition. But to do this, the model would have to have been trained with thousands of images of the players in the respective league. The technology to do this is well established. However, who owns the image of the sports player? Is it the sports person? The photographer? The agency who employed them? Or even the governing sports league? It depends.

The point is that we cannot assume that we can use the dataset we have even if we want to. And this is another great challenge for broadcasters hoping to leverage machine learning. Not only do they need to be sure that the data does not suffer from bias, but they need to be sure the vendor has authorization to use the data. Anybody using a free social media service might want to read the very small print to see if they are transferring their image rights to the social media company.

Broadcast television has the opportunity to benefit greatly from machine learning and we are very much in the infancy of its development. But unlike broadcast technology of the past, we now must contend with the validity of datasets.
