Machine Learning (ML) For Broadcasters: Part 3 - Neural Networks And The Human Brain

Machine learning is often compared to the human brain. But what do they really have in common?


Some of the terminology we use supports the notion of ML being like the human brain. For example, the brain, at a very simple level, consists of billions of interconnected neurons, which is where neural networks take their name. Also, humans learn a large part of their behavior through supervised learning, that is, learning with some sort of feedback (parents call it discipline).

As an example, we do not teach a child how to cross every road in the world, as this would simply be impossible. Instead, we provide them with a strategy for crossing a generic road, and with the appropriate training and practice, this knowledge can be used to cross all roads. One of the reasons this strategy is successful is that roads all follow a similar pattern: vehicles approach from the left and right, or there are designated pedestrian crossings. We don’t expect a bus to drop from the sky, or a helicopter to land on the freeway.

In other words, a human only needs to learn from a subset of all possible outcomes to master a task, because the natural world behaves in a predictable fashion. Well, at least usually. Admittedly, this is a very broad statement that has more holes in it than the proverbial sieve, especially if we take it to the extreme. However, as most people cross the road successfully, we can say that the training we provide for children on how to cross the road is largely successful, but not 100% successful. And this is where much of the confusion lies about what ML and NNs can and cannot achieve.

Figure 1 - a simple single neuron in a neural network takes inputs (x1 … x3) and multiplies them by the weights (w1 … w3), then adds the bias, which performs an addition or subtraction, to produce an output. The sigmoid function applies a non-linearity to the neuron, encouraging the output to tend towards a larger or smaller value. This is similar to how a neuron in the human brain “fires” a signal.

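To make this concrete, here is a minimal Python sketch of the neuron Figure 1 describes (NumPy is assumed, and all values are arbitrary placeholders):

```python
import numpy as np

def sigmoid(z):
    """Squash any real value into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Three inputs (x1 ... x3), three weights (w1 ... w3) and a bias,
# as in Figure 1; the values are arbitrary placeholders.
x = np.array([0.5, -1.2, 3.0])   # inputs
w = np.array([0.8, 0.1, -0.4])   # weights (the multiplication)
b = 0.2                          # bias (the addition or subtraction)

# Weighted sum, then the sigmoid non-linearity "fires" the neuron.
output = sigmoid(np.dot(w, x) + b)
print(output)  # a single value between 0 and 1
```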

Machine learning is analogous to human learning in that we don’t teach a machine how to deal with every possible data point that could be applied to it, but instead use a subset of data to teach it a generic understanding of the relationship between the input data and the required output. This is possibly the most important aspect of machine learning, especially when applied to neural networks (NNs).

An example of this is in object recognition. Convolutional Neural Networks (CNNs) can be used to detect objects, such as a bus. When training the CNN model, we do not train it to detect every possible bus, with every possible color, at every possible angle, in every possible lighting condition. Instead, we train the model to detect a generic set of buses, as they share similar attributes: they are box shaped, big, have wheels, etc. A baby will not be able to detect a bus, but a child going to school for the first time will be able to (assuming the parent has provided the appropriate training).

It’s important to note that the comparison of the human brain with ML NNs is a tentative one and only really meant for illustrative purposes. Nobody (at least those in the know) would ever try to compare their NN model to a human brain, but there are some interesting similarities. This isn’t surprising, as the early ML NN pioneers are themselves human.

NN models consist of weights, biases, and non-linear functions such as the sigmoid or tanh function. When we speak of training an NN, what we are really doing is changing the weights and biases of the interconnections within the model so that it detects patterns in the input data. When this has been sufficiently achieved, the model is said to be trained. We can then use the trained model with input data it hasn’t seen before to provide an output. For example, if we train a CNN model to detect buses using object recognition with a dataset of 100,000 images of different buses, then we should be able to apply any image data to the model and it will detect the buses in the image.
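
The sigmoid and tanh non-linearities mentioned above are easy to see numerically. A small sketch, again assuming NumPy, with an arbitrary range of weighted-sum values; both functions squash an unbounded sum into a fixed range:

```python
import numpy as np

z = np.linspace(-4.0, 4.0, 9)         # a range of weighted-sum values

sig = 1.0 / (1.0 + np.exp(-z))        # sigmoid: output in (0, 1)
tnh = np.tanh(z)                      # tanh: output in (-1, 1)

for zi, s, t in zip(z, sig, tnh):
    print(f"z={zi:+.1f}  sigmoid={s:.3f}  tanh={t:+.3f}")
```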

This data-led learning is analogous to the workings of the human brain. In the same way that a parent teaches a child through repetition, the CNN model is trained by repeatedly providing labeled data. We say to the CNN model “this is an image of a bus”, and after repeated application of the tens of thousands of images showing buses, it will modify its weights and biases so that buses can be detected. Any parent who has sat down with a child and taught them how to recognize objects, such as “this is an apple, and this is an orange”, will understand the basis of ML NN learning.

Figure 2 - when the single neurons from Figure 1 are combined, they form a complex network which is capable of learning patterns from a dataset. A typical ML neural network can consist of tens of thousands, and even millions, of neurons.

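As a rough sketch of the network in Figure 2 (NumPy assumed; the layer sizes and random weights are arbitrary illustrations), single neurons can be chained into layers, where each row of a weight matrix is one Figure 1 neuron:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# A tiny network: 3 inputs -> 4 hidden neurons -> 1 output neuron.
# Real ML models use the same structure with vastly more neurons.
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)

def forward(x):
    hidden = sigmoid(W1 @ x + b1)     # first layer of neurons
    return sigmoid(W2 @ hidden + b2)  # output neuron

print(forward(np.array([0.5, -1.2, 3.0])))
```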

Strangely, a child who has recognized an apple for the first time may not be able to recognize it when it is turned upside down. In effect, they need more data to train their neurons, and this is exactly what happens in machine learning: we must constantly provide more training data as it becomes available. Consequently, the learning of the model is never complete and can never achieve 100% accuracy, but then again, no system can, as the long tails of Gaussian distribution models demonstrate.
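
The article doesn’t prescribe a technique for generating that extra data, but one common approach is data augmentation: creating new training examples, such as the upside-down apple, from the examples we already have. A minimal sketch, assuming NumPy and a stand-in image:

```python
import numpy as np

# A stand-in 4x4 grayscale "image"; in practice this would be a photo.
image = np.arange(16, dtype=float).reshape(4, 4)

# Generate extra training examples from the one we already have, so the
# model also sees the object upside down, mirrored, and rotated.
augmented = [
    image,
    np.flipud(image),   # vertical flip: the "upside-down apple"
    np.fliplr(image),   # horizontal mirror
    np.rot90(image),    # 90-degree rotation
]
print(len(augmented), "training examples from one original image")
```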

This form of repetitive learning is intrinsic to everything we do. Parents, teachers, and influential people in our lives provide feedback when we’re learning, which in effect updates our neurons. In a similar fashion, ML uses a method called backpropagation to reduce a loss function. The role of the data scientist is to design a model that reduces the difference between the model’s predictions on the training dataset and the actual labeled data from the training dataset. Backpropagation updates the model’s weights and biases to reduce the loss value, and hence make the model more and more accurate.
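
A toy example ties these ideas together. The following sketch (NumPy assumed; the dataset, learning rate, and mean-squared-error loss are illustrative choices, not any vendor’s actual method) trains a single neuron with backpropagation, repeatedly nudging the weights and bias to reduce the loss:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)

# Toy labeled dataset: 4 samples of 3 features, with binary labels.
X = rng.normal(size=(4, 3))
y = np.array([0.0, 1.0, 1.0, 0.0])

w = rng.normal(size=3)
b = 0.0
lr = 0.5  # learning rate

for step in range(1000):
    pred = sigmoid(X @ w + b)        # forward pass: model's prediction
    loss = np.mean((pred - y) ** 2)  # loss: prediction vs labeled data

    # Backpropagation: gradient of the loss w.r.t. weights and bias,
    # via the chain rule (the sigmoid's derivative is pred * (1 - pred)).
    grad_z = 2 * (pred - y) * pred * (1 - pred) / len(y)
    w -= lr * (X.T @ grad_z)         # update the weights...
    b -= lr * grad_z.sum()           # ...and the bias, reducing the loss

print(f"final loss: {loss:.4f}")
```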

The way we train and use ML certainly has some similarities to how the human brain works and learns, at a very basic level. However, just like humans, the training process of an ML model is never complete, as there is always something new to learn. Consequently, vendors will be updating their ML engines on a regular basis (or at least should be) as new training data becomes available.

Nothing in life is certain, but many events are highly probable. It is these highly probable events that allow both humans and ML NNs to learn, and then form highly accurate predictions based on their prior learning experience. Here we run the risk of disappearing down a philosophical rabbit hole, especially when we consider training bias: just like humans, ML runs a great risk of learning the biases present in its training data. But what happens with improbable events? These are called anomalies, and just like in life, they’re really difficult to deal with.
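
As a simple illustration of why anomalies are hard, if we model “normal” behavior with a Gaussian distribution, an anomaly is just a value far out in the tail; the three-standard-deviation threshold below is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(2)

# Observations from a "predictable" world: a Gaussian distribution.
normal_events = rng.normal(loc=10.0, scale=2.0, size=1000)
mean, std = normal_events.mean(), normal_events.std()

def is_anomaly(value, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    return abs(value - mean) / std > threshold

print(is_anomaly(10.5))   # a highly probable event -> False
print(is_anomaly(25.0))   # an improbable event, an anomaly -> True
```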

In the next article in this series, we will dig deep into the NN training process and understand exactly what is going on at an engineering level.
