How “Deep Learning” Technology is Revolutionizing Sports Production

Deep learning technology is more common than one might think. It is used to identify objects in images, text or audio, achieving results that were not possible before. This article will examine how deep learning is revolutionizing sports production to enable low-cost, fully automated production for semi-professional and amateur sports broadcasts.

To understand how deep learning works, let's examine how our brains work. A human brain is made up of nerve cells, called "neurons," which are connected to each other in adjacent layers, forming an elaborate "neural network." In an artificial neural network, signals also travel between "neurons," but instead of firing electrical impulses, the network assigns numeric "weights" to the connections between them.
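
To make that idea concrete, here is a minimal sketch in Python (the layer size, weights and inputs are purely illustrative) of how one artificial layer combines incoming signals using its weights:

```python
import numpy as np

# A tiny, purely illustrative "layer" of artificial neurons.
# Each output neuron combines all inputs using its own row of weights.
inputs = np.array([0.2, 0.8, 0.5])          # signals arriving from the previous layer
weights = np.array([[0.4, -0.1, 0.7],       # weights of neuron 1
                    [0.3,  0.9, -0.5]])     # weights of neuron 2
biases = np.array([0.1, -0.2])

# Weighted sum followed by a simple non-linearity (ReLU).
activations = np.maximum(0, weights @ inputs + biases)
print(activations)   # the "signal" passed on to the next layer
```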

Deep learning neural networks can comprise as many as 150 connected layers; the more layers, the "deeper" the network. Deep learning models are trained on large sets of labeled or annotated data. The network learns features directly from that data, so there is no need to hand-pick the features used to classify images. Nor are the relevant features pretrained; they are learned while the network trains on a collection of images. This automated feature extraction is what makes deep learning models highly accurate for computer vision tasks such as object classification.
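
As an illustration only, not the actual architecture behind any particular product, a toy convolutional classifier in PyTorch might be stacked like this, with each block adding one more layer of "depth":

```python
import torch
import torch.nn as nn

# A toy convolutional classifier: each block is one "layer" of the network,
# and stacking more of them makes the network "deeper".
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 56 * 56, 2),   # e.g. two illustrative classes: "ball" / "no ball"
)

# The features used for classification are learned from labeled images during
# training, not hand-crafted beforehand.
dummy_batch = torch.randn(4, 3, 224, 224)   # 4 RGB images, 224x224 pixels
logits = model(dummy_batch)
print(logits.shape)                          # torch.Size([4, 2])
```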

Although there is no need to manually extract each feature, there is a need to create a large enough annotated training data set. For example, to identify a ball, you will need a data set of hundreds of thousands of unique images, annotated by humans, which represent the "ground truth" for the deep learning model. Considering that you would usually annotate other elements as well, such as players, this can add up to millions of annotations. The result is a "trained model" that can identify the objects it was trained on.
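
To give a sense of what those human annotations look like, here is a simplified, hypothetical ground-truth record for a single frame; the file name, labels and pixel coordinates are invented for illustration, and real projects typically use established formats such as COCO or Pascal VOC:

```python
# One annotated frame: humans draw bounding boxes around the ball and the players,
# and those boxes become the "ground truth" the model is trained to reproduce.
frame_annotation = {
    "image": "match_0001/frame_004217.jpg",
    "objects": [
        {"label": "ball",   "bbox": [912, 388, 934, 410]},   # [x_min, y_min, x_max, y_max]
        {"label": "player", "bbox": [450, 210, 510, 370]},
        {"label": "player", "bbox": [780, 240, 842, 405]},
    ],
}

# Millions of records like this, across varied lighting, camera angles and ball
# positions, are what "training the model" actually consumes.
```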

Deep Learning in Sports Production

Deep learning is used to generate fully automated sports production that looks very similar to a professional sports broadcast, including camera zoom-ins on the action, panning and so on. The basis for any decent automated sports production is the ability to identify at least the ball and the players. Identifying the ball is not an easy task when you consider that the ball can be on the ground or held by a goalkeeper or a player (e.g. before taking a free kick).
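
Purely as a thought experiment, assuming per-frame ball detections are already available, a virtual "camera" could pan by easing a crop window toward the ball, along these lines (all numbers and the smoothing scheme are illustrative assumptions, not a description of any specific system):

```python
# A rough sketch of how ball detections could drive a virtual "camera":
# the crop window eases toward the ball each frame, which looks like panning.
FRAME_W, FRAME_H = 3840, 2160        # full panoramic frame (illustrative)
CROP_W, CROP_H = 1920, 1080          # broadcast output window
SMOOTHING = 0.1                      # lower = slower, smoother pans

def update_camera(center, ball_xy):
    """Move the crop center a fraction of the way toward the detected ball."""
    cx = center[0] + SMOOTHING * (ball_xy[0] - center[0])
    cy = center[1] + SMOOTHING * (ball_xy[1] - center[1])
    # Keep the crop inside the panoramic frame.
    cx = min(max(cx, CROP_W / 2), FRAME_W - CROP_W / 2)
    cy = min(max(cy, CROP_H / 2), FRAME_H - CROP_H / 2)
    return (cx, cy)

center = (FRAME_W / 2, FRAME_H / 2)
for ball_xy in [(1200, 900), (1350, 950), (1600, 1000)]:   # detections per frame
    center = update_camera(center, ball_xy)
    print(center)
```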

Deep learning technologies enable software to identify all of the required elements of a sports broadcast to automate its live production.

If you think about it, in all these different situations the ball "looks" different, yet we, as humans, have no problem identifying it as a ball from a single frame. Identifying the players is not simple either, as the system has to distinguish between "real" players and referees, bench players and so on.

Identifying the Field/Court

In sports production, one way to help identify the ball and the players is to define for the system the area that constitutes the field or court. This process, known as "calibration," limits the scope of options for the deep learning algorithm by establishing, within each frame, which pixels are part of the field/court and which are not. The system then translates these pixels into physical dimensions based on real-world coordinates.
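
One common way to implement this kind of pixel-to-world mapping, though not necessarily the method used in any particular product, is a planar homography computed from a few known landmarks such as the pitch corners. A minimal OpenCV sketch with invented coordinates:

```python
import numpy as np
import cv2

# Pixel positions of known landmarks (e.g. the four pitch corners) in one frame...
image_pts = np.array([[212, 640], [3610, 655], [3320, 1980], [505, 1995]], dtype=np.float32)
# ...and the same landmarks in real-world coordinates (meters, for a 105 x 68 m pitch).
world_pts = np.array([[0, 0], [105, 0], [105, 68], [0, 68]], dtype=np.float32)

# The homography translates any pixel on the field plane into physical dimensions.
H, _ = cv2.findHomography(image_pts, world_pts)

# Example: where on the pitch is a detection at pixel (1900, 1300)?
pixel = np.array([[[1900, 1300]]], dtype=np.float32)
position_m = cv2.perspectiveTransform(pixel, H)
print(position_m)   # approximate (x, y) in meters on the pitch
```

In practice the landmark pixels come from the calibration step itself, whether marked manually during setup or detected automatically from the field lines.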

By establishing the area of the field/court, it is possible to distinguish players who are inside it from everyone outside it, such as bench players and spectators.
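
With a calibrated field boundary in hand, a simple point-in-polygon test is enough to make that distinction. The sketch below uses OpenCV and invented pixel coordinates:

```python
import numpy as np
import cv2

# Calibrated field boundary in pixel coordinates (illustrative values).
field_polygon = np.array([[212, 640], [3610, 655], [3320, 1980], [505, 1995]],
                         dtype=np.float32).reshape(-1, 1, 2)

def on_field(foot_point):
    """True if a detected person's foot point falls inside the field polygon."""
    return cv2.pointPolygonTest(field_polygon, foot_point, False) >= 0

detections = [(1700.0, 1200.0),   # player on the pitch
              (120.0, 1900.0)]    # someone on the bench or in the stands
for point in detections:
    print(point, "on field" if on_field(point) else "off field")
```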

Data Annotation for Sports

As mentioned above, training a deep learning model requires a large annotated data set to establish the "ground truth" for the algorithm. This is a major undertaking that should continue on an ongoing basis as more data is gathered and the algorithm evolves.

There are several ways to achieve this. A minimal number of frames must be annotated by humans, but several methods require less effort, including:

  • Google/YouTube images – it is possible to augment the data set by searching for "soccer players" on Google or YouTube. This yields frames or images that include soccer players and have, in other words, been "pre-annotated" as soccer players.
  • Unsupervised learning – this technique uses unlabeled data by applying an additional non-deep-learning algorithm to first segment the areas of potential players. For example, known background subtractors such as MOG can roughly identify players (see the sketch after this list).
  • Augmentations – another commonly used technique is to modify or augment the images, for example by stretching them, modifying angles, etc. These augmentations produce an additional data set that is already labeled (also illustrated in the sketch after this list).
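
As referenced above, here is a rough combined sketch of the last two ideas: using a classical background subtractor (OpenCV's MOG2, a close relative of the MOG method mentioned above) to roughly segment moving players, and flipping an already-labeled image to create an extra training sample. File names and parameters are illustrative assumptions.

```python
import cv2

# --- Unsupervised pre-segmentation with a classical background subtractor ---
# MOG2 models the (mostly static) pitch as background, so moving players show up
# as foreground blobs that can seed rough, automatic annotations.
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)

video = cv2.VideoCapture("match_clip.mp4")         # illustrative file name
while True:
    ok, frame = video.read()
    if not ok:
        break
    foreground_mask = subtractor.apply(frame)       # white pixels ~ moving players
video.release()

# --- Simple augmentation of an already-labeled image ---
# A horizontal flip produces a "new" labeled sample for free; the bounding-box
# x-coordinates just need to be mirrored accordingly.
labeled = cv2.imread("annotated_frame.jpg")         # illustrative file name
if labeled is not None:
    flipped = cv2.flip(labeled, 1)                  # 1 = flip around the vertical axis
    cv2.imwrite("annotated_frame_flipped.jpg", flipped)
```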

One key to proper camera tracking is for the system to recognize the area of the field or court. The software must distinguish between players who are inside the field/court versus others outside of it.

As we've seen, with deep learning technologies computers can understand the sports action, opening new opportunities in sports production that were never possible before. At its highest level, this technology can mimic the decision-making process of a human camera operator and video editor, providing almost the same experience as a professional live sports broadcast at a fraction of the cost. This technological revolution will allow semi-professional and amateur sports clubs to broadcast their games to their fans and even monetize their content.

Yoav Liberman is Director of Computer Vision & Deep Learning Algorithms at Pixellot.
