AI Assisted Talent Tracking Trims Robotic Shots Without An Operator

As artificial intelligence (AI) continues to invade the video production space, and more specifically the unmanned robotic camera studio, automated shot correction and facial tracking and recognition software is computerizing tasks previously performed by a human operator in the control room, making life easier for everyone involved.

In most cases the software performs those tasks more efficiently, freeing the operator to focus on other parts of a production and eliminating the need to manually adjust for the position of the subject in the image.

Indeed, the technology is taking over a job previously done by a technical director or dedicated robotic camera operator, and it is now in use in thousands of broadcast, corporate and house of worship productions worldwide.

Expanding The Boundaries Of Camera Robotics
AI-assisted technology has allowed broadcasters to expand the use of camera robotics in their studios without compromising production quality by letting the computer track the talent and keep them in frame, whether they lean out of the shot or move across the set unexpectedly. Before this technology arrived, talent drifting out of frame and similar on-air mistakes were a major problem for users of robotic camera control systems. Users are now calling it “one of the biggest developments in camera robotics,” a discipline that has been part of broadcast studios for over 40 years.

Today the technology can also dynamically track multiple presenters speaking simultaneously, even as they move around the same space, and it can assist the operator by automatically taking and framing shots based on who is speaking.
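
To make the mechanics concrete, here is a minimal Python sketch of how speaker-based shot selection might work. The `Face` structure, the preset names and the single-speaker rule are illustrative assumptions, not any vendor's actual logic.

```python
# Illustrative sketch only: choosing a shot based on who is speaking.
from dataclasses import dataclass

@dataclass
class Face:
    id: str            # stable identity assigned by the tracker
    x: float           # normalized horizontal face center (0..1)
    y: float           # normalized vertical face center (0..1)
    is_speaking: bool  # e.g. from lip motion or audio-direction analysis

def pick_shot(faces: list[Face], presets: dict[str, str]) -> str:
    """Return the preset framing the active speaker, or a wide shot
    when several people (or no one) are talking at once."""
    speakers = [f for f in faces if f.is_speaking]
    if len(speakers) == 1:
        return presets.get(speakers[0].id, "wide")
    return "wide"  # safe fallback covering everyone on set
```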

Shotoku’s AutoFrame software technology combines AI-assisted tracking algorithms with subtle adjustments for response delays, accelerations and decelerations in motion while tracking.

“Face tracking is not new, we are all familiar with the ability of even the simplest of PTZ cameras to identify and track a face,” said James Eddershaw, Managing Director at Shotoku Broadcast Systems. “The challenge for live TV applications though, is to ensure the tracking is intelligent and produces a viewing experience virtually indistinguishable to that of a manual camera. To hold frame is easy, holding it smoothly and naturally as presenters and inexperienced guests move, is more difficult.”

Streamlined AutoFrame
UK-based Shotoku’s version of shot trimming technology, called AutoFrame, is available on its TR-XT control system. Eddershaw said the software technology combines AI-assisted tracking algorithms with subtle adjustments for response delays, accelerations and decelerations in motion while tracking.
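
A hedged sketch of what such motion shaping can look like in code: a per-axis update that clamps both speed and acceleration so the camera eases into and out of each correction. The function name, gains and limits are assumptions for illustration, not Shotoku's implementation.

```python
# Hypothetical per-axis motion shaping: the camera eases toward the
# target instead of snapping to it. Gains and limits are illustrative.
def smooth_step(position, velocity, target, dt,
                max_speed=0.5, max_accel=0.8):
    """Advance one axis (e.g. pan) one control cycle; returns the
    new (position, velocity) with speed and acceleration bounded."""
    error = target - position
    # Desired speed is proportional to the remaining error, clamped
    # so a long reframe never whips the camera.
    desired = max(-max_speed, min(max_speed, 2.0 * error))
    # Bound how fast velocity itself may change (accel/decel ramps).
    dv = max(-max_accel * dt, min(max_accel * dt, desired - velocity))
    velocity += dv
    return position + velocity * dt, velocity
```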

AutoFrame requires only minimal selection and setup choices; the rest is accomplished without any human intervention, recalling a shot and immediately initiating tracking on the on-screen talent. Tracking can be based on a single face, or two faces simultaneously, in order to maintain appropriate overall framing for live interviews. In addition, all tracking selections can be carried out by an external production automation system (like those from Grass Valley, Ross Video and Vizrt), so, where appropriate, productions can be run without any dedicated robotics operator at all, even on shows where on-air reframing may still be expected.

Manual intervention is always possible whenever the need arises: AutoFrame will immediately disengage if an operator takes control of the camera or if tracking can no longer be maintained in a way that looks acceptable to the viewer. Status information showing how AutoFrame is operating is displayed at all times. Each AutoFrame system can simultaneously track four live cameras, and multiple systems can be added to increase that capacity at any time.

Vision[Ai]ry Facial Tracking
Earlier this year, Canada-based Ross Video launched its own version of talent tracking, Vision[Ai]ry Facial Tracking (Ft), which uses AI to detect, locate and track the position of faces within the video stream coming directly from the camera. It uses those facial positions to drive the pan, tilt and zoom axes of the robotic camera system and maintain the desired framing of the face or faces in the image, eliminating the need for a camera operator to manually adjust for the position of the subject.
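
As a rough illustration of the idea, the sketch below maps a normalized face position and size to signed framing errors on each axis. The target values (a centered face with a little headroom, face height around a third of the frame) and all names are hypothetical, not Ross's API.

```python
# Hypothetical mapping from a detected face to framing errors.
def ptz_errors(face_cx, face_cy, face_h,
               target_cx=0.5, target_cy=0.4, target_h=0.35):
    """All inputs are normalized (0..1). Returns signed errors:
    positive pan means pan right, positive zoom means zoom in."""
    pan_err = face_cx - target_cx    # horizontal framing error
    tilt_err = face_cy - target_cy   # vertical error (headroom)
    zoom_err = target_h - face_h     # face too small -> zoom in
    return pan_err, tilt_err, zoom_err
```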

Ross’ Vision[Ai]ry Ft’s AI algorithms can recognize faces across a diverse range of races, genders and ages, and can accurately identify and locate a face as long as at least 50 percent of it is visible in the image.

“Customers need to work more efficiently and rationalize their workflows,” said Karen Walker, Vice President of Camera Motion Systems at Ross. “Vision[Ai]ry Ft meets these challenges head on and holds true to our philosophy of helping drive high impact, high efficiency productions.”

Walker said Vision[Ai]ry Ft’s AI algorithms can recognize faces across a diverse range of races, genders and ages, and can accurately identify and locate a face as long as at least 50 percent of it is visible in the image. The powerful user interface provides a live display of the video feed with detected faces and the framing target clearly indicated, along with status information, tracking controls and a framing template library. In addition, damping and deadband settings let the user tailor the system to the talent in order to maintain optimal framing and tracking while eliminating undesirable movement and overshoot.
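
A minimal sketch of how deadband and damping interact, assuming normalized framing errors; the thresholds here are illustrative only, not Ross's defaults.

```python
# Illustrative deadband-plus-damping filter; values are assumptions.
def damped_correction(error, deadband=0.03, damping=0.15):
    """Correction to apply this cycle for one axis. Small errors are
    ignored so breathing and fidgeting don't cause constant motion;
    larger errors are closed only fractionally to avoid overshoot."""
    if abs(error) < deadband:
        return 0.0
    return damping * error
```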

The company said Vision[Ai]ry Ft is the first in a suite of products that will use video analytics to automate the functions of a camera operator.

ReFrame The Shot
Allendale, NJ-based robotic camera control supplier Telemetrics, Inc. offers a similar technology, reFrame Automatic Shot Framing and Tracking, as part of its RCCP-2A camera control system for studio and legislative applications. It comes ready to integrate with compatible hardware out of the box, and Telemetrics was one of the first companies in the space to introduce automated talent tracking, back in 2017.

The reFrame tracking is powered by an AI layer that fuses facial recognition, object tracking and other data sources for highly accurate, smooth and reliable tracking, said Michael Cuomo, Vice President of Telemetrics. Because those multiple layers are combined via data fusion technology, the system can handle a wide variety of duties that production teams of all sizes will find useful.
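
In spirit, such fusion can be as simple as confidence-weighted blending of whichever layers report a detection in a given frame. The toy sketch below assumes each layer reports a normalized position and a confidence; it is an illustration of the concept, not Telemetrics' algorithm.

```python
# Toy confidence-weighted fusion of two tracking layers.
def fuse_estimates(face, body):
    """Each input is (x, y, confidence) or None when that layer has
    no detection this frame; returns a blended (x, y) or None."""
    layers = [e for e in (face, body) if e is not None]
    if not layers:
        return None  # nothing detected: hold the last known position
    total = sum(conf for _, _, conf in layers)
    x = sum(px * conf for px, _, conf in layers) / total
    y = sum(py * conf for _, py, conf in layers) / total
    return x, y
```

Note how, when the face layer drops out, the body layer's estimate simply carries the full weight: exactly the failure mode data fusion is meant to cover.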

Telemetrics’ reFrame adjusts camera framing when talent shifts, moves on camera or isn’t quite on their mark, but thanks to its intelligent object tracking it can also take scenic elements such as desks and video walls into consideration when framing the shot, and can track those non-human objects in their own right.

Each camera can be set to track and frame different types of shots.

“We can trigger based off of any camera that goes into either preview or program,” said Cuomo. “We can trigger based off of whatever cameras the robotic move was last called on.”
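
A simple way to picture that trigger logic: tracking activates only on cameras whose tally state is preview or program. The function and state names below are assumptions for illustration, not Telemetrics' interface.

```python
# Hypothetical tally-driven trigger, per the quote above: tracking is
# active only on cameras the switcher has in preview or program.
def cameras_to_track(tally: dict[str, str]) -> list[str]:
    """`tally` maps camera id -> 'program' | 'preview' | 'off'."""
    return [cam for cam, state in tally.items()
            if state in ("program", "preview")]
```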

Taking Control Without An Operator
These automated talent tracking systems can also help human operators through “camera assist” features that fine-tune productions with more precise framing, tracking or camera moves. They also allow the operator to take over control of a camera, either completely or on any combination of the pan, tilt and zoom axes.

Telemetrics’ reFrame tracking is powered by an AI layer that fuses facial recognition, object tracking and other data sources for highly accurate, smooth and reliable tracking.

In an entirely or mostly robotic production, however, the ultimate goal is to give the controller or technical director a wide variety of practical, well framed shots of the action to pick from and then punch up.

Fixed-format productions, such as news or panel shows, are ideally suited to this technology, but vendors are also investing heavily in stretching it to work across a wide variety of staging configurations and environments. For example, the technology can be used for house of worship and auditorium applications where multiple people moving around the same space need to appear on camera.

Under a traditional, facial-recognition-only model, for example, a camera system could lose its subject if the person talking turns away from the camera to point at a video screen or wall. Because some products also include object tracking of the entire body, they can follow presenters much more reliably without hindering their ability to move naturally and dynamically.

Data fusion also makes it possible to use a camera’s zoom to adjust shots, such as when two people are on screen and move farther apart or closer together. Telemetrics’ reFrame, for one, will detect this and determine the optimal combination of pan, tilt and zoom to get the best view.
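
One way to picture the two-person case, as a hedged sketch: compute the horizontal span the two subjects need, pad it with a margin, and scale the field of view so the padded span fits. The margin, the tightness floor and the linear field-of-view model are all simplifications.

```python
# Hedged sketch of two-subject zoom; values are illustrative.
def adjusted_fov(x1, x2, margin=0.1, current_fov=1.0):
    """x1 and x2 are the subjects' normalized horizontal positions in
    the current frame. Returns a new field-of-view factor: larger
    than current widens the shot, smaller tightens it."""
    span = abs(x2 - x1) + 2 * margin   # frame fraction the pair needs
    return current_fov * max(span, 0.4)  # never tighter than 0.4 span
```

In practice the result would be fed through the same damping described earlier, so the zoom creeps rather than jumps.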

Tracking Objects As Well As People
Tracking isn’t limited to standing or sitting humans, however; these systems can also follow everything from racehorses to a player on a field, further enhancing both venue video board productions and live or recorded broadcasts.

The best part is that this technology is not as expensive as you might think. Small to mid-size productions can take advantage of reFrame without a separate server or other hardware to power the tracking, even when multiple sources are being tracked at once.

“This isn’t just for high-end studios or venues. We have a lot of entry level users, in addition to local news stations and networks across the globe,” said Telemetrics’ Cuomo.

He added that Telemetrics is also examining even more advanced systems that bring another layer into the mix, such as “time of flight” technology, which uses battery-powered sensors carried by presenters to give reFrame even more data with which to accurately track subjects.

With huge possibilities on the horizon, automated talent tracking saves time and resources while allowing broadcasters to produce more innovative, creative and engaging content for their various platforms. And everyone who is supposed to be in the shot will always be in frame.
