AI In The Content Lifecycle: Part 2 - Gen AI In TV Production

This series takes a practical, no-hype look at the role of Generative AI technology at each step of the broadcast content lifecycle – from pre-production through production, post and delivery, to regulation and ethics. Here we examine how Generative AI is bringing more scope for automating tasks such as editing and clip creation during production, while also extending into creative domains and disrupting traditional workflow methods.

Well established machine learning methods have already had a major impact on broadcast production by automating more routine tasks, but now Generative AI is extending the scope in various directions, with the prospect of continuing disruption and upheaval in the foreseeable future. Since the settlement of the Hollywood writers' dispute in late 2023, attention has refocused on the positive aspects of this disruption: further elimination of routine repetitive tasks and acceleration of workflow, so that human producers can pursue creative paths that were previously impossible, whether because of time and cost, or because the conception itself was physically unrealizable. Indeed, there is recognition that Gen AI itself, operating through various forms of Extended Reality, can help realize previously unfulfilled creative dreams.

The negative side is that this intrusion by Gen AI is disruptive, threatens to make some people on the creative side redundant, and looks set to continue evolving rapidly over the next few years. The latter implies that broadcasters and content producers will have to keep recalibrating their sights for some time when it comes to production workflow.

It is worth considering how Gen AI differs from earlier neural network-based machine learning and what impact that has on production workflow. To some extent, Gen AI represents a catching up with the vast computational resources now available in processing power, graphics capability and working memory. This brings much greater predictive power through assimilation of knowledge across domains, with the ability to analyze broader tasks that link different fundamental information types, primarily text, graphics, images and video.

This is achieved through foundation models, often defined as large machine learning models trained on data sets diverse enough that they can be applied to a broad range of problems. In broadcasting, however, and probably in other domains too, foundation models are themselves evolving into distinctive families, with variants targeted at subcategories of increasing granularity. A broadcaster might therefore develop a foundation model for sports generally, then break it down into separate models for each individual sport.
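As a loose illustration of that specialization pattern, the following minimal PyTorch sketch takes one pretrained backbone and attaches a separately trained head per sport. The class counts and event labels are hypothetical placeholders, not any broadcaster's actual taxonomy.

```python
# Minimal sketch: specializing one shared pretrained backbone per sport.
# Class counts and event categories below are hypothetical placeholders.
import torch.nn as nn
from torchvision import models

def make_sport_model(num_classes: int) -> nn.Module:
    # Shared "foundation" backbone, pretrained on generic imagery.
    model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
    for p in model.parameters():
        p.requires_grad = False  # freeze the general-purpose knowledge
    # New classification head, trained on sport-specific footage.
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model

# One specialized variant per sport, each fine-tuned on its own data set.
soccer_model = make_sport_model(num_classes=12)  # e.g. goals, corners, cards...
tennis_model = make_sport_model(num_classes=8)   # e.g. aces, breaks, rallies...
```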

So far this is only possible for larger broadcasters, and even they are at an early stage of development. Fox Sports in California is one that can now generate sports highlights automatically by applying foundation models to the massive data sets it has been collecting.

Foundation models can also be developed for news, opinion and documentary, in all of which Gen AI features increasingly. These overlap naturally, since sports highlights may be incorporated in news programs, though perhaps in a more abridged form than within a dedicated sports program. Similarly, clips from topical documentaries awaiting broadcast may also make the news.

There is also interest in using Gen AI to illustrate or enliven items such as face-to-face interviews, by recommending or bringing in auxiliary video content that could either be created from scratch or, more likely at this stage, selected from existing media asset systems. Done manually, this process is laborious and not always practical for live interviews.

Broadcasters are mostly experimenting with semi-automation of these processes rather than deploying them live at this stage. But Gen AI is already automating some functions within content production, examples including branding insertion, color correction, and creation of multiple versions. Automatic transcription has been possible for some time, but Gen AI is extending the scope to include full translation of dialogue, subtitling and other aspects of localization, including graphics and captions.

This reduces manual effort while adding value for local audiences. Some of these processes are being incorporated not just in fixed broadcast facilities but also in OB vehicles for both news and sports.
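To make the transcription-to-localization pipeline concrete, here is a minimal sketch using OpenAI's open-source Whisper model (one of several speech-to-text options; the file names are hypothetical). It transcribes dialogue, translates it to English in the same pass, and writes SRT subtitles:

```python
# Minimal sketch (pip install openai-whisper): transcribe, translate to
# English, and emit SRT subtitles. File names are hypothetical.
import whisper

model = whisper.load_model("base")

# task="translate" produces English text regardless of the source language.
result = model.transcribe("interview.mp4", task="translate")

def to_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT HH:MM:SS,mmm timestamp format."""
    ms = int(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

with open("interview.srt", "w", encoding="utf-8") as srt:
    for i, seg in enumerate(result["segments"], start=1):
        srt.write(f"{i}\n")
        srt.write(f"{to_timestamp(seg['start'])} --> {to_timestamp(seg['end'])}\n")
        srt.write(f"{seg['text'].strip()}\n\n")
```

In practice the raw output would still pass through human review before air, which is one reason broadcasters currently favor semi-automation.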

Automated camera control has featured increasingly in OB, especially at sports events, allowing multiple angles and viewing positions without requiring extra personnel. Earlier systems were rather unreliable and prone to error in crowded locations such as stadia when trying to track a given individual, or even to follow an event.

Even before Gen AI, automatic cameras could cope with temporary occlusion of the object or event they were tracking; the basic closed-loop control involved is sketched below. With Gen AI there is further scope for robotic cameras that can roam around a location, in addition to tilting, rotating and panning. On this front broadcasting will benefit from developments in general AI-driven robotics, with the caveat that the road from initial capability to full deployment can be long, as we are seeing with autonomous driving.
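The sketch below shows the simplest form of such tracking: a proportional controller that converts the offset of a detected subject from frame center into pan/tilt nudges. The detector and the PTZ head interface are hypothetical placeholders, standing in for whatever vision model and camera protocol a real system would use.

```python
# Minimal sketch of closed-loop subject tracking for a PTZ camera.
# The bounding-box detector and the `ptz` object are hypothetical.
FRAME_W, FRAME_H = 1920, 1080
GAIN = 0.05      # degrees of movement per pixel of error (tuned empirically)
DEADBAND = 40    # ignore small offsets to avoid constant jitter

def track_step(bbox, ptz):
    """bbox = (x, y, w, h) of the detected subject in pixels;
    ptz exposes pan(degrees) and tilt(degrees) methods."""
    cx = bbox[0] + bbox[2] / 2
    cy = bbox[1] + bbox[3] / 2
    err_x = cx - FRAME_W / 2
    err_y = cy - FRAME_H / 2
    if abs(err_x) > DEADBAND:
        ptz.pan(GAIN * err_x)    # positive error = subject right of center
    if abs(err_y) > DEADBAND:
        ptz.tilt(-GAIN * err_y)  # image y grows downward, so invert for tilt
```

Occlusion handling then amounts to holding the last commanded position, or extrapolating the subject's motion, until the detector reacquires it.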

One aspect of automation, and of AI's role in it, is the increasing connection between the various stages of the content lifecycle as they become more interdependent. At the production stage, AI has come into play in overall Media Asset Management (MAM), preparing content for subsequent archiving by taking on tasks that were previously manual. These include captioning, content editing, and creation of metadata to facilitate search and navigation when assets are accessed in future.

Only a few major broadcasters have the resources to do all this themselves, so inevitably the major hyperscalers and cloud service providers have come into the equation, peddling their own Gen AI wares. This is leading towards cloud-based archiving, which achieves scale economies for storage while also offering higher levels of data availability than any internal data center could provide affordably.

Products such as Amazon’s Media2Cloud incorporate Gen AI to enhance metadata creation during archiving, as a customer’s assets are transferred to the Amazon S3 Glacier repository. Microsoft, Google and a few others provide similar services, which facilitate metadata extraction and creation with, for example, audio-to-text transcription.
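The underlying pattern is straightforward, as this hedged sketch using boto3 (the AWS Python SDK) shows: an asset goes to S3 with a Glacier storage class, carrying AI-generated descriptive metadata alongside it. This is a generic illustration, not Media2Cloud's actual API, and the bucket, key and metadata values are hypothetical.

```python
# Minimal sketch (pip install boto3): archive an asset to S3 Glacier with
# AI-generated metadata attached. Bucket, key and values are hypothetical.
import boto3

s3 = boto3.client("s3")

# Metadata that an AI pass might have produced during ingest.
metadata = {
    "transcript-summary": "post-match interview, english",
    "detected-people": "coach, reporter",
    "source-programme": "evening-sports-roundup",
}

s3.upload_file(
    Filename="interview.mxf",
    Bucket="broadcast-archive",            # hypothetical bucket name
    Key="2024/interviews/interview.mxf",
    ExtraArgs={"StorageClass": "GLACIER", "Metadata": metadata},
)
```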

Naturally, broadcasters want to archive content at optimum quality for a given level of compression. AI methods are now helping restore the quality of archive footage and upscale it to higher resolutions, but in the digital age even better results come from storing at the best quality possible in the first place.

On this front an offshoot of Gen AI, Generative Adversarial Networks (GANs), is being employed, focused particularly on optimizing quality in the compressed domain. Given that lossless compression is impractical for stored archives, the aim is to minimize the losses, in part by concentrating on critical objects rather than the spaces between them.

GANs were inspired by natural evolution, specifically the arms race in which two organisms, such as a predator and its prey, or a pathogen and its host, continually adapt in the hope of gaining some advantage over the other. However, this tends to stop, or slow greatly, once a point is reached at which further adaptations cost more in other ways than any additional small gains over the direct adversary.

Similarly, in a GAN two models are pitted against each other, one called the generator and the other the discriminator. The generator creates data of interest within a larger space, which the discriminator must then recognize. The generator keeps tweaking its output in an attempt to disguise that data, while the discriminator adapts in turn so that it continues to identify it correctly. Convergence occurs when the changes dwindle or fall within an agreed margin.

This process can be applied to video compression by comparing content that has been through a codec and emerged in the decompressed state with the original, the objective being to minimize perceptually relevant differences for a given level of compression.
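A minimal PyTorch sketch of this idea follows, implementing the generator/discriminator loop just described for compression artifact reduction. The network sizes and loss weighting are illustrative assumptions, not taken from any published system: the generator learns a residual correction for a decoded frame, while the discriminator learns to tell originals from restorations.

```python
# Minimal sketch of adversarial artifact reduction. Architectures and the
# L1 weight (10.0) are illustrative assumptions, not a published design.
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Takes a decoded (artifact-laden) frame, predicts a restored frame."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1),
        )
    def forward(self, x):
        return torch.clamp(x + self.net(x), 0.0, 1.0)  # residual correction

class Discriminator(nn.Module):
    """Scores whether a frame looks like an original or a restoration."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, 1),
        )
    def forward(self, x):
        return self.net(x)

gen, disc = Generator(), Discriminator()
opt_g = torch.optim.Adam(gen.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=2e-4)
bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()

def train_step(original, decoded):
    """One adversarial round; both inputs are (N, 3, H, W) tensors in [0, 1]."""
    n = original.size(0)
    real, fake = torch.ones(n, 1), torch.zeros(n, 1)
    # Discriminator: learn to separate originals from current restorations.
    restored = gen(decoded).detach()
    loss_d = bce(disc(original), real) + bce(disc(restored), fake)
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # Generator: fool the discriminator while staying close to the original.
    restored = gen(decoded)
    loss_g = bce(disc(restored), real) + 10.0 * l1(restored, original)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```

The L1 term keeps the restoration faithful to the original, while the adversarial term pushes it towards frames that look statistically like uncompressed footage.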

There has been a spate of research papers on this application of GANs, with particular focus on the H.265/HEVC codec, since it is widely employed by broadcasters. HEVC is a reasonably efficient codec, but video information is inevitably lost during compression and transmission, detracting from both objective and subjective quality.

To date, metrics such as PSNR (Peak Signal to Noise Ratio) and SSIM (Structural Similarity Index Measure) have been employed for video quality evaluation and to guide restoration, but they are relatively crude in the sense that they do not discriminate within frames or sequences to home in on the objects or areas most critical to the viewer’s quality of experience (QoE). The Perceptual Index (PI) has emerged as a technique for assessing the QoE of video in the absence of a reference, and this can be applied with GANs.
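For reference, both of these full-reference metrics are readily computed, as in this minimal sketch using scikit-image (frame loading is omitted for brevity):

```python
# Minimal sketch (pip install scikit-image): full-reference quality scores
# for a decoded frame versus its original. Frames are uint8 RGB arrays.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def frame_quality(original: np.ndarray, decoded: np.ndarray):
    psnr = peak_signal_noise_ratio(original, decoded, data_range=255)
    ssim = structural_similarity(original, decoded,
                                 channel_axis=-1, data_range=255)
    return psnr, ssim
```

Both scores are whole-frame averages, which is exactly the weakness noted above: a high PSNR can still hide artifacts on the faces or ball positions that dominate the viewer's attention.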

Results so far have been promising, with GANs reducing visible artifacts and elevating perceived video quality more than prior methods. They achieve this by effectively learning how images are distributed, minimizing the disparity between the image distributions of the training set and the generated set through this iterative adversarial method.

Gen AI, then, is finding application across the video workflow, with many developments still in the R&D pipeline rather than deployed. Real-time feedback will feature increasingly across the production pipeline, enhancing circular news production as well as quality at the point of delivery, feeding into distribution.

Gen AI will also have a significant impact on localization, with benefits for regional broadcasters, especially as automation lowers the cost of production for stories with smaller local audiences.
