AI In The Content Lifecycle: Part 2 - Gen AI In TV Production
This series takes a practical, no-hype look at the role of Generative AI technology in the various steps in the broadcast content lifecycle – from pre-production through production, post, delivery and in regulation/ethics. Here we examine how Generative AI is bringing more scope for automation of tasks such as editing and clip creation during production, while also extending into creative domains and disrupting traditional workflow methods.
Well-established machine learning methods have already had a major impact on broadcast production by automating more routine tasks, but now Generative AI is extending the scope in various directions, with the prospect of continuing disruption and upheaval for the foreseeable future. Since settlement of the Hollywood writers' dispute late in 2023, attention has been refocused on positive aspects of this disruption: the further elimination of routine repetitive tasks and the acceleration of workflow, so that human producers can pursue creative paths that were previously impossible, either because of time and costs, or even because the conception itself was physically unrealizable. Indeed, there is recognition that Gen AI itself, operating through various forms of Extended Reality, can help realize previously unfulfilled creative dreams.
The negative side is that this intrusion by Gen AI is disruptive, threatens to make some people on the creative side redundant, and looks set to continue evolving quite rapidly over the next few years. That pace of change implies that broadcasters and content producers will have to go on recalibrating their sights for some time when it comes to production workflow.
It is worth considering how Gen AI differs from earlier neural network-based machine learning and what impact that has on production workflow. To some extent Gen AI represents a catching up with the vast computational resources now available, in processing power, graphics capability and working memory. This brings much greater predictive power through assimilation of knowledge across domains, founded on the ability to analyze broader tasks that link different fundamental information types, primarily text, graphics, images, and video.
This is achieved through foundation models, which are often defined as large machine learning models trained on diverse data sets such that they can be applied to a broad range of problems. In broadcasting, however, and probably also in other domains, foundational models are themselves evolving into distinctive families, with different variants targeted at subcategories of increasing granularity. Broadcasters might therefore develop a foundational model for sports generally, and then break that down into separate models for each individual sport.
So far the latter is only possible for larger broadcasters and even they are at an early stage of development. Fox Sports in California is one that can now generate sports highlights automatically by applying foundational models to massive data sets it has been collecting.
Foundational models can also be developed for news, opinion and documentary, in all of which Gen AI is featuring increasingly. They overlap naturally since sports highlights may be incorporated in news programs, but in a more abridged form perhaps than within a dedicated sports program. Similarly, snapshots of topical documentaries being shown shortly may also make the news.
There is also interest in using Gen AI to illustrate or enliven items such as face-to-face interviews, recommending or bringing in auxiliary video content which could either be created from scratch or, more likely at this stage, selected from existing media asset systems. This process can be laborious when done manually and is not always practical in the case of live interviews.
Broadcasters are mostly experimenting with semi-automation of these processes, rather than deploying them live at this stage. But already Gen AI is automating some quality functions within content production, examples including branding insertion, color correction, and creation of multiple versions. Automatic transcription has been possible for some time, but Gen AI is increasing the scope to include full translation of dialogue, subtitling and other aspects of localization including graphics and captions.
This reduces manual effort, with scope for adding value for local audiences. Some of these processes are being incorporated not just in fixed broadcast facilities but also in OB vehicles for both news and sports.
Automated camera control has featured increasingly in OB, especially at sports events, allowing inclusion of multiple angles and viewing positions without requiring extra personnel. Earlier systems were rather unreliable and susceptible to error in crowded locations such as stadia when trying to track a given individual, or even follow an event.
Even before Gen AI, automatic cameras could cope with temporary occlusion of the object or event they were tracking. With Gen AI there is further scope for development of robotic cameras that can roam around a location, in addition to tilting, rotating and panning. On this front broadcasting will benefit from developments in general AI-driven robotics, although with the caveat that the road from initial development and capability to full deployment can be long, as we are seeing with autonomous driving.
One aspect of automation and the role of AI is the increasing connection between the various stages of the content life cycle as they become more interdependent. At the production stage AI has come into play in overall Media Asset Management (MAM), taking over various previously manual tasks in preparation for subsequent archiving. These include captioning, content editing, and creation of suitable metadata to facilitate search and navigation when assets are accessed in future.
Only a few major broadcasters have the resources to do all this themselves and so inevitably the major hyperscalers and cloud service providers have come into the equation, peddling their own Gen AI wares. This is leading towards cloud-based archiving, achieving scale economies for storage, while also benefiting from higher levels of data availability than any internal data center could provide affordably.
Products such as Amazon’s Media2Cloud incorporate Gen AI to enhance metadata creation during archiving, as a customer’s assets are transferred to the Amazon S3 Glacier repository. Microsoft, Google and a few others provide similar services, which facilitate metadata extraction and creation with audio-to-text transcription, for example.
Naturally, broadcasters want to archive content at optimum quality for a given level of compression. It is true that AI methods are now helping restore quality of archive footage and upscaling to higher resolution, but now in the digital age even better results will be obtained by storing at best quality possible in the first place.
On this front a branch of Gen AI, Generative Adversarial Networks (GANs), is being employed, focused particularly on optimizing quality in the compressed domain. Given that lossless compression is impractical for stored archives, the aim is to minimize the losses, in part by concentrating on critical objects rather than the spaces between them.
The adversarial principle behind GANs resembles an arms race in natural evolution, where two organisms, such as a predator and its prey, or a pathogen and its host, continually adapt in the hope of gaining some advantage over the other. However, this tends to stop, or slow down greatly, once a point is reached at which further adaptations prove more costly in other ways than any additional small gains over the direct adversary.
Similarly, in a GAN two neural network models are pitted against each other, one called the generator and the other the discriminator. The generator creates synthetic data intended to pass as real, which the discriminator must then tell apart from genuine samples. The generator keeps adjusting its output in an attempt to fool the discriminator, while the discriminator in turn adapts so that it continues to identify the fakes correctly. Convergence occurs when the changes dwindle or become confined within an agreed margin.
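The contest between generator and discriminator is commonly written as a minimax objective. The formulation below is the standard one from the general GAN literature, offered as an illustration rather than as the specific loss used in any broadcast product or codec study. The symbols G, D, x, z, p_data and p_z are notation introduced here for the sketch.

```latex
\min_{G}\,\max_{D}\; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_{z}}\big[\log\big(1 - D(G(z))\big)\big]
```

Here D(x) is the discriminator's estimate of the probability that a sample x is genuine, and G(z) is the generator's output for a random input z. The discriminator tries to push the value up by classifying correctly, while the generator tries to pull it down by producing output the discriminator accepts as real.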
This process can be applied to video compression by comparing the original content with the version that has been through a codec and emerged in the decompressed state. The objective is to minimize relevant differences for a given level of compression.
There has been a spate of research papers published on this application of GANs, with particular focus on the H.265/HEVC codec, since that is widely employed by broadcasters. HEVC is a reasonably efficient codec, but inevitably video information is lost during compression and transmission, detracting from both objective and subjective quality.
To date, techniques such as PSNR (Peak Signal to Noise Ratio) and SSIM (Structural Similarity Index Measure) have been employed for video quality evaluation and restoration, but they are relatively crude in the sense that they do not discriminate within frames or sequences to home in on the objects or areas most critical for the viewer’s quality of experience (QoE). The Perceptual Index (PI) has emerged as a technique for assessing the QoE of video in the absence of a reference, and this can be applied with GANs.
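To make the limitation of PSNR concrete, here is a minimal Python sketch of the standard formula, 10·log10(peak² / MSE), applied to a handful of made-up 8-bit luma samples. The sample values and the names `mse`, `psnr`, `decoded_spread` and `decoded_local` are illustrative assumptions, not taken from any broadcaster's toolchain.

```python
import math


def mse(original, decoded):
    """Mean squared error between two equal-length sequences of pixel values."""
    if len(original) != len(decoded) or not original:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)


def psnr(original, decoded, peak=255):
    """Peak Signal to Noise Ratio in dB; infinite when the signals are identical."""
    error = mse(original, decoded)
    if error == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / error)


# Hypothetical 8-bit luma samples from one row of a frame.
original = [52, 55, 61, 66, 70, 61, 64, 73]

# Case 1: small errors spread across the row.
decoded_spread = [54, 55, 60, 66, 69, 62, 64, 70]

# Case 2: the same total squared error concentrated on a single pixel,
# as might happen on a critical detail such as a face or a ball.
decoded_local = [52, 55, 61, 66, 74, 61, 64, 73]

psnr_spread = psnr(original, decoded_spread)
psnr_local = psnr(original, decoded_local)
print(f"spread errors:    {psnr_spread:.2f} dB")
print(f"localised error:  {psnr_local:.2f} dB")
```

Both cases have a mean squared error of 2.0 and therefore the same PSNR of about 45.12 dB, even though a viewer might judge them quite differently depending on where the localised error falls. That indifference to position is the crudeness described above.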
Results so far have been promising, with GANs reducing visible artifacts and elevating perceived video quality more than prior methods. GANs achieve this by effectively acquiring knowledge of how images are distributed, and so minimizing the disparity between the image distributions in the training set and the generated set, through application of this iterative adversarial method.
Gen AI, then, is finding application across the video workflow, with many developments in the R&D pipeline if not yet deployed. Real time feedback will feature increasingly across the production pipeline and enhance circular news production, as well as quality at the point of delivery, feeding into distribution.
Gen AI will also have a significant impact on localization, with benefits for regional broadcasters, especially as automation lowers the cost of production for stories with smaller local audiences.