AI In The Content Lifecycle: Part 4 - Pushing The Content Distribution Boundaries
Generative AI is poised for another round of disruption across key aspects of media content distribution. These include recommendation, streaming, quality control during transmission, and video encoding.
Major impacts of Generative AI in video distribution are yet to come, but their scope has already been demonstrated through existing applications of neural network-based machine learning. There will be further improvements in the balance between video compression and viewing experience, as well as more blurring of the lines between previously distinct aspects of video workflow, such as content generation, personalization and video delivery.
Video distribution itself embraces several disciplines, all relating ultimately to consumption of content by users wherever they want to view it. This has increasingly become a two-way process in the streaming era, and indeed ever since the advent of VoD (Video on Demand), involving choice by the user as well as output according to published linear schedules. It therefore includes search and recommendation, as well as transmission. Quality control during delivery is critical, as is video encoding to minimize use of bandwidth for a given quality.
AI, meaning some variant of neural network-based machine learning, has been employed in all these facets of distribution, and Gen AI essentially represents an advance in capability and power. It goes beyond convergence, regression and pattern recognition to prediction and data simulation, which is bringing further advances in compression and quality control during transmission.
The increasing overlap between production and distribution is already evident in various pilot projects being conducted by broadcasters, for both personalization and rapid repurposing of near live content whose value diminishes quickly with age, notably news and sports. The BBC is trialling Gen AI for translation of breaking news content into multiple languages, so that it is available almost immediately. It is also evaluating Gen AI for reformatting live sports commentaries into text for its live sports pages, with the twin goals of doing this faster and making it more informative for those following the event that way, with better narrative control.
Some broadcasters are also looking at use of Gen AI based chatbots for content search, aiming to fulfil fuzzier and more indirect requests for movies rather than just obvious connections. In this vein Gen AI can be seen as the latest in a series of technologies to enter service for search and recommendation, recalling how it began with the collaborative filtering popularized by Amazon in the early days of ecommerce.
Collaborative filtering is based on the premise that two users who share various interests in one field are more likely than two randomly chosen people to have similar tastes in another. This would be applied on a fuzzy basis, recommending TV shows to people who shared a significant degree of interest with others already known to enjoy such content. The technique was limited by being little more than quantitatively statistical, taking no account of the varying strength of association between interests. In some cases, especially targeted advertising, just one area of overlap might be significant. Two individuals known to be running enthusiasts would be suitable targets for adverts for a new training shoe, for example, even if all their other tastes differed.
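To make the principle concrete, here is a minimal sketch of user-based collaborative filtering in Python, using cosine similarity over a tiny made-up ratings matrix. The names, shows and scores are purely illustrative; production systems work over millions of sparse interaction records and far more sophisticated models.

```python
# Minimal user-based collaborative filtering sketch with illustrative data.
import numpy as np

users = ["alice", "bob", "carol"]
shows = ["crime_drama", "nature_doc", "sitcom", "sports"]

# Rows are users, columns are shows; 0 means "not yet watched/rated".
ratings = np.array([
    [5.0, 0.0, 3.0, 1.0],   # alice
    [4.0, 0.0, 3.0, 1.0],   # bob
    [1.0, 5.0, 0.0, 4.0],   # carol
])

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def recommend(user_idx, top_n=1):
    """Score unseen shows by the ratings of similar users, weighted by similarity."""
    sims = np.array([cosine(ratings[user_idx], ratings[j]) if j != user_idx else 0.0
                     for j in range(len(users))])
    scores = sims @ ratings / (sims.sum() + 1e-9)    # similarity-weighted average rating
    unseen = ratings[user_idx] == 0                  # only recommend shows not yet seen
    ranked = [(shows[i], scores[i]) for i in np.argsort(-scores) if unseen[i]]
    return ranked[:top_n]

print(recommend(0))   # the one show alice has not seen, scored from her neighbours' ratings
```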
Such factors were taken into account in subsequent recommendation systems, with machine learning later entering the fray for identifying strategies that work most effectively. US cable company and media conglomerate Comcast, owner of NBCUniversal (NBCU) as well as Sky, revealed in March 2024 that it was using Gen AI to analyze TV and social content across its vast content portfolio with the aim of associating items with the emotions and motivations of its audience. This would then be pitched at advertisers for targeting people most likely to be interested in certain products, according to the themes and storylines they were drawn to.
NBCU said it had developed 300 audience segments this way that allow marketers to target people down to the level of episodes within shows, giving “Law & Order” and “Dateline” as examples. Here NBCU was harnessing Gen AI to build on Sky’s AdSmart targeted advertising technology first released in 2013 before the UK-based group’s acquisition by Comcast for £30 billion, completed in 2018. The use of Gen AI has enabled much greater granularity and finer segmentation, with more intuitive connections between people and products.
But when it comes to the entertainment content itself, the recommendation system is constrained by the available information, contained in titles and associated metadata. Google has applied Gen AI at this level to generate text descriptions of content automatically, through analysis of the video and audio, when existing metadata is lacking or deficient.
Video transcoding is another big field for application of AI, again building on existing techniques to improve compression efficiency, quality, or both. AI has been employed at a high level for Quality Defined Variable Bitrate (QVBR). This can operate outside the compression domain, adjusting the bit rate to maintain a constant perceptual quality for a given codec. The idea is that user experience is sustained at the required level, without ever allocating more bandwidth than is needed.
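The control loop behind this idea can be illustrated with a short Python sketch. The quality model below is a made-up placeholder standing in for a real perceptual metric such as VMAF or a learned quality model; the point is the feedback loop that spends bits only when the quality target demands it.

```python
# Simplified QVBR-style control loop: adjust bitrate, within limits, to hold a quality target.
import math

def measure_quality(segment_complexity, bitrate_kbps):
    """Toy quality model (placeholder): quality rises with bitrate, falls with complexity."""
    return min(100.0, 20.0 * math.log2(bitrate_kbps / (100.0 * segment_complexity)))

def choose_bitrate(segment_complexity, target_quality=85.0,
                   min_kbps=800, max_kbps=8000, step_kbps=250, start_kbps=3000):
    """Walk the bitrate towards the lowest value that still meets the quality target."""
    bitrate = start_kbps
    for _ in range(40):                                   # bounded search per segment
        q = measure_quality(segment_complexity, bitrate)
        if q < target_quality and bitrate < max_kbps:
            bitrate = min(bitrate + step_kbps, max_kbps)  # short of target: spend more bits
        elif q > target_quality + 2.0 and bitrate > min_kbps:
            bitrate = max(bitrate - step_kbps, min_kbps)  # comfortably above target: save bandwidth
        else:
            break
    return bitrate

# A complex sports segment ends up with more bits than a near-static talking-head segment.
print(choose_bitrate(segment_complexity=2.0), choose_bitrate(segment_complexity=0.5))
```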
This concept of delivering just what is needed can also be applied inside the encoding process, with Gen AI more recently employed to home in on structural features within the video so that compression can be applied differentially, rather than at a constant level irrespective of the content being played out.
In the earlier days of video compression, well before the arrival of AI in its modern incarnation of machine learning, coding was reliant on analysis of source signals as surrogates of the actual visual content perceived by humans. This has led to use of PSNR (Peak Signal to Noise Ratio) for example, which has been quite successful but can only go so far without being able to exploit structural features of the video.
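For reference, PSNR is computed from the mean squared error between a reference frame and its degraded counterpart, as in this short example (assuming 8-bit video, so a peak value of 255):

```python
# Standard PSNR calculation between a reference frame and a degraded version.
import numpy as np

def psnr(reference, degraded, peak=255.0):
    mse = np.mean((reference.astype(np.float64) - degraded.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")                    # identical frames
    return 10.0 * np.log10(peak ** 2 / mse)

# Toy example with a random 8-bit "frame"; real use compares source and decoded video.
ref = np.random.randint(0, 256, (720, 1280), dtype=np.uint8)
deg = np.clip(ref + np.random.randint(-5, 6, ref.shape), 0, 255).astype(np.uint8)
print(f"PSNR: {psnr(ref, deg):.2f} dB")
```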
Then second generation coding methods arrived with the ability to take account of structural elements of the video and exploit the fact that not all frames are equal, nor all parts of each frame. This led to improvements in compression ratios for a given quality, with techniques such as geometric partitioning the subject of R&D over the last few years. Essentially these break down frames into smaller areas that can be subjected to different levels of compression. Some of these techniques have entered service in leading codecs such as MPEG-4, AVS2, HEVC, and VVC.
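The underlying idea of partitioning a frame and compressing its regions differentially can be illustrated with a deliberately simplified sketch. Real geometric partitioning in codecs such as VVC operates on non-rectangular partitions within the coding tree; the fixed grid and thresholds below are illustrative only.

```python
# Toy illustration of region-based, differential compression: flat blocks get a coarser
# quantization step than detailed ones. Not how any real codec partitions frames.
import numpy as np

def plan_block_quant(frame, block=64, coarse_q=40, fine_q=22):
    """Return a per-block quantization map: lower values mean more bits / better quality."""
    h, w = frame.shape
    qmap = {}
    for y in range(0, h, block):
        for x in range(0, w, block):
            region = frame[y:y + block, x:x + block].astype(np.float64)
            detail = region.std()                 # crude proxy for visual complexity
            qmap[(y, x)] = fine_q if detail > 20 else coarse_q
    return qmap

frame = np.random.randint(0, 256, (256, 256), dtype=np.uint8)
print(sorted(set(plan_block_quant(frame).values())))   # quantization levels actually used
```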
Such techniques can also enable multiple views to be encoded more efficiently. There is growing demand to transmit multiple camera angles of sporting events as single streams, and spatial encoding can capitalize on the similarities between these, given that they are after all of the same event. This has been done in MV-HEVC (Multiview HEVC), for example.
Gen AI can increase the capabilities of such techniques, converging towards the optimal combination of quality and compression. It can also build out from intra-frame analysis to inter-frame, taking account of structural temporal redundancy and the ability to reconstruct less significant frames at the point of destination without too much computation.
The idea then is that after calculating the importance of individual frames, some involving more substantial changes in content as, say, new objects such as people enter the scene, the most critical ones can be sent in full. Frames of intermediate importance might then be sent using advanced spatial or other coding methods to home in on the important objects, while the least critical frames might be omitted entirely. The latter would then be reconstructed by the client relatively easily.
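A conceptual sketch of that selective strategy might look like the following, scoring each frame by how much it differs from its predecessor and marking it for full encoding, reduced encoding, or omission. The thresholds are arbitrary illustrations, not values drawn from any real codec.

```python
# Classify frames by importance: "full" encode, "reduced" encode, or "skip" (client reconstructs).
import numpy as np

def classify_frames(frames, full_thresh=25.0, reduced_thresh=8.0):
    decisions = ["full"]                                   # always send the first frame in full
    for prev, cur in zip(frames, frames[1:]):
        change = np.mean(np.abs(cur.astype(np.float64) - prev.astype(np.float64)))
        if change > full_thresh:
            decisions.append("full")                       # big change, e.g. new object or scene cut
        elif change > reduced_thresh:
            decisions.append("reduced")                    # moderate change: cheaper encode
        else:
            decisions.append("skip")                       # near-static: client reconstructs it
    return decisions

frames = [np.random.randint(0, 256, (64, 64), dtype=np.uint8) for _ in range(5)]
print(classify_frames(frames))
```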
Generative Adversarial Networks (GANs), a variant of Gen AI, have shown promise for encoding because their two networks accelerate convergence by attempting to outwit each other. In this case one adversary would attempt to encode as efficiently as possible, while the other would try to detect whether the result could be distinguished from the target quality. GANs are likely to be employed for this selective frame encoding.
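A toy version of that adversarial setup, assuming PyTorch is available, might look like this: a generator reconstructs frames from a heavily compressed representation while a discriminator tries to tell reconstructions from originals. Network sizes, losses and data are placeholders rather than anything resembling a production codec.

```python
# Minimal adversarial "learned codec" sketch: generator compresses and reconstructs,
# discriminator judges whether a frame looks original. Illustrative placeholders only.
import torch
import torch.nn as nn

LATENT = 64          # size of the compressed representation per frame
FRAME = 32 * 32      # toy "frame" flattened to a vector

class Generator(nn.Module):        # encoder-decoder standing in for the learned codec
    def __init__(self):
        super().__init__()
        self.encode = nn.Sequential(nn.Linear(FRAME, LATENT), nn.ReLU())
        self.decode = nn.Sequential(nn.Linear(LATENT, FRAME), nn.Sigmoid())
    def forward(self, x):
        return self.decode(self.encode(x))

class Discriminator(nn.Module):    # outputs a logit: does this frame look like an original?
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(FRAME, 128), nn.ReLU(), nn.Linear(128, 1))
    def forward(self, x):
        return self.net(x)

gen, disc = Generator(), Discriminator()
opt_g = torch.optim.Adam(gen.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

frames = torch.rand(16, FRAME)     # stand-in batch of original frames

for step in range(100):
    # Discriminator: originals should score "real", reconstructions "fake".
    recon = gen(frames).detach()
    d_loss = bce(disc(frames), torch.ones(16, 1)) + bce(disc(recon), torch.zeros(16, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: fool the discriminator while staying close to the source frames.
    recon = gen(frames)
    g_loss = bce(disc(recon), torch.ones(16, 1)) + nn.functional.mse_loss(recon, frames)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```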
This approach involving some reconstruction at the client can be taken further, with some research now looking at recreating entire sequences of video at the client from textual descriptions, with scope for personalization. As this suggests, application of Gen AI in this context is blurring the lines between distribution, encoding and production.
But these are early days, for while video generation from text is a fertile and vibrant field, spawning numerous startups as well as drawing in established technology companies, there is a lot of R&D to be done just to generate videos of acceptable broadcast quality, never mind ones that faithfully create or recreate the desired content. It will though be a growing theme across the AV lifecycle over the next few years.