Mobile Codecs: The Battle Of The Codecs Continues But AI May Disrupt The Field

The continued proliferation of streaming video consumption on mobile devices, and especially smartphones, has boosted activity around mobile codecs, with new releases of VVC and AV2 slated over the next two years. At the same time, Generative AI is disrupting the whole codec field through spectacular gains in compression efficiency and quality enhancement.
Video codecs are increasingly geared to the needs of mobile devices, and especially smartphones, because these are where an ever-increasing proportion of viewing is taking place. These devices have smaller screens and are more constrained in computing power as well as storage, while also being served by cellular networks with generally lower bandwidth than fixed-line broadband infrastructures. On top of that, many mobile services still impose data caps or at least some constraints over total usage, which increases demand for the highest levels of compression.
This comes at a time when the whole codec field, including pre- and post-compression enhancement, is being disrupted by neural network-based AI, which is improving both potential encoding efficiency and quality enhancement during or after the decode stage. Generative AI is now blurring the line between the creation, processing and compression of video, as well as audio, with its ability to correct or remove undesirable artefacts and also introduce qualitative features such as fine textures that may not even have been captured by the camera in the first place.
AI is also entering domains previously addressed by what might be called conventional algorithms related to compression. One example is in Film Grain Synthesis (FGS), which has been incorporated in some video codecs, most notably AV1, to eliminate the grain during encoding.
This is done because film grain, as found in a lot of traditional movie content, is notoriously difficult to compress owing to its random nature, and therefore tends to be lost or at best distorted in the final decoded output for playback. FGS instead converts the grain into compact parameters that consume far less bandwidth when transmitted alongside the encoded video, allowing the decoder to reconstruct the grain during playback.
Now AI techniques for FGS are being evaluated in trials, with superior results in some cases. Although not directly relevant for mobile codecs, this approach of converting artefacts or objects into parameters is emerging as an alternative, employing AI algorithms to achieve higher levels of compression.
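To make the parameter idea concrete, here is a minimal Python sketch of grain resynthesis on the decode side. It is purely illustrative: the seed, strength and grain_size parameters are hypothetical stand-ins, and a real implementation such as AV1's film grain tool uses an autoregressive grain model with per-intensity scaling rather than simple scaled noise.

import numpy as np

def synthesize_grain(decoded_frame: np.ndarray, seed: int,
                     strength: float, grain_size: int = 2) -> np.ndarray:
    """Regenerate pseudo-random grain from a few transmitted parameters
    and blend it into a decoded greyscale frame (illustrative only)."""
    h, w = decoded_frame.shape
    rng = np.random.default_rng(seed)      # same seed -> identical grain
    nh, nw = -(-h // grain_size), -(-w // grain_size)   # ceiling division
    coarse = rng.standard_normal((nh, nw))              # low-res noise
    # Upscale the noise so the grain forms small clumps, not lone pixels
    grain = np.kron(coarse, np.ones((grain_size, grain_size)))[:h, :w]
    out = decoded_frame.astype(float) + strength * grain
    return np.clip(out, 0, 255).astype(np.uint8)

The point is that a seed and a handful of scaling values travel in the bitstream instead of the incompressible grain itself.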
While for now mobile devices are mostly being served by traditional codecs, their specific requirements are increasingly having an impact on choices between the various options, and on priorities for ongoing R&D. The mobile dimension is to some extent being aligned with the exigencies of internet streaming more generally, which has been exerting a growing influence on codec development and deployment for well over a decade now. This has been directed by commercial and competitive factors related to royalties and patents, as much as by technical factors.
MPEG standards dominated the codec world before streaming, with relatively leisurely progression between standards at around 10-year intervals, comparable in fact to the pace of cellular generations. MPEG-2 was replaced by MPEG-4/H.264 and then H.265/HEVC, all following a predictable path of technical innovation dictated in large part by advances in available computational resources. HEVC, for example, extended the ideas embodied in H.264/MPEG-4 AVC (Advanced Video Coding), in essence encoding by comparing different parts of a frame of video to find redundant areas both within single frames (intra-frame) and between consecutive frames (inter-frame).
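The inter-frame half of that idea can be illustrated with a toy block-matching search, the kernel of motion estimation. This is a simplified sketch rather than any codec's actual algorithm; real encoders use fast search patterns, many block sizes and sub-pixel precision.

import numpy as np

def best_match(prev: np.ndarray, block: np.ndarray,
               by: int, bx: int, search: int = 8):
    """Find the offset in the previous frame that best predicts a square
    block at (by, bx) in the current frame, by exhaustive SAD search."""
    bs = block.shape[0]
    h, w = prev.shape
    best_mv, best_sad = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = by + dy, bx + dx
            if 0 <= y <= h - bs and 0 <= x <= w - bs:
                sad = int(np.abs(prev[y:y + bs, x:x + bs].astype(int)
                                 - block.astype(int)).sum())
                if sad < best_sad:
                    best_mv, best_sad = (dy, dx), sad
    # The encoder transmits only the motion vector plus a small residual,
    # instead of the block's raw pixels.
    return best_mv, best_sad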
But then the big tech companies that grew up on the back of the internet came together to challenge MPEG’s hegemony, motivated partly by unwillingness to go on paying royalties levied by the patent holders. Google developed its VP9 codec, initially primarily for YouTube, releasing it in June 2013.
The streaming community then coalesced around VP9 and formed the Alliance for Open Media (AOMedia) to develop a successor called AOMedia Video 1 (AV1). This was developed using open web standard development methods to create an alternative to MPEG’s HEVC, which in turn was emerging at the time as a successor to H.264.
AV1 was released in 2018 and used for transmitting higher quality video over the internet by some of the AOMedia consortium members, which included Apple, Meta (Facebook), Google, Microsoft, Amazon and Netflix. Support for AV1 was added in relevant platforms, with Amazon Web Services (AWS) for example bringing it into its Elemental MediaConvert file-based video encoding service in 2020. Then Netflix announced it was supporting AV1 for subscribers with various leading smart TVs in November 2021.
Yet AV1 has been held back in the mobile domain by lack of urgency on the part of device makers, which were initially more enthusiastic about HEVC, despite some of them hedging their bets by subscribing to AOMedia. Most notably, Apple only added AV1 decode support for iPhones in June 2024. It came earlier onto some Android phones but was still lagging behind smart TVs from the likes of Samsung.
Similarly, leading browsers have been tardy, with Apple’s Safari supporting AV1 only from September 2023, and even then at first just for devices with AV1 hardware decode already built in. There was no software-only decode to run on device CPUs, which might admittedly have been a deliberate omission given that software decoding consumes valuable resources on smartphone SoCs, depleting battery life and generating more heat.
The upshot of these delays is that even by the end of 2024, barely over 10% of all smartphones in circulation had hardware-supported AV1 decode, which is necessary for efficient operation of the codec on these devices. HEVC was and still is the predominant codec for smartphones.
However, the tide is turning towards AV1 on smartphones, partly because on most benchmarks it is more efficient than HEVC and therefore generates encoded video at a lower bit rate for a given quality, which is especially important for mobile streaming. AV1 compresses around 30% more than HEVC on average across multiple tests, which equates to, say, one hour of encoded 4K video generating 7 GB of data rather than 10 GB.
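The arithmetic behind that comparison is simple enough to check. A quick sketch, assuming decimal gigabytes and a one-hour clip:

def avg_bitrate_mbps(gigabytes: float, hours: float = 1.0) -> float:
    """Average bitrate implied by a file size: 1 GB = 8,000 megabits."""
    return gigabytes * 8_000 / (hours * 3600)

hevc = avg_bitrate_mbps(10)   # ~22.2 Mbps for the HEVC encode
av1 = avg_bitrate_mbps(7)     # ~15.6 Mbps for the AV1 encode
print(f"HEVC {hevc:.1f} Mbps, AV1 {av1:.1f} Mbps, "
      f"saving {100 * (1 - av1 / hevc):.0f}%")   # saving 30%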
Yet as ever in the codec wars advantages are only temporary, and the MPEG camp has VVC (Versatile Video Coding), otherwise known as H.266, in the wings as the successor to HEVC/H.265. VVC generally outscores AV1 and is around 33% more efficient than HEVC at HD resolution, according to tests conducted by the BBC, and by a greater margin, up to 50%, for content delivered to mobiles at lower resolutions in some other benchmarks.
The AOMedia camp though has its riposte with the AV2 codec, which looks like it will perform better still and is already well on the way towards deployment. As a result, there are signals that AV2 may provide further fuel for the swing towards this family of codecs over the rest of this decade. There is a lot of talk about AV2 in the Apple Developer community at the moment.
The finalized AV2 specification is likely to arrive in 2026, followed by software decoders the year after and hardware decoders in 2028. But VVC hardware decoders are likely to be available in 2026, giving that codec a window of opportunity. What looks certain is that several codecs will be in circulation on mobile devices for several years at least. There will also be several LCEVC (Low Complexity Enhancement Video Coding) options as boosters to the regular codecs.
With its focus on working alongside existing codecs rather than replacing them, LCEVC is itself a good fit for further enhancement through machine learning. Indeed, there have been several research papers describing AI-infused improvements to LCEVC deployments, where the algorithms determine optimal encoding parameters within frames, and potentially in the time dimension between frames as well. One benefit seems to be smoother playback, including on mobile devices, given that the major grunt work occurs on the encoding side.
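For illustration, here is a conceptual sketch of the enhancement-layer split that LCEVC builds on, not the MPEG-5 Part 2 syntax itself. The base_encode and base_decode callables are hypothetical stand-ins for any existing codec, and the nearest-neighbour scaling is a crude placeholder for LCEVC's actual filters.

import numpy as np

def enhancement_layer_encode(frame, base_encode, base_decode, scale=2):
    """LCEVC-style split: the base codec carries a downscaled picture,
    while a thin residual layer restores the detail lost in scaling."""
    h, w = frame.shape
    small = frame[::scale, ::scale]                 # crude downscale
    base_bits = base_encode(small)                  # any existing codec
    up = np.kron(base_decode(base_bits).astype(float),
                 np.ones((scale, scale)))[:h, :w]   # crude upscale
    residual = frame.astype(float) - up             # sparse, cheap to code
    return base_bits, residual

def enhancement_layer_decode(base_bits, residual, base_decode, scale=2):
    up = np.kron(base_decode(base_bits).astype(float),
                 np.ones((scale, scale)))
    up = up[:residual.shape[0], :residual.shape[1]]
    return np.clip(up + residual, 0, 255).astype(np.uint8)

Because the heavy lifting stays in the base codec and the residual layer is light, the decode-side cost is modest, which is precisely what makes the scheme attractive on mobile devices.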
It is unclear yet how quickly AI will encroach directly on primary codecs. While there have been advances in the currently evolving codecs, such as VVC and AV2, most of the algorithms are still conventional in the sense that machine learning is not involved.
But whether AI is employed inside, or alongside, existing codecs is rather immaterial, certainly from the viewer’s stance. In either case the role of AI in a more generative capacity raises questions over assessment of quality in the finished playback after decoding. Without accurate and reproducible quality measurement it is impossible to benchmark codecs meaningfully to assess how they compare over efficiency at a given quality.
Several metrics are available for video quality measurement and are used to varying extents, notably PSNR (Peak Signal to Noise Ratio), SSIM (Structural Similarity Index Measure), and VMAF (Video Multimethod Assessment Fusion).
Of these, PSNR is too crude a surrogate to correlate well with user perception, which led to SSIM being developed and launched in 2004 with its ability to assess structural features within frames rather than just individual pixels. But this still fell well short as content catalogues expanded and individual metrics such as SSIM proved inaccurate in a growing number of cases. Netflix, as the early front runner in streaming, developed VMAF for deployment from 2016, combining four elementary metrics to enable better all-round quality assessment accuracy.
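PSNR illustrates why pixel-level metrics fall short: it reduces quality to a single mean-squared-error term, PSNR = 10·log10(MAX²/MSE), blind to whether the errors disturb structure the eye actually cares about. A minimal sketch for 8-bit frames:

import numpy as np

def psnr(reference: np.ndarray, distorted: np.ndarray) -> float:
    """Peak Signal to Noise Ratio in dB for 8-bit frames."""
    mse = np.mean((reference.astype(float) - distorted.astype(float)) ** 2)
    if mse == 0:
        return float("inf")          # identical frames
    return 10 * np.log10(255.0 ** 2 / mse)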
But even VMAF is proving ill equipped for the Gen AI era of content creation, especially for assessing the output of codec decodes. Contemporary reference metrics such as VMAF work by comparing the output against the input, typically the original unencoded video. This was a vast improvement over earlier methods but is inadequate for the generative era, because new details can be introduced and the output may improve on the original in various ways, such as by upscaling resolution, removing artefacts, or indeed adding simulated film grain.
Traditional quality metrics regard all such interventions as degradations because they depart from the original, so new tools are needed. This is leading to a new generation of no-reference metrics that attempt to take subjective quality assessment to a higher level. Inevitably these are now employing machine learning to make more accurate predictions of how human viewers would perceive the output.
At first, traditional ML algorithms were tried, involving supervised training where the system has to be told what features to look for in the video when assessing quality. This, as would be expected, yields accurate assessments at a broad level but fails to pick up fine-grained features, which is a serious handicap for more subtle AI-engineered content.
More advanced deep learning techniques allow the models to extract features automatically, and train to assess quality across a range of scales and nuances, but this is computationally more intensive. It can also lead to overfitting, which in this context means being too sensitive to small anomalies in the target video, resulting in quality being underestimated.
The best results seem to be obtained by hybrid models combining traditional ML with deep neural network-based approaches that extract some raw features from the pixel data but add some hand-crafted ones relating to properties of whole bit streams and the encoding resolutions.
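A minimal sketch of what such a hybrid might look like, with every number randomly generated as a placeholder: a deep embedding (faked here with random values standing in for a CNN's output) is concatenated with hand-crafted bitstream features before a conventional regressor predicts a subjective score such as a mean opinion score.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n_clips = 200
deep = rng.standard_normal((n_clips, 64))        # stand-in CNN embedding
handcrafted = np.column_stack([
    rng.uniform(0.5, 20, n_clips),               # bitrate in Mbps
    rng.choice([720, 1080, 2160], n_clips),      # encoding resolution
    rng.uniform(0, 1, n_clips),                  # e.g. share of intra blocks
])
X = np.hstack([deep, handcrafted])               # hybrid feature vector
mos = rng.uniform(1, 5, n_clips)                 # placeholder opinion scores

model = GradientBoostingRegressor().fit(X, mos)  # conventional ML on top
print(model.predict(X[:3]))                      # predicted quality scores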
The same considerations that apply to video quality assessment are also relevant for codecs as a whole. There is a lot of ongoing research into what are sometimes called neural codecs, and specifically in making them work with sufficient efficiency for decoding on mobile devices.
Furthermore, with growing use of smartphones in the field for professional capture as well as playback, there has been development of codecs capable of combining the required quality with the efficiency to run on smartphones. This has become possible because of the phenomenal increases in both smartphone memory and SoC (System on Chip) power since their breakthrough onto the consumer scene with the first iPhone in 2007.
Apple’s ProRes now runs on recent iPhones, while Samsung’s Advanced Professional Video (APV) codec, launched in October 2023, is available on the very latest Galaxy S25 Ultra unveiled in January 2025. These are essentially intermediate codecs, intended for use during video editing rather than end-user playback.
This is enabled by using only intra-frame compression, so that each frame can be stored independently and decoded without depending on other frames. This confers satisfactory performance and quality in post-production applications where random access to frames may be required, while still being much less memory and compute intensive than working with completely uncompressed video.
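A back-of-envelope sketch shows why intra-only coding suits editing. With a long GOP, displaying an arbitrary frame means decoding everything since the previous keyframe; the model below is simplified, ignoring B-frame reordering and assuming each non-key frame references only its predecessor.

def frames_to_decode(target: int, keyframe_interval: int) -> int:
    """Number of frames that must be decoded to display frame `target`."""
    return target % keyframe_interval + 1

print(frames_to_decode(100, 1))    # intra-only (ProRes, APV style): 1
print(frames_to_decode(100, 60))   # 60-frame GOP: 41 decodes for one frame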
While ProRes was initially developed for larger devices, Samsung’s APV is an example of a codec designed with smartphones in mind. It may be the first of many.