Video Quality: Part 2 - Streaming Video Quality Progress
We continue our mini-series about Video Quality with a discussion of the challenges of streaming video quality. Despite vast improvements, the continued proliferation of video streaming, coupled with ever rising consumer expectations, means that meeting quality demands has become an ongoing arms race.
OTT streaming has dominated the video quality world for over a decade now, with the emphasis switching to live services where there is far less scope for content preparation before delivery. Subscription VoD providers, led by Netflix, made the early running, but have since been joined by broadcasters, social media giants and other service providers also interested in the live dimension.
Quality issues specific to mobile services have figured more prominently given increasing consumption of content on the go, particularly on smartphones. User Generated Content (UGC) has also risen up the agenda as even traditional broadcasters increasingly rely on it for breaking news and for enriching remote coverage of sporting events, music festivals and other live occasions.
Although the fundamentals of video quality have not changed much, there have been some important additions, especially for live. These include stream latency, which must be kept in check for live sports coverage especially, to prevent people in close proximity in public places, for example, from witnessing events such as goals at different times. There is also the issue of metadata generation, which must be performed accurately to ensure content is played out correctly on all target devices.
There have also been changes in how video quality is measured to reflect the advancing science of user perception, which was discussed in the introductory piece of this series. There have been significant changes in the measurement process itself, brought about by the expanding scope of streaming distribution, with video subject to quality impairments right across the pipeline.
It begins with content capture, preparation, and encoding, through the origin servers and out across long distance transport links, CDN edge networks, the last mile over an ISP network typically, and finally into the user’s viewing device. There may also be a hop over WiFi in the home, or alternatively a cellular network. These last two may be concatenated when the last mile is a Fixed Wireless Access cellular connection, feeding a home WiFi router.
A good starting point is to consider the technical surrogates of perceived video quality, that is resolution, latency, color accuracy, motion smoothness, and consistency of all of these. Jitter, usually taken as variation in latency, impairs quality and so tends to be smoothed out by buffering. But that comes at the expense of extra start-up delay and increased lag behind live, so other methods also have to be employed.
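As a rough illustration of that trade-off, the sketch below sizes a playback buffer from recently observed segment delivery delays: the more jitter the buffer absorbs, the more start-up delay and live lag it adds. The safety factor, minimum depth and example figures are assumptions for illustration, not values any particular player uses.

```python
import statistics

def target_buffer_seconds(arrival_delays, safety_factor=3.0, minimum=0.5):
    """Pick a playback buffer depth that absorbs observed jitter.

    arrival_delays: recent per-segment delivery delays in seconds.
    A deeper buffer hides more jitter, but every extra second also
    adds start-up delay and lag behind the live edge.
    """
    mean = statistics.mean(arrival_delays)
    jitter = statistics.pstdev(arrival_delays)   # variation in latency
    return max(minimum, mean + safety_factor * jitter)

# Delays hovering around 0.6 s with roughly 0.2 s of jitter
print(target_buffer_seconds([0.4, 0.7, 0.5, 0.9, 0.45]))   # ~1.15 s of buffer
```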
Variation in resolution also affects quality, which is what motivated the development of adaptive bit rate streaming (ABRS) protocols to ensure users obtain the best quality possible at a given time, subject to varying network conditions. But if the variation is too great, the dips in resolution during the worst network conditions impair the viewing experience themselves.
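A minimal sketch of the kind of rendition selection an ABRS client performs: given a measured throughput, pick the highest rung of the bit rate ladder that fits within a safety margin. The ladder values and the 80% headroom figure here are illustrative assumptions, not any specific player's logic.

```python
# Hypothetical bit rate ladder in kilobits per second, highest first
LADDER_KBPS = [8000, 5000, 3000, 1500, 800]

def pick_rendition(measured_throughput_kbps, headroom=0.8):
    """Choose the highest rendition that fits within a safety margin of
    the currently measured throughput, falling back to the lowest rung
    when the network cannot sustain anything better."""
    budget = measured_throughput_kbps * headroom
    for bitrate in LADDER_KBPS:
        if bitrate <= budget:
            return bitrate
    return LADDER_KBPS[-1]

print(pick_rendition(4200))   # -> 3000, the highest rung under 4200 * 0.8
```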
Motion smoothness is governed by frame rate, and that becomes a major factor when viewing fast moving action on big screens. Such action can look noticeably jerky at rates such as 30 fps, which are fine for most cases, such as viewing sports on small screens or talking heads on any screen.
Remedial measures for deficiencies in any of these viewing parameters then come under the headings of bit rate, latency, jitter, error correction, frame rate and streaming protocols. These in turn have to take account of encoding, which reduces resolution and can introduce visual artefacts that degrade the experience in the event of over compression.
There is also the audio side of the equation, which is sometimes overlooked in superficial quality discussions at least, and which has become more critical in the streaming era as audio quality has increased.
Synchronization between audio and video can still be an issue, although less so than in the past as it is quite well understood. Yet, loudness control remains a challenge despite being well understood, largely because it has to encompass multiple sources within a single audio/video stream.
Loudness can change significantly between regular programming, podcasts, music streams, ads, and User Generated Content, all of which can be monitored, but it remains difficult to translate that monitoring into uniform sound that does not jar on the listener’s ear during content switches. Regular users of YouTube and social media platforms including Facebook are well aware that audio volume can vary considerably between, or sometimes even within, items.
Increasing quality has exacerbated the audio challenge. All the big streamers now provide a growing amount of content in 4K, HDR, and Dolby Vision, with object-based sound coming in. Modulating loudness may also have to go hand in hand with maintaining dynamic range and balance between sound sources.
There is also a continuing need to adhere to regulations such as the USA’s CALM Act, which stipulates that ads must be played at the same average loudness as the programs in which they are embedded. We can be sure then that audio is going to stay high up the quality agenda.
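By way of a worked example, the sketch below computes the gain needed to bring a measured integrated loudness onto the -24 LKFS target of ATSC A/85, the recommended practice the CALM Act references. The helper name and example figures are illustrative; a real workflow would measure loudness per ITU-R BS.1770 and also respect true-peak limits before applying any gain.

```python
TARGET_LKFS = -24.0   # ATSC A/85 target loudness referenced by the CALM Act

def normalisation_gain_db(measured_lkfs, target=TARGET_LKFS):
    """Gain in dB to apply so a segment's integrated loudness lands on
    the target.  Measurement itself would follow ITU-R BS.1770; a real
    implementation would also check true-peak headroom first."""
    return target - measured_lkfs

# An ad measured at -17 LKFS needs to be pulled down by 7 dB
print(normalisation_gain_db(-17.0))   # -> -7.0
```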
Ads bring quality issues of their own, especially with live content when they are stitched into programs on the fly. Apart from the loudness issue, Dynamic Ad Insertion (DAI) risks introducing buffering and video artefacts during the transition if not done properly.
For such reasons Server-Side Ad Insertion (SSAI) is sometimes preferred, enabling ads to be incorporated into the content stream effectively seamlessly and with full control over loudness. SSAI also reduces the scope for ad blocking, but at the cost of losing knowledge of where the ads are being played and the ability to insert topical ads at short notice. There is also less scope for local variation in ad selection.
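One common way SSAI is realized with HLS is to stitch ad segments into the media playlist on the server, marking the splice points with EXT-X-DISCONTINUITY so players reset their decoders cleanly across the transition. The sketch below is a simplified illustration under that assumption; the segment names, durations and break position are invented for the example.

```python
def stitch_playlist(content, ads, break_after=1, target_duration=6):
    """Build a single HLS media playlist with an ad break stitched in
    server side.  EXT-X-DISCONTINUITY tells the player that timestamps
    or encoding parameters may change at each splice point."""
    lines = ["#EXTM3U", "#EXT-X-VERSION:3",
             f"#EXT-X-TARGETDURATION:{target_duration}"]
    for duration, uri in content[:break_after]:
        lines += [f"#EXTINF:{duration:.3f},", uri]
    lines.append("#EXT-X-DISCONTINUITY")          # splice into the ad break
    for duration, uri in ads:
        lines += [f"#EXTINF:{duration:.3f},", uri]
    lines.append("#EXT-X-DISCONTINUITY")          # splice back to the programme
    for duration, uri in content[break_after:]:
        lines += [f"#EXTINF:{duration:.3f},", uri]
    return "\n".join(lines)

print(stitch_playlist([(6.0, "show_001.ts"), (6.0, "show_002.ts")],
                      [(6.0, "ad_break_001.ts")]))
```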
All forms of streaming differ from transmission over the air or over fixed cable networks in the lack of visibility and control across the distribution domain from the head end onwards. This has two immediate implications: firstly, it becomes imperative to optimize quality in the domain where the content owner does have control, which is production.
Secondly, monitoring has to be performed at all critical points across the distribution domain so that the locations of problems can be pinpointed quickly for troubleshooting and remedial action. This is just as important for live content because in practice it is not always possible to resolve issues immediately in real time. It is then vital to address issues as quickly as possible after playout, ideally within minutes, so that the impact on viewing experience, and therefore on the business, is minimized.
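A toy illustration of that pinpointing: compare quality scores reported by probes at successive points in the chain and flag the first hop where the score falls sharply, which tells the operator which domain to troubleshoot. The probe names, scores and tolerance here are hypothetical.

```python
# Hypothetical quality scores (0-100) reported by probes monitoring the
# same stream at successive points in the delivery chain.
probe_scores = {
    "encoder_output": 92,
    "origin":         91,
    "cdn_edge":       90,
    "isp_last_mile":  71,   # sharp drop appears here
    "player":         70,
}

def first_degraded_hop(scores, tolerance=5):
    """Walk the chain in delivery order and report the first hop where
    quality falls by more than the tolerance."""
    points = list(scores.items())
    for (prev_name, prev), (name, score) in zip(points, points[1:]):
        if prev - score > tolerance:
            return f"degradation between {prev_name} and {name}"
    return "no significant degradation detected"

print(first_degraded_hop(probe_scores))
```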
At the same time streaming video workflows are being migrated to the cloud, which means control of the network becomes more centralized. Analytic functions can be executed on powerful yet competitively priced COTS (Commercial Off The Shelf) servers, enabling faster feedback leading towards real time control. This cause is being assisted by the incorporation of Machine Learning (ML) algorithms that can pinpoint remedial actions more quickly and enable greater automation of quality control.
While a lot of that is yet to come, there is the prospect of live streaming services being able to adapt to at least some quality issues in real time, so that they can be mitigated before users realize there is a problem. This could mean reducing bit rate temporarily the moment congestion would otherwise cause stuttering.
While ABRS is supposed to cater for varying network conditions, devices cannot always adjust quickly enough without the help of ML-based tools at the monitoring coal face.
ML also brings the prospect of content aware quality adjustment. Given a network of finite capacity, there is not enough bandwidth for all video services to enjoy the optimal bit rate that would ensure the best possible viewing experience.
It is now becoming possible to identify in real time those sections of a video that need higher bit rates, when action is fast moving or detail is high with a lot of color variation. These sections can then be allocated a higher bit rate temporarily, with the rate lowered during frames with little contrast or movement.
Netflix was among the pioneers of per-title encoding, designed to match compression settings to the characteristics of each piece of content, maximizing quality for each of the different bit rate variants of that content. This is now being extended to live content, where encoding speed is an additional constraint alongside bit rate.
CPU power is then the bottleneck, so the objective becomes to optimize execution of the encoding. Less effort is devoted to frames with little movement or variation, freeing CPU power to focus on the segments of video that are more critical from a quality perspective.
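The sketch below illustrates the underlying idea with an assumed per-segment complexity score, for instance derived from motion and texture in an analysis pass: segments are given bit rates in proportion to complexity around a fixed average, then clamped to a floor and ceiling. The scores, budgets and limits are illustrative and not any vendor's actual algorithm.

```python
def allocate_bitrates(complexities, average_kbps=4000,
                      floor_kbps=1500, ceiling_kbps=8000):
    """Distribute bit rate across segments in proportion to a simple
    complexity score, clamped to sensible floor and ceiling values.
    Clamping means the realized average may drift slightly from the
    nominal budget; a fuller implementation would redistribute that."""
    mean_c = sum(complexities) / len(complexities)
    return [min(ceiling_kbps, max(floor_kbps, average_kbps * c / mean_c))
            for c in complexities]

# A static interview followed by a fast-moving goal-mouth scramble
print(allocate_bitrates([0.3, 0.4, 1.6, 1.7]))
# -> [1500, 1600.0, 6400.0, 6800.0]
```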
The growth in streaming over cellular networks is also a major factor affecting viewing quality. This is especially the case for popular live events, where demand can overwhelm capacity when the streams are delivered unicast on a one-to-one basis. 5G Multicast/Broadcast has emerged as a solution to this problem by converting delivery to multicast or broadcast mode as demand increases. This is discussed in the recent Broadcast Bridge series on 5G broadcast and multicast.
With users sometimes subject to limits on streaming data, or costs when these are breached, some service providers are now offering customers the option to choose between different quality levels. The NBC Universal Peacock streaming service available in the USA has done this by offering four options. The first is the default, where Peacock adjusts quality purely on the basis of current internet speed and viewing device capabilities, which is normal procedure.
There is then a low setting, delivering minimum quality with the emphasis on saving data; a medium setting, balancing quality against data use; and a high setting, where the best quality possible given prevailing network conditions is provided.
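In practice such user settings typically act as a cap applied on top of normal ABRS adaptation, so the stream never exceeds either the user's chosen tier or what the network will sustain. The sketch below illustrates that logic; the tier names and bit rate caps are assumptions for illustration and not Peacock's actual values.

```python
# Illustrative caps; the real tiers and thresholds used by any given
# service are not public and will differ.
QUALITY_CAPS_KBPS = {
    "data_saver": 1500,    # "low": minimum quality, minimum data
    "balanced":   4000,    # "medium": trade quality against data use
    "best":       None,    # "high": no cap beyond network conditions
}

def effective_cap(user_setting, network_estimate_kbps):
    """Combine the user's chosen quality tier with the network estimate:
    the delivered bit rate never exceeds either limit."""
    cap = QUALITY_CAPS_KBPS.get(user_setting)
    return network_estimate_kbps if cap is None else min(cap, network_estimate_kbps)

print(effective_cap("balanced", 9000))   # -> 4000, the tier cap binds
```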
As more streaming companies, and indeed broadcasters, start allowing users to pick their own quality, it will become more important that they play their part in delivering the best possible experience for each of those options.