The Streaming Tsunami: Testing In Streaming Part 1: The Journey To Zero Bugs
As users migrate to streaming services, and the number of devices used in the delivery chain increases exponentially, so the need for and complexity of testing also grow. Here we discuss the challenges and the emerging technologies and best practices.
As streaming services add more users and serve larger audiences, especially for live events, the reliability of video playback and the reliability of the hundreds of microservices that support the Streaming App User Experience matter more than ever. So, what are Streamers and Media Tech companies doing to test so that everything will be alright on the night?
A Zero-Bug Policy
The UK’s Channel 4 (C4) has had a large-scale streaming service for over a decade. Each year it beats its streaming records from the year before, and we witness routine reports of streaming highs related to new programs or live events. March 2024 was reported as Channel 4’s biggest streaming month on record, while April 2024’s minutes viewed were up 22% compared to April 2023.
Like other national broadcaster streamers, C4 delivers more VOD content than live content when measured by volume of content viewed, but its biggest audiences measured by concurrent viewers are either for live content (e.g., sports) or for appointment-TV content (e.g., super-popular pre-recorded program releases). Their recently released Fast Forward strategy points very clearly at an all-digital future, with targets for 50% of total business revenue coming from digital and streaming services by 2030.
This heavyweight shift towards streaming growth has already led to stringent policies and new disciplines related to streaming service performance. One specific policy is a zero-bug policy, intended to eliminate any interference viewers could experience with the C4 streaming service.
Obviously, zero bugs is easier said than done, as any Streaming leader in Product, Development, or Operations will attest. The growing complexity of devices, platforms, networks, chipsets, and operating systems, plus the increasing sophistication of the services themselves as more user features and 3rd-party integrations are introduced, make for a heady cocktail for a classic “QA function” to manage.
While continuous improvement of streaming QoE is fundamental for any streaming service, the zero-bug policy is really a pre-production and pre-event policy. Pre-production is easy to identify, as it relates to new features or changes to the existing streaming service that must be seamlessly introduced; pre-event work, by contrast, presents a repetitive challenge of scale-testing. And notably, compared with traditional broadcast services, it is the flexible and changing nature of IP architectures and IT technologies, coupled with constantly growing audiences, that makes pre-event testing one of the most challenging and important testing processes.
The testing discipline therefore covers technical testing across a very wide set of different technical domains, plus customer testing and scale testing. The goal across all three areas is to achieve broadcast-grade performance – i.e., consistently high quality at scale. This performance level is one way broadcasters’ streaming services compete with their global competitors.
To strive for zero bugs, C4 uses a test-driven design approach. Operational testing at C4 focuses on load testing, penetration testing, device testing, and device-farm testing. As Declan Toman, C4’s Digital Service Delivery Manager, states, “We need to always take the viewer perspective, and this is most impacted by the devices people use and the way they interact with the user interface of our streaming service. These two aspects therefore dictate how we operate our testing processes.”
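Device-farm pipelines of the kind described above typically automate basic playback health checks before a human ever looks at a screen. As a minimal illustrative sketch (not C4’s actual tooling), one such step might validate an HLS master playlist and confirm every variant stream declares a bandwidth:

```python
import re

def validate_hls_master(manifest: str) -> list[int]:
    """Return the declared bandwidths of all variant streams in an
    HLS master playlist, raising ValueError if it looks malformed."""
    if not manifest.lstrip().startswith("#EXTM3U"):
        raise ValueError("not an HLS playlist: missing #EXTM3U header")
    bandwidths = [
        int(m.group(1))
        for m in re.finditer(r"#EXT-X-STREAM-INF:[^\n]*BANDWIDTH=(\d+)", manifest)
    ]
    if not bandwidths:
        raise ValueError("no variant streams found")
    return bandwidths

# A toy two-variant playlist, as a stand-in for a fetched manifest.
sample = """#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
low.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=3000000,RESOLUTION=1920x1080
high.m3u8
"""
print(validate_hls_master(sample))  # [800000, 3000000]
```

In a real pipeline this check would run against the live manifest for each device-under-test before deeper UI and playback assertions begin.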
Devices Dominate Technical Testing
While playout, encoding and origination are critical technical components, they are almost all under the broadcaster’s control – i.e., they are operating in highly controlled environments (even if in the public cloud) with daily repetitive usage on well-understood infrastructure.
On the other hand, shared platforms that deliver and display content and information to the viewers are the most challenging for broadcasters to validate and trust. These shared platforms include set-top-boxes, Android platforms, streaming sticks, and Smart TVs. In the UK, C4 now also has Freely, launched in April 2024, as a shared platform that needs to be validated.
The primary problem with shared platforms is the very different capabilities between chipsets used in the different platforms. This creates an inconsistency that, from a testing perspective, is challenging economically and logistically to manage, given the many thousands of devices in the market that broadcasters deliver to.
In the UK, the Digital Television Group (DTG) plays an important role in supporting broadcasters, service providers, and product manufacturers to navigate the complex digital technology ecosystem involved in TV and streaming media services. The DTG ZOO, understood to be the largest array of TVs in the world and representing over 95% of the UK’s free-to-air receiver market, is available as a testing resource. It is used for testing HbbTV Apps, FAST channels, and Players, along with traditional broadcast tests like HDMI, RF, Audio-Video Encoding, and power consumption.
The DTG ZOO supports interoperability testing and non-functional performance testing. Streamers and manufacturers can see how a User Interface works across a wide range of streaming devices and test across a wide range of technology stacks. Non-functional performance testing includes load testing and latency tests, and is often used to test the all-important transition from Program to Ad Break and back again.
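Latency tests of this sort usually report percentiles rather than averages, because the stalls that viewers notice hide in the tail of the distribution. A generic sketch (the sample values are invented) of a nearest-rank p95 over measured startup times:

```python
import math

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical startup-time measurements in milliseconds; one outlier
# dominates the p95 even though the mean looks healthy.
startup_ms = [480, 510, 495, 2100, 505, 490, 515, 500, 498, 502]
print(percentile(startup_ms, 95))  # 2100
print(percentile(startup_ms, 50))  # 500
```

The same calculation applies to ad-break transition times: a pass/fail threshold on p95 or p99 catches the intermittent failures a mean would mask.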
Ranjeet Kaur, Program Director at the DTG, shares, “For over 2 decades, Digital TV has been based on a relatively simple user experience for consumers and clear manufacturer guidelines to comply with certain standards (e.g., DVB-T). Test suites, hardware requirements, and usability testing have been reasonably straightforward. But things are much more complex with Streaming, to support different device types, operating systems, and hardware components. And this complexity is only increasing. Streamers must deploy their services operating system by operating system, and this creates a choice about device coverage for the service, which often leads to a roadmap per service to widen its coverage. Manufacturers do not have a single standard to meet, so they must work with different App providers to run in-depth testing programs to ensure their devices perform well in the market. Global streamers and large national streamers are refining requirements to arrive at a more standard approach, but still the situation is more complex than Digital TV was before. The best practice we observe in the market revolves around Streamers and Manufacturers proactively collaborating on a commercial basis to jointly improve streaming performance for audiences.”
The DTG ZOO is also available for hire to companies globally as a fully managed testing service that can include remote access to devices as required.
Supplier Changes Drive Risk-Assessed Testing
Streamers always need to stay across technology changes in their market. Suppliers routinely introduce changes to existing products and services or introduce completely new solutions that consumers or TV service providers use. This drives decisions about whether it is sufficient to run regression tests or a subset of specific testing processes, depending on the importance of the change. An important change would be a CDN API Gateway update, an Encoder update, or a major firmware update on a device like a Samsung Smart TV or Sony Playstation. That said, it is not always possible to test the Streaming service before the supplier update is introduced, and sometimes an emergency update must be released. Such is the world of using standardized IT services and IP-connected devices without strict standards.
And even in the scenario that change notices are proactively managed, it is not always possible to re-validate the service in the pre-production environment and be confident the Zero Bug policy has been met. It is often the case that problems are only discovered when the change is fully introduced into the Production environment and goes through its full real-world test. In short, strong relationships with suppliers are important to avoid significant impact from supplier changes.
When a Streamer like C4 is introducing a change to its service, some platforms, like Apple iOS, provide the option to phase releases across a subset of the user base. This helps identify issues before they impact the entire customer base. In essence, where the release cycle offers more control and allows risk to be closely managed, a Streamer is generally more willing to take risks in its new product introduction process.
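Phased releases of this kind are often implemented as a deterministic percentage gate: hash a stable device or user identifier into a bucket and enable the new build only for buckets below the current rollout threshold. A hedged sketch (the identifiers and thresholds are illustrative, not C4’s or Apple’s actual mechanism):

```python
import hashlib

def in_rollout(user_id: str, percent: int) -> bool:
    """Deterministically bucket a user into [0, 100) by hashing their ID,
    so the same user always sees the same build as the rollout widens."""
    digest = hashlib.sha256(user_id.encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent

# Widening the rollout from 10% to 50% keeps earlier cohorts enabled,
# so no user flips back to the old build mid-rollout.
users = ["user-%d" % i for i in range(1000)]
cohort_10 = {u for u in users if in_rollout(u, 10)}
cohort_50 = {u for u in users if in_rollout(u, 50)}
print(cohort_10 <= cohort_50)  # True
```

The monotone threshold is the key design choice: it guarantees a user who received the new build at 10% still has it at 50%, which keeps issue reports attributable to a single build.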
Apple iOS is considered to be a relatively easy platform to work with, even if it is estimated to run on about 2.2 billion devices worldwide. A broad range of automated tests and regression suites helps the Streamer build reasonable levels of confidence in their service changes. Android is considered more complex because it runs on such a wide range of devices from different manufacturers (e.g., Samsung, Google, Motorola, OnePlus) that are estimated to currently total about 3 billion devices. C4, as a reference point for a significant national broadcaster in a larger-than-average country, tries to test on a wide range of Android-based devices, but given their different chipsets, security levels, and hardware decoding capabilities, C4 faces significant complexity to validate a service in a pre-production setting.
Smart TV, or big-screen, testing is even more involved than Android testing due to the array of operating systems and screen formats. At C4, where the number of views on the big screen has been continuously growing and now represents more than 60% of total views, most of the testing work is now focused on big-screen devices.
In the streaming technology stack there are numerous types of other technical devices, such as servers (private and shared), routers, and home gateways. These devices are a minor part of normal testing processes when changes are introduced to the Streamer’s App. They become important only when major video processing and delivery infrastructure changes are introduced, such as new encoders, packagers, or CDNs. In these cases, a new packager can create compatibility issues or a new CDN can have problems with a subset of devices due to non-adherence to a spec from either party.
As Sarita Dharankar, a former QA Director at Snell Advanced Media (now part of Grass Valley) and more recently an AWS Solutions Architect, explains: “Many organizations have now moved to a software delivery model where there is no distinct QA team. Development teams augment the lack of dedicated QA team members with automated testing practices. The adoption of continuous integration and continuous delivery strategies, such as blue/green deployments, allows optimization opportunities to build resilient applications for scale, while enabling detection of application failures early in the development life cycle. Using application performance monitoring tools to gather metrics, logging, and telemetry information enables detection and alerting on unexpected system failures. Complementing the Observability function with purpose-built operational playbooks and runbooks enables timely remediation of application issues. This strategy minimizes unhealthy states of system availability, which is clearly critical given that seamless user experience is a key success criterion for streaming applications.”
Part 2 of this article will look at how customer-supported testing is evolving and explores the never-ending quest for “full confidence”.