The Streaming Tsunami: Testing In Streaming Part 2: The Quest For “Full Confidence”

Part 1 of this article explored the implementation of a Zero Bug policy for a successful Streamer like Channel 4 (C4) in the UK, and the priorities that the policy drives. In Part 2 we conclude by looking at how Streamers can move towards “full confidence” in their testing processes.


Customer Testing Remains Super-Valuable (Because Customers Do Unexpected Things)

Technical testing dominates the overall approach because it covers the majority of scenarios that need to be tested and can be automated, but customer testing remains a critical part of the testing framework.

In customer testing, test scripts are created for the top customer journeys such as arriving at the homepage, logging in, visiting a channel brand page, and searching for different types of content.
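
As a purely illustrative example (not C4’s actual test suite), a scripted customer journey such as searching for content might look like the following Playwright sketch; the URL, element roles, and test IDs are assumptions made for the example.

```typescript
import { test, expect } from '@playwright/test';

// Hypothetical scripted customer journey: homepage -> search -> results.
// The URL, roles and test IDs below are placeholders, not C4's real ones.
test('viewer can search for a programme from the homepage', async ({ page }) => {
  // Arrive at the homepage and confirm the hero rail has rendered.
  await page.goto('https://streaming.example.com/');
  await expect(page.getByTestId('hero-rail')).toBeVisible();

  // Open search and look for a title.
  await page.getByRole('button', { name: 'Search' }).click();
  await page.getByRole('searchbox').fill('documentary');
  await page.keyboard.press('Enter');

  // Expect at least one result tile to be shown.
  await expect(page.getByTestId('search-result').first()).toBeVisible();
});
```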

Human testers reliably find problems that test scripts miss, for example an incorrect aspect ratio for a particular image. More often than not, these human-found issues are subjective quality issues, but people also uncover unexpected bugs by taking sequences of steps that scripted tests aren’t set up for. For this reason, before releasing its App to millions of users, C4 undertakes crowd testing to get real-world feedback on the user experience of the App and User Interface.

Playback testing is a different situation. Although Playback is arguably the most important customer journey, there are three primary technical reasons why it is difficult to test at scale in a pre-production environment:

First, if DRM (digital rights management) is part of the delivery chain for a channel, then a real, authenticated device is required to run a Playback test.

Second, SSL (Secure Sockets Layer) encryption has made problems harder to diagnose, because “sniffing on the wire” is no longer straightforward under stricter security settings.

These first two security-related points highlight that any new layer of security in a streaming service makes load testing and penetration testing more difficult.

And third, it is hard to find all the qualitative Playback issues with the small sample of customers involved in crowd testing. To improve on this, C4 invested in a mechanized and automated method from Witbe for continuously watching and analyzing properly authenticated streams (which overcomes the security problem) across multiple device types. Anecdotally, it is proving effective at closing the gap on eliminating bugs in the pre-release stage.
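
To illustrate the general idea only (this is not Witbe’s method and is far simpler than device-based monitoring), a minimal automated playback availability check against an unauthenticated HLS stream might look like the sketch below; the URL is a placeholder, and DRM, decoding and picture-quality analysis are deliberately out of scope.

```typescript
// Minimal playback smoke check: fetch an HLS master playlist, pick the first
// variant and confirm its first media segment is reachable. This is far simpler
// than device-based monitoring (no DRM, no decode, no picture-quality analysis)
// and the URL is a placeholder, but it illustrates the kind of automated check
// that can run continuously against a stream. Requires Node 18+ for global fetch.
async function playbackSmokeCheck(masterUrl: string): Promise<void> {
  const master = await fetch(masterUrl);
  if (!master.ok) throw new Error(`Master playlist failed: ${master.status}`);
  const masterText = await master.text();

  // First variant playlist referenced in the master playlist (non-tag line).
  const variantPath = masterText
    .split('\n')
    .find((l) => l.trim() !== '' && !l.startsWith('#'));
  if (!variantPath) throw new Error('No variant playlists found');
  const variantUrl = new URL(variantPath.trim(), masterUrl).toString();

  const variant = await fetch(variantUrl);
  if (!variant.ok) throw new Error(`Variant playlist failed: ${variant.status}`);
  const variantText = await variant.text();

  // First media segment referenced in the variant playlist (non-tag line).
  const segmentPath = variantText
    .split('\n')
    .find((l) => l.trim() !== '' && !l.startsWith('#'));
  if (!segmentPath) throw new Error('No media segments found');
  const segment = await fetch(new URL(segmentPath.trim(), variantUrl).toString());
  if (!segment.ok) throw new Error(`Segment failed: ${segment.status}`);
}

playbackSmokeCheck('https://streaming.example.com/live/master.m3u8')
  .then(() => console.log('Playback smoke check passed'))
  .catch((err) => console.error('Playback smoke check failed:', err));
```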

As James Scanlan, Quality Assurance Manager at C4 explains: “C4 has quality assurance embedded end to end in its processes to ensure functional and non-functional aspects are addressed to give our viewers the best experience possible. Testing is generally performed automatically against real devices. The key challenge we need to focus on is due to the wide spectrum of physical devices our viewers can use, and especially the lifetime of those devices that is extending all the time. We find that a lot of defects only occur on older devices, which leads us to difficult decisions about how to support our viewers on out-of-support platforms.”

The Opportunity For Deeper & Faster Testing With GenAI

Automated technical testing and focused customer testing are the core domains of the Quality Assurance approach.

Generative Artificial Intelligence (GenAI) is now able to augment this approach in both domains. While it still needs to mature and be applied in the appropriate way for Media Streaming Applications, in general software development and testing terms GenAI already offers several important benefits that support the Zero Bug policy:

  1. Automation of Test Case Generation based on application code and user scenarios (see the sketch after this list).
  2. Enhanced Test Data Management by generating and managing synthetic test data.
  3. Democratization of Testing by enabling non-technical users to engage in the testing process.
  4. Reduction of Human Error by automating repetitive and error-prone tasks like test case creation and execution.
  5. Faster Testing Cycles by speeding up test case generation, execution, and feedback loops.
  6. Improved Defect Detection by more effectively identifying patterns and anomalies.
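
As a minimal sketch of the first benefit, the snippet below asks an LLM to draft test cases from a user story. It assumes the OpenAI Node SDK and an API key in the environment; any comparable model provider would work, and the user story, prompt and model name are examples only.

```typescript
import OpenAI from 'openai';

// Sketch of benefit 1: asking an LLM to draft test cases from a user story.
// Assumes the OpenAI Node SDK and an OPENAI_API_KEY in the environment; the
// user story, prompt and model name are illustrative examples.
const client = new OpenAI();

async function draftTestCases(userStory: string): Promise<string> {
  const response = await client.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [
      {
        role: 'system',
        content:
          'You are a QA engineer for a streaming app. Return a numbered list of ' +
          'concise test cases (title, steps, expected result) for the user story.',
      },
      { role: 'user', content: userStory },
    ],
  });
  return response.choices[0].message.content ?? '';
}

draftTestCases(
  'As a signed-in viewer, I can resume a programme from where I left off on another device.'
).then((cases) => console.log(cases));
```

Generated cases are only a starting point: they still need human review before being promoted into the automated suite, which is where the skills and oversight discussed next come in.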

Using GenAI in software testing requires skills and experience in both traditional software testing methods and in AI. Knowledge of existing software development lifecycles, testing methodologies, and the nuances of software systems is essential to successfully integrate AI tools into existing workflows, because AI tools must be configured and adapted to specific contexts. Any transition from manual or semi-automated testing processes to AI-enhanced automated testing needs expert oversight to avoid disrupting existing systems and to decide on the best use cases.

At the same time, there are limitations and ethical considerations that require human oversight. AI testing may struggle with types of testing that require a deep understanding of business logic or human behavior, such as watching a video. In addition, AI systems can sometimes make decisions that are opaque or based on biased data. Careful management is needed to avoid introducing new risks or errors into software that has been tested by automated, AI-enhanced processes.

As with any testing process, there is a test-observe-act-test loop. GenAI creates opportunities to accelerate these loops while generating deeper learnings, which ultimately continue to improve how we develop software for our Streaming Apps.

It’s Not Possible To Have Full Confidence From Testing (But We Can Get Close)

Achieving “Full Confidence” from Pre-Production Testing is the nirvana, but of course it is not possible. Pre-Production Testing simply cannot reproduce the full scale or the real-life conditions of the actual streaming event or the release of an upgraded service. Getting as close as possible is therefore the goal.

Pre-Event Testing, especially for big live events or the release of a highly popular new series that drives appointment-TV behavior, is a specific challenge. Given that streaming services keep breaking records for viewer numbers and bandwidth utilization, pre-event testing is often a step into the unknown. So how do Streamers make sure everything will be alright on the night?

A basic good practice is to use logs and learnings from previous events to help prepare for the next one. When the event is expected to be bigger than anything previously managed, it is important to work with a testing service provider on more comprehensive test simulations. But this is often limited to simulated load testing in the Pre-Production environment, focused on the interplay between the Streamer’s owned and operated services, rather than a full end-to-end Service Test across all parts of the delivery chain with real traffic. C4 Streaming’s specific experience from its pre-event and pre-production tests is that scalability issues relate mostly to devices, not networks. This focuses the team on the areas that matter most in pre-production and pre-event testing.
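
For illustration, a simulated load-test profile of the kind described above might be expressed as a k6 script like the sketch below; the ramp targets, durations and URL are placeholder assumptions, and a real profile would be shaped by logs and learnings from previous events.

```typescript
import http from 'k6/http';
import { check, sleep } from 'k6';

// Illustrative k6 load profile: ramp towards an assumed peak of 50,000 virtual
// users ahead of a live event, hold the peak, then ramp down. The numbers and
// URL are placeholders, not figures from any real event.
export const options = {
  stages: [
    { duration: '10m', target: 10000 }, // warm-up ramp
    { duration: '15m', target: 50000 }, // ramp to assumed event peak
    { duration: '20m', target: 50000 }, // hold at peak
    { duration: '5m', target: 0 },      // ramp down
  ],
};

export default function () {
  // Each virtual user repeatedly requests the event landing page.
  const res = http.get('https://streaming.example.com/event/landing');
  check(res, { 'status is 200': (r) => r.status === 200 });
  sleep(1);
}
```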

Timing then becomes a key point. If tests are run too early, too many supply chain conditions could change by the time of the main event. If tests are run too late, there may not be time to fix the issues the tests identify.

In effect, full testing really only happens in full Production mode, making every major streaming event a Live Experiment. We are therefore brought full circle back to the origins of Media & Entertainment, where a rehearsal before the live act was intended to make everyone as ready as possible to deliver the performance as intended, yet could never guarantee the live act would be flawless. Broadcasters that stream their content still feel this “live performance on the night” dynamic.

And just as Theaters and Broadcasters have done for many decades, because the live event is a one-time-only occasion, precautions are taken and operational readiness is set to its highest level. Even after all this effort, conditions on the internet change all the time: networks have outages, routes change, and other events consume capacity. Being prepared for the unexpected is the norm.

The current approach to world-class streaming readiness is nicely summed up by Declan Toman, C4 Streaming Delivery Manager: “We put every effort into delivering flawless performance for our viewers and advertising partners through Pre-Production and Pre-Event testing processes, but we know that we must be ready for quick-fixes in real-time because there is simply no replacement for real-world Production-scale load testing. Therefore, for major streaming events, even if we’ve successfully delivered at that scale before, we are on operational high-alert with Dev teams and suppliers ready to work on any identified incidents. Zero Bugs is a great policy to have, and we pursue that agenda relentlessly so that our service to viewers is as good as it can possibly be and we minimize demands on our operational teams and partners. But in parallel, to reach as close to broadcast-grade streaming as possible we also employ a Fix-Fast capability during live streaming coverage, which gets maximum focus from an Operational perspective.”
