Viewing the Web Browser as an Opportunity, Not a Threat

What if a video production could be tailored to each viewer, based on the transmitted audio and video essence and on data stored in the viewer’s browser? Suppose the browser could receive the content and, drawing on the viewer’s personal data, create an individualized version of the program?

In Southern England, when it snows, chaos reigns. Road transport grinds to a halt and all the local TV channels start broadcasting, on rolling text strap lines, the names of the hundreds of schools in the area that have shut for the day. TV needs to modernise; the tailored TV of the future would tell me that my family’s school has shut, not other people’s.

Tailored TV

Tailored TV supports creative innovation. The murderer in a crime drama opens Facebook to choose their next victim, and it’s your profile that they’re looking at; then your phone rings!

Tailored TV supports accessibility. By allowing a user’s profile to influence how the audio is mixed, viewers with hearing impairments can receive a tailored sound mix that reduces the impact of background noise or music that would otherwise obscure speech.

A key advantage of tailored TV is that each consumer receives content specially customized for them.

More commercially, tailored TV will allow targeted advertising: not only temporal advertising, where adverts interrupt the entertainment and annoy the viewer, but also spatial advertising, where blue screen techniques and alpha channel mattes are used to composite tailored adverts onto stadium billboards or the sporting field.

Our television distribution systems have historically required a destructive production process, in which all source essence is rendered flat into a linear stream, destroying the part/whole relationships and the ingredients used in the recipe that makes the production. A truly web-based video distribution system would no longer require these lossy processes, as the edge device would assemble all of the parts into a unified whole.

A browser-based interface

These innovations can only be achieved using a browser-based viewing device. The browser enables two things that support this specialisation: profiling – knowing who is watching – and rendering at the edge. The combination can support massive scale and distribution.

The first part of this is starting to happen, with large data centre computers rendering streams of footage for particular groups of viewers – tailoring the results through knowledge of who is viewing. But rendering in the data centre does not work at scale: each unique stream needs storing separately in the CDN, defeating the very caching on which scale depends.

Rendering at the edge is a prerequisite for the TV of the future, as this is the only way to allow millions of viewers to each have their own experience.

The edge device

The viewing device is likely to combine highly capable processors with an internet connection. However, it is more useful to discuss the edge platform: the browser that renders the essence the viewer is consuming.

A modern browser is quite capable of cutting together streams of video and rendering new pictures to a given recipe. Audio support is just as rich, so the idea of rendering at the edge becomes viable – even at great scale. Additionally, all ancillary streams, such as captions and perhaps even rights can be streamed to the browser.

A discussion of browser capabilities often starts with HTML & CSS, but much of the Internet is built using JavaScript. Of particular interest here are the Media Source Extensions (MSE). MSE allows web developers to request GOP’d payloads (often termed chunks) from a web server and present them to the browser’s codecs for decoding into uncompressed frames.
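
As a minimal sketch of the idea – assuming a server that exposes an initialisation segment and fragmented MP4 chunks at illustrative URLs – feeding MSE from JavaScript looks something like this:

    // Fetch GOP'd chunks and hand them to the browser's decoder via a
    // SourceBuffer. The URLs and codec string are assumptions for illustration.
    const video = document.querySelector('video');
    const mediaSource = new MediaSource();
    video.src = URL.createObjectURL(mediaSource);

    mediaSource.addEventListener('sourceopen', async () => {
      const buffer = mediaSource.addSourceBuffer('video/mp4; codecs="avc1.42E01E"');
      for (const url of ['/media/init.mp4', '/media/seg-001.m4s']) {
        const chunk = await (await fetch(url)).arrayBuffer();
        buffer.appendBuffer(chunk);
        // appendBuffer is asynchronous; wait for it to finish before the next append.
        await new Promise(resolve =>
          buffer.addEventListener('updateend', resolve, { once: true }));
      }
    });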

Figure 1. The MSE pipeline model has the potential for multiple source buffers to be spliced together.

MSE has a pipeline model, see Figure 1, that allows multiple segments of media to be spliced together with arbitrary alignments, supposedly supporting cuts between GOP’d media at any frame offset. Current MSE implementations refuse to instantiate more than one source buffer, so only linear play is directly supported; however, Google Chrome engineers have started discussing relaxing this constraint in future versions. For now, splicing arbitrary edits together is possible using MSE with some additional (quite tricky) JavaScript code.
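
A hedged sketch of what that tricky code does – splicing a segment from a second source onto the output timeline through a single source buffer – might use MSE’s timestampOffset and append window (the cut point, duration and URL here are hypothetical):

    // Shift the incoming segment to the cut point and trim it to the frames
    // the edit needs. Assumes the segment's own timestamps start at zero and
    // that the buffer is not currently updating.
    async function spliceSegment(buffer, url, cutPoint, wantedDuration) {
      const segment = await (await fetch(url)).arrayBuffer();
      buffer.timestampOffset = cutPoint;                   // place the source at the cut
      buffer.appendWindowStart = cutPoint;                 // drop frames before the cut
      buffer.appendWindowEnd = cutPoint + wantedDuration;  // and after the out-point
      buffer.appendBuffer(segment);
      await new Promise(resolve =>
        buffer.addEventListener('updateend', resolve, { once: true }));
    }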

Once the selected frames have been decompressed, one can access them in JS code to manipulate the content before displaying them to the user.
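
As a simple illustration – not a production renderer – a decoded frame can be copied to a canvas, manipulated pixel by pixel, and displayed:

    // Copy the current video frame to a canvas, touch its pixels in JS, and
    // show the result. A real compositor would do this per frame, ideally on
    // the GPU rather than the CPU.
    const canvas = document.querySelector('canvas');
    const ctx = canvas.getContext('2d');

    function renderFrame(video) {
      ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
      const frame = ctx.getImageData(0, 0, canvas.width, canvas.height);
      for (let i = 0; i < frame.data.length; i += 4) {
        frame.data[i] = 255 - frame.data[i]; // e.g. invert the red channel
      }
      ctx.putImageData(frame, 0, 0);
    }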

Video editing in the browser

At this point I need to draw on the source code of our in-house editing products. What is the set of operations that one can perform on pictures to produce rendered results? The answer lies in the interface to our lowest-level classes, which control the GPUs we (SAM) use to manipulate pictures. See Figure 2. The interface contains the following functions:

Figure 2. Typical elementary video editor content assembly features.

I am not going to explain each function, but as an example, wipe takes a source and a key and sets a colour through the key onto the source. The point is that there is another part of the JavaScript platform available in the browser: the Web Graphics Library (WebGL). WebGL gives the browser access to whatever graphics hardware is available on the host device; it is ideally suited to rendering these functions.
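
To give a flavour of the approach – this is an illustration of the idea, not SAM’s actual GPU interface – a wipe expressed as a WebGL fragment shader could look like this:

    // Where the key texture is set, mix the given colour over the source.
    const wipeFragmentShader = `
      precision mediump float;
      uniform sampler2D uSource;   // the source picture
      uniform sampler2D uKey;      // the key (matte)
      uniform vec4 uColour;        // the colour to set through the key
      varying vec2 vTexCoord;
      void main() {
        vec4 src = texture2D(uSource, vTexCoord);
        float key = texture2D(uKey, vTexCoord).r;
        gl_FragColor = mix(src, uColour, key);
      }
    `;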

Exposing objects

Now that it has been shown that the browser has the potential to assemble program content from sources and recipes, the discussion needs to turn to how the browser can access the program parts at scale, and how live feeds can be supported.

A URL allows arbitrary information to be embedded in a string of which the browser has no knowledge; this is the essence of HATEOAS (Hypermedia As The Engine Of Application State) in RESTful (Representational State Transfer) systems. The server can hold a distributed view of the client’s state by offering URLs in client pages that encode what needs to happen when each URL is followed. A file can contain many URLs, plus enough JavaScript that the different sources those URLs access can quite reasonably be rendered together.

Embedding technical identities in URLs allows the web server to make data centre queries when processing requests. A simple file can support access to many streams simultaneously, some of which can be stable and pre-existing in CDNs (like adverts) and some of which could be live and delivered direct from an origin server (like a live sports feed). Indeed, non-live objects can easily be pre-cached in the edge – either in the viewing device or the edge CDN.
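
A hypothetical recipe document – every name and host below is invented for illustration – shows how stable, CDN-cacheable ingredients and live origin feeds can sit side by side, with the browser simply fetching whatever each URL names:

    const recipe = {
      timeline: [
        { at: 0,  src: 'https://cdn.example.com/objects/3f9ac2/rendition-720p.m4s' },
        { at: 30, src: 'https://origin.example.com/live/match-cam1/now.m4s' },
      ],
    };

    // The browser needs no semantic understanding of the URLs: it just
    // fetches each one and hands the payload to the renderer.
    async function fetchIngredients(recipe) {
      return Promise.all(recipe.timeline.map(async item => ({
        ...item,
        payload: await (await fetch(item.src)).arrayBuffer(),
      })));
    }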

The data centre

An origin server used for a broadcast cannot possibly serve millions of concurrent clients on its own. However, with stable URLs and large-scale CDNs, the streams of media can be accessed with logarithmic load scaling: a tree structure, with its root at the origin server and each leaf being a viewer, allows for massive scale. HTTP has explicit support for intelligent caching, so provided each element referenced by a URL does not change, CDNs allow for massive concurrent viewing.
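
For instance – as a sketch of the standard HTTP machinery, not a prescribed configuration – a segment server might mark each immutable chunk so that every cache in the tree can hold it indefinitely:

    HTTP/1.1 200 OK
    Content-Type: video/mp4
    Cache-Control: public, max-age=31536000, immutable
    ETag: "3f9ac2"

With headers like these, a CDN node that has fetched a chunk once can serve it to every subsequent viewer without returning to the origin.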

The presumption in previous implementations of broadcasting over the web is that every viewer gets the same viewing experience, with simple overlays and edge-of-screen advertising displayed by the browser outside of the video streaming part of the web page. With the technologies of the modern browser and intelligent caching, this no longer holds. Everyone can access the same sets of streams, but the recipe at the client can combine with the browser’s knowledge of the user’s identity to build queries that are unique to each user. As an example, all viewers of a live sports game see the same coverage, but each client gets embedded advertising unique to its viewer’s social profile.
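
A sketch of how that might look from the client – the endpoints and query are invented for illustration – keeps the shared stream cacheable while letting each viewer’s identity select the advert:

    // Every client plays the same match stream...
    const matchStream = 'https://cdn.example.com/live/match/manifest';

    // ...but asks an ad service for its own billboard advert. The service can
    // read the session cookie and redirect to a stable, CDN-cacheable advert
    // object, so personalisation does not defeat caching.
    async function fetchBillboardAdvert() {
      const response = await fetch('https://ads.example.com/billboard?slot=stadium',
                                   { credentials: 'include' });
      return response.arrayBuffer();
    }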

This means that URLs carrying requests from the web client to the data centre need to carry the identity of the media being requested and the identity of the viewer. This happens all the time when you look at Facebook: the cookies planted by Facebook in your browser cache uniquely identify your account to Facebook’s computers. With the power of the browser to assemble video, the same can now happen for edge video productions.

Production workflow will of necessity have to change. Even so, there are benefits to both producers and viewers if properly implemented.

The production system

We now need to discuss video production techniques.

Most of today’s production systems would struggle to support building compositions that render outside of the broadcast centre. The production would need to keep all the parts used in the composition quite separate and distinct all the way to the origin server in the data centre. This implies having a pipeline that tracks essence along two axes: renditions and locations.

My name is James. My name is not James wearing shorts in Las Vegas. My name does not change to James wearing trousers when I’m in London. My identity is not linked to either my form or my location. Production systems of the future need to be able to reference media using strong identities that are stable. So, an editor can use a large, high-bitrate version of some essence, while adaptive bitrate viewers over the web can request one of several different renditions of the same media at different bitrates.
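
One possible shape for such identity-first referencing – the ID scheme and rendition names are illustrative assumptions – separates what the essence is from where and how it is stored:

    const essence = {
      id: 'urn:x-essence:8c2e41d0',   // stable, location-independent identity
      renditions: {
        mezzanine: { bitrate: 100_000_000, url: 'https://store.example.com/8c2e41d0/mezz.mxf' },
        hd:        { bitrate: 6_000_000,   url: 'https://cdn.example.com/8c2e41d0/1080p.m4s' },
        sd:        { bitrate: 1_200_000,   url: 'https://cdn.example.com/8c2e41d0/480p.m4s' },
      },
    };

    // The editor and the viewer reference the same identity; only the
    // rendition they pull differs.
    function pickRendition(essence, availableBitrate) {
      return Object.values(essence.renditions)
        .filter(r => r.bitrate <= availableBitrate)
        .sort((a, b) => b.bitrate - a.bitrate)[0];
    }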

Object storage

Many parts play a role in cloud computing. One component notably absent from data centre systems is the shared-access filesystem, such as GPFS, Isilon or even Samba. The data centre can scale as it does partly because it does not store files in folders; it stores data as objects. Object stores support massive scale because they don’t need a monolithic index (the directory in a filesystem) and don’t (generally) support object mutation (thus avoiding the centralised locking that file or folder mutability implies). You allocate and store an object, and its name is a hash or a GUID derived from the contents it stores. To retrieve the object, you supply the ID that was generated as the object was created.
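
In the browser, the Web Crypto API is enough to sketch this content addressing (a minimal illustration, assuming SHA-256 as the naming hash):

    // The object's name is a hash derived from the bytes it stores, so the
    // object can never change without changing its identity.
    async function objectIdFor(bytes) {
      const digest = await crypto.subtle.digest('SHA-256', bytes);
      return Array.from(new Uint8Array(digest))
        .map(b => b.toString(16).padStart(2, '0'))
        .join('');
    }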

This implies that the identities need storing somewhere – an obvious place would be a database, and many NoSQL databases have been developed to solve this very problem. An alternative is those URLs we discussed earlier: URLs can be built in such a way that the IDs of the objects they represent are encoded in the text of the URL. The browser does not know this, as the browser has no semantic understanding of the text. Embedding object IDs in URLs is the heart of REST – we are building a RESTful video composition system.

Editing and distribution

The web browser can compose the rendered result from the recipe it has been supplied, but what makes the recipe? The web browser! These insights into video rendering have come about because SAM are in the throes of building a browser-based video editor. The requirements of a web-based video editor are a (small) superset of those of a web-based video compositor – in other words, a web-based video player.

It is about as hard to play a real time composed video stream as it is to allow the recipe that the video is composed of to be edited. In effect, what has been described so far is a multi-million user concurrent video production and consumption system.

As a consequence, the creatives of the future do not need to be co-located with the sources that they are editing or the consumers they are creating for. This has massive implications for the structure of media organisations of the future.

The distribution part of this puzzle is the most developed. Production and consumption – enabling the web browser to do both – is the part that needs the most research and development.

The mantra of Big Data is to delete nothing, keep everything, building more data indefinitely. Another Big Data mantra is to mutate nothing (which is, in effect, another type of deletion).

The editing together of a package never destroys or renders flat. Archives are the same as what is consumed – ingredients and recipes. The set of technologies we have developed to mitigate the losses from video production (shot change detection – your time is up!) is obviated, as its very rationale is made obsolete. More valuably, as nothing is destroyed, the tracking of rights usage becomes trivial, because all essence references are explicit within the recipes and the URLs.

Keeping immutable objects with stable names, having simple relationships between different renditions of media irrespective of storage location – whether data centre or CDN – enables the scale required to build this vision.

Security and rights

Each chunk of essence can be stored in its object store fully encrypted. Controlling access to the content itself is not the right way to prevent essence theft, as essence gets cached in CDNs.

Security and DRM are always a concern. A well-designed tailored TV web player can manage those rights in the browser.

What is needed is adept control of access keys, supporting fine-grained control over who can decrypt the essence contained in an object.
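
A minimal sketch of that key control – the key service URL and the choice of AES-GCM are assumptions for illustration – leaves the encrypted chunks freely cacheable and gates only the key request:

    // The CDN can cache the encrypted bytes for everyone; only the key fetch
    // is access-controlled, per viewer and per object.
    async function decryptChunk(objectId, encryptedBytes, iv) {
      const keyBytes = await (await fetch(`https://keys.example.com/${objectId}`,
                                          { credentials: 'include' })).arrayBuffer();
      const key = await crypto.subtle.importKey('raw', keyBytes, 'AES-GCM', false, ['decrypt']);
      return crypto.subtle.decrypt({ name: 'AES-GCM', iv }, key, encryptedBytes);
    }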

Policy can also play a part. A low-quality view – or a watermarked view – can be streamed freely, while a high-quality version of the same edit may have its keys carefully controlled to protect against copyright theft.

What is next?

This article has demonstrated that the browser has the potential to support edge composition of video productions. Data centre technologies and CDN infrastructure can be used to deliver flows of essence and recipes to edge devices at massive scale. A web-based editor, built with the same technology as the web-based player, will combine production and consumption into a new kind of broadcast platform.

Such changes will have massive ramifications for the broadcasting industry as the technical cost of entry to production is lowered. All of this will democratise the production process and merge the consumption and production platforms.

The 20th century was about mass production; the 21st century will be about tailored production. Instead of making one thing for millions of customers, we will make millions of unique things – one for each customer. The same will be true for video production and consumption: the web platform of today points the way to the video platform of tomorrow.

James Westland Cain, Ph.D., Snell Advanced Media
