Standards: Part 25 - Designing Client-Side Video Players

Here we chart the historical development of client-side video players, describe the building blocks used to create them and summarize the relevant standards.


This article is part of our growing series on Standards.
There is an overview of all 26 articles in Part 1 - An Introduction To Standards.


Efficient client-player implementations require a full-stack approach. The client player and the server middleware must be closely coupled, with well-factored code running optimally in the right place.

Video-centric streaming services have overtaken the earlier embedded-video approach, in which video augmented the text in a web page. The same coding tools are used to construct the client-side user experience, but the visual content now dominates the page. The roles of text and video have been exchanged.

Pushing decisions back to the server-side eliminates redundant content delivered over the network when the target client platform is known from the outset.

Modern web development allows the JavaScript in a web page to request additional content without reloading the page. Anything in the page can be altered in the client as a result of user interaction.
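As a minimal sketch of that idea, the TypeScript below requests an HTML fragment and inserts it into a single container without reloading the page. It uses the fetch() API, the modern successor to XMLHttpRequest. The names `loadFragment`, `FragmentTarget` and `Fetcher` are hypothetical, and the fetch function is passed in as a parameter so the sketch can also run outside a browser.

```typescript
// Minimal structural stand-in for a DOM element such as a <div>.
interface FragmentTarget {
  innerHTML: string;
}

// The subset of fetch() used here; in a browser, pass the global fetch.
type Fetcher = (
  url: string,
) => Promise<{ ok: boolean; status: number; text(): Promise<string> }>;

// Requests an HTML fragment and swaps it into one container.
// The rest of the page, including any playing video, is left untouched.
async function loadFragment(
  url: string,
  target: FragmentTarget,
  fetcher: Fetcher,
): Promise<void> {
  const response = await fetcher(url);
  if (!response.ok) {
    throw new Error(`Request for ${url} failed with status ${response.status}`);
  }
  // innerHTML parses markup, so only use this with trusted server content.
  target.innerHTML = await response.text();
}
```

In a page it might be called as `loadFragment("/programme-notes.html", document.getElementById("notes")!, fetch)`, where the URL and element ID are placeholders.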

A Little Bit Of History

Web-based video players emerged in the late 1990s with RealVideo, which was based on H.263 coding. RealNetworks provided a streaming server and a compatible player plug-in for web browsers. Bandwidth limitations only allowed very small moving images.

Netscape and Microsoft embedded video with different, mutually incompatible tags (<embed> and <object> respectively). Plug-in manufacturers needed two versions of their players, and web developers had to build two different versions of each page.

The BBC News Online video console from around 1999 looks archaic compared with modern streaming services, but it represents where it all started.

Progress was rapid. By 2003, prototypes of MPEG-4 BIFS-based players carried 45 minutes of compressed video and interactive TV content in a single downloadable MP4 file. The Envivio player plug-in supported this on every platform, but BIFS failed to gain sufficient traction to become successful in the marketplace.

The World Wide Web Consortium (W3C) facilitated the convergence of the browsers around a common, standardized definition of HTML-based media player functionality. The browser manufacturers saw this as a marketing benefit. Consequently, web-based players are now more advanced and easier to build with HTML5.

The Benefits Of HTML5

The introduction of HTML5 changed everything. Web content became much easier to create because the platforms supported a larger and more consistent feature set that was available everywhere. The relevant standards are:

Standard | Description
HTML5 | Originally published by the World Wide Web Consortium (W3C). Since 2019, HTML has been maintained by WHATWG as the HTML Living Standard, by agreement with W3C.
CSS Level 3 | Managed by W3C. Since CSS 2.1, CSS has been developed as separate modules, some of which have reached Level 4 or 5, so there is no single CSS4 or CSS5 specification.
JavaScript core | Standardized by Ecma International (originally the European Computer Manufacturers Association) as ECMA-262, where it is known as ECMAScript.
DOM | The Document Object Model describes the API that JavaScript uses to access the web page content and its CSS style model. Originally managed by W3C and now maintained by WHATWG, it is layered on top of the ECMAScript core.
Other extensions | Proprietary additions defined by individual browser manufacturers are best avoided to ensure portability. Staying within the support defined by the DOM makes them of little consequence.

 

The HTML5 media tags

One of the important features of HTML5 was the introduction of the <video> and <audio> tags, which provide playback support for media directly inside the browser. Before this, plug-in modules were necessary, and each browser supported them differently via the <embed> and <object> tags. The JavaScript API for each of these tags was different, and the choice of plug-in installed via them added more complexity. The media players contained in these tags ran in a separate memory and process context from the main browser, which required low-level inter-process communication and added yet more complexity.

Writing JavaScript code that crosses the context boundary into a plug-in to access its API can be highly complex, especially when working around bugs or when the required functionality is not supported. Some web sites added to this complexity by having a video player in one plug-in and an Adobe Flash based controller in another. JavaScript glued everything together but was extremely difficult to write. Dealing with buffering issues when delivering progress updates into an Adobe Flash control bar, while avoiding memory leaks in the browser, was especially difficult.

The HTML5 <video> and <audio> tags eliminate all of that complexity.

The HTML5 media tags are described as first-class citizens because they are implemented natively by the browser rather than by a plug-in running in a separate context. This made their interaction with JavaScript much more reliable and, because they were standardized by W3C, they behaved consistently across browsers.

Soon, the media tags were enhanced to include the <source> tag which provides multiple alternative content sources for the browser to choose from. The <track> tag was added to support subtitles and captions but it can do so much more than that by exploiting the JavaScript event-handler triggered by each cue as it arrives.

The <video> tag is designed to play visual content with an accompanying synchronized audio track. It supports MP4 and WebM video across all the major browsers but the less popular Ogg format is not currently supported by Safari.


HTML5 also introduces the <picture> tag which shares the <source> tag with the <video> and <audio> tags.


The <video> Tag

Because the <video> tag is a first-class citizen, it supports all of the global HTML element attributes. The standard element JavaScript API and CSS box-model styling are also supported.

The <video> tag has additional behaviors determined by HTML attributes. These are all mirrored into the Document Object Model (DOM) so that JavaScript can access them as properties.

Attribute | Description
autoplay | Indicates that the video should commence playback as soon as it is able. Most browsers block autoplay with sound unless the video is also muted or the user has already interacted with the site.
controls | Displays the browser's generic built-in controls. Alternatively, build a separate control bar with HTML/CSS/JavaScript.
height={pixels} | Defines the height of the video display area.
width={pixels} | Defines the width of the video display area.
loop | Causes the video to loop continuously.
muted | Starts with the audio muted, without altering the volume level.
poster={URL} | Specifies a poster image to display as a placeholder until the video is played.
preload={auto, metadata, none} | Hints how much of the video the browser should load before playback commences.
src={URL} | The location from which the video is streamed or downloaded. Use this instead of a single <source> tag.
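Because these attributes are mirrored as DOM properties, the same configuration can be applied from script. The sketch below is a minimal, hypothetical example: `applyConfig` and `PlayerConfig` are illustrative names, and `VideoLike` lists only the HTMLVideoElement properties it touches, so it can also run outside a browser. Muting whenever autoplay is requested is an assumption based on common browser autoplay policies.

```typescript
// The HTMLVideoElement properties that mirror the attributes used below.
interface VideoLike {
  autoplay: boolean;
  controls: boolean;
  muted: boolean;
  poster: string;
  preload: string;
  src: string;
  width: number;
  height: number;
}

interface PlayerConfig {
  src: string;
  poster?: string;
  width?: number;
  height?: number;
  preload?: "auto" | "metadata" | "none";
  nativeControls?: boolean; // false when a custom HTML/CSS control bar is used
  autoplay?: boolean;
}

function applyConfig(video: VideoLike, config: PlayerConfig): void {
  video.preload = config.preload ?? "metadata";
  video.controls = config.nativeControls ?? false;
  video.autoplay = config.autoplay ?? false;
  // Assumption: autoplay with sound is commonly blocked, so mute when autoplaying.
  video.muted = video.autoplay;
  if (config.poster !== undefined) video.poster = config.poster;
  if (config.width !== undefined) video.width = config.width;
  if (config.height !== undefined) video.height = config.height;
  // Set src last so the other settings are already in place when loading begins.
  video.src = config.src;
}
```

In a page, a real <video> element obtained with `document.querySelector("video")` can be passed as the first argument.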

 

JavaScript can call methods (functions) in the DOM object representing the <video> tag to control the playback experience.

Method | Description
addTextTrack() | The scripted equivalent of including an additional <track> tag inside the <video> tag. It returns a TextTrack object whose cues JavaScript can then add or modify. The kind of text track must be specified when it is created, and an optional label and language can also be given.
canPlayType() | Asks whether the browser can play the specified MIME type, optionally with a codecs parameter. It returns "probably", "maybe" or an empty string. Where a codec is not supported, code a workaround or present a suitable error message.
load() | Resets the media element and reruns source selection, for example after the src attribute or the <source> tags have been changed.
play() | Starts playing the video. In current browsers it returns a Promise that rejects if playback is not allowed, for example by an autoplay policy.
pause() | Pauses the currently playing video.
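One way to act on canPlayType() is to walk a list of renditions in order of preference. The sketch below is a minimal, hypothetical example: `chooseSource` and `Candidate` are illustrative names, the file names in any real list would be your own, and `CanPlay` models only the one HTMLMediaElement method it needs, so a real <video> element can be passed in a page.

```typescript
// The only HTMLMediaElement method this sketch relies on.
interface CanPlay {
  canPlayType(type: string): string; // "probably", "maybe" or ""
}

interface Candidate {
  url: string;
  type: string; // MIME type, optionally with a codecs parameter
}

// Prefers a confident "probably", then falls back to the first "maybe".
// Returns undefined when nothing is playable, so the caller can show an error.
function chooseSource(media: CanPlay, candidates: Candidate[]): Candidate | undefined {
  return (
    candidates.find((c) => media.canPlayType(c.type) === "probably") ??
    candidates.find((c) => media.canPlayType(c.type) === "maybe")
  );
}
```

In a page, assign the returned URL to the video element's src property, or present an error message when the result is undefined. Preferring "probably" is a design choice: browsers answer "maybe" when they cannot be sure, for example when a MIME type is given without a codecs parameter.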

 

In theory, it should be safe to call the play() and pause() methods regardless of the current playing state of the video. Even so, it is good practice to make the supporting code state-aware. Wrap these calls in a single play/pause handler function that checks the playing state and calls the appropriate method on the video object. Attach this to the play button, and change the icon on the button only when the resulting state-change event is triggered.
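The pattern in the previous paragraph can be sketched as follows. The click handler only asks the player to change state, and the button label is updated solely from the player's own "play" and "pause" events. `togglePlayback`, `bindPlayButton`, `PlayerLike` and `ButtonLike` are hypothetical names; the two interfaces are structural stand-ins for HTMLVideoElement and HTMLButtonElement.

```typescript
// The parts of HTMLVideoElement used here; events arrive via EventTarget.
interface PlayerLike extends EventTarget {
  readonly paused: boolean;
  play(): Promise<void>;
  pause(): void;
}

interface ButtonLike {
  textContent: string | null;
}

// UI-triggered handler: calls a method on the player, never touches the UI.
function togglePlayback(player: PlayerLike): void {
  if (player.paused) {
    // play() returns a Promise that rejects if playback is blocked,
    // for example by an autoplay policy; report it rather than ignore it.
    player.play().catch((err: unknown) => console.warn("play() was rejected:", err));
  } else {
    player.pause();
  }
}

// Player-triggered handlers: the only place the button appearance changes.
function bindPlayButton(player: PlayerLike, button: ButtonLike): void {
  player.addEventListener("play", () => { button.textContent = "Pause"; });
  player.addEventListener("pause", () => { button.textContent = "Play"; });
}
```

In a page, the button's click listener would simply call `togglePlayback(video)`, and `bindPlayButton(video, button)` would be called once during setup.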

The DOM video object has approximately 30 properties in addition to those in the HTML global set. The properties return the current player state and configuration settings.

As the video is loaded and played, it will trigger JavaScript events. Attach handler functions to these events to update the user interface appearance as the player state changes.

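There are over 20 supported event types, which can be fired asynchronously at any time. Browser JavaScript runs event handlers one at a time on the page's main thread, so a handler is not interrupted part-way through, but further events queue up behind it while it runs. Keep the event-handlers short and return as soon as possible so that those events, and the user interface, are not delayed.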

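It is also good practice to write re-entrant event-handler code. A handler that takes everything it needs from the event object and its own local variables can be called many times in quick succession, or interleaved with other code while it waits on an asynchronous operation, without one invocation interfering with another. It is not very difficult: avoid referring to globally available objects, because other code might trample on them.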
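A minimal sketch of a handler written this way, assuming each player has its own progress label: the handler reads the media element from the event and keeps no shared state, so several players can each register their own copy. `TimedMedia`, `LabelLike`, `formatTime` and `makeProgressHandler` are hypothetical names.

```typescript
// The HTMLMediaElement properties this handler reads.
interface TimedMedia extends EventTarget {
  readonly currentTime: number;
  readonly duration: number;
}

interface LabelLike {
  textContent: string | null;
}

// Formats seconds as m:ss. duration is NaN until metadata has loaded
// and Infinity for an unbounded live stream, so both show as "--:--".
function formatTime(seconds: number): string {
  if (!Number.isFinite(seconds)) return "--:--";
  const whole = Math.floor(seconds);
  const minutes = Math.floor(whole / 60);
  const secs = whole % 60;
  return `${minutes}:${secs.toString().padStart(2, "0")}`;
}

// The label is captured per player in a closure rather than kept in a global.
function makeProgressHandler(label: LabelLike): (event: Event) => void {
  return (event: Event) => {
    const media = event.currentTarget as TimedMedia; // the element that fired the event
    label.textContent = `${formatTime(media.currentTime)} / ${formatTime(media.duration)}`;
  };
}
```

In a page, `video.addEventListener("timeupdate", makeProgressHandler(label))` would be registered once per player, with `label` being that player's own element.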

The <audio> Tag

The <audio> tag is very similar to the <video> tag except that it has no visual presentation rectangle. It will take up some physical space in the page if the control bar is active. If subtitles are important, then it may be preferable to use a <video> tag to play the audio so it has somewhere to display the text.

The <audio> tag supports MP3 and WAV audio in all browsers but Ogg containers are not supported in Safari. Although it is often undocumented, the AAC audio codec is widely supported, provided it is contained in an MP4 file.

The <source> Tag

The original design of the <audio> and <video> tags specified the source content URL in the src="{url}" attribute.

Embed additional <source> tags inside the <audio> or <video> containers to allow multiple sources to be provided. The browser can then choose a preferred format:

<audio controls autoplay>
  <source src="music.ogg" type="audio/ogg">
  <source src="music.mp3" type="audio/mpeg">
  This web browser cannot play audio.
</audio>

In this example, there are several possible outcomes, and the browser will choose the first source it supports. If the browser can play content in the Ogg format, it will use the first <source> tag. Safari cannot play Ogg files, so it falls back to the MP3 audio defined in the second <source> tag. In the rare situation where a browser does not support the <audio> tag at all, the text message is presented instead, which is the expected behavior for an unsupported tag. A browser that supports <audio> but none of the listed formats does not show this text, so that case must be detected in JavaScript.

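Additional flexibility is provided by media queries. The media attribute on a <source> tag accepts a media query, such as a viewport width condition, which can also affect which one of several <source> tags is selected for presentation. This is most reliably supported inside the <picture> tag; support inside <video> and <audio> has varied between browsers.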

The <track> Tag

The <video> and <audio> containers support <track> tags along with the multiple <source> tags. The <track> tags connect to WebVTT streams that deliver timed-text that is synchronized to the playback.

When a timed cue becomes active, the browser fires a cuechange event on the corresponding TextTrack object (each individual cue can also fire its own enter and exit events). The JavaScript handler reads the payload from the track's list of active cues and decides what to do with it.

Timed text tracks can be one of these types:

  • Captions.
  • Chapter marks.
  • Descriptive text.
  • Metadata.
  • Subtitles.

The event-handler inspects the track's kind value and handles the payload accordingly. It can also detect which one of several text tracks triggered the event, so the user interface can offer the user a choice between them.

Textual content such as subtitles or captions can be inserted into <div> or <span> blocks via their corresponding DOM objects. They will appear on the screen right away. CSS styling can be applied to change the appearance.
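A minimal sketch of such a handler, assuming it is called from the "cuechange" event of each TextTrack. `CueLike`, `TrackLike`, `TextBox`, `CueTargets` and `handleCueChange` are hypothetical names that model only the fields used here; in a browser the active cues are VTTCue objects whose text property holds the payload.

```typescript
interface CueLike {
  text: string;
}

// The TextTrack fields used here. kind is one of "subtitles", "captions",
// "descriptions", "chapters" or "metadata".
interface TrackLike {
  kind: string;
  label: string;
  activeCues: ArrayLike<CueLike> | null;
}

interface TextBox {
  textContent: string | null;
}

interface CueTargets {
  subtitleBox: TextBox; // a <div> or <span>; use CSS white-space: pre-line to keep line breaks
  chapterBox: TextBox;
  onMetadata: (payload: string) => void;
}

function handleCueChange(track: TrackLike, targets: CueTargets): void {
  const cues: CueLike[] = track.activeCues ? Array.from(track.activeCues) : [];
  switch (track.kind) {
    case "subtitles":
    case "captions":
      // Show every active cue; an empty list clears the box.
      targets.subtitleBox.textContent = cues.map((c) => c.text).join("\n");
      break;
    case "chapters": {
      const current = cues[0];
      if (current) targets.chapterBox.textContent = current.text;
      break;
    }
    case "metadata":
      cues.forEach((c) => targets.onMetadata(c.text));
      break;
    default:
      // "descriptions" and any other kinds are ignored in this sketch.
      break;
  }
}
```

In a page it would be attached with `track.addEventListener("cuechange", ...)`. Because the DOM typings declare active cues as TextTrackCue rather than VTTCue, a cast may be needed at that point.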

Advanced Ideas For Event Cues

The metadata track type can carry more diverse payloads. For example, its cues could mimic traditional broadcast signaling messages and alter the displayed aspect ratio of the media.

There are opportunities to go much further if the text is interpreted as drawing instructions to present graphical symbols. When a live human interpreter is unavailable, a virtual British Sign Language (BSL) signing avatar could be implemented in this way.

Embedding synchronized Scalable Vector Graphics (SVG) drawing instructions can augment the main program material in a variety of other ways. Parts of the picture can be highlighted and annotated or a marker could be moved around a map placed beside the video rectangle.
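As a hypothetical illustration of the map-marker idea, the sketch below assumes metadata cues whose text is JSON such as {"marker": {"x": 120, "y": 45}}. That payload format is invented for this example, since WebVTT leaves the content of metadata cues to the application. `applyMarkerCue` could be called with each active cue's text from a cuechange handler on a metadata track; `SvgElementLike` stands in for an SVG <circle> element, and the other names are also hypothetical.

```typescript
// The one SVG element method used here.
interface SvgElementLike {
  setAttribute(name: string, value: string): void;
}

// Assumed payload shape for this sketch only.
interface MarkerInstruction {
  marker: { x: number; y: number };
}

function isMarkerInstruction(value: unknown): value is MarkerInstruction {
  if (typeof value !== "object" || value === null) return false;
  const marker = (value as { marker?: unknown }).marker;
  return (
    typeof marker === "object" && marker !== null &&
    typeof (marker as { x?: unknown }).x === "number" &&
    typeof (marker as { y?: unknown }).y === "number"
  );
}

// Moves the circle to the position named in the cue. Returns false, leaving
// the map unchanged, for anything it does not recognise.
function applyMarkerCue(payload: string, circle: SvgElementLike): boolean {
  let parsed: unknown;
  try {
    parsed = JSON.parse(payload);
  } catch {
    return false; // not JSON
  }
  if (!isMarkerInstruction(parsed)) return false;
  circle.setAttribute("cx", String(parsed.marker.x));
  circle.setAttribute("cy", String(parsed.marker.y));
  return true;
}
```

Validating the payload before touching the drawing keeps a malformed cue from corrupting the display.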

Forced narrative subtitles can be treated differently to the normal subtitle text so they are always presented regardless of the subtitle visibility setting.

Similar undocumented functionality was present in the RealVideo player, which could call the browser to action. Experiments in 1999 with javascript: URLs in place of http: URLs proved that embedded players could alter the appearance of the containing page in much the same way.

The <track> tag event triggering technique based on WebVTT streams is technically superior and the content authoring process is much simpler.

Learning Outcomes

From a long history of developing media player implementations, some important ground rules have evolved as learning outcomes. Some things worked and others didn't. This is a distilled set of bullet points to consider when developing a media player:

  • Always call the player to action with a UI triggered handler that calls a method in the media player object. DO NOT modify the UI appearance during this handler.
  • ALL UI appearance modifications should be the result of a player event triggering a JavaScript handler. This ensures the UI state correctly reflects the player's internal state.
  • Carefully maintain the state interlocking between the JavaScript code and the player internals.
  • Built-in control bars often overlay the video content. Build external control bars with HTML/CSS and drive them with JavaScript support.
  • Determine the platform and browser type at the server-side at the outset.
  • Vend content to the client that has been configured and optimized for that platform.
  • Use XMLHttpRequest (XHR) calls, or the newer fetch() API, to request supplementary content from the server.
  • Display subtitle text outside the video window using HTML/CSS to style it but don't rule out presenting it inside the video window if that seems appropriate.
  • Subtitle text in <div> or <span> blocks can be floated above the video window using z-index layering techniques. Subtitles can be positioned in X & Y using CSS.
  • Develop re-entrant handlers for JavaScript state-change events. This coding approach avoids global variables and objects, which can be trampled on when the same code is invoked again by later events or by interleaved asynchronous work.
  • Keep the code in an event-handler short so that it returns quickly and queued events are not delayed.
  • In a handler declared with the function keyword (not an arrow function), the 'this' keyword refers to the UI object the listener is attached to, such as the button that was clicked; event.currentTarget gives the same reference. This reduces the code required to locate the clicked object in the event-handler.
  • Pass the object ID values of related containers in data-* attributes. These are easily queried inside a JavaScript handler via the element's dataset property and avoid the need to derive the ID of a target algorithmically.
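The last two points can be combined in a minimal sketch. A mute button carries the ID of the player it controls in a data-player attribute (a hypothetical name), read back through the dataset property of `this`. `ControlLike`, `Mutable` and `makeControlHandler` are also hypothetical, and `lookup` stands in for `document.getElementById`.

```typescript
// The part of an HTML element used here: data-* attributes appear in dataset.
interface ControlLike {
  dataset: Record<string, string | undefined>;
}

interface Mutable {
  muted: boolean;
}

function makeControlHandler(
  lookup: (id: string) => Mutable | null,
): (this: ControlLike) => void {
  // A function expression (not an arrow function), so `this` is the
  // element the listener is attached to.
  return function (this: ControlLike): void {
    const playerId = this.dataset["player"]; // from data-player="..."
    if (!playerId) return;
    const player = lookup(playerId);
    // Only the player changes here; the button's appearance should be
    // updated from the player's "volumechange" event.
    if (player) player.muted = !player.muted;
  };
}
```

In a page, markup such as `<button data-player="main-video">` would be wired up with `button.addEventListener("click", makeControlHandler((id) => document.getElementById(id) as HTMLVideoElement | null))`.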

Conclusion

Video players have evolved out of humble beginnings to realize some sophisticated streaming services.

Between 1998 and 2004, there was a lot of exciting and interesting research into interactive TV services. None of the platforms it was built on have survived, but we do have a rich heritage of research and learning outcomes to mine for ideas. These could be developed into additional services alongside the streamed content.

Implement modern players with additional interactive content using a combination of HTML5, CSS3 and JavaScript in the knowledge that they will work reliably across multiple platforms.
