Standards: Part 11 - Streaming Video & Audio Over IP Networks

Streaming services deliver content to end users via an IP network connection. The transport process is similar to broadcasting and shares some of the same technologies, but there are some unique caveats.


This article is part of our growing series on Standards.
There is an overview of all 26 articles in Part 1 - An Introduction To Standards.


Broadcast content is gradually being displaced by streaming directly to viewers via an IP network. The streaming server is presented with a file, which it then delivers to the client players. That content might be a static file prepared beforehand or a virtual file arriving as an external live feed.

The underlying technology solutions have been around for a long time. More recent innovation has improved performance by alleviating bottlenecks and deploying novel solutions that make the transport more efficient:

  • Edge serving.
  • Adaptive bitrate streams.
  • Reducing latency with the Common Media Application Format.
  • Multicasting.

Streaming Transport

The transport of audio-visual media over IP network streams is largely independent of the codec choice, although some streaming protocols dictate particular codecs for carriage.

The video and audio content are embedded together in a similar way to a broadcast program stream. Additional streamed tracks with synchronized text are delivered separately.

In addition to timing and synchronization, part 1 of the MPEG-1, MPEG-2 and MPEG-4 standards also describes how multiple streams can be multiplexed into a single transport stream.

Metadata from the content store is delivered within the web page content for the player implementation to use. This supplemental data can configure the player and manage viewing permissions. It can be embedded as special tags, attributes of tags or directly as JavaScript variable assignments in a parameter bridge.
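
As a minimal sketch of such a parameter bridge, the same metadata item could reach the player by any of the three routes described above (the element IDs, variable names and values here are invented for illustration):

```javascript
// Hypothetical parameter bridge: three ways a page can hand metadata to the player.
// 1. A JavaScript variable assignment written into the page by the server:
window.playerConfig = { assetId: "ep-101", maxBitrate: 5_000_000, drm: true };

// 2. An attribute on a tag, read back by the player script:
//    <video id="player" data-asset-id="ep-101"></video>
const assetId = document.getElementById("player").dataset.assetId;

// 3. A special tag, e.g. <meta name="x-asset-id" content="ep-101">:
const metaAsset = document.querySelector('meta[name="x-asset-id"]')?.content;
```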

HTML5 introduced the <video> tag as a first-class citizen within the web page. This eliminates the internal process and memory management issues that browsers had with plug-ins.

The HTML5 <video> tag can have many child <track> elements inside its tag body. Each <track> can be associated with its own JavaScript event handlers. The <video> player knows when the texts need to be presented and triggers an event at the right time. The event payload is the individual timed text string. The JavaScript handler can manage that event in whatever way it chooses. It could parse the text for instructions to display an image in a box, or present the text as a caption. It could place a marker on the screen and move it around in a synchronized fashion, or change the aspect ratio of the video frame.
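
A minimal sketch of this mechanism using the standard TextTrack cuechange event (the file names and the handler logic are illustrative):

```javascript
// Markup assumed in the page:
// <video id="player" src="episode.mp4">
//   <track id="captions" kind="captions" src="captions.vtt" srclang="en" default>
// </video>
const video = document.getElementById("player");
const track = document.getElementById("captions").track; // the TextTrack object
track.mode = "hidden"; // receive cue events without the browser rendering the text

track.addEventListener("cuechange", () => {
  // activeCues holds the cues whose start/end times span the current playhead
  for (const cue of track.activeCues) {
    // The handler decides what to do with each timed text string:
    // render it as a caption, parse it for display instructions, etc.
    console.log(`At ${video.currentTime.toFixed(2)}s:`, cue.text);
  }
});
```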

Edge Serving

As streaming became more popular, the providers found they could not service the huge number of streams required from a single central content store. Akamai introduced arrays of edge servers placed near to the highest-traffic locations, connected to the origin by links of widely varying bandwidth.

An edge server caches the files locally and serves them from there. When the content changes (which is infrequent), the latest version must be deployed to the caches.

The edge servers need sufficient storage, but it is unrealistic to replicate the entire central repository at each one. Only the first part of each video asset is needed right away. The remainder of the asset is requested via a very fast link during the time it takes to watch that first segment. The stream is then spliced to read from the complete copy instead of the cached introduction, as the sketch below illustrates.
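
A simplified sketch of that splice, assuming a hypothetical edge cache that holds only the opening segments of each asset locally (fetchFromOrigin() is a placeholder for the real origin fetch):

```javascript
// Hypothetical edge-server logic: serve the cached introduction instantly,
// while pulling the full asset from the origin over a fast internal link.
const cachedIntro = new Map(); // assetId -> opening segments, deployed in advance
const fullCopies = new Map();  // assetId -> Promise of the complete asset

async function fetchFromOrigin(assetId) {
  // Placeholder for the real origin fetch over the fast link.
  return { segments: [] };
}

async function serveSegment(assetId, segmentIndex) {
  const intro = cachedIntro.get(assetId) ?? [];
  if (!fullCopies.has(assetId)) {
    // Start acquiring the complete copy on the first request for the asset.
    fullCopies.set(assetId, fetchFromOrigin(assetId));
  }
  if (segmentIndex < intro.length) {
    return intro[segmentIndex]; // instant start from the local cache
  }
  // By the time playback reaches here, the full copy should have arrived.
  const full = await fullCopies.get(assetId);
  return full.segments[segmentIndex]; // spliced over to the complete copy
}
```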

Streaming service providers might give themselves extra time to acquire the full copy by presenting interstitial idents, copyright messages and trailers for other upcoming content. This would make the stream switching more straightforward and remove the need for storing abridged introductions.

Adaptive Bitrate (ABR) Streaming

This was originally described as Quantum Streaming and is also known as Multi-Bitrate Streaming. The content stream is broken into many small segments at different bitrates, which are reassembled as they are received by the client player.

The server encodes the video multiple times into alternative streams of varying quality, optimized for different delivery bandwidth ranges. These are then sliced into short segments, typically between 2 and 10 seconds long. Carefully managing the job queues so the coding runs in parallel on multiple processors can speed up the entire process.

During a session, the performance of the connection is measured to determine the available bandwidth, and an appropriately sized stream of segments is selected. This content negotiation is apparent to the viewer because the stream initially starts with a very low-quality picture and then quickly improves up to the optimum bitrate available.
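
A sketch of the client-side selection step, with an invented bitrate ladder (real players such as dash.js or hls.js implement far more sophisticated heuristics):

```javascript
// Hypothetical bitrate ladder: bits per second for each encoded rendition
const ladder = [400_000, 1_200_000, 3_000_000, 6_000_000];

// Measure throughput from the previous segment download, then pick the
// highest rendition that fits comfortably within the available bandwidth.
function pickRendition(lastSegmentBytes, downloadSeconds) {
  const measuredBps = (lastSegmentBytes * 8) / downloadSeconds;
  const headroom = 0.8; // leave a 20% margin so playback does not stall
  const usable = measuredBps * headroom;
  let choice = ladder[0]; // worst case: the lowest quality keeps playing
  for (const bitrate of ladder) {
    if (bitrate <= usable) choice = bitrate;
  }
  return choice;
}

// e.g. a 1 MB segment that took 2 s implies ~4 Mb/s, selecting the 3 Mb/s rendition
console.log(pickRendition(1_000_000, 2));
```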

The packets are delivered via the HTTP protocol, which is based on TCP. TCP guarantees the delivery of all packets at the expense of some latency due to retransmission and buffering. HTTP also enables the content to traverse corporate firewalls, which typically block most kinds of traffic but allow web content through in a controlled manner.

This is a resilient way to deliver streamed content because if the available bandwidth is reduced momentarily, a lower-quality segment can be delivered instead. The viewing experience is maintained, albeit with a reduction in quality.

Adaptive techniques are available with these streaming protocols:

| Implementation | Description |
|---|---|
| MPEG-DASH | ISO 23009 - an international standard for Adaptive Bitrate Streaming. |
| HLS | HTTP Live Streaming, used for delivery to Apple iOS devices. |
| HDS | Adobe HTTP Dynamic Streaming. Previously used for Flash-based players. |
| MSS | Microsoft Smooth Streaming. Sometimes repackaged into HLS-compatible streams without the need for re-encoding. |


Although this improves the viewing experience for end users, the production and deployment require additional expenditure on processing and storage capacity to hold the multiple bitrate copies of all the content.

The content can be delivered with a simple web server rather than deploying a specialized streaming server.

There is still some work to do in the area of securing the streams and adding rights control to manage access to content.

Latency Issues Solved With CMAF

All streaming services suffer from latency, and Adaptive Bitrate delivery is particularly prone to it. Latency is the delay between content entering the coder and it being presented to the end user on their device. There are several contributing causes:

  • Delivery to the ingest platform.
  • Ingesting process.
  • Coding into the streaming compatible format.
  • Length of the Group of Pictures (the GOP configuration of the coder).
  • Assembly into a transport stream.
  • Delivery to the end user, which a poor comms link can delay because TCP retransmits packets rather than dropping them.
  • Transmission technology (3G, 4G, 5G).
  • Unpacking of the stream and decoding the content on the device.
  • Buffering in the receiving player.

Although HLS is considered to be a leading contender for streaming services, its latency can be as much as 30 seconds. This is too long for live sporting events, where the audience may hear a goal scored via a radio transmission much sooner than seeing it on the screen. Each of the preparation steps listed above adds to the accumulated delay.

A relatively new approach called the Common Media Application Format (CMAF) turns the coding and embedding problem inside out. Instead of slicing a previously coded stream into packets, the raw content is first sliced into short segments ready for adaptive delivery and then encoded fragment by fragment.

The coding of multiple segments can be distributed across a multi-processing grid. Modern computers have multiple CPU cores, each of which can tackle a segment coding task independently and in parallel.
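
A sketch of that fan-out, assuming a hypothetical encodeSegment() that stands in for the real encoder invocation (e.g. one encoder process per segment); here it is simulated with a timer so the sketch is self-contained:

```javascript
// Hypothetical parallel fan-out of CMAF segment encoding jobs.
const segments = Array.from({ length: 12 }, (_, i) => ({ index: i, seconds: 4 }));

function encodeSegment(segment) {
  // Stand-in for launching a real encoder on one segment.
  return new Promise(resolve =>
    setTimeout(() => resolve(`seg-${segment.index}.cmfv`), 100));
}

// Encode every segment concurrently; Promise.all preserves the input
// order, so the fragments come back in presentation order.
async function encodeAll() {
  return Promise.all(segments.map(encodeSegment));
}

encodeAll().then(names => console.log(names.join(", ")));
```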

Read ISO 23000 part 19 for the full specification of CMAF.

Multicasting Techniques

Edge servers are well suited to video on demand services, but live streams are not amenable to this approach because they are not served from stored files. Multicasting is a useful alternative and can be managed in a variety of ways:

  • Ethernet multicasting.
  • IP Multicast.
  • Application layer.
  • Peer-to-peer casting.

The streams are all identical, so Adaptive Bitrate streaming is difficult to deploy without additional copies of the multicast at different bitrates. Switching from one multicast to another requires that they are synchronized, which is challenging when they might arrive by different network routes. These are simplified descriptions of each multicasting technique:

  • With Ethernet Multicasting, all packets are delivered to all destinations on the local area sub-net. Any interested node can then pick up the streams from there. Routers can forward the packets to other networks depending on filter configurations. This is wasteful of network capacity on cables where it is not required.
  • IP Multicasting is a more controlled form of multicasting where the clients receive an invitation and then join the streams that they want to. This works on wired or Wi-Fi networks. The networks need to be carefully managed with the router and switch configurations (see IGMP). This is better suited to corporate environments rather than random public access.
  • Multicasting can be simulated at the Application Layer using unicast streams. The server vends a single stream per child process and a client application connects to it. Each client requires a separate stream from the server. The limiting factor here is the number of streams the server can deliver. There are some variations of this approach that use front-end processors implemented either as hardware or software to replicate the streams.
  • Peer-to-peer multicasting uses the WebRTC technology described in the W3C standards. Clients can forward content to their peers over a direct socket connection without needing a server (see the sketch after this list).
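
A minimal sketch of the W3C API involved: two RTCPeerConnection objects wired back-to-back in the same page (run it as a module script; real deployments exchange the offer, answer and ICE candidates through a signaling server rather than directly):

```javascript
// In-page WebRTC loopback: pcA streams the camera to pcB with no server.
const pcA = new RTCPeerConnection();
const pcB = new RTCPeerConnection();

// In a real deployment these candidates travel via a signaling channel.
pcA.onicecandidate = e => e.candidate && pcB.addIceCandidate(e.candidate);
pcB.onicecandidate = e => e.candidate && pcA.addIceCandidate(e.candidate);

// Render whatever media arrives at the receiving peer.
pcB.ontrack = e => { document.querySelector("video").srcObject = e.streams[0]; };

const stream = await navigator.mediaDevices.getUserMedia({ video: true });
stream.getTracks().forEach(track => pcA.addTrack(track, stream));

// Standard offer/answer negotiation.
await pcA.setLocalDescription(await pcA.createOffer());
await pcB.setRemoteDescription(pcA.localDescription);
await pcB.setLocalDescription(await pcB.createAnswer());
await pcA.setRemoteDescription(pcB.localDescription);
```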

Read IETF RFC 3376 to see how the Internet Group Management Protocol (IGMP) signals are used in the context of multicasting, allowing many devices to receive packets addressed to a shared multicast group address.
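
Applications do not usually speak IGMP directly; joining a group through the sockets API prompts the operating system to send the membership report. A minimal sketch using Node.js (the group address is taken from the multicast documentation range, and the port is illustrative):

```javascript
// Joining an IP multicast group with Node.js; the operating system emits
// the IGMP membership report described in RFC 3376 on our behalf.
const dgram = require("node:dgram");

const GROUP = "233.252.0.1"; // documentation multicast range (MCAST-TEST-NET)
const PORT = 5004;           // illustrative; a port often used for RTP

const socket = dgram.createSocket({ type: "udp4", reuseAddr: true });
socket.on("message", (packet, sender) => {
  console.log(`${packet.length} bytes from ${sender.address}`);
});
socket.bind(PORT, () => {
  socket.addMembership(GROUP); // triggers the IGMP join
});
```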

Multicasting will get easier to deploy. There are other solutions handled at the network level, with varying levels of complexity and demanding sophisticated network configuration. New solutions are being researched that will address the current downsides of multicasting.

Standards For Protocols & Supporting Technologies

Streaming services and traditional broadcasting use many of the same standards, although they may apply them differently. Some began as proprietary specifications that have since been released as open, royalty-free standards.

The standards are grouped into families according to which organization developed them. Some organizations collaborate to bring complementary expertise together:

  • ISO
  • MPEG
  • ITU
  • EBU
  • SMPTE
  • AES
  • VSF
  • IETF
  • W3C
  • Proprietary (various orgs)

These are the relevant standards:

| Standard | Name | Org | Description |
|---|---|---|---|
| ISO 13818-6 | DSM-CC | ISO/MPEG | MPEG-2 - Digital Storage Media - Command and Control. |
| ISO 14496-8 | | ISO/MPEG | MPEG-4 Part 8 - Video over IP. |
| ISO 14496-29 | WVC | ISO/MPEG | MPEG-4 Part 29 - Web Video Coding. |
| ISO 14496-31 | VCB | ISO/MPEG | MPEG-4 Part 31 - Video Coding for Browsers. |
| ISO 14496-33 | IVC | ISO/MPEG | MPEG-4 Part 33 - Internet Video Coding. |
| ISO 23000-5 | | ISO/MPEG | Media streaming application format. |
| ISO 23000-9 | DMB | ISO/MPEG | Digital Multimedia Broadcasting application format. |
| ISO 23000-19 | CMAF | ISO/MPEG | Common Media Application Format for segmented media. Significantly improves latency in streamed services. |
| ISO 23001-8 | MPEG-B | ISO/MPEG | Systems technologies. Part 8 has been replaced by ISO 23091. |
| ISO 23006 | MPEG-M | ISO/MPEG | Extensible Middleware (MXM). |
| ISO 23009 | MPEG-DASH | ISO/MPEG | Dynamic Adaptive Streaming over HTTP. Expected to become more popular. Codec agnostic, so you are not locked into any particular video coding standard; suitable codecs include AVC, HEVC and VP9. |
| ISO 23091 | CICP | ISO/MPEG | Coding Independent Code Points used for signaling. |
| ISO 29116 | | ISO/MPEG | Supplemental Media Technologies. |
| ISO 29116-1 | MXM | ISO/MPEG | Part 1 describes MPEG extensible middleware protocols. Previously described as 'Media Streaming Application Format Protocols'. |
| ST 2110-20 | | SMPTE | Uncompressed video transport, based on SMPTE 2022-6. |
| ST 2110-21 | | SMPTE | Traffic shaping and network delivery timing. |
| ST 2110-22 | | SMPTE | Constant bitrate compressed video transport. |
| AES3 | | AES | Serial transmission format for two-channel linearly represented digital audio data. |
| AES67 | | AES | High-performance streaming audio-over-IP interoperability. |
| RFC 1889 | RTP | IETF | Real-time Transport Protocol for delivering multimedia. Superseded by RFC 3550. |
| RFC 1890 | | IETF | RTP profile for Audio and Video Conferences. Built on top of RFC 1889. Superseded by RFC 3551. |
| RFC 2326 | RTSPv1 | IETF | Real Time Streaming Protocol for control over the delivery of data with real-time properties. Version 1.0 is a very old format developed in the 1990s. |
| RFC 3016 | | IETF | RTP Payload Format for MPEG-4 Audio/Visual Streams. |
| RFC 3376 | IGMP | IETF | Signaling for multicast services. Referred to by ST 2110-10. |
| RFC 3550 | RTP | IETF | Real-time Transport Protocol. Obsoletes and replaces RFC 1889. |
| RFC 3550 | RTCP | IETF | Real Time Control Protocol for managing RTP services. Also referred to by the Video Services Forum TR-02 recommendation. |
| RFC 3551 | RTP/AVP | IETF | RTP profile for Audio and Video Conferences. Built on top of RFC 3550. Obsoletes and replaces RFC 1890. |
| RFC 3640 | | IETF | RTP Payload Format for Transport of MPEG-4 Elementary Streams. |
| RFC 4566 | SDP | IETF | Session Description Protocol used for session announcement, invitation and other multimedia session initiation messages. |
| RFC 7826 | RTSPv2 | IETF | Real-Time Streaming Protocol Version 2.0. Obsoletes the version 1 protocol. Offers extremely low latency, which might be useful for some applications, but it is not resilient on unreliable networks and requires stable communications. |
| RFC 8216 | HLS | IETF | Apple HTTP Live Streaming, provided as an open specification. Very widely supported by browsers, with many useful features that make it a leading contender. |
| RFC Draft | SRT | Haivision | The open-source Secure Reliable Transport for streaming. Currently described by an IETF draft document that is not yet ratified as a standard. Designed to minimize artifacts resulting from poor-quality communications links. Look for companies belonging to the SRT Alliance for supporting products. Has the potential to become very popular. |
| TR-02 | | VSF | Using RTCP for In-Band Signaling of Media Flow Status. Refers to RTCP as described in RFC 3550. |
| TR-06 | RIST | VSF | Reliable Internet Stream Transport. Under ongoing development by the Video Services Forum. |
| | ASoH | Netflix | Adaptive Streaming over HTTP, developed by Netflix and derived from the Apple HLS standard. |
| | MSS | Microsoft | Microsoft Smooth Streaming Protocol. An adaptive bitrate protocol, mainly used for Xbox gaming platforms. |
| | WebRTC | W3C | Web Real-Time Communication for browsers. Not just intended for peer-to-peer streaming; it can be used for any kind of real-time communication. Standardized by the W3C. Gathering momentum, with support in all the major browsers. |
| | RTMP | Adobe | A proprietary streaming format developed by Adobe for use with Flash-based players. Adobe now offers it in a semi-open fashion. Declining use for public consumption, but it can be useful in production workflows. |
| | HDS | Adobe | Adobe HTTP Dynamic Streaming, still useful in some production environments due to its low latency even though Flash is discontinued. Not suitable for end users since it will not work on Apple mobile devices. |


The MPEG Taxonomy

The MPEG standards replace or enhance their earlier counterparts: MPEG-2 improves on MPEG-1, and MPEG-4 adds further enhancements. After MPEG-4, the major parts of the standard are broken out into separate standards, which themselves have many parts.

Support For Streaming In MPEG-4

MPEG-4 incorporates many technologies that are useful to streaming service architects. ISO 14496 part 8 provides these specifications to support streamed content:

  • A framework for the carriage of MPEG-4 coded content on IP networks.
  • Guidance on how to design RTP payload formats.
  • Payload fragmentation.
  • Concatenation rules.
  • How to deliver related session information with SDP (see the sketch after this list).
  • MIME type definitions for MPEG-4 content.
  • Advice on RTP Security.
  • Guidance on Multicasting.

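As an illustration of the SDP point above, here is a minimal, hypothetical session description for an MPEG-4 video stream carried over RTP, wrapped in JavaScript for consistency with the other sketches (the addresses come from documentation ranges, and payload type 96 is from the dynamic range):

```javascript
// A hypothetical SDP description announcing an MPEG-4 video stream over RTP.
const sdp = [
  "v=0",                            // protocol version
  "o=- 0 0 IN IP4 203.0.113.1",     // originator of the session
  "s=Example MPEG-4 video session", // session name
  "c=IN IP4 233.252.0.1/64",        // multicast connection address and TTL
  "t=0 0",                          // unbounded session time
  "m=video 5004 RTP/AVP 96",        // media: RTP video on port 5004
  "a=rtpmap:96 MP4V-ES/90000",      // payload 96 = MPEG-4 visual, 90 kHz clock
  "a=fmtp:96 profile-level-id=1",   // codec-specific parameters
].join("\r\n");

console.log(sdp);
```
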
This topic is also addressed in some IETF RFC documents.

Conclusion

Recent growth in streaming services is driven by competition to gain market share and retain subscribers. But developing your own technology carries an opposing cost. Adopting common standards is beneficial where off-the-shelf technology can be used.

The following technologies are likely to dominate streaming protocols in the future:

  • SRT
  • HLS
  • WebRTC
  • MPEG-DASH
  • CMAF

Of these, HLS is considered to be the most useful and popular due to its support for DRM, closed captions, advertising inserts and security, and its suitability for mobile client devices. Performance is much improved by the addition of CMAF support, which reduces latency. Because it is delivered via HTTP, it does not suffer from firewall issues.

We should not exclude the others, as they have useful applications in niche areas of the production workflow where the boundary firewall is of no consequence.
