Standards: Part 11 - Streaming Video & Audio Over IP Networks
Streaming services deliver content to end users via an IP network connection. The transport process is similar to broadcasting and shares some of the same technologies, but there are some unique caveats.
This article is part of our growing series on Standards.
There is an overview of all 26 articles in Part 1 - An Introduction To Standards.
Broadcast content is gradually being displaced by streaming directly to viewers via an IP network. The streaming server is presented with a file which it then delivers to the client players. That content might be a static file prepared beforehand or a virtual file arriving as an external live feed.
The underlying technology solutions have been around for a long time. More recent innovation has improved performance by alleviating bottlenecks and deploying novel solutions that make transport more efficient:
- Edge serving.
- Adaptive bitrate streams.
- Reducing latency with the Common Media Application Format.
- Multicasting.
Streaming Transport
The transport of audio-visual media over IP networks is largely independent of the coding choice, although some streaming protocols dictate particular codecs for carriage.
The video and audio content are embedded together in a similar way to a broadcast program stream. Additional streamed tracks with synchronized text are delivered separately.
In addition to timing and synchronization, Part 1 of each of the MPEG-1, MPEG-2 and MPEG-4 standards also describes how multiple streams can be embedded and multiplexed into a single transport stream.
Metadata from the content store is delivered within the web page content for the player implementation to use. This supplemental data can configure the player and manage viewing permissions. It can be embedded as special tags, attributes of tags or directly as JavaScript variable assignments in a parameter bridge.
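As an illustration, the same item of metadata might arrive both as a tag attribute and as a JavaScript variable assignment. Every name and value in this sketch is invented for illustration, not any particular player's API:

```html
<!-- Sketch of a parameter bridge; names and values are invented placeholders. -->
<video id="player" src="movie.mp4" data-asset-id="example-asset-001" controls></video>
<script>
  // Metadata from the content store, written into the page by the server.
  var playerConfig = {
    assetId: "example-asset-001", // matches the data-asset-id attribute above
    allowPlayback: true,          // viewing permission flag
    startQuality: "720p"          // initial rendition hint for the player
  };
</script>
```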
HTML5 introduced the <video> tag as a first-class citizen within the web page. This eliminates the internal process and memory management issues that browsers had with plug-ins.
The HTML5 <video> tag can have many child <track> elements inside its tag body. Each <track> can be associated with a JavaScript event handler. The <video> player knows when the texts need to be presented and triggers an event at the right time. The event payload is the individual timed text string. The JavaScript handler can manage that event in whatever way it chooses. It could parse the text for instructions to display an image in a box, or present the text as a caption. It could place a marker on the screen and move it around in a synchronized fashion, or change the aspect ratio of the video frame.
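For example, timed text cues can be intercepted through the standard textTracks interface. A minimal sketch, with placeholder file names:

```html
<video id="player" src="movie.mp4" controls>
  <track kind="metadata" src="cues.vtt" srclang="en">
</video>
<script>
  const video = document.getElementById('player');
  const track = video.textTracks[0];
  track.mode = 'hidden'; // receive cue events without default caption rendering

  // Fired by the player whenever the set of active cues changes.
  track.addEventListener('cuechange', () => {
    const cue = track.activeCues[0];
    if (cue) {
      // The payload is the timed text string; handle it however we choose.
      console.log(`Cue at ${video.currentTime.toFixed(2)}s: ${cue.text}`);
    }
  });
</script>
```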
Edge Serving
As streaming became more popular, the providers found they could not serve the huge number of streams required from a single central content store. Akamai introduced arrays of edge servers placed near the highest-traffic locations, connected back to the central store by very fast links.
An edge server caches the files locally and serves them from there. When the content changes (which is infrequent), the latest version must be deployed to the caches.
The edge servers need sufficient storage, and it is unrealistic to replicate the entire central repository at every edge server. Only the first part of each video asset is needed right away. The remainder of the asset is requested over a very fast link during the time it takes to watch that first segment, and the stream is then spliced to read from the complete copy instead of the cached introduction.
Streaming service providers might give themselves extra time to acquire the full copy by presenting interstitial idents, copyright messages and trailers for other upcoming content. This would make the stream switching more straightforward and remove the need for storing abridged introductions.
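The cache-and-prefetch behavior described above might look something like this sketch, where fetchFromOrigin and the segment bookkeeping are assumed placeholders rather than a real CDN API:

```javascript
// Minimal sketch of edge cache-and-prefetch; fetchFromOrigin is a
// hypothetical function, not part of any real CDN product.
const cache = new Map(); // "assetId/index" -> segment data

async function serveSegment(assetId, index, totalSegments, fetchFromOrigin) {
  const key = `${assetId}/${index}`;
  if (!cache.has(key)) {
    // Cache miss: fetch the requested segment from the origin right away.
    cache.set(key, await fetchFromOrigin(assetId, index));
    // Prefetch the remainder over the fast origin link while the viewer
    // is still watching the opening segments.
    for (let i = index + 1; i < totalSegments; i++) {
      const k = `${assetId}/${i}`;
      if (!cache.has(k)) {
        fetchFromOrigin(assetId, i).then(data => cache.set(k, data));
      }
    }
  }
  return cache.get(key);
}
```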
Adaptive Bitrate (ABR) Streaming
This was originally described as Quantum Streaming and is also known as Multi-Bitrate Streaming. The content stream is broken into many small segments at different bitrates, which are reassembled as they are received by the client player.
The server encodes the video multiple times into alternative streams of varying quality, optimized for different delivery bandwidth ranges. These are then sliced into short segments, typically between 2 and 10 seconds long. Carefully managing the job queues so the coding runs in parallel on multiple processors can speed up the entire process.
During a session, the performance of the connection is measured to determine the available bandwidth, and an appropriately sized stream of segments is selected. This content negotiation is apparent to the viewer because the stream initially starts with a very low-quality picture, which quickly improves up to the optimum bitrate available.
The packets are delivered via HTTP, which runs over TCP. TCP guarantees the delivery of all packets at the expense of some latency due to buffering and retransmission. HTTP also enables the content to traverse corporate firewalls, which typically block most kinds of traffic but allow web content through in a controlled manner.
This is a resilient way to deliver streamed content, because if the available bandwidth is reduced momentarily, a lower-quality segment can be delivered instead. Thus, the viewing experience is maintained, albeit with a reduction in quality.
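The selection logic is simple at heart. A minimal sketch of the bandwidth-driven rendition choice, where the bitrate ladder and safety margin are illustrative assumptions rather than values from any standard:

```javascript
// Illustrative ABR rendition ladder (bitrates in kbit/s); real services
// publish their own ladders in the DASH/HLS manifest.
const ladder = [235, 750, 1750, 3000, 5800];

// Pick the highest rendition that fits within the measured throughput,
// keeping a safety margin so momentary dips do not stall playback.
function chooseRendition(measuredKbps, safetyFactor = 0.8) {
  const budget = measuredKbps * safetyFactor;
  let choice = ladder[0]; // always fall back to the lowest rendition
  for (const rate of ladder) {
    if (rate <= budget) choice = rate;
  }
  return choice;
}

console.log(chooseRendition(4000)); // -> 3000
```

Real players typically also smooth the throughput estimate over several segments, so a single slow download does not trigger an unnecessary switch to a lower rendition.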
Adaptive techniques are available with these streaming protocols:
| Implementation | Description |
|---|---|
| MPEG-DASH | ISO 23009 - the international standard for Adaptive Bitrate Streaming. |
| HLS | HTTP Live Streaming, used for delivery to Apple iOS devices. |
| HDS | Adobe HTTP Dynamic Streaming. Previously used for Flash-based players. |
| MSS | Microsoft Smooth Streaming. Sometimes repackaged into HLS-compatible streams without the need for re-encoding. |
Although this improves the viewing experience for end users, the production and deployment require additional expenditure on processing and storage capacity to hold the multiple bitrate copies of all the content.
The content can be delivered with a simple web server rather than deploying a specialized streaming server.
There is still some work to do in the area of securing the streams and adding rights control to manage access to content.
Latency Issues Solved With CMAF
All streaming services suffer from latency, and Adaptive Bitrate delivery is particularly prone to it. Latency is the delay between content entering the coder and it being presented to the end user on their viewing device. There are several contributing causes:
- Delivery to the ingest platform.
- Ingesting process.
- Coding into the streaming compatible format.
- Length of the Group of Pictures (the GOP configuration of the coder).
- Assembly into a transport stream.
- Delivery to the end user, which a poor communications link can delay because TCP retransmits packets rather than dropping them.
- Transmission technology (3G, 4G, 5G).
- Unpacking of the stream and decoding the content on the device.
- Buffering in the receiving player.
Although HLS is considered to be a leading contender for streaming services, its latency can be as much as 30 seconds. This is too long for live sporting events, where the audience may hear a goal scored via a radio broadcast well before seeing it on screen. The content requires several preparation steps before delivery, and each one adds delay.
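As a rough worked example of how a conventional chain accumulates that much delay (every figure below is an assumption chosen for illustration, not a specification):

```javascript
// Rough, illustrative latency budget for a conventional HLS chain.
// Every figure is an assumption for this worked example, not a spec value.
const secondsOfDelay = {
  ingestAndEncode: 4,    // delivery to the ingest platform, ingesting and coding
  segmentFill: 6,        // a whole 6-second segment must exist before publishing
  playerBuffer: 18,      // players commonly buffer about three 6-second segments
  transportAndDecode: 2  // network delivery, unpacking and decoding
};

const total = Object.values(secondsOfDelay).reduce((sum, s) => sum + s, 0);
console.log(`End-to-end latency is roughly ${total} seconds`); // about 30
```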
A relatively new approach called the Common Media Application Format (CMAF) turns the coding and embedding problem inside out. Instead of slicing a previously coded stream into packets, the raw content is first sliced into short segments ready for adaptive delivery and then encoded fragment by fragment.
The coding of multiple segments can be distributed across a multi-processing grid. Modern computers have multiple CPU processors which can each tackle a segment coding task independently and in parallel.
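Because each raw segment is independent, the fan-out is easy to express. A minimal sketch, where encodeSegment is a hypothetical stand-in for a real encoder invocation on one CPU core or grid node:

```javascript
// Sketch of fanning segment encoding out across parallel workers.
// encodeSegment(segment, index) is an assumed placeholder, not a real API.
async function encodeAll(rawSegments, encodeSegment) {
  // Each raw segment is independent, so all of them can be coded at once.
  return Promise.all(rawSegments.map((segment, i) => encodeSegment(segment, i)));
}
```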
Read ISO 23000 part 19 for the full specification of CMAF.
Multicasting Techniques
Edge servers are good for offloading video on demand services, but live streams are not amenable to this approach because they are not delivered from stored files. Multicasting is a useful alternative and can be managed in a variety of ways:
- Ethernet multicasting.
- IP Multicast.
- Application layer.
- Peer-to-peer casting.
The streams are all identical, so Adaptive Bitrate streaming is difficult to deploy without additional copies of the multicast at different bitrates. Switching from one multicast to another requires that they are synchronized, which is challenging when they might arrive by different network routes. These are simplified descriptions of each multicasting technique:
- With Ethernet Multicasting, all packets are delivered to all destinations on the local area sub-net. Any interested node can then pick up the streams from there. Routers can forward the packets to other networks depending on filter configurations. This is wasteful of network capacity on cables where it is not required.
- IP Multicasting is a more controlled form of multicasting where the clients receive an invitation and then join the streams that they want to. This works on wired or Wi-Fi networks. The networks need to be carefully managed with the router and switch configurations (see IGMP). This is better suited to corporate environments rather than random public access.
- Multicasting can be simulated at the Application Layer using unicast streams. The server vends a single stream per child process and a client application connects to it. Each client requires a separate stream from the server. The limiting factor here is the number of streams the server can deliver. There are some variations of this approach that use front-end processors implemented either as hardware or software to replicate the streams.
- Peer-to-peer multicasting uses the WebRTC technology described in the W3C standards. Clients can forward content to their peers over a direct socket connection without a central media server, although a signaling channel is still needed to set up each connection.
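The browser's standard RTCPeerConnection API underpins the peer-to-peer approach. A minimal sketch, assuming a sendToPeer function for the signaling channel and an illustrative STUN server address:

```javascript
// Minimal sketch of a WebRTC peer connection forwarding media to a peer.
// sendToPeer is an assumed placeholder for the signaling channel, and the
// STUN server URL is illustrative only.
const pc = new RTCPeerConnection({
  iceServers: [{ urls: 'stun:stun.example.org' }]
});

// Forward the tracks of a received MediaStream on to the next peer.
function forwardStream(stream) {
  stream.getTracks().forEach(track => pc.addTrack(track, stream));
}

// ICE candidates and the session description travel over the signaling channel.
pc.onicecandidate = event => {
  if (event.candidate) sendToPeer({ candidate: event.candidate });
};

async function offerConnection() {
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  sendToPeer({ sdp: pc.localDescription });
}
```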
Read IETF RFC 3376 to see how the Internet Group Management Protocol (IGMP) signals are used in the context of multicasting, allowing many devices to subscribe to a stream delivered to a shared group IP address.
Multicasting will get easier to deploy. There are other solutions handled at the network level which have varying degrees of complexity and require sophisticated network configuration. New solutions are being researched that will address the current downsides of multicasting.
Standards For Protocols & Supporting Technologies
Streaming services and traditional broadcasting use many of the same standards although they may apply them differently. Some of them began as proprietary specifications that have been released as open standards and are now royalty free.
The standards are grouped into families according to which organization developed them. Some organizations collaborate to bring complementary expertise together:
- ISO
- MPEG
- ITU
- EBU
- SMPTE
- AES
- VSF
- IETF
- W3C
- Proprietary (various orgs)
These are the relevant standards:
| Standard | Name | Org | Description |
|---|---|---|---|
| ISO 13818-6 | DSM-CC | ISO/MPEG | MPEG-2 - Digital Storage Media - Command and Control. |
| ISO 14496-8 | | ISO/MPEG | MPEG-4 Part 8 - Video over IP. |
| ISO 14496-29 | WVC | ISO/MPEG | MPEG-4 Part 29 - Web Video Coding. |
| ISO 14496-31 | VCB | ISO/MPEG | MPEG-4 Part 31 - Video Coding for Browsers. |
| ISO 14496-33 | IVC | ISO/MPEG | MPEG-4 Part 33 - Internet Video Coding. |
| ISO 23000-5 | | ISO/MPEG | Media Streaming Application Format. |
| ISO 23000-9 | DMB | ISO/MPEG | Digital Multimedia Broadcasting Application Format. |
| ISO 23000-19 | CMAF | ISO/MPEG | Common Media Application Format for segmented media. Significantly improves latency in streamed services. |
| ISO 23001-8 | MPEG-B | ISO/MPEG | Systems technologies. Part 8 has been replaced by ISO 23091. |
| ISO 23006 | MPEG-M | ISO/MPEG | MPEG Extensible Middleware (MXM). |
| ISO 23009 | MPEG-DASH | ISO/MPEG | Dynamic Adaptive Streaming over HTTP. Expected to become more popular. It is codec agnostic, so you are not locked into any particular video coding standard; suitable codecs include AVC, HEVC and VP9. |
| ISO 23091 | CICP | ISO/MPEG | Coding Independent Code Points used for signaling. |
| ISO 29116 | | ISO/MPEG | Supplemental Media Technologies. |
| ISO 29116-1 | MXM | ISO/MPEG | Part 1 describes MPEG Extensible Middleware protocols. Previously described as 'Media Streaming Application Format Protocols'. |
| ST 2110-20 | | SMPTE | Uncompressed video transport, based on SMPTE 2022-6. |
| ST 2110-21 | | SMPTE | Traffic shaping and network delivery timing. |
| ST 2110-22 | | SMPTE | Constant bitrate compressed video transport. |
| AES3 | | AES | Serial transmission format for two-channel linearly represented digital audio data. |
| AES67 | | AES | High-performance streaming audio-over-IP interoperability. |
| RFC 1889 | RTP | IETF | Real-time Transport Protocol for delivering multimedia. Superseded by RFC 3550. |
| RFC 1890 | | IETF | RTP Profile for Audio and Video Conferences. Built on top of RFC 1889. Superseded by RFC 3551. |
| RFC 2326 | RTSPv1 | IETF | Real Time Streaming Protocol for control over the delivery of data with real-time properties. Version 1.0 is a very old format developed in the 1990s. |
| RFC 3016 | | IETF | RTP Payload Format for MPEG-4 Audio/Visual Streams. |
| RFC 3376 | IGMP | IETF | Signaling for multicast services. Referred to by ST 2110-10. |
| RFC 3550 | RTP | IETF | Real-time Transport Protocol. Obsoletes and replaces RFC 1889. |
| RFC 3550 | RTCP | IETF | Real Time Control Protocol for managing RTP services. Also referred to by the Video Services Forum TR-02 recommendation. |
| RFC 3551 | RTP/AVP | IETF | RTP Profile for Audio and Video Conferences. Built on top of RFC 3550. Obsoletes and replaces RFC 1890. |
| RFC 3640 | | IETF | RTP Payload Format for Transport of MPEG-4 Elementary Streams. |
| RFC 4566 | SDP | IETF | Session Description Protocol used for session announcement, invitation and other multimedia session initiation messages. |
| RFC 7826 | RTSPv2 | IETF | Real-Time Streaming Protocol Version 2.0. Obsoletes the version 1 protocol. Offers extremely low latency, which may be useful for some applications, but it is not resilient on unreliable networks and requires stable communications. |
| RFC 8216 | HLS | IETF | Apple HTTP Live Streaming, provided as an open specification. Very widely supported across browsers and devices, with many useful features that make it a leading contender. |
| RFC Draft | SRT | Haivision | The open-source Secure Reliable Transport for streaming. Currently described by an IETF draft document that is not yet ratified as a standard. Designed to minimize artifacts resulting from poor-quality communications links. Look for companies belonging to the SRT Alliance for supporting products. Has the potential to become very popular. |
| TR-02 | | VSF | Using RTCP for In-Band Signaling of Media Flow Status. Refers to RTCP as described in RFC 3550. |
| TR-06 | RIST | VSF | Reliable Internet Stream Transport. Under ongoing development by the Video Services Forum. |
| ASoH | | Netflix | Adaptive Streaming over HTTP, developed by Netflix and derived from the Apple HLS standard. |
| MSS | | Microsoft | Microsoft Smooth Streaming Protocol. An adaptive bitrate protocol, mainly used for Xbox gaming platforms. |
| WebRTC | | W3C | Web Real-Time Communication for browsers. Not just intended for peer-to-peer streaming; it can be used for any kind of real-time communication. Standardized by the W3C. Its popularity is gathering momentum and all the major browsers support it. |
| RTMP | | Adobe | A proprietary streaming format developed by Adobe for use with Flash-based players. Adobe now offers this in a semi-open fashion. Use is declining for public consumption, but it may still be useful in production workflows. |
| HDS | | Adobe | Adobe HTTP Dynamic Streaming, still useful in some production environments due to its low latency even though Flash is discontinued. Not suitable for end users since it will not work on Apple mobile devices. |
The MPEG Taxonomy
The MPEG standards replace or enhance their earlier counterparts: MPEG-2 improves on MPEG-1, and MPEG-4 adds further enhancements. After MPEG-4, the major parts of the standard were broken out into separate standards, which themselves have many parts.
Support For Streaming In MPEG-4
MPEG-4 incorporates many technologies that are useful to streaming service architects. ISO 14496 part 8 provides these specifications to support streamed content:
- A framework for the carriage of MPEG-4 coded content on IP networks.
- Guidance on how to design RTP payload formats.
- Payload fragmentation.
- Concatenation rules.
- How to deliver related information with SDP.
- MIME type definitions for MPEG-4 content.
- Advice on RTP Security.
- Guidance on Multicasting.
This topic is also addressed in some IETF RFC documents.
Conclusion
Recent growth in streaming services is driven by competition to gain market share and retain subscribers. But developing your own technology carries an opposing cost, so adopting common standards is beneficial where off-the-shelf technology can be used.
The future of streaming is likely to be dominated by these protocols and technologies:
- SRT
- HLS
- WebRTC
- MPEG-DASH
- CMAF
Of these, HLS is considered to be the most useful and popular due to its support for DRM, closed captions and advertising inserts, together with its security and suitability for mobile client devices. Performance is much improved by the addition of CMAF support, which reduces latency. Because it is delivered via HTTP, it does not suffer from firewall issues.
We should not exclude the others, as they have useful applications in niche areas of the production workflow where the boundary firewall is of no consequence.