OTT (Or Is It ABR?) - Part 1 - The Challenges To Be Solved

Over-The-Top (OTT) started life as a means to deliver file-based assets for on-demand viewing, and most of its key characteristics echo that origin.

This article was first published as part of Essential Guide: OTT (or is it ABR?)


OTT delivery of video services is commonly used to mean adaptive bit rate (ABR) streaming over an unmanaged network. ABR as a technology can also be deployed over managed networks. 

The two terms are often used interchangeably. The danger is that using “OTT” narrows the range of technologies considered, so the terms deserve to be used unambiguously.

OTT offered, and continues to offer, a valuable means to extend the reach to more content and to more devices than the existing broadcast / satellite / cable / IPTV systems were able to achieve. However, the danger is that we ignore the key characteristics of those other mechanisms, when attempting to create a one-size-fits-all approach.

ABR’s key attributes are:

  • Working with non-STB clients, reaching additional devices and viewers.
  • Adapting to different client capabilities.
  • Rate adjustment based on network throughput estimation.
  • Simpler and more cost-effective personalization and advertising mechanisms.
  • Potential for common live and on-demand formats.
  • Fast introduction of new capabilities, because clients supporting a new capability can elect to use the new services while the backward-compatible services remain available.

Live TV’s key attributes are:

  • Excellent scalability for live, with very high bandwidth efficiency, due to the inherently multicast nature of the delivery.
  • Low cost for popular content, because costs scale with channels rather than subscribers.
  • Low latency because no additional stages of buffering are needed to make it a reliable delivery system.
  • High quality of content viewing experience.
  • Extremely high availability.

Clearly, neither set of attributes is a superset of the other, so the ambition is to build systems that appropriately balance key attributes against complexity, accepting that the right balance need not be the same in every case.

All Content Is Not Equal – Neither Are All Screens

Some content is more valuable than other content, and the same is true for screens. For example, it is easy to see how a large screen showing a major live sports event supports more value than a mobile phone screen showing clips of user-generated content. On the other hand, mobile phones and tablets offer accessibility: media can be consumed wherever you are.

Thus, there are good attributes of both categories of screens, and there are both overlaps and differences between those sets of values.

Another important aspect of the difference between the two viewing scenarios is extremely simple: screen size (and resolution).

Unsurprisingly, fixed viewing infrastructure has much larger screens than hand-held devices, and this disparity is growing as consumers increasingly purchase Ultra-High Definition (UHD) capable TVs. Regardless of whether a UHD service is actually available, the proportion of screens larger than 49” diagonal increased from 20% of sales to 33% between 2015 and 2019, which means any video artefacts are more easily visible.

The larger screens used for fixed viewing infrastructure mean that a higher resolution and quality image must be delivered, and thus a much higher bit rate is used than for small screens. For example, an AVC encoded 1080p TV service might typically operate at 7 Mbit/s, whereas a handheld device would typically be using around 1 Mbit/s.

Connecting The Content

OTT, by definition, uses the internet to deliver content and while it’s tempting to visualize the internet as an essentially unlimited resource, the reality is that the internet is in fact a network of real connections, each with a finite capacity.

Not least is the consideration that the last mile connection into the homes is typically either shared capacity between homes (as for cable), or a limited capacity per home connection (such as a DSL connection). The last mile is therefore a significant consideration when looking at real-world delivery, whether it’s provided by a telco or a cable operator. The consequences of the constraints differ between shared vs per-home capacity limits.

For DSL telco connections, the main limitation is the DSL capacity, which is largely a function of distance from the DSLAM. VDSL, VDSL2, etc. have significantly improved the capacity for homes close to the DSLAM; further away (e.g. > 1.5 km), however, the bit rates are limited to roughly those of ADSL2.

For homes beyond about 1 km, the maximum line rate is approximately 25 Mbit/s. With this in mind, it is easy to see that 2x UHD HEVC feeds into the home might sum to roughly this capacity. The line rate is a hard limit, so any contention will result in queueing and maybe packet loss.

For cable connections, DOCSIS 3.1 provides wide-bonded carriers with spectral efficiency close to Shannon’s limit. Depending on the exact configuration, this typically leads to an average sustained downstream rate of around 40 Mbit/s per home, reducing to about half that for peak hours (based on 1GHz spectrum, 400 homes per node, etc). One benefit with DOCSIS 3.1 vs DSL is that because the capacity is shared, individual client short-term peaks can be absorbed into the larger pool. Of course, this can’t help high sustained rate demand (such as when there’s a very popular event).
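To make these access-network limits concrete, the sketch below counts how many video streams fit into each of the sustained rates quoted above. It uses only figures from this article (25 Mbit/s for longer DSL lines, 40 Mbit/s average and about 20 Mbit/s at peak hours for DOCSIS 3.1, 7 Mbit/s for an AVC 1080p TV service and 1 Mbit/s for a handheld). The function and variable names are illustrative, and the calculation assumes the whole line is available for video with no headroom for other traffic.

```python
# Sustained downstream rates quoted in the article, in Mbit/s.
ACCESS_RATES_MBPS = {
    "DSL beyond about 1 km": 25.0,
    "DOCSIS 3.1 average": 40.0,
    "DOCSIS 3.1 peak hours": 20.0,  # "about half" of the average rate
}

# Typical stream rates quoted in the article, in Mbit/s.
HD_TV_MBPS = 7.0      # AVC-encoded 1080p TV service
HANDHELD_MBPS = 1.0   # typical handheld device


def max_concurrent_streams(access_mbps: float, stream_mbps: float) -> int:
    """Whole number of streams that fit in the access rate.

    Assumes the entire rate is usable for video (no headroom for
    other traffic), which is optimistic.
    """
    return int(access_mbps // stream_mbps)


if __name__ == "__main__":
    for name, rate in ACCESS_RATES_MBPS.items():
        hd = max_concurrent_streams(rate, HD_TV_MBPS)
        handheld = max_concurrent_streams(rate, HANDHELD_MBPS)
        print(f"{name}: {hd} HD TV streams or {handheld} handheld streams")
```

On these numbers, a cable home at peak hours has room for only two HD TV streams at broadcast-like rates, which is why sustained rather than advertised peak rates matter for streaming.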

Very often, cable DOCSIS rates are advertised as peak rates. While the peak rate is undoubtedly real and valuable when downloading files or web content, streaming video depends on sustained rates, which are usually not advertised.

FTTH offers higher rates per-home, with the only real-world contention occurring further back in the network. The exact set of constraints depends on the detailed topology and throughput of each stage of the network.

In both telco and cable cases, making a step-change to the per-home capacity is an expensive investment (either FTTH or node-splits), so there is likely to be a good business case for more efficient delivery that avoids these infrastructure upgrades.

Viewing Habits

Viewing habits have changed significantly, with new expectations for on-demand content set by online streaming services. Each of these on-demand sessions is, of course, unique and so a simple unicast model is appropriate. ABR delivery is ideal for the on-demand unicast portion of the viewing.

In general, while the overall amount of viewing has increased, there are still distinct peaks which correspond to breakfast and evening viewing, and not surprisingly, the most valuable content (and hence advertising slots) occur at these times. The high value content also has a high likelihood of being watched on the main TV in the household.

Figure 1: Viewing demand profile.

At peak hours, and even more so when there’s a significant live, or socially live, event, the peak is driven mainly by relatively few real-time content sources. Whether or not the content itself is genuinely produced live is not important: the key aspect is that it is consumed by large numbers concurrently. Social media plays a role in keeping the “live edge” relevant for produced content. Many shows have an associated social media chat value window that is very close to the live availability edge, so they have most of the same consumption characteristics as genuinely live content.

For popular live or socially live events, delivering the same content through a unicast-per-consumer approach can lead to poor efficiency, and as a by-product, lead to congestion and poor quality.
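As a rough illustration of why unicast-per-consumer scales poorly for popular events, the sketch below compares the load on a shared link under the two delivery models. The 400-home node and the 7 Mbit/s HD rate come from earlier in this article; the assumption that half of those homes watch the same event is purely illustrative, and the function names are hypothetical.

```python
def unicast_load_mbps(viewers: int, stream_mbps: float) -> float:
    """Unicast: every viewer receives a separate copy of the stream."""
    return viewers * stream_mbps


def multicast_load_mbps(channels_watched: int, stream_mbps: float) -> float:
    """Multicast: one copy per channel, however many viewers share it."""
    return channels_watched * stream_mbps


if __name__ == "__main__":
    homes_per_node = 400                 # figure quoted for a cable node
    homes_watching = homes_per_node // 2  # illustrative assumption
    hd_mbps = 7.0                        # AVC 1080p TV service

    unicast = unicast_load_mbps(homes_watching, hd_mbps)
    multicast = multicast_load_mbps(1, hd_mbps)
    print(f"Unicast load on the node:   {unicast:.0f} Mbit/s")
    print(f"Multicast load on the node: {multicast:.0f} Mbit/s")
```

The unicast load grows with the audience, whereas the multicast load depends only on the number of distinct channels being watched, which is the scaling property listed among live TV’s key attributes.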

When comparing today’s live viewing between conventional TV delivery and OTT streamed delivery, we can see from recent sports events that:

  • An order of magnitude more viewers today watch on TVs.
  • The bit rates used for TV are an order of magnitude higher than the average streamed rates for the same event (i.e. broadcast quality video uses approximately 10x the bit rate that is usually delivered to OTT clients).

Scaling For The Viewing Peak

It is useful to take a real-world example to understand viewing peak requirements: the Super Bowl typically has around 100M US viewers, and other major sports and political events exceed 30M US viewers.

To supply the entire US TV audience with an equivalent bit rate, using public CDN, OTT delivery would require around 350 Tbit/s of capacity for just that one event’s viewing, assuming one TV per two viewers on average. To put that into perspective, Akamai’s global traffic capacity supplied around 60 Tbit/s during the 2017 World Cup, despite the official facts and figures indicating a lower capacity of 30 Tbit/s requiring 240,000 servers (the difference is most likely due to the access patterns: video has relatively large file transfers, reducing the relative request rate compared to generic data).
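The 350 Tbit/s figure can be reproduced with simple arithmetic. The sketch below assumes that “an equivalent bit rate” means the 7 Mbit/s AVC 1080p TV rate quoted earlier in this article; that reading is an assumption, although it does reproduce the stated total. The function name and the optional scaling parameter are illustrative.

```python
def peak_capacity_tbps(viewers: float,
                       viewers_per_tv: float,
                       stream_mbps: float,
                       bitrate_scale: float = 1.0) -> float:
    """Aggregate unicast capacity in Tbit/s for one event.

    bitrate_scale allows a what-if on the per-stream rate
    (1.0 means the stream rate is used unchanged).
    """
    tvs = viewers / viewers_per_tv
    total_mbps = tvs * stream_mbps * bitrate_scale
    return total_mbps / 1e6  # 1 Tbit/s = 1,000,000 Mbit/s


if __name__ == "__main__":
    required = peak_capacity_tbps(viewers=100e6,
                                  viewers_per_tv=2,
                                  stream_mbps=7.0)  # assumed HD TV rate
    cdn_reference_tbps = 60.0  # CDN traffic figure quoted in the article
    print(f"Required capacity: {required:.0f} Tbit/s")
    print(f"Multiple of the quoted CDN figure: "
          f"{required / cdn_reference_tbps:.1f}x")
```

With these inputs the requirement is 350 Tbit/s, a little under six times the 60 Tbit/s quoted for the CDN. Setting `bitrate_scale=2.5`, the rough UHD increase mentioned in this article, raises the requirement to 875 Tbit/s.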

The issue here is that the massive peak is required only for the extremely high concurrency that happens during the event, and the typical daily peak is very much smaller: in other words, there would need to be a large investment to support occasional very high demand peaks.

The peaks are liable to increase further as adoption of UHD increases (even though HEVC will be used for UHD, the increase in resolution and frame rate causes roughly 2.5x bit rate increase).

Latency

While latency is of little concern for on-demand content, for live or socially live content, the picture is very different.

The exact sensitivity does vary between different types of content and between geographic regions, but wherever there are highly popular events, the difference between the latency of delivery via different paths can be an issue. In general, a difference of up to about 10 seconds is usually considered acceptable; most OTT streaming services today have much more latency than that and it is considered one of the biggest issues for OTT delivery.

Video Formats

MPEG-4 AVC is the de facto standard for OTT today; however, more recent coding standards have emerged that offer better efficiency. HEVC was standardized in 2013 and can now be considered a mature technology with widespread hardware support in higher performance client devices. The slow replacement cycle of STBs, due to the investment cost to the network operator, has meant adoption in STBs has been comparatively slow, although HEVC is ubiquitous for UHD and/or HDR capable STBs.

As client adoption improves, a migration away from AVC to more efficient coding standards becomes valuable, with the additional cost of producing more formats (since the AVC ones will be required for many years yet) being outweighed by either the reduced CDN costs, or by the increase in value of UHD or HDR services. Of course, for hand-held devices, UHD has a limited benefit because of the small screen size, and HDR in isolation, although visually compelling, is not a simple marketing message. Nevertheless, it is reasonable to expect the improved efficiency of delivery will drive adoption of more efficient compression schemes.

HEVC is well understood and, despite some licensing questions, can be considered a straightforward option to improve delivery costs for popular content.

AV1 has recently been made available as a reference design and has gained a large amount of publicity, with various competing claims about its performance relative to HEVC. Because it is immature, implementations are inefficient from a compute perspective, and there is concern about the decode processor resources required. A comprehensive MediaKind white paper is available explaining the comparisons. In summary, AV1 does not appear to show a material advantage over HEVC and currently requires significantly more compute resource; bit rate efficiency therefore does not seem to be a good motivation to use AV1, although the royalty-free aspect may appeal to some. Several further coding initiatives are in progress, including EVC, which aims for similar performance to HEVC with friendlier licensing terms. EVC is complicated, however, by having a “base” toolset and a “main” toolset, where “main” is not a superset of “base”.

In the meantime, the next international standard from the International Telecommunication Union (ITU) and the Moving Picture Experts Group (MPEG) is being developed. Versatile Video Coding (VVC) is targeted for completion in 2020, with a goal of roughly 50% bit rate reduction versus HEVC. Initial proposals have shown this performance step is on track to be achieved.

Environmental Sustainability

For popular channels, it is clear that conventional delivery (DTT in the case of the BBC report in ITU Kaleidoscope 2016) requires only a very small fraction of the energy of OTT distribution (denoted in the report as IP). Conversely, for rarely watched content (BBC Parliament in the case cited in the report), OTT provides a more energy-efficient means of delivery. From an environmental sustainability perspective, a hybrid broadcast + OTT configuration would appear optimal.

In addition, it is reasonable to assume a strong correlation between energy efficiency and cost efficiency.
