OTT (Or Is It ABR?) - Part 2 - The Mechanics Of ABR Delivery

ABR delivery offers the prospect of automatic adaptation of bit rates to the prevailing network conditions. Since the client is aware of the set of available bit rates, it can determine, segment-by-segment, which is the optimal bit rate to use.

This article was first published as part of Essential Guide: OTT (or is it ABR?)


The aim is to receive the highest rates that can be delivered given the network availability (i.e. based on measured download rate) and avoiding too many rate step-changes. There are two scenarios that the clients should avoid: a) running at too low a bit rate (since that will result in more quantization artifacts and possibly reduced resolution), and b) avoiding too many changes in selected rate (both directions of change cause a visual disturbance to the viewer, but if there is a need for a “panic” change down – i.e. when a segment at a given rate is failing to be downloaded quickly enough – the visual disruption is severe).

One unfortunate aspect is that ABR clients create bursty, periodic traffic, and while that’s not an issue if there is only one client, with multiple ABR clients in a shared capacity, the measured download rate can be unstable or unfair. The degree of instability is a function of the responsiveness of the client to changes in measured rate: too fast and the rates will change frequently, too slow and there is the risk of congestion and a resulting panic rate change. Stability is also a function of the segment length.

Understanding The Challenge of Low Latency ABR

Achieving low latency for OTT, or even for ABR on-network, is not trivial. To understand why, it’s important to consider the way data is formatted and transferred, and how networks and clients behave. A typical configuration today will use 6 second segments, with 30 or 60 frames per second, depending on the profile. In the sections below, the diagrams show a reduced number of frames per segment, just for clarity.

Figure 2: ABR live workflow.

Figure 2: ABR live workflow.

Live content is initially processed by the live transcoder in real time. Some latency is incurred at that point, however in principle that latency is equivalent to the conventional TV’s buffering delay, so is not included here for the comparison. Encoded video is fed to the packager as a continuous data flow (with embedded boundary point markers denoting the start & end of each segment), where it is formatted into segment files and published onto the origin server. The segment duration is typically about 6 seconds and it is not until the complete segment is placed on the origin that the manifest can be updated to advertise its availability. Once it is available, it can be pulled by a client via the CDN (which caches in the process) using a single HTTP transfer. Whole file transfers must take place and unless the CDN bandwidth is very much higher than the sum of the data being moved across it, each file will take a time that is less than a full segment, but still finite, to be guaranteed to be moved. The CDN may not provide the data from the cache until the segment is fully received.

At the client, there are typically three segments in operation: one being decoded, one ready for decode next and one being received. The reason for the “next segment” to be complete and waiting is due to the need to adapt bit rate. Each time bit rate adapts significantly, the perception of quality is impaired (the quality discontinuity is visually unpleasant.  Since each client makes an autonomous decision about what bit rate to attempt to download, there is a need for it to:

  • Measure download rate with a reasonable accuracy.
  • Determine whether the segment will be received sufficiently ahead of time.
  • Abandon an HTTP transfer, start a new one and complete the download without underflowing the buffer.

Consider what happens when the available bandwidth suddenly reduces, for example because of competing traffic on the network connection.

Figure-3 shows a single 3-representation (or “profile”) example, with a, b and c designating the representation, and 1, 2, 3 and 4 showing the time sequence. Since the content is being encoded live, there is an earliest availability point, after which it can be downloaded. In the absence of any restriction, the highest representation will be downloaded.

Figure 3: Segments available over time to be downloaded by client.

Figure 3: Segments available over time to be downloaded by client.

Next consider the case of competing traffic causing the transfer rate to reduce. Having downloaded 2a, the client next tries to download 3a. This time, however, because of the competing traffic, the rate is much lower, meaning the transfer would not complete in time. In order for the client to be able to make that decision, enough data must have been downloaded to give a reasonable accuracy for the measurement. While there is no specific right answer to how long to measure, about 2 seconds’ worth of data appears to give a reasonable balance between time taken and accuracy. Since there is inevitably some uncertainty, clients ask for a lower bit rate representation than they measure as the capacity: about 70% is typical. Without this, the clients would spuriously adapt rates too frequently, causing frequent visual disturbance.

Figure 4: Rate adaptation by the client device.

Figure 4: Rate adaptation by the client device.

Once the client has made the decision to abandon a transfer and drop to a lower rate, some time will have elapsed (i.e. the time taken to make the measurement and consequently the decision). The alternative representation for the same segment in time therefore must be downloaded in substantially less time than normal, which is why clients often drop all the way to the lowest rate representation when adapting downwards. Unfortunately, this also maximizes the visual disturbance caused by the adaptation.  If shorter segments were used, the period available for measurement would be shorter, leading to a corresponding increase in measurement volatility (and therefore frequency of spurious adaptation). This can be mitigated to some extent by reducing further the 70% figure, but either way, using shorter segments to reduce latency will lead to a reduction in quality.

The client’s need for measurement is fundamental to the client being able to adapt autonomously. Other system delays, however, are not fundamental and there exist mechanisms to improve the latency.

Overall, the end-to-end latency looks similar to Figure 5, with complete segments moving between each stage.

Figure 5: Overall end-to-end latency with conventional ABR and HTTP transfer.

Figure 5: Overall end-to-end latency with conventional ABR and HTTP transfer.

Another aspect to consider is that a client also needs to guess when to next request a segment (or in the case of HLS, when to poll for the next update to the manifest). This results in further latency as the client needs to avoid asking too early, so must apply some degree of conservatism.

Part of a series supported by

You might also like...

IP Security For Broadcasters: Part 8 - RADIUS Network Access

Maintaining controlled access is critical for any secure network, especially when working with high-value media in broadcast environments.

Standards: Part 25 - Designing Client-Side Video Players

Here we chart the historical development of client-side video players, describe the building blocks used to create them and the relevant standards.

Microphones: Part 5 - The Variable Directivity Microphone

The variable directivity microphone is very popular for studio work. What goes on inside is very clever and not widely appreciated.

IP Security For Broadcasters: Part 7 - Operating Systems

As well as providing the core functionality of a computer, operating systems have the potential to be a primary issue for security and keeping hackers at bay.

The Creative Challenges Of HDR-SDR Simulcast

HDR can make choices easier - or harder - at every stage of production but the biggest challenge may be just how subjective those choices are.