OTT (Or Is It ABR?) - Part 2 - The Mechanics Of ABR Delivery
ABR delivery offers the prospect of automatic adaptation of bit rates to the prevailing network conditions. Since the client is aware of the set of available bit rates, it can determine, segment-by-segment, which is the optimal bit rate to use.
This article was first published as part of Essential Guide: OTT (or is it ABR?)
The aim is to receive the highest rates that can be delivered given the network availability (i.e. based on measured download rate) and avoiding too many rate step-changes. There are two scenarios that the clients should avoid: a) running at too low a bit rate (since that will result in more quantization artifacts and possibly reduced resolution), and b) avoiding too many changes in selected rate (both directions of change cause a visual disturbance to the viewer, but if there is a need for a “panic” change down – i.e. when a segment at a given rate is failing to be downloaded quickly enough – the visual disruption is severe).
One unfortunate aspect is that ABR clients create bursty, periodic traffic, and while that’s not an issue if there is only one client, with multiple ABR clients in a shared capacity, the measured download rate can be unstable or unfair. The degree of instability is a function of the responsiveness of the client to changes in measured rate: too fast and the rates will change frequently, too slow and there is the risk of congestion and a resulting panic rate change. Stability is also a function of the segment length.
Understanding The Challenge of Low Latency ABR
Achieving low latency for OTT, or even for ABR on-network, is not trivial. To understand why, it’s important to consider the way data is formatted and transferred, and how networks and clients behave. A typical configuration today will use 6 second segments, with 30 or 60 frames per second, depending on the profile. In the sections below, the diagrams show a reduced number of frames per segment, just for clarity.
Live content is initially processed by the live transcoder in real time. Some latency is incurred at that point, however in principle that latency is equivalent to the conventional TV’s buffering delay, so is not included here for the comparison. Encoded video is fed to the packager as a continuous data flow (with embedded boundary point markers denoting the start & end of each segment), where it is formatted into segment files and published onto the origin server. The segment duration is typically about 6 seconds and it is not until the complete segment is placed on the origin that the manifest can be updated to advertise its availability. Once it is available, it can be pulled by a client via the CDN (which caches in the process) using a single HTTP transfer. Whole file transfers must take place and unless the CDN bandwidth is very much higher than the sum of the data being moved across it, each file will take a time that is less than a full segment, but still finite, to be guaranteed to be moved. The CDN may not provide the data from the cache until the segment is fully received.
At the client, there are typically three segments in operation: one being decoded, one ready for decode next and one being received. The reason for the “next segment” to be complete and waiting is due to the need to adapt bit rate. Each time bit rate adapts significantly, the perception of quality is impaired (the quality discontinuity is visually unpleasant. Since each client makes an autonomous decision about what bit rate to attempt to download, there is a need for it to:
- Measure download rate with a reasonable accuracy.
- Determine whether the segment will be received sufficiently ahead of time.
- Abandon an HTTP transfer, start a new one and complete the download without underflowing the buffer.
Consider what happens when the available bandwidth suddenly reduces, for example because of competing traffic on the network connection.
Figure-3 shows a single 3-representation (or “profile”) example, with a, b and c designating the representation, and 1, 2, 3 and 4 showing the time sequence. Since the content is being encoded live, there is an earliest availability point, after which it can be downloaded. In the absence of any restriction, the highest representation will be downloaded.
Next consider the case of competing traffic causing the transfer rate to reduce. Having downloaded 2a, the client next tries to download 3a. This time, however, because of the competing traffic, the rate is much lower, meaning the transfer would not complete in time. In order for the client to be able to make that decision, enough data must have been downloaded to give a reasonable accuracy for the measurement. While there is no specific right answer to how long to measure, about 2 seconds’ worth of data appears to give a reasonable balance between time taken and accuracy. Since there is inevitably some uncertainty, clients ask for a lower bit rate representation than they measure as the capacity: about 70% is typical. Without this, the clients would spuriously adapt rates too frequently, causing frequent visual disturbance.
Once the client has made the decision to abandon a transfer and drop to a lower rate, some time will have elapsed (i.e. the time taken to make the measurement and consequently the decision). The alternative representation for the same segment in time therefore must be downloaded in substantially less time than normal, which is why clients often drop all the way to the lowest rate representation when adapting downwards. Unfortunately, this also maximizes the visual disturbance caused by the adaptation. If shorter segments were used, the period available for measurement would be shorter, leading to a corresponding increase in measurement volatility (and therefore frequency of spurious adaptation). This can be mitigated to some extent by reducing further the 70% figure, but either way, using shorter segments to reduce latency will lead to a reduction in quality.
The client’s need for measurement is fundamental to the client being able to adapt autonomously. Other system delays, however, are not fundamental and there exist mechanisms to improve the latency.
Overall, the end-to-end latency looks similar to Figure 5, with complete segments moving between each stage.
Another aspect to consider is that a client also needs to guess when to next request a segment (or in the case of HLS, when to poll for the next update to the manifest). This results in further latency as the client needs to avoid asking too early, so must apply some degree of conservatism.
Part of a series supported by
You might also like...
HDR & WCG For Broadcast: Part 3 - Achieving Simultaneous HDR-SDR Workflows
Welcome to Part 3 of ‘HDR & WCG For Broadcast’ - a major 10 article exploration of the science and practical applications of all aspects of High Dynamic Range and Wide Color Gamut for broadcast production. Part 3 discusses the creative challenges of HDR…
IP Security For Broadcasters: Part 4 - MACsec Explained
IPsec and VPN provide much improved security over untrusted networks such as the internet. However, security may need to improve within a local area network, and to achieve this we have MACsec in our arsenal of security solutions.
Standards: Part 23 - Media Types Vs MIME Types
Media Types describe the container and content format when delivering media over a network. Historically they were described as MIME Types.
Six Considerations For Transitioning To Cloud Based Video Distribution
There are many reasons why companies are transitioning from legacy video distribution workflows to ones hosted entirely in the public cloud, but it’s not a simple process and takes an enormous amount of planning. Many potential pitfalls can be a…
IP Security For Broadcasters: Part 3 - IPsec Explained
One of the great advantages of the internet is that it relies on open standards that promote routing of IP packets between multiple networks. But this provides many challenges when considering security. The good news is that we have solutions…