Compression: Part 6 - Inter Coding

The greatest amount of compression comes from the use of inter coding, which consists of finding redundancy between a series of pictures.

Most compression works by prediction, where the encoder and the decoder both have the same predictive ability and both attempt to predict a new picture from pictures already transmitted. This means that the predictions at encoder and decoder will be identical. However, the encoder has the ability to compare the prediction with the actual picture, to see how good the prediction was. Subtracting the predicted picture from the actual picture results in the prediction error, also called a residual. If the residual is transmitted to the decoder, it can be added to the prediction to re-create the correct picture.
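The residual mechanism can be illustrated with a minimal sketch. The function names and the tiny 2 x 3 pictures are invented for illustration, and the predictor is assumed to be the simplest one possible: reuse the previous picture unchanged.

```python
def encode_residual(actual, prediction):
    """Encoder side: residual = actual picture minus predicted picture."""
    return [[a - p for a, p in zip(row_a, row_p)]
            for row_a, row_p in zip(actual, prediction)]


def reconstruct(prediction, residual):
    """Decoder side: add the residual to the same prediction."""
    return [[p + r for p, r in zip(row_p, row_r)]
            for row_p, row_r in zip(prediction, residual)]


# Hypothetical pixel values, chosen only to make the arithmetic easy to follow.
previous = [[10, 10, 10],
            [20, 20, 20]]
actual = [[10, 11, 10],
          [20, 20, 23]]

# Encoder and decoder both form the same prediction from the previous picture.
prediction = previous

residual = encode_residual(actual, prediction)  # computed and sent by the encoder
decoded = reconstruct(prediction, residual)     # rebuilt by the decoder
```

Here the residual is mostly zeros, which is what makes it cheap to code, and as long as it arrives intact the decoded picture matches the actual picture exactly.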

In the absence of motion, a new picture can be created simply by taking pixel data from the same place in a previous picture, which the decoder already has. If an object on the screen moves from picture to picture, prediction can still be used but the decoder is guided by vectors created by the encoder to take pixel data from a place in a previous picture where the moving object used to be, thus cancelling the motion and finding the redundancy along an optic flow axis rather than along the time axis.

Fig.1 - Any errors made in the motion compensation process are cancelled by the addition of the residual at the decoder.

Fig.1 shows a motion compensated process in which a macroblock is being filled in the predicted (target) picture by taking data from the previous (source) picture and shifting it according to the vectors from the encoder. The predicted picture will then be corrected by adding the prediction error or residual that has been sent by the encoder. Using redundancy along optic flow axes in this way requires significantly fewer data to be sent than were needed in the original picture. If sufficient bandwidth is available to transmit all of the residual data, motion compensated coding can be lossless, although this is seldom done in practice. In most cases the residual will suffer lossy coding to achieve greater compression.
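As a rough sketch of the process in Fig.1, the following fills one block of the target picture from a displaced position in the source picture and then corrects it with the residual. The 2 x 2 block size, the pixel values and all function names are assumptions made for illustration. Real macroblocks are larger, and the sketch assumes the displaced block lies wholly inside the picture.

```python
def fetch_block(picture, top, left, size, vector=(0, 0)):
    """Copy a size x size block whose position is displaced by vector (dy, dx)."""
    dy, dx = vector
    return [row[left + dx:left + dx + size]
            for row in picture[top + dy:top + dy + size]]


def subtract(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def energy(block):
    """Sum of absolute values, a crude measure of how much residual there is."""
    return sum(abs(v) for row in block for v in row)


# An object (value 9) moves two pixels to the right between source and target;
# one of its pixels also changes slightly (9 becomes 10).
source = [[9, 9, 1, 1, 1, 1],
          [9, 9, 1, 1, 1, 1],
          [1, 1, 1, 1, 1, 1],
          [1, 1, 1, 1, 1, 1]]
target = [[1, 1, 9, 9, 1, 1],
          [1, 1, 9, 10, 1, 1],
          [1, 1, 1, 1, 1, 1],
          [1, 1, 1, 1, 1, 1]]

top, left, size = 0, 2, 2
vector = (0, -2)  # points to where the object used to be in the source

actual_block = fetch_block(target, top, left, size)
predicted = fetch_block(source, top, left, size, vector)
residual = subtract(actual_block, predicted)

# For comparison: the residual if no motion compensation were used.
unshifted_residual = subtract(actual_block, fetch_block(source, top, left, size))

rebuilt = add(predicted, residual)  # what the decoder ends up with
```

With the vector applied, only the one changed pixel remains in the residual. Without it, the whole block would have to be sent as prediction error.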

Although motion compensation works very well for the moving objects themselves, it has one fundamental failing, which is in dealing with the background. When an object moves, the leading edge of the object proceeds to conceal the background. Optic flow axes joining background data in earlier pictures must end there and there is no temporal redundancy going forward in time. This doesn’t matter, because when some of the background is concealed the decoder doesn’t care. That part of the screen will be filled by motion compensated pixel data representing the object in a new location.

The problem arises at the trailing edge of a moving object, because new optic flow axes begin there and areas of background are revealed that have either never been seen before or were seen so long ago that they are outside the temporal window of the encoder. Nothing from an earlier picture can be used to describe revealed background. On a conventional time axis it represents new information that the decoder cannot know and which has to be sent as residual data when prediction fails.

Fig.2 - Revealed background cannot be found in an earlier picture, but can be sourced from a later picture.

However, as optic flow axes begin where background is revealed, it follows that there may be redundancy between revealed picture areas and similar areas in later pictures. Fig.2 shows that it should be possible to bring pixel data back from the future, again with the help of motion compensation if necessary. Without clairvoyant powers this is not going to happen in real time, but when a series of pictures is held in frame stores, random access is not difficult and data can be brought from a future picture with the same ease as from a previous picture.

This is the basis of bidirectional coding, as shown in Fig.3. Using reference or anchor pictures, transmitted in advance, pictures in between can be decoded by taking pixel data from earlier or later pictures or both, using motion compensation directed by vectors. In many cases a given picture will be part way along an optic flow axis that extends into the past and into the future. Redundancy may be found in both directions but one direction may reveal more redundancy than the other.

The compression formats do not state how an encoder should work, only the commands it is allowed to issue, so encoder designers have a lot of freedom to try different methods. In bidirectional coders it is possible to try several different prediction methods in parallel on a macroblock by macroblock basis and to select the one that results in the smallest amount of residual data. One could try forward prediction and backward prediction and compare the results. The decoder doesn’t care as it just executes the instructions from the encoder. 
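One way such a per-macroblock decision might look is sketched below. Everything here is an assumption for illustration: the sum of absolute differences is used as a stand-in for "amount of residual data", the mode names are hypothetical labels ("forward" meaning predicted from the earlier picture), and the candidate blocks are taken to be already motion compensated.

```python
def sad(actual, prediction):
    """Sum of absolute differences between a block and its prediction."""
    return sum(abs(a - p)
               for row_a, row_p in zip(actual, prediction)
               for a, p in zip(row_a, row_p))


def interpolate(from_past, from_future):
    """Average of the two predictions, rounded to the nearest integer."""
    return [[(p + f + 1) // 2 for p, f in zip(row_p, row_f)]
            for row_p, row_f in zip(from_past, from_future)]


def choose_mode(actual, from_past, from_future):
    """Try each prediction and keep the one leaving the least residual."""
    candidates = {
        "forward": from_past,        # predicted from the earlier picture
        "backward": from_future,     # predicted from the later picture
        "interpolated": interpolate(from_past, from_future),
    }
    best = min(candidates, key=lambda name: sad(actual, candidates[name]))
    return best, sad(actual, candidates[best])


# Revealed background: absent from the earlier picture, present in the later one.
actual = [[5, 5], [5, 5]]
from_past = [[1, 1], [1, 1]]
from_future = [[5, 5], [5, 6]]

mode, cost = choose_mode(actual, from_past, from_future)
```

For this block the prediction from the later picture wins, which is the revealed-background case described earlier. A practical encoder would also weigh the cost of sending the vectors, which this sketch ignores.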

In order to prevent generation loss, B pictures are not decoded from one another, but are individually encoded. In MPEG-2 it was only possible to take picture data from a single picture before and a single picture after the target picture and it was also possible to take both and perform a linear interpolation. In AVC, the idea was extended so that picture data could be taken from a greater number of places and the interpolation could be weighted to support a larger number of B pictures between anchors.
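The idea of a weighted interpolation can be sketched as follows. This is not the AVC syntax or algorithm. It simply assumes, for illustration, that each anchor is weighted according to how close it is in time to the B picture being predicted. The function name and parameters are hypothetical.

```python
def weighted_prediction(past, future, t_past, t_future, t_target):
    """Blend two anchor blocks, favouring the anchor nearer in time.

    Assumes t_past < t_target < t_future (picture numbers in display order).
    """
    span = t_future - t_past
    w_future = (t_target - t_past) / span
    w_past = 1.0 - w_future
    return [[round(w_past * p + w_future * f) for p, f in zip(row_p, row_f)]
            for row_p, row_f in zip(past, future)]


# Anchors at pictures 0 and 4 with three B pictures in between.
past_block = [[100, 100]]
future_block = [[200, 200]]

predictions = {t: weighted_prediction(past_block, future_block, 0, 4, t)
               for t in (1, 2, 3)}
```

With equal weights every B picture between the anchors would receive the same interpolated prediction. Weighting by temporal position gives each one a prediction appropriate to where it sits between the anchors.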

Fig.3 - In bidirectional coding, B pictures are predicted from anchor pictures before and after in time, but are never predicted from one another.

The need to transmit pictures in the wrong order means that when bidirectional coding is used, it is inevitable that the codec delay must extend. But the use of bidirectional coding allows a significant improvement in compression factor for the same perceived quality, so the delay is tolerated. The encoder re-arranges the time base of the incoming pictures, and the memory of the decoder is extended so that it can act as a time base corrector to put the outgoing pictures in the correct order after they have been decoded. At this point any similarity to a traditional television signal has completely vanished.

It should also be obvious that nothing can be done with such an encoded bitstream except to decode it. Any attempt to edit the bitstream in the middle of a GOP will crash the decoder. Long GOP bidirectional coding can only be applied for final delivery after all post production steps have been taken. In general, GOPs are not independent of one another, as the first picture of the next GOP may be needed as an anchor to decode the last pictures of the previous GOP. A special GOP known as a closed GOP can be encoded. This ends with pictures that are only backward encoded, so that a bitstream switch can be made at the end of a closed group and the result will look like a cut edit.

Having access to pictures before and after, a bidirectionally coded picture is able to find any redundancy that is available, with the result that prediction is very effective and the amount of residual data needed to correct prediction errors is very small. The data in the output of a coder that describes bidirectionally encoded pictures is disproportionately small.

Fig.4 shows a bidirectionally encoded bitstream in which the re-ordering and the differing amounts of data needed by the picture types can be seen.

Fig.4 - A bidirectionally coded bitstream showing re-ordering and the smaller quantity of data needed to encode B pictures.

Bidirectional coding is extremely powerful and forms the heart of most modern video codecs. It depends totally on re-ordering pictures and then getting them back in the correct order at the decoder. This is done using time stamps, which are a specialised form of time code. There are two kinds of time stamp needed in bidirectional coding. The first of these simply stamps pictures entering the encoder with time codes that are sequential. These are known as presentation time stamps and will be used to put pictures out of the decoder in the right order, no matter what happens to them in the meantime. A second type of time stamp, the decode time stamp, is also needed to direct the decoder to use its resources in the correct order.

For example a bidirectionally encoded picture cannot be decoded unless the pictures on which it depends for source data have already been decoded. The decode time stamp specifies when a future picture needs to be decoded so that it will be available for use as an anchor to decode another earlier picture.
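The interplay of the two time stamps can be sketched as below. The picture sequence, the function names and the use of small integers as time stamps are all assumptions for illustration. Real time stamps are clock values, and real encoders have more freedom in how they order pictures.

```python
def transmission_order(display):
    """Re-order pictures so each anchor is sent before the B pictures
    that precede it in display order.

    display: list of (pts, picture_type) tuples in display order.
    """
    coded, pending_b = [], []
    for picture in display:
        if picture[1] == "B":
            pending_b.append(picture)   # must wait for the next anchor
        else:
            coded.append(picture)       # I or P anchor goes first
            coded.extend(pending_b)     # then the B pictures that needed it
            pending_b = []
    coded.extend(pending_b)             # any trailing B pictures
    return coded


# Display order, stamped with sequential presentation time stamps.
display = [(0, "I"), (1, "B"), (2, "B"), (3, "P"), (4, "B"), (5, "B"), (6, "P")]

coded = transmission_order(display)

# The decode time stamp here is simply the position in transmission order.
stamped = [{"dts": dts, "pts": pts, "type": kind}
           for dts, (pts, kind) in enumerate(coded)]

# Decoder: decode in DTS order, then present in PTS order.
decoded = sorted(stamped, key=lambda p: p["dts"])
presented = sorted(decoded, key=lambda p: p["pts"])

coded_types = "".join(p["type"] for p in decoded)
presented_pts = [p["pts"] for p in presented]
```

In this sketch the P picture with presentation time stamp 3 is decoded second, so that it is available as an anchor for the two B pictures that will be displayed before it.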

Whenever the subject of increased frame rates is mooted, there will inevitably be a protest that these require increased data rates that cannot be afforded. A moment’s thought will reveal that in the compression domain, this is simply not true. In Fig.3, a significant increase in frame rate could be had by using longer groups of pictures (GOPs), putting more B pictures in between the anchors, with an increase in data rate of only a few percent. It should also be realised that when the frame rate of a TV signal is increased, the pictures get closer together in time and become more highly redundant. It is also easier to measure motion between them: the amount of motion will be smaller and the vectors can be described with fewer bits.
