Understanding Compression Technology: Predicted Frames and Difference Frames. Part 3.

Image courtesy of XIPH.ORG

In Part 2 of this series on Compression Technology we learned how Motion Vectors are generated when motion estimation is employed as the first step of creating P-frames and B-frames. In Part 3 we’ll learn how these motion vectors are used to generate Predicted Frames.

Let’s review the nature of P- and B-frames by first looking at forward dependencies. Two types of frames serve as references for other frames: an I-frame can support a future P-frame and/or B-frame, and a P-frame can support a future P-frame and/or B-frame. Put a different way, a P-frame or a B-frame can depend on a previous I-frame or a previous P-frame. Arrows that point leftward in the Closed GOP diagram below show such dependencies.

Dependencies among I-, P-, and B-Frames (Apple).

Video frames that will become P- and B-frames are partitioned into macroblocks in the same way as is done for an I-frame. Starting with the first macroblock in the Present Image (current video frame), a search is made to determine where its content can be found in the Adjacent Image (next video frame). When the contents of a macroblock have not moved, the macroblock’s motion vector is set to zero.

When a match is not found at X=0 and Y=0, the Present Image’s comparison macroblock is moved to increasing distances from its origin until a match is found or the search ends with no match. Once a search is made for the first macroblock, additional searches are made for every other macroblock within the Present Image. In this way every macroblock within a Present Image is assigned a motion vector. A Present Image’s motion vectors are stored in a Motion Estimation Block. Although an estimation block will ultimately be stored, it is first used to generate a Predicted Frame.
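The search described above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular encoder: the function names, the 2x2 block size, the search radius, and the use of the sum of absolute differences (SAD) as the match measure are all assumptions made for the example.

```python
def sad(present, adjacent, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block at (bx, by) in
    `present` and the block displaced by (dx, dy) in `adjacent`."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(present[by + y][bx + x]
                         - adjacent[by + dy + y][bx + dx + x])
    return total


def find_motion_vector(present, adjacent, bx, by, n=2, radius=2):
    """Return ((dx, dy), cost) for the best match within the search radius."""
    height, width = len(adjacent), len(adjacent[0])
    # Try displacements in order of increasing distance, so that (0, 0)
    # is tested first and wins any tie (an unmoved block gets a zero vector).
    candidates = sorted(
        ((dx, dy)
         for dy in range(-radius, radius + 1)
         for dx in range(-radius, radius + 1)),
        key=lambda v: v[0] * v[0] + v[1] * v[1],
    )
    best, best_cost = (0, 0), None
    for dx, dy in candidates:
        x0, y0 = bx + dx, by + dy
        # Skip displacements that would fall outside the Adjacent Image.
        if x0 < 0 or y0 < 0 or x0 + n > width or y0 + n > height:
            continue
        cost = sad(present, adjacent, bx, by, dx, dy, n)
        if best_cost is None or cost < best_cost:
            best, best_cost = (dx, dy), cost
    return best, best_cost


def make_frame(size, top, left):
    """Hypothetical test frame: black except for one 2x2 patch."""
    frame = [[0] * size for _ in range(size)]
    patch = [[10, 20], [30, 40]]
    for y in range(2):
        for x in range(2):
            frame[top + y][left + x] = patch[y][x]
    return frame


present = make_frame(6, 2, 2)    # patch at row 2, column 2
adjacent = make_frame(6, 3, 3)   # same patch, moved one pixel right and down

moving = find_motion_vector(present, adjacent, 2, 2)  # ((1, 1), 0)
static = find_motion_vector(present, adjacent, 0, 0)  # ((0, 0), 0)
```

Running the search once per macroblock yields the full set of vectors that the article calls a Motion Estimation Block. An exhaustive search like this is easy to follow but slow; it is used here only because it maps directly onto the description.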

Below, the upper-left image is the Present Image. The upper-right image is the Adjacent Image (next video frame). One difference that has occurred between the capture of the Present Image and the capture of the Adjacent Image is obvious – the person has opened their eyes.

Steps to Generate a Predicted Frame (Wang).

The lower-left image is the Adjacent Image with the calculated motion vectors superimposed. These motion vectors are applied to the Present Image (current video frame) to construct a Predicted Frame. Simply put, the vectors move macroblocks in the Present Image to new locations. The lower-right image shows the generated Predicted Frame.
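As a rough sketch of this step, the Python below moves macroblocks of a Present Image according to their motion vectors. The function name, the dictionary layout of the vectors, and the choice to start from a copy of the Present Image (so that unmoved blocks and uncovered pixels keep their old values) are assumptions for illustration only.

```python
def build_predicted_frame(present, vectors, n):
    """Move n x n macroblocks of `present` by their motion vectors.

    `vectors` maps a block origin (bx, by) to its displacement (dx, dy).
    """
    height, width = len(present), len(present[0])
    # Starting from a copy is equivalent to applying a zero vector everywhere.
    predicted = [row[:] for row in present]
    for (bx, by), (dx, dy) in vectors.items():
        if (dx, dy) == (0, 0):
            continue  # block has not moved; the copy already holds it
        for y in range(n):
            for x in range(n):
                ty, tx = by + dy + y, bx + dx + x
                if 0 <= ty < height and 0 <= tx < width:
                    predicted[ty][tx] = present[by + y][bx + x]
    return predicted


present = [
    [1, 2, 0, 0],
    [3, 4, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
# Hypothetical vectors: the top-left block moves two pixels right and down.
vectors = {(0, 0): (2, 2), (2, 0): (0, 0), (0, 2): (0, 0), (2, 2): (0, 0)}
predicted = build_predicted_frame(present, vectors, n=2)
```

In this toy case the moved block lands in the bottom-right corner, but its old copy also remains in the top-left corner because nothing was moved in to replace it. Leftovers of this kind are one source of the prediction errors discussed next.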

Ideally, these vectors would move pixels exactly to their new locations. However, as shown, the Predicted Frame has errors. To eliminate motion estimation errors, a Difference Frame is created.

A Difference Frame is generated by subtracting the Adjacent Image (next video frame) from the Predicted Frame (the estimate of that next frame). Were the motion vectors able to create a perfect Predicted Frame, the Predicted Frame would match the Adjacent Image and the Difference Frame would be empty. With motion video, there will likely be information in the Difference Frame, as shown below.
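The subtraction can be illustrated with a small Python sketch. The function name and the tiny 2x2 frames are hypothetical; the sign follows the description above (Predicted Frame minus Adjacent Image), and the opposite sign would work equally well as long as the decoder reverses it.

```python
def difference_frame(predicted, adjacent):
    """Per-pixel difference: Predicted Frame minus Adjacent Image."""
    return [
        [p - a for p, a in zip(prow, arow)]
        for prow, arow in zip(predicted, adjacent)
    ]


predicted = [[10, 10], [10, 10]]
adjacent = [[10, 12], [9, 10]]

diff = difference_frame(predicted, adjacent)       # [[0, -2], [1, 0]]
perfect = difference_frame(adjacent, adjacent)     # [[0, 0], [0, 0]]
```

A perfect prediction gives an all-zero Difference Frame, which is the "empty" case; every non-zero value marks a pixel the motion vectors did not predict correctly.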

Difference Frame (Wang).

The Difference Frame is compressed using the Discrete Cosine Transform (DCT), after which lossless data reduction is applied: variable-length coding (VLC) and run-length coding (RLC). This is the same process used to compress an I-frame. The motion estimation blocks are also VLC and RLC compressed. The compressed Difference Frame and the compressed motion estimation block are then stored.
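To show why a mostly empty Difference Frame stores so compactly, here is a simplified run-length coding sketch in Python. It assumes the coefficients have already been transformed, quantized, and flattened into a list; the function names and the (zero run, value) pair layout are illustrative, and real codecs add details such as a scan order and an end-of-block code that are omitted here.

```python
def run_length_encode(coeffs):
    """Encode a list as (zeros_before_value, value) pairs plus trailing zeros."""
    pairs, run = [], 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    return pairs, run  # `run` is the count of zeros after the last value


def run_length_decode(pairs, trailing):
    """Rebuild the original list exactly (the coding is lossless)."""
    out = []
    for run, value in pairs:
        out.extend([0] * run)
        out.append(value)
    out.extend([0] * trailing)
    return out


coeffs = [5, 0, 0, -2, 0, 0, 0, 1, 0, 0, 0, 0]
pairs, trailing = run_length_encode(coeffs)
# pairs == [(0, 5), (2, -2), (3, 1)], trailing == 4
restored = run_length_decode(pairs, trailing)
```

Twelve values shrink to three pairs and a count, and decoding restores the list exactly. The better the prediction, the more zeros there are and the shorter the encoded result.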

To summarize the compression process: each I-frame is intra-frame compressed and stored in a long-GOP stream. Each compressed P-frame includes two types of information: a motion estimation block and a Difference Frame. (Each compressed B-frame has two motion estimation blocks and two Difference Frames.)

As a stream is uncompressed, an I-frame is re-created by reversing its lossless compression and then performing an Inverse DCT. This yields a Present Image that is output as a video picture. (A Present Image can be obtained from a previous I- or P-frame.) When a P-frame is encountered in a long-GOP stream, its motion estimation block is uncompressed. These vectors are applied to the Present Image to create a Predicted Frame.

Next, the P-frame’s Difference Frame is re-created by reversing its lossless compression and then performing an Inverse DCT. With both a Predicted Frame and Difference Frame available, an Adjacent Image – output as a video picture – is generated by using the Difference Frame to correct errors in the Predicted Frame. (A B-frame’s single Adjacent Image is obtained from a previous I- or P-frame by appropriately employing two Difference Frames to correct errors in two Predicted Frames.)
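The correction step on the decoder side can be sketched as follows. The function name is hypothetical, and the code assumes the Difference Frame was formed as Predicted Frame minus Adjacent Image, as described earlier; with the opposite sign convention the subtraction below would become an addition.

```python
def reconstruct_adjacent(predicted, difference):
    """Correct the Predicted Frame using the decoded Difference Frame.

    Assumes difference = predicted - adjacent, so
    adjacent = predicted - difference.
    """
    return [
        [p - d for p, d in zip(prow, drow)]
        for prow, drow in zip(predicted, difference)
    ]


predicted = [[10, 10], [10, 10]]
difference = [[0, -2], [1, 0]]

adjacent = reconstruct_adjacent(predicted, difference)  # [[10, 12], [9, 10]]
```

Wherever the Difference Frame is zero the predicted pixel passes through unchanged, so only the mispredicted pixels are altered.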

This process is repeated for the remaining frames in each GOP. When the next I-frame is encountered, the process starts over. Although P- and B-frames are more efficient than I-frames, in that they require less stored data, the use of Difference Frames gives them the same visual quality.
