Understanding Compression Technology: Motion Estimation Part 2

Part 1 of this article covered multiple aspects of compression technology: macroblocks, DCT, quantization, and lossless compression. Part 2 will focus on motion estimation for P- and B-frames, after a review of quantization and lossless data compacting. Part 3 (next month) will detail the critical role of Predicted frames and Difference frames in maintaining image quality.

Compression, defined as the removal of information deemed by an encoder’s designer as not essential to a successful transfer of visual information, can be performed in multiple ways: reduction of chroma information (e.g., 4:2:2 to 4:2:0 colorspace); noise reduction; quantization; and lossless data compacting. (The definition of “successful” is based upon an encoder’s design specification.)

The quantization process is the primary locus of compression. During quantization, each value in the 64-cell coefficient matrix from a DCT is divided by the corresponding cell of a pre-defined quantization matrix and the result is rounded; it is this rounding that discards information. (Pre-defined here means based upon an encoder’s design.) How the compression factor is applied, however, depends on whether Variable Bit Rate (VBR) or Constant Bit Rate (CBR) encoding is used. When the compression factor—represented by a quantization matrix—is kept constant under varying image complexity, the result is Variable Bit Rate (VBR) encoding. Alternatively, by monitoring the output bit-rate and dynamically altering the compression factor, data output is smoothed, thereby yielding Constant Bit Rate (CBR) encoding.
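To make that divide-and-round step concrete, here is a minimal Python sketch. The matrix values follow the widely published MPEG-2 default intra matrix, but the function names and the `scale` parameter are illustrative assumptions, not any particular encoder’s API; `scale` stands in for the compression factor a CBR encoder would vary.

```python
import numpy as np

# The widely published MPEG-2 default intra quantization matrix;
# entries grow toward high frequencies, which are quantized more coarsely.
QUANT_MATRIX = np.array([
    [ 8, 16, 19, 22, 26, 27, 29, 34],
    [16, 16, 22, 24, 27, 29, 34, 37],
    [19, 22, 26, 27, 29, 34, 34, 38],
    [22, 22, 26, 27, 29, 34, 37, 40],
    [22, 26, 27, 29, 32, 35, 40, 48],
    [26, 27, 29, 32, 35, 40, 48, 58],
    [26, 27, 29, 34, 38, 46, 56, 69],
    [27, 29, 35, 38, 46, 56, 69, 83],
])

def quantize(dct_block: np.ndarray, scale: float = 1.0) -> np.ndarray:
    """Divide each DCT coefficient by its matrix entry and round.
    Raising `scale` increases compression: more cells round to zero."""
    return np.round(dct_block / (QUANT_MATRIX * scale)).astype(int)

# Example: a hypothetical coefficient block with most energy at low frequencies.
rng = np.random.default_rng(0)
dct = (rng.standard_normal((8, 8)) * 100 / (1 + np.add.outer(range(8), range(8)))).round()
print(quantize(dct, scale=2.0))  # a higher scale zeroes more high-frequency cells
```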

Lossless data reduction follows quantization and employs Run Length Coding (RLC) followed by Variable Length Coding (VLC). Both processes reduce information by compacting the data that results from quantization. Like ZIP file compression, no information is lost. Once the luminance (Y) blocks have been compressed, the Cb and Cr blocks are compressed. All compressed blocks are then stored.
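A minimal sketch of the run-length stage, under the usual assumptions: the quantized 8×8 block is read out in zig-zag order so that the trailing high-frequency zeros group into long runs, then (run, level) pairs are emitted. A real encoder would follow this with Huffman-style VLC tables; here the pairs are simply returned, and the `EOB` marker and function names are illustrative.

```python
import numpy as np

def zigzag_indices(n: int = 8):
    """Yield (row, col) pairs in zig-zag scan order for an n x n block."""
    for s in range(2 * n - 1):
        coords = [(i, s - i) for i in range(max(0, s - n + 1), min(s, n - 1) + 1)]
        # Alternate the direction of each diagonal to trace the zig-zag.
        yield from (coords if s % 2 else coords[::-1])

def run_length_encode(block: np.ndarray):
    """Emit (run_of_zeros, level) pairs over the zig-zag scan, ending with EOB."""
    pairs, run = [], 0
    for r, c in zigzag_indices(block.shape[0]):
        level = int(block[r, c])
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    pairs.append("EOB")  # end-of-block marker replaces the final run of zeros
    return pairs

coeffs = np.zeros((8, 8), dtype=int)
coeffs[0, 0], coeffs[0, 1], coeffs[2, 0] = 52, -3, 7  # a few surviving coefficients
print(run_length_encode(coeffs))  # [(0, 52), (0, -3), (1, 7), 'EOB']
```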

This compression process is the same for intra-frame encoding and for generating an I-frame as part of inter-frame encoding, in which a series of video frames is compressed into a sequence of several types of data frames. One frame type, of course, is the “I” frame. Two other types of frames that can be found in a GOP (Group Of Pictures) data stream are “P” (predictive) frames and “B” (bi-predictive) frames.

A P-frame contains the information needed to recreate a video frame in conjunction with information from the closest previous I-frame or the closest previous P-frame. A B-frame contains the information needed to reconstitute a video frame when combined with information from a past reference (the closest previous I- or P-frame) and/or a future reference (the closest future P-frame or, in open GOPs only, the closest future I-frame). (A closed GOP is never dependent on information from another GOP.) Frame dependencies for 15-frame open and closed GOPs are shown below.
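As a rough stand-in for the figure, the sketch below prints the dependencies for one common arrangement: a 15-frame closed GOP with an anchor (I or P) every three frames, i.e., I B B P B B P … in display order. The layout and naming here are illustrative assumptions, not taken from any particular encoder.

```python
def gop_dependencies(n: int = 15, m: int = 3):
    """List frame types and references for a closed GOP in display order.
    n = GOP length, m = anchor (I/P) spacing."""
    anchors = list(range(0, n, m))  # display positions of the I- and P-frames
    for i in range(n):
        if i == 0:
            print(f"frame {i:2d}: I  (no references)")
        elif i in anchors:
            print(f"frame {i:2d}: P  (refs previous anchor {i - m})")
        else:
            prev_a = (i // m) * m       # nearest anchor before this B-frame
            next_a = prev_a + m         # nearest anchor after it
            # Closed GOP: trailing B-frames cannot reference the next GOP's I-frame.
            refs = [prev_a] + ([next_a] if next_a < n else [])
            print(f"frame {i:2d}: B  (refs {refs})")

gop_dependencies()
```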


(H.264 introduced the concept of slices: segments of a picture larger than a macroblock but smaller than a frame. A P-slice depends on only a single link (motion vector) to another slice, while a B-slice depends on two links. Unlike MPEG-2, “bi” here means two links, not two dependency directions.)

Motion estimation may be one of the most interesting digital technologies developed because, in a way, it “sees” the movement of objects over time in a sequence of video frames. One application of this technology is the creation of intermediate frames when, to avoid LCD display motion blur, 60fps video is converted to 120fps or 240fps video. Motion estimation is also the first stage in generating P-frames and B-frames.

Video frames that will become P- and B-frames are partitioned into macroblocks in the same way as is done for an I-frame. Starting with the first (uppermost, leftmost) macroblock, which is contained in what is called the Present Image, a search is made to determine where its content can be found in the next video frame, which is called the Adjacent Image.

The first comparison is made in the Adjacent Image at X=0 and Y=0 coordinates to determine if the Present Image’s first macroblock remains at its initial location. To determine whether or not a macroblock has moved, a content match is made between the Present Image and the Adjacent Image. (To measure the strength of a match, a correlation technique is used. The correlation must be above a defined threshold to be a match.) When the contents of a macroblock have not moved, the macroblock’s Motion Vector is set to zero.

When a match is not found at X=0 and Y=0, the Present Image’s comparison macroblock is moved, in a methodical pattern, an increasing but limited distance from its origin within the Adjacent Image until there is a match, or ultimately no match. Step size is typically one PEL (Picture Element, i.e., a pixel), although a step-size of ½ PEL can be employed. The maximum number of X and Y steps allowed defines the search window shown by a red square, below.


The displacement (direction and distance) moved until a match is made determines a macroblock’s motion vector. A motion vector (the small arrow) is shown below.
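The search loop described above might look like the following sketch, assuming greyscale frames held as NumPy arrays, a 16×16 macroblock, whole-PEL steps, and a sum-of-absolute-differences (SAD) cost in place of the correlation measure mentioned earlier (with SAD, a lower value means a stronger match). The names, window size, and test frames are illustrative.

```python
import numpy as np

BLOCK = 16    # macroblock size in PELs
WINDOW = 8    # search window: +/- WINDOW PELs in X and Y (the red square)

def sad(a: np.ndarray, b: np.ndarray) -> int:
    """Sum of absolute differences: a common stand-in for correlation,
    where a lower value means a stronger match."""
    return int(np.abs(a.astype(int) - b.astype(int)).sum())

def find_motion_vector(present, adjacent, bx, by):
    """Full search for the Present Image macroblock at (bx, by) within the
    Adjacent Image. Returns the displacement (dx, dy) of the best match."""
    block = present[by:by + BLOCK, bx:bx + BLOCK]
    # First comparison at X=0, Y=0: has the block stayed put?
    best_dx, best_dy = 0, 0
    best_cost = sad(block, adjacent[by:by + BLOCK, bx:bx + BLOCK])
    h, w = adjacent.shape
    for dy in range(-WINDOW, WINDOW + 1):
        for dx in range(-WINDOW, WINDOW + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + BLOCK > w or y + BLOCK > h:
                continue  # candidate position falls outside the frame
            cost = sad(block, adjacent[y:y + BLOCK, x:x + BLOCK])
            if cost < best_cost:
                best_cost, best_dx, best_dy = cost, dx, dy
    return best_dx, best_dy

# Tiny demonstration: a bright square that moves 3 PELs right, 2 PELs down.
frame0 = np.zeros((64, 64), dtype=np.uint8)
frame0[8:24, 8:24] = 255                       # square in the Present Image
frame1 = np.roll(frame0, (2, 3), axis=(0, 1))  # same square in the Adjacent Image
print(find_motion_vector(frame0, frame1, 8, 8))  # expected: (3, 2)
```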


Once a search is made for the first macroblock, additional searches are made for every macroblock within the Present Image. In this manner every macroblock within a Present Image is assigned a motion vector. A Present Image’s motion vectors are stored in a Motion Estimation block. Although an estimation block will ultimately be stored, it is first used to generate a Predicted Frame. This process will be detailed in Part 3.
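Continuing the sketch above (it reuses `BLOCK` and `find_motion_vector`), looping the same search over every macroblock yields the per-frame set of motion vectors the article calls a Motion Estimation block; the array layout is an assumption.

```python
def motion_estimation_block(present, adjacent):
    """Assign a motion vector (dx, dy) to every macroblock of the Present Image.
    Reuses BLOCK and find_motion_vector from the previous sketch."""
    rows, cols = present.shape[0] // BLOCK, present.shape[1] // BLOCK
    vectors = np.zeros((rows, cols, 2), dtype=int)
    for r in range(rows):
        for c in range(cols):
            vectors[r, c] = find_motion_vector(present, adjacent, c * BLOCK, r * BLOCK)
    return vectors  # mostly zero vectors for typical frame-to-frame content
```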

The success of inter-frame compression depends on the contents of most macroblocks not moving from frame to frame. Therefore, the motion vector for most macroblocks is zero. When this assumption is violated, for example when an explosion fills the screen, you are likely to see a screen filled with ugly “macroblocking.”
