Understanding Compression Technology: Motion Estimation Part 2

Part 1 of this article covered multiple aspects of compression technology: macroblocks, DCT, quantization, and lossless compression. Part 2 will focus on motion estimation for P- and B-frames—after a review of lossless data compacting. Part 3 (next month) will detail the critical role of Predicted frames and Difference frames in maintaining image quality.

Compression, defined as the removal of information deemed by an encoder’s designer as not essential to a successful transfer of visual information, can be performed in multiple ways: reduction of chroma information (e.g., 4:2:2 to 4:2:0 color space); noise reduction; quantization; and lossless data compacting. (The definition of “successful” is based upon an encoder’s design specification.)

The quantization process is the primary locus of compression. During quantization, each cell of the 64-cell coefficient matrix from a DCT is divided by the corresponding cell of a pre-defined quantization matrix and the result is rounded. (Pre-defined here means based upon an encoder’s design.) Whether the quantization matrix stays fixed, however, depends on whether Variable Bit Rate (VBR) or Constant Bit Rate (CBR) encoding is applied. When the compression factor, represented by the quantization matrix, is kept constant under varying image complexity, the result is VBR encoding. Alternatively, by monitoring the output bit-rate and dynamically altering the compression factor, data output is smoothed, thereby yielding CBR encoding.
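A minimal sketch of that arithmetic may make the rounding loss and the role of the compression factor concrete. The matrix values and the q_scale parameter below are illustrative assumptions, not taken from any particular standard:

    import numpy as np

    # Hypothetical 8x8 quantization matrix: step sizes grow toward the
    # high-frequency (bottom-right) corner, so fine detail is coarsened most.
    Q = 16 + 4 * (np.arange(8)[:, None] + np.arange(8)[None, :])

    def quantize(dct_block, q_scale=1.0):
        # Divide each DCT coefficient by its matrix cell (scaled by the
        # compression factor) and round; the rounding is where loss occurs.
        return np.rint(dct_block / (Q * q_scale)).astype(int)

    def dequantize(levels, q_scale=1.0):
        # A decoder can only multiply back; the rounded-away precision is gone.
        return levels * (Q * q_scale)

In these terms, a VBR encoder holds q_scale constant regardless of image complexity, while a CBR encoder raises or lowers it on the fly to keep the output bit-rate steady.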

Lossless data reduction follows quantization and employs Run Length Coding (RLC) and then Variable Length Coding (VLC). Both processes reduce information by compacting the data that results from quantization. Like ZIP file compression, no information is lost. Once the luminance (Y) blocks have been compressed, the Cb and Cr blocks are compressed. All compressed blocks are then stored.
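The sketch below illustrates the run-length stage, assuming the conventional zigzag scan order; the variable-length stage then maps each (run, level) pair to a short binary code from the encoder’s VLC tables, which are omitted here:

    def zigzag(block):
        # Scan the 8x8 block along anti-diagonals, from low to high
        # frequency, so the zero-valued coefficients cluster at the end.
        n = len(block)
        coords = sorted(((r, c) for r in range(n) for c in range(n)),
                        key=lambda rc: (rc[0] + rc[1],
                                        rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))
        return [block[r][c] for r, c in coords]

    def run_length(coeffs):
        # Emit (zero_run, level) pairs; "EOB" stands in for the
        # end-of-block code a real VLC table would assign.
        pairs, run = [], 0
        for level in coeffs:
            if level == 0:
                run += 1
            else:
                pairs.append((run, level))
                run = 0
        pairs.append("EOB")  # every remaining coefficient is zero
        return pairs

Because quantization zeroes out most high-frequency coefficients, the zigzag ordering turns the tail of each block into one long zero run that collapses into the single end-of-block code.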

This compression process is the same for intra-frame encoding and for generating an I-frame as part of inter-frame encoding, in which a series of video frames is compressed into a sequence of several types of data frames. One frame type, of course, is the “I” (intra) frame. Two other types of frames that can be found in a GOP (Group Of Pictures) data stream are “P” (predictive) frames and “B” (bi-predictive) frames.

A P-frame contains the information needed to recreate a video frame in conjunction with the information from a previous closest I-frame or a previous closest P-frame. A B-frame contains the information needed to reconstitute a video frame when combined with information from: a previous closest I-frame; a previous closest P-frame; a future closest I-frame (open GOPs only); and a future closest P-frame. (A closed GOP is never dependent on information from another GOP.) Frame dependencies for 15-frame open and closed GOPs are shown below.

[Figure: Frame dependencies for 15-frame open and closed GOPs.]
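Those dependencies can be made concrete with a small sketch. The IBBP pattern, the GOP length of 15, and the handling of the trailing B-frames below are illustrative assumptions; real encoders choose their own patterns:

    def gop_references(length=15, m=3, open_gop=True):
        # Hypothetical IBBP... pattern: an I-frame, a P-frame every m-th
        # frame, and B-frames in between (display order).
        kinds = ['I' if i == 0 else ('P' if i % m == 0 else 'B')
                 for i in range(length)]
        refs = []
        for i, kind in enumerate(kinds):
            if kind == 'I':
                refs.append([])              # self-contained
            elif kind == 'P':
                refs.append([i - m])         # closest previous I- or P-frame
            else:
                past = (i // m) * m          # closest previous I- or P-frame
                future = past + m            # closest future P-frame, or the
                                             # next GOP's I-frame at the boundary
                if future >= length and not open_gop:
                    refs.append([past])      # closed GOP: never reach outside
                else:
                    refs.append([past, future])
        return kinds, refs

For the defaults this yields the pattern I B B P B B P B B P B B P B B, with the two trailing B-frames reaching into the next GOP only when open_gop is True.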

(H.264 introduced the concept of slices: segments of a picture bigger than a macroblock but smaller than a frame. A P-slice depends on only a single link (motion vector) to another slice. A B-slice depends on two links. Unlike MPEG-2, “bi” here means two links, not two dependency directions.)

Motion estimation may be one of the most interesting digital technologies developed because, in a way, it “sees” the movement of objects over time in a sequence of video frames. One application of this technology is the creation of intermediate frames when, to avoid LCD display motion blur, 60fps video is converted to 120fps or 240fps video. Motion estimation is also employed when generating P-frames and B-frames; it is the first stage of their creation.

Video frames that will become P- and B-frames are partitioned into macroblocks in the same way as for an I-frame. Starting with the first (uppermost, leftmost) macroblock, which is contained in what is called the Present Image, a search is made to determine where its content can be found in the next video frame, which is called the Adjacent Image.

The first comparison is made in the Adjacent Image at coordinates X=0, Y=0 to determine whether the Present Image’s first macroblock remains at its initial location. To determine whether a macroblock has moved, a content match is made between the Present Image and the Adjacent Image. (To measure the strength of a match, a correlation technique is used. The correlation must be above a defined threshold to count as a match.) When the contents of a macroblock have not moved, the macroblock’s Motion Vector is set to zero.
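How the match strength is scored varies by encoder. The sketch below substitutes the common Sum of Absolute Differences (SAD) for the correlation measure described above; with SAD, a lower score means a stronger match, so the threshold test runs the other way around:

    import numpy as np

    def sad(block, frame, x, y):
        # Score how well `block` (a macroblock from the Present Image)
        # matches the Adjacent Image `frame` with its top-left corner
        # placed at (x, y); zero is a perfect match.
        h, w = block.shape
        candidate = frame[y:y + h, x:x + w]
        return int(np.abs(block.astype(int) - candidate.astype(int)).sum())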

When a match is not found at X=0 and Y=0, the Present Image’s comparison macroblock is moved, in a methodical pattern, at an increasing but limited distance from its origin until there is a match, or ultimately no match. Movement size is typically one PEL (Picture Element, i.e., a pixel), although a step-size of ½ PEL can be employed. The maximum number of X and Y steps allowed defines the search window shown by a red square, below.

[Figure: The search window, shown as a red square, within the Adjacent Image.]
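A minimal full-search sketch over such a window, using the hypothetical sad() scorer above; the one-PEL step, the ±7 PEL window, and the threshold value are all illustrative assumptions:

    def find_motion_vector(block, frame, x0, y0, search=7, threshold=768):
        # Test every displacement within +/-`search` PELs of the block's
        # origin (x0, y0), nearest candidates first, so (0, 0) is tried
        # before anything else.
        h, w = block.shape
        best = None
        offsets = sorted(((dy, dx)
                          for dy in range(-search, search + 1)
                          for dx in range(-search, search + 1)),
                         key=lambda d: abs(d[0]) + abs(d[1]))
        for dy, dx in offsets:
            x, y = x0 + dx, y0 + dy
            if x < 0 or y < 0 or x + w > frame.shape[1] or y + h > frame.shape[0]:
                continue  # candidate falls outside the Adjacent Image
            score = sad(block, frame, x, y)
            if best is None or score < best[0]:
                best = (score, (dx, dy))
            if score < threshold:
                return (dx, dy)  # good-enough match: stop searching
        # Ultimately no match: return the least-bad displacement (a real
        # encoder might instead give up and intra-code the macroblock).
        return best[1]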

The displacement (direction and distance) moved until a match is made determines a macroblock’s motion vector. A motion vector (the small arrow) is shown below.

[Figure: A macroblock’s motion vector, shown as a small arrow.]

Once a search is made for the first macroblock, additional searches are made for every macroblock within the Present Image. In this manner every macroblock within a Present Image is assigned a motion vector. A Present Image’s motion vectors are stored in a Motion Estimation block. Although an estimation block will ultimately be stored, it is first used to generate a Predicted Frame. This process will be detailed in Part 3.
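Putting the pieces together, the sketch below builds such a block of motion vectors for a whole Present Image, scanning macroblocks left to right and top to bottom; the 16×16 macroblock size and the find_motion_vector() helper are the same assumptions used above:

    def motion_estimation_block(present, adjacent, mb=16):
        # One motion vector per macroblock of the Present Image.
        rows, cols = present.shape[0] // mb, present.shape[1] // mb
        vectors = {}
        for r in range(rows):
            for c in range(cols):
                y0, x0 = r * mb, c * mb
                block = present[y0:y0 + mb, x0:x0 + mb]
                vectors[(r, c)] = find_motion_vector(block, adjacent, x0, y0)
        return vectors  # later used to generate the Predicted Frame (Part 3)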

The success of inter-frame compression depends on the contents of most macroblocks not moving from frame to frame. Therefore, the motion vector for most macroblocks is zero. When this assumption is violated, for example when an explosion fills the screen, you are likely to see a screen filled with ugly “macroblocking.”
