Motion Pictures: Part 2 - Optical Flow Axis

There is no motion in the static frames of a movie. The motion is purely in the imagination of the viewer. But how does it work?

The human visual system (HVS), in common with the visual systems of other forms of life, is pretty remarkable. Light entering the eyes is converted into some kind of time-variant image which is somehow communicated to the brain down a set of nerves. It is well known to those who work in the field that the bandwidth of nerves is very low indeed.

Yet even the old-fashioned standard definition (525/625) television signal required 270 megabits per second to send down an SDI cable, and four or five megahertz of bandwidth to broadcast, even though it used tricks like interlace, color-difference working and gamma to reduce the information rate.
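
As a rough sanity check on that 270 megabit figure, the sketch below reproduces the arithmetic from the standard 4:2:2 sampling rates (13.5MHz luma plus two color-difference channels at 6.75MHz, ten bits per sample). It is purely illustrative arithmetic; the point is only how large the number is compared with what nerves can carry.

```python
# Rough sanity check of the 270 Mbit/s figure for SD-SDI (ITU-R BT.601 4:2:2 sampling).
luma_rate_hz = 13.5e6          # luma samples per second
chroma_rate_hz = 2 * 6.75e6    # two color-difference channels, each at half the luma rate
bits_per_sample = 10

word_rate = luma_rate_hz + chroma_rate_hz        # 27 million words per second
serial_bit_rate = word_rate * bits_per_sample    # bits per second on the SDI cable

print(f"{serial_bit_rate / 1e6:.0f} Mbit/s")     # -> 270 Mbit/s
```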

There is no possible way a nervous system can handle bandwidth like that, yet the HVS can discern shortcomings in an SDTV picture and appreciate recent developments such as high definition, high dynamic range, and extended color gamut, all of which drive the information rate higher.

Clearly there must be something clever going on in the HVS: very clever indeed. We are often told the human eye is like a camera, but that is a misleading statement. It has a lens and an iris and an image-forming sensor, but there the similarity ends. I am here writing this because my genes evolved to survive, and the visual system is an important consequence of that evolution.

Evolution reacted to the fact that, in the typical field of view, most of what can be seen is static and of no consequence: neither a threat nor an opportunity. Threats in particular are seldom static. What evolved was a system in which most of what we think we see comes from a kind of mental frame store that is updated if any movement is detected.

This is where the eye departs from being a camera. Most of the area of the retina is a motion detector. It is color blind and has poor resolution. Right in the center of the retina is a small area called the fovea that has high acuity and color vision. The eye has the ability to swivel left and right and up and down, so that if the motion detection spots something, the eye can turn to place it on the fovea, where it can be seen in detail.

The eye scans a scene to fill up the frame store, and from then on motion detection allows the frame store to be updated. All of this dramatically cuts the amount of information the HVS has to handle. A further and highly significant consequence of eyes that can move is that objects in motion can be tracked and brought to rest on the retina, eliminating motion smear and cutting information content. This is just as well, because the temporal response of the HVS is quite slow, whereas moving detail produces high frequencies.

Essentially, living visual systems anticipated MPEG-2 in having motion detection that allows them to look along an optic flow axis. The motion vectors control the eyeballs, and holding the image static on the retina means it doesn’t change and can be described with lower bandwidth. The key point to remember is that in all visual situations the HVS will attempt to use eye tracking. In real life it usually succeeds. When viewing artificially reproduced moving images the success may be only partial because of shortcomings in the system.
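
The parallel with MPEG-2 can be made concrete with a toy sketch: if the previous frame is shifted by a motion vector before the comparison, hardly anything remains to be described, which is exactly what the tracking eye achieves by bringing the image to rest on the retina. The array size and the three-pixel shift below are illustrative assumptions, not values from any real codec.

```python
import numpy as np

# Toy illustration of motion compensation: the same picture content moves three
# pixels to the right between frames. Sizes and values are illustrative assumptions.
rng = np.random.default_rng(0)
frame_a = rng.random((32, 32))
frame_b = np.roll(frame_a, shift=3, axis=1)       # identical content, shifted right

naive_residual = frame_b - frame_a                # describe the new frame from scratch
predicted = np.roll(frame_a, shift=3, axis=1)     # apply the motion vector first
compensated_residual = frame_b - predicted        # what is left to describe

print("residual energy without motion compensation:", float(np.sum(naive_residual ** 2)))
print("residual energy with motion compensation:   ", float(np.sum(compensated_residual ** 2)))  # ~0
```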

Fig.1 - The x, y plane is the plane of the image and of course the time axis t is orthogonal to that. However, the optic flow axis is not orthogonal when anything moves and it can be projected on the image plane.

The traditional view that moving pictures can be described with three mutually orthogonal axes, x, y and t, is incomplete. In the presence of eye tracking, x and y still exist in the image plane, but the important third axis is the axis of optic flow. As Fig.1 shows, the optic flow axis is typically not at right angles to the image plane, so it is not orthogonal. Put more plainly, things that are done on the time axis, which is orthogonal to the image plane, can still affect the image, because actions on the time axis reflect off the optic flow axis and into the image plane.
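
A minimal numerical sketch may help, using assumed figures: for an object crossing the picture at 300 pixels per second sampled at 24Hz, the optic flow axis advances 12.5 pixels across the image plane for every step along the time axis, so it leans away from t. Only a stationary object has an optic flow axis that coincides with the time axis.

```python
# Illustrative only: assumed speed and frame rate, showing how far the optic flow
# axis leans across the image plane per frame. Zero motion is the only case in
# which it coincides with the (orthogonal) time axis.
speed_px_per_s = 300.0          # assumed horizontal speed across the picture
frame_period_s = 1.0 / 24.0     # 24Hz sampling on the time axis

displacement_per_frame_px = speed_px_per_s * frame_period_s
print(f"{displacement_per_frame_px:.1f} pixels of image-plane travel per frame")  # 12.5
```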

If that sounds a bit academic, consider a fixed still camera shooting a moving object. The film is in x and y, and the shutter works in t; all three are mutually orthogonal. But the moving object isn’t moving along the t axis. Photographers soon learn that shooting moving objects requires short shutter speeds. If x, y and t were truly independent, the shutter speed wouldn’t make any difference. But it does. The object in motion has a component of movement across the image plane, so the shutter speed controls how far it moves during the exposure.
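
The effect of the shutter can be put into numbers with a small sketch; the image-plane speed and the two shutter times below are illustrative assumptions. The smear recorded on the film is simply the image-plane speed multiplied by the exposure time, which is why the short shutter helps.

```python
# Smear on the (static, untracked) image plane is image-plane speed times exposure time.
# The speed and shutter times are illustrative assumptions.
image_plane_speed_px_per_s = 2000.0

for shutter_s in (1 / 50, 1 / 500):
    smear_px = image_plane_speed_px_per_s * shutter_s
    print(f"1/{round(1 / shutter_s)} s shutter -> {smear_px:.0f} px of smear")
```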

In real life, optic flow axes tend to be straight for constant motion, or curved when objects change course or speed. The motion portrayal ability of imaging systems is basically the accuracy with which these axes are reproduced. None of today’s moving picture systems have good motion portrayal; it is just that 24Hz movies are even worse.

The reason that motion portrayal is so important is eye tracking. In real life, a moving object is tracked by the eye, which can then extract detail from it. If, in a motion picture system, the eye cannot track well, the detail seen by the viewer will be reduced.

When movies were in their infancy, test procedures simply copied what had been done in photography and the static resolution of the system was measured. Unfortunately, static resolution turned out to be virtually useless as a way of comparing moving images. Static resolution can be thought of as a kind of bound that cannot be exceeded. It is the resolution obtained when nothing moves.

Clearly movies and TV programs in which nothing moves are a long way from reality. When things do move, in the real world, the resolution actually obtained will always be less than the static resolution. The amount by which resolution is reduced depends upon the system and how it is used. It follows that the worse the system itself is, the more carefully it has to be used. One source of the film look is that it reflects that extra care.

Fig.2 - A traditional film projector has four phases within the 1/24 sec. frame. In phases 2 and 4 the screen is dark, but in phases 1 and 3 the same picture is on the screen. Relative to the tracking eye (shown dotted) there will be double images above a certain speed.

A motion picture system based on frames is a sampling system. It is now common knowledge that sampling systems need to be preceded by a low-pass filter to prevent aliasing and followed by a further filter to return the sampled data to the continuous domain. Whilst that is done in systems such as digital audio, in motion pictures it is not.

Although a movie or TV camera is a sampling device, there is no technology that allows something to be placed in front of the camera to prevent temporal aliasing. It’s impossible. In fact, Nature is doing us a favor, because such a filter will be seen in Part 3 to be undesirable. Equally there is no optical device known that can be fitted to a display to smooth the frames/samples into a continuum. Again, this doesn’t matter because the HVS contains such a filter, which makes any display filter undesirable.

Movies and television simply don’t seem to adhere to sampling theory and they run with no temporal filters at all except for the filtering of the HVS. It is not that conventional sampling theory is wrong, it is that a more sophisticated form of sampling theory is needed to deal with the motion of the eye and the existence of optic flow axes.
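
The classic demonstration of what unfiltered temporal sampling does is the wagon-wheel effect: a rotation rate above half the frame rate folds back and appears as a slower, stationary or even backwards rotation. The sketch below folds a few assumed rotation rates against a 24Hz frame rate.

```python
# Unfiltered temporal sampling: a feature rotating at f_true revolutions per second,
# sampled at 24 frames per second, appears at a rate folded into -12..+12 rev/s.
# The chosen rotation rates are illustrative assumptions.
frame_rate = 24.0

def apparent_rate(true_rate_hz: float, fs: float = frame_rate) -> float:
    """Fold the true rate into the baseband (-fs/2, +fs/2], i.e. the alias the viewer sees."""
    alias = true_rate_hz % fs
    return alias - fs if alias > fs / 2 else alias

for true_rate in (5.0, 12.0, 20.0, 23.0, 24.0):
    print(f"true {true_rate:5.1f} rev/s -> appears as {apparent_rate(true_rate):6.1f} rev/s")
```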

Fig.2 shows that one frame period in a classical film projector is split into four parts of about 1/96 of a second. In the first part the film frame is projected onto the screen. In the second part the shutter blocks the light path, and the screen goes dark. In the third part the shutter opens again, and the same film frame is projected a second time. In the fourth part the shutter closes, and the film is pulled down to the next frame so the cycle can repeat.

Fig.3 - Electronic projectors don’t have pull down, and don’t need to double project. Nevertheless, motion is still limited to low speeds as the images are smeared with respect to a tracking eye.

The projector does this in order to make the flicker frequency 48Hz. However, Fig.2 shows the unintended consequence, which is that the optic flow axis compares rather badly with the original. At low motion speeds the two projected versions of the frame arrive on the retina with a small displacement that causes defocusing. At high speeds, the retina sees a double image.
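
The size of that double image can be estimated with a short sketch. Relative to an eye tracking the object, the retina keeps moving during the 1/48 second between the two flashes of the same frame while the picture does not, so the two flashes land a distance apart that grows with tracking speed. The speeds, expressed in degrees of visual angle per second, are illustrative assumptions.

```python
# Double-image separation on the retina of a tracking eye: the eye moves on between the
# two flashes of the same frame, 1/48 s apart, but the picture does not.
# The tracking speeds are illustrative assumptions.
flash_interval_s = 1.0 / 48.0

for speed_deg_per_s in (2.0, 10.0, 30.0):
    separation_deg = speed_deg_per_s * flash_interval_s
    print(f"{speed_deg_per_s:4.0f} deg/s tracked motion -> {separation_deg:.2f} deg double-image separation")
```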

Fig.3 shows the same images handled by an electronic projector. This needs no dark period for pull-down, so the image is displayed for the whole 1/24 second. To the tracking eye the image moves across the retina a distance proportional to motion speed, once more causing loss of resolution.
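
The corresponding sketch for a hold-type display multiplies the tracking speed by the time the frame stays on the screen; the smear across the retina is proportional to both, which is why a full-frame hold at 24Hz is the worst case in the comparison below. The speeds and the comparison frame rates are illustrative assumptions.

```python
# Smear across the retina of a tracking eye for a hold-type display:
# smear = tracking speed * hold time. Speeds and frame rates are illustrative assumptions.
for frame_rate in (24.0, 48.0, 120.0):
    hold_s = 1.0 / frame_rate
    for speed_deg_per_s in (10.0, 30.0):
        smear_deg = speed_deg_per_s * hold_s
        print(f"{frame_rate:5.0f} Hz hold, {speed_deg_per_s:4.0f} deg/s -> {smear_deg:.2f} deg of smear")
```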

Paradoxically, the last thing that motion pictures can do is to portray motion, and much of the grammar of movies results from an effort to get rid of it. The tracking shot is the perfect example. Moving the camera with the action renders the action still relative to the camera and pushes any funny stuff into the background. Fancy a trip to the barely-moving pictures?
