Motion Pictures: Part 6 - How We Might Achieve True Motion
John Watkinson continues his exploration of the potential for a true motion TV system, which requires the complete removal of frame sampling so that each pixel becomes a continuous representation of the image, thus removing motion artefacts.
A true motion video system would reproduce the full time history of each pixel at the display. At first sight this might seem to require tremendous bandwidth, but a longer look reveals it is not so.
The fundamental problem with traditional cinema and television is that they cannot work properly because sampling theory is violated. Correctly performed, sampling is reversible, and the reconstructed waveform is indistinguishable from the original within the intended bandwidth. In digital audio that happens in practice because suitable filters precede and follow the sampling stage. Not only that, but the filters work in the same domain.
This does not happen in traditional so-called moving pictures. The only filters available are the finite temporal response of the human visual system (HVS) and spatial smear due to the image moving across a sensor. With a still picture, such as a test card presented to a traditional TV system, there is no smear and the only changes the HVS sees at the display are along the time axis, so the individual pictures are not seen.
However, if there is a moving object of interest, the HVS will use eye tracking. The eyeball moves in order to superimpose the successive samples of the object in the same place on the retina. Once more the retina sees only temporal changes in the object and filters them out. However good the eye tracking may be, the resolution of the moving object will be diminished because of smear taking place at the sensor. The dynamic resolution falls below the static resolution.
Worse than that, Fig.1a) shows that the optic flow axis is rotated from the time axis when there is motion. For everything except the tracked object of interest, the discrete time samples at the frame rate reflect off the optic flow axis to become discrete samples in the image plane. That is where the judder or strobing comes from.
Long-term viewers are accustomed to strobing and may not regard it as a serious drawback. Probably the most striking feature of true motion video is that viewers could choose to follow any moving object with equal success. Such a situation occurs frequently in sport. Increasing pixel count, dynamic range and color gamut in later TV standards has not achieved that.
As Schreiber pointed out, raising the frame rate won’t stop strobing as the image plane samples just get closer together. The only solution is shown in Fig.1b) where continuous treatment of the time axis reflects from the optic flow axis to give smooth motion across the image plane. Traditional moving picture portrayal systems were put together under the technological constraints of the times and in the absence of any theoretical framework. This is often the case: for example, the Wright Brothers could not buy a textbook on airplane design, and what they built violated a lot of laws that were only discovered later.
Fig.1 - At a) in the presence of motion, the discrete images on the time axis in conventional moving pictures reflect off the optic flow axis to create discrete steps in the image plane. This is the source of judder. At b) in a true motion system there are no discrete images, and the time axis is continuous, so there can be no judder.
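The effect of frame rate on the image plane samples can be put into numbers with a minimal sketch. It assumes an object moving at a constant speed relative to the tracking eye; the function name and the example speed of 960 pixels per second are illustrative assumptions, not figures from this series.

```python
def image_plane_step(speed_px_per_s: float, frame_rate_hz: float) -> float:
    """Spacing, in pixels, between successive image-plane samples of an
    object that moves at a constant speed relative to the tracking eye."""
    return speed_px_per_s / frame_rate_hz


# Hypothetical example: an untracked object crossing at 960 pixels per second.
for rate in (24, 60, 120, 240):
    step = image_plane_step(960.0, rate)
    print(f"{rate:>3} Hz: samples are {step:.1f} pixels apart")
```

Raising the frame rate shrinks the step but never brings it to zero, which is consistent with Schreiber's observation that the samples simply get closer together.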
Today the situation is somewhat different as the technological restraints have receded and there is now a reasonably accurate theory of how motion is perceived. True motion video is simply an attempt, possibly the first one, to put that theory into practice. However different it may be from what went before, if the theory says certain things need to happen to get smooth motion, that’s what needs to happen. There is no point in having a theory if it is to be ignored.
The most important factor in true motion is that at the display, the changes at each pixel must be continuous. It is perfectly practical to sample and digitize the signal for each pixel, provided the sampling rate is high enough and the waveform is properly reconstructed. The lowest practical sampling rate might be zero in some cases, or in the kilohertz region for worst cases. The temporal frequency response of sensors and displays would set a limit on the bandwidth available.
Probably the next most important factor in true motion is that the raw data from the sensor are massively redundant. Not only that, but the data exist in a form that lends itself to exploitation of that redundancy.
A third relevant point is that the excellent dynamic resolution of a true motion system means that it can function with a pixel count considerably smaller than is used in conventional systems. This eases the construction of the sensor and eases the raw bandwidth input to the coder.
In the case of a still scene, the temporal spectrum of each sensor pixel collapses and once the picture has been sent, no more data are necessary. In the case of a pan, most of the picture itself does not change: it just appears in different places on the screen. The only new data that need to be sent describe the previously concealed areas revealed by the pan. Existing motion compensation techniques take care of all of that.
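As a rough illustration of how little new picture a pan reveals, the following sketch estimates the fraction of the image area that was not visible before the pan. It assumes a pure translation of a rectangular image with no zoom or rotation; the function and parameter names are hypothetical.

```python
def revealed_fraction(width_px: int, height_px: int,
                      dx_px: float, dy_px: float) -> float:
    """Fraction of the picture area newly revealed when the whole image
    translates by (dx_px, dy_px). Assumes a pure pan."""
    dx = min(abs(dx_px), width_px)
    dy = min(abs(dy_px), height_px)
    # Area still shared between the picture before and after the pan.
    overlap = (width_px - dx) * (height_px - dy)
    return 1.0 - overlap / (width_px * height_px)


# Hypothetical example: a 1000 x 500 picture panned 10 pixels horizontally.
print(f"{revealed_fraction(1000, 500, 10, 0):.1%} of the picture is new")
```

With no movement the function returns zero, matching the still scene case in which nothing more needs to be sent once the picture has been delivered.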
Moving plain areas, such as the sky or featureless objects, generate narrow temporal spectra and that can be exploited. Image areas that have deliberately been put out of focus by the use of depth of field also generate narrow temporal spectra. As has been shown in earlier articles, high temporal frequencies result only from moving detail.
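The link between detail, motion and temporal frequency can be sketched with the standard relationship that a spatial frequency moving past a fixed pixel produces a temporal frequency equal to the product of the two. The function names and the example figures below are assumptions chosen for illustration; the factor of two is the usual sampling theory minimum.

```python
def temporal_frequency_hz(spatial_freq_cycles_per_px: float,
                          speed_px_per_s: float) -> float:
    """Temporal frequency seen at a fixed pixel when detail of the given
    spatial frequency moves across it at the given speed."""
    return abs(spatial_freq_cycles_per_px * speed_px_per_s)


def minimum_sampling_rate_hz(temporal_freq_hz: float) -> float:
    """Sampling theory requires more than twice the highest frequency."""
    return 2.0 * temporal_freq_hz


# Hypothetical examples, all moving at 1000 pixels per second.
for label, detail in (("plain sky", 0.001), ("soft focus", 0.02), ("fine detail", 0.5)):
    f = temporal_frequency_hz(detail, 1000.0)
    print(f"{label}: {f:.0f} Hz, sample above {minimum_sampling_rate_hz(f):.0f} Hz")
```

Stationary detail and moving plain areas both give low figures; only fine detail in rapid motion pushes the per-pixel sampling rate towards the kilohertz region.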
In all practical codecs, entry points must exist at which a decoder can recover from uncorrectable errors or begin to operate after a channel switch. In true motion coding the data can be halted at any time and a complete picture results. There is thus no difficulty in having periodic anchor pictures, the equivalent of the I picture in MPEG. As there is no frame rate, there can be no groups of pictures, so a new term would have to be found for the GOP of MPEG. As with MPEG, the transmission of an anchor picture would put a bump in the data rate that would be absorbed by buffers in the usual way.
The goal of all codecs is to express the original information in ways that make it sparse. What that means is that any changes in the data are less frequent in any relevant domain. In true motion video, the most important step that can be taken is to abandon the time axis and transform the picture data onto any relevant optic flow axes. Motion compensation has been seen to be very effective in conventional codecs, but in every traditional case it is compromised because motion is difficult to estimate in a system that cannot properly reproduce motion.
In a true motion system that barrier no longer exists, and it becomes much easier not only to establish the direction of motion, but also to establish the boundaries of moving objects.
Once data are transmitted in that way, they become sparse. For example, when viewed along the optic flow axis, the temporal spectrum of an object that simply translates without changing its shape or appearance collapses to zero. The only data that need to be sent are those which describe areas revealed by the motion. The object itself needs zero bandwidth.
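The collapse can be demonstrated with a toy one-dimensional example. It uses discrete time steps purely for convenience, whereas a true motion system would treat time continuously; the texture values, the speed and the function names are all hypothetical.

```python
TEXTURE = [0, 3, 9, 4, 7, 1, 8, 2]  # hypothetical detail carried by the object
SPEED = 1                            # pixels moved per time step


def pixel_value(x: int, t: int) -> int:
    """Brightness at screen position x and time t for a texture that
    translates at SPEED without changing shape (wraps around for brevity)."""
    return TEXTURE[(x - SPEED * t) % len(TEXTURE)]


def total_change(samples: list) -> int:
    """Sum of absolute differences between successive samples."""
    return sum(abs(b - a) for a, b in zip(samples, samples[1:]))


times = range(16)
along_time_axis = [pixel_value(3, t) for t in times]              # fixed pixel
along_flow_axis = [pixel_value(3 + SPEED * t, t) for t in times]  # follows object

print("change along time axis:      ", total_change(along_time_axis))
print("change along optic flow axis:", total_change(along_flow_axis))
```

Sampled at a fixed screen position the values change constantly, but followed along the optic flow axis the same data are constant, so there is nothing left to send for the object itself.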
The path of an optic flow axis needs to be continuous so that motion vectors can be established anywhere. In most cases optic flow axes are either straight or form gentle curves and so can easily be expressed by a few data points and interpolated between them. To reduce latency, optic flow axes could be forward predicted and corrected with residual data.
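One way an optic flow axis might be carried as a few data points is sketched below. It assumes straight-line interpolation between time-stamped positions; a practical system could equally use a curve fit for the gentle curves mentioned above. The data layout and the function name are assumptions for illustration.

```python
from bisect import bisect_right


def position_at(knots: list, t: float) -> tuple:
    """Position (x, y) on an optic flow axis at time t.

    knots is a list of (time, x, y) tuples sorted by time. Positions between
    knots are linearly interpolated; outside the range the end point is held.
    """
    times = [k[0] for k in knots]
    if t <= times[0]:
        return knots[0][1:]
    if t >= times[-1]:
        return knots[-1][1:]
    i = bisect_right(times, t)          # first knot strictly later than t
    (t0, x0, y0), (t1, x1, y1) = knots[i - 1], knots[i]
    a = (t - t0) / (t1 - t0)            # 0 at the earlier knot, 1 at the later
    return (x0 + a * (x1 - x0), y0 + a * (y1 - y0))


# Hypothetical axis described by three data points.
axis = [(0.0, 0.0, 0.0), (1.0, 10.0, 5.0), (2.0, 30.0, 5.0)]
print(position_at(axis, 0.5))
print(position_at(axis, 1.5))
```

Because the function accepts any value of t, a motion vector can be obtained at any instant rather than only at frame times.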
In practice there will be changes to moving objects, and the same approach is used as in MPEG: the image produced by the codec is compared at the encoder with the original, and the difference is used to create a residual that corrects the coding errors. The residual in true motion would be a continuous history of the coding error of each pixel. The amount of residual data acts as a metric for the accuracy of the motion estimation, namely how well any optic flow axes have been found and described.
The decoder would receive a spatial image of an object and then smoothly move it along its optic flow axis so it appears in the correct place on the screen. Then residual information corrects the errors in that process for each pixel. The use of residuals allows systems to be created that are truly lossless and that would be attractive for production purposes.
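The role of the residual can be shown with a minimal sketch using integer pixel values, so that the reconstruction is exact. The sample values and function names are hypothetical, and the short lists stand in for what would be a continuous history of each pixel.

```python
def encode_residual(original: list, predicted: list) -> list:
    """Encoder side: the coding error of the motion-compensated prediction."""
    return [o - p for o, p in zip(original, predicted)]


def decode(predicted: list, residual: list) -> list:
    """Decoder side: add the residual back to the same prediction."""
    return [p + r for p, r in zip(predicted, residual)]


original = [10, 12, 15, 15, 14]    # hypothetical pixel values from the sensor
predicted = [10, 12, 14, 15, 15]   # what motion compensation alone produced

residual = encode_residual(original, predicted)
reconstructed = decode(predicted, residual)
residual_energy = sum(r * r for r in residual)

print("residual:", residual)
print("lossless:", reconstructed == original)
print("residual energy:", residual_energy)
```

The better the optic flow axes have been estimated, the smaller the residual energy, which is why it can serve as a measure of motion estimation accuracy.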
Such a radical departure from tradition brings with it some problems. The sensor is probably the hardest device to implement as it must effectively output the state of each pixel at the same time. Most sensors are built in the form of shift registers, so the pixel information comes out serially. This approach is consistent with the universal use of line scanning in traditional television. For true motion, in order to obtain sufficient bandwidth for each pixel the serial approach is unlikely to be satisfactory.
In the absence of a suitable camera sensor, the system could still be demonstrated with an artificially rendered source. Image rendering already uses accurate motion internally, and to produce traditional video the motion is sampled at a frame rate. This need not be the case, and a rendering system capable of producing true motion would not be especially difficult to build.