Compression: Part 11 - Synchronization
Synchronizing is a vital process in all television systems and the use of compression adds extra constraints.
Since television began to use scanning to divide the picture up into columns or lines, there has been a need for synchronization. In its earliest form, a second conductor alongside the video signal carried pulses that locked the scan of the display to the scan of the camera. A pair of signals could not easily be broadcast and this was overcome by dividing the voltage gamut into two regions, with video black level near the boundary.
This meant that all voltages above the boundary represented brightness information, whereas all voltages below the boundary represented synchronizing information. Since they never occurred at the same time, there was no problem. A further advantage was that the synchronizing pulses went from black to below black, so would never be seen on the screen.
The principle was adopted world-wide and formed the basis of television broadcasting for many decades. There were detail differences between the 60Hz and 50Hz formats, which divided the one volt video gamut slightly differently, but the idea was the same. Later the development of color TV required the addition of a burst during the horizontal blanking which allowed the receiver to recreate the subcarrier used in the chroma modulation.
With the advent of component digital interfaces for television production, the split gamut approach was discontinued. Better dynamic range in the digital domain was obtained by making the video gamut begin just below black and end just above white. A new approach to synchronizing was adopted. The all ones and all zeros codes were disallowed in picture areas so they could be reserved for synchronizing. Less than one percent of the gamut was lost to synchronizing, rather than 30 percent in the analog domain.
The digital video interfaces worked in real time. They were only digital in as much as the analog video voltage was expressed by a number. The timing was precisely that of the analog original. At the times where there would have been an analog sync pulse, there was a timing reference signal (TRS) using the reserved codes. This was followed by an ID code that identified where in the picture the pulse was. The serial digital interface (SDI) simply sent the same data one bit at a time down a single coaxial cable, but the message was otherwise the same.
Then along came compression, whose job was to minimize the data needed to describe television pictures. Analog TV and the SDI version were highly redundant because all of the blanking was retained. During blanking, data was being sent but it did not represent the picture. Compressors ripped out the blanking and assembled frames from active parts of active lines as pixel arrays.
Then the fun started, because compressors have interesting characteristics. After compression, the amount of data needed to express a picture is a variable, depending on what type of picture it is in the group and on the difficulty of the image. This means that with a constant bit rate, transmitting a picture takes a variable amount of time. With the use of bidirectional coding, the pictures are not even sent in their natural order.
The timing and synchronizing embedded in the traditional analog TV picture have been completely trashed. Worse than that, the compressed data may be multiplexed with other data on its way to a decoder and thereby suffer an unknown amount of latency from the network.
With the traditional synchronizing gone, compression systems have to create new ways of synchronizing the decoder so that, amongst other things, the pictures come out at the right time. Clearly the associated audio must retain lip sync with the pictures as well.
Both the encoder and the decoder contain a limited amount of buffer memory to help even out the data rate. These devices are similar to time base correctors in that they work best when the memory is on average half full. One goal of synchronizing is to maintain the decoder buffers in that state. It is not enough that the encoder and the decoder clocks should be perfectly synchronized, because that merely keeps the average buffer content constant, without determining what it is.
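The point about clock lock not determining the buffer level can be illustrated with a minimal sketch. It assumes a hypothetical decoder buffer that is drained at a constant rate while data arrives in bursts with the same average rate; the function name, the arrival pattern and the fill values are invented for illustration and are not taken from any standard.

```python
def fill_range(initial_fill, arrivals, drain_per_tick):
    """Track a buffer that gains `arrivals[i]` units and loses
    `drain_per_tick` units on every tick.

    Returns (lowest, highest, final) fill levels. A negative lowest
    value means the buffer would have underflowed.
    """
    fill = initial_fill
    lowest = highest = fill
    for arriving in arrivals:
        fill += arriving - drain_per_tick
        lowest = min(lowest, fill)
        highest = max(highest, fill)
    return lowest, highest, fill


# Hypothetical bursty arrivals whose average (50) equals the drain rate,
# which is what perfectly locked encoder and decoder clocks would give.
ARRIVALS = [80, 20, 50, 0, 100]
DRAIN = 50

nearly_empty = fill_range(30, ARRIVALS, DRAIN)   # buffer starts almost empty
centered = fill_range(500, ARRIVALS, DRAIN)      # buffer starts half full (of 1000)
```

In both cases the final fill equals the initial fill, so the locked rates hold the average constant. Only the centered buffer has enough headroom to absorb the bursts without running dry, which is why the starting point has to be set separately.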
The decoder receives data that have been buffered and multiplexed and transmitted in some way. The decoder cannot know what latency was caused by the upstream processes, but that latency must be added to the latency due to the compression codec, some of which is due to the coder and some of which is due to the decoder.
The system works as follows. A master clock derived from the same timing source as the video that has been encoded is recreated at the decoder. This clock drives a counter in both encoder and decoder. At salient times, such as the start of a picture, the encoder samples the state of the counter and adds it to the data stream. The sample is known as a presentation time stamp. Once it is up and running, the decoder is intended to receive the time stamps and to output the relevant picture when the state of the local counter is the same as the time stamp.
Fig.1 - The synchronization of a codec is achieved with a numerical phase locked loop at the receiver that filters out jitter in the timing from the encoder.
In order to get going, the decoder finds incoming time stamps. The decoder knows its own latency, so it can estimate how to jam the time counter to an appropriate value that gives the decoder time to operate. The first estimate will probably allow the decoder to function, but it will not necessarily do so with the buffer half full. Centering the buffer will require the time counter to be jammed to a modified value.
The master clock of a codec runs at 27MHz. This long-established frequency dates from the days of standard definition where it was a common multiple of the US and European line rates. The goal is to recreate that clock frequency at the decoder. The phase doesn’t matter. As the 27MHz frequency is tightly specified, the life of the decoder is made a little easier.
As Fig.1 shows, the 27MHz clock at the encoder drives a counter which is 48 bits long. The state of the counter is periodically sampled to produce a value called program clock reference (PCR) and added to the output multiplex. No attempt is made to set the count to a particular starting value as it doesn’t matter. When the counter reaches all ones it will overflow and start again.
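As a rough worked example of why the starting value and the overflow do not matter, the sketch below computes how long a free-running counter takes to wrap and shows a wrap-tolerant way of subtracting two samples. The counter width is taken from the text above and treated as a parameter; the function names are hypothetical.

```python
CLOCK_HZ = 27_000_000        # master clock frequency from the text
COUNTER_BITS = 48            # counter width as stated in the text
MODULUS = 1 << COUNTER_BITS  # number of distinct counter states


def wrap_period_days():
    """Time for the counter to run from zero to overflow, in days."""
    return MODULUS / CLOCK_HZ / 86_400


def count_difference(later, earlier):
    """Signed difference `later - earlier` in clock ticks.

    Modular arithmetic makes the result independent of the arbitrary
    starting value and correct across a single overflow, provided the
    true difference is less than half the counter range.
    """
    diff = (later - earlier) % MODULUS
    if diff >= MODULUS // 2:
        diff -= MODULUS
    return diff


# Two samples taken 10 ticks apart, straddling the overflow.
before_wrap = MODULUS - 5
after_wrap = 5
ticks_apart = count_difference(after_wrap, before_wrap)
```

Here `ticks_apart` is 10 even though the raw count went down, and with these figures the wrap period works out to roughly four months, so an overflow is a rare and harmless event.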
The decoder contains a numerical phase locked loop (NLL). This is the same as a regular phase locked loop except that the parameters are numbers. As the specification of the 27MHz master frequency is so tight, the decoder NLL can use a voltage-controlled crystal oscillator (VCXO) that is only capable of small frequency changes. The NLL drives another 48-bit counter. On startup it will jam the state of the counter to that of the first received PCR. If the local oscillator is running too fast, when the next PCR is received, the local count will be ahead of it. If too slow, the local count will be behind the PCR.
If the local count is subtracted from the PCR count, the difference can be used to pull the frequency of the oscillator until it runs at the same frequency as the master. In an ideal world, every PCR would match every local count and the oscillator would need no adjustment. In the real world, the data arriving at the decoder suffers variable latency and jitter, so the two counts seldom match. This doesn’t matter because the count difference is heavily filtered before it reaches the VCXO. The result is that the jitter and latency in the PCR counts are averaged out.
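The loop described above can be sketched as a small simulation. This is only an illustration under stated assumptions: the filter is modeled as a simple proportional-plus-integral controller, and the oscillator offset, jitter range, PCR interval and gains are all invented values chosen so the loop settles slowly. A real decoder's filter and figures will differ.

```python
import random

NOMINAL_HZ = 27_000_000.0  # encoder master clock


def simulate_lock(offset_ppm=30.0, jitter_ticks=2700.0, interval_s=0.1,
                  steps=8000, kp=0.05, ki=6.25e-5, seed=1):
    """Return the decoder oscillator frequency after each received PCR.

    All parameters are illustrative assumptions:
      offset_ppm   - free-running error of the local oscillator
      jitter_ticks - peak arrival jitter of each PCR, in 27MHz ticks
      interval_s   - time between PCRs
      kp, ki       - small gains, so the error is heavily filtered
    """
    rng = random.Random(seed)
    free_running_hz = NOMINAL_HZ * (1 + offset_ppm * 1e-6)
    encoder_count = 0.0
    local_count = 0.0      # jammed to the first PCR on startup
    integral = 0.0
    freq = free_running_hz
    history = []
    for _ in range(steps):
        encoder_count += NOMINAL_HZ * interval_s
        local_count += freq * interval_s
        # The PCR seen by the decoder is displaced by network jitter.
        received_pcr = encoder_count + rng.uniform(-jitter_ticks, jitter_ticks)
        error = received_pcr - local_count   # PCR minus local count
        integral += error
        # Pull the oscillator by a filtered version of the error.
        freq = free_running_hz + kp * error + ki * integral
        history.append(freq)
    return history


history = simulate_lock()
tail = history[-2000:]
mean_locked_hz = sum(tail) / len(tail)
```

With these assumed values the oscillator starts 810Hz fast, and once the loop has settled the average of `tail` sits close to 27MHz even though each individual PCR is displaced by up to 2700 ticks. The low gains are what do the averaging: raising `kp` makes the loop lock faster but lets more of the jitter through to the oscillator.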
The 27MHz master clock is divided down to 90kHz at both encoder and decoder and drives a counter. At the beginning of a picture, the encoder samples the count and incorporates it in the multiplex as a presentation time stamp (PTS). As all pictures have exactly the same duration in video, the time stamps are highly redundant and need not be sent for every picture.
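The redundancy follows from simple arithmetic, sketched below. Because every picture lasts the same number of 90kHz ticks, the stamp of any picture can be computed from an earlier one. The function names are hypothetical and the frame rates are common examples, not values taken from the text.

```python
from fractions import Fraction

MASTER_HZ = 27_000_000
PTS_HZ = 90_000
DIVIDER = MASTER_HZ // PTS_HZ   # division ratio between the two clocks


def pts_ticks_per_picture(frame_rate):
    """Duration of one picture in 90kHz ticks (exact, as a Fraction)."""
    return Fraction(PTS_HZ) / Fraction(frame_rate)


def predict_pts(first_pts, picture_index, frame_rate):
    """Stamp of picture `picture_index`, counted from a picture whose
    stamp `first_pts` was actually received."""
    return first_pts + picture_index * pts_ticks_per_picture(frame_rate)


ticks_25 = pts_ticks_per_picture(25)                       # 50Hz systems
ticks_2997 = pts_ticks_per_picture(Fraction(30000, 1001))  # 59.94Hz systems
third_after = predict_pts(1000, 3, 25)
```

The division ratio is 300, and both example frame rates give a whole number of ticks per picture (3600 and 3003 respectively), so a decoder that has received one stamp can interpolate the others exactly.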
When bidirectional coding is used, it is necessary to send and decode a future picture ahead of time so that a bidirectionally coded picture can bring data back from the future. The future picture will have a presentation time stamp that ensures it appears on the screen at the right time, but the decoder needs to know that the picture must be decoded ahead of time. That is the purpose of the decode time stamp (DTS), which will be multiplexed into a picture sent out of sequence.
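The relationship between the two stamps can be sketched for a short group of pictures. This is a simplified model, not the behavior of any particular codec: it assumes each I or P picture is sent ahead of the B pictures that precede it on screen, that one picture is decoded per picture period, and that presentation lags decoding by one picture period. The function names and the 3600-tick period are illustrative assumptions.

```python
def decode_order(display_types):
    """Return display indices in the order they must be sent and decoded.

    Each anchor picture (I or P) is moved ahead of the B pictures that
    come before it in display order, because those B pictures need it.
    """
    order = []
    pending_b = []
    for index, kind in enumerate(display_types):
        if kind == "B":
            pending_b.append(index)
        else:
            order.append(index)
            order.extend(pending_b)
            pending_b = []
    order.extend(pending_b)   # any trailing B pictures are flushed last
    return order


def time_stamps(display_types, ticks_per_picture=3600):
    """Map display index -> (DTS, PTS) under the simplified model."""
    stamps = {}
    for slot, index in enumerate(decode_order(display_types)):
        dts = slot * ticks_per_picture          # one decode per period
        pts = (index + 1) * ticks_per_picture   # display order, one period late
        stamps[index] = (dts, pts)
    return stamps


sequence = "IBBPBBP"
order = decode_order(sequence)
stamps = time_stamps(sequence)
```

For the sequence shown, `order` is `[0, 3, 1, 2, 6, 4, 5]`. In this model the B pictures have identical DTS and PTS, since they are displayed as soon as they are decoded, whereas the P picture at display index 3 is decoded at 3600 but not presented until 14400. Only the out-of-sequence pictures need the separate decode stamp.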
Once the decoder is up and running with its PCR locked and its buffer memories centered, pictures will emerge at exactly the same rate as their source. There will, however, be an unknown delay between the timing of the pictures at the source and at the decoder, due to latency in the codec and in the transmission system. The system cannot be synchronous but instead is isochronous, meaning it runs at the same frequency but not at the same time.