Data Recording and Transmission: Part 15 - Error Handling II
Errors are handled in real channels by a combination of techniques and it is the overall result that matters. This means that different media and channels can have completely different approaches to the problem, yet still deliver reliable data.
Traditionally, error handling evolved to deal with received data that were not the same as the original data due to technical issues such as noise and dropout. However, as IT equipment entered the mainstream, another mechanism of data change arose, that of tampering; and the principles of error handling had to be extended to deal with that in applications such as blockchain.
Encryption shares some of the mathematical concepts behind error correction, except that the goal is to make the data meaningless to unauthorized recipients, and the definition of an error becomes data that isn't meaningless to them.
Some of the key areas of error handling will be considered here. Error tolerance is a characteristic of the data to be recorded: how many uncorrected errors will be acceptable to real users? Error detection is fundamental and is the most critical part, since if we don't know there has been an error, the correction strategy becomes irrelevant as it cannot be invoked.
One fundamental of error correction codes is that they present a trade-off. Fig.1 shows that at low raw error rates, the final error rate is reduced, typically to a negligible level. The penalty is that at higher raw error rates the error correction system may be confused so that it mis-corrects and causes error propagation that makes the data worse. That situation must be avoided at all costs.
Figure 1 – Error correction showing the trade-off between low raw error rates and higher raw error rates (see text)
The performance of error correction math is limited to a certain size of error, so it is necessary to add a layer that we might call error containment, which limits the size of error the mathematical part sees.
Finally, in extremis, it must be possible to determine reliably that correction is not possible because the extent or distribution of the erroneous symbols is too great. In some cases of uncorrectable error, concealment is possible instead.
The advantage of using binary or m-ary symbols becomes clear, because in discrete signaling a received symbol is either the right one or it isn't; there is no debate. If the error magnitude has been contained, the error can be computed, and if it is added back to the received message it is as if the error never took place. In contrast, all values within the gamut of an analog signal are allowed, so there is no way of determining the proportions of signal and noise.
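As a minimal sketch of that idea, assuming binary symbols and a decoder that has somehow deduced the error pattern (real codes derive it from redundancy, not from knowledge of the original), adding the error back is a simple exclusive-OR. The values below are purely illustrative:

```python
# Illustrative sketch: correcting a contained error in a binary symbol.
transmitted = 0b10110100           # the symbol that was sent
error       = 0b00000100           # a contained, single-bit error pattern
received    = transmitted ^ error  # the channel adds (XORs in) the error

# If the decoder can deduce the error pattern, adding it back (XOR again)
# restores the original exactly, as if the error never took place.
corrected = received ^ error
assert corrected == transmitted
```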
Clearly some forms of data will not tolerate error. Machine code, for example, is error intolerant because a single bit in error in an instruction changes it to a different instruction and storage has to be essentially error-free. In the good old days when media were developed for specific purposes, such as audio or video recording, the correcting power of the media could be matched to the ability of the viewer or listener to detect uncorrected errors. As a result when the CD was adapted for data recording purposes to become CDROM, it needed an additional layer of error correction in order to be suitable for generic data rather than just audio.
If the data to be sent down the channel are a PCM representation of an audio signal or an image, the data will be in the form of samples expressed as binary numbers. The effect of an uncorrected bit error on the original waveform depends on its significance. An erroneous least-significant bit (LSB) is probably not detectable in real program material, whereas a most-significant bit (MSB) in error would insert a significant transient into the reproduced waveform.
This knowledge only affects the statistics of detectability, because if we knew which bit of the word was in error we would have corrected it! By definition, in the case of an uncorrected error we don't know, and the best we can do is to assume the worst, which is that a sample with an uncorrected error might be visible or audible and needs to be concealed.
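The scale of the difference is easy to see with a toy example, assuming an unsigned 16-bit PCM word; the sample value used here is arbitrary:

```python
# Illustrative only: the magnitude of a single-bit error in a 16-bit PCM
# sample depends entirely on which bit is hit.
sample = 12345                 # an arbitrary 16-bit sample value

lsb_error = sample ^ (1 << 0)  # flip the least-significant bit
msb_error = sample ^ (1 << 15) # flip the most-significant bit

print(abs(sample - lsb_error)) # 1: lost in the program material
print(abs(sample - msb_error)) # 32768: a large transient in the waveform
```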
Figure 2 – Error correcting block converting between rows and columns of data to detect and potentially correct for burst errors (see text)
Concealment consists of creating a sample having a plausible value by taking an average of those nearby. In audio the samples before and after the failing sample may be averaged. In images pixels above and below the erroneous pixel may also be used.
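A minimal sketch of audio concealment, assuming the error-correction system has already flagged which sample index is uncorrectable (the function name and data are illustrative):

```python
# Conceal an uncorrectable audio sample by averaging its neighbours.
def conceal(samples, bad_index):
    """Replace a flagged sample with the mean of the samples either side."""
    prev_ok = samples[bad_index - 1]
    next_ok = samples[bad_index + 1]
    samples[bad_index] = (prev_ok + next_ok) // 2

audio = [100, 104, 9999, 112, 116]   # sample at index 2 is flagged as bad
conceal(audio, 2)
print(audio)                         # [100, 104, 108, 112, 116]
```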
Concealment works well in video because the frame rates are so low and motion smear means that there is seldom much high frequency information in the picture. High definition is a marketing term, not a description of any image. Concealment works well in audio because the human auditory system can only detect distortion that is sustained and simply cannot hear the transient distortion due to the odd approximated sample.
Clearly concealment is only possible if the samples either side of the uncorrectable one remain correct. One solution is an odd/even interleave. Prior to being stored, samples are divided into odd and even data channels. In a block-based system, the odd and even samples will be stored in different blocks so if one of them is lost the waveform can be interpolated from the surviving block.
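A sketch of the block-based odd/even split, assuming one of the two blocks is lost entirely; the sample values are illustrative and the reconstruction is simple averaging:

```python
# Odd/even interleave into separate blocks, then interpolate a lost block.
samples = list(range(0, 160, 10))   # 16 PCM samples: 0, 10, 20, ...

even_block = samples[0::2]          # samples 0, 2, 4, ... stored in one block
odd_block  = samples[1::2]          # samples 1, 3, 5, ... stored in another

# Suppose the odd block is lost: rebuild it by averaging even neighbours.
reconstructed = []
for i, s in enumerate(even_block):
    reconstructed.append(s)
    if i + 1 < len(even_block):
        reconstructed.append((s + even_block[i + 1]) // 2)  # interpolated
print(reconstructed)
```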
In convolutional interleaving, the odd and even samples are subject to different delays before being merged again. An equal delay is applied on reading to re-align the samples in the correct order. The occurrence of a burst error then results in two separate errors, one in only the even samples and one in only the odd samples.
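As a sketch of the delay idea, assuming the even samples are delayed by two sample periods before merging (the delay value is illustrative, not taken from any particular format; `None` marks the start-up latency of the delay line):

```python
# Convolutional odd/even interleave: delay one stream before merging.
DELAY = 2
samples = list(range(12))                     # original sample indices 0..11

odd_stream  = samples[1::2]                   # 1, 3, 5, ...
even_stream = [None] * DELAY + samples[0::2]  # even samples delayed

# Adjacent channel positions now carry samples that were originally well
# separated in time, so a burst splits into one error in the odd samples
# and one in the even samples, each surrounded by intact neighbours.
channel = [s for pair in zip(even_stream, odd_stream) for s in pair]
print(channel)   # [None, 1, None, 3, 0, 5, 2, 7, 4, 9, 6, 11]
```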
Various factors have changed error correction strategies somewhat in recent years. One of them is that there are no more dedicated media. Today, storage devices are generic and may be required to store machine code or images or audio or whatever. The development of compression algorithms also changed the rules. Since compression removes redundancy from data, the data that are left must be more significant and so compressed audio or video requires higher data reliability than PCM.
The situation also benefits from the constant advance of microelectronics, which allows ever more complex processes to be used without increase in cost. Such processes include error correction.
Errors are statistical in nature and the size of an error is theoretically unbounded. In contrast, error-correcting codes are mathematical and the size of the errors they can correct is finite. Providing a code that can correct large errors to allow for their occasional occurrence is inefficient as it increases the complexity of the processing before and after the channel as well as requiring more redundancy.
A better solution is to employ interleaving. If an error-correcting code can correct, for example, an error of up to one byte, we simply arrange that adjacent bytes in a block of data sent into the channel form part of different codes.
Fig.2 shows the general idea. Data bytes are temporarily written into a memory one row at a time. Each row is made into an error correcting code. When the memory is full, the data are read out in columns to create blocks to pass into the channel.
On retrieving the data from the channel, the interleaving process is reversed by writing data into a memory in columns and reading it out in rows to re-assemble the codes. If a large burst error occurred in the channel, after de-interleaving no error-correcting code will contain more than one byte in error.
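A sketch of the row/column process of Fig.2, assuming four codewords of six bytes each; the redundancy itself is omitted and byte labels are tracked instead, just to show how a burst is dispersed:

```python
# Block interleave: write rows, read columns; de-interleave reverses it.
ROWS, COLS = 4, 6
data = [[f"r{r}b{c}" for c in range(COLS)] for r in range(ROWS)]  # rows = codewords

# Write in rows, read out in columns to form the channel sequence.
channel = [data[r][c] for c in range(COLS) for r in range(ROWS)]

# A burst error wipes out four consecutive channel bytes.
for i in range(8, 12):
    channel[i] = "XXX"

# De-interleave: write back in columns, read out in rows.
received = [[None] * COLS for _ in range(ROWS)]
for i, symbol in enumerate(channel):
    received[i % ROWS][i // ROWS] = symbol

for row in received:
    print(row)   # each codeword now contains at most one erroneous byte
```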
In the early days of digital technology, when memory remained expensive, an alternative to block-based interleave was developed by Sony that uses less memory. Fig.3 shows that in a convolutional interleave symbols are assembled into columns, which are then sheared by subjecting each row to a different delay. Codes are created on diagonals rather than on rows. This approach is used in the Compact Disc and CDROM.
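The shearing of Fig.3 can be sketched as a bank of delay lines, one per row, assuming an illustrative unit delay of one symbol per row (the real CD delays are much larger):

```python
# Convolutional interleave: each row gets a different delay, shearing
# columns into diagonals.
ROWS = 4
columns = [[f"c{c}r{r}" for r in range(ROWS)] for c in range(8)]

delay_lines = [[None] * r for r in range(ROWS)]   # row r delayed by r symbols

sheared = []
for col in columns:
    out = []
    for r in range(ROWS):
        delay_lines[r].append(col[r])
        out.append(delay_lines[r].pop(0))         # emit the delayed symbol
    sheared.append(out)

for col in sheared:
    print(col)   # symbols of one original column now lie on a diagonal
```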
An unusual approach that makes a convolutional code block-based is the block-completed interleave shown in Fig.4. This is best understood as the result of writing data on the surface of a cylinder, which is then twisted to shear the columns into diagonals.
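A rough sketch of that cylinder picture, assuming the per-row shear simply wraps around within the block rather than spilling into the next one (dimensions and the one-symbol-per-row twist are illustrative):

```python
# Block-completed interleave: rotate each row by its row index, modulo the
# block width, so the shear wraps around within the block.
ROWS, COLS = 4, 8
block = [[f"r{r}c{c}" for c in range(COLS)] for r in range(ROWS)]

twisted = [row[-r:] + row[:-r] if r else row[:] for r, row in enumerate(block)]

for row in twisted:
    print(row)   # original columns now lie on wrapped diagonals
```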
Interleaving of this kind is extremely powerful as it allows the error-correcting codes to work at maximum efficiency while minimizing the risk that they will be faced with an error beyond their power.
One consequence of the use of interleaving is that both the interleave and the subsequent de-interleave cause delay. In the case of a storage medium that is of little consequence, but in the case of transmission it means that there is no longer any such thing as real time.
Digital television broadcasting is a good example: by the time the pictures have been compressed and interleaved at the transmitting side and de-interleaved and decoded at the receiver, the images will be displayed well after their original timing. Broadcasters solved the problem by removing the seconds hand from the station clock.
For entertainment purposes such delays are of no real consequence, but they can be a real problem if used inside feedback loops, where the result may be instability. Applications such as fly-by-wire instead use parallel redundancy to deal with transmission errors. There are multiple wires or fibers taking different routes but carrying the same data.
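A minimal sketch of the parallel-redundancy principle, assuming three independent paths carrying the same word and a bitwise two-out-of-three vote at the receiver; real fly-by-wire systems are far more elaborate than this:

```python
# Bitwise majority vote across three copies of the same data word.
def majority_vote(a: int, b: int, c: int) -> int:
    """Return the bitwise two-out-of-three vote of three received words."""
    return (a & b) | (b & c) | (a & c)

sent = 0b10110100
path_a = sent
path_b = sent ^ 0b00001000     # one path corrupted in transit
path_c = sent

assert majority_vote(path_a, path_b, path_c) == sent
```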