Information: Part 1 - What Is Information?

Although we are said to live in an information society, the more the idea is examined, the less certain it appears. A good place to start is to consider what information actually is.


I've looked at a lot of attempts to define information and it's quite difficult. About the best one I have found is that information is something that resolves uncertainty.

In the general case it may not be possible to resolve uncertainty completely. You might say the only thing that is certain is uncertainty. Receiving information may not resolve the uncertainty, but instead may only reduce it. This could happen if the information itself is insufficient, or if the message carrying it suffers some corruption en route.

Some new friends are coming to dinner, and an uncertainty emerges: "Is there anything they don't eat?" - "I don't know." A couple of emails later, you know that Marge doesn't eat prawns. There is no question that the second email contained information, because the uncertainty was resolved. Perhaps it is obvious in this context, but the resolution of a given uncertainty for a given recipient of a message can only happen once, so by definition information is novel or surprising. It follows that a message that is not novel does not contain information.

Suppose instead the conversation went as follows: "Steve and Marge are coming to dinner. Steve says Marge doesn't eat prawns." ... "Yes, Marge told me that the last time we spoke."

This time there was no uncertainty to resolve and no novelty or surprise, because the pertinent facts were already known. "Steve says Marge doesn't eat prawns" cannot be information; it is redundant. In their original form, most messages contain redundancy most of the time. Television signals contain a great deal of redundancy; if they did not, it would not be possible to compress them.

A TV frame is an array of pixels. If all of the pixels were set to the same value, we wouldn't call the result a picture. We would say the same if the pixel values came from a random number generator.

In a real picture there must be objects we can recognize, and these require correlation between the pixels inside an object's edges but not with those outside it. Once any correlation exists, the pixel values are no longer independent and the amount of information goes down. I think it is safe to say that signals carrying information can be compressed.
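To make the point concrete, here is a minimal sketch in Python (using only the standard zlib and os modules, nothing broadcast-specific) comparing how a completely redundant "frame" and a completely random one respond to compression:

```python
import os
import zlib

n = 100_000  # stand-in for a frame's worth of 8-bit pixel values

flat = bytes([128]) * n   # every pixel identical: maximum redundancy
noise = os.urandom(n)     # independent random values: no redundancy at all

print(len(zlib.compress(flat)))   # a few hundred bytes: the redundancy compresses away
print(len(zlib.compress(noise)))  # slightly more than 100,000: nothing to remove
```

A real picture sits between these extremes: it compresses, because its pixels are correlated, but not to almost nothing.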

Fig.1 - Real equipment works on whole numbers of bits, or shannons, which represent the information capacity that cannot be exceeded. However, the amount of information could be less than the capacity and then fractional bits are required. The equation here shows how the number of bits of information is derived from the possible number of states.

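The equation in Fig.1 is not reproduced here, but from the caption it is presumably the familiar base-two logarithm, with M standing for the number of equally probable states and b for the number of bits of information:

b = \log_2 M

For example, M = 16 states corresponds to b = 4 bits, while a state count that is not a power of two gives a fractional number of bits.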

Murray Gell-Mann tried to address this problem with the concept of effective complexity, which better relates to what we consider to be a picture. By that measure, neither the flat grey screen nor the screen full of random pixels is complex; the images we discern as pictures lie somewhere between those extremes.

There's another possible scenario to the dinner party uncertainty. Due to some freak of cyberspace, the message about Marge's diet is delivered to a farmer in Finland who doesn't speak English. It's the same message, and it sure is novel to the recipient, but it isn't information, because the Finnish farmer can't understand the message and even if he could, he doesn't harbor any uncertainties about the dietary requirements of someone he has never met.

What we can say so far is that information has been delivered when uncertainty is reduced. It is implicit that the recipient of the message must understand it in order for the uncertainty to be reduced. That reduction can take place only once, so the message is necessarily novel; if it is repeated, it is redundant.

It is also implicit that information can only be received at a destination that is capable of some sort of thought. Until recently that would have been restricted to a sentient form of life, but today it is possible for machines to have sufficient sentience to receive and use information. But you can't send information to a rock.

A lot of engineering consists of making the residual uncertainty of a system small enough that it doesn't matter a whole lot. We like to think airliners are in that category. On the other hand if we are interested in protecting confidential material, what we are trying to do is to avoid reducing the uncertainty of our enemies, and we use encryption so that the message we send is not understood by unauthorized recipients.

Having some idea of what information is, the next thing we might do is to consider how to quantify it. The fundamental unit of information is the bit, also called the shannon. If a process can have only two outcomes, for example truth or falsehood - is the stable door locked or isn't it? - one bit of information is delivered when the outcome is resolved. Strictly speaking, the two states have to be equally probable. If they are not, less than one bit is delivered by the outcome.
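As a sketch of that last point, the information delivered by a two-outcome process is given by the binary entropy function; here is a minimal Python version (the function name is my own):

```python
import math

def binary_entropy(p: float) -> float:
    """Bits (shannons) delivered by resolving a two-outcome process
    in which one outcome has probability p."""
    if p in (0.0, 1.0):
        return 0.0  # the outcome was already certain: nothing is resolved
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

print(binary_entropy(0.5))  # 1.0 bit: the two states are equally likely
print(binary_entropy(0.9))  # ~0.47 bits: a foregone conclusion tells us less
```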

In the real world the quantity of information and its significance are completely independent. When in 1987 the ferry Herald of Free Enterprise set out to cross the English Channel with the bow door still open, the transmission of a single bit from the door to the master could have avoided all the deaths and the loss of the vessel. It was subsequently found that no such information channel existed. The author had a personal interest in the matter, having travelled on the vessel a number of times without realizing it was a game of Russian roulette.

We need to be careful, because the shannon is a unit of information and does not need to have an integer value. Fractional bits of information are possible. However, when we come to hardware in which signals are represented by binary data, the number of bits in use is always an integer.

It is commonly thought that information measured in bits is restricted to digital systems. That is not true. There are very many systems in which the information is sent in a continuously variable way. The conventional microphone and the vinyl disk are but two examples.

These devices create what we call analog signals, and in order to measure their information capacity we sample and quantize them. Let's look at quantizing first. If we choose a certain number of bits for each sample, the relationship in Fig.1 tells us how many different states the sample can take. For example, four bits allow 16 combinations, whereas 16 bits allow 65,536.
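A quick check of that relationship, as a sketch (two raised to the wordlength gives the state count, in line with the Fig.1 caption):

```python
for bits in (1, 4, 8, 16):
    print(f"{bits:2d} bits -> {2 ** bits:,} states")
```

which prints 2, 16, 256 and 65,536 states respectively.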

It is possible to think of a sample with its integer number of bits as a container for information that does not have to be full. An eight-bit sample may convey less than eight bits of information. For example in digital eight-bit luma, black level is defined as 16 and peak white is defined as 235.

There are only 220 legal codes (16 to 235 inclusive), and the codes above and below that range are not found in an ideal system. Thus when the state of a luma sample is unknown, the uncertainty is bounded, because there are only 220 possible outcomes, fewer than the 256 the wordlength allows. The greatest amount of information an 8-bit luma sample can carry is therefore about 7.8 bits. For reasons that will emerge, it could be even less.
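The 7.8-bit figure is simply the base-two logarithm of the legal state count; a one-line check, assuming the 220-code range above:

```python
import math

print(math.log2(220))  # ~7.78 bits: the most an ideal 8-bit luma sample can carry
print(math.log2(256))  # exactly 8 bits: the capacity of the container
```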

It should be clear that, in the case of quantizing, the provision of more bits just results in more and smaller steps. Quantizing an infinitely variable analog signal would actually require an infinite number of bits, at which point the steps would have zero size and the quantizing error would become zero.

This is the reason hi-fi enthusiasts claim that vinyl disks sound better than this modern digital rubbish. They cling to views like this because no qualifications or learning are needed to become an enthusiast, so they cannot understand why they are misguided.

Those with a little more learning will see that needing an infinite word length to quantize an analog signal suggests that it has infinite information capacity, which seems most unlikely. All real sources of information, whatever the technology involved, have finite capacity and so all real signals can be quantized using a finite number of bits without loss of information.

All real signals are bounded by physical limits: clipping prevents the signal from exceeding a certain size, and a noise floor interferes with small signals. In order to understand information, it is necessary to understand its antithesis, which is noise.

In turn, in order to understand noise it is necessary to have some grasp of statistics. Whilst statistics can be thought of as a branch of mathematics, there is a distinction, because when the topic of mathematics comes up, most people run a mile, whereas when the topic of statistics comes up they run two miles.

Statistics considers what can typically be expected from large numbers of events, and it follows that individual events within a population are not necessarily typical. In a population that doesn't understand statistics, there can be a few people who do; the exceptions that prove the rule.
