Future Technologies: Artificial Intelligence In Image & Sound Creation

We continue our series considering technologies of the near future and how they might transform how we think about broadcast, with a discussion of how the impact of AI on broadcasting may be more about ethical standards than technical standards.

Like all new technologies, AI can be used for both good and bad, but when the truth we are presented with isn’t as clear as it may first appear, who do we believe?

Prior to AI, the world seemed much simpler and more innocent. A presenter would appear on the evening news, tell us about the newsworthy stories that had made it to the broadcast that day, and we would find ourselves educated and informed with a high degree of certainty. The BBC’s purpose was even defined in three words: “inform, educate, entertain”. What could possibly go wrong?

AI is one of those technologies that can easily be misused in the wrong hands. The world is awash with fake news, and its prevalence seems to imply that we now find it difficult to debate the facts. Finding the truth is getting harder as we are inundated with videos of senior international politicians and leaders that are clearly fake. But these videos are not the issue. The real problem is the news that is almost good enough to be true but isn’t, and this is where the real damage occurs.

AI Is Fundamentally Flawed

Training AI tools for video and audio creation requires incredibly large datasets. In supervised learning, these datasets must be labeled by humans, who intrinsically harbor their own versions of the truth and express them when classifying the data. It seems that bias is a trait of the human condition and, no matter how liberal we think we are, unconscious bias in particular influences our decision making.

It’s not unreasonable to assume that AI image and sound creation has unconscious bias at its core, and it’s almost impossible to get away from this. The good news is that this is well known, and we should be able to design systems around it. However, when a fake news item is almost convincing, a bit like a well-prepared phishing email purporting to be from your bank, sometimes you need to look really carefully to identify whether it’s real or not.

The next question is where do we draw the line when deciding what is fake and what isn’t? For example, if we show a news journalist in front of a green screen with an overlay of the White House, reporting on a political story in the US, then it’s probably not unreasonable to assume that the graphic has just been placed there to enhance the story and give it context. What if we take this a step further and place a journalist in front of a green screen and this time show an image of a riot they are reporting on, implying that the reporter is at the scene, but not explicitly stating that they are? In essence, this probably wouldn’t matter, as the green-screen method isn’t the most advanced immersive technology and even to a lay observer it is obvious that the image is a human creation. However, if we add AI into the mix then we can quite easily, using generative AI techniques, insert the journalist into the riot scene so it looks as if they are actually there.

Defining Fake

The question is, is this fake news? To answer this, we need to take a step back and ask why we send journalists all over the world to war zones and other very dangerous places if we can use AI to do the job for us. This all comes down to proving the truth and removing as much bias as possible or, in other words, delivering credibility. After all, the whole point of news is to inform and educate.

Again, bias starts to appear, not just in the Gen-AI model, but also in the decision making of the people who decide which items are published and how. Why would a news editor publish one story and not another? Or give one story prominence over another? There are many factors involved, but often this comes down to human judgement, which brings us full circle back to Gen-AI and the bias it may contain. Although companies are technically inanimate legal instruments, they are governed and controlled by humans with their own beliefs and biases. And this is where bias makes more sense if we consider it in the context of belief. Many of us fall into different classifications of belief systems, so it’s not unreasonable to assume that a company can tend towards a particular belief or group of beliefs.

Editorial Truth & Rigor

One of the defining attributes of public service broadcasters the world over is that they are governed by a system of editorial and production guidelines. If applied correctly, these guidelines should in theory remove the possibility of bias, as they provide editors and program makers with a system of rules that should resolve any prejudice or conflict. When we dig a bit deeper into how the rules have been formulated, we soon realize that they are created by governments and their leaders. As these people are only human, it’s not unreasonable to assume they also subscribe to a system of beliefs which in turn has some bias.

It’s quite interesting that a technical article on AI video and audio creation, or Gen-AI as it is otherwise known, is so heavily influenced by the human condition. But this shouldn’t come as a surprise, as the Gen-AI engine, or any other neural network type system, is merely a mapping of one domain of data onto another based on labeled datasets provided by humans. That is, it’s a mathematical function with parameters and rules. But these parameters and rules are “taught” by humans, so the model has the potential to become a reflection of the people creating the labels, with their associated and inherited bias.
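To make the idea of a "taught" function concrete, here is a minimal sketch in Python. It is not how a Gen-AI engine works internally. It assumes a deliberately tiny stand-in: a model with one parameter (a threshold) learned from labeled examples. The names `fit_threshold`, `scores`, `neutral_labels` and `strict_labels` are hypothetical and exist only for illustration. The point is that the same data, labeled by two people with different views, produces two different learned parameters.

```python
def fit_threshold(samples, labels):
    """Learn a single parameter: the threshold that best separates labels 0 and 1."""
    best_t, best_errors = None, None
    for t in sorted(samples):
        # The "model" predicts 1 when a sample is at or above the threshold.
        errors = sum((x >= t) != bool(y) for x, y in zip(samples, labels))
        if best_errors is None or errors < best_errors:
            best_t, best_errors = t, errors
    return best_t


# Hypothetical 1-D feature, e.g. how "suspicious" a clip looks on some scale.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

# Two labelers look at the same clips but draw the line in different places.
neutral_labels = [0, 0, 0, 0, 1, 1, 1, 1, 1]
strict_labels = [0, 0, 0, 0, 0, 0, 1, 1, 1]

neutral_model = fit_threshold(scores, neutral_labels)
strict_model = fit_threshold(scores, strict_labels)

print("threshold learned from neutral labels:", neutral_model)
print("threshold learned from strict labels: ", strict_model)
```

The data never changed between the two runs, only the labels did, yet the learned parameter moved with the labeler. A real neural network has millions or billions of parameters rather than one, but each of them is fitted to the training labels in an analogous way.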

One way of reducing bias in an AI model is to increase the size of the dataset and use as many different people as possible to classify the training data. Hopefully this will create a rich and diverse set of training data that will remove any bias. But is this possible? Asking such questions soon leads us to Juvenal’s question: who guards the guardians?
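As a rough illustration of why more labelers can help, and why they cannot guarantee a bias-free result, the sketch below combines labels from several people by simple majority vote. Majority voting is only one assumed way of merging labels, and the clip names, labels and the `aggregate` function are all hypothetical.

```python
from collections import Counter


def aggregate(annotations):
    """Return the majority label for each item from a list of per-labeler votes."""
    return {
        item: Counter(votes).most_common(1)[0][0]
        for item, votes in annotations.items()
    }


# Three labelers per clip. The third labeler marks everything as "fake".
one_biased_labeler = {
    "clip_a": ["genuine", "genuine", "fake"],
    "clip_b": ["fake", "fake", "fake"],
    "clip_c": ["genuine", "genuine", "fake"],
}

# Here two of the three labelers share the same bias.
shared_bias = {
    "clip_a": ["fake", "fake", "genuine"],
}

print(aggregate(one_biased_labeler))
print(aggregate(shared_bias))
```

When only one labeler is biased, the other two outvote them and the aggregated labels follow the majority view. When the bias is shared by most of the group, the vote simply reproduces it, which is the guardians problem restated in code.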

Moral Dilemma

Going back to the question of whether using Gen-AI to insert a reporter into a recording, or even a live stream, of a riot is fake news or not, this becomes not a technical question but one of morality. As there are many different classifications of morality, which may themselves be fluid, this is far from an easy question to answer and instead leads once again to the belief systems and bias of the human condition. Just because we have the technical ability to do something doesn’t mean we should.

Mainstream broadcasters are massively influential in communities throughout the world, and as Gen-AI continues to play an increasingly important role in our lives, we come to rely on the broadcast rules expressed through their editorial guidelines. These might not always be perfect, and they will vary from country to country due to the interpretation of their own belief systems, but at least they are something to work with. We might not always like them, and we might think they are heavily biased towards one political affiliation, but at least we have the opportunity to change the rules by lobbying government representatives should we feel the need. However, can the same be said of the wild west of social media, where the editorial rules seem to be less well defined and lacking in moral rigor?
