A New Year Speculation On Immersion

As we head into another new year, it seems reasonable to indulge in some speculation about what the future may bring. Here we consider the proposition that eventually, and probably not far into the future, broadcasters will have to deal with immersiveness. Why is this likely, and how should we approach it?

Watching Doctor Who on a 12” 1960s black and white TV tube wasn’t exactly immersive, except that it tended to heat the room up. Even with the underwhelming 405-line picture, the venerable Dalek show was terrifying enough, as plenty of slightly-beyond-middle-aged viewers will testify. Despite falling short in all-round media spectacle, it was nevertheless only safe to watch from behind the sofa. You didn’t need 8K video in 60fps 3D to suspend disbelief then, or now.

Fast forward 40 years, and in 2009, Avatar went all-out for 3D - and it looked spectacular in the cinema. Perhaps surprisingly, though, it was very nearly as enjoyable from a Blu-ray disc played on a very ordinary 2D LCD TV.

Fast forward another fifteen years, and Apple’s Vision Pro headset is considered by many to be the pinnacle of immersive consumer technology. Owners describe it as an incredibly accomplished device, with meticulous head and eye tracking, greater-than-4K resolution for each eye, and extremely low latency. But even with all of that - and with the incentive of having paid a non-trivial starting price of $3,499 for the device - many early adopters have given it up, some citing a lack of compelling software, and some concluding, more prosaically, that it’s uncomfortable to wear for long periods.

All of which might sound like a damning dismissal of the need for us ever to adopt truly immersive technology. But eventually, and probably sooner than you might think, we will have as much immersion as we need or want. It just may not arrive the way you expect.

Immersion and interactivity are closely connected. There’s not a great deal of point in an immersive environment that ignores you because it’s unaware of your existence. Some dramas and perhaps advertising might work with passive immersion, but, ultimately, being in a 3D world without “being” in all its senses and physical degrees of freedom is fairly pointless.

Here’s the problem with interactivity: it’s exponentially more difficult to create. Just to be clear, “exponential” doesn’t simply mean “a lot”; it means “a lot, multiplied at every step”. When you have multiple possible outcomes from a starting scenario, as a content author you have to plan for all of them. And each of those contingent outcomes will have further outcomes, too. It quickly gets out of hand. Early interactive “multimedia” CD-ROMs came and went in what seems like a few seconds of historical time. Computer games are, of course, interactive, but they often have budgets as big as blockbuster movies. There’s no way round it: interactivity is difficult and expensive to produce.
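To make that multiplication concrete, here is a minimal sketch; the branching factor and depth are purely illustrative numbers, not drawn from any real production.

```python
# Illustrative only: how branching interactive narratives multiply.
# With `choices` options at each decision point and `depth` decision
# points in a row, the number of distinct paths the author must plan
# for grows as choices ** depth.

def outcome_count(choices: int, depth: int) -> int:
    """Number of distinct story paths for a fixed branching factor."""
    return choices ** depth

for depth in range(1, 7):
    print(f"{depth} decision points, 3 choices each: "
          f"{outcome_count(3, depth)} possible paths")
```

Six decision points with just three choices each already yields 729 distinct paths - which is why interactive authoring budgets balloon so quickly.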

There are, of course, degrees of immersion. My first experience of VR in the ‘90s was objectively terrible, with a resolution of around half that of standard definition, and pixels the size of dustbin lids. There were no curves - only blocks. Think the video for Dire Straits’ “Money For Nothing”, and that was as good as it got. But it was, in a weird way, impressive for its time, because it was responsive. You wouldn’t want to live in that virtual world, but you could imagine where it might end up eventually.

Three years ago, there was a huge amount of hype around the concept of the metaverse. It seemed a logical product of the internet and VR. The metaverse doesn’t take a capital “M”, in the same way that the internet doesn’t take a capital “I”: there is only one metaverse. If there were more than one, they wouldn’t be metaverses. The idea is that we all take part in a virtual world, and these individual experiences are connected. They join up, in other words. If you were to wander into my current space, you would see the same things I see, and the same people (including you and me) would be there too.

With today’s technology, it seems eminently possible, and the advantages could be massive. What used to be static web pages would become “spaces” where we can examine and even try on products, as if we were in a shopping mall. Eventually the metaverse would become a digital overlay on the real world: a “digital twin” that would be accurate in every way. We can be reasonably certain about this, because much of it exists already. BMW already uses metaverse-like technology not only to design its cars, but also the factories that make them. Nvidia’s Omniverse is a framework and an engine used to build these hyper-realistic models. Within Omniverse, a model of a car - fully photorealistic and rendered with ray-tracing - behaves in its digital twin world exactly as it would in reality. Each car carries with it the full behavior of a real vehicle, with every tire and shock absorber accurately modeled, both physically and dynamically. You could imagine an advertising company taking the engineering model of a car and importing it directly into an advertisement showing that car driving through a forest, complete with reflections and realistic lighting. Except that you don’t have to imagine it, because film directors are already doing this.
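What “physically and dynamically modeled” means can be shown in miniature. The toy sketch below steps a single wheel’s suspension as a mass-spring-damper; the constants are purely illustrative, and a real digital-twin physics engine models vastly more than this.

```python
# A deliberately tiny illustration of "physically modeled": one wheel's
# suspension as a mass-spring-damper, stepped with simple semi-implicit
# Euler integration. Real digital-twin engines are far more
# sophisticated; every constant here is illustrative only.

mass = 300.0         # kg, quarter-car sprung mass (illustrative)
stiffness = 20000.0  # N/m, spring rate
damping = 1500.0     # N*s/m, shock absorber
dt = 0.001           # s, simulation time step

x, v = 0.05, 0.0     # start 5 cm above rest, as if just over a bump
for step in range(2000):              # simulate two seconds
    force = -stiffness * x - damping * v
    v += (force / mass) * dt
    x += v * dt
    if step % 400 == 0:
        print(f"t={step * dt:.1f}s  displacement={x * 1000:+.1f} mm")
```

Running it shows the body settling through a few decaying oscillations - the same behavior you would expect from the real suspension, which is the whole point of a twin.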

The metaverse needs quite a few technology breakthroughs to become a worldwide reality. The biggest are probably persistence and coherence: how do changes in your “world” stay with you, and with other people who come into your world? It’s a bit like the role of the continuity assistant in filmmaking. If someone’s wearing a double-breasted suit in one “scene”, they need to be wearing it in the next shot, too, if it’s on the same contiguous timeline. On a wider scale, if you, or the car you’re driving, move from one metaverse location to another - locations quite possibly built and supported by different parties but complying with a notional “metaverse standard” for data interchange - then you, the car, or any other object needs to carry its associated data, including its history of previous activities and interactions.
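No such interchange standard exists yet, so the following is purely hypothetical: a sketch of the kind of portable record an object might carry between independently operated worlds, so that persistence and coherence survive the crossing.

```python
# Purely hypothetical: no "metaverse standard" for data interchange
# exists. This sketches the sort of portable record an object (you,
# your car) might carry between independently operated worlds.

from dataclasses import dataclass, field

@dataclass
class PortableObject:
    object_id: str                      # globally unique identity
    owner: str                          # who the object belongs to
    state: dict = field(default_factory=dict)    # appearance, wear, fuel...
    history: list = field(default_factory=list)  # append-only event log

    def record_event(self, world: str, event: str) -> None:
        """Append an interaction so the next world can honor it."""
        self.history.append({"world": world, "event": event})

car = PortableObject("car-42", "alice", state={"paint": "red", "fuel": 0.6})
car.record_event("world-a", "scraped front wing")
# A compliant second world would load this record and show the damage.
```

The continuity-assistant analogy maps directly: the append-only history is what lets the “next shot” match the last one, whoever is filming it.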

If all of the above seems largely irrelevant to broadcasters, the reverse is likely to be true. The metaverse will need as much, if not more, content - including live content - as the real world. The question is, how will that content be integrated with the metaverse, and how different will it have to be?

It’s probably helpful to think of this in several stages or levels.

The first level might be exactly the same as the relationship between broadcast and your living room. In the metaverse, there will be television screens, loudspeakers and in-car entertainment systems, in much the same way as computer games have incorporated either real or made-up broadcast media on virtual screens and even in virtual cars.

The next level might be where broadcasters create programs that acknowledge and contribute to the metaverse. This could start with little need for new technology, by capturing video immersively. It would involve “traditional” 3D stereoscopy, but with a much wider field of view. It wouldn’t look right on a flat screen (although a “flat” view could be derived through digital processing), but it would be perfect for devices like Apple’s Vision Pro. Indeed, recent iPhone models can use their multiple cameras to capture immersive video, and although the results lack resolution, it is a very clever and effective trick. Capturing audio immersively is also easy with a technique like Ambisonics B-format - essentially an omnidirectional capsule plus figure-of-eight capsules arranged perpendicularly, a three-dimensional cousin of sum-and-difference (mid-side) recording - from which you can derive almost any spatial audio format.
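A minimal sketch of why B-format is so flexible: from the omni channel W plus the X and Y figure-of-eights, you can “point” a virtual first-order microphone in any horizontal direction after the fact. This assumes the traditional FuMa channel weighting (W recorded 3 dB down); other conventions such as AmbiX scale the channels differently, and the sample values here are illustrative.

```python
# Deriving a steerable virtual microphone from first-order Ambisonics
# B-format, assuming the FuMa convention (W carries a 1/sqrt(2) gain).

import math

def virtual_mic(w: float, x: float, y: float,
                azimuth_deg: float, pattern: float) -> float:
    """Horizontal-plane virtual mic signal from one B-format sample.

    pattern: 1.0 = omni, 0.5 = cardioid, 0.0 = figure-of-eight.
    """
    az = math.radians(azimuth_deg)
    return (pattern * math.sqrt(2) * w
            + (1.0 - pattern) * (x * math.cos(az) + y * math.sin(az)))

# Derive a stereo pair of virtual cardioids at +/-45 degrees from a
# single (illustrative) B-format sample:
w, x, y = 0.3, 0.2, -0.1
left = virtual_mic(w, x, y, +45.0, pattern=0.5)
right = virtual_mic(w, x, y, -45.0, pattern=0.5)
print(left, right)
```

Because the steering happens in the decode, the same four recorded channels can later feed stereo, 5.1, binaural or full 3D renders - which is exactly why it suits immersive capture.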

The third level might be where the broadcaster contributes to the “scenery” within the metaverse - or maybe some other aspect of virtual life, like making it appear you can fly. It might be helpful here to think of the metaverse - or your current experience of it - as effectively an immersive version of a web browser, although pretty much unrecognizable as such experientially. Just as with browsers, broadcasters might be able to upload “media players”, where the “player” includes not just content but the entire environment. Imagine sports coverage where you not only see the gameplay, but find yourself sitting next to the game commentators and pundits, inside their virtual studio.

If all of this sounds ludicrously far away, given that current broadcast standards take so long to design and implement, remember that the web allowed us to discard many standards, because almost everything became software-defined. So, if you compare YouTube with TV broadcasts, the internet version has flexibility over frame rate, aspect ratio, compression codecs and a host of other things - because it’s all just software running inside a browser.

The final part of this complex scenario is content generation. Until 2022, the biggest blocker for the metaverse was the time it took to create 3D assets - objects and scenery, for example. Ironically, the reason this may no longer be a problem is the same reason the metaverse has largely disappeared from tech pundits’ agendas in the last couple of years. It’s AI. In 2022, generative AI suddenly became good enough to grab the attention of the news media. Progress in AI was so extreme - and still is - that it obliterated virtually all discussion about the metaverse.

But it’s not a question of either the metaverse or AI: far from it. It is at least arguable that the metaverse is already with us. The metaverse is not something that we spend decades building, and then, with a grand opening ceremony, we finally turn it on. It will be a gradual process that has already started. The metaverse is with us in all the devices we use, in their connectivity and their myriad displays, large and small. It’s in digital signage, it’s in outdoor digital advertising, it’s in apps, and it’s in our cars’ digital dashboards. You can think of these as “slices” of the metaverse pie. Eventually, these slices will join up, and we will have a complete, contiguous metaverse.

But what about the problem of making all those assets? There’s an AI for that. Generative AI is an obvious answer to the laborious process of building virtual worlds and populating them. Progress in AI-generated video has been breathtaking and has surprised even experts with its rapidly growing virtuosity. Some text-to-video models seem to “understand” the physical world, building their own “world model” against which to compare the credibility of their outputs. Other models can create a virtual world (albeit currently at a low-ish resolution) in real time.

It will take a while to comprehend and assimilate all of these possibilities. There’s no need for broadcasters to embrace them all at once. There will be applications and niches for immersive broadcasting, but, right now, most of them are unknown and unknowable until the technology and the ecosystems are in place. Meanwhile, be open-minded, and even if you don’t know exactly what is on the horizon, you can at least try to be looking in the right direction when it arrives.
