Great Things Happen When We Learn To Work Together

Why doesn’t everything “just work together”? And how much better would it be if it did? This is an in-depth look at the issues around why production and broadcast systems typically don’t work together and how we can change that. If we do, there are untold benefits.

For decades, we’ve lived through rapid, continuous change in production, post-production and distribution, so much so that we may have become normalized to it - or certain aspects of it. We take it for granted that speeds and storage capacity will increase. We assume that we will be able to do more with less - and we're mostly right about that. Scaling (doing the same thing but faster and bigger) solves some problems and lets us work in new and dramatically better ways, but it doesn't automatically create the mindsets, expertise and raw innovation that we need to take advantage of newer, more bountiful playing fields.

Storage access and transfer speeds have grown a thousand-fold, as have storage capacities. Bottlenecks have been cleared, thresholds crossed, and we've seen extraordinary innovation. If you've not been paying attention (and non-technical people often don't know how to pay attention), you can miss orders of magnitude of difference. People have even asked, "You know those things called 'gigabytes'? Why are they called 'terabytes' now?" as if it were a rebranding exercise and not a change of three orders of magnitude.

But it's not always clear that production and broadcast technology are converging into a production platform greater than the sum of its parts. Some quite brilliant products do a fantastic job individually. But do they all work together for a common good? It's far from obvious that they do. Why is this?

On the face of it, my 100 terabytes is the same as your 100 terabytes. Beyond that, there's a world of complexity. The reason is, to use a technical phrase, dimensionality. The problem is easy to understand but harder to solve. When the only dimensions were speed and capacity, there was little to consider when building a system beyond whether it could serve you (or your NLE) video and files fast enough. But now we have extra dimensions like remote working, collaborative working, the cloud, security, metadata and concurrency. And then we have color pipelines, ever-higher and not always standard resolutions, multi-channel, multi-lingual sound, vertical and horizontal video, multiple codecs and a wider variety than ever of viewing devices (including Apple's Vision Pro).

On top of all of that, we have creative and artistic matters, not to mention commercial and operational considerations.

The cloud has proven to be a reliable, hugely beneficial addition to our production repertoire. But it is only part of the journey - and we're not entirely sure what the ultimate destination is. With that in mind, it's imperative to maximize flexibility in any system.

With careful planning and strategic integration, it's perfectly possible to set up entire workflows, sometimes with as few as two participating products. But the question is: Why can't we use any product with any other product? Why can't we use any NLE with any video review system, with any storage, and with any media asset management system?

We can certainly come close. We already have "glue" tools like metadata and XML, but using them usually requires bespoke development, which is the opposite of "plug and play". Why can't things just work nicely together?
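
To make that "bespoke development" point concrete, here is a minimal sketch of the kind of glue code a facility might write today to pass clip metadata from one product to another via a sidecar XML file. It's written in Python, and the element layout (clip, timecode, codec) is invented for illustration rather than taken from any real interchange schema.

```python
# A minimal sketch of bespoke "glue": write a sidecar XML that one product
# exports and another ingests. The element names are hypothetical, not any
# vendor's actual schema.
import xml.etree.ElementTree as ET

def write_sidecar(path, clip_name, start_tc, codec):
    root = ET.Element("clip", attrib={"name": clip_name})
    ET.SubElement(root, "timecode").text = start_tc
    ET.SubElement(root, "codec").text = codec
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)

def read_sidecar(path):
    root = ET.parse(path).getroot()
    return {
        "name": root.get("name"),
        "timecode": root.findtext("timecode"),
        "codec": root.findtext("codec"),
    }

if __name__ == "__main__":
    write_sidecar("A001_C002.xml", "A001_C002", "01:02:03:04", "ProRes 422 HQ")
    print(read_sidecar("A001_C002.xml"))
```

It works, but every pairing of products needs its own version of this script - which is exactly the opposite of things "just working" together.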

It's not that they can't; it's that they typically don't. This is not a failure at a technical level; it's more of a case of differing visions.

If you were to try to build an all-in-one system from scratch, you'd pour your own experience and that of your team into a design that would probably work pretty well but would require you to work in a certain way. You might not even notice that restriction because it's the way you want to work. Take five or ten "equivalent" systems, and they'd all work the way their designers wanted them to. This means there's only a small chance they'll work with other systems. But does that matter if they do everything? Yes, it does, because people don't all want to work the same way for many reasons - sometimes personal and often depending on the business structure and type of content it produces.

And surprisingly, people's definitions of "everything" will rarely be the same.

Of course, standards bodies try to coordinate content transport and exchange, but it doesn't matter how tightly they define things; new developments and techniques never miss an opportunity to create incompatibilities.

Standards (like HTML, SDI, MIDI, IP, etc.) are a boon to innovation. Far from being static, ossified stipulations, they provide a stable, easily understood foundation for development. The future needs standards, too, and they are currently sorely lacking in the field of AI. That's understandable because AI is like the pressure wave from an explosion, and you don't necessarily adhere to standards when you're diving behind a rock for survival. That's not an argument against standards but a recognition that it's not easy to set them up.

So why doesn't everything "just work" together?

It's partly because of competition. It used to be a goal for manufacturers to "own" the entire workflow, preferably by building a "walled garden". That would ensure that everything within the boundaries works together, but it also sets up a dynamic where it's harder to work with diverse workflow elements and other users unless they "buy in" to the brand. And, commercially, why allow competitors to easily park on your lawn? If the workflow needs storage, surely it's best if they use "our" storage and not someone else's.

A couple of decades ago, this approach had merit. In the early days, Non-Linear Editing (NLE) pushed computers to their limits and often beyond, even to provide basic functionality. Each speed increase meant an additional layer of effects or better quality real-time offlining. Storage interconnects were typically parallel SCSI cables as thick as a hosepipe. Storage itself was eye-wateringly expensive. The only way to guarantee it would work at all was to buy it from the same manufacturer as your NLE or other post-production software.

Today, even consumer-level laptops have blisteringly fast interconnections like Thunderbolt, and the industry is moving towards "COTS" - "Commercial Off The Shelf" - hardware resources. And yet, specialist gear still has its place. In a phenomenon a bit like the "Overton Window" in politics, what constitutes "normal" in production can shift upwards and quickly come to feel normal again. Twenty years ago, you'd need a mixture of specialist (and typically expensive) equipment and desktop PCs and Macs to edit and finish video on a computer. Now you can achieve almost anything, with video at perhaps eighty times the resolution (SD vs 8K), still with a mixture of off-the-shelf computing and specialist storage devices. What's changed is several orders of magnitude of speed and capacity, plus an added dimension of networking, of which the ultimate expression is the cloud.

Meanwhile, on-set networked storage has also become sharable, with devices like a laptop-sized storage unit packed with up to 48TB of blisteringly fast NVMe storage and eight Thunderbolt ports. It allows local post-production to begin immediately on full-resolution files, with multiple concurrent users working on different tasks. Shared storage has existed for a long time, but rarely at this speed or connectivity.

The raw cloud is not end-user-friendly. It needs a front end that can make it seem like you're plugging in an external hard drive or like your media is always there beside you, wherever you are.

But the cloud doesn't have SDI inputs. Although the cloud increasingly surrounds us, we still have to connect our cameras to it for camera-to-cloud workflows. IP Video in the form of SMPTE 2110 is now available on many broadcast cameras. Meanwhile, products like connected monitor recorders can create H.265 proxies on-set and - even during a shoot - from any professional camera and automatically send the footage to post-production for work to start immediately. With 10-bit, 4K proxies, there's enough resolution and color information to get much of the work done before the original camera files arrive. Sometimes you can even deliver the proxy format to end users without transcoding. This would allow you to post events to social media, branded and packaged, within seconds of real-time.
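
As a rough illustration of the proxy step described above, the sketch below transcodes an original camera file into a 10-bit 4K H.265 (HEVC) proxy using ffmpeg, ready to be pushed to whichever camera-to-cloud or review service the production uses. The ffmpeg/libx265 options are standard ones; the filenames and the upload destination are placeholders, since those depend entirely on the products in the chain.

```python
# Sketch only: create a 10-bit 4K H.265 proxy from an original camera file.
# The upload step is left as a placeholder - it varies by service.
import subprocess
from pathlib import Path

def make_proxy(source: Path, proxy_dir: Path) -> Path:
    proxy_dir.mkdir(parents=True, exist_ok=True)
    proxy = proxy_dir / (source.stem + "_proxy.mp4")
    subprocess.run(
        [
            "ffmpeg", "-y",
            "-i", str(source),
            "-vf", "scale=3840:-2",     # UHD-width proxy, preserve aspect ratio
            "-c:v", "libx265",
            "-pix_fmt", "yuv420p10le",  # 10-bit to keep enough color information
            "-crf", "23",
            "-c:a", "aac",
            str(proxy),
        ],
        check=True,
    )
    return proxy

if __name__ == "__main__":
    proxy_file = make_proxy(Path("A001_C002.mov"), Path("proxies"))
    # Hand off to the production's camera-to-cloud / review service here.
    print(f"Proxy ready for upload: {proxy_file}")
```

Connected monitor recorders do effectively this in hardware, automatically and during the shoot, which is what makes the "seconds of real-time" turnaround possible.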

So, what needs to change for everything to "just work" together - or at least to move us in that direction, given that it won't happen overnight?

Within the cloud, this is already starting to fall into place. The cloud is like a completely new territory, where everyone can look at past mistakes and legislate for a better place to be. Some early signs are promising. Vendors and software developers are starting to build open APIs into their products so that other parties can easily integrate with them, and these APIs are becoming increasingly granular - so you don't have to connect to a huge, unwieldy chunk of functionality (like an entire compositing package) when all you need is something basic. Some vendors even go as far as supplying "microservices", tiny chunks of code with only one function. Each microservice has its own API, which means users can integrate almost natively with cloud services, picking and choosing functions as if at a sweet shop. We see examples of enterprise-scale transcoding engines that can hook into any workflow via APIs.
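
The "pick and choose" idea looks something like the sketch below: calling a single-purpose transcode service over HTTP rather than integrating with a monolithic application. The endpoint URL, payload fields and status values here are hypothetical - every real vendor's API differs - but the overall shape (submit a job, poll for completion) is typical.

```python
# Sketch of calling a granular, single-purpose transcode microservice.
# The endpoint and JSON fields are invented for illustration.
import time
import requests

TRANSCODE_API = "https://transcode.example.com/v1/jobs"  # hypothetical service

def submit_job(source_url: str, preset: str) -> str:
    resp = requests.post(TRANSCODE_API, json={"source": source_url, "preset": preset})
    resp.raise_for_status()
    return resp.json()["job_id"]

def wait_for_job(job_id: str, poll_seconds: int = 10) -> dict:
    while True:
        resp = requests.get(f"{TRANSCODE_API}/{job_id}")
        resp.raise_for_status()
        job = resp.json()
        if job["status"] in ("complete", "failed"):
            return job
        time.sleep(poll_seconds)

if __name__ == "__main__":
    job_id = submit_job("s3://bucket/A001_C002.mov", "h264-1080p-proxy")
    print(wait_for_job(job_id))
```

Because the service does one thing and exposes it through a small, well-documented interface, it can slot into almost any workflow without bespoke integration work.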

Cloud integration is an enormous help, but there is an even bigger picture, which is - everything. With AI now permanently "in the room", the rate of change is accelerating even further, so much so that it is very hard to know where the industry will be in six months, never mind six years. This largely explains why products that try to do everything never converge on the same set of ideas. How do you deal with this? The one thing you can't do is just wait and hope it will all blow over.

One way - perhaps the only way - is to build an entirely new technology stack that is open and expansive enough to incorporate virtually any possible change. The lower layers would be physical: existing hardware, cables and interconnects, and control surfaces (for color grading, for example). The middle layers would be software, and the top layer would be "cognitive", meaning it would understand instructions and be able to carry them out in a way that achieves the original artistic (or commercial) intent. This structure could accommodate any existing workflow methodology but would contain layers that translate between the human input at the top and the software/hardware combination underneath. AI "agents" would respond to requests like "Optimize our data path for this production primarily for speed and secondarily for cost". Crucially, such a stack would be able to adapt to content from any source (including generative AI where appropriate) and would be flexible enough to cope with almost any new development while being minimally disruptive to the other parts of the workflow.
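
As a toy illustration of that cognitive layer, the sketch below turns a high-level intent ("primarily for speed and secondarily for cost") into an ordered set of objectives and uses them to choose between concrete data paths. The candidate paths and their scores are entirely invented; a real agent would discover its options from the software and hardware layers beneath it.

```python
# Toy illustration of a "cognitive" layer translating intent into a choice.
# Candidate data paths and their scores are invented for the example.
from dataclasses import dataclass

@dataclass
class DataPath:
    name: str
    speed: float  # higher is faster
    cost: float   # lower is cheaper

CANDIDATES = [
    DataPath("on-prem NVMe + LAN", speed=9.5, cost=7.0),
    DataPath("cloud object storage + CDN", speed=7.0, cost=4.0),
    DataPath("hybrid: local cache, cloud master", speed=8.5, cost=5.5),
]

def optimise(paths, primary: str, secondary: str) -> DataPath:
    # Primary objective dominates; the secondary breaks ties (cost is minimised).
    def key(p):
        value = lambda attr: getattr(p, attr) if attr == "speed" else -getattr(p, attr)
        return (value(primary), value(secondary))
    return max(paths, key=key)

# "Optimize our data path for this production primarily for speed
#  and secondarily for cost."
print(optimise(CANDIDATES, primary="speed", secondary="cost"))
```

The point is not the toy logic but the separation of concerns: the human states intent, the cognitive layer ranks options, and the layers below carry out whatever is chosen.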

If we can agree on standards and accept that no single organization has a monopoly on innovation, we might find that manufacturers that nominally compete with each other can do more by working together than by erecting barriers between themselves. Look at how space-based services are now thriving because of SpaceX's orders-of-magnitude cheaper orbital launch services. Look at how OpenAI's products are now essential parts of Microsoft's AI repertoire.

Working together in a spirit of cooperation might seem like a hopelessly utopian dream. But it will probably be the only way we can navigate the torrent of technological innovation that we’re likely to see in the next few months and years.
