Industry Embraces AI For Streamlined Clip Search Across Petabytes (And Soon Zettabytes) Of Storage
AI technology brings the promise of simplifying and enhancing media content generation and repurposing.
In today’s highly competitive media environment, companies are always looking for ways to streamline their operations and speed up the processes involved in content creation. Among the most critical are post-production workflows and the need to find audio and video material stored in ever-larger repositories.
The datasphere is growing exponentially, and the media and advertising industries are drowning in the petabytes of data generated every year. Some experts have suggested using artificial intelligence (AI) to analyze this data. However, searches have traditionally been completed sequentially, one query at a time, and as the amount of data increases, even this becomes a laborious task.
When people hear the term AI, images of uncontrollable robots taking over the Earth may come to mind. Even so, the technology has made a positive impact on many parts of our lives, from product suggestions on Amazon to music-listening software that can accurately name a song that’s playing. The continuing advancement of AI has brought improvements to many everyday tasks.
AI comes to video
These same benefits are now coming to the video production industry, bringing faster audio and clip searches across petabytes of data. Searches can use face detection, object recognition, voice-to-text transcription, optical character recognition and other attributes, all powered by parallel processing and specialized algorithms originally designed for other industries (such as security and the military). Once content is cognitively processed and indexed, finding an image across vast libraries takes seconds.
AI solutions bring the promise of simplifying and enhancing everything from systems to processes, in turn cutting the costs of production, speeding up content syndication and significantly reducing the amount of man hours required for some of the most labor-intensive tasks.
Drew Hilles, senior vice president of Veritone.
Among the leaders in the media & entertainment space is a Costa Mesa, Calif.-based company called Veritone, which has developed a suite of cloud-based software tools it calls aiWARE, built on a proprietary machine-learning orchestration layer called "Conductor." Serving as a search engine and aggregator, the software not only employs multiple AI engines at once, but also chooses the best available engine or engines from providers spread across the globe.
For example, with natural language processing, aiWARE can predict the accuracy of each transcription engine based on the characteristics of the media being processed. Conductor then automatically selects the best engine to process that file. The newest version of Conductor under development can identify the best engine for each portion of a file, applying multiple engines when needed to fill accuracy gaps.
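The routing idea described above can be sketched in a few lines. This is a toy illustration only: the engine names, scoring heuristic and media features are assumptions for the example, not Veritone's actual Conductor API.

```python
# Hypothetical Conductor-style engine routing: predict each engine's accuracy
# from the media's characteristics, then pick the best one for the file.

def predict_accuracy(engine, media):
    """Score an engine against media characteristics (toy heuristic)."""
    score = engine["base_accuracy"]
    for feature in media["features"]:
        score += engine["bonuses"].get(feature, 0.0)
    return score

def select_engine(engines, media):
    """Pick the engine with the highest predicted accuracy for this file."""
    return max(engines, key=lambda e: predict_accuracy(e, media))

# Illustrative engines: each performs better on certain kinds of audio.
engines = [
    {"name": "engine_a", "base_accuracy": 0.72, "bonuses": {"telephony": 0.05}},
    {"name": "engine_b", "base_accuracy": 0.70, "bonuses": {"broadcast": 0.08}},
]
media = {"features": ["broadcast"]}
best = select_engine(engines, media)
print(best["name"])  # engine_b
```

In a real system the accuracy prediction would come from a trained model rather than a lookup table, but the selection step is the same: score every registered engine against the file, then route.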
AI is at work in familiar places
“AI, as most people know it, is actually artificial narrow intelligence [ANI], which represents a class of AI technology designed to perform a specific task,” said Drew Hilles, senior vice president of Veritone. “This narrow approach is expected to dominate the AI market in the coming years.”
Many people regularly use this type of technology without even realizing it. Familiar examples include Google Translate and geolocation searches via Apple Maps. Hilles said these commonly used resources involve a type of machine learning to complete a single-problem task like “translate this.”
“This type of technology is called a cognitive engine, and translation is just one of many possibilities,” he said. Veritone calls its version Conductor.
AI gets smarter
Conductor works by selecting and layering the best engines to gain insight from audio and video data, based on speed, cost and performance. The best single NLP (natural language processing) engine available today delivers 75 percent accuracy, while Conductor achieves approximately 82 percent, according to Veritone.
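One way such layering can beat any single engine is to re-run low-confidence segments through an alternate engine. The sketch below is a hedged illustration of that idea; the engines, confidence values and threshold are all invented for the example.

```python
# Illustrative "layering": use a primary engine, then re-run any segment it
# transcribed with low confidence through a fallback engine, keeping the
# better result. Engines here are stubs returning (text, confidence).

def transcribe_with_layering(segments, primary, fallback, threshold=0.8):
    """Fill the primary engine's accuracy gaps with a fallback engine."""
    out = []
    for seg in segments:
        text, conf = primary(seg)
        if conf < threshold:
            alt_text, alt_conf = fallback(seg)
            if alt_conf > conf:
                text, conf = alt_text, alt_conf
        out.append((text, conf))
    return out

# Toy engines: the primary struggles with segments containing music.
def primary(seg):
    return (seg.upper(), 0.6 if "music" in seg else 0.9)

def fallback(seg):
    return (seg.title(), 0.85)

result = transcribe_with_layering(["clean speech", "speech over music"],
                                  primary, fallback)
```

Averaged over a whole library, patching only the weak segments is how a composite can exceed the accuracy of the best individual engine.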
In addition, Conductor can be used to analyze problematic files that typically produce low accuracy rates. Examples could include media containing music or multiple voices and clips with high background noise.
“The technology has the ability to ‘tune’ audio media via programmatic pre-processing, effectively operating as an extraordinarily fast and seasoned audio engineer,” Hilles said.
Although Veritone targets its AI technology at multiple industries, media & entertainment is currently a major focus. The company is working with several media transcoding services (Amazon Web Services, Brightcove, Cloudian, and Ooyala) and storage vendors (IBM, Dell EMC, Quantum) to bring the technology into large broadcast organizations. Major sports leagues, teams, and sports broadcasters like CBS, FOX Sports and iHeart Radio use Veritone to maximize their audience reach, improve fan affinity and, ultimately, grow sponsorship revenue.
At this year's IBC Show in Amsterdam, Veritone and Quantum jointly demonstrated a new solution, “aiWARE for Xcellis,” that leverages Quantum’s StorNext file system and its range of Xcellis storage solutions (cloud, LTO tape, SSD and HDD spinning disk). StorNext serves as a database for a storage library that can be searched quickly, although the larger the library, the slower the search. The combination will enable users to apply AI to on-premises stored content that previously could not be leveraged for this purpose, and to add new content for analysis as the data is captured.
Quantum said the type of storage used with its Xcellis platforms does not necessarily determine search performance, although it is a bit quicker to find material stored on solid-state flash media.
“Clip searches happen in near real time, but if you have petabytes of information, it’s still going to take quite some time to find an individual clip,” said Keith Lissak, senior director of Quantum’s M&E Solutions Marketing team.
The two companies have entered into a strategic relationship under which Veritone’s aiWARE (a hybrid on-premises and cloud version of Veritone's cloud-based AI platform) will be offered as an integrated solution with Quantum’s StorNext workflow storage for on-premises installations, running on a 1RU COTS server.
“We’re bringing Veritone’s cloud solution on-premise,” Lissak said. “The real benefit is that it automates the process of adding metadata to any piece of content. Your metadata is only as good as the person inputting it at ingest. So this automates it and can be much faster and more accurate. It gives you insight into what’s on that tape or in that file very quickly, which facilitates a fast search.”
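The automated-tagging workflow Lissak describes can be pictured as a simple two-step pipeline: record time-coded tags from the engines at ingest, then search that index instead of the raw media. The sketch below is a minimal illustration; the asset names, tags and detection format are invented, and a real system would persist the index in a database rather than in memory.

```python
# Minimal sketch of automated metadata tagging at ingest: store time-coded
# engine detections in an inverted index, then search by tag.

from collections import defaultdict

index = defaultdict(list)  # tag -> [(asset_id, timestamp_seconds)]

def ingest(asset_id, detections):
    """Record engine detections (tag, timestamp) as searchable metadata."""
    for tag, ts in detections:
        index[tag].append((asset_id, ts))

def search(tag):
    """Return every (asset, timestamp) where the tag was detected."""
    return index.get(tag, [])

# Hypothetical detections produced by face/logo engines at ingest time.
ingest("game_0417.mxf", [("logo:acme", 12.5), ("face:coach", 98.0)])
ingest("promo_002.mov", [("logo:acme", 3.0)])

hits = search("logo:acme")
```

Because the index is built automatically as files are ingested, search quality no longer depends on whatever metadata a person happened to type in by hand.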
Transcoding platforms like those from Amazon Web Services help streamline file-based workflows in the media industry.
Benefits of the partnership include allowing users to create more customized video-on-demand programs through richer metadata tagging of existing content, and to provide better ROI data to sponsors by identifying where and how often a logo appeared in highlight clips beyond the original broadcast. Finally, this approach speeds postproduction work through enhanced context-based search and discovery.
“Artificial intelligence offers the opportunity to extract dramatically more value from data in ways that were previously impossible, and even unimaginable in some cases,” said Lissak.
Find and repurpose content
With so much data being generated on a daily basis, both large and small organizations have to find new ways of making their operations more efficient and faster so that they can distribute ever more content to more platforms.
Veritone said broadcasters of all sizes can increase their revenue by “dimensionalizing” previously locked or hidden data. Think of a video file as linear: you can see the picture and hear the audio. This process “dimensionalizes” the linear file by adding layers of additional context and analysis with the help of various engines. For example, AI can determine who is speaking, transcribe the audio with a transcription engine, and run object and logo detection as separate processes, then combine engines as needed, resulting in more data and faster throughput.
Industry analyst firm IDC forecasts that the amount of image and video content created or consumed for entertainment purposes will exceed 40 zettabytes by 2025. That’s approximately four times the amount in 2016. As media and entertainment organizations look to maximize the value of this growing content, AI will play an increasingly important role. Cognitive engines enable users to delve deeper and faster into their content and leverage the results to drive greater business and organizational success.