Viewpoint: Artificial Intelligence for Supercharging Media Asset Management
Artificial Intelligence (AI) is an increasingly useful tool in media asset management. Image: Jeremy Bishop, Unsplash
Artificial intelligence (AI) for media analysis is beginning to change how media organizations meet their most daunting media asset management (MAM) challenges. When paired with leading-edge MAM tools, a new and emerging breed of AI platforms offers tremendous potential for transforming media workflows and making it easier than ever for operations to access, manage, and archive large volumes of content.
New AI solutions promise automated logging and tagging of content, with capabilities such as speech-to-text, language translation, and the ability to tag content based on people, places, things, and even sentiment.
These capabilities couldn’t come at a better time. In today’s typical media operation, new content is being generated at a breathtaking pace, and production teams are simply unable to keep up with content management tasks unless they have specialized tools. In addition, time is running out to digitize historical content that exists in legacy analog formats (there is still a lot of tape about) before the original media degrades. It’s essential that these assets be logged and tagged so they can be found easily, but too often such tasks fall by the wayside as too expensive or time-consuming.
Connecting the Dots Between AI and MAM
As AI technologies continue to mature, strong MAM capabilities will become even more essential – the best tools can index, search, and easily correct a huge volume of time-based metadata whilst managing complex online, offline and cloud storage landscapes. However, there’s plenty of room for improvement with the current generation of MAM tools, especially from a metadata perspective. Until now, content teams have been left with few options beyond pulling technical metadata from media files or streams, extracting the meaning from file and folder names, or manual logging.
Image: The Kansas City Chiefs football organization uses Square Box Systems CatDV for its fan-based communications and marketing, creating a range of content for both TV and web broadcasts.
An AI-powered MAM solution offers a way forward. A great approach is to extend the MAM’s existing logging, tagging, and search functions through integrations with best-of-breed AI platforms and cognitive engines such as those from Google, Microsoft, Amazon, and IBM, as well as a host of smaller, niche providers. These vendors, along with AI aggregators, enable the MAM to apply AI analysis tools for speech recognition and video/image analysis, with the flexibility to be deployed either in the cloud or in hybrid on-premises/cloud environments.
Here are a few examples of AI-driven MAM capabilities:
- Speech-to-text, to automatically create transcripts and time-based metadata
- Language translation
- Place analysis, including identification of buildings and locations where GPS was not available
- Object and scene detection (e.g. daytime shots or shots of specific animals)
- Sentiment analysis, for finding and retrieving all content that expresses a certain emotion or sentiment (e.g. “find me celebrations” in a sports event)
- Logo detection, to identify when certain brands appear in shots
- Text recognition, to extract on-screen text from video
- People recognition, for identifying people, including executives and celebrities
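To make the idea of time-based metadata concrete, here is a minimal sketch of how the output of such engines might be stored and searched. The `Marker` fields, the `find_markers` helper, and the sample values are all hypothetical; they do not represent the data model of any particular MAM or AI platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Marker:
    """One piece of time-based metadata on a clip (field names are illustrative)."""
    start_s: float     # start of the span, in seconds from the start of the clip
    end_s: float       # end of the span, in seconds
    kind: str          # e.g. "speech", "logo", "person", "sentiment"
    value: str         # transcript text or detected label
    confidence: float  # engine-reported score between 0 and 1


def find_markers(markers, kind, text, min_confidence=0.5):
    """Return markers of one kind whose value contains `text`, earliest first."""
    needle = text.lower()
    hits = [
        m for m in markers
        if m.kind == kind
        and needle in m.value.lower()
        and m.confidence >= min_confidence
    ]
    return sorted(hits, key=lambda m: m.start_s)


# Invented sample data for a single clip.
markers = [
    Marker(12.0, 15.5, "speech", "Welcome back to the stadium", 0.93),
    Marker(40.0, 44.0, "sentiment", "celebration", 0.81),
    Marker(41.0, 43.0, "logo", "ExampleBrand", 0.42),
    Marker(5.0, 9.0, "sentiment", "celebration", 0.67),
]

celebrations = find_markers(markers, "sentiment", "celebration")
```

Here `celebrations` holds the two sentiment markers in timeline order, while the low-confidence logo hit would be filtered out of a logo search unless `min_confidence` were lowered.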
AI: Still Coming of Age
Media companies are looking forward to the day when they can rely on AI capabilities to help them not only gain traction on digitization projects, but also get better control over digital content in their existing libraries. But AI technologies still have some maturing to do before they can deliver on these benefits and truly become mainstream.
Accuracy is of particular concern. AI analysis is improving every day, especially with speech-to-text solutions, but there’s still plenty of room for fine-tuning. Some engines might not be able to distinguish between U.K. and American English, and they will probably trip over abbreviations and jargon. The industry is therefore focused on training AI engines to recognize these language variations and correct mistakes. The sophistication of AI tools also varies considerably when it comes to image or video analysis.
Taking the AI-MAM Plunge
So how do you pick the AI tool that’s best for your requirements, and understand the cost for each style of analysis? Some vendors might have different price tiers based on the format of the content; 4K assets might cost more to analyse. At the same time, a lack of transparency in cost structures across the AI industry can make it difficult to work out the total expense of applying AI to a media library. AI aggregators can help customers sidestep some of the costs and complexities of setup by making it easier to choose the right AI engine for a specific task, albeit at a higher overall price.
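A rough way to get a handle on total expense is to multiply library duration by per-minute rates for each type of analysis and content format. The sketch below assumes a simple per-minute pricing model; the rates, analysis names, and format tiers are invented placeholders, not any vendor's actual pricing.

```python
# Hypothetical per-minute rates in USD, keyed by (analysis, format).
# Real vendor pricing differs, changes over time, and may not be per-minute.
RATES_PER_MINUTE = {
    ("speech_to_text", "hd"): 0.02,
    ("speech_to_text", "4k"): 0.02,
    ("object_detection", "hd"): 0.10,
    ("object_detection", "4k"): 0.15,
}


def estimate_cost(assets, analyses, rates=RATES_PER_MINUTE):
    """Return a dict of estimated cost per analysis type for a list of assets."""
    breakdown = {name: 0.0 for name in analyses}
    for asset in assets:
        for name in analyses:
            rate = rates[(name, asset["format"])]
            breakdown[name] += asset["minutes"] * rate
    return breakdown


# Invented sample library: 90 minutes of HD and 30 minutes of 4K footage.
library = [
    {"minutes": 90, "format": "hd"},
    {"minutes": 30, "format": "4k"},
]

breakdown = estimate_cost(library, ["speech_to_text", "object_detection"])
total = sum(breakdown.values())
```

Keeping the breakdown per analysis type makes it easier to see which style of analysis dominates the bill before committing a whole archive to it.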
Once you’ve chosen your vendor, vendors or aggregator, your next challenge is to get your content uploaded to the AI engine, which is often in the cloud. That’s more complicated than it might seem – some of the steps include creating a video proxy, separating the audio files, creating an image sequence, and making sure the content is in the right format for the AI engine to understand. Once the content is uploaded, the tasks are centered around managing its lifecycle.
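As an illustration of the preparation steps, the sketch below builds (but does not run) three FFmpeg command lines: one for a low-resolution proxy, one for a mono audio file, and one for an image sequence. FFmpeg is an assumption here, as are the output naming, the 540-line proxy height, the 16 kHz sample rate, and the one-frame-per-second sampling; the formats a given AI engine actually accepts should be taken from its documentation.

```python
from pathlib import Path


def preparation_commands(source, out_dir, proxy_height=540, frames_per_second=1):
    """Build FFmpeg argument lists for proxy, audio, and image-sequence outputs."""
    src = Path(source)
    out = Path(out_dir)
    stem = src.stem

    # Low-resolution H.264 proxy; scale=-2 keeps the aspect ratio with an even width.
    proxy = [
        "ffmpeg", "-i", str(src),
        "-vf", f"scale=-2:{proxy_height}",
        "-c:v", "libx264", "-crf", "28",
        "-c:a", "aac",
        str(out / f"{stem}_proxy.mp4"),
    ]

    # Audio only (-vn drops video), mixed down to mono at 16 kHz.
    audio = [
        "ffmpeg", "-i", str(src),
        "-vn", "-ac", "1", "-ar", "16000",
        str(out / f"{stem}.wav"),
    ]

    # Image sequence sampled at a fixed rate, written as numbered JPEG files.
    frames = [
        "ffmpeg", "-i", str(src),
        "-vf", f"fps={frames_per_second}",
        str(out / f"{stem}_%06d.jpg"),
    ]

    return {"proxy": proxy, "audio": audio, "frames": frames}


commands = preparation_commands("match.mov", "work")
```

Each list can be passed to `subprocess.run(cmd, check=True)` on a machine with FFmpeg installed and the output directory already created. Returning argument lists rather than shell strings avoids quoting problems with file names that contain spaces.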
Another challenge – actually the good news and the bad news – is the rapid pace at which AI technology is advancing. As the tools improve, any AI analysis performed on your content today might need to be repeated in the future to take advantage of new capabilities. This could result in multiple refreshed data sets, with additional layers of complexity if the content has been corrected or updated by a human after the original analysis. And you should not discount security concerns, especially with cloud providers.
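One way to handle repeated analysis is to record where each tag came from, so that a fresh AI pass replaces older AI output without overwriting human corrections. The sketch below shows one possible merge policy; the `Tag` fields, the version strings, and the rule of matching tags by exact start time are simplifying assumptions, and a real system would more likely match overlapping time spans.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    start_s: float       # position on the clip timeline, in seconds
    label: str           # detected or corrected value
    source: str          # "ai" or "human"
    engine_version: str  # hypothetical version string; empty for human tags


def merge_reanalysis(existing, fresh):
    """Keep all human tags, drop old AI tags, and add fresh AI tags
    except where a human has already corrected that moment."""
    human = [t for t in existing if t.source == "human"]
    human_starts = {t.start_s for t in human}
    kept_fresh = [t for t in fresh if t.start_s not in human_starts]
    return sorted(human + kept_fresh, key=lambda t: t.start_s)


# Invented sample data: a first AI pass with one human correction...
existing = [
    Tag(10.0, "Jon Smith", "ai", "v1"),
    Tag(20.0, "John Smyth", "human", ""),
]
# ...and a later pass from an improved engine.
fresh = [
    Tag(10.0, "John Smith", "ai", "v2"),
    Tag(20.0, "Jon Smith", "ai", "v2"),
    Tag(30.0, "stadium", "ai", "v2"),
]

merged = merge_reanalysis(existing, fresh)
```

In this example the improved engine's result replaces the old AI tag at 10 seconds, the human correction at 20 seconds survives, and the newly detected tag at 30 seconds is added.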
Looking Ahead
The future is bright for AI-powered MAM, and the capabilities listed above are only the beginning. MAM vendors can do their part by offering strong metadata organization with careful and configurable user interface design. This helps keep the system from overloading users with too much information. On the workflow and automation front, truly enterprise-worthy MAM systems will be able to feed the AI engines with the right data and automate the analysis, while keeping down costs.
Executed properly, the MAM can also play a powerful role in training and improving AI engines. For example, manually tagged content in the MAM could be used to identify the executives in a corporation. The MAM could use this manual tagging to train AI engines to do a better job of logging and tagging new content.
In summary, the media and entertainment industry is being transformed by AI. In the right hands, AI becomes the key that unlocks the next generation of MAM technologies.
But only the most powerful, flexible, easy-to-integrate, secure, and scalable MAM platforms will enable media operations to take maximum advantage of AI for unlocking the potential of digital assets and making them searchable, reusable, and monetizable.
Dave Clack is CEO of Square Box Systems.