Vendor Content.

AI-Assisted Dialogue Cleaning Technology Helping Broadcasters Support Hearing Impaired

One difficulty of recording speech outside a controlled production environment is unwanted background noise, especially when it is loud and intrusive, because it degrades the intelligibility of what is being said. There is also market demand for addressing hearing loss by offering an audio channel optimized for speech intelligibility during programming. Countries such as Germany have steadily encouraged TV broadcasters to include some form of speech enhancement technology with their programs.

This technology offers hearing-impaired viewers a secondary audio channel with enhanced dialogue and reduced background noise. Clearer audio also improves the output of automated closed-captioning systems. Both are made possible by AI-assisted “dialogue cleaning” software. Rather than relying on noise cancellation, it analyzes the audio content and lets users set the levels of the speech and the background noise individually. This reduces ambient noise (e.g., wind, music, traffic), makes the dialogue (the human voice) clearer, and improves the listening experience.

Commentators often work in noisy stadiums, and journalists often report from outdoor locations. A broadcaster can use this technology to lower the level of the ambient noise and lift the dialogue, making it clearer and easier to understand. The same technology could also be used to lower crowd noise during a live sporting event.

German public broadcasters now use this type of dialogue-cleaning technology in their main production facilities to deliver their stereo audio as one standard channel and a second, dialogue-enhanced channel. In the case of the LYNX Technik IDC 1411 Dialogue Cleaner, we use an AI algorithm to distinguish the dialogue from everything else in a program's audio in real time. This allows the viewer at home to choose the “clear voice” audio channel, in which the spoken word stands out more clearly from other sounds. The AI technology detects the voice automatically regardless of the surrounding noise and preserves the integrity of the dialogue. This differs from conventional noise reduction tools, which are often inadequate for this task.

In addition to meeting the voluntary requirements from the European Union regarding audio loudness and dialogue intelligibility, this module is well suited as a real-time pre-processing stage for automated closed captioning systems during live events. Pre-processing the audio makes it easier, and thus faster, for the automated captioning system to recognize and present the correct words.

In addition to improving dialogue clarity for live broadcasts (benefiting hearing-impaired audiences as well as subtitle generation), this tool is suitable for reducing background noise and enhancing dialogue in post-production. For example, in action scenes for movies or television series, which often have high levels of background noise, the dialogue can be enhanced to give it more clarity.

This dialogue cleaner, the yellobrik IDC 1411, offers three adjustments: one for the voice, one for the background noise, and a third that applies the AI algorithm. Unlike traditional noise reducers that learn and remove noise, the IDC 1411 separates the speech from the background and reduces the background interference without compromising the integrity of the dialogue.
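To illustrate the idea of leveling speech and background individually, here is a minimal Python sketch. It is not the IDC 1411's implementation: it assumes the AI separation step has already produced two sample lists (a speech stem and a background stem), and the function names `db_to_linear` and `remix` and the gain values are hypothetical, chosen only for this example.

```python
def db_to_linear(gain_db: float) -> float:
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


def remix(speech, background, speech_gain_db=0.0, background_gain_db=0.0):
    """Scale each separated stem by its own gain, then sum sample by sample.

    Assumption: `speech` and `background` are equal-length lists of float
    samples produced by some upstream separation step (not shown here).
    """
    if len(speech) != len(background):
        raise ValueError("stems must have the same number of samples")
    speech_factor = db_to_linear(speech_gain_db)
    background_factor = db_to_linear(background_gain_db)
    return [
        speech_factor * s + background_factor * b
        for s, b in zip(speech, background)
    ]


# Hypothetical settings: lift the voice by 3 dB, lower the background by 12 dB.
speech_stem = [0.10, 0.20, -0.15]
background_stem = [0.30, -0.25, 0.40]
clear_voice_mix = remix(speech_stem, background_stem,
                        speech_gain_db=3.0, background_gain_db=-12.0)
```

With both gains at 0 dB the function simply sums the stems again, which is why an approach of this kind can leave the dialogue untouched while only the background level changes.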

Our customers often install this device in their master control suites. Doing so lets operators keep the original audio channel and additionally embed the separately enhanced (cleaned) vocal stereo channel in the SDI feed. Once set up, it can operate automatically 24/7.

This workflow makes the original audio feed as well as the dialogue-cleaned channel available to viewers at home. Consumers can simply switch from the standard channel to a Clear Voice audio channel to hear this enhanced audio mix.

We partnered with Audionamix, leveraging its AI-based software plugin for dialogue recognition inside our FPGA-based IDC 1411 yellobrik. Instant Dialogue Cleaning (IDC) provides real-time processing, which customers told us they need so that their systems can process the dialogue as fast as possible. The incorporated deep neural networks (DNNs) address complex and common audio issues by separating and isolating the spoken word from challenging background interference.

The IDC 1411 yellobrik processes uncompressed SDI video formats via BNC or fiber, and AES-based audio via BNC. The SDI output can be routed to fiber or BNC and controlled via the LynxCentraal software.

IDC-1411 - AI based instant dialogue cleaner, filter and amplifier.

This combination of AI and FPGA-based processing allows broadcasters to produce speech-enhanced content in real time for live applications such as sports or news productions, or in post-production. The trained AI algorithm works with many languages, including English, French, German, and Spanish.

The IDC 1411 is a real-time device with a total delay of between two and four frames, depending on the frame rate of the originating video signal. A human could not do this manually within four frames; only machine learning can quickly and accurately analyze the audio material and adjust the speech and background noise individually. When connected to a control terminal via LynxCentraal or yelloGUI, the IDC 1411 offers additional audio processing stages: the IDC setting itself, two sequential equalizers, and a compressor. Each stage has its own gain setting.
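To put a delay of two to four frames into more familiar units, the short Python sketch below converts a frame count into milliseconds. The helper name `delay_ms` and the frame rates shown are illustrative assumptions; which frame count applies at which frame rate is not specified here, so the loop simply prints both ends of the range for each rate.

```python
def delay_ms(frames: int, fps: float) -> float:
    """Return the duration of `frames` video frames in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 * frames / fps


# Illustrative frame rates only; not a list of formats from the data sheet.
for fps in (25.0, 50.0, 59.94):
    low, high = delay_ms(2, fps), delay_ms(4, fps)
    print(f"{fps:6.2f} fps: {low:6.1f} ms (2 frames) to {high:6.1f} ms (4 frames)")
```

At 50 frames per second, for example, one frame lasts 20 ms, so two to four frames correspond to 40 to 80 ms.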

The device is installed at the end of the video chain, typically within the master control suite, right before playout. Signals come into the IDC module, and a stereo audio channel is de-embedded from the video prior to processing. Additional equalizers are then used to finish the audio track before it is re-embedded with the video. The video is delayed to match the processing time that the dialogue cleaning requires.
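The principle of delaying video to match the audio processing time can be sketched as a simple first-in, first-out frame buffer. The Python below is only a software illustration of that idea, not the device's FPGA logic; the class name `FrameDelay` and the use of integers as stand-ins for video frames are assumptions made for this example.

```python
from collections import deque


class FrameDelay:
    """Hold back video frames by a fixed number of frames (a FIFO delay line)."""

    def __init__(self, delay_frames: int):
        if delay_frames < 0:
            raise ValueError("delay_frames must not be negative")
        self._delay = delay_frames
        self._buffer = deque()

    def push(self, frame):
        """Store the incoming frame and return the frame that is now due.

        Returns None while the buffer is still filling.
        """
        self._buffer.append(frame)
        if len(self._buffer) > self._delay:
            return self._buffer.popleft()
        return None


# Hypothetical case: the audio path needs two frames of processing time,
# so each video frame is released two frames after it arrived.
video_delay = FrameDelay(delay_frames=2)
released = [video_delay.push(frame_number) for frame_number in range(5)]
print(released)  # [None, None, 0, 1, 2]
```

Each released frame can then be paired with the audio that finished processing at the same moment, which keeps picture and sound in sync when the audio is re-embedded.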

It supports 1.5G, 3G, and 12G/4K SDI video inputs, AES inputs, optional 3G/12G fiber SFPs, and automatic video delay matched to the audio processing time. It also includes settings for Speech Gain, Background Gain, Compressor, and more. Finally, all settings and routing can be applied via the LynxCentraal control software.

In the end, it’s all about helping broadcasters and AV professionals meet their obligations in an intelligent and cost-effective way. Customers are sending a great deal of positive feedback about how this tool addresses the constant challenge of cleaning noisy audio recordings. By managing the background noise in their audio and video workflows, customers can enhance the viewing and listening experience of their audience.
