FCC Expands Caption Mandate As Automated Processing Takes Center Stage

On October 27, 2020, the Federal Communications Commission issued an order expanding its captioning mandate for broadcasters to include audio description requirements in an additional 40 designated market areas (DMAs) over the next four years. The move builds on the Twenty-First Century Communications and Video Accessibility Act of 2010 (CVAA), which had directed stations in the top 60 DMAs to provide what it calls “described programming.”

This means broadcasters and program creators in the top 100 U.S. markets must now provide both closed captions and audio description or risk fines.

Audio description makes video programming more accessible by inserting narrated descriptions of a program’s key visual elements during natural pauses in the dialogue. The rule change ensures that more individuals who are blind or visually impaired can be connected, informed, and entertained by television programming.

The FCC’s expanded rules will ensure that audio description becomes more widely available across the broadcast and OTT markets, which is good news for everyone involved. Taking it a step further, the Commission has said that in 2023 it will determine whether to continue expanding audio description requirements to an additional 10 DMAs per year beyond the top 100.

Traditionally, captioning has been handled by a certified captioner working at a dedicated computer, but that is changing. More recently, specialized software and the cloud have enabled automated transcription, converting the spoken word into visual text that is typically displayed in the lower third of the screen.
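
As a simple illustration of that formatting step, the sketch below wraps a transcript into the short rows used for on-screen captions. It is a minimal, hypothetical helper (not any vendor's actual code), assuming the roughly 32-character rows used by CEA-608-style captions.

```python
import textwrap

# CEA-608-style captions allow roughly 32 characters per row; showing two
# rows at a time is a common "roll-up" presentation for live captions.
MAX_CHARS_PER_ROW = 32
ROWS_PER_CAPTION = 2

def transcript_to_caption_blocks(transcript: str) -> list[list[str]]:
    """Split an ASR transcript into caption blocks of short rows."""
    rows = textwrap.wrap(transcript, width=MAX_CHARS_PER_ROW)
    return [rows[i:i + ROWS_PER_CAPTION]
            for i in range(0, len(rows), ROWS_PER_CAPTION)]

if __name__ == "__main__":
    text = ("Audio description makes video programming more accessible "
            "by narrating key visual elements during pauses in dialogue.")
    for block in transcript_to_caption_blocks(text):
        print("\n".join(block))
        print("---")
```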

Automated Captioning Can Be Tricky

Captioning services can be set up to output different languages, with English and Spanish currently the most popular choices. However, things get tricky when a guest speaks a language the Automatic Speech Recognition (ASR) engine isn’t expecting, which can hurt accuracy during live events. Often the ASR engine and its underlying language model simply aren’t mature enough to handle the unexpected speech. For example, there have been cases where English-language ASR engines tried to turn Spanish speech into English text, and the results were often horrible.
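
Most cloud ASR services need the expected language declared when a session is created, and an engine locked to English will try to force Spanish speech into English text. The sketch below illustrates that configuration point with a hypothetical CaptionSession wrapper; the class name, fields and behavior are assumptions for illustration, not a specific vendor's API.

```python
from dataclasses import dataclass, field

@dataclass
class CaptionSession:
    """Hypothetical wrapper around a cloud ASR engine for live captioning."""
    primary_language: str                       # BCP-47 code the engine expects
    alternative_languages: list[str] = field(default_factory=list)
    # An engine set up only for English will still force Spanish speech into
    # English text unless an alternative language is declared here.

    def describe(self) -> str:
        langs = ", ".join([self.primary_language] + self.alternative_languages)
        return f"ASR session configured for: {langs}"

# An English-only session will mangle a Spanish-speaking guest...
english_only = CaptionSession(primary_language="en-US")
# ...whereas declaring Spanish up front lets a multilingual engine switch
# (or at least flag) when the spoken language changes.
bilingual = CaptionSession(primary_language="en-US",
                           alternative_languages=["es-US"])

print(english_only.describe())
print(bilingual.describe())
```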

OTT providers like Netflix are now offering audio description features for many of their most popular titles.

Most automated captioning systems in use today leverage “off-the-shelf” ASR engines from major providers such as Amazon, Google, Speechmatics and others. These large companies have the R&D teams required to develop such highly sophisticated engines, so rather than reinvent the wheel it usually makes more sense to pick the best technology for the job at hand. Many caption providers run quarterly evaluations of these technologies, focusing specifically on each engine’s performance in captioning workflows.
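
Those evaluations generally come down to measuring word error rate (WER) against a human-verified reference transcript. The sketch below is a minimal WER calculation of the kind such a quarterly comparison might use; the reference sentence and engine names are invented for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Classic WER: word-level edit distance divided by reference length."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # Dynamic-programming edit distance (substitutions, insertions, deletions).
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# Hypothetical quarterly check: same reference clip, two candidate engines.
reference = "the city council voted to expand the transit budget"
candidates = {
    "engine_a": "the city council voted to expand the transit budget",
    "engine_b": "the city counsel voted to expand a transit budget",
}
for name, hyp in candidates.items():
    print(f"{name}: WER = {word_error_rate(reference, hyp):.2%}")
```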

There’s More To It Than Just The Cloud

Legacy services such as captioning of pre-recorded material, audio description, and sign language translation, traditionally delivered by human captioners, are now making the move to the cloud, but it has not been a total transformation. Cloud-based workflows are well suited to most applications, such as live captioning, but for many users a human captioner remains familiar, reliable and less expensive than a fully automated system. Newer offerings like audio description and sign language translation are, for the most part, not yet cloud based, but this will change with time and familiarity with these unique forms of captioning.

Speed Vs. Accuracy

Clients face a tradeoff between caption speed and accuracy when setting up their systems, and they can adjust it for each title or group of programs. While the speed of caption transmission can be controlled to match a client’s requirements, broadcasters sometimes have to compromise between speed and accuracy. With a real-time automated system there is, on average, about 3-7 seconds of latency during a live event, which is comparable to what a human captioner can deliver. Make the processing faster and accuracy goes down, because the system has less time and context to work out what is being said in the audio; allow more time and the output becomes more accurate.
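
One way to picture that tradeoff is as a stabilization window: partial ASR results are held for a configurable number of seconds so that later context can revise them before the caption goes to air. The sketch below is a simplified, hypothetical illustration of the idea; the class and parameter names are assumptions, not any product's API.

```python
import time
from collections import deque

class CaptionBuffer:
    """Holds partial ASR hypotheses for a fixed window before emitting them.

    A longer window adds latency but lets later audio context revise earlier
    words; a shorter window emits captions sooner at the cost of accuracy.
    """
    def __init__(self, window_seconds: float = 5.0):
        self.window_seconds = window_seconds
        self.pending = deque()  # (arrival_time, text) pairs awaiting emission

    def add_partial(self, text: str) -> None:
        self.pending.append((time.monotonic(), text))

    def revise_last(self, corrected_text: str) -> None:
        # A late, higher-confidence hypothesis replaces the previous one,
        # which is only possible while it is still inside the window.
        if self.pending:
            arrival, _ = self.pending[-1]
            self.pending[-1] = (arrival, corrected_text)

    def emit_ready(self) -> list[str]:
        """Return caption lines whose window has expired."""
        now, ready = time.monotonic(), []
        while self.pending and now - self.pending[0][0] >= self.window_seconds:
            ready.append(self.pending.popleft()[1])
        return ready

# Usage: a 5-second window trades extra delay for a chance to fix errors.
buf = CaptionBuffer(window_seconds=5.0)
buf.add_partial("the mayor said the city wood")
buf.revise_last("the mayor said the city would")  # later context fixes "wood"
print(buf.emit_ready())  # [] until the window expires, then the revised line
```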

News organizations are increasingly turning to automated live captioning systems to speed the process and ensure accuracy.

Cloud-Based Workflows Are The Future

The past year has seen a huge migration to cloud-hosted captioning workflows, simply because it makes economic sense when hundreds of new titles have to be processed at a time. Netflix, for one, has migrated to an automated system it developed in-house, and a considerable amount of internal research has gone into the timing of the text to ensure readability. The OTT provider also offers audio description on many of its titles.
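
Readability timing is usually expressed as a reading speed in characters per second (CPS), with each caption's display duration derived from its length. The sketch below shows that calculation in its simplest form; the 20 CPS ceiling and the duration limits are commonly cited guidelines used here as assumptions, not any provider's published rule.

```python
# Minimum and maximum on-screen durations commonly used for captions.
MIN_DURATION_S = 1.0
MAX_DURATION_S = 7.0
MAX_READING_SPEED_CPS = 20.0  # characters per second, a common guideline

def caption_duration(text: str) -> float:
    """Return how long a caption should stay on screen to be readable."""
    seconds = len(text) / MAX_READING_SPEED_CPS
    return min(max(seconds, MIN_DURATION_S), MAX_DURATION_S)

for line in ["Thanks for joining us.",
             "Officials say the new accessibility rules take effect next year."]:
    print(f"{caption_duration(line):.1f}s  {line}")
```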

Facing financial pressure and growing content demands, large broadcasters have begun to understand that they need automation to manage the ever-increasing amount of material that must be captioned. The scalability of the cloud means broadcasters only pay for connection and processing when they actually need it and can turn the services off when they don’t. In addition, cloud-based captioning services maintain large databases of keywords, phrases and tuning parameters, built up over a decade or more of captioning, to keep accuracy high.
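
Those keyword and phrase databases are often applied as custom vocabularies or post-recognition substitutions so that local names and jargon survive the ASR step. The sketch below shows a simple substitution pass of that kind; the dictionary entries are invented for illustration.

```python
import re

# Hypothetical house dictionary: frequent ASR misrecognitions mapped to the
# spellings a station actually wants on screen (names, jargon, call signs).
HOUSE_DICTIONARY = {
    r"\bdee emm ays\b": "DMAs",
    r"\bover the top\b": "OTT",
    r"\bspeech maddox\b": "Speechmatics",
}

def apply_house_dictionary(caption: str) -> str:
    """Correct known ASR misrecognitions before the caption is transmitted."""
    for pattern, replacement in HOUSE_DICTIONARY.items():
        caption = re.sub(pattern, replacement, caption, flags=re.IGNORECASE)
    return caption

print(apply_house_dictionary("ratings in the top dee emm ays beat over the top rivals"))
```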

Even some smaller TV stations are happy using an automated caption system instead of the traditional human captioning model, because over time the savings are substantial and the accuracy is often as good as or better than what they were getting before. There are still situations where a human captioner works best, such as a special event or a specific foreign-language audience, but artificial intelligence and automation are catching up rapidly and the differences get smaller every day.
