FCC Expands Caption Mandate As Automated Processing Takes Center Stage

By mandating Audio Description in addition to traditional captioning, the FCC is making video programming more accessible.
On October 27, 2020, the Federal Communications Commission issued an order expanding its accessibility mandate for broadcasters, extending audio description requirements to an additional 40 designated market areas (DMAs) over the next four years. The move builds on the Twenty-First Century Communications and Video Accessibility Act of 2010 (CVAA), which directed stations in the top 60 DMAs to provide what it calls “described programming.”
This means broadcasters and program creators in the top 100 markets in the U.S. must now provide both captioning and audio description or risk being fined.
Audio description makes video programming more accessible by inserting narrated descriptions of a television program’s key visual elements during natural pauses in the program’s dialogue. The action ensures that a greater number of individuals who are blind or visually impaired can be better connected, informed, and entertained by television programming.
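To make the idea of "natural pauses" concrete, here is a minimal sketch of how gaps in dialogue might be located from timing data. It assumes the dialogue timings are already available as start and end times in seconds (for example, from an existing caption file); the function name, the sample timings and the two-second minimum are illustrative assumptions, not part of any FCC rule or vendor product.

```python
from typing import List, Tuple

# (start_seconds, end_seconds) of one stretch of dialogue; hypothetical input format
Segment = Tuple[float, float]


def find_description_gaps(dialogue: List[Segment], min_gap: float = 2.0) -> List[Segment]:
    """Return the pauses between dialogue segments that last at least min_gap seconds."""
    gaps: List[Segment] = []
    ordered = sorted(dialogue)
    if not ordered:
        return gaps
    latest_end = ordered[0][1]
    for start, end in ordered[1:]:
        # A gap only counts if no earlier (possibly overlapping) speech is still running.
        if start - latest_end >= min_gap:
            gaps.append((latest_end, start))
        latest_end = max(latest_end, end)
    return gaps


# Invented timings for illustration only.
dialogue = [(0.0, 4.0), (4.5, 9.0), (12.5, 15.0), (14.0, 20.0), (26.0, 30.0)]
gaps = find_description_gaps(dialogue, min_gap=2.0)
print(gaps)
```

Each returned pair is a window in which a narrated description could be placed without talking over the program's dialogue. In practice the minimum usable gap would depend on the length of the description being inserted.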
The FCC’s expanded rules will make audio description more widely available across the broadcast and OTT markets, and that’s good news for everyone involved. Taking it a step further, the Commission has said that in 2023 it will determine whether to continue expanding audio description requirements to an additional 10 DMAs per year beyond the top 100 DMAs.
Traditionally, captioning has been accomplished with a certified captioner working on a dedicated computer, but that’s changing. More recently, special software and the cloud have enabled the automated transcription and conversion of the spoken word into visual text, which is typically displayed in the lower third of the screen.
Automated Captioning Can Be Tricky
Captioning services can be set up to output different languages, with English and Spanish currently the most popular choices. However, it gets tricky when a guest speaks in a different language and your Automatic Speech Recognition (ASR) engine isn’t expecting it. This can cause accuracy problems during the captioning of live events, often because the ASR engine and its database aren’t mature enough to fully understand the words being spoken. For example, there have been cases where English-language ASR engines try to turn Spanish speech into English text, and the results are often horrible.
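One way to guard against this kind of mismatch is to check the spoken language before a segment reaches the captioning engine. The sketch below assumes some upstream language-identification step already supplies a confidence score per language for each audio segment; that step, the function name, the 0.8 threshold and the "review" label are all hypothetical and do not describe any particular vendor's workflow.

```python
from typing import Dict, Set


def route_segment(language_scores: Dict[str, float],
                  supported: Set[str],
                  threshold: float = 0.8) -> str:
    """Pick the language to caption in, or return "review" if the segment looks risky."""
    # Take the language the (assumed) detector is most confident about.
    language, score = max(language_scores.items(), key=lambda item: item[1])
    if score < threshold or language not in supported:
        # Uncertain or unsupported language: flag it rather than caption it wrongly.
        return "review"
    return language


supported_languages = {"en", "es"}
print(route_segment({"en": 0.95, "es": 0.05}, supported_languages))
print(route_segment({"en": 0.55, "es": 0.45}, supported_languages))
print(route_segment({"fr": 0.90, "en": 0.10}, supported_languages))
```

The point of the design is simply that an English engine is never handed speech it is unlikely to understand; what happens to a flagged segment (a different engine, a human captioner, or no caption) is left open here.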

OTT providers like Netflix are now offering audio description features for many of their most popular titles.
Most automated captioning systems used today leverage “off-the-shelf” ASR engines from major providers like Amazon, Google, Speechmatics and a number of others. These large companies have the R&D teams required to develop such highly sophisticated ASR engines, so it is usually better to pick the best technology for the job at hand than to try to reinvent the wheel. Many caption providers run quarterly evaluations of these technologies, focusing specifically on each technology’s performance for captioning workflows.
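The article does not say how those evaluations are scored, but a common yardstick for ASR output is word error rate (WER): the number of word substitutions, deletions and insertions needed to turn the engine's text into a trusted reference transcript, divided by the length of the reference. The sketch below is one simple way to compute it; the engine names and sample sentences are made up for illustration.

```python
from typing import Dict


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the two texts, divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # previous[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# Hypothetical engine outputs for the same audio clip.
reference = "the commission issued an order today"
outputs: Dict[str, str] = {
    "engine_a": "the commission issued an order today",
    "engine_b": "the commission issue a order",
}
scores = {name: word_error_rate(reference, text) for name, text in outputs.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

A lower score is better. A real evaluation would run many clips that resemble the provider's actual programming, and would likely also weigh factors such as latency and punctuation that WER alone does not capture.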
There’s More To It Than Just The Cloud
Services traditionally delivered by human captioners, such as captioning of pre-recorded material, audio description, and sign language translation, are now making the move to the cloud, but the transformation is far from complete. Cloud-based workflows are great for most applications, like live captioning, but for many people a human captioner is familiar, reliable and less expensive than a fully automated system. Audio description and sign language translation, for the most part, are not cloud based at this point, but this will change with time and growing familiarity with these services.
Speed Vs. Accuracy
It should be noted that clients face a tradeoff between caption speed and accuracy when setting up their systems, and they can adjust the balance for each title or group of programs. While they can control the speed of caption transmission to suit their requirements, it’s sometimes necessary for broadcasters to give up some of one to gain the other. With a real-time automated system, there is on average about 3-7 seconds of latency during a live event. That’s comparable to what a human captioner can do. If you make the processing go faster, the accuracy goes down, because the system has less time and context to figure out what’s being said in the audio. If you allow more time, it will be more accurate.
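The tradeoff can be pictured as a waiting game: an engine may revise its guess for the same stretch of speech as more audio context arrives, and the latency setting decides which revision goes to air. The sketch below is a toy model of that behavior under stated assumptions; the function name and the sequence of revisions are invented, and real engines handle interim and final results in their own ways.

```python
from typing import List, Tuple

# (seconds after the words were spoken, the engine's guess at that moment); invented data
Hypothesis = Tuple[float, str]


def caption_at_latency(hypotheses: List[Hypothesis], latency: float) -> str:
    """Return the newest guess the engine had produced once `latency` seconds had passed."""
    available = [h for h in hypotheses if h[0] <= latency]
    if not available:
        return ""  # nothing is ready yet at this latency setting
    return max(available, key=lambda h: h[0])[1]


revisions = [
    (1.0, "the f c c issued in order"),
    (3.0, "the FCC issued in order"),
    (6.0, "the FCC issued an order"),
]
fast = caption_at_latency(revisions, latency=2.0)
medium = caption_at_latency(revisions, latency=4.0)
slow = caption_at_latency(revisions, latency=7.0)
print(fast, medium, slow, sep="\n")
```

With the made-up revisions above, a short latency sends out the roughest guess while a longer one waits for the corrected text, which is the same compromise a broadcaster makes when tuning a live captioning system.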

News organizations are increasingly turning to automated live captioning systems to speed the process and ensure accuracy.
Cloud-Based Workflows Are The Future
The past year has seen a huge migration to cloud-hosted captioning workflows, simply because it makes economic sense when hundreds of new titles have to be processed at a time. Netflix, for one, has migrated to an automated system it developed in-house. A considerable amount of internal research has also gone into the timing of the text to ensure readability. The OTT provider now also offers audio description on many of its titles.
Due to looming financial and content demand pressures, large broadcasters have begun to understand they need automation to manage the ever-increasing amount of material that needs to be captioned. The scalability of the cloud ensures that broadcasters only pay for cloud connection and processing costs when they actually need the services, and they can turn them off when they don’t. In addition, cloud-based captioning services rely on large databases of keywords and phrases, along with thousands of parameters built up over a decade or more of captioning, to maintain a high degree of accuracy.
Even some of the smaller TV stations are happy using an automatic caption system instead of the traditional human captioning model because over time the savings are so large and the accuracy is often as good as or better than what they were getting before. There might be a situation where a human captioner would work best for a special event or when addressing a specific foreign language audience, but artificial intelligence and automation are catching up rapidly and the differences are getting smaller every day.