Vendor Content.

Subtitling Secrets: Techniques And Best Practices

Subtitles have become essential in today’s interconnected world, making diverse content accessible globally through platforms like Netflix, Amazon Prime, Disney+, and more. The demand for high-quality subtitling has surged with multilingual media consumption, which has led to growing interest in AI and machine learning tools that can generate subtitles automatically.

Sakshi Jain, Senior Engineer at Interra Systems.

Film industries — from Hollywood and Bollywood to Nollywood and the cinematic powerhouses of South Korea, Spain, and France — depend on subtitles to reach global audiences. Subtitles also aid viewers in noisy environments, those with hearing impairments, and language learners. However, generating good-quality subtitles is complex. Many languages are spoken worldwide, making manual subtitle generation time-consuming and costly. Hiring in-house experts for each language can be incredibly expensive. 

What goes into a good subtitle

Accurate subtitles are essential to the overall viewer experience. As subtitle consumption grows and AI is increasingly used to generate subtitles, stringent subtitling guidelines become imperative. Broadcasters and OTT platforms have established detailed guidelines to ensure subtitle quality. Best practices for generating subtitles include the following:

Sentence identification: A sentence may span multiple subtitle events. However, automatic translation is most effective when complete sentences are translated rather than fragments, because the full sentence supplies accurate context, natural flow, and correct grammar. For example, an idiom such as "break a leg" (meaning "good luck") can be mistranslated if it is not translated as a single unit.

Consider the following example:

Captions in Spanish (two subtitle events):

Event 1:
No creo que
abandone este

Event 2:
planeta hasta que vea esa cosa construida.

English translation (line by line):

Event 1:
I do not think that
abandon this

Event 2:
planet until
see that thing built.

Translation of the complete sentence:

I don't think I'll leave this planet until I see that thing built.

This example shows that translations are more coherent and meaningful when entire sentences are translated. However, finding sentence boundaries can be challenging. One difficulty is distinguishing sentence-ending full stops from abbreviations: English text often contains abbreviations such as "Dr." and "P.M." whose periods can be mistaken for full stops. Furthermore, languages such as Chinese and Japanese do not separate words with spaces, and their subtitles often omit sentence-final punctuation, which makes sentence boundaries harder to identify.
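To make the abbreviation problem concrete, here is a minimal Python sketch of a sentence splitter for space-delimited text such as English. The abbreviation list `ABBREVIATIONS` and the function `split_sentences` are hypothetical illustrations, not part of any particular product. A token that matches a known abbreviation is never treated as a sentence end, so a sentence that genuinely ends with an abbreviation (for example, "...at 5 P.M.") is not split; Chinese and Japanese text would need a dedicated segmenter instead.

```python
# Hypothetical abbreviation list for English; a real system would use a
# curated, per-language list.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "ms.", "st.", "a.m.", "p.m."}
SENTENCE_END = (".", "?", "!")


def split_sentences(text: str) -> list[str]:
    """Split space-delimited text into sentences, skipping abbreviation periods."""
    sentences: list[str] = []
    current: list[str] = []
    for token in text.split():
        current.append(token)
        if token.lower() in ABBREVIATIONS:
            continue  # the period belongs to an abbreviation, not a sentence end
        if token.endswith(SENTENCE_END):
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing text without a final stop is kept as its own sentence
        sentences.append(" ".join(current))
    return sentences


print(split_sentences("Dr. Smith arrived at 5 P.M. today. Was he late?"))
# ['Dr. Smith arrived at 5 P.M. today.', 'Was he late?']
```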

Preserving original segmentation and styles while merging events into sentences: High-quality captions are often well segmented, and the content producer may prefer to keep the same segmentation in the translated output. In these cases, the original “line break” and subtitle “event break” information must be recorded when events are merged into a single sentence for translation, and the same line breaks and event breaks must then be applied to the translated text.
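The bookkeeping described above can be sketched in Python as follows, assuming each subtitle event is a list of text lines. The names `merge_events` and `reapply_breaks` are hypothetical. Breaks are recorded as word positions in the merged source sentence and then mapped onto the translation by scaling those positions proportionally; this proportional mapping is a deliberately simple stand-in for the word alignment a production system would more likely use.

```python
from dataclasses import dataclass


@dataclass
class MergedSentence:
    text: str
    # (word index, kind): a break follows the first `index` words; kind is "line" or "event".
    breaks: list[tuple[int, str]]


def merge_events(events: list[list[str]]) -> MergedSentence:
    """Join subtitle events (each a list of lines) into one sentence, recording breaks."""
    words: list[str] = []
    breaks: list[tuple[int, str]] = []
    for e_idx, event in enumerate(events):
        for l_idx, line in enumerate(event):
            words.extend(line.split())
            last_line = l_idx == len(event) - 1
            if last_line and e_idx == len(events) - 1:
                continue  # no break after the final line of the final event
            breaks.append((len(words), "event" if last_line else "line"))
    return MergedSentence(" ".join(words), breaks)


def reapply_breaks(translation: str, merged: MergedSentence) -> list[list[str]]:
    """Heuristically re-apply source breaks by scaling word positions to the translation."""
    src_count = len(merged.text.split())
    tgt = translation.split()
    events: list[list[str]] = []
    lines: list[str] = []
    start = 0
    for pos, kind in merged.breaks:
        cut = round(pos * len(tgt) / src_count)
        lines.append(" ".join(tgt[start:cut]))
        start = cut
        if kind == "event":
            events.append(lines)
            lines = []
    lines.append(" ".join(tgt[start:]))
    events.append(lines)
    return events


source = [["No creo que", "abandone este"], ["planeta hasta que vea esa cosa construida."]]
merged = merge_events(source)
english = "I don't think I'll leave this planet until I see that thing built."
print(reapply_breaks(english, merged))
```

For the Spanish example, this yields "I don't think" / "I'll leave" in the first event and "this planet until I see that thing built." in the second. Moving "this" away from "planet" shows why the proportional heuristic is only a rough approximation.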

Caption text may also carry formatting such as font color, style, and size, all of which must be preserved in the translated subtitles.

For example, according to Netflix guidelines, titles should be italicized.

Original caption (italics marked with <i> tags):

She was reading <i>The Great Gatsby</i>,
a classic novel written by F. Scott Fitzgerald.

Subtitle in Spanish:

Estaba leyendo <i>The Great Gatsby</i>,
una novela clásica escrita por F. Scott Fitzgerald.

In the translated subtitles, the title should also be italicized.
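One common way to keep styling through machine translation is to swap styled spans for placeholders before translation and restore them afterward. The Python sketch below assumes italics are marked with SRT-style `<i>...</i>` tags, that the translation engine leaves tokens like `__ITAL0__` untouched, and that the italic text itself (here, a title) is kept as-is, as in the example. The function names are hypothetical.

```python
import re

# Assumes italics are marked with SRT-style <i>...</i> tags.
ITALIC_SPAN = re.compile(r"<i>(.*?)</i>")


def protect_italics(text: str) -> tuple[str, list[str]]:
    """Replace each italic span with a numbered placeholder before translation."""
    spans: list[str] = []

    def _to_placeholder(match: re.Match) -> str:
        spans.append(match.group(1))
        return f"__ITAL{len(spans) - 1}__"

    return ITALIC_SPAN.sub(_to_placeholder, text), spans


def restore_italics(translated: str, spans: list[str]) -> str:
    """Put the italic spans back, re-wrapped in <i> tags, after translation."""
    for n, span in enumerate(spans):
        translated = translated.replace(f"__ITAL{n}__", f"<i>{span}</i>")
    return translated


masked, spans = protect_italics("She was reading <i>The Great Gatsby</i>,")
# `masked` is "She was reading __ITAL0__," and is what would be sent for translation.
print(restore_italics("Estaba leyendo __ITAL0__,", spans))
# Estaba leyendo <i>The Great Gatsby</i>,
```

If the styled text itself needs translating, the placeholder would have to wrap the tags rather than the whole span, which depends on how a given engine treats markup.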

Accurate translation: Translation is a crucial aspect of subtitling, as it converts spoken dialog in one language into written text in another while preserving meaning, tone, and context. It should handle cultural adaptation and keep tone and style coherent and consistent. Furthermore, the translation should follow all the rules of the target language.

For example:

Once a great king said, "Power belongs to the people that take it".

Japanese Translation: 

Below are the key points to note:

  • When converting from a word-based language to a character-based language such as Japanese, no spaces should appear between words.
  • There should be a half-width space after punctuation.
  • Punctuation such as commas should be converted to the target language's conventions (for example, 、 in Japanese).
  • Japanese uses 「」 in place of quotation marks.
  • Similarly, Spanish requires opening inverted question and exclamation marks (¿ and ¡).
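Several of these rules are mechanical enough to apply as a post-processing pass. The Python sketch below covers the Japanese space, comma, and quotation rules and the Spanish inverted marks; it does not implement the half-width-space rule. The character ranges, the punctuation table, and the function names are illustrative assumptions, not a complete implementation of either language's conventions.

```python
import re

# Japanese characters: 々, hiragana, katakana, and common CJK ideographs.
_JA = r"\u3005\u3040-\u30ff\u4e00-\u9fff"
_SPACE_BETWEEN_JA = re.compile(rf"(?<=[{_JA}])\s+(?=[{_JA}])")
# ASCII punctuation mapped to Japanese full-width equivalents.
_JA_PUNCT = str.maketrans({",": "、", ".": "。", "?": "？", "!": "！"})


def to_japanese_conventions(text: str) -> str:
    """Drop spaces between Japanese words, use 「」 for quotes, convert punctuation."""
    text = _SPACE_BETWEEN_JA.sub("", text)
    text = re.sub(r'"([^"]*)"', r"「\1」", text)  # paired straight quotes -> 「」
    return text.translate(_JA_PUNCT)  # note: would also rewrite decimal points


def add_spanish_inverted_marks(sentence: str) -> str:
    """Prefix ¿ or ¡ when the whole sentence is a question or exclamation."""
    s = sentence.strip()
    if s.endswith("?") and not s.startswith("¿"):
        return "¿" + s
    if s.endswith("!") and not s.startswith("¡"):
        return "¡" + s
    return s


print(to_japanese_conventions("これ は ペン."))  # これはペン。
print(add_spanish_inverted_marks("Dónde está?"))  # ¿Dónde está?
```

In Spanish the opening mark belongs where the question or exclamation begins, which can be mid-sentence. The sketch only handles the case where the entire sentence is the question.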

Locale in translation: Locales in translation ensure content is culturally and linguistically appropriate for specific regions, incorporating language, regional dialect, currency, date/time formats, and cultural conventions. Common locales include en_US (English - US), en_GB (English - UK), es_ES (Spanish - Spain), es_MX (Spanish - Mexico), pt_PT (Portuguese - Portugal), and pt_BR (Portuguese - Brazil).

By leveraging locales, businesses can create more inclusive and engaging content tailored to the preferences and norms of users in different regions. For example, the Spanish phrase “color rojo” is translated as “red color” in American English (en_US) but as “red colour” in British English (en_GB).
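Locale handling usually comes from the translation engine itself, but the idea can be illustrated with a locale-aware post-processing step. In this Python sketch, `US_TO_GB` is a hypothetical, deliberately tiny spelling table and `localize_spelling` a hypothetical helper; it rewrites American spellings only when the target locale is en_GB and preserves initial capitalization.

```python
import re

# Hypothetical, deliberately tiny US -> UK spelling table for illustration only.
US_TO_GB = {"color": "colour", "favorite": "favourite", "center": "centre"}


def localize_spelling(text: str, locale: str) -> str:
    """Apply British spellings when the target locale is en_GB."""
    if locale != "en_GB":
        return text

    def _swap(match: re.Match) -> str:
        word = match.group(0)
        gb = US_TO_GB.get(word.lower())
        if gb is None:
            return word
        return gb.capitalize() if word[0].isupper() else gb

    return re.sub(r"[A-Za-z]+", _swap, text)


print(localize_spelling("red color", "en_GB"))  # red colour
print(localize_spelling("red color", "en_US"))  # red color
```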

Translation glossaries: Translation glossaries should be applied when sentences are translated. A translation glossary is a specialized dictionary that contains key terms and their translations, often including context-specific definitions, named-entity translations, usage notes, and examples. It is tailored to ensure consistency and accuracy in translations, particularly for specialized content or industry-specific terminology. For example, a translation glossary might be used in the following situations:

  • Product names: Ensures that brand names remain consistent. For example, "iPhone" must remain "iPhone" in every language.
  • Ambiguous words: Helps resolve words with multiple meanings. For example, "train" can refer to a railway vehicle or to the act of exercising. When translating content about transportation, the glossary ensures that "train" is rendered as the railway vehicle.
  • Borrowed words: The French word "bouillabaisse" translates to "bouillabaisse" in English, since English borrowed the word from French in the 19th century. However, an English speaker without French cultural context might not know that bouillabaisse is a fish stew. A glossary can override the default translation so that "bouillabaisse" in French becomes "fish stew" in English.
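One common way to enforce glossary entries with a general-purpose translation engine is term masking: glossary terms are replaced with placeholders before translation, and the placeholders are replaced with the glossary's target terms afterward. The Python sketch below assumes the engine passes placeholders such as `__TERM0__` through unchanged; `GLOSSARY_FR_EN`, `mask_terms`, and `unmask_terms` are hypothetical names, and the entries reuse the examples above.

```python
import re

# Hypothetical French -> English glossary entries based on the examples above.
GLOSSARY_FR_EN = {"bouillabaisse": "fish stew", "iPhone": "iPhone"}


def mask_terms(text: str, glossary: dict[str, str]) -> tuple[str, dict[str, str]]:
    """Replace glossary source terms with placeholders the engine should leave alone."""
    targets: dict[str, str] = {}
    # Longest terms first, so multi-word entries win over their substrings.
    ordered = sorted(glossary.items(), key=lambda kv: len(kv[0]), reverse=True)
    for n, (source, target) in enumerate(ordered):
        placeholder = f"__TERM{n}__"
        pattern = re.compile(rf"\b{re.escape(source)}\b", re.IGNORECASE)
        text, count = pattern.subn(placeholder, text)
        if count:
            targets[placeholder] = target
    return text, targets


def unmask_terms(translated: str, targets: dict[str, str]) -> str:
    """Swap placeholders for the glossary's target-language terms."""
    for placeholder, target in targets.items():
        translated = translated.replace(placeholder, target)
    return translated


masked, targets = mask_terms("Nous avons mangé une bouillabaisse.", GLOSSARY_FR_EN)
# masked == "Nous avons mangé une __TERM0__." would be sent for translation;
# suppose the engine returns "We ate a __TERM0__.".
print(unmask_terms("We ate a __TERM0__.", targets))  # We ate a fish stew.
```

Ambiguous words such as "train" need context (for example, a per-project glossary) rather than blind substitution, so masking suits fixed terms like product names best.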

Segmentation: Subtitles are displayed over the video when a dialog starts and removed when the dialog ends. A long dialog must be broken into several parts that are displayed one after another, because too much text on screen hides parts of the video. A common guideline is a maximum of 42 characters per line and one or two lines per display. The process of dividing text into displayable units is called segmentation.

Segmentation is an essential part of captioning: correct segmentation makes text more readable and meaningful. After translation, captions can be generated in two ways. The first preserves the segmentation of the source file by reusing its original line-break and display-break information. The second segments the translated text according to the rules of the target language.

Of the two options, the second is preferable because the maximum character limit varies from language to language. For example, Japanese captions may have at most 17 characters per row, while English allows 42. In addition, translated text can be longer than the source, which can push captions past these limits. Therefore, high-quality segmentation of the translation output is crucial.

Subtitle segmentation quality is assessed against criteria including the number of rows per display, the number of characters per row, reading speed, display duration, and display breaks.
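Most of these criteria can be checked automatically per subtitle event. The Python sketch below checks row count, characters per row, display duration, and reading speed (characters per second); it does not judge display breaks. The 42-character and two-row defaults follow the guideline above, while the reading-speed and duration defaults are illustrative assumptions that would be set per language and per platform guideline. `SubtitleEvent` and `check_event` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class SubtitleEvent:
    start: float  # seconds
    end: float  # seconds
    lines: list[str]


def check_event(
    event: SubtitleEvent,
    max_chars: int = 42,
    max_lines: int = 2,
    max_cps: float = 17.0,  # illustrative reading-speed limit (characters per second)
    min_duration: float = 0.833,  # illustrative display-duration bounds in seconds
    max_duration: float = 7.0,
) -> list[str]:
    """Return human-readable issues for one event; an empty list means it passes."""
    issues: list[str] = []
    if len(event.lines) > max_lines:
        issues.append(f"{len(event.lines)} rows exceeds {max_lines}")
    for line in event.lines:
        if len(line) > max_chars:
            issues.append(f"row of {len(line)} characters exceeds {max_chars}")
    duration = event.end - event.start
    if not min_duration <= duration <= max_duration:
        issues.append(f"duration {duration:.2f}s outside {min_duration}-{max_duration}s")
    chars = sum(len(line) for line in event.lines)
    if duration > 0 and chars / duration > max_cps:
        issues.append(f"reading speed {chars / duration:.1f} cps exceeds {max_cps}")
    return issues


print(check_event(SubtitleEvent(0.0, 2.0, ["I'm having nightmares"])))  # []
```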

With intelligent segmentation, coherent phrases are kept together and dependent words are not separated from the words they belong to, which makes captions more coherent and readable.

Segmentation based only on character and row count:

Event 1:
I'm having nightmares that I'm
being chased by these giant

Event 2:
robotic claws.

Intelligent segmentation:

Event 1:
I'm having nightmares

Event 2:
that I'm being chased
by these giant robotic claws.
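The difference between the two approaches can be approximated with a greedy line breaker that refuses to end a line on a word that attaches to what follows. In this Python sketch, the `DEPENDENT` word list and the `segment` function are hypothetical, and the 30-character limit is chosen only because it reproduces the example lines. Real intelligent segmentation would more likely rely on syntactic analysis. The sketch also only breaks text into lines; grouping lines into events is a separate step, and a carried-over phrase can occasionally push the next line over the limit.

```python
# Hypothetical list of words that attach to what follows and should not end a line.
DEPENDENT = {"a", "an", "the", "this", "these", "that", "by", "of", "to", "in", "on", "and", "or", "i'm"}


def segment(text: str, max_chars: int = 42, keep_phrases: bool = True) -> list[str]:
    """Greedy line breaking; optionally moves trailing dependent words to the next line."""
    lines: list[str] = []
    current: list[str] = []
    for word in text.split():
        if current and len(" ".join(current + [word])) > max_chars:
            carry: list[str] = []
            if keep_phrases:
                # Pull dependent words off the end so they stay with the phrase they introduce.
                while len(current) > 1 and current[-1].lower() in DEPENDENT:
                    carry.insert(0, current.pop())
            lines.append(" ".join(current))
            current = carry + [word]
        else:
            current.append(word)
    if current:
        lines.append(" ".join(current))
    return lines


sentence = "I'm having nightmares that I'm being chased by these giant robotic claws."
print(segment(sentence, max_chars=30, keep_phrases=False))
print(segment(sentence, max_chars=30))
```

With `keep_phrases=False`, the output matches the character-and-row-count lines above ("I'm having nightmares that I'm" / "being chased by these giant" / "robotic claws."). With the default, it matches the intelligent lines ("I'm having nightmares" / "that I'm being chased" / "by these giant robotic claws.").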

Encoding of Subtitles: Meticulously crafted subtitles need to be encoded in multiple delivery formats, such as SRT, VTT, TTML, SCC, STL, and MCC. This ensures compatibility with different media players and platforms, enhancing accessibility and user experience.
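As an illustration of what encoding involves, the Python sketch below writes the same events as SRT and as WebVTT, the two simplest formats in the list. SRT cues are numbered and use a comma before the milliseconds; WebVTT files start with a `WEBVTT` header and use a period. The helper names and the `(start, end, lines)` tuple layout are assumptions. TTML, SCC, STL, and MCC are considerably more involved and would normally be produced with dedicated tooling.

```python
def _timestamp(seconds: float, ms_separator: str) -> str:
    """Format seconds as HH:MM:SS<sep>mmm (SRT uses ',', WebVTT uses '.')."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{ms_separator}{millis:03d}"


Event = tuple[float, float, list[str]]  # (start seconds, end seconds, lines)


def to_srt(events: list[Event]) -> str:
    """Numbered cues, comma before milliseconds, blank line between cues."""
    cues = [
        f"{i}\n{_timestamp(start, ',')} --> {_timestamp(end, ',')}\n" + "\n".join(lines) + "\n"
        for i, (start, end, lines) in enumerate(events, start=1)
    ]
    return "\n".join(cues)


def to_vtt(events: list[Event]) -> str:
    """WEBVTT header, then cues with a period before milliseconds."""
    cues = [
        f"{_timestamp(start, '.')} --> {_timestamp(end, '.')}\n" + "\n".join(lines) + "\n"
        for start, end, lines in events
    ]
    return "WEBVTT\n\n" + "\n".join(cues)


events = [(1.5, 3.25, ["I'm having nightmares"])]
print(to_srt(events))
print(to_vtt(events))
```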

Offline Translations: Some content creators are highly sensitive about using online translation engines due to privacy, security, or confidentiality concerns. To address this, it is essential to provide robust support for offline translation models. This ensures that content can be translated securely without relying on internet-based services, safeguarding sensitive information and maintaining control over the translation process.

Profanity censoring: Language-specific, customizable profanity censoring is essential for producing professional, viewer-friendly subtitles.
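A minimal Python sketch of language-specific, customizable censoring follows. The default word lists in `DEFAULT_PROFANITY` are hypothetical placeholders, and `extra_words` stands in for a customer-supplied list. Matching uses word boundaries, so "Hello" is not censored because it contains "hell". Each match keeps its first letter and replaces the rest with asterisks, which is only one of several possible masking styles.

```python
import re

# Hypothetical per-language default lists; real lists would be curated and configurable.
DEFAULT_PROFANITY = {"en": {"damn", "hell"}, "es": {"mierda"}}


def censor(text: str, lang: str, extra_words: frozenset[str] = frozenset()) -> str:
    """Mask listed words for the given language, keeping the first letter of each."""
    words = DEFAULT_PROFANITY.get(lang, set()) | set(extra_words)
    if not words:
        return text
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, sorted(words))) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: m.group(0)[0] + "*" * (len(m.group(0)) - 1), text)


print(censor("What the hell?", "en"))  # What the h***?
print(censor("Hello there", "en"))  # Hello there
```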

Conclusion

Thanks to new ML solutions for text translation and NLP-based semantic analysis, many automated subtitling workflows are now a reality. Automated subtitling has emerged as a leading method for expanding content reach. By deploying a complete solution from ingest to delivery, broadcasters can deliver content with high-quality captions and subtitles, meet legal requirements, and fully monetize their content.