AI Dubbing 101: Terminology
Author: Slava, Chief Sound Designer
Speech is more than a collection of words in a sentence. People perceive words in the context of sentences and situations, so working out the intended meaning comes naturally to us. A machine has to be taught to do the same.
To train AI, we use natural language processing, understanding, and generation systems.
NLP (Natural Language Processing) is the technology responsible for these tasks. Its components include NLU, NLG, and NER.
NLU (Natural Language Understanding) deals with the meaning of what is being said. The algorithm analyzes the syntax of the sentence and establishes relationships between words and phrases to determine the context of the utterance.
An important component of NLU is NER (named-entity recognition): extracting from speech the semantic "parameters" that matter for a specific task. The algorithm takes the text of the user's utterance and extracts the named entities in it: names, addresses, numbers, and other objects.
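To make the idea of entity extraction concrete, here is a minimal rule-based sketch. It is not how production NER works (real systems use trained models); the name list `KNOWN_NAMES`, the two regular expressions, and the function `extract_entities` are all hypothetical and exist only for illustration.

```python
import re

# Hypothetical gazetteer of known names (an assumption for this sketch);
# a trained NER model would learn such entities from data instead.
KNOWN_NAMES = {"Anna", "Slava"}

# Toy patterns: integers or decimals, and capitalized words.
NUMBER_RE = re.compile(r"\b\d+(?:\.\d+)?\b")
WORD_RE = re.compile(r"\b[A-Z][a-z]+\b")


def extract_entities(text):
    """Return a list of (entity_type, value) pairs found in the text."""
    entities = []
    # Capitalized words count as names only if they are in the gazetteer.
    for match in WORD_RE.finditer(text):
        if match.group() in KNOWN_NAMES:
            entities.append(("NAME", match.group()))
    # Every number is reported as a NUMBER entity.
    for match in NUMBER_RE.finditer(text):
        entities.append(("NUMBER", match.group()))
    return entities


print(extract_entities("Book a table for Anna at 19 Main Street for 4 people"))
```

The sketch reports every number the same way, so it cannot tell a house number from a head count. Resolving that kind of ambiguity is what the context analysis in NLU is for.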
NLG (Natural Language Generation) is the formation of a response based on the recognized text. Early systems relied on generation templates for this. Modern systems increasingly use hidden Markov models and neural networks, so the AI learns to decide on its own how the text should look.
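The template approach mentioned above can be sketched in a few lines. This is only an illustration of template-based generation, not of the neural approach; the intent names, the template texts, and the function `generate_response` are assumptions made up for the example.

```python
from string import Template

# Hypothetical response templates keyed by intent (assumed for this sketch).
TEMPLATES = {
    "confirm_booking": Template(
        "Your table for $count is booked under the name $name."
    ),
    "greeting": Template("Hello, $name!"),
}


def generate_response(intent, slots):
    """Fill the template for the given intent with the extracted slot values."""
    template = TEMPLATES[intent]
    # substitute() raises KeyError if a required slot is missing,
    # which makes incomplete input easy to notice.
    return template.substitute(slots)


print(generate_response("confirm_booking", {"name": "Anna", "count": "4"}))
```

Every response has to be written by hand in advance, which is the limitation that pushes modern systems toward learned generation.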
Technically, NLU and NLG are components of NLP. Working together, they help create an experience close to communicating with a real person.