Tutorial 3 - Speech Translation: Model, Data, Evaluation
- Article Category: TUTORIALS
- 9/27/2023
Abstract
Speech translation technology, including speech-to-text (S2T) and speech-to-speech translation (S2ST), aims to convert speech in one language into speech or text in another language. Training such models is challenging because the model must learn not only the alignment between the two languages but also the acoustic and linguistic characteristics of both. In recent years, several research breakthroughs have transformed speech translation systems from proofs of concept into high-performing real-world products. In this tutorial, we will introduce the full pipeline for building a speech translation system, with literature reviews and discussions of model training, dataset creation and robust evaluation. Finally, we will present examples of how to leverage tools open-sourced by the team to build the pipeline. This tutorial aims to democratize speech translation technology through knowledge sharing and to promote future research in the community.
Ann Lee is a tech lead manager at Meta AI. Her current research focuses on speech translation and expressive speech generation. Before joining Meta, she received her Ph.D. from the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) in 2016. She gave a plenary talk on speech-to-speech translation technology at JSALT 2022.
Sravya Popuri is a tech lead on the speech translation team at Meta. Her current research focuses on multilingual and multimodal speech translation. Before joining industry, she received her Master's degree from the Language Technologies Institute, Carnegie Mellon University.
Xutai Ma is a research scientist at Meta AI. While he has broad interests in machine learning and natural language processing, he mostly works on machine translation, especially simultaneous speech translation. Before joining Meta, he earned his Ph.D. at the Center for Language and Speech Processing, Johns Hopkins University, advised by Philipp Koehn. He organized and chaired the IWSLT Simultaneous Translation shared task for four years (2020-2023).
David Dale is a research engineer at Meta AI, currently focusing on the evaluation of speech and text translation: identifying translation omissions and hallucinations, and developing and evaluating automatic metrics of semantic similarity. He received his Master's degree in computer science from HSE University in 2016.