Tutorial 2 - Universal Speech Model and Language Foundational Models
- Article Category: TUTORIALS
- 9/27/2023
Abstract
This tutorial covers one of the most advanced models in its class for ASR, the Universal Speech Model (USM), as well as the PaLM 2 language model. Building on these frozen foundational speech and language models, we will also present a joint Speech and Language Model (SLM), a versatile, high-performing speech-language understanding model that performs unseen generation tasks, including contextual ASR, dialog generation, speech continuation, and question answering, given a speech input and a text instruction as a prompt.
Shuo-Yiin Chang received the Ph.D. degree in Electrical Engineering and Computer Sciences from UC Berkeley in 2016, the M.S. degree from National Taiwan University, and the B.E. degree from National Tsing Hua University. He is currently a senior staff research scientist at Google. His research interests are mainly in conversational speech recognition, long-form speech recognition, natural conversational interaction and understanding, massively multilingual automatic speech recognition, and transfer learning.
Bo Li received the Ph.D. degree in computer science from the School of Computing, National University of Singapore, in 2014 and the B.E. degree in computer engineering from the School of Computer Science, Northwestern Polytechnical University, China, in 2008. He is currently a senior staff research scientist at Google. His research interests are mainly in massively multilingual end-to-end automatic speech recognition using semi-supervised learning, lifelong learning, and transfer learning.