Tutorial 2 - Universal Speech Model and Language Foundational Models

Abstract

This tutorial covers one of the most advanced models for ASR, the Universal Speech Model (USM), and, for language, the PaLM 2 language model. Building on these frozen foundational speech and language models, we will also present a joint Speech and Language Model (SLU), a versatile and high-performing speech-language understanding model that performs unseen generation tasks, including contextual ASR, dialog generation, speech continuation, and question answering, given a speech input and a text instruction as a prompt.

 

Shuo-Yiin Chang received the Ph.D. degree in Electrical Engineering and Computer Sciences from UC Berkeley in 2016, the M.S. degree from National Taiwan University, and the B.E. degree from National Tsing Hua University. He is currently a senior staff research scientist at Google. His research interests are mainly in conversational speech recognition, long-form speech recognition, natural conversational interaction and understanding, massively multilingual automatic speech recognition, and transfer learning.

 

Bo Li received the Ph.D. degree in computer science from the School of Computing, National University of Singapore, in 2014 and the B.E. degree in computer engineering from the School of Computer, Northwestern Polytechnical University, China, in 2008. He is currently a senior staff research scientist at Google. His research interests are mainly in massively multilingual end-to-end automatic speech recognition using semi-supervised learning, lifelong learning, and transfer learning.
