Product Detail

Prof. Shinji WatanabeKeynote Speech 3: Monday 09:00-10:00, December 18th, 2023

  • Item Number: 189800
  • Product Category: KEYNOTE SPEAKERS
  • Out of Stock

Full Description

Keynote Speech 3: Monday 09:00-10:00, December 18th, 2023

Abstract

Unifying Speech Processing Applications with Speech Foundation Models

After the success of large language models in natural language processing, the field of speech processing is currently exploring the possibility of combining speech and language modalities to create a foundation model. This single unified model could perform multiple speech processing applications, such as speech recognition, synthesis, translation, and spoken language processing. Our group is dedicated to achieving this goal through the development of speech foundation models, including speech/text decoder-only models, whisper-style multi-tasking, universal spoken language understanding, and multilingual SUPERB projects. In addition to showcasing the above research outcomes during this talk, we will describe the engineering efforts involved in building such a large foundation model from scratch on an academic computing scale for reproducibility.

 

Biography

Shinji Watanabe is an Associate Professor at Carnegie Mellon University, Pittsburgh, PA. He received his B.S., M.S., and Ph.D. (Dr. Eng.) degrees from Waseda University, Tokyo, Japan. He was a research scientist at NTT Communication Science Laboratories, Kyoto, Japan, from 2001 to 2011, a visiting scholar at Georgia Institute of Technology, Atlanta, GA, in 2009, and a senior principal research scientist at Mitsubishi Electric Research Laboratories (MERL), Cambridge, MA USA from 2012 to 2017. Before Carnegie Mellon University, he was an associate research professor at Johns Hopkins University, Baltimore, MD, USA, from 2017 to 2020. His research interests include automatic speech recognition, speech enhancement, spoken language understanding, and machine learning for speech and language processing. He has published over 400 papers in peer-reviewed journals and conferences and received several awards, including the best paper award from the IEEE ASRU in 2019. He is a Senior Area Editor of the IEEE Transactions on Audio Speech and Language Processing. He was/has been a member of several technical committees, including the APSIPA Speech, Language, and Audio Technical Committee (SLA), IEEE Signal Processing Society Speech and Language Technical Committee (SLTC), and Machine Learning for Signal Processing Technical Committee (MLSP). He is an IEEE and ISCA Fellow.

logo-wwelcome.png

WELCOME TO TAIWAN

logo

Congress Secretariat

Elite Professional Conference Organizer

footer-solgan