Tutorial 4 - Large-Scale and Parameter-Efficient Language Modeling for Speech Processing
- Article Category: TUTORIALS
- 9/27/2023
Abstract
In this tutorial, we introduce the evolution of language models (LMs) for speech recognition, focusing on recent advances [1,2] in large-scale generative language models (1B+ parameters) and parameter-efficient learning techniques designed for cross-modal adaptation. Additionally, we introduce a new open-source benchmark, HyPoradise (Chen et al., NeurIPS 2023), which provides open-source n-best hypotheses and reproducible pre-trained language models for speech processing. With rising interest in using frozen pre-trained models for diverse downstream applications, how to design a “performance-effective” and “parameter-efficient” LLM fine-tuning framework remains an open question. We aim to provide an in-depth summary and draw a taxonomy of the differences among parameter-efficient learning modules [3]. The presented topic is emerging as an essential pathway to designing foundation models for the research community.
References
[1] Generative Speech Recognition Error Correction with Large Language Models, IEEE ASRU 2023.
[2] HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models, NeurIPS 2023.
[3] Low-rank Adaptation of Large Language Model Rescoring for Parameter-Efficient Speech Recognition, IEEE ASRU 2023.
Dr. Huck Yang is an Applied Scientist II on Amazon's ASR Science team. He obtained his PhD from the Georgia Institute of Technology in Atlanta, supported by the Wallace H. Coulter Fellowship. During his PhD years, he interned at Google Research, Amazon Alexa AI, and Hitachi Central Research Lab. He served as a special session chair at ICASSP 2022 and 2023, and will serve on the local committee for ASRU 2023. His research interests focus on resource-efficient adaptation and privacy-preserving learning for large pre-trained speech and language models, including rescoring and error correction. He received the Best Reproducible System Award at DCASE 2021 and the Judges' Award at DCASE 2022.
Prof. Eng-Siong Chng joined Nanyang Technological University (NTU), Singapore, in 2003 and is currently an Associate Professor in the School of Computer Science and Engineering (SCSE). Concurrently, he has served as the Assistant Chair of Graduate Students for SCSE since January 2019. He received both his BEng (Hons) and PhD from the University of Edinburgh, U.K., in 1991 and 1996 respectively. Prior to joining NTU, he worked at Knowles Electronics (2001-2002), Lernout & Hauspie (Belgium, 1999-2000), the Institute of Infocomm Research (I2R, Singapore, 1996-1999), and RIKEN (Japan, 1996). Prof. Chng has served as the publication chair for five international conferences: Human Agent Interaction 2016, INTERSPEECH 2014, APSIPA 2010, APSIPA 2011, and ISCSLP 2006. He has also been an associate editor for IEICE (special issue in 2012) and a reviewer for several publications, including Speech Communication, EUSIPCO, IEEE Transactions on Systems, Man, and Cybernetics, Part B, the Journal of Signal Processing Systems, ACM Multimedia Systems, IEEE Transactions on Neural Networks, IEEE Transactions on Circuits and Systems II, and Signal Processing.
Invited Speaker of Tutorial: Dr. Andreas Stolcke obtained his PhD from UC Berkeley and then worked as a researcher at SRI International and Microsoft before joining Amazon. His research interests include computational linguistics, language modeling, speech recognition, speaker recognition and diarization, and paralinguistics, with over 300 papers and patents in these areas. His open-source SRI Language Modeling Toolkit (SRILM) was widely used in academia before being superseded by deep neural network models. Andreas is a Fellow of the IEEE and of the International Speech Communication Association (ISCA).