
Prof. Chin-Hui Lee

Keynote Speech 2: Sunday 13:30-14:30, December 17th, 2023


Full Description

Keynote Speech 2: Sunday 13:30-14:30, December 17th, 2023

Abstract

From Universal Approximation to Deep Regression: Theory and Practices

Many classical speech processing problems, such as enhancement, source separation, dereverberation, and bandwidth expansion, can be formulated as finding mapping functions that transform input spectra into output spectra. Leveraging machine learning and big-data paradigms, we cast these spectral mapping problems as learnable deep regression. According to Kolmogorov's Representation Theorem (1957), a multivariate scalar function can be expressed exactly as a superposition of a finite number of outer functions, each composed with a linear combination of inner functions. Cybenko (1989) developed a universal approximation theorem showing that such a scalar function can be approximated by a superposition of sigmoid functions, inspiring a new wave of neural network algorithms. Barron (1993) later proved that the approximation error can be tightly bounded and related it to representation power in learning theory. In this talk, we first present four new theorems that generalize the universal approximation theorems from sigmoid networks to deep neural networks (DNNs) and from vector-to-scalar to vector-to-vector regression. We also show that the generalization loss, or regression error, can be decomposed into three terms, namely approximation, estimation, and optimization errors, each of which can be tightly bounded separately.
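As a rough illustration of the two results referenced above (the notation here is assumed for exposition and is not taken from the talk), the Cybenko-style approximation of a scalar function and the three-term error decomposition can be written as:

% Universal approximation of a scalar function f: R^d -> R by a
% superposition of N sigmoid units (alpha_i, w_i, b_i are learned parameters)
f(x) \approx \sum_{i=1}^{N} \alpha_i \, \sigma\!\left(w_i^\top x + b_i\right)

% Decomposition of the generalization (regression) error of a learned model \hat{f}
% into approximation, estimation, and optimization terms
\mathcal{L}(\hat{f}) - \inf_{f} \mathcal{L}(f) \;\le\; \varepsilon_{\mathrm{approx}} + \varepsilon_{\mathrm{est}} + \varepsilon_{\mathrm{opt}}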

In practice, our theorems provide guidelines for architecture selection in DNN design. In a series of experiments on high-dimensional nonlinear regression, we validate our theory in terms of representation and generalization power and demonstrate that, under adverse acoustic conditions, deep regression achieves good speech quality and intelligibility for microphone-array-based speech enhancement, separation, and dereverberation. Our proposed deep regression framework was also tested on many recent challenging tasks, including the CHiME-2, CHiME-4, CHiME-5, CHiME-6, REVERB, and DIHARD III challenges, and our teams achieved the lowest error rates in almost all of these open evaluations. Finally, we believe a comparable theoretical understanding of deep classification will be needed to advance automatic speech recognition and understanding (ASRU) technologies to the next level of performance and robustness.
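A minimal sketch of the vector-to-vector deep regression setup described above (mapping noisy spectra to clean spectra with a feed-forward DNN), written in PyTorch; the architecture, dimensions, and random stand-in data are illustrative assumptions, not the configuration used in the reported experiments.

import torch
import torch.nn as nn

# Vector-to-vector regression: map a noisy input spectrum (d dims) to a clean
# output spectrum (q dims) with a feed-forward DNN. Sizes are illustrative.
d, q, hidden = 257, 257, 1024

model = nn.Sequential(
    nn.Linear(d, hidden), nn.ReLU(),
    nn.Linear(hidden, hidden), nn.ReLU(),
    nn.Linear(hidden, q),          # linear output layer for regression targets
)

loss_fn = nn.MSELoss()                       # spectral-mapping regression loss
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Stand-in data: random tensors in place of paired noisy/clean log-spectra.
noisy = torch.randn(32, d)
clean = torch.randn(32, q)

for step in range(100):
    optimizer.zero_grad()
    pred = model(noisy)                      # estimated clean spectra
    loss = loss_fn(pred, clean)              # empirical regression error
    loss.backward()
    optimizer.step()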

 

Biography

Chin-Hui Lee is a professor at the School of Electrical and Computer Engineering, Georgia Institute of Technology. Before joining academia in 2001, he accumulated 20 years of industrial experience, ending at Bell Laboratories, Murray Hill, as Director of the Dialogue Systems Research Department. Dr. Lee is a Fellow of the IEEE and a Fellow of ISCA. He has published over 600 papers and holds 30 patents, with more than 55,000 citations and an h-index of 80 on Google Scholar. He has received numerous awards, including the Bell Labs President's Gold Award in 1998. He won the IEEE Signal Processing Society's 2006 Technical Achievement Award for "Exceptional Contributions to the Field of Automatic Speech Recognition". In 2012 he gave an ICASSP plenary talk on the future of automatic speech recognition. In the same year he was awarded the ISCA Medal for Scientific Achievement for "pioneering and seminal contributions to the principles and practice of automatic speech and speaker recognition". His two pioneering papers on deep regression have accumulated over 2,000 citations and won a Best Paper Award from the IEEE Signal Processing Society in 2019.
