Prof. Reinhold Haeb-Umbach
Invited Speech 2: Monday 13:30-14:30, December 18th, 2023
Abstract
Multi-Talker Meeting Transcription
Automatic meeting transcription is concerned with transcribing conversations and enriching the transcript with information about who spoke when. This is a challenging task because the speech signal captured by distant microphones is noisy and reverberant and, depending on the nature of the meeting, can contain a high degree of overlapped speech, where more than one speaker is active at a time. In addition, the interaction dynamics, with speakers talking intermittently, pose problems for conventional enhancement and recognition systems.
Multi-talker meeting transcription thus calls for solving several tasks: source separation, diarization, and speech recognition. We will discuss approaches that address those tasks either separately or jointly, where the latter can lead to highly effective solutions. We will also touch upon "ad-hoc" configurations, where several initially unsynchronized microphones at unknown positions are used for signal capture. Finally, we will say a few words about word error rate performance evaluation, which is less straightforward than one might think.
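To illustrate why word error rate evaluation becomes less straightforward with several talkers, here is a minimal Python sketch. It is not taken from the talk. It assumes one common convention: all utterances of each speaker are concatenated, and the error is minimized over every assignment of hypothesis streams to reference speakers. The function names `edit_distance`, `wer`, and `cp_wer` are illustrative, and the sketch assumes non-empty references and equal numbers of reference and hypothesis streams.

```python
from itertools import permutations


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution (or match, cost 0)
            ))
        prev = curr
    return prev[-1]


def wer(ref, hyp):
    """Single-stream WER: word errors divided by reference length."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


def cp_wer(refs, hyps):
    """Permutation-minimized WER over per-speaker transcripts.

    refs and hyps are lists of strings, one concatenated transcript per
    speaker (assumption: both lists have the same length).
    """
    ref_tokens = [r.split() for r in refs]
    hyp_tokens = [h.split() for h in hyps]
    total_ref_words = sum(len(r) for r in ref_tokens)
    # Try every assignment of hypothesis streams to reference speakers
    # and keep the one with the fewest total word errors.
    best_errors = min(
        sum(edit_distance(r, h) for r, h in zip(ref_tokens, perm))
        for perm in permutations(hyp_tokens)
    )
    return best_errors / total_ref_words
```

With two speakers whose output streams arrive in swapped order, a naive stream-by-stream comparison would count every word as an error, whereas `cp_wer` finds the matching assignment and reports zero. The exhaustive permutation search grows factorially with the number of speakers, so this sketch is only practical for small meetings.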
Biography
Reinhold Haeb-Umbach is a professor of Communications Engineering at Paderborn University, Germany. He holds a Ph.D. from RWTH Aachen University, Germany, and worked in industrial research labs for more than ten years before joining Paderborn University in 2001. From 2015 to 2020 he was a member of the IEEE Speech and Language Technical Committee, and since 2022 he has been a member of the IEEE Audio and Acoustics Signal Processing Technical Committee. He is a fellow of the International Speech Communication Association (ISCA) and of the IEEE. His main research interests are in the fields of statistical signal processing and machine learning, with applications to speech enhancement, automatic speech recognition, and unsupervised learning from speech and audio.