CHALLENGE SPECIAL SESSIONS

  • Article Category: PROGRAM
  • 11/30/2022

ML-SUPERB: Multilingual Speech Universal PERformance Benchmark

Organizers

  • Hung-Yi Lee, NTU
  • Shinji Watanabe, CMU
  • Shang-Wen Li, Meta
  • Abdelrahman Mohamed, Rembrand
  • Jiatong Shi, CMU
  • William Chen, CMU
  • Dan Berrebbi, CMU

Abstract

The Multilingual SUPERB (ML-SUPERB) challenge is introduced as an extension to the popular Speech Universal PERformance Benchmark (SUPERB) to evaluate self-supervised learning (SSL) models in multilingual scenarios. While SUPERB focuses on English speech, ML-SUPERB covers 143 languages, including both high-resource and endangered languages. The benchmark primarily assesses SSL models for automatic speech recognition (ASR) and language identification (LID) through two tracks and four tasks. ML-SUPERB utilizes frozen SSL models as feature extractors and lightweight downstream models for efficient training. Additionally, a special New Language Track encourages researchers to contribute data in their own languages, expanding multilingual research and providing an open evaluation set for other participants. The challenge aims to advance speech SSL across a broader range of languages.
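The frozen-upstream protocol lends itself to a short sketch. The snippet below (PyTorch, with a stand-in encoder and illustrative sizes, not the official recipe) shows the pattern the abstract describes: the SSL model only extracts features, and only a lightweight head is trained.

    import torch
    import torch.nn as nn

    # Stand-in for a pretrained SSL encoder; in ML-SUPERB the upstream
    # is frozen and serves purely as a feature extractor.
    ssl_encoder = nn.Sequential(
        nn.Conv1d(1, 256, kernel_size=10, stride=5),
        nn.ReLU(),
    )
    for p in ssl_encoder.parameters():
        p.requires_grad = False            # freeze the upstream

    vocab_size = 64                        # illustrative token inventory
    downstream = nn.Sequential(            # lightweight trainable head
        nn.Linear(256, 128),
        nn.ReLU(),
        nn.Linear(128, vocab_size),        # e.g., per-frame ASR logits
    )
    optimizer = torch.optim.Adam(downstream.parameters(), lr=1e-3)

    wav = torch.randn(8, 1, 16000)         # a batch of 1 s waveforms
    with torch.no_grad():                  # no gradients into the upstream
        feats = ssl_encoder(wav).transpose(1, 2)   # (batch, frames, 256)
    logits = downstream(feats)             # only these weights are trained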


Website

https://mlsuperb.netlify.app/


Multi-Channel Multi-Party Meeting Transcription Challenge 2.0 (M2MeT2.0)

Organizers

  • Lei Xie, Northwestern Polytechnical University & AISHELL Foundation, China
  • Kong Aik Lee, A*STAR, Singapore
  • Zhijie Yan, Alibaba, China
  • Shiliang Zhang, Alibaba, China
  • Yanmin Qian, Shanghai Jiao Tong University
  • Zhuo Chen, Microsoft, USA
  • Jian Wu, Microsoft, USA
  • Hui Bu, AIShell Inc., China

Abstract

The M2MeT2.0 Challenge builds upon the success of previous challenges in the field of multi-speaker meeting transcription. Meetings present unique challenges for automatic speech recognition (ASR) and speaker diarization due to complex acoustic conditions and diverse speaking styles. The M2MeT2.0 Challenge aims to address the practical problem of identifying "who spoke what at when" by introducing the speaker-attributed ASR task. The challenge includes two tracks: fixed training conditions for reproducible research and open training conditions to benchmark the performance of speaker-attributed ASR. A new test set of approximately 10 hours of audio will be released, with its manual annotations made available after the challenge.
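The abstract does not spell out the metric, but speaker-attributed ASR is commonly scored by concatenating each speaker's text and taking the best permutation between reference and hypothesis speakers. A minimal sketch of that idea with a plain character error rate (illustrative only; equal speaker counts assumed):

    from itertools import permutations

    def edit_distance(a: str, b: str) -> int:
        # Standard Levenshtein distance via dynamic programming.
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                               prev[j - 1] + (ca != cb)))
            prev = cur
        return prev[-1]

    def permutation_min_cer(ref_texts, hyp_texts):
        # Try every mapping of hypothesis speakers onto reference
        # speakers and keep the one with the fewest character errors.
        total_ref = sum(len(t) for t in ref_texts)
        best = min(
            sum(edit_distance(r, h) for r, h in zip(ref_texts, perm))
            for perm in permutations(hyp_texts)
        )
        return best / total_ref

    ref = ["we should start", "i agree"]   # per-speaker references
    hyp = ["i agree", "we should start"]   # speaker labels swapped
    print(permutation_min_cer(ref, hyp))   # 0.0: the permutation fixes it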


Website

N/A


2nd COG-MHEAR Audio-Visual Speech Enhancement Challenge (AVSEC-2)

Organizers

  • Amir Hussain, Edinburgh Napier University
  • Peter Bell, Edinburgh University
  • Industry Partner: Sonova

Abstract

The Audio-Visual Speech Enhancement Challenge (AVSEC) aims to improve speech intelligibility and quality in challenging listening conditions by leveraging both visual and auditory information of the target speaker. Existing evaluation metrics have proven unreliable, and previous assessments did not reflect real-world scenarios. AVSEC sets a benchmark for audio-visual speech enhancement, providing a dataset and evaluation protocol for human listening tests. The second edition (AVSEC-2) expects improved results, with a focus on model complexity and latency parameters for future deployment in low-power computing devices. The challenge aims to foster interdisciplinary collaboration and advance the field of AV-SE while shedding light on the scope and limitations of current approaches.
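To make the audio-visual idea concrete, here is a minimal fusion sketch: per-frame audio features and time-aligned lip embeddings of the target speaker are concatenated and used to predict a spectral mask. All module choices and dimensions are illustrative assumptions, not the challenge baseline.

    import torch
    import torch.nn as nn

    class AVMaskEstimator(nn.Module):
        # Toy audio-visual enhancer: fuse per-frame audio features with
        # aligned visual (lip) embeddings and predict a magnitude mask.
        def __init__(self, n_freq=257, n_visual=128, hidden=256):
            super().__init__()
            self.rnn = nn.GRU(n_freq + n_visual, hidden, batch_first=True)
            self.mask = nn.Sequential(nn.Linear(hidden, n_freq), nn.Sigmoid())

        def forward(self, noisy_spec, lip_emb):
            # noisy_spec: (batch, frames, n_freq) magnitude spectrogram
            # lip_emb:    (batch, frames, n_visual) visual features
            fused, _ = self.rnn(torch.cat([noisy_spec, lip_emb], dim=-1))
            return noisy_spec * self.mask(fused)   # enhanced magnitudes

    model = AVMaskEstimator()
    spec = torch.rand(2, 100, 257)    # 100 frames of noisy magnitudes
    lips = torch.rand(2, 100, 128)    # matching visual embeddings
    enhanced = model(spec, lips)      # (2, 100, 257)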


Website

https://challenge.cogmhear.org/


QASR TTS 1.0: Broadcast News Text to Speech Challenge

Organizers

  • Ahmed Ali, QCRI
  • Soumi Maiti, CMU
  • Shinji Watanabe, CMU
  • Shinnosuke Takamichi, University of Tokyo
  • Shammur Chowdhury, QCRI
  • Hamdy Mubarak, QCRI
  • Ahmed AbdelAli, QCRI
  • Simon King, University of Edinburgh

Abstract

The first QASR TTS challenge aims to encourage the development of semi-supervised Text-To-Speech (TTS) systems using broadcast news data, with a focus on multi-dialectal and multi-speaker TTS in Arabic. While high-resource languages have advanced TTS systems, low-resource languages like Arabic face limitations due to a lack of resources and studies. This challenge seeks to advance TTS in Arabic by initially focusing on anchor speakers' voices in the broadcast news domain, with future plans to incorporate guests' speech. The challenge leverages Automatic Speech Recognition (ASR) data, transcriptions, and metadata, and evaluates the voices generated by each system using both objective and subjective scoring. The main objective is to explore and compare different approaches for building TTS systems in under-resourced scenarios, particularly in various Arabic dialects.
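Subjective scoring in TTS challenges usually means listening tests summarized as a Mean Opinion Score (MOS). A small sketch of the aggregation, with invented ratings:

    import math

    def mos_with_ci(ratings, z=1.96):
        # Mean Opinion Score with a normal-approximation 95% interval.
        n = len(ratings)
        mean = sum(ratings) / n
        var = sum((r - mean) ** 2 for r in ratings) / (n - 1)
        return mean, z * math.sqrt(var / n)

    # Hypothetical 1-5 naturalness ratings for one system's samples.
    ratings = [4, 5, 3, 4, 4, 5, 3, 4]
    mean, half = mos_with_ci(ratings)
    print(f"MOS = {mean:.2f} +/- {half:.2f}")   # MOS = 4.00 +/- 0.52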


Website

https://arabicspeech.org/qasr_tts/


Model Adaptation for ASR in Low-Resource Indian Languages

Organizers

  • Srinivasa Raghavan, Navana Tech
  • Abhayjeet Singh, Indian Institute of Science
  • Saurabh Kumar, Indian Institute of Science
  • Sathvik Udupa, Indian Institute of Science
  • Amala Nagireddi, Indian Institute of Science
  • Sandhya Badiger, Indian Institute of Science
  • Deeshitha G, Indian Institute of Science
  • Savitha Murthy, Indian Institute of Science

Abstract

This challenge focuses on model adaptation for automatic speech recognition (ASR) in low-resource Indian languages. Despite recent advancements in ASR, low-resource languages with limited audio and text data, compounded by multiple dialects, present challenges. However, many Indian languages share script and grammatical structures, making adaptation and fine-tuning techniques viable by leveraging well-resourced similar languages. The challenge aims to explore the importance of acoustics and text in building reliable ASR systems, and provides dialect-rich data in Bengali and Bhojpuri for participants to work with. The challenge offers four tracks, encouraging participants to experiment with different modeling techniques and architectures, utilize existing resources for acoustic or language models, and combine resources for optimal performance on a blind test set. The solutions developed have potential applications beyond Indian languages, extending to languages worldwide.
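One common form of the adaptation described above is to start from an encoder pretrained on well-resourced related languages and fine-tune only its upper layers, plus a new output head, on the low-resource target. The layer split and sizes below are illustrative assumptions, not the challenge's prescribed recipe.

    import torch.nn as nn

    # Stand-in for an ASR encoder pretrained on well-resourced languages.
    encoder_layers = nn.ModuleList(
        [nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True)
         for _ in range(6)]
    )

    # Freeze the lower layers, which tend to capture language-independent
    # acoustics; fine-tune the upper layers and a new grapheme head on
    # the target-language data (e.g., Bengali or Bhojpuri).
    for layer in encoder_layers[:4]:
        for p in layer.parameters():
            p.requires_grad = False

    target_vocab = 80                    # illustrative grapheme inventory
    output_head = nn.Linear(256, target_vocab)

    trainable = [p for layer in encoder_layers[4:] for p in layer.parameters()]
    trainable += list(output_head.parameters())
    # Hand `trainable` to the optimizer; the frozen layers stay untouched.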


Website

https://sites.google.com/view/respinasrchallenge2023/home


The Singing Voice Conversion Challenge 2023 (SVCC2023)

Organizers

  • Wen-Chin Huang, Nagoya University
  • Lester Phillip Violeta, Nagoya University
  • Songxiang Liu, Tencent AI Lab
  • Jiatong Shi, CMU
  • Tomoki Toda, Nagoya University

Abstract

The first Singing Voice Conversion Challenge (SVCC) is aimed at advancing the technology of converting the identity of a singer's voice without changing the linguistic content. Singing voice conversion (SVC) is a challenging task due to the complexity of modeling singing voices and the difficulty of singing data collection. The challenge consists of two tracks: an in-domain track with a training dataset of the target singer's singing voices and a cross-domain track with only speaking voices. The organizers aim to attract researchers from both the speech processing and music processing communities to encourage interdisciplinary research.
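A common way to decouple singer identity from linguistic content, as the task requires, is a recognition-synthesis design: a content encoder strips identity, and a decoder conditioned on a target-singer embedding rebuilds the acoustic features. The sketch below is schematic; every module and size is an illustrative assumption, not a description of any participant's system.

    import torch
    import torch.nn as nn

    class ToySVC(nn.Module):
        # Recognition-synthesis sketch: content features in, mel frames
        # out, with identity injected only via a learned singer embedding.
        def __init__(self, n_content=256, n_spk=64, n_mel=80, n_singers=10):
            super().__init__()
            self.spk_table = nn.Embedding(n_singers, n_spk)
            self.decoder = nn.GRU(n_content + n_spk, 256, batch_first=True)
            self.to_mel = nn.Linear(256, n_mel)

        def forward(self, content, singer_id):
            # content: (batch, frames, n_content), e.g., SSL features that
            # ideally carry lyrics and melody but not singer identity.
            spk = self.spk_table(singer_id)            # (batch, n_spk)
            spk = spk.unsqueeze(1).expand(-1, content.size(1), -1)
            hidden, _ = self.decoder(torch.cat([content, spk], dim=-1))
            return self.to_mel(hidden)                 # target-singer mels

    model = ToySVC()
    content = torch.randn(2, 120, 256)
    mels = model(content, torch.tensor([3, 7]))        # (2, 120, 80)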


Website

http://vc-challenge.org/


The VoiceMOS Challenge 2023

Organizers

  • Erica Cooper, NII
  • Wen-Chin Huang, Nagoya University
  • Yu Tsao, Academia Sinica
  • Hsin-Min Wang, Academia Sinica
  • Tomoki Toda, Nagoya University
  • Junichi Yamagishi, NII

Abstract

The VoiceMOS Challenge aims to address the limitations of time-consuming and costly human listening tests used to evaluate synthesized and processed speech. This special session focuses on automatic reference-free speech quality assessment, specifically Mean Opinion Score (MOS) prediction. The challenge involves understanding and comparing MOS prediction techniques using standardized datasets across diverse and challenging domains. The previous edition attracted several participating teams, and this year's challenge expands to include tracks for French speech synthesis, singing voice conversion, and noisy/enhanced speech. Participants are tasked with predicting MOS before the true ratings are released, emphasizing the development of flexible and generalizable MOS predictors for various speech evaluation tasks. The primary evaluation metrics will prioritize correct ranking of synthesis systems in each track.
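"Correct ranking of synthesis systems" is typically quantified with a system-level rank correlation between predicted and true MOS. A minimal sketch with SciPy, using invented scores:

    from scipy.stats import spearmanr

    # Hypothetical system-level scores: true MOS from listening tests
    # versus a model's predictions, one value per synthesis system.
    true_mos = [3.2, 4.1, 2.7, 3.9, 4.5]
    pred_mos = [3.0, 4.3, 2.5, 3.6, 4.4]

    rho, p_value = spearmanr(true_mos, pred_mos)
    print(f"System-level SRCC = {rho:.3f}")   # 1.000: identical ranking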


Website

https://voicemos-challenge-2023.github.io/
