TECHNICAL PROGRAM SCHEDULE

  • Article Category: PROGRAM
  • 11/22/2023

The poster ID is as follows: [Poster Session 1-6]-[Physical/Virtual Session No.]-[Topic]

[Poster Session] is the number of the poster session (1 to 6).

[Physical/Virtual Session No.] is P (physical) or V (virtual) followed by the number of the poster within the session.

[Topic] is the technical area of the work, as follows:

 

Topic ID Technical Area
ASR 01. Automatic speech recognition
ASR-TM 01. Automatic speech recognition -> Training methods
ASR-MA 01. Automatic speech recognition -> Model architectures
ASR-RB 01. Automatic speech recognition -> Robustness
ASR-SM 01. Automatic speech recognition -> Streaming models
SLP 02. Spoken language processing
SLP-SLU 02. Spoken language processing -> Spoken language understanding
SLP-ST 02. Spoken language processing -> Speech translation
SLP-SDS 02. Spoken language processing -> Spoken dialog systems
SES 03. Speech enhancement and separation
ANA 04. Speech analysis
SLR 05. Speaker and language recognition
DIA 06. Speaker diarization
TLP 07. Text-only language processing
MMP 08. Multimodal speech processing
MLP 09. Multilingual processing
EMR 10. Emotion recognition and paralinguistics
TTS 11. Speech synthesis and spoken language generation
RES 12. Resources (new corpora, toolkits, evaluation metrics, etc.)
MLS 13. Machine learning for speech applications
SS01 SS01. Audio visual speech enhancement challenge 2
SS02 SS02. ML-SUPERB
SS03 SS03. Model adaptation for low resource ASR for Indian languages
SS04 SS04. Multi-channel multi-party meeting transcription
SS06 SS06. Singing voice conversion challenge
SS07 SS07. VoiceMOS challenge
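
The ID scheme described above can be checked mechanically. The following is a minimal Python sketch, assuming the pattern inferred from the IDs listed in this schedule: a session digit (1 to 6), then P or V with a poster number, then a topic ID with an optional sub-area. The names POSTER_ID, PosterId and parse_poster_id are illustrative only and are not part of any conference tooling. Demonstration IDs such as D-P1-DEMO fall outside this pattern and are rejected.

```python
import re
from typing import NamedTuple

# Assumed pattern, inferred from the IDs in this schedule:
# <session 1-6>-<P|V><poster number>-<topic>[-<sub-area>]
POSTER_ID = re.compile(
    r"^(?P<session>[1-6])-(?P<mode>[PV])(?P<number>\d+)"
    r"-(?P<topic>[A-Z0-9]+(?:-[A-Z]+)?)$"
)


class PosterId(NamedTuple):
    session: int  # poster session number, 1 to 6
    mode: str     # "Physical" or "Virtual"
    number: int   # poster number within the session
    topic: str    # topic ID, e.g. "ASR-TM" or "SS04"


def parse_poster_id(poster_id: str) -> PosterId:
    """Split a poster ID such as '1-P4-SLP-ST' into its fields."""
    match = POSTER_ID.match(poster_id)
    if match is None:
        raise ValueError(f"not a poster ID in the assumed format: {poster_id!r}")
    mode = "Physical" if match["mode"] == "P" else "Virtual"
    return PosterId(int(match["session"]), mode, int(match["number"]), match["topic"])
```

For example, parse_poster_id("3-V2-ASR-MA") yields session 3, a virtual poster, number 2, topic ASR-MA.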

 

Poster Session 1

12.17.2023 / 10:30-12:30

Chair: Thomas Hain, Hao Tang

Poster ID Paper Title Paper ID
Physical
1-P1-SLP SLM: Bridging the Thin Gap Between Speech and Text Foundational Models 423
1-P2-SLP Deriving Translational Acoustic Sub-Word Embeddings 340
1-P3-SLP Summarize while Translating: Universal Model with Parallel Decoding for Summarization and Translation 415
1-P4-SLP-ST Token-Level Serialized Output Training for Joint Streaming ASR and ST Leveraging Textual Alignments 174
1-P5-SLP-ST Improving Stability in Simultaneous Speech Translation: A Revision-Controllable Decoding Approach 207
1-P6-SLP-ST Enhancing Expressivity Transfer in Textless Speech-to-Speech Translation 367
1-P7-SLP-ST A Weakly-Supervised Streaming Multilingual Speech Model with Truly Zero-Shot Capability 342
1-P8-SLP-SDS Exploring the Viability of Synthetic Audio Data for Audio-Based Dialogue State Tracking 47
1-P9-SLP-SLU Simulation of Teacher-Learner Interaction in English Language Pronunciation Learning 318
1-P10-SLP-SLU Whisper-SLU: Extending a Pretrained Speech-to-Text Transformer for Low Resource Spoken Language Understanding 78
1-P11-SLP-SLU Towards a Unified End-to-End Language Understanding System for Speech and Text Inputs 299
1-P12-SLP-SLU Few-Shot Spoken Language Understanding via Joint Speech-Text Models 414
1-P13-TLP Pareto Efficiency of Learning-Forgetting Trade-Off in Neural Language Model Adaptation 66
1-P14-TLP Adversarial Augmentation for Adapter Learning 245
1-P15-TLP Enabling Noisy Label Usage for Out-of-Airspace Data in Read-Back Error Detection 362
1-P16-TLP Enhancing Task-Oriented Dialogues with Chitchat: A Comparative Study Based on Lexical Diversity and Divergence 172
1-P17-MLS Preserving Phonemic Distinctions for Ordinal Regression: A Novel Loss Function for Automatic Pronunciation Assessment 110
1-P18-MLS Reducing the Cost of Spoof Detection Labeling Using Mixed-Strategy Active Learning and Pretrained Models 370
1-P19-MLS Joint Audio and Speech Understanding 383
1-P20-MLS Variational Gaussian Process Data Uncertainty 22
1-P21-MLS Towards Matching Phones and Speech Representations 99
1-P22-MLS FedCPC: An Effective Federated Contrastive Learning Method for Privacy Preserving Early-Stage Alzheimer’s Speech Detection 201
1-P23-MLS Joint Energy-Based Model for Robust Speech Classification System Against Dirty-Label Backdoor Poisoning Attacks 381
1-P24-MLS Clustering Unsupervised Representations as Defense Against Poisoning Attacks on Speech Commands Classification System 324
1-P25-MLS Can We Use Speaker Embeddings on Spontaneous Speech Obtained from Medical Conversations to Predict Intelligibility? 51
Virtual
1-V1-SLP-SLU Leveraging Multilingual Self-Supervised Pretrained Models for Sequence-to-Sequence End-to-End Spoken Language Understanding 11
1-V2-SLP-SLU Generalized Zero-Shot Audio-to-Intent Classification 336
1-V3-TLP Prompt Pool Based Class-Incremental Continual Learning for Dialog State Tracking 252
1-V4-MLS Robust Logarithmic Champernowne Algorithm for Feedback Cancellation in Hearing Aids 344
1-V5-MLS Brouhaha: Multi-Task Training for Voice Activity Detection, Speech-to-Noise Ratio, and C50 Room Acoustics Estimation 223
1-V6-MLS Multitask Learning Model with Text and Speech Representation for Fine-Grained Speech Scoring 327

 

Poster Session 2

12.17.2023 / 16:00-18:00

Chair: Frank Seide, Andreas Stolcke

Poster ID Paper Title Paper ID
Physical
2-P1-ASR Knowledge Distillation from Offline to Streaming Transducer: Toward Accurate and Fast Streaming Model by Matching Alignments 303
2-P2-ASR Learning from Flawed Data: Weakly Supervised Automatic Speech Recognition 96
2-P3-ASR Can Unpaired Textual Data Replace Synthetic Speech in ASR Model Adaptation? 109
2-P4-ASR Acoustic Model Fusion for End-to-End Speech Recognition 417
2-P5-ASR Prompting Large Language Models for Zero-Shot Domain Adaptation in Speech Recognition 48
2-P6-ASR Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data 146
2-P7-ASR Exploring Data Augmentation in Bias Mitigation Against Non-Native-Accented Speech 185
2-P8-ASR The Role of Feature Correlation on Quantized Neural Networks 58
2-P9-ASR The Gift of Feedback: Improving ASR Model Quality by Learning from User Corrections through Federated Learning 85
2-P10-ASR Mask-Conformer: Augmenting Conformer with Mask-Predict Decoder 103
2-P11-ASR Efficient Cascaded Streaming ASR System via Frame Rate Reduction 112
2-P12-ASR Segment-Level Vectorized Beam Search Based on Partially Autoregressive Inference 155
2-P13-ASR Speaker Adaptation for End-to-End Speech Recognition Systems in Noisy Environments 228
2-P14-ASR End-to-End Multichannel Speaker-Attributed ASR: Speaker Guided Decoder and Input Feature Analysis 235
2-P15-ASR Audio-AdapterFusion: A Task-ID-free Approach for Efficient and Non-Destructive Multi-Task Speech Recognition 239
2-P16-ASR Cross-Modal Alignment with Optimal Transport for CTC-Based ASR 263
2-P17-ASR Zero-Shot Domain-Sensitive Speech Recognition with Prompt-Conditioning Fine-Tuning 413
2-P18-ASR Adapting Pretrained Speech Model for Mandarin Lyrics Transcription and Alignment 275
2-P19-ASR Ending The Blind Flight: Analyzing The Impact of Acoustic And Lexical Factors on Wav2Vec 2.0 in Air-Traffic Control 320
2-P20-ASR A Token-Wise Beam Search Algorithm for RNN-T 177
2-P21-ASR GPU-Accelerated WFST Beam Search Decoder for CTC-Based Speech Recognition 238
2-P22-ASR Two-Pass Endpoint Detection for Speech Recognition 375
2-P23-SS03 Dialect Adaptation and Data Augmentation for Low-Resource ASR: TalTech Systems for the MADASR 2023 Challenge 368
2-P24-SS03 Gated Multi Encoders and Multitask Objectives for Dialectal Speech Recognition in Indian Languages 473
6-P5-SLR MASR: Multi-Label Aware Speech Representation Learning 399
Virtual
2-V1-ASR Contextual Spelling Correction with Large Language Models 82
2-V2-ASR Spike-Triggered Contextual Biasing for End-to-End Mandarin Speech Recognition 287
2-V3-ASR U2-KWS: Unified Two-Pass Open-Vocabulary Keyword Spotting with Keyword Bias 294
2-V4-ASR CTC Blank Triggered Dynamic Layer-Skipping for Efficient CTC-Based Speech Recognition 248
2-V5-ASR Domain Adaptation by Data Distribution Matching via Submodularity for Speech Recognition 420
2-V6-ASR Importance of Smoothness Induced by Optimizers in FL4ASR: Towards Understanding Federated Learning for End-to-End ASR 382

 

Poster Session 3

12.18.2023 / 10:30-12:30

Chair: Gil Keren, Xin Lei

Poster ID Paper Title Paper ID
Physical
3-P1-ASR Improving Whispered Speech Recognition Performance Using Pseudo-Whispered Based Data Augmentation 425
3-P2-ASR-MA Fast Conformer With Linearly Scalable Attention For Efficient Speech Recognition 332
3-P3-ASR-MA ED-CEC: Improving Rare Word Recognition Using ASR Post-Processing Based on Error Detection and Context-Aware Error Correction 105
3-P4-ASR-MA Optimizing Two-Pass Cross-Lingual Transfer Learning: Phoneme Recognition and Phoneme to Grapheme Translation 246
3-P5-ASR-MA Multi Transcription-Style Speech Transcription Using Attention-Based Encoder-Decoder Model 130
3-P6-ASR-MA SA-Paraformer: Non-Autoregressive End-to-End Speaker-Attributed ASR 316
3-P7-ASR-MA LV-CTC: Non-autoregressive ASR with CTC and Latent Variable Models 59
3-P8-ASR-MA Discriminative Speech Recognition Rescoring with Pre-trained Language Models 392
3-P9-ASR-RB Locality Enhanced Dynamic Biasing and Sampling Strategies for Contextual ASR 76
3-P10-ASR-RB FAT-HuBERT: Front-End Adaptive Training of Hidden-Unit BERT for Distortion-Invariant Robust Speech Recognition 90
3-P11-ASR-RB Pseudo-Label Based Supervised Contrastive Loss for Robust Speech Representations 406
3-P12-ASR-SM Hierarchical Attention-Based Contextual Biasing for Personalized Speech Recognition Using Neural Transducers 347
3-P13-ASR-TM Low-Rank Adaptation of Neural Language Model Rescoring for Speech Recognition 26
3-P14-ASR-TM Generative ASR Error Correction with Large Language Models 171
3-P15-ASR-TM MelHuBERT: A Simplified HuBERT on Mel Spectrograms 183
3-P16-ASR-TM Improved Long-Form Speech Recognition by Jointly Modeling the Primary and Non-Primary Speakers 379
3-P17-ASR-TM Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning 108
3-P18-ASR-TM Exploring Effective Distillation of Self-Supervised Speech Models for Automatic Speech Recognition 111
3-P19-ASR-TM AWMC: Online Test-Time Adaptation without Mode Collapse for Continual Adaptation 190
3-P20-ASR-TM Prompting and Adapter Tuning for Self-supervised Encoder-Decoder Speech Model 265
3-P21-ASR-TM Consistency Based Unsupervised Self-Training for ASR Personalisation 295
3-P22-ASR-TM Joint Federated Learning and Personalization for On-Device ASR 181
3-P23-ASR-TM Efficient Text-Only Domain Adaptation for CTC-Based ASR 274
3-P24-MLP Building High-Accuracy Multilingual ASR with Gated Language Experts and Curriculum Training 354
3-P26-MLP MUST: A Multilingual Student-Teacher Learning Approach for Low-Resource Speech Recognition 440
3-P27-MLP On Decoder-Only Architecture for Speech-to-Text and Large Language Model Integration 301
3-V5-ASR-TM On The Relevance of Phoneme Duration Variability of Synthesized Training Data for Automatic Speech Recognition 361
3-V6-ASR-TM End-to-End Training of a Neural HMM with Label and Transition Probabilities 53
3-V7-ASR-TM Investigating the Effect of Language Models in Sequence Discriminative Training for Neural Transducers 337
Virtual
3-V1-ASR-MA Parameter-Efficient Tuning with Adaptive Bottlenecks for Automatic Speech Recognition 456
3-V2-ASR-MA Improving Large-Scale Deep Biasing with Phoneme Features and Text-only Data in Streaming Transducer 73
3-V3-ASR-MA BA-MoE: Boundary-Aware Mixture-of-Experts Adapter for Code-Switching Speech Recognition 412
3-V4-ASR-RB No Pitch Left Behind: Addressing Gender Unbalance in Automatic Speech Recognition Through Pitch Manipulation 385
3-V8-MLP LAE-ST-MoE: Boosted Language-Aware Encoder Using Speech Translation Auxiliary Task for E2E Code-Switching ASR 175
3-P25-MLP Improving Multilingual and Code-switching ASR using Large Language Model Generated Text 65

 

Poster Session 4

12.18.2023 / 16:00-18:00

Chair: Rohit Prabhavalkar, Xugang Lu

Poster ID Paper Title Paper ID
Physical
4-P1-SES LC4SV: A Denoising Framework Learning to Compensate for Unseen Speaker Verification Models 12
4-P2-SES Towards Robust Packet Loss Concealment System with ASR-Guided Representations 290
4-P3-SES Deep Learning for Joint Acoustic Echo and Acoustic Howling Suppression in Hybrid Meetings 140
4-P4-SES On Time Domain Conformer Models for Monaural Speech Separation in Noisy Reverberant Acoustic Environments 142
4-P5-SES A Single Speech Enhancement Model Unifying Dereverberation, Denoising, Speaker Counting, Separation, and Extraction 373
4-P6-SES NeuralEcho: Hybrid of Full-Band and Sub-Band Recurrent Neural Network for Acoustic Echo Cancellation and Speech Enhancement 143
4-P7-SES Toward Universal Speech Enhancement for Diverse Input Conditions 244
4-P8-SES Exploring Time-Frequency Domain Target Speaker Extraction for Causal and Non-Causal Processing 380
4-P9-SES Improving Speech Enhancement Using Audio Tagging Knowledge from Pre-Trained Representations and Multi-Task Learning 411
4-P10-ANA Paraconsistent Feature Analysis for the Competency Evaluation of Voice Impersonation 302
4-P11-ANA Not All Errors Are Created Equal: Evaluating the Impact of Model and Speaker Factors on ASR Outcomes in Clinical Populations 83
4-P12-ANA Detection of Vowel Errors in Children's Speech Using Synthetic Phonetic Transcripts 308
4-P13-ANA Transferring Speech-Generic and Depression-Specific Knowledge for Alzheimer's Disease Detection 343
4-P14-ANA Spectral Tilt May Have a Smaller Impact on the Intelligibility of Speech in Noise 445
4-P16-ANA MiniSUPERB: Lightweight Benchmark for Self-Supervised Speech Models 395
4-P17-MMP Cross-Modal learning for CTC-Based ASR: Leveraging CTC-BERTScore and Sequence-Level Training 323
4-P18-MMP Parameter-Efficient Cross-Language Transfer Learning for a Language-Modular Audiovisual Speech Recognition 333
4-P19-MMP FLAP: Fast Language-Audio Pre-Training 360
4-P20-MMP Improving Audiovisual Active Speaker Detection in Egocentric Recordings with the Data-Efficient Image Transformer 387
4-P21-MMP Audio-Visual Neural Syntax Acquisition 409
4-P22-MLS NeuralKalman: A Learnable Kalman Filter for Acoustic Echo Cancellation 134
4-P23-SS01 Scenario-Aware Audio-Visual TF-GridNet for Target Speech Extraction 170
Virtual
4-V1-SES An Exploration of Task-Decoupling on Two-Stage Neural Post Filter for Real-Time Personalized Acoustic Echo Cancellation 424
4-V2-SES MBTFNet: Multi-Band Temporal-Frequency Neural Network for Singing Voice Enhancement 50
4-V3-SES VSANet: Real-Time Speech Enhancement Based on Voice Activity Detection and Causal Spatial Attention 119
4-V4-SES Magnitude-and-Phase-Aware Speech Enhancement with Parallel Sequence Modeling 224
4-V5-ANA Zero-Shot Emotion Transfer for Cross-Lingual Speech Synthesis 433
4-V6-MMP Acoustics-Text Dual-Modal Joint Representation Learning for Cover Song Identification 97
4-V7-MMP Boosting Modality Representation with Pre-Trained Models and Multi-Task Training for Multimodal Sentiment Analysis 272
4-V8-SS04 PP-MeT: A Real-World Personalized Prompt Based Meeting Transcription System 220
4-V9-SS04 The Second Multi-Channel Multi-Party Meeting Transcription Challenge (M2Met 2.0): A Benchmark for Speaker-Attributed ASR 422
4-P15-ANA Unconstrained Dysfluency Modeling for Dysfluent Speech Transcription and Detection 394

 

Poster Session 5

12.19.2023 / 10:30-12:30

Chair: Berrak Sisman, Tomoki Toda

Poster ID Paper Title Paper ID
Physical
5-P1-TTS Using Joint Training Speaker Encoder with Consistency Loss to Achieve Cross-Lingual Voice Conversion and Expressive Voice Conversion 127
5-P2-TTS QuickVC: A Lightweight VITS-Based Any-to-Many Voice Conversion Model Using iSTFT for Faster Conversion 128
5-P3-TTS Maximizing Data Efficiency for Cross-Lingual TTS Adaptation by Self-Supervised Representation Mixing and Embedding Initialization 123
5-P4-TTS Invert-Classify: Recovering Discrete Prosody Inputs for Text-to-Speech 312
5-P5-TTS Transduce and Speak: Neural Transducer for Text-to-Speech with Semantic Token Prediction 9
5-P6-TTS Toward General-Purpose Text-Instruction-Guided Voice Conversion 204
5-P7-TTS Improving Severity Preservation of Healthy-to-Pathological Voice Conversion with Global Style Tokens 233
5-P8-TTS PerMod: Perceptually Grounded Voice Modification with Latent Diffusion Models 271
5-P9-TTS Partial Rank Similarity Minimization Method for Quality MOS Prediction of Unseen Speech Synthesis Systems in Zero-Shot and Semi-Supervised Setting 276
5-P10-TTS E3 TTS: Easy End-to-End Diffusion-Based Text to Speech 352
5-P11-TTS WaveNeXt: ConvNeXt-Based Fast Neural Vocoder without iSTFT Layer 441
5-P12-TTS Towards Developing State-of-the-Art TTS Synthesisers for 13 Indian Languages with Signal Processing Aided Alignments 453
5-P13-TTS Zero-Shot Singing Voice Synthesis from Musical Score 268
5-P14-TTS Diffusion-Based Mel-Spectrogram Enhancement For Personalized Speech Synthesis with Found Data 165
5-P15-SS06 The Singing Voice Conversion Challenge 2023 64
5-P16-SS06 A Comparative Study of Voice Conversion Models with Large-Scale Speech and Singing Data: The T13 Systems for the Singing Voice Conversion Challenge 2023 403
5-P17-SS07 LE-SSL-MOS: Self-Supervised Learning MOS Prediction with Listener Enhancement 192
5-P18-SS07 The VoiceMOS Challenge 2023: Zero-Shot Subjective Speech Quality Prediction for Multiple Domains 258
Virtual
5-V1-TTS CrossSinger: A Cross-Lingual Multi-Singer High-Fidelity Singing Voice Synthesizer Trained on Monolingual Singers 27
5-V2-TTS HiGNN-TTS: Hierarchical Prosody Modeling with Graph Neural Networks for Expressive Long-Form TTS 430
5-V3-TTS SALT: Distinguishable Speaker Anonymization Through Latent Space Transformation 436
5-V4-TTS PromptSpeaker: Speaker Generation Based on Text Descriptions 429
5-V5-TTS BiSinger: Bilingual Singing Voice Synthesis 36
5-V6-SS06 VITS-Based Singing Voice Conversion Leveraging Whisper and Multi-Scale F0 Modeling 435
5-V7-SS06 VITS-Based Singing Voice Conversion System with DSPGAN Post-Processing for SVCC2023 478
5-V8-SS07 SQAT-LD: Speech Quality Assessment Transformer Utilizing Listener Dependent Modeling for Zero-Shot Out-of-Domain MOS Prediction 166
5-V9-SS07 KAQ: A Non-Intrusive Stacking Framework for Mean Opinion Score Prediction with Multi-Task Learning 313

 

Poster Session 6

12.19.2023 / 16:00-18:00

Chair: Carlos Busso, Nicholas Cummins

Poster ID Paper Title Paper ID
Physical
6-P1-SLR Meta-Learning Framework for End-to-End Imposter Identification in Unseen Speaker Recognition 124
6-P2-SLR Model-Based Fairness Metric for Speaker Verification 253
6-P3-SLR Generative Linguistic Representation for Spoken Language Identification 284
6-P4-SLR ECAPA2: A Hybrid Neural Network Architecture and Training Strategy for Robust Speaker Embeddings 326
6-P5-SLR MASR: Multi-Label Aware Speech Representation Learning 399
6-P6-SLR Extending Self-Distilled Self-Supervised Learning for Semi-Supervised Speaker Verification 404
6-P7-DIA Robust End-to-End Diarization with Domain Adaptive Training and Multi-Task Learning 77
6-P8-DIA Transformer Attractors for Robust and Efficient End-to-End Neural Diarization 304
6-P9-DIA Semi-Supervised Multi-Channel Speaker Diarization with Cross-Channel Attention 468
6-P11-EMR Active Learning Based Fine-Tuning Framework for Speech Emotion Recognition 30
6-P12-EMR Identifying People with Mild Cognitive Impairment at Risk of Developing Dementia Using Speech Analysis 32
6-P13-EMR Robust Recognition of Speaker Emotion with Difference Feature Extraction Using a Few Enrollment Utterances 44
6-P14-EMR Improved Multi-modal Emotion Recognition using Squeeze-and-Excitation Block in Cross-Modal Attention 72
6-P15-EMR Detecting Speech Abnormalities with a Perceiver-Based Sequence Classifier That Leverages a Universal Speech Model 80
6-P16-EMR Combining Relative and Absolute Learning Formulations to Predict Emotional Attributes From Speech 144
6-P17-EMR Speech Emotion Diarization: Which Emotion Appears When? 93
6-P18-RES RescueSpeech: A German Corpus for Speech Recognition in Search and Rescue Domain 101
6-P19-RES ESPNet-SUMM: Introducing a Novel Large Dataset, Toolkit, and a Cross-Corpora Evaluation of Speech Summarization Systems 145
6-P20-RES Transcribing And Aligning Conversational Speech: A Hybrid Pipeline Applied to French Conversations 198
6-P21-RES Study on the Correlation between Objective Evaluations and Subjective Speech Quality and Intelligibility 264
6-P22-RES Coco-Nut: Corpus of Japanese Utterance and Voice Characteristics Description for Prompt-Based Control 282
6-P23-RES LibriSpeech-PC: Benchmark For Evaluation of Punctuation And Capitalization Capabilities of End-to-End ASR Models 330
6-P24-RES TorchAudio 2.1: Advancing Speech Recognition, Self-Supervised Learning, and Audio Processing Components for PyTorch 339
6-P25-RES YODAS: YouTube-Oriented Dataset for Audio and Speech 391
6-P26-RES H_Eval: A New Hybrid Evaluation Metric for Automatic Speech Recognition Tasks 447
6-P27-SS02 Thai-Dialect: Low Resource Thai Dialectal Speech to Text Corpora 135
6-P28-SS02 Joint Prediction and Denoising for Large-Scale Multilingual Self-Supervised Learning 156
6-P29-SS02 Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond 163
6-P30-SS02 Evaluating Self-Supervised Speech Models on a Taiwanese Hokkien Corpus 331
6-P31-SS02 Leveraging The Multilingual Indonesian Ethnic Languages Dataset in Self-Supervised Model for Low-Resource ASR Task 426
Virtual
6-V1-SLR Haha-Pod: An Attempt for Laughter-Based Non-Verbal Speaker Verification 211
6-V2-SLR CAMSAT: Augmentation Mix and Self-Augmented Training Clustering for Self-Supervised Speaker Recognition 243
6-V3-SLR VoiceExtender: Short-Utterance Text-Independent Speaker Verification with Guided Diffusion Model 266
6-V4-RES Wiki-En-ASR-Adapt: Large-Scale Synthetic Dataset for English ASR Customization 54
6-P10-MMP AV-data2vec: Self-Supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations 206

 

Demonstration Session

12.20.2023 / 10:30-11:30

Chair: Prof. Carlos Busso

Poster ID Paper Title
Physical
D-P1-DEMO Towards Streaming Speech-to-Avatar Synthesis
D-P2-DEMO NYCUKA: A Self-Disclosure Mental Health Spoken Dialogue System

Congress Secretariat

Elite Professional Conference Organizer
