TECHNICAL PROGRAM SCHEDULE

11/22/2023

The poster ID is as follows: [Poster Session]-[P/V][Poster Number]-[Topic]

[Poster Session] is the number of the poster session (1 to 6),

[P/V] indicates a physical (P) or virtual (V) presentation,

[Poster Number] is the number of the poster within the session,

[Topic] is the technical area of the work, as follows:

Topic ID	Technical Area
ASR	01. Automatic speech recognition
ASR-TM	01. Automatic speech recognition -> Training methods
ASR-MA	01. Automatic speech recognition -> Model architectures
ASR-RB	01. Automatic speech recognition -> Robustness
ASR-SM	01. Automatic speech recognition -> Streaming models
SLP	02. Spoken language processing
SLP-SLU	02. Spoken language processing -> Spoken language understanding
SLP-ST	02. Spoken language processing -> Speech translation
SLP-SDS	02. Spoken language processing -> Spoken dialog systems
SES	03. Speech enhancement and separation
ANA	04. Speech analysis
SLR	05. Speaker and language recognition
DIA	06. Speaker diarization
TLP	07. Text-only language processing
MMP	08. Multimodal speech processing
MLP	09. Multilingual processing
EMR	10. Emotion recognition and paralinguistics
TTS	11. Speech synthesis and spoken language generation
RES	12. Resources (new corpora, toolkits, evaluation metrics, etc.)
MLS	13. Machine learning for speech applications
SS01	SS01. Audio visual speech enhancement challenge 2
SS02	SS02. ML-SUPERB
SS03	SS03. Model adaptation for low resource ASR for Indian languages
SS04	SS04. Multi-channel multi-party meeting transcription
SS06	SS06. Singing voice conversion challenge
SS07	SS07. VoiceMOS challenge
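
The ID scheme described above can be checked mechanically. The following is a minimal Python sketch, assuming the pattern inferred from the IDs listed in this schedule (for example 1-P4-SLP-ST or 4-V8-SS04). The names `POSTER_ID`, `PosterID`, and `parse_poster_id` are illustrative and not part of any official conference tooling. Demonstration IDs such as D-P1-DEMO are deliberately not covered.

```python
import re
from typing import NamedTuple

# Pattern inferred from the IDs in this schedule (an assumption, not an official spec):
#   <session 1-6>-<P or V><poster number>-<topic, optionally with a sub-topic>
POSTER_ID = re.compile(
    r"^(?P<session>[1-6])-(?P<mode>[PV])(?P<number>\d+)-(?P<topic>[A-Z0-9]+(?:-[A-Z]+)?)$"
)


class PosterID(NamedTuple):
    session: int    # poster session number, 1 to 6
    physical: bool  # True for a physical (P) poster, False for a virtual (V) one
    number: int     # poster number within the session
    topic: str      # topic ID from the table above, e.g. "SLP-ST" or "SS04"


def parse_poster_id(poster_id: str) -> PosterID:
    """Split a poster ID into its fields, or raise ValueError if it does not fit the pattern."""
    match = POSTER_ID.match(poster_id.strip())
    if match is None:
        raise ValueError(f"unrecognized poster ID: {poster_id!r}")
    return PosterID(
        session=int(match["session"]),
        physical=match["mode"] == "P",
        number=int(match["number"]),
        topic=match["topic"],
    )


if __name__ == "__main__":
    print(parse_poster_id("1-P4-SLP-ST"))
    print(parse_poster_id("4-V8-SS04"))
```

A parser like this is mainly useful for spotting IDs that do not follow the scheme, since those raise `ValueError` instead of being silently accepted.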

Poster Session 1

12.17.2023 / 10:30-12:30

Chair: Thomas Hain, Hao Tang

Poster ID	Paper Title	Paper ID
Physical
1-P1-SLP	SLM: Bridging the Thin Gap Between Speech and Text Foundational Models	423
1-P2-SLP	Deriving Translational Acoustic Sub-Word Embeddings	340
1-P3-SLP	Summarize while Translating: Universal Model with Parallel Decoding for Summarization and Translation	415
1-P4-SLP-ST	Token-Level Serialized Output Training for Joint Streaming ASR and ST Leveraging Textual Alignments	174
1-P5-SLP-ST	Improving Stability in Simultaneous Speech Translation: A Revision-Controllable Decoding Approach	207
1-P6-SLP-ST	Enhancing Expressivity Transfer in Textless Speech-to-Speech Translation	367
1-P7-SLP-ST	A Weakly-Supervised Streaming Multilingual Speech Model with Truly Zero-Shot Capability	342
1-P8-SLP-SDS	Exploring the Viability of Synthetic Audio Data for Audio-Based Dialogue State Tracking	47
1-P9-SLP-SLU	Simulation of Teacher-Learner Interaction in English Language Pronunciation Learning	318
1-P10-SLP-SLU	Whisper-SLU: Extending a Pretrained Speech-to-Text Transformer for Low Resource Spoken Language Understanding	78
1-P11-SLP-SLU	Towards a Unified End-to-End Language Understanding System for Speech and Text Inputs	299
1-P12-SLP-SLU	Few-Shot Spoken Language Understanding via Joint Speech-Text Models	414
1-P13-TLP	Pareto Efficiency of Learning-Forgetting Trade-Off in Neural Language Model Adaptation	66
1-P14-TLP	Adversarial Augmentation for Adapter Learning	245
1-P15-TLP	Enabling Noisy Label Usage for Out-of-Airspace Data in Read-Back Error Detection	362
1-P16-TLP	Enhancing Task-Oriented Dialogues with Chitchat: A Comparative Study Based on Lexical Diversity and Divergence	172
1-P17-MLS	Preserving Phonemic Distinctions for Ordinal Regression: A Novel Loss Function for Automatic Pronunciation Assessment	110
1-P18-MLS	Reducing the Cost of Spoof Detection Labeling Using Mixed-Strategy Active Learning and Pretrained Models	370
1-P19-MLS	Joint Audio and Speech Understanding	383
1-P20-MLS	Variational Gaussian Process Data Uncertainty	22
1-P21-MLS	Towards Matching Phones and Speech Representations	99
1-P22-MLS	FedCPC: An Effective Federated Contrastive Learning Method for Privacy Preserving Early-Stage Alzheimer’s Speech Detection	201
1-P23-MLS	Joint Energy-Based Model for Robust Speech Classification System Against Dirty-Label Backdoor Poisoning Attacks	381
1-P24-MLS	Clustering Unsupervised Representations as Defense Against Poisoning Attacks on Speech Commands Classification System	324
1-P25-MLS	Can We Use Speaker Embeddings on Spontaneous Speech Obtained from Medical Conversations to Predict Intelligibility?	51
Virtual
1-V1-SLP-SLU	Leveraging Multilingual Self-Supervised Pretrained Models for Sequence-to-Sequence End-to-End Spoken Language Understanding	11
1-V2-SLP-SLU	Generalized Zero-Shot Audio-to-Intent Classification	336
1-V3-TLP	Prompt Pool Based Class-Incremental Continual Learning for Dialog State Tracking	252
1-V4-MLS	Robust Logarithmic Champernowne Algorithm for Feedback Cancellation in Hearing Aids	344
1-V5-MLS	Brouhaha: Multi-Task Training for Voice Activity Detection, Speech-to-Noise Ratio, and C50 Room Acoustics Estimation	223
1-V6-MLS	Multitask Learning Model with Text and Speech Representation for Fine-Grained Speech Scoring	327

Poster Session 2

12.17.2023 / 16:00-18:00

Chair: Frank Seide, Andreas Stolcke

Poster ID	Paper Title	Paper ID
Physical
2-P1-ASR	Knowledge Distillation from Offline to Streaming Transducer: Toward Accurate and Fast Streaming Model by Matching Alignments	303
2-P2-ASR	Learning from Flawed Data: Weakly Supervised Automatic Speech Recognition	96
2-P3-ASR	Can Unpaired Textual Data Replace Synthetic Speech in ASR Model Adaptation?	109
2-P4-ASR	Acoustic Model Fusion for End-to-End Speech Recognition	417
2-P5-ASR	Prompting Large Language Models for Zero-Shot Domain Adaptation in Speech Recognition	48
2-P6-ASR	Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data	146
2-P7-ASR	Exploring Data Augmentation in Bias Mitigation Against Non-Native-Accented Speech	185
2-P8-ASR	The Role of Feature Correlation on Quantized Neural Networks	58
2-P9-ASR	The Gift of Feedback: Improving ASR Model Quality by Learning from User Corrections through Federated Learning	85
2-P10-ASR	Mask-Conformer: Augmenting Conformer with Mask-Predict Decoder	103
2-P11-ASR	Efficient Cascaded Streaming ASR System via Frame Rate Reduction	112
2-P12-ASR	Segment-Level Vectorized Beam Search Based on Partially Autoregressive Inference	155
2-P13-ASR	Speaker Adaptation for End-to-End Speech Recognition Systems in Noisy Environments	228
2-P14-ASR	End-to-End Multichannel Speaker-Attributed ASR: Speaker Guided Decoder and Input Feature Analysis	235
2-P15-ASR	Audio-AdapterFusion: A Task-ID-free Approach for Efficient and Non-Destructive Multi-Task Speech Recognition	239
2-P16-ASR	Cross-Modal Alignment with Optimal Transport for CTC-Based ASR	263
2-P17-ASR	Zero-Shot Domain-Sensitive Speech Recognition with Prompt-Conditioning Fine-Tuning	413
2-P18-ASR	Adapting Pretrained Speech Model for Mandarin Lyrics Transcription and Alignment	275
2-P19-ASR	Ending The Blind Flight: Analyzing The Impact of Acoustic And Lexical Factors on Wav2Vec 2.0 in Air-Traffic Control	320
2-P20-ASR	A Token-Wise Beam Search Algorithm for RNN-T	177
2-P21-ASR	GPU-Accelerated WFST Beam Search Decoder for CTC-Based Speech Recognition	238
2-P22-ASR	Two-Pass Endpoint Detection for Speech Recognition	375
2-P23-SS03	Dialect Adaptation and Data Augmentation for Low-Resource ASR: TalTech Systems for the MADASR 2023 Challenge	368
2-P24-SS03	Gated Multi Encoders and Multitask Objectives for Dialectal Speech Recognition in Indian Languages	473
6-P5-SLR	MASR: Multi-Label Aware Speech Representation Learning	399
Virtual
2-V1-ASR	Contextual Spelling Correction with Large Language Models	82
2-V2-ASR	Spike-Triggered Contextual Biasing for End-to-End Mandarin Speech Recognition	287
2-V3-ASR	U2-KWS: Unified Two-Pass Open-Vocabulary Keyword Spotting with Keyword Bias	294
2-V4-ASR	CTC Blank Triggered Dynamic Layer-Skipping for Efficient CTC-Based Speech Recognition	248
2-V5-ASR	Domain Adaptation by Data Distribution Matching via Submodularity for Speech Recognition	420
2-V6-ASR	Importance of Smoothness Induced by Optimizers in FL4ASR: Towards Understanding Federated Learning for End-to-End ASR	382

Poster Session 3

12.18.2023 / 10:30-12:30

Chair: Gil Keren, Xin Lei

Poster ID	Paper Title	Paper ID
Physical
3-P1-ASR	Improving Whispered Speech Recognition Performance Using Pseudo-Whispered Based Data Augmentation	425
3-P2-ASR-MA	Fast Conformer With Linearly Scalable Attention For Efficient Speech Recognition	332
3-P3-ASR-MA	ED-CEC: Improving Rare Word Recognition Using ASR Post-Processing Based on Error Detection and Context-Aware Error Correction	105
3-P4-ASR-MA	Optimizing Two-Pass Cross-Lingual Transfer Learning: Phoneme Recognition and Phoneme to Grapheme Translation	246
3-P5-ASR-MA	Multi Transcription-Style Speech Transcription Using Attention-Based Encoder-Decoder Model	130
3-P6-ASR-MA	SA-Paraformer: Non-Autoregressive End-to-End Speaker-Attributed ASR	316
3-P7-ASR-MA	LV-CTC: Non-autoregressive ASR with CTC and Latent Variable Models	59
3-P8-ASR-MA	Discriminative Speech Recognition Rescoring with Pre-trained Language Models	392
3-P9-ASR-RB	Locality Enhanced Dynamic Biasing and Sampling Strategies for Contextual ASR	76
3-P10-ASR-RB	FAT-HuBERT: Front-End Adaptive Training of Hidden-Unit BERT for Distortion-Invariant Robust Speech Recognition	90
3-P11-ASR-RB	Pseudo-Label Based Supervised Contrastive Loss for Robust Speech Representations	406
3-P12-ASR-SM	Hierarchical Attention-Based Contextual Biasing for Personalized Speech Recognition Using Neural Transducers	347
3-P13-ASR-TM	Low-Rank Adaptation of Neural Language Model Rescoring for Speech Recognition	26
3-P14-ASR-TM	Generative ASR Error Correction with Large Language Models	171
3-P15-ASR-TM	MelHuBERT: A Simplified HuBERT on Mel Spectrograms	183
3-P16-ASR-TM	Improved Long-Form Speech Recognition by Jointly Modeling the Primary and Non-Primary Speakers	379
3-P17-ASR-TM	Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning	108
3-P18-ASR-TM	Exploring Effective Distillation of Self-Supervised Speech Models for Automatic Speech Recognition	111
3-P19-ASR-TM	AWMC: Online Test-Time Adaptation without Mode Collapse for Continual Adaptation	190
3-P20-ASR-TM	Prompting and Adapter Tuning for Self-supervised Encoder-Decoder Speech Model	265
3-P21-ASR-TM	Consistency Based Unsupervised Self-Training for ASR Personalisation	295
3-P22-ASR-TM	Joint Federated Learning and Personalization for On-Device ASR	181
3-P23-ASR-TM	Efficient Text-Only Domain Adaptation for CTC-Based ASR	274
3-P24-MLP	Building High-Accuracy Multilingual ASR with Gated Language Experts and Curriculum Training	354
3-P26-MLP	MUST: A Multilingual Student-Teacher Learning Approach for Low-Resource Speech Recognition	440
3-P27-MLP	On Decoder-Only Architecture for Speech-to-Text and Large Language Model Integration	301
3-V5-ASR-TM	On The Relevance of Phoneme Duration Variability of Synthesized Training Data for Automatic Speech Recognition	361
3-V6-ASR-TM	End-to-End Training of a Neural HMM with Label and Transition Probabilities	53
3-V7-ASR-TM	Investigating the Effect of Language Models in Sequence Discriminative Training for Neural Transducers	337
Virtual
3-V1-ASR-MA	Parameter-Efficient Tuning with Adaptive Bottlenecks for Automatic Speech Recognition	456
3-V2-ASR-MA	Improving Large-Scale Deep Biasing with Phoneme Features and Text-only Data in Streaming Transducer	73
3-V3-ASR-MA	BA-MoE: Boundary-Aware Mixture-of-Experts Adapter for Code-Switching Speech Recognition	412
3-V4-ASR-RB	No Pitch Left Behind: Addressing Gender Unbalance in Automatic Speech Recognition Through Pitch Manipulation	385
3-V8-MLP	LAE-ST-MoE: Boosted Language-Aware Encoder Using Speech Translation Auxiliary Task for E2E Code-Switching ASR	175
3-P25-MLP	Improving Multilingual and Code-switching ASR using Large Language Model Generated Text	65

Poster Session 4

12.18.2023 / 16:00-18:00

Chair: Rohit Prabhavalkar, Xugang Lu

Poster ID	Paper Title	Paper ID
Physical
4-P1-SES	LC4SV: A Denoising Framework Learning to Compensate for Unseen Speaker Verification Models	12
4-P2-SES	Towards Robust Packet Loss Concealment System with ASR-Guided Representations	290
4-P3-SES	Deep Learning for Joint Acoustic Echo and Acoustic Howling Suppression in Hybrid Meetings	140
4-P4-SES	On Time Domain Conformer Models for Monaural Speech Separation in Noisy Reverberant Acoustic Environments	142
4-P5-SES	A Single Speech Enhancement Model Unifying Dereverberation, Denoising, Speaker Counting, Separation, and Extraction	373
4-P6-SES	NeuralEcho: Hybrid of Full-Band and Sub-Band Recurrent Neural Network for Acoustic Echo Cancellation and Speech Enhancement	143
4-P7-SES	Toward Universal Speech Enhancement for Diverse Input Conditions	244
4-P8-SES	Exploring Time-Frequency Domain Target Speaker Extraction for Causal and Non-Causal Processing	380
4-P9-SES	Improving Speech Enhancement Using Audio Tagging Knowledge from Pre-Trained Representations and Multi-Task Learning	411
4-P10-ANA	Paraconsistent Feature Analysis for the Competency Evaluation of Voice Impersonation	302
4-P11-ANA	Not All Errors Are Created Equal: Evaluating the Impact of Model and Speaker Factors on ASR Outcomes in Clinical Populations	83
4-P12-ANA	Detection of Vowel Errors in Children's Speech Using Synthetic Phonetic Transcripts	308
4-P13-ANA	Transferring Speech-Generic and Depression-Specific Knowledge for Alzheimer's Disease Detection	343
4-P14-ANA	Spectral Tilt May Have a Smaller Impact on the Intelligibility of Speech in Noise	445
4-P16-ANA	MiniSUPERB: Lightweight Benchmark for Self-Supervised Speech Models	395
4-P17-MMP	Cross-Modal learning for CTC-Based ASR: Leveraging CTC-BERTScore and Sequence-Level Training	323
4-P18-MMP	Parameter-Efficient Cross-Language Transfer Learning for a Language-Modular Audiovisual Speech Recognition	333
4-P19-MMP	FLAP: Fast Language-Audio Pre-Training	360
4-P20-MMP	Improving Audiovisual Active Speaker Detection in Egocentric Recordings with the Data-Efficient Image Transformer	387
4-P21-MMP	Audio-Visual Neural Syntax Acquisition	409
4-P22-MLS	NeuralKalman: A Learnable Kalman Filter for Acoustic Echo Cancellation	134
4-P23-SS01	Scenario-Aware Audio-Visual TF-GridNet for Target Speech Extraction	170
Virtual
4-V1-SES	An Exploration of Task-Decoupling on Two-Stage Neural Post Filter for Real-Time Personalized Acoustic Echo Cancellation	424
4-V2-SES	MBTFNet: Multi-Band Temporal-Frequency Neural Network for Singing Voice Enhancement	50
4-V3-SES	VSANet: Real-Time Speech Enhancement Based on Voice Activity Detection and Causal Spatial Attention	119
4-V4-SES	Magnitude-and-Phase-Aware Speech Enhancement with Parallel Sequence Modeling	224
4-V5-ANA	Zero-Shot Emotion Transfer for Cross-Lingual Speech Synthesis	433
4-V6-MMP	Acoustics-Text Dual-Modal Joint Representation Learning for Cover Song Identification	97
4-V7-MMP	Boosting Modality Representation with Pre-Trained Models and Multi-Task Training for Multimodal Sentiment Analysis	272
4-V8-SS04	PP-MeT: A Real-World Personalized Prompt Based Meeting Transcription System	220
4-V9-SS04	The Second Multi-Channel Multi-Party Meeting Transcription Challenge (M2Met 2.0): A Benchmark for Speaker-Attributed ASR	422
4-P15-ANA	Unconstrained Dysfluency Modeling for Dysfluent Speech Transcription and Detection	394

Poster Session 5

12.19.2023 / 10:30-12:30

Chair: Berrak Sisman, Tomoki Toda

Poster ID	Paper Title	Paper ID
Physical
5-P1-TTS	Using Joint Training Speaker Encoder with Consistency Loss to Achieve Cross-Lingual Voice Conversion and Expressive Voice Conversion	127
5-P2-TTS	QuickVC: A Lightweight VITS-Based Any-to-Many Voice Conversion Model Using iSTFT for Faster Conversion	128
5-P3-TTS	Maximizing Data Efficiency for Cross-Lingual TTS Adaptation by Self-Supervised Representation Mixing and Embedding Initialization	123
5-P4-TTS	Invert-Classify: Recovering Discrete Prosody Inputs for Text-to-Speech	312
5-P5-TTS	Transduce and Speak: Neural Transducer for Text-to-Speech with Semantic Token Prediction	9
5-P6-TTS	Toward General-Purpose Text-Instruction-Guided Voice Conversion	204
5-P7-TTS	Improving Severity Preservation of Healthy-to-Pathological Voice Conversion with Global Style Tokens	233
5-P8-TTS	PerMod: Perceptually Grounded Voice Modification with Latent Diffusion Models	271
5-P9-TTS	Partial Rank Similarity Minimization Method for Quality MOS Prediction of Unseen Speech Synthesis Systems in Zero-Shot and Semi-Supervised Setting	276
5-P10-TTS	E3 TTS: Easy End-to-End Diffusion-Based Text to Speech	352
5-P11-TTS	WaveNeXt: ConvNeXt-Based Fast Neural Vocoder without iSTFT Layer	441
5-P12-TTS	Towards Developing State-of-the-Art TTS Synthesisers for 13 Indian Languages with Signal Processing Aided Alignments	453
5-P13-TTS	Zero-Shot Singing Voice Synthesis from Musical Score	268
5-P14-TTS	Diffusion-Based Mel-Spectrogram Enhancement For Personalized Speech Synthesis with Found Data	165
5-P15-SS06	The Singing Voice Conversion Challenge 2023	64
5-P16-SS06	A Comparative Study of Voice Conversion Models with Large-Scale Speech and Singing Data: The T13 Systems for the Singing Voice Conversion Challenge 2023	403
5-P17-SS07	LE-SSL-MOS: Self-Supervised Learning MOS Prediction with Listener Enhancement	192
5-P18-SS07	The VoiceMOS Challenge 2023: Zero-Shot Subjective Speech Quality Prediction for Multiple Domains	258
Virtual
5-V1-TTS	CrossSinger: A Cross-Lingual Multi-Singer High-Fidelity Singing Voice Synthesizer Trained on Monolingual Singers	27
5-V2-TTS	HiGNN-TTS: Hierarchical Prosody Modeling with Graph Neural Networks for Expressive Long-Form TTS	430
5-V3-TTS	SALT: Distinguishable Speaker Anonymization Through Latent Space Transformation	436
5-V4-TTS	PromptSpeaker: Speaker Generation Based on Text Descriptions	429
5-V5-TTS	BiSinger: Bilingual Singing Voice Synthesis	36
5-V6-SS06	VITS-Based Singing Voice Conversion Leveraging Whisper and Multi-Scale F0 Modeling	435
5-V7-SS06	VITS-Based Singing Voice Conversion System with DSPGAN Post-Processing for SVCC2023	478
5-V8-SS07	SQAT-LD: Speech Quality Assessment Transformer Utilizing Listener Dependent Modeling for Zero-Shot Out-of-Domain MOS Prediction	166
5-V9-SS07	KAQ: A Non-Intrusive Stacking Framework for Mean Opinion Score Prediction with Multi-Task Learning	313

Poster Session 6

12.19.2023 / 16:00-18:00

Chair: Carlos Busso, Nicholas Cummins

Poster ID	Paper Title	Paper ID
Physical
6-P1-SLR	Meta-Learning Framework for End-to-End Imposter Identification in Unseen Speaker Recognition	124
6-P2-SLR	Model-Based Fairness Metric for Speaker Verification	253
6-P3-SLR	Generative Linguistic Representation for Spoken Language Identification	284
6-P4-SLR	ECAPA2: A Hybrid Neural Network Architecture and Training Strategy for Robust Speaker Embeddings	326
6-P5-SLR	MASR: Multi-Label Aware Speech Representation Learning	399
6-P6-SLR	Extending Self-Distilled Self-Supervised Learning for Semi-Supervised Speaker Verification	404
6-P7-DIA	Robust End-to-End Diarization with Domain Adaptive Training and Multi-Task Learning	77
6-P8-DIA	Transformer Attractors for Robust and Efficient End-to-End Neural Diarization	304
6-P9-DIA	Semi-Supervised Multi-Channel Speaker Diarization with Cross-Channel Attention	468
6-P11-EMR	Active Learning Based Fine-Tuning Framework for Speech Emotion Recognition	30
6-P12-EMR	Identifying People with Mild Cognitive Impairment at Risk of Developing Dementia Using Speech Analysis	32
6-P13-EMR	Robust Recognition of Speaker Emotion with Difference Feature Extraction Using a Few Enrollment Utterances	44
6-P14-EMR	Improved Multi-modal Emotion Recognition using Squeeze-and-Excitation Block in Cross-Modal Attention	72
6-P15-EMR	Detecting Speech Abnormalities with a Perceiver-Based Sequence Classifier That Leverages a Universal Speech Model	80
6-P16-EMR	Combining Relative and Absolute Learning Formulations to Predict Emotional Attributes From Speech	144
6-P17-EMR	Speech Emotion Diarization: Which Emotion Appears When?	93
6-P18-RES	RescueSpeech: A German Corpus for Speech Recognition in Search and Rescue Domain	101
6-P19-RES	ESPNet-SUMM: Introducing a Novel Large Dataset, Toolkit, and a Cross-Corpora Evaluation of Speech Summarization Systems	145
6-P20-RES	Transcribing And Aligning Conversational Speech: A Hybrid Pipeline Applied to French Conversations	198
6-P21-RES	Study on the Correlation between Objective Evaluations and Subjective Speech Quality and Intelligibility	264
6-P22-RES	Coco-Nut: Corpus of Japanese Utterance and Voice Characteristics Description for Prompt-Based Control	282
6-P23-RES	LibriSpeech-PC: Benchmark For Evaluation of Punctuation And Capitalization Capabilities of End-to-End ASR Models	330
6-P24-RES	TorchAudio 2.1: Advancing Speech Recognition, Self-Supervised Learning, and Audio Processing Components for PyTorch	339
6-P25-RES	YODAS: YouTube-Oriented Dataset for Audio and Speech	391
6-P26-RES	H_Eval: A New Hybrid Evaluation Metric for Automatic Speech Recognition Tasks	447
6-P27-SS02	Thai-Dialect: Low Resource Thai Dialectal Speech to Text Corpora	135
6-P28-SS02	Joint Prediction and Denoising for Large-Scale Multilingual Self-Supervised Learning	156
6-P29-SS02	Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond	163
6-P30-SS02	Evaluating Self-Supervised Speech Models on a Taiwanese Hokkien Corpus	331
6-P31-SS02	Leveraging The Multilingual Indonesian Ethnic Languages Dataset in Self-Supervised Model for Low-Resource ASR Task	426
Virtual
6-V1-SLR	Haha-Pod: An Attempt for Laughter-Based Non-Verbal Speaker Verification	211
6-V2-SLR	CAMSAT: Augmentation Mix and Self-Augmented Training Clustering for Self-Supervised Speaker Recognition	243
6-V3-SLR	VoiceExtender: Short-Utterance Text-Independent Speaker Verification with Guided Diffusion Model	266
6-V4-RES	Wiki-En-ASR-Adapt: Large-Scale Synthetic Dataset for English ASR Customization	54
6-P10-MMP	AV-data2vec: Self-Supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations	206

Demonstration Session

12.20.2023 / 10:30-11:30

Chair: Prof. Carlos Busso

Poster ID	Paper Title
Physical
D-P1-DEMO	Towards Streaming Speech-to-Avatar Synthesis
D-P2-DEMO	NYCUKA: A Self-Disclosure Mental Health Spoken Dialogue System
