Human-Robot Collaboration AI for Advanced Manufacturing and Engineering (AME)

Spoken dialogue is the most natural means of human communication. It has also become part of the human-machine interface in smartphones, personal assistants, intelligent agents, robot companions, and in-car telematics, among others, where it offers invaluable services. In practice, a dialogue system can be goal-driven, such as an ATM that helps people complete a transaction; it can be a chatting system, such as the ELIZA chatbot, which converses without a specific task goal; or it can be something in between that provides humans with information. Most of today’s dialogue systems are built on an existing knowledge database and perform pattern classification in one of these three operating modes. They work in two separate phases: learning and run-time execution. At run-time, machines execute what they have learnt during training.
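
To make this two-phase, pattern-classification design concrete, the following minimal Python sketch trains an intent classifier offline and then only applies it at run-time; the utterances, intent labels and classifier choice are invented for illustration and are not part of this project.

    # Learning phase: fit an intent classifier on a small labelled sample
    # (hypothetical data; any off-the-shelf text classifier would do).
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    utterances = ["withdraw fifty dollars", "check my account balance",
                  "tell me a joke", "what is the weather today"]
    intents = ["transaction", "transaction", "chat", "information"]
    model = make_pipeline(TfidfVectorizer(), LogisticRegression())
    model.fit(utterances, intents)

    # Run-time execution phase: the machine only executes what it learnt
    # during training; no new knowledge is acquired mid-conversation.
    print(model.predict(["how is the weather in Singapore"]))  # e.g. ['information']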

In this project, we develop novel methods for natural language dialogue with humans that will allow a robotic system to proactively elicit information at run-time from a human co-worker on task details, and to co-ordinate with the co-worker on sub-task allocation during planning. In particular, the system may use dialogue interaction to understand the co-worker’s goals, intentions and other aspects of the collaborative task that cannot be explicitly perceived through other means (e.g. visually). The project also studies methods for the system to explain its responses using general and domain knowledge, as well as contextual information. Specifically, by combining low-level learning and high-level reasoning, we aim to enable machines to provide context-aware, user-centric explanations in response to human inquiries and to converse with humans more naturally, forming a peer-like relationship. In this way, machines will perform services (e.g. inspection, repair) more accurately, while humans can be more confident in machines and make informed decisions.
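
As a rough sketch of what such proactive elicitation could look like at run-time, the snippet below asks the co-worker about every task detail that has not been perceived through other means; the slot names and prompts are hypothetical, not the project’s actual task schema.

    # Hypothetical sketch: the robot tracks which task details are still
    # unknown and proactively asks the co-worker, rather than waiting to
    # be told. Slot names and prompts are illustrative only.
    REQUIRED_DETAILS = {
        "target_part": "Which part should I inspect?",
        "tool": "Which tool should I use?",
        "human_subtask": "Which sub-task will you handle yourself?",
    }

    def elicit_missing(task_state: dict) -> dict:
        """Ask about each required detail that is not yet known."""
        for slot, prompt in REQUIRED_DETAILS.items():
            if slot not in task_state:
                task_state[slot] = input(prompt + " ")
        return task_state

    # e.g. the part was already recognised visually; the rest is elicited
    # in dialogue before sub-task allocation is planned.
    plan_inputs = elicit_missing({"target_part": "hydraulic valve"})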

The project will deliver a dialogue system that initially learns from sample, generic conversations and further learns to generate domain-specific conversations from limited in-domain samples. The system also consolidates contextual information, such as the working environment and user preferences, integrates it with commonsense knowledge, and performs reasoning during conversations to produce relevant and concise responses, together with the necessary explanations for certain types of inquiries.
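
The minimal sketch below illustrates one way consolidated context and commonsense knowledge could yield a concise response with an attached explanation; the facts, the keyword-matching rule and all names are hypothetical stand-ins for the learned and reasoned components described above.

    # Hypothetical sketch: consolidated context and commonsense facts are
    # consulted when answering, and the reply cites the facts it used.
    CONTEXT = {"work_cell": "assembly line 3", "units": "metric"}
    COMMONSENSE = {"torque wrench": "fastens bolts to a specified torque"}

    def answer(inquiry: str) -> tuple[str, str]:
        """Return (response, explanation); the keyword rule is a stand-in
        for learned intent understanding and knowledge-based reasoning."""
        if "torque" in inquiry.lower():
            response = "Use the torque wrench at your station."
            explanation = (f"A torque wrench {COMMONSENSE['torque wrench']}, "
                           f"and you are working on {CONTEXT['work_cell']}.")
            return response, explanation
        return "Could you rephrase that?", "No relevant knowledge matched."

    response, why = answer("Which tool should I use to torque the bolts?")
    print(response, why, sep="\n")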

Project Duration: 26 November 2018 – 24 May 2024

Funding Source: RIE2020 Advanced Manufacturing and Engineering Programmatic Grant A18A2b0046

Acknowledgement: This research work is supported by Programmatic Grant No. A18A2b0046 from the Singapore Government’s Research, Innovation and Enterprise 2020 plan (Advanced Manufacturing and Engineering domain). Project Title: Human Robot Collaborative AI for AME.

PUBLICATIONS

Journal Articles

  • Qu Yang*, Malu Zhang*, Jibin Wu, Kay Chen Tan, Haizhou Li, "LC-TTFS: Towards Lossless Network Conversion for Spiking Neural Networks with TTFS Coding", IEEE Transactions on Cognitive and Developmental Systems, 2023, DOI: 10.1109/TCDS.2023.3334010
  • Qinyi Wang, Xinyuan Zhou, Haizhou Li, "Speech-and-Text Transformer: Exploiting Unpaired Text for End-to-End Speech Recognition", APSIPA Transactions on Signal and Information Processing: Vol. 12: No. 1, e27. May 2023, http://dx.doi.org/10.1561/116.00000001
  • Xiaoxue Gao, Chitralekha Gupta, Haizhou Li, "PoLyScriber: Integrated Fine-Tuning of Extractor and Lyrics Transcriber for Polyphonic Music," in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 1968-1981, May 2023, DOI: 10.1109/TASLP.2023.3275036.
  • Yi Zhou, Zhizheng Wu, Mingyang Zhang, Xiaohai Tian, Haizhou Li, "TTS-Guided Training for Accent Conversion Without Parallel Data", in IEEE Signal Processing Letters, vol. 30, pp. 533-537, April 2023, DOI: 10.1109/LSP.2023.3270079.
  • Yi Zhou, Zhizheng Wu, Xiaohai Tian, Haizhou Li, "Optimization of Cross-Lingual Voice Conversion With Linguistics Losses to Reduce Foreign Accents," in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 1916-1926, April 2023, DOI: 10.1109/TASLP.2023.3271107.
  • Ruijie Tao, Kong Aik Lee, Rohan Kumar Das, Ville Hautamaki, Haizhou Li, "Self-Supervised Training of Speaker Encoder with Multi-Modal Diverse Positive Pairs" in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 1706-1719, April 2023, DOI: 10.1109/TASLP.2023.3268568.
  • Xinyuan Qian, Zhengdong Wang, Jiadong Wang, Guohui Guan, & Haizhou Li, "Audio-Visual Cross-Attention Network for Robotic Speaker Tracking", IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 550-562, December 2022, DOI: 10.1109/TASLP.2022.3226330.
  • Qiquan Zhang, Xinyuan Qian, Zhaoheng Ni, Aaron Nicolson, Eliathamby Ambikairajah, & Haizhou Li, "A Time-Frequency Attention Module for Neural Speech Enhancement", IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 462-475, 2023, DOI: 10.1109/TASLP.2022.3225649.
  • Chen Zhang, Luis Fernando D'Haro, Qiquan Zhang, Thomas Friedrichs, Haizhou Li, "PoE: A Panel of Experts for Generalized Automatic Dialogue Assessment," in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 1234-1250, March 2023, DOI: 10.1109/TASLP.2023.3250825
  • Kun Zhou, Berrak Sisman, Rajib Rana, B.W. Schuller, Haizhou Li, “Emotion Intensity and its Control for Emotional Voice Conversion”, in IEEE Transactions on Affective Computing, vol. 14, no. 1, pp. 31-48, 1 Jan.-March 2023, DOI: 10.1109/TAFFC.2022.3175578
  • Xiaoxue Gao, Chitralekha Gupta, Haizhou Li, "Automatic Lyrics Transcription of Polyphonic Music with Lyrics-Chords Multi-Task Learning", IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 30, pp. 2280-2294, June 2022.
  • Rui Liu, Berrak Sisman, Guanglai Gao, Haizhou Li, "Decoding Knowledge Transfer for Neural Text-to-Speech Training", IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 1789-1802, 2022, DOI: 10.1109/TASLP.2022.3171974. [link]
  • Zexu Pan, Ruijie Tao, Chenglin Xu and Haizhou Li, "Selective Listening by Synchronizing Speech With Lips," in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 1650-1664, 2022, DOI: 10.1109/TASLP.2022.3153258. [link]
  • Chen Zhang, Grandee Lee, Luis Fernando D’Haro, and Haizhou Li, “D-score: Holistic Dialogue Evaluation without Reference”, in IEEE/ACM Transactions on Audio, Speech and Language Processing, April 2021. [link]
  • Rui Liu, Berrak Sisman, Guanglai Gao and Haizhou Li, “Expressive TTS Training with Frame and Style Reconstruction Loss”, IEEE/ACM Transactions on Audio, Speech, and Language Processing, April 2021, pp. 1-13. [link]
  • Rui Liu, Berrak Sisman, Yixing Lin and Haizhou Li, “FastTalker: A Neural Text-to-Speech Architecture with Shallow and Group Autoregression”, Neural Networks, April 2021. [link]
  • Mingyang Zhang, Yi Zhou, Li Zhao, and Haizhou Li, “Transfer learning from speech synthesis to voice conversion with non-parallel training data,” in IEEE/ACM Transactions on Audio, Speech, and Language Processing, 29, March 2021, pp. 1290-1302. [link]
  • Rui Liu, Berrak Sisman, Feilong Bao, Jichen Yang, Guanglai Gao and Haizhou Li, “Exploiting morphological and phonological features to improve prosodic phrasing for Mongolian speech synthesis”, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 29, 2021, pp. 274-285. [link]
  • Rui Liu, Berrak Sisman, Feilong Bao, Guanglai Gao and Haizhou Li, “Modeling Prosodic Phrasing with Multi-Task Learning in Tacotron-based TTS”, IEEE Signal Processing Letters, 27, 2020, pp. 1470-1474. [link]
  • Yi Zhou, Xiaohai Tian and Haizhou Li, “Multi-Task WaveRNN with an Integrated Architecture for Cross-lingual Voice Conversion”, IEEE Signal Processing Letters, 27, 2020, pp 1310-1314. [link]
  • Mingyang Zhang, Berrak Sisman, Li Zhao and Haizhou Li, “DeepConversion: Voice conversion with limited parallel training data”, Speech Communication, 122, 2020, pp. 31-43. [link]

Conference Articles

  • Tianchi Liu, Kong Aik Lee, Qiongqiong Wang, Haizhou Li, “Disentangling Voice and Content with Self-Supervision for Speaker Recognition”, Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS 2023), December 10 to 16, 2023, New Orleans, Louisiana, USA.
  • Siqi Cai, Jia Li, Hongmeng Yang, and Haizhou Li, "RGCnet: An Efficient Recursive Gated Convolutional Network for EEG-based Auditory Attention Detection", in 2023 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Sydney, Australia, July 24 to 27, 2023.
  • Yidi Jiang, Ruijie Tao, Zexu Pan, Haizhou Li, "Target Active Speaker Detection with Audio-visual Cues", in Proc. Interspeech 2023, Convention Centre Dublin, Ireland, August 20 to 24, 2023.
  • Jingru Lin, Xianghu Yue, Junyi Ao, Haizhou Li, "Self-Supervised Acoustic Word Embedding Learning via Correspondence Transformer Encoder", in Proc. Interspeech 2023, Convention Centre Dublin, Ireland, August 20 to 24, 2023.
  • Ke Zhang, Marvin Borsdorf, Zexu Pan, Haizhou Li, Yangjie Wei, Yi Wang, "Speaker Extraction with Detection of Presence and Absence of Target Speakers", in Proc. Interspeech 2023, Convention Centre Dublin, Ireland, August 20 to 24, 2023.
  • Ruicong Wang, Siqi Cai and Haizhou Li, "EEG-based Auditory Attention Detection with Spatiotemporal Graph and Graph Convolutional Network", in Proc. Interspeech 2023, Convention Centre Dublin, Ireland, August 20 to 24, 2023.
  • Rui Liu, Haolin Zuo, De Hu, Guanglai Gao, Haizhou Li, "Explicit Intensity Control for Accented Text-to-speech", in Proc. Interspeech 2023, Convention Centre Dublin, Ireland, August 20 to 24, 2023.
  • Rui Liu, Jinhua Zhang, Guanglai Gao, Haizhou Li, "Betray Oneself: A Novel Audio DeepFake Detection Model via Mono-to-Stereo Conversion", in Proc. Interspeech 2023, Convention Centre Dublin, Ireland, August 20 to 24, 2023.
  • Junchen Lu, Berrak Sisman, Mingyang Zhang, Haizhou Li, "High-Quality Automatic Voice Over with Accurate Alignment: Supervision through Self-Supervised Discrete Speech Units", in Proc. Interspeech 2023, Convention Centre Dublin, Ireland, August 20 to 24, 2023.
  • Yiming Chen, Simin Chen, Zexin Li, Wei Yang, Cong Liu, Robby T. Tan, Haizhou Li, "Dynamic Transformers Provide a False Sense of Efficiency", Annual Meeting of the Association for Computational Linguistics (ACL’23) in Toronto, Canada, July 9 to 14, 2023.
  • Jiadong Wang, Xinyuan Qian, Malu Zhang, Robby T. Tan, Haizhou Li, "Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert", Computer Vision and Pattern Recognition Conference (CVPR) in Vancouver, Canada, June 18 to 22, 2023.
  • Jiawei Du*, Yidi Jiang*, Vincent Y. F. Tan, Joey Tianyi Zhou, Haizhou Li (*equal contribution), "Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation", Computer Vision and Pattern Recognition Conference (CVPR) in Vancouver, Canada, June 18 to 22, 2023.
  • Marvin Borsdorf, Saurav Pahuja, Gabriel Ivucic, Siqi Cai, Haizhou Li, and Tanja Schultz, "Multi-Head Attention and GRU for Improved Match-Mismatch Classification of Speech Stimulus and EEG Response", IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, June 4 to 10, 2023.
  • Ruijie Tao, Kong Aik Lee, Zhan Shi, Haizhou Li, "Speaker recognition with two-step multi-modal deep cleansing", IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, June 4 to 10, 2023.
  • Xianghu Yue, Junyi Ao, Xiaoxue Gao, Haizhou Li, "Token2vec: A Joint Self-Supervised Pre-training Framework Using Unpaired Speech and Text", IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, June 4 to 10, 2023.
  • Qiquan Zhang, Hongxu Zhu, Qi Song, Xinyuan Qian, Zhaoheng Ni, Haizhou Li, "Ripple Sparse Self-Attention for Monaural Speech Enhancement", IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, June 4 to 10, 2023.
  • Xiaoxue Gao, Xianghu Yue and Haizhou Li, "Self-Transriber: Few-shot Lyrics Transcription with Self-training", IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, June 4 to 10, 2023.
  • Zexu Pan, Wupeng Wang, Marvin Borsdorf, Haizhou Li, "ImagineNET: Target Speaker Extraction with Intermittent Visual Cue through Embedding Inpainting", IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, June 4 to 10, 2023.
  • Haolin Zuo, Rui Liu, Jinming Zhao, Guanglai Gao and Haizhou Li, "Exploiting modality-invariant feature for robust multimodal emotion recognition with missing modalities", IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, June 4 to 10, 2023.
  • Peiwen Li, Enze Su, Jia Li, Siqi Cai, Longhan Xie, and Haizhou Li, "ESAA: An EEG-Speech Auditory Attention Detection Database", 25th Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA), Hanoi, Vietnam, November 24-26, 2022, pp. 1-6, DOI: 10.1109/O-COCOSDA202257103.2022.9997944
  • Rui Liu, Berrak Sisman, Björn W. Schuller, Guanglai Gao, Haizhou Li, "Accurate Emotion Strength Assessment for Seen and Unseen Speech Based on Data-Driven Deep Learning", in Proc. Interspeech 2022, Songdo ConvensiA, Incheon, Korea, September 18 to 22, 2022.
  • Zongyang Du, Berrak Sisman, Kun Zhou and Haizhou Li, “Disentanglement of Emotional Style and Speaker Identity for Expressive Voice Conversion”, in Proc. Interspeech 2022, Songdo ConvensiA, Incheon, Korea, September 18 to 22, 2022.
  • Bin Wang, C.-C. Jay Kuo, and Haizhou Li, "Rethinking Evaluation with Word and Sentence Similarities". In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 6060–6077, May 22-27, 2022, Dublin, Ireland. [link]
  • Chen Zhang, Luis Fernando D’Haro, Thomas Friedrichs and Haizhou Li, “MDD-Eval: Self-Training on Augmented Data for Multi-Domain Dialogue Evaluation”, in Proc. Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI-22), Virtual Event, 2022.
  • Bidisha Sharma, Maulik Madhavi, Xuehao Zhou, and Haizhou Li, “Exploring Teacher-Student Learning Approach for Multi-lingual Speech-to-Intent Classification”, in Proc. IEEE Automatic Speech Recognition and Understanding (ASRU) Workshop, Cartagena, Colombia, September 2021.
  • Yi Ma, Kong Aik Lee, Ville Hautamaki, and Haizhou Li, “PL-EESR: Perceptual Loss Based End-to-End Robust Speaker Representation Extraction”, in Proc. IEEE Automatic Speech Recognition and Understanding (ASRU) Workshop, Cartagena, Colombia, September 2021.
  • Yan Zhang, Ruidan He, Zuozhu Liu, Lidong Bing, and Haizhou Li, “Bootstrapped Unsupervised Sentence Representation Learning”, ACL, August 2021, pp. 5168–5180. [link]
  • Yidi Jiang, Bidisha Sharma, Maulik Madhavi, and Haizhou Li, “Knowledge Distillation from BERT Transformer to Speech Transformer for Intent Classification” in Proc. Interspeech 2021, Brno, Czech Republic, August 2021. [link]
  • Jiadong Wang, Xinyuan Qian, Zihan Pan, Malu Zhang, and Haizhou Li, “GCC-PHAT with Speech-oriented Attention for Robotic Sound Source Localization”, in Proc. IEEE International Conference on Robotics and Automation (ICRA), Xian, China, 2021.
  • Chen Zhang, Yiming Chen, Luis Fernando D’Haro, Yan Zhang, Thomas Friedrichs, Grandee Lee and Haizhou Li, “DynaEval: Unifying Turn and Dialogue Level Evaluation”, in Proc. Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP), August 2021. [link]
  • Kun Zhou, Berrak Sisman, and Haizhou Li, “VAW-GAN for disentanglement and recomposition of emotional elements in speech,” in Proc. IEEE Spoken Language Technology (SLT), Shenzhen, China, January 2021. [link]
  • Hongqiang Du, Xiaohai Tian, Lei Xie, and Haizhou Li, “Optimizing voice conversion network with cycle consistency loss of speaker identity” in Proc. IEEE Spoken Language Technology (SLT), Shenzhen, China, January 2021. [link]
  • Zongyang Du, Kun Zhou, Berrak Sisman, and Haizhou Li, “Spectrum And Prosody Conversion for Cross-Lingual Voice Conversion with Cyclegan”, in Proc. Asia-Pacific Signal and Information Processing Association (APSIPA) Annual Summit and Conference (ASC), Auckland, New Zealand, December 2020, pp. 507-513. [link]
  • Junchen Lu, Kun Zhou, Berrak Sisman, and Haizhou Li, “VAW-GAN for Singing Voice Conversion with Non-parallel Training Data”, in Proc. Asia-Pacific Signal and Information Processing Association (APSIPA) Annual Summit and Conference (ASC), Auckland, New Zealand, December 2020, pp. 514-519. [link]
  • Yi Zhou, Xiaohai Tian, Xuehao Zhou, Mingyang Zhang, Grandee Lee, Rui Liu, Berrak Sisman, and Haizhou Li, “NUS-HLT System for Blizzard Challenge 2020”, in Proc. Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge, Shanghai, China, October 2020, pp. 44-48. [link]
  • Xiaohai Tian, Zhichao Wang, Shan Yang, Xinyong Zhou, Hongqiang Du, Yi Zhou, Mingyang Zhang, Kun Zhou, Berrak Sisman, Lei Xie, and Haizhou Li, “The NUS & NWPU system for Voice Conversion Challenge 2020”, in Proc. Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, Shanghai, China, October 2020, pp. 170-174. [link]
  • Xinyuan Zhou, Emre Yılmaz, Yanhua Long, Yijie Li and Haizhou Li, “Multi-Encoder-Decoder Transformer for Code-Switching Speech Recognition,” in Proc. INTERSPEECH, Shanghai, China, October 2020, pp. 1042-1046. [link]
  • Xinyuan Zhou, Grandee Lee, Emre Yılmaz, Yanhua Long, Jiaen Liang and Haizhou Li, “Self-and-Mixed Attention Decoder with Deep Acoustic Structure for Transformer-based LVCSR,” in Proc. INTERSPEECH, Shanghai, China, October 2020, pp. 5016-5020. [link]
  • Nana Hou, Chenglin Xu, Joey Tianyi Zhou, Eng Siong Chng and Haizhou Li, “Multi-task Learning for End-to-end Noise-robust Bandwidth Extension”, in Proc. INTERSPEECH, Shanghai, China, October 2020, pp. 4069-4073. [link]
  • Nana Hou, Chenglin Xu, Van Tung Pham, Joey Tianyi Zhou, Eng Siong Chng and Haizhou Li, “Speaker and Phoneme-Aware Speech Bandwidth Extension with Residual Dual-Path Network”, in Proc. INTERSPEECH, Shanghai, China, October 2020, pp. 4064-4068. [link]
  • Grandee Lee and Haizhou Li, “Modeling Code-Switch Languages Using Bilingual Parallel Corpus”, in Association for Computational Linguistics, July 2020, pp. 860-870. [link]
  • Grandee Lee, Xianghu Yue, Haizhou Li, “Linguistically Motivated Parallel Data Augmentation for Code-switch Language Modeling”, in Proc. INTERSPEECH, Graz, Austria, September 2019, pp. 3730-3734. [link]
  • Xianghu Yue, Grandee Lee, Emre Yılmaz, Fang Deng and Haizhou Li, “End-to-End Code-Switching ASR for Low-Resourced Language Pairs”, in Proc. IEEE Automatic Speech Recognition Understanding (ASRU) Workshop 2019, Sentosa Island, Singapore, September 2019, pp. 972-979. [link]
  • Berrak Sisman and Haizhou Li, “Generative Adversarial Networks for Singing Voice Conversion with and without Parallel Data” in Proc. Speaker Odyssey 2020, Tokyo, Japan, November 2020, pp. 238-244. [link]
  • Kun Zhou, Berrak Sisman and Haizhou Li, “Transforming Spectrum and Prosody for Emotional Voice Conversion with Non-Parallel Training Data” in Proc. Speaker Odyssey 2020, Tokyo, Japan, November 2020, pp. 230-237. [link]
  • Rui Liu, Berrak Sisman, Feilong Bao, Guanglai Gao and Haizhou Li, “WaveTTS: Tacotron-based TTS with Joint Time-Frequency Domain Loss” in Proc. Speaker Odyssey 2020, Tokyo, Japan, November 2020, pp. 245-251. [link]
  • Xiaoxue Gao, Xiaohai Tian, Yi Zhou, Rohan Kumar Das and Haizhou Li, “Personalized Singing Voice Generation Using WaveRNN” in Proc. Speaker Odyssey 2020, Tokyo, Japan, November 2020, pp. 252-258. [link]
  • Bidisha Sharma, Rohan Kumar Das and Haizhou Li, “On the Importance of Audio-source Separation for Singer Identification in Polyphonic Music”, in Proc. INTERSPEECH, Graz, Austria, September 2019, pp. 2020-2024. [link]
  • Bidisha Sharma, Rohan Kumar Das and Haizhou Li, “Multi-level Adaptive Speech Activity Detector for Speech in Naturalistic Environments”, in Proc. INTERSPEECH, Graz, Austria, September 2019, pp. 2015-2019. [link]
  • Yi Zhou, Xiaohai Tian, Emre Yılmaz, Rohan Kumar Das and Haizhou Li, “A Modularized Neural Network with Language-Specific Output Layers for Cross-Lingual Voice Conversion”, in Proc. IEEE Automatic Speech Recognition Understanding (ASRU) Workshop, Sentosa Island, Singapore, September 2019, pp. 160-167. [link]
  • Emre Yılmaz, Samuel Cohen, Xianghu Yue, David van Leeuwen and Haizhou Li, “Multi-Graph Decoding for Code-Switching ASR”, in Proc. INTERSPEECH, Graz, Austria, September 2019, pp. 3750-3754. [link]
  • Qinyi Wang, Emre Yılmaz, Adem Derinel and Haizhou Li, “Code-Switching Detection Using ASR-Generated Language Posteriors”, in Proc. INTERSPEECH, Graz, Austria, September 2019, pp. 3740-3744. [link]