{"id":12914,"date":"2022-06-14T16:20:44","date_gmt":"2022-06-14T08:20:44","guid":{"rendered":"https:\/\/old-cde.nus.edu.sg\/ece\/?p=12914"},"modified":"2024-07-31T18:25:56","modified_gmt":"2024-07-31T10:25:56","slug":"project-3","status":"publish","type":"post","link":"https:\/\/cde.nus.edu.sg\/ece\/project-3\/","title":{"rendered":"Project 1"},"content":{"rendered":"\n<h2>\n\t\tHuman Robot Interaction Phase 1\u00a0\n\t<\/h2>\n\t<p>In this project, we study novel algorithms that integrate machine listening intelligence into robotic audition that includes audio-visual sound localization to provide both accurate direction and distance information; speaker-recognition for robots to only respond to intended speakers; and integrated audition solutions for far-field speech acquisition. In the meantime, we will also investigate a novel end-to-end speech-to-action conversion. In this way, we allow the users to speak to the robots in free form speech and even in different languages<\/p>\n<p>Project Duration: 1 October 2019 &#8211; 30 March 2023.<\/p>\n<p>Funding Source: National Robotics Programme &#8211; Robotic Enabling Capabilities and Technologies, Grant No. 192 25 00054.<\/p>\n<p>Acknowledgment: This work was supported by the Science and Engineering Research Council, Agency of Science, Technology and Research, Singapore, through the National Robotics Program under Grant No. 192 25 00054.<\/p>\n<p><strong>PUBLICATIONS<\/strong><\/p>\n<p><strong>Journal Articles<\/strong><\/p>\n<ul>\n<li>Kun Zhou; Berrak Sisman; Rajib Rana; Bj\u00f6rn W. Schuller; Haizhou Li, &#8220;Speech Synthesis With Mixed Emotions,&#8221; in\u00a0<em>IEEE Transactions on Affective Computing<\/em>, vol. 14, no. 4, pp. 3120-3134, 1 Oct.-Dec. 2023, doi: 10.1109\/TAFFC.2022.3233324<\/li>\n<li>Xinyuan Qian, Zhengdong Wang, Jiadong Wang, Guohui Guan, &amp; Haizhou Li. &#8220;Audio-Visual Cross-Attention Network for Robotic Speaker Tracking&#8221;, IEEE\/ACM Transactions on Audio, Speech, and Language Processing. vol. 31, pp. 550-562 2022, DOI 10.1109\/TASLP.2022.3226330.<\/li>\n<li>Qiquan Zhang, Xinyuan Qian, Zhaoheng Ni, Aaron Nicolson, Eliathamby Ambikairajah, &amp; Haizhou Li, &#8220;A Time-Frequency Attention Module for Neural Speech Enhancement&#8221;, IEEE\/ACM Transactions on Audio, Speech, and Language Processing. vol. 31, pp. 462-475, 2023, DOI: 10.1109\/TASLP.2022.3225649.<\/li>\n<li>Zexu Pan, Meng Ge, Haizhou Li, &#8220;USEV: Universal Speaker Extraction with Visual Cue&#8221;, in IEEE\/ACM Transactions on Audio, Speech and Language Processing, vol. 30, pp. 3032-3045, 2022, DOI 10.1109\/TASLP.2022.3205759. [<a href=\"https:\/\/ieeexplore.ieee.org\/abstract\/document\/9887809\">link<\/a>]<\/li>\n<li>Kun Zhou, Berrak Sisman, Rajib Rana, B.W. Schuller, Haizhou Li, &#8220;Emotion Intensity and its Control for Emotional Voice Conversion&#8221;, IEEE Transactions on Affective Computing, 2022, DOI 10.1109\/TAFFC.2022.3175578\u00a0[Article In-Process] [<a href=\"https:\/\/ieeexplore.ieee.org\/document\/9778970\">link<\/a>]<\/li>\n<li>Z. Pan, X. Qian and H. Li, &#8220;Speaker Extraction with Co-Speech Gestures Cue&#8221;, in IEEE Signal Processing Letters, vol. 29, pp. 1467-1471, 2022, doi: 10.1109\/LSP.2022.3175130 [<a href=\"https:\/\/ieeexplore.ieee.org\/document\/9774925\">link<\/a>]<\/li>\n<li>Rui Liu, Berrak Sisman, Guanglai Gao, Haizhou Li, &#8220;Decoding Knowledge Transfer for Neural Text-to-Speech Training&#8221;, IEEE\/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 
PUBLICATIONS

Journal Articles

- Kun Zhou, Berrak Sisman, Rajib Rana, Björn W. Schuller and Haizhou Li, "Speech Synthesis with Mixed Emotions", IEEE Transactions on Affective Computing, vol. 14, no. 4, pp. 3120-3134, Oct.-Dec. 2023, DOI: 10.1109/TAFFC.2022.3233324.
- Xinyuan Qian, Zhengdong Wang, Jiadong Wang, Guohui Guan and Haizhou Li, "Audio-Visual Cross-Attention Network for Robotic Speaker Tracking", IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 550-562, 2023, DOI: 10.1109/TASLP.2022.3226330.
- Qiquan Zhang, Xinyuan Qian, Zhaoheng Ni, Aaron Nicolson, Eliathamby Ambikairajah and Haizhou Li, "A Time-Frequency Attention Module for Neural Speech Enhancement", IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 462-475, 2023, DOI: 10.1109/TASLP.2022.3225649.
- Zexu Pan, Meng Ge and Haizhou Li, "USEV: Universal Speaker Extraction with Visual Cue", IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 3032-3045, 2022, DOI: 10.1109/TASLP.2022.3205759. [link: https://ieeexplore.ieee.org/abstract/document/9887809]
- Kun Zhou, Berrak Sisman, Rajib Rana, Björn W. Schuller and Haizhou Li, "Emotion Intensity and its Control for Emotional Voice Conversion", IEEE Transactions on Affective Computing, 2022, DOI: 10.1109/TAFFC.2022.3175578. [Article In-process] [link: https://ieeexplore.ieee.org/document/9778970]
- Zexu Pan, Xinyuan Qian and Haizhou Li, "Speaker Extraction with Co-Speech Gestures Cue", IEEE Signal Processing Letters, vol. 29, pp. 1467-1471, 2022, DOI: 10.1109/LSP.2022.3175130. [link: https://ieeexplore.ieee.org/document/9774925]
- Rui Liu, Berrak Sisman, Guanglai Gao and Haizhou Li, "Decoding Knowledge Transfer for Neural Text-to-Speech Training", IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 1789-1802, 2022, DOI: 10.1109/TASLP.2022.3171974. [link: https://ieeexplore.ieee.org/document/9767637]
- Zexu Pan, Ruijie Tao, Chenglin Xu and Haizhou Li, "Selective Listening by Synchronizing Speech with Lips", IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 1650-1664, 2022, DOI: 10.1109/TASLP.2022.3153258. [link: https://ieeexplore.ieee.org/abstract/document/9721129]
- Chenglin Xu, Wei Rao, Jibin Wu and Haizhou Li, "Target Speaker Verification with Selective Auditory Attention for Single and Multi-talker Speech", IEEE/ACM Transactions on Audio, Speech, and Language Processing, July 2021. [link: https://arxiv.org/pdf/2103.16269.pdf]
- Xinyuan Qian, Qi Liu, Jiadong Wang and Haizhou Li, "Three-dimensional Speaker Localization: Audio-refined Visual Scaling Factor Estimation", IEEE Signal Processing Letters, July 2021. [Article In-process] [link: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9466446]
- Chen Zhang, Grandee Lee, Luis Fernando D'Haro and Haizhou Li, "D-score: Holistic Dialogue Evaluation without Reference", IEEE/ACM Transactions on Audio, Speech, and Language Processing, April 2021. [Article In-process] [link: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9409633]
- Rui Liu, Berrak Sisman, Guanglai Gao and Haizhou Li, "Expressive TTS Training with Frame and Style Reconstruction Loss", IEEE/ACM Transactions on Audio, Speech, and Language Processing, April 2021, pp. 1-13. [Article In-process] [link: https://arxiv.org/pdf/2008.01490.pdf]
- Rui Liu, Berrak Sisman, Yixing Lin and Haizhou Li, "FastTalker: A Neural Text-to-Speech Architecture with Shallow and Group Autoregression", Neural Networks, April 2021. [Article In-process] [link: https://www.sciencedirect.com/science/article/abs/pii/S0893608021001532]
- Mingyang Zhang, Yi Zhou, Li Zhao and Haizhou Li, "Transfer Learning from Speech Synthesis to Voice Conversion with Non-parallel Training Data", IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 1290-1302, March 2021. [link: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9380685]
- Jichen Yang, Hongji Wang, Rohan Kumar Das and Yanmin Qian, "Modified Magnitude-phase Spectrum Information for Spoofing Detection", IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 1065-1078, February 2021. [link: https://ieeexplore.ieee.org/document/9360468]
- Zhixuan Zhang and Qi Liu, "Spike-event-driven Deep Spiking Neural Network with Temporal Encoding", IEEE Signal Processing Letters, vol. 28, pp. 484-488, 2021. [link: https://ieeexplore.ieee.org/document/9354570]
- Berrak Sisman, Junichi Yamagishi, Simon King and Haizhou Li, "An Overview of Voice Conversion and its Challenges: From Statistical Modeling to Deep Learning", IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 132-157, 2021. [link: https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9262021]
- Rui Liu, Berrak Sisman, Feilong Bao, Jichen Yang, Guanglai Gao and Haizhou Li, "Exploiting Morphological and Phonological Features to Improve Prosodic Phrasing for Mongolian Speech Synthesis", IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 274-285, 2021. [link: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9271923]
- Qi Liu and Jibin Wu, "Parameter Tuning-free Missing-feature Reconstruction for Robust Sound Recognition", IEEE Journal of Selected Topics in Signal Processing, vol. 15, no. 1, pp. 78-89, January 2021. [link: https://ieeexplore.ieee.org/document/9259032]
- Yi Zhou, Xiaohai Tian and Haizhou Li, "Multi-Task WaveRNN with an Integrated Architecture for Cross-lingual Voice Conversion", IEEE Signal Processing Letters, vol. 27, pp. 1310-1314, 2020. [link: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9143435]
- Mingyang Zhang, Berrak Sisman, Li Zhao and Haizhou Li, "DeepConversion: Voice Conversion with Limited Parallel Training Data", Speech Communication, vol. 122, pp. 31-43, 2020. [link: https://www.sciencedirect.com/science/article/pii/S0167639320302296?via%3Dihub]

Conference Articles

- Peiwen Li, Enze Su, Jia Li, Siqi Cai, Longhan Xie and Haizhou Li, "ESAA: An EEG-Speech Auditory Attention Detection Database", in Proc. 25th Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA), Hanoi, Vietnam, November 2022, pp. 1-6, DOI: 10.1109/O-COCOSDA202257103.2022.9997944.
- Rui Liu, Berrak Sisman, Björn W. Schuller, Guanglai Gao and Haizhou Li, "Accurate Emotion Strength Assessment for Seen and Unseen Speech Based on Data-Driven Deep Learning", in Proc. INTERSPEECH, Incheon, Korea, September 2022.
- Zexu Pan, Meng Ge and Haizhou Li, "A Hybrid Continuity Loss to Reduce Over-Suppression for Time-domain Target Speaker Extraction", in Proc. INTERSPEECH, Incheon, Korea, September 2022.
- Zongyang Du, Berrak Sisman, Kun Zhou and Haizhou Li, "Disentanglement of Emotional Style and Speaker Identity for Expressive Voice Conversion", in Proc. INTERSPEECH, Incheon, Korea, September 2022.
- Bin Wang, C.-C. Jay Kuo and Haizhou Li, "Rethinking Evaluation with Word and Sentence Similarities", in Proc. 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Dublin, Ireland, May 2022, pp. 6060-6077. [link: https://aclanthology.org/2022.acl-long.419.pdf]
- Chen Zhang, Luis Fernando D'Haro, Thomas Friedrichs and Haizhou Li, "MDD-Eval: Self-Training on Augmented Data for Multi-Domain Dialogue Evaluation", in Proc. Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI-22), Virtual Event, 2022.
- Yan Zhang, Ruidan He, Zuozhu Liu, Lidong Bing and Haizhou Li, "Bootstrapped Unsupervised Sentence Representation Learning", in Proc. ACL, August 2021, pp. 5168-5180. [link: https://aclanthology.org/2021.acl-long.402.pdf]
- Xinyuan Qian, Bidisha Sharma, Amine El Abridi and Haizhou Li, "SLoClas: A Database for Joint Sound Localization and Classification", in Proc. O-COCOSDA 2021, Singapore, November 2021. Best Paper Award. [link: https://www.colips.org/conferences/cocosda2021/wp/best-paper-awards/]
- Qu Yang, Jibin Wu and Haizhou Li, "Rethinking Benchmarks for Neuromorphic Learning Algorithms", in Proc. International Joint Conference on Neural Networks (IJCNN), Virtual Event, July 2021. [link: https://scholarbank.nus.edu.sg/handle/10635/190081]
- Yi Ma, Kong Aik Lee, Ville Hautamaki and Haizhou Li, "PL-EESR: Perceptual Loss Based End-to-End Robust Speaker Representation Extraction", in Proc. IEEE Automatic Speech Recognition and Understanding (ASRU) Workshop, Cartagena, Colombia, 2021.
- Jiadong Wang, Xinyuan Qian, Zihan Pan, Malu Zhang and Haizhou Li, "GCC-PHAT with Speech-oriented Attention for Robotic Sound Source Localization", in Proc. IEEE International Conference on Robotics and Automation (ICRA), Xi'an, China, 2021.
- Chen Zhang, Yiming Chen, Luis Fernando D'Haro, Yan Zhang, Thomas Friedrichs, Grandee Lee and Haizhou Li, "DynaEval: Unifying Turn and Dialogue Level Evaluation", in Proc. Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP), August 2021. [link: https://arxiv.org/pdf/2106.01112.pdf]
- Rohan Kumar Das, Jichen Yang and Haizhou Li, "Data Augmentation with Signal Companding for Detection of Logical Access Attacks", in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, Canada, June 2021. [link: https://arxiv.org/pdf/2102.06332.pdf]
- Kun Zhou, Berrak Sisman and Haizhou Li, "VAW-GAN for Disentanglement and Recomposition of Emotional Elements in Speech", in Proc. IEEE Spoken Language Technology Workshop (SLT), Shenzhen, China, January 2021. [link: https://arxiv.org/pdf/2011.02314.pdf]
- Hongqiang Du, Xiaohai Tian, Lei Xie and Haizhou Li, "Optimizing Voice Conversion Network with Cycle Consistency Loss of Speaker Identity", in Proc. IEEE Spoken Language Technology Workshop (SLT), Shenzhen, China, January 2021. [link: https://arxiv.org/pdf/2011.08548.pdf]
- Meidan Ouyang, Rohan Kumar Das, Jichen Yang and Haizhou Li, "Capsule Network Based End-to-end System for Detection of Replay Attacks", in Proc. International Symposium on Chinese Spoken Language Processing (ISCSLP), Hong Kong, January 2021, pp. 1-5. [link: https://ieeexplore.ieee.org/abstract/document/9362111]
- Rohan Kumar Das and Haizhou Li, "Classification of Speech with and without Face Mask Using Acoustic Features", in Proc. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Auckland, New Zealand, December 2020, pp. 747-752. [link: http://www.apsipa.org/proceedings/2020/pdfs/0000747.pdf]
- Rohan Kumar Das, Ruijie Tao, Jichen Yang, Wei Rao, Cheng Yu and Haizhou Li, "HLT-NUS Submission for NIST 2019 Multimedia Speaker Recognition Evaluation", in Proc. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Auckland, New Zealand, December 2020, pp. 605-609. [link: http://www.apsipa.org/proceedings/2020/pdfs/0000605.pdf]
- Junchen Lu, Kun Zhou, Berrak Sisman and Haizhou Li, "VAW-GAN for Singing Voice Conversion with Non-parallel Training Data", in Proc. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Auckland, New Zealand, December 2020, pp. 514-519. [link: http://www.apsipa.org/proceedings/2020/pdfs/0000514.pdf]
- Zongyang Du, Kun Zhou, Berrak Sisman and Haizhou Li, "Spectrum and Prosody Conversion for Cross-Lingual Voice Conversion with CycleGAN", in Proc. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Auckland, New Zealand, December 2020, pp. 507-513. [link: http://www.apsipa.org/proceedings/2020/pdfs/0000507.pdf]
- Biswajit Dev Sarma and Rohan Kumar Das, "Emotion Invariant Speaker Embeddings for Speaker Identification with Emotional Speech", in Proc. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Auckland, New Zealand, December 2020, pp. 610-615. [link: http://www.apsipa.org/proceedings/2020/pdfs/0000610.pdf]
- Yi Zhou, Xiaohai Tian, Xuehao Zhou, Mingyang Zhang, Grandee Lee, Rui Liu, Berrak Sisman and Haizhou Li, "NUS-HLT System for Blizzard Challenge 2020", in Proc. Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, Shanghai, China, October 2020, pp. 44-48. [link: https://www.isca-speech.org/archive/VCC_BC_2020/pdfs/VCC2020_paper_6.pdf]
- Xiaohai Tian, Zhichao Wang, Shan Yang, Xinyong Zhou, Hongqiang Du, Yi Zhou, Mingyang Zhang, Kun Zhou, Berrak Sisman, Lei Xie and Haizhou Li, "The NUS & NWPU System for Voice Conversion Challenge 2020", in Proc. Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, Shanghai, China, October 2020, pp. 170-174. [link: https://www.isca-speech.org/archive/VCC_BC_2020/pdfs/VCC2020_paper_33.pdf]
- Zhao Yi, Wen-Chin Huang, Xiaohai Tian, Junichi Yamagishi, Rohan Kumar Das, Tomi Kinnunen, Zhenhua Ling and Tomoki Toda, "Voice Conversion Challenge 2020: Intra-lingual Semi-parallel and Cross-lingual Voice Conversion", in Proc. Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, Shanghai, China, October 2020, pp. 80-98. [link: https://www.isca-speech.org/archive/VCC_BC_2020/pdfs/VCC2020_paper_13.pdf]
- Rohan Kumar Das, Tomi Kinnunen, Wen-Chin Huang, Zhenhua Ling, Junichi Yamagishi, Yi Zhao, Xiaohai Tian and Tomoki Toda, "Predictions of Subjective Ratings and Spoofing Assessments of Voice Conversion Challenge 2020 Submissions", in Proc. Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, Shanghai, China, October 2020, pp. 99-120. [link: https://www.isca-speech.org/archive/VCC_BC_2020/pdfs/VCC2020_paper_34.pdf]
- Xinyuan Zhou, Emre Yılmaz, Yanhua Long, Yijie Li and Haizhou Li, "Multi-Encoder-Decoder Transformer for Code-Switching Speech Recognition", in Proc. INTERSPEECH, Shanghai, China, October 2020, pp. 1042-1046. [link: https://www.isca-speech.org/archive/Interspeech_2020/pdfs/2488.pdf]
- Xinyuan Zhou, Grandee Lee, Emre Yılmaz, Yanhua Long, Jiaen Liang and Haizhou Li, "Self-and-Mixed Attention Decoder with Deep Acoustic Structure for Transformer-based LVCSR", in Proc. INTERSPEECH, Shanghai, China, October 2020, pp. 5016-5020. [link: https://www.isca-speech.org/archive/Interspeech_2020/pdfs/2556.pdf]
- Kun Zhou, Berrak Sisman, Mingyang Zhang and Haizhou Li, "Converting Anyone's Emotion: Towards Speaker-Independent Emotional Voice Conversion", in Proc. INTERSPEECH, Shanghai, China, October 2020, pp. 3416-3420. [link: https://www.isca-speech.org/archive/Interspeech_2020/pdfs/2014.pdf]
- Shoufeng Lin and Xinyuan Qian, "Audio-Visual Multi-Speaker Tracking Based on the GLMB Framework", in Proc. INTERSPEECH, Shanghai, China, October 2020, pp. 3082-3086. [link: https://www.isca-speech.org/archive/Interspeech_2020/pdfs/1969.pdf]
- Xiaoyi Qin, Ming Li, Hui Bu, Wei Rao, Rohan Kumar Das, Shrikanth Narayanan and Haizhou Li, "The INTERSPEECH 2020 Far-Field Speaker Verification Challenge", in Proc. INTERSPEECH, Shanghai, China, October 2020, pp. 3456-3460. [link: https://www.isca-speech.org/archive/Interspeech_2020/pdfs/1249.pdf]
- Zhenzong Wu, Rohan Kumar Das, Jichen Yang and Haizhou Li, "Light Convolutional Neural Network with Feature Genuinization for Detection of Synthetic Speech Attacks", in Proc. INTERSPEECH, Shanghai, China, October 2020, pp. 1101-1105. [link: https://www.isca-speech.org/archive/Interspeech_2020/pdfs/1810.pdf]
- Ruijie Tao, Rohan Kumar Das and Haizhou Li, "Audio-visual Speaker Recognition with a Cross-modal Discriminative Network", in Proc. INTERSPEECH, Shanghai, China, October 2020, pp. 2242-2246. [link: https://www.isca-speech.org/archive/Interspeech_2020/pdfs/1814.pdf]
- Tianchi Liu, Rohan Kumar Das, Maulik Madhavi, Shengmei Shen and Haizhou Li, "Speaker-Utterance Dual Attention for Speaker and Utterance Verification", in Proc. INTERSPEECH, Shanghai, China, October 2020, pp. 4293-4297. [link: https://www.isca-speech.org/archive/Interspeech_2020/pdfs/1818.pdf]
- Nana Hou, Chenglin Xu, Joey Tianyi Zhou, Eng Siong Chng and Haizhou Li, "Multi-task Learning for End-to-end Noise-robust Bandwidth Extension", in Proc. INTERSPEECH, Shanghai, China, October 2020, pp. 4069-4073. [link: https://www.isca-speech.org/archive/Interspeech_2020/pdfs/1994.pdf]
- Nana Hou, Chenglin Xu, Van Tung Pham, Joey Tianyi Zhou, Eng Siong Chng and Haizhou Li, "Speaker and Phoneme-Aware Speech Bandwidth Extension with Residual Dual-Path Network", in Proc. INTERSPEECH, Shanghai, China, October 2020, pp. 4064-4068. [link: https://www.isca-speech.org/archive/Interspeech_2020/pdfs/1994.pdf]
- Grandee Lee and Haizhou Li, "Modeling Code-Switch Languages Using Bilingual Parallel Corpus", in Proc. ACL, July 2020, pp. 860-870. [link: https://www.aclweb.org/anthology/2020.acl-main.80.pdf]
- Berrak Sisman and Haizhou Li, "Generative Adversarial Networks for Singing Voice Conversion with and without Parallel Data", in Proc. Speaker Odyssey, Tokyo, Japan, November 2020, pp. 238-244. [link: https://www.isca-speech.org/archive/Odyssey_2020/pdfs/53.pdf]
- Kun Zhou, Berrak Sisman and Haizhou Li, "Transforming Spectrum and Prosody for Emotional Voice Conversion with Non-Parallel Training Data", in Proc. Speaker Odyssey, Tokyo, Japan, November 2020, pp. 230-237. [link: https://www.isca-speech.org/archive/Odyssey_2020/pdfs/50.pdf]
- Rui Liu, Berrak Sisman, Feilong Bao, Guanglai Gao and Haizhou Li, "WaveTTS: Tacotron-based TTS with Joint Time-Frequency Domain Loss", in Proc. Speaker Odyssey, Tokyo, Japan, November 2020, pp. 245-251. [link: https://www.isca-speech.org/archive/Odyssey_2020/pdfs/56.pdf]
- Xiaohai Tian, Rohan Kumar Das and Haizhou Li, "Black-box Attacks on Automatic Speaker Verification Using Feedback-controlled Voice Conversion", in Proc. Speaker Odyssey, Tokyo, Japan, November 2020, pp. 159-164. [link: https://www.isca-speech.org/archive/Odyssey_2020/pdfs/52.pdf]
- Xiaoxue Gao, Xiaohai Tian, Yi Zhou, Rohan Kumar Das and Haizhou Li, "Personalized Singing Voice Generation Using WaveRNN", in Proc. Speaker Odyssey, Tokyo, Japan, November 2020, pp. 252-258. [link: https://www.isca-speech.org/archive/Odyssey_2020/pdfs/25.pdf]
- Rui Liu, Berrak Sisman, Jingdong Li, Feilong Bao, Guanglai Gao and Haizhou Li, "Teacher-Student Training for Robust Tacotron-based TTS", in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, May 2020, pp. 6274-6278. [link: https://ieeexplore.ieee.org/abstract/document/9054681]

Return to Main Page: https://cde.nus.edu.sg/ece/project-lists-hlt/