Learning Generative and Parameterized Interactive Sequence Models with RNNs

Neural networks are valuable not only for classification, but also for their ability to synthesize novel examples from learned model distributions. We exploit this generative capability in the audio domain, focusing on architectures and techniques for learning to generate sequences under directed external control. We aim to understand and enable automated, data-driven modeling for the synthesis and manipulation of arbitrary and naturalistic speech, music, and environmental audio.

The objective of this project is to deepen our understanding of how neural networks can be applied to the analysis and synthesis of general-purpose audio. The primary research question is how to train models capable of generating audio in response to real-time input, both from model parameters and from the environment. We propose to train models end to end, learning both the control affordances and the generative algorithms, rather than simply learning mappings from inputs to existing synthesis models.
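To make the idea of a parameterized, interactively controllable sequence model concrete, the sketch below shows a small PyTorch GRU that predicts the next audio frame while being conditioned on external control parameters at every time step. This is an illustrative assumption rather than the project's actual architecture; the class name, hyperparameters, and toy data are hypothetical.

    # Minimal sketch (illustrative only): a generative RNN conditioned on
    # external control parameters, so control affordances and generation
    # are learned jointly end to end.
    import torch
    import torch.nn as nn

    class ConditionalRNN(nn.Module):
        def __init__(self, frame_dim=1, n_controls=4, hidden=256):
            super().__init__()
            # The control vector is concatenated with the previous frame
            # at every step, so generation can respond to real-time input.
            self.rnn = nn.GRU(frame_dim + n_controls, hidden, batch_first=True)
            self.out = nn.Linear(hidden, frame_dim)

        def forward(self, frames, controls, state=None):
            # frames:   (batch, time, frame_dim)   previous audio frames
            # controls: (batch, time, n_controls)  control parameters
            x = torch.cat([frames, controls], dim=-1)
            h, state = self.rnn(x, state)
            return self.out(h), state

    # Teacher-forced training step on toy data (hypothetical shapes).
    model = ConditionalRNN()
    frames = torch.randn(8, 100, 1)    # e.g. normalized audio samples
    controls = torch.rand(8, 100, 4)   # e.g. pitch / texture parameters
    pred, _ = model(frames, controls)
    loss = nn.functional.mse_loss(pred[:, :-1], frames[:, 1:])
    loss.backward()

At generation time, the same network can be run one step at a time, feeding back its own output as the next frame while the control parameters are supplied live by a performer or the environment.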

Project Duration: 01 May 2019 – 31 December 2022.

Funding Source: AcRF Tier 2 funding (MOE2018-T2-2-127).

Acknowledgment: This research is supported by the Academic Research Council, Ministry of Education (ARC, MOE) under Grant MOE2018-T2-2-127: Learning Generative and Parameterized Interactive Sequence Models with RNNs.

PUBLICATIONS

Journal Articles

  • Bidisha Sharma, Xiaoxue Gao, Karthika Vijayan, Xiaohai Tian, and Haizhou Li, “NHSS: A speech and singing parallel database”, Speech Communication, vol. 133, July 2021, pp. 9-22. [link]

Conference Articles

  • Chitralekha Gupta, Purnima Kamath, and Lonce Wyse, “Signal Representations for Synthesizing Audio Textures with Generative Adversarial Networks”, in Proc. Sound and Music Computing Conference (SMC), May 2021. [Article in process]
  • Lin Huang, Chitralekha Gupta, and Haizhou Li, “Spectral Features and Pitch Histogram for Automatic Singing Quality Evaluation with CRNN”, in Proc. Asia-Pacific Signal and Information Processing Association (APSIPA) Annual Summit and Conference (ASC), Auckland, New Zealand, December 2020, pp. 492-499. [link]
  • Chitralekha Gupta, Lin Huang, and Haizhou Li, “Automatic Rank-Ordering of Singing Vocals with Twin-Neural Network”, in Proc. International Society for Music Information Retrieval Conference (ISMIR), Montreal, Canada, October 2020, pp. 416-423. [link]
  • Xiaoxue Gao, Xiaohai Tian, Yi Zhou, Rohan Kumar Das, and Haizhou Li, “Personalized Singing Voice Generation Using WaveRNN”, in Proc. Speaker Odyssey, Tokyo, Japan, November 2020, pp. 252-258. [link]
  • Chitralekha Gupta, Emre Yilmaz, and Haizhou Li, “Automatic Lyrics Alignment and Transcription in Polyphonic Music: Does Background Music Help?”, in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Barcelona, Spain, May 2020, pp. 496-500. [link]