Singing Vocal Processing and Quality Evaluation (17 July 2020 – 16 July 2021)

Singing vocal analysis and the study of lyrical information in music are well-established research areas in Music Information Retrieval (MIR). In this project, we focus on automatic singing quality assessment, and automatic lyrics alignment and transcription.

Singing is a popular medium of entertainment and a desirable skill to develop. With the increase in online platforms for showcasing singing talent, it is important to develop automatic, objective systems for singing talent hunts. Singing quality can be evaluated automatically with the help of a reference rendition or the digital sheet music of the song, i.e. reference-dependent evaluation. However, a standard reference for comparison may not always be available. This project therefore explores reference-independent evaluation methods that leverage large amounts of singing data to characterize inherent singing properties and inter-singer statistics, and builds neural network frameworks to rank-order a large pool of singers by singing quality in a self-organized way, without any standard reference.
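The pairwise rank-ordering idea can be illustrated with a toy sketch: two weight-sharing ("twin") branches score a pair of singers, and a margin ranking loss pushes the better singer's score above the worse one's. The features, dimensions, margin, and training loop below are hypothetical placeholders for illustration, not the project's actual model.

```python
import numpy as np

# Toy features for 4 singers (hypothetical stand-ins for, e.g., pitch-accuracy
# and rhythm descriptors); true singing quality decreases with the row index.
X = np.array([[0.9, 0.1],
              [0.7, 0.5],
              [0.4, 0.3],
              [0.1, 0.8]])

# Pairwise preference labels: (better, worse) singer indices.
pairs = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]

# A shared linear scorer s(x) = w.x plays the role of both twin branches.
w = np.zeros(2)
margin, lr = 0.5, 0.1

for _ in range(200):
    for better, worse in pairs:
        # Margin ranking loss: hinge on the score difference of the pair.
        if margin - (X[better] - X[worse]) @ w > 0:
            w += lr * (X[better] - X[worse])   # push the two scores apart

# Rank-order all singers by their learned scores (best first).
ranking = list(np.argsort(-(X @ w)))
print(ranking)
```

Because the scorer is shared across both branches, it induces a single global score per singer, so any number of singers can be rank-ordered without a reference recording.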

Lyrics are an important component of music, and people often recognize a song by its lyrics. Automatic lyrics alignment is the task of locating the word boundaries of given lyrics in polyphonic audio, while lyrics transcription is the task of recognizing the sung lyrics from the audio. Both are useful for music information retrieval applications such as generating karaoke scrolling lyrics, subtitling music videos, and query-by-singing. Automatic lyrics alignment and transcription of singing vocals in the presence of background music remain unsolved problems. Singing vocals differ from speech in pitch dynamics and phoneme durations. Moreover, singing vocals are often highly correlated with the accompanying background music, resulting in overlapping frequency components. In this project, we explore data-driven acoustic and language modeling methods for lyrics-to-audio alignment and lyrics transcription of polyphonic music, using both the traditional Kaldi framework and an end-to-end ESPnet-based framework.
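As a minimal illustration of the alignment task (a toy sketch, not the project's Kaldi or ESPnet systems), the code below runs a monotonic dynamic-programming alignment, in the spirit of Viterbi forced alignment, between a phoneme sequence derived from the lyrics and hypothetical per-frame acoustic mismatch costs; all names and cost values are assumptions made up for this example.

```python
import math

def align(cost):
    """Monotonic DP alignment: each audio frame t is assigned one lyric
    token j; tokens advance strictly left-to-right and each consumes at
    least one frame. cost[t][j] = acoustic mismatch of frame t vs token j."""
    T, J = len(cost), len(cost[0])
    D = [[math.inf] * J for _ in range(T)]
    D[0][0] = cost[0][0]                              # must start at token 0
    for t in range(1, T):
        for j in range(J):
            stay = D[t - 1][j]                        # token j continues
            step = D[t - 1][j - 1] if j else math.inf  # advance to token j
            D[t][j] = cost[t][j] + min(stay, step)
    # Backtrack from the last frame / last token to recover the path.
    path, j = [J - 1], J - 1
    for t in range(T - 1, 0, -1):
        if j and D[t - 1][j - 1] < D[t - 1][j]:
            j -= 1
        path.append(j)
    return path[::-1]

# Toy example: 4 lyric phonemes ("HH EH L OW"), 8 audio frames whose true
# token indices are [0,0,1,1,2,2,3,3]; cost 0 for the matching token, else 1.
truth = [0, 0, 1, 1, 2, 2, 3, 3]
cost = [[0 if j == truth[t] else 1 for j in range(4)] for t in range(8)]
print(align(cost))
```

The recovered frame-to-phoneme path directly yields word boundaries by reading off where the token index changes; real systems replace the toy cost matrix with negative log posteriors from an acoustic model.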

PUBLICATIONS

Journal Articles

  • Chitralekha Gupta, Haizhou Li, and Ye Wang, “Automatic Leaderboard: Evaluation of Singing Quality Without a Standard Reference,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, 28, 2020, pp. 13-26. [link]
  • Chitralekha Gupta, Haizhou Li, and Ye Wang, “A technical framework for automatic perceptual evaluation of singing quality,” APSIPA Transactions on Signal and Information Processing, 7(E10), September 2018, pp. 1-11. [link]

Conference Articles

  • Chitralekha Gupta, Lin Huang, and Haizhou Li, “Automatic Rank-Ordering of Singing Vocals with Twin-Neural Network”, in Proc. International Society for Music Information Retrieval Conference (ISMIR), Montreal, Canada, October 2020.
  • Chitralekha Gupta, Emre Yilmaz and Haizhou Li, “Automatic Lyrics Alignment and Transcription in Polyphonic Music: Does Background music help?”, in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Barcelona, Spain, May 2020. [link]
  • Chitralekha Gupta, Emre Yilmaz and Haizhou Li, “Acoustic Modeling for Automatic Lyrics-to-Audio Alignment”, in Proc. INTERSPEECH, Graz, Austria, September 2019, pp. 2040-2044. [link]
  • Chitralekha Gupta, Haizhou Li, and Ye Wang, “Automatic Evaluation of Singing Quality without a Reference,” in Proc. Asia-Pacific Signal and Information Processing Association (APSIPA) Annual Summit and Conference (ASC) 2018, Honolulu, Hawaii, USA, November 2018, pp. 990-997. [link]
  • Chitralekha Gupta, Haizhou Li, and Ye Wang, “Perceptual evaluation of singing quality,” in Proc. Asia-Pacific Signal and Information Processing Association (APSIPA) Annual Summit and Conference (ASC) 2017, Kuala Lumpur, Malaysia, December 2017, pp. 577-586. [BEST STUDENT PAPER AWARD]. [link]

Patents/Software Licenses

  • Title: System and Method for Assessing Quality of A Singing Voice
    International Patent Application No: PCT/SG2020/050457  [05 August 2020] (Patent Filed), Inventors: Chitralekha Gupta, Haizhou Li and Ye Wang
  • Title: Automatic Lyrics Alignment for Polyphonic Music
    ILO Reference Number: 2020-117 [April 2020], Inventors: Chitralekha Gupta, Emre Yilmaz and Haizhou Li

Achievements

  • The “Automatic Lyrics-to-Audio Alignment” system developed by the team from HLT-NUS (Chitralekha Gupta, Emre Yilmaz, Haizhou Li) outperformed all other systems in the Music Information Retrieval Evaluation eXchange (MIREX) 2019. [Poster link] [MIREX results]
  • Our research in singing voice evaluation has led to the incorporation of the NUS spin-off MuSigPro Pte. Ltd. in August 2019, funded by the NUS Graduate Research Innovation Program (GRIP).