Speech Synthesis and Synthetic Speech Detection

Speech generation is an important aspect of speech communication. On one hand, we generate natural-sounding speech so that computers can interact with humans. Computer-generated speech is therefore expected to sound like a human voice, with accents, emotions, styles, and personality. This calls for controllable speech synthesis and voice conversion. On the other hand, natural-sounding synthetic speech poses a threat to automatic speaker verification (ASV) systems. An attacker may use a text-to-speech (TTS) or voice conversion (VC) system to impersonate a target speaker’s voice and attack an ASV system. To counter this threat, synthetic speech detection is necessary. This project studies personalized speech generation, which includes text-to-speech synthesis and speech-to-speech conversion. We will also study a deep learning approach to the detection of synthetic speech.

Project Duration: 06 October 2020 – 05 October 2023.