Speech Information Processing Research

This is a five-year research collaboration, funded by an industry partner, to advance technologies in the following areas.

1. Speaker recognition under noisy and mismatched acoustic conditions

While speaker recognition works well in controlled acoustic environments, its performance degrades significantly in real-world conditions. In this research, we will address two major technical challenges in speaker recognition: (a) enabling speaker recognition to work in unseen noisy conditions, and (b) allowing speakers to enroll in one acoustic condition and perform voiceprint authentication in another. The research will build on recent progress in deep learning approaches to speaker characterization and in the disentanglement of speech information.
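
As a concrete illustration of the enrollment-versus-verification setting, the sketch below scores a test utterance against an enrollment voiceprint by cosine similarity between fixed-dimensional speaker embeddings. It is a minimal sketch with hypothetical embeddings; a deployed system would obtain them from a trained deep speaker network and calibrate the threshold on development data.

    import numpy as np

    def cosine_score(enroll_emb, test_emb):
        """Cosine similarity between enrollment and test speaker embeddings."""
        enroll_emb = enroll_emb / np.linalg.norm(enroll_emb)
        test_emb = test_emb / np.linalg.norm(test_emb)
        return float(np.dot(enroll_emb, test_emb))

    def verify(enroll_emb, test_emb, threshold=0.5):
        """Accept the claimed identity if the score clears a tuned threshold.
        The threshold here is a placeholder, not a recommended setting."""
        return cosine_score(enroll_emb, test_emb) >= threshold

    # Random stand-in embeddings; real ones would come from a speaker network.
    rng = np.random.default_rng(0)
    enroll = rng.standard_normal(256)  # embedding from the enrollment condition
    test = rng.standard_normal(256)    # embedding from a mismatched test condition
    print(verify(enroll, test))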

2. Speaker extraction and speaker separation

Humans have the ability to focus on a particular voice while filtering out other stimuli in a multi-talker acoustic environment. Machines, however, lack such selective listening ability, which greatly limits their deployment in real-world applications. In this research, we will develop novel algorithms, which we call speaker extraction, to emulate the human selective listening ability at a cocktail party, as formulated by Colin Cherry in his psychoacoustic experiments. Speaker extraction technology extracts a target speaker's voice from the overlapping speech of multiple speakers in a meeting-room acoustic environment.

The multi-talker speech may be recorded with a monaural microphone (a single microphone) or a microphone array (multiple microphones). The algorithms are expected to produce a clean, single-speaker voice stream for the target speaker.
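
To make the monaural case concrete, the sketch below conditions a mask-estimation network on the target speaker's embedding, so that the network keeps only the voice matching that voiceprint. This is a minimal PyTorch sketch under assumed shapes; the LSTM backbone, the layer sizes, and the simple concatenation fusion are illustrative stand-ins, not the algorithms this research will develop.

    import torch
    import torch.nn as nn

    class SpeakerExtractor(nn.Module):
        """Mask-based extraction of one target speaker from a mixture."""
        def __init__(self, n_freq=257, emb_dim=256, hidden=512):
            super().__init__()
            # Fuse mixture features with the target-speaker embedding.
            self.rnn = nn.LSTM(n_freq + emb_dim, hidden, num_layers=2,
                               batch_first=True)
            self.mask = nn.Sequential(nn.Linear(hidden, n_freq), nn.Sigmoid())

        def forward(self, mixture, spk_emb):
            # mixture: (batch, time, n_freq); spk_emb: (batch, emb_dim)
            emb = spk_emb.unsqueeze(1).expand(-1, mixture.size(1), -1)
            h, _ = self.rnn(torch.cat([mixture, emb], dim=-1))
            return self.mask(h) * mixture  # masked, target-only spectrogram

    # Random stand-in tensors: 100 frames of a mixture spectrogram and a
    # precomputed voiceprint of the target speaker.
    net = SpeakerExtractor()
    mix = torch.rand(1, 100, 257)
    emb = torch.rand(1, 256)
    print(net(mix, emb).shape)  # torch.Size([1, 100, 257])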

3. Wake-up mechanism with audio-visual cues

In this research, we will explore a novel approach to spoken wake-up that makes use of audio-visual cues. The algorithm emulates the human auditory attention mechanism to achieve high accuracy and energy efficiency. It responds to wake-up calls only when the right command is spoken by the right person on the right occasion.
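
As an illustration of the decision logic only, the sketch below gates wake-up on three confidence scores: keyword spotting (the right command), speaker verification (the right person), and a visual cue such as whether the speaker faces the device (the right occasion). The component scores and thresholds are hypothetical inputs; producing such scores reliably is the subject of the research itself.

    def should_wake(kws_score, spk_score, visual_score,
                    kws_thr=0.8, spk_thr=0.6, visual_thr=0.5):
        """Wake only for the right command, by the right person, on the
        right occasion. Thresholds are placeholders, not tuned values."""
        return (kws_score >= kws_thr and
                spk_score >= spk_thr and
                visual_score >= visual_thr)

    print(should_wake(0.95, 0.75, 0.90))  # True: all three cues agree
    print(should_wake(0.95, 0.75, 0.10))  # False: command heard, but the
                                          # speaker is not addressing the device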

Project Duration: 16 April 2020 – 15 April 2025

Funding Source: Industry Partner

