Section: Overall Objectives
Overall Objectives
The goal of the project is the modeling of speech for facilitating oral-based communication. The name MULTISPEECH comes from the following aspects that are particularly considered.
-
Multisource aspects - which means dealing with speech signals originating from several sources, such as speaker plus noise, or overlapping speech signals resulting from multiple speakers; sounds captured from several microphones are also considered.
-
Multilingual aspects - which means dealing with speech in a multilingual context, as for example for computer assisted language learning, where the pronunciations of words in a foreign language (i.e., non-native speech) is strongly influenced by the mother tongue.
-
Multimodal aspects - which means considering simultaneously the various modalities of speech signals, acoustic and visual, in particular for the expressive synthesis of audio-visual speech.
Our objectives are structured in three research axes, which have evolved compared to the project proposal finalized in 2014. Indeed, due to the ubiquitous use of deep learning, the distinction between `explicit modeling' and `statistical modeling' is not relevant anymore and the fundamental issues raised by deep learning have grown into a new research axis `beyond black-box supervised learning'. The three research axes are now the following.
-
Beyond black-box supervised learning This research axis focuses on fundamental, domain-agnostic challenges relating to deep learning, such as the integration of domain knowledge, data efficiency, or privacy preservation. The results of this axis naturally apply in the various domains studied in the two other research axes.
-
Speech production and perception This research axis covers the topics of the research axis on `Explicit modeling of speech production and perception' of the project proposal, but now includes a wide use of deep learning approaches. It also includes topics around prosody that were previously in the research axis on `Uncertainty estimation and exploitation in speech processing' in the project proposal.
-
Speech in its environment The themes covered by this research axis mainly correspond to those of the axis on `Statistical modeling of speech' in the project proposal, plus the acoustic modeling topic that was previously in the research axis on `Uncertainty estimation and exploitation in speech processing' in the project proposal.
A large part of the research is conducted on French and English speech data; German and Arabic languages are also considered either in speech recognition experiments or in language learning. Adaptation to other languages of the machine learning based approaches is possible, depending on the availability of speech corpora.