Manipulating Emotions - Generative Modeling for Expressive Speech Synthesis

AAII Technical Lecture Series (ATLS-10) on 16-10-2021

Synthesizing emotional speech is important for more natural and meaningful human-computer interaction. In the next edition of the ATLS, we invite Ravi Shankar of The Johns Hopkins University, USA, to talk about how generative models, such as GANs, can be used to synthesize emotional speech.

Conveying emotions through speech is a trivially acquired skill that we rarely pay attention to. Yet it is a complex task that requires higher-order cognition and awareness of one’s own emotional state, and replicating this behavior or level of intelligence in a text-to-speech system is extremely challenging. Most work in expressive speech synthesis focuses on decomposing the signal into compact representations using unsupervised learning and probing their relative importance in imparting one emotion versus another. This presentation will cover our automated frameworks for transforming an utterance from one emotional class to another. We will further discuss how emotion conversion is an important stepping stone to affective speech synthesis. We have proposed several models that combine powerful ideas from generative modeling (supervised/unsupervised) and computer vision to learn smooth mapping functions for prosodic features, allowing us to manipulate speech emotions in a controlled manner.

Speaker

Ravi is currently a Ph.D. candidate in the Department of ECE at Johns Hopkins University. At JHU, he works primarily on emotion conversion in speech, in the context of expressive speech synthesis. His work lies at the intersection of speech signal processing, statistical modeling, and deep learning. He is advised by Dr. Archana Venkataraman (head, NSA Lab, JHU). Ravi completed his undergraduate degree in Electronics and Electrical Engineering at IIT Guwahati, where he worked on keyword spotting for low-resource languages under the supervision of Dr. S.R.M. Prasanna. He has twice received the MINDS fellowship award for working on the frontiers of machine learning and data science. He has also received the DAAD-WISE fellowship award for a research internship in Germany.

Contact:

Title: Manipulating Emotions: Generative Modeling for Expressive Speech Synthesis
Speaker name: Ravi Shankar, PhD Candidate at The Johns Hopkins University, USA
Date: 16 October 2021
Time: 6:30 PM (IST)