Chenyi flies to Greece (for a different kind of trip)
Date: 04 July 2024
Author: Erika Compatangelo
Chenyi will be attending the Young Female Researchers in Speech Workshop (YFRSW) in Greece, on August 31st. The YFRSW is a one-day satellite event of Interspeech 2024, the world's largest conference on spoken language processing science and technology. With her background in linguistics and interest in technology, Chenyi seized the chance she encountered during her studies!
Chenyi, congratulations on being accepted to join the Young Female Researchers in Speech Workshop! Could you tell us a bit more about yourself?
My name is Chenyi and I am originally from China. I did my BA in Linguistics at Utrecht University, in the Netherlands, and I am currently a student in the MSc in Voice Technology. I was interested in phonetics and phonology, and in where they intersect with speech technology. I got particularly interested in the subject after a period of study abroad at the University of Edinburgh. There, I took courses in Speech Processing, Computer Programming for Speech and Language Processing, Phonetics and Laboratory Phonology, and I really enjoyed them!
How did the opportunity to join the YFRSW come about?
It came about thanks to Dr. Matt Coler, our programme director. Matt always encourages us to participate in conferences and international events. When the opportunity to be involved in Interspeech 2024 arose, he brought it up, and I decided to apply. That's how I found out about it and ended up being accepted. Now, I'm excited to be heading to Greece at the end of August!
What is the abstract that was accepted at YFRSW about?
The abstract focuses on synthesizing friendly speech in Mandarin Chinese by manipulating the acoustic features of neutral speech (namely pitch, duration, and energy) using the FastSpeech 2 framework, with the aim of achieving a perceptual transition from neutral to friendly speech. This method deviates from the conventional approach to synthesizing speech in a given style, which involves training TTS models on datasets that specifically contain the desired style or tone. Because friendly speech data is scarce, this research instead synthesizes friendly speech indirectly, by acoustically manipulating neutral speech data.
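For readers curious what "manipulating pitch, duration, and energy" might look like in practice, here is a minimal sketch in Python. It is not Chenyi's implementation and does not call FastSpeech 2 itself: the `Prosody` class, the `make_friendlier` function, and the scaling factors are all hypothetical, chosen only to illustrate the idea of rescaling per-phoneme acoustic values predicted for neutral speech before the audio is generated.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Prosody:
    """Hypothetical container for per-phoneme acoustic values of one utterance."""
    pitch_hz: List[float]       # 0.0 marks an unvoiced phoneme
    energy: List[float]
    duration_frames: List[int]


def make_friendlier(
    neutral: Prosody,
    pitch_scale: float = 1.15,     # assumed value, for illustration only
    energy_scale: float = 1.10,    # assumed value, for illustration only
    duration_scale: float = 0.95,  # assumed value, for illustration only
) -> Prosody:
    """Return a copy of the neutral prosody with each feature rescaled."""
    return Prosody(
        # Multiplying keeps unvoiced phonemes (pitch 0.0) unvoiced.
        pitch_hz=[p * pitch_scale for p in neutral.pitch_hz],
        energy=[e * energy_scale for e in neutral.energy],
        # Every phoneme keeps at least one frame so none disappears.
        duration_frames=[
            max(1, round(d * duration_scale)) for d in neutral.duration_frames
        ],
    )


if __name__ == "__main__":
    neutral = Prosody(
        pitch_hz=[210.0, 0.0, 195.0],
        energy=[0.8, 0.3, 0.7],
        duration_frames=[12, 5, 9],
    )
    print(make_friendlier(neutral))
```

Which features to raise or lower, and by how much, is exactly the kind of question the research sets out to answer with listeners, so the numbers above should be read as placeholders.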
What is the societal relevance of this research?
In Voice Tech, there are two main subfields: speech recognition and speech synthesis. Speech synthesis allows us to produce speech in any desired style. However, TTS has received considerable criticism because the speech it produces lacks expressivity, which makes it hard to convey people’s internal feelings and attitudes. This research can potentially contribute to making synthesized speech more expressive and personalized, and overall more human-like.
Who can make use of it?
Ideally, people with speech-related impairments. Currently, the available voices they can use to express themselves are quite neutral, but they are people and should be able to express themselves with more than just neutral voices! It can also be beneficial for some companies. If they have branding strategies that emphasize being friendly and open, they can benefit from a more friendly tone.
How did you choose this Master’s programme to further your knowledge in this field?
Initially, I was a bit hesitant about coming here because it is, after all, a new programme. I considered a few other master’s programmes, focused, for instance, on text mining, but speech sounded more interesting to me, and this was the only programme that allowed me to dive into that.
Has the programme facilitated your involvement in this opportunity? If so, how?
A lot! I graduated from my bachelor’s with only theoretical knowledge of phonetics and phonology. I had very little knowledge of how to use technology in combination with that. For instance, I knew how to code in Python, but not to a great extent. I didn’t have any previous knowledge of machine learning, speech recognition, speech synthesis, or deep learning. Now, I am more familiar with these subjects and know how to work with them!
About the author
Ciao! My name is Erika and I am the Content & Data Management Specialist of Campus Fryslân. I was born and raised in Italy and have recently graduated from the MSc in Climate Adaptation Governance. I have been in charge of the blog and all its content since October 2023. My aim is to make this virtual space serve as a logbook for the Campus Fryslân community and as a welcoming introduction for all newcomers. Here, you will find stories from the people of Campus Fryslân to get a taste of what studying here is like and the exciting opportunities it comes with!