MSc Voice Technology graduates present at Interspeech 2023
Date: 08 September 2023
Two graduates of our MSc Voice Technology were accepted to present the results of their theses at the prestigious Young Female Researchers in Speech Workshop (YFRSW) during the Interspeech 2023 conference in August 2023. We sat down with Spyretta Leivaditi and Victoria Ivanova shortly afterwards to talk about the research they presented at YFRSW 2023.
Spyretta: The title of my research is “An ASR model for Dutch dysarthric speech: The case of Parkinson disease”. People with Parkinson's disease often have difficulty producing normal speech, and Automatic Speech Recognition (ASR) models in turn have difficulty understanding this dysarthric, or slurred, speech.
Research focus
My research focused on improving an ASR model by training it on speech data from people with Parkinson's disease.
For this I had access to data in two different speaking styles: read speech and spontaneous speech. From the medical sciences we know that the symptoms of dysarthric speech are milder when people read aloud than when they speak spontaneously.
The first part of my research examined whether this assumption was indeed correct and whether the ASR model could account for it. The second part examined whether an ASR model trained on data from people with Parkinson's disease also performs well on dysarthric speech caused by other diseases or by a stroke.
The focus was mainly on the technical aspects of the model, but this ASR model could be plugged into an app so that people with Parkinson's disease can communicate better with their doctors, caregivers or loved ones. What still needs further research, however, is how the model reacts to different severities of dysarthric speech.
Victoria: While Spyretta’s research focused on ASR, going from speech to text, my research focused on the other direction: text to speech, or speech synthesis.
Research focus
The title was “Synthesizing Proto-Indo-European using Phonological Features for Zero-Shot Synthesis”. Proto-Indo-European is an extinct language that was spoken about 6000 years ago, but it is an ancestor of most European languages, as well as of languages spoken on the Asian continent between modern-day Turkey and India, such as Farsi and Hindi.
A lot of vocabulary and even some grammar has been reconstructed using the comparative method of linguistics: you take words that are similar across different modern languages and use them to reconstruct what the word was in Proto-Indo-European. In this way we have also been able to reconstruct how the phonology, that is, the sounds of the language, probably sounded.
My research focused on an existing method for synthesizing languages for which we have very little data, so no audio recordings and perhaps only a little written text. I applied this method to Proto-Indo-European and managed to synthesize it with this toolkit. In evaluations, where I asked ordinary people which samples sounded more natural, my synthesis did better than a previous attempt that used older techniques to synthesize Proto-Indo-European. Of course, it is very hard to judge how well I did, because there are no native speakers anymore.
This method can be used for any language that is very low-resource or extinct, which will become progressively more needed, as the prognosis is that in 100 years 90% of the world's languages will be extinct. In that regard, the results of my research are quite relevant.
As for Proto-Indo-European itself, I always thought synthesized speech would be a great tool for learning the language. I even made a small web app that everyone can use, so I think that fills a real need. I’ve always been interested in the intersection between voice and speech technology and entertainment, so I think it’s also something that can be a lot of fun!