Beyond 'Hey Siri': towards Inclusive Speech Technology

Many of us have encountered speech technology in one way or another. However, can you imagine a world where speech technology is not just about asking your smart speaker about the weather, but one where it gives a voice to those with speech disorders, helps support endangered languages, and understands sarcasm? This is the frontier that Matt Coler and his team are exploring at Campus Fryslân’s Master’s degree programme in Speech Technology.
Text: Froukje Duursema, Corporate Communication UG / Photo Coler: Piet Douma
Technology's limitations
‘I became interested in speech technology through a very unusual route,’ Coler admits. ‘My PhD actually focuses on the documentation of small languages.’ While doing fieldwork for his PhD, Coler quickly ran into the limitations of technology. ‘I was doing research working with indigenous communities in South America, and I used recording devices to record their language. But when you can’t speak the language you are listening to, you can’t hear word boundaries.’ Coler explains that because of this, it was challenging to figure out where the words began or ended, and his transcribing software could not make heads or tails of it either.
‘This raised a lot of questions, because the number of possible speech sounds is limited. Yet, there is no technological solution to transcribing anything close to the actual language. I began to wonder, as someone who has worked with much lesser-known languages, why doesn’t speech technology work, even for a widespread language such as English?’
A tough row to hoe
Coler explains that speech technology comprises two main components: speech synthesis and speech recognition. Speech synthesis refers to the way machines produce artificial speech, and speech recognition concerns the ability of machines to understand spoken language. ‘When your Alexa or Google Home device speaks to you, it might seem trivial to you, because it probably speaks a widely spoken language such as English. In fact, it is very challenging to create a voice that resembles a human voice, especially one that is able to produce melody.’ Coler mentions other notable challenges that arise with speech recognition: ‘It’s difficult to extract the right speech signal from a noisy background, especially because there are so many different backgrounds, which can include other voices, animal sounds, and all other kinds of background noise. When you want to do this with smaller languages, it’s even more challenging.’

What purpose does it serve?
While maintaining his focus on broader applications, Coler acknowledges the impressive advances made by major tech corporations. Think of ChatGPT, transcription services, or applications such as Duolingo. He says that you should ask yourself whether the main purpose of an application or a device is to improve your life or to sell products. ‘Speech technology has the potential to influence the notion of technological inclusivity. I think that technology is at its best when it improves your life.’ Coler explains that the current market is dominated by a few deeply monopolistic companies, which make it seem like the limitations of technology are determined by what sells. Academics, however, have a different goal: improving the quality of life for humanity. ‘The focus lies much more on who could benefit from an application or device.’
A different take on speech assistance
A good example of a market opportunity with a very large societal impact is a recent PhD project Coler supervised. ‘We had a PhD student who was working on improving speech recognition for people with certain types of speech disfluencies caused by neurodegenerative diseases, such as Parkinson’s disease.’ Many people are affected by disfluency caused by an illness or a disability: people recovering from throat cancer or people with cerebral palsy, for example. ‘Many of these people speak in a way that’s not easy for devices to understand, and often not easy for people around them either,’ Coler explains. ‘Imagine you could create personal speech recognition devices for all these people, as every disease manifests in a different way. This technology could convert their speech into a synthetic voice or transcribe it into text. Such advancements can be life-changing for individuals with speech disfluencies, enabling them to communicate more easily, whether ordering in a restaurant, making a phone call, or engaging in everyday conversations. This is precisely where universities should be directing their efforts,’ Coler emphasizes.
Maintaining Frisian and other minority languages
Coler and his team are also working on a way to use Frisian when interacting with devices. He advocates for making smaller languages part of the digital landscape, as this could be a new opportunity for human-machine interaction. ‘When young people enter the digital landscape with their devices and realize they can’t dictate Frisian to their phone, they might stop using the language. It’s probably easier to dictate in English or Dutch. English is often the implicit language of modern interaction with our computers.’ Coler emphasizes that free language tools that understand Frisian might lower the threshold for speaking it, even for speakers who are not fluent in the language. He illustrates what it could be like to practise Frisian at home through speech technology: ‘If you’re not fluent in Frisian, you might feel uncomfortable speaking it. Imagine that you’re practising at home and the device doesn’t mind if you mispronounce words. That might make it less intimidating to use the language casually. I think having these tools available that implicitly recognize this language implies it is just as prestigious as any other. Until recently, speech technology was only available for the most widely spoken languages with the most resources.’

A computer that recognizes sarcasm
Another recent PhD project Coler supervised was about the development of a sarcasm detector. He points out that the way we currently communicate with our devices is very literal. ‘When your device gives you incorrect information and then politely asks “Was this helpful to you?”, a response like “Uhuh yeah, it was suuuper helpful” would be understood literally, even though most speakers would immediately recognize it as sarcasm.’ Coler says that the point is not to make a better Siri or Alexa, but to enable people to interact more naturally with their devices. ‘Much of what we say is figurative or sarcastic. It’s not about what you say but how you say it.’ Important aspects of human speech like sincerity, emotional content, and attitudes are expressed through melody.
‘We got the detector up to a point where it could very reliably recognize active sarcasm; the detector was correct about 75% of the time. This might be even better than human sarcasm detection, which isn’t perfectly attuned either.’
Off the beaten track
Coler acknowledges the impressive progress made by major tech corporations, but continues to focus on broader applications. ‘Recent developments such as OpenAI’s Whisper have undoubtedly transformed the field. But commercial innovations are only a part of the story.’ At Campus Fryslân, the focus remains on ensuring that these powerful technologies serve the needs of diverse communities. ‘The question isn’t just what speech technology can do, but who it can serve and how it can make lives better.’
Last modified: 27 March 2025 08.45 a.m.