Researchers’ Artificial Intelligence-Based Speech Sound Therapy Software Receives $2.5M NIH Grant

Three Syracuse University researchers, supported by a recent $2.5 million grant from the National Institutes of Health, are advancing a clinically intuitive automated system that will improve the treatment of speech sound disorders while helping to alleviate the effects of a global shortage of speech-language pathologists.

The project, “Intensive Speech Motor Chaining Treatment and Artificial Intelligence Integration for Residual Speech Sound Disorders,” will be funded for five years. Jonathan Preston, associate professor of communication sciences and disorders, is principal investigator. Preston is the inventor of Speech Motor Chaining, a treatment approach for people with speech disorders. Co-principal investigators are Asif Salekin, an assistant professor of electrical engineering and computer science whose expertise lies in developing interpretable and fair human-centered artificial intelligence systems, and Nina Benway, a graduate student in the Communication Sciences and Disorders/Speech-Language Pathology doctoral program.

Their system uses evidence-based Speech Motor Chaining software, an extensive library of speech sounds, and artificial intelligence to “think” and “hear” like a speech therapist does.

The project focuses on how to most effectively schedule Speech Motor Chaining sessions for children with speech disorders, and also explores whether artificial intelligence can improve Speech Motor Chaining, a topic Benway explored in her dissertation. The work is a collaboration between Salekin’s Laboratory for Ubiquitous and Intelligent Sensing at the College of Engineering and Computer Science and Preston’s Speech Production Lab at the College of Arts and Sciences.

A Clinical Need

In speech therapy, learners typically meet one-on-one with a clinician to practice speech sounds and receive feedback. If the artificial intelligence version of Speech Motor Chaining (“ChainingAI”) accurately replicates a clinician’s judgment, it could help learners complete high-quality practice independently between clinician sessions. This could help them achieve the practice intensity that best contributes to overcoming a speech disorder.

The software should complement the work of speech therapists, not replace them. “We know that speech therapy works, but there’s a bigger question as to whether learners are receiving the intensity of services that best supports speech learning,” says Benway. “This project investigates whether AI-enhanced speech therapy can increase the intensity of services through practice at home between sessions with a human clinician. The speech therapist is still responsible for monitoring, critically evaluating, and training the software as to which sounds are correct and which are not; the software is just one tool in the overall arc of clinician-managed treatment.”

170,000 Sounds

A library of 170,000 correctly and incorrectly pronounced “r” sounds was used to train the system. The recordings were made by more than 400 children over a 10-year period, collected by researchers at Syracuse University, Montclair State University and New York University and archived at the Speech Production Lab.

Benway wrote ChainingAI’s patent-pending speech analysis and machine learning code, which converts audio of speech sounds into recognizable numerical patterns. The system was taught to predict which patterns represent “right” or “wrong” speech, and its predictions can be tailored to an individual’s speech patterns.
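
As a rough illustration of that idea, the sketch below converts audio clips into numeric feature vectors and trains a binary “right”/“wrong” classifier. It is only a minimal sketch under stated assumptions: the MFCC features, the logistic regression model, and the example file names and labels are illustrative stand-ins, not ChainingAI’s patent-pending pipeline.

```python
# Illustrative sketch only; ChainingAI's actual feature pipeline is
# patent-pending and not public. This shows the general idea of turning
# audio into numerical patterns and classifying them as right/wrong.
import numpy as np
import librosa                        # common audio-analysis library
from sklearn.linear_model import LogisticRegression

def audio_to_features(wav_path):
    """Convert a recorded speech sound into a fixed-length numeric vector."""
    signal, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13)  # spectral features
    return mfcc.mean(axis=1)          # average over time: 13 numbers per clip

# Hypothetical training data: audio files plus clinician labels
# (1 = correct "r", 0 = distorted "r").
paths = ["r_correct_01.wav", "r_distorted_01.wav"]
labels = [1, 0]

X = np.stack([audio_to_features(p) for p in paths])
clf = LogisticRegression().fit(X, labels)

# Predict whether a new attempt sounds correct.
verdict = clf.predict(audio_to_features("attempt.wav")[None, :])[0]
print("right" if verdict else "not quite")
```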

During practice, the code works in real time with Preston’s Speech Motor Chaining website to recognize, sort, and interpret patterns in speech audio and “hear” whether a sound is being produced correctly. The software gives spoken feedback (announcing “right” or “not quite”), provides tongue-position reminders and tongue-shape animations to encourage correct pronunciation, then selects the next practice word based on whether the child is ready for an increase in word difficulty, as in the sketch below.
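
How that next word might be chosen can be pictured as a simple adaptive loop. Everything here is a hypothetical stand-in: the word hierarchy, the accuracy thresholds, and the simulated classifier verdicts are assumptions for illustration, and the actual Speech Motor Chaining progression rules are more involved.

```python
# Minimal sketch of an adaptive practice loop (not the real chaining rules).
import random

random.seed(0)                           # reproducible demo

WORDS_BY_DIFFICULTY = {                  # hypothetical practice hierarchy
    1: ["rah", "ree", "row"],            # syllable-level practice
    2: ["red", "rain", "rock"],          # simple words
    3: ["carrot", "mirror", "library"],  # multisyllabic words
}

def next_word(level, history):
    """Step difficulty up after a run of correct attempts, down after misses."""
    if len(history) >= 5:
        accuracy = sum(history[-5:]) / 5
        if accuracy >= 0.8 and level < max(WORDS_BY_DIFFICULTY):
            level += 1                   # learner is ready for harder words
        elif accuracy <= 0.2 and level > 1:
            level -= 1                   # drop back to rebuild accuracy
    return level, random.choice(WORDS_BY_DIFFICULTY[level])

level, history = 1, []
for attempt in range(20):
    level, word = next_word(level, history)
    correct = random.random() < 0.7      # stand-in for the classifier's verdict
    print(f"{word}: {'right' if correct else 'not quite'}")
    history.append(correct)
```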

Speech therapy software ChainingAI provides feedback, including images of tongue positioning, to help learners produce sounds correctly. (Photo courtesy of the Speech Production Lab)

Early Promise

According to the researchers, the system has greater potential than previous systems designed to detect speech errors.

According to Preston, automated systems until now have not been accurate enough to offer much clinical benefit. This study addresses the problems that hampered previous efforts: the audio dataset for residual speech sound disorders is larger, the system detects mispronounced sounds more accurately, and clinical studies will evaluate its therapeutic benefit.

“There hasn’t been a clinical therapy system that explicitly leverages AI machine learning to recognize correct and distorted ‘r’ sounds for learners with residual speech sound disorders,” says Preston. “The data collected so far show that this system performs well compared with what a human clinician would say under the same circumstances, and that learners’ speech sounds improve after using ChainingAI.”

So Far, Only “R”

The work currently focuses on the American English “r” sound, the most common speech sound error that persists into adolescence and adulthood. Ultimately, the researchers hope to extend the software’s functionality to “s” and “z” sounds, other English dialects, and other languages.

Ethical AI

Faculty and graduate students working on the project include (from left) Ph.D. student and project manager Nicole Caballero, Associate Professor Jonathan Preston, Assistant Professor Asif Salekin, and Nina Benway, a 2023 graduate of the communication sciences and disorders doctoral program. (Photo by Alex Dunbar)

Researchers have considered the ethical aspects of AI throughout the initiative. “We have ensured that ethical oversight is built into this system to ensure fairness in the judgments made by the software,” says Salekin. “In its learning process, the model was taught to adjust for the age and gender of learners so that it performs fairly regardless of those factors.” Future improvements will also adjust for race and ethnicity.
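
One common way to check this kind of fairness is to compare the model’s agreement with clinician labels separately for each demographic subgroup. The sketch below illustrates the idea only; the group labels and data are hypothetical, and the team’s actual bias-mitigation method is not detailed here.

```python
# Sketch of a per-group accuracy audit (illustrative data, not project data).
from collections import defaultdict

def accuracy_by_group(records):
    """records: iterable of (group_label, predicted, clinician_label) tuples."""
    hits, totals = defaultdict(int), defaultdict(int)
    for group, predicted, actual in records:
        totals[group] += 1
        hits[group] += int(predicted == actual)
    return {g: hits[g] / totals[g] for g in totals}

# Hypothetical evaluation records comparing model verdicts to clinician labels.
records = [("girls 9-12", 1, 1), ("girls 9-12", 0, 1),
           ("boys 9-12", 1, 1), ("boys 9-12", 0, 0)]
print(accuracy_by_group(records))  # large gaps between groups flag unfairness
```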

The team is also examining which candidates are best suited for the therapy and whether scheduling sessions differently (e.g., a boot-camp experience) could help learners progress faster than longer-term intermittent sessions.

Ultimately, the researchers hope the software will provide informed practice sessions that are effective, accessible, and of sufficient intensity for ChainingAI to routinely supplement time in the clinician’s office. Once the system is expanded to include the “s” and “z” sounds, it would address 90% of residual speech sound disorders and could benefit many thousands of the estimated six million Americans affected by these disorders.