Machine learning and artificial intelligence (AI) are growing in popularity in fields ranging from the arts to the sciences, including medicine and bioengineering. While these tools have the potential to bring about significant improvements in healthcare, the systems are not perfect. How can we recognize when machine learning and AI are proposing solutions that will not work in the real world?
Yogatheesan Varatharajah, Carle Illinois College of Medicine (CI MED) faculty member and professor of bioengineering, is working to answer this question through research aimed at understanding when and how certain AI-generated models fail. Varatharajah and his team recently presented a paper on the subject, “Evaluating Latent Space Robustness and Uncertainty of EEG-ML Models under Realistic Distribution Shifts,” at the prestigious Conference on Neural Information Processing Systems (NeurIPS).
“Every field in healthcare uses machine learning in one way or another, so these models have become a mainstay of computerized diagnosis and prognosis in healthcare,” Varatharajah said. “The problem with machine learning-based studies – for example, to develop a diagnostic tool – is that we run the models, see that they work well in a limited test environment, and conclude that they are fit for use. But when we actually use them in the real world to make real-time clinical decisions, many of these approaches don’t work as expected.”
Varatharajah explained that one of the most common reasons for this disconnect between models and the real world is the natural variability between the data collected to build a model and the data the model encounters after it is deployed. This variability can come from the hardware or protocol used to collect the data, or simply from differences between the patients represented in the training data and the patients seen in practice. These small differences can add up to significant changes in a model’s predictions, and ultimately to a model that does not help patients.
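To make the problem concrete, here is a minimal sketch in Python (using entirely synthetic stand-in features and an assumed gain-and-offset shift, not anything from the paper): a classifier that scores well on held-out data from its own distribution can quietly lose accuracy once inputs arrive from slightly different recording conditions.

```python
# Minimal illustrative sketch (synthetic data, not the paper's experiment):
# a model that passes a held-out test can still degrade under a realistic
# distribution shift, e.g. a different amplifier gain and baseline offset.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for EEG-derived features: "normal" vs "abnormal" classes.
n_samples, n_features = 2000, 16
X = rng.normal(size=(n_samples, n_features))
true_w = rng.normal(size=n_features)
y = (X @ true_w + 0.5 * rng.normal(size=n_samples) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"held-out accuracy: {model.score(X_test, y_test):.2f}")

# A hypothetical deployment-time shift: new hardware scales and offsets
# the features. The labels are unchanged, but predictions degrade.
X_shifted = 1.8 * X_test + 0.7
print(f"shifted accuracy:  {model.score(X_shifted, y_test):.2f}")
```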
“If we can anticipate these differences, we may be able to develop some additional tools to prevent these failures, or at least know that these models will fail in certain scenarios,” Varatharajah said. “And that is the aim of this work.”
To this end, Varatharajah and his students focused on machine learning models built on electrophysiological data, specifically EEG recordings collected from patients with neurological diseases. From there, the team analyzed clinically relevant applications, such as automatically distinguishing normal EEGs from abnormal ones.
“We looked at what kind of variability can occur in the real world, specifically those variabilities that could cause problems for machine learning models,” Varatharajah said. “And then we modeled those variabilities and developed some ‘diagnostics’ to self-diagnose the models, to know when and how they’re going to fail. As a result, we can be aware of these errors and take steps to mitigate them in advance, so the models can actually help clinicians in clinical decision making.”
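As a hypothetical illustration of such a diagnostic (continuing the synthetic sketch above, and not the paper’s actual method), one simple approach is to score how far each incoming sample sits from the training-feature distribution and flag predictions the model should not be trusted to make on its own.

```python
# Sketch of a simple "self-diagnostic" (an assumed approach, not the paper's):
# flag samples that lie far from the training distribution, so their
# predictions can be deferred to a clinician instead of used automatically.
mu = X_train.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X_train, rowvar=False))

def shift_score(X):
    """Mahalanobis distance of each sample from the training distribution."""
    d = X - mu
    return np.sqrt(np.einsum("ij,jk,ik->i", d, cov_inv, d))

# Calibrate a threshold on the training data itself: anything beyond the
# 97.5th percentile is treated as outside the model's experience.
threshold = np.percentile(shift_score(X_train), 97.5)

print(f"flagged in-distribution: {(shift_score(X_test) > threshold).mean():.0%}")
print(f"flagged under shift:     {(shift_score(X_shifted) > threshold).mean():.0%}")
```

In this toy setup, nearly all of the shifted samples are flagged while only a few percent of in-distribution samples are, which is exactly the kind of early warning that lets errors be mitigated before they reach a clinical decision.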
Publication co-author and CI MED student Sam Rawal says this study can help clinicians make better decisions about patient care by bridging the gaps between large-scale study results and factors affecting local populations. “The importance of this work lies in identifying the discrepancy between data on which AI models are trained versus the real-world scenarios they interact with when deployed in hospitals,” said Rawal. “Being able to identify those real-world scenarios where models may fail or perform unexpectedly can help guide their deployment and ensure they are used in a safe and effective manner.”
The presentation of the team’s research at NeurIPS – one of the world’s leading machine learning conferences – was particularly meaningful. “It is quite an achievement to have a paper accepted at this venue – it gives us a name in this community,” said Varatharajah. “It also gives us the opportunity to develop this tool into something that can be used in the real world.” Bioengineering PhD student Neeraj Wagh presented the work at the conference.
Contributors to the work included co-authors Sam Rawal of CI MED and Neeraj Wagh, Jionghao Wei, and Brent Berry of bioengineering. Varatharajah also praised the partnership between Illinois Bioengineering and the Mayo Clinic’s Department of Neurology. The project was made possible by the Mayo Clinic and supported by the National Science Foundation.
Editor’s Notes: The original version of this article by Bethan Owen of the UIUC Department of Bioengineering can be found here.
The paper, “Evaluating Latent Space Robustness and Uncertainty of EEG-ML Models under Realistic Distribution Shifts,” can be read online.