That’s because health data such as medical imaging, vital signs, and readings from wearable devices can vary for reasons unrelated to a particular health condition, such as lifestyle or background noise. The machine learning algorithms that the tech industry has made famous are very good at finding patterns, including shortcuts to “correct” answers that wouldn’t work in the real world. Smaller data sets make it easier for algorithms to cheat in this way and create blind spots that cause poor performance in the clinic. “We fool ourselves into thinking we develop models that work much better than they actually do,” Berisha says. “It just adds to the hype around AI.”
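This shortcut effect is easy to reproduce. The following sketch (illustrative only, not from Berisha’s study) uses pure random noise as both features and labels, so any apparent predictive signal is a fluke. With a tiny sample, some noise feature almost always “predicts” the labels far above chance; with a larger sample, the same search stays near 50 percent:

```python
# Illustrative sketch: why tiny data sets let a model "cheat".
# Features and labels are pure noise, so any pattern found is a shortcut
# that cannot generalize. Small samples make such patterns easy to find.
import random

random.seed(0)

def best_noise_accuracy(n_samples, n_features):
    """Generate random binary features and labels, then report the
    accuracy of the single noise feature that best matches the labels."""
    labels = [random.randint(0, 1) for _ in range(n_samples)]
    best = 0.0
    for _ in range(n_features):
        feature = [random.randint(0, 1) for _ in range(n_samples)]
        acc = sum(f == y for f, y in zip(feature, labels)) / n_samples
        best = max(best, acc, 1 - acc)  # a feature or its negation
    return best

small = best_noise_accuracy(n_samples=20, n_features=100)
large = best_noise_accuracy(n_samples=2000, n_features=100)
print(f"apparent accuracy with 20 samples:   {small:.2f}")
print(f"apparent accuracy with 2000 samples: {large:.2f}")
```

On the small sample the “best” noise feature typically scores well above chance, an accuracy that would evaporate on new patients; on the large sample it stays close to 0.50. That gap is the inflated-accuracy pattern small studies are prone to.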
This problem has led to a startling and disturbing pattern in some areas of AI healthcare research, Berisha says. In studies that use algorithms to detect signs of Alzheimer’s disease or cognitive impairment in speech recordings, Berisha and colleagues found that larger studies reported worse accuracy than smaller ones — the opposite of what bigger data is supposed to deliver. A review of studies trying to identify brain disorders from medical scans, and another of studies trying to detect autism with machine learning, reported the same pattern.
The risks of algorithms that perform well in initial studies but behave differently on real patient data are not hypothetical. A 2019 study found that a system used on millions of patients to prioritize access to extra care for people with complex health problems put white patients ahead of Black patients.
Avoiding biased systems like this requires large, balanced data sets and careful testing, but skewed data sets are the norm in health AI research, due to historical and persistent health inequalities. A 2020 study by researchers at Stanford University found that 71 percent of the data used in studies that applied deep learning to US medical data came from California, Massachusetts or New York, with little or no representation from the other 47 states. Low-income countries are hardly represented at all in healthcare AI studies. A review published last year of more than 150 studies using machine learning to predict diagnoses or disease pathways concluded that most of them “show poor methodological quality and are at high risk of bias”.
Two researchers concerned about these shortcomings recently launched a nonprofit organization called Nightingale Open Science to try to improve the quality and volume of data sets available to researchers. It works with health systems to organize collections of medical images and associated data from patient records, anonymize them, and make them available for nonprofit research.
Ziad Obermeyer, co-founder of Nightingale and associate professor at the University of California, Berkeley, hopes that providing access to that data will encourage competition that leads to better results, similar to the way large, open collections of images helped spur advances in machine learning. “The crux of the problem is that a researcher can do and say whatever they want with health data, because no one can verify their results,” he says. “The data is closed.”
Nightingale joins other projects trying to improve healthcare AI by enhancing data access and quality. The Lacuna Fund supports the creation of machine learning data sets that represent low- and middle-income countries, including in healthcare. A new project at University Hospitals Birmingham in the UK, with support from the National Health Service and MIT, is developing standards for assessing whether AI systems rest on unbiased data.
Matin, editor of the UK report on epidemiological algorithms, is a fan of AI projects like these, but says the prospects for AI in healthcare also depend on health systems modernizing their often outdated IT infrastructure. “You have to invest there, at the root of the problem, to see the benefits,” Matin says.