Can federated learning unlock healthcare AI without breaching privacy?

Artificial intelligence (AI) can support clinical trials by optimizing patient recruitment, improving trial retention and developing digital biomarkers. However, many large healthcare datasets that could train AI algorithms remain locked in silos over patient privacy concerns. Federated learning could unlock this potential by training on those datasets locally while maintaining patient privacy.

In federated learning, local versions of an AI train on data at individual locations without the data ever leaving its local server, explains Sarthak Pati, a biomedical AI software engineer at the University of Pennsylvania. The local AI versions then share what they learn from local datasets – rather than the data itself – to form a collaborative predictive model.
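The round-trip Pati describes can be sketched in a few lines. This is a minimal illustration in plain NumPy, not a real federated framework: the "model" is a toy parameter vector, the "sites" are hypothetical arrays, and the local update is a single stand-in gradient step. Only the updated weights ever cross the site boundary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical local datasets at three sites. In a real deployment the
# raw data would stay on each site's own server; here, it stays in
# these arrays and is never passed to the aggregation step.
site_data = [rng.normal(loc=mu, scale=1.0, size=(100, 3))
             for mu in (0.0, 0.5, 1.0)]

global_weights = np.zeros(3)  # shared model parameters

def local_update(weights, data, lr=0.1):
    """One step of local training: a toy gradient step that nudges
    the model toward the local data mean."""
    grad = weights - data.mean(axis=0)
    return weights - lr * grad

# Each site trains locally, then shares only its updated weights.
local_weights = [local_update(global_weights, d) for d in site_data]

# The server aggregates the shared weights -- never the data itself.
global_weights = np.mean(local_weights, axis=0)
```

In practice this loop repeats for many rounds, with the aggregated weights sent back to each site as the starting point for the next round of local training.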

By avoiding the need to transfer data, federated learning models could integrate many more sets of data, according to experts. However, the technology is far from a silver bullet. Like any new AI technology, federated learning requires careful implementation to avoid the biases and safety traps pervasive in many current AI models, explains Dr. Eric Perakslis, Chief Science and Digital Officer at Duke Clinical Research Institute.

“Most technologies are agnostic…they are neither good nor evil,” says Perakslis. “It depends on how you use them.”

Decentralization of AI in healthcare

Federated learning could disrupt traditional machine learning through a decentralized approach that doesn't aggregate patient data externally, Pati explains. Traditional machine learning requires sites to transfer patient data to a central server for an AI to train on, he adds. This opens the door to third-party data breaches, leading many sites and data sources to opt out of data sharing altogether.

However, by decentralizing machine learning, federated learning programs can train on a broader range of datasets, explains Dr. Ittai Dayan, CEO of Rhino Health, an AI healthcare company. Without the need to transfer data and contend with the resulting privacy risks, many more datasets become available. Rather than developing increasingly complex tools and contracts to ethically redact patient data, federated learning eliminates the need to share patient data in the first place.

In clinical trials, federated learning can significantly improve the scope of predictive models, Dayan says. For example, an AI could be tasked with predicting when patients are likely to transition to second-line breast cancer therapy in order to optimize clinical trial recruitment. A centralized AI addressing this question would likely use data from only one or two hospitals due to privacy concerns under the Health Insurance Portability and Accountability Act (HIPAA) and the General Data Protection Regulation (GDPR), he notes. In contrast, a federated learning program could train on many more datasets – including ones previously unavailable to machine learning algorithms – because no data transfer is required. Decentralized machine learning, Dayan explains, is a much more scalable technology.

Can federated learning eliminate bias?

Because federated learning programs can use more datasets, they tend to minimize the impact of a biased dataset, Dayan says. However, a poorly designed program could serve as an engine for the dissemination of biased data, notes Perakslis.

In healthcare, biased and unvetted datasets are often the cheapest to access, and federated learning is no different, says Perakslis. For example, even the large swathes of biometric data captured on patients' cell phones – a key feature of many decentralized clinical trials – can carry significant biases. Apple devices have much stronger privacy protections than Android devices, which use an open-source operating system, he explains. But a decision to use only Apple devices creates a sampling bias, because only a certain segment of the population can afford these more expensive products, he notes.

To minimize the risk of data corruption, local copies of a federated learning program should share model weights, Pati explains. This ensures that each dataset's impact on the overall program is proportional to the strength of that specific dataset, he notes.
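One common way to make each site's influence proportional is weighted federated averaging, where shared weights are combined in proportion to local dataset size. The sketch below assumes, for illustration, that "strength" means sample count; the weight vectors and counts are hypothetical.

```python
import numpy as np

# Hypothetical model weights learned at three sites, plus the number
# of local samples each site trained on (the data itself stays local).
local_weights = [np.array([0.2, 0.4]),
                 np.array([0.3, 0.1]),
                 np.array([0.5, 0.5])]
sample_counts = np.array([1000, 250, 4000])

# Each site's contribution is proportional to its dataset size, so a
# small (possibly unrepresentative) site cannot dominate the model.
proportions = sample_counts / sample_counts.sum()
global_weights = sum(p * w for p, w in zip(proportions, local_weights))
```

Other weighting schemes are possible, for example discounting sites whose updates look anomalous, but proportional averaging is the standard baseline.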

Meanwhile, federated learning programs should use edge computing to minimize concerns about security breaches, Dayan explains. Edge computing is a type of information technology that processes data close to the “edge” of a network rather than in an external cloud or data center.

“By its very nature, federated learning is just a computational method,” says Dayan. “By creating the supporting infrastructure that preserves data quality and privacy, you make it effective.”

Where is federated learning in healthcare headed?

As public concerns about data protection increase, data-sharing regulations are likely to tighten in parallel. The EU recently enacted the GDPR, and the US Congress has proposed new privacy legislation that would strengthen existing HIPAA regulations. Experts say these new laws could provide a launch pad for federated learning in healthcare.

Because datasets never leave their source in federated learning, they're not subject to GDPR, HIPAA, or any pending laws that focus on data sharing, Pati explains. Earlier this year, Pati and his team used federated learning to define clinical outcomes in glioblastoma via volumetric measurements of tumors. The program trained on data from 71 sites across six continents, in large part because those sites never actually had to share data or deal with the resulting regulatory and legal hurdles, he explains.

Meanwhile, Dayan says there is growing demand among drug companies to collaborate on data for clinical trial design. As real-world data becomes more ubiquitous, industry players are realizing they can share data insights in mutually beneficial ways, he notes. Federated learning will enable companies to share insights that can improve patient selection, adverse event prediction and more – all while maintaining full control of their proprietary datasets, he explains.

Still, despite the many opportunities to improve clinical trial outcomes and healthcare delivery, Perakslis urges caution when implementing federated learning programs. "It's very easy to get excited about what technology can do," he says. "It can take a lot longer to figure out what damage it's causing."


  • Federated learning allows AIs to train on datasets from multiple experimental sites and institutions without data ever leaving those sites.
  • Since no patient data is transmitted, the technology can accommodate a much larger number of datasets.
  • Federated learning programs should give appropriate weight to individual data sets and use edge computing to reduce the risk of biased or insecure data.
  • As data-sharing regulations tighten and the pharmaceutical industry becomes more collaborative, federated learning will play an important role in the future of AI-driven drug development and healthcare.