Decentralised, collaborative, and privacy-preserving machine learning for multi-hospital data.

Fang, Congyu; Dziedzic, Adam; Zhang, Lin; Oliva, Laura; Verma, Amol; Razak, Fahad; Papernot, Nicolas; Wang, Bo

Affiliation

Fang C; Department of Computer Science, University of Toronto, Canada; Peter Munk Cardiac Centre, University Health Network, Canada; Vector Institute, Toronto, Canada.
Dziedzic A; Vector Institute, Toronto, Canada; CISPA Helmholtz Center for Information Security, Germany; Department of Electrical and Computer Engineering, University of Toronto, Canada.
Zhang L; Peter Munk Cardiac Centre, University Health Network, Canada; Simon Fraser University, Canada.
Oliva L; Peter Munk Cardiac Centre, University Health Network, Canada.
Verma A; St. Michael's Hospital, Unity Health Toronto, Canada; Department of Medicine, University of Toronto, Canada; Institute of Health Policy, Management and Evaluation, University of Toronto, Canada.
Razak F; St. Michael's Hospital, Unity Health Toronto, Canada; Department of Medicine, University of Toronto, Canada; Institute of Health Policy, Management and Evaluation, University of Toronto, Canada.
Papernot N; Department of Computer Science, University of Toronto, Canada; Vector Institute, Toronto, Canada; Department of Electrical and Computer Engineering, University of Toronto, Canada. Electronic address: nicolas.papernot@utoronto.ca.
Wang B; Department of Computer Science, University of Toronto, Canada; Peter Munk Cardiac Centre, University Health Network, Canada; Vector Institute, Toronto, Canada; Department of Laboratory Medicine and Pathobiology, Temerty Faculty of Medicine, University of Toronto, Canada. Electronic address: bowang@v

EBioMedicine ; 101: 105006, 2024 Mar.

Article in En | MEDLINE | ID: mdl-38377795

ABSTRACT

BACKGROUND:

Machine learning (ML) has demonstrated great potential for medical data analysis. Large datasets collected from diverse sources and settings are essential for ML models in healthcare to achieve better accuracy and generalizability. Sharing data across different healthcare institutions or jurisdictions is challenging because of complex and varying privacy and regulatory requirements. Hence, it is hard but crucial to allow multiple parties to collaboratively train an ML model that leverages the private datasets available at each party, without directly sharing those datasets or compromising their privacy through collaboration.

METHODS:

In this paper, we address this challenge by proposing Decentralized, Collaborative, and Privacy-preserving ML for Multi-Hospital Data (DeCaPH). This framework offers the following key benefits: (1) it allows different parties to collaboratively train an ML model without transferring their private datasets (i.e., no data centralization); (2) it safeguards patients' privacy by limiting the potential privacy leakage arising from any contents shared across the parties during the training process; and (3) it facilitates ML model training without relying on a centralized party/server.
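
The abstract does not spell out the training protocol, so the following is only a minimal sketch of one common way to combine the three properties listed above. It assumes a DP-SGD-style update with distributed Gaussian noise: each party clips its per-example gradients, adds its own share of the noise, and only the noisy sums are combined. The function names (`clip_rows`, `party_update`, `training_round`), the logistic-regression model, and all parameter values are illustrative assumptions, not the DeCaPH implementation. For simplicity the sum is computed in the clear here, whereas a real deployment would need a mechanism that hides each party's individual contribution.

```python
import numpy as np

def clip_rows(grads, clip_norm):
    """Scale each per-example gradient so its L2 norm is at most clip_norm."""
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    return grads * scale

def party_update(w, X, y, clip_norm, noise_std, n_parties, rng):
    """One party's message: sum of clipped gradients plus its share of the noise."""
    # Per-example gradient of the logistic loss: (sigmoid(x.w) - y) * x
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    grads = (p - y)[:, None] * X
    clipped = clip_rows(grads, clip_norm)
    # Each party adds N(0, noise_std^2 / n_parties), so the summed noise has std noise_std.
    share = rng.normal(0.0, noise_std / np.sqrt(n_parties), size=w.shape)
    return clipped.sum(axis=0) + share, len(y)

def training_round(w, parties, lr, clip_norm, noise_multiplier, rng):
    """Combine the parties' noisy sums and take one gradient step (no raw data is pooled)."""
    noise_std = noise_multiplier * clip_norm
    total = np.zeros_like(w)
    count = 0
    for X, y in parties:
        g, n = party_update(w, X, y, clip_norm, noise_std, len(parties), rng)
        total += g
        count += n
    return w - lr * total / count

# Hypothetical usage: three "hospitals", each holding 20 synthetic records with 3 features.
rng = np.random.default_rng(0)
parties = [
    (rng.normal(size=(20, 3)), rng.integers(0, 2, size=20).astype(float))
    for _ in range(3)
]
w = np.zeros(3)
for _ in range(5):
    w = training_round(w, parties, lr=0.1, clip_norm=1.0,
                       noise_multiplier=1.0, rng=rng)
```

In this sketch the privacy level is governed by `clip_norm` and `noise_multiplier`. Translating them into a formal privacy guarantee requires a privacy accountant, which is omitted here.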

FINDINGS:

We demonstrate the generalizability and power of DeCaPH on three distinct tasks using real-world distributed medical datasets: patient mortality prediction using electronic health records, cell-type classification using single-cell human genomes, and pathology identification using chest radiology images. ML models trained with the DeCaPH framework show a drop in performance of less than 3.2% compared with those trained by the non-privacy-preserving collaborative framework. Meanwhile, the average vulnerability to privacy attacks of the models trained with DeCaPH decreased by up to 16%. In addition, models trained with DeCaPH outperform models trained solely on the private datasets of individual parties without collaboration by up to 70%, and models trained with the previous privacy-preserving collaborative training framework under the same privacy guarantee by up to 18.2%.

INTERPRETATION:

We demonstrate that ML models trained with the DeCaPH framework have an improved utility-privacy trade-off, showing that DeCaPH enables models to perform well while preserving the privacy of the training data points. In addition, ML models trained with the DeCaPH framework generally outperform those trained solely with the private datasets of individual parties, showing that DeCaPH enhances model generalizability.

FUNDING:

This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC, RGPIN-2020-06189 and DGECR-2020-00294), Canadian Institute for Advanced Research (CIFAR) AI Catalyst Grants, CIFAR AI Chair programs, Temerty Professor of AI Research and Education in Medicine, University of Toronto, Amazon, Apple, DARPA through the GARD project, Intel, Meta, the Ontario Early Researcher Award, and the Sloan Foundation. Resources used in preparing this research were provided, in part, by the Province of Ontario, the Government of Canada through CIFAR, and companies sponsoring the Vector Institute.

Subjects

Hospitals; Privacy; Humans; Ontario; Data Analysis; Electronic Health Records

Keywords

(Distributed) differential privacy; Collaborative machine learning (ML); Decentralization; ML for healthcare

Full text: 1 Database: MEDLINE Main subject: Privacy / Hospitals Language: En Publication year: 2024 Document type: Article
