Results 1 - 20 of 509
1.
Brief Bioinform ; 25(1)2023 11 22.
Article in English | MEDLINE | ID: mdl-38221903

ABSTRACT

The advent of single-cell RNA sequencing (scRNA-seq) has revolutionized our understanding of cellular heterogeneity and complexity in biological tissues. However, the nature of large, sparse scRNA-seq datasets and privacy regulations present challenges for efficient cell identification. Federated learning provides a solution, allowing efficient and private data use. Here, we introduce scFed, a unified federated learning framework that allows for benchmarking of four classification algorithms without violating data privacy, including single-cell-specific and general-purpose classifiers. We evaluated scFed using eight publicly available scRNA-seq datasets with diverse sizes, species and technologies, assessing its performance via intra-dataset and inter-dataset experimental setups. We find that scFed performs well on a variety of datasets, with accuracy competitive with centralized models. Though the Transformer-based model excels in centralized training, its performance slightly lags behind that of the single-cell-specific model within the scFed framework, and it raises a notable time complexity concern. Our study not only helps select suitable cell identification methods but also highlights federated learning's potential for privacy-preserving, collaborative biomedical research.


Subject(s)
Biomedical Research , Single-Cell Gene Expression Analysis , Learning , Algorithms , Benchmarking , Sequence Analysis, RNA
2.
Brief Bioinform ; 24(5)2023 09 20.
Article in English | MEDLINE | ID: mdl-37497720

ABSTRACT

Vertical federated learning has gained popularity as a means of enabling collaboration and information sharing between different entities while maintaining data privacy and security. This approach has potential applications in disease healthcare, cancer prognosis prediction, and other industries where data privacy is a major concern. Although using multi-omics data for cancer prognosis prediction provides more information for treatment selection, collecting different types of omics data can be challenging due to their production in various medical institutions. Data owners must comply with strict data protection regulations such as the European Union (EU) General Data Protection Regulation. To share patient data across multiple institutions, privacy and security issues must be addressed. Therefore, we propose AFEI, an adaptive optimized vertical federated learning framework for heterogeneous multi-omics data integration, to integrate multi-omics data collected from multiple institutions for cancer prognosis prediction. AFEI enables participating parties to build an accurate joint evaluation model for learning more information related to cancer patients from different perspectives, based on the distributed and encrypted multi-omics features shared by multiple institutions. The experimental results demonstrate that AFEI achieves higher prediction accuracy (6.5% on average) than using single omics data by utilizing the encrypted multi-omics data from different institutions, and it performs almost as well as prognosis prediction by directly integrating multi-omics data. Overall, AFEI can be seen as an efficient solution for breaking down barriers to multi-institutional collaboration and promoting the development of cancer prognosis prediction.


Subject(s)
Learning , Multiomics , Humans , Information Dissemination , Privacy
3.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34874995

ABSTRACT

The growing expansion of data availability in medical fields could help improve the performance of machine learning methods. However, with healthcare data, using multi-institutional datasets is challenging due to privacy and security concerns. Therefore, privacy-preserving machine learning methods are required. Thus, we use a federated learning model to train a shared global model on a central server that does not contain private data, while all clients maintain the sensitive data in their own institutions. The scattered training data are connected to improve model performance, while preserving data privacy. However, in the federated training procedure, data errors or noise can reduce learning performance. Therefore, we introduce self-paced learning, which can effectively select high-confidence samples and drop highly noisy samples to improve the performance of the trained model and reduce the risk of data privacy leakage. We propose federated self-paced learning (FedSPL), which combines the advantages of federated learning and self-paced learning. The proposed FedSPL model was evaluated on gene expression data distributed across different institutions where the privacy concerns must be considered. The results demonstrate that the proposed FedSPL model is secure, i.e. it does not expose the original record to other parties, and the computational overhead during training is acceptable. Compared with learning methods based on the local data of all parties, the proposed model can significantly improve the predicted F1-score by approximately 4.3%. We believe that the proposed method has the potential to benefit clinicians in gene selections and disease prognosis.
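
The sample-selection idea in the abstract above can be illustrated with a minimal sketch. It assumes the classical hard-threshold form of self-paced learning, in which a sample is kept when its current loss is below a threshold that grows over rounds. The abstract does not state which selection rule FedSPL uses, so the function names, the threshold schedule and the fixed loss values are all hypothetical.

```python
# Minimal sketch of hard-threshold self-paced sample selection.
# Assumption: "easy" (high-confidence) samples are those with a small loss;
# this is the classical formulation, not necessarily the FedSPL rule.

def self_paced_select(losses, threshold):
    """Return the indices of samples whose loss is below the threshold."""
    return [i for i, loss in enumerate(losses) if loss < threshold]


def self_paced_schedule(losses, start, growth, rounds):
    """Yield the selected indices for each round as the threshold grows.

    In a real training loop the losses would be recomputed after every
    round; they are held fixed here to keep the illustration short.
    """
    threshold = start
    for _ in range(rounds):
        yield self_paced_select(losses, threshold)
        threshold *= growth


# Hypothetical per-sample losses at one client; indices 1 and 4 look noisy.
losses = [0.1, 2.5, 0.4, 0.9, 3.0]
selected_per_round = list(
    self_paced_schedule(losses, start=0.5, growth=2.0, rounds=3)
)
```

With these made-up numbers the two high-loss samples are never admitted, which is the behaviour the abstract describes as dropping highly noisy samples.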


Subject(s)
Machine Learning , Privacy , Humans , Research Design
4.
BMC Cancer ; 24(1): 688, 2024 Jun 05.
Article in English | MEDLINE | ID: mdl-38840081

ABSTRACT

BACKGROUND: Multicenter non-small cell lung cancer (NSCLC) patient data is information-rich. However, its direct integration becomes exceptionally challenging due to constraints involving different healthcare organizations and regulations. Traditional centralized machine learning methods require centralizing these sensitive medical data for training, posing risks of patient privacy leakage and data security issues. In this context, federated learning (FL) has attracted much attention as a distributed machine learning framework. It effectively addresses this contradiction by preserving data locally, conducting local model training, and aggregating model parameters. This approach enables the utilization of multicenter data with maximum benefit while ensuring privacy safeguards. Based on pre-radiotherapy planning target volume images of NSCLC patients, a multicenter treatment response prediction model is designed by FL for predicting the probability of remission of NSCLC patients. This approach ensures medical data privacy, high prediction accuracy and computing efficiency, offering valuable insights for clinical decision-making. METHODS: We retrospectively collected CT images from 245 NSCLC patients undergoing chemotherapy and radiotherapy (CRT) in four Chinese hospitals. In a simulation environment, we compared the performance of the centralized deep learning (DL) model with that of the FL model using data from two sites. Additionally, due to the unavailability of data from one hospital, we established a real-world FL model using data from three sites. Assessments were conducted using measures such as accuracy, receiver operating characteristic curve, and confusion matrices. RESULTS: The model's prediction performance obtained using FL methods outperforms that of traditional centralized learning methods. 
In the comparative experiment, the DL model achieves an AUC of 0.718/0.695, while the FL model demonstrates an AUC of 0.725/0.689, with the real-world FL model achieving an AUC of 0.698/0.672. CONCLUSIONS: We demonstrate that the performance of an FL predictive model, developed by combining convolutional neural networks (CNNs) with data from multiple medical centers, is comparable to that of a traditional DL model obtained through centralized training. It can efficiently predict CRT treatment response in NSCLC patients while preserving privacy.
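
The abstract above describes FL as keeping data local, training local models and aggregating model parameters. A minimal sketch of the aggregation step follows, assuming sample-size-weighted averaging in the style of FedAvg. The abstract does not name the aggregation rule the study used, so this choice, the function name and the toy numbers are assumptions.

```python
# Minimal sketch of parameter aggregation by weighted averaging.
# Assumption: each site sends a flat list of parameters plus its sample
# count, and the server weights each site by its share of the samples.

def federated_average(client_weights, client_sizes):
    """Average parameter vectors, weighting each client by its sample count."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(n_params)
    ]


# Hypothetical parameters from three sites after one round of local training.
site_weights = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
site_sizes = [100, 100, 200]
global_weights = federated_average(site_weights, site_sizes)
```

Only the parameter lists and the counts leave each site in this scheme; the images themselves never do.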


Subject(s)
Carcinoma, Non-Small-Cell Lung , Lung Neoplasms , Carcinoma, Non-Small-Cell Lung/therapy , Carcinoma, Non-Small-Cell Lung/radiotherapy , Carcinoma, Non-Small-Cell Lung/pathology , Humans , Lung Neoplasms/therapy , Lung Neoplasms/pathology , Lung Neoplasms/radiotherapy , Retrospective Studies , Female , Male , Middle Aged , Deep Learning , Aged , Machine Learning , Tomography, X-Ray Computed , Treatment Outcome , Chemoradiotherapy/methods
5.
Stat Med ; 43(11): 2263-2279, 2024 May 20.
Article in English | MEDLINE | ID: mdl-38551130

ABSTRACT

Data sharing barriers present paramount challenges arising from multicenter clinical studies where multiple data sources are stored and managed in a distributed fashion at different local study sites. Merging such data sources into a common data storage for a centralized statistical analysis requires a data use agreement, which is often time-consuming. Data merging may become more burdensome when propensity score modeling is involved in the analysis because it requires combining many confounding variables, and the systematic incorporation of this additional modeling in a meta-analysis has not been thoroughly investigated in the literature. Motivated by a multicenter clinical trial of basal insulin treatment for reducing the risk of post-transplantation diabetes mellitus, we propose a new inference framework that avoids the merging of subject-level raw data from multiple sites at a centralized facility but needs only the sharing of summary statistics. Unlike the architecture of federated learning, the proposed collaborative inference does not need a center site to combine local results and thus enjoys maximal protection of data privacy and minimal sensitivity to unbalanced data distributions across data sources. We show theoretically and numerically that the new distributed inference approach has little loss of statistical power compared to the centralized method that requires merging the entire data. We present large-sample properties and algorithms for the proposed method. We illustrate its performance by simulation experiments and the motivating example on the differential average treatment effect of basal insulin to lower risk of diabetes among kidney-transplant patients compared to the standard-of-care.
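
To show why sharing summary statistics can be enough, here is a minimal sketch for a much simpler model than the one in the abstract above: a one-predictor least-squares fit. It is a generic illustration, not the authors' propensity-score framework. Each site is assumed to share only five sums, and all names and data values are hypothetical.

```python
# Minimal sketch: a pooled least-squares line recovered from per-site sums.
# Assumption: sites share n, sum(x), sum(y), sum(x*y), sum(x*x) and nothing
# else. Any participant can add these up; no subject-level rows are moved.

KEYS = ("n", "sx", "sy", "sxy", "sxx")


def local_summary(xs, ys):
    """Compute the five sums that a site would share."""
    return {
        "n": len(xs),
        "sx": sum(xs),
        "sy": sum(ys),
        "sxy": sum(x * y for x, y in zip(xs, ys)),
        "sxx": sum(x * x for x in xs),
    }


def combine(summaries):
    """Add the shared sums across sites."""
    return {k: sum(s[k] for s in summaries) for k in KEYS}


def slope_intercept(s):
    """Closed-form least-squares slope and intercept from the sums."""
    denom = s["n"] * s["sxx"] - s["sx"] ** 2
    slope = (s["n"] * s["sxy"] - s["sx"] * s["sy"]) / denom
    intercept = (s["sy"] - slope * s["sx"]) / s["n"]
    return slope, intercept


# Hypothetical data held at two sites.
site_a = ([1.0, 2.0, 3.0], [2.0, 4.1, 5.9])
site_b = ([4.0, 5.0], [8.2, 9.8])

distributed = slope_intercept(
    combine([local_summary(*site_a), local_summary(*site_b)])
)
pooled = slope_intercept(
    local_summary(site_a[0] + site_b[0], site_a[1] + site_b[1])
)
```

In this toy case the distributed estimate matches the pooled one up to floating-point rounding, because the least-squares solution depends on the data only through these sums.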


Subject(s)
Multicenter Studies as Topic , Humans , Information Dissemination , Diabetes Mellitus/therapy , Computer Simulation , Models, Statistical , Insulin/therapeutic use , Propensity Score , Treatment Outcome , Hypoglycemic Agents/therapeutic use
6.
Stat Med ; 43(12): 2421-2438, 2024 May 30.
Article in English | MEDLINE | ID: mdl-38589978

ABSTRACT

Identifying predictive factors for an outcome of interest via a multivariable analysis is often difficult when the data set is small. Combining data from different medical centers into a single (larger) database would alleviate this problem, but is in practice challenging due to regulatory and logistic problems. Federated learning (FL) is a machine learning approach that aims to construct from local inferences in separate data centers what would have been inferred had the data sets been merged. It seeks to harvest the statistical power of larger data sets without actually creating them. The FL strategy is not always efficient and precise. Therefore, in this paper we refine and implement an alternative Bayesian federated inference (BFI) framework for multicenter data with the same aim as FL. The BFI framework is designed to cope with small data sets by inferring locally not only the optimal parameter values, but also additional features of the posterior parameter distribution, capturing information beyond what is used in FL. BFI has the additional benefit that a single inference cycle across the centers is sufficient, whereas FL needs multiple cycles. We quantify the performance of the proposed methodology on simulated and real life data.


Subject(s)
Bayes Theorem , Models, Statistical , Multicenter Studies as Topic , Humans , Machine Learning , Computer Simulation , Data Interpretation, Statistical , Multivariate Analysis
7.
Methods ; 218: 94-100, 2023 10.
Article in English | MEDLINE | ID: mdl-37507060

ABSTRACT

In recent years, healthcare data from various sources such as clinical institutions, patients, and pharmaceutical industries have become increasingly abundant. However, due to the complex healthcare system and data privacy concerns, aggregating and utilizing these data in a centralized manner can be challenging. Federated learning (FL) has emerged as a promising solution for distributed training in edge computing scenarios, utilizing on-device user data while reducing server costs. In traditional FL, a central server trains a global model on randomly sampled client data and combines the models collected from different clients into one global model. However, for datasets that are not independent and identically distributed (non-i.i.d.), randomly selecting users to train the server model is not an optimal choice and can lead to poor model training performance. To address this limitation, we propose the Federated Multi-Center Clustering algorithm (FedMCC) to enhance the robustness and accuracy for all clients. FedMCC leverages the Model-Agnostic Meta-Learning (MAML) algorithm, focusing on training a robust base model during the initial training phase and better capturing features from different users. Subsequently, clustering methods are used to ensure that features among users within each cluster are similar, approximating an i.i.d. training process in each round, resulting in more effective training of the global model. We validate the effectiveness and generalizability of FedMCC through extensive experiments on public healthcare datasets. The results demonstrate that FedMCC achieves improved performance and accuracy for all clients while maintaining data privacy and security, showcasing its potential for various healthcare applications.


Subject(s)
Algorithms , Privacy , Humans , Cluster Analysis
8.
Methods ; 219: 1-7, 2023 11.
Article in English | MEDLINE | ID: mdl-37689121

ABSTRACT

With the increasing availability of large-scale QSAR (Quantitative Structure-Activity Relationship) datasets, collaborative analysis has become a promising approach for drug discovery. Traditional centralized analysis which typically concentrates data on a central server for training faces challenges such as data privacy and security. Distributed analysis such as federated learning offers a solution by enabling collaborative model training without sharing raw data. However, it may fail when the training data in the local devices are non-independent and identically distributed (non-IID). In this paper, we propose a novel framework for collaborative drug discovery using federated learning on non-IID datasets. We address the difficulty of training on non-IID data by globally sharing a small subset of data among all institutions. Our framework allows multiple institutions to jointly train a robust predictive model while preserving the privacy of their individual data. We leverage the federated learning paradigm to distribute the model training process across local devices, eliminating the need for data exchange. The experimental results on 15 benchmark datasets demonstrate that the proposed method achieves competitive predictive accuracy to centralized analysis while respecting data privacy. Moreover, our framework offers benefits such as reduced data transmission and enhanced scalability, making it suitable for large-scale collaborative drug discovery efforts.


Subject(s)
Benchmarking , Drug Discovery
9.
J Biomed Inform ; 155: 104661, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38806105

ABSTRACT

BACKGROUND: Establishing collaborations between cohort studies has been fundamental for progress in health research. However, such collaborations are hampered by heterogeneous data representations across cohorts and legal constraints to data sharing. The first arises from a lack of consensus in standards of data collection and representation across cohort studies and is usually tackled by applying data harmonization processes. The second is increasingly important due to raised awareness of privacy protection and stricter regulations, such as the GDPR. Federated learning has emerged as a privacy-preserving alternative to transferring data between institutions through analyzing data in a decentralized manner. METHODS: In this study, we set up a federated learning infrastructure for a consortium of nine Dutch cohorts with data relevant to the etiology of dementia, including an extract, transform, and load (ETL) pipeline for data harmonization. Additionally, we assessed the challenges of transforming and standardizing cohort data using the Observational Medical Outcomes Partnership (OMOP) common data model (CDM) and evaluated our tool in one of the cohorts employing federated algorithms. RESULTS: We successfully applied our ETL tool and observed a complete coverage of the cohorts' data by the OMOP CDM. The OMOP CDM facilitated the data representation and standardization, but we identified limitations for cohort-specific data fields and in the scope of the vocabularies available. Specific challenges arise in a multi-cohort federated collaboration due to technical constraints in local environments, data heterogeneity, and lack of direct access to the data. CONCLUSION: In this article, we describe the solutions to these challenges and limitations encountered in our study. Our study shows the potential of federated learning as a privacy-preserving solution for multi-cohort studies that enhance reproducibility and reuse of both data and analyses.


Subject(s)
Dementia , Humans , Netherlands , Cohort Studies , Algorithms , Information Dissemination/methods , Biomedical Research
10.
J Biomed Inform ; 149: 104532, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38070817

ABSTRACT

INTRODUCTION: Risk prediction, including early disease detection, prevention, and intervention, is essential to precision medicine. However, systematic bias in risk estimation caused by heterogeneity across different demographic groups can lead to inappropriate or misinformed treatment decisions. In addition, low incidence (class-imbalance) outcomes negatively impact the classification performance of many standard learning algorithms which further exacerbates the racial disparity issues. Therefore, it is crucial to improve the performance of statistical and machine learning models in underrepresented populations in the presence of heavy class imbalance. METHOD: To address demographic disparity in the presence of class imbalance, we develop a novel framework, Trans-Balance, by leveraging recent advances in imbalance learning, transfer learning, and federated learning. We consider a practical setting where data from multiple sites are stored locally under privacy constraints. RESULTS: We show that the proposed Trans-Balance framework improves upon existing approaches by explicitly accounting for heterogeneity across demographic subgroups and cohorts. We demonstrate the feasibility and validity of our methods through numerical experiments and a real application to a multi-cohort study with data from participants of four large, NIH-funded cohorts for stroke risk prediction. CONCLUSION: Our findings indicate that the Trans-Balance approach significantly improves predictive performance, especially in scenarios marked by severe class imbalance and demographic disparity. Given its versatility and effectiveness, Trans-Balance offers a valuable contribution to enhancing risk prediction in biomedical research and related fields.


Subject(s)
Algorithms , Biomedical Research , Humans , Cohort Studies , Machine Learning , Demography
11.
J Biomed Inform ; 150: 104595, 2024 02.
Article in English | MEDLINE | ID: mdl-38244958

ABSTRACT

OBJECTIVE: To characterize the interplay between multiple medical conditions across sites and account for the heterogeneity in patient population characteristics across sites within a distributed research network, we develop a one-shot algorithm that can efficiently utilize summary-level data from various institutions. By applying our proposed algorithm to a large pediatric cohort across four national Children's hospitals, we replicated a recently published prospective cohort, the RISK study, and quantified the impact of the risk factors associated with the penetrating or stricturing behaviors of pediatric Crohn's disease (PCD). METHODS: In this study, we introduce the ODACoRH algorithm, a one-shot distributed algorithm designed for the competing risks model with heterogeneity. Our approach considers the variability in baseline hazard functions of multiple endpoints of interest across different sites. To accomplish this, we build a surrogate likelihood function by combining patient-level data from the local site with aggregated data from other external sites. We validated our method through extensive simulation studies and replication of the RISK study to investigate the impact of risk factors on the PCD for adolescents and children from four children's hospitals within the PEDSnet, A National Pediatric Learning Health System. To evaluate our ODACoRH algorithm, we compared results from the ODACoRH algorithms with those from meta-analysis as well as those derived from the pooled data. RESULTS: The ODACoRH algorithm had the smallest relative bias to the gold standard method (-0.2%), outperforming the meta-analysis method (-11.4%). In the PCD association study, the estimated subdistribution hazard ratios obtained through the ODACoRH algorithms are on par with the results derived from pooled data, which demonstrates the high reliability of our federated learning algorithms. 
From a clinical standpoint, the identified risk factors for PCD align well with the RISK study published in the Lancet in 2017 and other published studies, supporting the validity of our findings. CONCLUSION: With the ODACoRH algorithm, we demonstrate the capability of effectively integrating data from multiple sites in a decentralized data setting while accounting for between-site heterogeneity. Importantly, our study reveals several crucial clinical risk factors for PCD that merit further investigations.


Subject(s)
Algorithms , Humans , Child , Adolescent , Reproducibility of Results , Computer Simulation , Proportional Hazards Models , Likelihood Functions
12.
J Biomed Inform ; 152: 104623, 2024 04.
Article in English | MEDLINE | ID: mdl-38458578

ABSTRACT

INTRODUCTION: Patients' functional status assesses their independence in performing activities of daily living, including basic ADLs (bADL), and more complex instrumental activities (iADL). Existing studies have discovered that patients' functional status is a strong predictor of health outcomes, particularly in older adults. Despite their usefulness, much of the functional status information is stored in electronic health records (EHRs) in either semi-structured or free text formats. This indicates the pressing need to leverage computational approaches such as natural language processing (NLP) to accelerate the curation of functional status information. In this study, we introduced FedFSA, a hybrid and federated NLP framework designed to extract functional status information from EHRs across multiple healthcare institutions. METHODS: FedFSA consists of four major components: 1) individual sites (clients) with their private local data, 2) a rule-based information extraction (IE) framework for ADL extraction, 3) a BERT model for functional status impairment classification, and 4) a concept normalizer. The framework was implemented using the OHNLP Backbone for rule-based IE and open-source Flower and PyTorch library for federated BERT components. For gold standard data generation, we carried out corpus annotation to identify functional status-related expressions based on ICF definitions. Four healthcare institutions were included in the study. To assess FedFSA, we evaluated the performance of category- and institution-specific ADL extraction across different experimental designs. RESULTS: ADL extraction performance ranges from an F1-score of 0.907 to 0.986 for bADL and 0.825 to 0.951 for iADL across the four healthcare sites. The performance for ADL extraction with impairment ranges from an F1-score of 0.722 to 0.954 for bADL and 0.674 to 0.813 for iADL across four healthcare sites. 
For category-specific ADL extraction, laundry and transferring yielded relatively high performance, while dressing, medication, bathing, and continence achieved moderate-high performance. Conversely, food preparation and toileting showed low performance. CONCLUSION: NLP performance varied across ADL categories and healthcare sites. Federated learning using a FedFSA framework performed higher than non-federated learning for impaired ADL extraction at all healthcare sites. Our study demonstrated the potential of the federated learning framework in functional status extraction and impairment classification in EHRs, exemplifying the importance of a large-scale, multi-institutional collaborative development effort.


Subject(s)
Activities of Daily Living , Functional Status , Humans , Aged , Learning , Information Storage and Retrieval , Natural Language Processing
13.
BMC Med Imaging ; 24(1): 105, 2024 May 10.
Article in English | MEDLINE | ID: mdl-38730390

ABSTRACT

Categorizing Artificial Intelligence of Medical Things (AIoMT) devices within the realm of standard Internet of Things (IoT) and Internet of Medical Things (IoMT) devices, particularly at the server and computational layers, poses a formidable challenge. In this paper, we present a novel methodology for categorizing AIoMT devices through the application of decentralized processing, referred to as "Federated Learning" (FL). Our approach involves deploying a system on standard IoT devices and labeled IoMT devices for training purposes and attribute extraction. Through this process, we extract and map the interconnected attributes from a global federated cum aggregation server. The aim of this terminology is to extract interdependent devices via federated learning, ensuring data privacy and adherence to operational policies. Consequently, a global training dataset repository is coordinated to establish a centralized indexing and synchronization knowledge repository. The categorization process employs generic labels for devices transmitting medical data through regular communication channels. We evaluate our proposed methodology across a variety of IoT, IoMT, and AIoMT devices, demonstrating effective classification and labeling. Our technique yields a reliable categorization index for facilitating efficient access and optimization of medical devices within global servers.


Subject(s)
Artificial Intelligence , Blockchain , Internet of Things , Humans
14.
BMC Med Imaging ; 24(1): 110, 2024 May 15.
Article in English | MEDLINE | ID: mdl-38750436

ABSTRACT

Brain tumor classification using MRI images is a crucial yet challenging task in medical imaging. Accurate diagnosis is vital for effective treatment planning but is often hindered by the complex nature of tumor morphology and variations in imaging. Traditional methodologies primarily rely on manual interpretation of MRI images, supplemented by conventional machine learning techniques. These approaches often lack the robustness and scalability needed for precise and automated tumor classification. The major limitations include a high degree of manual intervention, potential for human error, limited ability to handle large datasets, and lack of generalizability to diverse tumor types and imaging conditions. To address these challenges, we propose a federated learning-based deep learning model that leverages the power of Convolutional Neural Networks (CNN) for automated and accurate brain tumor classification. This innovative approach not only emphasizes the use of a modified VGG16 architecture optimized for brain MRI images but also highlights the significance of federated learning and transfer learning in the medical imaging domain. Federated learning enables decentralized model training across multiple clients without compromising data privacy, addressing the critical need for confidentiality in medical data handling. This model architecture benefits from the transfer learning technique by utilizing a pre-trained CNN, which significantly enhances its ability to classify brain tumors accurately by leveraging knowledge gained from vast and diverse datasets. Our model is trained on a diverse dataset combining figshare, SARTAJ, and Br35H datasets, employing a federated learning approach for decentralized, privacy-preserving model training. The adoption of transfer learning further bolsters the model's performance, making it adept at handling the intricate variations in MRI images associated with different types of brain tumors. 
The model demonstrates high precision (0.99 for glioma, 0.95 for meningioma, 1.00 for no tumor, and 0.98 for pituitary), recall, and F1-scores in classification, outperforming existing methods. The overall accuracy stands at 98%, showcasing the model's efficacy in classifying various tumor types accurately, thus highlighting the transformative potential of federated learning and transfer learning in enhancing brain tumor classification using MRI images.


Subject(s)
Brain Neoplasms , Deep Learning , Magnetic Resonance Imaging , Humans , Brain Neoplasms/diagnostic imaging , Brain Neoplasms/classification , Magnetic Resonance Imaging/methods , Neural Networks, Computer , Machine Learning , Image Interpretation, Computer-Assisted/methods
15.
BMC Med Imaging ; 24(1): 133, 2024 Jun 05.
Article in English | MEDLINE | ID: mdl-38840240

ABSTRACT

BACKGROUND: Breast cancer is the most common cancer among women, and ultrasound is a common tool for early screening. Nowadays, deep learning techniques are applied as an auxiliary tool to provide predictive results that help doctors decide whether to carry out further examinations or treatments. This study aimed to develop a hybrid learning approach for breast ultrasound classification by extracting more potential features from local and multi-center ultrasound data. METHODS: We proposed a hybrid learning approach to classify breast tumors as benign or malignant. Three multi-center datasets (BUSI, BUS, OASBUD) were used to pretrain a model by federated learning, then the model was fine-tuned locally on each dataset. The proposed model consisted of a convolutional neural network (CNN) and a graph neural network (GNN), aiming to extract features from images at a spatial level and from graphs at a geometric level. The input images are small-sized and free from pixel-level labels, and the input graphs are generated automatically in an unsupervised manner, which saves the costs of labor and memory space. RESULTS: The classification AUCROC of our proposed method is 0.911, 0.871 and 0.767 for BUSI, BUS and OASBUD. The balanced accuracy is 87.6%, 85.2% and 61.4% respectively. The results show that our method outperforms conventional methods. CONCLUSIONS: Our hybrid approach can learn the inter-feature among multi-center data and the intra-feature of local data. It shows potential in aiding doctors for breast tumor classification in ultrasound at an early stage.


Subject(s)
Breast Neoplasms , Deep Learning , Neural Networks, Computer , Ultrasonography, Mammary , Humans , Breast Neoplasms/diagnostic imaging , Female , Ultrasonography, Mammary/methods , Image Interpretation, Computer-Assisted/methods , Adult
16.
Eur Spine J ; 2024 Feb 25.
Article in English | MEDLINE | ID: mdl-38403832

ABSTRACT

PURPOSE: Integrating machine learning models into electronic medical record systems can greatly enhance decision-making, patient outcomes, and value-based care in healthcare systems. Challenges related to data accessibility, privacy, and sharing can impede the development and deployment of effective predictive models in spine surgery. Federated learning (FL) offers a decentralized approach to machine learning that allows local model training while preserving data privacy, making it well-suited for healthcare settings. Our objective was to describe federated learning solutions for enhanced predictive modeling in spine surgery. METHODS: The authors reviewed the literature. RESULTS: FL has promising applications in spine surgery, including telesurgery, AI-based prediction models, and medical image segmentation. Implementing FL requires careful consideration of infrastructure, data quality, and standardization, but it holds the potential to revolutionize orthopedic surgery while ensuring patient privacy and data control. CONCLUSIONS: Federated learning shows great promise in revolutionizing predictive modeling in spine surgery by addressing the challenges of data privacy, accessibility, and sharing. The applications of FL in telesurgery, AI-based predictive models, and medical image segmentation have demonstrated their potential to enhance patient outcomes and value-based care.

17.
Proc Natl Acad Sci U S A ; 118(17)2021 04 27.
Article in English | MEDLINE | ID: mdl-33888586

ABSTRACT

Federated learning (FL) enables edge devices, such as Internet of Things devices (e.g., sensors), servers, and institutions (e.g., hospitals), to collaboratively train a machine learning (ML) model without sharing their private data. FL requires devices to exchange their ML parameters iteratively, and thus the time it requires to jointly learn a reliable model depends not only on the number of training steps but also on the ML parameter transmission time per step. In practice, FL parameter transmissions are often carried out by a multitude of participating devices over resource-limited communication networks, for example, wireless networks with limited bandwidth and power. Therefore, the repeated FL parameter transmission from edge devices induces a notable delay, which can be larger than the ML model training time by orders of magnitude. Hence, communication delay constitutes a major bottleneck in FL. Here, a communication-efficient FL framework is proposed to jointly improve the FL convergence time and the training loss. In this framework, a probabilistic device selection scheme is designed such that the devices that can significantly improve the convergence speed and training loss have higher probabilities of being selected for ML model transmission. To further reduce the FL convergence time, a quantization method is proposed to reduce the volume of the model parameters exchanged among devices, and an efficient wireless resource allocation scheme is developed. Simulation results show that the proposed FL framework can improve the identification accuracy and convergence time by up to 3.6% and 87% compared to standard FL.
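The two ideas named in this abstract, probabilistic device selection and parameter quantization, can be illustrated with a minimal sketch. The abstract does not give the actual selection probabilities or quantizer, so everything below is an assumption for illustration only: `select_devices` samples devices in proportion to a hypothetical positive usefulness score, and `stochastic_quantize` is a generic unbiased uniform quantizer, not the method of the paper.

```python
import random


def stochastic_quantize(values, num_levels=16, rng=random):
    """Map each value to one of num_levels evenly spaced levels in [min, max].

    Rounding up or down is randomized in proportion to the distance to each
    neighbouring level, so the quantized value is unbiased in expectation.
    (Generic illustration; not the quantizer from the paper.)
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return list(values)
    step = (hi - lo) / (num_levels - 1)
    quantized = []
    for v in values:
        pos = (v - lo) / step
        lower = int(pos)  # pos >= 0, so int() acts as floor
        frac = pos - lower
        idx = lower + (1 if rng.random() < frac else 0)
        idx = min(idx, num_levels - 1)  # guard against float round-off
        quantized.append(lo + idx * step)
    return quantized


def select_devices(scores, k, rng=random):
    """Sample k distinct device indices, each draw proportional to its score.

    `scores` is a hypothetical per-device usefulness estimate and must be
    positive; how such a score is computed is not stated in the abstract.
    """
    remaining = list(range(len(scores)))
    chosen = []
    for _ in range(min(k, len(remaining))):
        weights = [scores[i] for i in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)
    return chosen


if __name__ == "__main__":
    rng = random.Random(0)
    print(select_devices([1.0, 2.0, 3.0, 4.0], k=2, rng=rng))
    print(stochastic_quantize([0.0, 0.3, 0.7, 1.0], num_levels=3, rng=rng))
```

With fewer levels, each transmitted parameter needs fewer bits, which is the sense in which quantization reduces the volume of data exchanged per round.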

18.
BMC Med Inform Decis Mak ; 24(1): 141, 2024 May 27.
Article in English | MEDLINE | ID: mdl-38802861

ABSTRACT

BACKGROUND: Acute pulmonary thromboembolism (PTE) is a common cardiovascular disease, and accurately recognizing PTE patients at low prognostic risk is important for clinical treatment. This study evaluated the value of federated learning (FL) technology in PTE prognosis risk assessment while ensuring the security of clinical data. METHODS: A retrospective dataset of PTE patients from 12 hospitals was collected, and 19 physical indicators of patients were included to train the FL-based prognosis assessment model to predict the 30-day death event. First, multiple FL-based machine learning methods were compared to choose the superior model. Then, the performance of models trained on independent and identically distributed (IID) and non-independent and identically distributed (Non-IID) datasets was calculated, and the models were further tested on Real-world data. In addition, the optimal model was compared with the pulmonary embolism severity index (PESI), simplified PESI (sPESI), and Peking Union Medical College Hospital (PUMCH). RESULTS: The area under the receiver operating characteristic curve (AUC) of logistic regression (0.842) outperformed that of the convolutional neural network (0.819) and the multilayer perceptron (0.784). Under IID, the AUC of the model trained using FL (Fed) on the training, validation and test sets was 0.852 ± 0.002, 0.867 ± 0.012 and 0.829 ± 0.004. Under Real-world, the AUC of Fed was 0.855 ± 0.005, 0.882 ± 0.003 and 0.835 ± 0.005. Under IID and Real-world, the AUC of Fed surpassed that of the centralized model (NonFed) (0.847 ± 0.001, 0.841 ± 0.001 and 0.811 ± 0.001). Under Non-IID, although the AUC of Fed (0.846 ± 0.047) outperformed NonFed (0.841 ± 0.001) on the validation set, it (0.821 ± 0.016 and 0.799 ± 0.031) slightly lagged behind NonFed (0.847 ± 0.001 and 0.811 ± 0.001) on the training and test sets. In practice, the AUC of Fed (0.853, 0.884 and 0.842) exceeded that of PESI (0.812, 0.789 and 0.791), sPESI (0.817, 0.770 and 0.786) and PUMCH (0.848, 0.814 and 0.832) on the training, validation and test sets. Additionally, Fed (0.842) exhibited higher AUC values across test sets than models trained directly on the clients (0.758, 0.801, 0.783, 0.741, 0.788). CONCLUSIONS: In this study, the FL-based machine learning model demonstrated commendable efficacy in PTE prognostic risk prediction, making it well suited for deployment in hospitals.
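The abstract does not state how the hospitals' models were combined. As a hedged illustration only, the sketch below assumes the common federated averaging (FedAvg) rule, in which each client's logistic regression coefficients are averaged with weights proportional to its number of samples. The function name `fedavg` and the example numbers are hypothetical.

```python
def fedavg(client_weights, client_sizes):
    """Average per-client coefficient vectors, weighted by client sample count.

    client_weights: list of equal-length coefficient lists, one per client.
    client_sizes:   number of local training samples at each client.
    (Assumed aggregation rule; the study's actual rule is not given.)
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(dim)
    ]


if __name__ == "__main__":
    # Two hypothetical hospitals with 1 and 3 patients respectively.
    print(fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3]))  # [2.5, 3.5]
```

Only the coefficient vectors and sample counts leave each site in this scheme; the patient records themselves stay local.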


Subject(s)
Machine Learning , Pulmonary Embolism , Humans , Prognosis , Male , Female , Middle Aged , Retrospective Studies , Risk Assessment , Aged , Acute Disease
19.
J Assist Reprod Genet ; 41(7): 1811-1820, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38834757

ABSTRACT

PURPOSE: To study the effectiveness of federated learning in in vitro fertilization on embryo evaluation tasks. METHODS: This is a retrospective cohort analysis. Two datasets were used in this study. The ploidy status dataset consisted of 10,065 embryo records, 3760 treatments, and 2479 infertile couples from 5 hospitals. The clinical pregnancy dataset consisted of 4495 embryo records, 4495 treatments, and 3704 infertile couples from 4 hospitals. Federated learning and the gradient boosting decision tree algorithm were utilized for modeling. RESULTS: On the ploidy status dataset, the areas under the receiver operating characteristic curves of our model trained with federated learning were 71.78%, 73.10%, 69.39%, 69.72%, and 73.46% for 5 hospitals respectively, showing an average increase of 2.5% compared to those of our model trained without federated learning. On the clinical pregnancy dataset, the areas under the receiver operating characteristic curves of our model trained with federated learning were 72.03%, 56.77%, 61.63%, and 58.58% for 4 hospitals respectively, showing an average increase of 3.08%. CONCLUSIONS: Federated learning can improve data privacy and data security and meanwhile improve the performance of embryo selection tasks by leveraging data from multiple sources. This study demonstrates the effectiveness of federated learning in embryo evaluation, and the results show the promise for future application.


Subject(s)
Fertilization in Vitro , Humans , Fertilization in Vitro/methods , Female , Pregnancy , Male , Retrospective Studies , Embryo Transfer/methods , Adult , ROC Curve , Algorithms
20.
Pattern Recognit ; 151, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38559674

ABSTRACT

Machine learning in medical imaging often faces a fundamental dilemma, namely, the small sample size problem. Many recent studies suggest using multi-domain data pooled from different acquisition sites/centers to improve statistical power. However, medical images from different sites cannot be easily shared to build large datasets for model training due to privacy protection reasons. As a promising solution, federated learning, which enables collaborative training of machine learning models based on data from different sites without cross-site data sharing, has attracted considerable attention recently. In this paper, we conduct a comprehensive survey of the recent development of federated learning methods in medical image analysis. We have systematically gathered research papers on federated learning and its applications in medical image analysis published between 2017 and 2023. Our search and compilation were conducted using databases from IEEE Xplore, ACM Digital Library, Science Direct, Springer Link, Web of Science, Google Scholar, and PubMed. In this survey, we first introduce the background of federated learning for dealing with privacy protection and collaborative learning issues. We then present a comprehensive review of recent advances in federated learning methods for medical image analysis. Specifically, existing methods are categorized based on three critical aspects of a federated learning system, including client end, server end, and communication techniques. In each category, we summarize the existing federated learning methods according to specific research problems in medical image analysis and also provide insights into the motivations of different approaches. In addition, we provide a review of existing benchmark medical imaging datasets and software platforms for current federated learning research. We also conduct an experimental study to empirically evaluate typical federated learning methods for medical image analysis. This survey can help to better understand the current research status, challenges, and potential research opportunities in this promising research field.
