Search | Brasil - Virtual Health Library

An international study presenting a federated learning AI platform for pediatric brain tumors.

Lee, Edward H; Han, Michelle; Wright, Jason; Kuwabara, Michael; Mevorach, Jacob; Fu, Gang; Choudhury, Olivia; Ratan, Ujjwal; Zhang, Michael; Wagner, Matthias W; Goetti, Robert; Toescu, Sebastian; Perreault, Sebastien; Dogan, Hakan; Altinmakas, Emre; Mohammadzadeh, Maryam; Szymanski, Kathryn A; Campen, Cynthia J; Lai, Hollie; Eghbal, Azam; Radmanesh, Alireza; Mankad, Kshitij; Aquilina, Kristian; Said, Mourad; Vossough, Arastoo; Oztekin, Ozgur; Ertl-Wagner, Birgit; Poussaint, Tina; Thompson, Eric M; Ho, Chang Y; Jaju, Alok; Curran, John; Ramaswamy, Vijay; Cheshier, Samuel H; Grant, Gerald A; Wong, S Simon; Moseley, Michael E; Lober, Robert M; Wilms, Mattias; Forkert, Nils D; Vitanza, Nicholas A; Miller, Jeffrey H; Prolo, Laura M; Yeom, Kristen W.

Nat Commun ; 15(1): 7615, 2024 Sep 02.

Article in English | MEDLINE | ID: mdl-39223133

ABSTRACT

While multiple factors impact disease, artificial intelligence (AI) studies in medicine often use small, non-diverse patient cohorts due to data sharing and privacy issues. Federated learning (FL) has emerged as a solution, enabling training across hospitals without direct data sharing. Here, we present FL-PedBrain, an FL platform for pediatric posterior fossa brain tumors, and evaluate its performance on a diverse, realistic, multi-center cohort. Pediatric brain tumors were targeted due to the scarcity of such datasets, even in tertiary care hospitals. Our platform orchestrates federated training for joint tumor classification and segmentation across 19 international sites. FL-PedBrain exhibits less than a 1.5% decrease in classification and a 3% reduction in segmentation performance compared to centralized data training. FL boosts segmentation performance by 20 to 30% on three external, out-of-network sites. Finally, we explore the sources of data heterogeneity and examine FL robustness in real-world scenarios with data imbalances.
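
The abstract does not say how FL-PedBrain combines the models trained at each site. As a generic illustration of training across hospitals without pooling data, the sketch below uses sample-size-weighted parameter averaging (the common FedAvg-style rule). The function name `federated_average` and the flat parameter lists are assumptions made for this example, not part of the platform.

```python
from typing import List, Sequence


def federated_average(
    site_params: Sequence[Sequence[float]],
    site_sizes: Sequence[int],
) -> List[float]:
    """Combine per-site model parameters into one global parameter vector.

    Each site trains locally and shares only its parameters and its sample
    count; no patient-level data leaves the site. This is a generic
    weighted mean, assumed for illustration only.
    """
    if not site_params or len(site_params) != len(site_sizes):
        raise ValueError("need one sample count per site")
    total = sum(site_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(site_params[0])
    if any(len(p) != dim for p in site_params):
        raise ValueError("all sites must share the same parameter shape")
    # Weight each site's parameters by its share of the total sample count.
    return [
        sum(p[i] * n for p, n in zip(site_params, site_sizes)) / total
        for i in range(dim)
    ]


if __name__ == "__main__":
    # Two hypothetical sites: the second holds three times as many cases.
    print(federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

Weighting by sample count keeps a small site from pulling the global model as strongly as a large one. Whether that is desirable under the data imbalances the abstract mentions is a design choice.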

Subject(s)

Artificial Intelligence; Brain Neoplasms; Humans; Child; Brain Neoplasms/diagnostic imaging; Brain Neoplasms/pathology; Adolescent; Female; Male; Child, Preschool; Information Dissemination/methods

Highly Accurate and Efficient Data-Driven Methods for Genotype Imputation.

Choudhury, Olivia; Chakrabarty, Ankush; Emrich, Scott J.

IEEE/ACM Trans Comput Biol Bioinform ; 16(4): 1107-1116, 2019.

Article in English | MEDLINE | ID: mdl-28574365

ABSTRACT

High-throughput sequencing techniques have generated massive quantities of genotype data. Haplotype phasing has proven to be a useful and effective method for analyzing these data. However, the quality of phasing is undermined due to missing information. Imputation provides an effective means of improving the underlying genotype information. For model organisms, imputation can rely on an available reference genotype panel and a physical or genetic map. For non-model organisms, which often do not have a genotype panel, it is important to design an imputation technique that does not rely on reference data. Here, we present Accurate Data-Driven Imputation Technique (ADDIT), which is composed of two data-driven algorithms capable of handling data generated from model and non-model organisms. The non-model variant of ADDIT (referred to as ADDIT-NM) employs statistical inference methods to impute missing genotypes, whereas the model variant (referred to as ADDIT-M) leverages a supervised learning-based approach for imputation. We demonstrate that both variants of ADDIT are more accurate, faster, and require less memory than leading state-of-the-art imputation tools using model (human) and non-model (maize, apple, and grape) genotype data. Software Availability: The source code of ADDIT and test data sets are available at https://github.com/NDBL/ADDIT.
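
The abstract does not describe the statistical inference used by ADDIT-NM. To make "imputation without a reference panel" concrete, the sketch below shows a deliberately naive baseline: each missing genotype is filled with the most frequent genotype observed at the same marker. This is not ADDIT. The 0/1/2 genotype encoding, the `None` marker for missing calls, and the function name are assumptions for this example.

```python
from collections import Counter
from typing import List, Optional

# Assumed encoding: 0, 1, 2 = count of alternate alleles; None = missing call.
Genotype = Optional[int]


def impute_by_marker_mode(matrix: List[List[Genotype]]) -> List[List[Genotype]]:
    """Fill missing genotypes with the per-marker mode (rows = samples).

    A naive reference-free baseline, shown only for illustration.
    Markers with no observed genotype are left missing.
    """
    if not matrix:
        return []
    imputed = [row[:] for row in matrix]  # do not modify the input
    for j in range(len(matrix[0])):
        observed = [row[j] for row in matrix if row[j] is not None]
        if not observed:
            continue  # nothing to infer from at this marker
        counts = Counter(observed)
        top = max(counts.values())
        # Deterministic tie-break: prefer the smallest genotype code.
        fill = min(g for g, c in counts.items() if c == top)
        for row in imputed:
            if row[j] is None:
                row[j] = fill
    return imputed


if __name__ == "__main__":
    samples = [[0, None, 2], [0, 1, None], [None, 1, None]]
    print(impute_by_marker_mode(samples))
```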

Subject(s)

Computational Biology/methods; Genetic Techniques; Genotype; Algorithms; Genomics/methods; Genotyping Techniques; Haplotypes; High-Throughput Nucleotide Sequencing; Humans; Malus/genetics; Models, Statistical; Polymorphism, Single Nucleotide; Reproducibility of Results; Software; Vitis/genetics; Zea mays/genetics

Predicting Adverse Drug Reactions on Distributed Health Data using Federated Learning.

Choudhury, Olivia; Park, Yoonyoung; Salonidis, Theodoros; Gkoulalas-Divanis, Aris; Sylla, Issa; Das, Amar K.

AMIA Annu Symp Proc ; 2019: 313-322, 2019.

Article in English | MEDLINE | ID: mdl-32308824

ABSTRACT

Using electronic health data to predict adverse drug reaction (ADR) incurs practical challenges, such as lack of adequate data from any single site for rare ADR detection, resource constraints on integrating data from multiple sources, and privacy concerns with creating a centralized database from person-specific, sensitive data. We introduce a federated learning framework that can learn a global ADR prediction model from distributed health data held locally at different sites. We propose two novel methods of local model aggregation to improve the predictive capability of the global model. Through comprehensive experimental evaluation using real-world health data from 1 million patients, we demonstrate the effectiveness of our proposed approach in achieving comparable performance to centralized learning and outperforming localized learning models for two types of ADRs. We also demonstrate that, for varying data distributions, our aggregation methods outperform state-of-the-art techniques, in terms of precision, recall, and accuracy.
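
The comparison above is reported in terms of precision, recall, and accuracy. As a reminder of how these relate for a binary outcome such as "ADR / no ADR", the sketch below computes all three from true and predicted labels. The function name and the 0/1 label encoding are assumptions for this example, and the data are made up.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Return precision, recall, and accuracy for 0/1 labels (1 = positive)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label sequences must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    return {
        # Of the predicted positives, how many were correct.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Of the actual positives, how many were found.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        # Share of all predictions that were correct.
        "accuracy": (tp + tn) / len(pairs),
    }


if __name__ == "__main__":
    print(binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

For rare events such as the ADRs discussed here, accuracy alone can look high even when recall is poor, which is one reason to report all three.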

Subject(s)

Adverse Drug Reaction Reporting Systems; Drug-Related Side Effects and Adverse Reactions; Electronic Health Records; Machine Learning; Databases, Factual; Humans; Logistic Models; Support Vector Machine

HECIL: A Hybrid Error Correction Algorithm for Long Reads with Iterative Learning.

Choudhury, Olivia; Chakrabarty, Ankush; Emrich, Scott J.

Sci Rep ; 8(1): 9936, 2018 Jul 02.

Article in English | MEDLINE | ID: mdl-29967328

ABSTRACT

Second-generation DNA sequencing techniques generate short reads that can result in fragmented genome assemblies. Third-generation sequencing platforms mitigate this limitation by producing longer reads that span across complex and repetitive regions. However, the usefulness of such long reads is limited because of high sequencing error rates. To exploit the full potential of these longer reads, it is imperative to correct the underlying errors. We propose HECIL (Hybrid Error Correction with Iterative Learning), a hybrid error correction framework that determines a correction policy for erroneous long reads, based on optimal combinations of decision weights obtained from short read alignments. We demonstrate that HECIL outperforms state-of-the-art error correction algorithms for an overwhelming majority of evaluation metrics on diverse, real-world data sets including E. coli, S. cerevisiae, and the malaria vector mosquito A. funestus. Additionally, we provide an optional avenue of improving the performance of HECIL's core algorithm by introducing an iterative learning paradigm that enhances the correction policy at each iteration by incorporating knowledge gathered from previous iterations via data-driven confidence metrics assigned to prior corrections.
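
The abstract does not spell out HECIL's correction policy or how its decision weights are computed. As a much simpler stand-in for hybrid correction in general, the sketch below lets aligned short reads cast weighted votes at each long-read position and substitutes a base only when the winning alternative has enough support. The gapless, substitution-only alignments, the per-alignment weights, and the `min_support` threshold are all assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical alignment record: (start offset on the long read,
# short-read sequence, weight of this alignment). Gapless by assumption.
Alignment = Tuple[int, str, float]


def correct_long_read(
    long_read: str,
    alignments: List[Alignment],
    min_support: float = 2.0,
) -> str:
    """Substitute long-read bases that weighted short-read votes contradict."""
    votes: List[Dict[str, float]] = [defaultdict(float) for _ in long_read]
    for start, seq, weight in alignments:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(long_read):
                votes[pos][base] += weight
    corrected = []
    for original, tally in zip(long_read, votes):
        if tally:
            base, support = max(tally.items(), key=lambda kv: kv[1])
            # Change the base only if a different base has enough support.
            if base != original and support >= min_support:
                corrected.append(base)
                continue
        corrected.append(original)
    return "".join(corrected)


if __name__ == "__main__":
    noisy = "ACGTTCGA"  # position 4 is a simulated error
    short_reads = [(2, "GTAC", 1.0), (3, "TACG", 1.0), (4, "ACGA", 1.0)]
    print(correct_long_read(noisy, short_reads))
```

A real corrector also has to handle insertions and deletions, which dominate long-read error profiles. This sketch ignores them to keep the voting idea visible.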

Subject(s)

High-Throughput Nucleotide Sequencing/methods; Machine Learning; Sequence Analysis, DNA/methods; Escherichia coli/genetics; Mosquito Vectors/genetics; Repetitive Sequences, Nucleic Acid; Saccharomyces cerevisiae/genetics

High-quality genetic mapping with ddRADseq in the non-model tree Quercus rubra.

Konar, Arpita; Choudhury, Olivia; Bullis, Rebecca; Fiedler, Lauren; Kruser, Jacqueline M; Stephens, Melissa T; Gailing, Oliver; Schlarbaum, Scott; Coggeshall, Mark V; Staton, Margaret E; Carlson, John E; Emrich, Scott; Romero-Severson, Jeanne.

BMC Genomics ; 18(1): 417, 2017 May 30.

Article in English | MEDLINE | ID: mdl-28558688

ABSTRACT

BACKGROUND: Restriction site associated DNA sequencing (RADseq) has the potential to be a broadly applicable, low-cost approach for high-quality genetic linkage mapping in forest trees lacking a reference genome. The statistical inference of linear order must be as accurate as possible for the correct ordering of sequence scaffolds and contigs to chromosomal locations. Accurate maps also facilitate the discovery of chromosome segments containing allelic variants conferring resistance to the biotic and abiotic stresses that threaten forest trees worldwide. We used ddRADseq for genetic mapping in the tree Quercus rubra, with an approach optimized to produce a high-quality map. Our study design also enabled us to model the results we would have obtained with less depth of coverage.

RESULTS: Our sequencing design produced a high sequencing depth in the parents (248×) and a moderate sequencing depth (15×) in the progeny. The digital normalization method of generating a de novo reference and the SAMtools SNP variant caller yielded the most SNP calls (78,725). The major drivers of map inflation were multiple SNPs located within the same sequence (77% of SNPs called). The highest quality map was generated with a low level of missing data (5%) and a genome-wide threshold of 0.025 for deviation from Mendelian expectation. The final map included 849 SNP markers (1.8% of the 78,725 SNPs called). Downsampling the individual FASTQ files to model lower depth of coverage revealed that sequencing the progeny using 96 samples per lane would have yielded too few SNP markers to generate a map, even if we had sequenced the parents at depth 248×.

CONCLUSIONS: The ddRADseq technology produced enough high-quality SNP markers to make a moderately dense, high-quality map. The success of this project was due to high depth of coverage of the parents, moderate depth of coverage of the progeny, a good framework map, an optimized bioinformatics pipeline, and rigorous premapping filters. The ddRADseq approach is useful for the construction of high-quality genetic maps in organisms lacking a reference genome if the parents and progeny are sequenced at sufficient depth. Technical improvements in reduced representation sequencing (RRS) approaches are needed to reduce the amount of missing data.
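
The premapping filters mentioned above (a cap on missing data and a threshold for deviation from Mendelian expectation) can be sketched as follows. The abstract does not say how the 0.025 threshold was applied. This example assumes it is a p-value cutoff on a chi-square goodness-of-fit test against a 1:1 segregation ratio, and that the 5% figure is a per-marker limit on missing calls. Both are assumptions made for illustration, as are the function names.

```python
import math


def segregation_p_value(count_a: int, count_b: int) -> float:
    """P-value of a chi-square test (1 df) against an assumed 1:1 ratio."""
    total = count_a + count_b
    if total == 0:
        raise ValueError("no genotype calls for this marker")
    expected = total / 2
    chi2 = ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected
    # For 1 degree of freedom, the chi-square survival function
    # equals erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def keep_marker(
    count_a: int,
    count_b: int,
    n_missing: int,
    p_threshold: float = 0.025,  # assumed reading of the abstract's threshold
    max_missing: float = 0.05,   # assumed per-marker missing-data limit
) -> bool:
    """Keep a marker only if it passes both premapping filters."""
    n_total = count_a + count_b + n_missing
    if n_total == 0 or n_missing / n_total > max_missing:
        return False
    return segregation_p_value(count_a, count_b) >= p_threshold


if __name__ == "__main__":
    print(keep_marker(60, 40, 0))   # mild distortion
    print(keep_marker(80, 20, 0))   # strong distortion
    print(keep_marker(50, 50, 10))  # too many missing calls
```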

Subject(s)

Chromosome Mapping/methods; DNA Restriction Enzymes/metabolism; Quercus/genetics; Sequence Analysis, DNA; Genotyping Techniques; Polymorphism, Single Nucleotide

An integrated pathway system modeling of Saccharomyces cerevisiae HOG pathway: a Petri net based approach.

Tomar, Namrata; Choudhury, Olivia; Chakrabarty, Ankush; De, Rajat K.

Mol Biol Rep ; 40(2): 1103-25, 2013 Feb.

Article in English | MEDLINE | ID: mdl-23086300

ABSTRACT

Biochemical networks comprise many diverse components and the interactions between them. They include intracellular signaling, metabolic, and gene regulatory pathways that are highly integrated and whose responses are elicited by extracellular actions. Previous modeling techniques mostly consider each pathway independently, without focusing on the interrelations among them, although together they function as a single system. In this paper, we propose an approach to modeling an integrated pathway using an event-driven modeling tool, Petri nets (PNs). PNs can simulate the dynamics of the system with high levels of accuracy. The integrated set of signaling, regulatory, and metabolic reactions involved in the Saccharomyces cerevisiae HOG pathway was collected from the literature. Kinetic parameter values were used for transition firings. The dynamics of the system were simulated, and the concentrations of major biological species over time were observed. The phenotypic characteristics of the integrated system were investigated under two conditions: the absence and the presence of osmotic pressure. The results were validated against existing experimental results and agree favorably with them. We also compared our study with the idFBA study (Lee et al., PLoS Comput Biol 4:e1000-e1086, 2008) and pointed out the differences between the two. We simulated and monitored the concentrations of multiple biological entities over time and also incorporated feedback inhibition by Ptp2, which was not included in the idFBA study. To the best of our knowledge, our study is the first to model signaling, metabolic, and regulatory events in an integrated form through a PN model framework. This study is useful for the computational simulation of system dynamics for integrated pathways, as there is growing evidence that malfunctioning of the interplay among these pathways is associated with disease.
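
For readers unfamiliar with Petri nets, the sketch below shows the basic firing rule: a transition is enabled when every input place holds at least as many tokens as the arc weight, and firing it moves tokens from input places to output places. It is an untimed toy, so the kinetic parameters the study used for transition firings are not modeled. The place names, token counts, and arc weights are hypothetical and are not taken from the paper's model.

```python
from typing import Dict

Marking = Dict[str, int]  # tokens currently held by each place


class Transition:
    """A transition with weighted input and output arcs."""

    def __init__(self, name: str, inputs: Dict[str, int], outputs: Dict[str, int]):
        self.name = name
        self.inputs = inputs
        self.outputs = outputs


def is_enabled(marking: Marking, t: Transition) -> bool:
    """Enabled when each input place holds at least the arc weight."""
    return all(marking.get(place, 0) >= w for place, w in t.inputs.items())


def fire(marking: Marking, t: Transition) -> Marking:
    """Return the marking reached by firing t; the input marking is unchanged."""
    if not is_enabled(marking, t):
        raise ValueError(f"transition {t.name!r} is not enabled")
    new_marking = dict(marking)
    for place, w in t.inputs.items():
        new_marking[place] = new_marking.get(place, 0) - w  # consume tokens
    for place, w in t.outputs.items():
        new_marking[place] = new_marking.get(place, 0) + w  # produce tokens
    return new_marking


if __name__ == "__main__":
    # Toy net: a kinase cycling between two states (hypothetical numbers).
    phosphorylate = Transition("phosphorylate", {"Hog1": 1}, {"Hog1_P": 1})
    dephosphorylate = Transition("dephosphorylate", {"Hog1_P": 1}, {"Hog1": 1})
    state = {"Hog1": 2, "Hog1_P": 0}
    print(is_enabled(state, dephosphorylate))
    state = fire(state, phosphorylate)
    print(state)
```

Adding timing or rates on top of this rule, for example firing delays derived from kinetic parameters, is what turns the token game into a dynamic simulation.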

Subject(s)

Computer Simulation; Mitogen-Activated Protein Kinases/physiology; Models, Biological; Saccharomyces cerevisiae Proteins/physiology; Saccharomyces cerevisiae/physiology; Feedback, Physiological; Gene Expression Regulation, Fungal; Gene Regulatory Networks; Metabolic Networks and Pathways; Osmotic Pressure; Phosphoric Monoester Hydrolases/physiology; Signal Transduction; Stress, Physiological; Water-Electrolyte Balance
