Search | VHL Regional Portal

1.

Using DeepSignalingFlow to mine signaling flows interpreting mechanism of synergy of cocktails.

Zhang, Heming; Chen, Yixin; Payne, Philip; Li, Fuhai.

NPJ Syst Biol Appl ; 10(1): 92, 2024 Aug 21.

Article in English | MEDLINE | ID: mdl-39169016

ABSTRACT

Complex signaling pathways are believed to be responsible for drug resistance. Drug combinations perturbing multiple signaling targets have the potential to reduce drug resistance. The large-scale multi-omic datasets and experimental drug combination synergistic score data are valuable resources to study mechanisms of synergy (MoS) to guide the development of precision drug combinations. However, signaling patterns of MoS are complex and remain unclear, and thus it is challenging to identify synergistic drug combinations in the clinic. Herein, we proposed a novel integrative and interpretable graph AI model, DeepSignalingFlow, to uncover the MoS by integrating and mining multi-omic data. The major innovation is that we uncover MoS by modeling the signaling flow from multi-omic features of essential disease proteins to the drug targets, which has not been introduced by the existing models. The model performance was assessed utilizing four distinct drug combination synergy evaluation datasets, i.e., NCI ALMANAC, O'Neil, DrugComb, and DrugCombDB. The comparison results showed that the proposed model outperformed existing graph AI models in terms of synergy score prediction, and can interpret MoS using the core signaling flows. The code is publicly accessible via GitHub: https://github.com/FuhaiLiAiLab/DeepSignalingFlow.

Subject(s)

Drug Synergism, Signal Transduction, Signal Transduction/drug effects, Signal Transduction/physiology, Humans, Computational Biology/methods

2.

mosGraphGPT: a foundation model for multi-omic signaling graphs using generative AI.

Zhang, Heming; Huang, Di; Chen, Emily; Cao, Dekang; Xu, Tim; Dizdar, Ben; Li, Guangfu; Chen, Yixin; Payne, Philip; Province, Michael; Li, Fuhai.

bioRxiv ; 2024 Aug 06.

Article in English | MEDLINE | ID: mdl-39149314

ABSTRACT

Generative pretrained models represent a significant advancement in natural language processing and computer vision; they can generate coherent and contextually relevant content after pre-training on large general datasets and fine-tuning for specific tasks. Building foundation models using large-scale omic data is a promising way to decode and understand the complex signaling language patterns within cells. Different from existing foundation models of omic data, we build a foundation model, mosGraphGPT, for multi-omic signaling (mos) graphs, in which the multi-omic data are integrated and interpreted using a multi-level signaling graph. The model was pretrained using multi-omic data of cancers in The Cancer Genome Atlas (TCGA), and fine-tuned for multi-omic data of Alzheimer's Disease (AD). The experimental evaluation results showed that the model not only improves disease classification accuracy but is also interpretable, uncovering disease targets and signaling interactions. The model code is available on GitHub: https://github.com/mosGraph/mosGraphGPT.

3.

Meeting the Artificial Intelligence Needs of U.S. Health Systems.

Lyons, Patrick G; Dorr, David A; Melton, Genevieve B; Singh, Karandeep; Payne, Philip R O.

Ann Intern Med ; 2024 Aug 27.

Article in English | MEDLINE | ID: mdl-39186786

4.

Analyzing heterogeneity in Alzheimer Disease using multimodal normative modeling on imaging-based ATN biomarkers.

Kumar, Sayantan; Earnest, Tom; Yang, Braden; Kothapalli, Deydeep; Aschenbrenner, Andrew J; Hassenstab, Jason; Xiong, Chengie; Ances, Beau; Morris, John; Benzinger, Tammie L S; Gordon, Brian A; Payne, Philip; Sotiras, Aristeidis.

ArXiv ; 2024 Jul 01.

Article in English | MEDLINE | ID: mdl-39010871

ABSTRACT

INTRODUCTION: Previous studies have applied normative modeling on a single neuroimaging modality to investigate Alzheimer Disease (AD) heterogeneity. We employed a deep learning-based multimodal normative framework to analyze individual-level variation across ATN (amyloid-tau-neurodegeneration) imaging biomarkers. METHODS: We selected cross-sectional discovery (n = 665) and replication cohorts (n = 430) with available T1-weighted MRI, amyloid and tau PET. Normative modeling estimated individual-level abnormal deviations in amyloid-positive individuals compared to amyloid-negative controls. Regional abnormality patterns were mapped at different clinical group levels to assess intra-group heterogeneity. An individual-level disease severity index (DSI) was calculated using both the spatial extent and magnitude of abnormal deviations across ATN. RESULTS: Greater intra-group heterogeneity in ATN abnormality patterns was observed in more severe clinical stages of AD. Higher DSI was associated with worse cognitive function and increased risk of disease progression. DISCUSSION: Subject-specific abnormality maps across ATN reveal the heterogeneous impact of AD on the brain.

5.

Leveraging GPT-4 for identifying cancer phenotypes in electronic health records: a performance comparison between GPT-4, GPT-3.5-turbo, Flan-T5, Llama-3-8B, and spaCy's rule-based and machine learning-based methods.

Bhattarai, Kriti; Oh, Inez Y; Sierra, Jonathan Moran; Tang, Jonathan; Payne, Philip R O; Abrams, Zach; Lai, Albert M.

JAMIA Open ; 7(3): ooae060, 2024 Oct.

Article in English | MEDLINE | ID: mdl-38962662

ABSTRACT

Objective: Accurately identifying clinical phenotypes from Electronic Health Records (EHRs) provides additional insights into patients' health, especially when such information is unavailable in structured data. This study evaluates the application of OpenAI's Generative Pre-trained Transformer (GPT)-4 model to identify clinical phenotypes from EHR text in non-small cell lung cancer (NSCLC) patients. The goal was to identify disease stages, treatments and progression utilizing GPT-4, and compare its performance against GPT-3.5-turbo, Flan-T5-xl, Flan-T5-xxl, Llama-3-8B, and 2 rule-based and machine learning-based methods, namely, scispaCy and medspaCy. Materials and Methods: Phenotypes such as initial cancer stage, initial treatment, evidence of cancer recurrence, and affected organs during recurrence were identified from 13 646 clinical notes for 63 NSCLC patients from Washington University in St. Louis, Missouri. The performance of the GPT-4 model is evaluated against GPT-3.5-turbo, Flan-T5-xxl, Flan-T5-xl, Llama-3-8B, medspaCy, and scispaCy by comparing precision, recall, and micro-F1 scores. Results: GPT-4 achieved higher F1 score, precision, and recall compared to Flan-T5-xl, Flan-T5-xxl, Llama-3-8B, medspaCy, and scispaCy's models. GPT-3.5-turbo performed similarly to that of GPT-4. GPT, Flan-T5, and Llama models were not constrained by explicit rule requirements for contextual pattern recognition. spaCy models relied on predefined patterns, leading to their suboptimal performance. Discussion and Conclusion: GPT-4 improves clinical phenotype identification due to its robust pre-training and remarkable pattern recognition capability on the embedded tokens. It demonstrates data-driven effectiveness even with limited context in the input. While rule-based models remain useful for some tasks, GPT models offer improved contextual understanding of the text, and robust clinical phenotype extraction.
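The abstract above compares models by precision, recall, and micro-F1. As a minimal sketch of how micro-averaged scores are computed, the Python below pools true positives, false positives, and false negatives over all phenotype categories before taking the ratios. The category names, patient identifiers, and values are invented toy data, and the function name `micro_prf` is hypothetical; none of this is taken from the study's code or results.

```python
def micro_prf(gold, predicted):
    """Micro-averaged precision, recall, and F1.

    gold, predicted: dict mapping a phenotype category to a set of
    (patient_id, value) pairs. Counts are pooled across categories
    before the ratios are taken, which is what "micro" averaging means.
    """
    tp = fp = fn = 0
    for category in set(gold) | set(predicted):
        g = gold.get(category, set())
        p = predicted.get(category, set())
        tp += len(g & p)   # extracted and correct
        fp += len(p - g)   # extracted but not in the gold standard
        fn += len(g - p)   # in the gold standard but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy data (invented for illustration only).
gold = {
    "initial_stage": {("p1", "IIIA"), ("p2", "IV")},
    "initial_treatment": {("p1", "chemotherapy")},
}
predicted = {
    "initial_stage": {("p1", "IIIA"), ("p2", "IIIB")},
    "initial_treatment": {("p1", "chemotherapy"), ("p2", "radiation")},
}

precision, recall, f1 = micro_prf(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} micro-F1={f1:.3f}")
```

Because counts are pooled, categories with many labelled items weigh more heavily in the micro average than rare ones; a macro average would instead score each category separately and take the mean.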

6.

PathFinder: a novel graph transformer model to infer multi-cell intra- and inter-cellular signaling pathways and communications.

Feng, Jiarui; Song, Haoran; Province, Michael; Li, Guangfu; Payne, Philip R O; Chen, Yixin; Li, Fuhai.

Front Cell Neurosci ; 18: 1369242, 2024.

Article in English | MEDLINE | ID: mdl-38846640

ABSTRACT

Recently, large-scale scRNA-seq datasets have been generated to understand the complex signaling mechanisms within the microenvironment of Alzheimer's Disease (AD), which are critical for identifying novel therapeutic targets and precision medicine. However, the background signaling networks are highly complex and interactive. It remains challenging to infer the core intra- and inter-multi-cell signaling communication networks using scRNA-seq data. In this study, we introduced a novel graph transformer model, PathFinder, to infer multi-cell intra- and inter-cellular signaling pathways and communications among multi-cell types. Compared with existing models, the novel and unique design of PathFinder is based on the divide-and-conquer strategy. This model divides complex signaling networks into signaling paths, which are then scored and ranked using a novel graph transformer architecture to infer intra- and inter-cell signaling communications. We evaluated the performance of PathFinder using two scRNA-seq data cohorts. The first cohort is an APOE4 genotype-specific AD, and the second is a human cirrhosis cohort. The evaluation confirms the promising potential of using PathFinder as a general signaling network inference model.

7.

Cumulus: a federated electronic health record-based learning system powered by Fast Healthcare Interoperability Resources and artificial intelligence.

McMurry, Andrew J; Gottlieb, Daniel I; Miller, Timothy A; Jones, James R; Atreja, Ashish; Crago, Jennifer; Desai, Pankaja M; Dixon, Brian E; Garber, Matthew; Ignatov, Vladimir; Kirchner, Lyndsey A; Payne, Philip R O; Saldanha, Anil J; Shankar, Prabhu R V; Solad, Yauheni V; Sprouse, Elizabeth A; Terry, Michael; Wilcox, Adam B; Mandl, Kenneth D.

J Am Med Inform Assoc ; 31(8): 1638-1647, 2024 Aug 01.

Article in English | MEDLINE | ID: mdl-38860521

ABSTRACT

OBJECTIVE: To address challenges in large-scale electronic health record (EHR) data exchange, we sought to develop, deploy, and test an open source, cloud-hosted app "listener" that accesses standardized data across the SMART/HL7 Bulk FHIR Access application programming interface (API). METHODS: We advance a model for scalable, federated, data sharing and learning. Cumulus software is designed to address key technology and policy desiderata including local utility, control, and administrative simplicity as well as privacy preservation during robust data sharing, and artificial intelligence (AI) for processing unstructured text. RESULTS: Cumulus relies on containerized, cloud-hosted software, installed within a healthcare organization's security envelope. Cumulus accesses EHR data via the Bulk FHIR interface and streamlines automated processing and sharing. The modular design enables use of the latest AI and natural language processing tools and supports provider autonomy and administrative simplicity. In an initial test, Cumulus was deployed across 5 healthcare systems each partnered with public health. Cumulus output is patient counts which were aggregated into a table stratifying variables of interest to enable population health studies. All code is available open source. A policy stipulating that only aggregate data leave the institution greatly facilitated data sharing agreements. DISCUSSION AND CONCLUSION: Cumulus addresses barriers to data sharing based on (1) federally required support for standard APIs, (2) increasing use of cloud computing, and (3) advances in AI. There is potential for scalability to support learning across myriad network configurations and use cases.

Subject(s)

Artificial Intelligence, Electronic Health Records, Humans, Software, Cloud Computing, Health Information Interoperability, Information Dissemination

8.

mosGraphGen: a novel tool to generate multi-omic signaling graphs to facilitate integrative and interpretable graph AI model development.

Zhang, Heming; Cao, Dekang; Chen, Zirui; Zhang, Xiuyuan; Chen, Yixin; Sessions, Cole; Cruchaga, Carlos; Payne, Philip; Li, Guangfu; Province, Michael; Li, Fuhai.

bioRxiv ; 2024 May 18.

Article in English | MEDLINE | ID: mdl-38798349

ABSTRACT

Multi-omic data, i.e., genomics, epigenomics, transcriptomics, and proteomics, characterize complex cellular signaling systems at multiple levels and from multiple views and provide a holistic view of complex cellular signaling pathways. However, it remains challenging to integrate and interpret multi-omics data. Graph neural network (GNN) AI models have been widely used to analyze graph-structured datasets and are ideal for integrative multi-omics data analysis because they can naturally integrate and represent multi-omics data as a biologically meaningful multi-level signaling graph and interpret multi-omics data by node and edge ranking analysis for signaling flow/cascade inference. However, it is non-trivial for graph-AI model developers to pre-analyze multi-omics data and convert them into graph-structured data for individual samples, which can be directly fed into graph-AI models. To resolve this challenge, we developed mosGraphGen (multi-omics signaling graph generator), a novel computational tool that generates multi-omics signaling graphs of individual samples by mapping the multi-omics data onto a biologically meaningful multi-level background signaling network. With mosGraphGen, AI model developers can directly apply and evaluate their models using these mos-graphs. We evaluated mosGraphGen using multi-omics datasets of both cancer and Alzheimer's disease (AD) samples. The code of mosGraphGen is open-source and publicly available via GitHub: https://github.com/Multi-OmicGraphBuilder/mosGraphGen.

9.

The Brain Gene Registry: a data snapshot.

Baldridge, Dustin; Kaster, Levi; Sancimino, Catherine; Srivastava, Siddharth; Molholm, Sophie; Gupta, Aditi; Oh, Inez; Lanzotti, Virginia; Grewal, Daleep; Riggs, Erin Rooney; Savatt, Juliann M; Hauck, Rachel; Sveden, Abigail; Constantino, John N; Piven, Joseph; Gurnett, Christina A; Chopra, Maya; Hazlett, Heather; Payne, Philip R O.

J Neurodev Disord ; 16(1): 17, 2024 Apr 17.

Article in English | MEDLINE | ID: mdl-38632549

ABSTRACT

Monogenic disorders account for a large proportion of population-attributable risk for neurodevelopmental disabilities. However, the data necessary to infer a causal relationship between a given genetic variant and a particular neurodevelopmental disorder is often lacking. Recognizing this scientific roadblock, 13 Intellectual and Developmental Disabilities Research Centers (IDDRCs) formed a consortium to create the Brain Gene Registry (BGR), a repository pairing clinical genetic data with phenotypic data from participants with variants in putative brain genes. Phenotypic profiles are assembled from the electronic health record (EHR) and a battery of remotely administered standardized assessments collectively referred to as the Rapid Neurobehavioral Assessment Protocol (RNAP), which include cognitive, neurologic, and neuropsychiatric assessments, as well as assessments for attention deficit hyperactivity disorder (ADHD) and autism spectrum disorder (ASD). Co-enrollment of BGR participants in the Clinical Genome Resource's (ClinGen's) GenomeConnect enables display of variant information in ClinVar. The BGR currently contains data on 479 participants who are 55% male, 6% Asian, 6% Black or African American, 76% white, and 12% Hispanic/Latine. Over 200 genes are represented in the BGR, with 12 or more participants harboring variants in each of these genes: CACNA1A, DNMT3A, SLC6A1, SETD5, and MYT1L. More than 30% of variants are de novo and 43% are classified as variants of uncertain significance (VUSs). Mean standard scores on cognitive or developmental screens are below average for the BGR cohort. EHR data reveal developmental delay as the earliest and most common diagnosis in this sample, followed by speech and language disorders, ASD, and ADHD. BGR data has already been used to accelerate gene-disease validity curation of 36 genes evaluated by ClinGen's BGR Intellectual Disability (ID)-Autism (ASD) Gene Curation Expert Panel. 
In summary, the BGR is a resource for use by stakeholders interested in advancing translational research for brain genes and continues to recruit participants with clinically reported variants to establish a rich and well-characterized national resource to promote research on neurodevelopmental disorders.

Subject(s)

Autism Spectrum Disorder, Autistic Disorder, Intellectual Disability, Neurodevelopmental Disorders, Humans, Male, Female, Autism Spectrum Disorder/genetics, Brain, Registries, Methyltransferases

10.

Real world performance of the 21st Century Cures Act population-level application programming interface.

Jones, James R; Gottlieb, Daniel; McMurry, Andrew J; Atreja, Ashish; Desai, Pankaja M; Dixon, Brian E; Payne, Philip R O; Saldanha, Anil J; Shankar, Prabhu; Solad, Yauheni; Wilcox, Adam B; Ali, Momeena S; Kang, Eugene; Martin, Andrew M; Sprouse, Elizabeth; Taylor, David E; Terry, Michael; Ignatov, Vladimir; Mandl, Kenneth D.

J Am Med Inform Assoc ; 31(5): 1144-1150, 2024 Apr 19.

Article in English | MEDLINE | ID: mdl-38447593

ABSTRACT

OBJECTIVE: To evaluate the real-world performance of the SMART/HL7 Bulk Fast Healthcare Interoperability Resources (FHIR) Access Application Programming Interface (API), developed to enable push button access to electronic health record data on large populations, and required under the 21st Century Cures Act Rule. MATERIALS AND METHODS: We used an open-source Bulk FHIR Testing Suite at 5 healthcare sites from April to September 2023, including 4 hospitals using electronic health records (EHRs) certified for interoperability, and 1 Health Information Exchange (HIE) using a custom, standards-compliant API build. We measured export speeds, data sizes, and completeness across 6 FHIR resource types. RESULTS: Among the certified platforms, Oracle Cerner led in speed, managing 5-16 million resources at over 8000 resources/min. Three Epic sites exported a FHIR data subset, achieving 1-12 million resources at 1555-2500 resources/min. Notably, the HIE's custom API outperformed, generating over 141 million resources at 12 000 resources/min. DISCUSSION: The HIE's custom API showcased superior performance, endorsing the effectiveness of SMART/HL7 Bulk FHIR in enabling large-scale data exchange while underlining the need for optimization in existing EHR platforms. Agility and scalability are essential for diverse health, research, and public health use cases. CONCLUSION: To fully realize the interoperability goals of the 21st Century Cures Act, addressing the performance limitations of Bulk FHIR API is critical. It would be beneficial to include performance metrics in both certification and reporting processes.

Subject(s)

Health Information Exchange, Health Level Seven, Software, Electronic Health Records, Delivery of Health Care
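To put the export speeds reported in the abstract above into wall-clock terms, the following Python sketch divides a resource count by a throughput. It is a back-of-the-envelope illustration only: it assumes a constant export rate, it treats the abstract's "over" figures as exact, and the pairing of the upper resource counts with particular rates for Oracle Cerner and Epic is an assumption made here, not something the abstract states. The function name `export_duration_minutes` is hypothetical.

```python
def export_duration_minutes(resource_count: int, resources_per_min: float) -> float:
    """Minutes needed to export resource_count resources at a constant rate."""
    if resources_per_min <= 0:
        raise ValueError("resources_per_min must be positive")
    return resource_count / resources_per_min


# Figures taken from the abstract; count/rate pairings for the EHR vendors
# are assumed for illustration.
scenarios = {
    "HIE custom API": (141_000_000, 12_000),
    "Oracle Cerner (16 million resources)": (16_000_000, 8_000),
    "Epic (12 million resources, slowest rate)": (12_000_000, 1_555),
}

for label, (count, rate) in scenarios.items():
    minutes = export_duration_minutes(count, rate)
    print(f"{label}: {minutes:,.0f} min (about {minutes / 1440:.1f} days)")
```

Even at the fastest reported rate, an export of this size runs for days rather than minutes under these assumptions, which is the practical concern behind the abstract's call for performance metrics.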

11.

Cumulus: A federated EHR-based learning system powered by FHIR and AI.

McMurry, Andrew J; Gottlieb, Daniel I; Miller, Timothy A; Jones, James R; Atreja, Ashish; Crago, Jennifer; Desai, Pankaja M; Dixon, Brian E; Garber, Matthew; Ignatov, Vladimir; Kirchner, Lyndsey A; Payne, Philip R O; Saldanha, Anil J; Shankar, Prabhu R V; Solad, Yauheni V; Sprouse, Elizabeth A; Terry, Michael; Wilcox, Adam B; Mandl, Kenneth D.

medRxiv ; 2024 Feb 06.

Article in English | MEDLINE | ID: mdl-38370642

ABSTRACT

Objective: To address challenges in large-scale electronic health record (EHR) data exchange, we sought to develop, deploy, and test an open source, cloud-hosted app 'listener' that accesses standardized data across the SMART/HL7 Bulk FHIR Access application programming interface (API). Methods: We advance a model for scalable, federated, data sharing and learning. Cumulus software is designed to address key technology and policy desiderata including local utility, control, and administrative simplicity as well as privacy preservation during robust data sharing, and AI for processing unstructured text. Results: Cumulus relies on containerized, cloud-hosted software, installed within a healthcare organization's security envelope. Cumulus accesses EHR data via the Bulk FHIR interface and streamlines automated processing and sharing. The modular design enables use of the latest AI and natural language processing tools and supports provider autonomy and administrative simplicity. In an initial test, Cumulus was deployed across five healthcare systems each partnered with public health. Cumulus output is patient counts which were aggregated into a table stratifying variables of interest to enable population health studies. All code is available open source. A policy stipulating that only aggregate data leave the institution greatly facilitated data sharing agreements. Discussion and Conclusion: Cumulus addresses barriers to data sharing based on (1) federally required support for standard APIs, (2) increasing use of cloud computing, and (3) advances in AI. There is potential for scalability to support learning across myriad network configurations and use cases.

12.

PathFinder: a novel graph transformer model to infer multi-cell intra- and inter-cellular signaling pathways and communications.

Feng, Jiarui; Province, Michael; Li, Guangfu; Payne, Philip R O; Chen, Yixin; Li, Fuhai.

bioRxiv ; 2024 Jan 15.

Article in English | MEDLINE | ID: mdl-38293243

ABSTRACT

Recently, large-scale scRNA-seq datasets have been generated to understand the complex and poorly understood signaling mechanisms within the microenvironment of Alzheimer's Disease (AD), which are critical for identifying novel therapeutic targets and precision medicine. Although a set of targets has been identified, it remains challenging to infer the core intra- and inter-multi-cell signaling communication networks using scRNA-seq data, given the complex and highly interactive background signaling network. Herein, we introduced a novel graph transformer model, PathFinder, to infer multi-cell intra- and inter-cellular signaling pathways and signaling communications among multi-cell types. Compared with existing models, the novel and unique design of PathFinder is based on the divide-and-conquer strategy, which divides the complex signaling networks into signaling paths and then scores and ranks them using a novel graph transformer architecture to infer the intra- and inter-cell signaling communications. We evaluated PathFinder using scRNA-seq data of APOE4-genotype specific AD mouse models and identified novel APOE4-altered intra- and inter-cell interaction networks among neurons, astrocytes, and microglia. PathFinder is a general signaling network inference model and can be applied to other omics data-driven signaling network inference.

13.

sc2MeNetDrug: A computational tool to uncover inter-cell signaling targets and identify relevant drugs based on single cell RNA-seq data.

Feng, Jiarui; Goedegebuure, S Peter; Zeng, Amanda; Bi, Ye; Wang, Ting; Payne, Philip; Ding, Li; DeNardo, David; Hawkins, William; Fields, Ryan C; Li, Fuhai.

PLoS Comput Biol ; 20(1): e1011785, 2024 Jan.

Article in English | MEDLINE | ID: mdl-38181047

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) is a powerful technology to investigate the transcriptional programs in stromal, immune, and disease cells, like tumor cells or neurons within the Alzheimer's Disease (AD) brain or tumor microenvironment (ME) or niche. Cell-cell communications within ME play important roles in disease progression and immunotherapy response and are novel and critical therapeutic targets. Though many tools of scRNA-seq analysis have been developed to investigate the heterogeneity and sub-populations of cells, few were designed for uncovering cell-cell communications of ME and predicting the potentially effective drugs to inhibit the communications. Moreover, the data analysis processes of discovering signaling communication networks and effective drugs using scRNA-seq data are complex and involve a set of critical analysis processes and external supportive data resources, which are difficult for researchers who have no strong computational background and training in scRNA-seq data analysis. To address these challenges, in this study, we developed a novel open-source computational tool, sc2MeNetDrug (https://fuhaililab.github.io/sc2MeNetDrug/). It was specifically designed using scRNA-seq data to identify cell types within disease MEs, uncover the dysfunctional signaling pathways within individual cell types and interactions among different cell types, and predict effective drugs that can potentially disrupt cell-cell signaling communications. sc2MeNetDrug provided a user-friendly graphical user interface to encapsulate the data analysis modules, which can facilitate the scRNA-seq data-based discovery of novel inter-cell signaling communications and novel therapeutic regimens.

Subject(s)

Single-Cell Analysis, Software, RNA-Seq, Sequence Analysis, RNA, Gene Expression Profiling, Signal Transduction/genetics

14.

Analysing heterogeneity in Alzheimer's Disease using multimodal normative modelling on ATN biomarkers.

Kumar, Sayantan; Earnest, Thomas; Yang, Braden; Kothapalli, Deydeep; Benzinger, Tammie L S; Gordon, Brian A; Payne, Philip; Sotiras, Aristeidis.

bioRxiv ; 2024 Apr 04.

Article in English | MEDLINE | ID: mdl-37662280

ABSTRACT

Background and Objectives: Previous approaches pursuing normative modelling for analyzing heterogeneity in Alzheimer's Disease (AD) have relied on a single neuroimaging modality. However, AD is a multi-faceted disorder, with each modality providing unique and complementary information about AD. In this study, we used a deep-learning based multimodal normative model to assess the heterogeneity in regional brain patterns for ATN (amyloid-tau-neurodegeneration) biomarkers. Methods: We selected discovery (n = 665) and replication (n = 430) cohorts with simultaneous availability of ATN biomarkers: Florbetapir amyloid, Flortaucipir tau and T1-weighted magnetic resonance imaging (MRI). A multimodal variational autoencoder (conditioned on age and sex) was used as a normative model to learn the multimodal regional brain patterns of a cognitively unimpaired (CU) control group. The trained model was applied to individuals on the ADS (AD Spectrum) to estimate their deviations (Z-scores) from the normative distribution, resulting in a Z-score regional deviation map per ADS individual per modality. Regions with Z-scores < -1.96 for MRI and Z-scores > 1.96 for amyloid and tau were labelled as outliers. Hamming distance was used to quantify the dissimilarity between individuals based on their outlier deviations across each modality. We also calculated a disease severity index (DSI) for each ADS individual which was estimated by averaging the deviations across all outlier regions corresponding to each modality. Results: ADS individuals with moderate or severe dementia showed a higher proportion of regional outliers for each modality as well as more dissimilarity in modality-specific regional outlier patterns compared to ADS individuals with early or mild dementia. DSI (i) was associated with the progressive stages of dementia, (ii) showed significant associations with neuropsychological composite scores and (iii) was related to the longitudinal risk of CDR progression. Findings were reproducible in both discovery and replication cohorts. Discussion: Ours is the first study to examine the heterogeneity in AD through the lens of multiple neuroimaging modalities (ATN), based on distinct or overlapping patterns of regional outlier deviations. Regional MRI and tau outliers were more heterogeneous than regional amyloid outliers. DSI has the potential to be an individual patient metric of neurodegeneration that can help in clinical decision making and monitoring patient response to anti-amyloid treatments.
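The outlier rule, Hamming distance, and disease severity index (DSI) described in the abstract above can be sketched for a single modality as follows. This is an illustration under stated assumptions, not the authors' implementation: the Z-scores are invented toy values over four hypothetical regions, the function names are made up here, and averaging the absolute deviation over outlier regions is one plausible reading of "averaging the deviations", since the abstract does not say whether signed or absolute values were used.

```python
Z_THRESHOLD = 1.96  # threshold stated in the abstract


def outlier_mask(z_scores, modality):
    """Flag regional outliers: Z < -1.96 for MRI, Z > 1.96 for amyloid and tau."""
    if modality == "mri":
        return [z < -Z_THRESHOLD for z in z_scores]
    return [z > Z_THRESHOLD for z in z_scores]


def hamming_distance(mask_a, mask_b):
    """Number of regions where two individuals' outlier flags disagree."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must cover the same regions")
    return sum(a != b for a, b in zip(mask_a, mask_b))


def severity_index(z_scores, mask):
    """Mean absolute deviation over outlier regions (absolute value is an assumption)."""
    outliers = [abs(z) for z, flagged in zip(z_scores, mask) if flagged]
    return sum(outliers) / len(outliers) if outliers else 0.0


# Toy Z-scores for two individuals (invented for illustration only).
tau_a = [2.5, 0.3, 3.0, 1.0]
tau_b = [0.1, 2.2, 2.8, 0.5]
mri_a = [-2.5, -1.0, 0.2, -2.1]

mask_tau_a = outlier_mask(tau_a, "tau")
mask_tau_b = outlier_mask(tau_b, "tau")
mask_mri_a = outlier_mask(mri_a, "mri")

print("tau outliers, individual A:", mask_tau_a)
print("Hamming distance A vs B (tau):", hamming_distance(mask_tau_a, mask_tau_b))
print("tau severity index, individual A:", severity_index(tau_a, mask_tau_a))
print("MRI severity index, individual A:", severity_index(mri_a, mask_mri_a))
```

A larger Hamming distance means two individuals are abnormal in different regions, which is how the abstract quantifies within-group heterogeneity.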

15.

Developing machine learning models to predict primary graft dysfunction after lung transplantation.

Michelson, Andrew P; Oh, Inez; Gupta, Aditi; Puri, Varun; Kreisel, Daniel; Gelman, Andrew E; Nava, Ruben; Witt, Chad A; Byers, Derek E; Halverson, Laura; Vazquez-Guillamet, Rodrigo; Payne, Philip R O; Hachem, Ramsey R.

Am J Transplant ; 24(3): 458-467, 2024 Mar.

Article in English | MEDLINE | ID: mdl-37468109

ABSTRACT

Primary graft dysfunction (PGD) is the leading cause of morbidity and mortality in the first 30 days after lung transplantation. Risk factors for the development of PGD include donor and recipient characteristics, but how multiple variables interact to impact the development of PGD and how clinicians should consider these in making decisions about donor acceptance remain unclear. This was a single-center retrospective cohort study to develop and evaluate machine learning pipelines to predict the development of PGD grade 3 within the first 72 hours of transplantation using donor and recipient variables that are known at the time of donor offer acceptance. Among 576 bilateral lung recipients, 173 (30%) developed PGD grade 3. The cohort underwent a 75% to 25% train-test split, and lasso regression was used to identify 11 variables for model development. A K-nearest neighbors model showing the best calibration and performance with relatively small confidence intervals was selected as the final predictive model, with an area under the receiver operating characteristic curve of 0.65. Machine learning models can predict the risk for development of PGD grade 3 based on data available at the time of donor offer acceptance. This may improve donor-recipient matching and donor utilization in the future.

Subject(s)

Lung Transplantation, Primary Graft Dysfunction, Humans, Retrospective Studies, Primary Graft Dysfunction/diagnosis, Primary Graft Dysfunction/etiology, Lung Transplantation/adverse effects, Risk Factors, Lung

16.

Leveraging GPT-4 for Identifying Cancer Phenotypes in Electronic Health Records: A Performance Comparison between GPT-4, GPT-3.5-turbo, Flan-T5 and spaCy's Rule-based & Machine Learning-based methods.

Bhattarai, Kriti; Oh, Inez Y; Sierra, Jonathan Moran; Tang, Jonathan; Payne, Philip R O; Abrams, Zachary B; Lai, Albert M.

bioRxiv ; 2024 Apr 06.

Article in English | MEDLINE | ID: mdl-37808763

ABSTRACT

Objective: Accurately identifying clinical phenotypes from Electronic Health Records (EHRs) provides additional insights into patients' health, especially when such information is unavailable in structured data. This study evaluates the application of OpenAI's Generative Pre-trained Transformer (GPT)-4 model to identify clinical phenotypes from EHR text in non-small cell lung cancer (NSCLC) patients. The goal was to identify disease stages, treatments and progression utilizing GPT-4, and compare its performance against GPT-3.5-turbo, Flan-T5-xl, Flan-T5-xxl, and two rule-based and machine learning-based methods, namely, scispaCy and medspaCy. Materials and Methods: Phenotypes such as initial cancer stage, initial treatment, evidence of cancer recurrence, and affected organs during recurrence were identified from 13,646 records for 63 NSCLC patients from Washington University in St. Louis, Missouri. The performance of the GPT-4 model is evaluated against GPT-3.5-turbo, Flan-T5-xxl, Flan-T5-xl, medspaCy and scispaCy by comparing precision, recall, and micro-F1 scores. Results: GPT-4 achieved higher F1 score, precision, and recall compared to Flan-T5-xl, Flan-T5-xxl, medspaCy and scispaCy's models. GPT-3.5-turbo performed similarly to that of GPT-4. GPT and Flan-T5 models were not constrained by explicit rule requirements for contextual pattern recognition. SpaCy models relied on predefined patterns, leading to their suboptimal performance. Discussion and Conclusion: GPT-4 improves clinical phenotype identification due to its robust pre-training and remarkable pattern recognition capability on the embedded tokens. It demonstrates data-driven effectiveness even with limited context in the input. While rule-based models remain useful for some tasks, GPT models offer improved contextual understanding of the text, and robust clinical phenotype extraction.

17.

Clinical variants paired with phenotype: A rich resource for brain gene curation.

Chopra, Maya; Savatt, Juliann M; Bingaman, Taylor I; Good, Molly E; Morgan, Alexis; Cooney, Caitlin; Rossel, Allison M; VanHoute, Bryanna; Cordova, Ineke; Mahida, Sonal; Lanzotti, Virginia; Baldridge, Dustin; Gurnett, Christina A; Piven, Joseph; Hazlett, Heather; Pomeroy, Scott L; Sahin, Mustafa; Payne, Philip R O; Riggs, Erin Rooney; Constantino, John N.

Genet Med ; 26(3): 101035, 2024 Mar.

Article in English | MEDLINE | ID: mdl-38059438

ABSTRACT

PURPOSE: Clinically ascertained variants are under-utilized in neurodevelopmental disorder research. We established the Brain Gene Registry (BGR) to coregister clinically identified variants in putative brain genes with participant phenotypes. Here, we report 179 genetic variants in the first 179 BGR registrants and analyze the proportion that were novel to ClinVar at the time of entry and those that were absent in other disease databases. METHODS: From 10 academically affiliated institutions, 179 individuals with 179 variants were enrolled into the BGR. Variants were cross-referenced for previous presence in ClinVar and for presence in 6 other genetic databases. RESULTS: Of 179 variants in 76 genes, 76 (42.5%) were novel to ClinVar, and 62 (34.6%) were absent from all databases analyzed. Of the 103 variants present in ClinVar, 37 (35.9%) were uncertain (ClinVar aggregate classification of variant of uncertain significance or conflicting classifications). For 5 variants, the aggregate ClinVar classification was inconsistent with the interpretation from the BGR site-provided classification. CONCLUSION: A significant proportion of clinical variants that are novel or uncertain are not shared, limiting the evidence base for new gene-disease relationships. Registration of paired clinical genetic test results with phenotype has the potential to advance knowledge of the relationships between genes and neurodevelopmental disorders.

Subject(s)

Databases, Genetic , Genetic Variation , Humans , Genetic Variation/genetics , Genetic Testing/methods , Phenotype , Brain
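The proportions reported in the abstract of entry 17 can be re-derived from its stated counts. The short Python check below is only an arithmetic sketch; the helper name `pct` is hypothetical, and the counts are copied from the abstract.

```python
def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


total_variants = 179
novel_to_clinvar = 76
absent_from_all = 62
in_clinvar = total_variants - novel_to_clinvar  # variants already in ClinVar
uncertain_in_clinvar = 37

print(in_clinvar)                             # variants present in ClinVar
print(pct(novel_to_clinvar, total_variants))  # share novel to ClinVar
print(pct(absent_from_all, total_variants))   # share absent from all databases
print(pct(uncertain_in_clinvar, in_clinvar))  # share of ClinVar variants that are uncertain
```

Note that the last percentage uses the 103 variants already in ClinVar as its denominator, not the full 179.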

18.

Normative Modeling using Multimodal Variational Autoencoders to Identify Abnormal Brain Volume Deviations in Alzheimer's Disease.

Kumar, Sayantan; Payne, Philip R O; Sotiras, Aristeidis.

Proc SPIE Int Soc Opt Eng ; 12465, 2023 Feb.

Article in English | MEDLINE | ID: mdl-38130873

ABSTRACT

Normative modelling is a method for understanding the underlying heterogeneity within brain disorders like Alzheimer's Disease (AD), by quantifying how each patient deviates from the expected normative pattern that has been learned from a healthy control distribution. Existing deep learning based normative models have been applied only to single-modality Magnetic Resonance Imaging (MRI) neuroimaging data. However, these do not take into account the complementary information offered by multimodal MRI, which is essential for understanding a multifactorial disease like AD. To address this limitation, we propose a multi-modal variational autoencoder (mmVAE) based normative modelling framework that can capture the joint distribution between different modalities to identify abnormal brain volume deviations due to AD. Our multi-modal framework takes as input Freesurfer processed brain region volumes from T1-weighted (cortical and subcortical) and T2-weighted (hippocampal) scans of cognitively normal participants to learn the morphological characteristics of the healthy brain. The estimated normative model is then applied to AD patients to quantify the deviation in brain volumes and identify abnormal brain pattern deviations due to the progressive stages of AD. We compared our proposed mmVAE with a baseline unimodal VAE having a single encoder and decoder and the two modalities concatenated as unimodal input. Our experimental results show that deviation maps generated by mmVAE are more sensitive to disease staging within AD, have a better correlation with patient cognition, and result in a higher number of brain regions with statistically significant deviations compared to the unimodal baseline model.

19.

Highly accurate disease diagnosis and highly reproducible biomarker identification with PathFormer.

Dong, Zehao; Zhao, Qihang; Payne, Philip R O; Province, Michael A; Cruchaga, Carlos; Zhang, Muhan; Zhao, Tianyu; Chen, Yixin; Li, Fuhai.

Res Sq ; 2023 Nov 16.

Article in English | MEDLINE | ID: mdl-38014034

ABSTRACT

Biomarker identification, for example via fold change and regression analysis, is critical for precise disease diagnosis and for understanding disease pathogenesis in omics data analysis. Graph neural networks (GNNs) have been the dominant deep learning model for analyzing graph-structured data. However, we found two major limitations of existing GNNs in omics data analysis, i.e., limited prediction/diagnosis accuracy and limited reproducibility of biomarker identification across multiple datasets. The root of the challenges is the unique graph structure of biological signaling pathways, which consists of a large number of targets and intensive and complex signaling interactions among these targets. To resolve these two challenges, in this study, we presented a novel GNN model architecture, named PathFormer, which systematically integrates signaling networks, prior knowledge, and omics data to rank biomarkers and predict disease diagnosis. In the comparison results, PathFormer significantly outperformed existing GNN models in terms of prediction accuracy (~30% accuracy improvement in disease diagnosis compared with existing GNN models) and reproducibility of biomarker ranking across different datasets. The improvement was confirmed using two independent Alzheimer's Disease (AD) and cancer transcriptomic datasets. The PathFormer model can be directly applied to other omics data analysis studies.

20.

Real World Performance of the 21st Century Cures Act Population Level Application Programming Interface.

Jones, James R; Gottlieb, Daniel; McMurry, Andrew J; Atreja, Ashish; Desai, Pankaja M; Dixon, Brian E; Payne, Philip R O; Saldanha, Anil J; Shankar, Prabhu; Solad, Yauheni; Wilcox, Adam B; Ali, Momeena S; Kang, Eugene; Martin, Andrew M; Sprouse, Elizabeth; Taylor, David; Terry, Michael; Ignatov, Vladimir; Mandl, Kenneth D.

medRxiv ; 2023 Oct 06.

Article in English | MEDLINE | ID: mdl-37873390

ABSTRACT

Objective: To evaluate the real-world performance of the SMART/HL7 Bulk FHIR Access API, required in Electronic Health Records (EHRs) under the 21st Century Cures Act Rule, in delivering patient data on populations. Materials and Methods: We used an open-source Bulk FHIR Testing Suite at five healthcare sites from April to September 2023, including four hospitals using EHRs certified for interoperability, and one Health Information Exchange (HIE) using a custom, standards-compliant API build. We measured export speeds, data sizes, and completeness across six types of FHIR resources. Results: Among the certified platforms, Oracle Cerner led in speed, managing 5-16 million resources at over 8,000 resources/min. Three Epic sites exported a FHIR data subset, achieving 1-12 million resources at 1,555-2,500 resources/min. Notably, the HIE's custom API outperformed the certified platforms, generating over 141 million resources at 12,000 resources/min. Discussion: The HIE's custom API showcased superior performance, endorsing the effectiveness of SMART/HL7 Bulk FHIR in enabling large-scale data exchange while underlining the need for optimization in existing EHR platforms. Agility and scalability are essential for diverse health, research, and public health use cases. Conclusion: To fully realize the interoperability goals of the 21st Century Cures Act, addressing the performance limitations of the Bulk FHIR API is critical. It would be beneficial to include performance metrics in both certification and reporting processes.
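To put the reported export rates in perspective, the Python sketch below converts a resource count and a rate into an approximate export duration. It is a back-of-envelope calculation only: it assumes a constant rate for the whole export, uses the HIE figures from the abstract above (a lower bound, since the abstract says "over 141 million"), and the function name `export_hours` is hypothetical.

```python
def export_hours(resources: int, rate_per_min: float) -> float:
    """Approximate export time in hours, assuming a constant rate."""
    return resources / rate_per_min / 60


# HIE custom API figures as stated in the abstract.
hie_hours = export_hours(141_000_000, 12_000)
print(f"HIE export: about {hie_hours:.0f} hours ({hie_hours / 24:.1f} days)")
```

Even at the fastest rate reported, a population-scale export under these assumptions takes days rather than hours, which is consistent with the abstract's call for performance metrics in certification.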
