Results 1 - 20 of 177
1.
bioRxiv ; 2024 Sep 03.
Article in English | MEDLINE | ID: mdl-39282361

ABSTRACT

Multi-omic data can characterize complex cellular signaling pathways from multiple views more completely than individual omic data. However, integrative multi-omic data analysis to rank key disease biomarkers and infer core signaling pathways remains an open problem. In this study, our contributions are that we 1) developed a novel graph AI model, mosGraphFlow, for analyzing multi-omic signaling graphs (mosGraphs), 2) analyzed multi-omic mosGraph datasets of Alzheimer's disease (AD), and 3) identified, visualized and evaluated a set of AD-associated signaling biomarkers and a signaling network. The comparison results show that the proposed model not only achieves the best classification accuracy among the compared models but also identifies important AD biomarkers and signaling interactions. Moreover, the signaling sources are highlighted at specific omic levels to facilitate understanding of the pathogenesis of AD. The proposed model can also be applied and extended to other studies using multi-omic data. Model code is accessible via GitHub: https://github.com/FuhaiLiAiLab/mosGraphFlow.

2.
JAMIA Open ; 7(3): ooae087, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39297151

ABSTRACT

Objective: We aimed to develop and validate a novel multimodal framework, Hierarchical Multi-task Auxiliary Learning (HiMAL), which predicts cognitive composite functions as auxiliary tasks to estimate the longitudinal risk of transition from Mild Cognitive Impairment (MCI) to Alzheimer's Disease (AD). Materials and Methods: HiMAL used multimodal longitudinal visit data, including imaging features, cognitive assessment scores, and clinical variables, from MCI patients in the Alzheimer's Disease Neuroimaging Initiative dataset to predict at each visit whether an MCI patient will progress to AD within the next 6 months. Performance of HiMAL was compared with state-of-the-art single-task and multitask baselines using the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). An ablation study was performed to assess the impact of each input modality on model performance. Additionally, longitudinal explanations of disease progression risk were provided to interpret the predicted cognitive decline. Results: Of 634 MCI patients (mean [IQR] age: 72.8 [67-78], 60% male), 209 (32%) progressed to AD. HiMAL showed better prediction performance than all state-of-the-art longitudinal single-modality single-task baselines (AUROC = 0.923 [0.915-0.937]; AUPRC = 0.623 [0.605-0.644]; all P < .05). Ablation analysis showed that imaging and cognition scores contributed most to the prediction of disease progression. Discussion: Clinically informative model explanations anticipate cognitive decline 6 months in advance, aiding clinicians in assessing future disease progression. HiMAL relies on routinely collected electronic health record (EHR) variables for proximal (6-month) prediction of AD onset, indicating its translational potential for point-of-care monitoring and management of high-risk patients.

4.
NPJ Syst Biol Appl ; 10(1): 92, 2024 Aug 21.
Article in English | MEDLINE | ID: mdl-39169016

ABSTRACT

Complex signaling pathways are believed to be responsible for drug resistance. Drug combinations perturbing multiple signaling targets have the potential to reduce drug resistance. Large-scale multi-omic datasets and experimental drug combination synergy score data are valuable resources for studying mechanisms of synergy (MoS) to guide the development of precision drug combinations. However, the signaling patterns of MoS are complex and remain unclear, making it challenging to identify synergistic drug combinations in clinical practice. Herein, we proposed a novel integrative and interpretable graph AI model, DeepSignalingFlow, to uncover MoS by integrating and mining multi-omic data. The major innovation is that we uncover MoS by modeling the signaling flow from multi-omic features of essential disease proteins to the drug targets, an approach not introduced by existing models. Model performance was assessed using four drug combination synergy evaluation datasets: NCI ALMANAC, O'Neil, DrugComb, and DrugCombDB. The comparison results showed that the proposed model outperformed existing graph AI models in synergy score prediction and can interpret MoS through the core signaling flows. The code is publicly accessible via GitHub: https://github.com/FuhaiLiAiLab/DeepSignalingFlow.


Subject(s)
Drug Synergism , Signal Transduction , Signal Transduction/drug effects , Signal Transduction/physiology , Humans , Computational Biology/methods
5.
bioRxiv ; 2024 Aug 06.
Article in English | MEDLINE | ID: mdl-39149314

ABSTRACT

Generative pretrained models represent a significant advance in natural language processing and computer vision: pretrained on large general datasets and then fine-tuned for specific tasks, they can generate coherent and contextually relevant content. Building foundation models from large-scale omic data is a promising way to decode the complex signaling language patterns within cells. Unlike existing foundation models of omic data, we built a foundation model, mosGraphGPT, for multi-omic signaling (mos) graphs, in which multi-omic data are integrated and interpreted using a multi-level signaling graph. The model was pretrained on multi-omic data of cancers in The Cancer Genome Atlas (TCGA) and fine-tuned on multi-omic data of Alzheimer's Disease (AD). The experimental evaluation showed that the model not only improves disease classification accuracy but is also interpretable, uncovering disease targets and signaling interactions. The model code is available on GitHub: https://github.com/mosGraph/mosGraphGPT.

6.
ArXiv ; 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-39010871

ABSTRACT

INTRODUCTION: Previous studies have applied normative modeling to a single neuroimaging modality to investigate Alzheimer Disease (AD) heterogeneity. We employed a deep learning-based multimodal normative framework to analyze individual-level variation across ATN (amyloid-tau-neurodegeneration) imaging biomarkers. METHODS: We selected cross-sectional discovery (n = 665) and replication cohorts (n = 430) with available T1-weighted MRI, amyloid and tau PET. Normative modeling estimated individual-level abnormal deviations in amyloid-positive individuals compared to amyloid-negative controls. Regional abnormality patterns were mapped at different clinical group levels to assess intra-group heterogeneity. An individual-level disease severity index (DSI) was calculated using both the spatial extent and magnitude of abnormal deviations across ATN. RESULTS: Greater intra-group heterogeneity in ATN abnormality patterns was observed in more severe clinical stages of AD. Higher DSI was associated with worse cognitive function and increased risk of disease progression. DISCUSSION: Subject-specific abnormality maps across ATN reveal the heterogeneous impact of AD on the brain.

7.
JAMIA Open ; 7(3): ooae060, 2024 Oct.
Article in English | MEDLINE | ID: mdl-38962662

ABSTRACT

Objective: Accurately identifying clinical phenotypes from Electronic Health Records (EHRs) provides additional insights into patients' health, especially when such information is unavailable in structured data. This study evaluates the application of OpenAI's Generative Pre-trained Transformer (GPT)-4 model to identify clinical phenotypes from EHR text in non-small cell lung cancer (NSCLC) patients. The goal was to identify disease stages, treatments and progression using GPT-4, and to compare its performance against GPT-3.5-turbo, Flan-T5-xl, Flan-T5-xxl, Llama-3-8B, and 2 rule-based and machine learning-based methods, namely, scispaCy and medspaCy. Materials and Methods: Phenotypes such as initial cancer stage, initial treatment, evidence of cancer recurrence, and affected organs during recurrence were identified from 13 646 clinical notes for 63 NSCLC patients from Washington University in St. Louis, Missouri. The performance of the GPT-4 model was evaluated against GPT-3.5-turbo, Flan-T5-xxl, Flan-T5-xl, Llama-3-8B, medspaCy, and scispaCy by comparing precision, recall, and micro-F1 scores. Results: GPT-4 achieved higher F1 score, precision, and recall than Flan-T5-xl, Flan-T5-xxl, Llama-3-8B, medspaCy, and scispaCy. GPT-3.5-turbo performed similarly to GPT-4. GPT, Flan-T5, and Llama models were not constrained by explicit rule requirements for contextual pattern recognition, whereas spaCy models relied on predefined patterns, leading to their suboptimal performance. Discussion and Conclusion: GPT-4 improves clinical phenotype identification due to its robust pre-training and remarkable pattern recognition capability on the embedded tokens. It demonstrates data-driven effectiveness even with limited context in the input. While rule-based models remain useful for some tasks, GPT models offer improved contextual understanding of the text and robust clinical phenotype extraction.
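The study above compares models by precision, recall and micro-F1. Micro-averaging pools true positives (TP), false positives (FP) and false negatives (FN) across all phenotype categories before computing the ratios, so frequent categories carry more weight. A minimal Python sketch of that calculation follows; the category names and counts are hypothetical and not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives


def micro_scores(per_category):
    """Pool TP/FP/FN over all categories, then return (precision, recall, F1)."""
    tp = sum(c.tp for c in per_category.values())
    fp = sum(c.fp for c in per_category.values())
    fn = sum(c.fn for c in per_category.values())
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical counts for illustration only.
counts = {
    "initial_stage": Counts(tp=50, fp=5, fn=10),
    "initial_treatment": Counts(tp=40, fp=10, fn=5),
    "recurrence": Counts(tp=30, fp=5, fn=5),
}
p, r, f1 = micro_scores(counts)
print(f"precision={p:.3f} recall={r:.3f} micro-F1={f1:.3f}")
```

With these toy counts the pooled totals are 120 TP, 20 FP and 20 FN, so precision, recall and micro-F1 all equal 120/140.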

8.
J Am Med Inform Assoc ; 31(8): 1638-1647, 2024 Aug 01.
Article in English | MEDLINE | ID: mdl-38860521

ABSTRACT

OBJECTIVE: To address challenges in large-scale electronic health record (EHR) data exchange, we sought to develop, deploy, and test an open source, cloud-hosted app "listener" that accesses standardized data across the SMART/HL7 Bulk FHIR Access application programming interface (API). METHODS: We advance a model for scalable, federated data sharing and learning. Cumulus software is designed to address key technology and policy desiderata, including local utility, control, and administrative simplicity, as well as privacy preservation during robust data sharing, and artificial intelligence (AI) for processing unstructured text. RESULTS: Cumulus relies on containerized, cloud-hosted software, installed within a healthcare organization's security envelope. Cumulus accesses EHR data via the Bulk FHIR interface and streamlines automated processing and sharing. The modular design enables use of the latest AI and natural language processing tools and supports provider autonomy and administrative simplicity. In an initial test, Cumulus was deployed across 5 healthcare systems, each partnered with public health. Cumulus outputs patient counts, which were aggregated into a table stratified by variables of interest to enable population health studies. All code is available open source. A policy stipulating that only aggregate data leave the institution greatly facilitated data sharing agreements. DISCUSSION AND CONCLUSION: Cumulus addresses barriers to data sharing based on (1) federally required support for standard APIs, (2) increasing use of cloud computing, and (3) advances in AI. There is potential for scalability to support learning across myriad network configurations and use cases.


Subject(s)
Artificial Intelligence , Electronic Health Records , Humans , Software , Cloud Computing , Health Information Interoperability , Information Dissemination
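The Cumulus abstract (entry 8) states that only aggregate patient counts, stratified by variables of interest, leave the institution. A minimal Python sketch of such a local aggregation step is shown below; the record layout and field names (`patient_id`, `age_group`, `sex`) are hypothetical and are not the Cumulus schema.

```python
from collections import defaultdict


def stratified_patient_counts(records, strata):
    """Count distinct patients per combination of stratifying variables.

    records: row-level dicts that stay inside the institution (hypothetical layout).
    strata: names of the stratifying variables, e.g. ("age_group", "sex").
    Returns {stratum tuple: patient count}, the only output meant to be shared.
    """
    patients = defaultdict(set)
    for record in records:
        key = tuple(record[name] for name in strata)
        patients[key].add(record["patient_id"])
    return {key: len(ids) for key, ids in sorted(patients.items())}


# Hypothetical row-level records; p1 appears twice (two encounters).
local_records = [
    {"patient_id": "p1", "age_group": "0-17", "sex": "F"},
    {"patient_id": "p1", "age_group": "0-17", "sex": "F"},
    {"patient_id": "p2", "age_group": "18-64", "sex": "M"},
    {"patient_id": "p3", "age_group": "18-64", "sex": "M"},
]
table = stratified_patient_counts(local_records, ("age_group", "sex"))
print(table)
```

Counting distinct patient identifiers per stratum, rather than rows, keeps repeat encounters from inflating the counts; only the returned table would be shared.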
9.
Front Cell Neurosci ; 18: 1369242, 2024.
Article in English | MEDLINE | ID: mdl-38846640

ABSTRACT

Recently, large-scale scRNA-seq datasets have been generated to understand the complex signaling mechanisms within the microenvironment of Alzheimer's Disease (AD), which are critical for identifying novel therapeutic targets and for precision medicine. However, the background signaling networks are highly complex and interactive, and it remains challenging to infer the core intra- and inter-multi-cell signaling communication networks from scRNA-seq data. In this study, we introduced a novel graph transformer model, PathFinder, to infer multi-cell intra- and inter-cellular signaling pathways and communications among multiple cell types. Unlike existing models, PathFinder is built on a divide-and-conquer strategy: it divides complex signaling networks into signaling paths, which are then scored and ranked using a novel graph transformer architecture to infer intra- and inter-cell signaling communications. We evaluated the performance of PathFinder using two scRNA-seq data cohorts: the first is an APOE4 genotype-specific AD cohort, and the second is a human cirrhosis cohort. The evaluation confirms the promising potential of PathFinder as a general signaling network inference model.
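PathFinder's divide-and-conquer step splits a signaling network into source-to-target paths and ranks them. The Python sketch below illustrates only that decomposition and ranking under stated assumptions: the graph transformer scorer is replaced by a placeholder score (mean edge weight), and the toy network, node names and weights are hypothetical.

```python
def enumerate_paths(graph, source, target, max_edges=4):
    """Depth-first search for simple paths from source to target with at most max_edges edges."""
    paths = []
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == target:
            paths.append(path)
            continue
        if len(path) - 1 >= max_edges:
            continue
        for nxt in graph.get(node, {}):
            if nxt not in path:  # skip cycles so every path is simple
                stack.append((nxt, path + [nxt]))
    return paths


def path_score(graph, path):
    """Placeholder score (not PathFinder's): mean edge weight along the path."""
    weights = [graph[a][b] for a, b in zip(path, path[1:])]
    return sum(weights) / len(weights) if weights else 0.0


def rank_paths(graph, source, target, max_edges=4):
    """Divide the network into paths, then rank them by score (highest first)."""
    paths = enumerate_paths(graph, source, target, max_edges)
    return sorted(paths, key=lambda p: path_score(graph, p), reverse=True)


# Hypothetical toy network: adjacency dict with edge weights.
toy = {
    "ligand": {"receptor_a": 0.9, "receptor_b": 0.4},
    "receptor_a": {"tf": 0.7},
    "receptor_b": {"adaptor": 0.8},
    "adaptor": {"tf": 0.6},
    "tf": {},
}
ranked = rank_paths(toy, "ligand", "tf")
for path in ranked:
    print(" -> ".join(path), round(path_score(toy, path), 2))
```

Exhaustive path enumeration grows quickly with network size, which is why a bound such as `max_edges` (an assumption here) is needed on real signaling networks.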

10.
bioRxiv ; 2024 Aug 27.
Article in English | MEDLINE | ID: mdl-38798349

ABSTRACT

Multi-omics data, i.e., genomics, epigenomics, transcriptomics, and proteomics, characterize complex cellular signaling systems at multiple levels and from multiple views, providing a holistic view of complex cellular signaling pathways. However, it remains challenging to integrate and interpret multi-omics data for mining key disease targets and signaling pathways. Graph AI models have been widely used to analyze graph-structured datasets, and they are ideal for integrative multi-omics data analysis because they can naturally integrate and represent multi-omics data as a biologically meaningful multi-level signaling graph and interpret the data via graph node and edge ranking analysis. However, it is non-trivial for graph AI model developers to pre-analyze multi-omics data and convert them into biologically meaningful graphs that can be fed directly into graph AI models. To resolve this challenge, we developed mosGraphGen (multi-omics signaling graph generator), which generates multi-omics signaling graphs (mos-graphs) of individual samples by mapping multi-omics data onto a biologically meaningful multi-level background signaling network, with data normalization by aggregating measurements and aligning them to the reference genome. With mosGraphGen, AI model developers can directly apply and evaluate their models using these mos-graphs. mosGraphGen is illustrated using two widely used multi-omics datasets: TCGA and Alzheimer's disease (AD) samples. The code of mosGraphGen is open-source and publicly available via GitHub: https://github.com/FuhaiLiAiLab/mosGraphGen.

11.
J Neurodev Disord ; 16(1): 17, 2024 Apr 17.
Article in English | MEDLINE | ID: mdl-38632549

ABSTRACT

Monogenic disorders account for a large proportion of population-attributable risk for neurodevelopmental disabilities. However, the data necessary to infer a causal relationship between a given genetic variant and a particular neurodevelopmental disorder are often lacking. Recognizing this scientific roadblock, 13 Intellectual and Developmental Disabilities Research Centers (IDDRCs) formed a consortium to create the Brain Gene Registry (BGR), a repository pairing clinical genetic data with phenotypic data from participants with variants in putative brain genes. Phenotypic profiles are assembled from the electronic health record (EHR) and a battery of remotely administered standardized assessments collectively referred to as the Rapid Neurobehavioral Assessment Protocol (RNAP), which include cognitive, neurologic, and neuropsychiatric assessments, as well as assessments for attention deficit hyperactivity disorder (ADHD) and autism spectrum disorder (ASD). Co-enrollment of BGR participants in the Clinical Genome Resource's (ClinGen's) GenomeConnect enables display of variant information in ClinVar. The BGR currently contains data on 479 participants who are 55% male, 6% Asian, 6% Black or African American, 76% white, and 12% Hispanic/Latine. Over 200 genes are represented in the BGR, and 12 or more participants harbor variants in each of the following genes: CACNA1A, DNMT3A, SLC6A1, SETD5, and MYT1L. More than 30% of variants are de novo and 43% are classified as variants of uncertain significance (VUSs). Mean standard scores on cognitive or developmental screens are below average for the BGR cohort. EHR data reveal developmental delay as the earliest and most common diagnosis in this sample, followed by speech and language disorders, ASD, and ADHD. BGR data have already been used to accelerate gene-disease validity curation of 36 genes evaluated by ClinGen's BGR Intellectual Disability (ID)-Autism (ASD) Gene Curation Expert Panel.
In summary, the BGR is a resource for use by stakeholders interested in advancing translational research for brain genes and continues to recruit participants with clinically reported variants to establish a rich and well-characterized national resource to promote research on neurodevelopmental disorders.


Subject(s)
Autism Spectrum Disorder , Autistic Disorder , Intellectual Disability , Neurodevelopmental Disorders , Humans , Male , Female , Autism Spectrum Disorder/genetics , Brain , Registries , Methyltransferases
12.
J Am Med Inform Assoc ; 31(5): 1144-1150, 2024 Apr 19.
Article in English | MEDLINE | ID: mdl-38447593

ABSTRACT

OBJECTIVE: To evaluate the real-world performance of the SMART/HL7 Bulk Fast Healthcare Interoperability Resources (FHIR) Access Application Programming Interface (API), developed to enable push-button access to electronic health record data on large populations and required under the 21st Century Cures Act Rule. MATERIALS AND METHODS: We used an open-source Bulk FHIR Testing Suite at 5 healthcare sites from April to September 2023, including 4 hospitals using electronic health records (EHRs) certified for interoperability and 1 Health Information Exchange (HIE) using a custom, standards-compliant API build. We measured export speeds, data sizes, and completeness across 6 FHIR resource types. RESULTS: Among the certified platforms, Oracle Cerner led in speed, managing 5-16 million resources at over 8000 resources/min. Three Epic sites exported a FHIR data subset, achieving 1-12 million resources at 1555-2500 resources/min. Notably, the HIE's custom API outperformed the certified platforms, generating over 141 million resources at 12 000 resources/min. DISCUSSION: The HIE's custom API showed superior performance, supporting the effectiveness of SMART/HL7 Bulk FHIR in enabling large-scale data exchange while underlining the need for optimization in existing EHR platforms. Agility and scalability are essential for diverse health, research, and public health use cases. CONCLUSION: To fully realize the interoperability goals of the 21st Century Cures Act, addressing the performance limitations of Bulk FHIR APIs is critical. It would be beneficial to include performance metrics in both certification and reporting processes.


Subject(s)
Health Information Exchange , Health Level Seven , Software , Electronic Health Records , Delivery of Health Care
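Entries 8, 12 and 13 rely on the SMART/HL7 Bulk Data Access (Bulk FHIR) export operation. Under that specification, a client starts an asynchronous export with a GET request to `[base]/Group/[id]/$export` carrying `Accept: application/fhir+json` and `Prefer: respond-async`; the server replies `202 Accepted` with a `Content-Location` status URL to poll. The Python sketch below only builds such a kick-off request with the standard library and does not send it; the base URL, group id and bearer token are placeholders.

```python
from urllib.parse import urlencode
from urllib.request import Request


def bulk_export_kickoff(base_url, group_id, resource_types, token):
    """Build (without sending) a group-level Bulk Data $export kick-off request."""
    # _type restricts the export to the listed FHIR resource types.
    query = urlencode({"_type": ",".join(resource_types)})
    url = f"{base_url.rstrip('/')}/Group/{group_id}/$export?{query}"
    return Request(
        url,
        method="GET",
        headers={
            "Accept": "application/fhir+json",
            "Prefer": "respond-async",  # asks the server to run the export asynchronously
            "Authorization": f"Bearer {token}",  # placeholder access token
        },
    )


# Placeholder server, group id and token (hypothetical).
req = bulk_export_kickoff(
    "https://fhir.example.org/r4", "example-group", ["Patient", "Observation"], "TOKEN"
)
print(req.get_method(), req.full_url)
```

After kick-off, the client would poll the `Content-Location` URL until the server returns `200 OK` with a JSON manifest listing the NDJSON output files; the export speeds reported in entry 12 cover this whole asynchronous cycle on the server side.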
13.
medRxiv ; 2024 Feb 06.
Article in English | MEDLINE | ID: mdl-38370642

ABSTRACT

Objective: To address challenges in large-scale electronic health record (EHR) data exchange, we sought to develop, deploy, and test an open source, cloud-hosted app 'listener' that accesses standardized data across the SMART/HL7 Bulk FHIR Access application programming interface (API). Methods: We advance a model for scalable, federated data sharing and learning. Cumulus software is designed to address key technology and policy desiderata, including local utility, control, and administrative simplicity, as well as privacy preservation during robust data sharing, and AI for processing unstructured text. Results: Cumulus relies on containerized, cloud-hosted software, installed within a healthcare organization's security envelope. Cumulus accesses EHR data via the Bulk FHIR interface and streamlines automated processing and sharing. The modular design enables use of the latest AI and natural language processing tools and supports provider autonomy and administrative simplicity. In an initial test, Cumulus was deployed across five healthcare systems, each partnered with public health. Cumulus outputs patient counts, which were aggregated into a table stratified by variables of interest to enable population health studies. All code is available open source. A policy stipulating that only aggregate data leave the institution greatly facilitated data sharing agreements. Discussion and Conclusion: Cumulus addresses barriers to data sharing based on (1) federally required support for standard APIs, (2) increasing use of cloud computing, and (3) advances in AI. There is potential for scalability to support learning across myriad network configurations and use cases.

14.
PLoS Comput Biol ; 20(1): e1011785, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38181047

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) is a powerful technology for investigating the transcriptional programs of stromal, immune, and disease cells, such as tumor cells or neurons, within the microenvironment (ME) or niche of a tumor or of the Alzheimer's Disease (AD) brain. Cell-cell communications within the ME play important roles in disease progression and immunotherapy response and are novel and critical therapeutic targets. Though many scRNA-seq analysis tools have been developed to investigate the heterogeneity and sub-populations of cells, few were designed to uncover cell-cell communications in the ME and to predict potentially effective drugs that inhibit these communications. Moreover, the data analysis processes for discovering signaling communication networks and effective drugs from scRNA-seq data are complex and involve a set of critical analysis steps and external supporting data resources, which are difficult for researchers without a strong computational background and training in scRNA-seq data analysis. To address these challenges, in this study, we developed a novel open-source computational tool, sc2MeNetDrug (https://fuhaililab.github.io/sc2MeNetDrug/). It was specifically designed to use scRNA-seq data to identify cell types within disease MEs, uncover the dysfunctional signaling pathways within individual cell types and the interactions among different cell types, and predict effective drugs that can potentially disrupt cell-cell signaling communications. sc2MeNetDrug provides a user-friendly graphical user interface that encapsulates the data analysis modules, which can facilitate scRNA-seq data-based discovery of novel inter-cell signaling communications and novel therapeutic regimens.


Subject(s)
Single-Cell Analysis , Software , RNA-Seq , Sequence Analysis, RNA , Gene Expression Profiling , Signal Transduction/genetics
15.
bioRxiv ; 2024 Jan 15.
Article in English | MEDLINE | ID: mdl-38293243

ABSTRACT

Recently, large-scale scRNA-seq datasets have been generated to understand the complex and poorly understood signaling mechanisms within the microenvironment of Alzheimer's Disease (AD), which are critical for identifying novel therapeutic targets and for precision medicine. Although a set of targets has been identified, it remains challenging to infer the core intra- and inter-multi-cell signaling communication networks from scRNA-seq data, given the complex and highly interactive background signaling network. Herein, we introduced a novel graph transformer model, PathFinder, to infer multi-cell intra- and inter-cellular signaling pathways and signaling communications among multiple cell types. Unlike existing models, PathFinder is built on a divide-and-conquer strategy, which divides complex signaling networks into signaling paths that are then scored and ranked using a novel graph transformer architecture to infer intra- and inter-cell signaling communications. We evaluated PathFinder using scRNA-seq data of APOE4 genotype-specific AD mouse models and identified novel APOE4-altered intra- and inter-cell interaction networks among neurons, astrocytes, and microglia. PathFinder is a general signaling network inference model and can be applied to other omics data-driven signaling network inference.

16.
bioRxiv ; 2024 Apr 04.
Article in English | MEDLINE | ID: mdl-37662280

ABSTRACT

Background and Objectives: Previous approaches pursuing normative modelling for analyzing heterogeneity in Alzheimer's Disease (AD) have relied on a single neuroimaging modality. However, AD is a multi-faceted disorder, with each modality providing unique and complementary information about AD. In this study, we used a deep-learning based multimodal normative model to assess the heterogeneity in regional brain patterns for ATN (amyloid-tau-neurodegeneration) biomarkers. Methods: We selected discovery (n = 665) and replication (n = 430) cohorts with simultaneous availability of ATN biomarkers: Florbetapir amyloid, Flortaucipir tau and T1-weighted magnetic resonance imaging (MRI). A multimodal variational autoencoder (conditioned on age and sex) was used as a normative model to learn the multimodal regional brain patterns of a cognitively unimpaired (CU) control group. The trained model was applied to individuals on the AD Spectrum (ADS) to estimate their deviations (Z-scores) from the normative distribution, resulting in a Z-score regional deviation map per ADS individual per modality. Regions with Z-scores < -1.96 for MRI and Z-scores > 1.96 for amyloid and tau were labelled as outliers. Hamming distance was used to quantify the dissimilarity between individuals based on their outlier deviations across each modality. We also calculated a disease severity index (DSI) for each ADS individual, estimated by averaging the deviations across all outlier regions corresponding to each modality. Results: ADS individuals with moderate or severe dementia showed a higher proportion of regional outliers for each modality, as well as more dissimilarity in modality-specific regional outlier patterns, compared to ADS individuals with early or mild dementia. DSI (i) was associated with the progressive stages of dementia, (ii) showed significant associations with neuropsychological composite scores and (iii) was related to the longitudinal risk of CDR progression.
Findings were reproducible in both discovery and replication cohorts. Discussion: Ours is the first study to examine the heterogeneity in AD through the lens of multiple neuroimaging modalities (ATN), based on distinct or overlapping patterns of regional outlier deviations. Regional MRI and tau outliers were more heterogeneous than regional amyloid outliers. DSI has the potential to be an individual patient metric of neurodegeneration that can help in clinical decision making and in monitoring patient response to anti-amyloid treatments.
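Entry 16 labels a region as an outlier when its Z-score is below -1.96 for MRI or above 1.96 for amyloid and tau, compares individuals with the Hamming distance between their outlier patterns, and averages deviations over outlier regions to obtain the DSI. The Python sketch below is one reading of those steps for a single modality; taking absolute Z-scores in the DSI, and the toy Z-score vectors, are assumptions rather than details from the study.

```python
# Outlier threshold described in the abstract.
THRESHOLD = 1.96
# Sign of an abnormal deviation: MRI deviates downward, amyloid and tau PET upward.
DIRECTION = {"mri": -1, "amyloid": 1, "tau": 1}


def outlier_mask(z_scores, modality):
    """Flag regions whose Z-score passes the threshold in the abnormal direction."""
    sign = DIRECTION[modality]
    return [sign * z > THRESHOLD for z in z_scores]


def hamming(mask_a, mask_b):
    """Number of regions whose outlier status differs between two individuals."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must cover the same regions")
    return sum(a != b for a, b in zip(mask_a, mask_b))


def severity_index(z_scores, modality):
    """Assumed DSI form: mean absolute Z-score over outlier regions (0.0 if none)."""
    mask = outlier_mask(z_scores, modality)
    deviations = [abs(z) for z, is_outlier in zip(z_scores, mask) if is_outlier]
    return sum(deviations) / len(deviations) if deviations else 0.0


# Hypothetical regional tau Z-scores for two individuals.
tau_a = [2.5, 0.3, 3.1, 1.0]
tau_b = [2.0, 2.2, 0.1, 1.5]
print(hamming(outlier_mask(tau_a, "tau"), outlier_mask(tau_b, "tau")))
print(severity_index(tau_a, "tau"), severity_index(tau_b, "tau"))
```

Here the two individuals share one outlier region and differ in two, so their Hamming distance is 2; the study may normalize this distance or combine modalities differently.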

17.
Am J Transplant ; 24(3): 458-467, 2024 Mar.
Article in English | MEDLINE | ID: mdl-37468109

ABSTRACT

Primary graft dysfunction (PGD) is the leading cause of morbidity and mortality in the first 30 days after lung transplantation. Risk factors for the development of PGD include donor and recipient characteristics, but how multiple variables interact to affect the development of PGD, and how clinicians should weigh them when deciding on donor acceptance, remain unclear. This was a single-center retrospective cohort study to develop and evaluate machine learning pipelines that predict the development of PGD grade 3 within the first 72 hours of transplantation using donor and recipient variables known at the time of donor offer acceptance. Among 576 bilateral lung recipients, 173 (30%) developed PGD grade 3. The cohort underwent a 75%/25% train-test split, and lasso regression was used to identify 11 variables for model development. A K-nearest neighbors model, which showed the best calibration and performance with relatively narrow confidence intervals, was selected as the final predictive model, with an area under the receiver operating characteristic curve of 0.65. Machine learning models can predict the risk of developing PGD grade 3 from data available at the time of donor offer acceptance. This may improve donor-recipient matching and donor utilization in the future.


Subject(s)
Lung Transplantation , Primary Graft Dysfunction , Humans , Retrospective Studies , Primary Graft Dysfunction/diagnosis , Primary Graft Dysfunction/etiology , Lung Transplantation/adverse effects , Risk Factors , Lung
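Entry 17 describes a 75%/25% train-test split, a K-nearest neighbors model and evaluation by the area under the receiver operating characteristic curve. The standard-library Python sketch below illustrates those three pieces on a hypothetical toy cohort; it omits the lasso variable-selection step (features are assumed to be already selected), and the stratified split, the `k` value and the data are assumptions, not the study's pipeline.

```python
import math
import random


def stratified_split(rows, test_fraction=0.25, seed=0):
    """Hold out test_fraction of each class after a reproducible shuffle; returns (train, test)."""
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted({y for _, y in rows}):
        group = [row for row in rows if row[1] == label]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


def knn_risk(train, x, k=5):
    """Predicted risk: fraction of the k nearest training rows (Euclidean distance) labelled 1."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
    return sum(label for _, label in nearest) / len(nearest)


def auroc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy cohort of (feature vector, PGD grade 3 label) pairs.
rows = [((i / 10, (i % 4) / 4), 1 if i >= 14 else 0) for i in range(20)]
train, test = stratified_split(rows)
risks = [knn_risk(train, x, k=3) for x, _ in test]
auc = auroc(risks, [y for _, y in test])
print(f"toy test AUROC = {auc:.2f}")
```

The pairwise (Mann-Whitney) formulation of AUROC equals the area under the empirical ROC curve; stratifying the split keeps both classes in the small toy test set so the metric is always defined.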
18.
bioRxiv ; 2024 Apr 06.
Article in English | MEDLINE | ID: mdl-37808763

ABSTRACT

Objective: Accurately identifying clinical phenotypes from Electronic Health Records (EHRs) provides additional insights into patients' health, especially when such information is unavailable in structured data. This study evaluates the application of OpenAI's Generative Pre-trained Transformer (GPT)-4 model to identify clinical phenotypes from EHR text in non-small cell lung cancer (NSCLC) patients. The goal was to identify disease stages, treatments and progression using GPT-4, and to compare its performance against GPT-3.5-turbo, Flan-T5-xl, Flan-T5-xxl, and two rule-based and machine learning-based methods, namely, scispaCy and medspaCy. Materials and Methods: Phenotypes such as initial cancer stage, initial treatment, evidence of cancer recurrence, and affected organs during recurrence were identified from 13,646 records for 63 NSCLC patients from Washington University in St. Louis, Missouri. The performance of the GPT-4 model was evaluated against GPT-3.5-turbo, Flan-T5-xxl, Flan-T5-xl, medspaCy and scispaCy by comparing precision, recall, and micro-F1 scores. Results: GPT-4 achieved higher F1 score, precision, and recall than Flan-T5-xl, Flan-T5-xxl, medspaCy and scispaCy. GPT-3.5-turbo performed similarly to GPT-4. GPT and Flan-T5 models were not constrained by explicit rule requirements for contextual pattern recognition, whereas spaCy models relied on predefined patterns, leading to their suboptimal performance. Discussion and Conclusion: GPT-4 improves clinical phenotype identification due to its robust pre-training and remarkable pattern recognition capability on the embedded tokens. It demonstrates data-driven effectiveness even with limited context in the input. While rule-based models remain useful for some tasks, GPT models offer improved contextual understanding of the text and robust clinical phenotype extraction.

19.
Genet Med ; 26(3): 101035, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38059438

ABSTRACT

PURPOSE: Clinically ascertained variants are under-utilized in neurodevelopmental disorder research. We established the Brain Gene Registry (BGR) to coregister clinically identified variants in putative brain genes with participant phenotypes. Here, we report 179 genetic variants in the first 179 BGR registrants and analyze the proportion that were novel to ClinVar at the time of entry and those that were absent in other disease databases. METHODS: From 10 academically affiliated institutions, 179 individuals with 179 variants were enrolled into the BGR. Variants were cross-referenced for previous presence in ClinVar and for presence in 6 other genetic databases. RESULTS: Of 179 variants in 76 genes, 76 (42.5%) were novel to ClinVar, and 62 (34.6%) were absent from all databases analyzed. Of the 103 variants present in ClinVar, 37 (35.9%) were uncertain (ClinVar aggregate classification of variant of uncertain significance or conflicting classifications). For 5 variants, the aggregate ClinVar classification was inconsistent with the interpretation from the BGR site-provided classification. CONCLUSION: A significant proportion of clinical variants that are novel or uncertain are not shared, limiting the evidence base for new gene-disease relationships. Registration of paired clinical genetic test results with phenotype has the potential to advance knowledge of the relationships between genes and neurodevelopmental disorders.


Subject(s)
Databases, Genetic , Genetic Variation , Humans , Genetic Variation/genetics , Genetic Testing/methods , Phenotype , Brain
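The percentages in entry 19 use two different denominators: the 179 registered variants for the "novel to ClinVar" and "absent from all databases" proportions, and the 103 variants already present in ClinVar (179 - 76) for the "uncertain" proportion. A short Python check of that arithmetic, using only the counts stated in the abstract:

```python
total_variants = 179
novel_to_clinvar = 76
absent_from_all = 62
in_clinvar = total_variants - novel_to_clinvar  # variants already present in ClinVar
uncertain_in_clinvar = 37


def pct(part, whole):
    """Percentage rounded to one decimal place, as reported in the abstract."""
    return round(100 * part / whole, 1)


print(
    in_clinvar,
    pct(novel_to_clinvar, total_variants),
    pct(absent_from_all, total_variants),
    pct(uncertain_in_clinvar, in_clinvar),
)
```

It prints `103 42.5 34.6 35.9`, matching the reported figures.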
20.
Article in English | MEDLINE | ID: mdl-38130873

ABSTRACT

Normative modelling is a method for understanding the underlying heterogeneity within brain disorders like Alzheimer Disease (AD) by quantifying how each patient deviates from the expected normative pattern learned from a healthy control distribution. Existing deep learning based normative models have been applied only to single-modality Magnetic Resonance Imaging (MRI) neuroimaging data. However, these do not take into account the complementary information offered by multimodal MRI, which is essential for understanding a multifactorial disease like AD. To address this limitation, we propose a multi-modal variational autoencoder (mmVAE) based normative modelling framework that can capture the joint distribution between different modalities to identify abnormal brain volume deviations due to AD. Our multi-modal framework takes as input FreeSurfer-processed brain region volumes from T1-weighted (cortical and subcortical) and T2-weighted (hippocampal) scans of cognitively normal participants to learn the morphological characteristics of the healthy brain. The estimated normative model is then applied to AD patients to quantify the deviation in brain volumes and identify abnormal brain pattern deviations due to the progressive stages of AD. We compared our proposed mmVAE with a baseline unimodal VAE that has a single encoder and decoder and takes the two modalities concatenated as a unimodal input. Our experimental results show that deviation maps generated by mmVAE are more sensitive to disease staging within AD, correlate better with patient cognition, and yield a higher number of brain regions with statistically significant deviations than the unimodal baseline model.
