Results 1 - 20 of 32
1.
medRxiv ; 2024 Jul 22.
Article in English | MEDLINE | ID: mdl-39108510

ABSTRACT

Large language models (LLMs) have shown great promise in supporting differential diagnosis, but 23 available published studies on the diagnostic accuracy evaluated small cohorts (number of cases, 30-422, mean 104) and have evaluated LLM responses subjectively by manual curation (23/23 studies). The performance of LLMs for rare disease diagnosis has not been evaluated systematically. Here, we perform a rigorous and large-scale analysis of the performance of GPT-4 in prioritizing candidate diagnoses, using the largest-ever cohort of rare disease patients. Our computational study used 5267 computational case reports from previously published data. Each case was formatted as a Global Alliance for Genomics and Health (GA4GH) phenopacket, in which clinical anomalies were represented as Human Phenotype Ontology (HPO) terms. We developed software to generate prompts from each phenopacket. Prompts were sent to Generative Pre-trained Transformer 4 (GPT-4), and the rank of the correct diagnosis, if present in the response, was recorded. The mean reciprocal rank of the correct diagnosis was 0.24 (with the reciprocal of the MRR corresponding to a rank of 4.2), and the correct diagnosis was placed in rank 1 in 19.2% of the cases, in the first 3 ranks in 28.6%, and in the first 10 ranks in 32.5%. Our study is the largest to be reported to date and provides a realistic estimate of the performance of GPT-4 in rare disease medicine.
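The ranking metrics quoted in this abstract can be illustrated with a minimal Python sketch. The function names and the example ranks are hypothetical, and the convention that a missing diagnosis contributes 0 to the mean reciprocal rank is an assumption; the abstract does not say how absent diagnoses were scored.

```python
from typing import Optional, Sequence


def mean_reciprocal_rank(ranks: Sequence[Optional[int]]) -> float:
    """Mean of 1/rank over all cases.

    Ranks are 1-based. None means the correct diagnosis did not appear
    in the response; it is assumed here to contribute 0.
    """
    if not ranks:
        raise ValueError("ranks must not be empty")
    return sum(1.0 / r if r is not None else 0.0 for r in ranks) / len(ranks)


def top_k_fraction(ranks: Sequence[Optional[int]], k: int) -> float:
    """Fraction of cases whose correct diagnosis is within the first k ranks."""
    if not ranks:
        raise ValueError("ranks must not be empty")
    return sum(1 for r in ranks if r is not None and r <= k) / len(ranks)


# Hypothetical ranks for five cases (not data from the study).
example_ranks = [1, 3, None, 2, None]
mrr = mean_reciprocal_rank(example_ranks)
top1 = top_k_fraction(example_ranks, 1)
top3 = top_k_fraction(example_ranks, 3)
```

Taking the reciprocal of the reported MRR, 1/0.24, gives roughly 4.2, which matches the rank stated in the abstract.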

2.
medRxiv ; 2024 May 29.
Article in English | MEDLINE | ID: mdl-38854034

ABSTRACT

The Global Alliance for Genomics and Health (GA4GH) Phenopacket Schema was released in 2022 and approved by ISO as a standard for sharing clinical and genomic information about an individual, including phenotypic descriptions, numerical measurements, genetic information, diagnoses, and treatments. A phenopacket can be used as an input file for software that supports phenotype-driven genomic diagnostics and for algorithms that facilitate patient classification and stratification for identifying new diseases and treatments. There has been a great need for a collection of phenopackets to test software pipelines and algorithms. Here, we present phenopacket-store. Version 0.1.12 of phenopacket-store includes 4916 phenopackets representing 277 Mendelian and chromosomal diseases associated with 236 genes, and 2872 unique pathogenic alleles curated from 605 different publications. This represents the first large-scale collection of case-level, standardized phenotypic information derived from case reports in the literature with detailed descriptions of the clinical data and will be useful for many purposes, including the development and testing of software for prioritizing genes and diseases in diagnostic genomics, machine learning analysis of clinical phenotype data, patient stratification, and genotype-phenotype correlations. This corpus also provides best-practice examples for curating literature-derived data using the GA4GH Phenopacket Schema.

4.
Sci Rep ; 14(1): 8842, 2024 04 17.
Article in English | MEDLINE | ID: mdl-38632317

ABSTRACT

Sarcopenia is a serious systemic disease that reduces overall survival. Transcatheter aortic valve implantation (TAVI) is selectively performed in patients with severe aortic stenosis who are not indicated for open cardiac surgery due to severe polymorbidity. Artificial intelligence-assisted body composition assessment from available CT scans appears to be a simple tool to stratify these patients into low and high risk based on future estimates of all-cause mortality. Within our study, the segmentation of preprocedural CT scans at the level of the third lumbar vertebra in patients undergoing TAVI was performed using a neural network (AutoMATiCA). The obtained parameters (area and density of skeletal muscles and intramuscular, visceral, and subcutaneous adipose tissue) were analyzed using Cox univariate and multivariable models for continuous and categorical variables to assess the relation of selected variables with all-cause mortality. 866 patients were included (median (interquartile range)): age 79.7 (74.9-83.3) years; BMI 28.9 (25.9-32.6) kg/m2. Survival analysis was performed on all automatically obtained parameters of muscle and fat density and area. Skeletal muscle index (SMI in cm2/m2), visceral (VAT in HU) and subcutaneous adipose tissue (SAT in HU) density predicted the all-cause mortality in patients after TAVI expressed as hazard ratio (HR) with 95% confidence interval (CI): SMI HR 0.986, 95% CI (0.975-0.996); VAT 1.015 (1.002-1.028) and SAT 1.014 (1.004-1.023), all p < 0.05. Automatic body composition assessment can estimate higher all-cause mortality risk in patients after TAVI, which may be useful in preoperative clinical reasoning and stratification of patients.
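Hazard ratios this close to 1 are easier to read when rescaled. The sketch below assumes that each reported HR refers to a one-unit increase of a continuous covariate in a Cox proportional hazards model, which the abstract does not state explicitly; the helper names are hypothetical.

```python
import math


def scaled_hazard_ratio(hr_per_unit: float, delta: float) -> float:
    """Hazard ratio for a change of `delta` units.

    In a Cox model the log hazard is linear in the covariate, so the
    hazard ratio for a delta-unit change is hr_per_unit ** delta.
    """
    return math.exp(math.log(hr_per_unit) * delta)


def ci_excludes_one(lower: float, upper: float) -> bool:
    """True if a confidence interval for a hazard ratio does not contain 1."""
    return upper < 1.0 or lower > 1.0


# SMI: HR 0.986 per cm2/m2 (assumed unit step), rescaled to a 10-unit increase.
smi_hr_10 = scaled_hazard_ratio(0.986, 10)
smi_significant = ci_excludes_one(0.975, 0.996)
vat_significant = ci_excludes_one(1.002, 1.028)
```

Under that assumption, a 10 cm2/m2 higher SMI would correspond to a hazard ratio of about 0.87, that is, roughly 13% lower hazard of all-cause mortality.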


Subject(s)
Sarcopenia, Humans, Aged, Artificial Intelligence, Adipose Tissue, Skeletal Muscle, Subcutaneous Fat, Body Composition/physiology, Retrospective Studies
5.
Hum Genet ; 2024 Jan 03.
Article in English | MEDLINE | ID: mdl-38170232

ABSTRACT

Variants that disrupt splicing are a frequent cause of rare disease but have been under-ascertained clinically. Accurate and efficient methods to predict a variant's impact on splicing are needed to interpret the growing number of variants of unknown significance (VUS) identified by exome and genome sequencing. Here, we present the results of the CAGI6 Splicing VUS challenge, which invited predictions of the splicing impact of 56 variants ascertained clinically and functionally validated to determine splicing impact. The performance of 12 prediction methods, along with SpliceAI and CADD, was compared on the 56 functionally validated variants. The maximum accuracy achieved was 82% from two different approaches, one weighting SpliceAI scores by minor allele frequency, and one applying the recently published Splicing Prediction Pipeline (SPiP). SPiP performed optimally in terms of sensitivity, while an ensemble method combining multiple prediction tools and information from databases exceeded all others for specificity. Several challenge methods equalled or exceeded the performance of SpliceAI, with ultimate choice of prediction method likely to depend on experimental or clinical aims. One quarter of the variants were incorrectly predicted by at least 50% of the methods, highlighting the need for further improvements to splicing prediction methods for successful clinical application.
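Accuracy, sensitivity, and specificity, the three measures compared in this abstract, can be computed from a confusion matrix as in the sketch below. The counts are hypothetical and were chosen only so that they sum to 56 variants; they are not the challenge results.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    if total == 0 or (tp + fn) == 0 or (tn + fp) == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        "accuracy": (tp + tn) / total,   # fraction of all calls that are correct
        "sensitivity": tp / (tp + fn),   # splice-altering variants detected
        "specificity": tn / (tn + fp),   # neutral variants correctly cleared
    }


# Hypothetical counts for a 56-variant benchmark (illustration only).
metrics = classification_metrics(tp=30, fp=5, tn=16, fn=5)
```

With these made-up counts, 46 of 56 calls are correct, which gives an accuracy of about 82%. A method can reach that accuracy with quite different sensitivity and specificity, which is why the abstract reports them separately.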

6.
medRxiv ; 2024 Feb 26.
Article in English | MEDLINE | ID: mdl-37503093

ABSTRACT

Objective: Large Language Models such as GPT-4 previously have been applied to differential diagnostic challenges based on published case reports. Published case reports have a sophisticated narrative style that is not readily available from typical electronic health records (EHR). Furthermore, even if such a narrative were available in EHRs, privacy requirements would preclude sending it outside the hospital firewall. We therefore tested a method for parsing clinical texts to extract ontology terms and programmatically generating prompts that by design are free of protected health information. Materials and Methods: We investigated different methods to prepare prompts from 75 recently published case reports. We transformed the original narratives by extracting structured terms representing phenotypic abnormalities, comorbidities, treatments, and laboratory tests and creating prompts programmatically. Results: Performance of all of these approaches was modest, with the correct diagnosis ranked first in only 5.3-17.6% of cases. The performance of the prompts created from structured data was substantially worse than that of the original narrative texts, even if additional information was added following manual review of term extraction. Moreover, different versions of GPT-4 demonstrated substantially different performance on this task. Discussion: The sensitivity of the performance to the form of the prompt and the instability of results over two GPT-4 versions represent important current limitations to the use of GPT-4 to support diagnosis in real-life clinical settings. Conclusion: Research is needed to identify the best methods for creating prompts from typically available clinical data to support differential diagnostics.
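Programmatic prompt generation from extracted terms, as described in this abstract, might look like the following sketch. The class, the field names, and the prompt wording are all hypothetical; the abstract does not give the template the authors used. The four term categories mirror those named in the abstract.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StructuredCase:
    """Hypothetical container for terms extracted from a clinical text."""
    phenotypes: List[str]
    comorbidities: List[str] = field(default_factory=list)
    treatments: List[str] = field(default_factory=list)
    lab_findings: List[str] = field(default_factory=list)


def build_prompt(case: StructuredCase, max_diagnoses: int = 10) -> str:
    """Assemble a prompt from structured terms only (no free narrative text)."""
    sections = [
        ("Phenotypic abnormalities", case.phenotypes),
        ("Comorbidities", case.comorbidities),
        ("Treatments", case.treatments),
        ("Laboratory findings", case.lab_findings),
    ]
    # Skip empty categories so the prompt contains only extracted terms.
    lines = [f"{title}: " + "; ".join(terms) + "." for title, terms in sections if terms]
    lines.append(f"List up to {max_diagnoses} candidate diagnoses, most likely first.")
    return "\n".join(lines)


case = StructuredCase(
    phenotypes=["Seizure", "Global developmental delay"],
    lab_findings=["Elevated circulating creatine kinase concentration"],
)
prompt = build_prompt(case)
```

Because the prompt is built only from ontology labels, no names, dates, or other identifiers from the source text are carried over, which is the design goal the abstract describes.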

7.
Bioinformatics ; 39(12)2023 12 01.
Article in English | MEDLINE | ID: mdl-38001031

ABSTRACT

MOTIVATION: Methods for concept recognition (CR) in clinical texts have largely been tested on abstracts or articles from the medical literature. However, texts from electronic health records (EHRs) frequently contain spelling errors, abbreviations, and other nonstandard ways of representing clinical concepts. RESULTS: Here, we present a method inspired by the BLAST algorithm for biosequence alignment that screens texts for potential matches on the basis of matching k-mer counts and scores candidates based on conformance to typical patterns of spelling errors derived from 2.9 million clinical notes. Our method, the Term-BLAST-like alignment tool (TBLAT) leverages a gold standard corpus for typographical errors to implement a sequence alignment-inspired method for efficient entity linkage. We present a comprehensive experimental comparison of TBLAT with five widely used tools. Experimental results show an increase of 10% in recall on scientific publications and 20% increase in recall on EHR records (when compared against the next best method), hence supporting a significant enhancement of the entity linking task. The method can be used stand-alone or as a complement to existing approaches. AVAILABILITY AND IMPLEMENTATION: Fenominal is a Java library that implements TBLAT for named CR of Human Phenotype Ontology terms and is available at https://github.com/monarch-initiative/fenominal under the GNU General Public License v3.0.
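The k-mer screening idea mentioned in this abstract can be sketched as follows. This is a simplified illustration, not the TBLAT or Fenominal implementation: the function names, the choice of k = 3, and the 0.5 threshold are assumptions, and the subsequent scoring against typical spelling-error patterns is not shown.

```python
from collections import Counter
from typing import List


def kmer_counts(text: str, k: int = 3) -> Counter:
    """Count overlapping character k-mers in a lower-cased string."""
    s = text.lower()
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def shared_kmer_fraction(query: str, label: str, k: int = 3) -> float:
    """Fraction of the label's k-mers that also occur in the query."""
    label_kmers = kmer_counts(label, k)
    total = sum(label_kmers.values())
    if total == 0:
        return 0.0
    # Counter intersection keeps the minimum count of each shared k-mer.
    shared = sum((kmer_counts(query, k) & label_kmers).values())
    return shared / total


def screen(query: str, labels: List[str], k: int = 3, threshold: float = 0.5) -> List[str]:
    """Keep labels that share enough k-mers with the query to merit scoring."""
    return [lab for lab in labels if shared_kmer_fraction(query, lab, k) >= threshold]


# A misspelled term still shares most of its 3-mers with the correct label.
candidates = screen("arachnodactily", ["Arachnodactyly", "Seizure"])
```

The screen is cheap and tolerant of single-character errors, so a more expensive alignment or scoring step only needs to run on the few labels that pass it.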


Subject(s)
Algorithms, Language, Humans, Sequence Alignment, Electronic Health Records, Publications
8.
Article in English | MEDLINE | ID: mdl-37684057

ABSTRACT

We identified a de novo heterozygous transient receptor potential cation channel subfamily M (melastatin) member 3 (TRPM3) missense variant, p.(Asn1126Asp), in a patient with developmental delay and manifestations of cerebral palsy (CP) using phenotype-driven prioritization analysis of whole-genome sequencing data with Exomiser. The variant is localized in the functionally important ion transport domain of the TRPM3 protein and predicted to impact the protein structure. Our report adds TRPM3 to the list of Mendelian disease-associated genes that can be associated with CP and provides further evidence for the pathogenicity of the variant p.(Asn1126Asp).


Subject(s)
Cerebral Palsy, Intellectual Disability, Nervous System Malformations, TRPM Cation Channels, Humans, Cerebral Palsy/genetics, Intellectual Disability/genetics, Missense Mutation/genetics, Phenotype, TRPM Cation Channels/genetics
9.
bioRxiv ; 2023 Oct 11.
Article in English | MEDLINE | ID: mdl-37398049

ABSTRACT

Numerous factors regulate alternative splicing of human genes at a co-transcriptional level. However, how alternative splicing depends on the regulation of gene expression is poorly understood. We leveraged data from the Genotype-Tissue Expression (GTEx) project to show a significant association of gene expression and splicing for 6874 (4.9%) of 141,043 exons in 1106 (13.3%) of 8314 genes with substantially variable expression in ten GTEx tissues. About half of these exons demonstrate higher inclusion with higher gene expression, and half demonstrate higher exclusion, with the observed direction of coupling being highly consistent across different tissues and in external datasets. The exons differ with respect to sequence characteristics, enriched sequence motifs, RNA polymerase II binding, and inferred transcription rate of downstream introns. The exons were enriched for hundreds of isoform-specific Gene Ontology annotations, suggesting that the coupling of expression and alternative splicing described here may provide an important gene regulatory mechanism that might be used in a variety of biological contexts. In particular, higher inclusion exons could play an important role during cell division.

10.
PLoS One ; 18(5): e0285433, 2023.
Article in English | MEDLINE | ID: mdl-37196000

ABSTRACT

The Global Alliance for Genomics and Health (GA4GH) is a standards-setting organization that is developing a suite of coordinated standards for genomics. The GA4GH Phenopacket Schema is a standard for sharing disease and phenotype information that characterizes an individual person or biosample. The Phenopacket Schema is flexible and can represent clinical data for any kind of human disease including rare disease, complex disease, and cancer. It also allows consortia or databases to apply additional constraints to ensure uniform data collection for specific goals. We present phenopacket-tools, an open-source Java library and command-line application for construction, conversion, and validation of phenopackets. Phenopacket-tools simplifies construction of phenopackets by providing concise builders, programmatic shortcuts, and predefined building blocks (ontology classes) for concepts such as anatomical organs, age of onset, biospecimen type, and clinical modifiers. Phenopacket-tools can be used to validate the syntax and semantics of phenopackets as well as to assess adherence to additional user-defined requirements. The documentation includes examples showing how to use the Java library and the command-line tool to create and validate phenopackets. We demonstrate how to create, convert, and validate phenopackets using the library or the command-line application. Source code, API documentation, comprehensive user guide and a tutorial can be found at https://github.com/phenopackets/phenopacket-tools. The library can be installed from the public Maven Central artifact repository and the application is available as a standalone archive. The phenopacket-tools library helps developers implement and standardize the collection and exchange of phenotypic and other clinical data for use in phenotype-driven genomic diagnostics, translational research, and precision medicine applications.


Subject(s)
Neoplasms, Software, Humans, Genomics, Factual Databases, Gene Library
11.
Adv Genet (Hoboken) ; 4(1): 2200016, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36910590

ABSTRACT

The Global Alliance for Genomics and Health (GA4GH) is developing a suite of coordinated standards for genomics for healthcare. The Phenopacket is a new GA4GH standard for sharing disease and phenotype information that characterizes an individual person, linking that individual to detailed phenotypic descriptions, genetic information, diagnoses, and treatments. A detailed example is presented that illustrates how to use the schema to represent the clinical course of a patient with retinoblastoma, including demographic information, the clinical diagnosis, phenotypic features and clinical measurements, an examination of the extirpated tumor, therapies, and the results of genomic analysis. The Phenopacket Schema, together with other GA4GH data and technical standards, will enable data exchange and provide a foundation for the computational analysis of disease and phenotype information to improve our ability to diagnose and conduct research on all types of disorders, including cancer and rare diseases.
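To give a sense of what a phenopacket contains, the sketch below builds a minimal one as a plain Python dictionary and serializes it to JSON. The field names follow my recollection of the Phenopacket Schema version 2 JSON representation and should be treated as assumptions to be checked against the schema documentation; the identifiers for the packet, subject, and curator are invented, and this is not the retinoblastoma example from the article.

```python
import json

# Minimal phenopacket-like structure (field names assumed from schema v2).
phenopacket = {
    "id": "example-packet-1",                      # hypothetical identifier
    "subject": {"id": "patient-1", "sex": "FEMALE"},
    "phenotypicFeatures": [
        {"type": {"id": "HP:0000486", "label": "Strabismus"}},
        # An explicitly excluded finding is flagged rather than omitted.
        {"type": {"id": "HP:0001250", "label": "Seizure"}, "excluded": True},
    ],
    "metaData": {
        "created": "2023-01-01T00:00:00Z",
        "createdBy": "example-curator",            # hypothetical
        "phenopacketSchemaVersion": "2.0",
    },
}


def observed_labels(packet: dict) -> list:
    """Labels of phenotypic features that were observed (not excluded)."""
    return [
        feature["type"]["label"]
        for feature in packet.get("phenotypicFeatures", [])
        if not feature.get("excluded", False)
    ]


as_json = json.dumps(phenopacket, indent=2)
labels = observed_labels(phenopacket)
```

A real phenopacket would normally also list the ontologies it uses under the metadata resources and should be validated with a schema-aware tool rather than by inspection.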

13.
Genome Med ; 14(1): 44, 2022 04 28.
Article in English | MEDLINE | ID: mdl-35484572

ABSTRACT

Structural variants (SVs) are implicated in the etiology of Mendelian diseases but have been systematically underascertained owing to sequencing technology limitations. Long-read sequencing enables comprehensive detection of SVs, but approaches for prioritization of candidate SVs are needed. Structural variant Annotation and analysis (SvAnna) assesses all classes of SVs and their intersection with transcripts and regulatory sequences, relating predicted effects on gene function with clinical phenotype data. SvAnna places 87% of deleterious SVs in the top ten ranks. The interpretable prioritizations offered by SvAnna will facilitate the widespread adoption of long-read sequencing in diagnostic genomics. SvAnna is available at https://github.com/TheJacksonLaboratory/SvAnna.


Subject(s)
Genomics, Base Sequence, Chromosome Mapping, Humans, DNA Sequence Analysis, Virulence
14.
Hum Mutat ; 43(8): 1071-1081, 2022 08.
Article in English | MEDLINE | ID: mdl-35391505

ABSTRACT

Rare disease diagnostics and disease gene discovery have been revolutionized by whole-exome and genome sequencing but identifying the causative variant(s) from the millions in each individual remains challenging. The use of deep phenotyping of patients and reference genotype-phenotype knowledge, alongside variant data such as allele frequency, segregation, and predicted pathogenicity, has proved an effective strategy to tackle this issue. Here we review the numerous tools that have been developed to automate this approach and demonstrate the power of such an approach on several thousand diagnosed cases from the 100,000 Genomes Project. Finally, we discuss the challenges that need to be overcome if we are going to improve detection rates and help the majority of patients that still remain without a molecular diagnosis after state-of-the-art genomic interpretation.


Subject(s)
Exome, Rare Diseases, Exome/genetics, Genomics, Humans, Phenotype, Rare Diseases/diagnosis, Rare Diseases/genetics, Exome Sequencing
16.
Am J Hum Genet ; 108(9): 1564-1577, 2021 09 02.
Article in English | MEDLINE | ID: mdl-34289339

ABSTRACT

A critical challenge in genetic diagnostics is the computational assessment of candidate splice variants, specifically the interpretation of nucleotide changes located outside of the highly conserved dinucleotide sequences at the 5' and 3' ends of introns. To address this gap, we developed the Super Quick Information-content Random-forest Learning of Splice variants (SQUIRLS) algorithm. SQUIRLS generates a small set of interpretable features for machine learning by calculating the information-content of wild-type and variant sequences of canonical and cryptic splice sites, assessing changes in candidate splicing regulatory sequences, and incorporating characteristics of the sequence such as exon length, disruptions of the AG exclusion zone, and conservation. We curated a comprehensive collection of disease-associated splice-altering variants at positions outside of the highly conserved AG/GT dinucleotides at the termini of introns. SQUIRLS trains two random-forest classifiers for the donor and for the acceptor and combines their outputs by logistic regression to yield a final score. We show that SQUIRLS transcends previous state-of-the-art accuracy in classifying splice variants as assessed by rank analysis in simulated exomes, and is significantly faster than competing methods. SQUIRLS provides tabular output files for incorporation into diagnostic pipelines for exome and genome analysis, as well as visualizations that contextualize predicted effects of variants on splicing to make it easier to interpret splice variants in diagnostic settings.


Subject(s)
Algorithms, Data Curation/methods, Inborn Genetic Diseases/genetics, RNA Splice Sites, RNA Splicing, Software, Base Sequence, Computational Biology/methods, Exome, Exons, Inborn Genetic Diseases/diagnosis, Inborn Genetic Diseases/pathology, High-Throughput Nucleotide Sequencing, Humans, Introns, Mutation, Exome Sequencing
17.
Int J Pediatr Otorhinolaryngol ; 140: 110499, 2021 Jan.
Article in English | MEDLINE | ID: mdl-33234331

ABSTRACT

Waardenburg syndrome (WS) is a clinically and genetically heterogeneous group of inherited disorders manifesting with sensorineural hearing loss and pigmentary anomalies. Here we present two Caucasian families with novel variants in EDNRB and SOX10 representing both ends of the phenotypic spectrum in WS. The c.521G>A variant in EDNRB identified in Family 1 leads to disruption of the cysteine disulfide bridge between extracellular segments of endothelin receptor type B and causes a relatively mild phenotype of WS type II with low penetrance. The novel nonsense variant c.900C>A in SOX10 detected in Family 2 leads to PCWH syndrome and was found to be lethal.


Subject(s)
Waardenburg Syndrome, Humans, Mutation, Phenotype, Endothelin B Receptor/genetics, SOXE Transcription Factors/genetics, Syndrome, Waardenburg Syndrome/genetics
18.
Nucleic Acids Res ; 49(D1): D1207-D1217, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33264411

ABSTRACT

The Human Phenotype Ontology (HPO, https://hpo.jax.org) was launched in 2008 to provide a comprehensive logical standard to describe and computationally analyze phenotypic abnormalities found in human disease. The HPO is now a worldwide standard for phenotype exchange. The HPO has grown steadily since its inception due to considerable contributions from clinical experts and researchers from a diverse range of disciplines. Here, we present recent major extensions of the HPO for neurology, nephrology, immunology, pulmonology, newborn screening, and other areas. For example, the seizure subontology now reflects the International League Against Epilepsy (ILAE) guidelines and these enhancements have already shown clinical validity. We present new efforts to harmonize computational definitions of phenotypic abnormalities across the HPO and multiple phenotype ontologies used for animal models of disease. These efforts will benefit software such as Exomiser by improving the accuracy and scope of cross-species phenotype matching. The computational modeling strategy used by the HPO to define disease entities and phenotypic features and distinguish between them is explained in detail. We also report on recent efforts to translate the HPO into indigenous languages. Finally, we summarize recent advances in the use of HPO in electronic health record systems.


Subject(s)
Biological Ontologies, Computational Biology/methods, Factual Databases, Disease/genetics, Genome, Phenotype, Software, Animals, Animal Disease Models, Genotype, Humans, Newborn Infant, International Cooperation, Internet, Neonatal Screening/methods, Pharmacogenetics/methods, Terminology as Topic
19.
Am J Hum Genet ; 107(3): 403-417, 2020 09 03.
Article in English | MEDLINE | ID: mdl-32755546

ABSTRACT

Human Phenotype Ontology (HPO)-based analysis has become standard for genomic diagnostics of rare diseases. Current algorithms use a variety of semantic and statistical approaches to prioritize the typically long lists of genes with candidate pathogenic variants. These algorithms do not provide robust estimates of the strength of the predictions beyond the placement in a ranked list, nor do they provide measures of how much any individual phenotypic observation has contributed to the prioritization result. However, given that the overall success rate of genomic diagnostics is only around 25%-50% or less in many cohorts, a good ranking cannot be taken to imply that the gene or disease at rank one is necessarily a good candidate. Here, we present an approach to genomic diagnostics that exploits the likelihood ratio (LR) framework to provide an estimate of (1) the posttest probability of candidate diagnoses, (2) the LR for each observed HPO phenotype, and (3) the predicted pathogenicity of observed genotypes. LIkelihood Ratio Interpretation of Clinical AbnormaLities (LIRICAL) placed the correct diagnosis within the first three ranks in 92.9% of 384 case reports comprising 262 Mendelian diseases, and the correct diagnosis had a mean posttest probability of 67.3%. Simulations show that LIRICAL is robust to many typically encountered forms of genomic and phenomic noise. In summary, LIRICAL provides accurate, clinically interpretable results for phenotype-driven genomic diagnostics.
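The likelihood ratio framework named in this abstract rests on standard arithmetic: posttest odds equal pretest odds multiplied by the likelihood ratio of each observation. The sketch below shows only that arithmetic, assuming the observations are treated as independent; the pretest probability and the LR values are hypothetical, and how LIRICAL derives its pretest probabilities and per-feature LRs is not described in the abstract.

```python
import math
from typing import Sequence


def posttest_probability(pretest_prob: float, likelihood_ratios: Sequence[float]) -> float:
    """Combine a pretest probability with likelihood ratios.

    posttest odds = pretest odds * product of LRs, computed in log space
    to avoid underflow or overflow when many LRs are multiplied.
    """
    if not 0.0 < pretest_prob < 1.0:
        raise ValueError("pretest_prob must be strictly between 0 and 1")
    if any(lr <= 0.0 for lr in likelihood_ratios):
        raise ValueError("likelihood ratios must be positive")
    log_odds = math.log(pretest_prob / (1.0 - pretest_prob))
    log_odds += sum(math.log(lr) for lr in likelihood_ratios)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical case: a rare disease (pretest 0.1%) with three supporting
# findings (LR > 1) and one finding that argues against it (LR < 1).
posttest = posttest_probability(0.001, [20.0, 15.0, 0.5, 30.0])
```

Because each observation enters as its own factor, the contribution of any single phenotype to the final probability can be read directly from its LR, which is the interpretability property the abstract emphasizes.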


Subject(s)
Computational Biology, Genetic Databases, Genomics, Rare Diseases/diagnosis, Algorithms, Exome/genetics, Humans, Phenotype, Rare Diseases/genetics, Software
20.
Genome Biol ; 21(1): 171, 2020 07 13.
Article in English | MEDLINE | ID: mdl-32660516

ABSTRACT

We present Hierarchical Bayesian Analysis of Differential Expression and ALternative Splicing (HBA-DEALS), which simultaneously characterizes differential expression and splicing in cohorts. HBA-DEALS attains state of the art or better performance for both expression and splicing and allows genes to be characterized as having differential gene expression, differential alternative splicing, both, or neither. HBA-DEALS analysis of GTEx data demonstrated sets of genes that show predominant DGE or DAST across multiple tissue types. These sets have pervasive differences with respect to gene structure, function, membership in protein complexes, and promoter architecture.


Subject(s)
Alternative Splicing, Gene Expression, Biological Models, RNA Sequence Analysis, Software, Bayes Theorem
...