Results 1 - 20 of 286
1.
Cell ; 169(1): 6-12, 2017 03 23.
Article in English | MEDLINE | ID: mdl-28340351

ABSTRACT

Genome sequencing has revolutionized the diagnosis of genetic diseases. Close collaborations between basic scientists and clinical genomicists are now needed to link genetic variants with disease causation. To facilitate such collaborations, we recommend prioritizing clinically relevant genes for functional studies, developing reference variant-phenotype databases, adopting phenotype description standards, and promoting data sharing.


Subject(s)
Biomedical Research , Genomics , Animals , DNA Mutational Analysis , Genetic Databases , Disease/genetics , Human Genome Project , Humans , Information Dissemination , Animal Models
2.
Nucleic Acids Res ; 52(D1): D938-D949, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-38000386

ABSTRACT

Bridging the gap between genetic variations, environmental determinants, and phenotypic outcomes is critical for supporting clinical diagnosis and understanding mechanisms of diseases. It requires integrating open data at a global scale. The Monarch Initiative advances these goals by developing open ontologies, semantic data models, and knowledge graphs for translational research. The Monarch App is an integrated platform combining data about genes, phenotypes, and diseases across species. Monarch's APIs enable access to carefully curated datasets and advanced analysis tools that support the understanding and diagnosis of disease for diverse applications such as variant prioritization, deep phenotyping, and patient profile-matching. We have migrated our system into a scalable, cloud-based infrastructure; simplified Monarch's data ingestion and knowledge graph integration systems; enhanced data mapping and integration standards; and developed a new user interface with novel search and graph navigation features. Furthermore, we advanced Monarch's analytic tools by developing a customized plugin for OpenAI's ChatGPT to increase the reliability of its responses about phenotypic data, allowing us to interrogate the knowledge in the Monarch graph using state-of-the-art Large Language Models. The resources of the Monarch Initiative can be found at monarchinitiative.org and its corresponding code repository at github.com/monarch-initiative/monarch-app.


Subject(s)
Factual Databases , Disease , Genes , Phenotype , Humans , Internet , Factual Databases/standards , Software , Genes/genetics , Disease/genetics
3.
Trends Genet ; 38(12): 1271-1283, 2022 12.
Article in English | MEDLINE | ID: mdl-35934592

ABSTRACT

A molecular diagnosis from the analysis of sequencing data in rare Mendelian diseases has a huge impact on the management of patients and their families. Numerous patient phenotype-aware variant prioritisation (VP) tools have been developed to help automate this process and shorten the diagnostic odyssey, but performance statistics on real patient data are limited. Here we identify, assess, and compare the performance of all up-to-date, freely available, and programmatically accessible tools using a whole-exome, retinal disease dataset from 134 individuals with a molecular diagnosis. All tools were able to identify around two-thirds of the genetic diagnoses as the top-ranked candidate, with LIRICAL performing best overall. Finally, we discuss the challenges that must be overcome, given that most cases remain undiagnosed after current, state-of-the-art practices.


Subject(s)
Exome , Rare Diseases , Humans , Phenotype , Exome Sequencing , Rare Diseases/diagnosis , Rare Diseases/genetics
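
The benchmark in the abstract above reports how often the diagnosed gene is the top-ranked candidate. A minimal sketch of that kind of top-k metric follows. The case IDs, gene lists, and the `top_k_rate` helper are hypothetical illustrations, not data or code from the paper.

```python
from typing import Dict, List


def top_k_rate(rankings: Dict[str, List[str]], truth: Dict[str, str], k: int = 1) -> float:
    """Fraction of diagnosed cases whose causal gene is among the first k candidates."""
    if not truth:
        raise ValueError("no diagnosed cases supplied")
    hits = 0
    for case_id, causal_gene in truth.items():
        # A case with no ranking output counts as a miss.
        ranked = rankings.get(case_id, [])
        if causal_gene in ranked[:k]:
            hits += 1
    return hits / len(truth)


# Hypothetical example: ranked candidate genes per case and the known diagnosis.
rankings = {
    "case1": ["ABCA4", "USH2A", "RPGR"],
    "case2": ["USH2A", "RPGR", "ABCA4"],
    "case3": ["CRB1", "RPE65"],
}
truth = {"case1": "ABCA4", "case2": "RPGR", "case3": "RPE65"}

top1 = top_k_rate(rankings, truth, k=1)
top3 = top_k_rate(rankings, truth, k=3)
```

Reporting the rate at several values of k shows whether a tool that misses at rank 1 still places the diagnosis near the top of the list.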
4.
Bioinformatics ; 2024 Jun 24.
Article in English | MEDLINE | ID: mdl-38913850

ABSTRACT

MOTIVATION: Human Phenotype Ontology (HPO)-based phenotype concept recognition underpins a faster and more effective mechanism to create patient phenotype profiles or to document novel phenotype-centred knowledge statements. While the increasing adoption of large language models (LLMs) for natural language understanding has led to several LLM-based solutions, we argue that their intrinsic resource-intensive nature is not suitable for realistic management of the phenotype concept recognition (CR) lifecycle. Consequently, we propose to go back to the basics and adopt a dictionary-based approach that enables both an immediate refresh of the ontological concepts and efficient re-analysis of past data. RESULTS: We developed a dictionary-based approach that uses a large pre-built collection of clusters of morphologically equivalent tokens to address lexical variability, together with a more effective concept recognition step that restricts entity boundary detection strictly to candidates consisting of tokens belonging to ontology concepts. Our method achieves state-of-the-art results (0.76 F1 on the GSC+ corpus) and a processing efficiency of 10,000 publication abstracts in 5 s. AVAILABILITY: FastHPOCR is available as a Python package installable via pip. The source code is available at https://github.com/tudorgroza/fast_hpo_cr. A Java implementation of FastHPOCR will be made available as part of the Fenominal Java library available at https://github.com/monarch-initiative/fenominal. The up-to-date GSC-2024 corpus is available at https://github.com/tudorgroza/code-for-papers/tree/main/gsc-2024.
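
The dictionary-based idea in this abstract can be sketched in a few lines. This is not the FastHPOCR implementation: the three-term dictionary is a toy subset, `normalise` is a crude stand-in for the pre-built clusters of morphologically equivalent tokens, and all function names are hypothetical.

```python
import re
from typing import Dict, FrozenSet, List, Tuple

# Toy dictionary (illustrative subset; a real run would load the whole ontology).
HPO_LABELS = {
    "HP:0001250": "Seizure",
    "HP:0001252": "Hypotonia",
    "HP:0001263": "Global developmental delay",
}


def normalise(token: str) -> str:
    """Assumed stand-in for a token cluster: lowercase and strip a plural 's'."""
    token = token.lower()
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        token = token[:-1]
    return token


def tokenise(text: str) -> List[str]:
    return [normalise(t) for t in re.findall(r"[A-Za-z]+", text)]


def build_index(labels: Dict[str, str]) -> Dict[FrozenSet[str], str]:
    """Map each label's set of normalised tokens (order-insensitive) to its ID."""
    return {frozenset(tokenise(label)): hpo_id for hpo_id, label in labels.items()}


def recognise(text: str, index: Dict[FrozenSet[str], str]) -> List[Tuple[str, str]]:
    vocabulary = set().union(*index.keys())
    tokens = tokenise(text)
    found = []
    i = 0
    while i < len(tokens):
        if tokens[i] not in vocabulary:
            i += 1
            continue
        # Candidate span: a maximal run of tokens that all occur in some concept.
        j = i
        while j < len(tokens) and tokens[j] in vocabulary:
            j += 1
        # Try the longest sub-span starting at i first, then shorter ones.
        for end in range(j, i, -1):
            key = frozenset(tokens[i:end])
            if key in index:
                found.append((index[key], " ".join(tokens[i:end])))
                i = end
                break
        else:
            i += 1
    return found


index = build_index(HPO_LABELS)
mentions = recognise(
    "Recurrent seizures, hypotonia and global developmental delay were noted.", index
)
```

Because the index is built directly from the label dictionary, refreshing the ontology only requires rebuilding `index`, which is the property the abstract emphasises.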

5.
Bioinformatics ; 40(3)2024 Mar 04.
Article in English | MEDLINE | ID: mdl-38383067

ABSTRACT

MOTIVATION: Creating knowledge bases and ontologies is a time consuming task that relies on manual curation. AI/NLP approaches can assist expert curators in populating these knowledge bases, but current approaches rely on extensive training data, and are not able to populate arbitrarily complex nested knowledge schemas. RESULTS: Here we present Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES), a Knowledge Extraction approach that relies on the ability of Large Language Models (LLMs) to perform zero-shot learning and general-purpose query answering from flexible prompts and return information conforming to a specified schema. Given a detailed, user-defined knowledge schema and an input text, SPIRES recursively performs prompt interrogation against an LLM to obtain a set of responses matching the provided schema. SPIRES uses existing ontologies and vocabularies to provide identifiers for matched elements. We present examples of applying SPIRES in different domains, including extraction of food recipes, multi-species cellular signaling pathways, disease treatments, multi-step drug mechanisms, and chemical to disease relationships. Current SPIRES accuracy is comparable to the mid-range of existing Relation Extraction methods, but greatly surpasses an LLM's native capability of grounding entities with unique identifiers. SPIRES has the advantage of easy customization, flexibility, and, crucially, the ability to perform new tasks in the absence of any new training data. This method supports a general strategy of leveraging the language interpreting capabilities of LLMs to assemble knowledge bases, assisting manual knowledge curation and acquisition while supporting validation with publicly-available databases and ontologies external to the LLM. AVAILABILITY AND IMPLEMENTATION: SPIRES is available as part of the open source OntoGPT package: https://github.com/monarch-initiative/ontogpt.


Subject(s)
Knowledge Bases , Semantics , Factual Databases
6.
Blood ; 142(24): 2055-2068, 2023 12 14.
Article in English | MEDLINE | ID: mdl-37647632

ABSTRACT

Rare genetic diseases affect millions, and identifying causal DNA variants is essential for patient care. Therefore, it is imperative to estimate the effect of each independent variant and improve their pathogenicity classification. Our study of 140 214 unrelated UK Biobank (UKB) participants found that each of them carries a median of 7 variants previously reported as pathogenic or likely pathogenic. We focused on 967 diagnostic-grade gene (DGG) variants for rare bleeding, thrombotic, and platelet disorders (BTPDs) observed in 12 367 UKB participants. By association analysis, for a subset of these variants, we estimated effect sizes for platelet count and volume, and odds ratios for bleeding and thrombosis. Variants causal of some autosomal recessive platelet disorders revealed phenotypic consequences in carriers. Loss-of-function variants in MPL, which cause chronic amegakaryocytic thrombocytopenia if biallelic, were unexpectedly associated with increased platelet counts in carriers. We also demonstrated that common variants identified by genome-wide association studies (GWAS) for platelet count or thrombosis risk may influence the penetrance of rare variants in BTPD DGGs on their associated hemostasis disorders. Network-propagation analysis applied to an interactome of 18 410 nodes and 571 917 edges showed that GWAS variants with large effect sizes are enriched in DGGs and their first-order interactors. Finally, we illustrate the modifying effect of polygenic scores for platelet count and thrombosis risk on disease severity in participants carrying rare variants in TUBB1 or PROC and PROS1, respectively. Our findings demonstrate the power of association analyses using large population datasets in improving pathogenicity classifications of rare variants.


Subject(s)
Genome-Wide Association Study , Thrombosis , Humans , Biological Specimen Banks , Hemostasis , Hemorrhage/genetics , Rare Diseases
7.
Am J Hum Genet ; 108(9): 1564-1577, 2021 09 02.
Article in English | MEDLINE | ID: mdl-34289339

ABSTRACT

A critical challenge in genetic diagnostics is the computational assessment of candidate splice variants, specifically the interpretation of nucleotide changes located outside of the highly conserved dinucleotide sequences at the 5' and 3' ends of introns. To address this gap, we developed the Super Quick Information-content Random-forest Learning of Splice variants (SQUIRLS) algorithm. SQUIRLS generates a small set of interpretable features for machine learning by calculating the information-content of wild-type and variant sequences of canonical and cryptic splice sites, assessing changes in candidate splicing regulatory sequences, and incorporating characteristics of the sequence such as exon length, disruptions of the AG exclusion zone, and conservation. We curated a comprehensive collection of disease-associated splice-altering variants at positions outside of the highly conserved AG/GT dinucleotides at the termini of introns. SQUIRLS trains two random-forest classifiers for the donor and for the acceptor and combines their outputs by logistic regression to yield a final score. We show that SQUIRLS transcends previous state-of-the-art accuracy in classifying splice variants as assessed by rank analysis in simulated exomes, and is significantly faster than competing methods. SQUIRLS provides tabular output files for incorporation into diagnostic pipelines for exome and genome analysis, as well as visualizations that contextualize predicted effects of variants on splicing to make it easier to interpret splice variants in diagnostic settings.


Subject(s)
Algorithms , Data Curation/methods , Inborn Genetic Diseases/genetics , RNA Splice Sites , RNA Splicing , Software , Base Sequence , Computational Biology/methods , Exome , Exons , Inborn Genetic Diseases/diagnosis , Inborn Genetic Diseases/pathology , High-Throughput Nucleotide Sequencing , Humans , Introns , Mutation , Exome Sequencing
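
The SQUIRLS abstract above mentions features derived from the information content of wild-type and variant splice-site sequences. One common way to compute such a quantity is the individual information of a site, R_i = Σ (2 + log2 f(b, l)) bits summed over positions l. The sketch below assumes that formulation and a made-up three-position frequency matrix; it does not reproduce the actual SQUIRLS features or models.

```python
import math
from typing import Dict, List

# Hypothetical base frequencies for a three-position site (each column sums to 1).
# All frequencies are non-zero so that the logarithm is defined.
FREQS: List[Dict[str, float]] = [
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.60, "C": 0.05, "G": 0.30, "T": 0.05},
]


def individual_information(site: str, freqs: List[Dict[str, float]] = FREQS) -> float:
    """Individual information (bits) of one site under the frequency matrix."""
    if len(site) != len(freqs):
        raise ValueError("site length must equal the number of matrix positions")
    return sum(2.0 + math.log2(col[base]) for col, base in zip(freqs, site.upper()))


wild_type = "GTA"
variant = "GCA"  # hypothetical T>C change at the second position
delta_ri = individual_information(variant) - individual_information(wild_type)
```

A negative `delta_ri` indicates that the variant sequence matches the site model less well than the wild type; a feature of this kind could then be passed to a classifier.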
8.
Brief Bioinform ; 23(5)2022 09 20.
Article in English | MEDLINE | ID: mdl-35595299

ABSTRACT

Yuan et al. recently described an independent evaluation of several phenotype-driven gene prioritization methods for Mendelian disease on two separate clinical datasets. Although they attempted to use default settings for each tool, we describe three key differences from the settings we currently recommend for our Exomiser and PhenIX tools. These influence how variant frequency, quality and predicted pathogenicity are used for filtering and prioritization. We propose that these differences account for much of the discrepancy between the performance they reported (15-26% of diagnoses ranked top by Exomiser) and previously published reports by us and others (72-77%). On a set of 161 singleton samples, we show that using these settings increases performance from 34% to 72%, and we suggest that a reassessment of Exomiser and PhenIX on their datasets using these settings would show a similar uplift.


Subject(s)
Inborn Genetic Diseases , Phenotype , Computational Biology , Humans
9.
Bioinformatics ; 39(12)2023 12 01.
Article in English | MEDLINE | ID: mdl-38001031

ABSTRACT

MOTIVATION: Methods for concept recognition (CR) in clinical texts have largely been tested on abstracts or articles from the medical literature. However, texts from electronic health records (EHRs) frequently contain spelling errors, abbreviations, and other nonstandard ways of representing clinical concepts. RESULTS: Here, we present a method inspired by the BLAST algorithm for biosequence alignment that screens texts for potential matches on the basis of matching k-mer counts and scores candidates based on conformance to typical patterns of spelling errors derived from 2.9 million clinical notes. Our method, the Term-BLAST-like alignment tool (TBLAT), leverages a gold standard corpus for typographical errors to implement a sequence alignment-inspired method for efficient entity linkage. We present a comprehensive experimental comparison of TBLAT with five widely used tools. Experimental results show a 10% increase in recall on scientific publications and a 20% increase in recall on EHRs (when compared against the next best method), hence supporting a significant enhancement of the entity linking task. The method can be used stand-alone or as a complement to existing approaches. AVAILABILITY AND IMPLEMENTATION: Fenominal is a Java library that implements TBLAT for named CR of Human Phenotype Ontology terms and is available at https://github.com/monarch-initiative/fenominal under the GNU General Public License v3.0.


Subject(s)
Algorithms , Language , Humans , Sequence Alignment , Electronic Health Records , Publications
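
The screening step described in the TBLAT abstract, selecting candidate terms by matching k-mer counts, can be illustrated roughly as follows. This sketch covers only a simplified screen: the choice of k = 3, the 0.5 threshold, and the function names are assumptions, and the scoring against learned spelling-error patterns is not modelled.

```python
from collections import Counter
from typing import List, Tuple


def kmers(text: str, k: int = 3) -> Counter:
    """Multiset of overlapping character k-mers of the lowercased text."""
    s = text.lower()
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def shared_kmer_fraction(query: str, term: str, k: int = 3) -> float:
    """Fraction of the term's k-mers (with multiplicity) also found in the query."""
    term_kmers = kmers(term, k)
    total = sum(term_kmers.values())
    if total == 0:
        return 0.0
    # Counter intersection keeps the minimum count of each shared k-mer.
    shared = sum((term_kmers & kmers(query, k)).values())
    return shared / total


def screen(query: str, terms: List[str], k: int = 3,
           min_fraction: float = 0.5) -> List[Tuple[str, float]]:
    """Return terms passing the k-mer screen, best match first."""
    scored = [(t, shared_kmer_fraction(query, t, k)) for t in terms]
    hits = [(t, f) for t, f in scored if f >= min_fraction]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical clinical-note fragment with two misspelled terms.
terms = ["seizure", "hypotonia", "scoliosis"]
candidates = screen("pt with seizurs and hypotnia", terms)
```

Misspelled mentions still share most k-mers with the intended term, so they survive the screen, whereas unrelated terms such as "scoliosis" are discarded before any more expensive scoring.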
10.
Bioinformatics ; 39(4)2023 04 03.
Article in English | MEDLINE | ID: mdl-36929917

ABSTRACT

MOTIVATION: Advances in RNA sequencing technologies have achieved an unprecedented accuracy in the quantification of mRNA isoforms, but our knowledge of isoform-specific functions has lagged behind. There is a need to understand the functional consequences of differential splicing, which could be supported by the generation of accurate and comprehensive isoform-specific gene ontology annotations. RESULTS: We present isoform interpretation, a method that uses expectation-maximization to infer isoform-specific functions based on the relationship between sequence and functional isoform similarity. We predicted isoform-specific functional annotations for 85 617 isoforms of 17 900 protein-coding human genes spanning a range of 17 430 distinct gene ontology terms. Comparison with a gold-standard corpus of manually annotated human isoform functions showed that isoform interpretation significantly outperforms state-of-the-art competing methods. We provide experimental evidence that functionally related isoforms predicted by isoform interpretation show a higher degree of domain sharing and expression correlation than functionally related genes. We also show that isoform sequence similarity correlates better with inferred isoform function than with gene-level function. AVAILABILITY AND IMPLEMENTATION: Source code, documentation, and resource files are freely available under a GNU3 license at https://github.com/TheJacksonLaboratory/isopretEM and https://zenodo.org/record/7594321.


Subject(s)
Motivation , Software , Humans , Protein Isoforms/genetics , Alternative Splicing , RNA Sequence Analysis
11.
Bioinformatics ; 39(7)2023 07 01.
Article in English | MEDLINE | ID: mdl-37389415

ABSTRACT

MOTIVATION: Knowledge graphs (KGs) are a powerful approach for integrating heterogeneous data and making inferences in biology and many other domains, but a coherent solution for constructing, exchanging, and facilitating the downstream use of KGs is lacking. RESULTS: Here we present KG-Hub, a platform that enables standardized construction, exchange, and reuse of KGs. Features include a simple, modular extract-transform-load pattern for producing graphs compliant with Biolink Model (a high-level data model for standardizing biological data), easy integration of any OBO (Open Biological and Biomedical Ontologies) ontology, cached downloads of upstream data sources, versioned and automatically updated builds with stable URLs, web-browsable storage of KG artifacts on cloud infrastructure, and easy reuse of transformed subgraphs across projects. Current KG-Hub projects span use cases including COVID-19 research, drug repurposing, microbial-environmental interactions, and rare disease research. KG-Hub is equipped with tooling to easily analyze and manipulate KGs. KG-Hub is also tightly integrated with graph machine learning (ML) tools which allow automated graph ML, including node embeddings and training of models for link prediction and node classification. AVAILABILITY AND IMPLEMENTATION: https://kghub.org.


Subject(s)
Biological Ontologies , COVID-19 , Humans , Automated Pattern Recognition , Rare Diseases , Machine Learning
12.
Genet Med ; 26(7): 101141, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38629401

ABSTRACT

PURPOSE: Existing resources that characterize the essentiality status of genes are based on either proliferation assessment in human cell lines, viability evaluation in mouse knockouts, or constraint metrics derived from human population sequencing studies. Several repositories document phenotypic annotations for rare disorders; however, there is a lack of comprehensive reporting on lethal phenotypes. METHODS: We queried Online Mendelian Inheritance in Man for terms related to lethality and classified all Mendelian genes according to the earliest age of death recorded for the associated disorders, from prenatal death to no reports of premature death. We characterized the genes across these lethality categories, examined the evidence on viability from mouse models and explored how this information could be used for novel gene discovery. RESULTS: We developed the Lethal Phenotypes Portal to showcase this curated catalog of human essential genes. Differences in the mode of inheritance, physiological systems affected, and disease class were found for genes in different lethality categories, as well as discrepancies between the lethal phenotypes observed in mouse and human. CONCLUSION: We anticipate that this resource will aid clinicians in the diagnosis of early lethal conditions and assist researchers in investigating the properties that make these genes essential for human development.

13.
Genet Med ; 26(2): 101029, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37982373

ABSTRACT

PURPOSE: The terminology used for gene-disease curation and variant annotation to describe inheritance, allelic requirement, and both sequence and functional consequences of a variant is currently not standardized. There is considerable discrepancy in the literature and across clinical variant reporting in the derivation and application of terms. Here, we standardize the terminology for the characterization of disease-gene relationships to facilitate harmonized global curation and to support variant classification within the ACMG/AMP framework. METHODS: Terminology for inheritance, allelic requirement, and both structural and functional consequences of a variant used by Gene Curation Coalition members and partner organizations was collated and reviewed. Harmonized terminology with definitions and use examples was created, reviewed, and validated. RESULTS: We present a standardized terminology to describe gene-disease relationships, and to support variant annotation. We demonstrate application of the terminology for classification of variation in the ACMG SF 2.0 genes recommended for reporting of secondary findings. Consensus terms were agreed and formalized in both Sequence Ontology (SO) and Human Phenotype Ontology (HPO) ontologies. Gene Curation Coalition member groups intend to use or map to these terms in their respective resources. CONCLUSION: The terminology standardization presented here will improve harmonization, facilitate the pooling of curation datasets across international curation efforts and, in turn, improve consistency in variant classification and genetic test interpretation.


Subject(s)
Genetic Testing , Genetic Variation , Humans , Alleles , Genetic Databases
14.
Prenat Diagn ; 44(4): 454-464, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38242839

ABSTRACT

Advances in sequencing and imaging technologies enable enhanced assessment in the prenatal space, with a goal to diagnose and predict the natural history of disease, to direct targeted therapies, and to implement clinical management, including transfer of care, election of supportive care, and selection of surgical interventions. The current lack of standardization and aggregation stymies variant interpretation and gene discovery, which hinders the provision of prenatal precision medicine, leaving clinicians and patients without an accurate diagnosis. With large amounts of data generated, it is imperative to establish standards for data collection, processing, and aggregation. Aggregated and homogeneously processed genetic and phenotypic data permits dissection of the genomic architecture of prenatal presentations of disease and provides a dataset on which data analysis algorithms can be tuned to the prenatal space. Here we discuss the importance of generating aggregate data sets and how the prenatal space is driving the development of interoperable standards and phenotype-driven tools.


Subject(s)
Precision Medicine , Prenatal Diagnosis , Pregnancy , Female , Humans , Phenotype , Genomics , Algorithms
15.
Nucleic Acids Res ; 50(W1): W322-W329, 2022 07 05.
Article in English | MEDLINE | ID: mdl-35639768

ABSTRACT

While great advances in predicting the effects of coding variants have been made, the assessment of non-coding variants remains challenging. This is especially problematic for variants within promoter regions which can lead to over-expression of a gene or reduce or even abolish its expression. The binding of transcription factors to the DNA can be predicted using position weight matrices (PWMs). More recently, transcription factor flexible models (TFFMs) have been introduced and shown to be more accurate than PWMs. TFFMs are based on hidden Markov models and can account for complex positional dependencies. Our new web-based application FABIAN-variant uses 1224 TFFMs and 3790 PWMs to predict whether and to which degree DNA variants affect the binding of 1387 different human transcription factors. For each variant and transcription factor, the software combines the results of different models for a final prediction of the resulting binding-affinity change. The software is written in C++ for speed but variants can be entered through a web interface. Alternatively, a VCF file can be uploaded to assess variants identified by high-throughput sequencing. The search can be restricted to variants in the vicinity of candidate genes. FABIAN-variant is available freely at https://www.genecascade.org/fabian/.


Subject(s)
DNA-Binding Proteins , DNA , Genetic Variation , Software , Transcription Factors , Humans , Binding Sites/genetics , DNA/genetics , DNA/metabolism , Position-Specific Scoring Matrices , Protein Binding , Transcription Factors/genetics , Transcription Factors/metabolism , Genetic Variation/genetics , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism , Internet , Programming Languages
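
The abstract above notes that transcription factor binding can be predicted with position weight matrices (PWMs). As a rough illustration of how a variant's effect might be assessed with a PWM, the sketch below scores the reference and alternate sequences and takes the difference. The count matrix, pseudocount, uniform background, and function names are all assumptions; this is not how FABIAN-variant combines its models.

```python
import math
from typing import Dict, List

BASES = "ACGT"

# Hypothetical count matrix for a 4-bp motif (one dict per motif position).
COUNTS: List[Dict[str, int]] = [
    {"A": 8, "C": 1, "G": 0, "T": 1},
    {"A": 0, "C": 9, "G": 1, "T": 0},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
]


def to_pwm(counts: List[Dict[str, int]], pseudocount: float = 1.0,
           background: float = 0.25) -> List[Dict[str, float]]:
    """Convert counts to log2-odds weights against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + pseudocount * len(BASES)
        pwm.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def score(pwm: List[Dict[str, float]], site: str) -> float:
    """Sum of per-position weights for a site of exactly motif width."""
    if len(site) != len(pwm):
        raise ValueError("site length must equal motif width")
    return sum(col[base] for col, base in zip(pwm, site.upper()))


def best_window_score(pwm: List[Dict[str, float]], seq: str) -> float:
    """Highest PWM score over all windows of motif width in the sequence."""
    width = len(pwm)
    if len(seq) < width:
        raise ValueError("sequence shorter than motif")
    return max(score(pwm, seq[i:i + width]) for i in range(len(seq) - width + 1))


pwm = to_pwm(COUNTS)
ref = "TTACGTAA"
alt = "TTACATAA"  # hypothetical G>A substitution inside the motif
delta = best_window_score(pwm, alt) - best_window_score(pwm, ref)
```

A negative `delta` suggests weaker predicted binding for the alternate allele, a positive one stronger binding. Scanning all windows, rather than only the one overlapping the variant, is a simplification.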
16.
Nucleic Acids Res ; 50(W1): W677-W681, 2022 07 05.
Article in English | MEDLINE | ID: mdl-35524573

ABSTRACT

Precision medicine needs precise phenotypes. The Human Phenotype Ontology (HPO) uses clinical signs instead of diagnoses and has become the standard annotation for patients' phenotypes when describing single gene disorders. Use of the HPO beyond human genetics is, however, still limited. With SAMS (Symptom Annotation Made Simple), we want to bring sign-based phenotyping to routine clinical care, to hospital patients as well as to outpatients. Our web-based application provides access to three widely used annotation systems: HPO, OMIM, and Orphanet. Whilst data can be stored in our database, phenotypes can also be imported and exported as Global Alliance for Genomics and Health (GA4GH) Phenopackets without using the database. The web interface can easily be integrated into local databases, e.g. clinical information systems. SAMS allows users to share their data with others, empowering patients to record their own signs and symptoms (or those of their children) and thus provide their doctors with additional information. We think that our approach will lead to better characterised patients, which is helpful not only for finding disease mutations but also for better understanding the pathophysiology of diseases and for recruiting patients for studies and clinical trials. SAMS is freely available at https://www.genecascade.org/SAMS/.


Subject(s)
Genetic Databases , Software , Child , Humans , Genomics , Phenotype , Mutation
17.
Proc Natl Acad Sci U S A ; 118(2)2021 01 12.
Article in English | MEDLINE | ID: mdl-33402532

ABSTRACT

Pathogenic germline mutations in PIGV lead to glycosylphosphatidylinositol biosynthesis deficiency (GPIBD). Individuals with pathogenic biallelic mutations in genes of the glycosylphosphatidylinositol (GPI)-anchor pathway exhibit cognitive impairments, motor delay, and often epilepsy. Thus far, the pathophysiology underlying the disease remains unclear, and suitable rodent models that mirror all symptoms observed in human patients have not been available. Therefore, we used CRISPR-Cas9 to introduce the most prevalent hypomorphic missense mutation in European patients, Pigv:c.1022C > A (p.A341E), at a site that is conserved in mice. Mirroring the human pathology, mutant Pigv341E mice exhibited deficits in motor coordination, cognitive impairments, and alterations in sociability and sleep patterns, as well as increased seizure susceptibility. Furthermore, immunohistochemistry revealed reduced synaptophysin immunoreactivity in Pigv341E mice, and electrophysiology recordings showed decreased hippocampal synaptic transmission that could underlie impaired memory formation. In single-cell RNA sequencing, Pigv341E-hippocampal cells exhibited changes in gene expression, most prominently in a subtype of microglia and subicular neurons. A significant reduction in Abl1 transcript levels in several cell clusters suggested a link to the signaling pathway of GPI-anchored ephrins. We also observed elevated levels of Hdc transcripts, which might affect histamine metabolism with consequences for circadian rhythm. This mouse model will not only open the doors to further investigation into the pathophysiology of GPIBD, but will also deepen our understanding of the role of GPI-anchor-related pathways in brain development.


Subject(s)
Glycosylphosphatidylinositols/genetics , Glycosylphosphatidylinositols/metabolism , Mannosyltransferases/metabolism , Multiple Abnormalities/genetics , Amino Acid Sequence , Amino Acids/genetics , Animals , CRISPR-Cas Systems , Animal Disease Models , Epilepsy/genetics , Glycosylphosphatidylinositols/deficiency , Hippocampus/metabolism , Intellectual Disability/genetics , Mannosyltransferases/physiology , Mice , Inbred C57BL Mice , Mutation , Missense Mutation , Phenotype , Protein Engineering/methods , Seizures/genetics , Seizures/physiopathology
18.
BMC Med Inform Decis Mak ; 24(1): 30, 2024 Jan 31.
Article in English | MEDLINE | ID: mdl-38297371

ABSTRACT

OBJECTIVE: Clinical deep phenotyping and phenotype annotation play a critical role in both the diagnosis of patients with rare disorders as well as in building computationally-tractable knowledge in the rare disorders field. These processes rely on using ontology concepts, often from the Human Phenotype Ontology, in conjunction with a phenotype concept recognition task (supported usually by machine learning methods) to curate patient profiles or existing scientific literature. With the significant shift in the use of large language models (LLMs) for most NLP tasks, we examine the performance of the latest Generative Pre-trained Transformer (GPT) models underpinning ChatGPT as a foundation for the tasks of clinical phenotyping and phenotype annotation. MATERIALS AND METHODS: The experimental setup of the study included seven prompts of various levels of specificity, two GPT models (gpt-3.5-turbo and gpt-4.0) and two established gold standard corpora for phenotype recognition, one consisting of publication abstracts and the other clinical observations. RESULTS: The best run, using in-context learning, achieved 0.58 document-level F1 score on publication abstracts and 0.75 document-level F1 score on clinical observations, as well as a mention-level F1 score of 0.7, which surpasses the current best in class tool. Without in-context learning, however, performance is significantly below the existing approaches. CONCLUSION: Our experiments show that gpt-4.0 surpasses the state of the art performance if the task is constrained to a subset of the target ontology where there is prior knowledge of the terms that are expected to be matched. While the results are promising, the non-deterministic nature of the outcomes, the high cost and the lack of concordance between different runs using the same prompt and input make the use of these LLMs challenging for this particular task.


Subject(s)
Knowledge , Language , Humans , Machine Learning , Phenotype , Rare Diseases
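
The results in the abstract above are reported as document-level F1 scores. A minimal sketch of one way to compute such a score compares the set of concept IDs predicted for each document with the gold set and micro-averages across documents. The micro-averaging choice, the example annotations, and the function name are assumptions; the study's exact evaluation protocol is not described here.

```python
from typing import Dict, Set, Tuple


def document_level_f1(gold: Dict[str, Set[str]],
                      predicted: Dict[str, Set[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over per-document concept sets."""
    tp = fp = fn = 0
    for doc_id in set(gold) | set(predicted):
        g = gold.get(doc_id, set())
        p = predicted.get(doc_id, set())
        tp += len(g & p)   # concepts found and expected
        fp += len(p - g)   # concepts found but not expected
        fn += len(g - p)   # concepts expected but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical gold and predicted annotations for two documents.
gold = {"doc1": {"HP:0001250", "HP:0001252"}, "doc2": {"HP:0001263"}}
predicted = {"doc1": {"HP:0001250"}, "doc2": {"HP:0001263", "HP:0001252"}}

precision, recall, f1 = document_level_f1(gold, predicted)
```

Because each concept counts once per document, repeated mentions of the same term do not inflate the score, which is what distinguishes this from a mention-level F1.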
19.
J Cell Mol Med ; 27(4): 496-505, 2023 02.
Article in English | MEDLINE | ID: mdl-36691971

ABSTRACT

We describe a 3.5-year-old Iranian female child and her affected 10-month-old brother with a maternally inherited derivative chromosome 9 [der(9)]. The postnatally detected rearrangement was finely characterized by aCGH analysis, which revealed a 15.056 Mb deletion of 9p22.3-p24.3 encompassing 14 OMIM morbid genes such as DOCK8, KANK1, DMRT1 and SMARCA2, and a gain of 3.309 Mb on 18p11.31-p11.32 encompassing USP14, THOC1, COLEC12, SMCHD1 and LPIN2. We aligned the genes affected by the detected CNVs to clinical and functional phenotypic features using PhenogramViz, into which the patient's phenotype and CNV data were entered. For the 9p deletion CNV, 53 affected genes were identified and 17 of them were matched to 24 HPO terms describing the patient's phenotypes. For the 18p duplication CNV, 22 affected genes were identified and six of them were matched to 13 phenotypes. Moreover, we used DECIPHER for in-depth characterization of the genes involved in the detected CNVs and to compare the patient phenotypes with 9p and 18p genomic imbalances. Based on our filtration strategy, approximately 80 pathogenic/likely pathogenic/uncertain overlapping CNVs in the 9p22.3-p24.3 region were found in DECIPHER. The size of these CNVs ranged from 12.01 kb to 18.45 Mb, and 52 CNVs were smaller than 1 Mb in size, affecting 10 OMIM morbid genes. The 18p11.31-p11.32 region overlapped 19 CNVs in the DECIPHER database, with sizes ranging from 23.42 kb to 1.82 Mb. These CNVs affect eight haploinsufficient genes.


Subject(s)
Chromosome Deletion , Cytoskeletal Proteins , Male , Female , Humans , Iran , Comparative Genomic Hybridization , Phenotype , Signal Transducing Adaptor Proteins , Ubiquitin Thiolesterase , Guanine Nucleotide Exchange Factors , Non-Histone Chromosomal Proteins
20.
Am J Med Genet C Semin Med Genet ; 193(3): e32056, 2023 09.
Article in English | MEDLINE | ID: mdl-37654076

ABSTRACT

Heterozygous ARID1B variants result in Coffin-Siris syndrome. Features may include hypoplastic nails, slow growth, characteristic facial features, hypotonia, hypertrichosis, and sparse scalp hair. Most reported cases are due to ARID1B loss of function variants. We report a boy with developmental delay, feeding difficulties, aspiration, recurrent respiratory infections, slow growth, and hypotonia without a clinical diagnosis, where a previously unreported ARID1B missense variant was classified as a variant of uncertain significance. The pathogenicity of this variant was refined through combined methodologies including genome-wide methylation signature analysis (EpiSign), Machine Learning (ML) facial phenotyping, and LIRICAL. Trio exome sequencing and EpiSign were performed. ML facial phenotyping compared facial images using FaceMatch and GestaltMatcher to syndrome-specific libraries to prioritize the trio exome bioinformatic pipeline gene list output. Phenotype-driven variant prioritization was performed with LIRICAL. A de novo heterozygous missense variant, ARID1B p.(Tyr1268His), was reported as a variant of uncertain significance. The ACMG classification was refined to likely pathogenic by a supportive methylation signature, ML facial phenotyping, and prioritization through LIRICAL. The ARID1B genotype-phenotype has been expanded through an extended analysis of missense variation through genome-wide methylation signatures, ML facial phenotyping, and likelihood-ratio gene prioritization.


Subject(s)
Multiple Abnormalities , Congenital Hand Deformities , Intellectual Disability , Micrognathism , Male , Humans , DNA-Binding Proteins/genetics , Muscle Hypotonia/pathology , Transcription Factors/genetics , Face/pathology , Multiple Abnormalities/diagnosis , Micrognathism/genetics , Intellectual Disability/pathology , Congenital Hand Deformities/genetics , Neck/pathology