1.
Am J Hum Genet ; 109(9): 1591-1604, 2022 09 01.
Article in English | MEDLINE | ID: mdl-35998640

ABSTRACT

Diagnosis for rare genetic diseases often relies on phenotype-driven methods, which hinge on the accuracy and completeness of the rare disease phenotypes in the underlying annotation knowledgebase. Existing knowledgebases are often manually curated with additional annotations found in published case reports. Despite their potential, real-world data such as electronic health records (EHRs) have not been fully exploited to derive rare disease annotations. Here, we present open annotation for rare diseases (OARD), a real-world-data-derived resource with annotation for rare-disease-related phenotypes. This resource is derived from the EHRs of two academic health institutions containing more than 10 million individuals spanning wide age ranges and different disease subgroups. By leveraging ontology mapping and advanced natural-language-processing (NLP) methods, OARD automatically and efficiently extracts concepts for both rare diseases and their phenotypic traits from billing codes and lab tests as well as over 100 million clinical narratives. The rare disease prevalence derived by OARD is highly correlated with that annotated in the original rare disease knowledgebase. By performing association analysis, we identified more than 1 million novel disease-phenotype association pairs that were previously missed by human annotation, and >60% were confirmed true associations via manual review of a list of sampled pairs. Compared to the manually curated annotation, OARD is 100% data driven and its pipeline can be shared across different institutions. By supporting privacy-preserving sharing of aggregated summary statistics, such as term frequencies and disease-phenotype associations, it fills an important gap to facilitate data-driven research in the rare disease community.


Subjects
Natural Language Processing , Rare Diseases , Electronic Health Records , Humans , Phenotype , Rare Diseases/genetics
2.
Brief Bioinform ; 24(4)2023 07 20.
Article in English | MEDLINE | ID: mdl-37248747

ABSTRACT

Human Phenotype Ontology (HPO)-based approaches have gained popularity in recent times as a tool for genomic diagnostics of rare diseases. However, these approaches do not make full use of the available information on disease and patient phenotypes. We present a new method called Phen2Disease, which utilizes the bidirectional maximum matching semantic similarity between two phenotype sets of patients and diseases to prioritize diseases and genes. Our comprehensive experiments have been conducted on six real data cohorts with 2051 cases (Cohort 1, n = 384; Cohort 2, n = 281; Cohort 3, n = 185; Cohort 4, n = 784; Cohort 5, n = 208; and Cohort 6, n = 209) and two simulated data cohorts with 1000 cases. The results of the experiments showed that Phen2Disease outperforms the three state-of-the-art methods when only phenotype information and HPO knowledge base are used, particularly in cohorts with fewer average numbers of HPO terms. We also observed that patients with higher information content scores have more specific information, leading to more accurate predictions. Moreover, Phen2Disease provides high interpretability with ranked diseases and patient HPO terms presented. Our method provides a novel approach to utilizing phenotype data for genomic diagnostics of rare diseases, with potential for clinical impact. Phen2Disease is freely available on GitHub at https://github.com/ZhuLab-Fudan/Phen2Disease.
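
The bidirectional matching idea described in this abstract can be sketched as follows. This is a minimal illustration and not the Phen2Disease implementation: the abstract does not give the scoring formula, so the equal-weight average of the two directions, the toy term hierarchy (`TOY_PARENT`, with made-up term IDs), and the function names are all assumptions introduced here.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical toy hierarchy: child term -> parent term (not real HPO IDs).
TOY_PARENT: Dict[str, str] = {"T:A1": "T:A", "T:A2": "T:A", "T:B1": "T:B"}


def toy_term_similarity(a: str, b: str) -> float:
    """Assumed stand-in for an ontology-based term similarity."""
    if a == b:
        return 1.0
    parent_a = TOY_PARENT.get(a)
    if parent_a is not None and parent_a == TOY_PARENT.get(b):
        return 0.5  # siblings under the same toy parent
    return 0.0


def best_match_average(source: Iterable[str], target: Iterable[str],
                       term_sim: Callable[[str, str], float]) -> float:
    """Average, over source terms, of the best similarity to any target term."""
    source, target = list(source), list(target)
    if not source or not target:
        return 0.0
    return sum(max(term_sim(s, t) for t in target) for s in source) / len(source)


def bidirectional_similarity(patient: Iterable[str], disease: Iterable[str],
                             term_sim: Callable[[str, str], float]) -> float:
    """Combine patient->disease and disease->patient matching (assumed: mean)."""
    patient, disease = list(patient), list(disease)
    forward = best_match_average(patient, disease, term_sim)
    backward = best_match_average(disease, patient, term_sim)
    return (forward + backward) / 2.0


def rank_diseases(patient: Iterable[str], profiles: Dict[str, List[str]],
                  term_sim: Callable[[str, str], float]) -> List[Tuple[str, float]]:
    """Return (disease, score) pairs sorted from best to worst match."""
    patient = list(patient)
    scored = [(name, bidirectional_similarity(patient, terms, term_sim))
              for name, terms in profiles.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Scoring in both directions penalizes a disease profile that matches the patient's terms but also contains many terms the patient lacks, and vice versa; a real system would replace `toy_term_similarity` with an information-content-based measure over the HPO.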


Subjects
Biological Ontologies , Rare Diseases , Humans , Semantics , Genomics , Phenotype
3.
J Allergy Clin Immunol ; 153(3): 615-628.e4, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38185417

ABSTRACT

Autoimmunity in inborn errors of immunity (IEIs) has a multifactorial pathogenesis and develops subsequent to a genetic predisposition in conjunction with gene regulation, environmental modifiers, and infectious triggers. On the basis of incremental data availability owing to upfront application of omics technologies, a more granular and dynamic view of mechanisms and manifestations is warranted. Here, we present a comprehensive novel concept of autoimmunity in IEIs that considers multiple layers of interdependent elements and connects 101 causative genes or deletions according to the quality of the allelic variants with 47 molecular pathways and 22 immune effector mechanisms. Furthermore, we list 50 resulting manifestations together with the corresponding Human Phenotype Ontology terms and review the types and frequencies of the most relevant clinical presentations. When all of its elements are taken together, this concept (1) extends the historical anatomic view of central versus peripheral tolerance toward multiple interdependent mechanisms of immune tolerance, (2) delineates the mechanisms underlying the protean clinical manifestations, and thereby, (3) points toward the most suitable precision therapy for autoimmunity in IEIs. The multilayer concept of autoimmune mechanisms and manifestations in IEIs will facilitate research design and provide clinical guidance on the use of precision medicine irrespective of the data depth available in each health care scenario.


Subjects
Autoimmunity , Precision Medicine , Humans , Alleles , Genetic Predisposition to Disease , Immune Tolerance
4.
BMC Med Inform Decis Mak ; 24(1): 30, 2024 Jan 31.
Article in English | MEDLINE | ID: mdl-38297371

ABSTRACT

OBJECTIVE: Clinical deep phenotyping and phenotype annotation play a critical role in both the diagnosis of patients with rare disorders as well as in building computationally-tractable knowledge in the rare disorders field. These processes rely on using ontology concepts, often from the Human Phenotype Ontology, in conjunction with a phenotype concept recognition task (supported usually by machine learning methods) to curate patient profiles or existing scientific literature. With the significant shift in the use of large language models (LLMs) for most NLP tasks, we examine the performance of the latest Generative Pre-trained Transformer (GPT) models underpinning ChatGPT as a foundation for the tasks of clinical phenotyping and phenotype annotation. MATERIALS AND METHODS: The experimental setup of the study included seven prompts of various levels of specificity, two GPT models (gpt-3.5-turbo and gpt-4.0) and two established gold standard corpora for phenotype recognition, one consisting of publication abstracts and the other clinical observations. RESULTS: The best run, using in-context learning, achieved 0.58 document-level F1 score on publication abstracts and 0.75 document-level F1 score on clinical observations, as well as a mention-level F1 score of 0.7, which surpasses the current best in class tool. Without in-context learning, however, performance is significantly below the existing approaches. CONCLUSION: Our experiments show that gpt-4.0 surpasses the state of the art performance if the task is constrained to a subset of the target ontology where there is prior knowledge of the terms that are expected to be matched. While the results are promising, the non-deterministic nature of the outcomes, the high cost and the lack of concordance between different runs using the same prompt and input make the use of these LLMs challenging for this particular task.
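
For readers unfamiliar with the metric, a document-level F1 score for concept recognition can be computed along the following lines. This is a sketch under stated assumptions: the abstract does not say how scores were aggregated across documents, so micro-averaging is assumed here, and the function name and input layout are hypothetical.

```python
from typing import Iterable, Set, Tuple


def micro_f1(documents: Iterable[Tuple[Set[str], Set[str]]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over (predicted, gold) term sets.

    Each document contributes the set of ontology term IDs a tool predicted
    and the set of term IDs in the gold standard; duplicates within a
    document are ignored, which is what makes the score document-level.
    """
    true_pos = false_pos = false_neg = 0
    for predicted, gold in documents:
        true_pos += len(predicted & gold)
        false_pos += len(predicted - gold)
        false_neg += len(gold - predicted)
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

A mention-level score would instead count each annotated text span separately, so the same term found twice in one document would be counted twice.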


Subjects
Knowledge , Language , Humans , Machine Learning , Phenotype , Rare Diseases
5.
BMC Med Inform Decis Mak ; 24(1): 134, 2024 May 24.
Article in English | MEDLINE | ID: mdl-38789985

ABSTRACT

BACKGROUND: There are approximately 8,000 different rare diseases that affect roughly 400 million people worldwide. Many of them suffer from delayed diagnosis. Ciliopathies are rare monogenic disorders characterized by a significant phenotypic and genetic heterogeneity that raises an important challenge for clinical diagnosis. Diagnosis support systems (DSS) applied to electronic health record (EHR) data may help identify undiagnosed patients, which is of paramount importance to improve patients' care. Our objective was to evaluate three online-accessible rare disease DSSs using phenotypes derived from EHRs for the diagnosis of ciliopathies. METHODS: Two datasets of ciliopathy cases, either proven or suspected, and two datasets of controls were used to evaluate the DSSs. Patient phenotypes were automatically extracted from their EHRs and converted to Human Phenotype Ontology terms. We tested the ability of the DSSs to diagnose cases in contrast to controls based on Orphanet ontology. RESULTS: A total of 79 cases and 38 controls were selected. Performances of the DSSs on ciliopathy real world data (best DSS with area under the ROC curve = 0.72) were not as good as published performances on the test set used in the DSS development phase. None of these systems obtained results which could be described as "expert-level". Patients with multisystemic symptoms were generally easier to diagnose than patients with isolated symptoms. Diseases easily confused with ciliopathy generally affected multiple organs and had overlapping phenotypes. Four challenges need to be considered to improve the performances: to make the DSSs interoperable with EHR systems, to validate the performances in real-life settings, to deal with data quality, and to leverage methods and resources for rare and complex diseases. 
CONCLUSION: Our study provides insights into the complexities of diagnosing highly heterogeneous rare diseases and offers lessons derived from evaluating existing DSSs in real-world settings. These insights are not only beneficial for ciliopathy diagnosis but also relevant to the enhancement of DSSs for various complex rare disorders, by guiding the development of more clinically relevant rare disease DSSs that could support early diagnosis and ultimately make more patients eligible for treatment.


Subjects
Ciliopathies , Electronic Health Records , Rare Diseases , Humans , Ciliopathies/diagnosis , Rare Diseases/diagnosis , Clinical Decision Support Systems , Phenotype
6.
Am J Hum Genet ; 107(3): 403-417, 2020 09 03.
Article in English | MEDLINE | ID: mdl-32755546

ABSTRACT

Human Phenotype Ontology (HPO)-based analysis has become standard for genomic diagnostics of rare diseases. Current algorithms use a variety of semantic and statistical approaches to prioritize the typically long lists of genes with candidate pathogenic variants. These algorithms do not provide robust estimates of the strength of the predictions beyond the placement in a ranked list, nor do they provide measures of how much any individual phenotypic observation has contributed to the prioritization result. However, given that the overall success rate of genomic diagnostics is only around 25%-50% or less in many cohorts, a good ranking cannot be taken to imply that the gene or disease at rank one is necessarily a good candidate. Here, we present an approach to genomic diagnostics that exploits the likelihood ratio (LR) framework to provide an estimate of (1) the posttest probability of candidate diagnoses, (2) the LR for each observed HPO phenotype, and (3) the predicted pathogenicity of observed genotypes. LIkelihood Ratio Interpretation of Clinical AbnormaLities (LIRICAL) placed the correct diagnosis within the first three ranks in 92.9% of 384 case reports comprising 262 Mendelian diseases, and the correct diagnosis had a mean posttest probability of 67.3%. Simulations show that LIRICAL is robust to many typically encountered forms of genomic and phenomic noise. In summary, LIRICAL provides accurate, clinically interpretable results for phenotype-driven genomic diagnostics.
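
The likelihood ratio framework mentioned in this abstract rests on a standard identity: posttest odds equal pretest odds multiplied by the likelihood ratio of each observation. The sketch below shows only that generic calculation, assuming the observations are treated as independent; it is not LIRICAL's code, and the function names and example numbers are hypothetical.

```python
import math
from typing import Iterable


def likelihood_ratio(freq_given_disease: float, freq_given_no_disease: float) -> float:
    """LR of one observed feature: P(feature | disease) / P(feature | no disease)."""
    if not (0.0 < freq_given_disease <= 1.0 and 0.0 < freq_given_no_disease <= 1.0):
        raise ValueError("frequencies must lie in (0, 1]")
    return freq_given_disease / freq_given_no_disease


def posttest_probability(pretest_probability: float,
                         likelihood_ratios: Iterable[float]) -> float:
    """Combine a pretest probability with LRs (assumed independent) via odds."""
    if not 0.0 < pretest_probability < 1.0:
        raise ValueError("pretest probability must lie strictly between 0 and 1")
    # Work in log-odds so that long products of LRs stay numerically stable.
    log_odds = math.log(pretest_probability / (1.0 - pretest_probability))
    for lr in likelihood_ratios:
        if lr <= 0.0:
            raise ValueError("likelihood ratios must be positive")
        log_odds += math.log(lr)
    # Convert log-odds back to a probability without overflowing exp().
    if log_odds >= 0.0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    odds = math.exp(log_odds)
    return odds / (1.0 + odds)
```

Because each observation enters as its own factor, the per-feature LRs can be reported alongside the final probability, which is the kind of interpretability the abstract emphasizes.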


Subjects
Computational Biology , Genetic Databases , Genomics , Rare Diseases/diagnosis , Algorithms , Exome/genetics , Humans , Phenotype , Rare Diseases/genetics , Software
7.
Am J Hum Genet ; 107(4): 683-697, 2020 10 01.
Article in English | MEDLINE | ID: mdl-32853554

ABSTRACT

More than 100 genetic etiologies have been identified in developmental and epileptic encephalopathies (DEEs), but correlating genetic findings with clinical features at scale has remained a hurdle because of a lack of frameworks for analyzing heterogeneous clinical data. Here, we analyzed 31,742 Human Phenotype Ontology (HPO) terms in 846 individuals with existing whole-exome trio data and assessed associated clinical features and phenotypic relatedness by using HPO-based semantic similarity analysis for individuals with de novo variants in the same gene. Gene-specific phenotypic signatures included associations of SCN1A with "complex febrile seizures" (HP:0011172; p = 2.1 × 10⁻⁵) and "focal clonic seizures" (HP:0002266; p = 8.9 × 10⁻⁶), STXBP1 with "absent speech" (HP:0001344; p = 1.3 × 10⁻¹¹), and SLC6A1 with "EEG with generalized slow activity" (HP:0010845; p = 0.018). Of 41 genes with de novo variants in two or more individuals, 11 genes showed significant phenotypic similarity, including SCN1A (n = 16, p < 0.0001), STXBP1 (n = 14, p = 0.0021), and KCNB1 (n = 6, p = 0.011). Including genetic and phenotypic data of control subjects increased phenotypic similarity for all genetic etiologies, whereas the probability of observing de novo variants decreased, emphasizing the conceptual differences between semantic similarity analysis and approaches based on the expected number of de novo events. We demonstrate that HPO-based phenotype analysis captures unique profiles for distinct genetic etiologies, reflecting the breadth of the phenotypic spectrum in genetic epilepsies. Semantic similarity can be used to generate statistical evidence for disease causation analogous to the traditional approach of primarily defining disease entities through similar clinical features.
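
One common way to test whether a group of individuals is more phenotypically similar than expected by chance is a permutation test, sketched below. The abstract does not describe the authors' exact procedure, so this is an illustration only: Jaccard overlap of term sets is swapped in for an HPO semantic similarity measure, and the function names, the number of permutations, and the fixed seed are all assumptions.

```python
import random
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Stand-in similarity: overlap of two term sets (not a semantic measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_pairwise_similarity(profiles: List[Set[str]]) -> float:
    """Mean similarity over all unordered pairs of individuals."""
    pairs = list(combinations(profiles, 2))
    if not pairs:
        raise ValueError("need at least two individuals")
    return sum(jaccard(x, y) for x, y in pairs) / len(pairs)


def permutation_p_value(cohort: Dict[str, Set[str]], group_ids: Iterable[str],
                        n_permutations: int = 1000, seed: int = 0) -> Tuple[float, float]:
    """Compare a group's similarity with random same-size groups from the cohort."""
    group = list(group_ids)
    observed = mean_pairwise_similarity([cohort[i] for i in group])
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    all_ids = sorted(cohort)
    hits = 0
    for _ in range(n_permutations):
        sample = rng.sample(all_ids, len(group))
        if mean_pairwise_similarity([cohort[i] for i in sample]) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return observed, (hits + 1) / (n_permutations + 1)
```

Here `cohort` maps an individual's identifier to that individual's set of phenotype terms, and `group_ids` would be, for example, the individuals sharing a de novo variant in the same gene.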


Subjects
GABA Plasma Membrane Transport Proteins/genetics , Munc18 Proteins/genetics , NAV1.1 Voltage-Gated Sodium Channel/genetics , Seizures/genetics , Infantile Spasms/genetics , Speech Disorders/genetics , Preschool Child , Cohort Studies , Female , Gene Expression , Gene Ontology , Humans , Male , Mutation , Phenotype , Seizures/classification , Seizures/diagnosis , Seizures/physiopathology , Semantics , Shab Potassium Channels/genetics , Infantile Spasms/classification , Infantile Spasms/diagnosis , Infantile Spasms/physiopathology , Speech Disorders/classification , Speech Disorders/diagnosis , Speech Disorders/physiopathology , Terminology as Topic , Exome Sequencing
8.
Brain ; 145(5): 1668-1683, 2022 06 03.
Article in English | MEDLINE | ID: mdl-35190816

ABSTRACT

Disease-causing variants in STXBP1 are among the most common genetic causes of neurodevelopmental disorders. However, the phenotypic spectrum in STXBP1-related disorders is wide and clear correlations between variant type and clinical features have not been observed so far. Here, we harmonized clinical data across 534 individuals with STXBP1-related disorders and analysed 19 973 derived phenotypic terms, including phenotypes of 253 individuals previously unreported in the scientific literature. The overall phenotypic landscape in STXBP1-related disorders is characterized by neurodevelopmental abnormalities in 95% and seizures in 89% of individuals, including focal-onset seizures as the most common seizure type (47%). More than 88% of individuals with STXBP1-related disorders have seizure onset in the first year of life, including neonatal seizure onset in 47%. Individuals with protein-truncating variants and deletions in STXBP1 (n = 261) were almost twice as likely to present with West syndrome and were more phenotypically similar than expected by chance. Five genetic hotspots with recurrent variants were identified in more than 10 individuals, including p.Arg406Cys/His (n = 40), p.Arg292Cys/His/Leu/Pro (n = 30), p.Arg551Cys/Gly/His/Leu (n = 24), p.Pro139Leu (n = 12), and p.Arg190Trp (n = 11). None of the recurrent variants were significantly associated with distinct electroclinical syndromes, single phenotypic features, or showed overall clinical similarity, indicating that the baseline variability in STXBP1-related disorders is too high for discrete phenotypic subgroups to emerge. We then reconstructed the seizure history in 62 individuals with STXBP1-related disorders in detail, retrospectively assigning seizure type and seizure frequency monthly across 4433 time intervals, and retrieved 251 anti-seizure medication prescriptions from the electronic medical records. 
We demonstrate a dynamic pattern of seizure control and complex interplay with response to specific medications particularly in the first year of life when seizures in STXBP1-related disorders are the most prominent. Adrenocorticotropic hormone and phenobarbital were more likely to initially reduce seizure frequency in infantile spasms and focal seizures compared to other treatment options, while the ketogenic diet was most effective in maintaining seizure freedom. In summary, we demonstrate how the multidimensional spectrum of phenotypic features in STXBP1-related disorders can be assessed using a computational phenotype framework to facilitate the development of future precision-medicine approaches.


Subjects
Epilepsy , Infantile Spasms , Electroencephalography , Epilepsy/genetics , Humans , Infant , Munc18 Proteins/genetics , Retrospective Studies , Seizures/genetics , Infantile Spasms/drug therapy , Infantile Spasms/genetics
9.
Hum Mutat ; 43(5): 539-546, 2022 05.
Article in English | MEDLINE | ID: mdl-35224813

ABSTRACT

Identifying the causal variant for diagnosis of genetic diseases is challenging when using next-generation sequencing approaches and variant prioritization tools can assist in this task. These tools provide in silico predictions of variant pathogenicity, however they are agnostic to the disease under study. We previously performed a disease-specific benchmark of 24 such tools to assess how they perform in different disease contexts. We found that the tools themselves show large differences in performance, but more importantly that the best tools for variant prioritization are dependent on the disease phenotypes being considered. Here we expand the assessment to 37 tools and refine our assessment by separating performance for nonsynonymous single nucleotide variants (nsSNVs) and missense variants (i.e., excluding nonsense variants). We found differences in performance for missense variants compared to nsSNVs and recommend three tools that stand out in terms of their performance (BayesDel, CADD, and ClinPred).


Subjects
Benchmarking , Computational Biology , High-Throughput Nucleotide Sequencing , Humans , Missense Mutation , Phenotype
10.
Hum Mutat ; 43(11): 1642-1658, 2022 11.
Article in English | MEDLINE | ID: mdl-35460582

ABSTRACT

Making a specific diagnosis in neurodevelopmental disorders is traditionally based on recognizing clinical features of a distinct syndrome, which guides testing of its possible genetic etiologies. Scalable frameworks for genomic diagnostics, however, have struggled to integrate meaningful measurements of clinical phenotypic features. While standardization has enabled generation and interpretation of genomic data for clinical diagnostics at unprecedented scale, making the equivalent breakthrough for clinical data has proven challenging. However, increasingly clinical features are being recorded using controlled dictionaries with machine readable formats such as the Human Phenotype Ontology (HPO), which greatly facilitates their use in the diagnostic space. Improving the tractability of large-scale clinical information will present new opportunities to inform genomic research and diagnostics from a clinical perspective. Here, we describe novel approaches for computational phenotyping to harmonize clinical features, improve data translation through revising domain-specific dictionaries, quantify phenotypic features, and determine clinical relatedness. We demonstrate how these concepts can be applied to longitudinal phenotypic information, which represents a critical element of developmental disorders and pediatric conditions. Finally, we expand our discussion to clinical data derived from electronic medical records, a largely untapped resource of deep clinical information with distinct strengths and weaknesses.


Subjects
Electronic Health Records , Genomics , Child , Humans , Phenotype
11.
Am J Med Genet C Semin Med Genet ; 190(2): 231-242, 2022 06.
Article in English | MEDLINE | ID: mdl-35872606

ABSTRACT

Technological advances in both genome sequencing and prenatal imaging are increasing our ability to accurately recognize and diagnose Mendelian conditions prenatally. Phenotype-driven early genetic diagnosis of fetal genetic disease can help to strategize treatment options and clinical preventive measures during the perinatal period, to plan in utero therapies, and to inform parental decision-making. Fetal phenotypes of genetic diseases are often unique and at present are not well understood; more comprehensive knowledge about prenatal phenotypes and computational resources have an enormous potential to improve diagnostics and translational research. The Human Phenotype Ontology (HPO) has been widely used to support diagnostics and translational research in human genetics. To better support prenatal usage, the HPO consortium conducted a series of workshops with a group of domain experts in a variety of medical specialties, diagnostic techniques, as well as diseases and phenotypes related to prenatal medicine, including perinatal pathology, musculoskeletal anomalies, neurology, medical genetics, hydrops fetalis, craniofacial malformations, cardiology, neonatal-perinatal medicine, fetal medicine, placental pathology, prenatal imaging, and bioinformatics. We expanded the representation of prenatal phenotypes in HPO by adding 95 new phenotype terms under the Abnormality of prenatal development or birth (HP:0001197) grouping term, and revised definitions, synonyms, and disease annotations for most of the 152 terms that existed before the beginning of this effort. The expansion of prenatal phenotypes in HPO will support phenotype-driven prenatal exome and genome sequencing for precision genetic diagnostics of rare diseases to support prenatal care.


Subjects
Computational Biology , Placenta , Newborn Infant , Humans , Female , Pregnancy , Computational Biology/methods , Phenotype , Rare Diseases , Exome Sequencing
12.
Am J Hum Genet ; 104(6): 1060-1072, 2019 06 06.
Article in English | MEDLINE | ID: mdl-31104773

ABSTRACT

The developmental and epileptic encephalopathies (DEEs) are heterogeneous disorders with a strong genetic contribution, but the underlying genetic etiology remains unknown in a significant proportion of individuals. To explore whether statistical support for genetic etiologies can be generated on the basis of phenotypic features, we analyzed whole-exome sequencing data and phenotypic similarities by using Human Phenotype Ontology (HPO) in 314 individuals with DEEs. We identified a de novo c.508C>T (p.Arg170Trp) variant in AP2M1 in two individuals with a phenotypic similarity that was higher than expected by chance (p = 0.003) and a phenotype related to epilepsy with myoclonic-atonic seizures. We subsequently found the same de novo variant in two individuals with neurodevelopmental disorders and generalized epilepsy in a cohort of 2,310 individuals who underwent diagnostic whole-exome sequencing. AP2M1 encodes the µ-subunit of the adaptor protein complex 2 (AP-2), which is involved in clathrin-mediated endocytosis (CME) and synaptic vesicle recycling. Modeling of protein dynamics indicated that the p.Arg170Trp variant impairs the conformational activation and thermodynamic entropy of the AP-2 complex. Functional complementation of both the µ-subunit carrying the p.Arg170Trp variant in human cells and astrocytes derived from AP-2µ conditional knockout mice revealed a significant impairment of CME of transferrin. In contrast, stability, expression levels, membrane recruitment, and localization were not impaired, suggesting a functional alteration of the AP-2 complex as the underlying disease mechanism. We establish a recurrent pathogenic variant in AP2M1 as a cause of DEEs with distinct phenotypic features, and we implicate dysfunction of the early steps of endocytosis as a disease mechanism in epilepsy.


Subjects
Adaptor Protein Complex 2/genetics , Adaptor Protein Complex mu Subunits/genetics , Brain Diseases/etiology , Clathrin/metabolism , Endocytosis , Epilepsy/etiology , Missense Mutation , Neurodevelopmental Disorders/etiology , Adolescent , Animals , Brain Diseases/pathology , Child , Preschool Child , Clathrin/genetics , Epilepsy/pathology , Female , Humans , Infant , Mice , Knockout Mice , Neurodevelopmental Disorders/pathology , Exome Sequencing
13.
J Biomed Inform ; 129: 104059, 2022 05.
Article in English | MEDLINE | ID: mdl-35351638

ABSTRACT

The study aims at developing a neural network model to improve the performance of Human Phenotype Ontology (HPO) concept recognition tools. We used the terms, definitions, and comments about the phenotypic concepts in the HPO database to train our model. The document to be analyzed is first split into sentences and annotated with a base method to generate candidate concepts. The sentences, along with the candidate concepts, are then fed into the pre-trained model for re-ranking. Our model comprises the pre-trained BlueBERT and a feature selection module, followed by a contrastive loss. We re-ranked the results generated by three robust HPO annotation tools and compared the performance against most of the existing approaches. The experimental results show that our model can improve the performance of the existing methods. Significantly, it boosted 3.0% and 5.6% in F1 score on the two evaluated datasets compared with the base methods. It removed more than 80% of the false positives predicted by the base methods, resulting in up to 18% improvement in precision. Our model utilizes the descriptive data in the ontology and the contextual information in the sentences for re-ranking. The results indicate that the additional information and the re-ranking model can significantly enhance the precision of HPO concept recognition compared with the base method.


Subjects
Language , Computer Neural Networks , Factual Databases , Humans , Phenotype
14.
Article in German | MEDLINE | ID: mdl-36239768

ABSTRACT

The ICD-10-GM coding system used in the German healthcare system only captures a minority of rare disease diagnoses. Therefore, information on the incidence and prevalence of rare diseases as well as necessary (financial) resources for the expert care required for evidence-based decisions by health insurers, care providers, and politicians are lacking. Furthermore, the missing information complicates and sometimes even precludes the generation of scientific knowledge on rare diseases. Therefore, starting in 2023, all in-patient cases in Germany with a rare disease diagnosis must be coded by an ORPHAcode using the Alpha-ID-SE file. The file Alpha-ID-SE links the ICD-10-GM codes to the internationally established ORPHAcodes for rare diseases. Commercially available software tools progressively support the coding of rare diseases. In several centers for rare diseases linked to university hospitals, IT tools and procedures were established to realize a complete coding of rare diseases. These include financial incentives for the institutions providing rare disease codes, systematic queries asking for rare disease codes during the coding process, and a semi-automated coding process for all patients with a rare disease previously seen at the institution. A combination of the different approaches probably results in the most complete coding. To get the complete picture of rare disease epidemiology and care requirements, a specific and unique coding of out-patient cases is also desirable. Furthermore, a structured reporting of phenotype is required, especially for complex rare diseases and for yet undiagnosed cases.


Subjects
International Classification of Diseases , Rare Diseases , Humans , Rare Diseases/diagnosis , Rare Diseases/epidemiology , Rare Diseases/therapy , Germany/epidemiology , Delivery of Health Care , Health Facilities
15.
BMC Bioinformatics ; 22(1): 500, 2021 Oct 16.
Article in English | MEDLINE | ID: mdl-34656098

ABSTRACT

BACKGROUND: Identifying human protein-phenotype relationships has attracted researchers in bioinformatics and biomedical natural language processing due to its importance in uncovering rare and complex diseases. Since experimental validation of protein-phenotype associations is prohibitive, automated tools capable of accurately extracting these associations from the biomedical text are in high demand. However, while the manual annotation of protein-phenotype co-mentions required for training such models is highly resource-consuming, extracting millions of unlabeled co-mentions is straightforward. RESULTS: In this study, we propose a novel deep semi-supervised ensemble framework that combines deep neural networks, semi-supervised, and ensemble learning for classifying human protein-phenotype co-mentions with the help of unlabeled data. This framework allows the ability to incorporate an extensive collection of unlabeled sentence-level co-mentions of human proteins and phenotypes with a small labeled dataset to enhance overall performance. We develop PPPredSS, a prototype of our proposed semi-supervised framework that combines sophisticated language models, convolutional networks, and recurrent networks. Our experimental results demonstrate that the proposed approach provides a new state-of-the-art performance in classifying human protein-phenotype co-mentions by outperforming other supervised and semi-supervised counterparts. Furthermore, we highlight the utility of PPPredSS in powering a curation assistant system through case studies involving a group of biologists. CONCLUSIONS: This article presents a novel approach for human protein-phenotype co-mention classification based on deep, semi-supervised, and ensemble learning. The insights and findings from this work have implications for biomedical researchers, biocurators, and the text mining community working on biomedical relationship extraction.


Subjects
Computer Neural Networks , Supervised Machine Learning , Data Mining , Humans , Phenotype
16.
Hum Mutat ; 42(6): 762-776, 2021 06.
Article in English | MEDLINE | ID: mdl-33847017

ABSTRACT

Bi-allelic TECPR2 variants have been associated with a complex syndrome with features of both a neurodevelopmental and neurodegenerative disorder. Here, we provide a comprehensive clinical description and variant interpretation framework for this genetic locus. Through international collaboration, we identified 17 individuals from 15 families with bi-allelic TECPR2 variants. We systematically reviewed clinical and molecular data from this cohort and 11 cases previously reported. Phenotypes were standardized using Human Phenotype Ontology terms. A cross-sectional analysis revealed global developmental delay/intellectual disability, muscular hypotonia, ataxia, hyporeflexia, respiratory infections, and central/nocturnal hypopnea as core manifestations. A review of brain magnetic resonance imaging scans demonstrated a thin corpus callosum in 52%. We evaluated 17 distinct variants. Missense variants in TECPR2 are predominantly located in the N- and C-terminal regions containing β-propeller repeats. Despite constituting nearly half of disease-associated TECPR2 variants, classifying missense variants as (likely) pathogenic according to ACMG criteria remains challenging. We estimate a pathogenic variant carrier frequency of 1/1221 in the general and 1/155 in the Jewish Ashkenazi populations. Based on clinical, neuroimaging, and genetic data, we provide recommendations for variant reporting, clinical assessment, and surveillance/treatment of individuals with TECPR2-associated disorder. This sets the stage for future prospective natural history studies.


Subjects
Carrier Proteins/genetics , Hereditary Sensory and Autonomic Neuropathies , Intellectual Disability , Nerve Tissue Proteins/genetics , Adolescent , Carrier Proteins/chemistry , Child , Child, Preschool , Cohort Studies , Cross-Sectional Studies , Family , Female , Hereditary Sensory and Autonomic Neuropathies/complications , Hereditary Sensory and Autonomic Neuropathies/diagnosis , Hereditary Sensory and Autonomic Neuropathies/genetics , Hereditary Sensory and Autonomic Neuropathies/pathology , Humans , Infant , Intellectual Disability/complications , Intellectual Disability/diagnosis , Intellectual Disability/genetics , Intellectual Disability/pathology , Magnetic Resonance Imaging , Male , Models, Molecular , Mutation, Missense , Nerve Tissue Proteins/chemistry , Neuroimaging/methods , Pedigree , Phenotype , Protein Conformation
17.
Am J Hum Genet ; 103(3): 389-399, 2018 09 06.
Article in English | MEDLINE | ID: mdl-30173820

ABSTRACT

Recently, to speed up the differential-diagnosis process based on symptoms and signs observed from an affected individual in the diagnosis of rare diseases, researchers have developed and implemented phenotype-driven differential-diagnosis systems. The performance of those systems relies on the quantity and quality of underlying databases of disease-phenotype associations (DPAs). Although such databases are often developed by manual curation, they inherently suffer from limited coverage. To address this problem, we propose a text-mining approach to increase the coverage of DPA databases and consequently improve the performance of differential-diagnosis systems. Our analysis showed that a text-mining approach using one million case reports obtained from PubMed could increase the coverage of manually curated DPAs in Orphanet by 125.6%. We also present PubCaseFinder (see Web Resources), a new phenotype-driven differential-diagnosis system in a freely available web application. By utilizing automatically extracted DPAs from case reports in addition to manually curated DPAs, PubCaseFinder improves the performance of automated differential diagnosis. Moreover, PubCaseFinder helps clinicians search for relevant case reports by using phenotype-based comparisons and confirm the results with detailed contextual information.


Subjects
Rare Diseases/diagnosis , Rare Diseases/genetics , Data Mining/methods , Databases, Genetic , Diagnosis, Differential , Humans , Phenotype
18.
J Med Internet Res ; 23(3): e21023, 2021 03 16.
Article in English | MEDLINE | ID: mdl-33724192

ABSTRACT

BACKGROUND: 16p13.11 microduplication syndrome has a variable presentation and is characterized primarily by neurodevelopmental and physical phenotypes resulting from copy number variation at chromosome 16p13.11. Given its variability, there may be features that have not yet been reported. The goal of this study was to use a patient "self-phenotyping" survey to collect data directly from patients to further characterize the phenotypes of 16p13.11 microduplication syndrome. OBJECTIVE: This study aimed to (1) discover self-identified phenotypes in 16p13.11 microduplication syndrome that have been underrepresented in the scientific literature and (2) demonstrate that self-phenotyping tools are valuable sources of data for the medical and scientific communities. METHODS: As part of a large study to compare and evaluate patient self-phenotyping surveys, an online survey tool, Phenotypr, was developed for patients with rare disorders to self-report phenotypes. Participants with 16p13.11 microduplication syndrome were recruited through the Boston Children's Hospital 16p13.11 Registry. Either the caregiver, parent, or legal guardian of an affected child or the affected person (if aged 18 years or above) completed the survey. Results were securely transferred to a Research Electronic Data Capture database and aggregated for analysis. RESULTS: A total of 19 participants enrolled in the study. Notably, among the 19 participants, aggression and anxiety were mentioned by 3 (16%) and 4 (21%) participants, respectively, which is an increase over the numbers in previously published literature. Additionally, among the 19 participants, 3 (16%) had asthma and 2 (11%) had other immunological disorders, neither of which has been previously described in the syndrome. CONCLUSIONS: Several phenotypes might be underrepresented in the previous 16p13.11 microduplication literature, and new possible phenotypes have been identified. Whenever possible, patients should continue to be referenced as a source of complete phenotyping data on their condition. Self-phenotyping may lead to a better understanding of the prevalence of phenotypes in genetic disorders and may identify previously unreported phenotypes.


Subjects
DNA Copy Number Variations , Family , Biological Variation, Population , Cohort Studies , Humans , Phenotype
19.
Hum Mutat ; 41(2): 347-362, 2020 02.
Article in English | MEDLINE | ID: mdl-31680375

ABSTRACT

Precise identification of causative variants from whole-genome sequencing data, including both coding and noncoding variants, is challenging. The Critical Assessment of Genome Interpretation 5 SickKids clinical genome challenge provided an opportunity to assess our ability to extract such information. Participants in the challenge were required to match each of the 24 whole-genome sequences to the correct phenotypic profile and to identify the disease class of each genome. These are all rare disease cases that have resisted genetic diagnosis in a state-of-the-art pipeline. The patients have a range of eye, neurological, and connective-tissue disorders. We used a gene-centric approach to address this problem, assigning each gene a multiphenotype-matching score. Mutations in the top-scoring genes for each phenotype profile were ranked on a 6-point scale of pathogenicity probability, resulting in an approximately equal number of top-ranked coding and noncoding candidate variants overall. We were able to assign the correct disease class for 12 cases and the correct genome to a clinical profile for five cases. The challenge assessor found genes in three of these five cases as likely appropriate. In the postsubmission phase, after careful screening of the genes in the correct genome, we identified additional potential diagnostic variants, a high proportion of which are noncoding.


Subjects
Genetic Association Studies/methods , Genetic Diseases, Inborn/diagnosis , Genetic Diseases, Inborn/genetics , Genetic Predisposition to Disease , Genome, Human , Genomics/methods , Rare Diseases , Algorithms , Alleles , Genetic Variation , Genome-Wide Association Study/methods , Genotype , Humans , Models, Theoretical , Phenotype , Whole Genome Sequencing , Workflow
20.
Genet Med ; 22(12): 2060-2070, 2020 12.
Article in English | MEDLINE | ID: mdl-32773773

ABSTRACT

PURPOSE: Childhood epilepsies have a strong genetic contribution, but the disease trajectory for many genetic etiologies remains unknown. Electronic medical record (EMR) data potentially allow for the analysis of longitudinal clinical information but this has not yet been explored. METHODS: We analyzed provider-entered neurological diagnoses made at 62,104 patient encounters from 658 individuals with known or presumed genetic epilepsies. To harmonize clinical terminology, we mapped clinical descriptors to Human Phenotype Ontology (HPO) terms and inferred higher-level phenotypic concepts. We then binned the resulting 286,085 HPO terms to 100 3-month time intervals and assessed gene-phenotype associations at each interval. RESULTS: We analyzed a median follow-up of 6.9 years per patient and a cumulative 3251 patient years. Correcting for multiple testing, we identified significant associations between "Status epilepticus" with SCN1A at 1.0 years, "Severe intellectual disability" with PURA at 9.75 years, and "Infantile spasms" and "Epileptic spasms" with STXBP1 at 0.5 years. The identified associations reflect known clinical features of these conditions, and manual chart review excluded provider bias. CONCLUSION: Some aspects of the longitudinal disease histories can be reconstructed through EMR data and reveal significant gene-phenotype associations, even within closely related conditions. Gene-specific EMR footprints may enable outcome studies and clinical decision support.
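The binning step described in the METHODS above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the abstract does not name the statistical test, so a one-sided Fisher exact test is assumed here, and the record layout, the function names (`age_bin`, `fisher_exact_greater`, `associations`), and the toy patients are all hypothetical. Multiple-testing correction and the inference of higher-level HPO concepts are omitted.

```python
from collections import defaultdict
from math import comb

BIN_YEARS = 0.25  # one 3-month interval, as stated in the abstract
N_BINS = 100      # 100 intervals, covering ages 0 to 25 years


def age_bin(age_years):
    """Return the 0-based index of the 3-month bin, or None if out of range."""
    idx = int(age_years // BIN_YEARS)
    return idx if 0 <= idx < N_BINS else None


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Assumed test (not named in the abstract): probability of seeing at
    least `a` carriers with the term under the hypergeometric null.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, col1)


def associations(records, gene, term):
    """Map each time bin to a p-value for `gene` versus `term`.

    `records` is an iterable of (patient_id, gene, hpo_term, age_years)
    tuples; this layout is a hypothetical simplification.
    """
    gene_of = {}
    active = defaultdict(set)     # bin -> patients with any record in it
    with_term = defaultdict(set)  # bin -> patients with `term` in it
    for pid, g, t, age in records:
        b = age_bin(age)
        if b is None:
            continue
        gene_of[pid] = g
        active[b].add(pid)
        if t == term:
            with_term[b].add(pid)

    result = {}
    for b, patients in active.items():
        carriers = {p for p in patients if gene_of[p] == gene}
        others = patients - carriers
        a = len(carriers & with_term[b])
        c = len(others & with_term[b])
        result[b] = fisher_exact_greater(a, len(carriers) - a,
                                         c, len(others) - c)
    return result


# Toy data (hypothetical): three SCN1A carriers with the term and three
# other patients without it, all seen between 1.0 and 1.25 years of age.
toy_records = [
    ("p1", "SCN1A", "Status epilepticus", 1.0),
    ("p2", "SCN1A", "Status epilepticus", 1.1),
    ("p3", "SCN1A", "Status epilepticus", 1.2),
    ("p4", "OTHER", "Seizure", 1.0),
    ("p5", "OTHER", "Seizure", 1.1),
    ("p6", "OTHER", "Seizure", 1.2),
]
p_values = associations(toy_records, "SCN1A", "Status epilepticus")
```

Counting each patient at most once per bin keeps repeated encounters within the same 3-month window from inflating the contingency table; whether the study handled repeats this way is not stated in the abstract.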


Subjects
Epilepsy , Intellectual Disability , Spasms, Infantile , Child , Electronic Health Records , Epilepsy/diagnosis , Epilepsy/genetics , Humans , Phenotype