Search | VHL Search Portal

1.

Bedside Back to Bench: Building Bridges between Basic and Clinical Genomic Research.

Manolio, Teri A; Fowler, Douglas M; Starita, Lea M; Haendel, Melissa A; MacArthur, Daniel G; Biesecker, Leslie G; Worthey, Elizabeth; Chisholm, Rex L; Green, Eric D; Jacob, Howard J; McLeod, Howard L; Roden, Dan; Rodriguez, Laura Lyman; Williams, Marc S; Cooper, Gregory M; Cox, Nancy J; Herman, Gail E; Kingsmore, Stephen; Lo, Cecilia; Lutz, Cathleen; MacRae, Calum A; Nussbaum, Robert L; Ordovas, Jose M; Ramos, Erin M; Robinson, Peter N; Rubinstein, Wendy S; Seidman, Christine; Stranger, Barbara E; Wang, Haoyi; Westerfield, Monte; Bult, Carol.

Cell ; 169(1): 6-12, 2017 03 23.

Article in English | MEDLINE | ID: mdl-28340351

ABSTRACT

Genome sequencing has revolutionized the diagnosis of genetic diseases. Close collaborations between basic scientists and clinical genomicists are now needed to link genetic variants with disease causation. To facilitate such collaborations, we recommend prioritizing clinically relevant genes for functional studies, developing reference variant-phenotype databases, adopting phenotype description standards, and promoting data sharing.

Subjects

Biomedical Research , Genomics , Animals , DNA Mutational Analysis , Databases, Genetic , Disease/genetics , Human Genome Project , Humans , Information Dissemination , Models, Animal

2.

Geometric constraints on human brain function.

Pang, James C; Aquino, Kevin M; Oldehinkel, Marianne; Robinson, Peter A; Fulcher, Ben D; Breakspear, Michael; Fornito, Alex.

Nature ; 618(7965): 566-574, 2023 Jun.

Article in English | MEDLINE | ID: mdl-37258669

ABSTRACT

The anatomy of the brain necessarily constrains its function, but precisely how remains unclear. The classical and dominant paradigm in neuroscience is that neuronal dynamics are driven by interactions between discrete, functionally specialized cell populations connected by a complex array of axonal fibres. However, predictions from neural field theory, an established mathematical framework for modelling large-scale brain activity, suggest that the geometry of the brain may represent a more fundamental constraint on dynamics than complex interregional connectivity. Here, we confirm these theoretical predictions by analysing human magnetic resonance imaging data acquired under spontaneous and diverse task-evoked conditions. Specifically, we show that cortical and subcortical activity can be parsimoniously understood as resulting from excitations of fundamental, resonant modes of the brain's geometry (that is, its shape) rather than from modes of complex interregional connectivity, as classically assumed. We then use these geometric modes to show that task-evoked activations across over 10,000 brain maps are not confined to focal areas, as widely believed, but instead excite brain-wide modes with wavelengths spanning over 60 mm. Finally, we confirm predictions that the close link between geometry and function is explained by a dominant role for wave-like activity, showing that wave dynamics can reproduce numerous canonical spatiotemporal properties of spontaneous and evoked recordings. Our findings challenge prevailing views and identify a previously underappreciated role of geometry in shaping function, as predicted by a unifying and physically principled model of brain-wide dynamics.

Subjects

Brain Mapping , Brain , Humans , Axons/physiology , Brain/anatomy & histology , Brain/cytology , Brain/physiology , Magnetic Resonance Imaging , Neurons/physiology

3.

The Monarch Initiative in 2024: an analytic platform integrating phenotypes, genes and diseases across species.

Putman, Tim E; Schaper, Kevin; Matentzoglu, Nicolas; Rubinetti, Vincent P; Alquaddoomi, Faisal S; Cox, Corey; Caufield, J Harry; Elsarboukh, Glass; Gehrke, Sarah; Hegde, Harshad; Reese, Justin T; Braun, Ian; Bruskiewich, Richard M; Cappelletti, Luca; Carbon, Seth; Caron, Anita R; Chan, Lauren E; Chute, Christopher G; Cortes, Katherina G; De Souza, Vinícius; Fontana, Tommaso; Harris, Nomi L; Hartley, Emily L; Hurwitz, Eric; Jacobsen, Julius O B; Krishnamurthy, Madan; Laraway, Bryan J; McLaughlin, James A; McMurry, Julie A; Moxon, Sierra A T; Mullen, Kathleen R; O'Neil, Shawn T; Shefchek, Kent A; Stefancsik, Ray; Toro, Sabrina; Vasilevsky, Nicole A; Walls, Ramona L; Whetzel, Patricia L; Osumi-Sutherland, David; Smedley, Damian; Robinson, Peter N; Mungall, Christopher J; Haendel, Melissa A; Munoz-Torres, Monica C.

Nucleic Acids Res ; 52(D1): D938-D949, 2024 Jan 05.

Article in English | MEDLINE | ID: mdl-38000386

ABSTRACT

Bridging the gap between genetic variations, environmental determinants, and phenotypic outcomes is critical for supporting clinical diagnosis and understanding mechanisms of diseases. It requires integrating open data at a global scale. The Monarch Initiative advances these goals by developing open ontologies, semantic data models, and knowledge graphs for translational research. The Monarch App is an integrated platform combining data about genes, phenotypes, and diseases across species. Monarch's APIs enable access to carefully curated datasets and advanced analysis tools that support the understanding and diagnosis of disease for diverse applications such as variant prioritization, deep phenotyping, and patient profile-matching. We have migrated our system into a scalable, cloud-based infrastructure; simplified Monarch's data ingestion and knowledge graph integration systems; enhanced data mapping and integration standards; and developed a new user interface with novel search and graph navigation features. Furthermore, we advanced Monarch's analytic tools by developing a customized plugin for OpenAI's ChatGPT to increase the reliability of its responses about phenotypic data, allowing us to interrogate the knowledge in the Monarch graph using state-of-the-art Large Language Models. The resources of the Monarch Initiative can be found at monarchinitiative.org and its corresponding code repository at github.com/monarch-initiative/monarch-app.

Subjects

Databases, Factual , Disease , Genes , Phenotype , Humans , Internet , Databases, Factual/standards , Software , Genes/genetics , Disease/genetics

4.

Phenotype-aware prioritisation of rare Mendelian disease variants.

Kelly, Catherine; Szabo, Anita; Pontikos, Nikolas; Arno, Gavin; Robinson, Peter N; Jacobsen, Jules O B; Smedley, Damian; Cipriani, Valentina.

Trends Genet ; 38(12): 1271-1283, 2022 12.

Article in English | MEDLINE | ID: mdl-35934592

ABSTRACT

A molecular diagnosis from the analysis of sequencing data in rare Mendelian diseases has a huge impact on the management of patients and their families. Numerous patient phenotype-aware variant prioritisation (VP) tools have been developed to help automate this process and shorten the diagnostic odyssey, but performance statistics on real patient data are limited. Here we identify, assess, and compare the performance of all up-to-date, freely available, and programmatically accessible tools using a whole-exome, retinal disease dataset from 134 individuals with a molecular diagnosis. All tools were able to identify around two-thirds of the genetic diagnoses as the top-ranked candidate, with LIRICAL performing best overall. Finally, we discuss the challenges that must be overcome, given that most cases remain undiagnosed after current, state-of-the-art practices.

Subjects

Exome , Rare Diseases , Humans , Phenotype , Exome Sequencing , Rare Diseases/diagnosis , Rare Diseases/genetics

5.

Germline thymidylate synthase deficiency impacts nucleotide metabolism and causes dyskeratosis congenita.

Tummala, Hemanth; Walne, Amanda; Buccafusca, Roberto; Alnajar, Jenna; Szabo, Anita; Robinson, Peter; McConkie-Rosell, Allyn; Wilson, Meredith; Crowley, Suzanne; Kinsler, Veronica; Ewins, Anna-Maria; Madapura, Pradeepa M; Patel, Manthan; Pontikos, Nikolas; Codd, Veryan; Vulliamy, Tom; Dokal, Inderjeet.

Am J Hum Genet ; 109(8): 1472-1483, 2022 08 04.

Article in English | MEDLINE | ID: mdl-35931051

ABSTRACT

Dyskeratosis congenita (DC) is an inherited bone-marrow-failure disorder characterized by a triad of mucocutaneous features that include abnormal skin pigmentation, nail dystrophy, and oral leucoplakia. Despite the identification of several genetic variants that cause DC, a significant proportion of probands remain without a molecular diagnosis. In a cohort of eight independent DC-affected families, we have identified a remarkable series of heterozygous germline variants in the gene encoding thymidylate synthase (TYMS). Although the inheritance appeared to be autosomal recessive, one parent in each family had a wild-type TYMS coding sequence. Targeted genomic sequencing identified a specific haplotype and rare variants in the naturally occurring TYMS antisense regulator ENOSF1 (enolase super family 1) inherited from the other parent. Lymphoblastoid cells from affected probands have severe TYMS deficiency, altered cellular deoxyribonucleotide triphosphate pools, and hypersensitivity to the TYMS-specific inhibitor 5-fluorouracil. These defects in the nucleotide metabolism pathway resulted in genotoxic stress, defective transcription, and abnormal telomere maintenance. Gene-rescue studies in cells from affected probands revealed that post-transcriptional epistatic silencing of TYMS is occurring via elevated ENOSF1. These cell and molecular abnormalities generated by the combination of germline digenic variants at the TYMS-ENOSF1 locus represent a unique pathogenetic pathway for DC causation in these affected individuals, whereas the parents who are carriers of either of these variants in a singular fashion remain unaffected.

Subjects

Dyskeratosis Congenita , Thymidylate Synthase , Dyskeratosis Congenita/genetics , Germ Cells , Heterozygote , Humans , Nucleotides , Thymidylate Synthase/deficiency , Thymidylate Synthase/genetics

6.

FastHPOCR: Pragmatic, fast and accurate concept recognition using the Human Phenotype Ontology.

Groza, Tudor; Gration, Dylan; Baynam, Gareth; Robinson, Peter N.

Bioinformatics ; 2024 Jun 24.

Article in English | MEDLINE | ID: mdl-38913850

ABSTRACT

MOTIVATION: Human Phenotype Ontology (HPO)-based phenotype concept recognition underpins a faster and more effective mechanism to create patient phenotype profiles or to document novel phenotype-centred knowledge statements. While the increasing adoption of large language models (LLMs) for natural language understanding has led to several LLM-based solutions, we argue that their intrinsic resource-intensive nature is not suitable for realistic management of the phenotype concept recognition (CR) lifecycle. Consequently, we propose to go back to the basics and adopt a dictionary-based approach that enables both an immediate refresh of the ontological concepts as well as efficient re-analysis of past data. RESULTS: We developed a dictionary-based approach that uses a large pre-built collection of clusters of morphologically equivalent tokens to address lexical variability, together with a more effective concept recognition step that restricts entity boundary detection strictly to candidates consisting of tokens belonging to ontology concepts. Our method achieves state-of-the-art results (0.76 F1 on the GSC+ corpus) and a processing efficiency of 10,000 publication abstracts in 5 s. AVAILABILITY: FastHPOCR is available as a Python package installable via pip. The source code is available at https://github.com/tudorgroza/fast_hpo_cr. A Java implementation of FastHPOCR will be made available as part of the Fenominal Java library available at https://github.com/monarch-initiative/fenominal. The up-to-date GSC-2024 corpus is available at https://github.com/tudorgroza/code-for-papers/tree/main/gsc-2024.
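The dictionary-based idea described in this abstract can be illustrated with a minimal sketch. It is not the FastHPOCR implementation or its API: the function names, the tiny token-cluster table, and the three-entry dictionary are all hypothetical, chosen only to show how candidate spans can be limited to tokens that occur in ontology labels and then matched longest-first.

```python
import re

# Hypothetical stand-in for clusters of morphologically equivalent tokens:
# each surface form maps to one canonical token.
TOKEN_CLUSTERS = {"seizures": "seizure", "statures": "stature"}

# Illustrative dictionary: canonical token tuple -> HPO identifier.
DICTIONARY = {
    ("seizure",): "HP:0001250",
    ("short", "stature"): "HP:0004322",
    ("microcephaly",): "HP:0000252",
}

# Only tokens that appear in some ontology label can start or extend a candidate.
VOCAB = {token for key in DICTIONARY for token in key}


def canonical(token):
    """Map a surface token to its cluster representative (identity if unknown)."""
    return TOKEN_CLUSTERS.get(token, token)


def recognize(text):
    """Return (matched label tokens, HPO id) pairs found in the text, in order."""
    tokens = [canonical(t) for t in re.findall(r"[a-z]+", text.lower())]
    hits = []
    i = 0
    while i < len(tokens):
        if tokens[i] not in VOCAB:
            i += 1
            continue
        # Extend the candidate span while tokens belong to the ontology vocabulary.
        j = i
        while j < len(tokens) and tokens[j] in VOCAB:
            j += 1
        # Try the longest sub-span starting at i first.
        for end in range(j, i, -1):
            key = tuple(tokens[i:end])
            if key in DICTIONARY:
                hits.append((" ".join(key), DICTIONARY[key]))
                i = end
                break
        else:
            i += 1
    return hits


print(recognize("The patient had recurrent seizures and short stature."))
```

Because the dictionary is just a lookup table, refreshing it after an ontology release means rebuilding the table, which is the property the abstract emphasizes. This sketch is order-sensitive and ignores negation and context, which a real tool would need to handle.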

7.

Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES): a method for populating knowledge bases using zero-shot learning.

Caufield, J Harry; Hegde, Harshad; Emonet, Vincent; Harris, Nomi L; Joachimiak, Marcin P; Matentzoglu, Nicolas; Kim, HyeongSik; Moxon, Sierra; Reese, Justin T; Haendel, Melissa A; Robinson, Peter N; Mungall, Christopher J.

Bioinformatics ; 40(3)2024 Mar 04.

Article in English | MEDLINE | ID: mdl-38383067

ABSTRACT

MOTIVATION: Creating knowledge bases and ontologies is a time consuming task that relies on manual curation. AI/NLP approaches can assist expert curators in populating these knowledge bases, but current approaches rely on extensive training data, and are not able to populate arbitrarily complex nested knowledge schemas. RESULTS: Here we present Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES), a Knowledge Extraction approach that relies on the ability of Large Language Models (LLMs) to perform zero-shot learning and general-purpose query answering from flexible prompts and return information conforming to a specified schema. Given a detailed, user-defined knowledge schema and an input text, SPIRES recursively performs prompt interrogation against an LLM to obtain a set of responses matching the provided schema. SPIRES uses existing ontologies and vocabularies to provide identifiers for matched elements. We present examples of applying SPIRES in different domains, including extraction of food recipes, multi-species cellular signaling pathways, disease treatments, multi-step drug mechanisms, and chemical to disease relationships. Current SPIRES accuracy is comparable to the mid-range of existing Relation Extraction methods, but greatly surpasses an LLM's native capability of grounding entities with unique identifiers. SPIRES has the advantage of easy customization, flexibility, and, crucially, the ability to perform new tasks in the absence of any new training data. This method supports a general strategy of leveraging the language interpreting capabilities of LLMs to assemble knowledge bases, assisting manual knowledge curation and acquisition while supporting validation with publicly-available databases and ontologies external to the LLM. AVAILABILITY AND IMPLEMENTATION: SPIRES is available as part of the open source OntoGPT package: https://github.com/monarch-initiative/ontogpt.

Subjects

Knowledge Bases , Semantics , Databases, Factual

8.

The effects of pathogenic and likely pathogenic variants for inherited hemostasis disorders in 140 214 UK Biobank participants.

Stefanucci, Luca; Collins, Janine; Sims, Matthew C; Barrio-Hernandez, Inigo; Sun, Luanluan; Burren, Oliver S; Perfetto, Livia; Bender, Isobel; Callahan, Tiffany J; Fleming, Kathryn; Guerrero, Jose A; Hermjakob, Henning; Martin, Maria J; Stephenson, James; Paneerselvam, Kalpana; Petrovski, Slavé; Porras, Pablo; Robinson, Peter N; Wang, Quanli; Watkins, Xavier; Frontini, Mattia; Laskowski, Roman A; Beltrao, Pedro; Di Angelantonio, Emanuele; Gomez, Keith; Laffan, Mike; Ouwehand, Willem H; Mumford, Andrew D; Freson, Kathleen; Carss, Keren; Downes, Kate; Gleadall, Nick; Megy, Karyn; Bruford, Elspeth; Vuckovic, Dragana.

Blood ; 142(24): 2055-2068, 2023 12 14.

Article in English | MEDLINE | ID: mdl-37647632

ABSTRACT

Rare genetic diseases affect millions, and identifying causal DNA variants is essential for patient care. Therefore, it is imperative to estimate the effect of each independent variant and improve their pathogenicity classification. Our study of 140 214 unrelated UK Biobank (UKB) participants found that each of them carries a median of 7 variants previously reported as pathogenic or likely pathogenic. We focused on 967 diagnostic-grade gene (DGG) variants for rare bleeding, thrombotic, and platelet disorders (BTPDs) observed in 12 367 UKB participants. By association analysis, for a subset of these variants, we estimated effect sizes for platelet count and volume, and odds ratios for bleeding and thrombosis. Variants causal of some autosomal recessive platelet disorders revealed phenotypic consequences in carriers. Loss-of-function variants in MPL, which cause chronic amegakaryocytic thrombocytopenia if biallelic, were unexpectedly associated with increased platelet counts in carriers. We also demonstrated that common variants identified by genome-wide association studies (GWAS) for platelet count or thrombosis risk may influence the penetrance of rare variants in BTPD DGGs on their associated hemostasis disorders. Network-propagation analysis applied to an interactome of 18 410 nodes and 571 917 edges showed that GWAS variants with large effect sizes are enriched in DGGs and their first-order interactors. Finally, we illustrate the modifying effect of polygenic scores for platelet count and thrombosis risk on disease severity in participants carrying rare variants in TUBB1 or PROC and PROS1, respectively. Our findings demonstrate the power of association analyses using large population datasets in improving pathogenicity classifications of rare variants.
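The abstract does not say which network-propagation algorithm was used. As an assumption, the sketch below shows one common form, random walk with restart, on a four-node toy graph. The node labels, the restart probability, and the function name are hypothetical and are not taken from the study.

```python
def propagate(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Random walk with restart on an undirected graph.

    adjacency: dict mapping each node to a list of neighbours
               (every node must have at least one neighbour).
    seeds:     set of nodes where the walk restarts.
    Returns a dict of steady-state visiting probabilities.
    """
    nodes = sorted(adjacency)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    current = dict(start)
    for _ in range(max_iter):
        # Restart mass goes back to the seeds.
        updated = {n: restart * start[n] for n in nodes}
        # The remaining mass is split evenly among each node's neighbours.
        for u in nodes:
            share = (1.0 - restart) * current[u] / len(adjacency[u])
            for v in adjacency[u]:
                updated[v] += share
        if max(abs(updated[n] - current[n]) for n in nodes) < tol:
            return updated
        current = updated
    return current


# Toy path graph A - B - C - D with a single seed at A.
toy_graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
scores = propagate(toy_graph, seeds={"A"})
for node in sorted(scores, key=scores.get, reverse=True):
    print(node, round(scores[node], 4))
```

Scores decay with distance from the seed, which is the intuition behind asking whether signal from one gene set concentrates in another set and in its first-order interactors.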

Subjects

Genome-Wide Association Study , Thrombosis , Humans , Biological Specimen Banks , Hemostasis , Hemorrhage/genetics , Rare Diseases

9.

Landscape fire smoke airway exposure impairs respiratory and cardiac function and worsens experimental asthma.

Gomez, Henry M; Haw, Tatt J; Ilic, Dusan; Robinson, Peter; Donovan, Chantal; Croft, Amanda J; Vanka, Kanth S; Small, Ellen; Carroll, Olivia R; Kim, Richard Y; Mayall, Jemma R; Beyene, Tesfalidet; Palanisami, Thava; Ngo, Doan T M; Zosky, Graeme R; Holliday, Elizabeth G; Jensen, Megan E; McDonald, Vanessa M; Murphy, Vanessa E; Gibson, Peter G; Horvat, Jay C.

J Allergy Clin Immunol ; 2024 Mar 19.

Article in English | MEDLINE | ID: mdl-38513838

ABSTRACT

BACKGROUND: Millions of people are exposed to landscape fire smoke (LFS) globally, and inhalation of LFS particulate matter (PM) is associated with poor respiratory and cardiovascular outcomes. However, how LFS affects respiratory and cardiovascular function is less well understood. OBJECTIVE: We aimed to characterize the pathophysiologic effects of representative LFS airway exposure on respiratory and cardiac function and on asthma outcomes. METHODS: LFS was generated using a customized combustion chamber. In 8-week-old female BALB/c mice, low (25 µg/m3, 24-hour equivalent) or moderate (100 µg/m3, 24-hour equivalent) concentrations of LFS PM (10 µm and below [PM10]) were administered daily for 3 (short-term) and 14 (long-term) days in the presence and absence of experimental asthma. Lung inflammation, gene expression, structural changes, and lung function were assessed. In 8-week-old male C57BL/6 mice, low concentrations of LFS PM10 were administered for 3 days. Cardiac function and gene expression were assessed. RESULTS: Short- and long-term LFS PM10 airway exposure increased airway hyperresponsiveness and induced steroid insensitivity in experimental asthma, independent of significant changes in airway inflammation. Long-term LFS PM10 airway exposure also decreased gas diffusion. Short-term LFS PM10 airway exposure decreased cardiac function and expression of gene changes relating to oxidative stress and cardiovascular pathologies. CONCLUSIONS: We characterized significant detrimental effects of physiologically relevant concentrations and durations of LFS PM10 airway exposure on lung and heart function. Our study provides a platform for assessment of mechanisms that underpin LFS PM10 airway exposure on respiratory and cardiovascular disease outcomes.

10.

Interpretable prioritization of splice variants in diagnostic next-generation sequencing.

Danis, Daniel; Jacobsen, Julius O B; Carmody, Leigh C; Gargano, Michael A; McMurry, Julie A; Hegde, Ayushi; Haendel, Melissa A; Valentini, Giorgio; Smedley, Damian; Robinson, Peter N.

Am J Hum Genet ; 108(9): 1564-1577, 2021 09 02.

Article in English | MEDLINE | ID: mdl-34289339

ABSTRACT

A critical challenge in genetic diagnostics is the computational assessment of candidate splice variants, specifically the interpretation of nucleotide changes located outside of the highly conserved dinucleotide sequences at the 5' and 3' ends of introns. To address this gap, we developed the Super Quick Information-content Random-forest Learning of Splice variants (SQUIRLS) algorithm. SQUIRLS generates a small set of interpretable features for machine learning by calculating the information-content of wild-type and variant sequences of canonical and cryptic splice sites, assessing changes in candidate splicing regulatory sequences, and incorporating characteristics of the sequence such as exon length, disruptions of the AG exclusion zone, and conservation. We curated a comprehensive collection of disease-associated splice-altering variants at positions outside of the highly conserved AG/GT dinucleotides at the termini of introns. SQUIRLS trains two random-forest classifiers for the donor and for the acceptor and combines their outputs by logistic regression to yield a final score. We show that SQUIRLS transcends previous state-of-the-art accuracy in classifying splice variants as assessed by rank analysis in simulated exomes, and is significantly faster than competing methods. SQUIRLS provides tabular output files for incorporation into diagnostic pipelines for exome and genome analysis, as well as visualizations that contextualize predicted effects of variants on splicing to make it easier to interpret splice variants in diagnostic settings.
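To make "information content of wild-type and variant sequences" concrete, the sketch below scores a short sequence against a position frequency matrix as the sum over positions of 2 + log2(f), a standard individual-information measure with the small-sample correction omitted. Whether SQUIRLS computes its features exactly this way is an assumption. The three-position matrix is invented for illustration and is not a real splice-site model.

```python
import math

# Hypothetical position frequency matrix for a 3-nucleotide motif.
# Each row gives made-up base frequencies at one position and sums to 1.
FREQUENCIES = [
    {"A": 0.25, "C": 0.125, "G": 0.5, "T": 0.125},
    {"A": 0.5, "C": 0.125, "G": 0.25, "T": 0.125},
    {"A": 0.125, "C": 0.125, "G": 0.5, "T": 0.25},
]


def individual_information(sequence, frequencies=FREQUENCIES):
    """Sum of (2 + log2 f(base, position)) in bits; higher means closer to the motif."""
    if len(sequence) != len(frequencies):
        raise ValueError("sequence length must match the matrix length")
    return sum(2.0 + math.log2(row[base]) for base, row in zip(sequence, frequencies))


wild_type = "GAG"
variant = "GCG"  # A>C substitution at the second position
delta = individual_information(variant) - individual_information(wild_type)
print(individual_information(wild_type), individual_information(variant), delta)
```

A negative delta indicates that the variant sequence matches the motif less well than the wild-type sequence. A feature of this kind is interpretable because it is expressed in bits per site.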

Subjects

Algorithms , Data Curation/methods , Genetic Diseases, Inborn/genetics , RNA Splice Sites , RNA Splicing , Software , Base Sequence , Computational Biology/methods , Exome , Exons , Genetic Diseases, Inborn/diagnosis , Genetic Diseases, Inborn/pathology , High-Throughput Nucleotide Sequencing , Humans , Introns , Mutation , Exome Sequencing

11.

Evaluation of phenotype-driven gene prioritization methods for Mendelian diseases.

Jacobsen, Julius O B; Kelly, Catherine; Cipriani, Valentina; Robinson, Peter N; Smedley, Damian.

Brief Bioinform ; 23(5)2022 09 20.

Article in English | MEDLINE | ID: mdl-35595299

ABSTRACT

Yuan et al. recently described an independent evaluation of several phenotype-driven gene prioritization methods for Mendelian disease on two separate clinical datasets. Although they attempted to use default settings for each tool, we describe three key differences from those we currently recommend for our Exomiser and PhenIX tools. These influence how variant frequency, quality and predicted pathogenicity are used for filtering and prioritization. We propose that these differences account for much of the discrepancy in performance between that reported by them (15-26% diagnoses ranked top by Exomiser) and previously published reports by us and others (72-77%). On a set of 161 singleton samples, we show that using these settings increases performance from 34% to 72% and suggest that a reassessment of Exomiser and PhenIX on their datasets using these settings would show a similar uplift.

Subjects

Genetic Diseases, Inborn , Phenotype , Computational Biology , Humans

12.

Term-BLAST-like alignment tool for concept recognition in noisy clinical texts.

Groza, Tudor; Wu, Honghan; Dinger, Marcel E; Danis, Daniel; Hilton, Coleman; Bagley, Anita; Davids, Jon R; Luo, Ling; Lu, Zhiyong; Robinson, Peter N.

Bioinformatics ; 39(12)2023 12 01.

Article in English | MEDLINE | ID: mdl-38001031

ABSTRACT

MOTIVATION: Methods for concept recognition (CR) in clinical texts have largely been tested on abstracts or articles from the medical literature. However, texts from electronic health records (EHRs) frequently contain spelling errors, abbreviations, and other nonstandard ways of representing clinical concepts. RESULTS: Here, we present a method inspired by the BLAST algorithm for biosequence alignment that screens texts for potential matches on the basis of matching k-mer counts and scores candidates based on conformance to typical patterns of spelling errors derived from 2.9 million clinical notes. Our method, the Term-BLAST-like alignment tool (TBLAT), leverages a gold standard corpus for typographical errors to implement a sequence alignment-inspired method for efficient entity linkage. We present a comprehensive experimental comparison of TBLAT with five widely used tools. Experimental results show an increase of 10% in recall on scientific publications and a 20% increase in recall on EHRs (when compared against the next best method), hence supporting a significant enhancement of the entity linking task. The method can be used stand-alone or as a complement to existing approaches. AVAILABILITY AND IMPLEMENTATION: Fenominal is a Java library that implements TBLAT for named CR of Human Phenotype Ontology terms and is available at https://github.com/monarch-initiative/fenominal under the GNU General Public License v3.0.
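The k-mer screening step mentioned in this abstract can be sketched as follows. This is a simplified illustration, not the TBLAT or Fenominal code: the function names, the choice of k = 3, and the threshold are assumptions. The second stage, which scores candidates against learned spelling-error patterns, is not shown.

```python
from collections import Counter


def kmer_counts(text, k=3):
    """Count every overlapping substring of length k in a lower-cased string."""
    text = text.lower()
    return Counter(text[i:i + k] for i in range(len(text) - k + 1))


def shared_kmers(query, term, k=3):
    """Number of k-mers the two strings have in common (multiset intersection)."""
    return sum((kmer_counts(query, k) & kmer_counts(term, k)).values())


def screen(query, terms, k=3, min_shared=2):
    """Keep terms sharing at least min_shared k-mers with the query, best first."""
    scored = [(term, shared_kmers(query, term, k)) for term in terms]
    kept = [pair for pair in scored if pair[1] >= min_shared]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# A misspelled mention screened against a few candidate term labels.
candidates = ["scoliosis", "kyphosis", "seizure"]
print(screen("scoliossis", candidates))
```

As in BLAST, the cheap k-mer filter discards most dictionary entries, so that a more expensive scoring step only runs on a short list of plausible matches.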

Subjects

Algorithms , Language , Humans , Sequence Alignment , Electronic Health Records , Publications

13.

An expectation-maximization framework for comprehensive prediction of isoform-specific functions.

Karlebach, Guy; Carmody, Leigh; Sundaramurthi, Jagadish Chandrabose; Casiraghi, Elena; Hansen, Peter; Reese, Justin; Mungall, Christopher J; Valentini, Giorgio; Robinson, Peter N.

Bioinformatics ; 39(4)2023 04 03.

Article in English | MEDLINE | ID: mdl-36929917

ABSTRACT

MOTIVATION: Advances in RNA sequencing technologies have achieved an unprecedented accuracy in the quantification of mRNA isoforms, but our knowledge of isoform-specific functions has lagged behind. There is a need to understand the functional consequences of differential splicing, which could be supported by the generation of accurate and comprehensive isoform-specific gene ontology annotations. RESULTS: We present isoform interpretation, a method that uses expectation-maximization to infer isoform-specific functions based on the relationship between sequence and functional isoform similarity. We predicted isoform-specific functional annotations for 85 617 isoforms of 17 900 protein-coding human genes spanning a range of 17 430 distinct gene ontology terms. Comparison with a gold-standard corpus of manually annotated human isoform functions showed that isoform interpretation significantly outperforms state-of-the-art competing methods. We provide experimental evidence that functionally related isoforms predicted by isoform interpretation show a higher degree of domain sharing and expression correlation than functionally related genes. We also show that isoform sequence similarity correlates better with inferred isoform function than with gene-level function. AVAILABILITY AND IMPLEMENTATION: Source code, documentation, and resource files are freely available under a GNU3 license at https://github.com/TheJacksonLaboratory/isopretEM and https://zenodo.org/record/7594321.

Subjects

Motivation , Software , Humans , Protein Isoforms/genetics , Alternative Splicing , Sequence Analysis, RNA

14.

KG-Hub - building and exchanging biological knowledge graphs.

Caufield, J Harry; Putman, Tim; Schaper, Kevin; Unni, Deepak R; Hegde, Harshad; Callahan, Tiffany J; Cappelletti, Luca; Moxon, Sierra A T; Ravanmehr, Vida; Carbon, Seth; Chan, Lauren E; Cortes, Katherina; Shefchek, Kent A; Elsarboukh, Glass; Balhoff, Jim; Fontana, Tommaso; Matentzoglu, Nicolas; Bruskiewich, Richard M; Thessen, Anne E; Harris, Nomi L; Munoz-Torres, Monica C; Haendel, Melissa A; Robinson, Peter N; Joachimiak, Marcin P; Mungall, Christopher J; Reese, Justin T.

Bioinformatics ; 39(7)2023 07 01.

Article in English | MEDLINE | ID: mdl-37389415

ABSTRACT

MOTIVATION: Knowledge graphs (KGs) are a powerful approach for integrating heterogeneous data and making inferences in biology and many other domains, but a coherent solution for constructing, exchanging, and facilitating the downstream use of KGs is lacking. RESULTS: Here we present KG-Hub, a platform that enables standardized construction, exchange, and reuse of KGs. Features include a simple, modular extract-transform-load pattern for producing graphs compliant with Biolink Model (a high-level data model for standardizing biological data), easy integration of any OBO (Open Biological and Biomedical Ontologies) ontology, cached downloads of upstream data sources, versioned and automatically updated builds with stable URLs, web-browsable storage of KG artifacts on cloud infrastructure, and easy reuse of transformed subgraphs across projects. Current KG-Hub projects span use cases including COVID-19 research, drug repurposing, microbial-environmental interactions, and rare disease research. KG-Hub is equipped with tooling to easily analyze and manipulate KGs. KG-Hub is also tightly integrated with graph machine learning (ML) tools which allow automated graph ML, including node embeddings and training of models for link prediction and node classification. AVAILABILITY AND IMPLEMENTATION: https://kghub.org.

Subjects

Biological Ontologies , COVID-19 , Humans , Pattern Recognition, Automated , Rare Diseases , Machine Learning

15.

Lethal phenotypes in Mendelian disorders.

Cacheiro, Pilar; Lawson, Samantha; Van den Veyver, Ignatia B; Marengo, Gabriel; Zocche, David; Murray, Stephen A; Duyzend, Michael; Robinson, Peter N; Smedley, Damian.

Genet Med ; 26(7): 101141, 2024 Apr 15.

Article in English | MEDLINE | ID: mdl-38629401

ABSTRACT

PURPOSE: Existing resources that characterize the essentiality status of genes are based on either proliferation assessment in human cell lines, viability evaluation in mouse knockouts, or constraint metrics derived from human population sequencing studies. Several repositories document phenotypic annotations for rare disorders; however, there is a lack of comprehensive reporting on lethal phenotypes. METHODS: We queried Online Mendelian Inheritance in Man for terms related to lethality and classified all Mendelian genes according to the earliest age of death recorded for the associated disorders, from prenatal death to no reports of premature death. We characterized the genes across these lethality categories, examined the evidence on viability from mouse models and explored how this information could be used for novel gene discovery. RESULTS: We developed the Lethal Phenotypes Portal to showcase this curated catalog of human essential genes. Differences in the mode of inheritance, physiological systems affected, and disease class were found for genes in different lethality categories, as well as discrepancies between the lethal phenotypes observed in mouse and human. CONCLUSION: We anticipate that this resource will aid clinicians in the diagnosis of early lethal conditions and assist researchers in investigating the properties that make these genes essential for human development.

16.

Toward robust clinical genome interpretation: Developing a consistent terminology to characterize Mendelian disease-gene relationships - allelic requirement, inheritance modes, and disease mechanisms.

Roberts, Angharad M; DiStefano, Marina T; Riggs, Erin Rooney; Josephs, Katherine S; Alkuraya, Fowzan S; Amberger, Joanna; Amin, Mutaz; Berg, Jonathan S; Cunningham, Fiona; Eilbeck, Karen; Firth, Helen V; Foreman, Julia; Hamosh, Ada; Hay, Eleanor; Leigh, Sarah; Martin, Christa L; McDonagh, Ellen M; Perrett, Daniel; Ramos, Erin M; Robinson, Peter N; Rath, Ana; Sant, David W; Stark, Zornitza; Whiffin, Nicola; Rehm, Heidi L; Ware, James S.

Genet Med ; 26(2): 101029, 2024 Feb.

Article in English | MEDLINE | ID: mdl-37982373

ABSTRACT

PURPOSE: The terminology used for gene-disease curation and variant annotation to describe inheritance, allelic requirement, and both sequence and functional consequences of a variant is currently not standardized. There is considerable discrepancy in the literature and across clinical variant reporting in the derivation and application of terms. Here, we standardize the terminology for the characterization of disease-gene relationships to facilitate harmonized global curation and to support variant classification within the ACMG/AMP framework. METHODS: Terminology for inheritance, allelic requirement, and both structural and functional consequences of a variant used by Gene Curation Coalition members and partner organizations was collated and reviewed. Harmonized terminology with definitions and use examples was created, reviewed, and validated. RESULTS: We present a standardized terminology to describe gene-disease relationships, and to support variant annotation. We demonstrate application of the terminology for classification of variation in the ACMG SF 2.0 genes recommended for reporting of secondary findings. Consensus terms were agreed and formalized in both Sequence Ontology (SO) and Human Phenotype Ontology (HPO) ontologies. Gene Curation Coalition member groups intend to use or map to these terms in their respective resources. CONCLUSION: The terminology standardization presented here will improve harmonization, facilitate the pooling of curation datasets across international curation efforts and, in turn, improve consistency in variant classification and genetic test interpretation.

Subjects

Genetic Testing , Genetic Variation , Humans , Alleles , Databases, Genetic

17.

CalScope: methodology and lessons learned for conducting a remote statewide SARS-CoV-2 seroprevalence study in California using an at-home dried blood spot collection kit and online survey.

Lim, Esther; Mehrotra, Megha L; Lamba, Katherine; Kamali, Amanda; Lai, Kristina W; Meza, Erika; Bertsch-Merbach, Stephanie; Szeto, Irvin; Ley, Catherine; Martin, Andrew B; Parsonnet, Julie; Robinson, Peter; Gebhart, David; Fonseca, Noemi; Tsai, Cheng-Ting; Seftel, David; Nicolici, Allyx; Melton, David; Jain, Seema.

BMC Med Res Methodol ; 24(1): 120, 2024 May 27.

Article in English | MEDLINE | ID: mdl-38802749

ABSTRACT

BACKGROUND: To describe the methodology for conducting the CalScope study, a remote, population-based survey launched by the California Department of Public Health (CDPH) to estimate SARS-CoV-2 seroprevalence and understand COVID-19 disease burden in California. METHODS: Between April 2021 and August 2022, 666,857 randomly selected households were invited by mail to complete an online survey and at-home test kit for up to one adult and one child. A gift card was given for each completed survey and test kit. Multiple customized REDCap databases were used to create a data system which provided task automation and scalable data management through API integrations. Support infrastructure was developed to manage follow-up for participant questions and a communications plan was used for outreach through local partners. RESULTS: Across 3 waves, 32,671 out of 666,857 (4.9%) households registered, 6.3% by phone using an interactive voice response (IVR) system and 95.7% in English. Overall, 25,488 (78.0%) households completed surveys, while 23,396 (71.6%) households returned blood samples for testing. Support requests (n = 5,807) received through the web-based form (36.3%), by email (34.1%), and voicemail (29.7%) were mostly concerned with the test kit (31.6%), test result (26.8%), and gift card (21.3%). CONCLUSIONS: Ensuring a well-integrated and scalable data system, responsive support infrastructure for participant follow-up, and appropriate academic and local health department partnerships for study management and communication allowed for successful rollout of a large population-based survey. Remote data collection utilizing online surveys and at-home test kits can complement routine surveillance data for a state health department.
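The response figures quoted in this abstract can be rechecked with a few lines of arithmetic. The sketch below only recomputes the percentages from the counts stated above; the variable and function names are illustrative and do not come from the study.

```python
# Counts taken directly from the abstract above.
invited_households = 666_857
registered_households = 32_671
completed_surveys = 25_488
returned_blood_samples = 23_396


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Registration is expressed relative to all invited households.
registration_rate = percent(registered_households, invited_households)

# Completion and sample return are expressed relative to registered households.
survey_completion_rate = percent(completed_surveys, registered_households)
sample_return_rate = percent(returned_blood_samples, registered_households)

print(registration_rate, survey_completion_rate, sample_return_rate)
```

The three values agree with the 4.9%, 78.0%, and 71.6% reported in the abstract, which confirms that the completion and sample-return percentages use registered households, not invited households, as the denominator.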

Subjects

COVID-19 , Dried Blood Spot Testing , SARS-CoV-2 , Humans , COVID-19/epidemiology , COVID-19/diagnosis , Seroepidemiologic Studies , California/epidemiology , SARS-CoV-2/immunology , Dried Blood Spot Testing/methods , Dried Blood Spot Testing/statistics & numerical data , Adult , Surveys and Questionnaires , Male , Female , Child , Middle Aged , Adolescent

18.

Improving prenatal diagnosis through standards and aggregation.

Duyzend, Michael H; Cacheiro, Pilar; Jacobsen, Julius O B; Giordano, Jessica; Brand, Harrison; Wapner, Ronald J; Talkowski, Michael E; Robinson, Peter N; Smedley, Damian.

Prenat Diagn ; 44(4): 454-464, 2024 Apr.

Article in English | MEDLINE | ID: mdl-38242839

ABSTRACT

Advances in sequencing and imaging technologies enable enhanced assessment in the prenatal space, with a goal to diagnose and predict the natural history of disease, to direct targeted therapies, and to implement clinical management, including transfer of care, election of supportive care, and selection of surgical interventions. The current lack of standardization and aggregation stymies variant interpretation and gene discovery, which hinders the provision of prenatal precision medicine, leaving clinicians and patients without an accurate diagnosis. With large amounts of data generated, it is imperative to establish standards for data collection, processing, and aggregation. Aggregated and homogeneously processed genetic and phenotypic data permits dissection of the genomic architecture of prenatal presentations of disease and provides a dataset on which data analysis algorithms can be tuned to the prenatal space. Here we discuss the importance of generating aggregate data sets and how the prenatal space is driving the development of interoperable standards and phenotype-driven tools.

Subjects

Precision Medicine , Prenatal Diagnosis , Pregnancy , Female , Humans , Phenotype , Genomics , Algorithms

19.

FABIAN-variant: predicting the effects of DNA variants on transcription factor binding.

Steinhaus, Robin; Robinson, Peter N; Seelow, Dominik.

Nucleic Acids Res ; 50(W1): W322-W329, 2022 07 05.

Article in English | MEDLINE | ID: mdl-35639768

ABSTRACT

While great advances in predicting the effects of coding variants have been made, the assessment of non-coding variants remains challenging. This is especially problematic for variants within promoter regions which can lead to over-expression of a gene or reduce or even abolish its expression. The binding of transcription factors to the DNA can be predicted using position weight matrices (PWMs). More recently, transcription factor flexible models (TFFMs) have been introduced and shown to be more accurate than PWMs. TFFMs are based on hidden Markov models and can account for complex positional dependencies. Our new web-based application FABIAN-variant uses 1224 TFFMs and 3790 PWMs to predict whether and to which degree DNA variants affect the binding of 1387 different human transcription factors. For each variant and transcription factor, the software combines the results of different models for a final prediction of the resulting binding-affinity change. The software is written in C++ for speed but variants can be entered through a web interface. Alternatively, a VCF file can be uploaded to assess variants identified by high-throughput sequencing. The search can be restricted to variants in the vicinity of candidate genes. FABIAN-variant is available freely at https://www.genecascade.org/fabian/.
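To illustrate the general idea of PWM-based scoring mentioned in this abstract, here is a minimal sketch in Python. It is not FABIAN-variant's implementation (which, per the abstract, is written in C++ and combines TFFMs and PWMs); the count matrix, the pseudocount, the uniform background frequency, and all function names are hypothetical values chosen for illustration.

```python
import math

# Hypothetical nucleotide counts for a 4-bp binding motif (one dict per position).
COUNTS = [
    {"A": 8, "C": 1, "G": 1, "T": 0},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 1, "C": 7, "G": 1, "T": 1},
    {"A": 0, "C": 0, "G": 0, "T": 10},
]
BACKGROUND = 0.25   # assumed uniform background frequency per base
PSEUDOCOUNT = 1.0   # assumed pseudocount to avoid log(0)


def build_pwm(counts, pseudocount=PSEUDOCOUNT, background=BACKGROUND):
    """Convert counts into log2 odds scores relative to the background."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column[base] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def score(pwm, site):
    """Sum the per-position log-odds scores for a site of motif length."""
    if len(site) != len(pwm):
        raise ValueError("site length must equal motif length")
    return sum(column[base] for column, base in zip(pwm, site.upper()))


def delta_score(pwm, ref_site, alt_site):
    """Score change caused by a variant: negative means weaker predicted binding."""
    return score(pwm, alt_site) - score(pwm, ref_site)


pwm = build_pwm(COUNTS)
change = delta_score(pwm, "AGCT", "ATCT")  # G>T at the second motif position
print(f"score change: {change:.2f}")
```

In this toy matrix the G>T substitution hits a strongly conserved position, so the score drops. A plain PWM treats positions as independent, which is the limitation the abstract says TFFMs address by modelling positional dependencies.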

Subjects

DNA-Binding Proteins , DNA , Genetic Variation , Software , Transcription Factors , Humans , Binding Sites/genetics , DNA/genetics , DNA/metabolism , Position-Specific Scoring Matrices , Protein Binding , Transcription Factors/genetics , Transcription Factors/metabolism , Genetic Variation/genetics , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism , Internet , Programming Languages

20.

Deep phenotyping: symptom annotation made simple with SAMS.

Steinhaus, Robin; Proft, Sebastian; Seelow, Evelyn; Schalau, Tobias; Robinson, Peter N; Seelow, Dominik.

Nucleic Acids Res ; 50(W1): W677-W681, 2022 07 05.

Article in English | MEDLINE | ID: mdl-35524573

ABSTRACT

Precision medicine needs precise phenotypes. The Human Phenotype Ontology (HPO) uses clinical signs instead of diagnoses and has become the standard annotation for patients' phenotypes when describing single gene disorders. Use of the HPO beyond human genetics is however still limited. With SAMS (Symptom Annotation Made Simple), we want to bring sign-based phenotyping to routine clinical care, to hospital patients as well as to outpatients. Our web-based application provides access to three widely used annotation systems: HPO, OMIM, Orphanet. Whilst data can be stored in our database, phenotypes can also be imported and exported as Global Alliance for Genomics and Health (GA4GH) Phenopackets without using the database. The web interface can easily be integrated into local databases, e.g. clinical information systems. SAMS offers users to share their data with others, empowering patients to record their own signs and symptoms (or those of their children) and thus provide their doctors with additional information. We think that our approach will lead to better characterised patients which is not only helpful for finding disease mutations but also to better understand the pathophysiology of diseases and to recruit patients for studies and clinical trials. SAMS is freely available at https://www.genecascade.org/SAMS/.
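As a rough illustration of what an HPO-annotated GA4GH Phenopacket looks like, the sketch below assembles a minimal record as JSON. It is not SAMS output: the field names follow the Phenopacket schema version 2 as I understand it, while the identifiers, timestamp, creator, and ontology version are placeholder values invented for the example.

```python
import json

# Minimal Phenopacket-style record; all values below are placeholders.
phenopacket = {
    "id": "example-phenopacket-1",            # hypothetical record identifier
    "subject": {"id": "example-patient-1"},   # hypothetical patient identifier
    "phenotypicFeatures": [
        # Each clinical sign is recorded as an HPO term (id plus label).
        {"type": {"id": "HP:0001250", "label": "Seizure"}},
    ],
    "metaData": {
        "created": "2022-07-05T00:00:00Z",    # placeholder timestamp
        "createdBy": "example-user",          # placeholder creator
        "resources": [
            {
                "id": "hp",
                "name": "human phenotype ontology",
                "url": "http://purl.obolibrary.org/obo/hp.owl",
                "version": "unspecified",     # placeholder, not a real release tag
                "namespacePrefix": "HP",
                "iriPrefix": "http://purl.obolibrary.org/obo/HP_",
            }
        ],
        "phenopacketSchemaVersion": "2.0",
    },
}

serialized = json.dumps(phenopacket, indent=2)
print(serialized)
```

Recording signs as ontology terms rather than free text is what makes such records comparable across tools and sites. A real export should be validated against the official schema instead of relying on this hand-built dictionary.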

Subjects

Databases, Genetic , Software , Child , Humans , Genomics , Phenotype , Mutation
