1.

The Monarch Initiative in 2024: an analytic platform integrating phenotypes, genes and diseases across species.

Putman, Tim E; Schaper, Kevin; Matentzoglu, Nicolas; Rubinetti, Vincent P; Alquaddoomi, Faisal S; Cox, Corey; Caufield, J Harry; Elsarboukh, Glass; Gehrke, Sarah; Hegde, Harshad; Reese, Justin T; Braun, Ian; Bruskiewich, Richard M; Cappelletti, Luca; Carbon, Seth; Caron, Anita R; Chan, Lauren E; Chute, Christopher G; Cortes, Katherina G; De Souza, Vinícius; Fontana, Tommaso; Harris, Nomi L; Hartley, Emily L; Hurwitz, Eric; Jacobsen, Julius O B; Krishnamurthy, Madan; Laraway, Bryan J; McLaughlin, James A; McMurry, Julie A; Moxon, Sierra A T; Mullen, Kathleen R; O'Neil, Shawn T; Shefchek, Kent A; Stefancsik, Ray; Toro, Sabrina; Vasilevsky, Nicole A; Walls, Ramona L; Whetzel, Patricia L; Osumi-Sutherland, David; Smedley, Damian; Robinson, Peter N; Mungall, Christopher J; Haendel, Melissa A; Munoz-Torres, Monica C.

Nucleic Acids Res ; 52(D1): D938-D949, 2024 Jan 05.

Article in English | MEDLINE | ID: mdl-38000386

ABSTRACT

Bridging the gap between genetic variations, environmental determinants, and phenotypic outcomes is critical for supporting clinical diagnosis and understanding mechanisms of diseases. It requires integrating open data at a global scale. The Monarch Initiative advances these goals by developing open ontologies, semantic data models, and knowledge graphs for translational research. The Monarch App is an integrated platform combining data about genes, phenotypes, and diseases across species. Monarch's APIs enable access to carefully curated datasets and advanced analysis tools that support the understanding and diagnosis of disease for diverse applications such as variant prioritization, deep phenotyping, and patient profile-matching. We have migrated our system into a scalable, cloud-based infrastructure; simplified Monarch's data ingestion and knowledge graph integration systems; enhanced data mapping and integration standards; and developed a new user interface with novel search and graph navigation features. Furthermore, we advanced Monarch's analytic tools by developing a customized plugin for OpenAI's ChatGPT to increase the reliability of its responses about phenotypic data, allowing us to interrogate the knowledge in the Monarch graph using state-of-the-art Large Language Models. The resources of the Monarch Initiative can be found at monarchinitiative.org and its corresponding code repository at github.com/monarch-initiative/monarch-app.

Subject(s)

Databases, Factual , Disease , Genes , Phenotype , Humans , Internet , Databases, Factual/standards , Software , Genes/genetics , Disease/genetics

2.

Interpretable prioritization of splice variants in diagnostic next-generation sequencing.

Danis, Daniel; Jacobsen, Julius O B; Carmody, Leigh C; Gargano, Michael A; McMurry, Julie A; Hegde, Ayushi; Haendel, Melissa A; Valentini, Giorgio; Smedley, Damian; Robinson, Peter N.

Am J Hum Genet ; 108(9): 1564-1577, 2021 09 02.

Article in English | MEDLINE | ID: mdl-34289339

ABSTRACT

A critical challenge in genetic diagnostics is the computational assessment of candidate splice variants, specifically the interpretation of nucleotide changes located outside of the highly conserved dinucleotide sequences at the 5' and 3' ends of introns. To address this gap, we developed the Super Quick Information-content Random-forest Learning of Splice variants (SQUIRLS) algorithm. SQUIRLS generates a small set of interpretable features for machine learning by calculating the information content of wild-type and variant sequences of canonical and cryptic splice sites, assessing changes in candidate splicing regulatory sequences, and incorporating characteristics of the sequence such as exon length, disruptions of the AG exclusion zone, and conservation. We curated a comprehensive collection of disease-associated splice-altering variants at positions outside of the highly conserved AG/GT dinucleotides at the termini of introns. SQUIRLS trains two random-forest classifiers, one for the donor site and one for the acceptor site, and combines their outputs by logistic regression to yield a final score. We show that SQUIRLS exceeds the accuracy of previous state-of-the-art methods in classifying splice variants, as assessed by rank analysis in simulated exomes, and is significantly faster than competing methods. SQUIRLS provides tabular output files for incorporation into diagnostic pipelines for exome and genome analysis, as well as visualizations that contextualize the predicted effects of variants on splicing, making splice variants easier to interpret in diagnostic settings.

Subject(s)

Algorithms , Data Curation/methods , Genetic Diseases, Inborn/genetics , RNA Splice Sites , RNA Splicing , Software , Base Sequence , Computational Biology/methods , Exome , Exons , Genetic Diseases, Inborn/diagnosis , Genetic Diseases, Inborn/pathology , High-Throughput Nucleotide Sequencing , Humans , Introns , Mutation , Exome Sequencing
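The information-content features named in the SQUIRLS abstract can be illustrated with Schneider's individual information measure, R_i, which scores a sequence against a position frequency matrix; the drop in R_i from the wild-type to the variant sequence is one natural splice-site feature. This is a minimal sketch that assumes that measure, omits the small-sample correction, and uses a made-up three-position matrix (`TOY_PFM`, hypothetical) instead of real donor or acceptor matrices; it is not the SQUIRLS implementation.

```python
import math

# Hypothetical 3-position frequency matrix (one dict per position; each sums to 1).
# Real splice-site matrices are longer and estimated from annotated introns.
TOY_PFM = [
    {"A": 0.5, "C": 0.125, "G": 0.25, "T": 0.125},
    {"A": 0.25, "C": 0.125, "G": 0.5, "T": 0.125},
    {"A": 0.125, "C": 0.25, "G": 0.125, "T": 0.5},
]


def individual_information(seq: str, pfm: list[dict[str, float]]) -> float:
    """Schneider-style R_i in bits: sum over positions of 2 + log2 f(base, position).

    A zero frequency would make log2 fail; real matrices use pseudocounts.
    """
    if len(seq) != len(pfm):
        raise ValueError("sequence length must match the matrix length")
    return sum(2.0 + math.log2(column[base]) for base, column in zip(seq.upper(), pfm))


def delta_ri(ref: str, alt: str, pfm: list[dict[str, float]]) -> float:
    """Information lost through a variant (positive means the site is weakened)."""
    return individual_information(ref, pfm) - individual_information(alt, pfm)


print(individual_information("AGT", TOY_PFM))  # 3.0
print(delta_ri("AGT", "AGC", TOY_PFM))         # 1.0
```

In practice such a score would be computed for every candidate donor or acceptor window around a variant; the paper describes the actual feature set and matrices SQUIRLS uses.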

3.

Interpretable Clinical Genomics with a Likelihood Ratio Paradigm.

Robinson, Peter N; Ravanmehr, Vida; Jacobsen, Julius O B; Danis, Daniel; Zhang, Xingmin Aaron; Carmody, Leigh C; Gargano, Michael A; Thaxton, Courtney L; Karlebach, Guy; Reese, Justin; Holtgrewe, Manuel; Köhler, Sebastian; McMurry, Julie A; Haendel, Melissa A; Smedley, Damian.

Am J Hum Genet ; 107(3): 403-417, 2020 09 03.

Article in English | MEDLINE | ID: mdl-32755546

ABSTRACT

Human Phenotype Ontology (HPO)-based analysis has become standard for genomic diagnostics of rare diseases. Current algorithms use a variety of semantic and statistical approaches to prioritize the typically long lists of genes with candidate pathogenic variants. These algorithms do not provide robust estimates of the strength of the predictions beyond the placement in a ranked list, nor do they provide measures of how much any individual phenotypic observation has contributed to the prioritization result. However, given that the overall success rate of genomic diagnostics is only around 25%-50% or less in many cohorts, a good ranking cannot be taken to imply that the gene or disease at rank one is necessarily a good candidate. Here, we present an approach to genomic diagnostics that exploits the likelihood ratio (LR) framework to provide an estimate of (1) the posttest probability of candidate diagnoses, (2) the LR for each observed HPO phenotype, and (3) the predicted pathogenicity of observed genotypes. LIkelihood Ratio Interpretation of Clinical AbnormaLities (LIRICAL) placed the correct diagnosis within the first three ranks in 92.9% of 384 case reports comprising 262 Mendelian diseases, and the correct diagnosis had a mean posttest probability of 67.3%. Simulations show that LIRICAL is robust to many typically encountered forms of genomic and phenomic noise. In summary, LIRICAL provides accurate, clinically interpretable results for phenotype-driven genomic diagnostics.

Subject(s)

Computational Biology , Databases, Genetic , Genomics , Rare Diseases/diagnosis , Algorithms , Exome/genetics , Humans , Phenotype , Rare Diseases/genetics , Software
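The likelihood ratio framework in the LIRICAL abstract follows the odds form of Bayes' theorem: the pretest probability is converted to odds, multiplied by the likelihood ratios of the observed findings, and converted back to a posttest probability. This sketch assumes the findings are conditionally independent, so their LRs simply multiply; the function name and the pretest probability and LR values in the example are hypothetical, not LIRICAL's API or data.

```python
import math
from typing import Iterable


def posttest_probability(pretest: float, likelihood_ratios: Iterable[float]) -> float:
    """Combine a pretest probability with per-finding likelihood ratios.

    Assumes conditional independence, so the composite LR is the product of
    the individual LRs (e.g. one per HPO term plus one for the genotype).
    """
    if not 0.0 < pretest < 1.0:
        raise ValueError("pretest probability must be strictly between 0 and 1")
    pretest_odds = pretest / (1.0 - pretest)
    posttest_odds = pretest_odds * math.prod(likelihood_ratios)
    return posttest_odds / (1.0 + posttest_odds)


# Hypothetical example: three phenotype LRs and one genotype LR (product = 80).
print(round(posttest_probability(0.01, [4.0, 2.5, 0.8, 10.0]), 3))  # 0.447
```

An LR below 1 (here 0.8, e.g. for an expected feature that was not observed) lowers the posttest probability, which is what makes each finding's contribution individually interpretable.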

4.

Coding long COVID: characterizing a new disease through an ICD-10 lens.

Pfaff, Emily R; Madlock-Brown, Charisse; Baratta, John M; Bhatia, Abhishek; Davis, Hannah; Girvin, Andrew; Hill, Elaine; Kelly, Elizabeth; Kostka, Kristin; Loomba, Johanna; McMurry, Julie A; Wong, Rachel; Bennett, Tellen D; Moffitt, Richard; Chute, Christopher G; Haendel, Melissa.

BMC Med ; 21(1): 58, 2023 02 16.

Article in English | MEDLINE | ID: mdl-36793086

ABSTRACT

BACKGROUND: Naming a newly discovered disease is a difficult process; in the context of the COVID-19 pandemic and the existence of post-acute sequelae of SARS-CoV-2 infection (PASC), which includes long COVID, it has proven especially challenging. Disease definitions and assignment of a diagnosis code are often asynchronous and iterative. The clinical definition and our understanding of the underlying mechanisms of long COVID are still in flux, and the deployment of an ICD-10-CM code for long COVID in the USA took nearly 2 years after patients had begun to describe their condition. Here, we leverage the largest publicly available HIPAA-limited dataset about patients with COVID-19 in the US, from the National COVID Cohort Collaborative (N3C), to examine the heterogeneity of adoption and use of U09.9, the ICD-10-CM code for "Post COVID-19 condition, unspecified."

METHODS: We undertook a number of analyses to characterize the N3C population with a U09.9 diagnosis code (n = 33,782): assessing person-level demographics and a number of area-level social determinants of health; identifying diagnoses commonly co-occurring with U09.9 and clustering them using the Louvain algorithm; and quantifying medications and procedures recorded within 60 days of U09.9 diagnosis. We stratified all analyses by age group in order to discern differing patterns of care across the lifespan.

RESULTS: We established the diagnoses most commonly co-occurring with U09.9 and algorithmically clustered them into four major categories: cardiopulmonary, neurological, gastrointestinal, and comorbid conditions. Importantly, we discovered that the population of patients diagnosed with U09.9 is demographically skewed toward female, White, non-Hispanic individuals, as well as individuals living in areas with low poverty and low unemployment. Our results also include a characterization of common procedures and medications associated with U09.9-coded patients.

CONCLUSIONS: This work offers insight into potential subtypes and current practice patterns around long COVID and speaks to the existence of disparities in the diagnosis of patients with long COVID. This latter finding in particular requires further research and urgent remediation.

Subject(s)

COVID-19 , Post-Acute COVID-19 Syndrome , Humans , Female , International Classification of Diseases , Pandemics , COVID-19/diagnosis , COVID-19/epidemiology , SARS-CoV-2

5.

The Ontology of Biological Attributes (OBA): computational traits for the life sciences.

Stefancsik, Ray; Balhoff, James P; Balk, Meghan A; Ball, Robyn L; Bello, Susan M; Caron, Anita R; Chesler, Elissa J; de Souza, Vinicius; Gehrke, Sarah; Haendel, Melissa; Harris, Laura W; Harris, Nomi L; Ibrahim, Arwa; Koehler, Sebastian; Matentzoglu, Nicolas; McMurry, Julie A; Mungall, Christopher J; Munoz-Torres, Monica C; Putman, Tim; Robinson, Peter; Smedley, Damian; Sollis, Elliot; Thessen, Anne E; Vasilevsky, Nicole; Walton, David O; Osumi-Sutherland, David.

Mamm Genome ; 34(3): 364-378, 2023 09.

Article in English | MEDLINE | ID: mdl-37076585

ABSTRACT

Existing phenotype ontologies were originally developed to represent phenotypes that manifest as a character state in relation to a wild-type or other reference. However, these do not include the phenotypic trait or attribute categories required for the annotation of genome-wide association studies (GWAS), Quantitative Trait Loci (QTL) mappings or any population-focussed measurable trait data. The integration of trait and biological attribute information with an ever-increasing body of chemical, environmental and biological data greatly facilitates computational analyses and is also highly relevant to biomedical and clinical applications. The Ontology of Biological Attributes (OBA) is a formalised, species-independent collection of interoperable phenotypic trait categories that is intended to fulfil a data integration role. OBA is a standardised representational framework for observable attributes that are characteristics of biological entities, organisms, or parts of organisms. OBA has a modular design which provides several benefits for users and data integrators, including an automated and meaningful classification of trait terms computed on the basis of logical inferences drawn from domain-specific ontologies for cells, anatomical and other relevant entities. The logical axioms in OBA also provide a previously missing bridge that can computationally link Mendelian phenotypes with GWAS and quantitative traits. The term components in OBA provide semantic links and enable knowledge and data integration across specialised research community boundaries, thereby breaking silos.

Subject(s)

Biological Ontologies , Biological Science Disciplines , Genome-Wide Association Study , Phenotype

6.

The Human Phenotype Ontology in 2021.

Köhler, Sebastian; Gargano, Michael; Matentzoglu, Nicolas; Carmody, Leigh C; Lewis-Smith, David; Vasilevsky, Nicole A; Danis, Daniel; Balagura, Ganna; Baynam, Gareth; Brower, Amy M; Callahan, Tiffany J; Chute, Christopher G; Est, Johanna L; Galer, Peter D; Ganesan, Shiva; Griese, Matthias; Haimel, Matthias; Pazmandi, Julia; Hanauer, Marc; Harris, Nomi L; Hartnett, Michael J; Hastreiter, Maximilian; Hauck, Fabian; He, Yongqun; Jeske, Tim; Kearney, Hugh; Kindle, Gerhard; Klein, Christoph; Knoflach, Katrin; Krause, Roland; Lagorce, David; McMurry, Julie A; Miller, Jillian A; Munoz-Torres, Monica C; Peters, Rebecca L; Rapp, Christina K; Rath, Ana M; Rind, Shahmir A; Rosenberg, Avi Z; Segal, Michael M; Seidel, Markus G; Smedley, Damian; Talmy, Tomer; Thomas, Yarlalu; Wiafe, Samuel A; Xian, Julie; Yüksel, Zafer; Helbig, Ingo; Mungall, Christopher J; Haendel, Melissa A.

Nucleic Acids Res ; 49(D1): D1207-D1217, 2021 01 08.

Article in English | MEDLINE | ID: mdl-33264411

ABSTRACT

The Human Phenotype Ontology (HPO, https://hpo.jax.org) was launched in 2008 to provide a comprehensive logical standard to describe and computationally analyze phenotypic abnormalities found in human disease. The HPO is now a worldwide standard for phenotype exchange. The HPO has grown steadily since its inception due to considerable contributions from clinical experts and researchers from a diverse range of disciplines. Here, we present recent major extensions of the HPO for neurology, nephrology, immunology, pulmonology, newborn screening, and other areas. For example, the seizure subontology now reflects the International League Against Epilepsy (ILAE) guidelines and these enhancements have already shown clinical validity. We present new efforts to harmonize computational definitions of phenotypic abnormalities across the HPO and multiple phenotype ontologies used for animal models of disease. These efforts will benefit software such as Exomiser by improving the accuracy and scope of cross-species phenotype matching. The computational modeling strategy used by the HPO to define disease entities and phenotypic features and distinguish between them is explained in detail. We also report on recent efforts to translate the HPO into indigenous languages. Finally, we summarize recent advances in the use of HPO in electronic health record systems.

Subject(s)

Biological Ontologies , Computational Biology/methods , Databases, Factual , Disease/genetics , Genome , Phenotype , Software , Animals , Disease Models, Animal , Genotype , Humans , Infant, Newborn , International Cooperation , Internet , Neonatal Screening/methods , Pharmacogenetics/methods , Terminology as Topic

7.

Risk factors associated with post-acute sequelae of SARS-CoV-2: an N3C and NIH RECOVER study.

Hill, Elaine L; Mehta, Hemalkumar B; Sharma, Suchetha; Mane, Klint; Singh, Sharad Kumar; Xie, Catherine; Cathey, Emily; Loomba, Johanna; Russell, Seth; Spratt, Heidi; DeWitt, Peter E; Ammar, Nariman; Madlock-Brown, Charisse; Brown, Donald; McMurry, Julie A; Chute, Christopher G; Haendel, Melissa A; Moffitt, Richard; Pfaff, Emily R; Bennett, Tellen D.

BMC Public Health ; 23(1): 2103, 2023 10 25.

Article in English | MEDLINE | ID: mdl-37880596

ABSTRACT

BACKGROUND: More than one-third of individuals experience post-acute sequelae of SARS-CoV-2 infection (PASC, which includes long-COVID). The objective of this study was to identify risk factors associated with PASC/long-COVID diagnosis.

METHODS: This was a retrospective case-control study including 31 health systems in the United States from the National COVID Cohort Collaborative (N3C). A total of 8,325 individuals with PASC (defined by the presence of the International Classification of Diseases, 10th revision, code U09.9 or a long-COVID clinic visit) were matched to 41,625 controls from the same health system whose COVID index date fell within ± 45 days of the corresponding case's earliest COVID index date. Measured risk factors included demographics, comorbidities, treatment, and acute characteristics related to COVID-19. Multivariable logistic regression, random forest, and XGBoost were used to determine the associations between risk factors and PASC.

RESULTS: Among 8,325 individuals with PASC, the majority were > 50 years of age (56.6%), female (62.8%), and non-Hispanic White (68.6%). In logistic regression, middle-age categories (40 to 69 years; OR ranging from 2.32 to 2.58), female sex (OR 1.4, 95% CI 1.33-1.48), hospitalization associated with COVID-19 (OR 3.8, 95% CI 3.05-4.73), long (8-30 days, OR 1.69, 95% CI 1.31-2.17) or extended hospital stay (30+ days, OR 3.38, 95% CI 2.45-4.67), receipt of mechanical ventilation (OR 1.44, 95% CI 1.18-1.74), and several comorbidities including depression (OR 1.50, 95% CI 1.40-1.60), chronic lung disease (OR 1.63, 95% CI 1.53-1.74), and obesity (OR 1.23, 95% CI 1.16-1.3) were associated with increased likelihood of PASC diagnosis or care at a long-COVID clinic. Characteristics associated with a lower likelihood of PASC diagnosis or care at a long-COVID clinic included younger age (18 to 29 years), male sex, non-Hispanic Black race, and comorbidities such as substance abuse, cardiomyopathy, psychosis, and dementia. More doctors per capita in the county of residence was associated with an increased likelihood of PASC diagnosis or care at a long-COVID clinic. Our findings were consistent in sensitivity analyses using a variety of analytic techniques and approaches to select controls.

CONCLUSIONS: This national study identified important risk factors for PASC diagnosis such as middle age, severe COVID-19 disease, and specific comorbidities. Further clinical and epidemiological research is needed to better understand underlying mechanisms and the potential role of vaccines and therapeutics in altering PASC course.

Subject(s)

COVID-19 , SARS-CoV-2 , Middle Aged , Female , Male , Humans , Adult , Aged , Adolescent , Young Adult , COVID-19/epidemiology , Post-Acute COVID-19 Syndrome , Case-Control Studies , Retrospective Studies , Risk Factors , Disease Progression
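The control-selection rule in this study (same health system, COVID index date within ± 45 days of the case's index date) can be expressed as a simple filter; the reported counts imply five controls per case (41,625 / 8,325 = 5). The abstract does not say how controls were chosen among the eligible candidates, so this sketch assumes, purely for illustration, that the closest index dates win; the `Patient` record and both functions are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Patient:
    # Hypothetical minimal record; N3C data use a much richer schema.
    person_id: str
    health_system: str
    index_date: date


def eligible_controls(case: Patient, pool: list[Patient], window_days: int = 45) -> list[Patient]:
    """Controls from the same health system whose index date is within +/- window_days."""
    return [
        p for p in pool
        if p.health_system == case.health_system
        and abs((p.index_date - case.index_date).days) <= window_days
    ]


def match_controls(case: Patient, pool: list[Patient], k: int = 5,
                   window_days: int = 45) -> list[Patient]:
    """Up to k eligible controls, closest index date first (ties broken by id)."""
    candidates = eligible_controls(case, pool, window_days)
    candidates.sort(key=lambda p: (abs((p.index_date - case.index_date).days), p.person_id))
    return candidates[:k]


case = Patient("case1", "HS-A", date(2021, 3, 1))
pool = [
    Patient("c1", "HS-A", date(2021, 3, 10)),  # 9 days away: eligible
    Patient("c2", "HS-A", date(2021, 5, 1)),   # 61 days away: outside window
    Patient("c3", "HS-B", date(2021, 3, 2)),   # different health system
    Patient("c4", "HS-A", date(2021, 2, 14)),  # 15 days away: eligible
]
print([p.person_id for p in match_controls(case, pool)])  # ['c1', 'c4']
```

A full matching pipeline would also stop the same control from being assigned to several cases; this sketch matches each case independently.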

8.

NSAID use and clinical outcomes in COVID-19 patients: a 38-center retrospective cohort study.

Reese, Justin T; Coleman, Ben; Chan, Lauren; Blau, Hannah; Callahan, Tiffany J; Cappelletti, Luca; Fontana, Tommaso; Bradwell, Katie R; Harris, Nomi L; Casiraghi, Elena; Valentini, Giorgio; Karlebach, Guy; Deer, Rachel; McMurry, Julie A; Haendel, Melissa A; Chute, Christopher G; Pfaff, Emily; Moffitt, Richard; Spratt, Heidi; Singh, Jasvinder A; Mungall, Christopher J; Williams, Andrew E; Robinson, Peter N.

Virol J ; 19(1): 84, 2022 05 15.

Article in English | MEDLINE | ID: mdl-35570298

ABSTRACT

BACKGROUND: Non-steroidal anti-inflammatory drugs (NSAIDs) are commonly used to reduce pain, fever, and inflammation but have been associated with complications in community-acquired pneumonia. Observations shortly after the start of the COVID-19 pandemic in 2020 suggested that ibuprofen was associated with an increased risk of adverse events in COVID-19 patients, but subsequent observational studies failed to demonstrate increased risk and in one case showed reduced risk associated with NSAID use.

METHODS: A 38-center retrospective cohort study was performed that leveraged the harmonized, high-granularity electronic health record data of the National COVID Cohort Collaborative. A propensity-matched cohort of 19,746 COVID-19 inpatients was constructed by matching cases (treated with NSAIDs at the time of admission) and 19,746 controls (not treated) from 857,061 patients with COVID-19 available for analysis. The primary outcome of interest was COVID-19 severity in hospitalized patients, which was classified as: moderate, severe, or mortality/hospice. Secondary outcomes were acute kidney injury (AKI), extracorporeal membrane oxygenation (ECMO), invasive ventilation, and all-cause mortality at any time following COVID-19 diagnosis.

RESULTS: Logistic regression showed that NSAID use was not associated with increased COVID-19 severity (OR 0.57, 95% CI 0.53-0.61). Analysis of secondary outcomes using logistic regression showed that NSAID use was not associated with increased risk of all-cause mortality (OR 0.51, 95% CI 0.47-0.56), invasive ventilation (OR 0.59, 95% CI 0.55-0.64), AKI (OR 0.67, 95% CI 0.63-0.72), or ECMO (OR 0.51, 95% CI 0.36-0.7). In contrast, the odds ratios indicate reduced risk of these outcomes, but our quantitative bias analysis showed E-values of between 1.9 and 3.3 for these associations, indicating that comparatively weak or moderate confounder associations could explain away the observed associations.

CONCLUSIONS: Study interpretation is limited by the observational design. Recording of NSAID use may have been incomplete. Our study demonstrates that NSAID use is not associated with increased COVID-19 severity, all-cause mortality, invasive ventilation, AKI, or ECMO in COVID-19 inpatients. A conservative interpretation in light of the quantitative bias analysis is that there is no evidence that NSAID use is associated with risk of increased severity or the other measured outcomes. Our results confirm and extend analogous findings in previous observational studies using a large cohort of patients drawn from 38 centers in a nationally representative multicenter database.

Subject(s)

Acute Kidney Injury , COVID-19 , Anti-Inflammatory Agents, Non-Steroidal/adverse effects , COVID-19 Testing , Cohort Studies , Humans , Pandemics , Retrospective Studies
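The E-value reported in this study (the measure introduced by VanderWeele and Ding) is the minimum strength of association, on the risk-ratio scale, that an unmeasured confounder would need with both treatment and outcome to fully explain away an observed association. For a risk ratio RR ≥ 1 it equals RR + sqrt(RR × (RR − 1)); a protective estimate is inverted first. This sketch additionally assumes the reported odds ratios can stand in for risk ratios (a rare-outcome approximation); the abstract does not state which conversion the authors applied.

```python
import math


def e_value(ratio: float) -> float:
    """E-value for a point estimate on the risk-ratio scale (VanderWeele & Ding).

    Estimates below 1 are inverted so protective and harmful associations are
    handled symmetrically. Passing an odds ratio here assumes OR ~ RR.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    rr = ratio if ratio >= 1 else 1.0 / ratio
    return rr + math.sqrt(rr * (rr - 1.0))


# Point-estimate odds ratios from the abstract (confidence limits not used).
for outcome, odds_ratio in [("severity", 0.57), ("all-cause mortality", 0.51),
                            ("invasive ventilation", 0.59), ("AKI", 0.67), ("ECMO", 0.51)]:
    print(f"{outcome}: E-value ~ {e_value(odds_ratio):.2f}")
```

Under this approximation an OR of 0.51 gives an E-value of about 3.3, the upper end of the range in the abstract; the sketch does not attempt to reproduce the reported lower bound of 1.9.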

9.

Expansion of the Human Phenotype Ontology (HPO) knowledge base and resources.

Köhler, Sebastian; Carmody, Leigh; Vasilevsky, Nicole; Jacobsen, Julius O B; Danis, Daniel; Gourdine, Jean-Philippe; Gargano, Michael; Harris, Nomi L; Matentzoglu, Nicolas; McMurry, Julie A; Osumi-Sutherland, David; Cipriani, Valentina; Balhoff, James P; Conlin, Tom; Blau, Hannah; Baynam, Gareth; Palmer, Richard; Gratian, Dylan; Dawkins, Hugh; Segal, Michael; Jansen, Anna C; Muaz, Ahmed; Chang, Willie H; Bergerson, Jenna; Laulederkind, Stanley J F; Yüksel, Zafer; Beltran, Sergi; Freeman, Alexandra F; Sergouniotis, Panagiotis I; Durkin, Daniel; Storm, Andrea L; Hanauer, Marc; Brudno, Michael; Bello, Susan M; Sincan, Murat; Rageth, Kayli; Wheeler, Matthew T; Oegema, Renske; Lourghi, Halima; Della Rocca, Maria G; Thompson, Rachel; Castellanos, Francisco; Priest, James; Cunningham-Rundles, Charlotte; Hegde, Ayushi; Lovering, Ruth C; Hajek, Catherine; Olry, Annie; Notarangelo, Luigi; Similuk, Morgan.

Nucleic Acids Res ; 47(D1): D1018-D1027, 2019 01 08.

Article in English | MEDLINE | ID: mdl-30476213

ABSTRACT

The Human Phenotype Ontology (HPO), a standardized vocabulary of phenotypic abnormalities associated with 7000+ diseases, is used by thousands of researchers, clinicians, informaticians and electronic health record systems around the world. Its detailed descriptions of clinical abnormalities and computable disease definitions have made HPO the de facto standard for deep phenotyping in the field of rare disease. The HPO's interoperability with other ontologies has enabled it to be used to improve diagnostic accuracy by incorporating model organism data. It also plays a key role in the popular Exomiser tool, which identifies potential disease-causing variants from whole-exome or whole-genome sequencing data. Since the HPO was first introduced in 2008, its users have become both more numerous and more diverse. To meet these emerging needs, the project has added new content, language translations, mappings and computational tooling, as well as integrations with external community data. The HPO continues to collaborate with clinical adopters to improve specific areas of the ontology and extend standardized disease descriptions. The newly redesigned HPO website (www.human-phenotype-ontology.org) simplifies browsing terms and exploring clinical features, diseases, and human genes.

Subject(s)

Biological Ontologies , Computational Biology/methods , Congenital Abnormalities/genetics , Genetic Predisposition to Disease/genetics , Knowledge Bases , Rare Diseases/genetics , Congenital Abnormalities/diagnosis , Databases, Genetic , Genetic Variation , Humans , Internet , Phenotype , Rare Diseases/diagnosis , Whole Genome Sequencing/methods

10.

Identifiers for the 21st century: How to design, provision, and reuse persistent identifiers to maximize utility and impact of life science data.

McMurry, Julie A; Juty, Nick; Blomberg, Niklas; Burdett, Tony; Conlin, Tom; Conte, Nathalie; Courtot, Mélanie; Deck, John; Dumontier, Michel; Fellows, Donal K; Gonzalez-Beltran, Alejandra; Gormanns, Philipp; Grethe, Jeffrey; Hastings, Janna; Hériché, Jean-Karim; Hermjakob, Henning; Ison, Jon C; Jimenez, Rafael C; Jupp, Simon; Kunze, John; Laibe, Camille; Le Novère, Nicolas; Malone, James; Martin, Maria Jesus; McEntyre, Johanna R; Morris, Chris; Muilu, Juha; Müller, Wolfgang; Rocca-Serra, Philippe; Sansone, Susanna-Assunta; Sariyar, Murat; Snoep, Jacky L; Soiland-Reyes, Stian; Stanford, Natalie J; Swainston, Neil; Washington, Nicole; Williams, Alan R; Wimalaratne, Sarala M; Winfree, Lilly M; Wolstencroft, Katherine; Goble, Carole; Mungall, Christopher J; Haendel, Melissa A; Parkinson, Helen.

PLoS Biol ; 15(6): e2001414, 2017 Jun.

Article in English | MEDLINE | ID: mdl-28662064

ABSTRACT

In many disciplines, data are highly decentralized across thousands of online databases (repositories, registries, and knowledgebases). Wringing value from such databases depends on the discipline of data science and on the humble bricks and mortar that make integration possible; identifiers are a core component of this integration infrastructure. Drawing on our experience and on work by other groups, we outline 10 lessons we have learned about the identifier qualities and best practices that facilitate large-scale data integration. Specifically, we propose actions that identifier practitioners (database providers) should take in the design, provision and reuse of identifiers. We also outline the important considerations for those referencing identifiers in various circumstances, including by authors and data generators. While the importance and relevance of each lesson will vary by context, there is a need for increased awareness about how to avoid and manage common identifier problems, especially those related to persistence and web-accessibility/resolvability. We focus strongly on web-based identifiers in the life sciences; however, the principles are broadly relevant to other disciplines.

Subject(s)

Biological Science Disciplines/methods , Computational Biology/methods , Data Mining/methods , Software Design , Software , Biological Science Disciplines/statistics & numerical data , Biological Science Disciplines/trends , Computational Biology/trends , Data Mining/statistics & numerical data , Data Mining/trends , Databases, Factual/statistics & numerical data , Databases, Factual/trends , Forecasting , Humans , Internet
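The abstract stresses persistence and web resolvability of identifiers. One common practice that supports resolvability is writing identifiers in compact prefix:local_id form and expanding them to full web URIs through a documented prefix map. This sketch assumes a small, hypothetical prefix map; the HP and MONDO entries follow the OBO Foundry PURL pattern (http://purl.obolibrary.org/obo/PREFIX_ID), but a real system would load a maintained prefix registry rather than hard-code one.

```python
# Hypothetical prefix map; real deployments load a curated registry of prefixes.
PREFIX_MAP = {
    "HP": "http://purl.obolibrary.org/obo/HP_",
    "MONDO": "http://purl.obolibrary.org/obo/MONDO_",
}


def expand_curie(curie: str, prefix_map: dict[str, str] = PREFIX_MAP) -> str:
    """Expand a compact identifier such as 'HP:0001250' into a full URI."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not local_id:
        raise ValueError(f"not a compact identifier: {curie!r}")
    try:
        return prefix_map[prefix] + local_id
    except KeyError:
        # Unregistered prefixes are rejected instead of guessed.
        raise ValueError(f"unknown prefix {prefix!r}") from None


print(expand_curie("HP:0001250"))  # http://purl.obolibrary.org/obo/HP_0001250
```

Rejecting unregistered prefixes, instead of guessing a URI, keeps unresolvable identifiers from silently entering integrated data.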

11.

Interpretable prioritization of splice variants in diagnostic next-generation sequencing.

Danis, Daniel; Jacobsen, Julius O B; Carmody, Leigh C; Gargano, Michael A; McMurry, Julie A; Hegde, Ayushi; Haendel, Melissa A; Valentini, Giorgio; Smedley, Damian; Robinson, Peter N.

Am J Hum Genet ; 108(11): 2205, 2021 Nov 04.

Article in English | MEDLINE | ID: mdl-34739835

12.

A Whole-Genome Analysis Framework for Effective Identification of Pathogenic Regulatory Variants in Mendelian Disease.

Smedley, Damian; Schubach, Max; Jacobsen, Julius O B; Köhler, Sebastian; Zemojtel, Tomasz; Spielmann, Malte; Jäger, Marten; Hochheiser, Harry; Washington, Nicole L; McMurry, Julie A; Haendel, Melissa A; Mungall, Christopher J; Lewis, Suzanna E; Groza, Tudor; Valentini, Giorgio; Robinson, Peter N.

Am J Hum Genet ; 99(3): 595-606, 2016 09 01.

Article in English | MEDLINE | ID: mdl-27569544

ABSTRACT

The interpretation of non-coding variants still constitutes a major challenge in the application of whole-genome sequencing in Mendelian disease, especially for single-nucleotide and other small non-coding variants. Here we present Genomiser, an analysis framework that is able not only to score the relevance of variation in the non-coding genome, but also to associate regulatory variants with specific Mendelian diseases. Genomiser scores variants through either existing methods such as CADD or a bespoke machine learning method and combines these with allele frequency, regulatory sequences, chromosomal topological domains, and phenotypic relevance to discover variants associated with specific Mendelian disorders. Overall, Genomiser is able to identify causal regulatory variants as the top candidate in 77% of simulated whole genomes, allowing effective detection and discovery of regulatory variants in Mendelian disease.

Subject(s)

Algorithms , Genetic Diseases, Inborn/genetics , Genome, Human/genetics , Mutation/genetics , Gene Frequency , Genome-Wide Association Study , Humans , Machine Learning , Open Reading Frames/genetics , Phenotype , Point Mutation/genetics

13.

The Monarch Initiative: an integrative data and analytic platform connecting phenotypes to genotypes across species.

Mungall, Christopher J; McMurry, Julie A; Köhler, Sebastian; Balhoff, James P; Borromeo, Charles; Brush, Matthew; Carbon, Seth; Conlin, Tom; Dunn, Nathan; Engelstad, Mark; Foster, Erin; Gourdine, J P; Jacobsen, Julius O B; Keith, Dan; Laraway, Bryan; Lewis, Suzanna E; NguyenXuan, Jeremy; Shefchek, Kent; Vasilevsky, Nicole; Yuan, Zhou; Washington, Nicole; Hochheiser, Harry; Groza, Tudor; Smedley, Damian; Robinson, Peter N; Haendel, Melissa A.

Nucleic Acids Res ; 45(D1): D712-D722, 2017 01 04.

Article in English | MEDLINE | ID: mdl-27899636

ABSTRACT

The correlation of phenotypic outcomes with genetic variation and environmental factors is a core pursuit in biology and biomedicine. Numerous challenges impede our progress: patient phenotypes may not match known diseases, candidate variants may be in genes that have not been characterized, model organisms may not recapitulate human or veterinary diseases, filling evolutionary gaps is difficult, and many resources must be queried to find potentially significant genotype-phenotype associations. Non-human organisms have proven instrumental in revealing biological mechanisms. Advanced informatics tools can identify phenotypically relevant disease models in research and diagnostic contexts. Large-scale integration of model organism and clinical research data can provide a breadth of knowledge not available from individual sources and can provide contextualization of data back to these sources. The Monarch Initiative (monarchinitiative.org) is a collaborative, open science effort that aims to semantically integrate genotype-phenotype data from many species and sources in order to support precision medicine, disease modeling, and mechanistic exploration. Our integrated knowledge graph, analytic tools, and web services enable diverse users to explore relationships between phenotypes and genotypes across species.

Subject(s)

Databases, Genetic , Genetic Association Studies/methods , Genotype , Phenotype , Animals , Biological Evolution , Computational Biology/methods , Data Curation , Humans , Search Engine , Software , Species Specificity , User-Computer Interface , Web Browser

14.

UVB Radiation Alone May Not Explain Sunlight Inactivation of SARS-CoV-2.

Luzzatto-Fegiz, Paolo; Temprano-Coleto, Fernando; Peaudecerf, François J; Landel, Julien R; Zhu, Yangying; McMurry, Julie A.

J Infect Dis ; 223(8): 1500-1502, 2021 04 23.

Article in English | MEDLINE | ID: mdl-33544845

Subject(s)

COVID-19 , Sunlight , Humans , SARS-CoV-2 , Ultraviolet Rays

15.

Sharing Clinical and Genomic Data on Cancer - The Need for Global Solutions.

Lawler, Mark; Haussler, David; Siu, Lillian L; Haendel, Melissa A; McMurry, Julie A; Knoppers, Bartha M; Chanock, Stephen J; Calvo, Fabien; Teh, Bin T; Walia, Guneet; Banks, Ian; Yu, Peter P; Staudt, Louis M; Sawyers, Charles L.

N Engl J Med ; 376(21): 2006-2009, 2017 05 25.

Article in English | MEDLINE | ID: mdl-28538124

Subject(s)

Genomics , Information Dissemination , International Cooperation , Neoplasms/genetics , Humans , Information Dissemination/ethics , Information Dissemination/legislation & jurisprudence

16.

Finding Long-COVID: Temporal Topic Modeling of Electronic Health Records from the N3C and RECOVER Programs.

O'Neil, Shawn T; Madlock-Brown, Charisse; Wilkins, Kenneth J; McGrath, Brenda M; Davis, Hannah E; Assaf, Gina S; Wei, Hannah; Zareie, Parya; French, Evan T; Loomba, Johanna; McMurry, Julie A; Zhou, Andrea; Chute, Christopher G; Moffitt, Richard A; Pfaff, Emily R; Yoo, Yun Jae; Leese, Peter; Chew, Robert F; Lieberman, Michael; Haendel, Melissa A.

medRxiv ; 2024 Jun 11.

Article in English | MEDLINE | ID: mdl-38947087

ABSTRACT

Post-Acute Sequelae of SARS-CoV-2 infection (PASC), also known as Long-COVID, encompasses a variety of complex and varied outcomes following COVID-19 infection that are still poorly understood. We clustered over 600 million condition diagnoses from 14 million patients available through the National COVID Cohort Collaborative (N3C), generating hundreds of highly detailed clinical phenotypes. Assessing patient clinical trajectories using these clusters allowed us to identify individual conditions and phenotypes strongly increased after acute infection. We found many conditions increased in COVID-19 patients compared to controls, and using a novel method to associate patients with clusters over time, we additionally found phenotypes specific to patient sex, age, wave of infection, and PASC diagnosis status. While many of these results reflect known PASC symptoms, the resolution provided by this unprecedented data scale suggests avenues for improved diagnostics and mechanistic understanding of this multifaceted disease.

17.

The Environmental Conditions, Treatments, and Exposures Ontology (ECTO): connecting toxicology and exposure to human health and beyond.

Chan, Lauren E; Thessen, Anne E; Duncan, William D; Matentzoglu, Nicolas; Schmitt, Charles; Grondin, Cynthia J; Vasilevsky, Nicole; McMurry, Julie A; Robinson, Peter N; Mungall, Christopher J; Haendel, Melissa A.

J Biomed Semantics ; 14(1): 3, 2023 02 24.

Article in English | MEDLINE | ID: mdl-36823605

ABSTRACT

BACKGROUND: Evaluating the impact of environmental exposures on organism health is a key goal of modern biomedicine and is critically important in an age of greater pollution and chemicals in our environment. Environmental health utilizes many different research methods and generates a variety of data types. However, to date, no comprehensive database represents the full spectrum of environmental health data. Due to a lack of interoperability between databases, tools for integrating these resources are needed. In this manuscript we present the Environmental Conditions, Treatments, and Exposures Ontology (ECTO), a species-agnostic ontology focused on exposure events that occur as a result of natural and experimental processes, such as diet, work, or research activities. ECTO is intended for use in harmonizing environmental health data resources to support cross-study integration and inference for mechanism discovery. METHODS AND FINDINGS: ECTO is an ontology designed for describing organismal exposures such as toxicological research, environmental variables, dietary features, and patient-reported data from surveys. ECTO utilizes the base model established within the Exposure Ontology (ExO). ECTO is developed using a combination of manual curation and Dead Simple OWL Design Patterns (DOSDP), and contains over 2700 environmental exposure terms, and incorporates chemical and environmental ontologies. ECTO is an Open Biological and Biomedical Ontology (OBO) Foundry ontology that is designed for interoperability, reuse, and axiomatization with other ontologies. ECTO terms have been utilized in axioms within the Mondo Disease Ontology to represent diseases caused or influenced by environmental factors, as well as for survey encoding for the Personalized Environment and Genes Study (PEGS). 
CONCLUSIONS: We constructed ECTO to meet Open Biological and Biomedical Ontology (OBO) Foundry principles to increase translation opportunities between environmental health and other areas of biology. ECTO has a growing community of contributors consisting of toxicologists, public health epidemiologists, and health care providers to provide the necessary expertise for areas that have been identified previously as gaps.

Subject(s)

Biological Ontologies , Humans , Databases, Factual

18.

Long COVID risk and pre-COVID vaccination in an EHR-based cohort study from the RECOVER program.

Brannock, M Daniel; Chew, Robert F; Preiss, Alexander J; Hadley, Emily C; Redfield, Signe; McMurry, Julie A; Leese, Peter J; Girvin, Andrew T; Crosskey, Miles; Zhou, Andrea G; Moffitt, Richard A; Funk, Michele Jonsson; Pfaff, Emily R; Haendel, Melissa A; Chute, Christopher G.

Nat Commun ; 14(1): 2914, 2023 05 22.

Article in English | MEDLINE | ID: mdl-37217471

ABSTRACT

Long COVID, or complications arising from COVID-19 weeks after infection, has become a central concern for public health experts. The United States National Institutes of Health founded the RECOVER initiative to better understand long COVID. We used electronic health records available through the National COVID Cohort Collaborative to characterize the association between SARS-CoV-2 vaccination and long COVID diagnosis. Among patients with a COVID-19 infection between August 1, 2021 and January 31, 2022, we defined two cohorts using distinct definitions of long COVID (a clinical diagnosis, n = 47,404, or a previously described computational phenotype, n = 198,514) to compare unvaccinated individuals to those with a complete vaccine series prior to infection. Evidence of long COVID was monitored through June or July of 2022, depending on patients' data availability. We found that vaccination was consistently associated with lower odds and rates of long COVID clinical diagnosis and high-confidence computationally derived diagnosis after adjusting for sex, demographics, and medical history.

Subject(s)

COVID-19 , Post-Acute COVID-19 Syndrome , United States/epidemiology , Humans , COVID-19/epidemiology , COVID-19/prevention & control , COVID-19 Vaccines , Cohort Studies , SARS-CoV-2 , Vaccination

19.

The Ontology of Biological Attributes (OBA) - Computational Traits for the Life Sciences.

Stefancsik, Ray; Balhoff, James P; Balk, Meghan A; Ball, Robyn; Bello, Susan M; Caron, Anita R; Chessler, Elissa; de Souza, Vinicius; Gehrke, Sarah; Haendel, Melissa; Harris, Laura W; Harris, Nomi L; Ibrahim, Arwa; Koehler, Sebastian; Matentzoglu, Nicolas; McMurry, Julie A; Mungall, Christopher J; Munoz-Torres, Monica C; Putman, Tim; Robinson, Peter; Smedley, Damian; Sollis, Elliot; Thessen, Anne E; Vasilevsky, Nicole; Walton, David O; Osumi-Sutherland, David.

bioRxiv ; 2023 Jan 27.

Article in English | MEDLINE | ID: mdl-36747660

ABSTRACT

Existing phenotype ontologies were originally developed to represent phenotypes that manifest as a character state in relation to a wild-type or other reference. However, these do not include the phenotypic trait or attribute categories required for the annotation of genome-wide association studies (GWAS), Quantitative Trait Loci (QTL) mappings or any population-focused measurable trait data. Moreover, variations in gene expression in response to environmental disturbances even without any genetic alterations can also be associated with particular biological attributes. The integration of trait and biological attribute information with an ever increasing body of chemical, environmental and biological data greatly facilitates computational analyses and it is also highly relevant to biomedical and clinical applications. The Ontology of Biological Attributes (OBA) is a formalised, species-independent collection of interoperable phenotypic trait categories that is intended to fulfil a data integration role. OBA is a standardised representational framework for observable attributes that are characteristics of biological entities, organisms, or parts of organisms. OBA has a modular design which provides several benefits for users and data integrators, including an automated and meaningful classification of trait terms computed on the basis of logical inferences drawn from domain-specific ontologies for cells, anatomical and other relevant entities. The logical axioms in OBA also provide a previously missing bridge that can computationally link Mendelian phenotypes with GWAS and quantitative traits. The term components in OBA provide semantic links and enable knowledge and data integration across specialised research community boundaries, thereby breaking silos.

20.

Who is pregnant? Defining real-world data-based pregnancy episodes in the National COVID Cohort Collaborative (N3C).

Jones, Sara E; Bradwell, Katie R; Chan, Lauren E; McMurry, Julie A; Olson-Chen, Courtney; Tarleton, Jessica; Wilkins, Kenneth J; Ly, Victoria; Ljazouli, Saad; Qin, Qiuyuan; Faherty, Emily Groene; Lau, Yan Kwan; Xie, Catherine; Kao, Yu-Han; Liebman, Michael N; Mariona, Federico; Challa, Anup P; Li, Li; Ratcliffe, Sarah J; Haendel, Melissa A; Patel, Rena C; Hill, Elaine L.

JAMIA Open ; 6(3): ooad067, 2023 Oct.

Article in English | MEDLINE | ID: mdl-37600074

ABSTRACT

Objectives: To define pregnancy episodes and estimate gestational age within electronic health record (EHR) data from the National COVID Cohort Collaborative (N3C). Materials and Methods: We developed a comprehensive approach, named Hierarchy and rule-based pregnancy episode Inference integrated with Pregnancy Progression Signatures (HIPPS), and applied it to EHR data in the N3C (January 1, 2018-April 7, 2022). HIPPS combines: (1) an extension of a previously published pregnancy episode algorithm, (2) a novel algorithm to detect gestational age-specific signatures of a progressing pregnancy for further episode support, and (3) pregnancy start date inference. Clinicians performed validation of HIPPS on a subset of episodes. We then generated pregnancy cohorts based on gestational age precision and pregnancy outcomes for assessment of accuracy and comparison of COVID-19 and other characteristics. Results: We identified 628,165 pregnant persons with 816,471 pregnancy episodes, of which 52.3% were live births, 24.4% were other outcomes (stillbirth, ectopic pregnancy, abortions), and 23.3% had unknown outcomes. Clinician validation agreed 98.8% with HIPPS-identified episodes. We were able to estimate start dates within 1 week of precision for 475,433 (58.2%) episodes. 62,540 (7.7%) episodes had incident COVID-19 during pregnancy. Discussion: HIPPS provides measures of support for pregnancy-related variables such as gestational age and pregnancy outcomes based on N3C data. Gestational age precision allows researchers to find time to events with reasonable confidence. Conclusion: We have developed a novel and robust approach for inferring pregnancy episodes and gestational age that addresses data inconsistency and missingness in EHR data.
