Results 1 - 20 of 58
1.
bioRxiv ; 2024 Sep 22.
Article in English | MEDLINE | ID: mdl-39345458

ABSTRACT

Phenotypic data are critical for understanding biological mechanisms and consequences of genomic variation, and are pivotal for clinical use cases such as disease diagnostics and treatment development. For over a century, vast quantities of phenotype data have been collected in many different contexts covering a variety of organisms. The emerging field of phenomics focuses on integrating and interpreting these data to inform biological hypotheses. A major impediment in phenomics is the wide range of distinct and disconnected approaches to recording the observable characteristics of an organism. Phenotype data are collected and curated using free text, single terms or combinations of terms, using multiple vocabularies, terminologies, or ontologies. Integrating these heterogeneous and often siloed data enables the application of biological knowledge both within and across species. Existing integration efforts are typically limited to mappings between pairs of terminologies; a generic knowledge representation that captures the full range of cross-species phenomics data is much needed. We have developed the Unified Phenotype Ontology (uPheno) framework, a community effort to provide an integration layer over domain-specific phenotype ontologies, as a single, unified, logical representation. uPheno comprises (1) a system for consistent computational definition of phenotype terms using ontology design patterns, maintained as a community library; (2) a hierarchical vocabulary of species-neutral phenotype terms under which their species-specific counterparts are grouped; and (3) mapping tables between species-specific ontologies. This harmonized representation supports use cases such as cross-species integration of genotype-phenotype associations from different organisms and cross-species informed variant prioritization.

2.
medRxiv ; 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38947087

ABSTRACT

Post-Acute Sequelae of SARS-CoV-2 infection (PASC), also known as Long-COVID, encompasses a variety of complex and varied outcomes following COVID-19 infection that are still poorly understood. We clustered over 600 million condition diagnoses from 14 million patients available through the National COVID Cohort Collaborative (N3C), generating hundreds of highly detailed clinical phenotypes. Assessing patient clinical trajectories using these clusters allowed us to identify individual conditions and phenotypes strongly increased after acute infection. We found many conditions increased in COVID-19 patients compared to controls, and using a novel method to associate patients with clusters over time, we additionally found phenotypes specific to patient sex, age, wave of infection, and PASC diagnosis status. While many of these results reflect known PASC symptoms, the resolution provided by this unprecedented data scale suggests avenues for improved diagnostics and mechanistic understanding of this multifaceted disease.

3.
Nucleic Acids Res ; 52(D1): D938-D949, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-38000386

ABSTRACT

Bridging the gap between genetic variations, environmental determinants, and phenotypic outcomes is critical for supporting clinical diagnosis and understanding mechanisms of diseases. It requires integrating open data at a global scale. The Monarch Initiative advances these goals by developing open ontologies, semantic data models, and knowledge graphs for translational research. The Monarch App is an integrated platform combining data about genes, phenotypes, and diseases across species. Monarch's APIs enable access to carefully curated datasets and advanced analysis tools that support the understanding and diagnosis of disease for diverse applications such as variant prioritization, deep phenotyping, and patient profile-matching. We have migrated our system into a scalable, cloud-based infrastructure; simplified Monarch's data ingestion and knowledge graph integration systems; enhanced data mapping and integration standards; and developed a new user interface with novel search and graph navigation features. Furthermore, we advanced Monarch's analytic tools by developing a customized plugin for OpenAI's ChatGPT to increase the reliability of its responses about phenotypic data, allowing us to interrogate the knowledge in the Monarch graph using state-of-the-art Large Language Models. The resources of the Monarch Initiative can be found at monarchinitiative.org and its corresponding code repository at github.com/monarch-initiative/monarch-app.


Subject(s)
Databases, Factual , Disease , Genes , Phenotype , Humans , Internet , Databases, Factual/standards , Software , Genes/genetics , Disease/genetics
4.
BMC Public Health ; 23(1): 2103, 2023 10 25.
Article in English | MEDLINE | ID: mdl-37880596

ABSTRACT

BACKGROUND: More than one-third of individuals experience post-acute sequelae of SARS-CoV-2 infection (PASC, which includes long-COVID). The objective is to identify risk factors associated with PASC/long-COVID diagnosis. METHODS: This was a retrospective case-control study including 31 health systems in the United States from the National COVID Cohort Collaborative (N3C). A total of 8,325 individuals with PASC (defined by the presence of the International Classification of Diseases, version 10 code U09.9 or a long-COVID clinic visit) were matched to 41,625 controls within the same health system, with COVID index dates within ± 45 days of the corresponding case's earliest COVID index date. Measurements of risk factors included demographics, comorbidities, treatment and acute characteristics related to COVID-19. Multivariable logistic regression, random forest, and XGBoost were used to determine the associations between risk factors and PASC. RESULTS: Among 8,325 individuals with PASC, the majority were >50 years of age (56.6%), female (62.8%), and non-Hispanic White (68.6%). In logistic regression, middle-age categories (40 to 69 years; OR ranging from 2.32 to 2.58), female sex (OR 1.4, 95% CI 1.33-1.48), hospitalization associated with COVID-19 (OR 3.8, 95% CI 3.05-4.73), long (8-30 days, OR 1.69, 95% CI 1.31-2.17) or extended hospital stay (30+ days, OR 3.38, 95% CI 2.45-4.67), receipt of mechanical ventilation (OR 1.44, 95% CI 1.18-1.74), and several comorbidities including depression (OR 1.50, 95% CI 1.40-1.60), chronic lung disease (OR 1.63, 95% CI 1.53-1.74), and obesity (OR 1.23, 95% CI 1.16-1.3) were associated with increased likelihood of PASC diagnosis or care at a long-COVID clinic. Characteristics associated with a lower likelihood of PASC diagnosis or care at a long-COVID clinic included younger age (18 to 29 years), male sex, non-Hispanic Black race, and comorbidities such as substance abuse, cardiomyopathy, psychosis, and dementia. More doctors per capita in the county of residence was associated with an increased likelihood of PASC diagnosis or care at a long-COVID clinic. Our findings were consistent in sensitivity analyses using a variety of analytic techniques and approaches to select controls. CONCLUSIONS: This national study identified important risk factors for PASC diagnosis such as middle age, severe COVID-19 disease, and specific comorbidities. Further clinical and epidemiological research is needed to better understand underlying mechanisms and the potential role of vaccines and therapeutics in altering PASC course.
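The odds ratios and confidence intervals reported in this abstract come from adjusted multivariable models, which cannot be reproduced from the abstract alone. As a rough illustration of how an odds ratio and its 95% CI are read, the sketch below computes an unadjusted odds ratio with a Wald interval from a 2x2 table. The function name and the counts are hypothetical and are not taken from the study.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a: exposed cases      b: unexposed cases
    c: exposed controls   d: unexposed controls
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only.
or_, lo, hi = odds_ratio_ci(a=40, b=60, c=20, d=80)
print(f"OR {or_:.2f}, 95% CI {lo:.2f}-{hi:.2f}")
```

An interval that excludes 1 corresponds to an association that is statistically significant at roughly the 5% level; a multivariable logistic regression would additionally adjust each estimate for the other covariates.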


Subject(s)
COVID-19 , SARS-CoV-2 , Middle Aged , Female , Male , Humans , Adult , Aged , Adolescent , Young Adult , COVID-19/epidemiology , Post-Acute COVID-19 Syndrome , Case-Control Studies , Retrospective Studies , Risk Factors , Disease Progression
5.
JAMIA Open ; 6(3): ooad067, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37600074

ABSTRACT

Objectives: To define pregnancy episodes and estimate gestational age within electronic health record (EHR) data from the National COVID Cohort Collaborative (N3C). Materials and Methods: We developed a comprehensive approach, named Hierarchy and rule-based pregnancy episode Inference integrated with Pregnancy Progression Signatures (HIPPS), and applied it to EHR data in the N3C (January 1, 2018-April 7, 2022). HIPPS combines: (1) an extension of a previously published pregnancy episode algorithm, (2) a novel algorithm to detect gestational age-specific signatures of a progressing pregnancy for further episode support, and (3) pregnancy start date inference. Clinicians performed validation of HIPPS on a subset of episodes. We then generated pregnancy cohorts based on gestational age precision and pregnancy outcomes for assessment of accuracy and comparison of COVID-19 and other characteristics. Results: We identified 628 165 pregnant persons with 816 471 pregnancy episodes, of which 52.3% were live births, 24.4% were other outcomes (stillbirth, ectopic pregnancy, abortions), and 23.3% had unknown outcomes. Clinician validation agreed 98.8% with HIPPS-identified episodes. We were able to estimate start dates within 1 week of precision for 475 433 (58.2%) episodes. 62 540 (7.7%) episodes had incident COVID-19 during pregnancy. Discussion: HIPPS provides measures of support for pregnancy-related variables such as gestational age and pregnancy outcomes based on N3C data. Gestational age precision allows researchers to find time to events with reasonable confidence. Conclusion: We have developed a novel and robust approach for inferring pregnancy episodes and gestational age that addresses data inconsistency and missingness in EHR data.

6.
Nat Commun ; 14(1): 2914, 2023 05 22.
Article in English | MEDLINE | ID: mdl-37217471

ABSTRACT

Long COVID, or complications arising from COVID-19 weeks after infection, has become a central concern for public health experts. The United States National Institutes of Health founded the RECOVER initiative to better understand long COVID. We used electronic health records available through the National COVID Cohort Collaborative to characterize the association between SARS-CoV-2 vaccination and long COVID diagnosis. Among patients with a COVID-19 infection between August 1, 2021 and January 31, 2022, we defined two cohorts using distinct definitions of long COVID, a clinical diagnosis (n = 47,404) or a previously described computational phenotype (n = 198,514), to compare unvaccinated individuals to those with a complete vaccine series prior to infection. Evidence of long COVID was monitored through June or July of 2022, depending on patients' data availability. We found that vaccination was consistently associated with lower odds and rates of long COVID clinical diagnosis and high-confidence computationally derived diagnosis after adjusting for sex, demographics, and medical history.


Subject(s)
COVID-19 , Post-Acute COVID-19 Syndrome , United States/epidemiology , Humans , COVID-19/epidemiology , COVID-19/prevention & control , COVID-19 Vaccines , Cohort Studies , SARS-CoV-2 , Vaccination
7.
Mamm Genome ; 34(3): 364-378, 2023 09.
Article in English | MEDLINE | ID: mdl-37076585

ABSTRACT

Existing phenotype ontologies were originally developed to represent phenotypes that manifest as a character state in relation to a wild-type or other reference. However, these do not include the phenotypic trait or attribute categories required for the annotation of genome-wide association studies (GWAS), Quantitative Trait Loci (QTL) mappings or any population-focussed measurable trait data. The integration of trait and biological attribute information with an ever increasing body of chemical, environmental and biological data greatly facilitates computational analyses and it is also highly relevant to biomedical and clinical applications. The Ontology of Biological Attributes (OBA) is a formalised, species-independent collection of interoperable phenotypic trait categories that is intended to fulfil a data integration role. OBA is a standardised representational framework for observable attributes that are characteristics of biological entities, organisms, or parts of organisms. OBA has a modular design which provides several benefits for users and data integrators, including an automated and meaningful classification of trait terms computed on the basis of logical inferences drawn from domain-specific ontologies for cells, anatomical and other relevant entities. The logical axioms in OBA also provide a previously missing bridge that can computationally link Mendelian phenotypes with GWAS and quantitative traits. The term components in OBA provide semantic links and enable knowledge and data integration across specialised research community boundaries, thereby breaking silos.


Subject(s)
Biological Ontologies , Biological Science Disciplines , Genome-Wide Association Study , Phenotype
8.
Adv Genet (Hoboken) ; 4(1): 2200016, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36910590

ABSTRACT

The Global Alliance for Genomics and Health (GA4GH) is developing a suite of coordinated standards for genomics for healthcare. The Phenopacket is a new GA4GH standard for sharing disease and phenotype information that characterizes an individual person, linking that individual to detailed phenotypic descriptions, genetic information, diagnoses, and treatments. A detailed example is presented that illustrates how to use the schema to represent the clinical course of a patient with retinoblastoma, including demographic information, the clinical diagnosis, phenotypic features and clinical measurements, an examination of the extirpated tumor, therapies, and the results of genomic analysis. The Phenopacket Schema, together with other GA4GH data and technical standards, will enable data exchange and provide a foundation for the computational analysis of disease and phenotype information to improve our ability to diagnose and conduct research on all types of disorders, including cancer and rare diseases.

9.
bioRxiv ; 2023 Jan 27.
Article in English | MEDLINE | ID: mdl-36747660

ABSTRACT

Existing phenotype ontologies were originally developed to represent phenotypes that manifest as a character state in relation to a wild-type or other reference. However, these do not include the phenotypic trait or attribute categories required for the annotation of genome-wide association studies (GWAS), Quantitative Trait Loci (QTL) mappings or any population-focused measurable trait data. Moreover, variations in gene expression in response to environmental disturbances even without any genetic alterations can also be associated with particular biological attributes. The integration of trait and biological attribute information with an ever increasing body of chemical, environmental and biological data greatly facilitates computational analyses and it is also highly relevant to biomedical and clinical applications. The Ontology of Biological Attributes (OBA) is a formalised, species-independent collection of interoperable phenotypic trait categories that is intended to fulfil a data integration role. OBA is a standardised representational framework for observable attributes that are characteristics of biological entities, organisms, or parts of organisms. OBA has a modular design which provides several benefits for users and data integrators, including an automated and meaningful classification of trait terms computed on the basis of logical inferences drawn from domain-specific ontologies for cells, anatomical and other relevant entities. The logical axioms in OBA also provide a previously missing bridge that can computationally link Mendelian phenotypes with GWAS and quantitative traits. The term components in OBA provide semantic links and enable knowledge and data integration across specialised research community boundaries, thereby breaking silos.

10.
BMC Med ; 21(1): 58, 2023 02 16.
Article in English | MEDLINE | ID: mdl-36793086

ABSTRACT

BACKGROUND: Naming a newly discovered disease is a difficult process; in the context of the COVID-19 pandemic and the existence of post-acute sequelae of SARS-CoV-2 infection (PASC), which includes long COVID, it has proven especially challenging. Disease definitions and assignment of a diagnosis code are often asynchronous and iterative. The clinical definition and our understanding of the underlying mechanisms of long COVID are still in flux, and the deployment of an ICD-10-CM code for long COVID in the USA took nearly 2 years after patients had begun to describe their condition. Here, we leverage the largest publicly available HIPAA-limited dataset about patients with COVID-19 in the US to examine the heterogeneity of adoption and use of U09.9, the ICD-10-CM code for "Post COVID-19 condition, unspecified." METHODS: We undertook a number of analyses to characterize the N3C population with a U09.9 diagnosis code (n = 33,782), including assessing person-level demographics and a number of area-level social determinants of health; diagnoses commonly co-occurring with U09.9, clustered using the Louvain algorithm; and quantifying medications and procedures recorded within 60 days of U09.9 diagnosis. We stratified all analyses by age group in order to discern differing patterns of care across the lifespan. RESULTS: We established the diagnoses most commonly co-occurring with U09.9 and algorithmically clustered them into four major categories: cardiopulmonary, neurological, gastrointestinal, and comorbid conditions. Importantly, we discovered that the population of patients diagnosed with U09.9 is demographically skewed toward female, White, non-Hispanic individuals, as well as individuals living in areas with low poverty and low unemployment. Our results also include a characterization of common procedures and medications associated with U09.9-coded patients. 
CONCLUSIONS: This work offers insight into potential subtypes and current practice patterns around long COVID and speaks to the existence of disparities in the diagnosis of patients with long COVID. This latter finding in particular requires further research and urgent remediation.


Subject(s)
COVID-19 , Post-Acute COVID-19 Syndrome , Humans , Female , International Classification of Diseases , Pandemics , COVID-19/diagnosis , COVID-19/epidemiology , SARS-CoV-2
11.
J Biomed Semantics ; 14(1): 3, 2023 02 24.
Article in English | MEDLINE | ID: mdl-36823605

ABSTRACT

BACKGROUND: Evaluating the impact of environmental exposures on organism health is a key goal of modern biomedicine and is critically important in an age of greater pollution and chemicals in our environment. Environmental health utilizes many different research methods and generates a variety of data types. However, to date, no comprehensive database represents the full spectrum of environmental health data. Due to a lack of interoperability between databases, tools for integrating these resources are needed. In this manuscript we present the Environmental Conditions, Treatments, and Exposures Ontology (ECTO), a species-agnostic ontology focused on exposure events that occur as a result of natural and experimental processes, such as diet, work, or research activities. ECTO is intended for use in harmonizing environmental health data resources to support cross-study integration and inference for mechanism discovery. METHODS AND FINDINGS: ECTO is an ontology designed for describing organismal exposures such as toxicological research, environmental variables, dietary features, and patient-reported data from surveys. ECTO utilizes the base model established within the Exposure Ontology (ExO). ECTO is developed using a combination of manual curation and Dead Simple OWL Design Patterns (DOSDP), and contains over 2700 environmental exposure terms, and incorporates chemical and environmental ontologies. ECTO is an Open Biological and Biomedical Ontology (OBO) Foundry ontology that is designed for interoperability, reuse, and axiomatization with other ontologies. ECTO terms have been utilized in axioms within the Mondo Disease Ontology to represent diseases caused or influenced by environmental factors, as well as for survey encoding for the Personalized Environment and Genes Study (PEGS). 
CONCLUSIONS: We constructed ECTO to meet Open Biological and Biomedical Ontology (OBO) Foundry principles to increase translation opportunities between environmental health and other areas of biology. ECTO has a growing community of contributors consisting of toxicologists, public health epidemiologists, and health care providers to provide the necessary expertise for areas that have been identified previously as gaps.


Subject(s)
Biological Ontologies , Humans , Databases, Factual
12.
PLoS One ; 18(1): e0279968, 2023.
Article in English | MEDLINE | ID: mdl-36603014

ABSTRACT

BACKGROUND: While COVID-19 vaccines reduce adverse outcomes, post-vaccination SARS-CoV-2 infection remains problematic. We sought to identify community factors impacting risk for breakthrough infections (BTI) among fully vaccinated persons by rurality. METHODS: We conducted a retrospective cohort study of US adults sampled between January 1 and December 20, 2021, from the National COVID Cohort Collaborative (N3C). Using Kaplan-Meier and Cox-Proportional Hazards models adjusted for demographic differences and comorbid conditions, we assessed impact of rurality, county vaccine hesitancy, and county vaccination rates on risk of BTI over 180 days following two mRNA COVID-19 vaccinations between January 1 and September 21, 2021. Additionally, Cox Proportional Hazards models assessed the risk of infection among adults without documented vaccinations. We secondarily assessed the odds of hospitalization and adverse COVID-19 events based on vaccination status using multivariable logistic regression during the study period. RESULTS: Our study population included 566,128 vaccinated and 1,724,546 adults without documented vaccination. Among vaccinated persons, rurality was associated with an increased risk of BTI (adjusted hazard ratio [aHR] 1.53, 95% confidence interval [CI] 1.42-1.64, for urban-adjacent rural and 1.65, 1.42-1.91, for nonurban-adjacent rural) compared to urban dwellers. Compared to low vaccine-hesitant counties, higher risks of BTI were associated with medium (1.07, 1.02-1.12) and high (1.33, 1.23-1.43) vaccine-hesitant counties. Compared to counties with high vaccination rates, a higher risk of BTI was associated with dwelling in counties with low vaccination rates (1.34, 1.27-1.43) but not medium vaccination rates (1.00, 0.95-1.07). Community factors were also associated with higher odds of SARS-CoV-2 infection among persons without a documented vaccination. 
Vaccinated persons with SARS-CoV-2 infection during the study period had significantly lower odds of hospitalization and adverse events across all geographic areas and community exposures. CONCLUSIONS: Our findings suggest that community factors are associated with an increased risk of BTI, particularly in rural areas and counties with high vaccine hesitancy. Communities, such as those in rural and disproportionately vaccine hesitant areas, and certain groups at high risk for adverse breakthrough events, including immunosuppressed/compromised persons, should continue to receive public health focus, targeted interventions, and consistent guidance to help manage community spread as vaccination protection wanes.
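The abstract names Kaplan-Meier estimation as one of its time-to-event methods. The following is a minimal pure-Python sketch of the Kaplan-Meier product-limit estimator, included only to show what the method computes. The function name and the follow-up data are hypothetical, and the study's actual analysis also used covariate-adjusted Cox models that this sketch does not attempt.

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier survival estimate at each distinct event time.

    durations: follow-up time for each person (e.g. days since vaccination)
    events:    1 if the event (e.g. breakthrough infection) was observed,
               0 if the person was censored at that time
    Returns a list of (time, survival_probability) tuples.
    """
    data = sorted(zip(durations, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group everyone whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        # Both events and censored observations leave the risk set.
        n_at_risk -= n_leaving
    return curve


# Hypothetical follow-up data (days), for illustration only.
curve = kaplan_meier([30, 60, 60, 90, 180], [1, 0, 1, 1, 0])
for time, prob in curve:
    print(time, round(prob, 3))
```

Censored observations lower the number at risk without lowering the survival estimate, which is how the estimator handles people whose follow-up ends before the 180-day window closes.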


Subject(s)
COVID-19 , Humans , Adult , COVID-19/epidemiology , COVID-19/prevention & control , COVID-19 Vaccines , Retrospective Studies , SARS-CoV-2 , Breakthrough Infections , Vaccination
13.
EBioMedicine ; 87: 104413, 2023 Jan.
Article in English | MEDLINE | ID: mdl-36563487

ABSTRACT

BACKGROUND: Stratification of patients with post-acute sequelae of SARS-CoV-2 infection (PASC, or long COVID) would allow precision clinical management strategies. However, long COVID is incompletely understood and characterised by a wide range of manifestations that are difficult to analyse computationally. Additionally, the generalisability of machine learning classification of COVID-19 clinical outcomes has rarely been tested. METHODS: We present a method for computationally modelling PASC phenotype data based on electronic healthcare records (EHRs) and for assessing pairwise phenotypic similarity between patients using semantic similarity. Our approach defines a nonlinear similarity function that maps from a feature space of phenotypic abnormalities to a matrix of pairwise patient similarity that can be clustered using unsupervised machine learning. FINDINGS: We found six clusters of PASC patients, each with distinct profiles of phenotypic abnormalities, including clusters with distinct pulmonary, neuropsychiatric, and cardiovascular abnormalities, and a cluster associated with broad, severe manifestations and increased mortality. There was significant association of cluster membership with a range of pre-existing conditions and measures of severity during acute COVID-19. We assigned new patients from other healthcare centres to clusters by maximum semantic similarity to the original patients, and showed that the clusters were generalisable across different hospital systems. The increased mortality rate originally identified in one cluster was consistently observed in patients assigned to that cluster in other hospital systems. INTERPRETATION: Semantic phenotypic clustering provides a foundation for assigning patients to stratified subgroups for natural history or therapy studies on PASC. FUNDING: NIH (TR002306/OT2HL161847-01/OD011883/HG010860), U.S.D.O.E. (DE-AC02-05CH11231), Donald A. Roux Family Fund at Jackson Laboratory, Marsico Family at CU Anschutz.
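The pipeline described in this abstract (phenotype features per patient, a pairwise similarity matrix, then assignment of new patients by maximum similarity) can be sketched as follows. This is not the authors' method: it swaps in plain Jaccard overlap of term sets for the ontology-based semantic similarity the paper uses, and all patient identifiers, phenotype labels, and cluster names are hypothetical.

```python
def jaccard(terms_a, terms_b):
    """Jaccard overlap of two phenotype term sets (a stand-in similarity)."""
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity_matrix(patients):
    """Return (ordered patient ids, matrix of pairwise similarities)."""
    ids = sorted(patients)
    matrix = [[jaccard(patients[p], patients[q]) for q in ids] for p in ids]
    return ids, matrix


def assign_cluster(new_terms, patients, clusters):
    """Give a new patient the cluster of the most similar known patient."""
    best = max(patients, key=lambda p: jaccard(new_terms, patients[p]))
    return clusters[best]


# Hypothetical patients and cluster labels, for illustration only.
patients = {
    "p1": {"dyspnea", "cough"},
    "p2": {"dyspnea", "chest pain"},
    "p3": {"fatigue", "brain fog"},
}
clusters = {"p1": "pulmonary", "p2": "pulmonary", "p3": "neuropsychiatric"}

ids, matrix = similarity_matrix(patients)
print(assign_cluster({"brain fog", "insomnia"}, patients, clusters))
```

Jaccard overlap ignores the ontology hierarchy, so two closely related but non-identical terms score zero. An ontology-aware semantic similarity, as described in the abstract, gives such pairs partial credit, and the resulting matrix would then be passed to an unsupervised clustering algorithm.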


Subject(s)
COVID-19 , Post-Acute COVID-19 Syndrome , Humans , Disease Progression , SARS-CoV-2
14.
medRxiv ; 2022 Oct 07.
Article in English | MEDLINE | ID: mdl-36238713

ABSTRACT

Importance: Characterizing the effect of vaccination on long COVID allows for better healthcare recommendations. Objective: To determine if, and to what degree, vaccination prior to COVID-19 is associated with eventual long COVID onset, among those with a documented COVID-19 infection. Design, Setting, and Participants: Retrospective cohort study of adults with evidence of COVID-19 between August 1, 2021 and January 31, 2022 based on electronic health records from eleven healthcare institutions taking part in the NIH Researching COVID to Enhance Recovery (RECOVER) Initiative, a project of the National COVID Cohort Collaborative (N3C). Exposures: Pre-COVID-19 receipt of a complete vaccine series versus no pre-COVID-19 vaccination. Main Outcomes and Measures: Two approaches to the identification of long COVID were used. In the clinical diagnosis cohort (n=47,752), ICD-10 diagnosis codes or evidence of a healthcare encounter at a long COVID clinic were used. In the model-based cohort (n=199,498), a computable phenotype was used. The association between pre-COVID vaccination and long COVID was estimated using IPTW-adjusted logistic regression and Cox proportional hazards. Results: In both cohorts, when adjusting for demographics and medical history, pre-COVID vaccination was associated with a reduced risk of long COVID (clinic-based cohort: HR, 0.66; 95% CI, 0.55-0.80; OR, 0.69; 95% CI, 0.59-0.82; model-based cohort: HR, 0.62; 95% CI, 0.56-0.69; OR, 0.70; 95% CI, 0.65-0.75). Conclusions and Relevance: Long COVID has become a central concern for public health experts. Prior studies have considered the effect of vaccination on the prevalence of future long COVID symptoms, but ours is the first to thoroughly characterize the association between vaccination and clinically diagnosed or computationally derived long COVID. Our results bolster the growing consensus that vaccines retain protective effects against long COVID even in breakthrough infections.
Key Points: Question: Does vaccination prior to COVID-19 onset change the risk of long COVID diagnosis? Findings: Four observational analyses of EHRs showed a statistically significant reduction in long COVID risk associated with pre-COVID vaccination (first cohort: HR, 0.66; 95% CI, 0.55-0.80; OR, 0.69; 95% CI, 0.59-0.82; second cohort: HR, 0.62; 95% CI, 0.56-0.69; OR, 0.70; 95% CI, 0.65-0.75). Meaning: Vaccination prior to COVID onset has a protective association with long COVID even in the case of breakthrough infections.
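The abstract reports IPTW-adjusted estimates. As a hedged illustration of the weighting step only, the sketch below computes inverse probability of treatment weights from propensity scores that are assumed to have been estimated elsewhere (for example, by regressing vaccination status on demographics and medical history). The function name, the optional stabilization, and the values are assumptions for illustration, not details from the study.

```python
def iptw_weights(treated, propensity, stabilized=False):
    """Inverse probability of treatment weights.

    treated:    1 if the person received the exposure (e.g. vaccination), else 0
    propensity: estimated probability of exposure given covariates
    stabilized: if True, multiply by the marginal exposure probability
    """
    p_treated = sum(treated) / len(treated)
    weights = []
    for t, p in zip(treated, propensity):
        if not 0 < p < 1:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        # Exposed people are weighted by 1/p, unexposed by 1/(1 - p).
        w = 1 / p if t else 1 / (1 - p)
        if stabilized:
            w *= p_treated if t else (1 - p_treated)
        weights.append(w)
    return weights


# Hypothetical exposure indicators and propensity scores.
treated = [1, 0, 1, 0]
propensity = [0.8, 0.2, 0.5, 0.5]
print(iptw_weights(treated, propensity))
print(iptw_weights(treated, propensity, stabilized=True))
```

The weights would then be passed to a weighted outcome model, so that the exposed and unexposed groups resemble each other on the measured covariates. Unmeasured confounding is not addressed by this step.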

15.
Database (Oxford) ; 2022, 2022 10 08.
Article in English | MEDLINE | ID: mdl-36208225

ABSTRACT

Similar to managing software packages, managing the ontology life cycle involves multiple complex workflows such as preparing releases, continuous quality control checking and dependency management. To manage these processes, a diverse set of tools is required, from command-line utilities to powerful ontology-engineering environments. Particularly in the biomedical domain, which has developed a set of highly diverse yet inter-dependent ontologies, standardizing release practices and metadata and establishing shared quality standards are crucial to enable interoperability. The Ontology Development Kit (ODK) provides a set of standardized, customizable and automatically executable workflows, and packages all required tooling in a single Docker image. In this paper, we provide an overview of how the ODK works, show how it is used in practice and describe how we envision it driving standardization efforts in our community. Database URL: https://github.com/INCATools/ontology-development-kit.


Subject(s)
Biological Ontologies , Databases, Factual , Metadata , Quality Control , Software , Workflow
16.
medRxiv ; 2022 Sep 02.
Article in English | MEDLINE | ID: mdl-36093345

ABSTRACT

Background: Naming a newly discovered disease is a difficult process; in the context of the COVID-19 pandemic and the existence of post-acute sequelae of SARS-CoV-2 infection (PASC), which includes Long COVID, it has proven especially challenging. Disease definitions and assignment of a diagnosis code are often asynchronous and iterative. The clinical definition and our understanding of the underlying mechanisms of Long COVID are still in flux, and the deployment of an ICD-10-CM code for Long COVID in the US took nearly two years after patients had begun to describe their condition. Here we leverage the largest publicly available HIPAA-limited dataset about patients with COVID-19 in the US to examine the heterogeneity of adoption and use of U09.9, the ICD-10-CM code for "Post COVID-19 condition, unspecified." Methods: We undertook a number of analyses to characterize the N3C population with a U09.9 diagnosis code ( n = 21,072), including assessing person-level demographics and a number of area-level social determinants of health; diagnoses commonly co-occurring with U09.9, clustered using the Louvain algorithm; and quantifying medications and procedures recorded within 60 days of U09.9 diagnosis. We stratified all analyses by age group in order to discern differing patterns of care across the lifespan. Results: We established the diagnoses most commonly co-occurring with U09.9, and algorithmically clustered them into four major categories: cardiopulmonary, neurological, gastrointestinal, and comorbid conditions. Importantly, we discovered that the population of patients diagnosed with U09.9 is demographically skewed toward female, White, non-Hispanic individuals, as well as individuals living in areas with low poverty, high education, and high access to medical care. Our results also include a characterization of common procedures and medications associated with U09.9-coded patients. 
Conclusions: This work offers insight into potential subtypes and current practice patterns around Long COVID, and speaks to the existence of disparities in the diagnosis of patients with Long COVID. This latter finding in particular requires further research and urgent remediation.

17.
medRxiv; 2022 Aug 17.
Article in English | MEDLINE | ID: mdl-36032983

ABSTRACT

Background: More than one-third of individuals experience post-acute sequelae of SARS-CoV-2 infection (PASC, which includes long-COVID). Objective: To identify risk factors associated with PASC/long-COVID. Design: Retrospective case-control study. Setting: 31 health systems in the United States from the National COVID Cohort Collaborative (N3C). Patients: 8,325 individuals with PASC (defined by the presence of the International Classification of Diseases, version 10 code U09.9 or a long-COVID clinic visit) matched to 41,625 controls within the same health system. Measurements: Risk factors included demographics, comorbidities, and treatment and acute characteristics related to COVID-19. Multivariable logistic regression, random forest, and XGBoost were used to determine the associations between risk factors and PASC. Results: Among 8,325 individuals with PASC, the majority were >50 years of age (56.6%), female (62.8%), and non-Hispanic White (68.6%). In logistic regression, middle-age categories (40 to 69 years; OR ranging from 2.32 to 2.58), female sex (OR 1.4, 95% CI 1.33-1.48), hospitalization associated with COVID-19 (OR 3.8, 95% CI 3.05-4.73), long (8-30 days, OR 1.69, 95% CI 1.31-2.17) or extended hospital stay (30+ days, OR 3.38, 95% CI 2.45-4.67), receipt of mechanical ventilation (OR 1.44, 95% CI 1.18-1.74), and several comorbidities including depression (OR 1.50, 95% CI 1.40-1.60), chronic lung disease (OR 1.63, 95% CI 1.53-1.74), and obesity (OR 1.23, 95% CI 1.16-1.3) were associated with increased likelihood of PASC diagnosis or care at a long-COVID clinic. Characteristics associated with a lower likelihood of PASC diagnosis or care at a long-COVID clinic included younger age (18 to 29 years), male sex, non-Hispanic Black race, and comorbidities such as substance abuse, cardiomyopathy, psychosis, and dementia. More doctors per capita in the county of residence was associated with an increased likelihood of PASC diagnosis or care at a long-COVID clinic. 
Our findings were consistent in sensitivity analyses using a variety of analytic techniques and approaches to select controls. Conclusions: This national study identified important risk factors for PASC such as middle age, severe COVID-19 disease, and specific comorbidities. Further clinical and epidemiological research is needed to better understand underlying mechanisms and the potential role of vaccines and therapeutics in altering PASC course.
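The odds ratios and intervals reported above can be related back to logistic regression coefficients with a little arithmetic. The following is a minimal sketch, not the study's code: it assumes the reported intervals are Wald intervals (symmetric on the log-odds scale), which the abstract does not state, and the function names are hypothetical.

```python
import math

# Standard normal quantile for a two-sided 95% interval.
Z_95 = 1.959963984540054


def odds_ratio_with_wald_ci(beta, se, z=Z_95):
    """Convert a log-odds coefficient and its standard error to (OR, lower, upper)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def coefficient_from_reported_ci(lower, upper, z=Z_95):
    """Recover (beta, se) from a reported interval.

    Assumption: the interval is a Wald interval, so the coefficient is the
    midpoint of the interval on the log scale.
    """
    log_lower, log_upper = math.log(lower), math.log(upper)
    beta = (log_lower + log_upper) / 2.0
    se = (log_upper - log_lower) / (2.0 * z)
    return beta, se


# Reported in the abstract for female sex: OR 1.4, 95% CI 1.33-1.48.
beta, se = coefficient_from_reported_ci(1.33, 1.48)
odds_ratio, lower, upper = odds_ratio_with_wald_ci(beta, se)
print(f"OR {odds_ratio:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

Under the Wald assumption, the recovered point estimate is the geometric mean of the two interval bounds, which for the female-sex estimate is consistent with the reported OR of 1.4.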

18.
medRxiv; 2022 Aug 06.
Article in English | MEDLINE | ID: mdl-35982668

ABSTRACT

Objective: To define pregnancy episodes and estimate gestational aging within electronic health record (EHR) data from the National COVID Cohort Collaborative (N3C). Materials and Methods: We developed a comprehensive approach, named Hierarchy and rule-based pregnancy episode Inference integrated with Pregnancy Progression Signatures (HIPPS) and applied it to EHR data in the N3C from 1 January 2018 to 7 April 2022. HIPPS combines: 1) an extension of a previously published pregnancy episode algorithm, 2) a novel algorithm to detect gestational aging-specific signatures of a progressing pregnancy for further episode support, and 3) pregnancy start date inference. Clinicians performed validation of HIPPS on a subset of episodes. We then generated three types of pregnancy cohorts based on the level of precision for gestational aging and pregnancy outcomes for comparison of COVID-19 and other characteristics. Results: We identified 628,165 pregnant persons with 816,471 pregnancy episodes, of which 52.3% were live births, 24.4% were other outcomes (stillbirth, ectopic pregnancy, spontaneous abortions), and 23.3% had unknown outcomes. We were able to estimate start dates within one week of precision for 431,173 (52.8%) episodes. 66,019 (8.1%) episodes had incident COVID-19 during pregnancy. Across varying COVID-19 cohorts, patient characteristics were generally similar though pregnancy outcomes differed. Discussion: HIPPS provides support for pregnancy-related variables based on EHR data for researchers to define pregnancy cohorts. Our approach performed well based on clinician validation. Conclusion: We have developed a novel and robust approach for inferring pregnancy episodes and gestational aging that addresses data inconsistency and missingness in EHR data.

19.
medRxiv; 2022 Jul 20.
Article in English | MEDLINE | ID: mdl-35665012

ABSTRACT

Accurate stratification of patients with post-acute sequelae of SARS-CoV-2 infection (PASC, or long COVID) would allow precision clinical management strategies. However, the natural history of long COVID is incompletely understood and characterized by an extremely wide range of manifestations that are difficult to analyze computationally. In addition, the generalizability of machine learning classification of COVID-19 clinical outcomes has rarely been tested. We present a method for computationally modeling PASC phenotype data based on electronic healthcare records (EHRs) and for assessing pairwise phenotypic similarity between patients using semantic similarity. Our approach defines a nonlinear similarity function that maps from a feature space of phenotypic abnormalities to a matrix of pairwise patient similarity that can be clustered using unsupervised machine learning procedures. Using k-means clustering of this similarity matrix, we found six distinct clusters of PASC patients, each with distinct profiles of phenotypic abnormalities. There was a significant association of cluster membership with a range of pre-existing conditions and with measures of severity during acute COVID-19. Two of the clusters were associated with severe manifestations and displayed increased mortality. We assigned new patients from other healthcare centers to one of the six clusters on the basis of maximum semantic similarity to the original patients. We show that the identified clusters were generalizable across different hospital systems and that the increased mortality rate was consistently observed in two of the clusters. Semantic phenotypic clustering can provide a foundation for assigning patients to stratified subgroups for natural history or therapy studies on PASC.
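The clustering step described above can be illustrated with a small sketch. This is not the authors' implementation: the similarity values are invented toy numbers, each patient's row of the similarity matrix is treated as that patient's feature vector, the seeding rule is a deterministic farthest-point heuristic, and new patients are assigned by highest mean similarity to a cluster's members. All of these are assumptions chosen only to make the idea concrete.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(rows, k):
    """Deterministic seeding: start at row 0, then repeatedly add the row
    farthest from the centroids chosen so far (an assumption of this sketch)."""
    centroids = [list(rows[0])]
    while len(centroids) < k:
        farthest = max(
            rows, key=lambda r: min(squared_distance(r, c) for c in centroids)
        )
        centroids.append(list(farthest))
    return centroids


def kmeans(rows, k, max_iterations=100):
    """Plain Lloyd's k-means; returns one cluster label per row."""
    centroids = farthest_point_init(rows, k)
    labels = None
    for _ in range(max_iterations):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(row, centroids[c]))
            for row in rows
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def assign_new_patient(similarity_to_originals, labels, k):
    """Assign a new patient to the cluster with the highest mean similarity
    to its original members (one possible reading of 'maximum similarity')."""
    def mean_similarity(c):
        values = [s for s, lab in zip(similarity_to_originals, labels) if lab == c]
        return sum(values) / len(values) if values else float("-inf")
    return max(range(k), key=mean_similarity)


# Toy symmetric patient-by-patient similarity matrix (invented values).
similarity = [
    [1.0, 0.9, 0.8, 0.1, 0.2, 0.1],
    [0.9, 1.0, 0.85, 0.2, 0.1, 0.15],
    [0.8, 0.85, 1.0, 0.1, 0.2, 0.2],
    [0.1, 0.2, 0.1, 1.0, 0.9, 0.8],
    [0.2, 0.1, 0.2, 0.9, 1.0, 0.85],
    [0.1, 0.15, 0.2, 0.8, 0.85, 1.0],
]

labels = kmeans(similarity, k=2)
new_patient_cluster = assign_new_patient([0.15, 0.1, 0.2, 0.7, 0.9, 0.8], labels, k=2)
print(labels, new_patient_cluster)
```

On this toy matrix the first three patients and the last three patients fall into separate clusters, and the new patient joins the second group. The study itself reports six clusters on real data.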
