ABSTRACT
Genome sequencing has revolutionized the diagnosis of genetic diseases. Close collaborations between basic scientists and clinical genomicists are now needed to link genetic variants with disease causation. To facilitate such collaborations, we recommend prioritizing clinically relevant genes for functional studies, developing reference variant-phenotype databases, adopting phenotype description standards, and promoting data sharing.
Subjects
Biomedical Research; Genomics; Animals; DNA Mutational Analysis; Databases, Genetic; Disease/genetics; Human Genome Project; Humans; Information Dissemination; Models, Animal
ABSTRACT
A core task when establishing the strength of evidence for a gene's role in a monogenic disorder is determining the appropriate disease entity to curate. Establishing this concept determines which evidence can be applied and quantified toward the final gene-disease validity, variant pathogenicity, or actionability classification. Genes with implications in more than one phenotype can necessitate a process of lumping and splitting, disease reorganization, and updates to disease nomenclature. Reappraisal of the names that are used as labels for disease entities is therefore a necessary and perpetual process. The Clinical Genome Resource (ClinGen), in collaboration with representatives from Monarch Disease Ontology (Mondo) and Online Mendelian Inheritance in Man (OMIM), formed the Disease Naming Advisory Committee (DNAC) to develop guidance for groups faced with the need to establish the "curated disease entity" for gene-phenotype validity and variant pathogenicity and to update disease names for clinical use when necessary. The objective of this group was to harmonize guidance for disease naming across these nosologic entities and among ClinGen curation groups in collaboration with other disease-related professional groups. Here, we present the initial guidance developed by the DNAC with representative examples provided by the ClinGen expert panels and working groups that warranted nomenclature updates. We also discuss the broader implications of these efforts and their benefits for harmonization of gene-disease validity curation. Overall, this work sheds light on current inconsistencies and/or discrepancies and is designed to engage the broader community on how ClinGen defines monogenic disorders using a consistent approach for disease naming.
Subjects
Genetic Diseases, Inborn; Terminology as Topic; Humans; Genetic Diseases, Inborn/genetics; Databases, Genetic; Phenotype
ABSTRACT
Bridging the gap between genetic variations, environmental determinants, and phenotypic outcomes is critical for supporting clinical diagnosis and understanding mechanisms of diseases. It requires integrating open data at a global scale. The Monarch Initiative advances these goals by developing open ontologies, semantic data models, and knowledge graphs for translational research. The Monarch App is an integrated platform combining data about genes, phenotypes, and diseases across species. Monarch's APIs enable access to carefully curated datasets and advanced analysis tools that support the understanding and diagnosis of disease for diverse applications such as variant prioritization, deep phenotyping, and patient profile-matching. We have migrated our system into a scalable, cloud-based infrastructure; simplified Monarch's data ingestion and knowledge graph integration systems; enhanced data mapping and integration standards; and developed a new user interface with novel search and graph navigation features. Furthermore, we advanced Monarch's analytic tools by developing a customized plugin for OpenAI's ChatGPT to increase the reliability of its responses about phenotypic data, allowing us to interrogate the knowledge in the Monarch graph using state-of-the-art Large Language Models. The resources of the Monarch Initiative can be found at monarchinitiative.org and its corresponding code repository at github.com/monarch-initiative/monarch-app.
Subjects
Databases, Factual; Disease; Genes; Phenotype; Humans; Internet; Databases, Factual/standards; Software; Genes/genetics; Disease/genetics
ABSTRACT
MOTIVATION: Creating knowledge bases and ontologies is a time consuming task that relies on manual curation. AI/NLP approaches can assist expert curators in populating these knowledge bases, but current approaches rely on extensive training data, and are not able to populate arbitrarily complex nested knowledge schemas. RESULTS: Here we present Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES), a Knowledge Extraction approach that relies on the ability of Large Language Models (LLMs) to perform zero-shot learning and general-purpose query answering from flexible prompts and return information conforming to a specified schema. Given a detailed, user-defined knowledge schema and an input text, SPIRES recursively performs prompt interrogation against an LLM to obtain a set of responses matching the provided schema. SPIRES uses existing ontologies and vocabularies to provide identifiers for matched elements. We present examples of applying SPIRES in different domains, including extraction of food recipes, multi-species cellular signaling pathways, disease treatments, multi-step drug mechanisms, and chemical to disease relationships. Current SPIRES accuracy is comparable to the mid-range of existing Relation Extraction methods, but greatly surpasses an LLM's native capability of grounding entities with unique identifiers. SPIRES has the advantage of easy customization, flexibility, and, crucially, the ability to perform new tasks in the absence of any new training data. This method supports a general strategy of leveraging the language interpreting capabilities of LLMs to assemble knowledge bases, assisting manual knowledge curation and acquisition while supporting validation with publicly-available databases and ontologies external to the LLM. AVAILABILITY AND IMPLEMENTATION: SPIRES is available as part of the open source OntoGPT package: https://github.com/monarch-initiative/ontogpt.
Subjects
Knowledge Bases; Semantics; Databases, Factual
ABSTRACT
A critical challenge in genetic diagnostics is the computational assessment of candidate splice variants, specifically the interpretation of nucleotide changes located outside of the highly conserved dinucleotide sequences at the 5' and 3' ends of introns. To address this gap, we developed the Super Quick Information-content Random-forest Learning of Splice variants (SQUIRLS) algorithm. SQUIRLS generates a small set of interpretable features for machine learning by calculating the information-content of wild-type and variant sequences of canonical and cryptic splice sites, assessing changes in candidate splicing regulatory sequences, and incorporating characteristics of the sequence such as exon length, disruptions of the AG exclusion zone, and conservation. We curated a comprehensive collection of disease-associated splice-altering variants at positions outside of the highly conserved AG/GT dinucleotides at the termini of introns. SQUIRLS trains two random-forest classifiers for the donor and for the acceptor and combines their outputs by logistic regression to yield a final score. We show that SQUIRLS transcends previous state-of-the-art accuracy in classifying splice variants as assessed by rank analysis in simulated exomes, and is significantly faster than competing methods. SQUIRLS provides tabular output files for incorporation into diagnostic pipelines for exome and genome analysis, as well as visualizations that contextualize predicted effects of variants on splicing to make it easier to interpret splice variants in diagnostic settings.
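As a rough illustration of the information-content feature described above, the sketch below scores a wild-type and a variant sequence against a position frequency matrix, using the individual-information formula for DNA (2 + log2 of the base frequency at each position). The three-position matrix, its frequencies, and the function names are invented for illustration and are not taken from SQUIRLS.

```python
import math

# Hypothetical per-position base frequencies for a 3-nt toy splice-site model.
# Real models are longer and are estimated from annotated splice sites.
TOY_FREQS = [
    {"A": 0.5, "C": 0.25, "G": 0.125, "T": 0.125},
    {"A": 0.125, "C": 0.125, "G": 0.5, "T": 0.25},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
]


def individual_information(seq, freqs=TOY_FREQS):
    """Return the summed information (in bits) of seq against the matrix.

    Assumes every frequency is nonzero; a real matrix would need pseudocounts.
    """
    if len(seq) != len(freqs):
        raise ValueError("sequence length must match the matrix length")
    return sum(2.0 + math.log2(freqs[i][base]) for i, base in enumerate(seq))


def delta_information(wild_type, variant, freqs=TOY_FREQS):
    """Change in information caused by the variant (negative = weaker site)."""
    return individual_information(variant, freqs) - individual_information(wild_type, freqs)


wt_bits = individual_information("AGT")
var_bits = individual_information("ATT")
delta_bits = delta_information("AGT", "ATT")
print(wt_bits, var_bits, delta_bits)
```

A negative difference suggests that the variant weakens the modeled site; in a pipeline of the kind described, such a difference would be one input feature among several rather than a verdict on its own.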
Subjects
Algorithms; Data Curation/methods; Genetic Diseases, Inborn/genetics; RNA Splice Sites; RNA Splicing; Software; Base Sequence; Computational Biology/methods; Exome; Exons; Genetic Diseases, Inborn/diagnosis; Genetic Diseases, Inborn/pathology; High-Throughput Nucleotide Sequencing; Humans; Introns; Mutation; Exome Sequencing
ABSTRACT
MOTIVATION: Knowledge graphs (KGs) are a powerful approach for integrating heterogeneous data and making inferences in biology and many other domains, but a coherent solution for constructing, exchanging, and facilitating the downstream use of KGs is lacking. RESULTS: Here we present KG-Hub, a platform that enables standardized construction, exchange, and reuse of KGs. Features include a simple, modular extract-transform-load pattern for producing graphs compliant with Biolink Model (a high-level data model for standardizing biological data), easy integration of any OBO (Open Biological and Biomedical Ontologies) ontology, cached downloads of upstream data sources, versioned and automatically updated builds with stable URLs, web-browsable storage of KG artifacts on cloud infrastructure, and easy reuse of transformed subgraphs across projects. Current KG-Hub projects span use cases including COVID-19 research, drug repurposing, microbial-environmental interactions, and rare disease research. KG-Hub is equipped with tooling to easily analyze and manipulate KGs. KG-Hub is also tightly integrated with graph machine learning (ML) tools which allow automated graph ML, including node embeddings and training of models for link prediction and node classification. AVAILABILITY AND IMPLEMENTATION: https://kghub.org.
Subjects
Biological Ontologies; COVID-19; Humans; Pattern Recognition, Automated; Rare Diseases; Machine Learning
ABSTRACT
Decades of reductionist approaches in biology have achieved spectacular progress, but the proliferation of subdisciplines, each with its own technical and social practices regarding data, impedes the growth of the multidisciplinary and interdisciplinary approaches now needed to address pressing societal challenges. Data integration is key to a reintegrated biology able to address global issues such as climate change, biodiversity loss, and sustainable ecosystem management. We identify major challenges to data integration and present a vision for a "Data as a Service"-oriented architecture to promote reuse of data for discovery. The proposed architecture includes standards development, new tools and services, and strategies for career-development and sustainability.
Subjects
Data Management/methods; Information Dissemination/methods; Interdisciplinary Research/trends; Biodiversity; Biological Science Disciplines; Conservation of Natural Resources; Ecosystem; Interdisciplinary Communication; Interdisciplinary Research/methods
ABSTRACT
OBJECTIVE: Clinical deep phenotyping and phenotype annotation play a critical role in both the diagnosis of patients with rare disorders as well as in building computationally-tractable knowledge in the rare disorders field. These processes rely on using ontology concepts, often from the Human Phenotype Ontology, in conjunction with a phenotype concept recognition task (supported usually by machine learning methods) to curate patient profiles or existing scientific literature. With the significant shift in the use of large language models (LLMs) for most NLP tasks, we examine the performance of the latest Generative Pre-trained Transformer (GPT) models underpinning ChatGPT as a foundation for the tasks of clinical phenotyping and phenotype annotation. MATERIALS AND METHODS: The experimental setup of the study included seven prompts of various levels of specificity, two GPT models (gpt-3.5-turbo and gpt-4.0) and two established gold standard corpora for phenotype recognition, one consisting of publication abstracts and the other clinical observations. RESULTS: The best run, using in-context learning, achieved 0.58 document-level F1 score on publication abstracts and 0.75 document-level F1 score on clinical observations, as well as a mention-level F1 score of 0.7, which surpasses the current best in class tool. Without in-context learning, however, performance is significantly below the existing approaches. CONCLUSION: Our experiments show that gpt-4.0 surpasses the state of the art performance if the task is constrained to a subset of the target ontology where there is prior knowledge of the terms that are expected to be matched. While the results are promising, the non-deterministic nature of the outcomes, the high cost and the lack of concordance between different runs using the same prompt and input make the use of these LLMs challenging for this particular task.
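For readers unfamiliar with the metric, a document-level F1 score can be sketched as a comparison between the set of ontology identifiers predicted for a document and the gold-standard set. The abstract does not state how scores were averaged across documents, so the snippet below covers a single document only; the HPO identifiers and the function name are illustrative and not drawn from the study's corpora.

```python
def document_f1(predicted: set, gold: set) -> float:
    """F1 for one document: harmonic mean of precision and recall over term IDs."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Illustrative identifiers only; not taken from the evaluated corpora.
gold_terms = {"HP:0001250", "HP:0001263", "HP:0000252"}
predicted_terms = {"HP:0001250", "HP:0000252", "HP:0004322"}

score = document_f1(predicted_terms, gold_terms)
print(round(score, 3))
```

Here two of three predictions are correct and two of three gold terms are recovered, so precision and recall are both 2/3 and so is the F1.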
Subjects
Knowledge; Language; Humans; Machine Learning; Phenotype; Rare Diseases
ABSTRACT
Human Phenotype Ontology (HPO)-based analysis has become standard for genomic diagnostics of rare diseases. Current algorithms use a variety of semantic and statistical approaches to prioritize the typically long lists of genes with candidate pathogenic variants. These algorithms do not provide robust estimates of the strength of the predictions beyond the placement in a ranked list, nor do they provide measures of how much any individual phenotypic observation has contributed to the prioritization result. However, given that the overall success rate of genomic diagnostics is only around 25%-50% or less in many cohorts, a good ranking cannot be taken to imply that the gene or disease at rank one is necessarily a good candidate. Here, we present an approach to genomic diagnostics that exploits the likelihood ratio (LR) framework to provide an estimate of (1) the posttest probability of candidate diagnoses, (2) the LR for each observed HPO phenotype, and (3) the predicted pathogenicity of observed genotypes. LIkelihood Ratio Interpretation of Clinical AbnormaLities (LIRICAL) placed the correct diagnosis within the first three ranks in 92.9% of 384 case reports comprising 262 Mendelian diseases, and the correct diagnosis had a mean posttest probability of 67.3%. Simulations show that LIRICAL is robust to many typically encountered forms of genomic and phenomic noise. In summary, LIRICAL provides accurate, clinically interpretable results for phenotype-driven genomic diagnostics.
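The likelihood ratio framework mentioned above rests on a standard identity: posttest odds equal pretest odds multiplied by the likelihood ratio of each independent observation. The sketch below shows only that arithmetic. The pretest probability and the individual ratios are invented numbers, and the way LIRICAL itself derives each ratio is not described in this abstract.

```python
from math import prod


def posttest_probability(pretest_prob: float, likelihood_ratios: list) -> float:
    """Combine a pretest probability with independent likelihood ratios."""
    if not 0.0 < pretest_prob < 1.0:
        raise ValueError("pretest probability must lie strictly between 0 and 1")
    pretest_odds = pretest_prob / (1.0 - pretest_prob)
    posttest_odds = pretest_odds * prod(likelihood_ratios)
    return posttest_odds / (1.0 + posttest_odds)


# Hypothetical case: a rare disease (pretest 0.1%) and three observations,
# each with an assumed likelihood ratio.
ratios = [50.0, 20.0, 4.0]
probability = posttest_probability(0.001, ratios)
print(round(probability, 3))
```

Because each observation contributes its own factor, a ratio above 1 can be read as evidence for the candidate diagnosis and a ratio below 1 as evidence against it, which is what makes the per-phenotype contributions interpretable.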
Subjects
Computational Biology; Databases, Genetic; Genomics; Rare Diseases/diagnosis; Algorithms; Exome/genetics; Humans; Phenotype; Rare Diseases/genetics; Software
ABSTRACT
BACKGROUND: Naming a newly discovered disease is a difficult process; in the context of the COVID-19 pandemic and the existence of post-acute sequelae of SARS-CoV-2 infection (PASC), which includes long COVID, it has proven especially challenging. Disease definitions and assignment of a diagnosis code are often asynchronous and iterative. The clinical definition and our understanding of the underlying mechanisms of long COVID are still in flux, and the deployment of an ICD-10-CM code for long COVID in the USA took nearly 2 years after patients had begun to describe their condition. Here, we leverage the largest publicly available HIPAA-limited dataset about patients with COVID-19 in the US to examine the heterogeneity of adoption and use of U09.9, the ICD-10-CM code for "Post COVID-19 condition, unspecified." METHODS: We undertook a number of analyses to characterize the N3C population with a U09.9 diagnosis code (n = 33,782), including assessing person-level demographics and a number of area-level social determinants of health; diagnoses commonly co-occurring with U09.9, clustered using the Louvain algorithm; and quantifying medications and procedures recorded within 60 days of U09.9 diagnosis. We stratified all analyses by age group in order to discern differing patterns of care across the lifespan. RESULTS: We established the diagnoses most commonly co-occurring with U09.9 and algorithmically clustered them into four major categories: cardiopulmonary, neurological, gastrointestinal, and comorbid conditions. Importantly, we discovered that the population of patients diagnosed with U09.9 is demographically skewed toward female, White, non-Hispanic individuals, as well as individuals living in areas with low poverty and low unemployment. Our results also include a characterization of common procedures and medications associated with U09.9-coded patients. 
CONCLUSIONS: This work offers insight into potential subtypes and current practice patterns around long COVID and speaks to the existence of disparities in the diagnosis of patients with long COVID. This latter finding in particular requires further research and urgent remediation.
Subjects
COVID-19; Post-Acute COVID-19 Syndrome; Humans; Female; International Classification of Diseases; Pandemics; COVID-19/diagnosis; COVID-19/epidemiology; SARS-CoV-2
ABSTRACT
Existing phenotype ontologies were originally developed to represent phenotypes that manifest as a character state in relation to a wild-type or other reference. However, these do not include the phenotypic trait or attribute categories required for the annotation of genome-wide association studies (GWAS), Quantitative Trait Loci (QTL) mappings or any population-focussed measurable trait data. The integration of trait and biological attribute information with an ever increasing body of chemical, environmental and biological data greatly facilitates computational analyses and it is also highly relevant to biomedical and clinical applications. The Ontology of Biological Attributes (OBA) is a formalised, species-independent collection of interoperable phenotypic trait categories that is intended to fulfil a data integration role. OBA is a standardised representational framework for observable attributes that are characteristics of biological entities, organisms, or parts of organisms. OBA has a modular design which provides several benefits for users and data integrators, including an automated and meaningful classification of trait terms computed on the basis of logical inferences drawn from domain-specific ontologies for cells, anatomical and other relevant entities. The logical axioms in OBA also provide a previously missing bridge that can computationally link Mendelian phenotypes with GWAS and quantitative traits. The term components in OBA provide semantic links and enable knowledge and data integration across specialised research community boundaries, thereby breaking silos.
Subjects
Biological Ontologies; Biological Science Disciplines; Genome-Wide Association Study; Phenotype
ABSTRACT
The Human Phenotype Ontology (HPO, https://hpo.jax.org) was launched in 2008 to provide a comprehensive logical standard to describe and computationally analyze phenotypic abnormalities found in human disease. The HPO is now a worldwide standard for phenotype exchange. The HPO has grown steadily since its inception due to considerable contributions from clinical experts and researchers from a diverse range of disciplines. Here, we present recent major extensions of the HPO for neurology, nephrology, immunology, pulmonology, newborn screening, and other areas. For example, the seizure subontology now reflects the International League Against Epilepsy (ILAE) guidelines and these enhancements have already shown clinical validity. We present new efforts to harmonize computational definitions of phenotypic abnormalities across the HPO and multiple phenotype ontologies used for animal models of disease. These efforts will benefit software such as Exomiser by improving the accuracy and scope of cross-species phenotype matching. The computational modeling strategy used by the HPO to define disease entities and phenotypic features and distinguish between them is explained in detail. We also report on recent efforts to translate the HPO into indigenous languages. Finally, we summarize recent advances in the use of HPO in electronic health record systems.
Subjects
Biological Ontologies; Computational Biology/methods; Databases, Factual; Disease/genetics; Genome; Phenotype; Software; Animals; Disease Models, Animal; Genotype; Humans; Infant, Newborn; International Cooperation; Internet; Neonatal Screening/methods; Pharmacogenetics/methods; Terminology as Topic
ABSTRACT
BACKGROUND: More than one-third of individuals experience post-acute sequelae of SARS-CoV-2 infection (PASC, which includes long-COVID). The objective was to identify risk factors associated with PASC/long-COVID diagnosis. METHODS: This was a retrospective case-control study including 31 health systems in the United States from the National COVID Cohort Collaborative (N3C). A total of 8,325 individuals with PASC (defined by the presence of the International Classification of Diseases, version 10 code U09.9 or a long-COVID clinic visit) were matched to 41,625 controls within the same health system whose COVID index date fell within ± 45 days of the corresponding case's earliest COVID index date. Measurements of risk factors included demographics, comorbidities, treatment and acute characteristics related to COVID-19. Multivariable logistic regression, random forest, and XGBoost were used to determine the associations between risk factors and PASC. RESULTS: Among 8,325 individuals with PASC, the majority were > 50 years of age (56.6%), female (62.8%), and non-Hispanic White (68.6%). In logistic regression, middle-age categories (40 to 69 years; OR ranging from 2.32 to 2.58), female sex (OR 1.4, 95% CI 1.33-1.48), hospitalization associated with COVID-19 (OR 3.8, 95% CI 3.05-4.73), long (8-30 days, OR 1.69, 95% CI 1.31-2.17) or extended hospital stay (30+ days, OR 3.38, 95% CI 2.45-4.67), receipt of mechanical ventilation (OR 1.44, 95% CI 1.18-1.74), and several comorbidities including depression (OR 1.50, 95% CI 1.40-1.60), chronic lung disease (OR 1.63, 95% CI 1.53-1.74), and obesity (OR 1.23, 95% CI 1.16-1.3) were associated with increased likelihood of PASC diagnosis or care at a long-COVID clinic. Characteristics associated with a lower likelihood of PASC diagnosis or care at a long-COVID clinic included younger age (18 to 29 years), male sex, non-Hispanic Black race, and comorbidities such as substance abuse, cardiomyopathy, psychosis, and dementia. 
More doctors per capita in the county of residence was associated with an increased likelihood of PASC diagnosis or care at a long-COVID clinic. Our findings were consistent in sensitivity analyses using a variety of analytic techniques and approaches to select controls. CONCLUSIONS: This national study identified important risk factors for PASC diagnosis such as middle age, severe COVID-19 disease, and specific comorbidities. Further clinical and epidemiological research is needed to better understand underlying mechanisms and the potential role of vaccines and therapeutics in altering PASC course.
Subjects
COVID-19; SARS-CoV-2; Middle Aged; Female; Male; Humans; Adult; Aged; Adolescent; Young Adult; COVID-19/epidemiology; Post-Acute COVID-19 Syndrome; Case-Control Studies; Retrospective Studies; Risk Factors; Disease Progression
ABSTRACT
The clinical evaluation of a genetic syndrome relies upon recognition of a characteristic pattern of signs or symptoms to guide targeted genetic testing for confirmation of the diagnosis. However, individuals displaying a single phenotype of a complex syndrome may not meet criteria for clinical diagnosis or genetic testing. Here, we present a phenome-wide association study (PheWAS) approach to systematically explore the phenotypic expressivity of common and rare alleles in genes associated with four well-described syndromic diseases (Alagille (AS), Marfan (MS), DiGeorge (DS), and Noonan (NS) syndromes) in the general population. Using Human Phenotype Ontology (HPO) terms, we systematically mapped 60 phenotypes related to AS, MS, DS and NS in 337,198 unrelated white British individuals from the UK Biobank (UKBB) based on their hospital admission records, self-administered questionnaires, and physiological measurements. We performed logistic regression adjusting for age, sex, and the first 5 genetic principal components, for each phenotype and each variant in the target genes (JAG1, NOTCH2, FBN1, PTPN11 and RASopathy genes, and genes in the 22q11.2 locus) and performed a gene burden test. Overall, we observed multiple phenotype-genotype correlations, such as the association between variation in JAG1, FBN1, PTPN11 and SOS2 with diastolic and systolic blood pressure; and pleiotropy among multiple variants in syndromic genes. For example, rs11066309 in PTPN11 was significantly associated with a lower body mass index, an increased risk of hypothyroidism and a smaller size for gestational age, all in concordance with NS-related phenotypes. Similarly, rs589668 in FBN1 was associated with an increase in body height and blood pressure, and a reduced body fat percentage as observed in Marfan syndrome. 
Our findings suggest that the spectrum of associations of common and rare variants in genes involved in syndromic diseases can be extended to individual phenotypes within the general population.
Subjects
Biological Variation, Population/genetics; Genetic Association Studies/methods; Genome-Wide Association Study/methods; Alagille Syndrome/genetics; Alleles; DiGeorge Syndrome/genetics; Female; Gene Frequency/genetics; Genetic Predisposition to Disease/genetics; Genetic Testing/methods; Genetic Variation/genetics; Humans; Male; Marfan Syndrome/genetics; Noonan Syndrome/genetics; Phenotype; Polymorphism, Single Nucleotide/genetics; United Kingdom; White People/genetics
ABSTRACT
Technological advances in both genome sequencing and prenatal imaging are increasing our ability to accurately recognize and diagnose Mendelian conditions prenatally. Phenotype-driven early genetic diagnosis of fetal genetic disease can help to strategize treatment options and clinical preventive measures during the perinatal period, to plan in utero therapies, and to inform parental decision-making. Fetal phenotypes of genetic diseases are often unique and at present are not well understood; more comprehensive knowledge about prenatal phenotypes and computational resources have an enormous potential to improve diagnostics and translational research. The Human Phenotype Ontology (HPO) has been widely used to support diagnostics and translational research in human genetics. To better support prenatal usage, the HPO consortium conducted a series of workshops with a group of domain experts in a variety of medical specialties, diagnostic techniques, as well as diseases and phenotypes related to prenatal medicine, including perinatal pathology, musculoskeletal anomalies, neurology, medical genetics, hydrops fetalis, craniofacial malformations, cardiology, neonatal-perinatal medicine, fetal medicine, placental pathology, prenatal imaging, and bioinformatics. We expanded the representation of prenatal phenotypes in HPO by adding 95 new phenotype terms under the Abnormality of prenatal development or birth (HP:0001197) grouping term, and revised definitions, synonyms, and disease annotations for most of the 152 terms that existed before the beginning of this effort. The expansion of prenatal phenotypes in HPO will support phenotype-driven prenatal exome and genome sequencing for precision genetic diagnostics of rare diseases to support prenatal care.
Subjects
Computational Biology; Placenta; Infant, Newborn; Humans; Female; Pregnancy; Computational Biology/methods; Phenotype; Rare Diseases; Exome Sequencing
ABSTRACT
PURPOSE: Genomic test results, regardless of laboratory variant classification, require clinical practitioners to judge the applicability of a variant for medical decisions. Teaching and standardizing clinical interpretation of genomic variation calls for a methodology or tool. METHODS: To generate such a tool, we distilled the Clinical Genome Resource framework of causality and the American College of Medical Genetics/Association of Molecular Pathology and Quest Diagnostic Laboratory scoring of variant deleteriousness into the Clinical Variant Analysis Tool (CVAT). Applying this to 289 clinical exome reports, we compared the performance of junior practitioners with that of experienced medical geneticists and assessed the utility of reported variants. RESULTS: CVAT enabled performance comparable to that of experienced medical geneticists. In total, 124 of 289 (42.9%) exome reports and 146 of 382 (38.2%) reported variants supported a diagnosis. Overall, 10.5% (1 pathogenic [P] or likely pathogenic [LP] variant and 39 variants of uncertain significance [VUS]) of variants were reported in genes without established disease association; 20.2% (23 P/LP and 54 VUS) were in genes without sufficient phenotypic concordance; 7.3% (15 P/LP and 13 VUS) conflicted with the known molecular disease mechanism; and 24% (91 VUS) had insufficient evidence for deleteriousness. CONCLUSION: Implementation of CVAT standardized clinical interpretation of genomic variation and emphasized the need for collaborative and transparent reporting of genomic variation.
Subjects
Genetic Testing; Genetic Variation; Exome; Genetic Testing/methods; Genetic Variation/genetics; Genomics/methods; Humans; Exome Sequencing
ABSTRACT
BACKGROUND: Non-steroidal anti-inflammatory drugs (NSAIDs) are commonly used to reduce pain, fever, and inflammation but have been associated with complications in community-acquired pneumonia. Observations shortly after the start of the COVID-19 pandemic in 2020 suggested that ibuprofen was associated with an increased risk of adverse events in COVID-19 patients, but subsequent observational studies failed to demonstrate increased risk and in one case showed reduced risk associated with NSAID use. METHODS: A 38-center retrospective cohort study was performed that leveraged the harmonized, high-granularity electronic health record data of the National COVID Cohort Collaborative. A propensity-matched cohort of 19,746 COVID-19 inpatients was constructed by matching cases (treated with NSAIDs at the time of admission) and 19,746 controls (not treated) from 857,061 patients with COVID-19 available for analysis. The primary outcome of interest was COVID-19 severity in hospitalized patients, which was classified as: moderate, severe, or mortality/hospice. Secondary outcomes were acute kidney injury (AKI), extracorporeal membrane oxygenation (ECMO), invasive ventilation, and all-cause mortality at any time following COVID-19 diagnosis. RESULTS: Logistic regression showed that NSAID use was not associated with increased COVID-19 severity (OR: 0.57 95% CI: 0.53-0.61). Analysis of secondary outcomes using logistic regression showed that NSAID use was not associated with increased risk of all-cause mortality (OR 0.51 95% CI: 0.47-0.56), invasive ventilation (OR: 0.59 95% CI: 0.55-0.64), AKI (OR: 0.67 95% CI: 0.63-0.72), or ECMO (OR: 0.51 95% CI: 0.36-0.7). In contrast, the odds ratios indicate reduced risk of these outcomes, but our quantitative bias analysis showed E-values of between 1.9 and 3.3 for these associations, indicating that comparatively weak or moderate confounder associations could explain away the observed associations. 
CONCLUSIONS: Study interpretation is limited by the observational design. Recording of NSAID use may have been incomplete. Our study demonstrates that NSAID use is not associated with increased COVID-19 severity, all-cause mortality, invasive ventilation, AKI, or ECMO in COVID-19 inpatients. A conservative interpretation in light of the quantitative bias analysis is that there is no evidence that NSAID use is associated with risk of increased severity or the other measured outcomes. Our results confirm and extend analogous findings in previous observational studies using a large cohort of patients drawn from 38 centers in a nationally representative multicenter database.
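To make the quantitative bias analysis more concrete, the sketch below applies the standard E-value formula of VanderWeele and Ding: for a risk ratio RR above 1, E = RR + sqrt(RR × (RR − 1)), with protective estimates inverted first. Treating the reported odds ratios as approximate risk ratios is an assumption of this sketch; the abstract does not say which conversion, if any, the authors applied, so the output should not be read as a reproduction of their analysis.

```python
import math


def e_value(ratio: float) -> float:
    """E-value for a point estimate on the risk-ratio scale.

    Estimates below 1 are inverted so that the formula is applied to a
    ratio of at least 1. Passing an odds ratio here treats it as an
    approximate risk ratio, which is an assumption of this sketch.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    rr = 1.0 / ratio if ratio < 1.0 else ratio
    return rr + math.sqrt(rr * (rr - 1.0))


# Point estimate for all-cause mortality quoted in the abstract (OR 0.51).
mortality_e = e_value(0.51)
print(round(mortality_e, 2))
```

Under that assumption the mortality estimate yields an E-value of about 3.3. An unmeasured confounder would need associations of roughly that strength with both NSAID use and the outcome to fully explain away the observed association.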
Subjects
Acute Kidney Injury; COVID-19; Anti-Inflammatory Agents, Non-Steroidal/adverse effects; COVID-19 Testing; Cohort Studies; Humans; Pandemics; Retrospective Studies
ABSTRACT
The rapidly decreasing cost of gene sequencing has resulted in a deluge of genomic data from across the tree of life; however, outside a few model organism databases, genomic data are limited in their scientific impact because they are not accompanied by computable phenomic data. The majority of phenomic data are contained in countless small, heterogeneous phenotypic data sets that are very difficult or impossible to integrate at scale because of variable formats, lack of digitization, and linguistic problems. One powerful solution is to represent phenotypic data using data models with precise, computable semantics, but adoption of semantic standards for representing phenotypic data has been slow, especially in biodiversity and ecology. Some phenotypic and trait data are available in a semantic language from knowledge bases, but these are often not interoperable. In this review, we will compare and contrast existing ontology and data models, focusing on nonhuman phenotypes and traits. We discuss barriers to integration of phenotypic data and make recommendations for developing an operationally useful, semantically interoperable phenotypic data ecosystem.
Subjects
Databases, Genetic, Knowledge Bases, Phenomics, Animals, Classification, Computational Biology, Ecosystem, Gene-Environment Interaction, Humans, Models, Biological, Models, Genetic, Models, Statistical, Phenotype, Semantics
ABSTRACT
BACKGROUND: 16p13.11 microduplication syndrome has a variable presentation and is characterized primarily by neurodevelopmental and physical phenotypes resulting from copy number variation at chromosome 16p13.11. Given its variability, there may be features that have not yet been reported. The goal of this study was to use a patient "self-phenotyping" survey to collect data directly from patients to further characterize the phenotypes of 16p13.11 microduplication syndrome. OBJECTIVE: This study aimed to (1) discover self-identified phenotypes in 16p13.11 microduplication syndrome that have been underrepresented in the scientific literature and (2) demonstrate that self-phenotyping tools are valuable sources of data for the medical and scientific communities. METHODS: As part of a large study to compare and evaluate patient self-phenotyping surveys, an online survey tool, Phenotypr, was developed for patients with rare disorders to self-report phenotypes. Participants with 16p13.11 microduplication syndrome were recruited through the Boston Children's Hospital 16p13.11 Registry. The survey was completed either by the caregiver, parent, or legal guardian of an affected child or by the affected person (if aged 18 years or above). Results were securely transferred to a Research Electronic Data Capture database and aggregated for analysis. RESULTS: A total of 19 participants enrolled in the study. Notably, among the 19 participants, aggression and anxiety were mentioned by 3 (16%) and 4 (21%) participants, respectively, which is an increase over the numbers in previously published literature. Additionally, among the 19 participants, 3 (16%) had asthma and 2 (11%) had other immunological disorders, neither of which has been previously described in the syndrome. CONCLUSIONS: Several phenotypes might be underrepresented in the previous 16p13.11 microduplication literature, and new possible phenotypes have been identified. Whenever possible, patients should continue to be referenced as a source of complete phenotyping data on their condition. Self-phenotyping may lead to a better understanding of the prevalence of phenotypes in genetic disorders and may identify previously unreported phenotypes.
Subjects
DNA Copy Number Variations, Family, Biological Variation, Population, Cohort Studies, Humans, Phenotype
ABSTRACT
In many disciplines, data are highly decentralized across thousands of online databases (repositories, registries, and knowledgebases). Wringing value from such databases depends on the discipline of data science and on the humble bricks and mortar that make integration possible; identifiers are a core component of this integration infrastructure. Drawing on our experience and on work by other groups, we outline 10 lessons we have learned about the identifier qualities and best practices that facilitate large-scale data integration. Specifically, we propose actions that identifier practitioners (database providers) should take in the design, provision and reuse of identifiers. We also outline the important considerations for those referencing identifiers in various circumstances, including by authors and data generators. While the importance and relevance of each lesson will vary by context, there is a need for increased awareness about how to avoid and manage common identifier problems, especially those related to persistence and web-accessibility/resolvability. We focus strongly on web-based identifiers in the life sciences; however, the principles are broadly relevant to other disciplines.