ABSTRACT
Data commons have emerged as the best current method for enabling data aggregation across multiple projects and multiple data sources. Good data harmonization techniques are critical to maintain quality of data within a data commons, as well as to allow future meta-analysis across different data commons. We present some of the current best practices for data harmonization.
Subject(s)
Data Collection , Information Dissemination , Medical Informatics , Access to Information , Algorithms , Biomedical Research/statistics & numerical data , Genomics , Humans , Meta-Analysis as Topic , Neoplasms/genetics , Neoplasms/therapy , Sequence Analysis, DNA , Treatment Outcome
ABSTRACT
Research on the COVID-19 pandemic revealed a disproportionate burden of COVID-19 infection and death among underserved populations and exposed low rates of SARS-CoV-2 testing in these communities. A landmark National Institutes of Health (NIH) funding initiative, the Rapid Acceleration of Diagnostics-Underserved Populations (RADx-UP) program, was developed to address the research gap in understanding the adoption of COVID-19 testing in underserved populations. This program is the single largest investment in health disparities and community-engaged research in the history of the NIH. The RADx-UP Testing Core (TC) provides community-based investigators with essential scientific expertise and guidance on COVID-19 diagnostics. This commentary describes the first 2 years of the TC's experience, highlighting the challenges faced and insights gained to safely and effectively deploy large-scale diagnostics for community-initiated research in underserved populations during a pandemic. The success of RADx-UP shows that community-based research to increase access and uptake of testing among underserved populations can be accomplished during a pandemic with tools, resources, and multidisciplinary expertise provided by a centralized testing-specific coordinating center. We developed adaptive tools to support individual testing strategies and frameworks for these diverse studies and ensured continuous monitoring of testing strategies and use of study data. In a rapidly evolving setting of tremendous uncertainty, the TC provided essential and real-time technical expertise to support safe, effective, and adaptive testing. The lessons learned go beyond this pandemic and can serve as a framework for rapid deployment of testing in response to future crises, especially when populations are affected inequitably.
Subject(s)
COVID-19 , Humans , COVID-19/diagnosis , COVID-19 Testing , SARS-CoV-2 , Vulnerable Populations , Pandemics
ABSTRACT
BACKGROUND: Clinical informatics tools to integrate data from multiple sources have the potential to catalyze population health management of childhood cancer survivors at high risk for late heart failure through the implementation of previously validated risk calculators. METHODS: The Oklahoma cohort (n = 365) harnessed data elements from Passport for Care (PFC), and the Duke cohort (n = 274) employed informatics methods to automatically extract chemotherapy exposures from electronic health record (EHR) data for survivors 18 years old and younger at diagnosis. The Childhood Cancer Survivor Study (CCSS) late cardiovascular risk calculator was implemented, and risk groups for heart failure were compared to the Children's Oncology Group (COG) and the International Guidelines Harmonization Group (IGHG) recommendations. Analysis within the Oklahoma cohort assessed disparities in guideline-adherent care. RESULTS: The Oklahoma and Duke cohorts both observed good overall concordance between the CCSS and COG risk groups for late heart failure, with weighted kappa statistics of .70 and .75, respectively. Low-risk groups showed excellent concordance (kappa > .9). Moderate and high-risk groups showed moderate concordance (kappa .44-.60). In the Oklahoma cohort, adolescents at diagnosis were significantly less likely to receive guideline-adherent echocardiogram surveillance compared with survivors younger than 13 years old at diagnosis (odds ratio [OR] 0.22; 95% confidence interval [CI]: 0.10-0.49). CONCLUSIONS: Clinical informatics tools represent a feasible approach to leverage discrete treatment-related data elements from PFC or the EHR to successfully implement previously validated late cardiovascular risk prediction models on a population health level. Concordance of CCSS, COG, and IGHG risk groups using real-world data informs current guidelines and identifies inequities in guideline-adherent care.
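For readers unfamiliar with the concordance measure reported above, the following is a minimal sketch of a weighted kappa between two ordinal risk groupings. The abstract does not state the weighting scheme, so linear weights are an assumption here, and the function name, category labels, and sample data are hypothetical illustrations rather than the authors' implementation.

```python
from collections import Counter


def weighted_kappa(rater_a, rater_b, categories):
    """Linear-weighted Cohen's kappa for two ordinal ratings of the same subjects.

    `categories` lists the ordinal levels from lowest to highest.
    Linear weights are an assumption; the abstract does not specify them.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    k = len(categories)
    if k < 2:
        raise ValueError("at least two categories are required")
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Joint and marginal counts of the assigned categories.
    joint = Counter((index[a], index[b]) for a, b in zip(rater_a, rater_b))
    row = Counter(index[a] for a in rater_a)
    col = Counter(index[b] for b in rater_b)

    observed = 0.0  # weighted observed disagreement
    expected = 0.0  # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)
            observed += weight * joint[(i, j)] / n
            expected += weight * (row[i] / n) * (col[j] / n)
    if expected == 0.0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1.0 - observed / expected


# Hypothetical risk-group assignments from two classification schemes.
levels = ["low", "moderate", "high"]
scheme_1 = ["low", "low", "moderate", "high", "high"]
scheme_2 = ["low", "low", "moderate", "moderate", "high"]
kappa = weighted_kappa(scheme_1, scheme_2, levels)
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance; disagreements between adjacent risk groups are penalized less than disagreements between low and high.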
ABSTRACT
BACKGROUND: Many interventions for widescale distribution of rapid antigen tests for COVID-19 have utilized online, direct-to-consumer (DTC) ordering systems; however, little is known about the sociodemographic characteristics of home-test users. We aimed to characterize the patterns of online orders for rapid antigen tests and determine geospatial and temporal associations with neighborhood characteristics and community incidence of COVID-19, respectively. METHODS: This observational study analyzed online, DTC orders for rapid antigen test kits from beneficiaries of the Say Yes! Covid Test program from March to November 2021 in five communities: Louisville, Kentucky; Indianapolis, Indiana; Fulton County, Georgia; O'ahu, Hawaii; and Ann Arbor/Ypsilanti, Michigan. Using spatial autoregressive models, we assessed the geospatial associations of test kit distribution with Census block-level education, income, age, population density, and racial distribution and Census tract-level Social Vulnerability Index. Lag association analyses were used to measure the association between online rapid antigen kit orders and community-level COVID-19 incidence. RESULTS: In total, 164,402 DTC test kits were ordered during the intervention. Distribution of tests at all sites was significantly geospatially clustered at the block-group level (Moran's I: p < 0.001); however, education, income, age, population density, race, and Social Vulnerability Index were inconsistently associated with test orders across sites. In Michigan, Georgia, and Kentucky, there were strong associations between same-day COVID-19 incidence and test kit orders (Michigan: r = 0.89, Georgia: r = 0.85, Kentucky: r = 0.75). The incidence of COVID-19 during the current day and the previous 6 days increased current DTC orders by 9.0 (95% CI = 1.7, 16.3), 3.0 (95% CI = 1.3, 4.6), and 6.8 (95% CI = 3.4, 10.2) in Michigan, Georgia, and Kentucky, respectively. There was no same-day or 6-day lagged correlation between test kit orders and COVID-19 incidence in Indiana. CONCLUSIONS: Our findings suggest that online ordering is not associated with geospatial clustering based on sociodemographic characteristics. Observed temporal preferences for DTC ordering can guide public health messaging around DTC testing programs.
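The same-day and lagged correlations reported above can be illustrated with a minimal sketch that pairs each day's orders with the incidence observed a fixed number of days earlier. This is a plain lagged Pearson correlation, not the study's full lag association model, and the function names and daily series below are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lagged_correlation(incidence, orders, lag):
    """Correlate orders on day t with incidence on day t - lag (lag in days)."""
    if len(incidence) != len(orders):
        raise ValueError("series must cover the same days")
    if lag < 0 or lag > len(incidence) - 2:
        raise ValueError("lag leaves fewer than two paired observations")
    # Drop the last `lag` incidence days and the first `lag` order days
    # so that position i pairs incidence[i] with orders[i + lag].
    return pearson_r(incidence[: len(incidence) - lag], orders[lag:])


# Hypothetical daily series: orders follow incidence with a two-day delay.
daily_incidence = [1, 2, 3, 4, 5, 6]
daily_orders = [0, 0, 2, 4, 6, 8]
same_day_r = lagged_correlation(daily_incidence, daily_orders, lag=0)
two_day_r = lagged_correlation(daily_incidence, daily_orders, lag=2)
```

Scanning `lag` over a range of days and comparing the resulting coefficients is one simple way to see whether orders track incidence on the same day or after a delay.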
Subject(s)
COVID-19 , Humans , COVID-19/diagnosis , COVID-19/epidemiology , Sociodemographic Factors , Educational Status , Censuses , Cluster Analysis
ABSTRACT
BACKGROUND: The COVID-19 pandemic caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) continues to evolve as a global health crisis. Although highly effective vaccines have been developed, non-pharmaceutical interventions remain critical to controlling disease transmission. One such intervention, rapid at-home antigen self-testing, can ease the burden associated with facility-based testing programs and improve testing access in high-risk communities. However, its impact on SARS-CoV-2 community transmission has yet to be definitively evaluated, and the socio-behavioral aspects of testing in underserved populations remain unknown. METHODS: As part of the Rapid Acceleration of Diagnostics-Underserved Populations (RADx-UP) program funded by the National Institutes of Health, we are implementing a public health intervention titled "Say Yes! COVID Test" (SYCT) involving at-home self-testing using a SARS-CoV-2 rapid antigen assay in North Carolina (Greenville, Pitt County) and Tennessee (Chattanooga City, Hamilton County). The intervention is supported by a multifaceted communication and community engagement strategy to ensure widespread awareness and uptake, particularly in marginalized communities. Participants receive test kits either through online orders or via local community distribution partners. To assess the impact of this intervention on SARS-CoV-2 transmission, we will conduct a non-randomized, ecological study using community-level outcomes. Specifically, we will evaluate trends in SARS-CoV-2 cases and hospitalizations, SARS-CoV-2 viral load in wastewater, and population mobility in each community before, during, and after the SYCT intervention. Individuals who choose to participate in SYCT will also have the option to enroll in an embedded prospective cohort substudy gathering participant-level data to evaluate behavioral determinants of at-home self-testing and socio-behavioral mechanisms of SARS-CoV-2 community transmission. DISCUSSION: This is the first large-scale, public health intervention implementing rapid, at-home SARS-CoV-2 self-testing in the United States. The program consists of a novel combination of an at-home testing program, a broad communications and community engagement strategy, an ecological study to assess impact, and a research substudy of the behavioral aspects of testing. The findings from the SYCT project will provide insights into innovative methods to mitigate viral transmission, advance the science of public health communications and community engagement, and evaluate emerging, novel assessments of community transmission of disease.
Subject(s)
COVID-19 , SARS-CoV-2 , Cohort Studies , Humans , Pandemics , Prospective Studies , Public Health
ABSTRACT
DNA methylation in repetitive elements (RE) suppresses their mobility and maintains genomic stability, and decreases in RE methylation are frequently observed in tumor and/or surrogate tissues. Averaging methylation across RE in the genome is widely used to quantify global methylation. However, methylation may vary in specific RE and play diverse roles in disease development, so averaging methylation across RE may lose significant biological information. The ambiguous mapping of short reads by current bisulfite sequencing platforms, together with their high cost, makes them impractical for quantifying locus-specific RE methylation. Although microarray-based approaches (particularly Illumina's Infinium methylation arrays) provide cost-effective and robust genome-wide methylation quantification, the number of interrogated CpGs in RE remains limited. We report a random forest-based algorithm (and corresponding R package, REMP) that can accurately predict genome-wide locus-specific RE methylation based on Infinium array profiling data. We validated its prediction performance using alternative sequencing and microarray data. Testing its clinical utility with The Cancer Genome Atlas data demonstrated that our algorithm offers more comprehensive, extended locus-specific RE methylation information that can be readily applied to large human studies in a cost-effective manner. Our work has the potential to improve our understanding of the role of global methylation in human diseases, especially cancer.
Subject(s)
Algorithms , DNA Methylation , Genome, Human , Neoplasms/genetics , Repetitive Sequences, Nucleic Acid , Sequence Analysis, DNA/methods , Alu Elements , CpG Islands , Female , Humans , Long Interspersed Nucleotide Elements , Male , Sensitivity and Specificity
ABSTRACT
The Human Phenotype Ontology (HPO) is widely used in the rare disease community for differential diagnostics, phenotype-driven analysis of next-generation sequence-variation data, and translational research, but a comparable resource has not been available for common disease. Here, we have developed a concept-recognition procedure that analyzes the frequencies of HPO disease annotations as identified in over five million PubMed abstracts by employing an iterative procedure to optimize precision and recall of the identified terms. We derived disease models for 3,145 common human diseases comprising a total of 132,006 HPO annotations. The HPO now comprises over 250,000 phenotypic annotations for over 10,000 rare and common diseases and can be used for examining the phenotypic overlap among common diseases that share risk alleles, as well as between Mendelian diseases and common diseases linked by genomic location. The annotations, as well as the HPO itself, are freely available.
Subject(s)
Gene Ontology/trends , Genetic Diseases, Inborn/classification , Genetic Diseases, Inborn/genetics , Phenotype , Terminology as Topic , Genetic Diseases, Inborn/pathology , Humans , MEDLINE , Models, Biological
ABSTRACT
The current version of the Human Disease Ontology (DO) (http://www.disease-ontology.org) database expands the utility of the ontology for the examination and comparison of genetic variation, phenotype, protein, drug and epitope data through the lens of human disease. DO is a biomedical resource of standardized common and rare disease concepts with stable identifiers organized by disease etiology. The content of DO has had 192 revisions since 2012, including the addition of 760 terms. Thirty-two percent of all terms now include definitions. DO has expanded the number and diversity of research communities and community members by 50+ during the past two years. These community members actively submit term requests, coordinate biomedical resource disease representation and provide expert curation guidance. Since the DO 2012 NAR paper, there have been hundreds of term requests and a steady increase in the number of DO listserv members, Twitter followers and DO website usage. DO is moving to a multi-editor model utilizing Protégé to curate DO in Web Ontology Language. This will enable closer collaboration with the Human Phenotype Ontology, EBI's Ontology Working Group, Mouse Genome Informatics and the Monarch Initiative among others, and enhance DO's current asserted view and multiple inferred views through reasoning.
Subject(s)
Biological Ontologies , Databases, Factual , Disease , Genetic Diseases, Inborn , Humans , Internet , Rare Diseases/genetics
ABSTRACT
Inter-individual variation in cytosine modifications has been linked to complex traits in humans. Cytosine modification variation is partially controlled by single nucleotide polymorphisms (SNPs), known as modified cytosine quantitative trait loci (mQTL). However, little is known about the role of short tandem repeat polymorphisms (STRPs), a class of structural genetic variants, in regulating cytosine modifications. Utilizing the published data on the International HapMap Project lymphoblastoid cell lines (LCLs), we assessed the relationships between 721 STRPs and the modification levels of 283,540 autosomal CpG sites. Our findings suggest that, in contrast to the predominant cis-acting mode for SNP-based mQTL, STRPs are associated with cytosine modification levels in both cis-acting (local) and trans-acting (distant) modes. In local scans within the ±1 Mb windows of target CpGs, 21, 9, and 21 cis-acting STRP-based mQTL were detected in CEU (Caucasian residents from Utah, USA), YRI (Yoruba people from Ibadan, Nigeria), and the combined samples, respectively. In contrast, 139,420, 76,817, and 121,866 trans-acting STRP-based mQTL were identified in CEU, YRI, and the combined samples, respectively. A substantial proportion of CpG sites detected with local STRP-based mQTL were not associated with SNP-based mQTL, suggesting that STRPs represent an independent class of mQTL. Functionally, genetic variants neighboring CpG-associated STRPs are enriched with genome-wide association study (GWAS) loci for a variety of complex traits and diseases, including cancers, based on the National Human Genome Research Institute (NHGRI) GWAS Catalog. Therefore, elucidating these STRP-based mQTL in addition to SNP-based mQTL can provide novel insights into the genetic architectures of complex traits.
Subject(s)
Cytosine/metabolism , Microsatellite Repeats , Polymorphism, Single Nucleotide , Black People/genetics , Cell Line , Chromosome Mapping , Epigenomics , Gene Expression Regulation , Genetic Association Studies , Genome, Human , HapMap Project , Humans , Nigeria , Phenotype , Quantitative Trait Loci , Utah , White People/genetics
ABSTRACT
dictyBase (http://dictybase.org) is the model organism database for the social amoeba Dictyostelium discoideum. This contribution updates the previously published description of dictyBase. During the past 3 years, dictyBase has taken significant strides toward becoming a genome portal for the whole Amoebozoa clade. In its latest release, dictyBase has scaled up to host multiple Dictyostelids, including Dictyostelium purpureum [Sucgang, Kuo, Tian, Salerno, Parikh, Feasley, Dalin, Tu, Huang, Barry et al. (2011) (Comparative genomics of the social amoebae Dictyostelium discoideum and Dictyostelium purpureum. Genome Biol., 12, R20)], Dictyostelium fasciculatum and Polysphondylium pallidum [Heidel, Lawal, Felder, Schilde, Helps, Tunggal, Rivero, John, Schleicher, Eichinger et al. (2011) (Phylogeny-wide analysis of social amoeba genomes highlights ancient origins for complex intercellular communication. Genome Res., 21, 1882-1891)]. The new release includes a new Genome Browser with RNAseq expression, interspecies Basic Local Alignment Search Tool (BLAST) alignments and a unified BLAST search for cross-species comparisons.
Subject(s)
Databases, Genetic , Dictyosteliida/genetics , Dictyostelium/genetics , Genome, Protozoan , Genomics , Internet , Protozoan Proteins/genetics , RNA, Protozoan/chemistry , Sequence Alignment , Sequence Analysis, RNA , User-Computer Interface
ABSTRACT
The Disease and Gene Annotations database (DGA, http://dga.nubic.northwestern.edu) is a collaborative effort aiming to provide a comprehensive and integrative annotation of human genes in a disease network context by integrating the computable controlled vocabulary of the Disease Ontology (DO version 3 revision 2510, which has 8043 inherited, developmental and acquired human diseases), NCBI Gene Reference Into Function (GeneRIF) and molecular interaction networks (MINs). DGA integrates these resources using semantic mappings to build an integrative set of disease-to-gene and gene-to-gene relationships with excellent coverage based on current knowledge. DGA is kept current by periodically reparsing DO, GeneRIF, and MINs. DGA provides a user-friendly and interactive web interface enabling users to efficiently query, download and visualize the DO tree structure and annotations as a tree, a network graph or a tabular list. To facilitate integrative analysis, DGA provides a web service Application Programming Interface for integration with external analytic tools.
Subject(s)
Databases, Genetic , Disease/genetics , Genes , Molecular Sequence Annotation , Humans , Internet , Proteins/genetics , Proteins/metabolism , Vocabulary, Controlled
ABSTRACT
The Disease Ontology (DO) database (http://disease-ontology.org) represents a comprehensive knowledge base of 8043 inherited, developmental and acquired human diseases (DO version 3, revision 2510). The DO web browser has been designed for speed, efficiency and robustness through the use of a graph database. Full-text contextual searching functionality using Lucene allows the querying of name, synonym, definition, DOID and cross-reference (xrefs) with complex Boolean search strings. The DO semantically integrates disease and medical vocabularies through extensive cross mapping and integration of MeSH, ICD, NCI's thesaurus, SNOMED CT and OMIM disease-specific terms and identifiers. The DO is utilized for disease annotation by major biomedical databases (e.g. Array Express, NIF, IEDB), as a standard representation of human disease in biomedical ontologies (e.g. IDO, Cell line ontology, NIFSTD ontology, Experimental Factor Ontology, Influenza Ontology), and as an ontological cross mappings resource between DO, MeSH and OMIM (e.g. GeneWiki). The DO project (http://diseaseontology.sf.net) has been incorporated into open source tools (e.g. Gene Answers, FunDO) to connect gene and disease biomedical data through the lens of human disease. The next iteration of the DO web browser will integrate DO's extended relations and logical definition representation along with these biomedical resource cross-mappings.
Subject(s)
Databases, Factual , Disease/classification , Computer Graphics , Disease/etiology , Humans , Semantics , Software , Terminology as Topic , User-Computer Interface , Vocabulary, Controlled
ABSTRACT
Background: Although hypothesized to be the root cause of the pulse oximetry disparities, skin tone and its use for improving medical therapies have yet to be extensively studied. Studies previously used self-reported race as a proxy variable for skin tone. However, this approach cannot account for skin tone variability within race groups and also risks being confounded by other non-biological factors when modeling data. Therefore, to better evaluate health disparities associated with pulse oximetry, this study aimed to create a unique baseline dataset that included skin tone and electronic health record (EHR) data. Methods: Patients admitted to Duke University Hospital were eligible if they had at least one pulse oximetry value recorded within 5 minutes before an arterial blood gas (ABG) value. We collected skin tone data at 16 different body locations using multiple methods, including administered visual scales, colorimetry, spectrophotometry, and photography via mobile phone cameras. All patients' data were linked in Duke's Protected Analytics Computational Environment (PACE), converted into a common data model, and then de-identified before publication in PhysioNet. Results: Skin tone data were collected from 128 patients. We assessed 167 features per skin location on each patient. We also collected over 2000 images from mobile phones in the same controlled environment. Skin tone data are linked with patients' EHR data, such as laboratory data, vital sign recordings, and demographic information. Conclusions: Measuring different aspects of skin tone for each of the sixteen body locations and linking them with patients' EHR data could assist in the development of a more equitable AI model to combat disparities in healthcare associated with skin tone. A common data model format enables easy data federation with similar data from other sources, facilitating multicenter research on skin tone in healthcare.
Description: A prospectively collected EHR-linked skin tone measurements database in a common data model with emphasis on pulse oximetry disparities.
ABSTRACT
Importance: Pulse oximetry, a ubiquitous vital sign in modern medicine, has inequitable accuracy that disproportionately affects Black and Hispanic patients, with associated increases in mortality, organ dysfunction, and oxygen therapy. Although the root cause of these clinical performance discrepancies is believed to be skin tone, previous retrospective studies used self-reported race or ethnicity as a surrogate for skin tone. Objective: To determine the utility of objectively measured skin tone in explaining pulse oximetry discrepancies. Design, Setting, and Participants: Admitted hospital patients at Duke University Hospital were eligible for this prospective cohort study if they had pulse oximetry recorded up to 5 minutes prior to arterial blood gas (ABG) measurements. Skin tone was measured across sixteen body locations using administered visual scales (Fitzpatrick Skin Type, Monk Skin Tone, and Von Luschan), reflectance colorimetry (Delfin SkinColorCatch [L*, individual typology angle {ITA}, Melanin Index {MI}]), and reflectance spectrophotometry (Konica Minolta CM-700D [L*], Variable Spectro 1 [L*]). Main Outcomes and Measures: Mean directional bias, variability of bias, and accuracy root mean square (ARMS), comparing pulse oximetry and ABG measurements. Linear mixed-effects models were fitted to estimate mean directional bias while accounting for clinical confounders. Results: 128 patients (57 Black, 56 White) with 521 ABG-pulse oximetry pairs were recruited, none with hidden hypoxemia. Skin tone data were prospectively collected using 6 measurement methods, generating 8 measurements. The collected skin tone measurements were shown to yield differences among each other and overlap with self-reported racial groups, suggesting that skin tone could potentially provide information beyond self-reported race. Among the eight skin tone measurements in this study, and compared to self-reported race, the Monk Scale had the best relationship with differences in pulse oximetry bias (point estimate: -2.40%; 95% CI: -4.32%, -0.48%; p=0.01) when comparing patients with lighter and dark skin tones. Conclusions and Relevance: We found clinical performance differences in pulse oximetry, especially in darker skin tones. Additional studies are needed to determine the relative contributions of skin tone measures and other potential factors on pulse oximetry discrepancies.
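The three agreement measures named in the outcomes above (mean directional bias, variability of bias, and ARMS) can be computed from paired readings as in the minimal sketch below. The sign convention (SaO2 minus SpO2), the use of the sample standard deviation for variability, the function name, and the sample pairs are all assumptions for illustration; the sketch does not reproduce the study's mixed-effects adjustment for confounders or repeated measures.

```python
import math
import statistics


def oximetry_agreement(pairs):
    """Summarize agreement for (sao2, spo2) pairs given in percent saturation.

    Returns (bias, sd, arms), where
      bias = mean of SaO2 - SpO2   (sign convention assumed here),
      sd   = sample standard deviation of those differences,
      arms = sqrt(mean of squared differences), the accuracy root mean square.
    """
    diffs = [sao2 - spo2 for sao2, spo2 in pairs]
    if len(diffs) < 2:
        raise ValueError("at least two paired measurements are required")
    bias = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)
    arms = math.sqrt(statistics.fmean([d * d for d in diffs]))
    return bias, sd, arms


# Hypothetical paired readings: (arterial SaO2, pulse oximeter SpO2).
readings = [(90.0, 92.0), (95.0, 95.0), (88.0, 91.0), (97.0, 96.0)]
bias, sd, arms = oximetry_agreement(readings)
```

Under this sign convention, a negative bias means the pulse oximeter reads higher than the arterial measurement on average, while ARMS combines systematic offset and scatter into a single error magnitude.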
ABSTRACT
OBJECTIVE: Pulse oximetry, a ubiquitous vital sign in modern medicine, has inequitable accuracy that disproportionately affects minority Black and Hispanic patients, with associated increases in mortality, organ dysfunction, and oxygen therapy. Previous retrospective studies used self-reported race or ethnicity as a surrogate for skin tone, which is believed to be the root cause of the disparity. Our objective was to determine the utility of skin tone in explaining pulse oximetry discrepancies. DESIGN: Prospective cohort study. SETTING: Patients were eligible if they had pulse oximetry recorded up to 5 minutes before arterial blood gas (ABG) measurements. Skin tone was measured using administered visual scales, reflectance colorimetry, and reflectance spectrophotometry. PARTICIPANTS: Admitted hospital patients at Duke University Hospital. INTERVENTIONS: None. MEASUREMENTS AND MAIN RESULTS: SaO2-SpO2 bias, variation of bias, and accuracy root mean square, comparing pulse oximetry and ABG measurements. Linear mixed-effects models were fitted to estimate SaO2-SpO2 bias while accounting for clinical confounders. One hundred twenty-eight patients (57 Black, 56 White) with 521 ABG-pulse oximetry pairs were recruited. Skin tone data were prospectively collected using six measurement methods, generating eight measurements. The collected skin tone measurements were shown to yield differences among each other and overlap with self-reported racial groups, suggesting that skin tone could potentially provide information beyond self-reported race. Among the eight skin tone measurements in this study, and compared with self-reported race, the Monk Scale had the best relationship with differences in pulse oximetry bias (point estimate: -2.40%; 95% CI, -4.32% to -0.48%; p = 0.01) when comparing patients with lighter and dark skin tones. CONCLUSIONS: We found clinical performance differences in pulse oximetry, especially in darker skin tones. Additional studies are needed to determine the relative contributions of skin tone measures and other potential factors on pulse oximetry discrepancies.
Subject(s)
Critical Illness , Oximetry , Skin Pigmentation , Humans , Oximetry/methods , Prospective Studies , Female , Male , Middle Aged , Aged , Cohort Studies , Adult , Blood Gas Analysis/methods , White People
ABSTRACT
dictyBase (http://www.dictybase.org), the model organism database for Dictyostelium, aims to provide the broad biomedical research community with well integrated, high quality data and tools for Dictyostelium discoideum and related species. dictyBase houses the complete genome sequence, ESTs, and the entire body of literature relevant to Dictyostelium. This information is curated to provide accurate gene models and functional annotations, with the goal of fully annotating the genome to provide a 'reference genome' in the Amoebozoa clade. We highlight several new features in the present update: (i) new annotations; (ii) improved interface with web 2.0 functionality; (iii) the initial steps towards a genome portal for the Amoebozoa; (iv) ortholog display; and (v) the complete integration of the Dicty Stock Center with dictyBase.