ABSTRACT
In response to the rapidly evolving coronavirus disease 2019 (COVID-19) pandemic, the All of Us Research Program longitudinal cohort study developed the COVID-19 Participant Experience (COPE) survey to better understand the pandemic experiences and health impacts of COVID-19 on diverse populations within the United States. Six survey versions were deployed between May 2020 and March 2021, covering mental health, loneliness, activity, substance use, and discrimination, as well as COVID-19 symptoms, testing, treatment, and vaccination. A total of 104,910 All of Us Research Program participants, of whom over 73% were from communities traditionally underrepresented in biomedical research, completed 275,201 surveys; 9,693 completed all 6 surveys. Response rates varied widely among demographic groups and were lower among participants from certain racial and ethnic minority populations, participants with low income or educational attainment, and participants with a Spanish language preference. Survey modifications improved participant response rates between the first and last surveys (13.9% to 16.1%, P < 0.001). This paper describes a data set with longitudinal COVID-19 survey data in a large, diverse population that will enable researchers to address important questions related to the pandemic, a data set that is of additional scientific value when combined with the program's other data sources.
Subject(s)
COVID-19 , Population Health , Humans , United States/epidemiology , COVID-19/epidemiology , Ethnicity , SARS-CoV-2 , Longitudinal Studies , Minority Groups
ABSTRACT
The Electronic Medical Records and Genomics (eMERGE) network is a network of medical centers with electronic medical records linked to existing biorepository samples for genomic discovery and genomic medicine research. The network sought to unify the genetic results from 78 Illumina and Affymetrix genotype array batches from 12 contributing medical centers for joint association analysis of 83,717 human participants. In this report, we describe the imputation of eMERGE results and the methods used to create the unified, imputed, merged set of genome-wide variant genotype data. We imputed the data using the Michigan Imputation Server, which provides a missing single-nucleotide variant genotype imputation service using the minimac3 imputation algorithm with the Haplotype Reference Consortium genotype reference set. We describe the quality control and filtering steps used in the generation of this data set and suggest generalizable quality thresholds for imputation and phenotype association studies. To test the merged imputed genotype set, we replicated a previously reported chromosome 6 HLA-B herpes zoster (shingles) association and discovered a novel zoster-associated locus in an epigenetic binding site near the terminus of chromosome 3 (3q29).
Subject(s)
Electronic Health Records , Genetic Predisposition to Disease , Genome-Wide Association Study , Herpes Zoster/genetics , Algorithms , Black People/genetics , Chromosomes, Human/genetics , Female , Haplotypes/genetics , Homozygote , Humans , Male , Phenotype , Polymorphism, Single Nucleotide/genetics , Principal Component Analysis , White People/genetics
ABSTRACT
IMPORTANCE: Scales are often derived from multi-item questionnaires, yet commonly face item non-response. Traditional solutions use a weighted mean (WMean) of the available responses, but potentially overlook the intricacies of missing data. Advanced methods such as multiple imputation (MI) address a broader range of missing data problems, but demand more computational resources. Researchers frequently use survey data in the All of Us Research Program (All of Us), and it is imperative to determine whether the increased computational burden of employing MI to handle non-response is justifiable. OBJECTIVES: Using the 5-item Physical Activity Neighborhood Environment Scale (PANES) in All of Us, this study assessed the tradeoff between the efficacy and the computational demands of WMean, MI, and inverse probability weighting (IPW) when dealing with item non-response. MATERIALS AND METHODS: Synthetic missingness, allowing non-response on 1 or more items, was introduced into PANES across 3 missingness mechanisms and various missing percentages (10%-50%). Each scenario compared WMean of the completed questions, MI, and IPW on bias, variability, coverage probability, and computation time. RESULTS: All methods showed minimal bias (all <5.5%) when internal consistency was good, with WMean suffering most when consistency was poor. IPW showed considerable variability as the missing percentage increased. MI required significantly more computational resources, taking >8000 and >100 times longer than WMean and IPW, respectively, in the full data analysis. DISCUSSION AND CONCLUSION: The marginal performance advantages of MI for item non-response in highly reliable scales do not warrant its escalated cloud computational burden in All of Us, particularly when coupled with computationally demanding post-imputation analyses. Researchers using survey scales with low missingness could use WMean to reduce the computing burden.
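As an illustration of the simplest of the three approaches compared above, the sketch below scores a multi-item scale from whichever items a respondent answered. It assumes that WMean means the plain average of the answered items, optionally rescaled to the full item count; the abstract does not give the exact formula, and the function names, the `min_answered` parameter, and the example responses are all hypothetical.

```python
from typing import Optional, Sequence


def available_item_mean(
    responses: Sequence[Optional[float]],
    min_answered: int = 1,
) -> Optional[float]:
    """Average the items a respondent answered (None marks a skipped item).

    Returns None when fewer than `min_answered` items were answered, so the
    caller can decide how to treat respondents with too little information.
    """
    answered = [r for r in responses if r is not None]
    if len(answered) < min_answered:
        return None
    return sum(answered) / len(answered)


def prorated_total(
    responses: Sequence[Optional[float]],
    min_answered: int = 1,
) -> Optional[float]:
    """Rescale the available-item mean to the full number of items."""
    mean = available_item_mean(responses, min_answered)
    if mean is None:
        return None
    return mean * len(responses)


# Hypothetical 5-item response with two skipped items.
example = [4, None, 2, 3, None]
score = available_item_mean(example)   # mean of 4, 2, 3
total = prorated_total(example)        # mean scaled to 5 items
```

This scoring needs a single pass over each response vector and no model fitting, which is consistent with the large difference in computation time between WMean and MI reported above.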
ABSTRACT
The All of Us Research Program's Data and Research Center (DRC) was established to help acquire, curate, and provide access to one of the world's largest and most diverse datasets for precision medicine research. Already, over 500,000 participants are enrolled in All of Us, 80% of whom are underrepresented in biomedical research, and data are being analyzed by a community of over 2,300 researchers. The DRC created this thriving data ecosystem by collaborating with engaged participants, innovative program partners, and empowered researchers. In this review, we first describe how the DRC is organized to meet the needs of this broad group of stakeholders. We then outline guiding principles, common challenges, and innovative approaches used to build the All of Us data ecosystem. Finally, we share lessons learned to help others navigate important decisions and trade-offs in building a modern biomedical data platform.
Subject(s)
Biomedical Research , Population Health , Humans , Ecosystem , Precision Medicine
ABSTRACT
The All of Us Research Program (All of Us) is a national effort to accelerate health research by exploring the relationship between lifestyle, environment, and genetics. It is set to become one of the largest research efforts in U.S. history, aiming to build a national resource of data from at least one million participants. All of Us aims to address the need for more diversity in research and to set the stage for that diversity to be leveraged in precision medicine research to come. This paper describes how the program assessed the demographic characteristics of participants who have enrolled in other U.S. biomedical research cohorts to better understand which groups are traditionally represented or underrepresented in biomedical research. We 1) reviewed the enrollment characteristics of national cohort studies like All of Us, and 2) surveyed the literature, focusing on key diversity categories essential to the program's enrollment aims. Based on these efforts, All of Us emphasizes enrollment of racial and ethnic minorities, and has formally designated the following additional groups as historically underrepresented: individuals who have inadequate access to medical care; are under the age of 18 or over 65; have an annual household income at or below 200% of the federal poverty level; have a cognitive or physical disability; have less than a high school education or equivalent; are intersex; identify as a sexual or gender minority; or live in rural or non-metropolitan areas. Research accounting for wider demographic variability is critical. Only by ensuring diversity and by addressing the very barriers that limit it can we position All of Us to better understand and tackle health disparities.
Subject(s)
Biomedical Research/methods , Cultural Diversity , Demography/methods , Biomedical Research/ethics , Cohort Studies , Ethnicity , Female , Humans , Male , Minority Groups , Population Health , Precision Medicine/methods , Racial Groups , United States
ABSTRACT
There is growing public demand that research participants receive all of their results, regardless of whether clinical action is indicated. Instead of the standard practice of returning only actionable results, we propose a reconceptualization called "return of value" to encompass the varied ways in which research participants value specific results and more general information they receive beyond actionable results. Our proposal is supported by a national survey of a diverse sample, which found that receiving research results would be valuable to most (78.5 percent) and would make them more likely to trust researchers (70.3 percent). Respondents highly valued results revealing genetic effects on medication response and predicting disease risk, as well as information about nearby clinical trials and updates on how their data were used. The information most valued varied by education, race/ethnicity, and age. Policies are needed to enable return of information in ways that recognize participants' differing informational needs and values.