ABSTRACT
Genomic data from millions of individuals have been generated worldwide to drive discovery and clinical impact in precision medicine. Lowering the barriers to using these data collectively is needed to equitably realize the benefits of the diversity and scale of population data. We examine the current landscape of global genomic data sharing, including the evolution of data sharing models from data aggregation through to data visiting, and for certain use cases, cross-cohort analysis using federated approaches across multiple environments. We highlight emerging examples of best practice relating to participant, patient and community engagement; evolution of technical standards, tools and infrastructure; and impact of research and health-care policy. We outline 12 actions we can all take together to scale up efforts to enable safe global data sharing and move beyond projects demonstrating feasibility to routinely cross-analysing research and clinical data sets, optimizing benefit.
ABSTRACT
The UK Biobank project is a prospective cohort study with deep genetic and phenotypic data collected on approximately 500,000 individuals from across the United Kingdom, aged between 40 and 69 at recruitment. The open resource is unique in its size and scope. A rich variety of phenotypic and health-related information is available on each participant, including biological measurements, lifestyle indicators, biomarkers in blood and urine, and imaging of the body and brain. Follow-up information is provided by linking health and medical records. Genome-wide genotype data have been collected on all participants, providing many opportunities for the discovery of new genetic associations and the genetic bases of complex traits. Here we describe the centralized analysis of the genetic data, including genotype quality, properties of population structure and relatedness of the genetic data, and efficient phasing and genotype imputation that increases the number of testable variants to around 96 million. Classical allelic variation at 11 human leukocyte antigen genes was imputed, resulting in the recovery of signals with known associations between human leukocyte antigen alleles and many diseases.
Subjects
Factual Databases , Genomics , Phenotype , Adult , Aged , Alleles , Biomarkers/blood , Biomarkers/urine , Body Height/genetics , Brain/diagnostic imaging , Cohort Studies , Genetic Databases , Electronic Health Records , Family , Female , Genome-Wide Association Study , Haplotypes/genetics , Humans , Life Style , Major Histocompatibility Complex/genetics , Male , Middle Aged , Quality Control , Racial Groups/genetics , United Kingdom
ABSTRACT
UK Biobank is a large-scale prospective study with deep phenotyping and genomic data. Its open-access policy allows researchers worldwide, from academia or industry, to perform health research in the public interest. Between 2006 and 2010, the study recruited 502,000 adults aged 40-69 years from the general population of the United Kingdom. At enrolment, participants provided information on a wide range of factors, physical measurements were taken, and biological samples (blood, urine and saliva) were collected for long-term storage. Participants have now been followed up for over a decade with more than 52,000 incident cancer cases recorded. The study continues to be enhanced with repeat assessments, web-based questionnaires, multi-modal imaging, and conversion of the stored biological samples to genomic and other '-omic' data. The study has already demonstrated its value in enabling research into the determinants of cancer, and future planned enhancements will make the resource even more valuable to cancer researchers. Over 26,000 researchers worldwide are currently using the data, performing a wide range of cancer research. UK Biobank is uniquely placed to transform our understanding of the causes of cancer development and progression, and drive improvements in cancer treatment and prevention over the coming decades.
Subjects
Biological Specimen Banks , Neoplasms , Adult , Humans , Prospective Studies , Surveys and Questionnaires , United Kingdom/epidemiology
ABSTRACT
Population-based prospective studies, such as UK Biobank, are valuable for generating and testing hypotheses about the potential causes of human disease. We describe how UK Biobank's study design, data access policies, and approaches to statistical analysis can help to minimize error and improve the interpretability of research findings, with implications for other population-based prospective studies being established worldwide.
Subjects
Biological Specimen Banks , UK Biobank , Humans , Prospective Studies , Research Design , Data Analysis
ABSTRACT
UK Biobank is an intensively characterised prospective cohort of 500,000 adults aged 40-69 years when recruited between 2006 and 2010. The study was established to enable researchers worldwide to undertake health-related research in the public interest. The existence of such a large, detailed prospective cohort with a high degree of participant engagement enabled its rapid repurposing for coronavirus disease-2019 (COVID-19) research. In response to the pandemic, the frequency of updates on hospitalisations and deaths among participants was immediately increased, and new data linkages were established to national severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) testing and primary care health records to facilitate research into the determinants of severe COVID-19. UK Biobank also instigated several sub-studies on COVID-19. In 2020, monthly blood samples were collected from approximately 20,000 individuals to investigate the distribution and determinants of SARS-CoV-2 infection, and to assess the persistence of antibodies following infection with another blood sample collected after 12 months. UK Biobank also performed repeat imaging of approximately 2,000 participants (half of whom had evidence of previous SARS-CoV-2 infection and half did not) to investigate the impact of the virus on changes in measures of internal organ structure and function. In addition, approximately 200,000 UK Biobank participants took part in a self-test SARS-CoV-2 antibody sub-study (between February and November 2021) to collect objective data on previous SARS-CoV-2 infection. These studies are enabling unique research into the genetic, lifestyle and environmental determinants of SARS-CoV-2 infection and severe COVID-19, as well as their long-term health effects. UK Biobank's contribution to the national and international response to the pandemic represents a case study for its broader value, now and in the future, to precision medicine research.
ABSTRACT
Recently, large-scale genomic projects such as All of Us and the UK Biobank have introduced a new research paradigm in which data are stored centrally in cloud-based Trusted Research Environments (TREs). To characterize the advantages and drawbacks of different TRE attributes in facilitating cross-cohort analysis, we conduct a Genome-Wide Association Study of standard lipid measures using two approaches: meta-analysis and pooled analysis. Comparison of full summary data from both approaches with an external study shows strong correlation of known loci with lipid levels (R² ≈ 83-97%). Importantly, 90 variants meet the significance threshold only in the meta-analysis and 64 variants are significant only in pooled analysis, with approximately 20% of variants in each of those groups being most prevalent in non-European, non-Asian ancestry individuals. These findings have important implications, as technical and policy choices lead to cross-cohort analyses generating similar, but not identical, results, particularly for non-European ancestral populations.
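The abstract contrasts meta-analysis with pooled analysis but does not say which meta-analytic estimator was used. As an illustration only, the sketch below shows a fixed-effect inverse-variance-weighted combination of per-cohort effect estimates for a single variant, which is one common choice in GWAS meta-analysis. The function name `inverse_variance_meta` and the input values are hypothetical and are not taken from the study.

```python
import math


def inverse_variance_meta(betas, ses):
    """Combine per-cohort effect sizes for one variant.

    Assumption: a fixed-effect inverse-variance-weighted model. The abstract
    does not state the estimator the authors actually used.
    betas: per-cohort effect estimates; ses: their standard errors.
    Returns (combined beta, combined SE, z-score, two-sided p-value).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("betas and ses must be non-empty and of equal length")
    if any(se <= 0 for se in ses):
        raise ValueError("standard errors must be positive")

    # Each cohort is weighted by the inverse of its sampling variance.
    weights = [1.0 / (se * se) for se in ses]
    total_weight = sum(weights)

    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    # Two-sided p-value under a standard normal null distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p


if __name__ == "__main__":
    # Hypothetical summary statistics from two cohorts, for illustration only.
    print(inverse_variance_meta([0.10, 0.20], [0.05, 0.05]))
```

A pooled analysis would instead fit one regression to the combined individual-level data. The two approaches can differ in which covariates and ancestry adjustments are feasible, which fits the abstract's observation that the results are similar but not identical.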
Subjects
Genome-Wide Association Study , Population Health , Humans , Genomics , Policy , Lipids
ABSTRACT
BACKGROUND: The social determinants of ethnic disparities in risk of SARS-CoV-2 infection during the first wave of the pandemic in the UK remain unclear. METHODS: In May 2020, a total of 20 195 adults were recruited from the general population into the UK Biobank SARS-CoV-2 Serology Study. Between mid-May and mid-November 2020, participants provided monthly blood samples. At the end of the study, participants completed a questionnaire on social factors during different periods of the pandemic. Logistic regression yielded ORs for the association between ethnicity and SARS-CoV-2 immunoglobulin G antibodies (indicating prior infection) using blood samples collected in July 2020, immediately after the first wave. RESULTS: After exclusions, 14 571 participants (mean age 56; 58% women) returned a blood sample in July, of whom 997 (7%) had SARS-CoV-2 antibodies. Seropositivity was strongly related to ethnicity: compared with those of White ethnicity, ORs (adjusted for age and sex) for Black, South Asian, Chinese, Mixed and Other ethnic groups were 2.66 (95% CI 1.94-3.60), 1.66 (1.15-2.34), 0.99 (0.42-1.99), 1.42 (1.03-1.91) and 1.79 (1.27-2.47), respectively. Additional adjustment for social factors reduced the overall likelihood ratio statistics for ethnicity by two-thirds (67%; mostly from occupational factors and UK region of residence); more precise measurement of social factors may have further reduced the association. CONCLUSIONS: This study identifies social factors that are likely to account for much of the ethnic disparities in SARS-CoV-2 infection during the first wave in the UK, and highlights the particular relevance of occupation and residential region in the pathway between ethnicity and SARS-CoV-2 infection.
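The odds ratios in this abstract come from logistic regression adjusted for age and sex, which cannot be reproduced without the individual-level data. As a simplified illustration of what an odds ratio and its 95% confidence interval represent, the sketch below computes an unadjusted odds ratio with a Wald interval from a 2x2 table. The function name `odds_ratio_ci` and the counts are hypothetical and are not study data.

```python
import math


def odds_ratio_ci(exposed_pos, exposed_neg, ref_pos, ref_neg, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    Assumption: a crude 2x2 comparison, not the adjusted logistic
    regression described in the abstract.
    exposed_pos / exposed_neg: seropositive / seronegative counts in the
    comparison group; ref_pos / ref_neg: the same for the reference group.
    """
    cells = (exposed_pos, exposed_neg, ref_pos, ref_neg)
    if any(c <= 0 for c in cells):
        raise ValueError("all four cell counts must be positive")

    odds_ratio = (exposed_pos * ref_neg) / (exposed_neg * ref_pos)
    # Standard error of the log odds ratio (Woolf's method).
    se_log_or = math.sqrt(sum(1.0 / c for c in cells))
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(odds_ratio_ci(20, 80, 10, 90))
```

An interval that includes 1, as with the Chinese ethnic group in the abstract (0.42-1.99), indicates that the data are compatible with no difference from the reference group.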
Subjects
COVID-19 , Adult , Humans , Female , Middle Aged , Male , SARS-CoV-2 , Social Factors , Biological Specimen Banks , Social Determinants of Health , Surveys and Questionnaires
ABSTRACT
BACKGROUND: Little is known about the persistence of antibodies after the first year following SARS-CoV-2 infection. We aimed to determine the proportion of individuals that maintain detectable levels of SARS-CoV-2 antibodies over an 18-month period following infection. METHODS: Population-based prospective study of 20 000 UK Biobank participants and their adult relatives recruited in May 2020. The proportion of SARS-CoV-2 cases testing positive for immunoglobulin G (IgG) antibodies against the spike protein (IgG-S), and the nucleocapsid protein (IgG-N), was calculated at varying intervals following infection. RESULTS: Overall, 20 195 participants were recruited. Their median age was 56 years (IQR 39-68), 56% were female and 88% were of white ethnicity. The proportion of SARS-CoV-2 cases with IgG-S antibodies following infection remained high (92%, 95% CI 90%-93%) at 6 months after infection. Levels of IgG-N antibodies following infection gradually decreased from 92% (95% CI 88%-95%) at 3 months to 72% (95% CI 70%-75%) at 18 months. There was no strong evidence of heterogeneity in antibody persistence by age, sex, ethnicity or socioeconomic deprivation. CONCLUSION: This study adds to the limited evidence on the long-term persistence of antibodies following SARS-CoV-2 infection, with likely implications for waning immunity following infection and the use of IgG-N in population surveys.
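The abstract reports proportions of cases with detectable antibodies together with 95% confidence intervals, but does not say how the intervals were computed. As one possible approach, the sketch below computes a Wilson score interval for a binomial proportion. The function name `wilson_interval` and the counts in the usage example are hypothetical and are not taken from the study.

```python
import math


def wilson_interval(successes, n, z=1.959963984540054):
    """Wilson score interval for a binomial proportion.

    Assumption: the Wilson method is used here for illustration; the
    abstract does not state which interval method the authors used.
    successes: number testing positive; n: number tested.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("require n > 0 and 0 <= successes <= n")

    p_hat = successes / n
    denom = 1.0 + z * z / n
    centre = (p_hat + z * z / (2.0 * n)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1.0 - p_hat) / n + z * z / (4.0 * n * n)) / denom
    )
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(wilson_interval(50, 100))
```

The interval narrows as the number tested grows, which is why proportions estimated at later follow-up points, when fewer samples may be available, can carry wider intervals.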