ABSTRACT
Biobanks aim to improve our understanding of health and disease by collecting and analysing diverse biological and phenotypic information in large samples. So far, biobanks have largely pursued a population-based sampling strategy, where the individual is the unit of sampling, and familial relatedness occurs sporadically and by chance. This strategy has been remarkably efficient and successful, leading to thousands of scientific discoveries across multiple research domains, and plans for the next wave of biobanks are underway. In this Perspective, we discuss the strengths and limitations of a complementary sampling strategy for future biobanks based on oversampling of close genetic relatives. Such family-based samples facilitate research that clarifies causal relationships between putative risk factors and outcomes, particularly in estimates of genetic effects, because they enable analyses that reduce or eliminate confounding due to familial and demographic factors. Family-based biobank samples would also shed new light on fundamental questions across multiple fields that are often difficult to explore in population-based samples. Despite the potential for higher costs and greater analytical complexity, the many advantages of family-based samples should often outweigh their potential challenges.
Subjects
Biological Specimen Banks, Family, Humans
ABSTRACT
Genome-wide association studies (GWAS) are commonly used to identify genomic variants that are associated with complex traits, and estimate the magnitude of this association for each variant. However, it has been widely observed that the association estimates of variants tend to be lower in a replication study than in the study that discovered those associations. A phenomenon known as Winner's Curse is responsible for this upward bias present in association estimates of significant variants in the discovery study. We review existing Winner's Curse correction methods which require only GWAS summary statistics in order to make adjustments. In addition, we propose modifications to improve existing methods and propose a novel approach which uses the parametric bootstrap. We evaluate and compare methods, first using a wide variety of simulated data sets and then, using real data sets for three different traits. The metric, estimated mean squared error (MSE) over significant SNPs, was primarily used for method assessment. Our results indicate that widely used conditional likelihood based methods tend to perform poorly. The other considered methods behave much more similarly, with our proposed bootstrap method demonstrating very competitive performance. To complement this review, we have developed an R package, 'winnerscurse' which can be used to implement these various Winner's Curse adjustment methods to GWAS summary statistics.
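The upward bias described above can be illustrated with a small simulation. This is a minimal sketch, not the authors' correction methods and not the `winnerscurse` package: it assumes normally distributed true effects, one common standard error for every SNP, and a |z| > 5.45 cutoff (roughly P < 5 × 10⁻⁸). The function name and all parameter values are hypothetical.

```python
import random


def simulate_winners_curse(n_snps=20000, effect_sd=0.05, se=0.02,
                           z_threshold=5.45, seed=1):
    """Simulate discovery estimates and summarise bias among significant SNPs."""
    rng = random.Random(seed)
    true_sig, est_sig = [], []
    for _ in range(n_snps):
        beta = rng.gauss(0.0, effect_sd)      # assumed true effect
        beta_hat = rng.gauss(beta, se)        # noisy discovery estimate
        if abs(beta_hat / se) > z_threshold:  # keep only "significant" SNPs
            true_sig.append(beta)
            est_sig.append(beta_hat)
    n_sig = len(est_sig)
    if n_sig == 0:
        raise ValueError("no SNP passed the threshold; change the settings")
    return {
        "n_significant": n_sig,
        # Average magnitude of the estimates versus the true effects.
        "mean_abs_estimate": sum(abs(b) for b in est_sig) / n_sig,
        "mean_abs_true": sum(abs(b) for b in true_sig) / n_sig,
        # Mean squared error over significant SNPs, the metric named above.
        "mse": sum((e - t) ** 2 for e, t in zip(est_sig, true_sig)) / n_sig,
    }


result = simulate_winners_curse()
print(result)
```

Under these assumptions the mean absolute estimate among significant SNPs exceeds the mean absolute true effect, because selection on significance favours estimates whose noise pushed them away from zero.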
Subjects
Genome-Wide Association Study, Single Nucleotide Polymorphism, Genome-Wide Association Study/methods, Likelihood Functions, Genetic Association Studies, Bias, Phenotype, Single Nucleotide Polymorphism/genetics
ABSTRACT
Observational studies are rarely representative of their target population because there are known and unknown factors that affect an individual's choice to participate (the selection mechanism). Selection can cause bias in a given analysis if the outcome is related to selection (conditional on the other variables in the model). Detecting and adjusting for selection bias in practice typically requires access to data on nonselected individuals. Here, we propose methods to detect selection bias in genetic studies by comparing correlations among genetic variants in the selected sample to those expected under no selection. We examine the use of four hypothesis tests to identify induced associations between genetic variants in the selected sample. We evaluate these approaches in Monte Carlo simulations. Finally, we use these approaches in an applied example using data from the UK Biobank (UKBB). The proposed tests suggested an association between alcohol consumption and selection into UKBB. Hence, UKBB analyses with alcohol consumption as the exposure or outcome may be biased by this selection.
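The idea that selection can induce associations between otherwise independent variants can be sketched with a toy simulation. This is not one of the four hypothesis tests evaluated in the paper: it assumes two independent biallelic variants that both raise a hypothetical selection liability, and simply compares their correlation in the full and selected samples. All names and parameter values are illustrative assumptions.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_selection(n=50000, maf=0.3, threshold=2.0, seed=7):
    """Return (correlation in everyone, correlation in the selected sample)."""
    rng = random.Random(seed)

    def genotype():
        # Allele count (0, 1 or 2) under Hardy-Weinberg equilibrium.
        return (rng.random() < maf) + (rng.random() < maf)

    g1_all, g2_all, g1_sel, g2_sel = [], [], [], []
    for _ in range(n):
        g1, g2 = genotype(), genotype()  # independent in the population
        g1_all.append(g1)
        g2_all.append(g2)
        # Assumed selection mechanism: both variants raise the liability.
        if g1 + g2 + rng.gauss(0.0, 1.0) > threshold:
            g1_sel.append(g1)
            g2_sel.append(g2)
    return pearson(g1_all, g2_all), pearson(g1_sel, g2_sel)


corr_all, corr_selected = simulate_selection()
print(corr_all, corr_selected)
```

In this setup the variants are close to uncorrelated in the full sample but negatively correlated among selected individuals, which is the kind of induced association the proposed tests look for.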
ABSTRACT
Participant overlap can induce overfitting bias into Mendelian randomization (MR) and polygenic risk score (PRS) studies. Here, we evaluated a block jackknife resampling framework for genome-wide association studies (GWAS) and PRS construction to mitigate overfitting bias in MR analyses and implemented this study design in a causal inference setting using data from the UK Biobank. We simulated PRS and MR under three scenarios: (1) using weighted SNP estimates from an external GWAS, (2) using weighted SNP estimates from an overlapping GWAS sample and (3) using a block jackknife resampling framework. Based on a P-value threshold to derive genetic instruments for MR studies (P < 5 × 10⁻⁸) and a 10% variance in the exposure explained by all SNPs, block-jackknifing PRS did not suffer from overfitting bias (mean R² = 0.034) compared with the externally weighted PRS (mean R² = 0.040). In contrast, genetic instruments derived from overlapping samples explained a higher variance (mean R² = 0.048) compared with the externally derived score. Overfitting became considerably more severe when using a more liberal P-value threshold to construct PRS (e.g. P < 0.05, overlapping sample PRS mean R² = 0.103, externally weighted PRS mean R² = 0.086), whereas estimates using jackknife score remained robust to overfitting (mean R² = 0.084). Using block jackknife resampling MR in an applied analysis, we examined the effects of body mass index on circulating biomarkers which provided comparable estimates to an externally weighted instrument, whereas the overfitted scores typically provided narrower confidence intervals. Furthermore, we extended this framework into sex-stratified, multivariate and bidirectional settings to investigate the effect of childhood body size on adult testosterone levels.
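One plausible reading of the block jackknife design is leave-one-block-out scoring: SNP weights are estimated without the block being scored, so no individual contributes to their own weights. The sketch below assumes that reading and is not the authors' pipeline. It uses per-SNP simple regression in place of a real GWAS and applies no P-value thresholding; the function names and simulated data are hypothetical.

```python
import random


def ols_slope(x, y):
    """Slope from a simple linear regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        return 0.0  # monomorphic SNP in the training blocks
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx


def block_jackknife_scores(genotypes, phenotype, n_blocks):
    """Score each block with weights estimated in all other blocks."""
    n, n_snps = len(phenotype), len(genotypes[0])
    block_of = [i % n_blocks for i in range(n)]  # assumed block assignment
    scores = [0.0] * n
    for b in range(n_blocks):
        train = [i for i in range(n) if block_of[i] != b]
        held_out = [i for i in range(n) if block_of[i] == b]
        y_train = [phenotype[i] for i in train]
        # "GWAS" excluding block b: one regression per SNP.
        weights = [ols_slope([genotypes[i][j] for i in train], y_train)
                   for j in range(n_snps)]
        for i in held_out:
            scores[i] = sum(w * g for w, g in zip(weights, genotypes[i]))
    return scores


# Hypothetical toy data: 200 individuals, 5 SNPs, additive effects.
rng = random.Random(3)
genotypes = [[(rng.random() < 0.4) + (rng.random() < 0.4) for _ in range(5)]
             for _ in range(200)]
phenotype = [0.3 * sum(g) + rng.gauss(0.0, 1.0) for g in genotypes]
scores = block_jackknife_scores(genotypes, phenotype, n_blocks=4)
```

The key property is that changing an individual's phenotype leaves that individual's own score unchanged, which is what removes the overlap between weight estimation and scoring.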
Subjects
Genome-Wide Association Study, Mendelian Randomization Analysis, Adult, Humans, Risk Factors, Body Mass Index, Single Nucleotide Polymorphism/genetics
ABSTRACT
Estimating effects of parental and sibling genotypes (indirect genetic effects) can provide insight into how the family environment influences phenotypic variation. There is growing molecular genetic evidence for effects of parental phenotypes on their offspring (e.g. parental educational attainment), but the extent to which siblings affect each other is currently unclear. Here we used data from samples of unrelated individuals, without (singletons) and with biological full-siblings (non-singletons), to investigate and estimate sibling effects. Indirect genetic effects of siblings increase (or decrease) the covariance between genetic variation and a phenotype. It follows that differences in genetic association estimates between singletons and non-singletons could indicate indirect genetic effects of siblings if there is no heterogeneity in other sources of genetic association between singletons and non-singletons. We used UK Biobank data to estimate polygenic score (PGS) associations for height, BMI and educational attainment in self-reported singletons (N = 50,143) and non-singletons (N = 328,549). The educational attainment PGS association estimate was 12% larger (95% C.I. 3%, 21%) in the non-singleton sample than in the singleton sample, but the height and BMI PGS associations were consistent. Birth order data suggested that the difference in educational attainment PGS associations was driven by individuals with older siblings rather than firstborns. The relationship between number of siblings and educational attainment PGS associations was non-linear; PGS associations were 24% smaller in individuals with 6 or more siblings compared to the rest of the sample (95% C.I. 11%, 38%). We estimate that a 1 SD increase in sibling educational attainment PGS corresponds to a 0.025 year increase in the index individual's years in schooling (95% C.I. 0.013, 0.036). 
Our results suggest that older siblings may influence the educational attainment of younger siblings, adding to the growing evidence that effects of the environment on phenotypic variation partially reflect social effects of germline genetic variation in relatives.
Subjects
Academic Success, Siblings, Educational Status, Humans, Multifactorial Inheritance/genetics, Parents
ABSTRACT
BACKGROUND: Individuals who have experienced a stroke, or transient ischemic attack, face a heightened risk of future cardiovascular events. Identification of genetic and molecular risk factors for subsequent cardiovascular outcomes may identify effective therapeutic targets to improve prognosis after an incident stroke. METHODS: We performed genome-wide association studies for subsequent major adverse cardiovascular events (MACE; n_cases=51 929; n_controls=39 980) and subsequent arterial ischemic stroke (AIS; n_cases=45 120; n_controls=46 789) after the first incident stroke within the Million Veteran Program and UK Biobank. We then used genetic variants associated with proteins (protein quantitative trait loci) to determine the effect of 1463 plasma protein abundances on subsequent MACE using Mendelian randomization. RESULTS: Two variants were significantly associated with subsequent cardiovascular events: rs76472767 near gene RNF220 (odds ratio, 0.75 [95% CI, 0.64-0.85]; P=3.69×10⁻⁸) with subsequent AIS and rs13294166 near gene LINC01492 (odds ratio, 1.52 [95% CI, 1.37-1.67]; P=3.77×10⁻⁸) with subsequent MACE. Using Mendelian randomization, we identified 2 proteins with an effect on subsequent MACE after a stroke: CCL27 ([C-C motif chemokine 27], effect odds ratio, 0.77 [95% CI, 0.66-0.88]; adjusted P=0.05) and TNFRSF14 ([tumor necrosis factor receptor superfamily member 14], effect odds ratio, 1.42 [95% CI, 1.24-1.60]; adjusted P=0.006). These proteins are not associated with incident AIS and are implicated in inflammation. CONCLUSIONS: We found evidence that 2 proteins with little effect on incident stroke appear to influence subsequent MACE after incident AIS. These associations suggest that inflammation is a contributing factor to subsequent MACE outcomes after incident AIS and highlight potential novel targets.
Subjects
Biological Specimen Banks, Genome-Wide Association Study, Mendelian Randomization Analysis, Stroke, Veterans, Humans, Male, Stroke/genetics, Stroke/epidemiology, Female, United Kingdom/epidemiology, Middle Aged, Aged, Disease Progression, Single Nucleotide Polymorphism/genetics, Ischemic Stroke/genetics, Ischemic Stroke/epidemiology, Risk Factors, Quantitative Trait Loci, UK Biobank
ABSTRACT
Despite early interest, the evidence linking fatty acids to cardiovascular diseases (CVDs) remains controversial. We used Mendelian randomization to explore the involvement of polyunsaturated (PUFA) and monounsaturated (MUFA) fatty acid biosynthesis in the etiology of several CVD endpoints in up to 1 153 768 European (maximum 123 668 cases) and 212 453 East Asian (maximum 29 319 cases) ancestry individuals. As instruments, we selected single nucleotide polymorphisms mapping to genes with well-known roles in PUFA (i.e. FADS1/2 and ELOVL2) and MUFA (i.e. SCD) biosynthesis. Our findings suggest that higher PUFA biosynthesis rate (proxied by rs174576 near FADS1/2) is related to higher odds of multiple CVDs, particularly ischemic stroke, peripheral artery disease and venous thromboembolism, whereas higher MUFA biosynthesis rate (proxied by rs603424 near SCD) is related to lower odds of coronary artery disease among Europeans. Results were unclear for East Asians as most effect estimates were imprecise. By triangulating multiple approaches (i.e. uni-/multi-variable Mendelian randomization, a phenome-wide scan, genetic colocalization and within-sibling analyses), our results are compatible with higher low-density lipoprotein (LDL) cholesterol (and possibly glucose) being a downstream effect of higher PUFA biosynthesis rate. Our findings indicate that PUFA and MUFA biosynthesis are involved in the etiology of CVDs and suggest LDL cholesterol as a potential mediating trait between PUFA biosynthesis and CVD risk.
Subjects
Cardiovascular Diseases, Humans, Cardiovascular Diseases/genetics, Mendelian Randomization Analysis, Fatty Acids/genetics, Asian People/genetics, Single Nucleotide Polymorphism/genetics
ABSTRACT
BACKGROUND: Alzheimer's disease (AD)-related neuropathological changes can occur decades before clinical symptoms. We aimed to investigate whether neurodevelopment and/or neurodegeneration affects the risk of AD, through reducing structural brain reserve and/or increasing brain atrophy, respectively. METHODS: We used bidirectional two-sample Mendelian randomisation to estimate the effects between genetic liability to AD and global and regional cortical thickness, estimated total intracranial volume, volume of subcortical structures and total white matter in 37 680 participants aged 8-81 years across 5 independent cohorts (Adolescent Brain Cognitive Development, Generation R, IMAGEN, Avon Longitudinal Study of Parents and Children and UK Biobank). We also examined the effects of global and regional cortical thickness and subcortical volumes from the Enhancing NeuroImaging Genetics through Meta-Analysis (ENIGMA) Consortium on AD risk in up to 37 741 participants. RESULTS: Our findings show that AD risk alleles have an age-dependent effect on a range of cortical and subcortical brain measures that starts in mid-life, in non-clinical populations. Evidence for such effects across childhood and young adulthood is weak. Some of the identified structures are not typically implicated in AD, such as those in the striatum (eg, thalamus), with consistent effects from childhood to late adulthood. There was little evidence to suggest brain morphology alters AD risk. CONCLUSIONS: Genetic liability to AD is likely to affect risk of AD primarily through mechanisms affecting indicators of brain morphology in later life, rather than structural brain reserve. Future studies with repeated measures are required for a better understanding and certainty of the mechanisms at play.
ABSTRACT
Over a decade of genome-wide association studies (GWAS) have led to the finding of extreme polygenicity of complex traits. The phenomenon that "all genes affect every complex trait" complicates Mendelian Randomization (MR) studies, where natural genetic variations are used as instruments to infer the causal effect of heritable risk factors. We reexamine the assumptions of existing MR methods and show how they need to be clarified to allow for pervasive horizontal pleiotropy and heterogeneous effect sizes. We propose a comprehensive framework GRAPPLE to analyze the causal effect of target risk factors with heterogeneous genetic instruments and identify possible pleiotropic patterns from data. By using GWAS summary statistics, GRAPPLE can efficiently use both strong and weak genetic instruments, detect the existence of multiple pleiotropic pathways, determine the causal direction and perform multivariable MR to adjust for confounding risk factors. With GRAPPLE, we analyze the effect of blood lipids, body mass index, and systolic blood pressure on 25 disease outcomes, gaining new information on their causal relationships and potential pleiotropic pathways involved.
Subjects
Causality, Phenotype, Genetic Pleiotropy, Genome-Wide Association Study, Humans, Mendelian Randomization Analysis, Single Nucleotide Polymorphism, Risk Factors
ABSTRACT
Spousal comparisons have been proposed as a design that can both reduce confounding and estimate effects of the shared adulthood environment. However, assortative mating, the process by which individuals select phenotypically (dis)similar mates, could distort associations when comparing spouses. We evaluated the use of spousal comparisons, as in the within-spouse pair (WSP) model, for aetiological research such as genetic association studies. We demonstrated that the WSP model can reduce confounding but may be susceptible to collider bias arising from conditioning on assorted spouse pairs. Analyses using UK Biobank spouse pairs found that WSP genetic association estimates were smaller than estimates from random pairs for height, educational attainment, and BMI variants. Within-sibling pair estimates, robust to demographic and parental effects, were also smaller than random pair estimates for height and educational attainment, but not for BMI. WSP models, like other within-family models, may reduce confounding from demographic factors in genetic association estimates, and so could be useful for triangulating evidence across study designs to assess the robustness of findings. However, WSP estimates should be interpreted with caution due to potential collider bias.
Subjects
Sexual Behavior, Adult, Body Mass Index, Cohort Studies, Female, Humans, Male, Spouses, United Kingdom
ABSTRACT
Discovering drugs that efficiently treat brain diseases has been challenging. Genetic variants that modulate the expression of potential drug targets can be utilized to assess the efficacy of therapeutic interventions. We therefore employed Mendelian Randomization (MR) on gene expression measured in brain tissue to identify drug targets involved in neurological and psychiatric diseases. We conducted a two-sample MR using cis-acting brain-derived expression quantitative trait loci (eQTLs) from the Accelerating Medicines Partnership for Alzheimer's Disease consortium (AMP-AD) and the CommonMind Consortium (CMC) meta-analysis study (n = 1,286) as genetic instruments to predict the effects of 7,137 genes on 12 neurological and psychiatric disorders. We conducted Bayesian colocalization analysis on the top MR findings (using P < 6 × 10⁻⁷ as evidence threshold, Bonferroni-corrected for 80,557 MR tests) to confirm sharing of the same causal variants between gene expression and trait in each genomic region. We then intersected the colocalized genes with known monogenic disease genes recorded in Online Mendelian Inheritance in Man (OMIM) and with genes annotated as drug targets in the Open Targets platform to identify promising drug targets. 80 eQTLs showed MR evidence of a causal effect, from which we prioritised 47 genes based on colocalization with the trait. We causally linked the expression of 23 genes with schizophrenia and a single gene each with anorexia, bipolar disorder and major depressive disorder within the psychiatric diseases and 9 genes with Alzheimer's disease, 6 genes with Parkinson's disease, 4 genes with multiple sclerosis and two genes with amyotrophic lateral sclerosis within the neurological diseases we tested.
From these we identified five genes (ACE, GPNMB, KCNQ5, RERE and SUOX) as attractive drug targets that may warrant follow-up in functional studies and clinical trials, demonstrating the value of this study design for discovering drug targets in neuropsychiatric diseases.
Subjects
Alzheimer Disease/genetics, Drug Discovery, Genetic Predisposition to Disease, Transcriptome/genetics, Alzheimer Disease/drug therapy, Bipolar Disorder/drug therapy, Bipolar Disorder/genetics, Bipolar Disorder/pathology, Brain/metabolism, Brain/pathology, Genome-Wide Association Study, Humans, Mendelian Randomization Analysis, Molecular Targeted Therapy, Nervous System Diseases/drug therapy, Nervous System Diseases/genetics, Nervous System Diseases/pathology, Single Nucleotide Polymorphism, Quantitative Trait Loci/genetics, Schizophrenia/drug therapy, Schizophrenia/genetics, Schizophrenia/pathology
ABSTRACT
BACKGROUND: Although it is known that variation in the aldehyde dehydrogenase 2 (ALDH2) gene family influences the East Asian alcohol flushing response, knowledge about other genetic variants that affect flushing symptoms is limited. METHODS: We performed a genome-wide association study meta-analysis and heritability analysis of alcohol flushing in 15,105 males of East Asian ancestry (Koreans and Chinese) to identify genetic associations with alcohol flushing. We also evaluated whether self-reported flushing can be used as an instrumental variable for alcohol intake. RESULTS: We identified variants in the region of ALDH2 strongly associated with alcohol flushing, replicating previous studies conducted in East Asian populations. Additionally, we identified variants in the alcohol dehydrogenase 1B (ADH1B) gene region associated with alcohol flushing. Several novel variants were identified after adjustment for the lead variants (ALDH2-rs671 and ADH1B-rs1229984), which need to be confirmed in larger studies. The estimated SNP-heritability on the liability scale was 13% (S.E. = 4%) for flushing, but the heritability estimate decreased to 6% (S.E. = 4%) when the effects of the lead variants were controlled for. Genetic instrumentation of higher alcohol intake using these variants recapitulated known associations of alcohol intake with hypertension. Using self-reported alcohol flushing as an instrument gave a similar association pattern of higher alcohol intake and cardiovascular disease-related traits (e.g. stroke). CONCLUSION: This study confirms that ALDH2-rs671 and ADH1B-rs1229984 are associated with alcohol flushing in East Asian populations. Our findings also suggest that self-reported alcohol flushing can be used as an instrumental variable in future studies of alcohol consumption.
Subjects
Alcohol Drinking, East Asian People, Flushing, Humans, Male, Alcohol Dehydrogenase/genetics, Alcohol Drinking/genetics, Mitochondrial Aldehyde Dehydrogenase/genetics, East Asian People/genetics, Genome-Wide Association Study, Single Nucleotide Polymorphism, Flushing/chemically induced
ABSTRACT
Rationale: Methylation integrates factors present at birth and modifiable across the lifespan that can influence pulmonary function. Studies are limited in scope and replication. Objectives: To conduct large-scale epigenome-wide meta-analyses of blood DNA methylation and pulmonary function. Methods: Twelve cohorts analyzed associations of methylation at cytosine-phosphate-guanine probes (CpGs), using Illumina 450K or EPIC/850K arrays, with FEV1, FVC, and FEV1/FVC. We performed multiancestry epigenome-wide meta-analyses (total of 17,503 individuals; 14,761 European, 2,549 African, and 193 Hispanic/Latino ancestries) and interpreted results using integrative epigenomics. Measurements and Main Results: We identified 1,267 CpGs (1,042 genes) differentially methylated (false discovery rate, <0.025) in relation to FEV1, FVC, or FEV1/FVC, including 1,240 novel and 73 also related to chronic obstructive pulmonary disease (1,787 cases). We found 294 CpGs unique to European or African ancestry and 395 CpGs unique to never or ever smokers. The majority of significant CpGs correlated with nearby gene expression in blood. Findings were enriched in key regulatory elements for gene function, including accessible chromatin elements, in both blood and lung. Sixty-nine implicated genes are targets of investigational or approved drugs. One example novel gene highlighted by integrative epigenomic and druggable target analysis is TNFRSF4. Mendelian randomization and colocalization analyses suggest that epigenome-wide association study signals capture causal regulatory genomic loci. Conclusions: We identified numerous novel loci differentially methylated in relation to pulmonary function; few were detected in large genome-wide association studies. Integrative analyses highlight functional relevance and potential therapeutic targets. 
This comprehensive discovery of potentially modifiable, novel lung function loci expands knowledge gained from genetic studies, providing insights into lung pathogenesis.
Subjects
DNA Methylation, Epigenome, CpG Islands, DNA Methylation/genetics, Genetic Epigenesis/genetics, Epigenomics, Genome-Wide Association Study, Humans, Newborn Infant, Lung
ABSTRACT
BACKGROUND: Despite early interest in the health effects of polyunsaturated fatty acids (PUFA), there is still substantial controversy and uncertainty on the evidence linking PUFA to cardiovascular diseases (CVDs). We investigated the effect of plasma concentration of omega-3 PUFA (i.e. docosahexaenoic acid (DHA) and total omega-3 PUFA) and omega-6 PUFA (i.e. linoleic acid and total omega-6 PUFA) on the risk of CVDs using Mendelian randomization. METHODS: We conducted the largest genome-wide association study (GWAS) of circulating PUFA to date including a sample of 114,999 individuals and incorporated these data in a two-sample Mendelian randomization framework to investigate the involvement of circulating PUFA on a wide range of CVDs in up to 1,153,768 individuals of European ancestry (i.e. coronary artery disease, ischemic stroke, haemorrhagic stroke, heart failure, atrial fibrillation, peripheral arterial disease, aortic aneurysm, venous thromboembolism and aortic valve stenosis). RESULTS: GWAS identified between 46 and 64 SNPs for the four PUFA traits, explaining 4.8-7.9% of circulating PUFA variance and with mean F statistics >100. Higher genetically predicted DHA (and total omega-3 fatty acids) concentration was related to higher risk of some cardiovascular endpoints; however, these findings did not pass our criteria for multiple testing correction and were attenuated when accounting for LDL-cholesterol through multivariable Mendelian randomization or excluding SNPs in the vicinity of the FADS locus. Estimates for the relation between higher genetically predicted linoleic acid (and total omega-6) concentration were inconsistent across different cardiovascular endpoints and Mendelian randomization methods. There was weak evidence of higher genetically predicted linoleic acid being related to lower risk of ischemic stroke and peripheral artery disease when accounting for LDL-cholesterol.
CONCLUSIONS: We have conducted the largest GWAS of circulating PUFA to date and the most comprehensive Mendelian randomization analyses. Overall, our Mendelian randomization findings do not support a protective role of circulating PUFA concentration on the risk of CVDs. However, horizontal pleiotropy via lipoprotein-related traits could be a key source of bias in our analyses.
Subjects
Cardiovascular Diseases, Ischemic Stroke, Biological Specimen Banks, Cardiovascular Diseases/epidemiology, Cardiovascular Diseases/genetics, LDL Cholesterol, Fatty Acids, Unsaturated Fatty Acids, Genome-Wide Association Study, Humans, Linoleic Acid, Mendelian Randomization Analysis, Single Nucleotide Polymorphism/genetics, Risk Factors, United Kingdom/epidemiology
ABSTRACT
Dietary factors are assumed to play an important role in cancer risk, apparent in consensus recommendations for cancer prevention that promote nutritional changes. However, the evidence in this field has been generated predominantly through observational studies, which may result in biased effect estimates because of confounding, exposure misclassification, and reverse causality. With major geographical differences and rapid changes in cancer incidence over time, it is crucial to establish which of the observational associations reflect causality and to identify novel risk factors as these may be modified to prevent the onset of cancer and reduce its progression. Mendelian randomization (MR) uses the special properties of germline genetic variation to strengthen causal inference regarding potentially modifiable exposures and disease risk. MR can be implemented through instrumental variable (IV) analysis and, when robustly performed, is generally less prone to confounding, reverse causation and measurement error than conventional observational methods and has different sources of bias (discussed in detail below). It is increasingly used to facilitate causal inference in epidemiology and provides an opportunity to explore the effects of nutritional exposures on cancer incidence and progression in a cost-effective and timely manner. Here, we introduce the concept of MR and discuss its current application in understanding the impact of nutritional factors (e.g., any measure of diet and nutritional intake, circulating biomarkers, patterns, preference or behaviour) on cancer aetiology and, thus, opportunities for MR to contribute to the development of nutritional recommendations and policies for cancer prevention. 
We provide applied examples of MR studies examining the role of nutritional factors in cancer to illustrate how this method can be used to help prioritise or deprioritise the evaluation of specific nutritional factors as intervention targets in randomised controlled trials. We describe possible biases when using MR, and methodological developments aimed at investigating and potentially overcoming these biases when present. Lastly, we consider the use of MR in identifying causally relevant nutritional risk factors for various cancers in different regions across the world, given notable geographical differences in some cancers. We also discuss how MR results could be translated into further research and policy. We conclude that findings from MR studies, which corroborate those from other well-conducted studies with different and orthogonal biases, are poised to substantially improve our understanding of nutritional influences on cancer. For such corroboration, there is a requirement for an interdisciplinary and collaborative approach to investigate risk factors for cancer incidence and progression.
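As a concrete illustration of MR implemented through instrumental variable analysis, the following sketch computes per-variant Wald ratios and a fixed-effect inverse-variance weighted (IVW) estimate from summary statistics. It is a simplified textbook form that uses first-order weights and ignores uncertainty in the variant-exposure estimates. The function names and the three-variant input are hypothetical, not data from any study discussed here.

```python
import math


def wald_ratios(beta_exposure, beta_outcome):
    """Per-variant causal estimates: variant-outcome / variant-exposure."""
    return [by / bx for bx, by in zip(beta_exposure, beta_outcome)]


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate and its standard error."""
    # Weighted regression of outcome effects on exposure effects through
    # the origin, with weights 1 / se_outcome**2.
    numerator = sum(bx * by / se ** 2
                    for bx, by, se in zip(beta_exposure, beta_outcome,
                                          se_outcome))
    denominator = sum(bx ** 2 / se ** 2
                      for bx, se in zip(beta_exposure, se_outcome))
    return numerator / denominator, math.sqrt(1.0 / denominator)


# Hypothetical summary statistics for three independent variants.
beta_exposure = [0.10, 0.20, 0.15]
beta_outcome = [0.05, 0.10, 0.075]
se_outcome = [0.01, 0.02, 0.015]

ratios = wald_ratios(beta_exposure, beta_outcome)
estimate, std_error = ivw_estimate(beta_exposure, beta_outcome, se_outcome)
print(ratios, estimate, std_error)
```

When every Wald ratio agrees, as in this toy input, the IVW estimate equals that shared ratio. Disagreement among the ratios is one signal that some variants may violate the instrumental variable assumptions, for example through horizontal pleiotropy.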
Subjects
Mendelian Randomization Analysis, Neoplasms, Causality, Humans, Mendelian Randomization Analysis/methods, Neoplasms/etiology, Neoplasms/genetics, Nutritional Status, Risk Factors
ABSTRACT
MOTIVATION: The wealth of data resources on human phenotypes, risk factors, molecular traits and therapeutic interventions presents new opportunities for population health sciences. These opportunities are paralleled by a growing need for data integration, curation and mining to increase research efficiency, reduce mis-inference and ensure reproducible research. RESULTS: We developed EpiGraphDB (https://epigraphdb.org/), a graph database containing an array of different biomedical and epidemiological relationships and an analytical platform to support their use in human population health data science. In addition, we present three case studies that illustrate the value of this platform. The first uses EpiGraphDB to evaluate potential pleiotropic relationships, addressing mis-inference in systematic causal analysis. In the second case study, we illustrate how protein-protein interaction data offer opportunities to identify new drug targets. The final case study integrates causal inference using Mendelian randomization with relationships mined from the biomedical literature to 'triangulate' evidence from different sources. AVAILABILITY AND IMPLEMENTATION: The EpiGraphDB platform is openly available at https://epigraphdb.org. Code for replicating case study results is available at https://github.com/MRCIEU/epigraphdb as Jupyter notebooks using the API, and https://mrcieu.github.io/epigraphdb-r using the R package. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Data Science, Software, Data Mining, Factual Databases, Humans, Phenotype
ABSTRACT
BACKGROUND: Joint developmental trajectories of internalizing and externalizing problems show considerable heterogeneity; however, this can be parsed into a small number of meaningful subgroups. Doing so offered insights into risk factors that lead to different patterns of internalizing/externalizing trajectories. However, despite both domains of problems showing strong heritability, no study has yet considered genetic risks as predictors of joint internalizing/externalizing problem trajectories. METHODS: Using parallel process latent class growth analysis, we estimated joint developmental trajectories of internalizing and externalizing difficulties assessed across ages 4 to 16 using the Strengths and Difficulties Questionnaire. Multinomial logistic regression was used to evaluate a range of demographic, perinatal, maternal mental health, and child and maternal polygenic predictors of group membership. Participants included 11,049 children taking part in the Avon Longitudinal Study of Parents and Children. Polygenic data were available for 7,127 children and 6,836 mothers. RESULTS: A 5-class model was judged optimal: Unaffected, Moderate Externalizing Symptoms, High Externalizing Symptoms, Moderate Internalizing and Externalizing Symptoms and High Internalizing and Externalizing Symptoms. Male sex, lower maternal age, maternal mental health problems, maternal smoking during pregnancy, higher child polygenic risk scores for ADHD and lower polygenic scores for IQ distinguished affected classes from the unaffected class. CONCLUSIONS: While affected classes could be relatively well separated from the unaffected class, phenotypic and polygenic predictors were limited in their ability to distinguish between different affected classes. Results thus add to existing evidence that internalizing and externalizing problems have mostly shared risk factors.
Subjects
Mothers, Multifactorial Inheritance, Adolescent, Child, Preschool Child, Female, Humans, Longitudinal Studies, Male, Pregnancy, Risk Factors, Smoking
ABSTRACT
We integrate comeasured gene expression and DNA methylation (DNAme) in 265 human skeletal muscle biopsies from the FUSION study with >7 million genetic variants and eight physiological traits: height, waist, weight, waist-hip ratio, body mass index, fasting serum insulin, fasting plasma glucose, and type 2 diabetes. We find hundreds of genes and DNAme sites associated with fasting insulin, waist, and body mass index, as well as thousands of DNAme sites associated with gene expression (eQTM). We find that controlling for heterogeneity in tissue/muscle fiber type reduces the number of physiological trait associations, and that long-range eQTMs (>1 Mb) are reduced when controlling for tissue/muscle fiber type or latent factors. We map genetic regulators (quantitative trait loci; QTLs) of expression (eQTLs) and DNAme (mQTLs). Using Mendelian randomization (MR) and mediation techniques, we leverage these genetic maps to predict 213 causal relationships between expression and DNAme, approximately two-thirds of which predict methylation to causally influence expression. We use MR to integrate FUSION mQTLs, FUSION eQTLs, and GTEx eQTLs for 48 tissues with genetic associations for 534 diseases and quantitative traits. We identify hundreds of genes and thousands of DNAme sites that may drive the reported disease/quantitative trait genetic associations. We identify 300 gene expression MR associations that are present in both FUSION and GTEx skeletal muscle and that show stronger evidence of MR association in skeletal muscle than other tissues, which may partially reflect differences in power across tissues. As one example, we find that increased RXRA muscle expression may decrease lean tissue mass.