ABSTRACT
Genome-wide association studies using large-scale genome and exome sequencing data have become increasingly valuable in identifying associations between genetic variants and disease, transforming basic research and translational medicine. However, this progress has not been equally shared across all people and conditions, in part due to limited resources. Leveraging publicly available sequencing data as external common controls, rather than sequencing new controls for every study, can better allocate resources by augmenting control sample sizes or providing controls where none existed. However, common control studies must be carefully planned and executed as even small differences in sample ascertainment and processing can result in substantial bias. Here, we discuss challenges and opportunities for the robust use of common controls in high-throughput sequencing studies, including study design, quality control and statistical approaches. Thoughtful generation and use of large and valuable genetic sequencing data sets will enable investigation of a broader and more representative set of conditions, environments and genetic ancestries than otherwise possible.
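The simplest analysis a common-control design supports is a per-variant test of case allele counts against external control allele counts. The sketch below computes an allelic chi-square test of that kind. It also computes the genomic control inflation factor (λ), which is often used to flag residual bias between the two sources.

It assumes allele counts have already been harmonized (same reference build, variant calling and QC). The function names are illustrative and not taken from any particular tool.

```python
import math
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364231195724


def allelic_chi2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square (1 df, no continuity correction) on a 2x2 table of allele counts."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    row_totals = [case_alt + case_ref, ctrl_alt + ctrl_ref]
    col_totals = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    if 0 in row_totals or 0 in col_totals:
        return 0.0, 1.0  # empty group or monomorphic variant: no information
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of chi-square(1 df): P(Z^2 > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def genomic_inflation(chi2_values):
    """Genomic control lambda: median observed chi-square divided by its null median."""
    return statistics.median(chi2_values) / CHI2_1DF_MEDIAN
```

A λ well above 1 on variants expected to be null, such as synonymous variants, can signal the ascertainment or processing differences described above.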
Subject(s)
Exome, Genome-Wide Association Study, Exome/genetics, Genetic Predisposition to Disease, High-Throughput Nucleotide Sequencing, Humans, Exome Sequencing
ABSTRACT
We examined the associations of vegetarianism with metabolic biomarkers using traditional and genetic epidemiology. First, we addressed inconsistencies in self-reported vegetarianism among UK Biobank participants by using data from two dietary surveys to identify a cohort of strict European vegetarians (N = 2,312). Vegetarians were matched 1:4 with nonvegetarians for non-genetic association analyses, revealing significant effects of vegetarianism on 15 of 30 biomarkers. Cholesterol measures and vitamin D were significantly lower in vegetarians, while triglycerides were higher. A genome-wide association study revealed no genome-wide significant (GWS; P < 5 × 10⁻⁸) associations with vegetarian behavior. We performed genome-wide gene-vegetarianism interaction analyses for the biomarkers and detected a GWS interaction affecting calcium at rs72952628 (P = 4.47 × 10⁻⁸). rs72952628 is in MMAA, a gene in the vitamin B12 metabolic pathway, and vegetarians are at substantial risk of B12 deficiency. Gene-based interaction tests revealed two significant genes, RNF168 for testosterone (P = 1.45 × 10⁻⁶) and DOCK4 for estimated glomerular filtration rate (eGFR) (P = 6.76 × 10⁻⁷), which have previously been associated with testicular and renal traits, respectively. These nutrigenetic findings indicate that genotype can modify the associations between vegetarianism and health outcomes.
Subject(s)
Biomarkers, Calcium, Vegetarian Diet, Genome-Wide Association Study, Glomerular Filtration Rate, Testosterone, Humans, Male, Glomerular Filtration Rate/genetics, Testosterone/blood, Female, Biomarkers/blood, Middle Aged, Calcium/metabolism, Single Nucleotide Polymorphism, Vegetarians, Aged, Vitamin D/blood, Adult, Ubiquitin-Protein Ligases/genetics
ABSTRACT
Large-scale whole-genome sequencing studies have enabled analysis of noncoding rare-variant (RV) associations with complex human diseases and traits. Variant-set analysis is a powerful approach to study RV association. However, existing methods have limited ability to analyze the noncoding genome. We propose a computationally efficient and robust noncoding RV association detection framework, STAARpipeline, to automatically annotate a whole-genome sequencing study and perform flexible noncoding RV association analysis, including gene-centric analysis and fixed window-based and dynamic window-based non-gene-centric analysis by incorporating variant functional annotations. In gene-centric analysis, STAARpipeline uses STAAR to group noncoding variants based on functional categories of genes and incorporate multiple functional annotations. In non-gene-centric analysis, STAARpipeline uses SCANG-STAAR to incorporate dynamic window sizes and multiple functional annotations. We apply STAARpipeline to identify noncoding RV sets associated with four lipid traits in 21,015 discovery samples from the Trans-Omics for Precision Medicine (TOPMed) program and replicate several of them in an additional 9,123 TOPMed samples. We also analyze five non-lipid TOPMed traits.
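STAAR is described as combining p-values across multiple functional annotations and test types with the Cauchy combination (ACAT) method. The sketch below shows that combination step alone, with equal weights by default.

It is not STAARpipeline code, and the function name is hypothetical. It also omits the special handling that production implementations typically apply to extremely small p-values.

```python
import math


def cauchy_combination(pvalues, weights=None):
    """Cauchy combination (ACAT) of p-values that may be correlated.

    Each p-value is mapped to a standard Cauchy quantile, tan((0.5 - p) * pi). Under the null,
    the weighted average of these quantiles is approximately standard Cauchy, so it maps back
    to a single combined p-value.
    """
    if weights is None:
        weights = [1.0] * len(pvalues)
    if not pvalues or len(weights) != len(pvalues):
        raise ValueError("need one weight per p-value")
    if any(not 0.0 < p < 1.0 for p in pvalues):
        raise ValueError("p-values must lie strictly between 0 and 1")
    total = sum(weights)
    stat = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, pvalues)) / total
    # No special-casing of extremely small p-values, where tan() loses precision.
    return 0.5 - math.atan(stat) / math.pi
```

A single very small p-value dominates the combined result, which is why this combination is attractive when only one annotation or test captures the signal.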
Subject(s)
Genome-Wide Association Study, Genome, Humans, Genome-Wide Association Study/methods, Whole Genome Sequencing/methods, Phenotype, Genetic Variation
ABSTRACT
Protein-coding genetic variants that strongly affect disease risk can yield relevant clues to disease pathogenesis. Here we report exome-sequencing analyses of 20,791 individuals with type 2 diabetes (T2D) and 24,440 non-diabetic control participants from 5 ancestries. We identify gene-level associations of rare variants (with minor allele frequencies of less than 0.5%) in 4 genes at exome-wide significance, including a series of more than 30 SLC30A8 alleles that conveys protection against T2D, and in 12 gene sets, including those corresponding to T2D drug targets (P = 6.1 × 10⁻³) and candidate genes from knockout mice (P = 5.2 × 10⁻³). Within our study, the strongest T2D gene-level signals for rare variants explain at most 25% of the heritability of the strongest common single-variant signals, and the gene-level effect sizes of the rare variants that we observed in established T2D drug targets will require 75,000-185,000 sequenced cases to achieve exome-wide significance. We propose a method to interpret these modest rare-variant associations and to incorporate these associations into future target or gene prioritization efforts.
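Gene-level rare-variant tests start by aggregating qualifying variants within each gene. The simplest version is a carrier-based burden collapse that keeps only variants below the 0.5% MAF threshold, sketched below. It illustrates the idea only and does not reproduce the exact variant masks or tests used in this study.

Genotypes are assumed to be coded as minor-allele counts. The function names are hypothetical.

```python
def collapse_rare_variants(genotypes, mafs, maf_threshold=0.005):
    """Per-person carrier indicator for any qualifying rare variant in one gene.

    genotypes: one list per person of minor-allele counts (0, 1, 2), one entry per variant.
    mafs:      minor allele frequency of each variant, in the same order as the genotype columns.
    """
    rare = [j for j, maf in enumerate(mafs) if maf < maf_threshold]
    return [int(any(person[j] > 0 for j in rare)) for person in genotypes]


def carrier_table(carriers, is_case):
    """2x2 counts: (case carriers, case non-carriers, control carriers, control non-carriers)."""
    n_cases = sum(1 for y in is_case if y)
    n_controls = len(is_case) - n_cases
    case_carriers = sum(c for c, y in zip(carriers, is_case) if y)
    control_carriers = sum(c for c, y in zip(carriers, is_case) if not y)
    return case_carriers, n_cases - case_carriers, control_carriers, n_controls - control_carriers
```

The resulting carrier counts form a 2×2 table that could then be tested with, for example, Fisher's exact test.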
Subject(s)
Type 2 Diabetes Mellitus/genetics, Exome Sequencing, Exome/genetics, Animals, Case-Control Studies, Decision Support Techniques, Female, Gene Frequency, Genome-Wide Association Study, Humans, Male, Mice, Knockout Mice
ABSTRACT
AIMS/HYPOTHESIS: Several studies have reported associations between specific proteins and type 2 diabetes risk in European populations. To better understand the role of proteins in type 2 diabetes aetiology across diverse populations, we conducted a large proteome-wide association study using genetic instruments across four racial and ethnic groups: African, Asian, Hispanic/Latino and European. METHODS: Genome and plasma proteome data from the Multi-Ethnic Study of Atherosclerosis (MESA), involving 182 African, 69 Asian, 284 Hispanic/Latino and 409 European individuals residing in the USA, were used to build protein prediction models from potentially associated cis- and trans-SNPs. The models were applied to genome-wide association study summary statistics for 250,127 type 2 diabetes cases and 1,222,941 controls from different racial and ethnic populations. RESULTS: We identified 3, 44 and 1 proteins associated with type 2 diabetes risk in Asian, European and Hispanic/Latino populations, respectively. Meta-analysis identified 40 proteins associated with type 2 diabetes risk across the populations, including well-established proteins as well as novel proteins not yet implicated in type 2 diabetes development. CONCLUSIONS/INTERPRETATION: Our study improves our understanding of the aetiology of type 2 diabetes in diverse populations. DATA AVAILABILITY: Summary statistics from the multi-ethnic type 2 diabetes GWAS of MVP, DIAMANTE, Biobank Japan and other studies are available from the database of Genotypes and Phenotypes (dbGaP) under accession number phs001672.v3.p1. MESA genetic, proteome and covariate data can be accessed through dbGaP under phs000209.v13.p3. All code is available on GitHub (https://github.com/Arthur1021/MESA-1K-PWAS).
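A common way to apply protein prediction weights to GWAS summary statistics is the S-PrediXcan approximation. In it, the protein-level z-score is a weighted combination of SNP z-scores, scaled by the SNP genotype standard deviations and the predicted protein's standard deviation.

The sketch below assumes that approach, but the abstract does not state the exact formula used. The function and argument names are illustrative.

```python
import math


def summary_protein_z(weights, snp_z, ld_cov):
    """Protein-level z-score from prediction weights and GWAS summary statistics.

    weights: per-SNP weights w_l from a protein prediction model (effect-allele dosage scale).
    snp_z:   GWAS z-scores (beta_l / se_l) for the same SNPs, aligned to the same effect allele.
    ld_cov:  genotype covariance matrix of those SNPs from a reference panel
             (diagonal = genotype variances sigma_l^2).
    Returns z = sum_l w_l * (sigma_l / sigma_p) * z_l, where sigma_p^2 = w' LD w.
    """
    k = len(weights)
    # Variance of the predicted protein level implied by the weights and the LD matrix.
    var_p = sum(weights[a] * ld_cov[a][b] * weights[b] for a in range(k) for b in range(k))
    sigma_p = math.sqrt(var_p)
    return sum(w * math.sqrt(ld_cov[l][l]) / sigma_p * z
               for l, (w, z) in enumerate(zip(weights, snp_z)))
```

With a single SNP, the protein z-score reduces to that SNP's z-score, with its sign set by the sign of the weight.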
ABSTRACT
MOTIVATION: Summary statistics from genome-wide association studies enable many valuable downstream analyses that are more efficient than individual-level data analysis while also reducing privacy concerns. As growing sample sizes enable better-powered analyses of gene-environment interactions, there is a need for gene-environment interaction-specific methods that manipulate and use summary statistics. RESULTS: We introduce two tools to facilitate such analysis, with a focus on statistical models containing multiple gene-exposure and/or gene-covariate interaction terms. REGEM (RE-analysis of GEM summary statistics) uses summary statistics from a single multi-exposure genome-wide interaction study to derive analogous sets of summary statistics with arbitrary sets of exposures and interaction covariate adjustments. METAGEM (META-analysis of GEM summary statistics) extends current fixed-effects meta-analysis models to incorporate multiple exposures from multiple studies. We demonstrate the value and efficiency of these tools by exploring alternative methods of accounting for ancestry-related population stratification in a genome-wide interaction study in the UK Biobank, as well as by conducting a multi-exposure genome-wide interaction study meta-analysis in cohorts from the diabetes-focused ProDiGY consortium. These programs help to maximize the value of summary statistics from diverse and complex gene-environment interaction studies. AVAILABILITY AND IMPLEMENTATION: REGEM and METAGEM are open-source projects freely available at https://github.com/large-scale-gxe-methods/REGEM and https://github.com/large-scale-gxe-methods/METAGEM.
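METAGEM extends fixed-effects meta-analysis to several exposures at once. Such an extension builds on the textbook multivariate inverse-variance estimator, which pools each study's vector of interaction estimates together with its covariance matrix. That estimator is sketched below.

This is a generic sketch, not METAGEM's implementation. The helper names are hypothetical.

```python
def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    a = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:  # crude absolute singularity check
            raise ValueError("matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        scale = a[col][col]
        a[col] = [x / scale for x in a[col]]
        for r in range(n):
            if r != col:
                factor = a[r][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]


def matvec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def fixed_effects_meta(betas, covs):
    """Multivariate inverse-variance (fixed-effects) meta-analysis.

    betas: per-study vectors of estimates for the same terms, in the same order
           (e.g. several GxE interaction coefficients).
    covs:  per-study covariance matrices of those estimates.
    Returns (pooled estimates, pooled covariance), where
        V = (sum_k V_k^-1)^-1  and  b = V * sum_k V_k^-1 b_k.
    """
    k = len(betas[0])
    info = [[0.0] * k for _ in range(k)]
    score = [0.0] * k
    for b, v in zip(betas, covs):
        v_inv = invert(v)
        weighted = matvec(v_inv, b)
        for i in range(k):
            score[i] += weighted[i]
            for j in range(k):
                info[i][j] += v_inv[i][j]
    pooled_cov = invert(info)
    return matvec(pooled_cov, score), pooled_cov
```

With a single term per study, this reduces to ordinary inverse-variance weighting.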
Subject(s)
Gene-Environment Interaction, Genome-Wide Association Study, Statistical Models, Sample Size, Statistical Data Interpretation, Single Nucleotide Polymorphism, Phenotype
ABSTRACT
AIMS/HYPOTHESIS: Type 2 diabetes is highly polygenic and influenced by multiple biological pathways. Rapid expansion in the number of type 2 diabetes loci can be leveraged to identify such pathways. METHODS: We developed a high-throughput pipeline to enable clustering of type 2 diabetes loci based on variant-trait associations. Our pipeline extracted summary statistics from genome-wide association studies (GWAS) for type 2 diabetes and related traits to generate a matrix of 323 variants × 64 trait associations and applied Bayesian non-negative matrix factorisation (bNMF) to identify genetic components of type 2 diabetes. Epigenomic enrichment analysis was performed in 28 cell types and in single pancreatic cells. We generated cluster-specific polygenic scores and performed regression analysis in an independent cohort (N=25,419) to assess clinical relevance. RESULTS: We identified ten clusters of genetic loci, recapturing the five from our prior analysis as well as novel clusters related to beta cell dysfunction, pronounced insulin secretion, and levels of alkaline phosphatase, lipoprotein A and sex hormone-binding globulin. Four clusters related to mechanisms of insulin deficiency, five related to insulin resistance and one had an unclear mechanism. The clusters displayed tissue-specific epigenomic enrichment; notably, the two beta cell clusters were differentially enriched in functional and stressed pancreatic beta cell states. Additionally, cluster-specific polygenic scores were differentially associated with patient clinical characteristics and outcomes. The pipeline was applied to coronary artery disease and chronic kidney disease, identifying multiple clusters that overlapped with those for type 2 diabetes. CONCLUSIONS/INTERPRETATION: Our approach stratifies type 2 diabetes loci into physiologically interpretable genetic clusters associated with distinct tissues and clinical outcomes.
The pipeline allows for efficient updating as additional GWAS become available and can be readily applied to other conditions, facilitating clinical translation of GWAS findings. Software to perform this clustering pipeline is freely available.
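The pipeline applies Bayesian non-negative matrix factorisation (bNMF) to the variant × trait matrix. To show only the underlying decomposition, the sketch below implements plain, non-Bayesian NMF with the classic Lee-Seung multiplicative updates. In the output, W holds per-cluster variant weights and H holds per-cluster trait weights.

bNMF additionally places priors on the factors and can infer the number of clusters; this sketch does neither. The input is assumed to be non-negative already. For example, signed association z-scores could be split into separate positive and negative columns, but that preprocessing step is an assumption here.

```python
import random


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(x, w, h):
    """Squared Frobenius norm of X - WH."""
    wh = matmul(w, h)
    return sum((xv - wv) ** 2 for xr, wr in zip(x, wh) for xv, wv in zip(xr, wr))


def nmf(x, rank, iterations=200, seed=0, eps=1e-9):
    """Plain NMF, X ~ W H, using Lee-Seung multiplicative updates for squared-error loss.

    x: non-negative matrix, e.g. variants (rows) x trait-association columns.
    Returns W (variants x rank) and H (rank x traits); both stay non-negative.
    """
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W'X) / (W'WH)
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[hv * nv / (dv + eps) for hv, nv, dv in zip(hr, nr, dr)] for hr, nr, dr in zip(h, num, den)]
        # W <- W * (XH') / (WHH')
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[wv * nv / (dv + eps) for wv, nv, dv in zip(wr, nr, dr)] for wr, nr, dr in zip(w, num, den)]
    return w, h
```

Each variant can then be assigned to the cluster (column of W) in which it has its largest weight. The traits with the largest weights in the matching row of H suggest what that cluster represents.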
Subject(s)
Type 2 Diabetes Mellitus, Humans, Type 2 Diabetes Mellitus/genetics, Genome-Wide Association Study, Genetic Predisposition to Disease/genetics, Bayes Theorem, Cluster Analysis, Single Nucleotide Polymorphism
ABSTRACT
Diet is a significant modifiable risk factor for type 2 diabetes (T2D), and its effect on disease risk is under partial genetic control. Identification of specific gene-diet interactions (GDIs) influencing risk biomarkers such as glycated hemoglobin (HbA1c) is a critical step towards precision nutrition for T2D prevention, but progress has been slow due to limitations in sample size and accuracy of dietary exposure measurement. We leveraged the large UK Biobank (UKB) cohort and a diverse group of dietary exposures, including 30 individual dietary traits and 8 empirical dietary patterns, to conduct genome-wide interaction studies in ~340 000 European-ancestry participants to identify novel GDIs influencing HbA1c. We identified five variant-dietary trait pairs reaching genome-wide significance (P < 5 × 10⁻⁸): two involved dietary patterns (meat pattern with rs147678157 and a fruit & vegetable-based pattern with rs3010439) and three involved individual dietary traits (bread consumption with rs62218803, dried fruit consumption with rs140270534 and milk type [dairy vs. other] with 4:131148078_TAGAA_T). These were affected minimally by adjustment for geographical and lifestyle-related confounders, and four of the five variants lacked genetic main effects that would have allowed their detection in a traditional genome-wide association study for HbA1c. Notably, multiple loci near transient receptor potential subfamily M genes (TRPM2 and TRPM3) interacted with carbohydrate-containing food groups. These interactions were further characterized using non-European UKB subsets and alternative measures of glycaemia (fasting glucose and follow-up HbA1c measurements). Our results highlight GDIs influencing HbA1c for future investigation, while reinforcing known challenges in detecting and replicating GDIs.
Subject(s)
Biological Specimen Banks, Type 2 Diabetes Mellitus, Diet, Glycated Hemoglobin, Adult, Type 2 Diabetes Mellitus/blood, Type 2 Diabetes Mellitus/genetics, Female, Genome-Wide Association Study, Glycated Hemoglobin/genetics, Glycated Hemoglobin/metabolism, Humans, Male, Middle Aged, Prospective Studies, United Kingdom
ABSTRACT
SUMMARY: We developed the variant-Set Test for Association using Annotation infoRmation (STAAR) workflow description language (WDL) workflow to facilitate the analysis of rare variants in whole genome sequencing association studies. The open-access STAAR workflow, written in WDL, allows a user to perform rare variant testing with both gene-centric and genetic region approaches, enabling genome-wide, candidate and conditional analyses. It incorporates functional annotations into the workflow, as introduced in the STAAR method, to boost the power of rare variant analysis. This tool was specifically developed and optimized for cloud-based platforms such as BioData Catalyst Powered by Terra. It provides easy-to-use functionality for rare variant analysis that can be incorporated into an exhaustive whole genome sequencing analysis pipeline. AVAILABILITY AND IMPLEMENTATION: The workflow is freely available from https://dockstore.org/workflows/github.com/sheilagaynor/STAAR_workflow. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Cloud Computing, Software, Workflow, Genome, Genome-Wide Association Study
ABSTRACT
Hemoglobin A1c (HbA1c) is widely used to diagnose diabetes and assess glycemic control in individuals with diabetes. However, nonglycemic determinants, including genetic variation, may influence how accurately HbA1c reflects underlying glycemia. Analyzing the NHLBI Trans-Omics for Precision Medicine (TOPMed) sequence data in 10,338 individuals from five studies and four ancestries (6,158 Europeans, 3,123 African-Americans, 650 Hispanics, and 407 East Asians), we confirmed five regions associated with HbA1c (GCK in Europeans and African-Americans, HK1 in Europeans and Hispanics, FN3K and/or FN3KRP in Europeans, and G6PD in African-Americans and Hispanics) and we identified an African-ancestry-specific low-frequency variant (rs1039215 in HBG2 and HBE1, minor allele frequency (MAF) = 0.03). The most strongly associated G6PD variant (rs1050828-T, p.Val98Met, MAF = 12% in African-Americans, MAF = 2% in Hispanics) lowered HbA1c (-0.88% in hemizygous males, -0.34% in heterozygous females) and explained 23% of HbA1c variance in African-Americans and 4% in Hispanics. Additionally, we identified a rare distinct G6PD coding variant (rs76723693, p.Leu353Pro, MAF = 0.5%; -0.98% in hemizygous males, -0.46% in heterozygous females) and detected a significant association with HbA1c when aggregating rare missense variants in G6PD. We observed similar magnitude and direction of effects for rs1039215 (HBG2) and rs76723693 (G6PD) in the two largest TOPMed African American cohorts, and we replicated the rs76723693 association in the UK Biobank African-ancestry participants. These variants in G6PD and HBG2 were monomorphic in the European and Asian samples. African or Hispanic ancestry individuals carrying G6PD variants may be underdiagnosed for diabetes when screened with HbA1c. Thus, assessment of these variants should be considered for incorporation into precision medicine approaches for diabetes diagnosis.
Subject(s)
Diabetes Mellitus/diagnosis, Diabetes Mellitus/genetics, Genetic Variation, Glycated Hemoglobin/genetics, Population Groups/genetics, Precision Medicine, Cohort Studies, Female, Humans, Male, Single Nucleotide Polymorphism
ABSTRACT
PURPOSE: Polygenic risk scores (PRS) for breast cancer may help guide screening decisions. However, few studies have examined whether PRS are associated with risk of short-term or poor prognosis breast cancers. The purpose of this study was to evaluate the association of the 313-SNP breast cancer PRS with 2-year risk of poor prognosis breast cancer. METHODS: We evaluated the association of the breast cancer PRS with breast cancer overall, ER+ and ER- breast cancer, and poor prognosis breast cancer diagnosed within 2 years of a negative mammogram among a cohort of 3657 women, using logistic regression adjusted for age, breast density, race/ethnicity, year of screening, and genetic ancestry principal components. Breast cancers were considered poor prognosis if they were metastatic, lymph node-positive, ER/PR+ HER2- and > 2 cm, ER/PR/HER2-, or HER2+ and > 1 cm. RESULTS: Of the 308 breast cancers, 137 (44%) were poor prognosis. The overall breast cancer PRS was significantly associated with breast cancer diagnosis within 2 years (OR 1.39, 95% CI 1.23-1.57, p < 0.001). The breast cancer PRS was also associated specifically with diagnosis of poor prognosis disease (OR 1.24, 95% CI 1.03-1.49, p = 0.018), but was more strongly associated with good prognosis cancer (OR 1.52, 95% CI 1.29-1.80, p = 3.60 × 10⁻⁷). The ER+ PRS was significantly associated with ER/PR+ breast cancer (OR 1.41, 95% CI 1.24-1.61, p < 0.001), and the ER- PRS was significantly associated with ER- breast cancer (OR 1.48, 95% CI 1.08-2.02, p = 0.015). CONCLUSION: The breast cancer PRS was independently and significantly associated with diagnosis of both breast cancer overall and poor prognosis breast cancer within 2 years of a negative mammogram, suggesting that PRS may help guide decisions about screening intervals and supplemental screening.
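A PRS such as the 313-SNP score is conventionally computed as a weighted sum of risk-allele dosages. Odds ratios are often reported per standard deviation of the score. The sketch below shows that construction.

Any weights passed in are placeholders, not the published 313-SNP weights. The function names are hypothetical.

```python
import statistics


def polygenic_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0-2), one weight (e.g. log odds ratio) per SNP."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must be aligned SNP by SNP")
    return sum(d * w for d, w in zip(dosages, weights))


def standardize(scores):
    """Rescale scores to mean 0 and SD 1 so regression coefficients read as per-SD effects."""
    mu = statistics.mean(scores)
    sd = statistics.stdev(scores)
    return [(s - mu) / sd for s in scores]
```

Suppose the standardized score enters a logistic regression together with the adjustment covariates. Then exp(coefficient) is the odds ratio per SD of the score. The abstract does not state whether the ORs above are expressed per SD.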
Subject(s)
Breast Neoplasms, Female, Humans, Breast Neoplasms/diagnosis, Breast Neoplasms/epidemiology, Breast Neoplasms/genetics, Single Nucleotide Polymorphism, Breast Density, Prognosis, Risk Factors, Progesterone Receptors/genetics
ABSTRACT
MOTIVATION: Gene-environment interaction (GEI) studies are a general framework that can be used to identify genetic variants that modify the effects of environmental, physiological, lifestyle or treatment exposures on complex traits. Moreover, accounting for GEIs can enhance our understanding of the genetic architecture of complex diseases and traits. However, commonly used statistical software programs for GEI studies are either not applicable to testing certain types of GEI hypotheses or have not been optimized for use in large samples. RESULTS: Here, we develop a new software program, GEM (Gene-Environment interaction analysis in Millions of samples), which supports the inclusion of multiple GEI terms, adjustment for GEI covariates and robust inference, while allowing multi-threading to reduce computation time. GEM can conduct GEI tests as well as joint tests of genetic main and interaction effects for both continuous and binary phenotypes. Through simulations, we demonstrate that GEM scales to millions of samples while addressing limitations of existing software programs. We additionally conduct a gene-sex interaction analysis on waist-hip ratio in 352 768 unrelated individuals from the UK Biobank, identifying 24 novel loci in the joint test that have not previously been reported in combined or sex-specific analyses. Our results demonstrate that GEM can facilitate the next generation of large-scale GEI studies and help advance our understanding of the genetic architecture of complex diseases and traits. AVAILABILITY AND IMPLEMENTATION: GEM is freely available as an open source project at https://github.com/large-scale-gxe-methods/GEM. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
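GEM supports two tests: the 1 df GEI test and the joint test of genetic main and interaction effects. For one SNP and one exposure, both can be sketched as ordinary least squares with a heteroskedasticity-robust sandwich covariance. HC0 is used here for simplicity.

This pure-Python illustration is not GEM's implementation. GEM additionally handles multiple interaction terms, interaction covariates, binary phenotypes and multi-threading. The function names and simulated effect sizes are hypothetical.

```python
import math
import random


def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    a = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:  # crude absolute singularity check
            raise ValueError("matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        scale = a[col][col]
        a[col] = [x / scale for x in a[col]]
        for r in range(n):
            if r != col:
                factor = a[r][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gxe_tests(y, g, e):
    """OLS of y on [1, G, E, GxE] with an HC0 robust (sandwich) covariance.

    Returns (interaction estimate, 1-df interaction p-value, 2-df joint p-value for G and GxE).
    """
    x = [[1.0, gi, ei, gi * ei] for gi, ei in zip(g, e)]
    xt = [list(col) for col in zip(*x)]
    bread = invert(matmul(xt, x))  # (X'X)^-1
    beta = [row[0] for row in matmul(bread, matmul(xt, [[yi] for yi in y]))]
    resid = [yi - sum(b * xi for b, xi in zip(beta, row)) for row, yi in zip(x, y)]
    meat = matmul(xt, [[r * r * v for v in row] for row, r in zip(x, resid)])  # sum r_i^2 x_i x_i'
    cov = matmul(matmul(bread, meat), bread)
    # 1-df Wald test of the interaction coefficient (index 3).
    p_int = math.erfc(math.sqrt(beta[3] ** 2 / cov[3][3] / 2))
    # 2-df joint Wald test of the genetic main effect (index 1) and the interaction (index 3).
    idx = (1, 3)
    b = [beta[i] for i in idx]
    v_inv = invert([[cov[i][j] for j in idx] for i in idx])
    wald = sum(b[r] * v_inv[r][c] * b[c] for r in range(2) for c in range(2))
    p_joint = math.exp(-wald / 2)  # upper tail of chi-square(2 df)
    return beta[3], p_int, p_joint


if __name__ == "__main__":
    # Simulated example: genotype 0/1/2, binary exposure, true interaction effect 0.5.
    rng = random.Random(1)
    g = [rng.choice([0, 1, 2]) for _ in range(2000)]
    e = [rng.choice([0, 1]) for _ in range(2000)]
    y = [0.1 * gi + 0.3 * ei + 0.5 * gi * ei + rng.gauss(0, 1) for gi, ei in zip(g, e)]
    print(gxe_tests(y, g, e))
```

The joint test can detect a variant whose effect appears only in one exposure stratum, even when its marginal main effect is weak. This is why the joint test can surface loci missed by main-effect GWAS.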
Subject(s)
Gene-Environment Interaction, Genome-Wide Association Study, Statistical Data Interpretation, Female, Humans, Male, Phenotype, Software
ABSTRACT
Long and short sleep duration are associated with elevated blood pressure (BP), possibly through effects on molecular pathways that influence neuroendocrine and vascular systems. To gain new insights into the genetic basis of sleep-related BP variation, we performed genome-wide gene by short or long sleep duration interaction analyses on four BP traits (systolic BP, diastolic BP, mean arterial pressure, and pulse pressure) across five ancestry groups in two stages, using a 2 degree of freedom (df) joint test followed by a 1df test of interaction effects. Primary multi-ancestry analysis in 62,969 individuals in stage 1 identified three novel gene by sleep interactions that were replicated in an additional 59,296 individuals in stage 2 (stage 1 + 2 P_joint < 5 × 10⁻⁸). These include rs7955964 (FIGNL2/ANKRD33), which increases BP among long sleepers, and rs73493041 (SNORA26/C9orf170) and rs10406644 (KCTD15/LSM14A), which increase BP among short sleepers (P_int < 5 × 10⁻⁸). Secondary ancestry-specific analysis identified another novel gene by long sleep interaction at rs111887471 (TRPC3/KIAA1109) in individuals of African ancestry (P_int = 2 × 10⁻⁶). Combined stage 1 and 2 analyses additionally identified significant gene by long sleep interactions at 10 loci, including MKLN1 and RGL3/ELAVL3, previously associated with BP. They also identified significant gene by short sleep interactions at 10 loci, including C2orf43, previously associated with BP (P_int < 10⁻³). After modeling sleep, the 2df test also identified novel BP loci with known functions in sleep-wake regulation and in the nervous and cardiometabolic systems. This study indicates that sleep and primary mechanisms regulating BP may interact to elevate BP level, offering novel insights into sleep-related BP regulation.
Subject(s)
Genome-Wide Association Study, Hypertension, Blood Pressure/genetics, Genetic Loci/genetics, Humans, Hypertension/genetics, Single Nucleotide Polymorphism/genetics, Sleep/genetics
ABSTRACT
Educational attainment is widely used as a surrogate for socioeconomic status (SES). Low SES is a risk factor for hypertension and high blood pressure (BP). To identify novel BP loci, we performed multi-ancestry meta-analyses accounting for gene-educational attainment interactions using two variables, "Some College" (yes/no) and "Graduated College" (yes/no). Interactions were evaluated using both a 1 degree of freedom (DF) interaction term and a 2DF joint test of genetic and interaction effects. Analyses were performed for systolic BP, diastolic BP, mean arterial pressure, and pulse pressure. We pursued genome-wide interrogation in Stage 1 studies (N = 117 438) and follow-up on promising variants in Stage 2 studies (N = 293 787) in five ancestry groups. Through combined meta-analyses of Stages 1 and 2, we identified 84 known and 18 novel BP loci at genome-wide significance level (P < 5 × 10⁻⁸). Two novel loci were identified based on the 1DF test of interaction with educational attainment, while the remaining 16 loci were identified through the 2DF joint test of genetic and interaction effects. Ten novel loci were identified in individuals of African ancestry. Several novel loci show strong biological plausibility since they involve physiologic systems implicated in BP regulation. They include genes involved in the central nervous system-adrenal signaling axis (ZDHHC17, CADPS, PIK3C2G), vascular structure and function (GNB3, CDON), and renal function (HAS2 and HAS2-AS1, SLIT3). Collectively, these findings suggest a role of educational attainment or SES in further dissection of the genetic architecture of BP.
Subject(s)
Genome-Wide Association Study, Hypertension, Blood Pressure/genetics, Genetic Epistasis, Genetic Loci, Humans, Hypertension/genetics, Single Nucleotide Polymorphism
ABSTRACT
Complex human diseases are affected by genetic and environmental risk factors and their interactions. Gene-environment interaction (GEI) tests for aggregate genetic variant sets have been developed in recent years. However, existing statistical methods become rate limiting for large biobank-scale sequencing studies with correlated samples. We propose efficient Mixed-model Association tests for GEne-Environment interactions (MAGEE), for testing GEI between an aggregate variant set and environmental exposures on quantitative and binary traits in large-scale sequencing studies with related individuals. Joint tests for the aggregate genetic main effects and GEI effects are also developed. A null generalized linear mixed model adjusting for covariates but without any genetic effects is fit only once in a whole genome GEI analysis, thereby vastly reducing the overall computational burden. Score tests for variant sets are performed as a combination of genetic burden and variance component tests by accounting for the genetic main effects using matrix projections. The computational complexity is dramatically reduced in a whole genome GEI analysis, which makes MAGEE scalable to hundreds of thousands of individuals. We applied MAGEE to the exome sequencing data of 41,144 related individuals from the UK Biobank, and the analysis of 18,970 protein coding genes finished within 10.4 CPU hours.
Subject(s)
Biological Specimen Banks, Exome Sequencing, Gene-Environment Interaction, Body Mass Index, Computer Simulation, Exome/genetics, Female, Humans, Linear Models, Male, Genetic Models, Obesity/genetics, Phenotype, Heritable Quantitative Trait, Time Factors
ABSTRACT
Elevated blood pressure (BP), a leading cause of global morbidity and mortality, is influenced by both genetic and lifestyle factors. Cigarette smoking is one such lifestyle factor. Across five ancestries, we performed a genome-wide gene-smoking interaction study of mean arterial pressure (MAP) and pulse pressure (PP) in 129 913 individuals in stage 1 and follow-up analysis in 480 178 additional individuals in stage 2. We report here 136 loci significantly associated with MAP and/or PP. Of these, 61 were previously published through main-effect analysis of BP traits, 37 were recently reported by us for systolic BP and/or diastolic BP through gene-smoking interaction analysis and 38 were newly identified (P < 5 × 10⁻⁸, false discovery rate < 0.05). We also identified nine new signals near known loci. Of the 136 loci, 8 showed significant interaction with smoking status. They include CSMD1, previously reported for insulin resistance and BP in spontaneously hypertensive rats. Many of the 38 new loci show biologic plausibility for a role in BP regulation. SLC26A7 encodes a chloride/bicarbonate exchanger expressed in the renal outer medullary collecting duct. AVPR1A is widely expressed, including in vascular smooth muscle cells, kidney, myocardium and brain. FHAD1 is a long non-coding RNA overexpressed in heart failure. TMEM51 was associated with contractile function in cardiomyocytes. CASP9 plays a central role in cardiomyocyte apoptosis. Thirty novel loci were identified only in individuals of African ancestry. Our findings highlight the value of multi-ancestry investigations, particularly in studies of interaction with lifestyle factors, where genomic and lifestyle differences may contribute to novel findings.
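The abstract pairs P < 5 × 10⁻⁸ with a false discovery rate below 0.05. The standard Benjamini-Hochberg step-up adjustment is sketched below. The abstract does not say which FDR procedure was used, so this choice is an assumption, and the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Step down from the largest p-value, keeping the adjusted values monotone in rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Loci whose adjusted value falls below 0.05 would be declared significant at an FDR of 5%.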
Subject(s)
Arterial Pressure/genetics, Gene-Environment Interaction, Hypertension/genetics, Genetic Polymorphism, Racial Groups/genetics, Smoking/adverse effects, Adolescent, Adult, Aged, Aged 80 and over, Antiporters/genetics, Blood Pressure/genetics, Caspase 9/genetics, Ethnicity/genetics, Female, Genome-Wide Association Study, Humans, Hypertension/etiology, Male, Membrane Proteins/genetics, Middle Aged, Vasopressin Receptors/genetics, Sulfate Transporters/genetics, Tumor Suppressor Proteins/genetics, Young Adult
ABSTRACT
Genome-wide association analysis has advanced our understanding of blood pressure (BP), a major risk factor for vascular conditions such as coronary heart disease and stroke. Accounting for smoking behavior may help identify BP loci and extend our knowledge of its genetic architecture. We performed genome-wide association meta-analyses of systolic and diastolic BP incorporating gene-smoking interactions in 610,091 individuals. Stage 1 analysis examined ~18.8 million SNPs and small insertion/deletion variants in 129,913 individuals from four ancestries (European, African, Asian, and Hispanic), with follow-up analysis of promising variants in 480,178 additional individuals from five ancestries. We identified 15 loci that were genome-wide significant (p < 5 × 10⁻⁸) in stage 1 and formally replicated in stage 2. A combined stage 1 and 2 meta-analysis identified 66 additional genome-wide significant loci (13, 35, and 18 loci in European, African, and trans-ancestry analyses, respectively). Our results also identified 56 known BP loci (p < 5 × 10⁻⁸). Of the newly identified loci, ten showed significant interaction with smoking status, but none of them were replicated in stage 2. Several loci were identified in African ancestry, highlighting the importance of genetic studies in diverse populations. The identified loci show strong evidence for regulatory features and support shared pathophysiology with cardiometabolic and addiction traits. They also highlight a role in BP regulation for biological candidates such as modulators of vascular structure and function (CDKN1B, BCAR1-CFDP1, PXDN, EEA1), ciliopathies (SDCCAG8, RPGRIP1L), telomere maintenance (TNKS, PINX1, AKTIP), and central dopaminergic signaling (MSRA, EBF2).
Subject(s)
Blood Pressure/genetics, Genetic Loci, Genome-Wide Association Study, Racial Groups/genetics, Smoking/genetics, Cohort Studies, Diastole/genetics, Genetic Epistasis, Female, Humans, Male, Single Nucleotide Polymorphism/genetics, Quantitative Trait Loci/genetics, Reproducibility of Results, Systole/genetics
ABSTRACT
BACKGROUND: Impaired fasting glucose (IFG) is a prevalent and potentially reversible intermediate stage in the progression to type 2 diabetes that increases the risk of cardiometabolic complications. Identifying clinical and molecular factors associated with reversal, or regression, from IFG to normoglycemia would enable more efficient cardiovascular risk-reduction strategies. The aim of this study was to identify clinical and biological predictors of regression to normoglycemia in a non-European population with high rates of type 2 diabetes. METHODS: We conducted a prospective, population-based study of 9,637 Mexican individuals using clinical features and plasma metabolites. Of these, 491 participants were classified as having IFG, defined as a fasting glucose of 100 to 125 mg/dL at baseline. Regression to normoglycemia was defined as a fasting glucose below 100 mg/dL at the follow-up visit. Plasma metabolites were profiled by nuclear magnetic resonance (NMR) spectroscopy. Multivariable Cox regression models were used to examine the associations of clinical and metabolomic factors with regression to normoglycemia. We assessed the predictive capability of models including clinical factors alone and of models including clinical factors plus prioritized metabolites. RESULTS: During a median follow-up of 2.5 years, 22.6% of participants (n = 111) regressed to normoglycemia and 29.5% (n = 145) progressed to type 2 diabetes. The multivariable-adjusted relative risk of regression to normoglycemia was 1.10 (95% confidence interval [CI] 1.25 to 1.32) per 10-year increase in age, 0.94 (95% CI 0.91 to 0.98) per 1-SD increase in BMI, and 0.91 (95% CI 0.88 to 0.95) per 1-SD increase in fasting glucose. A model including age, fasting glucose, and BMI predicted regression to normoglycemia well (AUC = 0.73, 95% CI 0.66 to 0.78). Adding the prioritized metabolites (triglycerides in large HDL, albumin, and citrate) did not significantly improve prediction (AUC = 0.74, 95% CI 0.68 to 0.80; p = 0.485). CONCLUSION: In individuals with IFG, three clinical variables that are easily obtained in the clinical setting predicted regression to normoglycemia well, and metabolomic features added no significant predictive value. Our findings can help inform and design future cardiovascular prevention strategies.
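The relative risks above are reported per 10-year increase in age and per 1-SD increase in BMI and fasting glucose. As a hedged sketch, assuming the reported relative risks are exponentiated Cox coefficients (hazard ratios) whose log-hazard coefficient and standard error were estimated per original unit (for example, per year of age), the estimate and its Wald 95% CI can be rescaled to any increment by multiplying both the coefficient and its standard error by that increment before exponentiating. The function name `rescale_cox_effect` and the input values are hypothetical and are not taken from the study.

```python
import math


def rescale_cox_effect(beta: float, se: float, increment: float,
                       z: float = 1.959963984540054) -> tuple[float, float, float]:
    """Hypothetical helper: hazard ratio and Wald 95% CI for a given increment.

    beta and se are a Cox log-hazard coefficient and its standard error per one
    original unit; increment is, e.g., 10 (years) or the covariate's SD.
    """
    # Scaling the covariate by `increment` scales the log-hazard and its SE alike.
    log_hr = beta * increment
    half_width = z * se * increment
    # Exponentiate the point estimate and both Wald limits.
    return math.exp(log_hr), math.exp(log_hr - half_width), math.exp(log_hr + half_width)


# Illustrative inputs only: a per-year coefficient chosen so the per-decade HR is 1.10.
hr, lower, upper = rescale_cox_effect(beta=math.log(1.10) / 10, se=0.002, increment=10)
print(f"HR per 10 years = {hr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Because the interval is symmetric on the log scale, the point estimate is the geometric mean of the two CI limits and always lies between them.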
Subject(s)
Blood Glucose/metabolism, Type 2 Diabetes Mellitus/blood, Glucose Intolerance/blood, Metabolic Syndrome/blood, Adult, Age Factors, Biomarkers/blood, Body Mass Index, Cardiometabolic Risk Factors, Type 2 Diabetes Mellitus/diagnosis, Type 2 Diabetes Mellitus/epidemiology, Disease Progression, Female, Glucose Intolerance/diagnosis, Glucose Intolerance/epidemiology, Humans, Magnetic Resonance Spectroscopy, Male, Metabolic Syndrome/diagnosis, Metabolic Syndrome/epidemiology, Metabolome, Metabolomics, Mexico/epidemiology, Middle Aged, Prospective Studies, Risk Assessment, Time Factors
ABSTRACT
A major challenge in evaluating the contribution of rare variants to complex disease is identifying enough copies of the rare alleles to permit informative statistical analysis. To investigate the contribution of rare variants to the risk of type 2 diabetes (T2D) and related traits, we performed deep whole-genome analysis of 1,034 members of 20 large Mexican-American families with high prevalence of T2D. If rare variants of large effect accounted for much of the diabetes risk in these families, our experiment was powered to detect association. Using gene expression data on 21,677 transcripts for 643 pedigree members, we identified evidence for large-effect rare-variant cis-expression quantitative trait loci that could not be detected in population studies, validating our approach. However, we did not identify any rare variants of large effect associated with T2D, or the related traits of fasting glucose and insulin, suggesting that large-effect rare variants account for only a modest fraction of the genetic risk of these traits in this sample of families. Reliable identification of large-effect rare variants will require larger samples of extended pedigrees or different study designs that further enrich for such variants.
Subject(s)
Type 2 Diabetes Mellitus/genetics, Genetic Predisposition to Disease/genetics, Genetic Variation, Mexican Americans/genetics, Type 2 Diabetes Mellitus/ethnology, Type 2 Diabetes Mellitus/pathology, Family Health, Female, Gene Frequency, Genetic Predisposition to Disease/ethnology, Genome-Wide Association Study/methods, Genotype, Humans, Male, Pedigree, Phenotype, Quantitative Trait Loci/genetics, Whole Genome Sequencing/methods
ABSTRACT
BACKGROUND: Models that include an interaction term and perform a joint test of the SNP and interaction effects are often used to discover gene-environment (GxE) interactions. When the environmental exposure is binary, exposure-stratified models, which estimate the genetic effect separately in unexposed and exposed individuals, can also be of interest. In large-scale consortia focused on GxE interactions in which only the joint test has been performed, it can be difficult to obtain summary statistics from both exposure-stratified and marginal (i.e., not accounting for interaction) models. RESULTS: In this work, we developed a simple framework to estimate summary statistics in each stratum of a binary exposure, and in the marginal model, from the summary statistics of the joint model. We performed simulation studies to assess the accuracy of our estimators and to examine potential sources of bias, such as correlation between genotype and exposure and differing phenotypic variances across exposure strata. These simulations highlight the high theoretical accuracy of our estimators and yield insights into the impact of potential sources of bias. We then applied our methods to real data and demonstrated that our estimators retain their accuracy after filtering SNPs by sample size to mitigate potential bias. CONCLUSIONS: These analyses demonstrate the accuracy of our method in estimating both stratified and marginal summary statistics from a joint model of gene-environment interaction. In addition to facilitating the interpretation of GxE screens, this work could be used to guide further functional analyses. We provide a user-friendly Python script to apply this strategy to real datasets. The Python script and documentation are available at https://gitlab.pasteur.fr/statistical-genetics/j2s.
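The link between the joint model and the exposure-stratified models can be illustrated with a standard identity. The sketch below is a minimal illustration under stated assumptions, not the j2s implementation: it assumes a linear model Y = b0 + bG·G + bE·E + bGE·G·E with a binary exposure E coded 0/1, no additional covariates, ordinary least squares, and equal residual variance in both strata. Under these assumptions the unexposed-stratum genetic effect is bG and the exposed-stratum effect is bG + bGE; because Cov(bG, bGE) = -Var(bG) in this model, the exposed-stratum variance reduces to Var(bGE) - Var(bG). The helper `stratified_from_joint` and the input values are hypothetical.

```python
import math
from typing import NamedTuple


class Estimate(NamedTuple):
    beta: float
    se: float


def stratified_from_joint(b_g: float, se_g: float,
                          b_ge: float, se_ge: float) -> tuple[Estimate, Estimate]:
    """Hypothetical helper: exposure-stratified SNP effects from a joint GxE fit.

    Assumes Y ~ G + E + G:E (no other covariates), binary E coded 0/1,
    ordinary least squares, and equal residual variance in both strata.
    Returns (unexposed, exposed) estimates.
    """
    # Unexposed stratum (E = 0): the genetic effect is the main effect bG.
    unexposed = Estimate(b_g, se_g)

    # Exposed stratum (E = 1): the genetic effect is bG + bGE.
    beta_exp = b_g + b_ge
    # In this saturated model Cov(bG, bGE) = -Var(bG), hence
    # Var(bG + bGE) = Var(bG) + Var(bGE) - 2 Var(bG) = Var(bGE) - Var(bG).
    var_exp = se_ge ** 2 - se_g ** 2
    if var_exp <= 0:
        raise ValueError("SE(bGE) must exceed SE(bG) under the stated assumptions")
    exposed = Estimate(beta_exp, math.sqrt(var_exp))
    return unexposed, exposed


# Usage with illustrative (made-up) joint-model summary statistics.
unexposed_est, exposed_est = stratified_from_joint(b_g=0.10, se_g=0.03, b_ge=0.05, se_ge=0.05)
print(unexposed_est, exposed_est)
```

The equal-variance assumption is what makes the standard errors recoverable here; when phenotypic variance differs between exposure strata, the stratum-specific residual variances no longer match the pooled one, which is one of the bias sources the simulations above examine.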