ABSTRACT
Age at menarche (AAM) and age at natural menopause (ANM) are highly heritable traits and have been linked to various health outcomes. We aimed to identify circulating proteins associated with altered ANM and AAM using an unbiased two-sample Mendelian randomization (MR) and colocalization approach. By testing causal effects of 1,271 proteins on AAM, we identified 22 proteins causally associated with AAM in MR, of which 13 proteins (GCKR, FOXO3, SEMA3G, PATE4, AZGP1, NEGR1, LHB, DLK1, ANXA2, YWHAB, DNAJB12, RMDN1 and HPGDS) colocalized. Among 1,349 proteins tested for causal association with ANM using MR, we identified 19 causal proteins, of which 7 proteins (CPNE1, TYMP, DNER, ADAMTS13, LCT, ARL and PLXNA1) colocalized. Follow-up pathway and gene enrichment analyses demonstrated links between AAM-related proteins and obesity and diabetes, and between AAM- and ANM-related proteins and various types of cancer. In conclusion, we identified proteomic signatures of reproductive ageing in women, highlighting biological processes at both ends of the reproductive lifespan.
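The abstract does not say which MR estimator was used. As a minimal sketch, assuming per-variant summary statistics from two independent samples (SNP-protein effects `bx` from a pQTL study, and SNP-trait effects `by` with standard errors `se_by` from an AAM or ANM GWAS), the Wald ratio for a single instrument and the fixed-effect inverse-variance weighted (IVW) estimate for several instruments can be computed as below. Function and argument names are illustrative only, and the colocalization step is not shown.

```python
import math
from typing import Sequence, Tuple


def wald_ratio(bx: float, by: float, se_by: float) -> Tuple[float, float]:
    """Single-instrument MR estimate (Wald ratio) with a first-order standard error."""
    if bx == 0:
        raise ValueError("instrument has no effect on the exposure")
    return by / bx, se_by / abs(bx)


def ivw(bx: Sequence[float], by: Sequence[float], se_by: Sequence[float]) -> Tuple[float, float]:
    """Fixed-effect inverse-variance weighted (IVW) estimate over several instruments.

    Equivalent to a weighted regression of `by` on `bx` through the origin,
    with weights 1 / se_by**2 (hypothetical inputs: one entry per genetic variant).
    """
    if not (len(bx) == len(by) == len(se_by)) or len(bx) == 0:
        raise ValueError("need equal-length, non-empty inputs")
    num = sum(x * y / s**2 for x, y, s in zip(bx, by, se_by))
    den = sum(x**2 / s**2 for x, s in zip(bx, se_by))
    return num / den, math.sqrt(1.0 / den)


# Toy numbers (made up): both variants imply the same protein-to-trait ratio.
estimate, std_err = ivw([0.2, 0.4], [0.1, 0.2], [0.05, 0.05])
```

Here the estimate is about 0.5 because each variant gives `by / bx = 0.5`. With a single instrument, as when a protein has one cis-pQTL, the IVW estimate reduces exactly to the Wald ratio.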
Subject(s)
Menarche , Mendelian Randomization Analysis , Humans , Female , Menarche/genetics , Proteomics , Biomarkers , Menopause/genetics , HSP40 Heat-Shock Proteins
ABSTRACT
BACKGROUND: High-dimensional mediation analysis is an extension of unidimensional mediation analysis that includes multiple mediators, and it is increasingly used to evaluate the indirect omics-layer effects of environmental exposures on health outcomes. Analyses involving high-dimensional mediators raise several statistical issues. Although many methods have recently been developed, no consensus has been reached about the optimal combination of approaches to high-dimensional mediation analyses. OBJECTIVES: We developed and validated a method for high-dimensional mediation analysis (HDMAX2) and applied it to evaluate the causal role of placental DNA methylation in the pathway between exposure to maternal smoking (MS) during pregnancy and the baby's gestational age (GA) and birth weight. METHODS: HDMAX2 combines latent factor regression models for epigenome-wide association studies with max2 tests for mediation and considers CpGs and aggregated mediator regions (AMRs). HDMAX2 was carefully evaluated using simulated data and compared to state-of-the-art multidimensional epigenetic mediation methods. Then, HDMAX2 was applied to data from 470 women of the Etude des Déterminants pré et postnatals du développement de la santé de l'Enfant (EDEN) cohort. RESULTS: HDMAX2 demonstrated increased power in comparison with state-of-the-art multidimensional mediation methods and identified several AMRs not detected in previous mediation analyses of exposure to MS on birth weight and GA. The results provided evidence for a polygenic architecture of the mediation pathway, with a posterior estimate of the overall indirect effect of CpGs and AMRs equal to 44.5 g lower birth weight, representing 32.1% of the total effect [standard deviation (SD)=60.7 g]. HDMAX2 also identified AMRs having simultaneous effects both on GA and on birth weight.
Among the top hits of both GA and birth weight analyses, regions located in COASY, BLCAP, and ESRP2 also mediated the relationship between GA and birth weight, suggesting reverse causality in the relationship between GA and the methylome. DISCUSSION: HDMAX2 outperformed existing approaches and revealed an unsuspected complexity of the potential causal relationships between exposure to MS and birth weight at the epigenome-wide level. HDMAX2 is applicable to a wide range of tissues and omic layers. https://doi.org/10.1289/EHP11559.
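The abstract names the max2 test without defining it. A common form, assumed here, scores each candidate mediator (a CpG or an AMR) by the larger of two p-values, one for the exposure-to-mediator association and one for the mediator-to-outcome association adjusted for exposure, and squares it. The sketch below uses hypothetical function names (HDMAX2's own implementation may differ in detail) and checks by simulation that the statistic behaves as a valid p-value when both effects are null.

```python
import random
from typing import List, Sequence


def max2_pvalues(p_exposure: Sequence[float], p_outcome: Sequence[float]) -> List[float]:
    """Squared-maximum (max2) combination of two p-values per candidate mediator.

    p_exposure[j]: p-value for the exposure -> mediator j association.
    p_outcome[j]:  p-value for the mediator j -> outcome association.
    Mediation needs BOTH links, so the larger p-value drives the test; if both
    effects are null and the p-values are independent, max(pa, pb) has CDF t**2,
    so squaring it yields a uniform p-value.
    """
    if len(p_exposure) != len(p_outcome):
        raise ValueError("inputs must have the same length")
    return [max(a, b) ** 2 for a, b in zip(p_exposure, p_outcome)]


def null_rejection_rate(alpha: float = 0.05, n: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo check: share of max2 p-values <= alpha when both effects are null."""
    rng = random.Random(seed)
    pa = [rng.random() for _ in range(n)]
    pb = [rng.random() for _ in range(n)]
    return sum(v <= alpha for v in max2_pvalues(pa, pb)) / n
```

`null_rejection_rate()` should return a value close to 0.05. As a design caveat, squaring is exact only under this global null: when exactly one of the two effects is non-null, the squared maximum is anti-conservative, whereas the unsquared maximum (the joint-significance test) is conservative.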
Subject(s)
DNA Methylation , Placenta , Infant, Newborn , Humans , Female , Pregnancy , Birth Weight , Placenta/metabolism , Maternal Exposure , Smoking , Parturition
ABSTRACT
Association of phenotypes or exposures with genomic and epigenomic data faces important statistical challenges. One of these challenges is to account for variation due to unobserved confounding factors, such as individual ancestry or cell-type composition in tissues. This issue can be addressed with penalized latent factor regression models, where penalties are introduced to cope with the high dimension of the data. If a relatively small proportion of genomic or epigenomic markers correlate with the variable of interest, sparsity penalties may help to capture the relevant associations, but their improvement over non-sparse approaches has not yet been fully evaluated. Here, we present least-squares algorithms that jointly estimate effect sizes and confounding factors in sparse latent factor regression models. In simulated data, sparse latent factor regression models generally achieved higher statistical performance than other sparse methods, including the least absolute shrinkage and selection operator and a Bayesian sparse linear mixed model. In generative model simulations, their statistical performance was slightly lower than, though comparable to, that of non-sparse methods, but in simulations based on empirical data, sparse latent factor regression models were more robust to departures from the model than the non-sparse approaches. We applied sparse latent factor regression models to a genome-wide association study of a flowering trait in the plant Arabidopsis thaliana and to an epigenome-wide association study of smoking status in pregnant women. In both applications, sparse latent factor regression models facilitated the estimation of non-null effect sizes while overcoming multiple testing issues. The results were not only consistent with previous discoveries, but also pinpointed new genes with functional annotations relevant to each application.
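The abstract does not give the update equations. A minimal sketch, assuming a single centered variable of interest `x`, a centered marker matrix `Y`, a hard rank-`k` constraint on the latent confounder matrix `W`, and an L1 (lasso) penalty on the effect sizes `b`, alternates two exact least-squares steps: a truncated SVD for `W` and per-marker soft thresholding for `b`. This is an illustrative alternating scheme, not the authors' published algorithm, and all names are hypothetical.

```python
import numpy as np


def soft_threshold(z: np.ndarray, lam: float) -> np.ndarray:
    """Elementwise soft thresholding, the proximal operator of lam * |.|."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def rank_k(M: np.ndarray, k: int) -> np.ndarray:
    """Best rank-k approximation of M in Frobenius norm (truncated SVD)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]


def objective(Y: np.ndarray, x: np.ndarray, b: np.ndarray, W: np.ndarray, lam: float) -> float:
    """0.5 * ||Y - x b^T - W||_F^2 + lam * ||b||_1."""
    R = Y - np.outer(x, b) - W
    return float(0.5 * np.sum(R**2) + lam * np.sum(np.abs(b)))


def sparse_lfm(Y: np.ndarray, x: np.ndarray, k: int, lam: float, n_iter: int = 50):
    """Alternating minimization of the objective above subject to rank(W) <= k.

    Y: n x p markers (centered); x: length-n variable of interest (centered).
    Returns effect sizes b (length p), latent matrix W (n x p), objective history.
    """
    b = np.zeros(Y.shape[1])
    xx = float(x @ x)
    history = []
    for _ in range(n_iter):
        # Confounder step: exact minimizer over rank-k W given the current effects.
        W = rank_k(Y - np.outer(x, b), k)
        # Effect step: with one covariate, the lasso solution per marker is closed form.
        b = soft_threshold(x @ (Y - W), lam) / xx
        history.append(objective(Y, x, b, W, lam))
    return b, W, history
```

Because each step exactly minimizes the objective over its own block, the recorded objective values never increase. Markers whose association with `x`, after removing the latent part, stays below `lam` receive an effect size of exactly zero, which is how a sparse model selects markers without a separate test for each one.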
Subject(s)
Epigenome , Genome-Wide Association Study , Algorithms , Bayes Theorem , Female , Genome-Wide Association Study/methods , Humans , Least-Squares Analysis , Pregnancy
ABSTRACT
Gene-environment association (GEA) studies are essential to understand the past and ongoing adaptations of organisms to their environment, but those studies are complicated by confounding due to unobserved demographic factors. Although the confounding problem has recently received considerable attention, the proposed approaches do not scale to the high dimensionality of genomic data. Here, we present a new estimation method for latent factor mixed models (LFMMs) implemented in an upgraded version of the corresponding computer program. We developed a least-squares approach for confounder estimation that provides a unified framework for several categories of genomic data, not restricted to genotypes. The new algorithm is several orders of magnitude faster than existing GEA approaches and than our previous version of the LFMM program. In addition, the new method outperforms other fast approaches based on principal component or surrogate variable analysis. We illustrate use of the program with analyses of the 1000 Genomes Project data set, leading to new findings on the adaptation of humans to their environment, and with analyses of DNA methylation profiles providing insights into how tobacco consumption could affect DNA methylation in patients with rheumatoid arthritis. Software availability: Software is available in the R package lfmm at https://bcm-uga.github.io/lfmm/.
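The least-squares approach can be illustrated with a ridge penalty on the effect sizes. Assuming the objective ||Y - X B^T - W||_F^2 + lam * ||B||_F^2 with rank(W) <= K (our assumption about the exact form used by lfmm), the minimizer has a closed form: with X = Q S R^T its full SVD and D = diag(sqrt(lam / (lam + s_j^2)), 1, ..., 1), one gets W = Q D^-1 svd_K(D Q^T Y) and B^T = (X^T X + lam I)^-1 X^T (Y - W). The Python function below is a hypothetical sketch of that derivation, not code from the lfmm R package.

```python
import numpy as np


def lfmm_ridge_sketch(Y: np.ndarray, X: np.ndarray, k: int, lam: float):
    """Closed-form minimizer of ||Y - X B^T - W||_F^2 + lam * ||B||_F^2, rank(W) <= k.

    Y: n x p response matrix (e.g. genotypes or methylation levels), centered.
    X: n x d matrix of environmental or exposure variables, centered.
    Returns B (p x d effect sizes) and W (n x p rank-k confounder matrix).
    """
    if lam <= 0:
        raise ValueError("lam must be positive so that D is invertible")
    n, d = X.shape
    Q, s, _ = np.linalg.svd(X, full_matrices=True)         # Q: n x n orthogonal
    D = np.ones(n)
    D[: s.size] = np.sqrt(lam / (lam + s**2))               # shrink directions spanned by X
    A = D[:, None] * (Q.T @ Y)                              # D Q^T Y
    U, sv, Vt = np.linalg.svd(A, full_matrices=False)
    Z = (U[:, :k] * sv[:k]) @ Vt[:k]                        # best rank-k approximation
    W = Q @ (Z / D[:, None])                                # W = Q D^-1 Z
    Bt = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ (Y - W))  # ridge step given W
    return Bt.T, W
```

The estimate needs only two SVDs and one small d x d linear solve, which suggests why a least-squares formulation can scale to genome-wide data.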