ABSTRACT
Identifying the causal relationship between genotype and phenotype is essential to expanding our understanding of the gene regulatory network spanning the molecular level to perceptible traits. A pleiotropic gene can act as a central hub in the network, influencing multiple outcomes. Identifying such a gene involves testing under a composite null hypothesis in which the gene is associated with at most one trait. Traditional methods such as meta-analyses of top-hit P-values and sequential testing of multiple traits have been proposed, but these methods fail to consider the background of genome-wide signals. Since Huang's composite test produces uniformly distributed P-values for genome-wide variants under the composite null, we propose a gene-level pleiotropy test that combines this method with the aggregated Cauchy association test. A polygenic trait involves multiple genes with different functions that jointly regulate its underlying mechanisms. We show that polygenicity should be considered when identifying pleiotropic genes; otherwise, associations driven by polygenic traits will give rise to false positives. In this study, we constructed gene-trait functional modules using the results of the proposed pleiotropy tests. Our analysis suite was implemented as the R package PGCtest. We demonstrated the proposed method with an application study of the Taiwan Biobank database and identified functional modules comprising specific genes and their co-regulated traits.
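As a hedged illustration of the Cauchy-combination step used to aggregate SNP-level P-values into a gene-level P-value, here is a minimal base-R sketch; it is not the PGCtest implementation, and the equal weights and example P-values are illustrative assumptions.

# Minimal sketch of the aggregated Cauchy association test (ACAT) idea:
# combine per-SNP P-values for one gene into a single gene-level P-value.
cauchy_combine <- function(p, w = rep(1 / length(p), length(p))) {
  stopifnot(all(p > 0 & p < 1), length(w) == length(p))
  t_stat <- sum(w * tan((0.5 - p) * pi))  # Cauchy-transformed, weighted sum
  1 - pcauchy(t_stat)                     # gene-level combined P-value
}

# Example with hypothetical P-values for the SNPs mapped to one gene:
snp_p <- c(0.002, 0.15, 0.40, 0.03, 0.77)
cauchy_combine(snp_p)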
Subject(s)
Genetic Pleiotropy, Genome-Wide Association Study, Multifactorial Inheritance, Humans, Genome-Wide Association Study/methods, Gene Regulatory Networks, Phenotype, Single Nucleotide Polymorphism, Genetic Models, Quantitative Trait Loci, Computational Biology/methods
ABSTRACT
Mucus obstruction is a central feature in the cystic fibrosis (CF) airways. A genome-wide association study (GWAS) of lung disease by the CF Gene Modifier Consortium (CFGMC) identified a significant locus containing two mucin genes, MUC20 and MUC4. Expression quantitative trait locus (eQTL) analysis using human nasal epithelia (HNE) from 94 CF-affected Canadians in the CFGMC demonstrated MUC4 eQTLs that mirrored the lung association pattern in the region, suggesting that MUC4 expression may mediate CF lung disease. Complications arose, however, with colocalization testing using existing methods: the locus is complex, and the associated SNPs span a 0.2 Mb region with high linkage disequilibrium (LD) and evidence of allelic heterogeneity. We previously developed the Simple Sum (SS), a powerful colocalization test in regions with allelic heterogeneity, but SS assumed eQTLs to be present in order to achieve type I error control. Here we propose a two-stage SS (SS2) colocalization test that avoids a priori eQTL assumptions, accounts for multiple hypothesis testing and the composite null hypothesis, and enables meta-analysis. We compare SS2 to published approaches through simulation and demonstrate type I error control in all settings, with the greatest power in the presence of high LD and allelic heterogeneity. Applying SS2 to the MUC20/MUC4 CF lung disease locus with eQTLs from CF HNE revealed significant colocalization with MUC4 (p = 1.31 × 10⁻⁵) rather than with MUC20. SS2 is a powerful method to inform the responsible gene(s) at a locus and guide future functional studies. SS2 has been implemented in the application LocusFocus.
Subject(s)
Amino Acid Transport Systems/genetics, Cystic Fibrosis/genetics, Statistical Models, Mucin-4/genetics, Mucins/genetics, Quantitative Trait Loci, Alleles, Amino Acid Transport Systems/metabolism, Cystic Fibrosis/metabolism, Cystic Fibrosis/pathology, Gene Expression Profiling, Gene Expression Regulation, Genetic Heterogeneity, Human Genome, Genome-Wide Association Study, Humans, Linkage Disequilibrium, Lung/metabolism, Lung/pathology, Mucin-4/metabolism, Mucins/metabolism, Nasal Mucosa/metabolism, Nasal Mucosa/pathology, Single Nucleotide Polymorphism
ABSTRACT
Mediation hypothesis testing for a large number of mediators is challenging due to the composite structure of the null hypothesis, H0: αβ = 0 (α: effect of the exposure on the mediator after adjusting for confounders; β: effect of the mediator on the outcome after adjusting for exposure and confounders). In this paper, we reviewed three classes of methods for large-scale, one-at-a-time mediation hypothesis testing. These methods are commonly used for continuous outcomes and continuous mediators, assuming there is no exposure-mediator interaction so that the product αβ has a causal interpretation as the indirect effect. The first class of methods ignores the impact of the different structures under the composite null hypothesis, namely (1) α = 0, β ≠ 0; (2) α ≠ 0, β = 0; and (3) α = β = 0. The second class of methods weights the reference distribution under each case of the null to form a mixture reference distribution. The third class constructs a composite test statistic using the three p-values obtained under each case of the null so that the reference distribution of the composite statistic is approximately U(0,1). In addition to these existing methods, we developed the Sobel-comp method, belonging to the second class, which uses a corrected mixture reference distribution for Sobel's test statistic. We performed extensive simulation studies to compare all six methods belonging to these three classes in terms of the false positive rates (FPRs) under the null hypothesis and the true positive rates under the alternative hypothesis. We found that the second class of methods, which uses a mixture reference distribution, could best maintain the FPRs at the nominal level under the null hypothesis and had the greatest true positive rates under the alternative hypothesis. We applied all methods to study the mediation mechanism of DNA methylation sites in the pathway from adult socioeconomic status to glycated hemoglobin level using data from the Multi-Ethnic Study of Atherosclerosis (MESA). We provide guidelines for choosing the optimal mediation hypothesis testing method in practice and have developed an R package, medScan, available on CRAN, implementing all six methods.
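To make two of the classical building blocks above concrete, the following base-R sketch computes Sobel's test P-value and the joint-significance (max-P) P-value for a single mediator from the two regression estimates; it is illustrative only and is not the Sobel-comp or medScan implementation.

# Sobel's test and the joint-significance (max-P) test for one mediator.
# Inputs: estimates and standard errors of alpha (exposure -> mediator) and
# beta (mediator -> outcome), e.g. from two separate linear regressions.
sobel_p <- function(alpha_hat, se_alpha, beta_hat, se_beta) {
  z <- (alpha_hat * beta_hat) /
    sqrt(alpha_hat^2 * se_beta^2 + beta_hat^2 * se_alpha^2)
  2 * pnorm(-abs(z))                      # two-sided normal P-value
}

joint_sig_p <- function(alpha_hat, se_alpha, beta_hat, se_beta) {
  p_alpha <- 2 * pnorm(-abs(alpha_hat / se_alpha))
  p_beta  <- 2 * pnorm(-abs(beta_hat / se_beta))
  max(p_alpha, p_beta)                    # significant only if both paths are
}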
Subject(s)
Genetic Models, Statistical Models, Adult, Humans, Computer Simulation, Research Design
ABSTRACT
Pleiotropy has important implications for the genetic connections among complex phenotypes and facilitates our understanding of disease etiology. Genome-wide association studies provide an unprecedented opportunity to detect pleiotropic associations; however, efficient pleiotropy test methods are still lacking. Here we consider pleiotropy identification from the methodological perspective of high-dimensional composite null hypothesis testing and propose a powerful gene-based method called MAIUP. MAIUP is constructed based on the traditional intersection-union test with two sets of independent P-values as input and follows a novel idea originally proposed under the high-dimensional mediation analysis framework. The key improvement of MAIUP is that it takes the composite null nature of the pleiotropy test into account by fitting a three-component mixture null distribution, which ultimately generates well-calibrated P-values for effective control of the family-wise error rate and false discovery rate. Another attractive advantage of MAIUP is its ability to effectively address the issue of overlapping subjects commonly encountered in association studies. Simulation studies demonstrate that, compared with other methods, only MAIUP maintains correct type I error control and has higher power across a wide range of scenarios. We apply MAIUP to detect shared associated genes among 14 psychiatric disorders with summary statistics and discover many new pleiotropic genes that would otherwise not be identified if the composite nature of the null hypothesis were ignored. Functional and enrichment analyses offer additional evidence supporting the validity of these identified pleiotropic genes associated with psychiatric disorders. Overall, MAIUP represents an efficient method for pleiotropy identification.
Subject(s)
Genome-Wide Association Study, Mental Disorders, Computer Simulation, Genetic Pleiotropy, Genetic Predisposition to Disease, Genome-Wide Association Study/methods, Humans, Mental Disorders/genetics, Phenotype, Single Nucleotide Polymorphism
ABSTRACT
BACKGROUND: Mediation analysis is a powerful tool for identifying factors that mediate the causal pathway from exposure to health outcomes. Mediation analysis has been extended to study a large number of potential mediators in high-dimensional data settings. The presence of confounding in observational studies is inevitable; hence, adjusting for potential confounders is an essential part of high-dimensional mediation analysis (HDMA). Although propensity score (PS)-based methods such as propensity score regression adjustment (PSR) and inverse probability weighting (IPW) have been proposed to tackle this problem, extreme propensity score distributions can result in biased estimation. METHODS: In this article, we integrated the overlap weighting (OW) technique into the HDMA workflow and proposed a concise and powerful high-dimensional mediation analysis procedure consisting of OW confounding adjustment, sure independence screening (SIS), de-biased Lasso penalization, and joint-significance testing under the mixture null distribution. We compared the proposed method with the existing approach consisting of PS-based confounding adjustment, SIS, minimax concave penalty (MCP) variable selection, and classical joint-significance testing. RESULTS: Simulation studies demonstrate that the proposed procedure has the best performance in mediator selection and estimation: it yielded the highest true positive rate, an acceptable false discovery proportion, and a lower mean squared error. In an empirical study of the GSE117859 dataset from the Gene Expression Omnibus database using the proposed method, we found that smoking history may reduce the estimated natural killer (NK) cell level through the mediation effect of several methylation markers, mainly the methylation sites cg13917614 in the CNP gene and cg16893868 in the LILRA2 gene. CONCLUSIONS: The proposed method has higher power, sufficient false discovery rate control, and precise mediation effect estimation, and it remains feasible in the presence of confounders. Hence, our method is worth considering in HDMA studies.
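As a hedged sketch of the overlap-weighting step used for confounding adjustment in the workflow above, the code below fits a logistic propensity score model and forms overlap weights; the function and column names are hypothetical and this is not the authors' implementation.

# Overlap weights: exposed units get weight 1 - PS, unexposed units get PS.
# 'dat' is assumed to contain a binary exposure column and baseline confounders.
make_overlap_weights <- function(dat, exposure, confounders) {
  f  <- reformulate(confounders, response = exposure)
  ps <- fitted(glm(f, data = dat, family = binomial()))  # propensity scores
  ifelse(dat[[exposure]] == 1, 1 - ps, ps)                # overlap weights
}

# Hypothetical usage; the resulting weights would be carried into the
# downstream mediator screening and de-biased Lasso steps:
# w <- make_overlap_weights(dat, "smoking", c("age", "sex", "bmi"))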
Subject(s)
Mediation Analysis, Propensity Score, Humans, Observational Studies as Topic/methods, Epidemiologic Confounding Factors, Epigenomics/methods, Computer Simulation, Algorithms
ABSTRACT
BACKGROUND: Detecting trans-ethnic common associated genetic loci can offer important insights into shared genetic components underlying complex diseases/traits across diverse continental populations. However, effective statistical methods for this goal are currently lacking. METHODS: By leveraging summary statistics available from global-scale genome-wide association studies, we propose a novel genetic overlap detection method called CONTO (COmposite Null hypothesis test for Trans-ethnic genetic Overlap) from the perspective of high-dimensional composite null hypothesis testing. Unlike previous studies, which generally analyzed individual genetic variants, CONTO is a gene-centric method that considers a set of genetic variants located within a gene simultaneously and assesses their joint significance with the trait of interest. Borrowing the principle of the joint significance test (JST), CONTO takes the maximum P-value of the multiple associations as its significance measure. RESULTS: Compared with JST, which is often overly conservative, CONTO improves on two aspects: the construction of a three-component mixture null distribution and the adjustment for trans-ethnic genetic correlation. Consequently, CONTO corrects the conservativeness of JST with well-calibrated P-values and is much more powerful, as validated by extensive simulation studies. We applied CONTO to discover common associated genes for 31 complex diseases/traits between the East Asian and European populations and identified many shared trait-associated genes that had otherwise been missed by JST. We further revealed that population-common genes were generally more evolutionarily conserved than population-specific or null ones. CONCLUSION: Overall, CONTO represents a powerful method for detecting common associated genes across diverse ancestral groups; our results have important implications for the transferability of GWAS discoveries in one population to others.
Subject(s)
Genome-Wide Association Study, Multifactorial Inheritance, Asian People/genetics, Genome-Wide Association Study/methods, Humans, Multifactorial Inheritance/genetics, Phenotype, Single Nucleotide Polymorphism
ABSTRACT
Replicability is a fundamental quality of scientific discoveries: we are interested in those signals that are detectable in different laboratories, in different populations, across time, and so on. Unlike meta-analysis, which accounts for experimental variability but does not guarantee replicability, testing a partial conjunction (PC) null aims specifically to identify the signals that are discovered in multiple studies. In many contemporary applications, for example, comparing multiple high-throughput genetic experiments, a large number M of PC nulls need to be tested simultaneously, calling for a multiple comparisons correction. However, standard multiple testing adjustments on the M PC p-values can be severely conservative, especially when M is large and the signals are sparse. We introduce AdaFilter, a new multiple testing procedure that increases power by adaptively filtering out unlikely candidates of PC nulls. We prove that AdaFilter can control FWER and FDR as long as data across studies are independent, and that it has much higher power than other existing methods. We illustrate the application of AdaFilter with three examples: microarray studies of Duchenne muscular dystrophy, single-cell RNA sequencing of T cells in lung cancer tumors, and GWAS for metabolomics.
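For intuition, a simple non-adaptive baseline for a partial conjunction P-value applies a Bonferroni-type combination to the n - u + 1 largest of the n study-specific P-values; the base-R sketch below illustrates that baseline and is not the AdaFilter procedure.

# Bonferroni-style partial conjunction (PC) P-value: tests whether a signal is
# present in at least u of the n studies; u = n reduces to the maximum P-value.
pc_pvalue <- function(p, u) {
  n <- length(p)
  stopifnot(u >= 1, u <= n)
  min(1, (n - u + 1) * sort(p)[u])  # combines the n - u + 1 largest P-values
}

# Example: a feature tested in 4 studies, requiring replication in at least 2
pc_pvalue(c(0.001, 0.004, 0.20, 0.65), u = 2)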
ABSTRACT
BACKGROUND: Recent genome-wide association studies (GWASs) have revealed the polygenic nature of psychiatric disorders and discovered a number of single-nucleotide polymorphisms (SNPs) associated with multiple psychiatric disorders. However, the extent and pattern of pleiotropy among distinct psychiatric disorders remain incompletely understood. METHODS: We analyzed 14 psychiatric disorders using summary statistics available from the largest GWASs to date. We first applied cross-trait linkage disequilibrium score regression (LDSC) to estimate the genetic correlation between disorders. Then, we performed a gene-based pleiotropy analysis by first aggregating a set of SNP-level associations into a single gene-level association signal using MAGMA. From a methodological perspective, we viewed the identification of pleiotropic associations across the entire genome as a high-dimensional problem of composite null hypothesis testing and utilized a novel method called PLACO for pleiotropy mapping. We finally implemented functional analyses of the identified pleiotropic genes and used Mendelian randomization to detect causal associations between these disorders. RESULTS: We confirmed extensive genetic correlation among psychiatric disorders, on the basis of which these disorders can be grouped into three diverse categories. We detected a large number of pleiotropic genes, comprising 5884 associations and 2424 unique genes, and found that differentially expressed pleiotropic genes were significantly enriched in pancreas, liver, heart, and brain, and that these genes were markedly enriched in biological processes regulating neurodevelopment, neurogenesis, and neuron differentiation, offering substantial evidence supporting the validity of the identified pleiotropic loci. We further demonstrated that, among all the identified pleiotropic genes, 342 unique genes were linked through drug-gene interactions with 6353 drugs, which can be classified into distinct types including inhibitors, agonists, blockers, antagonists, and modulators. We also revealed causal associations among psychiatric disorders, indicating that genetic overlap and causality jointly drive the observed co-occurrence of these disorders. CONCLUSIONS: Our study is among the first large-scale efforts to characterize gene-level pleiotropy among a greatly expanded set of psychiatric disorders and provides important insight into the shared genetic etiology underlying these disorders. The findings should inform psychiatric nosology, identify potential neurobiological mechanisms predisposing to specific clinical presentations, and pave the way to effective drug targets for clinical treatment.
Subject(s)
Genome-Wide Association Study, Mental Disorders, Genetic Pleiotropy, Genetic Predisposition to Disease, Humans, Mental Disorders/genetics, Phenotype, Single Nucleotide Polymorphism
ABSTRACT
Mediation effects of multiple mediators are determined by two associations: one between an exposure and the mediators (S-M) and the other between the mediators and an outcome conditional on the exposure (M-Y). The test for mediation effects is conducted under a composite null hypothesis; that is, either one of the S-M and M-Y associations is zero or both are zero. Without accounting for the composite null, the type I error rate within a study containing a large number of multimediator tests may be much less than expected. We propose a novel test to address this issue. For each mediation test j, j = 1, ..., J, we examine the S-M and M-Y associations using two separate variance component tests. Assuming a zero-mean working distribution with a common variance for the element-wise S-M (and M-Y) associations, score tests for the variance components are constructed. We transform the test statistics into two normally distributed statistics under the null. Using a recently developed result, we conduct J hypothesis tests accounting for the composite null hypothesis by adjusting for the variances of the normally distributed statistics for the S-M and M-Y associations. Advantages of the proposed test over other methods are illustrated in simulation studies and in a data application where we analyze lung cancer data from The Cancer Genome Atlas to investigate the effect of smoking on gene expression through DNA methylation in 15,114 genes.
Subject(s)
Statistical Data Interpretation, Genetic Models, Statistical Distributions, Computer Simulation, DNA Methylation, Humans, Lung Neoplasms/metabolism, Statistical Models, Smoking/adverse effects, Transcriptome
ABSTRACT
Mediation analysis helps researchers assess whether part or all of an exposure's effect on an outcome is due to an intermediate variable. The indirect effect can help in designing interventions on the mediator, as opposed to the exposure, and in better understanding the outcome's mechanisms. Mediation analysis has seen increased use in genome-wide epidemiological studies to test whether the effect of an exposure of interest is mediated through a genomic measure such as gene expression or DNA methylation (DNAm). Testing for the indirect effect is challenged by the fact that the null hypothesis is composite. We examined the performance of commonly used mediation testing methods for the indirect effect in genome-wide mediation studies. When there is no association between the exposure and the mediator and no association between the mediator and the outcome, we show that these common tests are overly conservative. This is a case that will arise frequently in genome-wide mediation studies. Caution is hence needed when applying the commonly used mediation tests in genome-wide mediation studies. We evaluated the performance of these methods using simulation studies and performed an epigenome-wide mediation association study in the Normative Aging Study, analyzing DNAm as a mediator of the effect of pack-years on FEV1.
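The conservativeness under the double null is easy to see directly: when both the exposure-mediator and mediator-outcome effects are zero, the joint-significance P-value max(p1, p2) satisfies P(max ≤ t) = t², so a nominal 0.05 threshold rejects at a rate near 0.0025. The short base-R simulation below, with hypothetical sample sizes, illustrates this.

# Joint-significance (max-P) test under the double null (both effects zero):
# the empirical rejection rate is close to 0.05^2, not the nominal 0.05.
set.seed(1)
n_sim <- 5000; n <- 200
reject <- replicate(n_sim, {
  x <- rnorm(n); m <- rnorm(n); y <- rnorm(n)          # no true effects
  p_a <- summary(lm(m ~ x))$coefficients["x", 4]       # exposure -> mediator
  p_b <- summary(lm(y ~ m + x))$coefficients["m", 4]   # mediator -> outcome
  max(p_a, p_b) < 0.05
})
mean(reject)  # close to 0.0025 rather than 0.05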
Subject(s)
Genome-Wide Association Study, Genetic Models, Basic Helix-Loop-Helix Transcription Factors/genetics, DNA Methylation, Epigenomics, Humans, Lung Neoplasms/genetics, Lung Neoplasms/pathology, Repressor Proteins/genetics
ABSTRACT
Background: The creatinine-cystatin C ratio (CCR) has been demonstrated to be an objective marker of sarcopenia in clinical conditions but has not been evaluated as an osteoporosis marker in individuals with normal renal function. Methods: We selected 271,831 participants with normal renal function from the UK Biobank cohort. Multivariable linear/logistic regression and Cox proportional hazards models were used to investigate the phenotypic relationship between CCR and osteoporosis in all subjects and in gender-stratified subgroups. Based on genome-wide association study (GWAS) data, linkage disequilibrium score regression (LDSC) and Mendelian randomization (MR) analyses were performed to reveal shared genetic correlations and to infer causal effects, respectively. Results: Among all subjects and gender-stratified subgroups, serum CCR was positively associated with eBMD after adjusting for potential risk factors (all P < 0.05). The multivariable logistic regression model showed that decreased CCR was associated with a higher risk of osteoporosis/fracture in all models (all P < 0.05). In the multivariable Cox regression analysis with adjustment for potential confounders, reduced CCR was associated with the incidence of osteoporosis and fracture in both the total sample and gender-stratified subgroups (all P < 0.05). A significant non-linear dose-response relationship was observed between CCR and osteoporosis/fracture risk (P for non-linearity < 0.05). LDSC found no significant shared genetic effects, but PLACO identified 42 pleiotropic SNPs shared by CCR and fracture (P < 5 × 10⁻⁸). MR analyses indicated causal effects of CCR on osteoporosis/fracture. Conclusions: Reduced CCR predicted increased risks of osteoporosis/fracture, and significant causal effects support these associations. These findings indicate that muscle-derived serum CCR is a potential biomarker for assessing the risks of osteoporosis and fracture.
Subject(s)
Biomarkers, Creatinine, Cystatin C, Mendelian Randomization Analysis, Osteoporosis, Humans, Female, Male, Osteoporosis/genetics, Osteoporosis/blood, Osteoporosis/epidemiology, Middle Aged, Biomarkers/blood, Creatinine/blood, Cystatin C/blood, Cystatin C/genetics, Aged, Genome-Wide Association Study, Single Nucleotide Polymorphism, Adult, Bone Density/genetics, Risk Factors
ABSTRACT
Background: Lung cancer and oesophageal cancer are prevalent malignancies with rising incidence and mortality worldwide. While some environmental and behavioural risk factors for these cancers are established, the contribution of genetic factors to their pathogenesis remains incompletely defined. This study aimed to interrogate the intricate genetic relationship between lung cancer and oesophageal cancer and their potential comorbidity. Methods: We utilised linkage disequilibrium score regression (LDSC) to analyse the genetic correlation between oesophageal carcinoma and lung carcinoma. We then employed several approaches, including pleiotropic analysis under the composite null hypothesis (PLACO), multi-marker analysis of genomic annotation (MAGMA), cis-expression quantitative trait loci (eQTL) analysis, and a pan-cancer assessment, to identify pleiotropic loci and genes. Finally, we performed bidirectional Mendelian randomisation (MR) to evaluate the causal relationship between these malignancies. Results: LDSC revealed a significant genetic correlation between oesophageal carcinoma and lung carcinoma. Using PLACO, further analysis identified shared gene loci including PGBD1, ZNF323, and WNK1. MAGMA identified enriched pathways and nine pleiotropic genes, including HIST1H1B, HIST1H4L, and HIST1H2BL. eQTL analysis integrating oesophageal, lung, and blood tissues revealed 26 shared genes, including TERT, NKAPL, RAD52, BTN3A2, GABBR1, CLPTM1L, and TRIM27. A pan-cancer exploration of the identified genes was also undertaken. MR analysis showed no evidence for a bidirectional causal relationship between oesophageal carcinoma and lung carcinoma. Conclusions: This study provides salient insights into the intricate genetic links between lung carcinoma and oesophageal carcinoma. Utilising multiple approaches for genetic correlation, locus and gene analysis, and causal assessment, we identify shared genetic susceptibilities and regulatory mechanisms. These findings reveal new leads and targets to further elucidate the genetic basis of lung and oesophageal carcinoma, aiding the development of preventive and therapeutic strategies.
ABSTRACT
Microbiome data from sequencing experiments contain the relative abundances of a large number of microbial taxa, with their evolutionary relationships represented by a phylogenetic tree. The compositional and high-dimensional nature of the microbiome mediator challenges the validity of standard mediation analyses. We propose a phylogeny-based mediation analysis method called PhyloMed to address this challenge. Unlike existing methods that directly identify individual mediating taxa, PhyloMed discovers mediation signals by analyzing subcompositions defined on the phylogenetic tree. PhyloMed produces well-calibrated mediation test p-values and yields substantially higher discovery power than existing methods.
Subject(s)
Microbiota, Phylogeny
ABSTRACT
The Pearson and likelihood ratio statistics are commonly used to test goodness of fit for models applied to data from a multinomial distribution. The goodness-of-fit test based on Pearson's chi-squared statistic is sometimes considered a global test that gives little guidance to the source of poor fit when the null hypothesis is rejected, and it has also been recognized that the global test can often be outperformed in terms of power by focused or directional tests. For the cross-classification of a large number of manifest variables, the GFfit statistic focused on second-order marginals for variable pairs i, j has been proposed as a diagnostic to aid in finding the source of lack of fit after the model has been rejected based on a more global test. When data come from a table formed by the cross-classification of a large number of variables, the common global statistics may also have low power and an inaccurate Type I error level due to sparseness in the cells of the table. The sparseness problem is rarely encountered with the GFfit statistic because it is focused on the lower-order marginals. In this paper, a new and extended version of the GFfit statistic is proposed by decomposing the Pearson statistic from the full table into orthogonal components defined on marginal distributions and then defining the new version, the extended GFfit statistic, as a partial sum of these orthogonal components. While the emphasis is on lower-order marginals, the new version is also extended to higher-order tables so that the extended GFfit statistics sum to the Pearson statistic. As orthogonal components of the Pearson chi-squared statistic, the extended GFfit statistics have advantages over other lack-of-fit diagnostics currently available for cross-classified tables: they generally have higher power to detect lack of fit while maintaining good Type I error control even if the joint frequencies are very sparse, as shown in simulation results; theoretical results establish that they have known degrees of freedom and are asymptotically independent with a known joint distribution, a property which facilitates less conservative control of the false discovery rate (FDR) or familywise error rate (FWER) in a high-dimensional table that would produce a large number of bivariate lack-of-fit diagnostics. Computation of the extended GFfit statistics is also numerically stable. The extended GFfit statistic can be applied to a variety of models for cross-classified tables. An application of the new GFfit statistic as a diagnostic for a latent variable model is presented.
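For intuition about second-order marginal diagnostics in general (not the orthogonal-component construction proposed here), the sketch below computes a plain Pearson-type fit statistic on the two-way marginal table of one variable pair, given observed data and model-implied cell probabilities; all names are hypothetical.

# Plain Pearson-type lack-of-fit diagnostic on the bivariate marginal of
# variables i and j. 'observed' is a data frame of categorical variables;
# 'expected_probs' is the model-implied joint probability table for (i, j).
pair_fit_stat <- function(observed, i, j, expected_probs) {
  obs_tab <- table(observed[[i]], observed[[j]])
  exp_tab <- sum(obs_tab) * expected_probs   # expected bivariate counts
  sum((obs_tab - exp_tab)^2 / exp_tab)       # Pearson statistic for the pair
}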
Subject(s)
Theoretical Models, Psychometrics, Computer Simulation
ABSTRACT
Background: A rapidly growing body of literature has revealed the mediating role of DNA methylation in the path from childhood maltreatment to psychiatric disorders such as post-traumatic stress disorder (PTSD) in adulthood. However, such analyses are statistically challenging, and powerful mediation methods for this setting are lacking. Methods: To study how childhood maltreatment induces long-lasting DNA methylation changes that in turn affect PTSD in adulthood, we carried out a gene-based mediation analysis from the perspective of composite null hypothesis testing in the Grady Trauma Project (352 participants and 16,565 genes), with childhood maltreatment as the exposure, multiple DNA methylation sites as mediators, and PTSD or its relevant scores as the outcome. We addressed the challenging issue of gene-based mediation analysis by taking its composite null hypothesis testing nature into consideration and fitting a weighted test statistic. Results: We found that childhood maltreatment could substantially affect PTSD and PTSD-related scores, and that childhood maltreatment was associated with DNA methylation, which in turn had significant effects on PTSD and these scores. Furthermore, using the proposed mediation method, we identified multiple genes within which DNA methylation sites exhibited mediating roles in the path from childhood maltreatment to PTSD-relevant scores in adulthood, with 13 genes for the Beck Depression Inventory and 6 for the modified PTSD Symptom Scale. Conclusion: Our results have the potential to offer meaningful insight into the biological mechanism underlying the impact of early adverse experience on adult disease, and the proposed mediation method can be applied to other similar analysis settings.
ABSTRACT
Mediation analysis is of rising interest in epidemiology and clinical trials. Among existing methods, the joint significance (JS) test yields an overly conservative type I error rate and low power, particularly for high-dimensional mediation hypotheses. In this article, we develop a multiple-testing procedure that accurately controls the family-wise error rate (FWER) and the false discovery rate (FDR) when testing high-dimensional mediation hypotheses. The core of our procedure is based on estimating the proportions of the component null hypotheses and the underlying mixture null distribution of p-values. Theoretical developments and simulation experiments prove that the proposed procedure effectively controls FWER and FDR. Two mediation analyses in DNA methylation and cancer research are presented: assessing the mediating role of DNA methylation in the genetic regulation of gene expression in primary prostate cancer samples, and exploring the possibility of DNA methylation mediating the effect of exercise on prostate cancer progression. Results of the data examples include well-behaved quantile-quantile plots and improved power to detect novel mediation relationships. An R package, HDMT, implementing the proposed procedure is freely available on CRAN.
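A hedged sketch of the core correction follows: given estimated proportions of the three component nulls, the max-P statistic is referred to its mixture null distribution rather than to the uniform. The null proportions below are hypothetical placeholders, and the code is a simplified illustration rather than the HDMT package interface.

# Simplified mixture-null correction for the max-P (joint significance) statistic.
# pi00, pi10, pi01 are estimated proportions of the component nulls (both effects
# zero; only the exposure-mediator effect nonzero; only the mediator-outcome
# effect nonzero). Under the approximation that a truly nonzero effect yields a
# near-zero P-value, P(max-P <= t | composite null) ~ (pi10 + pi01)*t + pi00*t^2.
corrected_maxp <- function(p_alpha, p_beta, pi00, pi10, pi01) {
  p_max <- pmax(p_alpha, p_beta)
  (pi10 + pi01) * p_max + pi00 * p_max^2
}

# Hypothetical usage with null proportions estimated elsewhere:
# p_adj <- corrected_maxp(p_alpha_vec, p_beta_vec, 0.90, 0.05, 0.05)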
ABSTRACT
In genome-wide epigenetic studies, it is of great scientific interest to assess whether the effect of an exposure on a clinical outcome is mediated through DNA methylation. However, statistical inference for causal mediation effects is challenged by the fact that one needs to test a large number of composite null hypotheses across the whole epigenome. Two popular tests, the Wald-type Sobel's test and the joint significance test using the traditional null distribution, are underpowered and thus can miss important scientific discoveries. In this paper, we show that the null distribution of Sobel's test is not the standard normal distribution and the null distribution of the joint significance test is not uniform under the composite null of no mediation effect, especially in finite samples and under the singular point null case in which the exposure has no effect on the mediator and the mediator has no effect on the outcome. Our results explain why these two tests are underpowered and, more importantly, motivate us to develop a more powerful Divide-Aggregate Composite-null Test (DACT) for the composite null hypothesis of no mediation effect by leveraging epigenome-wide data. We adopted Efron's empirical null framework for assessing the statistical significance of DACT. We showed analytically that the proposed DACT method had improved power while controlling the type I error rate. Our extensive simulation studies showed that, in finite samples, the DACT method properly controlled the type I error rate and outperformed Sobel's test and the joint significance test for detecting mediation effects. We applied the DACT method to the US Department of Veterans Affairs Normative Aging Study, an ongoing prospective cohort study that included men who were aged 21 to 80 years at entry. We identified multiple DNA methylation CpG sites that might mediate the effect of smoking on lung function, with effect sizes ranging from -0.18 to -0.79 and the false discovery rate controlled at level 0.05, including CpG sites in the AHRR and F2RL3 genes. Our sensitivity analysis found small residual correlations (less than 0.01) of the error terms between the outcome and mediator regressions, suggesting that our results are robust to unmeasured confounding factors.
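One simple way to operationalize an empirical-null adjustment, as a rough stand-in for Efron's estimators rather than the DACT implementation, is to estimate the null mean and spread robustly from the bulk of the z-scores and recompute P-values against that null; the base-R sketch below assumes a vector of test z-scores.

# Rough empirical-null adjustment: estimate the null center and scale from the
# bulk of the z-scores (median/MAD), then recompute two-sided P-values.
empirical_null_p <- function(z) {
  mu0    <- median(z)
  sigma0 <- mad(z)   # robust scale estimate; ~1 if the theoretical null holds
  2 * pnorm(-abs((z - mu0) / sigma0))
}

# Hypothetical usage: p_emp <- empirical_null_p(z_scores)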
ABSTRACT
Mediation analysis investigates the intermediate mechanism through which an exposure exerts its influence on the outcome of interest. Mediation analysis is becoming increasingly popular in high-throughput genomics studies, where a common goal is to identify molecular-level traits, such as gene expression or methylation, that actively mediate genetic or environmental effects on the outcome. Mediation analysis in genomics studies is particularly challenging, however, due to the large number of potential mediators measured in these studies as well as the composite null nature of the mediation effect hypothesis. Indeed, while standard univariate and multivariate mediation methods are well established for analyzing one or multiple mediators, they are not well suited for genomics studies with a large number of mediators and often yield conservative p-values and limited power. Consequently, over the past few years many new high-dimensional mediation methods have been developed for analyzing the large number of potential mediators collected in high-throughput genomics studies. In this work, we present a thorough review of these important recent methodological advances in high-dimensional mediation analysis. Specifically, we describe in detail more than ten high-dimensional mediation methods, focusing on their motivations, basic modeling ideas, specific modeling assumptions, practical successes, methodological limitations, and future directions. We hope our review will serve as a useful guide for statisticians and computational biologists who develop methods for high-dimensional mediation analysis as well as for analysts who apply mediation methods to high-throughput genomics studies.
ABSTRACT
Causal mediation analysis aims to characterize an exposure's effect on an outcome and to quantify the indirect effect that acts through a given mediator or a group of mediators of interest. With the increasing availability of measurements on a large number of potential mediators, such as the epigenome or the microbiome, new statistical methods are needed to simultaneously accommodate high-dimensional mediators while directly targeting penalization of the natural indirect effect (NIE) for active mediator identification. Here, we develop two novel prior models for the identification of active mediators in high-dimensional mediation analysis through penalizing NIEs in a Bayesian paradigm. Both methods specify a joint prior distribution on the exposure-mediator effect and mediator-outcome effect with either (a) a four-component Gaussian mixture prior or (b) a product threshold Gaussian prior. By jointly modeling the two parameters that contribute to the NIE, the proposed methods enable penalization of their product in a targeted way. The resulting inference can take into account the four-component composite structure underlying the NIE. We show through simulations that the proposed methods improve both selection and estimation accuracy compared with other competing methods. We applied our methods to an in-depth analysis of two ongoing epidemiologic studies: the Multi-Ethnic Study of Atherosclerosis (MESA) and the LIFECODES birth cohort. The identified active mediators in both studies reveal important biological pathways for understanding disease mechanisms.
ABSTRACT
There is growing interest in pursuing adaptive enrichment for drug development because of its potential to achieve the goal of personalized medicine. Many versions of adaptive enrichment have been proposed across many disease indications; some are exploratory adaptive enrichment designs, while others aim at confirmatory adaptive enrichment. In this paper, we give a brief overview of adaptive enrichment and the methodologies that are growing in the statistical literature. A case example is provided to illustrate a regulatory experience that led to drug approval. Two design elements were used for adaptation in this case example: population adaptation and statistical information adaptation. We articulate the challenges in the implementation of a confirmatory adaptive enrichment trial. The challenges include logistical aspects of the appropriate choice of study population for adaptation and the ability to follow the pre-specified rules for statistical information or sample size adaptation. We assess the consistency of the treatment effect before and after adaptation using the approach laid out in Wang et al. (2013). We provide the rationale for what would be an appropriate treatment effect estimate for reporting in the drug label. We discuss and articulate design considerations for adaptive enrichment among a dual-composite null hypothesis, a flexible dual-independent null hypothesis, and a rigorous dual-independent null hypothesis.