RESUMO
Variants in cis-regulatory elements link the noncoding genome to human pathology; however, detailed analytic tools for understanding the association between cell-level brain pathology and noncoding variants are lacking. CWAS-Plus, adapted from a Python package for category-wide association testing (CWAS), enhances noncoding variant analysis by integrating both whole-genome sequencing (WGS) and user-provided functional data. With simplified parameter settings and an efficient multiple testing correction method, CWAS-Plus conducts the CWAS workflow 50 times faster than CWAS, making it more accessible and user-friendly for researchers. Here, we used a single-nuclei assay for transposase-accessible chromatin with sequencing to facilitate CWAS-guided noncoding variant analysis at cell-type-specific enhancers and promoters. Examining autism spectrum disorder WGS data (n = 7280), CWAS-Plus identified noncoding de novo variant associations in transcription factor binding sites within conserved loci. Independently, in Alzheimer's disease WGS data (n = 1087), CWAS-Plus detected rare noncoding variant associations in microglia-specific regulatory elements. These findings highlight CWAS-Plus's utility in genomic disorders and scalability for processing large-scale WGS data and in multiple-testing corrections. CWAS-Plus and its user manual are available at https://github.com/joonan-lab/cwas/ and https://cwas-plus.readthedocs.io/en/latest/, respectively.
Assuntos
Sequenciamento Completo do Genoma , Humanos , Sequenciamento Completo do Genoma/métodos , Doença de Alzheimer/genética , Estudo de Associação Genômica Ampla/métodos , Transtorno do Espectro Autista/genética , Variação Genética , Software , Cromatina/genética , Cromatina/metabolismo , Genoma HumanoRESUMO
Variants in cis-regulatory elements link the noncoding genome to human brain pathology; however, detailed analytic tools for understanding the association between cell-level brain pathology and noncoding variants are lacking. CWAS-Plus, adapted from a Python package for category-wide association testing (CWAS) employs both whole-genome sequencing and user-provided functional data to enhance noncoding variant analysis, with a faster and more efficient execution of the CWAS workflow. Here, we used single-nuclei assay for transposase-accessible chromatin with sequencing to facilitate CWAS-guided noncoding variant analysis at cell-type specific enhancers and promoters. Examining autism spectrum disorder whole-genome sequencing data (n = 7,280), CWAS-Plus identified noncoding de novo variant associations in transcription factor binding sites within conserved loci. Independently, in Alzheimer's disease whole-genome sequencing data (n = 1,087), CWAS-Plus detected rare noncoding variant associations in microglia-specific regulatory elements. These findings highlight CWAS-Plus's utility in genomic disorders and scalability for processing large-scale whole-genome sequencing data and in multiple-testing corrections. CWAS-Plus and its user manual are available at https://github.com/joonan-lab/cwas/ and https://cwas-plus.readthedocs.io/en/latest/, respectively.
RESUMO
In genome-wide association studies (GWAS), it has become commonplace to test millions of single-nucleotide polymorphisms (SNPs) for phenotypic association. Gene-based testing can improve power to detect weak signal by reducing multiple testing and pooling signal strength. While such tests account for linkage disequilibrium (LD) structure of SNP alleles within each gene, current approaches do not capture LD of SNPs falling in different nearby genes, which can induce correlation of gene-based test statistics. We introduce an algorithm to account for this correlation. When a gene's test statistic is independent of others, it is assessed separately; when test statistics for nearby genes are strongly correlated, their SNPs are agglomerated and tested as a locus. To provide insight into SNPs and genes driving association within loci, we develop an interactive visualization tool to explore localized signal. We demonstrate our approach in the context of weakly powered GWAS for autism spectrum disorder, which is contrasted to more highly powered GWAS for schizophrenia and educational attainment. To increase power for these analyses, especially those for autism, we use adaptive $P$-value thresholding, guided by high-dimensional metadata modeled with gradient boosted trees, highlighting when and how it can be most useful. Notably our workflow is based on summary statistics.
Assuntos
Algoritmos , Biologia Computacional/métodos , Predisposição Genética para Doença , Testes Genéticos/normas , Estudo de Associação Genômica Ampla/métodos , Estudo de Associação Genômica Ampla/normas , Alelos , Mapeamento Cromossômico , Bases de Dados Genéticas , Testes Genéticos/métodos , Humanos , Desequilíbrio de Ligação , Fenótipo , Polimorfismo de Nucleotídeo Único , Locos de Características QuantitativasRESUMO
To correct for a large number of hypothesis tests, most researchers rely on simple multiple testing corrections. Yet, new methodologies of selective inference could potentially improve power while retaining statistical guarantees, especially those that enable exploration of test statistics using auxiliary information (covariates) to weight hypothesis tests for association. We explore one such method, adaptive P-value thresholding (AdaPT), in the framework of genome-wide association studies (GWAS) and gene expression/coexpression studies, with particular emphasis on schizophrenia (SCZ). Selected SCZ GWAS association P values play the role of the primary data for AdaPT; single-nucleotide polymorphisms (SNPs) are selected because they are gene expression quantitative trait loci (eQTLs). This natural pairing of SNPs and genes allow us to map the following covariate values to these pairs: GWAS statistics from genetically correlated bipolar disorder, the effect size of SNP genotypes on gene expression, and gene-gene coexpression, captured by subnetwork (module) membership. In all, 24 covariates per SNP/gene pair were included in the AdaPT analysis using flexible gradient boosted trees. We demonstrate a substantial increase in power to detect SCZ associations using gene expression information from the developing human prefrontal cortex. We interpret these results in light of recent theories about the polygenic nature of SCZ. Importantly, our entire process for identifying enrichment and creating features with independent complementary data sources can be implemented in many different high-throughput settings to ultimately improve power.