RESUMO
Gene-based association analysis is a powerful tool for identifying genes that explain trait variability. An essential step of this analysis is a conditional analysis. It aims to eliminate the influence of SNPs outside the gene, which are in linkage disequilibrium with intragenic SNPs. The popular conditional analysis method, GCTA-COJO, accounts for the influence of several top independently associated SNPs outside the gene, correcting the z statistics for intragenic SNPs. We suggest a new TauCOR method for conditional gene-based analysis using summary statistics. This method accounts the influence of the full regional polygenic background, correcting the genotype correlations between intragenic SNPs. As a result, the distribution of z statistics for intragenic SNPs becomes conditionally independent of distribution for extragenic SNPs. TauCOR is compatible with any gene-based association test. TauCOR was tested on summary statistics simulated under different scenarios and on real summary statistics for a 'gold standard' gene list from the Open Targets Genetics project. TauCOR proved to be effective in all modelling scenarios and on real data. The TauCOR's strategy showed comparable sensitivity and higher specificity and accuracy than GCTA-COJO on both simulated and real data. The method can be successfully used to improve the effectiveness of gene-based association analyses.
Assuntos
Desequilíbrio de Ligação , Herança Multifatorial , Polimorfismo de Nucleotídeo Único , Polimorfismo de Nucleotídeo Único/genética , Herança Multifatorial/genética , Humanos , Estudo de Associação Genômica Ampla/métodos , Modelos Genéticos , GenótipoRESUMO
Back pain (BP) is a major contributor to disability worldwide, with heritability estimated at 40-60%. However, less than half of the heritability is explained by common genetic variants identified by genome-wide association studies. More powerful methods and rare and ultra-rare variant analysis may offer additional insight. This study utilized exome sequencing data from the UK Biobank to perform a multi-trait gene-based association analysis of three BP-related phenotypes: chronic back pain, dorsalgia, and intervertebral disc disorder. We identified the SLC13A1 gene as a contributor to chronic back pain via loss-of-function (LoF) and missense variants. This gene has been previously detected in two studies. A multi-trait approach uncovered the novel FSCN3 gene and its impact on back pain through LoF variants. This gene deserves attention because it is only the second gene shown to have an effect on back pain due to LoF variants and represents a promising drug target for back pain therapy.
Assuntos
Exoma , Estudo de Associação Genômica Ampla , Humanos , Exoma/genética , Predisposição Genética para Doença , Fenótipo , Dor nas Costas/genéticaRESUMO
We propose a novel effective framework for the analysis of the shared genetic background for a set of genetically correlated traits using SNP-level GWAS summary statistics. This framework called SHAHER is based on the construction of a linear combination of traits by maximizing the proportion of its genetic variance explained by the shared genetic factors. SHAHER requires only full GWAS summary statistics and matrices of genetic and phenotypic correlations between traits as inputs. Our framework allows both shared and unshared genetic factors to be effectively analyzed. We tested our framework using simulation studies, compared it with previous developments, and assessed its performance using three real datasets: anthropometric traits, psychiatric conditions and lipid concentrations. SHAHER is versatile and applicable to summary statistics from GWASs with arbitrary sample sizes and sample overlaps, allows for the incorporation of different GWAS models (Cox, linear and logistic), and is computationally fast.
Assuntos
Estudo de Associação Genômica Ampla , Polimorfismo de Nucleotídeo Único , Polimorfismo de Nucleotídeo Único/genética , Fenótipo , Patrimônio Genético , LipídeosRESUMO
Gene-based association analysis is an effective gene-mapping tool. Many gene-based methods have been proposed recently. However, their power depends on the underlying genetic architecture, which is rarely known in complex traits, and so it is likely that a combination of such methods could serve as a universal approach. Several frameworks combining different gene-based methods have been developed. However, they all imply a fixed set of methods, weights and functional annotations. Moreover, most of them use individual phenotypes and genotypes as input data. Here, we introduce sumSTAAR, a framework for gene-based association analysis using summary statistics obtained from genome-wide association studies (GWAS). It is an extended and modified version of STAAR framework proposed by Li and colleagues in 2020. The sumSTAAR framework offers a wider range of gene-based methods to combine. It allows the user to arbitrarily define a set of these methods, weighting functions and probabilities of genetic variants being causal. The methods used in the framework were adapted to analyse genes with large number of SNPs to decrease the running time. The framework includes the polygene pruning procedure to guard against the influence of the strong GWAS signals outside the gene. We also present new improved matrices of correlations between the genotypes of variants within genes. These matrices estimated on a sample of 265,000 individuals are a state-of-the-art replacement of widely used matrices based on the 1000 Genomes Project data.
Assuntos
Estudo de Associação Genômica Ampla , Locos de Características Quantitativas , Estudos de Associação Genética , Estudo de Associação Genômica Ampla/métodos , Fenótipo , Polimorfismo de Nucleotídeo Único/genéticaRESUMO
Here I propose a fundamentally new flexible model to reveal the association between a trait and a set of genetic variants in a genomic region/gene. This model was developed for the situation when original individual-level phenotype and genotype data are not available, but the researcher possesses the results of statistical analyses conducted on these data (namely, SNP-level summary Z score statistics and SNP-by-SNP correlations). The new model was analytically derived from the classical multiple linear regression model applied for the region-based association analysis of individual-level phenotype and genotype data by using the linear compression of data, where the SNP-by-SNP correlations are among the explanatory variables, and the summary Z score statistics are categorized as the response variables. I analytically show that the regional association analysis methods developed within the framework of the classical multiple linear regression model with additive effects of genetic variants can be reformulated in terms of the new model without the loss of information. The results obtained from the regional association analysis utilizing the classical model and those derived using the proposed model are identical when SNP-by-SNP correlations and SNP-level statistics are estimated from the same genetic data.
Assuntos
Estudo de Associação Genômica Ampla/métodos , Polimorfismo de Nucleotídeo Único , Biologia Computacional/métodos , Interpretação Estatística de Dados , Genótipo , Humanos , Modelos Lineares , Modelos Genéticos , FenótipoRESUMO
MOTIVATION: A huge number of genome-wide association studies (GWAS) summary statistics freely available in databases provide a new material for gene-based association analysis aimed at identifying rare genetic variants. Only a few of the many popular gene-based methods developed for individual genotype and phenotype data are adapted for the practical use of the GWAS summary statistics as input. RESULTS: We analytically prove and numerically illustrate that all popular powerful methods developed for gene-based association analysis of individual phenotype and genotype data can be modified to utilize GWAS summary statistics. We have modified and implemented all of the popular methods, including burden and kernel machine-based tests, multiple and functional linear regression, principal components analysis and others, in the R package sumFREGAT. Using real summary statistics for coronary artery disease, we show that the new package is able to detect genes not found by the existing packages. AVAILABILITY AND IMPLEMENTATION: The R package sumFREGAT is freely and publicly available at: https://CRAN.R-project.org/package=sumFREGAT. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Assuntos
Estudo de Associação Genômica Ampla , Software , Genótipo , Modelos Lineares , FenótipoRESUMO
Functional linear regression models are effectively used in gene-based association analysis of complex traits. These models combine information about individual genetic variants, taking into account their positions and reducing the influence of noise and/or observation errors. To increase the power of methods, where several differently informative components are combined, weights are introduced to give the advantage to more informative components. Allele-specific weights have been introduced to collapsing and kernel-based approaches to gene-based association analysis. Here we have for the first time introduced weights to functional linear regression models adapted for both independent and family samples. Using data simulated on the basis of GAW17 genotypes and weights defined by allele frequencies via the beta distribution, we demonstrated that type I errors correspond to declared values and that increasing the weights of causal variants allows the power of functional linear models to be increased. We applied the new method to real data on blood pressure from the ORCADES sample. Five of the six known genes with P < 0.1 in at least one analysis had lower P values with weighted models. Moreover, we found an association between diastolic blood pressure and the VMP1 gene (P = 8.18×10-6), when we used a weighted functional model. For this gene, the unweighted functional and weighted kernel-based models had P = 0.004 and 0.006, respectively. The new method has been implemented in the program package FREGAT, which is freely available at https://cran.r-project.org/web/packages/FREGAT/index.html.
Assuntos
Estudo de Associação Genômica Ampla/métodos , Modelos Genéticos , Humanos , Análise de RegressãoRESUMO
BACKGROUND: Despite high heritability, little success was achieved in mapping genetic determinants of depression-related traits by means of genome-wide association studies. METHODS: To identify genes associated with depressive symptomology, we performed a gene-based association analysis of nonsynonymous variation captured using exome-sequencing and exome-chip genotyping in a genetically isolated population from the Netherlands (n = 1999). Finally, we reproduced our significant findings in an independent population-based cohort (n = 1604). RESULTS: We detected significant association of depressive symptoms with a gene NKPD1 (p = 3.7 × 10-08). Nonsynonymous variants in the gene explained 0.9% of sex- and age-adjusted variance of depressive symptoms in the discovery study, which is translated into 3.8% of the total estimated heritability (h2 = 0.24). Significant association of depressive symptoms with NKPD1 was also observed (n = 1604; p = 1.5 × 10-03) in the independent replication sample despite little overlap with the discovery cohort in the set of nonsynonymous genetic variants observed in the NKPD1 gene. Meta-analysis of the discovery and replication studies improved the association signal (p = 1.0 × 10-09). CONCLUSIONS: Our study suggests that nonsynonymous variation in the gene NKPD1 affects depressive symptoms in the general population. NKPD1 is predicted to be involved in the de novo synthesis of sphingolipids, which have been implicated in the pathogenesis of depression.
Assuntos
Depressão/genética , Transtorno Depressivo Maior/genética , Nucleosídeo-Trifosfatase/genética , Exoma , Feminino , Variação Genética , Estudo de Associação Genômica Ampla , Humanos , Masculino , Proteínas de Membrana/genética , Pessoa de Meia-Idade , Proteínas do Tecido Nervoso/genética , Países Baixos , Polimorfismo de Nucleotídeo Único , População Branca/genéticaRESUMO
UNLABELLED: Several approaches to the region-based association analysis of quantitative traits have recently been developed and successively applied. However, no software package has been developed that implements all of these approaches for either independent or structured samples. Here we introduce FREGAT (Family REGional Association Tests), an R package that can handle family and population samples and implements a wide range of region-based association methods including burden tests, functional linear models, and kernel machine-based regression. FREGAT can be used in genome/exome-wide region-based association studies of quantitative traits and candidate gene analysis. FREGAT offers many useful options to empower its users and increase the effectiveness and applicability of region-based association analysis. AVAILABILITY AND IMPLEMENTATION: https://cran.r-project.org/web/packages/FREGAT/index.html SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics Online. CONTACT: belon@bionet.nsc.ru.
Assuntos
Exoma , Modelos Lineares , Software , HumanosRESUMO
Region-based association analysis is a more powerful tool for gene mapping than testing of individual genetic variants, particularly for rare genetic variants. The most powerful methods for regional mapping are based on the functional data analysis approach, which assumes that the regional genome of an individual may be considered as a continuous stochastic function that contains information about both linkage and linkage disequilibrium. Here, we extend this powerful approach, earlier applied only to independent samples, to the samples of related individuals. To this end, we additionally include a random polygene effects in functional linear model used for testing association between quantitative traits and multiple genetic variants in the region. We compare the statistical power of different methods using Genetic Analysis Workshop 17 mini-exome family data and a wide range of simulation scenarios. Our method increases the power of regional association analysis of quantitative traits compared with burden-based and kernel-based methods for the majority of the scenarios. In addition, we estimate the statistical power of our method using regions with small number of genetic variants, and show that our method retains its advantage over burden-based and kernel-based methods in this case as well. The new method is implemented as the R-function 'famFLM' using two types of basis functions: the B-spline and Fourier bases. We compare the properties of the new method using models that differ from each other in the type of their function basis. The models based on the Fourier basis functions have an advantage in terms of speed and power over the models that use the B-spline basis functions and those that combine B-spline and Fourier basis functions. The 'famFLM' function is distributed under GPLv3 license and is freely available at http://mga.bionet.nsc.ru/soft/famFLM/.
Assuntos
Estudos de Associação Genética/métodos , Modelos Lineares , Variação Genética , Genoma Humano , Humanos , Desequilíbrio de LigaçãoRESUMO
The kernel machine-based regression is an efficient approach to region-based association analysis aimed at identification of rare genetic variants. However, this method is computationally complex. The running time of kernel-based association analysis becomes especially long for samples with genetic (sub) structures, thus increasing the need to develop new and effective methods, algorithms, and software packages. We have developed a new R-package called fast family-based sequence kernel association test (FFBSKAT) for analysis of quantitative traits in samples of related individuals. This software implements a score-based variance component test to assess the association of a given set of single nucleotide polymorphisms with a continuous phenotype. We compared the performance of our software with that of two existing software for family-based sequence kernel association testing, namely, ASKAT and famSKAT, using the Genetic Analysis Workshop 17 family sample. Results demonstrate that FFBSKAT is several times faster than other available programs. In addition, the calculations of the three-compared software were similarly accurate. With respect to the available analysis modes, we combined the advantages of both ASKAT and famSKAT and added new options to empower FFBSKAT users. The FFBSKAT package is fast, user-friendly, and provides an easy-to-use method to perform whole-exome kernel machine-based regression association analysis of quantitative traits in samples of related individuals. The FFBSKAT package, along with its manual, is available for free download at http://mga.bionet.nsc.ru/soft/FFBSKAT/.
Assuntos
Exoma , Polimorfismo de Nucleotídeo Único , Característica Quantitativa Herdável , Análise de Sequência de DNA/métodos , SoftwareRESUMO
Regional-based association analysis instead of individual testing of each SNP was introduced in genome-wide association studies to increase the power of gene mapping, especially for rare genetic variants. For regional association tests, the kernel machine-based regression approach was recently proposed as a more powerful alternative to collapsing-based methods. However, the vast majority of existing algorithms and software for the kernel machine-based regression are applicable only to unrelated samples. In this paper, we present a new method for the kernel machine-based regression association analysis of quantitative traits in samples of related individuals. The method is based on the GRAMMAR+ transformation of phenotypes of related individuals, followed by use of existing kernel machine-based regression software for unrelated samples. We compared the performance of kernel-based association analysis on the material of the Genetic Analysis Workshop 17 family sample and real human data by using our transformation, the original untransformed trait, and environmental residuals. We demonstrated that only the GRAMMAR+ transformation produced type I errors close to the nominal value and that this method had the highest empirical power. The new method can be applied to analysis of related samples by using existing software for kernel-based association analysis developed for unrelated samples.
Assuntos
Locos de Características Quantitativas , Algoritmos , Humanos , Fenótipo , Polimorfismo de Nucleotídeo ÚnicoRESUMO
The variance component tests used in genome-wide association studies (GWAS) including large sample sizes become computationally exhaustive when the number of genetic markers is over a few hundred thousand. We present an extremely fast variance components-based two-step method, GRAMMAR-Gamma, developed as an analytical approximation within a framework of the score test approach. Using simulated and real human GWAS data sets, we show that this method provides unbiased estimates of the SNP effect and has a power close to that of the likelihood ratio test-based method. The computational complexity of our method is close to its theoretical minimum, that is, to the complexity of the analysis that ignores genetic structure. The running time of our method linearly depends on sample size, whereas this dependency is quadratic for other existing methods. Simulations suggest that GRAMMAR-Gamma may be used for association testing in whole-genome resequencing studies of large human cohorts.