Results 1 - 11 of 11
1.
Stat Appl Genet Mol Biol ; 23(1), 2024 01 01.
Article in English | MEDLINE | ID: mdl-38235525

ABSTRACT

Population stratification (PS) is one major source of confounding in both single nucleotide polymorphism (SNP) and haplotype association studies. To address PS, principal component regression (PCR) and linear mixed model (LMM) are the current standards for SNP associations, which are also commonly borrowed for haplotype studies. However, the underfitting and overfitting problems introduced by PCR and LMM, respectively, have yet to be addressed. Furthermore, there have been only a few theoretical approaches proposed to address PS specifically for haplotypes. In this paper, we propose a new method under the Bayesian LASSO framework, QBLstrat, to account for PS in identifying rare and common haplotypes associated with a continuous trait of interest. QBLstrat utilizes a large number of principal components (PCs) with appropriate priors to sufficiently correct for PS, while shrinking the estimates of unassociated haplotypes and PCs. We compare the performance of QBLstrat with the Bayesian counterparts of PCR and LMM and a current method, haplo.stats. Extensive simulation studies and real data analyses show that QBLstrat is superior in controlling false positives while maintaining competitive power for identifying true positives under PS.


Subjects
Models, Genetic ; Polymorphism, Single Nucleotide ; Haplotypes ; Bayes Theorem ; Phenotype ; Genome-Wide Association Study
2.
Ann Hum Genet ; 87(6): 302-315, 2023 11.
Article in English | MEDLINE | ID: mdl-37771252

ABSTRACT

INTRODUCTION: Population stratification (PS) is a major source of confounding in population-based genetic association studies of quantitative traits. Principal component regression (PCR) and linear mixed model (LMM) are two commonly used approaches to account for PS in association studies. Previous studies have shown that LMM can be interpreted as including all principal components (PCs) as random-effect covariates. However, including all PCs in LMM may dilute the influence of relevant PCs in some scenarios, while including only a few preselected PCs in PCR may fail to fully capture the genetic diversity. MATERIALS AND METHODS: To address these shortcomings, we introduce Bayestrat, a method to detect associated variants with PS correction under the Bayesian LASSO framework. To adjust for PS, Bayestrat accommodates a large number of PCs and utilizes appropriate shrinkage priors to shrink the effects of nonassociated PCs. RESULTS: Simulation results show that Bayestrat consistently controls type I error rates and achieves higher power compared to its non-shrinkage counterparts, especially when the number of PCs included in the model is large. As a demonstration of the utility of Bayestrat, we apply it to the Multi-Ethnic Study of Atherosclerosis (MESA). Variants and genes associated with serum triglyceride or HDL cholesterol are identified in our analyses. DISCUSSION: The automatic and self-selection features of Bayestrat make it particularly suited to situations with complex underlying PS scenarios, where it is unknown a priori which PCs are potential confounders, yet the number that needs to be considered could be large in order to fully account for PS.


Subjects
Genome-Wide Association Study ; Models, Genetic ; Humans ; Bayes Theorem ; Genetic Association Studies ; Computer Simulation ; Linear Models ; Phenotype
3.
Genet Epidemiol ; 45(1): 36-45, 2021 02.
Article in English | MEDLINE | ID: mdl-32864779

ABSTRACT

The breakthroughs in next-generation sequencing have allowed us to access data consisting of both common and rare variants, and in particular to investigate the impact of rare genetic variation on complex diseases. Although rare genetic variants are thought to be important components in explaining genetic mechanisms of many diseases, discovering these variants remains challenging, and most studies are restricted to population-based designs. Further, despite the shift in the field of genome-wide association studies (GWAS) towards studying rare variants due to the "missing heritability" phenomenon, little is known about rare X-linked variants associated with complex diseases. For instance, there is evidence that X-linked genes are highly involved in brain development and cognition when compared with autosomal genes; however, like most GWAS for other complex traits, previous GWAS for mental diseases have provided poor resources for identifying rare variant associations on the X chromosome. In this paper, we address the two issues described above by proposing a method that can be used to test X-linked variants using sequencing data on families. Our method is much more general than existing methods, as it can be applied to detect both common and rare variants, and is applicable to autosomes as well. Our simulation study shows that the method is efficient and exhibits good operating characteristics. An application to the University of Miami Study on Genetics of Autism and Related Disorders also yielded encouraging results.


Subjects
Genes, X-Linked ; Genome-Wide Association Study ; Genetic Variation ; High-Throughput Nucleotide Sequencing ; Humans ; Models, Genetic ; Multifactorial Inheritance
4.
Ann Hum Genet ; 83(6): 454-464, 2019 11.
Article in English | MEDLINE | ID: mdl-31322288

ABSTRACT

Unaccounted-for population stratification can lead to false-positive findings and can mask the true association signals in identification of disease-related genetic variants. The computational simplicity of principal component analysis (PCA) makes it a widely used method for population stratification adjustment. However, this approach has two limitations. First, given that genotype data are generally represented by numerical values 0, 1, and 2, corresponding to the number of minor alleles, it is more reasonable to consider genotype data as categorical data; because PCA is inherently only suitable for continuous variables, it is not appropriate to apply PCA directly to genotype data. Second, although common variants have been extensively studied, little is known about the stratification of rare variants and its impact on association tests. Over the last decade, there has been a shift in genome-wide association studies toward studying low-frequency (minor allele frequency [MAF] between 0.01 and 0.05) and rare (MAF less than 0.01) variants, which are now widely regarded as determinants of complex traits. The fact that rare variants are not stratified in the same way as common variants necessitates the development of statistical methods that can capture stratification patterns for low-frequency and rare variants. To address these limitations, we investigate the performance of generalized PCA and similarity-matrix-based PCA methods in detecting underlying structures for rare and common variants. We demonstrate, through simulated and real datasets, that a special case of generalized PCA (i.e., logistic PCA) is able to adjust for population stratification in rare variants much more effectively than standard PCA, while their performances are comparable for common variants.


Subjects
Genetics, Population ; Genome-Wide Association Study ; Models, Genetic ; Algorithms ; Genetic Predisposition to Disease ; Genome-Wide Association Study/methods ; Humans ; Principal Component Analysis ; ROC Curve
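
As a small illustration of the genotype coding and frequency bands quoted in the abstract above (genotypes coded 0, 1, 2; low-frequency MAF between 0.01 and 0.05; rare MAF below 0.01), the following Python sketch estimates the MAF of a single variant and assigns it to a band. The function names are hypothetical, and treating both 0.01 and 0.05 as part of the low-frequency band is an assumption, since the abstract does not say how the boundaries are handled.

```python
def minor_allele_frequency(genotypes):
    """Estimate the MAF of one variant from genotypes coded 0, 1, or 2.

    Each value is the number of copies of the counted allele carried by
    one diploid individual.
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("no genotypes supplied")
    freq = sum(genotypes) / (2 * n)
    # If the counted allele is actually the major one, report the other allele.
    return min(freq, 1 - freq)


def classify_variant(maf):
    """Bin a variant by MAF (boundary handling is an assumption)."""
    if maf < 0.01:
        return "rare"
    if maf <= 0.05:
        return "low-frequency"
    return "common"
```

For instance, a variant carried as a single copy by 5 of 1,000 individuals has an estimated MAF of 0.0025 and falls in the rare band.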
5.
Genet Epidemiol ; 41(4): 363-371, 2017 05.
Article in English | MEDLINE | ID: mdl-28300291

ABSTRACT

Recent advances in genotyping with high-density markers allow researchers access to genomic variants, including rare ones. Linkage disequilibrium (LD) is widely used to provide insight into evolutionary history. It is also the basis for association mapping in humans and other species. Better understanding of the genomic LD structure may lead to better-informed statistical tests that can improve the power of association studies. Although rare variant associations with common diseases (RVCD) have been extensively studied recently, there is very limited understanding of, and even controversy about, LD structures among rare variants and between rare and common variants. In fact, many popular RVCD tests assume that rare variants are independent. In this report, we show that two commonly used LD measures are not capable of detecting LD when rare variants are involved. We present this argument from two perspectives: the LD measures themselves and the computational issues associated with them. To address these issues, we propose an alternative LD measure, the polychoric correlation, which was originally designed for detecting associations among categorical variables. Using simulated data as well as the 1000 Genomes data, we explore the performance of LD measures in detail and discuss their implications for association studies.


Subjects
Genetic Variation ; Genome-Wide Association Study ; Chromosomes, Human, Pair 21/genetics ; Computer Simulation ; Gene Frequency/genetics ; Genotype ; Humans ; Linkage Disequilibrium/genetics ; Polymorphism, Single Nucleotide/genetics
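
The abstract above does not name the two LD measures it examines. Purely as an illustration of how pairwise LD is commonly quantified, the Python sketch below computes D, |D′|, and r² for two biallelic loci from phased haplotypes. Choosing these particular measures is an assumption made here, not a statement about the paper, and the function name and input layout are hypothetical.

```python
def ld_measures(haplotypes):
    """Return (D, |D'|, r^2) for two biallelic loci.

    haplotypes: list of (a, b) pairs, one per phased haplotype, where a and b
    are 0 or 1 and 1 marks the allele of interest at each locus.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes supplied")
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n

    d = p_ab - p_a * p_b  # deviation from independence
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    r2 = d * d / denom

    # D is scaled by its largest attainable magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    return d, d_prime, r2
```

With 200 haplotypes in which a 1%-frequency allele always sits on the background of a 50%-frequency allele, this gives |D′| = 1 but r² of only about 0.01, which shows how strongly such measures depend on allele frequencies when one variant is rare.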
6.
Genet Epidemiol ; 38 Suppl 1: S49-56, 2014 Sep.
Article in English | MEDLINE | ID: mdl-25112188

ABSTRACT

In the past decade, genome-wide association studies have been successful in identifying genetic loci that play a role in many complex diseases. Despite this, it has become clear that for many traits, investigation of single common variants does not give a complete picture of the genetic contribution to the phenotype. Therefore a number of new approaches are currently being investigated to further the search for susceptibility loci or regions. We summarize the contributions to Genetic Analysis Workshop 18 (GAW18) that concern this search using methods for population-based association analysis. Many of the members of our GAW18 working group made use of data types that have only recently become available through the use of next-generation sequencing technologies, with many focusing on the investigation of rare variants instead of or in combination with common variants. Some contributors used a haplotype-based approach, which to date has been used relatively infrequently but may become more important for analyzing rare variant association data. Others analyzed gene-gene or gene-environment interactions, where novel statistical approaches were needed to make the best use of the available information without requiring an excessive computational burden. GAW18 provided participants with the chance to make use of state-of-the-art data, statistical techniques, and technology. We report here some of the experiences and conclusions that were reached by workshop participants who analyzed the GAW18 data as a population-based association study.


Subjects
Gene-Environment Interaction ; Genome-Wide Association Study ; Blood Pressure/genetics ; Genetic Variation ; Genetics, Population ; Haplotypes ; High-Throughput Nucleotide Sequencing ; Humans ; Phenotype ; Polymorphism, Single Nucleotide ; Sequence Analysis, DNA
7.
Ann Hum Genet ; 79(3): 199-208, 2015 May.
Article in English | MEDLINE | ID: mdl-25875492

ABSTRACT

Because next-generation sequencing technology can rapidly genotype most genetic variation across the genome, there is considerable interest in investigating the effects of rare variants on complex diseases. In this paper, we propose four Kullback-Leibler distance-based Tests (KLTs) for detecting genotypic differences between cases and controls. There are several features that set the proposed tests apart from existing ones. First, by explicitly considering and comparing the distributions of genotypes, the existence of variants with opposite directional effects does not compromise the power of KLTs. Second, it is not necessary to set a threshold for rare variants, as the KL definition makes it reasonable to consider rare and common variants together without worrying about the contribution from one type overshadowing the other. Third, KLTs are robust to null variants thanks to a built-in noise-fighting mechanism. Finally, correlation among variants is taken into account implicitly, so the KLTs work well regardless of the underlying LD structure. Through extensive simulations, we demonstrated good performance of KLTs compared to the sum of squared score test (SSU) and optimal sequence kernel association test (SKAT-O). Moreover, application to the Dallas Heart Study data illustrates the feasibility and performance of KLTs in a realistic setting.


Subjects
Genome-Wide Association Study/methods ; High-Throughput Nucleotide Sequencing/methods ; Sequence Analysis, DNA/methods ; Computer Simulation ; Genetic Variation ; Genotype ; Humans ; Models, Genetic
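
To make the idea of comparing genotype distributions between cases and controls concrete, here is a minimal Python sketch of a symmetrized Kullback-Leibler divergence for a single variant. It is not one of the four KLT statistics proposed in the paper, whose definitions are not given in the abstract above. The function names, the symmetrization, and the pseudocount used to avoid empty genotype classes are all assumptions made for illustration.

```python
import math
from collections import Counter


def genotype_distribution(genotypes, pseudocount=0.5):
    """Smoothed relative frequencies of genotypes 0, 1, and 2.

    The pseudocount (an assumption of this sketch) keeps every class
    non-empty so that the logarithms below are always defined.
    """
    if pseudocount <= 0:
        raise ValueError("pseudocount must be positive")
    counts = Counter(genotypes)
    total = len(genotypes) + 3 * pseudocount
    return [(counts.get(g, 0) + pseudocount) / total for g in (0, 1, 2)]


def kl_divergence(p, q):
    """KL divergence of q from p for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def symmetric_kl(cases, controls, pseudocount=0.5):
    """Sum of the two directed KL divergences between the groups."""
    p = genotype_distribution(cases, pseudocount)
    q = genotype_distribution(controls, pseudocount)
    return kl_divergence(p, q) + kl_divergence(q, p)
```

Because the quantity is symmetric in its two arguments, it takes the same value whether the excess of minor alleles appears in cases or in controls, which loosely mirrors the abstract's point about variants with opposite directional effects.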
8.
Hum Hered ; 74(1): 51-60, 2012.
Article in English | MEDLINE | ID: mdl-23154579

ABSTRACT

BACKGROUND: Despite the thrilling advances in identifying gene variants that influence common diseases, most of the heritable risk for many common diseases still remains unidentified. One of the possible reasons for this missing heritability is that the genome-wide association study (GWAS) approaches have been focusing on common rather than rare single nucleotide variants (SNVs). Consequently, there is currently a great deal of interest in developing methods that can interrogate rare variants for association with diseases. METHODS: We propose a two-step method (termed rPLS) to reveal possible genetic effects related to rare as well as common variants. The procedure starts with removing irrelevant variants using penalized regression (regularization) which is followed by partial least squares (PLS) on the surviving SNVs to find an optimal linear combination of rare and common SNVs within a genomic region that is tested for its association with the trait of interest. RESULTS: Simulation settings based on the 1000 Genomes sequencing data and reflecting real situations demonstrated that rPLS performs well compared to existing methods especially when there are a large number of noncausal variants (both rare and common) present in the gene and when causal SNVs have different effect sizes and directions.


Subjects
Genetic Predisposition to Disease ; Genetic Variation ; Genomics/methods ; Genome-Wide Association Study/methods ; Humans ; Polymorphism, Single Nucleotide
9.
PLoS One ; 9(1): e86126, 2014.
Article in English | MEDLINE | ID: mdl-24465912

ABSTRACT

With the advent of next-generation sequencing technology, rare variant association analysis is increasingly being conducted to identify genetic variants associated with complex traits. In recent years, significant effort has been devoted to developing powerful statistical methods to test such associations for population-based designs. However, there has been relatively little development for family-based designs, although family data have been shown to be more powerful for detecting rare variants. This study introduces a blocking approach that extends two popular family-based common variant association tests to rare variant association studies. Several options are considered to partition a genomic region (gene) into "independent" blocks, by which information from SNVs is aggregated within a block and an overall test statistic for the entire genomic region is calculated by combining information across these blocks. The proposed methodology allows different variants to have different directions (risk or protective), and specification of a minor allele frequency threshold is not needed. We carried out a simulation to verify the validity of the method by showing that type I error is well under control when the underlying null hypothesis and the assumption of independence across blocks are satisfied. Further, data from the Genetic Analysis Workshop [Formula: see text] are utilized to illustrate the feasibility and performance of the proposed methodology in a realistic setting.


Subjects
Data Interpretation, Statistical ; Genetic Association Studies/methods ; Algorithms ; Computer Simulation ; Gene Frequency ; Humans ; Linkage Disequilibrium ; Models, Genetic ; Polymorphism, Single Nucleotide ; ROC Curve
10.
BMC Proc ; 8(Suppl 1 Genetic Analysis Workshop 18): S58, 2014.
Article in English | MEDLINE | ID: mdl-25519393

ABSTRACT

For almost all complex traits studied in humans, the identified genetic variants discovered to date have accounted for only a small portion of the estimated trait heritability. Consequently, several methods have been developed to identify rare single-nucleotide variants associated with complex traits for population-based designs. Because rare disease variants tend to be enriched in families containing multiple affected individuals, family-based designs can play an important role in the identification of rare causal variants. In this study, we utilize Genetic Analysis Workshop 18 simulated data to examine the performance of some existing rare variant identification methods for unrelated individuals, including our recent method (rPLS). The simulated data is used to investigate whether there is an advantage to using family data compared to case-control data. The results indicate that population-based methods suffer from power loss, especially when the sample size is small. The family-based method employed in this paper results in higher power but fails to control type I error. Our study also highlights the importance of the phenotype choice, which can affect the power of detecting causal genes substantially.

11.
BMC Proc ; 5 Suppl 9: S19, 2011 Nov 29.
Article in English | MEDLINE | ID: mdl-22373126

ABSTRACT

Genome-wide association studies are largely based on single-nucleotide polymorphisms and rest on the common disease/common variants (single-nucleotide polymorphisms) hypothesis. However, it has been argued in the last few years and is well accepted now that rare variants are valuable for studying common diseases. Although current genome-wide association studies have successfully discovered many genetic variants that are associated with common diseases, detecting associated rare variants remains a great challenge. Here, we propose two partial least-squares approaches to aggregate the signals of many single-nucleotide polymorphisms (SNPs) within a gene to reveal possible genetic effects related to rare variants. The availability of the 1000 Genomes Project offers us the opportunity to evaluate the effectiveness of these two gene-based approaches. Compared to results from a SNP-based analysis, the proposed methods were able to identify some (rare) SNPs that were missed by the SNP-based analysis.
