1.
Am J Hum Genet ; 111(8): 1750-1769, 2024 Aug 08.
Article in English | MEDLINE | ID: mdl-39025064

ABSTRACT

Joint association analysis of multiple traits with multiple genetic variants can provide insight into genetic architecture and pleiotropy, improve trait prediction, and increase power for detecting association. Furthermore, some traits are naturally high-dimensional, e.g., images, networks, or longitudinally measured traits. Assessing significance for multitrait genetic association can be challenging, especially when the sample has population sub-structure and/or related individuals. Failure to adequately adjust for sample structure can lead to power loss and inflated type 1 error, and commonly used methods for assessing significance can work poorly with a large number of traits or be computationally slow. We developed JASPER, a fast, powerful, robust method for assessing significance of multitrait association with a set of genetic variants, in samples that have population sub-structure, admixture, and/or relatedness. In simulations, JASPER has higher power, better type 1 error control, and faster computation than existing methods, with the power and speed advantage of JASPER increasing with the number of traits. JASPER is potentially applicable to a wide range of association testing applications, including for multiple disease traits, expression traits, image-derived traits, and microbiome abundances. It allows for covariates, ascertainment, and rare variants and is robust to phenotype model misspecification. We apply JASPER to analyze gene expression in the Framingham Heart Study, where, compared to alternative approaches, JASPER finds more significant associations, including several that indicate pleiotropic effects, most of which replicate previous results, while others have not previously been reported. Our results demonstrate the promise of JASPER for powerful multitrait analysis in structured samples.


Subjects
Genetic Pleiotropy , Humans , Genome-Wide Association Study/methods , Phenotype , Gene Expression/genetics , Computer Simulation , Models, Genetic , Quantitative Trait Loci , Polymorphism, Single Nucleotide
2.
PLoS Genet ; 19(11): e1011020, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37934792

ABSTRACT

In genetic association analysis of complex traits, permutation testing can be a valuable tool for assessing significance when the distribution of the test statistic is unknown or not well-approximated. This commonly arises, e.g., in tests of gene-set, pathway or genome-wide significance, or when the statistic is formed by machine learning or data adaptive methods. Existing applications include eQTL mapping, association testing with rare variants, inclusion of admixed individuals in genetic association analysis, and epistasis detection among many others. For genetic association testing in samples with population structure and/or relatedness, use of naive permutation can lead to inflated type 1 error. To address this in quantitative traits, the MVNpermute method was developed. However, for association mapping of a binary trait, the relationship between the mean and variance makes both naive permutation and the MVNpermute method invalid. We propose BRASS, a permutation method for binary traits, for use in association mapping in structured samples. In addition to modeling structure in the sample, BRASS allows for covariates, ascertainment and simultaneous testing of multiple markers, and it accommodates a wide range of test statistics. In simulation studies, we compare BRASS to other permutation and resampling-based methods in a range of scenarios that include population structure, familial relatedness, ascertainment and phenotype model misspecification. In these settings, we demonstrate the superior control of type 1 error by BRASS compared to the other 6 methods considered. We apply BRASS to assess genome-wide significance for association analyses in domestic dog for elbow dysplasia (ED) and idiopathic epilepsy (IE). For both traits we detect previously identified associations, and in addition, for ED, we detect significant association with a SNP on chromosome 35 that was not detected by previous analyses, demonstrating the potential of the method.


Subjects
Genetic Testing , Models, Genetic , Animals , Dogs , Phenotype , Genetic Association Studies , Computer Simulation , Genome-Wide Association Study , Polymorphism, Single Nucleotide/genetics
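Illustrative note on record 2: the naive permutation scheme that the abstract describes as invalid in structured samples can be sketched as below. This is not BRASS. It is only the baseline that shuffles phenotype labels as if all individuals were exchangeable, which is the assumption that fails under population structure or relatedness. The function name, the test statistic (absolute difference in mean genotype between cases and controls), and the toy data are all assumptions made for illustration.

```python
import random


def permutation_p_value(genotypes, phenotypes, n_perm=2000, seed=0):
    """Naive permutation p-value for a binary trait (hypothetical helper).

    Assumes individuals are exchangeable, i.e. unrelated and drawn from
    one homogeneous population. That assumption is what breaks in
    structured samples.
    """
    rng = random.Random(seed)

    def statistic(pheno):
        # Absolute difference in mean genotype between cases and controls.
        cases = [g for g, y in zip(genotypes, pheno) if y == 1]
        controls = [g for g, y in zip(genotypes, pheno) if y == 0]
        return abs(sum(cases) / len(cases) - sum(controls) / len(controls))

    observed = statistic(phenotypes)
    shuffled = list(phenotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # relabel cases/controls at random
        if statistic(shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Toy data (made up): minor-allele counts and case (1) / control (0) status.
genotypes = [0, 1, 2, 1, 0, 2, 2, 1, 0, 0, 1, 2]
phenotypes = [0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1]
p = permutation_p_value(genotypes, phenotypes)
```

Shuffling preserves the number of cases and controls, so the statistic is always defined. A structure-aware method would replace the plain shuffle with a resampling step that respects the dependence among individuals.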
3.
J Stat Softw ; 106(10), 2023.
Article in English | MEDLINE | ID: mdl-37205880

ABSTRACT

Quantile-Quantile (Q-Q) plots are often difficult to interpret because it is unclear how large the deviation from the theoretical distribution must be to indicate a lack of fit. Most Q-Q plots could benefit from the addition of meaningful global testing bands, but the use of such bands unfortunately remains rare because of the drawbacks of current approaches and packages. These drawbacks include incorrect global Type I error rate, lack of power to detect deviations in the tails of the distribution, relatively slow computation for large data sets, and limited applicability. To solve these problems, we apply the equal local levels global testing method, which we have implemented in the R Package qqconf, a versatile tool to create Q-Q plots and probability-probability (P-P) plots in a wide variety of settings, with simultaneous testing bands rapidly created using recently-developed algorithms. qqconf can easily be used to add global testing bands to Q-Q plots made by other packages. In addition to being quick to compute, these bands have a variety of desirable properties, including accurate global levels, equal sensitivity to deviations in all parts of the null distribution (including the tails), and applicability to a range of null distributions. We illustrate the use of qqconf in several applications: assessing normality of residuals from regression, assessing accuracy of p values, and use of Q-Q plots in genome-wide association studies.

4.
Am J Hum Genet ; 102(4): 574-591, 2018 Apr 05.
Article in English | MEDLINE | ID: mdl-29625022

ABSTRACT

In complex-trait mapping, when each subject has multiple measurements of a quantitative trait over time, power for detecting genetic association can be gained by the inclusion of all measurements and not just single time points or averages in the analysis. To increase power and control type 1 error, one should account for dependence among observations for a single individual as well as dependence between observations of related individuals if they are present in the sample. We propose L-GATOR, a retrospective, mixed-effects method for association mapping of longitudinally measured traits in samples with related individuals. L-GATOR allows arbitrary time points for different individuals, incorporates both time-varying and static covariates, and properly addresses various types of dependence. In simulations, we show that L-GATOR outperforms existing prospective methods in terms of both type 1 error and power when there is phenotype model misspecification or missing data. Compared with the previously proposed longGWAS method, L-GATOR was more than ten times faster for association testing in our simulations and almost 100 times faster for parameter estimation. L-GATOR is applicable to essentially arbitrary combinations of related and unrelated individuals, including small families as well as large, complex pedigrees. We apply the method to data from the Framingham Heart Study to identify association between longitudinal systolic blood pressure measurements and genome-wide SNPs. Of the smallest p values, one-third occur in or near genes that have been previously identified as associated with pulse pressure (such as PIK3CG) and systolic and diastolic blood pressure (such as C10orf107), showing that L-GATOR is able to prioritize relevant loci in a genome screen.


Subjects
Genome-Wide Association Study , Quantitative Trait Loci/genetics , Software , Blood Pressure/genetics , Cohort Studies , Female , Humans , Male , Models, Genetic , Phenotype , Systole/genetics
5.
Proc Natl Acad Sci U S A ; 115(24): E5440-E5449, 2018 Jun 12.
Article in English | MEDLINE | ID: mdl-29848634

ABSTRACT

Infectious diseases are often affected by specific pairings of hosts and pathogens and therefore by both of their genomes. The integration of a pair of genomes into genome-wide association mapping can provide an exquisitely detailed view of the genetic landscape of complex traits. We present a statistical method, ATOMM (Analysis with a Two-Organism Mixed Model), that maps a trait of interest to a pair of genomes simultaneously; this method makes use of whole-genome sequence data for both host and pathogen organisms. ATOMM uses a two-way mixed-effect model to test for genetic associations and cross-species genetic interactions while accounting for sample structure including interactions between the genetic backgrounds of the two organisms. We demonstrate the applicability of ATOMM to a joint association study of quantitative disease resistance (QDR) in the Arabidopsis thaliana-Xanthomonas arboricola pathosystem. Our method uncovers a clear host-strain specificity in QDR and provides a powerful approach to identify genetic variants on both genomes that contribute to phenotypic variation.


Subjects
Arabidopsis/genetics , Genome/genetics , Host-Pathogen Interactions/genetics , Chromosome Mapping/methods , Disease Resistance/genetics , Genetic Variation/genetics , Genome-Wide Association Study/methods , Phenotype , Quantitative Trait Loci/genetics , Xanthomonas/genetics
6.
Am J Hum Genet ; 98(2): 243-55, 2016 Feb 04.
Article in English | MEDLINE | ID: mdl-26833331

ABSTRACT

In genetic association testing, failure to properly control for population structure can lead to severely inflated type 1 error and power loss. Meanwhile, adjustment for relevant covariates is often desirable and sometimes necessary to protect against spurious association and to improve power. Many recent methods to account for population structure and covariates are based on linear mixed models (LMMs), which are primarily designed for quantitative traits. For binary traits, however, LMM is a misspecified model and can lead to deteriorated performance. We propose CARAT, a binary-trait association testing approach based on a mixed-effects quasi-likelihood framework, which exploits the dichotomous nature of the trait and achieves computational efficiency through estimating equations. We show in simulation studies that CARAT consistently outperforms existing methods and maintains high power in a wide range of population structure settings and trait models. Furthermore, CARAT is based on a retrospective approach, which is robust to misspecification of the phenotype model. We apply our approach to a genome-wide analysis of Crohn disease, in which we replicate association with 17 previously identified regions. Moreover, our analysis on 5p13.1, an extensively reported region of association, shows evidence for the presence of multiple independent association signals in the region. This example shows how CARAT can leverage known disease risk factors to shed light on the genetic architecture of complex traits.


Subjects
Crohn Disease/genetics , Genetic Testing/methods , Adult , Female , Genetic Association Studies , Humans , Linear Models , Male , Middle Aged , Models, Genetic , Phenotype , Polymorphism, Single Nucleotide , Population Groups/genetics , Quantitative Trait, Heritable , Retrospective Studies , Young Adult
7.
PLoS Genet ; 12(10): e1006329, 2016 Oct.
Article in English | MEDLINE | ID: mdl-27695091

ABSTRACT

We consider the problem of genetic association testing of a binary trait in a sample that contains related individuals, where we adjust for relevant covariates and allow for missing data. We propose CERAMIC, an estimating equation approach that can be viewed as a hybrid of logistic regression and linear mixed-effects model (LMM) approaches. CERAMIC extends the recently proposed CARAT method to allow samples with related individuals and to incorporate partially missing data. In simulations, we show that CERAMIC outperforms existing LMM and generalized LMM approaches, maintaining high power and correct type 1 error across a wider range of scenarios. CERAMIC results in a particularly large power increase over existing methods when the sample includes related individuals with some missing data (e.g., when some individuals with phenotype and covariate information have missing genotype), because CERAMIC is able to make use of the relationship information to incorporate partially missing data in the analysis while correcting for dependence. Because CERAMIC is based on a retrospective analysis, it is robust to misspecification of the phenotype model, resulting in better control of type 1 error and higher power than that of prospective methods, such as GMMAT, when the phenotype model is misspecified. CERAMIC is computationally efficient for genomewide analysis in samples of related individuals of almost any configuration, including small families, unrelated individuals and even large, complex pedigrees. We apply CERAMIC to data on type 2 diabetes (T2D) from the Framingham Heart Study. In a genome scan, 9 of the 10 smallest CERAMIC p-values occur in or near either known T2D susceptibility loci or plausible candidates, verifying that CERAMIC is able to home in on the important loci in a genome scan.


Subjects
Diabetes Mellitus, Type 2/genetics , Genetic Association Studies , Polymorphism, Single Nucleotide/genetics , Quantitative Trait Loci/genetics , Computer Simulation , Diabetes Mellitus, Type 2/pathology , Genetic Testing , Genotype , Humans , Logistic Models , Models, Genetic , Pedigree , Phenotype
8.
Genet Epidemiol ; 40(6): 446-60, 2016 Sep.
Article in English | MEDLINE | ID: mdl-27256766

ABSTRACT

In a large-scale genetic association study, the number of phenotyped individuals available for sequencing may, in some cases, be greater than the study's sequencing budget will allow. In that case, it can be important to prioritize individuals for sequencing in a way that optimizes power for association with the trait. Suppose a cohort of phenotyped individuals is available, with some subset of them possibly already sequenced, and one wants to choose an additional fixed-size subset of individuals to sequence in such a way that the power to detect association is maximized. When the phenotyped sample includes related individuals, power for association can be gained by including partial information, such as phenotype data of ungenotyped relatives, in the analysis, and this should be taken into account when assessing whom to sequence. We propose G-STRATEGY, which uses simulated annealing to choose a subset of individuals for sequencing that maximizes the expected power for association. In simulations, G-STRATEGY performs extremely well for a range of complex disease models and outperforms other strategies with, in many cases, relative power increases of 20-40% over the next best strategy, while maintaining correct type 1 error. G-STRATEGY is computationally feasible even for large datasets and complex pedigrees. We apply G-STRATEGY to data on high-density lipoprotein and low-density lipoprotein from the AGES-Reykjavik and REFINE-Reykjavik studies, in which G-STRATEGY is able to closely approximate the power of sequencing the full sample by selecting for sequencing only a small subset of the individuals.


Subjects
Genetic Association Studies , Software , Genotype , Humans , Polymorphism, Single Nucleotide , Quantitative Trait Loci
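Illustrative note on record 8: the general idea of using simulated annealing to pick a fixed-size subset that maximizes an objective can be sketched as below. This is a generic sketch, not the G-STRATEGY implementation. The objective `toy_score`, the per-individual weights, the family labels, and the cooling schedule are all hypothetical stand-ins for the expected-power criterion described in the abstract.

```python
import math
import random


def anneal_subset(n_items, k, score, n_steps=5000,
                  t_start=1.0, t_end=0.01, seed=0):
    """Choose k of n_items to maximize score(subset) by simulated annealing."""
    rng = random.Random(seed)
    chosen = set(rng.sample(range(n_items), k))
    current = score(chosen)
    best, best_score = set(chosen), current
    for step in range(n_steps):
        # Geometric cooling from t_start down to t_end.
        frac = step / max(1, n_steps - 1)
        temp = t_start * (t_end / t_start) ** frac
        # Propose swapping one selected individual for an unselected one.
        out_item = rng.choice(sorted(chosen))
        in_item = rng.choice([i for i in range(n_items) if i not in chosen])
        candidate = (chosen - {out_item}) | {in_item}
        cand_score = score(candidate)
        # Always accept improvements; accept worse moves with a
        # probability that shrinks as the temperature drops.
        delta = cand_score - current
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            chosen, current = candidate, cand_score
            if current > best_score:
                best, best_score = set(chosen), current
    return best, best_score


# Hypothetical objective: each individual has an "informativeness" weight,
# and selecting two members of the same family is penalized as redundant.
weights = [((7 * i) % 20) / 10 for i in range(20)]
family = [i // 4 for i in range(20)]


def toy_score(subset):
    members = sorted(subset)
    total = sum(weights[i] for i in members)
    same_family_pairs = sum(
        1 for a in members for b in members if a < b and family[a] == family[b]
    )
    return total - 0.5 * same_family_pairs


best, best_score = anneal_subset(20, 5, toy_score)
```

The swap move keeps the subset size fixed at every step, which matches the fixed sequencing budget in the problem statement. In a real application the score would be an estimate of power given the pedigree and phenotype data.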
9.
Am J Hum Genet ; 92(5): 652-66, 2013 May 02.
Article in English | MEDLINE | ID: mdl-23643379

ABSTRACT

Genetic association studies often sample individuals with known familial relationships in addition to unrelated individuals, and it is common for some individuals to have missing data (phenotypes, genotypes, or covariates). When some individuals in a sample are related, power can be gained by incorporating all individuals in the analysis, including individuals with partially missing data, while properly accounting for the dependence among them. We propose MASTOR, a mixed-model, retrospective score test for genetic association with a quantitative trait. MASTOR achieves high power in samples that contain related individuals by making full use of the relationship information to incorporate partially missing data in the analysis while correcting for dependence. Individuals with available phenotype and covariate information who are not genotyped but have genotyped relatives in the sample can still contribute to the association analysis because of the dependence among genotypes. Similarly, individuals who are genotyped but are missing covariate or phenotype information can contribute to the analysis. MASTOR is valid even when the phenotype model is misspecified and with either random or phenotype-based ascertainment. In simulations, we demonstrate the correct type 1 error of MASTOR, the increase in power that comes from making full use of the relationship information, the robustness to misspecification of the phenotype model, and the improvement in power that comes from modeling the heritability. We show that MASTOR is computationally feasible and practical in genome-wide association studies. We apply MASTOR to data on high-density lipoprotein cholesterol from the Framingham Heart Study.


Subjects
Genetic Association Studies/methods , Inheritance Patterns/genetics , Models, Genetic , Phenotype , Quantitative Trait, Heritable , Software , Computer Simulation , Humans
10.
Hum Hered ; 80(4): 187-95, 2015.
Article in English | MEDLINE | ID: mdl-27576759

ABSTRACT

Case-control genetic association analysis is an extremely common tool in human complex trait mapping. From a statistical point of view, the analysis of binary traits poses somewhat different challenges from the analysis of quantitative traits. Desirable features of a binary trait mapping approach would include (1) phenotype modeled as binary, with appropriate dependence between the mean and variance; (2) appropriate correction for relevant covariates; (3) appropriate correction for sample structure of various types, including related individuals, admixture and other types of population structure; (4) both fast and accurate computations; (5) robustness to ascertainment and other types of phenotype model misspecification, and (6) ability to leverage partially missing data to increase power. We review these challenges and argue, both theoretically and in simulations, for the value of retrospective association analysis as a way to overcome some of the limitations of the phenotype model, including model misspecification due to ascertainment. We give an overview of two recent retrospective methods, CARAT and CERAMIC, that are designed to meet criteria 1-6.


Subjects
Models, Genetic , Multifactorial Inheritance , Computer Simulation , Humans , Phenotype , Quantitative Trait, Heritable , Retrospective Studies
11.
Genet Epidemiol ; 38(1): 10-20, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24248908

ABSTRACT

The recent development of high-throughput sequencing technologies calls for powerful statistical tests to detect rare genetic variants associated with complex human traits. Sampling related individuals in sequencing studies offers advantages over sampling unrelated individuals only, including improved protection against sequencing error, the ability to use imputation to make more efficient use of sequence data, and the possibility of power boost due to more observed copies of extremely rare alleles among relatives. With related individuals, familial correlation needs to be accounted for to ensure correct control over type I error and to improve power. Recognizing the limitations of existing rare-variant association tests for family data, we propose MONSTER (Minimum P-value Optimized Nuisance parameter Score Test Extended to Relatives), a robust rare-variant association test, which generalizes the SKAT-O method for independent samples. MONSTER uses a mixed effects model that accounts for covariates and additive polygenic effects. To obtain a powerful test, MONSTER adaptively adjusts to the unknown configuration of effects of rare-variant sites. MONSTER also offers an analytical way of assessing P-values, which is desirable because permutation is not straightforward to conduct in related samples. In simulation studies, we demonstrate that MONSTER effectively accounts for family structure, is computationally efficient and compares very favorably, in terms of power, to previously proposed tests that allow related individuals. We apply MONSTER to an analysis of high-density lipoprotein cholesterol in the Framingham Heart Study, where we are able to replicate association with three genes.


Subjects
Family , Genetic Association Studies/methods , Genetic Variation/genetics , Quantitative Trait Loci/genetics , Adolescent , Adult , Aged , Alleles , Child , Cholesterol, HDL/genetics , Computer Simulation , Female , Health Surveys , Heart , Heredity/genetics , Humans , Male , Middle Aged , Models, Genetic , Multifactorial Inheritance/genetics , Pedigree , Phenotype , Polymorphism, Single Nucleotide/genetics , Research Design , Young Adult
12.
bioRxiv ; 2024 Feb 15.
Article in English | MEDLINE | ID: mdl-38405994

ABSTRACT

In genetic association analysis of complex traits, detection of interaction (either GxG or GxE) can help to elucidate the genetic architecture and biological mechanisms underlying the trait. Detection of interaction in a genome-wide association study (GWAS) can be methodologically challenging for various reasons, including a high burden of multiple comparisons when testing for epistasis between all possible pairs of a set of genomewide variants, as well as heteroscedasticity effects occurring in the presence of GxG or GxE interaction. In this paper, we address the problem of an even more striking phenomenon that we call the "feast or famine" effect that occurs when testing interaction in a genomewide context. As we verify, even in a simplified setting in which there is no interaction at all (and so no heteroscedasticity), in a GWAS to detect GxG or GxE interaction with a fixed genetic variant or environmental factor, the distribution of the genome-wide p-values under the null hypothesis is not the i.i.d. uniform one that is commonly assumed. Using standard methods, even if all SNPs are independent, some GWASs will have systematically underinflated p-values ("feast"), and others will have systematically overinflated p-values ("famine"), which can lead to false detection of interaction, reduced power, inconsistent results across studies, and failure to replicate true signal. This startling phenomenon is specific to detection of interaction in a GWAS, and it may partly explain why such detection has so far proved challenging and difficult to replicate. We show theoretically that the key cause of this phenomenon is which variables are conditioned on in the analysis, and this suggests an approach to correct the problem by changing the way the conditioning is done. Using this insight, we have developed the TINGA method to adjust the interaction test statistics to make their p-values closer to uniform under the null hypothesis. In simulations we show that TINGA both controls type 1 error and improves power. TINGA allows for covariates and population structure through use of a linear mixed model and accounts for heteroscedasticity. We apply TINGA to detection of epistasis in a study of flowering time in Arabidopsis thaliana.

13.
bioRxiv ; 2024 Jul 16.
Article in English | MEDLINE | ID: mdl-38464248

ABSTRACT

Understanding the genetic regulatory mechanisms of gene expression is a challenging and ongoing problem. Genetic variants that are associated with expression levels are readily identified when they are proximal to the gene (i.e., cis-eQTLs), but SNPs distant from the gene whose expression levels they are associated with (i.e., trans-eQTLs) have been much more difficult to discover, even though they account for a majority of the heritability in gene expression levels. A major impediment to the identification of more trans-eQTLs is the lack of statistical methods that are powerful enough to overcome the obstacles of small effect sizes and large multiple testing burden of trans-eQTL mapping. Here, we propose ADELLE, a powerful statistical testing framework that requires only summary statistics and is designed to be most sensitive to SNPs that are associated with multiple gene expression levels, a characteristic of many trans-eQTLs. In simulations, we show that for detecting SNPs that are associated with 0.1%-2% of 10,000 traits, among the 7 methods we consider ADELLE is clearly the most powerful overall, with either the highest power or power not significantly different from the highest for all settings in that range. We apply ADELLE to a mouse advanced intercross line data set and show its ability to find trans-eQTLs that were not significant under a standard analysis. This demonstrates that ADELLE is a powerful tool for uncovering trans regulators of genetic expression.

14.
Genet Epidemiol ; 36(5): 438-50, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22552845

ABSTRACT

Genetic variants on the X-chromosome could potentially play an important role in some complex traits. However, development of methods for detecting association with X-linked markers has lagged behind that for autosomal markers. We propose methods for case-control association testing with X-chromosome markers in samples with related individuals. Our method, XM, appropriately adjusts for both correlation among relatives and male-female allele copy number differences. Features of XM include: (1) it is applicable to and computationally feasible for completely general combinations of family and case-control designs; (2) it allows for both unaffected controls and controls of unknown phenotype to be included in the same analysis; (3) it can incorporate phenotype information on relatives with missing genotype data; and (4) it adjusts for sex-specific trait prevalence values. We propose two other tests, Xχ and XW, which can also be useful in certain contexts. We derive the best linear unbiased estimator of allele frequency, and its variance, for X-linked markers. In simulation studies with related individuals, we demonstrate the power and validity of the proposed methods. We apply the methods to X-chromosome association analysis of (1) asthma in a Hutterite sample and (2) alcohol dependence in the GAW 14 COGA data. In analysis (1), we demonstrate computational feasibility of XM and the applicability of our robust variance estimator. In analysis (2), we detect significant association, after Bonferroni correction, between alcohol dependence and single nucleotide polymorphism rs979606 in the monoamine oxidase A gene, where this gene has previously been found to be associated with substance abuse and antisocial behavior.


Subjects
Case-Control Studies , Chromosomes, Human, X/genetics , Genes, X-Linked , Alleles , Chromosome Mapping/methods , Family Health , Female , Gene Frequency , Genotype , Haplotypes , Humans , Male , Models, Genetic , Models, Statistical , Monoamine Oxidase/genetics , Phenotype , Polymorphism, Single Nucleotide
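Illustrative note on record 14: the Bonferroni correction mentioned in the abstract compares each p-value with the significance level divided by the number of tests. A minimal sketch follows. The function name and the p-values are made up for illustration and are not taken from the study.

```python
def bonferroni(p_values, alpha=0.05):
    """Return the Bonferroni threshold and a per-test significance flag."""
    m = len(p_values)
    threshold = alpha / m  # family-wise level split evenly across m tests
    return threshold, [p <= threshold for p in p_values]


# Hypothetical p-values for four markers.
threshold, significant = bonferroni([1e-8, 0.003, 0.04, 0.2])
```

With four tests at a family-wise level of 0.05, the per-test threshold is 0.0125, so the third marker (p = 0.04) would pass an uncorrected test but not the corrected one.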
15.
Am J Hum Genet ; 86(2): 172-84, 2010 Feb 12.
Article in English | MEDLINE | ID: mdl-20137780

ABSTRACT

Genome-wide association studies are routinely conducted to identify genetic variants that influence complex disorders. It is well known that failure to properly account for population or pedigree structure can lead to spurious association as well as reduced power. We propose a method, ROADTRIPS, for case-control association testing in samples with partially or completely unknown population and pedigree structure. ROADTRIPS uses a covariance matrix estimated from genome-screen data to correct for unknown population and pedigree structure while maintaining high power by taking advantage of known pedigree information when it is available. ROADTRIPS can incorporate data on arbitrary combinations of related and unrelated individuals and is computationally feasible for the analysis of genetic studies with millions of markers. In simulations with related individuals and population structure, including admixture, we demonstrate that ROADTRIPS provides a substantial improvement over existing methods in terms of power and type 1 error. The ROADTRIPS method can be used across a variety of study designs, ranging from studies that have a combination of unrelated individuals and small pedigrees to studies of isolated founder populations with partially known or completely unknown pedigrees. We apply the method to analyze two data sets: a study of rheumatoid arthritis in small UK pedigrees, from Genetic Analysis Workshop 15, and data from the Collaborative Study of the Genetics of Alcoholism on alcohol dependence in a sample of moderate-size pedigrees of European descent, from Genetic Analysis Workshop 14. We detect genome-wide significant association, after Bonferroni correction, in both studies.


Subjects
Genetics, Population , Genome-Wide Association Study/methods , Pedigree , Software , Alcoholism/genetics , Arthritis, Rheumatoid/genetics , Case-Control Studies , Computer Simulation , Female , Humans , Male , Polymorphism, Single Nucleotide/genetics , Population Dynamics , Time Factors
16.
bioRxiv ; 2023 Dec 19.
Article in English | MEDLINE | ID: mdl-38187553

ABSTRACT

Joint association analysis of multiple traits with multiple genetic variants can provide insight into genetic architecture and pleiotropy, improve trait prediction and increase power for detecting association. Furthermore, some traits are naturally high-dimensional, e.g., images, networks or longitudinally measured traits. Assessing significance for multitrait genetic association can be challenging, especially when the sample has population sub-structure and/or related individuals. Failure to adequately adjust for sample structure can lead to power loss and inflated type 1 error, and commonly used methods for assessing significance can work poorly with a large number of traits or be computationally slow. We developed JASPER, a fast, powerful, robust method for assessing significance of multitrait association with a set of genetic variants, in samples that have population sub-structure, admixture and/or relatedness. In simulations, JASPER has higher power, better type 1 error control, and faster computation than existing methods, with the power and speed advantage of JASPER increasing with the number of traits. JASPER is potentially applicable to a wide range of association testing applications, including for multiple disease traits, expression traits, image-derived traits and microbiome abundances. It allows for covariates, ascertainment and rare variants and is robust to phenotype model misspecification. We apply JASPER to analyze gene expression in the Framingham Heart Study, where, compared to alternative approaches, JASPER finds more significant associations, including several that indicate pleiotropic effects, some of which replicate previous results, while others have not previously been reported. Our results demonstrate the promise of JASPER for powerful multitrait analysis in structured samples.

17.
Am J Hum Genet ; 85(5): 667-78, 2009 Nov.
Article in English | MEDLINE | ID: mdl-19913122

ABSTRACT

In genome-wide association studies, only a subset of all genomic variants are typed by current, high-throughput, SNP-genotyping platforms. However, many of the untyped variants can be well predicted from typed variants, with linkage disequilibrium (LD) information among typed and untyped variants available from an external reference panel such as HapMap. Incorporation of such external information can allow one to perform tests of association between untyped variants and phenotype, thereby making more efficient use of the available genotype data. When related individuals are included in case-control samples, the dependence among their genotypes must be properly addressed for valid association testing. In the context of testing untyped variants, an additional analytical challenge is that the dependence, across related individuals, of the partial information on untyped-SNP genotypes must also be assessed and incorporated into the analysis for valid inference. We address this challenge with ATRIUM, a method for case-control association testing with untyped SNPs, based on genome screen data in samples in which some individuals are related. ATRIUM uses LD information from an external reference panel to specify a one-degree-of-freedom test of association with an untyped SNP. It properly accounts for dependence in the partial information on untyped-SNP genotypes across related individuals. We demonstrate that ATRIUM is robust in that it maintains the nominal type I error rate even when the external reference panel is not well matched to the case-control sample. We apply the method to detect association between type 2 diabetes and variants on chromosome 10 in the Framingham SHARe data.


Subjects
Human Genome , Genome-Wide Association Study , Nuclear Family , Single Nucleotide Polymorphism/genetics , Alleles , Asian People/genetics , Black People/genetics , Case-Control Studies , Chromosome Mapping/methods , Human Chromosomes Pair 10 , Cohort Studies , Computer Simulation , Type 2 Diabetes Mellitus/genetics , Gene Frequency , Genetic Markers , Genetic Variation , Genotype , Haplotypes , Humans , Likelihood Functions , Linkage Disequilibrium , Logistic Models , Longitudinal Studies , Genetic Models , Statistical Models , Reproducibility of Results , Software , White People/genetics
18.
Proc Natl Acad Sci U S A ; 105(17): 6362-7, 2008 Apr 29.
Article in English | MEDLINE | ID: mdl-18430800

ABSTRACT

Regulation of gene expression is usually separated into cis and trans components. The separation may become artificial if much of the variation in expression is under multigenic and epistatic (e.g., cis-by-trans) control. There is hence a need to quantify the relative contribution of cis, trans, and cis-by-trans effects on expression divergence at different levels of evolution. To do so across the whole genome, we analyzed the full set of chromosome-substitution lines between the two behavioral races of Drosophila melanogaster. Our observations: (i) Only approximately 3% of the genes with an expression difference are purely cis regulated. In fact, relatively few genes are governed by simple genetics because nearly 80% of expression differences are controlled by at least two chromosomes. (ii) For 14% of the genes, cis regulation does play a role but usually in conjunction with trans regulation. This joint action of cis and trans effects, either additive or epistatic, is referred to as inclusive cis effect. (iii) The percentage of genes with inclusive cis effect increases to 32% among genes that are strongly differentiated between the two races. (iv) We observed a nonrandom distribution of trans-acting factors, with a substantial deficit on the second chromosome. Between Drosophila racial groups, trans regulation of expression difference is extensive, and cis regulation often evolves in conjunction with trans effects.


Subjects
Chromosomes/genetics , Drosophila melanogaster/genetics , Gene Expression Regulation , Animals , Insect Genes , Genetic Variation , Genetic Models
19.
Stat Appl Genet Mol Biol ; 6: Article5, 2007.
Article in English | MEDLINE | ID: mdl-17402920

ABSTRACT

Multi-color optical mapping is a new technique being developed to obtain detailed physical maps (indicating relative positions of various recognition sites) of DNA molecules. We consider a study design in which the data consist of noisy observations of multiple copies of a DNA molecule marked with colors at recognition sites. The primary goal is to estimate a physical map. A secondary goal is to estimate error rates associated with the experiment, which are potentially useful for analysis and refinement of the biochemical steps in the mapping procedure. We propose statistical models for various sources of error and use maximum likelihood estimation (MLE) to construct a physical map and estimate error rates. To overcome difficulties arising in the maximization process, a latent-variable Markov chain version of the model is proposed, and the EM algorithm is used for maximization. In addition, a simulated annealing procedure is applied to maximize the profile likelihood over the discrete space of sequences of colors. We apply the methods to simulated data on the bacteriophage lambda genome.


Subjects
Color , Likelihood Functions , Optics and Photonics , Bacteriophage lambda/genetics , Viral DNA/chemistry , Markov Chains
20.
Genetics ; 168(4): 2349-61, 2004 Dec.
Article in English | MEDLINE | ID: mdl-15371359

ABSTRACT

When the classical χ² goodness-of-fit test for Hardy-Weinberg (HW) equilibrium is used on samples with related individuals, the type I error can be greatly inflated. In particular, the test is inappropriate in population isolates where the individuals are related through multiple lines of descent. In this article, we propose a new test for HW (the QL-HW test) suitable for any sample with related individuals, including large inbred pedigrees, provided that their genealogy is known. Performed conditional on the pedigree structure, the QL-HW test detects departures from HW that are not due to the genealogy. Because the computation of the QL-HW test becomes intractable for very polymorphic loci in large inbred pedigrees, a simpler alternative, the GCC-HW test, is also proposed. The statistical properties of the QL-HW and GCC-HW tests are studied through simulations considering a sample of independent nuclear families, a sample of extended outbred genealogies, and samples from the Hutterite population, a highly inbred North American isolate. Finally, the method is used to test a set of 143 biallelic markers spanning 82 genes in this latter population.
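
For orientation, the classical goodness-of-fit test that this abstract takes as its starting point can be sketched as follows for a single biallelic marker. This is only the textbook test, which assumes unrelated individuals; it is not the QL-HW or GCC-HW test, whose details are not given in the abstract. The function name `hw_chi2_test` and its arguments are hypothetical, chosen for illustration.

```python
import math


def hw_chi2_test(n_aa, n_ab, n_bb):
    """Classical chi-square goodness-of-fit test for Hardy-Weinberg
    equilibrium at a biallelic marker (hypothetical helper).

    Assumes unrelated individuals. Returns (statistic, p_value), where
    the p-value uses a chi-square distribution with 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")

    # Allele frequency of A estimated by gene counting.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("marker is monomorphic; test is undefined")

    # Expected genotype counts under HW proportions p^2, 2pq, q^2.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

When the sample contains relatives, the genotype counts are not independent, which is the setting in which the abstract reports inflated type I error for this statistic.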


Subjects
Statistical Data Interpretation , Genetic Models , Chi-Square Distribution