1.

JASPER: Fast, powerful, multitrait association testing in structured samples gives insight on pleiotropy in gene expression.

Mbatchou, Joelle; McPeek, Mary Sara.

Am J Hum Genet ; 111(8): 1750-1769, 2024 Aug 08.

Article in English | MEDLINE | ID: mdl-39025064

ABSTRACT

Joint association analysis of multiple traits with multiple genetic variants can provide insight into genetic architecture and pleiotropy, improve trait prediction, and increase power for detecting association. Furthermore, some traits are naturally high-dimensional, e.g., images, networks, or longitudinally measured traits. Assessing significance for multitrait genetic association can be challenging, especially when the sample has population sub-structure and/or related individuals. Failure to adequately adjust for sample structure can lead to power loss and inflated type 1 error, and commonly used methods for assessing significance can work poorly with a large number of traits or be computationally slow. We developed JASPER, a fast, powerful, robust method for assessing significance of multitrait association with a set of genetic variants, in samples that have population sub-structure, admixture, and/or relatedness. In simulations, JASPER has higher power, better type 1 error control, and faster computation than existing methods, with the power and speed advantage of JASPER increasing with the number of traits. JASPER is potentially applicable to a wide range of association testing applications, including for multiple disease traits, expression traits, image-derived traits, and microbiome abundances. It allows for covariates, ascertainment, and rare variants and is robust to phenotype model misspecification. We apply JASPER to analyze gene expression in the Framingham Heart Study, where, compared to alternative approaches, JASPER finds more significant associations, including several that indicate pleiotropic effects, most of which replicate previous results, while others have not previously been reported. Our results demonstrate the promise of JASPER for powerful multitrait analysis in structured samples.

Subject(s)

Genetic Pleiotropy , Humans , Genome-Wide Association Study/methods , Phenotype , Gene Expression/genetics , Computer Simulation , Models, Genetic , Quantitative Trait Loci , Polymorphism, Single Nucleotide

2.

BRASS: Permutation methods for binary traits in genetic association studies with structured samples.

Mbatchou, Joelle; Abney, Mark; McPeek, Mary Sara.

PLoS Genet ; 19(11): e1011020, 2023 Nov.

Article in English | MEDLINE | ID: mdl-37934792

ABSTRACT

In genetic association analysis of complex traits, permutation testing can be a valuable tool for assessing significance when the distribution of the test statistic is unknown or not well-approximated. This commonly arises, e.g., in tests of gene-set, pathway or genome-wide significance, or when the statistic is formed by machine learning or data adaptive methods. Existing applications include eQTL mapping, association testing with rare variants, inclusion of admixed individuals in genetic association analysis, and epistasis detection among many others. For genetic association testing in samples with population structure and/or relatedness, use of naive permutation can lead to inflated type 1 error. To address this in quantitative traits, the MVNpermute method was developed. However, for association mapping of a binary trait, the relationship between the mean and variance makes both naive permutation and the MVNpermute method invalid. We propose BRASS, a permutation method for binary traits, for use in association mapping in structured samples. In addition to modeling structure in the sample, BRASS allows for covariates, ascertainment and simultaneous testing of multiple markers, and it accommodates a wide range of test statistics. In simulation studies, we compare BRASS to other permutation and resampling-based methods in a range of scenarios that include population structure, familial relatedness, ascertainment and phenotype model misspecification. In these settings, we demonstrate the superior control of type 1 error by BRASS compared to the other 6 methods considered. We apply BRASS to assess genome-wide significance for association analyses in domestic dog for elbow dysplasia (ED) and idiopathic epilepsy (IE). For both traits we detect previously identified associations, and in addition, for ED, we detect significant association with a SNP on chromosome 35 that was not detected by previous analyses, demonstrating the potential of the method.

Subject(s)

Genetic Testing , Models, Genetic , Animals , Dogs , Phenotype , Genetic Association Studies , Computer Simulation , Genome-Wide Association Study , Polymorphism, Single Nucleotide/genetics
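
For orientation, the naive permutation baseline that the abstract above contrasts with BRASS can be sketched as follows. This is not BRASS itself: it shuffles the binary trait labels as if all individuals were exchangeable, which is exactly the assumption the abstract says fails under population structure or relatedness. The function names, the mean-difference statistic, and the toy data are illustrative assumptions, not taken from the paper.

```python
import random


def mean_difference(trait, genotype):
    """Mean genotype among cases (trait == 1) minus mean among controls (trait == 0)."""
    cases = [g for t, g in zip(trait, genotype) if t == 1]
    controls = [g for t, g in zip(trait, genotype) if t == 0]
    return sum(cases) / len(cases) - sum(controls) / len(controls)


def naive_permutation_pvalue(trait, genotype, n_perm=2000, seed=0):
    """Two-sided permutation p-value under full exchangeability (an assumption
    that does not hold in structured samples)."""
    rng = random.Random(seed)
    observed = abs(mean_difference(trait, genotype))
    shuffled = list(trait)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # shuffling keeps the number of cases fixed
        if abs(mean_difference(shuffled, genotype)) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Toy data (hypothetical): 10 cases followed by 10 controls, genotypes coded 0/1/2.
trait = [1] * 10 + [0] * 10
genotype = [2, 2, 1, 2, 2, 1, 2, 2, 2, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1]
p = naive_permutation_pvalue(trait, genotype)
```

In a sample of unrelated, unstructured individuals this baseline is a reasonable default; the abstract's point is that it should not be relied on once relatedness or population structure is present.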

3.

Application of Equal Local Levels to Improve Q-Q Plot Testing Bands with R Package qqconf.

Weine, Eric; McPeek, Mary Sara; Abney, Mark.

J Stat Softw ; 106(10), 2023.

Article in English | MEDLINE | ID: mdl-37205880

ABSTRACT

Quantile-Quantile (Q-Q) plots are often difficult to interpret because it is unclear how large the deviation from the theoretical distribution must be to indicate a lack of fit. Most Q-Q plots could benefit from the addition of meaningful global testing bands, but the use of such bands unfortunately remains rare because of the drawbacks of current approaches and packages. These drawbacks include incorrect global Type I error rate, lack of power to detect deviations in the tails of the distribution, relatively slow computation for large data sets, and limited applicability. To solve these problems, we apply the equal local levels global testing method, which we have implemented in the R Package qqconf, a versatile tool to create Q-Q plots and probability-probability (P-P) plots in a wide variety of settings, with simultaneous testing bands rapidly created using recently-developed algorithms. qqconf can easily be used to add global testing bands to Q-Q plots made by other packages. In addition to being quick to compute, these bands have a variety of desirable properties, including accurate global levels, equal sensitivity to deviations in all parts of the null distribution (including the tails), and applicability to a range of null distributions. We illustrate the use of qqconf in several applications: assessing normality of residuals from regression, assessing accuracy of p values, and use of Q-Q plots in genome-wide association studies.
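
As a small illustration of the kind of plot discussed above, the coordinates of a Q-Q plot for p values against the uniform null can be computed as below. This is a sketch in Python, not the qqconf R package or its interface, and it does not compute testing bands. It assumes all p values are strictly positive and uses i/(n+1), the mean of the i-th uniform order statistic, as the expected quantile; the function name `qq_points` is hypothetical.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) pairs on the -log10 scale for a uniform null.

    Assumes every p value is in (0, 1]; a p value of 0 would make log10 fail.
    """
    n = len(p_values)
    observed = sorted(p_values)
    points = []
    for i, p in enumerate(observed, start=1):
        expected = i / (n + 1)  # mean of the i-th order statistic of n uniforms
        points.append((-math.log10(expected), -math.log10(p)))
    return points


# Toy input (hypothetical): these three values sit exactly on the diagonal.
points = qq_points([0.75, 0.25, 0.5])
```

Points close to the diagonal are consistent with the null; deciding how far from the diagonal is too far is the question that testing bands are meant to answer.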

4.

L-GATOR: Genetic Association Testing for a Longitudinally Measured Quantitative Trait in Samples with Related Individuals.

Wu, Xiaowei; McPeek, Mary Sara.

Am J Hum Genet ; 102(4): 574-591, 2018 Apr 05.

Article in English | MEDLINE | ID: mdl-29625022

ABSTRACT

In complex-trait mapping, when each subject has multiple measurements of a quantitative trait over time, power for detecting genetic association can be gained by the inclusion of all measurements and not just single time points or averages in the analysis. To increase power and control type 1 error, one should account for dependence among observations for a single individual as well as dependence between observations of related individuals if they are present in the sample. We propose L-GATOR, a retrospective, mixed-effects method for association mapping of longitudinally measured traits in samples with related individuals. L-GATOR allows arbitrary time points for different individuals, incorporates both time-varying and static covariates, and properly addresses various types of dependence. In simulations, we show that L-GATOR outperforms existing prospective methods in terms of both type 1 error and power when there is phenotype model misspecification or missing data. Compared with the previously proposed longGWAS method, L-GATOR was more than ten times faster for association testing in our simulations and almost 100 times faster for parameter estimation. L-GATOR is applicable to essentially arbitrary combinations of related and unrelated individuals, including small families as well as large, complex pedigrees. We apply the method to data from the Framingham Heart Study to identify association between longitudinal systolic blood pressure measurements and genome-wide SNPs. Of the smallest p values, one-third occur in or near genes that have been previously identified as associated with pulse pressure (such as PIK3CG) and systolic and diastolic blood pressure (such as C10orf107), showing that L-GATOR is able to prioritize relevant loci in a genome screen.

Subject(s)

Genome-Wide Association Study , Quantitative Trait Loci/genetics , Software , Blood Pressure/genetics , Cohort Studies , Female , Humans , Male , Models, Genetic , Phenotype , Systole/genetics

5.

Two-way mixed-effects methods for joint association analysis using both host and pathogen genomes.

Wang, Miaoyan; Roux, Fabrice; Bartoli, Claudia; Huard-Chauveau, Carine; Meyer, Christopher; Lee, Hana; Roby, Dominique; McPeek, Mary Sara; Bergelson, Joy.

Proc Natl Acad Sci U S A ; 115(24): E5440-E5449, 2018 Jun 12.

Article in English | MEDLINE | ID: mdl-29848634

ABSTRACT

Infectious diseases are often affected by specific pairings of hosts and pathogens and therefore by both of their genomes. The integration of a pair of genomes into genome-wide association mapping can provide an exquisitely detailed view of the genetic landscape of complex traits. We present a statistical method, ATOMM (Analysis with a Two-Organism Mixed Model), that maps a trait of interest to a pair of genomes simultaneously; this method makes use of whole-genome sequence data for both host and pathogen organisms. ATOMM uses a two-way mixed-effect model to test for genetic associations and cross-species genetic interactions while accounting for sample structure including interactions between the genetic backgrounds of the two organisms. We demonstrate the applicability of ATOMM to a joint association study of quantitative disease resistance (QDR) in the Arabidopsis thaliana-Xanthomonas arboricola pathosystem. Our method uncovers a clear host-strain specificity in QDR and provides a powerful approach to identify genetic variants on both genomes that contribute to phenotypic variation.

Subject(s)

Arabidopsis/genetics , Genome/genetics , Host-Pathogen Interactions/genetics , Chromosome Mapping/methods , Disease Resistance/genetics , Genetic Variation/genetics , Genome-Wide Association Study/methods , Phenotype , Quantitative Trait Loci/genetics , Xanthomonas/genetics

6.

Retrospective Binary-Trait Association Test Elucidates Genetic Architecture of Crohn Disease.

Jiang, Duo; Zhong, Sheng; McPeek, Mary Sara.

Am J Hum Genet ; 98(2): 243-55, 2016 Feb 04.

Article in English | MEDLINE | ID: mdl-26833331

ABSTRACT

In genetic association testing, failure to properly control for population structure can lead to severely inflated type 1 error and power loss. Meanwhile, adjustment for relevant covariates is often desirable and sometimes necessary to protect against spurious association and to improve power. Many recent methods to account for population structure and covariates are based on linear mixed models (LMMs), which are primarily designed for quantitative traits. For binary traits, however, LMM is a misspecified model and can lead to deteriorated performance. We propose CARAT, a binary-trait association testing approach based on a mixed-effects quasi-likelihood framework, which exploits the dichotomous nature of the trait and achieves computational efficiency through estimating equations. We show in simulation studies that CARAT consistently outperforms existing methods and maintains high power in a wide range of population structure settings and trait models. Furthermore, CARAT is based on a retrospective approach, which is robust to misspecification of the phenotype model. We apply our approach to a genome-wide analysis of Crohn disease, in which we replicate association with 17 previously identified regions. Moreover, our analysis on 5p13.1, an extensively reported region of association, shows evidence for the presence of multiple independent association signals in the region. This example shows how CARAT can leverage known disease risk factors to shed light on the genetic architecture of complex traits.

Subject(s)

Crohn Disease/genetics , Genetic Testing/methods , Adult , Female , Genetic Association Studies , Humans , Linear Models , Male , Middle Aged , Models, Genetic , Phenotype , Polymorphism, Single Nucleotide , Population Groups/genetics , Quantitative Trait, Heritable , Retrospective Studies , Young Adult

7.

CERAMIC: Case-Control Association Testing in Samples with Related Individuals, Based on Retrospective Mixed Model Analysis with Adjustment for Covariates.

Zhong, Sheng; Jiang, Duo; McPeek, Mary Sara.

PLoS Genet ; 12(10): e1006329, 2016 Oct.

Article in English | MEDLINE | ID: mdl-27695091

ABSTRACT

We consider the problem of genetic association testing of a binary trait in a sample that contains related individuals, where we adjust for relevant covariates and allow for missing data. We propose CERAMIC, an estimating equation approach that can be viewed as a hybrid of logistic regression and linear mixed-effects model (LMM) approaches. CERAMIC extends the recently proposed CARAT method to allow samples with related individuals and to incorporate partially missing data. In simulations, we show that CERAMIC outperforms existing LMM and generalized LMM approaches, maintaining high power and correct type 1 error across a wider range of scenarios. CERAMIC results in a particularly large power increase over existing methods when the sample includes related individuals with some missing data (e.g., when some individuals with phenotype and covariate information have missing genotype), because CERAMIC is able to make use of the relationship information to incorporate partially missing data in the analysis while correcting for dependence. Because CERAMIC is based on a retrospective analysis, it is robust to misspecification of the phenotype model, resulting in better control of type 1 error and higher power than that of prospective methods, such as GMMAT, when the phenotype model is misspecified. CERAMIC is computationally efficient for genomewide analysis in samples of related individuals of almost any configuration, including small families, unrelated individuals and even large, complex pedigrees. We apply CERAMIC to data on type 2 diabetes (T2D) from the Framingham Heart Study. In a genome scan, 9 of the 10 smallest CERAMIC p-values occur in or near either known T2D susceptibility loci or plausible candidates, verifying that CERAMIC is able to home in on the important loci in a genome scan.

Subject(s)

Diabetes Mellitus, Type 2/genetics , Genetic Association Studies , Polymorphism, Single Nucleotide/genetics , Quantitative Trait Loci/genetics , Computer Simulation , Diabetes Mellitus, Type 2/pathology , Genetic Testing , Genotype , Humans , Logistic Models , Models, Genetic , Pedigree , Phenotype

8.

G-STRATEGY: Optimal Selection of Individuals for Sequencing in Genetic Association Studies.

Wang, Miaoyan; Jakobsdottir, Johanna; Smith, Albert V; McPeek, Mary Sara.

Genet Epidemiol ; 40(6): 446-60, 2016 Sep.

Article in English | MEDLINE | ID: mdl-27256766

ABSTRACT

In a large-scale genetic association study, the number of phenotyped individuals available for sequencing may, in some cases, be greater than the study's sequencing budget will allow. In that case, it can be important to prioritize individuals for sequencing in a way that optimizes power for association with the trait. Suppose a cohort of phenotyped individuals is available, with some subset of them possibly already sequenced, and one wants to choose an additional fixed-size subset of individuals to sequence in such a way that the power to detect association is maximized. When the phenotyped sample includes related individuals, power for association can be gained by including partial information, such as phenotype data of ungenotyped relatives, in the analysis, and this should be taken into account when assessing whom to sequence. We propose G-STRATEGY, which uses simulated annealing to choose a subset of individuals for sequencing that maximizes the expected power for association. In simulations, G-STRATEGY performs extremely well for a range of complex disease models and outperforms other strategies with, in many cases, relative power increases of 20-40% over the next best strategy, while maintaining correct type 1 error. G-STRATEGY is computationally feasible even for large datasets and complex pedigrees. We apply G-STRATEGY to data on high-density lipoprotein and low-density lipoprotein from the AGES-Reykjavik and REFINE-Reykjavik studies, in which G-STRATEGY is able to closely approximate the power of sequencing the full sample by selecting for sequencing only a small subset of the individuals.

Subject(s)

Genetic Association Studies , Software , Genotype , Humans , Polymorphism, Single Nucleotide , Quantitative Trait Loci
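
The abstract above states that G-STRATEGY uses simulated annealing to pick a fixed-size subset. The general technique can be sketched as below; this is a generic illustration and not the authors' implementation. In particular, the objective here is a hypothetical stand-in (a sum of made-up per-individual scores) for the expected-power criterion the paper describes, and the swap move, geometric cooling schedule, and all names are assumptions.

```python
import math
import random


def anneal_subset(n_items, k, objective, n_steps=5000,
                  t_start=1.0, t_end=1e-3, seed=0):
    """Search for a size-k subset of range(n_items) with a large objective value.

    Requires 0 < k < n_items so that a swap move is always possible.
    """
    rng = random.Random(seed)
    chosen = set(rng.sample(range(n_items), k))
    current = objective(chosen)
    best, best_value = set(chosen), current
    for step in range(n_steps):
        # Geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / max(1, n_steps - 1))
        # Propose swapping one selected item for one unselected item.
        out_item = rng.choice(sorted(chosen))
        in_item = rng.choice([i for i in range(n_items) if i not in chosen])
        candidate = (chosen - {out_item}) | {in_item}
        value = objective(candidate)
        # Always accept improvements; accept worse moves with Metropolis probability.
        if value >= current or rng.random() < math.exp((value - current) / temp):
            chosen, current = candidate, value
            if current > best_value:
                best, best_value = set(chosen), current
    return best, best_value


# Hypothetical per-individual scores standing in for contributions to power.
scores = [3, 14, 7, 1, 9, 12, 5, 18, 2, 11, 6, 16, 4, 8, 10, 13, 0, 17, 15, 19]
best, best_value = anneal_subset(len(scores), 5, lambda s: sum(scores[i] for i in s))
```

With an additive toy objective like this one, a simple sort would do; annealing becomes useful when, as in the setting the abstract describes, the value of sequencing one individual depends on who else is selected.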

9.

MASTOR: mixed-model association mapping of quantitative traits in samples with related individuals.

Jakobsdottir, Johanna; McPeek, Mary Sara.

Am J Hum Genet ; 92(5): 652-66, 2013 May 02.

Article in English | MEDLINE | ID: mdl-23643379

ABSTRACT

Genetic association studies often sample individuals with known familial relationships in addition to unrelated individuals, and it is common for some individuals to have missing data (phenotypes, genotypes, or covariates). When some individuals in a sample are related, power can be gained by incorporating all individuals in the analysis, including individuals with partially missing data, while properly accounting for the dependence among them. We propose MASTOR, a mixed-model, retrospective score test for genetic association with a quantitative trait. MASTOR achieves high power in samples that contain related individuals by making full use of the relationship information to incorporate partially missing data in the analysis while correcting for dependence. Individuals with available phenotype and covariate information who are not genotyped but have genotyped relatives in the sample can still contribute to the association analysis because of the dependence among genotypes. Similarly, individuals who are genotyped but are missing covariate or phenotype information can contribute to the analysis. MASTOR is valid even when the phenotype model is misspecified and with either random or phenotype-based ascertainment. In simulations, we demonstrate the correct type 1 error of MASTOR, the increase in power that comes from making full use of the relationship information, the robustness to misspecification of the phenotype model, and the improvement in power that comes from modeling the heritability. We show that MASTOR is computationally feasible and practical in genome-wide association studies. We apply MASTOR to data on high-density lipoprotein cholesterol from the Framingham Heart Study.

Subject(s)

Genetic Association Studies/methods , Inheritance Patterns/genetics , Models, Genetic , Phenotype , Quantitative Trait, Heritable , Software , Computer Simulation , Humans

10.

Retrospective Association Analysis of Binary Traits: Overcoming Some Limitations of the Additive Polygenic Model.

Jiang, Duo; Mbatchou, Joelle; McPeek, Mary Sara.

Hum Hered ; 80(4): 187-95, 2015.

Article in English | MEDLINE | ID: mdl-27576759

ABSTRACT

Case-control genetic association analysis is an extremely common tool in human complex trait mapping. From a statistical point of view, the analysis of binary traits poses somewhat different challenges from the analysis of quantitative traits. Desirable features of a binary trait mapping approach would include (1) phenotype modeled as binary, with appropriate dependence between the mean and variance; (2) appropriate correction for relevant covariates; (3) appropriate correction for sample structure of various types, including related individuals, admixture and other types of population structure; (4) both fast and accurate computations; (5) robustness to ascertainment and other types of phenotype model misspecification, and (6) ability to leverage partially missing data to increase power. We review these challenges and argue, both theoretically and in simulations, for the value of retrospective association analysis as a way to overcome some of the limitations of the phenotype model, including model misspecification due to ascertainment. We give an overview of two recent retrospective methods, CARAT and CERAMIC, that are designed to meet criteria 1-6.

Subject(s)

Models, Genetic , Multifactorial Inheritance , Computer Simulation , Humans , Phenotype , Quantitative Trait, Heritable , Retrospective Studies

11.

Robust rare variant association testing for quantitative traits in samples with related individuals.

Jiang, Duo; McPeek, Mary Sara.

Genet Epidemiol ; 38(1): 10-20, 2014 Jan.

Article in English | MEDLINE | ID: mdl-24248908

ABSTRACT

The recent development of high-throughput sequencing technologies calls for powerful statistical tests to detect rare genetic variants associated with complex human traits. Sampling related individuals in sequencing studies offers advantages over sampling unrelated individuals only, including improved protection against sequencing error, the ability to use imputation to make more efficient use of sequence data, and the possibility of power boost due to more observed copies of extremely rare alleles among relatives. With related individuals, familial correlation needs to be accounted for to ensure correct control over type I error and to improve power. Recognizing the limitations of existing rare-variant association tests for family data, we propose MONSTER (Minimum P-value Optimized Nuisance parameter Score Test Extended to Relatives), a robust rare-variant association test, which generalizes the SKAT-O method for independent samples. MONSTER uses a mixed effects model that accounts for covariates and additive polygenic effects. To obtain a powerful test, MONSTER adaptively adjusts to the unknown configuration of effects of rare-variant sites. MONSTER also offers an analytical way of assessing P-values, which is desirable because permutation is not straightforward to conduct in related samples. In simulation studies, we demonstrate that MONSTER effectively accounts for family structure, is computationally efficient and compares very favorably, in terms of power, to previously proposed tests that allow related individuals. We apply MONSTER to an analysis of high-density lipoprotein cholesterol in the Framingham Heart Study, where we are able to replicate association with three genes.

Subject(s)

Family , Genetic Association Studies/methods , Genetic Variation/genetics , Quantitative Trait Loci/genetics , Adolescent , Adult , Aged , Alleles , Child , Cholesterol, HDL/genetics , Computer Simulation , Female , Health Surveys , Heart , Heredity/genetics , Humans , Male , Middle Aged , Models, Genetic , Multifactorial Inheritance/genetics , Pedigree , Phenotype , Polymorphism, Single Nucleotide/genetics , Research Design , Young Adult

12.

Overcoming the "feast or famine" effect: improved interaction testing in genome-wide association studies.

Zhou, Huanlin; McPeek, Mary Sara.

bioRxiv ; 2024 Feb 15.

Article in English | MEDLINE | ID: mdl-38405994

ABSTRACT

In genetic association analysis of complex traits, detection of interaction (either GxG or GxE) can help to elucidate the genetic architecture and biological mechanisms underlying the trait. Detection of interaction in a genome-wide association study (GWAS) can be methodologically challenging for various reasons, including a high burden of multiple comparisons when testing for epistasis between all possible pairs of a set of genomewide variants, as well as heteroscedasticity effects occurring in the presence of GxG or GxE interaction. In this paper, we address the problem of an even more striking phenomenon that we call the "feast or famine" effect that occurs when testing interaction in a genomewide context. As we verify, even in a simplified setting in which there is no interaction at all (and so no heteroscedasticity), in a GWAS to detect GxG or GxE interaction with a fixed genetic variant or environmental factor, the distribution of the genome-wide p-values under the null hypothesis is not the i.i.d. uniform one that is commonly assumed. Using standard methods, even if all SNPs are independent, some GWASs will have systematically underinflated p-values ("feast"), and others will have systematically overinflated p-values ("famine"), which can lead to false detection of interaction, reduced power, inconsistent results across studies, and failure to replicate true signal. This startling phenomenon is specific to detection of interaction in a GWAS, and it may partly explain why such detection has so far proved challenging and difficult to replicate. We show theoretically that the key cause of this phenomenon is which variables are conditioned on in the analysis, and this suggests an approach to correct the problem by changing the way the conditioning is done. Using this insight, we have developed the TINGA method to adjust the interaction test statistics to make their p-values closer to uniform under the null hypothesis. 
In simulations we show that TINGA both controls type 1 error and improves power. TINGA allows for covariates and population structure through use of a linear mixed model and accounts for heteroscedasticity. We apply TINGA to detection of epistasis in a study of flowering time in Arabidopsis thaliana.

13.

ADELLE: A global testing method for Trans-eQTL mapping.

Akinbiyi, Takintayo; McPeek, Mary Sara; Abney, Mark.

bioRxiv ; 2024 Jul 16.

Article in English | MEDLINE | ID: mdl-38464248

ABSTRACT

Understanding the genetic regulatory mechanisms of gene expression is a challenging and ongoing problem. Genetic variants that are associated with expression levels are readily identified when they are proximal to the gene (i.e., cis-eQTLs), but SNPs distant from the gene whose expression levels they are associated with (i.e., trans-eQTLs) have been much more difficult to discover, even though they account for a majority of the heritability in gene expression levels. A major impediment to the identification of more trans-eQTLs is the lack of statistical methods that are powerful enough to overcome the obstacles of small effect sizes and large multiple testing burden of trans-eQTL mapping. Here, we propose ADELLE, a powerful statistical testing framework that requires only summary statistics and is designed to be most sensitive to SNPs that are associated with multiple gene expression levels, a characteristic of many trans-eQTLs. In simulations, we show that for detecting SNPs that are associated with 0.1%-2% of 10,000 traits, among the 7 methods we consider ADELLE is clearly the most powerful overall, with either the highest power or power not significantly different from the highest for all settings in that range. We apply ADELLE to a mouse advanced intercross line data set and show its ability to find trans-eQTLs that were not significant under a standard analysis. This demonstrates that ADELLE is a powerful tool for uncovering trans regulators of genetic expression.

14.

XM: association testing on the X-chromosome in case-control samples with related individuals.

Thornton, Timothy; Zhang, Qian; Cai, Xiaochen; Ober, Carole; McPeek, Mary Sara.

Genet Epidemiol ; 36(5): 438-50, 2012 Jul.

Article in English | MEDLINE | ID: mdl-22552845

ABSTRACT

Genetic variants on the X-chromosome could potentially play an important role in some complex traits. However, development of methods for detecting association with X-linked markers has lagged behind that for autosomal markers. We propose methods for case-control association testing with X-chromosome markers in samples with related individuals. Our method, XM, appropriately adjusts for both correlation among relatives and male-female allele copy number differences. Features of XM include: (1) it is applicable to and computationally feasible for completely general combinations of family and case-control designs; (2) it allows for both unaffected controls and controls of unknown phenotype to be included in the same analysis; (3) it can incorporate phenotype information on relatives with missing genotype data; and (4) it adjusts for sex-specific trait prevalence values. We propose two other tests, Xχ and XW, which can also be useful in certain contexts. We derive the best linear unbiased estimator of allele frequency, and its variance, for X-linked markers. In simulation studies with related individuals, we demonstrate the power and validity of the proposed methods. We apply the methods to X-chromosome association analysis of (1) asthma in a Hutterite sample and (2) alcohol dependence in the GAW 14 COGA data. In analysis (1), we demonstrate computational feasibility of XM and the applicability of our robust variance estimator. In analysis (2), we detect significant association, after Bonferroni correction, between alcohol dependence and single nucleotide polymorphism rs979606 in the monoamine oxidase A gene, where this gene has previously been found to be associated with substance abuse and antisocial behavior.

Subject(s)

Case-Control Studies , Chromosomes, Human, X/genetics , Genes, X-Linked , Alleles , Chromosome Mapping/methods , Family Health , Female , Gene Frequency , Genotype , Haplotypes , Humans , Male , Models, Genetic , Models, Statistical , Monoamine Oxidase/genetics , Phenotype , Polymorphism, Single Nucleotide
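
The abstract above reports significance after Bonferroni correction. As a reminder of what that correction does, a minimal sketch follows: with m tests and family-wise level alpha, each p value is compared against alpha / m. The function name, the default alpha of 0.05, and the example p values are illustrative assumptions and are not taken from the paper.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag p values that pass the Bonferroni threshold alpha / m.

    Returns (flags, threshold), where m is the number of tests performed.
    """
    threshold = alpha / len(p_values)
    flags = [p <= threshold for p in p_values]
    return flags, threshold


# Hypothetical p values from four tests: the threshold is 0.05 / 4 = 0.0125.
flags, threshold = bonferroni_significant([1e-8, 0.01, 0.2, 0.04])
```

Note that 0.04 would pass an uncorrected 0.05 cutoff but does not pass here, which is the intended effect of controlling the family-wise error rate across all tests.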

15.

ROADTRIPS: case-control association testing with partially or completely unknown population and pedigree structure.

Thornton, Timothy; McPeek, Mary Sara.

Am J Hum Genet ; 86(2): 172-84, 2010 Feb 12.

Article in English | MEDLINE | ID: mdl-20137780

ABSTRACT

Genome-wide association studies are routinely conducted to identify genetic variants that influence complex disorders. It is well known that failure to properly account for population or pedigree structure can lead to spurious association as well as reduced power. We propose a method, ROADTRIPS, for case-control association testing in samples with partially or completely unknown population and pedigree structure. ROADTRIPS uses a covariance matrix estimated from genome-screen data to correct for unknown population and pedigree structure while maintaining high power by taking advantage of known pedigree information when it is available. ROADTRIPS can incorporate data on arbitrary combinations of related and unrelated individuals and is computationally feasible for the analysis of genetic studies with millions of markers. In simulations with related individuals and population structure, including admixture, we demonstrate that ROADTRIPS provides a substantial improvement over existing methods in terms of power and type 1 error. The ROADTRIPS method can be used across a variety of study designs, ranging from studies that have a combination of unrelated individuals and small pedigrees to studies of isolated founder populations with partially known or completely unknown pedigrees. We apply the method to analyze two data sets: a study of rheumatoid arthritis in small UK pedigrees, from Genetic Analysis Workshop 15, and data from the Collaborative Study of the Genetics of Alcoholism on alcohol dependence in a sample of moderate-size pedigrees of European descent, from Genetic Analysis Workshop 14. We detect genome-wide significant association, after Bonferroni correction, in both studies.

Subject(s)

Genetics, Population , Genome-Wide Association Study/methods , Pedigree , Software , Alcoholism/genetics , Arthritis, Rheumatoid/genetics , Case-Control Studies , Computer Simulation , Female , Humans , Male , Polymorphism, Single Nucleotide/genetics , Population Dynamics , Time Factors

16.

JASPER: fast, powerful, multitrait association testing in structured samples gives insight on pleiotropy in gene expression.

Mbatchou, Joelle; McPeek, Mary Sara.

bioRxiv ; 2023 Dec 19.

Article in English | MEDLINE | ID: mdl-38187553

ABSTRACT

Joint association analysis of multiple traits with multiple genetic variants can provide insight into genetic architecture and pleiotropy, improve trait prediction and increase power for detecting association. Furthermore, some traits are naturally high-dimensional, e.g., images, networks or longitudinally measured traits. Assessing significance for multitrait genetic association can be challenging, especially when the sample has population sub-structure and/or related individuals. Failure to adequately adjust for sample structure can lead to power loss and inflated type 1 error, and commonly used methods for assessing significance can work poorly with a large number of traits or be computationally slow. We developed JASPER, a fast, powerful, robust method for assessing significance of multitrait association with a set of genetic variants, in samples that have population sub-structure, admixture and/or relatedness. In simulations, JASPER has higher power, better type 1 error control, and faster computation than existing methods, with the power and speed advantage of JASPER increasing with the number of traits. JASPER is potentially applicable to a wide range of association testing applications, including for multiple disease traits, expression traits, image-derived traits and microbiome abundances. It allows for covariates, ascertainment and rare variants and is robust to phenotype model misspecification. We apply JASPER to analyze gene expression in the Framingham Heart Study, where, compared to alternative approaches, JASPER finds more significant associations, including several that indicate pleiotropic effects, some of which replicate previous results, while others have not previously been reported. Our results demonstrate the promise of JASPER for powerful multitrait analysis in structured samples.

17.

ATRIUM: testing untyped SNPs in case-control association studies with related individuals.

Wang, Zuoheng; McPeek, Mary Sara.

Am J Hum Genet ; 85(5): 667-78, 2009 Nov.

Article in English | MEDLINE | ID: mdl-19913122

ABSTRACT

In genome-wide association studies, only a subset of all genomic variants are typed by current, high-throughput, SNP-genotyping platforms. However, many of the untyped variants can be well predicted from typed variants, with linkage disequilibrium (LD) information among typed and untyped variants available from an external reference panel such as HapMap. Incorporation of such external information can allow one to perform tests of association between untyped variants and phenotype, thereby making more efficient use of the available genotype data. When related individuals are included in case-control samples, the dependence among their genotypes must be properly addressed for valid association testing. In the context of testing untyped variants, an additional analytical challenge is that the dependence, across related individuals, of the partial information on untyped-SNP genotypes must also be assessed and incorporated into the analysis for valid inference. We address this challenge with ATRIUM, a method for case-control association testing with untyped SNPs, based on genome screen data in samples in which some individuals are related. ATRIUM uses LD information from an external reference panel to specify a one-degree-of-freedom test of association with an untyped SNP. It properly accounts for dependence in the partial information on untyped-SNP genotypes across related individuals. We demonstrate that ATRIUM is robust in that it maintains the nominal type I error rate even when the external reference panel is not well matched to the case-control sample. We apply the method to detect association between type 2 diabetes and variants on chromosome 10 in the Framingham SHARe data.

Subject(s)

Genome, Human , Genome-Wide Association Study , Nuclear Family , Polymorphism, Single Nucleotide/genetics , Alleles , Asian People/genetics , Black People/genetics , Case-Control Studies , Chromosome Mapping/methods , Chromosomes, Human, Pair 10 , Cohort Studies , Computer Simulation , Diabetes Mellitus, Type 2/genetics , Gene Frequency , Genetic Markers , Genetic Variation , Genotype , Haplotypes , Humans , Likelihood Functions , Linkage Disequilibrium , Logistic Models , Longitudinal Studies , Models, Genetic , Models, Statistical , Reproducibility of Results , Software , White People/genetics

18.

Complex genetic interactions underlying expression differences between Drosophila races: analysis of chromosome substitutions.

Wang, Hurng-Yi; Fu, Yonggui; McPeek, Mary Sara; Lu, Xuemei; Nuzhdin, Sergey; Xu, Anlong; Lu, Jian; Wu, Mao-Lien; Wu, Chung-I.

Proc Natl Acad Sci U S A ; 105(17): 6362-7, 2008 Apr 29.

Article in English | MEDLINE | ID: mdl-18430800

ABSTRACT

Regulation of gene expression is usually separated into cis and trans components. The separation may become artificial if much of the variation in expression is under multigenic and epistatic (e.g., cis-by-trans) control. There is hence a need to quantify the relative contribution of cis, trans, and cis-by-trans effects on expression divergence at different levels of evolution. To do so across the whole genome, we analyzed the full set of chromosome-substitution lines between the two behavioral races of Drosophila melanogaster. Our observations: (i) Only approximately 3% of the genes with an expression difference are purely cis regulated. In fact, relatively few genes are governed by simple genetics because nearly 80% of expression differences are controlled by at least two chromosomes. (ii) For 14% of the genes, cis regulation does play a role but usually in conjunction with trans regulation. This joint action of cis and trans effects, either additive or epistatic, is referred to as inclusive cis effect. (iii) The percentage of genes with inclusive cis effect increases to 32% among genes that are strongly differentiated between the two races. (iv) We observed a nonrandom distribution of trans-acting factors, with a substantial deficit on the second chromosome. Between Drosophila racial groups, trans regulation of expression difference is extensive, and cis regulation often evolves in conjunction with trans effects.

Subject(s)

Chromosomes/genetics , Drosophila melanogaster/genetics , Gene Expression Regulation , Animals , Genes, Insect , Genetic Variation , Models, Genetic

19.

Likelihood-based inference for multi-color optical mapping.

Tong, Liping; Mets, Laurens; McPeek, Mary Sara.

Stat Appl Genet Mol Biol ; 6: Article5, 2007.

Article in English | MEDLINE | ID: mdl-17402920

ABSTRACT

Multi-color optical mapping is a new technique being developed to obtain detailed physical maps (indicating relative positions of various recognition sites) of DNA molecules. We consider a study design in which the data consist of noisy observations of multiple copies of a DNA molecule marked with colors at recognition sites. The primary goal is to estimate a physical map. A secondary goal is to estimate error rates associated with the experiment, which are potentially useful for analysis and refinement of the biochemical steps in the mapping procedure. We propose statistical models for various sources of error and use maximum likelihood estimation (MLE) to construct a physical map and estimate error rates. To overcome difficulties arising in the maximization process, a latent-variable Markov chain version of the model is proposed, and the EM algorithm is used for maximization. In addition, a simulated annealing procedure is applied to maximize the profile likelihood over the discrete space of sequences of colors. We apply the methods to simulated data on the bacteriophage lambda genome.

Subject(s)

Color , Likelihood Functions , Optics and Photonics , Bacteriophage lambda/genetics , DNA, Viral/chemistry , Markov Chains

20.

Testing for Hardy-Weinberg equilibrium in samples with related individuals.

Bourgain, Catherine; Abney, Mark; Schneider, Daniel; Ober, Carole; McPeek, Mary Sara.

Genetics ; 168(4): 2349-61, 2004 Dec.

Article in English | MEDLINE | ID: mdl-15371359

ABSTRACT

When the classical χ² goodness-of-fit test for Hardy-Weinberg (HW) equilibrium is used on samples with related individuals, the type I error can be greatly inflated. In particular the test is inappropriate in population isolates where the individuals are related through multiple lines of descent. In this article, we propose a new test for HW (the QL-HW test) suitable for any sample with related individuals, including large inbred pedigrees, provided that their genealogy is known. Performed conditional on the pedigree structure, the QL-HW test detects departures from HW that are not due to the genealogy. Because the computation of the QL-HW test becomes intractable for very polymorphic loci in large inbred pedigrees, a simpler alternative, the GCC-HW test, is also proposed. The statistical properties of the QL-HW and GCC-HW tests are studied through simulations considering a sample of independent nuclear families, a sample of extended outbred genealogies, and samples from the Hutterite population, a North American highly inbred isolate. Finally, the method is used to test a set of 143 biallelic markers spanning 82 genes in this latter population.

Subject(s)

Data Interpretation, Statistical , Models, Genetic , Chi-Square Distribution
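
Illustrative note: the classical goodness-of-fit test for Hardy-Weinberg equilibrium that the abstract of record 20 takes as its starting point can be sketched as follows for a single biallelic marker. This is only the textbook test, which assumes unrelated individuals; it is not the QL-HW or GCC-HW test proposed in the article. The function names and the example genotype counts are hypothetical, chosen for illustration.

```python
import math


def hw_chi_square(n_aa, n_ab, n_bb):
    """Classical chi-square statistic (1 df) for Hardy-Weinberg equilibrium.

    Takes the observed genotype counts (AA, AB, BB) at one biallelic marker.
    Assumes the sampled individuals are unrelated.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype must be observed")
    p = (2 * n_aa + n_ab) / (2 * n)  # estimated frequency of allele A
    q = 1.0 - p                      # estimated frequency of allele B
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Skip cells with zero expected count (monomorphic marker).
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def hw_p_value(statistic):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


if __name__ == "__main__":
    stat = hw_chi_square(30, 40, 30)  # hypothetical counts
    print(f"chi-square = {stat:.3f}, p-value = {hw_p_value(stat):.4f}")
```

With related individuals the genotype counts are no longer independent draws, which is why, as the abstract notes, the type I error of this statistic can be inflated in such samples.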
