1.
PLoS Genet ; 20(5): e1011245, 2024 May.
Article in English | MEDLINE | ID: mdl-38728360

ABSTRACT

Joint analysis of multiple correlated phenotypes for genome-wide association studies (GWAS) can identify and interpret pleiotropic loci which are essential to understand pleiotropy in diseases and complex traits. Meanwhile, constructing a network based on associations between phenotypes and genotypes provides a new insight to analyze multiple phenotypes, which can explore whether phenotypes and genotypes might be related to each other at a higher level of cellular and organismal organization. In this paper, we first develop a bipartite signed network by linking phenotypes and genotypes into a Genotype and Phenotype Network (GPN). The GPN can be constructed by a mixture of quantitative and qualitative phenotypes and is applicable to binary phenotypes with extremely unbalanced case-control ratios in large-scale biobank datasets. We then apply a powerful community detection method to partition phenotypes into disjoint network modules based on GPN. Finally, we jointly test the association between multiple phenotypes in a network module and a single nucleotide polymorphism (SNP). Simulations and analyses of 72 complex traits in the UK Biobank show that multiple phenotype association tests based on network modules detected by GPN are much more powerful than those without considering network modules. The newly proposed GPN provides a new insight to investigate the genetic architecture among different types of phenotypes. Multiple phenotypes association studies based on GPN are improved by incorporating the genetic information into the phenotype clustering. Notably, it might broaden the understanding of genetic architecture that exists between diagnoses, genes, and pleiotropy.


Subject(s)
Genome-Wide Association Study , Genotype , Phenotype , Polymorphism, Single Nucleotide , Humans , Genome-Wide Association Study/methods , Polymorphism, Single Nucleotide/genetics , Models, Genetic , Genetic Pleiotropy , Genetic Association Studies/methods , Quantitative Trait Loci/genetics
2.
Genet Epidemiol ; 47(2): 185-197, 2023 03.
Article in English | MEDLINE | ID: mdl-36691904

ABSTRACT

In genome-wide association studies (GWAS) for thousands of phenotypes in biobanks, most binary phenotypes have substantially fewer cases than controls. Many widely used approaches for joint analysis of multiple phenotypes produce inflated type I error rates for such extremely unbalanced case-control phenotypes. In this research, we develop a method to jointly analyze multiple unbalanced case-control phenotypes to circumvent this issue. We first group multiple phenotypes into different clusters based on a hierarchical clustering method, then we merge phenotypes in each cluster into a single phenotype. In each cluster, we use the saddlepoint approximation to estimate the p value of an association test between the merged phenotype and a single nucleotide polymorphism (SNP) which eliminates the issue of inflated type I error rate of the test for extremely unbalanced case-control phenotypes. Finally, we use the Cauchy combination method to obtain an integrated p value for all clusters to test the association between multiple phenotypes and a SNP. We use extensive simulation studies to evaluate the performance of the proposed approach. The results show that the proposed approach can control type I error rate very well and is more powerful than other available methods. We also apply the proposed approach to phenotypes in category IX (diseases of the circulatory system) in the UK Biobank. We find that the proposed approach can identify more significant SNPs than the other viable methods we compared with.


Subject(s)
Genome-Wide Association Study , Models, Genetic , Humans , Genome-Wide Association Study/methods , Phenotype , Case-Control Studies , Polymorphism, Single Nucleotide
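
The final step of the abstract above, combining per-cluster p values with the Cauchy combination method, can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `cauchy_combination` and the default of equal weights are assumptions, and the per-cluster p values (obtained by saddlepoint approximation in the paper) are simply passed in as numbers.

```python
import math


def cauchy_combination(p_values, weights=None):
    """Combine p values with the Cauchy combination test.

    Assumption for this sketch: equal weights unless others are given.
    Weights are expected to be non-negative and to sum to 1.
    """
    k = len(p_values)
    if weights is None:
        weights = [1.0 / k] * k
    # Map each p value to a standard Cauchy variate and take the weighted sum.
    # Under the null, the sum is again approximately standard Cauchy.
    t = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, p_values))
    # Upper-tail probability of the standard Cauchy distribution at t.
    return 0.5 - math.atan(t) / math.pi


# Hypothetical per-cluster p values, for illustration only.
combined = cauchy_combination([0.01, 0.8])
```

With a single input the function returns that p value unchanged, which is a quick sanity check on the transformation.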
3.
Bioinformatics ; 39(12)2023 12 01.
Article in English | MEDLINE | ID: mdl-37991852

ABSTRACT

MOTIVATION: Genome-wide association studies are an essential tool for analyzing associations between phenotypes and single nucleotide polymorphisms (SNPs). Most binary phenotypes in large biobanks are extremely unbalanced, which leads to inflated type I error rates for many widely used association tests for joint analysis of multiple phenotypes. In this article, we first propose a novel method to construct a Multi-Layer Network (MLN) using individuals with at least one case status among all phenotypes. Then, we introduce a computationally efficient community detection method to group phenotypes into disjoint clusters based on the MLN. Finally, we propose a novel approach, MLN with Omnibus (MLN-O), to jointly analyse the association between phenotypes and a SNP. MLN-O uses the score test to test the association between each merged phenotype in a cluster and a SNP, then uses the Omnibus test to obtain an overall test statistic to test the association between all phenotypes and a SNP. RESULTS: We conduct extensive simulation studies to show that the proposed approach can control type I error rates and is more powerful than some existing methods. We also apply the proposed method to a real data set in the UK Biobank. Using phenotypes in Chapter XIII (Diseases of the musculoskeletal system and connective tissue) in the UK Biobank, we find that MLN-O identifies more significant SNPs than the other methods we compare it with. AVAILABILITY AND IMPLEMENTATION: https://github.com/Hongjing-Xie/Multi-Layer-Network-with-Omnibus-MLN-O.


Subject(s)
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Humans , Genome-Wide Association Study/methods , Phenotype , Case-Control Studies , Computer Simulation
4.
Osteoporos Int ; 35(5): 785-794, 2024 May.
Article in English | MEDLINE | ID: mdl-38246971

ABSTRACT

Hip fracture risk assessment is an important but challenging task. Quantitative CT-based patient-specific finite element (FE) analysis (FEA) incorporates bone geometry and bone density in the proximal femur. We developed a global FEA-computed fracture risk index to increase the prediction accuracy of hip fracture incidence. PURPOSE: Quantitative CT-based patient-specific finite element (FE) analysis (FEA) incorporates bone geometry and bone density in the proximal femur to compute the force (fracture load) and energy necessary to break the proximal femur in a particular loading condition. The fracture loads and energies-to-failure are individually associated with incident hip fracture, and provide different structural information about the proximal femur. METHODS: We used principal component analysis (PCA) to develop a global FEA-computed fracture risk index that incorporates the FEA-computed yield and ultimate failure loads and energies-to-failure in four loading conditions of 110 hip fracture subjects and 235 age- and sex-matched control subjects from the AGES-Reykjavik study. Using a logistic regression model, we compared the prediction performance for hip fracture based on stratified resampling. RESULTS: We referred to the first principal component (PC1) of the FE parameters as the global FEA-computed fracture risk index, which was a significant predictor of hip fracture (p-value < 0.001). The area under the receiver operating characteristic curve (AUC) using PC1 (0.776) was higher than that using all FE parameters combined (0.737) in the males (p-value < 0.001). CONCLUSIONS: The global FEA-computed fracture risk index increased hip fracture risk prediction accuracy in males.


Subject(s)
Hip Fractures , Proximal Femoral Fractures , Male , Humans , Hip Fractures/epidemiology , Hip Fractures/etiology , Bone Density , Femur/diagnostic imaging , ROC Curve , Finite Element Analysis
5.
Genet Epidemiol ; 46(8): 604-614, 2022 12.
Article in English | MEDLINE | ID: mdl-35766057

ABSTRACT

Over the past years, genome-wide association studies (GWAS) have generated a wealth of new information. Summary data from many GWAS are now publicly available, promoting the development of many statistical methods for association studies based on GWAS summary statistics, which avoids the increasing challenges associated with individual-level genotype and phenotype data sharing. However, for population-based association studies such as GWAS, it has been long recognized that population stratification can seriously confound association results. For large GWAS, it is very likely that there exist population stratification and cryptic relatedness, which will result in inflated Type I error in association testing. Although many methods have been developed to control for population stratification, only two of these approaches can be used to control population stratification without individual-level data: one is based on genomic control (GC) and the other one is based on linkage disequilibrium score regression (LDSC). However, the performance of these two approaches is currently unknown. In this study, we use extensive simulation studies including populations with subpopulations, spatially structured populations, and populations with cryptic relatedness to compare the performance of these two approaches to control for population stratification using only GWAS summary statistics without individual-level data. Data sets from the genetic analysis workshop 19 and UK Biobank are also used to evaluate these two approaches. We demonstrate that the intercept of LDSC can be used as a more accurate correction factor than GC. The results from this study will provide very useful information for researchers using GWAS summary statistics while trying to control for population stratification.


Subject(s)
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Humans , Genome-Wide Association Study/methods , Models, Genetic , Genetic Association Studies , Linkage Disequilibrium , Phenotype
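
The two summary-statistic corrections compared in the abstract above both divide the association test statistics by a single inflation factor; they differ only in how that factor is obtained. The sketch below illustrates this under stated assumptions: the function names are hypothetical, the input is a list of 1-df chi-square statistics, and the LDSC intercept is taken as a number already estimated by an external LD score regression run (the value 1.05 is made up for illustration). Clamping the factor at 1 is a common convention adopted here, not something stated in the abstract.

```python
import statistics

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control_lambda(chi2_stats):
    """Genomic control (GC) inflation factor: observed median / expected median."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


def correct_statistics(chi2_stats, factor):
    """Divide each statistic by the correction factor.

    Assumption in this sketch: factors below 1 are clamped to 1 so that
    statistics are never inflated by the correction.
    """
    factor = max(factor, 1.0)
    return [s / factor for s in chi2_stats]


# Hypothetical 1-df chi-square statistics, for illustration only.
stats = [0.2, 0.5, 0.9098728, 1.5, 3.0]

gc_lambda = genomic_control_lambda(stats)
gc_corrected = correct_statistics(stats, gc_lambda)

# Hypothetical LDSC intercept, assumed to come from an external analysis.
ldsc_intercept = 1.05
ldsc_corrected = correct_statistics(stats, ldsc_intercept)
```

After GC correction the median of the statistics equals the expected null median by construction, whereas the LDSC intercept only removes the part of the inflation attributed to confounding.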
6.
Eur J Nucl Med Mol Imaging ; 50(10): 3022-3033, 2023 08.
Article in English | MEDLINE | ID: mdl-37195444

ABSTRACT

PURPOSE: Cardiac resynchronization therapy (CRT) has been established as an important therapy for heart failure. Mechanical dyssynchrony has the potential to predict responders to CRT. The aim of this study was to report the development and the validation of machine learning models which integrate ECG, gated SPECT MPI (GMPS), and clinical variables to predict patients' response to CRT. METHODS: This analysis included 153 patients who met criteria for CRT from a prospective cohort study. The variables were used to model predictive methods for CRT. Patients were classified as "responders" for an increase of LVEF ≥ 5% at follow-up. In a second analysis, patients were classified as "super-responders" for an increase of LVEF ≥ 15%. For ML, variable selection was applied, and Prediction Analysis of Microarrays (PAM) approach was used to model response while Naïve Bayes (NB) was used to model super-response. These ML models were compared to models obtained with guideline variables. RESULTS: PAM had AUC of 0.80 against 0.72 of partial least squares-discriminant analysis with guideline variables (p = 0.52). The sensitivity (0.86) and specificity (0.75) were better than for guideline alone, sensitivity (0.75) and specificity (0.24). Neural network with guideline variables was better than NB (AUC = 0.93 vs. 0.87) however without statistical significance (p = 0.48). Its sensitivity and specificity (1.0 and 0.75, respectively) were better than guideline alone (0.78 and 0.25, respectively). CONCLUSIONS: Compared to guideline criteria, ML methods trended toward improved CRT response and super-response prediction. GMPS was central in the acquisition of most parameters. Further studies are needed to validate the models.


Subject(s)
Cardiac Resynchronization Therapy , Heart Failure , Humans , Cardiac Resynchronization Therapy/methods , Prospective Studies , Bayes Theorem , Tomography, Emission-Computed, Single-Photon/methods , Heart Failure/diagnostic imaging , Heart Failure/therapy , Electrocardiography , Machine Learning , Treatment Outcome
7.
Genet Epidemiol ; 45(1): 64-81, 2021 02.
Article in English | MEDLINE | ID: mdl-33047835

ABSTRACT

With rapid advancements in sequencing technologies and the accumulation of electronic health records, a large number of genetic variants and multiple correlated human complex traits have become available in many genetic association studies. Thus, it becomes necessary and important to develop new methods that can jointly analyze the association between multiple genetic variants and multiple traits. Compared with methods that only use a single marker or trait, the joint analysis of multiple genetic variants and multiple traits is more powerful since such an analysis can fully incorporate the correlation structure of genetic variants and/or traits and their mutual dependence patterns. However, most existing methods that simultaneously analyze multiple genetic variants and multiple traits are only applicable to unrelated samples. We develop a new method called MF-TOWmuT to detect association of multiple phenotypes and multiple genetic variants in a genomic region with family samples. MF-TOWmuT is based on an optimally weighted combination of variants. Our method can be applied to both rare and common variants and both qualitative and quantitative traits. Our simulation results show that (1) the type I error of MF-TOWmuT is preserved; (2) MF-TOWmuT outperforms two existing methods, the Multiple Family-based Quasi-Likelihood Score Test and the Multivariate Family-based Rare Variant Association Test, in terms of power. We also illustrate the usefulness of MF-TOWmuT by analyzing genotypic and phenotypic data from the Genetics of Kidneys in Diabetes study. R program is available at https://github.com/gaochengPRC/MF-TOWmuT.


Subject(s)
Genetic Variation , Models, Genetic , Genetic Association Studies , Genotype , Humans , Phenotype
8.
Genet Epidemiol ; 44(1): 67-78, 2020 01.
Article in English | MEDLINE | ID: mdl-31541490

ABSTRACT

Emerging evidence suggests that a genetic variant can affect multiple phenotypes, especially in complex human diseases. Therefore, joint analysis of multiple phenotypes may offer new insights into disease etiology. Recently, many statistical methods have been developed for joint analysis of multiple phenotypes, including the clustering linear combination (CLC) method. Due to the unknown number of clusters for a given data, a simulation procedure must be used to evaluate the p-value of the final test statistic of CLC. This makes the CLC method computationally demanding. In this paper, we use a stopping criterion to determine the number of clusters in the CLC method. We have named our method, hierarchical clustering CLC (HCLC). HCLC has an asymptotic distribution, which is very computationally efficient and makes it applicable for genome-wide association studies. Extensive simulations together with the COPDGene data analysis have been used to assess the type I error rates and power of our proposed method. Our simulation results demonstrate that the type I error rates of HCLC are effectively controlled in different realistic settings. HCLC either outperforms all other methods or has statistical power that is very close to the most powerful method with which it has been compared.


Subject(s)
Cluster Analysis , Genetic Variation/genetics , Models, Genetic , Genome-Wide Association Study , Humans , Phenotype
9.
Genet Epidemiol ; 43(8): 966-979, 2019 12.
Article in English | MEDLINE | ID: mdl-31498476

ABSTRACT

Both genome-wide association study and next-generation sequencing data analyses are widely employed to identify common and/or rare genetic variants associated with disease susceptibility. Rare variants generally have large effects though they are hard to detect due to their low frequencies. Currently, many existing statistical methods for rare variant association studies employ a weighted combination scheme, which usually uses subjective or suboptimal weights based on ad hoc assumptions (e.g., ignoring dependence between rare variants). In this study, we analytically derived optimal weights for both common and rare variants and proposed a general and novel approach (G-TOW) to test the association between an optimally weighted combination of variants in a gene or pathway and a continuous or dichotomous trait while easily adjusting for covariates. Results of the simulation studies show that G-TOW has properly controlled type I error rates and is the most powerful test among the methods we compared when testing effects of either both rare and common variants or rare variants only. We also illustrate the effectiveness of G-TOW using the Genetic Analysis Workshop 17 (GAW17) data. Additionally, we applied G-TOW and other competitive methods to test disease-associated genes in real schizophrenia data. G-TOW successfully verified the genes FYN and VPS39, whose association with schizophrenia has been reported in existing publications. Both of these genes are missed by the weighted sum statistic and the sequence kernel association test. Simulation study and real data analysis indicate that G-TOW is a powerful test.


Subject(s)
Genetic Variation , Genome-Wide Association Study , Models, Genetic , Models, Statistical , Computer Simulation , High-Throughput Nucleotide Sequencing , Humans , Phenotype
10.
Bioinformatics ; 35(8): 1373-1379, 2019 04 15.
Article in English | MEDLINE | ID: mdl-30239574

ABSTRACT

SUMMARY: There is an increasing interest in joint analysis of multiple phenotypes for genome-wide association studies (GWASs) for the following reasons. First, cohorts usually collect multiple phenotypes and complex diseases are usually measured by multiple correlated intermediate phenotypes. Second, jointly analyzing multiple phenotypes may increase statistical power for detecting genetic variants associated with complex diseases. Third, there is increasing evidence showing that pleiotropy is a widespread phenomenon in complex diseases. In this paper, we develop a clustering linear combination (CLC) method to jointly analyze multiple phenotypes for GWASs. In the CLC method, we first cluster individual statistics into positively correlated clusters and then combine the individual statistics linearly within each cluster and combine the between-cluster terms in a quadratic form. CLC is not only robust to different signs of the means of individual statistics, but also reduces the degrees of freedom of the test statistic. We also theoretically prove that if we can cluster the individual statistics correctly, CLC is the most powerful test among all tests with certain quadratic forms. Our simulation results show that CLC is either the most powerful test or has similar power to the most powerful test among the tests we compared, and CLC is much more powerful than other tests when effect sizes align with inferred clusters. We also evaluate the performance of CLC through a real case study. AVAILABILITY AND IMPLEMENTATION: R code for implementing our method is available at http://www.math.mtu.edu/~shuzhang/software.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genome-Wide Association Study , Cluster Analysis , Models, Genetic , Phenotype , Polymorphism, Single Nucleotide
11.
Hum Hered ; 84(4-5): 170-196, 2019.
Article in English | MEDLINE | ID: mdl-32417835

ABSTRACT

MOTIVATION: The risk of many complex diseases is determined by an interplay of genetic and environmental factors. The examination of gene-environment interactions (G×Es) for multiple traits can yield valuable insights about the etiology of the disease and increase power in detecting disease-associated genes. However, the methods for testing G×Es for multiple traits are very limited. METHOD: We developed novel approaches to test G×Es for multiple traits in sequencing association studies. We first perform a transformation of multiple traits by using either principal component analysis or standardization analysis. Then, we detect the effects of G×Es using novel proposed tests: testing the effect of an optimally weighted combination of G×Es (TOW-GE) and/or variable weight TOW-GE (VW-TOW-GE). Finally, we employ Fisher's combination test to combine the p values. RESULTS: Extensive simulation studies show that the type I error rates of the proposed methods are well controlled. Compared to the interaction sequence kernel association test (ISKAT), TOW-GE is more powerful when there are only rare risk and protective variants; VW-TOW-GE is more powerful when there are both rare and common variants. Both TOW-GE and VW-TOW-GE are robust to directions of effects of causal G×Es. Application to the COPDGene Study demonstrates that our proposed methods are very effective. CONCLUSIONS: Our proposed methods are useful tools in the identification of G×Es for multiple traits. The proposed methods can be used not only to identify G×Es for common variants, but also for rare variants. Therefore, they can be employed in identifying G×Es in both genome-wide association studies and next-generation sequencing data analyses.
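
The last step of the method above, Fisher's combination test, can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the function name is hypothetical, the input p values are assumed to be independent and strictly positive, and the per-trait p values from TOW-GE or VW-TOW-GE are simply passed in as numbers. The statistic is X = -2 Σ ln p_i, which follows a chi-square distribution with 2k degrees of freedom under the null; because the degrees of freedom are even, the tail probability has a closed form and no statistics library is needed.

```python
import math


def fisher_combination(p_values):
    """Combine k independent p values with Fisher's method.

    Returns P(chi2_{2k} >= X) for X = -2 * sum(log(p_i)).
    Assumes every p value is strictly greater than 0.
    """
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # X / 2
    # Closed-form survival function for even degrees of freedom:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


# Hypothetical per-trait p values, for illustration only.
combined_p = fisher_combination([0.05, 0.05])
```

With a single p value the function returns that value unchanged, and two p values of 0.05 combine to roughly 0.017.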

12.
Genet Epidemiol ; 42(4): 344-353, 2018 06.
Article in English | MEDLINE | ID: mdl-29682782

ABSTRACT

Genome-wide association studies (GWAS) have become a very effective research tool to identify genetic variants underlying various complex diseases. In spite of the success of GWAS in identifying thousands of reproducible associations between genetic variants and complex diseases, the association between genetic variants and a single phenotype is usually weak. It is increasingly recognized that joint analysis of multiple phenotypes can be potentially more powerful than univariate analysis, and can shed new light on underlying biological mechanisms of complex diseases. In this paper, we develop a novel variable reduction method using a hierarchical clustering method (HCM) for joint analysis of multiple phenotypes in association studies. The proposed method involves two steps. The first step applies a dimension reduction technique by using a representative phenotype for each cluster of phenotypes. Then, existing methods are used in the second step to test the association between genetic variants and the representative phenotypes rather than the individual phenotypes. We perform extensive simulation studies to compare the powers of multivariate analysis of variance (MANOVA), joint model of multiple phenotypes (MultiPhen), and trait-based association test that uses extended Simes procedure (TATES) with and without HCM. Our simulation studies show that using HCM is more powerful than not using HCM in most scenarios. We also illustrate the usefulness of HCM by analyzing whole-genome genotyping data from a lung function study.


Subject(s)
Multifactor Dimensionality Reduction/methods , Cluster Analysis , Computer Simulation , Genome-Wide Association Study , Humans , Models, Genetic , Phenotype , Pulmonary Disease, Chronic Obstructive/genetics
13.
Genet Epidemiol ; 41(3): 233-243, 2017 04.
Article in English | MEDLINE | ID: mdl-28176359

ABSTRACT

Despite the extensive discovery of disease-associated common variants, much of the genetic contribution to complex traits remains unexplained. Rare variants may explain additional disease risk or trait variability. Although sequencing technology provides a supreme opportunity to investigate the roles of rare variants in complex diseases, detection of these variants in sequencing-based association studies presents substantial challenges. In this article, we propose novel statistical tests to test the association between rare and common variants in a genomic region and a complex trait of interest based on cross-validation prediction error (PE). We first propose a PE method based on Ridge regression. Based on PE, we also propose another two tests PE-WS and PE-TOW by testing a weighted combination of variants with two different weighting schemes. PE-WS is the PE version of the test based on the weighted sum statistic (WS) and PE-TOW is the PE version of the test based on the optimally weighted combination of variants (TOW). Using extensive simulation studies, we are able to show that (1) PE-TOW and PE-WS are consistently more powerful than TOW and WS, respectively, and (2) PE is the most powerful test when causal variants contain both common and rare variants.


Subject(s)
Genetic Association Studies/standards , Genetic Variation/genetics , Predictive Value of Tests , Quantitative Trait, Heritable , Algorithms , Computer Simulation , Humans , Models, Genetic , Phenotype , Reproducibility of Results
14.
Ann Hum Genet ; 82(6): 389-395, 2018 11.
Article in English | MEDLINE | ID: mdl-29932453

ABSTRACT

In the study of complex diseases, several correlated phenotypes are usually measured. There is also increasing evidence showing that testing the association between a single-nucleotide polymorphism (SNP) and multiple-dependent phenotypes jointly is often more powerful than analyzing only one phenotype at a time. Therefore, developing statistical methods to test for genetic association with multiple phenotypes has become increasingly important. In this paper, we develop an Allele-based Clustering Approach (ACA) for the joint analysis of multiple non-normal phenotypes in association studies. In ACA, we consider the alleles at a SNP of interest as a dependent variable with two classes, and the correlated phenotypes as predictors to predict the alleles at the SNP of interest. We perform extensive simulation studies to evaluate the performance of ACA and compare the power of ACA with the powers of Adaptive Fisher's Combination test, Trait-based Association Test that uses Extended Simes procedure, Fisher's Combination test, the standard MANOVA, and the joint model of Multiple Phenotypes. Our simulation studies show that the proposed method has correct type I error rates and is much more powerful than other methods for some non-normal distributions.


Subject(s)
Cluster Analysis , Genetic Association Studies/methods , Models, Genetic , Phenotype , Polymorphism, Single Nucleotide , Alleles , Computer Simulation , Humans
15.
Genet Epidemiol ; 39(4): 294-305, 2015 May.
Article in English | MEDLINE | ID: mdl-25758547

ABSTRACT

Population stratification has long been recognized as an issue in genetic association studies because unrecognized population stratification can lead to both false-positive and false-negative findings and can obscure true association signals if not appropriately corrected. This issue can be even worse in rare variant association analyses because rare variants often demonstrate stronger and potentially different patterns of stratification than common variants. To correct for population stratification in genetic association studies, we proposed a novel method to Test the effect of an Optimally Weighted combination of variants in Admixed populations (TOWA) in which the analytically derived optimal weights can be calculated from existing phenotype and genotype data. TOWA up-weights rare variants and those variants that have strong associations with the phenotype. Additionally, it can adjust for the direction of the association, and allows for local ancestry difference among study subjects. Extensive simulations show that the type I error rate of TOWA is under control in the presence of population stratification and that it is more powerful than existing methods. We have also applied TOWA to a real sequencing data set. Our simulation studies as well as real data analysis results indicate that TOWA is a useful tool for rare variant association analyses in admixed populations.


Subject(s)
Algorithms , Genetic Association Studies/methods , Genetic Predisposition to Disease , Genetic Variation , Genetics, Population , Models, Genetic , Population Groups/genetics , Case-Control Studies , Computer Simulation , Genotype , Hematocrit , Humans , Phenotype
16.
Ann Hum Genet ; 80(3): 162-71, 2016 May.
Article in English | MEDLINE | ID: mdl-26990300

ABSTRACT

The joint analysis of multiple traits has recently become popular since it can increase statistical power to detect genetic variants and there is increasing evidence showing that pleiotropy is a widespread phenomenon in complex diseases. Currently, the majority of existing methods for the joint analysis of multiple traits test association between one common variant and multiple traits. However, the variant-by-variant methods for common variant association studies may not be optimal for rare variant association studies due to the allelic heterogeneity as well as the extreme rarity of individual variants. Current statistical methods for rare variant association studies are for one single trait only. In this paper, we propose an adaptive weighting reverse regression (AWRR) method to test association between multiple traits and rare variants in a genomic region. AWRR is robust to the directions of effects of causal variants and is also robust to the directions of association of traits. Using extensive simulation studies, we compare the performance of AWRR with canonical correlation analysis (CCA), Single-TOW, and the weighted sum reverse regression (WSRR). Our results show that, in all of the simulation scenarios, AWRR is consistently more powerful than CCA. In most scenarios, AWRR is more powerful than Single-TOW and WSRR.


Subject(s)
Genetic Association Studies , Genetic Variation , Computer Simulation , Genotype , Humans , Models, Genetic , Phenotype , Regression Analysis
17.
Hum Hered ; 80(3): 144-52, 2015.
Article in English | MEDLINE | ID: mdl-27344597

ABSTRACT

BACKGROUND/AIMS: Genome-wide association studies (GWAS) have identified many variants that each affect multiple phenotypes, which suggests that pleiotropic effects on human complex phenotypes may be widespread. Therefore, statistical methods that can jointly analyze multiple phenotypes in GWAS may have advantages over analyzing each phenotype individually. Several statistical methods have been developed to utilize such multivariate phenotypes in genetic association studies; however, the performance of these methods under different scenarios is largely unknown. Our goal was to provide researchers with useful guidelines on selecting statistical methods for the application of real data to multiple phenotypes. METHODS: In this study, we evaluated the performance of some of the existing methods for association studies using multiple phenotypes. These methods included the O'Brien method (OB), cross-validation method (CV), optimal weight method (OW), Trait-based Association Test that uses Extended Simes procedure (TATES), principal components of heritability (PCH), canonical correlation analysis (CCA), multivariate analysis of variance (MANOVA), and a joint model of multiple phenotypes (MultiPhen). We used simulation studies to compare the powers of these methods under a variety of scenarios, including different numbers of phenotypes, different values of between-phenotype correlation, different minor allele frequencies, and different mean and variance models. RESULTS AND CONCLUSION: Our simulation results show that there is no single method with consistently good performance among all the scenarios. Each method has its own advantages and disadvantages.


Subject(s)
Genetic Association Studies/methods , Models, Genetic , Phenotype , Computer Simulation , Data Interpretation, Statistical , Gene Frequency , Genetic Association Studies/standards , Humans
18.
Genet Epidemiol ; 38(2): 135-43, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24382753

ABSTRACT

With the development of sequencing technologies, the direct testing of rare variant associations has become possible. Many statistical methods for detecting associations between rare variants and complex diseases have recently been developed, most of which are population-based methods for unrelated individuals. A limitation of population-based methods is that spurious associations can occur when there is a population structure. For rare variants, this problem can be more serious, because the spectrum of rare variation can be very different in diverse populations and because there are currently no methods to control for population stratification in population-based rare variant association studies. A solution to the problem of population stratification is to use family-based association tests, which use family members to control for population stratification. In this article, we propose a novel test for Testing the Optimally Weighted combination of variants based on data of Parents and Affected Children (TOW-PAC). TOW-PAC is a family-based association test that tests the combined effect of rare and common variants in a genomic region, and is robust to the directions of the effects of causal variants. Simulation studies confirm that, for rare variant associations, family-based association tests are robust to population stratification although population-based association tests can be seriously confounded by population stratification. The results of power comparisons show that the power of TOW-PAC increases with the number of affected children in each family and that TOW-PAC based on multiple affected children per family is more powerful than TOW based on unrelated individuals.


Subject(s)
Genetic Predisposition to Disease , Genetic Variation , Child , Computer Simulation , Female , Genetics, Population , Genome, Human , Humans , Male , Models, Genetic , Parents , Pedigree
19.
Genet Epidemiol ; 38(6): 494-501, 2014 Sep.
Article in English | MEDLINE | ID: mdl-25065727

ABSTRACT

Next-generation sequencing technologies make direct testing of rare variant associations possible. However, the development of powerful statistical methods for rare variant association studies is still underway. Most existing methods are burden tests or quadratic tests. Recent studies show that the performance of both burden and quadratic tests depends strongly on the underlying assumptions and that no test demonstrates consistently acceptable power. Thus, combined tests that pool information from the burden and quadratic tests have been proposed recently. However, results from recent studies (including this study) show that there exist tests that can outperform both burden and quadratic tests. In this article, we propose three classes of tests that include tests outperforming both burden and quadratic tests. We then propose the optimal combination of single-variant tests (OCST), which combines information from tests of the three classes. We use extensive simulation studies to compare the performance of OCST with that of burden, quadratic, and optimal single-variant tests. Our results show that OCST either is the most powerful test or has power similar to that of the most powerful test. We also compare the performance of OCST with that of the two existing combined tests. Our results show that OCST has better power than the two combined tests.
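The dependence of burden and quadratic tests on the underlying assumptions can be seen in a small sketch. The code below is not OCST; it computes unnormalized burden-type and quadratic-type (sum of squared scores) statistics from per-variant score statistics. The function names, weights, and toy data are assumptions chosen for illustration, and a real analysis would still need variances or permutations to obtain p-values.

```python
from typing import List, Sequence


def score_vector(y: Sequence[float],
                 genotypes: Sequence[Sequence[int]]) -> List[float]:
    """Per-variant score U_j = sum_i (y_i - mean(y)) * g_ij."""
    n = len(y)
    y_bar = sum(y) / n
    n_variants = len(genotypes[0])
    return [
        sum((y[i] - y_bar) * genotypes[i][j] for i in range(n))
        for j in range(n_variants)
    ]


def burden_statistic(scores: Sequence[float],
                     weights: Sequence[float]) -> float:
    """Collapse first, then square: (sum_j w_j * U_j)^2."""
    return sum(w * u for w, u in zip(weights, scores)) ** 2


def quadratic_statistic(scores: Sequence[float],
                        weights: Sequence[float]) -> float:
    """Square first, then sum: sum_j (w_j * U_j)^2."""
    return sum((w * u) ** 2 for w, u in zip(weights, scores))


# Toy data (hypothetical): two cases, two controls, two variants.
# Variant 0 appears only in cases, variant 1 only in controls,
# so the two effects point in opposite directions.
phenotype = [1, 1, 0, 0]
genotype_matrix = [[1, 0], [1, 0], [0, 1], [0, 1]]
equal_weights = [1.0, 1.0]

scores = score_vector(phenotype, genotype_matrix)
burden = burden_statistic(scores, equal_weights)
quadratic = quadratic_statistic(scores, equal_weights)
print(scores, burden, quadratic)
```

In this toy case the two scores cancel inside the burden statistic, while the quadratic statistic keeps both signals. When all effects share one direction, the burden form tends to be the better choice, which is why neither is uniformly preferable.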


Subject(s)
Genetic Association Studies , Genetic Variation , A Kinase Anchor Proteins/genetics , ATPases Associated with Diverse Cellular Activities , Adenosine Triphosphatases/genetics , Collagen Type VI/genetics , Computer Simulation , Extracellular Matrix Proteins/genetics , Gene Frequency , Genotype , Haplotypes , High-Throughput Nucleotide Sequencing , Humans , Membrane Proteins/genetics , Minor Histocompatibility Antigens , Neoplasm Proteins/genetics , Proto-Oncogene Proteins/genetics
20.
Genes (Basel) ; 15(1)2024 01 03.
Article in English | MEDLINE | ID: mdl-38254957

ABSTRACT

Genome-wide association studies (GWAS) have successfully revealed many disease-associated genetic variants. For a case-control study, adequate power of an association test can be achieved with a large sample size, although genotyping large samples is expensive. A cost-effective strategy to boost power is to integrate external control samples with publicly available genotyped data. However, the naive integration of external controls may inflate the type I error rate if systematic differences between studies (batch effects), such as differences in sequencing platforms, genotype-calling procedures, or population stratification, are ignored. To account for the batch effect, we propose an approach that integrates External Controls into the Association Test by Regression Calibration (iECAT-RC) in case-control association studies. Extensive simulation studies show that iECAT-RC not only controls type I error rates but also boosts statistical power in all models. We also apply iECAT-RC to the UK Biobank data for M72 Fibroblastic disorders by considering genotype calling as the batch effect. Four SNPs associated with fibroblastic disorders were detected by iECAT-RC and by the two comparison methods, iECAT-Score and Internal. However, our method has a higher probability of identifying these significant SNPs in the scenario of an unbalanced case-control association study.


Subject(s)
Genome-Wide Association Study , Calibration , Case-Control Studies , Computer Simulation , Genotype