Results 1 - 17 of 17
1.
Int J Drug Policy ; 125: 104340, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38342052

ABSTRACT

BACKGROUND: There is substantial geographic variability in local cannabis policies within states that have legalized recreational cannabis. This study develops an interpretable machine learning model that uses county-level population demographics, sociopolitical factors, and estimates of substance use and mental illness prevalences to predict the legality of recreational cannabis sales within each U.S. county. METHODS: We merged data and selected 14 model inputs from the 2010 Census, 2012 County Presidential Data from the MIT Elections Lab, and Small Area Estimates from the National Surveys on Drug Use and Health (NSDUH) from 2010 to 2012 at the county level. Counties were labeled as recreational cannabis legal (RCL) if the sale of recreational cannabis was allowed anywhere in the county in 2014, resulting in 92 RCL and 3002 non-RCL counties. We used synthetic data augmentation and minority oversampling techniques to build an ensemble of 1000 logistic regressions on random sub-samples of the data, withholding one state at a time and building models from all remaining states. Performance was evaluated by comparing the predicted policy conditions with the actual outcomes in 2014. RESULTS: When compared to the actual RCL policies in 2014, the ensemble's predictions of counties transitioning to RCL had a macro-averaged F1 score of 0.61. The main factors associated with legalizing county-level recreational cannabis sales were the prevalences of past-month cannabis use and past-year cocaine use. CONCLUSION: By leveraging publicly available data from 2010 to 2012, our model was able to achieve appreciable discrimination in predicting counties with legal recreational cannabis sales in 2014; however, there is room for improvement. Having demonstrated model performance in the first handful of states to legalize cannabis, additional testing with more recent data using time-to-event models is warranted.


Subject(s)
Cannabis , Marijuana Use , Humans , United States , Legislation, Drug , Marijuana Use/epidemiology , Commerce , Public Policy
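
The macro-averaged F1 score reported in the abstract above averages the per-class F1 scores without weighting by class size, which matters when one class (92 RCL counties) is far smaller than the other (3002 non-RCL counties). The following is a minimal sketch of that metric only, not the authors' pipeline; the function name and the toy labels are assumptions made for illustration.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class is never hit.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy labels (hypothetical, not study data): 1 = RCL, 0 = non-RCL.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
score = macro_f1(y_true, y_pred)
```

In this toy case the majority class scores 0.875 and the minority class 0.5, so the macro average is pulled down by the rare class even though overall accuracy is 80%.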
2.
Article in English | MEDLINE | ID: mdl-35085091

ABSTRACT

The genetic etiologies of common diseases are highly complex and heterogeneous. Classic methods, such as linear regression, have successfully identified numerous variants associated with complex diseases. Nonetheless, for most diseases, the identified variants only account for a small proportion of heritability. Challenges remain to discover additional variants contributing to complex diseases. Expectile regression is a generalization of linear regression and provides complete information on the conditional distribution of a phenotype of interest. While expectile regression has many nice properties, it has rarely been used in genetic research. In this paper, we develop an expectile neural network (ENN) method for genetic data analyses of complex diseases. Similar to expectile regression, ENN provides a comprehensive view of relationships between genetic variants and disease phenotypes, which can be used to discover variants predisposing to sub-populations. We further integrate the idea of neural networks into ENN, making it capable of capturing non-linear and non-additive genetic effects (e.g., gene-gene interactions). Through simulations, we showed that the proposed method outperformed an existing expectile regression when there exist complex genotype-phenotype relationships. We also applied the proposed method to the data from the Study of Addiction: Genetics and Environment (SAGE), investigating the relationships of candidate genes with smoking quantity.


Subject(s)
Genetic Variation , Neural Networks, Computer , Phenotype , Linear Models
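
Expectile regression, as referenced in the abstract above, replaces the symmetric squared loss of linear regression with an asymmetric one, so that different levels tau describe different parts of the conditional distribution. The sketch below shows only the loss and the expectile of a plain sample; it is not the ENN method, and the function names and toy values are assumptions for illustration.

```python
def expectile_loss(residual, tau):
    """Asymmetric squared loss: weight tau above the fit, 1 - tau below it."""
    weight = tau if residual >= 0 else 1.0 - tau
    return weight * residual ** 2

def sample_expectile(values, tau, iterations=100):
    """Tau-expectile of a sample, found by iteratively reweighted means."""
    estimate = sum(values) / len(values)
    for _ in range(iterations):
        # Points above the current estimate get weight tau, the rest 1 - tau.
        weights = [tau if v > estimate else 1.0 - tau for v in values]
        estimate = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return estimate

phenotype = [1.0, 2.0, 3.0, 4.0, 10.0]  # toy values, not SAGE data
middle = sample_expectile(phenotype, 0.5)  # tau = 0.5 recovers the mean
upper = sample_expectile(phenotype, 0.9)   # tau = 0.9 leans toward the upper tail
```

With tau = 0.5 the loss is the ordinary squared loss, which is why expectile regression is described as a generalization of linear regression.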
3.
Diabetes Care ; 46(5): 929-937, 2023 05 01.
Article in English | MEDLINE | ID: mdl-36383734

ABSTRACT

OBJECTIVE: Environmental exposures may have greater predictive power for type 2 diabetes than polygenic scores (PGS). Studies examining environmental risk factors, however, have included only individuals with European ancestry, limiting the applicability of results. We conducted an exposome-wide association study in the multiancestry Personalized Environment and Genes Study to assess the effects of environmental factors on type 2 diabetes. RESEARCH DESIGN AND METHODS: Using logistic regression for single-exposure analysis, we identified exposures associated with type 2 diabetes, adjusting for age, BMI, household income, and self-reported sex and race. To compare cumulative genetic and environmental effects, we computed an overall clinical score (OCS) as a weighted sum of BMI and prediabetes, hypertension, and high cholesterol status and a polyexposure score (PXS) as a weighted sum of 13 environmental variables. Using UK Biobank data, we developed a multiancestry PGS and calculated it for participants. RESULTS: We found 76 significant associations with type 2 diabetes, including novel associations of asbestos and coal dust exposure. OCS, PXS, and PGS were significantly associated with type 2 diabetes. PXS had moderate power to determine associations, with larger effect size and greater power and reclassification improvement than PGS. For all scores, the results differed by race. CONCLUSIONS: Our findings in a multiancestry cohort elucidate how type 2 diabetes odds can be attributed to clinical, genetic, and environmental factors and emphasize the need for exposome data in disease-risk association studies. Race-based differences in predictive scores highlight the need for genetic and exposome-wide studies in diverse populations.


Subject(s)
Diabetes Mellitus, Type 2 , Hypertension , Humans , Diabetes Mellitus, Type 2/epidemiology , Diabetes Mellitus, Type 2/genetics , Hypertension/complications , Environmental Exposure , Multifactorial Inheritance/genetics , Surveys and Questionnaires , Genome-Wide Association Study , Risk Factors
4.
Brief Bioinform ; 23(3)2022 05 13.
Article in English | MEDLINE | ID: mdl-35383355

ABSTRACT

Heritability, the proportion of phenotypic variance explained by genome-wide single nucleotide polymorphisms (SNPs) in unrelated individuals, is an important measure of the genetic contribution to human diseases and plays a critical role in studying the genetic architecture of human diseases. Linear mixed model (LMM) has been widely used for SNP heritability estimation, where variance component parameters are commonly estimated by using a restricted maximum likelihood (REML) method. REML is an iterative optimization algorithm, which is computationally intensive when applied to large-scale datasets (e.g. UK Biobank). To facilitate the heritability analysis of large-scale genetic datasets, we develop a fast approach, minimum norm quadratic unbiased estimator (MINQUE) with batch training, to estimate variance components from LMM (LMM.MNQ.BCH). In LMM.MNQ.BCH, the parameters are estimated by MINQUE, which has a closed-form solution for fast computation and has no convergence issue. Batch training has also been adopted in LMM.MNQ.BCH to accelerate the computation for large-scale genetic datasets. Through simulations and real data analysis, we demonstrate that LMM.MNQ.BCH is much faster than two existing approaches, GCTA and BOLT-REML.


Subject(s)
Genome-Wide Association Study , Models, Genetic , Genome , Genome-Wide Association Study/methods , Humans , Linear Models , Polymorphism, Single Nucleotide
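
To illustrate why a closed-form estimator avoids the iterations of REML, the sketch below solves a MINQUE(0)-style system of moment equations for two variance components (genetic and residual). It is a simplified illustration under stated assumptions, not the LMM.MNQ.BCH implementation: the phenotype is centered with no other fixed effects, the identity matrix is used as the prior weight, batch training is omitted, and all names and simulated values are hypothetical.

```python
import numpy as np

def minque0_variance_components(y, kernels):
    """Solve sum_j trace(V_i V_j) * theta_j = y' V_i y for the variance components.

    Assumes y is centered with no other fixed effects and that every
    kernel V_i is symmetric (both are simplifications for this sketch).
    """
    y = np.asarray(y, dtype=float)
    # For symmetric matrices, trace(V_i V_j) equals the sum of elementwise products.
    S = np.array([[np.sum(Vi * Vj) for Vj in kernels] for Vi in kernels])
    q = np.array([y @ Vi @ y for Vi in kernels])
    return np.linalg.solve(S, q)

# Simulated toy data (hypothetical sizes and effect variances).
rng = np.random.default_rng(0)
n, m = 300, 500
genotypes = rng.binomial(2, 0.3, size=(n, m)).astype(float)
standardized = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)
grm = standardized @ standardized.T / m  # genetic relationship matrix
effects = rng.normal(0.0, np.sqrt(0.5 / m), size=m)
y = standardized @ effects + rng.normal(0.0, np.sqrt(0.5), size=n)
y = y - y.mean()

theta = minque0_variance_components(y, [grm, np.eye(n)])
h2 = theta[0] / theta.sum()  # estimated SNP heritability
```

The estimate comes from a single linear solve, so there is no convergence criterion to satisfy; the cost is dominated by the matrix products, which is the part batch processing would target.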
5.
J Am Acad Child Adolesc Psychiatry ; 61(7): 934-945, 2022 07.
Article in English | MEDLINE | ID: mdl-35378236

ABSTRACT

OBJECTIVE: To investigate the genetic architecture of internalizing symptoms in childhood and adolescence. METHOD: In 22 cohorts, multiple univariate genome-wide association studies (GWASs) were performed using repeated assessments of internalizing symptoms, in a total of 64,561 children and adolescents between 3 and 18 years of age. Results were aggregated in meta-analyses that accounted for sample overlap, first using all available data, and then using subsets of measurements grouped by rater, age, and instrument. RESULTS: The meta-analysis of overall internalizing symptoms (INToverall) detected no genome-wide significant hits and showed low single nucleotide polymorphism (SNP) heritability (1.66%, 95% CI = 0.84-2.48%, neffective = 132,260). Stratified analyses indicated rater-based heterogeneity in genetic effects, with self-reported internalizing symptoms showing the highest heritability (5.63%, 95% CI = 3.08%-8.18%). The contribution of additive genetic effects on internalizing symptoms appeared to be stable over age, with overlapping estimates of SNP heritability from early childhood to adolescence. Genetic correlations were observed with adult anxiety, depression, and the well-being spectrum (|rg| > 0.70), as well as with insomnia, loneliness, attention-deficit/hyperactivity disorder, autism, and childhood aggression (range |rg| = 0.42-0.60), whereas there were no robust associations with schizophrenia, bipolar disorder, obsessive-compulsive disorder, or anorexia nervosa. CONCLUSION: Genetic correlations indicate that childhood and adolescent internalizing symptoms share substantial genetic vulnerabilities with adult internalizing disorders and other childhood psychiatric traits, which could partially explain both the persistence of internalizing symptoms over time and the high comorbidity among childhood psychiatric traits. Reducing phenotypic heterogeneity in childhood samples will be key in paving the way to future GWAS success.


Subject(s)
Attention Deficit Disorder with Hyperactivity , Autistic Disorder , Genome-Wide Association Study , Sleep Initiation and Maintenance Disorders , Adolescent , Adult , Aggression , Anxiety/genetics , Attention Deficit Disorder with Hyperactivity/genetics , Autistic Disorder/genetics , Bipolar Disorder , Child , Child, Preschool , Depression/genetics , Humans , Loneliness , Polymorphism, Single Nucleotide , Schizophrenia , Sleep Initiation and Maintenance Disorders/genetics
6.
Cancer Commun (Lond) ; 41(12): 1387-1397, 2021 12.
Article in English | MEDLINE | ID: mdl-34520132

ABSTRACT

BACKGROUND: DNA methylation and gene expression are known to play important roles in the etiology of human diseases such as prostate cancer (PCa). However, it has not yet been possible to incorporate information of DNA methylation and gene expression into polygenic risk scores (PRSs). Here, we aimed to develop and validate an improved PRS for PCa risk by incorporating genetically predicted gene expression and DNA methylation, and other genomic information using an integrative method. METHODS: Using data from the PRACTICAL consortium, we derived multiple sets of genetic scores, including those based on available single-nucleotide polymorphisms through widely used methods of pruning and thresholding, LDpred, LDpred-funct, AnnoPred, and EBPRS, as well as PRS constructed using the genetically predicted gene expression and DNA methylation through a revised pruning and thresholding strategy. In the tuning step, using the UK Biobank data (1458 prevalent cases and 1467 controls), we selected PRSs with the best performance. Using an independent set of data from the UK Biobank, we developed an integrative PRS combining information from individual scores. Furthermore, in the testing step, we tested the performance of the integrative PRS in another independent set of UK Biobank data of incident cases and controls. RESULTS: Our constructed PRS had improved performance (C statistics: 76.1%) over PRSs constructed by individual benchmark methods (from 69.6% to 74.7%). Furthermore, our new PRS had much higher risk assessment power than family history. The overall net reclassification improvement was 69.0% by adding PRS to the baseline model compared with 12.5% by adding family history. CONCLUSIONS: We developed and validated a new PRS which may improve the utility in predicting the risk of developing PCa. Our innovative method can also be applied to other human diseases to improve risk prediction across multiple outcomes.


Subject(s)
DNA Methylation , Prostatic Neoplasms , DNA Methylation/genetics , Gene Expression , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Male , Multifactorial Inheritance , Prostatic Neoplasms/genetics , Risk Factors
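
The C statistic quoted in the abstract above can be read as the probability that a randomly chosen case receives a higher score than a randomly chosen control. The sketch below computes it by direct pairwise comparison; the function name and the toy scores are assumptions for illustration and do not come from the PRACTICAL or UK Biobank data.

```python
def c_statistic(scores_cases, scores_controls):
    """Share of case-control pairs in which the case scores higher (ties count half)."""
    concordant = 0.0
    for case in scores_cases:
        for control in scores_controls:
            if case > control:
                concordant += 1.0
            elif case == control:
                concordant += 0.5
    return concordant / (len(scores_cases) * len(scores_controls))

# Toy risk scores (hypothetical).
cases = [0.9, 0.7, 0.4]
controls = [0.8, 0.3, 0.2, 0.4]
c = c_statistic(cases, controls)
```

A value of 0.5 corresponds to a score that ranks cases and controls no better than chance, and 1.0 to perfect separation.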
7.
Transl Psychiatry ; 11(1): 413, 2021 07 30.
Article in English | MEDLINE | ID: mdl-34330890

ABSTRACT

Childhood aggressive behavior (AGG) has a substantial heritability of around 50%. Here we present a genome-wide association meta-analysis (GWAMA) of childhood AGG, in which all phenotype measures across childhood ages from multiple assessors were included. We analyzed phenotype assessments for a total of 328 935 observations from 87 485 children aged between 1.5 and 18 years, while accounting for sample overlap. We also meta-analyzed within subsets of the data, i.e., within rater, instrument and age. SNP-heritability for the overall meta-analysis (AGGoverall) was 3.31% (SE = 0.0038). We found no genome-wide significant SNPs for AGGoverall. The gene-based analysis returned three significant genes: ST3GAL3 (P = 1.6E-06), PCDH7 (P = 2.0E-06), and IPO13 (P = 2.5E-06). All three genes have previously been associated with educational traits. Polygenic scores based on our GWAMA significantly predicted aggression in a holdout sample of children (variance explained = 0.44%) and in retrospectively assessed childhood aggression (variance explained = 0.20%). Genetic correlations (rg) among rater-specific assessment of AGG ranged from rg = 0.46 between self- and teacher-assessment to rg = 0.81 between mother- and teacher-assessment. We obtained moderate-to-strong rgs with selected phenotypes from multiple domains, but hardly with any of the classical biomarkers thought to be associated with AGG. Significant genetic correlations were observed with most psychiatric and psychological traits (range |rg|: 0.19-1.00), except for obsessive-compulsive disorder. Aggression had a negative genetic correlation (rg = ~-0.5) with cognitive traits and age at first birth. Aggression was strongly genetically correlated with smoking phenotypes (range |rg|: 0.46-0.60). The genetic correlations between aggression and psychiatric disorders were weaker for teacher-reported AGG than for mother- and self-reported AGG. The current GWAMA of childhood aggression provides a powerful tool to interrogate the rater-specific genetic etiology of AGG.


Subject(s)
Aggression , Mental Disorders , Adolescent , Child , Child, Preschool , Female , Genetic Association Studies , Genome-Wide Association Study , Humans , Infant , Retrospective Studies
8.
Front Genet ; 12: 654717, 2021.
Article in English | MEDLINE | ID: mdl-34040634

ABSTRACT

Background: Insomnia is a common mental disorder, affecting nearly one fifth of the pre-adult population in the United States. The recent, largest genome-wide association study (GWAS) conducted on the United Kingdom Biobank cohort identified hundreds of significant single-nucleotide polymorphisms (SNPs), allowing epidemiologists to quantify individual genetic predisposition in subsequent studies via the polygenic risk scoring technique. The nucleotide polymorphisms and risk scoring, while able to generalize to other adult populations of European origin, have not yet been tested on pediatric and adolescent populations of diverse racial-ethnic backgrounds, and our study intends to fill these gaps. Materials and Methods: We took the summary statistics of the same United Kingdom Biobank study and conducted a polygenic risk score (PRS) analysis on a multi-ethnicity, pre-adult population provided by the Adolescent Brain Cognitive Development (ABCD) Study. Results: The PRSs based on the significant nucleotide polymorphisms found in white British adults are strong predictors of insomnia in children of similar European background but lack power in non-European groups. Conclusions: Through polygenic risk scoring, the knowledge of insomnia genetics summarized from a white adult study population is transferable to a younger age group, which aids the search for actionable targets of early insomnia prevention. Yet population stratification may prevent easy generalization across ethnic lines; therefore, it is necessary to conduct group-specific studies to aid people of non-European genetic background.
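
Polygenic risk scoring, as used in the abstract above, amounts to a weighted sum of each individual's risk-allele dosages, with weights taken from GWAS summary statistics and restricted to SNPs that pass a significance threshold. The sketch below shows that calculation under simplifying assumptions (no LD pruning or clumping, no ancestry adjustment); the function name, threshold, and toy values are hypothetical and not taken from the UK Biobank or ABCD data.

```python
import numpy as np

def polygenic_risk_score(dosages, effect_sizes, p_values, p_threshold=5e-8):
    """Weighted sum of risk-allele dosages over SNPs with p below the threshold."""
    dosages = np.asarray(dosages, dtype=float)  # shape (individuals, SNPs), values 0 to 2
    effect_sizes = np.asarray(effect_sizes, dtype=float)
    keep = np.asarray(p_values, dtype=float) < p_threshold
    return dosages[:, keep] @ effect_sizes[keep]

# Toy inputs (hypothetical): two individuals, three SNPs.
dosages = [[0, 1, 2],
           [2, 0, 0]]
betas = [0.10, -0.05, 0.20]     # per-allele effect sizes from summary statistics
p_values = [1e-9, 0.2, 1e-10]   # the second SNP fails the threshold
scores = polygenic_risk_score(dosages, betas, p_values)
```

Because the weights are estimated in one population, the same score can lose predictive power elsewhere, which is the limitation the abstract reports for non-European groups.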

9.
Front Genet ; 10: 448, 2019.
Article in English | MEDLINE | ID: mdl-31164900

ABSTRACT

While substantial progress has been made in finding genetic variants associated with nicotine dependence (ND), a large proportion of the genetic variants remain undiscovered. Although the focus of current research has shifted toward uncovering rare variants, gene-gene/gene-environment interactions, and structural variations predisposing to ND, the impact of genetic heterogeneity in ND has received less attention. The study of genetic heterogeneity in ND could not only enhance the power of detecting genetic variants with heterogeneous effects in the population but also improve our understanding of the genetic etiology of ND. As an initial step to understand genetic heterogeneity in ND, we applied a newly developed heterogeneity weighted U (HWU) method to 26 ND-related genes, investigating heterogeneous effects of these 26 genes in ND. We found no strong evidence of genetic heterogeneity in genes such as CHRNA5. However, results from our analysis suggest heterogeneous effects of CHRNA6 and CHRNB3 on nicotine dependence in males and females. Following the gene-based analysis, we further conducted a joint association analysis of two gene clusters, CHRNA5-CHRNA3-CHRNB4 and CHRNB3-CHRNA6. While both CHRNA5-CHRNA3-CHRNB4 and CHRNB3-CHRNA6 clusters are significantly associated with ND, there is a much stronger association of CHRNB3-CHRNA6 with ND when considering heterogeneous effects by gender (p-value = 2.11E-07).

10.
BMC Genet ; 20(1): 40, 2019 04 10.
Article in English | MEDLINE | ID: mdl-30967125

ABSTRACT

BACKGROUND: The advance of high-throughput technologies has made it cost-effective to collect diverse types of omic data in large-scale clinical and biological studies. While the collection of the vast amounts of multi-level omic data from these studies provides a great opportunity for genetic research, the high dimensionality of omic data and complex relationships among multi-level omic data bring tremendous analytic challenges. RESULTS: To address these challenges, we develop an integrative U (IU) method for the design and analysis of multi-level omic data. While non-parametric methods make fewer model assumptions and are flexible for analyzing different types of phenotypes and omic data, they have been less developed for association analysis of omic data. The IU method is a nonparametric method that can accommodate various types of omic and phenotype data, and consider interactive relationships among different levels of omic data. Through simulations and a real data application, we compare the IU test with commonly used variance component tests. CONCLUSIONS: Results show that the proposed test attains more robust type I error performance and higher empirical power than variance component tests under various types of phenotypes and different underlying interaction effects.


Subject(s)
Computational Biology , Genome-Wide Association Study/methods , Genomics , Algorithms , Computational Biology/methods , Genomics/methods , Humans
11.
Genetics ; 210(2): 463-476, 2018 10.
Article in English | MEDLINE | ID: mdl-30104420

ABSTRACT

The genetic etiology of many complex diseases is highly heterogeneous. A complex disease can be caused by multiple mutations within the same gene or mutations in multiple genes at various genomic loci. Although these disease-susceptibility mutations can be collectively common in the population, they are often individually rare or even private to certain families. Family-based studies are powerful for detecting rare variants enriched in families, which is an important feature for sequencing studies due to the heterogeneous nature of rare variants. In addition, family designs can provide robust protection against population stratification. Nevertheless, statistical methods for analyzing family-based sequencing data are underdeveloped, especially those accounting for heterogeneous etiology of complex diseases. In this article, we introduce a random field framework for detecting gene-phenotype associations in family-based sequencing studies, referred to as family-based genetic random field (FGRF). Similar to existing family-based association tests, FGRF could utilize within-family and between-family information separately or jointly to test an association. We demonstrate that FGRF has comparable statistical power with existing methods when there is no genetic heterogeneity, but can improve statistical power when there is genetic heterogeneity across families. The proposed method also shares the same advantages with the conventional family-based association tests (e.g., being robust to population stratification). Finally, we applied the proposed method to sequencing data from the Minnesota Twin Family Study, and revealed several genes, including SAMD14, potentially associated with alcohol dependence.


Subject(s)
Genetic Heterogeneity , Genetic Predisposition to Disease , Models, Genetic , Mutation Rate , Pedigree , Alcoholism/genetics , Algorithms , Humans , Phenotype
12.
JAMA Psychiatry ; 74(12): 1242-1250, 2017 12 01.
Article in English | MEDLINE | ID: mdl-28979981

ABSTRACT

Importance: Antisocial behavior (ASB) places a large burden on perpetrators, survivors, and society. Twin studies indicate that half of the variation in this trait is genetic. Specific causal genetic variants have, however, not been identified. Objectives: To estimate the single-nucleotide polymorphism-based heritability of ASB; to identify novel genetic risk variants, genes, or biological pathways; to test for pleiotropic associations with other psychiatric traits; and to reevaluate the candidate gene era data through the Broad Antisocial Behavior Consortium. Design, Setting, and Participants: Genome-wide association data from 5 large population-based cohorts and 3 target samples with genome-wide genotype and ASB data were used for meta-analysis from March 1, 2014, to May 1, 2016. All data sets used quantitative phenotypes, except for the Finnish Crime Study, which applied a case-control design (370 patients and 5850 control individuals). Main Outcome and Measures: This study adopted relatively broad inclusion criteria to achieve a quantitative measure of ASB derived from multiple measures, maximizing the sample size over different age ranges. Results: The discovery samples comprised 16 400 individuals, whereas the target samples consisted of 9381 individuals (all individuals were of European descent), including child and adult samples (mean age range, 6.7-56.1 years). Three promising loci with sex-discordant associations were found (8535 female individuals, chromosome 1: rs2764450, chromosome 11: rs11215217; 7772 male individuals, chromosome X, rs41456347). Polygenic risk score analyses showed prognostication of antisocial phenotypes in an independent Finnish Crime Study (2536 male individuals and 3684 female individuals) and shared genetic origin with conduct problems in a population-based sample (394 male individuals and 431 female individuals) but not with conduct disorder in a substance-dependent sample (950 male individuals and 1386 female individuals) (R2 = 0.0017 in the most optimal model, P = 0.03). Significant inverse genetic correlation of ASB with educational attainment (r = -0.52, P = .005) was detected. Conclusions and Relevance: The Broad Antisocial Behavior Consortium entails the largest collaboration to date on the genetic architecture of ASB, and the first results suggest that ASB may be highly polygenic and has potential heterogeneous genetic effects across sex.


Subject(s)
Antisocial Personality Disorder , Conduct Disorder , Adolescent , Adult , Antisocial Personality Disorder/epidemiology , Antisocial Personality Disorder/genetics , Antisocial Personality Disorder/psychology , Child , Conduct Disorder/epidemiology , Conduct Disorder/genetics , Conduct Disorder/psychology , Environment , Female , Finland/epidemiology , Genetic Variation , Genome-Wide Association Study , Humans , Male , Middle Aged , Multifactorial Inheritance , Sex Factors
13.
Genet Epidemiol ; 41(7): 636-643, 2017 11.
Article in English | MEDLINE | ID: mdl-28850771

ABSTRACT

Although sequencing studies hold great promise for uncovering novel variants predisposing to human diseases, the high dimensionality of the sequencing data brings tremendous challenges to data analysis. Moreover, for many complex diseases (e.g., psychiatric disorders) multiple related phenotypes are collected. These phenotypes can be different measurements of an underlying disease, or measurements characterizing multiple related diseases for studying common genetic mechanisms. Although jointly analyzing these phenotypes could potentially increase the power of identifying disease-associated genes, the different types of phenotypes pose challenges for association analysis. To address these challenges, we propose a nonparametric method, functional U-statistic method (FU), for multivariate analysis of sequencing data. It first constructs smooth functions from individuals' sequencing data, and then tests the association of these functions with multiple phenotypes by using a U-statistic. The method provides a general framework for analyzing various types of phenotypes (e.g., binary and continuous phenotypes) with unknown distributions. Fitting the genetic variants within a gene using a smoothing function also allows us to capture complexities of gene structure (e.g., linkage disequilibrium, LD), which could potentially increase the power of association analysis. Through simulations, we compared our method to the multivariate outcome score test (MOST), and found that our test attained better performance than MOST. In a real data application, we applied our method to the sequencing data from the Minnesota Twin Study (MTS) and found potential associations of several nicotine receptor subunit (CHRN) genes, including CHRNB3, with nicotine dependence and/or alcohol dependence.


Subject(s)
Diseases in Twins/genetics , Genetic Variation , Linkage Disequilibrium , Models, Genetic , Sequence Analysis, DNA/methods , Substance-Related Disorders/genetics , Diseases in Twins/epidemiology , Female , Humans , Male , Minnesota/epidemiology , Multivariate Analysis , Phenotype , Statistics, Nonparametric , Substance-Related Disorders/epidemiology
14.
BMC Proc ; 10(Suppl 7): 125-129, 2016.
Article in English | MEDLINE | ID: mdl-27980623

ABSTRACT

BACKGROUND: With the advance of next-generation sequencing technologies, the study of rare variants in targeted genome regions or even the whole genome becomes feasible. Nevertheless, the massive amount of sequencing data brings great computational and statistical challenges for association analyses. Aside from sequencing variants, other high-throughput omic data (eg, gene expression data) also become available, and can be incorporated into association analysis for better modeling and power improvement. This motivates the need of developing computationally efficient and powerful approaches to model the joint associations of multilevel omic data with complex human diseases. METHODS: A similarity-based weighted U approach is used to model the joint effect of sequencing variants and gene expression. Using a Mexican American sample provided by Genetic Analysis Workshop 19 (GAW19), we performed a whole-genome joint association analysis of sequencing variants and gene expression with systolic (SBP) and diastolic blood pressure (DBP) and hypertension (HTN) phenotypes. RESULTS: The whole-genome joint association analysis was completed in 80 min on a high-performance personal computer with an i7 4700 CPU and 8 GB memory. Although no gene reached statistical significance after adjusting for multiple testing, some top-ranked genes attained a high significance level and may have biological plausibility to hypertension-related phenotypes. CONCLUSIONS: The weighted U approach is computationally efficient for high-dimensional data analysis, and is capable of integrating multiple levels of omic data into association analysis. Through a real data application, we demonstrate the potential benefit of using the new approach for joint association analysis of sequencing variants and gene expression.

15.
BMC Proc ; 10(Suppl 7): 171-174, 2016.
Article in English | MEDLINE | ID: mdl-27980631

ABSTRACT

BACKGROUND: Genome-wide association studies have made substantial progress in identifying common variants associated with human diseases. Despite such success, a large portion of heritability remains unexplained. Evolutionary theory and empirical studies suggest that rare mutations could play an important role in human diseases, which motivates comprehensive investigation of rare variants in sequencing studies. To explore the association of rare variants with human diseases, many statistical approaches have been developed with different ways of modeling genetic structure (ie, linkage disequilibrium). Nevertheless, the appropriate strategy to model genetic structure of sequencing data and its effect on association analysis have not been well studied. METHODS: We investigate 3 statistical approaches that use 3 different strategies to model the genetic structure of sequencing data. We proceed by comparing a burden test that assumes independence among sequencing variants, a burden test that considers pairwise linkage disequilibrium (LD), and a functional analysis of variance (FANOVA) test that models genetic data through fitting continuous curves on individuals' genotypes. RESULTS: Through simulations, we find that FANOVA attains better or comparable performance to the 2 burden tests. Overall, the burden test that considers pairwise LD has comparable performance to the burden test that assumes independence between sequencing variants. However, for 1 gene, where the disease-associated variant is located in an LD block, we find that considering pairwise LD could improve the test's performance. CONCLUSIONS: The structure of sequencing variants is complex in nature and its patterns vary across the whole genome. In certain cases (eg, a disease-susceptibility variant is in an LD block), ignoring the genetic structure in the association analysis could result in suboptimal performance. Through this study, we show that a functional-based method is promising for modeling the underlying genetic structure of sequencing data, which could lead to better performance.
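
The simplest of the three approaches compared in the abstract above, a burden test that treats variants as independent, can be sketched as follows: each individual's rare-variant genotypes within a gene are collapsed into a single count, and the phenotype is then tested against that count. This is an illustrative sketch only; the LD-aware burden test and FANOVA are not shown, and the function names, equal weights, and toy data are assumptions.

```python
import math

def burden_scores(genotypes, weights=None):
    """Collapse each individual's rare-variant genotypes into one weighted count."""
    n_variants = len(genotypes[0])
    weights = weights or [1.0] * n_variants  # equal weights: variants treated alike
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]

def burden_t_statistic(genotypes, phenotype):
    """t statistic (n - 2 degrees of freedom) for phenotype regressed on burden."""
    burden = burden_scores(genotypes)
    n = len(phenotype)
    mean_b = sum(burden) / n
    mean_y = sum(phenotype) / n
    sxy = sum((b - mean_b) * (y - mean_y) for b, y in zip(burden, phenotype))
    sxx = sum((b - mean_b) ** 2 for b in burden)
    syy = sum((y - mean_y) ** 2 for y in phenotype)
    r = sxy / math.sqrt(sxx * syy)
    return r * math.sqrt((n - 2) / (1.0 - r * r))

# Toy data (hypothetical): 6 individuals, 3 rare variants in one gene.
genotypes = [[0, 0, 0], [1, 0, 0], [0, 0, 0], [1, 1, 0], [0, 0, 1], [0, 0, 0]]
phenotype = [1.0, 2.0, 1.5, 3.5, 2.5, 0.5]
t_stat = burden_t_statistic(genotypes, phenotype)
```

Summing genotypes this way ignores any correlation between variants, which is the simplification the LD-aware and functional approaches in the abstract are meant to relax.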

16.
Sci Rep ; 5: 10298, 2015 Jul 30.
Article in English | MEDLINE | ID: mdl-26223539

ABSTRACT

Precise prediction of the genetic architecture of complex traits is impeded by the limited understanding of their genetic effects, especially gene-by-gene (GxG) and gene-by-environment (GxE) interactions. In the past decades, an explosion of high-throughput technologies has enabled omics studies at multiple levels (such as genomics, transcriptomics, proteomics, and metabolomics). The analyses of large omics data, especially two-locus interaction analysis, are very time intensive. Integrating the diverse omics data and environmental effects in the analyses also remains a challenge. We proposed mixed linear model approaches using GPU (graphics processing unit) computation to simultaneously dissect various genetic effects. Analyses can be performed for estimating genetic main effects, GxG epistasis effects, and GxE interaction effects on large-scale omics data for complex traits, and for estimating heritability of specific genetic effects. Both mouse data analyses and Monte Carlo simulations demonstrated that genetic effects and environment interaction effects could be estimated without bias and with high statistical power by using the proposed approaches.


Subject(s)
Chromosome Mapping , Gene-Environment Interaction , Genomics , Metabolomics , Models, Genetic , Quantitative Trait, Heritable , Animals , Epistasis, Genetic/physiology , Mice
17.
PLoS One ; 8(4): e61943, 2013.
Article in English | MEDLINE | ID: mdl-23626757

ABSTRACT

Although genome-wide association studies (GWAS) have identified a significant number of single-nucleotide polymorphisms (SNPs) associated with many complex human traits, the susceptibility loci identified so far can explain only a small fraction of the genetic risk. Among other possible explanations, the lack of a comprehensive examination of gene-gene interaction (G×G) is often considered a source of the missing heritability. Previously, we reported a model-free Generalized Multifactor Dimensionality Reduction (GMDR) approach for detecting G×G in both dichotomous and quantitative phenotypes. However, the computational burden and less efficient implementation of the original programs make them impossible to use for GWAS. In this study, we developed a graphics processing unit (GPU)-based GMDR program (named GMDR-GPU), which is able not only to analyze GWAS data but also to run much faster than the earlier version of the GMDR program. As a demonstration of the program, we used the GMDR-GPU software to analyze a publicly available GWAS dataset on type 2 diabetes (T2D) from the Wellcome Trust Case Control Consortium. Through an exhaustive search of pair-wise interactions and a selected search of three- to five-way interactions conditioned on significant pair-wise results, we identified 24 core SNPs in six genes (FTO: rs9939973, rs9940128, rs9922047, rs1121980, rs9939609, rs9930506; TSPAN8: rs1495377; TCF7L2: rs4074720, rs7901695, rs4506565, rs4132670, rs10787472, rs11196205, rs10885409, rs11196208; L3MBTL3: rs10485400, rs4897366; CELF4: rs2852373, rs608489; RUNX1: rs445984, rs1040328, rs990074, rs2223046, rs2834970) that appear to be important for T2D. Of these core SNPs, 11 in FTO, TSPAN8, and TCF7L2 have been reported to be associated with T2D, obesity, or both, providing an independent replication of previously reported SNPs. Importantly, we identified three new susceptibility genes for T2D (L3MBTL3, CELF4, and RUNX1), a finding that warrants further investigation with independent samples.


Subject(s)
Algorithms , Diabetes Mellitus, Type 2/genetics , Epistasis, Genetic , Polymorphism, Single Nucleotide , Software , CELF Proteins , Core Binding Factor Alpha 2 Subunit/genetics , Core Binding Factor Alpha 2 Subunit/metabolism , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism , Genetic Predisposition to Disease , Genome-Wide Association Study , Genotype , Humans , Models, Genetic , Multifactor Dimensionality Reduction , Phenotype , RNA-Binding Proteins/genetics , RNA-Binding Proteins/metabolism