ABSTRACT
Exome-sequencing studies have generally been underpowered to identify deleterious alleles with a large effect on complex traits as such alleles are mostly rare. Because the population of northern and eastern Finland has expanded considerably and in isolation following a series of bottlenecks, individuals of these populations have numerous deleterious alleles at a relatively high frequency. Here, using exome sequencing of nearly 20,000 individuals from these regions, we investigate the role of rare coding variants in clinically relevant quantitative cardiometabolic traits. Exome-wide association studies for 64 quantitative traits identified 26 newly associated deleterious alleles. Of these 26 alleles, 19 are either unique to or more than 20 times more frequent in Finnish individuals than in other Europeans and show geographical clustering comparable to Mendelian disease mutations that are characteristic of the Finnish population. We estimate that sequencing studies of populations without this unique history would require hundreds of thousands to millions of participants to achieve comparable association power.
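The closing estimate can be illustrated with a back-of-envelope calculation. Under a standard additive model with small effects and a standardized trait, the non-centrality parameter of a single-variant test is approximately N · 2p(1 − p)β², so for a fixed effect size β the sample size needed for equal power scales inversely with 2p(1 − p). The sketch below is a rough approximation under those assumptions, not the study's own power analysis; the function name and the example allele frequencies are hypothetical.

```python
def sample_size_ratio(freq_enriched: float, freq_other: float) -> float:
    """Factor by which N must grow where the allele is rarer.

    Assumes an additive model, the same effect size in both populations,
    and power driven by the genotype variance 2p(1 - p).
    """
    var_enriched = 2 * freq_enriched * (1 - freq_enriched)
    var_other = 2 * freq_other * (1 - freq_other)
    return var_enriched / var_other


# Hypothetical frequencies: 2% in the enriched population, 0.1% elsewhere
# (a 20-fold enrichment, the threshold quoted in the abstract).
ratio = sample_size_ratio(0.02, 0.001)
equivalent_n = 20_000 * ratio  # cohort size needed elsewhere for similar power
print(f"ratio = {ratio:.1f}, equivalent N = {equivalent_n:,.0f}")
```

With these illustrative numbers, a cohort of about 20,000 corresponds to roughly 390,000 participants in a population without the enrichment, which matches the "hundreds of thousands" order of magnitude; rarer alleles or larger enrichment push the figure higher.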
Subjects
Exome Sequencing , Genetic Association Studies/methods , Genetic Predisposition to Disease/genetics , Genetic Variation/genetics , Quantitative Trait Loci/genetics , Alleles , HDL Cholesterol/genetics , Cluster Analysis , Endpoint Determination , Finland , Geographic Mapping , Humans , Multifactorial Inheritance/genetics , Reproducibility of Results
ABSTRACT
An Amendment to this paper has been published and can be accessed via a link at the top of the paper.
ABSTRACT
Based on epidemiologic and embryologic patterns, nonsyndromic orofacial clefts (OFCs), the most common craniofacial birth defects in humans, are commonly categorized into cleft lip with or without cleft palate (CL/P) and cleft palate alone (CP), which are traditionally considered to be etiologically distinct. However, there is some evidence of shared genetic risk in the IRF6, GRHL3 and ARHGAP29 regions, although only FOXE1 has been recognized as significantly associated with both CL/P and CP in genome-wide association studies (GWAS). We used a new statistical approach, PLACO (pleiotropic analysis under composite null), on a combined multi-ethnic GWAS of 2,771 CL/P and 611 CP case-parent trios. At the genome-wide significance threshold of 5 × 10⁻⁸, PLACO identified 1 locus in 1q32.2 (IRF6) that appears to increase risk for one OFC subgroup but decrease risk for the other. At a suggestive significance threshold of 10⁻⁶, we found 5 more loci with compelling candidate genes having opposite effects on CL/P and CP: 1p36.13 (PAX7), 3q29 (DLG1), 4p13 (LIMCH1), 4q21.1 (SHROOM3) and 17q22 (NOG). Additionally, we replicated the recognized shared locus 9q22.33 (FOXE1), and identified 2 loci in 19p13.12 (RAB8A) and 20q12 (MAFB) that appear to influence risk of both CL/P and CP in the same direction. We found that locus-specific effects may vary by racial/ethnic group at these regions of genetic overlap, and found no evidence of sex-specific differences. We confirmed shared etiology of the two OFC subtypes comprising CL/P, and additionally found suggestive evidence of differences in their pathogenesis at 2 loci of genetic overlap. Our novel findings include 6 new loci of genetic overlap between CL/P and CP; 3 new loci between pairwise OFC subtypes; and 4 loci not previously implicated in OFCs. Our in-silico validation showed that PLACO is robust to subtype-specific effects, and can achieve massive power gains over existing approaches for identifying genetic overlap between disease subtypes.
In summary, we found suggestive evidence for new genetic regions and confirmed some recognized OFC genes either exerting shared risk or with opposite effects on risk to OFC subtypes.
Subjects
Cleft Lip/genetics , Cleft Palate/genetics , Genetic Pleiotropy , Computational Biology , Computer Simulation , Ethnicity , Female , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Male , Reproducibility of Results
ABSTRACT
Physical inactivity is an important risk factor for a wide range of diseases. Previous genome-wide association studies (GWAS), based on self-reported data or a small number of phenotypes derived from accelerometry, have identified a limited number of genetic loci associated with habitual physical activity (PA) and provided evidence for involvement of the central nervous system in mediating genetic effects. In this study, we derived 27 PA phenotypes from wrist accelerometry data obtained from 88,411 UK Biobank study participants. Single-variant association analysis based on mixed-effects models and transcriptome-wide association studies (TWAS) together identified 5 novel loci that were not detected by previous studies of PA, sleep duration and self-reported chronotype. For both novel and previously known loci, we discovered associations with novel phenotypes including active-to-sedentary transition probability, light-intensity PA, activity during different times of the day and proxy phenotypes for sleep and circadian patterns. Follow-up studies including TWAS, colocalization, tissue-specific heritability enrichment, gene-set enrichment and genetic correlation analyses indicated the role of the blood and immune system in modulating the genetic effects and a secondary role of the digestive and endocrine systems. Our findings provide important insights into the genetic architecture of PA and its underlying mechanisms.
Subjects
Genome-Wide Association Study , Genetic Models , Accelerometry , Exercise/physiology , Genetic Loci , Genetic Predisposition to Disease , Humans
ABSTRACT
Genetic association studies of child health outcomes often employ family-based study designs. One of the most popular family-based designs is the case-parent trio design that considers the smallest possible nuclear family consisting of two parents and their affected child. This trio design is particularly advantageous for studying relatively rare disorders because it is less prone to type I error inflation due to population stratification compared to population-based study designs (e.g., case-control studies). However, obtaining genetic data from both parents is difficult in practice, and many large studies predominantly measure genetic variants in mother-child dyads. While some statistical methods for analyzing parent-child dyad data (most commonly involving mother-child pairs) exist, it is not clear if they provide the same advantage as trio methods in protecting against population stratification, or if a specific dyad design (e.g., case-mother dyads vs. case-mother/control-mother dyads) is more advantageous. In this article, we review existing statistical methods for analyzing genome-wide marker data on dyads and perform extensive simulation experiments to benchmark their type I errors and statistical power under different scenarios. We extend our evaluation to existing methods for analyzing a combination of case-parent trios and dyads together. We apply these methods to genotyped and imputed data from multiethnic mother-child pairs only, case-parent trios only or combinations of both dyads and trios from the Gene, Environment Association Studies consortium (GENEVA), where each family was ascertained through a child affected by nonsyndromic cleft lip with or without cleft palate. Results from the GENEVA study corroborate the findings from our simulation experiments. Finally, we provide recommendations for using statistical genetic association methods for dyads.
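One classical way to see why trio designs resist population stratification is the transmission disequilibrium test (TDT), which compares alleles transmitted and not transmitted by heterozygous parents within each family, so that each family serves as its own control. The abstract does not name the TDT; it is used here only as a familiar illustration of a trio-based test, and the counts below are made up.

```python
import math
from typing import Tuple


def tdt(transmitted: int, untransmitted: int) -> Tuple[float, float]:
    """McNemar-type TDT statistic and its 1-df chi-square p-value.

    transmitted / untransmitted: number of heterozygous parents who did /
    did not pass the test allele to the affected child.
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative (heterozygous) parents")
    stat = (transmitted - untransmitted) ** 2 / total
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts from heterozygous parents across all trios.
stat, p_value = tdt(transmitted=60, untransmitted=40)
print(f"TDT chi-square = {stat:.2f}, p = {p_value:.4f}")
```

In a mother-child dyad, only one parent's transmissions are observed, which is one reason the dyad methods reviewed in the article need additional modeling assumptions.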
Subjects
Cleft Lip , Cleft Palate , Benchmarking , Cleft Lip/genetics , Cleft Palate/genetics , Female , Genetic Association Studies , Humans , Genetic Models , Mothers , Parent-Child Relations , Single Nucleotide Polymorphism
ABSTRACT
There is increasing evidence that pleiotropy, the association of multiple traits with the same genetic variants/loci, is a very common phenomenon. Cross-phenotype association tests are often used to jointly analyze multiple traits from a genome-wide association study (GWAS). The underlying methods, however, are often designed to test the global null hypothesis that there is no association of a genetic variant with any of the traits, the rejection of which does not implicate pleiotropy. In this article, we propose a new statistical approach, PLACO, for specifically detecting pleiotropic loci between two traits by considering an underlying composite null hypothesis that a variant is associated with none or only one of the traits. We propose testing the null hypothesis based on the product of the Z-statistics of the genetic variants across two studies and derive a null distribution of the test statistic in the form of a mixture distribution that allows for fractions of variants to be associated with none or only one of the traits. We borrow approaches from the statistical literature on mediation analysis that allow asymptotic approximation of the null distribution avoiding estimation of nuisance parameters related to mixture proportions and variance components. Simulation studies demonstrate that the proposed method can maintain type I error and can achieve major power gain over alternative simpler methods that are typically used for testing pleiotropy. PLACO allows correlation in summary statistics between studies that may arise due to sharing of controls between disease traits. Application of PLACO to publicly available summary data from two large case-control GWAS of Type 2 Diabetes and of Prostate Cancer implicated a number of novel shared genetic regions: 3q23 (ZBTB38), 6q25.3 (RGS17), 9p22.1 (HAUS6), 9p13.3 (UBAP2), 11p11.2 (RAPSN), 14q12 (AKAP6), 15q15 (KNL1) and 18q23 (ZNF236).
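The building block of the test statistic described above, the product of two Z-statistics, can be sketched numerically. The code below computes only the two-sided tail probability of a product of two independent standard normal variables, which corresponds to the case where the variant is associated with neither trait. It is not the PLACO implementation: the mixture null described in the abstract also covers association with exactly one trait, and correlation between studies, neither of which this sketch handles. The function name and the example Z-scores are hypothetical.

```python
import math


def product_normal_tail(t: float, steps: int = 20000, upper: float = 10.0) -> float:
    """P(|Z1 * Z2| > |t|) for independent standard normals Z1 and Z2.

    Uses P(|Z1 Z2| > t) = 2 * integral_0^inf phi(z) * erfc(t / (z * sqrt(2))) dz,
    evaluated with the midpoint rule on (0, upper].
    """
    t = abs(t)
    h = upper / steps
    total = 0.0
    for i in range(steps):
        z = (i + 0.5) * h  # midpoints avoid division by zero at z = 0
        phi = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
        total += phi * math.erfc(t / (z * math.sqrt(2)))
    return 2 * total * h


# Hypothetical Z-scores for one variant in two independent GWAS.
z_trait1, z_trait2 = 3.1, -2.8
p_neither = product_normal_tail(z_trait1 * z_trait2)
print(f"tail probability of the product under 'no association': {p_neither:.2e}")
```

A large product requires both Z-scores to be away from zero, which is why a product-based statistic targets variants associated with both traits rather than with either one.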
Subjects
Type 2 Diabetes Mellitus/genetics , Genetic Pleiotropy , Genome-Wide Association Study/methods , Prostatic Neoplasms/genetics , Quantitative Trait Loci , Humans , Male , Genetic Models
ABSTRACT
BACKGROUND: Cohort collaborations often require meta-analysis of exposure-outcome association estimates across cohorts as an alternative to pooling individual-level data, which requires laborious harmonization of those data. However, it is likely that important confounders are not all measured uniformly across the cohorts due to differences in study protocols. This imbalance in measurement of confounders leads to association estimates that are not comparable across cohorts and impedes the meta-analysis of results. METHODS: In this article, we empirically show some asymptotic relations between fully adjusted and unadjusted exposure-outcome effect estimates, and provide theoretical justification for them. We leverage these results to obtain fully adjusted estimates for the cohorts with no information on confounders by borrowing information from cohorts with complete measurement on confounders. We implement this novel method in CIMBAL (confounder imbalance), which additionally provides a meta-analyzed estimate that appropriately accounts for the dependence between estimates arising due to borrowing of information across cohorts. We perform extensive simulation experiments to study CIMBAL's statistical properties. We illustrate CIMBAL using National Children's Study (NCS) data to estimate the association of maternal education and low birth weight in infants, adjusting for maternal age at delivery, race/ethnicity, marital status, and income. RESULTS: Our simulation studies indicate that estimates of exposure-outcome association from CIMBAL are closer to the truth than those from commonly used approaches for meta-analyzing cohorts with disparate confounder measurements. CIMBAL is not very sensitive to heterogeneity in underlying joint distributions of exposure, outcome and confounders but is very sensitive to heterogeneity of confounding bias across cohorts.
Application of CIMBAL to NCS data for a proof-of-concept analysis further illustrates the utility and advantages of CIMBAL. CONCLUSIONS: CIMBAL provides a practical approach for meta-analyzing cohorts with imbalance in measurement of confounders under a weak assumption that the cohorts are independently sampled from populations with the same confounding bias.
Subjects
Research Design , Bias , Child , Cohort Studies , Computer Simulation , Humans , Infant
ABSTRACT
Novel or rare damaging mutations have been implicated in the developmental pathogenesis of nonsyndromic cleft lip with or without cleft palate (nsCL ± P). Thus, we investigated the human genome for high-impact mutations that could explain the risk of nsCL ± P in our cohorts. We conducted next-generation sequencing (NGS) analysis of 130 nsCL ± P case-parent African trios to identify pathogenic variants that contribute to the risk of clefting. We replicated this analysis using whole-exome sequence data from a Brazilian nsCL ± P cohort. Computational analyses were then used to predict the mechanism by which these variants could result in increased risks for nsCL ± P. We discovered damaging mutations within the AFDN gene, which encodes a cell adhesion molecule (CAM) and was previously shown to contribute to cleft palate in mice. These mutations include p.Met1164Ile, p.Thr453Asn, p.Pro1638Ala, p.Arg669Gln, p.Ala1717Val, and p.Arg1596His. We also discovered a novel splicing p.Leu1588Leu mutation in this protein. Computational analysis suggests that these amino acid changes affect the interactions with other cleft-associated genes including nectins (PVRL1, PVRL2, PVRL3, and PVRL4), CDH1, CTNNA1, and CTNND1. This is the first report on the contribution of AFDN to the risk for nsCL ± P in humans. AFDN encodes AFADIN, an important CAM that forms calcium-independent complexes with nectins 1 and 4 (encoded by the genes PVRL1 and PVRL4). This discovery shows the power of NGS analysis of multiethnic cleft samples in combination with a computational approach in the understanding of the pathogenesis of nsCL ± P.
ABSTRACT
BACKGROUND: Many popular disease transmission models have helped nations respond to the COVID-19 pandemic by informing decisions about pandemic planning, resource allocation, implementation of social distancing measures, lockdowns, and other non-pharmaceutical interventions. We study how five epidemiological models forecast and assess the course of the pandemic in India: a baseline curve-fitting model, an extended SIR (eSIR) model, two extended SEIR (SAPHIRE and SEIR-fansy) models, and a semi-mechanistic Bayesian hierarchical model (ICM). METHODS: Using COVID-19 case-recovery-death count data reported in India from March 15 to October 15 to train the models, we generate predictions from each of the five models from October 16 to December 31. To compare prediction accuracy with respect to reported cumulative and active case counts and reported cumulative death counts, we compute the symmetric mean absolute prediction error (SMAPE) for each of the five models. For reported cumulative cases and deaths, we compute Pearson's and Lin's correlation coefficients to investigate how well the projected and observed reported counts agree. We also present underreporting factors when available, and comment on uncertainty of projections from each model. RESULTS: For active case counts, SMAPE values are 35.14% (SEIR-fansy) and 37.96% (eSIR). For cumulative case counts, SMAPE values are 6.89% (baseline), 6.59% (eSIR), 2.25% (SAPHIRE) and 2.29% (SEIR-fansy). For cumulative death counts, the SMAPE values are 4.74% (SEIR-fansy), 8.94% (eSIR) and 0.77% (ICM). Three models (SAPHIRE, SEIR-fansy and ICM) return total (sum of reported and unreported) cumulative case counts as well. We compute underreporting factors as of October 31 and note that for cumulative cases, the SEIR-fansy model yields an underreporting factor of 7.25 and ICM model yields 4.54 for the same quantity. 
For total (sum of reported and unreported) cumulative deaths, the SEIR-fansy model reports an underreporting factor of 2.97. On October 31, we observe 8.18 million cumulative reported cases, while the projections (in millions) are 8.71 (95% credible interval: 8.63-8.80) from the baseline model, 8.35 (7.19-9.60) from eSIR, 8.17 (7.90-8.52) from SAPHIRE and 8.51 (8.18-8.85) from SEIR-fansy. Cumulative case projections from the eSIR model have the highest uncertainty in terms of width of 95% credible intervals, followed by those from SAPHIRE, the baseline model and finally SEIR-fansy. CONCLUSIONS: In this comparative paper, we describe five different models used to study the transmission dynamics of the SARS-CoV-2 virus in India. While simulation studies are the only gold standard way to compare the accuracy of the models, here we were uniquely poised to compare the projected case counts against observed data on a test period. The largest variability across models is observed in predicting the "total" number of infections including reported and unreported cases (on which we have no validation data). The degree of underreporting has been a major concern in India and is characterized in this report. Overall, the SEIR-fansy model appeared to be a good choice, with a publicly available R package and the desired flexibility and accuracy.
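The two accuracy measures named in the methods can be sketched in a few lines. The code below assumes the common definition of SMAPE, in which each absolute error is divided by the average of the absolute projected and observed values, and the usual moment-based form of Lin's concordance correlation coefficient; the paper may use a slightly different variant. The function names and the example counts are hypothetical and are not taken from the study.

```python
def smape(projected, observed):
    """Symmetric mean absolute prediction error, in percent.

    Assumes no time point has both projected and observed equal to zero.
    """
    terms = [
        abs(p - o) / ((abs(p) + abs(o)) / 2)
        for p, o in zip(projected, observed)
    ]
    return 100 * sum(terms) / len(terms)


def lins_ccc(x, y):
    """Lin's concordance correlation coefficient (population moments)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


# Made-up cumulative case counts (in millions) on three test days.
observed = [8.00, 8.10, 8.18]
projected = [8.05, 8.20, 8.35]
print(f"SMAPE = {smape(projected, observed):.2f}%")
print(f"Lin's CCC = {lins_ccc(projected, observed):.3f}")
```

Unlike Pearson's correlation, Lin's coefficient drops below 1 when projections are systematically shifted from the observed counts, which is why the two are reported together.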
Subjects
COVID-19/epidemiology , COVID-19/transmission , Pandemics , Bayes Theorem , Communicable Disease Control/methods , Computer Simulation , Forecasting , Humans , India/epidemiology , Statistical Models
ABSTRACT
Genome-wide association studies (GWAS) for complex diseases have focused primarily on single-trait analyses for disease status and disease-related quantitative traits. For example, GWAS on risk factors for coronary artery disease analyze genetic associations of plasma lipids such as total cholesterol, LDL-cholesterol, HDL-cholesterol, and triglycerides (TGs) separately. However, traits are often correlated and a joint analysis may yield increased statistical power for association over multiple univariate analyses. Recently several multivariate methods have been proposed that require individual-level data. Here, we develop metaUSAT (where USAT is unified score-based association test), a novel unified association test of a single genetic variant with multiple traits that uses only summary statistics from existing GWAS. Although the existing methods either perform well when most correlated traits are affected by the genetic variant in the same direction or are powerful when only a few of the correlated traits are associated, metaUSAT is designed to be robust to the association structure of correlated traits. metaUSAT does not require individual-level data and can test genetic associations of categorical and/or continuous traits. One can also use metaUSAT to analyze a single trait over multiple studies, appropriately accounting for overlapping samples, if any. metaUSAT provides an approximate asymptotic P-value for association and is computationally efficient for implementation at a genome-wide level. Simulation experiments show that metaUSAT maintains proper type-I error at low error levels. It has similar and sometimes greater power to detect association across a wide array of scenarios compared to existing methods, which are usually powerful for some specific association scenarios only. 
When applied to plasma lipids summary data from the METSIM and the T2D-GENES studies, metaUSAT detected genome-wide significant loci beyond the ones identified by univariate analyses. Evidence from larger studies suggests that the variants additionally detected by our test are, indeed, associated with lipid levels in humans. In summary, metaUSAT can provide novel insights into the genetic architecture of a common disease or traits.
Subjects
Genome-Wide Association Study , Meta-Analysis as Topic , Aged , HDL Cholesterol/genetics , LDL Cholesterol/genetics , Coronary Artery Disease/genetics , Humans , Male , Middle Aged , Phenotype , Single Nucleotide Polymorphism/genetics , Triglycerides/genetics
ABSTRACT
The field of genetic epidemiology is relatively young and brings together genetics, epidemiology, and biostatistics to identify and implement the best study designs and statistical analyses for identifying genes controlling risk for complex and heterogeneous diseases (i.e., those where genes and environmental risk factors both contribute to etiology). The field has moved quickly over the past 40 years partly because the technology of genotyping and sequencing has forced it to adapt while adhering to the fundamental principles of genetics. In the last two decades, the available tools for genetic epidemiology have expanded from a genetic focus (considering 1 gene at a time) to a genomic focus (considering the entire genome), and now they must further expand to integrate information from other "-omics" (e.g., epigenomics, transcriptomics as measured by RNA expression) at both the individual and the population levels. Additionally, we can now also evaluate gene and environment interactions across populations to better understand exposure and the heterogeneity in disease risk. The future challenges facing genetic epidemiology are considerable both in scale and techniques, but the importance of the field will not diminish because by design it ties scientific goals with public health applications.
Subjects
Molecular Epidemiology/trends , Genomics/trends
ABSTRACT
In the past decade, many genome-wide association studies (GWASs) have been conducted to explore association of single nucleotide polymorphisms (SNPs) with complex diseases using a case-control design. These GWASs not only collect information on the disease status (primary phenotype, D) and the SNPs (genotypes, X), but also collect extensive data on several risk factors and traits. Recent literature and grant proposals point toward a trend in reusing existing large case-control data for exploring genetic associations of some additional traits (secondary phenotypes, Y) collected during the study. These secondary phenotypes may be correlated, and a proper analysis warrants a multivariate approach. Commonly used multivariate methods are not equipped to properly account for the non-random sampling scheme. Current ad hoc practices include analyses without any adjustment, and analyses with D adjusted as a covariate. Our theoretical and empirical studies suggest that the type I error for testing genetic association of secondary traits can be substantial when X as well as Y are associated with D, even when there is no association between X and Y in the underlying (target) population. Whether using D as a covariate helps maintain type I error depends heavily on the disease mechanism and the underlying causal structure (which is often unknown). To avoid grossly incorrect inference, we propose a proportional odds model adjusted for propensity score (POM-PS). It uses a proportional odds logistic regression of X on Y and includes the estimated conditional probability of being diseased as a covariate. We demonstrate the validity and advantage of POM-PS, and compare it to some existing methods in extensive simulation experiments mimicking plausible scenarios of dependency among Y, X, and D. Finally, we use POM-PS to jointly analyze four adiposity traits using a type 2 diabetes (T2D) case-control sample from the population-based Metabolic Syndrome in Men (METSIM) study.
Only POM-PS analysis of the T2D case-control sample seems to provide valid association signals.
Subjects
Type 2 Diabetes Mellitus/physiopathology , Genetic Markers/genetics , Genome-Wide Association Study/methods , Genetic Models , Phenotype , Single Nucleotide Polymorphism/genetics , Heritable Quantitative Trait , Adiposity/genetics , Aged , Case-Control Studies , Computer Simulation , Genotype , Humans , Longitudinal Studies , Male , Middle Aged
ABSTRACT
Genome-wide association studies (GWASs) for complex diseases often collect data on multiple correlated endo-phenotypes. Multivariate analysis of these correlated phenotypes can improve the power to detect genetic variants. Multivariate analysis of variance (MANOVA) can perform such association analysis at a GWAS level, but the behavior of MANOVA under different trait models has not been carefully investigated. In this paper, we show that MANOVA is generally very powerful for detecting association but there are situations, such as when a genetic variant is associated with all the traits, where MANOVA may not have any detection power. In these situations, however, marginal model based methods perform much better than multivariate methods. We investigate the behavior of MANOVA, both theoretically and using simulations, and derive the conditions where MANOVA loses power. Based on our findings, we propose a unified score-based test statistic USAT that can perform better than MANOVA in such situations and nearly as well as MANOVA elsewhere. Our proposed test reports an approximate asymptotic P-value for association and is computationally very efficient to implement at a GWAS level. We have studied through extensive simulations the performance of USAT, MANOVA, and other existing approaches and demonstrated the advantage of using the USAT approach to detect association between a genetic variant and multivariate phenotypes. We applied USAT to data from three correlated traits collected on 5,816 Caucasian individuals from the Atherosclerosis Risk in Communities (ARIC) Study and detected some interesting associations.
Subjects
Atherosclerosis/genetics , Genome-Wide Association Study , Multivariate Analysis , Computer Simulation , Genotype , Humans , Genetic Models , Phenotype , Single Nucleotide Polymorphism
ABSTRACT
Studies of complex human diseases and traits associated with candidate genes are potentially vulnerable to bias (confounding) due to population stratification and inbreeding, especially in admixed populations. In GWAS, the principal components (PCs) method provides a global ancestry value per subject, allowing corrections for population stratification. However, these coefficients are typically estimated assuming unrelated individuals, and if family structure is present and ignored, such substructures may induce artifactual PCs. Extensions of the PCs method have been proposed by Konishi and Rao [Biometrika 1992;79:631-641], taking into account only siblings' relatedness, and by Oualkacha et al. [Stat Appl Genet Mol Biol 2012, DOI: 10.2202/1544-6115.1711], taking into account large pedigrees and high-dimensional phenotype data. In this work, we extend these methods to estimate the global individual ancestry coefficients from PCs derived from different variance component matrix estimators using SNPs from two simulated data sets and two real data sets: the GENOA sibship data consisting of European and African-American subjects and the Baependi Heart Study consisting of 80 extended Brazilian families, both with genotyping data from the Affymetrix 6.0 chip. Our results show that family structure plays an important role in the estimation of the global individual ancestry value for extended pedigrees but not for sibships.
Subjects
Family , Genetic Predisposition to Disease , Medical Genetics/methods , Genetic Models , Single Nucleotide Polymorphism , Female , Humans , Male
ABSTRACT
BACKGROUND: Genome-wide association studies (GWASs) have identified hundreds of genetic variants associated with complex diseases, but these variants appear to explain very little of the disease heritability. The typical single-locus association analysis in a GWAS fails to detect variants with small effect sizes and to capture higher-order interaction among these variants. Multilocus association analysis provides a powerful alternative by jointly modeling the variants within a gene or a pathway and by reducing the burden of multiple hypothesis testing in a GWAS. METHODS: Here, we propose a powerful and flexible dimension reduction approach to model multilocus association. We use a Bayesian partitioning model which clusters SNPs according to their direction of association, models higher-order interactions using a flexible scoring scheme and uses posterior marginal probabilities to detect association between the SNP set and the disease. RESULTS: We illustrate our method using extensive simulation studies and applying it to detect multilocus interaction in Atherosclerosis Risk in Communities (ARIC) GWAS with type 2 diabetes. CONCLUSION: We demonstrate that our approach has better power to detect multilocus interactions than several existing approaches. When applied to the ARIC study dataset with 9,328 individuals to study gene-based associations for type 2 diabetes, our method identified some novel variants not detected by conventional single-locus association analyses.
Subjects
Bayes Theorem , Case-Control Studies , Genetic Models , Atherosclerosis/genetics , Computer Simulation , Type 2 Diabetes Mellitus/genetics , Genome-Wide Association Study , Humans , Single Nucleotide Polymorphism
ABSTRACT
OBJECTIVES: A gene-based genome-wide association study (GWAS) provides a powerful alternative to the traditional single-marker association analysis of single nucleotide polymorphisms (SNPs) due to its substantial reduction in the multiple testing burden and possible gain in power due to modeling multiple SNPs within a gene. A gene-based association analysis on multivariate traits is often of interest, but implementing it at a genome-wide level poses substantial analytical as well as computational challenges. METHODS: We propose a rapid implementation of the multivariate multiple linear regression (RMMLR) approach in unrelated individuals as well as in families. Our approach allows for covariates. Moreover, the asymptotic distribution of the test statistic is not heavily influenced by the linkage disequilibrium (LD) among the SNPs and hence can be used efficiently to perform a gene-based GWAS. We have developed a corresponding R package to implement such multivariate gene-based GWAS with this RMMLR approach. RESULTS: Through extensive simulation, we compared several approaches for both single and multivariate traits. Our RMMLR approach maintained a correct type I error level even for sets of SNPs in strong LD. It also demonstrated a substantial gain in power to detect a gene when it is associated with a subset of the traits. We also studied the performance of the approaches on the Minnesota Center for Twin Family Research dataset. CONCLUSIONS: In our overall comparison, our RMMLR approach provides an efficient and powerful tool to perform a gene-based GWAS with single or multivariate traits and maintains the type I error appropriately.
Subjects
Genes/genetics , Genome-Wide Association Study/methods , Genetic Models , Multifactorial Inheritance/genetics , Computer Simulation , Genotype , Humans , Linear Models , Multivariate Analysis , Single Nucleotide Polymorphism/genetics
ABSTRACT
Genome-wide association studies (GWAS) have found widespread evidence of pleiotropy, but characterization of global patterns of pleiotropy remains highly incomplete due to insufficient power of current approaches. We develop fastASSET, a method that allows efficient detection of variant-level pleiotropic association across many traits. We analyze GWAS summary statistics of 116 complex traits of diverse types collected from the GRASP repository and large GWAS consortia. We identify 2293 independent loci and find the lead variants in nearly all these loci (~99%) to be associated with ≥ 2 traits (median = 6). We observe that the degree of pleiotropy estimated from our study predicts that observed in the UK Biobank for a much larger number of traits (K = 4114) (correlation = 0.43, p-value < 2.2 × 10⁻¹⁶). Follow-up analyses of 21 trait-specific variants indicate their link to the expression in trait-related tissues for a small number of genes involved in relevant biological processes. Our findings provide deeper insight into the nature of pleiotropy and lead to identification of highly trait-specific susceptibility variants.
Subjects
Genetic Pleiotropy , Genome-Wide Association Study , Single Nucleotide Polymorphism , Quantitative Trait Loci , Genome-Wide Association Study/methods , Humans , Phenotype , Multifactorial Inheritance/genetics , Genetic Variation
ABSTRACT
Genetic studies of nontraditional glycemic biomarkers, glycated albumin and fructosamine, can shed light on unknown aspects of type 2 diabetes genetics and biology. We performed a multiphenotype genome-wide association study of glycated albumin and fructosamine from 7,395 White and 2,016 Black participants in the Atherosclerosis Risk in Communities (ARIC) study on common variants from genotyped/imputed data. We discovered two genome-wide significant loci, one mapping to a known type 2 diabetes gene (ARAP1/STARD10) and another mapping to a novel region (UGT1A complex of genes), using multiomics gene-mapping strategies in diabetes-relevant tissues. We identified additional loci that were ancestry- and sex-specific (e.g., PRKCA in African ancestry, FCGRT in European ancestry, TEX29 in males). Further, we implemented multiphenotype gene-burden tests on whole-exome sequence data from 6,590 White and 2,309 Black ARIC participants. Ten variant sets annotated to genes across different variant aggregation strategies were exome-wide significant only in multiancestry analysis, of which CD1D, EGFL7/AGPAT2, and MIR126 had notable enrichment of rare predicted loss-of-function variants in African ancestry despite smaller sample sizes. Overall, 8 of 14 discovered loci and genes were implicated in influencing these biomarkers via glycemic pathways, and most of them were not previously implicated in studies of type 2 diabetes. This study illustrates improved locus discovery and potential effector gene discovery by leveraging joint patterns of related biomarkers across the entire allele frequency spectrum in multiancestry analysis. Future investigation of the loci and genes potentially acting through glycemic pathways may help us better understand the risk of developing type 2 diabetes.
Subjects
Biomarkers , Type 2 Diabetes Mellitus , Genome-Wide Association Study , Humans , Type 2 Diabetes Mellitus/genetics , Male , Female , Biomarkers/blood , Fructosamine/blood , White People/genetics , Glycated Serum Albumin , Single Nucleotide Polymorphism , Middle Aged , Genetic Variation/genetics , Multivariate Analysis , Serum Albumin/genetics , Serum Albumin/metabolism
ABSTRACT
Family-based studies provide a unique opportunity to characterize genetic risks of diseases in the presence of population structure, assortative mating, and indirect genetic effects. We propose a novel framework, PGS-TRI, for the analysis of polygenic scores (PGS) in case-parent trio studies for estimation of the risk of an index condition associated with direct effects of inherited PGS, indirect effects of parental PGS, and gene-environment interactions. Extensive simulation studies demonstrate the robustness of PGS-TRI in the presence of complex population structure and assortative mating compared to alternative methods. We apply PGS-TRI to multi-ancestry trio studies of autism spectrum disorders (Ntrio = 1,517) and orofacial clefts (Ntrio = 1,904) to establish the first transmission-based estimates of risk associated with pre-defined PGS for these conditions and other related traits. For both conditions, we further explored offspring risk associated with polygenic gene-environment interactions, and direct and indirect effects of genetically predicted levels of gene expression and metabolite traits.
ABSTRACT
BACKGROUND: Thyroid differentiation score (TDS), calculated from the mRNA expression levels of 16 genes controlling thyroid metabolism and function, has been proposed as a measure to quantify differentiation in papillary thyroid cancer (PTC). The objective of this study is to determine whether TDS is associated with survival outcomes across patient cohorts. METHODS: Two independent cohorts of PTC patients were used: 1) The Cancer Genome Atlas (TCGA) thyroid cancer study (N=372) and 2) the MD Anderson Cancer Center (MDACC) cohort (N=111). The primary survival outcome of interest was progression-free interval (PFI). Association with overall survival (OS) was also explored. The Kaplan-Meier method and Cox proportional hazards models were used for survival analyses. RESULTS: In both cohorts, TDS was associated with tumor and nodal stage at diagnosis as well as tumor driver mutation status. High TDS was associated with longer PFI on univariable analyses across cohorts. After adjusting for overall stage, TDS remained significantly associated with PFI in the MDACC cohort only (aHR 0.67, 95% CI: 0.52-0.85). In subgroup analyses stratified by tumor driver mutation status, higher TDS was most consistently associated with longer PFI in BRAF V600E-mutated tumors across cohorts after adjusting for overall stage (TCGA: aHR 0.60, 95% CI: 0.33-1.07; MDACC: aHR 0.59, 95% CI: 0.42-0.82). For OS, increasing TDS was associated with longer OS in the overall MDACC cohort (aHR 0.78, 95% CI: 0.63-0.96), where the median duration of follow-up was 12.9 years. CONCLUSION: TDS quantifies the spectrum of differentiation status in PTC and may serve as a potential prognostic biomarker in PTC, most promisingly in BRAF V600E-mutated tumors.