ABSTRACT
MOTIVATION: Rare variant-based analyses are beginning to identify risk genes for neuropsychiatric disorders and other diseases. However, the identified genes account for only a fraction of the predicted causal genes. Recent studies have shown that rare damaging variants are significantly enriched in specific gene-sets. Methods that jointly model rare variants and gene-sets, identify enriched gene-sets and use them to prioritize additional risk genes could improve understanding of the genetic architecture of diseases. RESULTS: We propose DECO (Integrated analysis of de novo mutations, rare case/control variants and omics information via gene-sets), an integrated method for rare-variant and gene-set analysis. The method can (i) test the enrichment of gene-sets directly within the statistical model, and (ii) use enriched gene-sets to rank existing genes and prioritize additional risk genes for tested disorders. In simulations, DECO performs better than a homologous method that uses only variant data. To demonstrate the method, we applied it to rare-variant datasets of schizophrenia. Compared with a method that uses only variant information, DECO is able to prioritize additional risk genes. AVAILABILITY: DECO can be used to analyze rare variants together with biological pathways or cell types for any disease. The package is available on GitHub at https://github.com/hoangtn/DECO.
Subjects
Genetic Predisposition to Disease/genetics , Mutation , Neurodevelopmental Disorders/genetics , Schizophrenia/genetics , Systems Biology/methods , Case-Control Studies , Computer Simulation , DNA Mutational Analysis/methods , Humans , Statistical Models , Protein Interaction Mapping/methods , Protein Interaction Maps/genetics
ABSTRACT
BACKGROUND: Psychotic disorders and schizotypal traits aggregate in the relatives of probands with schizophrenia. It is currently unclear how variability in symptom dimensions in schizophrenia probands and their relatives is associated with polygenic liability to psychiatric disorders. AIMS: To investigate whether polygenic risk scores (PRSs) can predict symptom dimensions in members of multiplex families with schizophrenia. METHOD: The largest genome-wide data-sets for schizophrenia, bipolar disorder and major depressive disorder were used to construct PRSs in 861 participants from the Irish Study of High-Density Multiplex Schizophrenia Families. Symptom dimensions were derived using the Operational Criteria Checklist for Psychotic Disorders in participants with a history of a psychotic episode, and the Structured Interview for Schizotypy in participants without a history of a psychotic episode. Mixed-effects linear regression models were used to assess the relationship between PRS and symptom dimensions across the psychosis spectrum. RESULTS: Schizophrenia PRS is significantly associated with the negative/disorganised symptom dimension in participants with a history of a psychotic episode (P = 2.31 × 10⁻⁴) and negative dimension in participants without a history of a psychotic episode (P = 1.42 × 10⁻³). Bipolar disorder PRS is significantly associated with the manic symptom dimension in participants with a history of a psychotic episode (P = 3.70 × 10⁻⁴). No association with major depressive disorder PRS was observed. CONCLUSIONS: Polygenic liability to schizophrenia is associated with higher negative/disorganised symptoms in participants with a history of a psychotic episode and negative symptoms in participants without a history of a psychotic episode in multiplex families with schizophrenia. 
These results provide genetic evidence in support of the spectrum model of schizophrenia, and support the view that negative and disorganised symptoms may have greater genetic basis than positive symptoms, making them better indices of familial liability to schizophrenia.
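The PRS construction mentioned under METHOD can be illustrated with a minimal sketch. It assumes the common definition of a PRS as a weighted sum of risk-allele dosages, with per-SNP weights taken from a discovery GWAS. The function names and the toy numbers are hypothetical and are not taken from the study, which may have used additional steps such as LD pruning and P-value thresholding.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1 or 2 per SNP).

    `weights` are per-SNP effect sizes (e.g. log odds ratios) from a
    discovery GWAS; both inputs are hypothetical toy values here.
    """
    if len(dosages) != len(weights):
        raise ValueError("one weight is needed per SNP dosage")
    return sum(d * w for d, w in zip(dosages, weights))


def standardize(scores):
    """Convert raw scores to z-scores (mean 0, sample SD 1) across people."""
    if len(scores) < 2:
        raise ValueError("at least two scores are needed")
    mean = sum(scores) / len(scores)
    variance = sum((s - mean) ** 2 for s in scores) / (len(scores) - 1)
    if variance == 0:
        raise ValueError("scores have zero variance")
    sd = variance ** 0.5
    return [(s - mean) / sd for s in scores]


# Toy example: three SNPs, three participants (illustrative values only).
gwas_weights = [0.10, -0.05, 0.20]
genotypes = [[2, 1, 0], [0, 2, 1], [1, 0, 2]]
raw_scores = [polygenic_risk_score(g, gwas_weights) for g in genotypes]
z_scores = standardize(raw_scores)
```

Standardized scores of this kind are what would typically enter a regression model as the predictor of a symptom dimension.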
Subjects
Major Depressive Disorder , Psychotic Disorders , Schizophrenia , Schizotypal Personality Disorder , Humans , Schizophrenia/diagnosis , Schizophrenia/genetics , Schizotypal Personality Disorder/diagnosis , Schizotypal Personality Disorder/genetics , Schizotypal Personality Disorder/psychology , Major Depressive Disorder/diagnosis , Major Depressive Disorder/genetics , Psychotic Disorders/genetics , Psychotic Disorders/psychology , Risk Factors
ABSTRACT
Common genetic variants identified in genome-wide association studies (GWAS) show varying degrees of genetic pleiotropy across complex human disorders. Clinical studies of schizophrenia (SCZ) suggest that in addition to neuropsychiatric symptoms, patients with SCZ also show variable immune dysregulation. Epidemiological studies of multiple sclerosis (MS), an autoimmune, neurodegenerative disorder of the central nervous system, suggest that in addition to the manifestation of neuroinflammatory complications, patients with MS may also show co-occurring neuropsychiatric symptoms with disease progression. In this study, we analyzed the largest available GWAS datasets for SCZ (N = 161,405) and MS (N = 41,505) using Gaussian causal mixture modeling (MiXeR) and conditional/conjunctional false discovery rate (condFDR) frameworks to explore and quantify the shared genetic architecture of these two complex disorders at common variant level. Despite detecting only a negligible genetic correlation (rG = 0.057), we observe polygenic overlap between SCZ and MS, and a substantial genetic enrichment in SCZ conditional on associations with MS, and vice versa. By leveraging this cross-disorder enrichment, we identified 36 loci jointly associated with SCZ and MS at conjunctional FDR < 0.05 with mixed direction of effects. Follow-up functional analysis of the shared loci implicates candidate genes and biological processes involved in immune response and B-cell receptor signaling pathways. In conclusion, this study demonstrates the presence of polygenic overlap between SCZ and MS in the absence of a genetic correlation and provides new insights into the shared genetic architecture of these two disorders at the common variant level.
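The conditional/conjunctional FDR idea used above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the conditional FDR of a SNP is estimated as its primary-trait p-value divided by the empirical CDF of primary-trait p-values among SNPs at least as strongly associated with the conditioning trait, and the conjunctional FDR is the larger of the two conditional FDRs. The function names and toy p-values are hypothetical. Published pipelines add steps such as LD pruning and lookup-table smoothing that are omitted here.

```python
def cond_fdr(p_primary, p_conditional):
    """Empirical conditional FDR of each SNP for the primary trait."""
    n = len(p_primary)
    result = []
    for i in range(n):
        # SNPs at least as strongly associated with the conditioning trait.
        subset = [p_primary[j] for j in range(n)
                  if p_conditional[j] <= p_conditional[i]]
        # Empirical CDF of primary p-values inside that subset (SNP i is
        # always a member, so the value is strictly positive).
        ecdf = sum(1 for p in subset if p <= p_primary[i]) / len(subset)
        result.append(min(1.0, p_primary[i] / ecdf))
    return result


def conj_fdr(p_trait1, p_trait2):
    """Conjunctional FDR: the maximum of the two conditional FDRs."""
    fdr_1_given_2 = cond_fdr(p_trait1, p_trait2)
    fdr_2_given_1 = cond_fdr(p_trait2, p_trait1)
    return [max(a, b) for a, b in zip(fdr_1_given_2, fdr_2_given_1)]


# Toy p-values for four SNPs in two traits (illustrative values only).
p_scz = [0.001, 0.2, 0.5, 0.9]
p_ms = [0.002, 0.3, 0.04, 0.8]
shared = [i for i, q in enumerate(conj_fdr(p_scz, p_ms)) if q < 0.05]
```

Taking the maximum is what makes the statistic conservative: a SNP is reported as shared only if it passes the threshold in both conditioning directions.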
ABSTRACT
To provide insights into the biology of opioid dependence (OD) and opioid use (i.e., exposure, OE), we completed a genome-wide analysis comparing 4503 OD cases, 4173 opioid-exposed controls, and 32,500 opioid-unexposed controls, including participants of European and African descent (EUR and AFR, respectively). Among the variants identified, rs9291211 was associated with OE (exposed vs. unexposed controls; EUR z = -5.39, p = 7.2 × 10⁻⁸). This variant regulates the transcriptomic profiles of SLC30A9 and BEND4 in multiple brain tissues and was previously associated with depression, alcohol consumption, and neuroticism. A phenome-wide scan of rs9291211 in the UK Biobank (N > 360,000) found association of this variant with propensity to use dietary supplements (p = 1.68 × 10⁻⁸). With respect to the same OE phenotype in the gene-based analysis, we identified SDCCAG8 (EUR + AFR z = 4.69, p = 10⁻⁶), which was previously associated with educational attainment, risk-taking behaviors, and schizophrenia. In addition, rs201123820 showed a genome-wide significant difference between OD cases and unexposed controls (AFR z = 5.55, p = 2.9 × 10⁻⁸) and a significant association with musculoskeletal disorders in the UK Biobank (p = 4.88 × 10⁻⁷). A polygenic risk score (PRS) based on a GWAS of risk-tolerance (n = 466,571) was positively associated with OD (OD vs. unexposed controls, p = 8.1 × 10⁻⁵; OD cases vs. exposed controls, p = 0.054) and OE (exposed vs. unexposed controls, p = 3.6 × 10⁻⁵). A PRS based on a GWAS of neuroticism (n = 390,278) was positively associated with OD (OD vs. unexposed controls, p = 3.2 × 10⁻⁵; OD vs. exposed controls, p = 0.002) but not with OE (p = 0.67). Our analyses highlight the difference between dependence and exposure and the importance of considering the definition of controls in studies of addiction.
Subjects
Opioid Analgesics/administration & dosage , Addictive Behavior/genetics , Genetic Predisposition to Disease/genetics , Genetic Variation/genetics , Genome-Wide Association Study , Genomics , Opioid-Related Disorders/genetics , Opioid Analgesics/pharmacology , Female , Human Genome/genetics , Humans , Male , Multifactorial Inheritance/genetics
ABSTRACT
Genotype imputation across populations of mixed ancestry is critical for optimal discovery in large-scale genome-wide association studies (GWAS). Methods for direct imputation of GWAS summary statistics were previously shown to be practically as accurate as summary statistics produced after raw genotype imputation, while incurring an orders-of-magnitude lower computational burden. Given that direct imputation needs a precise estimate of linkage disequilibrium (LD), and that most methods use a small reference panel (for example, the ~2,500 subjects of the 1000 Genomes Project), there is a great need for much larger and more diverse reference panels. To accurately estimate the LD needed for an exhaustive analysis of any cosmopolitan cohort, we developed DISTMIX2. DISTMIX2 (a) uses a much larger and more diverse reference panel than traditional reference panels, and (b) can estimate the weights of the ethnic mixture based solely on Z-scores when allele frequencies are not available. We applied DISTMIX2 to GWAS summary statistics from the Psychiatric Genomics Consortium (PGC). DISTMIX2 uncovered signals in numerous new regions, with most of these findings coming from the rarer variants. Rarer variants localize signals much more sharply than common variants, as the LD for rare variants extends over a shorter distance than for common ones. For example, while the original PGC post-traumatic stress disorder GWAS found only 3 marginal signals for common variants, we now uncover a very strong signal for a rare variant in PKN2, a gene associated with neuronal and hippocampal development. Thus, DISTMIX2 provides a robust and fast (re)imputation approach for most psychiatric GWAS.
Subjects
Genome-Wide Association Study/standards , Mental Disorders/diagnosis , Mental Disorders/genetics , Single Nucleotide Polymorphism , Cohort Studies , Gene Frequency , Humans , Linkage Disequilibrium , Phenotype , Reference Standards , Software
ABSTRACT
The transcription factor 4 (TCF4) locus is a robust association finding with schizophrenia (SCZ), but little is known about the genes regulated by the encoded transcription factor. Therefore, we conducted chromatin immunoprecipitation sequencing (ChIP-seq) of TCF4 in neural-derived (SH-SY5Y) cells to identify genome-wide TCF4 binding sites, followed by data integration with SCZ association findings. We identified 11 322 TCF4 binding sites overlapping in two ChIP-seq experiments. These sites are significantly enriched for the TCF4 Ebox binding motif (>85% having ≥1 Ebox) and implicate a gene set enriched for genes downregulated in TCF4 small-interfering RNA (siRNA) knockdown experiments, indicating the validity of our findings. The TCF4 gene set was also enriched among (1) gene ontology categories such as axon/neuronal development, (2) genes preferentially expressed in brain, in particular pyramidal neurons of the somatosensory cortex and (3) genes downregulated in postmortem brain tissue from SCZ patients (odds ratio, OR = 2.8, permutation P < 4 × 10⁻⁵). Considering genomic alignments, TCF4 binding sites significantly overlapped those for neural DNA-binding proteins such as FOXP2 and the SCZ-associated EP300. TCF4 binding sites were modestly enriched among SCZ risk loci from the Psychiatric Genomics Consortium (OR = 1.56, P = 0.03). In total, 130 TCF4 binding sites occurred in 39 of the 108 regions published in 2014. Thirteen genes within the 108 loci both had a TCF4 binding site within ±10 kb and were differentially expressed in siRNA knockdown experiments of TCF4, suggesting direct TCF4 regulation. These findings confirm TCF4 as an important regulator of neural genes and point toward functional interactions with potential relevance for SCZ.
Subjects
Gene Regulatory Networks/genetics , Human Genome/genetics , Schizophrenia/genetics , Transcription Factor 4/genetics , Binding Sites/genetics , Brain/metabolism , Brain/pathology , Chromatin Immunoprecipitation , Gene Ontology , Genetic Predisposition to Disease , Humans , Neurogenesis/genetics , Postmortem Changes , Pyramidal Cells/metabolism , Pyramidal Cells/pathology , Schizophrenia/physiopathology , Somatosensory Cortex/metabolism , Somatosensory Cortex/pathology
ABSTRACT
BACKGROUND: Long noncoding RNA (lncRNA) have been implicated in the etiology of alcohol use. Since lncRNA provide another layer of complexity to the transcriptome, assessing their expression in the brain is the first critical step toward understanding lncRNA functions in alcohol use and addiction. Thus, we sought to profile lncRNA expression in the nucleus accumbens (NAc) in a large postmortem alcohol brain sample. METHODS: LncRNA and protein-coding gene (PCG) expressions in the NAc from 41 subjects with alcohol dependence (AD) and 41 controls were assessed via a regression model. Weighted gene coexpression network analysis was used to identify lncRNA and PCG networks (i.e., modules) significantly correlated with AD. Within the significant modules, key network genes (i.e., hubs) were also identified. The lncRNA and PCG hubs were correlated via Pearson correlations to elucidate the potential biological functions of lncRNA. The lncRNA and PCG hubs were further integrated with GWAS data to identify expression quantitative trait loci (eQTL). RESULTS: At Bonferroni adj. p-value ≤ 0.05, we identified 19 lncRNA and 5 PCG significant modules, which were enriched for neuronal and immune-related processes. In these modules, we further identified 86 and 315 PCG and lncRNA hubs, respectively. At false discovery rate (FDR) of 10%, the correlation analyses between the lncRNA and PCG hubs revealed 3,125 positive and 1,860 negative correlations. Integration of hubs with genotype data identified 243 eQTLs affecting the expression of 39 and 204 PCG and lncRNA hubs, respectively. CONCLUSIONS: Our study identified lncRNA and gene networks significantly associated with AD in the NAc, coordinated lncRNA and mRNA coexpression changes, highlighting potentially regulatory functions for the lncRNA, and our genetic (cis-eQTL) analysis provides novel insights into the etiological mechanisms of AD.
Subjects
Alcoholism/metabolism , Nucleus Accumbens/metabolism , Long Noncoding RNA/metabolism , Alcoholism/genetics , Case-Control Studies , Genome-Wide Association Study , Humans , Oligonucleotide Array Sequence Analysis , Quantitative Trait Loci , Long Noncoding RNA/genetics , Transcriptome
ABSTRACT
Genetic signal detection in genome-wide association studies (GWAS) is enhanced by pooling small signals from multiple single nucleotide polymorphisms (SNPs), for example, across genes and pathways. Because genes are believed to influence traits via gene expression, it is of interest to combine information from expression quantitative trait loci (eQTLs) in a gene or in genes in the same pathway. Such methods, widely referred to as transcriptome-wide association studies (TWAS), already exist for gene analysis. Because most of the confounding effects of linkage disequilibrium (LD) can be eliminated from TWAS gene statistics, pathway TWAS methods would be very useful in uncovering the true molecular basis of psychiatric disorders. However, such methods are not yet available for arbitrarily large pathways/gene sets. This is possibly due to the quadratic (as a function of the number of SNPs) computational burden of computing LD across large chromosomal regions. To overcome this obstacle, we propose JEPEGMIX2-P, a novel TWAS pathway method that (a) has a linear computational burden, (b) uses a large and diverse reference panel (33 K subjects), (c) is competitive (adjusts for background enrichment in gene TWAS statistics), and (d) is applicable as-is to ethnically mixed cohorts. To underline its potential for increasing the power to uncover genetic signals over commonly used nontranscriptomic methods, for example, MAGMA, we applied JEPEGMIX2-P to summary statistics of most large meta-analyses from the Psychiatric Genomics Consortium (PGC). While our work is just the very first step toward clinical translation for psychiatric disorders, the PGC anorexia results suggest a possible avenue for treatment.
Subjects
Computational Biology/methods , Genetic Markers , Multifactorial Inheritance , Single Nucleotide Polymorphism , Psychotic Disorders/pathology , Quantitative Trait Loci , Transcriptome , Gene Expression Profiling , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Phenotype , Prognosis , Psychotic Disorders/genetics , Risk Factors , Signal Transduction , Software
ABSTRACT
Genome-wide association studies (GWASs) are highly effective at identifying common risk variants for schizophrenia. Rare risk variants are also important contributors to schizophrenia etiology but, with the exception of large copy number variants, are difficult to detect with GWAS. Exome and genome sequencing, which have accelerated the study of rare variants, are expensive, so alternative methods are needed to aid detection of rare variants. Here we re-analyze an Irish schizophrenia GWAS dataset (n = 3,473) by performing identity-by-descent (IBD) mapping followed by exome sequencing of individuals identified as sharing risk haplotypes to search for rare risk variants in coding regions. We identified 45 rare haplotypes (>1 cM) that were significantly more common in cases than controls. By exome sequencing 105 haplotype carriers, we investigated these haplotypes for functional coding variants that could be tested for association in independent GWAS samples. We identified one rare missense variant in PCNT but did not find statistical support for an association with schizophrenia in a replication analysis. Nevertheless, IBD mapping can prioritize both individual samples and genomic regions for follow-up analysis, although genome rather than exome sequencing may be more effective at detecting risk variants on rare haplotypes.
Subjects
Exome Sequencing/methods , Schizophrenia/genetics , DNA Sequence Analysis/methods , Adult , Case-Control Studies , Chromosome Mapping , DNA Copy Number Variations , Genetic Databases , Exome/genetics , Female , Genetic Predisposition to Disease/genetics , Genome-Wide Association Study/methods , Genotype , Haplotypes , Humans , Male , Middle Aged , Risk Factors , Schizophrenia/metabolism
ABSTRACT
BACKGROUND: Identifying genetic relationships between complex traits in emerging adulthood can provide useful etiological insights into risk for psychopathology. College-age individuals are under-represented in genomic analyses thus far, and the majority of work has focused on the clinical disorder or cognitive abilities rather than normal-range behavioral outcomes. METHODS: This study examined a sample of emerging adults 18-22 years of age (N = 5947) to construct an atlas of polygenic risk for 33 traits predicting relevant phenotypic outcomes. Twenty-eight hypotheses were tested based on the previous literature on samples of European ancestry, and the availability of rich assessment data allowed for polygenic predictions across 55 psychological and medical phenotypes. RESULTS: Polygenic risk for schizophrenia (SZ) in emerging adults predicted anxiety, depression, nicotine use, trauma, and family history of psychological disorders. Polygenic risk for neuroticism predicted anxiety, depression, phobia, panic, neuroticism, and was correlated with polygenic risk for cardiovascular disease. CONCLUSIONS: These results demonstrate the extensive impact of genetic risk for SZ, neuroticism, and major depression on a range of health outcomes in early adulthood. Minimal cross-ancestry replication of these phenomic patterns of polygenic influence underscores the need for more genome-wide association studies of non-European populations.
Subjects
Major Depressive Disorder/genetics , Genome-Wide Association Study/statistics & numerical data , Mental Disorders/genetics , Multifactorial Inheritance/genetics , Neuroticism , Phenotype , Schizophrenia/genetics , Adolescent , Adult , Female , Humans , Male , Mid-Atlantic Region , Young Adult
ABSTRACT
BACKGROUND: Despite consistent evidence of the heritability of alcohol use disorders (AUDs), few specific genes with an etiological role have been identified. It is likely that AUDs are highly polygenic; however, the etiological pathways and genetic variants involved may differ between populations. The aim of this study was thus to evaluate whether aggregate genetic risk for AUDs differed between clinically ascertained and population-based epidemiological samples. METHODS: Four independent samples were obtained: 2 from unselected birth cohorts (Avon Longitudinal Study of Parents and Children [ALSPAC], N = 4,304; FinnTwin12 [FT12], N = 1,135) and 2 from families densely affected with AUDs, identified from treatment-seeking patients (Collaborative Study on the Genetics of Alcoholism, N = 2,097; Irish Affected Sib Pair Study of Alcohol Dependence, N = 706). AUD symptoms were assessed with clinical interviews, and participants of European ancestry were genotyped. Genomewide association was conducted separately in each sample, and the resulting association weights were used to create polygenic risk scores in each of the other samples (12 total discovery-validation pairs), and from meta-analyses within sample type. We then tested how well these aggregate genetic scores predicted AUD outcomes within and across sample types. RESULTS: Polygenic scores derived from 1 population-based sample (ALSPAC) significantly predicted AUD symptoms in another population-based sample (FT12), but not in either clinically ascertained sample. Trend-level associations (uncorrected p < 0.05) were found for polygenic score predictions within sample types but no or negative predictions across sample types. Polygenic scores accounted for 0 to 1% of the variance in AUD symptoms. 
CONCLUSIONS: Though preliminary, these results provide suggestive evidence of differences in the genetic etiology of AUDs based on sample characteristics such as treatment-seeking status, which may index other important clinical or demographic factors that moderate genetic influences. Although the variance accounted for by genomewide polygenic scores remains low, future studies could improve gene identification efforts by amassing very large samples, or reducing genetic heterogeneity by informing analyses with other phenotypic information such as sample characteristics. Multiple complementary approaches may be needed to make progress in gene identification for this complex disorder.
Subjects
Alcoholism/genetics , Multifactorial Inheritance , Adolescent , Adult , Female , Humans , Male , Risk , Risk Assessment , White People/genetics , Young Adult
ABSTRACT
Alcohol abuse is a widespread and serious problem. Understanding the factors that influence the likelihood of abuse is important for the development of effective therapies. There are both genetic and environmental influences on the development of abuse, but it has been difficult to identify specific liability factors, in part because of both the complex genetic architecture of liability and the influences of environmental stimuli on the expression of that genetic liability. Epigenetic modification of gene expression can underlie both genetic and environmentally sensitive variation in expression, and epigenetic regulation has been implicated in the progression to addiction. Here, we identify a role for the switching defective/sucrose nonfermenting (SWI/SNF) chromatin-remodeling complex in regulating the behavioral response to alcohol in the nematode Caenorhabditis elegans. We found that SWI/SNF components are required in adults for the normal behavioral response to ethanol and that different SWI/SNF complexes regulate different aspects of the acute response to ethanol. We showed that the SWI/SNF subunits SWSN-9 and SWSN-7 are required in neurons and muscle for the development of acute functional tolerance to ethanol. Examination of the members of the SWI/SNF complex for association with a diagnosis of alcohol dependence in a human population identified allelic variation in a member of the SWI/SNF complex, suggesting that variation in the regulation of SWI/SNF targets may influence the propensity to develop abuse disorders. Together, these data strongly implicate the chromatin remodeling associated with SWI/SNF complex members in the behavioral responses to alcohol across phyla.
Subjects
Alcoholism/physiopathology , Caenorhabditis elegans/physiology , Non-Histone Chromosomal Proteins/physiology , Transcription Factors/physiology , Alcoholism/diagnosis , Animals , Non-Histone Chromosomal Proteins/genetics , Ethanol/toxicity , Genome-Wide Association Study , Humans , RNA Interference , Transcription Factors/genetics
ABSTRACT
MOTIVATION: For genetic studies, statistically significant variants explain far less trait variance than 'sub-threshold' association signals. To size follow-up studies, researchers need to accurately estimate 'true' effect sizes at each SNP, e.g. the true mean of odds ratios (ORs)/regression coefficients (RRs) or Z-score noncentralities. Naïve estimates of effect sizes incur winner's curse biases, which are reduced only by laborious winner's curse adjustments (WCAs). Given that Z-score estimates can be theoretically translated to other scales, we propose a simple method to compute WCA for Z-scores, i.e. their true means/noncentralities. RESULTS: WCA of Z-scores shrinks these towards zero while, on the P-value scale, multiple testing adjustment (MTA) shrinks P-values toward one, which corresponds to a Z-score of zero. Thus, WCA on the Z-score scale is a proxy for MTA on the P-value scale. Therefore, to estimate Z-score noncentralities for all SNPs in genome scans, we propose FDR Inverse Quantile Transformation (FIQT). It (i) performs the simpler MTA of P-values using FDR and (ii) obtains noncentralities by back-transforming MTA P-values to the Z-score scale. When compared to competitors, realistic simulations suggest that FIQT is more (i) accurate and (ii) computationally efficient by orders of magnitude. Practical application of FIQT to the Psychiatric Genomics Consortium schizophrenia cohort predicts a non-trivial fraction of sub-threshold signals which become significant in much larger supersamples. CONCLUSIONS: FIQT is a simple, yet accurate, WCA method for Z-scores (and ORs/RRs, via simple transformations). AVAILABILITY AND IMPLEMENTATION: A 10-line R function implementation is available at https://github.com/bacanusa/FIQT CONTACT: sabacanu@vcu.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
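The two FIQT steps described under RESULTS can be sketched as follows. This is an illustrative Python rendering under stated assumptions, not the authors' R function: it assumes two-sided P-values, uses a Benjamini-Hochberg adjustment as the FDR step, and floors P-values at a hypothetical `min_p` to keep the inverse normal quantile finite. The function names are also hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and SD 1


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    n = len(pvals)
    # Walk from the largest p-value down, keeping a running minimum so the
    # adjusted values stay monotone in the original p-values.
    order = sorted(range(n), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for steps_from_top, i in enumerate(order):
        rank = n - steps_from_top  # 1-based rank in ascending order
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def fiqt_sketch(z_scores, min_p=1e-300):
    """Shrink Z-scores: FDR-adjust P-values, then map back to the Z scale."""
    # Step (i): two-sided P-values, floored at min_p, then FDR adjustment.
    pvals = [max(2.0 * _STD_NORMAL.cdf(-abs(z)), min_p) for z in z_scores]
    adjusted = bh_adjust(pvals)
    # Step (ii): back-transform each adjusted P-value to a Z-score magnitude
    # and restore the sign of the original Z-score.
    shrunk = []
    for z, q in zip(z_scores, adjusted):
        magnitude = -_STD_NORMAL.inv_cdf(q / 2.0)
        shrunk.append(magnitude if z >= 0 else -magnitude)
    return shrunk


# Toy genome scan with five Z-scores (illustrative values only).
observed = [5.0, 3.0, 1.0, -2.5, 0.2]
estimated = fiqt_sketch(observed)
```

Because an FDR-adjusted P-value is never smaller than the raw P-value, each back-transformed Z-score is no larger in magnitude than the observed one, which is the shrinkage toward zero that the abstract describes.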
Subjects
Genome-Wide Association Study , Single Nucleotide Polymorphism , Bias , Statistical Data Interpretation , Humans , Phenotype
ABSTRACT
MOTIVATION: To increase detection power, gene level analysis methods are used to aggregate weak signals. To greatly increase computational efficiency, most methods use as input summary statistics from genome-wide association studies (GWAS). Subsequently, gene statistics are constructed using linkage disequilibrium (LD) patterns from a relevant reference panel. However, all methods, including our own Joint Effect on Phenotype of eQTL/functional single nucleotide polymorphisms (SNPs) associated with a Gene (JEPEG), assume homogeneous panels, e.g. European. This assumption renders these tools unsuitable for the analysis of large cosmopolitan cohorts. RESULTS: We propose a JEPEG extension, JEPEGMIX, which, similar to one of our software tools, Direct Imputation of summary STatistics of unmeasured SNPs from MIXed ethnicity cohorts, is capable of estimating accurate LD patterns for cosmopolitan cohorts. JEPEGMIX uses these accurate LD estimates to (i) impute the summary statistics at unmeasured functional variants and (ii) test for the joint effect of all measured and imputed functional variants which are associated with a gene. We illustrate the performance of our tool by analyzing the GWAS meta-analysis summary statistics from the multi-ethnic Psychiatric Genomics Consortium Schizophrenia stage 2 cohort. This practical application supports the immune system being one of the main drivers of the process leading to schizophrenia. AVAILABILITY AND IMPLEMENTATION: Software, annotation database and examples are available at http://dleelab.github.io/jepegmix/. CONTACT: donghyung.lee@vcuhealth.org SUPPLEMENTARY INFORMATION: Supplementary material is available at Bioinformatics online.
Subjects
Ethnicity/genetics , Genetic Testing , Population Genetics , Single Nucleotide Polymorphism/genetics , Schizophrenia/genetics , Software , Cohort Studies , Genome-Wide Association Study/methods , Genomics/methods , Humans , Linkage Disequilibrium , Phenotype
ABSTRACT
Researchers have long observed that problem behaviors tend to cluster together, particularly among adolescents. Epidemiological studies have suggested that this covariation is due, in part, to common genetic influences, and a number of plausible candidates have emerged as targets for investigation. To date, however, genetic association studies of these behaviors have focused mostly on unidimensional models of individual phenotypes within European American samples. Herein, we compared a series of confirmatory factor models to best characterize the structure of problem behavior (alcohol and marijuana use, sexual behavior, and disruptive behavior) within a representative community-based sample of 592 low-income African American adolescents (50.3% female), ages 13 to 18. We further explored the extent to which 3 genes previously implicated for their role in similar behavioral dimensions (CHRM2, GABRA2, and OPRM1) independently accounted for variance within factors specified in the best-fitting model. Supplementary analyses were conducted to derive comparative estimates for the predictive utility of these genes in more traditional unidimensional models. Findings provide initial evidence for a bifactor structure of problem behavior among African American adolescents and highlight novel genetic correlates of specific behavioral dimensions otherwise undetected in an orthogonal syndromal factor. Implications of this approach include increased precision in the assessment of problem behavior, with corresponding increases in the reliability and validity of identified genetic associations. As a corollary, the comparison of primary and supplementary association analyses illustrates the potential for overlooking and/or overinterpreting meaningful genetic effects when failing to adequately account for phenotypic complexity.
Subjects
Adolescent Behavior/psychology , Black or African American/genetics , Problem Behavior/psychology , Adolescent , Female , Humans , Male , Poverty , Reproducibility of Results , United States
ABSTRACT
MOTIVATION: Gene expression is influenced by variants commonly known as expression quantitative trait loci (eQTL). On the basis of this fact, researchers proposed to use eQTL/functional information univariately for prioritizing single nucleotide polymorphism (SNP) signals from genome-wide association studies (GWAS). However, most genes are influenced by multiple eQTLs which, thus, jointly affect any downstream phenotype. Therefore, when compared with the univariate prioritization approach, a joint modeling of eQTL action on phenotypes has the potential to substantially increase signal detection power. Nonetheless, a joint eQTL analysis is impeded by (i) not measuring all eQTLs in a gene and/or (ii) lack of access to individual genotypes. RESULTS: We propose joint effect on phenotype of eQTL/functional SNPs associated with a gene (JEPEG), a novel software tool which uses only GWAS summary statistics to (i) impute the summary statistics at unmeasured eQTLs and (ii) test for the joint effect of all measured and imputed eQTLs in a gene. We illustrate the behavior/performance of the developed tool by analysing the GWAS meta-analysis summary statistics from the Psychiatric Genomics Consortium Stage 1 and the Genetic Consortium for Anorexia Nervosa. CONCLUSIONS: Results of the applied analyses suggest that JEPEG complements commonly used univariate GWAS tools by: (i) increasing signal detection power via uncovering (a) novel genes or (b) known associated genes in smaller cohorts and (ii) assisting in fine-mapping of challenging regions, e.g. major histocompatibility complex for schizophrenia. AVAILABILITY AND IMPLEMENTATION: JEPEG, its associated database of eQTL SNPs and usage examples are publicly available at http://code.google.com/p/jepeg/. CONTACT: dlee4@vcu.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
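The notion of a joint test of several eQTLs in a gene can be illustrated with a generic quadratic-form statistic. This is a sketch of the general idea only, assuming two eQTLs with Z-scores `z1` and `z2` and LD correlation `r`; it is not claimed to be the exact statistic implemented in JEPEG. Under the null, the statistic z'R⁻¹z follows a chi-square distribution with 2 degrees of freedom, whose tail probability has the closed form exp(-T/2). The function name is hypothetical.

```python
import math


def joint_test_two_eqtls(z1, z2, r):
    """Joint chi-square test (2 df) for two correlated Z-scores.

    Computes T = z' R^{-1} z for R = [[1, r], [r, 1]] and its p-value.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    statistic = (z1 * z1 - 2.0 * r * z1 * z2 + z2 * z2) / (1.0 - r * r)
    # Survival function of a chi-square variable with 2 degrees of freedom.
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value


# Two modest, independent signals combine into stronger joint evidence.
stat_independent, p_independent = joint_test_two_eqtls(2.0, 2.0, 0.0)
# The same Z-scores carry less independent information when the SNPs are in LD.
stat_in_ld, p_in_ld = joint_test_two_eqtls(2.0, 2.0, 0.5)
```

Accounting for LD in this way prevents two correlated eQTLs from being counted as two independent pieces of evidence.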
Subjects
Anorexia Nervosa/genetics; Biomarkers/analysis; Gene Expression Regulation; Genome-Wide Association Study; Polymorphism, Single Nucleotide/genetics; Quantitative Trait Loci; Software; Cohort Studies; Gene Expression Profiling; Genomics/methods; Genotype; Humans; Meta-Analysis as Topic; Phenotype
ABSTRACT
MOTIVATION: To increase the signal resolution for large-scale meta-analyses of genome-wide association studies, genotypes at unmeasured single nucleotide polymorphisms (SNPs) are commonly imputed using large multi-ethnic reference panels. However, the ever-increasing size and ethnic diversity of both reference panels and cohorts make genotype imputation computationally challenging for moderately sized computer clusters. Moreover, genotype imputation requires subject-level genetic data, which, unlike the summary statistics provided by virtually all studies, are not publicly available. While there are far less demanding methods which avoid the genotype imputation step by directly imputing SNP statistics, e.g. Directly Imputing summary STatistics (DIST) proposed by our group, their implicit assumptions make them applicable only to ethnically homogeneous cohorts. RESULTS: To decrease computational and access requirements for the analysis of cosmopolitan cohorts, we propose DISTMIX, which extends DIST capabilities to the analysis of mixed ethnicity cohorts. The method uses a relevant reference panel to directly impute unmeasured SNP statistics based only on statistics at measured SNPs and estimated/user-specified ethnic proportions. Simulations show that the proposed method adequately controls the Type I error rates. The 1000 Genomes panel imputation of summary statistics from the ethnically diverse Psychiatric Genomics Consortium Schizophrenia Phase 2 suggests that, when compared to genotype imputation methods, DISTMIX offers comparable imputation accuracy for only a fraction of computational resources. AVAILABILITY AND IMPLEMENTATION: DISTMIX software, its reference population data, and usage examples are publicly available at http://code.google.com/p/distmix. CONTACT: dlee4@vcu.edu SUPPLEMENTARY INFORMATION: Supplementary Data are available at Bioinformatics online.
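One simple way to picture how ethnic proportions could enter the imputation is to blend population-specific SNP correlation matrices from the reference panel into a single cohort-level matrix. The sketch below does this with a plain weighted average. That weighting is a simplifying assumption made here for illustration (it ignores allele-frequency differences between populations) and should not be read as the formula DISTMIX implements; `mixed_correlation` and the toy matrices are hypothetical.

```python
import numpy as np

def mixed_correlation(pop_corrs, proportions):
    """Weighted average of per-population SNP correlation matrices.

    pop_corrs   : list of (p x p) correlation matrices, one per population
    proportions : cohort ethnic proportions, non-negative and summing to 1
    """
    w = np.asarray(proportions, dtype=float)
    if np.any(w < 0) or not np.isclose(w.sum(), 1.0):
        raise ValueError("proportions must be non-negative and sum to 1")
    stacked = np.stack([np.asarray(c, dtype=float) for c in pop_corrs])
    if stacked.shape[0] != w.size:
        raise ValueError("need one proportion per population")
    # Contract the population axis: sum_k w[k] * stacked[k].
    return np.tensordot(w, stacked, axes=1)

# Toy example: strong LD in population A, weak LD in population B.
corr_a = [[1.0, 0.8], [0.8, 1.0]]
corr_b = [[1.0, 0.2], [0.2, 1.0]]
mixed = mixed_correlation([corr_a, corr_b], [0.75, 0.25])
print(mixed)
```

The blended matrix could then stand in for the single-population correlation structure that a homogeneous-cohort method would otherwise use.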
Subjects
Computational Biology/methods; Ethnicity/genetics; Polymorphism, Single Nucleotide/genetics; Software; Statistics as Topic; Cohort Studies; Computer Simulation; Databases, Genetic; Genome-Wide Association Study; Humans
ABSTRACT
MOTIVATION: Genotype imputation methods are used to enhance the resolution of genome-wide association studies, and thus increase the detection rate for genetic signals. Although most studies report all univariate summary statistics, many of them limit access to subject-level genotypes. Because such access is required by all genotype imputation methods, it is helpful to develop methods that impute summary statistics without going through the interim step of imputing genotypes. Even when subject-level genotypes are available, the substantial computational cost of typical genotype imputation creates a need for faster imputation methods. RESULTS: Direct Imputation of summary STatistics (DIST) imputes the summary statistics of untyped variants without first imputing their subject-level genotypes. This is achieved by (i) using the conditional expectation formula for multivariate normal variates and (ii) using the correlation structure from a relevant reference population. When compared with genotype imputation methods, DIST (i) requires only a fraction of their computational resources, (ii) has comparable imputation accuracy for independent subjects and (iii) is readily applicable to the imputation of association statistics coming from large pedigree data. Thus, the proposed application is useful for fast imputation of summary results for (i) studies of unrelated subjects, which (a) do not provide subject-level genotypes or (b) have a large size and (ii) family association studies. AVAILABILITY AND IMPLEMENTATION: Pre-compiled executables built under commonly used operating systems are publicly available at http://code.google.com/p/dist/. CONTACT: dlee4@vcu.edu.
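The conditional expectation formula mentioned above can be sketched as follows. If z-scores are jointly normal with correlation matrix Σ taken from a reference population, then the expected z-scores at unmeasured SNPs (u) given the measured ones (m) are Σ_um Σ_mm⁻¹ z_m. The code is a minimal illustration under that assumption, not the DIST implementation; the function name `impute_z`, the optional `ridge` term and the toy data are hypothetical.

```python
import numpy as np

def impute_z(z_measured, corr, measured_idx, unmeasured_idx, ridge=0.0):
    """Impute z-scores via E[z_u | z_m] = S_um S_mm^{-1} z_m.

    corr  : (p x p) reference-panel correlation matrix over all SNPs
    ridge : optional value added to the diagonal of S_mm for stability
            (an assumption of this sketch)
    Returns the imputed z-scores and, per imputed SNP, the fraction of
    variance explained by the measured SNPs: diag(S_um S_mm^{-1} S_mu).
    """
    corr = np.asarray(corr, dtype=float)
    z_m = np.asarray(z_measured, dtype=float)
    m = np.asarray(measured_idx)
    u = np.asarray(unmeasured_idx)
    s_mm = corr[np.ix_(m, m)] + ridge * np.eye(m.size)
    s_um = corr[np.ix_(u, m)]
    # s_mm is symmetric, so solve(s_mm, s_um.T).T equals S_um S_mm^{-1}.
    weights = np.linalg.solve(s_mm, s_um.T).T
    z_hat = weights @ z_m
    info = np.einsum("ij,ij->i", weights, s_um)
    return z_hat, info

# Toy example: three SNPs with correlation decaying as 0.5^|i-j|;
# the middle SNP is unmeasured and both flanking SNPs have z = 2.
corr = [[1.0, 0.5, 0.25], [0.5, 1.0, 0.5], [0.25, 0.5, 1.0]]
z_hat, info = impute_z([2.0, 2.0], corr, measured_idx=[0, 2], unmeasured_idx=[1])
print(z_hat, info)
```

The second return value is one minus the conditional variance of the imputed z-score, so values near 1 indicate SNPs that are well tagged by the measured ones.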
Subjects
Polymorphism, Single Nucleotide; Software; Data Interpretation, Statistical; Genome, Human; Genotyping Techniques; Humans
ABSTRACT
Genome-wide association studies (GWAS) of psychiatric disorders (PD) yield numerous loci with significant signals, but often do not implicate specific genes. Because GWAS risk loci are enriched in expression/protein/methylation quantitative trait loci (e/p/mQTL, hereafter xQTL), transcriptome/proteome/methylome-wide association studies (T/P/MWAS, hereafter XWAS) that integrate xQTL and GWAS information can link GWAS signals to effects on specific genes. To further increase detection power, gene signals are aggregated within relevant gene sets (GS) by performing gene set enrichment (GSE) analyses. Often GSE methods test for enrichment of "signal" genes in curated GS while overlooking their linkage disequilibrium (LD) structure, allowing for the possibility of increased false positive rates. Moreover, no GSE tool uses xQTL information to perform Mendelian randomization (MR) analysis. To make causal inference on association between PD and GS, we develop a novel MR GSE (MR-GSE) procedure. First, we generate a "synthetic" GWAS for each MSigDB GS by aggregating summary statistics for x-levels (mRNA, protein or DNA methylation (DNAm) levels, from the largest xQTL studies available) of genes in a GS. Second, we use synthetic GS GWAS as exposure in a generalized summary-data-based-MR analysis of complex trait outcomes. We applied MR-GSE to GWAS of nine important PD. When applied to the underpowered opioid use disorder GWAS, none of the four analyses yielded any signals, which suggests a good control of false positive rates. For other PD, MR-GSE greatly increased the detection of GO term signals (2,594) when compared to the commonly used (non-MR) GSE method (286). Some of the findings might be easier to adapt for treatment, e.g., our analyses suggest modest positive effects for supplementation with certain vitamins and/or omega-3 for schizophrenia, bipolar disorder and major depressive disorder patients. 
Similar to other MR methods, when applying MR-GSE researchers should be mindful of the confounding effects of horizontal pleiotropy on statistical inference.
ABSTRACT
Background: The genome-wide association study (GWAS) is a common tool to identify genetic variants associated with complex traits, including psychiatric disorders (PDs). However, post-GWAS analyses are needed to extend the statistical inference to biologically relevant entities, e.g., genes, proteins, and pathways. To achieve this goal, researchers developed methods that incorporate biologically relevant intermediate molecular phenotypes, such as gene expression and protein abundance, which are posited to mediate the variant-trait association. Transcriptome-wide association study (TWAS) and proteome-wide association study (PWAS) are commonly used methods to test the association between these molecular mediators and the trait. Summary: In this review, we discuss the most recent developments in TWAS and PWAS. These methods integrate existing "omic" information with the GWAS summary statistics for trait(s) of interest. Specifically, they impute transcript/protein data and test the association between the imputed gene expression/protein level and the phenotype of interest by using (i) GWAS summary statistics and (ii) reference transcriptomic/proteomic/genomic datasets. TWAS and PWAS are suitable as analysis tools for (i) primary association scans and (ii) fine-mapping to identify potentially causal genes for PDs. Key Messages: As post-GWAS analyses, TWAS and PWAS have the potential to highlight causal genes for PDs. These prioritized genes could indicate targets for the development of novel drug therapies. For researchers attempting such analyses, we recommend Mendelian randomization tools that use GWAS statistics for both trait and reference datasets, e.g., summary Mendelian randomization (SMR). 
We base our recommendation on (i) being able to use the same tool for both TWAS and PWAS, (ii) not requiring pre-computed weights (which makes it easier to update for larger reference datasets), and (iii) most larger transcriptome reference datasets being publicly available and easy to transform into a compatible format for SMR analysis.
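As a rough illustration of why an SMR-style analysis needs only summary statistics, the sketch below computes the approximate SMR test statistic for one gene from two z-scores at the top cis-xQTL SNP: one from the trait GWAS and one from the xQTL study. The statistic T = z_gwas² · z_xqtl² / (z_gwas² + z_xqtl²) is treated as approximately chi-square with one degree of freedom. The function name `smr_test` and the example values are hypothetical, this is not the interface of the SMR software, and the accompanying heterogeneity (HEIDI) test is not covered.

```python
import math

def smr_test(z_gwas, z_xqtl):
    """Approximate SMR statistic and p-value from two z-scores.

    z_gwas : z-score of the top cis-xQTL SNP in the trait GWAS
    z_xqtl : z-score of the same SNP in the xQTL (e.g. eQTL/pQTL) study
    """
    denom = z_gwas ** 2 + z_xqtl ** 2
    if denom == 0:
        raise ValueError("both z-scores are zero; the statistic is undefined")
    t_smr = (z_gwas ** 2) * (z_xqtl ** 2) / denom
    # Survival function of a chi-square with 1 df: P(X > t) = erfc(sqrt(t / 2)).
    p_value = math.erfc(math.sqrt(t_smr / 2.0))
    return t_smr, p_value

# Toy example: modest GWAS signal (z = 3) at a strong xQTL (z = 6).
t_smr, p_value = smr_test(3.0, 6.0)
print(t_smr, p_value)
```

The statistic can never exceed the smaller of the two squared z-scores, which is why a strong xQTL instrument matters as much as the GWAS signal itself.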