ABSTRACT
Genome-wide association studies (GWASs) have uncovered susceptibility loci associated with psychiatric disorders such as bipolar disorder (BP) and schizophrenia (SCZ). However, most of these loci are in non-coding regions of the genome, and the causal mechanisms linking genetic variation to disease risk are unknown. Expression quantitative trait locus (eQTL) analysis of bulk tissue is a common approach used for deciphering underlying mechanisms, although this can obscure cell-type-specific signals and thus mask trait-relevant mechanisms. Although single-cell sequencing can be prohibitively expensive in large cohorts, computationally inferred cell-type proportions and cell-type gene expression estimates have the potential to overcome these problems and advance mechanistic studies. Using bulk RNA-seq from 1,730 samples derived from whole blood in a cohort ascertained from individuals with BP and SCZ, this study estimated cell-type proportions and their relation with disease status and medication. For each cell type, we found between 2,875 and 4,629 eGenes (genes with an associated eQTL), including 1,211 that are not found on the basis of bulk expression alone. We performed a colocalization test between cell-type eQTLs and various traits and identified hundreds of associations that occur between cell-type eQTLs and GWASs but that are not detected in bulk eQTLs. Finally, we investigated the effects of lithium use on the regulation of cell-type expression loci and found examples of genes that are differentially regulated according to lithium use. Our study suggests that applying computational methods to large bulk RNA-seq datasets of non-brain tissue can identify disease-relevant, cell-type-specific biology of psychiatric disorders and psychiatric medication.
Subjects
Genome-Wide Association Study , Lithium , Humans , Genome-Wide Association Study/methods , RNA-Seq , Quantitative Trait Loci/genetics , Phenotype , Polymorphism, Single Nucleotide , Genetic Predisposition to Disease
ABSTRACT
Bipolar disorder (BD) is a heritable disorder characterized by shifts in mood that manifest in manic or depressive episodes. Clinical studies have identified abnormalities of the circadian system in BD patients as a hallmark of underlying pathophysiology. Fibroblasts are a well-established in vitro model for measuring circadian patterns. We set out to examine the underlying genetic architecture of circadian rhythm in fibroblasts, with the goal of assessing its contribution to the polygenic nature of BD disease risk. We collected, from primary cell lines of 6 healthy individuals, temporal genomic features over a 48 h period from transcriptomic data (RNA-seq) and open chromatin data (ATAC-seq). The RNA-seq data showed that only a limited number of genes, primarily the known core clock genes such as ARNTL, CRY1, PER3, NR1D2 and TEF, display circadian patterns of expression consistently across cell cultures. The ATAC-seq data identified that distinct transcription factor families, like those with the basic helix-loop-helix motif, were associated with regions that were increasing in accessibility over time, whereas known glucocorticoid receptor target motifs were identified in regions that were decreasing in accessibility. Further evaluation of these regions using stratified linkage disequilibrium score regression analysis failed to identify a significant presence of them in the known genetic architecture of BD or of other psychiatric disorders or neurobehavioral traits in which the circadian rhythm is affected. In this study, we characterize the biological pathways that are activated in this in vitro circadian model, evaluating the relevance of these processes in the context of the genetic architecture of BD and other disorders, and highlighting the model's limitations and future applications for circadian genomic studies.
Subjects
Bipolar Disorder , Circadian Rhythm , Fibroblasts , Humans , Circadian Rhythm/genetics , Fibroblasts/metabolism , Bipolar Disorder/genetics , Genomics/methods , Transcriptome , Chromatin/genetics
ABSTRACT
BACKGROUND: Histone deacetylases (HDACs) are the proteins responsible for removing the acetyl group from lysine residues of core histones in chromosomes, a crucial component of gene regulation. Eleven known HDACs exist in humans and most other vertebrates. While the basic function of HDACs has been well characterized and new discoveries are still being made, the transcriptional regulation of their corresponding genes is still poorly understood. RESULTS: Here, we conducted a computational analysis of the eleven HDAC promoter sequences in 25 vertebrate species to determine whether transcription factor binding sites (TFBSs) are conserved in HDAC evolution, and if so, whether they provide useful information about HDAC expression and function. Furthermore, we used tissue-specific information of transcription factors to investigate the potential expression patterns of HDACs in different human tissues based on their transcription factor binding sites. We found that the TFBS profiles were well conserved in closely related species for all HDAC promoters except HDAC7 and HDAC10. HDAC5 had particularly strong conservation across over half of the species studied, with nearly identical profiles in the primate species. Our comparisons of TFBSs with the tissue-specific gene expression profiles of their corresponding TFs showed that most HDACs had the ability to be ubiquitously expressed. A few HDAC promoters exhibited the potential for preferential expression in certain tissues, most notably HDAC11 in gall bladder, while HDAC9 seemed to have less propensity for expression in the nervous system. CONCLUSIONS: In general, we found evolutionary conservation in HDAC promoters that seems to be more prominent for the ubiquitously expressed HDACs. In turn, when conservation did not follow usual phylogeny, human TFBS patterns indicated possible functional relevance. While we found that HDACs appear to be uniformly expressed, we confirm that the functional differences in HDACs may be less a matter of location of activity than a question of which proteins and which acetyl groups they may be acting on.
Subjects
Conserved Sequence , Histone Deacetylases/genetics , Promoter Regions, Genetic , Animals , Binding Sites , Humans , Transcription Factors , Vertebrates/genetics
ABSTRACT
Focusing on the interactomes of Homo sapiens, Saccharomyces cerevisiae, and Escherichia coli, we investigated interactions between controlling proteins. In particular, we determined critical, intermittent, and redundant proteins based on their tendency to participate in minimum dominating sets. Independently of the organisms considered, we found that interactions that involved critical nodes had the most prominent effects on the topology of their corresponding networks. Furthermore, we observed that phosphorylation and regulatory events were considerably enriched when the corresponding transcription factors and kinases were critical proteins, while such interactions were depleted when they were redundant proteins. Moreover, interactions involving critical proteins were enriched with essential genes, disease genes, and drug targets, suggesting that such characteristics may be key for detecting novel drug targets as well as for assessing their efficacy.
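The classification mentioned above can be illustrated on a toy graph. The sketch below is not the authors' method: it assumes the usual definitions (a node is critical if it appears in every minimum dominating set, redundant if it appears in none, and intermittent otherwise) and uses brute-force enumeration, which is only feasible for very small networks. The graph and the function names are hypothetical.

```python
from itertools import combinations

def minimum_dominating_sets(adj):
    """Enumerate all minimum dominating sets of a small undirected graph.

    adj maps each node to the set of its neighbours. Brute force:
    try every subset of increasing size and stop at the first size
    for which at least one dominating set exists.
    """
    nodes = sorted(adj)
    for size in range(1, len(nodes) + 1):
        found = []
        for cand in combinations(nodes, size):
            covered = set(cand)
            for n in cand:
                covered |= adj[n]
            if len(covered) == len(nodes):
                found.append(set(cand))
        if found:
            return found
    return []

def classify_nodes(adj):
    """Label each node by how often it occurs across all minimum dominating sets."""
    mds = minimum_dominating_sets(adj)
    labels = {}
    for n in adj:
        hits = sum(n in s for s in mds)
        if hits == len(mds):
            labels[n] = "critical"
        elif hits == 0:
            labels[n] = "redundant"
        else:
            labels[n] = "intermittent"
    return labels

# Hypothetical toy network: hub K with leaves A and B, plus a chain K-C-D-E.
toy = {
    "K": {"A", "B", "C"},
    "A": {"K"},
    "B": {"K"},
    "C": {"K", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}
labels = classify_nodes(toy)
```

In this toy network the two minimum dominating sets are {K, D} and {K, E}, so K is critical, D and E are intermittent, and A, B, and C are redundant. Interactome-scale networks would need an integer-programming or heuristic approach rather than enumeration.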
ABSTRACT
Bipolar disorder (BD) is a heritable disorder characterized by shifts in mood that manifest in manic or depressive episodes. Clinical studies have identified abnormalities of the circadian system in BD patients as a hallmark of underlying pathophysiology. Fibroblasts are a well-established in vitro model for measuring circadian patterns. We set out to examine the underlying genetic architecture of circadian rhythm in fibroblasts, with the goal of assessing its contribution to the polygenic nature of BD disease risk. We collected, from primary cell lines of 6 healthy individuals, temporal genomic features over a 48 hour period from transcriptomic data (RNA-seq) and open chromatin data (ATAC-seq). The RNA-seq data showed that only a limited number of genes, primarily the known core clock genes such as ARNTL, CRY1, PER3, NR1D2 and TEF, display circadian patterns of expression consistently across cell cultures. The ATAC-seq data identified that distinct transcription factor families, like those with the basic helix-loop-helix motif, were associated with regions that were increasing in accessibility over time, whereas known glucocorticoid receptor target motifs were identified in regions that were decreasing in accessibility. Further evaluation of these regions using stratified linkage disequilibrium score regression (sLDSC) analysis failed to identify a significant presence of them in the known genetic architecture of BD or of other psychiatric disorders or neurobehavioral traits in which the circadian rhythm is affected. In this study, we characterize the biological pathways that are activated in this in vitro circadian model, evaluating the relevance of these processes in the context of the genetic architecture of BD and other disorders, and highlighting the model's limitations and future applications for circadian genomic studies.
ABSTRACT
BACKGROUND: Dupuytren disease (DD) is a common complex trait, with varying severity and incompletely understood cause. Genome-wide association studies (GWASs) have identified risk loci. In this article, we examine whether genetic risk profiles of DD in patients are associated with clinical variation and disease severity and with patient genetic risk profiles of genetically correlated traits, including body mass index (BMI), triglycerides, high-density lipoproteins, type 2 diabetes mellitus, and the endophenotypes fasting glucose and glycated hemoglobin. METHODS: The authors used a well-characterized cohort of 1461 DD patients with available phenotypic and genetic data. Phenotype data include age at onset, recurrence, and family history of disease. Polygenic risk scores (PRSs) of DD, BMI, triglycerides, high-density lipoprotein, type 2 diabetes, fasting glucose, and hemoglobin A1c using various significance thresholds were calculated with PRSice using the most recent GWAS summary statistics. Control data from LifeLines were used to determine the P value cutoffs for PRS generation that explained the most variance. RESULTS: The PRS for DD was significantly associated with a positive family history for DD, age at onset, disease onset before the age of 50, and recurrence. We also found a significant negative correlation between the PRSs for DD and BMI. CONCLUSIONS: Although GWASs of DD are designed to identify genetic risk factors distinguishing case/control status, we show that the genetic risk profile for DD also explains part of its clinical variation and disease severity. The PRS may therefore aid in accurate prognostication, in choosing initial treatment, and in personalized medicine in the future. CLINICAL QUESTION/LEVEL OF EVIDENCE: Risk, III.
Subjects
Diabetes Mellitus, Type 2 , Dupuytren Contracture , Humans , Diabetes Mellitus, Type 2/complications , Dupuytren Contracture/genetics , Genome-Wide Association Study , Multifactorial Inheritance/genetics , Risk Factors , Glycated Hemoglobin , Glucose , Triglycerides , Genetic Predisposition to Disease
ABSTRACT
We deployed the Blended Genome Exome (BGE), a DNA library blending approach that generates low pass whole genome (1-4× mean depth) and deep whole exome (30-40× mean depth) data in a single sequencing run. This technology is cost-effective, empowers most genomic discoveries possible with deep whole genome sequencing, and provides an unbiased method to capture the diversity of common SNP variation across the globe. To evaluate this new technology at scale, we applied BGE to sequence >53,000 samples from the Populations Underrepresented in Mental Illness Associations Studies (PUMAS) Project, which included participants across African, African American, and Latin American populations. We evaluated the accuracy of BGE imputed genotypes against raw genotype calls from the Illumina Global Screening Array. All PUMAS cohorts had R² concordance ≥95% among SNPs with MAF≥1%, and concordance never fell below 90% R² for SNPs with MAF<1%. Furthermore, concordance rates among local ancestries within two recently admixed cohorts were consistent among SNPs with MAF≥1%, with only minor deviations in SNPs with MAF<1%. We also benchmarked the discovery capacity of BGE to access protein-coding copy number variants (CNVs) against deep whole genome data, finding that deletions and duplications spanning at least 3 exons had a positive predictive value of ~90%. Our results demonstrate BGE scalability and efficacy in capturing SNPs, indels, and CNVs in the human genome at 28% of the cost of deep whole-genome sequencing. BGE is poised to enhance access to genomic testing and empower genomic discoveries, particularly in underrepresented populations.
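As a rough illustration of an R² concordance measure, the sketch below computes the squared Pearson correlation between imputed allele dosages and array genotype calls at a single SNP. The abstract does not specify how R² was computed, so this definition is an assumption, and the genotype values are made-up example data.

```python
def r_squared(imputed, truth):
    """Squared Pearson correlation between two equal-length numeric sequences.

    Returns NaN when either sequence has zero variance (e.g. a monomorphic SNP).
    """
    n = len(imputed)
    mean_x = sum(imputed) / n
    mean_y = sum(truth) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(imputed, truth))
    sxx = sum((x - mean_x) ** 2 for x in imputed)
    syy = sum((y - mean_y) ** 2 for y in truth)
    if sxx == 0 or syy == 0:
        return float("nan")
    return (sxy * sxy) / (sxx * syy)

# Hypothetical data for one SNP across five samples:
array_calls = [0, 1, 2, 1, 0]                 # hard genotype calls (alt-allele counts)
imputed_dosage = [0.1, 0.9, 1.8, 1.2, 0.0]    # imputed dosages in [0, 2]
r2 = r_squared(imputed_dosage, array_calls)
```

A per-cohort summary would aggregate such per-SNP values within minor allele frequency bins.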
ABSTRACT
Genome-wide association studies (GWAS) have uncovered susceptibility loci associated with psychiatric disorders like bipolar disorder (BP) and schizophrenia (SCZ). However, most of these loci are in non-coding regions of the genome, and the causal mechanisms linking genetic variation to disease risk are unknown. Expression quantitative trait loci (eQTL) analysis of bulk tissue is a common approach to decipher underlying mechanisms, though this can obscure cell-type-specific signals, thus masking trait-relevant mechanisms. While single-cell sequencing can be prohibitively expensive in large cohorts, computationally inferred cell type proportions and cell type gene expression estimates have the potential to overcome these problems and advance mechanistic studies. Using bulk RNA-Seq from 1,730 samples derived from whole blood in a cohort ascertained for individuals with BP and SCZ, this study estimated cell type proportions and their relation with disease status and medication. We found between 2,875 and 4,629 eGenes for each cell type, including 1,211 eGenes that are not found using bulk expression alone. We performed a colocalization test between cell type eQTLs and various traits and identified hundreds of associations between cell type eQTLs and GWAS loci that are not detected in bulk eQTLs. Finally, we investigated the effects of lithium use on cell type expression regulation and found examples of genes that are differentially regulated depending on lithium use. Our study suggests that computational methods can be applied to large bulk RNA-Seq datasets of non-brain tissue to identify disease-relevant, cell-type-specific biology of psychiatric disorders and psychiatric medication.
ABSTRACT
Genomic studies of molecular traits have provided mechanistic insights into complex disease, though these lag behind for brain-related traits due to the inaccessibility of brain tissue. We leveraged cerebrospinal fluid (CSF) to study neurobiological mechanisms in vivo, measuring 5,543 CSF metabolites, the largest panel in CSF to date, in 977 individuals of European ancestry. Individuals originated from two separate cohorts including cognitively healthy subjects (n=490) and a well-characterized memory clinic sample, the Amsterdam Dementia Cohort (ADC, n=487). We performed metabolite quantitative trait loci (mQTL) mapping on CSF metabolomics and found 126 significant mQTLs, representing 65 unique CSF metabolites across 51 independent loci. To better understand the role of CSF mQTLs in brain-related disorders, we performed a metabolome-wide association study (MWAS), identifying 40 associations between CSF metabolites and brain traits. Similarly, over 90% of significant mQTLs demonstrated colocalized associations with brain-specific gene expression, unveiling potential neurobiological pathways.
ABSTRACT
Mapping genetic variants that regulate gene expression (eQTL mapping) in large-scale RNA sequencing (RNA-seq) studies is often employed to understand functional consequences of regulatory variants. However, the high cost of RNA-seq limits sample size, sequencing depth, and, therefore, discovery power in eQTL studies. In this work, we demonstrate that, given a fixed budget, eQTL discovery power can be increased by lowering the sequencing depth per sample and increasing the number of individuals sequenced in the assay. We perform RNA-seq of whole-blood tissue across 1,490 individuals at low coverage (5.9 million reads/sample) and show that the effective power is higher than that of an RNA-seq study of 570 individuals at moderate coverage (13.9 million reads/sample). Next, we leverage synthetic datasets derived from real RNA-seq data (50 million reads/sample) to explore the interplay of coverage and number of individuals in eQTL studies, and show that a 10-fold reduction in coverage leads to only a 2.5-fold reduction in statistical power to identify eQTLs. Our work suggests that lowering coverage while increasing the number of individuals in RNA-seq is an effective approach to increase discovery power in eQTL studies.
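The fixed-budget trade-off can be made concrete with simple arithmetic. The sketch below assumes a hypothetical linear cost model (a per-sample library preparation cost plus a cost per million reads); the cost figures and the function name are illustrative assumptions and do not come from the study. Only the two sequencing depths are taken from the abstract.

```python
def samples_affordable(budget, prep_cost, cost_per_million_reads, depth_millions):
    """Number of individuals that fit in a fixed budget under a linear cost model.

    per-sample cost = library prep + (cost per million reads * depth in millions)
    """
    per_sample = prep_cost + cost_per_million_reads * depth_millions
    return int(budget // per_sample)

# Hypothetical costs (arbitrary currency units), not taken from the study.
BUDGET = 100_000
PREP = 30
PER_MILLION = 5

n_low = samples_affordable(BUDGET, PREP, PER_MILLION, 5.9)        # low coverage
n_moderate = samples_affordable(BUDGET, PREP, PER_MILLION, 13.9)  # moderate coverage
```

Under this model the lower depth always buys more individuals; whether that increases discovery power depends on how much power each sample loses at reduced coverage, which is the question the study addresses empirically.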
ABSTRACT
BACKGROUND: The etiology of frontotemporal dementia (FTD) is poorly understood. To identify genes with predicted expression levels associated with FTD, we integrated summary statistics with external reference gene expression data using a transcriptome-wide association study approach. METHODS: FUSION software was used to leverage FTD summary statistics (all FTD: n = 2154 cases, n = 4308 controls; behavioral variant FTD: n = 1337 cases, n = 2754 controls; semantic dementia: n = 308 cases, n = 616 controls; progressive nonfluent aphasia: n = 269 cases, n = 538 controls; FTD with motor neuron disease: n = 200 cases, n = 400 controls) from the International FTD-Genomics Consortium with 53 expression quantitative trait loci tissue-type panels (n = 12,205; 5 consortia). Significance was assessed using a 5% false discovery rate threshold. RESULTS: We identified 73 significant gene-tissue associations for FTD, representing 44 unique genes in 34 tissue types. Most significant findings were derived from dorsolateral prefrontal cortex splicing data (n = 19 genes, 26%). The 17q21.31 inversion locus contained 23 significant associations, representing 6 unique genes. Other top hits included SEC22B (a gene involved in vesicle trafficking), TRGV5, and ZNF302. A single gene finding (RAB38) was observed for behavioral variant FTD. For other clinical subtypes, no significant associations were observed. CONCLUSIONS: We identified novel candidate genes (e.g., SEC22B) and previously reported risk regions (e.g., 17q21.31) for FTD. Most significant associations were observed in dorsolateral prefrontal cortex splicing data despite the modest sample size of this reference panel. This suggests that our findings are specific to FTD and are likely to be biologically relevant, highlighting genes at different FTD risk loci that contribute to the disease pathology.
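A 5% false discovery rate threshold is commonly applied with the Benjamini-Hochberg step-up procedure. The abstract does not state which FDR procedure was used, so the sketch below is an assumption for illustration only, and the p-values are made-up example data rather than results from the study.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans marking which tests pass BH at FDR level alpha.

    Step-up rule: sort p-values ascending, find the largest rank k such that
    p_(k) <= (k / m) * alpha, and reject the k smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            max_rank = rank
    significant = [False] * m
    for idx in order[:max_rank]:
        significant[idx] = True
    return significant

# Hypothetical p-values for ten gene-tissue tests, in arbitrary order.
pvals = [0.041, 0.001, 0.216, 0.008, 0.039, 0.06, 0.074, 0.205, 0.212, 0.042]
flags = benjamini_hochberg(pvals, alpha=0.05)
```

In this example only the tests with p = 0.001 and p = 0.008 pass, because the third-smallest p-value (0.039) exceeds its threshold of 3/10 × 0.05 = 0.015 and no larger rank meets its threshold either.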