ABSTRACT
Midbrain dopaminergic neurons (mDANs) control voluntary movement, cognition, and reward behavior under physiological conditions and are implicated in human diseases such as Parkinson's disease (PD). Many transcription factors (TFs) controlling human mDAN differentiation during development have been described, but much of the regulatory landscape remains undefined. Using a tyrosine hydroxylase (TH) human iPSC reporter line, we here generate time series transcriptomic and epigenomic profiles of purified mDANs during differentiation. Integrative analysis predicts novel regulators of mDAN differentiation and super-enhancers are used to identify key TFs. We find LBX1, NHLH1 and NR2F1/2 to promote mDAN differentiation and show that overexpression of either LBX1 or NHLH1 can also improve mDAN specification. A more detailed investigation of TF targets reveals that NHLH1 promotes the induction of neuronal miR-124, LBX1 regulates cholesterol biosynthesis, and NR2F1/2 controls neuronal activity.
Subject(s)
Dopaminergic Neurons , Induced Pluripotent Stem Cells , Humans , Dopaminergic Neurons/metabolism , Multiomics , Mesencephalon , Transcription Factors/genetics , Transcription Factors/metabolism , Induced Pluripotent Stem Cells/metabolism , Cell Differentiation/genetics , Basic Helix-Loop-Helix Transcription Factors/genetics
ABSTRACT
The unicellular ciliate Paramecium contains a large vegetative macronucleus with several unusual characteristics, including an extremely high coding density and high polyploidy. As macronuclear chromatin is devoid of heterochromatin, our study characterizes the functional epigenomic organization necessary for gene regulation and proper Pol II activity. Histone marks (H3K4me3, H3K9ac, H3K27me3) reveal no narrow peaks but broad domains along gene bodies, whereas intergenic regions are devoid of nucleosomes. Our data implicate H3K4me3 levels inside ORFs to be the main factor associated with gene expression, and H3K27me3 appears in association with H3K4me3 in plastic genes. Silent and lowly expressed genes show low nucleosome occupancy, suggesting that gene inactivation does not involve increased nucleosome occupancy and chromatin condensation. Because of a high occupancy of Pol II along highly expressed ORFs, transcriptional elongation appears to be quite different from that of other species. This is supported by missing heptameric repeats in the C-terminal domain of Pol II and a divergent elongation system. Our data imply that unoccupied DNA is the default state, whereas gene activation requires nucleosome recruitment together with broad domains of H3K4me3. In summary, gene activation and silencing in Paramecium run counter to the current understanding of chromatin biology.
Subject(s)
Histones , Paramecium , Chromatin/genetics , Histone Code , Histones/genetics , Histones/metabolism , Nucleosomes/genetics , Paramecium/genetics , Paramecium/metabolism , RNA Polymerase II/genetics , RNA Polymerase II/metabolism
ABSTRACT
Several studies have suggested that transcription factor (TF) binding to DNA may be impaired or enhanced by DNA methylation. We present MeDeMo, a toolbox for TF motif analysis that combines information about DNA methylation with models capturing intra-motif dependencies. In a large-scale study using ChIP-seq data for 335 TFs, we identify novel TFs that show a binding behaviour associated with DNA methylation. Overall, we find that the presence of CpG methylation decreases the likelihood of binding for the majority of methylation-associated TFs. For a considerable subset of TFs, we show that intra-motif dependencies are pivotal for accurately modelling the impact of DNA methylation on TF binding. We illustrate that the novel methylation-aware TF binding models enable the prediction of differential ChIP-seq peaks and improve the genome-wide analysis of TF binding. Our work indicates that simplistic models that neglect the effect of DNA methylation on DNA binding may lead to systematic underperformance for methylation-associated TFs.
ABSTRACT
17-β-hydroxysteroid dehydrogenase 13 (HSD17B13), a lipid droplet-associated enzyme, is primarily expressed in the liver and plays an important role in lipid metabolism. Targeted inhibition of enzymatic function is a potential therapeutic strategy for treating steatotic liver disease (SLD). The present study aimed to investigate the effects of the first selective HSD17B13 inhibitor, BI-3231, in a model of hepatocellular lipotoxicity using human cell lines and primary mouse hepatocytes in vitro. Lipotoxicity was induced with palmitic acid in HepG2 cells and freshly isolated mouse hepatocytes, and the cells were coincubated with BI-3231 to assess the protective effects. Under lipotoxic stress, triglyceride (TG) accumulation was significantly decreased in the BI-3231-treated cells compared with that of the control untreated human and mouse hepatocytes. In addition, treatment with BI-3231 led to considerable improvement in hepatocyte proliferation, cell differentiation, and lipid homeostasis. Mechanistically, BI-3231 increased the mitochondrial respiratory function without affecting β-oxidation. BI-3231 inhibited the lipotoxic effects of palmitic acid in hepatocytes, highlighting the potential of targeting HSD17B13 as a specific therapeutic approach in steatotic liver disease.
NEW & NOTEWORTHY 17-β-Hydroxysteroid dehydrogenase 13 (HSD17B13) is a lipid droplet protein primarily expressed in liver hepatocytes. HSD17B13 is associated with the clinical outcome of chronic liver diseases and is therefore a target for the development of drugs. Here, we demonstrate the promising therapeutic effect of BI-3231 as a potent inhibitor of HSD17B13 based on its ability to inhibit triglyceride accumulation in lipid droplets (LDs), restore lipid metabolism and homeostasis, and increase mitochondrial activity in vitro.
Subject(s)
Fatty Liver , Palmitic Acid , Humans , Animals , Mice , Palmitic Acid/toxicity , Enzyme Inhibitors/pharmacology , Hepatocytes , Triglycerides
ABSTRACT
RNA·DNA:DNA triple helix (triplex) formation is a form of RNA-DNA interaction which regulates gene expression but is difficult to study experimentally in vivo. This makes accurate computational prediction of such interactions highly important in the field of RNA research. Current predictive methods use canonical Hoogsteen base pairing rules, which, whilst biophysically valid, may not reflect the plastic nature of cell biology. Here, we present the first optimization approach to learn a probabilistic model describing RNA-DNA interactions directly from motifs derived from triplex sequencing data. We find that there are several stable interaction codes, including Hoogsteen base pairing and novel RNA-DNA base pairings, which agree with in vitro measurements. We implemented these findings in TriplexAligner, a program that uses the determined interaction codes to predict triplex binding. TriplexAligner predicts RNA-DNA interactions identified in all-to-all sequencing data more accurately than all previously published tools in human and mouse and also predicts previously studied triplex interactions with known regulatory functions. We further validated a novel triplex interaction using biophysical experiments. Our work is an important step towards better understanding of triplex formation and allows genome-wide analyses of RNA-DNA interactions.
Subject(s)
Genome-Wide Association Study , RNA , Humans , Mice , Animals , RNA/genetics , DNA/genetics , DNA/metabolism , DNA Replication , Nucleic Acid Conformation
ABSTRACT
MOTIVATION: DNA CpG methylation (CpGm) has proven to be a crucial epigenetic factor in the mammalian gene regulatory system. Assessment of DNA CpG methylation values via whole-genome bisulfite sequencing (WGBS) is, however, computationally extremely demanding. RESULTS: We present FAst MEthylation calling (FAME), the first approach to quantify CpGm values directly from bulk or single-cell WGBS reads without intermediate output files. FAME is very fast but as accurate as standard methods, which first produce BS alignment files before computing CpGm values. We present experiments on bulk and single-cell bisulfite datasets in which we show that data analysis can be sped up significantly, which helps address the current WGBS analysis bottleneck for large-scale datasets without compromising accuracy. AVAILABILITY AND IMPLEMENTATION: An implementation of FAME is open source and licensed under GPL-3.0 at https://github.com/FischerJo/FAME.
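As a point of orientation for what a CpGm value is, the following minimal Python sketch computes per-site methylation levels from read counts. It is not FAME's algorithm: read mapping is skipped entirely, and the function name `methylation_levels`, the `min_coverage` parameter, and the input layout are hypothetical choices made for illustration. It only assumes the standard bisulfite readout, in which a methylated cytosine is still read as C and an unmethylated one is read as T.

```python
def methylation_levels(counts, min_coverage=1):
    """Return the methylation level per CpG site.

    counts maps a site key, e.g. ("chr1", 100), to a tuple
    (methylated_reads, unmethylated_reads). Sites covered by fewer
    than min_coverage reads are left out rather than reported as 0.
    (Hypothetical helper for illustration, not part of FAME.)
    """
    levels = {}
    for site, (methylated, unmethylated) in counts.items():
        coverage = methylated + unmethylated
        if coverage >= min_coverage:
            # Fraction of covering reads that support methylation.
            levels[site] = methylated / coverage
    return levels
```

For example, a site covered by 8 methylated and 2 unmethylated reads gets a level of 0.8, whereas a site without any covering read is omitted from the result.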
Subject(s)
DNA Methylation , Software , Animals , Sequence Analysis, DNA/methods , High-Throughput Nucleotide Sequencing/methods , Sulfites , DNA/genetics , Mammals/genetics
ABSTRACT
MOTIVATION: Identifying regulatory regions in the genome is of great interest for understanding the epigenomic landscape in cells. One fundamental challenge in this context is to find the target genes whose expression is affected by the regulatory regions. A recent successful method is the Activity-By-Contact (ABC) model, which scores enhancer-gene interactions based on enhancer activity and the contact frequency of an enhancer to its target gene. However, it describes regulatory interactions entirely from a gene's perspective, and does not account for all the candidate target genes of an enhancer. In addition, the ABC model requires two types of assays to measure enhancer activity, which limits the applicability. Moreover, no available implementation allows integration with transcription factor (TF) binding information or efficient analysis of single-cell data. RESULTS: We demonstrate that the ABC score can yield a higher accuracy by adapting the enhancer activity according to the number of contacts the enhancer has to its candidate target genes and also by considering all annotated transcription start sites of a gene. Further, we show that the model is comparably accurate with only one assay to measure enhancer activity. We combined our generalized ABC model with TF binding information and illustrated an analysis of a single-cell ATAC-seq dataset of the human heart, where we were able to characterize cell type-specific regulatory interactions and predict gene expression based on TF affinities. All executed processing steps are incorporated into our new computational pipeline STARE. AVAILABILITY AND IMPLEMENTATION: The software is available at https://github.com/schulzlab/STARE. CONTACT: marcel.schulz@em.uni-frankfurt.de. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
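The scoring idea described above can be sketched in a few lines of Python. The first function follows the commonly stated ABC definition: activity times contact for one enhancer-gene pair, divided by the sum of that product over all candidate enhancers of the gene. The second function is only one plausible reading of "adapting the enhancer activity according to the number of contacts": it is an assumption that the activity is distributed over an enhancer's candidate genes in proportion to contact, and the exact formula used in STARE may differ. The function names and the dictionary-based input layout are hypothetical, and the handling of multiple transcription start sites is not shown.

```python
from collections import defaultdict


def abc_scores(activity, contact):
    """Classic ABC score for every (enhancer, gene) pair.

    activity: {enhancer: activity value}
    contact:  {(enhancer, gene): contact frequency}
    """
    gene_totals = defaultdict(float)
    for (enhancer, gene), c in contact.items():
        gene_totals[gene] += activity[enhancer] * c
    return {
        (enhancer, gene): (activity[enhancer] * c / gene_totals[gene]
                           if gene_totals[gene] > 0 else 0.0)
        for (enhancer, gene), c in contact.items()
    }


def generalized_abc_scores(activity, contact):
    """ABC score with enhancer activity adapted per target gene.

    Assumption for illustration: an enhancer's activity is split
    across its candidate genes in proportion to its contacts.
    """
    enhancer_contacts = defaultdict(float)
    for (enhancer, _gene), c in contact.items():
        enhancer_contacts[enhancer] += c
    # Gene-specific activity: share of the enhancer's activity
    # attributed to this gene.
    adapted = {
        (enhancer, gene): (activity[enhancer] * c / enhancer_contacts[enhancer]
                           if enhancer_contacts[enhancer] > 0 else 0.0)
        for (enhancer, gene), c in contact.items()
    }
    gene_totals = defaultdict(float)
    for pair, c in contact.items():
        gene_totals[pair[1]] += adapted[pair] * c
    return {
        pair: (adapted[pair] * c / gene_totals[pair[1]]
               if gene_totals[pair[1]] > 0 else 0.0)
        for pair, c in contact.items()
    }
```

In both variants the scores of all candidate enhancers of a gene sum to 1. Under the adapted variant, an enhancer that contacts many genes contributes less to each of them than an equally active enhancer with a single target.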
Subject(s)
Gene Expression Regulation , Transcription Factors , Humans , Transcription Factors/metabolism , Regulatory Sequences, Nucleic Acid , Software , Protein Binding
ABSTRACT
BACKGROUND: Cardiovascular diseases (CVDs) are the leading cause of death worldwide. Genome-wide association studies (GWAS) have identified many single nucleotide polymorphisms (SNPs) appearing in non-coding genomic regions in CVDs. The SNPs may alter gene expression by modifying transcription factor (TF) binding sites and lead to functional consequences in cardiovascular traits or diseases. To understand the underlying molecular mechanisms, it is crucial to identify which variations are involved and how they affect TF binding. METHODS: The SNEEP (SNP exploration and analysis using epigenomics data) pipeline was used to identify regulatory SNPs, which alter the binding behavior of TFs, and to link GWAS SNPs to their potential target genes for six CVDs. Human induced pluripotent stem cell-derived cardiomyocytes (hiPSC-CMs), monoculture cardiac organoids (MCOs) and self-organized cardiac organoids (SCOs) were used in the study. Gene expression, cardiomyocyte size and cardiac contractility were assessed. RESULTS: By using our integrative computational pipeline, we identified 1905 regulatory SNPs in CVD GWAS data. These were associated with hundreds of genes, half of them non-coding RNAs (ncRNAs), suggesting novel CVD genes. We experimentally tested 40 CVD-associated non-coding RNAs, among them RP11-98F14.11, RPL23AP92, IGBP1P1, and CTD-2383I20.1, which were upregulated in hiPSC-CMs, MCOs and SCOs under hypoxic conditions. Further experiments showed that IGBP1P1 depletion rescued expression of hypertrophic marker genes, reduced hypoxia-induced cardiomyocyte size and improved hypoxia-reduced cardiac contractility in hiPSC-CMs and MCOs. CONCLUSIONS: IGBP1P1 is a novel ncRNA with key regulatory functions in modulating cardiomyocyte size and cardiac function in our disease models. Our data suggest ncRNA IGBP1P1 as a potential therapeutic target to improve cardiac function in CVDs.
Subject(s)
Cardiovascular Diseases , Polymorphism, Single Nucleotide , Humans , Polymorphism, Single Nucleotide/genetics , Genome-Wide Association Study , Cardiovascular Diseases/genetics , Genomics , Genome
ABSTRACT
CircRNAs are an important class of RNAs with diverse cellular functions in human physiology and disease. A thorough knowledge of circRNAs including their biogenesis and subcellular distribution is important to understand their roles in a wide variety of processes. However, the analysis of circRNAs from total RNA sequencing data remains challenging. Therefore, we developed Calcifer, a versatile workflow for circRNA annotation. Using Calcifer, we analysed APEX-Seq data to compare circRNA occurrence between whole cells, nucleus and subnuclear compartments. We generally find that circRNAs show higher abundance in whole cells compared to nuclear samples, consistent with their accumulation in the cytoplasm. The notable exception is the single-exon circRNA circCANX(9), which is unexpectedly enriched in the nucleus. In addition, we observe that circFIRRE prevails over the linear lncRNA FIRRE in both the cytoplasm and the nucleus. Zooming in on the subnuclear compartments, we show that circRNAs are strongly depleted from nuclear speckles, indicating that excess splicing factors in this compartment counteract back-splicing. Our results thereby provide valuable insights into the subnuclear distribution of circRNAs. Regarding circRNA function, we surprisingly find that the majority of all detected circRNAs possess complete open reading frames with potential for cap-independent translation. Overall, we show that Calcifer is an easy-to-use, versatile and sustainable workflow for the annotation of circRNAs which expands the repertoire of circRNA tools and enables new insights into circRNA distribution and function.
Subject(s)
Cell Nucleus , RNA, Circular , RNA, Circular/genetics , RNA, Circular/metabolism , Humans , Cell Nucleus/metabolism , Cell Nucleus/genetics , Cytoplasm/metabolism , Cytoplasm/genetics , Open Reading Frames , Molecular Sequence Annotation , RNA, Long Noncoding/genetics , RNA, Long Noncoding/metabolism , RNA Splicing , Computational Biology/methods , Sequence Analysis, RNA
ABSTRACT
Spatial genome organization is tightly controlled by several regulatory mechanisms and is essential for gene expression control. Nuclear receptors are ligand-activated transcription factors that modulate physiological and pathophysiological processes and are primary pharmacological targets. DNA binding of the important loop-forming insulator protein CCCTC-binding factor (CTCF) was modulated by 1α,25-dihydroxyvitamin D3 (1,25(OH)2D3). We performed CTCF HiChIP assays to produce the first genome-wide dataset of CTCF long-range interactions in 1,25(OH)2D3-treated cells, and to determine whether dynamic changes of spatial chromatin interactions are essential for fine-tuning of nuclear receptor signaling. We detected changes in 3D chromatin organization upon vitamin D receptor (VDR) activation at 3.1% of all observed CTCF interactions. VDR binding was enriched at both differential loop anchors and within differential loops. Differential loops were observed in several putative functional roles including TAD border formation, promoter-enhancer looping, and establishment of VDR-responsive insulated neighborhoods. Vitamin D target genes were enriched in differential loops and at their anchors. Secondary vitamin D effects related to dynamic chromatin domain changes were linked to location of downstream transcription factors in differential loops. CRISPR interference and loop anchor deletion experiments confirmed the functional relevance of nuclear receptor ligand-induced adjustments of the chromatin 3D structure for gene expression regulation.
Subject(s)
Chromatin , Receptors, Calcitriol , Chromatin/genetics , Gene Expression , Ligands , Receptors, Calcitriol/genetics , Receptors, Calcitriol/metabolism , Receptors, Cytoplasmic and Nuclear/genetics , Transcription Factors/metabolism , Vitamin D/metabolism , Vitamin D/pharmacology
ABSTRACT
Transcription factors (TFs) are essential players in orchestrating the regulatory landscape in cells. Still, their exact modes of action and dependencies on other regulatory aspects remain elusive. Since TFs act in a cell type-specific manner and each TF has its own characteristics, untangling their regulatory interactions from an experimental point of view is laborious and convoluted. Thus, there is an ongoing development of computational tools that estimate transcription factor activity (TFA) from a variety of data modalities, either based on a mapping of TFs to their putative target genes or in a genome-wide, gene-unspecific fashion. These tools can help to gain insights into TF regulation and to prioritize candidates for experimental validation. We give an overview of available computational tools that estimate TFA, illustrate examples of their application, debate common result validation strategies, and discuss assumptions and concomitant limitations.
Subject(s)
Gene Expression Regulation , Transcription Factors , Transcription Factors/metabolism , Genome , Computational Biology , Gene Regulatory Networks
ABSTRACT
Liver cirrhosis is the end stage of all chronic liver diseases and contributes significantly to overall mortality of 2% globally. The age-standardized mortality from liver cirrhosis in Europe is between 10 and 20% and can be explained by not only the development of liver cancer but also the acute deterioration in the patient's overall condition. The development of complications including accumulation of fluid in the abdomen (ascites), bleeding in the gastrointestinal tract (variceal bleeding), bacterial infections, or a decrease in brain function (hepatic encephalopathy) define an acute decompensation that requires therapy and often leads to acute-on-chronic liver failure (ACLF) by different precipitating events. However, due to its complexity and organ-spanning nature, the pathogenesis of ACLF is poorly understood, and the common underlying mechanisms leading to the development of organ dysfunction or failure in ACLF are still elusive. Apart from general intensive care interventions, there are no specific therapy options for ACLF. Liver transplantation is often not possible in these patients due to contraindications and a lack of prioritization. In this review, we describe the framework of the ACLF-I project consortium funded by the Hessian Ministry of Higher Education, Research and the Arts (HMWK) based on existing findings and will provide answers to these open questions.
Subject(s)
Acute-On-Chronic Liver Failure , End Stage Liver Disease , Esophageal and Gastric Varices , Humans , End Stage Liver Disease/complications , Esophageal and Gastric Varices/complications , Gastrointestinal Hemorrhage/complications , Liver Cirrhosis/complications , Liver Cirrhosis/therapy , Acute-On-Chronic Liver Failure/therapy , Acute-On-Chronic Liver Failure/etiology
ABSTRACT
AIM: Chemoresistance is a major cause of treatment failure in colorectal cancer (CRC) therapy. In this study, the impact of the IGF2BP family of RNA-binding proteins on CRC chemoresistance was investigated using in silico, in vitro, and in vivo approaches. METHODS: Gene expression data from a well-characterized cohort and publicly available cross-linking immunoprecipitation sequencing (CLIP-Seq) data were collected. Resistance to chemotherapeutics was assessed in patient-derived xenografts (PDXs) and patient-derived organoids (PDOs). Functional studies were performed in 2D and 3D cell culture models, including proliferation, spheroid growth, and mitochondrial respiration analyses. RESULTS: We identified IGF2BP2 as the most abundant IGF2BP in primary and metastatic CRC, correlating with tumor stage in patient samples and tumor growth in PDXs. IGF2BP2 expression in primary tumor tissue was significantly associated with resistance to selumetinib, gefitinib, and regorafenib in PDOs and to 5-fluorouracil and oxaliplatin in PDX in vivo. IGF2BP2 knockout (KO) HCT116 cells were more susceptible to regorafenib in 2D and to oxaliplatin, selumetinib, and nintedanib in 3D cell culture. Further, a bioinformatic analysis using CLIP data suggested stabilization of target transcripts in primary and metastatic tumors. Measurement of oxygen consumption rate (OCR) and extracellular acidification rate (ECAR) revealed a decreased basal OCR and an increase in glycolytic ATP production rate in IGF2BP2 KO. In addition, real-time reverse transcriptase polymerase chain reaction (qPCR) analysis confirmed decreased expression of genes of the respiratory chain complex I, complex IV, and the outer mitochondrial membrane in IGF2BP2 KO cells. CONCLUSIONS: IGF2BP2 correlates with CRC tumor growth in vivo and promotes chemoresistance by altering mitochondrial respiratory chain metabolism. As a druggable target, IGF2BP2 could be used in future CRC therapy to overcome CRC chemoresistance.
Subject(s)
Colorectal Neoplasms , Humans , Oxaliplatin/pharmacology , Colorectal Neoplasms/drug therapy , Colorectal Neoplasms/genetics , Colorectal Neoplasms/pathology , Drug Resistance, Neoplasm/genetics , RNA-Binding Proteins/genetics , RNA-Binding Proteins/metabolism , Cell Line, Tumor , Cell Proliferation/genetics , Gene Expression Regulation, Neoplastic
ABSTRACT
Long non-coding RNAs (lncRNAs) can act as regulatory RNAs which, by altering the expression of target genes, impact on the cellular phenotype and cardiovascular disease development. Endothelial lncRNAs and their vascular functions are largely undefined. Deep RNA-Seq and FANTOM5 CAGE analysis revealed the lncRNA LINC00607 to be highly enriched in human endothelial cells. LINC00607 was induced in response to hypoxia, arteriosclerosis regression in non-human primates, post-atherosclerotic cultured endothelial cells from patients and also in response to propranolol used to induce regression of human arteriovenous malformations. siRNA knockdown or CRISPR/Cas9 knockout of LINC00607 attenuated VEGF-A-induced angiogenic sprouting. LINC00607 knockout endothelial cells also integrated less into newly formed vascular networks in an in vivo assay in SCID mice. Overexpression of LINC00607 in CRISPR knockout cells restored normal endothelial function. RNA- and ATAC-Seq after LINC00607 knockout revealed changes in the transcription of endothelial gene sets linked to the endothelial phenotype and in chromatin accessibility around ERG-binding sites. Mechanistically, LINC00607 interacted with the SWI/SNF chromatin remodeling protein BRG1. CRISPR/Cas9-mediated knockout of BRG1 in HUVEC followed by CUT&RUN revealed that BRG1 is required to secure a stable chromatin state, mainly on ERG-binding sites. In conclusion, LINC00607 is an endothelial-enriched lncRNA that maintains ERG target gene transcription by interacting with the chromatin remodeler BRG1 to ultimately mediate angiogenesis.
Subject(s)
RNA, Long Noncoding , Animals , Humans , Mice , Chromatin , DNA Helicases/genetics , DNA Helicases/metabolism , Endothelial Cells/metabolism , Mice, SCID , Nuclear Proteins/metabolism , RNA, Long Noncoding/genetics , Neovascularization, Physiologic
ABSTRACT
Understanding how epigenetic variation in non-coding regions is involved in distal gene-expression regulation is an important problem. Regulatory regions can be associated with genes using large-scale datasets of epigenetic and expression data. However, for regions of complex epigenomic signals and enhancers that regulate many genes, it is difficult to understand these associations. We present StitchIt, an approach to dissect epigenetic variation in a gene-specific manner for the detection of regulatory elements (REMs) without relying on peak calls in individual samples. StitchIt segments epigenetic signal tracks over many samples to generate the location and the target genes of a REM simultaneously. We show that this approach leads to a more accurate and refined REM detection compared to standard methods even on heterogeneous datasets, which are challenging to model. Also, StitchIt REMs are highly enriched in experimentally determined chromatin interactions and expression quantitative trait loci. We validated several newly predicted REMs using CRISPR-Cas9 experiments, thereby demonstrating the reliability of StitchIt. StitchIt is able to dissect regulation in superenhancers and predicts thousands of putative REMs that go unnoticed using peak-based approaches, suggesting that a large part of the regulome might be uncharted waters.
Subject(s)
Chromatin/metabolism , Data Analysis , Enhancer Elements, Genetic , Epigenesis, Genetic , Gene Expression Regulation , Human Umbilical Vein Endothelial Cells , Humans
ABSTRACT
MOTIVATION: The generation of genome-wide maps of histone modifications using chromatin immunoprecipitation sequencing is a standard approach to dissect the complexity of the epigenome. Interpretation and differential analysis of histone datasets remain challenging due to regulatorily meaningful co-occurrences of histone marks and their difference in genomic spread. To ease interpretation, chromatin state segmentation maps are a commonly employed abstraction combining individual histone marks. We developed the tool SCIDDO as a fast, flexible and statistically sound method for the differential analysis of chromatin state segmentation maps. RESULTS: We demonstrate the utility of SCIDDO in a comparative analysis that identifies differential chromatin domains (DCDs) in various regulatory contexts and with only moderate computational resources. We show that the identified DCDs correlate well with observed changes in gene expression and can recover a substantial number of differentially expressed genes (DEGs). We showcase SCIDDO's ability to directly interrogate chromatin dynamics, such as enhancer switches in downstream analysis, which simplifies exploring specific questions about regulatory changes in chromatin. By comparing SCIDDO to competing methods, we provide evidence that SCIDDO's performance in identifying DEGs via differential chromatin marking is more stable across a range of cell-type comparisons and parameter cut-offs. AVAILABILITY AND IMPLEMENTATION: The SCIDDO source code is openly available at github.com/ptrebert/sciddo. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Chromatin , Chromosomes , Chromatin Immunoprecipitation , Genome , Histone Code
ABSTRACT
The COVID-19 pandemic is shifting teaching to an online setting all over the world. The Galaxy framework facilitates the online learning process and makes it accessible by providing a library of high-quality community-curated training materials, enabling easy access to data and tools, and facilitating the sharing of achievements and progress between students and instructors. By combining Galaxy with robust communication channels, effective instruction can be designed inclusively, regardless of the students' environments.
Subject(s)
COVID-19/epidemiology , Computer-Assisted Instruction , Education, Distance/organization & administration , COVID-19/virology , Computational Biology , Humans , Information Dissemination , Pandemics , SARS-CoV-2/isolation & purification
ABSTRACT
A current challenge in genomics is to interpret non-coding regions and their role in transcriptional regulation of possibly distant target genes. Genome-wide association studies show that a large part of genomic variants are found in those non-coding regions, but their mechanisms of gene regulation are often unknown. An additional challenge is to reliably identify the target genes of the regulatory regions, which is an essential step in understanding their impact on gene expression. Here we present the EpiRegio web server, a resource of regulatory elements (REMs). REMs are genomic regions that exhibit variations in their chromatin accessibility profile associated with changes in expression of their target genes. EpiRegio incorporates both epigenomic and gene expression data for various human primary cell types and tissues, providing an integrated view of REMs in the genome. Our web server allows the analysis of genes and their associated REMs, including the REM's activity and its estimated cell type-specific contribution to its target gene's expression. Further, it is possible to explore genomic regions for their regulatory potential and to investigate overlapping REMs, thereby dissecting regions of high epigenomic complexity. EpiRegio allows programmatic access through a REST API and is freely available at https://epiregio.de/.
Subject(s)
Regulatory Elements, Transcriptional , Software , Chromatin Immunoprecipitation Sequencing , Disease/genetics , Gene Expression Regulation , Humans , Transcription Factors/metabolism
ABSTRACT
Using results from genome-wide association studies for understanding complex traits is a current challenge. Here we review how genotype data can be used with different machine learning (ML) methods to predict phenotype occurrence and severity. We discuss common feature encoding schemes and how studies handle the often small number of samples compared to the huge number of variants. We compare which ML methods are being applied, including recent results using deep neural networks. Further, we review the application of methods for feature explanation and interpretation.
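To make the notion of a feature encoding scheme concrete, the sketch below shows two encodings that are widely used for biallelic SNP genotypes: additive encoding, which counts copies of the alternate allele, and one-hot encoding, which assigns one indicator per genotype class. The abstract does not say which schemes the review covers, so these two are assumed examples, and the function names and the string-based genotype format are hypothetical.

```python
def additive_encoding(genotypes, alt_allele):
    """Encode each genotype as the number of alternate alleles (0, 1 or 2).

    genotypes: list of two-character strings such as "AA", "AG", "GG".
    alt_allele: single-character alternate allele, e.g. "G".
    """
    return [genotype.count(alt_allele) for genotype in genotypes]


def one_hot_encoding(dosages):
    """Turn dosages 0/1/2 into indicator vectors.

    Order of indicators: homozygous reference, heterozygous,
    homozygous alternate.
    """
    return [[1 if dosage == k else 0 for k in (0, 1, 2)] for dosage in dosages]
```

For the genotypes "AA", "AG" and "GG" with alternate allele "G", the additive encoding gives 0, 1 and 2. The one-hot form triples the number of features per variant but does not impose a linear dose effect on the model.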
Subject(s)
Genome-Wide Association Study , Genotype , Humans , Machine Learning
ABSTRACT
Genome-wide CRISPR screens are becoming more widespread and allow the simultaneous interrogation of thousands of genomic regions. Although recent progress has been made in the analysis of CRISPR screens, it is still an open problem how to interpret CRISPR mutations in non-coding regions of the genome. Most of the tools concentrate on the interpretation of mutations introduced in gene coding regions. We introduce a computational pipeline that uses epigenomic information about regulatory elements for the interpretation of CRISPR mutations in non-coding regions. We illustrate our analysis protocol on a genome-wide CRISPR screen in hTERT-RPE1 cells and reveal novel regulatory elements that mediate chemoresistance against doxorubicin in these cells. We infer links to established and to novel chemoresistance genes. Our analysis protocol is general and can be applied to any cell type and with different CRISPR enzymes.