ABSTRACT
Disentangling the relationship between enhancers and genes is an ongoing challenge in epigenomics. We present STARE, our software to quantify the strength of enhancer-gene interactions based on enhancer activity and chromatin contact data. It implements the generalized Activity-by-Contact (gABC) score, which allows prediction of putative target genes of candidate enhancers over any desired genomic distance. The only requirement for its application is a measurement of enhancer activity. In addition to regulatory interactions, STARE calculates transcription factor (TF) affinities at the gene level. We illustrate its usage on a public single-cell data set of the human heart by predicting regulatory interactions at the cell-type level, by giving examples of how to integrate them with other data modalities, and by constructing TF affinity matrices.
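The abstract does not spell out how gene-level TF affinities are derived. The sketch below is minimal and rests on an assumption not confirmed here: each gene's affinity for a TF is the sum of enhancer-level TF affinities, weighted by the enhancer-gene interaction score. The function name `gene_tf_affinities` and the toy inputs are hypothetical.

```python
from collections import defaultdict


def gene_tf_affinities(interactions, enhancer_affinities):
    """Aggregate enhancer-level TF affinities to the gene level (hypothetical helper).

    interactions: list of (enhancer_id, gene_id, score) tuples, where score is
        an enhancer-gene interaction score such as an ABC-type score.
    enhancer_affinities: dict enhancer_id -> dict tf_name -> affinity.
    Returns dict gene_id -> dict tf_name -> score-weighted affinity sum.
    """
    result = defaultdict(lambda: defaultdict(float))
    for enhancer, gene, score in interactions:
        for tf, affinity in enhancer_affinities.get(enhancer, {}).items():
            # Each enhancer contributes its TF affinity, weighted by how
            # strongly it is predicted to interact with this gene.
            result[gene][tf] += score * affinity
    # Convert the nested defaultdicts to plain dicts.
    return {gene: dict(tfs) for gene, tfs in result.items()}


# Toy inputs (hypothetical enhancers, genes and TFs).
interactions = [("e1", "geneA", 0.6), ("e2", "geneA", 0.4), ("e2", "geneB", 1.0)]
affinities = {"e1": {"TF_X": 2.0, "TF_Y": 1.0}, "e2": {"TF_X": 0.5}}
matrix = gene_tf_affinities(interactions, affinities)
print(matrix)
```

Stacking the per-gene dictionaries row by row yields a gene × TF table, which is the general shape of a TF affinity matrix.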
Subjects
Chromatin; Enhancer Elements, Genetic; Epigenomics; Software; Humans; Chromatin/genetics; Chromatin/metabolism; Epigenomics/methods; Epigenome; Transcription Factors/metabolism; Transcription Factors/genetics; Computational Biology/methods
ABSTRACT
To reveal gene regulation mechanisms, it is essential to understand the role of regulatory elements, which may be located far from gene promoters. Integrative analysis of epigenetic and transcriptomic data can provide insights into gene-expression regulation in specific phenotypes. Here, we discuss STITCHIT, an approach that dissects epigenetic variation in a gene-specific manner across many samples to identify regulatory elements without relying on peak-calling algorithms. The resulting genomic regions are then refined using a regularized linear model, which can also be used to predict gene expression. We illustrate the use of STITCHIT with H3K27ac ChIP-seq and RNA-seq data from the International Human Epigenome Consortium (IHEC).
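To illustrate what refining regions with a regularized linear model means, the sketch below fits a lasso (L1-penalized) regression of one gene's expression on the signals of its candidate regions, using cyclic coordinate descent. Regions whose coefficient shrinks to exactly zero would be discarded. The lasso penalty, the function names and the toy data are assumptions made for illustration; STITCHIT may use a different penalty.

```python
def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold (the lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Fit y ~ X w with an L1 penalty by cyclic coordinate descent.

    Minimizes (1 / (2n)) * ||y - X w||^2 + lam * ||w||_1.
    X: n samples, each a list of p region signals (assumed centred).
    y: n expression values (assumed centred), so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    # (1/n) * sum_i x_ij^2 for every region j.
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    residual = list(y)  # y - X w, with w = 0 at the start
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of region j with the residual that excludes region j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_wj
    return w


# Toy data: expression follows region 0; region 1 is noise (hypothetical).
X = [[1.0, 0.5], [-1.0, 0.2], [2.0, -0.3], [-2.0, -0.4]]
y = [2.0, -2.0, 4.0, -4.0]
weights = lasso_coordinate_descent(X, y, lam=0.1)
print(weights)
```

Region 1 ends up with a weight of exactly zero, which is how an L1 penalty selects regions. The penalty also shrinks the weight of region 0 slightly below 2.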
Subjects
Epigenesis, Genetic; Epigenomics; Transcriptome; Humans; Epigenomics/methods; Transcriptome/genetics; Enhancer Elements, Genetic; Software; Computational Biology/methods; Chromatin Immunoprecipitation Sequencing/methods; Gene Expression Regulation; Algorithms; Histones/genetics; Histones/metabolism; Gene Expression Profiling/methods
ABSTRACT
CircRNAs are an important class of RNAs with diverse cellular functions in human physiology and disease. A thorough knowledge of circRNAs, including their biogenesis and subcellular distribution, is important to understand their roles in a wide variety of processes. However, the analysis of circRNAs from total RNA sequencing data remains challenging. We therefore developed Calcifer, a versatile workflow for circRNA annotation. Using Calcifer, we analysed APEX-Seq data to compare circRNA occurrence between whole cells, the nucleus and subnuclear compartments. We generally find that circRNAs are more abundant in whole cells than in nuclear samples, consistent with their accumulation in the cytoplasm. The notable exception is the single-exon circRNA circCANX(9), which is unexpectedly enriched in the nucleus. In addition, we observe that circFIRRE prevails over the linear lncRNA FIRRE in both the cytoplasm and the nucleus. Zooming in on the subnuclear compartments, we show that circRNAs are strongly depleted from nuclear speckles, indicating that excess splicing factors in this compartment counteract back-splicing. Our results thereby provide valuable insights into the subnuclear distribution of circRNAs. Regarding circRNA function, we surprisingly find that the majority of all detected circRNAs possess complete open reading frames with potential for cap-independent translation. Overall, we show that Calcifer is an easy-to-use, versatile and sustainable workflow for the annotation of circRNAs, which expands the repertoire of circRNA tools and provides new insights into circRNA distribution and function.
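A circRNA has no ends, so an ORF can run across the back-splice junction. If no stop codon appears in any lap, the ORF never terminates. The minimal sketch below scans a circular sequence for such ORFs on the forward strand, using the standard stop codons. The function name, the `min_codons` cut-off and the toy sequences are hypothetical, and this is not Calcifer's implementation.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def circular_orfs(seq, min_codons=20):
    """Find ATG-initiated ORFs in a circular sequence, across the junction.

    seq: circRNA sequence in DNA letters, 5'->3', read as a circle.
    min_codons: minimum number of sense codons (stop codon excluded).
    Returns (start, length_nt, crosses_junction) tuples; length_nt counts
    the stop codon and is None for ORFs that never reach a stop codon.
    """
    seq = seq.upper()
    n = len(seq)

    def codon_at(pos):
        # Three bases starting at pos, wrapping around the circle.
        return "".join(seq[(pos + k) % n] for k in range(3))

    orfs = []
    for start in range(n):
        if codon_at(start) != "ATG":
            continue
        # After n codons (3n nt) we are back at `start` in the same frame,
        # so if no stop appears within n codons the ORF is endless.
        for i in range(n):
            if codon_at(start + 3 * i) in STOP_CODONS:
                if i >= min_codons:
                    length = 3 * (i + 1)
                    orfs.append((start, length, start + length > n))
                break
        else:
            orfs.append((start, None, True))
    return orfs


print(circular_orfs("TAAGGATGCC", min_codons=1))  # [(5, 18, True)]
print(circular_orfs("ATGAAA", min_codons=1))      # [(0, None, True)]
```

The first toy ORF starts near the 3' end, wraps around the junction twice, and stops at the TAA at position 0. The second never meets a stop codon, which corresponds to a rolling-circle ORF.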
Subjects
Cell Nucleus; RNA, Circular; RNA, Circular/genetics; RNA, Circular/metabolism; Humans; Cell Nucleus/metabolism; Cell Nucleus/genetics; Cytoplasm/metabolism; Cytoplasm/genetics; Open Reading Frames; Molecular Sequence Annotation; RNA, Long Noncoding/genetics; RNA, Long Noncoding/metabolism; RNA Splicing; Computational Biology/methods; Sequence Analysis, RNA
ABSTRACT
Constraint-based network modelling is a powerful tool for analysing cellular metabolism at genomic scale. Here, we conducted an integrative analysis of metabolic networks reconstructed from RNA-seq data with paired epigenomic data from the EpiATLAS resource of the International Human Epigenome Consortium (IHEC). Applying a state-of-the-art contextualisation algorithm, we reconstructed metabolic networks across 1,555 samples corresponding to 58 tissues and cell types. Analysis of these networks revealed the distribution of metabolic functionalities across human cell types and provides a compendium of human metabolic activity. This integrative approach allowed us to define, across tissues and cell types, (i) reactions that fulfil the basic metabolic processes (core metabolism) and (ii) cell type-specific functions (unique metabolism) that shape the metabolic identity of a cell or a tissue. Integration with EpiATLAS-derived cell type-specific gene-level chromatin states and enhancer-gene interactions identified enhancers, transcription factors, and key nodes controlling core and unique metabolism. Transport reactions and the first reactions of pathways were enriched for high expression and active chromatin states, and for Polycomb-mediated repression in cell types where the pathways are inactive, suggesting that these key nodes are targets of repression. This integrative analysis forms the basis for identifying regulation points that control metabolic identity in human cells.
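Constraint-based models are commonly stated as a linear program over reaction fluxes, as in flux balance analysis. The formulation below is the standard one. The abstract does not state the objective vector c or how the study's contextualisation algorithm sets the flux bounds, so both are assumptions here.

```latex
\begin{aligned}
\max_{v \in \mathbb{R}^{n}} \quad & c^{\top} v \\
\text{subject to} \quad & S\,v = 0, \\
& l_j \le v_j \le u_j, \qquad j = 1, \dots, n
\end{aligned}
```

Here S is the m × n stoichiometric matrix (m metabolites, n reactions), v is the flux vector, and S v = 0 encodes the steady-state assumption. Context-specific reconstruction methods broadly use omics data to remove or restrict reactions, which corresponds to tightening l_j and u_j. Setting both bounds to zero removes reaction j from the network.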
ABSTRACT
Non-coding variants located within regulatory elements may alter gene expression by modifying transcription factor (TF) binding sites, thereby leading to functional consequences. Different TF models are being used to assess the effect of DNA sequence variants, such as single nucleotide variants (SNVs). Existing methods are often slow and do not assess the statistical significance of their results. We investigated the distribution of absolute maximal differential TF binding scores for general computational models of TF binding. We find that a modified Laplace distribution can adequately approximate the empirical distributions. A benchmark on in vitro and in vivo datasets showed that our approach improves upon an existing method in terms of performance and speed. Applications to eQTLs and to a genome-wide association study illustrate the usefulness of our statistics by highlighting cell type-specific regulators and target genes. An implementation of our approach is freely available on GitHub and as a Bioconda package.
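The sketch below shows the two ingredients under stated assumptions. The first is a differential score for an SNV, taken here as the change in the best forward-strand PWM window score covering the variant. The second is a p-value from a Laplace distribution fitted by maximum likelihood to background scores, with location equal to the median and scale equal to the mean absolute deviation. The paper's modified Laplace distribution and its exact score definition are not reproduced. Names, weights and background values are hypothetical toy choices.

```python
import math
import statistics

BASES = "ACGT"


def pwm_score(pwm, window):
    """Sum of per-position log-odds weights; pwm is a list of {base: weight}."""
    return sum(pwm[i][base] for i, base in enumerate(window))


def best_window_score(pwm, seq, pos):
    """Best PWM score over all windows of seq (forward strand) containing pos."""
    k = len(pwm)
    starts = range(max(0, pos - k + 1), min(pos, len(seq) - k) + 1)
    return max(pwm_score(pwm, seq[s:s + k]) for s in starts)


def differential_score(pwm, ref_seq, pos, alt_base):
    """Signed change in the best window score caused by an SNV at pos."""
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    return best_window_score(pwm, alt_seq, pos) - best_window_score(pwm, ref_seq, pos)


def fit_laplace(values):
    """Maximum-likelihood Laplace fit: location = median, scale = mean |x - median|."""
    mu = statistics.median(values)
    b = sum(abs(v - mu) for v in values) / len(values)
    return mu, b


def laplace_two_sided_pvalue(d, mu, b):
    """P(|X - mu| >= |d - mu|) for X ~ Laplace(mu, b).

    |X - mu| is exponentially distributed with mean b, so the tail is exp(-t / b).
    """
    return math.exp(-abs(d - mu) / b)


# Toy motif: +1 for the consensus base of "GATA", -1 otherwise (hypothetical weights).
pwm = [{base: (1.0 if base == m else -1.0) for base in BASES} for m in "GATA"]
d = differential_score(pwm, "CCGATACC", pos=3, alt_base="C")
# Toy background of differential scores (made-up numbers, not real data).
mu, b = fit_laplace([-0.5, 0.0, 0.5, 1.0, -1.0])
p = laplace_two_sided_pvalue(d, mu, b)
print(d, mu, b, p)
```

In practice the background would come from many random or shuffled variants scored with the same model, so the fitted distribution is specific to each TF model.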
ABSTRACT
BACKGROUND: Cardiovascular research heavily relies on mouse (Mus musculus) models to study disease mechanisms and to test novel biomarkers and medications. Yet, translating these results to patients remains a major challenge and often results in ineffective drugs. Therefore, it is an open challenge of translational science to develop models with high similarity to human disease and high predictive value. This requires a comparison of disease models in mice with diseased tissue derived from humans. RESULTS: To compare the transcriptional signatures at single-cell resolution, we implemented an integration pipeline called OrthoIntegrate, which uniquely assigns orthologs and thereby merges single-cell RNA sequencing (scRNA-seq) data of different species. The pipeline has been designed to be easy to use and is fully integrable into the standard Seurat workflow. We applied OrthoIntegrate to scRNA-seq data from cardiac tissue of heart failure patients with reduced ejection fraction (HFrEF) and scRNA-seq data from mice after chronic infarction, a commonly used mouse model to mimic HFrEF. We discovered shared and distinct regulatory pathways between human HFrEF patients and the corresponding mouse model. Overall, 54% of genes were commonly regulated, including major changes in cardiomyocyte energy metabolism. However, several regulatory pathways (e.g., angiogenesis) were specifically regulated in humans. CONCLUSIONS: The demonstration of unique pathways occurring in humans indicates limitations on the comparability between mouse models and human HFrEF and shows that results from the mouse model should be validated carefully. OrthoIntegrate is publicly accessible (https://github.com/MarianoRuzJurado/OrthoIntegrate) and can be used to integrate other large datasets to provide a general comparison of models with patient data.
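The abstract does not detail how OrthoIntegrate resolves one-to-many ortholog relationships. The minimal sketch below shows one possible strategy, assuming a table of candidate mouse-human ortholog pairs with a similarity score. Pairs are accepted greedily by descending score so that every gene is used at most once, and mouse genes are then relabelled with their human partner. Function names, gene names and scores are hypothetical.

```python
def unique_orthologs(candidates):
    """Greedily pick one-to-one ortholog pairs by descending score (hypothetical).

    candidates: list of (mouse_gene, human_gene, score) tuples.
    Returns dict mouse_gene -> human_gene with each gene used at most once.
    """
    mapping = {}
    used_human = set()
    for mouse, human, _score in sorted(candidates, key=lambda c: c[2], reverse=True):
        if mouse in mapping or human in used_human:
            continue  # one side is already paired with a better-scoring partner
        mapping[mouse] = human
        used_human.add(human)
    return mapping


def rename_to_human(mouse_counts, mapping):
    """Keep only mapped mouse genes and relabel them with human gene names."""
    return {mapping[g]: v for g, v in mouse_counts.items() if g in mapping}


# Toy candidate table (hypothetical genes and similarity scores).
candidates = [
    ("mouse_a", "human_A", 95.0),
    ("mouse_a", "human_B", 90.0),
    ("mouse_b", "human_B", 96.0),
    ("mouse_b", "human_A", 89.0),
    ("mouse_c", "human_C", 80.0),
]
mapping = unique_orthologs(candidates)
counts = rename_to_human({"mouse_a": 5, "mouse_c": 1, "mouse_x": 3}, mapping)
print(mapping)
print(counts)
```

Python's sort is stable, so pairs with tied scores keep their input order. A global assignment, such as maximum-weight bipartite matching, could replace the greedy pass if ties or conflicts are frequent.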
Subjects
Heart Failure; Humans; Animals; Mice; Heart Failure/genetics; Transcriptome; Stroke Volume; Energy Metabolism; RNA
ABSTRACT
Midbrain dopaminergic neurons (mDANs) control voluntary movement, cognition, and reward behavior under physiological conditions and are implicated in human diseases such as Parkinson's disease (PD). Many transcription factors (TFs) controlling human mDAN differentiation during development have been described, but much of the regulatory landscape remains undefined. Using a tyrosine hydroxylase (TH) human iPSC reporter line, here we generate time-series transcriptomic and epigenomic profiles of purified mDANs during differentiation. Integrative analysis predicts novel regulators of mDAN differentiation, and super-enhancers are used to identify key TFs. We find that LBX1, NHLH1 and NR2F1/2 promote mDAN differentiation and show that overexpression of either LBX1 or NHLH1 can also improve mDAN specification. A more detailed investigation of TF targets reveals that NHLH1 promotes the induction of neuronal miR-124, LBX1 regulates cholesterol biosynthesis, and NR2F1/2 controls neuronal activity.
Subjects
Dopaminergic Neurons; Induced Pluripotent Stem Cells; Humans; Dopaminergic Neurons/metabolism; Multiomics; Mesencephalon; Transcription Factors/genetics; Transcription Factors/metabolism; Induced Pluripotent Stem Cells/metabolism; Cell Differentiation/genetics; Basic Helix-Loop-Helix Transcription Factors/genetics
ABSTRACT
17-β-hydroxysteroid dehydrogenase 13 (HSD17B13), a lipid droplet-associated enzyme, is primarily expressed in the liver and plays an important role in lipid metabolism. Targeted inhibition of its enzymatic function is a potential therapeutic strategy for treating steatotic liver disease (SLD). The present study aimed to investigate the effects of the first selective HSD17B13 inhibitor, BI-3231, in a model of hepatocellular lipotoxicity using human cell lines and primary mouse hepatocytes in vitro. Lipotoxicity was induced with palmitic acid in HepG2 cells and freshly isolated mouse hepatocytes, and the cells were coincubated with BI-3231 to assess its protective effects. Under lipotoxic stress, triglyceride (TG) accumulation was significantly decreased in BI-3231-treated cells compared with untreated control human and mouse hepatocytes. In addition, treatment with BI-3231 led to considerable improvement in hepatocyte proliferation, cell differentiation, and lipid homeostasis. Mechanistically, BI-3231 increased mitochondrial respiratory function without affecting β-oxidation. BI-3231 inhibited the lipotoxic effects of palmitic acid in hepatocytes, highlighting the potential of targeting HSD17B13 as a specific therapeutic approach in steatotic liver disease.
NEW & NOTEWORTHY: 17-β-Hydroxysteroid dehydrogenase 13 (HSD17B13) is a lipid droplet protein primarily expressed in hepatocytes. HSD17B13 is associated with the clinical outcome of chronic liver diseases and is therefore a target for drug development. Here, we demonstrate the promising therapeutic effect of BI-3231, a potent inhibitor of HSD17B13, based on its ability to inhibit triglyceride accumulation in lipid droplets (LDs), restore lipid metabolism and homeostasis, and increase mitochondrial activity in vitro.
Subjects
Fatty Liver; Palmitic Acid; Humans; Animals; Mice; Palmitic Acid/toxicity; Enzyme Inhibitors/pharmacology; Hepatocytes; Triglycerides
ABSTRACT
The sharing and documentation of cardiovascular research data are essential for efficient use and reuse of data, thereby aiding scientific transparency, accelerating the progress of cardiovascular research and healthcare, and contributing to the reproducibility of research results. However, challenges remain. This position paper, written on behalf of and approved by the German Cardiac Society and German Centre for Cardiovascular Research, summarizes our current understanding of the challenges in cardiovascular research data management (RDM). These challenges include lack of time, awareness, incentives, and funding for implementing effective RDM; lack of standardization in RDM processes; a need to better identify meaningful and actionable data among the increasing volume and complexity of data being acquired; and a lack of understanding of the legal aspects of data sharing. While several tools exist to increase the degree to which data are findable, accessible, interoperable, and reusable (FAIR), more work is needed to lower the threshold for effective RDM not just in cardiovascular research but in all biomedical research, with data sharing and reuse being factored in at every stage of the scientific process. A culture of open science with FAIR research data should be fostered through education and training of early-career and established research professionals. Ultimately, FAIR RDM requires permanent, long-term effort at all levels. If outcomes can be shown to be superior and to promote better (and better value) science, modern RDM will make a positive difference to cardiovascular science and practice. The full position paper is available in the supplementary materials.
Subjects
Biomedical Research; Cardiovascular System; Humans; Data Management; Reproducibility of Results; Heart
ABSTRACT
For medicine to fulfill its promise of personalized treatments based on a better understanding of disease biology, computational and statistical tools must exist to analyze the growing amount of available patient data. A particular challenge is that several types of data are being measured to cope with the complexity of the underlying systems, enhance predictive modeling and enrich molecular understanding. Here we review a number of recent approaches that specialize in the analysis of multimodal data in the context of predictive biomedicine. We focus on methods that combine different omics measurements with image or genome variation data. Our overview shows the diversity of methods that address these analysis challenges and reveals new avenues for novel developments.
ABSTRACT
The SARS-CoV-2 pandemic has affected nations globally, leading to illness, death, and economic downturn. Why disease severity, ranging from no symptoms to the need for extracorporeal membrane oxygenation, varies between patients is still incompletely understood. We therefore aimed to understand the impact of genetic factors on disease severity in SARS-CoV-2 infection. Here, we provide data on demographics, ABO blood group, human leukocyte antigen (HLA) type, as well as next-generation sequencing data of genes in the natural killer cell receptor family, the renin-angiotensin-aldosterone and kallikrein-kinin systems and others in 159 patients with SARS-CoV-2 infection, stratified into seven categories of disease severity. We provide single-nucleotide polymorphism (SNP) data on the patients and a protein structural analysis as a case study of a SNP in the SIGLEC7 gene, which was significantly associated with the clinical score. Our data represent a resource for correlation analyses involving genetic factors and disease severity and may help predict outcomes in infections with future SARS-CoV-2 variants and aid vaccine adaptation.
Subjects
COVID-19; Humans; COVID-19/genetics; SARS-CoV-2/genetics; Polymorphism, Single Nucleotide; Angiotensins
ABSTRACT
Transcription factors (TFs) are essential players in orchestrating the regulatory landscape in cells. Still, their exact modes of action and dependencies on other regulatory aspects remain elusive. Since TFs act in a cell type-specific manner and each TF has its own characteristics, untangling their regulatory interactions experimentally is laborious and convoluted. Thus, computational tools are continuously being developed that estimate transcription factor activity (TFA) from a variety of data modalities, either based on a mapping of TFs to their putative target genes or in a genome-wide, gene-unspecific fashion. These tools can help to gain insights into TF regulation and to prioritize candidates for experimental validation. We give an overview of available computational tools that estimate TFA, illustrate examples of their application, debate common result validation strategies, and discuss assumptions and concomitant limitations.
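The sketch below illustrates the target-gene-based family of TFA estimators. It scores a TF in each sample as the signed mean z-score of its target genes: activated targets count positively and repressed targets negatively. This is one of the simplest such schemes, not the method of any particular tool. The regulon format and function names are hypothetical.

```python
import statistics


def zscore_by_gene(expression):
    """Standardize each gene's expression across samples.

    expression: dict gene -> list of values (one per sample).
    Genes with zero variance are dropped because their z-scores are undefined.
    """
    z = {}
    for gene, values in expression.items():
        mean = statistics.fmean(values)
        sd = statistics.pstdev(values)
        if sd > 0:
            z[gene] = [(v - mean) / sd for v in values]
    return z


def tf_activity(expression, regulon):
    """Per-sample TF activity as the signed mean z-score of its targets.

    regulon: dict target_gene -> mode of regulation (+1 activation, -1 repression).
    """
    z = zscore_by_gene(expression)
    targets = [(g, mode) for g, mode in regulon.items() if g in z]
    if not targets:
        raise ValueError("no target gene with usable expression")
    n_samples = len(next(iter(z.values())))
    return [
        sum(mode * z[g][s] for g, mode in targets) / len(targets)
        for s in range(n_samples)
    ]


# Toy data: two samples, three genes (hypothetical).
expression = {"g1": [1.0, 3.0], "g2": [4.0, 2.0], "g3": [5.0, 5.0]}
regulon = {"g1": +1, "g2": -1, "g3": +1}
activity = tf_activity(expression, regulon)
print(activity)
```

Here the activated target g1 rises and the repressed target g2 falls in the second sample, so the TF appears more active there. The constant gene g3 carries no information and is skipped.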
Subjects
Gene Expression Regulation; Transcription Factors; Transcription Factors/metabolism; Genome; Computational Biology; Gene Regulatory Networks
ABSTRACT
Several studies suggested that transcription factor (TF) binding to DNA may be impaired or enhanced by DNA methylation. We present MeDeMo, a toolbox for TF motif analysis that combines information about DNA methylation with models capturing intra-motif dependencies. In a large-scale study using ChIP-seq data for 335 TFs, we identify novel TFs that show a binding behaviour associated with DNA methylation. Overall, we find that the presence of CpG methylation decreases the likelihood of binding for the majority of methylation-associated TFs. For a considerable subset of TFs, we show that intra-motif dependencies are pivotal for accurately modelling the impact of DNA methylation on TF binding. We illustrate that the novel methylation-aware TF binding models allow prediction of differential ChIP-seq peaks and improve the genome-wide analysis of TF binding. Our work indicates that simplistic models that neglect the effect of DNA methylation on DNA binding may lead to systematic underperformance for methylation-associated TFs.
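One common way to make a motif model methylation-aware is to extend the DNA alphabet with extra symbols for methylated cytosines, so that a model can weight them differently from unmethylated C. The minimal sketch below shows such an encoding followed by position weight matrix (PWM) scoring. The symbols `M` and `H`, the function names and the toy weights are assumptions. Unlike MeDeMo's models, a PWM captures no intra-motif dependencies.

```python
def encode_methylation(seq, methylated_positions):
    """Encode a DNA sequence in an extended, methylation-aware alphabet.

    seq: forward-strand sequence (A, C, G, T).
    methylated_positions: 0-based indices of CpG cytosines that are methylated,
        assuming symmetric methylation of both strands of the CpG.
    Methylated C -> 'M'; the G of that CpG (opposite the methylated C on the
    reverse strand) -> 'H'. The letters M and H are assumed conventions.
    """
    chars = list(seq.upper())
    for pos in methylated_positions:
        if chars[pos] != "C" or pos + 1 >= len(chars) or chars[pos + 1] != "G":
            raise ValueError(f"position {pos} is not the C of a CpG")
        chars[pos] = "M"
        chars[pos + 1] = "H"
    return "".join(chars)


def pwm_score(pwm, window):
    """Log-odds score of a window under a PWM over the extended alphabet."""
    return sum(column[symbol] for column, symbol in zip(pwm, window))


# Toy two-position motif that prefers an unmethylated CpG (hypothetical weights).
pwm = [
    {"A": -2.0, "C": 1.0, "G": -2.0, "T": -2.0, "M": -1.0, "H": -2.0},
    {"A": -2.0, "C": -2.0, "G": 1.0, "T": -2.0, "M": -2.0, "H": -1.0},
]
encoded = encode_methylation("ACGTCGA", {1})
print(encoded)                  # AMHTCGA
print(pwm_score(pwm, "CG"))     # 2.0
print(pwm_score(pwm, "MH"))     # -2.0
```

Because methylated and unmethylated CpGs become different symbols, the same model can learn whether methylation raises or lowers the binding score at each motif position.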
ABSTRACT
Fatty liver disease, the accumulation of fat in the liver, has been reported to affect populations worldwide. It comes with an increased risk of developing fibrosis, cirrhosis, and hepatocellular carcinoma. Yet, little is known about the effects of a diet containing high fat and alcohol on epigenetic aging with respect to changes in transcriptional and epigenomic profiles. In this study, we took a multi-omics approach and integrated gene expression, methylation signals, and chromatin signals to study the epigenomic effects of a high-fat and alcohol-containing diet on mouse hepatocytes. We identified four gene network clusters associated with pathways that promote steatosis. Using a machine learning approach, we predict specific transcription factors that might modulate these functionally relevant clusters. Finally, we discover four additional CpG loci and validate aging-related differential CpG methylation. Differential CpG methylation linked to aging showed minimal overlap with altered methylation in steatosis.
Subjects
Epigenomics; Hepatocytes; Mice; Animals; Hepatocytes/metabolism; Liver/metabolism; Ethanol; Epigenesis, Genetic; DNA Methylation
ABSTRACT
Circular RNAs (circRNAs) are generated by backsplicing and control cellular signaling and phenotypes. Pericytes stabilize capillary structures and play important roles in the formation and maintenance of blood vessels. Here, we characterize hypoxia-regulated circRNAs in human pericytes and show that the circular RNA of procollagen-lysine,2-oxoglutarate 5-dioxygenase-2 (circPLOD2) is induced by hypoxia and regulates pericyte functions. Silencing of circPLOD2 increases pericyte proliferation, migration, and secretion of soluble angiogenic proteins, thereby enhancing endothelial migration and network formation. Transcriptional and epigenomic profiling of circPLOD2-depleted cells reveals widespread changes in gene expression and identifies the transcription factor Krüppel-like factor 4 (KLF4) as a key effector of the circPLOD2-mediated changes. KLF4 depletion mimics circPLOD2 silencing, whereas KLF4 overexpression reverses the effects of circPLOD2 depletion on proliferation and endothelial-pericyte interactions. Together, these data reveal an important function of circPLOD2 in controlling pericyte proliferation and capillary formation and show that the circPLOD2-mediated regulation of KLF4 significantly contributes to the transcriptional response to hypoxia.
Subjects
Pericytes; RNA, Circular; Humans; Hypoxia/metabolism; Pericytes/metabolism; RNA, Circular/genetics; RNA, Circular/metabolism
ABSTRACT
BACKGROUND: Cardiovascular diseases (CVDs) are the leading cause of death worldwide. Genome-wide association studies (GWAS) have identified many CVD-associated single nucleotide polymorphisms (SNPs) located in non-coding genomic regions. These SNPs may alter gene expression by modifying transcription factor (TF) binding sites and lead to functional consequences in cardiovascular traits or diseases. To understand the underlying molecular mechanisms, it is crucial to identify which variants are involved and how they affect TF binding. METHODS: The SNEEP (SNP exploration and analysis using epigenomics data) pipeline was used to identify regulatory SNPs, which alter the binding behavior of TFs, and to link GWAS SNPs to their potential target genes for six CVDs. Human induced pluripotent stem cell-derived cardiomyocytes (hiPSC-CMs), monoculture cardiac organoids (MCOs) and self-organized cardiac organoids (SCOs) were used in the study. Gene expression, cardiomyocyte size and cardiac contractility were assessed. RESULTS: Using our integrative computational pipeline, we identified 1905 regulatory SNPs in CVD GWAS data. These were associated with hundreds of genes, half of them non-coding RNAs (ncRNAs), suggesting novel CVD genes. We experimentally tested 40 CVD-associated ncRNAs, among them RP11-98F14.11, RPL23AP92, IGBP1P1, and CTD-2383I20.1, which were upregulated in hiPSC-CMs, MCOs and SCOs under hypoxic conditions. Further experiments showed that IGBP1P1 depletion rescued expression of hypertrophic marker genes, reduced hypoxia-induced cardiomyocyte size and improved hypoxia-reduced cardiac contractility in hiPSC-CMs and MCOs. CONCLUSIONS: IGBP1P1 is a novel ncRNA with key regulatory functions in modulating cardiomyocyte size and cardiac function in our disease models. Our data suggest ncRNA IGBP1P1 as a potential therapeutic target to improve cardiac function in CVDs.
Subjects
Cardiovascular Diseases; Polymorphism, Single Nucleotide; Humans; Polymorphism, Single Nucleotide/genetics; Genome-Wide Association Study; Cardiovascular Diseases/genetics; Genomics; Genome
ABSTRACT
Liver cirrhosis is the end stage of all chronic liver diseases and contributes significantly to global mortality, accounting for 2% of all deaths worldwide. The age-standardized mortality from liver cirrhosis in Europe is between 10 and 20% and is explained not only by the development of liver cancer but also by acute deterioration in the patient's overall condition. The development of complications, including accumulation of fluid in the abdomen (ascites), bleeding in the gastrointestinal tract (variceal bleeding), bacterial infections, or a decrease in brain function (hepatic encephalopathy), defines an acute decompensation that requires therapy and, triggered by different precipitating events, often leads to acute-on-chronic liver failure (ACLF). However, due to its complexity and organ-spanning nature, the pathogenesis of ACLF is poorly understood, and the common mechanisms underlying the development of organ dysfunction or failure in ACLF remain elusive. Apart from general intensive care interventions, there are no specific therapy options for ACLF. Liver transplantation is often not possible in these patients due to contraindications and a lack of prioritization. In this review, we describe the framework of the ACLF-I project consortium, funded by the Hessian Ministry of Higher Education, Research and the Arts (HMWK), which builds on existing findings and aims to answer these open questions.
Subjects
Acute-On-Chronic Liver Failure; End Stage Liver Disease; Esophageal and Gastric Varices; Humans; End Stage Liver Disease/complications; Esophageal and Gastric Varices/complications; Gastrointestinal Hemorrhage/complications; Liver Cirrhosis/complications; Liver Cirrhosis/therapy; Acute-On-Chronic Liver Failure/therapy; Acute-On-Chronic Liver Failure/etiology
ABSTRACT
MOTIVATION: DNA CpG methylation (CpGm) has proven to be a crucial epigenetic factor in the mammalian gene regulatory system. Assessment of DNA CpG methylation values via whole-genome bisulfite sequencing (WGBS) is, however, computationally extremely demanding. RESULTS: We present FAst MEthylation calling (FAME), the first approach to quantify CpGm values directly from bulk or single-cell WGBS reads without intermediate output files. FAME is very fast yet as accurate as standard methods, which first produce BS alignment files before computing CpGm values. We present experiments on bulk and single-cell bisulfite datasets showing that data analysis can be sped up significantly without compromising accuracy, which helps address the current WGBS analysis bottleneck for large-scale datasets. AVAILABILITY AND IMPLEMENTATION: An implementation of FAME is open source and licensed under GPL-3.0 at https://github.com/FischerJo/FAME.
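The abstract does not describe FAME's read processing, but the quantity it reports is the usual per-CpG methylation level. This is the fraction of reads covering a CpG cytosine that still read C (protected from conversion) rather than T (converted). The minimal sketch below shows that final counting step, assuming gapless forward-strand reads whose reference positions are already known. It is not FAME's algorithm, and the function name and toy reads are hypothetical.

```python
from collections import defaultdict


def call_cpg_methylation(alignments, reference):
    """Per-CpG methylation levels from forward-strand bisulfite reads.

    alignments: iterable of (start, read_seq), start being the 0-based,
        gapless position of the read on `reference` (assumed already known).
    reference: forward-strand reference sequence.
    A read C at a reference CpG C counts as methylated; a read T counts as
    unmethylated. Other bases are ignored.
    Returns dict position -> (methylated_count, total_count, level).
    """
    reference = reference.upper()
    cpg_sites = {i for i in range(len(reference) - 1) if reference[i:i + 2] == "CG"}
    meth = defaultdict(int)
    unmeth = defaultdict(int)
    for start, read in alignments:
        for offset, base in enumerate(read.upper()):
            pos = start + offset
            if pos not in cpg_sites:
                continue
            if base == "C":
                meth[pos] += 1
            elif base == "T":
                unmeth[pos] += 1
    result = {}
    for pos in sorted(cpg_sites):
        total = meth[pos] + unmeth[pos]
        if total:
            result[pos] = (meth[pos], total, meth[pos] / total)
    return result


# Toy reference with CpGs at positions 1 and 5, plus three toy reads.
reads = [(0, "ACGTTTGA"), (1, "CGTTCG"), (4, "TTGA")]
levels = call_cpg_methylation(reads, "ACGTTCGA")
print(levels)
```

Tools that skip intermediate files still have to produce these per-CpG counts. Their savings come from how reads are located on the genome, which this sketch takes as given.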
Subjects
DNA Methylation; Software; Animals; Sequence Analysis, DNA/methods; High-Throughput Nucleotide Sequencing/methods; Sulfites; DNA/genetics; Mammals/genetics
ABSTRACT
AIM: Chemoresistance is a major cause of treatment failure in colorectal cancer (CRC) therapy. In this study, the impact of the IGF2BP family of RNA-binding proteins on CRC chemoresistance was investigated using in silico, in vitro, and in vivo approaches. METHODS: Gene expression data from a well-characterized cohort and publicly available cross-linking immunoprecipitation sequencing (CLIP-Seq) data were collected. Resistance to chemotherapeutics was assessed in patient-derived xenografts (PDXs) and patient-derived organoids (PDOs). Functional studies were performed in 2D and 3D cell culture models, including proliferation, spheroid growth, and mitochondrial respiration analyses. RESULTS: We identified IGF2BP2 as the most abundant IGF2BP in primary and metastatic CRC, correlating with tumor stage in patient samples and tumor growth in PDXs. IGF2BP2 expression in primary tumor tissue was significantly associated with resistance to selumetinib, gefitinib, and regorafenib in PDOs and to 5-fluorouracil and oxaliplatin in PDXs in vivo. IGF2BP2 knockout (KO) HCT116 cells were more susceptible to regorafenib in 2D and to oxaliplatin, selumetinib, and nintedanib in 3D cell culture. Further, a bioinformatic analysis using CLIP data suggested stabilization of target transcripts in primary and metastatic tumors. Measurement of oxygen consumption rate (OCR) and extracellular acidification rate (ECAR) revealed a decreased basal OCR and an increase in glycolytic ATP production rate in IGF2BP2 KO cells. In addition, real-time reverse transcriptase polymerase chain reaction (qPCR) analysis confirmed decreased expression of genes of the respiratory chain complex I, complex IV, and the outer mitochondrial membrane in IGF2BP2 KO cells. CONCLUSIONS: IGF2BP2 correlates with CRC tumor growth in vivo and promotes chemoresistance by altering mitochondrial respiratory chain metabolism. As a druggable target, IGF2BP2 could be used in future CRC therapy to overcome CRC chemoresistance.
Subjects
Colorectal Neoplasms; Humans; Oxaliplatin/pharmacology; Colorectal Neoplasms/drug therapy; Colorectal Neoplasms/genetics; Colorectal Neoplasms/pathology; Drug Resistance, Neoplasm/genetics; RNA-Binding Proteins/genetics; RNA-Binding Proteins/metabolism; Cell Line, Tumor; Cell Proliferation/genetics; Gene Expression Regulation, Neoplastic
ABSTRACT
MOTIVATION: Identifying regulatory regions in the genome is of great interest for understanding the epigenomic landscape in cells. One fundamental challenge in this context is to find the target genes whose expression is affected by the regulatory regions. A recent successful method is the Activity-By-Contact (ABC) model, which scores enhancer-gene interactions based on enhancer activity and the contact frequency of an enhancer to its target gene. However, it describes regulatory interactions entirely from a gene's perspective and does not account for all the candidate target genes of an enhancer. In addition, the ABC model requires two types of assays to measure enhancer activity, which limits its applicability. Moreover, no implementation is available that allows either an integration with transcription factor (TF) binding information or an efficient analysis of single-cell data. RESULTS: We demonstrate that the ABC score can achieve higher accuracy by adapting the enhancer activity according to the number of contacts the enhancer has to its candidate target genes and by considering all annotated transcription start sites of a gene. Further, we show that the model is comparably accurate with only one assay to measure enhancer activity. We combined our generalized ABC model with TF binding information and illustrated an analysis of a single-cell ATAC-seq dataset of the human heart, where we were able to characterize cell type-specific regulatory interactions and predict gene expression based on TF affinities. All executed processing steps are incorporated into our new computational pipeline STARE. AVAILABILITY AND IMPLEMENTATION: The software is available at https://github.com/schulzlab/STARE. CONTACT: marcel.schulz@em.uni-frankfurt.de. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
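For reference, the basic ABC score of an enhancer E for a gene G is E's activity times its contact frequency with G, divided by the sum of the same product over all candidate enhancers of G. The minimal sketch below implements that basic formula with activity taken from a single assay. The gABC refinements described above are not reproduced: adjusting activity by the enhancer's contacts to all candidate genes, and using all annotated TSSs. Names and toy values are hypothetical.

```python
def abc_scores(activity, contact, gene_candidates):
    """Basic Activity-by-Contact scores for candidate enhancers of each gene.

    activity: dict enhancer -> activity from one assay (e.g. ATAC-seq signal;
        the original ABC model combines two assays).
    contact: dict (enhancer, gene) -> contact frequency (e.g. from Hi-C).
    gene_candidates: dict gene -> list of candidate enhancers in its window.
    Returns dict (enhancer, gene) -> score; the scores of one gene sum to 1.
    """
    scores = {}
    for gene, enhancers in gene_candidates.items():
        products = {e: activity[e] * contact.get((e, gene), 0.0) for e in enhancers}
        total = sum(products.values())
        if total == 0:
            continue  # no active enhancer in contact: scores are undefined
        for e, value in products.items():
            scores[(e, gene)] = value / total
    return scores


# Toy inputs (hypothetical enhancers, genes and signals).
activity = {"e1": 3.0, "e2": 1.0}
contact = {("e1", "g"): 0.5, ("e2", "g"): 0.5}
scores = abc_scores(activity, contact, {"g": ["e1", "e2"], "h": ["e2"]})
print(scores)
```

With equal contact, the more active enhancer e1 receives three quarters of gene g's regulatory input. Gene h has no measured contact and therefore gets no score.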