Results 1 - 20 of 148
1.
NAR Genom Bioinform ; 6(2): lqae050, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38711859

ABSTRACT

Delineating the intricate interplay between promoter-proximal and -distal regulators is crucial for understanding the function of transcriptional mediator complexes implicated in the regulation of gene expression. The present study aimed to develop a computational method for accurately modeling the spatial proximal and distal regulatory interactions. Our method combined regression-based models to identify key regulators through gene expression prediction and a graph-embedding approach to detect coregulated genes. This approach enabled a detailed investigation of the gene regulatory mechanisms for germinal center B cells, accompanied by dramatic rearrangements of the genome structure. We found that while the promoter-proximal regulatory elements were the principal regulators of gene expression, the distal regulators fine-tuned transcription. Moreover, our approach unveiled the presence of modular regulators, such as cofactors and proximal/distal transcription factors, which were co-expressed with their target genes. Some of these modules exhibited abnormal expression patterns in lymphoma. These findings suggest that the dysregulation of interactions between transcriptional and architectural factors is associated with chromatin reorganization failure, which may increase the risk of malignancy. Therefore, our computational approach helps decipher the spatially interacting transcriptional cis-regulatory code.

2.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38581422

ABSTRACT

Reliable cell type annotations are crucial for investigating cellular heterogeneity in single-cell omics data. Although various computational approaches have been proposed for single-cell RNA sequencing (scRNA-seq) annotation, high-quality cell labels are still lacking in single-cell sequencing assay for transposase-accessible chromatin (scATAC-seq) data, because of extreme sparsity and inconsistent chromatin accessibility between datasets. Here, we present HyGAnno, a novel automated cell annotation method that transfers cell type information from a well-labeled scRNA-seq reference to an unlabeled scATAC-seq target, via a parallel graph neural network, in a semi-supervised manner. Unlike existing methods that utilize only gene expression or gene activity features, HyGAnno leverages genome-wide accessibility peak features to facilitate the training process. In addition, HyGAnno reconstructs a reference-target cell graph to detect cells with low prediction reliability, according to their specific graph connectivity patterns. HyGAnno was assessed across various datasets, showcasing its strengths in precise cell annotation, generating interpretable cell embeddings, robustness to noisy reference data and adaptability to tumor tissues.


Subjects
Chromatin; Neural Networks, Computer; Reproducibility of Results
3.
PeerJ ; 12: e17073, 2024.
Article in English | MEDLINE | ID: mdl-38500529

ABSTRACT

Background: Observational studies have demonstrated that a higher resting heart rate (RHR) is associated with an increased risk of dementia. However, it is not clear whether the association is causal. This study aimed to determine the causal effects of higher genetically predicted RHR on the risk of dementia. Methods: We performed a two-sample Mendelian randomization analysis to investigate the causal effect of higher genetically predicted RHR on Alzheimer's disease (AD) using summary statistics from genome-wide association studies. The generalized summary Mendelian randomization (GSMR) analysis was used to analyze the corresponding effects of RHR on the following outcomes: 1) diagnosis of AD (International Genomics of Alzheimer's Project), 2) family history (maternal and paternal) of AD from UK Biobank, 3) combined meta-analysis including these three GWAS results. Further analyses were conducted to determine the possibility of reverse causal association by adjusting for RHR-modifying medication. Results: The results of GSMR showed no significant causal effect of higher genetically predicted RHR on the risk of AD (βGSMR = 0.12, P = 0.30). GSMR applied to the maternal family history of AD (βGSMR = -0.18, P = 0.13) and to the paternal family history of AD (βGSMR = -0.14, P = 0.39) showed the same results. Furthermore, the results were robust after adjusting for RHR-modifying drugs (βGSMR = -0.03, P = 0.72). Conclusion: Our study did not find any evidence that supports a causal effect of RHR on dementia. Previous observational associations between RHR and dementia are likely attributed to the correlation between RHR and other cardiovascular diseases.
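
The two-sample design described in the abstract above combines per-variant effect estimates on the exposure (RHR) and on the outcome (AD) into one causal estimate. The following is a minimal sketch of the simpler fixed-effect inverse-variance weighted (IVW) estimator, not of GSMR itself, which as far as we understand also accounts for linkage disequilibrium and filters pleiotropic outliers. The function name `ivw_estimate` and the numbers in the usage note are hypothetical and are not taken from the study.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate from per-SNP summary statistics.

    beta_exposure: SNP effects on the exposure
    beta_outcome:  SNP effects on the outcome
    se_outcome:    standard errors of the outcome effects
    Returns (beta, standard error, two-sided normal p-value).
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have the same length")
    if not beta_exposure:
        raise ValueError("at least one SNP is required")

    # Weighted regression of outcome effects on exposure effects through
    # the origin, with weights 1 / se_outcome**2.
    numerator = sum(
        bx * by / se ** 2
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    denominator = sum(
        bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)
    )

    beta = numerator / denominator
    se = math.sqrt(1.0 / denominator)
    z = beta / se
    # Two-sided p-value under a standard normal null distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, p_value
```

With made-up inputs such as `ivw_estimate([0.1, 0.2], [0.01, 0.02], [0.01, 0.01])`, every per-SNP ratio equals 0.1, so the pooled estimate is 0.1 as well. A non-significant p-value from real data would be read the same way as the P values reported in the abstract.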


Subjects
Alzheimer Disease; Genome-Wide Association Study; Humans; Alzheimer Disease/epidemiology; Biological Specimen Banks; Heart Rate/genetics; Mendelian Randomization Analysis; UK Biobank; Meta-Analysis as Topic
4.
Nat Genet ; 56(3): 473-482, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38361031

ABSTRACT

Chromatin accessibility is a hallmark of active regulatory regions and is functionally linked to transcriptional networks and cell identity. However, the molecular mechanisms and networks that govern chromatin accessibility have not been thoroughly studied. Here we conducted a genome-wide CRISPR screening combined with an optimized ATAC-see protocol to identify genes that modulate global chromatin accessibility. In addition to known chromatin regulators like CREBBP and EP400, we discovered a number of previously unrecognized proteins that modulate chromatin accessibility, including TFDP1, HNRNPU, EIF3D and THAP11 belonging to diverse biological pathways. ATAC-seq analysis upon their knockouts revealed their distinct and specific effects on chromatin accessibility. Remarkably, we found that TFDP1, a transcription factor, modulates global chromatin accessibility through transcriptional regulation of canonical histones. In addition, our findings highlight the manipulation of chromatin accessibility as an approach to enhance various cell engineering applications, including genome editing and induced pluripotent stem cell reprogramming.


Subjects
Chromatin; High-Throughput Nucleotide Sequencing; Chromatin/genetics; High-Throughput Nucleotide Sequencing/methods; Histones/genetics; Transcription Factors/genetics; Transcription Factors/metabolism; Gene Regulatory Networks
5.
Nucleic Acids Res ; 52(3): 1107-1119, 2024 Feb 09.
Article in English | MEDLINE | ID: mdl-38084904

ABSTRACT

In this research, we elucidate the presence of around 11,000 housekeeping cis-regulatory elements (HK-CREs) and describe their main characteristics. Besides the trivial promoters of housekeeping genes, most HK-CREs reside in promoter regions and are involved in a broader role beyond housekeeping gene regulation. HK-CREs are conserved regions rich in unmethylated CpG sites. Their distribution highly correlates with that of protein-coding genes, and they interact with many genes over long distances. We observed reduced activity of a subset of HK-CREs in diverse cancer subtypes due to aberrant methylation, particularly those located in chromosome 19 and associated with zinc finger genes. Further analysis of samples from 17 cancer subtypes showed a significantly increased survival probability of patients with higher expression of these genes, suggesting them as housekeeping tumor suppressor genes. Overall, our work unravels the presence of housekeeping CREs indispensable for the maintenance and stability of cells.


Subjects
Neoplasms; Regulatory Sequences, Nucleic Acid; Humans; Promoter Regions, Genetic; Gene Expression Regulation; Neoplasms/genetics; Epigenesis, Genetic
6.
Brief Bioinform ; 24(6)2023 Sep 22.
Article in English | MEDLINE | ID: mdl-37861173

ABSTRACT

NcRNA-encoded small peptides (ncPEPs) have recently emerged as promising targets and biomarkers for cancer immunotherapy. Therefore, identifying cancer-associated ncPEPs is crucial for cancer research. In this work, we propose CoraL, a novel supervised contrastive meta-learning framework for predicting cancer-associated ncPEPs. Specifically, the proposed meta-learning strategy enables our model to learn meta-knowledge from different types of peptides and train a promising predictive model even with few labeled samples. The results show that our model is capable of making high-confidence predictions on unseen cancer biomarkers with only five samples, potentially accelerating the discovery of novel cancer biomarkers for immunotherapy. Moreover, our approach remarkably outperforms existing deep learning models on 15 cancer-associated ncPEP datasets, demonstrating its effectiveness and robustness. Interestingly, our model exhibits outstanding performance when extended for the identification of short open reading frames derived from ncPEPs, demonstrating the strong prediction ability of CoraL at the transcriptome level. Importantly, our feature interpretation analysis discovers unique sequential patterns as the fingerprint for each cancer-associated ncPEP, revealing the relationship among certain cancer biomarkers that are validated by relevant literature and motif comparison. Overall, we expect CoraL to be a useful tool to decipher the pathogenesis of cancer and provide valuable information for cancer research. The dataset and source code of our proposed method can be found at https://github.com/Johnsunnn/CoraL.


Subjects
Anthozoa; Neoplasms; Animals; Anthozoa/genetics; Neoplasms/genetics; Biomarkers, Tumor/genetics; Immunotherapy; Peptides/genetics; RNA, Untranslated
8.
Nat Aging ; 3(8): 1001-1019, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37474791

ABSTRACT

Protein misfolding is a major factor of neurodegenerative diseases. Post-mitotic neurons are highly susceptible to protein aggregates that are not diluted by mitosis. Therefore, post-mitotic cells may have a specific protein quality control system. Here, we show that LONRF2 is a bona fide protein quality control ubiquitin ligase induced in post-mitotic senescent cells. Under unperturbed conditions, LONRF2 is predominantly expressed in neurons. LONRF2 binds and ubiquitylates abnormally structured TDP-43 and hnRNP M1 and artificially misfolded proteins. Lonrf2-/- mice exhibit age-dependent TDP-43-mediated motor neuron (MN) degeneration and cerebellar ataxia. Mouse induced pluripotent stem cell-derived MNs lacking LONRF2 showed reduced survival, shortening of neurites and accumulation of pTDP-43 and G3BP1 after long-term culture. The shortening of neurites in MNs from patients with amyotrophic lateral sclerosis is rescued by ectopic expression of LONRF2. Our findings reveal that LONRF2 is a protein quality control ligase whose loss may contribute to MN degeneration and motor deficits.


Subjects
Motor Neurons; Ubiquitin; Mice; Animals; Motor Neurons/metabolism; Ubiquitin/metabolism; Ligases/metabolism; DNA Helicases/metabolism; Poly-ADP-Ribose Binding Proteins/metabolism; RNA Helicases/metabolism; RNA Recognition Motif Proteins/metabolism; DNA-Binding Proteins/genetics
9.
Nucleic Acids Res ; 51(7): 3017-3029, 2023 Apr 24.
Article in English | MEDLINE | ID: mdl-36796796

ABSTRACT

Here, we present DeepBIO, the first-of-its-kind automated and interpretable deep-learning platform for high-throughput biological sequence functional analysis. DeepBIO is a one-stop-shop web service that enables researchers to develop new deep-learning architectures to answer any biological question. Specifically, given any biological sequence data, DeepBIO supports a total of 42 state-of-the-art deep-learning algorithms for model training, comparison, optimization and evaluation in a fully automated pipeline. DeepBIO provides a comprehensive result visualization analysis for predictive models covering several aspects, such as model interpretability, feature analysis and functional sequential region discovery. Additionally, DeepBIO supports nine base-level functional annotation tasks using deep-learning architectures, with comprehensive interpretations and graphical visualizations to validate the reliability of annotated sites. Empowered by high-performance computers, DeepBIO allows ultra-fast prediction with up to million-scale sequence data in a few hours, demonstrating its usability in real application scenarios. Case study results show that DeepBIO provides an accurate, robust and interpretable prediction, demonstrating the power of deep learning in biological sequence functional analysis. Overall, we expect DeepBIO to ensure the reproducibility of deep-learning biological sequence analysis, lessen the programming and hardware burden for biologists and provide meaningful functional insights at both the sequence level and base level from biological sequences alone. DeepBIO is publicly available at https://inner.wei-group.net/DeepBIO.


The development of next-generation sequencing techniques has led to an exponential increase in the amount of biological sequence data accessible. It naturally poses a fundamental challenge: how to build the relationships from such large-scale sequences to their functions. In this work, we present DeepBIO, the first-of-its-kind automated and interpretable deep-learning platform for high-throughput biological sequence functional analysis. It enables researchers to develop new deep-learning architectures to answer any biological question in a fully automated pipeline. We expect DeepBIO to ensure the reproducibility of deep-learning-based biological sequence analysis, lessen the programming and hardware burden for biologists and provide meaningful functional insights at both the sequence level and base level from biological sequences alone.


Subjects
Deep Learning; Reproducibility of Results; Algorithms; High-Throughput Nucleotide Sequencing
10.
Front Immunol ; 14: 1304778, 2023.
Article in English | MEDLINE | ID: mdl-38173717

ABSTRACT

Macrophages display extreme plasticity, and the mechanisms and applications of polarization and de-/repolarization of macrophages have been extensively investigated. However, the regulation of macrophage hysteresis after de-/repolarization remains unclear. In this study, by using a large-scale computational analysis of macrophage multi-omics data, we report a list of hysteresis genes that maintain their expression patterns after polarization and de-/repolarization. While the polarization in M1 macrophages leads to a higher level of hysteresis in genes associated with cell cycle progression, cell migration, and enhancement of the immune response, we found weak levels of hysteresis after M2 polarization. During the polarization process from M0 to M1 and back to M0, the factors IRFs/STAT, AP-1, and CTCF regulate hysteresis by altering their binding sites to the chromatin. Overall, our results show that a history of polarization can lead to hysteresis in gene expression and chromatin accessibility over a given period. This study contributes to the understanding of de-/repolarization memory in macrophages.


Subjects
Chromatin; Transcription Factor AP-1; Transcription Factor AP-1/genetics; Transcription Factor AP-1/metabolism; Chromatin/genetics; Chromatin/metabolism; Multiomics; Macrophages
11.
NAR Genom Bioinform ; 4(4): lqac087, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36458020

ABSTRACT

Several factors, including tissue origins and culture conditions, affect the gene expression of undifferentiated stem cells. However, understanding the basic identity across different stem cells has not been pursued well despite its importance in stem cell biology. Thus, we aimed to rank the relative importance of multiple factors to the gene expression profile among undifferentiated human stem cells by analyzing publicly available RNA-seq datasets. We first conducted batch effect correction to avoid undefined variance in the dataset as far as possible. Then, we highlighted the relative impact of biological and technical factors among undifferentiated stem cell types: a greater influence of tissue origin in induced pluripotent stem cells than in other stem cell types; a stronger impact of culture condition in embryonic stem cells and somatic stem cell types, including mesenchymal stem cells and hematopoietic stem cells. In addition, we found that a characteristic gene module, enriched in histones, exhibits higher expression across different stem cell types that were annotated by specific culture conditions. This tendency was also observed in mouse stem cell RNA-seq data. Our findings would help to obtain general insights into stem cell quality, such as the balance of differentiation potentials that undifferentiated stem cells possess.

12.
Genome Biol ; 23(1): 219, 2022 Oct 17.
Article in English | MEDLINE | ID: mdl-36253864

ABSTRACT

In this study, we propose iDNA-ABF, a multi-scale deep biological language learning model that enables the interpretable prediction of DNA methylations based on genomic sequences only. Benchmarking comparisons show that our iDNA-ABF outperforms state-of-the-art methods for different methylation predictions. Importantly, we show the power of deep language learning in capturing both sequential and functional semantics information from background genomes. Moreover, by integrating the interpretable analysis mechanism, we explain what the model learns, helping us build the mapping from the discovery of important sequential determinants to the in-depth analysis of their biological functions.


Subjects
DNA Methylation; Language; Genomics; Models, Biological
13.
Front Bioinform ; 2: 910531, 2022.
Article in English | MEDLINE | ID: mdl-36304291

ABSTRACT

Prediction of subcellular localization of proteins from their amino acid sequences has a long history in bioinformatics and is still actively developing, incorporating the latest advances in machine learning and proteomics. Notably, deep learning-based methods for natural language processing have made great contributions. Here, we review recent advances in the field as well as its related fields, such as subcellular proteomics and the prediction/recognition of subcellular localization from image data.

14.
Metabolites ; 12(7)2022 Jul 11.
Article in English | MEDLINE | ID: mdl-35888758

ABSTRACT

Taurine, a sulfur-containing β-amino acid, is present at high concentrations in mammalian tissues and plays an important role in several essential biological processes. However, the genetic mechanisms involved in these physiological processes associated with taurine remain unclear. In this study, we investigated the regulatory mechanism underlying the taurine-induced transcriptional enhancement of the thioredoxin-interacting protein (TXNIP). The results showed that taurine significantly increased the luciferase activity of the human TXNIP promoter. Further, deletion analysis of the TXNIP promoter showed that taurine induced luciferase activity only in the TXNIP promoter region (+200 to +218). Furthermore, by employing a bioinformatic analysis using the TRANSFAC database, we focused on Tst-1 and Ets-1 as candidates involved in taurine-induced transcription and found that, when the Ets-1 sequence was mutated, taurine did not enhance transcriptional activity. Additionally, chromatin immunoprecipitation assays indicated that the binding of Ets-1 to the TXNIP promoter region was enhanced by taurine. Taurine also increased the levels of phosphorylated Ets-1, indicating activation of the Ets-1 pathway by taurine. Moreover, an ERK cascade inhibitor significantly suppressed the taurine-induced increase in TXNIP mRNA levels and transcriptional enhancement of TXNIP. These results suggest that taurine enhances TXNIP expression by activating transcription factor Ets-1 via the ERK cascade.

15.
Nat Commun ; 13(1): 4063, 2022 Jul 13.
Article in English | MEDLINE | ID: mdl-35831322

ABSTRACT

Point mutations of MEK1, a central component of ERK signaling, are present in cancer and RASopathies, but their precise biological effects remain obscure. Here, we report a mutant MEK1 structure that uncovers the mechanisms underlying abnormal activities of cancer- and RASopathy-associated MEK1 mutants. These two classes of MEK1 mutations differentially affect the spatiotemporal dynamics of ERK signaling, cellular transcriptional programs, gene expression profiles, and consequent biological outcomes. By making use of such distinct characteristics of the MEK1 mutants, we identified cancer- and RASopathy-signature genes that may serve as diagnostic markers or therapeutic targets for these diseases. In particular, two AKT-inhibitor molecules, PHLDA1 and 2, are simultaneously upregulated by oncogenic ERK signaling, and mediate cancer-specific ERK-AKT crosstalk. The combined expression of PHLDA1/2 is critical to confer resistance to ERK pathway-targeted therapeutics on cancer cells. Finally, we propose a therapeutic strategy to overcome this drug resistance. Our data provide vital insights into the etiology, diagnosis, and therapeutic strategy of cancers and RASopathies.


Subjects
Neoplasms; Proto-Oncogene Proteins c-akt; Humans; MAP Kinase Kinase 1/genetics; MAP Kinase Signaling System/genetics; Mitogen-Activated Protein Kinase Kinases/metabolism; Neoplasms/metabolism; Protein Kinase Inhibitors/pharmacology; Protein Kinase Inhibitors/therapeutic use; Proto-Oncogene Proteins c-akt/genetics; Proto-Oncogene Proteins c-akt/metabolism; Signal Transduction/genetics
16.
Bioinformatics ; 38(13): 3351-3360, 2022 Jun 27.
Article in English | MEDLINE | ID: mdl-35604077

ABSTRACT

SUMMARY: Identifying the protein-peptide binding residues is fundamentally important to understand the mechanisms of protein functions and explore drug discovery. Although several computational methods have been developed, most of them rely heavily on third-party tools or complex data preprocessing for feature design, easily resulting in low computational efficacy and suffering from low predictive performance. To address the limitations, we propose PepBCL, a novel BERT (Bidirectional Encoder Representations from Transformers)-based contrastive learning framework to predict the protein-peptide binding residues based on protein sequences only. PepBCL is an end-to-end predictive model that is independent of feature engineering. Specifically, we introduce a well pre-trained protein language model that can automatically extract and learn high-latent representations of protein sequences relevant for protein structures and functions. Further, we design a novel contrastive learning module to optimize the feature representations of binding residues underlying the imbalanced dataset. We demonstrate that our proposed method significantly outperforms the state-of-the-art methods under benchmarking comparison, and achieves more robust performance. Moreover, we found that performance improves further when traditional features are integrated with our learnt features. Interestingly, the interpretable analysis of our model highlights the flexibility and adaptability of the deep learning-based protein language model to capture both conserved and non-conserved sequential characteristics of peptide-binding residues. Finally, to facilitate the use of our method, we establish an online predictive platform as the implementation of the proposed PepBCL, which is now available at http://server.wei-group.net/PepBCL/. AVAILABILITY AND IMPLEMENTATION: https://github.com/Ruheng-W/PepBCL. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Deep Learning; Proteins/chemistry; Peptides; Protein Binding; Amino Acid Sequence
17.
Nucleic Acids Res ; 50(9): 4877-4899, 2022 May 20.
Article in English | MEDLINE | ID: mdl-35524568

ABSTRACT

With the advent of single-cell RNA sequencing (scRNA-seq), one major challenge is the so-called 'dropout' events, which distort gene expression and markedly influence downstream analysis of single-cell transcriptomes. To address this issue, much effort has been made and several scRNA-seq imputation methods have been developed, falling into two categories: model-based and deep learning-based. However, a comprehensive and systematic comparison of existing methods is still lacking. In this work, we use six simulated and two real scRNA-seq datasets to comprehensively evaluate and compare a total of 12 available imputation methods from the following four aspects: (i) gene expression recovery, (ii) cell clustering, (iii) gene differential expression, and (iv) cellular trajectory reconstruction. We demonstrate that deep learning-based approaches generally exhibit better overall performance than model-based approaches under major benchmarking comparison, indicating the power of deep learning for imputation. Importantly, we built scIMC (single-cell Imputation Methods Comparison platform), the first online platform that integrates all available state-of-the-art imputation methods for benchmarking comparison and visualization analysis, which is expected to be a convenient and useful tool for interested researchers. It is now freely accessible via https://server.wei-group.net/scIMC/.
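
The first evaluation aspect named in the abstract above, recovery of gene expression after dropout, can be illustrated on a toy matrix. The sketch below is only an assumption about how such a check might look: it masks known entries to mimic dropout, fills zeros with a naive per-gene mean (a stand-in baseline, not one of the 12 benchmarked methods), and scores the result by root-mean-square error at the masked positions. All function names and values are hypothetical.

```python
import math


def mask_entries(matrix, positions):
    """Return a copy of a cells-by-genes matrix with the given
    (row, column) positions set to 0.0 to mimic dropout events."""
    masked = [row[:] for row in matrix]
    for r, c in positions:
        masked[r][c] = 0.0
    return masked


def impute_gene_mean(matrix):
    """Naive baseline: replace each zero with the mean of the nonzero
    values of the same gene (column). True biological zeros are
    overwritten too, which is one reason real methods are more careful."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    imputed = [row[:] for row in matrix]
    for c in range(n_cols):
        nonzero = [matrix[r][c] for r in range(n_rows) if matrix[r][c] != 0]
        fill = sum(nonzero) / len(nonzero) if nonzero else 0.0
        for r in range(n_rows):
            if imputed[r][c] == 0:
                imputed[r][c] = fill
    return imputed


def rmse_at(truth, estimate, positions):
    """Root-mean-square error restricted to the masked positions."""
    squared = [(truth[r][c] - estimate[r][c]) ** 2 for r, c in positions]
    return math.sqrt(sum(squared) / len(squared))


# Toy example: 3 cells (rows) by 2 genes (columns), made-up values.
truth = [[2.0, 4.0], [4.0, 8.0], [6.0, 6.0]]
dropped = [(1, 0), (0, 1)]
observed = mask_entries(truth, dropped)
recovered = impute_gene_mean(observed)
error_before = rmse_at(truth, observed, dropped)
error_after = rmse_at(truth, recovered, dropped)
```

A lower `error_after` than `error_before` indicates that imputation moved the masked entries closer to their true values. A fuller benchmark would repeat this across datasets and also examine the other three aspects listed in the abstract.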


Subjects
Gene Expression Profiling; Sequence Analysis, RNA; Single-Cell Analysis; Benchmarking; Cluster Analysis; Sequence Analysis, RNA/methods; Single-Cell Analysis/methods; Software
18.
Brief Bioinform ; 23(3)2022 May 13.
Article in English | MEDLINE | ID: mdl-35348602

ABSTRACT

Proteins with desired functions and properties are important in fields like nanotechnology and biomedicine. De novo protein design enables the production of previously unseen proteins from the ground up and is believed to be key to handling real social challenges. The recent introduction of deep learning into design methods has had a transformative influence and is expected to represent a promising and exciting future direction. In this review, we look back on the major aspects of current advances in deep-learning-based design procedures and illustrate their novelty in comparison with conventional knowledge-based approaches through notable cases. We not only describe deep learning developments in structure-based protein design and direct sequence design, but also highlight recent applications of deep reinforcement learning in protein design. The future perspectives on design goals, challenges and opportunities are also comprehensively discussed.


Subjects
Deep Learning; Knowledge Bases; Proteins
19.
PLoS One ; 16(8): e0243595, 2021.
Article in English | MEDLINE | ID: mdl-34424899

ABSTRACT

The population dynamics of preclonal cancer cells before clonal expansion of tumors have not been sufficiently addressed thus far. By focusing on the preclonal cancer cell population as a Darwinian evolutionary system, we formulated and analyzed the observed mutation frequency among tumors (MFaT) as a proxy for the hypothesized sequence read frequency and beneficial fitness effect of a cancer driver mutation. Analogous to intestinal crypts, we assumed that sample donor patients are separate culture tanks where proliferating cells follow certain population dynamics described by extreme value theory (EVT). To validate this, we analyzed three large-scale cancer genome datasets, each harboring > 10000 tumor samples and in total involving > 177898 observed mutation sites. We clarified the necessary premises for the application of EVT in the strong selection and weak mutation (SSWM) regime in relation to cancer genome sequences at scale. We also confirmed that the stochastic distribution of MFaT is likely of the Fréchet type, which challenges the well-known Gumbel hypothesis of beneficial fitness effects. Based on statistical data analysis, we demonstrated the potential of EVT as a population genetics framework to understand and explain the stochastic behavior of driver-mutation frequency in cancer genomes as well as its applicability in real cancer genome sequence data.


Subjects
Genome/genetics; Mutation/genetics; Neoplasms/genetics; Biological Evolution; Genetics, Population/methods; Humans; Mutation Rate
20.
Front Genet ; 12: 681259, 2021.
Article in English | MEDLINE | ID: mdl-34211503

ABSTRACT

Promoters and enhancers are well-known regulatory elements modulating gene expression. As confirmed by high-throughput sequencing technologies, these regulatory elements are bidirectionally transcribed. That is, promoters produce stable mRNA in the sense direction and unstable RNA in the antisense direction, while enhancers transcribe unstable RNA in both directions. Although it is thought that enhancers and promoters share a similar architecture of transcription start sites (TSSs), how the transcriptional machinery distinctly uses these genomic regions as promoters or enhancers remains unclear. To address this issue, we developed a deep learning (DL) method by utilizing a convolutional neural network (CNN) and the saliency algorithm. In comparison with other classifiers, our CNN presented higher predictive performance, suggesting the overarching importance of the high-order sequence features, captured by the CNN. Moreover, our method revealed that there are substantial sequence differences between the enhancers and promoters. Remarkably, the 20-120 bp downstream regions from the center of bidirectional TSSs seemed to contribute to the RNA stability. These regions in promoters tend to have a larger number of guanines and cytosines compared to those in enhancers, and this feature contributed to the classification of the regulatory elements. Our CNN-based method can capture the complex TSS architectures. We found that the genomic regions around TSSs for promoters and enhancers contribute to RNA stability and show GC-biased characteristics as a critical determinant for promoter TSSs.
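
The GC bias reported in the abstract above for the 20-120 bp region downstream of the TSS center can be checked on any sequence with a few lines of code. This is a minimal sketch under stated assumptions: coordinates are 0-based, only the forward strand is considered, and the window is half-open. The helper names `downstream_window` and `gc_fraction` are hypothetical and do not come from the study, which used a CNN with a saliency algorithm rather than a direct GC count.

```python
def downstream_window(sequence, tss_center, start=20, end=120):
    """Return the bases from tss_center + start up to, but not including,
    tss_center + end on the forward strand (0-based coordinates)."""
    if tss_center < 0:
        raise ValueError("tss_center must be non-negative")
    return sequence[tss_center + start:tss_center + end]


def gc_fraction(sequence):
    """Fraction of G and C among unambiguous bases (A, C, G, T).
    Returns 0.0 if the sequence holds no unambiguous base."""
    bases = [base for base in sequence.upper() if base in "ACGT"]
    if not bases:
        return 0.0
    return sum(base in "GC" for base in bases) / len(bases)


# Made-up sequence: an AT-only flank, a GC-only stretch, an AT-only tail.
example = "A" * 30 + "GC" * 50 + "T" * 30
window = downstream_window(example, tss_center=10)
window_gc = gc_fraction(window)
```

Comparing `gc_fraction` of such windows between annotated promoter and enhancer TSSs would be one simple way to see the tendency the abstract describes. For a bidirectional TSS, the reverse-strand window would have to be extracted separately.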
