Search | Biblioteca Virtual em Saúde Fiocruz

Understanding YTHDF2-mediated mRNA degradation by m6A-BERT-Deg.

Zhang, Ting-He; Jo, Sumin; Zhang, Michelle; Wang, Kai; Gao, Shou-Jiang; Huang, Yufei.

Brief Bioinform ; 25(3), 2024 Mar 27.

Article in English | MEDLINE | ID: mdl-38622358

ABSTRACT

N6-methyladenosine (m6A) is the most abundant mRNA modification within mammalian cells, holding pivotal significance in the regulation of mRNA stability, translation and splicing. Furthermore, it plays a critical role in the regulation of RNA degradation by primarily recruiting the YTHDF2 reader protein. However, the selective regulation of mRNA decay of the m6A-methylated mRNA through YTHDF2 binding is poorly understood. To improve our understanding, we developed m6A-BERT-Deg, a BERT model adapted for predicting YTHDF2-mediated degradation of m6A-methylated mRNAs. We meticulously assembled a high-quality training dataset by integrating multiple data sources for the HeLa cell line. To overcome the limitation of small training samples, we employed a pre-training-fine-tuning strategy by first performing a self-supervised pre-training of the model on 427 760 unlabeled m6A site sequences. The test results demonstrated the importance of this pre-training strategy in enabling m6A-BERT-Deg to outperform other benchmark models. We further conducted a comprehensive model interpretation and revealed a surprising finding that the presence of co-factors in proximity to m6A sites may disrupt YTHDF2-mediated mRNA degradation, subsequently enhancing mRNA stability. We also extended our analyses to the HEK293 cell line, shedding light on the context-dependent YTHDF2-mediated mRNA degradation.

Subjects

Adenine, RNA-Binding Proteins, Transcription Factors, Animals, Humans, HEK293 Cells, HeLa Cells, RNA Stability, Messenger RNA/genetics, Messenger RNA/metabolism, RNA-Binding Proteins/genetics, RNA-Binding Proteins/metabolism, Transcription Factors/metabolism
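The self-supervised pre-training step mentioned in the abstract above can be illustrated with a BERT-style masked-token setup. This is only a minimal sketch: the abstract does not say how sequences are tokenized or masked, so the overlapping 3-mer tokenization, the `[MASK]` symbol, the 15% masking rate, and the helper names `kmer_tokens` and `mask_tokens` are all assumptions made for illustration, and the usual BERT refinement of sometimes keeping or randomly replacing a selected token is left out.

```python
import random

MASK = "[MASK]"  # assumed mask symbol, as in the original BERT vocabulary


def kmer_tokens(seq, k=3):
    """Split an RNA sequence into overlapping k-mers (tokenization is an assumption)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def mask_tokens(tokens, mask_prob=0.15, rng=None):
    """Hide a random subset of tokens and return (masked_tokens, labels).

    labels[i] holds the original token where a mask was applied and None
    elsewhere, so a model would be trained to recover only the hidden tokens.
    """
    rng = rng or random.Random()
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels


if __name__ == "__main__":
    tokens = kmer_tokens("GGACUGGACU", k=3)
    masked, labels = mask_tokens(tokens, mask_prob=0.5, rng=random.Random(1))
    print(masked)
    print(labels)
```

Because no labels are needed to build these (input, target) pairs, such a scheme can use a large pool of unlabeled sequences before fine-tuning on the smaller labeled set.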

ES-ARCNN: Predicting enhancer strength by using data augmentation and residual convolutional neural network.

Zhang, Ting-He; Flores, Mario; Huang, Yufei.

Anal Biochem ; 618: 114120, 2021 Apr 01.

Article in English | MEDLINE | ID: mdl-33535061

ABSTRACT

Enhancers are non-coding DNA sequences bound by proteins called transcription factors. They function as distant regulators of gene transcription and participate in the development and maintenance of cell types and tissues. Since experimental validation of enhancers is expensive and time-consuming, many computational methods have been developed to predict enhancers and their strength. However, most of these methods still lack good performance in the prediction of enhancer strength. Here, we present a method to predict Enhancer Strength (i.e., strong and weak) by using Augmented data and Residual Convolutional Neural Network (ES-ARCNN). To train ES-ARCNN, we applied two data augmentation tricks (i.e., reverse complement and shift) to enlarge a previously identified dataset of enhancers. We further employed a residual convolutional neural network and trained it using the augmented dataset. Compared with other state-of-the-art methods in the 10-fold cross-validation (CV) test, ES-ARCNN has the best performance with an accuracy of 66.17%, and the tricks of data augmentation can effectively improve the prediction performance. We further tested ES-ARCNN on an independent dataset and obtained 65.5% accuracy, an improvement of more than 4% over the other three existing methods. The results in the 10-fold CV and independent tests show that ES-ARCNN can effectively predict the enhancer strength. The transcription factor binding sites (TFBSs) enrichment analysis shows that from the mechanistic perspective, enhancer strength is associated with a higher density of important TFBSs in a tissue. A user-friendly web-application is also provided at http://compgenomics.utsa.edu/ES-ARCNN/.

Subjects

Computational Biology, Genetic Databases, Genetic Enhancer Elements, Genetic Models, Neural Networks (Computer), Transcription Factors/metabolism, Humans
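The two augmentation tricks named in the ES-ARCNN abstract above, reverse complement and shift, can be sketched as follows. The abstract gives no window length, shift size, or implementation detail, so the function names and parameters here (`shifted_windows`, `max_shift`, and so on) are hypothetical and chosen only for illustration.

```python
# Complement table for DNA; N (unknown base) maps to itself.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def shifted_windows(context, start, length, max_shift):
    """Yield fixed-length windows offset by -max_shift..+max_shift around `start`.

    Windows that would run past either end of `context` are skipped.
    """
    for offset in range(-max_shift, max_shift + 1):
        lo = start + offset
        hi = lo + length
        if lo >= 0 and hi <= len(context):
            yield context[lo:hi]


def augment(context, start, length, max_shift):
    """Apply both augmentations and drop duplicates, preserving order."""
    seen = []
    for window in shifted_windows(context, start, length, max_shift):
        for candidate in (window, reverse_complement(window)):
            if candidate not in seen:
                seen.append(candidate)
    return seen


if __name__ == "__main__":
    print(reverse_complement("AACG"))
    print(augment("ACGTACGT", start=2, length=4, max_shift=1))
```

Each augmented sequence would keep the strength label of the enhancer it was derived from, which is what lets a small labeled dataset be enlarged without new experiments.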

MSLoc-DT: a new method for predicting the protein subcellular location of multispecies based on decision templates.

Zhang, Shao-Wu; Liu, Yan-Fang; Yu, Yong; Zhang, Ting-He; Fan, Xiao-Nan.

Anal Biochem ; 449: 164-71, 2014 Mar 15.

Article in English | MEDLINE | ID: mdl-24361712

ABSTRACT

Revealing the subcellular location of newly discovered protein sequences can bring insight to their function and guide research at the cellular level. The rapidly increasing number of sequences entering the genome databanks has called for the development of automated analysis methods. Currently, most existing methods used to predict protein subcellular locations cover only one, or a very limited number of species. Therefore, it is necessary to develop reliable and effective computational approaches to further improve the performance of protein subcellular prediction and, at the same time, cover more species. The current study reports the development of a novel predictor called MSLoc-DT to predict the protein subcellular locations of human, animal, plant, bacteria, virus, fungi, and archaea by introducing a novel feature extraction approach termed Amino Acid Index Distribution (AAID) and then fusing gene ontology information, sequential evolutionary information, and sequence statistical information through four different modes of pseudo amino acid composition (PseAAC) with a decision template rule. Using the jackknife test, MSLoc-DT can achieve 86.5, 98.3, 90.3, 98.5, 95.9, 98.1, and 99.3% overall accuracy for human, animal, plant, bacteria, virus, fungi, and archaea, respectively, on seven stringent benchmark datasets. Compared with other predictors (e.g., Gpos-PLoc, Gneg-PLoc, Virus-PLoc, Plant-PLoc, Plant-mPLoc, ProLoc-Go, Hum-PLoc, GOASVM) on the gram-positive, gram-negative, virus, plant, eukaryotic, and human datasets, the new MSLoc-DT predictor is much more effective and robust. Although the MSLoc-DT predictor is designed to predict the single location of proteins, our method can be extended to multiple locations of proteins by introducing multilabel machine learning approaches, such as the support vector machine and deep learning, as substitutes for the K-nearest neighbor (KNN) method. As a user-friendly web server, MSLoc-DT is freely accessible at http://bioinfo.ibp.ac.cn/MSLOC_DT/index.html.

Subjects

Artificial Intelligence, Computational Biology/methods, Proteins/analysis, Subcellular Fractions/chemistry, Amino Acid Sequence, Animals, Protein Databases, Gene Ontology, Humans, Molecular Sequence Data
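The decision template rule mentioned in the MSLoc-DT abstract above can be sketched in its generic form: each sample gets a decision profile (one row of class scores per base classifier), each class template is the mean profile of that class's training samples, and a new sample is assigned to the class with the nearest template. The abstract does not describe MSLoc-DT's profiles or distance measure, so the squared Euclidean distance, the toy scores, and the function names below are assumptions.

```python
def mean_profile(profiles):
    """Element-wise mean of equally shaped decision profiles (lists of rows)."""
    n = len(profiles)
    rows, cols = len(profiles[0]), len(profiles[0][0])
    return [[sum(p[r][c] for p in profiles) / n for c in range(cols)]
            for r in range(rows)]


def build_templates(profiles, labels):
    """Return {class_label: mean decision profile of that class}."""
    by_class = {}
    for profile, label in zip(profiles, labels):
        by_class.setdefault(label, []).append(profile)
    return {label: mean_profile(group) for label, group in by_class.items()}


def sq_distance(p, q):
    """Squared Euclidean distance between two decision profiles."""
    return sum((a - b) ** 2 for rp, rq in zip(p, q) for a, b in zip(rp, rq))


def classify(profile, templates):
    """Pick the class whose template is closest to the given profile."""
    return min(templates, key=lambda label: sq_distance(profile, templates[label]))


if __name__ == "__main__":
    # Two hypothetical base classifiers (rows) scoring two locations (columns).
    train = [
        [[0.9, 0.1], [0.8, 0.2]],
        [[0.7, 0.3], [0.6, 0.4]],
        [[0.2, 0.8], [0.1, 0.9]],
    ]
    labels = ["nucleus", "nucleus", "membrane"]
    templates = build_templates(train, labels)
    print(classify([[0.75, 0.25], [0.9, 0.1]], templates))
```

In a setup like the one the abstract describes, each row of a profile would come from a classifier trained on one PseAAC mode, so the template comparison fuses all feature views in a single step.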

Prediction of protein-protein interaction with pairwise kernel support vector machine.

Zhang, Shao-Wu; Hao, Li-Yang; Zhang, Ting-He.

Int J Mol Sci ; 15(2): 3220-33, 2014 Feb 21.

Article in English | MEDLINE | ID: mdl-24566145

ABSTRACT

Protein-protein interactions (PPIs) play a key role in many cellular processes. Unfortunately, the experimental methods currently used to identify PPIs are both time-consuming and expensive. These obstacles could be overcome by developing computational approaches to predict PPIs. Here, we report two amino acid feature extraction methods: (i) distance frequency with PCA reducing the dimension (DFPCA) and (ii) amino acid index distribution (AAID) representing the protein sequences. In order to obtain the most robust and reliable results for PPI prediction, a pairwise kernel function and support vector machines (SVM) were employed to avoid dependence on the concatenation order of the two feature vectors generated from the two proteins. The highest prediction accuracies of AAID and DFPCA were 94% and 93.96%, respectively, using the 10-fold cross-validation (CV) test, and the results of the pairwise radial basis kernel function are considerably improved over those based on the radial basis kernel function. Overall, the PPI prediction tool, termed PPI-PKSVM, which is freely available at http://159.226.118.31/PPI/index.html, promises to become useful in such areas as bio-analysis and drug development.

Subjects

Proteins/metabolism, Support Vector Machine, Algorithms, Amino Acids/chemistry, Internet, Protein Interaction Maps, Proteins/chemistry, Software
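The order-independence that the PPI-PKSVM abstract above attributes to a pairwise kernel can be illustrated with one widely used symmetric construction, K((a, b), (c, d)) = k(a, c)·k(b, d) + k(a, d)·k(b, c), built on a radial basis kernel k. The abstract does not give the exact formula, so whether PPI-PKSVM uses this form is an assumption, as are the `gamma` value and the toy feature vectors.

```python
import math


def rbf(x, y, gamma=0.5):
    """Radial basis kernel between two equal-length feature vectors."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * sq_dist)


def pairwise_kernel(pair1, pair2, gamma=0.5):
    """Symmetric pairwise kernel between two protein pairs.

    Summing both ways of matching the proteins makes the value independent
    of the order in which the two proteins of a pair are listed.
    """
    a, b = pair1
    c, d = pair2
    return (rbf(a, c, gamma) * rbf(b, d, gamma)
            + rbf(a, d, gamma) * rbf(b, c, gamma))


if __name__ == "__main__":
    a, b = [0.0, 1.0], [1.0, 0.0]
    c, d = [0.5, 0.5], [1.0, 1.0]
    print(pairwise_kernel((a, b), (c, d)))
    print(pairwise_kernel((b, a), (c, d)))  # same value with the pair swapped
```

A kernel of this shape can be supplied to an SVM as a precomputed Gram matrix, so no concatenated feature vector, and therefore no concatenation order, is ever needed.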

Understanding YTHDF2-mediated mRNA Degradation By m⁶A-BERT-Deg.

Zhang, Ting-He; Jo, Sumin; Zhang, Michelle; Wang, Kai; Gao, Shou-Jiang; Huang, Yufei.

ArXiv ; 2024 Jan 15.

Article in English | MEDLINE | ID: mdl-38292306

ABSTRACT

N6-methyladenosine (m6A) is the most abundant mRNA modification within mammalian cells, holding pivotal significance in the regulation of mRNA stability, translation, and splicing. Furthermore, it plays a critical role in the regulation of RNA degradation by primarily recruiting the YTHDF2 reader protein. However, the selective regulation of mRNA decay of the m6A-methylated mRNA through YTHDF2 binding is poorly understood. To improve our understanding, we developed m6A-BERT-Deg, a BERT model adapted for predicting YTHDF2-mediated degradation of m6A-methylated mRNAs. We meticulously assembled a high-quality training dataset by integrating multiple data sources for the HeLa cell line. To overcome the limitation of small training samples, we employed a pre-training-fine-tuning strategy by first performing a self-supervised pre-training of the model on 427,760 unlabeled m6A site sequences. The test results demonstrated the importance of this pre-training strategy in enabling m6A-BERT-Deg to outperform other benchmark models. We further conducted a comprehensive model interpretation and revealed a surprising finding that the presence of co-factors in proximity to m6A sites may disrupt YTHDF2-mediated mRNA degradation, subsequently enhancing mRNA stability. We also extended our analyses to the HEK293 cell line, shedding light on the context-dependent YTHDF2-mediated mRNA degradation.

Transformer for Gene Expression Modeling (T-GEM): An Interpretable Deep Learning Model for Gene Expression-Based Phenotype Predictions.

Zhang, Ting-He; Hasib, Md Musaddaqul; Chiu, Yu-Chiao; Han, Zhi-Feng; Jin, Yu-Fang; Flores, Mario; Chen, Yidong; Huang, Yufei.

Cancers (Basel) ; 14(19), 2022 Sep 29.

Article in English | MEDLINE | ID: mdl-36230685

ABSTRACT

Deep learning has been applied in precision oncology to address a variety of gene expression-based phenotype predictions. However, gene expression data's unique characteristics challenge the computer vision-inspired design of popular Deep Learning (DL) models such as Convolutional Neural Network (CNN) and call for interpretable DL models tailored to transcriptomics studies. To address the current challenges in developing an interpretable DL model for modeling gene expression data, we propose a novel interpretable deep learning architecture called T-GEM, or Transformer for Gene Expression Modeling. We provided the detailed T-GEM model for modeling gene-gene interactions and demonstrated its utility for gene expression-based predictions of cancer-related phenotypes, including cancer type prediction and immune cell type classification. We carefully analyzed the learning mechanism of T-GEM and showed that the first layer has broader attention while higher layers focus more on phenotype-related genes. We also showed that T-GEM's self-attention could capture important biological functions associated with the predicted phenotypes. We further devised a method to extract the regulatory network that T-GEM learns by exploiting the attributions of self-attention weights for classifications and showed that the network hub genes were likely markers for the predicted phenotypes.

Prediction of Signal Peptide Cleavage Sites with Subsite-Coupled and Template Matching Fusion Algorithm.

Zhang, Shao-Wu; Zhang, Ting-He; Zhang, Jun-Nan; Huang, Yufei.

Mol Inform ; 33(3): 230-9, 2014 Mar.

Article in English | MEDLINE | ID: mdl-27485691

ABSTRACT

Fast and effective prediction of signal peptides (SP) and their cleavage sites is of great importance in computational biology. The approaches developed to predict signal peptides can be roughly divided into machine learning-based and sliding window-based. In order to further increase the prediction accuracy and organism coverage for SP cleavage sites, we propose a novel method for predicting SP cleavage sites called Signal-CTF that utilizes machine learning and sliding windows, and is designed for N-terminal secretory proteins in a large variety of organisms including human, animal, plant, virus, bacteria, fungi and archaea. Signal-CTF consists of three distinct elements: (1) a subsite-coupled and regularization function with a scaled window of fixed width that selects a set of candidates of possible secretion-cleavable segments for a query secretory protein; (2) a sum fusion system that integrates the outcomes from aligning the cleavage site template sequence with each of the aforementioned candidates in a scaled window of fixed width to determine the best candidate cleavage sites for the query secretory protein; (3) a voting system that identifies the ultimate signal peptide cleavage site among all possible results derived from using scaled windows of different width. Compared with the Signal-3L and SignalP 4.0 predictors, the prediction accuracy of Signal-CTF is 4-12% higher than that of Signal-3L for human, animal and eukaryote, and 10-25% higher than that of SignalP 4.0 for eukaryota, Gram-positive bacteria and Gram-negative bacteria. Compared with the PRED-SIGNAL and SignalP 4.0 predictors on the 32 archaea secretory proteins used in Bagos's paper, the prediction accuracy of Signal-CTF is 12.5% and 25% higher than that of PRED-SIGNAL and SignalP 4.0, respectively. The prediction results of several long signal peptides show that Signal-CTF can better predict cleavage sites for long signal peptides than SignalP, Phobius, Philius, SPOCTOPUS, Signal-CF and Signal-3L. These results show that Signal-CTF is more accurate and flexible in predicting signal peptides of different characteristics for many organisms. Signal-CTF is freely available as a web-server at http://darwin2.cbi.utsa.edu/minniweb/index.html.
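The third element of Signal-CTF described above, a vote over the results obtained with windows of different width, can be sketched as a simple plurality vote. The abstract does not specify the voting rule or how ties are resolved, so the plurality rule, the tie behavior (the site that appears first wins), the window widths, and the name `vote_cleavage_site` are assumptions.

```python
from collections import Counter


def vote_cleavage_site(candidates_by_width):
    """Return the cleavage position predicted by the most window widths.

    `candidates_by_width` maps a window width to the best candidate position
    found with that width. On a tie, the position seen first is returned,
    because Counter.most_common keeps first-encountered order for equal counts.
    """
    counts = Counter(candidates_by_width.values())
    site, _ = counts.most_common(1)[0]
    return site


if __name__ == "__main__":
    # Hypothetical per-width predictions for one query protein.
    predictions = {15: 22, 17: 22, 19: 24}
    print(vote_cleavage_site(predictions))
```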
