Search | BVS - Ministry of Health

1.

CircRNA identification and feature interpretability analysis.

Niu, Mengting; Wang, Chunyu; Chen, Yaojia; Zou, Quan; Qi, Ren; Xu, Lei.

BMC Biol ; 22(1): 44, 2024 Feb 27.

Article in English | MEDLINE | ID: mdl-38408987

ABSTRACT

BACKGROUND: Circular RNAs (circRNAs) can regulate microRNA activity and are related to various diseases, such as cancer. Functional research on circRNAs is the focus of scientific research. Accurate identification of circRNAs is important for gaining insight into their functions. Although several circRNA prediction models have been developed, their prediction accuracy is still unsatisfactory. Therefore, providing a more accurate computational framework to predict circRNAs and analyse their looping characteristics is crucial for systematic annotation. RESULTS: We developed a novel framework, CircDC, for classifying circRNAs from other lncRNAs. CircDC uses four different feature encoding schemes and adopts a multilayer convolutional neural network and bidirectional long short-term memory network to learn high-order feature representation and make circRNA predictions. The results demonstrate that the proposed CircDC model is more accurate than existing models. In addition, an interpretable analysis of the features affecting the model is performed, and the computational framework is applied to the extended application of circRNA identification. CONCLUSIONS: CircDC is suitable for the prediction of circRNA. The identification of circRNA helps to understand and delve into the related biological processes and functions. Feature importance analysis increases model interpretability and uncovers significant biological properties. The relevant code and data in this article can be accessed for free at https://github.com/nmt315320/CircDC.git .

Subjects

MicroRNAs; Neoplasms; Humans; RNA, Circular/genetics; Neural Networks, Computer; Neoplasms/genetics; Computational Biology/methods

2.

Identification, characterization and expression analysis of circRNA encoded by SARS-CoV-1 and SARS-CoV-2.

Niu, Mengting; Wang, Chunyu; Chen, Yaojia; Zou, Quan; Xu, Lei.

Brief Bioinform ; 25(2)2024 Jan 22.

Article in English | MEDLINE | ID: mdl-38279648

ABSTRACT

Virus-encoded circular RNA (circRNA) participates in the immune response to viral infection, affects the human immune system, and can serve as a target for precision therapy and as a tumor biomarker. The coronaviruses SARS-CoV-1 and SARS-CoV-2 (SARS-CoV-1/2) that have emerged in recent years are highly contagious and have high mortality rates, yet little is known about the circRNAs they encode. This study therefore explores whether SARS-CoV-1/2 encode circRNAs and examines the characteristics and functions of those circRNAs. Using RNA-seq data from SARS-CoV-1 and SARS-CoV-2 infections, we applied three circRNA identification tools (circRNA_finder, find_circ and CIRI2) and identified 151 circRNAs encoded by SARS-CoV-1 and 470 encoded by SARS-CoV-2, which suggests that SARS-CoV-2 has a greater circRNA-encoding capacity than SARS-CoV-1. Expression analysis showed that only a few circRNAs encoded by SARS-CoV-1/2 were highly expressed and that the positive strand produced more abundant circRNAs. Then, based on the identified SARS-CoV-1/2-encoded circRNAs, we performed circRNA identification and characterization using the previously developed CirRNAPL. Finally, target gene prediction and functional enrichment analysis were performed. Viral circRNA was found to be closely related to cancer and to have a potential role in regulating host cell functions. By characterizing the circRNAs encoded by the coronaviruses SARS-CoV-1/2 and their functions, this study provides a valuable resource for further research on the function and molecular mechanism of coronavirus circRNA.

Subjects

COVID-19; MicroRNAs; Neoplasms; Humans; RNA, Circular/genetics; SARS-CoV-2/genetics; COVID-19/genetics; RNA, Viral/genetics; Neoplasms/genetics; MicroRNAs/genetics

3.

A computational model of circRNA-associated diseases based on a graph neural network: prediction and case studies for follow-up experimental validation.

Niu, Mengting; Wang, Chunyu; Zhang, Zhanguo; Zou, Quan.

BMC Biol ; 22(1): 24, 2024 Jan 29.

Article in English | MEDLINE | ID: mdl-38281919

ABSTRACT

BACKGROUND: Circular RNAs (circRNAs) have been confirmed to play a vital role in the occurrence and development of diseases. Exploring the relationship between circRNAs and diseases is of far-reaching significance for studying etiopathogenesis and treating diseases. To this end, building on the graph Markov neural network (GMNN) algorithm from our previous work, GMNN2CD, we further considered the multisource biological data that affect the association between circRNAs and diseases, developed an updated web server, CircDA, and used human hepatocellular carcinoma (HCC) tissue data to verify its predictions. RESULTS: CircDA is built on a Tumarkov-based deep learning framework. The algorithm regards biomolecules as nodes and the interactions between molecules as edges, abstracts multiomics data, and models them as a heterogeneous biomolecular association network that can reflect the complex relationships between different biomolecules. Case studies using literature data from HCC, cervical, and gastric cancers demonstrate that the CircDA predictor can identify missing associations between known circRNAs and diseases. In a quantitative real-time PCR (RT-qPCR) experiment on human HCC tissue samples, five circRNAs were found to be significantly differentially expressed, indicating that CircDA can predict diseases related to new circRNAs. CONCLUSIONS: This efficient computational prediction and case analysis with sufficient feedback allows us to identify circRNA-associated diseases and disease-associated circRNAs. Our work provides a method to predict circRNA-associated diseases and can provide guidance for the association of diseases with certain circRNAs. For ease of use, an online prediction server ( http://server.malab.cn/CircDA ) is provided, and the code is open-sourced ( https://github.com/nmt315320/CircDA.git ) for the convenience of algorithm improvement.

Subjects

Carcinoma, Hepatocellular; Liver Neoplasms; Humans; RNA, Circular/genetics; RNA, Circular/analysis; Carcinoma, Hepatocellular/genetics; Follow-Up Studies; Liver Neoplasms/genetics; Neural Networks, Computer; Computer Simulation; Computational Biology/methods

4.

iTTCA-MVL: A multi-view learning model based on physicochemical information and sequence statistical information for tumor T cell antigens identification.

Zhao, Shulin; Huang, Shibo; Niu, Mengting; Xu, Lei; Xu, Lifeng.

Comput Biol Med ; 170: 107941, 2024 Mar.

Article in English | MEDLINE | ID: mdl-38217976

ABSTRACT

Immunotherapy is an emerging treatment method aimed at activating the human immune system and relying on its own immune function to kill cancer cells and tumor tissues. It has the advantages of wide applicability and minimal side effects. Effective identification of tumor T cell antigens (TTCAs) will help researchers understand their functions and mechanisms and carry out research on anti-tumor vaccine development. Considering that using biological experimental technology to identify TTCAs can be costly and time-consuming, it is necessary to develop a robust bioinformatics computing tool. At present, different machine learning models have been proposed for identifying TTCAs, but there is still room for further improvement in their performance. To establish a TTCA predictor with better prediction performance, we propose a prediction model called iTTCA-MVL in this paper. We extracted three sets of features from the views of physicochemical information and sequence statistics, namely the distribution descriptor of composition, transition, and distribution (CTDD), TF-IDF, and LSA topic. Then, we used least squares support vector machines (LSSVMs) as submodels and the Hilbert-Schmidt independence criterion (HSIC) as a constraint to establish an independent and complementary multi-view learning model. The prediction accuracy of iTTCA-MVL on the independent test set is 0.873, and the Matthews correlation coefficient is 0.747, which is significantly better than those of existing methods. Therefore, iTTCA-MVL is an excellent prediction tool that researchers can use to accurately identify TTCAs.

Subjects

Computational Biology; Machine Learning; Humans; Computational Biology/methods; T-Lymphocytes
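
The two figures reported for iTTCA-MVL above, accuracy and the Matthews correlation coefficient (MCC), are both derived from a binary confusion matrix. As a rough illustration only, the Python sketch below computes them from four counts. The function names and the counts in the usage note are hypothetical and are not taken from the paper.

```python
import math

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention assumed here: return 0.0 when a marginal total is zero,
    # because the ratio is undefined in that case.
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

With made-up counts tp=45, tn=45, fp=5, fn=5, the accuracy is 0.9 and the MCC is 0.8. Unlike accuracy, the MCC uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy for imbalanced test sets.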

5.

GMNN2CD: identification of circRNA-disease associations based on variational inference and graph Markov neural networks.

Niu, Mengting; Zou, Quan; Wang, Chunyu.

Bioinformatics ; 38(8): 2246-2253, 2022 04 12.

Article in English | MEDLINE | ID: mdl-35157027

ABSTRACT

MOTIVATION: With the analysis of the characteristics and functions of circular RNAs (circRNAs), people have realized that they play a critical role in diseases. Exploring the relationship between circRNAs and diseases is of far-reaching significance for investigating the etiopathogenesis and treatment of diseases. Nevertheless, it is inefficient to learn new associations only through biotechnology. RESULTS: Consequently, we present a computational method, GMNN2CD, which employs a graph Markov neural network (GMNN) algorithm to predict unknown circRNA-disease associations. First, using verified associations, we calculate the semantic similarity and Gaussian interaction profile kernel similarity (GIPs) of the diseases and the GIPs of the circRNAs and then merge them to form a unified descriptor. After that, GMNN2CD uses a fusion feature variational map autoencoder to learn deep features and uses a label propagation map autoencoder to propagate tags based on known associations. Based on variational inference, GMNN alternate training enhances the ability of GMNN2CD to obtain high-efficiency high-dimensional features from low-dimensional representations. Finally, 5-fold cross-validation on five benchmark datasets shows that GMNN2CD is superior to the state-of-the-art methods. Furthermore, case studies have shown that GMNN2CD can detect potential associations. AVAILABILITY AND IMPLEMENTATION: The source code and data are available at https://github.com/nmt315320/GMNN2CD.git.

Subjects

Neural Networks, Computer; RNA, Circular; Humans; RNA, Circular/genetics; Algorithms; Software; Computational Biology/methods
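
The GMNN2CD abstract above mentions a Gaussian interaction profile kernel similarity computed from verified associations, without giving a formula. The sketch below uses the commonly cited definition, K(i, j) = exp(-gamma * ||IP(i) - IP(j)||^2) with gamma = gamma' / mean(||IP||^2), and should be read as an assumption rather than the authors' exact parameterization. The function name `gip_kernel`, the default `gamma_prime=1.0`, and the toy matrix are hypothetical.

```python
import math

def gip_kernel(assoc, gamma_prime=1.0):
    """Gaussian interaction profile kernel between the rows of a 0/1 matrix.

    Each row is the interaction profile of one entity, for example one
    circRNA against all diseases.
    """
    n = len(assoc)
    sq_norms = [sum(v * v for v in row) for row in assoc]
    mean_sq = sum(sq_norms) / n
    if mean_sq == 0:
        raise ValueError("association matrix has no known associations")
    # Bandwidth normalised by the average squared profile norm.
    gamma = gamma_prime / mean_sq
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(assoc[i], assoc[j]))
            kernel[i][j] = math.exp(-gamma * sq_dist)
    return kernel

# Toy example: three circRNAs (rows) against three diseases (columns).
toy = [[1, 0, 1],
       [1, 0, 1],
       [0, 1, 0]]
K = gip_kernel(toy)
```

Rows with identical profiles get similarity 1, and similarity decays as the profiles diverge. Transposing the matrix gives the corresponding kernel for the other entity type.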

6.

CRBPDL: Identification of circRNA-RBP interaction sites using an ensemble neural network approach.

Niu, Mengting; Zou, Quan; Lin, Chen.

PLoS Comput Biol ; 18(1): e1009798, 2022 01.

Article in English | MEDLINE | ID: mdl-35051187

ABSTRACT

Circular RNAs (circRNAs) are non-coding RNAs with a special circular structure formed by the reverse splicing mechanism. Increasing evidence shows that circular RNAs can directly bind to RNA-binding proteins (RBPs) and play an important role in a variety of biological activities. The interactions between circRNAs and RBPs are key to comprehending the mechanism of posttranscriptional regulation. Accurately identifying binding sites is very useful for analyzing interactions. In past research, some predictors based on machine learning (ML) have been presented, but prediction accuracy still needs to be improved. Therefore, we present a novel computational model, CRBPDL, which uses an Adaboost integrated deep hierarchical network to identify circular RNA-RBP binding sites. CRBPDL combines five different feature encoding schemes to encode the original RNA sequence and uses deep multiscale residual networks (MSRN) and bidirectional gated recurrent units (BiGRUs) to learn high-level feature representations, extracting local and global context information at the same time. Additionally, a self-attention mechanism is employed to improve the robustness of CRBPDL. Ultimately, the Adaboost algorithm is applied to integrate the deep learning (DL) models to improve the prediction performance and reliability of the model. To verify the usefulness of CRBPDL, we compared its performance with state-of-the-art methods on 37 circular RNA data sets and 31 linear RNA data sets. The results show that CRBPDL is general, reliable, and robust. The code and data sets are available at https://github.com/nmt315320/CRBPDL.git.

Subjects

Models, Biological; Neural Networks, Computer; RNA, Circular; RNA-Binding Proteins; Algorithms; Animals; Binding Sites/genetics; Computational Biology; Machine Learning; RNA Splicing/genetics; RNA, Circular/chemistry; RNA, Circular/genetics; RNA, Circular/metabolism; RNA-Binding Proteins/chemistry; RNA-Binding Proteins/genetics; RNA-Binding Proteins/metabolism

7.

SgRNA-RF: Identification of SgRNA On-Target Activity With Imbalanced Datasets.

Niu, Mengting; Zou, Quan.

IEEE/ACM Trans Comput Biol Bioinform ; 19(4): 2442-2453, 2022.

Article in English | MEDLINE | ID: mdl-33979289

ABSTRACT

Single-guide RNA is a guide RNA (gRNA), which guides the insertion or deletion of uridine residues into kinetoplastid during RNA editing. It is a small non-coding RNA that can pair with pre-mRNA. SgRNA is a critical component of the CRISPR/Cas9 gene knockout system and plays an important role in gene editing and gene regulation. It is important to accurately and quickly identify sgRNAs with high on-target activity. Due to its importance, several computational predictors have been proposed to predict sgRNA on-target activity. All these methods have clearly contributed to the development of this very important field. However, they also have certain limitations. In this paper, we developed a new classifier, SgRNA-RF, which extracts nucleic acid composition and structure features from sgRNA sequences with on-target activity and identifies them with a random forest algorithm. In addition, to address dataset imbalance, this paper proposed a new method called CS-Smote. We compared SgRNA-RF with state-of-the-art predictors on the five datasets and found that SgRNA-RF significantly improved the identification accuracy, with accuracies of 0.8636, 0.9161, 0.894, 0.938, 0.965, 0.77, 0.979, and 0.973, respectively. The user-friendly web server that implements SgRNA-RF is freely available at http://server.malab.cn/sgRNA-RF/.

Subjects

CRISPR-Cas Systems; RNA, Guide, Kinetoplastida; Algorithms; CRISPR-Cas Systems/genetics; Gene Editing; RNA, Guide, Kinetoplastida/genetics
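
The SgRNA-RF abstract above names CS-Smote as its answer to class imbalance but does not describe how it works. For orientation only, the sketch below shows the basic SMOTE idea that such methods build on: new minority-class samples are interpolated between an existing minority sample and one of its nearest minority neighbours. This is plain SMOTE-style oversampling, not CS-Smote, and the function name, parameters, and toy data are all hypothetical.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic samples by interpolating between minority samples."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the chosen sample
        # (squared Euclidean distance, the sample itself excluded).
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        t = rng.random()  # position along the segment from base to neighbour
        synthetic.append([a + t * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Toy minority class with two features per sample.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_samples = smote(minority, n_new=5)
```

Because every synthetic point lies on a segment between two real minority samples, the new points stay inside the region the minority class already occupies.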

8.

Characterizing viral circRNAs and their application in identifying circRNAs in viruses.

Niu, Mengting; Ju, Ying; Lin, Chen; Zou, Quan.

Brief Bioinform ; 23(1)2022 01 17.

Article in English | MEDLINE | ID: mdl-34585234

ABSTRACT

Circular RNAs (circRNAs) are non-coding RNAs with a special circular structure formed by the reverse splicing mechanism, and they play an important role in a variety of biological activities. Viruses can encode circRNA, and viral circRNAs have been found in multiple single-stranded and double-stranded viruses. However, the characteristics and functions of viral circRNAs remain unknown. Sequence alignment showed that viral circRNAs are less conserved than animal circRNAs, indicating that viral circRNAs may evolve rapidly. Analysis of the sequence characteristics of viral and animal circRNAs showed that they are similar in nucleic acid composition but differ clearly in secondary structure and autocorrelation characteristics. Based on these characteristics of viral circRNAs, machine learning algorithms were employed to construct a prediction model to identify viral circRNA. Additionally, analysis of the interaction between viral circRNA and miRNAs showed that viral circRNA is expected to interact with 518 human miRNAs, and the role of viral circRNA was given a preliminary analysis. Viral circRNAs were also found to be potentially involved in many KEGG pathways related to the nervous system and cancer. We curated an online server, and the data and code are available at http://server.malab.cn/viral-CircRNA/.

Subjects

MicroRNAs; Viruses; Algorithms; Animals; Machine Learning; MicroRNAs/genetics; MicroRNAs/metabolism; RNA, Circular/genetics; Viruses/genetics; Viruses/metabolism

9.

rBPDL:Predicting RNA-Binding Proteins Using Deep Learning.

Niu, Mengting; Wu, Jin; Zou, Quan; Liu, Zhendong; Xu, Lei.

IEEE J Biomed Health Inform ; 25(9): 3668-3676, 2021 09.

Article in English | MEDLINE | ID: mdl-33780344

ABSTRACT

RNA-binding protein (RBP) is a powerful and wide-ranging regulator that plays an important role in cell development, differentiation, metabolism, health and disease. The prediction of RBPs provides valuable guidance for biologists. Although experimental methods have made great progress in predicting RBP, they are time-consuming and not flexible. Therefore, we developed a network model, rBPDL, by combining a convolutional neural network and long short-term memory for multilabel classification of RBPs. Moreover, to achieve better prediction results, we used a voting algorithm for ensemble learning of the model. We compared rBPDL with state-of-the-art methods and found that rBPDL significantly improved identification performance for the RBP68 dataset, with a macro-Area Under Curve (AUC), micro-AUC, and weighted AUC of 0.936, 0.962, and 0.946, respectively. Furthermore, through AUC statistical analysis of the RBP domain, we analyzed the performance of rBPDL and found that the RBP identification performance in the same domain was similar. In addition, we analyzed the performance preferences and physicochemical properties of the binding protein amino acids and explored the characteristics that affect the binding by using the RBP86 dataset.

Subjects

Deep Learning; Binding Sites; Neural Networks, Computer; Protein Binding; RNA-Binding Proteins/genetics; RNA-Binding Proteins/metabolism

10.

sgRNACNN: identifying sgRNA on-target activity in four crops using ensembles of convolutional neural networks.

Niu, Mengting; Lin, Yuan; Zou, Quan.

Plant Mol Biol ; 105(4-5): 483-495, 2021 Mar.

Article in English | MEDLINE | ID: mdl-33385273

ABSTRACT

KEY MESSAGE: We proposed an ensemble convolutional neural network model to identify sgRNA high on-target activity in four crops and we used one-hot encoding and k-mers for sequence encoding. As an important component of the CRISPR/Cas9 system, single-guide RNA (sgRNA) plays an important role in gene redirection and editing. sgRNA has played an important role in the improvement of agronomic species, but there is a lack of effective bioinformatics tools to identify the activity of sgRNA in agronomic species. Therefore, it is necessary to develop a method based on machine learning to identify sgRNA high on-target activity. In this work, we proposed a simple convolutional neural network method to identify sgRNA high on-target activity. Our study used one-hot encoding and k-mers for sequence data conversion and a voting algorithm for constructing the convolutional neural network ensemble model sgRNACNN for the prediction of sgRNA activity. The ensemble model sgRNACNN was used for predictions in four crops: Glycine max, Zea mays, Sorghum bicolor and Triticum aestivum. The accuracy rates of the four crops in the sgRNACNN model were 82.43%, 80.33%, 78.25% and 87.49%, respectively. The experimental results showed that sgRNACNN realizes the identification of high on-target activity sgRNA of agronomic data and can meet the demands of sgRNA activity prediction in agronomy to a certain extent. These results have certain significance for guiding crop gene editing and academic research. The source code and relevant dataset can be found in the following link: https://github.com/nmt315320/sgRNACNN.git .

Subjects

Algorithms; CRISPR-Cas Systems; Computational Biology/methods; Crops, Agricultural/genetics; Gene Editing/methods; Neural Networks, Computer; RNA, Guide, Kinetoplastida/genetics; Crops, Agricultural/classification; HCT116 Cells; HEK293 Cells; HeLa Cells; Humans; Internet; Sorghum/genetics; Glycine max/genetics; Triticum/genetics; Zea mays/genetics
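
The sgRNACNN abstract above says that sequences were converted with one-hot encoding and k-mers before being passed to the convolutional networks. A minimal sketch of those two encodings for a DNA string follows. The function names, the A/C/G/T column order, and the choice k=2 are illustrative assumptions, not details taken from the paper.

```python
from itertools import product

ALPHABET = "ACGT"

def one_hot(seq):
    """Return one 4-element 0/1 row per nucleotide, in A, C, G, T order."""
    return [[1 if base == letter else 0 for letter in ALPHABET]
            for base in seq.upper()]

def kmer_counts(seq, k=2):
    """Count overlapping k-mers; every possible k-mer gets an entry."""
    counts = {"".join(p): 0 for p in product(ALPHABET, repeat=k)}
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows with ambiguous bases such as N
            counts[kmer] += 1
    return counts
```

One-hot encoding keeps positional information, one row per base, while k-mer counts give a fixed-length vector (4**k entries) regardless of sequence length. Bases outside A, C, G and T map to an all-zero row in `one_hot` and are skipped by `kmer_counts`.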

11.

CirRNAPL: A web server for the identification of circRNA based on extreme learning machine.

Niu, Mengting; Zhang, Jun; Li, Yanjuan; Wang, Cankun; Liu, Zhaoqian; Ding, Hui; Zou, Quan; Ma, Qin.

Comput Struct Biotechnol J ; 18: 834-842, 2020.

Article in English | MEDLINE | ID: mdl-32308930

ABSTRACT

Circular RNA (circRNA) plays an important role in the development of diseases, and it provides a novel idea for drug development. Accurate identification of circRNAs is important for a deeper understanding of their functions. In this study, we developed a new classifier, CirRNAPL, which extracts the features of nucleic acid composition and structure of the circRNA sequence and optimizes the extreme learning machine based on the particle swarm optimization algorithm. We compared CirRNAPL with existing methods, including blast, on three datasets and found CirRNAPL significantly improved the identification accuracy for the three datasets, with accuracies of 0.815, 0.802, and 0.782, respectively. Additionally, we performed sequence alignment on 564 sequences of the independent detection set of the third data set and analyzed the expression level of circRNAs. Results showed the expression level of the sequence is positively correlated with the abundance. A user-friendly CirRNAPL web server is freely available at http://server.malab.cn/CirRNAPL/.

12.

ELM-MHC: An Improved MHC Identification Method with Extreme Learning Machine Algorithm.

Li, Yanjuan; Niu, Mengting; Zou, Quan.

J Proteome Res ; 18(3): 1392-1401, 2019 03 01.

Article in English | MEDLINE | ID: mdl-30698979

ABSTRACT

The major histocompatibility complex (MHC) is a term for all gene groups of a major histocompatibility antigen. It binds to peptide chains derived from pathogens and displays pathogens on the cell surface to facilitate T-cell recognition and perform a series of immune functions. MHC molecules are critical in transplantation, autoimmunity, infection, and tumor immunotherapy. More accurate recognition of MHC, combining machine learning algorithms with bioinformatics analysis, is an important task. This paper proposes a new MHC recognition method as an alternative to traditional biological methods and uses the resulting classifier to identify MHC and to distinguish MHC I from MHC II. The classifier combines the SVMProt 188D, bag-of-ngrams (BonG), and information theory (IT) feature representation methods and uses an extreme learning machine (ELM) with lin-kernel selected as the activation function. Its accuracy was verified with 10-fold cross-validation and an independent test set, both for identifying MHC and for distinguishing MHC I from MHC II. In 10-fold cross-validation, the proposed algorithm obtained 91.66% accuracy when identifying MHC and 94.442% accuracy when identifying MHC categories. Furthermore, an online identification Web site named ELM-MHC was constructed with the following URL: http://server.malab.cn/ELM-MHC/ .

Subjects

Computational Biology; Histocompatibility Antigens Class II/isolation & purification; Histocompatibility Antigens Class I/isolation & purification; Machine Learning; Algorithms; Histocompatibility Antigens Class I/classification; Histocompatibility Antigens Class I/genetics; Histocompatibility Antigens Class II/classification; Histocompatibility Antigens Class II/genetics; Humans; Internet; Software

13.

RFAmyloid: A Web Server for Predicting Amyloid Proteins.

Niu, Mengting; Li, Yanjuan; Wang, Chunyu; Han, Ke.

Int J Mol Sci ; 19(7)2018 Jul 16.

Article in English | MEDLINE | ID: mdl-30013015

ABSTRACT

Amyloid is an insoluble fibrous protein, and its misaggregation can lead to diseases such as Alzheimer's disease and Creutzfeldt-Jakob disease. Therefore, the identification of amyloid is essential for the discovery and understanding of disease. We established a novel random forest-based predictor, RFAmy, to identify amyloid. It employs the SVMProt 188-D feature extraction method, based on protein composition and physicochemical properties, and the pse-in-one feature extraction method, based on amino acid composition, autocorrelation pseudo acid composition, profile-based features and predicted structure features. In the ten-fold cross-validation test, RFAmy's overall accuracy was 89.19% and its F-measure was 0.891. Comparison experiments with other features, classifiers, and existing methods show the effectiveness of RFAmy in predicting amyloid proteins. RFAmy can be accessed at http://server.malab.cn/RFAmyloid/.

Subjects

Algorithms; Amyloidogenic Proteins/analysis; Computational Biology/methods; Support Vector Machine; Databases, Protein; Internet; Reproducibility of Results; Sequence Analysis, Protein
