Results 1 - 13 of 13
1.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38279648

ABSTRACT

Virus-encoded circular RNA (circRNA) participates in the immune response to viral infection, affects the human immune system, and can serve as a target for precision therapy and as a tumor biomarker. The coronaviruses SARS-CoV-1 and SARS-CoV-2 (SARS-CoV-1/2) that have emerged in recent years are highly contagious and have high mortality rates. Little is known about the circRNAs encoded by SARS-CoV-1/2. Therefore, this study explores whether SARS-CoV-1/2 encodes circRNAs and, if so, what their characteristics and functions are. Based on RNA-seq data from SARS-CoV-1 and SARS-CoV-2 infections, we used circRNA identification tools (circRNA_finder, find_circ and CIRI2) to identify circRNAs. We identified 151 circRNAs encoded by SARS-CoV-1 and 470 encoded by SARS-CoV-2, indicating that SARS-CoV-2 has a more prominent circRNA-encoding ability than SARS-CoV-1. Expression analysis showed that only a few circRNAs encoded by SARS-CoV-1/2 were highly expressed, and that the positive strand produced more abundant circRNAs. Then, based on the identified SARS-CoV-1/2-encoded circRNAs, we performed circRNA identification and characterization using the previously developed CirRNAPL. Finally, target gene prediction and functional enrichment analysis were performed. Viral circRNA was found to be closely related to cancer and to have a potential role in regulating host cell functions. This study characterized the features and functions of viral circRNA encoded by the coronaviruses SARS-CoV-1/2, providing a valuable resource for further research on the function and molecular mechanism of coronavirus circRNA.


Subject(s)
COVID-19 , MicroRNAs , Neoplasms , Humans , Circular RNA/genetics , SARS-CoV-2/genetics , COVID-19/genetics , Viral RNA/genetics , Neoplasms/genetics , MicroRNAs/genetics
2.
BMC Biol ; 22(1): 24, 2024 Jan 29.
Article in English | MEDLINE | ID: mdl-38281919

ABSTRACT

BACKGROUND: Circular RNAs (circRNAs) have been confirmed to play a vital role in the occurrence and development of diseases. Exploring the relationship between circRNAs and diseases is of far-reaching significance for studying etiopathogenesis and treating diseases. To this end, building on the graph Markov neural network (GMNN) algorithm from our previous work GMNN2CD, we further considered the multisource biological data that affect the association between circRNAs and diseases, developed an updated web server, CircDA, and used human hepatocellular carcinoma (HCC) tissue data to verify its prediction results. RESULTS: CircDA is built on a Tumarkov-based deep learning framework. The algorithm regards biomolecules as nodes and the interactions between molecules as edges, abstracts multiomics data, and models them as a heterogeneous biomolecular association network that can reflect the complex relationships between different biomolecules. Case studies using literature data from HCC, cervical, and gastric cancers demonstrate that the CircDA predictor can identify missing associations between known circRNAs and diseases. In addition, a quantitative real-time PCR (RT-qPCR) experiment on human HCC tissue samples found that five circRNAs were significantly differentially expressed, supporting the claim that CircDA can predict diseases related to new circRNAs. CONCLUSIONS: This efficient computational prediction and case analysis with sufficient feedback allows us to identify circRNA-associated diseases and disease-associated circRNAs. Our work provides a method to predict circRNA-associated diseases and can provide guidance for the association of diseases with certain circRNAs. For ease of use, an online prediction server ( http://server.malab.cn/CircDA ) is provided, and the code is open-sourced ( https://github.com/nmt315320/CircDA.git ) to facilitate improvement of the algorithm.


Subject(s)
Hepatocellular Carcinoma , Liver Neoplasms , Humans , Circular RNA/genetics , Circular RNA/analysis , Hepatocellular Carcinoma/genetics , Follow-Up Studies , Liver Neoplasms/genetics , Computer Neural Networks , Computer Simulation , Computational Biology/methods
3.
BMC Biol ; 22(1): 44, 2024 Feb 27.
Article in English | MEDLINE | ID: mdl-38408987

ABSTRACT

BACKGROUND: Circular RNAs (circRNAs) can regulate microRNA activity and are related to various diseases, such as cancer. Functional research on circRNAs is an active focus of scientific research. Accurate identification of circRNAs is important for gaining insight into their functions. Although several circRNA prediction models have been developed, their prediction accuracy is still unsatisfactory. Therefore, providing a more accurate computational framework to predict circRNAs and analyse their looping characteristics is crucial for systematic annotation. RESULTS: We developed a novel framework, CircDC, for distinguishing circRNAs from other lncRNAs. CircDC uses four different feature encoding schemes and adopts a multilayer convolutional neural network and a bidirectional long short-term memory network to learn high-order feature representations and make circRNA predictions. The results demonstrate that the proposed CircDC model is more accurate than existing models. In addition, an interpretability analysis of the features affecting the model is performed, and the computational framework is applied to extended circRNA identification tasks. CONCLUSIONS: CircDC is suitable for the prediction of circRNAs. The identification of circRNAs helps to understand and explore the related biological processes and functions. Feature importance analysis increases model interpretability and uncovers significant biological properties. The relevant code and data in this article can be accessed for free at https://github.com/nmt315320/CircDC.git .


Subject(s)
MicroRNAs , Neoplasms , Humans , Circular RNA/genetics , Computer Neural Networks , Neoplasms/genetics , Computational Biology/methods
4.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34585234

ABSTRACT

Circular RNAs (circRNAs) are non-coding RNAs with a special circular structure formed by the reverse splicing mechanism, and they play an important role in a variety of biological activities. Viruses can encode circRNAs, and viral circRNAs have been found in multiple single-stranded and double-stranded viruses. However, the characteristics and functions of viral circRNAs remain unknown. Sequence alignment showed that viral circRNAs are less conserved than animal circRNAs, indicating that viral circRNAs may evolve rapidly. Analysis of the sequence characteristics of viral and animal circRNAs showed that the two are similar in nucleic acid composition but differ markedly in secondary structure and autocorrelation characteristics. Based on these characteristics of viral circRNAs, machine learning algorithms were employed to construct a prediction model to identify viral circRNAs. Additionally, analysis of the interaction between viral circRNAs and miRNAs showed that viral circRNAs are expected to interact with 518 human miRNAs, which allowed a preliminary analysis of the role of viral circRNAs. Viral circRNAs were also found to be potentially involved in many KEGG pathways related to the nervous system and cancer. We built an online server, and the data and code are available at http://server.malab.cn/viral-CircRNA/.


Subject(s)
MicroRNAs , Viruses , Algorithms , Animals , Machine Learning , MicroRNAs/genetics , MicroRNAs/metabolism , Circular RNA/genetics , Viruses/genetics , Viruses/metabolism
5.
Bioinformatics ; 38(8): 2246-2253, 2022 04 12.
Article in English | MEDLINE | ID: mdl-35157027

ABSTRACT

MOTIVATION: Analyses of the characteristics and functions of circular RNAs (circRNAs) have shown that they play a critical role in disease. Exploring the relationship between circRNAs and diseases is of far-reaching significance for investigating the etiopathogenesis and treatment of diseases. Nevertheless, it is inefficient to learn new associations through biotechnology alone. RESULTS: Consequently, we present a computational method, GMNN2CD, which employs a graph Markov neural network (GMNN) algorithm to predict unknown circRNA-disease associations. First, using verified associations, we calculate the semantic similarity and Gaussian interaction profile kernel similarity (GIPs) of diseases and the GIPs of circRNAs, and then merge them to form a unified descriptor. After that, GMNN2CD uses a fusion feature variational map autoencoder to learn deep features and uses a label propagation map autoencoder to propagate tags based on known associations. Based on variational inference, alternate training of the GMNN enhances the ability of GMNN2CD to obtain high-efficiency high-dimensional features from low-dimensional representations. Finally, 5-fold cross-validation on five benchmark datasets shows that GMNN2CD is superior to the state-of-the-art methods. Furthermore, case studies have shown that GMNN2CD can detect potential associations. AVAILABILITY AND IMPLEMENTATION: The source code and data are available at https://github.com/nmt315320/GMNN2CD.git.
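
The Gaussian interaction profile kernel similarity named in the abstract above can be illustrated with a minimal sketch. It assumes the commonly used formulation K(i, j) = exp(-gamma * ||IP(i) - IP(j)||^2), with gamma normalised by the mean squared norm of the interaction profiles; whether GMNN2CD uses exactly this normalisation is not stated in the abstract. The function name `gip_kernel` and the parameter `gamma_prime` are hypothetical.

```python
import math


def gip_kernel(assoc, gamma_prime=1.0):
    """Gaussian interaction profile kernel over the rows of a binary matrix.

    assoc[i] is the interaction profile of entity i (for example one circRNA's
    0/1 associations with every disease). The bandwidth normalisation below is
    an assumption, not taken from the GMNN2CD paper.
    """
    n = len(assoc)
    squared_norms = [sum(v * v for v in row) for row in assoc]
    mean_squared_norm = sum(squared_norms) / n
    if mean_squared_norm == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = gamma_prime / mean_squared_norm

    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist_sq = sum((a - b) ** 2 for a, b in zip(assoc[i], assoc[j]))
            kernel[i][j] = math.exp(-gamma * dist_sq)
    return kernel


# Toy association matrix: 3 circRNAs (rows) x 3 diseases (columns).
toy = [[1, 0, 1], [1, 0, 1], [0, 1, 0]]
similarity = gip_kernel(toy)
```

Applying the same function to the transposed matrix would give the disease-side kernel; identical profiles always receive similarity 1.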


Subject(s)
Computer Neural Networks , Circular RNA , Humans , Circular RNA/genetics , Algorithms , Software , Computational Biology/methods
6.
PLoS Comput Biol ; 18(1): e1009798, 2022 01.
Article in English | MEDLINE | ID: mdl-35051187

ABSTRACT

Circular RNAs (circRNAs) are non-coding RNAs with a special circular structure formed by the reverse splicing mechanism. Increasing evidence shows that circular RNAs can directly bind to RNA-binding proteins (RBPs) and play an important role in a variety of biological activities. The interactions between circRNAs and RBPs are key to understanding the mechanism of posttranscriptional regulation, and accurately identifying binding sites is very useful for analyzing these interactions. Several machine learning (ML) predictors have been presented in past research, but their prediction accuracy still needs to be improved. Therefore, we present a novel computational model, CRBPDL, which uses an Adaboost integrated deep hierarchical network to identify circular RNA-RBP binding sites. CRBPDL combines five different feature encoding schemes to encode the original RNA sequence and uses deep multiscale residual networks (MSRN) and bidirectional gated recurrent units (BiGRUs) to learn high-level feature representations, which allows it to extract local and global context information at the same time. Additionally, a self-attention mechanism is employed to improve the robustness of CRBPDL. Finally, the Adaboost algorithm is applied to integrate the deep learning (DL) models to improve the prediction performance and reliability of the model. To verify the usefulness of CRBPDL, we compared its performance with state-of-the-art methods on 37 circular RNA data sets and 31 linear RNA data sets. The results show that CRBPDL is universal, reliable, and robust. The code and data sets are available at https://github.com/nmt315320/CRBPDL.git.


Subject(s)
Biological Models , Computer Neural Networks , Circular RNA , RNA-Binding Proteins , Algorithms , Animals , Binding Sites/genetics , Computational Biology , Machine Learning , RNA Splicing/genetics , Circular RNA/chemistry , Circular RNA/genetics , Circular RNA/metabolism , RNA-Binding Proteins/chemistry , RNA-Binding Proteins/genetics , RNA-Binding Proteins/metabolism
7.
Plant Mol Biol ; 105(4-5): 483-495, 2021 Mar.
Article in English | MEDLINE | ID: mdl-33385273

ABSTRACT

KEY MESSAGE: We proposed an ensemble convolutional neural network model to identify sgRNA high on-target activity in four crops and we used one-hot encoding and k-mers for sequence encoding. As an important component of the CRISPR/Cas9 system, single-guide RNA (sgRNA) plays an important role in gene redirection and editing. sgRNA has played an important role in the improvement of agronomic species, but there is a lack of effective bioinformatics tools to identify the activity of sgRNA in agronomic species. Therefore, it is necessary to develop a method based on machine learning to identify sgRNA high on-target activity. In this work, we proposed a simple convolutional neural network method to identify sgRNA high on-target activity. Our study used one-hot encoding and k-mers for sequence data conversion and a voting algorithm for constructing the convolutional neural network ensemble model sgRNACNN for the prediction of sgRNA activity. The ensemble model sgRNACNN was used for predictions in four crops: Glycine max, Zea mays, Sorghum bicolor and Triticum aestivum. The accuracy rates of the four crops in the sgRNACNN model were 82.43%, 80.33%, 78.25% and 87.49%, respectively. The experimental results showed that sgRNACNN realizes the identification of high on-target activity sgRNA of agronomic data and can meet the demands of sgRNA activity prediction in agronomy to a certain extent. These results have certain significance for guiding crop gene editing and academic research. The source code and relevant dataset can be found in the following link: https://github.com/nmt315320/sgRNACNN.git .
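
The two sequence encodings named in the abstract above, one-hot encoding and k-mers, can be sketched as follows. This is an illustration only: the DNA alphabet `ACGT`, the all-zero row for unknown bases, and the choice of normalised k-mer frequencies are assumptions, and the function names are hypothetical rather than taken from the sgRNACNN code.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"  # assumed alphabet; an RNA-style input would use U instead of T


def one_hot(seq):
    """Encode each base as a 4-element 0/1 row; unknown bases become all zeros."""
    index = {base: i for i, base in enumerate(ALPHABET)}
    rows = []
    for base in seq.upper():
        row = [0] * len(ALPHABET)
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows


def kmer_frequencies(seq, k=2):
    """Return the frequency of every possible k-mer, in lexicographic order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[kmer] for kmer in kmers)
    return [counts[kmer] / total if total else 0.0 for kmer in kmers]


guide = "GACGTTAGCCATGGATCCAA"     # made-up 20-nt guide sequence
matrix = one_hot(guide)            # 20 rows x 4 columns, input for a CNN
profile = kmer_frequencies(guide)  # 16 dinucleotide frequencies
```

The one-hot matrix keeps positional information, while the k-mer profile is a fixed-length summary that is independent of sequence length.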


Subject(s)
Algorithms , CRISPR-Cas Systems , Computational Biology/methods , Agricultural Crops/genetics , Gene Editing/methods , Computer Neural Networks , Kinetoplastida Guide RNA/genetics , Agricultural Crops/classification , HCT116 Cells , HEK293 Cells , HeLa Cells , Humans , Internet , Sorghum/genetics , Glycine max/genetics , Triticum/genetics , Zea mays/genetics
8.
J Proteome Res ; 18(3): 1392-1401, 2019 03 01.
Article in English | MEDLINE | ID: mdl-30698979

ABSTRACT

The major histocompatibility complex (MHC) is a term for all gene groups of a major histocompatibility antigen. It binds to peptide chains derived from pathogens and displays them on the cell surface to facilitate T-cell recognition and perform a series of immune functions. MHC molecules are critical in transplantation, autoimmunity, infection, and tumor immunotherapy. More accurate recognition of MHC, combining machine learning algorithms with bioinformatics analysis technology, is therefore an important task. This paper proposes a new computational MHC recognition method as an alternative to traditional biological methods, and uses the resulting classifier to identify MHC and to distinguish MHC I from MHC II. The classifier combines the SVMProt 188D, bag-of-ngrams (BonG), and information theory (IT) feature representation methods with an extreme learning machine (ELM) that uses lin-kernel as the activation function. Its accuracy was verified with 10-fold cross-validation and an independent test set, both for identifying MHC and for classifying MHC I and MHC II. In 10-fold cross-validation, the proposed algorithm obtained 91.66% accuracy when identifying MHC and 94.442% accuracy when identifying MHC categories. Furthermore, an online identification Web site named ELM-MHC was constructed at the following URL: http://server.malab.cn/ELM-MHC/ .


Subject(s)
Computational Biology , Histocompatibility Antigens Class II/isolation & purification , Histocompatibility Antigens Class I/isolation & purification , Machine Learning , Algorithms , Histocompatibility Antigens Class I/classification , Histocompatibility Antigens Class I/genetics , Histocompatibility Antigens Class II/classification , Histocompatibility Antigens Class II/genetics , Humans , Internet , Software
9.
Int J Mol Sci ; 19(7)2018 Jul 16.
Article in English | MEDLINE | ID: mdl-30013015

ABSTRACT

Amyloid is an insoluble fibrous protein, and its mis-aggregation can lead to diseases such as Alzheimer's disease and Creutzfeldt-Jakob disease. Therefore, the identification of amyloid is essential for the discovery and understanding of disease. We established a novel random forest-based predictor called RFAmy to identify amyloid. It employs the SVMProt 188-D feature extraction method, based on protein composition and physicochemical properties, and the pse-in-one feature extraction method, based on amino acid composition, autocorrelation pseudo acid composition, profile-based features and predicted structure features. In the ten-fold cross-validation test, RFAmy's overall accuracy was 89.19% and its F-measure was 0.891. Comparison experiments against other features, classifiers, and existing methods show the effectiveness of RFAmy in predicting amyloid proteins. The RFAmy predictor proposed in this paper can be accessed at http://server.malab.cn/RFAmyloid/.


Subject(s)
Algorithms , Amyloidogenic Proteins/analysis , Computational Biology/methods , Support Vector Machine , Protein Databases , Internet , Reproducibility of Results , Protein Sequence Analysis
10.
Comput Biol Med ; 170: 107941, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38217976

ABSTRACT

Immunotherapy is an emerging treatment method aimed at activating the human immune system and relying on its own immune function to kill cancer cells and tumor tissues. It has the advantages of wide applicability and minimal side effects. Effective identification of tumor T cell antigens (TTCAs) will help researchers understand their functions and mechanisms and carry out research on anti-tumor vaccine development. Because identifying TTCAs with biological experiments can be costly and time-consuming, a robust bioinformatics tool is needed. Various machine learning models have been proposed for identifying TTCAs, but there is still room for improvement in their performance. To establish a TTCA predictor with better prediction performance, we propose a prediction model called iTTCA-MVL in this paper. We extracted three sets of features from the views of physicochemical information and sequence statistics, namely the distribution descriptor of composition, transition, and distribution (CTDD), TF-IDF, and LSA topic. Then, we used least squares support vector machines (LSSVMs) as submodels and the Hilbert-Schmidt independence criterion (HSIC) as a constraint to establish an independent and complementary multi-view learning model. The prediction accuracy of iTTCA-MVL on the independent test set is 0.873, and the Matthews correlation coefficient is 0.747, both significantly better than those of existing methods. Therefore, iTTCA-MVL is an excellent prediction tool that researchers can use to accurately identify TTCAs.
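
The two metrics reported in the abstract above, accuracy and the Matthews correlation coefficient (MCC), follow directly from a binary confusion matrix. The sketch below shows the standard definitions; the confusion-matrix counts in the example are made up for illustration and are not the iTTCA-MVL results.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Returns 0.0 when the denominator is zero, a common convention.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts for an independent test set of 100 peptides.
acc = accuracy(tp=45, tn=40, fp=10, fn=5)
mcc = matthews_corrcoef(tp=45, tn=40, fp=10, fn=5)
```

Unlike accuracy, MCC stays near 0 for a classifier that simply predicts the majority class, which is why the two are usually reported together on imbalanced data.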


Subject(s)
Computational Biology , Machine Learning , Humans , Computational Biology/methods , T-Lymphocytes
11.
IEEE/ACM Trans Comput Biol Bioinform ; 19(4): 2442-2453, 2022.
Article in English | MEDLINE | ID: mdl-33979289

ABSTRACT

Single-guide RNA (sgRNA) is a guide RNA (gRNA), which guides the insertion or deletion of uridine residues into kinetoplastid during RNA editing. It is a small non-coding RNA that can pair with pre-mRNA. sgRNA is a critical component of the CRISPR/Cas9 gene knockout system and plays an important role in gene editing and gene regulation. It is important to identify sgRNAs with high on-target activity accurately and quickly. Because of this, several computational predictors have been proposed to predict sgRNA on-target activity. All these methods have clearly contributed to the development of this very important field, but they also have certain limitations. In this paper, we developed a new classifier, SgRNA-RF, which extracts nucleic acid composition and structure features from sgRNA sequences and identifies on-target activity with the random forest algorithm. In addition, to address dataset imbalance, this paper proposed a new method called CS-Smote. We compared SgRNA-RF with state-of-the-art predictors on the five datasets and found that SgRNA-RF significantly improved the identification accuracy, with accuracies of 0.8636, 0.9161, 0.894, 0.938, 0.965, 0.77, 0.979, and 0.973, respectively. The user-friendly web server that implements SgRNA-RF is freely available at http://server.malab.cn/sgRNA-RF/.


Subject(s)
CRISPR-Cas Systems , Kinetoplastida Guide RNA , Algorithms , CRISPR-Cas Systems/genetics , Gene Editing , Kinetoplastida Guide RNA/genetics
12.
IEEE J Biomed Health Inform ; 25(9): 3668-3676, 2021 09.
Article in English | MEDLINE | ID: mdl-33780344

ABSTRACT

RNA-binding protein (RBP) is a powerful and wide-ranging regulator that plays an important role in cell development, differentiation, metabolism, health and disease. The prediction of RBPs provides valuable guidance for biologists. Although experimental methods have made great progress in predicting RBP, they are time-consuming and not flexible. Therefore, we developed a network model, rBPDL, by combining a convolutional neural network and long short-term memory for multilabel classification of RBPs. Moreover, to achieve better prediction results, we used a voting algorithm for ensemble learning of the model. We compared rBPDL with state-of-the-art methods and found that rBPDL significantly improved identification performance for the RBP68 dataset, with a macro-Area Under Curve (AUC), micro-AUC, and weighted AUC of 0.936, 0.962, and 0.946, respectively. Furthermore, through AUC statistical analysis of the RBP domain, we analyzed the performance of rBPDL and found that the RBP identification performance in the same domain was similar. In addition, we analyzed the performance preferences and physicochemical properties of the binding protein amino acids and explored the characteristics that affect the binding by using the RBP86 dataset.
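
The abstract above mentions a voting algorithm for ensemble learning but does not say which variant is used. The sketch below assumes simple hard (majority) voting over the 0/1 decisions of several models for a single label; in a multilabel setting the same function would be applied once per label. The name `majority_vote` and the tie-breaking rule are assumptions.

```python
def majority_vote(predictions):
    """Combine 0/1 predictions from several models by hard majority voting.

    predictions[m][i] is model m's decision for sample i. A sample is labelled 1
    only when strictly more than half of the models vote 1, so ties resolve to 0
    (an assumed convention).
    """
    n_models = len(predictions)
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("every model must predict the same number of samples")
    return [
        1 if 2 * sum(p[i] for p in predictions) > n_models else 0
        for i in range(n_samples)
    ]


# Three hypothetical models voting on three samples.
ensemble = majority_vote([[1, 0, 1], [1, 1, 0], [0, 0, 1]])
```

Soft voting, which averages predicted probabilities instead of counting decisions, is a common alternative when the base models output calibrated scores.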


Subject(s)
Deep Learning , Binding Sites , Computer Neural Networks , Protein Binding , RNA-Binding Proteins/genetics , RNA-Binding Proteins/metabolism
13.
Comput Struct Biotechnol J ; 18: 834-842, 2020.
Article in English | MEDLINE | ID: mdl-32308930

ABSTRACT

Circular RNA (circRNA) plays an important role in the development of diseases, and it provides a novel idea for drug development. Accurate identification of circRNAs is important for a deeper understanding of their functions. In this study, we developed a new classifier, CirRNAPL, which extracts the features of nucleic acid composition and structure of the circRNA sequence and optimizes the extreme learning machine based on the particle swarm optimization algorithm. We compared CirRNAPL with existing methods, including blast, on three datasets and found CirRNAPL significantly improved the identification accuracy for the three datasets, with accuracies of 0.815, 0.802, and 0.782, respectively. Additionally, we performed sequence alignment on 564 sequences of the independent detection set of the third data set and analyzed the expression level of circRNAs. Results showed the expression level of the sequence is positively correlated with the abundance. A user-friendly CirRNAPL web server is freely available at http://server.malab.cn/CirRNAPL/.
