1.
BMC Bioinformatics ; 22(Suppl 3): 415, 2021 Aug 24.
Article in English | MEDLINE | ID: mdl-34429059

ABSTRACT

BACKGROUND: Plant long non-coding RNAs (lncRNAs) play vital roles in many biological processes, mainly through interactions with RNA-binding proteins (RBPs). To understand the function of lncRNAs, a fundamental method is to identify which types of proteins interact with the lncRNAs. However, the models or rules of interaction remain a major challenge when calculating and estimating the types of RBPs. RESULTS: In this study, we propose an ensemble deep learning model, named PRPI-SC, to predict plant lncRNA-protein interactions using a stacked denoising autoencoder and a convolutional neural network based on sequence and structural information. PRPI-SC predicts interactions between lncRNAs and proteins based on the k-mer features of RNAs and proteins. Experiments showed good results on Arabidopsis thaliana and Zea mays datasets (ATH948 and ZEA22133). The accuracies on the ATH948 and ZEA22133 datasets were 88.9% and 82.6%, respectively. PRPI-SC also performed well on some public RNA-protein interaction datasets. CONCLUSIONS: PRPI-SC accurately predicts the interaction between plant lncRNAs and proteins, which can guide studies of the function and expression of plant lncRNAs. At the same time, PRPI-SC has strong generalization ability and good predictive performance on non-plant data.


Subjects
Deep Learning , Long Noncoding RNA , Computational Biology , Computer Neural Networks , Long Noncoding RNA/genetics , RNA-Binding Proteins
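
The k-mer features mentioned in the abstract of entry 1 can be illustrated with a minimal Python sketch. This is not the PRPI-SC implementation: the function name `kmer_frequencies`, the default k = 3, and the RNA alphabet `ACGU` are assumptions made for the example, and the abstract does not state which k or normalization the authors used.

```python
from itertools import product


def kmer_frequencies(seq, k=3, alphabet="ACGU"):
    """Return normalized k-mer frequencies in a fixed lexicographic order.

    Hypothetical helper for illustration only. Windows that contain a
    symbol outside `alphabet` (for example N) are skipped.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1
    # Divide by the number of valid windows so the vector sums to 1.
    return [counts[km] / total if total else 0.0 for km in kmers]


if __name__ == "__main__":
    features = kmer_frequencies("AUGAUG", k=3)
    print(len(features))  # one entry per possible 3-mer
```

Passing a different `alphabet` (for example the 20 amino-acid letters, or a reduced amino-acid grouping) gives the protein-side analogue; the vector length grows as the alphabet size raised to the power k.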
2.
Genomics ; 112(5): 2928-2936, 2020 09.
Article in English | MEDLINE | ID: mdl-32437848

ABSTRACT

Long non-coding RNAs (lncRNAs) play key roles in regulating cellular biological processes through diverse molecular mechanisms, including binding to RNA-binding proteins. The majority of plant lncRNAs are functionally uncharacterized; thus, accurate prediction of plant lncRNA-protein interactions is imperative for subsequent functional studies. We present an integrative model, namely DRPLPI, whose distinguishing feature is prediction by multi-feature fusion. Structural features and four groups of sequence features are used, the latter comprising tri-nucleotide composition, gapped k-mer, recursive complement and binary profile. We design a multi-head self-attention long short-term memory encoder-decoder network to extract generative high-level features. To obtain robust results, DRPLPI combines categorical boosting and extra trees into a single meta-learner. Experiments on Zea mays and Arabidopsis thaliana obtained areas under the precision/recall curve (AUPRC) of 0.9820 and 0.9652, respectively. The proposed method shows a significant improvement in prediction performance compared with existing state-of-the-art methods.


Subjects
Deep Learning , Plant Proteins/metabolism , Long Noncoding RNA/metabolism , Plant RNA/metabolism , RNA-Binding Proteins/metabolism , Arabidopsis/genetics , Arabidopsis/metabolism , Zea mays/genetics , Zea mays/metabolism
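
The AUPRC metric reported in the abstract of entry 2 can be estimated as step-wise average precision. The sketch below is one common estimator, written in plain Python; the abstract does not say which estimator the authors used, and the function name `average_precision` is an assumption made for the example.

```python
def average_precision(labels, scores):
    """Step-wise area under the precision/recall curve.

    labels: sequence of 0/1 ground-truth values.
    scores: predicted scores, higher meaning more likely positive.
    Ties in scores are broken by input order, which is a simplification.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    n_pos = sum(label for _, label in ranked)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank  # precision at each recalled positive
    return total / n_pos


if __name__ == "__main__":
    print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```

Unlike accuracy, this score depends only on how the positives are ranked, which is why it is often preferred when interacting pairs are much rarer than non-interacting ones.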
3.
Mol Genet Genomics ; 295(5): 1091-1102, 2020 Sep.
Article in English | MEDLINE | ID: mdl-32409904

ABSTRACT

Long non-coding RNAs (lncRNAs) play a broad spectrum of distinctive regulatory roles through interactions with proteins. However, only a few plant lncRNAs have been experimentally characterized. We propose GPLPI, a graph representation learning method, to predict plant lncRNA-protein interaction (LPI) from sequence and structural information. GPLPI employs a generative model using long short-term memory (LSTM) with graph attention. Evolutionary features are extracted using frequency chaos game representation (FCGR). Manifold regularization and the l2-norm are adopted to obtain discriminant feature representations and mitigate overfitting. The model captures locality-preserving and reconstruction constraints that lead to better generalization ability. Finally, potential interactions between lncRNAs and proteins are predicted by integrating CatBoost and regularized logistic regression based on the L-BFGS optimization algorithm. The method is trained and tested on Arabidopsis thaliana and Zea mays datasets. GPLPI achieves accuracies of 85.76% and 91.97%, respectively. The results show that our method consistently outperforms other state-of-the-art methods.


Subjects
Computational Biology/methods , Plant Proteins/metabolism , Plants/metabolism , Long Noncoding RNA/metabolism , Algorithms , Arabidopsis/metabolism , Deep Learning , Logistic Models , Molecular Models , Plant Proteins/chemistry , Long Noncoding RNA/chemistry , Plant RNA/chemistry , Plant RNA/metabolism , Zea mays/metabolism
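
The frequency chaos game representation (FCGR) named in the abstract of entry 3 maps every k-mer of a nucleotide sequence to one cell of a 2^k by 2^k grid and counts occurrences per cell. The sketch below is a generic illustration, not the GPLPI code: the corner assignment (A, C, G, U), the default k = 3, the normalization, and the function name `fcgr` are all assumptions.

```python
def fcgr(seq, k=3):
    """Return a normalized 2**k x 2**k FCGR grid as a list of rows.

    Assumed corner layout: A=(0,0), C=(0,1), G=(1,1), U=(1,0), given as
    (x, y). The last nucleotide of each k-mer selects the coarsest
    quadrant, matching the chaos game update p_i = (p_{i-1} + corner) / 2.
    """
    corners = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "U": (1, 0)}
    seq = seq.upper().replace("T", "U")
    size = 2 ** k
    grid = [[0.0] * size for _ in range(size)]
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if any(ch not in corners for ch in window):
            continue  # skip windows with ambiguous symbols
        x = y = 0
        for j, ch in enumerate(window):
            bx, by = corners[ch]
            x |= bx << j  # later nucleotides set higher-order bits
            y |= by << j
        grid[y][x] += 1.0
        total += 1
    if total:
        grid = [[cell / total for cell in row] for row in grid]
    return grid


if __name__ == "__main__":
    image = fcgr("ACGUACGU", k=2)
    print(len(image), len(image[0]))
```

The resulting grid can be flattened into a fixed-length feature vector or treated as a small image, regardless of the length of the input sequence.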
4.
Front Genet ; 14: 1199087, 2023.
Article in English | MEDLINE | ID: mdl-37547471

ABSTRACT

Accurate diagnosis is the key to providing prompt and explicit treatment and disease management. The recognized biological method for the molecular diagnosis of infectious pathogens is polymerase chain reaction (PCR). Recently, deep learning approaches have come to play a vital role in accurately identifying disease-related genes for diagnosis, prognosis, and treatment. The models reduce the time and cost required by wet-lab experimental procedures. Consequently, sophisticated computational approaches have been developed to facilitate the detection of cancer, a leading cause of death globally, and other complex diseases. In this review, we systematically evaluate the recent trends in multi-omics data analysis based on deep learning techniques and their application in disease prediction. We highlight the current challenges in the field and discuss how advances in deep learning methods and their optimization for application are vital in overcoming them. Ultimately, this review promotes the development of novel deep-learning methodologies for data integration, which is essential for disease detection and treatment.

5.
Comput Biol Med ; 157: 106773, 2023 05.
Article in English | MEDLINE | ID: mdl-36924731

ABSTRACT

Recently, small open reading frames (sORFs) in long noncoding RNA (lncRNA) have been demonstrated to encode small peptides that can help study the mechanisms of growth and development in organisms. Since machine learning-based computational methods are less costly than biological experiments, they can be used to identify sORFs and provide a basis for biological experiments. However, few computational methods and data resources have been exploited for identifying sORFs in plant lncRNA. In addition, machine learning models produce underperforming classifiers when faced with a class-imbalance problem. In this study, an alternative method called SMOTE based on weighted cosine distance (WCDSMOTE), which enables interaction with feature selection, is put forward to synthesize minority class samples, and weighted edited nearest neighbor (WENN) is applied to clean up majority class samples. The resulting hybrid sampling method, WCDSMOTE-ENN, is proposed to deal with imbalanced datasets with multi-angle features. A heterogeneous classifier ensemble is introduced to complete the classification task. Together, these form a novel computational method, based on class-imbalance learning, to identify the sORFs with coding potential in plant lncRNA (sORFplnc). Experimental results show that sORFplnc outperforms existing computational methods in identifying sORFs with coding potential. We anticipate that the proposed work can be a reference for relevant research and contribute to agriculture and biomedicine.


Subjects
Long Noncoding RNA , Long Noncoding RNA/genetics , Open Reading Frames/genetics , Peptides , Plants/genetics , Machine Learning
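
The oversampling idea behind the abstract of entry 5 can be sketched with a plain SMOTE-style interpolation that picks neighbours by cosine distance. This is not WCDSMOTE: the feature weighting, the link to feature selection, and the WENN cleaning step described by the authors are omitted, and the names `cosine_distance` and `smote`, the default k, and the fixed random seed are assumptions made for the example.

```python
import math
import random


def cosine_distance(a, b):
    """Unweighted cosine distance; returns 1.0 if either vector is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def smote(minority, n_new, k=2, rng=None):
    """Create n_new synthetic samples by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = rng or random.Random(0)  # fixed seed, for reproducibility only
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (s for s in minority if s is not base),
            key=lambda s: cosine_distance(base, s),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append([b + gap * (m - b) for b, m in zip(base, mate)])
    return synthetic


if __name__ == "__main__":
    print(smote([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]], n_new=3))
```

Each synthetic point lies on the line segment between two existing minority samples, so the minority class is enlarged without duplicating any sample exactly.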
6.
J Comput Biol ; 28(1): 1-18, 2021 01.
Article in English | MEDLINE | ID: mdl-32302512

ABSTRACT

Proteins are polypeptides essential in biological processes. Protein physical interactions are complemented by other types of functional relationship data, including genetic interactions, knowledge about co-expression, and evolutionary pathways. Existing algorithms integrate protein interaction and gene expression data to retrieve context-specific subnetworks composed of genes/proteins with known and unknown functions. However, most protein function prediction algorithms fail to exploit diverse intrinsic information in feature and label spaces. We develop a novel integrative method based on differential Co-expression analysis and a Neighbor-voting algorithm for Protein Function Prediction, namely CNPFP. The method integrates heterogeneous data and exploits intrinsic and latent linkages via a global iterative approach and genomic features. CNPFP performs three tasks: clustering, differential co-expression analysis, and protein function prediction. Our aim is to identify yeast cell cycle-specific proteins linked to differentially expressed proteins in the protein-protein interaction network. To capture intrinsic information, CNPFP selects the most relevant feature subset based on a global iterative neighbor-voting algorithm. We identify eight condition-specific modules. The most relevant subnetwork has 87 genes highly enriched with cyclin-dependent kinases, a family of protein kinases relevant for cell cycle regulation. We present comprehensive annotations for 3538 Saccharomyces cerevisiae proteins. Our method achieves an AUROC of 0.9862, accuracy of 0.9710, and F-score of 0.9691. From the results, we conclude that exploiting the intrinsic nature of protein relationships improves the quality of function prediction. Thus, the proposed method is useful in functional genomics studies.


Subjects
Genomics/methods , Protein Interaction Mapping/methods , Cluster Analysis , Gene Expression Profiling/methods , Saccharomyces cerevisiae , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism , Transcriptome
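
The neighbor-voting idea referred to in the abstract of entry 6 can be sketched in its simplest, single-pass form: each annotated interaction partner of a protein casts one vote per function it carries. This is a generic illustration, not the CNPFP algorithm, which the abstract describes as global and iterative; the function name `neighbor_vote` and the toy protein and function names are hypothetical.

```python
from collections import Counter


def neighbor_vote(network, annotations, protein, top=1):
    """Rank candidate functions for `protein` by votes from its neighbours.

    network: dict mapping a protein to an iterable of interaction partners.
    annotations: dict mapping a protein to a set of known functions.
    Ties are broken alphabetically so the result is deterministic.
    """
    votes = Counter()
    for neighbour in network.get(protein, ()):
        for function in annotations.get(neighbour, ()):
            votes[function] += 1
    ranked = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    return [function for function, _ in ranked[:top]]


if __name__ == "__main__":
    # Toy data, invented for the example.
    network = {"P1": ["P2", "P3", "P4"]}
    annotations = {
        "P2": {"cell cycle"},
        "P3": {"cell cycle", "transport"},
        "P4": {"cell cycle", "transport"},
    }
    print(neighbor_vote(network, annotations, "P1", top=2))
```

An iterative variant would feed newly predicted labels back into `annotations` and repeat the vote until the assignments stop changing.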
7.
Cells ; 8(6)2019 05 30.
Article in English | MEDLINE | ID: mdl-31151273

ABSTRACT

The identification and analysis of long non-protein-coding RNAs (lncRNAs) are pervasive in transcriptome studies due to their roles in biological processes. In particular, lncRNA-protein interaction has plausible relevance to gene expression regulation and to cellular processes such as pathogen resistance in plants. While lncRNA-protein interaction has been studied in animals, there has yet to be extensive research in plants. In this paper, we propose a novel plant lncRNA-protein interaction prediction method, namely PLRPIM, which combines deep learning and shallow machine learning methods. The selection of an optimal feature subset and subsequent efficient compression are significant challenges for deep learning models. The proposed method adopts k-mer features and extracts high-level abstract sequence-based features using a stacked sparse autoencoder. Based on the extracted features, the fusion of random forest (RF) and light gradient boosting machine (LGBM) is used to build the prediction model. Performance is evaluated on Arabidopsis thaliana and Zea mays datasets. Experimental results demonstrate PLRPIM's superiority over other prediction tools on the two datasets. Based on 5-fold cross-validation, we obtain 89.98% and 93.44% accuracy, and 0.954 and 0.982 AUC, for Arabidopsis thaliana and Zea mays, respectively. PLRPIM predicts potential lncRNA-protein interaction pairs effectively, which can facilitate lncRNA-related research, including function prediction.


Subjects
Computational Biology/methods , Plant Proteins/metabolism , Long Noncoding RNA/genetics , Algorithms , Arabidopsis/genetics , Protein Binding , Long Noncoding RNA/metabolism , ROC Curve , Zea mays/genetics
8.
Math Biosci ; 274: 25-32, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26869536

ABSTRACT

One of the challenging tasks of bioinformatics is to predict more accurate and confident protein functions from genomics and proteomics datasets. Computational approaches use a variety of high-throughput experimental data, such as protein-protein interaction (PPI), protein sequences and phylogenetic profiles, to predict protein functions. This paper presents a method that uses a transductive multi-label learning algorithm and integrates multiple data sources for classification. Multiple proteomics datasets are integrated to make inferences about the functions of unknown proteins, and a directed bi-relational graph is used to assign labels to unannotated proteins. Our method, bi-relational graph based transductive multi-label function annotation (Bi-TMF), uses functional correlation and topological PPI network properties on both the training and testing datasets to predict protein functions through data fusion of the individual kernel results. The main purpose of our proposed method is to enhance the performance of classifier integration for protein function prediction algorithms. Experimental results demonstrate the effectiveness and efficiency of Bi-TMF on multi-source datasets in yeast, human and mouse benchmarks. Bi-TMF outperforms other recently proposed methods.


Subjects
Proteins/chemistry , Proteins/metabolism , Algorithms , Animals , Computational Biology , Protein Databases/statistics & numerical data , High-Throughput Screening Assays/statistics & numerical data , Humans , Mathematical Concepts , Mice , Protein Interaction Mapping/statistics & numerical data , Protein Interaction Maps , Saccharomyces cerevisiae Proteins/chemistry , Saccharomyces cerevisiae Proteins/metabolism