Search | Biblioteca Virtual em Saúde (Virtual Health Library)

MM-StackEns: A new deep multimodal stacked generalization approach for protein-protein interaction prediction.

Albu, Alexandra-Ioana; Bocicor, Maria-Iuliana; Czibula, Gabriela.

Comput Biol Med ; 153: 106526, 2023 Feb.

Article in English | MEDLINE | ID: mdl-36623437

ABSTRACT

Accurate in-silico identification of protein-protein interactions (PPIs) is a long-standing problem in biology, with important implications in protein function prediction and drug design. Current computational approaches predominantly use a single data modality for describing protein pairs, which may not fully capture the characteristics relevant for identifying PPIs. Another limitation of existing methods is their poor generalization to proteins outside the training graph. In this paper, we aim to address these shortcomings by proposing a new ensemble approach for PPI prediction, which learns information from two modalities, corresponding to pairs of sequences and to the graph formed by the training proteins and their interactions. Our approach uses a siamese neural network to process sequence information, while graph attention networks are employed for the network view. For capturing the relationships between the proteins in a pair, we design a new feature fusion module, based on computing the distance between the distributions corresponding to the two proteins. The prediction is made using a stacked generalization procedure, in which the final classifier is represented by a Logistic Regression model trained on the scores predicted by the sequence and graph models. Additionally, we show that protein sequence embeddings obtained using pretrained language models can significantly improve the generalization of PPI methods. The experimental results demonstrate the good performance of our approach, which surpasses all the related work on two Yeast data sets, while outperforming the majority of literature approaches on two Human data sets and on independent multi-species data sets.
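The stacked generalization step described above can be illustrated with a minimal sketch. It assumes that the sequence model and the graph model have each already produced a score in [0, 1] for every protein pair; the scores, labels, and function names below are hypothetical and are not taken from the paper. A small Logistic Regression meta-classifier is fitted on those two scores with plain gradient descent.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(model, x):
    """Probability that the pair interacts, given the two base-model scores."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical held-out scores: (sequence_model_score, graph_model_score).
base_scores = [(0.9, 0.8), (0.8, 0.6), (0.7, 0.9), (0.2, 0.3), (0.1, 0.4), (0.3, 0.1)]
labels = [1, 1, 1, 0, 0, 0]  # 1 = interacting pair, 0 = non-interacting pair

meta = fit_logistic(base_scores, labels)
print(predict_proba(meta, (0.85, 0.9)))  # both base models confident
print(predict_proba(meta, (0.1, 0.1)))   # both base models doubtful
```

In a stacking setup the meta-classifier is normally trained on base-model scores for pairs that the base models did not see during their own training, so that it learns how far to trust each view.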

Subjects

Neural Networks, Computer; Proteins; Humans; Proteins/metabolism; Amino Acid Sequence; Saccharomyces cerevisiae/metabolism; Learning

AutoPPI: An Ensemble of Deep Autoencoders for Protein-Protein Interaction Prediction.

Czibula, Gabriela; Albu, Alexandra-Ioana; Bocicor, Maria Iuliana; Chira, Camelia.

Entropy (Basel) ; 23(6), 2021 May 21.

Article in English | MEDLINE | ID: mdl-34064042

ABSTRACT

Proteins are essential molecules that must correctly perform their roles for the good health of living organisms. The majority of proteins operate in complexes, and the way they interact has a pivotal influence on the proper functioning of such organisms. In this study we address the problem of protein-protein interaction prediction, and we propose and investigate a method based on an ensemble of autoencoders. Our approach, entitled AutoPPI, adopts a strategy based on two autoencoders, one for each type of interaction (positive and negative), and we advance three types of neural network architectures for the autoencoders. Experiments were performed on several data sets comprising proteins from four different species. The results indicate good performance of our proposed model, with accuracy and AUC values of over 0.97 in all cases. The best performing model relies on a Siamese architecture in both the encoder and the decoder, which advantageously captures common features in protein pairs. Comparisons with other machine learning techniques applied to the same problem show that AutoPPI outperforms most of its contenders on the considered data sets.
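One way to read the "one autoencoder per class" strategy is as a decision rule: a pair is assigned to the class whose model reconstructs it with the smaller error. The abstract does not spell out the rule, so this is an assumption. The sketch below shows only that rule. To stay self-contained it replaces each trained autoencoder with a trivial stand-in that reconstructs every input as its class mean; the feature vectors and all names are hypothetical.

```python
def fit_mean_reconstructor(samples):
    """Stand-in for a trained autoencoder: 'reconstructs' any input as the class mean."""
    dim = len(samples[0])
    mean = [sum(s[j] for s in samples) / len(samples) for j in range(dim)]
    return lambda x: mean

def reconstruction_error(model, x):
    """Squared Euclidean distance between an input and its reconstruction."""
    recon = model(x)
    return sum((a - b) ** 2 for a, b in zip(x, recon))

def classify(x, positive_model, negative_model):
    """Return 1 (interacting) if the positive-class model reconstructs x better, else 0."""
    pos_err = reconstruction_error(positive_model, x)
    neg_err = reconstruction_error(negative_model, x)
    return 1 if pos_err < neg_err else 0

# Hypothetical 3-dimensional feature vectors describing protein pairs.
positive_pairs = [[0.9, 0.8, 0.7], [1.0, 0.9, 0.8]]
negative_pairs = [[0.1, 0.2, 0.1], [0.0, 0.1, 0.2]]

pos_model = fit_mean_reconstructor(positive_pairs)
neg_model = fit_mean_reconstructor(negative_pairs)

print(classify([0.8, 0.9, 0.9], pos_model, neg_model))
print(classify([0.2, 0.1, 0.0], pos_model, neg_model))
```

A real autoencoder would replace `fit_mean_reconstructor` with an encoder and decoder trained only on pairs of one class; the comparison of reconstruction errors would stay the same.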

Temporal ordering of cancer microarray data through a reinforcement learning based approach.

Czibula, Gabriela; Bocicor, Iuliana M; Czibula, Istvan-Gergely.

PLoS One ; 8(4): e60883, 2013.

Article in English | MEDLINE | ID: mdl-23565283

ABSTRACT

Temporal modeling and analysis, and more specifically temporal ordering, are very important problems in bioinformatics and computational biology, as the temporal analysis of the events characterizing a biological process could provide significant insights into its development and progression. In the case of cancer in particular, understanding the dynamics and evolution of the disease could lead to better methods for prediction and treatment. In this paper we tackle, from a computational perspective, the temporal ordering problem, which refers to constructing a sorted collection of multi-dimensional biological data that reflects an accurate temporal evolution of biological systems. We introduce a novel approach to the biological temporal ordering problem based on reinforcement learning, more precisely on Q-learning. The experimental evaluation is performed using several DNA microarray data sets, two of which contain cancer gene expression data. The obtained solutions correlate either with the given correct ordering (where one is provided for validation) or with the overall survival time of the patients (for the cancer data sets), confirming the good performance of the proposed model and indicating the potential of our proposal.
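To make the idea of learning an ordering with Q-learning concrete, here is a toy sketch. The abstract does not describe the state, action, or reward design, so everything below is an assumption made for illustration. A state is the last sample placed plus the set of samples already placed, an action picks the next sample, and the reward is the negative distance between consecutive samples. The samples are one-dimensional and the ordering always starts at index 0.

```python
import random

def q_learning_order(values, episodes=3000, alpha=0.5, gamma=1.0, epsilon=0.2, seed=0):
    """Learn an ordering of `values` (starting at index 0) with tabular Q-learning."""
    rng = random.Random(seed)
    n = len(values)
    q = {}  # (current_index, visited_frozenset, next_index) -> estimated return

    def actions(visited):
        return [j for j in range(n) if j not in visited]

    def best(current, visited):
        return max(actions(visited), key=lambda j: q.get((current, visited, j), 0.0))

    for _ in range(episodes):
        current, visited = 0, frozenset([0])
        while len(visited) < n:
            # Epsilon-greedy choice of the next sample to place.
            if rng.random() < epsilon:
                nxt = rng.choice(actions(visited))
            else:
                nxt = best(current, visited)
            # Assumed reward: placing similar samples next to each other is better.
            reward = -abs(values[current] - values[nxt])
            new_visited = visited | {nxt}
            future = 0.0
            if len(new_visited) < n:
                future = max(q.get((nxt, new_visited, j), 0.0) for j in actions(new_visited))
            key = (current, visited, nxt)
            old = q.get(key, 0.0)
            # Standard Q-learning update toward reward plus discounted best future value.
            q[key] = old + alpha * (reward + gamma * future - old)
            current, visited = nxt, new_visited

    # Greedy rollout of the learned policy yields the proposed ordering.
    order, current, visited = [0], 0, frozenset([0])
    while len(visited) < n:
        current = best(current, visited)
        visited = visited | {current}
        order.append(current)
    return order

# Hypothetical one-dimensional measurements, given in shuffled order.
print(q_learning_order([0.0, 2.0, 1.0, 3.0]))
```

The tabular representation only works for a handful of samples, since the number of states grows exponentially with the number of samples; it is meant to show the update rule, not a scalable method.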

Subjects

Computational Biology/methods; Neoplasms/genetics; Oligonucleotide Array Sequence Analysis/methods; Humans

Promoter sequences prediction using relational association rule mining.

Czibula, Gabriela; Bocicor, Maria-Iuliana; Czibula, Istvan Gergely.

Evol Bioinform Online ; 8: 181-96, 2012.

Article in English | MEDLINE | ID: mdl-22563233

ABSTRACT

In this paper we approach, from a computational perspective, the problem of promoter sequence prediction, an important problem in bioinformatics. As the conditions under which a DNA sequence functions as a promoter are not known, machine learning based classification models are still being developed to identify promoters in DNA. We propose a classification model based on relational association rule mining. Relational association rules are a particular type of association rule that describes numerical orderings between attributes that commonly occur over a data set. Our classifier relies on the discovery of relational association rules to predict whether or not a DNA sequence contains a promoter region. An experimental evaluation of the proposed model and a comparison with similar existing approaches are provided. The obtained results show that our classifier outperforms the existing techniques for identifying promoter sequences, confirming the potential of our proposal.
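The notion of a relational association rule, an ordering between attributes that holds in most records, can be shown with a simplified sketch. It mines only binary rules of the form "attribute i < attribute j" and filters them by support alone. The numeric records, the threshold, and the function name are hypothetical, and the abstract does not say how DNA sequences are encoded as numeric attributes.

```python
from itertools import permutations

def mine_binary_relational_rules(records, min_support=0.9):
    """Return (i, '<', j, support) for attribute pairs where r[i] < r[j] holds
    in at least `min_support` of the records."""
    n_attrs = len(records[0])
    rules = []
    for i, j in permutations(range(n_attrs), 2):
        support = sum(1 for r in records if r[i] < r[j]) / len(records)
        if support >= min_support:
            rules.append((i, "<", j, support))
    return rules

# Hypothetical numeric records with three attributes each.
records = [
    [1, 5, 3],
    [2, 6, 4],
    [0, 9, 2],
    [3, 7, 8],
]

rules = mine_binary_relational_rules(records, min_support=1.0)
print(rules)  # [(0, '<', 1, 1.0), (0, '<', 2, 1.0)]
```

A classifier in this spirit could mine one rule set per class and then label a new record by how well it satisfies each set; that scoring step is left out here because the abstract gives no details on it.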
