ABSTRACT
Predicting a protein's function from its amino acid sequence alone is a crucial but difficult task in bioinformatics. In recent years, deep learning models have achieved significant success in protein function prediction. Their strength lies in automatically learning informative features from protein sequences, which can then be used to predict function. This study builds on these advances by proposing a novel model, CNN-CBAM+BiGRU, which combines a CNN equipped with a Convolutional Block Attention Module (CBAM) and bidirectional gated recurrent units (BiGRUs). CBAM guides the CNN to focus on the most informative parts of the protein data, leading to more accurate feature extraction. BiGRUs, a type of Recurrent Neural Network (RNN), capture long-range dependencies within the protein sequence, which are essential for accurate function prediction. The proposed model integrates the strengths of CNN-CBAM and BiGRU, and experiments confirm the effectiveness of the combined approach. On the human dataset, the proposed method outperforms the CNN-BIGRU+ATT model by +1.0% for cellular components, +1.1% for molecular functions, and +0.5% for biological processes. On the yeast dataset, it outperforms the CNN-BIGRU+ATT model by +2.4% for cellular components, +1.2% for molecular functions, and +0.6% for biological processes.
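The attention step described above can be illustrated with a minimal sketch. It assumes the usual CBAM layout (channel attention followed by spatial attention) adapted to a 1D feature map of shape channels x sequence length; the abstract does not give the authors' layer sizes or weights, so `w1`, `w2`, `kernel_avg`, and `kernel_max` are hypothetical parameters, and this is not the paper's implementation.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(feat, w1, w2):
    """Rescale each channel by a gate computed from its pooled statistics.

    feat: list of C channels, each a list of L floats.
    w1:   hidden x C weights, w2: C x hidden weights of a shared two-layer
          MLP (hypothetical values, no bias terms).
    """
    def mlp(v):
        hidden = [max(0.0, sum(w * x for w, x in zip(row, v))) for row in w1]
        return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

    avg = [sum(ch) / len(ch) for ch in feat]   # average pooling over positions
    mx = [max(ch) for ch in feat]              # max pooling over positions
    gate = [sigmoid(a + m) for a, m in zip(mlp(avg), mlp(mx))]
    return [[g * x for x in ch] for g, ch in zip(gate, feat)]

def spatial_attention(feat, kernel_avg, kernel_max):
    """Rescale each sequence position by a gate from a 1D convolution.

    The convolution runs over the channel-wise average and max maps with
    zero padding; both kernels are assumed to have the same odd length.
    """
    length = len(feat[0])
    avg = [sum(ch[i] for ch in feat) / len(feat) for i in range(length)]
    mx = [max(ch[i] for ch in feat) for i in range(length)]
    half = len(kernel_avg) // 2
    gate = []
    for i in range(length):
        s = 0.0
        for k in range(len(kernel_avg)):
            j = i + k - half
            if 0 <= j < length:
                s += kernel_avg[k] * avg[j] + kernel_max[k] * mx[j]
        gate.append(sigmoid(s))
    return [[g * x for g, x in zip(gate, ch)] for ch in feat]

def cbam(feat, w1, w2, kernel_avg, kernel_max):
    """Apply channel attention, then spatial attention."""
    refined = channel_attention(feat, w1, w2)
    return spatial_attention(refined, kernel_avg, kernel_max)
```

In a full model the refined feature map would then be passed to the recurrent layers; in practice the gates would be learned jointly with the CNN in a deep learning framework rather than set by hand.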
Subjects
Computational Biology; Neural Networks, Computer; Proteins; Computational Biology/methods; Humans; Proteins/genetics; Proteins/metabolism; Deep Learning; Databases, Protein; Algorithms; Amino Acid Sequence
ABSTRACT
Proteins are the building blocks of all living things, and protein function must be determined if the molecular mechanisms of life are to be understood. CNNs are good at capturing short-range relationships, whereas GRUs and LSTMs can capture long-range dependencies. Our work is motivated by a hybrid approach that combines the complementary benefits of these deep learning models. Protein language models, which use attention networks to extract meaningful information and build representations of proteins, have seen tremendous success in recent years in processing protein sequences. In this paper, we propose a hybrid CNN + BiGRU-Attention model with protein language model embeddings that combines the output of the CNN with the output of the BiGRU-Attention branch to predict protein functions. We evaluated the proposed hybrid model on human and yeast datasets. On the human dataset, it improves the Fmax value over the state-of-the-art model SDN2GO by 1.9% for cellular component prediction, 3.8% for molecular function prediction, and 0.6% for biological process prediction. On the yeast dataset, it improves Fmax by 2.4% for cellular component prediction, 5.2% for molecular function prediction, and 1.2% for biological process prediction.
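The abstract reports Fmax without defining it. The sketch below assumes the protein-centric definition commonly used in CAFA-style evaluations: at each score threshold, precision is averaged over proteins with at least one predicted term, recall is averaged over all proteins with annotations, and Fmax is the largest resulting F-measure. The function name, the input layout, and the threshold grid are illustrative assumptions, not the authors' evaluation code.

```python
def fmax(y_true, y_score, thresholds=None):
    """Protein-centric Fmax over a grid of score thresholds.

    y_true:  list of sets, the true GO terms of each protein.
    y_score: list of dicts mapping GO term -> predicted score in [0, 1].
    """
    if thresholds is None:
        thresholds = [i / 100 for i in range(1, 101)]  # assumed grid
    n_annotated = sum(1 for terms in y_true if terms)
    best = 0.0
    for t in thresholds:
        precisions = []
        recall_sum = 0.0
        for true_terms, scores in zip(y_true, y_score):
            predicted = {term for term, s in scores.items() if s >= t}
            hits = len(predicted & true_terms)
            if predicted:  # precision only counts proteins with predictions
                precisions.append(hits / len(predicted))
            if true_terms:
                recall_sum += hits / len(true_terms)
        if not precisions or n_annotated == 0:
            continue
        pr = sum(precisions) / len(precisions)
        rc = recall_sum / n_annotated
        if pr + rc > 0:
            best = max(best, 2 * pr * rc / (pr + rc))
    return best
```

A typical call passes one set of true terms and one score dictionary per protein, for example `fmax([{"GO:1"}], [{"GO:1": 0.9, "GO:2": 0.2}])`, and is computed separately for each ontology (cellular component, molecular function, biological process).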