Results 1 - 5 of 5
1.
Bioinformatics ; 39(3), 2023 03 01.
Article in English | MEDLINE | ID: mdl-36883697

ABSTRACT

MOTIVATION: Protein function annotation is fundamental to understanding biological mechanisms. The abundant genome-scale protein-protein interaction (PPI) networks, together with other protein biological attributes, provide rich information for annotating protein functions. Because PPI networks and biological attributes describe protein functions from different perspectives, cross-fusing them for protein function prediction is highly challenging. Recently, several methods have combined PPI networks and protein attributes via graph neural networks (GNNs). However, GNNs may inherit or even magnify the bias caused by noisy edges in PPI networks. In addition, stacking many GNN layers may cause over-smoothing of node representations.

RESULTS: We develop a novel protein function prediction method, CFAGO, to integrate single-species PPI networks and protein biological attributes via a multi-head attention mechanism. CFAGO is first pre-trained with an encoder-decoder architecture to capture a universal protein representation of the two sources. It is then fine-tuned to learn more effective protein representations for protein function prediction. Benchmark experiments on human and mouse datasets show that CFAGO outperforms state-of-the-art single-species network-based methods by at least 7.59%, 6.90%, and 11.68% in terms of m-AUPR, M-AUPR, and Fmax, respectively, demonstrating that cross-fusion by a multi-head attention mechanism can greatly improve protein function prediction. We further evaluate the quality of the captured protein representations in terms of the Davies-Bouldin score; the results show that the protein representations cross-fused by the multi-head attention mechanism are at least 2.7% better than the original and concatenated representations. We believe CFAGO is an effective tool for protein function prediction.

AVAILABILITY AND IMPLEMENTATION: The source code of CFAGO and the experimental data are available at: http://bliulab.net/CFAGO/.
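The abstract does not spell out how the two sources are cross-fused, so the following is only a minimal sketch of generic multi-head cross-attention, not CFAGO's actual architecture. It assumes each protein has one network-derived vector and one attribute-derived vector of equal length, lets the network view attend over the attribute view, and uses random placeholder weights where a trained model would use learned ones. The function names and the toy dimensions are hypothetical, and the final output projection of a standard attention layer is omitted for brevity.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def project(vectors, weights):
    """Multiply each row vector (length d_in) by a d_in x d_out matrix."""
    d_out = len(weights[0])
    return [
        [sum(v[i] * weights[i][j] for i in range(len(v))) for j in range(d_out)]
        for v in vectors
    ]


def cross_attention(queries, keys_values, num_heads, rng):
    """Multi-head cross-attention: rows of `queries` attend over `keys_values`.

    Weights are random placeholders (assumption); the heads are concatenated
    and no output projection is applied.
    """
    d_model = len(queries[0])
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads

    def rand_matrix(rows, cols):
        scale = 1.0 / math.sqrt(rows)
        return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

    outputs = [[] for _ in queries]
    for _ in range(num_heads):
        q = project(queries, rand_matrix(d_model, d_head))
        k = project(keys_values, rand_matrix(d_model, d_head))
        v = project(keys_values, rand_matrix(d_model, d_head))
        for row, q_vec in enumerate(q):
            # Scaled dot-product scores of this query against every key.
            scores = [
                sum(a * b for a, b in zip(q_vec, k_vec)) / math.sqrt(d_head)
                for k_vec in k
            ]
            attn = softmax(scores)
            # Attention-weighted sum of the value vectors for this head.
            head_out = [
                sum(w * v_vec[j] for w, v_vec in zip(attn, v))
                for j in range(d_head)
            ]
            outputs[row].extend(head_out)  # concatenate heads
    return outputs


rng = random.Random(0)
# Toy inputs (hypothetical): 4 proteins, 8-dimensional features per source.
network_feats = [[rng.random() for _ in range(8)] for _ in range(4)]
attribute_feats = [[rng.random() for _ in range(8)] for _ in range(4)]
fused = cross_attention(network_feats, attribute_feats, num_heads=2, rng=rng)
print(len(fused), len(fused[0]))  # 4 8
```

Each output row mixes attribute information into the corresponding network-view representation. In practice a deep-learning framework's attention layer would replace this hand-rolled version.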


Subjects
Algorithms , Protein Interaction Mapping , Animals , Humans , Mice , Protein Interaction Mapping/methods , Neural Networks, Computer , Software , Protein Interaction Maps , Proteins/metabolism
2.
Brief Bioinform ; 22(2): 1972-1983, 2021 03 22.
Article in English | MEDLINE | ID: mdl-32065215

ABSTRACT

Protein complexes are key units for studying a cell system. During the past decades, genome-scale protein-protein interaction (PPI) data have been determined by high-throughput approaches, enabling the identification of protein complexes from PPI networks. However, high-throughput approaches often produce a considerable fraction of false-positive and false-negative samples. In this study, we propose the mutual important interacting partner relation to reflect the co-complex relationship of two proteins based on their interaction neighborhoods. In addition, a new algorithm called idenPC-MIIP is developed to identify protein complexes from weighted PPI networks. The experimental results on two widely used datasets show that idenPC-MIIP outperforms 17 state-of-the-art methods, especially for the identification of small protein complexes with only two or three proteins.
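The abstract does not define the mutual important interacting partner relation, so it is not reproduced here. As a loosely related illustration of how interaction neighborhoods can turn an unweighted PPI network into a weighted one, the sketch below uses a different, standard measure: the Jaccard overlap of the two endpoints' closed neighborhoods. The function name and the toy protein labels are hypothetical.

```python
def neighborhood_jaccard(edges):
    """Weight each edge by the Jaccard overlap of its endpoints' closed
    neighborhoods (each protein counts as its own neighbor).

    This is a generic neighborhood-similarity weighting, not the MIIP
    relation from the paper.
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, {u}).add(v)
        neighbors.setdefault(v, {v}).add(u)

    weights = {}
    for u, v in edges:
        shared = neighbors[u] & neighbors[v]
        union = neighbors[u] | neighbors[v]
        weights[(u, v)] = len(shared) / len(union)
    return weights


# Toy network: a triangle A-B-C with a pendant protein D attached to C.
toy_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
toy_weights = neighborhood_jaccard(toy_edges)
print(toy_weights[("A", "B")])  # 1.0
print(toy_weights[("C", "D")])  # 0.5
```

Edges inside the densely connected triangle score higher than the edge to the pendant protein, which is the kind of signal a complex-detection method can use to down-weight likely spurious interactions.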


Subjects
Computational Biology/methods , Protein Interaction Maps , Algorithms , Datasets as Topic , High-Throughput Screening Assays/methods
3.
Brief Bioinform ; 22(4), 2021 07 20.
Article in English | MEDLINE | ID: mdl-33333549

ABSTRACT

Protein complexes play important roles in most cellular processes. The available genome-wide protein-protein interaction (PPI) data make it possible for computational methods to identify protein complexes from PPI networks. However, PPI datasets usually contain a large proportion of false-positive noise. Moreover, different types of biomolecules in a living cell cooperate to form a union interaction network. Because previous computational methods focus only on PPIs and ignore other types of biomolecule interactions, their predicted protein complexes often contain many false-positive proteins. In this study, we develop a novel computational method, idenPC-CAP, to identify protein complexes from the RNA-protein heterogeneous interaction network consisting of RNA-RNA interactions, RNA-protein interactions and PPIs. By considering interactions among proteins and RNAs, the new method reduces the proportion of false-positive proteins in predicted protein complexes. The experimental results demonstrate that idenPC-CAP outperforms the other state-of-the-art methods in this field.


Subjects
Computational Biology , Databases, Genetic , Protein Interaction Mapping , Protein Interaction Maps , RNA-Binding Proteins , RNA , RNA/genetics , RNA/metabolism , RNA-Binding Proteins/genetics , RNA-Binding Proteins/metabolism
4.
Brief Bioinform ; 21(5): 1531-1548, 2020 09 25.
Article in English | MEDLINE | ID: mdl-31631226

ABSTRACT

Protein complexes are the fundamental units for many cellular processes. Identifying protein complexes accurately is critical for understanding the function and organization of cells. With the growth of genome-scale protein-protein interaction (PPI) data for different species, various computational methods focus on identifying protein complexes from PPI networks. In this article, we give a comprehensive and updated review of the state-of-the-art computational methods in the field of protein complex identification, focusing especially on the newly developed approaches. The computational methods are organized into three categories: cluster-quality-based methods, node-affinity-based methods and ensemble clustering methods. Furthermore, the advantages and disadvantages of the different methods are discussed, and the performance of 17 state-of-the-art methods is evaluated on two widely used benchmark datasets. Finally, the bottleneck problems and their potential solutions in this important field are discussed.


Subjects
Computational Biology/methods , Proteins/chemistry , Algorithms , Cluster Analysis , Protein Interaction Maps
5.
Bioinform Adv ; 3(1): vbad043, 2023.
Article in English | MEDLINE | ID: mdl-37113248

ABSTRACT

Motivation: Enhancers are important cis-regulatory elements that regulate a wide range of biological functions and enhance the transcription of target genes. Although many feature extraction methods have been proposed to improve the performance of enhancer identification, they cannot learn position-related multiscale contextual information from raw DNA sequences.

Results: In this article, we propose a novel enhancer identification method (iEnhancer-ELM) based on BERT-like enhancer language models. iEnhancer-ELM tokenizes DNA sequences with multi-scale k-mers and extracts contextual information of k-mers at different scales, related to their positions, via a multi-head attention mechanism. We first evaluate the performance of k-mers at different scales, then ensemble them to improve the performance of enhancer identification. The experimental results on two popular benchmark datasets show that our model outperforms state-of-the-art methods. We further illustrate the interpretability of iEnhancer-ELM. In a case study, we discover 30 enhancer motifs via a 3-mer-based model, of which 12 are verified by STREME and JASPAR, demonstrating that our model has the potential to unveil the biological mechanisms of enhancers.

Availability and implementation: The models and associated code are available at https://github.com/chen-bioinfo/iEnhancer-ELM.

Supplementary information: Supplementary data are available at Bioinformatics Advances online.
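To make the multi-scale k-mer tokenization concrete, here is a minimal sketch. It assumes overlapping k-mers taken with a stride of one base and scales k = 3 to 6; both choices are assumptions for illustration, since the abstract does not state the stride or the exact scales. The function names are hypothetical and are not taken from the iEnhancer-ELM repository.

```python
def kmer_tokens(sequence, k):
    """Return (start_position, k-mer) pairs for every overlapping k-mer.

    Assumes a stride of 1; sequences shorter than k yield an empty list.
    """
    seq = sequence.upper()
    return [(i, seq[i:i + k]) for i in range(len(seq) - k + 1)]


def multiscale_tokens(sequence, scales=(3, 4, 5, 6)):
    """Tokenize one DNA sequence at several k-mer scales (assumed scales)."""
    return {k: kmer_tokens(sequence, k) for k in scales}


tokens = multiscale_tokens("acgtac")
print(tokens[3])  # [(0, 'ACG'), (1, 'CGT'), (2, 'GTA'), (3, 'TAC')]
print(tokens[6])  # [(0, 'ACGTAC')]
```

Keeping the start position next to each token mirrors the idea that the model relates k-mers to where they occur. In a BERT-like model the positions would typically enter through position embeddings instead of explicit tuples.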
