Pesquisa | Secretaria de Estado da Saúde

Predicting the function of rice proteins through Multi-instance Multi-label Learning based on multiple features fusion.

Liu, Jing; Tang, Xinghua; Cui, Shuanglong; Guan, Xiao.

Brief Bioinform ; 23(3)2022 05 13.

Artigo em Inglês | MEDLINE | ID: mdl-35325033

RESUMO

There are a large number of unannotated proteins with unknown functions in rice, which are difficult to be verified by biological experiments. Therefore, computational method is one of the mainstream methods for rice proteins function prediction. Two representative rice proteins, indica protein and japonica protein, are selected as the experimental dataset. In this paper, two feature extraction methods (the residue couple model method and the pseudo amino acid composition method) and the Principal Component Analysis method are combined to design protein descriptive features. Moreover, based on the state-of-the-art MIML algorithm EnMIMLNN, a novel MIML learning framework MK-EnMIMLNN is proposed. And the MK-EnMIMLNN algorithm is designed by learning multiple kernel fusion function neural network. The experimental results show that the hybrid feature extraction method is better than the single feature extraction method. More importantly, the MK-EnMIMLNN algorithm is superior to most classic MIML learning algorithms, which proves the effectiveness of the MK-EnMIMLNN algorithm in rice proteins function prediction.

Assuntos

Oryza , Algoritmos , Redes Neurais de Computação , Oryza/genética , Análise de Componente Principal , Proteínas/química

Multi-instance multi-label distance metric learning for genome-wide protein function prediction.

Xu, Yonghui; Min, Huaqing; Song, Hengjie; Wu, Qingyao.

Comput Biol Chem ; 63: 30-40, 2016 08.

Artigo em Inglês | MEDLINE | ID: mdl-26923212

RESUMO

Multi-instance multi-label (MIML) learning has been proven to be effective for the genome-wide protein function prediction problems where each training example is associated with not only multiple instances but also multiple class labels. To find an appropriate MIML learning method for genome-wide protein function prediction, many studies in the literature attempted to optimize objective functions in which dissimilarity between instances is measured using the Euclidean distance. But in many real applications, Euclidean distance may be unable to capture the intrinsic similarity/dissimilarity in feature space and label space. Unlike other previous approaches, in this paper, we propose to learn a multi-instance multi-label distance metric learning framework (MIMLDML) for genome-wide protein function prediction. Specifically, we learn a Mahalanobis distance to preserve and utilize the intrinsic geometric information of both feature space and label space for MIML learning. In addition, we try to deal with the sparsely labeled data by giving weight to the labeled data. Extensive experiments on seven real-world organisms covering the biological three-domain system (i.e., archaea, bacteria, and eukaryote; Woese et al., 1990) show that the MIMLDML algorithm is superior to most state-of-the-art MIML learning algorithms.

Assuntos

Genoma , Aprendizado de Máquina , Proteínas/genética , Algoritmos

Sparse Markov chain-based semi-supervised multi-instance multi-label method for protein function prediction.

Han, Chao; Chen, Jian; Wu, Qingyao; Mu, Shuai; Min, Huaqing.

J Bioinform Comput Biol ; 13(5): 1543001, 2015 Oct.

Artigo em Inglês | MEDLINE | ID: mdl-26493682

RESUMO

Automated assignment of protein function has received considerable attention in recent years for genome-wide study. With the rapid accumulation of genome sequencing data produced by high-throughput experimental techniques, the process of manually predicting functional properties of proteins has become increasingly cumbersome. Such large genomics data sets can only be annotated computationally. However, automated assignment of functions to unknown protein is challenging due to its inherent difficulty and complexity. Previous studies have revealed that solving problems involving complicated objects with multiple semantic meanings using the multi-instance multi-label (MIML) framework is effective. For the protein function prediction problems, each protein object in nature may associate with distinct structural units (instances) and multiple functional properties (class labels) where each unit is described by an instance and each functional property is considered as a class label. Thus, it is convenient and natural to tackle the protein function prediction problem by using the MIML framework. In this paper, we propose a sparse Markov chain-based semi-supervised MIML method, called Sparse-Markov. A sparse transductive probability graph is constructed to encode the affinity information of the data based on ensemble of Hausdorff distance metrics. Our goal is to exploit the affinity between protein objects in the sparse transductive probability graph to seek a sparse steady state probability of the Markov chain model to do protein function prediction, such that two proteins are given similar functional labels if they are close to each other in terms of an ensemble Hausdorff distance in the graph. Experimental results on seven real-world organism data sets covering three biological domains show that our proposed Sparse-Markov method is able to achieve better performance than four state-of-the-art MIML learning algorithms.

Assuntos

Cadeias de Markov , Proteínas/química , Proteínas/fisiologia , Aprendizado de Máquina Supervisionado , Algoritmos , Animais , Biologia Computacional , Bases de Dados de Proteínas/estatística & dados numéricos , Estudo de Associação Genômica Ampla/estatística & dados numéricos , Proteínas/genética

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

Detalhe da pesquisa