Search | BVS - Ministry of Health

Granular support vector machine to identify unknown structural classes of protein.

Hassan, Rohayanti; Othman, Razib M; Shah, Zuraini A.

Int J Data Min Bioinform ; 12(4): 451-67, 2015.

Article in English | MEDLINE | ID: mdl-26510297

ABSTRACT

Classification of structural class using local protein structure, rather than the whole structure, has been gaining widespread attention. The structural class is reflected in the local composition or arrangement of secondary structure, whereas threshold-based classification methods apply restrictive rules to determine these structural classes. As a consequence, the classes of some structures remain unknown. To determine these unknown structural classes, we propose a fusion algorithm, abbreviated as GSVM-SigLpsSCPred (Granular Support Vector Machine with Significant Local protein structure for Structural Class Prediction), which consists of two major components: an optimal local protein structure to represent the feature vector, and a granular support vector machine to predict the unknown structural classes. The results highlight the performance of GSVM-SigLpsSCPred as an alternative computational method for low-identity sequences.

Subjects

Algorithms , Databases, Protein , Proteins/classification , Proteins/genetics , Sequence Analysis, Protein/methods , Support Vector Machine , Protein Structure, Secondary
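The limitation the abstract describes, where threshold rules leave some proteins without a class, can be illustrated with a minimal rule-based classifier. This is a sketch and not part of GSVM-SigLpsSCPred. The function name `threshold_class` and the cut-offs on the fractions of helix (H) and strand (E) residues are illustrative assumptions, modelled loosely on common alpha/beta composition rules. Any protein that matches no rule is returned as "unknown", which is the group the granular SVM is meant to classify.

```python
def threshold_class(ss: str) -> str:
    """Assign a structural class from a secondary-structure string of H/E/C.

    The cut-offs below are illustrative assumptions, not the paper's values.
    """
    if not ss:
        raise ValueError("empty secondary-structure string")
    h = ss.count("H") / len(ss)  # fraction of helix residues
    e = ss.count("E") / len(ss)  # fraction of strand residues
    if h >= 0.40 and e <= 0.05:
        return "all-alpha"
    if e >= 0.40 and h <= 0.05:
        return "all-beta"
    if h >= 0.15 and e >= 0.15:
        return "mixed"  # alpha/beta and alpha+beta are not separated here
    return "unknown"    # no rule fires: the gap a learned classifier must fill


for ss in ["HHHHHCCCCC", "EEEEECCCCC", "HHHEEECCCC", "HHHECCCCCC"]:
    print(ss, threshold_class(ss))
```

In the last example the chain is 30% helix and 10% strand. That composition satisfies none of the assumed rules, so the chain is labelled "unknown".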

Multi-stage filtering for improving confidence level and determining dominant clusters in clustering algorithms of gene expression data.

Kasim, Shahreen; Deris, Safaai; Othman, Razib M.

Comput Biol Med ; 43(9): 1120-33, 2013 Sep.

Article in English | MEDLINE | ID: mdl-23930805

ABSTRACT

A drastic improvement in the analysis of gene expression has led to new discoveries in bioinformatics research. Fuzzy clustering algorithms are widely used to analyse gene expression data. However, the analyses produced by these algorithms may lead to confusing hypotheses about the dominant function suggested for genes of interest. In addition, current fuzzy clustering algorithms do not thoroughly analyse genes with low membership values. We therefore present a novel computational framework called "multi-stage filtering-Clustering Functional Annotation" (msf-CluFA) for clustering gene expression data. The framework consists of four components: fuzzy c-means clustering (msf-CluFA-0), achieving dominant clusters (msf-CluFA-1), improving the confidence level (msf-CluFA-2), and the combination of msf-CluFA-0, msf-CluFA-1 and msf-CluFA-2 (msf-CluFA-3). By employing double filtering in msf-CluFA-1 and the apriori algorithm in msf-CluFA-2, the framework can determine dominant clusters and improve the confidence level of genes with lower membership values, through which unknown genes can be predicted.

Subjects

Algorithms , Gene Expression Profiling/methods , Gene Expression Regulation, Fungal/physiology , Genes, Fungal/physiology , Saccharomyces cerevisiae/metabolism , Software
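msf-CluFA-0 builds on fuzzy c-means, in which every gene has a membership value in every cluster. The sketch below computes the standard fuzzy c-means memberships for fixed cluster centres. For each gene, it then reports the dominant cluster and flags whether that gene's highest membership falls below a cut-off. The 0.6 cut-off, the function names and the toy data are hypothetical. The double filtering and apriori stages of msf-CluFA-1 and msf-CluFA-2 are not reproduced.

```python
import math
from typing import List, Sequence, Tuple


def fcm_memberships(genes: Sequence[Sequence[float]],
                    centres: Sequence[Sequence[float]],
                    m: float = 2.0) -> List[List[float]]:
    """Fuzzy c-means memberships u[i][j] = 1 / sum_k (d_ij / d_ik)^(2/(m-1))."""
    if m <= 1.0:
        raise ValueError("fuzzifier m must be greater than 1")
    p = 2.0 / (m - 1.0)
    u = []
    for g in genes:
        d = [math.dist(g, c) for c in centres]  # Euclidean distance to each centre
        if 0.0 in d:
            # Gene coincides with a centre: give it full membership there.
            k = d.index(0.0)
            u.append([1.0 if j == k else 0.0 for j in range(len(centres))])
        else:
            u.append([1.0 / sum((dj / dk) ** p for dk in d) for dj in d])
    return u


def dominant_clusters(u: List[List[float]],
                      cutoff: float = 0.6) -> List[Tuple[int, float, bool]]:
    """Per gene: (dominant cluster index, its membership, low-membership flag).

    The cutoff is a hypothetical value chosen for illustration.
    """
    result = []
    for row in u:
        j = max(range(len(row)), key=row.__getitem__)
        result.append((j, row[j], row[j] < cutoff))
    return result


genes = [(1.0, 0.0), (5.0, 0.0), (9.0, 0.0)]  # toy expression profiles
centres = [(0.0, 0.0), (10.0, 0.0)]
for gene, info in zip(genes, dominant_clusters(fcm_memberships(genes, centres))):
    print(gene, info)
```

The middle gene lies equally far from both centres, so its best membership is 0.5 and it is flagged as low. Genes like this one are the ones the later msf-CluFA stages try to resolve.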

Remote protein homology detection and fold recognition using two-layer support vector machine classifiers.

Muda, Hilmi M; Saad, Puteh; Othman, Razib M.

Comput Biol Med ; 41(8): 687-99, 2011 Aug.

Article in English | MEDLINE | ID: mdl-21704312

ABSTRACT

Remote protein homology detection and fold recognition refer to the detection of structural homology in proteins whose sequences share little or no similarity. To detect protein structural classes from primary sequence information, homology-based methods have been developed; these can be divided into three types: discriminative classifiers, generative models for protein families, and pairwise sequence comparisons. Support Vector Machines (SVM) and Neural Networks (NN) are two popular discriminative methods, and recent studies have shown that SVM trains faster and is more accurate and efficient than NN. We present a comprehensive method, BioSVM-2L, based on two layers of classifiers. The first layer detects up to the superfamily and family levels of the SCOP hierarchy using optimized binary SVM classification rules. It uses a kernel function known as the Bio-kernel, which incorporates biological information into the classification process. The second layer uses a discriminative SVM algorithm with a string kernel to detect up to the protein fold level of the SCOP hierarchy. The results were evaluated using mean ROC and mean MRFP, and their significance was tested with a pairwise t-test. Experimental results show that our approach significantly improves the performance of remote protein homology detection and fold recognition on all three versions of the SCOP datasets (1.53, 1.67 and 1.73). Compared with well-known methods, mean ROC improved by 4.19% on SCOP 1.53, 4.75% on SCOP 1.67 and 4.03% on SCOP 1.73. The combined first and second layers of BioSVM-2L perform well in remote homology detection and fold recognition across all three dataset versions.

Subjects

Algorithms , Pattern Recognition, Automated/methods , Proteins/chemistry , Proteomics/methods , Sequence Analysis, Protein/methods , Animals , Artificial Intelligence , Databases, Protein , Humans , Protein Folding , Proteins/classification , ROC Curve , Sequence Homology, Amino Acid
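In the remote-homology literature, the "mean ROC" figure reported in this abstract usually means the area under the ROC curve, computed per test family and averaged over families. A minimal sketch of that computation follows. It assumes each family supplies classifier scores and binary labels (1 = true homolog). The family names and scores are made up, and neither MRFP nor the pairwise t-test is covered.

```python
from statistics import mean
from typing import Dict, List, Tuple


def roc_auc(scores: List[float], labels: List[int]) -> float:
    """Area under the ROC curve as the fraction of (positive, negative)
    pairs ranked correctly; tied pairs count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("each family needs positives and negatives")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


def mean_roc(families: Dict[str, Tuple[List[float], List[int]]]) -> float:
    """Average the per-family ROC scores."""
    return mean(roc_auc(s, y) for s, y in families.values())


families = {  # hypothetical test families
    "family_a": ([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]),  # perfect ranking
    "family_b": ([0.9, 0.2, 0.5, 0.1], [1, 1, 0, 0]),  # 3 of 4 pairs correct
}
print(mean_roc(families))  # 0.875
```

The pairwise loop is quadratic, which is acceptable for a sketch. For large families, a rank-based (Mann-Whitney) formulation gives the same value more efficiently.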

Utilizing shared interacting domain patterns and Gene Ontology information to improve protein-protein interaction prediction.

Roslan, Rosfuzah; Othman, Razib M; Shah, Zuraini A; Kasim, Shahreen; Asmuni, Hishammuddin; Taliba, Jumail; Hassan, Rohayanti; Zakaria, Zalmiyah.

Comput Biol Med ; 40(6): 555-64, 2010 Jun.

Article in English | MEDLINE | ID: mdl-20417930

ABSTRACT

Protein-protein interactions (PPIs) play a significant role in many crucial cellular operations such as metabolism, signaling and regulation. Computational methods for predicting PPIs have grown tremendously in recent years, but problems such as high false positive rates have contributed to a lack of reliable PPI information. We aimed to enhance the overlap between computational predictions and experimental results in order to remove some of the falsely predicted PPIs. This study introduces a protein function predictor named PFP(), based on shared interacting domain patterns, to supplement the Gene Ontology Annotations (GOA). We used GOA and PFP() as agents in a filtering process to reduce false positive pairs in computationally predicted PPI datasets. The functions predicted by PFP() were extracted from cross-species PPI data to assign novel functional annotations to uncharacterized proteins and additional functions to proteins already characterized by the GO (Gene Ontology). Using PFP() increased the chance of finding a matching function annotation for the first rule of the filtering process by as much as 20%. To assess the capability of the proposed framework to filter false PPIs, we applied it to the available S. cerevisiae PPIs. We measured performance in two respects: the improvement, expressed as the Signal-to-Noise Ratio (SNR), and the strength of that improvement. The proposed filtering framework achieved significantly better performance on both metrics than no filtering.

Subjects

Computational Biology/methods , Models, Statistical , Protein Interaction Domains and Motifs , Protein Interaction Mapping/methods , Proteins/chemistry , Proteins/physiology , Algorithms , Animals , Caenorhabditis elegans Proteins , Cluster Analysis , Databases, Genetic , Drosophila Proteins , Humans , Saccharomyces cerevisiae Proteins , Terminology as Topic
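The filtering idea in this abstract can be sketched as follows. A predicted interaction is kept only when both partners share a function annotation, and predicted annotations fill gaps left by curated ones. This is a simplified stand-in for the paper's rules. The shared-GO-term test, the function names, and the toy proteins and annotations are assumptions, and SNR is not computed.

```python
from typing import Dict, Iterable, List, Set, Tuple

Annotations = Dict[str, Set[str]]


def merge_annotations(curated: Annotations, predicted: Annotations) -> Annotations:
    """Union of curated (GOA-style) and predicted GO terms per protein."""
    proteins = set(curated) | set(predicted)
    return {p: curated.get(p, set()) | predicted.get(p, set()) for p in proteins}


def filter_pairs(pairs: Iterable[Tuple[str, str]],
                 annotations: Annotations) -> List[Tuple[str, str]]:
    """Keep a predicted pair only if both proteins share at least one GO term
    (an assumed simplification of the paper's filtering rules)."""
    return [(a, b) for a, b in pairs
            if annotations.get(a, set()) & annotations.get(b, set())]


# Hypothetical data: P4 has no curated annotation, only a predicted one.
curated = {"P1": {"GO:0006412"}, "P2": {"GO:0006412"}, "P3": {"GO:0005515"}}
predicted = {"P4": {"GO:0005515"}}
pairs = [("P1", "P2"), ("P1", "P3"), ("P3", "P4")]
print(filter_pairs(pairs, curated))                                # [('P1', 'P2')]
print(filter_pairs(pairs, merge_annotations(curated, predicted)))  # [('P1', 'P2'), ('P3', 'P4')]
```

The second call shows why predicted annotations matter. Without them, the pair involving the uncharacterized protein P4 can never pass a shared-annotation rule.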

SPlitSSI-SVM: an algorithm to reduce the misleading and increase the strength of domain signal.

Kalsum, Hassan U; Shah, Zuraini A; Othman, Razib M; Hassan, Rohayanti; Rahim, Shafry M; Asmuni, Hishammudin; Taliba, Jumail; Zakaria, Zalmiyah.

Comput Biol Med ; 39(11): 1013-9, 2009 Nov.

Article in English | MEDLINE | ID: mdl-19720371

ABSTRACT

Protein domains carry information for predicting protein structure, function, evolution and design, since a protein sequence may contain several domains, which may be different domains or copies of the same domain. In this study, we propose an algorithm named SplitSSI-SVM that works in the following steps. First, the training and testing datasets used to test SplitSSI-SVM are generated. Second, each protein sequence is split into subsequences based on ordered and disordered regions; sequences longer than 600 residues are split into subsequences to investigate the effectiveness of subsequence-based protein domain prediction. Third, multiple sequence alignment is performed to predict the secondary structure using bidirectional recurrent neural networks (BRNN), which take the interactions between amino acids into account. The secondary structure information is used to strengthen the protein domain boundary signal. Lastly, support vector machines (SVM) are used to classify proteins as single-domain, two-domain or multiple-domain. SplitSSI-SVM is designed to reduce misleading signals and weak protein domain signals caused by the primary structure of the protein sequence, and to provide accurate classification of protein domains. The performance of SplitSSI-SVM is evaluated using sensitivity and specificity on single-domain, two-domain and multiple-domain proteins. The evaluation shows that SplitSSI-SVM achieved better results than other protein domain predictors such as DOMpro, GlobPlot, Dompred-DPS, Mateo, Biozon, Armadillo, KemaDom, SBASE, HMMPfam and HMMSMART, especially for two-domain and multiple-domain proteins.

Subjects

Algorithms , Models, Theoretical , Sequence Alignment
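The splitting step can be sketched as follows, assuming disordered regions have already been predicted and each long sequence is cut into the ordered segments between them. Only the 600-residue threshold comes from the abstract. The function name, the interval format and the choice to discard the disordered stretches are assumptions, and the resulting pieces are not length-capped any further.

```python
from typing import List, Tuple

MAX_LEN = 600  # length threshold stated in the abstract


def split_sequence(seq: str, disordered: List[Tuple[int, int]]) -> List[str]:
    """Split a sequence longer than MAX_LEN into its ordered segments.

    `disordered` holds 0-based, half-open (start, end) intervals, assumed to
    come from an external disorder predictor; disordered stretches are dropped.
    """
    if len(seq) <= MAX_LEN:
        return [seq]
    pieces: List[str] = []
    pos = 0
    for start, end in sorted(disordered):
        if start > pos:
            pieces.append(seq[pos:start])  # ordered segment before this region
        pos = max(pos, end)                # skip the region (handles overlaps)
    if pos < len(seq):
        pieces.append(seq[pos:])
    return pieces


seq = "A" * 300 + "X" * 50 + "C" * 350  # 700 residues, disorder at 300-350
print([len(p) for p in split_sequence(seq, [(300, 350)])])  # [300, 350]
```

Sequences of 600 residues or fewer are returned unchanged, which matches the abstract's statement that only longer sequences are split.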

A genetic similarity algorithm for searching the Gene Ontology terms and annotating anonymous protein sequences.

Othman, Razib M; Deris, Safaai; Illias, Rosli M.

J Biomed Inform ; 41(1): 65-81, 2008 Feb.

Article in English | MEDLINE | ID: mdl-17681495

ABSTRACT

A genetic similarity algorithm is introduced in this study to find a group of semantically similar Gene Ontology terms. The genetic similarity algorithm combines a semantic similarity measure algorithm with a parallel genetic algorithm. The semantic similarity measure algorithm is used to compute the strength of similarity between Gene Ontology terms. The parallel genetic algorithm is then employed to perform batch retrieval and to accelerate the search in the large search space of the Gene Ontology graph. The genetic similarity algorithm is implemented in a Gene Ontology browser named basic UTMGO. This overcomes the weaknesses of existing Gene Ontology browsers, which use a conventional keyword-matching approach. To show the applicability of basic UTMGO, we extend its structure to develop a Gene Ontology-based protein sequence annotation tool named extended UTMGO. The objective of extended UTMGO is to provide a simple and practical tool that produces better results within a reasonable running time and at low computing cost, specifically for offline usage. Computational results and comparisons with other related tools are presented to show the effectiveness of the proposed algorithm and tools.

Subjects

Algorithms , Database Management Systems , Databases, Protein , Proteins/chemistry , Proteins/classification , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Amino Acid Sequence , Information Storage and Retrieval/methods , Molecular Sequence Data , Natural Language Processing , Pattern Recognition, Automated/methods , Sequence Homology, Amino Acid
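The semantic similarity step can be illustrated with Resnik's measure, a widely used way to score two GO terms. It uses the information content of the terms' most informative common ancestor. The abstract does not say which measure the genetic similarity algorithm uses, so this is an assumed stand-in. The toy ontology and annotation probabilities are hypothetical, and the parallel genetic algorithm search is not shown.

```python
import math
from typing import Dict, List, Set


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """The term itself plus every ancestor reachable through parent links."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def resnik(t1: str, t2: str, parents: Dict[str, List[str]],
           prob: Dict[str, float]) -> float:
    """Information content log(1/p) of the most informative common ancestor."""
    common = ancestors(t1, parents) & ancestors(t2, parents)
    if not common:
        return 0.0
    return max(math.log(1.0 / prob[t]) for t in common)


# Toy ontology and annotation probabilities (hypothetical).
parents = {"A": ["root"], "B": ["root"], "C": ["A"], "D": ["A"], "E": ["B"]}
prob = {"root": 1.0, "A": 0.5, "B": 0.5, "C": 0.2, "D": 0.25, "E": 0.3}
print(round(resnik("C", "D", parents, prob), 3))  # 0.693, shared ancestor A
print(resnik("C", "E", parents, prob))            # 0.0, only root in common
```

Terms whose only shared ancestor is the root score zero, because the root annotates everything and so carries no information.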
