ABSTRACT
This paper applies the discrete wavelet transform (DWT) with various protein substitution models to detect functional similarity between proteins with low sequence identity. A new metric, the 'S' function, based on the DWT, is proposed to measure pairwise similarity. We also develop a segmentation technique, combined with the DWT, to handle long protein sequences. The results are compared with those obtained using pairwise alignment and PSI-BLAST.
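As a rough illustration of the wavelet step, the sketch below applies one level of the Haar DWT to a numerically encoded sequence and compares two signals by the cosine similarity of their approximation coefficients. The Haar wavelet, the function names, and the cosine comparison are all assumptions made for illustration; the abstract does not define the 'S' function, and this is not it.

```python
import math


def haar_dwt(signal):
    """One level of the orthonormal Haar DWT (wavelet choice is an assumption)."""
    values = list(signal)
    if len(values) % 2:
        # Pad odd-length input by repeating the last sample.
        values.append(values[-1])
    root2 = math.sqrt(2.0)
    approx = [(values[i] + values[i + 1]) / root2 for i in range(0, len(values), 2)]
    detail = [(values[i] - values[i + 1]) / root2 for i in range(0, len(values), 2)]
    return approx, detail


def cosine_similarity(u, v):
    """Stand-in similarity score; NOT the 'S' function from the abstract."""
    n = min(len(u), len(v))
    dot = sum(a * b for a, b in zip(u[:n], v[:n]))
    norm_u = math.sqrt(sum(a * a for a in u[:n]))
    norm_v = math.sqrt(sum(b * b for b in v[:n]))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical numeric encodings of two short sequences.
approx_a, _ = haar_dwt([1.0, 1.0, 2.0, 2.0])
approx_b, _ = haar_dwt([1.0, 1.2, 2.1, 1.9])
score = cosine_similarity(approx_a, approx_b)
```

The approximation coefficients summarize the coarse trend of the signal, which is one plausible reason a wavelet representation can be compared even when residue-level identity is low.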
Subject(s)
Amino Acid Sequence, Amino Acid Substitution, Structural Homology, Protein, Computer Simulation
ABSTRACT
In our previous work, we developed a computational tool, PreK-ClassK-ClassKv, to predict and classify potassium (K+) channels. This method performs well for K+ channel prediction (PreK) and for classification at the family level (ClassK), but less well in classifying voltage-gated potassium (Kv) channels (ClassKv). In this paper, a new method based on the local sequence information of Kv channels is introduced to classify Kv channels. Six transmembrane domains of a Kv channel protein are used to define the protein, and the dipeptide composition technique is used to transform an amino acid sequence into a numerical sequence. A Kv channel protein is represented by a vector with 2000 elements, and a support vector machine algorithm is applied to classify Kv channels. This method performs well, with average total accuracy (Acc), sensitivity (SE), specificity (SP), reliability (R) and Matthews correlation coefficient (MCC) of 98.0%, 89.9%, 100%, 0.95 and 0.94, respectively. The results indicate that the local sequence information-based method is better than the global sequence information-based method for classifying Kv channels.
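The dipeptide composition step can be sketched as follows: each of the 400 ordered pairs of the 20 standard amino acids is counted along the sequence and normalized to a frequency. The function name, the alphabet ordering, and the handling of non-standard residues are assumptions made for illustration; how the per-segment vectors are combined into the 2000-element representation is not specified in the abstract and is not reproduced here.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues (ordering is arbitrary)
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def dipeptide_composition(sequence):
    """Return a 400-element list of dipeptide frequencies for one sequence segment."""
    sequence = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(sequence) - 1):
        pair = sequence[i:i + 2]
        # Assumption: pairs containing non-standard residues are skipped.
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[pair] / total for pair in DIPEPTIDES]


# Hypothetical transmembrane segment, used only to show the call.
features = dipeptide_composition("LVIFAILPYYVTLLV")
```

A vector built this way has a fixed length regardless of segment length, which is what allows it to be passed to a support vector machine.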
Subject(s)
Potassium Channels, Voltage-Gated/genetics, Algorithms, Animals, Artificial Intelligence, Computational Biology/methods, Humans, Models, Biological, Models, Statistical, Peptides/chemistry, Potassium Channels, Voltage-Gated/classification, Reproducibility of Results, Sensitivity and Specificity, Sequence Alignment, Sequence Analysis, Protein/methods
ABSTRACT
Although the sequence information on G-protein coupled receptors (GPCRs) continues to grow, many GPCRs remain orphaned (i.e. their ligand specificity is unknown) or poorly characterized, with little structural information available, so an automated and reliable method is badly needed to facilitate the identification of novel receptors. In this study, a fast Fourier transform-based support vector machine method has been developed for predicting GPCR subfamilies according to protein hydrophobicity. In classifying Class B, C, D and F subfamilies, the method achieved an overall Matthews correlation coefficient and accuracy of 0.95 and 93.3%, respectively, when evaluated using the jackknife test. The method achieved an accuracy of 100% on the Class B independent dataset. The results show that this method can classify GPCR subfamilies as well as their functional classification with high accuracy. A web server implementing the prediction is available at http://chem.scu.edu.cn/blast/Pred-GPCR.
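One way to turn hydrophobicity into fixed-length input for a classifier is sketched below: residues are mapped to a hydrophobicity scale, the signal is truncated or zero-padded to a power-of-two length, and the power spectrum is computed with a radix-2 FFT. The Kyte-Doolittle scale, the 64-point length, and the function names are assumptions made for illustration; the abstract does not state which scale or spectrum length was used.

```python
import cmath

# Kyte-Doolittle hydropathy values (the choice of scale is an assumption).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def fft(values):
    """Recursive radix-2 Cooley-Tukey FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return [complex(values[0])]
    even = fft(values[0::2])
    odd = fft(values[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def hydrophobicity_power_spectrum(sequence, n_points=64):
    """Power spectrum (n_points // 2 + 1 values) of the hydrophobicity signal."""
    if n_points < 1 or n_points & (n_points - 1):
        raise ValueError("n_points must be a power of two")
    # Residues outside the 20 standard amino acids are skipped (assumption).
    signal = [KYTE_DOOLITTLE[aa] for aa in sequence.upper() if aa in KYTE_DOOLITTLE]
    signal = signal[:n_points] + [0.0] * (n_points - len(signal))
    spectrum = fft(signal)
    # The input is real, so the upper half of the spectrum mirrors the lower half.
    return [abs(c) ** 2 for c in spectrum[: n_points // 2 + 1]]


# Hypothetical sequence fragment, used only to show the call.
features = hydrophobicity_power_spectrum("MKTLLVAGIFSLLAVWGRE")
```

The resulting 33 values per sequence could then be fed to any support vector machine implementation; truncating long sequences to `n_points` residues is a simplification of this sketch.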