ABSTRACT
Knowledge of structural class plays an important role in understanding protein folding patterns. As an intermediate step toward recognizing the three-dimensional structure of a protein, structural class prediction is an important and challenging task. In this study, we first introduce a feature extraction technique based on tri-grams computed directly from the position-specific scoring matrix (PSSM). A total of 8,000 features are extracted to represent each protein. Support vector machine-recursive feature elimination (SVM-RFE) is then applied for feature selection, and the reduced feature set is input to a support vector machine (SVM) classifier to predict the structural class of a given protein. To examine the effectiveness of our method, jackknife tests are performed on six widely used benchmark datasets: Z277, Z498, 1189, 25PDB, D640, and D1185. Overall accuracies of 97.1%, 98.6%, 92.5%, 93.5%, 94.2%, and 95.9% are achieved on these datasets, respectively. Comparison with other prediction methods shows that the proposed method is very promising for predicting protein structural class.
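The count of 8,000 features is consistent with one value per ordered triple of the 20 amino-acid columns of the PSSM (20 × 20 × 20). The following is a minimal sketch, assuming the commonly used formulation in which the tri-gram entry for columns (m, n, r) is the sum over consecutive positions i of P[i, m] · P[i+1, n] · P[i+2, r]. The abstract does not give the formula, so this is an assumption, and the function name `pssm_trigram_features` is hypothetical. Any normalization of the raw PSSM scores is left to the caller.

```python
import numpy as np


def pssm_trigram_features(pssm):
    """Return a flat 8,000-dimensional tri-gram vector from an L x 20 PSSM.

    Assumed formulation (not stated in the abstract):
        T[m, n, r] = sum_i P[i, m] * P[i+1, n] * P[i+2, r]
    """
    P = np.asarray(pssm, dtype=float)
    if P.ndim != 2 or P.shape[1] != 20:
        raise ValueError("expected an L x 20 matrix")
    if P.shape[0] < 3:
        raise ValueError("need at least 3 residues to form a tri-gram")
    # Align rows i, i+1 and i+2, then sum their outer products over i.
    T = np.einsum("im,in,ir->mnr", P[:-2], P[1:-1], P[2:])
    # Flatten so that entry (m, n, r) sits at index m * 400 + n * 20 + r.
    return T.reshape(-1)
```

Because the vector length is fixed at 8,000 regardless of sequence length L, proteins of different lengths map to inputs of the same size for a downstream classifier.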
Subject(s)
Protein Databases , Proteins/chemistry , Proteins/genetics , Software , Tertiary Protein Structure
ABSTRACT
Knowledge of apoptosis proteins plays an important role in understanding the mechanism of programmed cell death. Information on the subcellular location of apoptosis proteins helps reveal the apoptosis mechanism and clarify the function of these proteins. Because large-scale wet-bench experiments are costly in time and labor, computational prediction of the subcellular location of apoptosis proteins has become very important, and many computational tools have been developed in recent decades. Existing methods differ in the protein sequence representation techniques and classification algorithms they adopt. In this study, we first introduce a sequence encoding scheme based on tri-grams computed directly from position-specific score matrices, which incorporates the evolutionary information represented in the PSI-BLAST profile as well as sequence-order information. The SVM-RFE algorithm is then applied for feature selection, and the reduced vectors are input to a support vector machine classifier to predict the subcellular location of apoptosis proteins. Jackknife tests on three widely used datasets show that our method provides state-of-the-art performance in comparison with existing methods.
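The selection-then-classification step with a jackknife (leave-one-out) test can be sketched with scikit-learn as follows. This is an illustration under assumptions, not the authors' implementation: the function name `jackknife_accuracy`, the RBF kernel of the final classifier, and the default values of `n_features` and `step` are all placeholders, and the abstract does not say whether feature selection was repeated inside each jackknife round. Here it is wrapped in a pipeline so that it is.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC


def jackknife_accuracy(X, y, n_features=50, step=0.1):
    """Leave-one-out accuracy of SVM-RFE selection followed by an SVM."""
    pipeline = Pipeline([
        # RFE ranks features by the weights of a linear SVM and removes
        # the lowest-ranked ones on each round until n_features remain.
        ("rfe", RFE(SVC(kernel="linear"),
                    n_features_to_select=n_features, step=step)),
        # Kernel and hyperparameters are assumptions, not the paper's settings.
        ("svm", SVC(kernel="rbf")),
    ])
    # Each sample is held out once; the pipeline is refit on the rest.
    scores = cross_val_score(pipeline, X, y, cv=LeaveOneOut())
    return float(scores.mean())
```

`X` would be the matrix of encoded proteins (one row per protein) and `y` the location labels. Refitting the selector in every round costs more time but keeps the held-out protein from influencing which features are retained.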
Subject(s)
Algorithms , Apoptosis Regulatory Proteins/metabolism , Position-Specific Scoring Matrices , Protein Databases , Humans , Protein Transport , ROC Curve , Subcellular Fractions/metabolism , Support Vector Machine
ABSTRACT
Change in temperature is often a major environmental factor in triggering waterborne disease outbreaks. Previous research has revealed temporal and spatial patterns of bacterial populations in several aquatic ecosystems, but to date very little information is available on aquaculture environments. Here, we assessed the effects of environmental temperature on bacterial community composition in a freshwater aquaculture system farming Litopenaeus vannamei (FASFL). Water samples were collected over a one-year period, and aquatic bacteria were characterized by polymerase chain reaction-denaturing gradient gel electrophoresis (PCR-DGGE) and 16S rDNA pyrosequencing. The resulting DGGE fingerprints revealed a specific and dynamic bacterial population structure with considerable variation across seasons, suggesting that environmental temperature was a key driver of the bacterial population in the FASFL. Pyrosequencing data further demonstrated substantial differences in bacterial community composition between the water at higher (WHT) and at lower (WLT) temperatures in the FASFL. Actinobacteria, Proteobacteria and Bacteroidetes were the most abundant phyla in the FASFL; however, a large number of unclassified bacteria contributed the most to the observed variation in phylogenetic diversity. The WHT harbored remarkably higher diversity and richness in bacterial composition at the genus and species levels than the WLT. Some potentially pathogenic species were identified in both WHT and WLT, providing data in support of aquatic animal health management in the aquaculture industry.
Subject(s)
Bacteria/isolation & purification , Fresh Water/microbiology , Actinobacteria/classification , Actinobacteria/genetics , Actinobacteria/isolation & purification , Animals , Aquaculture , Bacteria/classification , Bacteria/genetics , Bacteroidetes/classification , Bacteroidetes/genetics , Bacteroidetes/isolation & purification , Denaturing Gradient Gel Electrophoresis , Penaeidae , Phylogeny , Proteobacteria/classification , Proteobacteria/genetics , Proteobacteria/isolation & purification , 16S Ribosomal RNA/analysis , RNA Sequence Analysis , Temperature
ABSTRACT
Structural class characterizes the overall folding type of a protein or its domain. Many methods have been proposed in recent years to improve the prediction accuracy of protein structural class, but prediction remains challenging for low-similarity sequences. In this study, we introduce a feature extraction technique based on the auto cross covariance (ACC) transformation of the position-specific score matrix (PSSM) to represent a protein sequence. Support vector machine-recursive feature elimination (SVM-RFE) is then adopted to select the top K features according to their importance, and these features are input to a support vector machine (SVM) to conduct the prediction. The proposed method is evaluated using the jackknife test on three low-similarity datasets: D640, 1189 and 25PDB. Overall accuracies of 97.2%, 96.2%, and 93.3% are achieved on these three datasets, respectively, which are higher than those of most existing methods. This suggests that the proposed method could serve as a very cost-effective tool for predicting protein structural class, especially for low-similarity datasets.
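The ACC transformation can be sketched as below, assuming the usual definition: for each lag, the covariance between column j1 at position i and column j2 at position i + lag, with each column centered on its mean over the whole sequence. Entries with j1 = j2 are the auto covariance terms and the rest are the cross covariance terms. The abstract does not state the formula or the maximum lag, so the function name `pssm_acc_features` and the default `max_lag=4` are hypothetical.

```python
import numpy as np


def pssm_acc_features(pssm, max_lag=4):
    """Return ACC features of an L x 20 PSSM as a flat vector.

    For 20 columns the output has 400 * max_lag entries: per lag, 20 auto
    covariance terms (j1 == j2) and 380 cross covariance terms (j1 != j2).
    """
    P = np.asarray(pssm, dtype=float)
    if P.ndim != 2:
        raise ValueError("expected a 2-D matrix")
    length, n_cols = P.shape
    if not 1 <= max_lag < length:
        raise ValueError("max_lag must satisfy 1 <= max_lag < L")
    # Center every column on its mean over the full sequence.
    centered = P - P.mean(axis=0)
    blocks = []
    for lag in range(1, max_lag + 1):
        # C[j1, j2] = mean over i of centered[i, j1] * centered[i + lag, j2]
        C = centered[:-lag].T @ centered[lag:] / (length - lag)
        blocks.append(C.reshape(-1))
    return np.concatenate(blocks)
```

As with the tri-gram encoding, the output length depends only on the number of columns and the chosen maximum lag, not on the sequence length, so the vectors can be passed directly to a feature selector such as SVM-RFE.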