ABSTRACT
A composite vector method for predicting beta-hairpin motifs in proteins is proposed. It represents sequence information by combining matrix scores, the increment of diversity, distance values and auto-correlation information. The prediction is based on an analysis of 3,088 non-homologous protein chains containing 6,035 beta-hairpin motifs and 2,738 non-beta-hairpin motifs. The overall prediction accuracy and Matthews correlation coefficient are 83.1% and 0.59, respectively. With the same method, an accuracy of 80.7% and a Matthews correlation coefficient of 0.61 are obtained on another dataset of 2,878 non-homologous protein chains, which contains 4,884 beta-hairpin motifs and 4,310 non-beta-hairpin motifs. Better results are also obtained when predicting beta-hairpin motifs in the CASP6 dataset.
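The two figures of merit quoted above, overall accuracy and the Matthews correlation coefficient (MCC), can both be computed from the four confusion-matrix counts. The following is a minimal sketch using the standard definitions; the function name and the example counts are hypothetical and are not taken from the paper.

```python
import math

def accuracy_and_mcc(tp, tn, fp, fn):
    """Return (overall accuracy, Matthews correlation coefficient)."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc

# Hypothetical counts for illustration only (not the paper's results).
acc, mcc = accuracy_and_mcc(tp=80, tn=70, fp=30, fn=20)
print(f"accuracy={acc:.3f}, MCC={mcc:.3f}")
```

Unlike accuracy, MCC accounts for all four counts, which makes it more informative when the hairpin and non-hairpin classes differ in size, as they do in the first dataset.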
Subject(s)
Amino Acid Motifs, Consensus Sequence, Molecular Models, Proteome/chemistry, Algorithms, Amino Acid Sequence, Amino Acids/chemistry, Amino Acids/classification, Animals, Artificial Intelligence, Computational Biology/methods, Protein Databases, Dipeptides/chemistry, Humans, Proteomics/methods, Software, Surface Properties
ABSTRACT
A support vector machine (SVM) algorithm for predicting beta-hairpin motifs is proposed, in which sequence information is represented by a composite vector built from the increment of diversity and a scoring function. The prediction is performed on a dataset of 3,088 non-homologous proteins containing 6,027 beta-hairpins. The overall prediction accuracy and Matthews correlation coefficient are 79.9% and 0.59 on the independent testing dataset. In addition, a higher accuracy of 83.3% and a Matthews correlation coefficient of 0.67 are obtained in independent testing on a dataset previously used by Kumar et al. (Nucleic Acids Res 33:154-159). The performance of the method is also evaluated by predicting the beta-hairpins in the CASP6 proteins, and better results are obtained. Moreover, the method is used to predict four kinds of supersecondary structures, with an overall prediction accuracy of 64.5% on the independent testing dataset.
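The abstract does not define the increment of diversity. As a rough illustration, the sketch below assumes the commonly used form D(X) = N log N - sum(n_i log n_i) and ID(X, Y) = D(X + Y) - D(X) - D(Y), applied to amino acid counts. The helper names and the example sequences are hypothetical, and the features the authors actually counted may differ.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def diversity(counts):
    """D(X) = N*log(N) - sum(n_i*log(n_i)); terms with n_i = 0 contribute 0."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(n * math.log(n) for n in counts if n > 0)

def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y) for two equal-length count vectors."""
    combined = [a + b for a, b in zip(x, y)]
    return diversity(combined) - diversity(x) - diversity(y)

def composition(seq):
    """Amino acid counts of a sequence, ordered as in ALPHABET."""
    counts = Counter(seq)
    return [counts[a] for a in ALPHABET]

# Hypothetical sequences for illustration only.
reference = composition("VKVTVNGKTYEVEV")
query = composition("TWTVTEGGKTYRVE")
print(increment_of_diversity(reference, query))
```

Under this definition the value is 0 for two sources with identical relative composition and grows as the compositions diverge, which is why it can serve as a similarity feature in a composite vector.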
Subject(s)
Artificial Intelligence, Protein Secondary Structure, Algorithms, Amino Acid Sequence, Molecular Sequence Data
ABSTRACT
BACKGROUND: Improving the health and well-being of women and children has long been a common goal throughout the world. From 2005 to 2011, Suizhou City had an annual average of 22,405 pregnant and parturient women (1.04% of the population) and 98,811 children under 5 years old (4.57% of the population). Understanding the status of maternal and child health care in Suizhou City during this period can give the local health administration a sound scientific basis on which to construct effective policies. METHODS: Various types of annual reports on maternal and child health care were collected and analyzed retrospectively. RESULTS: Mortality rates for infants and children under 5 years showed a declining trend, while the rates of newborn home visiting, maternal health service coverage, and systematic child health management increased annually in Suizhou City from 2005 to 2011. The incidence of birth defects increased from 2.42 in 2005 to 3.89 in 2011. The maternal mortality ratio (MMR) fluctuated between 8.39/100,000 and 28.77/100,000, which was much lower than the national MMR (30.0/100,000 in 2010). The rates of hospital delivery and of births attended by trained health personnel increased to more than 90% in the past five years. CONCLUSIONS: The improvements in maternal and child health care work in Suizhou City are worthy of recognition. The government should therefore continue to increase funding in these areas to strengthen the maternal and child health care system as a whole.
Subject(s)
Child Health Services/statistics & numerical data, Maternal Health Services/statistics & numerical data, China/epidemiology, Female, Female Genital Diseases/diagnosis, Female Genital Diseases/epidemiology, Geography, Humans, Incidence, Infant, Infant Mortality, Mass Screening, Maternal Mortality, Pregnancy
ABSTRACT
Identification of protein fold types is typically based on the 27-class fold dataset provided by Ding & Dubchak in 2001. With the avalanche of protein sequences, however, fold data are also expanding, so improving the existing dataset and adding more fold types is an inevitable trend. In this paper, we construct a multi-class protein fold dataset that contains 3,457 protein chains with sequence identity below 35%, classified into 76 fold types. It is 4 times larger than Ding & Dubchak's dataset. Furthermore, we propose a novel support vector machine approach based on optimal features. Motif frequency, low-frequency power spectral density, amino acid composition, predicted secondary structure and auto-correlation function values are combined into a feature parameter set, which is then filtered with the maximum-correlation, minimum-redundancy criterion to obtain a 95-dimensional optimal feature subset. Using an ensemble classification strategy with the 95-dimensional optimal feature subset as input to the support vector machine, we identify the 76-class protein folds with an overall accuracy of 44.92% in an independent test. In addition, the method is used to identify an upgraded set of 27-class protein folds, achieving an overall accuracy of 66.56%. Finally, we test our method on Ding & Dubchak's 27-class fold dataset and obtain better identification results than most previously reported results.
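The feature filtering step described above can be illustrated with a greedy selection that, at each round, picks the feature whose relevance to the class label most exceeds its average redundancy with the features already chosen. This is only a sketch under stated assumptions: it uses mutual information on discretized features as the correlation measure, whereas the abstract does not say which measure or discretization the authors used. The function names and toy data are hypothetical.

```python
import math
from collections import Counter

def mutual_information(a, b):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * math.log(c * n / (pa[x] * pb[y]))
               for (x, y), c in pab.items())

def mrmr_select(features, labels, k):
    """Greedily select k feature names by relevance minus mean redundancy."""
    relevance = {name: mutual_information(col, labels)
                 for name, col in features.items()}
    selected = []
    while len(selected) < min(k, len(features)):
        best_name, best_score = None, -math.inf
        for name, col in features.items():
            if name in selected:
                continue
            redundancy = 0.0
            if selected:
                redundancy = sum(mutual_information(col, features[s])
                                 for s in selected) / len(selected)
            score = relevance[name] - redundancy
            if score > best_score:
                best_name, best_score = name, score
        selected.append(best_name)
    return selected

# Toy data (hypothetical): four classes encoded by two independent bits,
# plus an exact copy of one bit that adds no new information.
labels = [0, 0, 1, 1, 2, 2, 3, 3]
features = {
    "high_bit": [0, 0, 0, 0, 1, 1, 1, 1],
    "high_bit_copy": [0, 0, 0, 0, 1, 1, 1, 1],
    "low_bit": [0, 0, 1, 1, 0, 0, 1, 1],
}
print(mrmr_select(features, labels, 2))
```

In the toy data all three features are equally relevant on their own, but once one copy of the high bit is chosen, the redundancy penalty removes the advantage of the other copy and the low bit is selected instead.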
Subject(s)
Protein Folding, Proteins/chemistry, Proteins/classification, Algorithms, Computational Biology, Computer Simulation, Protein Databases, Protein Secondary Structure, Protein Tertiary Structure, Protein Sequence Analysis, Support Vector Machine
ABSTRACT
A novel method is presented for predicting β-hairpin motifs in protein sequences. It applies the Random Forest algorithm to multi-characteristic parameters: amino acid composition by position, hydropathy composition by position, predicted secondary structure information and auto-correlation function values. Firstly, the method is trained and tested on a set of 8,291 β-hairpin motifs and 6,865 non-β-hairpin motifs. The overall accuracy and Matthews correlation coefficient reach 82.2% and 0.64 with 5-fold cross-validation, and 81.7% and 0.63 in the independent test. Secondly, the method is tested on a set of 4,884 β-hairpin motifs and 4,310 non-β-hairpin motifs used in previous studies. The overall accuracy and Matthews correlation coefficient reach 80.9% and 0.61 with 5-fold cross-validation, and 80.6% and 0.60 in the independent test, which is better than the previous results. Thirdly, with the 4,884 β-hairpin motifs and 4,310 non-β-hairpin motifs as the training set and the 8,291 β-hairpin motifs and 6,865 non-β-hairpin motifs as the independent testing set, the overall accuracy and Matthews correlation coefficient reach 81.5% and 0.63 in the independent test.
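One of the feature types named above, the auto-correlation function value, can be sketched as follows. The abstract does not specify the residue property or the exact formula, so this example assumes the Kyte-Doolittle hydropathy scale and the form r_k = (1/(N-k)) * sum of h_i * h_(i+k). The function name and the example sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def autocorrelation(seq, max_lag):
    """Return [r_1, ..., r_max_lag], r_k = mean of h_i * h_(i+k) over the sequence."""
    h = [KYTE_DOOLITTLE[residue] for residue in seq]
    n = len(h)
    if not 0 < max_lag < n:
        raise ValueError("max_lag must be between 1 and len(seq) - 1")
    return [
        sum(h[i] * h[i + k] for i in range(n - k)) / (n - k)
        for k in range(1, max_lag + 1)
    ]

# Hypothetical 14-residue segment for illustration only.
features = autocorrelation("VKVTVNGKTYEVEV", max_lag=4)
print([round(value, 3) for value in features])
```

Each lag contributes one number, so a fixed `max_lag` yields a fixed-length feature vector regardless of segment length, which is convenient as input to a classifier such as Random Forest.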