Results 1 - 20 of 27
1.
Bioinformatics ; 34(22): 3781-3787, 2018 Nov 15.
Article in English | MEDLINE | ID: mdl-29868708

ABSTRACT

Motivation: MicroRNAs (miRNAs) are small non-coding RNAs that function in RNA silencing and post-transcriptional regulation of gene expression by targeting messenger RNAs (mRNAs). Because the mechanisms by which miRNAs bind to mRNAs are not fully understood, a major challenge in miRNA studies is identifying miRNA-target sites on mRNAs. In silico prediction of miRNA-target sites can expedite costly and time-consuming experimental work by providing the most promising miRNA-target-site candidates. Results: In this study, we report the design and implementation of DeepMirTar, a deep-learning-based approach for accurately predicting human miRNA targets at the site level. The predicted miRNA-target sites are those with canonical or non-canonical seeds, and each miRNA-target site was represented by features at three levels: high-level expert-designed, low-level expert-designed and raw-data-level. Comparison with other state-of-the-art machine-learning methods and existing miRNA-target-prediction tools indicated that DeepMirTar improved overall predictive performance. Availability and implementation: DeepMirTar is freely available at https://github.com/Bjoux2/DeepMirTar_SdA. Supplementary information: Supplementary data are available at Bioinformatics online.
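The abstract mentions canonical seeds without defining them. As a minimal illustration only (not DeepMirTar's feature extraction), the Python sketch below checks whether a 7-nt mRNA segment pairs perfectly (Watson-Crick) with miRNA positions 2-8, the usual definition of a canonical 7mer-m8 seed match. The function names are hypothetical, the let-7a sequence is only an example, and non-canonical seeds (G:U wobbles, bulges, mismatches) are deliberately not handled.

```python
# Watson-Crick pairing partners for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_region(mirna: str, start: int = 2, end: int = 8) -> str:
    """Return miRNA positions start..end (1-based, inclusive), read 5'->3'."""
    return mirna.upper()[start - 1:end]


def is_canonical_seed_match(mirna: str, site: str) -> bool:
    """True if `site` (mRNA, 5'->3') is perfectly complementary to seed 2-8.

    miRNA and mRNA pair antiparallel, so the matching site is the reverse
    complement of the seed. G:U wobble pairs are not accepted here.
    """
    seed = seed_region(mirna)
    expected = "".join(COMPLEMENT[base] for base in reversed(seed))
    return site.upper() == expected


if __name__ == "__main__":
    let7a = "UGAGGUAGUAGGUUGUAUAGUU"  # hsa-let-7a-5p, 5'->3'
    print(seed_region(let7a))                         # GAGGUAG
    print(is_canonical_seed_match(let7a, "CUACCUC"))  # True
```

A site-level predictor such as the one described would also need to score non-canonical sites, which is where learned features go beyond this exact-match rule.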


Subject(s)
Machine Learning, Algorithms, Computational Biology, Gene Expression Regulation, Humans, MicroRNAs, RNA Interference, Messenger RNA
2.
Med Sci Monit ; 25: 9913-9922, 2019 Dec 24.
Article in English | MEDLINE | ID: mdl-31872802

ABSTRACT

BACKGROUND Leptin is an adipokine related to overweight and cardiovascular diseases. However, the leptin expression level in human epicardial adipose tissue (EAT) and its association with coronary atherosclerosis have never been investigated. MATERIAL AND METHODS Patients undergoing cardiac surgery were divided into a coronary artery disease group (CAD group) and a non-CAD group (NCAD group). Blood samples from the coronary vein and biopsies of subcutaneous adipose tissue (SAT) and EAT were acquired during surgery. Serum leptin and leptin levels in EAT and SAT were measured with ELISA, quantitative PCR, and immunohistochemistry, and were compared between the CAD and NCAD groups, as well as between stenosis and non-stenosis subgroups. Logistic regression analysis was performed to explore risk factors for coronary artery stenosis. RESULTS No statistically significant differences were found in demographic and clinical data between groups (all P>0.05). Serum leptin concentration and leptin expression in EAT and SAT were much higher in the CAD group than in the NCAD group (all P<0.05). In subgroup analysis, serum leptin and SAT leptin expression did not differ between stenosis and non-stenosis patients (all P>0.05). The leptin expression level in EAT was significantly higher in stenosis patients than in non-stenosis patients (P=0.0431). Multivariate logistic regression analysis showed that the leptin expression level in EAT was an independent risk factor for coronary artery stenosis [OR=1.09, 95% CI 1.01-1.18, P=0.031]. CONCLUSIONS Leptin expression was increased in both EAT and SAT of CAD patients. Leptin expression in EAT was an independent risk factor for coronary atherosclerosis in the adjacent artery, whereas leptin in SAT was not associated with it.


Subject(s)
Adipose Tissue/metabolism, Coronary Artery Disease/metabolism, Leptin/metabolism, Pericardium/metabolism, Adipokines/metabolism, Aged, Coronary Artery Disease/blood, Coronary Stenosis/metabolism, Female, Gene Expression/genetics, Humans, Leptin/blood, Leptin/genetics, Male, Middle Aged, Real-Time Polymerase Chain Reaction, Risk Factors, Subcutaneous Fat/metabolism
3.
Nucleic Acids Res ; 41(Web Server issue): W441-7, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23729470

ABSTRACT

Knowledge of the subcellular localizations (SCLs) of plant proteins relates to their functions and aids in understanding the regulation of biological processes at the cellular level. We present PlantLoc, a highly accurate and fast webserver for predicting the multi-label SCLs of plant proteins. The PlantLoc server has two innovative features: it builds localization motif libraries by a recursive method that uses neither alignment nor Gene Ontology information, and it uses a simple architecture that rapidly and accurately identifies plant protein SCLs without a machine learning algorithm. PlantLoc reports the predicted SCLs, confidence estimates, the substantiality motif and its location in the sequence. Benchmarked on a new test dataset against other plant SCL prediction webservers, PlantLoc achieved the highest accuracy in identifying plant protein SCLs (overall accuracy of 80.8%). Its ability to predict multiple sites was also significantly higher than that of any other webserver. The predicted substantiality motifs of queries also have great potential for analyzing relationships with protein functional regions. The PlantLoc server is available at http://cal.tongji.edu.cn/PlantLoc/.


Subject(s)
Plant Proteins/chemistry, Protein Sorting Signals, Software, Amino Acid Motifs, Internet, Plant Proteins/analysis, Protein Sequence Analysis
4.
Mol Cell Proteomics ; 11(7): M111.016808, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22415040

ABSTRACT

Identification of protein structural neighbors to a query is fundamental in structure and function prediction. Here we present BS-align, a systematic method to retrieve backbone string neighbors from primary sequences as templates for protein modeling. The backbone conformation of a protein is represented by the backbone string, as defined in Ramachandran space. The backbone string of a query can be accurately predicted by two innovative technologies: a knowledge-driven sequence alignment and encoding of a backbone string element profile. The predicted backbone string is then aligned against a backbone string database to retrieve a set of backbone string neighbors. The backbone string neighbors were shown to be close to the native structures of the query proteins. BS-align was successfully employed to predict models of 10 membrane proteins with lengths ranging from 229 to 595 residues, whose high-resolution structures had been difficult to determine both experimentally and by prediction. The TM-scores and root mean square deviations of the models confirmed that models based on the backbone string neighbors retrieved by BS-align were very close to the native membrane structures, even though each query and its neighbors shared very low sequence identity. The backbone string system offers a new route to predicting protein structure from sequence, and suggests that backbone string similarity may be more informative than describing a protein as belonging to a fold.


Subject(s)
Algorithms, Computational Biology/methods, Membrane Proteins/chemistry, Amino Acid Sequence, Protein Databases, Humans, Molecular Models, Molecular Sequence Data, Protein Conformation, Proteus mirabilis, Sequence Alignment, Protein Sequence Analysis, Amino Acid Sequence Homology, Protein Structural Homology
5.
Nucleic Acids Res ; 40(Web Server issue): W298-302, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22553364

ABSTRACT

Many studies have demonstrated that the shape string is an extremely important structure representation, because it is more complete than the classical secondary structure: it also provides detailed information in regions denoted as random coil. However, few services are available for systematic analysis of protein shape strings. To fill this gap, we have developed an accurate shape string predictor based on two innovative technologies: a knowledge-driven sequence alignment and a sequence shape string profile method. Performance on blind test data demonstrates that the proposed method can accurately predict protein shape strings. The DSP server provides both the predicted shape string and the sequence shape string profile for each query sequence. Using this information, users can compare protein structures or display protein evolution in shape string space. The DSP server is available at both http://cheminfo.tongji.edu.cn/dsp/ and its main mirror http://chemcenter.tongji.edu.cn/dsp/.


Subject(s)
Protein Conformation, Software, Internet, Sequence Alignment, Protein Sequence Analysis
6.
Bioinformatics ; 28(1): 32-9, 2012 Jan 01.
Article in English | MEDLINE | ID: mdl-22065541

ABSTRACT

MOTIVATION: The precise prediction of protein secondary structure is of key importance for the prediction of 3D structure and biological function. Although the development of many excellent methods over the last few decades has allowed prediction accuracies of up to 80%, progress seems to have reached a bottleneck, and further improvements in accuracy have proven difficult. RESULTS: We propose for the first time a structural position-specific scoring matrix (SPSSM), and establish an unprecedented database of 9 million sequences and their SPSSMs. This database, combined with a purpose-designed BLAST tool, provides a novel prediction tool: SPSSMPred. When SPSSMPred was validated on a large dataset (10,814 entries), the Q3 accuracy of protein secondary structure prediction was 93.4%. On the two latest EVA sets, our approach achieved accuracies of 82.7% and 82.0%, far higher than those of other predictors. For further evaluation, we tested our approach on newly determined sequences (141 entries) and obtained an accuracy of 89.6%. For a set of low-homology proteins (40 entries), SPSSMPred still achieved a Q3 value of 84.6%. AVAILABILITY: The SPSSMPred server is available at http://cal.tongji.edu.cn/SPSSMPred/. CONTACT: lith@tongji.edu.cn


Subject(s)
Position-Specific Scoring Matrices, Protein Secondary Structure, Proteins/chemistry, Animals, Humans, Amino Acid Sequence Homology
7.
Amino Acids ; 42(5): 1749-55, 2012 May.
Article in English | MEDLINE | ID: mdl-21424809

ABSTRACT

Numerous methods for predicting γ-turns in proteins have been developed. However, their results have generally been poor, with a Matthews correlation coefficient (MCC) ≤ 0.18. Here, we attempt to develop a method that improves the accuracy of γ-turn prediction. First, we employ the geometric mean metric as the optimization criterion to evaluate the performance of a support vector machine on the highly imbalanced γ-turn dataset. This metric tries to maximize both sensitivity and specificity while keeping them balanced. Second, we designed a predictor that generates protein shape strings by structure alignment against a protein structure database, and introduced the predicted shape string as a new variable for γ-turn prediction. On this basis, we developed a new method for γ-turn prediction. Trained and tested on a benchmark dataset of 320 non-homologous protein chains using fivefold cross-validation, the method achieves excellent performance: the overall prediction accuracy Qtotal reaches 92.2% and the MCC is 0.38, outperforming existing γ-turn prediction methods. Our results indicate that the protein shape string is useful for predicting protein tight turns and that it is reasonable to use dihedral angle information as a variable for machine learning to predict protein folding. The dataset used in this work and the software to generate predicted shape strings from a structure database are freely available from the anonymous FTP site ftp://cheminfo.tongji.edu.cn/GammaTurnPrediction/.
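To see why a geometric-mean criterion is preferred over raw accuracy on a highly imbalanced dataset, the sketch below computes Qtotal, sensitivity, specificity, their geometric mean and the MCC from a confusion matrix using the standard definitions. It is an illustrative assumption, not the paper's evaluation code, and the counts in the example are invented.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Common binary-classification scores from a confusion matrix."""
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on the positive class
    specificity = tn / (tn + fp) if tn + fp else 0.0  # recall on the negative class
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "q_total": (tp + tn) / total,  # overall accuracy
        "sensitivity": sensitivity,
        "specificity": specificity,
        # Geometric mean: high only if BOTH classes are recognised.
        "g_mean": math.sqrt(sensitivity * specificity),
        # MCC is conventionally set to 0 when any marginal total is 0.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# A trivial "always non-turn" predictor on a 5%-positive dataset:
# high accuracy, but zero G-mean and zero MCC.
print(binary_metrics(tp=0, fp=0, tn=950, fn=50))
```

The example shows the failure mode the abstract alludes to: a classifier can reach 95% accuracy while never predicting a single turn, whereas the geometric mean and MCC both drop to zero.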


Subject(s)
Protein Databases, Protein Conformation, Protein Secondary Structure, Proteins/chemistry, Algorithms, Amino Acid Sequence, Molecular Sequence Data, Computer Neural Networks, Protein Folding, Sequence Alignment, Software, Support Vector Machine
8.
J Theor Biol ; 308: 135-40, 2012 Sep 07.
Article in English | MEDLINE | ID: mdl-22683368

ABSTRACT

The subcellular localization of proteins is closely related to their functions. In this work, we propose a novel approach based on localization motifs to improve the accuracy of predicting the subcellular localization of Gram-positive bacterial proteins. Our approach performed well in five-fold cross-validation, with an overall success rate of 89.5%. In addition, the overall success rate on an independent test dataset was 97.7%. Moreover, our approach was tested on a new experimentally determined set of Gram-positive bacterial proteins and achieved an overall success rate of 96.3%.
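Five-fold cross-validation recurs throughout these abstracts. A minimal sketch of plain (non-stratified) k-fold index splitting follows; whether the authors stratified by localization class is not stated, so that detail is left out, and the function name is hypothetical.

```python
import random
from typing import List, Tuple


def k_fold_splits(n: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle indices 0..n-1 and return k (train, test) index splits.

    Every sample lands in exactly one test fold; fold sizes differ by at most one.
    """
    if not 1 < k <= n:
        raise ValueError("need 1 < k <= n")
    order = list(range(n))
    random.Random(seed).shuffle(order)  # fixed seed for reproducibility
    folds = [order[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((sorted(train), sorted(test)))
    return splits


# An overall success rate is then the fraction of correct predictions
# pooled over all k test folds.
for train, test in k_fold_splits(n=12, k=5):
    print(len(train), len(test))
```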


Subject(s)
Bacterial Proteins/chemistry, Bacterial Proteins/metabolism, Gram-Positive Bacteria/metabolism, Amino Acid Motifs, Amino Acid Sequence, Protein Databases, Biological Models, Molecular Sequence Data, Protein Transport, Reproducibility of Results, Subcellular Fractions/metabolism
9.
BMC Bioinformatics ; 12: 283, 2011 Jul 13.
Article in English | MEDLINE | ID: mdl-21749732

ABSTRACT

BACKGROUND: The β-turn is a type of protein secondary structure that plays an important role in protein configuration and function. Accurate methods for identifying β-turns in protein sequences are therefore valuable. Several β-turn prediction methods have been developed; however, prediction quality remains a challenge and there is substantial room for improvement. The innovations of the proposed method focus on discovering effective features and constructing a new model architecture. RESULTS: We used predicted secondary structures, predicted shape strings and the position-specific scoring matrix (PSSM) as input features, and proposed a novel two-layer model to enhance the prediction. On the BT426 dataset we achieved the highest values on four evaluation measures, i.e. Q(total) = 87.2%, MCC = 0.66, Q(observed) = 75.9%, and Q(predicted) = 73.8%. The results show that our two-layer model discriminates between β-turns and non-β-turns better than the single model, as reflected in its higher Q(predicted). Moreover, the predicted shape strings based on the structural alignment approach greatly improve performance, and the same improvements were observed on the BT547 and BT823 datasets. CONCLUSION: In this article, we present a comprehensive method for the prediction of β-turns. Experiments show that the proposed method constitutes a great improvement over competing prediction methods.


Subject(s)
Position-Specific Scoring Matrices, Protein Secondary Structure, Proteins/chemistry, Algorithms, Humans, Protein Sequence Analysis
10.
Nucleic Acids Res ; 37(17): 5632-40, 2009 Sep.
Article in English | MEDLINE | ID: mdl-19651875

ABSTRACT

Sequence-based motif prediction is of great interest and remains a challenge. In this work, we develop a local combinational variable approach for sequence-based helix-turn-helix (HTH) motif prediction. First, we chose a data set of 88 protein sequences 22 amino acids in length to launch an optimized traversal for extracting local combinational segments (LCS) from the data set. After LCS refinement, local combinational variables (LCV) are generated to construct prediction models for HTH motifs. The prediction ability of LCV sets at different thresholds is calculated to select a moderate threshold. The large data set we used comprises 13 HTH families, with 17,455 sequences in total. Using only primary protein sequence information, our approach predicts HTH motifs more precisely, with 93.29% accuracy, 93.93% sensitivity and 92.66% specificity. Comparing predictions for newly reported HTH-containing proteins with those of another prediction web service shows that the LCV approach yields a good prediction model. Comparisons with profile-HMM models from the Pfam protein families database show that the LCV approach maintains a good balance when dealing with HTH-containing and non-HTH proteins at the same time. The LCV approach is to some extent complementary to the profile-HMM models because it better identifies false-positive data. Furthermore, genome-wide predictions detect new HTH proteins in both Homo sapiens and Escherichia coli, which broadens the applications of the LCV approach. Software for mining LCVs from a sequence data set is freely available from the anonymous FTP site ftp://cheminfo.tongji.edu.cn/LCV/.


Subject(s)
Helix-Turn-Helix Motifs, Protein Sequence Analysis/methods, Algorithms, DNA-Binding Proteins/chemistry, DNA-Binding Proteins/genetics, Escherichia coli Proteins/chemistry, Escherichia coli Proteins/genetics, Genomics, Humans
11.
BMC Bioinformatics ; 11: 109, 2010 Feb 27.
Article in English | MEDLINE | ID: mdl-20187963

ABSTRACT

BACKGROUND: Recent advances in proteomics technologies such as SELDI-TOF mass spectrometry have shown promise in the detection of early-stage cancers. However, dimensionality reduction and classification remain considerable challenges in statistical machine learning. We therefore propose a novel approach for dimensionality reduction and test it using published high-resolution SELDI-TOF data for ovarian cancer. RESULTS: We propose a method based on statistical moments to reduce feature dimensions. After refining and t-testing, the SELDI-TOF data are divided into several intervals. Four statistical moments (mean, variance, skewness and kurtosis) are calculated for each interval and used as representative variables, so the high dimensionality of the data can be rapidly reduced. To improve efficiency and classification performance, the data are then used in kernel PLS models. Over 100 runs of five-fold cross-validation, the method achieved an average sensitivity of 0.9950, specificity of 0.9916, accuracy of 0.9935 and correlation coefficient of 0.9869. Furthermore, only one control was misclassified in leave-one-out cross-validation. CONCLUSION: The proposed method is suitable for analyzing high-throughput proteomics data.
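The moment-based reduction described above can be sketched as follows. The abstract does not say how interval boundaries are chosen after refining and t-testing, so contiguous equal-width index intervals are assumed; kurtosis is taken in its non-excess (Pearson) form, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def moments(values: Sequence[float]) -> Tuple[float, float, float, float]:
    """Mean, population variance, skewness and (non-excess) kurtosis."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    var = sum(d * d for d in dev) / n
    if var == 0:
        # Skewness and kurtosis are undefined for a constant interval; 0 is used by convention here.
        return mean, 0.0, 0.0, 0.0
    skew = sum(d ** 3 for d in dev) / n / var ** 1.5
    kurt = sum(d ** 4 for d in dev) / n / var ** 2
    return mean, var, skew, kurt


def interval_moment_features(spectrum: Sequence[float], n_intervals: int) -> List[float]:
    """Split a spectrum into contiguous, near-equal intervals and concatenate
    the four moments of each, giving 4 * n_intervals features."""
    if not 0 < n_intervals <= len(spectrum):
        raise ValueError("need 0 < n_intervals <= len(spectrum)")
    size = len(spectrum) // n_intervals
    features: List[float] = []
    for i in range(n_intervals):
        start = i * size
        # The last interval absorbs any remainder.
        stop = len(spectrum) if i == n_intervals - 1 else start + size
        features.extend(moments(spectrum[start:stop]))
    return features


intensities = [1, 2, 3, 4, 10, 10, 10, 10]  # toy intensities, not real SELDI-TOF data
print(interval_moment_features(intensities, n_intervals=2))
```

A spectrum with thousands of m/z points thus shrinks to four numbers per interval before being passed to a classifier such as kernel PLS.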


Subject(s)
Ovarian Neoplasms/classification, Proteomics/methods, Matrix-Assisted Laser Desorption-Ionization Mass Spectrometry/methods, Tumor Biomarkers/analysis, Female, Gene Expression Profiling, Humans
12.
Tree Physiol ; 40(10): 1466-1473, 2020 Oct 07.
Article in English | MEDLINE | ID: mdl-32510135

ABSTRACT

Accurate measurement of total fine root decomposition (the amount of dead fine roots decomposed per unit soil volume) is essential for constructing a soil carbon budget. However, ingrowth/soil core-based models depend on the assumptions that fine roots in litterbags/intact cores have the same relative decomposition rate as those in intact soils and that fine root growth and death rates remain constant over time, while minirhizotrons cannot quantify total fine root decomposition. To improve the accuracy of estimates of total fine root decomposition, we propose a new method (balanced hybrid) with two models that integrate soil coring and minirhizotron measurements into a mass balance model. Model input parameters were fine root biomass, necromass and turnover rate for Model 1, and fine root biomass, necromass and death rate for Model 2. We tested the balanced hybrid method in a loblolly pine plantation forest in coastal North Carolina, USA. The total decomposition rate of absorptive fine roots (ARs; first- and second-order fine roots combined) was 107 ± 13 g m⁻² year⁻¹ with Model 1 and 129 ± 12 g m⁻² year⁻¹ with Model 2. Monthly total AR decomposition was highest from August to November, coinciding with the highest monthly total AR mortality. The ARs imaged by minirhizotrons represent well those growing in intact soils, as evidenced by a significant positive relationship between standing biomass and standing length. In both models, the total decomposition estimate was sensitive to changes in fine root biomass, turnover rate and death rate but not to changes in necromass. Compared with Model 2, Model 1 avoids the technical difficulty of determining the time of death of individual fine roots but requires more time and effort to measure fine root biomass dynamics accurately.
The balanced hybrid method is an improved technique for measuring total fine root decomposition in plantation forests: its estimates are based on empirical data from soil coring and minirhizotrons, moving beyond the assumptions of traditional approaches.
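The abstract lists the inputs of the two balanced hybrid models but not their equations. The sketch below shows only the generic bookkeeping behind a necromass mass balance, assuming that the dead-root pool gains mass through mortality and loses it only through decomposition; estimating mortality as standing biomass multiplied by an annual rate is a further simplifying assumption. Neither function should be read as the authors' Model 1 or Model 2, and the numbers are invented.

```python
def mortality_from_rate(biomass: float, rate_per_year: float) -> float:
    """Annual fine-root mortality approximated as standing biomass x annual rate
    (a simplifying assumption, e.g. near steady state)."""
    return biomass * rate_per_year


def decomposition_from_balance(mortality: float, necromass_start: float,
                               necromass_end: float) -> float:
    """Necromass pool balance: inputs (mortality) minus the change in the pool
    equals the mass that left the pool through decomposition over the period."""
    return mortality - (necromass_end - necromass_start)


# Hypothetical values in g m^-2 over one year, NOT the paper's data.
m = mortality_from_rate(biomass=200.0, rate_per_year=0.6)
print(decomposition_from_balance(m, necromass_start=50.0, necromass_end=60.0))
```

This form also suggests why such an estimate can be less sensitive to necromass than to biomass and rates: necromass enters only through its change over the period, while biomass and rate multiply into the mortality term.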


Subject(s)
Soil, Trees, Biomass, Forests, Plant Roots
13.
Am J Chin Med ; 47(5): 1057-1073, 2019.
Article in English | MEDLINE | ID: mdl-31327236

ABSTRACT

Ginkgo biloba extracts (EGb) alleviate myocardial ischemia/reperfusion (MI/R) injury, but the underlying mechanisms have not yet been characterized. This study investigated whether activation of large-conductance Ca²⁺-activated K⁺ channels in the inner mitochondrial membrane (mitoBKCa) of cardiomyocytes is involved in Ginkgo biloba extract-mediated cardioprotection. Shuxuening injection (SXNI, 12.5 mL/kg/d), a herbal medicine containing Ginkgo biloba extracts that is widely prescribed in China, or vehicle was administered to C57BL/6 mice via tail vein injection for one week before surgical procedures. The mitoBKCa blocker paxilline (PAX; 1 mL/kg, 115 nM) was administered via tail vein injection 30 min before the onset of ischemia. The mice were randomly divided into the following groups: Sham, MI/R, MI/R+SXNI, and MI/R+SXNI+PAX. MI/R was induced by ligating the left anterior descending coronary artery for 30 min followed by reperfusion for 24 h. SXNI pretreatment conferred cardioprotective effects against MI/R injury, as evidenced by reduced infarct size and improved cardiac and mitochondrial function. However, these effects were abrogated by co-administration of PAX. In addition, activation of mitoBKCa by the Ginkgo biloba extract EGb761 reduced hypoxia/reoxygenation (H/R)-induced cardiomyocyte injury in vitro by inhibiting mitochondrial fragmentation, restoring the mitochondrial membrane potential, decreasing superoxide generation, and inhibiting apoptosis, effects associated with alleviation of mitochondrial Ca²⁺ overload. These results indicate that pretreatment with Ginkgo biloba extracts protected against MI/R injury via activation of mitoBKCa.


Subject(s)
Ginkgo biloba/chemistry, Large-Conductance Calcium-Activated Potassium Channels/metabolism, Mitochondrial Membranes/drug effects, Myocardial Reperfusion Injury/drug therapy, Cardiac Myocytes/drug effects, Plant Extracts/administration & dosage, Animals, Apoptosis/drug effects, China, Humans, Large-Conductance Calcium-Activated Potassium Channels/genetics, Male, Mitochondrial Membrane Potential/drug effects, Mice, Inbred C57BL Mice, Mitochondrial Membranes/metabolism, Myocardial Reperfusion Injury/genetics, Myocardial Reperfusion Injury/metabolism, Myocardial Reperfusion Injury/physiopathology, Cardiac Myocytes/metabolism
14.
BioData Min ; 10: 1, 2017.
Article in English | MEDLINE | ID: mdl-28127402

ABSTRACT

BACKGROUND: Protein relative solvent accessibility provides insight into protein structure and function. Prediction of protein relative solvent accessibility is often the first stage in predicting other protein properties. Recent relative solvent accessibility predictors discriminate against exposed regions relative to buried regions, resulting in higher prediction accuracy for buried regions than for exposed regions. METHODS: Here, we propose a more accurate and balanced predictor of protein relative solvent accessibility. First, we collected known proteins in three subsets according to sequence length and constructed a balanced dataset after reducing redundancy within each subset. Next, we measured the performance associated with different variables and variable combinations to determine the best variable combination. Finally, we constructed a predictor called BMRSA for modelling and prediction, which uses the balanced set as the training set; the position-specific scoring matrix, predicted secondary structure, buried-exposed profile, and length of the query sequence as variables; and a conditional random field as the machine-learning method. RESULTS: BMRSA performance on test sets confirmed that our approach improved prediction accuracy relative to state-of-the-art approaches and was balanced between buried and exposed regions. Our method is valuable when higher accuracy in predicting exposed-residue states is required. BMRSA is available at: http://cheminfo.tongji.edu.cn:8080/BMRSA/.
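Relative solvent accessibility and the buried/exposed imbalance that BMRSA targets can be illustrated as below. The reference maximum ASA is passed in by the caller, and the 25% two-state threshold and the function names are assumptions for illustration; BMRSA's own thresholds are not given in the abstract.

```python
from typing import Dict, Sequence


def relative_solvent_accessibility(asa: float, max_asa: float) -> float:
    """RSA = observed ASA / reference maximum ASA for that residue type."""
    if max_asa <= 0:
        raise ValueError("max_asa must be positive")
    return asa / max_asa


def buried_or_exposed(asa: float, max_asa: float, threshold: float = 0.25) -> str:
    """Two-state label; the 25% cut-off is an assumed, commonly used choice."""
    return "exposed" if relative_solvent_accessibility(asa, max_asa) >= threshold else "buried"


def per_class_accuracy(true: Sequence[str], pred: Sequence[str]) -> Dict[str, float]:
    """Accuracy within each true class, which reveals bias towards one class."""
    result = {}
    for label in sorted(set(true)):
        idx = [i for i, t in enumerate(true) if t == label]
        result[label] = sum(pred[i] == label for i in idx) / len(idx)
    return result


# A predictor that calls everything "buried" is 75% accurate overall on
# this toy set, yet scores 0 on exposed residues.
print(per_class_accuracy(["buried"] * 3 + ["exposed"], ["buried"] * 4))
```

Reporting accuracy per class, rather than a single pooled figure, is one simple way to make the kind of imbalance described in the BACKGROUND visible.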

15.
J Pharm Biomed Anal ; 125: 77-84, 2016 Jun 05.
Article in English | MEDLINE | ID: mdl-27010354

ABSTRACT

Positive acceleration (+Gz) in the head-to-foot direction generated by modern high-performance fighter jets during flight maneuvers is characterized by high G values and a rapid rate of acceleration, and is often prolonged and repeated. The acceleration overload far exceeds the pilot's physiological tolerance limits and places considerable strain on several organ systems. Despite the importance of monitoring pathophysiological alterations related to +Gz exposure, a complete explanation of its pathophysiology is lacking. Ginkgo biloba extract (GBE) is a classic traditional Chinese medicine (TCM) that might exert a protective effect against +Gz exposure; however, its mechanism remains unclear. Here, a metabolomics approach based on ultra high-performance liquid chromatography coupled with quadrupole time-of-flight mass spectrometry (UHPLC-Q-TOFMS) was used to characterize +Gz-induced metabolic fluctuations in a rat model and to evaluate the protective effect of GBE. Using partial least-squares discriminant analysis for classification and biomarker selection, eighteen serum metabolites related to +Gz exposure were identified; they primarily involve the fatty acid β-oxidation pathway, glycerophospholipid metabolism, phospholipid metabolism, bile acid metabolism, purine metabolism and lysine metabolism. Taking these potential biomarkers as screening indexes, we found that GBE could reverse the pathological process of +Gz exposure by partially regulating the perturbed fatty acid β-oxidation pathway, glycerophospholipid metabolism, purine metabolism and lysine metabolism. This indicates that UHPLC-Q-TOFMS-based metabolomics is a powerful tool for revealing serum metabolic fluctuations in response to +Gz exposure and for studying the mechanisms underlying TCM.


Subject(s)
High-Pressure Liquid Chromatography/methods, Ginkgo biloba/chemistry, Mass Spectrometry/methods, Metabolomics, Plant Extracts/pharmacology, Animals, Biomarkers/metabolism, Rats, Sprague-Dawley Rats
16.
PLoS One ; 10(6): e0128334, 2015.
Article in English | MEDLINE | ID: mdl-26090958

ABSTRACT

The precise prediction of protein intrinsically disordered regions, which play a crucial role in biological processes, is a necessary prerequisite for furthering our understanding of the principles and mechanisms of protein function. Here, we propose DisoMCS, a novel and more accurate predictor of protein intrinsically disordered regions. DisoMCS is based on an original multi-class conservative score (MCS) obtained by sequence-order/disorder alignment. First, near-disorder regions are defined as fragments located at the termini of an ordered region where it connects to a disordered region. Then the multi-class conservative score is generated by sequence alignment against a database of known structures and represented as order, near-disorder and disorder conservative scores. The MCS of each amino acid has three elements: order, near-disorder and disorder profiles. Finally, the MCS is used as a feature to identify disordered regions in sequences. DisoMCS uses a non-redundant data set as the training set, the MCS and predicted secondary structure as features, and a conditional random field as the classification algorithm. In predicted near-disorder regions, a residue is classified as ordered or disordered according to an optimized decision threshold. DisoMCS was evaluated by cross-validation, large-scale prediction, independent tests and CASP (Critical Assessment of Techniques for Protein Structure Prediction) tests. All results confirmed that DisoMCS was very competitive in prediction accuracy when compared with well-established, publicly available disordered region predictors. The results also indicated that our approach is more accurate when a query has higher homology to the knowledge database. AVAILABILITY: DisoMCS is available at http://cal.tongji.edu.cn/disorder/.


Subject(s)
Computational Biology/methods, Intrinsically Disordered Proteins/chemistry, Software, Protein Databases, ROC Curve, Reproducibility of Results, Web Browser
17.
Biochimie ; 95(12): 2460-4, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24056076

ABSTRACT

Protein eight-state secondary structure prediction is challenging but necessary for determining protein structure and function. Here, we report the development of a novel approach, SPSSM8, to accurately predict eight-state secondary structures of proteins from sequences based on the structural position-specific scoring matrix (SPSSM). The SPSSM has been successfully used to predict three-state secondary structures. In this work we employ an eight-state SPSSM as a feature, obtained by sequence-structure alignment against a large database of 9 million sequences with putative structural information. SPSSM8 uses a low-sequence-identity dataset (9062 entries) as the training set and a conditional random field as the classification algorithm. On an independent test set (463 entries), SPSSM8 achieved an average eight-state secondary structure accuracy (Q8) of 71.7% (Q3, 81.6%), an improvement of 10.1% and 4.6% over SSPro8 and CNF, respectively, significantly improving the accuracy of eight-state secondary structure prediction. On the CASP9 dataset (92 entries), SPSSM8 achieved a Q8 accuracy of 80.1% (Q3, 83.0%). SPSSM8 was confirmed as an outstanding predictor of eight-state protein secondary structures. SPSSM8 is freely available at http://cal.tongji.edu.cn/SPSSM8.
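The relation between the reported Q8 and Q3 values can be made concrete with a small sketch: Q8 is per-residue agreement over eight DSSP states, and Q3 is the same measure after collapsing to three states. The 8-to-3 mapping below is a common convention assumed for illustration (the abstract does not state which reduction was used), and the toy strings are invented.

```python
from typing import Dict

# Assumed DSSP 8-to-3 reduction (H,G,I -> H; E,B -> E; everything else -> C).
# Other reductions exist; the abstract does not say which one was used for Q3.
EIGHT_TO_THREE: Dict[str, str] = {
    "H": "H", "G": "H", "I": "H",
    "E": "E", "B": "E",
    "T": "C", "S": "C", "C": "C", "L": "C", "-": "C",
}


def q_accuracy(true: str, pred: str) -> float:
    """Per-residue accuracy (Q8 on 8-state strings, Q3 on 3-state strings)."""
    if len(true) != len(pred) or not true:
        raise ValueError("strings must be non-empty and of equal length")
    return sum(t == p for t, p in zip(true, pred)) / len(true)


def to_three_state(ss8: str) -> str:
    """Collapse an 8-state string to 3 states using EIGHT_TO_THREE."""
    return "".join(EIGHT_TO_THREE[s] for s in ss8)


true8, pred8 = "HHHGGEEBTS", "HHHHHEEECC"
print(q_accuracy(true8, pred8))                                  # Q8: 0.5
print(q_accuracy(to_three_state(true8), to_three_state(pred8)))  # Q3: 1.0
```

Because confusions within the helix, strand or coil families vanish after reduction, Q3 is always at least as high as Q8 for the same prediction, consistent with the paired values reported above.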


Subject(s)
Protein Secondary Structure, Proteins/chemistry, Amino Acid Sequence, Protein Databases, Position-Specific Scoring Matrices, Sequence Alignment
18.
PLoS One ; 8(12): e83532, 2013.
Article in English | MEDLINE | ID: mdl-24376713

ABSTRACT

A shape string is a structural sequence and an extremely important representation of protein backbone conformations. Nuclear magnetic resonance chemical shifts correlate strongly with local protein structure and are exploited, in conjunction with computational approaches, to predict protein structures. Here we present a novel approach, NMRDSP, which accurately predicts protein shape strings from nuclear magnetic resonance chemical shifts and structural profiles obtained from sequence data. NMRDSP uses six chemical shifts (HA, H, N, CA, CB and C) and eight structure-profile elements as features, a non-redundant set (1,003 entries) as the training set, and a conditional random field as the classification algorithm. On an independent test set (203 entries), we achieved an accuracy of 75.8% for S8 (eight-state accuracy) and 87.8% for S3 (three-state accuracy). This is higher than using chemical shifts or sequence data alone, and confirms that chemical shifts and structure profiles are significant features for shape string prediction and that their combination markedly improves the accuracy of the predictor. We have constructed the NMRDSP web server and believe it could provide a solid platform for predicting other protein structures and functions. The NMRDSP web server is freely available at http://cal.tongji.edu.cn/NMRDSP/index.jsp.


Subject(s)
Computational Biology/methods , Proteins/chemistry , Amino Acid Sequence , Internet , Magnetic Resonance Spectroscopy , Protein Structure, Secondary
19.
Biochimie ; 95(2): 354-8, 2013 Feb.
Article in English | MEDLINE | ID: mdl-23116714

ABSTRACT

Protein-DNA interactions are involved in many biological processes essential for gene expression and regulation. To understand the molecular mechanisms of protein-DNA recognition, it is crucial to analyze and identify the DNA-binding residues of protein-DNA complexes. Here, we propose the shape string as a novel descriptor, together with two related features, the shape string PSSM and the shape string pair composition, to characterize DNA-binding residues. We used these new features together with the position-specific scoring matrix (PSSM) for modeling and prediction. Results on a benchmark dataset showed that our approach significantly improved predictor accuracy: overall accuracy reached 85.86%, with 85.02% sensitivity and 86.02% specificity. The results also demonstrated that the shape string is a powerful descriptor for predicting DNA-binding residues, and the two related features further enhanced predictive performance.
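Two quantities in this abstract can be made concrete: a pair-composition feature over a shape string, and the accuracy, sensitivity and specificity used to report results. The sketch below assumes that pair composition means the relative frequency of each ordered pair of adjacent shape letters within a window around a residue; the window definition and the choice to pass the shape alphabet in rather than hard-code it are assumptions, and the paper's exact definition may differ. Function names are hypothetical.

```python
# Hypothetical sketch: a shape string pair composition feature and the
# accuracy/sensitivity/specificity metrics. The window definition and the
# caller-supplied alphabet are assumptions, not the paper's exact encoding.
from itertools import product


def pair_composition(shape: str, alphabet: str) -> list[float]:
    """Relative frequency of every ordered pair of adjacent shape letters.

    Pairs are returned in itertools.product order over `alphabet`, giving a
    fixed-length vector of len(alphabet) ** 2 values.
    """
    counts = {a + b: 0 for a, b in product(alphabet, repeat=2)}
    for a, b in zip(shape, shape[1:]):
        counts[a + b] += 1  # a KeyError flags letters outside the alphabet
    total = max(len(shape) - 1, 1)
    return [c / total for c in counts.values()]


def windowed_pair_composition(shape: str, alphabet: str, center: int, half_width: int = 3) -> list[float]:
    """Pair composition of the shape-string window centred on one residue."""
    start = max(0, center - half_width)
    end = min(len(shape), center + half_width + 1)
    return pair_composition(shape[start:end], alphabet)


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float, float]:
    """Return (accuracy, sensitivity, specificity), treating binding residues as positives."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity
```

Because accuracy is a class-size-weighted average of sensitivity and specificity, it always lies between the two, which is consistent with the reported 85.86% falling between 85.02% and 86.02%.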


Subject(s)
Algorithms , DNA/chemistry , Position-Specific Scoring Matrices , Proteins/chemistry , Software , Amino Acid Sequence , Binding Sites , Databases, Protein , Models, Molecular , Molecular Sequence Data , Protein Binding , Protein Interaction Domains and Motifs , Sensitivity and Specificity
20.
PLoS One ; 8(4): e60559, 2013.
Article in English | MEDLINE | ID: mdl-23593247

ABSTRACT

MOTIVATION: The precise prediction of protein domains, the structural, functional and evolutionary units of proteins, has been a research focus in recent years. Although many methods have been presented for predicting protein domains and their boundaries, prediction accuracy can still be improved. RESULTS: In this study we present DomHR, a novel and accurate predictor of protein domain boundaries based on a hinge-region strategy. A hinge region is defined as a segment of amino acids that spans part of a domain region and a boundary region. We developed a strategy to construct domain-hinge-boundary (DHB) feature profiles, generated by sequence-domain/hinge/boundary alignment against a database of known domain structures. The DHB features have three elements, the normalized domain, hinge and boundary probabilities, and were used as input to identify domain boundaries in a sequence. DomHR uses a non-redundant dataset as the training set, the DHB features and the predicted shape string as features, and a conditional random field as the classification algorithm. Within predicted hinge regions, each residue is assigned to a domain or a boundary according to a decision threshold. After the decision thresholds were optimized, DomHR was evaluated by cross-validation, large-scale prediction, an independent test and CASP (Critical Assessment of Techniques for Protein Structure Prediction) tests. All results confirmed that DomHR outperformed other well-established, publicly available domain boundary predictors in prediction accuracy. AVAILABILITY: DomHR is available at http://cal.tongji.edu.cn/domain/.
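Two steps described for DomHR lend themselves to a short illustration: turning domain, hinge and boundary evidence at a position into three normalized probabilities, and resolving residues in predicted hinge regions into domain or boundary with a decision threshold. The sketch below assumes the evidence is a simple count of aligned hits, uses a uniform fallback when there is no evidence, and uses hypothetical labels "D", "H" and "B"; none of these details are taken from the paper, and the function names are hypothetical.

```python
# Hypothetical sketch of two DomHR ideas: normalizing domain/hinge/boundary
# alignment counts into the three DHB probabilities, and resolving residues in
# predicted hinge regions into domain ("D") or boundary ("B") with a decision
# threshold. Count-based normalization, the uniform fallback, the label
# alphabet and the default threshold are assumptions for illustration.


def dhb_profile(domain_hits: int, hinge_hits: int, boundary_hits: int) -> tuple[float, float, float]:
    """Normalize alignment counts at one position into (P_domain, P_hinge, P_boundary)."""
    total = domain_hits + hinge_hits + boundary_hits
    if total == 0:
        return (1 / 3, 1 / 3, 1 / 3)  # no aligned evidence: uniform fallback
    return (domain_hits / total, hinge_hits / total, boundary_hits / total)


def resolve_hinges(labels: str, boundary_probs: list[float], threshold: float = 0.5) -> str:
    """Replace each hinge label 'H' with 'B' if its boundary probability
    reaches the threshold, otherwise with 'D'; other labels pass through."""
    if len(labels) != len(boundary_probs):
        raise ValueError("labels and probabilities must have equal length")
    resolved = []
    for label, p_boundary in zip(labels, boundary_probs):
        if label == "H":
            resolved.append("B" if p_boundary >= threshold else "D")
        else:
            resolved.append(label)
    return "".join(resolved)


if __name__ == "__main__":
    crf_labels = "DDHHHBB"  # hypothetical per-residue classifier output
    p_b = [0.1, 0.1, 0.2, 0.6, 0.7, 0.9, 0.9]
    print(resolve_hinges(crf_labels, p_b, threshold=0.5))
```

Since the abstract states that the decision thresholds were optimized, one plausible approach is to sweep `threshold` over a grid on validation data and keep the value that maximizes a boundary-accuracy score; the sketch leaves that choice to the caller.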


Subject(s)
Algorithms , Computational Biology/methods , Protein Structure, Tertiary/genetics , Proteins/chemistry , Proteomics/methods , Software , Amino Acid Sequence , Databases, Protein , Molecular Sequence Data