1 - 15 of 15
1.
Plant Physiol ; 182(1): 408-423, 2020 01.
Article En | MEDLINE | ID: mdl-31685645

Members of the mitochondrial transcription terminator factor (mTERF) family, originally identified in vertebrate mitochondria, are involved in the termination of organellar transcription. In plants, mTERF proteins are mainly localized in chloroplasts and mitochondria. In Arabidopsis (Arabidopsis thaliana), mTERF8/pTAC15 was identified in the plastid-encoded RNA polymerase (PEP) complex, the major RNA polymerase of chloroplasts. In this work, we demonstrate that mTERF8 is associated with the PEP complex. An mTERF8 knockout line displayed a wild-type-like phenotype under standard growth conditions, but showed impaired efficiency of photosystem II electron flow. Transcription of most chloroplast genes was not substantially affected in the mterf8 mutant; however, the level of the psbJ transcript from the psbEFLJ polycistron was increased. RNA blot analysis showed that a larger transcript accumulates in mterf8 than in the wild type. Thus, abnormal transcription and/or RNA processing occur for the psbEFLJ polycistron. Circular reverse transcription PCR and sequence analysis showed that the psbJ transcript terminates 95 nucleotides downstream of the translation stop codon in the wild type, whereas its termination is aberrant in mterf8. Both electrophoretic mobility shift assays and chloroplast chromatin immunoprecipitation analysis showed that mTERF8 specifically binds to the 3' terminal region of psbJ. Transcription analysis using the in vitro T7 RNA polymerase system showed that mTERF8 terminates psbJ transcription. Together, these results suggest that mTERF8 is specifically involved in the transcription termination of the chloroplast gene psbJ.


Arabidopsis Proteins/metabolism , Arabidopsis/metabolism , Chloroplasts/metabolism , Transcription, Genetic/genetics , Arabidopsis/genetics , Arabidopsis Proteins/genetics , Chloroplasts/genetics , Chromatin Immunoprecipitation , DNA-Directed RNA Polymerases/genetics , DNA-Directed RNA Polymerases/metabolism , Electrophoretic Mobility Shift Assay , Protein Binding
2.
Protein Pept Lett ; 20(3): 243-8, 2013 Mar.
Article En | MEDLINE | ID: mdl-22591473

Protein disordered regions are associated with some critical cellular functions such as transcriptional regulation, translation and cellular signal transduction, and they are responsible for various diseases. Although experimental methods have been developed to determine these regions, they are time-consuming and expensive. Therefore, it is highly desirable to develop computational methods that can provide this kind of information in a rapid and inexpensive manner. Here we propose a sequence-based computational approach for predicting protein disordered regions by means of the Nearest Neighbor algorithm, in which conservation, amino acid factor and secondary structure status of each amino acid in a fixed-length sliding window are taken as the encoding features. Also, feature selection based on mRMR (maximum Relevancy Minimum Redundancy) is applied to obtain an optimal 51-feature set that includes 39 conservation features and 12 secondary structure features. With the optimal 51 features, our predictor yielded quite promising MCC (Matthews correlation coefficient) values: 0.371 on a rigorous benchmark dataset tested by 5-fold cross-validation and 0.219 on an independent test dataset. Our results suggest that conservation and secondary structure play important roles in intrinsically disordered proteins.


Amino Acids/chemistry , Protein Structure, Secondary , Proteins/chemistry , Sequence Analysis, Protein , Algorithms , Humans
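
The abstract above reports its results as Matthews correlation coefficients (0.371 and 0.219). As a reading aid, here is a minimal sketch of how that score is computed from binary confusion counts. The function name and the zero-denominator convention are assumptions for illustration; the abstract does not describe the authors' implementation.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion counts.

    tp/tn/fp/fn are the numbers of true positives, true negatives,
    false positives and false negatives (e.g. per-residue
    disordered vs. ordered predictions).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0.0 when a row or column of the
    # confusion matrix is empty and the coefficient is undefined.
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts, not taken from the paper.
    print(matthews_corrcoef(tp=40, tn=120, fp=30, fn=10))
```

The score ranges from -1 to 1, with 0 corresponding to chance-level prediction, which makes it more informative than plain accuracy when disordered residues are a small minority of the data.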
3.
Protein Pept Lett ; 19(1): 99-107, 2012 Jan.
Article En | MEDLINE | ID: mdl-21919854

Given a compounds-forming system, i.e., a system consisting of some compounds and their relationship, can it form a biologically meaningful pathway? It is a fundamental problem in systems biology. Nowadays, a lot of information on different organisms, at both genetic and metabolic levels, has been collected and stored in some specific databases. Based on these data, it is feasible to address such an essential problem. A metabolic pathway is one kind of compounds-forming system, and we analyzed such systems in yeast by extracting different (biological and graphic) features from each of the 13,736 compounds-forming systems, of which 136 are positive pathways, i.e., known metabolic pathways from KEGG, while 13,600 were negative. Each of these compounds-forming systems was represented by 144 features, of which 88 are graph features and 56 biological features. "Minimum Redundancy Maximum Relevance" and "Incremental Feature Selection" were utilized to analyze these features, and 16 optimal features were selected as being able to predict a query compounds-forming system most successfully. It was found through Jackknife cross-validation that the overall success rate of identifying the positive pathways was 74.26%. It is anticipated that this novel approach and encouraging result may give meaningful illumination to investigate this important topic.


Algorithms , Metabolic Networks and Pathways , Saccharomyces cerevisiae/metabolism , Databases, Factual , Predictive Value of Tests , Saccharomyces cerevisiae/chemistry , Systems Biology
4.
Protein Pept Lett ; 19(1): 70-8, 2012 Jan.
Article En | MEDLINE | ID: mdl-21919857

Phosphorylation is one of the most important post-translational modifications, and the identification of protein phosphorylation sites is particularly important for studying disease diagnosis. However, experimental detection of phosphorylation sites is labor intensive. It would be beneficial if computational methods were available to provide an extra reference for the phosphorylation sites. Here we developed a novel sequence-based method for serine, threonine, and tyrosine phosphorylation site prediction. The Nearest Neighbor algorithm was employed as the prediction engine. The peptides around the phosphorylation sites with a fixed length of thirteen amino acid residues were extracted via a sliding window along the protein chains concerned. Each of these peptides was coded into a vector with 6,072 features, derived from the Amino Acid Index (AAIndex) database, for the classification/detection. Incremental Feature Selection, a feature selection algorithm based on the Maximum Relevancy Minimum Redundancy (mRMR) method, was used to select a compact feature set for a further improvement of the classification performance. Three predictors were established for identifying the three types of phosphorylation sites, achieving overall accuracies of 66.64%, 66.11% and 66.69%, respectively. These rates were obtained by rigorous jackknife cross-validation tests.


Peptides/chemistry , Phosphoproteins/chemistry , Sequence Analysis, Protein/methods , Support Vector Machine , Binding Sites , Computational Biology , Data Mining , Databases, Protein , Peptides/metabolism , Phosphoproteins/metabolism , Phosphorylation , Predictive Value of Tests , Protein Processing, Post-Translational , Serine/metabolism , Threonine/metabolism , Tyrosine/metabolism
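
The abstract above extracts 13-residue peptides around serine, threonine, and tyrosine residues with a sliding window. A minimal sketch of that step follows. It assumes the window is centered on the candidate residue (six residues on each side) and that positions beyond the chain termini are padded with a placeholder "X"; neither detail is stated in the abstract, and the function name is hypothetical.

```python
def candidate_windows(sequence, targets="STY", flank=6, pad="X"):
    """Return (position, residue, peptide) for every candidate site.

    position is 1-based. Each peptide has length 2 * flank + 1
    (13 by default) with the candidate residue in the middle.
    Padding with `pad` near the termini is an assumption made here
    so that every window has the same length.
    """
    padded = pad * flank + sequence + pad * flank
    windows = []
    for i, residue in enumerate(sequence):
        if residue in targets:
            # In `padded`, residue i sits at index i + flank, so the
            # slice starting at i is centered on it.
            windows.append((i + 1, residue, padded[i:i + 2 * flank + 1]))
    return windows


if __name__ == "__main__":
    # Hypothetical toy sequence, not a real protein.
    for pos, res, peptide in candidate_windows("MKRSPELTAAYG"):
        print(pos, res, peptide)
```

Each peptide produced this way would then be encoded into a numeric feature vector (AAIndex-derived in the paper) before classification.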
5.
Protein Pept Lett ; 19(1): 91-8, 2012 Jan.
Article En | MEDLINE | ID: mdl-21919855

Identifying and clarifying the interactions between enzymes and small molecules is of great use for understanding the molecular and cellular functions of organisms. In this study, we developed a novel method for the prediction of enzyme-small molecule interactions based on a machine learning approach. The biochemical and physicochemical description of proteins and the functional group composition of small molecules are used for representing enzyme-small molecule pairs. Tested by jackknife cross-validation, our predictor achieved an overall accuracy of 87.47%, showing an acceptable efficiency. The 39 features selected by feature selection were analyzed for further understanding of enzyme-small molecule interactions.


Algorithms , Proteins/chemistry , Sequence Analysis, Protein/methods , Small Molecule Libraries/chemistry , Software , Support Vector Machine , Amino Acid Sequence , Computational Biology , Databases, Protein , Hydrophobic and Hydrophilic Interactions , Molecular Sequence Data , Predictive Value of Tests , Protein Binding , Proteins/metabolism , Small Molecule Libraries/metabolism
6.
Protein Pept Lett ; 19(1): 15-22, 2012 Jan.
Article En | MEDLINE | ID: mdl-21919864

It is well known that protein subcellular localizations are closely related to their functions. Although many computational methods and tools are available on the Internet, it is still necessary to develop new algorithms in this field to gain a better understanding of the complex mechanism of plant subcellular localization. Here, we provide a new web server named PSCL for plant protein subcellular localization prediction by employing optimized functional domains. After feature optimization, 848 optimal functional domains from InterPro were obtained to represent each protein. By calculating the distances to each of the seven categories, PSCL shows the possibilities of a protein being located in each of those categories in ascending order. On our dataset, PSCL achieved a first-order prediction accuracy of 75.7% by jackknife test. Gene Ontology enrichment analysis showed that catalytic activity, cellular process and metabolic process are strongly correlated with the localization of plant proteins. Finally, a web interface for the PSCL predictor, based on the Linux operating system, was designed and is accessible for public use at http://pscl.biosino.org/.


Plant Cells/chemistry , Plant Proteins/chemistry , Plants/chemistry , Software , Subcellular Fractions/chemistry , Algorithms , Biological Evolution , Computational Biology , Databases, Protein , Phylogeny , Plant Cells/physiology , Plant Proteins/genetics , Protein Structure, Tertiary
7.
Biopolymers ; 95(11): 763-71, 2011 Nov.
Article En | MEDLINE | ID: mdl-21544797

Protein methylation, one of the most important post-translational modifications, typically takes place on arginine or lysine residues. The reversible modification involves a series of basic cellular processes. Identification of methyl proteins with their sites will facilitate the understanding of the molecular mechanism of methylation. Besides the experimental methods, computational predictions of methylated sites are much more desirable for their convenience and speed. Here, we propose a method dedicated to predicting methylated sites of proteins. Feature selection was made on sequence conservation, physicochemical/biochemical properties, and structural disorder by applying maximum relevance minimum redundancy and incremental feature selection methods. The prediction models were built according to the nearest neighbor algorithm and evaluated by jackknife cross-validation. We built 11 and 9 predictors for methylarginine and methyllysine, respectively, and integrated them to predict methylated sites. As a result, the average prediction accuracies are 74.25% and 77.02% for the methylarginine and methyllysine training sets, respectively. Feature analysis suggested that evolutionary information and physicochemical/biochemical properties play important roles in the recognition of methylated sites. These findings may provide valuable information for exploiting the mechanisms of methylation. Our method may serve as a useful tool for biologists to find the potential methylated sites of proteins.


Arginine/chemistry , Lysine/chemistry , Methylation , Models, Biological
8.
Biochimie ; 93(3): 489-96, 2011 Mar.
Article En | MEDLINE | ID: mdl-21075167

Palmitoylation is a universal and important lipid modification, involving a series of basic cellular processes, such as membrane trafficking, protein stability and protein aggregation. With the avalanche of new protein sequences generated in the post-genomic era, it is highly desirable to develop computational methods for rapidly and effectively identifying the potential palmitoylation sites of uncharacterized proteins so as to provide timely and useful information for revealing the mechanism of protein palmitoylation. By using the Incremental Feature Selection approach based on amino acid factors, conservation, disorder feature, and specific features of the palmitoylation site, a new predictor named IFS-Palm was developed in this regard. The overall success rate thus achieved by jackknife test on a newly constructed benchmark dataset was 90.65%. It was shown via an in-depth analysis that palmitoylation was intimately correlated with the feature of the upstream residue directly adjacent to the cysteine site as well as the conservation of the amino acid cysteine. Meanwhile, the protein disorder region might also play an important role in the post-translational modification. These findings may provide useful insights for revealing the mechanisms of palmitoylation.


Computational Biology/methods , Lipoylation , Proteins/chemistry , Proteins/metabolism , Algorithms , Amino Acid Sequence , Binding Sites , Databases, Protein , Reproducibility of Results , Saccharomycetales/metabolism
9.
Molecules ; 15(11): 8177-92, 2010 Nov 12.
Article En | MEDLINE | ID: mdl-21076385

Given a protein-forming system, i.e., a system consisting of a certain number of different proteins, can it form a biologically meaningful pathway? This is a fundamental problem in systems biology and proteomics. During the past decade, a vast amount of information on different organisms, at both the genetic and metabolic levels, has been accumulated and systematically stored in various specific databases, such as KEGG, ENZYME, BRENDA, EcoCyc and MetaCyc. These data have made it feasible to address such an essential problem. In this paper, we have analyzed known regulatory pathways in humans by extracting different (biological and graphic) features from each of the 17,069 protein-forming systems, of which 169 are positive pathways, i.e., known regulatory pathways taken from KEGG, while 16,900 were negative, i.e., not formed as a biologically meaningful pathway. Each of these protein-forming systems was represented by 352 features, of which 88 are graph features and 264 biological features. To analyze these features, the "Minimum Redundancy Maximum Relevance" and the "Incremental Feature Selection" techniques were utilized to select a set of 22 optimal features to query whether a protein-forming system is able to form a biologically meaningful pathway or not. It was found through cross-validation that the overall success rate thus obtained in identifying the positive pathways was 79.88%. It is anticipated that this novel approach and encouraging result, although preliminary, may stimulate extensive investigations into this important topic.


Proteins/metabolism , Signal Transduction/physiology , Animals , Databases, Genetic , Humans , Proteins/genetics , Proteomics/methods , Signal Transduction/genetics , Systems Biology/methods
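
The abstract above selects its 22 optimal features with "Incremental Feature Selection" applied to an mRMR-ranked list. One common reading of that procedure is sketched below: features are added one at a time in ranked order, each nested subset is scored by cross-validation, and the best-scoring subset is kept. The leave-one-out 1-nearest-neighbor scorer with Euclidean distance and all function names are assumptions for illustration, not details confirmed by the abstract.

```python
import math


def loo_accuracy(rows, labels, feature_ids):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier
    restricted to the given feature indices (Euclidean distance)."""
    def project(row):
        return [row[f] for f in feature_ids]

    hits = 0
    for i, row in enumerate(rows):
        others = [j for j in range(len(rows)) if j != i]
        nearest = min(
            others,
            key=lambda j: math.dist(project(row), project(rows[j])),
        )
        hits += labels[nearest] == labels[i]
    return hits / len(rows)


def incremental_feature_selection(rows, labels, ranked_features):
    """Score the nested subsets ranked[:1], ranked[:2], ... and
    return the first best-scoring subset with its accuracy."""
    best_k, best_acc = 0, -1.0
    for k in range(1, len(ranked_features) + 1):
        acc = loo_accuracy(rows, labels, ranked_features[:k])
        if acc > best_acc:  # strict '>' keeps the smaller subset on ties
            best_k, best_acc = k, acc
    return ranked_features[:best_k], best_acc


if __name__ == "__main__":
    # Hypothetical toy data: feature 0 separates the classes,
    # feature 1 is noise on a larger scale.
    rows = [[0.0, 9.0], [0.1, 0.0], [1.0, 9.1], [1.1, 0.1]]
    labels = [0, 0, 1, 1]
    print(incremental_feature_selection(rows, labels, [0, 1]))
```

Preferring the smaller subset on ties is a design choice made here for compactness; other tie-breaking rules are equally defensible.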
10.
PLoS One ; 5(6): e10972, 2010 Jun 04.
Article En | MEDLINE | ID: mdl-20532046

Metabolic stability is a very important characteristic of proteins that is related to their global flexibility, intramolecular fluctuations, various internal dynamic processes, as well as many marvelous biological functions. Determination of a protein's metabolic stability would provide us with useful information for in-depth understanding of the dynamic action mechanisms of proteins. Although several experimental methods have been developed to measure a protein's metabolic stability, they are time-consuming and expensive. Reported in this paper is a computational method, which is featured by (1) integrating various properties of proteins, such as biochemical and physicochemical properties, subcellular locations, network properties and protein complex property, (2) using the mRMR (Maximum Relevance & Minimum Redundancy) principle and the IFS (Incremental Feature Selection) procedure to optimize the prediction engine, and (3) being able to identify proteins among the four types: "short", "medium", "long", and "extra-long" half-life spans. It was revealed through our analysis that the following seven characteristics played major roles in determining the stability of proteins: (1) KEGG enrichment scores of the protein and its neighbors in the network, (2) subcellular locations, (3) polarity, (4) amino acid composition, (5) hydrophobicity, (6) secondary structure propensity, and (7) the number of protein complexes in which the protein is involved. It was observed that there was an intriguing correlation between the predicted metabolic stability of some proteins and the real half-life of the drugs designed to target them. These findings might provide useful insights for designing protein-stability-relevant drugs. The computational method can also be used as a large-scale tool for annotating the metabolic stability for the avalanche of protein sequences generated in the post-genomic age.


Proteins/metabolism , Subcellular Fractions/metabolism , Amino Acid Sequence , Molecular Sequence Data , Proteins/chemistry
11.
Protein Pept Lett ; 17(7): 899-908, 2010 Jul.
Article En | MEDLINE | ID: mdl-20394581

A transcription factor (TF) is a protein that binds DNA at a specific site to help regulate the transcription from DNA to RNA. The mechanism of transcriptional regulation can be much better understood if the category of transcription factors is known. We introduce a system which can automatically categorize transcription factors using their primary structures. A feature analysis strategy called "mRMR" (Minimum Redundancy, Maximum Relevance) is used to analyze the contribution of the TF properties towards the TF classification. mRMR is coupled with forward feature selection to choose an optimized feature subset for the classification. TF properties are composed of the amino acid composition and the physicochemical characteristics of the proteins. These properties generate over a hundred features/parameters. We put all the features/parameters into a classifier, called NNA (nearest neighbor algorithm), for the classification. The classification accuracy is 93.81%, evaluated by a Jackknife test. Feature analysis using the mRMR algorithm shows that secondary structure, amino acid composition and hydrophobicity are the most relevant features for classification. A free online classifier is available at http://app3.biosino.org/132dvc/tf/.


Algorithms , Amino Acid Sequence , Pattern Recognition, Automated/methods , Transcription Factors , Amino Acids/chemistry , Cysteine/chemistry , Hydrophobic and Hydrophilic Interactions , Molecular Sequence Data , Software , Transcription Factors/chemistry , Transcription Factors/classification , Tryptophan/chemistry
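
The abstract above relies on mRMR (Minimum Redundancy, Maximum Relevance) to rank features. A minimal sketch of a greedy mRMR ranking follows, assuming discrete feature values, mutual information as the dependence measure, and the difference form of the criterion (relevance to the class minus mean redundancy with features already chosen). These choices and the function names are illustrative assumptions; the abstract does not say which variant the authors used.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def mrmr_rank(columns, target):
    """Greedy mRMR ordering of feature columns.

    columns[j] holds the values of feature j over all samples.
    At each step the feature maximizing
    relevance(feature, target) - mean redundancy(feature, selected)
    is appended to the ranking.
    """
    remaining = list(range(len(columns)))
    selected = []
    while remaining:
        def score(j):
            relevance = mutual_information(columns[j], target)
            if not selected:
                return relevance
            redundancy = sum(
                mutual_information(columns[j], columns[s]) for s in selected
            ) / len(selected)
            return relevance - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Hypothetical toy data: three binary features, eight samples.
    target = [0, 0, 0, 0, 1, 1, 1, 1]
    columns = [
        [0, 1, 0, 1, 0, 1, 0, 1],  # unrelated to the target
        [0, 0, 0, 0, 1, 1, 1, 1],  # identical to the target
        [0, 0, 0, 1, 0, 1, 1, 1],  # partly informative
    ]
    print(mrmr_rank(columns, target))
```

The ranked list produced this way is what a forward or incremental selection step would then consume, adding features in order until classification performance stops improving.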
12.
PLoS One ; 5(3): e9603, 2010 Mar 11.
Article En | MEDLINE | ID: mdl-20300175

BACKGROUND: Study of drug-target interaction networks is an important topic for drug development. It is both time-consuming and costly to determine compound-protein interactions or potential drug-target interactions by experiments alone. As a complement, the in silico prediction methods can provide us with very useful information in a timely manner. METHODS/PRINCIPAL FINDINGS: To realize this, drug compounds are encoded with functional groups and proteins encoded by biological features including biochemical and physicochemical properties. The optimal feature selection procedures are adopted by means of the mRMR (Maximum Relevance Minimum Redundancy) method. Instead of classifying the proteins as a whole family, target proteins are divided into four groups: enzymes, ion channels, G-protein-coupled receptors and nuclear receptors. Thus, four independent predictors are established using the Nearest Neighbor algorithm as their operation engine, with each to predict the interactions between drugs and one of the four protein groups. As a result, the overall success rates by the jackknife cross-validation tests achieved with the four predictors are 85.48%, 80.78%, 78.49%, and 85.66%, respectively. CONCLUSION/SIGNIFICANCE: Our results indicate that the network prediction system thus established is quite promising and encouraging.


Pharmaceutical Preparations/chemistry , Technology, Pharmaceutical/methods , Algorithms , Binding Sites , Computational Biology/methods , Humans , Models, Statistical , Protein Conformation , Protein Structure, Secondary , Proteins/chemistry , Receptors, G-Protein-Coupled/metabolism
13.
PLoS One ; 5(12): e15917, 2010 Dec 31.
Article En | MEDLINE | ID: mdl-21209839

BACKGROUND: Hydroxylation is an important post-translational modification and closely related to various diseases. Besides the biotechnology experiments, in silico prediction methods are alternative ways to identify the potential hydroxylation sites. METHODOLOGY/PRINCIPAL FINDINGS: In this study, we developed a novel sequence-based method for identifying the two main types of hydroxylation sites--hydroxyproline and hydroxylysine. First, feature selection was made on three kinds of features: amino acid indices (AAindex), which include various physicochemical and biochemical properties of amino acids; Position-Specific Scoring Matrices (PSSM), which represent evolutionary information of amino acids; and structural disorder of amino acids, all within a sliding window of 13 amino acids. The prediction models were then built using the incremental feature selection method. As a result, the prediction accuracies are 76.0% and 82.1%, evaluated by jackknife cross-validation on the hydroxyproline dataset and hydroxylysine dataset, respectively. Feature analysis suggested that physicochemical and biochemical properties and evolutionary information of amino acids contribute much to the identification of the protein hydroxylation sites, while structural disorder had little relation to protein hydroxylation. It was also found that the amino acid adjacent to the hydroxylation site tends to exert more influence than other sites on hydroxylation determination. CONCLUSIONS/SIGNIFICANCE: These findings may provide useful insights for exploiting the mechanisms of hydroxylation.


Computational Biology/methods , Hydroxylysine/chemistry , Hydroxyproline/chemistry , Algorithms , Amino Acids/chemistry , Binding Sites , Biochemistry/methods , Computational Biology/instrumentation , Hydroxylation , Hydroxylysine/metabolism , Hydroxyproline/metabolism , Models, Statistical , Models, Theoretical , Peptides/chemistry , Position-Specific Scoring Matrices , Protein Conformation , Reproducibility of Results
14.
Mol Divers ; 14(1): 81-6, 2010 Feb.
Article En | MEDLINE | ID: mdl-19472067

Protein sumoylation is one of the most important post-translational modifications. Accurate prediction of sumoylation sites is very useful for the analysis of the proteome. Though the putative motif ΨKXE can be used, optimization of prediction models still remains a challenge. In this study, we developed a prediction system based on a feature selection strategy. A total of 1,272 peptides with 14 residues from SUMOsp (Xue et al. [8] Nucleic Acids Res 34:W254-W257, 2006) were investigated in this study, including 212 substrates and 1,060 non-substrates. Among the substrates, only 162 comply with the motif ΨKXE. First, the 1,272 peptides were divided into a training set and a test set. All the peptides were encoded into feature vectors by hundreds of amino acid properties collected by the Amino Acid Index Database (AAIndex, http://www.genome.jp/aaindex ). Then, the mRMR (minimum redundancy-maximum relevance) method was applied to extract the most informative features. Finally, the Nearest Neighbor Algorithm (NNA) was used to produce the prediction models. Tested by Leave-one-out (LOO) cross-validation, the optimal prediction model reaches an accuracy of 84.4% for the training set and 76.4% for the test set. Notably, 180 substrates were correctly predicted, which was 18 more than using the motif ΨKXE. The final selected features indicate that amino acid residues two positions downstream and one position upstream of the sumoylation sites play the most important role in determining the occurrence of sumoylation. Based on the feature selection strategy, our prediction system can be used not only for high-throughput prediction of sumoylation sites but also as a tool to investigate the mechanism of sumoylation.


Databases, Protein , Models, Chemical , Models, Statistical , Protein Processing, Post-Translational , Small Ubiquitin-Related Modifier Proteins/chemistry , Algorithms , Amino Acid Motifs , Computational Biology , Models, Molecular , Reproducibility of Results , Small Ubiquitin-Related Modifier Proteins/metabolism
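
The abstract above uses the consensus motif (a hydrophobic residue, then lysine, then any residue, then glutamate) as the baseline that its predictor is compared against. A minimal sketch of such a motif scan with a regular expression follows. The set of residues accepted at the hydrophobic position (I, L, V, M, F) is an assumption; published definitions of that position vary, and the abstract does not specify one. The function name is hypothetical.

```python
import re

# Assumption: the hydrophobic position accepts I, L, V, M or F.
# The lookahead lets overlapping motif occurrences all be reported.
MOTIF = re.compile(r"(?=([ILVMF]K.E))")


def motif_lysines(sequence):
    """Return 1-based positions of the lysine in every motif match."""
    # m.start() is the 0-based index of the hydrophobic residue; the
    # lysine follows it directly, hence +2 for a 1-based position.
    return [m.start() + 2 for m in MOTIF.finditer(sequence)]


if __name__ == "__main__":
    # Hypothetical toy sequence, not a real SUMO substrate.
    print(motif_lysines("MALKTEGGIKQEPK"))
```

A scan like this finds only sites that fit the consensus, which is consistent with the abstract's observation that 162 of the 212 substrates comply with the motif while the trained model recovers 180.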
15.
Aging Ment Health ; 12(3): 343-8, 2008 May.
Article En | MEDLINE | ID: mdl-18728947

OBJECTIVES: To investigate the prevalence of depressive symptoms in patients with silicosis and its determinants. METHODS: A cross-sectional cohort study was performed. A total of 121 patients with silicosis, randomly selected from a case registry of a non-ferrous metal company, and 110 controls completed questionnaires on sociodemographic variables, the Beck Depression Inventory (BDI) and a lung function test. The χ² test was performed to compare the prevalence of depressive symptoms between the two groups. The relationship between the variables and depressive symptoms in patients with silicosis was assessed by logistic regression analysis. RESULTS: The prevalence of depressive symptoms in patients with silicosis was 27.3%, which was higher than the 7.3% in controls (χ²=15.8, p<0.001). Severe respiratory symptoms, severely impaired physical function, FEV1 <50% predicted and FVC % predicted less than the mean were significantly associated with depressive symptoms (odds ratio [OR]=4.6, 5.9, 3.0 and 5.2, respectively). CONCLUSION: A high prevalence of depressive symptoms was found in patients with silicosis. Respiratory symptoms, physical function and pulmonary function were associated with depressive symptoms. Our findings provide evidence for physicians to screen for depressive symptoms in patients with silicosis.


Asian People/statistics & numerical data , Depression/epidemiology , Silicosis/psychology , Age Factors , Aged , Asian People/psychology , China/epidemiology , Cohort Studies , Comorbidity , Control Groups , Cross-Sectional Studies , Depression/diagnosis , Humans , Male , Middle Aged , Personality Inventory/statistics & numerical data , Prevalence , Regression Analysis , Respiratory Function Tests/statistics & numerical data , Severity of Illness Index , Silicosis/diagnosis , Silicosis/epidemiology , Surveys and Questionnaires
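
The chi-square statistic of 15.8 reported in the abstract above can be checked with a short calculation. The sketch below assumes a 2x2 table without continuity correction and infers the counts from the reported percentages (27.3% of 121 patients and 7.3% of 110 controls); the abstract gives only the percentages, so the counts of 33 and 8 are a reconstruction, and the function name is hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction)
    for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / (
        (a + b) * (c + d) * (a + c) * (b + d)
    )


# Counts inferred from the reported prevalences (an assumption).
patients_total, controls_total = 121, 110
patients_depressed = round(0.273 * patients_total)   # 33
controls_depressed = round(0.073 * controls_total)   # 8

chi2 = chi_square_2x2(
    patients_depressed,
    patients_total - patients_depressed,
    controls_depressed,
    controls_total - controls_depressed,
)

if __name__ == "__main__":
    print(round(chi2, 1))
```

With one degree of freedom, a statistic of this size lies well beyond the 0.001 critical value of about 10.83.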
...