Results 1 - 20 of 23
1.
IEEE Trans Cybern ; 52(8): 7704-7718, 2022 Aug.
Article in English | MEDLINE | ID: mdl-33523821

ABSTRACT

Cross-manifold clustering is an extremely challenging learning problem. Since the low-density hypothesis is not satisfied in cross-manifold problems, many traditional clustering methods fail to discover the cross-manifold structures. In this article, we propose multiple flat projections clustering (MFPC) for cross-manifold clustering. In MFPC, the given samples are projected onto multiple localized flats to discover the global structures of implicit manifolds. Thus, the intersecting clusters are distinguished in various projection flats. In MFPC, a series of nonconvex matrix optimization problems is solved by a proposed recursive algorithm. Furthermore, MFPC is extended to a nonlinear version via the kernel trick to handle more complex cross-manifold learning situations. Synthetic tests show that MFPC works well on cross-manifold structures. Moreover, experimental results on benchmark datasets and object tracking videos show excellent performance of MFPC compared with some state-of-the-art manifold clustering methods.

2.
Neural Netw ; 142: 73-91, 2021 Oct.
Article in English | MEDLINE | ID: mdl-33984737

ABSTRACT

Recent advances show that two-dimensional linear discriminant analysis (2DLDA) is a successful matrix-based dimensionality reduction method. However, 2DLDA may encounter the singularity issue in theory and is also sensitive to outliers. In this paper, a generalized Lp-norm 2DLDA framework with regularization for an arbitrary p>0 is proposed, named G2DLDA. G2DLDA makes two main contributions. First, the G2DLDA model uses an arbitrary Lp-norm to measure the between-class and within-class scatter, so a proper p can be selected to achieve robustness. Second, the introduced regularization term gives G2DLDA better generalization performance and avoids singularity. In addition, an effective learning algorithm is designed for G2DLDA, which solves the model through a series of convex problems with closed-form solutions. Its convergence is guaranteed theoretically when 1≤p≤2. Preliminary experimental results on three contaminated human face databases show the effectiveness of the proposed G2DLDA.
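The effect of p in an Lp-norm scatter measure can be illustrated with a minimal sketch. This is not the G2DLDA algorithm: it assumes the samples have already been projected to scalars, and the helper name `lp_within_scatter` and the toy data are hypothetical. It only shows that a smaller p shrinks the contribution of an outlier to the within-class scatter.

```python
def lp_within_scatter(projected, labels, p):
    """Sum of |value - class mean|**p over all samples (hypothetical helper)."""
    if p <= 0:
        raise ValueError("p must be positive")
    # Group the projected values by class label.
    groups = {}
    for value, label in zip(projected, labels):
        groups.setdefault(label, []).append(value)
    total = 0.0
    for values in groups.values():
        mean = sum(values) / len(values)
        total += sum(abs(v - mean) ** p for v in values)
    return total


# Toy data (made up): class 1 contains one outlying value, 30.0.
projected = [0.0, 2.0, 10.0, 30.0]
labels = [0, 0, 1, 1]
scatter_l2 = lp_within_scatter(projected, labels, p=2)
scatter_l1 = lp_within_scatter(projected, labels, p=1)
print(scatter_l2, scatter_l1)
```

With p=2 the two deviations of size 10 in class 1 dominate the sum, while with p=1 they enter only linearly.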


Subject(s)
Algorithms, Face, Factual Databases, Discriminant Analysis, Psychological Generalization, Humans
3.
Bioinformatics ; 32(2): 226-34, 2016 Jan 15.
Article in English | MEDLINE | ID: mdl-26415726

ABSTRACT

MOTIVATION: With the boom in interactome studies, many interactions can be measured in a high-throughput way and large-scale datasets are available. It is becoming apparent that many different types of interactions can be potential drug targets. Compared with inhibition of a single protein, inhibition of a protein-protein interaction (PPI) promises to improve specificity with fewer adverse side effects. It also greatly broadens the drug target search space, which makes drug target discovery difficult. Computational methods are highly desired to efficiently provide candidates for further experiments, and they hold the promise of greatly accelerating the discovery of novel drug targets. RESULTS: Here, we propose a machine learning method to predict PPI targets on a genome-wide scale. Specifically, we develop a computational method, named PrePPItar, to Predict PPIs as drug targets by uncovering the potential associations between drugs and PPIs. First, we survey the databases and manually construct a gold-standard positive dataset of drug and PPI interactions. This effort leads to a dataset with 227 associations among 63 PPIs and 113 FDA-approved drugs and allows us to build models that learn the association rules from the data. Second, we characterize drugs by profiling them in chemical structure, drug ATC-code annotation, and side-effect space, and represent PPI similarity by a symmetrical S-kernel based on protein amino acid sequence. The drugs and PPIs are then correlated by a Kronecker product kernel. Finally, a support vector machine (SVM) is trained to predict novel associations between drugs and PPIs. We validate our PrePPItar method on the well-established gold-standard dataset by cross-validation. We find that chemical structure, drug ATC-code, and side-effect information are all predictive of PPI targets. Moreover, we can increase the PPI target prediction coverage by integrating multiple data sources. Follow-up database search and pathway analysis indicate that our new predictions are worthy of future experimental validation. CONCLUSION: PrePPItar can serve as a useful tool for PPI target discovery and provides a general framework for integrating heterogeneous data. AVAILABILITY AND IMPLEMENTATION: PrePPItar is available at http://doc.aporc.org/wiki/PrePPItar. CONTACT: ycwang@nwipb.cas.cn or ywang@amss.ac.cn SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
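A Kronecker product kernel over (drug, PPI) pairs can be sketched as follows. This is a minimal illustration and not the PrePPItar implementation: the similarity values are made-up placeholders, and the function name `kronecker_pair_kernel` is hypothetical. The kernel value for two pairs is the drug-drug similarity multiplied by the PPI-PPI similarity.

```python
def kronecker_pair_kernel(k_drug, k_ppi):
    """Return the Kronecker product of two square kernel matrices.

    Pair (drug i, PPI a) is mapped to row/column index i * n_ppi + a.
    """
    n_drug, n_ppi = len(k_drug), len(k_ppi)
    size = n_drug * n_ppi
    out = [[0.0] * size for _ in range(size)]
    for i in range(n_drug):
        for j in range(n_drug):
            for a in range(n_ppi):
                for b in range(n_ppi):
                    out[i * n_ppi + a][j * n_ppi + b] = k_drug[i][j] * k_ppi[a][b]
    return out


# Placeholder similarity matrices (assumed values, not from the paper).
k_drug = [[1.0, 0.5],
          [0.5, 1.0]]
k_ppi = [[1.0, 0.2],
         [0.2, 1.0]]
k_pair = kronecker_pair_kernel(k_drug, k_ppi)

# Similarity between pair (drug 0, PPI 1) and pair (drug 1, PPI 0).
print(k_pair[1][2])
```

A matrix built this way could be handed to any SVM implementation that accepts a precomputed kernel, with one row per labeled (drug, PPI) pair.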


Subject(s)
Drug Discovery/methods, Protein Interaction Mapping, Support Vector Machine, Algorithms, Humans, Pharmaceutical Preparations/chemistry, Proteins/chemistry, Proteins/drug effects, Protein Sequence Analysis
4.
Gene ; 576(1 Pt 1): 99-104, 2016 Jan 15.
Article in English | MEDLINE | ID: mdl-26432000

ABSTRACT

Sumoylation is a multifunctional post-translational modification (PTM) of proteins by small ubiquitin-related modifiers (SUMOs), which are related to ubiquitin in molecular structure. Sumoylation has been found to be involved in a number of cellular processes. Identifying the exact sumoylation sites in proteins is important for both basic research and drug development. Compared with time-consuming experimental methods, computational methods for predicting sumoylation sites are highly desired as a complement to experiments in the post-genomic age. In this work, three feature constructions (AAIndex, position-specific amino acid propensity, and modified composition of k-spaced amino acid pairs) and five different combinations of them were used to construct features. Finally, 178 features were selected as the optimal features according to the Matthews correlation coefficient values in 10-fold cross-validation based on linear discriminant analysis. In 10-fold cross-validation on the benchmark dataset, the accuracy and Matthews correlation coefficient were 86.92% and 0.6845, respectively. Compared with existing predictors, SUMO_LDA showed better performance.
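The evaluation measures quoted in this and several of the following abstracts (accuracy, sensitivity, specificity, and the Matthews correlation coefficient) can all be computed from a binary confusion matrix. The sketch below uses the standard definitions; the function name `binary_metrics` and the counts are illustrative assumptions and are not taken from any of the papers.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return (accuracy, sensitivity, specificity, MCC) from confusion counts.

    Assumes at least one positive and one negative sample are present.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, sensitivity, specificity, mcc


# Made-up counts for illustration only.
acc, sens, spec, mcc = binary_metrics(tp=40, fp=5, tn=45, fn=10)
print(acc, sens, spec, round(mcc, 4))
```

Unlike accuracy, MCC stays informative when the positive and negative classes are unbalanced, which is common in modification-site prediction.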


Subject(s)
Protein Databases, Proteins/genetics, Protein Sequence Analysis/methods, Sumoylation/physiology, Proteins/metabolism
5.
Sci Rep ; 5: 10184, 2015 Jun 18.
Article in English | MEDLINE | ID: mdl-26084794

ABSTRACT

Lysine succinylation in proteins is one type of post-translational modification (PTM). Succinylation is associated with some diseases, and data on succinylated sites have only been obtained experimentally in recent years. It is highly desirable to develop computational methods to identify the candidate proteins and their sites. In view of this, a new predictor called iSuc-PseAAC was proposed by incorporating the peptide position-specific propensity into the general form of pseudo amino acid composition. With a support vector machine algorithm, the predictor achieves an accuracy of 79.94%, sensitivity of 51.07%, specificity of 89.42% and MCC of 0.431 in leave-one-out cross-validation. Rigorous leave-one-out cross-validation on a stringent benchmark dataset demonstrated that the new predictor is quite promising and may become a useful high-throughput tool in this area. A user-friendly web-server for iSuc-PseAAC is accessible at http://app.aporc.org/iSuc-PseAAC/. Users can easily obtain their desired results without needing to understand the complicated mathematical equations, which are presented in this paper only for completeness.


Subject(s)
Computer Simulation, Biological Models, Peptides/metabolism, Post-Translational Protein Processing/physiology, Succinic Acid/metabolism
6.
J Theor Biol ; 379: 10-5, 2015 Aug 21.
Article in English | MEDLINE | ID: mdl-25913879

ABSTRACT

Large-scale characterization of post-translational modifications (PTMs), such as phosphorylation, acetylation and ubiquitination, has highlighted their importance in the regulation of a myriad of signaling events. However, for another type of PTM, lysine phosphoglycerylation, data on phosphoglycerylated sites have only been obtained experimentally in recent years. Given an uncharacterized protein sequence that contains many lysine residues, which of them can be phosphoglycerylated and which cannot? This is a challenging problem. In view of this, establishing a useful computational method and developing an efficient predictor are highly desired. Here a new predictor named Phogly-PseAAC was developed, which incorporates the position-specific amino acid propensity. Feature importance was also ranked by F-score. The predictor with the best feature set obtained an accuracy of 75.10%, sensitivity of 68.87%, specificity of 75.57% and MCC of 0.2538 in leave-one-out (LOO) cross-validation with the center nearest neighbor algorithm. A web-server for Phogly-PseAAC is accessible at http://app.aporc.org/Phogly-PseAAC/. For the convenience of most experimental scientists, we have further provided brief instructions for the web-server, by which users can easily get their desired results without needing to follow the complicated mathematics presented in this paper. It is anticipated that Phogly-PseAAC may become a useful high-throughput tool for identifying lysine phosphoglycerylation sites.


Subject(s)
Phosphoproteins/genetics, Post-Translational Protein Processing, Protein Sequence Analysis/methods, Lysine, Phosphoproteins/metabolism, Phosphorylation/physiology
7.
Neural Netw ; 65: 92-104, 2015 May.
Article in English | MEDLINE | ID: mdl-25721558

ABSTRACT

In this paper, we propose an L1-norm two-dimensional linear discriminant analysis (L1-2DLDA) with robust performance. In the conventional two-dimensional linear discriminant analysis with L2-norm (L2-2DLDA), the optimization problem is transformed into a generalized eigenvalue problem. In contrast, the optimization problem in our L1-2DLDA is solved by a simple, justifiable iterative technique whose convergence is guaranteed. Compared with L2-2DLDA, our L1-2DLDA is more robust to outliers and noise because the L1-norm is used. This is supported by our preliminary experiments on a toy example and face datasets, which show the improvement of our L1-2DLDA over L2-2DLDA.


Subject(s)
Algorithms, Biometric Identification/methods, Discriminant Analysis, Face
8.
IEEE Trans Neural Netw Learn Syst ; 26(10): 2583-8, 2015 Oct.
Article in English | MEDLINE | ID: mdl-25576578

ABSTRACT

The twin support vector machine (TWSVM) is a powerful classification method. In this brief, a TWSVM-type clustering method, called twin support vector clustering (TWSVC), is proposed. TWSVC includes both linear and nonlinear versions. It determines k cluster center planes by solving a series of quadratic programming problems. To make TWSVC more efficient and stable, an initialization algorithm based on the nearest neighbor graph is also suggested. Experimental results on several benchmark data sets show comparable performance of TWSVC.

9.
PLoS One ; 9(8): e105018, 2014.
Article in English | MEDLINE | ID: mdl-25121969

ABSTRACT

Nitrotyrosine is one of the post-translational modifications (PTMs) in proteins and occurs when a tyrosine residue is nitrated. A remarkably higher level of nitrotyrosine is detected in people suffering from rheumatoid arthritis, septic shock, and coeliac disease than in healthy people. Given an uncharacterized protein sequence that contains many tyrosine residues, which of them can be nitrated and which cannot? This is a challenging problem, directly related not only to an in-depth understanding of the PTM's mechanism but also to nitrotyrosine-based drug development. In particular, with the avalanche of protein sequences generated in the postgenomic age, it is highly desirable to develop a high-throughput tool for this purpose. Here, a new predictor called "iNitro-Tyr" was developed by incorporating the position-specific dipeptide propensity into the general pseudo amino acid composition to discriminate nitrotyrosine sites from non-nitrotyrosine sites in proteins. Rigorous jackknife tests demonstrated that the new predictor not only yields a higher success rate but is also much more stable and less noisy. A web-server for iNitro-Tyr is accessible to the public at http://app.aporc.org/iNitro-Tyr/. For the convenience of most experimental scientists, we have further provided a step-by-step guide, by which users can easily get their desired results without needing to follow the complicated mathematics, which is presented in this paper only for the completeness of its development process. It has not escaped our notice that the approach presented here can also be used to deal with other PTM sites in proteins.


Subject(s)
Amino Acids/analysis, Proteins/chemistry, Tyrosine/analogs & derivatives, Amino Acid Sequence, Internet, Molecular Sequence Data, Tyrosine/chemistry
10.
Int J Mol Sci ; 15(5): 7594-610, 2014 May 05.
Article in English | MEDLINE | ID: mdl-24857907

ABSTRACT

Post-translational modifications (PTMs) play crucial roles in various cell functions and biological processes. Protein hydroxylation is one type of PTM that usually occurs at proline and lysine sites. Given an uncharacterized protein sequence, which of its Pro (or Lys) sites can be hydroxylated and which cannot? This is a challenging problem, not only for an in-depth understanding of the hydroxylation mechanism, but also for drug development, because protein hydroxylation is closely related to major diseases, such as stomach and lung cancers. With the avalanche of protein sequences generated in the post-genomic age, it is highly desirable to develop computational methods to address this problem. In view of this, a new predictor called "iHyd-PseAAC" (identify hydroxylation by pseudo amino acid composition) was proposed by incorporating the dipeptide position-specific propensity into the general form of pseudo amino acid composition. Rigorous cross-validation tests on stringent benchmark datasets demonstrated that the new predictor is quite promising and may become a useful high-throughput tool in this area. A user-friendly web-server for iHyd-PseAAC is accessible at http://app.aporc.org/iHyd-PseAAC/. Furthermore, for the convenience of the majority of experimental scientists, a step-by-step guide on how to use the web-server is given. By following these steps, users can easily obtain their desired results without needing to understand the complicated mathematical equations, which are presented in this paper only for completeness.


Subject(s)
Algorithms, Dipeptides/chemistry, Hydroxylysine/chemistry, Hydroxyproline/chemistry, Proteins/chemistry, Amino Acids/chemistry, Protein Databases, Internet, Post-Translational Protein Processing, User-Computer Interface
11.
PeerJ ; 1: e171, 2013.
Article in English | MEDLINE | ID: mdl-24109555

ABSTRACT

As one of the most important and universal post-translational modifications (PTMs) of proteins, S-nitrosylation (SNO) plays crucial roles in a variety of biological processes, including the regulation of cellular dynamics and many signaling events. Knowledge of SNO sites in proteins is very useful for both drug development and basic research. Unfortunately, it is both time-consuming and costly to determine SNO sites purely by biological experiments. Facing the explosive growth of protein sequence data generated in the post-genomic era, we are challenged to develop automated methods for timely and effective determination of the SNO sites of uncharacterized proteins. To address the challenge, a new predictor called iSNO-AAPair was developed by taking into account the coupling effects for all the pairs formed by the nearest residues and the pairs formed by the next-nearest residues along protein chains. Cross-validation results on a state-of-the-art benchmark have shown that the new predictor outperformed the existing predictors. The same was true when it was tested on independent proteins whose experimental SNO sites were known. A user-friendly web-server for iSNO-AAPair was established at http://app.aporc.org/iSNO-AAPair/, by which users can easily obtain their desired results without needing to follow the mathematical equations involved in its development.

12.
Bioinformatics ; 29(10): 1317-24, 2013 May 15.
Article in English | MEDLINE | ID: mdl-23564845

ABSTRACT

MOTIVATION: Discovering the Anatomical Therapeutic Chemical (ATC) classification rules of drugs at the molecular level is of vital importance for understanding the actions of a vast majority of drugs. However, few studies attempt to annotate the potential ATC-codes of drugs by computational approaches. RESULTS: Here, we introduce the drug-target network to computationally predict the ATC-codes of drugs and propose a novel method named NetPredATC. Starting from the assumption that drugs with similar chemical structures or target proteins share common ATC-codes, NetPredATC aims to assign a drug's potential ATC-codes by integrating chemical structures and target proteins. Specifically, we first construct a gold-standard positive dataset from drugs' ATC-code annotation databases. Then we characterize ATC-codes and drugs by their similarity profiles and define a kernel function to correlate them. Finally, we use a kernel method, the support vector machine, to automatically predict a drug's ATC-codes. Our method was validated on four drug datasets with various target proteins, including enzymes, ion channels, G-protein-coupled receptors and nuclear receptors. We found that both a drug's chemical structure and its target proteins are predictive, and that target protein information gives better accuracy. Further integrating these two data sources revealed more experimentally validated ATC-codes for drugs. We extensively compared NetPredATC with SuperPred, a method based on chemical similarity only. Experimental results showed that NetPredATC outperforms SuperPred in both predictive coverage and accuracy. In addition, database search and functional annotation analysis support that our novel predictions are worthy of future experimental validation. CONCLUSION: Our new method, NetPredATC, can predict the ATC-codes of drugs more accurately by incorporating the drug-target network and integrating data, which will promote the understanding of drug mechanisms as well as drug repositioning and discovery. AVAILABILITY: NetPredATC is available at http://doc.aporc.org/wiki/NetPredATC. CONTACT: ycwang@nwipb.cas.cn or ywang@amss.ac.cn SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms, Pharmaceutical Databases, Drug Delivery Systems, Pharmaceutical Preparations/chemistry, Software
13.
Protein Pept Lett ; 20(1): 71-7, 2013 Jan.
Article in English | MEDLINE | ID: mdl-22789108

ABSTRACT

Protein methylation is an important and reversible post-translational modification that regulates diverse protein properties. Many methylation sites on arginine and lysine have been identified through experiments. However, experimental identification without prior knowledge is laborious and costly. Hence, there is interest in developing computational methods for reliable prediction of methylation sites. Such predictions may provide researchers with useful information that makes the discovery of candidate methylation sites more productive. This work proposes Methcrf, a computational predictor based on conditional random fields (CRF) for predicting protein methylation sites. It is limited to lysine and arginine residues because there are not enough experimentally verified data for other residues. The approach combines protein sequence features with structural information such as the solvent accessibility of the amino acids that surround the methylation sites. In 10-fold cross-validation, Methcrf achieves an area under the receiver operating characteristic curve (AUC) of 0.85 and 0.80 for arginine and lysine, respectively. The proposed method has performance comparable to that of previous methods for predicting methylation sites.
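The AUC reported here and in later records can be read as the probability that a randomly chosen positive site receives a higher score than a randomly chosen negative one. The sketch below computes it by direct pairwise comparison; the function name `auc_by_pairs` and the scores are illustrative assumptions, not data from the paper.

```python
def auc_by_pairs(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                wins += 1.0
            elif pos_score == neg_score:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up predictor scores: label 1 = modified site, 0 = unmodified site.
scores = [0.9, 0.3, 0.8, 0.1]
labels = [1, 1, 0, 0]
print(auc_by_pairs(scores, labels))
```

The pairwise form is quadratic in the number of samples, so it suits small examples; rank-based formulas give the same value more cheaply on large datasets.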


Subject(s)
Arginine/chemistry, Lysine/chemistry, Methylation, Proteins/chemistry, Markov Chains, Post-Translational Protein Processing, Proteins/metabolism, ROC Curve, Protein Sequence Analysis
14.
Neural Netw ; 25(1): 114-21, 2012 Jan.
Article in English | MEDLINE | ID: mdl-21890319

ABSTRACT

Twin support vector machines (TWSVMs) achieve faster learning by solving a pair of smaller SVM-type problems. To increase efficiency further, this paper presents a coordinate descent margin based twin vector machine (CDMTSVM) and compares it with the original TWSVM. The major advantages of CDMTSVM lie in two aspects: (1) The primal and dual problems are reformulated and improved by adding a regularization term to the primal problems, which implies maximizing the "margin" between the proximal hyperplane and the bounding hyperplane and makes the dual problems stable positive definite quadratic programming problems. (2) A novel coordinate descent method is proposed for our dual problems, which leads to very fast training. As our coordinate descent method handles one data point at a time, it can process very large datasets that need not reside in memory. Our experiments on publicly available datasets indicate that CDMTSVM is not only fast but also shows good generalization performance.
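The idea of coordinate descent on a positive definite, box-constrained dual can be sketched on a generic problem: minimize 0.5·aᵀQa − Σaᵢ subject to 0 ≤ aᵢ ≤ C. This generic form, the function name `coordinate_descent_box_qp`, and the toy matrix are assumptions for illustration; the abstract does not give the exact CDMTSVM dual or its update rule.

```python
def coordinate_descent_box_qp(Q, upper, sweeps=100):
    """Minimize 0.5*a'Qa - sum(a) subject to 0 <= a_i <= upper.

    Q is assumed symmetric positive definite, so every Q[i][i] > 0.
    Each inner step solves the problem exactly in one coordinate.
    """
    n = len(Q)
    alpha = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            # Partial derivative of the objective with respect to alpha[i].
            grad_i = sum(Q[i][j] * alpha[j] for j in range(n)) - 1.0
            # Exact one-dimensional minimizer, clipped to the box.
            alpha[i] = min(max(alpha[i] - grad_i / Q[i][i], 0.0), upper)
    return alpha


# Toy positive definite matrix (made up for illustration).
Q = [[2.0, 1.0],
     [1.0, 2.0]]
alpha = coordinate_descent_box_qp(Q, upper=10.0)
print(alpha)
```

Each update touches a single coordinate, which in an SVM dual corresponds to a single training point; that is why methods of this kind can stream through data that does not fit in memory.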


Subject(s)
Algorithms, Support Vector Machine, Statistical Data Interpretation, Humans
15.
Comput Biol Chem ; 35(6): 353-62, 2011 Dec 14.
Article in English | MEDLINE | ID: mdl-22099632

ABSTRACT

Proteins are involved in almost every action of every organism by interacting with other small molecules, including drugs. Computationally predicting drug-protein interactions is particularly important for speeding up the development of novel drugs. To borrow information from existing drug-protein interactions, we need to define the similarity among proteins and the similarity among drugs. Usually these similarities are defined based on a single data source, and many methods have been proposed. However, the availability of many genomic and chemogenomic data sources allows us to integrate them to improve the predictions. A great challenge is therefore how to integrate these heterogeneous data sources. Here, we propose a kernel-based method to predict drug-protein interactions by integrating multiple types of data. Specifically, we collect drug pharmacological and therapeutic effects, drug chemical structures, and protein genomic information to characterize the drug-target interactions, then integrate them by a kernel function within a support vector machine (SVM)-based predictor. With this data fusion technique, we establish the drug-protein interactions from a collection of data sources. Our new method is validated on four classes of drug target proteins: enzymes, ion channels (ICs), G-protein-coupled receptors (GPCRs), and nuclear receptors (NRs). We find that every single data source is predictive and that integrating different data sources improves accuracy, i.e., data integration can uncover more experimentally observed drug-target interactions at the same false positive rate than methods based on a single data source. The functional annotation analysis indicates that our new predictions are worthy of future experimental validation. In conclusion, our new method can efficiently integrate diverse data sources and will promote further research in drug discovery.


Subject(s)
Algorithms, Computational Biology/methods, Automated Pattern Recognition, Pharmaceutical Preparations/chemistry, Proteins/chemistry
16.
BMC Bioinformatics ; 12: 409, 2011 Oct 24.
Article in English | MEDLINE | ID: mdl-22024143

ABSTRACT

BACKGROUND: With the development of genome-sequencing technologies, protein sequences are readily obtained by translating the measured mRNAs. Predicting protein-protein interactions from sequences is therefore in great demand, because identifying protein-protein interactions is becoming a bottleneck in understanding the functions of proteins, especially for organisms that are barely characterized. Although a few methods have been proposed, the converse problem, whether the features used extract sufficient and unbiased information from protein sequences, is almost untouched. RESULTS: In this study, we investigate this problem theoretically by an optimization scheme. Motivated by the theoretical investigation, we find novel encoding methods for both protein sequences and protein pairs. Our new methods sufficiently exploit the information in protein sequences and reduce artificial bias and computational cost. They significantly outperform the available methods in sensitivity, specificity, precision, and recall under cross-validation, and reach ~80% and ~90% accuracy in Escherichia coli and Saccharomyces cerevisiae, respectively. Our findings have important implications for other sequence-based prediction tasks, because the representation of biological sequences is always the first step in computational biology. CONCLUSIONS: By considering the converse problem, we propose new representation methods for both protein sequences and protein pairs. The results show that our method significantly improves the accuracy of protein-protein interaction predictions.


Subject(s)
Protein Interaction Mapping, Proteins/metabolism, Support Vector Machine, Amino Acid Sequence, Escherichia coli/genetics, Escherichia coli/metabolism, Proteins/chemistry, Saccharomyces cerevisiae/genetics, Saccharomyces cerevisiae/metabolism, Sensitivity and Specificity
17.
BMC Syst Biol ; 5 Suppl 1: S6, 2011 Jun 20.
Article in English | MEDLINE | ID: mdl-21689481

ABSTRACT

BACKGROUND: Enzymes are known as the largest class of proteins, and their functions are usually annotated by the Enzyme Commission (EC), which uses a hierarchical structure, i.e., four numbers separated by periods, to classify the function of enzymes. Automatically categorizing enzymes into the EC hierarchy is crucial for understanding their specific molecular mechanisms. RESULTS: In this paper, we introduce two key improvements in predicting enzyme function within the machine learning framework. The first is to introduce efficient sequence encoding methods for representing given proteins. The second is to develop a structure-based prediction method with low computational complexity. In particular, we propose to use the conjoint triad feature (CTF) to represent the given protein sequences by considering not only the composition of amino acids but also the neighbor relationships in the sequence. Then we develop a support vector machine (SVM)-based method, named SVMHL (SVM for hierarchy labels), to output enzyme function by fully considering the hierarchical structure of EC. The experimental results show that SVMHL with the CTF outperforms SVMHL with the amino acid composition (AAC) feature in both predictive accuracy and Matthews correlation coefficient (MCC). In addition, SVMHL with the CTF obtains accuracy ranging from 81% to 98% and MCC ranging from 0.82 to 0.98 when predicting the first three EC digits on a low-homology enzyme dataset. We further demonstrate that our method outperforms both methods that do not take account of the hierarchical relationship among enzyme categories and alternative methods that incorporate prior knowledge about inter-class relationships. CONCLUSIONS: Our structure-based prediction model, SVMHL with the CTF, reduces the computational complexity and outperforms the alternative approaches in enzyme function prediction. Our new method will therefore be a useful tool for the enzyme function prediction community.
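A conjoint triad encoding can be sketched as follows. The abstract does not list the amino acid grouping, so the seven-class grouping below, which is commonly used with this feature, is an assumption, as are the function name `conjoint_triad_counts` and the example sequence. Each residue is replaced by its class, and every run of three consecutive classes increments one of 7 × 7 × 7 = 343 counters.

```python
# Seven-class amino acid grouping often used with the conjoint triad
# feature; assumed here because the abstract does not specify it.
GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
CLASS_OF = {aa: idx for idx, group in enumerate(GROUPS) for aa in group}


def conjoint_triad_counts(sequence):
    """Return a 343-element list of raw triad counts for a protein sequence."""
    classes = [CLASS_OF.get(aa) for aa in sequence.upper()]
    counts = [0] * 343
    for a, b, c in zip(classes, classes[1:], classes[2:]):
        # Skip triads that contain a non-standard residue (e.g. X).
        if a is None or b is None or c is None:
            continue
        counts[a * 49 + b * 7 + c] += 1
    return counts


# Made-up peptide for illustration.
counts = conjoint_triad_counts("AGVRKDE")
print(sum(counts), len(counts))
```

Because triads overlap, the vector captures which residue classes neighbor each other, not just how often each class occurs; in practice the raw counts are usually normalized before being passed to a classifier.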


Subject(s)
Artificial Intelligence, Computational Biology/methods, Enzymes/metabolism, Benchmarking, Enzymes/chemistry
18.
IEEE Trans Neural Netw ; 22(6): 962-8, 2011 Jun.
Article in English | MEDLINE | ID: mdl-21550880

ABSTRACT

For classification problems, the generalized eigenvalue proximal support vector machine (GEPSVM) and twin support vector machine (TWSVM) are regarded as milestones in the development of powerful SVMs, as they use nonparallel hyperplane classifiers. In this brief, we propose an improved version, named twin bounded support vector machines (TBSVM), based on TWSVM. The significant advantage of TBSVM over TWSVM is that the structural risk minimization principle is implemented by introducing a regularization term. This embodies the essence of statistical learning theory, so this modification can improve classification performance. In addition, the successive overrelaxation technique is used to solve the optimization problems to speed up the training procedure. Experimental results show the effectiveness of our method in both computation time and classification accuracy, and thus further confirm the above conclusion.


Subject(s)
Algorithms, Artificial Intelligence, Theoretical Models, Automated Pattern Recognition/methods, Computer Simulation
19.
Protein Pept Lett ; 18(6): 573-87, 2011 Jun.
Article in English | MEDLINE | ID: mdl-21271979

ABSTRACT

Protein S-nitrosylation plays a key and specific role in many cellular processes. Detecting possible S-nitrosylated substrates and their exact sites is crucial for studying the mechanisms of these biological processes. Compared with expensive and time-consuming biochemical experiments, computational methods are attracting considerable attention because of their convenience and speed. Although some computational models have been developed to predict S-nitrosylation sites, their accuracy is still low. In this work, we use a support vector machine to predict protein S-nitrosylation sites. After a careful evaluation of six encoding schemes, we propose a new efficient predictor, CPR-SNO, using the coupling patterns based encoding scheme. CPR-SNO achieves an area under the ROC curve (AUC) of 0.8289 in 10-fold cross-validation experiments, which is significantly better than the 0.685 of the best existing method, GPS-SNO 1.0. In further annotating large-scale potential S-nitrosylated substrates, CPR-SNO also shows encouraging predictive performance. These results indicate that CPR-SNO can serve the biological community as a competitive predictor of protein S-nitrosylation sites. CPR-SNO has been implemented as a web server and is available at http://math.cau.edu.cn/CPR -SNO/CPR-SNO.html.


Subject(s)
Artificial Intelligence, Nitrogen Oxides/metabolism, Post-Translational Protein Processing, Proteins/chemistry, Proteins/metabolism, Animals, Binding Sites, Humans, Internet, Mice, Molecular Sequence Annotation, Automated Pattern Recognition, ROC Curve
20.
Protein Pept Lett ; 18(2): 186-93, 2011 Feb.
Article in English | MEDLINE | ID: mdl-21054270

ABSTRACT

Protein palmitoylation is an important and common post-translational lipid modification of proteins and plays a critical role in various cellular processes. Identification of palmitoylation sites is fundamental to deciphering the mechanisms of these biological processes. However, experimental determination of palmitoylation residues without prior knowledge is laborious and costly. Computational approaches for predicting palmitoylation sites in proteins have thus become highly desirable. Here, we propose PPWMs, a computational predictor that uses a Position Weight Matrices (PWMs) encoding scheme and a support vector machine (SVM) to identify protein palmitoylation sites. PPWMs shows good predictive performance, with an area under the ROC curve (AUC) of 0.9472 for S-palmitoylation site prediction and 0.9964 for N-palmitoylation site prediction on the newly proposed dataset. Comparison results show the superiority of PPWMs over two widely known palmitoylation site predictors, CSS-Palm 2.0 and CKSAAP-Palm, in many cases. Moreover, we attempt to incorporate structural information such as accessible surface area (ASA) and secondary structure (SS) into the prediction, and give a rough analysis of the structural characteristics. The corresponding software can be freely downloaded from http://math.cau.edu.cn/PPWMs.html.
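A position weight matrix over aligned peptide windows can be sketched as below. The abstract does not describe how PPWMs builds or uses its matrices, so this is a generic log-odds PWM with a uniform background and a pseudocount; the names `build_pwm` and `score_window` and the example windows are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pwm(windows, pseudocount=1.0):
    """Build a log-odds PWM from equal-length windows of standard residues."""
    length = len(windows[0])
    background = 1.0 / len(AMINO_ACIDS)  # uniform background (assumption)
    pwm = []
    for pos in range(length):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for window in windows:
            counts[window[pos]] += 1.0
        total = sum(counts.values())
        pwm.append({aa: math.log((counts[aa] / total) / background)
                    for aa in AMINO_ACIDS})
    return pwm


def score_window(pwm, window):
    """Sum the per-position log-odds weights of a candidate window."""
    return sum(column[aa] for column, aa in zip(pwm, window))


# Made-up windows centered on a cysteine, for illustration only.
training_windows = ["ACKLS", "GCKLT", "ACRLS"]
pwm = build_pwm(training_windows)
print(score_window(pwm, "ACKLS") > score_window(pwm, "WWWWW"))
```

In a pipeline like the one the abstract outlines, per-position weights of this kind could serve as numeric features for an SVM rather than as a final score.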


Subject(s)
Artificial Intelligence, Position-Specific Scoring Matrices, Post-Translational Protein Processing, Proteins/metabolism, Computer Simulation, Lipoylation, Biological Models