1.
BMC Bioinformatics ; 24(1): 301, 2023 Jul 28.
Article in English | MEDLINE | ID: mdl-37507654

ABSTRACT

BACKGROUND: The identification of tumor T cell antigens (TTCAs) is crucial for providing insights into their functional mechanisms and utilizing their potential in anticancer vaccine development. In this context, TTCAs are highly promising. Meanwhile, experimental technologies for discovering and characterizing new TTCAs are expensive and time-consuming. Although many machine learning (ML)-based models have been proposed for identifying new TTCAs, there is still a need to develop a robust model that can achieve higher accuracy and precision. RESULTS: In this study, we propose a new stacking ensemble learning-based framework, termed StackTTCA, for accurate and large-scale identification of TTCAs. First, we constructed 156 different baseline models by using 12 different feature encoding schemes and 13 popular ML algorithms. Second, these baseline models were trained and employed to create a new probabilistic feature vector. Finally, the optimal probabilistic feature vector was determined based on the feature selection strategy and then used to construct our stacked model. Comparative benchmarking experiments indicated that StackTTCA clearly outperformed several ML classifiers and the existing methods on the independent test, with an accuracy of 0.932 and a Matthews correlation coefficient of 0.866. CONCLUSIONS: In summary, the proposed stacking ensemble learning-based framework of StackTTCA could help to precisely and rapidly identify true TTCAs for follow-up experimental verification. In addition, we developed an online web server (http://2pmlab.camt.cmu.ac.th/StackTTCA) to maximize user convenience for high-throughput screening of novel TTCAs.


Subject(s)
Computational Biology , Neoplasms , Humans , Computational Biology/methods , Algorithms , Machine Learning , T-Lymphocytes
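The three StackTTCA steps (baseline models, a probabilistic feature vector, a stacked meta-model) can be sketched as below. This is a minimal illustration under stated assumptions, not the published implementation: the three toy scoring functions stand in for the 156 trained (encoding, algorithm) baseline models, and the selected column indices and the logistic meta-model weights are hypothetical values chosen for the example.

```python
import math
from typing import Callable, List, Sequence

# A baseline model maps a peptide sequence to a score in [0, 1], read as P(TTCA).
BaselineModel = Callable[[str], float]


def probabilistic_features(seq: str, baselines: Sequence[BaselineModel]) -> List[float]:
    """Collect the output of every baseline model into one probabilistic feature vector."""
    return [model(seq) for model in baselines]


def select(features: Sequence[float], keep: Sequence[int]) -> List[float]:
    """Keep only the columns chosen by a (hypothetical) feature-selection step."""
    return [features[i] for i in keep]


def meta_predict(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Hypothetical logistic-regression meta-model over the selected probabilistic features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Toy stand-ins for trained baseline models; they are NOT real classifiers.
def hydrophobic_fraction(seq: str) -> float:
    return sum(aa in "AILMFVW" for aa in seq) / len(seq)


def charged_fraction(seq: str) -> float:
    return sum(aa in "DEKR" for aa in seq) / len(seq)


def aromatic_fraction(seq: str) -> float:
    return sum(aa in "FWY" for aa in seq) / len(seq)


baselines = [hydrophobic_fraction, charged_fraction, aromatic_fraction]
pf = probabilistic_features("LLFVKD", baselines)
selected = select(pf, keep=[0, 1])
print(round(meta_predict(selected, weights=[2.0, -1.0], bias=0.0), 3))
```

In the real framework the meta-model would be trained on these probabilistic features rather than given fixed weights.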
2.
BMC Bioinformatics ; 24(1): 356, 2023 Sep 21.
Article in English | MEDLINE | ID: mdl-37735626

ABSTRACT

BACKGROUND: Tyrosinase is an enzyme involved in melanin production in the skin. Several hyperpigmentation disorders involve the overproduction of melanin and instability of tyrosinase activity resulting in darker, discolored patches on the skin. Therefore, discovering tyrosinase inhibitory peptides (TIPs) is of great significance for basic research and clinical treatments. However, the identification of TIPs using experimental methods is generally cost-ineffective and time-consuming. RESULTS: Herein, a stacked ensemble learning approach, called TIPred, is proposed for the accurate and quick identification of TIPs by using sequence information. TIPred explored a comprehensive set of various baseline models derived from well-known machine learning (ML) algorithms and heterogeneous feature encoding schemes from multiple perspectives, such as chemical structure properties, physicochemical properties, and composition information. Subsequently, 130 baseline models were trained and optimized to create new probabilistic features. Finally, the feature selection approach was utilized to determine the optimal feature vector for developing TIPred. Both tenfold cross-validation and independent test methods were employed to assess the predictive capability of TIPred by using the stacking strategy. Experimental results showed that TIPred significantly outperformed the state-of-the-art method in terms of the independent test, with an accuracy of 0.923, MCC of 0.757 and an AUC of 0.977. CONCLUSIONS: The proposed TIPred approach could be a valuable tool for rapidly discovering novel TIPs and effectively identifying potential TIP candidates for follow-up experimental validation. Moreover, an online webserver of TIPred is publicly available at http://pmlabstack.pythonanywhere.com/TIPred .


Subject(s)
Melanins , Monophenol Monooxygenase , Algorithms , Machine Learning , Peptides
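TIPred is evaluated with accuracy, MCC and AUC. The sketch below shows the standard definitions of these metrics for binary labels (1 = TIP, 0 = non-TIP); the toy labels and scores are made up, and AUC is computed here through its rank interpretation (the probability that a random positive outscores a random negative) rather than by integrating a ROC curve.

```python
import math
from typing import Sequence, Tuple


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    tp, tn, fp, fn = confusion(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient; 0.0 when any marginal count is zero."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 0]
print(accuracy(y_true, y_pred), mcc(y_true, y_pred), auc(y_true, [0.9, 0.4, 0.3, 0.1]))
```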
3.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-33963832

ABSTRACT

The release of interleukin (IL)-6 is stimulated by antigenic peptides from pathogens as well as by immune cells for activating aggressive inflammation. IL-6-inducing peptides are derived from pathogens and can be used as diagnostic biomarkers for predicting various stages of disease severity, as well as IL-6 inhibitors for suppressing aggressive multi-signaling immune responses. Thus, the accurate identification of IL-6-inducing peptides is of great importance for investigating their mechanism of action as well as for developing diagnostic and immunotherapeutic applications. This study proposes a novel stacking ensemble model (termed StackIL6) for accurately identifying IL-6-inducing peptides. More specifically, StackIL6 was constructed from twelve different feature descriptors derived from three major groups of features (composition-based features, composition-transition-distribution-based features and physicochemical properties-based features) and five popular machine learning algorithms (extremely randomized trees, logistic regression, multi-layer perceptron, support vector machine and random forest). To enhance the utility of the baseline models, they were systematically integrated through a stacking strategy to build the final meta-based model. Extensive benchmarking experiments demonstrated that StackIL6 achieved significantly better performance than the existing method (IL6PRED) and outperformed its constituent baseline models on both training and independent test datasets, thereby supporting its excellent discrimination and generalization abilities. To facilitate easy access, StackIL6 was established as a freely available web server at http://camt.pythonanywhere.com/StackIL6. It is anticipated that StackIL6 can help facilitate rapid screening of promising IL-6-inducing peptides for the development of diagnostic and immunotherapeutic applications in the future.


Subject(s)
Computational Biology/methods , Interleukin-6/biosynthesis , Peptides/metabolism , Algorithms , Amino Acid Sequence , Benchmarking , Chemical Phenomena , Humans , Machine Learning , Peptides/chemistry , ROC Curve , Reproducibility of Results
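StackIL6 draws on composition-transition-distribution (CTD) descriptors. As a rough sketch of the first two parts, the code below maps residues to three hydrophobicity classes and computes the composition (C) and transition (T) fractions. The grouping used (polar RKEDQN, neutral GASTPHY, hydrophobic CLVIMFW) is a commonly used partition assumed here for illustration; the abstract does not list the groupings StackIL6 used, and the distribution (D) part is omitted.

```python
from typing import Dict, List

# Assumed three-class hydrophobicity partition (1 = polar, 2 = neutral, 3 = hydrophobic).
GROUPS: Dict[str, str] = {"1": "RKEDQN", "2": "GASTPHY", "3": "CLVIMFW"}
AA_TO_GROUP = {aa: g for g, members in GROUPS.items() for aa in members}


def ctd_composition(seq: str) -> List[float]:
    """C: fraction of residues falling in each of the three classes."""
    labels = [AA_TO_GROUP[aa] for aa in seq]
    return [labels.count(g) / len(labels) for g in "123"]


def ctd_transition(seq: str) -> List[float]:
    """T: fraction of adjacent residue pairs that switch between classes 1-2, 1-3 and 2-3."""
    labels = [AA_TO_GROUP[aa] for aa in seq]
    pairs = list(zip(labels, labels[1:]))
    if not pairs:  # a single residue has no transitions
        return [0.0, 0.0, 0.0]
    result = []
    for a, b in (("1", "2"), ("1", "3"), ("2", "3")):
        n = sum(1 for x, y in pairs if {x, y} == {a, b})  # direction-independent
        result.append(n / len(pairs))
    return result


print(ctd_composition("KAGLLR"), ctd_transition("KAGLLR"))
```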
4.
Methods ; 204: 189-198, 2022 08.
Article in English | MEDLINE | ID: mdl-34883239

ABSTRACT

The development of efficient and effective bioinformatics tools and pipelines for identifying peptides with dipeptidyl peptidase IV (DPP-IV) inhibitory activities from large-scale protein datasets is of great importance for the discovery and development of potential and promising antidiabetic drugs. In this study, we present a novel stacking-based ensemble learning predictor (termed StackDPPIV) designed for the identification of DPP-IV inhibitory peptides. Unlike the existing method, which relies on a single feature type, we combined five popular machine learning algorithms with ten different feature encodings from multiple perspectives to generate a pool of various baseline models. Subsequently, the probabilistic features derived from these baseline models were systematically integrated and treated as new feature representations. Finally, to improve the predictive performance, a genetic algorithm based on the self-assessment-report was utilized to determine a set of informative probabilistic features, and the optimal set was then used to develop the final meta-predictor (StackDPPIV). Experimental results demonstrated that StackDPPIV could outperform its constituent baseline models on both the training and independent datasets. Furthermore, StackDPPIV achieved an accuracy of 0.891, MCC of 0.784 and AUC of 0.961, which were 9.4%, 19.0% and 11.4% higher, respectively, than those of the existing method on the independent test. Feature analysis demonstrated that our feature representations had more discriminative ability than conventional feature descriptors, highlighting that the combination of different features was essential for the performance improvement. To implement the proposed predictor, we have built a user-friendly online web server at http://pmlabstack.pythonanywhere.com/StackDPPIV.


Subject(s)
Dipeptidyl Peptidase 4 , Peptides , Computational Biology , Dipeptidyl Peptidase 4/metabolism , Machine Learning , Peptides/pharmacology , Proteins
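StackDPPIV selects informative probabilistic features with a genetic algorithm based on the self-assessment-report. The abstract does not give that algorithm's details, so the code below is only a generic genetic algorithm over binary feature masks (truncation selection, one-point crossover, bit-flip mutation, elitism) to show the idea. `toy_fitness` and its informative columns are hypothetical; a real fitness would be a cross-validated score of a classifier trained on the selected features.

```python
import random
from typing import Callable, List

Mask = List[int]  # 1 = keep the feature column, 0 = drop it


def genetic_feature_selection(n_features: int, fitness: Callable[[Mask], float],
                              pop_size: int = 20, generations: int = 30,
                              mutation_rate: float = 0.05, seed: int = 0) -> Mask:
    """Generic GA over binary masks (a stand-in, not the published GA-SAR).

    Requires n_features >= 2 and pop_size >= 4.
    """
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]       # truncation selection
        children = [parents[0][:]]                 # elitism: carry the best mask over
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)     # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
    return max(population, key=fitness)


# Hypothetical fitness: reward three "informative" columns, penalise larger masks.
INFORMATIVE = {0, 3, 5}


def toy_fitness(mask: Mask) -> float:
    return sum(mask[i] for i in INFORMATIVE) - 0.1 * sum(mask)


best = genetic_feature_selection(8, toy_fitness)
print(best, toy_fitness(best))
```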
5.
Bioinformatics ; 37(17): 2556-2562, 2021 Sep 09.
Article in English | MEDLINE | ID: mdl-33638635

ABSTRACT

MOTIVATION: The identification of bitter peptides through experimental approaches is an expensive and time-consuming endeavor. Due to the huge number of newly available peptide sequences in the post-genomic era, the development of automated computational models for the identification of novel bitter peptides is highly desirable. RESULTS: In this work, we present BERT4Bitter, a bidirectional encoder representation from transformers (BERT)-based model for predicting bitter peptides directly from their amino acid sequence without using any structural information. To the best of our knowledge, this is the first time a BERT-based model has been employed to identify bitter peptides. Compared to widely used machine learning models, BERT4Bitter achieved the best performance, with an accuracy of 0.861 and 0.922 for cross-validation and independent tests, respectively. Furthermore, extensive empirical benchmarking experiments on the independent dataset demonstrated that BERT4Bitter clearly outperformed the existing method, with improvements of 8.0% in accuracy and 16.0% in Matthews correlation coefficient, highlighting the effectiveness and robustness of BERT4Bitter. We believe that the BERT4Bitter method proposed herein will be a useful tool for rapidly screening and identifying novel bitter peptides for drug development and nutritional research. AVAILABILITY AND IMPLEMENTATION: The user-friendly web server of the proposed BERT4Bitter is freely accessible at http://pmlab.pythonanywhere.com/BERT4Bitter. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
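The abstract states only that BERT4Bitter predicts bitterness from the amino acid sequence. One plausible way to feed a peptide to a BERT-style model is residue-level tokenization, sketched below; the vocabulary layout, the special tokens and `max_len` are assumptions for illustration, not BERT4Bitter's actual configuration.

```python
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
SPECIAL_TOKENS = ["[PAD]", "[CLS]", "[SEP]", "[UNK]"]
# Hypothetical vocabulary: 4 special tokens followed by the 20 standard residues.
VOCAB: Dict[str, int] = {tok: i for i, tok in enumerate(SPECIAL_TOKENS + list(AMINO_ACIDS))}


def encode(seq: str, max_len: int) -> List[int]:
    """One token per residue, wrapped as [CLS] ... [SEP], truncated and padded to max_len."""
    residues = [aa if aa in VOCAB else "[UNK]" for aa in seq.upper()][: max_len - 2]
    tokens = ["[CLS]"] + residues + ["[SEP]"]
    ids = [VOCAB[t] for t in tokens]
    return ids + [VOCAB["[PAD]"]] * (max_len - len(ids))


def attention_mask(ids: List[int]) -> List[int]:
    """1 for real tokens, 0 for padding."""
    return [0 if i == VOCAB["[PAD]"] else 1 for i in ids]


ids = encode("GPAG", max_len=8)
print(ids, attention_mask(ids))
```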

6.
J Comput Aided Mol Des ; 36(11): 781-796, 2022 11.
Article in English | MEDLINE | ID: mdl-36284036

ABSTRACT

The blood-brain barrier (BBB) is the primary barrier, a highly selective semipermeable border between blood vascular endothelial cells and the central nervous system. Since the BBB can prevent drugs circulating in the blood from crossing into the interstitial fluid of the brain where neurons reside, many researchers are working on drug delivery systems that can penetrate the BBB, which currently remains a challenge. Blood-brain barrier penetrating peptides (B3PPs) are therefore an alternative neurotherapeutic for brain-related disorders, since they can facilitate drug delivery into the brain. Meanwhile, developing computational methods that identify and characterize B3PPs effectively and cost-efficiently plays an important role in basic research and in the pharmaceutical industry. Although a few computational methods for B3PP identification have been developed, they may fall short in generalization ability and interpretability. In this study, a novel and efficient scoring card method-based predictor (termed SCMB3PP) is presented for improving B3PP identification and characterization. To overcome the limitation of black-box computational approaches, the SCMB3PP predictor automatically estimates the propensities of amino acids and dipeptides to be B3PPs. Both cross-validation and independent tests indicate that SCMB3PP can achieve impressive performance and outperform various popular machine learning-based methods and the existing methods on multiple independent test datasets. Furthermore, SCMB3PP-derived amino acid propensities were utilized to identify informative biophysical and biochemical properties for characterizing B3PPs. Finally, an online user-friendly web server (http://pmlabstack.pythonanywhere.com/SCMB3PP) has been established to identify novel and potential B3PPs cost-effectively.
This novel computational approach is anticipated to facilitate the large-scale identification of high-potential B3PP candidates for follow-up experimental validation.


Subject(s)
Blood-Brain Barrier , Dipeptides , Dipeptides/chemistry , Dipeptides/metabolism , Propensity Score , Endothelial Cells , Peptides/metabolism , Amino Acids/chemistry
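SCMB3PP estimates amino acid and dipeptide propensities automatically. A minimal sketch of one way to obtain initial dipeptide propensity scores is shown below, assuming they are taken from the difference in mean dipeptide composition between positive (B3PP) and negative training peptides and linearly rescaled to 0-1000. The toy sequences are made up, and any later optimization of these scores is not shown.

```python
from itertools import product
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(seq: str) -> Dict[str, float]:
    """Frequency of each of the 400 dipeptides among the overlapping residue pairs of seq."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    for dp in pairs:
        if dp in counts:
            counts[dp] += 1
    total = max(len(pairs), 1)
    return {dp: c / total for dp, c in counts.items()}


def mean_composition(seqs: List[str]) -> Dict[str, float]:
    comps = [dipeptide_composition(s) for s in seqs]
    return {dp: sum(c[dp] for c in comps) / len(comps) for dp in DIPEPTIDES}


def initial_propensity(positives: List[str], negatives: List[str]) -> Dict[str, float]:
    """Assumed rule: (positive - negative) mean composition, rescaled to [0, 1000]."""
    pos, neg = mean_composition(positives), mean_composition(negatives)
    diff = {dp: pos[dp] - neg[dp] for dp in DIPEPTIDES}
    lo, hi = min(diff.values()), max(diff.values())
    span = (hi - lo) or 1.0
    return {dp: 1000 * (d - lo) / span for dp, d in diff.items()}


props = initial_propensity(["RRWW"], ["DDEE"])
print(props["RR"], props["AA"], props["DD"])
```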
7.
Genomics ; 113(1 Pt 2): 689-698, 2021 01.
Article in English | MEDLINE | ID: mdl-33017626

ABSTRACT

Fast, accurate, large-scale identification and characterization of amyloid proteins is essential for understanding their role in therapeutic intervention strategies. In fact, there exists only one in silico model for amyloid protein identification, RFAmy, which uses the random forest (RF) model in conjunction with various feature types. However, it suffers from low interpretability for biologists. Thus, it is highly desirable to develop a simple and easily interpretable prediction method with robust accuracy as compared to the existing complicated model. In this study, we propose iAMY-SCM, the first scoring card method-based predictor for predicting and analyzing amyloid proteins. Herein, iAMY-SCM made use of a simple weighted-sum function in conjunction with the propensity scores of dipeptides for amyloid protein identification. Cross-validation results indicated that iAMY-SCM provided an accuracy of 0.895, corresponding to 10-22% higher performance than that of widely used machine learning models. Furthermore, iAMY-SCM achieved an accuracy of 0.827 on an independent test, which was comparable to that of RFAmy and approximately 9-13% higher than those of widely used machine learning models. In addition, the estimated propensity scores of amino acids and dipeptides were analyzed to provide insights into the biophysical and biochemical properties of amyloid proteins. This demonstrates that the proposed iAMY-SCM is efficient and reliable in terms of simplicity, interpretability and implementation. To facilitate ease of use of the proposed iAMY-SCM, a user-friendly and publicly accessible web server has been established at http://camt.pythonanywhere.com/iAMY-SCM. We anticipate that iAMY-SCM will be an important tool for facilitating the large-scale prediction and characterization of amyloid proteins.


Subject(s)
Amyloid/chemistry , Sequence Analysis, Protein/methods , Software , Amyloid/genetics , Amyloid/metabolism , Machine Learning , Propensity Score , Protein Conformation , Protein Multimerization
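The weighted-sum function described for iAMY-SCM can be sketched as follows: each dipeptide's frequency in the sequence is multiplied by its propensity score, the products are summed, and the result is compared with a decision threshold. The propensity values, the default score for unlisted dipeptides and the threshold below are hypothetical, not the published iAMY-SCM parameters.

```python
from typing import Dict

# Hypothetical dipeptide propensity scores (0-1000 scale); unlisted dipeptides get DEFAULT_SCORE.
EXAMPLE_SCORES: Dict[str, float] = {"VV": 900.0, "VI": 800.0, "KD": 100.0}
DEFAULT_SCORE = 500.0


def scm_score(seq: str, propensity: Dict[str, float], default: float = DEFAULT_SCORE) -> float:
    """Sum over dipeptides of (frequency in seq) x (propensity score)."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    if not pairs:
        return default
    # Averaging the score of every overlapping dipeptide equals the frequency-weighted sum.
    return sum(propensity.get(dp, default) for dp in pairs) / len(pairs)


def classify(seq: str, propensity: Dict[str, float], threshold: float) -> bool:
    """Predict the positive class (amyloid) when the score exceeds the threshold."""
    return scm_score(seq, propensity) > threshold


print(scm_score("VVIKD", EXAMPLE_SCORES), classify("VVIKD", EXAMPLE_SCORES, threshold=520.0))
```

Because the score is a plain weighted sum, each dipeptide's contribution to a prediction can be read off directly, which is the interpretability argument made in the abstract.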
8.
Sensors (Basel) ; 22(22)2022 Nov 14.
Article in English | MEDLINE | ID: mdl-36433385

ABSTRACT

Recent advances in Deep Learning-based Convolutional Neural Networks (D-CNNs) have led research to improve the efficiency and performance of barcode recognition in Supply Chain Management (SCM). D-CNNs require real-world images embedded with ground truth data, which are often not readily available in the case of SCM barcode recognition. This study introduces two newly created barcode datasets: InventBar and ParcelBar. The datasets contain labeled barcode images of 527 consumer goods and 844 post boxes in an indoor environment. To explore how the datasets affect the recognition process, five existing D-CNN algorithms were applied and compared over a set of recently available barcode datasets. To confirm the models' performance and accuracy, runtime and Mean Average Precision (mAP) were examined under different IoU thresholds and image transformation settings. The results show that YOLO v5 works best for ParcelBar in terms of speed and accuracy. The situation is different for InventBar, where Faster R-CNN could allow the model to learn faster with a small drop in accuracy. This demonstrates that the proposed datasets can be practically utilized with mainstream D-CNN frameworks. Both are available for developing barcode recognition models and can support comparative studies.


Subject(s)
Benchmarking , Neural Networks, Computer , Algorithms , Data Collection
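The mAP values above depend on an intersection-over-union (IoU) threshold that decides whether a predicted box matches a ground-truth box. A minimal sketch of that test is shown below for axis-aligned boxes given as (x_min, y_min, x_max, y_max); the 0.50:0.05:0.95 list follows the common COCO-style convention and is an assumption, since the abstract does not state which thresholds were used.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Area of overlap divided by area of union for two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred: Box, truth: Box, threshold: float = 0.5) -> bool:
    """A detection counts as correct when its IoU with the ground truth reaches the threshold."""
    return iou(pred, truth) >= threshold


# Assumed COCO-style thresholds 0.50, 0.55, ..., 0.95 for averaging AP over IoU levels.
COCO_THRESHOLDS: List[float] = [round(0.5 + 0.05 * i, 2) for i in range(10)]

print(iou((0, 0, 2, 2), (1, 1, 3, 3)), is_true_positive((0, 0, 2, 2), (0, 0, 2, 1)))
```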
9.
J Comput Aided Mol Des ; 35(10): 1037-1053, 2021 10.
Article in English | MEDLINE | ID: mdl-34622387

ABSTRACT

Fast and accurate identification of inhibitors with potency against HCV NS5B polymerase is currently a challenging task. Although conventional experimental methods are the gold standard for the design and development of new HCV inhibitors, they often require a costly investment of time and resources. In this study, we develop a novel machine learning-based meta-predictor (termed StackHCV) for accurate and large-scale identification of HCV inhibitors. Unlike the existing method, which is based on a single-feature approach, we first constructed a pool of various baseline models by employing a wide range of heterogeneous molecular fingerprints with five popular machine learning algorithms (k-nearest neighbor, multi-layer perceptron, partial least squares, random forest and support vector machine). Second, we integrated these baseline models to develop the final meta-based model by means of the stacking strategy. Extensive benchmarking experiments showed that StackHCV achieved a more accurate and stable performance than its constituent baseline models on the training dataset and also outperformed the existing predictor on the independent test dataset. To facilitate the high-throughput identification of HCV inhibitors, we built a web server that can be freely accessed at http://camt.pythonanywhere.com/StackHCV . It is expected that StackHCV could be a useful tool for fast and precise identification of potential drugs against HCV NS5B, particularly for liver cancer therapy and other clinical applications.


Subject(s)
Antiviral Agents/pharmacology , Enzyme Inhibitors/pharmacology , Hepacivirus/drug effects , Hepatitis C/drug therapy , Internet/statistics & numerical data , Machine Learning , RNA-Dependent RNA Polymerase/antagonists & inhibitors , Viral Nonstructural Proteins/antagonists & inhibitors , Algorithms , Antiviral Agents/isolation & purification , Enzyme Inhibitors/isolation & purification , Hepacivirus/isolation & purification , Hepatitis C/virology , Humans , Support Vector Machine
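StackHCV builds baseline models from molecular fingerprints with algorithms that include k-nearest neighbor. The sketch below illustrates a kNN classifier over binary fingerprints using Tanimoto similarity. Representing a fingerprint as a set of on-bit indices, choosing Tanimoto as the similarity measure, and the toy training data are assumptions for illustration, not details given in the abstract.

```python
from collections import Counter
from typing import FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # indices of the bits that are set to 1


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """|A and B| / |A or B| for binary fingerprints (1.0 for two empty fingerprints)."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def knn_predict(query: Fingerprint, train: List[Tuple[Fingerprint, int]], k: int = 3) -> int:
    """Majority label among the k training compounds most similar to the query."""
    neighbours = sorted(train, key=lambda item: tanimoto(query, item[0]), reverse=True)[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: label 1 = inhibitor, 0 = non-inhibitor.
TRAIN = [
    (frozenset({1, 2, 3}), 1),
    (frozenset({1, 2, 4}), 1),
    (frozenset({7, 8, 9}), 0),
    (frozenset({8, 9, 10}), 0),
]
print(knn_predict(frozenset({1, 2, 3, 5}), TRAIN, k=3))
```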
10.
Genomics ; 112(4): 2813-2822, 2020 07.
Article in English | MEDLINE | ID: mdl-32234434

ABSTRACT

In general, hydrolyzed proteins, plant-derived alkaloids and toxins display an unpleasant bitter taste. Thus, the perception of bitter taste plays a crucial role in protecting animals from poisonous plants and environmental toxins. Therapeutic peptides have attracted great attention as a new drug class. The successful identification and characterization of bitter peptides are essential for drug development and nutritional research. Owing to the large volume of peptides generated in the post-genomic era, there is an urgent need to develop computational methods for rapidly and effectively discriminating bitter peptides from non-bitter peptides. To the best of our knowledge, there is as yet no computational model for predicting and analyzing bitter peptides using sequence information. In this study, we present for the first time a computational model called iBitter-SCM that can predict the bitterness of peptides directly from their amino acid sequence without any dependence on their functional domain or structural information. iBitter-SCM is a simple and effective method built using the scoring card method (SCM) with estimated propensity scores of amino acids and dipeptides. Our benchmarking results demonstrated that iBitter-SCM achieved an accuracy of 84.38% and a Matthews correlation coefficient of 0.688 on the independent dataset. A rigorous independent test indicated that iBitter-SCM was superior to other widely used machine-learning classifiers (e.g. k-nearest neighbor, naive Bayes, decision tree and random forest) owing to its simplicity, interpretability and implementation. Furthermore, the estimated propensity scores of amino acids and dipeptides were analyzed to provide a better understanding of the biophysical and biochemical properties of bitter peptides. For the convenience of experimental scientists, a web server is publicly available at http://camt.pythonanywhere.com/iBitter-SCM.
It is anticipated that iBitter-SCM can serve as an important tool to facilitate the high-throughput prediction and de novo design of bitter peptides.


Subject(s)
Dipeptides/chemistry , Sequence Analysis, Protein/methods , Software , Taste , Amino Acids/chemistry , Hydrophobic and Hydrophilic Interactions , Machine Learning , Propensity Score , Sequence Alignment
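iBitter-SCM uses propensity scores of both amino acids and dipeptides. A minimal sketch of deriving an amino acid score from dipeptide scores is given below, assuming each amino acid's score is the mean of the 40 dipeptide scores in which it appears at either position (so the homodipeptide is counted twice). That averaging rule and the example scores are assumptions, not values from the paper.

```python
from itertools import product
from typing import Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_propensity(dipeptide_scores: Dict[str, float]) -> Dict[str, float]:
    """Assumed rule: average the 20 'aX' and the 20 'Xa' dipeptide scores for each residue a."""
    result = {}
    for a in AMINO_ACIDS:
        scores = [dipeptide_scores[a + x] for x in AMINO_ACIDS]   # a in the first position
        scores += [dipeptide_scores[x + a] for x in AMINO_ACIDS]  # a in the second position
        result[a] = sum(scores) / len(scores)
    return result


# Hypothetical dipeptide scores: uniform 500 except one strongly "bitter" dipeptide.
example_scores = {"".join(p): 500.0 for p in product(AMINO_ACIDS, repeat=2)}
example_scores["FF"] = 900.0
aa_scores = amino_acid_propensity(example_scores)

# Rank residues by propensity, as one might do when interpreting the model.
print(sorted(aa_scores, key=aa_scores.get, reverse=True)[:3])
```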
11.
Int J Mol Sci ; 22(23)2021 Dec 04.
Article in English | MEDLINE | ID: mdl-34884927

ABSTRACT

Umami ingredients have been identified as important factors in food seasoning and production. Traditional experimental methods for characterizing peptides exhibiting umami sensory properties (umami peptides) are time-consuming, laborious, and costly. As a result, it is preferable to develop computational tools for large-scale screening of available sequences to identify novel peptides with umami sensory properties. Although a computational tool has been developed for this purpose, its predictive performance is still insufficient. In this study, we use a feature representation learning approach to create a novel machine-learning meta-predictor called UMPred-FRL for improved umami peptide identification. We combined six well-known machine learning algorithms (extremely randomized trees, k-nearest neighbor, logistic regression, partial least squares, random forest, and support vector machine) with seven different feature encodings (amino acid composition, amphiphilic pseudo-amino acid composition, dipeptide composition, composition-transition-distribution, and pseudo-amino acid composition) to develop the final meta-predictor. Extensive experimental results demonstrated that UMPred-FRL was effective, achieving more accurate performance on the benchmark dataset than its baseline models and consistently outperforming the existing method on the independent test dataset. Finally, to aid in the high-throughput identification of umami peptides, the UMPred-FRL web server was established and made freely available online. It is expected that UMPred-FRL will be a powerful tool for the cost-effective large-scale screening of candidate peptides with potential umami sensory properties.


Subject(s)
Computational Biology/methods , Machine Learning , Peptides/chemistry , Algorithms , Databases, Protein , Dietary Proteins/chemistry , Internet , Support Vector Machine , Taste
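In a feature representation learning or stacking setup such as UMPred-FRL, baseline-model probabilities are often generated out of fold, so that no sample is scored by a model that was trained on it. The sketch below shows that idea with contiguous k-fold splits. `one_nn_fit` is a hypothetical one-nearest-neighbor stand-in for a real learner, and the abstract does not say how UMPred-FRL's folds were formed (a real pipeline would usually shuffle and stratify).

```python
from typing import Callable, List, Sequence

Predictor = Callable[[List[float]], float]
# fit(train_X, train_y) returns a predictor mapping one sample to P(umami).
Fit = Callable[[Sequence[List[float]], Sequence[int]], Predictor]


def kfold_indices(n: int, k: int) -> List[List[int]]:
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def out_of_fold_probabilities(X: Sequence[List[float]], y: Sequence[int],
                              fit: Fit, k: int = 5) -> List[float]:
    """Each sample's probability comes from a model trained without that sample."""
    oof = [0.0] * len(X)
    for fold in kfold_indices(len(X), k):
        held_out = set(fold)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i in fold:
            oof[i] = model(X[i])
    return oof


def one_nn_fit(train_X: Sequence[List[float]], train_y: Sequence[int]) -> Predictor:
    """Hypothetical learner: label of the nearest training point on the first feature."""
    def predict(x: List[float]) -> float:
        j = min(range(len(train_X)), key=lambda i: abs(train_X[i][0] - x[0]))
        return float(train_y[j])
    return predict


X = [[0.0], [0.1], [1.0], [1.1]]
y = [0, 0, 1, 1]
print(out_of_fold_probabilities(X, y, one_nn_fit, k=4))
```

Each baseline model would contribute one such column, and the columns together form the learned feature representation for the meta-predictor.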
12.
Int J Mol Sci ; 22(16)2021 Aug 19.
Article in English | MEDLINE | ID: mdl-34445663

ABSTRACT

Accurate identification of bitter peptides is of great importance for better understanding their biochemical and biophysical properties. To date, machine learning-based methods have become effective approaches for identifying potential bitter peptides from large-scale protein datasets. Although a few machine learning-based predictors have been developed for identifying the bitterness of peptides, their prediction performance could still be improved. In this study, we developed a new predictor (named iBitter-Fuse) for more accurate identification of bitter peptides. In the proposed iBitter-Fuse, we integrated a variety of feature encoding schemes to provide sufficient information from different aspects, namely compositional information and physicochemical properties. To enhance the predictive performance, a customized genetic algorithm utilizing self-assessment-report (GA-SAR) was employed to identify informative features, and the optimal ones were then input into a support vector machine (SVM)-based classifier to develop the final model (iBitter-Fuse). Benchmarking experiments based on both 10-fold cross-validation and independent tests indicated that iBitter-Fuse was able to achieve more accurate performance than state-of-the-art methods. To facilitate the high-throughput identification of bitter peptides, the iBitter-Fuse web server was established and made freely available online. It is anticipated that iBitter-Fuse will be a useful tool for aiding the discovery and de novo design of bitter peptides.


Subject(s)
Algorithms , Machine Learning , Peptide Fragments/chemistry , Software , Support Vector Machine , Taste , Benchmarking , Humans , Predictive Value of Tests
13.
J Proteome Res ; 19(10): 4125-4136, 2020 10 02.
Article in English | MEDLINE | ID: mdl-32897718

ABSTRACT

The inhibition of dipeptidyl peptidase IV (DPP-IV, EC 3.4.14.5) is well recognized as a new avenue for the treatment of Type 2 diabetes (T2D). To date, peptide-like DPP-IV inhibitors have been shown to normalize the blood glucose concentration in T2D subjects. To the best of our knowledge, there is as yet no computational model for predicting and analyzing DPP-IV inhibitory peptides using sequence information. In this study, we present for the first time a simple and easily interpretable sequence-based predictor using the scoring card method (SCM) for modeling the bioactivity of DPP-IV inhibitory peptides (iDPPIV-SCM). In particular, iDPPIV-SCM was developed by employing the SCM method together with the propensity scores of amino acids. Rigorous independent test results demonstrated that the proposed iDPPIV-SCM was superior to well-known machine learning (ML) classifiers (e.g., k-nearest neighbor, logistic regression, and decision tree), with improvements of 2-11, 4-22, and 7-10% for accuracy, MCC, and AUC, respectively, while also achieving results comparable to those of the support vector machine. Furthermore, the estimated propensity scores of amino acids derived from iDPPIV-SCM were analyzed to provide a more in-depth understanding of the molecular basis for enhancing DPP-IV inhibitory potency. Taken together, these results revealed that iDPPIV-SCM was superior to other well-known ML classifiers owing to its simplicity, interpretability, and validity. For the convenience of biologists, the predictive model is deployed as a publicly accessible web server at http://camt.pythonanywhere.com/iDPPIV-SCM. It is anticipated that iDPPIV-SCM can serve as an important tool for the rapid screening of promising DPP-IV inhibitory peptides prior to their synthesis.


Subject(s)
Diabetes Mellitus, Type 2 , Dipeptidyl Peptidase 4 , Amino Acids , Diabetes Mellitus, Type 2/drug therapy , Humans , Peptides , Support Vector Machine
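An SCM predictor such as iDPPIV-SCM turns a continuous score into a class label with a threshold. The sketch below picks the threshold that maximizes training accuracy among the midpoints between sorted distinct scores; using accuracy as the criterion, and the toy scores and labels, are assumptions, since the abstract does not state how the threshold was chosen.

```python
from typing import Sequence, Tuple


def best_threshold(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (threshold, accuracy) for the midpoint threshold with the highest accuracy.

    A peptide is predicted positive (DPP-IV inhibitory) when its score is above the threshold.
    """
    distinct = sorted(set(scores))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])] or distinct
    best_t, best_acc = candidates[0], -1.0
    for t in candidates:
        preds = [1 if s > t else 0 for s in scores]
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        if acc > best_acc:  # keep the first threshold reaching the best accuracy
            best_t, best_acc = t, acc
    return best_t, best_acc


# Hypothetical SCM scores for six training peptides (1 = inhibitory).
print(best_threshold([300, 350, 480, 520, 610, 700], [0, 0, 0, 1, 1, 1]))
```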
14.
Anal Biochem ; 599: 113747, 2020 06 15.
Article in English | MEDLINE | ID: mdl-32333902

ABSTRACT

In spite of the repertoire of existing cancer therapies, the ongoing recurrence and new cases of cancer pose a challenging health concern that calls for novel and effective treatment. Cancer immunotherapy represents a promising avenue for treatment by harnessing the body's immune system to combat cancer. Therefore, the identification of tumor T cell antigens represents an exciting area to explore. Computational tools have been instrumental in the identification of tumor T cell antigens, and it is highly desirable to obtain highly accurate models in a timely fashion from the large volumes of peptides generated in the post-genomic era. In this study, we present a reliable, accurate, unbiased and automated sequence-based predictor named iTTCA-Hybrid for identifying tumor T cell antigens. The iTTCA-Hybrid approach proposed herein employs two robust machine learning models (i.e. support vector machine and random forest) constructed using five feature encoding strategies (i.e. amino acid composition, dipeptide composition, pseudo amino acid composition, distribution of amino acid properties in sequences and physicochemical properties derived from the AAindex). A rigorous independent test indicated that the iTTCA-Hybrid approach achieved an accuracy of 73.60% and an area under the curve of 0.783, corresponding to performance increases of 4% and 7%, respectively, over existing methods, thereby indicating the superiority of the proposed model. To the best of our knowledge, iTTCA-Hybrid is the first free web server (available at http://camt.pythonanywhere.com/iTTCA-Hybrid) for identifying tumor T cell antigens presented by MHC class I. The proposed web server allows robust predictions to be made without the need to develop in-house prediction models.


Subject(s)
Antigens, Neoplasm/immunology , Histocompatibility Antigens Class I/analysis , Machine Learning , Neoplasms/immunology , T-Lymphocytes/immunology , Humans , Immunotherapy , Neoplasms/therapy , T-Lymphocytes/cytology
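Two of the five encodings listed for iTTCA-Hybrid, amino acid composition (AAC) and dipeptide composition (DPC), can be computed as below, giving a 20 + 400 = 420-dimensional vector. This is a minimal sketch assuming sequences contain only the 20 standard residues; the remaining encodings (pseudo amino acid composition, property distribution, AAindex features) are not shown.

```python
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # AA, AC, AD, ...


def aac(seq: str) -> List[float]:
    """Amino acid composition: frequency of each of the 20 residues."""
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def dpc(seq: str) -> List[float]:
    """Dipeptide composition: frequency of each of the 400 overlapping residue pairs."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    if not pairs:  # a single residue has no dipeptides
        return [0.0] * len(DIPEPTIDES)
    return [pairs.count(dp) / len(pairs) for dp in DIPEPTIDES]


def aac_dpc(seq: str) -> List[float]:
    """Concatenated 420-dimensional AAC + DPC feature vector."""
    return aac(seq) + dpc(seq)


features = aac_dpc("AAC")
print(len(features), features[:2], features[20:22])
```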
15.
J Chem Inf Model ; 60(12): 6666-6678, 2020 12 28.
Article in English | MEDLINE | ID: mdl-33094610

ABSTRACT

Umami, the taste of monosodium glutamate, represents one of the major attractive taste modalities in humans. Therefore, knowledge about the biophysical and biochemical properties of umami taste is important for both scientific research and the food industry. Experimental approaches for predicting umami peptides are labor intensive, time consuming, and expensive. To date, no computational models for the prediction and analysis of umami peptides as a function of sequence information have been developed. In this study, we propose the first sequence-based predictor, named iUmami-SCM, which uses primary sequence information for the identification and characterization of umami peptides. iUmami-SCM utilized a newly developed scoring card method (SCM) in conjunction with the propensity scores of amino acids and dipeptides. Our predictor demonstrated excellent performance in predicting umami peptides and outperformed other commonly used machine learning classifiers. In particular, iUmami-SCM afforded the highest accuracy and Matthews correlation coefficient of 0.865 and 0.679, respectively, on an independent data set. Furthermore, the SCM-derived propensity scores were analyzed to provide a more in-depth understanding of the biophysical and biochemical properties underlying the umami intensities of peptides. To develop a convenient bioinformatics tool, the best model is deployed as a web server that is publicly available at http://camt.pythonanywhere.com/iUmami-SCM. The iUmami-SCM, as presented herein, serves as a powerful computational technique for large-scale umami peptide identification as well as facilitating the interpretation of umami peptides.


Subject(s)
Dipeptides , Peptides , Taste , Amino Acids , Computational Biology , Humans , Propensity Score
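One way to relate SCM-derived amino acid propensities to biophysical and biochemical properties is to correlate them with physicochemical scales. The sketch below computes a Pearson correlation between a propensity table and one property scale over the residues they share; the choice of Pearson correlation and the toy values are assumptions, as the abstract does not name the method or the scales used.

```python
import math
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx * sy > 0 else 0.0


def correlate_with_property(propensity: Dict[str, float], scale: Dict[str, float]) -> float:
    """Correlate two per-residue tables over the residues present in both."""
    keys = sorted(propensity.keys() & scale.keys())
    return pearson([propensity[k] for k in keys], [scale[k] for k in keys])


# Hypothetical propensities and a hypothetical property scale.
toy_propensity = {"D": 700.0, "E": 680.0, "K": 400.0, "L": 300.0}
toy_scale = {"D": -3.5, "E": -3.5, "K": -3.9, "L": 3.8}
print(round(correlate_with_property(toy_propensity, toy_scale), 3))
```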
16.
J Comput Aided Mol Des ; 34(10): 1105-1116, 2020 10.
Article in English | MEDLINE | ID: mdl-32557165

ABSTRACT

Phage virion proteins (PVPs) perforate the host cell membrane, eventually culminating in cell rupture and thereby releasing replicated phages. The accurate identification of PVPs is thus a crucial step towards improving our understanding of their biological functions and mechanisms. Therefore, it is desirable to develop a computational method that is capable of fast and accurate identification of PVPs. To address this, we propose a novel sequence-based meta-predictor employing probabilistic information (referred to herein as Meta-iPVP) for the accurate identification of PVPs. In particular, an efficient feature representation approach was used to generate discriminative probabilistic features from four machine learning (ML) algorithms making use of seven feature encodings. To the best of our knowledge, Meta-iPVP is the first meta-based approach that has been developed for PVP prediction. Independent test results indicated that Meta-iPVP could discern important characteristics between PVPs and non-PVPs, achieving the best accuracy and MCC of 0.817 and 0.642, respectively, which correspond to 6-10% and 14-21% improvements over existing PVP predictors. This demonstrates that the proposed Meta-iPVP is a more efficient, robust and promising approach for the identification of PVPs. The predictive model is deployed as a publicly accessible Meta-iPVP web server, freely available online at http://camt.pythonanywhere.com/Meta-iPVP .


Subject(s)
Algorithms , Bacteriophages/metabolism , Computational Biology/methods , Machine Learning , Sequence Analysis, Protein/methods , Viral Proteins/chemistry , Virion/metabolism , Humans , Software , Support Vector Machine , Viral Proteins/metabolism
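
The meta-predictor idea in the Meta-iPVP abstract (base classifiers produce probabilities, which are concatenated into a new feature vector for a second-level model) can be sketched as a small stacking pipeline. This is a minimal sketch under assumptions: the two "base models" here are hypothetical toy scorers standing in for trained classifiers, and the meta-learner is a plain logistic regression fit by stochastic gradient descent; none of this is the published Meta-iPVP configuration. Assuming each algorithm-encoding pair contributes one probability, four algorithms and seven encodings would give a 28-dimensional vector.

```python
import math
from typing import Callable, List, Sequence, Tuple

# A base model maps a sequence to an estimated probability of being positive.
BaseModel = Callable[[str], float]


def hydrophobic_fraction(seq: str) -> float:
    """Toy stand-in for a trained classifier (hypothetical)."""
    return sum(c in "AILMFVW" for c in seq) / len(seq)


def charged_fraction(seq: str) -> float:
    """Second toy stand-in for a trained classifier (hypothetical)."""
    return sum(c in "DEKR" for c in seq) / len(seq)


def probabilistic_features(seq: str, models: Sequence[BaseModel]) -> List[float]:
    """Concatenate base-model outputs into one probabilistic feature vector."""
    return [m(seq) for m in models]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_meta(X: List[List[float]], y: List[int],
               lr: float = 0.5, epochs: int = 500) -> Tuple[List[float], float]:
    """Fit a logistic-regression meta-learner with stochastic gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def meta_predict(features: Sequence[float], w: Sequence[float], b: float) -> float:
    """Meta-level probability that the sequence is positive."""
    return sigmoid(b + sum(wj * fj for wj, fj in zip(w, features)))


if __name__ == "__main__":
    models = [hydrophobic_fraction, charged_fraction]
    train_seqs = ["AILMFV", "LLVVAA", "DEKRDE", "KKRRDD"]
    labels = [1, 1, 0, 0]
    X = [probabilistic_features(s, models) for s in train_seqs]
    w, b = train_meta(X, labels)
    print(meta_predict(probabilistic_features("ILVAFM", models), w, b))
```

In a real stacking setup, the meta-learner should be trained on out-of-fold base-model predictions (for example from cross-validation) rather than on predictions for the base models' own training data, which would otherwise leak label information into the meta level.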
18.
Int J Mol Sci ; 21(1)2019 Dec 20.
Article in English | MEDLINE | ID: mdl-31861928

ABSTRACT

Understanding the functional mechanisms of quorum-sensing peptides (QSPs) plays an essential role in finding new opportunities to combat bacterial infections through drug design. With the avalanche of newly available peptide sequences in the post-genomic age, it is highly desirable to develop a computational model for efficient, rapid and high-throughput QSP identification based on peptide sequence information alone. Although a few methods have been developed for predicting QSPs, their prediction accuracy and interpretability still require further improvement. Thus, in this work, we propose an accurate sequence-based predictor (called iQSP) and a set of interpretable rules (called IR-QSP) for predicting and analyzing QSPs. In iQSP, we used a support vector machine (SVM) in conjunction with 18 informative features derived from physicochemical properties (PCPs). A rigorous independent validation test showed that iQSP achieved a maximum accuracy and Matthews correlation coefficient (MCC) of 93.00% and 0.86, respectively. Furthermore, the set of interpretable rules, IR-QSP, was extracted using a random forest model and the 18 informative PCPs. Finally, for the convenience of experimental scientists, the iQSP web server was established and made freely available online. It is anticipated that iQSP will become a useful tool, or at least a complement to existing methods, for predicting and analyzing QSPs.


Subject(s)
Bacterial Physiological Phenomena , Machine Learning , Peptides/metabolism , Quorum Sensing , Amino Acid Sequence , Bacteria/chemistry , Drug Discovery , Models, Molecular , Peptides/chemistry , Support Vector Machine
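
Physicochemical-property (PCP) features of the kind iQSP feeds to its SVM are commonly computed by averaging a per-residue property scale over the peptide. The sketch below shows that encoding for one well-known scale, the Kyte-Doolittle hydropathy scale; which 18 PCPs iQSP actually selected, and how it aggregated them, is not stated in the abstract, so the scale choice and the helper names `property_average` and `pcp_features` are illustrative assumptions.

```python
from typing import Dict, List

# Kyte-Doolittle hydropathy scale (Kyte & Doolittle, 1982).
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def property_average(seq: str, scale: Dict[str, float]) -> float:
    """Mean property value over the residues covered by `scale`."""
    values = [scale[c] for c in seq.upper() if c in scale]
    if not values:
        raise ValueError("no residues of the sequence are covered by the scale")
    return sum(values) / len(values)


def pcp_features(seq: str, scales: Dict[str, Dict[str, float]]) -> List[float]:
    """One averaged value per property scale, in the scales' insertion order."""
    return [property_average(seq, scale) for scale in scales.values()]


if __name__ == "__main__":
    # Hypothetical feature set with a single scale; iQSP used 18 PCPs.
    scales = {"hydropathy_kd": KYTE_DOOLITTLE}
    print(pcp_features("AIL", scales))
    print(pcp_features("DE", scales))  # [-3.5]
```

Averaging makes every peptide, whatever its length, map to a fixed-length vector (one entry per property), which is what a standard SVM requires.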
19.
BMC Bioinformatics ; 16 Suppl 1: S8, 2015.
Article in English | MEDLINE | ID: mdl-25708243

ABSTRACT

BACKGROUND: Photosynthetic proteins (PSPs) differ greatly in structure and function, as they are involved in numerous subprocesses that take place inside an organelle called the chloroplast. Few studies predict PSPs from sequences because of their high variety of sequences and structures. This work aims to predict and characterize PSPs by establishing datasets of PSP and non-PSP sequences and developing prediction methods. RESULTS: A novel bioinformatics method for predicting and characterizing PSPs based on the scoring card method (SCMPSP) was used. First, a dataset consisting of 649 PSPs, identified using the Gene Ontology term GO:0015979, and 649 non-PSPs was established from the SwissProt database with sequence identity ≤ 25%. Several prediction methods are presented based on support vector machine (SVM), decision tree J48, Bayes, BLAST, and SCM. The SVM method using dipeptide features performed well and yielded a test accuracy of 72.31%. The SCMPSP method uses the estimated propensity scores of the 400 dipeptides for PSPs and has a test accuracy of 71.54%, which is comparable to that of the SVM method. The derived propensity scores of the 20 amino acids were further used to identify informative physicochemical properties for characterizing PSPs. The analytical results reveal the following four characteristics of PSPs: 1) PSPs favour amino acids with hydrophobic side chains; 2) PSPs are composed of amino acids prone to form helices in membrane environments; 3) PSPs have low interaction with water; and 4) PSPs prefer amino acids with electron-reactive side chains. CONCLUSIONS: The SCMPSP method not only estimates the propensity of a sequence to be a PSP but also discovers characteristics that further improve understanding of PSPs. The SCMPSP source code and the datasets used in this study are available at http://iclab.life.nctu.edu.tw/SCMPSP/.


Subject(s)
Chloroplast Proteins/metabolism , Computational Biology/methods , Photosynthesis , Bayes Theorem , Chloroplast Proteins/chemistry , Chloroplast Proteins/genetics , Databases, Protein , Dipeptides/chemistry , Dipeptides/metabolism , Gene Ontology , Intracellular Membranes/metabolism , Protein Structure, Secondary , Support Vector Machine , Water/metabolism
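
The dipeptide features used by the SVM in the SCMPSP study, and the 400 dipeptides scored by SCMPSP itself, correspond to a 400-dimensional dipeptide composition (DPC) vector. The sketch below computes such a vector; the ordering of dipeptides (alphabetical by one-letter code) and the choice to skip pairs containing non-standard residues are assumptions made for illustration, not details taken from the paper.

```python
from itertools import product
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs, e.g. "AA", "AC", ..., "YY".
DIPEPTIDES: List[str] = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
DIPEPTIDE_INDEX: Dict[str, int] = {d: i for i, d in enumerate(DIPEPTIDES)}


def dipeptide_composition(seq: str) -> List[float]:
    """Relative frequency of each of the 400 dipeptides in `seq`.

    Overlapping pairs are counted; pairs containing a residue outside the
    20 standard amino acids are skipped (an assumption of this sketch).
    """
    seq = seq.upper()
    vec = [0.0] * len(DIPEPTIDES)
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    valid = [p for p in pairs if p in DIPEPTIDE_INDEX]
    if not valid:
        return vec
    for p in valid:
        vec[DIPEPTIDE_INDEX[p]] += 1.0
    return [v / len(valid) for v in vec]


if __name__ == "__main__":
    v = dipeptide_composition("AAC")
    print(len(v), v[DIPEPTIDE_INDEX["AA"]], v[DIPEPTIDE_INDEX["AC"]])  # 400 0.5 0.5
```

With this encoding, an SCM-style sequence score is simply the dot product of the DPC vector with a 400-entry propensity table, which is why the SVM and SCMPSP results in the abstract are directly comparable feature-wise.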
20.
BMC Bioinformatics ; 16 Suppl 18: S14, 2015.
Article in English | MEDLINE | ID: mdl-26681483

ABSTRACT

BACKGROUND: Protein-protein interactions (PPIs) are involved in various biological processes, and understanding the underlying mechanisms of these interactions plays a crucial role in therapeutics and protein engineering. Most machine learning approaches for predicting the binding affinity of protein-protein complexes rely on structural and functional information. This work aims to predict the binding affinity of heterodimeric protein complexes from sequences only. RESULTS: This work proposes a support vector machine (SVM)-based binding affinity classifier, called SVM-BAC, to classify heterodimeric protein complexes based on the prediction of their binding affinity. SVM-BAC identified 14 of 580 sequence descriptors (physicochemical, energetic and conformational properties of the 20 amino acids) to classify 216 heterodimeric protein complexes into low and high binding affinity classes. SVM-BAC yielded a training accuracy, sensitivity, specificity, AUC and test accuracy of 85.80%, 0.89, 0.83, 0.86 and 83.33%, respectively, outperforming existing machine learning algorithms. The 14 features and support vector regression were further used to estimate the binding affinities (pKd) of 200 heterodimeric protein complexes. In a jackknife test, the predictions yielded a correlation coefficient of 0.34 and a mean absolute error of 1.4. We further analyzed three informative physicochemical properties according to their contribution to prediction performance. The results reveal that the following properties are effective in predicting the binding affinity of heterodimeric protein complexes: apparent partition energy based on buried molar fractions, relations between chemical structure and biological activity in principal component analysis IV, and normalized frequency of beta turn.
CONCLUSIONS: The proposed sequence-based prediction method SVM-BAC uses an optimal feature selection method to identify 14 informative features for classifying and predicting the binding affinity of heterodimeric protein complexes. The characterization analysis revealed that the average numbers of beta turns and hydrogen bonds at protein-protein interfaces are higher in high binding affinity complexes than in low binding affinity complexes.


Subject(s)
Proteins/chemistry , Support Vector Machine , Area Under Curve , Dimerization , Hydrogen Bonding , Principal Component Analysis , Protein Binding , Protein Interaction Maps , Protein Structure, Tertiary , Proteins/metabolism , ROC Curve
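
The SVM-BAC abstract evaluates affinity regression with a jackknife test, i.e. leave-one-out: each complex is predicted by a model trained on all the others, and the held-out predictions are summarized by a correlation coefficient and a mean absolute error. The sketch below shows that evaluation loop. As a plainly labeled substitution, it uses one-feature ordinary least squares instead of the support vector regression on 14 features used in the study, and the helper names are hypothetical.

```python
import math
from statistics import mean
from typing import List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit y = slope * x + intercept (stand-in model)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def jackknife_predictions(xs: List[float], ys: List[float]) -> List[float]:
    """Leave-one-out: predict each sample with a model fit on all the others."""
    preds = []
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(slope * xs[i] + intercept)
    return preds


def mean_absolute_error(y: Sequence[float], p: Sequence[float]) -> float:
    return sum(abs(a - b) for a, b in zip(y, p)) / len(y)


def pearson_r(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


if __name__ == "__main__":
    # Hypothetical single descriptor values and affinity labels.
    xs = [0.1, 0.4, 0.35, 0.8, 0.6, 0.9]
    ys = [5.2, 6.1, 5.8, 7.9, 6.5, 8.4]
    preds = jackknife_predictions(xs, ys)
    print(pearson_r(ys, preds), mean_absolute_error(ys, preds))
```

Because every prediction comes from a model that never saw that sample, the jackknife correlation and MAE estimate generalization rather than fit quality; with n samples it requires n model fits.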