Search | VHL Regional Portal (Portal Regional da BVS)

Application of Genetic Programming (GP) Formalism for Building Disease Predictive Models from Protein-Protein Interactions (PPI) Data.

Vyas, Renu; Bapat, Sanket; Goel, Purva; Karthikeyan, Muthukumarasamy; Tambe, Sanjeev S; Kulkarni, Bhaskar D.

IEEE/ACM Trans Comput Biol Bioinform ; 15(1): 27-37, 2018.

Article in English | MEDLINE | ID: mdl-28113781

ABSTRACT

Protein-protein interactions (PPIs) play a vital role in the biological processes involved in cell functions and disease pathways. The experimental methods known to predict PPIs require tremendous effort, and the results are often hindered by the presence of a large number of false positives. Herein, we demonstrate the use of a new Genetic Programming (GP) based Symbolic Regression (SR) approach for predicting PPIs related to a disease. In a case study, a dataset consisting of one hundred and thirty-five PPI complexes related to cancer was used to construct a generic PPI predicting model with good PPI prediction accuracy and generalization ability. A high correlation coefficient (CC) of 0.893, and low root mean square error (RMSE) and mean absolute percentage error (MAPE) values of 478.221 and 0.239, respectively, were achieved for both the training and test set outputs. To validate the discriminatory nature of the model, it was applied to a dataset of diabetes complexes, where it yielded significantly low CC values. Thus, the GP model developed here serves a dual purpose: (a) a predictor of the binding energy of cancer related PPI complexes, and (b) a classifier for discriminating PPI complexes related to cancer from those of other diseases.
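The three figures of merit quoted in this abstract (CC, RMSE, MAPE) can be computed as in the minimal sketch below. It assumes the standard textbook definitions, with CC taken as the Pearson correlation coefficient and MAPE returned as a fraction rather than a percentage; the abstract does not state which convention the authors used. The function names and the sample values are hypothetical and are not taken from the paper.

```python
import math


def pearson_cc(y_true, y_pred):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    spread_t = math.sqrt(sum((t - mean_t) ** 2 for t in y_true))
    spread_p = math.sqrt(sum((p - mean_p) ** 2 for p in y_pred))
    return cov / (spread_t * spread_p)


def rmse(y_true, y_pred):
    """Root mean square error, in the same units as the target variable."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction (assumes no zero targets)."""
    n = len(y_true)
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n


# Hypothetical observed and predicted values, for illustration only.
observed = [1200.0, 950.0, 1800.0, 2100.0]
predicted = [1100.0, 1000.0, 1750.0, 2300.0]
print(pearson_cc(observed, predicted), rmse(observed, predicted), mape(observed, predicted))
```

RMSE carries the units of the predicted quantity, so it can only be judged against the spread of the targets, whereas CC and MAPE are dimensionless.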

Genetic programming based quantitative structure-retention relationships for the prediction of Kovats retention indices.

Goel, Purva; Bapat, Sanket; Vyas, Renu; Tambe, Amruta; Tambe, Sanjeev S.

J Chromatogr A ; 1420: 98-109, 2015 Nov 13.

Article in English | MEDLINE | ID: mdl-26460075

ABSTRACT

The development of quantitative structure-retention relationships (QSRR) aims at constructing an appropriate linear/nonlinear model for the prediction of the retention behavior (such as Kovats retention index) of a solute on a chromatographic column. Commonly, multi-linear regression and artificial neural networks are used in QSRR development in gas chromatography (GC). In this study, an artificial intelligence based data-driven modeling formalism, namely genetic programming (GP), has been introduced for the development of quantitative structure based models predicting Kovats retention indices (KRI). The novelty of the GP formalism is that given an example dataset, it searches and optimizes both the form (structure) and the parameters of an appropriate linear/nonlinear data-fitting model. Thus, it is not necessary to pre-specify the form of the data-fitting model in the GP-based modeling. These models are also less complex, simple to understand, and easy to deploy. The effectiveness of GP in constructing QSRRs has been demonstrated by developing models predicting KRIs of light hydrocarbons (case study-I) and adamantane derivatives (case study-II). In each case study, two-, three- and four-descriptor models have been developed using the KRI data available in the literature. The results of these studies clearly indicate that the GP-based models possess an excellent KRI prediction accuracy and generalization capability. Specifically, the best performing four-descriptor models in both the case studies have yielded high (>0.9) values of the coefficient of determination (R²) and low values of root mean squared error (RMSE) and mean absolute percent error (MAPE) for training, test and validation set data. The characteristic feature of this study is that it introduces a practical and effective GP-based method for developing QSRRs in gas chromatography that can be gainfully utilized for developing other types of data-driven models in chromatography science.

Subjects

Adamantane/chemistry , Gas Chromatography/instrumentation , Factual Databases , Hydrocarbons/chemistry , Computer Neural Networks , Nonlinear Dynamics , Gas Chromatography/methods , Humans , Linear Models

Role of Chemical Reactivity and Transition State Modeling for Virtual Screening.

Karthikeyan, Muthukumarasamy; Vyas, Renu; Tambe, Sanjeev S; Radhamohan, Deepthi; Kulkarni, Bhaskar D.

Comb Chem High Throughput Screen ; 18(7): 638-57, 2015.

Article in English | MEDLINE | ID: mdl-26138569

ABSTRACT

Every drug discovery research program involves synthesis of a novel and potential drug molecule utilizing atom efficient, economical and environment friendly synthetic strategies. The current work focuses on the role of the reactivity based fingerprints of compounds as filters for virtual screening using the tool ChemScore. A reactant-like (RLS) and a product-like (PLS) score can be predicted for a given compound using the binary fingerprints derived from the numerous known organic reactions which capture the molecule-molecule interactions in the form of addition, substitution, rearrangement, elimination and isomerization reactions. The reaction fingerprints were applied to large databases in biology and chemistry, namely ChEMBL, KEGG, HMDB, DSSTox, and the Drug Bank database. A large network of 1113 synthetic reactions was constructed to visualize and ascertain the reactant-product mappings in the chemical reaction space. The cumulative reaction fingerprints were computed for 4000 molecules belonging to 29 therapeutic classes of compounds, and these were found capable of discriminating between the cognition disorder related and anti-allergy compounds with a reasonable accuracy of 75% and an AUC of 0.8. In this study, the transition state based fingerprints were also developed and used effectively for virtual screening in drug related databases. The methodology presented here provides an efficient handle for the rapid scoring of molecular libraries for virtual screening.

Subjects

Computer Simulation , Preclinical Drug Evaluation , Biotransformation , Pharmaceutical Databases , Molecular Structure , Paclitaxel/chemistry , Software

A Study of Applications of Machine Learning Based Classification Methods for Virtual Screening of Lead Molecules.

Vyas, Renu; Bapat, Sanket; Jain, Esha; Tambe, Sanjeev S; Karthikeyan, Muthukumarasamy; Kulkarni, Bhaskar D.

Comb Chem High Throughput Screen ; 18(7): 658-72, 2015.

Article in English | MEDLINE | ID: mdl-26138573

ABSTRACT

The ligand-based virtual screening of combinatorial libraries employs a number of statistical modeling and machine learning methods. A comprehensive analysis of the application of these methods for the diversity oriented virtual screening of biological targets/drug classes is presented here. A number of classification models have been built using three types of inputs, namely structure based descriptors, molecular fingerprints and therapeutic category, for performing virtual screening. The activity and affinity descriptors of a set of inhibitors of four target classes (DHFR, COX, LOX and NMDA) have been utilized to train a total of six classifiers, viz. Artificial Neural Network (ANN), k nearest neighbor (k-NN), Support Vector Machine (SVM), Naïve Bayes (NB), Decision Tree (DT) and Random Forest (RF). Among these classifiers, the ANN was found to be the best classifier, with an AUC of 0.9 irrespective of the target. New molecular fingerprints based on pharmacophore, toxicophore and chemophore (PTC) were used to build the ANN models for each dataset. A good accuracy of 87.27% was obtained using 296 chemophoric binary fingerprints for the COX-LOX inhibitors compared to pharmacophoric (67.82%) and toxicophoric (70.64%). The methodology was validated on the classical Ames mutagenicity dataset of 4337 molecules. To evaluate it further, selectivity and promiscuity of molecules from five drug classes, viz. anti-anginal, anti-convulsant, anti-depressant, anti-arrhythmic and anti-diabetic, were studied. The PTC fingerprints computed for each category were able to capture the drug-class specific features using the k-NN classifier. These models can be useful for selecting optimal molecules for drug design.

Subjects

Drug Delivery Systems , Drug Design , Machine Learning , Anti-Bacterial Agents/chemistry , Anti-Bacterial Agents/therapeutic use , Anticonvulsants/chemistry , Anticonvulsants/therapeutic use , Antidepressive Agents/chemistry , Antidepressive Agents/therapeutic use , Cardiac Arrhythmias/drug therapy , Hypoglycemic Agents/chemistry , Hypoglycemic Agents/therapeutic use

Artificial neural networks for prediction of mycobacterial promoter sequences.

Kalate, Rupali N; Tambe, Sanjeev S; Kulkarni, Bhaskar D.

Comput Biol Chem ; 27(6): 555-64, 2003 Dec.

Article in English | MEDLINE | ID: mdl-14667783

ABSTRACT

A multilayered feed-forward ANN architecture trained using the error-back-propagation (EBP) algorithm has been developed for predicting whether a given nucleotide sequence is a mycobacterial promoter sequence. Owing to the high prediction capability (≈97%) of the developed network model, it has been further used in conjunction with the caliper randomization (CR) approach for determining the structurally/functionally important regions in the promoter sequences. The results obtained thereby indicate that: (i) the upstream region of the -35 box, (ii) the -35 region, (iii) the spacer region, and (iv) the -10 box are important for mycobacterial promoters. The CR approach also suggests that the -38 to -29 region plays a significant role in determining whether a given sequence is a mycobacterial promoter. In essence, the present study establishes ANNs as a tool for predicting mycobacterial promoter sequences and determining structurally/functionally important sub-regions therein.

Subjects

Mycobacterium/genetics , Computer Neural Networks , Genetic Promoter Regions , Base Sequence , Bacterial Genes/genetics , Molecular Sequence Data , Nucleic Acid Regulatory Sequences , DNA Sequence Analysis

Statistical analysis of the physico-chemical data on the coastal waters of Cochin.

Iyer, C S Padmanabha; Sindhu, Manonmani; Kulkarni, Savita G; Tambe, Sanjeev S; Kulkarni, Bhaskar D.

J Environ Monit ; 5(2): 324-7, 2003 Apr.

Article in English | MEDLINE | ID: mdl-12729276

ABSTRACT

Measurements of temperature, salinity, dissolved oxygen, nitrogen as ammonia, nitrate and nitrite, and phosphate along with chlorophyll were carried out at three stations on the coastal waters of Cochin, southwest India, at two levels of the water column over a period of five years. The data set has been factorised using principal component analysis (PCA) for extracting linear relationships existing among a set of variables. A graphical display of the scores generated from the PCA was done by means of boxplots and biplots, which helped in the interpretation of the data. The major factors conditioning the system are related to the input of fresh water from the estuary of the Periyar river and the high organic load of the bottom sediment in the coastal area, which results in a reducing environment, as reflected in the parameters of dissolved oxygen, ammoniacal-nitrogen and nitrite-nitrogen. Another factor which contributes to the variation in the system is related to the unloading activity in the port area. The present approach offers a logical way to interpret the complex data of the physico-chemical measurements.

Subjects

Environmental Monitoring/statistics & numerical data , Geologic Sediments/chemistry , Water Pollutants/analysis , Water/chemistry , Ammonia/analysis , Chlorophyll/analysis , India , Nitrates/analysis , Nitrites/analysis , Oxygen/analysis , Phosphates/analysis , Temperature
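The PCA step described in the abstract above can be illustrated with a small pure-Python sketch. It assumes that the variables are standardized first (reasonable when they are measured in different units, though the abstract does not say so) and it extracts only the first principal component by power iteration. The function names and sample measurements are hypothetical and are not data from the study.

```python
import math


def standardize(rows):
    """Centre each column and scale it to unit sample standard deviation."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / (n - 1))
            for c, m in zip(cols, means)]
    return [[(r[j] - means[j]) / stds[j] for j in range(len(cols))] for r in rows]


def correlation_matrix(z):
    """Correlation matrix of already standardized data."""
    n, m = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(m)]
            for i in range(m)]


def first_component(c, iters=200):
    """Leading eigenvalue and eigenvector of c by power iteration.

    The uneven start vector is a simple guard; a start vector that is exactly
    orthogonal to the leading eigenvector would still fail.
    """
    m = len(c)
    v = [1.0 / (i + 1) for i in range(m)]
    for _ in range(iters):
        w = [sum(c[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigval = sum(v[i] * sum(c[i][j] * v[j] for j in range(m)) for i in range(m))
    return eigval, v


def scores(z, v):
    """Project each standardized observation onto the component v."""
    return [sum(a * b for a, b in zip(row, v)) for row in z]


# Hypothetical samples: temperature (deg C), salinity, dissolved oxygen (mg/L).
samples = [[28.1, 34.2, 4.1], [29.0, 33.0, 3.8], [27.5, 35.1, 4.6],
           [30.2, 30.5, 3.1], [28.8, 32.9, 3.9]]
z = standardize(samples)
eigval, loading = first_component(correlation_matrix(z))
print("explained fraction:", eigval / len(loading))
print("loadings:", loading)
print("scores:", scores(z, loading))
```

The scores are what a boxplot would summarize per station or depth, and the loadings together with the scores are what a biplot displays. A full analysis would extract all components, for example with a library eigen-solver.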

Genetic programming assisted stochastic optimization strategies for optimization of glucose to gluconic acid fermentation.

Cheema, Jitender Jit Singh; Sankpal, Narendra V; Tambe, Sanjeev S; Kulkarni, Bhaskar D.

Biotechnol Prog ; 18(6): 1356-65, 2002.

Article in English | MEDLINE | ID: mdl-12467472

ABSTRACT

This article presents two hybrid strategies for the modeling and optimization of the glucose to gluconic acid batch bioprocess. In the hybrid approaches, first a novel artificial intelligence formalism, namely, genetic programming (GP), is used to develop a process model solely from the historic process input-output data. In the next step, the input space of the GP-based model, representing process operating conditions, is optimized using two stochastic optimization (SO) formalisms, viz., genetic algorithms (GAs) and simultaneous perturbation stochastic approximation (SPSA). These SO formalisms possess certain unique advantages over the commonly used gradient-based optimization techniques. The principal advantage of the GP-GA and GP-SPSA hybrid techniques is that process modeling and optimization can be performed exclusively from the process input-output data without invoking the detailed knowledge of the process phenomenology. The GP-GA and GP-SPSA techniques have been employed for modeling and optimization of the glucose to gluconic acid bioprocess, and the optimized process operating conditions obtained thereby have been compared with those obtained using two other hybrid modeling-optimization paradigms integrating artificial neural networks (ANNs) and GA/SPSA formalisms. Finally, the overall optimized operating conditions given by the GP-GA method, when verified experimentally, resulted in a significant improvement in the gluconic acid yield. The hybrid strategies presented here are generic in nature and can be employed for modeling and optimization of a wide variety of batch and continuous bioprocesses.

Subjects

Artificial Intelligence , Aspergillus niger/metabolism , Gluconates/metabolism , Glucose/metabolism , Aspergillus niger/growth & development , Biomass , Bioreactors , Computer Simulation , Fermentation , Stochastic Processes
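The second step of the hybrid strategy, optimizing the inputs of a fitted model with SPSA, can be sketched as follows. This is a generic SPSA loop with commonly used gain-sequence exponents, not the authors' implementation; the quadratic `surrogate_loss` merely stands in for a fitted process model (a yield would be maximized by minimizing its negative), and every name and parameter value here is an assumption.

```python
import random


def spsa_minimize(f, x0, iters=2000, a=0.1, c=0.1, big_a=10.0,
                  alpha=0.602, gamma=0.101, seed=0):
    """Minimize f with simultaneous perturbation stochastic approximation.

    Each iteration needs only two evaluations of f, whatever the dimension,
    because all inputs are perturbed at once along a random +/-1 direction.
    """
    rng = random.Random(seed)
    x = list(x0)
    for k in range(iters):
        a_k = a / (k + 1 + big_a) ** alpha   # decaying step size
        c_k = c / (k + 1) ** gamma           # decaying perturbation size
        delta = [rng.choice((-1.0, 1.0)) for _ in x]
        x_plus = [xi + c_k * di for xi, di in zip(x, delta)]
        x_minus = [xi - c_k * di for xi, di in zip(x, delta)]
        diff = f(x_plus) - f(x_minus)
        # Gradient estimate for coordinate i is diff / (2 * c_k * delta_i).
        x = [xi - a_k * diff / (2.0 * c_k * di) for xi, di in zip(x, delta)]
    return x


def surrogate_loss(x):
    """Hypothetical stand-in for a fitted model; its minimum is at (1, -2)."""
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


best = spsa_minimize(surrogate_loss, [0.0, 0.0])
print("approximate optimum:", best)
```

Because only function values are required, the same loop works with any black-box model, which is the property the abstract contrasts with gradient-based techniques. In practice the operating conditions would also be clipped to their feasible ranges after each update.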

Obtaining functional form for chaotic time series evolution using genetic algorithm.

Yadavalli, Vamsi K.; Dahule, Rahul K.; Tambe, Sanjeev S.; Kulkarni, B. D.

Chaos ; 9(3): 789-794, 1999 Sep.

Article in English | MEDLINE | ID: mdl-12779874

ABSTRACT

A genetic algorithm (GA) based strategy is presented for deducing an exact or near-exact functional form from a time series. The GA formalism proposed here utilizes (i) the "postfix" representation with a view to reduce the procedural complexities and (ii) the "elitist mating" scheme to produce fitter offspring strings. The GA procedure is exemplified by considering chaotic time series of the well-known logistic, Henon and universal maps. The GA correctly recovers the underlying functional forms for the respective time series. Measurements from a number of finite-dimensional physical, biological, and other systems often give rise to complex time series and the presented methodology should prove useful in obtaining functional forms describing accurately the evolution of the time series. © 1999 American Institute of Physics.
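To show why a postfix string is a convenient encoding for candidate functional forms, the sketch below evaluates a postfix expression with a stack and scores it against a logistic-map time series by its one-step prediction error. The token set, the error measure, and the map parameter r = 4 are illustrative assumptions; the GA search itself (selection, crossover, elitist mating) is not shown.

```python
import operator

# Binary operators allowed in a candidate expression (an assumed, minimal set).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def eval_postfix(tokens, x):
    """Evaluate a postfix token list at the point x using a stack."""
    stack = []
    for tok in tokens:
        if tok in OPS:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[tok](left, right))
        elif tok == "x":
            stack.append(x)
        else:
            stack.append(float(tok))  # numeric constant
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]


def logistic_series(x0, n, r=4.0):
    """Generate n points of the logistic map x_next = r * x * (1 - x)."""
    series = [x0]
    for _ in range(n - 1):
        series.append(r * series[-1] * (1.0 - series[-1]))
    return series


def one_step_error(tokens, series):
    """Sum of squared errors when tokens predict each next point of series."""
    return sum((eval_postfix(tokens, x) - y) ** 2
               for x, y in zip(series, series[1:]))


series = logistic_series(0.3, 50)
exact = "4 x * 1 x - *".split()   # postfix form of 4 * x * (1 - x)
wrong = "x x *".split()           # postfix form of x * x
print(one_step_error(exact, series), one_step_error(wrong, series))
```

A postfix string needs no parentheses or precedence rules, so it can be cut and spliced as a flat sequence of tokens, and a fitness such as `one_step_error` lets a search rank candidate forms against the observed series.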