Search | Biblioteca Virtual en Salud Fronteriza

1.

StarPep Toolbox: an open-source software to assist chemical space analysis of bioactive peptides and their functions using complex networks.

Aguilera-Mendoza, Longendri; Ayala-Ruano, Sebastián; Martinez-Rios, Felix; Chavez, Edgar; García-Jacas, César R; Brizuela, Carlos A; Marrero-Ponce, Yovani.

Bioinformatics ; 39(8), 2023 08 01.

Article in English | MEDLINE | ID: mdl-37603724

ABSTRACT

MOTIVATION: Antimicrobial peptides (AMPs) are promising molecules to treat infectious diseases caused by multidrug-resistant pathogens, some types of cancer, and other conditions. Computer-aided strategies are efficient tools for the high-throughput screening of AMPs. RESULTS: This report highlights StarPep Toolbox, an open-source and user-friendly software to study the bioactive chemical space of AMPs using complex network-based representations, clustering, and similarity-searching models. The novelty of this research lies in the combination of network science and similarity-searching techniques, distinguishing it from conventional methods based on machine learning and other computational approaches. The network-based representation of the AMP chemical space presents promising opportunities for peptide drug repurposing, development, and optimization. This approach could serve as a baseline for the discovery of a new generation of therapeutic peptides. AVAILABILITY AND IMPLEMENTATION: All underlying code and installation files are accessible through GitHub (https://github.com/Grupo-Medicina-Molecular-y-Traslacional/StarPep) under the Apache 2.0 license.

Subject(s)

Peptides , Software , Cluster Analysis , Drug Repositioning , High-Throughput Screening Assays

2.

Multiquery Similarity Searching Models: An Alternative Approach for Predicting Hemolytic Activity from Peptide Sequence.

Castillo-Mendieta, Kevin; Agüero-Chapin, Guillermin; Marquez, Edgar; Perez-Castillo, Yunierkis; Barigye, Stephen J; Pérez-Cárdenas, Mariela; Peréz-Giménez, Facundo; Marrero-Ponce, Yovani.

Chem Res Toxicol ; 37(4): 580-589, 2024 Apr 15.

Article in English | MEDLINE | ID: mdl-38501392

ABSTRACT

The desirable pharmacological properties and a broad number of therapeutic activities have made peptides promising drugs over small organic molecules and antibody drugs. Nevertheless, toxic effects, such as hemolysis, have hampered the development of such promising drugs. Hence, a reliable computational tool to predict peptide hemolytic toxicity is enormously useful before synthesis and experimental evaluation. Currently, four web servers that predict hemolytic activity using machine learning (ML) algorithms are available; however, they exhibit some limitations, such as the need for a reliable negative set and a limited application domain. Hence, we developed a robust model based on a novel theoretical approach that combines network science and a multiquery similarity searching (MQSS) method. A total of 1152 initial models were constructed from 144 scaffolds generated in a previous report. These were evaluated on external data sets, and the best models were fused and improved. Our best MQSS model I1 outperformed all state-of-the-art ML-based models and was used to characterize the prevalence of hemolytic toxicity on therapeutic peptides. Based on our model's estimation, the number of hemolytic peptides might be 3.9-fold higher than that reported.

Subject(s)

Hemolysis , Peptides , Humans , Amino Acid Sequence , Peptides/pharmacology , Peptides/chemistry , Algorithms , Machine Learning

3.

Rethinking the applicability domain analysis in QSAR models.

Mora, Jose R; Marquez, Edgar A; Pérez-Pérez, Noel; Contreras-Torres, Ernesto; Perez-Castillo, Yunierkis; Agüero-Chapin, Guillermin; Martinez-Rios, Felix; Marrero-Ponce, Yovani; Barigye, Stephen J.

J Comput Aided Mol Des ; 38(1): 9, 2024 Feb 14.

Article in English | MEDLINE | ID: mdl-38351144

ABSTRACT

Notwithstanding the wide adoption of the OECD principles (or best practices) for QSAR modeling, disparities between in silico predictions and experimental results are frequent, suggesting that model predictions are often too optimistic. Of these OECD principles, the applicability domain (AD) estimation has been recognized in several reports in the literature to be one of the most challenging, implying that the actual reliability measures of model predictions are often unreliable. Applying tree-based error analysis workflows on 5 QSAR models reported in the literature and available in the QsarDB repository, i.e., androgen receptor bioactivity (agonists, antagonists, and binders, respectively) and membrane permeability (highest membrane permeability and the intrinsic permeability), we demonstrate that predictions erroneously tagged as reliable (AD prediction errors) overwhelmingly correspond to instances in subspaces (cohorts) with the highest prediction error rates, highlighting the inhomogeneity of the AD space. In this sense, we call for more stringent AD analysis guidelines which require the incorporation of model error analysis schemes, to provide critical insight on the reliability of underlying AD algorithms. Additionally, any selected AD method should be rigorously validated to demonstrate its suitability for the model space over which it is applied. These steps will ultimately contribute to more accurate estimations of the reliability of model predictions. Finally, error analysis may also be useful in "rational" model refinement in that data expansion efforts and model retraining are focused on cohorts with the highest error rates.

Subject(s)

Algorithms , Quantitative Structure-Activity Relationship , Reproducibility of Results

4.

Machine learning approach to discovery of small molecules with potential inhibitory action against vasoactive metalloproteases.

Cañizares-Carmenate, Yudith; Mena-Ulecia, Karel; MacLeod Carey, Desmond; Perera-Sardiña, Yunier; Hernández-Rodríguez, Erix W; Marrero-Ponce, Yovani; Torrens, Francisco; Castillo-Garit, Juan A.

Mol Divers ; 26(3): 1383-1397, 2022 Jun.

Article in English | MEDLINE | ID: mdl-34216326

ABSTRACT

With the advancement of combinatorial chemistry and big data, drug repositioning has boomed. In this sense, machine learning and artificial intelligence techniques offer a priori information to identify the most promising candidates. In this study, we combine QSAR and docking methodologies to identify compounds with potential inhibitory activity of vasoactive metalloproteases for the treatment of cardiovascular diseases. To develop this study, we used a database of 191 thermolysin inhibitor compounds, which is the largest as far as we know. First, we use Dragon's molecular descriptors (0-3D) to develop classification models using Bayesian networks (Naive Bayes) and artificial neural networks (Multilayer Perceptron). The obtained models are used for virtual screening of small molecules in the international DrugBank database. Second, docking experiments are carried out for all three enzymes using the Autodock Vina program, to identify possible interactions with the active site of human metalloproteases. As a result, high-performance artificial intelligence QSAR models are obtained for training and prediction sets. These allowed the identification of 18 compounds with potential inhibitory activity and an adequate oral bioavailability profile, which were evaluated using docking. Four of them showed high binding energies for the three enzymes, and we propose them as potential dual ACE/NEP inhibitors for the control of blood pressure. In summary, the in silico strategies used here constitute an important tool for the early identification of new antihypertensive drug candidates, with substantial savings in time and money.

Subject(s)

Artificial Intelligence , Machine Learning , Bayes Theorem , Drug Repositioning , Humans , Metalloproteases , Molecular Docking Simulation , Quantitative Structure-Activity Relationship

5.

Smoothed Spherical Truncation based on Fuzzy Membership Functions: Application to the Molecular Encoding.

García-Jacas, César R; Marrero-Ponce, Yovani; Brizuela, Carlos A; Suárez-Lezcano, José; Martinez-Rios, Felix.

J Comput Chem ; 41(3): 203-217, 2020 01 30.

Article in English | MEDLINE | ID: mdl-31647589

ABSTRACT

A novel spherical truncation method, based on fuzzy membership functions, is introduced to truncate interatomic (or inter-amino-acid) relations according to smoothing values computed from fuzzy membership degrees. In this method, the molecules are circumscribed by a sphere, so that the geometric centers of the molecules are the centers of the spheres. The fuzzy membership degree of each atom (or amino acid) is computed from its distance with respect to the geometric center of the molecule, by using a fuzzy membership function. The smoothing value to be applied in the truncation of a relation (or interaction) is then computed by averaging the fuzzy membership degrees of the atoms (or amino acids) involved in the relation. This truncation method differs from the existing ones in that it considers the geometric center of the whole molecule and not only of atom groups, and in that it uses fuzzy membership functions to compute the smoothing values. A variability study on a set comprising 20,469 compounds (15,050 drug-like compounds, 2994 approved drugs, 880 natural products from African sources, and 1545 plant-derived natural compounds exhibiting anticancer activity) demonstrated that the proposed truncation method yields molecular encodings with better ability for discriminating among structurally different molecules than the encodings obtained without applying truncation or applying non-fuzzy truncation functions. Moreover, a principal component analysis revealed that orthogonal chemical information of the molecules is encoded by using the proposed method. Lastly, a modeling study proved that the truncation method improves the modeling ability of existing geometric molecular descriptors, allowing the development of more robust models than the ones built using only non-truncated descriptors. In this sense, a comparison and statistical assessment were performed on eight chemical datasets.
As a result, the models based on the truncated molecular encodings yielded statistically better results than 12 procedures considered from the literature. It can thus be stated that the proposed truncation method is a relevant strategy for obtaining better molecular encodings, which will be ultimately useful in enhancing the modeling ability of existing encodings both on small-to-medium size molecules and biomacromolecules. © 2019 Wiley Periodicals, Inc.
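The truncation scheme summarized in this abstract (membership degree from the distance to the geometric center, smoothing value as the average over the atoms in a relation) can be illustrated with a minimal sketch. The abstract does not say which fuzzy membership function is used, so the Gaussian function, its width, the function names, and the toy coordinates below are all assumptions made for illustration only; this is not the QuBiLS-MIDAS implementation.

```python
import math


def geometric_center(coords):
    """Unweighted mean of the 3D coordinates (the sphere's center)."""
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def gaussian_membership(distance, radius):
    """Hypothetical membership function: 1.0 at the center, decaying outward.

    The width (radius / 2) is an arbitrary illustrative choice.
    """
    sigma = radius / 2.0 if radius > 0 else 1.0
    return math.exp(-(distance ** 2) / (2.0 * sigma ** 2))


def membership_degrees(coords):
    """Fuzzy membership degree of each atom from its distance to the center."""
    center = geometric_center(coords)
    distances = [math.dist(c, center) for c in coords]
    radius = max(distances)  # radius of the circumscribing sphere
    return [gaussian_membership(d, radius) for d in distances]


def smoothing_value(degrees, atom_indices):
    """Average membership degree of the atoms involved in one relation."""
    return sum(degrees[i] for i in atom_indices) / len(atom_indices)


# Toy coordinates (not a real molecule): one central atom, four outer atoms.
coords = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (-1.0, 0.0, 0.0),
    (0.0, 3.0, 0.0),
    (0.0, -3.0, 0.0),
]
degrees = membership_degrees(coords)
inner_pair = smoothing_value(degrees, (0, 1))  # relation near the center
outer_pair = smoothing_value(degrees, (3, 4))  # relation at the sphere surface
print(inner_pair, outer_pair)
```

Under these assumptions, a relation between atoms near the center receives a larger smoothing value than one between atoms at the surface of the sphere, so peripheral relations are attenuated smoothly instead of being cut off at a hard threshold.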

6.

Distributed and multicore QuBiLS-MIDAS software v2.0: Computing chiral, fuzzy, weighted and truncated geometrical molecular descriptors based on tensor algebra.

García-Jacas, César R; Marrero-Ponce, Yovani; Vivas-Reyes, Ricardo; Suárez-Lezcano, José; Martinez-Rios, Felix; Terán, Julio E; Aguilera-Mendoza, Longendri.

J Comput Chem ; 41(12): 1209-1227, 2020 05 05.

Article in English | MEDLINE | ID: mdl-32058625

ABSTRACT

Advances to the distributed, multi-core and fully cross-platform QuBiLS-MIDAS software v2.0 (http://tomocomd.com/qubils-midas) are reported in this article since the v1.0 release. The QuBiLS-MIDAS software is the only one that computes atom-pair and alignment-free geometrical MDs (3D-MDs) from several distance metrics other than the Euclidean distance, as well as alignment-free 3D-MDs that codify structural information regarding the relations among three and four atoms of a molecule. The most recent features added to the QuBiLS-MIDAS software v2.0 are related (a) to the calculation of atomic weightings from indices based on the vertex-degree invariant (e.g., Alikhanidi index); (b) to consider central chirality during the molecular encoding; (c) to use measures based on clustering methods and statistical functions to codify structural information among more than two atoms; (d) to the use of a novel method based on fuzzy membership functions to spherically truncate inter-atomic relations; and (e) to the use of weighted and fuzzy aggregation operators to compute global 3D-MDs according to the importance and/or interrelation of the atoms of a molecule during the molecular encoding. Moreover, a novel module to compute QuBiLS-MIDAS 3D-MDs from their headings was also developed. This module can be used either by the graphical user interface or by means of the software library. By using the library, both the predictive models built with the QuBiLS-MIDAS 3D-MDs and the QuBiLS-MIDAS 3D-MDs calculation can be embedded in other tools. A set of predefined QuBiLS-MIDAS 3D-MDs with high information content and low redundancy on a set comprised of 20,469 compounds is also provided to be employed in further cheminformatics tasks. 
This set of predefined 3D-MDs showed better performance than the entire universe of Dragon (v5.5) and PaDEL 0D-to-3D MDs in variability studies, whereas a linear independence study proved that these QuBiLS-MIDAS 3D-MDs codify chemical information orthogonal to the Dragon 0D-to-3D MDs. This set of predefined 3D-MDs will be updated periodically as new results are achieved. In general, this report highlights our continued efforts to provide a better tool for a more suitable characterization of compounds and, in this way, to contribute to better outcomes in future applications.

7.

Graph-based data integration from bioactive peptide databases of pharmaceutical interest: toward an organized collection enabling visual network analysis.

Aguilera-Mendoza, Longendri; Marrero-Ponce, Yovani; Beltran, Jesus A; Tellez Ibarra, Roberto; Guillen-Ramirez, Hugo A; Brizuela, Carlos A.

Bioinformatics ; 35(22): 4739-4747, 2019 11 01.

Article in English | MEDLINE | ID: mdl-30994884

ABSTRACT

MOTIVATION: Bioactive peptides have gained great attention in the academy and pharmaceutical industry since they play an important role in human health. However, the increasing number of bioactive peptide databases is causing the problem of data redundancy and duplicated efforts. Even worse is the fact that the available data is non-standardized and often dirty with data entry errors. Therefore, there is a need for a unified view that enables a more comprehensive analysis of the information on this topic residing at different sites. RESULTS: After collecting web pages from a large variety of bioactive peptide databases, we organized the web content into an integrated graph database (starPepDB) that holds a total of 71 310 nodes and 348 505 relationships. In this graph structure, there are 45 120 nodes representing peptides, and the rest of the nodes are connected to peptides for describing metadata. Additionally, to facilitate a better understanding of the integrated data, a software tool (starPep toolbox) has been developed for supporting visual network analysis in a user-friendly way; providing several functionalities such as peptide retrieval and filtering, network construction and visualization, interactive exploration and exporting data options. AVAILABILITY AND IMPLEMENTATION: Both starPepDB and starPep toolbox are freely available at http://mobiosd-hub.com/starpep/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Factual Databases , Software , Humans , Metadata , Peptides , Pharmaceutical Preparations

8.

Ensemble Models Based on QuBiLS-MAS Features and Shallow Learning for the Prediction of Drug-Induced Liver Toxicity: Improving Deep Learning and Traditional Approaches.

Mora, Jose R; Marrero-Ponce, Yovani; García-Jacas, César R; Suarez Causado, Amileth.

Chem Res Toxicol ; 33(7): 1855-1873, 2020 07 20.

Article in English | MEDLINE | ID: mdl-32406679

ABSTRACT

Drug-induced liver injury (DILI) is a key safety issue in the drug discovery pipeline and a regulatory concern. Thus, many in silico tools have been proposed to improve the hepatotoxicity prediction of organic-type chemicals. Here, classifiers for the prediction of DILI were developed by using QuBiLS-MAS 0-2.5D molecular descriptors and shallow machine learning techniques, on a training set composed of 1075 molecules. The best ensemble model built, E13, was obtained with good statistical parameters for the learning series, namely, the following: accuracy = 0.840, sensitivity = 0.890, specificity = 0.761, Matthews correlation coefficient = 0.660, and area under the ROC curve = 0.904. The model was also satisfactorily evaluated with a Y-scrambling test, repeated k-fold cross-validation, and repeated k-holdout validation. In addition, an exhaustive external validation was also carried out by using two test sets and five external test sets, with an average accuracy value equal to 0.854 (±0.062) and a coverage equal to 98.4% according to its applicability domain. A statistical comparison of the performance of the E13 model, with regard to results and tools (e.g., Padel DDPredictor Software, Deep Learning DILIserver, and Vslead) reported in the literature, was also performed. In general, E13 presented the best global performance in all experiments. The sum of the ranking differences procedure provided a very similar grouping pattern to that of the M-ANOVA statistical analysis, where E13 was identified as the best model for DILI predictions. Noncommercial, fully cross-platform software for DILI prediction was also developed and is freely available at http://tomocomd.com/apps/ptoxra.
This software was used for the screening of seven data sets, containing natural products, leads, toxic materials, and FDA approved drugs, to assess the usefulness of the QSAR models in the DILI labeling of organic substances; it was found that 50-92% of the evaluated molecules are positive-DILI compounds. All in all, it can be stated that the E13 model is a relevant method for the prediction of DILI risk in humans, as it shows the best results among all of the methods analyzed.

Subject(s)

Chemical and Drug Induced Liver Injury , Biological Models , Drug Discovery , Machine Learning , Quantitative Structure-Activity Relationship , Software
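For readers less familiar with the statistics quoted for model E13 (accuracy, sensitivity, specificity, and the Matthews correlation coefficient), the following minimal sketch shows how these quantities are derived from a binary confusion matrix. The counts are hypothetical and were not taken from the paper; the function name is likewise an assumption, and the code assumes both classes are present so that no denominator is zero except in the guarded MCC case.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Standard binary-classification statistics from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)  # true-positive rate (recall)
    specificity = tn / (tn + fp)  # true-negative rate
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical counts, for illustration only (not the E13 results).
metrics = classification_metrics(tp=89, tn=76, fp=24, fn=11)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The Matthews correlation coefficient is often reported alongside accuracy because it stays informative when the positive and negative classes are unbalanced.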

9.

LEGO-based generalized set of two linear algebraic 3D bio-macro-molecular descriptors: Theory and validation by QSARs.

Marrero-Ponce, Yovani; Teran, Julio E; Contreras-Torres, Ernesto; García-Jacas, César R; Perez-Castillo, Yunierkis; Cubillan, Nestor; Peréz-Giménez, Facundo; Valdés-Martini, José R.

J Theor Biol ; 485: 110039, 2020 01 21.

Article in English | MEDLINE | ID: mdl-31589877

ABSTRACT

Novel 3D protein descriptors based on bilinear, quadratic and linear algebraic maps in R^n are proposed. The latter employs the kth 2-tuple (dis)similarity matrix to codify information related to covalent and non-covalent interactions in these biopolymers. The calculation of the inter-amino acid distances is generalized by using several dissimilarity coefficients, where normalization procedures based on the simple stochastic and mutual probability schemes are applied. A new local-fragment approach based on amino acid-types and amino acid-groups is proposed to characterize regions of interest in proteins. Topological and geometric macromolecular cutoffs are defined using local and total indices to highlight non-covalent interactions existing between the side-chains of each amino acid. Moreover, local and total indices calculations are generalized considering a LEGO approach, by using several aggregation operators. Collinearity and variability analyses are performed to evaluate every generalizing component applied to the definition of these novel indices. These experiments are oriented to reduce the number of MDs obtained for building predictive models. The predictive power of the proposed indices was evaluated using two benchmark datasets, folding rate and secondary structural classification of proteins. The proposed MDs are modeled using the following strategies: Multiple Linear Regression (MLR) and Support Vector Machine (SVM), respectively. The best regression model developed for the folding rate of proteins yields a cross-validation coefficient of 0.875 (Test Set) and the best model developed for secondary structural classification classified 98% of instances correctly (Test Set). These statistical parameters are superior to the ones obtained with existing MDs reported in the literature.
Overall, the new theoretical generalization enhanced the information extraction into the MDs, allowing a better correlation between these two evaluated benchmark datasets and the proposed indices. The optimal theoretical configurations defined for the calculation of these MDs consider low collinearity and less information redundancy among them. These theoretical configurations and the software are available at http://tomocomd.com/mulims-mcompas.

Subject(s)

Proteins , Quantitative Structure-Activity Relationship , Software , Amino Acids , Linear Models

10.

MuLiMs-MCoMPAs: A Novel Multiplatform Framework to Compute Tensor Algebra-Based Three-Dimensional Protein Descriptors.

Contreras-Torres, Ernesto; Marrero-Ponce, Yovani; Terán, Julio E; García-Jacas, César R; Brizuela, Carlos A; Sánchez-Rodríguez, Juan Carlos.

J Chem Inf Model ; 60(2): 1042-1059, 2020 02 24.

Article in English | MEDLINE | ID: mdl-31663741

ABSTRACT

This report introduces the MuLiMs-MCoMPAs software (acronym for Multi-Linear Maps based on N-Metric and Contact Matrices of 3D Protein and Amino-acid weightings), designed to compute tensor-based 3D protein structural descriptors by applying two- and three-linear algebraic forms. Moreover, these descriptors contemplate generalizing components such as novel 3D protein structural representations, (dis)similarity metrics, and multimetrics to extract geometrical related information between two and three amino acids, weighting schemes based on amino acid properties, matrix normalization procedures that consider simple-stochastic and mutual probability transformations, topological and geometrical cutoffs, amino acid, and group-based MD calculations, and aggregation operators for merging amino acidic and group MDs. The MuLiMs-MCoMPAs software, which belongs to the ToMoCoMD-CAMPS suite, was developed in Java (version 1.8) using the Chemistry Development Kit (CDK) (version 1.4.19) and the Jmol libraries. This software implemented a divide-and-conquer strategy to parallelize the computation of the indices as well as modules for data preprocessing and batch computing functionalities. Furthermore, it consists of two components: (i) a desktop-graphical user interface (GUI) and (ii) an API library. The relevance of this novel approach is demonstrated through two analyses that considered Shannon's entropy-based variability and a principal component analysis. These studies showed that the MuLiMs-MCoMPAs' three-linear descriptor family contains higher informational entropy than several other descriptors generated with available computation tools. Moreover, the MuLiMs-MCoMPAs indices capture additional orthogonal information to the one codified by the available calculation approaches. As a result, two sets of suggested theoretical configurations that contain 13648 two-linear indices and 20263 three-linear indices are available for download at tomocomd.com . 
Furthermore, as a demonstration of the applicability and easy integration of the MuLiMs library into a QSAR-based expert system, a software application (ProStAF) was generated to predict SCOP protein structural classes and folding rate. It can thus be anticipated that the MuLiMs-MCoMPAs framework will turn into a valuable contribution to the chem- and bioinformatics research fields.

Subject(s)

Computer Simulation , Proteins/chemistry , Software , Drug Design , Molecular Models , Protein Conformation , Proteins/metabolism

11.

When global and local molecular descriptors are more than the sum of its parts: Simple, But Not Simpler?

Martínez-López, Yoan; Marrero-Ponce, Yovani; Barigye, Stephen J; Teran, Enrique; Martínez-Santiago, Oscar; Zambrano, Cesar H; Torres, F Javier.

Mol Divers ; 24(4): 913-932, 2020 Nov.

Article in English | MEDLINE | ID: mdl-31659696

ABSTRACT

In this report, we introduce a set of aggregation operators (AOs) to calculate global and local (group and atom type) molecular descriptors (MDs) as a generalization of the classical approach of molecular encoding using the sum of the atomic (or fragment) contributions. These AOs are implemented in a new and free software denominated MD-LOVIs ( http://tomocomd.com/md-lovis ), which allows for the calculation of MDs from atomic weights vector and LOVIs (local vertex invariants). This software was developed in Java programming language and employed the Chemical Development Kit (CDK) library for handling chemical structures and the calculation of atomic weights. An analysis of the complexities of the algorithms presented herein demonstrates that these aspects were efficiently implemented. The calculation speed experiments show that the MD-LOVIs software has satisfactory behavior when compared to software such as Padel, CDKDescriptor, DRAGON and Bluecal software. Shannon's entropy (SE)-based variability studies demonstrate that MD-LOVIs yields indices with greater information content when compared to those of popular academic and commercial software. A principal component analysis reveals that our approach captures chemical information orthogonal to that codified by the DRAGON, Padel and Mold2 software, as a result of the several generalizations in MD-LOVIs not used in other programs. Lastly, three QSARs were built using multiple linear regression with genetic algorithms, and the statistical parameters of these models demonstrate that the MD-LOVIs indices obtained with AOs yield better performance than those obtained when the summation operator is used exclusively. Moreover, it is also revealed that the MD-LOVIs indices yield models with comparable to superior performance when compared to other QSAR methodologies reported in the literature, despite their simplicity. 
The studies performed herein collectively demonstrated that MD-LOVIs software generates indices as simple as possible, but not simpler and that use of AOs enhances the diversity of the chemical information codified, which consequently improves the performance of traditional MDs.

Subject(s)

Chemical Models , Small Molecule Libraries/chemistry , Algorithms , Linear Models , Multivariate Analysis , Quantitative Structure-Activity Relationship , Software

12.

Enhancing Acute Oral Toxicity Predictions by using Consensus Modeling and Algebraic Form-Based 0D-to-2D Molecular Encodes.

García-Jacas, César R; Marrero-Ponce, Yovani; Cortés-Guzmán, Fernando; Suárez-Lezcano, José; Martinez-Rios, Felix O; García-González, Luis A; Pupo-Meriño, Mario; Martinez-Mayorga, Karina.

Chem Res Toxicol ; 32(6): 1178-1192, 2019 06 17.

Article in English | MEDLINE | ID: mdl-31066547

ABSTRACT

Quantitative structure-activity relationships (QSAR) are introduced to predict acute oral toxicity (AOT), by using the QuBiLS-MAS (acronym for quadratic, bilinear and N-Linear maps based on graph-theoretic electronic-density matrices and atomic weightings) framework for the molecular encoding. Three training sets were employed to build the models: EPA training set (5931 compounds), EPA-full training set (7413 compounds), and Zhu training set (10 152 compounds). Additionally, the EPA test set (1482 compounds) was used for the validation of the QSAR models built on the EPA training set, while the ProTox (425 compounds) and T3DB (284 compounds) external sets were employed for the assessment of all the models. The k-nearest neighbor, multilayer perceptron, random forest, and support vector machine procedures were employed to build several base (individual) models. The base models with R_EPA-training ≥ 0.75 (R = correlation coefficient) and MAE_EPA-training ≤ 0.5 (MAE = mean absolute error) were retained to build consensus models. As a result, two consensus models based on the minimum operator and denoted as M19 and M22, as well as a consensus model based on the weighted average operator and denoted as M24, were selected as the best ones for each training set considered. According to the applicability domain (AD) analysis performed, model M19 (built on the EPA training set) has MAE_test-AD = 0.4044, MAE_ProTox-AD = 0.4067 and MAE_T3DB-AD = 0.2586 on the EPA test set, ProTox external set, and T3DB external set, respectively; whereas model M22 (built on the EPA-full set) and model M24 (built on the Zhu set) present MAE_ProTox-AD = 0.3992 and MAE_T3DB-AD = 0.2286, and MAE_ProTox-AD = 0.3773 and MAE_T3DB-AD = 0.2471 on the two external sets accounted for, respectively. These outcomes were compared and statistically validated with respect to 14 QSAR methods (e.g., admetSAR, ProTox-II) from the literature. As a result, model M22 presents the best overall performance.
In addition, a retrospective study on 261 drugs withdrawn due to their toxic/side effects was performed, to assess the usefulness of prospectively using the proposed QSAR models in the labeling of chemicals. A comparison with regard to the methods from the literature was also made. As a result, model M22 has the best ability to label a compound as toxic according to the globally harmonized system of classification and labeling of chemicals. Therefore, it can be concluded that the models proposed, especially model M22, constitute prominent tools for studying AOT, providing the best results among all the methods examined. Freely available software was also developed to be used in virtual screening tasks (http://tomocomd.com/apps/ptoxra).

Subject(s)

Cluster Analysis , Support Vector Machine , Acute Toxicity Tests , Oral Administration , Animals , Humans , Quantitative Structure-Activity Relationship
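The abstract above mentions consensus models built with a minimum operator and with a weighted average operator, scored by mean absolute error (MAE). The sketch below shows what such operators look like in general terms. The prediction values, weights, and function names are hypothetical, and the paper's actual model selection and weighting procedure is not reproduced here.

```python
def consensus_minimum(base_predictions):
    """Per-compound minimum over the predictions of all base models."""
    return [min(column) for column in zip(*base_predictions)]


def consensus_weighted_average(base_predictions, weights):
    """Per-compound weighted mean over the predictions of all base models."""
    total_weight = sum(weights)
    return [
        sum(w * p for w, p in zip(weights, column)) / total_weight
        for column in zip(*base_predictions)
    ]


def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical outputs of three base models for three compounds
# (rows = models, columns = compounds); values are illustrative only.
base_predictions = [
    [2.0, 3.0, 1.5],
    [2.5, 2.0, 1.0],
    [1.5, 2.5, 2.0],
]
observed = [1.5, 2.5, 1.5]
weights = [1.0, 1.0, 2.0]  # assumed weights, e.g. favoring the third model

minimum_consensus = consensus_minimum(base_predictions)
weighted_consensus = consensus_weighted_average(base_predictions, weights)
print(mean_absolute_error(observed, minimum_consensus))
print(mean_absolute_error(observed, weighted_consensus))
```

Which operator performs better depends on the data; in this toy example the weighted average happens to give the lower MAE, which says nothing about the models reported in the paper.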

13.

Novel "extended sequons" of human N-glycosylation sites improve the precision of qualitative predictions: an alignment-free study of pattern recognition using ProtDCal protein features.

Ruiz-Blanco, Yasser B; Marrero-Ponce, Yovani; García-Hernández, Enrique; Green, James.

Amino Acids ; 49(2): 317-325, 2017 Feb.

Article in English | MEDLINE | ID: mdl-27896447

ABSTRACT

N-Glycosylation is a common post-translational modification that plays an important role in the proper folding and function of many proteins. This modification is largely dependent on the presence of a sequence motif called a "sequon" defined as Asn-Xxx-Ser/Thr. However, evidence has shown that the presence of such a "sequon" is insufficient to determine the occurrence of N-glycosylation with high precision. This study aims to elucidate patterns that can more accurately predict N-glycosylation sites in human proteins. The novel motifs are evaluated using benchmarking data from 188 organisms. Performance is largely sustained compared to the human data, which validates the robustness of the novel extracted "extended sequons". We, therefore, introduce new knowledge about sequence-related factors that control N-glycosylation.

Subject(s)

Algorithms , Proteins/metabolism , Protein Databases , Glycosylation , Humans , Post-Translational Protein Processing , Proteins/chemistry , Software
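The classical "sequon" named in this abstract, Asn-Xxx-Ser/Thr, is easy to locate with a regular expression over one-letter amino acid codes. The sketch below does only that; it does not implement the "extended sequons" or the ProtDCal features from the paper. Excluding proline at the Xxx position follows the commonly used definition of the motif and is an assumption here, since the abstract does not state it. The test sequence is made up.

```python
import re

# N, then any residue except P (assumed exclusion), then S or T.
# The lookahead makes overlapping candidates (e.g. "NNTS") all visible.
SEQUON_PATTERN = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(sequence):
    """Return (1-based position of Asn, tripeptide) for each classical sequon."""
    return [
        (match.start() + 1, match.group(1))
        for match in SEQUON_PATTERN.finditer(sequence.upper())
    ]


# Made-up sequence: "NGS" matches, "NPT" is skipped, "NNT" and "NTS" overlap.
hits = find_sequons("MNGSANPTNNTS")
print(hits)
```

As the abstract notes, the presence of this motif alone is insufficient to predict N-glycosylation with high precision, so such a scan only yields candidate sites.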

14.

Overlap and diversity in antimicrobial peptide databases: compiling a non-redundant set of sequences.

Aguilera-Mendoza, Longendri; Marrero-Ponce, Yovani; Tellez-Ibarra, Roberto; Llorente-Quesada, Monica T; Salgado, Jesús; Barigye, Stephen J; Liu, Jun.

Bioinformatics ; 31(15): 2553-9, 2015 Aug 01.

Article in English | MEDLINE | ID: mdl-25819673

ABSTRACT

MOTIVATION: The large variety of antimicrobial peptide (AMP) databases developed to date are characterized by a substantial overlap of data and similarity of sequences. Our goals are to analyze the levels of redundancy for all available AMP databases and use this information to build a new non-redundant sequence database. For this purpose, a new software tool is introduced. RESULTS: A comparative study of 25 AMP databases reveals the overlap and diversity among them and the internal diversity within each database. The overlap analysis shows that only one database (Peptaibol) contains exclusive data, not present in any other, whereas all sequences in the LAMP_Patent database are included in CAMP_Patent. However, the majority of databases have their own set of unique sequences, as well as some overlap with other databases. The complete set of non-duplicate sequences comprises 16 990 cases, which is almost half of the total number of reported peptides. On the other hand, the diversity analysis identifies the most and least diverse databases and proves that all databases exhibit some level of redundancy. Finally, we present a new parallel-free software, named Dover Analyzer, developed to compute the overlap and diversity between any number of databases and compile a set of non-redundant sequences. These results are useful for selecting or building a suitable representative set of AMPs, according to specific needs.

Subject(s)

Antimicrobial Cationic Peptides/chemistry; Databases, Nucleic Acid; Databases, Protein; Sequence Analysis, Protein/methods; Software; Algorithms; Humans

15.

A new type of quinoxalinone derivatives affects viability, invasion, and intracellular growth of Toxoplasma gondii tachyzoites in vitro.

Rivera Fernández, Norma; Mondragón Castelán, Mónica; González Pozos, Sirenia; Ramírez Flores, Carlos J; Mondragón González, Ricardo; Gómez de León, Carmen T; Castro Elizalde, Kitzia N; Marrero Ponce, Yovani; Arán, Vicente J; Martins Alho, Miriam A; Mondragón Flores, Ricardo.

Parasitol Res ; 115(5): 2081-96, 2016 May.

Article in English | MEDLINE | ID: mdl-26888289

ABSTRACT

Quinoxalinone derivatives, identified as VAM2 compounds (7-nitroquinoxalin-2-ones), were evaluated against Toxoplasma gondii tachyzoites of the RH strain. The VAM2 compounds were previously synthesized based on the design obtained from an in silico prediction with the software TOMOCOMD-CARDD. Of the ten VAM2 drugs tested, several showed a deleterious effect on tachyzoites. However, VAM2-2 showed the highest toxoplasmicidal activity, generating a remarkable decrease in tachyzoite viability (by about 91%) and a minimal alteration in the host cell. An evident inhibition of host cell invasion by tachyzoites previously treated with VAM2-2 was observed in a dose-dependent manner. In addition, remarkable alterations were observed in the parasite pellicle, such as swelling, roughness, and blebbing. Toxoplasma motility was inhibited, and subpellicular cytoskeleton integrity was altered, inducing a release of its components to the soluble fraction. VAM2-2 showed a clear and specific deleterious effect on tachyzoite viability, structural integrity, and invasive capabilities, with limited effects on host cell morphology and viability. The VAM2-2 minimum inhibitory concentration (MIC50) was determined as 3.3 µM ± 1.8. The effects of quinoxalinone derivatives on T. gondii provide the basis for a future therapeutic alternative in the treatment of toxoplasmosis.

Subject(s)

Quinoxalines/pharmacology; Toxoplasma/drug effects; Animals; Cell Line, Tumor; Cytoskeleton; Humans; Mice; Mice, Inbred BALB C; Toxoplasma/physiology; Toxoplasma/ultrastructure; Toxoplasmosis/parasitology

16.

Physico-Chemical and Structural Interpretation of Discrete Derivative Indices on N-Tuples Atoms.

Martínez-Santiago, Oscar; Marrero-Ponce, Yovani; Barigye, Stephen J; Le Thi Thu, Huong; Torres, F Javier; Zambrano, Cesar H; Muñiz Olite, Jorge L; Cruz-Monteagudo, Maykel; Vivas-Reyes, Ricardo; Vázquez Infante, Liliana; Artiles Martínez, Luis M.

Int J Mol Sci ; 17(6)2016 May 27.

Article in English | MEDLINE | ID: mdl-27240357

ABSTRACT

This report examines the interpretation of the Graph Derivative Indices (GDIs) from three different perspectives (i.e., in structural, steric and electronic terms). It is found that the individual vertex frequencies may be expressed in terms of the geometrical and electronic reactivity of the atoms and bonds, respectively. On the other hand, it is demonstrated that the GDIs are sensitive to progressive structural modifications in terms of size, ramifications, electronic richness, conjugation effects and molecular symmetry. Moreover, it is observed that the GDIs quantify the interaction capacity among molecules and codify information on the activation entropy. A structure-property relationship study reveals that there exists a direct correspondence between the individual frequencies of atoms and Hückel's Free Valence, as well as between the atomic GDIs and the chemical shift in NMR, which collectively validates the theory that these indices codify steric and electronic information of the atoms in a molecule. Taking into consideration the regularity and coherence found in experiments performed with the GDIs, it is possible to say that GDIs possess a plausible interpretation in structural and physicochemical terms.

Subject(s)

Pharmaceutical Preparations/chemistry; Algorithms; Computer Graphics; Drug Design; Entropy

17.

ProtDCal: A program to compute general-purpose-numerical descriptors for sequences and 3D-structures of proteins.

Ruiz-Blanco, Yasser B; Paz, Waldo; Green, James; Marrero-Ponce, Yovani.

BMC Bioinformatics ; 16: 162, 2015 May 16.

Article in English | MEDLINE | ID: mdl-25982853

ABSTRACT

BACKGROUND: The exponential growth of protein structural and sequence databases is enabling multifaceted approaches to understanding the long-sought sequence-structure-function relationship. Advances in computation now make it possible to apply well-established data mining and pattern recognition techniques to these data to learn models that effectively relate structure and function. However, extracting meaningful numerical descriptors of protein sequence and structure is a key issue that requires an efficient and widely available solution. RESULTS: We here introduce ProtDCal, a new computational software suite capable of generating tens of thousands of features considering both sequence-based and 3D-structural descriptors. We demonstrate, by means of principal component analysis and Shannon entropy tests, how ProtDCal's sequence-based descriptors provide new and more relevant information not encoded by currently available servers for sequence-based protein feature generation. The wide diversity of the 3D-structure-based features generated by ProtDCal is shown to provide additional complementary information and effectively completes its general protein encoding capability. As a demonstration of the utility of ProtDCal's features, prediction models of N-linked glycosylation sites are trained and evaluated. Classification performance compares favourably with that of contemporary predictors of N-linked glycosylation sites, in spite of not using domain-specific features as input information. CONCLUSIONS: ProtDCal provides a friendly and cross-platform graphical user interface, developed in the Java programming language, and is freely available at: http://bioinf.sce.carleton.ca/ProtDCal/. ProtDCal introduces local and group-based encoding, which enhances the diversity of the information captured by the computed features. Furthermore, we have shown that adding structure-based descriptors contributes non-redundant additional information to the features-based characterization of polypeptide systems. This software is intended to provide a useful tool for general-purpose encoding of protein sequences and structures for applications in protein classification, similarity analyses and function prediction.

Subject(s)

Protein Processing, Post-Translational; Protein Structure, Secondary; Proteins/chemistry; Proteins/classification; Software; Glycosylation; Humans; Principal Component Analysis

18.

Novel 3D bio-macromolecular bilinear descriptors for protein science: Predicting protein structural classes.

Marrero-Ponce, Yovani; Contreras-Torres, Ernesto; García-Jacas, César R; Barigye, Stephen J; Cubillán, Néstor; Alvarado, Ysaías J.

J Theor Biol ; 374: 125-37, 2015 Jun 07.

Article in English | MEDLINE | ID: mdl-25843214

ABSTRACT

In the present study, we introduce novel 3D protein descriptors based on the bilinear algebraic form in the ℝⁿ space on the coulombic matrix. For the calculation of these descriptors, macromolecular vectors belonging to ℝⁿ space, whose components represent certain amino acid side-chain properties, were used as weighting schemes. Generalization approaches for the calculation of inter-amino acid residue spatial distances based on Minkowski metrics are proposed. The simple- and double-stochastic schemes were defined as approaches to normalize the coulombic matrix. The local-fragment indices for both amino acid-types and amino acid-groups are presented in order to permit characterizing fragments of interest in proteins. On the other hand, with the objective of taking into account specific interactions among amino acids in global or local indices, geometric and topological cut-offs are defined. To assess the utility of global and local indices, a classification model for the prediction of the four major protein structural classes was built with the Linear Discriminant Analysis (LDA) technique. The developed LDA-model correctly classifies 92.6% and 92.7% of the proteins in the training and test sets, respectively. The obtained model showed high values of the generalized square correlation coefficient (GC²) on both the training and test series. The statistical parameters derived from the internal and external validation procedures demonstrate the robustness, stability and high predictive power of the proposed model. The performance of the LDA-model demonstrates the capability of the proposed indices not only to codify relevant biochemical information related to the structural classes of proteins, but also to yield suitable interpretability. It is anticipated that the current method will benefit the prediction of other protein attributes or functions.
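The three generic building blocks named in the abstract (a Minkowski distance, a simple-stochastic normalization, and a bilinear form) can be sketched as follows. This is only an illustration of the underlying mathematics under stated assumptions: the function names are hypothetical, row-wise normalization is assumed as the meaning of "simple-stochastic", and nothing here reproduces the authors' coulombic matrix or their descriptor definitions.

```python
def minkowski(p, q, order):
    """Minkowski distance of the given order between two coordinate tuples
    (order=1 is Manhattan, order=2 is Euclidean)."""
    return sum(abs(a - b) ** order for a, b in zip(p, q)) ** (1.0 / order)


def row_stochastic(matrix):
    """Divide every row by its sum so each non-zero row adds up to 1.
    Assumed here to correspond to a 'simple-stochastic' normalization."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([value / total if total else 0.0 for value in row])
    return normalized


def bilinear(x, matrix, y):
    """Bilinear form x^T M y for property vectors x, y and a pairwise matrix M."""
    return sum(
        x[i] * matrix[i][j] * y[j]
        for i in range(len(x))
        for j in range(len(y))
    )


# Hypothetical residue coordinates and side-chain property vectors.
coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 2.0)]
prop_a = [1.0, 0.5, 2.0]
prop_b = [0.2, 1.0, 0.3]

# Toy pairwise matrix: inverse Euclidean distance off the diagonal (an assumption).
pairwise = [
    [0.0 if i == j else 1.0 / minkowski(coords[i], coords[j], 2) for j in range(3)]
    for i in range(3)
]
print(bilinear(prop_a, row_stochastic(pairwise), prop_b))
```

Changing the `order` argument is the sense in which a Minkowski metric generalizes the choice of inter-residue distance.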

Subject(s)

Computational Biology/methods; Macromolecular Substances/chemistry; Protein Conformation; Proteins/chemistry; Algorithms; Amino Acids/chemistry; Computer Simulation; Linear Models; Models, Biological; Models, Molecular; Quantitative Structure-Activity Relationship; Reproducibility of Results; Stochastic Processes

19.

A Hooke's law-based approach to protein folding rate.

Ruiz-Blanco, Yasser B; Marrero-Ponce, Yovani; Prieto, Pablo J; Salgado, Jesús; García, Yamila; Sotomayor-Torres, Clivia M.

J Theor Biol ; 364: 407-17, 2015 Jan 07.

Article in English | MEDLINE | ID: mdl-25245368

ABSTRACT

Kinetics is a key aspect of the renowned protein folding problem. Here, we propose a comprehensive approach to folding kinetics where a polypeptide chain is assumed to behave as an elastic material described by Hooke's law. A novel parameter called the elastic-folding constant results from our model and is suggested to distinguish between proteins with two-state and multi-state folding pathways. A contact-free descriptor, named folding degree, is introduced as a suitable structural feature to study protein-folding kinetics. This approach generalizes the observed correlations between varieties of structural descriptors and the folding rate constant. Additionally, several comparisons among structural classes and folding mechanisms were carried out, showing the good performance of our model with proteins of different types. The present model constitutes a simple rationale for the structural and energetic factors involved in protein folding kinetics.

Subject(s)

DNA/chemistry; Protein Folding; Proteins/chemistry; Computer Simulation; Kinetics; Models, Chemical; Protein Structure, Secondary; Thermodynamics

20.

IMMAN: free software for information theory-based chemometric analysis.

Urias, Ricardo W Pino; Barigye, Stephen J; Marrero-Ponce, Yovani; García-Jacas, César R; Valdes-Martiní, José R; Perez-Gimenez, Facundo.

Mol Divers ; 19(2): 305-19, 2015 May.

Article in English | MEDLINE | ID: mdl-25620721

ABSTRACT

The features and theoretical background of a new and free computational program for chemometric analysis named IMMAN (acronym for Information theory-based CheMoMetrics ANalysis) are presented. This is multi-platform software developed in the Java programming language, designed with a remarkably user-friendly graphical interface for the computation of a collection of information-theoretic functions adapted for rank-based unsupervised and supervised feature selection tasks. A total of 20 feature selection parameters are presented, with the unsupervised and supervised frameworks represented by 10 approaches in each case. Several information-theoretic parameters traditionally used as molecular descriptors (MDs) are adapted for use as unsupervised rank-based feature selection methods. On the other hand, a generalization scheme for the previously defined differential Shannon's entropy is discussed, as well as the introduction of Jeffreys information measure for supervised feature selection. Moreover, well-known information-theoretic feature selection parameters, such as information gain, gain ratio, and symmetrical uncertainty, are incorporated into the IMMAN software (http://mobiosd-hub.com/imman-soft/), following an equal-interval discretization approach. IMMAN offers data pre-processing functionalities, such as missing values processing, dataset partitioning, and browsing. Moreover, single parameter or ensemble (multi-criteria) ranking options are provided. Consequently, this software is suitable for tasks like dimensionality reduction, feature ranking, as well as comparative diversity analysis of data matrices. Simple examples of applications performed with this program are presented. A comparative study between IMMAN and WEKA feature selection tools using the Arcene dataset was performed, demonstrating similar behavior. In addition, it is revealed that the use of IMMAN unsupervised feature selection methods improves the performance of both IMMAN and WEKA supervised algorithms. Graphic representation for Shannon's distribution of MD calculating software.
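The three supervised scores named in the abstract can be sketched from their textbook definitions after equal-interval discretization of a numeric feature. This is a minimal illustration, not IMMAN's code. The function names, the default bin count, and the toy data are assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def equal_interval_bins(values, n_bins):
    """Map numeric values to bin indices 0..n_bins-1 using equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would fall just past the last edge, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def feature_scores(feature, classes, n_bins=2):
    """Return (information gain, gain ratio, symmetrical uncertainty)."""
    bins = equal_interval_bins(feature, n_bins)
    n = len(classes)
    h_class, h_feature = entropy(classes), entropy(bins)
    # Conditional entropy H(class | feature bin), weighted by bin frequency.
    h_conditional = sum(
        (count / n) * entropy([y for b, y in zip(bins, classes) if b == bin_id])
        for bin_id, count in Counter(bins).items()
    )
    gain = h_class - h_conditional
    gain_ratio = gain / h_feature if h_feature > 0 else 0.0
    total = h_feature + h_class
    sym_uncertainty = 2 * gain / total if total > 0 else 0.0
    return gain, gain_ratio, sym_uncertainty


# Hypothetical toy data: the first feature separates the classes, the second does not.
classes = ["a", "a", "b", "b"]
print(feature_scores([0.1, 0.2, 0.8, 0.9], classes))
print(feature_scores([0.1, 0.9, 0.1, 0.9], classes))
```

Ranking features by any one of these scores, or by a combination of them, is the rank-based supervised selection the abstract describes.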

Subject(s)

Models, Theoretical; Software; Algorithms
