
Overproduce and select, or determine optimal molecular descriptor subset via configuration space optimization? Application to the prediction of ecotoxicological endpoints.

García-González, Luis A; Marrero-Ponce, Yovani; Brizuela, Carlos A; García-Jacas, César R.

Mol Inform ; 42(6): e2200227, 2023 Jun.

Article En | MEDLINE | ID: mdl-36894503

Predicting the likely biological activity (or property) of compounds is a fundamental and challenging task in the drug discovery process. Current computational methodologies aim to improve their predictive accuracies by using deep learning (DL) approaches. However, non-DL based approaches have proven to be the most suitable for small- and medium-sized chemical datasets. In this approach, an initial universe of molecular descriptors (MDs) is first calculated, then different feature selection algorithms are applied, and finally, one or several predictive models are built. Herein we demonstrate that this traditional approach may miss relevant information by assuming that the initial universe of MDs codifies all relevant aspects for the respective learning task. We argue that this limitation is mainly because of the constrained intervals of the parameters used in the algorithms that compute MDs, parameters that define the Descriptor Configuration Space (DCS). We propose to relax these constraints in an open DCS approach, so that a larger universe of MDs can be initially considered. We model the generation of MDs as a multicriteria optimization problem and tackle it with a variant of the standard genetic algorithm. As a novel component, the fitness function is computed by aggregating four criteria via the Choquet integral. Experimental results show that the proposed approach generates a meaningful DCS by improving on state-of-the-art approaches in most of the benchmarking chemical datasets considered.
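The aggregation step mentioned in the abstract can be illustrated with a minimal sketch of the discrete Choquet integral. The abstract does not name the four criteria or the capacity (fuzzy measure) used, so the criterion names `c1`..`c4`, the scores, and the cardinality-based capacity below are all assumptions chosen only for illustration; this is not the authors' fitness function.

```python
from itertools import combinations

def choquet_integral(scores, capacity):
    """Discrete Choquet integral of `scores` (dict: criterion -> value >= 0)
    with respect to `capacity` (dict: frozenset of criteria -> weight in [0, 1])."""
    order = sorted(scores, key=scores.get)  # criteria by ascending score
    total, previous = 0.0, 0.0
    for i, name in enumerate(order):
        # Coalition of all criteria whose score is at least scores[name].
        coalition = frozenset(order[i:])
        total += (scores[name] - previous) * capacity[coalition]
        previous = scores[name]
    return total

def cardinality_capacity(names, power):
    """Hypothetical capacity that depends only on coalition size:
    mu(A) = (|A| / n) ** power. power=1 is additive (plain average)."""
    names = list(names)
    n = len(names)
    return {frozenset(c): (k / n) ** power
            for k in range(1, n + 1)
            for c in combinations(names, k)}

# Hypothetical scores of one candidate descriptor subset on four criteria.
scores = {"c1": 0.8, "c2": 0.4, "c3": 0.6, "c4": 0.2}

mean_like = choquet_integral(scores, cardinality_capacity(scores, power=1))
cautious = choquet_integral(scores, cardinality_capacity(scores, power=2))
print(mean_like, cautious)
```

With the additive capacity the integral reduces to the arithmetic mean of the four scores; with `power=2` small coalitions receive little weight, so the aggregate is pulled toward the weakest criterion. A genetic algorithm could use such a value as the fitness of a candidate, which is the role the abstract assigns to the Choquet integral.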

Algorithms , Quantitative Structure-Activity Relationship , Drug Discovery , Benchmarking

Embedded-AMP: A Multi-Thread Computational Method for the Systematic Identification of Antimicrobial Peptides Embedded in Proteome Sequences.

Carballo, Germán Meléndrez; Vázquez, Karen Guerrero; García-González, Luis A; Rio, Gabriel Del; Brizuela, Carlos A.

Antibiotics (Basel) ; 12(1)2023 Jan 10.

Article En | MEDLINE | ID: mdl-36671338

Antimicrobial peptides (AMPs) have gained the attention of the research community for being an alternative to conventional antimicrobials to fight antibiotic resistance and for displaying other pharmacologically relevant activities, such as cell penetration, autophagy induction, and immunomodulation, among others. The identification of AMPs has been accomplished by combining computational and experimental approaches and has been mostly restricted to self-contained peptides, despite accumulated evidence indicating that AMPs may be found embedded within proteins, the functions of which are not necessarily associated with antimicrobials. To address this limitation, we propose a machine-learning (ML)-based pipeline to identify AMPs that are embedded in proteomes. Our method performs an in-silico digestion of every protein in the proteome to generate unique k-mers of different lengths, computes a set of molecular descriptors for each k-mer, and performs an antimicrobial activity prediction. To show the efficiency of the method, we used the shrimp proteome, and the pipeline analyzed all k-mers between 10 and 60 amino acids in length to predict all AMPs in less than 20 min. As an application example, we predicted AMPs in different rodents (common cuy, common rat, and naked mole rat) with different reported longevities and found a relation between species longevity and the number of predicted AMPs. The analysis shows that the higher the longevity of the species, the higher the number of predicted AMPs. The pipeline is available as a web service.
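The in-silico digestion step described above can be sketched as a sliding window over every protein, collecting each distinct k-mer once. This is a single-threaded illustration under stated assumptions: the function name `unique_kmers` and the toy sequences are hypothetical, and the descriptor calculation and activity prediction steps of the pipeline are not reproduced here.

```python
def unique_kmers(proteins, min_len=10, max_len=60):
    """Return the set of distinct substrings (k-mers) with
    min_len <= k <= max_len found in any of the protein sequences."""
    seen = set()
    for sequence in proteins:
        for k in range(min_len, max_len + 1):
            # range() is empty when the sequence is shorter than k.
            for start in range(len(sequence) - k + 1):
                seen.add(sequence[start:start + k])
    return seen

# Toy "proteome": two identical 12-residue sequences (hypothetical data).
toy_proteome = ["ACDEFGHIKLMN", "ACDEFGHIKLMN"]
kmers = unique_kmers(toy_proteome)
print(len(kmers))
```

Using a set means that a peptide shared by several proteins is scored only once, which matches the abstract's emphasis on unique k-mers. For a full proteome the number of windows grows quickly, which is presumably why the published method is multi-threaded.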

Handcrafted versus non-handcrafted (self-supervised) features for the classification of antimicrobial peptides: complementary or redundant?

García-Jacas, César R; García-González, Luis A; Martinez-Rios, Felix; Tapia-Contreras, Issac P; Brizuela, Carlos A.

Brief Bioinform ; 23(6)2022 11 19.

Article En | MEDLINE | ID: mdl-36215083

Antimicrobial peptides (AMPs) have received a great deal of attention given their potential to become a plausible option to fight multi-drug resistant bacteria as well as other pathogens. Quantitative sequence-activity models (QSAMs) have been helpful to discover new AMPs because they allow a large universe of peptide sequences to be explored and help reduce the number of wet lab experiments. A main aspect in the building of QSAMs based on shallow learning is to determine an optimal set of protein descriptors (features) required to discriminate between sequences with different antimicrobial activities. These features are generally handcrafted from peptide sequence datasets that are labeled with specific antimicrobial activities. However, recent developments have shown that unsupervised approaches can be used to determine features that outperform human-engineered (handcrafted) features. Thus, knowing which of these two approaches contributes to a better classification of AMPs is a fundamental question for designing more accurate models. Here, we present a systematic and rigorous study to compare both types of features. Experimental outcomes show that non-handcrafted features lead to better performance than handcrafted features. However, the experiments also show that an improvement in performance is achieved when both types of features are merged. A relevance analysis reveals that non-handcrafted features have higher information content than handcrafted features, while an interaction-based importance analysis reveals that handcrafted features are more important. These findings suggest that there is complementarity between both types of features. Comparisons regarding state-of-the-art deep models show that shallow models yield better performances both when fed with non-handcrafted features alone and when fed with non-handcrafted and handcrafted features together.

Anti-Infective Agents , Antimicrobial Peptides , Humans , Antimicrobial Cationic Peptides/pharmacology , Anti-Infective Agents/pharmacology , Anti-Infective Agents/chemistry , Amino Acid Sequence

Do deep learning models make a difference in the identification of antimicrobial peptides?

García-Jacas, César R; Pinacho-Castellanos, Sergio A; García-González, Luis A; Brizuela, Carlos A.

Brief Bioinform ; 23(3)2022 05 13.

Article En | MEDLINE | ID: mdl-35380616

In the last few decades, antimicrobial peptides (AMPs) have been explored as an alternative to classical antibiotics, which in turn motivated the development of machine learning models to predict antimicrobial activities in peptides. The first generation of these predictors was filled with what is now known as shallow learning-based models. These models require the computation and selection of molecular descriptors to characterize each peptide sequence and train the models. The second generation, known as deep learning-based models, which no longer requires the explicit computation and selection of those descriptors, started to be used in the prediction task of AMPs just four years ago. The superior performance claimed by deep models regarding shallow models has created a prevalent inertia to using deep learning to identify AMPs. However, methodological flaws and/or modeling biases in the building of deep models do not support such superiority. Here, we analyze the main pitfalls that led to biased conclusions on the leading performance of deep models. Also, we analyze whether deep models truly contribute to achieving better predictions than shallow models by performing fair studies on different state-of-the-art benchmarking datasets. The experiments reveal that deep models do not outperform shallow models in the classification of AMPs, and that both types of models codify similar chemical information since their predictions are highly similar. Thus, according to the currently available datasets, we conclude that the use of deep learning may not be the most suitable approach to develop models to identify AMPs, mainly because shallow models achieve comparable-to-superior performances and are simpler (Ockham's razor principle). Even so, we suggest the use of deep learning only when its capabilities lead to performance gains significant enough to be worth the additional computational cost.

Deep Learning , Amino Acid Sequence , Antimicrobial Peptides , Machine Learning , Peptides/chemistry

Cross-Cultural Adaptation and Validation of the Translated Patient-Rated Wrist Evaluation Score.

Gómez-Eslava, Bárbara; Rodriguez-Ricardo, Maria Cristina; Serpa, Juan Camilo; Fajury, Raschid; García-González, Luis A.

J Wrist Surg ; 10(4): 303-307, 2021 Aug.

Article En | MEDLINE | ID: mdl-34381633

Introduction: The purpose of this study is to perform a cross-cultural adaptation and validation of the translated Patient-Rated Wrist Evaluation (PRWE) score exclusively for pathologies of the wrist. Materials and Methods: A methodological study of cross-cultural validation of clinical scores was performed through a test-retest reliability analysis, internal consistency, response to change, and criterion validity assessment. Results: The test was applied to 57 patients with 139 surveys. Stability evaluated through Lin's concordance correlation coefficient was 0.98, with 95% confidence interval (CI) = 0.97-0.99; Cronbach's alpha was > 0.91; the difference in score was 24.26 (standard deviation: 26.59); the standardized response mean was 0.912; the effect size was 0.924; the Spearman's coefficient between the differences of PRWE and DASH (Disabilities of the Arm, Shoulder, and Hand) scores was r = 0.899, with 95% CI = 0.811-0.947; Spearman's nonparametric correlation test between PRWE and DASH was 0.82, with 95% CI = 0.711-0.890. Conclusions: We successfully validated the Spanish translation of the PRWE scale. It showed valid and reliable interpretation of functional status and response to treatment after distal radius fracture in the Colombian population. Level of Evidence: This is a level II, methodological study for scale validation.

Molecular Characterization of Coxsackievirus A24v from Feces and Conjunctiva Reveals Epidemiological Links.

Fonseca, Magilé C; Pupo-Meriño, Mario; García-González, Luis A; Muné, Mayra; Resik, Sonia; Norder, Heléne; Sarmiento, Luis.

Microorganisms ; 9(3)2021 Mar 05.

Article En | MEDLINE | ID: mdl-33807540

Coxsackievirus A24 variant (CVA24v), the main causative agent of acute hemorrhagic conjunctivitis (AHC), can be isolated from both the eyes and lower alimentary tract. However, the molecular features of CVA24v in feces are not well documented. In this study, we compared the VP1 and 3C sequences of CVA24v strains isolated from feces during AHC epidemics in Cuba in 1997, 2003, and 2008-2009 with those obtained from conjunctival swabs during the same epidemic period. The sequence analyses of the 3C and VP1 region of stool isolates from the three epidemics showed a high degree of nucleotide identity (ranging from 97.3% to 100%) to the corresponding conjunctival isolates. The phylogenetic analysis showed that fecal CVA24v isolates from the 1997 and 2003 Cuban outbreaks formed a clade with CVA24v strains isolated from conjunctival swabs in Cuba and other countries during the same period. There were three amino acid changes (3C region) and one amino acid change (VP1 region) in seven CVA24v strains isolated sequentially over 20 days from fecal samples of one patient, suggesting viral replication in the intestine. Despite these substitutions, the viruses from the conjunctival swab and fecal samples were genetically very similar. Therefore, fecal samples should be considered as a reliable alternative sample type for the routine molecular diagnosis and molecular epidemiology of CVA24v, including during outbreaks of AHC.

Molecular evolution of coxsackievirus A24v in Cuba over 23-years, 1986-2009.

Fonseca, Magilé C; Pupo-Meriño, Mario; García-González, Luis A; Resik, Sonia; Hung, Lai Heng; Muné, Mayra; Rodríguez, Hermis; Morier, Luis; Norder, Heléne; Sarmiento, Luis.

Sci Rep ; 10(1): 13761, 2020 08 13.

Article En | MEDLINE | ID: mdl-32792520

Coxsackievirus A24 variant (CVA24v) is a major causative agent of acute hemorrhagic conjunctivitis outbreaks worldwide, yet the evolutionary and transmission dynamics of the virus remain unclear. To address this, we analyzed and compared the 3C and partial VP1 gene regions of CVA24v isolates obtained from five outbreaks in Cuba between 1986 and 2009 and strains isolated worldwide. Here we show that Cuban strains were homologous to those isolated in Africa, the Americas and Asia during the same time period. Two genotypes of CVA24v (GIII and GIV) were repeatedly introduced into Cuba and they arose about two years before the epidemic was detected. The two genotypes co-evolved with a population size that is stable over time. However, nucleotide substitution rates peaked during pandemics with 4.39 × 10⁻³ and 5.80 × 10⁻³ substitutions per site per year for the 3C and VP1 region, respectively. The phylogeographic analysis identified 25 and 19 viral transmission routes based on 3C and VP1 regions, respectively. Pandemic viruses usually originated in Asia, and both China and Brazil were the major hubs for the global dispersal of the virus. Together, these data provide novel insight into the epidemiological dynamics of this virus and possibly other pandemic viruses.

Capsid Proteins/genetics , Conjunctivitis, Acute Hemorrhagic/epidemiology , Coxsackievirus Infections/epidemiology , Cysteine Endopeptidases/genetics , Enterovirus C, Human/genetics , Viral Proteins/genetics , 3C Viral Proteases , Base Sequence , Conjunctivitis, Acute Hemorrhagic/pathology , Conjunctivitis, Acute Hemorrhagic/transmission , Coxsackievirus Infections/pathology , Coxsackievirus Infections/transmission , Cuba/epidemiology , Disease Outbreaks , Evolution, Molecular , Humans , Phylogeny , Sequence Alignment

Enhancing Acute Oral Toxicity Predictions by using Consensus Modeling and Algebraic Form-Based 0D-to-2D Molecular Encodes.

García-Jacas, César R; Marrero-Ponce, Yovani; Cortés-Guzmán, Fernando; Suárez-Lezcano, José; Martinez-Rios, Felix O; García-González, Luis A; Pupo-Meriño, Mario; Martinez-Mayorga, Karina.

Chem Res Toxicol ; 32(6): 1178-1192, 2019 06 17.

Article En | MEDLINE | ID: mdl-31066547

Quantitative structure-activity relationships (QSAR) are introduced to predict acute oral toxicity (AOT), by using the QuBiLS-MAS (acronym for quadratic, bilinear and N-Linear maps based on graph-theoretic electronic-density matrices and atomic weightings) framework for the molecular encoding. Three training sets were employed to build the models: EPA training set (5931 compounds), EPA-full training set (7413 compounds), and Zhu training set (10 152 compounds). Additionally, the EPA test set (1482 compounds) was used for the validation of the QSAR models built on the EPA training set, while the ProTox (425 compounds) and T3DB (284 compounds) external sets were employed for the assessment of all the models. The k-nearest neighbor, multilayer perceptron, random forest, and support vector machine procedures were employed to build several base (individual) models. The base models with R_EPA-training ≥ 0.75 (R = correlation coefficient) and MAE_EPA-training ≤ 0.5 (MAE = mean absolute error) were retained to build consensus models. As a result, two consensus models based on the minimum operator and denoted as M19 and M22, as well as a consensus model based on the weighted average operator and denoted as M24, were selected as the best ones for each training set considered. According to the applicability domain (AD) analysis performed, model M19 (built on the EPA training set) has MAE_test-AD = 0.4044, MAE_ProTox-AD = 0.4067 and MAE_T3DB-AD = 0.2586 on the EPA test set, ProTox external set, and T3DB external set, respectively; whereas model M22 (built on the EPA-full set) and model M24 (built on the Zhu set) present MAE_ProTox-AD = 0.3992 and MAE_T3DB-AD = 0.2286, and MAE_ProTox-AD = 0.3773 and MAE_T3DB-AD = 0.2471 on the two external sets considered, respectively. These outcomes were compared and statistically validated with respect to 14 QSAR methods (e.g., admetSAR, ProTox-II) from the literature. As a result, model M22 presents the best overall performance. In addition, a retrospective study on 261 drugs withdrawn due to their toxic/side effects was performed, to assess the usefulness of prospectively using the QSAR models proposed in the labeling of chemicals. A comparison with regard to the methods from the literature was also made. As a result, model M22 has the best ability to label a compound as toxic according to the globally harmonized system of classification and labeling of chemicals. Therefore, it can be concluded that the models proposed, especially model M22, constitute prominent tools for studying AOT, as they provide the best results among all the methods examined. Freely available software was also developed to be used in virtual screening tasks ( http://tomocomd.com/apps/ptoxra ).
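The consensus strategy described above (retain base models that pass the R and MAE thresholds, then combine their predictions with a minimum or weighted average operator) can be sketched as follows. The thresholds come from the abstract; the model names, training statistics, predictions, and weights are hypothetical toy values, and the abstract does not say how the weights of the weighted average were derived.

```python
def retain(stats, r_min=0.75, mae_max=0.5):
    """Keep base models with R >= r_min and MAE <= mae_max.
    `stats` maps model name -> (R, MAE) measured on the training set."""
    return [name for name, (r, mae) in stats.items()
            if r >= r_min and mae <= mae_max]

def consensus_min(predictions, names):
    """Per-compound minimum over the predictions of the named models."""
    return [min(column) for column in zip(*(predictions[n] for n in names))]

def consensus_weighted(predictions, weights):
    """Per-compound weighted average; `weights` maps model name -> weight."""
    total = sum(weights.values())
    columns = zip(*(predictions[n] for n in weights))
    return [sum(w * v for w, v in zip(weights.values(), column)) / total
            for column in columns]

# Hypothetical training statistics (R, MAE) for four base models.
stats = {"knn": (0.78, 0.48), "mlp": (0.70, 0.55),
         "rf": (0.82, 0.44), "svm": (0.80, 0.46)}
# Hypothetical predicted toxicity values for two compounds.
predictions = {"knn": [2.0, 3.0], "mlp": [1.0, 4.0],
               "rf": [2.5, 2.8], "svm": [1.8, 3.4]}

kept = retain(stats)
by_min = consensus_min(predictions, kept)
by_weight = consensus_weighted(predictions, {"knn": 1.0, "rf": 2.0, "svm": 1.0})
print(kept, by_min, by_weight)
```

Here the `mlp` model fails both thresholds and is excluded before aggregation, so its outlying predictions do not affect either consensus.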

Cluster Analysis , Support Vector Machine , Toxicity Tests, Acute , Administration, Oral , Animals , Humans , Quantitative Structure-Activity Relationship

GOWAWA Aggregation Operator-based Global Molecular Characterizations: Weighting Atom/bond Contributions (LOVIs/LOEIs) According to their Influence in the Molecular Encoding.

García-Jacas, César R; Cabrera-Leyva, Lisset; Marrero-Ponce, Yovani; Suárez-Lezcano, José; Cortés-Guzmán, Fernando; García-González, Luis A.

Mol Inform ; 37(12): e1800039, 2018 12.

Article En | MEDLINE | ID: mdl-30070434

A different perspective to compute global weighted definitions of molecular descriptors from the contributions of each atom (LOVIs) or covalent bond (LOEIs) within a molecule is presented, using the generalized ordered weighted averaging - weighted averaging (GOWAWA) aggregation operator. This operator is rather different from the other norm-, mean- and statistic-based operators used to date for calculating descriptors from LOVIs/LOEIs. GOWAWA unifies the generalized ordered weighted averaging (GOWA) and the weighted generalized mean (WGM) functions and, in addition, it uses a smoothing parameter to assign different importance values to both functions depending on the problem under study. With the GOWAWA operator, a diversity of novel global aggregations of molecular descriptors can be determined, where the influence that each atom (or covalent bond) has on the molecular characterization is taken into account. Therefore, this approach is completely different from the ones reported in the literature, where the values of LOVIs/LOEIs are considered equally important. To demonstrate the feasibility of using this operator, the QuBiLS-MIDAS descriptors (http://tomocomd.com/qubils-midas) were used and, as a result, a module was built into the corresponding software to compute them, being thus the only software reported in the literature that can be employed to determine weighted descriptors. Moreover, several modeling studies were performed on eight chemical datasets, which demonstrated that, with the GOWAWA aggregation operator, weighted QuBiLS-MIDAS descriptors that contribute to developing models with greater predictive power can be computed, compared to the models based on the non-weighted descriptors calculated from the other operators used to date. A non-parametric statistical assessment confirmed that the GOWAWA-based predictions are significantly superior to the others obtained. Therefore, all in all, it can be concluded that, from the results achieved, the GOWAWA operator constitutes a prominent alternative to codify relevant chemical information of the molecules, ultimately useful in improving the modeling ability of several old and recent descriptors whose definition is based on the LOVIs/LOEIs calculation.
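The abstract does not give the formula of the operator. The sketch below assumes the common construction for this family of operators, a convex combination of the two functions controlled by the smoothing parameter: beta * GOWA + (1 - beta) * WGM. The parameter names `beta`, `lam`, and `delta`, the weight vectors, and the atom contributions are illustrative assumptions; inputs are taken to be positive and the exponents nonzero.

```python
def gowa(values, order_weights, lam):
    """Generalized OWA: weights apply to positions after sorting the
    values in descending order, not to specific atoms."""
    ordered = sorted(values, reverse=True)
    return sum(w * b ** lam for w, b in zip(order_weights, ordered)) ** (1.0 / lam)

def wgm(values, atom_weights, delta):
    """Weighted generalized mean: weights are tied to each atom (or bond)."""
    return sum(p * x ** delta for p, x in zip(atom_weights, values)) ** (1.0 / delta)

def gowawa(values, order_weights, atom_weights, beta, lam=1.0, delta=1.0):
    """Assumed form: beta * GOWA + (1 - beta) * WGM, with 0 <= beta <= 1."""
    return (beta * gowa(values, order_weights, lam)
            + (1.0 - beta) * wgm(values, atom_weights, delta))

# Hypothetical atom-level contributions (LOVIs) for a three-atom molecule.
lovis = [1.0, 4.0, 2.0]
uniform = [1.0 / 3.0] * 3

plain_mean = gowawa(lovis, uniform, uniform, beta=0.5)
maximum = gowawa(lovis, [1.0, 0.0, 0.0], uniform, beta=1.0)
print(plain_mean, maximum)
```

With uniform weights and unit exponents both functions collapse to the arithmetic mean, which corresponds to treating all contributions as equally important; moving the order weights or the atom weights away from uniform is what lets some atoms or bonds influence the global descriptor more than others.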

Quantitative Structure-Activity Relationship , Software , Databases, Chemical