Results 1 - 20 of 29

1.
J Chem Phys ; 158(21), 2023 Jun 07.
Article in English | MEDLINE | ID: mdl-37260016

ABSTRACT

Knowledge of the physical properties of ionic liquids (ILs), such as surface tension and speed of sound, is important for both industrial and research applications. Unfortunately, technical challenges and costs limit exhaustive experimental screening of ILs for these critical properties. Previous work has demonstrated that quantum-mechanics-based thermochemical property prediction tools, such as the conductor-like screening model for real solvents (COSMO-RS), combined with machine learning (ML) approaches, may provide an alternative pathway to guide the rapid screening and design of ILs with desired physicochemical properties. However, the question of which ML approaches are most appropriate remains open. In the present study, we examine how different ML architectures, ranging from tree-based approaches to feed-forward artificial neural networks, perform in generating nonlinear multivariate quantitative structure-property relationship (QSPR) models for predicting the temperature- and pressure-dependent surface tension of, and speed of sound in, ILs over a wide range of surface tensions (16.9-76.2 mN/m) and speeds of sound (1009.7-1992 m/s). The ML models are further interrogated using a powerful interpretation method, SHapley Additive exPlanations (SHAP). We find that several different ML models provide high accuracy according to traditional statistical metrics. The decision-tree-based approaches appear to be the most accurate and precise, with extreme gradient-boosting trees and gradient-boosting trees being the best performers. However, our results also indicate that the promise of using machine learning to gain deep insights into the underlying physics driving structure-property relationships in ILs may still be somewhat premature.
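SHAP approximates Shapley values, which can be computed exactly when a model has only a few features. The sketch below is illustrative only: it is not the SHAP library or the authors' pipeline, it assumes that an "absent" feature is replaced by a fixed baseline value (one of several possible conventions), and `toy_model` is a hypothetical stand-in for a trained QSPR model.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are set to their baseline value (an assumed
    convention). The cost grows as 2**n, so this is only for small n."""
    n = len(x)

    def value(coalition):
        # Keep x for coalition members and use the baseline everywhere else.
        z = [x[k] if k in coalition else baseline[k] for k in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    # Hypothetical model with an interaction between features 0 and 2.
    return 2.0 * z[0] - 1.0 * z[1] + 0.5 * z[0] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The attributions sum to `toy_model(x) - toy_model(baseline)` (the efficiency property), and the interaction term is split equally between features 0 and 2.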

2.
Mol Biol Evol ; 38(2): 702-715, 2021 01 23.
Article in English | MEDLINE | ID: mdl-32941612

ABSTRACT

Despite SARS-CoV and SARS-CoV-2 being equipped with highly similar protein arsenals, the corresponding zoonoses have spread among humans at markedly different rates. The specific characteristics of these viruses that led to such distinct outcomes remain unclear. Here, we apply proteome-wide comparative structural analysis to identify the unique molecular elements in the SARS-CoV-2 proteome that may explain the differing consequences. By combining protein modeling and molecular dynamics simulations, we propose that nonconservative substitutions in functional regions of the spike glycoprotein (S), nsp1, and nsp3 contribute to differences in virulence. In particular, we explain why the substitutions in the receptor-binding domain of S affect the structure-dynamics behavior of its complexes with putative host receptors. The conservation of functional protein regions between the two taxa is also noteworthy. We suggest that the highly conserved main protease, nsp5, of SARS-CoV and SARS-CoV-2 is part of their mechanism for circumventing the host interferon antiviral response. Overall, most substitutions occur on protein surfaces and may modulate antigenic properties and interactions with other macromolecules. Our results imply that the striking difference in the pervasiveness of SARS-CoV-2 and SARS-CoV among humans derives largely from molecular features that modulate the efficiency of viral particles in entering host cells and blocking the host immune response.


Subject(s)
Molecular Dynamics Simulation; Proteomics; SARS-CoV-2/chemistry; SARS-CoV-2/pathogenicity; Severe acute respiratory syndrome-related coronavirus/chemistry; Severe acute respiratory syndrome-related coronavirus/pathogenicity; Viral Proteins/chemistry; Animals; Humans; Protein Domains; Severe acute respiratory syndrome-related coronavirus/metabolism; SARS-CoV-2/metabolism; Species Specificity; Viral Proteins/metabolism
3.
Annu Rev Phys Chem ; 72: 641-666, 2021 04 20.
Article in English | MEDLINE | ID: mdl-33636998

ABSTRACT

Quantum chemistry in the form of density functional theory (DFT) calculations is a powerful numerical experiment for predicting intermolecular interaction energies. However, beyond predictions of observables, no chemical insight is gained in this way. Energy decomposition analysis (EDA) can quantitatively bridge this gap by providing values for the chemical drivers of the interactions, such as permanent electrostatics, Pauli repulsion, dispersion, and charge transfer. These energetic contributions are identified by performing DFT calculations with constraints that disable components of the interaction. This review describes the second-generation version of the absolutely localized molecular orbital EDA (ALMO-EDA-II). The effects of different physical contributions on the changes in observables upon complex formation, such as structure and vibrational frequencies, are characterized via the adiabatic EDA. Example applications include red- versus blue-shifting hydrogen bonds; the bonding and frequency shifts of CO, N2, and BF bound to a [Ru(II)(NH3)5]2+ moiety; and the nature of the strongly bound complexes between pyridine and the benzene and naphthalene radical cations. Additionally, the use of ALMO-EDA-II to benchmark and guide the development of advanced force fields for molecular simulation is illustrated with the recent, very promising MB-UCB potential.
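As a reading aid, the ALMO-EDA-II partition of the intermolecular interaction energy is commonly written in the two-level form below. This is our summary of the published scheme, not a quotation from the review: the frozen (FRZ) term collects the contributions computed from the unrelaxed fragment densities, polarization (POL) is the relaxation of orbitals constrained to remain on their own fragment, and charge transfer (CT) is the further lowering once that constraint is lifted. The precise definition of each term should be checked against the review itself.

```latex
\begin{aligned}
\Delta E_{\mathrm{INT}} &= \Delta E_{\mathrm{FRZ}} + \Delta E_{\mathrm{POL}} + \Delta E_{\mathrm{CT}},\\
\Delta E_{\mathrm{FRZ}} &= \Delta E_{\mathrm{ELEC}} + \Delta E_{\mathrm{PAULI}} + \Delta E_{\mathrm{DISP}}.
\end{aligned}
```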

4.
J Comput Aided Mol Des ; 35(11): 1095-1123, 2021 11.
Article in English | MEDLINE | ID: mdl-34708263

ABSTRACT

The advent of computational drug discovery holds the promise of significantly reducing the effort of experimentalists, along with monetary cost. More generally, predicting the binding of small organic molecules to biological macromolecules has far-reaching implications for a range of problems, including metabolomics. However, problems such as predicting the bound structure of a protein-ligand complex along with its affinity have proven to be an enormous challenge. In recent years, machine learning-based methods have proven to be more accurate than older methods, many of which are based on simple linear regression. Nonetheless, there remains room for improvement, as these methods are often trained on a small set of features, with a single functional form for any given physical effect, and often with little mention of the rationale behind choosing one functional form over another. Moreover, it is not entirely clear why one machine learning method is favored over another. In this work, we undertake a comprehensive effort to develop high-accuracy, machine-learned scoring functions, systematically investigating the effects of the machine learning method and the choice of features and, when possible, providing insights into the relevant physics using methods that assess feature importance. Here, we show synergism among disparate features, yielding adjusted R² values with experimental binding affinities of up to 0.871 on an independent test set and enrichment for native bound structures of up to 0.913. When purely physical terms that model enthalpic and entropic effects are used in training, we use feature importance assessments to probe the relevant physics and, we hope, to guide future investigators working on this and other computational chemistry problems.


Subject(s)
Drug Discovery/methods; Machine Learning; Proteins/metabolism; Ligands; Molecular Docking Simulation; Thermodynamics
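The adjusted R² reported above penalizes R² for the number of features relative to the number of samples. A minimal sketch of the standard formula follows. The sample and feature counts in the example are hypothetical, because the abstract does not give the size of the test set or the feature count.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n_samples - n_features - 1 <= 0:
        raise ValueError("need n_samples > n_features + 1")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


# Hypothetical example: R^2 = 0.90 from 101 complexes and 10 features.
print(adjusted_r2(0.90, n_samples=101, n_features=10))
```

For a fixed R², adding features lowers the adjusted value, which is why it is a fairer yardstick when comparing scoring functions built on feature sets of different sizes.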
5.
J Chem Phys ; 147(16): 161721, 2017 Oct 28.
Article in English | MEDLINE | ID: mdl-29096520

ABSTRACT

In this work, we evaluate the accuracy of the classical AMOEBA model in representing many-body interactions, such as polarization, charge transfer, Pauli repulsion, and dispersion, through comparison against an energy decomposition method based on absolutely localized molecular orbitals (ALMO-EDA) for the water trimer and a variety of ion-water systems. When the 2- and 3-body contributions from the many-body expansion are analyzed for the ion-water trimer systems examined here, the 3-body contributions to Pauli repulsion and dispersion are found to be negligible under ALMO-EDA, thereby supporting the validity of the pairwise-additive approximation in AMOEBA's 14-7 van der Waals term. However, AMOEBA shows imperfect cancellation of errors for the missing effects of charge transfer, as well as an incorrect distance dependence for polarization, when compared with the corresponding ALMO-EDA terms. We trace the polarization errors, largest for the 2-body terms and next for the 3-body terms, to the Thole damping scheme used in AMOEBA; although the width parameter in Thole damping can be changed to improve agreement with the ALMO-EDA polarization for points near equilibrium, the correct profile of polarization as a function of intermolecular distance cannot be reproduced. The results suggest a need to re-examine the damping and polarization model used in the AMOEBA force field and provide further insights into the formulation of polarizable force fields in general.
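The 2- and 3-body contributions discussed above come from the many-body expansion, in which each n-body term is what remains of a subcluster energy after all lower-order terms are subtracted. The sketch below shows only that bookkeeping for a trimer. The energy function is a hypothetical, purely pairwise-additive toy (in practice each call would be an ALMO-EDA or AMOEBA calculation on that subset of monomers), so its 3-body term vanishes, which is the situation the abstract reports for Pauli repulsion and dispersion.

```python
from itertools import combinations


def many_body_terms(energy, fragments):
    """Split a cluster energy into 1-, 2- and 3-body terms.

    energy: callable mapping a tuple of fragment labels to a total energy."""
    e1 = {(i,): energy((i,)) for i in fragments}
    e2 = {}
    for i, j in combinations(fragments, 2):
        # 2-body term: dimer energy minus its monomer energies.
        e2[(i, j)] = energy((i, j)) - e1[(i,)] - e1[(j,)]
    e3 = {}
    for i, j, k in combinations(fragments, 3):
        # 3-body term: trimer energy minus all 2-body and 1-body terms.
        e3[(i, j, k)] = (energy((i, j, k))
                         - e2[(i, j)] - e2[(i, k)] - e2[(j, k)]
                         - e1[(i,)] - e1[(j,)] - e1[(k,)])
    return e1, e2, e3


# Hypothetical pairwise-additive toy energies (arbitrary units).
MONOMER = {"A": -1.0, "B": -2.0, "C": -3.0}
PAIR = {("A", "B"): -0.5, ("A", "C"): -0.25, ("B", "C"): -0.125}


def pairwise_energy(subset):
    e = sum(MONOMER[f] for f in subset)
    e += sum(PAIR[p] for p in combinations(sorted(subset), 2))
    return e


e1, e2, e3 = many_body_terms(pairwise_energy, ("A", "B", "C"))
print(e2, e3)
```

Any nonzero 3-body term from a real calculation therefore measures genuinely non-additive physics, such as the mutual polarization that AMOEBA models through damped induced dipoles.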

6.
Annu Rev Phys Chem ; 65: 149-74, 2014.
Article in English | MEDLINE | ID: mdl-24328448

ABSTRACT

Computational modeling at the atomistic and mesoscopic levels has undergone dramatic development in the past 10 years to meet the challenge of adequately accounting for the many-body nature of intermolecular interactions. At the heart of this challenge is the ability to identify the strengths and specific limitations of pairwise-additive interactions, to improve classical models to explicitly account for many-body effects, and consequently to enhance their ability to describe a wider range of reference data and build confidence in their predictive capacity. However, the corresponding computational cost of these advanced classical models increases significantly enough that statistical convergence of condensed phase observables becomes more difficult to achieve. Here we review a hierarchy of potential energy surface models used in molecular simulations for systems with many degrees of freedom that best meet the trade-off between accuracy and computational speed in order to define a sweet spot for a given scientific problem of interest.


Subject(s)
Computer Simulation; Models, Chemical; Proteins/chemistry; Water/chemistry; Animals; Humans; Quantum Theory; Static Electricity; Thermodynamics
7.
J Chem Phys ; 143(17): 174104, 2015 Nov 07.
Article in English | MEDLINE | ID: mdl-26547155

ABSTRACT

We have adapted a hybrid extended Lagrangian self-consistent field (EL/SCF) approach, developed for time-reversible Born-Oppenheimer molecular dynamics of quantum electronic degrees of freedom, to the problem of classical polarization. In this context, the initial guess for the mutual induction calculation is provided by auxiliary induced dipole variables evolved via a time-reversible velocity Verlet scheme. However, we find a numerical instability, manifested as an accumulation in the auxiliary velocity variables, which in turn results in an unacceptable increase in the number of SCF cycles needed to meet even loose convergence tolerances for the real induced dipoles over the course of a 1 ns trajectory of the AMOEBA14 water model. By diagnosing the numerical instability as a problem of resonances that corrupt the dynamics, we introduce a simple thermostatting scheme, illustrated using Berendsen weak coupling and Nosé-Hoover chain thermostats, applied to the auxiliary dipole velocities. We find that the inertial EL/SCF (iEL/SCF) method provides superior energy conservation with less stringent convergence thresholds and a correspondingly small number of SCF cycles, and accurately reproduces all properties of the polarization model in the NVT and NVE ensembles. Our iEL/SCF approach is a clear improvement over standard SCF approaches to classical mutual induction calculations and would be worth investigating for application to ab initio molecular dynamics as well.
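The abstract's fix is to thermostat the auxiliary dipole velocities. A minimal sketch of the Berendsen variant follows. It assumes that an instantaneous "temperature" of the auxiliary degrees of freedom is defined through equipartition (kT = m⟨v²⟩ per degree of freedom), and the fictitious mass, target kT, time step, and coupling time are hypothetical parameters rather than values from the paper. In an iEL/SCF loop, a function like this would be applied to the auxiliary velocities after each velocity Verlet update.

```python
import numpy as np


def berendsen_rescale(v_aux, mass, kT_target, dt, tau):
    """Berendsen weak-coupling rescaling of auxiliary dipole velocities.

    v_aux: (N, 3) array of auxiliary velocities; mass: fictitious mass.
    Assumes dt < tau so the scaling factor stays real."""
    n_dof = v_aux.size
    # Equipartition: m <v^2> = kT for each degree of freedom.
    kT_inst = mass * np.sum(v_aux ** 2) / n_dof
    if kT_inst == 0.0:
        return v_aux
    lam = np.sqrt(1.0 + (dt / tau) * (kT_target / kT_inst - 1.0))
    return lam * v_aux


# Hypothetical example: auxiliary velocities that are "too hot".
v = np.ones((4, 3))                       # kT_inst = 1.0 with unit mass
v_cooled = berendsen_rescale(v, mass=1.0, kT_target=0.25, dt=0.1, tau=1.0)
print(np.mean(v_cooled ** 2))             # pulled part of the way toward 0.25
```

Because the rescaling removes only a fraction dt/tau of the excess each step, it damps the slow accumulation of auxiliary kinetic energy without abruptly perturbing the dipole guess.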

8.
Drug Discov Today ; 29(3): 103891, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38246414

ABSTRACT

Each of the ∼20,000 proteins in the human proteome is a potential target for compounds that bind to it and modify its function. The 3D structures of most of these proteins are now available. Here, we discuss the prospects for using these structures to perform proteome-wide virtual HTS (VHTS). We compare physics-based (docking) and AI VHTS approaches, some of which are now being applied with large databases of compounds to thousands of targets. Although preliminary proteome-wide screens are now within our grasp, further methodological developments are expected to improve the accuracy of the results.


Subject(s)
Proteome; Humans; Proteome/metabolism
9.
ACS Omega ; 9(17): 19548-19559, 2024 Apr 30.
Article in English | MEDLINE | ID: mdl-38708262

ABSTRACT

Carbon dioxide (CO2) is a detrimental greenhouse gas and the main contributor to global warming. To address this environmental challenge, a promising approach is the use of deep eutectic solvents (DESs) as an ecofriendly and sustainable medium for effective CO2 capture. Chemically reactive DESs, which form chemical bonds with CO2, are superior to nonreactive, physically based DESs for CO2 absorption. However, no existing computational models accurately predict CO2 solubility in chemically reactive DESs. Here, we develop machine learning (ML) models to predict the solubility of CO2 in chemically reactive DESs. As training data, we collected from published work 214 data points for CO2 solubility in 149 different chemically reactive DESs at different temperatures, pressures, and DES molar ratios. The physics-driven input features for the ML models include σ-profile descriptors, which quantify the relative probability of a molecular surface segment having a certain screening charge density and were calculated with the first-principles quantum chemical method COSMO-RS. We show that, although COSMO-RS does not explicitly calculate chemical reaction profiles, COSMO-RS-derived σ-profile features can be used to predict bond formation. Of the models trained, an artificial neural network (ANN) provides the most accurate CO2 solubility prediction, with an average absolute relative deviation of 2.94% on the testing sets. Overall, this work provides ML models that can predict CO2 solubility precisely and thus accelerate the design and application of chemically reactive DESs.
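A σ-profile, as described above, is the area-weighted distribution of surface-segment screening charge densities. The sketch below builds one by simple hard binning from hypothetical segment data. The range (±0.025 e/Å²), the 51 bins, and the clipping of outliers into the edge bins are assumptions; COSMO-RS implementations differ in these choices, and the abstract does not state how the binned values were turned into ML descriptors.

```python
import numpy as np


def sigma_profile(sigmas, areas, sigma_max=0.025, n_bins=51):
    """Normalized, area-weighted histogram of screening charge densities.

    sigmas: screening charge density of each surface segment (e/A^2)
    areas:  area of each segment (A^2)
    Returns bin centers and p(sigma), which sums to 1."""
    sigmas = np.clip(np.asarray(sigmas, dtype=float), -sigma_max, sigma_max)
    areas = np.asarray(areas, dtype=float)
    edges = np.linspace(-sigma_max, sigma_max, n_bins + 1)
    # Each segment contributes its area to the bin containing its sigma.
    hist, _ = np.histogram(sigmas, bins=edges, weights=areas)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, hist / hist.sum()


# Hypothetical segment data standing in for a COSMO calculation.
rng = np.random.default_rng(1)
segment_sigmas = rng.normal(0.0, 0.01, size=200)
segment_areas = rng.uniform(0.5, 1.5, size=200)
centers, p = sigma_profile(segment_sigmas, segment_areas)
print(centers[p.argmax()], p.max())
```

The resulting vector `p` (one value per bin) is the kind of fixed-length input that can be concatenated with temperature, pressure, and molar ratio to form a feature row.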

10.
J Chem Theory Comput ; 20(9): 3911-3926, 2024 May 14.
Article in English | MEDLINE | ID: mdl-38387055

ABSTRACT

Deep eutectic solvents (DESs) are emerging as environmentally friendly designer solvents for mass transport and heat transfer processes in industrial applications; however, the lack of accurate tools to predict, and thus control, their viscosities across a range of environmental factors and formulations hinders their general application. Because DESs are designer solvents with nearly unlimited combinations, it is experimentally infeasible to comprehensively measure the viscosities of all DESs of potential industrial interest. To assist in the design of DESs, we have developed several new machine learning (ML) models that accurately and rapidly predict the viscosities of a diverse group of DESs at different temperatures and molar ratios, using one of the most comprehensive data sets to date, containing the properties of over 670 DESs over a wide range of temperatures (278.15-385.25 K). Three ML models, namely support vector regression (SVR), feed-forward neural networks (FFNNs), and categorical boosting (CatBoost), were developed to predict DES viscosity as a function of temperature and molar ratio and were contrasted with multilinear and two-factor polynomial regression baselines. Quantum chemistry-based, COSMO-RS-derived sigma profile (σ-profile) features were used as inputs for the ML models. The CatBoost model is excellent at externally predicting DES viscosity, as indicated by a high R² (0.99) and low root-mean-square error (RMSE) and average absolute relative deviation (AARD, 5.22%) on the testing data sets; 98% of the data points lie within 15% absolute relative deviation. Furthermore, SHapley Additive exPlanations (SHAP) analysis was employed to interpret the ML results and rationalize the viscosity predictions. The result is an ML approach that accurately predicts viscosity and will aid in accelerating the design of appropriate DESs for industrial applications.
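The metrics quoted above can be computed as follows. This is a minimal sketch using the common definition AARD = (100/N) Σ |y_pred − y_exp| / |y_exp|. The `frac_within_tol` entry is our reading of "98% of the data points lie within 15%", namely the fraction of points whose relative deviation is at most 15%. The viscosity values in the example are hypothetical.

```python
import numpy as np


def regression_metrics(y_true, y_pred, tol=0.15):
    """AARD (%), RMSE, R^2 and the fraction of points within a relative tolerance."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rel_dev = np.abs(y_pred - y_true) / np.abs(y_true)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "AARD_%": float(100.0 * rel_dev.mean()),
        "RMSE": float(np.sqrt(np.mean((y_pred - y_true) ** 2))),
        "R2": float(1.0 - ss_res / ss_tot),
        "frac_within_tol": float(np.mean(rel_dev <= tol)),
    }


# Hypothetical experimental vs. predicted viscosities (mPa s).
metrics = regression_metrics([1.0, 2.0, 4.0], [1.1, 2.0, 3.6])
print(metrics)
```

AARD is scale-free, which matters for viscosities that span orders of magnitude, whereas RMSE is dominated by the most viscous samples; reporting both, as the abstract does, covers each weakness.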

11.
Proteins ; 81(11): 1919-30, 2013 Nov.
Article in English | MEDLINE | ID: mdl-23760773

ABSTRACT

Protein-protein interactions are a fundamental aspect of many biological processes. The advent of recombinant protein and computational techniques has allowed for the rational design of proteins with novel binding capabilities. It is therefore desirable to predict which designed proteins are capable of binding in vitro. To this end, we have developed a learned classification model that combines energetic and non-energetic features. Our feature set is adapted from specialized potentials for aromatic interactions, hydrogen bonds, electrostatics, shape, and desolvation. A binding model built on these features was initially developed for CAPRI Round 21, achieving top results in the independent assessment. Here, we present a more thoroughly trained and validated model, and compare various support-vector machine kernels. The Gaussian kernel model classified both high-resolution complexes and designed nonbinders with 79-86% accuracy on independent test data. We also observe that multiple physical potentials for dielectric-dependent electrostatics and hydrogen bonding contribute to the enhanced predictive accuracy, suggesting that their combined information is much greater than that of any single energetics model. We also study the change in predictive performance as the model features or training data are varied, observing unusual patterns of prediction in designed interfaces as compared with other data types.


Subject(s)
Models, Theoretical; Proteins/chemistry; Algorithms; Protein Binding; Software
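The best-performing classifier above used a Gaussian (RBF) kernel, K(x, y) = exp(−γ‖x − y‖²). A minimal NumPy sketch of the kernel matrix follows. The value of γ is a hypothetical hyperparameter, and the toy feature vectors stand in for the energetic and non-energetic interface features, which are not reproduced here. Such a matrix can be passed to an SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.

```python
import numpy as np


def gaussian_kernel_matrix(X, Y, gamma=0.1):
    """K[i, j] = exp(-gamma * ||X[i] - Y[j]||^2) for row-wise feature vectors."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Squared distances via ||x||^2 + ||y||^2 - 2 x.y
    sq_dists = (np.sum(X ** 2, axis=1)[:, None]
                + np.sum(Y ** 2, axis=1)[None, :]
                - 2.0 * X @ Y.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-gamma * sq_dists)


# Hypothetical two-feature example.
features = [[0.0, 0.0], [1.0, 1.0]]
K = gaussian_kernel_matrix(features, features, gamma=0.5)
print(K)
```

Larger γ makes the kernel more local, so the decision boundary can follow the data more closely at a greater risk of overfitting; γ is usually tuned by cross-validation.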
12.
Proteins ; 81(12): 2221-8, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24038640

ABSTRACT

We describe methods and results for four new types of challenge in the Critical Assessment of PRedicted Interactions (CAPRI). Two new challenges asked predictors to create models related to protein interface design. The first of these was to distinguish binding interfaces from designed nonbinding interfaces. The second was to predict the effects of all single-point mutations on hemagglutinin binding to two small designed proteins. Two additional challenges asked predictors to submit high-resolution structures for interface-bound crystallographic waters and for heparin binding to a putative glycosylase.


Subject(s)
Hemagglutinins/chemistry; Molecular Docking Simulation; Protein Interaction Maps; Software; Algorithms; Artificial Intelligence; Crystallography, X-Ray; Heparin/chemistry; Models, Molecular; Mutagenesis; Point Mutation; Protein Binding; Protein Conformation; Water/chemistry
13.
Proteins ; 81(11): 1980-7, 2013 Nov.
Article in English | MEDLINE | ID: mdl-23843247

ABSTRACT

Community-wide blind prediction experiments such as CAPRI and CASP provide an objective measure of the current state of predictive methodology. Here we describe a community-wide assessment of methods to predict the effects of mutations on protein-protein interactions. Twenty-two groups predicted the effects of comprehensive saturation mutagenesis for two designed influenza hemagglutinin binders and the results were compared with experimental yeast display enrichment data obtained using deep sequencing. The most successful methods explicitly considered the effects of mutation on monomer stability in addition to binding affinity, carried out explicit side-chain sampling and backbone relaxation, evaluated packing, electrostatic, and solvation effects, and correctly identified around a third of the beneficial mutations. Much room for improvement remains for even the best techniques, and large-scale fitness landscapes should continue to provide an excellent test bed for continued evaluation of both existing and new prediction methodologies.


Subject(s)
Databases, Protein; Protein Interaction Mapping; Algorithms; Mutation; Protein Binding
14.
Comput Struct Biotechnol J ; 21: 1122-1139, 2023.
Article in English | MEDLINE | ID: mdl-36789259

ABSTRACT

For plants, distinguishing between mutualistic and pathogenic microbes is a matter of survival. All microbes contain microbe-associated molecular patterns (MAMPs) that are perceived by plant pattern recognition receptors (PRRs). Lysin motif receptor-like kinases (LysM-RLKs) are PRRs attuned for binding and triggering a response to specific MAMPs, including chitin oligomers (COs) in fungi, lipo-chitooligosaccharides (LCOs), which are produced by mycorrhizal fungi and nitrogen-fixing rhizobial bacteria, and peptidoglycan in bacteria. The identification and characterization of LysM-RLKs in candidate bioenergy crops including Populus are limited compared to other model plant species, thus inhibiting our ability to both understand and engineer microbe-mediated gains in plant productivity. As such, we performed a sequence analysis of LysM-RLKs in the Populus genome and predicted their function based on phylogenetic analysis with known LysM-RLKs. Then, using predictive models, molecular dynamics simulations, and comparative structural analysis with previously characterized CO and LCO plant receptors, we identified probable ligand-binding sites in Populus LysM-RLKs. Using several machine learning models, we predicted remarkably consistent binding affinity rankings of Populus proteins to CO. In addition, we used a modified Random Walk with Restart network-topology based approach to identify a subset of Populus LysM-RLKs that are functionally related and propose a corresponding signal transduction cascade. Our findings provide the first look into the role of LysM-RLKs in Populus-microbe interactions and establish a crucial jumping-off point for future research efforts to understand specificity and redundancy in microbial perception mechanisms.
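The abstract uses a *modified* Random Walk with Restart (RWR); the modification is not described, so the sketch below is the textbook version only: p ← (1 − r) W p + r p₀, where W is the column-normalized adjacency matrix and p₀ spreads the restart probability over seed nodes. The star-shaped network and the restart probability are hypothetical; in the study the nodes would be Populus genes or proteins.

```python
import numpy as np


def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Standard RWR scores; returns a probability vector over nodes.

    Assumes a connected graph (no all-zero columns) so probability is conserved."""
    A = np.asarray(adjacency, dtype=float)
    col_sums = A.sum(axis=0)
    col_sums[col_sums == 0] = 1.0          # avoid division by zero for isolated nodes
    W = A / col_sums                        # column-stochastic transition matrix
    p0 = np.zeros(A.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)      # restart distribution over seed nodes
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1.0 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


# Hypothetical star network: node 0 is the hub, nodes 1-3 are leaves.
star = [[0, 1, 1, 1],
        [1, 0, 0, 0],
        [1, 0, 0, 0],
        [1, 0, 0, 0]]
scores = random_walk_with_restart(star, seeds=[1])
print(scores)
```

Nodes with high stationary scores are those most tightly connected to the seeds, which is the sense in which RWR identifies a "functionally related" subset.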

15.
Proteins ; 80(7): 1766-79, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22434479

ABSTRACT

Normal mode analysis has emerged as a useful technique for investigating protein motions on long time scales. This is largely due to the advent of coarse-graining techniques, particularly Hooke's Law-based potentials and the rotational-translational blocking (RTB) method for reducing the size of the force-constant matrix, the Hessian. Here we present a new method for domain decomposition for use in RTB that is based on hierarchical clustering of atomic density gradients, which we call Density-Cluster RTB (DCRTB). The method reduces the number of degrees of freedom by 85-90% compared with the standard blocking approaches. We compared the normal modes from DCRTB against standard RTB using 1-4 residues in sequence in a single block, with good agreement between the two methods. We also show that Density-Cluster RTB and standard RTB perform well in capturing the experimentally determined direction of conformational change. Significantly, we report superior correlation of DCRTB with B-factors compared with 1-4 residue per block RTB. Finally, we show significant reduction in computational cost for Density-Cluster RTB that is nearly 100-fold for many examples.


Subject(s)
Models, Chemical; Proteins/chemistry; Cluster Analysis; Computational Biology; Databases, Protein; Models, Molecular; Protein Conformation; Protein Structure, Tertiary
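The RTB method underlying DCRTB projects the full 3N × 3N Hessian onto the six rigid-body motions (three translations, three rotations) of each block, diagonalizes the much smaller projected matrix, and maps the modes back to atomic space. The sketch below shows only that projection step. The DCRTB contribution, choosing blocks by clustering atomic density gradients, is not reproduced, so block labels are taken as input. The sketch also assumes unit masses (no mass weighting) and that every block has at least three non-collinear atoms.

```python
import numpy as np


def rtb_projection(coords, blocks):
    """Orthonormal RTB basis P of shape (3N, 6B) for the given block labels."""
    coords = np.asarray(coords, dtype=float)
    blocks = np.asarray(blocks)
    n_atoms = coords.shape[0]
    labels = np.unique(blocks)
    P = np.zeros((3 * n_atoms, 6 * len(labels)))
    for b, label in enumerate(labels):
        idx = np.flatnonzero(blocks == label)
        center = coords[idx].mean(axis=0)
        vecs = np.zeros((3 * n_atoms, 6))
        for k in range(3):
            axis = np.eye(3)[k]
            for i in idx:
                # Translation of the whole block along axis k.
                vecs[3 * i:3 * i + 3, k] = axis
                # Infinitesimal rotation about axis k through the block center.
                vecs[3 * i:3 * i + 3, 3 + k] = np.cross(axis, coords[i] - center)
        q, _ = np.linalg.qr(vecs)  # orthonormalize this block's 6 motions
        P[:, 6 * b:6 * b + 6] = q
    return P


def rtb_modes(hessian, coords, blocks):
    """Eigenvalues and atomic-space modes of the block-projected Hessian."""
    P = rtb_projection(coords, blocks)
    h_block = P.T @ hessian @ P               # (6B x 6B) projected Hessian
    eigvals, eigvecs = np.linalg.eigh(h_block)
    return eigvals, P @ eigvecs               # expand modes back to 3N coordinates


# Hypothetical example: 9 atoms in 3 blocks of 3.
rng = np.random.default_rng(0)
coords = rng.normal(size=(9, 3))
blocks = [0, 0, 0, 1, 1, 1, 2, 2, 2]
P = rtb_projection(coords, blocks)
eigvals, modes = rtb_modes(np.eye(27), coords, blocks)
print(P.shape, modes.shape)
```

Here the eigenproblem shrinks from 27 × 27 to 18 × 18; with many atoms per block, as in proteins, the reduction is far larger, which is the source of the computational savings.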
16.
Nat Commun ; 13(1): 5285, 2022 09 08.
Article in English | MEDLINE | ID: mdl-36075915

ABSTRACT

In addition to its essential role in viral polyprotein processing, the SARS-CoV-2 3C-like protease (3CLpro) can cleave human immune signaling proteins, like NF-κB Essential Modulator (NEMO), and deregulate the host immune response. Here, in vitro assays show that SARS-CoV-2 3CLpro cleaves NEMO with fine-tuned efficiency. Analysis of the 2.50 Å resolution crystal structure of 3CLpro C145S bound to NEMO226-234 reveals subsites that tolerate a range of viral and host substrates through main chain hydrogen bonds while also enforcing specificity using side chain hydrogen bonds and hydrophobic contacts. Machine learning- and physics-based computational methods predict that variation in key binding residues of 3CLpro-NEMO helps explain the high fitness of SARS-CoV-2 in humans. We posit that cleavage of NEMO is an important piece of information to be accounted for in the pathology of COVID-19.


Subject(s)
COVID-19; SARS-CoV-2; Antiviral Agents/chemistry; Cysteine Endopeptidases/metabolism; Humans; Peptide Hydrolases; Proteins
17.
bioRxiv ; 2021 Nov 15.
Article in English | MEDLINE | ID: mdl-34816264

ABSTRACT

In addition to its essential role in viral polyprotein processing, the SARS-CoV-2 3C-like protease (3CLpro) can cleave human immune signaling proteins, like NF-κB Essential Modulator (NEMO), and deregulate the host immune response. Here, in vitro assays show that SARS-CoV-2 3CLpro cleaves NEMO with fine-tuned efficiency. Analysis of the 2.14 Å resolution crystal structure of 3CLpro C145S bound to NEMO 226-235 reveals subsites that tolerate a range of viral and host substrates through main chain hydrogen bonds while also enforcing specificity using side chain hydrogen bonds and hydrophobic contacts. Machine learning- and physics-based computational methods predict that variation in key binding residues of 3CLpro-NEMO helps explain the high fitness of SARS-CoV-2 in humans. We posit that cleavage of NEMO is an important piece of information to be accounted for in the pathology of COVID-19.

18.
Proteins ; 78(15): 3156-65, 2010 Nov 15.
Article in English | MEDLINE | ID: mdl-20715288

ABSTRACT

We present a computationally efficient method for flexible refinement of docking predictions that reflects observed motions within a protein's structural class. Using structural homologs, we derive deformation models that capture likely motions. The models or "replicates" typically align along a rigid core, with a handful of flexible loops, linkers and tails. A few replicates can generate a much larger number of conformers, by exchanging each flexible region independently of the others. In this way, 10 replicates of a protein having 6 flexible regions can be used to generate a million conformations of a molecule. While this has obvious advantages in terms of sampling, the cost of assessing energies at every conformer is prohibitive, particularly when both molecules are flexible. Our approach addresses this combinatorial explosion, using key assumptions to compress the sampling by many orders of magnitude. ReplicOpter can perform hierarchical clustering from a list of rigid docking predictions and find nearby structures to any promising cluster representatives. These predicted complexes can then be refined and rescored. ReplicOpter's scoring function includes a Lennard-Jones potential softened using the Anderson-Chandler-Weeks decomposition, a desolvation term derived from the Atomic Contact Energy function, Coulombic electrostatics, hydrogen bonding, and terms to model pi-pi and pi-cation interactions. ReplicOpter has performed well on several recent CAPRI systems. We are presently benchmarking ReplicOpter on the complete docking benchmark set to fully establish its utility in refining rigid docking predictions and identifying near-native solutions.


Subject(s)
Computational Biology/methods; Models, Chemical; Protein Interaction Mapping/methods; Proteins/chemistry; Software; Algorithms; Cluster Analysis; Models, Molecular; Protein Binding; Protein Conformation; Proteins/metabolism
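The "million conformations" figure above is combinatorial: if each of 6 flexible regions can be taken independently from any of 10 replicates, there are 10^6 = 1,000,000 combinations. The sketch below only counts and lazily labels those combinations, assuming every region is swapped independently on a shared rigid core. It does not implement ReplicOpter's compression of the sampling.

```python
from itertools import islice, product


def count_conformers(n_replicates, n_flexible_regions):
    # Each flexible region can come from any replicate, independently of the others.
    return n_replicates ** n_flexible_regions


def enumerate_conformers(n_replicates, n_flexible_regions):
    """Lazily yield tuples: entry r is the replicate supplying flexible region r."""
    return product(range(n_replicates), repeat=n_flexible_regions)


print(count_conformers(10, 6))  # 1000000
first = list(islice(enumerate_conformers(10, 6), 3))
print(first)
```

Enumerating lazily keeps memory use trivial, but scoring every tuple would still require a million energy evaluations per molecule (and the product of both counts when both partners are flexible), which is the combinatorial explosion the method is designed to avoid.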
19.
PLoS Comput Biol ; 5(10): e1000531, 2009 Oct.
Article in English | MEDLINE | ID: mdl-19816556

ABSTRACT

In allostery, a binding event at one site in a protein modulates the behavior of a distant site. Identifying residues that relay the signal between sites remains a challenge. We have developed predictive models using support-vector machines, a widely used machine-learning method. The training data set consisted of residues classified as either hotspots or non-hotspots based on experimental characterization of point mutations from a diverse set of allosteric proteins. Each residue had an associated set of calculated features. Two sets of features were used, one consisting of dynamical, structural, network, and informatic measures, and another of structural measures defined by Daily and Gray. The resulting models performed well on an independent data set consisting of hotspots and non-hotspots from five allosteric proteins. For the independent data set, our top 10 models using Feature Set 1 recalled 68-81% of known hotspots, and among total hotspot predictions, 58-67% were actual hotspots. Hence, these models have precision P = 58-67% and recall R = 68-81%. The corresponding models for Feature Set 2 had P = 55-59% and R = 81-92%. We combined the features from each set that produced models with optimal predictive performance. The top 10 models using this hybrid feature set had R = 73-81% and P = 64-71%, the best overall performance of any of the sets of models. Our methods identified hotspots in structural regions of known allosteric significance. Moreover, our predicted hotspots form a network of contiguous residues in the interior of the structures, in agreement with previous work. In conclusion, we have developed models that discriminate between known allosteric hotspots and non-hotspots with high accuracy and sensitivity. Moreover, the pattern of predicted hotspots corresponds to known functional motifs implicated in allostery, and is consistent with previous work describing sparse networks of allosterically important residues.


Subject(s)
Allosteric Site/genetics; Models, Chemical; Proteins/chemistry; Structure-Activity Relationship; Algorithms; Artificial Intelligence; Cluster Analysis; Lac Repressors/chemistry; Lac Repressors/genetics; Lac Repressors/metabolism; Models, Molecular; Protein Binding; Proteins/genetics; Proteins/metabolism; Reproducibility of Results; Signal Transduction; Thermodynamics; Transcription Factors
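The abstract defines recall as the fraction of known hotspots that are recovered and precision as the fraction of predicted hotspots that are real. A minimal sketch computing both from boolean labels follows. The ten-residue labels are hypothetical and are not taken from the study's data.

```python
def precision_recall(y_true, y_pred):
    """P = TP / (TP + FP), R = TP / (TP + FN); True marks a hotspot."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels for ten residues (4 known hotspots, 5 predicted).
truth = [True, True, True, True, False, False, False, False, False, False]
pred = [True, True, True, False, True, True, False, False, False, False]
P, R = precision_recall(truth, pred)
print(P, R)  # 0.6 0.75
```

The trade-off reported between Feature Set 1 and Feature Set 2 (higher recall at lower precision) corresponds to a model predicting more residues as hotspots: this raises TP and FP together.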
20.
Biochim Biophys Acta Gen Subj ; 1864(4): 129535, 2020 04.
Article in English | MEDLINE | ID: mdl-31954798

ABSTRACT

Selecting peptides that bind strongly to the major histocompatibility complex (MHC) for inclusion in a vaccine has therapeutic potential for infections and tumors. Machine learning models trained on sequence data exist for peptide:MHC (p:MHC) binding predictions. Here, we train support vector machine classifier (SVMC) models on physicochemical sequence-based and structure-based descriptor sets to predict peptide binding to a well-studied model mouse MHC I allele, H-2Db. Recursive feature elimination and two-way forward feature selection were also performed. Although low on sensitivity compared to the current state-of-the-art algorithms, models based on physicochemical descriptor sets achieve specificity and precision comparable to the most popular sequence-based algorithms. The best-performing model is a hybrid descriptor set containing both sequence-based and structure-based descriptors. Interestingly, close to half of the physicochemical sequence-based descriptors remaining in the hybrid model were properties of the anchor positions, residues 5 and 9 in the peptide sequence. In contrast, residues flanking position 5 make little to no residue-specific contribution to the binding affinity prediction. The results suggest that machine-learned models incorporating both sequence-based descriptors and structural data may provide information on specific physicochemical properties determining binding affinities.


Subject(s)
Histocompatibility Antigens Class I/chemistry; Machine Learning; Peptides/chemistry; Algorithms; Alleles; Amino Acid Sequence; Animals; Histocompatibility Antigens Class I/genetics; Mice; Protein Binding; Protein Conformation