Results 1 - 16 of 16
1.
J Comput Aided Mol Des ; 35(11): 1095-1123, 2021 Nov.
Article in English | MEDLINE | ID: mdl-34708263

ABSTRACT

The advent of computational drug discovery holds the promise of significantly reducing the effort of experimentalists, along with monetary cost. More generally, predicting the binding of small organic molecules to biological macromolecules has far-reaching implications for a range of problems, including metabolomics. However, problems such as predicting the bound structure of a protein-ligand complex along with its affinity have proven to be an enormous challenge. In recent years, machine learning-based methods have proven to be more accurate than older methods, many based on simple linear regression. Nonetheless, there remains room for improvement, as these methods are often trained on a small set of features, with a single functional form for any given physical effect, and often with little mention of the rationale behind choosing one functional form over another. Moreover, it is not entirely clear why one machine learning method is favored over another. In this work, we endeavor to undertake a comprehensive effort towards developing high-accuracy, machine-learned scoring functions, systematically investigating the effects of machine learning method and choice of features, and, when possible, providing insights into the relevant physics using methods that assess feature importance. Here, we show synergism among disparate features, yielding adjusted R2 with experimental binding affinities of up to 0.871 on an independent test set and enrichment for native bound structures of up to 0.913. When purely physical terms that model enthalpic and entropic effects are used in the training, we use feature importance assessments to probe the relevant physics and hopefully guide future investigators working on this and other computational chemistry problems.


Subjects
Drug Discovery/methods , Machine Learning , Proteins/metabolism , Ligands , Molecular Docking Simulation , Thermodynamics
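The adjusted R2 (R-squared) quoted in the abstract above penalizes a fit for the number of features it uses. The abstract does not state which formula was used, so the sketch below assumes the standard textbook definition; the function names and the toy data are illustrative only.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(y_true, y_pred, n_features):
    """Standard adjusted R^2 (assumed definition, not taken from the paper)."""
    n = len(y_true)
    if n - n_features - 1 <= 0:
        raise ValueError("need more samples than features + 1")
    r2 = r_squared(y_true, y_pred)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)


# Toy affinities (hypothetical values), one feature in the model.
measured = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.0, 2.0, 3.0, 4.0, 6.0]
score = adjusted_r_squared(measured, predicted, n_features=1)
```

With many features and few samples the adjusted value falls well below the plain R2, which is why it is the more conservative number to report for feature-rich scoring functions.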
2.
ACS Omega ; 9(17): 19548-19559, 2024 Apr 30.
Article in English | MEDLINE | ID: mdl-38708262

ABSTRACT

Carbon dioxide (CO2) is a detrimental greenhouse gas and is the main contributor to global warming. In addressing this environmental challenge, a promising approach emerges through the utilization of deep eutectic solvents (DESs) as an ecofriendly and sustainable medium for effective CO2 capture. Chemically reactive DESs, which form chemical bonds with the CO2, are superior to nonreactive, physically based DESs for CO2 absorption. However, no computational models currently provide accurate predictions of the CO2 solubility in chemically reactive DESs. Here, we develop machine learning (ML) models to predict the solubility of CO2 in chemically reactive DESs. As training data, we collected 214 data points for the CO2 solubility in 149 different chemically reactive DESs at different temperatures, pressures, and DES molar ratios from published work. The physics-driven input features for the ML models include σ-profile descriptors that quantify the relative probability of a molecular surface segment having a certain screening charge density and were calculated with the first-principles quantum chemical method COSMO-RS. We show here that, although COSMO-RS does not explicitly calculate chemical reaction profiles, the COSMO-RS-derived σ-profile features can be used to predict bond formation. Of the models trained, an artificial neural network (ANN) provides the most accurate CO2 solubility prediction with an average absolute relative deviation of 2.94% on the testing sets. Overall, this work provides ML models that can predict CO2 solubility precisely and thus accelerate the design and application of chemically reactive DESs.

3.
J Chem Theory Comput ; 20(9): 3911-3926, 2024 May 14.
Article in English | MEDLINE | ID: mdl-38387055

ABSTRACT

Deep eutectic solvents (DESs) are emerging as environmentally friendly designer solvents for mass transport and heat transfer processes in industrial applications; however, the lack of accurate tools to predict and thus control their viscosities under both a range of environmental factors and formulations hinders their general application. While DESs may serve as designer solvents, with nearly unlimited combinations, this unfortunately makes it experimentally infeasible to comprehensively measure the viscosities of all DESs of potential industrial interest. To assist in the design of DESs, we have developed several new machine learning (ML) models that accurately and rapidly predict the viscosities of a diverse group of DESs at different temperatures and molar ratios using, to date, one of the most comprehensive data sets containing the properties of over 670 DESs over a wide range of temperatures (278.15-385.25 K). Three ML models, including support vector regression (SVR), feed forward neural networks (FFNNs), and categorical boosting (CatBoost), were developed to predict DES viscosity as a function of temperature and molar ratio and contrasted with multilinear and two-factor polynomial regression baselines. Quantum chemistry-based, COSMO-RS-derived sigma profile (σ-profile) features were used as inputs for the ML models. The CatBoost model is excellent at externally predicting DES viscosity, as indicated by high R2 (0.99) and low root-mean-square-error (RMSE) and average absolute relative deviations (AARD) (5.22%) values for the testing data sets, and 98% of the data points lie within the 15% of AARD deviations. Furthermore, SHapley additive explanation (SHAP) analysis was employed to interpret the ML results and rationalize the viscosity predictions. The result is an ML approach that accurately predicts viscosity and will aid in accelerating the design of appropriate DESs for industrial applications.
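The average absolute relative deviation (AARD) reported above can be computed from paired experimental and predicted values. The sketch below assumes the usual definition, AARD% = (100/N) Σ |pred − exp| / |exp|, and reads the "within 15%" statement as a per-point relative deviation threshold; both are assumptions, and the function names and viscosity values are hypothetical.

```python
def aard_percent(experimental, predicted):
    """Average absolute relative deviation, in percent (assumed definition)."""
    if not experimental or len(experimental) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - e) / abs(e) for e, p in zip(experimental, predicted))
    return 100.0 * total / len(experimental)


def fraction_within(experimental, predicted, tolerance_percent):
    """Fraction of points whose relative deviation is at most the tolerance."""
    hits = sum(
        1
        for e, p in zip(experimental, predicted)
        if 100.0 * abs(p - e) / abs(e) <= tolerance_percent
    )
    return hits / len(experimental)


# Hypothetical viscosities (arbitrary units), not data from the paper.
measured = [100.0, 200.0, 50.0]
modelled = [110.0, 190.0, 50.0]
overall = aard_percent(measured, modelled)
share = fraction_within(measured, modelled, tolerance_percent=7.5)
```

Because each deviation is divided by the experimental value, AARD weights low-viscosity and high-viscosity points equally, which RMSE does not.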

4.
Proteins ; 81(11): 1919-30, 2013 Nov.
Article in English | MEDLINE | ID: mdl-23760773

ABSTRACT

Protein-protein interactions are a fundamental aspect of many biological processes. The advent of recombinant protein and computational techniques has allowed for the rational design of proteins with novel binding capabilities. It is therefore desirable to predict which designed proteins are capable of binding in vitro. To this end, we have developed a learned classification model that combines energetic and non-energetic features. Our feature set is adapted from specialized potentials for aromatic interactions, hydrogen bonds, electrostatics, shape, and desolvation. A binding model built on these features was initially developed for CAPRI Round 21, achieving top results in the independent assessment. Here, we present a more thoroughly trained and validated model, and compare various support-vector machine kernels. The Gaussian kernel model classified both high-resolution complexes and designed nonbinders with 79-86% accuracy on independent test data. We also observe that multiple physical potentials for dielectric-dependent electrostatics and hydrogen bonding contribute to the enhanced predictive accuracy, suggesting that their combined information is much greater than that of any single energetics model. We also study the change in predictive performance as the model features or training data are varied, observing unusual patterns of prediction in designed interfaces as compared with other data types.


Subjects
Models, Theoretical , Proteins/chemistry , Algorithms , Protein Binding , Software
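The abstract above compares support-vector machine kernels and reports results for the Gaussian kernel. As a minimal sketch of what differs between kernels, the functions below use the common textbook forms; the `gamma` parameter, the function names, and the feature vectors are illustrative assumptions and are not taken from the paper.

```python
import math


def linear_kernel(x, y):
    """Plain dot product between two feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def gaussian_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Two hypothetical interface feature vectors.
u = [1.0, 2.0]
v = [3.0, 4.0]
similarity_linear = linear_kernel(u, v)
similarity_gaussian = gaussian_kernel(u, v, gamma=0.5)
```

The Gaussian kernel depends only on the distance between feature vectors and always lies between 0 and 1, so it can separate classes that no single hyperplane in the original feature space would.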
5.
Proteins ; 81(12): 2221-8, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24038640

ABSTRACT

We describe methods and results for four new types of challenge in the Critical Assessment of PRedicted Interactions (CAPRI). Two new challenges asked predictors to create models related to protein interface design. The first of these was to distinguish binding interfaces from designed nonbinding interfaces. The second was to predict the effects of all single-point mutations on hemagglutinin binding to two small designed proteins. Two additional challenges asked predictors to submit high-resolution structures for interface-bound crystallographic waters and for binding heparin to a putative glycosylase.


Subjects
Hemagglutinins/chemistry , Molecular Docking Simulation , Protein Interaction Maps , Software , Algorithms , Artificial Intelligence , Crystallography, X-Ray , Heparin/chemistry , Models, Molecular , Mutagenesis , Point Mutation , Protein Binding , Protein Conformation , Water/chemistry
6.
Proteins ; 81(11): 1980-7, 2013 Nov.
Article in English | MEDLINE | ID: mdl-23843247

ABSTRACT

Community-wide blind prediction experiments such as CAPRI and CASP provide an objective measure of the current state of predictive methodology. Here we describe a community-wide assessment of methods to predict the effects of mutations on protein-protein interactions. Twenty-two groups predicted the effects of comprehensive saturation mutagenesis for two designed influenza hemagglutinin binders and the results were compared with experimental yeast display enrichment data obtained using deep sequencing. The most successful methods explicitly considered the effects of mutation on monomer stability in addition to binding affinity, carried out explicit side-chain sampling and backbone relaxation, evaluated packing, electrostatic, and solvation effects, and correctly identified around a third of the beneficial mutations. Much room for improvement remains for even the best techniques, and large-scale fitness landscapes should continue to provide an excellent test bed for continued evaluation of both existing and new prediction methodologies.


Subjects
Databases, Protein , Protein Interaction Mapping , Algorithms , Mutation , Protein Binding
7.
Comput Struct Biotechnol J ; 21: 1122-1139, 2023.
Article in English | MEDLINE | ID: mdl-36789259

ABSTRACT

For plants, distinguishing between mutualistic and pathogenic microbes is a matter of survival. All microbes contain microbe-associated molecular patterns (MAMPs) that are perceived by plant pattern recognition receptors (PRRs). Lysin motif receptor-like kinases (LysM-RLKs) are PRRs attuned for binding and triggering a response to specific MAMPs, including chitin oligomers (COs) in fungi, lipo-chitooligosaccharides (LCOs), which are produced by mycorrhizal fungi and nitrogen-fixing rhizobial bacteria, and peptidoglycan in bacteria. The identification and characterization of LysM-RLKs in candidate bioenergy crops including Populus are limited compared to other model plant species, thus inhibiting our ability to both understand and engineer microbe-mediated gains in plant productivity. As such, we performed a sequence analysis of LysM-RLKs in the Populus genome and predicted their function based on phylogenetic analysis with known LysM-RLKs. Then, using predictive models, molecular dynamics simulations, and comparative structural analysis with previously characterized CO and LCO plant receptors, we identified probable ligand-binding sites in Populus LysM-RLKs. Using several machine learning models, we predicted remarkably consistent binding affinity rankings of Populus proteins to CO. In addition, we used a modified Random Walk with Restart network-topology based approach to identify a subset of Populus LysM-RLKs that are functionally related and propose a corresponding signal transduction cascade. Our findings provide the first look into the role of LysM-RLKs in Populus-microbe interactions and establish a crucial jumping-off point for future research efforts to understand specificity and redundancy in microbial perception mechanisms.
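The abstract above mentions a modified Random Walk with Restart (RWR) approach without describing the modification. The sketch below shows only the standard, unmodified RWR iteration, p ← (1 − r)·W·p + r·p0 with a column-normalized adjacency matrix W; the restart probability, the function name, and the toy network are assumptions made for illustration.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Standard RWR on an undirected graph given as a dense adjacency matrix."""
    n = len(adjacency)
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    if any(s == 0 for s in col_sums):
        raise ValueError("every node needs at least one edge")
    # Column-normalize so each column of W sums to 1.
    w = [[adjacency[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    p0 = [0.0] * n
    for s in seeds:
        p0[s] = 1.0 / len(seeds)
    p = p0[:]
    for _ in range(max_iter):
        nxt = [
            (1.0 - restart) * sum(w[i][j] * p[j] for j in range(n)) + restart * p0[i]
            for i in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Toy path network 0 - 1 - 2 - 3 (hypothetical), seeded at node 0.
path = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(path, seeds=[0])
```

Nodes with higher scores are closer to the seed set in network terms, which is how RWR is typically used to rank genes or proteins as functionally related to known ones.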

8.
Proteins ; 80(7): 1766-79, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22434479

ABSTRACT

Normal mode analysis has emerged as a useful technique for investigating protein motions on long time scales. This is largely due to the advent of coarse-graining techniques, particularly Hooke's Law-based potentials and the rotational-translational blocking (RTB) method for reducing the size of the force-constant matrix, the Hessian. Here we present a new method for domain decomposition for use in RTB that is based on hierarchical clustering of atomic density gradients, which we call Density-Cluster RTB (DCRTB). The method reduces the number of degrees of freedom by 85-90% compared with the standard blocking approaches. We compared the normal modes from DCRTB against standard RTB using 1-4 residues in sequence in a single block, with good agreement between the two methods. We also show that Density-Cluster RTB and standard RTB perform well in capturing the experimentally determined direction of conformational change. Significantly, we report superior correlation of DCRTB with B-factors compared with 1-4 residue per block RTB. Finally, we show significant reduction in computational cost for Density-Cluster RTB that is nearly 100-fold for many examples.


Subjects
Models, Chemical , Proteins/chemistry , Cluster Analysis , Computational Biology , Databases, Protein , Models, Molecular , Protein Conformation , Protein Structure, Tertiary
9.
Proteins ; 78(15): 3156-65, 2010 Nov 15.
Article in English | MEDLINE | ID: mdl-20715288

ABSTRACT

We present a computationally efficient method for flexible refinement of docking predictions that reflects observed motions within a protein's structural class. Using structural homologs, we derive deformation models that capture likely motions. The models or "replicates" typically align along a rigid core, with a handful of flexible loops, linkers and tails. A few replicates can generate a much larger number of conformers, by exchanging each flexible region independently of the others. In this way, 10 replicates of a protein having 6 flexible regions can be used to generate a million conformations of a molecule. While this has obvious advantages in terms of sampling, the cost of assessing energies at every conformer is prohibitive, particularly when both molecules are flexible. Our approach addresses this combinatorial explosion, using key assumptions to compress the sampling by many orders of magnitude. ReplicOpter can perform hierarchical clustering from a list of rigid docking predictions and find nearby structures to any promising cluster representatives. These predicted complexes can then be refined and rescored. ReplicOpter's scoring function includes a Lennard-Jones potential softened using the Anderson-Chandler-Weeks decomposition, a desolvation term derived from the Atomic Contact Energy function, Coulombic electrostatics, hydrogen bonding, and terms to model pi-pi and pi-cation interactions. ReplicOpter has performed well on several recent CAPRI systems. We are presently benchmarking ReplicOpter on the complete docking benchmark set to fully establish its utility in refining rigid docking predictions and identifying near-native solutions.


Subjects
Computational Biology/methods , Models, Chemical , Protein Interaction Mapping/methods , Proteins/chemistry , Software , Algorithms , Cluster Analysis , Models, Molecular , Protein Binding , Protein Conformation , Proteins/metabolism
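The combinatorial claim in the abstract above (10 replicates with 6 flexible regions giving a million conformations) follows from choosing a replicate independently for each region, which gives 10^6 combinations. The sketch below only illustrates that counting; it is not ReplicOpter's sampling scheme, and the function names are hypothetical.

```python
import itertools


def count_conformers(n_replicates, n_regions):
    """Number of conformers when each region picks a replicate independently."""
    return n_replicates ** n_regions


def iter_conformers(n_replicates, n_regions):
    """Lazily yield one tuple of replicate indices per conformer."""
    return itertools.product(range(n_replicates), repeat=n_regions)


total = count_conformers(10, 6)

# Enumerating a small case: 3 replicates, 2 flexible regions.
small_case = list(iter_conformers(3, 2))
```

The count grows exponentially with the number of flexible regions, which is why scoring every conformer quickly becomes prohibitive, particularly when both binding partners are flexible.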
10.
PLoS Comput Biol ; 5(10): e1000531, 2009 Oct.
Article in English | MEDLINE | ID: mdl-19816556

ABSTRACT

In allostery, a binding event at one site in a protein modulates the behavior of a distant site. Identifying residues that relay the signal between sites remains a challenge. We have developed predictive models using support-vector machines, a widely used machine-learning method. The training data set consisted of residues classified as either hotspots or non-hotspots based on experimental characterization of point mutations from a diverse set of allosteric proteins. Each residue had an associated set of calculated features. Two sets of features were used, one consisting of dynamical, structural, network, and informatic measures, and another of structural measures defined by Daily and Gray. The resulting models performed well on an independent data set consisting of hotspots and non-hotspots from five allosteric proteins. For the independent data set, our top 10 models using Feature Set 1 recalled 68-81% of known hotspots, and among total hotspot predictions, 58-67% were actual hotspots. Hence, these models have precision P = 58-67% and recall R = 68-81%. The corresponding models for Feature Set 2 had P = 55-59% and R = 81-92%. We combined the features from each set that produced models with optimal predictive performance. The top 10 models using this hybrid feature set had R = 73-81% and P = 64-71%, the best overall performance of any of the sets of models. Our methods identified hotspots in structural regions of known allosteric significance. Moreover, our predicted hotspots form a network of contiguous residues in the interior of the structures, in agreement with previous work. In conclusion, we have developed models that discriminate between known allosteric hotspots and non-hotspots with high accuracy and sensitivity. Moreover, the pattern of predicted hotspots corresponds to known functional motifs implicated in allostery, and is consistent with previous work describing sparse networks of allosterically important residues.


Subjects
Allosteric Site/genetics , Models, Chemical , Proteins/chemistry , Structure-Activity Relationship , Algorithms , Artificial Intelligence , Cluster Analysis , Lac Repressors/chemistry , Lac Repressors/genetics , Lac Repressors/metabolism , Models, Molecular , Protein Binding , Proteins/genetics , Proteins/metabolism , Reproducibility of Results , Signal Transduction , Thermodynamics , Transcription Factors
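The abstract above defines recall as the share of known hotspots that a model recovers and precision as the share of hotspot predictions that are actual hotspots. A minimal sketch of both measures follows; the function name and the residue labels are hypothetical and are not data from the paper.

```python
def precision_recall(true_labels, predicted_labels):
    """Return (precision, recall) for binary labels, where 1 marks a hotspot."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t and p)        # hotspots correctly predicted
    fp = sum(1 for t, p in pairs if not t and p)    # non-hotspots called hotspots
    fn = sum(1 for t, p in pairs if t and not p)    # hotspots that were missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Eight hypothetical residues: experimental labels versus model predictions.
experimental = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 1, 0, 0, 0, 1]
precision, recall = precision_recall(experimental, predicted)
```

Reporting both numbers matters because a model can raise recall simply by predicting more hotspots, at the cost of precision; the Feature Set 2 results in the abstract show that trade-off.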
11.
Nat Commun ; 10(1): 5612, 2019 Dec 09.
Article in English | MEDLINE | ID: mdl-31819058

ABSTRACT

Human myeloid-derived growth factor (hMYDGF) is a 142-residue protein with a C-terminal endoplasmic reticulum (ER) retention sequence (ERS). Extracellular MYDGF mediates cardiac repair in mice after anoxic injury. Although homologs of hMYDGF are found in eukaryotes as distant as protozoans, its structure and function are unknown. Here we present the NMR solution structure of hMYDGF, which consists of a short α-helix and ten β-strands distributed in three β-sheets. Conserved residues map to the unstructured ERS, loops on the face opposite the ERS, and the surface of a cavity underneath the conserved loops. The only protein or portion of a protein known to have a similar fold is the base domain of VNN1. We suggest, in analogy to the tethering of the VNN1 nitrilase domain to the plasma membrane via its base domain, that MYDGF complexed to the KDEL receptor binds cargo via its conserved residues for transport to the ER.


Subjects
Endoplasmic Reticulum/metabolism , Interleukins/chemistry , Amino Acid Sequence , Calcium/metabolism , Humans , Hydrogen-Ion Concentration , Interleukins/metabolism , Magnetic Resonance Spectroscopy , Models, Molecular , Phylogeny , Protein Domains , Recombinant Proteins/biosynthesis , Structural Homology, Protein
12.
J Chem Theory Comput ; 14(12): 6722-6733, 2018 Dec 11.
Article in English | MEDLINE | ID: mdl-30428257

ABSTRACT

In this work, we have developed an anisotropic polarizable model for the AMOEBA force field that is derived from electrostatic fitting on a gas phase water molecule as the primary approach to improve the many-body polarization model. We validate our approach using small to large water cluster benchmark data sets and ambient liquid water properties and through comparisons to a variational energy decomposition analysis breakdown of molecular interactions for water and water-ion trimer systems. We find that the accounting of anisotropy polarization for a single water molecule demonstrably improves the description of the many-body polarization energy in all cases. This study provides a proof of principle for extending our protocol for developing a general purpose anisotropic polarizable force field for other biological and material functional groups to better describe complex and asymmetric environments for which accurate polarization models are most needed.


Subjects
Molecular Dynamics Simulation , Water/chemistry , Anisotropy , Molecular Conformation , Static Electricity , Thermodynamics
13.
J Mol Biol ; 428(4): 709-719, 2016 Feb 22.
Article in English | MEDLINE | ID: mdl-26854760

ABSTRACT

Many proteins have small-molecule binding pockets that are not easily detectable in the ligand-free structures. These cryptic sites require a conformational change to become apparent; a cryptic site can therefore be defined as a site that forms a pocket in a holo structure, but not in the apo structure. Because many proteins appear to lack druggable pockets, understanding and accurately identifying cryptic sites could expand the set of drug targets. Previously, cryptic sites were identified experimentally by fragment-based ligand discovery and computationally by long molecular dynamics simulations and fragment docking. Here, we begin by constructing a set of structurally defined apo-holo pairs with cryptic sites. Next, we comprehensively characterize the cryptic sites in terms of their sequence, structure, and dynamics attributes. We find that cryptic sites tend to be as conserved in evolution as traditional binding pockets but are less hydrophobic and more flexible. Relying on this characterization, we use machine learning to predict cryptic sites with relatively high accuracy (for our benchmark, the true positive and false positive rates are 73% and 29%, respectively). We then predict cryptic sites in the entire structurally characterized human proteome (11,201 structures, covering 23% of all residues in the proteome). CryptoSite increases the size of the potentially "druggable" human proteome from ~40% to ~78% of disease-associated proteins. Finally, to demonstrate the utility of our approach in practice, we experimentally validate a cryptic site in protein tyrosine phosphatase 1B using a covalent ligand and NMR spectroscopy. The CryptoSite Web server is available at http://salilab.org/cryptosite.


Subjects
Computational Biology/methods , Proteins/chemistry , Proteins/metabolism , Proteome/analysis , Binding Sites , Humans , Machine Learning , Protein Conformation
14.
J Phys Chem B ; 120(37): 9811-32, 2016 Sep 22.
Article in English | MEDLINE | ID: mdl-27513316

ABSTRACT

Advanced potential energy surfaces are defined as theoretical models that explicitly include many-body effects that transcend the standard fixed-charge, pairwise-additive paradigm typically used in molecular simulation. However, several factors relating to their software implementation have precluded their widespread use in condensed-phase simulations: the computational cost of the theoretical models, a paucity of approximate models and algorithmic improvements that can ameliorate their cost, underdeveloped interfaces and limited dissemination in computational code bases that are widely used in the computational chemistry community, and software implementations that have not kept pace with modern high-performance computing (HPC) architectures, such as multicore CPUs and modern graphics processing units (GPUs). In this Feature Article we review recent progress made in these areas, including well-defined polarization approximations and new multipole electrostatic formulations, novel methods for solving the mutual polarization equations and increasing the MD time step, combining linear-scaling electronic structure methods with new QM/MM methods that account for mutual polarization between the two regions, and the greatly improved software deployment of these models and methods onto GPU and CPU hardware platforms. We have now approached an era where multipole-based polarizable force fields can be routinely used to obtain computational results comparable to state-of-the-art density functional theory while reaching sampling statistics that are acceptable when compared to that obtained from simpler fixed partial charge force fields.


Subjects
Algorithms , Computer Graphics , Molecular Dynamics Simulation , Quantum Theory , Software , Static Electricity , Surface Properties
15.
PLoS One ; 6(4): e18535, 2011 Apr 07.
Article in English | MEDLINE | ID: mdl-21490921

ABSTRACT

Neurons and glial cells in the developing brain arise from neural progenitor cells (NPCs). Nestin, an intermediate filament protein, is thought to be expressed exclusively by NPCs in the normal brain, and is replaced by the expression of proteins specific for neurons or glia in differentiated cells. Nestin expressing NPCs are found in the adult brain in the subventricular zone (SVZ) of the lateral ventricle and the subgranular zone (SGZ) of the dentate gyrus. While significant attention has been paid to studying NPCs in the SVZ and SGZ in the adult brain, relatively little attention has been paid to determining whether nestin-expressing neural cells (NECs) exist outside of the SVZ and SGZ. We therefore stained sections immunocytochemically from the adult rat and human brain for NECs, observed four distinct classes of these cells, and present here the first comprehensive report on these cells. Class I cells are among the smallest neural cells in the brain and are widely distributed. Class II cells are located in the walls of the aqueduct and third ventricle. Class IV cells are found throughout the forebrain and typically reside immediately adjacent to a neuron. Class III cells are observed only in the basal forebrain and closely related areas such as the hippocampus and corpus striatum. Class III cells resemble neurons structurally and co-express markers associated exclusively with neurons. Cell proliferation experiments demonstrate that Class III cells are not recently born. Instead, these cells appear to be mature neurons in the adult brain that express nestin. Neurons that express nestin are not supposed to exist in the brain at any stage of development. That these unique neurons are found only in brain regions involved in higher order cognitive function suggests that they may be remodeling their cytoskeleton in supporting the neural plasticity required for these functions.


Subjects
Brain/cytology , Brain/metabolism , Intermediate Filament Proteins/metabolism , Nerve Tissue Proteins/metabolism , Neurons/metabolism , Adult , Aged , Aged, 80 and over , Animals , Cells, Cultured , Dentate Gyrus/metabolism , Humans , Immunohistochemistry , Male , Microscopy, Confocal , Middle Aged , Nestin , Rats
16.
J Mol Biol ; 414(2): 289-302, 2011 Nov 25.
Article in English | MEDLINE | ID: mdl-22001016

ABSTRACT

The CAPRI (Critical Assessment of Predicted Interactions) and CASP (Critical Assessment of protein Structure Prediction) experiments have demonstrated the power of community-wide tests of methodology in assessing the current state of the art and spurring progress in the very challenging areas of protein docking and structure prediction. We sought to bring the power of community-wide experiments to bear on a very challenging protein design problem that provides a complementary but equally fundamental test of current understanding of protein-binding thermodynamics. We have generated a number of designed protein-protein interfaces with very favorable computed binding energies but which do not appear to be formed in experiments, suggesting that there may be important physical chemistry missing in the energy calculations. A total of 28 research groups took up the challenge of determining what is missing: we provided structures of 87 designed complexes and 120 naturally occurring complexes and asked participants to identify energetic contributions and/or structural features that distinguish between the two sets. The community found that electrostatics and solvation terms partially distinguish the designs from the natural complexes, largely due to the nonpolar character of the designed interactions. Beyond this polarity difference, the community found that the designed binding surfaces were, on average, structurally less embedded in the designed monomers, suggesting that backbone conformational rigidity at the designed surface is important for realization of the designed function. These results can be used to improve computational design strategies, but there is still much to be learned; for example, one designed complex, which does form in experiments, was classified by all metrics as a nonbinder.


Subjects
Models, Molecular , Proteins/chemistry , Binding Sites , Protein Binding