ABSTRACT
A common challenge in drug design pertains to finding chemical modifications to a ligand that increase its affinity for the target protein. An underutilized advance is the increase in structural biology throughput, which has progressed from an artisanal endeavor to a monthly throughput of hundreds of different ligands against a protein in modern synchrotrons. However, the missing piece is a framework that turns high-throughput crystallography data into predictive models for ligand design. Here, we designed a simple machine learning approach that predicts protein-ligand affinity from experimental structures of diverse ligands against a single protein paired with biochemical measurements. Our key insight is using physics-based energy descriptors to represent protein-ligand complexes and a learning-to-rank approach that infers the relevant differences between binding modes. We ran a high-throughput crystallography campaign against the SARS-CoV-2 main protease (MPro), obtaining parallel measurements of over 200 protein-ligand complexes and their binding activities. This allowed us to design one-step library syntheses that improved the potency of two distinct micromolar hits by over 10-fold, arriving at a noncovalent and nonpeptidomimetic inhibitor with 120 nM antiviral efficacy. Crucially, our approach successfully extends ligands to unexplored regions of the binding pocket, executing large and fruitful moves in chemical space with simple chemistry.
Subject(s)
COVID-19, Humans, Ligands, SARS-CoV-2, Antiviral Agents, Biology
ABSTRACT
Intracellular phase separation of proteins into biomolecular condensates is increasingly recognized as a process with a key role in cellular compartmentalization and regulation. Different hypotheses about the parameters that determine the tendency of proteins to form condensates have been proposed, with some of them probed experimentally through the use of constructs generated by sequence alterations. To broaden the scope of these observations, we established an in silico strategy for understanding on a global level the associations between protein sequence and phase behavior and further constructed machine-learning models for predicting protein liquid-liquid phase separation (LLPS). Our analysis highlighted that LLPS-prone proteins are more disordered, less hydrophobic, and of lower Shannon entropy than sequences in the Protein Data Bank or the Swiss-Prot database and that they show a fine balance in their relative content of polar and hydrophobic residues. To further learn in a hypothesis-free manner the sequence features underpinning LLPS, we trained a neural network-based language model and found that a classifier constructed on such embeddings learned the underlying principles of phase behavior with an accuracy comparable to that of a classifier that used knowledge-based features. By combining knowledge-based features with unsupervised embeddings, we generated an integrated model that distinguished LLPS-prone sequences both from structured proteins and from unstructured proteins with a lower LLPS propensity and further identified such sequences from the human proteome with high accuracy. These results provide a platform rooted in molecular principles for understanding protein phase behavior. The predictor, termed DeePhase, is accessible from https://deephase.ch.cam.ac.uk/.
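The abstract above compares sequences by their Shannon entropy but does not say how it is computed. As a minimal sketch, assuming the entropy is taken over the single-residue composition of a sequence (the function name `shannon_entropy` and the example sequences are hypothetical, not taken from the paper):

```python
import math
from collections import Counter


def shannon_entropy(sequence: str) -> float:
    """Shannon entropy (in bits) of the residue composition of a sequence.

    Assumption: entropy is computed over single-letter residue frequencies;
    the paper may use a different alphabet or windowing scheme.
    """
    counts = Counter(sequence)
    total = len(sequence)
    # H = -sum_i p_i * log2(p_i), with p_i the fraction of residue i
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical examples: a low-complexity repeat and a more varied stretch
low_complexity = shannon_entropy("GGGGSGGGGSGGGGS")
varied = shannon_entropy("MKTAYIAKQRQISFVKSHFSRQ")
```

Under this definition, a repetitive, low-complexity sequence scores lower than a compositionally varied one, which is the direction of the trend the abstract reports for LLPS-prone proteins.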
Subject(s)
Amino Acid Sequence, Machine Learning, Protein Sequence Analysis/methods, Animals, Humans, Hydrophobic and Hydrophilic Interactions
ABSTRACT
The predictive capabilities of deep neural networks (DNNs) continue to evolve to increasingly impressive levels. However, it is still unclear how training procedures for DNNs succeed in finding parameters that produce good results for such high-dimensional and nonconvex loss functions. In particular, we wish to understand why simple optimization schemes, such as stochastic gradient descent, do not end up trapped in local minima with high loss values that would not yield useful predictions. We explain the optimizability of DNNs by characterizing the local minima and transition states of the loss-function landscape (LFL) along with their connectivity. We show that the LFL of a DNN in the shallow network or data-abundant limit is funneled, and thus easy to optimize. Crucially, in the opposite low-data/deep limit, although the number of minima increases, the landscape is characterized by many minima with similar loss values separated by low barriers. This organization is different from the hierarchical landscapes of structural glass formers and explains why minimization procedures commonly employed by the machine-learning community can navigate the LFL successfully and reach low-lying solutions.
ABSTRACT
Predicting ligand biological activity is a key challenge in drug discovery. Ligand-based statistical approaches are often hampered by noise due to undersampling: The number of molecules known to be active or inactive is far smaller than the number of possible chemical features that might determine binding. We derive a statistical framework inspired by random matrix theory and combine the framework with high-quality negative data to discover important chemical differences between active and inactive molecules by disentangling undersampling noise. Our model outperforms standard benchmarks when tested against a set of challenging retrospective tests. We prospectively apply our model to the human muscarinic acetylcholine receptor M1, finding four experimentally confirmed agonists that are chemically dissimilar to all known ligands. The hit rate of our model is significantly higher than the state of the art. Our model can be interpreted and visualized to offer chemical insights about the molecular motifs that are synergistic or antagonistic to M1 agonism, which we have prospectively experimentally verified.
Subject(s)
Drug Discovery/statistics & numerical data, Statistical Models, Muscarinic Antagonists/chemistry, Muscarinic Receptors/chemistry, Humans, Ligands, Muscarinic Antagonists/therapeutic use, Muscarinic Receptors/drug effects
ABSTRACT
A key challenge for soft materials design and coarse-graining simulations is determining interaction potentials between components that give rise to desired condensed-phase structures. In theory, the Ornstein-Zernike equation provides an elegant framework for solving this inverse problem. Pioneering work in liquid state theory derived analytical closures for the framework. However, these analytical closures are approximations, valid only for specific classes of interaction potentials. In this work, we combine the physics of liquid state theory with machine learning to infer a closure directly from simulation data. The resulting closure is more accurate than commonly used closures across a broad range of interaction potentials.
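For reference, the Ornstein-Zernike framework mentioned above can be written out for a homogeneous, isotropic one-component fluid. This is the standard textbook form, not the paper's own notation: $h(r) = g(r) - 1$ is the total correlation function, $c(r)$ the direct correlation function, $\rho$ the number density, $u(r)$ the pair potential, $\beta = 1/k_B T$, and $B(r)$ the bridge function. A closure is a second relation between $h$ and $c$; the hypernetted-chain (HNC) and Percus-Yevick (PY) closures are shown as two commonly used analytical examples.

```latex
% Ornstein-Zernike equation
h(r) = c(r) + \rho \int c\!\left(|\mathbf{r} - \mathbf{r}'|\right) h(r') \, d\mathbf{r}'

% Formally exact closure with bridge function B(r)
g(r) = \exp\!\left[ -\beta u(r) + h(r) - c(r) + B(r) \right]

% Hypernetted-chain approximation: B(r) = 0
g(r) = \exp\!\left[ -\beta u(r) + h(r) - c(r) \right]

% Percus-Yevick approximation
c(r) = g(r) \left[ 1 - e^{\beta u(r)} \right]
```

In this language, inferring a closure from simulation data amounts to learning the otherwise unknown relation that $B(r)$ stands for. Whether the paper parametrizes the closure through $B(r)$ specifically is an assumption here.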
ABSTRACT
Electrolytes play an important role in a plethora of applications ranging from energy storage to biomaterials. Notwithstanding this, the structure of concentrated electrolytes remains enigmatic. Many theoretical approaches attempt to model the concentrated electrolyte by introducing the idea of ion pairs, with ions either being tightly "paired" with a counter-ion or "free" to screen charge. In this study, we reframe the problem into the language of computational statistics and test the null hypothesis that all ions share the same local environment. Applying the framework to molecular dynamics simulations, we find that this null hypothesis is not supported by data. Our statistical technique suggests the presence of two distinct local ionic environments at intermediate concentrations, whose differences surprisingly originate in like charge correlations rather than unlike charge attraction. Through considering the effect of these "aggregated" and "non-aggregated" states on bulk properties including effective ion concentration and dielectric constant, we identify a scaling relation between the effective screening length and theoretical Debye length, which applies across different dielectric constants and ion concentrations.
ABSTRACT
Deep neural networks are workhorse models in machine learning with multiple layers of nonlinear functions composed in series. Their loss function is highly nonconvex, yet empirically even gradient descent minimization is sufficient to arrive at accurate and predictive models. It is hitherto unknown why deep neural networks are easily optimizable. We analyze the energy landscape of a spin glass model of deep neural networks using random matrix theory and algebraic geometry. We analytically show that the multilayered structure holds the key to optimizability: Fixing the number of parameters and increasing network depth, the number of stationary points in the loss function decreases, minima become more clustered in parameter space, and the trade-off between the depth and width of minima becomes less severe. Our analytical results are numerically verified through comparison with neural networks trained on a set of classical benchmark datasets. Our model uncovers generic design principles of machine learning models.
Subject(s)
Machine Learning, Neural Networks (Computer), Nonlinear Dynamics, Thermodynamics
ABSTRACT
The development of molecular descriptors is a central challenge in cheminformatics. Most approaches use algorithms that extract atomic environments or end-to-end machine learning. However, a looming question is how these approaches compare with the critical eye of trained chemists. The CAS fingerprint engages expert chemists to curate chemical motifs that they deem could influence bioactivity. In this paper, we benchmark the CAS fingerprint against commonly used fingerprints using a well-established benchmark set of 88 targets. We show that the CAS fingerprint outperforms most of the commonly used molecular fingerprints. Analysis of the CAS fingerprint reveals that experts tend to select features that are rarely reported in the literature, though not all rare features are selected. Our analysis also shows that the CAS fingerprint provides a different source of information compared to other commonly used fingerprints. These results suggest that anthropomorphic insights do have predictive power and highlight the importance of a chemist-in-the-loop approach in the era of machine learning.
Subject(s)
Algorithms, Machine Learning, Cheminformatics
ABSTRACT
Machine learning methods may have the potential to significantly accelerate drug discovery. However, the increasing rate of new methodological approaches being published in the literature raises the fundamental question of how models should be benchmarked and validated. We reanalyze the data generated by a recently published large-scale comparison of machine learning models for bioactivity prediction and arrive at a somewhat different conclusion. We show that the performance of support vector machines is competitive with that of deep learning methods. Additionally, using a series of numerical experiments, we question the relevance of area under the receiver operating characteristic curve as a metric in virtual screening. We further suggest that area under the precision-recall curve should be used in conjunction with the receiver operating characteristic curve. Our numerical experiments also highlight challenges in estimating the uncertainty in model performance via scaffold-split nested cross validation.
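The concern about the receiver operating characteristic curve in virtual screening can be illustrated with a small toy calculation. The sketch below is not the paper's numerical experiment. It uses a hypothetical, heavily imbalanced screen (1 active among 100 compounds, ranked sixth) and average precision as one common estimate of the area under the precision-recall curve. The function names and data are illustrative assumptions.

```python
def roc_auc(labels, scores):
    """Probability that a random active outscores a random inactive (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision evaluated at the rank of each active."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Hypothetical screen: 100 compounds, a single active ranked 6th by the model
labels = [0] * 5 + [1] + [0] * 94
scores = [100 - i for i in range(100)]  # strictly decreasing scores, no ties

auc = roc_auc(labels, scores)           # 94 of 99 inactives rank below the active
ap = average_precision(labels, scores)  # precision at rank 6 is 1/6
```

Here the ROC AUC is about 0.95 while the average precision is about 0.17: five of the six top-ranked compounds are inactive, which the ROC-based number largely hides.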
Subject(s)
Deep Learning, Drug Discovery/methods, Machine Learning, Algorithms, Area Under Curve, Benchmarking, Computer Simulation, Drug Discovery/standards, Drug Discovery/statistics & numerical data, Preclinical Drug Evaluation, Humans, ROC Curve, Support Vector Machine, User-Computer Interface
ABSTRACT
Many biological systems are appropriately viewed as passive inclusions immersed in an active bath: from proteins on active membranes to microscopic swimmers confined by boundaries. The nonequilibrium forces exerted by the active bath on the inclusions or boundaries often regulate function, and such forces may also be exploited in artificial active materials. Nonetheless, the general phenomenology of these active forces remains elusive. We show that the fluctuation spectrum of the active medium, the partitioning of energy as a function of wavenumber, controls the phenomenology of force generation. We find that, for a narrow, unimodal spectrum, the force exerted by a nonequilibrium system on two embedded walls depends on the width and the position of the peak in the fluctuation spectrum, and oscillates between repulsion and attraction as a function of wall separation. We examine two apparently disparate examples: the Maritime Casimir effect and recent simulations of active Brownian particles. A key implication of our work is that important nonequilibrium interactions are encoded within the fluctuation spectrum. In this sense, the noise becomes the signal.
Subject(s)
Computer Simulation, Theoretical Models, Animals, Biophysical Phenomena, Biophysics, Artificial Membranes, Motion (Physics), Mechanical Stress
ABSTRACT
Rapid determination of whether a candidate compound will bind to a particular target receptor remains a stumbling block in drug discovery. We use an approach inspired by random matrix theory to decompose the known ligand set of a target in terms of orthogonal "signals" of salient chemical features, and distinguish these from the much larger set of ligand chemical features that are not relevant for binding to that particular target receptor. After removing the noise caused by finite sampling, we show that the similarity of an unknown ligand to the remaining, cleaned chemical features is a robust predictor of ligand-target affinity, performing as well as or better than any algorithm in the published literature. We interpret our algorithm as deriving a model for the binding energy between a target receptor and the set of known ligands, where the underlying binding energy model is related to the classic Ising model in statistical physics.
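One standard way to separate "signal" from finite-sampling noise in random matrix theory is the Marchenko-Pastur law. For a correlation matrix built from N samples of p uncorrelated, unit-variance features, it confines the eigenvalues to the interval between (1 - sqrt(p/N))^2 and (1 + sqrt(p/N))^2 in the large-N limit. The sketch below assumes this is the kind of threshold meant above, which the abstract does not state. The function names and the example eigenvalues are hypothetical.

```python
import math


def marchenko_pastur_edges(n_samples: int, n_features: int):
    """Bulk edges of the Marchenko-Pastur law for standardized, uncorrelated features."""
    gamma = n_features / n_samples
    lower = (1 - math.sqrt(gamma)) ** 2
    upper = (1 + math.sqrt(gamma)) ** 2
    return lower, upper


def signal_eigenvalues(eigenvalues, n_samples: int, n_features: int):
    """Keep eigenvalues above the noise bulk; these are candidate 'signals'."""
    _, upper = marchenko_pastur_edges(n_samples, n_features)
    return [lam for lam in eigenvalues if lam > upper]


# Hypothetical example: 400 known ligands described by 100 chemical features
lower_edge, upper_edge = marchenko_pastur_edges(400, 100)
kept = signal_eigenvalues([0.3, 1.0, 2.0, 5.1], 400, 100)
```

Eigenvectors whose eigenvalues exceed the upper edge would then define the cleaned feature subspace against which a new ligand is compared. At finite N the largest noise eigenvalue can fluctuate slightly above the edge, so in practice the cutoff is approximate.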
Subject(s)
Computational Biology/methods, Drug Discovery/methods, Algorithms, Ligands, Theoretical Models, Protein Binding, Proteins/chemistry
ABSTRACT
The decay of correlations in ionic fluids is a classical problem in soft matter physics that underpins applications ranging from controlling colloidal self-assembly to batteries and supercapacitors. The conventional wisdom, based on analyzing a solvent-free electrolyte model, suggests that all correlation functions between species decay with a common decay length in the asymptotic far field limit. Nonetheless, a solvent is present in many electrolyte systems. We show using an analytical theory and molecular dynamics simulations that multiple decay lengths can coexist in the asymptotic limit as well as at intermediate distances once a hard sphere solvent is considered. Our analysis provides an explanation for the recently observed discontinuous change in the structural force across a thin film of ionic liquid-solvent mixtures as the composition is varied, as well as reframes recent debates in the literature about the screening length in concentrated electrolytes.
ABSTRACT
The structure and interactions in electrolytes at high concentration have implications from energy storage to biomolecular interactions. However, many experimental observations are yet to be explained in these mixtures, which are far beyond the regime of validity of mean-field models. Here, we study the structural forces in a mixture of ionic liquid and solvent that is miscible in all proportions at room temperature. Using the surface force balance to measure the force between macroscopic smooth surfaces across the liquid mixtures, we uncover an abrupt increase in the wavelength above a threshold ion concentration. Below the threshold concentration, the wavelength is determined by the size of the solvent molecule, whereas above the threshold, it is the diameter of a cation-anion pair that determines the wavelength.
ABSTRACT
In many contexts, it is extremely costly to perform enough high-quality experimental measurements to accurately parametrize a predictive quantitative model. However, it is often much easier to carry out large numbers of experiments that indicate whether each sample is above or below a given threshold. Can many such categorical or "coarse" measurements be combined with a much smaller number of high-resolution or "fine" measurements to yield accurate models? Here, we demonstrate an intuitive strategy, inspired by statistical physics, wherein the coarse measurements are used to identify the salient features of the data, while the fine measurements determine the relative importance of these features. A linear model is inferred from the fine measurements, augmented by a quadratic term that captures the correlation structure of the coarse data. We illustrate our strategy by considering the problems of predicting the antimalarial potency and aqueous solubility of small organic molecules from their 2D molecular structure.
ABSTRACT
The interaction between charged objects in an electrolyte solution is a fundamental question in soft matter physics. It is well known that the electrostatic contribution to the interaction energy decays exponentially with object separation. Recent measurements reveal that, contrary to the conventional wisdom given by the classic Poisson-Boltzmann theory, the decay length increases with the ion concentration for concentrated electrolytes and can be an order of magnitude larger than the ion diameter in ionic liquids. We derive a simple scaling theory that explains this anomalous dependence of the decay length on the ion concentration. Our theory successfully collapses the decay lengths of a wide class of salts onto a single curve. A novel prediction of our theory is that the decay length increases linearly with the Bjerrum length, which we experimentally verify by surface force measurements. Moreover, we quantitatively relate the measured decay length to classic measurements of the activity coefficient in concentrated electrolytes, thus showing that the measured decay length is indeed a bulk property of the concentrated electrolyte as well as contributing a mechanistic insight into empirical activity coefficients.
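For context, the classical (dilute-limit) expectation that the abstract contrasts against can be computed directly. The sketch below evaluates the textbook Bjerrum length, l_B = e^2 / (4 pi eps_0 eps_r k_B T), and the Debye length for a 1:1 salt, lambda_D = 1 / sqrt(8 pi l_B n), with n the number density of each ion species. The function names are hypothetical, and the relative permittivity of 78.4 for water near room temperature is an assumed, illustrative value. The paper's scaling theory for concentrated electrolytes is not reproduced here.

```python
import math

E_CHARGE = 1.602176634e-19  # elementary charge, C
K_B = 1.380649e-23          # Boltzmann constant, J/K
EPS_0 = 8.8541878128e-12    # vacuum permittivity, F/m
N_A = 6.02214076e23         # Avogadro constant, 1/mol


def bjerrum_length(eps_r: float, temperature: float) -> float:
    """Separation (m) at which two unit charges interact with energy k_B T."""
    return E_CHARGE**2 / (4 * math.pi * EPS_0 * eps_r * K_B * temperature)


def debye_length(conc_molar: float, eps_r: float, temperature: float) -> float:
    """Classical Debye screening length (m) for a 1:1 salt at conc_molar mol/L."""
    n = conc_molar * 1000 * N_A  # ions of each sign per cubic metre
    return 1 / math.sqrt(8 * math.pi * bjerrum_length(eps_r, temperature) * n)


# Assumed illustrative conditions: water (eps_r = 78.4) at 298.15 K
l_b = bjerrum_length(78.4, 298.15)
lam_dilute = debye_length(0.1, 78.4, 298.15)
lam_conc = debye_length(1.0, 78.4, 298.15)
```

In this classical picture the screening length only ever shrinks as salt is added (lam_conc is smaller than lam_dilute), which is the conventional wisdom that the measurements described above contradict at high concentration.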
ABSTRACT
Reversible in operando control of friction is an unsolved challenge that is crucial to industrial tribology. Recent studies show that at low sliding velocities, this control can be achieved by applying an electric field across electrolyte lubricants. However, the phenomenology at high sliding velocities is still unknown. In this paper, we investigate the hydrodynamic friction across electrolytes under shear beyond the transition to turbulence. We develop a novel, highly parallelised numerical method for solving the coupled Navier-Stokes and Poisson-Nernst-Planck equations. Our results show that turbulent drag cannot be controlled across dilute electrolytes using static electric fields alone. The limitations of the Poisson-Nernst-Planck formalism hint at ways in which turbulent drag could be controlled using electric fields.
ABSTRACT
Screening of a surface charge by an electrolyte and the resulting interaction energy between charged objects is of fundamental importance in scenarios from bio-molecular interactions to energy storage. The conventional wisdom is that the interaction energy decays exponentially with object separation and the decay length is a decreasing function of ion concentration; the interaction is thus negligible in a concentrated electrolyte. Contrary to this conventional wisdom, we have shown by surface force measurements that the decay length is an increasing function of ion concentration and Bjerrum length for concentrated electrolytes. In this paper we report surface force measurements to test directly the scaling of the screening length with Bjerrum length. Furthermore, we identify a relationship between the concentration dependence of this screening length and empirical measurements of activity coefficient and differential capacitance. The dependence of the screening length on the ion concentration and the Bjerrum length can be explained by a simple scaling conjecture based on the physical intuition that solvent molecules, rather than ions, are charge carriers in a concentrated electrolyte.
ABSTRACT
Recent molecular dynamics simulations show that thermal gradients can induce electric fields in water that are comparable in magnitude to electric fields seen in ionic thin films and biomembranes. This surprising non-equilibrium phenomenon of thermomolecular orientation is also observed more generally in simulations of polar and non-polar size-asymmetric dumbbell fluids. However, a microscopic theory linking thermomolecular orientation and polarization to molecular properties is still lacking. Here, we formulate an analytically solvable microscopic model of size-asymmetric dumbbell molecules in a temperature gradient using a mean-field, local equilibrium approach. Our theory reveals how the extent of thermomolecular orientation and polarization depends on molecular volume, size anisotropy and dipole moment. Predictions of the theory agree quantitatively with molecular dynamics simulations. Crucially, our framework shows how thermomolecular orientation can be controlled and maximized by tuning microscopic molecular properties.
ABSTRACT
A gap in understanding the link between continuum theories of ion transport in ionic liquids and the underlying microscopic dynamics has hindered the development of frameworks for transport phenomena in these concentrated electrolytes. Here, we construct a continuum theory for ion transport in ionic liquids by coarse graining a simple exclusion process of interacting particles on a lattice. The resulting dynamical equations can be written as a gradient flow with a mobility matrix that vanishes at high densities. This form of the mobility matrix gives rise to a charging behavior that is different to the one known for electrolytic solutions, but which agrees qualitatively with the phenomenology observed in experiments and simulations.
Subject(s)
Ionic Liquids/chemistry, Chemical Models, Anions/chemistry, Cations/chemistry, Temperature
ABSTRACT
Charging of a conducting tubular nanopore in a nanostructured electrode is treated using an exactly solvable 1D lattice model, including ion correlations screened by ion-image interactions. Analytical expressions are obtained for the accumulated charge and capacitance as a function of voltage. They show that the mechanism of charge storage, and the qualitative form of the capacitance-voltage curve, are sensitive to how favorable it is for ions to occupy the unpolarized pore, and the pore radius. Qualitative predictions of the theory are corroborated by Monte Carlo simulations. These results highlight the effect of ion affinity to unpolarized pores on the charge and energy storage in supercapacitors. Furthermore, they suggest that the question of the occupancy of unpolarized pores could be answered by measuring the capacitance-voltage dependence.