ABSTRACT
Proteins play a central role in biology, from immune recognition to brain activity. While major advances in machine learning have improved our ability to predict protein structure from sequence, determining a protein's function from its sequence or structure remains a major challenge. Here, we introduce the holographic convolutional neural network (H-CNN) for proteins, a physically motivated machine learning approach to modeling amino acid preferences in protein structures. H-CNN reflects physical interactions in a protein structure and recapitulates the functional information stored in evolutionary data. H-CNN accurately predicts the impact of mutations on protein stability and on the binding of protein complexes. Our interpretable computational model of the protein structure-function map could guide the design of novel proteins with desired function.
Subjects
Algorithms, Neural Networks (Computer), Proteins/genetics, Machine Learning, Amino Acids
ABSTRACT
Evolutionary algorithms, inspired by natural evolution, aim to optimize difficult objective functions without computing derivatives. Here we detail the relationship between the classical population genetics of quantitative traits and evolutionary optimization, and formulate a new evolutionary algorithm. Optimization of a continuous objective function is analogous to searching for high-fitness phenotypes on a fitness landscape. We describe how natural selection moves a population along the non-Euclidean gradient that is induced by the population on the fitness landscape (the natural gradient). We show how selection relates to Newton's method in optimization under quadratic fitness landscapes, and how selection increases fitness at the cost of reduced diversity. We describe the generation of new phenotypes and introduce an operator that recombines the whole population to generate variants. Finally, we introduce a proof-of-principle algorithm that combines natural selection, our recombination operator, and an adaptive method for increasing selection to find the optimum. The algorithm is extremely simple to implement: it requires no matrix inversion or factorization, does not store a covariance matrix, and may form the basis of more general model-based optimization algorithms with natural-gradient updates.
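The proof-of-principle algorithm combines selection, whole-population recombination, and mutation. The following is a minimal illustrative sketch of such a scheme, not the authors' exact algorithm; the function name, softmax selection weighting, and all parameter values are assumptions made for illustration.

```python
import math
import random

def evolve(objective, dim, pop_size=64, generations=200, sigma=0.1, beta=1.0, seed=0):
    """Toy derivative-free maximizer: selection by softmax-weighted fitness,
    coordinate-wise recombination over the whole population, Gaussian mutation.
    (Illustrative only; the paper's adaptive increase of selection strength
    is omitted.)"""
    rng = random.Random(seed)
    # Initialize a random population of real-valued phenotypes.
    pop = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        fitness = [objective(x) for x in pop]
        # Selection: softmax weights (subtract max for numerical stability).
        m = max(fitness)
        w = [math.exp(beta * (f - m)) for f in fitness]
        total = sum(w)
        w = [wi / total for wi in w]
        new_pop = []
        for _ in range(pop_size):
            # Recombination: each coordinate is drawn independently from a
            # selection-weighted parent, so every child mixes the whole
            # population; mutation adds a small Gaussian perturbation.
            child = [pop[rng.choices(range(pop_size), weights=w)[0]][i]
                     + rng.gauss(0.0, sigma)
                     for i in range(dim)]
            new_pop.append(child)
        pop = new_pop
    return max(pop, key=objective)

# Usage: maximize a quadratic fitness peak at (1, 1).
best = evolve(lambda x: -sum((xi - 1.0) ** 2 for xi in x), dim=2)
```

Note that, as the abstract emphasizes, nothing here inverts or factorizes a matrix or stores a covariance matrix: selection and recombination act only through per-individual weights.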
ABSTRACT
Change-point analysis is a flexible and computationally tractable tool for analyzing time series data from systems that transition between discrete states and whose observables are corrupted by noise. A change-point algorithm identifies the time indices (change points) at which the system transitions between these discrete states. We present a unified information-based approach to testing for the existence of change points. This new approach reconciles two previously disparate approaches to change-point analysis (frequentist and information-based) for testing transitions between states. The resulting method is statistically principled, parameter- and prior-free, and applicable to a wide range of change-point problems.
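To illustrate the basic change-point idea, a single change point in the mean of a noisy signal can be located by minimizing the within-segment squared error. This is a generic least-squares toy example, not the information-based test introduced in the abstract; the function name is an assumption.

```python
def best_change_point(data):
    """Return the index k that best splits `data` into two constant-mean
    segments data[:k] and data[k:], by minimizing total squared error.
    (Toy single-change-point locator for illustration.)"""
    def sse(seg):
        # Sum of squared deviations of a segment from its own mean.
        mu = sum(seg) / len(seg)
        return sum((x - mu) ** 2 for x in seg)

    n = len(data)
    best_k, best_cost = None, float("inf")
    for k in range(1, n):  # both segments must be nonempty
        cost = sse(data[:k]) + sse(data[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k
```

A complete method must also test whether the proposed change point is statistically warranted at all, which is exactly the question the information-based approach above addresses without tunable parameters or priors.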
ABSTRACT
Infusion of broadly neutralizing antibodies (bNAbs) has shown promise as an alternative to anti-retroviral therapy against HIV. A key challenge is to suppress viral escape, which is achieved more effectively with a combination of bNAbs. Here, we propose a computational approach to predict the efficacy of a bNAb therapy based on the population genetics of HIV escape, which we parametrize using high-throughput HIV sequence data from bNAb-naive patients. By quantifying the mutational target size and the fitness cost of HIV-1 escape from bNAbs, we predict the distribution of rebound times in three clinical trials. We show that a cocktail of three bNAbs is necessary to effectively suppress viral escape, and we predict the optimal composition of such a cocktail. Our results offer a rational therapy design for HIV, and show how genetic data can be used to predict treatment outcomes and design new approaches to pathogen control.
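The effect of cocktail size can be illustrated with a back-of-the-envelope population genetics calculation. This is an assumed, highly simplified model (standing variation only, independent escape mutations at mutation-selection balance), not the paper's actual parametrization, and all numbers below are made up for illustration.

```python
import math

def rebound_probability(pop_size, mu_over_s_list):
    """Toy model: escape variants against bNAb i segregate at
    mutation-selection balance frequency (mu/s)_i; a variant resistant to
    the whole cocktail must carry every escape mutation, so the
    frequencies multiply. Returns P(at least one fully resistant variant
    is present in a population of `pop_size` infected cells)."""
    f = math.prod(mu_over_s_list)      # frequency of fully resistant variants
    expected = pop_size * f            # expected number in the population
    return 1.0 - math.exp(-expected)   # Poisson chance of at least one

# Illustrative numbers: 1e8 infected cells, per-bNAb escape frequency 1e-3.
p1 = rebound_probability(1e8, [1e-3])      # single bNAb: escape near-certain
p3 = rebound_probability(1e8, [1e-3] * 3)  # three bNAbs: escape suppressed
```

Even in this crude sketch, multiplying small escape frequencies across three antibodies drives the expected number of resistant founders below one, consistent with the abstract's conclusion that a three-bNAb cocktail is needed to suppress escape.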
Subjects
HIV Infections, HIV-1, Neutralizing Antibodies, Broadly Neutralizing Antibodies, HIV Antibodies, HIV Infections/drug therapy, HIV-1/genetics, Humans
ABSTRACT
We expand upon a natural analogy between Bayesian statistics and statistical physics in which sample size corresponds to inverse temperature. This analogy motivates the definition of two statistical quantities: a learning capacity and a Gibbs entropy. The analysis of the learning capacity, corresponding to the heat capacity in thermal physics, leads to insight into the mechanism of learning and explains why some models have anomalously high learning performance. We explore the properties of the learning capacity in a number of examples, including a sloppy model. Next, we propose that the Gibbs entropy provides a natural device for counting distinguishable distributions in the context of Bayesian inference. We use this device to define a generalized principle of indifference in which every distinguishable model is assigned equal a priori probability. This principle results in a solution to a long-standing problem in Bayesian inference: the definition of an objective or uninformative prior. A key characteristic of this approach is that it can be applied to analyses where the model dimension is unknown and circumvents the automatic rejection of higher-dimensional models in Bayesian inference.
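The heart of the analogy can be sketched in a few lines. The notation below is ours, assuming the stated identification of sample size $N$ with inverse temperature and treating the empirical energy as $N$-independent at leading order.

```latex
% Partition function for N samples, prior \pi and likelihood q:
Z(N) = \int d\theta\, \pi(\theta)\, e^{-N H(\theta)},
\qquad N H(\theta) \equiv -\sum_{i=1}^{N} \log q(x_i \mid \theta).
% Identifying \beta \leftrightarrow N (so T = 1/N) gives the average energy
% and, by analogy with the heat capacity, the learning capacity:
U = -\partial_N \log Z, \qquad
C \equiv \frac{\partial U}{\partial T} = -N^2\, \partial_N U .
% For a regular d-parameter model, the Laplace approximation yields
\log Z \approx -N H(\hat\theta) - \tfrac{d}{2}\log N + \text{const}
\;\Rightarrow\;
U \approx H(\hat\theta) + \frac{d}{2N}, \qquad C \approx \frac{d}{2}.
```

In this sketch the learning capacity of a regular model is $d/2$, an analog of equipartition; departures from this value are what flag anomalously high learning performance, for example in sloppy models.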