ABSTRACT
Proteins are typically represented by discrete atomic coordinates, which provide an accessible framework for describing different conformations. However, in some fields proteins are more accurately represented as near-continuous surfaces, as these are imprinted with geometric (shape) and chemical (electrostatics) features of the underlying protein structure. Protein surfaces depend on their chemical composition and ultimately determine protein function, acting as the interface that engages in interactions with other molecules. In the past, such representations were used to compare protein structures on global and local scales and have shed light on functional properties of proteins. Here we describe RosettaSurf, a surface-centric computational design protocol that focuses on molecular surface shape and electrostatic properties as a means for protein engineering, offering a unique approach for the design of proteins and their functions. The RosettaSurf protocol combines the explicit optimization of molecular surface features with a global scoring function during the sequence design process, diverging from typical design approaches that rely solely on an energy scoring function. With this computational approach, we attempt to address a fundamental problem in protein design: the design of functional sites in proteins, even when structurally similar templates are absent from the characterized structural repertoire. Surface-centric design exploits the premise that molecular surfaces are, to a certain extent, independent of the underlying sequence and backbone configuration, meaning that different sequences in different proteins may present similar surfaces. We benchmarked RosettaSurf on various sequence recovery datasets and showcased its design capabilities by generating epitope mimics that were biochemically validated. Overall, our results indicate that the explicit optimization of surface features may lead to new routes for the design of functional proteins.
Subject(s)
Protein Engineering; Proteins; Algorithms; Computational Biology/methods; Protein Conformation; Protein Engineering/methods; Proteins/chemistry; Static Electricity
ABSTRACT
De novo protein design explores uncharted sequence and structure space to generate novel proteins not sampled by evolution. A main challenge in de novo design involves crafting "designable" structural templates to guide the sequence searches toward adopting target structures. We present Genesis, a convolutional variational autoencoder that learns patterns of protein structure. We coupled Genesis with trRosetta to design sequences for a set of protein folds and found that Genesis is capable of reconstructing native-like distance and angle distributions for five native folds and, as a demonstration of generalizability, three novel, so-called "dark-matter" folds. We used a high-throughput assay to characterize the stability of the designs through protease resistance, obtaining encouraging success rates for folded proteins. Genesis enables exploration of the protein fold space within minutes, unrestricted by protein topologies. Our approach addresses the backbone designability problem, showing that small neural networks can efficiently learn structural patterns in proteins. A record of this paper's transparent peer review process is included in the supplemental information.
Subject(s)
Deep Learning; Protein Folding; Proteins; Proteins/chemistry; Neural Networks, Computer; Protein Conformation; Models, Molecular; Algorithms
ABSTRACT
This paper considers regression tasks involving high-dimensional multivariate processes whose structure depends on some known graph topology. We put forth a new definition of time-vertex wide-sense stationarity, or joint stationarity for short, that goes beyond product graphs. Joint stationarity helps by reducing the estimation variance and recovery complexity. In particular, for any jointly stationary process, (a) one can reliably learn the covariance structure from as little as a single realization of the process and (b) one can solve MMSE recovery problems, such as interpolation and denoising, in computational time nearly linear in the number of edges and timesteps. Experiments with three datasets suggest that joint stationarity can yield accuracy improvements in the recovery of high-dimensional processes evolving over a graph, even when the latter is only approximately known or the process is not strictly stationary.
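The MMSE denoising mentioned in this abstract can be illustrated with a small NumPy sketch. It rests on assumptions the abstract does not spell out: that the covariance of a jointly stationary process is diagonalized by a joint Fourier basis (graph Laplacian eigenvectors over vertices, the DFT over time), that the joint power spectral density `psd` is known, and that the noise is white with known variance. The ring graph, the spectral density, and every function name below are hypothetical choices for illustration. The sketch uses a dense eigendecomposition, so it does not reproduce the nearly linear running time the paper reports.

```python
import numpy as np

def ring_laplacian(n):
    """Combinatorial Laplacian of an undirected ring graph (illustrative choice)."""
    A = np.zeros((n, n))
    idx = np.arange(n)
    A[idx, (idx + 1) % n] = 1.0
    A[(idx + 1) % n, idx] = 1.0
    return np.diag(A.sum(axis=1)) - A

def jft(X, U):
    """Joint transform: graph Fourier over rows (vertices), unitary DFT over columns (time)."""
    return np.fft.fft(U.T @ X, axis=1, norm="ortho")

def ijft(Xh, U):
    """Inverse of jft."""
    return U @ np.fft.ifft(Xh, axis=1, norm="ortho")

def wiener_denoise(Y, U, psd, noise_var):
    """Assumed MMSE estimate: scale each joint frequency by psd / (psd + noise_var)."""
    gain = psd / (psd + noise_var)
    return ijft(gain * jft(Y, U), U).real

rng = np.random.default_rng(0)
N, T = 20, 64                       # vertices, timesteps
lam, U = np.linalg.eigh(ring_laplacian(N))
freqs = np.fft.fftfreq(T)

# Hypothetical joint PSD: low-pass in both graph and temporal frequency.
psd = np.exp(-lam)[:, None] / (1.0 + (10.0 * freqs[None, :]) ** 2)

# Synthesize one realization by filtering white noise in the joint spectral domain.
X = ijft(np.sqrt(psd) * jft(rng.standard_normal((N, T)), U), U).real

noise_var = 0.1
Y = X + np.sqrt(noise_var) * rng.standard_normal((N, T))
X_hat = wiener_denoise(Y, U, psd, noise_var)

err_noisy = np.mean((Y - X) ** 2)
err_denoised = np.mean((X_hat - X) ** 2)
print(f"MSE noisy: {err_noisy:.4f}  MSE denoised: {err_denoised:.4f}")
```

Because the filter acts independently on each joint frequency, the recovery step reduces to elementwise operations once the transform is available. In practice `psd` would have to be estimated from data, which this sketch does not attempt.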