Search | Regional Portal of the BVS Paraguay

Computational design of novel Cas9 PAM-interacting domains using evolution-based modelling and structural quality assessment.

Malbranke, Cyril; Rostain, William; Depardieu, Florence; Cocco, Simona; Monasson, Rémi; Bikard, David.

PLoS Comput Biol ; 19(11): e1011621, 2023 Nov.

Article in English | MEDLINE | ID: mdl-37976326

ABSTRACT

We present an approach to protein design that combines (i) scarce functional information such as experimental data, (ii) evolutionary information learned from natural sequence variants, and (iii) physics-grounded modeling. Using a Restricted Boltzmann Machine (RBM), we learn a sequence model of a protein family. We use semi-supervision to leverage available functional information during RBM training. We then propose a strategy to explore the protein representation space that can be informed by external models such as an empirical force-field method (FoldX). We apply our approach to a domain of the Cas9 protein responsible for recognition of a short DNA motif. We experimentally assess the functionality of 71 variants generated to explore a range of RBM and FoldX energies. Sequences with as many as 50 differences (20% of the protein domain) from the wild type retained functionality. Overall, 21/71 sequences designed with our method were functional. Interestingly, 6/71 sequences showed improved activity compared with the original wild-type protein sequence. These results demonstrate the value of further exploring the synergies between machine learning of protein sequence representations and physics-grounded modeling strategies informed by structural information.
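As a rough illustration of the kind of quantity such a sequence model assigns, the sketch below computes the free energy of a sequence under an RBM with categorical visible units and binary hidden units, together with the number of differences from a reference sequence. The binary hidden units, the parameter names (`fields`, `weights`, `hidden_bias`), and the 21-letter alphabet are assumptions made for this example; the abstract does not specify the architecture, and this is not the authors' implementation.

```python
import numpy as np

# Assumed alphabet: 20 amino acids plus a gap symbol (illustrative choice).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY-"


def encode(seq):
    """Map a sequence string to an array of integer residue indices."""
    return np.array([AMINO_ACIDS.index(c) for c in seq])


def rbm_free_energy(seq_idx, fields, weights, hidden_bias):
    """Free energy of one sequence under an RBM with binary hidden units.

    fields:      (L, q) visible fields, one value per position and residue
    weights:     (M, L, q) couplings between M hidden units and the sequence
    hidden_bias: (M,) hidden-unit biases
    Lower free energy means higher probability under the model.
    """
    positions = np.arange(len(seq_idx))
    visible_term = fields[positions, seq_idx].sum()
    # Total input received by each hidden unit from the sequence.
    hidden_input = hidden_bias + weights[:, positions, seq_idx].sum(axis=1)
    # log(1 + exp(x)) summed over hidden units, computed stably.
    hidden_term = np.logaddexp(0.0, hidden_input).sum()
    return -(visible_term + hidden_term)


def hamming_distance(a, b):
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))
```

In a design loop of the sort described, a score like this could be tracked alongside an external structural energy and the distance to the wild type; how the two energies are combined is not stated in the abstract.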

Subject(s)

CRISPR-Cas Systems, Proteins, Proteins/genetics, Proteins/chemistry, Amino Acid Sequence, Machine Learning, Learning

Machine learning for evolutionary-based and physics-inspired protein design: Current and future synergies.

Malbranke, Cyril; Bikard, David; Cocco, Simona; Monasson, Rémi; Tubiana, Jérôme.

Curr Opin Struct Biol ; 80: 102571, 2023 06.

Article in English | MEDLINE | ID: mdl-36947951

ABSTRACT

Computational protein design facilitates the discovery of novel proteins with prescribed structure and functionality. Exciting designs were recently reported using novel data-driven methodologies that can be roughly divided into two categories: evolutionary-based and physics-inspired approaches. The former infer characteristic sequence features shared by sets of evolutionarily related proteins, such as conserved or coevolving positions, and recombine them to generate candidates with similar structure and function. The latter estimate key biochemical properties, such as structure free energy, conformational entropy, or binding affinities, using machine learning surrogates and optimize them to yield improved designs. Here, we review recent progress along both tracks, discuss their strengths and weaknesses, and highlight opportunities for synergistic approaches.

Subject(s)

Machine Learning, Proteins, Proteins/chemistry, Physics, Protein Databases

Improving sequence-based modeling of protein families using secondary-structure quality assessment.

Malbranke, Cyril; Bikard, David; Cocco, Simona; Monasson, Rémi.

Bioinformatics ; 37(22): 4083-4090, 2021 11 18.

Article in English | MEDLINE | ID: mdl-34117879

ABSTRACT

MOTIVATION: Modeling the sequence distribution of protein families from homologous sequence data has recently received considerable attention, in particular for structure and function prediction as well as for protein design. Notably, direct coupling analysis, a method to infer effective pairwise interactions between residues, was shown to capture important structural constraints and to successfully generate functional protein sequences. Building on this and other graphical models, we introduce a new framework to assess the quality of the secondary structures of the generated sequences with respect to reference structures for the family. RESULTS: We introduce two scoring functions, called Dot Product and Pattern Matching, that characterize how likely the secondary structure of a protein sequence is to match a reference structure. We test these scores on published experimental protein mutagenesis and design datasets and show improved detection of nonfunctional sequences. We also show that these scores help reject nonfunctional sequences generated by graphical models (Restricted Boltzmann Machines) learned from homologous sequence alignments. AVAILABILITY AND IMPLEMENTATION: Data and code are available at https://github.com/CyrilMa/ssqa. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
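The abstract does not define the two scores. Purely to illustrate what a dot-product-style comparison against a reference secondary structure could look like, the sketch below averages, over positions, the predicted probability of the reference state. The three-state alphabet (`H`, `E`, `C`), the function name, and the averaging are all assumptions for this example, not the definition used in the paper or in the ssqa repository.

```python
import numpy as np

# Assumed three-state secondary-structure alphabet: helix, strand, coil.
STATES = "HEC"


def dot_product_score(pred_probs, reference):
    """Mean per-position dot product between predicted state probabilities
    and the one-hot encoding of a reference secondary structure.

    pred_probs: (L, 3) array; each row holds probabilities for H, E, C
    reference:  string of length L over the alphabet "HEC"
    Returns a value in [0, 1]; 1 means the prediction puts all probability
    on the reference state at every position.
    """
    pred_probs = np.asarray(pred_probs, dtype=float)
    if pred_probs.shape != (len(reference), len(STATES)):
        raise ValueError("pred_probs must have shape (len(reference), 3)")
    ref_idx = np.array([STATES.index(s) for s in reference])
    # Dot product with a one-hot vector selects the reference-state entry.
    per_position = pred_probs[np.arange(len(ref_idx)), ref_idx]
    return float(per_position.mean())
```

Under these assumptions, a generated sequence whose predicted profile scores well below that of natural family members would be a candidate for rejection; the threshold would have to be calibrated on data.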

Subject(s)

Proteins, Proteins/chemistry, Amino Acid Sequence, Sequence Alignment, Protein Secondary Structure, Mutagenesis
