
Computational design of novel Cas9 PAM-interacting domains using evolution-based modelling and structural quality assessment.

Malbranke, Cyril; Rostain, William; Depardieu, Florence; Cocco, Simona; Monasson, Rémi; Bikard, David.

PLoS Comput Biol; 19(11): e1011621, 2023 Nov.

Article in English | MEDLINE | ID: mdl-37976326

ABSTRACT

We present an approach to protein design that combines (i) scarce functional information such as experimental data, (ii) evolutionary information learned from natural sequence variants, and (iii) physics-grounded modeling. Using a Restricted Boltzmann Machine (RBM), we learn a sequence model of a protein family. We use semi-supervision to leverage available functional information during RBM training. We then propose a strategy to explore the protein representation space that can be informed by external models such as an empirical force-field method (FoldX). We apply our approach to a domain of the Cas9 protein responsible for recognizing a short DNA motif. We experimentally assess the functionality of 71 variants generated to explore a range of RBM and FoldX energies. Sequences with as many as 50 differences from the wild type (20% of the protein domain) retained functionality. Overall, 21/71 sequences designed with our method were functional. Interestingly, 6/71 sequences showed improved activity compared with the original wild-type protein sequence. These results demonstrate the value of further exploring the synergies between machine learning of protein sequence representations and physics-grounded modeling strategies informed by structural information.
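As a rough illustration of how an RBM can assign an energy to a candidate sequence, the sketch below computes the free energy of a one-hot encoded aligned sequence. It is a minimal sketch under stated assumptions, not the authors' implementation: it assumes binary (Bernoulli) hidden units, a 21-letter alphabet (20 amino acids plus a gap), and hypothetical parameter names (`fields`, `weights`, `hidden_bias`). The abstract does not specify the hidden-unit type or the training procedure, and the semi-supervised training and FoldX steps are not shown.

```python
import numpy as np

# Assumed alphabet: 20 standard amino acids plus the alignment gap symbol.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY-"


def one_hot(seq):
    """Encode an aligned sequence as an (L, 21) one-hot matrix."""
    idx = [AMINO_ACIDS.index(c) for c in seq]
    v = np.zeros((len(seq), len(AMINO_ACIDS)))
    v[np.arange(len(seq)), idx] = 1.0
    return v


def rbm_free_energy(v, fields, weights, hidden_bias):
    """Free energy of a visible configuration, assuming binary hidden units.

    v           : (L, 21) one-hot sequence
    fields      : (L, 21) visible-layer fields (hypothetical name)
    weights     : (M, L, 21) couplings between M hidden units and the sequence
    hidden_bias : (M,) hidden-unit biases

    F(v) = -sum(fields * v) - sum_mu log(1 + exp(hidden_bias_mu + I_mu(v)))
    where I_mu(v) is the input received by hidden unit mu.
    """
    visible_term = np.sum(fields * v)
    hidden_input = hidden_bias + np.tensordot(weights, v, axes=([1, 2], [0, 1]))
    # logaddexp(0, x) is a numerically stable log(1 + exp(x)).
    return -visible_term - np.sum(np.logaddexp(0.0, hidden_input))


if __name__ == "__main__":
    # Illustrative usage with random (untrained) parameters.
    rng = np.random.default_rng(0)
    seq = "MKV-AC"
    L, M = len(seq), 4
    fields = rng.normal(size=(L, len(AMINO_ACIDS)))
    weights = rng.normal(scale=0.1, size=(M, L, len(AMINO_ACIDS)))
    hidden_bias = rng.normal(size=M)
    print(rbm_free_energy(one_hot(seq), fields, weights, hidden_bias))
```

Under this kind of model, a lower free energy corresponds to a higher probability for the sequence, so candidate designs could in principle be ranked by this quantity before applying a separate structural score.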

Subjects

CRISPR-Cas Systems, Proteins, Proteins/genetics, Proteins/chemistry, Amino Acid Sequence, Machine Learning, Learning

Improving sequence-based modeling of protein families using secondary-structure quality assessment.

Malbranke, Cyril; Bikard, David; Cocco, Simona; Monasson, Rémi.

Bioinformatics; 37(22): 4083-4090, 2021 Nov 18.

Article in English | MEDLINE | ID: mdl-34117879

ABSTRACT

MOTIVATION: Modeling the sequence distribution of protein families from homologous sequence data has recently received considerable attention, in particular for structure and function prediction, as well as for protein design. Direct coupling analysis, a method to infer effective pairwise interactions between residues, was shown to capture important structural constraints and to successfully generate functional protein sequences. Building on this and other graphical models, we introduce a new framework to assess the quality of the secondary structures of the generated sequences with respect to reference structures for the family. RESULTS: We introduce two scoring functions, called Dot Product and Pattern Matching, that characterize the likelihood that the secondary structure of a protein sequence matches a reference structure. We test these scores on published experimental protein mutagenesis and design datasets, and show improvement in the detection of nonfunctional sequences. We also show that using these scores helps reject nonfunctional sequences generated by graphical models (Restricted Boltzmann Machines) learned from homologous sequence alignments. AVAILABILITY AND IMPLEMENTATION: Data and code are available at https://github.com/CyrilMa/ssqa. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Proteins, Proteins/chemistry, Amino Acid Sequence, Sequence Alignment, Protein Secondary Structure, Mutagenesis
