Search | Virtual Health Library (Biblioteca Virtual em Saúde)

1.

Geometric potentials from deep learning improve prediction of CDR H3 loop structures.

Ruffolo, Jeffrey A; Guerra, Carlos; Mahajan, Sai Pooja; Sulam, Jeremias; Gray, Jeffrey J.

Bioinformatics ; 36(Suppl_1): i268-i275, 2020 07 01.

Article in English | MEDLINE | ID: mdl-32657412

ABSTRACT

MOTIVATION: Antibody structure is largely conserved, except for a complementarity-determining region featuring six variable loops. Five of these loops adopt canonical folds which can typically be predicted with existing methods, while the remaining loop (CDR H3) remains a challenge due to its highly diverse set of observed conformations. In recent years, deep neural networks have proven to be effective at capturing the complex patterns of protein structure. This work proposes DeepH3, a deep residual neural network that learns to predict inter-residue distances and orientations from antibody heavy- and light-chain sequences. The output of DeepH3 is a set of probability distributions over distances and orientation angles between pairs of residues. These distributions are converted to geometric potentials and used to discriminate between decoy structures produced by RosettaAntibody and predict new CDR H3 loop structures de novo.

RESULTS: When evaluated on the Rosetta antibody benchmark dataset of 49 targets, DeepH3-predicted potentials identified better, same and worse structures [measured by root-mean-squared distance (RMSD) from the experimental CDR H3 loop structure] than the standard Rosetta energy function for 33, 6 and 10 targets, respectively, and improved the average RMSD of predictions by 32.1% (1.4 Å). Analysis of individual geometric potentials revealed that inter-residue orientations were more effective than inter-residue distances for discriminating near-native CDR H3 loops. When applied to de novo prediction of CDR H3 loop structures, DeepH3 achieves an average RMSD of 2.2 ± 1.1 Å on the Rosetta antibody benchmark.

AVAILABILITY AND IMPLEMENTATION: DeepH3 source code and pre-trained model parameters are freely available at https://github.com/Graylab/deepH3-distances-orientations.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Deep Learning, Antibodies, Complementarity Determining Regions, Models, Molecular, Protein Conformation
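The abstract above describes converting DeepH3's predicted distributions into geometric potentials and using them to rank RosettaAntibody decoys. The following is a minimal sketch of that idea for a single distance channel, assuming a per-bin potential of the form -ln p and a plain bin lookup. The bin edges, the `eps` smoothing term, the residue pairs and the decoys are all hypothetical, and the published method also uses orientation angles and may interpolate or weight terms differently.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical distance bins in Å (9 bins); distances past 20 Å fall in the last bin.
BIN_EDGES = [0.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0]

Pair = Tuple[int, int]


def bin_index(value: float, edges: Sequence[float] = BIN_EDGES) -> int:
    """Index of the bin containing value; anything beyond the last edge lands in the last bin."""
    for i in range(len(edges) - 1):
        if edges[i] <= value < edges[i + 1]:
            return i
    return len(edges) - 2


def to_potential(probs: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Turn a predicted probability distribution into a per-bin potential E = -ln(p + eps)."""
    return [-math.log(p + eps) for p in probs]


def score_decoy(distances: Dict[Pair, float], potentials: Dict[Pair, List[float]]) -> float:
    """Sum the potential of the bin each decoy distance falls into (lower is better)."""
    return sum(potentials[pair][bin_index(d)] for pair, d in distances.items())


# Hypothetical network output for two residue pairs (each distribution sums to 1).
predicted = {
    (1, 5): [0.01, 0.04, 0.80, 0.10, 0.02, 0.01, 0.01, 0.005, 0.005],  # peak at 6-8 Å
    (2, 8): [0.005, 0.005, 0.01, 0.10, 0.80, 0.04, 0.02, 0.01, 0.01],  # peak at 10-12 Å
}
potentials = {pair: to_potential(p) for pair, p in predicted.items()}

# Two hypothetical decoys, described only by their distances for those pairs.
decoys = {
    "near_native": {(1, 5): 7.1, (2, 8): 11.0},
    "far": {(1, 5): 15.0, (2, 8): 5.0},
}
scores = {name: score_decoy(d, potentials) for name, d in decoys.items()}
best = min(scores, key=scores.get)
print(best, {k: round(v, 2) for k, v in scores.items()})
```

The decoy whose distances fall in high-probability bins receives the lowest total potential, which is the discrimination step the abstract attributes to DeepH3.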

2.

Modeling of lamprey reticulospinal neurons: multiple distinct parameter sets yield realistic simulations.

Ruffolo, Jeffrey A; McClellan, Andrew D.

J Neurophysiol ; 124(3): 895-913, 2020 09 01.

Article in English | MEDLINE | ID: mdl-32697608

ABSTRACT

For the lamprey and other vertebrates, reticulospinal (RS) neurons project descending axons to the spinal cord and activate motor networks to initiate locomotion and other behaviors. In the present study, a biophysically detailed computer model of lamprey RS neurons was constructed consisting of three compartments: dendritic, somatic, and axon initial segment (AIS). All compartments included passive channels. In addition, the soma and AIS had fast potassium and sodium channels. The soma included three additional voltage-gated ion channels (slow sodium and high- and low-voltage-activated calcium) and calcium-activated potassium channels. An initial manually adjusted default parameter set, which was based, in part, on modified parameters from models of lamprey spinal neurons, generated simulations of single action potentials and repetitive firing that scored favorably (0.658; maximum = 0.964) compared with experimentally derived properties of lamprey RS neurons. Subsequently, a dual-annealing search paradigm identified 4,302 viable parameter sets at local maxima within parameter space that yielded higher scores than the default parameter set, including many with much higher scores of approximately 0.85-0.87 (i.e., ~30% improvement). In addition, 5- and 2-conductance grid searches identified a relatively large number of viable parameter sets for which significant correlations were present between maximum conductances for pairs of ion channels. The present results indicated that multiple model parameter sets ("solutions") generated action potentials and repetitive firing that mimicked many of the properties of lamprey RS neurons.
To our knowledge, this is the first study to systematically explore parameter space for a biophysically detailed model of lamprey RS neurons.

NEW & NOTEWORTHY: A computer model of lamprey reticulospinal neurons with a default parameter set produced simulations of action potentials and repetitive firing that scored favorably compared with the properties of these neurons. A dual-annealing search algorithm explored ~50 million parameter sets and identified 4,302 distinct viable parameter sets within parameter space that yielded higher/much higher scores than the default parameter set. In addition, 5- and 2-conductance grid searches identified significant correlations between maximum conductances for pairs of ion channels.

Subjects

Action Potentials/physiology, Computer Simulation, Lampreys/physiology, Locomotion/physiology, Models, Biological, Nerve Net/physiology, Neurons/physiology, Spinal Cord/physiology, Animals, Behavior, Animal/physiology, Potassium Channels/physiology, Sodium Channels/physiology, Spinal Cord/cytology
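The 2-conductance grid search mentioned above can be illustrated with a minimal sketch: sweep two maximum conductances over a grid, keep the "viable" combinations, and test whether the surviving values are correlated. The `is_viable` rule below is a hypothetical stand-in for simulating the neuron and scoring it against experimental properties; the grid values and the ratio band are assumptions chosen only to show how a correlation between conductances can emerge from a viability constraint.

```python
import math
from typing import List, Sequence, Tuple


def is_viable(g_na: float, g_k: float) -> bool:
    # Hypothetical stand-in for "simulate and score": a combination counts as
    # viable only if the Na/K conductance ratio stays within a band.
    return 0.8 <= g_na / g_k <= 1.25


def grid_search(values: Sequence[float]) -> List[Tuple[float, float]]:
    """Exhaustively test every (g_na, g_k) pair on the grid and keep the viable ones."""
    return [(g_na, g_k) for g_na in values for g_k in values if is_viable(g_na, g_k)]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


grid = [float(v) for v in range(1, 21)]  # hypothetical conductances, arbitrary units
viable = grid_search(grid)
r = pearson([g_na for g_na, _ in viable], [g_k for _, g_k in viable])
print(len(viable), "viable sets, r =", round(r, 3))
```

A constraint that only tolerates balanced conductances leaves viable sets scattered along a diagonal, so the two conductances come out strongly correlated even though the grid itself is uncorrelated.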

3.

Flexible protein-protein docking with a multitrack iterative transformer.

Chu, Lee-Shin; Ruffolo, Jeffrey A; Harmalkar, Ameya; Gray, Jeffrey J.

Protein Sci ; 33(2): e4862, 2024 Feb.

Article in English | MEDLINE | ID: mdl-38148272

ABSTRACT

Conventional protein-protein docking algorithms usually rely on heavy candidate sampling and reranking, but these steps are time-consuming and hinder applications that require high-throughput complex structure prediction, for example, structure-based virtual screening. Existing deep learning methods for protein-protein docking, despite being much faster, suffer from low docking success rates. In addition, they simplify the problem by assuming no conformational changes within any protein upon binding (rigid docking). This assumption precludes applications in which binding-induced conformational changes play a role, such as allosteric inhibition or docking from uncertain unbound model structures. To address these limitations, we present GeoDock, a multitrack iterative transformer network to predict a docked structure from separate docking partners. Unlike deep learning models for protein structure prediction that input multiple sequence alignments, GeoDock inputs just the sequences and structures of the docking partners, which suits tasks in which the individual structures are given. GeoDock is flexible at the protein residue level, allowing the prediction of conformational changes upon binding. On the Database of Interacting Protein Structures (DIPS) test set, GeoDock achieves a 43% top-1 success rate, outperforming all other tested methods. However, in the standard DIPS train/test splits, we discovered that the training set was contaminated with close homologs of test-set targets. After decontaminating the training set, the success rate is 31%. On the DB5.5 test set and a benchmark dataset of antibody-antigen complexes, GeoDock outperforms the deep learning models trained using the same dataset but falls behind most of the conventional methods and AlphaFold-Multimer. GeoDock attains an average inference speed of under 1 s on a single GPU, enabling its application in large-scale structure screening.
Although binding-induced conformational changes are still a challenge owing to limited training and evaluation data, our architecture sets up the foundation to capture this backbone flexibility. Code and a demonstration Jupyter notebook are available at https://github.com/Graylab/GeoDock.

Subjects

Algorithms, Proteins, Salicylates, Protein Conformation, Protein Binding, Proteins/chemistry, Molecular Docking Simulation
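The GeoDock abstract reports a "top-1 success rate". A minimal sketch of how such a rate is typically computed, assuming success means the top-ranked model reaches DockQ ≥ 0.23 (the usual "acceptable" cut-off). Whether GeoDock uses exactly this criterion is an assumption here, and the per-target scores are hypothetical.

```python
from typing import Dict, List

# DockQ >= 0.23 is the usual cut-off for an "acceptable" docked model.
ACCEPTABLE_DOCKQ = 0.23


def top_k_success_rate(ranked: Dict[str, List[float]], k: int = 1,
                       threshold: float = ACCEPTABLE_DOCKQ) -> float:
    """Fraction of targets with at least one acceptable model among the top k.

    `ranked` maps each target to the DockQ scores of its predicted models,
    listed in the order the docking method ranked them.
    """
    if not ranked:
        raise ValueError("no targets")
    hits = sum(1 for scores in ranked.values()
               if any(s >= threshold for s in scores[:k]))
    return hits / len(ranked)


# Hypothetical per-target DockQ scores, best-ranked model first.
results = {
    "T1": [0.65, 0.10],
    "T2": [0.12, 0.80],  # a correct pose exists but is ranked second
    "T3": [0.30],
    "T4": [0.05, 0.02],
}
print(top_k_success_rate(results, k=1))  # 2 of 4 targets succeed
print(top_k_success_rate(results, k=2))  # 3 of 4 targets succeed
```

Only the first-ranked model counts toward a top-1 rate, so a method that generates a good pose but ranks it second (target T2) gets no credit.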

4.

Toward enhancement of antibody thermostability and affinity by computational design in the absence of antigen.

Hutchinson, Mark; Ruffolo, Jeffrey A; Haskins, Nantaporn; Iannotti, Michael; Vozza, Giuliana; Pham, Tony; Mehzabeen, Nurjahan; Shandilya, Harini; Rickert, Keith; Croasdale-Wood, Rebecca; Damschroder, Melissa; Fu, Ying; Dippel, Andrew; Gray, Jeffrey J; Kaplan, Gilad.

MAbs ; 16(1): 2362775, 2024.

Article in English | MEDLINE | ID: mdl-38899735

ABSTRACT

Over the past two decades, therapeutic antibodies have emerged as a rapidly expanding domain within the field of biologics. In silico tools that can streamline the process of antibody discovery and optimization are critical to support a pipeline that is growing more numerous and complex every year. High-quality structural information remains critical for the antibody optimization process, but antibody-antigen complex structures are often unavailable and in silico antibody docking methods are still unreliable. In this study, DeepAb, a deep learning model for predicting antibody Fv structure directly from sequence, was used in conjunction with single-point experimental deep mutational scanning (DMS) enrichment data to design 200 potentially optimized variants of an anti-hen egg lysozyme (HEL) antibody. We sought to determine whether DeepAb-designed variants containing combinations of beneficial mutations from the DMS exhibit enhanced thermostability and whether this optimization affected their developability profile. The 200 variants were produced through a robust high-throughput method and tested for thermal and colloidal stability (Tonset, Tm, Tagg), affinity (KD) relative to the parental antibody, and for developability parameters (nonspecific binding, aggregation propensity, self-association). Of the designed clones, 91% exhibited increased thermal and colloidal stability and 94% exhibited increased affinity. Of these, 10% showed a significantly increased affinity for HEL (5- to 21-fold increase) and thermostability (>2.5 °C increase in Tm1), with most clones retaining the favorable developability profile of the parental antibody. Additional in silico tests suggest that these methods would enrich for binding affinity even without first collecting experimental DMS measurements.
These data open the possibility of in silico antibody optimization without the need to predict the antibody-antigen interface, which is notoriously difficult in the absence of crystal structures.

Subjects

Antibody Affinity, Muramidase, Muramidase/chemistry, Muramidase/immunology, Muramidase/genetics, Protein Stability, Humans, Antigens/immunology, Antigens/chemistry, Animals, Computer Simulation
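The study above combines beneficial single-point mutations from a DMS scan into multi-mutation variants and reports affinity gains as fold changes. A minimal sketch of the bookkeeping involved, assuming hypothetical mutations and enrichment scores and a simple additive ranking. The additive score is an illustrative stand-in, not the DeepAb-based design procedure used in the paper.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Mutation = Tuple[int, str, str]  # (position, wild-type residue, mutant residue)

# Hypothetical DMS enrichment scores for single-point mutations.
dms_enrichment: Dict[Mutation, float] = {
    (31, "S", "Y"): 1.2,
    (31, "S", "F"): 0.9,
    (52, "N", "D"): 0.7,
    (99, "G", "A"): 0.4,
    (101, "D", "E"): -0.3,  # depleted in the scan, so it is filtered out
}


def beneficial(scores: Dict[Mutation, float], min_score: float = 0.0) -> List[Mutation]:
    """Keep mutations whose enrichment exceeds min_score."""
    return [m for m, s in scores.items() if s > min_score]


def combine(mutations: List[Mutation], k: int) -> List[Tuple[Mutation, ...]]:
    """All k-mutation variants that never mutate the same position twice."""
    return [combo for combo in combinations(mutations, k)
            if len({pos for pos, _, _ in combo}) == k]


def additive_score(combo: Tuple[Mutation, ...], scores: Dict[Mutation, float]) -> float:
    # Illustrative assumption: single-mutant effects simply add up.
    return sum(scores[m] for m in combo)


def fold_improvement(kd_parent_nm: float, kd_variant_nm: float) -> float:
    """Affinity gain as a ratio of dissociation constants (lower KD = tighter binding)."""
    return kd_parent_nm / kd_variant_nm


variants = combine(beneficial(dms_enrichment), k=2)
ranked = sorted(variants, key=lambda c: additive_score(c, dms_enrichment), reverse=True)
print(len(variants), ranked[0])
print(fold_improvement(10.0, 2.0))  # 5.0
```

Excluding combinations that hit the same position twice matters because two DMS hits at one site (here S31Y and S31F) are alternatives, not combinable mutations.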

5.

Contextual protein and antibody encodings from equivariant graph transformers.

Mahajan, Sai Pooja; Ruffolo, Jeffrey A; Gray, Jeffrey J.

bioRxiv ; 2023 Jul 29.

Article in English | MEDLINE | ID: mdl-37503113

ABSTRACT

The optimal residue identity at each position in a protein is determined by its structural, evolutionary, and functional context. We seek to learn the representation space of the optimal amino-acid residue in different structural contexts in proteins. Inspired by masked language modeling (MLM), our training aims to transduce learning of amino-acid labels from non-masked residues to masked residues in their structural environments and from general (e.g., a residue in a protein) to specific contexts (e.g., a residue at the interface of a protein or antibody complex). Our results on native sequence recovery and forward folding with AlphaFold2 suggest that the amino acid label for a protein residue may be determined from its structural context alone (i.e., without knowledge of the sequence labels of surrounding residues). We further find that the sequence space sampled from our masked models recapitulates the evolutionary sequence neighborhood of the wildtype sequence. Remarkably, the sequences conditioned on highly plastic structures recapitulate the conformational flexibility encoded in the structures. Furthermore, maximum-likelihood interfaces designed with masked models recapitulate wildtype binding energies for a wide range of protein interfaces and binding strengths. We also propose and compare fine-tuning strategies to train models for designing CDR loops of antibodies in the structural context of the antibody-antigen interface by leveraging structural databases for proteins, antibodies (synthetic and experimental) and protein-protein complexes. We show that pretraining on more general contexts improves native sequence recovery for antibody CDR loops, especially for the hypervariable CDR H3, while fine-tuning helps to preserve patterns observed in special contexts.
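Native sequence recovery, the metric cited above, is the fraction of positions at which a designed sequence reproduces the native residue. A minimal sketch, assuming both sequences are already aligned position by position; the example sequences are hypothetical, and the optional `positions` argument shows how recovery can be restricted to a region such as a CDR loop.

```python
from typing import Iterable, Optional


def sequence_recovery(native: str, designed: str,
                      positions: Optional[Iterable[int]] = None) -> float:
    """Fraction of positions where the designed residue matches the native one.

    `positions` restricts the comparison (e.g. to CDR residues); by default
    every position is compared.
    """
    if len(native) != len(designed):
        raise ValueError("sequences must have the same length")
    idx = list(range(len(native)) if positions is None else positions)
    if not idx:
        raise ValueError("no positions to compare")
    return sum(native[i] == designed[i] for i in idx) / len(idx)


native = "ARDYYGSSYFDY"    # hypothetical CDR H3-like segment
designed = "ARDYWGSSYFAY"  # hypothetical model output
print(sequence_recovery(native, designed))               # 10 of 12 positions match
print(sequence_recovery(native, designed, range(3, 9)))  # 5 of 6 positions match
```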

6.

IgLM: Infilling language modeling for antibody sequence design.

Shuai, Richard W; Ruffolo, Jeffrey A; Gray, Jeffrey J.

Cell Syst ; 14(11): 979-989.e4, 2023 11 15.

Article in English | MEDLINE | ID: mdl-37909045

ABSTRACT

Discovery and optimization of monoclonal antibodies for therapeutic applications rely on large sequence libraries but are hindered by developability issues such as low solubility, high aggregation, and high immunogenicity. Generative language models, trained on millions of protein sequences, are a powerful tool for the on-demand generation of realistic, diverse sequences. We present the Immunoglobulin Language Model (IgLM), a deep generative language model for creating synthetic antibody libraries. Compared with prior methods that leverage unidirectional context for sequence generation, IgLM formulates antibody design based on text-infilling in natural language, allowing it to re-design variable-length spans within antibody sequences using bidirectional context. We trained IgLM on 558 million (M) antibody heavy- and light-chain variable sequences, conditioning on each sequence's chain type and species of origin. We demonstrate that IgLM can generate full-length antibody sequences from a variety of species and its infilling formulation allows it to generate infilled complementarity-determining region (CDR) loop libraries with improved in silico developability profiles. A record of this paper's transparent peer review process is included in the supplemental information.

Subjects

Complementarity Determining Regions, Peptide Library, Amino Acid Sequence, Complementarity Determining Regions/genetics, Antibodies, Monoclonal
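The IgLM abstract frames antibody design as text infilling: a span is cut out and moved to the end, so a left-to-right model generates it after having seen the residues on both sides. A minimal sketch of that rearrangement, assuming hypothetical special tokens (`[MASK]`, `[SEP]`, `[ANS]`) and conditioning tags (`[HEAVY]`, `[HUMAN]`). The actual IgLM vocabulary and token order may differ.

```python
from typing import List

# Hypothetical special tokens; the real IgLM vocabulary may name them differently.
MASK, SEP, ANS = "[MASK]", "[SEP]", "[ANS]"


def make_infilling_example(seq: str, start: int, end: int,
                           chain_tag: str, species_tag: str) -> List[str]:
    """Move the span seq[start:end] to the end so a left-to-right model can
    generate it after seeing the residues on both sides of the gap."""
    if not 0 <= start < end <= len(seq):
        raise ValueError("span must satisfy 0 <= start < end <= len(seq)")
    before, span, after = seq[:start], seq[start:end], seq[end:]
    return ([chain_tag, species_tag] + list(before) + [MASK] + list(after)
            + [SEP] + list(span) + [ANS])


def reconstruct(tokens: List[str]) -> str:
    """Undo the rearrangement: put the generated span back where [MASK] was."""
    mask, sep, ans = tokens.index(MASK), tokens.index(SEP), tokens.index(ANS)
    before, after, span = tokens[2:mask], tokens[mask + 1:sep], tokens[sep + 1:ans]
    return "".join(before + span + after)


example = make_infilling_example("EVQLVESGG", 2, 5, "[HEAVY]", "[HUMAN]")
print(example)
print(reconstruct(example))  # EVQLVESGG
```

At design time, a model trained on this format would be given everything up to and including `[SEP]` and asked to generate the span until it emits `[ANS]`, which is how variable-length loop libraries can be produced.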

7.

Flexible Protein-Protein Docking with a Multi-Track Iterative Transformer.

Chu, Lee-Shin; Ruffolo, Jeffrey A; Harmalkar, Ameya; Gray, Jeffrey J.

bioRxiv ; 2023 Jul 01.

Article in English | MEDLINE | ID: mdl-37425754

ABSTRACT

Conventional protein-protein docking algorithms usually rely on heavy candidate sampling and re-ranking, but these steps are time-consuming and hinder applications that require high-throughput complex structure prediction, e.g., structure-based virtual screening. Existing deep learning methods for protein-protein docking, despite being much faster, suffer from low docking success rates. In addition, they simplify the problem by assuming no conformational changes within any protein upon binding (rigid docking). This assumption precludes applications in which binding-induced conformational changes play a role, such as allosteric inhibition or docking from uncertain unbound model structures. To address these limitations, we present GeoDock, a multi-track iterative transformer network to predict a docked structure from separate docking partners. Unlike deep learning models for protein structure prediction that input multiple sequence alignments (MSAs), GeoDock inputs just the sequences and structures of the docking partners, which suits tasks in which the individual structures are given. GeoDock is flexible at the protein residue level, allowing the prediction of conformational changes upon binding. For a benchmark set of rigid targets, GeoDock obtains a 41% success rate, outperforming all the other tested methods. For a more challenging benchmark set of flexible targets, GeoDock achieves a similar number of top-model successes as the traditional method ClusPro [1], but fewer than ReplicaDock2 [2]. GeoDock attains an average inference speed of under one second on a single GPU, enabling its application in large-scale structure screening. Although binding-induced conformational changes are still a challenge owing to limited training and evaluation data, our architecture sets up the foundation to capture this backbone flexibility. Code and a demonstration Jupyter notebook are available at https://github.com/Graylab/GeoDock.

8.

Fast, accurate antibody structure prediction from deep learning on massive set of natural antibodies.

Ruffolo, Jeffrey A; Chu, Lee-Shin; Mahajan, Sai Pooja; Gray, Jeffrey J.

Nat Commun ; 14(1): 2389, 2023 04 25.

Article in English | MEDLINE | ID: mdl-37185622

ABSTRACT

Antibodies have the capacity to bind a diverse set of antigens, and they have become critical therapeutics and diagnostic molecules. The binding of antibodies is facilitated by a set of six hypervariable loops that are diversified through genetic recombination and mutation. Even with recent advances, accurate structural prediction of these loops remains a challenge. Here, we present IgFold, a fast deep learning method for antibody structure prediction. IgFold consists of a pre-trained language model trained on 558 million natural antibody sequences followed by graph networks that directly predict backbone atom coordinates. IgFold predicts structures of similar or better quality than alternative methods (including AlphaFold) in significantly less time (under 25 s). Accurate structure prediction on this timescale makes possible avenues of investigation that were previously infeasible. As a demonstration of IgFold's capabilities, we predicted structures for 1.4 million paired antibody sequences, providing structural insights to 500-fold more antibodies than have experimentally determined structures.

Subjects

Deep Learning, Protein Conformation, Antibodies/chemistry, Complementarity Determining Regions/chemistry, Antigens

9.

ProGen2: Exploring the boundaries of protein language models.

Nijkamp, Erik; Ruffolo, Jeffrey A; Weinstein, Eli N; Naik, Nikhil; Madani, Ali.

Cell Syst ; 14(11): 968-978.e3, 2023 11 15.

Article in English | MEDLINE | ID: mdl-37909046

ABSTRACT

Attention-based models trained on protein sequences have demonstrated incredible success at classification and generation tasks relevant for artificial-intelligence-driven protein design. However, we lack a sufficient understanding of how very large-scale models and data play a role in effective protein model development. We introduce a suite of protein language models, named ProGen2, that are scaled up to 6.4B parameters and trained on different sequence datasets drawn from over a billion proteins from genomic, metagenomic, and immune repertoire databases. ProGen2 models show state-of-the-art performance in capturing the distribution of observed evolutionary sequences, generating novel viable sequences, and predicting protein fitness without additional fine-tuning. As large model sizes and raw numbers of protein sequences continue to become more widely accessible, our results suggest that a growing emphasis needs to be placed on the data distribution provided to a protein sequence model. Our models and code are open sourced for widespread adoption in protein engineering. A record of this paper's Transparent Peer Review process is included in the supplemental information.

Subjects

Artificial Intelligence, Proteins, Proteins/genetics, Amino Acid Sequence, Language, Databases, Factual
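The ProGen2 abstract mentions capturing the distribution of observed sequences and predicting fitness without fine-tuning. Both are commonly measured through an autoregressive model's log-likelihood. The following is a minimal sketch, assuming perplexity is the exponentiated mean negative log-likelihood and that zero-shot fitness is approximated by the mutant-minus-wild-type log-likelihood. The two toy "models" are hypothetical stand-ins that ignore context, not ProGen2 itself.

```python
import math
from typing import Callable, Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# A "model" maps the sequence prefix seen so far to next-residue probabilities.
NextResidueModel = Callable[[str], Dict[str, float]]


def log_likelihood(seq: str, model: NextResidueModel) -> float:
    """Autoregressive log-likelihood: sum of log p(x_i | x_<i)."""
    return sum(math.log(model(seq[:i])[aa]) for i, aa in enumerate(seq))


def perplexity(seq: str, model: NextResidueModel) -> float:
    """exp of the mean negative log-likelihood per residue."""
    return math.exp(-log_likelihood(seq, model) / len(seq))


def fitness_score(wild_type: str, mutant: str, model: NextResidueModel) -> float:
    """Zero-shot fitness proxy: log-likelihood of the mutant relative to the wild type."""
    return log_likelihood(mutant, model) - log_likelihood(wild_type, model)


def uniform_model(prefix: str) -> Dict[str, float]:
    # Toy model that ignores the prefix: every residue is equally likely.
    return {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


def alanine_loving_model(prefix: str) -> Dict[str, float]:
    # Toy model that also ignores the prefix but puts half its mass on alanine.
    probs = {aa: 0.5 / (len(AMINO_ACIDS) - 1) for aa in AMINO_ACIDS if aa != "A"}
    probs["A"] = 0.5
    return probs


print(round(perplexity("ACDE", uniform_model), 6))                   # 20.0
print(round(fitness_score("ACA", "AAA", alanine_loving_model), 4))   # ln(19)
```

A model that knows nothing has a perplexity equal to the alphabet size (20), which is a useful sanity check; a positive fitness score means the model finds the mutant more plausible than the wild type.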

10.

Antibody structure prediction using interpretable deep learning.

Ruffolo, Jeffrey A; Sulam, Jeremias; Gray, Jeffrey J.

Patterns (N Y) ; 3(2): 100406, 2022 Feb 11.

Article in English | MEDLINE | ID: mdl-35199061

ABSTRACT

Therapeutic antibodies make up a rapidly growing segment of the biologics market. However, rational design of antibodies is hindered by reliance on experimental methods for determining antibody structures. Here, we present DeepAb, a deep learning method for predicting accurate antibody FV structures from sequence. We evaluate DeepAb on a set of structurally diverse, therapeutically relevant antibodies and find that our method consistently outperforms the leading alternatives. Previous deep learning methods have operated as "black boxes" and offered few insights into their predictions. By introducing a directly interpretable attention mechanism, we show our network attends to physically important residue pairs (e.g., proximal aromatics and key hydrogen bonding interactions). Finally, we present a novel mutant scoring metric derived from network confidence and show that for a particular antibody, all eight of the top-ranked mutations improve binding affinity. This model will be useful for a broad range of antibody prediction and design tasks.

11.

Simultaneous prediction of antibody backbone and side-chain conformations with deep learning.

Akpinaroglu, Deniz; Ruffolo, Jeffrey A; Mahajan, Sai Pooja; Gray, Jeffrey J.

PLoS One ; 17(6): e0258173, 2022.

Article in English | MEDLINE | ID: mdl-35704640

ABSTRACT

Antibody engineering is becoming increasingly popular in medicine for the development of diagnostics and immunotherapies. Antibody function relies largely on the recognition and binding of antigenic epitopes via the loops in the complementarity determining regions. Hence, accurate high-resolution modeling of these loops is essential for effective antibody engineering and design. Deep learning methods have previously been shown to effectively predict antibody backbone structures described as a set of inter-residue distances and orientations. However, antigen binding is also dependent on the specific conformations of surface side-chains. To address this shortcoming, we created DeepSCAb: a deep learning method that predicts inter-residue geometries as well as side-chain dihedrals of the antibody variable fragment. The network requires only sequence as input, rendering it particularly useful for antibodies without any known backbone conformations. Rotamer predictions use an interpretable self-attention layer, which learns to identify structurally conserved anchor positions across several species. We evaluate the performance of the model for discriminating near-native structures from sets of decoys and find that DeepSCAb outperforms similar methods lacking side-chain context. When compared to alternative rotamer repacking methods, which require an input backbone structure, DeepSCAb predicts side-chain conformations competitively. Our findings suggest that DeepSCAb improves antibody structure prediction with accurate side-chain modeling and is adaptable to applications in docking of antibody-antigen complexes and design of new therapeutic antibody sequences.

Subjects

Deep Learning, Antigen-Antibody Complex, Protein Conformation, Structural Homology, Protein
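DeepSCAb, described above, predicts side-chain dihedrals. Because dihedrals are periodic, evaluating them needs a circular difference: 175° and -175° are 10° apart, not 350°. A minimal sketch of such an evaluation, assuming angles in degrees; the chi1 values are hypothetical, and the 40° tolerance is an illustrative choice rather than a threshold taken from the paper.

```python
from typing import Sequence


def angular_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two angles in degrees, in [0, 180]."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def mean_absolute_angular_error(pred: Sequence[float], true: Sequence[float]) -> float:
    """Mean circular error between predicted and reference dihedrals."""
    if len(pred) != len(true) or not pred:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(angular_difference(p, t) for p, t in zip(pred, true)) / len(pred)


def fraction_within(pred: Sequence[float], true: Sequence[float],
                    tolerance: float = 40.0) -> float:
    """Fraction of dihedrals predicted within `tolerance` degrees of the reference."""
    hits = sum(1 for p, t in zip(pred, true) if angular_difference(p, t) <= tolerance)
    return hits / len(pred)


# Hypothetical chi1 angles (degrees): predicted vs. crystal structure.
pred_chi1 = [175.0, -170.0, 60.0]
true_chi1 = [-175.0, 170.0, 0.0]
print(mean_absolute_angular_error(pred_chi1, true_chi1))  # 30.0
print(fraction_within(pred_chi1, true_chi1))
```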

12.

Hallucinating structure-conditioned antibody libraries for target-specific binders.

Mahajan, Sai Pooja; Ruffolo, Jeffrey A; Frick, Rahel; Gray, Jeffrey J.

Front Immunol ; 13: 999034, 2022.

Article in English | MEDLINE | ID: mdl-36341416

ABSTRACT

Antibodies are widely developed and used as therapeutics to treat cancer, infectious disease, and inflammation. During development, initial leads routinely undergo additional engineering to increase their target affinity. Experimental methods for affinity maturation are expensive, laborious, and time-consuming and rarely allow the efficient exploration of the relevant design space. Deep learning (DL) models are transforming the field of protein engineering and design. While several DL-based protein design methods have shown promise, the antibody design problem is distinct, and specialized models for antibody design are desirable. Inspired by hallucination frameworks that leverage accurate structure prediction DL models, we propose the FvHallucinator for designing antibody sequences, especially the CDR loops, conditioned on an antibody structure. Such a strategy generates targeted CDR libraries that retain the conformation of the binder and thereby the mode of binding to the epitope on the antigen. On a benchmark set of 60 antibodies, FvHallucinator generates sequences resembling natural CDRs and recapitulates perplexity of canonical CDR clusters. Furthermore, the FvHallucinator designs amino acid substitutions at the VH-VL interface that are enriched in human antibody repertoires and therapeutic antibodies. We propose a pipeline that screens FvHallucinator designs to obtain a library enriched in binders for an antigen of interest. We apply this pipeline to the CDR H3 of the Trastuzumab-HER2 complex to generate in silico designs predicted to improve upon the binding affinity and interfacial properties of the original antibody. Thus, the FvHallucinator pipeline enables generation of inexpensive, diverse, and targeted antibody libraries enriched in binders for antibody affinity maturation.

Subjects

Antibodies, Complementarity Determining Regions, Humans, Complementarity Determining Regions/chemistry, Amino Acid Sequence, Antibody Affinity, Antigens, Hallucinations

13.

Designing proteins with language models.

Ruffolo, Jeffrey A; Madani, Ali.

Nat Biotechnol ; 42(2): 200-202, 2024 Feb.

Article in English | MEDLINE | ID: mdl-38361067

Subjects

Protein Engineering
