ABSTRACT
T cells are essential immune cells responsible for identifying and eliminating pathogens. Through interactions between their T-cell antigen receptors (TCRs) and antigens presented by major histocompatibility complex molecules (MHCs) or MHC-like molecules, T cells discriminate foreign and self peptides. Determining the fundamental principles that govern these interactions has important implications in numerous medical contexts. However, reconstructing a map between T cells and their antagonist antigens remains an open challenge for the field of immunology, and success of in silico reconstructions of this relationship has remained incremental. In this Perspective, we discuss the role that new state-of-the-art deep-learning models for predicting protein structure may play in resolving some of the unanswered questions the field faces linking TCR and peptide-MHC properties to T-cell specificity. We provide a comprehensive overview of structural databases and the evolution of predictive models, and highlight the breakthrough AlphaFold provided the field.
Subject(s)
Adaptive Immunity , T-Cell Antigen Receptors , Humans , T-Cell Antigen Receptors/immunology , T-Cell Antigen Receptors/metabolism , T-Cell Antigen Receptors/chemistry , Cellular Immunity , Protein Conformation , T-Lymphocytes/immunology , Deep Learning , Molecular Models , Animals
ABSTRACT
Intrathecal synthesis of central nervous system (CNS)-reactive autoantibodies is observed across patients with autoimmune encephalitis (AE), who show multiple residual neurobehavioral deficits and relapses despite immunotherapies. We leveraged two common forms of AE, mediated by leucine-rich glioma inactivated-1 (LGI1) and contactin-associated protein-like 2 (CASPR2) antibodies, as human models to comprehensively reconstruct and profile cerebrospinal fluid (CSF) B cell receptor (BCR) characteristics. We hypothesized that the resultant observations would both inform the observed therapeutic gap and determine the contribution of intrathecal maturation to pathogenic B cell lineages. From the CSF of three patients, 381 cognate-paired IgG BCRs were isolated by cell sorting and scRNA-seq, and 166 expressed as monoclonal antibodies (mAbs). Sixty-two percent of mAbs from singleton BCRs reacted with either LGI1 or CASPR2 and, strikingly, this rose to 100% of cells in clonal groups with ≥4 members. These autoantigen-reactivities were more concentrated within antibody-secreting cells (ASCs) versus B cells (P < 0.0001), and both these cell types were more differentiated than LGI1- and CASPR2-unreactive counterparts. Despite greater differentiation, autoantigen-reactive cells had acquired few mutations intrathecally and showed minimal variation in autoantigen affinities within clonal expansions. Also, limited CSF T cell receptor clonality was observed. In contrast, a comparison of germline-encoded BCRs versus the founder intrathecal clone revealed marked gains in both affinity and mutational distances (P = 0.004 and P < 0.0001, respectively). Taken together, in patients with LGI1 and CASPR2 antibody encephalitis, our results identify CSF as a compartment with a remarkably high frequency of clonally expanded autoantigen-reactive ASCs whose BCR maturity appears dominantly acquired outside the CNS.
Subject(s)
Autoimmune Diseases of the Nervous System , Encephalitis , Glioma , Hashimoto Disease , Humans , Leucine , Intracellular Signaling Peptides and Proteins , Local Neoplasm Recurrence , Autoantibodies , Autoantigens
ABSTRACT
Antibodies are key proteins of the adaptive immune system, and there exists a large body of academic literature and patents dedicated to their study and concomitant conversion into therapeutics, diagnostics, or reagents. These documents often contain extensive functional characterisations of the sets of antibodies they describe. However, leveraging these heterogeneous reports, for example to offer insights into the properties of query antibodies of interest, is currently challenging as there is no central repository through which this wide corpus can be mined by sequence or structure. Here, we present PLAbDab (the Patent and Literature Antibody Database), a self-updating repository containing over 150,000 paired antibody sequences and 3D structural models, of which over 65,000 are unique. We describe the methods used to extract, filter, pair, and model the antibodies in PLAbDab, and showcase how PLAbDab can be searched by sequence, structure, or keyword. PLAbDab uses include annotating query antibodies with potential antigen information from similar entries, analysing structural models of existing antibodies to identify modifications that could improve their properties, and facilitating the compilation of bespoke datasets of antibody sequences/structures that bind to a specific antigen. PLAbDab is freely available via GitHub (https://github.com/oxpig/PLAbDab) and as a searchable webserver (https://opig.stats.ox.ac.uk/webapps/plabdab/).
Subject(s)
Antibodies , Factual Databases , Antibodies/chemistry , Antibodies/genetics , Antigens/metabolism , Molecular Models , Patents as Topic , Internet
ABSTRACT
Nanobodies are essential proteins of the adaptive immune systems of camelid and shark species, complementing conventional antibodies. Properties such as their relatively small size, solubility and high thermostability make VHH (variable heavy domain of the heavy chain) and VNAR (variable new antigen receptor) modalities a promising therapeutic format and a valuable resource for a wide range of biological applications. The volume of academic literature and patents related to nanobodies has risen significantly over the past decade. Here, we present PLAbDab-nano, a nanobody complement to the Patent and Literature Antibody Database (PLAbDab). PLAbDab-nano is a self-updating, searchable repository containing ~5000 annotated VHH and VNAR sequences. We describe the methods used to curate the entries in PLAbDab-nano, and highlight how PLAbDab-nano could be used to design diverse libraries, as well as find sequences similar to known patented or therapeutic entries. PLAbDab-nano is freely available as a searchable web server (https://opig.stats.ox.ac.uk/webapps/plabdab-nano/).
ABSTRACT
MOTIVATION: The versatile binding properties of antibodies have made them an extremely important class of biotherapeutics. However, therapeutic antibody development is a complex, expensive and time-consuming task, with the final antibody needing to not only have strong and specific binding, but also be minimally impacted by developability issues. The success of transformer-based language models in protein sequence space and the availability of vast amounts of antibody sequences have led to the development of many antibody-specific language models to help guide antibody design. Antibody diversity primarily arises from V(D)J recombination, mutations within the CDRs, and/or from a few non-germline mutations outside the CDRs. Consequently, a significant portion of the variable domain of all natural antibody sequences remains germline. This affects the pre-training of antibody-specific language models, where this facet of the sequence data introduces a prevailing bias towards germline residues. This poses a challenge, as mutations away from the germline are often vital for generating specific and potent binding to a target, meaning that language models need to be able to suggest key mutations away from germline. RESULTS: In this study, we explore the implications of the germline bias, examining its impact on both general-protein and antibody-specific language models. We develop and train a series of new antibody-specific language models optimised for predicting non-germline residues. We then compare our final model, AbLang-2, with current models and show how it suggests a diverse set of valid mutations with high cumulative probability. AVAILABILITY AND IMPLEMENTATION: AbLang-2 is trained on both unpaired and paired data, and is freely available at https://github.com/oxpig/AbLang2.git. SUPPLEMENTARY INFORMATION: Supplementary data are available at Journal Name online.
ABSTRACT
SUMMARY: In this article, we introduce ABodyBuilder3, an improved and scalable antibody structure prediction model based on ABodyBuilder2. We achieve a new state-of-the-art accuracy in the modelling of CDR loops by leveraging language model embeddings, and show how predicted structures can be further improved through careful relaxation strategies. Finally, we incorporate a predicted Local Distance Difference Test into the model output to allow for a more accurate estimation of uncertainties. AVAILABILITY AND IMPLEMENTATION: The software package is available at https://github.com/Exscientia/ABodyBuilder3 with model weights and data at https://zenodo.org/records/11354577.
Subject(s)
Antibodies , Software , Antibodies/chemistry , Antibodies/immunology , Computational Biology/methods , Molecular Models , Protein Conformation , Complementarity Determining Regions/chemistry
ABSTRACT
MOTIVATION: Antibody-antigen complex modelling is an important step in computational workflows for therapeutic antibody design. While experimentally determined structures of both antibody and the cognate antigen are often not available, recent advances in machine learning-driven protein modelling have enabled accurate prediction of both antibody and antigen structures. Here, we analyse the ability of protein-protein docking tools to use machine learning-generated input structures for information-driven docking. RESULTS: In an information-driven scenario, we find that HADDOCK can generate accurate models of antibody-antigen complexes using an ensemble of antibody structures generated by machine learning tools and AlphaFold2-predicted antigen structures. Targeted docking using knowledge of the complementarity-determining regions on the antibody and some information about the targeted epitope allows the generation of high-quality models of the complex with reduced sampling, resulting in a computationally cheap protocol that outperforms the ZDOCK baseline. AVAILABILITY AND IMPLEMENTATION: The source code of HADDOCK3 is freely available at github.com/haddocking/haddock3. The code to generate and analyse the data is available at github.com/haddocking/ai-antibodies. The full runs, including docking models from all modules of a workflow, have been deposited in our lab collection (data.sbgrid.org/labs/32/1139) at the SBGRID data repository.
Subject(s)
Antigen-Antibody Complex , Machine Learning , Molecular Docking Simulation , Antigen-Antibody Complex/chemistry , Software , Antibodies/chemistry , Antigens/chemistry , Antigens/immunology , Epitopes/chemistry , Epitopes/immunology
ABSTRACT
A novel class of protein misfolding characterized by either the formation of non-native noncovalent lasso entanglements in the misfolded structure or loss of native entanglements has been predicted to exist and found circumstantial support through biochemical assays and limited-proteolysis mass spectrometry data. Here, we examine whether it is possible to design small molecule compounds that can bind to specific folding intermediates and thereby avoid these misfolded states in computer simulations under idealized conditions (perfect drug-binding specificity, zero promiscuity, and a smooth energy landscape). Studying two proteins, type III chloramphenicol acetyltransferase (CAT-III) and D-alanyl-D-alanine ligase B (DDLB), that were previously suggested to form soluble misfolded states through a mechanism involving a failure-to-form of native entanglements, we explore two different drug design strategies using coarse-grained structure-based models. The first strategy, in which the native entanglement is stabilized by drug binding, failed to decrease misfolding because it formed an alternative entanglement at a nearby region. The second strategy, in which a small molecule was designed to bind to a non-native tertiary structure and thereby destabilize the native entanglement, succeeded in decreasing misfolding and increasing the native state population. This strategy worked because destabilizing the entanglement loop provided more time for the threading segment to position itself correctly to be wrapped by the loop to form the native entanglement. Further, we computationally identified several FDA-approved drugs with the potential to bind these intermediate states and rescue misfolding in these proteins. This study suggests it is possible for small molecule drugs to prevent protein misfolding of this type.
Subject(s)
Protein Folding , Proteins , Proteins/chemistry , Computer Simulation , Software , Mass Spectrometry
ABSTRACT
SUMMARY: The development of new vaccines and antibody therapeutics typically takes several years and requires over $1bn in investment. Accurate knowledge of the paratope (antibody binding site) can speed up and reduce the cost of this process by improving our understanding of antibody-antigen binding. We present Paragraph, a structure-based paratope prediction tool that outperforms current state-of-the-art tools using simpler feature vectors and no antigen information. AVAILABILITY AND IMPLEMENTATION: Source code is freely available at www.github.com/oxpig/Paragraph. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Antibodies , Computer Neural Networks , Antibody Binding Sites , Software , Antigens
ABSTRACT
In 2013, we released the Structural Antibody Database (SAbDab), a publicly available repository of experimentally determined antibody structures. In the interim, the rapid increase in the number of antibody structure depositions to the Protein Data Bank, driven primarily by increased interest in antibodies as biotherapeutics, has led us to implement several improvements to the original database infrastructure. These include the development of SAbDab-nano, a sub-database that tracks nanobodies (heavy chain-only antibodies) which have seen a particular growth in attention from both the academic and pharmaceutical research communities over the past few years. Both SAbDab and SAbDab-nano are updated weekly, comprehensively annotated with the latest features described here, and are freely accessible at opig.stats.ox.ac.uk/webapps/newsabdab/.
Subject(s)
Antibodies/genetics , Genetic Databases , Single-Domain Antibodies/genetics , Software , Antibodies/immunology , Humans , Immunoglobulin Heavy Chains/genetics , Immunoglobulin Heavy Chains/immunology , Single-Domain Antibodies/immunology , Single-Domain Antibodies/therapeutic use
ABSTRACT
Proteins often undergo structural perturbations upon binding to other proteins or ligands or when they are subjected to environmental changes. Hydrogen-deuterium exchange mass spectrometry (HDX-MS) can be used to explore conformational changes in proteins by examining differences in the rate of deuterium incorporation in different contexts. To determine deuterium incorporation rates, HDX-MS measurements are typically made over a time course. Recently introduced methods show that incorporating the temporal dimension into the statistical analysis improves power and interpretation. However, these approaches have technical assumptions that hinder their flexibility. Here, we propose a more flexible methodology by reframing these methods in a Bayesian framework. Our proposed framework has improved algorithmic stability, allows us to perform uncertainty quantification, and can calculate statistical quantities that are inaccessible to other approaches. We demonstrate the general applicability of the method by showing that it can perform rigorous model selection on a spike-in HDX-MS experiment, improve interpretation in an epitope mapping experiment, and increase sensitivity in a small-molecule case study. Bayesian analysis of an HDX experiment with an antibody dimer bound to an E3 ubiquitin ligase identifies at least two interaction interfaces where previous methods obtained confounding results due to the complexities of conformational changes on binding. Our findings are consistent with the cocrystal structure of these proteins, demonstrating that a Bayesian approach can identify important binding epitopes from HDX data. We also generate HDX-MS data of the bromodomain-containing protein BRD4 in complex with GSK1210151A to demonstrate the increased sensitivity of adopting a Bayesian approach.
Subject(s)
Deuterium Exchange Measurement , Hydrogen-Deuterium Exchange Mass Spectrometry , Bayes Theorem , Deuterium/chemistry , Deuterium Exchange Measurement/methods , Nuclear Proteins , Mass Spectrometry/methods , Transcription Factors
ABSTRACT
SUMMARY: Motivation. Predicting the native state of a protein has long been considered a gateway problem for understanding protein folding. Recent advances in structural modeling driven by deep learning have achieved unprecedented success at predicting a protein's crystal structure, but it is not clear if these models are learning the physics of how proteins dynamically fold into their equilibrium structure or are just accurate knowledge-based predictors of the final state. Results. In this work, we compare the pathways generated by state-of-the-art protein structure prediction methods to experimental data about protein folding pathways. The methods considered were AlphaFold 2, RoseTTAFold, trRosetta, RaptorX, DMPfold, EVfold, SAINT2 and Rosetta. We find evidence that their simulated dynamics capture some information about the folding pathway, but their predictive ability is worse than a trivial classifier using sequence-agnostic features like chain length. The folding trajectories produced are also uncorrelated with experimental observables such as intermediate structures and the folding rate constant. These results suggest that recent advances in structure prediction do not yet provide an enhanced understanding of protein folding. Availability. The data underlying this article are available in GitHub at https://github.com/oxpig/structure-vs-folding/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Protein Folding , Proteins , Proteins/chemistry , Physics
ABSTRACT
MOTIVATION: Antibodies are a key component of the immune system and have been extensively used as biotherapeutics. Accurate knowledge of their structure is central to understanding their antigen-binding function. The key area for antigen binding and the main area of structural variation in antibodies are concentrated in the six complementarity determining regions (CDRs), with the most important for binding and most variable being the CDR-H3 loop. The sequence and structural variability of CDR-H3 make it particularly challenging to model. Recently deep learning methods have offered a step change in our ability to predict protein structures. RESULTS: In this work, we present ABlooper, an end-to-end equivariant deep learning-based CDR loop structure prediction tool. ABlooper rapidly predicts the structure of CDR loops with high accuracy and provides a confidence estimate for each of its predictions. On the models of the Rosetta Antibody Benchmark, ABlooper makes predictions with an average CDR-H3 RMSD of 2.49 Å, which drops to 2.05 Å when considering only its 75% most confident predictions. AVAILABILITY AND IMPLEMENTATION: https://github.com/oxpig/ABlooper. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
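To make the two reported figures concrete, the following is a minimal sketch of how a loop RMSD and a "most confident fraction" average could be computed. It is not ABlooper's code: the function names, the assumption that predicted and reference coordinates are already in the same frame, and the convention that a lower uncertainty value means higher confidence are all illustrative assumptions.

```python
import math

def rmsd(pred, ref):
    """RMSD between two equal-length lists of (x, y, z) points.

    Assumes both coordinate sets are already superposed in the same frame.
    """
    if len(pred) != len(ref) or not pred:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum((p - r) ** 2 for a, b in zip(pred, ref) for p, r in zip(a, b))
    return math.sqrt(squared / len(pred))

def mean_rmsd_most_confident(rmsds, uncertainty, keep_fraction=0.75):
    """Mean RMSD over the keep_fraction of predictions with the lowest uncertainty.

    `uncertainty` is a hypothetical per-prediction score (lower = more confident).
    """
    order = sorted(range(len(rmsds)), key=lambda i: uncertainty[i])
    n_keep = max(1, int(len(rmsds) * keep_fraction))
    kept = [rmsds[i] for i in order[:n_keep]]
    return sum(kept) / len(kept)
```

With a useful confidence estimate, the filtered mean should fall below the unfiltered mean, which is the pattern the abstract reports (2.49 Å dropping to 2.05 Å).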
Subject(s)
Antibodies , Complementarity Determining Regions , Protein Conformation , Molecular Models , Complementarity Determining Regions/chemistry , Antibodies/chemistry
ABSTRACT
MOTIVATION: Antibodies are one of the most important classes of pharmaceuticals, with over 80 approved molecules currently in use against a wide variety of diseases. The drug discovery process for antibody therapeutic candidates however is time- and cost-intensive and heavily reliant on in vivo and in vitro high throughput screens. Here, we introduce a framework for structure-based deep learning for antibodies (DLAB) which can virtually screen putative binding antibodies against antigen targets of interest. DLAB is built to be able to predict antibody-antigen binding for antigens with no known antibody binders. RESULTS: We demonstrate that DLAB can be used both to improve antibody-antigen docking and structure-based virtual screening of antibody drug candidates. DLAB enables improved pose ranking for antibody docking experiments as well as selection of antibody-antigen pairings for which accurate poses are generated and correctly ranked. We also show that DLAB can identify binding antibodies against specific antigens in a case study. Our results demonstrate the promise of deep learning methods for structure-based virtual screening of antibodies. AVAILABILITY AND IMPLEMENTATION: The DLAB source code and pre-trained models are available at https://github.com/oxpig/dlab-public. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Deep Learning , Antibodies/chemistry , Antigens , Software
ABSTRACT
Fragment merging is a promising approach to progressing fragments directly to on-scale potency: each designed compound incorporates the structural motifs of overlapping fragments in a way that ensures compounds recapitulate multiple high-quality interactions. Searching commercial catalogues provides one useful way to quickly and cheaply identify such merges and circumvents the challenge of synthetic accessibility, provided they can be readily identified. Here, we demonstrate that the Fragment Network, a graph database that provides a novel way to explore the chemical space surrounding fragment hits, is well-suited to this challenge. We use an iteration of the database containing >120 million catalogue compounds to find fragment merges for four crystallographic screening campaigns and contrast the results with a traditional fingerprint-based similarity search. The two approaches identify complementary sets of merges that recapitulate the observed fragment-protein interactions but lie in different regions of chemical space. We further show our methodology is an effective route to achieving on-scale potency by retrospective analyses for two different targets; in analyses of public COVID Moonshot and Mycobacterium tuberculosis EthR inhibitors, potential inhibitors with micromolar IC50 values were identified. This work demonstrates the use of the Fragment Network to increase the yield of fragment merges beyond that of a classical catalogue search.
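For context on the baseline, a fingerprint-based similarity search commonly ranks catalogue compounds by the Tanimoto coefficient between bit-vector fingerprints. The sketch below assumes fingerprints are already available as sets of on-bit indices; the function names, compound names, and bit sets are hypothetical, and this is not the search pipeline used in the study.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def rank_by_similarity(query, catalogue, threshold=0.0):
    """Return (name, similarity) pairs at or above threshold, most similar first.

    `catalogue` maps a compound name to its fingerprint (hypothetical data layout).
    """
    hits = [(name, tanimoto(query, fp)) for name, fp in catalogue.items()]
    hits = [hit for hit in hits if hit[1] >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

# Toy usage with made-up fingerprints.
toy_catalogue = {"cmpd_A": {1, 2, 3, 4}, "cmpd_B": {2, 3}, "cmpd_C": {7, 8}}
ranked = rank_by_similarity({1, 2, 3}, toy_catalogue, threshold=0.1)
```

A search of this kind rewards global bit overlap with a single query, which may be one reason it finds a different set of merges from a graph traversal that starts from two fragments.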
Subject(s)
COVID-19 , Mycobacterium tuberculosis , Humans , Retrospective Studies , Factual Databases , Crystallography
ABSTRACT
The electrostatic properties of proteins arise from the number and distribution of polar and charged residues. Electrostatic interactions in proteins play a critical role in numerous processes such as molecular recognition, protein solubility, viscosity, and antibody developability. Thus, characterizing and quantifying electrostatic properties of a protein are prerequisites for understanding these processes. Here, we present PEP-Patch, a tool to visualize and quantify the electrostatic potential on the protein surface in terms of surface patches, denoting separated areas of the surface with a common physical property. We highlight its applicability to elucidate protease substrate specificity and antibody-antigen recognition and predict heparin column retention times of antibodies as an indicator of pharmacokinetics.
Subject(s)
Antibodies , Proteins , Static Electricity , Proteins/chemistry , Solubility , Viscosity
ABSTRACT
Over the past few years, many machine learning-based scoring functions for predicting the binding of small molecules to proteins have been developed. Their objective is to approximate the distribution which takes two molecules as input and outputs the energy of their interaction. Only a scoring function that accounts for the interatomic interactions involved in binding can accurately predict binding affinity on unseen molecules. However, many scoring functions make predictions based on data set biases rather than an understanding of the physics of binding. These scoring functions perform well when tested on similar targets to those in the training set but fail to generalize to dissimilar targets. To test what a machine learning-based scoring function has learned, input attribution, a technique for learning which features are important to a model when making a prediction on a particular data point, can be applied. If a model successfully learns something beyond data set biases, attribution should give insight into the important binding interactions that are taking place. We built a machine learning-based scoring function that aimed to avoid the influence of bias via thorough train and test data set filtering and show that it achieves comparable performance on the Comparative Assessment of Scoring Functions, 2016 (CASF-2016) benchmark to other leading methods. We then use the CASF-2016 test set to perform attribution and find that the bonds identified as important by PointVS, unlike those extracted from other scoring functions, have a high correlation with those found by a distance-based interaction profiler. We then show that attribution can be used to extract important binding pharmacophores from a given protein target when supplied with a number of bound structures. We use this information to perform fragment elaboration and see improvements in docking scores compared to using structural information from a traditional, data-based approach. This not only provides definitive proof that the scoring function has learned to identify some important binding interactions but also constitutes the first deep learning-based method for extracting structural information from a target for molecule design.
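As a rough illustration of input attribution, the sketch below uses occlusion: each input feature is replaced by a baseline value in turn, and the resulting change in the model's score is taken as that feature's importance. The abstract does not say which attribution technique was used, so occlusion is an assumption here, and `toy_score` is a made-up linear stand-in rather than a real scoring function.

```python
def occlusion_attribution(score_fn, features, baseline=0.0):
    """Importance of each feature as the score drop when it is replaced by `baseline`."""
    full_score = score_fn(features)
    attributions = []
    for i in range(len(features)):
        masked = list(features)
        masked[i] = baseline  # occlude one feature at a time
        attributions.append(full_score - score_fn(masked))
    return attributions

def toy_score(features):
    """Hypothetical scoring function: a fixed linear model over three features."""
    weights = [2.0, 0.0, -1.0]
    return sum(w * x for w, x in zip(weights, features))

importances = occlusion_attribution(toy_score, [1.0, 5.0, 3.0])
```

In this toy case the second feature has a large value but zero attribution, because the model ignores it. That is the kind of signal the abstract relies on: attributions that line up with physically meaningful interactions suggest the model is using them.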
Subject(s)
Machine Learning , Proteins , Protein Binding , Ligands , Proteins/chemistry , Protein Databases , Molecular Docking Simulation
ABSTRACT
Proteomics is a data-rich science with complex experimental designs and an intricate measurement process. To obtain insights from the large data sets produced, statistical methods, including machine learning, are routinely applied. For a quantity of interest, many of these approaches only produce a point estimate, such as a mean, leaving little room for more nuanced interpretations. By contrast, Bayesian statistics allows quantification of uncertainty through the use of probability distributions. These probability distributions enable scientists to ask complex questions of their proteomics data. Bayesian statistics also offers a modular framework for data analysis by making dependencies between data and parameters explicit. Hence, specifying complex hierarchies of parameter dependencies is straightforward in the Bayesian framework. This allows us to use a statistical methodology which equals, rather than neglects, the sophistication of experimental design and instrumentation present in proteomics. Here, we review Bayesian methods applied to proteomics, demonstrating their potential power, alongside the challenges posed by adopting this new statistical framework. To illustrate our review, we give a walk-through of the development of a Bayesian model for dynamic organic orthogonal phase-separation (OOPS) data.
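The contrast between a point estimate and a posterior distribution can be sketched with the simplest conjugate case: a normal likelihood with known noise and a normal prior on the mean. This is only a toy illustration, not the OOPS model developed in the review; the function names, the known-noise assumption, and the example numbers are all hypothetical.

```python
import math

def normal_posterior(data, sigma, prior_mean, prior_sd):
    """Posterior mean and sd for the mean of normal data with known noise sd `sigma`."""
    n = len(data)
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = n / sigma ** 2
    post_var = 1.0 / (prior_precision + data_precision)
    sample_mean = sum(data) / n  # the usual point estimate
    post_mean = post_var * (prior_precision * prior_mean + data_precision * sample_mean)
    return post_mean, math.sqrt(post_var)

def credible_interval(mean, sd, z=1.96):
    """Central interval of a normal posterior (z = 1.96 gives roughly 95%)."""
    return mean - z * sd, mean + z * sd

# Made-up log-abundance values for one protein.
post_mean, post_sd = normal_posterior([1.0, 1.0, 1.0, 1.0], sigma=1.0,
                                      prior_mean=0.0, prior_sd=1.0)
low, high = credible_interval(post_mean, post_sd)
```

The sample mean alone reports a single number, whereas the posterior also carries a width that shrinks as more replicates are added, which is what allows uncertainty-aware questions to be asked of the data.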
Subject(s)
Machine Learning , Proteomics , Bayes Theorem , Probability , Uncertainty
ABSTRACT
MOTIVATION: An essential step in the development of virtual screening methods is the use of established sets of actives and decoys for benchmarking and training. However, the decoy molecules in commonly used sets are biased, meaning that methods often exploit these biases to separate actives and decoys, and do not necessarily learn to perform molecular recognition. This fundamental issue prevents generalization and hinders virtual screening method development. RESULTS: We have developed a deep learning method (DeepCoy) that generates decoys to a user's preferred specification in order to remove such biases or construct sets with a defined bias. We validated DeepCoy using two established benchmarks, DUD-E and DEKOIS 2.0. For all 102 DUD-E targets and 80 of the 81 DEKOIS 2.0 targets, our generated decoy molecules more closely matched the active molecules' physicochemical properties while introducing no discernible additional risk of false negatives. The DeepCoy decoys improved the Deviation from Optimal Embedding (DOE) score by an average of 81% and 66%, respectively, decreasing from 0.166 to 0.032 for DUD-E and from 0.109 to 0.038 for DEKOIS 2.0. Further, the generated decoys are harder to distinguish than the original decoy molecules via docking with Autodock Vina, with virtual screening performance falling from an AUC ROC of 0.70 to 0.63. AVAILABILITY AND IMPLEMENTATION: The code is available at https://github.com/oxpig/DeepCoy. Generated molecules can be downloaded from http://opig.stats.ox.ac.uk/resources. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
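The relationship between the quoted DOE scores and the quoted percentage improvements can be checked with a short calculation. The sketch below computes the relative decrease from the rounded mean scores given in the abstract; the helper name is hypothetical. A figure derived from rounded means need not match an average taken over per-target improvements exactly, so small differences from the reported percentages are to be expected.

```python
def relative_decrease(before, after):
    """Fractional reduction from `before` to `after` (0.5 means a 50% decrease)."""
    if before == 0:
        raise ValueError("`before` must be non-zero")
    return (before - after) / before

# Rounded mean DOE scores quoted in the abstract.
dud_e = relative_decrease(0.166, 0.032)
dekois = relative_decrease(0.109, 0.038)
print(f"DUD-E: {dud_e:.1%}, DEKOIS 2.0: {dekois:.1%}")
```

From the rounded means this gives roughly 81% for DUD-E and roughly 65% for DEKOIS 2.0, in line with the reported 81% and 66% once rounding of the inputs is taken into account.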