ABSTRACT
The ability to accurately predict antibody-antigen complex structures from their sequences could greatly advance our understanding of the immune system and would aid in the development of novel antibody therapeutics. There have been considerable recent advancements in predicting protein-protein interactions (PPIs) fueled by progress in machine learning (ML). To understand the current state of the field, we compare six representative methods for predicting antibody-antigen complexes from sequence, including two deep learning approaches trained to predict PPIs in general (AlphaFold-Multimer and RoseTTAFold), two composite methods that initially predict antibody and antigen structures separately and dock them (using antibody-mode ClusPro), local refinement in Rosetta (SnugDock) of globally docked poses from ClusPro, and a pipeline combining homology modeling with rigid-body docking informed by ML-based epitope and paratope prediction (AbAdapt). We find that AlphaFold-Multimer outperformed other methods, although the absolute performance leaves considerable room for improvement. AlphaFold-Multimer models of lower quality display significant structural biases at the level of tertiary motifs (TERMs) toward having fewer structural matches in non-antibody-containing structures from the Protein Data Bank (PDB). Specifically, better models exhibit more common PDB-like TERMs at the antibody-antigen interface than worse ones. Importantly, the clear relationship between performance and the commonness of interfacial TERMs suggests that the scarcity of interfacial geometry data in the structural database may currently limit the application of ML to the prediction of antibody-antigen interactions.
Subjects
Antigen-Antibody Complex , Antigen-Antibody Complex/chemistry , Protein Conformation , Antibodies/chemistry , Antibodies/immunology , Molecular Docking Simulation , Models, Molecular , Humans
ABSTRACT
Comparing accuracies of structural protein-protein interaction (PPI) models for different complexes on an absolute scale is a challenge, requiring normalization of scores across structures of different sizes and shapes. To help address this challenge, we have developed a statistical significance metric for docking models, called random-docking (RD) p-value. This score evaluates a PPI model based on how likely a random docking process is to produce a model of better or equal accuracy. The binding partners are randomly docked against each other a large number of times, and the probability of sampling a model of equal or greater accuracy from this reference distribution is the RD p-value. Using a subset of top predicted models from CAPRI (Critical Assessment of PRediction of Interactions) rounds over 2017-2020, we find that the ease of achieving a given root mean squared deviation or DockQ score varies considerably by target; achieving the same relative metric can be thousands of times easier for one complex compared to another. In contrast, RD p-values inherently normalize scores for models of different complexes, making them globally comparable. Furthermore, one can calculate RD p-values after generating a reference distribution that accounts for prior information about the interface geometry, such as residues involved in binding, by giving the random-docking process access to the same information. Thus, one can decouple improvements in prediction accuracy that arise solely from basic modeling constraints from those due to the rest of the method. We provide efficient code for computing RD p-values at https://github.com/Grigoryanlab/RDP.
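The definition above (the probability that a randomly docked pose is at least as accurate as the model) can be illustrated with a small empirical estimate. This is only a sketch under stated assumptions, not the code in the RDP repository: the function name `rd_p_value`, the use of RMSD as the accuracy measure, and the toy reference values are all hypothetical.

```python
from typing import Sequence


def rd_p_value(model_rmsd: float, random_rmsds: Sequence[float]) -> float:
    """Empirical fraction of random poses at least as accurate as the model.

    Assumption: accuracy is measured by RMSD, so "equal or greater accuracy"
    means an RMSD less than or equal to the model's RMSD.
    """
    if not random_rmsds:
        raise ValueError("reference distribution is empty")
    as_good = sum(1 for r in random_rmsds if r <= model_rmsd)
    return as_good / len(random_rmsds)


# Toy reference distribution (hypothetical RMSDs, in angstroms).
reference = [12.0, 8.5, 20.1, 3.0, 15.2, 9.9, 30.4, 2.0]
print(rd_p_value(3.0, reference))
```

With a real reference distribution the sample would need to be far larger than this toy list, since the smallest nonzero estimate this approach can return is one over the number of random poses.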
Subjects
Protein Interaction Mapping , Proteins , Proteins/chemistry , Protein Interaction Mapping/methods , Molecular Docking Simulation , Protein Conformation , Protein Binding , Software , Algorithms , Computational Biology/methods , Binding Sites
ABSTRACT
Three billion years of evolution has produced a tremendous diversity of protein molecules, but the full potential of proteins is likely to be much greater. Accessing this potential has been challenging for both computation and experiments because the space of possible protein molecules is much larger than the space of those likely to have functions. Here we introduce Chroma, a generative model for proteins and protein complexes that can directly sample novel protein structures and sequences, and that can be conditioned to steer the generative process towards desired properties and functions. To enable this, we introduce a diffusion process that respects the conformational statistics of polymer ensembles, an efficient neural architecture for molecular systems that enables long-range reasoning with sub-quadratic scaling, layers for efficiently synthesizing three-dimensional structures of proteins from predicted inter-residue geometries and a general low-temperature sampling algorithm for diffusion models. Chroma achieves protein design as Bayesian inference under external constraints, which can involve symmetries, substructure, shape, semantics and even natural-language prompts. The experimental characterization of 310 proteins shows that sampling from Chroma results in proteins that are highly expressed, fold and have favourable biophysical properties. The crystal structures of two designed proteins exhibit atomistic agreement with Chroma samples (a backbone root-mean-square deviation of around 1.0 Å). With this unified approach to protein design, we hope to accelerate the programming of protein matter to benefit human health, materials science and synthetic biology.
Subjects
Algorithms , Computer Simulation , Protein Conformation , Proteins , Humans , Bayes Theorem , Directed Molecular Evolution , Machine Learning , Models, Molecular , Protein Folding , Proteins/chemistry , Proteins/metabolism , Semantics , Synthetic Biology/methods , Synthetic Biology/trends
ABSTRACT
The gaseous hormone ethylene is perceived in plants by membrane-bound receptors, the best studied of these being ETR1 from Arabidopsis. Ethylene receptors can mediate a response to ethylene concentrations at less than one part per billion; however, the mechanistic basis for such high-affinity ligand binding has remained elusive. Here we identify an Asp residue within the ETR1 transmembrane domain that plays a critical role in ethylene binding. Site-directed mutation of the Asp to Asn results in a functional receptor that has a reduced affinity for ethylene, but still mediates ethylene responses in planta. The Asp residue is highly conserved among ethylene receptor-like proteins in plants and bacteria, but Asn variants exist, pointing to the physiological relevance of modulating ethylene-binding kinetics. Our results also support a bifunctional role for the Asp residue in forming a polar bridge to a conserved Lys residue in the receptor to mediate changes in signaling output. We propose a new structural model for the mechanism of ethylene binding and signal transduction, one with similarities to that found in a mammalian olfactory receptor.
Subjects
Arabidopsis Proteins , Arabidopsis , Arabidopsis/genetics , Arabidopsis/metabolism , Arabidopsis Proteins/metabolism , Receptors, Cell Surface/metabolism , Ethylenes/metabolism , Signal Transduction/physiology
ABSTRACT
We propose a high-throughput method for quantitatively measuring hundreds of protein-peptide binding affinities in parallel. In this assay, a solution of protein is dialyzed into a buffer containing a pool of potential binding peptides, such that upon equilibration the relative abundance of a peptide species is mathematically related to that peptide's dissociation constant, Kd. We use isobaric multiplexed quantitative proteomics to simultaneously determine the relative abundance, and hence the Kd and its associated error, for an entire peptide library. We apply this technique, which we call PEDAL (parallel equilibrium dialysis for affinity learning), to determine accurate Kd values between a PDZ domain and hundreds of peptides, spanning an affinity range of multiple orders of magnitude in a single experiment. PEDAL is a convenient, fast, and low-cost method for measuring large numbers of protein-peptide affinities in parallel, providing a rare combination of true in-solution binding equilibria with the ability to multiplex.
Subjects
Peptides , Renal Dialysis , Peptides/metabolism , Proteins , Mass Spectrometry , Peptide Library
ABSTRACT
Designing novel proteins to perform desired functions, such as binding or catalysis, is a major goal in synthetic biology. A variety of computational approaches can aid in this task. An energy-based framework rooted in the sequence-structure statistics of tertiary motifs (TERMs) can be used for sequence design on predefined backbones. Neural network models that use backbone coordinate-derived features provide another way to design new proteins. In this work, we combine the two methods to make neural structure-based models more suitable for protein design. Specifically, we supplement backbone-coordinate features with TERM-derived data, as inputs, and we generate energy functions as outputs. We present two architectures that generate Potts models over the sequence space: TERMinator, which uses both TERM-based and coordinate-based information, and COORDinator, which uses only coordinate-based information. Using these two models, we demonstrate that TERMs can be utilized to improve native sequence recovery performance of neural models. Furthermore, we demonstrate that sequences designed by TERMinator are predicted to fold to their target structures by AlphaFold. Finally, we show that both TERMinator and COORDinator learn notions of energetics, and these methods can be fine-tuned on experimental data to improve predictions. Our results suggest that using TERM-based and coordinate-based features together may be beneficial for protein design and that structure-based neural models that produce Potts energy tables have utility for flexible applications in protein science.
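For readers unfamiliar with Potts models over sequence space, the following is a minimal sketch of how such an energy table scores a sequence: a per-position self energy plus a pairwise energy for each coupled pair of positions. It is not the TERMinator or COORDinator implementation; the function name `potts_energy`, the two-letter alphabet, and all energy values are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

SelfEnergies = List[Dict[str, float]]
PairEnergies = Dict[Tuple[int, int], Dict[Tuple[str, str], float]]


def potts_energy(seq: str, self_e: SelfEnergies, pair_e: PairEnergies) -> float:
    """Sum self energies over positions and pair energies over coupled pairs."""
    total = sum(self_e[i][aa] for i, aa in enumerate(seq))
    for (i, j), table in pair_e.items():
        total += table[(seq[i], seq[j])]
    return total


# Hypothetical two-position model over the alphabet {A, L}.
self_e = [{"A": -1.0, "L": 0.5}, {"A": 0.2, "L": -0.8}]
pair_e = {
    (0, 1): {("A", "A"): 0.0, ("A", "L"): -0.3, ("L", "A"): 0.4, ("L", "L"): 0.1}
}

for candidate in ("AA", "AL", "LA", "LL"):
    print(candidate, round(potts_energy(candidate, self_e, pair_e), 3))
```

Because the energy is an explicit table rather than a single predicted sequence, the same model can be reused to score arbitrary variants, which is one reason such outputs are convenient for flexible downstream applications.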
Subjects
Neural Networks, Computer , Proteins , Amino Acid Sequence , Proteins/chemistry
ABSTRACT
OBJECTIVE: This study's objective was to determine whether gap detection deficits are present in a longstanding cohort of people living with HIV (PLWH) compared to those living without HIV (PLWOH) using a new gap detection modelling technique (i.e. fitting gap responses using the Hill equation and analysing the resulting individual gap detection curves with non-linear statistics). This approach provides a measure of both gap threshold and the steepness of the gap length/correct detection relationship. DESIGN: The relationship between gap length and the correct identification rate was modelled using the Hill equation. Results were analysed using a nonlinear mixed-effect regression model. STUDY SAMPLE: 45 PLWH (age range 41-78) and 39 PLWOH (age range 38-79) were enrolled and completed gap detection testing. RESULTS: The likelihood ratio statistic comparing the full regression model with the HIV effects to the null model, assuming one population curve for both groups, was highly significant (p < 0.001), suggesting a less precise relationship between gap length and correct detection in PLWH. CONCLUSIONS: PLWH showed degraded gap detection ability compared to PLWOH, likely due to central nervous system effects of HIV infection or treatment. The Hill equation provided a new approach for modelling gap detection ability.
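As an illustration of how a Hill equation yields both a threshold and a steepness, here is a minimal sketch of the standard two-parameter form. The exact parameterization used in the study is not given in the abstract, so the function name `hill_detection`, the parameter names, and the example values are assumptions; any guess-rate or lapse-rate terms the authors may have used are omitted.

```python
def hill_detection(gap_ms: float, threshold_ms: float, slope: float) -> float:
    """Standard Hill curve: proportion of gaps correctly detected.

    threshold_ms is the gap length at which detection reaches 50%;
    slope controls how steeply detection rises around that threshold.
    """
    if gap_ms <= 0:
        return 0.0
    return gap_ms ** slope / (threshold_ms ** slope + gap_ms ** slope)


# Hypothetical listener with a 4 ms threshold and a slope of 3.
for gap in (1.0, 2.0, 4.0, 8.0, 16.0):
    print(gap, round(hill_detection(gap, threshold_ms=4.0, slope=3.0), 3))
```

A shallower slope with the same threshold spreads the rise in detection over a wider range of gap lengths, which is the kind of "less precise" relationship the results describe.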
Subjects
HIV Infections , Humans , Adult , Middle Aged , Aged , HIV Infections/epidemiology , Nonlinear Dynamics , Surveys and Questionnaires
ABSTRACT
The optimal use of many biotherapeutics is restricted by anti-drug antibodies (ADAs) and hypersensitivity responses, which can affect potency and the ability to administer a treatment. Here we demonstrate that re-surfacing can be utilized as a generalizable approach to engineer proteins with extensive surface residue modifications in order to avoid binding by pre-existing ADAs. This technique was applied to E. coli asparaginase (ASN) to produce functional mutants with up to 58 substitutions, resulting in direct modification of 35% of surface residues. Re-surfaced ASNs exhibited significantly reduced binding to murine, rabbit and human polyclonal ADAs, with a negative correlation observed between binding and mutational distance from the native protein. Reductions in ADA binding correlated with diminished hypersensitivity responses in an in vivo mouse model. By using computational design approaches to traverse extended distances in mutational space while maintaining function, protein re-surfacing may provide a means to generate novel or second-line therapies for life-saving drugs with limited therapeutic alternatives.
Subjects
Asparaginase , Escherichia coli , Humans , Animals , Mice , Rabbits , Asparaginase/genetics , Asparaginase/therapeutic use , Escherichia coli/genetics , Antibodies , Membrane Proteins
ABSTRACT
Despite advances in protein engineering, the de novo design of small proteins or peptides that bind to a desired target remains a difficult task. Most computational methods search for binder structures in a library of candidate scaffolds, which can lead to designs with poor target complementarity and low success rates. Instead of choosing from pre-defined scaffolds, we propose that custom peptide structures can be constructed to complement a target surface. Our method mines tertiary motifs (TERMs) from known structures to identify surface-complementing fragments or "seeds." We combine seeds that satisfy geometric overlap criteria to generate peptide backbones and score the backbones to identify the most likely binding structures. We found that TERM-based seeds can describe known binding structures with high resolution: the vast majority of peptide binders from 486 peptide-protein complexes can be covered by seeds generated from single-chain structures. Furthermore, we demonstrate that known peptide structures can be reconstructed with high accuracy from peptide-covering seeds. As a proof of concept, we used our method to design 100 peptide binders of TRAF6, seven of which were predicted by Rosetta to form higher-quality interfaces than a native binder. The designed peptides interact with distinct sites on TRAF6, including the native peptide-binding site. These results demonstrate that known peptide-binding structures can be constructed from TERMs in single-chain structures and suggest that TERM information can be applied to efficiently design novel target-complementing binders.
Subjects
Peptides , TNF Receptor-Associated Factor 6 , Binding Sites , Peptides/chemistry , Protein Binding , Protein Engineering , TNF Receptor-Associated Factor 6/metabolism
ABSTRACT
Engineered proteins generally must possess a stable structure in order to achieve their designed function. Stable designs, however, are astronomically rare within the space of all possible amino acid sequences. As a consequence, many designs must be tested computationally and experimentally in order to find stable ones, which is expensive in terms of time and resources. Here we use a high-throughput, low-fidelity assay to experimentally evaluate the stability of approximately 200,000 novel proteins. These include a wide range of sequence perturbations, providing a baseline for future work in the field. We build a neural network model that predicts protein stability given only sequences of amino acids, and compare its performance to the assayed values. We also report another network model that is able to generate the amino acid sequences of novel stable proteins given requested secondary sequences. Finally, we show that the predictive model, despite weaknesses including a noisy data set, can be used to substantially increase the stability of both expert-designed and model-generated proteins.
Subjects
Neural Networks, Computer , Proteins , Amino Acid Sequence , Amino Acids , Protein Stability , Proteins/chemistry
ABSTRACT
Relating a protein's sequence to its conformation is a central challenge for both structure prediction and sequence design. Statistical contact potentials, as well as their more descriptive versions that account for side-chain orientation and other geometric descriptors, have served as simplistic but useful means of representing second-order contributions in sequence-structure relationships. Here we ask what happens when a pairwise potential is conditioned on the fully defined geometry of interacting backbone fragments. We show that the resulting structure-conditioned coupling energies more accurately reflect pair preferences as a function of structural contexts. These structure-conditioned energies more reliably encode native sequence information and more highly correlate with experimentally determined coupling energies. Clustering a database of interaction motifs by structure results in ensembles of similar energies and clustering them by energy results in ensembles of similar structures. By comparing many pairs of interaction motifs and showing that structural similarity and energetic similarity go hand-in-hand, we provide a tangible link between modular sequence and structure elements. This link is applicable to structural modeling, and we show that scoring CASP models with structure-conditioned energies results in substantially higher correlation with structural quality than scoring the same models with a contact potential. We conclude that structure-conditioned coupling energies are a good way to model the impact of interaction geometry on second-order sequence preferences.
Subjects
Amino Acids , Amino Acids/chemistry , Models, Molecular , Protein Conformation
ABSTRACT
Antibody-based therapeutics and vaccines are essential to combat COVID-19 morbidity and mortality after severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection. Multiple mutations in SARS-CoV-2 that could impair antibody defenses propagated in human-to-human transmission and spillover or spillback events between humans and animals. To develop prevention and therapeutic strategies, we formed an international consortium to map the epitope landscape on the SARS-CoV-2 spike protein, defining and structurally illustrating seven receptor binding domain (RBD)-directed antibody communities with distinct footprints and competition profiles. Pseudovirion-based neutralization assays reveal spike mutations, individually and clustered together in variants, that affect antibody function among the communities. Key classes of RBD-targeted antibodies maintain neutralization activity against these emerging SARS-CoV-2 variants. These results provide a framework for selecting antibody treatment cocktails and understanding how viral variants might affect antibody therapeutic efficacy.
Subjects
Antibodies, Neutralizing/immunology , Antibodies, Viral/immunology , Epitope Mapping , Immunodominant Epitopes/immunology , SARS-CoV-2/immunology , Spike Glycoprotein, Coronavirus/immunology , Antibodies, Neutralizing/therapeutic use , Antibodies, Viral/therapeutic use , Antigens, Viral/chemistry , Antigens, Viral/immunology , COVID-19/therapy , Humans , Immunodominant Epitopes/chemistry , Protein Binding , Protein Domains , Spike Glycoprotein, Coronavirus/chemistry
ABSTRACT
The exquisite structure-function correlations observed in filamentous protein assemblies provide a paradigm for the design of synthetic peptide-based nanomaterials. However, the plasticity of quaternary structure in sequence-space and the lability of helical symmetry present significant challenges to the de novo design and structural analysis of such filaments. Here, we describe a rational approach to design self-assembling peptide nanotubes based on controlling lateral interactions between protofilaments having an unusual cross-α supramolecular architecture. Near-atomic resolution cryo-EM structural analysis of seven designed nanotubes provides insight into the designability of interfaces within these synthetic peptide assemblies and identifies a non-native structural interaction based on a pair of arginine residues. This arginine clasp motif can robustly mediate cohesive interactions between protofilaments within the cross-α nanotubes. The structure of the resultant assemblies can be controlled through the sequence and length of the peptide subunits, which generates synthetic peptide filaments of similar dimensions to flagella and pili.
Subjects
Nanotubes, Peptide/ultrastructure , Arginine/chemistry , Arginine/genetics , Cryoelectron Microscopy , Models, Molecular , Nanotubes, Peptide/chemistry , Protein Conformation, alpha-Helical , Structure-Activity Relationship
ABSTRACT
BACKGROUND: Cervical nerve root avulsion is a well-documented result of high-velocity motor vehicle accidents (MVAs). In up to 21% of cases, preganglionic cervical root avulsion can result in a complex regional pain syndrome (CRPS) impacting the quality of life for patients already impaired by motor, sensory, and autonomic dysfunction. The optimal treatment strategies include repeated stellate ganglion blocks (SGBs). CASE DESCRIPTION: A 43-year-old male sustained a high-velocity MVA resulting in a left C8 nerve root avulsion. This resulted in weakness in the C8 distribution, tactile allodynia, and dysesthesias. Magnetic resonance imaging demonstrated an abnormal signal ventral to the C8-T1 level. As the patient was not considered a candidate for surgical intervention secondary to the attendant brachial plexus injury, a C7-C8 epidural steroid injection was performed; this did not provide improvement. Before placing a spinal cord stimulator, the patient underwent a series of six ultrasound-guided SGBs performed 2 weeks apart; there was 75% improvement in pain and strength. Six years later, the patient continues to do well while receiving SGBs 4 times a year. CONCLUSION: A preganglionic cervical nerve root avulsion should not be a contraindication for a stellate ganglion block in a patient with established CRPS.
ABSTRACT
Current state-of-the-art approaches to computational protein design (CPD) aim to capture the determinants of structure from physical principles. While this has led to many successful designs, it does have strong limitations associated with inaccuracies in physical modeling, such that a reliable general solution to CPD has yet to be found. Here, we propose a design framework based on identifying and applying patterns of sequence-structure compatibility found in known proteins, rather than approximating them from models of interatomic interactions. We carry out extensive computational analyses and an experimental validation for our method. Our results strongly argue that the Protein Data Bank is now sufficiently large to enable proteins to be designed by using only examples of structural motifs from unrelated proteins. Because our method is likely to have orthogonal strengths relative to existing techniques, it could represent an important step toward removing remaining barriers to robust CPD.
Subjects
Amino Acid Motifs , Computational Biology/methods , Protein Engineering/methods , Protein Structure, Tertiary , Proteins/chemistry , Amino Acid Substitution , Computer-Aided Design , Databases, Protein , Models, Molecular
ABSTRACT
Cytokines are critical for guiding the differentiation of T lymphocytes to perform specialized tasks in the immune response. Developing strategies to manipulate cytokine-signaling pathways holds promise to program T cell differentiation toward the most therapeutically useful direction. Suppressor of cytokine signaling (SOCS) proteins are attractive targets, as they effectively inhibit undesirable cytokine signaling. However, these proteins target multiple signaling pathways, some of which we may need to remain uninhibited. SOCS3 inhibits IL-12 signaling but also inhibits the IL-2-signaling pathway. In this study, we use computational protein design based on SOCS3 and JAK crystal structures to engineer a mutant SOCS3 with altered specificity. We generated a mutant SOCS3 designed to ablate interactions with JAK1 but maintain interactions with JAK2. We show that this mutant does indeed ablate JAK1 inhibition, although, unexpectedly, it still coimmunoprecipitates with JAK1 and does so to a greater extent than with JAK2. When expressed in CD8 T cells, mutant SOCS3 preserved inhibition of JAK2-dependent STAT4 phosphorylation following IL-12 treatment. However, inhibition of STAT phosphorylation was ablated following stimulation with JAK1-dependent cytokines IL-2, IFN-α, and IL-21. Wild-type SOCS3 inhibited CD8 T cell expansion in vivo and induced a memory precursor phenotype. In vivo T cell expansion was restored by expression of the mutant SOCS3, and this also reverted the phenotype toward effector T cell differentiation. These data show that SOCS proteins can be engineered to fine-tune their specificity, and this can exert important changes to T cell biology.
Subjects
CD8-Positive T-Lymphocytes/immunology , Cytokines/immunology , STAT4 Transcription Factor/immunology , STAT5 Transcription Factor/immunology , Suppressor of Cytokine Signaling 3 Protein/genetics , Animals , Cell Differentiation , Cells, Cultured , Gene Knockdown Techniques , Janus Kinase 1/immunology , Janus Kinase 2/immunology , Mice , Mice, Inbred C57BL , Mutation , Phosphorylation , Protein Engineering , Signal Transduction
ABSTRACT
Understanding the relationship between protein sequence and structure well enough to design new proteins with desired functions is a longstanding goal in protein science. Here, we show that recurring tertiary structural motifs (TERMs) in the PDB provide rich information for protein-peptide interaction prediction and design. TERM statistics can be used to predict peptide binding energies for Bcl-2 family proteins as accurately as widely used structure-based tools. Furthermore, design using TERM energies (dTERMen) rapidly and reliably generates high-affinity peptide binders of anti-apoptotic proteins Bfl-1 and Mcl-1 with just 15%-38% sequence identity to any known native Bcl-2 family protein ligand. High-resolution structures of four designed peptides bound to their targets provide opportunities to analyze the strengths and limitations of the computational design method. Our results support dTERMen as a powerful approach that can complement existing tools for protein engineering.
Subjects
Minor Histocompatibility Antigens/chemistry , Myeloid Cell Leukemia Sequence 1 Protein/chemistry , Peptides/chemistry , Proto-Oncogene Proteins c-bcl-2/chemistry , Amino Acid Sequence , Binding Sites , Cloning, Molecular , Crystallography, X-Ray , Gene Expression , Genetic Vectors/chemistry , Genetic Vectors/metabolism , Humans , Minor Histocompatibility Antigens/genetics , Minor Histocompatibility Antigens/metabolism , Molecular Docking Simulation , Myeloid Cell Leukemia Sequence 1 Protein/antagonists & inhibitors , Myeloid Cell Leukemia Sequence 1 Protein/genetics , Myeloid Cell Leukemia Sequence 1 Protein/metabolism , Peptides/genetics , Peptides/metabolism , Protein Binding , Protein Conformation, alpha-Helical , Protein Engineering , Protein Interaction Domains and Motifs , Protein Structure, Tertiary , Proto-Oncogene Proteins c-bcl-2/antagonists & inhibitors , Proto-Oncogene Proteins c-bcl-2/genetics , Proto-Oncogene Proteins c-bcl-2/metabolism , Recombinant Proteins/chemistry , Recombinant Proteins/genetics , Recombinant Proteins/metabolism , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Sequence Alignment , Structure-Activity Relationship , Thermodynamics
ABSTRACT
In order to increase the hit rate of discovering diverse, beneficial protein variants via high-throughput screening, we have developed a computational method to optimize combinatorial mutagenesis libraries for overall enrichment in two distinct properties of interest. Given scoring functions for evaluating individual variants, POCoM (Pareto Optimal Combinatorial Mutagenesis) scores entire libraries in terms of averages over their constituent members, and designs optimal libraries as sets of mutations whose combinations make the best trade-offs between average scores. This represents the first general-purpose method to directly design combinatorial libraries for multiple objectives characterizing their constituent members. Despite being rigorous in mapping out the Pareto frontier, it is also very fast even for very large libraries (e.g., designing 30-mutation, billion-member libraries in only hours). We here instantiate POCoM with scores based on a target's protein structure and its homologs' sequences, enabling the design of libraries containing variants balancing these two important yet quite different types of information. We demonstrate POCoM's generality and power in case study applications to green fluorescent protein, cytochrome P450, and β-lactamase. Analysis of the POCoM library designs provides insights into the trade-offs between structure- and sequence-based scores, as well as the impacts of experimental constraints on library designs. POCoM libraries incorporate mutations that have previously been found favorable experimentally, while diversifying the contexts in which these mutations are situated and maintaining overall variant quality.
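To make the notion of a Pareto frontier over two library-average scores concrete, here is a minimal brute-force sketch. It is not the POCoM algorithm, which designs libraries directly rather than filtering an enumerated list; the function name `pareto_front`, the assumption that lower scores are better for both objectives, and the example score pairs are all hypothetical.

```python
from typing import List, Tuple

Scores = Tuple[float, float]  # (average structure score, average sequence score)


def pareto_front(points: List[Scores]) -> List[Scores]:
    """Return the points not dominated by any other point.

    Assumption: both objectives are minimized. A point q dominates p when q is
    no worse than p on both objectives and strictly better on at least one.
    """
    front = []
    for p in points:
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and (q[0] < p[0] or q[1] < p[1])
            for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(front)


# Hypothetical average scores for five candidate libraries.
libraries = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (2.5, 3.5)]
print(pareto_front(libraries))
```

This quadratic-time filter is only practical for small candidate lists; it is meant to show what "best trade-offs between average scores" means, not how to find them at the billion-member scale mentioned above.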
Subjects
Computational Biology/methods , Gene Library , Mutagenesis , Algorithms , Cytochrome P-450 Enzyme System/genetics , Green Fluorescent Proteins/metabolism , Models, Molecular , Mutation , Oligonucleotides/genetics , Programming Languages , Protein Engineering/methods , Proteins/genetics , Software , beta-Lactamases/genetics
ABSTRACT
Co-evolution between pairs of residues in a multiple sequence alignment (MSA) of homologous proteins has long been proposed as an indicator of structural contacts. Recently, several methods, such as direct-coupling analysis (DCA) and MetaPSICOV, have been shown to achieve impressive rates of contact prediction by taking advantage of considerable sequence data. In this paper, we show that prediction success rates are highly sensitive to the structural definition of a contact, with more permissive definitions (i.e., those classifying more pairs as true contacts) naturally leading to higher positive predictive rates, but at the expense of the amount of structural information contributed by each contact. Thus, the remaining limitations of contact prediction algorithms are most noticeable in conjunction with geometrically restrictive contacts-precisely those that contribute more information in structure prediction. We suggest that to improve prediction rates for such "informative" contacts one could combine co-evolution scores with additional indicators of contact likelihood. Specifically, we find that when a pair of co-varying positions in an MSA is occupied by residue pairs with favorable statistical contact energies, that pair is more likely to represent a true contact. We show that combining a contact potential metric with DCA or MetaPSICOV performs considerably better than DCA or MetaPSICOV alone, respectively. This is true regardless of contact definition, but especially true for stricter and more informative contact definitions. In summary, this work outlines some remaining challenges to be addressed in contact prediction and proposes and validates a promising direction towards improvement.