ABSTRACT
Proteins often act through oligomeric interactions with other proteins. X-ray crystallography and cryo-electron microscopy provide detailed information on the structures of biological assemblies, defined as the most likely biologically relevant structures derived from experimental data. In crystal structures, the most relevant assembly may be ambiguously determined, since multiple assemblies observed in the crystal lattice may be plausible. It is estimated that 10-15% of PDB entries may have incorrect or ambiguous assembly annotations. Accurate assemblies are required for understanding functional data and for training deep learning methods that predict assembly structures. As with any other kind of biological data, replication via multiple independent experiments provides important validation for the determination of biological assembly structures. Here we present the Protein Common Assembly Database (ProtCAD), which contains clusters of protein assembly structures observed in independent structure determinations of homologous proteins in the Protein Data Bank (PDB). ProtCAD is searchable by PDB entry, UniProt identifiers, or Pfam domain designations and provides downloads of coordinate files, PyMol scripts, and publicly available assembly annotations for each cluster of assemblies. About 60% of PDB entries contain assemblies in clusters of at least 2 independent experiments. All clusters and coordinates are available on the ProtCAD web site (http://dunbrack2.fccc.edu/protcad).
Subjects
Protein Databases , Multiprotein Complexes , Proteins , Cryoelectron Microscopy , X-Ray Crystallography , Proteins/chemistry , Multiprotein Complexes/chemistry
ABSTRACT
Janus Kinase-1 (JAK1) plays key roles during neurodevelopment and following neuronal injury, while activatory JAK1 mutations are linked to leukemia. In mice, Jak1 genetic deletion results in perinatal lethality, suggesting non-redundant roles and/or regulation of JAK1 for which other JAKs cannot compensate. Proteomic studies reveal that JAK1 is more likely palmitoylated compared to other JAKs, implicating palmitoylation as a possible JAK1-specific regulatory mechanism. However, the importance of palmitoylation for JAK1 signaling has not been addressed. Here, we report that JAK1 is palmitoylated in transfected HEK293T cells and endogenously in cultured Dorsal Root Ganglion (DRG) neurons. We further use comprehensive screening in transfected non-neuronal cells and shRNA-mediated knockdown in DRG neurons to identify the related enzymes ZDHHC3 and ZDHHC7 as dominant protein acyltransferases (PATs) for JAK1. Surprisingly, we found palmitoylation minimally affects JAK1 localization in neurons, but is critical for JAK1's kinase activity in cells and even in vitro. We propose this requirement is likely because palmitoylation facilitates transphosphorylation of key sites in JAK1's activation loop, a possibility consistent with structural models of JAK1. Importantly, we demonstrate a leukemia-associated JAK1 mutation overrides the palmitoylation-dependence of JAK1 activity, potentially explaining why this mutation is oncogenic. Finally, we show that JAK1 palmitoylation is important for neuropoietic cytokine-dependent signaling and neuronal survival and that combined Zdhhc3/7 loss phenocopies loss of palmitoyl-JAK1. These findings provide new insights into the control of JAK signaling in both physiological and pathological contexts.
Subjects
Cytokines , Lipoylation , Neurons , Signal Transduction , Animals , Female , Humans , Mice , Pregnancy , Cytokines/metabolism , Spinal Ganglia/metabolism , HEK293 Cells , Janus Kinase 1/genetics , Janus Kinase 1/metabolism , Neurons/cytology , Neurons/metabolism , Proteomics , Cell Survival
ABSTRACT
Although Ras/mitogen-activated protein kinase (MAPK) signaling is activated in most human cancers, attempts to target this pathway using kinase-active site inhibitors have not typically led to durable clinical benefit. To address this shortcoming, we sought to test the feasibility of an alternative targeting strategy, focused on the ERK2 substrate binding domains, D and DEF binding pocket (DBP). Disabling the ERK2-DBP domain in mice caused baseline erythrocytosis. Consequently, we investigated the role of the ERK2-D and -DBP domains in disease, using a JAK2-dependent model of polycythemia vera (PV). Of note, inactivation of the ERK2-DBP domain promoted the progression of disease from PV to myelofibrosis, suggesting that the ERK2-DBP domain normally opposes progression. ERK2-DBP inactivation also prevented oncogenic JAK2 kinase (JAK2V617F) from promoting oncogene-induced senescence in vitro. The ERK2-DBP mutation attenuated JAK2-mediated oncogene-induced senescence by preventing the physical interaction of ERK2 with the transcription factor Egr1. Because inactivation of the ERK2-DBP created a functional ERK2 kinase limited to binding substrates through its D domain, these data suggested that the D domain substrates were responsible for promoting oncogene-induced progenitor growth and tumor progression and that pharmacologic targeting of the ERK2-D domain may attenuate cancer cell growth. Indeed, pharmacologic agents targeting the ERK2-D domain were effective in attenuating the growth of JAK2-dependent myeloproliferative neoplasm cell lines. Taken together, these data indicate that the ERK-D and -DBP domains can play distinct roles in the progression of neoplasms and that the D domain has the potential to be a potent therapeutic target in Ras/MAPK-dependent cancers.
Subjects
Janus Kinase 2 , Polycythemia Vera , Animals , Cell Line , Humans , Janus Kinase 2/genetics , MAP Kinase Signaling System , Mice , Mitogen-Activated Protein Kinases , Phosphorylation , Signal Transduction
ABSTRACT
Myelination enables electrical impulses to propagate on axons at the highest speed, encoding essential life functions. The Rho family GTPases, RAC1 and CDC42, have been shown to critically regulate Schwann cell myelination. P21-activated kinase 2 (PAK2) is an effector of RAC1/CDC42, but its specific role in myelination remains undetermined. We produced a Schwann cell-specific knockout mouse of Pak2 (scPak2-/-) to evaluate PAK2's role in myelination. Deletion of Pak2 specifically in mouse Schwann cells resulted in severe hypomyelination, slowed nerve conduction velocity, and behavioral dysfunction in the scPak2-/- peripheral nerve. Many Schwann cells in scPak2-/- sciatic nerves were arrested at the stage of axonal sorting. These abnormalities were rescued by reintroducing Pak2, but not the kinase-dead mutant of Pak2, via lentivirus delivery to scPak2-/- Schwann cells in vivo. Moreover, ablation of Pak2 in Schwann cells blocked the promyelinating effect driven by neuregulin-1, prion protein, and inactivated RAC1/CDC42. Conversely, the ablation of Pak2 in neurons exhibited no phenotype. Such PAK2 activity can also be either enhanced or inhibited by different myelin lipids. We have identified a novel promyelinating factor, PAK2, that acts as a critical convergence point for multiple promyelinating signaling pathways. The promyelination by PAK2 is Schwann cell-autonomous. Myelin lipids, identified as inhibitors or activators of PAK2, may be utilized to develop therapies for repairing abnormal myelin in peripheral neuropathies.
ABSTRACT
The active form of kinases is shared across different family members, as are several commonly observed inactive forms. We previously performed a clustering of the conformation of the activation loop of all protein kinase structures in the Protein Data Bank (PDB) into eight classes based on the dihedral angles that place the Phe side chain of the DFG motif at the N-terminus of the activation loop. Our clusters are strongly associated with the placement of the activation loop, the C-helix, and other structural elements of kinases. We present Kincore, a web resource providing access to our conformational assignments for kinase structures in the PDB. While other available databases provide conformational states or drug type but not both, Kincore includes the conformational state and the inhibitor type (Type 1, 1.5, 2, 3, allosteric) for each kinase chain. The user can query and browse the database using these attributes or determine the conformational labels of a kinase structure using the web server or a standalone program. The database and labeled structure files can be downloaded from the server. Kincore will help in understanding the conformational dynamics of these proteins and guide development of inhibitors targeting specific states. Kincore is available at http://dunbrack.fccc.edu/kincore.
Subjects
Protein Databases , Protein Kinase Inhibitors/classification , Protein Kinases/classification , Software , Protein Conformation , Protein Kinase Inhibitors/chemistry , Protein Kinases/chemistry
ABSTRACT
Reliably scoring and ranking candidate models of protein complexes and assigning their oligomeric state from the structure of the crystal lattice represent outstanding challenges. A community-wide effort was launched to tackle these challenges. The latest resources on protein complexes and interfaces were exploited to derive a benchmark dataset consisting of 1677 homodimer protein crystal structures, including a balanced mix of physiological and non-physiological complexes. The non-physiological complexes in the benchmark were selected to bury a similar or larger interface area than their physiological counterparts, making it more difficult for scoring functions to differentiate between them. Next, 252 functions for scoring protein-protein interfaces previously developed by 13 groups were collected and evaluated for their ability to discriminate between physiological and non-physiological complexes. A simple consensus score generated using the best performing score of each of the 13 groups, and a cross-validated Random Forest (RF) classifier were created. Both approaches showed excellent performance, with an area under the Receiver Operating Characteristic (ROC) curve of 0.93 and 0.94, respectively, outperforming individual scores developed by different groups. Additionally, AlphaFold2 engines recalled the physiological dimers with significantly higher accuracy than the non-physiological set, lending support to the reliability of our benchmark dataset annotations. Optimizing the combined power of interface scoring functions and evaluating it on challenging benchmark datasets appears to be a promising strategy.
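The ROC areas quoted above measure how often a score ranks a physiological dimer above a non-physiological one. The following is a minimal sketch of that calculation, not the benchmark's own code. The element-wise mean used for the consensus is an assumption (the abstract does not state how the consensus score was built), the toy numbers are hypothetical, and the scores are assumed to be on comparable scales.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (Mann-Whitney form).

    labels: 1 = physiological, 0 = non-physiological.
    The result is the probability that a random positive outscores a
    random negative, with ties counted as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def consensus(score_lists):
    """Element-wise mean of several per-complex score lists (assumed scheme)."""
    return [sum(vals) / len(vals) for vals in zip(*score_lists)]


# Hypothetical toy data: three scoring functions applied to five complexes.
labels = [1, 1, 1, 0, 0]
group_scores = [
    [0.9, 0.4, 0.7, 0.5, 0.2],
    [0.8, 0.6, 0.3, 0.4, 0.1],
    [0.7, 0.9, 0.6, 0.8, 0.3],
]
auc = roc_auc(consensus(group_scores), labels)
```

The pairwise form is quadratic in the number of complexes, which is acceptable for a benchmark of a few thousand structures; a rank-based formulation would scale better.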
Subjects
Proteins , Reproducibility of Results , Proteins/metabolism , Protein Binding
ABSTRACT
BACKGROUND: Early-onset renal cell carcinoma (eoRCC) is typically associated with pathogenic germline variants (PGVs) in RCC familial syndrome genes. However, most eoRCC patients lack PGVs in familial RCC genes and their genetic risk remains undefined. METHODS: Here, we analyzed biospecimens from 22 eoRCC patients who were seen at our institution for genetic counseling and tested negative for PGVs in RCC familial syndrome genes. RESULTS: Analysis of whole-exome sequencing (WES) data found enrichment of candidate pathogenic germline variants in DNA repair and replication genes, including multiple DNA polymerases. Induction of DNA damage in peripheral blood mononuclear cells (PBMCs) significantly elevated numbers of γH2AX foci, a marker of double-strand breaks, in PBMCs from eoRCC patients versus PBMCs from matched cancer-free controls. Knockdown of candidate variant genes in Caki RCC cells increased γH2AX foci. Immortalized patient-derived B cell lines bearing the candidate variants in DNA polymerase genes (POLD1, POLH, POLE, POLK) had DNA replication defects compared to control cells. Renal tumors carrying these DNA polymerase variants were microsatellite stable but had a high mutational burden. Direct biochemical analysis of the variant Pol δ and Pol η polymerases revealed defective enzymatic activities. CONCLUSIONS: Together, these results suggest that constitutional defects in DNA repair underlie a subset of eoRCC cases. Screening patient lymphocytes to identify these defects may provide insight into mechanisms of carcinogenesis in a subset of genetically undefined eoRCCs. Evaluation of DNA repair defects may also provide insight into the cancer initiation mechanisms for subsets of eoRCCs and lay the foundation for targeting DNA repair vulnerabilities in eoRCC.
Subjects
Renal Cell Carcinoma , Kidney Neoplasms , Humans , Genetic Predisposition to Disease , DNA Replication , Germ-Line Mutation , Germ Cells
ABSTRACT
The Rosetta software for macromolecular modeling, docking and design is extensively used in laboratories worldwide. During two decades of development by a community of laboratories at more than 60 institutions, Rosetta has been continuously refactored and extended. Its advantages are its performance and interoperability between broad modeling capabilities. Here we review tools developed in the last 5 years, including over 80 methods. We discuss improvements to the score function, user interfaces and usability. Rosetta is available at http://www.rosettacommons.org.
Subjects
Macromolecular Substances/chemistry , Molecular Models , Proteins/chemistry , Software , Molecular Docking Simulation , Peptidomimetics/chemistry , Protein Conformation
ABSTRACT
Targeting protein kinases is an important strategy for intervention in cancer. Inhibitors are directed at the active conformation or a variety of inactive conformations. While attempts have been made to classify these conformations, a structurally rigorous catalog of states has not been achieved. The kinase activation loop is crucial for catalysis and begins with the conserved DFG motif. This motif is observed in two major classes of conformations: DFGin, a set of active and inactive conformations in which the Phe residue is in contact with the C-helix of the N-terminal lobe, and DFGout, an inactive form in which Phe occupies the ATP site, exposing the C-helix pocket. We have developed a clustering of kinase conformations based on the location of the Phe side chain (DFGin, DFGout, and DFGinter or intermediate) and the backbone dihedral angles of the sequence X-D-F, where X is the residue before the DFG motif, and the DFG-Phe side-chain rotamer, utilizing a density-based clustering algorithm. We have identified eight distinct conformations and labeled them based on the Ramachandran regions (A, alpha; B, beta; L, left) of the XDF motif and the Phe rotamer (minus, plus, trans). Our clustering divides the DFGin group into six clusters including BLAminus, which contains active structures, and two common inactive forms, BLBplus and ABAminus. DFGout structures are predominantly in the BBAminus conformation, which is essentially required for binding type II inhibitors. The inactive conformations have specific features that make them unable to bind ATP, magnesium, and/or substrates. Our structurally intuitive nomenclature will aid in understanding the conformational dynamics of kinases and structure-based development of kinase drugs.
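To make the naming scheme concrete, the sketch below assembles a label such as BLAminus from the backbone dihedrals of the X, D, and F residues and the Phe chi1 angle. It is only an illustration of the nomenclature: the abstract describes a density-based clustering, whereas this sketch uses fixed angular boxes, and every boundary value, function name, and example angle here is an assumption rather than a published definition. Angles are taken to be in degrees within (-180, 180].

```python
def rama_region(phi, psi):
    """Coarse Ramachandran region letter for one residue.

    Boundaries are illustrative assumptions, not published definitions.
    """
    if phi >= 0:
        return "L"          # left-handed region
    if -100 <= psi < 50:
        return "A"          # alpha region
    return "B"              # beta region


def chi1_rotamer(chi1):
    """Name the chi1 rotamer: minus (near -60), plus (near +60), else trans."""
    if -120 <= chi1 < 0:
        return "minus"
    if 0 <= chi1 < 120:
        return "plus"
    return "trans"


def xdf_label(x, d, f, phe_chi1):
    """Build a label from (phi, psi) pairs of X, D, F and the Phe chi1 angle."""
    regions = "".join(rama_region(phi, psi) for phi, psi in (x, d, f))
    return regions + chi1_rotamer(phe_chi1)


# Hypothetical angles, chosen only to demonstrate the naming scheme.
label = xdf_label((-130, 150), (60, 40), (-70, -30), -65)
```

A box-based rule like this will mislabel structures near region boundaries; assigning each structure to its nearest cluster centroid, as a clustering-based approach would, avoids that at the cost of needing the centroid coordinates.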
Subjects
Algorithms , Molecular Models , Protein Kinases , Catalytic Domain , Protein Kinase Inhibitors/chemistry , Protein Kinases/chemistry , Protein Kinases/genetics , Secondary Protein Structure , Substrate Specificity , Terminology as Topic
ABSTRACT
Many scientific disciplines rely on computational methods for data analysis, model generation, and prediction. Implementing these methods is often accomplished by researchers with domain expertise but without formal training in software engineering or computer science. This arrangement has led to underappreciation of sustainability and maintainability of scientific software tools developed in academic environments. Some software tools have avoided this fate, including the scientific library Rosetta. We use this software and its community as a case study to show how modern software development can be accomplished successfully, irrespective of subject area. Rosetta is one of the largest software suites for macromolecular modeling, with 3.1 million lines of code and many state-of-the-art applications. Since the mid 1990s, the software has been developed collaboratively by the RosettaCommons, a community of academics from over 60 institutions worldwide with diverse backgrounds including chemistry, biology, physiology, physics, engineering, mathematics, and computer science. Developing this software suite has provided us with more than two decades of experience in how to effectively develop advanced scientific software in a global community with hundreds of contributors. Here we illustrate the functioning of this development community by addressing technical aspects (like version control, testing, and maintenance), community-building strategies, diversity efforts, software dissemination, and user support. We demonstrate how modern computational research can thrive in a distributed collaborative community. The practices described here are independent of subject area and can be readily adopted by other software development communities.
Subjects
Computational Biology/methods , Research/trends , Software/trends , Cooperative Behavior , Data Analysis , Engineering , Gene Library , Humans , Molecular Models , Research Personnel , Social Behavior , User-Computer Interface
ABSTRACT
Unlike αβ-T lineage cells, where the role of ligand in intrathymic selection is well established, the role of ligand in the development of γδ-T cells remains controversial. Here we provide evidence for the role of a bona fide selecting ligand in shaping the γδ-T cell-receptor (TCR) repertoire. Reactivity of the γδ-TCR with the major histocompatibility complex (MHC) Class Ib ligands, H2-T10/22, is critically dependent upon the EGYEL motif in the complementarity determining region 3 (CDR3) of TCRδ. In the absence of H2-T10/22 ligand, the commitment of H2-T10/22 reactive γδ-T cells to the γδ fate is diminished, and the specification of those γδ committed cells to the IFN-γ or interleukin-17 effector fate is altered. Furthermore, those cells that do adopt the γδ fate and mature exhibit a profound alteration in the γδTCR repertoire, including depletion of the EGYEL motif and reductions in both CDR3δ length and charge. Taken together, these data suggest that ligand plays an important role in shaping the TCR repertoire of γδ-T cells.
Subjects
gamma-delta T-Cell Antigen Receptors/metabolism , T-Lymphocyte Subsets/physiology , Animals , Cell Lineage , Ligands , Mice , Protein Binding , gamma-delta T-Cell Antigen Receptors/genetics
ABSTRACT
Protein loops connect regular secondary structures and contain 4-residue beta turns which represent 63% of the residues in loops. The commonly used classification of beta turns (Type I, I', II, II', VIa1, VIa2, VIb, and VIII) was developed in the 1970s and 1980s from analysis of a small number of proteins of average resolution, and represents only two thirds of beta turns observed in proteins (with a generic class Type IV representing the rest). We present a new clustering of beta-turn conformations from a set of 13,030 turns from 1074 ultra-high resolution protein structures (≤1.2 Å). Our clustering is derived from applying the DBSCAN and k-medoids algorithms to this data set with a metric commonly used in directional statistics applied to the set of dihedral angles from the second and third residues of each turn. We define 18 turn types compared to the 8 classical turn types in common use. We propose a new 2-letter nomenclature for all 18 beta-turn types using Ramachandran region names for the two central residues (e.g., 'A' and 'D' for alpha regions on the left side of the Ramachandran map and 'a' and 'd' for equivalent regions on the right-hand side; classical Type I turns are 'AD' turns and Type I' turns are 'ad'). We identify 11 new types of beta turn, 5 of which are sub-types of classical beta-turn types. Up-to-date statistics, probability densities of conformations, and sequence profiles of beta turns in loops were collected and analyzed. A library of turn types, BetaTurnLib18, and cross-platform software, BetaTurnTool18, which identifies turns in an input protein structure, are freely available and redistributable from dunbrack.fccc.edu/betaturn and github.com/sh-maxim/BetaTurn18. Given the ubiquitous nature of beta turns, this comprehensive study updates understanding of beta turns and should also provide useful tools for protein structure determination, refinement, and prediction programs.
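Clustering dihedral angles needs a distance that respects periodicity, so that 179° and -179° count as close. The sketch below shows one such distance from directional statistics, 2(1 - cos Δθ), averaged over the matched dihedrals of two turns. Whether this is the exact form used for the 18 turn types is an assumption, as are the choice of four angles per turn and the example values.

```python
import math


def angular_distance(a_deg, b_deg):
    """Periodic distance between two angles: 2 * (1 - cos(delta)).

    Ranges from 0 (identical) to 4 (opposite directions).
    """
    delta = math.radians(a_deg - b_deg)
    return 2.0 * (1.0 - math.cos(delta))


def turn_distance(turn_a, turn_b):
    """Mean angular distance over the matched dihedral angles of two turns."""
    if len(turn_a) != len(turn_b):
        raise ValueError("turns must list the same dihedral angles")
    total = sum(angular_distance(a, b) for a, b in zip(turn_a, turn_b))
    return total / len(turn_a)


# Hypothetical (phi2, psi2, phi3, psi3) tuples in degrees.
turn_1 = (-60.0, -30.0, -90.0, 0.0)
turn_2 = (-60.0, -30.0, -90.0, 0.0)
turn_3 = (179.0, -30.0, -90.0, 0.0)
turn_4 = (-179.0, -30.0, -90.0, 0.0)

same = turn_distance(turn_1, turn_2)       # identical turns
wrapped = turn_distance(turn_3, turn_4)    # differ by 2 degrees across the +/-180 seam
```

A pairwise matrix of such distances can be handed to a density-based clustering routine that accepts precomputed distances, which keeps the periodic metric separate from the clustering algorithm itself.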
Subjects
Proteins/chemistry , Terminology as Topic , Algorithms , Amino Acid Sequence , Amino Acids/chemistry , Cluster Analysis , Protein Conformation , Reproducibility of Results
ABSTRACT
The NAGLU challenge of the fourth edition of the Critical Assessment of Genome Interpretation experiment (CAGI4) in 2016 invited participants to predict the impact of variants of unknown significance (VUS) on the enzymatic activity of the lysosomal hydrolase α-N-acetylglucosaminidase (NAGLU). Deficiencies in NAGLU activity lead to a rare, monogenic, recessive lysosomal storage disorder, Sanfilippo syndrome type B (MPS type IIIB). This challenge attracted 17 submissions from 10 groups. We observed that top models were able to predict the impact of missense mutations on enzymatic activity with Pearson's correlation coefficients of up to 0.61. We also observed that top methods were significantly more correlated with each other than they were with observed enzymatic activity values, which we believe speaks to the importance of sequence conservation across the different methods. Improved functional predictions on the VUS will help population-scale analysis of disease epidemiology and rare variant association analysis.
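The comparison described above, method versus experiment and method versus method, rests on Pearson's correlation coefficient. A minimal sketch follows; the function name and all numeric values are hypothetical and are not taken from the challenge data.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical relative activities for five variants.
observed = [0.05, 0.40, 0.90, 0.20, 0.70]
method_a = [0.10, 0.30, 0.60, 0.50, 0.80]
method_b = [0.15, 0.35, 0.55, 0.45, 0.85]

r_a_vs_experiment = pearson_r(method_a, observed)
r_a_vs_b = pearson_r(method_a, method_b)
```

Comparing `r_a_vs_b` with `r_a_vs_experiment` across all method pairs is one way to check whether predictors agree with each other more than with the measurements.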
Subjects
Acetylglucosaminidase/metabolism , Computational Biology/methods , Missense Mutation , Acetylglucosaminidase/genetics , Humans , Genetic Models , Regression Analysis
ABSTRACT
Mutations in the cystathionine β-synthase (CBS) gene are the cause of classical homocystinuria, the most common inborn error in sulfur metabolism. The p.G307S mutation is the most frequent cause of CBS deficiency in Ireland, which has the highest prevalence of CBS deficiency in Europe. Individuals homozygous for this mutation tend to be severely affected and are pyridoxine nonresponsive, but the molecular basis for the strong effects of this mutation is unclear. Here, we characterized a transgenic mouse model lacking endogenous Cbs and expressing human p.G307S CBS protein from a zinc-inducible metallothionein promoter (Tg-G307S Cbs-/-). Unlike mice expressing other mutant CBS alleles, the Tg-G307S transgene could not efficiently rescue neonatal lethality of Cbs-/- in a C57BL/6J background. In a C3H/HeJ background, zinc-induced Tg-G307S Cbs-/- mice expressed high levels of p.G307S CBS in the liver, and this protein variant forms multimers, similarly to mice expressing WT human CBS. However, the p.G307S enzyme had no detectable residual activity. Moreover, treating mice with proteasome inhibitors failed to significantly increase CBS-specific activity. These findings indicated that the G307S substitution likely affects catalytic function as opposed to causing a folding defect. Using molecular dynamics simulation techniques, we found that the G307S substitution likely impairs catalytic function by limiting the ability of the tyrosine at position 308 to assume the proper conformational state(s) required for the formation of the pyridoxal-cystathionine intermediate. These results indicate that the p.G307S CBS is stable but enzymatically inert and therefore unlikely to respond to chaperone-based therapy.
Subjects
Cystathionine beta-Synthase/genetics , Mutation , Amino Acid Substitution , Animals , Catalysis , Cystathionine beta-Synthase/chemistry , Cystathionine beta-Synthase/metabolism , Homocystinuria/genetics , Humans , Mice , Inbred C3H Mice , Inbred C57BL Mice , Transgenic Mice , Proteasome Inhibitors/pharmacology , Protein Conformation , Protein Stability , Pyridoxine/pharmacology
ABSTRACT
BACKGROUND: Alpha 1 Antitrypsin (AAT) is a key serum proteinase inhibitor encoded by SERPINA1. Sequence variants of the gene can cause Alpha 1 Antitrypsin Deficiency (AATD), a condition associated with lung and liver disease. The majority of AATD cases are caused by the 'Z' and 'S' variants, single-nucleotide variations (SNVs) that result in the amino acid substitutions E342K and E264V, respectively. However, SERPINA1 is highly polymorphic, with numerous potentially clinically relevant variants reported. Novel variants continue to be discovered, and without reports of pathogenicity, it can be difficult for clinicians to determine the best course of treatment. METHODS: We assessed the utility of next-generation sequencing (NGS) and predictive computational analysis to guide the diagnosis of patients suspected of having AATD. Blood samples on serum separator cards were submitted to the DNA1 Advanced Screening Program (Biocerna LLC, Fulton, Maryland, USA) by physicians whose patients were suspected of having AATD. Laboratory analyses included quantification of serum AAT levels, qualitative analysis by isoelectric focusing, and targeted genotyping and NGS of the SERPINA1 gene. Molecular modeling software UCSF Chimera (University of California, San Francisco, CA) was used to visualize the positions of amino acid changes resulting from rare/novel SNVs. Predictive software was used to assess the potential pathogenicity of these variants; methods included a support vector machine (SVM) program, PolyPhen-2 (Harvard University, Cambridge, MA), and FoldX (Centre for Genomic Regulation, Barcelona, Spain). RESULTS: Samples from 23 patients were analyzed; 21 rare/novel sequence variants were identified by NGS, including splice variants (n = 2), base pair deletions (n = 1), stop codon insertions (n = 2), and SNVs (n = 16). Computational modeling of the protein structures resulting from the novel SNVs showed that eight were probably deleterious and two were possibly deleterious.
For the majority of probably/possibly deleterious SNVs (I50N, P289S, M385T, M221T, D341V, V210E, P369H, V333M and A142D), the mechanism is probably via disruption of the packed hydrophobic core of AAT. Several deleterious variants occurred in combination with more common deficiency alleles, resulting in very low AAT levels. CONCLUSIONS: NGS and computational modeling are useful tools that can facilitate earlier, more precise diagnosis, and consideration for AAT therapy in AATD.
Subjects
Genetic Variation , Molecular Models , alpha 1-Antitrypsin Deficiency/genetics , alpha 1-Antitrypsin/chemistry , alpha 1-Antitrypsin/genetics , Adult , Aged , Aged 80 and over , Amino Acid Substitution , Female , High-Throughput Nucleotide Sequencing , Humans , Male , Middle Aged , Pennsylvania , alpha-Helical Protein Conformation , RNA Splicing , Protein Sequence Analysis , Virulence/genetics , alpha 1-Antitrypsin/blood , alpha 1-Antitrypsin Deficiency/diagnosis
ABSTRACT
A structural-bioinformatics-based computational methodology and framework have been developed for the design of antibodies to targets of interest. RosettaAntibodyDesign (RAbD) samples the diverse sequence, structure, and binding space of an antibody to an antigen in highly customizable protocols for the design of antibodies in a broad range of applications. The program samples antibody sequences and structures by grafting structures from a widely accepted set of the canonical clusters of CDRs (North et al., J. Mol. Biol., 406:228-256, 2011). It then performs sequence design according to amino acid sequence profiles of each cluster, and samples CDR backbones using a flexible-backbone design protocol incorporating cluster-based CDR constraints. Starting from an existing experimental or computationally modeled antigen-antibody structure, RAbD can be used to redesign a single CDR or multiple CDRs with loops of different length, conformation, and sequence. We rigorously benchmarked RAbD on a set of 60 diverse antibody-antigen complexes, using two design strategies: optimizing total Rosetta energy and optimizing interface energy alone. We utilized two novel metrics for measuring success in computational protein design. The design risk ratio (DRR) is equal to the frequency of recovery of native CDR lengths and clusters divided by the frequency of sampling of those features during the Monte Carlo design procedure. Ratios greater than 1.0 indicate that the design process is picking out the native features more frequently than expected from their sampled rate. We achieved DRRs for the non-H3 CDRs of between 2.4 and 4.0. The antigen risk ratio (ARR) is the ratio of frequencies of the native amino acid types, CDR lengths, and clusters in the output decoys for simulations performed in the presence and absence of the antigen. For CDRs, we achieved cluster ARRs as high as 2.5 for L1 and 1.5 for H2.
For sequence design simulations without CDR grafting, the overall recovery for the native amino acid types for residues that contact the antigen in the native structures was 72% in simulations performed in the presence of the antigen and 48% in simulations performed without the antigen, for an ARR of 1.5. For the non-contacting residues, the ARR was 1.08. This shows that the sequence profiles are able to maintain the amino acid types of these conserved, buried sites, while recovery of the exposed, contacting residues requires the presence of the antigen-antibody interface. We tested RAbD experimentally on both a lambda and kappa antibody-antigen complex, successfully improving their affinities 10 to 50 fold by replacing individual CDRs of the native antibody with new CDR lengths and clusters.
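Both metrics defined in this abstract are ratios of frequencies, which the short sketch below writes out. The function and parameter names are hypothetical and the counts in the DRR example are invented for illustration; only the 72% and 48% recovery values come from the text above.

```python
def design_risk_ratio(n_recovered, n_designs, n_sampled, n_samples):
    """DRR: frequency of the native feature among final designs divided by
    its frequency among features sampled during the Monte Carlo procedure."""
    if n_designs <= 0 or n_samples <= 0 or n_sampled <= 0:
        raise ValueError("counts in the denominators must be positive")
    recovery_freq = n_recovered / n_designs
    sampling_freq = n_sampled / n_samples
    return recovery_freq / sampling_freq


def antigen_risk_ratio(freq_with_antigen, freq_without_antigen):
    """ARR: native-feature frequency with the antigen present divided by
    the frequency obtained when the antigen is absent."""
    if freq_without_antigen <= 0:
        raise ValueError("frequency without antigen must be positive")
    return freq_with_antigen / freq_without_antigen


# Contact-residue recovery reported in the abstract: 72% with antigen, 48% without.
arr_contacts = antigen_risk_ratio(0.72, 0.48)

# Invented counts: native cluster in 30 of 100 designs, sampled 1000 times in 10000 steps.
drr_example = design_risk_ratio(30, 100, 1000, 10000)
```

Dividing by the sampling frequency is what separates genuine selection of the native feature from its simply being proposed often, which is why a ratio above 1.0 is the meaningful threshold.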
Subjects
Antibodies/chemistry , Software , Amino Acid Sequence , Animals , Antibodies/genetics , Antibodies/immunology , Antigen-Antibody Complex/chemistry , Antigen-Antibody Complex/genetics , Antigen-Antibody Complex/immunology , Complementarity Determining Regions , Computational Biology , Computer Simulation , Directed Molecular Evolution , Drug Design , Humans , Molecular Models , Monte Carlo Method , Protein Conformation , Protein Engineering/methods , Protein Engineering/statistics & numerical data
ABSTRACT
The Critical Assessment of Genome Interpretation (CAGI) is a global community experiment to objectively assess computational methods for predicting phenotypic impacts of genomic variation. One of the 2015-2016 competitions focused on predicting the influence of mutations on the allosteric regulation of human liver pyruvate kinase. More than 30 different researchers accessed the challenge data. However, only four groups accepted the challenge. Features used for predictions included evolutionary constraints, mutant site locations relative to active and effector binding sites, and computational docking outputs. Despite the range of expertise and strategies used by predictors, the best predictions of modified allostery resulting from mutations were only marginally better than random. In contrast, several groups successfully predicted which mutations severely reduced enzymatic activity. Nonetheless, the poor predictions of allostery stand in stark contrast to the impression left by more than 700 PubMed entries identified using the search terms "computational + allosteric." This contrast highlights a specialized need for new computational tools and utilization of benchmarks that focus on allosteric regulation.
Subjects
Benchmarking/methods , Pyruvate Kinase/chemistry , Pyruvate Kinase/genetics , Allosteric Regulation , Allosteric Site , Computational Biology/methods , Genetic Databases , Fructosediphosphates/metabolism , Humans , Molecular Models , Mutation , Pyruvate Kinase/metabolism
ABSTRACT
Correct phenotypic interpretation of variants of unknown significance for cancer-associated genes is a diagnostic challenge as genetic screenings gain in popularity in the next-generation sequencing era. The Critical Assessment of Genome Interpretation (CAGI) experiment aims to test and define the state of the art of genotype-phenotype interpretation. Here, we present the assessment of the CAGI p16INK4a challenge. Participants were asked to predict the effect on cellular proliferation of 10 variants for the p16INK4a tumor suppressor, a cyclin-dependent kinase inhibitor encoded by the CDKN2A gene. Twenty-two pathogenicity predictors were assessed with a variety of accuracy measures for reliability in a medical context. Different assessment measures were combined in an overall ranking to provide more robust results. The R scripts used for assessment are publicly available from a GitHub repository for future use in similar assessment exercises. Despite a limited test-set size, our findings show a variety of results, with some methods performing significantly better. Methods combining different strategies frequently outperform simpler approaches. The best predictor, Yang&Zhou lab, uses a machine learning method combining an empirical energy function measuring protein stability with an evolutionary conservation term. The p16INK4a challenge highlights how subtle structural effects can neutralize otherwise deleterious variants.
Subjects
Computational Biology/methods , Cyclin-Dependent Kinase Inhibitor p18/genetics , Genetic Variation , Tumor Cell Line , Cell Proliferation , Computer Simulation , Cyclin-Dependent Kinase Inhibitor p16 , Cyclin-Dependent Kinase Inhibitor p18/chemistry , Genetic Databases , Genetic Predisposition to Disease , Humans , Machine Learning , Protein Stability
ABSTRACT
Classification of the structures of the complementarity determining regions (CDRs) of antibodies is critically important for antibody structure prediction and computational design. We have previously performed a clustering of antibody CDR conformations and defined a systematic nomenclature consisting of the CDR, length and an integer starting from the largest to the smallest cluster in the data set (e.g. L1-11-1). We present PyIgClassify (for Python-based immunoglobulin classification; available at http://dunbrack2.fccc.edu/pyigclassify/), a database and web server that provides access to assignments of all CDR structures in the PDB to our classification system. The database includes assignments to the IMGT germline V regions for heavy and light chains for several species. For humanized antibodies, the assignment of the frameworks is to human germlines and the CDRs to the germlines of mice or other species sources. The database can be searched by PDB entry, cluster identifier and IMGT germline group (e.g. human IGHV1). The entire database is downloadable so that users may filter the data as needed for antibody structure analysis, prediction and design.