1.
Proteomics ; 23(17): e2200323, 2023 09.
Article in English | MEDLINE | ID: mdl-37365936

ABSTRACT

Reliably scoring and ranking candidate models of protein complexes and assigning their oligomeric state from the structure of the crystal lattice represent outstanding challenges. A community-wide effort was launched to tackle these challenges. The latest resources on protein complexes and interfaces were exploited to derive a benchmark dataset consisting of 1677 homodimer protein crystal structures, including a balanced mix of physiological and non-physiological complexes. The non-physiological complexes in the benchmark were selected to bury a similar or larger interface area than their physiological counterparts, making it more difficult for scoring functions to differentiate between them. Next, 252 functions for scoring protein-protein interfaces previously developed by 13 groups were collected and evaluated for their ability to discriminate between physiological and non-physiological complexes. A simple consensus score generated using the best performing score of each of the 13 groups, and a cross-validated Random Forest (RF) classifier were created. Both approaches showed excellent performance, with an area under the Receiver Operating Characteristic (ROC) curve of 0.93 and 0.94, respectively, outperforming individual scores developed by different groups. Additionally, AlphaFold2 engines recalled the physiological dimers with significantly higher accuracy than the non-physiological set, lending support to the reliability of our benchmark dataset annotations. Optimizing the combined power of interface scoring functions and evaluating it on challenging benchmark datasets appears to be a promising strategy.
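
The evaluation described in this abstract (combining per-group scores and measuring the area under the ROC curve) can be illustrated with a minimal Python sketch. The abstract does not say how the consensus score was actually computed, so the plain average used here is an assumption, and the function names `roc_auc` and `consensus_score` as well as all numbers are hypothetical toy values, not data from the study.

```python
from statistics import mean


def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Fraction of positive/negative pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def consensus_score(score_table):
    """Average the per-group scores of each complex (assumed consensus rule)."""
    return [mean(row) for row in score_table]


# Hypothetical toy data: 1 = physiological, 0 = non-physiological dimer.
labels = [1, 1, 1, 0, 0, 0]
# One row per complex; columns are the normalized scores of three imaginary groups.
score_table = [
    [0.9, 0.8, 0.7],
    [0.6, 0.9, 0.8],
    [0.7, 0.4, 0.9],
    [0.3, 0.5, 0.2],
    [0.8, 0.2, 0.4],
    [0.1, 0.3, 0.2],
]

consensus = consensus_score(score_table)
auc_consensus = roc_auc(labels, consensus)
auc_first_group = roc_auc(labels, [row[0] for row in score_table])
```

In this toy table the averaged score separates the two classes better than the first group's score alone, which mirrors the qualitative finding reported above without reproducing any of its numbers.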


Subjects
Proteins; Reproducibility of Results; Proteins/metabolism; Protein Binding
2.
bioRxiv ; 2023 Mar 17.
Article in English | MEDLINE | ID: mdl-36945596

ABSTRACT

The Ser/Thr protein phosphatase 2A (PP2A) is a highly conserved collection of heterotrimeric holoenzymes responsible for the dephosphorylation of many regulated phosphoproteins. Substrate recognition and the integration of regulatory cues are mediated by B regulatory subunits that are complexed to the catalytic subunit (C) by a scaffold protein (A). PP2A/B55 substrate recruitment was thought to be mediated by charge-charge interactions between the surface of B55α and its substrates. Challenging this view, we recently discovered a conserved SLiM, [RK]-V-x-x-[VI]-R, in a range of proteins, including substrates such as the retinoblastoma-related protein p107 and TAU (Fowle et al. eLife 2021;10:e63181). Here we report the identification of this SLiM in FAM122A, an inhibitor of B55α/PP2A. This conserved SLiM is necessary for FAM122A binding to B55α in vitro and in cells. Computational structure prediction with AlphaFold2 predicts an interaction consistent with the mutational and biochemical data and supports a mechanism whereby FAM122A uses the 'SLiM' in the form of a short α-helix to dock to the B55α top groove. In this model, FAM122A spatially constrains substrate access by occluding the catalytic subunit with a second α-helix immediately adjacent to helix 1. Consistently, FAM122A functions as a competitive inhibitor, as it prevents binding of substrates in in vitro competition assays and the dephosphorylation of CDK substrates by B55α/PP2A in cell lysates. Ablation of FAM122A in human cell lines reduces the rate of proliferation and of progression through cell cycle transitions, and abrogates G1/S and intra-S phase cell cycle checkpoints. FAM122A-KO in HEK293 cells results in attenuation of CHK1 and CHK2 activation in response to replication stress. Overall, these data strongly suggest that FAM122A is a 'SLiM'-dependent, substrate-competitive inhibitor of B55α/PP2A that suppresses multiple functions of B55α in the DNA damage response and in timely progression through the cell cycle interphase.

3.
Nucleic Acids Res ; 51(D1): D466-D478, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36300618

ABSTRACT

Proteins often act through oligomeric interactions with other proteins. X-ray crystallography and cryo-electron microscopy provide detailed information on the structures of biological assemblies, defined as the most likely biologically relevant structures derived from experimental data. In crystal structures, the most relevant assembly may be ambiguously determined, since multiple assemblies observed in the crystal lattice may be plausible. It is estimated that 10-15% of PDB entries may have incorrect or ambiguous assembly annotations. Accurate assemblies are required for understanding functional data and training of deep learning methods for predicting assembly structures. As with any other kind of biological data, replication via multiple independent experiments provides important validation for the determination of biological assembly structures. Here we present the Protein Common Assembly Database (ProtCAD), which presents clusters of protein assembly structures observed in independent structure determinations of homologous proteins in the Protein Data Bank (PDB). ProtCAD is searchable by PDB entry, UniProt identifiers, or Pfam domain designations and provides downloads of coordinate files, PyMol scripts, and publicly available assembly annotations for each cluster of assemblies. About 60% of PDB entries contain assemblies in clusters of at least 2 independent experiments. All clusters and coordinates are available on ProtCAD web site (http://dunbrack2.fccc.edu/protcad).


Subjects
Databases, Protein; Multiprotein Complexes; Proteins; Cryoelectron Microscopy; Crystallography, X-Ray; Proteins/chemistry; Multiprotein Complexes/chemistry
4.
Elife ; 10, 2021 10 18.
Article in English | MEDLINE | ID: mdl-34661528

ABSTRACT

Protein phosphorylation is a reversible post-translational modification essential in cell signaling. This study addresses a long-standing question as to how the most abundant serine/threonine protein phosphatase 2A (PP2A) holoenzyme, PP2A/B55α, specifically recognizes substrates and presents them to the enzyme active site. Here, we show how the PP2A regulatory subunit B55α recruits p107, a pRB-related tumor suppressor and B55α substrate. Using molecular and cellular approaches, we identified a conserved region 1 (R1, residues 615-626) encompassing the strongest p107 binding site. This enabled us to identify an 'HxRVxxV' short linear motif (SLiM; residues 619-625) in p107 as necessary for B55α binding and dephosphorylation of the proximal pSer-615 in vitro and in cells. Numerous B55α/PP2A substrates, including TAU, contain a related SLiM C-terminal from a proximal phosphosite, 'p[ST]-P-x(4,10)-[RK]-V-x-x-[VI]-R'. Mutation of conserved SLiM residues in TAU dramatically inhibits dephosphorylation by PP2A/B55α, validating its generality. A data-guided computational model details the interaction of residues from the conserved p107 SLiM, the B55α groove, and phosphosite presentation. Altogether, these data provide key insights into PP2A/B55α's mechanisms of substrate recruitment and active site engagement, and also facilitate identification and validation of new substrates, a key step towards understanding PP2A/B55α's role in multiple cellular processes.
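
The motif notation quoted in this abstract maps directly onto a regular expression, which may help when scanning sequences for candidate sites. The following is only a sketch: the function name `find_slims` and the toy sequence are hypothetical, and because phosphorylation state cannot be read from a plain sequence, `[ST]` merely marks a possible phosphosite.

```python
import re

# 'p[ST]-P-x(4,10)-[RK]-V-x-x-[VI]-R' written as a regular expression:
# x(4,10) means 4 to 10 residues of any type, x means any single residue.
SLIM = re.compile(r"[ST]P.{4,10}[RK]V..[VI]R")


def find_slims(sequence):
    """Return (0-based start, matched text) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in SLIM.finditer(sequence.upper())]


# Hypothetical toy sequence, not a real protein.
toy_sequence = "MAAGSPLLLLLKVAAVRGG"
hits = find_slims(toy_sequence)
```

A match only shows that the sequence pattern is present; whether a site is a genuine B55α docking motif still needs the kind of mutational and biochemical validation described above.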


Subjects
Protein Phosphatase 2/genetics; Retinoblastoma-Like Protein p107/genetics; HEK293 Cells; Holoenzymes/metabolism; Humans; Phosphorylation; Protein Phosphatase 2/metabolism; Retinoblastoma-Like Protein p107/metabolism
5.
Nat Commun ; 11(1): 711, 2020 02 05.
Article in English | MEDLINE | ID: mdl-32024829

ABSTRACT

Structural information on the interactions of proteins with other molecules is plentiful, and for some proteins and protein families, there may be 100s of available structures. It can be very difficult for a scientist who is not trained in structural bioinformatics to access this information comprehensively. Previously, we developed the Protein Common Interface Database (ProtCID), which provided clusters of the interfaces of full-length protein chains as a means of identifying biological assemblies. Because proteins consist of domains that act as modular functional units, we have extended the analysis in ProtCID to the individual domain level. This has greatly increased the number of large protein-protein clusters in ProtCID, enabling the generation of hypotheses on the structures of biological assemblies of many systems. The analysis of domain families allows us to extend ProtCID to the interactions of domains with peptides, nucleic acids, and ligands. ProtCID provides complete annotations and coordinate sets for every cluster.


Subjects
Databases, Protein; Protein Interaction Domains and Motifs; Protein Interaction Mapping; Internet; Models, Molecular; Protein Multimerization
6.
BMC Med Genet ; 20(1): 125, 2019 07 15.
Article in English | MEDLINE | ID: mdl-31307431

ABSTRACT

BACKGROUND: Alpha 1 Antitrypsin (AAT) is a key serum proteinase inhibitor encoded by SERPINA1. Sequence variants of the gene can cause Alpha 1 Antitrypsin Deficiency (AATD), a condition associated with lung and liver disease. The majority of AATD cases are caused by the 'Z' and 'S' variants - single-nucleotide variations (SNVs) that result in amino acid substitutions of E342K and E264V. However, SERPINA1 is highly polymorphic, with numerous potentially clinically relevant variants reported. Novel variants continue to be discovered, and without reports of pathogenicity, it can be difficult for clinicians to determine the best course of treatment. METHODS: We assessed the utility of next-generation sequencing (NGS) and predictive computational analysis to guide the diagnosis of patients suspected of having AATD. Blood samples on serum separator cards were submitted to the DNA1 Advanced Screening Program (Biocerna LLC, Fulton, Maryland, USA) by physicians whose patients were suspected of having AATD. Laboratory analyses included quantification of serum AAT levels, qualitative analysis by isoelectric focusing, and targeted genotyping and NGS of the SERPINA1 gene. Molecular modeling software UCSF Chimera (University of California, San Francisco, CA) was used to visualize the positions of amino acid changes resulting from rare/novel SNVs. Predictive software was used to assess the potential pathogenicity of these variants; methods included a support vector machine (SVM) program, PolyPhen-2 (Harvard University, Cambridge, MA), and FoldX (Centre for Genomic Regulation, Barcelona, Spain). RESULTS: Samples from 23 patients were analyzed; 21 rare/novel sequence variants were identified by NGS, including splice variants (n = 2), base pair deletions (n = 1), stop codon insertions (n = 2), and SNVs (n = 16). Computational modeling of the protein structural changes caused by the novel SNVs showed that eight were probably deleterious and two were possibly deleterious. For the majority of probably/possibly deleterious SNVs (I50N, P289S, M385T, M221T, D341V, V210E, P369H, V333M and A142D), the mechanism is probably via disruption of the packed hydrophobic core of AAT. Several deleterious variants occurred in combination with more common deficiency alleles, resulting in very low AAT levels. CONCLUSIONS: NGS and computational modeling are useful tools that can facilitate earlier, more precise diagnosis, and consideration for AAT therapy in AATD.


Subjects
Genetic Variation; Models, Molecular; alpha 1-Antitrypsin Deficiency/genetics; alpha 1-Antitrypsin/chemistry; alpha 1-Antitrypsin/genetics; Adult; Aged; Aged, 80 and over; Amino Acid Substitution; Female; High-Throughput Nucleotide Sequencing; Humans; Male; Middle Aged; Pennsylvania; Protein Conformation, alpha-Helical; RNA Splicing; Sequence Analysis, Protein; Virulence/genetics; alpha 1-Antitrypsin/blood; alpha 1-Antitrypsin Deficiency/diagnosis
7.
Hum Mutat ; 40(9): 1519-1529, 2019 09.
Article in English | MEDLINE | ID: mdl-31342580

ABSTRACT

The NAGLU challenge of the fourth edition of the Critical Assessment of Genome Interpretation experiment (CAGI4) in 2016 invited participants to predict the impact of variants of unknown significance (VUS) on the enzymatic activity of the lysosomal hydrolase α-N-acetylglucosaminidase (NAGLU). Deficiencies in NAGLU activity lead to a rare, monogenic, recessive lysosomal storage disorder, Sanfilippo syndrome type B (MPS type IIIB). This challenge attracted 17 submissions from 10 groups. We observed that top models were able to predict the impact of missense mutations on enzymatic activity with Pearson's correlation coefficients of up to 0.61. We also observed that top methods were significantly more correlated with each other than they were with observed enzymatic activity values, which we believe speaks to the importance of sequence conservation across the different methods. Improved functional predictions on the VUS will help population-scale analysis of disease epidemiology and rare variant association analysis.
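
The comparison described in this abstract, where predictions are correlated both with measured activity and with each other, rests on Pearson's correlation coefficient. A minimal sketch follows; the function name `pearson_r` and all values are hypothetical illustrations, not data from the challenge.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical relative enzymatic activities for five variants.
observed = [0.05, 0.40, 0.55, 0.90, 1.00]
# Hypothetical predictions from two imaginary methods.
method_a = [0.10, 0.30, 0.70, 0.60, 0.95]
method_b = [0.15, 0.35, 0.65, 0.55, 0.90]

r_a_vs_observed = pearson_r(method_a, observed)
r_b_vs_observed = pearson_r(method_b, observed)
r_a_vs_b = pearson_r(method_a, method_b)
```

In this toy example the two methods agree with each other more closely than either agrees with the observed values, the same qualitative pattern the assessors report.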


Subjects
Acetylglucosaminidase/metabolism; Computational Biology/methods; Mutation, Missense; Acetylglucosaminidase/genetics; Humans; Models, Genetic; Regression Analysis
8.
Curr Opin Struct Biol ; 55: 34-49, 2019 04.
Article in English | MEDLINE | ID: mdl-30965224

ABSTRACT

More than half of all structures in the PDB are assemblies of two or more proteins, including both homooligomers and heterooligomers. Structural information on these assemblies comes from X-ray crystallography, NMR spectroscopy, and cryo-EM. The correct assembly in an X-ray structure is often ambiguous, and computational methods have been developed to identify the most likely biologically relevant assembly based on physical properties of assemblies and sequence conservation in interfaces. Taking advantage of the large number of structures now available, some of the most recent methods have relied on similarity of interfaces and assemblies across structures of homologous proteins.


Subjects
Multiprotein Complexes/chemistry; Proteins/chemistry; Cryoelectron Microscopy/methods; Crystallography, X-Ray/methods; Databases, Protein; Protein Conformation; Protein Multimerization
9.
Antiviral Res ; 159: 1-12, 2018 11.
Article in English | MEDLINE | ID: mdl-30201396

ABSTRACT

Native agarose gel electrophoresis-based particle gel assay has been commonly used for examination of hepatitis B virus (HBV) capsid assembly and pregenomic RNA encapsidation in HBV replicating cells. Interestingly, treatment of cells with several chemotypes of HBV core protein allosteric modulators (CpAMs) induced the assembly of both empty and DNA-containing capsids with faster electrophoresis mobility. In an effort to determine the physical basis of CpAM-induced capsid mobility shift, we found that the surface charge, but not the size, of capsids is the primary determinant of electrophoresis mobility. Specifically, through alanine scanning mutagenesis analysis of twenty-seven charged amino acids in core protein assembly domain and hinge region, we showed that except for K7 and E8, substitution of glutamic acid (E) or aspartic acid (D) on the surface of capsids reduced their mobility, but substitution of lysine (K) or arginine (R) on the surface of capsids increased their mobility to variable degrees. However, alanine substitution of the charged amino acids that are not exposed on the surface of capsid did not apparently alter capsid mobility. Hence, CpAM-induced electrophoresis mobility shift of capsids may reflect the global alteration of capsid structure that changes the exposure and/or ionization of charged amino acid side chains of core protein. Our findings imply that CpAM inhibition of pgRNA encapsidation is possibly due to the assembly of structurally altered nucleocapsids. Practically, capsid electrophoresis mobility shift is a diagnostic marker of compounds that target core protein assembly and predicts sensitivity of HBV strains to specific CpAMs.


Subjects
Antiviral Agents/pharmacology; Capsid/metabolism; Hepatitis B virus/physiology; RNA/metabolism; Viral Core Proteins/genetics; Virus Assembly; Allosteric Regulation; Capsid Proteins/metabolism; Electrophoresis; Electrophoretic Mobility Shift Assay; Hep G2 Cells; Hepatitis B Core Antigens/genetics; Hepatitis B virus/genetics; Humans; RNA, Viral/metabolism; Virus Replication
10.
Oncotarget ; 8(25): 39945-39962, 2017 Jun 20.
Article in English | MEDLINE | ID: mdl-28591715

ABSTRACT

Deficient mismatch repair (MMR) and microsatellite instability (MSI) contribute to ~15% of colorectal cancers (CRCs). We hypothesized that MSI leads to mutations in DNA repair proteins including BRCA2 and cancer drivers including EGFR. We analyzed mutations among a discovery cohort of 26 MSI-High (MSI-H) and 558 non-MSI-H CRCs profiled at Caris Life Sciences. Caris-profiled MSI-H CRCs had high mutation rates (50% vs 14% in non-MSI-H, P < 0.0001) in BRCA2. Of 1104 profiled CRCs from a second cohort (COSMIC), MSH2/MLH1-mutant CRCs showed higher mutation rates in BRCA2 compared to non-MSH2/MLH1-mutant tumors (38% vs 6%, P < 0.0000001). BRCA2 mutations in MSH2/MLH1-mutant CRCs included 75 unique mutations not known to occur in breast or pancreatic cancer per COSMIC v73. Only 5 deleterious BRCA2 mutations in CRC were previously reported in the BIC database as germ-line mutations in breast cancer. Some BRCA2 mutations were predicted to disrupt interactions with partner proteins DSS1 and RAD51. Some CRCs harbored multiple BRCA2 mutations. EGFR was mutated in 45.5% of MSH2/MLH1-mutant and 6.5% of non-MSH2/MLH1-mutant tumors (P < 0.0000001). Approximately 15% of EGFR mutations found may be actionable through TKI therapy, including N700D, G719D, T725M, T790M, and E884K. NTRK gene mutations were identified in MSH2/MLH1-mutant CRC including NTRK1 I699V, NTRK2 P716S, and NTRK3 R745L. Our findings have clinical relevance regarding therapeutic targeting of BRCA2 vulnerabilities, EGFR mutations or other identified oncogenic drivers such as NTRK in MSH2/MLH1-mutant CRCs or other tumors with mismatch repair deficiency.


Subjects
BRCA2 Protein/genetics; Colorectal Neoplasms/genetics; ErbB Receptors/genetics; Mutation; Receptor, trkA/genetics; Receptor, trkB/genetics; Receptor, trkC/genetics; BRCA2 Protein/chemistry; Cohort Studies; DNA Mismatch Repair/genetics; ErbB Receptors/chemistry; Gene Frequency; Humans; Microsatellite Instability; Models, Molecular; MutL Protein Homolog 1/genetics; MutS Homolog 2 Protein/genetics; Protein Domains; Receptor, trkA/chemistry; Receptor, trkB/chemistry; Receptor, trkC/chemistry
11.
Hum Mutat ; 38(9): 1123-1131, 2017 09.
Article in English | MEDLINE | ID: mdl-28370845

ABSTRACT

The Critical Assessment of Genome Interpretation (CAGI) is a global community experiment to objectively assess computational methods for predicting phenotypic impacts of genomic variation. One of the 2015-2016 competitions focused on predicting the influence of mutations on the allosteric regulation of human liver pyruvate kinase. More than 30 different researchers accessed the challenge data. However, only four groups accepted the challenge. Features used for predictions ranged from evolutionary constraints and mutant site locations relative to active and effector binding sites to computational docking outputs. Despite the range of expertise and strategies used by predictors, the best predictions were marginally greater than random for modified allostery resulting from mutations. In contrast, several groups successfully predicted which mutations severely reduced enzymatic activity. Nonetheless, poor predictions of allostery stand in stark contrast to the impression left by more than 700 PubMed entries identified using the identifiers "computational + allosteric." This contrast highlights a specialized need for new computational tools and utilization of benchmarks that focus on allosteric regulation.


Subjects
Benchmarking/methods; Pyruvate Kinase/chemistry; Pyruvate Kinase/genetics; Allosteric Regulation; Allosteric Site; Computational Biology/methods; Databases, Genetic; Fructosediphosphates/metabolism; Humans; Models, Molecular; Mutation; Pyruvate Kinase/metabolism
12.
Hum Mutat ; 38(9): 1042-1050, 2017 09.
Article in English | MEDLINE | ID: mdl-28440912

ABSTRACT

Correct phenotypic interpretation of variants of unknown significance for cancer-associated genes is a diagnostic challenge as genetic screenings gain in popularity in the next-generation sequencing era. The Critical Assessment of Genome Interpretation (CAGI) experiment aims to test and define the state of the art of genotype-phenotype interpretation. Here, we present the assessment of the CAGI p16INK4a challenge. Participants were asked to predict the effect on cellular proliferation of 10 variants for the p16INK4a tumor suppressor, a cyclin-dependent kinase inhibitor encoded by the CDKN2A gene. Twenty-two pathogenicity predictors were assessed with a variety of accuracy measures for reliability in a medical context. Different assessment measures were combined in an overall ranking to provide more robust results. The R scripts used for assessment are publicly available from a GitHub repository for future use in similar assessment exercises. Despite a limited test-set size, our findings show a variety of results, with some methods performing significantly better. Methods combining different strategies frequently outperform simpler approaches. The best predictor, Yang&Zhou lab, uses a machine learning method combining an empirical energy function measuring protein stability with an evolutionary conservation term. The p16INK4a challenge highlights how subtle structural effects can neutralize otherwise deleterious variants.


Subjects
Computational Biology/methods; Cyclin-Dependent Kinase Inhibitor p18/genetics; Genetic Variation; Cell Line, Tumor; Cell Proliferation; Computer Simulation; Cyclin-Dependent Kinase Inhibitor p16; Cyclin-Dependent Kinase Inhibitor p18/chemistry; Databases, Genetic; Genetic Predisposition to Disease; Humans; Machine Learning; Protein Stability
13.
Proteins ; 84 Suppl 1: 370-91, 2016 09.
Article in English | MEDLINE | ID: mdl-27181425

ABSTRACT

In CASP11, the organizers sought to bring the biological inferences from predicted structures to the fore. To accomplish this, we assessed the models for their ability to perform quantifiable tasks related to biological function. First, for 10 targets that were probable homodimers, we measured the accuracy of docking the models into homodimers as a function of GDT-TS of the monomers, which produced characteristic L-shaped plots. At low GDT-TS, none of the models could be docked correctly as homodimers. Above GDT-TS of ∼60%, some models formed correct homodimers in one of the largest docked clusters, while many other models at the same values of GDT-TS did not. Docking was more successful when many of the templates shared the same homodimer. Second, we docked a ligand from an experimental structure into each of the models of one of the targets. Docking to the models with two different programs produced poor ligand RMSDs with the experimental structure. Measures that evaluated similarity of contacts were reasonable for some of the models, although there was not a significant correlation with model accuracy. Finally, we assessed whether models would be useful in predicting the phenotypes of missense mutations in three human targets by comparing features calculated from the models with those calculated from the experimental structures. The models were successful in reproducing accessible surface areas but there was little correlation of model accuracy with calculation of FoldX evaluation of the change in free energy between the wild-type and the mutant. Proteins 2016; 84(Suppl 1):370-391. © 2016 Wiley Periodicals, Inc.


Subjects
Amidohydrolases/chemistry; Cyclic AMP-Dependent Protein Kinases/chemistry; HIV Envelope Protein gp120/chemistry; Hepatocyte Growth Factor/chemistry; Models, Statistical; Molecular Docking Simulation; Proto-Oncogene Proteins/chemistry; Amidohydrolases/genetics; Amidohydrolases/metabolism; Binding Sites; Computational Biology/methods; Cyclic AMP-Dependent Protein Kinases/genetics; Cyclic AMP-Dependent Protein Kinases/metabolism; GPI-Linked Proteins/chemistry; GPI-Linked Proteins/genetics; GPI-Linked Proteins/metabolism; HIV Envelope Protein gp120/genetics; HIV Envelope Protein gp120/metabolism; Hepatocyte Growth Factor/genetics; Hepatocyte Growth Factor/metabolism; Humans; Ligands; Mutation, Missense; Phenotype; Protein Binding; Protein Domains; Protein Folding; Protein Multimerization; Protein Structure, Secondary; Protein Structure, Tertiary; Proto-Oncogene Proteins/genetics; Proto-Oncogene Proteins/metabolism; Sequence Homology, Amino Acid; Structure-Activity Relationship; Thermodynamics
14.
Proteins ; 84 Suppl 1: 200-20, 2016 09.
Article in English | MEDLINE | ID: mdl-27081927

ABSTRACT

We present the assessment of predictions submitted in the template-based modeling (TBM) category of CASP11 (Critical Assessment of Protein Structure Prediction). Model quality was judged on the basis of global and local measures of accuracy on all atoms including side chains. The top groups on 39 human-server targets based on model 1 predictions were LEER, Zhang, LEE, MULTICOM, and Zhang-Server. The top groups on 81 targets by server groups based on model 1 predictions were Zhang-Server, nns, BAKER-ROSETTASERVER, QUARK, and myprotein-me. In CASP11, the best models for most targets were equal to or better than the best template available in the Protein Data Bank, even for targets with poor templates. The overall performance in CASP11 is similar to the performance of predictors in CASP10 with slightly better performance on the hardest targets. For most targets, assessment measures exhibited bimodal probability density distributions. Multi-dimensional scaling of an RMSD matrix for each target typically revealed a single cluster with models similar to the target structure, with a mode in the GDT-TS density between 40 and 90, and a wide distribution of models highly divergent from each other and from the experimental structure, with density mode at a GDT-TS value of ∼20. The models in this peak in the density were either compact models with entirely the wrong fold, or highly non-compact models. The results argue for a density-driven approach in future CASP TBM assessments that accounts for the bimodal nature of these distributions instead of Z scores, which assume a unimodal, Gaussian distribution. Proteins 2016; 84(Suppl 1):200-220. © 2016 Wiley Periodicals, Inc.


Subjects
Computational Biology/statistics & numerical data; Models, Molecular; Models, Statistical; Proteins/chemistry; Software; Algorithms; Animals; Archaea/chemistry; Bacteria/chemistry; Computational Biology/methods; Computer Simulation; Databases, Protein; Drosophila melanogaster/chemistry; Humans; Internet; Protein Folding; Protein Interaction Domains and Motifs; Protein Structure, Secondary; Structural Homology, Protein; Viruses/chemistry
15.
Sci Signal ; 8(405): rs13, 2015 Dec 01.
Article in English | MEDLINE | ID: mdl-26628682

ABSTRACT

Protein kinase autophosphorylation is a common regulatory mechanism in cell signaling pathways. Crystal structures of several homomeric protein kinase complexes have a serine, threonine, or tyrosine autophosphorylation site of one kinase monomer located in the active site of another monomer, a structural complex that we call an "autophosphorylation complex." We developed and applied a structural bioinformatics method to identify all such autophosphorylation complexes in x-ray crystallographic structures in the Protein Data Bank (PDB). We identified 15 autophosphorylation complexes in the PDB, of which five complexes had not previously been described in the publications describing the crystal structures. These five complexes consist of tyrosine residues in the N-terminal juxtamembrane regions of colony-stimulating factor 1 receptor (CSF1R, Tyr(561)) and ephrin receptor A2 (EPHA2, Tyr(594)), tyrosine residues in the activation loops of the SRC kinase family member LCK (Tyr(394)) and insulin-like growth factor 1 receptor (IGF1R, Tyr(1166)), and a serine in a nuclear localization signal region of CDC-like kinase 2 (CLK2, Ser(142)). Mutations in the complex interface may alter autophosphorylation activity and contribute to disease; therefore, we mutated residues in the autophosphorylation complex interface of LCK and found that two mutations impaired autophosphorylation (T445V and N446A) and mutation of Pro(447) to Ala, Gly, or Leu increased autophosphorylation. The identified autophosphorylation sites are conserved in many kinases, suggesting that, by homology, these complexes may provide insight into autophosphorylation complex interfaces of kinases that are relevant drug targets.


Subjects
Databases, Protein; Lymphocyte Specific Protein Tyrosine Kinase p56(lck); Protein Serine-Threonine Kinases; Protein-Tyrosine Kinases; Receptor, EphA2; Receptor, Macrophage Colony-Stimulating Factor; Amino Acid Substitution; HEK293 Cells; Humans; Lymphocyte Specific Protein Tyrosine Kinase p56(lck)/chemistry; Lymphocyte Specific Protein Tyrosine Kinase p56(lck)/genetics; Lymphocyte Specific Protein Tyrosine Kinase p56(lck)/metabolism; Mutation, Missense; Phosphorylation/physiology; Protein Serine-Threonine Kinases/chemistry; Protein Serine-Threonine Kinases/genetics; Protein Serine-Threonine Kinases/metabolism; Protein Structure, Tertiary; Protein-Tyrosine Kinases/chemistry; Protein-Tyrosine Kinases/genetics; Protein-Tyrosine Kinases/metabolism; Receptor, EphA2/chemistry; Receptor, EphA2/genetics; Receptor, EphA2/metabolism; Receptor, Macrophage Colony-Stimulating Factor/chemistry; Receptor, Macrophage Colony-Stimulating Factor/genetics; Receptor, Macrophage Colony-Stimulating Factor/metabolism
16.
Nucleic Acids Res ; 43(Database issue): D432-8, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25392411

ABSTRACT

Classification of the structures of the complementarity determining regions (CDRs) of antibodies is critically important for antibody structure prediction and computational design. We have previously performed a clustering of antibody CDR conformations and defined a systematic nomenclature consisting of the CDR, length and an integer starting from the largest to the smallest cluster in the data set (e.g. L1-11-1). We present PyIgClassify (for Python-based immunoglobulin classification; available at http://dunbrack2.fccc.edu/pyigclassify/), a database and web server that provides access to assignments of all CDR structures in the PDB to our classification system. The database includes assignments to the IMGT germline V regions for heavy and light chains for several species. For humanized antibodies, the assignment of the frameworks is to human germlines and the CDRs to the germlines of mice or other species sources. The database can be searched by PDB entry, cluster identifier and IMGT germline group (e.g. human IGHV1). The entire database is downloadable so that users may filter the data as needed for antibody structure analysis, prediction and design.
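
The nomenclature described in this abstract (CDR, length, and cluster rank, as in L1-11-1) is simple to parse after downloading the database. The sketch below handles only that basic three-part form; the names `CdrCluster` and `parse_cluster_id` are hypothetical, and identifiers in the actual database may carry further annotations that the abstract does not describe.

```python
import re
from typing import NamedTuple


class CdrCluster(NamedTuple):
    cdr: str     # CDR name, e.g. "L1" or "H3"
    length: int  # CDR length in residues
    rank: int    # 1 = largest cluster for this CDR and length


# Basic form only: <chain H or L><CDR number 1-3>-<length>-<rank>.
_CLUSTER_ID = re.compile(r"([HL][123])-(\d+)-(\d+)")


def parse_cluster_id(identifier: str) -> CdrCluster:
    """Split an identifier such as 'L1-11-1' into its three components."""
    match = _CLUSTER_ID.fullmatch(identifier.strip())
    if match is None:
        raise ValueError(f"not a basic CDR cluster identifier: {identifier!r}")
    cdr, length, rank = match.groups()
    return CdrCluster(cdr, int(length), int(rank))


example = parse_cluster_id("L1-11-1")
```

Keeping the three parts as typed fields makes it straightforward to filter, for instance, all clusters of one CDR with a given length.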


Subjects
Complementarity Determining Regions/chemistry; Databases, Protein; Animals; Complementarity Determining Regions/classification; Humans; Immunoglobulin Heavy Chains/chemistry; Immunoglobulin Light Chains/chemistry; Internet; Mice
17.
PLoS One ; 9(6): e98309, 2014.
Article in English | MEDLINE | ID: mdl-24922057

ABSTRACT

Many if not most proteins function in oligomeric assemblies of one or more protein sequences. The Protein Data Bank provides coordinates for biological assemblies for each entry, at least 60% of which are dimers or larger assemblies. BioAssemblyModeler (BAM) is a graphical user interface to the basic steps in homology modeling of protein homooligomers and heterooligomers from the biological assemblies provided in the PDB. BAM takes as input up to six different protein sequences and begins by assigning Pfam domains to the target sequences. The program utilizes a complete assignment of Pfam domains to sequences in the PDB, PDBfam (http://dunbrack2.fccc.edu/protcid/pdbfam), to obtain templates that contain any or all of the domains assigned to the target sequence(s). The contents of the biological assemblies of potential templates are provided, and alignments of the target sequences to the templates are produced with a profile-profile alignment algorithm. BAM provides for visual examination and mouse-editing of the alignments supported by target and template secondary structure information and a 3D viewer of the template biological assembly. Side-chain coordinates for a model of the biological assembly are built with the program SCWRL4. A built-in protocol navigation system guides the user through all stages of homology modeling from input sequences to a three-dimensional model of the target complex. AVAILABILITY: http://dunbrack.fccc.edu/BAM.


Subjects
Sequence Alignment/methods , Sequence Analysis, Protein/methods , Sequence Homology, Amino Acid , Software , Animals , Humans , Protein Subunits/chemistry
18.
Proteins ; 81(2): 199-213, 2013 Feb.
Article in English | MEDLINE | ID: mdl-22965855

ABSTRACT

Single nucleotide polymorphisms (SNPs) are the most frequent variation in the human genome. Nonsynonymous SNPs that lead to missense mutations can be neutral or deleterious, and several computational methods have been presented that predict the phenotype of human missense mutations. These methods use sequence-based and structure-based features in various combinations, relying on different statistical distributions of these features for deleterious and neutral mutations. One structure-based feature that has not been studied significantly is the accessible surface area within biologically relevant oligomeric assemblies. These assemblies are different from the crystallographic asymmetric unit for more than half of X-ray crystal structures. We find that mutations in the core of proteins or in the interfaces in biological assemblies are significantly more likely to be disease-associated than those on the surface of the biological assemblies. For structures with more than one protein in the biological assembly (whether the same sequence or different), we find the accessible surface area from biological assemblies provides a statistically significant improvement in prediction over the accessible surface area of monomers from protein crystal structures (P = 6e-5). When adding this information to sequence-based features such as the difference between wildtype and mutant position-specific profile scores, the improvement from biological assemblies is statistically significant but much smaller (P = 0.018). Combining this information with sequence-based features in a support vector machine leads to 82% accuracy on a balanced dataset of 50% disease-associated mutations from SwissVar and 50% neutral mutations from human/primate sequence differences in orthologous proteins.


Subjects
Databases, Protein , Mutation, Missense , Proteins/chemistry , Proteins/genetics , Support Vector Machine , Amino Acid Sequence , Animals , Humans , Molecular Sequence Data , Odds Ratio , Phenotype , Primates , Protein Conformation , Protein Subunits , Sequence Alignment
19.
Bioinformatics ; 28(21): 2763-72, 2012 Nov 01.
Article in English | MEDLINE | ID: mdl-22942020

ABSTRACT

MOTIVATION: Automating the assignment of existing domain and protein family classifications to new sets of sequences is an important task. Current methods often miss assignments because remote relationships fail to achieve statistical significance. Some assignments are not as long as the actual domain definitions because local alignment methods often cut alignments short. Long insertions in query sequences often erroneously result in two copies of the domain assigned to the query. Divergent repeat sequences in proteins are often missed. RESULTS: We have developed a multilevel procedure to produce nearly complete assignments of protein families of an existing classification system to a large set of sequences. We apply this to the task of assigning Pfam domains to sequences and structures in the Protein Data Bank (PDB). We found that HHsearch alignments frequently scored more remotely related Pfams in Pfam clans higher than closely related Pfams, thus leading to erroneous assignment at the Pfam family level. A greedy algorithm allowing for partial overlaps was therefore applied first to sequence/HMM alignments, then HMM-HMM alignments and then structure alignments, taking care to join partial alignments split by large insertions into single-domain assignments. Additional assignment of repeat Pfams with weaker E-values was allowed after stronger assignments of the repeat HMM. Our assignments, presented in a database called PDBfam, contain Pfams for 99.4% of chains >50 residues. AVAILABILITY: The Pfam assignment data in PDBfam are available at http://dunbrack2.fccc.edu/ProtCid/PDBfam, which can be searched by PDB codes and Pfam identifiers. They will be updated regularly.
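The greedy step with partial overlaps mentioned in the abstract can be illustrated roughly as follows. This is a sketch under stated assumptions, not the PDBfam procedure: the `Hit` type, the function names and the `max_overlap` tolerance of 10 residues are hypothetical, and the sketch covers neither the successive alignment levels nor the joining of alignments split by insertions.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """Hypothetical candidate domain assignment on a query sequence."""
    family: str
    start: int     # 1-based, inclusive
    end: int       # 1-based, inclusive
    evalue: float  # smaller means stronger


def overlap(a: Hit, b: Hit) -> int:
    """Number of residues shared by two hits (0 if they do not overlap)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def greedy_assign(hits: List[Hit], max_overlap: int = 10) -> List[Hit]:
    """Accept hits from strongest to weakest E-value, tolerating small overlaps.

    max_overlap is an assumed tolerance, not a value taken from the paper.
    """
    accepted: List[Hit] = []
    for hit in sorted(hits, key=lambda h: h.evalue):
        if all(overlap(hit, kept) <= max_overlap for kept in accepted):
            accepted.append(hit)
    return sorted(accepted, key=lambda h: h.start)


candidates = [
    Hit("FamA", 1, 100, 1e-30),
    Hit("FamB", 95, 200, 1e-10),  # overlaps FamA by 6 residues: tolerated
    Hit("FamC", 50, 150, 1e-20),  # overlaps FamA by 51 residues: rejected
]
print([h.family for h in greedy_assign(candidates)])
```

Processing hits in order of E-value means a weaker hit can never displace a stronger one, which is what makes the procedure greedy.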


Subjects
Algorithms , Databases, Protein , Sequence Alignment/methods , Amino Acid Sequence , Databases, Factual , Information Storage and Retrieval , Models, Molecular , Models, Statistical , Proteins/chemistry , Proteins/genetics
20.
Nucleic Acids Res ; 39(Database issue): D761-70, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21036862

ABSTRACT

The protein common interface database (ProtCID) is a database that contains clusters of similar homodimeric and heterodimeric interfaces observed in multiple crystal forms (CFs). Such interfaces, especially of homologous but non-identical proteins, have been associated with biologically relevant interactions. In ProtCID, protein chains in the Protein Data Bank (PDB) are grouped based on their Pfam domain architectures. For a single Pfam architecture, all the dimers present in each CF are constructed and compared with those in other CFs that contain the same domain architecture. Interfaces occurring in two or more CFs comprise an interface cluster in the database. The same process is used to compare heterodimers of chains with different domain architectures. By examining interfaces that are shared by many homologous proteins in different CFs, we find that the PDB and the Protein Interfaces, Surfaces, and Assemblies (PISA) are not always consistent in their annotations of biological assemblies in a homologous family. Our data therefore provide an independent check on publicly available annotations of the structures of biological interactions for PDB entries. Common interfaces may also be useful in studies of protein evolution. Coordinates for all interfaces in a cluster are downloadable for further analysis. ProtCID is available at http://dunbrack2.fccc.edu/protcid.
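The rule that an interface cluster consists of interfaces occurring in two or more CFs can be sketched as a simple grouping step. This is only an illustration: it assumes the structural comparison of interfaces has already been done and summarized as a hypothetical group label per observed interface, and `common_interfaces` is not a ProtCID function.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def common_interfaces(
    observations: Iterable[Tuple[str, str]],
    min_crystal_forms: int = 2,
) -> Dict[str, List[str]]:
    """Keep interface groups seen in at least min_crystal_forms crystal forms.

    Each observation is (crystal_form_id, interface_group), where
    interface_group is an assumed, precomputed label for similar interfaces.
    """
    forms_by_group = defaultdict(set)
    for crystal_form, group in observations:
        forms_by_group[group].add(crystal_form)
    return {
        group: sorted(forms)
        for group, forms in forms_by_group.items()
        if len(forms) >= min_crystal_forms
    }


observed = [
    ("CF1", "dimerA"),
    ("CF2", "dimerA"),  # same interface in a second crystal form
    ("CF1", "dimerB"),
    ("CF1", "dimerB"),  # repeated within one crystal form only
]
print(common_interfaces(observed))
```

Counting distinct crystal forms rather than raw observations keeps an interface that recurs only within a single crystal form out of the result.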


Subjects
Databases, Protein , Protein Interaction Domains and Motifs , Protein Interaction Mapping , Crystallography, X-Ray , Databases, Protein/statistics & numerical data , Dimerization , Models, Molecular , Phylogeny , Proteins/chemistry , Proteins/classification , Proteins/genetics , Sequence Homology, Amino Acid