Results 1-16 of 16
1.
Protein Sci ; 27(1): 293-315, 2018 Jan.
Article in English | MEDLINE | ID: mdl-29067766

ABSTRACT

This paper describes the current update on macromolecular model validation services that are provided at the MolProbity website, emphasizing changes and additions since the previous review in 2010. There have been many infrastructure improvements, including rewrite of previous Java utilities to now use existing or newly written Python utilities in the open-source CCTBX portion of the Phenix software system. This improves long-term maintainability and enhances the thorough integration of MolProbity-style validation within Phenix. There is now a complete MolProbity mirror site at http://molprobity.manchester.ac.uk. GitHub serves our open-source code, reference datasets, and the resulting multi-dimensional distributions that define most validation criteria. Coordinate output after Asn/Gln/His "flip" correction is now more idealized, since the post-refinement step has apparently often been skipped in the past. Two distinct sets of heavy-atom-to-hydrogen distances and accompanying van der Waals radii have been researched and improved in accuracy, one for the electron-cloud-center positions suitable for X-ray crystallography and one for nuclear positions. New validations include messages at input about problem-causing format irregularities, updates of Ramachandran and rotamer criteria from the million quality-filtered residues in a new reference dataset, the CaBLAM Cα-CO virtual-angle analysis of backbone and secondary structure for cryoEM or low-resolution X-ray, and flagging of the very rare cis-nonProline and twisted peptides which have recently been greatly overused. Due to wide application of MolProbity validation and corrections by the research community, in Phenix, and at the worldwide Protein Data Bank, newly deposited structures have continued to improve greatly as measured by MolProbity's unique all-atom clashscore.
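MolProbity's clashscore is commonly described as the number of serious all-atom steric overlaps (at least 0.4 Å) per 1000 atoms. The sketch below illustrates only that arithmetic. It assumes a hypothetical `clashscore` helper, made-up van der Waals radii in `VDW_RADII`, and a brute-force pair loop; it does not reproduce MolProbity's contact analysis, its exclusion of bonded neighbours, or its treatment of H-bond partners.

```python
import math
from itertools import combinations

# Illustrative van der Waals radii in Å (assumed values, not MolProbity's tables).
VDW_RADII = {"C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80, "H": 1.10}

def clashscore(atoms, min_overlap=0.4):
    """Serious overlaps (>= min_overlap Å) per 1000 atoms.

    atoms: list of (element, (x, y, z)). Brute-force O(n^2); real validation
    also excludes bonded neighbours and treats H-bond partners specially,
    which this sketch does not.
    """
    clashes = 0
    for (e1, p1), (e2, p2) in combinations(atoms, 2):
        overlap = VDW_RADII[e1] + VDW_RADII[e2] - math.dist(p1, p2)
        if overlap >= min_overlap:
            clashes += 1
    return 1000.0 * clashes / len(atoms)
```

Two carbons 2.9 Å apart overlap by 0.5 Å and count as one serious clash, so a three-atom toy model containing that pair scores about 333; lower clashscores are better.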


Subjects
Databases, Protein; Models, Molecular; Programming Languages; Proteins/chemistry; Proteins/genetics
2.
J Chem Theory Comput ; 11(2): 609-22, 2015 Feb 10.
Article in English | MEDLINE | ID: mdl-25866491

ABSTRACT

Interactions between polar atoms are challenging to model because at very short ranges they form hydrogen bonds (H-bonds) that are partially covalent in character and exhibit strong orientation preferences; at longer ranges the orientation preferences are lost, but significant electrostatic interactions between charged and partially charged atoms remain. To simultaneously model these two types of behavior, we refined an orientation-dependent model of hydrogen bonds [Kortemme et al. J. Mol. Biol. 2003, 326, 1239] used by the molecular modeling program Rosetta and then combined it with a distance-dependent Coulomb model of electrostatics. The functional form of the H-bond potential is physically motivated and parameters are fit so that H-bond geometries that Rosetta generates closely resemble H-bond geometries in high-resolution crystal structures. The combined potentials improve performance in a variety of scientific benchmarks, including decoy discrimination, side chain prediction, and native sequence recovery in protein design simulations, and establish a new standard energy function for Rosetta.
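A minimal sketch of the electrostatic half of such a model, assuming a linear distance-dependent dielectric ε(r) = 4r, a clamp at a hypothetical short-range distance `r_min`, and a shift that brings the energy to zero at a hypothetical cutoff `r_max`. Rosetta's actual electrostatics term uses its own dielectric function and smoothing, which this does not reproduce; the function names are illustrative.

```python
COULOMB_K = 332.0637  # Coulomb constant in kcal*Å/(mol*e^2)

def dielectric(r):
    """Assumed linear distance-dependent dielectric, eps(r) = 4r."""
    return 4.0 * r

def elec_energy(q1, q2, r, r_min=1.5, r_max=5.5):
    """Shifted Coulomb energy (kcal/mol) between partial charges q1, q2 (in e)."""
    if r >= r_max:
        return 0.0
    r = max(r, r_min)  # clamp to avoid the singularity at very short range

    def raw(d):
        return COULOMB_K * q1 * q2 / (dielectric(d) * d)

    return raw(r) - raw(r_max)  # shift so the energy is continuous at r_max
```

The shift keeps the energy continuous at the cutoff, so small coordinate changes near `r_max` do not produce energy jumps during sampling.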


Subjects
Models, Chemical; Models, Molecular; Software; Static Electricity; Hydrogen Bonding; Molecular Structure
3.
Methods Enzymol ; 523: 109-43, 2013.
Article in English | MEDLINE | ID: mdl-23422428

ABSTRACT

Accurate energy functions are critical to macromolecular modeling and design. We describe new tools for identifying inaccuracies in energy functions and guiding their improvement, and illustrate the application of these tools to the improvement of the Rosetta energy function. The feature analysis tool identifies discrepancies between structures deposited in the PDB and low-energy structures generated by Rosetta; these likely arise from inaccuracies in the energy function. The optE tool optimizes the weights on the different components of the energy function by maximizing the recapitulation of a wide range of experimental observations. We use the tools to examine three proposed modifications to the Rosetta energy function: improving the unfolded state energy model (reference energies), using bicubic spline interpolation to generate knowledge-based torsional potentials, and incorporating the recently developed Dunbrack 2010 rotamer library (Shapovalov & Dunbrack, 2011).


Subjects
Macromolecular Substances/chemistry; Algorithms; Protein Conformation; Software
4.
IEEE Trans Inf Technol Biomed ; 14(5): 1137-43, 2010 Sep.
Article in English | MEDLINE | ID: mdl-20570776

ABSTRACT

We describe a new approach for inferring the functional relationships between nonhomologous protein families by looking at statistical enrichment of alternative function predictions in classification hierarchies such as Gene Ontology (GO) and Structural Classification of Proteins (SCOP). Protein structures are represented by robust graph representations, and the fast frequent subgraph mining algorithm is applied to protein families to generate sets of family-specific packing motifs, i.e., amino acid residue-packing patterns shared by most family members but infrequent in other proteins. The function of a protein is inferred by identifying in it motifs characteristic of a known family. We employ these family-specific motifs to elucidate functional relationships between families in the GO and SCOP hierarchies. Specifically, we postulate that two families are functionally related if one family is statistically enriched by motifs characteristic of another family, i.e., if the number of proteins in a family containing a motif from another family is greater than expected by chance. This function-inference method can help annotate proteins of unknown function, establish functional neighbors of existing families, and help specify alternate functions for known proteins.
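The enrichment criterion ("greater than expected by chance") can be made concrete with a one-sided hypergeometric test. In this sketch, which assumes a hypergeometric null model rather than reproducing the paper's exact statistics, `M` proteins form the background, `K` of them contain a motif characteristic of family A, and family B has `n` proteins of which `k` contain that motif. The function name `enrichment_pvalue` is hypothetical.

```python
from math import comb

def enrichment_pvalue(k, n, K, M):
    """One-sided hypergeometric tail P(X >= k).

    M: background proteins; K: background proteins carrying family A's motif;
    n: proteins in family B; k: family-B proteins carrying that motif.
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(M - K, n - i) for i in range(k, upper + 1))
    return hits / comb(M, n)  # exact integer arithmetic until the final division
```

For example, `enrichment_pvalue(8, 20, 50, 1000)` asks how surprising it is to find 8 motif carriers in a 20-member family when only 5% of the background carries the motif; a very small value would support a functional relationship between the two families.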


Subjects
Algorithms; Computational Biology/methods; Data Mining/methods; Protein Interaction Domains and Motifs; Proteins/chemistry; Genomics/methods; Models, Molecular; NADP/chemistry; Nuclear Proteins/chemistry; Phosphoprotein Phosphatases/chemistry; Protein Conformation; Proteins/classification
5.
J Comput Aided Mol Des ; 23(11): 785-97, 2009 Nov.
Article in English | MEDLINE | ID: mdl-19548090

ABSTRACT

This paper describes several case studies concerning protein function inference from structure using our novel approach described in the accompanying paper. This approach employs family-specific motifs, i.e. three-dimensional amino acid packing patterns that are statistically prevalent within a protein family. For our case studies we have selected families from the SCOP and EC classifications and analyzed the discriminating power of the motifs in depth. We have devised several benchmarks to compare motifs mined from unweighted topological graph representations of protein structures with those from distance-labeled (weighted) representations, demonstrating the superiority of the latter for function inference in most families. We have tested the robustness of our motif library by inferring the function of new members added to SCOP families, and by discriminating between several families that are structurally similar but functionally divergent. Furthermore, we have applied our method to predict function for several proteins characterized in structural genomics projects, including orphan structures, and we discuss several selected predictions in depth. Some of our predictions have been corroborated by other computational methods, and some have been confirmed by independent experimental studies, supporting our approach for protein function inference from structure.


Subjects
Models, Molecular; Proteins/chemistry; Proteins/metabolism; Algorithms; Amino Acid Motifs; Catalytic Domain; Computational Biology; Databases, Protein; Proteins/classification; Sensitivity and Specificity
6.
J Comput Aided Mol Des ; 23(11): 773-84, 2009 Nov.
Article in English | MEDLINE | ID: mdl-19543979

ABSTRACT

Protein function prediction is one of the central problems in computational biology. We present a novel automated protein structure-based function prediction method using libraries of local residue packing patterns that are common to most proteins in a known functional family. Critical to this approach is the representation of a protein structure as a graph where residue vertices (residue name used as a vertex label) are connected by geometrical proximity edges. The approach employs two steps. First, it uses a fast subgraph mining algorithm to find all occurrences of family-specific labeled subgraphs for all well-characterized protein structural and functional families. Second, it queries a new structure for occurrences of a set of motifs characteristic of a known family, using a graph index to speed up Ullmann's subgraph isomorphism algorithm. The confidence of function inference from structure depends on the number of family-specific motifs found in the query structure compared with their distribution in a large non-redundant database of proteins. This method can assign a new structure to a specific functional family in cases where sequence alignments, sequence patterns, structural superposition and active site templates fail to provide accurate annotation.
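Both steps rest on a residue graph and a labeled subgraph search. The sketch below builds a proximity graph from one representative coordinate per residue (the 8 Å default cutoff is a hypothetical choice) and enumerates embeddings of a small motif by plain backtracking. It is not Ullmann's algorithm, which additionally prunes candidates with a refinement matrix, and it omits the graph index the abstract mentions; `proximity_graph` and `find_embeddings` are illustrative names.

```python
import math
from itertools import combinations

def proximity_graph(residues, cutoff=8.0):
    """Vertices are residues labeled by name; edges join residues within cutoff Å.

    residues: list of (name, (x, y, z)), e.g. one C-alpha per residue.
    """
    labels = {i: name for i, (name, _) in enumerate(residues)}
    adj = {i: set() for i in labels}
    for i, j in combinations(range(len(residues)), 2):
        if math.dist(residues[i][1], residues[j][1]) <= cutoff:
            adj[i].add(j)
            adj[j].add(i)
    return labels, adj

def find_embeddings(p_labels, p_adj, t_labels, t_adj):
    """Yield every label-preserving map of pattern vertices into target vertices
    that sends each pattern edge to a target edge (non-induced matching)."""
    order = list(p_labels)

    def extend(mapping, used):
        if len(mapping) == len(order):
            yield dict(mapping)
            return
        u = order[len(mapping)]
        for v in t_labels:
            if v in used or t_labels[v] != p_labels[u]:
                continue
            # Every already-mapped neighbour of u must be adjacent to v.
            if all(mapping[w] in t_adj[v] for w in p_adj[u] if w in mapping):
                mapping[u] = v
                used.add(v)
                yield from extend(mapping, used)
                del mapping[u]
                used.discard(v)

    yield from extend({}, set())
```

Counting how many embeddings of each family motif occur in a query structure gives the raw numbers that a confidence score, like the one the abstract describes, would compare against a background distribution.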


Subjects
Models, Molecular; Models, Statistical; Proteins/chemistry; Proteins/metabolism; Algorithms; Amino Acid Motifs; Computational Biology; Databases, Protein; Proteins/classification; Sensitivity and Specificity
7.
Article in English | MEDLINE | ID: mdl-18989040

ABSTRACT

Pairwise structure alignment commonly uses root mean square deviation (RMSD) to measure structural similarity, and methods for optimizing RMSD are well established. We extend RMSD to weighted RMSD for multiple structures. Using multiplicative weights, we show that weighted RMSD for all pairs is the same as weighted RMSD to an average of the structures. Thus, using RMSD or weighted RMSD implies that the average is a consensus structure. We show that, in general, the two tasks of finding the optimal translations and rotations for minimizing weighted RMSD cannot be separated for multiple structures as they can for pairs; this is an inherent difficulty, and a fact ignored by previous work. We nevertheless develop a near-linear iterative algorithm that converges weighted RMSD to a local minimum. In 10,000 experiments of gapped alignment on each of 23 protein families from HOMSTRAD, where each structure starts with a random translation and rotation, the algorithm converges rapidly to the same minimum. Finally, we propose a heuristic method that iteratively removes the effect of outliers and finds well-aligned positions that determine the structurally conserved region, by modeling B-factors and deviations from the average positions as weights and iteratively assigning higher weights to better-aligned atoms.
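The "superpose everything onto the running average" loop can be sketched with NumPy and the weighted Kabsch rotation. The abstract notes that translations and rotations cannot, in general, be optimized separately; this sketch sidesteps that by assuming no gaps (every structure has every aligned position) and one weight vector shared by all structures, in which case centering on the weighted centroid first is valid. The function names are hypothetical, and this is not the paper's near-linear algorithm.

```python
import numpy as np

def kabsch_rotate(P, Q, w):
    """Rotate centred P (n x 3) to minimise sum_k w_k * |R p_k - q_k|^2 against centred Q."""
    H = P.T @ (w[:, None] * Q)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # d = -1 would signal a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return P @ R.T

def align_to_average(structures, weights=None, tol=1e-10, max_iter=100):
    """Iteratively superpose every structure onto the running average.

    Assumes no gaps and a single weight vector shared by all structures.
    """
    X = [np.asarray(s, dtype=float) for s in structures]
    w = np.ones(len(X[0])) if weights is None else np.asarray(weights, dtype=float)
    X = [x - (w @ x) / w.sum() for x in X]  # move weighted centroids to the origin
    avg = X[0].copy()
    for _ in range(max_iter):
        X = [kabsch_rotate(x, avg, w) for x in X]
        new_avg = np.mean(X, axis=0)
        shift = float(np.sum(w[:, None] * (new_avg - avg) ** 2))
        avg = new_avg
        if shift < tol:
            break
    return X, avg
```

Down-weighting poorly aligned positions between calls, as the abstract's outlier heuristic does with B-factors and deviations, would only change the `weights` argument.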


Subjects
Algorithms; Pattern Recognition, Automated/methods; Proteins/chemistry; Proteins/ultrastructure; Sequence Alignment/methods; Sequence Analysis, Protein/methods; Amino Acid Sequence; Data Interpretation, Statistical; Least-Squares Analysis; Molecular Sequence Data
8.
J Math Biol ; 56(1-2): 253-78, 2008 Jan.
Article in English | MEDLINE | ID: mdl-17401565

ABSTRACT

Although accurate details in RNA structure are of great importance for understanding RNA function, the backbone conformation is difficult to determine, and most existing RNA structures show serious steric clashes (≥ 0.4 Å overlap) when hydrogen atoms are taken into account. We have developed a program called RNABC (RNA Backbone Correction) that performs local perturbations to search for alternative conformations that avoid those steric clashes or other local geometry problems. Its input is an all-atom coordinate file for an RNA crystal structure (usually from the MolProbity web service), with problem areas specified. RNABC rebuilds a suite (the unit from sugar to sugar) by anchoring the phosphorus and base positions, which are clearest in crystallographic electron density, and reconstructing the other atoms using forward kinematics. Geometric parameters are constrained within user-specified tolerance of canonical or original values, and torsion angles are constrained to ranges defined through empirical database analyses. Several optimizations reduce the time required to search the many possible conformations. The output results are clustered and presented to the user, who can choose whether to accept one of the alternative conformations. Two test evaluations show the effectiveness of RNABC, first on the S-motifs from 42 RNA structures, and second on the worst problem suites (clusters of bad clashes, or serious sugar pucker outliers) in 25 unrelated RNA structures. Among the 101 S-motifs, 88 had diagnosed problems, and RNABC produced clash-free conformations with acceptable geometry for 71 of those (about 80%). For the 154 worst problem suites, RNABC proposed alternative conformations for 72. All but 8 of those were judged acceptable after examining electron density (where available) and local conformation. Thus, even for these worst cases, nearly half the time RNABC suggested corrections suitable to initiate further crystallographic refinement.
The program is available from http://kinemage.biochem.duke.edu .
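The basic step of a forward-kinematics rebuild is placing one atom from its three predecessors plus a bond length, a bond angle, and a torsion. The sketch below uses the standard natural-extension reference frame (NeRF) construction; it is not RNABC's code, the function names are hypothetical, and angles are in radians with the usual IUPAC torsion sign convention.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def _unit(a):
    n = math.sqrt(_dot(a, a))
    return tuple(x / n for x in a)

def place_atom(a, b, c, bond, angle, torsion):
    """Return d with |cd| = bond, angle(b, c, d) = angle, dihedral(a, b, c, d) = torsion."""
    bc = _unit(_sub(c, b))
    n = _unit(_cross(_sub(b, a), bc))  # normal to the a-b-c plane
    m = _cross(n, bc)                  # completes a right-handed frame (bc, m, n)
    d0 = -bond * math.cos(angle)
    d1 = bond * math.sin(angle) * math.cos(torsion)
    d2 = bond * math.sin(angle) * math.sin(torsion)
    return tuple(c[i] + d0 * bc[i] + d1 * m[i] + d2 * n[i] for i in range(3))

def dihedral(a, b, c, d):
    """Signed torsion angle a-b-c-d in radians."""
    b1, b2, b3 = _sub(b, a), _sub(c, b), _sub(d, c)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.atan2(y, _dot(n1, n2))
```

Chaining `place_atom` along a suite, with torsions drawn from allowed ranges while the phosphorus and base anchors stay fixed, is the general shape of such a search; closing the chain onto the far anchor is the harder part this sketch leaves out.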


Subjects
Computational Biology/methods; Nucleic Acid Conformation; RNA/chemistry; Crystallography, X-Ray; Software
9.
Nucleic Acids Res ; 35(Web Server issue): W375-83, 2007 Jul.
Article in English | MEDLINE | ID: mdl-17452350

ABSTRACT

MolProbity is a general-purpose web server offering quality validation for 3D structures of proteins, nucleic acids and complexes. It provides detailed all-atom contact analysis of any steric problems within the molecules as well as updated dihedral-angle diagnostics, and it can calculate and display the H-bond and van der Waals contacts in the interfaces between components. An integral step in the process is the addition and full optimization of all hydrogen atoms, both polar and nonpolar. New analysis functions have been added for RNA, for interfaces, and for NMR ensembles. Additionally, both the web site and major component programs have been rewritten to improve speed, convenience, clarity and integration with other resources. MolProbity results are reported in multiple forms: as overall numeric scores, as lists or charts of local problems, as downloadable PDB and graphics files, and most notably as informative, manipulable 3D kinemage graphics shown online in the KiNG viewer. This service is available free to all users at http://molprobity.biochem.duke.edu.


Subjects
Computational Biology/methods; Nucleic Acid Conformation; Nucleic Acids/chemistry; Protein Conformation; Software; Hydrogen Bonding; Internet; Macromolecular Substances; Models, Molecular; Molecular Structure; Proteins/chemistry; Reproducibility of Results; User-Computer Interface
10.
J Comput Chem ; 28(8): 1336-41, 2007 Jun.
Article in English | MEDLINE | ID: mdl-17285560

ABSTRACT

Although quantities derived from solvent accessible surface areas (SASA) are useful in many applications in protein design and structural biology, the computational cost of accurate SASA calculation makes SASA-based scores difficult to integrate into commonly used protein design methodologies. We demonstrate a method for maintaining accurate SASA during a Monte Carlo search of sequence and rotamer space for a fixed protein backbone. We extend the fast Le Grand and Merz algorithm (Le Grand and Merz, J Comput Chem, 14, 349), which discretizes the solvent accessible surface for each atom by placing dots on a sphere and combines Boolean masks to determine which dots are exposed. By replacing semigroup operations with group operations (from Boolean logic to counting dot coverage) we support SASA updates. Our algorithm takes time proportional to the number of atoms affected by rotamer substitution, rather than the number of atoms in the protein. For design simulations with a one hundred residue protein our approach is approximately 145 times faster than performing a Le Grand and Merz SASA calculation from scratch following each rotamer substitution. To demonstrate practical effectiveness, we optimize a SASA-based measure of protein packing in the complete redesign of a large set of proteins and protein-protein interfaces.
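The key idea, replacing Boolean "is this dot buried?" masks with per-dot coverage counts so that an atom's old contribution can be subtracted, can be sketched as follows. Assumptions: a golden-spiral dot distribution rather than Le Grand and Merz's exact dot scheme, a 1.4 Å probe, a hypothetical `CountingSASA` class that moves one atom at a time, and a linear scan with a cheap sphere-overlap test where a real implementation would use a spatial grid to find neighbours.

```python
import math

PROBE = 1.4  # assumed water-probe radius, Å

def sphere_dots(n):
    """Roughly uniform points on the unit sphere (golden-spiral construction)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    dots = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        dots.append((r * math.cos(golden * i), r * math.sin(golden * i), z))
    return dots

class CountingSASA:
    """Per-dot coverage counts: a dot is exposed when its count is zero."""

    def __init__(self, centers, radii, n_dots=200):
        self.unit = sphere_dots(n_dots)
        self.centers = [tuple(c) for c in centers]
        self.radii = [r + PROBE for r in radii]  # solvent-expanded radii
        self.counts = [[0] * n_dots for _ in self.centers]
        for i in range(len(self.centers)):
            for j in range(len(self.centers)):
                if i != j:
                    self._cover(i, j, +1)

    def _dots(self, i):
        (cx, cy, cz), r = self.centers[i], self.radii[i]
        return [(cx + r * x, cy + r * y, cz + r * z) for x, y, z in self.unit]

    def _cover(self, i, j, delta):
        """Add delta to every dot of atom i lying inside atom j's expanded sphere."""
        cj, rj = self.centers[j], self.radii[j]
        if math.dist(self.centers[i], cj) >= self.radii[i] + rj:
            return  # spheres are disjoint, so no dot of i can lie inside j
        for k, p in enumerate(self._dots(i)):
            if math.dist(p, cj) < rj:
                self.counts[i][k] += delta

    def move(self, j, new_center):
        """Move atom j, updating counts instead of recomputing from scratch."""
        others = [i for i in range(len(self.centers)) if i != j]
        for i in others:
            self._cover(i, j, -1)  # subtract j's old coverage
        self.centers[j] = tuple(new_center)
        self.counts[j] = [0] * len(self.unit)
        for i in others:
            self._cover(i, j, +1)  # add j's new coverage of its neighbours
            self._cover(j, i, +1)  # rebuild j's own dot counts

    def area(self, i):
        exposed = sum(1 for c in self.counts[i] if c == 0)
        return 4.0 * math.pi * self.radii[i] ** 2 * exposed / len(self.unit)
```

The counts matter because a Boolean OR cannot be undone, whereas a count can be decremented, so moving an atom only touches the dots of atoms whose expanded spheres overlap its old or new position.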


Subjects
Computational Biology/methods; Proteins/chemistry; Solvents/chemistry; Algorithms; Monte Carlo Method; Surface Properties
11.
Protein Sci ; 15(6): 1537-43, 2006 Jun.
Article in English | MEDLINE | ID: mdl-16731985

ABSTRACT

We describe a method to assign a protein structure to a functional family using family-specific fingerprints. Fingerprints represent amino acid packing patterns that occur in most members of a family but are rare in the background, a nonredundant subset of PDB; their information is additional to sequence alignments, sequence patterns, structural superposition, and active-site templates. Fingerprints were derived for 120 families in SCOP using Frequent Subgraph Mining. For a new structure, all occurrences of these family-specific fingerprints may be found by a fast algorithm for subgraph isomorphism; the structure can then be assigned to a family with a confidence value derived from the number of fingerprints found and their distribution in background proteins. In validation experiments, we infer the function of new members added to SCOP families and we discriminate between structurally similar, but functionally divergent TIM barrel families. We then apply our method to predict function for several structural genomics proteins, including orphan structures. Some predictions have been corroborated by other computational methods and some validated by subsequent functional characterization.


Subjects
Proteins/chemistry; Proteins/metabolism; Structure-Activity Relationship; Bacterial Proteins/chemistry; Bacterial Proteins/metabolism; Escherichia coli Proteins/chemistry; Escherichia coli Proteins/metabolism; Models, Molecular; Protein Conformation; Proteins/genetics; Reproducibility of Results; Software
12.
IEEE Trans Vis Comput Graph ; 12(2): 231-42, 2006.
Article in English | MEDLINE | ID: mdl-16509382

ABSTRACT

We review schemes for dividing cubic cells into simplices (tetrahedra) for interpolating from sampled data to ℝ³, present visual and geometric artifacts generated in isosurfaces and volume renderings, and discuss how these artifacts relate to the filter kernels corresponding to the subdivision schemes.
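As one concrete instance of such a scheme, the sketch below interpolates inside a unit cube by splitting it into six tetrahedra that share the main diagonal (the Kuhn, or Freudenthal, subdivision): sorting the fractional coordinates selects the tetrahedron, and the differences between sorted coordinates are the barycentric weights. It is only one of the kinds of subdivision the paper reviews, and the function name is hypothetical.

```python
def tet_interpolate(corner_values, x, y, z):
    """Piecewise-linear interpolation in the unit cube via the 6-tetrahedron split.

    corner_values[(i, j, k)] holds the sample at corner (i, j, k), i, j, k in {0, 1};
    (x, y, z) must lie in [0, 1]^3.
    """
    coords = (x, y, z)
    axes = sorted(range(3), key=lambda a: coords[a], reverse=True)
    c = [coords[a] for a in axes]  # sorted coordinates, c[0] >= c[1] >= c[2]
    weights = [1.0 - c[0], c[0] - c[1], c[1] - c[2], c[2]]
    vertex = [0, 0, 0]
    total = weights[0] * corner_values[tuple(vertex)]
    for step, a in enumerate(axes):  # walk (0,0,0) -> ... -> (1,1,1) along sorted axes
        vertex[a] = 1
        total += weights[step + 1] * corner_values[tuple(vertex)]
    return total
```

Linear data are reproduced exactly, but other data depend on which diagonal the split uses: a cube with value 1 only at corner (1, 1, 1) interpolates to 0.5 at the cube center here, versus 0.125 with trilinear interpolation. Directional bias of this kind is plausibly where the isosurface artifacts discussed in the abstract originate.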


Subjects
Algorithms; Artifacts; Computer Graphics; Image Enhancement/methods; Image Interpretation, Computer-Assisted/methods; Imaging, Three-Dimensional/methods; Information Storage and Retrieval/methods; Signal Processing, Computer-Assisted; Data Interpretation, Statistical; Numerical Analysis, Computer-Assisted; Sample Size; User-Computer Interface
13.
Article in English | MEDLINE | ID: mdl-17369627

ABSTRACT

Root mean square deviation (RMSD) is often used to measure the difference between structures. We show mathematically that, for multiple structure alignment, the minimum RMSD (weighted at aligned positions or unweighted) for all pairs is the same as the RMSD to the average of the structures. Thus, using RMSD implies that the average is a consensus structure. We use this property to validate and improve algorithms for multiple structure alignment. In particular, we establish the properties of the average structure, and show that an iterative algorithm proposed by Sutcliffe and co-authors can find it efficiently: each iteration takes linear time and the number of iterations is small. We explore the residuals after alignment and assign weights to positions to identify aligned cores of structures. Observing this property also calls into question whether global RMSD is the right way to compare multiple protein structures, and guides the search for more local techniques.
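The core identity can be checked directly in the unweighted case with the superpositions held fixed (the abstract states the result for weighted RMSD as well). For one aligned position, let x_1, ..., x_n be its coordinates in the n structures and x̄ = (1/n) Σ x_i their mean:

```latex
\begin{aligned}
\sum_{i<j} \lVert \mathbf{x}_i - \mathbf{x}_j \rVert^2
  &= \tfrac{1}{2} \sum_{i,j} \left( \lVert \mathbf{x}_i \rVert^2 + \lVert \mathbf{x}_j \rVert^2 - 2\,\mathbf{x}_i \cdot \mathbf{x}_j \right) \\
  &= n \sum_{i} \lVert \mathbf{x}_i \rVert^2 - \Bigl\lVert \sum_{i} \mathbf{x}_i \Bigr\rVert^2 \\
  &= n \Bigl( \sum_{i} \lVert \mathbf{x}_i \rVert^2 - n \lVert \bar{\mathbf{x}} \rVert^2 \Bigr) \\
  &= n \sum_{i=1}^{n} \lVert \mathbf{x}_i - \bar{\mathbf{x}} \rVert^2
\end{aligned}
```

Summing over all aligned positions, the all-pairs sum of squared deviations is exactly n times the sum of squared deviations from the average for any choice of superpositions, so the superpositions that minimize one also minimize the other.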


Subjects
Computational Biology/methods; Proteins/chemistry; Algorithms; Databases, Protein; Models, Molecular; Models, Statistical; Normal Distribution; Protein Conformation; Protein Structure, Tertiary; Reproducibility of Results
14.
Article in English | MEDLINE | ID: mdl-17369641

ABSTRACT

Structure motifs are amino acid packing patterns that occur frequently within a set of protein structures. We define a labeled graph representation of protein structure in which vertices correspond to amino acid residues and edges connect pairs of residues and are labeled by (1) the Euclidean distance between the Cα atoms of the two residues and (2) a Boolean indicating whether the two residues are in physical/chemical contact. Using this representation, a structure motif corresponds to a labeled clique that occurs frequently among the graphs representing the protein structures. The pairwise distance constraints on each edge in a clique serve to limit the variation in geometry among different occurrences of a structure motif. We present an efficient constrained subgraph mining algorithm to discover structure motifs in this setting. Compared with contact graph representations, the number of spurious structure motifs is greatly reduced. Using this algorithm, structure motifs were located for several SCOP families including the Eukaryotic Serine Proteases, Nuclear Binding Domains, Papain-like Cysteine Proteases, and FAD/NAD-linked Reductases. For each family, we typically obtain a handful of motifs within seconds of processing time. The occurrences of these motifs throughout the PDB were strongly associated with the original SCOP family, as measured using a hypergeometric distribution. The motifs were found to cover functionally important sites like the catalytic triad for Serine Proteases and co-factor binding sites for Nuclear Binding Domains. The fact that many motifs are highly family-specific can be used to classify new proteins or to provide functional annotation in Structural Genomics Projects.
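A simplified version of this representation and search can be sketched for the smallest non-trivial cliques, triangles. Assumptions: distances are binned (hypothetical 1 Å bins, 12 Å cutoff) instead of using the paper's pairwise distance constraints, contact flags are supplied by the caller, and support is simply the number of structures containing a triangle; the function names are illustrative and this is not the paper's mining algorithm.

```python
import math
from collections import Counter
from itertools import combinations, permutations

def labeled_edges(residues, contacts, max_dist=12.0, bin_width=1.0):
    """Map each residue pair within max_dist to (binned C-alpha distance, contact flag).

    residues: list of (name, (x, y, z)); contacts: set of frozenset({i, j}).
    """
    edges = {}
    for i, j in combinations(range(len(residues)), 2):
        d = math.dist(residues[i][1], residues[j][1])
        if d <= max_dist:
            pair = frozenset((i, j))
            edges[pair] = (round(d / bin_width), pair in contacts)
    return edges

def triangle_keys(residues, edges):
    """Canonical labels of every 3-clique (triangle) in one structure."""
    keys = set()
    for tri in combinations(range(len(residues)), 3):
        if not all(frozenset(p) in edges for p in combinations(tri, 2)):
            continue
        # The minimum over vertex orderings gives an order-independent key.
        keys.add(min(
            (tuple(residues[v][0] for v in (a, b, c)),
             edges[frozenset((a, b))], edges[frozenset((a, c))], edges[frozenset((b, c))])
            for a, b, c in permutations(tri)))
    return keys

def frequent_triangles(structures, min_support):
    """structures: list of (residues, contacts). Returns {key: support}."""
    support = Counter()
    for residues, contacts in structures:
        support.update(triangle_keys(residues, labeled_edges(residues, contacts)))
    return {key: n for key, n in support.items() if n >= min_support}
```

Because each key records residue names, binned distances, and contact flags, the same triangle found at different positions or in a different residue order in two structures counts toward one shared support value.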


Subjects
Computational Biology/methods; Proteins/chemistry; Proteomics/methods; Algorithms; Amino Acid Motifs; Animals; Cysteine Endopeptidases/chemistry; Models, Molecular; Models, Statistical; Multigene Family; Oxidoreductases/chemistry; Protein Binding; Protein Structure, Tertiary
15.
J Comput Biol ; 12(6): 657-71, 2005.
Article in English | MEDLINE | ID: mdl-16108709

ABSTRACT

We find recurring amino-acid residue packing patterns, or spatial motifs, that are characteristic of protein structural families, by applying a novel frequent subgraph mining algorithm to graph representations of protein three-dimensional structure. Graph nodes represent amino acids, and edges are chosen in one of three ways: first, using a threshold for contact distance between residues; second, using Delaunay tessellation; and third, using the recently developed almost-Delaunay edges. For a set of graphs representing a protein family from the Structural Classification of Proteins (SCOP) database, subgraph mining typically identifies several hundred common subgraphs corresponding to spatial motifs that are frequently found in proteins in the family but rarely found outside of it. We find that some of the large motifs map onto known functional regions in the two protein families explored in this study, namely serine proteases and kinases. We find that graphs based on almost-Delaunay edges significantly reduce the number of edges in the graph representation and hence offer a computational advantage, yet the patterns extracted from such graphs have a biological interpretation approximately equivalent to that of the patterns extracted from distance-based graphs.


Subjects
Algorithms; Computational Biology; Computer Graphics; Proteins/chemistry; Proteins/classification; Structural Homology, Protein; Amino Acid Motifs; Databases, Protein; Models, Molecular; Models, Statistical; Molecular Structure
16.
Pac Symp Biocomput ; : 16-27, 2005.
Article in English | MEDLINE | ID: mdl-15759610

ABSTRACT

Larger rotamer libraries, which provide a fine grained discretization of side chain conformation space by sampling near the canonical rotamers, allow protein designers to find better conformations, but slow down the algorithms that search for them. We present a dynamic programming solution to the side chain placement problem which treats rotamers at high or low resolution only as necessary. Dynamic programming is an exact technique; we turn it into an approximation, but can still analyze the error that can be introduced. We have used our algorithm to redesign the surface residues of ubiquitin's beta sheet.
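For intuition, here is the exact dynamic program in its simplest setting, assuming a hypothetical chain-shaped interaction graph in which each residue interacts only with its sequence neighbour. The paper's method also treats rotamers at mixed resolution and analyzes the resulting approximation error, which this sketch does not attempt; `place_side_chains` is an illustrative name.

```python
def place_side_chains(self_e, pair_e):
    """Exact minimum-energy rotamer assignment on a chain (Viterbi-style DP).

    self_e[i][a]    : energy of rotamer a at position i.
    pair_e[i][a][b] : interaction of rotamer a at i with rotamer b at i + 1.
    Returns (minimum total energy, list of chosen rotamer indices).
    """
    best = list(self_e[0])  # best[a]: lowest energy of positions 0..i ending in rotamer a
    back = []               # back[i - 1][b]: best predecessor of rotamer b at position i
    for i in range(1, len(self_e)):
        cur, ptr = [], []
        for b, e_self in enumerate(self_e[i]):
            a = min(range(len(best)), key=lambda x: best[x] + pair_e[i - 1][x][b])
            cur.append(best[a] + pair_e[i - 1][a][b] + e_self)
            ptr.append(a)
        best = cur
        back.append(ptr)
    last = min(range(len(best)), key=best.__getitem__)
    choice = [last]
    for ptr in reversed(back):  # trace predecessors back to position 0
        choice.append(ptr[choice[-1]])
    return best[last], choice[::-1]
```

The work grows with the square of the rotamer count per position, which is why a larger, finer-grained library slows the search and why treating rotamers at high resolution only where needed pays off.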


Subjects
Models, Molecular; Protein Conformation; Algorithms; Computational Biology; Protein Structure, Secondary