Results 1 - 20 of 38
1.
Pac Symp Biocomput ; 28: 383-394, 2023.
Article in English | MEDLINE | ID: mdl-36540993

ABSTRACT

As the diversity of genomic variation data increases with our growing understanding of the role of variation in health and disease, it is critical to develop standards for precise inter-system exchange of these data for research and clinical applications. The Global Alliance for Genomics and Health (GA4GH) Variation Representation Specification (VRS) meets this need through a technical terminology and information model for disambiguating and concisely representing variation concepts. Here we discuss the recent Genotype model in VRS, which may be used to represent the allelic composition of a genetic locus. We demonstrate the use of the Genotype model and the constituent Haplotype model for the precise and interoperable representation of pharmacogenomic diplotypes, HGVS variants, and VCF records using VRS and discuss how this can be leveraged to enable interoperable exchange and search operations between assayed variation and genomic knowledgebases.
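To make the VCF case concrete, here is a minimal sketch that assumes heavily simplified stand-ins for VRS objects. The `Allele` and `GenotypeMember` classes and the `genotype_from_vcf` helper are hypothetical and do not follow the actual VRS schema, which nests locations and states and uses computed identifiers. The sketch turns one sample's `GT` field into genotype members with copy counts.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Allele:
    """Hypothetical simplified allele: a sequence state on an interbase interval."""
    sequence_id: str
    start: int  # 0-based interbase start
    end: int    # 0-based interbase end
    state: str


@dataclass(frozen=True)
class GenotypeMember:
    """Hypothetical genotype member: an allele and how many copies were observed."""
    variation: Allele
    count: int


def genotype_from_vcf(chrom_id: str, pos: int, ref: str, alts: List[str], gt: str) -> List[GenotypeMember]:
    """Convert one VCF sample GT (e.g. '0/1' or '1|1') into members with copy counts.

    VCF POS is 1-based, so the interbase start is pos - 1.
    Phase ('|' vs '/') is discarded; missing calls ('.') are rejected.
    """
    alleles = [ref] + alts
    indices = re.split(r"[/|]", gt)
    if "." in indices:
        raise ValueError(f"missing allele in GT {gt!r}")
    counts = Counter(int(i) for i in indices)
    start, end = pos - 1, pos - 1 + len(ref)
    return [
        GenotypeMember(Allele(chrom_id, start, end, alleles[i]), n)
        for i, n in sorted(counts.items())
    ]
```

For example, a heterozygous call `0/1` at POS 100 with REF `A` and ALT `G` yields two members with count 1, while `1|1` yields a single `G` member with count 2. A faithful mapping would keep phased alleles on separate haplotypes, in line with the Haplotype model the abstract describes.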


Subjects
Computational Biology; Genetic Variation; Humans; Databases, Genetic; Genomics; Genotype
2.
PLoS One ; 15(12): e0239883, 2020.
Article in English | MEDLINE | ID: mdl-33270643

ABSTRACT

MOTIVATION: Access to biological sequence data, such as genome, transcript, or protein sequence, is at the core of many bioinformatics analysis workflows. The National Center for Biotechnology Information (NCBI), Ensembl, and other sequence database maintainers provide methods to access sequences through network connections. For many users, the convenience and currency of remotely managed data are compelling, and the network latency is inconsequential. However, for high-throughput and clinical applications, local sequence collections are essential for performance, stability, privacy, and reproducibility. RESULTS: Here we describe SeqRepo, a novel system for building a local, high-performance, non-redundant collection of biological sequences. SeqRepo enables clients to use primary database identifiers and several digests to identify sequences and sequence aliases. SeqRepo provides a native Python interface and a REST interface, which can run locally and enables access from other programming languages. SeqRepo also provides an alternative REST interface based on the GA4GH refget protocol. SeqRepo provides fast random access to sequence slices. We provide results that demonstrate that a local SeqRepo sequence collection yields significant performance benefits of up to 1300-fold over remote sequence collections. In our use case for a variant validation and normalization pipeline, SeqRepo improved throughput 50-fold relative to use with remote sequences. SeqRepo may be used with any species or sequence type. Regular snapshots of human sequence collections are available. It is often convenient or necessary to use a computed digest as a sequence identifier. For example, a digest-based identifier may be used to refer to proprietary reference genomes or segments of a graph genome, for which conventional identifiers will not be available.
Here we also introduce a convention for the application of the SHA-512 hashing algorithm with Base64 encoding to generate URL-safe identifiers. This convention, sha512t24u, combines a fast digest mechanism with a space-efficient representation that can be used for any object. Our report includes an analysis of timing and collision probabilities for sha512t24u. SeqRepo enables clients to use sha512t24u as identifiers, thereby seamlessly integrating public and private sequence sets. AVAILABILITY: SeqRepo is released under the Apache License 2.0 and is available on GitHub and PyPI. Docker images and database snapshots are also available. See https://github.com/biocommons/biocommons.seqrepo.
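The sha512t24u convention can be sketched with the Python standard library: compute the SHA-512 digest, keep the first 24 bytes, and encode them with the URL-safe Base64 alphabet. The `SequenceStore` class around it is a hypothetical in-memory illustration of a non-redundant collection with aliases and slice access; it is not SeqRepo's API, and upper-casing sequences before hashing is an assumption made here.

```python
import base64
import hashlib
from typing import Dict, Optional, Tuple


def sha512t24u(blob: bytes) -> str:
    """SHA-512 digest truncated to 24 bytes, encoded with the URL-safe Base64 alphabet."""
    digest = hashlib.sha512(blob).digest()[:24]
    return base64.urlsafe_b64encode(digest).decode("ascii")


class SequenceStore:
    """Hypothetical in-memory, non-redundant sequence collection.

    Each distinct sequence is stored once under its sha512t24u digest;
    any number of aliases (e.g. database accessions) map to that digest.
    """

    def __init__(self) -> None:
        self._seqs: Dict[str, str] = {}
        self._aliases: Dict[str, str] = {}

    def add(self, seq: str, aliases: Tuple[str, ...] = ()) -> str:
        seq = seq.upper()  # assumption: normalize case before hashing
        key = sha512t24u(seq.encode("ascii"))
        self._seqs.setdefault(key, seq)  # identical sequences are stored only once
        for alias in aliases:
            self._aliases[alias] = key
        return key

    def fetch(self, identifier: str, start: int = 0, end: Optional[int] = None) -> str:
        """Return seq[start:end] (0-based, half-open) for an alias or a digest."""
        key = self._aliases.get(identifier, identifier)
        return self._seqs[key][start:end]

    def __len__(self) -> int:
        return len(self._seqs)
```

Because 24 bytes is a multiple of 3, every identifier is exactly 32 characters with no `=` padding. With 192 bits, the birthday bound puts the probability of any collision among n distinct objects at roughly n^2 / 2^193.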


Subjects
Computational Biology/methods; Databases, Nucleic Acid; Software; Algorithms; Genome, Human/genetics; Genomics/methods; Humans; Programming Languages; Reproducibility of Results
3.
PLoS Comput Biol ; 15(4): e1006842, 2019 04.
Article in English | MEDLINE | ID: mdl-31009453

ABSTRACT

Many proteins fold into highly regular and repetitive three-dimensional structures. The analysis of structural patterns and repeated elements is fundamental to understanding protein function and evolution. We present recent improvements to the CE-Symm tool for systematically detecting and analyzing the internal symmetry and structural repeats in proteins. In addition to the accurate detection of internal symmetry, the tool is now capable of i) reporting the type of symmetry, ii) identifying the smallest repeating unit, iii) describing the arrangement of repeats with transformation operations and symmetry axes, and iv) comparing the similarity of all the internal repeats at the residue level. CE-Symm 2.0 helps the user investigate proteins with a robust and intuitive sequence-to-structure analysis, with many applications in protein classification, functional annotation and evolutionary studies. We describe the algorithmic extensions of the method and demonstrate its applications to the study of interesting cases of protein evolution.
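As an illustration of how a transformation between repeats yields a symmetry axis, the sketch below applies the standard axis-angle decomposition of a 3x3 rotation matrix and then guesses a cyclic order C_n from the angle. It is not CE-Symm's implementation; `axis_angle`, `cyclic_order`, and the 5-degree tolerance are hypothetical choices for this example.

```python
import math
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def axis_angle(r: Matrix) -> Tuple[Tuple[float, float, float], float]:
    """Unit rotation axis and angle (radians, in [0, pi]) of a proper 3x3 rotation matrix."""
    trace = r[0][0] + r[1][1] + r[2][2]
    theta = math.acos(max(-1.0, min(1.0, (trace - 1.0) / 2.0)))  # clamp rounding noise
    if theta < 1e-6:
        raise ValueError("identity rotation: axis is undefined")
    if math.pi - theta < 1e-3:
        # Near 180 degrees, R ~ 2*a*a^T - I, so read the axis from the largest diagonal entry
        i = max(range(3), key=lambda k: r[k][k])
        ai = math.sqrt((r[i][i] + 1.0) / 2.0)
        axis = [r[i][j] / (2.0 * ai) for j in range(3)]
        axis[i] = ai
    else:
        # Otherwise R - R^T = 2 sin(theta) [a]_x, the skew-symmetric matrix of the axis a
        s = 2.0 * math.sin(theta)
        axis = [(r[2][1] - r[1][2]) / s, (r[0][2] - r[2][0]) / s, (r[1][0] - r[0][1]) / s]
    norm = math.sqrt(sum(c * c for c in axis))
    return (axis[0] / norm, axis[1] / norm, axis[2] / norm), theta


def cyclic_order(theta: float, max_order: int = 20, tol_deg: float = 5.0) -> Optional[int]:
    """Hypothetical helper: n such that theta is about 360/n degrees (a C_n relation), else None."""
    deg = math.degrees(theta)
    n = round(360.0 / deg)
    if 2 <= n <= max_order and abs(deg - 360.0 / n) <= tol_deg:
        return n
    return None
```

A rotation of 120 degrees about z returns the axis (0, 0, 1) and order 3; the 180-degree case is handled separately because the antisymmetric part of R vanishes there, which matters for the common C2 symmetry.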


Subjects
Algorithms; Computational Biology/methods; Proteins/chemistry; Software; Amino Acid Sequence; Databases, Protein; Models, Molecular; Sequence Analysis, Protein
4.
PLoS Comput Biol ; 15(2): e1006791, 2019 02.
Article in English | MEDLINE | ID: mdl-30735498

ABSTRACT

BioJava is an open-source project that provides a Java library for processing biological data. The project aims to simplify bioinformatic analyses by implementing parsers, data structures, and algorithms for common tasks in genomics, structural biology, ontologies, phylogenetics, and more. Since 2012, we have released two major versions of the library (4 and 5) that include many new features to tackle challenges with increasingly complex macromolecular structure data. BioJava requires Java 8 or higher and is freely available under the LGPL 2.1 license. The project is hosted on GitHub at https://github.com/biojava/biojava. More information and documentation can be found online on the BioJava website (http://www.biojava.org) and tutorial (https://github.com/biojava/biojava-tutorial). All inquiries should be directed to the GitHub page or the BioJava mailing list (http://lists.open-bio.org/mailman/listinfo/biojava-l).


Subjects
Computational Biology/methods; Access to Information; Algorithms; Gene Library; Genome/genetics; Genomics; Information Storage and Retrieval; Internet; Software
5.
Nucleic Acids Res ; 47(D1): D464-D474, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30357411

ABSTRACT

The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB, rcsb.org), the US data center for the global PDB archive, serves thousands of Data Depositors in the Americas and Oceania and makes 3D macromolecular structure data available at no charge and without usage restrictions to more than 1 million rcsb.org Users worldwide and 600 000 pdb101.rcsb.org education-focused Users around the globe. PDB Data Depositors include structural biologists using macromolecular crystallography, nuclear magnetic resonance spectroscopy and 3D electron microscopy. PDB Data Consumers include researchers, educators and students studying Fundamental Biology, Biomedicine, Biotechnology and Energy. Recent reorganization of RCSB PDB activities into four integrated, interdependent services is described in detail, together with tools and resources added over the past 2 years to RCSB PDB web portals in support of a 'Structural View of Biology.'


Subjects
Databases, Protein; Protein Conformation; Biomedical Research/education; Biotechnology/education; Data Curation; Software
6.
Hum Mutat ; 39(12): 1803-1813, 2018 12.
Article in English | MEDLINE | ID: mdl-30129167

ABSTRACT

The Human Genome Variation Society (HGVS) nomenclature guidelines encourage the accurate and standard description of DNA, RNA, and protein sequence variants in public variant databases and the scientific literature. Inconsistent application of the HGVS guidelines can lead to misinterpretation of variants in clinical settings. Reliable software tools are essential to ensure consistent application of the HGVS guidelines when reporting and interpreting variants. We present the hgvs Python package, a comprehensive tool for manipulating sequence variants according to the HGVS nomenclature guidelines. Distinguishing features of the hgvs package include: (1) parsing, formatting, validating, and normalizing variants on genome, transcript, and protein sequences; (2) projecting variants between aligned sequences, including those with gapped alignments; (3) flexible installation using remote or local data (fully local installations eliminate network dependencies); (4) extensive automated tests; and (5) open source development by a community from eight organizations worldwide. This report summarizes recent and significant updates to the hgvs package since its original release in 2014, and presents results of extensive validation using clinically relevant variants from ClinVar and HGMD.
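Normalization here refers in part to the HGVS 3' rule: a deletion inside a repeat is described at its most 3' possible position. The sketch below illustrates only that shifting step for deletions on a plain reference string; `shift_deletion_3prime` and `format_g_deletion` are hypothetical helpers, not the hgvs package's API, and insertions, duplications, and minus-strand transcripts are out of scope.

```python
from typing import Tuple


def shift_deletion_3prime(seq: str, start: int, end: int) -> Tuple[int, int]:
    """Shift a deletion of seq[start:end] (0-based, half-open) as far 3' as possible.

    Deleting seq[start:end] gives the same sequence as deleting seq[start+1:end+1]
    whenever seq[start] == seq[end], so slide right while that holds.
    """
    if not 0 <= start < end <= len(seq):
        raise ValueError("invalid deletion interval")
    while end < len(seq) and seq[start] == seq[end]:
        start += 1
        end += 1
    return start, end


def format_g_deletion(start: int, end: int) -> str:
    """Format a 0-based half-open interval as an HGVS-style g. deletion (1-based, inclusive)."""
    first, last = start + 1, end
    return f"g.{first}del" if first == last else f"g.{first}_{last}del"
```

For example, deleting the first `CAG` of `ACAGCAGT` (0-based interval 1 to 4) shifts to interval 4 to 7 and is written `g.5_7del`; both deletions leave the same sequence `ACAGT`.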


Subjects
Computational Biology/methods; Databases, Genetic; Genetic Variation; Genome, Human; Guidelines as Topic; Humans; Societies, Medical; Software
7.
Bioinformatics ; 34(21): 3755-3758, 2018 11 01.
Article in English | MEDLINE | ID: mdl-29850778

ABSTRACT

Motivation: The interactive visualization of very large macromolecular complexes on the web is becoming a challenging problem as experimental techniques advance at an unprecedented rate and deliver structures of increasing size. Results: We have tackled this problem by developing highly memory-efficient and scalable extensions for the NGL WebGL-based molecular viewer and by using the Macromolecular Transmission Format (MMTF), a binary and compressed file format. These enable NGL to download and render molecular complexes with millions of atoms interactively on desktop computers and smartphones alike, making it a tool of choice for web-based molecular visualization in research and education. Availability and implementation: The source code is freely available under the MIT license at github.com/arose/ngl and distributed on NPM (npmjs.com/package/ngl). MMTF-JavaScript encoders and decoders are available at github.com/rcsb/mmtf-javascript.


Subjects
Computer Graphics; Internet; Macromolecular Substances; Software
8.
PLoS One ; 13(6): e0197176, 2018.
Article in English | MEDLINE | ID: mdl-29864163

ABSTRACT

The Protein Data Bank (PDB) is the single worldwide archive of experimentally determined three-dimensional (3D) structures of proteins and nucleic acids. As of January 2017, the PDB housed more than 125,000 structures and was growing by more than 11,000 structures annually. Since the 3D structure of a protein is vital to understand the mechanisms of biological processes, diseases, and drug design, correct oligomeric assembly information is of critical importance. Unfortunately, the biologically relevant oligomeric form of a 3D structure is not directly obtainable by X-ray crystallography, whereas for solution methods (NMR or single-particle EM) it is known from the experiment. Instead, this information may be provided by the PDB Depositor as metadata coming from additional experiments, be inferred by sequence-sequence comparisons with similar proteins of known oligomeric state, or predicted using software, such as PISA (Proteins, Interfaces, Structures and Assemblies) or EPPIC (Evolutionary Protein Protein Interface Classifier). Despite significant efforts by professional PDB Biocurators during data deposition, there remain a number of structures in the archive with incorrect quaternary structure descriptions (or annotations). Further investigation is, therefore, needed to evaluate the correctness of quaternary structure annotations. In this study, we aim to identify the most probable oligomeric states for proteins represented in the PDB. Our approach evaluated the performance of four independent prediction methods, including text mining of primary publications, inference from homologous protein structures, and two computational methods (PISA and EPPIC). Aggregating predictions to give consensus results outperformed all four of the independent prediction methods, yielding 83% correct, 9% wrong, and 8% inconclusive predictions, when tested with a well-curated benchmark dataset.
We have developed a freely-available web-based tool to make this approach accessible to researchers and PDB Biocurators (http://quatstruct.rcsb.org/).
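The abstract does not state how the four predictions are aggregated, so the following is only a hypothetical consensus rule to show the shape of the idea: report a state when at least `min_agree` methods agree and no other state ties it, otherwise report "inconclusive". The function name, the threshold of 3, and the tie handling are all assumptions.

```python
from collections import Counter
from typing import Dict, Optional


def consensus_oligomeric_state(predictions: Dict[str, Optional[str]], min_agree: int = 3) -> str:
    """Hypothetical consensus rule over independent predictors.

    predictions maps a method name to its predicted state, or None if it gave no answer.
    Returns the most-voted state if it has at least `min_agree` votes and no tie,
    otherwise 'inconclusive'.
    """
    votes = Counter(state for state in predictions.values() if state is not None)
    if not votes:
        return "inconclusive"
    (top_state, top_count), *rest = votes.most_common()
    if top_count < min_agree or (rest and rest[0][1] == top_count):
        return "inconclusive"
    return top_state
```

Keeping an explicit "inconclusive" outcome mirrors the three-way result categories (correct, wrong, inconclusive) reported in the abstract.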


Subjects
Databases, Protein; Sequence Analysis, Protein/methods; Software; Crystallography, X-Ray; Nuclear Magnetic Resonance, Biomolecular; Protein Structure, Quaternary
9.
Nat Biotechnol ; 36(3): 272-281, 2018 03.
Article in English | MEDLINE | ID: mdl-29457794

ABSTRACT

Genome-scale network reconstructions have helped uncover the molecular basis of metabolism. Here we present Recon3D, a computational resource that includes three-dimensional (3D) metabolite and protein structure data and enables integrated analyses of metabolic functions in humans. We use Recon3D to functionally characterize mutations associated with disease, and identify metabolic response signatures that are caused by exposure to certain drugs. Recon3D represents the most comprehensive human metabolic network model to date, accounting for 3,288 open reading frames (representing 17% of functionally annotated human genes), 13,543 metabolic reactions involving 4,140 unique metabolites, and 12,890 protein structures. These data provide a unique resource for investigating molecular mechanisms of human metabolism. Recon3D is available at http://vmh.life.


Subjects
Computational Biology/methods; Databases, Protein; Metabolic Networks and Pathways/genetics; Databases, Genetic; Humans; Internet; Molecular Sequence Annotation; Open Reading Frames/genetics
10.
Genome Med ; 9(1): 113, 2017 Dec 18.
Article in English | MEDLINE | ID: mdl-29254494

ABSTRACT

The translation of personal genomics to precision medicine depends on the accurate interpretation of the multitude of genetic variants observed for each individual. However, even when genetic variants are predicted to modify a protein, their functional implications may be unclear. Many diseases are caused by genetic variants affecting important protein features, such as enzyme active sites or interaction interfaces. The scientific community has catalogued millions of genetic variants in genomic databases and thousands of protein structures in the Protein Data Bank. Mapping mutations onto three-dimensional (3D) structures enables atomic-level analyses of protein positions that may be important for the stability or formation of interactions; these may explain the effect of mutations and in some cases even open a path for targeted drug development. To accelerate progress in the integration of these data types, we held a two-day Gene Variation to 3D (GVto3D) workshop to report on the latest advances and to discuss unmet needs. The overarching goal of the workshop was to address the question: what can be done together as a community to advance the integration of genetic variants and 3D protein structures that could not be done by a single investigator or laboratory? Here we describe the workshop outcomes, review the state of the field, and propose the development of a framework with which to promote progress in this arena. The framework will include a set of standard formats, common ontologies, a common application programming interface to enable interoperation of the resources, and a Tool Registry to make it easy to find and apply the tools to specific analysis problems. Interoperability will enable integration of diverse data sources and tools and collaborative development of variant effect prediction methods.


Subjects
Genome-Wide Association Study/methods; Polymorphism, Genetic; Protein Conformation; Sequence Analysis, Protein/methods; Algorithms; Congresses as Topic; Genome-Wide Association Study/standards; Humans; Sequence Analysis, Protein/standards
11.
PLoS Comput Biol ; 13(6): e1005575, 2017 Jun.
Article in English | MEDLINE | ID: mdl-28574982

ABSTRACT

Recent advances in experimental techniques have led to a rapid growth in complexity, size, and number of macromolecular structures that are made available through the Protein Data Bank. This creates a challenge for macromolecular visualization and analysis. Macromolecular structure files, such as PDB or PDBx/mmCIF files, can be slow to transfer and parse, and hard to incorporate into third-party software tools. Here, we present a new binary and compressed data representation, the MacroMolecular Transmission Format, MMTF, as well as software implementations in several languages that have been developed around it, which address these issues. We describe the new format and its APIs and demonstrate that it is several times faster to parse, and about a quarter of the file size of the current standard format, PDBx/mmCIF. As a consequence of the new data representation, it is now possible to visualize structures with millions of atoms in a web browser, keep the whole PDB archive in memory or parse it within a few minutes on average computers, which opens up a new way of thinking about how to design and implement efficient algorithms in structural bioinformatics. The PDB archive is available in MMTF file format through web services, and the data are updated on a weekly basis.
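The compactness of MMTF comes from chaining simple encodings. The sketch below follows the coordinate pipeline as I understand the MMTF specification (scale floats by a divisor of 1000 to integers, delta-encode, split values outside the 16-bit range by recursive indexing, pack as big-endian int16); the function names are hypothetical, and the MessagePack container and codec headers of real MMTF files are omitted.

```python
import struct
from typing import List, Sequence

INT16_MAX, INT16_MIN = 32767, -32768


def integer_encode(values: Sequence[float], divisor: int = 1000) -> List[int]:
    return [round(v * divisor) for v in values]  # fixed-point: 3 decimals for divisor 1000


def delta_encode(ints: Sequence[int]) -> List[int]:
    out, prev = [], 0
    for v in ints:
        out.append(v - prev)
        prev = v
    return out


def delta_decode(deltas: Sequence[int]) -> List[int]:
    out, acc = [], 0
    for d in deltas:
        acc += d
        out.append(acc)
    return out


def recursive_index_encode(ints: Sequence[int]) -> List[int]:
    """Represent each int as a run of int16 limit values plus a remainder."""
    out = []
    for v in ints:
        while v >= INT16_MAX:
            out.append(INT16_MAX)
            v -= INT16_MAX
        while v <= INT16_MIN:
            out.append(INT16_MIN)
            v -= INT16_MIN
        out.append(v)
    return out


def recursive_index_decode(packed: Sequence[int]) -> List[int]:
    """Sum runs of limit values; any non-limit value terminates the current number."""
    out, acc = [], 0
    for v in packed:
        acc += v
        if v not in (INT16_MAX, INT16_MIN):
            out.append(acc)
            acc = 0
    return out


def encode_coords(coords: Sequence[float], divisor: int = 1000) -> bytes:
    packed = recursive_index_encode(delta_encode(integer_encode(coords, divisor)))
    return struct.pack(f">{len(packed)}h", *packed)


def decode_coords(blob: bytes, divisor: int = 1000) -> List[float]:
    packed = struct.unpack(f">{len(blob) // 2}h", blob)
    return [v / divisor for v in delta_decode(recursive_index_decode(packed))]
```

Neighbouring atoms have similar coordinates, so most deltas fit in 2 bytes instead of the 4 bytes of a float32; occasional large jumps cost an extra 2 bytes each through recursive indexing.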


Subjects
Computational Biology/methods; Databases, Chemical; Macromolecular Substances; Software; Internet; Macromolecular Substances/analysis; Macromolecular Substances/chemistry; Macromolecular Substances/classification; Molecular Structure
12.
PLoS One ; 12(3): e0174846, 2017.
Article in English | MEDLINE | ID: mdl-28362865

ABSTRACT

The size and complexity of 3D macromolecular structures available in the Protein Data Bank is constantly growing. Current tools and file formats have reached limits of scalability. New compression approaches are required to support the visualization of large molecular complexes and enable new and scalable means for data analysis. We evaluated a series of compression techniques for coordinates of 3D macromolecular structures and identified the best performing approaches. By balancing compression efficiency in terms of the decompression speed and compression ratio, and code complexity, our results provide the foundation for a novel standard to represent macromolecular coordinates in a compact and useful file format.


Subjects
Databases, Protein; Algorithms; Data Compression; Magnetic Resonance Spectroscopy; Models, Theoretical; Molecular Structure; Protein Structure, Secondary; Protein Structure, Tertiary
13.
Bioinformatics ; 33(13): 2047-2049, 2017 Jul 01.
Article in English | MEDLINE | ID: mdl-28334105

ABSTRACT

SUMMARY: We developed a new software tool, BioJava-ModFinder, for identifying protein modifications observed in 3D structures archived in the Protein Data Bank (PDB). Information on more than 400 types of protein modifications was collected and curated from annotations in PDB, RESID, and PSI-MOD. We divided these modifications into three categories: modified residues, attachment modifications, and cross-links. We have developed a systematic method to identify these modifications in 3D protein structures. We have integrated this package with the RCSB PDB web application and added protein modification annotations to the sequence diagram and structure display. By scanning all 3D structures in the PDB using BioJava-ModFinder, we identified more than 30 000 structures with protein modifications, which can be searched, browsed, and visualized on the RCSB PDB website. AVAILABILITY AND IMPLEMENTATION: BioJava-ModFinder is available as open source (LGPL license) at ( https://github.com/biojava/biojava/tree/master/biojava-modfinder ). The RCSB PDB can be accessed at http://www.rcsb.org . CONTACT: pwrose@ucsd.edu.
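Cross-links are the most geometric of the three categories. As a hedged illustration only (not BioJava-ModFinder's method or API), the sketch below flags disulfide bonds as pairs of cysteine SG atoms within a distance cutoff; the `Atom` record, `find_disulfides`, and the 2.5 Å cutoff (a tolerance above the roughly 2.05 Å S-S bond length) are assumptions.

```python
import math
from itertools import combinations
from typing import List, NamedTuple, Sequence, Tuple

Residue = Tuple[str, int]  # (chain id, residue number)


class Atom(NamedTuple):
    """Hypothetical minimal atom record."""
    chain: str
    res_num: int
    res_name: str
    atom_name: str
    x: float
    y: float
    z: float


def find_disulfides(atoms: Sequence[Atom], cutoff: float = 2.5) -> List[Tuple[Residue, Residue]]:
    """Pairs of cysteines whose SG atoms lie within `cutoff` angstroms (assumed threshold)."""
    sg = [a for a in atoms if a.res_name == "CYS" and a.atom_name == "SG"]
    bonds = []
    for a, b in combinations(sg, 2):  # O(n^2) over SG atoms only; fine for a sketch
        if math.dist((a.x, a.y, a.z), (b.x, b.y, b.z)) <= cutoff:
            bonds.append(((a.chain, a.res_num), (b.chain, b.res_num)))
    return bonds
```

A production tool would also check bond angles, alternate locations, and symmetry mates, and would use a spatial index instead of comparing all pairs.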


Subjects
Computational Biology/methods; Databases, Protein; Protein Conformation; Software; Internet
14.
PLoS One ; 12(3): e0171355, 2017.
Article in English | MEDLINE | ID: mdl-28296894

ABSTRACT

The Protein Data Bank (PDB; http://wwpdb.org) was established in 1971 as the first open access digital data resource in biology with seven protein structures as its initial holdings. The global PDB archive now contains more than 126,000 experimentally determined atomic level three-dimensional (3D) structures of biological macromolecules (proteins, DNA, RNA), all of which are freely accessible via the Internet. Knowledge of the 3D structure of the gene product can help in understanding its function and role in disease. Of particular interest in the PDB archive are proteins for which 3D structures of genetic variant proteins have been determined, thus revealing atomic-level structural differences caused by the variation at the DNA level. Herein, we present a systematic and qualitative analysis of such cases. We observe a wide range of structural and functional changes caused by single amino acid differences, including changes in enzyme activity, aggregation propensity, structural stability, binding, and dissociation, some in the context of large assemblies. Structural comparison of wild type and mutated proteins, when both are available, provides insights into atomic-level structural differences caused by the genetic variation.
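A common first step in such a wild-type versus variant comparison is to measure how far matched atoms moved. The sketch below assumes the two coordinate sets are already optimally superposed (for example with a Kabsch fit, not shown) and matched one-to-one, such as C-alpha atoms in residue order; the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def per_atom_deviation(wt: Sequence[Point], mut: Sequence[Point]) -> List[float]:
    """Distances between matched atoms of two pre-superposed structures."""
    if len(wt) != len(mut):
        raise ValueError("structures must have the same number of matched atoms")
    return [math.dist(p, q) for p, q in zip(wt, mut)]


def rmsd(wt: Sequence[Point], mut: Sequence[Point]) -> float:
    """Root-mean-square deviation over all matched atoms."""
    d = per_atom_deviation(wt, mut)
    return math.sqrt(sum(x * x for x in d) / len(d))


def most_shifted(wt: Sequence[Point], mut: Sequence[Point], labels: Sequence[str],
                 threshold: float) -> List[Tuple[str, float]]:
    """Labels and deviations of atoms that moved more than `threshold` angstroms."""
    return [(lab, d) for lab, d in zip(labels, per_atom_deviation(wt, mut)) if d > threshold]
```

A single global RMSD can hide a large local change, which is why the per-atom list is useful for locating where a single amino acid difference perturbs the structure.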


Subjects
Genetic Variation; Proteins/chemistry; Proteins/physiology; Exome; Polymorphism, Single Nucleotide; Structure-Activity Relationship
15.
Nucleic Acids Res ; 45(D1): D271-D281, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27794042

ABSTRACT

The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB, http://rcsb.org), the US data center for the global PDB archive, makes PDB data freely available to all users, from structural biologists to computational biologists and beyond. New tools and resources have been added to the RCSB PDB web portal in support of a 'Structural View of Biology.' Recent developments have improved the User experience, including the high-speed NGL Viewer that provides 3D molecular visualization in any web browser, improved support for data file download and enhanced organization of website pages for query, reporting and individual structure exploration. Structure validation information is now visible for all archival entries. PDB data have been integrated with external biological resources, including chromosomal position within the human genome; protein modifications; and metabolic pathways. PDB-101 educational materials have been reorganized into a searchable website and expanded to include new features such as the Geis Digital Archive.


Subjects
Computational Biology/methods; Databases, Genetic; Proteins/chemistry; Proteins/genetics; Datasets as Topic; Metabolic Networks and Pathways; Models, Molecular; Protein Conformation; Proteins/metabolism; Software; Structure-Activity Relationship; User-Computer Interface; Web Browser
16.
Bioinformatics ; 32(24): 3833-3835, 2016 12 15.
Article in English | MEDLINE | ID: mdl-27551105

ABSTRACT

The Protein Data Bank (PDB) now contains more than 120,000 three-dimensional (3D) structures of biological macromolecules. To allow an interpretation of how PDB data relates to other publicly available annotations, we developed a novel data integration platform that maps 3D structural information across various datasets. This integration bridges from the human genome across protein sequence to 3D structure space. We developed novel software solutions for data management and visualization, while incorporating new libraries for web-based visualization using SVG graphics. AVAILABILITY AND IMPLEMENTATION: The new views are available from http://www.rcsb.org and software is available from https://github.com/rcsb/. CONTACT: andreas.prlic@rcsb.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
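Bridging from the genome to protein sequence starts with mapping a genomic position to a residue number. The sketch below handles only a forward-strand transcript with 1-based inclusive exon coordinates; `genomic_to_protein` and its arguments are hypothetical, and the further step from UniProt-style residue numbers to PDB residues (which needs a sequence-to-structure alignment) is not shown.

```python
from typing import List, Optional, Tuple


def genomic_to_protein(pos: int, exons: List[Tuple[int, int]],
                       cds_start: int, cds_end: int) -> Optional[Tuple[int, int]]:
    """Map a 1-based genomic position to (residue number, codon position 1-3).

    Assumes a forward-strand transcript: `exons` are sorted (start, end) pairs,
    1-based inclusive; cds_start/cds_end are the genomic positions of the first
    base of the start codon and the last base of the stop codon.
    Returns None for positions outside the CDS or in introns.
    """
    if not cds_start <= pos <= cds_end:
        return None
    offset = 0  # number of coding bases 5' of pos
    for start, end in exons:
        lo, hi = max(start, cds_start), min(end, cds_end)  # coding part of this exon
        if lo > hi:
            continue  # exon is entirely UTR
        if lo <= pos <= hi:
            offset += pos - lo
            return offset // 3 + 1, offset % 3 + 1
        if pos > hi:
            offset += hi - lo + 1
    return None  # intronic
```

For a minus-strand gene the exons would be walked from the highest coordinate downward, and the codon would be read on the reverse complement.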


Subjects
Computational Biology/methods; Databases, Protein; Protein Conformation; Software; Amino Acid Sequence; Computer Graphics; Genomics; Humans; User-Computer Interface
17.
Nucleic Acids Res ; 43(Database issue): D345-56, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25428375

ABSTRACT

The RCSB Protein Data Bank (RCSB PDB, http://www.rcsb.org) provides access to 3D structures of biological macromolecules and is one of the leading resources in biology and biomedicine worldwide. Our efforts over the past 2 years focused on enabling a deeper understanding of structural biology and providing new structural views of biology that support both basic and applied research and education. Herein, we describe recently introduced data annotations including integration with external biological resources, such as gene and drug databases, new visualization tools and improved support for the mobile web. We also describe access to data files, web services and open access software components to enable software developers to more effectively mine the PDB archive and related annotations. Our efforts are aimed at expanding the role of 3D structure in understanding biology and medicine.


Subjects
Databases, Protein; Protein Conformation; Binding Sites; Internet; Membrane Proteins/chemistry; Molecular Biology/education; Molecular Sequence Annotation; Multiprotein Complexes/chemistry; Peptides/chemistry; Pharmaceutical Preparations/chemistry; Research; Software
18.
Bioinformatics ; 31(1): 126-7, 2015 Jan 01.
Article in English | MEDLINE | ID: mdl-25183487

ABSTRACT

SUMMARY: The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) resource provides tools for query, analysis and visualization of the 3D structures in the PDB archive. As the mobile Web is starting to surpass desktop and laptop usage, scientists and educators are beginning to integrate mobile devices into their research and teaching. In response, we have developed the RCSB PDB Mobile app for the iOS and Android mobile platforms to enable fast and convenient access to RCSB PDB data and services. Using the app, users from the general public to expert researchers can quickly search and visualize biomolecules, and add personal annotations via the RCSB PDB's integrated MyPDB service. AVAILABILITY AND IMPLEMENTATION: RCSB PDB Mobile is freely available from the Apple App Store and Google Play (http://www.rcsb.org).


Subjects
Computational Biology/methods; Computer Graphics; Databases, Protein; Mobile Applications; Software; Biomedical Research; Humans; User-Computer Interface; Workflow
19.
Bioinformatics ; 31(8): 1316-8, 2015 Apr 15.
Article in English | MEDLINE | ID: mdl-25505094

ABSTRACT

MOTIVATION: Circular permutation is an important type of protein rearrangement. Natural circular permutations have implications for protein function, stability and evolution. Artificial circular permutations have also been used for protein studies. However, such relationships are difficult to detect for many sequence and structure comparison algorithms and require special consideration. RESULTS: We developed a new algorithm, called Combinatorial Extension for Circular Permutations (CE-CP), which allows the structural comparison of circularly permuted proteins. CE-CP was designed to be user friendly and is integrated into the RCSB Protein Data Bank. It was tested on two collections of circularly permuted proteins. Pairwise alignments can be visualized either in a desktop application or on the web using Jmol and exported to other programs in a variety of formats. AVAILABILITY AND IMPLEMENTATION: The CE-CP algorithm can be accessed through the RCSB website at http://www.rcsb.org/pdb/workbench/workbench.do. Source code is available under the LGPL 2.1 as part of BioJava 3 (http://biojava.org; http://github.com/biojava/biojava). CONTACT: sbliven@ucsd.edu or info@rcsb.org.
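Why circular permutations defeat ordinary linear comparison can be seen in a sequence-level toy: B is a circular permutation of A exactly when B occurs inside A concatenated with itself, and the match offset is the permutation point. This exact-match sketch is only an illustration of the doubling idea; CE-CP compares 3D structures with tolerance for mismatches, and `circular_permutation_point` is a hypothetical helper.

```python
from typing import Optional


def circular_permutation_point(a: str, b: str) -> Optional[int]:
    """Return k such that b == a[k:] + a[:k], or None.

    Exact-match toy: every rotation of `a` is a substring of a + a,
    so one substring search finds the permutation point.
    """
    if len(a) != len(b) or not a:
        return None
    k = (a + a).find(b)
    return k if k != -1 else None
```

A linear aligner comparing `a` and a rotated `b` can match at most one of the two segments in order, whereas searching against the doubled sequence recovers both at once.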


Subjects
Algorithms; Computational Biology/methods; Databases, Protein; Dynamins/chemistry; Structural Homology, Protein; Humans; Programming Languages; Protein Structure, Tertiary; Sequence Analysis, Protein/methods
20.
J Mol Biol ; 426(11): 2255-68, 2014 May 29.
Article in English | MEDLINE | ID: mdl-24681267

ABSTRACT

Symmetry is an important feature of protein tertiary and quaternary structures that has been associated with protein folding, function, evolution, and stability. Its emergence and ensuing prevalence have been attributed to gene duplications, fusion events, and subsequent evolutionary drift in sequence. This process maintains structural similarity and is further supported by this study. To further investigate the question of how internal symmetry evolved, how symmetry and function are related, and the overall frequency of internal symmetry, we developed an algorithm, CE-Symm, to detect pseudo-symmetry within the tertiary structure of protein chains. Using a large manually curated benchmark of 1007 protein domains, we show that CE-Symm performs significantly better than previous approaches. We use CE-Symm to build a census of symmetry among domain superfamilies in SCOP and note that 18% of all superfamilies are pseudo-symmetric. Our results indicate that more domains are pseudo-symmetric than previously estimated. We establish a number of recurring types of symmetry-function relationships and describe several characteristic cases in detail. With the use of the Enzyme Commission classification, symmetry was found to be enriched in some enzyme classes but depleted in others. CE-Symm thus provides a methodology for a more complete and detailed study of the role of symmetry in tertiary protein structure [availability: CE-Symm can be run from the Web at http://source.rcsb.org/jfatcatserver/symmetry.jsp. Source code and software binaries are also available under the GNU Lesser General Public License (version 2.1) at https://github.com/rcsb/symmetry. An interactive census of domains identified as symmetric by CE-Symm is available from http://source.rcsb.org/jfatcatserver/scopResults.jsp].


Subjects
Algorithms; Protein Structure, Tertiary; Sequence Alignment/methods; Sequence Analysis, Protein/methods; Software; Amino Acid Sequence; Computational Biology/methods; Databases, Protein; Humans; Models, Molecular; Protein Folding; Sequence Homology, Amino Acid