Results 1 - 20 of 21
1.
Front Bioinform ; 3: 1311287, 2023.
Article in English | MEDLINE | ID: mdl-38111685

ABSTRACT

Recent advances in Artificial Intelligence and Machine Learning (e.g., AlphaFold, RosettaFold, and ESMFold) enable prediction of three-dimensional (3D) protein structures from amino acid sequences alone at accuracies comparable to lower-resolution experimental methods. These tools have been employed to predict structures across entire proteomes and the results of large-scale metagenomic sequence studies, yielding an exponential increase in available biomolecular 3D structural information. Given the enormous volume of this newly computed biostructure data, there is an urgent need for robust tools to manage, search, cluster, and visualize large collections of structures. Equally important is the capability to efficiently summarize and visualize metadata, biological/biochemical annotations, and structural features, particularly when working with vast numbers of protein structures of both experimental origin from the Protein Data Bank (PDB) and computationally-predicted models. Moreover, researchers require advanced visualization techniques that support interactive exploration of multiple sequences and structural alignments. This paper introduces a suite of tools provided on the RCSB PDB research-focused web portal RCSB.org, tailor-made for efficient management, search, organization, and visualization of this burgeoning corpus of 3D macromolecular structure data.

2.
J Mol Biol ; 435(14): 167994, 2023 07 15.
Article in English | MEDLINE | ID: mdl-36738985

ABSTRACT

The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) provides open access to experimentally-determined three-dimensional (3D) structures of biomolecules. The RCSB PDB RCSB.org research-focused web portal is used annually by many millions of users around the world. They access biostructure information, run complex queries utilizing various search services (e.g., full-text, structural and chemical attribute, chemical, sequence, and structure similarity searches), and visualize macromolecules in 3D, all at no charge and with no limitations on data usage. Notwithstanding more than 24,000-fold growth of the PDB over the past five decades, experimentally-determined structures are only available for a small subset of the millions of proteins of known sequence. Recently developed machine learning software tools can predict 3D structures of proteins at accuracies comparable to lower-resolution experimental methods. The RCSB PDB now provides access to ∼1,000,000 Computed Structure Models (CSMs) of proteins coming from AlphaFold DB and the ModelArchive alongside ∼200,000 experimentally-determined PDB structures. Both CSMs and PDB structures are available on RCSB.org and via well-established RCSB PDB Data, Search, and 1D-Coordinates application programming interfaces (APIs). Simultaneous delivery of PDB data and CSMs provides users with access to complementary structural information across the human proteome and those of model organisms and selected pathogens. API enhancements are backwards-compatible and programmatic users can "opt in" to access CSMs with minimal effort. Herein, we describe modifications to RCSB PDB cyberinfrastructure required to support sixfold scaling of 3D biostructure data delivery and lay the groundwork for scaling to accommodate hundreds of millions of CSMs.


Subjects
Computational Biology; Databases, Protein; Humans; Computational Biology/methods; Protein Conformation; Proteome; Software
3.
Nucleic Acids Res ; 51(D1): D488-D508, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36420884

ABSTRACT

The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB), founding member of the Worldwide Protein Data Bank (wwPDB), is the US data center for the open-access PDB archive. As wwPDB-designated Archive Keeper, RCSB PDB is also responsible for PDB data security. Annually, RCSB PDB serves >10 000 depositors of three-dimensional (3D) biostructures working on all permanently inhabited continents. RCSB PDB delivers data from its research-focused RCSB.org web portal to many millions of PDB data consumers based in virtually every United Nations-recognized country, territory, etc. This Database Issue contribution describes upgrades to the research-focused RCSB.org web portal that created a one-stop-shop for open access to ∼200 000 experimentally-determined PDB structures of biological macromolecules alongside >1 000 000 incorporated Computed Structure Models (CSMs) predicted using artificial intelligence/machine learning methods. RCSB.org is a 'living data resource.' Every PDB structure and CSM is integrated weekly with related functional annotations from external biodata resources, providing up-to-date information for the entire corpus of 3D biostructure data freely available from RCSB.org with no usage limitations. Within RCSB.org, PDB structures and the CSMs are clearly identified as to their provenance and reliability. Both are fully searchable, and can be analyzed and visualized using the full complement of RCSB.org web portal capabilities.


Subjects
Artificial Intelligence; Databases, Protein; Proteins; Machine Learning; Protein Conformation; Proteins/chemistry; Reproducibility of Results
4.
Protein Sci ; 31(12): e4482, 2022 12.
Article in English | MEDLINE | ID: mdl-36281733

ABSTRACT

Now in its 52nd year of continuous operations, the Protein Data Bank (PDB) is the premier open-access global archive housing three-dimensional (3D) biomolecular structure data. It is jointly managed by the Worldwide Protein Data Bank (wwPDB) partnership. The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) is funded by the National Science Foundation, National Institutes of Health, and US Department of Energy and serves as the US data center for the wwPDB. RCSB PDB is also responsible for the security of PDB data in its role as wwPDB-designated Archive Keeper. Every year, RCSB PDB serves tens of thousands of depositors of 3D macromolecular structure data (coming from macromolecular crystallography, nuclear magnetic resonance spectroscopy, electron microscopy, and micro-electron diffraction). The RCSB PDB research-focused web portal (RCSB.org) makes PDB data available at no charge and without usage restrictions to many millions of PDB data consumers around the world. The RCSB PDB training, outreach, and education web portal (PDB101.RCSB.org) serves nearly 700 K educators, students, and members of the public worldwide. This invited Tools Issue contribution describes how RCSB PDB (i) is organized; (ii) works with wwPDB partners to process new depositions; (iii) serves as the wwPDB-designated Archive Keeper; (iv) enables exploration and 3D visualization of PDB data via RCSB.org; and (v) supports training, outreach, and education via PDB101.RCSB.org. New tools and features at RCSB.org are presented using examples drawn from high-resolution structural studies of proteins relevant to treatment of human cancers by targeting immune checkpoints.


Subjects
Computational Biology; Proteins; Humans; Protein Conformation; Databases, Protein; Proteins/chemistry; Computational Biology/methods; Macromolecular Substances/chemistry
5.
Structure ; 30(10): 1385-1394.e3, 2022 10 06.
Article in English | MEDLINE | ID: mdl-36049478

ABSTRACT

Approximately 87% of the more than 190,000 atomic-level three-dimensional (3D) biostructures in the PDB were determined using macromolecular crystallography (MX). Agreement between 3D atomic coordinates and experimental data for >100 million individual amino acid residues occurring within ∼150,000 PDB MX structures was analyzed in detail. The real-space correlation coefficient (RSCC) calculated using the 3D atomic coordinates for each residue and experimental-data-derived electron density enables outlier detection of unreliable atomic coordinates (particularly important for poorly resolved side-chain atoms) and ready evaluation of local structure quality by PDB users. For human protein MX structures in PDB, comparisons of the per-residue RSCC metric with AlphaFold2-computed structure model confidence (pLDDT, the predicted local distance difference test) document (1) that RSCC values and pLDDT scores are correlated (median correlation coefficient ∼0.41), and (2) that experimentally determined MX structures (3.5 Å resolution or better) are more reliable than AlphaFold2-computed structure models and should be used preferentially whenever possible.


Subjects
Amino Acids; Databases, Protein; Humans; Macromolecular Substances; Myxovirus Resistance Proteins; Protein Conformation
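
As a rough illustration of the per-residue RSCC described in the abstract above, the sketch below computes a Pearson-style correlation between observed and model-calculated density values sampled at the same grid points around one residue. The function name `rscc` and the sample values are hypothetical; real validation software samples density maps and applies masks that are not shown here.

```python
import math


def rscc(observed, calculated):
    """Pearson correlation between two equal-length density samples."""
    if len(observed) != len(calculated) or len(observed) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(observed)
    mean_o = sum(observed) / n
    mean_c = sum(calculated) / n
    # Covariance and variances (unnormalized; the 1/n factors cancel).
    cov = sum((o - mean_o) * (c - mean_c) for o, c in zip(observed, calculated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_c = sum((c - mean_c) ** 2 for c in calculated)
    if var_o == 0 or var_c == 0:
        raise ValueError("density samples must not be constant")
    return cov / math.sqrt(var_o * var_c)


# Hypothetical density values at grid points around one residue.
rho_obs = [0.9, 1.4, 2.1, 1.7, 0.8]
rho_calc = [1.0, 1.5, 2.0, 1.6, 0.9]
print(round(rscc(rho_obs, rho_calc), 3))
```

Values near 1 indicate that the modeled atoms explain the local density well; low values flag residues whose coordinates deserve a closer look.
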
6.
Bioinformatics ; 38(12): 3304-3305, 2022 06 13.
Article in English | MEDLINE | ID: mdl-35543462

ABSTRACT

MOTIVATION: Mapping positional features from one-dimensional (1D) sequences onto three-dimensional (3D) structures of biological macromolecules is a powerful tool to show geometric patterns of biochemical annotations and provide a better understanding of the mechanisms underpinning protein and nucleic acid function at the atomic level. RESULTS: We present a new library designed to display fully customizable interactive views between 1D positional features of protein and/or nucleic acid sequences and their 3D structures as isolated chains or components of macromolecular assemblies. AVAILABILITY AND IMPLEMENTATION: https://github.com/rcsb/rcsb-saguaro-3d. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Nucleic Acids; Software; Databases, Protein; Macromolecular Substances/chemistry; Proteins/chemistry
7.
Protein Sci ; 31(1): 187-208, 2022 01.
Article in English | MEDLINE | ID: mdl-34676613

ABSTRACT

The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB), funded by the US National Science Foundation, National Institutes of Health, and Department of Energy, has served structural biologists and Protein Data Bank (PDB) data consumers worldwide since 1999. RCSB PDB, a founding member of the Worldwide Protein Data Bank (wwPDB) partnership, is the US data center for the global PDB archive housing biomolecular structure data. RCSB PDB is also responsible for the security of PDB data, as the wwPDB-designated Archive Keeper. Annually, RCSB PDB serves tens of thousands of three-dimensional (3D) macromolecular structure data depositors (using macromolecular crystallography, nuclear magnetic resonance spectroscopy, electron microscopy, and micro-electron diffraction) from all inhabited continents. RCSB PDB makes PDB data available from its research-focused RCSB.org web portal at no charge and without usage restrictions to millions of PDB data consumers working in every nation and territory worldwide. In addition, RCSB PDB operates an outreach and education PDB101.RCSB.org web portal that was used by more than 800,000 educators, students, and members of the public during calendar year 2020. This invited Tools Issue contribution describes (i) how the archive is growing and evolving as new experimental methods generate ever larger and more complex biomolecular structures; (ii) the importance of data standards and data remediation in effective management of the archive and facile integration with more than 50 external data resources; and (iii) new tools and features for 3D structure analysis and visualization made available during the past year via the RCSB.org web portal.


Subjects
Computational Biology/history; Databases, Protein/history; User-Computer Interface; Anniversaries and Special Events; History, 20th Century; History, 21st Century
8.
Bioinformatics ; 38(5): 1452-1454, 2022 02 07.
Article in English | MEDLINE | ID: mdl-34864908

ABSTRACT

MOTIVATION: Membrane proteins are encoded by approximately one fifth of human genes but account for more than half of all US FDA approved drug targets. Thanks to new technological advances, the number of membrane proteins archived in the PDB is growing rapidly. However, automatic identification of membrane proteins or inference of membrane location is not a trivial task. RESULTS: We present recent improvements to the RCSB Protein Data Bank web portal (RCSB PDB, rcsb.org) that provide a wealth of new membrane protein annotations integrated from four external resources: OPM, PDBTM, MemProtMD and mpstruc. We have substantially enhanced the presentation of data on membrane proteins. The number of membrane proteins with annotations available on rcsb.org was increased by ∼80%. Users can search for these annotations, explore corresponding tree hierarchies, display membrane segments at the 1D amino acid sequence level, and visualize the predicted location of the membrane layer in 3D. AVAILABILITY AND IMPLEMENTATION: Annotations, search, tree data and visualization are available at our rcsb.org web portal. Membrane visualization is supported by the open-source Mol* viewer (molstar.org and github.com/molstar/molstar). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Membrane Proteins; Software; Humans; Protein Conformation; Databases, Protein; Amino Acid Sequence
9.
Nucleic Acids Res ; 49(W1): W431-W437, 2021 07 02.
Article in English | MEDLINE | ID: mdl-33956157

ABSTRACT

Large biomolecular structures are being determined experimentally on a daily basis using established techniques such as crystallography and electron microscopy. In addition, emerging integrative or hybrid methods (I/HM) are producing structural models of huge macromolecular machines and assemblies, sometimes containing 100s of millions of non-hydrogen atoms. The performance requirements for visualization and analysis tools delivering these data are increasing rapidly. Significant progress in developing online, web-native three-dimensional (3D) visualization tools was previously accomplished with the introduction of the LiteMol suite and NGL Viewers. Thereafter, Mol* development was jointly initiated by PDBe and RCSB PDB to combine and build on the strengths of LiteMol (developed by PDBe) and NGL (developed by RCSB PDB). The web-native Mol* Viewer enables 3D visualization and streaming of macromolecular coordinate and experimental data, together with capabilities for displaying structure quality, functional, or biological context annotations. High-performance graphics and data management allow users to simultaneously visualise up to hundreds of (superimposed) protein structures, stream molecular dynamics simulation trajectories, render cell-level models, or display huge I/HM structures. It is the primary 3D structure viewer used by PDBe and RCSB PDB. It can be easily integrated into third-party services. Mol* Viewer is open source and freely available at https://molstar.org/.


Subjects
Macromolecular Substances/chemistry; Models, Molecular; Software; Internet; Protein Conformation
10.
J Mol Biol ; 433(11): 166704, 2021 05 28.
Article in English | MEDLINE | ID: mdl-33186584

ABSTRACT

The US Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) serves many millions of unique users worldwide by delivering experimentally-determined 3D structures of biomolecules integrated with >40 external data resources via RCSB.org, application programming interfaces (APIs), and FTP downloads. Herein, we present the architectural redesign of RCSB PDB data delivery services that build on existing PDBx/mmCIF data schemas. New data access APIs (data.rcsb.org) enable efficient delivery of all PDB archive data. A novel GraphQL-based API provides flexible, declarative data retrieval along with a simple-to-use REST API. A powerful new search system (search.rcsb.org) seamlessly integrates heterogeneous types of searches across the PDB archive. Searches may combine text attributes, protein or nucleic acid sequences, small-molecule chemical descriptors, 3D macromolecular shapes, and sequence motifs. The new RCSB.org architecture adheres to the FAIR Principles, empowering users to address a wide array of research problems in fundamental biology, biomedicine, biotechnology, bioengineering, and bioenergy.


Subjects
Computational Biology; Databases, Protein; Macromolecular Substances/chemistry; Search Engine
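
To make the idea of combining heterogeneous searches concrete, the sketch below assembles a JSON request body that joins a full-text term and an attribute filter with a Boolean AND. It only builds and prints the payload and sends nothing over the network. The endpoint path, the service names, the operator name, and the attribute `rcsb_entry_info.resolution_combined` are assumptions based on the public Search API as commonly documented, and should be checked against the current search.rcsb.org documentation; the helper function names are hypothetical.

```python
import json

# Assumed endpoint; verify against the current Search API documentation.
SEARCH_URL = "https://search.rcsb.org/rcsbsearch/v2/query"


def full_text(value):
    """Terminal node: unstructured full-text search (assumed service name)."""
    return {"type": "terminal", "service": "full_text",
            "parameters": {"value": value}}


def attribute(attr, operator, value):
    """Terminal node: structured attribute search (assumed service name)."""
    return {"type": "terminal", "service": "text",
            "parameters": {"attribute": attr, "operator": operator, "value": value}}


def all_of(*nodes):
    """Group node that requires every child node to match."""
    return {"type": "group", "logical_operator": "and", "nodes": list(nodes)}


def build_query(node, return_type="entry"):
    """Wrap a query tree in the top-level request object."""
    return {"query": node, "return_type": return_type}


query = build_query(
    all_of(
        full_text("immune checkpoint"),
        # Assumed attribute and operator names.
        attribute("rcsb_entry_info.resolution_combined", "less_or_equal", 2.0),
    )
)
print(json.dumps(query, indent=2))
```

The resulting body would typically be sent as an HTTP POST to `SEARCH_URL`; that step is left out so the sketch stays runnable offline.
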
11.
Nucleic Acids Res ; 49(D1): D437-D451, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33211854

ABSTRACT

The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB), the US data center for the global PDB archive and a founding member of the Worldwide Protein Data Bank partnership, serves tens of thousands of data depositors in the Americas and Oceania and makes 3D macromolecular structure data available at no charge and without restrictions to millions of RCSB.org users around the world, including >660 000 educators, students and members of the curious public using PDB101.RCSB.org. PDB data depositors include structural biologists using macromolecular crystallography, nuclear magnetic resonance spectroscopy, 3D electron microscopy and micro-electron diffraction. PDB data consumers accessing our web portals include researchers, educators and students studying fundamental biology, biomedicine, biotechnology, bioengineering and energy sciences. During the past 2 years, the research-focused RCSB PDB web portal (RCSB.org) has undergone a complete redesign, enabling improved searching with full Boolean operator logic and more facile access to PDB data integrated with >40 external biodata resources. New features and resources are described in detail using examples that showcase recently released structures of SARS-CoV-2 proteins and host cell proteins relevant to understanding and addressing the COVID-19 global pandemic.


Subjects
Computational Biology/methods; Databases, Protein; Macromolecular Substances/chemistry; Protein Conformation; Proteins/chemistry; Bioengineering/methods; Biomedical Research/methods; Biotechnology/methods; COVID-19/epidemiology; COVID-19/prevention & control; COVID-19/virology; Humans; Macromolecular Substances/metabolism; Pandemics; Proteins/genetics; Proteins/metabolism; SARS-CoV-2/genetics; SARS-CoV-2/metabolism; SARS-CoV-2/physiology; Software; Viral Proteins/chemistry; Viral Proteins/genetics; Viral Proteins/metabolism
12.
PLoS Comput Biol ; 16(12): e1008502, 2020 12.
Article in English | MEDLINE | ID: mdl-33284792

ABSTRACT

Biochemical and biological functions of proteins are the product of both the overall fold of the polypeptide chain, and, typically, structural motifs made up of smaller numbers of amino acids constituting a catalytic center or a binding site that may be remote from one another in amino acid sequence. Detection of such structural motifs can provide valuable insights into the function(s) of previously uncharacterized proteins. Technically, this remains an extremely challenging problem because of the size of the Protein Data Bank (PDB) archive. Existing methods depend on a clustering by sequence similarity and can be computationally slow. We have developed a new approach that uses an inverted index strategy capable of analyzing >170,000 PDB structures with unmatched speed. The efficiency of the inverted index method depends critically on identifying the small number of structures containing the query motif and ignoring most of the structures that are irrelevant. Our approach (implemented at motif.rcsb.org) enables real-time retrieval and superposition of structural motifs, either extracted from a reference structure or uploaded by the user. Herein, we describe the method and present five case studies that exemplify its efficacy and speed for analyzing 3D structures of both proteins and nucleic acids.


Subjects
Proteins/chemistry; Catalysis; Cluster Analysis; Databases, Protein; Information Storage and Retrieval; Nucleic Acids/chemistry; Protein Conformation
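
The inverted index strategy mentioned in the abstract above can be sketched in a few lines. This toy version is an assumption-laden simplification, not the published implementation: each residue pair is reduced to a descriptor (two residue labels plus a binned distance), the index maps every descriptor to the set of structures containing it, and a query keeps only the structures that contain all of the motif's descriptors. The function names, the 1 Å bin width, and the example data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
import math


def descriptors(residues, bin_width=1.0):
    """Yield (label_a, label_b, distance_bin) for every residue pair.

    residues: list of (label, (x, y, z)) tuples.
    """
    for (la, ca), (lb, cb) in combinations(residues, 2):
        a, b = sorted((la, lb))  # make the descriptor order-independent
        yield (a, b, int(math.dist(ca, cb) // bin_width))


def build_index(structures):
    """Map each descriptor to the set of structure IDs that contain it."""
    index = defaultdict(set)
    for structure_id, residues in structures.items():
        for d in descriptors(residues):
            index[d].add(structure_id)
    return index


def candidates(index, motif):
    """Return structure IDs containing every descriptor of the motif."""
    sets = [index.get(d, set()) for d in descriptors(motif)]
    return set.intersection(*sets) if sets else set()


# Hypothetical toy archive with two three-residue structures.
structures = {
    "S1": [("HIS", (0, 0, 0)), ("ASP", (3.2, 0, 0)), ("SER", (0, 4.5, 0))],
    "S2": [("HIS", (0, 0, 0)), ("ASP", (9.0, 0, 0)), ("SER", (0, 4.5, 0))],
}
index = build_index(structures)
motif = [("HIS", (1, 1, 1)), ("ASP", (4.3, 1, 1)), ("SER", (1, 5.4, 1))]
print(candidates(index, motif))
```

Because the lookup touches only the index entries for the query descriptors, most irrelevant structures are never examined. Hard bin edges can miss near-boundary matches, so a production system would need a tolerance scheme and a final geometric verification step.
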
13.
PLoS Comput Biol ; 16(10): e1008247, 2020 10.
Article in English | MEDLINE | ID: mdl-33075050

ABSTRACT

3D macromolecular structural data is growing ever more complex and plentiful in the wake of substantive advances in experimental and computational structure determination methods including macromolecular crystallography, cryo-electron microscopy, and integrative methods. Efficient means of working with 3D macromolecular structural data for archiving, analyses, and visualization are central to facilitating interoperability and reusability in compliance with the FAIR Principles. We address two challenges posed by growth in data size and complexity. First, data size is reduced by bespoke compression techniques. Second, complexity is managed through improved software tooling and fully leveraging available data dictionary schemas. To this end, we introduce BinaryCIF, a serialization of Crystallographic Information File (CIF) format files that maintains full compatibility with related data schemas, such as PDBx/mmCIF, while reducing file sizes by more than a factor of two versus gzip compressed CIF files. Moreover, for the largest structures, BinaryCIF provides even better compression: factors of ten and four versus CIF files and gzipped CIF files, respectively. Herein, we describe CIFTools, a set of libraries in Java and TypeScript for generic and typed handling of CIF and BinaryCIF files. Together, BinaryCIF and CIFTools enable lightweight, efficient, and extensible handling of 3D macromolecular structural data.


Subjects
Crystallography/methods; Data Compression/methods; Models, Molecular; Software; Databases, Chemical; Macromolecular Substances/chemistry; Macromolecular Substances/ultrastructure
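
The abstract above does not spell out its compression techniques, so the following is only a generic illustration of why column-oriented numeric data compresses well. It chains two standard integer encodings, delta encoding and run-length encoding, on a column of sequential atom serial numbers. It is not the BinaryCIF specification, and all function names are hypothetical.

```python
def delta_encode(values):
    """Store the first value, then the difference to each predecessor."""
    if not values:
        return []
    out = [values[0]]
    out.extend(b - a for a, b in zip(values, values[1:]))
    return out


def delta_decode(deltas):
    """Invert delta_encode by accumulating a running total."""
    out = []
    total = 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def run_length_encode(values):
    """Collapse runs of equal values into (value, count) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]


def run_length_decode(pairs):
    """Invert run_length_encode."""
    return [v for v, n in pairs for _ in range(n)]


# Hypothetical column of atom serial numbers 1..1000.
serials = list(range(1, 1001))
packed = run_length_encode(delta_encode(serials))
print(packed)
```

A thousand sequential integers collapse to a single (value, count) pair, and decoding in the reverse order restores the column exactly.
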
14.
Sci Rep ; 10(1): 12647, 2020 07 28.
Article in English | MEDLINE | ID: mdl-32724042

ABSTRACT

Storage and directed transfer of information is the key requirement for the development of life. Yet any information stored on our genes is useless without its correct interpretation. The genetic code defines the rule set to decode this information. Aminoacyl-tRNA synthetases are at the heart of this process. We extensively characterize how these enzymes distinguish all natural amino acids based on the computational analysis of crystallographic structure data. The results of this meta-analysis show that the correct read-out of genetic information is a delicate interplay between the composition of the binding site, non-covalent interactions, error correction mechanisms, and steric effects.


Subjects
Amino Acids/metabolism; Amino Acyl-tRNA Synthetases/metabolism; Biological Evolution; Genetic Code; Protein Biosynthesis; RNA, Transfer/metabolism; Amino Acyl-tRNA Synthetases/genetics; Animals; Archaea; Bacteria; Humans; Meta-Analysis as Topic; RNA, Transfer/genetics
15.
Sci Rep ; 9(1): 18517, 2019 12 06.
Article in English | MEDLINE | ID: mdl-31811259

ABSTRACT

Protein folding and structure prediction are two sides of the same coin. Contact maps and the related techniques of constraint-based structure reconstruction can be considered as unifying aspects of both processes. We present the Structural Relevance (SR) score which quantifies the information content of individual contacts and residues in the context of the whole native structure. The physical process of protein folding is commonly characterized with spatial and temporal resolution: some residues are Early Folding while others are Highly Stable with respect to unfolding events. We employ the proposed SR score to demonstrate that folding initiation and structure stabilization are subprocesses realized by distinct sets of residues. The example of cytochrome c is used to demonstrate how StructureDistiller identifies the most important contacts needed for correct protein folding. This shows that entries of a contact map are not equally relevant for structural integrity. The proposed StructureDistiller algorithm identifies contacts with the highest information content; these entries convey unique constraints not captured by other contacts. Identification of the most informative contacts effectively doubles resilience toward contacts which are not observed in the native contact map. Furthermore, this knowledge increases reconstruction fidelity on sparse contact maps significantly by 0.4 Å.


Subjects
Computational Biology/methods; Databases, Protein; Protein Conformation; Algorithms; Animals; Cytochromes c/chemistry; Horses; Hydrogen/chemistry; Hydrogen Bonding; Mutation; Myocardium/metabolism; Protein Folding; Proteins/chemistry; Software
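
For readers unfamiliar with contact maps, the sketch below derives one from a list of residue coordinates. It assumes one representative point per residue, an 8 Å distance cutoff, and a minimum sequence separation of three residues; these are common conventions rather than values taken from the abstract above, and the function name and coordinates are hypothetical. It does not implement the SR score or StructureDistiller.

```python
import math


def contact_map(coords, cutoff=8.0, min_separation=3):
    """Return the set of residue index pairs (i, j), i < j, that are in contact.

    coords: list of (x, y, z) tuples, one representative point per residue.
    Pairs closer than min_separation along the sequence are skipped because
    they are near each other in space by construction.
    """
    contacts = set()
    n = len(coords)
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                contacts.add((i, j))
    return contacts


# Hypothetical hairpin-like trace: three residues out, three residues back.
trace = [
    (0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
    (7.6, 3.8, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0),
]
print(sorted(contact_map(trace)))
```

Each entry of the resulting set is one distance constraint; reconstruction methods of the kind the abstract discusses try to recover 3D coordinates from such constraints.
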
16.
Int J Food Microbiol ; 305: 108240, 2019 Sep 16.
Article in English | MEDLINE | ID: mdl-31202151

ABSTRACT

The lantibiotic nisin is used as a food additive to effectively inactivate a broad spectrum of Gram-positive bacteria such as Listeria monocytogenes. In total, 282 L. monocytogenes field isolates from German ready-to-eat food products, food-processing environments and patient samples and 39 Listeria reference strains were evaluated for their susceptibility to nisin. The MIC90 value was <1500 IU ml⁻¹. Whole genome sequences (WGS) of four nisin susceptible (NS; growth <200 IU ml⁻¹) and two nisin resistant L. monocytogenes field isolates (NR; growth >1500 IU ml⁻¹) of serotype IIa were analyzed for DNA sequence variants (DSVs) in genes putatively associated with NR and its regulation. WGS of NR differed from NS in the gadD2 gene encoding the glutamate decarboxylase system (GAD). Moreover, homology modeling predicted a protein structure of GadD2 in NR that promoted a less pH-dependent GAD activity and may therefore be beneficial for nisin resistance. Likewise, NR had a significantly faster growth rate than NS in the presence of nisin at pH 7. In conclusion, the results contribute to the ongoing debate on whether a genetic shift in GAD supports the NR state.


Subjects
Anti-Bacterial Agents/pharmacology; Bacterial Proteins/chemistry; Glutamate Decarboxylase/chemistry; Listeria monocytogenes/drug effects; Nisin/pharmacology; Bacterial Proteins/genetics; Bacterial Proteins/metabolism; Drug Resistance, Bacterial; Fast Foods/microbiology; Food Additives/pharmacology; Food Handling/methods; Glutamate Decarboxylase/genetics; Glutamate Decarboxylase/metabolism; Humans; Listeria monocytogenes/genetics; Listeria monocytogenes/isolation & purification; Listeria monocytogenes/metabolism; Protein Conformation/drug effects; Whole Genome Sequencing
17.
BioData Min ; 12: 1, 2019.
Article in English | MEDLINE | ID: mdl-30627219

ABSTRACT

BACKGROUND: Machine learning strategies are prominent tools for data analysis. Especially in life sciences, they have become increasingly important to handle the growing datasets collected by the scientific community. Meanwhile, algorithms improve in performance, but also gain complexity, and tend to neglect interpretability and comprehensiveness of the resulting models. RESULTS: Generalized Matrix Learning Vector Quantization (GMLVQ) is a supervised, prototype-based machine learning method and provides comprehensive visualization capabilities not present in other classifiers which allow for a fine-grained interpretation of the data. In contrast to commonly used machine learning strategies, GMLVQ is well-suited for imbalanced classification problems which are frequent in life sciences. We present a Weka plug-in implementing GMLVQ. The feasibility of GMLVQ is demonstrated on a dataset of Early Folding Residues (EFR) that have been shown to initiate and guide the protein folding process. Using 27 features, an area under the receiver operating characteristic of 76.6% was achieved which is comparable to other state-of-the-art classifiers. The obtained model is accessible at https://biosciences.hs-mittweida.de/efpred/. CONCLUSIONS: The application on EFR prediction demonstrates how an easy interpretation of classification models can promote the comprehension of biological mechanisms. The results shed light on the special features of EFR which were reported as most influential for the classification: EFR are embedded in ordered secondary structure elements and they participate in networks of hydrophobic residues. Visualization capabilities of GMLVQ are presented as we demonstrate how to interpret the results.
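
The classification step of GMLVQ can be sketched compactly. The method compares a sample x with each prototype w using the adaptive distance d(x, w) = (x - w)^T Λ (x - w) with Λ = Ω^T Ω, and assigns the label of the nearest prototype; the diagonal of Λ indicates how relevant each feature is. The code below shows only this prediction step with a hand-picked Ω. Training of prototypes and Ω is omitted, and the two features, the prototype values, and the function names are hypothetical rather than taken from the Weka plug-in.

```python
def gmlvq_distance(x, w, omega):
    """Squared distance ||Omega (x - w)||^2, i.e. (x-w)^T Omega^T Omega (x-w)."""
    diff = [xi - wi for xi, wi in zip(x, w)]
    projected = [sum(row[k] * diff[k] for k in range(len(diff))) for row in omega]
    return sum(p * p for p in projected)


def classify(x, prototypes, omega):
    """Return the label of the prototype closest to x under the learned metric."""
    nearest = min(prototypes, key=lambda p: gmlvq_distance(x, p[0], omega))
    return nearest[1]


def relevance_diagonal(omega):
    """Diagonal of Lambda = Omega^T Omega: one relevance value per feature."""
    n_features = len(omega[0])
    return [sum(row[j] ** 2 for row in omega) for j in range(n_features)]


# Hypothetical two-feature example: the second feature is strongly down-weighted.
omega = [[1.0, 0.0],
         [0.0, 0.1]]
identity = [[1.0, 0.0],
            [0.0, 1.0]]
prototypes = [((0.0, 0.0), "non-EFR"), ((1.0, 5.0), "EFR")]
sample = (0.9, 0.5)

print(classify(sample, prototypes, omega))     # learned-style metric
print(classify(sample, prototypes, identity))  # plain squared Euclidean distance
print(relevance_diagonal(omega))
```

With the down-weighted second feature the sample is assigned to the "EFR" prototype, whereas the unweighted Euclidean distance picks "non-EFR"; inspecting the relevance diagonal is what makes such a decision interpretable.
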

18.
PLoS One ; 13(10): e0206369, 2018.
Article in English | MEDLINE | ID: mdl-30376559

ABSTRACT

Proteins are chains of amino acids which adopt a three-dimensional structure and are then able to catalyze chemical reactions or propagate signals in organisms. Without external influence, many proteins fold into their native structure, and a small number of Early Folding Residues (EFR) have previously been shown to initiate the formation of secondary structure elements and guide their respective assembly. Using the two diverse superfamilies of aminoacyl-tRNA synthetases (aaRS), it is shown that the position of EFR is preserved over the course of evolution even when the corresponding sequence conservation is small. Folding initiation sites are positioned in the center of secondary structure elements, independent of aaRS class. In class I, the predicted position of EFR resembles an ancient structural packing motif present in many seemingly unrelated proteins. Furthermore, it is shown that EFR and functionally relevant residues in aaRS are almost entirely disjoint sets of residues. The Start2Fold database is used to investigate whether this separation of EFR and functional residues can be observed for other proteins. EFR are found to constitute crucial connectors of protein regions which are distant at sequence level. Especially, these residues exhibit a high number of non-covalent residue-residue contacts such as hydrogen bonds and hydrophobic interactions. This tendency also manifests as energetically stable local regions, as substantiated by a knowledge-based potential. Despite profound differences regarding how EFR and functional residues are embedded in protein structures, a strict separation of structurally and functionally relevant residues cannot be observed for a more general collection of proteins.


Subjects
Amino Acyl-tRNA Synthetases/metabolism; Adenosine Triphosphate/chemistry; Adenosine Triphosphate/metabolism; Amino Acids/chemistry; Amino Acids/metabolism; Amino Acyl-tRNA Synthetases/chemistry; Binding Sites; Databases, Protein; Protein Folding; Protein Structure, Secondary
19.
PLoS Comput Biol ; 14(4): e1006101, 2018 04.
Article in English | MEDLINE | ID: mdl-29659563

ABSTRACT

The origin of the machinery that realizes protein biosynthesis in all organisms is still unclear. One key component of this machinery are aminoacyl tRNA synthetases (aaRS), which ligate tRNAs to amino acids while consuming ATP. Sequence analyses revealed that these enzymes can be divided into two complementary classes. Both classes differ significantly on a sequence and structural level, feature different reaction mechanisms, and occur in diverse oligomerization states. The one unifying aspect of both classes is their function of binding ATP. We identified Backbone Brackets and Arginine Tweezers as most compact ATP binding motifs characteristic for each Class. Geometric analysis shows a structural rearrangement of the Backbone Brackets upon ATP binding, indicating a general mechanism of all Class I structures. Regarding the origin of aaRS, the Rodin-Ohno hypothesis states that the peculiar nature of the two aaRS classes is the result of their primordial forms, called Protozymes, being encoded on opposite strands of the same gene. Backbone Brackets and Arginine Tweezers were traced back to the proposed Protozymes and their more efficient successors, the Urzymes. Both structural motifs can be observed as pairs of residues in contemporary structures and it seems that the time of their addition, indicated by their placement in the ancient aaRS, coincides with the evolutionary trace of Proto- and Urzymes.


Subjects
Amino Acyl-tRNA Synthetases/classification; Amino Acyl-tRNA Synthetases/metabolism; Adenosine Triphosphate/metabolism; Amino Acid Sequence; Amino Acyl-tRNA Synthetases/genetics; Arginine/chemistry; Base Sequence; Catalytic Domain/genetics; Codon/genetics; Computational Biology; Evolution, Molecular; Genetic Variation; Humans; Ligands; Models, Molecular; Mutagenesis; Protein Conformation; RNA, Transfer/chemistry; RNA, Transfer/genetics; RNA, Transfer/metabolism
20.
BioData Min ; 9: 6, 2016.
Article in English | MEDLINE | ID: mdl-26819632

ABSTRACT

BACKGROUND: To understand the molecular function of biopolymers, studying their structural characteristics is of central importance. Graphics programs are often utilized to conceive these properties, but with the increasing number of available structures in databases or structure models produced by automated modeling frameworks this process requires assistance from tools that allow automated structure visualization. In this paper a web server and its underlying method for generating graphical sequence representations of molecular structures is presented. RESULTS: The method, called SequenceCEROSENE (color encoding of residues obtained by spatial neighborhood embedding), retrieves the sequence of each amino acid or nucleotide chain in a given structure and produces a color coding for each residue based on three-dimensional structure information. From this, color-highlighted sequences are obtained, where residue coloring represents three-dimensional residue locations in the structure. This color encoding thus provides a one-dimensional representation, from which spatial interactions, proximity and relations between residues or entire chains can be deduced quickly and solely from color similarity. Furthermore, additional heteroatoms and chemical compounds bound to the structure, like ligands or coenzymes, are processed and reported as well. To provide free access to SequenceCEROSENE, a web server has been implemented that allows generating color codings for structures deposited in the Protein Data Bank or structure models uploaded by the user. Besides retrieving visualizations in popular graphic formats, underlying raw data can be downloaded as well. In addition, the server provides user interactivity with generated visualizations and the three-dimensional structure in question.
CONCLUSIONS: Color-encoded sequences generated by SequenceCEROSENE can help users quickly perceive the general characteristics of a structure of interest (or entire sets of complexes), thus supporting the researcher in the initial phase of structure-based studies. In this respect, the web server can be a valuable tool, as users are allowed to process multiple structures, quickly switch between results, and interact with generated visualizations in an intuitive manner. The SequenceCEROSENE web server is available at https://biosciences.hs-mittweida.de/seqcerosene.
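
The general idea of encoding 3D position as color can be illustrated with a deliberately simple mapping: each coordinate axis is rescaled to the 0 to 255 range and used as one RGB channel, so residues that are close in space receive similar colors. This is a stand-in chosen for brevity and is not the spatial neighborhood embedding used by SequenceCEROSENE; the function name and the coordinates are hypothetical.

```python
def coords_to_colors(coords):
    """Map (x, y, z) positions to hex colors by per-axis min-max scaling."""
    mins = [min(c[axis] for c in coords) for axis in range(3)]
    maxs = [max(c[axis] for c in coords) for axis in range(3)]
    colors = []
    for c in coords:
        rgb = []
        for axis in range(3):
            span = maxs[axis] - mins[axis]
            # A flat axis carries no information; use mid-gray for that channel.
            fraction = (c[axis] - mins[axis]) / span if span else 0.5
            rgb.append(round(255 * fraction))
        colors.append("#{:02x}{:02x}{:02x}".format(*rgb))
    return colors


# Hypothetical residue positions paired with a one-letter sequence.
sequence = "MKV"
positions = [(0.0, 0.0, 0.0), (10.0, 0.0, 2.5), (10.0, 10.0, 10.0)]
for residue, color in zip(sequence, coords_to_colors(positions)):
    print(residue, color)
```

Printing the sequence with these colors gives a one-dimensional view in which color similarity hints at spatial proximity, which is the kind of reading the abstract describes.
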
