ABSTRACT
The Protein Data Bank in Europe (PDBe), a founding member of the Worldwide Protein Data Bank (wwPDB), actively participates in the deposition, curation, validation, archiving and dissemination of macromolecular structure data. PDBe supports diverse research communities in their use of macromolecular structures by enriching the PDB data and by providing advanced tools and services for effective data access, visualization and analysis. This paper details the enrichment of data at PDBe, including mapping of RNA structures to Rfam, and identification of molecules that act as cofactors. PDBe has developed an advanced search facility with ~100 data categories and sequence searches. New features have been included in the LiteMol viewer at PDBe, with updated visualization of carbohydrates and nucleic acids. Small molecules are now mapped more extensively to external databases and their visual representation has been enhanced. These advances help users to more easily find and interpret macromolecular structure data in order to solve scientific problems.
Subject(s)
Databases, Protein , Software , Cluster Analysis , Data Accuracy , Europe , Protein Conformation , User-Computer Interface
ABSTRACT
The Protein Data Bank in Europe (PDBe, pdbe.org) is actively engaged in the deposition, annotation, remediation, enrichment and dissemination of macromolecular structure data. This paper describes new developments and improvements at PDBe addressing three challenging areas: data enrichment, data dissemination and functional reusability. New features of the PDBe Web site are discussed, including a context dependent menu providing links to raw experimental data and improved presentation of structures solved by hybrid methods. The paper also summarizes the features of the LiteMol suite, which is a set of services enabling fast and interactive 3D visualization of structures, with associated experimental maps, annotations and quality assessment information. We introduce a library of Web components which can be easily reused to port data and functionality available at PDBe to other services. We also introduce updates to the SIFTS resource which maps PDB data to other bioinformatics resources, and the PDBe REST API.
Subject(s)
Computational Biology/methods , Databases, Protein , Proteins/chemistry , Sequence Analysis, Protein/methods , User-Computer Interface , Amino Acid Sequence , Computer Graphics , Databases as Topic , Europe , Humans , Information Dissemination , Internet , Models, Molecular , Molecular Sequence Annotation , Protein Conformation, alpha-Helical , Protein Conformation, beta-Strand , Proteins/genetics , Proteins/metabolism
ABSTRACT
The Protein Data Bank in Europe (http://pdbe.org) accepts and annotates depositions of macromolecular structure data in the PDB and EMDB archives and enriches, integrates and disseminates structural information in a variety of ways. The PDBe website has been redesigned based on an analysis of user requirements, and now offers intuitive access to improved and value-added macromolecular structure information. Unique value-added information includes lists of reviews and research articles that cite or mention PDB entries as well as access to figures and legends from full-text open-access publications that describe PDB entries. A powerful new query system not only shows all the PDB entries that match a given query, but also shows the 'best structures' for a given macromolecule, ligand complex or sequence family using data-quality information from the wwPDB validation reports. A PDBe RESTful API has been developed to provide unified access to macromolecular structure data available in the PDB and EMDB archives as well as value-added annotations, e.g. regarding structure quality and up-to-date cross-reference information from the SIFTS resource. Taken together, these new developments facilitate unified access to macromolecular structure data in an intuitive way for non-expert users and support expert users in analysing macromolecular structure data.
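As an illustration of the kind of unified access a RESTful API offers, the following is a minimal Python sketch, not the authors' reference client. The base URL and the `pdb/entry/summary/<id>` path are assumptions based on the publicly documented PDBe API and should be checked against the current documentation. The helper names (`summary_url`, `entry_titles`, `fetch_summary`) are hypothetical, as is the assumed response layout (a JSON object keyed by PDB ID, each value a list of records with a `title` field).

```python
import json
from urllib.request import urlopen

# Assumption: base URL of the PDBe REST API; verify against current PDBe docs.
PDBE_API_BASE = "https://www.ebi.ac.uk/pdbe/api"


def summary_url(pdb_id: str) -> str:
    """Build the (assumed) entry-summary URL for a four-character PDB ID."""
    pdb_id = pdb_id.strip().lower()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return f"{PDBE_API_BASE}/pdb/entry/summary/{pdb_id}"


def entry_titles(payload: dict) -> dict:
    """Map each PDB ID in a decoded response to its title, or None if absent.

    Assumes the response is keyed by PDB ID, each value being a list of records.
    """
    titles = {}
    for pdb_id, records in payload.items():
        first = records[0] if records else {}
        titles[pdb_id] = first.get("title")
    return titles


def fetch_summary(pdb_id: str, timeout: float = 10.0) -> dict:
    """Fetch and decode the summary for one entry (requires network access)."""
    with urlopen(summary_url(pdb_id), timeout=timeout) as response:
        return json.load(response)
```

A call such as `entry_titles(fetch_summary("1cbs"))` would then return a one-entry mapping from the PDB ID to its title, provided the service responds in the assumed layout.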
Subject(s)
Databases, Protein , Protein Conformation , Internet , Microscopy, Electron , Models, Molecular , User-Computer Interface
ABSTRACT
ChEMBL is an open large-scale bioactivity database (https://www.ebi.ac.uk/chembl), previously described in the 2012 Nucleic Acids Research Database Issue. Since then, a variety of new data sources and improvements in functionality have contributed to the growth and utility of the resource. In particular, more comprehensive tracking of compounds from research stages through clinical development to market is provided through the inclusion of data from United States Adopted Name applications; a new richer data model for representing drug targets has been developed; and a number of methods have been put in place to allow users to more easily identify reliable data. Finally, access to ChEMBL is now available via a new Resource Description Framework format, in addition to the web-based interface, data downloads and web services.
Subject(s)
Databases, Chemical , Drug Discovery , Binding Sites , Humans , Internet , Ligands , Pharmaceutical Preparations/chemistry , Proteins/chemistry , Proteins/drug effects
ABSTRACT
The use of spherical harmonics in the molecular sciences is widespread. They have been employed with success in, for instance, the crystallographic fast rotation function, small-angle scattering particle reconstruction, molecular surface visualisation, protein-protein docking, active site analysis and protein function prediction. An extension of the spherical harmonic expansion method is presented here that enables regions (bodies) rather than contours (surfaces) to be described and which lends itself favourably to the construction of rotationally invariant shape descriptors. This method introduces a radial term that extends the spherical harmonics to 3D polynomials. These polynomials maintain the advantages of the spherical harmonics (orthonormality, completeness, uniqueness and fast computation) but correct the drawbacks (contour based shape description and star-shape objects) and give rise to powerful invariant descriptors. We provide proof-of-principle examples illustrating the potential of this method for accurate object representation, an analysis of the descriptor classification power, and comparisons to other methods.
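The idea of a rotationally invariant descriptor built from an expansion with a radial term can be illustrated with a small Python sketch. This is a toy example under stated assumptions, not the method of the paper: it uses only the degree l = 1 real spherical harmonics (which are proportional to the unit-vector components x/r, y/r, z/r), and the weight `r ** radial_power` is a hypothetical stand-in for the paper's radial polynomials. The three coefficients change when the object is rotated, but their Euclidean norm does not, which is the property that per-degree invariants exploit.

```python
import math


def rotate_z(point, angle):
    """Rotate a 3D point about the z axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


def rotate_x(point, angle):
    """Rotate a 3D point about the x axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (x, c * y - s * z, s * y + c * z)


def degree_one_coefficients(points, radial_power):
    """Sum r**radial_power * (x, y, z) / r over all points.

    Up to a constant factor these are the l = 1 real spherical-harmonic
    coefficients of the point set, weighted by a simple radial term
    (an illustrative assumption, not the paper's radial polynomial).
    """
    cx = cy = cz = 0.0
    for x, y, z in points:
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0.0:
            continue  # the origin has no direction and contributes nothing
        weight = r ** radial_power
        cx += weight * x / r
        cy += weight * y / r
        cz += weight * z / r
    return (cx, cy, cz)


def invariant(points, radial_power):
    """Norm of the degree-one coefficient vector; unchanged by rotations."""
    return math.sqrt(sum(c * c for c in degree_one_coefficients(points, radial_power)))


points = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.5, 0.5, 1.5)]
rotated = [rotate_x(rotate_z(p, 0.7), -1.1) for p in points]
```

Comparing `invariant(points, 2)` with `invariant(rotated, 2)` gives the same value up to floating-point error, whereas `degree_one_coefficients` differs between the two point sets.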
Subject(s)
Acoustics , Models, Molecular , Models, Statistical , Molecular Structure , Protein Conformation , Cluster Analysis , Imaging, Three-Dimensional , Reproducibility of Results , Rotation , Vibration
ABSTRACT
Database URL: https://www.wwpdb.org/.
Subject(s)
Data Curation , Databases, Protein , Protein Conformation , Vocabulary, Controlled
ABSTRACT
The Worldwide PDB recently launched a deposition, biocuration, and validation tool: OneDep. At various stages of OneDep data processing, validation reports for three-dimensional structures of biological macromolecules are produced. These reports are based on recommendations of expert task forces representing crystallography, nuclear magnetic resonance, and cryoelectron microscopy communities. The reports provide useful metrics with which depositors can evaluate the quality of the experimental data, the structural model, and the fit between them. The validation module is also available as a stand-alone web server and as a programmatically accessible web service. A growing number of journals require the official wwPDB validation reports (produced at biocuration) to accompany manuscripts describing macromolecular structures. Upon public release of the structure, the validation report becomes part of the public PDB archive. Geometric quality scores for proteins in the PDB archive have improved over the past decade.
Subject(s)
Databases, Protein/standards , Validation Studies as Topic , Sequence Analysis, Protein/methods , Sequence Analysis, Protein/standards
ABSTRACT
OneDep, a unified system for deposition, biocuration, and validation of experimentally determined structures of biological macromolecules to the PDB archive, has been developed as a global collaboration by the worldwide PDB (wwPDB) partners. This new system was designed to ensure that the wwPDB could meet the evolving archiving requirements of the scientific community over the coming decades. OneDep unifies deposition, biocuration, and validation pipelines across all wwPDB, EMDB, and BMRB deposition sites with improved focus on data quality and completeness in these archives, while supporting growth in the number of depositions and increases in their average size and complexity. In this paper, we describe the design, functional operation, and supporting infrastructure of the OneDep system, and provide initial performance assessments.
Subject(s)
Proteins/chemistry , Data Curation , Databases, Protein , Internet , Models, Molecular , Nuclear Magnetic Resonance, Biomolecular , Protein Conformation , User-Computer Interface
ABSTRACT
Root nodule extensins (RNEs) are highly glycosylated plant glycoproteins localized in the extracellular matrix of legume tissues and in the lumen of Rhizobium-induced infection threads. In pea and other legumes, a family of genes encode glycoproteins of different overall length but with the same basic composition. The predicted polypeptide sequence reveals repeating and alternating motifs characteristic of extensins and arabinogalactan proteins. In order to monitor the behavior of individual RNE gene products in the plant extracellular matrix, the coding sequence of PsRNE1 from Pisum sativum was expressed in insect cells and in tobacco leaves. RNE products extracted from tobacco tissues were of high molecular weight (in excess of 80 kDa), indicating extensive glycosylation similar to that in pea tissues. Epitope-tagged derivatives of PsRNE1 could be localized in cell walls. However, the introduction of epitope tags at the C-terminus of RNE altered the behavior of RNE in the extracellular matrix, apparently preventing intermolecular crosslinking of RNE molecules and their covalent association with other cell wall components. These observations are discussed in the light of a computational model for the RNE glycoprotein that is consistent with an extended rod-like structure. It is proposed that RNE can undergo three classes of tyrosine-based crosslinking. Intramolecular crosslinking of vicinal Tyr residues is rod stiffening, end-to-end linkage is rod lengthening, and side-to-side intermolecular crosslinking is rod bundling. The control of these interconversions could have important implications for the biomechanics of infection thread growth.
Subject(s)
Glycoproteins/physiology , Pisum sativum/physiology , Plant Proteins/physiology , Amino Acid Sequence , Animals , Cell Line , Cell Wall/chemistry , Epitopes , Fabaceae/chemistry , Glycoproteins/chemistry , Models, Molecular , Molecular Sequence Data , Organisms, Genetically Modified , Plant Proteins/chemistry , Protein Conformation , Nicotiana/genetics
ABSTRACT
Both metabolism and transport are key elements defining the bioavailability and biological activity of molecules, i.e. their adverse and therapeutic effects. Structured and high-quality experimental data stored in a suitable container, such as a relational database, facilitates easy computational processing and thus allows for high-quality information/knowledge to be efficiently inferred by computational analyses. Our aim was to create a freely accessible database that would provide easy access to data describing interactions between proteins involved in transport and xenobiotic metabolism and their small molecule substrates and modulators. We present Metrabase, an integrated cheminformatics and bioinformatics resource containing curated data related to human transport and metabolism of chemical compounds. Its primary content includes over 11,500 interaction records involving nearly 3,500 small molecule substrates and modulators of transport proteins and, currently to a much smaller extent, cytochrome P450 enzymes. Data was manually extracted from the published literature and supplemented with data integrated from other available resources. Metrabase version 1.0 is freely available under a CC BY-SA 4.0 license at http://www-metrabase.ch.cam.ac.uk.
ABSTRACT
Cancer remains a fundamental burden to public health despite substantial efforts aimed at developing effective chemotherapeutics and significant advances in chemotherapeutic regimens. The major challenge in anti-cancer drug design is to selectively target cancer cells with high specificity. Research into treating malignancies by targeting altered metabolism in cancer cells is supported by computational approaches, which can take a leading role in identifying candidate targets for anti-cancer therapy as well as assist in the discovery and optimisation of anti-cancer agents. Natural products appear to have privileged structures for anti-cancer drug development and the bulk of this particularly valuable chemical space still remains to be explored. In this review we aim to provide a comprehensive overview of current strategies for computer-guided anti-cancer drug development. We start with a discussion of state-of-the-art bioinformatics methods applied to the identification of novel anti-cancer targets, including machine learning techniques, the Connectivity Map and biological network analysis. This is followed by an extensive survey of molecular modelling and cheminformatics techniques employed to develop agents targeting proteins involved in the glycolytic, lipid, NAD+, mitochondrial (TCA cycle), amino acid and nucleic acid metabolism of cancer cells. A dedicated section highlights the most promising strategies to develop anti-cancer therapeutics from natural products, and the role of metabolism and some of the many targets under investigation are reviewed. Recent success stories are reported for all the areas covered in this review. We conclude with a brief summary of the most interesting strategies identified and with an outlook on future directions in anti-cancer drug development.