ABSTRACT
The seventh BMC Ecology competition attracted entries from talented ecologists from around the world. Together, they showcase the beauty and diversity of life on our planet as well as providing an insight into the biological interactions found in nature. This editorial celebrates the winning images as selected by the Editor of BMC Ecology and senior members of the journal's editorial board. Enjoy!
Subject(s)
Ecology
ABSTRACT
The sixth BMC Ecology Image Competition received more than 145 photographs from talented ecologists around the world, showcasing the amazing biodiversity, natural beauty and biological interactions found in nature. In this editorial, we showcase the winning images, as selected by our guest judge, Professor Zhigang Jiang from the Institute of Zoology of the Chinese Academy of Sciences, with help from the journal's editorial board. Enjoy!
Subject(s)
Ecology , Photography
ABSTRACT
The latest version of the CATH-Gene3D protein structure classification database (4.0, http://www.cathdb.info) provides annotations for over 235,000 protein domain structures and includes 25 million domain predictions. This article provides an update on the major developments in the 2 years since the last publication in this journal including: significant improvements to the predictive power of our functional families (FunFams); the release of our 'current' putative domain assignments (CATH-B); a new, strictly non-redundant data set of CATH domains suitable for homology benchmarking experiments (CATH-40) and a number of improvements to the web pages.
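CATH-40 is described here only as a strictly non-redundant set of domains for homology benchmarking. A common way to build such a set is greedy filtering against a sequence-identity cutoff. The sketch below assumes that "40" denotes a 40% identity threshold and that pairwise identities have already been computed. The function name `greedy_nonredundant`, the domain IDs and the identity values are hypothetical, and this is not necessarily the procedure used to build CATH-40.

```python
from typing import Dict, FrozenSet, List


def greedy_nonredundant(
    domains: List[str],
    identity: Dict[FrozenSet[str], float],
    threshold: float = 0.40,  # assumed meaning of the "40" in CATH-40
) -> List[str]:
    """Keep a domain only if its identity to every already-kept domain is below threshold.

    domains: domain IDs ordered by preference (e.g. best resolution first).
    identity: maps an unordered pair of domain IDs to fractional sequence identity;
    pairs that are missing are treated as 0.0 identity.
    """
    kept: List[str] = []
    for dom in domains:
        # Compare the candidate against every representative chosen so far.
        if all(identity.get(frozenset((dom, k)), 0.0) < threshold for k in kept):
            kept.append(dom)
    return kept
```

The order of the input list decides which member of a redundant pair survives, so sorting by a quality criterion first (for example, resolution) keeps the preferred representative.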
Subject(s)
Databases, Protein , Molecular Sequence Annotation , Protein Structure, Tertiary , Genomics , Internet , Protein Structure, Tertiary/genetics , Proteins/classification
ABSTRACT
CATH version 3.5 (Class, Architecture, Topology, Homology, available at http://www.cathdb.info/) contains 173 536 domains, 2626 homologous superfamilies and 1313 fold groups. When focusing on structural genomics (SG) structures, we observe that the number of new folds for CATH v3.5 is slightly less than for previous releases, and this observation suggests that we may now know the majority of folds that are easily accessible to structure determination. We have improved the accuracy of our functional family (FunFams) sub-classification method and the CATH sequence domain search facility has been extended to provide FunFam annotations for each domain. The CATH website has been redesigned. We have improved the display of functional data and of conserved sequence features associated with FunFams within each CATH superfamily.
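CATH labels each superfamily with a dotted four-part code whose fields follow the Class, Architecture, Topology, Homology hierarchy. The fold groups counted above correspond to the first three fields (the topology level). The sketch below parses such a code and tests whether two superfamilies share a fold group. `CathCode`, `parse_cath_code` and `same_fold` are hypothetical helpers, not part of any CATH software or API.

```python
from typing import NamedTuple


class CathCode(NamedTuple):
    class_: int        # C: class
    architecture: int  # A: architecture
    topology: int      # T: topology (fold group)
    homology: int      # H: homologous superfamily

    def level(self, n: int) -> str:
        """Dotted prefix for the first n levels (1 = C ... 4 = H)."""
        if not 1 <= n <= 4:
            raise ValueError("level must be between 1 and 4")
        return ".".join(str(x) for x in self[:n])


def parse_cath_code(code: str) -> CathCode:
    """Parse a code such as '3.40.50.620' into its four integer fields."""
    parts = code.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected 4 dot-separated fields, got {code!r}")
    return CathCode(*(int(p) for p in parts))


def same_fold(a: CathCode, b: CathCode) -> bool:
    """Two superfamilies share a fold group when their C, A and T fields match."""
    return a[:3] == b[:3]
```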
Subject(s)
Databases, Protein , Protein Structure, Tertiary , Genomics , Internet , Molecular Sequence Annotation , Protein Folding , Proteins/chemistry , Proteins/classification , Proteins/genetics , Sequence Alignment , Sequence Analysis, Protein , Structural Homology, Protein
ABSTRACT
Genome3D, available at http://www.genome3d.eu, is a new collaborative project that integrates UK-based structural resources to provide a unique perspective on sequence-structure-function relationships. Leading structure prediction resources (DomSerf, FUGUE, Gene3D, pDomTHREADER, Phyre and SUPERFAMILY) provide annotations for UniProt sequences to indicate the locations of structural domains (structural annotations) and their 3D structures (structural models). Structural annotations and 3D model predictions are currently available for three model genomes (Homo sapiens, E. coli and baker's yeast), and the project will extend to other genomes in the near future. As these resources exploit different strategies for predicting structures, the main aim of Genome3D is to enable comparisons between all the resources so that biologists can see where predictions agree and are therefore more trusted. Furthermore, as these methods differ in whether they build their predictions using CATH or SCOP, Genome3D also contains the first official mapping between these two databases. This has identified pairs of similar superfamilies from the two resources at various degrees of consensus (532 bronze pairs, 527 silver pairs and 370 gold pairs).
Subject(s)
Databases, Protein , Protein Structure, Tertiary , Genomics , Humans , Internet , Molecular Sequence Annotation , Proteins/chemistry , Proteins/classification , Proteins/genetics , Software
ABSTRACT
FunTree is a new resource that brings together sequence, structure, phylogenetic, chemical and mechanistic information for structurally defined enzyme superfamilies. Gathering this range of data into a single resource allows investigation of how novel enzyme functions have evolved within a structurally defined superfamily, and provides a means to analyse trends across many superfamilies. This is done not only in the context of each enzyme's sequence and structure but also in terms of the relationships between their reactions. Developed in tandem with the CATH database, it currently comprises 276 superfamilies covering ~1800 (70%) of sequence-assigned enzyme reactions. Central to the resource are phylogenetic trees generated from structurally informed multiple sequence alignments, built both from domain structural alignments supplemented with domain sequences and from whole-sequence alignments based on shared multi-domain architectures. These trees are decorated with functional annotations, such as metabolite similarity, as well as annotations from manually curated resources such as the Catalytic Site Atlas and MACiE (for enzyme mechanisms). The resource is freely available through a web interface: www.ebi.ac.uk/thorton-srv/databases/FunTree.
Subject(s)
Databases, Protein , Enzymes/chemistry , Enzymes/classification , Biological Evolution , Enzymes/metabolism , Phylogeny , Protein Structure, Tertiary , Sequence Alignment , Sequence Analysis, Protein
ABSTRACT
In order to understand the evolution of enzyme reactions and to gain an overview of biological catalysis, we have combined sequence and structural data to generate phylogenetic trees in an analysis of 276 structurally defined enzyme superfamilies, and used these to study how enzyme functions have evolved. We describe in detail the analysis of two superfamilies to illustrate different paradigms of enzyme evolution. Gathering together data from all the superfamilies supports and develops the observation that they have all evolved to act on a diverse set of substrates, whilst the evolution of new chemistry is much less common. Despite that, by bringing together so much data, we can provide a comprehensive overview of the most common and rare types of changes in function. Our analysis demonstrates, on a larger scale than previously studied, that modifications in overall chemistry still occur, with all possible changes at the primary level of the Enzyme Commission (E.C.) classification observed to a greater or lesser extent. The phylogenetic trees map out the evolutionary route taken within a superfamily, as well as all the possible changes within a superfamily. This has been used to generate a matrix of observed exchanges from one enzyme function to another, revealing the scale and nature of enzyme evolution and showing that some types of exchanges between and within E.C. classes are more prevalent than others. Surprisingly, a large proportion (71%) of all known enzyme functions are performed by this relatively small set of 276 superfamilies. This reinforces the hypothesis that relatively few ancient enzymatic domain superfamilies were progenitors for most of the chemistry required for life.
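A matrix of observed exchanges can be illustrated by walking each branch of an annotated tree and recording how the E.C. number changes between parent and child, keyed by the primary E.C. class at each end. Changes within a class (e.g. 3.1.1.1 to 3.1.1.2) land on the diagonal. This sketch assumes that internal nodes already carry E.C. annotations (for example, from some prior ancestral-state inference), which the abstract does not specify; `count_exchanges`, `ec_primary` and the toy tree are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def ec_primary(ec: str) -> int:
    """First field of an E.C. number, e.g. '3.1.1.1' -> 3."""
    return int(ec.split(".")[0])


def count_exchanges(
    edges: Iterable[Tuple[str, str]],
    annotation: Dict[str, Optional[str]],
) -> Counter:
    """Count (from_class, to_class) changes at the primary E.C. level.

    edges: (parent, child) node pairs of a phylogenetic tree.
    annotation: node -> E.C. number, or None if the node is unannotated
    (internal-node annotations are assumed to be supplied by the caller).
    Only branches where both ends are annotated and the full E.C. number
    differs are counted.
    """
    counts: Counter = Counter()
    for parent, child in edges:
        a, b = annotation.get(parent), annotation.get(child)
        if a is None or b is None or a == b:
            continue  # no usable annotation, or no change of function
        counts[(ec_primary(a), ec_primary(b))] += 1
    return counts
```

Summing such counters over many superfamilies gives a class-by-class exchange table in which off-diagonal entries are changes of overall chemistry.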
Subject(s)
Enzymes/chemistry , Enzymes/physiology , Evolution, Molecular , Sequence Analysis, Protein/methods , Amino Acid Sequence , Molecular Sequence Data , Structure-Activity Relationship
ABSTRACT
CATH version 3.3 (class, architecture, topology, homology) contains 128,688 domains, 2386 homologous superfamilies and 1233 fold groups, and reflects a major focus on classifying structural genomics (SG) structures and transmembrane proteins, both of which are likely to add structural novelty to the database and therefore increase the coverage of protein fold space within CATH. For CATH version 3.4 we have significantly improved the presentation of sequence information and associated functional information for CATH superfamilies. The CATH superfamily pages now reflect both the functional and structural diversity within the superfamily and include structural alignments of close and distant relatives within the superfamily, annotated with functional information and details of conserved residues. A significantly more efficient search function for CATH has been established by implementing the search server Solr (http://lucene.apache.org/solr/). The CATH v3.4 webpages have been built using the Catalyst web framework.
Subject(s)
Databases, Protein , Protein Structure, Tertiary , Phylogeny , Protein Folding , Proteins/chemistry , Proteins/classification
ABSTRACT
The latest version of CATH (class, architecture, topology, homology) (version 3.2), released in July 2008 (http://www.cathdb.info), contains 114,215 domains, 2178 homologous superfamilies and 1110 fold groups. We have assigned 20,330 new domains, 87 new homologous superfamilies and 26 new folds since CATH release version 3.1. A total of 28,064 new domains have been assigned since our NAR 2007 database publication (CATH version 3.0). The CATH website has been completely redesigned and includes more comprehensive documentation. We have revisited the CATH architecture level as part of the development of a 'Protein Chart' and present information on the population of each architecture. The CATHEDRAL structure comparison algorithm has been improved and used to characterize structural diversity in CATH superfamilies and structural overlaps between superfamilies. Although the majority of superfamilies in CATH are not structurally diverse and do not overlap significantly with other superfamilies, approximately 4% of superfamilies are very diverse, and these are the superfamilies that are most highly populated in both the PDB and the genomes. Information on the degree of structural diversity in each superfamily and on structural overlaps between superfamilies can now be downloaded from the CATH website.
Subject(s)
Databases, Protein , Protein Structure, Tertiary , Models, Molecular , Protein Folding , Protein Structure, Secondary , Proteins/classification , Sequence Homology, Amino Acid
ABSTRACT
The inaugural BMC Ecology and Evolution image competition attracted entries from talented ecologists and evolutionary biologists worldwide. Together, these photos beautifully capture biodiversity, how it arose and why we should conserve it. This editorial celebrates the winning images as selected by the Editor of BMC Ecology and Evolution and senior members of the journal's editorial board.
Subject(s)
Biodiversity , Ecology , Health Personnel , Humans
ABSTRACT
The study of superfamilies of protein domains using a combination of structure, sequence and function data provides insights into deep evolutionary history. In the present paper, analyses of functional diversity within such superfamilies as defined in the CATH-Gene3D resource are described. These analyses focus on structure-function relationships in very large and diverse superfamilies, and on the evolution of domain superfamily members in protein-protein complexes.
Subject(s)
Evolution, Molecular , Proteins/chemistry , Proteins/metabolism , Animals , Databases, Protein , Humans , Protein Structure, Secondary , Protein Structure, Tertiary , Proteins/classification , Structure-Activity Relationship
ABSTRACT
We report the latest release (version 3.0) of the CATH protein domain database (http://www.cathdb.info). There has been a 20% increase in the number of structural domains classified in CATH, up to 86 151 domains. Release 3.0 comprises 1110 fold groups and 2147 homologous superfamilies. To cope with the increasing number of diverse structural homologues being determined by the structural genomics initiatives, more sensitive methods have been developed for identifying boundaries in multi-domain proteins and for recognising homologues. The CATH classification update is now driven by an integrated pipeline that links these automated procedures with validation steps, which have been made easier by the provision of information-rich web pages summarising comparison scores and relevant links to external sites for each domain being classified. An analysis of the population of domains in the CATH hierarchy and of several domain characteristics is presented for version 3.0. We also report an update of the CATH Dictionary of homologous structures (CATH-DHS), which now contains multiple structural alignments, consensus information and functional annotations for 1459 well-populated superfamilies in CATH. CATH is directly linked to the Gene3D database, which is a projection of CATH structural data onto approximately 2 million sequences in completed genomes and UniProt.
Subject(s)
Databases, Protein , Protein Structure, Tertiary , Classification/methods , Evolution, Molecular , Internet , Protein Folding , Protein Structure, Tertiary/genetics , Proteins/classification , Sequence Homology, Amino Acid , Structural Homology, Protein , User-Computer Interface
ABSTRACT
We have developed a new method for the analysis of voids in proteins (defined as empty cavities not accessible to solvent). This method combines analysis of individual discrete voids with analysis of packing quality. While these are different aspects of the same effect, they have traditionally been analysed using different approaches. The method has been applied to the calculation of total void volume and maximum void size in a non-redundant set of protein domains and has been used to examine correlations between thermal stability and void size. The tumour-suppressor protein p53 was then compared with the non-redundant data set to determine whether its low thermal stability results from poor packing. We found that p53 has average packing, but the detrimental effects of some previously unexplained mutations to p53 observed in cancer can be explained by the creation of unusually large voids.
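A generic way to find voids as defined above (empty cavities not accessible to solvent) is to map the protein onto a 3D grid and flood-fill from the grid boundary: empty cells reachable from outside count as solvent, and any remaining empty regions are enclosed voids. Summing their sizes gives a total void volume, and the largest gives the maximum void size. This is a swapped-in textbook approach, not the authors' method, and it omits the packing-quality component. The occupancy grid (atom radii plus a solvent-probe radius mapped onto grid cells) is assumed to be pre-computed, and `find_voids` is a hypothetical name.

```python
from collections import deque
from typing import Iterator, List, Set, Tuple

Cell = Tuple[int, int, int]


def find_voids(occupied: Set[Cell], shape: Tuple[int, int, int]) -> List[int]:
    """Return the size (in grid cells) of every enclosed empty region.

    occupied: grid cells covered by protein atoms (assumed pre-computed).
    shape: grid dimensions (nx, ny, nz). Empty cells connected to the grid
    boundary are treated as solvent; the remaining empty cells form voids.
    """
    nx, ny, nz = shape
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

    def neighbours(c: Cell) -> Iterator[Cell]:
        x, y, z = c
        for dx, dy, dz in steps:  # 6-connectivity
            n = (x + dx, y + dy, z + dz)
            if 0 <= n[0] < nx and 0 <= n[1] < ny and 0 <= n[2] < nz:
                yield n

    def flood(start: Cell, seen: Set[Cell]) -> int:
        """Breadth-first fill of one empty region; returns its size."""
        queue = deque([start])
        seen.add(start)
        size = 0
        while queue:
            cell = queue.popleft()
            size += 1
            for n in neighbours(cell):
                if n not in seen and n not in occupied:
                    seen.add(n)
                    queue.append(n)
        return size

    seen: Set[Cell] = set()
    # Pass 1: mark solvent, i.e. every empty cell connected to the grid boundary.
    for x in range(nx):
        for y in range(ny):
            for z in range(nz):
                c = (x, y, z)
                on_edge = x in (0, nx - 1) or y in (0, ny - 1) or z in (0, nz - 1)
                if on_edge and c not in occupied and c not in seen:
                    flood(c, seen)
    # Pass 2: any empty cell not yet seen belongs to an enclosed void.
    voids: List[int] = []
    for x in range(nx):
        for y in range(ny):
            for z in range(nz):
                c = (x, y, z)
                if c not in occupied and c not in seen:
                    voids.append(flood(c, seen))
    return voids
```

Multiplying cell counts by the cell volume converts them to Å³; the grid spacing and probe radius control how small a cavity can be resolved.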
Subject(s)
Solvents/chemistry , Tumor Suppressor Protein p53/chemistry , Tumor Suppressor Protein p53/metabolism , Amino Acid Substitution , Databases, Protein , Models, Chemical , Models, Molecular , Mutation/genetics , Protein Structure, Tertiary , Thermodynamics , Tumor Suppressor Protein p53/genetics
ABSTRACT
TP53 encodes p53, a nuclear phosphoprotein with cancer-inhibiting properties. In response to DNA damage, p53 is activated and mediates a set of antiproliferative responses, including cell-cycle arrest and apoptosis. Mutations in the TP53 gene are associated with more than 50% of human cancers, and 90% of these affect p53-DNA interactions. By disrupting the structural integrity of the protein and/or its contacts with DNA, these mutations lead to a partial or complete loss of its transactivation function. We report here the results of a systematic automated analysis of the effects of p53 mutations on the structure of the core domain of the protein. We found that 304 of the 882 (34.4%) distinct mutations reported in the core domain can be explained in structural terms by their predicted effects on protein folding or on protein-DNA contacts. The proportion of "explained" mutations increased to 55.6% when substitutions of evolutionarily conserved amino acids were included. The automated method of structural analysis developed here may be applied to other frequently mutated genes, such as dystrophin, BRCA1 and G6PD.
Subject(s)
Genes, p53/genetics , Mutation/genetics , Tumor Suppressor Protein p53/chemistry , Tumor Suppressor Protein p53/genetics , Amino Acid Substitution/genetics , Automation , Conserved Sequence/genetics , DNA/metabolism , DNA Mutational Analysis , Databases, Nucleic Acid , Glycine/genetics , Glycine/metabolism , Humans , Hydrogen Bonding , Models, Molecular , Molecular Sequence Data , Proline/genetics , Proline/metabolism , Protein Binding , Protein Folding , Protein Structure, Tertiary , Structure-Activity Relationship , Tumor Suppressor Protein p53/metabolism , Zinc/metabolism
ABSTRACT
Some superfamilies contain large numbers of protein domains with very different functions. The ability to refine the functional classification of domains within these superfamilies is necessary for better understanding the evolution of functions and to guide function prediction of new relatives. To achieve this, a suitable starting point is the detailed analysis of functional divisions and mechanisms of functional divergence in a single superfamily. Here, we present such a detailed analysis in the superfamily of HUP domains. A biologically meaningful functional classification of HUP domains is obtained manually. Mechanisms of function diversification are investigated in detail using this classification. We observe that structural motifs play an important role in shaping broad functional divergence, whereas residue-level changes shape diversity at a more specific level. In parallel we examine the ability of an automated protocol to capture the biologically meaningful classification, with a view to automatically extending this classification in the future.
Subject(s)
Evolution, Molecular , Models, Molecular , Molecular Dynamics Simulation , Protein Conformation , Protein Structure, Tertiary/genetics , Protein Subunits/chemistry , Protein Subunits/classification , Cluster Analysis , Databases, Protein , Protein Subunits/genetics
ABSTRACT
The ability to assign function to proteins has become a major bottleneck for comprehensively understanding cellular mechanisms at the molecular level. Here we discuss the extent to which structural domain classifications can help in deciphering the complex relationship between the functions of proteins and their sequences and structures. Structural classifications are particularly helpful in understanding the mosaic manner in which new proteins and functions emerge through evolution. This is partly because they provide reliable and concrete domain definitions and enable the detection of very remote structural similarities and homologies. It is also because structural data can illuminate more clearly the mechanisms by which a broader functional repertoire can emerge during evolution.
Subject(s)
Proteins/chemistry , Proteins/metabolism , Animals , Humans , Models, Molecular , Protein Folding , Protein Structure, Tertiary , Proteins/classification
ABSTRACT
This paper explores the structural continuum in CATH and the extent to which superfamilies adopt distinct folds. Although most superfamilies are structurally conserved, some of the most highly populated superfamilies (4% of all superfamilies) show considerable structural divergence. While relatives share a similar fold in the evolutionarily conserved core, diverse elaborations to this core can result in significant differences in the global structures. Applying similar protocols to examine the extent to which structural overlaps occur between different fold groups, it appears that this effect is confined to just a few architectures and is largely due to small, recurring super-secondary motifs (e.g., αβ-motifs, α-hairpins). Although 24% of superfamilies overlap with superfamilies having different folds, only 14% of nonredundant structures in CATH are involved in overlaps. Nevertheless, the existence of these overlaps suggests that, in some regions of structure space, the fold universe should be seen as more continuous.
Subject(s)
Databases, Protein , Protein Structure, Tertiary , Proteins/chemistry , Computational Biology/methods , Models, Molecular , Protein Folding , Protein Structure, Secondary , Proteins/classification
ABSTRACT
MOTIVATION: Hydrogen bonds are among the most important inter-atomic interactions in biology. Previous experimental, theoretical and bioinformatics analyses have shown that the hydrogen-bonding potential of amino acids is generally satisfied and that buried, unsatisfied hydrogen-bond-capable residues are destabilizing. When studying mutant proteins, or introducing mutations to residues involved in hydrogen bonding, one needs to know whether a hydrogen bond can be maintained. Our aim, therefore, was to develop a rapid method to evaluate whether a sidechain can form a hydrogen bond. RESULTS: A novel knowledge-based approach was developed in which the conformations accessible to the residues involved are taken into account. Residues involved in hydrogen bonds in a set of high-resolution crystal structures were analyzed, and this analysis was then applied to a given protein. The program was applied to assess mutations in the tumour-suppressor protein p53. This raised the number of distinct mutations identified as disrupting sidechain-sidechain hydrogen bonding from 181 in our previous analysis to 202 in this analysis.
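The published method is knowledge-based, drawing on statistics from high-resolution structures. As a much simpler stand-in that only illustrates the idea of considering accessible conformations, the sketch below applies a purely geometric distance test: it asks whether any accessible conformation of a sidechain places its polar atom within a donor-acceptor distance window of a partner atom. The 2.5 to 3.5 Å window, the pre-computed conformer coordinates (e.g. from a rotamer library) and the name `can_hbond` are assumptions, and angular criteria are ignored.

```python
import math
from typing import Iterable, Tuple

Vec = Tuple[float, float, float]


def can_hbond(
    candidate_positions: Iterable[Vec],
    partner: Vec,
    min_dist: float = 2.5,  # assumed lower bound, to exclude steric clashes
    max_dist: float = 3.5,  # assumed heavy-atom donor-acceptor cutoff
) -> bool:
    """True if any accessible conformation puts the polar atom in the distance window.

    candidate_positions: coordinates (in Å) of the sidechain donor/acceptor atom,
    one per accessible conformation (assumed pre-computed).
    partner: coordinates (in Å) of the potential hydrogen-bonding partner atom.
    """
    return any(min_dist <= math.dist(p, partner) <= max_dist for p in candidate_positions)
```

Running the same test on the wild-type and mutant residue's conformers flags substitutions that can no longer reach the partner atom.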