Search | Virtual Health Library

1.

Genome3D: integrating a collaborative data pipeline to expand the depth and breadth of consensus protein structure annotation.

Sillitoe, Ian; Andreeva, Antonina; Blundell, Tom L; Buchan, Daniel W A; Finn, Robert D; Gough, Julian; Jones, David; Kelley, Lawrence A; Paysan-Lafosse, Typhaine; Lam, Su Datt; Murzin, Alexey G; Pandurangan, Arun Prasad; Salazar, Gustavo A; Skwark, Marcin J; Sternberg, Michael J E; Velankar, Sameer; Orengo, Christine.

Nucleic Acids Res ; 48(D1): D314-D319, 2020 01 08.

Article in English | MEDLINE | ID: mdl-31733063

ABSTRACT

Genome3D (https://www.genome3d.eu) is a freely available resource that provides consensus structural annotations for representative protein sequences taken from a selection of model organisms. Since the last NAR update in 2015, the method of data submission has been overhauled, with annotations now being 'pushed' to the database via an API. As a result, contributing groups are now able to manage their own structural annotations, making the resource more flexible and maintainable. The new submission protocol brings a number of additional benefits including: providing instant validation of data and avoiding the requirement to synchronise releases between resources. It also makes it possible to implement the submission of these structural annotations as an automated part of existing internal workflows. In turn, these improvements facilitate Genome3D being opened up to new prediction algorithms and groups. For the latest release of Genome3D (v2.1), the underlying dataset of sequences used as prediction targets has been updated using the latest reference proteomes available in UniProtKB. A number of new reference proteomes of particular interest to the wider scientific community have also been added: cow, pig, wheat and Mycobacterium tuberculosis. These additions, along with improvements to the underlying predictions from contributing resources, have ensured that the number of annotations in Genome3D has nearly doubled since the last NAR update article. The new API has also been used to facilitate the dissemination of Genome3D data into InterPro, thereby widening the visibility of both the annotation data and annotation algorithms.

Subject(s)

Proteins/chemistry , Databases, Protein , Proteins/classification , Proteins/genetics , User-Computer Interface

2.

ePlant: Visualizing and Exploring Multiple Levels of Data for Hypothesis Generation in Plant Biology.

Waese, Jamie; Fan, Jim; Pasha, Asher; Yu, Hans; Fucile, Geoffrey; Shi, Ruian; Cumming, Matthew; Kelley, Lawrence A; Sternberg, Michael J; Krishnakumar, Vivek; Ferlanti, Erik; Miller, Jason; Town, Chris; Stuerzlinger, Wolfgang; Provart, Nicholas J.

Plant Cell ; 29(8): 1806-1821, 2017 Aug.

Article in English | MEDLINE | ID: mdl-28808136

ABSTRACT

A big challenge in current systems biology research arises when different types of data must be accessed from separate sources and visualized using separate tools. The high cognitive load required to navigate such a workflow is detrimental to hypothesis generation. Accordingly, there is a need for a robust research platform that incorporates all data and provides integrated search, analysis, and visualization features through a single portal. Here, we present ePlant (http://bar.utoronto.ca/eplant), a visual analytic tool for exploring multiple levels of Arabidopsis thaliana data through a zoomable user interface. ePlant connects to several publicly available web services to download genome, proteome, interactome, transcriptome, and 3D molecular structure data for one or more genes or gene products of interest. Data are displayed with a set of visualization tools that are presented using a conceptual hierarchy from big to small, and many of the tools combine information from more than one data type. We describe the development of ePlant in this article and present several examples illustrating its integrative features for hypothesis generation. We also describe the process of deploying ePlant as an "app" on Araport. Building on readily available web services, the code for ePlant is freely available for any other biological species research.

Subject(s)

Botany , Software , Statistics as Topic , Systems Biology , Base Sequence , Chromosomes, Plant/genetics , Gene Expression Regulation, Plant , Subcellular Fractions/metabolism , User-Computer Interface

3.

Genome3D: exploiting structure to help users understand their sequences.

Lewis, Tony E; Sillitoe, Ian; Andreeva, Antonina; Blundell, Tom L; Buchan, Daniel W A; Chothia, Cyrus; Cozzetto, Domenico; Dana, José M; Filippis, Ioannis; Gough, Julian; Jones, David T; Kelley, Lawrence A; Kleywegt, Gerard J; Minneci, Federico; Mistry, Jaina; Murzin, Alexey G; Ochoa-Montaño, Bernardo; Oates, Matt E; Punta, Marco; Rackham, Owen J L; Stahlhacke, Jonathan; Sternberg, Michael J E; Velankar, Sameer; Orengo, Christine.

Nucleic Acids Res ; 43(Database issue): D382-6, 2015 Jan.

Article in English | MEDLINE | ID: mdl-25348407

ABSTRACT

Genome3D (http://www.genome3d.eu) is a collaborative resource that provides predicted domain annotations and structural models for key sequences. Since introducing Genome3D in a previous NAR paper, we have substantially extended and improved the resource. We have annotated representatives from Pfam families to improve coverage of diverse sequences and added a fast sequence search to the website to allow users to find Genome3D-annotated sequences similar to their own. We have improved and extended the Genome3D data, enlarging the source data set from three model organisms to 10, and adding VIVACE, a resource new to Genome3D. We have analysed and updated Genome3D's SCOP/CATH mapping. Finally, we have improved the superposition tools, which now give users a more powerful interface for investigating similarities and differences between structural models.

Subject(s)

Databases, Protein , Molecular Sequence Annotation , Protein Structure, Tertiary , Algorithms , Genomics , Internet , Models, Molecular , Protein Structure, Tertiary/genetics , Sequence Analysis, Protein

4.

Three-dimensional structure of the human breast cancer resistance protein (BCRP/ABCG2) in an inward-facing conformation.

Rosenberg, Mark F; Bikadi, Zsolt; Hazai, Eszter; Starborg, Tobias; Kelley, Lawrence; Chayen, Naomi E; Ford, Robert C; Mao, Qingcheng.

Acta Crystallogr D Biol Crystallogr ; 71(Pt 8): 1725-35, 2015 Aug.

Article in English | MEDLINE | ID: mdl-26249353

ABSTRACT

ABCG2 is an efflux drug transporter that plays an important role in drug resistance and drug disposition. In this study, the first three-dimensional structure of human full-length ABCG2 analysed by electron crystallography from two-dimensional crystals in the absence of nucleotides and transported substrates is reported at 2 nm resolution. In this state, ABCG2 forms a symmetric homodimer with a noncrystallographic twofold axis perpendicular to the two-dimensional crystal plane, as confirmed by subtomogram averaging. This configuration suggests an inward-facing configuration similar to murine ABCB1, with the nucleotide-binding domains (NBDs) widely separated from each other. In the three-dimensional map, densities representing the long cytoplasmic extensions from the transmembrane domains that connect the NBDs are clearly visible. The structural data have allowed the atomic model of ABCG2 to be refined, in which the two arms of the V-shaped ABCG2 homodimeric complex are in a more closed and narrower conformation. The structural data and the refined model of ABCG2 are compatible with the biochemical analysis of the previously published mutagenesis studies, providing novel insight into the structure and function of the transporter.

Subject(s)

ATP-Binding Cassette Transporters/chemistry , Cryoelectron Microscopy , Neoplasm Proteins/chemistry , Protein Structure, Quaternary , ATP Binding Cassette Transporter, Subfamily G, Member 2 , ATP-Binding Cassette Transporters/metabolism , ATP-Binding Cassette Transporters/ultrastructure , Breast/metabolism , Breast Neoplasms/metabolism , Cryoelectron Microscopy/methods , Crystallization/methods , Female , Humans , Models, Molecular , Neoplasm Proteins/metabolism , Neoplasm Proteins/ultrastructure , Protein Multimerization

5.

Genome3D: a UK collaborative project to annotate genomic sequences with predicted 3D structures based on SCOP and CATH domains.

Lewis, Tony E; Sillitoe, Ian; Andreeva, Antonina; Blundell, Tom L; Buchan, Daniel W A; Chothia, Cyrus; Cuff, Alison; Dana, Jose M; Filippis, Ioannis; Gough, Julian; Hunter, Sarah; Jones, David T; Kelley, Lawrence A; Kleywegt, Gerard J; Minneci, Federico; Mitchell, Alex; Murzin, Alexey G; Ochoa-Montaño, Bernardo; Rackham, Owen J L; Smith, James; Sternberg, Michael J E; Velankar, Sameer; Yeats, Corin; Orengo, Christine.

Nucleic Acids Res ; 41(Database issue): D499-507, 2013 Jan.

Article in English | MEDLINE | ID: mdl-23203986

ABSTRACT

Genome3D, available at http://www.genome3d.eu, is a new collaborative project that integrates UK-based structural resources to provide a unique perspective on sequence-structure-function relationships. Leading structure prediction resources (DomSerf, FUGUE, Gene3D, pDomTHREADER, Phyre and SUPERFAMILY) provide annotations for UniProt sequences to indicate the locations of structural domains (structural annotations) and their 3D structures (structural models). Structural annotations and 3D model predictions are currently available for three model genomes (Homo sapiens, E. coli and baker's yeast), and the project will extend to other genomes in the near future. As these resources exploit different strategies for predicting structures, the main aim of Genome3D is to enable comparisons between all the resources so that biologists can see where predictions agree and are therefore more trusted. Furthermore, as these methods differ in whether they build their predictions using CATH or SCOP, Genome3D also contains the first official mapping between these two databases. This has identified pairs of similar superfamilies from the two resources at various degrees of consensus (532 bronze pairs, 527 silver pairs and 370 gold pairs).

Subject(s)

Databases, Protein , Protein Structure, Tertiary , Genomics , Humans , Internet , Molecular Sequence Annotation , Proteins/chemistry , Proteins/classification , Proteins/genetics , Software

6.

Functional significance of mutations in the Snf2 domain of ATRX.

Mitson, Matthew; Kelley, Lawrence A; Sternberg, Michael J E; Higgs, Douglas R; Gibbons, Richard J.

Hum Mol Genet ; 20(13): 2603-10, 2011 Jul 01.

Article in English | MEDLINE | ID: mdl-21505078

ABSTRACT

ATRX is a member of the Snf2 family of chromatin-remodelling proteins and is mutated in an X-linked mental retardation syndrome associated with alpha-thalassaemia (ATR-X syndrome). We have carried out an analysis of 21 disease-causing mutations within the Snf2 domain of ATRX by quantifying the expression of the ATRX protein and placing all missense mutations in their structural context by homology modelling. While demonstrating the importance of protein dosage to the development of ATR-X syndrome, we also identified three mutations which primarily affect function rather than protein structure. We show that all three of these mutant proteins are defective in translocating along DNA while one mutant, uniquely for a human disease-causing mutation, partially uncouples adenosine triphosphate (ATP) hydrolysis from DNA binding. Our results highlight important mechanistic aspects in the development of ATR-X syndrome and identify crucial functional residues within the Snf2 domain of ATRX. These findings are important for furthering our understanding of how ATP hydrolysis is harnessed as useful work in chromatin remodelling proteins and the wider family of nucleic acid translocating motors.

Subject(s)

DNA Helicases/genetics , DNA Helicases/metabolism , Mutation/genetics , Nuclear Proteins/genetics , Nuclear Proteins/metabolism , Ubiquitin-Protein Ligases/genetics , Amino Acid Sequence , Animals , Cell Line , DNA Helicases/chemistry , Enzyme Activation/physiology , Humans , Insecta , Mental Retardation, X-Linked/enzymology , Mental Retardation, X-Linked/genetics , Models, Molecular , Molecular Sequence Data , Nuclear Proteins/chemistry , Protein Conformation , Protein Stability , Sequence Alignment , Translocation, Genetic/genetics , Ubiquitin-Protein Ligases/chemistry , X-linked Nuclear Protein , alpha-Thalassemia/enzymology , alpha-Thalassemia/genetics

7.

High-quality protein backbone reconstruction from alpha carbons using Gaussian mixture models.

Moore, Benjamin L; Kelley, Lawrence A; Barber, James; Murray, James W; MacDonald, James T.

J Comput Chem ; 34(22): 1881-9, 2013 Aug 15.

Article in English | MEDLINE | ID: mdl-23703289

ABSTRACT

Coarse-grained protein structure models offer increased efficiency in structural modeling, but these must be coupled with fast and accurate methods to revert to a full-atom structure. Here, we present a novel algorithm to reconstruct mainchain models from Cα traces. This has been parameterized by fitting Gaussian mixture models (GMMs) to short backbone fragments centered on idealized peptide bonds. The method we have developed is statistically significantly more accurate than several competing methods, both in terms of RMSD values and dihedral angle differences. The method produced Ramachandran dihedral angle distributions that are closer to those observed in real proteins and better Phaser molecular replacement log-likelihood gains. Amino acid residue sidechain reconstruction accuracy using SCWRL4 was found to be statistically significantly correlated to backbone reconstruction accuracy. Finally, the PD2 method was found to produce significantly lower energy full-atom models using Rosetta, which has implications for multiscale protein modeling using coarse-grained models. A webserver and C++ source code are freely available for noncommercial use from: http://www.sbg.bio.ic.ac.uk/phyre2/PD2_ca2main/.
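The abstract above compares methods by RMSD and by dihedral angle differences. As a rough illustration only, the sketch below shows how these two quantities are commonly computed. It assumes the two structures are already superposed and that angles are given in degrees; the function names are hypothetical and are not taken from the PD2 source code.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z) points.

    Assumes the two structures are already superposed; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


def dihedral_difference(angle_a, angle_b):
    """Smallest absolute difference between two angles in degrees (0 to 180).

    Wraps around the +/-180 degree boundary, so 170 and -170 differ by 20.
    """
    return abs((angle_a - angle_b + 180.0) % 360.0 - 180.0)
```

The wrap-around in `dihedral_difference` matters because dihedral angles are periodic: a naive subtraction would report 340 degrees for two angles that are only 20 degrees apart.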

Subject(s)

Algorithms , Carbon/chemistry , Molecular Dynamics Simulation , Proteins/chemistry , Software , Protein Conformation

8.

3DLigandSite: predicting ligand-binding sites using similar structures.

Wass, Mark N; Kelley, Lawrence A; Sternberg, Michael J E.

Nucleic Acids Res ; 38(Web Server issue): W469-73, 2010 Jul.

Article in English | MEDLINE | ID: mdl-20513649

ABSTRACT

3DLigandSite is a web server for the prediction of ligand-binding sites. It is based upon successful manual methods used in the eighth round of the Critical Assessment of techniques for protein Structure Prediction (CASP8). 3DLigandSite utilizes protein-structure prediction to provide structural models for proteins that have not been solved. Ligands bound to structures similar to the query are superimposed onto the model and used to predict the binding site. In benchmarking against the CASP8 targets, 3DLigandSite obtains a Matthews correlation coefficient (MCC) of 0.64, and coverage and accuracy of 71 and 60%, respectively, similar results to our manual performance in CASP8. In further benchmarking using a large set of protein structures, 3DLigandSite obtains an MCC of 0.68. The web server enables users to submit either a query sequence or structure. Predictions are visually displayed via an interactive Jmol applet. 3DLigandSite is available for use at http://www.sbg.bio.ic.ac.uk/3dligandsite.
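For readers unfamiliar with the MCC figure quoted above, the following minimal sketch shows the standard definition of the Matthews correlation coefficient for a binary prediction, such as labelling each residue as binding or non-binding. The counts in the usage note are invented for illustration and are not data from the 3DLigandSite benchmark.

```python
import math


def matthews_corrcoef(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives and
    false negatives. Returns 0.0 when any marginal total is zero, a common
    convention for the otherwise undefined case.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return numerator / denominator
```

With hypothetical counts of 6 true positives, 4 false positives, 86 true negatives and 4 false negatives, `matthews_corrcoef(6, 4, 86, 4)` evaluates to 500/900, or about 0.56. MCC ranges from -1 to 1 and, unlike raw accuracy, is not inflated when most residues are non-binding.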

Subject(s)

Software , Structural Homology, Protein , Algorithms , Binding Sites , Internet , Ligands , Models, Molecular , Reproducibility of Results , Sequence Analysis, Protein , User-Computer Interface

9.

Sequencing delivers diminishing returns for homology detection: implications for mapping the protein universe.

Chubb, Daniel; Jefferys, Benjamin R; Sternberg, Michael J E; Kelley, Lawrence A.

Bioinformatics ; 26(21): 2664-71, 2010 Nov 01.

Article in English | MEDLINE | ID: mdl-20843957

ABSTRACT

MOTIVATION: Databases of sequenced genomes are widely used to characterize the structure, function and evolutionary relationships of proteins. The ability to discern such relationships is widely expected to grow as sequencing projects provide novel information, bridging gaps in our map of the protein universe. RESULTS: We have plotted our progress in protein sequencing over the last two decades and found that the rate of novel sequence discovery is in a sustained period of decline. Consequently, PSI-BLAST, the most widely used method to detect remote evolutionary relationships, which relies upon the accumulation of novel sequence data, is now showing a plateau in performance. We interpret this trend as signalling our approach to a representative map of the protein universe and discuss its implications.

Subject(s)

Proteins/chemistry , Sequence Analysis, Protein/methods , Algorithms , Databases, Protein , Genome , Sequence Alignment , Sequence Homology, Amino Acid

10.

The Genome3D Consortium for Structural Annotations of Selected Model Organisms.

Waman, Vaishali P; Blundell, Tom L; Buchan, Daniel W A; Gough, Julian; Jones, David; Kelley, Lawrence; Murzin, Alexey; Pandurangan, Arun Prasad; Sillitoe, Ian; Sternberg, Michael; Torres, Pedro; Orengo, Christine.

Methods Mol Biol ; 2165: 27-67, 2020.

Article in English | MEDLINE | ID: mdl-32621218

ABSTRACT

The Genome3D consortium is a collaborative project involving protein structure prediction and annotation resources developed by six world-leading structural bioinformatics groups based in the United Kingdom (namely Blundell, Murzin, Gough, Sternberg, Orengo, and Jones). The main objective of Genome3D is to serve as a common portal that provides both predicted models and annotations of proteins in model organisms, using several resources developed by these labs such as CATH-Gene3D, DOMSERF, pDomTHREADER, PHYRE, SUPERFAMILY, FUGUE/TOCATTA, and VIVACE. These resources primarily use SCOP- and/or CATH-based protein domain assignments. Another objective of Genome3D is to compare structural classifications of protein domains in the CATH and SCOP databases and to provide a consensus mapping of CATH and SCOP protein superfamilies. CATH/SCOP mapping analyses led to the identification of a total of 1429 consensus superfamilies. Currently, Genome3D provides structural annotations for ten model organisms, including Homo sapiens, Arabidopsis thaliana, Mus musculus, Escherichia coli, Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, Plasmodium falciparum, Staphylococcus aureus, and Schizosaccharomyces pombe. Thus, Genome3D serves as a common gateway to each structure prediction/annotation resource and allows users to perform comparative assessment of the predictions. It thus helps researchers broaden their perspective on structure/function predictions of their query protein of interest in selected model organisms.

Subject(s)

Genomics/organization & administration , Knowledge Bases , Molecular Sequence Annotation/methods , Proteome/chemistry , Animals , Arabidopsis , Genome , Genomics/methods , Humans , Information Dissemination , Sequence Alignment/methods , United Kingdom , Yeasts

11.

PhyreRisk: A Dynamic Web Application to Bridge Genomics, Proteomics and 3D Structural Data to Guide Interpretation of Human Genetic Variants.

Ofoegbu, Tochukwu C; David, Alessia; Kelley, Lawrence A; Mezulis, Stefans; Islam, Suhail A; Mersmann, Sophia F; Strömich, Léonie; Vakser, Ilya A; Houlston, Richard S; Sternberg, Michael J E.

J Mol Biol ; 431(13): 2460-2466, 2019 06 14.

Article in English | MEDLINE | ID: mdl-31075275

ABSTRACT

PhyreRisk is an open-access, publicly accessible web application for interactively bridging genomic, proteomic and structural data, facilitating the mapping of human variants onto protein structures. A major advance over other tools for sequence-structure variant mapping is that PhyreRisk provides information on 20,214 human canonical proteins and an additional 22,271 alternative protein sequences (isoforms). Specifically, PhyreRisk provides structural coverage (partial or complete) for 70% (14,035 of 20,214 canonical proteins) of the human proteome, by storing 18,874 experimental structures and 84,818 pre-built models of canonical proteins and their isoforms generated using our in-house Phyre2. PhyreRisk reports 55,732 experimentally, multi-validated protein interactions from IntAct and 24,260 experimental structures of protein complexes. Another major feature of PhyreRisk is that, rather than presenting a limited set of precomputed variant-structure mappings of known genetic variants, it allows the user to explore novel variants using, as input, genomic coordinate formats (Ensembl, VCF, reference SNP ID and HGVS notations) and Human Build GRCh37 and GRCh38. PhyreRisk also supports mapping variants using amino acid coordinates and searching for genes or proteins of interest. PhyreRisk is designed to empower researchers to translate genetic data into protein structural information, thereby providing a more comprehensive appreciation of the functional impact of variants. PhyreRisk is freely available at http://phyrerisk.bc.ic.ac.uk.

Subject(s)

Computational Biology/methods , Genetic Variation , Proteins/chemistry , Genomics , Humans , Protein Conformation , Proteins/genetics , Proteins/metabolism , Proteomics , Software

12.

Exploring the extremes of sequence/structure space with ensemble fold recognition in the program Phyre.

Bennett-Lovsey, Riccardo M; Herbert, Alex D; Sternberg, Michael J E; Kelley, Lawrence A.

Proteins ; 70(3): 611-25, 2008 Feb 15.

Article in English | MEDLINE | ID: mdl-17876813

ABSTRACT

Structural and functional annotation of the large and growing database of genomic sequences is a major problem in modern biology. Protein structure prediction by detecting remote homology to known structures is a well-established and successful annotation technique. However, the broad spectrum of evolutionary change that accompanies the divergence of close homologues to become remote homologues cannot easily be captured with a single algorithm. Recent advances to tackle this problem have involved the use of multiple predictive algorithms available on the Internet. Here we demonstrate how such ensembles of predictors can be designed in-house under controlled conditions and permit significant improvements in recognition by using a concept taken from protein loop energetics and applying it to the general problem of 3D clustering. We have developed a stringent test that simulates the situation where a protein sequence of interest is submitted to multiple different algorithms and not one of these algorithms can make a confident (95%) correct assignment. A method of meta-server prediction (Phyre) that exploits the benefits of a controlled environment for the component methods was implemented. At 95% precision or higher, Phyre identified 64.0% of all correct homologous query-template relationships, and 84.0% of the individual test query proteins could be accurately annotated. In comparison to the improvement that the single best fold recognition algorithm (according to training) has over PSI-Blast, this represents a 29.6% increase in the number of correct homologous query-template relationships, and a 46.2% increase in the number of accurately annotated queries. It has been well recognised in fold prediction, other bioinformatics applications, and in many other areas, that ensemble predictions generally are superior in accuracy to any of the component individual methods. 
However, there is a paucity of information as to why the ensemble methods are superior, and indeed this has never been systematically addressed in fold recognition. Here we show that the source of ensemble power stems from noise reduction in filtering out false positive matches. The results indicate greater coverage of sequence space and improved model quality, which can consequently lead to a reduction in the experimental workload of structural genomics initiatives.

Subject(s)

Algorithms , Protein Conformation , Sequence Analysis, Protein , Software , Animals , Databases, Protein , Humans , Protein Folding , Proteins/chemistry , Proteins/metabolism , Sequence Alignment

13.

Identification of the Autochaperone Domain in the Type Va Secretion System (T5aSS): Prevalent Feature of Autotransporters with a β-Helical Passenger.

Rojas-Lopez, Maricarmen; Zorgani, Mohamed A; Kelley, Lawrence A; Bailly, Xavier; Kajava, Andrey V; Henderson, Ian R; Polticelli, Fabio; Pizza, Mariagrazia; Rosini, Roberto; Desvaux, Mickaël.

Front Microbiol ; 8: 2607, 2017.

Article in English | MEDLINE | ID: mdl-29375499

ABSTRACT

Autotransporters (ATs) belong to a family of modular proteins secreted by the Type V, subtype a, secretion system (T5aSS) and are considered an important source of virulence factors in lipopolysaccharidic diderm bacteria (archetypical Gram-negative bacteria). While exported by the Sec pathway, the ATs are further secreted across the outer membrane via their own C-terminal translocator forming a β-barrel, through which the rest of the protein, namely the passenger, can pass. In several ATs, an autochaperone domain (AC) present at the C-terminal region of the passenger and upstream of the translocator was demonstrated to be strictly required for proper secretion and folding. However, considering that it was functionally characterised and identified only in a handful of ATs, doubts have recently been raised about the commonality and conservation of this structural element in the T5aSS. To circumvent the issue of sequence divergence and taking advantage of the resolved three-dimensional structure of some ACs, identification of this domain was performed following structural alignment among all AT passengers experimentally resolved by crystallography before searching in a dataset of 1523 ATs. While demonstrating that the AC is indeed a conserved structure found in numerous ATs, phylogenetic analysis further revealed a distribution into deeply rooted branches, from which emerge 20 main clusters. Sequence analysis revealed that an AC could be identified in the large majority of SAATs (self-associating ATs) but not in any LEATs (lipase/esterase ATs) nor in some PATs (protease autotransporters) and PHATs (phosphatase/hydrolase ATs). Structural analysis indicated that an AC was present in passengers exhibiting a single-stranded right-handed parallel β-helix, whatever the type of β-solenoid, but not in those with an α-helical globular fold.
From this investigation, the AC of type 1 appears as a prevalent and conserved structural element exclusively associated with β-helical AT passengers and should promote further studies about protein secretion and folding via the T5aSS, especially toward α-helical AT passengers.

14.

PhyreStorm: A Web Server for Fast Structural Searches Against the PDB.

Mezulis, Stefans; Sternberg, Michael J E; Kelley, Lawrence A.

J Mol Biol ; 428(4): 702-708, 2016 Feb 22.

Article in English | MEDLINE | ID: mdl-26517951

ABSTRACT

The identification of structurally similar proteins can provide a range of biological insights, and accordingly, the alignment of a query protein to a database of experimentally determined protein structures is a technique commonly used in the fields of structural and evolutionary biology. The PhyreStorm Web server has been designed to provide comprehensive, up-to-date and rapid structural comparisons against the Protein Data Bank (PDB) combined with a rich and intuitive user interface. It is intended that this facility will give biologists who are not expert in bioinformatics access to a powerful tool for exploring protein structure relationships beyond what can be achieved by sequence analysis alone. By partitioning the PDB into similar structures, PhyreStorm is able to quickly discard the majority of structures that cannot possibly align well to a query protein, reducing the number of alignments required by an order of magnitude. PhyreStorm is capable of finding 93±2% of all highly similar (TM-score > 0.7) structures in the PDB for each query structure, usually in less than 60 s. PhyreStorm is available at http://www.sbg.bio.ic.ac.uk/phyrestorm/.

Subject(s)

Computational Biology/methods , Databases, Protein , Protein Conformation , Proteins/chemistry , Internet

15.

The evolution of biology. A shift towards the engineering of prediction-generating tools and away from traditional research practice.

Kelley, Lawrence; Scott, Michael.

EMBO Rep ; 9(12): 1163-7, 2008 Dec.

Article in English | MEDLINE | ID: mdl-19008917

Subject(s)

Biology/methods , Biology/trends , Computational Biology/trends , Research Design , Research/trends , Animals , Proteins/chemistry , Proteins/metabolism , Proteomics

16.

Partial protein domains: evolutionary insights and bioinformatics challenges.

Kelley, Lawrence A; Sternberg, Michael J E.

Genome Biol ; 16: 100, 2015 May 19.

Article in English | MEDLINE | ID: mdl-25986583

ABSTRACT

Protein domains are generally thought to correspond to units of evolution. New research raises questions about how such domains are defined with bioinformatics tools and sheds light on how evolution has enabled partial domains to be viable.

Subject(s)

Bacterial Proteins/genetics , Gene Deletion , Genes, Bacterial , Luciferases/genetics , Molecular Sequence Annotation , Oxidoreductases/genetics , Protein Structure, Tertiary , Proteins/chemistry , Sequence Alignment , Animals , Humans

17.

The Phyre2 web portal for protein modeling, prediction and analysis.

Kelley, Lawrence A; Mezulis, Stefans; Yates, Christopher M; Wass, Mark N; Sternberg, Michael J E.

Nat Protoc ; 10(6): 845-58, 2015 Jun.

Article in English | MEDLINE | ID: mdl-25950237

ABSTRACT

Phyre2 is a suite of tools available on the web to predict and analyze protein structure, function and mutations. The focus of Phyre2 is to provide biologists with a simple and intuitive interface to state-of-the-art protein bioinformatics tools. Phyre2 replaces Phyre, the original version of the server for which we previously published a paper in Nature Protocols. In this updated protocol, we describe Phyre2, which uses advanced remote homology detection methods to build 3D models, predict ligand binding sites and analyze the effect of amino acid variants (e.g., nonsynonymous SNPs (nsSNPs)) for a user's protein sequence. Users are guided through results by a simple interface at a level of detail they determine. This protocol will guide users from submitting a protein sequence to interpreting the secondary and tertiary structure of their models, their domain composition and model quality. A range of additional available tools is described to find a protein structure in a genome, to submit large numbers of sequences at once and to automatically run weekly searches for proteins that are difficult to model. The server is available at http://www.sbg.bio.ic.ac.uk/phyre2. A typical structure prediction will be returned between 30 min and 2 h after submission.

Subject(s)

Models, Molecular , Protein Conformation , Software , Computational Biology , Internet

18.

SuSPect: enhanced prediction of single amino acid variant (SAV) phenotype using network features.

Yates, Christopher M; Filippis, Ioannis; Kelley, Lawrence A; Sternberg, Michael J E.

J Mol Biol ; 426(14): 2692-701, 2014 Jul 15.

Article in English | MEDLINE | ID: mdl-24810707

ABSTRACT

Whole-genome and exome sequencing studies reveal many genetic variants between individuals, some of which are linked to disease. Many of these variants lead to single amino acid variants (SAVs), and accurate prediction of their phenotypic impact is important. Incorporating sequence conservation and network-level features, we have developed a method, SuSPect (Disease-Susceptibility-based SAV Phenotype Prediction), for predicting how likely SAVs are to be associated with disease. SuSPect performs significantly better than other available batch methods on the VariBench benchmarking dataset, with a balanced accuracy of 82%. SuSPect is available at www.sbg.bio.ic.ac.uk/suspect. The Web site has been implemented in Perl and SQLite and is compatible with modern browsers. An SQLite database of possible missense variants in the human proteome is available to download at www.sbg.bio.ic.ac.uk/suspect/download.html.
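The balanced accuracy quoted above is the mean of sensitivity and specificity, which makes it less sensitive to class imbalance than plain accuracy. The sketch below illustrates that standard definition only; the function name and the example counts are hypothetical and are not drawn from the VariBench evaluation.

```python
def balanced_accuracy(tp, fp, tn, fn):
    """Mean of sensitivity (true positive rate) and specificity (true negative rate)."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2
```

For instance, a predictor that finds 90 of 100 disease-associated variants and correctly rejects 70 of 100 neutral variants has a sensitivity of 0.9 and a specificity of 0.7, giving a balanced accuracy of 0.8.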

Subject(s)

Amino Acid Substitution , Disease Susceptibility , Proteins/chemistry , Software , Child , Child Abuse , Computational Biology/methods , Humans , Models, Molecular , Mutation, Missense , Phenotype , Protein Conformation , Proteins/genetics , Proteins/metabolism

19.

Validating a Coarse-Grained Potential Energy Function through Protein Loop Modelling.

Macdonald, James T; Kelley, Lawrence A; Freemont, Paul S.

PLoS One ; 8(6): e65770, 2013.

Article in English | MEDLINE | ID: mdl-23824634

ABSTRACT

Coarse-grained (CG) methods for sampling protein conformational space have the potential to increase computational efficiency by reducing the degrees of freedom. The gain in computational efficiency of CG methods often comes at the expense of non-protein-like local conformational features. This could cause problems when transitioning to full atom models in a hierarchical framework. Here, a CG potential energy function was validated by applying it to the problem of loop prediction. A novel method to sample the conformational space of backbone atoms was benchmarked using a standard test set consisting of 351 distinct loops. This method used a sequence-independent CG potential energy function representing the protein using α-carbon positions only and sampling conformations with a Monte Carlo simulated annealing based protocol. Backbone atoms were added using a method previously described and then gradient minimised in the Rosetta force field. Despite the CG potential energy function being sequence-independent, the method performed similarly to methods that explicitly use either fragments of known protein backbones with similar sequences or residue-specific φ/ψ-maps to restrict the search space. The method was also able to predict with sub-Angstrom accuracy two out of seven loops from recently solved crystal structures of proteins with low sequence and structure similarity to previously deposited structures in the PDB. The ability to sample realistic loop conformations directly from a potential energy function enables the incorporation of additional geometric restraints and the use of more advanced sampling methods in a way that is not possible to do easily with fragment replacement methods, and also enables multi-scale simulations for protein design and protein structure prediction. These restraints could be derived from experimental data or could be design restraints in the case of computational protein design. C++ source code is available for download from http://www.sbg.bio.ic.ac.uk/phyre2/PD2/.

Subject(s)

Models, Chemical , Proteins/chemistry , Crystallography, X-Ray , Monte Carlo Method , Protein Conformation

20.

Functional assignment of Mycobacterium tuberculosis proteome revealed by genome-scale fold-recognition.

Mao, Chunhong; Shukla, Maulik; Larrouy-Maumus, Gérald; Dix, Flora L; Kelley, Lawrence A; Sternberg, Michael J; Sobral, Bruno W; de Carvalho, Luiz Pedro S.

Tuberculosis (Edinb) ; 93(1): 40-6, 2013 Jan.

Article in English | MEDLINE | ID: mdl-23287603

ABSTRACT

Hundreds of putative enzymes from Mycobacterium tuberculosis as well as other mycobacteria remain categorized as "conserved hypothetical proteins" or "hypothetical proteins", offering little or no information on their functional role in pathogenic and non-pathogenic processes. In this study we have predicted the fold and 3-D structure of more than 99% of all proteins encoded in the genome of M. tuberculosis H37Rv. Fold recognition, database search and 3-D modelling were performed using Protein Homology/analogy Recognition Engine V 2.0 (Phyre2). These results are used to tentatively assign potential function for unannotated enzymes and proteins. In summary, fold recognition and structural homology might be used as a complementary tool in genome annotation efforts; furthermore, they can deliver primary sequence-independent information regarding structure, ligands and even substrate specificity for enzymes that display low primary sequence identity with potential homologues in other species.

Subject(s)

Bacterial Proteins/physiology , Mycobacterium tuberculosis/genetics , Bacterial Proteins/genetics , Computational Biology/methods , Genome, Bacterial , Humans , Models, Molecular , Mycobacterium tuberculosis/enzymology , Protein Folding , Proteome/physiology
