1.
PLoS Comput Biol ; 15(9): e1007276, 2019 09.
Article in English | MEDLINE | ID: mdl-31479437

ABSTRACT

In-silico identification of potential target genes for disease is an essential aspect of drug target discovery. Recent studies suggest that successful targets can be found by leveraging genetic, genomic and protein interaction information. Here, we systematically tested the ability of 12 varied algorithms, based on network propagation, to identify genes that have been targeted by any drug, on gene-disease data from 22 common non-cancerous diseases in OpenTargets. We considered two biological networks, six performance metrics and compared two types of input gene-disease association scores. The impact of the design factors in performance was quantified through additive explanatory models. Standard cross-validation led to over-optimistic performance estimates due to the presence of protein complexes. In order to obtain realistic estimates, we introduced two novel protein complex-aware cross-validation schemes. When seeding biological networks with known drug targets, machine learning and diffusion-based methods found around 2-4 true targets within the top 20 suggestions. Seeding the networks with genes associated to disease by genetics decreased performance below 1 true hit on average. The use of a larger network, although noisier, improved overall performance. We conclude that diffusion-based prioritisers and machine learning applied to diffusion-based features are suited for drug discovery in practice and improve over simpler neighbour-voting methods. We also demonstrate the large impact of choosing an adequate validation strategy and the definition of seed disease genes.


Subject(s)
Computational Biology/methods , Computer Simulation , Drug Discovery/methods , Algorithms , Benchmarking , Databases, Genetic , Disease/genetics , Humans , Machine Learning
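
Illustrative note on record 1: the abstract reports that standard cross-validation is over-optimistic when members of one protein complex land in both training and test folds, but it does not describe the two schemes the authors introduced. The sketch below is therefore only a generic grouped split, not the published method. The function name `complex_aware_folds` and the `complex_of` mapping are hypothetical, and the greedy balancing rule is an assumption chosen for brevity.

```python
from collections import defaultdict


def complex_aware_folds(genes, complex_of, k):
    """Split genes into k folds so that no protein complex spans two folds.

    genes      : iterable of gene identifiers
    complex_of : dict mapping a gene to a complex id (hypothetical input;
                 genes missing from the dict are treated as singletons)
    k          : number of folds
    """
    # Group genes by complex; a gene without a complex forms its own group.
    groups = defaultdict(list)
    for gene in genes:
        groups[complex_of.get(gene, ("singleton", gene))].append(gene)

    # Place the largest groups first, each into the currently smallest fold,
    # which keeps fold sizes roughly balanced.
    folds = [[] for _ in range(k)]
    for members in sorted(groups.values(), key=len, reverse=True):
        smallest = min(folds, key=len)
        smallest.extend(members)
    return folds


if __name__ == "__main__":
    genes = ["A", "B", "C", "D", "E", "F"]
    complex_of = {"A": "c1", "B": "c1", "C": "c1", "D": "c2", "E": "c2"}
    for i, fold in enumerate(complex_aware_folds(genes, complex_of, k=2)):
        print(i, fold)
```

Keeping a whole complex in one fold means a held-out target can never be recovered simply because a complex partner was used as a seed, which is the leakage the abstract points to.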
2.
PLoS One ; 9(6): e99030, 2014.
Article in English | MEDLINE | ID: mdl-24918583

ABSTRACT

Prioritising candidate genes for further experimental characterisation is an essential, yet challenging task in biomedical research. One way of achieving this goal is to identify specific biological themes that are enriched within the gene set of interest to obtain insights into the biological phenomena under study. Biological pathway data have been particularly useful in identifying functional associations of genes and/or gene sets. However, biological pathway information as compiled in varied repositories often differs in scope and content, preventing a more effective and comprehensive characterisation of gene sets. Here we describe a new approach to constructing biologically coherent gene sets from pathway data in major public repositories and employing them for functional analysis of large gene sets. We first revealed significant overlaps in gene content between different pathways and then defined a clustering method based on the shared gene content and the similarity of gene overlap patterns. We established the biological relevance of the constructed pathway clusters using independent quantitative measures and we finally demonstrated the effectiveness of the constructed pathway clusters in comparative functional enrichment analysis of gene sets associated with diverse human diseases gathered from the literature. The pathway clusters and gene mappings have been integrated into the TargetMine data warehouse and are likely to provide a concise, manageable and biologically relevant means of functional analysis of gene sets and to facilitate candidate gene prioritisation.


Subject(s)
Biomedical Research , Cluster Analysis
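
Illustrative note on record 2: the abstract describes clustering pathways by shared gene content but gives neither the similarity measure nor the clustering rule. As a minimal sketch under stated assumptions, the code below uses the Jaccard index and merges any two pathways whose overlap reaches a threshold (single linkage). The names `jaccard` and `cluster_pathways`, the threshold of 0.5, and the toy pathways are all hypothetical and are not taken from the paper or from TargetMine.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two gene collections: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_pathways(pathways, threshold=0.5):
    """Group pathways whose gene overlap is at least `threshold`.

    pathways : dict mapping a pathway name to a set of gene identifiers
    Returns a sorted list of clusters, each a sorted list of pathway names.
    """
    names = list(pathways)
    parent = {n: n for n in names}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    # Merge every pair of pathways that overlaps strongly enough.
    for x, y in combinations(names, 2):
        if jaccard(pathways[x], pathways[y]) >= threshold:
            parent[find(x)] = find(y)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), set()).add(n)
    return sorted(sorted(c) for c in clusters.values())


if __name__ == "__main__":
    toy = {
        "p1": {"A", "B", "C"},
        "p2": {"A", "B", "C", "D"},
        "p3": {"X", "Y"},
    }
    print(cluster_pathways(toy))
```

Single linkage can chain loosely related pathways together; a stricter rule or a higher threshold would give tighter clusters at the cost of more singletons.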
3.
Methods Mol Biol ; 1140: 35-51, 2014.
Article in English | MEDLINE | ID: mdl-24590707

ABSTRACT

This chapter describes the protocols used to identify, filter, and annotate potential protein targets from an organism associated with infectious diseases. Protocols often combine computational approaches for mining information in public databases or for checking whether the protein has already been targeted for structure determination, with manual strategies that examine the literature for information on the biological role of the protein or the experimental strategies that explore the effects of knocking out the protein. Publicly available computational tools have been cited as much as possible. Where these do not exist, the concepts underlying in-house tools developed for the Center for Structural Genomics of Infectious Diseases have been described.


Subject(s)
Communicable Diseases/genetics , Molecular Biology/methods , Protein Conformation , Communicable Diseases/pathology , Computational Biology , Genomics/methods , Humans
4.
Nucleic Acids Res ; 42(Database issue): D240-5, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24270792

ABSTRACT

Gene3D (http://gene3d.biochem.ucl.ac.uk) is a database of protein domain structure annotations for protein sequences. Domains are predicted using a library of profile HMMs from 2738 CATH superfamilies. Gene3D assigns domain annotations to Ensembl and UniProt sequence sets including >6000 cellular genomes and >20 million unique protein sequences. This represents an increase of 45% in the number of protein sequences since our last publication. Thanks to improvements in the underlying data and pipeline, we see large increases in the domain coverage of sequences. We have expanded this coverage by integrating Pfam and SUPERFAMILY domain annotations, and we now resolve domain overlaps to provide highly comprehensive composite multi-domain architectures. To make these data more accessible for comparative genome analyses, we have developed novel search algorithms for searching genomes to identify related multi-domain architectures. In addition to providing domain family annotations, we have now developed a pipeline for 3D homology modelling of domains in Gene3D. This has been applied to the human genome and will be rolled out to other major organisms over the next year.


Subject(s)
Databases, Protein , Molecular Sequence Annotation , Protein Structure, Tertiary , Genome , Genomics , Internet , Models, Molecular , Protein Structure, Tertiary/genetics , Sequence Analysis, Protein
5.
PLoS One ; 8(3): e60038, 2013.
Article in English | MEDLINE | ID: mdl-23555875

ABSTRACT

5,6-Dimethylxanthenone-4-acetic acid (DMXAA), a potent type I interferon (IFN) inducer, was evaluated as a chemotherapeutic agent in mouse cancer models and proved to be well tolerated in human cancer clinical trials. Despite its multiple biological functions, DMXAA has not been fully characterized for the potential application as a vaccine adjuvant. In this report, we show that DMXAA does act as an adjuvant due to its unique property as a soluble innate immune activator. Using OVA as a model antigen, DMXAA was demonstrated to improve on the antigen specific immune responses and induce a preferential Th2 (Type-2) response. The adjuvant effect was directly dependent on the IRF3-mediated production of type-I-interferon, but not IL-33. DMXAA could also enhance the immunogenicity of influenza split vaccine which led to significant increase in protective responses against live influenza virus challenge in mice compared to split vaccine alone. We propose that DMXAA can be used as an adjuvant that targets a specific innate immune signaling pathway via IRF3 for potential applications including vaccines against influenza which requires a high safety profile.


Subject(s)
Adjuvants, Immunologic/therapeutic use , Interferon Regulatory Factor-3/metabolism , Xanthones/therapeutic use , Animals , Enzyme-Linked Immunosorbent Assay , Female , Immunity, Innate/drug effects , Interferon Regulatory Factor-3/genetics , Interleukin-33 , Interleukins/genetics , Interleukins/metabolism , Male , Mice , Mice, Knockout
6.
Biochim Biophys Acta ; 1834(5): 874-89, 2013 May.
Article in English | MEDLINE | ID: mdl-23499848

ABSTRACT

We present, to our knowledge, the first quantitative analysis of functional site diversity in homologous domain superfamilies. Different types of functional sites are considered separately. Our results show that most diverse superfamilies are very plastic in terms of the spatial location of their functional sites. This is especially true for protein-protein interfaces. In contrast, we confirm that catalytic sites typically occupy only a very small number of topological locations. Small-ligand binding sites are more diverse than expected, although in a more limited manner than protein-protein interfaces. In spite of the observed diversity, our results also confirm the previously reported preferential location of functional sites. We identify a subset of homologous domain superfamilies where diversity is particularly extreme, and discuss possible reasons for such plasticity, i.e. structural diversity. Our results do not contradict previous reports of preferential co-location of sites among homologues, but rather point at the importance of not ignoring other sites, especially in large and diverse superfamilies. Data on sites exploited by different relatives, within each well annotated domain superfamily, has been made accessible from the CATH website in order to highlight versatile superfamilies or superfamilies with highly preferential sites. This information is valuable for system biology and knowledge of any constraints on protein interactions could help in understanding the dynamic control of networks in which these proteins participate. 
The novelty of our work lies in the comprehensive nature of the analysis - we have used a significantly larger dataset than previous studies - and the fact that in many superfamilies we show that different parts of the domain surface are exploited by different relatives for ligand/protein interactions, particularly in superfamilies which are diverse in sequence and structure, an observation not previously reported on such a large scale. This article is part of a Special Issue entitled: The emerging dynamic view of proteins: Protein plasticity in allostery, evolution and self-assembly.


Subject(s)
Proteins/physiology , Protein Binding
7.
Nucleic Acids Res ; 41(5): 2832-45, 2013 Mar 01.
Article in English | MEDLINE | ID: mdl-23376926

ABSTRACT

The TATA binding protein (TBP) is an essential transcription initiation factor in Archaea and Eucarya. Bacteria lack TBP, and instead use sigma factors for transcription initiation. TBP has a symmetric structure comprising two repeated TBP domains. Using sequence, structural and phylogenetic analyses, we examine the distribution and evolutionary history of the TBP domain, a member of the helix-grip fold family. Our analyses reveal a broader distribution than for TBP, with TBP-domains being present across all three domains of life. In contrast to TBP, all other characterized examples of the TBP domain are present as single copies, primarily within multidomain proteins. The presence of the TBP domain in the ubiquitous DNA glycosylases suggests that this fold traces back to the ancestor of all three domains of life. The TBP domain is also found in RNase HIII, and phylogenetic analyses show that RNase HIII has evolved from bacterial RNase HII via TBP-domain fusion. Finally, our comparative genomic screens confirm and extend earlier reports of proteins consisting of a single TBP domain among some Archaea. These monopartite TBP-domain proteins suggest that this domain is functional in its own right, and that the TBP domain could have first evolved as an independent protein, which was later recruited in different contexts.


Subject(s)
Bacterial Proteins/genetics , DNA Glycosylases/genetics , Ribonucleases/genetics , TATA-Box Binding Protein/genetics , Animals , Archaeal Proteins/chemistry , Archaeal Proteins/genetics , Bacterial Proteins/chemistry , Cluster Analysis , DNA Glycosylases/chemistry , DNA-Binding Proteins/chemistry , DNA-Binding Proteins/genetics , Evolution, Molecular , Humans , Models, Genetic , Models, Molecular , Phylogeny , Protein Binding , Protein Structure, Secondary , Protein Structure, Tertiary/genetics , Ribonucleases/chemistry , Sequence Homology, Amino Acid , Structural Homology, Protein , TATA-Box Binding Protein/chemistry
8.
Biochem J ; 449(3): 581-94, 2013 Feb 01.
Article in English | MEDLINE | ID: mdl-23301657

ABSTRACT

The present review focuses on the evolution of proteins and the impact of amino acid mutations on function from a structural perspective. Proteins evolve under the law of natural selection and undergo alternating periods of conservative evolution and of relatively rapid change. The likelihood of mutations being fixed in the genome depends on various factors, such as the fitness of the phenotype or the position of the residues in the three-dimensional structure. For example, co-evolution of residues located close together in three-dimensional space can occur to preserve global stability. Whereas point mutations can fine-tune the protein function, residue insertions and deletions ('decorations' at the structural level) can sometimes modify functional sites and protein interactions more dramatically. We discuss recent developments and tools to identify such episodic mutations, and examine their applications in medical research. Such tools have been tested on simulated data and applied to real data such as viruses or animal sequences. Traditionally, there has been little if any cross-talk between the fields of protein biophysics, protein structure-function and molecular evolution. However, the last several years have seen some exciting developments in combining these approaches to obtain an in-depth understanding of how proteins evolve. For example, a better understanding of how structural constraints affect protein evolution will greatly help us to optimize our models of sequence evolution. The present review explores this new synthesis of perspectives.


Subject(s)
Evolution, Molecular , Mutation , Proteins/chemistry , Proteins/genetics , Adaptation, Physiological/genetics , Amino Acid Substitution , Animals , Binding Sites/genetics , Databases, Genetic , Female , Genetic Diseases, Inborn/genetics , Humans , INDEL Mutation , Infections/genetics , Male , Models, Genetic , Models, Molecular , Neoplasms/genetics , Protein Interaction Domains and Motifs/genetics , Protein Stability , Proteins/metabolism
9.
Nucleic Acids Res ; 41(Database issue): D490-8, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23203873

ABSTRACT

CATH version 3.5 (Class, Architecture, Topology, Homology, available at http://www.cathdb.info/) contains 173 536 domains, 2626 homologous superfamilies and 1313 fold groups. When focusing on structural genomics (SG) structures, we observe that the number of new folds for CATH v3.5 is slightly less than for previous releases, and this observation suggests that we may now know the majority of folds that are easily accessible to structure determination. We have improved the accuracy of our functional family (FunFams) sub-classification method and the CATH sequence domain search facility has been extended to provide FunFam annotations for each domain. The CATH website has been redesigned. We have improved the display of functional data and of conserved sequence features associated with FunFams within each CATH superfamily.


Subject(s)
Databases, Protein , Protein Structure, Tertiary , Genomics , Internet , Molecular Sequence Annotation , Protein Folding , Proteins/chemistry , Proteins/classification , Proteins/genetics , Sequence Alignment , Sequence Analysis, Protein , Structural Homology, Protein
10.
Nucleic Acids Res ; 40(Database issue): D465-71, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22139938

ABSTRACT

Gene3D (http://gene3d.biochem.ucl.ac.uk) is a comprehensive database of protein domain assignments for sequences from the major sequence databases. Domains are directly mapped from structures in the CATH database or predicted using a library of representative profile HMMs derived from CATH superfamilies. As previously described, Gene3D integrates many other protein family and function databases. These facilitate complex associations of molecular function, structure and evolution. Gene3D now includes a domain functional family (FunFam) level below the homologous superfamily level assignments. Additions have also been made to the interaction data. More significantly, to help with the visualization and interpretation of multi-genome scale data sets, we have developed a new, revamped website. Searching has been simplified with more sophisticated filtering of results, along with new tools based on Cytoscape Web, for visualizing protein-protein interaction networks, differences in domain composition between genomes and the taxonomic distribution of individual superfamilies.


Subject(s)
Databases, Protein , Molecular Sequence Annotation , Protein Interaction Maps , Protein Structure, Tertiary , Genomics , Proteins/chemistry , Proteins/classification , Proteins/genetics
11.
J Biol Chem ; 286(8): 6685-96, 2011 Feb 25.
Article in English | MEDLINE | ID: mdl-21156796

ABSTRACT

The inhibitory T-cell surface-expressed receptor, cytotoxic T lymphocyte-associated antigen-4 (CTLA-4), which belongs to the class of cell surface proteins phosphorylated by extrinsic tyrosine kinases that also includes antigen receptors, binds the related ligands, B7-1 and B7-2, expressed on antigen-presenting cells. Conformational changes are commonly invoked to explain ligand-induced "triggering" of this class of receptors. Crystal structures of ligand-bound CTLA-4 have been reported, but not the apo form, precluding analysis of the structural changes accompanying ligand binding. The 1.8-Å resolution structure of an apo human CTLA-4 homodimer emphasizes the shared evolutionary history of the CTLA-4/CD28 subgroup of the immunoglobulin superfamily and the antigen receptors. The ligand-bound and unbound forms of both CTLA-4 and B7-1 are remarkably similar, in marked contrast to B7-2, whose binding to CTLA-4 has elements of induced fit. Isothermal titration calorimetry reveals that ligand binding by CTLA-4 is enthalpically driven and accompanied by unfavorable entropic changes. The similarity of the thermodynamic parameters determined for the interactions of CTLA-4 with B7-1 and B7-2 suggests that the binding is not highly specific, but the conformational changes observed for B7-2 binding suggest some level of selectivity. The new structure establishes that rigid-body ligand interactions are capable of triggering CTLA-4 phosphorylation by extrinsic kinase(s).


Subject(s)
Antigens, CD/chemistry , B7-1 Antigen/chemistry , B7-2 Antigen/chemistry , Receptors, Antigen, T-Cell/chemistry , Animals , Antigens, CD/genetics , Antigens, CD/immunology , B7-1 Antigen/genetics , B7-1 Antigen/immunology , B7-2 Antigen/genetics , B7-2 Antigen/immunology , Binding Sites , CHO Cells , CTLA-4 Antigen , Cricetinae , Cricetulus , Crystallography, X-Ray , Humans , Protein Structure, Tertiary , Receptors, Antigen, T-Cell/genetics , Receptors, Antigen, T-Cell/immunology , Thermodynamics
12.
Structure ; 18(11): 1522-35, 2010 Nov 10.
Article in English | MEDLINE | ID: mdl-21070951

ABSTRACT

Some superfamilies contain large numbers of protein domains with very different functions. The ability to refine the functional classification of domains within these superfamilies is necessary for better understanding the evolution of functions and to guide function prediction of new relatives. To achieve this, a suitable starting point is the detailed analysis of functional divisions and mechanisms of functional divergence in a single superfamily. Here, we present such a detailed analysis in the superfamily of HUP domains. A biologically meaningful functional classification of HUP domains is obtained manually. Mechanisms of function diversification are investigated in detail using this classification. We observe that structural motifs play an important role in shaping broad functional divergence, whereas residue-level changes shape diversity at a more specific level. In parallel we examine the ability of an automated protocol to capture the biologically meaningful classification, with a view to automatically extending this classification in the future.


Subject(s)
Evolution, Molecular , Models, Molecular , Molecular Dynamics Simulation , Protein Conformation , Protein Structure, Tertiary/genetics , Protein Subunits/chemistry , Protein Subunits/classification , Cluster Analysis , Databases, Protein , Protein Subunits/genetics
13.
Structure ; 18(10): 1233-43, 2010 Oct 13.
Article in English | MEDLINE | ID: mdl-20947012

ABSTRACT

Transient interactions, which involve protein interactions that are formed and broken easily, are important in many aspects of cellular function. Here we describe structural and functional properties of transient interactions between globular domains and between globular domains, short peptides, and disordered regions. The importance of posttranslational modifications in transient interactions is also considered. We review techniques used in the detection of the different types of transient protein-protein interactions. We also look at the role of transient interactions within protein-protein interaction networks and consider their contribution to different aspects of these networks.


Subject(s)
Protein Interaction Domains and Motifs , Protein Interaction Mapping/methods , Proteins/chemistry , Binding Sites , Computational Biology/methods , Models, Molecular , Protein Binding , Protein Conformation , Protein Processing, Post-Translational , Proteins/metabolism
14.
PLoS Comput Biol ; 5(8): e1000485, 2009 Aug.
Article in English | MEDLINE | ID: mdl-19714201

ABSTRACT

Predicting protein function from structure remains an active area of interest, particularly for the structural genomics initiatives where a substantial number of structures are initially solved with little or no functional characterisation. Although global structure comparison methods can be used to transfer functional annotations, the relationship between fold and function is complex, particularly in functionally diverse superfamilies that have evolved through different secondary structure embellishments to a common structural core. The majority of prediction algorithms employ local templates built on known or predicted functional residues. Here, we present a novel method (FLORA) that automatically generates structural motifs associated with different functional sub-families (FSGs) within functionally diverse domain superfamilies. Templates are created purely on the basis of their specificity for a given FSG, and the method makes no prior prediction of functional sites, nor assumes specific physico-chemical properties of residues. FLORA is able to accurately discriminate between homologous domains with different functions and substantially outperforms (a 2-3 fold increase in coverage at low error rates) popular structure comparison methods and a leading function prediction method. We benchmark FLORA on a large data set of enzyme superfamilies from all three major protein classes (alpha, beta, alphabeta) and demonstrate the functional relevance of the motifs it identifies. We also provide novel predictions of enzymatic activity for a large number of structures solved by the Protein Structure Initiative. Overall, we show that FLORA is able to effectively detect functionally similar protein domain structures by purely using patterns of structural conservation of all residues.


Subject(s)
Computational Biology/methods , Proteins/chemistry , Algorithms , Amino Acid Motifs , Area Under Curve , Data Interpretation, Statistical , Databases, Protein , Genomics , Multigene Family , Protein Conformation , Protein Structure, Tertiary , Proteomics/methods , ROC Curve , Reproducibility of Results
15.
Biochem Soc Trans ; 37(Pt 4): 745-50, 2009 Aug.
Article in English | MEDLINE | ID: mdl-19614587

ABSTRACT

The study of superfamilies of protein domains using a combination of structure, sequence and function data provides insights into deep evolutionary history. In the present paper, analyses of functional diversity within such superfamilies as defined in the CATH-Gene3D resource are described. These analyses focus on structure-function relationships in very large and diverse superfamilies, and on the evolution of domain superfamily members in protein-protein complexes.


Subject(s)
Evolution, Molecular , Proteins/chemistry , Proteins/metabolism , Animals , Databases, Protein , Humans , Protein Structure, Secondary , Protein Structure, Tertiary , Proteins/classification , Structure-Activity Relationship
16.
Structure ; 17(6): 869-81, 2009 Jun 10.
Article in English | MEDLINE | ID: mdl-19523904

ABSTRACT

One major objective of structural genomics efforts, including the NIH-funded Protein Structure Initiative (PSI), has been to increase the structural coverage of protein sequence space. Here, we present the target selection strategy used during the second phase of PSI (PSI-2). This strategy, jointly devised by the bioinformatics groups associated with the PSI-2 large-scale production centers, targets representatives from large, structurally uncharacterized protein domain families, and from structurally uncharacterized subfamilies in very large and diverse families with incomplete structural coverage. These very large families are extremely diverse both structurally and functionally, and are highly overrepresented in known proteomes. On the basis of several metrics, we then discuss to what extent PSI-2, during its first 3 years, has increased the structural coverage of genomes, and contributed structural and functional novelty. Together, the results presented here suggest that PSI-2 is successfully meeting its objectives and provides useful insights into structural and functional space.


Subject(s)
Genomics/methods , Proteins/chemistry , Proteins/metabolism , Proteomics/methods , Animals , Computational Biology , Humans , Multigene Family , Protein Conformation , Protein Structure, Tertiary/genetics , Proteins/genetics , Sequence Analysis, Protein
17.
Curr Opin Struct Biol ; 19(3): 349-56, 2009 Jun.
Article in English | MEDLINE | ID: mdl-19398323

ABSTRACT

The ability to assign function to proteins has become a major bottleneck for comprehensively understanding cellular mechanisms at the molecular level. Here we discuss the extent to which structural domain classifications can help in deciphering the complex relationship between the functions of proteins and their sequences and structures. Structural classifications are particularly helpful in understanding the mosaic manner in which new proteins and functions emerge through evolution. This is partly because they provide reliable and concrete domain definitions and enable the detection of very remote structural similarities and homologies. It is also because structural data can illuminate more clearly the mechanisms by which a broader functional repertoire can emerge during evolution.


Subject(s)
Proteins/chemistry , Proteins/metabolism , Animals , Humans , Models, Molecular , Protein Folding , Protein Structure, Tertiary , Proteins/classification
18.
Nucleic Acids Res ; 36(Database issue): D667-73, 2008 Jan.
Article in English | MEDLINE | ID: mdl-17933762

ABSTRACT

Better characterization of binding sites in proteins and the ability to accurately predict their location and energetic properties are major challenges which, if addressed, would have many valuable practical applications. Unfortunately, reliable benchmark datasets of binding sites in proteins are still sorely lacking. Here, we present LigASite ('LIGand Attachment SITE'), a gold-standard dataset of binding sites in 550 proteins of known structures. LigASite consists exclusively of biologically relevant binding sites in proteins for which at least one apo- and one holo-structure are available. In defining the binding sites for each protein, information from all holo-structures is combined, considering in each case the quaternary structure defined by the PQS server. LigASite is built using simple criteria and is automatically updated as new structures become available in the PDB, thereby guaranteeing optimal data coverage over time. Both a redundant and a culled non-redundant version of the dataset are available at http://www.scmbb.ulb.ac.be/Users/benoit/LigASite. The website interface allows users to search the dataset by PDB identifiers, ligand identifiers, protein names or sequence, and to look for structural matches as defined by the CATH homologous superfamilies. The datasets can be downloaded from the website as Schema-validated XML files or comma-separated flat files.


Subject(s)
Apoproteins/chemistry , Databases, Protein , Ligands , Apoproteins/metabolism , Binding Sites , Databases, Protein/statistics & numerical data , Internet , Protein Conformation , User-Computer Interface
19.
BMC Bioinformatics ; 8: 141, 2007 Apr 30.
Article in English | MEDLINE | ID: mdl-17470296

ABSTRACT

BACKGROUND: Most methods for predicting functional sites in protein 3D structures rely on information on related proteins and cannot be applied to proteins with no known relatives. Another limitation of these methods is the lack of a well annotated set of functional sites to use as benchmark for validating their predictions. Experimental findings and theoretical considerations suggest that residues involved in function often contribute unfavorably to the native state stability. We examine the possibility of systematically exploiting this intrinsic property to identify functional sites using an original procedure that detects destabilizing regions in protein structures. In addition, to relate destabilizing regions to known functional sites, a novel benchmark consisting of a diverse set of hand-curated protein functional sites is derived.
RESULTS: A procedure for detecting clusters of destabilizing residues in protein structures is presented. Individual residue contributions to protein stability are evaluated using detailed atomic models and a force-field successfully applied in computational protein design. The most destabilizing residues, and some of their closest neighbours, are clustered into destabilizing regions following a rigorous protocol. Our procedure is applied to high quality apo-structures of 63 unrelated proteins. The biologically relevant binding sites of these proteins were annotated using all available information, including structural data and literature curation, resulting in the largest hand-curated dataset of binding sites in proteins available to date. Comparing the destabilizing regions with the annotated binding sites in these proteins, we find that the overlap is on average limited, but significantly better than random. Results depend on the type of bound ligand. Significant overlap is obtained for most polysaccharide- and small ligand-binding sites, whereas no overlap is observed for most nucleic acid binding sites. These differences are rationalised in terms of the geometry and energetics of the binding site.
CONCLUSION: We find that although destabilizing regions as detected here can in general not be used to predict binding sites in protein structures, they can provide useful information, particularly on the location of functional sites that bind polysaccharides and small ligands. This information can be exploited in methods for predicting function in protein structures with no known relatives. Our publicly available benchmark of hand-curated functional sites in proteins should help other workers derive and validate new prediction methods.


Subject(s)
Algorithms , Models, Chemical , Models, Molecular , Proteins/chemistry , Proteins/ultrastructure , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Amino Acid Sequence , Computer Simulation , Molecular Sequence Data , Protein Conformation , Protein Structure, Tertiary , Sequence Homology, Amino Acid , Structure-Activity Relationship