Results 1 - 17 of 17
1.
Methods Mol Biol ; 2112: 43-57, 2020.
Article in English | MEDLINE | ID: mdl-32006277

ABSTRACT

The functional diversity of proteins is closely related to their differences in sequence and structure. Despite variations in functional sites, global structural similarity is a valuable source of information when assessing potential functional similarities between proteins. The CATH database contains a well-established hierarchical classification of more than 430,000 protein domain structures and nearly 95 million protein domain sequences, with integrated functional annotations for each represented family. The present chapter provides an overview of the main features of CATH with emphasis on exploiting structural similarities to obtain functional information for proteins.


Subjects
Proteins/chemistry; Sequence Analysis, Protein; Sequence Homology, Amino Acid; Structural Homology, Protein; Databases, Protein; Protein Structure, Tertiary
2.
Sci Rep ; 7(1): 10102, 2017 08 31.
Article in English | MEDLINE | ID: mdl-28860623

ABSTRACT

Protein domains mediate drug-protein interactions and this principle can guide the design of multi-target drugs, i.e. polypharmacology. In this study, we associate multi-target drugs with CATH functional families through the overrepresentation of targets of those drugs in CATH functional families. Thus, we identify CATH functional families that are currently enriched in drugs (druggable CATH functional families) and we use the network properties of these druggable protein families to analyse their association with drug side effects. Analysis of selected druggable CATH functional families, enriched in drug targets, shows that relatives exhibit highly conserved drug binding sites. Furthermore, relatives within druggable CATH functional families occupy central positions in a human protein functional network, cluster together forming network neighbourhoods and are less likely to be within proteins associated with drug side effects. Our results demonstrate that CATH functional families can be used to identify drug-target interactions, opening a new research direction in target identification.


Subjects
Databases, Protein; Polypharmacology; Algorithms; Binding Sites; Drug Discovery/methods; Humans; Protein Binding; Sequence Analysis, Protein/methods
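
The abstract of entry 2 associates drugs with functional families through the overrepresentation of drug targets in those families. It does not say which statistical test was used, so the following is only a minimal sketch that assumes a one-sided hypergeometric test. The function name and all numbers in the usage line are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, successes, draws, observed):
    """One-sided hypergeometric tail probability P(X >= observed).

    population: number of proteins considered in total
    successes:  number of those proteins that are targets of the drug
    draws:      number of proteins in the family being tested
    observed:   number of drug targets found in that family
    (This mapping onto families and targets is an assumption for illustration.)
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    # Sum the probability mass for every outcome at least as extreme as observed.
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20 000 proteins, 40 targets, a family of 150 with 6 targets.
p_value = hypergeom_enrichment_p(20_000, 40, 150, 6)
```

A small p-value would suggest that the family holds more targets of the drug than expected by chance. In practice a multiple-testing correction across families would also be needed.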
3.
Methods Mol Biol ; 1558: 79-110, 2017.
Article in English | MEDLINE | ID: mdl-28150234

ABSTRACT

This chapter describes the generation of the data in the CATH-Gene3D online resource and how it can be used to study protein domains and their evolutionary relationships. Methods will be presented for: comparing protein structures, recognizing homologs, predicting domain structures within protein sequences, and subclassifying superfamilies into functionally pure families, together with a guide on using the webpages.


Subjects
Computational Biology/methods; Databases, Genetic; Proteins/genetics; Proteins/metabolism; Software; Algorithms; Models, Molecular; Molecular Sequence Annotation; Protein Binding; Protein Conformation; Protein Interaction Domains and Motifs; Proteins/chemistry; Proteins/classification; Structure-Activity Relationship; Web Browser; Workflow
4.
Nucleic Acids Res ; 45(D1): D289-D295, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27899584

ABSTRACT

The latest version of the CATH-Gene3D protein structure classification database has recently been released (version 4.1, http://www.cathdb.info). The resource comprises over 300 000 domain structures and over 53 million protein domains classified into 2737 homologous superfamilies, doubling the number of predicted protein domains in the previous version. The daily-updated CATH-B, which contains our very latest domain assignment data, provides putative classifications for over 100 000 additional protein domains. This article describes developments to the CATH-Gene3D resource over the last two years since the publication in 2015, including: significant increases to our structural and sequence coverage; expansion of the functional families in CATH; building a support vector machine (SVM) to automatically assign domains to superfamilies; improved search facilities to return alignments of query sequences against multiple sequence alignments; the redesign of the web pages and download site.


Subjects
Computational Biology/methods; Databases, Protein; Models, Molecular; Proteins/chemistry; Proteins/metabolism; Software; Structure-Activity Relationship; Web Browser
6.
PLoS Comput Biol ; 12(6): e1004926, 2016 06.
Article in English | MEDLINE | ID: mdl-27332861

ABSTRACT

Beta-lactamases represent the main bacterial mechanism of resistance to beta-lactam antibiotics and are a significant challenge to modern medicine. We have developed an automated classification and analysis protocol that exploits structure- and sequence-based approaches and which allows us to propose a grouping of serine beta-lactamases that more consistently captures and rationalizes the existing three classification schemes: Classes (A, C and D, which vary in their implementation of the mechanism of action); Types (that largely reflect evolutionary distance measured by sequence similarity); and Variant groups (which largely correspond with the Bush-Jacoby clinical groups). Our analysis platform exploits a suite of in-house and public tools to identify Functional Determinants (FDs), i.e. residue sites, responsible for conferring different phenotypes between different classes, different types and different variants. We focused on Class A beta-lactamases, the most highly populated and clinically relevant class, to identify FDs implicated in the distinct phenotypes associated with different Class A Types and Variants. We show that our FunFHMMer method can separate the known beta-lactamase classes and identify those positions likely to be responsible for the different implementations of the mechanism of action in these enzymes. Two novel algorithms, ASSP and SSPA, allow detection of FD sites likely to contribute to the broadening of the substrate profiles. Using our approaches, we recognise 151 Class A types in UniProt. Finally, we used our beta-lactamase FunFams and ASSP profiles to detect 4 novel Class A types in microbiome samples. Our platforms have been validated by literature studies, in silico analysis and some targeted experimental verification. Although developed for the serine beta-lactamases they could be used to classify and analyse any diverse protein superfamily where sub-families have diverged over both long and short evolutionary timescales.


Subjects
Algorithms; Molecular Docking Simulation/methods; Sequence Analysis, Protein/methods; Software; beta-Lactamases/chemistry; beta-Lactamases/ultrastructure; Binding Sites; Computer Simulation; Drug Resistance, Bacterial; Enzyme Activation; Protein Binding; Serine; Structure-Activity Relationship; Substrate Specificity; beta-Lactam Resistance; beta-Lactamase Inhibitors/chemistry
7.
Curr Opin Struct Biol ; 38: 44-52, 2016 06.
Article in English | MEDLINE | ID: mdl-27309309

ABSTRACT

Domains are the functional building blocks of proteins. In this work we discuss how domains can contribute to the evolution of new functions. Domains themselves can evolve through various mechanisms, altering their intrinsic function. Domains can also facilitate functional innovations by combining with other domains to make novel proteins. We discuss the mechanisms by which domain and domain combinations support functional innovations. We highlight interesting examples where changes in domain combination promote changes at the domain level.


Subjects
Proteins/chemistry; Proteins/metabolism; Animals; Humans; Neoplasms/metabolism; Protein Domains
8.
J Mol Biol ; 428(2 Pt A): 253-267, 2016 Jan 29.
Article in English | MEDLINE | ID: mdl-26585402

ABSTRACT

Enzymes, as biological catalysts, form the basis of all forms of life. How these proteins have evolved their functions remains a fundamental question in biology. Over 100 years of detailed biochemistry studies, combined with the large volumes of sequence and protein structural data now available, mean that we are able to perform large-scale analyses to address this question. Using a range of computational tools and resources, we have compiled information on all experimentally annotated changes in enzyme function within 379 structurally defined protein domain superfamilies, linking the changes observed in functions during evolution to changes in reaction chemistry. Many superfamilies show changes in function at some level, although one function often dominates one superfamily. We use quantitative measures of changes in reaction chemistry to reveal the various types of chemical changes occurring during evolution and to exemplify these by detailed examples. Additionally, we use structural information of the enzymes' active sites to examine how different superfamilies have changed their catalytic machinery during evolution. Some superfamilies have changed the reactions they perform without changing catalytic machinery. In others, large changes of enzyme function, in terms of both overall chemistry and substrate specificity, have been brought about by significant changes in catalytic machinery. Interestingly, in some superfamilies, relatives perform similar functions but with different catalytic machineries. This analysis highlights characteristics of functional evolution across a wide range of superfamilies, providing insights that will be useful in predicting the function of uncharacterised sequences and the design of new synthetic enzymes.


Subjects
Evolution, Molecular; Hydrolases/genetics; Hydrolases/metabolism; Catalytic Domain; Computational Biology; Hydrolases/chemistry; Models, Molecular
9.
Nucleic Acids Res ; 44(D1): D404-9, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26578585

ABSTRACT

Gene3D (http://gene3d.biochem.ucl.ac.uk) is a database of domain annotations of Ensembl and UniProtKB protein sequences. Domains are predicted using a library of profile HMMs representing 2737 CATH superfamilies. Gene3D has previously featured in the Database issue of NAR and here we report updates to the website and database. The current Gene3D (v14) release has expanded its domain assignments to ∼20,000 cellular genomes and over 43 million unique protein sequences, more than doubling the number of protein sequences since our last publication. Amongst other updates, we have improved our Functional Family annotation method. We have also improved the quality and coverage of our 3D homology modelling pipeline of predicted CATH domains. Additionally, the structural models have been expanded to include an extra model organism (Drosophila melanogaster). We also document a number of additional visualization tools in the Gene3D website.


Subjects
Databases, Protein; Protein Structure, Tertiary; Humans; Internet; Models, Molecular; Molecular Sequence Annotation; Protein Interaction Domains and Motifs; Protein Structure, Tertiary/genetics
10.
Hum Mutat ; 37(4): 364-70, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26703369

ABSTRACT

Inactivating mutations in TSC1 and TSC2 cause tuberous sclerosis complex (TSC). The 2012 international consensus meeting on TSC diagnosis and management agreed that the identification of a pathogenic TSC1 or TSC2 variant establishes a diagnosis of TSC, even in the absence of clinical signs. However, exons 25 and 31 of TSC2 are subject to alternative splicing. No variants causing clinically diagnosed TSC have been reported in these exons, raising the possibility that such variants would not cause TSC. We present truncating and in-frame variants in exons 25 and 31 in three individuals unlikely to fulfil TSC diagnostic criteria and examine the importance of these exons in TSC using different approaches. Amino acid conservation analysis suggests significantly less conservation in these exons compared with the majority of TSC2 exons, and TSC2 expression data demonstrates that the majority of TSC2 transcripts lack exons 25 and/or 31 in many human adult tissues. In vitro assay of both exons shows that neither exon is essential for TSC complex function. Our evidence suggests that variants in TSC2 exons 25 or 31 are very unlikely to cause classical TSC, although a role for these exons in tissue/stage specific development cannot be excluded.


Subjects
Exons; Genetic Association Studies; Mutation; Tuberous Sclerosis/diagnosis; Tuberous Sclerosis/genetics; Tumor Suppressor Proteins/genetics; Adult; Alleles; Alternative Splicing; Child; Child, Preschool; Computational Biology/methods; Databases, Genetic; Gene Expression; Genetic Variation; Humans; Phenotype; Tuberous Sclerosis Complex 2 Protein
11.
Curr Opin Genet Dev ; 35: 40-9, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26451979

ABSTRACT

Whilst ∼93% of domain superfamilies appear to be relatively structurally and functionally conserved based on the available data from the CATH-Gene3D domain classification resource, the remainder are much more diverse. In this review, we consider how domains in some of the most ubiquitous and promiscuous superfamilies have evolved, in particular the plasticity in their functional sites and surfaces which expands the repertoire of molecules they interact with and actions performed on them. To what extent can we identify a core function for these superfamilies which would allow us to develop a 'domain grammar of function' whereby a protein's biological role can be proposed from its constituent domains? Clearly the first step is to understand the extent to which these components vary and how changes in their molecular make-up modifies function.


Subjects
Models, Molecular; Protein Structure, Tertiary; Proteins/classification; Proteins/metabolism; Amino Acid Sequence
12.
Bioinformatics ; 31(21): 3460-7, 2015 Nov 01.
Article in English | MEDLINE | ID: mdl-26139634

ABSTRACT

MOTIVATION: Computational approaches that can predict protein functions are essential to bridge the widening function annotation gap, especially since <1.0% of all proteins in UniProtKB have been experimentally characterized. We present a domain-based method for protein function classification and prediction of functional sites that exploits functional sub-classification of CATH superfamilies. The superfamilies are sub-classified into functional families (FunFams) using a hierarchical clustering algorithm supervised by a new classification method, FunFHMMer. RESULTS: FunFHMMer generates more functionally coherent groupings of protein sequences than other domain-based protein classifications. This has been validated using known functional information. The conserved positions predicted by the FunFams are also found to be enriched in known functional residues. Moreover, the functional annotations provided by the FunFams are found to be more precise than other domain-based resources. FunFHMMer currently identifies 110,439 FunFams in 2735 superfamilies which can be used to functionally annotate >16 million domain sequences. AVAILABILITY AND IMPLEMENTATION: All FunFam annotation data are made available through the CATH webpages (http://www.cathdb.info). The FunFHMMer webserver (http://www.cathdb.info/search/by_funfhmmer) allows users to submit query sequences for assignment to a CATH FunFam. CONTACT: sayoni.das.12@ucl.ac.uk SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms; Databases, Protein; Molecular Sequence Annotation; Protein Structure, Tertiary; Proteins/chemistry; Proteins/classification; Amino Acid Sequence; Humans; Molecular Sequence Data; Proteins/genetics; Proteins/metabolism; Sequence Analysis, Protein; Sequence Homology, Amino Acid; Structural Homology, Protein
13.
Nucleic Acids Res ; 43(W1): W148-53, 2015 Jul 01.
Article in English | MEDLINE | ID: mdl-25964299

ABSTRACT

The widening function annotation gap in protein databases and the increasing number and diversity of the proteins being sequenced presents new challenges to protein function prediction methods. Multidomain proteins complicate the protein sequence-structure-function relationship further as new combinations of domains can expand the functional repertoire, creating new proteins and functions. Here, we present the FunFHMMer web server, which provides Gene Ontology (GO) annotations for query protein sequences based on the functional classification of the domain-based CATH-Gene3D resource. Our server also provides valuable information for the prediction of functional sites. The predictive power of FunFHMMer has been validated on a set of 95 proteins where FunFHMMer performs better than BLAST, Pfam and CDD. Recent validation by an independent international competition ranks FunFHMMer as one of the top function prediction methods in predicting GO annotations for both the Biological Process and Molecular Function Ontology. The FunFHMMer web server is available at http://www.cathdb.info/search/by_funfhmmer.


Subjects
Molecular Sequence Annotation; Protein Structure, Tertiary; Software; Gene Ontology; Internet; Proteins/classification; Proteins/genetics; Proteins/physiology
14.
Nucleic Acids Res ; 43(Database issue): D376-81, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25348408

ABSTRACT

The latest version of the CATH-Gene3D protein structure classification database (4.0, http://www.cathdb.info) provides annotations for over 235,000 protein domain structures and includes 25 million domain predictions. This article provides an update on the major developments in the 2 years since the last publication in this journal including: significant improvements to the predictive power of our functional families (FunFams); the release of our 'current' putative domain assignments (CATH-B); a new, strictly non-redundant data set of CATH domains suitable for homology benchmarking experiments (CATH-40) and a number of improvements to the web pages.


Subjects
Databases, Protein; Molecular Sequence Annotation; Protein Structure, Tertiary; Genomics; Internet; Protein Structure, Tertiary/genetics; Proteins/classification
15.
Nucleic Acids Res ; 42(Database issue): D240-5, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24270792

ABSTRACT

Gene3D (http://gene3d.biochem.ucl.ac.uk) is a database of protein domain structure annotations for protein sequences. Domains are predicted using a library of profile HMMs from 2738 CATH superfamilies. Gene3D assigns domain annotations to Ensembl and UniProt sequence sets including >6000 cellular genomes and >20 million unique protein sequences. This represents an increase of 45% in the number of protein sequences since our last publication. Thanks to improvements in the underlying data and pipeline, we see large increases in the domain coverage of sequences. We have expanded this coverage by integrating Pfam and SUPERFAMILY domain annotations, and we now resolve domain overlaps to provide highly comprehensive composite multi-domain architectures. To make these data more accessible for comparative genome analyses, we have developed novel search algorithms for searching genomes to identify related multi-domain architectures. In addition to providing domain family annotations, we have now developed a pipeline for 3D homology modelling of domains in Gene3D. This has been applied to the human genome and will be rolled out to other major organisms over the next year.


Subjects
Databases, Protein; Molecular Sequence Annotation; Protein Structure, Tertiary; Genome; Genomics; Internet; Models, Molecular; Protein Structure, Tertiary/genetics; Sequence Analysis, Protein
16.
Biochim Biophys Acta ; 1834(5): 874-89, 2013 May.
Article in English | MEDLINE | ID: mdl-23499848

ABSTRACT

We present, to our knowledge, the first quantitative analysis of functional site diversity in homologous domain superfamilies. Different types of functional sites are considered separately. Our results show that most diverse superfamilies are very plastic in terms of the spatial location of their functional sites. This is especially true for protein-protein interfaces. In contrast, we confirm that catalytic sites typically occupy only a very small number of topological locations. Small-ligand binding sites are more diverse than expected, although in a more limited manner than protein-protein interfaces. In spite of the observed diversity, our results also confirm the previously reported preferential location of functional sites. We identify a subset of homologous domain superfamilies where diversity is particularly extreme, and discuss possible reasons for such plasticity, i.e. structural diversity. Our results do not contradict previous reports of preferential co-location of sites among homologues, but rather point at the importance of not ignoring other sites, especially in large and diverse superfamilies. Data on sites exploited by different relatives, within each well annotated domain superfamily, has been made accessible from the CATH website in order to highlight versatile superfamilies or superfamilies with highly preferential sites. This information is valuable for system biology and knowledge of any constraints on protein interactions could help in understanding the dynamic control of networks in which these proteins participate. 
The novelty of our work lies in the comprehensive nature of the analysis - we have used a significantly larger dataset than previous studies - and the fact that in many superfamilies we show that different parts of the domain surface are exploited by different relatives for ligand/protein interactions, particularly in superfamilies which are diverse in sequence and structure, an observation not previously reported on such a large scale. This article is part of a Special Issue entitled: The emerging dynamic view of proteins: Protein plasticity in allostery, evolution and self-assembly.


Subjects
Proteins/physiology; Protein Binding
17.
Nucleic Acids Res ; 41(Database issue): D490-8, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23203873

ABSTRACT

CATH version 3.5 (Class, Architecture, Topology, Homology, available at http://www.cathdb.info/) contains 173 536 domains, 2626 homologous superfamilies and 1313 fold groups. When focusing on structural genomics (SG) structures, we observe that the number of new folds for CATH v3.5 is slightly less than for previous releases, and this observation suggests that we may now know the majority of folds that are easily accessible to structure determination. We have improved the accuracy of our functional family (FunFams) sub-classification method and the CATH sequence domain search facility has been extended to provide FunFam annotations for each domain. The CATH website has been redesigned. We have improved the display of functional data and of conserved sequence features associated with FunFams within each CATH superfamily.


Subjects
Databases, Protein; Protein Structure, Tertiary; Genomics; Internet; Molecular Sequence Annotation; Protein Folding; Proteins/chemistry; Proteins/classification; Proteins/genetics; Sequence Alignment; Sequence Analysis, Protein; Structural Homology, Protein
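
Entry 17 expands CATH as Class, Architecture, Topology, Homology. To illustrate that hierarchy, the sketch below splits a dotted four-part superfamily identifier into its four levels. The dotted format (for example "3.40.50.300") is not described in the abstracts above and is assumed here, and the function name is hypothetical.

```python
def cath_levels(superfamily_id):
    """Split a dotted CATH superfamily identifier into its four hierarchy levels.

    Assumes an identifier of the form "C.A.T.H" made of four integer fields.
    """
    parts = superfamily_id.split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected four dot-separated integers, got {superfamily_id!r}")
    names = ("class", "architecture", "topology", "homologous_superfamily")
    # Each level is identified by the prefix of the identifier up to that depth.
    return {name: ".".join(parts[: i + 1]) for i, name in enumerate(names)}


levels = cath_levels("3.40.50.300")
```

Keeping the full prefix at each level, rather than the bare field, makes it easy to group domains that share an architecture or topology.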