Results 1 - 17 of 17
1.
Nucleic Acids Res ; 47(D1): D280-D284, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30398663

ABSTRACT

This article provides an update of the latest data and developments within the CATH protein structure classification database (http://www.cathdb.info). The resource provides two levels of release: CATH-B, a daily snapshot of the latest structural domain boundaries and superfamily assignments, and CATH+, which adds layers of derived data, such as predicted sequence domains, functional annotations and functional clustering (known as Functional Families or FunFams). The most recent CATH+ release (version 4.2) provides a huge update in the coverage of structural data. This release increases the number of fully classified domains by over 40% (from 308 999 to 434 857 structural domains), corresponding to an almost two-fold increase in sequence data (from 53 million to over 95 million predicted domains) organised into 6119 superfamilies. The coverage of high-resolution protein PDB chains that contain at least one assigned CATH domain is now 90.2% (increased from 82.3% in the previous release). A number of highly requested features have also been implemented in our web pages: allowing the user to view an alignment between their query sequence and a representative FunFam structure, and providing tools that make it easier to view the full structural context (multi-domain architecture) of domains and chains.
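
The growth figures quoted in this abstract can be checked with simple arithmetic. The following is a minimal sketch in Python that uses only the numbers stated above; the function name `percent_increase` is illustrative and not part of any CATH tooling, and the sequence figures are the rounded values from the abstract, so the fold change is approximate.

```python
def percent_increase(old: float, new: float) -> float:
    """Return the relative increase from old to new, as a percentage."""
    return (new - old) / old * 100.0


# Fully classified structural domains: previous release vs CATH+ 4.2.
domain_growth = percent_increase(308_999, 434_857)

# Predicted sequence domains ("from 53 million to over 95 million").
# Both inputs are rounded in the abstract, so this is only approximate.
sequence_fold = 95e6 / 53e6

print(f"structural domains: +{domain_growth:.1f}%")
print(f"sequence domains: about {sequence_fold:.2f}-fold")
```

The results agree with the abstract's wording: an increase of just over 40% in structural domains, and a sequence increase somewhat below two-fold.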


Subjects
Databases, Protein; Genome; Amino Acid Sequence; Animals; Conserved Sequence; Gene Ontology; Humans; Models, Molecular; Molecular Sequence Annotation; Multigene Family/genetics; Protein Conformation; Protein Domains/genetics; Sequence Alignment; Sequence Homology, Amino Acid; Structure-Activity Relationship
2.
Nucleic Acids Res ; 46(D1): D435-D439, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29112716

ABSTRACT

Gene3D (http://gene3d.biochem.ucl.ac.uk) is a database of globular domain annotations for millions of available protein sequences. Gene3D has previously featured in the Database issue of NAR and here we report a significant update to the Gene3D database. The current release, Gene3D v16, has significantly expanded its domain coverage over the previous version and now contains over 95 million domain assignments. We also report a new method for dealing with complex domain architectures that exist in Gene3D, arising from discontinuous domains. Amongst other updates, we have added visualization tools for exploring domain annotations in the context of other sequence features and in gene families. We also provide web-pages to visualize other domain families that co-occur with a given query domain family.


Subjects
Databases, Protein; Genome; Protein Domains; Proteins/chemistry; Software; Amino Acid Sequence; Animals; Computer Graphics; Humans; Internet; Molecular Sequence Annotation; Proteins/genetics; Proteins/metabolism; Sequence Analysis, Protein
3.
Nucleic Acids Res ; 45(D1): D289-D295, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27899584

ABSTRACT

The latest version of the CATH-Gene3D protein structure classification database has recently been released (version 4.1, http://www.cathdb.info). The resource comprises over 300 000 domain structures and over 53 million protein domains classified into 2737 homologous superfamilies, doubling the number of predicted protein domains in the previous version. The daily-updated CATH-B, which contains our very latest domain assignment data, provides putative classifications for over 100 000 additional protein domains. This article describes developments to the CATH-Gene3D resource over the last two years since the publication in 2015, including: significant increases to our structural and sequence coverage; expansion of the functional families in CATH; building a support vector machine (SVM) to automatically assign domains to superfamilies; improved search facilities to return alignments of query sequences against multiple sequence alignments; the redesign of the web pages and download site.


Subjects
Computational Biology/methods; Databases, Protein; Models, Molecular; Proteins/chemistry; Proteins/metabolism; Software; Structure-Activity Relationship; Web Browser
4.
Nucleic Acids Res ; 43(Database issue): D376-81, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25348408

ABSTRACT

The latest version of the CATH-Gene3D protein structure classification database (4.0, http://www.cathdb.info) provides annotations for over 235,000 protein domain structures and includes 25 million domain predictions. This article provides an update on the major developments in the 2 years since the last publication in this journal including: significant improvements to the predictive power of our functional families (FunFams); the release of our 'current' putative domain assignments (CATH-B); a new, strictly non-redundant data set of CATH domains suitable for homology benchmarking experiments (CATH-40) and a number of improvements to the web pages.


Subjects
Databases, Protein; Molecular Sequence Annotation; Protein Structure, Tertiary; Genomics; Internet; Protein Structure, Tertiary/genetics; Proteins/classification
5.
Nucleic Acids Res ; 43(Database issue): D382-6, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25348407

ABSTRACT

Genome3D (http://www.genome3d.eu) is a collaborative resource that provides predicted domain annotations and structural models for key sequences. Since introducing Genome3D in a previous NAR paper, we have substantially extended and improved the resource. We have annotated representatives from Pfam families to improve coverage of diverse sequences and added a fast sequence search to the website to allow users to find Genome3D-annotated sequences similar to their own. We have improved and extended the Genome3D data, enlarging the source data set from three model organisms to 10, and adding VIVACE, a resource new to Genome3D. We have analysed and updated Genome3D's SCOP/CATH mapping. Finally, we have improved the superposition tools, which now give users a more powerful interface for investigating similarities and differences between structural models.


Subjects
Databases, Protein; Molecular Sequence Annotation; Protein Structure, Tertiary; Algorithms; Genomics; Internet; Models, Molecular; Protein Structure, Tertiary/genetics; Sequence Analysis, Protein
7.
Nucleic Acids Res ; 41(Database issue): D490-8, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23203873

ABSTRACT

CATH version 3.5 (Class, Architecture, Topology, Homology, available at http://www.cathdb.info/) contains 173 536 domains, 2626 homologous superfamilies and 1313 fold groups. When focusing on structural genomics (SG) structures, we observe that the number of new folds for CATH v3.5 is slightly less than for previous releases, and this observation suggests that we may now know the majority of folds that are easily accessible to structure determination. We have improved the accuracy of our functional family (FunFams) sub-classification method and the CATH sequence domain search facility has been extended to provide FunFam annotations for each domain. The CATH website has been redesigned. We have improved the display of functional data and of conserved sequence features associated with FunFams within each CATH superfamily.
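
The four levels named in this abstract (Class, Architecture, Topology, Homology) can be pictured as a four-part identifier. The sketch below assumes a dot-separated numeric form such as `3.40.50.300`, which is not stated in the abstract; the `CathId` class, its methods and the example identifiers are hypothetical illustrations, not CATH's own software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CathId:
    """Hypothetical holder for a four-level C.A.T.H identifier."""
    class_: int
    architecture: int
    topology: int      # the fold group level
    homology: int      # the homologous superfamily level

    @classmethod
    def parse(cls, text: str) -> "CathId":
        # Assumed format: four dot-separated non-negative integers.
        parts = text.strip().split(".")
        if len(parts) != 4 or not all(p.isdigit() for p in parts):
            raise ValueError(f"expected four dot-separated integers: {text!r}")
        return cls(*(int(p) for p in parts))

    def same_fold(self, other: "CathId") -> bool:
        # Two superfamilies share a fold group when C, A and T all match.
        return (self.class_, self.architecture, self.topology) == (
            other.class_, other.architecture, other.topology)


a = CathId.parse("3.40.50.300")
b = CathId.parse("3.40.50.720")
print(a.same_fold(b), a == b)
```

Under this assumed scheme, several superfamilies can sit in one fold group, which is consistent with the abstract reporting more homologous superfamilies (2626) than fold groups (1313).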


Subjects
Databases, Protein; Protein Structure, Tertiary; Genomics; Internet; Molecular Sequence Annotation; Protein Folding; Proteins/chemistry; Proteins/classification; Proteins/genetics; Sequence Alignment; Sequence Analysis, Protein; Structural Homology, Protein
8.
Nucleic Acids Res ; 41(Database issue): D499-507, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23203986

ABSTRACT

Genome3D, available at http://www.genome3d.eu, is a new collaborative project that integrates UK-based structural resources to provide a unique perspective on sequence-structure-function relationships. Leading structure prediction resources (DomSerf, FUGUE, Gene3D, pDomTHREADER, Phyre and SUPERFAMILY) provide annotations for UniProt sequences to indicate the locations of structural domains (structural annotations) and their 3D structures (structural models). Structural annotations and 3D model predictions are currently available for three model genomes (Homo sapiens, E. coli and baker's yeast), and the project will extend to other genomes in the near future. As these resources exploit different strategies for predicting structures, the main aim of Genome3D is to enable comparisons between all the resources so that biologists can see where predictions agree and are therefore more trusted. Furthermore, as these methods differ in whether they build their predictions using CATH or SCOP, Genome3D also contains the first official mapping between these two databases. This has identified pairs of similar superfamilies from the two resources at various degrees of consensus (532 bronze pairs, 527 silver pairs and 370 gold pairs).


Subjects
Databases, Protein; Protein Structure, Tertiary; Genomics; Humans; Internet; Molecular Sequence Annotation; Proteins/chemistry; Proteins/classification; Proteins/genetics; Software
9.
Nucleic Acids Res ; 39(Database issue): D420-6, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21097779

ABSTRACT

CATH version 3.3 (class, architecture, topology, homology) contains 128,688 domains, 2386 homologous superfamilies and 1233 fold groups, and reflects a major focus on classifying structural genomics (SG) structures and transmembrane proteins, both of which are likely to add structural novelty to the database and therefore increase the coverage of protein fold space within CATH. For CATH version 3.4 we have significantly improved the presentation of sequence information and associated functional information for CATH superfamilies. The CATH superfamily pages now reflect both the functional and structural diversity within the superfamily and include structural alignments of close and distant relatives within the superfamily, annotated with functional information and details of conserved residues. A significantly more efficient search function for CATH has been established by implementing the search server Solr (http://lucene.apache.org/solr/). The CATH v3.4 webpages have been built using the Catalyst web framework.


Subjects
Databases, Protein; Protein Structure, Tertiary; Phylogeny; Protein Folding; Proteins/chemistry; Proteins/classification
10.
Nucleic Acids Res ; 37(Database issue): D310-4, 2009 Jan.
Article in English | MEDLINE | ID: mdl-18996897

ABSTRACT

The latest version of CATH (class, architecture, topology, homology) (version 3.2), released in July 2008 (http://www.cathdb.info), contains 114,215 domains, 2178 homologous superfamilies and 1110 fold groups. We have assigned 20,330 new domains, 87 new homologous superfamilies and 26 new folds since CATH release version 3.1. A total of 28,064 new domains have been assigned since our NAR 2007 database publication (CATH version 3.0). The CATH website has been completely redesigned and includes more comprehensive documentation. We have revisited the CATH architecture level as part of the development of a 'Protein Chart' and present information on the population of each architecture. The CATHEDRAL structure comparison algorithm has been improved and used to characterize structural diversity in CATH superfamilies and structural overlaps between superfamilies. Although the majority of superfamilies in CATH are not structurally diverse and do not overlap significantly with other superfamilies, approximately 4% of superfamilies are very diverse and these are the superfamilies that are most highly populated in both the PDB and in the genomes. Information on the degree of structural diversity in each superfamily and structural overlaps between superfamilies can now be downloaded from the CATH website.
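
The domain counts in this abstract can be cross-checked against each other by subtraction. A minimal sketch in Python, using only figures quoted above; the variable names are illustrative, and the version 3.1 total is an inference from these numbers rather than a figure reported in the abstract.

```python
# Figures quoted in the abstract for CATH version 3.2.
V3_2_DOMAINS = 114_215
NEW_SINCE_V3_1 = 20_330
NEW_SINCE_V3_0 = 28_064

# Implied domain totals for the two earlier releases.
implied_v3_1 = V3_2_DOMAINS - NEW_SINCE_V3_1
implied_v3_0 = V3_2_DOMAINS - NEW_SINCE_V3_0

print(f"implied v3.1 domains: {implied_v3_1}")
print(f"implied v3.0 domains: {implied_v3_0}")
```

The implied version 3.0 total, 86,151, matches the 86 151 domains reported for release 3.0 in the 2007 Nucleic Acids Res record in this list (ID: mdl-17135200).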


Subjects
Databases, Protein; Protein Structure, Tertiary; Models, Molecular; Protein Folding; Protein Structure, Secondary; Proteins/classification; Sequence Homology, Amino Acid
11.
Nucleic Acids Res ; 35(Database issue): D291-7, 2007 Jan.
Article in English | MEDLINE | ID: mdl-17135200

ABSTRACT

We report the latest release (version 3.0) of the CATH protein domain database (http://www.cathdb.info). There has been a 20% increase in the number of structural domains classified in CATH, up to 86 151 domains. Release 3.0 comprises 1110 fold groups and 2147 homologous superfamilies. To cope with the increases in diverse structural homologues being determined by the structural genomics initiatives, more sensitive methods have been developed for identifying boundaries in multi-domain proteins and for recognising homologues. The CATH classification update is now being driven by an integrated pipeline that links these automated procedures with validation steps that have been made easier by the provision of information-rich web pages summarising comparison scores and relevant links to external sites for each domain being classified. An analysis of the population of domains in the CATH hierarchy and several domain characteristics are presented for version 3.0. We also report an update of the CATH Dictionary of homologous structures (CATH-DHS), which now contains multiple structural alignments, consensus information and functional annotations for 1459 well-populated superfamilies in CATH. CATH is directly linked to the Gene3D database, which is a projection of CATH structural data onto approximately 2 million sequences in completed genomes and UniProt.


Subjects
Databases, Protein; Protein Structure, Tertiary; Classification/methods; Evolution, Molecular; Internet; Protein Folding; Protein Structure, Tertiary/genetics; Proteins/classification; Sequence Homology, Amino Acid; Structural Homology, Protein; User-Computer Interface
12.
BMC Bioinformatics ; 8: 86, 2007 Mar 09.
Article in English | MEDLINE | ID: mdl-17349043

ABSTRACT

BACKGROUND: Structural genomics initiatives were established with the aim of solving protein structures on a large scale. For many initiatives, such as the Protein Structure Initiative (PSI), the primary aim of target selection is to structurally characterise protein families which, so far, lack a structural representative. It is therefore of considerable interest to gain insights into the number and distribution of these families, and what efforts may be required to achieve a comprehensive structural coverage across all protein families. RESULTS: In this analysis we have derived a comprehensive domain annotation of the genomes using CATH, Pfam-A and Newfam domain families. We consider what proportions of structurally uncharacterized families are accessible to high-throughput structural genomics pipelines, specifically those targeting families containing multiple prokaryotic orthologues. In measuring the domain coverage of the genomes, we show the benefits of selecting targets from structurally uncharacterized domain families whilst also pursuing additional targets from large structurally characterised protein superfamilies. CONCLUSION: This work suggests that such a combined approach to target selection is essential if structural genomics is to achieve a comprehensive structural coverage of the genomes, leading to greater insights into structure and the mechanisms that underlie protein evolution.


Subjects
Databases, Protein; Genome/genetics; Genomics; Animals; Genomics/methods; Humans; Multigene Family; Sequence Analysis, Protein/methods; Structural Homology, Protein
13.
Nucleic Acids Res ; 33(Database issue): D247-51, 2005 Jan 01.
Article in English | MEDLINE | ID: mdl-15608188

ABSTRACT

The CATH database of protein domain structures (http://www.biochem.ucl.ac.uk/bsm/cath/) currently contains 43,229 domains classified into 1467 superfamilies and 5107 sequence families. Each structural family is expanded with sequence relatives from GenBank and completed genomes, using a variety of efficient sequence search protocols and reliable thresholds. This extended CATH protein family database contains 616,470 domain sequences classified into 23,876 sequence families. This results in the significant expansion of the CATH HMM model library to include models built from the CATH sequence relatives, giving a 10% increase in coverage for detecting remote homologues. An improved Dictionary of Homologous superfamilies (DHS) (http://www.biochem.ucl.ac.uk/bsm/dhs/) containing specific sequence, structural and functional information for each superfamily in CATH considerably assists manual validation of homologues. Information on sequence relatives in CATH superfamilies, GenBank and completed genomes is presented in the CATH associated DHS and Gene3D resources. Domain partnership information can be obtained from Gene3D (http://www.biochem.ucl.ac.uk/bsm/cath/Gene3D/). A new CATH server has been implemented (http://www.biochem.ucl.ac.uk/cgi-bin/cath/CathServer.pl) providing automatic classification of newly determined sequences and structures using a suite of rapid sequence and structure comparison methods. The statistical significance of matches is assessed and links are provided to the putative superfamily or fold group to which the query sequence or structure is assigned.


Subjects
Databases, Nucleic Acid; Databases, Protein; Genomics; Protein Structure, Tertiary; Proteins/classification; Sequence Analysis, Protein; Databases, Protein/statistics & numerical data; Internet; Proteins/genetics; Sequence Homology, Amino Acid; Systems Integration; User-Computer Interface
14.
Curr Protoc Bioinformatics ; 50: 1.28.1-1.28.21, 2015 Jun 19.
Article in English | MEDLINE | ID: mdl-26087950

ABSTRACT

The CATH database is a classification of protein structures found in the Protein Data Bank (PDB). Protein structures are chopped into individual units of structural domains, and these domains are grouped together into superfamilies if there is sufficient evidence that they have diverged from a common ancestor during the process of evolution. A sister resource, Gene3D, extends this information by scanning sequence profiles of these CATH domain superfamilies against many millions of known proteins to identify related sequences. Thus the combined CATH-Gene3D resource provides confident predictions of the likely structural fold, domain organisation, and evolutionary relatives of these proteins. In addition, this resource incorporates annotations from a large number of external databases such as known enzyme active sites, GO molecular functions, physical interactions, and mutations. This unit details how to access and understand the information contained within the CATH-Gene3D Web pages, the downloadable data files, and the remotely accessible Web services.


Subjects
Databases, Protein; Proteins/chemistry; Amino Acid Sequence; Molecular Sequence Data; Protein Structure, Tertiary; Search Engine
15.
Vet Rec ; 183(10): 330, 2018 09 15.
Article in English | MEDLINE | ID: mdl-30217918

Subjects
Commerce; Animals; Dogs; Female
16.
Structure ; 17(8): 1051-62, 2009 Aug 12.
Article in English | MEDLINE | ID: mdl-19679085

ABSTRACT

This paper explores the structural continuum in CATH and the extent to which superfamilies adopt distinct folds. Although most superfamilies are structurally conserved, in some of the most highly populated superfamilies (4% of all superfamilies) there is considerable structural divergence. While relatives share a similar fold in the evolutionarily conserved core, diverse elaborations to this core can result in significant differences in the global structures. Applying similar protocols to examine the extent to which structural overlaps occur between different fold groups, it appears this effect is confined to just a few architectures and is largely due to small, recurring super-secondary motifs (e.g., alpha-beta motifs, alpha-hairpins). Although 24% of superfamilies overlap with superfamilies having different folds, only 14% of nonredundant structures in CATH are involved in overlaps. Nevertheless, the existence of these overlaps suggests that, in some regions of structure space, the fold universe should be seen as more continuous.


Subjects
Databases, Protein; Protein Structure, Tertiary; Proteins/chemistry; Computational Biology/methods; Models, Molecular; Protein Folding; Protein Structure, Secondary; Proteins/classification