Results 1 - 7 of 7
1.
Nucleic Acids Res ; 35(Database issue): D291-7, 2007 Jan.
Article in English | MEDLINE | ID: mdl-17135200

ABSTRACT

We report the latest release (version 3.0) of the CATH protein domain database (http://www.cathdb.info). There has been a 20% increase in the number of structural domains classified in CATH, up to 86 151 domains. Release 3.0 comprises 1110 fold groups and 2147 homologous superfamilies. To cope with the increases in diverse structural homologues being determined by the structural genomics initiatives, more sensitive methods have been developed for identifying boundaries in multi-domain proteins and for recognising homologues. The CATH classification update is now being driven by an integrated pipeline that links these automated procedures with validation steps, which have been made easier by the provision of information-rich web pages summarising comparison scores and relevant links to external sites for each domain being classified. An analysis of the population of domains in the CATH hierarchy and several domain characteristics are presented for version 3.0. We also report an update of the CATH Dictionary of Homologous Structures (CATH-DHS), which now contains multiple structural alignments, consensus information and functional annotations for 1459 well-populated superfamilies in CATH. CATH is directly linked to the Gene3D database, which is a projection of CATH structural data onto approximately 2 million sequences in completed genomes and UniProt.


Subject(s)
Protein Databases , Protein Tertiary Structure , Classification/methods , Molecular Evolution , Internet , Protein Folding , Protein Tertiary Structure/genetics , Proteins/classification , Amino Acid Sequence Homology , Protein Structural Homology , User-Computer Interface
2.
Nucleic Acids Res ; 35(Database issue): D224-8, 2007 Jan.
Article in English | MEDLINE | ID: mdl-17202162

ABSTRACT

InterPro is an integrated resource for protein families, domains and functional sites, which integrates the following protein signature databases: PROSITE, PRINTS, ProDom, Pfam, SMART, TIGRFAMs, PIRSF, SUPERFAMILY, Gene3D and PANTHER. The latter two new member databases have been integrated since the last publication in this journal. There have been several new developments in InterPro, including an additional reading field, new database links, extensions to the web interface and additional match XML files. InterPro has always provided matches to UniProtKB proteins on the website and in the match XML file on the FTP site. Additional matches to proteins in UniParc (UniProt archive) are now available for download in the new match XML files only. The latest InterPro release (13.0) contains more than 13 000 entries, covering over 78% of all proteins in UniProtKB. The database is available for text- and sequence-based searches via a webserver (http://www.ebi.ac.uk/interpro), and for download by anonymous FTP (ftp://ftp.ebi.ac.uk/pub/databases/interpro). The InterProScan search tool is now also available via a web service at http://www.ebi.ac.uk/Tools/webservices/WSInterProScan.html.


Subject(s)
Protein Databases , Internet , Protein Tertiary Structure , Proteins/chemistry , Proteins/classification , Proteins/physiology , Protein Sequence Analysis , Systems Integration , User-Computer Interface
3.
Nucleic Acids Res ; 34(Database issue): D281-4, 2006 Jan 01.
Article in English | MEDLINE | ID: mdl-16381865

ABSTRACT

The Gene3D release 4 database and web portal (http://cathwww.biochem.ucl.ac.uk:8080/Gene3D) provide a combined structural, functional and evolutionary view of the protein world. It is focussed on providing structural annotation for protein sequences without structural representatives, including the complete proteome sets of over 240 different species. The protein sequences have also been clustered into whole-chain families so as to aid functional prediction. The structural annotation is generated using HMM models based on the CATH domain families; CATH is a repository for manually deduced protein domains. Amongst the changes from the last publication are: the addition of over 100 genomes and the UniProt sequence database, domain data from Pfam, metabolic pathway and functional data from COGs, KEGG and GO, and protein-protein interaction data from MINT and BIND. The website has been rebuilt to allow more sophisticated querying, and the data returned are presented in a clearer format with greater functionality. Furthermore, all data can be downloaded in a simple XML format, allowing users to carry out complex investigations at their own computers.


Subject(s)
Protein Databases , Proteins/chemistry , Proteins/genetics , Molecular Evolution , Genomics , Internet , Molecular Models , Protein Tertiary Structure , Proteins/physiology , Proteome/chemistry , Protein Sequence Analysis , User-Computer Interface
4.
Nucleic Acids Res ; 33(Database issue): D247-51, 2005 Jan 01.
Article in English | MEDLINE | ID: mdl-15608188

ABSTRACT

The CATH database of protein domain structures (http://www.biochem.ucl.ac.uk/bsm/cath/) currently contains 43,229 domains classified into 1467 superfamilies and 5107 sequence families. Each structural family is expanded with sequence relatives from GenBank and completed genomes, using a variety of efficient sequence search protocols and reliable thresholds. This extended CATH protein family database contains 616,470 domain sequences classified into 23,876 sequence families. This results in the significant expansion of the CATH HMM model library to include models built from the CATH sequence relatives, giving a 10% increase in coverage for detecting remote homologues. An improved Dictionary of Homologous superfamilies (DHS) (http://www.biochem.ucl.ac.uk/bsm/dhs/) containing specific sequence, structural and functional information for each superfamily in CATH considerably assists manual validation of homologues. Information on sequence relatives in CATH superfamilies, GenBank and completed genomes is presented in the CATH associated DHS and Gene3D resources. Domain partnership information can be obtained from Gene3D (http://www.biochem.ucl.ac.uk/bsm/cath/Gene3D/). A new CATH server has been implemented (http://www.biochem.ucl.ac.uk/cgi-bin/cath/CathServer.pl) providing automatic classification of newly determined sequences and structures using a suite of rapid sequence and structure comparison methods. The statistical significance of matches is assessed and links are provided to the putative superfamily or fold group to which the query sequence or structure is assigned.


Subject(s)
Nucleic Acid Databases , Protein Databases , Genomics , Protein Tertiary Structure , Proteins/classification , Protein Sequence Analysis , Protein Databases/statistics & numerical data , Internet , Proteins/genetics , Amino Acid Sequence Homology , Systems Integration , User-Computer Interface
5.
PLoS One ; 12(11): e0188543, 2017.
Article in English | MEDLINE | ID: mdl-29166669

ABSTRACT

BACKGROUND: A large and growing number of inherited genetic disease mutations are now known in the dog. Frequencies of these mutations are typically examined within the breed of discovery, possibly in related breeds, but nearly always in purebred dogs. No report to date has examined the frequencies of specific genetic disease mutations in a large population of mixed-breed dogs. Further, veterinarians and dog owners typically dismiss inherited/genetic diseases as possibilities for health problems in mixed-breed dogs, assuming hybrid vigor will guarantee that single-gene disease mutations are not a cause for concern. Therefore, the objective of this study was to screen a large mixed-breed canine population for the presence of mutant alleles associated with five autosomal recessive disorders: hyperuricosuria and hyperuricemia (HUU), cystinuria (CYST), factor VII deficiency (FVIID), myotonia congenita (MYC) and phosphofructokinase deficiency (PKFD). Genetic testing was performed in conjunction with breed determination via the commercially available Wisdom Panel™ test. RESULTS: From a population of nearly 35,000 dogs, homozygous mutant dogs were identified for HUU (n = 57) and FVIID (n = 65). Homozygotes for HUU and FVIID were identified even among dogs with highly mixed breed ancestry. Carriers were identified for all disorders except MYC. HUU and FVIID were of high enough frequency to merit consideration in any mixed-breed dog, while CYST, MYC, and PKFD are vanishingly rare. CONCLUSIONS: The assumption that mixed-breed dogs do not suffer from single-gene genetic disorders is shown here to be false. Within the diseases examined, HUU and FVIID should remain on any practitioner's rule-out list, when clinically appropriate, for all mixed-breed dogs, and judicious genetic testing should be performed for diagnosis or screening. Future testing of large mixed-breed dog populations that include additional known canine genetic mutations will refine our knowledge of which genetic diseases can strike mixed-breed dogs.


Subject(s)
Breeding , Genetic Predisposition to Disease , Mutation Rate , Mutation/genetics , Alleles , Animals , Dogs
6.
Protein Sci ; 14(7): 1800-10, 2005 Jul.
Article in English | MEDLINE | ID: mdl-15937274

ABSTRACT

There are more than 200 completed genomes and over 1 million nonredundant sequences in public repositories. Although the structural data are more sparse (approximately 13,000 nonredundant structures solved to date), several powerful sequence-based methodologies now allow these structures to be mapped onto related regions in a significant proportion of genome sequences. We review a number of publicly available strategies for providing structural annotations for genome sequences, and we describe the protocol adopted to provide CATH structural annotations for completed genomes. In particular, we assess the performance of several sequence-based protocols employing Hidden Markov model (HMM) technologies for superfamily recognition, including a new approach (SAMOSA [sequence augmented models of structure alignments]) that exploits multiple structural alignments from the CATH domain structure database when building the models. Using a data set of remote homologs detected by structure comparison and manually validated in CATH, a single-seed HMM library was able to recognize 76% of the data set. Including the SAMOSA models in the HMM library showed little gain in homolog recognition, although a slight improvement in alignment quality was observed for very remote homologs. However, using an expanded 1D-HMM library, CATH-ISL increased the coverage to 86%. The single-seed HMM library has been used to annotate the protein sequences of 120 genomes from all three major kingdoms, allowing up to 70% of the genes or partial genes to be assigned to CATH superfamilies. It has also been used to recruit sequences from Swiss-Prot and TrEMBL into CATH domain superfamilies, expanding the CATH database eightfold.


Subject(s)
Protein Databases , Genome , Markov Chains , Protein Tertiary Structure , Proteins/chemistry , Proteins/genetics , Protein Sequence Analysis , Protein Databases/statistics & numerical data , Proteins/classification , Sequence Homology
7.
Structure ; 17(8): 1051-62, 2009 Aug 12.
Article in English | MEDLINE | ID: mdl-19679085

ABSTRACT

This paper explores the structural continuum in CATH and the extent to which superfamilies adopt distinct folds. Although most superfamilies are structurally conserved, in some of the most highly populated superfamilies (4% of all superfamilies) there is considerable structural divergence. While relatives share a similar fold in the evolutionarily conserved core, diverse elaborations to this core can result in significant differences in the global structures. Applying similar protocols to examine the extent to which structural overlaps occur between different fold groups, it appears this effect is confined to just a few architectures and is largely due to small, recurring super-secondary motifs (e.g., αβ-motifs, α-hairpins). Although 24% of superfamilies overlap with superfamilies having different folds, only 14% of nonredundant structures in CATH are involved in overlaps. Nevertheless, the existence of these overlaps suggests that, in some regions of structure space, the fold universe should be seen as more continuous.


Subject(s)
Protein Databases , Protein Tertiary Structure , Proteins/chemistry , Computational Biology/methods , Molecular Models , Protein Folding , Protein Secondary Structure , Proteins/classification