1.
J Mol Biol ; 387(2): 416-30, 2009 Mar 27.
Article in English | MEDLINE | ID: mdl-19135455

ABSTRACT

Divergence in function of homologous proteins is based on both sequence and structural changes. Overall enzyme function has been reported to diverge earlier (50% sequence identity) than overall structure (35%). We herein study the functional conservation of enzymes and non-enzyme sequences using the protein domain families in CATH-Gene3D. Despite the rapid increase in sequence data since the last comprehensive study by Tian and Skolnick, our findings suggest that generic thresholds of 40% and 60% aligned sequence identity are still sufficient to safely inherit third-level and full Enzyme Commission numbers, respectively. This increases to 50% and 70% on the domain level, unless the multi-domain architecture matches. Assignments from the Kyoto Encyclopedia of Genes and Genomes and the Munich Information Center for Protein Sequences Functional Catalogue seem to be less conserved with sequence, probably due to a more pathway-centric view: 80% domain sequence identity is required for safe function transfer. Comparing domains (more pairwise relationships) and the use of family-specific thresholds (varying evolutionary speeds) yields the highest coverage rates when transferring functions to model proteomes. An average twofold increase in enzyme annotations is seen for 523 proteomes in Gene3D. As simple 'rules of thumb', sequence identity thresholds do not require a bioinformatics background. We will provide and update this information with future releases of CATH-Gene3D.


Subjects
Proteins/chemistry; Proteins/metabolism; Sequence Analysis, Protein; Amino Acid Sequence; Enzymes/chemistry; Enzymes/metabolism; Genome/genetics; Models, Biological; Multigene Family; Protein Structure, Tertiary; Proteome/chemistry; Proteome/metabolism; Sequence Homology, Amino Acid
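
The identity thresholds quoted in the abstract of entry 1 can be read as a simple decision rule. The following is a minimal sketch, not the authors' implementation: the function name, the dictionary keys, the inclusive comparison (`>=`), and the choice to fall back to the whole-sequence thresholds when the multi-domain architecture matches are all assumptions made for illustration. The EC number used in the usage note is an arbitrary example.

```python
from typing import Optional

# Generic thresholds quoted in the abstract (percent aligned sequence identity).
# The keys and names below are hypothetical, chosen for this sketch only.
THRESHOLDS = {
    "sequence": {"third_level": 40.0, "full": 60.0},
    "domain": {"third_level": 50.0, "full": 70.0},
}


def transferable_ec(ec_number: str, identity: float, level: str = "sequence",
                    architecture_matches: bool = False) -> Optional[str]:
    """Return the part of ec_number that may be inherited, or None."""
    # Assumption: a matching multi-domain architecture lets a domain-level
    # comparison use the lower whole-sequence thresholds.
    if level == "domain" and architecture_matches:
        level = "sequence"
    limits = THRESHOLDS[level]
    # Assumption: thresholds are inclusive.
    if identity >= limits["full"]:
        return ec_number
    if identity >= limits["third_level"]:
        # Keep the first three EC levels and mask the fourth.
        parts = ec_number.split(".")
        return ".".join(parts[:3] + ["-"])
    return None
```

Under these assumptions, `transferable_ec("3.4.21.4", 45.0)` keeps only the first three levels, while the same identity at the domain level without a matching architecture transfers nothing. The 80% domain-level threshold reported for KEGG and FunCat assignments is not modelled here.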
2.
Nucleic Acids Res ; 35(Database issue): D291-7, 2007 Jan.
Article in English | MEDLINE | ID: mdl-17135200

ABSTRACT

We report the latest release (version 3.0) of the CATH protein domain database (http://www.cathdb.info). There has been a 20% increase in the number of structural domains classified in CATH, up to 86 151 domains. Release 3.0 comprises 1110 fold groups and 2147 homologous superfamilies. To cope with the increases in diverse structural homologues being determined by the structural genomics initiatives, more sensitive methods have been developed for identifying boundaries in multi-domain proteins and for recognising homologues. The CATH classification update is now being driven by an integrated pipeline that links these automated procedures with validation steps, that have been made easier by the provision of information rich web pages summarising comparison scores and relevant links to external sites for each domain being classified. An analysis of the population of domains in the CATH hierarchy and several domain characteristics are presented for version 3.0. We also report an update of the CATH Dictionary of homologous structures (CATH-DHS) which now contains multiple structural alignments, consensus information and functional annotations for 1459 well populated superfamilies in CATH. CATH is directly linked to the Gene3D database which is a projection of CATH structural data onto approximately 2 million sequences in completed genomes and UniProt.


Subjects
Databases, Protein; Protein Structure, Tertiary; Classification/methods; Evolution, Molecular; Internet; Protein Folding; Protein Structure, Tertiary/genetics; Proteins/classification; Sequence Homology, Amino Acid; Structural Homology, Protein; User-Computer Interface
3.
Philos Trans R Soc Lond B Biol Sci ; 361(1467): 425-40, 2006 Mar 29.
Article in English | MEDLINE | ID: mdl-16524831

ABSTRACT

New directions in biology are being driven by the complete sequencing of genomes, which has given us the protein repertoires of diverse organisms from all kingdoms of life. In tandem with this accumulation of sequence data, worldwide structural genomics initiatives, advanced by the development of improved technologies in X-ray crystallography and NMR, are expanding our knowledge of structural families and increasing our fold libraries. Methods for detecting remote sequence similarities have also been made more sensitive and this means that we can map domains from these structural families onto genome sequences to understand how these families are distributed throughout the genomes and reveal how they might influence the functional repertoires and biological complexities of the organisms. We have used robust protocols to assign sequences from completed genomes to domain structures in the CATH database, allowing up to 60% of domain sequences in these genomes, depending on the organism, to be assigned to a domain family of known structure. Analysis of the distribution of these families throughout bacterial genomes identified more than 300 universal families, some of which had expanded significantly in proportion to genome size. These highly expanded families are primarily involved in metabolism and regulation and appear to make major contributions to the functional repertoire and complexity of bacterial organisms. When comparisons are made across all kingdoms of life, we find a smaller set of universal domain families (approx. 140), of which families involved in protein biosynthesis are the largest conserved component. Analysis of the behaviour of other families reveals that some (e.g. those involved in metabolism, regulation) have remained highly innovative during evolution, making it harder to trace their evolutionary ancestry. 
Structural analyses of metabolic families provide some insights into the mechanisms of functional innovation, which include changes in domain partnerships and significant structural embellishments leading to modulation of active sites and protein interactions.


Subjects
Evolution, Molecular; Proteins/chemistry; Proteins/metabolism; Algorithms; Computational Biology; Databases, Factual; Protein Conformation
4.
Nucleic Acids Res ; 34(Database issue): D281-4, 2006 Jan 01.
Article in English | MEDLINE | ID: mdl-16381865

ABSTRACT

The Gene3D release 4 database and web portal (http://cathwww.biochem.ucl.ac.uk:8080/Gene3D) provide a combined structural, functional and evolutionary view of the protein world. It is focussed on providing structural annotation for protein sequences without structural representatives--including the complete proteome sets of over 240 different species. The protein sequences have also been clustered into whole-chain families so as to aid functional prediction. The structural annotation is generated using HMM models based on the CATH domain families; CATH is a repository for manually deduced protein domains. Amongst the changes from the last publication are: the addition of over 100 genomes and the UniProt sequence database, domain data from Pfam, metabolic pathway and functional data from COGs, KEGG and GO, and protein-protein interaction data from MINT and BIND. The website has been rebuilt to allow more sophisticated querying and the data returned is presented in a clearer format with greater functionality. Furthermore, all data can be downloaded in a simple XML format, allowing users to carry out complex investigations at their own computers.


Subjects
Databases, Protein; Proteins/chemistry; Proteins/genetics; Evolution, Molecular; Genomics; Internet; Models, Molecular; Protein Structure, Tertiary; Proteins/physiology; Proteome/chemistry; Sequence Analysis, Protein; User-Computer Interface
5.
Protein Sci ; 14(7): 1800-10, 2005 Jul.
Article in English | MEDLINE | ID: mdl-15937274

ABSTRACT

There are more than 200 completed genomes and over 1 million nonredundant sequences in public repositories. Although the structural data are more sparse (approximately 13,000 nonredundant structures solved to date), several powerful sequence-based methodologies now allow these structures to be mapped onto related regions in a significant proportion of genome sequences. We review a number of publicly available strategies for providing structural annotations for genome sequences, and we describe the protocol adopted to provide CATH structural annotations for completed genomes. In particular, we assess the performance of several sequence-based protocols employing Hidden Markov model (HMM) technologies for superfamily recognition, including a new approach (SAMOSA [sequence augmented models of structure alignments]) that exploits multiple structural alignments from the CATH domain structure database when building the models. Using a data set of remote homologs detected by structure comparison and manually validated in CATH, a single-seed HMM library was able to recognize 76% of the data set. Including the SAMOSA models in the HMM library showed little gain in homolog recognition, although a slight improvement in alignment quality was observed for very remote homologs. However, using an expanded 1D-HMM library, CATH-ISL increased the coverage to 86%. The single-seed HMM library has been used to annotate the protein sequences of 120 genomes from all three major kingdoms, allowing up to 70% of the genes or partial genes to be assigned to CATH superfamilies. It has also been used to recruit sequences from Swiss-Prot and TrEMBL into CATH domain superfamilies, expanding the CATH database eightfold.


Subjects
Databases, Protein; Genome; Markov Chains; Protein Structure, Tertiary; Proteins/chemistry; Proteins/genetics; Sequence Analysis, Protein; Databases, Protein/statistics & numerical data; Proteins/classification; Sequence Homology
6.
Nucleic Acids Res ; 33(Database issue): D247-51, 2005 Jan 01.
Article in English | MEDLINE | ID: mdl-15608188

ABSTRACT

The CATH database of protein domain structures (http://www.biochem.ucl.ac.uk/bsm/cath/) currently contains 43,229 domains classified into 1467 superfamilies and 5107 sequence families. Each structural family is expanded with sequence relatives from GenBank and completed genomes, using a variety of efficient sequence search protocols and reliable thresholds. This extended CATH protein family database contains 616,470 domain sequences classified into 23,876 sequence families. This results in the significant expansion of the CATH HMM model library to include models built from the CATH sequence relatives, giving a 10% increase in coverage for detecting remote homologues. An improved Dictionary of Homologous superfamilies (DHS) (http://www.biochem.ucl.ac.uk/bsm/dhs/) containing specific sequence, structural and functional information for each superfamily in CATH considerably assists manual validation of homologues. Information on sequence relatives in CATH superfamilies, GenBank and completed genomes is presented in the CATH associated DHS and Gene3D resources. Domain partnership information can be obtained from Gene3D (http://www.biochem.ucl.ac.uk/bsm/cath/Gene3D/). A new CATH server has been implemented (http://www.biochem.ucl.ac.uk/cgi-bin/cath/CathServer.pl) providing automatic classification of newly determined sequences and structures using a suite of rapid sequence and structure comparison methods. The statistical significance of matches is assessed and links are provided to the putative superfamily or fold group to which the query sequence or structure is assigned.


Subjects
Databases, Nucleic Acid; Databases, Protein; Genomics; Protein Structure, Tertiary; Proteins/classification; Sequence Analysis, Protein; Databases, Protein/statistics & numerical data; Internet; Proteins/genetics; Sequence Homology, Amino Acid; Systems Integration; User-Computer Interface
7.
Hum Mutat ; 22(3): 209-13, 2003 Sep.
Article in English | MEDLINE | ID: mdl-12938085

ABSTRACT

Trimethylaminuria (TMAuria), or fish-odor syndrome, is due to defective flavin-containing monooxygenase 3 (FMO3). In the liver, this protein catalyzes the NADPH-dependent oxidative metabolism of odorous trimethylamine (TMA), derived in the gut from dietary sources, to nonodorous trimethylamine N-oxide (TMA N-oxide). Affected individuals are unable to carry out this reaction and consequently exude a fishy body odor, due to the secretion of TMA in their breath and sweat and its excretion in their urine. This leads to a variety of psychosocial problems, including disruption of schooling, clinical depression, and attempted suicide. Twelve missense, three nonsense, and one gross deletion mutation are known to cause TMAuria. FMO3 is also a drug-metabolizing enzyme and compromised activity is expected to have implications for the efficacy of drug treatment and the possibility of adverse drug reactions both in TMAuric patients and in the general population. To date eight polymorphic variants, not associated with TMAuria, have been reported. A human FMO3 mutation database was created using MuStar, a locus-specific database system for maintaining data about allelic variants and distributing these via the World Wide Web. The database currently contains 24 entries and is accessible on the World Wide Web via the URL http://human-fmo3.biochem.ucl.ac.uk/Human_FMO3. Additional entries can be submitted via the curator of the database or via a web-based form.


Subjects
Databases, Genetic; Metabolism, Inborn Errors/genetics; Metabolism, Inborn Errors/urine; Methylamines/urine; Mutation; Oxygenases/genetics; Alleles; Animals; Databases, Genetic/trends; Genetic Variation; Humans; Oxygenases/metabolism