Search | BVS (Virtual Health Library) Search Portal

CDD: a conserved domain database for interactive domain family analysis.

Marchler-Bauer, Aron; Anderson, John B; Derbyshire, Myra K; DeWeese-Scott, Carol; Gonzales, Noreen R; Gwadz, Marc; Hao, Luning; He, Siqian; Hurwitz, David I; Jackson, John D; Ke, Zhaoxi; Krylov, Dmitri; Lanczycki, Christopher J; Liebert, Cynthia A; Liu, Chunlei; Lu, Fu; Lu, Shennan; Marchler, Gabriele H; Mullokandov, Mikhail; Song, James S; Thanki, Narmada; Yamashita, Roxanne A; Yin, Jodie J; Zhang, Dachuan; Bryant, Stephen H.

Nucleic Acids Res ; 35(Database issue): D237-40, 2007 Jan.

Article in English | MEDLINE | ID: mdl-17135202

ABSTRACT

The conserved domain database (CDD) is part of NCBI's Entrez database system and serves as a primary resource for the annotation of conserved domain footprints on protein sequences in Entrez. Entrez's global query interface can be accessed at http://www.ncbi.nlm.nih.gov/Entrez and will search CDD and many other databases. Domain annotation for proteins in Entrez has been pre-computed and is readily available in the form of 'Conserved Domain' links. Novel protein sequences can be scanned against CDD using the CD-Search service; this service searches databases of CDD-derived profile models with protein sequence queries using BLAST heuristics, at http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi. Protein query sequences submitted to NCBI's protein BLAST search service are scanned for conserved domain signatures by default. The CDD collection contains models imported from Pfam, SMART and COG, as well as domain models curated at NCBI. NCBI curated models are organized into hierarchies of domains related by common descent. Here we report on the status of the curation effort and present a novel helper application, CDTree, which enables users of the CDD resource to examine curated hierarchies. More importantly, CDD and CDTree, used in concert, serve as a powerful tool in protein classification, as they allow users to analyze protein sequences in the context of domain family hierarchies.

Subjects

Databases, Protein; Protein Structure, Tertiary; Amino Acid Sequence; Animals; Conserved Sequence; Internet; Phylogeny; Protein Structure, Tertiary/genetics; Proteins/classification; Sequence Analysis, Protein; User-Computer Interface

CDD: a Conserved Domain Database for protein classification.

Marchler-Bauer, Aron; Anderson, John B; Cherukuri, Praveen F; DeWeese-Scott, Carol; Geer, Lewis Y; Gwadz, Marc; He, Siqian; Hurwitz, David I; Jackson, John D; Ke, Zhaoxi; Lanczycki, Christopher J; Liebert, Cynthia A; Liu, Chunlei; Lu, Fu; Marchler, Gabriele H; Mullokandov, Mikhail; Shoemaker, Benjamin A; Simonyan, Vahan; Song, James S; Thiessen, Paul A; Yamashita, Roxanne A; Yin, Jodie J; Zhang, Dachuan; Bryant, Stephen H.

Nucleic Acids Res ; 33(Database issue): D192-6, 2005 Jan 01.

Article in English | MEDLINE | ID: mdl-15608175

ABSTRACT

The Conserved Domain Database (CDD) is the protein classification component of NCBI's Entrez query and retrieval system. CDD is linked to other Entrez databases such as Proteins, Taxonomy and PubMed, and can be accessed at http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=cdd. CD-Search, which is available at http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi, is a fast, interactive tool to identify conserved domains in new protein sequences. CD-Search results for protein sequences in Entrez are pre-computed to provide links between proteins and domain models, and computational annotation visible upon request. Protein-protein queries submitted to NCBI's BLAST search service at http://www.ncbi.nlm.nih.gov/BLAST are scanned for the presence of conserved domains by default. While CDD started out as essentially a mirror of publicly available domain alignment collections, such as SMART, Pfam and COG, we have continued an effort to update, and in some cases replace these models with domain hierarchies curated at the NCBI. Here, we report on the progress of the curation effort and associated improvements in the functionality of the CDD information retrieval system.

Subjects

Databases, Protein; Protein Structure, Tertiary; Proteins/classification; Amino Acid Sequence; Conserved Sequence; Phylogeny; Sequence Alignment; Sequence Analysis, Protein; User-Computer Interface

CDD: a curated Entrez database of conserved domain alignments.

Marchler-Bauer, Aron; Anderson, John B; DeWeese-Scott, Carol; Fedorova, Natalie D; Geer, Lewis Y; He, Siqian; Hurwitz, David I; Jackson, John D; Jacobs, Aviva R; Lanczycki, Christopher J; Liebert, Cynthia A; Liu, Chunlei; Madej, Thomas; Marchler, Gabriele H; Mazumder, Raja; Nikolskaya, Anastasia N; Panchenko, Anna R; Rao, Bachoti S; Shoemaker, Benjamin A; Simonyan, Vahan; Song, James S; Thiessen, Paul A; Vasudevan, Sona; Wang, Yanli; Yamashita, Roxanne A; Yin, Jodie J; Bryant, Stephen H.

Nucleic Acids Res ; 31(1): 383-7, 2003 Jan 01.

Article in English | MEDLINE | ID: mdl-12520028

ABSTRACT

The Conserved Domain Database (CDD) is now indexed as a separate database within the Entrez system and linked to other Entrez databases such as MEDLINE(R). This allows users to search for domain types by name, for example, or to view the domain architecture of any protein in Entrez's sequence database. CDD can be accessed on the WorldWideWeb at http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=cdd. Users may also employ the CD-Search service to identify conserved domains in new sequences, at http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi. CD-Search results, and pre-computed links from Entrez's protein database, are calculated using the RPS-BLAST algorithm and Position Specific Score Matrices (PSSMs) derived from CDD alignments. CD-Searches are also run by default for protein-protein queries submitted to BLAST(R) at http://www.ncbi.nlm.nih.gov/BLAST. CDD mirrors the publicly available domain alignment collections SMART and PFAM, and now also contains alignment models curated at NCBI. Structure information is used to identify the core substructure likely to be present in all family members, and to produce sequence alignments consistent with structure conservation. This alignment model allows NCBI curators to annotate 'columns' corresponding to functional sites conserved among family members.
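The abstract above states that CD-Search scores queries against Position Specific Score Matrices (PSSMs) derived from CDD alignments. As a rough illustration only, the Python sketch below builds a log-odds PSSM from a toy ungapped alignment and slides it along a query. It assumes a uniform background, a flat pseudocount and no gaps; it is not RPS-BLAST, which adds BLAST heuristics, gapped alignment and statistical significance estimates. The motif, query and function names (`build_pssm`, `best_hit`) are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# One dict per alignment column, mapping residue -> log-odds score in bits.
Pssm = List[Dict[str, float]]


def build_pssm(alignment: Sequence[str], pseudocount: float = 1.0) -> Pssm:
    """Log-odds PSSM (bits) from equal-length, ungapped aligned sequences.

    Assumes a uniform background of 1/20 per residue and a flat pseudocount
    added to every residue type in every column.
    """
    background = 1.0 / len(AMINO_ACIDS)
    pssm: Pssm = []
    for col in range(len(alignment[0])):
        counts = Counter(seq[col] for seq in alignment if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def best_hit(pssm: Pssm, query: str) -> Optional[Tuple[int, float]]:
    """Slide the PSSM along the query without gaps; return (offset, score)."""
    width = len(pssm)
    best: Optional[Tuple[int, float]] = None
    for start in range(len(query) - width + 1):
        score = sum(pssm[i][query[start + i]] for i in range(width))
        if best is None or score > best[1]:
            best = (start, score)
    return best


if __name__ == "__main__":
    toy_alignment = ["GKST", "GKSS", "GKTT", "AKST"]  # hypothetical motif
    pssm = build_pssm(toy_alignment)
    # Best window starts at offset 3 ("GKST").
    print(best_hit(pssm, "MAAGKSTLLE"))
```

The query must use only the 20 standard one-letter residue codes, and `best_hit` returns `None` when the query is shorter than the model. Real profile searches also weight sequences and convert raw scores into E-values, which this sketch omits.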

Subjects

Databases, Protein; Protein Structure, Tertiary; Amino Acid Sequence; Animals; Conserved Sequence; Information Storage and Retrieval; Models, Molecular; Sequence Alignment

MMDB: Entrez's 3D-structure database.

Chen, Jie; Anderson, John B; DeWeese-Scott, Carol; Fedorova, Natalie D; Geer, Lewis Y; He, Siqian; Hurwitz, David I; Jackson, John D; Jacobs, Aviva R; Lanczycki, Christopher J; Liebert, Cynthia A; Liu, Chunlei; Madej, Thomas; Marchler-Bauer, Aron; Marchler, Gabriele H; Mazumder, Raja; Nikolskaya, Anastasia N; Rao, Bachoti S; Panchenko, Anna R; Shoemaker, Benjamin A; Simonyan, Vahan; Song, James S; Thiessen, Paul A; Vasudevan, Sona; Wang, Yanli; Yamashita, Roxanne A; Yin, Jodie J; Bryant, Stephen H.

Nucleic Acids Res ; 31(1): 474-7, 2003 Jan 01.

Article in English | MEDLINE | ID: mdl-12520055

ABSTRACT

Three-dimensional structures are now known within most protein families and it is likely, when searching a sequence database, that one will identify a homolog of known structure. The goal of Entrez's 3D-structure database is to make structure information and the functional annotation it can provide easily accessible to molecular biologists. To this end, Entrez's search engine provides several powerful features: (i) links between databases, for example between a protein's sequence and structure; (ii) pre-computed sequence and structure neighbors; and (iii) structure and sequence/structure alignment visualization. Here, we focus on a new feature of Entrez's Molecular Modeling Database (MMDB): Graphical summaries of the biological annotation available for each 3D structure, based on the results of automated comparative analysis. MMDB is available at: http://www.ncbi.nlm.nih.gov/Entrez/structure.html.

Subjects

Databases, Protein; Models, Molecular; Structural Homology, Protein; Animals; Computer Graphics; Imaging, Three-Dimensional; Protein Structure, Tertiary; Proteins/chemistry

The COG database: an updated version includes eukaryotes.

Tatusov, Roman L; Fedorova, Natalie D; Jackson, John D; Jacobs, Aviva R; Kiryutin, Boris; Koonin, Eugene V; Krylov, Dmitri M; Mazumder, Raja; Mekhedov, Sergei L; Nikolskaya, Anastasia N; Rao, B Sridhar; Smirnov, Sergei; Sverdlov, Alexander V; Vasudevan, Sona; Wolf, Yuri I; Yin, Jodie J; Natale, Darren A.

BMC Bioinformatics ; 4: 41, 2003 Sep 11.

Article in English | MEDLINE | ID: mdl-12969510

ABSTRACT

BACKGROUND: The availability of multiple, essentially complete genome sequences of prokaryotes and eukaryotes spurred both the demand and the opportunity for the construction of an evolutionary classification of genes from these genomes. Such a classification system based on orthologous relationships between genes appears to be a natural framework for comparative genomics and should facilitate both functional annotation of genomes and large-scale evolutionary studies. RESULTS: We describe here a major update of the previously developed system for delineation of Clusters of Orthologous Groups of proteins (COGs) from the sequenced genomes of prokaryotes and unicellular eukaryotes and the construction of clusters of predicted orthologs for 7 eukaryotic genomes, which we named KOGs after eukaryotic orthologous groups. The COG collection currently consists of 138,458 proteins, which form 4873 COGs and comprise 75% of the 185,505 (predicted) proteins encoded in 66 genomes of unicellular organisms. The eukaryotic orthologous groups (KOGs) include proteins from 7 eukaryotic genomes: three animals (the nematode Caenorhabditis elegans, the fruit fly Drosophila melanogaster and Homo sapiens), one plant, Arabidopsis thaliana, two fungi (Saccharomyces cerevisiae and Schizosaccharomyces pombe), and the intracellular microsporidian parasite Encephalitozoon cuniculi. The current KOG set consists of 4852 clusters of orthologs, which include 59,838 proteins, or approximately 54% of the analyzed eukaryotic 110,655 gene products. Compared to the coverage of the prokaryotic genomes with COGs, a considerably smaller fraction of eukaryotic genes could be included into the KOGs; addition of new eukaryotic genomes is expected to result in substantial increase in the coverage of eukaryotic genomes with KOGs. Examination of the phyletic patterns of KOGs reveals a conserved core represented in all analyzed species and consisting of approximately 20% of the KOG set. 
This conserved portion of the KOG set is much greater than the ubiquitous portion of the COG set (approximately 1% of the COGs). In part, this difference is probably due to the small number of included eukaryotic genomes, but it could also reflect the relative compactness of eukaryotes as a clade and the greater evolutionary stability of eukaryotic genomes. CONCLUSION: The updated collection of orthologous protein sets for prokaryotes and eukaryotes is expected to be a useful platform for functional annotation of newly sequenced genomes, including those of complex eukaryotes, and genome-wide evolutionary studies.
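The coverage figures in this abstract are simple ratios of clustered proteins to all proteins analyzed. The short Python check below recomputes them from the stated counts (about 74.6% for COGs and 54.1% for KOGs, matching the rounded 75% and approximately 54% in the text). The mean cluster sizes are derived here for illustration and are not stated in the abstract; the helper name `percent` is hypothetical.

```python
def percent(part: int, whole: int) -> float:
    """Share of `whole` represented by `part`, in percent."""
    return 100.0 * part / whole


# Counts taken from the abstract above.
cog_pct = percent(138_458, 185_505)  # proteins in COGs vs. all proteins, 66 genomes
kog_pct = percent(59_838, 110_655)   # proteins in KOGs vs. all gene products, 7 genomes

# Derived figures (not stated in the abstract): mean proteins per cluster.
cog_mean = 138_458 / 4873
kog_mean = 59_838 / 4852

if __name__ == "__main__":
    print(f"COG coverage {cog_pct:.1f}%, mean cluster size {cog_mean:.1f}")
    print(f"KOG coverage {kog_pct:.1f}%, mean cluster size {kog_mean:.1f}")
```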

Subjects

Databases, Protein/trends; Eukaryotic Cells; Proteins/classification; Proteins/genetics; Animals; Databases, Nucleic Acid/trends; Eukaryotic Cells/chemistry; Eukaryotic Cells/physiology; Evolution, Molecular; Humans; National Institutes of Health (U.S.); Proteins/physiology; Terminology as Topic; United States

A comprehensive evolutionary classification of proteins encoded in complete eukaryotic genomes.

Koonin, Eugene V; Fedorova, Natalie D; Jackson, John D; Jacobs, Aviva R; Krylov, Dmitri M; Makarova, Kira S; Mazumder, Raja; Mekhedov, Sergei L; Nikolskaya, Anastasia N; Rao, B Sridhar; Rogozin, Igor B; Smirnov, Sergei; Sorokin, Alexander V; Sverdlov, Alexander V; Vasudevan, Sona; Wolf, Yuri I; Yin, Jodie J; Natale, Darren A.

Genome Biol ; 5(2): R7, 2004.

Article in English | MEDLINE | ID: mdl-14759257

ABSTRACT

BACKGROUND: Sequencing the genomes of multiple, taxonomically diverse eukaryotes enables in-depth comparative-genomic analysis which is expected to help in reconstructing ancestral eukaryotic genomes and major events in eukaryotic evolution and in making functional predictions for currently uncharacterized conserved genes. RESULTS: We examined functional and evolutionary patterns in the recently constructed set of 4,852 clusters of predicted orthologs (eukaryotic orthologous groups or KOGs) from seven eukaryotic genomes: Caenorhabditis elegans, Drosophila melanogaster, Homo sapiens, Arabidopsis thaliana, Saccharomyces cerevisiae, Schizosaccharomyces pombe and Encephalitozoon cuniculi. Conservation of KOGs through the phyletic range of eukaryotes strongly correlates with their functions and with the effect of gene knockout on the organism's viability. The approximately 40% of KOGs that are represented in six or seven species are enriched in proteins responsible for housekeeping functions, particularly translation and RNA processing. These conserved KOGs are often essential for survival and might approximate the minimal set of essential eukaryotic genes. The 131 single-member, pan-eukaryotic KOGs we identified were examined in detail. For around 20 that remained uncharacterized, functions were predicted by in-depth sequence analysis and examination of genomic context. Nearly all these proteins are subunits of known or predicted multiprotein complexes, in agreement with the balance hypothesis of evolution of gene copy number. Other KOGs show a variety of phyletic patterns, which points to major contributions of lineage-specific gene loss and the 'invention' of genes new to eukaryotic evolution. Examination of the sets of KOGs lost in individual lineages reveals co-elimination of functionally connected genes. Parsimonious scenarios of eukaryotic genome evolution and gene sets for ancestral eukaryotic forms were reconstructed.
The gene set of the last common ancestor of the crown group consists of 3,413 KOGs and largely includes proteins involved in genome replication and expression, and central metabolism. Only 44% of the KOGs, mostly from the reconstructed gene set of the last common ancestor of the crown group, have detectable homologs in prokaryotes; the remainder apparently evolved via duplication with divergence and invention of new genes. CONCLUSIONS: The KOG analysis reveals a conserved core of largely essential eukaryotic genes as well as major diversification and innovation associated with evolution of eukaryotic genomes. The results provide quantitative support for major trends of eukaryotic evolution noticed previously at the qualitative level and a basis for detailed reconstruction of evolution of eukaryotic genomes and biology of ancestral forms.
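The abstract mentions parsimonious reconstruction of ancestral gene sets from the phyletic patterns of KOGs. As an illustration only, the Python sketch below applies a generic Dollo-parsimony rule (each family is gained once, at the smallest clade containing every species where it occurs, and then lost independently in the subclades that lack it). This is not necessarily the procedure the authors used; the toy tree, the species labels A to E, the family names and all function names are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple, Union

# A tree is either a species label (leaf) or a tuple of child subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> FrozenSet[str]:
    """All species labels below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def gain_node(tree: Tree, present: FrozenSet[str]) -> Tree:
    """Smallest clade containing every species in which the family is present."""
    if isinstance(tree, str):
        return tree
    for child in tree:
        if present <= leaves(child):
            return gain_node(child, present)
    return tree


def count_losses(tree: Tree, present: FrozenSet[str]) -> int:
    """Count maximal subclades that lack the family (one loss on each)."""
    if not (leaves(tree) & present):
        return 1
    if isinstance(tree, str):
        return 0
    return sum(count_losses(child, present) for child in tree)


def dollo(tree: Tree, species: Iterable[str]) -> Tuple[Tree, int]:
    """Single gain at the smallest enclosing clade, then independent losses."""
    present = frozenset(species)
    if not present:
        raise ValueError("a family must be present in at least one species")
    origin = gain_node(tree, present)
    return origin, count_losses(origin, present)


def root_families(tree: Tree, patterns: Dict[str, Iterable[str]]) -> List[str]:
    """Families whose inferred gain lies at the root, i.e. present in the root ancestor."""
    return sorted(name for name, sp in patterns.items() if dollo(tree, sp)[0] == tree)


if __name__ == "__main__":
    toy_tree = (("A", "B"), ("C", ("D", "E")))  # hypothetical topology
    patterns = {"fam1": {"A", "C", "E"}, "fam2": {"D", "E"}}
    # fam1: gained at the root, lost twice (in B and in D).
    # fam2: gained in the (D, E) clade, no losses.
    for name, sp in patterns.items():
        print(name, dollo(toy_tree, sp))
    print(root_families(toy_tree, patterns))
```

Dollo parsimony is a natural fit for gene families because independent re-invention of the same family is far less likely than repeated loss; a real reconstruction would also need a well-supported species tree and some handling of incomplete genome annotation.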

Subjects

Eukaryotic Cells/classification; Genome; Phylogeny; Proteins/classification; Animals; Caenorhabditis elegans/genetics; Evolution, Molecular; Gene Deletion; Humans; Prokaryotic Cells/classification; Protein Structure, Tertiary; Proteins/genetics; Proteins/physiology; Sequence Analysis, Protein; Yeasts/genetics
