Search | BVS IEC

1.

The conserved domain database in 2023.

Wang, Jiyao; Chitsaz, Farideh; Derbyshire, Myra K; Gonzales, Noreen R; Gwadz, Marc; Lu, Shennan; Marchler, Gabriele H; Song, James S; Thanki, Narmada; Yamashita, Roxanne A; Yang, Mingzhang; Zhang, Dachuan; Zheng, Chanjuan; Lanczycki, Christopher J; Marchler-Bauer, Aron.

Nucleic Acids Res ; 51(D1): D384-D388, 2023 01 06.

Article in English | MEDLINE | ID: mdl-36477806

ABSTRACT

NLM's conserved domain database (CDD) is a collection of protein domain and protein family models constructed as multiple sequence alignments. Its main purpose is to provide annotation for protein and translated nucleotide sequences with the location of domain footprints and associated functional sites, and to define protein domain architecture as a basis for assigning gene product names and putative/predicted function. CDD has been available publicly for over 20 years and has grown substantially during that time. Maintaining an archive of pre-computed annotation continues to be a challenge and has slowed down the cadence of CDD releases. CDD curation staff builds hierarchical classifications of large protein domain families, adds models for novel domain families via surveillance of the protein 'dark matter' that currently lacks annotation, and now spends considerable effort on providing names and attribution for conserved domain architectures. CDD can be accessed at https://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml.

Subjects

Databases, Protein; Proteins; Humans; Amino Acid Sequence; Conserved Sequence; Protein Structure, Tertiary; Proteins/chemistry; Proteins/genetics; Protein Domains

2.

InterPro in 2022.

Paysan-Lafosse, Typhaine; Blum, Matthias; Chuguransky, Sara; Grego, Tiago; Pinto, Beatriz Lázaro; Salazar, Gustavo A; Bileschi, Maxwell L; Bork, Peer; Bridge, Alan; Colwell, Lucy; Gough, Julian; Haft, Daniel H; Letunic, Ivica; Marchler-Bauer, Aron; Mi, Huaiyu; Natale, Darren A; Orengo, Christine A; Pandurangan, Arun P; Rivoire, Catherine; Sigrist, Christian J A; Sillitoe, Ian; Thanki, Narmada; Thomas, Paul D; Tosatto, Silvio C E; Wu, Cathy H; Bateman, Alex.

Nucleic Acids Res ; 51(D1): D418-D427, 2023 01 06.

Article in English | MEDLINE | ID: mdl-36350672

ABSTRACT

The InterPro database (https://www.ebi.ac.uk/interpro/) provides an integrative classification of protein sequences into families, and identifies functionally important domains and conserved sites. Here, we report recent developments with InterPro (version 90.0) and its associated software, including updates to data content and to the website. These developments extend and enrich the information provided by InterPro, and provide more user-friendly access to the data. Additionally, we have worked on adding Pfam website features to the InterPro website, as the Pfam website will be retired in late 2022. We also show that InterPro's sequence coverage has kept pace with the growth of UniProtKB. Moreover, we report the development of a card game as a method of engaging the non-scientific community. Finally, we discuss the benefits and challenges brought by the use of artificial intelligence for protein structure prediction.

Subjects

Databases, Protein; Humans; Amino Acid Sequence; Artificial Intelligence; Internet; Proteins/chemistry; Software

3.

RefSeq: expanding the Prokaryotic Genome Annotation Pipeline reach with protein family model curation.

Li, Wenjun; O'Neill, Kathleen R; Haft, Daniel H; DiCuccio, Michael; Chetvernin, Vyacheslav; Badretdin, Azat; Coulouris, George; Chitsaz, Farideh; Derbyshire, Myra K; Durkin, A Scott; Gonzales, Noreen R; Gwadz, Marc; Lanczycki, Christopher J; Song, James S; Thanki, Narmada; Wang, Jiyao; Yamashita, Roxanne A; Yang, Mingzhang; Zheng, Chanjuan; Marchler-Bauer, Aron; Thibaud-Nissen, Françoise.

Nucleic Acids Res ; 49(D1): D1020-D1028, 2021 01 08.

Article in English | MEDLINE | ID: mdl-33270901

ABSTRACT

The Reference Sequence (RefSeq) project at the National Center for Biotechnology Information (NCBI) contains nearly 200 000 bacterial and archaeal genomes and 150 million proteins with up-to-date annotation. Changes in the Prokaryotic Genome Annotation Pipeline (PGAP) since 2018 have resulted in a substantial reduction in spurious annotation. The hierarchical collection of protein family models (PFMs) used by PGAP as evidence for structural and functional annotation was expanded to over 35 000 protein profile hidden Markov models (HMMs), 12 300 BlastRules and 36 000 curated CDD architectures. As a result, >122 million or 79% of RefSeq proteins are now named based on a match to a curated PFM. Gene symbols, Enzyme Commission numbers or supporting publication attributes are available on over 40% of the PFMs and are inherited by the proteins and features they name, facilitating multi-genome analyses and connections to the literature. In adherence with the principles of FAIR (findable, accessible, interoperable, reusable), the PFMs are available in the Protein Family Models Entrez database to any user. Finally, the reference and representative genome set, a taxonomically diverse subset of RefSeq prokaryotic genomes, is now recalculated regularly and available for download and homology searches with BLAST. RefSeq is found at https://www.ncbi.nlm.nih.gov/refseq/.

Subjects

Computational Biology/methods; Databases, Genetic; Genome, Archaeal/genetics; Genome, Bacterial/genetics; Molecular Sequence Annotation/methods; Proteins/genetics; Data Curation/methods; Data Mining/methods; Genomics/methods; Internet; Proteins/classification; User-Computer Interface

4.

The InterPro protein families and domains database: 20 years on.

Blum, Matthias; Chang, Hsin-Yu; Chuguransky, Sara; Grego, Tiago; Kandasaamy, Swaathi; Mitchell, Alex; Nuka, Gift; Paysan-Lafosse, Typhaine; Qureshi, Matloob; Raj, Shriya; Richardson, Lorna; Salazar, Gustavo A; Williams, Lowri; Bork, Peer; Bridge, Alan; Gough, Julian; Haft, Daniel H; Letunic, Ivica; Marchler-Bauer, Aron; Mi, Huaiyu; Natale, Darren A; Necci, Marco; Orengo, Christine A; Pandurangan, Arun P; Rivoire, Catherine; Sigrist, Christian J A; Sillitoe, Ian; Thanki, Narmada; Thomas, Paul D; Tosatto, Silvio C E; Wu, Cathy H; Bateman, Alex; Finn, Robert D.

Nucleic Acids Res ; 49(D1): D344-D354, 2021 01 08.

Article in English | MEDLINE | ID: mdl-33156333

ABSTRACT

The InterPro database (https://www.ebi.ac.uk/interpro/) provides an integrative classification of protein sequences into families, and identifies functionally important domains and conserved sites. InterProScan is the underlying software that allows protein and nucleic acid sequences to be searched against InterPro's signatures. Signatures are predictive models which describe protein families, domains or sites, and are provided by multiple databases. InterPro combines signatures representing equivalent families, domains or sites, and provides additional information such as descriptions, literature references and Gene Ontology (GO) terms, to produce a comprehensive resource for protein classification. Founded in 1999, InterPro has become one of the most widely used resources for protein family annotation. Here, we report the status of InterPro (version 81.0) in its 20th year of operation, and its associated software, including updates to database content, the release of a new website and REST API, and performance improvements in InterProScan.

Subjects

Databases, Protein; Proteins/chemistry; Amino Acid Sequence; COVID-19/metabolism; Internet; Molecular Sequence Annotation; Protein Domains; Protein Interaction Maps; SARS-CoV-2/metabolism; Sequence Alignment

5.

CDD/SPARCLE: the conserved domain database in 2020.

Lu, Shennan; Wang, Jiyao; Chitsaz, Farideh; Derbyshire, Myra K; Geer, Renata C; Gonzales, Noreen R; Gwadz, Marc; Hurwitz, David I; Marchler, Gabriele H; Song, James S; Thanki, Narmada; Yamashita, Roxanne A; Yang, Mingzhang; Zhang, Dachuan; Zheng, Chanjuan; Lanczycki, Christopher J; Marchler-Bauer, Aron.

Nucleic Acids Res ; 48(D1): D265-D268, 2020 01 08.

Article in English | MEDLINE | ID: mdl-31777944

ABSTRACT

As NLM's Conserved Domain Database (CDD) enters its 20th year of operations as a publicly available resource, CDD curation staff continues to develop hierarchical classifications of widely distributed protein domain families, and to record conserved sites associated with molecular function, so that they can be mapped onto user queries in support of hypothesis-driven biomolecular research. CDD offers both an archive of pre-computed domain annotations as well as live search services for both single protein or nucleotide queries and larger sets of protein query sequences. CDD staff has continued to characterize protein families via conserved domain architectures and has built up a significant corpus of curated domain architectures in support of naming bacterial proteins in RefSeq. These architecture definitions are available via SPARCLE, the Subfamily Protein Architecture Labeling Engine. CDD can be accessed at https://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml.

Subjects

Databases, Protein; Protein Domains; Amino Acid Sequence; Conserved Sequence

6.

InterPro in 2019: improving coverage, classification and access to protein sequence annotations.

Mitchell, Alex L; Attwood, Teresa K; Babbitt, Patricia C; Blum, Matthias; Bork, Peer; Bridge, Alan; Brown, Shoshana D; Chang, Hsin-Yu; El-Gebali, Sara; Fraser, Matthew I; Gough, Julian; Haft, David R; Huang, Hongzhan; Letunic, Ivica; Lopez, Rodrigo; Luciani, Aurélien; Madeira, Fabio; Marchler-Bauer, Aron; Mi, Huaiyu; Natale, Darren A; Necci, Marco; Nuka, Gift; Orengo, Christine; Pandurangan, Arun P; Paysan-Lafosse, Typhaine; Pesseat, Sebastien; Potter, Simon C; Qureshi, Matloob A; Rawlings, Neil D; Redaschi, Nicole; Richardson, Lorna J; Rivoire, Catherine; Salazar, Gustavo A; Sangrador-Vegas, Amaia; Sigrist, Christian J A; Sillitoe, Ian; Sutton, Granger G; Thanki, Narmada; Thomas, Paul D; Tosatto, Silvio C E; Yong, Siew-Yit; Finn, Robert D.

Nucleic Acids Res ; 47(D1): D351-D360, 2019 01 08.

Article in English | MEDLINE | ID: mdl-30398656

ABSTRACT

The InterPro database (http://www.ebi.ac.uk/interpro/) classifies protein sequences into families and predicts the presence of functionally important domains and sites. Here, we report recent developments with InterPro (version 70.0) and its associated software, including an 18% growth in the size of the database in terms of new InterPro entries, updates to content, the inclusion of an additional entry type, refined modelling of discontinuous domains, and the development of a new programmatic interface and website. These developments extend and enrich the information provided by InterPro, and provide greater flexibility in terms of data access. We also show that InterPro's sequence coverage has kept pace with the growth of UniProtKB, and discuss how our evaluation of residue coverage may help guide future curation activities.

Subjects

Databases, Protein; Molecular Sequence Annotation; Animals; Databases, Genetic; Gene Ontology; Humans; Internet; Multigene Family; Protein Domains/genetics; Sequence Homology, Amino Acid; Software; User-Computer Interface

7.

RefSeq: an update on prokaryotic genome annotation and curation.

Haft, Daniel H; DiCuccio, Michael; Badretdin, Azat; Brover, Vyacheslav; Chetvernin, Vyacheslav; O'Neill, Kathleen; Li, Wenjun; Chitsaz, Farideh; Derbyshire, Myra K; Gonzales, Noreen R; Gwadz, Marc; Lu, Fu; Marchler, Gabriele H; Song, James S; Thanki, Narmada; Yamashita, Roxanne A; Zheng, Chanjuan; Thibaud-Nissen, Françoise; Geer, Lewis Y; Marchler-Bauer, Aron; Pruitt, Kim D.

Nucleic Acids Res ; 46(D1): D851-D860, 2018 01 04.

Article in English | MEDLINE | ID: mdl-29112715

ABSTRACT

The Reference Sequence (RefSeq) project at the National Center for Biotechnology Information (NCBI) provides annotation for over 95 000 prokaryotic genomes that meet standards for sequence quality, completeness, and freedom from contamination. Genomes are annotated by a single Prokaryotic Genome Annotation Pipeline (PGAP) to provide users with a resource that is as consistent and accurate as possible. Notable recent changes include the development of a hierarchical evidence scheme, a new focus on curating annotation evidence sources, the addition and curation of protein profile hidden Markov models (HMMs), release of an updated pipeline (PGAP-4), and comprehensive re-annotation of RefSeq prokaryotic genomes. Antimicrobial resistance proteins have been reannotated comprehensively, improved structural annotation of insertion sequence transposases and selenoproteins is provided, curated complex domain architectures have given upgraded names to millions of multidomain proteins, and we introduce a new kind of annotation rule, BlastRules. Continual curation of supporting evidence and propagation of improved names onto RefSeq proteins ensure that the functional annotation of genomes is kept current. An increasing share of our annotation now derives from HMMs and other sets of annotation rules that are portable by nature, and available for download and for reuse by other investigators. RefSeq is found at https://www.ncbi.nlm.nih.gov/refseq/.

Subjects

Data Curation; Databases, Nucleic Acid; Genome; Molecular Sequence Annotation; Prokaryotic Cells; Archaea/genetics; Bacteria/genetics; Databases, Protein; Eukaryota/genetics; Forecasting; Humans; Sequence Homology; Software; Viruses/genetics

8.

CDD/SPARCLE: functional classification of proteins via subfamily domain architectures.

Marchler-Bauer, Aron; Bo, Yu; Han, Lianyi; He, Jane; Lanczycki, Christopher J; Lu, Shennan; Chitsaz, Farideh; Derbyshire, Myra K; Geer, Renata C; Gonzales, Noreen R; Gwadz, Marc; Hurwitz, David I; Lu, Fu; Marchler, Gabriele H; Song, James S; Thanki, Narmada; Wang, Zhouxi; Yamashita, Roxanne A; Zhang, Dachuan; Zheng, Chanjuan; Geer, Lewis Y; Bryant, Stephen H.

Nucleic Acids Res ; 45(D1): D200-D203, 2017 01 04.

Article in English | MEDLINE | ID: mdl-27899674

ABSTRACT

NCBI's Conserved Domain Database (CDD) aims at annotating biomolecular sequences with the location of evolutionarily conserved protein domain footprints, and functional sites inferred from such footprints. An archive of pre-computed domain annotation is maintained for proteins tracked by NCBI's Entrez database, and live search services are offered as well. CDD curation staff supplements a comprehensive collection of protein domain and protein family models, which have been imported from external providers, with representations of selected domain families that are curated in-house and organized into hierarchical classifications of functionally distinct families and sub-families. CDD also supports comparative analyses of protein families via conserved domain architectures, and a recent curation effort focuses on providing functional characterizations of distinct subfamily architectures using SPARCLE: Subfamily Protein Architecture Labeling Engine. CDD can be accessed at https://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml.

Subjects

Computational Biology/methods; Databases, Protein; Protein Interaction Domains and Motifs; Proteins; Information Dissemination; Internet; Proteins/chemistry; Proteins/classification; Proteins/genetics

9.

InterPro in 2017: beyond protein family and domain annotations.

Finn, Robert D; Attwood, Teresa K; Babbitt, Patricia C; Bateman, Alex; Bork, Peer; Bridge, Alan J; Chang, Hsin-Yu; Dosztányi, Zsuzsanna; El-Gebali, Sara; Fraser, Matthew; Gough, Julian; Haft, David; Holliday, Gemma L; Huang, Hongzhan; Huang, Xiaosong; Letunic, Ivica; Lopez, Rodrigo; Lu, Shennan; Marchler-Bauer, Aron; Mi, Huaiyu; Mistry, Jaina; Natale, Darren A; Necci, Marco; Nuka, Gift; Orengo, Christine A; Park, Youngmi; Pesseat, Sebastien; Piovesan, Damiano; Potter, Simon C; Rawlings, Neil D; Redaschi, Nicole; Richardson, Lorna; Rivoire, Catherine; Sangrador-Vegas, Amaia; Sigrist, Christian; Sillitoe, Ian; Smithers, Ben; Squizzato, Silvano; Sutton, Granger; Thanki, Narmada; Thomas, Paul D; Tosatto, Silvio C E; Wu, Cathy H; Xenarios, Ioannis; Yeh, Lai-Su; Young, Siew-Yit; Mitchell, Alex L.

Nucleic Acids Res ; 45(D1): D190-D199, 2017 01 04.

Article in English | MEDLINE | ID: mdl-27899635

ABSTRACT

InterPro (http://www.ebi.ac.uk/interpro/) is a freely available database used to classify protein sequences into families and to predict the presence of important domains and sites. InterProScan is the underlying software that allows both protein and nucleic acid sequences to be searched against InterPro's predictive models, which are provided by its member databases. Here, we report recent developments with InterPro and its associated software, including the addition of two new databases (SFLD and CDD), and the functionality to include residue-level annotation and prediction of intrinsic disorder. These developments enrich the annotations provided by InterPro, increase the overall number of residues annotated and allow more specific functional inferences.

Subjects

Computational Biology/methods; Databases, Protein; Protein Interaction Domains and Motifs; Software; Humans; Molecular Sequence Annotation; Phylogeny

10.

CDD: NCBI's conserved domain database.

Marchler-Bauer, Aron; Derbyshire, Myra K; Gonzales, Noreen R; Lu, Shennan; Chitsaz, Farideh; Geer, Lewis Y; Geer, Renata C; He, Jane; Gwadz, Marc; Hurwitz, David I; Lanczycki, Christopher J; Lu, Fu; Marchler, Gabriele H; Song, James S; Thanki, Narmada; Wang, Zhouxi; Yamashita, Roxanne A; Zhang, Dachuan; Zheng, Chanjuan; Bryant, Stephen H.

Nucleic Acids Res ; 43(Database issue): D222-6, 2015 Jan.

Article in English | MEDLINE | ID: mdl-25414356

ABSTRACT

NCBI's CDD, the Conserved Domain Database, enters its 15th year as a public resource for the annotation of proteins with the location of conserved domain footprints. Going forward, we strive to improve the coverage and consistency of domain annotation provided by CDD. We maintain a live search system as well as an archive of pre-computed domain annotation for sequences tracked in NCBI's Entrez protein database, which can be retrieved for single sequences or in bulk. We also maintain import procedures so that CDD contains domain models and domain definitions provided by several collections available in the public domain, as well as those produced by an in-house curation effort. The curation effort aims at increasing coverage and providing finer-grained classifications of common protein domains, for which a wealth of functional and structural data has become available. CDD curation generates alignment models of representative sequence fragments, which are in agreement with domain boundaries as observed in protein 3D structure, and which model the structurally conserved cores of domain families as well as annotate conserved features. CDD can be accessed at http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml.

Subjects

Databases, Protein; Protein Structure, Tertiary; Amino Acid Motifs; Amino Acid Sequence; Conserved Sequence; Data Curation

11.

CDD: conserved domains and protein three-dimensional structure.

Marchler-Bauer, Aron; Zheng, Chanjuan; Chitsaz, Farideh; Derbyshire, Myra K; Geer, Lewis Y; Geer, Renata C; Gonzales, Noreen R; Gwadz, Marc; Hurwitz, David I; Lanczycki, Christopher J; Lu, Fu; Lu, Shennan; Marchler, Gabriele H; Song, James S; Thanki, Narmada; Yamashita, Roxanne A; Zhang, Dachuan; Bryant, Stephen H.

Nucleic Acids Res ; 41(Database issue): D348-52, 2013 Jan.

Article in English | MEDLINE | ID: mdl-23197659

ABSTRACT

CDD, the Conserved Domain Database, is part of NCBI's Entrez query and retrieval system and is also accessible via http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml. CDD provides annotation of protein sequences with the location of conserved domain footprints and functional sites inferred from these footprints. Pre-computed annotation is available via Entrez, and interactive search services accept single protein or nucleotide queries, as well as batch submissions of protein query sequences, utilizing RPS-BLAST to rapidly identify putative matches. CDD incorporates several protein domain and full-length protein model collections, and maintains an active curation effort that aims at providing fine grained classifications for major and well-characterized protein domain families, as supported by available protein three-dimensional (3D) structure and the published literature. To this date, the majority of protein 3D structures are represented by models tracked by CDD, and CDD curators are characterizing novel families that emerge from protein structure determination efforts.

Subjects

Databases, Protein; Protein Conformation; Protein Structure, Tertiary; Amino Acid Sequence; Conserved Sequence; Internet; Models, Molecular; Molecular Sequence Annotation; Proteins/chemistry; Proteins/classification; Proteins/genetics; Sequence Analysis, Protein

12.

CDD: a Conserved Domain Database for the functional annotation of proteins.

Marchler-Bauer, Aron; Lu, Shennan; Anderson, John B; Chitsaz, Farideh; Derbyshire, Myra K; DeWeese-Scott, Carol; Fong, Jessica H; Geer, Lewis Y; Geer, Renata C; Gonzales, Noreen R; Gwadz, Marc; Hurwitz, David I; Jackson, John D; Ke, Zhaoxi; Lanczycki, Christopher J; Lu, Fu; Marchler, Gabriele H; Mullokandov, Mikhail; Omelchenko, Marina V; Robertson, Cynthia L; Song, James S; Thanki, Narmada; Yamashita, Roxanne A; Zhang, Dachuan; Zhang, Naigong; Zheng, Chanjuan; Bryant, Stephen H.

Nucleic Acids Res ; 39(Database issue): D225-9, 2011 Jan.

Article in English | MEDLINE | ID: mdl-21109532

ABSTRACT

NCBI's Conserved Domain Database (CDD) is a resource for the annotation of protein sequences with the location of conserved domain footprints, and functional sites inferred from these footprints. CDD includes manually curated domain models that make use of protein 3D structure to refine domain models and provide insights into sequence/structure/function relationships. Manually curated models are organized hierarchically if they describe domain families that are clearly related by common descent. As CDD also imports domain family models from a variety of external sources, it is a partially redundant collection. To simplify protein annotation, redundant models and models describing homologous families are clustered into superfamilies. By default, domain footprints are annotated with the corresponding superfamily designation, on top of which specific annotation may indicate high-confidence assignment of family membership. Pre-computed domain annotation is available for proteins in the Entrez/Protein dataset, and a novel interface, Batch CD-Search, allows the computation and download of annotation for large sets of protein queries. CDD can be accessed via http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml.

Subjects

Databases, Protein; Protein Structure, Tertiary; Amino Acid Sequence; Conserved Sequence; Models, Biological; Proteins/classification; Sequence Analysis, Protein

13.

CDD: specific functional annotation with the Conserved Domain Database.

Marchler-Bauer, Aron; Anderson, John B; Chitsaz, Farideh; Derbyshire, Myra K; DeWeese-Scott, Carol; Fong, Jessica H; Geer, Lewis Y; Geer, Renata C; Gonzales, Noreen R; Gwadz, Marc; He, Siqian; Hurwitz, David I; Jackson, John D; Ke, Zhaoxi; Lanczycki, Christopher J; Liebert, Cynthia A; Liu, Chunlei; Lu, Fu; Lu, Shennan; Marchler, Gabriele H; Mullokandov, Mikhail; Song, James S; Tasneem, Asba; Thanki, Narmada; Yamashita, Roxanne A; Zhang, Dachuan; Zhang, Naigong; Bryant, Stephen H.

Nucleic Acids Res ; 37(Database issue): D205-10, 2009 Jan.

Article in English | MEDLINE | ID: mdl-18984618

ABSTRACT

NCBI's Conserved Domain Database (CDD) is a collection of multiple sequence alignments and derived database search models, which represent protein domains conserved in molecular evolution. The collection can be accessed at http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml, and is also part of NCBI's Entrez query and retrieval system, cross-linked to numerous other resources. CDD provides annotation of domain footprints and conserved functional sites on protein sequences. Precalculated domain annotation can be retrieved for protein sequences tracked in NCBI's Entrez system, and CDD's collection of models can be queried with novel protein sequences via the CD-Search service at http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi. Starting with the latest version of CDD, v2.14, information from redundant and homologous domain models is summarized at a superfamily level, and domain annotation on proteins is flagged as either 'specific' (identifying molecular function with high confidence) or as 'non-specific' (identifying superfamily membership only).

Subjects

Databases, Protein; Protein Structure, Tertiary; Amino Acid Sequence; Conserved Sequence; Proteins/classification; Sequence Alignment; Sequence Analysis, Protein

14.

CDD: a conserved domain database for interactive domain family analysis.

Marchler-Bauer, Aron; Anderson, John B; Derbyshire, Myra K; DeWeese-Scott, Carol; Gonzales, Noreen R; Gwadz, Marc; Hao, Luning; He, Siqian; Hurwitz, David I; Jackson, John D; Ke, Zhaoxi; Krylov, Dmitri; Lanczycki, Christopher J; Liebert, Cynthia A; Liu, Chunlei; Lu, Fu; Lu, Shennan; Marchler, Gabriele H; Mullokandov, Mikhail; Song, James S; Thanki, Narmada; Yamashita, Roxanne A; Yin, Jodie J; Zhang, Dachuan; Bryant, Stephen H.

Nucleic Acids Res ; 35(Database issue): D237-40, 2007 Jan.

Article in English | MEDLINE | ID: mdl-17135202

ABSTRACT

The conserved domain database (CDD) is part of NCBI's Entrez database system and serves as a primary resource for the annotation of conserved domain footprints on protein sequences in Entrez. Entrez's global query interface can be accessed at http://www.ncbi.nlm.nih.gov/Entrez and will search CDD and many other databases. Domain annotation for proteins in Entrez has been pre-computed and is readily available in the form of 'Conserved Domain' links. Novel protein sequences can be scanned against CDD using the CD-Search service; this service searches databases of CDD-derived profile models with protein sequence queries using BLAST heuristics, at http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi. Protein query sequences submitted to NCBI's protein BLAST search service are scanned for conserved domain signatures by default. The CDD collection contains models imported from Pfam, SMART and COG, as well as domain models curated at NCBI. NCBI curated models are organized into hierarchies of domains related by common descent. Here we report on the status of the curation effort and present a novel helper application, CDTree, which enables users of the CDD resource to examine curated hierarchies. More importantly, CDD and CDTree, used in concert, serve as a powerful tool in protein classification, as they allow users to analyze protein sequences in the context of domain family hierarchies.

Subjects

Databases, Protein; Protein Structure, Tertiary; Amino Acid Sequence; Animals; Conserved Sequence; Internet; Phylogeny; Protein Structure, Tertiary/genetics; Proteins/classification; Sequence Analysis, Protein; User-Computer Interface

15.

PubMed Text Similarity Model and its application to curation efforts in the Conserved Domain Database.

Islamaj, Rezarta; Wilbur, W John; Xie, Natalie; Gonzales, Noreen R; Thanki, Narmada; Yamashita, Roxanne; Zheng, Chanjuan; Marchler-Bauer, Aron; Lu, Zhiyong.

Database (Oxford) ; 2019, 2019 01 01.

Article in English | MEDLINE | ID: mdl-31267135

ABSTRACT

This study proposes a text similarity model to help biocuration efforts of the Conserved Domain Database (CDD). CDD is a curated resource that catalogs annotated multiple sequence alignment models for ancient domains and full-length proteins. These models allow for fast searching and quick identification of conserved motifs in protein sequences via Reverse PSI-BLAST. In addition, CDD curators prepare summaries detailing the function of these conserved domains and specific protein families, based on published peer-reviewed articles. To facilitate information access for database users, it is desirable to specifically identify the referenced articles that support the assertions of curator-composed sentences. Moreover, CDD curators desire an alert system that scans the newly published literature and proposes related articles of relevance to the existing CDD records. Our approach to address these needs is a text similarity method that automatically maps a curator-written statement to candidate sentences extracted from the list of referenced articles, as well as the articles in the PubMed Central database. To evaluate this proposal, we paired CDD description sentences with the top 10 matching sentences from the literature, which were given to curators for review. Through this exercise, we discovered that we were able to map the articles in the reference list to the CDD description statements with an accuracy of 77%. In the dataset that was reviewed by curators, we were able to successfully provide references for 86% of the curator statements. In addition, we suggested new articles for curator review, which were accepted by curators to be added into the reference list at an acceptance rate of 50%. Through this process, we developed a substantial corpus of similar sentences from biomedical articles on protein sequence, structure and function research, which constitute the CDD text similarity corpus. 
This corpus contains 5159 sentence pairs judged for their similarity on a scale from 1 (low) to 5 (high) doubly annotated by four CDD curators. Curator-assigned similarity scores have a Pearson correlation coefficient of 0.70 and an inter-annotator agreement of 85%. To date, this is the largest biomedical text similarity resource that has been manually judged, evaluated and made publicly available to the community to foster research and development of text similarity algorithms.
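As an illustration of the two figures quoted in this abstract, the following minimal sketch computes a Pearson correlation coefficient and a simple agreement rate for two annotators scoring the same sentence pairs on a 1 to 5 scale. The function names, the toy scores, and the definition of agreement as the fraction of pairs whose scores differ by at most a given tolerance are assumptions made for illustration; the abstract does not state how the authors computed inter-annotator agreement.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of the centred scores.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


def agreement(xs, ys, tolerance=0):
    """Fraction of items whose two scores differ by at most `tolerance`.

    This definition is an assumption for illustration only.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty lists of equal length")
    matches = sum(1 for x, y in zip(xs, ys) if abs(x - y) <= tolerance)
    return matches / len(xs)


# Hypothetical similarity scores (1 = low, 5 = high) from two annotators.
scores_a = [1, 2, 3, 4, 5, 3]
scores_b = [1, 3, 3, 4, 4, 2]

if __name__ == "__main__":
    print(f"Pearson r: {pearson(scores_a, scores_b):.2f}")
    print(f"Exact agreement: {agreement(scores_a, scores_b):.0%}")
```

Setting `tolerance=1` counts scores within one point of each other as agreeing, which is one common way of relaxing exact-match agreement on an ordinal scale.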

Subjects

Algorithms; Data Curation; Databases, Protein; Proteins; PubMed; Sequence Alignment; Protein Domains; Proteins/chemistry; Proteins/genetics

16.

The Protein Data Bank: unifying the archive.

Westbrook, John; Feng, Zukang; Jain, Shri; Bhat, T N; Thanki, Narmada; Ravichandran, Veerasamy; Gilliland, Gary L; Bluhm, Wolfgang; Weissig, Helge; Greer, Douglas S; Bourne, Philip E; Berman, Helen M.

Nucleic Acids Res ; 30(1): 245-8, 2002 Jan 01.

Article in English | MEDLINE | ID: mdl-11752306

ABSTRACT

The Protein Data Bank (PDB; http://www.pdb.org/) is the single worldwide archive of structural data of biological macromolecules. This paper describes the progress that has been made in validating all data in the PDB archive and in releasing a uniform archive for the community. We have now produced a collection of mmCIF data files for the PDB archive (ftp://beta.rcsb.org/pub/pdb/uniformity/data/mmCIF/). A utility application that converts the mmCIF data files to the PDB format (called CIFTr) has also been released to provide support for existing software.

Subjects

Databases, Protein; Proteins/chemistry; Amino Acid Sequence; Animals; Archives; Database Management Systems; Enzymes/chemistry; Forecasting; Information Storage and Retrieval; Internet; Ligands; Polymers/chemistry; Protein Conformation; Quality Control; Stereoisomerism; Terminology as Topic; User-Computer Interface

17.

Linking tumor cell cytotoxicity to mechanism of drug action: an integrated analysis of gene expression, small-molecule screening and structural databases.

Covell, David G; Wallqvist, Anders; Huang, Ruili; Thanki, Narmada; Rabow, Alfred A; Lu, Xiang-Jun.

Proteins ; 59(3): 403-33, 2005 May 15.

Article in English | MEDLINE | ID: mdl-15778971

ABSTRACT

An integrated, bioinformatic analysis of three databases comprising tumor-cell-based small molecule screening data, gene expression measurements, and PDB (Protein Data Bank) ligand-target structures has been developed for probing mechanism of drug action (MOA). Clustering analysis of GI50 profiles for the NCI's database of compounds screened across a panel of tumor cells (NCI60) was used to select a subset of unique cytotoxic responses for about 4000 small molecules. Drug-gene-PDB relationships for this test set were examined by correlative analysis of cytotoxic response and differential gene expression profiles within the NCI60 and structural comparisons with known ligand-target crystallographic complexes. A survey of molecular features within these compounds finds thirteen conserved Compound Classes, each class exhibiting chemical features important for interactions with a variety of biological targets. Protein targets for an additional twelve Compound Classes could be directly assigned using drug-protein interactions observed in the crystallographic database. Results from the analysis of constitutive gene expressions established a clear connection between chemo-resistance and overexpression of gene families associated with the extracellular matrix, cytoskeletal organization, and xenobiotic metabolism. Conversely, chemo-sensitivity implicated overexpression of gene families involved in homeostatic functions of nucleic acid repair, aryl hydrocarbon metabolism, heat shock response, proteasome degradation and apoptosis. Correlations between chemo-responsiveness and differential gene expressions identified chemotypes with nonselective (i.e., many) molecular targets from those likely to have selective (i.e., few) molecular targets. Applications of data mining strategies that jointly utilize tumor cell screening, genomic, and structural data are presented for hypotheses generation and identifying novel anticancer candidates.

Subjects

Gene Expression Profiling , Neoplasms/genetics , Antineoplastic Agents/therapeutic use , Cell Survival/drug effects , Neoplastic Gene Expression Regulation , Humans , Neoplasms/drug therapy , Neoplasms/pathology , Genetic Transcription

18.

Molecular classification of cancer: unsupervised self-organizing map analysis of gene expression microarray data.

Covell, David G; Wallqvist, Anders; Rabow, Alfred A; Thanki, Narmada.

Mol Cancer Ther ; 2(3): 317-32, 2003 Mar.

Article in English | MEDLINE | ID: mdl-12657727

ABSTRACT

An unsupervised self-organizing map-based clustering strategy has been developed to classify tissue samples from an oligonucleotide microarray patient database. Our method is based on the likelihood that a test data vector may have a gene expression fingerprint that is shared by more than one tumor class and as such can identify datasets that cannot be unequivocally assigned to a single tumor class. Our self-organizing map analysis completely separated the tumor from the normal expression datasets. Within the 14 different tumor types, classification accuracies on the order of approximately 80% correct were achieved. Nearly perfect classifications were found for leukemia, central nervous system, melanoma, uterine, and lymphoma tumor types, with very poor classifications found for colorectal, ovarian, breast, and lung tumors. Classification results were further analyzed to identify sets of differentially expressed genes between tumor and normal gene expressions and among each tumor class. Within the total pool of 1139 genes most differentially expressed in this dataset, subsets were found that could be vetted according to previously published literature sources to be specific tumor markers. Attempts to classify gene expression datasets from other sources found a wide range of classification accuracies. Discussions about the utility of this method and the quality of data needed for accurate tumor classifications are provided.

Subjects

Factual Databases , Gene Expression Profiling , Neoplasms/classification , Neoplasms/genetics , Oligonucleotide Array Sequence Analysis/methods , Algorithms , Artificial Intelligence , Tumor Biomarkers , Computational Biology , Neoplasm DNA/analysis , Humans , Neoplasms/pathology

19.

Crystal structure of the YjeE protein from Haemophilus influenzae: a putative ATPase involved in cell wall synthesis.

Teplyakov, Alexey; Obmolova, Galina; Tordova, Maria; Thanki, Narmada; Bonander, Nicklas; Eisenstein, Edward; Howard, Andrew J; Gilliland, Gary L.

Proteins ; 48(2): 220-6, 2002 Aug 01.

Article in English | MEDLINE | ID: mdl-12112691

ABSTRACT

A hypothetical protein encoded by the gene YjeE of Haemophilus influenzae was selected as part of a structural genomics project for X-ray analysis to assist with the functional assignment. The protein is considered essential to bacteria because the gene is present in virtually all bacterial genomes but not in those of archaea or eukaryotes. The amino acid sequence shows no homology to other proteins except for the presence of the Walker A motif G-X-X-X-X-G-K-T that indicates the possibility of a nucleotide-binding protein. The YjeE protein was cloned, expressed, and the crystal structure determined by the MAD method at 1.7-Å resolution. The protein has a nucleotide-binding fold with a four-stranded parallel beta-sheet flanked by antiparallel beta-strands on each side. The topology of the beta-sheet is unique among P-loop proteins and has features of different families of enzymes. Crystallization of YjeE in the presence of ATP and Mg2+ resulted in the structure with ADP bound in the P-loop. The ATPase activity of YjeE was confirmed by kinetic measurements. The distribution of conserved residues suggests that the protein may work as a "molecular switch" triggered by ATP hydrolysis. The phylogenetic pattern of YjeE suggests its involvement in cell wall biosynthesis.
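The Walker A pattern quoted in the abstract, G-X-X-X-X-G-K-T, can be searched for with a short regular expression. The following is a minimal sketch, not the authors' method: the function name `find_walker_a` is hypothetical, X is taken to mean any single residue letter, and the sequence in the usage note is made up for illustration rather than taken from YjeE.

```python
import re

# G, any four residues, then G-K-T, exactly as written in the abstract.
# The lookahead lets overlapping occurrences be reported as well.
WALKER_A = re.compile(r"(?=(G[A-Z]{4}GKT))")


def find_walker_a(sequence):
    """Return (1-based start position, matched segment) for each motif hit."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in WALKER_A.finditer(seq)]
```

For the illustrative sequence `"MKTAGPSGAGKTTLA"`, `find_walker_a` returns `[(5, "GPSGAGKT")]`. A match only flags a candidate nucleotide-binding site; as the abstract notes, the motif indicates a possibility that still has to be confirmed experimentally.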

Subjects

Adenosine Triphosphatases/chemistry , Bacterial Proteins/chemistry , Haemophilus influenzae/enzymology , Molecular Models , Adenosine Triphosphatases/genetics , Adenosine Triphosphatases/physiology , Amino Acid Sequence , Bacterial Proteins/genetics , Bacterial Proteins/physiology , Cell Wall/metabolism , X-Ray Crystallography , Haemophilus influenzae/growth & development , Molecular Sequence Data , Nucleotides/metabolism , Phylogeny , Amino Acid Sequence Homology

20.

Assisting functional assignment for hypothetical Haemophilus influenzae gene products through structural genomics.

Gilliland, Gary L; Teplyakov, Alexey; Obmolova, Galina; Tordova, Maria; Thanki, Narmada; Ladner, Jane; Herzberg, Osnat; Lim, Kap; Zhang, Hong; Huang, Kui; Li, Zhong; Tempczyk, Aleksandra; Krajewski, Wojiech; Parsons, Lisa; Yeh, Deok Cheon; Orban, John; Howard, Andrew J; Eisenstein, Edward; F Parsons, James; Bonander, Nicklas; Fisher, Kathryn E; Toedt, John; Reddy, Prasad; Rao, C V; Melamud, Eugene; Moult, John.

Curr Drug Targets Infect Disord ; 2(4): 339-53, 2002 Dec.

Article in English | MEDLINE | ID: mdl-12570740

ABSTRACT

The three-dimensional structures of Haemophilus influenzae proteins whose biological functions are unknown are being determined as part of a structural genomics project to ask whether structural information can assist in assigning the functions of proteins. The structures of the hypothetical proteins are being used to guide further studies and narrow the field of such studies for ultimately determining protein function. An outline of the structural genomics methodological approach is provided along with summaries of a number of completed and in progress crystallographic and NMR structure determinations. With more than twenty-five structures determined at this point and with many more in various stages of completion, the results are encouraging in that some level of functional understanding can be deduced from experimentally solved structures. In addition to aiding in functional assignment, this effort is identifying a number of possible new targets for drug development.

Subjects

Viral Genome , Haemophilus influenzae/genetics , Viral Proteins/chemistry , Haemophilus influenzae/metabolism , Molecular Models , Protein Conformation , Viral Proteins/genetics , Viral Proteins/physiology
