1.
Nucleic Acids Res ; 51(D1): D384-D388, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36477806

ABSTRACT

NLM's conserved domain database (CDD) is a collection of protein domain and protein family models constructed as multiple sequence alignments. Its main purpose is to provide annotation for protein and translated nucleotide sequences with the location of domain footprints and associated functional sites, and to define protein domain architecture as a basis for assigning gene product names and putative/predicted function. CDD has been available publicly for over 20 years and has grown substantially during that time. Maintaining an archive of pre-computed annotation continues to be a challenge and has slowed down the cadence of CDD releases. CDD curation staff builds hierarchical classifications of large protein domain families, adds models for novel domain families via surveillance of the protein 'dark matter' that currently lacks annotation, and now spends considerable effort on providing names and attribution for conserved domain architectures. CDD can be accessed at https://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml.


Subjects
Protein Databases , Proteins , Humans , Amino Acid Sequence , Conserved Sequence , Tertiary Protein Structure , Proteins/chemistry , Proteins/genetics , Protein Domains
2.
Nucleic Acids Res ; 48(D1): D265-D268, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31777944

ABSTRACT

As NLM's Conserved Domain Database (CDD) enters its 20th year of operations as a publicly available resource, CDD curation staff continues to develop hierarchical classifications of widely distributed protein domain families, and to record conserved sites associated with molecular function, so that they can be mapped onto user queries in support of hypothesis-driven biomolecular research. CDD offers both an archive of pre-computed domain annotations as well as live search services for both single protein or nucleotide queries and larger sets of protein query sequences. CDD staff has continued to characterize protein families via conserved domain architectures and has built up a significant corpus of curated domain architectures in support of naming bacterial proteins in RefSeq. These architecture definitions are available via SPARCLE, the Subfamily Protein Architecture Labeling Engine. CDD can be accessed at https://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml.


Subjects
Protein Databases , Protein Domains , Amino Acid Sequence , Conserved Sequence
3.
Bioinformatics ; 36(1): 131-135, 2020 01 01.
Article in English | MEDLINE | ID: mdl-31218344

ABSTRACT

MOTIVATION: Build a web-based 3D molecular structure viewer focusing on interactive structural analysis. RESULTS: iCn3D (I-see-in-3D) can simultaneously show 3D structure, 2D molecular contacts and 1D protein and nucleotide sequences through an integrated sequence/annotation browser. Pre-defined and arbitrary molecular features can be selected in any of the 1D/2D/3D windows as sets of residues and these selections are synchronized dynamically in all displays. Biological annotations such as protein domains, single nucleotide variations, etc. can be shown as tracks in the 1D sequence/annotation browser. These customized displays can be shared with colleagues or publishers via a simple URL. iCn3D can display structure-structure alignments obtained from NCBI's VAST+ service. It can also display the alignment of a sequence with a structure as identified by BLAST, and thus relate 3D structure to a large fraction of all known proteins. iCn3D can also display electron density maps or electron microscopy (EM) density maps, and export files for 3D printing. The following example URL exemplifies some of the 1D/2D/3D representations: https://www.ncbi.nlm.nih.gov/Structure/icn3d/full.html?mmdbid=1TUP&showanno=1&show2d=1&showsets=1. AVAILABILITY AND IMPLEMENTATION: iCn3D is freely available to the public. Its source code is available at https://github.com/ncbi/icn3d. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
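The example URL in the abstract above can be assembled programmatically. The following is a minimal sketch in Python: the base address and the four query parameters (`mmdbid`, `showanno`, `show2d`, `showsets`) are taken from that URL, while the helper name `icn3d_url` and the assumption that omitting a flag leaves the corresponding panel hidden are illustrative only.

```python
from urllib.parse import urlencode

# Base address taken from the example URL in the abstract.
ICN3D_BASE = "https://www.ncbi.nlm.nih.gov/Structure/icn3d/full.html"


def icn3d_url(mmdbid, show_annotations=True, show_2d=True, show_sets=True):
    """Build an iCn3D link for one structure (hypothetical helper).

    Only parameter names that appear in the abstract's example URL are used.
    Leaving a flag out is assumed, not confirmed, to hide that panel.
    """
    params = {"mmdbid": mmdbid}
    if show_annotations:
        params["showanno"] = 1   # 1D sequence/annotation browser
    if show_2d:
        params["show2d"] = 1     # 2D molecular contacts
    if show_sets:
        params["showsets"] = 1   # defined sets of residues
    return f"{ICN3D_BASE}?{urlencode(params)}"


url = icn3d_url("1TUP")
print(url)
```

With all three flags enabled, the printed link is identical to the example URL given in the abstract.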


Subjects
Base Sequence , Computational Biology , Internet , Molecular Models , Proteins , Software , Computational Biology/methods , Genetic Databases , Molecular Conformation , Proteins/chemistry
4.
Nucleic Acids Res ; 45(D1): D200-D203, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27899674

ABSTRACT

NCBI's Conserved Domain Database (CDD) aims at annotating biomolecular sequences with the location of evolutionarily conserved protein domain footprints, and functional sites inferred from such footprints. An archive of pre-computed domain annotation is maintained for proteins tracked by NCBI's Entrez database, and live search services are offered as well. CDD curation staff supplements a comprehensive collection of protein domain and protein family models, which have been imported from external providers, with representations of selected domain families that are curated in-house and organized into hierarchical classifications of functionally distinct families and sub-families. CDD also supports comparative analyses of protein families via conserved domain architectures, and a recent curation effort focuses on providing functional characterizations of distinct subfamily architectures using SPARCLE: Subfamily Protein Architecture Labeling Engine. CDD can be accessed at https://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml.


Subjects
Computational Biology/methods , Protein Databases , Protein Interaction Domains and Motifs , Proteins , Information Dissemination , Internet , Proteins/chemistry , Proteins/classification , Proteins/genetics
5.
Nucleic Acids Res ; 45(D1): D190-D199, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27899635

ABSTRACT

InterPro (http://www.ebi.ac.uk/interpro/) is a freely available database used to classify protein sequences into families and to predict the presence of important domains and sites. InterProScan is the underlying software that allows both protein and nucleic acid sequences to be searched against InterPro's predictive models, which are provided by its member databases. Here, we report recent developments with InterPro and its associated software, including the addition of two new databases (SFLD and CDD), and the functionality to include residue-level annotation and prediction of intrinsic disorder. These developments enrich the annotations provided by InterPro, increase the overall number of residues annotated and allow more specific functional inferences.


Subjects
Computational Biology/methods , Protein Databases , Protein Interaction Domains and Motifs , Software , Humans , Molecular Sequence Annotation , Phylogeny
6.
Nucleic Acids Res ; 43(Database issue): D222-6, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25414356

ABSTRACT

NCBI's CDD, the Conserved Domain Database, enters its 15th year as a public resource for the annotation of proteins with the location of conserved domain footprints. Going forward, we strive to improve the coverage and consistency of domain annotation provided by CDD. We maintain a live search system as well as an archive of pre-computed domain annotation for sequences tracked in NCBI's Entrez protein database, which can be retrieved for single sequences or in bulk. We also maintain import procedures so that CDD contains domain models and domain definitions provided by several collections available in the public domain, as well as those produced by an in-house curation effort. The curation effort aims at increasing coverage and providing finer-grained classifications of common protein domains, for which a wealth of functional and structural data has become available. CDD curation generates alignment models of representative sequence fragments, which are in agreement with domain boundaries as observed in protein 3D structure, and which model the structurally conserved cores of domain families as well as annotate conserved features. CDD can be accessed at http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml.


Subjects
Protein Databases , Tertiary Protein Structure , Amino Acid Motifs , Amino Acid Sequence , Conserved Sequence , Data Curation
7.
Nucleic Acids Res ; 41(Database issue): D348-52, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23197659

ABSTRACT

CDD, the Conserved Domain Database, is part of NCBI's Entrez query and retrieval system and is also accessible via http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml. CDD provides annotation of protein sequences with the location of conserved domain footprints and functional sites inferred from these footprints. Pre-computed annotation is available via Entrez, and interactive search services accept single protein or nucleotide queries, as well as batch submissions of protein query sequences, utilizing RPS-BLAST to rapidly identify putative matches. CDD incorporates several protein domain and full-length protein model collections, and maintains an active curation effort that aims at providing fine grained classifications for major and well-characterized protein domain families, as supported by available protein three-dimensional (3D) structure and the published literature. To this date, the majority of protein 3D structures are represented by models tracked by CDD, and CDD curators are characterizing novel families that emerge from protein structure determination efforts.


Subjects
Protein Databases , Protein Conformation , Tertiary Protein Structure , Amino Acid Sequence , Conserved Sequence , Internet , Molecular Models , Molecular Sequence Annotation , Proteins/chemistry , Proteins/classification , Proteins/genetics , Protein Sequence Analysis
8.
Nucleic Acids Res ; 40(Database issue): D461-4, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22135289

ABSTRACT

Close to 60% of protein sequences tracked in comprehensive databases can be mapped to a known three-dimensional (3D) structure by standard sequence similarity searches. Potentially, a great deal can be learned about proteins or protein families of interest from considering 3D structure, and to this day 3D structure data may remain an underutilized resource. Here we present enhancements in the Molecular Modeling Database (MMDB) and its data presentation, specifically pertaining to biologically relevant complexes and molecular interactions. MMDB is tightly integrated with NCBI's Entrez search and retrieval system, and mirrors the contents of the Protein Data Bank. It links protein 3D structure data with sequence data, sequence classification resources and PubChem, a repository of small-molecule chemical structures and their biological activities, facilitating access to 3D structure data not only for structural biologists, but also for molecular biologists and chemists. MMDB provides a complete set of detailed and pre-computed structural alignments obtained with the VAST algorithm, and provides visualization tools for 3D structure and structure/sequence alignment via the molecular graphics viewer Cn3D. MMDB can be accessed at http://www.ncbi.nlm.nih.gov/structure.


Subjects
Protein Databases , Molecular Models , Protein Conformation , Protein Sequence Analysis
9.
Nucleic Acids Res ; 39(Database issue): D225-9, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21109532

ABSTRACT

NCBI's Conserved Domain Database (CDD) is a resource for the annotation of protein sequences with the location of conserved domain footprints, and functional sites inferred from these footprints. CDD includes manually curated domain models that make use of protein 3D structure to refine domain models and provide insights into sequence/structure/function relationships. Manually curated models are organized hierarchically if they describe domain families that are clearly related by common descent. As CDD also imports domain family models from a variety of external sources, it is a partially redundant collection. To simplify protein annotation, redundant models and models describing homologous families are clustered into superfamilies. By default, domain footprints are annotated with the corresponding superfamily designation, on top of which specific annotation may indicate high-confidence assignment of family membership. Pre-computed domain annotation is available for proteins in the Entrez/Protein dataset, and a novel interface, Batch CD-Search, allows the computation and download of annotation for large sets of protein queries. CDD can be accessed via http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml.


Subjects
Protein Databases , Tertiary Protein Structure , Amino Acid Sequence , Conserved Sequence , Biological Models , Proteins/classification , Protein Sequence Analysis
10.
Aliment Pharmacol Ther ; 55(5): 558-567, 2022 03.
Article in English | MEDLINE | ID: mdl-35032052

ABSTRACT

BACKGROUND: Hepatitis B surface antigen (HBsAg) seroclearance is the most important milestone indicating favourable clinical outcomes in patients with chronic hepatitis B (CHB). However, it is difficult to achieve due to the impaired HBV-specific immunity, such as programmed cell death 1 (PD-1)-associated T cell exhaustion. We assessed soluble PD-1 (sPD-1) as a novel seromarker for predicting spontaneous HBsAg loss. METHODS: Serial serum levels of sPD-1 were evaluated in 1046 untreated hepatitis B e antigen (HBeAg)-seronegative individuals who had achieved undetectable serum HBV DNA. Multiple regression analyses were applied to assess associations among baseline and subsequent sPD-1 levels, HBsAg decline during follow-up, and spontaneous HBsAg seroclearance. RESULTS: A total of 390 individuals achieved spontaneous HBsAg seroclearance during 6464.4 person-years of follow-up. Baseline sPD-1 levels were inversely associated with baseline HBsAg levels (qHBsAg) as well as a greater decline in qHBsAg during follow-up. Incidence rates of HBsAg seroclearance were 11.5, 61.7, 96.7 and 151.0 per 1000 person-years for sPD-1 levels of ≥4000, 536-3999, 125-535 and <125 pg/mL, respectively (Ptrend < 0.0001). Compared with baseline sPD-1 levels ≥4000 pg/mL, the rate ratio (95% CI) of HBsAg seroclearance was 2.1 (1.1-3.9), 3.0 (1.6-5.5) and 5.1 (2.8-9.5), for baseline sPD-1 levels of 536-3999, 125-535 and <125 pg/mL, respectively, after adjustment for sex, age and serum alanine aminotransferase and HBsAg levels. CONCLUSION: sPD-1 level is a novel marker which independently predicts spontaneous HBsAg seroclearance of HBeAg-negative inactive CHB patients with undetectable HBV DNA.
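The person-time arithmetic behind the figures in the abstract above can be worked through in a few lines. The following is a small Python sketch using only numbers quoted in the abstract; the function name is illustrative. The crude rate ratios it computes are plain quotients of the reported stratum rates and are not the adjusted ratios (2.1, 3.0, 5.1) reported by the authors, which come from a regression model adjusted for sex, age, alanine aminotransferase and HBsAg levels.

```python
def rate_per_1000_person_years(events, person_years):
    """Incidence rate expressed per 1000 person-years of follow-up."""
    return 1000.0 * events / person_years


# Overall rate from the totals in the abstract: 390 events, 6464.4 person-years.
overall_rate = rate_per_1000_person_years(390, 6464.4)
print(f"overall: {overall_rate:.1f} per 1000 person-years")

# Stratum-specific rates as reported (per 1000 person-years), keyed by
# baseline sPD-1 level in pg/mL.
reported_rates = {">=4000": 11.5, "536-3999": 61.7, "125-535": 96.7, "<125": 151.0}

# Crude (unadjusted) rate ratios against the >=4000 pg/mL reference group.
reference = reported_rates[">=4000"]
crude_ratios = {
    level: rate / reference
    for level, rate in reported_rates.items()
    if level != ">=4000"
}
for level, ratio in crude_ratios.items():
    print(f"{level}: crude rate ratio {ratio:.2f}")
```

The overall rate comes to roughly 60 per 1000 person-years. The crude ratios are noticeably larger than the adjusted ones, which is consistent with part of the unadjusted difference being explained by the covariates in the model.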


Subjects
Chronic Hepatitis B , Apoptosis , Viral DNA/genetics , Hepatitis B Surface Antigens , Hepatitis B e Antigens , Hepatitis B Virus/genetics , Chronic Hepatitis B/drug therapy , Humans , Programmed Cell Death 1 Receptor
11.
Front Mol Biosci ; 9: 831740, 2022.
Article in English | MEDLINE | ID: mdl-35252351

ABSTRACT

iCn3D was initially developed as a web-based 3D molecular viewer. It then evolved from visualization into a full-featured interactive structural analysis software. It became a collaborative research instrument through the sharing of permanent, shortened URLs that encapsulate not only annotated visual molecular scenes, but also all underlying data and analysis scripts in a FAIR manner. More recently, with the growth of structural databases, the need to analyze large structural datasets systematically led us to use Python scripts and convert the code to be used in Node.js scripts. We showed a few examples of Python scripts at https://github.com/ncbi/icn3d/tree/master/icn3dpython to export secondary structures or PNG images from iCn3D. Users just need to replace the URL in the Python scripts to export other annotations from iCn3D. Furthermore, any interactive iCn3D feature can be converted into a Node.js script to be run in batch mode, enabling an interactive analysis performed on one or a handful of protein complexes to be scaled up to analysis features of large ensembles of structures. Currently available Node.js analysis scripts examples are available at https://github.com/ncbi/icn3d/tree/master/icn3dnode. This development will enable ensemble analyses on growing structural databases such as AlphaFold or RoseTTAFold on one hand and Electron Microscopy on the other. In this paper, we also review new features such as DelPhi electrostatic potential, 3D view of mutations, alignment of multiple chains, assembly of multiple structures by realignment, dynamic symmetry calculation, 2D cartoons at different levels, interactive contact maps, and use of iCn3D in Jupyter Notebook as described at https://pypi.org/project/icn3dpy.

12.
Nucleic Acids Res ; 37(Database issue): D205-10, 2009 Jan.
Article in English | MEDLINE | ID: mdl-18984618

ABSTRACT

NCBI's Conserved Domain Database (CDD) is a collection of multiple sequence alignments and derived database search models, which represent protein domains conserved in molecular evolution. The collection can be accessed at http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml, and is also part of NCBI's Entrez query and retrieval system, cross-linked to numerous other resources. CDD provides annotation of domain footprints and conserved functional sites on protein sequences. Precalculated domain annotation can be retrieved for protein sequences tracked in NCBI's Entrez system, and CDD's collection of models can be queried with novel protein sequences via the CD-Search service at http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi. Starting with the latest version of CDD, v2.14, information from redundant and homologous domain models is summarized at a superfamily level, and domain annotation on proteins is flagged as either 'specific' (identifying molecular function with high confidence) or as 'non-specific' (identifying superfamily membership only).


Subjects
Protein Databases , Tertiary Protein Structure , Amino Acid Sequence , Conserved Sequence , Proteins/classification , Sequence Alignment , Protein Sequence Analysis
13.
Nucleic Acids Res ; 35(Database issue): D298-300, 2007 Jan.
Article in English | MEDLINE | ID: mdl-17135201

ABSTRACT

Three-dimensional (3D) structure is now known for a large fraction of all protein families. Thus, it has become rather likely that one will find a homolog with known 3D structure when searching a sequence database with an arbitrary query sequence. Depending on the extent of similarity, such neighbor relationships may allow one to infer biological function and to identify functional sites such as binding motifs or catalytic centers. Entrez's 3D-structure database, the Molecular Modeling Database (MMDB), provides easy access to the richness of 3D structure data and its large potential for functional annotation. Entrez's search engine offers several tools to assist biologist users: (i) links between databases, such as between protein sequences and structures, (ii) pre-computed sequence and structure neighbors, (iii) visualization of structure and sequence/structure alignment. Here, we describe an annotation service that combines some of these tools automatically, Entrez's 'Related Structure' links. For all proteins in Entrez, similar sequences with known 3D structure are detected by BLAST and alignments are recorded. The 'Related Structure' service summarizes this information and presents 3D views mapping sequence residues onto all 3D structures available in MMDB (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=structure).


Subjects
Protein Databases , Molecular Models , Protein Conformation , Protein Sequence Analysis , Internet , Sequence Alignment , User-Computer Interface
14.
Nucleic Acids Res ; 35(Database issue): D237-40, 2007 Jan.
Article in English | MEDLINE | ID: mdl-17135202

ABSTRACT

The conserved domain database (CDD) is part of NCBI's Entrez database system and serves as a primary resource for the annotation of conserved domain footprints on protein sequences in Entrez. Entrez's global query interface can be accessed at http://www.ncbi.nlm.nih.gov/Entrez and will search CDD and many other databases. Domain annotation for proteins in Entrez has been pre-computed and is readily available in the form of 'Conserved Domain' links. Novel protein sequences can be scanned against CDD using the CD-Search service; this service searches databases of CDD-derived profile models with protein sequence queries using BLAST heuristics, at http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi. Protein query sequences submitted to NCBI's protein BLAST search service are scanned for conserved domain signatures by default. The CDD collection contains models imported from Pfam, SMART and COG, as well as domain models curated at NCBI. NCBI curated models are organized into hierarchies of domains related by common descent. Here we report on the status of the curation effort and present a novel helper application, CDTree, which enables users of the CDD resource to examine curated hierarchies. More importantly, CDD and CDTree used in concert, serve as a powerful tool in protein classification, as they allow users to analyze protein sequences in the context of domain family hierarchies.


Subjects
Protein Databases , Tertiary Protein Structure , Amino Acid Sequence , Animals , Conserved Sequence , Internet , Phylogeny , Tertiary Protein Structure/genetics , Proteins/classification , Protein Sequence Analysis , User-Computer Interface
15.
Article in English | MEDLINE | ID: mdl-25767294

ABSTRACT

When annotating protein sequences with the footprints of evolutionarily conserved domains, conservative score or E-value thresholds need to be applied for RPS-BLAST hits, to avoid many false positives. We notice that manual inspection and classification of hits gathered at a higher threshold can add a significant amount of valuable domain annotation. We report an automated algorithm that 'rescues' valuable borderline-scoring domain hits that are well-supported by domain architecture (DA, the sequential order of conserved domains in a protein query), including tandem repeats of domain hits reported at a more conservative threshold. This algorithm is now available as a selectable option on the public conserved domain search (CD-Search) pages. We also report on the possibility to 'suppress' domain hits close to the threshold based on a lack of well-supported DA and to implement this conservatively as an option in live conserved domain searches and for pre-computed results. Improving domain annotation consistency will in turn reduce the fraction of NR sequences with incomplete DAs.
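The "rescue" idea described in the abstract above can be illustrated with a toy example. The following Python sketch is not the published algorithm: the two E-value thresholds, the `Hit` record, the set of known architectures, and the simplified repeat test (same domain already reported confidently, without checking adjacency) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    domain: str    # domain model name (hypothetical labels below)
    start: int     # start position of the hit on the query
    evalue: float  # E-value of the hit


def rescue_hits(hits, known_architectures, strict=0.01, relaxed=1.0):
    """Keep confident hits plus borderline hits supported by architecture.

    strict and relaxed are hypothetical E-value thresholds. A borderline hit
    (strict < E-value <= relaxed) is rescued if it repeats a confidently
    reported domain, or if adding it to the confident hits yields a domain
    architecture listed in known_architectures.
    """
    confident = [h for h in hits if h.evalue <= strict]
    borderline = [h for h in hits if strict < h.evalue <= relaxed]
    confident_domains = {h.domain for h in confident}

    kept = list(confident)
    for hit in borderline:
        # Architecture = domain names in sequential order along the query.
        ordered = sorted(confident + [hit], key=lambda h: h.start)
        architecture = tuple(h.domain for h in ordered)
        is_repeat = hit.domain in confident_domains
        if is_repeat or architecture in known_architectures:
            kept.append(hit)
    return sorted(kept, key=lambda h: h.start)


hits = [
    Hit("domA", 10, 1e-30),  # confident
    Hit("domA", 120, 0.2),   # borderline repeat of a confident domain
    Hit("domB", 300, 0.5),   # borderline, completes a known architecture
    Hit("domC", 450, 0.8),   # borderline, no architectural support
]
known = {("domA", "domB")}
result = rescue_hits(hits, known)
print([h.domain for h in result])
```

Each borderline hit is judged independently against the confident hits, so the outcome does not depend on the order in which hits are listed. In this toy input, the repeated `domA` hit and the `domB` hit are retained while `domC` is dropped.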


Subjects
Algorithms , Protein Databases , Molecular Sequence Annotation/methods , Protein Sequence Analysis/methods , Tertiary Protein Structure