Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 6 de 6
Filtrar
1.
Nucleic Acids Res ; 47(D1): D398-D402, 2019 01 08.
Artículo en Inglés | MEDLINE | ID: mdl-30371819

RESUMEN

MoonDB 2.0 (http://moondb.hb.univ-amu.fr/) is a database of predicted and manually curated extreme multifunctional (EMF) and moonlighting proteins, i.e. proteins that perform multiple unrelated functions. We have previously shown that such proteins can be predicted through the analysis of their molecular interaction subnetworks, their functional annotations and their association to distinct groups of proteins that are involved in unrelated functions. In MoonDB 2.0, we updated the set of human EMF proteins (238 proteins), using the latest functional annotations and protein-protein interaction networks. Furthermore, for the first time, we applied our method to four additional model organisms - mouse, fly, worm and yeast - and identified 54 novel EMF proteins in these species. In addition to novel predictions, this update contains 63 human and yeast proteins that were manually curated from literature, including descriptions of moonlighting functions and associated references. Importantly, MoonDB's interface was fully redesigned and improved, and its entries are now cross-referenced in the UniProt Knowledgebase (UniProtKB). MoonDB will be updated once a year with the novel EMF candidates calculated from the latest available protein interactions and functional annotations.


Asunto(s)
Bases de Datos de Proteínas , Animales , Caenorhabditis elegans/genética , Curaduría de Datos , Drosophila melanogaster/genética , Ontología de Genes , Humanos , Ratones , Anotación de Secuencia Molecular , Mapeo de Interacción de Proteínas , Interfaz Usuario-Computador , Levaduras/genética
2.
Nucleic Acids Res ; 46(D1): D428-D434, 2018 01 04.
Artículo en Inglés | MEDLINE | ID: mdl-29136216

RESUMEN

Short linear motifs (SLiMs) are protein binding modules that play major roles in almost all cellular processes. SLiMs are short, often highly degenerate, difficult to characterize and hard to detect. The eukaryotic linear motif (ELM) resource (elm.eu.org) is dedicated to SLiMs, consisting of a manually curated database of over 275 motif classes and over 3000 motif instances, and a pipeline to discover candidate SLiMs in protein sequences. For 15 years, ELM has been one of the major resources for motif research. In this database update, we present the latest additions to the database including 32 new motif classes, and new features including Uniprot and Reactome integration. Finally, to help provide cellular context, we present some biological insights about SLiMs in the cell cycle, as targets for bacterial pathogenicity and their functionality in the human kinome.


Asunto(s)
Bases de Datos de Proteínas , Células Eucariotas/metabolismo , Interacciones Huésped-Patógeno/genética , Anotación de Secuencia Molecular , Proteínas/química , Programas Informáticos , Secuencias de Aminoácidos , Animales , Bacterias/genética , Bacterias/metabolismo , Sitios de Unión , Ciclo Celular/genética , Células Eucariotas/citología , Células Eucariotas/microbiología , Células Eucariotas/virología , Hongos/genética , Hongos/metabolismo , Humanos , Internet , Modelos Moleculares , Plantas/genética , Plantas/metabolismo , Unión Proteica , Conformación Proteica en Hélice alfa , Conformación Proteica en Lámina beta , Dominios y Motivos de Interacción de Proteínas , Proteínas/genética , Proteínas/metabolismo , Virus/genética , Virus/metabolismo
3.
Proteomics ; 15(1): 48-57, 2015 Jan.
Artículo en Inglés | MEDLINE | ID: mdl-25307260

RESUMEN

In this article, we provide a comprehensive study of the content of the Universal Protein Resource (UniProt) protein data sets for human and mouse. The tryptic search spaces of the UniProtKB (UniProt knowledgebase) complete proteome sets were compared with other data sets from UniProtKB and with the corresponding International Protein Index, reference sequence, Ensembl, and UniRef100 (where UniRef is UniProt reference clusters) organism-specific data sets. All protein forms annotated in UniProtKB (both the canonical sequences and isoforms) were evaluated in this study. In addition, natural and disease-associated amino acid variants annotated in UniProtKB were included in the evaluation. The peptide unicity was also evaluated for each data set. Furthermore, the peptide information in the UniProtKB data sets was also compared against the available peptide-level identifications in the main MS-based proteomics repositories. Identifying the peptides observed in these repositories is an important resource of information for protein databases as they provide supporting evidence for the existence of otherwise predicted proteins. Likewise, the repositories could use the information available in UniProtKB to direct reprocessing efforts on specific sets of peptides/proteins of interest. In summary, we provide comprehensive information about the different organism-specific sequence data sets available from UniProt, together with the pros and cons for each, in terms of search space for MS-based bottom-up proteomics workflows. The aim of the analysis is to provide a clear view of the tryptic search space of UniProt and other protein databases to enable scientists to select those most appropriate for their purposes.


Asunto(s)
Bases de Datos de Proteínas , Proteínas/química , Proteómica , Animales , Humanos , Ratones , Péptidos/química , Péptidos/metabolismo , Isoformas de Proteínas/química , Isoformas de Proteínas/metabolismo , Proteínas/metabolismo , Análisis de Secuencia de Proteína , Tripsina/metabolismo
4.
Nucleic Acids Res ; 40(Database issue): D565-70, 2012 Jan.
Artículo en Inglés | MEDLINE | ID: mdl-22123736

RESUMEN

The GO annotation dataset provided by the UniProt Consortium (GOA: http://www.ebi.ac.uk/GOA) is a comprehensive set of evidenced-based associations between terms from the Gene Ontology resource and UniProtKB proteins. Currently supplying over 100 million annotations to 11 million proteins in more than 360,000 taxa, this resource has increased 2-fold over the last 2 years and has benefited from a wealth of checks to improve annotation correctness and consistency as well as now supplying a greater information content enabled by GO Consortium annotation format developments. Detailed, manual GO annotations obtained from the curation of peer-reviewed papers are directly contributed by all UniProt curators and supplemented with manual and electronic annotations from 36 model organism and domain-focused scientific resources. The inclusion of high-quality, automatic annotation predictions ensures the UniProt GO annotation dataset supplies functional information to a wide range of proteins, including those from poorly characterized, non-model organism species. UniProt GO annotations are freely available in a range of formats accessible by both file downloads and web-based views. In addition, the introduction of a new, normalized file format in 2010 has made for easier handling of the complete UniProt-GOA data set.


Asunto(s)
Bases de Datos de Proteínas , Anotación de Secuencia Molecular , Vocabulario Controlado , Anotación de Secuencia Molecular/normas
5.
Artículo en Inglés | MEDLINE | ID: mdl-28025334

RESUMEN

Advances in high-throughput sequencing have led to an unprecedented growth in genome sequences being submitted to biological databases. In particular, the sequencing of large numbers of nearly identical bacterial genomes during infection outbreaks and for other large-scale studies has resulted in a high level of redundancy in nucleotide databases and consequently in the UniProt Knowledgebase (UniProtKB). Redundancy negatively impacts on database searches by causing slower searches, an increase in statistical bias and cumbersome result analysis. The redundancy combined with the large data volume increases the computational costs for most reuses of UniProtKB data. All of this poses challenges for effective discovery in this wealth of data. With the continuing development of sequencing technologies, it is clear that finding ways to minimize redundancy is crucial to maintaining UniProt's essential contribution to data interpretation by our users. We have developed a methodology to identify and remove highly redundant proteomes from UniProtKB. The procedure identifies redundant proteomes by performing pairwise alignments of sets of sequences for pairs of proteomes and subsequently, applies graph theory to find dominating sets that provide a set of non-redundant proteomes with a minimal loss of information. This method was implemented for bacteria in mid-2015, resulting in a removal of 50 million proteins in UniProtKB. With every new release, this procedure is used to filter new incoming proteomes, resulting in a more scalable and scientifically valuable growth of UniProtKB.Database URL: http://www.uniprot.org/proteomes/.


Asunto(s)
Bacterias/genética , Proteínas Bacterianas/genética , Bases de Datos de Proteínas , Anotación de Secuencia Molecular/métodos , Proteoma/genética , Análisis de Secuencia de Proteína/métodos , Bacterias/metabolismo , Proteínas Bacterianas/metabolismo , Proteoma/metabolismo
6.
Artículo en Inglés | MEDLINE | ID: mdl-27114493

RESUMEN

Domain-specific databases are essential resources for the biomedical community, leveraging expert knowledge to curate published literature and provide access to referenced data and knowledge. The limited scope of these databases, however, poses important challenges on their infrastructure, visibility, funding and usefulness to the broader scientific community. CollecTF is a community-oriented database documenting experimentally validated transcription factor (TF)-binding sites in the Bacteria domain. In its quest to become a community resource for the annotation of transcriptional regulatory elements in bacterial genomes, CollecTF aims to move away from the conventional data-repository paradigm of domain-specific databases. Through the adoption of well-established ontologies, identifiers and collaborations, CollecTF has progressively become also a portal for the annotation and submission of information on transcriptional regulatory elements to major biological sequence resources (RefSeq, UniProtKB and the Gene Ontology Consortium). This fundamental change in database conception capitalizes on the domain-specific knowledge of contributing communities to provide high-quality annotations, while leveraging the availability of stable information hubs to promote long-term access and provide high-visibility to the data. As a submission portal, CollecTF generates TF-binding site information through direct annotation of RefSeq genome records, definition of TF-based regulatory networks in UniProtKB entries and submission of functional annotations to the Gene Ontology. As a database, CollecTF provides enhanced search and browsing, targeted data exports, binding motif analysis tools and integration with motif discovery and search platforms. This innovative approach will allow CollecTF to focus its limited resources on the generation of high-quality information and the provision of specialized access to the data.Database URL: http://www.collectf.org/.


Asunto(s)
Sistemas de Administración de Bases de Datos , Bases de Datos Genéticas , Conjuntos de Datos como Asunto , Interfaz Usuario-Computador
SELECCIÓN DE REFERENCIAS
DETALLE DE LA BÚSQUEDA