ABSTRACT
CORUM is a manually curated repository of experimentally characterized protein complexes from mammalian organisms, mainly human (64%), mouse (16%) and rat (12%). Protein complexes are key molecular entities that integrate multiple gene products to perform cellular functions. The new CORUM 2.0 release encompasses 2837 protein complexes, offering the largest and most comprehensive publicly available dataset of mammalian protein complexes. The CORUM dataset is built from 3198 different genes, representing approximately 16% of the protein-coding genes in humans. Each protein complex is described by a protein complex name, subunit composition and function, as well as the literature reference that characterizes the respective complex. Recent developments include mapping of functional annotation to Gene Ontology terms as well as cross-references to Entrez Gene identifiers. In addition, a 'Phylogenetic Conservation' analysis tool was implemented that analyses the potential occurrence of orthologous protein complex subunits in mammals and other selected groups of organisms. This allows one to predict the occurrence of protein complexes in different phylogenetic groups. CORUM is freely accessible at http://mips.helmholtz-muenchen.de/genre/proj/corum/index.html.
Subjects
Computational Biology/methods; Databases, Genetic; Databases, Protein; Multiprotein Complexes; Animals; Computational Biology/trends; Humans; Information Storage and Retrieval/methods; Internet; Mice; Phylogeny; Protein Structure, Tertiary; Rats; Saccharomyces cerevisiae/genetics; Software
ABSTRACT
The bacterial type II protein secretion (T2S) and type IV piliation (T4P) systems share several common features. In particular, it is well established that the T2S system requires a pilus-like structure, called the pseudopilus, which is assembled from pilin-like subunits called pseudopilins. Pilins and pseudopilins have a hydrophobic N-terminal region that precedes an extended hydrophilic C-terminal region. For pilins, oligomerisation and formation of helical fibers have been shown to take place through interaction between the hydrophobic domains. XcpT is the most abundant protein of the Pseudomonas aeruginosa T2S system and has been proposed to be the main component of the pseudopilus. In this study we present the high-resolution NMR structure of the hydrophilic domain of XcpT (XcpTp). XcpTp lacks the C-terminal disulfide-bridged "D" domain that is found in type IV pilins and is likely involved in receptor binding. This agrees with the idea that the XcpT-containing pseudopilus is required for protein secretion and not for bacterial attachment. Interestingly, the 3D structure of XcpTp revealed that the region previously called the αβ-loop in pilins is in fact highly conserved among major type II pseudopilins and constitutes a specific consensus motif for identifying the major pseudopilins that belong to this family.
Subjects
Bacterial Proteins/metabolism; Membrane Transport Proteins/chemistry; Membrane Transport Proteins/metabolism; Pseudomonas aeruginosa/metabolism; Amino Acid Sequence; Bacterial Proteins/chemistry; Molecular Sequence Data; Nuclear Magnetic Resonance, Biomolecular; Protein Structure, Secondary; Sequence Homology, Amino Acid; Structural Homology, Protein
ABSTRACT
Cross-mapping of gene and protein identifiers between different databases is a tedious and time-consuming task. To overcome this, we developed CRONOS, a cross-reference server that contains entries from five mammalian organisms as presented by major gene and protein information resources. Sequence similarity analysis of the mapped entries shows that the cross-references are highly accurate. In total, up to 18 different identifier types can be used to identify cross-references. The quality of the mapping could be improved substantially by excluding ambiguous gene and protein names, which were manually validated. Organism-specific lists of ambiguous terms, which are valuable for a variety of bioinformatics applications such as text mining, are available for download. AVAILABILITY: CRONOS is freely available to non-commercial users at http://mips.gsf.de/genre/proj/cronos/index.html; web services are available at http://mips.gsf.de/CronosWSService/CronosWS?wsdl.
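The idea of cross-mapping identifiers while setting ambiguous names aside can be illustrated with a minimal sketch. It is not the CRONOS implementation: the record layout, the identifier types (`name`, `gene_id`, `protein_id`), the toy values and both function names are assumptions made purely for illustration.

```python
from collections import defaultdict

def build_index(records, id_type):
    """Map each value of `id_type` to the list of records that carry it."""
    index = defaultdict(list)
    for rec in records:
        if id_type in rec:
            index[rec[id_type]].append(rec)
    return index

def cross_reference(records, source_type, target_type):
    """Return (mapping, ambiguous).

    mapping   -- source values that resolve to exactly one target value
    ambiguous -- sorted source values that resolve to several targets
    """
    index = build_index(records, source_type)
    mapping, ambiguous = {}, []
    for key, hits in index.items():
        targets = {rec[target_type] for rec in hits if target_type in rec}
        if len(targets) == 1:
            mapping[key] = next(iter(targets))
        elif len(targets) > 1:
            ambiguous.append(key)
    return mapping, sorted(ambiguous)

# Hypothetical toy records; the name "beta" is shared by two genes.
TOY_RECORDS = [
    {"name": "alpha", "gene_id": "G1", "protein_id": "P1"},
    {"name": "beta", "gene_id": "G2", "protein_id": "P2"},
    {"name": "beta", "gene_id": "G3", "protein_id": "P3"},
]

mapping, ambiguous = cross_reference(TOY_RECORDS, "name", "gene_id")
print(mapping)    # only unambiguous names are mapped
print(ambiguous)  # names excluded from the mapping
```

In this sketch an ambiguous name is detected automatically from the data, whereas the abstract describes ambiguous terms as manually validated; the collected `ambiguous` list only plays the role of such an exclusion list.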
Subjects
Computational Biology/instrumentation; Computational Biology/methods; Internet; Software; Animals; Genes; Humans; Proteins
ABSTRACT
The generation of expressed sequence tag (EST) libraries offers an affordable approach to investigating organisms for which no genome sequence is available. OREST (http://mips.gsf.de/genre/proj/orest/index.html) is a server-based EST analysis pipeline that allows the rapid analysis of large numbers of ESTs or cDNAs from mammals and fungi. To assign the ESTs to genes or proteins, OREST maps DNA sequences to reference datasets of gene products and, in a second step, to complete genome sequences. Mapping against genome sequences recovers an additional 13% of EST data, which would otherwise escape further analysis. To enable functional analysis of the datasets, ESTs are functionally annotated using the hierarchical FunCat annotation scheme as well as GO annotation terms. OREST also allows prediction of associations between gene products and diseases by Morbid Map (OMIM) classification. A statistical analysis of the results is possible with the included PROMPT software, which provides information about enrichment and depletion of functional and disease annotation terms. OREST was successfully applied to the identification and functional characterization of more than 3000 EST sequences of the common marmoset monkey (Callithrix jacchus) as part of an international collaboration.
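The two-step assignment described above (gene products first, genome second) can be sketched as follows. This is only an illustration under strong assumptions: exact substring matching stands in for a real alignment tool, and the function names, sequences and identifiers are hypothetical, not part of OREST.

```python
def map_ests(ests, gene_products, genome):
    """Assign each EST to a gene product first, then fall back to the genome.

    Exact substring matching is a stand-in for sequence alignment.
    Returns {est_id: (kind, hit)} with kind in
    {"gene_product", "genome", "unmapped"}.
    """
    assignments = {}
    for est_id, seq in ests.items():
        # Step 1: look for the EST inside a reference gene product.
        hit = next((gid for gid, ref in gene_products.items() if seq in ref), None)
        if hit is not None:
            assignments[est_id] = ("gene_product", hit)
            continue
        # Step 2: fall back to the complete genome sequence.
        pos = genome.find(seq)
        if pos != -1:
            assignments[est_id] = ("genome", pos)
        else:
            assignments[est_id] = ("unmapped", None)
    return assignments

def recovered_fraction(assignments):
    """Fraction of all ESTs that were assigned only in the genome step."""
    if not assignments:
        return 0.0
    rescued = sum(1 for kind, _ in assignments.values() if kind == "genome")
    return rescued / len(assignments)

# Hypothetical toy data.
gene_products = {"geneA": "ATGGCCAAGT", "geneB": "TTGACCGGTA"}
genome = "CCCATGGCCAAGTGGGTTGACCGGTAAAACGTACGT"
ests = {"est1": "GGCCAA", "est2": "ACGTAC", "est3": "TTTTTT"}

assignments = map_ests(ests, gene_products, genome)
print(assignments)
print(recovered_fraction(assignments))
```

Here `recovered_fraction` treats the recovered share as a fraction of all input ESTs; the abstract does not state which denominator underlies its 13% figure, so that choice is an assumption.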
Subjects
Expressed Sequence Tags/chemistry; Software; Animals; Chromosome Mapping; Genes, Fungal; Humans; Internet; Mammals/genetics; Mice; Proteins/genetics; Rats; Saccharomyces cerevisiae/genetics; Sequence Analysis, DNA
ABSTRACT
Protein complexes are key molecular entities that integrate multiple gene products to perform cellular functions. The CORUM (http://mips.gsf.de/genre/proj/corum/index.html) database is a collection of experimentally verified mammalian protein complexes. Information is manually derived by expert annotators through critical reading of the scientific literature. Information about protein complexes includes protein complex names, subunits and literature references, as well as the function of the complexes. For functional annotation, we use the FunCat catalogue, which makes it possible to organize the protein complex space into biologically meaningful subsets. The database contains more than 1750 protein complexes that are built from 2400 different genes, thus representing 12% of the protein-coding genes in human. A web-based system is available to query, view and download the data. CORUM provides a comprehensive dataset of protein complexes for discoveries in systems biology and for analyses of protein networks and protein complex-associated diseases. Comparable to the MIPS reference dataset of protein complexes from yeast, CORUM is intended to serve as a reference for mammalian protein complexes.
Subjects
Databases, Protein; Multiprotein Complexes/physiology; Animals; Humans; Internet; Mice; Multiprotein Complexes/analysis; Multiprotein Complexes/chemistry; Rats; User-Computer Interface
ABSTRACT
BACKGROUND: The common marmoset monkey (Callithrix jacchus), a small non-endangered New World primate native to eastern Brazil, is increasingly used as a non-human primate model in biomedical research, drug development and safety assessment. In contrast to the growing interest in the marmoset as an animal model, the molecular tools for genetic analysis are extremely limited. RESULTS: Here we report the development of the first marmoset-specific oligonucleotide microarray (EUMAMA), containing probe sets that target 1541 different marmoset transcripts expressed in hippocampus. These 1541 transcripts represent a wide variety of functional gene classes. Hybridisation of the marmoset microarray with labelled RNA from hippocampus, cortex and a panel of 7 different peripheral tissues resulted in high detection rates of 85% in the neuronal tissues and on average 70% in the non-neuronal tissues. The expression profiles of the 2 neuronal tissues, hippocampus and cortex, were highly similar, as indicated by a correlation coefficient of 0.96. Several transcripts with a tissue-specific pattern of expression were identified. In addition to the marmoset microarray, we have generated 3215 ESTs derived from marmoset hippocampus, which have been annotated and submitted to GenBank [GenBank: EF214838-EF215447, EH380242-EH382846]. CONCLUSION: We have generated the first marmoset-specific DNA microarray and demonstrated its use to characterise large-scale gene expression profiles of hippocampus as well as of other neuronal and non-neuronal tissues. In addition, we have generated a large collection of ESTs of marmoset origin, which are now available in the public domain. These new tools will facilitate molecular genetic research into this non-human primate animal model.
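To make the comparison of two expression profiles concrete, the following minimal sketch computes a correlation coefficient between two lists of expression values. The abstract does not say which coefficient was used; Pearson's r is assumed here, and the function name and toy values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long value lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical expression values for the same transcripts in two tissues.
tissue_a = [5.1, 7.3, 2.2, 9.8, 4.4]
tissue_b = [5.0, 7.9, 2.5, 9.1, 4.0]
print(round(pearson(tissue_a, tissue_b), 3))
```

A value close to 1 indicates that transcripts with high signal in one tissue also tend to have high signal in the other, which is the sense in which the two neuronal profiles are described as highly similar.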
Subjects
Callithrix/genetics; Expressed Sequence Tags; Gene Expression Profiling; Gene Expression Regulation; Animals; Biotinylation; Genetic Techniques; Genome; Hippocampus/metabolism; Molecular Sequence Data; Nucleic Acid Hybridization; Oligonucleotide Array Sequence Analysis; RNA/metabolism; Tissue Distribution
ABSTRACT
The pathobiology of common diseases is influenced by heterogeneous factors interacting in complex networks. CIDeR (http://mips.helmholtz-muenchen.de/cider/) is a publicly available, manually curated, integrative database of metabolic and neurological disorders. The resource provides structured information on 18,813 experimentally validated interactions between molecules, bioprocesses and environmental factors extracted from the scientific literature. Systematic annotation and interactive graphical representation of disease networks make CIDeR a versatile knowledge base for biologists, analysis of large-scale data and systems biology approaches.