1.
Int J Cancer ; 143(11): 2800-2813, 2018 12 01.
Article in English | MEDLINE | ID: mdl-29987844

ABSTRACT

In many families with suspected Lynch syndrome (LS), no germline mutation in the causative mismatch repair (MMR) genes is detected during routine diagnostics. To identify novel causative genes for LS, the present study investigated 77 unrelated, mutation-negative patients with clinically suspected LS and a loss of MSH2 in tumor tissue. An analysis for genomic copy number variants (CNV) was performed, with subsequent next generation sequencing (NGS) of selected candidate genes in a subgroup of the cohort. Genomic DNA was genotyped using Illumina's HumanOmniExpress Bead Array. After quality control and filtering, 25 deletions and 16 duplications encompassing 73 genes were identified in 28 patients. No recurrent CNV was detected, and none of the CNVs affected the regulatory regions of MSH2. A total of 49 candidate genes from genomic regions implicated by the present CNV analysis and 30 known or assumed risk genes for colorectal cancer (CRC) were then sequenced in a subset of 38 patients using a customized NGS gene panel and Sanger sequencing. Single nucleotide variants were identified in 14 candidate genes from the CNV analysis. The most promising of these candidate genes were: (i) PRKCA, PRKDC, and MCM4, as a functional relation to MSH2 is predicted by network analysis, and (ii) CSMD1, as this is commonly mutated in CRC. Furthermore, six patients harbored POLE variants outside the exonuclease domain, suggesting that these might be implicated in hereditary CRC. Analyses in larger cohorts of suspected LS patients recruited via international collaborations are warranted to verify the present findings.


Subject(s)
Colorectal Neoplasms, Hereditary Nonpolyposis/genetics; DNA Copy Number Variations/genetics; Adult; Colorectal Neoplasms/genetics; DNA Mismatch Repair/genetics; Female; Genotype; Germ-Line Mutation/genetics; High-Throughput Nucleotide Sequencing/methods; Humans; Male
2.
Cell Rep ; 12(9): 1519-30, 2015 Sep 01.
Article in English | MEDLINE | ID: mdl-26299969

ABSTRACT

Many cellular processes involve the recruitment of proteins to specific membranes, which are decorated with distinctive lipids that act as docking sites. The phosphoinositides form signaling hubs, and we examine mechanisms underlying recruitment. We applied a physiological, quantitative, liposome microarray-based assay to measure the membrane-binding properties of 91 pleckstrin homology (PH) domains, the most common phosphoinositide-binding target. 10,514 experiments quantified the role of phosphoinositides in membrane recruitment. For most domains examined, the observed binding specificity implied cooperativity with additional signaling lipids. Analyses of PH domains with similar lipid-binding profiles identified a conserved motif, mutations in which (including some found in human cancers) induced discrete changes in binding affinities in vitro and protein mislocalization in vivo. The data set reveals cooperativity as a key mechanism for membrane recruitment and, by enabling the interpretation of disease-associated mutations, suggests avenues for the design of small molecules targeting PH domains.


Subject(s)
Cell Membrane/metabolism; Fungal Proteins/metabolism; Phosphatidylinositols/metabolism; Chaetomium/metabolism; Fungal Proteins/chemistry; Protein Binding; Protein Structure, Tertiary; Protein Transport; Saccharomyces cerevisiae/metabolism
3.
Microb Cell ; 2(1): 5-13, 2015 Jan 05.
Article in English | MEDLINE | ID: mdl-28357258

ABSTRACT

Horizontal gene transfer has emerged as a crucial driving force for the evolution of eukaryotes. This also includes Plasmodium falciparum and related economically and clinically relevant apicomplexan parasites, whose rather small genomes have been shaped not only by natural selection in different host populations but also by horizontal gene transfer following endosymbiosis. However, there is rather little reliable data on horizontal gene transfer between animal hosts or bacteria and apicomplexan parasites. Here we show that apicomplexan homologues of peroxiredoxin 5 (Prx5) have a prokaryotic ancestry and therefore represent a special subclass of Prx5 isoforms in eukaryotes. Using two different immunobiochemical approaches, we found that the P. falciparum Prx5 homologue is dually localized to the parasite plastid and cytosol. This dual localization is reflected by a modular Plasmodium-specific gene architecture consisting of two exons. Despite the plastid localization, our phylogenetic analyses contradict an acquisition by secondary endosymbiosis and support a gene fusion event following a horizontal prokaryote-to-eukaryote gene transfer in early apicomplexans. The results provide unexpected insights into the evolution of apicomplexan parasites as well as the molecular evolution of peroxiredoxins, an important family of ubiquitous, usually highly concentrated thiol-dependent hydroperoxidases that exert functions as detoxifying enzymes, redox sensors and chaperones.

4.
Nucleic Acids Res ; 43(Database issue): D257-60, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25300481

ABSTRACT

SMART (Simple Modular Architecture Research Tool) is a web resource (http://smart.embl.de/) providing simple identification and extensive annotation of protein domains and the exploration of protein domain architectures. In the current version, SMART contains manually curated models for more than 1200 protein domains, with ~200 new models since our last update article. The underlying protein databases were synchronized with UniProt, Ensembl and STRING, bringing the total number of annotated domains and other protein features above 100 million. SMART's 'Genomic' mode, which annotates proteins from completely sequenced genomes, was greatly expanded and now includes 2031 species, compared to 1133 in the previous release. SMART analysis results pages have been completely redesigned and include links to several new information sources. A new, vector-based display engine has been developed for protein schematics in SMART, which can also be exported as high-resolution bitmap images for easy inclusion into other documents. Taxonomic tree displays in SMART have been significantly improved, and can be easily navigated using the integrated search engine.


Subject(s)
Databases, Protein; Protein Structure, Tertiary; Data Curation; Protein Interaction Mapping; Protein Structure, Tertiary/genetics
5.
PLoS One ; 9(11): e111122, 2014.
Article in English | MEDLINE | ID: mdl-25369365

ABSTRACT

Accurate orthology prediction is crucial for many applications in the post-genomic era. The lack of broadly accepted benchmark tests precludes a comprehensive analysis of orthology inference. So far, functional annotation between orthologs serves as a performance proxy. However, this violates the fundamental principle of orthology as an evolutionary definition, while it is often not applicable due to limited experimental evidence for most species. Therefore, we constructed high quality "gold standard" orthologous groups that can serve as a benchmark set for orthology inference in bacterial species. Herein, we used this dataset to demonstrate 1) why a manually curated, phylogeny-based dataset is more appropriate for benchmarking orthology than other popular practices and 2) how it guides database design and parameterization through careful error quantification. More specifically, we illustrate how function-based tests often fail to identify false assignments, misjudging the true performance of orthology inference methods. We also examined how our dataset can instruct the selection of a "core" species repertoire to improve detection accuracy. We conclude that including more genomes at the proper evolutionary distances can influence the overall quality of orthology detection. The curated gene families, called Reference Orthologous Groups, are publicly available at http://eggnog.embl.de/orthobench2.


Subject(s)
Computational Biology; Phylogeny; Bacteria/classification; Computational Biology/standards; Genomics; Internet; User-Computer Interface
6.
Nucleic Acids Res ; 42(22): 13525-33, 2014 Dec 16.
Article in English | MEDLINE | ID: mdl-25398899

ABSTRACT

The thermophilic fungus Chaetomium thermophilum holds great promise for structural biology. To increase the efficiency of its biochemical and structural characterization and to explore its thermophilic properties beyond those of individual proteins, we obtained transcriptomics and proteomics data, and integrated them with computational annotation methods and a multitude of biochemical experiments conducted by the structural biology community. We considerably improved the genome annotation of Chaetomium thermophilum and characterized the transcripts and expression of thousands of genes. We furthermore show that the composition and structure of the expressed proteome of Chaetomium thermophilum are similar to those of its mesophilic relatives. Data were deposited in a publicly available repository and provide a rich resource for the structural biology community.


Subject(s)
Chaetomium/genetics; Genome, Fungal; Molecular Sequence Annotation; Fungal Proteins/genetics; Fungal Proteins/metabolism; Genes, Fungal; Introns; Proteome/metabolism; Pseudogenes; Transcriptome
7.
Genome Biol ; 14(7): R81, 2013 Jul 31.
Article in English | MEDLINE | ID: mdl-23902751

ABSTRACT

BACKGROUND: The interactions between proteins and nucleic acids have a fundamental function in many biological processes, including gene transcription, RNA homeostasis, protein translation and pathogen sensing for innate immunity. While our knowledge of the ensemble of proteins that bind individual mRNAs in mammalian cells has been greatly augmented by recent surveys, no systematic study on the non-sequence-specific engagement of native human proteins with various types of nucleic acids has been reported. RESULTS: We designed an experimental approach to achieve broad coverage of the non-sequence-specific RNA and DNA binding space, including methylated cytosine, and tested for interaction potential with the human proteome. We used 25 rationally designed nucleic acid probes in an affinity purification mass spectrometry and bioinformatics workflow to identify proteins from whole cell extracts of three different human cell lines. The proteins were profiled for their binding preferences to the different general types of nucleic acids. The study identified 746 high-confidence direct binders, 139 of which were novel and 237 devoid of previous experimental evidence. We could assign specific affinities for sub-types of nucleic acid probes to 219 distinct proteins and individual domains. The evolutionarily conserved protein YB-1, previously associated with cancer and drug resistance, was shown to bind methylated cytosine preferentially, potentially conferring upon YB-1 an epigenetics-related function. CONCLUSIONS: The dataset described here represents a rich resource of experimentally determined nucleic acid-binding proteins, and our methodology has great potential for further exploration of the interface between the protein and nucleic acid realms.


Subject(s)
Nucleic Acids/metabolism; Protein Interaction Mapping; Base Sequence; Cell Line; Databases, Protein; Disease; Humans; Protein Binding; Protein Structure, Tertiary; Reproducibility of Results; Substrate Specificity
8.
PLoS One ; 7(4): e34302, 2012.
Article in English | MEDLINE | ID: mdl-22485162

ABSTRACT

The genome of Mycobacterium tuberculosis (H37Rv) contains 4,019 protein-coding genes, of which more than a thousand have been categorized as 'hypothetical', implying that for these not even weak functional associations could be identified so far. We here predict reliable functional indications for half of this large hypothetical orfeome: 497 genes can be annotated based on orthology, and another 125 can be linked to interacting proteins via integrated genomic context analysis and literature mining. The assignments include newly identified clusters of interacting proteins, hypothetical genes that are associated with well-known pathways and putative disease-relevant targets. Altogether, we have raised the fraction of the proteome with at least some functional annotation to 88%, which should considerably enhance the interpretation of large-scale experiments targeting this medically important organism.


Subject(s)
Bacterial Proteins/genetics; Molecular Sequence Annotation; Mycobacterium tuberculosis/genetics; Open Reading Frames; Biosynthetic Pathways/genetics; Cell Wall/genetics; Operon; Phylogeny; Protein Interaction Maps; Sequence Analysis, DNA
9.
Br J Haematol ; 157(2): 180-7, 2012 Apr.
Article in English | MEDLINE | ID: mdl-22296450

ABSTRACT

Transient myeloproliferative disorder (TMD) of the newborn and acute megakaryoblastic leukaemia (AMKL) in children with Down syndrome (DS) represent paradigmatic models of leukaemogenesis. Chromosome 21 gene dosage effects and truncating mutations of the X-chromosomal transcription factor GATA1 synergize to trigger TMD and AMKL in most patients. Here, we report the occurrence of TMD, which spontaneously remitted and later progressed to AMKL in a patient without DS but with a distinct dysmorphic syndrome. Genetic analysis of the leukaemic clone revealed somatic trisomy 21 and a truncating GATA1 mutation. The analysis of the patient's normal blood cell DNA on a genomic single nucleotide polymorphism (SNP) array revealed a de novo germ line 2·58 Mb 15q24 microdeletion including 41 known genes encompassing the tumour suppressor PML. Genomic context analysis of proteins encoded by genes that are included in the microdeletion, chromosome 21-encoded proteins and GATA1 suggests that the microdeletion may trigger leukaemogenesis by disturbing the balance of a hypothetical regulatory network of normal megakaryopoiesis involving PML, SUMO3 and GATA1. The 15q24 microdeletion may thus represent the first genetic hit to initiate leukaemogenesis and implicates PML and SUMO3 as novel components of the leukaemogenic network in TMD/AMKL.


Subject(s)
Chromosomes, Human, Pair 15/genetics; Down Syndrome/genetics; Leukemia, Megakaryoblastic, Acute/genetics; Myeloproliferative Disorders/genetics; Nuclear Proteins/genetics; Sequence Deletion; Transcription Factors/genetics; Tumor Suppressor Proteins/genetics; Ubiquitins/genetics; Child; Child, Preschool; Down Syndrome/pathology; GATA1 Transcription Factor/genetics; Humans; Infant; Leukemia, Megakaryoblastic, Acute/drug therapy; Leukemia, Megakaryoblastic, Acute/pathology; Male; Myeloproliferative Disorders/drug therapy; Myeloproliferative Disorders/pathology; Promyelocytic Leukemia Protein
10.
Nucleic Acids Res ; 40(Database issue): D284-9, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22096231

ABSTRACT

Orthologous relationships form the basis of most comparative genomic and metagenomic studies and are essential for proper phylogenetic and functional analyses. The third version of the eggNOG database (http://eggnog.embl.de) contains non-supervised orthologous groups constructed from 1133 organisms, doubling the number of genes with orthology assignment compared to eggNOG v2. The new release is the result of a number of improvements and expansions: (i) the underlying homology searches are now based on the SIMAP database; (ii) the orthologous groups have been extended to 41 levels of selected taxonomic ranges enabling much more fine-grained orthology assignments; and (iii) the newly designed web page is considerably faster with more functionality. In total, eggNOG v3 contains 721,801 orthologous groups, encompassing a total of 4,396,591 genes. Additionally, we updated 4873 and 4850 original COGs and KOGs, respectively, to include all 1133 organisms. At the universal level, covering all three domains of life, 101,208 orthologous groups are available, while the others are applicable at 40 more limited taxonomic ranges. Each group is amended by multiple sequence alignments and maximum-likelihood trees and broad functional descriptions are provided for 450,904 orthologous groups (62.5%).


Subject(s)
Databases, Genetic; Phylogeny; Genomics; Proteins/genetics; Proteins/physiology; Sequence Homology; User-Computer Interface
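
The percentage quoted in the eggNOG v3 abstract above can be checked directly from the two counts it reports. The following is a minimal Python sketch of that arithmetic; the variable and function names are illustrative assumptions and are not taken from the eggNOG resource.

```python
# Counts as stated in the eggNOG v3 abstract.
total_groups = 721_801        # orthologous groups in eggNOG v3
described_groups = 450_904    # groups with broad functional descriptions


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


coverage = percent(described_groups, total_groups)
print(f"{coverage}% of orthologous groups carry a functional description")
```

The result agrees with the 62.5% stated in the abstract.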
11.
Nucleic Acids Res ; 40(Database issue): D302-5, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22053084

ABSTRACT

SMART (Simple Modular Architecture Research Tool) is an online resource (http://smart.embl.de/) for the identification and annotation of protein domains and the analysis of protein domain architectures. SMART version 7 contains manually curated models for 1009 protein domains, 200 more than in the previous version. The current release introduces several novel features and a streamlined user interface resulting in a faster and more comfortable workflow. The underlying protein databases were greatly expanded, resulting in a 2-fold increase in number of annotated domains and features. The database of completely sequenced genomes now includes 1133 species, compared to 630 in the previous release. Domain architecture analysis results can now be exported and visualized through the iTOL phylogenetic tree viewer. 'metaSMART' was introduced as a novel subresource dedicated to the exploration and analysis of domain architectures in various metagenomics data sets. An advanced full text search engine was implemented, covering the complete annotations for SMART and Pfam domains, as well as the complete set of protein descriptions, allowing users to quickly find relevant information.


Subject(s)
Databases, Protein; Molecular Sequence Annotation; Protein Structure, Tertiary; Computer Graphics; Metagenomics; Protein Interaction Maps
12.
PLoS Comput Biol ; 7(12): e1002269, 2011 Dec.
Article in English | MEDLINE | ID: mdl-22144877

ABSTRACT

The identification of single copy (1-to-1) orthologs in any group of organisms is important for functional classification and phylogenetic studies. The Metazoa are no exception, but only recently has there been a wide-enough distribution of taxa with sufficiently high quality sequenced genomes to gain confidence in the wide-spread single copy status of a gene. Here, we present a phylogenetic approach for identifying overlooked single copy orthologs from multigene families and apply it to the Metazoa. Using 18 sequenced metazoan genomes of high quality we identified a robust set of 1,126 orthologous groups that have been retained in single copy since the last common ancestor of Metazoa. We found that the use of the phylogenetic procedure increased the number of single copy orthologs found by over a third more than standard taxon-count approaches. The orthologs represented a wide range of functional categories, expression profiles and levels of divergence. To demonstrate the value of our set of single copy orthologs, we used them to assess the completeness of 24 currently published metazoan genomes and 62 EST datasets. We found that the annotated genes in published genomes vary in coverage from 79% (Ciona intestinalis) to 99.8% (human) with an average of 92%, suggesting a value for the underlying error rate in genome annotation, and a strategy for identifying single copy orthologs in larger datasets. In contrast, the vast majority of EST datasets with no corresponding genome sequence available are largely under-sampled and probably do not accurately represent the actual genomic complement of the organisms from which they are derived.


Subject(s)
Gene Dosage; Genome/genetics; Genomics/methods; Phylogeny; Animals; Databases, Genetic; Evolution, Molecular; Expressed Sequence Tags; Humans; Multigene Family
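
The abstract above uses a reference set of single copy orthologs to score how complete a genome annotation is. One simple way to express such a coverage score is sketched below in Python. This is an assumption-laden illustration, not the authors' procedure: the function name, the group identifiers and the toy data are all hypothetical, and the score here is just the share of reference groups that have at least one annotated member.

```python
def completeness(reference_groups: set[str], annotated_groups: set[str]) -> float:
    """Percentage of reference single copy groups present in an annotation."""
    if not reference_groups:
        raise ValueError("reference set must not be empty")
    found = reference_groups & annotated_groups
    return 100 * len(found) / len(reference_groups)


# Toy, made-up identifiers standing in for orthologous groups.
reference = {"OG1", "OG2", "OG3", "OG4", "OG5"}
annotation = {"OG1", "OG2", "OG4", "OG5", "OG9"}  # OG3 missing, OG9 not in reference

score = completeness(reference, annotation)
print(f"{score:.0f}% of reference single copy groups recovered")
```

Groups outside the reference set (OG9 in the toy data) do not affect the score; only missing reference groups lower it.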
13.
Bioessays ; 33(10): 769-80, 2011 Oct.
Article in English | MEDLINE | ID: mdl-21853451

ABSTRACT

The increasing number of sequenced genomes has prompted the development of several automated orthology prediction methods. Tests to evaluate the accuracy of predictions and to explore biases caused by biological and technical factors are therefore required. We used 70 manually curated families to analyze the performance of five public methods in Metazoa. We analyzed the strengths and weaknesses of the methods and quantified the impact of biological and technical challenges. From the latter part of the analysis, genome annotation emerged as the largest single influencer, affecting up to 30% of the performance. Generally, most methods did well in assigning orthologous groups, but they failed to assign the exact number of genes for half of the groups. The publicly available benchmark set (http://eggnog.embl.de/orthobench/) should facilitate the improvement of current orthology assignment protocols, which is of utmost importance for many fields of biology and should be tackled by a broad scientific community.


Subject(s)
Computational Biology/methods; Genes; Proteins/genetics; Algorithms; Animals; Databases, Genetic; Databases, Protein; Internet; Molecular Sequence Annotation; Mucins/genetics; Mucins/metabolism; Phylogeny; Proteins/metabolism; Reproducibility of Results; Species Specificity; User-Computer Interface
14.
PLoS One ; 6(8): e22099, 2011.
Article in English | MEDLINE | ID: mdl-21850220

ABSTRACT

Single copy genes, universally distributed across the three domains of life and encoding mostly ancient parts of the translation machinery, are thought to be only rarely subjected to horizontal gene transfer (HGT). Indeed it has been proposed to have occurred in only a few genes and implies a rare, probably not advantageous event in which an ortholog displaces the original gene and has to function in a foreign context (orthologous gene displacement, OGD). Here, we have utilised an automatic method to identify HGT based on a conservative statistical approach capable of robustly assigning both donors and acceptors. Applied to 40 universally single copy genes we found that as many as 68 HGTs (implying OGDs) have occurred in these genes with a rate of 1.7 per family since the last universal common ancestor (LUCA). We examined a number of factors that have been claimed to be fundamental to HGT in general and tested their validity in the subset of universally distributed single copy genes. We found that differing functional constraints impact rates of OGD and the more evolutionarily distant the donor and acceptor, the less likely an OGD is to occur. Furthermore, species with larger genomes are more likely to be subjected to OGD. Most importantly, regardless of the trends above, the number of OGDs increases linearly with time, indicating a neutral, constant rate. This suggests that levels of HGT above this rate may be indicative of positively selected transfers that may allow niche adaptation or bestow other benefits to the recipient organism.


Subject(s)
Gene Transfer, Horizontal/genetics; Acidobacteria/genetics; Archaea/genetics; Bacillus/genetics; Bacteria/genetics; Bacteroides/genetics; Base Composition/genetics; Ecosystem; Euryarchaeota/genetics; Evolution, Molecular; Proteobacteria/genetics; Tenericutes/genetics
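
The per-family rate quoted in the abstract above is the number of inferred transfers divided by the number of gene families examined. A minimal Python sketch of that calculation follows; the variable names are illustrative assumptions, and only the two counts come from the abstract.

```python
# Counts as stated in the abstract.
hgt_events = 68       # inferred HGTs (implying OGDs)
gene_families = 40    # universally single copy genes analysed

rate_per_family = hgt_events / gene_families
print(f"{rate_per_family:.1f} OGDs per family since LUCA")
```

This reproduces the reported rate of 1.7 per family.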
15.
Nucleic Acids Res ; 39(Database issue): D561-8, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21045058

ABSTRACT

An essential prerequisite for any systems-level understanding of cellular functions is to correctly uncover and annotate all functional interactions among proteins in the cell. Toward this goal, remarkable progress has been made in recent years, both in terms of experimental measurements and computational prediction techniques. However, public efforts to collect and present protein interaction information have struggled to keep up with the pace of interaction discovery, partly because protein-protein interaction information can be error-prone and require considerable effort to annotate. Here, we present an update on the online database resource Search Tool for the Retrieval of Interacting Genes (STRING); it provides uniquely comprehensive coverage and ease of access to both experimental as well as predicted interaction information. Interactions in STRING are provided with a confidence score, and accessory information such as protein domains and 3D structures is made available, all within a stable and consistent identifier space. New features in STRING include an interactive network viewer that can cluster networks on demand, updated on-screen previews of structural information including homology models, extensive data updates and strongly improved connectivity and integration with third-party resources. Version 9.0 of STRING covers more than 1100 completely sequenced organisms; the resource can be reached at http://string-db.org.


Subject(s)
Databases, Protein; Protein Interaction Mapping/methods; Systems Integration; User-Computer Interface
16.
Science ; 326(5957): 1268-71, 2009 Nov 27.
Article in English | MEDLINE | ID: mdl-19965477

ABSTRACT

To study basic principles of transcriptome organization in bacteria, we analyzed one of the smallest self-replicating organisms, Mycoplasma pneumoniae. We combined strand-specific tiling arrays, complemented by transcriptome sequencing, with more than 252 spotted arrays. We detected 117 previously undescribed, mostly noncoding transcripts, 89 of them in antisense configuration to known genes. We identified 341 operons, of which 139 are polycistronic; almost half of the latter show decaying expression in a staircase-like manner. Under various conditions, operons could be divided into 447 smaller transcriptional units, resulting in many alternative transcripts. Frequent antisense transcripts, alternative transcripts, and multiple regulators per gene imply a highly dynamic transcriptome, more similar to that of eukaryotes than previously thought.


Subject(s)
Gene Expression Profiling; Gene Expression Regulation, Bacterial; Genome, Bacterial; Mycoplasma pneumoniae/genetics; RNA, Bacterial/genetics; RNA, Untranslated/genetics; Transcription, Genetic; Base Sequence; Genes, Bacterial; Molecular Sequence Data; Mycoplasma pneumoniae/metabolism; Oligonucleotide Array Sequence Analysis; Operon; RNA, Antisense/genetics; RNA, Antisense/metabolism; RNA, Bacterial/metabolism; RNA, Messenger/genetics; RNA, Messenger/metabolism; RNA, Untranslated/analysis
17.
Nucleic Acids Res ; 37(Database issue): D229-32, 2009 Jan.
Article in English | MEDLINE | ID: mdl-18978020

ABSTRACT

Simple modular architecture research tool (SMART) is an online tool (http://smart.embl.de/) for the identification and annotation of protein domains. It provides a user-friendly platform for the exploration and comparative study of domain architectures in both proteins and genes. The current release of SMART contains manually curated models for 784 protein domains. Recent developments were focused on further data integration and improving user friendliness. The underlying protein database based on completely sequenced genomes was greatly expanded and now includes 630 species, compared to 191 in the previous release. As an initial step towards integrating information on biological pathways into SMART, our domain annotations were extended with data on metabolic pathways and links to several pathways resources. The interaction network view was completely redesigned and is now available for more than 2 million proteins. In addition to the standard web access to the database, users can now query SMART using distributed annotation system (DAS) or through a simple object access protocol (SOAP) based web service.


Subject(s)
Databases, Protein; Protein Structure, Tertiary; Internet; Metabolic Networks and Pathways; Protein Interaction Mapping; Software; User-Computer Interface
18.
Nucleic Acids Res ; 37(Database issue): D412-6, 2009 Jan.
Article in English | MEDLINE | ID: mdl-18940858

ABSTRACT

Functional partnerships between proteins are at the core of complex cellular phenotypes, and the networks formed by interacting proteins provide researchers with crucial scaffolds for modeling, data reduction and annotation. STRING is a database and web resource dedicated to protein-protein interactions, including both physical and functional interactions. It weights and integrates information from numerous sources, including experimental repositories, computational prediction methods and public text collections, thus acting as a meta-database that maps all interaction evidence onto a common set of genomes and proteins. The most important new developments in STRING 8 over previous releases include a URL-based programming interface, which can be used to query STRING from other resources, improved interaction prediction via genomic neighborhood in prokaryotes, and the inclusion of protein structures. Version 8.0 of STRING covers about 2.5 million proteins from 630 organisms, providing the most comprehensive view on protein-protein interactions currently available. STRING can be reached at http://string-db.org/.


Subject(s)
Databases, Protein; Protein Interaction Mapping; Proteins/metabolism; Genomics; Multiprotein Complexes/metabolism; Proteins/chemistry; Proteins/genetics; User-Computer Interface
19.
J Bacteriol ; 191(1): 32-41, 2009 Jan.
Article in English | MEDLINE | ID: mdl-18849420

ABSTRACT

The emerging coverage of diverse habitats by metagenomic shotgun data opens new avenues of discovering functional novelty using computational tools. Here, we apply three different concepts for predicting novel functions within light-mediated microbial pathways in five diverse environments. Using phylogenetic approaches, we discovered two novel deep-branching subfamilies of photolyases (involved in light-mediated repair) distributed abundantly in high-UV environments. Using neighborhood approaches, we were able to assign seven novel functional partners in luciferase synthesis, nitrogen metabolism, and quorum sensing to BLUF domain-containing proteins (involved in light sensing). Finally, by domain analysis, for RcaE proteins (involved in chromatic adaptation), we predict 16 novel domain architectures that indicate novel functionalities in habitats with little or no light. Quantification of protein abundance in the various environments supports our findings that bacteria utilize light for sensing, repair, and adaptation far more widely than previously thought. While the discoveries illustrate the opportunities in function discovery, we also discuss the immense conceptual and practical challenges that come along with this new type of data.


Subject(s)
Bacteria/genetics; Genes/radiation effects; Genomics/methods; Bacteria/classification; Bacteria/growth & development; Bacteria/radiation effects; Bacterial Proteins/genetics; Ecosystem; Environment; Genome, Bacterial; Light; Phylogeny; Plants/classification; Plants/genetics; Ultraviolet Rays
20.
PLoS One ; 3(12): e3976, 2008.
Article in English | MEDLINE | ID: mdl-19096720

ABSTRACT

Bacterial nitrile hydratases (NHases) are important industrial catalysts and waste water remediation tools. In a global computational screening of conventional and metagenomic sequence data for NHases, we detected the two usually separated NHase subunits fused in one protein of the choanoflagellate Monosiga brevicollis, a recently sequenced unicellular model organism from the closest sister group of Metazoa. This is the first time that an NHase has been found in eukaryotes and the first time it has been observed as a fusion protein. The presence of an intron, subunit fusion and expressed sequence tags covering parts of the gene exclude contamination and suggest a functional gene. Phylogenetic analyses and genomic context imply a probable ancient horizontal gene transfer (HGT) from proteobacteria. The newly discovered NHase might open biotechnological routes due to its unconventional structure, its new type of host and its apparent integration into eukaryotic protein networks.


Subject(s)
Eukaryota/enzymology; Hydro-Lyases/isolation & purification; Amino Acid Sequence; Animals; Eukaryota/genetics; Eukaryotic Cells/enzymology; Hydro-Lyases/chemistry; Hydro-Lyases/genetics; Hydro-Lyases/metabolism; Models, Biological; Phylogeny; Protein Subunits/chemistry