Search | BVS Colombia Search Portal

1.

SmartPhase: Accurate and fast phasing of heterozygous variant pairs for genetic diagnosis of rare diseases.

Hager, Paul; Mewes, Hans-Werner; Rohlfs, Meino; Klein, Christoph; Jeske, Tim.

PLoS Comput Biol ; 16(2): e1007613, 2020 02.

Article in English | MEDLINE | ID: mdl-32032351

ABSTRACT

There is an increasing need to use genome and transcriptome sequencing to genetically diagnose patients suffering from suspected monogenic rare diseases. The proper detection of compound heterozygous variant combinations as disease-causing candidates is a challenge in diagnostic workflows as haplotype information is lost by currently used next-generation sequencing technologies. Consequently, computational tools are required to phase, or resolve the haplotype of, the high number of heterozygous variants in the exome or genome of each patient. Here we present SmartPhase, a phasing tool designed to efficiently reduce the set of potential compound heterozygous variant pairs in genetic diagnoses pipelines. The phasing algorithm of SmartPhase creates haplotypes using both parental genotype information and reads generated by DNA or RNA sequencing and is thus well suited to resolve the phase of rare variants. To inform the user about the reliability of a phasing prediction, it computes a confidence score which is essential to select error-free predictions. It incorporates existing haplotype information and applies logical rules to determine variants that can be excluded as causing a recessive, monogenic disease. SmartPhase can phase either all possible variant pairs in predefined genetic loci or preselected variant pairs of interest, thus keeping the focus on clinically relevant results. We compared SmartPhase to WhatsHap, one of the leading comparable phasing tools, using simulated data and a real clinical cohort of 921 patients. On both data sets, SmartPhase generated error-free predictions using our derived confidence score threshold. It outperformed WhatsHap with regard to the percentage of resolved pairs when parental genotype information is available. On the cohort data, SmartPhase enabled on average the exclusion of approximately 22% of the input variant pairs in each singleton patient and 44% in each trio patient. 
SmartPhase is implemented as an open-source Java tool and freely available at http://ibis.helmholtz-muenchen.de/smartphase/.
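
The trio-based part of the idea can be illustrated with plain Mendelian logic. The sketch below is not SmartPhase's algorithm (the abstract does not spell it out, and it also uses sequencing reads and a confidence score); it only shows, under simplifying assumptions, how parental genotypes can decide whether two heterozygous variants in a child lie on the same haplotype (cis) or on different haplotypes (trans). The function names `alt_origin` and `phase_pair` and the genotype encoding are hypothetical.

```python
from typing import Optional, Tuple

# Hypothetical encoding: a genotype is a pair of alleles, 0 = reference, 1 = alternate.
Genotype = Tuple[int, int]
# A site is described by the (mother, father) genotypes; the child is assumed heterozygous.
ParentalSite = Tuple[Genotype, Genotype]


def alt_origin(mother: Genotype, father: Genotype) -> Optional[str]:
    """Return the parent that must have transmitted the alternate allele
    to a heterozygous child, or None if the genotypes do not decide it."""
    if 1 in mother and 1 not in father:
        return "mother"
    if 1 in father and 1 not in mother:
        return "father"
    # Both parents carry the alternate allele: a homozygous-alternate parent
    # must transmit it, so the other parent transmitted the reference allele.
    if 0 not in mother and 0 in father:
        return "mother"
    if 0 not in father and 0 in mother:
        return "father"
    return None  # both heterozygous, de novo, or Mendelian-inconsistent


def phase_pair(site_a: ParentalSite, site_b: ParentalSite) -> str:
    """Classify a pair of heterozygous child variants as 'cis', 'trans' or 'unresolved'."""
    origin_a = alt_origin(*site_a)
    origin_b = alt_origin(*site_b)
    if origin_a is None or origin_b is None:
        return "unresolved"
    return "cis" if origin_a == origin_b else "trans"


# Variant A inherited from the mother, variant B from the father:
# a compound heterozygous candidate.
print(phase_pair(((0, 1), (0, 0)), ((0, 0), (0, 1))))
```

In a recessive, monogenic setting, a pair classified as cis leaves one intact copy of the gene, which is the kind of pair a diagnostic pipeline can set aside.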

Subject(s)

Heterozygote; Rare Diseases/diagnosis; Haplotypes; High-Throughput Nucleotide Sequencing/methods; Humans; Rare Diseases/genetics; Reproducibility of Results

2.

The bioinformatics of the yeast genome-A historical perspective.

Mewes, Hans-Werner.

Yeast ; 36(4): 161-165, 2019 04.

Article in English | MEDLINE | ID: mdl-30650215

ABSTRACT

From 1989 to 1997, the yeast genome was sequenced by a worldwide international consortium initiated and conducted by André Goffeau (1935-2018). The article describes the pioneering collaboration of yeast scientists from a bioinformatics perspective. Indeed, the yeast genome has turned bioinformatics from an exotic hobby of few nerds into a discipline indispensable for answering biological questions using computational methods.

Subject(s)

Computational Biology/history; Genome, Fungal; Saccharomyces cerevisiae/genetics; History, 20th Century; History, 21st Century

3.

Human metabolic individuality in biomedical and pharmaceutical research.

Suhre, Karsten; Shin, So-Youn; Petersen, Ann-Kristin; Mohney, Robert P; Meredith, David; Wägele, Brigitte; Altmaier, Elisabeth; Deloukas, Panos; Erdmann, Jeanette; Grundberg, Elin; Hammond, Christopher J; de Angelis, Martin Hrabé; Kastenmüller, Gabi; Köttgen, Anna; Kronenberg, Florian; Mangino, Massimo; Meisinger, Christa; Meitinger, Thomas; Mewes, Hans-Werner; Milburn, Michael V; Prehn, Cornelia; Raffler, Johannes; Ried, Janina S; Römisch-Margl, Werner; Samani, Nilesh J; Small, Kerrin S; Wichmann, H-Erich; Zhai, Guangju; Illig, Thomas; Spector, Tim D; Adamski, Jerzy; Soranzo, Nicole; Gieger, Christian.

Nature ; 477(7362): 54-60, 2011 Aug 31.

Article in English | MEDLINE | ID: mdl-21886157

ABSTRACT

Genome-wide association studies (GWAS) have identified many risk loci for complex diseases, but effect sizes are typically small and information on the underlying biological processes is often lacking. Associations with metabolic traits as functional intermediates can overcome these problems and potentially inform individualized therapy. Here we report a comprehensive analysis of genotype-dependent metabolic phenotypes using a GWAS with non-targeted metabolomics. We identified 37 genetic loci associated with blood metabolite concentrations, of which 25 show effect sizes that are unusually high for GWAS and account for 10-60% differences in metabolite levels per allele copy. Our associations provide new functional insights for many disease-related associations that have been reported in previous studies, including those for cardiovascular and kidney disorders, type 2 diabetes, cancer, gout, venous thromboembolism and Crohn's disease. The study advances our knowledge of the genetic basis of metabolic individuality in humans and generates many new hypotheses for biomedical and pharmaceutical research.

Subject(s)

Biomedical Research; Drug Industry; Genetic Variation; Genome-Wide Association Study; Metabolism/genetics; Adolescent; Adult; Aged; Aged, 80 and over; Blood/metabolism; Child; Chronic Disease; Coronary Artery Disease/genetics; Diabetes Mellitus/genetics; Female; Genetic Loci/genetics; Genotype; Humans; Male; Metabolomics; Middle Aged; Pharmacogenetics; Renal Insufficiency/genetics; Risk Factors; Venous Thromboembolism/genetics; Young Adult

4.

SIMAP--the database of all-against-all protein sequence similarities and annotations with new interfaces and increased coverage.

Arnold, Roland; Goldenberg, Florian; Mewes, Hans-Werner; Rattei, Thomas.

Nucleic Acids Res ; 42(Database issue): D279-84, 2014 Jan.

Article in English | MEDLINE | ID: mdl-24165881

ABSTRACT

The Similarity Matrix of Proteins (SIMAP, http://mips.gsf.de/simap/) database has been designed to massively accelerate computationally expensive protein sequence analysis tasks in bioinformatics. It provides pre-calculated sequence similarities interconnecting the entire known protein sequence universe, complemented by pre-calculated protein features and domains, similarity clusters and functional annotations. SIMAP covers all major public protein databases as well as many consistently re-annotated metagenomes from different repositories. As of September 2013, SIMAP contains >163 million proteins corresponding to ~70 million non-redundant sequences. SIMAP uses the sensitive FASTA search heuristics, the Smith-Waterman alignment algorithm, the InterPro database of protein domain models and the BLAST2GO functional annotation algorithm. SIMAP assists biologists by facilitating the interactive exploration of the protein sequence universe. Web-Service and DAS interfaces allow connecting SIMAP with any other bioinformatic tool and resource. All-against-all protein sequence similarity matrices of project-specific protein collections are generated on request. Recent improvements allow SIMAP to cover the rapidly growing sequenced protein sequence universe. New Web-Service interfaces enhance the connectivity of SIMAP. Novel tools for interactive extraction of protein similarity networks have been added. Open access to SIMAP is provided through the web portal; the portal also contains instructions and links for software access and flat file downloads.

Subject(s)

Databases, Protein; Molecular Sequence Annotation; Sequence Analysis, Protein; Internet; Protein Structure, Tertiary; Sequence Alignment; User-Computer Interface

5.

Large-scale modeling of condition-specific gene regulatory networks by information integration and inference.

Ellwanger, Daniel Christian; Leonhardt, Jörn Florian; Mewes, Hans-Werner.

Nucleic Acids Res ; 42(21), 2014 Dec 01.

Article in English | MEDLINE | ID: mdl-25294834

ABSTRACT

Understanding how regulatory networks globally coordinate the response of a cell to changing conditions, such as perturbations by shifting environments, is an elementary challenge in systems biology which has yet to be met. Genome-wide gene expression measurements are high dimensional as these are reflecting the condition-specific interplay of thousands of cellular components. The integration of prior biological knowledge into the modeling process of systems-wide gene regulation enables the large-scale interpretation of gene expression signals in the context of known regulatory relations. We developed COGERE (http://mips.helmholtz-muenchen.de/cogere), a method for the inference of condition-specific gene regulatory networks in human and mouse. We integrated existing knowledge of regulatory interactions from multiple sources to a comprehensive model of prior information. COGERE infers condition-specific regulation by evaluating the mutual dependency between regulator (transcription factor or miRNA) and target gene expression using prior information. This dependency is scored by the non-parametric, nonlinear correlation coefficient η² (eta squared) that is derived by a two-way analysis of variance. We show that COGERE significantly outperforms alternative methods in predicting condition-specific gene regulatory networks on simulated data sets. Furthermore, by inferring the cancer-specific gene regulatory network from the NCI-60 expression study, we demonstrate the utility of COGERE to promote hypothesis-driven clinical research.
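
To make the eta squared score concrete, here is a minimal sketch of the simpler one-way correlation ratio, computed as the between-group sum of squares divided by the total sum of squares. It is not COGERE's implementation: the abstract describes a two-way analysis of variance, and how expression values are grouped is not stated there. The function name `eta_squared` and the idea of grouping target-gene expression by a discretized regulator level are assumptions made for illustration.

```python
from typing import Sequence


def eta_squared(groups: Sequence[Sequence[float]]) -> float:
    """One-way correlation ratio: between-group sum of squares / total sum of squares.

    Each inner sequence holds target-gene expression values observed at one
    (hypothetical) discretized level of the regulator. Groups must be non-empty.
    """
    values = [v for group in groups for v in group]
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        return 0.0  # no variance at all, so nothing to explain
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in groups
    )
    return ss_between / ss_total


# Target expression fully determined by the regulator level: score 1.0
print(eta_squared([[1.0, 1.0], [3.0, 3.0]]))
# Target expression unrelated to the regulator level: score 0.0
print(eta_squared([[1.0, 3.0], [1.0, 3.0]]))
```

Because the score depends only on group means and not on a fitted line, it can pick up nonlinear dependencies that a Pearson correlation would miss.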

Subject(s)

Gene Regulatory Networks; Models, Genetic; Animals; Cell Line, Tumor; Gene Expression Profiling; Humans; Mice; MicroRNAs/metabolism; Neoplasms/genetics; Transcription Factors/metabolism

6.

Rare variants in LRRK1 and Parkinson's disease.

Schulte, Eva C; Ellwanger, Daniel C; Dihanich, Sybille; Manzoni, Claudia; Stangl, Katrin; Schormair, Barbara; Graf, Elisabeth; Eck, Sebastian; Mollenhauer, Brit; Haubenberger, Dietrich; Pirker, Walter; Zimprich, Alexander; Brücke, Thomas; Lichtner, Peter; Peters, Annette; Gieger, Christian; Trenkwalder, Claudia; Mewes, Hans-Werner; Meitinger, Thomas; Lewis, Patrick A; Klünemann, Hans H; Winkelmann, Juliane.

Neurogenetics ; 15(1): 49-57, 2014 Mar.

Article in English | MEDLINE | ID: mdl-24241507

ABSTRACT

Approximately 20% of individuals with Parkinson's disease (PD) report a positive family history. Yet, a large portion of causal and disease-modifying variants is still unknown. We used exome sequencing in two affected individuals from a family with late-onset PD to identify 15 potentially causal variants. Segregation analysis and frequency assessment in 862 PD cases and 1,014 ethnically matched controls highlighted variants in EEF1D and LRRK1 as the best candidates. Mutation screening of the coding regions of these genes in 862 cases and 1,014 controls revealed several novel non-synonymous variants in both genes in cases and controls. An in silico multi-model bioinformatics analysis was used to prioritize identified variants in LRRK1 for functional follow-up. However, protein expression, subcellular localization, and cell viability were not affected by the identified variants. Although it has yet to be proven conclusively that variants in LRRK1 are indeed causative of PD, our data strengthen a possible role for LRRK1 in addition to LRRK2 in the genetic underpinnings of PD but, at the same time, highlight the difficulties encountered in the study of rare variants identified by next-generation sequencing in diseases with autosomal dominant or complex patterns of inheritance.

Subject(s)

Genetic Variation; Parkinson Disease/genetics; Protein Serine-Threonine Kinases/genetics; Algorithms; Cell Survival; DNA Mutational Analysis; Exome; Family Health; Female; Gene Dosage; Gene Frequency; Genetic Predisposition to Disease; Genotype; Germany; Humans; Male; Middle Aged; Models, Genetic; Mutation; Oligonucleotide Array Sequence Analysis; Peptide Elongation Factor 1/genetics; Phenotype

7.

Functional characterization of two clusters of Brachypodium distachyon UDP-glycosyltransferases encoding putative deoxynivalenol detoxification genes.

Schweiger, Wolfgang; Pasquet, Jean-Claude; Nussbaumer, Thomas; Paris, Maria Paula Kovalsky; Wiesenberger, Gerlinde; Macadré, Catherine; Ametz, Christian; Berthiller, Franz; Lemmens, Marc; Saindrenan, Patrick; Mewes, Hans-Werner; Mayer, Klaus F X; Dufresne, Marie; Adam, Gerhard.

Mol Plant Microbe Interact ; 26(7): 781-92, 2013 Jul.

Article in English | MEDLINE | ID: mdl-23550529

ABSTRACT

Plant small-molecule UDP-glycosyltransferases (UGT) glycosylate a vast number of endogenous substances but also act in detoxification of metabolites produced by plant-pathogenic microorganisms. The ability to inactivate the Fusarium graminearum mycotoxin deoxynivalenol (DON) into DON-3-O-glucoside is crucial for resistance of cereals. We analyzed the UGT gene family of the monocot model species Brachypodium distachyon and functionally characterized two gene clusters containing putative orthologs of previously identified DON-detoxification genes from Arabidopsis thaliana and barley. Analysis of transcription showed that UGT encoded in both clusters are highly inducible by DON and expressed at much higher levels upon infection with a wild-type DON-producing F. graminearum strain compared with infection with a mutant deficient in DON production. Expression of these genes in a toxin-sensitive strain of Saccharomyces cerevisiae revealed that only two B. distachyon UGT encoded by members of a cluster of six genes homologous to the DON-inactivating barley HvUGT13248 were able to convert DON into DON-3-O-glucoside. Also, a single copy gene from Sorghum bicolor orthologous to this cluster and one of three putative orthologs of rice exhibit this ability. Seemingly, the UGT genes undergo rapid evolution and changes in copy number, making it difficult to identify orthologs with conserved substrate specificity.

Subject(s)

Brachypodium/enzymology; Fusarium/pathogenicity; Glycosyltransferases/metabolism; Plant Diseases/microbiology; Trichothecenes/metabolism; Amino Acid Sequence; Brachypodium/genetics; Fusarium/chemistry; Gene Dosage; Gene Expression Regulation, Plant; Gene Order; Glucosides/metabolism; Glycosyltransferases/genetics; Molecular Sequence Data; Multigene Family; Mutation; Mycotoxins/genetics; Mycotoxins/metabolism; Oryza/enzymology; Oryza/genetics; Phylogeny; Plant Proteins/genetics; Plant Proteins/metabolism; Saccharomyces cerevisiae/genetics; Saccharomyces cerevisiae/metabolism; Sorghum/enzymology; Sorghum/genetics; Species Specificity; Synteny

8.

FGDB: revisiting the genome annotation of the plant pathogen Fusarium graminearum.

Wong, Philip; Walter, Mathias; Lee, Wanseon; Mannhaupt, Gertrud; Münsterkötter, Martin; Mewes, Hans-Werner; Adam, Gerhard; Güldener, Ulrich.

Nucleic Acids Res ; 39(Database issue): D637-9, 2011 Jan.

Article in English | MEDLINE | ID: mdl-21051345

ABSTRACT

The MIPS Fusarium graminearum Genome Database (FGDB) was established as a comprehensive genome database on one of the most devastating fungal plant pathogens of wheat, barley and maize. The current version of FGDB v3.1 provides information on the full manually revised gene set based on the Broad Institute assembly FG3 genome sequence. The results of gene prediction tools were integrated with the help of comparative data on related species to result in a set of 13,718 annotated protein coding genes. This rigorous approach involved adding or modifying gene models and represents a coding sequence gold standard for the genus Fusarium. The gene loci improvements result in 2461 genes which either are new or have different structures compared to the Broad Institute assembly 3 gene set. Moreover, the database serves as a convenient entry point to explore expression data results and to obtain information on the Affymetrix GeneChip probe sets. The resource is accessible on http://mips.gsf.de/genre/proj/FGDB/.

Subject(s)

Databases, Genetic; Fusarium/genetics; Fungal Proteins/genetics; Fusarium/metabolism; Gene Expression Profiling; Genome, Fungal; Molecular Sequence Annotation

9.

Network-based SNP meta-analysis identifies joint and disjoint genetic features across common human diseases.

Arnold, Matthias; Hartsperger, Mara L; Baurecht, Hansjörg; Rodríguez, Elke; Wachinger, Benedikt; Franke, Andre; Kabesch, Michael; Winkelmann, Juliane; Pfeufer, Arne; Romanos, Marcel; Illig, Thomas; Mewes, Hans-Werner; Stümpflen, Volker; Weidinger, Stephan.

BMC Genomics ; 13: 490, 2012 Sep 18.

Article in English | MEDLINE | ID: mdl-22988944

ABSTRACT

BACKGROUND: Genome-wide association studies (GWAS) have provided a large set of genetic loci influencing the risk for many common diseases. Association studies typically analyze one specific trait in single populations in an isolated fashion without taking into account the potential phenotypic and genetic correlation between traits. However, GWA data can be efficiently used to identify overlapping loci with analogous or contrasting effects on different diseases. RESULTS: Here, we describe a new approach to systematically prioritize and interpret available GWA data. We focus on the analysis of joint and disjoint genetic determinants across diseases. Using network analysis, we show that variant-based approaches are superior to locus-based analyses. In addition, we provide a prioritization of disease loci based on network properties and discuss the roles of hub loci across several diseases. We demonstrate that, in general, agonistic associations appear to reflect current disease classifications, and present the potential use of effect sizes in refining and revising these agonistic signals. We further identify potential branching points in disease etiologies based on antagonistic variants and describe plausible small-scale models of the underlying molecular switches. CONCLUSIONS: The observation that a surprisingly high fraction (>15%) of the SNPs considered in our study are associated both agonistically and antagonistically with related as well as unrelated disorders indicates that the molecular mechanisms influencing causes and progress of human diseases are in part interrelated. Genetic overlaps between two diseases also suggest the importance of the affected entities in the specific pathogenic pathways and should be investigated further.

Subject(s)

Genome-Wide Association Study; Polymorphism, Single Nucleotide; Cluster Analysis; Genetic Loci; Genome, Human; Humans; Odds Ratio

10.

The sufficient minimal set of miRNA seed types.

Ellwanger, Daniel C; Büttner, Florian A; Mewes, Hans-Werner; Stümpflen, Volker.

Bioinformatics ; 27(10): 1346-50, 2011 May 15.

Article in English | MEDLINE | ID: mdl-21441577

ABSTRACT

MOTIVATION: Pairing between the target sequence and the 6-8 nt long seed sequence of the miRNA presents the most important feature for miRNA target site prediction. Novel high-throughput technologies such as Argonaute HITS-CLIP afford meanwhile a detailed study of miRNA:mRNA duplices. These interaction maps enable a first discrimination between functional and non-functional target sites in a bulky fashion. Prediction algorithms apply different seed paradigms to identify miRNA target sites. Therefore, a quantitative assessment of miRNA target site prediction is of major interest. RESULTS: We identified a set of canonical seed types based on a transcriptome wide analysis of experimentally verified functional target sites. We confirmed the specificity of long seeds but we found that the majority of functional target sites are formed by less specific seeds of only 6 nt indicating a crucial role of this type. A substantial fraction of genuine target sites are non-conserved. Moreover, the majority of functional sites remain uncovered by common prediction methods.
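
Seed pairing of the kind discussed above can be sketched as a reverse-complement string search. The example assumes the commonly used convention that a 6 nt seed spans miRNA positions 2-7 (counted from the 5' end) and a 7 nt seed spans positions 2-8; the abstract does not list the exact seed types the authors derived, so these positions, the function names and the toy target sequence are illustrative assumptions. The miRNA shown is the widely cited let-7a sequence.

```python
from typing import List

# Watson-Crick complement table for RNA.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna: str, start: int = 2, end: int = 7) -> str:
    """Return the target-site motif (5'->3') that pairs with miRNA positions start..end (1-based)."""
    seed = mirna[start - 1:end]
    return seed.translate(COMPLEMENT)[::-1]  # reverse complement


def find_seed_matches(mirna: str, target: str, start: int = 2, end: int = 7) -> List[int]:
    """Return 0-based start positions of perfect seed matches in the target sequence."""
    site = seed_site(mirna, start, end)
    return [
        i for i in range(len(target) - len(site) + 1)
        if target[i:i + len(site)] == site
    ]


let7a = "UGAGGUAGUAGGUUGUAUAGUU"
target = "AAACUACCUCAAA"  # toy 3' UTR fragment, made up for this example

print(seed_site(let7a))                         # 6 nt motif for positions 2-7
print(find_seed_matches(let7a, target))         # 6 nt seed matches
print(find_seed_matches(let7a, target, 2, 8))   # 7 nt seed matches
```

A shorter seed yields a shorter motif and therefore more candidate sites, which matches the trade-off between sensitivity and specificity that the abstract describes for 6 nt seeds.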

Subject(s)

Algorithms; Gene Expression Profiling; MicroRNAs/chemistry; MicroRNAs/genetics; Animals; Base Sequence; Eukaryotic Initiation Factors/metabolism; Humans; Mice; MicroRNAs/metabolism; Oligonucleotide Array Sequence Analysis; Oligonucleotides/genetics; Oligonucleotides/metabolism; RNA, Messenger/genetics; RNA, Messenger/metabolism

11.

Deciphering the evolution and metabolism of an anammox bacterium from a community genome.

Strous, Marc; Pelletier, Eric; Mangenot, Sophie; Rattei, Thomas; Lehner, Angelika; Taylor, Michael W; Horn, Matthias; Daims, Holger; Bartol-Mavel, Delphine; Wincker, Patrick; Barbe, Valérie; Fonknechten, Nuria; Vallenet, David; Segurens, Béatrice; Schenowitz-Truong, Chantal; Médigue, Claudine; Collingro, Astrid; Snel, Berend; Dutilh, Bas E; Op den Camp, Huub J M; van der Drift, Chris; Cirpus, Irina; van de Pas-Schoonen, Katinka T; Harhangi, Harry R; van Niftrik, Laura; Schmid, Markus; Keltjens, Jan; van de Vossenberg, Jack; Kartal, Boran; Meier, Harald; Frishman, Dmitrij; Huynen, Martijn A; Mewes, Hans-Werner; Weissenbach, Jean; Jetten, Mike S M; Wagner, Michael; Le Paslier, Denis.

Nature ; 440(7085): 790-4, 2006 Apr 06.

Article in English | MEDLINE | ID: mdl-16598256

ABSTRACT

Anaerobic ammonium oxidation (anammox) has become a main focus in oceanography and wastewater treatment. It is also the nitrogen cycle's major remaining biochemical enigma. Among its features, the occurrence of hydrazine as a free intermediate of catabolism, the biosynthesis of ladderane lipids and the role of cytoplasm differentiation are unique in biology. Here we use environmental genomics--the reconstruction of genomic data directly from the environment--to assemble the genome of the uncultured anammox bacterium Kuenenia stuttgartiensis from a complex bioreactor community. The genome data illuminate the evolutionary history of the Planctomycetes and allow us to expose the genetic blueprint of the organism's special properties. Most significantly, we identified candidate genes responsible for ladderane biosynthesis and biological hydrazine metabolism, and discovered unexpected metabolic versatility.

Subject(s)

Bacteria/genetics; Bacteria/metabolism; Biological Evolution; Genome, Bacterial; Quaternary Ammonium Compounds/metabolism; Anaerobiosis; Bacteria/classification; Bioreactors; Evolution, Molecular; Fatty Acids/biosynthesis; Genes, Bacterial/genetics; Hydrazines/metabolism; Hydrolases/metabolism; Operon/genetics; Oxidoreductases/metabolism; Phylogeny; Thermodynamics

12.

SIMAP--a comprehensive database of pre-calculated protein sequence similarities, domains, annotations and clusters.

Rattei, Thomas; Tischler, Patrick; Götz, Stefan; Jehl, Marc-André; Hoser, Jonathan; Arnold, Roland; Conesa, Ana; Mewes, Hans-Werner.

Nucleic Acids Res ; 38(Database issue): D223-6, 2010 Jan.

Article in English | MEDLINE | ID: mdl-19906725

ABSTRACT

The prediction of protein function as well as the reconstruction of evolutionary genesis employing sequence comparison at large is still the most powerful tool in sequence analysis. Due to the exponential growth of the number of known protein sequences and the subsequent quadratic growth of the similarity matrix, the computation of the Similarity Matrix of Proteins (SIMAP) becomes a computationally intensive task. The SIMAP database provides a comprehensive and up-to-date pre-calculation of the protein sequence similarity matrix, sequence-based features and sequence clusters. As of September 2009, SIMAP covers 48 million proteins and more than 23 million non-redundant sequences. Novel features of SIMAP include the expansion of the sequence space by including databases such as ENSEMBL as well as the integration of metagenomes based on their consistent processing and annotation. Furthermore, protein function predictions by Blast2GO are pre-calculated for all sequences in SIMAP and the data access and query functions have been improved. SIMAP assists biologists to query the up-to-date sequence space systematically and facilitates large-scale downstream projects in computational biology. Access to SIMAP is freely provided through the web portal for individuals (http://mips.gsf.de/simap/) and for programmatic access through DAS (http://webclu.bio.wzw.tum.de/das/) and Web-Service (http://mips.gsf.de/webservices/services/SimapService2.0?wsdl).

Subject(s)

Computational Biology/methods; Databases, Genetic; Databases, Nucleic Acid; Databases, Protein; Proteins/chemistry; Animals; Computational Biology/trends; Humans; Information Storage and Retrieval/methods; Internet; Open Reading Frames; Protein Structure, Tertiary; Sequence Analysis, Protein; Software; User-Computer Interface

13.

Sequence-based prediction of type III secreted proteins.

Arnold, Roland; Brandmaier, Stefan; Kleine, Frederick; Tischler, Patrick; Heinz, Eva; Behrens, Sebastian; Niinikoski, Antti; Mewes, Hans-Werner; Horn, Matthias; Rattei, Thomas.

PLoS Pathog ; 5(4): e1000376, 2009 Apr.

Article in English | MEDLINE | ID: mdl-19390696

ABSTRACT

The type III secretion system (TTSS) is a key mechanism for host cell interaction used by a variety of bacterial pathogens and symbionts of plants and animals including humans. The TTSS represents a molecular syringe with which the bacteria deliver effector proteins directly into the host cell cytosol. Despite the importance of the TTSS for bacterial pathogenesis, recognition and targeting of type III secreted proteins has up until now been poorly understood. Several hypotheses are discussed, including an mRNA-based signal, a chaperon-mediated process, or an N-terminal signal peptide. In this study, we systematically analyzed the amino acid composition and secondary structure of N-termini of 100 experimentally verified effector proteins. Based on this, we developed a machine-learning approach for the prediction of TTSS effector proteins, taking into account N-terminal sequence features such as frequencies of amino acids, short peptides, or residues with certain physico-chemical properties. The resulting computational model revealed a strong type III secretion signal in the N-terminus that can be used to detect effectors with sensitivity of approximately 71% and selectivity of approximately 85%. This signal seems to be taxonomically universal and conserved among animal pathogens and plant symbionts, since we could successfully detect effector proteins if the respective group was excluded from training. The application of our prediction approach to 739 complete bacterial and archaeal genome sequences resulted in the identification of between 0% and 12% putative TTSS effector proteins. Comparison of effector proteins with orthologs that are not secreted by the TTSS showed no clear pattern of signal acquisition by fusion, suggesting convergent evolutionary processes shaping the type III secretion signal. 
The newly developed program EffectiveT3 (http://www.chlamydiaedb.org) is the first universal in silico prediction program for the identification of novel TTSS effectors. Our findings will facilitate further studies on and improve our understanding of type III secretion and its role in pathogen-host interactions.

Subject(s)

Bacterial Proteins/metabolism; Computational Biology/methods; Gram-Negative Bacteria/chemistry; Protein Sorting Signals/genetics; Amino Acid Sequence; Artificial Intelligence; Bacterial Proteins/chemistry; Chlamydia; Conserved Sequence; Databases, Protein; Escherichia; Evolution, Molecular; Protein Structure, Secondary; Salmonella; Yersinia

14.

PEDANT covers all complete RefSeq genomes.

Walter, Mathias C; Rattei, Thomas; Arnold, Roland; Güldener, Ulrich; Münsterkötter, Martin; Nenova, Karamfilka; Kastenmüller, Gabi; Tischler, Patrick; Wölling, Andreas; Volz, Andreas; Pongratz, Norbert; Jost, Ralf; Mewes, Hans-Werner; Frishman, Dmitrij.

Nucleic Acids Res ; 37(Database issue): D408-11, 2009 Jan.

Article in English | MEDLINE | ID: mdl-18940859

ABSTRACT

The PEDANT genome database provides exhaustive annotation of nearly 3000 publicly available eukaryotic, eubacterial, archaeal and viral genomes with more than 4.5 million proteins by a broad set of bioinformatics algorithms. In particular, all completely sequenced genomes from the NCBI's Reference Sequence collection (RefSeq) are covered. The PEDANT processing pipeline has been sped up by an order of magnitude through the utilization of precalculated similarity information stored in the similarity matrix of proteins (SIMAP) database, making it possible to process newly sequenced genomes immediately as they become available. PEDANT is freely accessible to academic users at http://pedant.gsf.de. For programmatic access Web Services are available at http://pedant.gsf.de/webservices.jsp.

Subject(s)

Databases, Genetic; Genomics; Proteins/genetics; Genome; Internet

15.

Genetics meets metabolomics: a genome-wide association study of metabolite profiles in human serum.

Gieger, Christian; Geistlinger, Ludwig; Altmaier, Elisabeth; Hrabé de Angelis, Martin; Kronenberg, Florian; Meitinger, Thomas; Mewes, Hans-Werner; Wichmann, H-Erich; Weinberger, Klaus M; Adamski, Jerzy; Illig, Thomas; Suhre, Karsten.

PLoS Genet ; 4(11): e1000282, 2008 Nov.

Article in English | MEDLINE | ID: mdl-19043545

ABSTRACT

The rapidly evolving field of metabolomics aims at a comprehensive measurement of ideally all endogenous metabolites in a cell or body fluid. It thereby provides a functional readout of the physiological state of the human body. Genetic variants that associate with changes in the homeostasis of key lipids, carbohydrates, or amino acids are not only expected to display much larger effect sizes due to their direct involvement in metabolite conversion modification, but should also provide access to the biochemical context of such variations, in particular when enzyme coding genes are concerned. To test this hypothesis, we conducted what is, to the best of our knowledge, the first GWA study with metabolomics based on the quantitative measurement of 363 metabolites in serum of 284 male participants of the KORA study. We found associations of frequent single nucleotide polymorphisms (SNPs) with considerable differences in the metabolic homeostasis of the human body, explaining up to 12% of the observed variance. Using ratios of certain metabolite concentrations as a proxy for enzymatic activity, up to 28% of the variance can be explained (p-values 10^-16 to 10^-21). We identified four genetic variants in genes coding for enzymes (FADS1, LIPC, SCAD, MCAD) where the corresponding metabolic phenotype (metabotype) clearly matches the biochemical pathways in which these enzymes are active. Our results suggest that common genetic polymorphisms induce major differentiations in the metabolic make-up of the human population. This may lead to a novel approach to personalized health care based on a combination of genotyping and metabolic characterization. These genetically determined metabotypes may subscribe the risk for a certain medical phenotype, the response to a given drug treatment, or the reaction to a nutritional intervention or environmental challenge.

Subject(s)

Genome-Wide Association Study/methods; Organic Chemicals/blood; Blood Proteins/metabolism; Delta-5 Fatty Acid Desaturase; Fatty Acid Desaturases/metabolism; Genetics; Genome, Human; Humans; Male; Metabolomics/methods; Phenotype; Phosphoproteins/metabolism; Polymorphism, Single Nucleotide; Ubiquitin-Protein Ligases/metabolism

16.

The DICS repository: module-assisted analysis of disease-related gene lists.

Dietmann, Sabine; Georgii, Elisabeth; Antonov, Alexey; Tsuda, Koji; Mewes, Hans-Werner.

Bioinformatics ; 25(6): 830-1, 2009 Mar 15.

Article in English | MEDLINE | ID: mdl-19176557

ABSTRACT

SUMMARY: The DICS database is a dynamic web repository of computationally predicted functional modules from the human protein-protein interaction network. It provides references to the CORUM, DrugBank, KEGG and Reactome pathway databases. DICS can be accessed for retrieving sets of overlapping modules and protein complexes that are significantly enriched in a gene list, thereby providing valuable information about the functional context. AVAILABILITY: Supplementary information on datasets and methods is available on the web server http://mips.gsf.de/proj/dics.

Subject(s)

Computational Biology/methods; Databases, Protein; Disease/genetics; Protein Interaction Mapping; Databases, Protein/standards; Genes; Humans; Internet; Proteins/chemistry

17.

The minimum information required for reporting a molecular interaction experiment (MIMIx).

Orchard, Sandra; Salwinski, Lukasz; Kerrien, Samuel; Montecchi-Palazzi, Luisa; Oesterheld, Matthias; Stümpflen, Volker; Ceol, Arnaud; Chatr-aryamontri, Andrew; Armstrong, John; Woollard, Peter; Salama, John J; Moore, Susan; Wojcik, Jérôme; Bader, Gary D; Vidal, Marc; Cusick, Michael E; Gerstein, Mark; Gavin, Anne-Claude; Superti-Furga, Giulio; Greenblatt, Jack; Bader, Joel; Uetz, Peter; Tyers, Mike; Legrain, Pierre; Fields, Stan; Mulder, Nicola; Gilson, Michael; Niepmann, Michael; Burgoon, Lyle; De Las Rivas, Javier; Prieto, Carlos; Perreau, Victoria M; Hogue, Chris; Mewes, Hans-Werner; Apweiler, Rolf; Xenarios, Ioannis; Eisenberg, David; Cesareni, Gianni; Hermjakob, Henning.

Nat Biotechnol ; 25(8): 894-8, 2007 Aug.

Article in English | MEDLINE | ID: mdl-17687370

ABSTRACT

A wealth of molecular interaction data is available in the literature, ranging from large-scale datasets to a single interaction confirmed by several different techniques. These data are all too often reported either as free text or in tables of variable format, and are often missing key pieces of information essential for a full understanding of the experiment. Here we propose MIMIx, the minimum information required for reporting a molecular interaction experiment. Adherence to these reporting guidelines will result in publications of increased clarity and usefulness to the scientific community and will support the rapid, systematic capture of molecular interaction data in public databases, thereby improving access to valuable interaction data.

Subject(s)

Databases, Protein/standards; Guidelines as Topic; Information Storage and Retrieval/standards; Protein Interaction Mapping/standards; Proteomics/standards; Research/standards; Humans; Internationality

18.

An environmental perspective on large-scale genome clustering based on metabolic capabilities.

Kastenmüller, Gabi; Gasteiger, Johann; Mewes, Hans-Werner.

Bioinformatics ; 24(16): i56-62, 2008 Aug 15.

Article in English | MEDLINE | ID: mdl-18689840

ABSTRACT

MOTIVATION: In principle, an organism's ability to survive in a specific environment is an observable result of the organism's regulatory and metabolic capabilities. Nonetheless, current knowledge about the global relation of the metabolisms and the niches of organisms is still limited. RESULTS: In order to further investigate this relation, we grouped species showing similar metabolic capabilities and systematically mapped their habitats onto these groups. For this purpose, we predicted the metabolic capabilities for 214 sequenced genomes. Based on these predictions, we grouped the genomes by hierarchical clustering. Finally, we mapped different environmental conditions and diseases related to the genomes onto the resulting clusters. This mapping uncovered several conditions and diseases that were unexpectedly enriched in clusters of metabolically similar species. As an example, Encephalitozoon cuniculi--a microsporidian causing a multisystemic disease accompanied by CNS problems in rabbits--occurred in the same metabolism-based cluster as bacteria causing similar symptoms in humans. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Chromosome Mapping/methods; Cluster Analysis; Environment; Gene Expression Regulation/genetics; Proteome/genetics; Proteome/metabolism; Selection, Genetic; Biological Evolution; Computer Simulation; Genetic Variation/genetics; Models, Genetic

19.

Beyond the 'best' match: machine learning annotation of protein sequences by integration of different sources of information.

Tetko, Igor V; Rodchenkov, Igor V; Walter, Mathias C; Rattei, Thomas; Mewes, Hans-Werner.

Bioinformatics ; 24(5): 621-8, 2008 Mar 01.

Article in English | MEDLINE | ID: mdl-18174184

ABSTRACT

MOTIVATION: Accurate automatic assignment of protein functions remains a challenge for genome annotation. We have developed and compared the automatic annotation of four bacterial genomes employing a 5-fold cross-validation procedure and several machine learning methods. RESULTS: The analyzed genomes were manually annotated with FunCat categories in MIPS providing a gold standard. Features describing a pair of sequences rather than each sequence alone were used. The descriptors were derived from sequence alignment scores, InterPro domains, synteny information, sequence length and calculated protein properties. Following training we scored all pairs from the validation sets, selected a pair with the highest predicted score and annotated the target protein with functional categories of the prototype protein. The data integration using machine-learning methods provided significantly higher annotation accuracy compared to the use of individual descriptors alone. The neural network approach showed the best performance. The descriptors derived from the InterPro domains and sequence similarity provided the highest contribution to the method performance. The predicted annotation scores allow differentiation of reliable versus non-reliable annotations. The developed approach was applied to annotate the protein sequences from 180 complete bacterial genomes. AVAILABILITY: The FUNcat Annotation Tool (FUNAT) is available on-line as Web Services at http://mips.gsf.de/proj/funat.

Subject(s)

Bacterial Proteins/chemistry; Algorithms; Bacterial Proteins/genetics; Genome, Bacterial

20.

Approaching clinical proteomics: current state and future fields of application in cellular proteomics.

Apweiler, Rolf; Aslanidis, Charalampos; Deufel, Thomas; Gerstner, Andreas; Hansen, Jens; Hochstrasser, Dennis; Kellner, Roland; Kubicek, Markus; Lottspeich, Friedrich; Maser, Edmund; Mewes, Hans-Werner; Meyer, Helmut E; Müllner, Stefan; Mutter, Wolfgang; Neumaier, Michael; Nollau, Peter; Nothwang, Hans G; Ponten, Fredrik; Radbruch, Andreas; Reinert, Knut; Rothe, Gregor; Stockinger, Hannes; Tárnok, Attila; Taussig, Mike J; Thiel, Andreas; Thiery, Joachim; Ueffing, Marius; Valet, Günther; Vandekerckhove, Joel; Wagener, Christoph; Wagner, Oswald; Schmitz, Gerd.

Cytometry A ; 75(10): 816-32, 2009 Oct.

Article in English | MEDLINE | ID: mdl-19739086

ABSTRACT

Recent developments in proteomics technology offer new opportunities for clinical applications in hospital or specialized laboratories including the identification of novel biomarkers, monitoring of disease, detecting adverse effects of drugs, and environmental hazards. Advanced spectrometry technologies and the development of new protein array formats have brought these analyses to a standard, which now has the potential to be used in clinical diagnostics. Besides standardization of methodologies and distribution of proteomic data into public databases, the nature of the human body fluid proteome with its high dynamic range in protein concentrations, its quantitation problems, and its extreme complexity present enormous challenges. Molecular cell biology (cytomics) with its link to proteomics is a new fast moving scientific field, which addresses functional cell analysis and bioinformatic approaches to search for novel cellular proteomic biomarkers or their release products into body fluids that provide better insight into the enormous biocomplexity of disease processes and are suitable for patient stratification, therapeutic monitoring, and prediction of prognosis. Experience from studies of in vitro diagnostics and especially in clinical chemistry showed that the majority of errors occurs in the preanalytical phase and the setup of the diagnostic strategy. This is also true for clinical proteomics where similar preanalytical variables such as inter- and intra-assay variability due to biological variations or proteolytical activities in the sample will most likely also influence the results of proteomics studies. However, before complex proteomic analysis can be introduced at a broader level into the clinic, standardization of the preanalytical phase including patient preparation, sample collection, sample preparation, sample storage, measurement, and data analysis is another issue which has to be improved. 
In this report, we discuss the recent advances and applications that fulfill the criteria for clinical proteomics with the focus on cellular proteomics (cytoproteomics) as related to preanalytical and analytical standardization and to quality control measures required for effective implementation of these technologies and analytes into routine laboratory testing to generate novel actionable health information. It will then be crucial to design and carry out clinical studies that can eventually identify novel clinical diagnostic strategies based on these techniques and validate their impact on clinical decision making.

Subject(s)

Cells/metabolism; Proteomics/methods; Proteomics/trends; Analytic Sample Preparation Methods; Computational Biology; Humans; Proteomics/standards; Statistics as Topic

