Results 1 - 20 of 58
1.
PLoS Comput Biol ; 16(2): e1007613, 2020 02.
Article in English | MEDLINE | ID: mdl-32032351

ABSTRACT

There is an increasing need to use genome and transcriptome sequencing to genetically diagnose patients suffering from suspected monogenic rare diseases. The proper detection of compound heterozygous variant combinations as disease-causing candidates is a challenge in diagnostic workflows, as haplotype information is lost by currently used next-generation sequencing technologies. Consequently, computational tools are required to phase, or resolve the haplotype of, the high number of heterozygous variants in the exome or genome of each patient. Here we present SmartPhase, a phasing tool designed to efficiently reduce the set of potential compound heterozygous variant pairs in genetic diagnosis pipelines. The phasing algorithm of SmartPhase creates haplotypes using both parental genotype information and reads generated by DNA or RNA sequencing and is thus well suited to resolving the phase of rare variants. To inform the user about the reliability of a phasing prediction, it computes a confidence score, which is essential for selecting error-free predictions. It incorporates existing haplotype information and applies logical rules to determine variants that can be excluded as causing a recessive, monogenic disease. SmartPhase can phase either all possible variant pairs in predefined genetic loci or preselected variant pairs of interest, thus keeping the focus on clinically relevant results. We compared SmartPhase to WhatsHap, one of the leading comparable phasing tools, using simulated data and a real clinical cohort of 921 patients. On both data sets, SmartPhase generated error-free predictions using our derived confidence score threshold. It outperformed WhatsHap with regard to the percentage of resolved pairs when parental genotype information is available. On the cohort data, SmartPhase enabled, on average, the exclusion of approximately 22% of the input variant pairs in each singleton patient and 44% in each trio patient.
SmartPhase is implemented as an open-source Java tool and freely available at http://ibis.helmholtz-muenchen.de/smartphase/.
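The parental-genotype part of this idea can be illustrated with a minimal sketch. It assumes genotypes coded as alternative-allele counts (0, 1, 2), a child heterozygous at both variants, and no de novo events or genotyping errors; the function names are hypothetical. It covers only Mendelian phasing from a trio, not SmartPhase's read-based phasing or its confidence score.

```python
from typing import Optional, Set, Tuple

# Genotypes are alternative-allele counts: 0 = hom ref, 1 = het, 2 = hom alt.
Trio = Tuple[int, int]  # (mother_genotype, father_genotype) at one site; child assumed het


def transmissible(genotype: int) -> Set[int]:
    """Alleles (0 = ref, 1 = alt) a parent with this genotype can pass on."""
    if genotype == 0:
        return {0}
    if genotype == 2:
        return {1}
    return {0, 1}


def alt_origin(mother: int, father: int) -> Optional[str]:
    """Parent that transmitted the alt allele to a heterozygous child, or None if ambiguous."""
    origins = {
        "mother" if m == 1 else "father"
        for m in transmissible(mother)
        for f in transmissible(father)
        if m + f == 1  # the child carries exactly one alt allele
    }
    return origins.pop() if len(origins) == 1 else None


def phase_pair(site_a: Trio, site_b: Trio) -> str:
    """Return 'cis', 'trans' or 'unknown' for two heterozygous variants in the child."""
    origin_a, origin_b = alt_origin(*site_a), alt_origin(*site_b)
    if origin_a is None or origin_b is None:
        return "unknown"
    return "cis" if origin_a == origin_b else "trans"
```

For example, phase_pair((1, 0), (0, 1)) returns "trans" (one alt allele from each parent, a compound heterozygous candidate), whereas phase_pair((1, 0), (1, 0)) returns "cis": both alt alleles sit on the maternal haplotype, so the pair cannot act as a compound heterozygous cause of a recessive disease. When both parents are heterozygous the site stays unresolved, which is where read-based phasing becomes necessary.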


Subject(s)
Heterozygote , Rare Diseases/diagnosis , Haplotypes , High-Throughput Nucleotide Sequencing/methods , Humans , Rare Diseases/genetics , Reproducibility of Results
2.
Yeast ; 36(4): 161-165, 2019 04.
Article in English | MEDLINE | ID: mdl-30650215

ABSTRACT

From 1989 to 1997, the yeast genome was sequenced by a worldwide international consortium initiated and conducted by André Goffeau (1935-2018). The article describes the pioneering collaboration of yeast scientists from a bioinformatics perspective. Indeed, the yeast genome has turned bioinformatics from an exotic hobby of a few nerds into a discipline indispensable for answering biological questions using computational methods.


Subject(s)
Computational Biology/history , Genome, Fungal , Saccharomyces cerevisiae/genetics , History, 20th Century , History, 21st Century
3.
Nature ; 477(7362): 54-60, 2011 Aug 31.
Article in English | MEDLINE | ID: mdl-21886157

ABSTRACT

Genome-wide association studies (GWAS) have identified many risk loci for complex diseases, but effect sizes are typically small and information on the underlying biological processes is often lacking. Associations with metabolic traits as functional intermediates can overcome these problems and potentially inform individualized therapy. Here we report a comprehensive analysis of genotype-dependent metabolic phenotypes using a GWAS with non-targeted metabolomics. We identified 37 genetic loci associated with blood metabolite concentrations, of which 25 show effect sizes that are unusually high for GWAS and account for 10-60% differences in metabolite levels per allele copy. Our associations provide new functional insights for many disease-related associations that have been reported in previous studies, including those for cardiovascular and kidney disorders, type 2 diabetes, cancer, gout, venous thromboembolism and Crohn's disease. The study advances our knowledge of the genetic basis of metabolic individuality in humans and generates many new hypotheses for biomedical and pharmaceutical research.


Subject(s)
Biomedical Research , Drug Industry , Genetic Variation , Genome-Wide Association Study , Metabolism/genetics , Adolescent , Adult , Aged , Aged, 80 and over , Blood/metabolism , Child , Chronic Disease , Coronary Artery Disease/genetics , Diabetes Mellitus/genetics , Female , Genetic Loci/genetics , Genotype , Humans , Male , Metabolomics , Middle Aged , Pharmacogenetics , Renal Insufficiency/genetics , Risk Factors , Venous Thromboembolism/genetics , Young Adult
4.
Nucleic Acids Res ; 42(21), 2014 Dec 01.
Article in English | MEDLINE | ID: mdl-25294834

ABSTRACT

Understanding how regulatory networks globally coordinate the response of a cell to changing conditions, such as perturbations by shifting environments, is an elementary challenge in systems biology that has yet to be met. Genome-wide gene expression measurements are high-dimensional because they reflect the condition-specific interplay of thousands of cellular components. Integrating prior biological knowledge into the modeling of systems-wide gene regulation enables the large-scale interpretation of gene expression signals in the context of known regulatory relations. We developed COGERE (http://mips.helmholtz-muenchen.de/cogere), a method for inferring condition-specific gene regulatory networks in human and mouse. We integrated existing knowledge of regulatory interactions from multiple sources into a comprehensive model of prior information. COGERE infers condition-specific regulation by evaluating the mutual dependency between regulator (transcription factor or miRNA) and target gene expression using prior information. This dependency is scored by the non-parametric, nonlinear correlation coefficient η² (eta squared), which is derived from a two-way analysis of variance. We show that COGERE significantly outperforms alternative methods in predicting condition-specific gene regulatory networks on simulated data sets. Furthermore, by inferring the cancer-specific gene regulatory network from the NCI-60 expression study, we demonstrate the utility of COGERE in promoting hypothesis-driven clinical research.
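The correlation ratio η² is the share of a target's expression variance explained by a grouping factor, η² = SS_between / SS_total. The sketch below is a simplified one-way version that assumes the regulator's expression is discretised into rank-based tertiles (a hypothetical choice); it is not COGERE's two-way scoring, whose exact factors the abstract does not specify.

```python
from collections import defaultdict
from statistics import fmean
from typing import Hashable, List, Sequence


def rank_bins(values: Sequence[float], n_bins: int = 3) -> List[int]:
    """Assign each sample to one of n_bins equally sized bins by rank (0 = lowest)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, idx in enumerate(order):
        labels[idx] = rank * n_bins // len(values)
    return labels


def eta_squared(groups: Sequence[Hashable], values: Sequence[float]) -> float:
    """Correlation ratio: between-group sum of squares divided by total sum of squares."""
    grand_mean = fmean(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        return 0.0  # constant target: nothing to explain
    by_group = defaultdict(list)
    for g, v in zip(groups, values):
        by_group[g].append(v)
    ss_between = sum(len(vs) * (fmean(vs) - grand_mean) ** 2 for vs in by_group.values())
    return ss_between / ss_total
```

With regulator = [1, 2, 3, 4, 5, 6] and target = [10, 10, 20, 20, 10, 10], the Pearson correlation is 0, yet eta_squared(rank_bins(regulator), target) is approximately 1.0 because the target is high only at intermediate regulator levels. This kind of non-monotonic dependency is what a nonlinear measure can capture and a linear correlation misses.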


Subject(s)
Gene Regulatory Networks , Models, Genetic , Animals , Cell Line, Tumor , Gene Expression Profiling , Humans , Mice , MicroRNAs/metabolism , Neoplasms/genetics , Transcription Factors/metabolism
5.
Nucleic Acids Res ; 42(Database issue): D279-84, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24165881

ABSTRACT

The Similarity Matrix of Proteins (SIMAP, http://mips.gsf.de/simap/) database has been designed to massively accelerate computationally expensive protein sequence analysis tasks in bioinformatics. It provides pre-calculated sequence similarities interconnecting the entire known protein sequence universe, complemented by pre-calculated protein features and domains, similarity clusters and functional annotations. SIMAP covers all major public protein databases as well as many consistently re-annotated metagenomes from different repositories. As of September 2013, SIMAP contains >163 million proteins corresponding to ∼70 million non-redundant sequences. SIMAP uses the sensitive FASTA search heuristics, the Smith-Waterman alignment algorithm, the InterPro database of protein domain models and the BLAST2GO functional annotation algorithm. SIMAP assists biologists by facilitating the interactive exploration of the protein sequence universe. Web-Service and DAS interfaces allow SIMAP to be connected with any other bioinformatic tool and resource. All-against-all protein sequence similarity matrices of project-specific protein collections are generated on request. Recent improvements allow SIMAP to cover the rapidly growing universe of sequenced proteins. New Web-Service interfaces enhance the connectivity of SIMAP. Novel tools for interactive extraction of protein similarity networks have been added. Open access to SIMAP is provided through the web portal; the portal also contains instructions and links for software access and flat file downloads.
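The Smith-Waterman recurrence named here can be sketched in a few lines. This minimal version returns only the best local alignment score and assumes a toy scoring scheme (match +2, mismatch −1, linear gap −2) instead of the substitution matrices and affine gap penalties used in practice; it does not reproduce SIMAP's FASTA-based pipeline.

```python
def smith_waterman_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of sequences a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # DP row i-1; row and column 0 stay 0 for local alignment
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets an alignment start anywhere: the "local" in Smith-Waterman.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best
```

smith_waterman_score("XXACGTYY", "ACGT") returns 8: the four matching residues score 4 × 2, and the flanking residues are simply left out of the local alignment. The double loop costs O(len(a) · len(b)) per pair, which is why computing these scores once and storing them saves repeated work.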


Subject(s)
Databases, Protein , Molecular Sequence Annotation , Sequence Analysis, Protein , Internet , Protein Structure, Tertiary , Sequence Alignment , User-Computer Interface
6.
Neurogenetics ; 15(1): 49-57, 2014 Mar.
Article in English | MEDLINE | ID: mdl-24241507

ABSTRACT

Approximately 20 % of individuals with Parkinson's disease (PD) report a positive family history. Yet, a large portion of causal and disease-modifying variants is still unknown. We used exome sequencing in two affected individuals from a family with late-onset PD to identify 15 potentially causal variants. Segregation analysis and frequency assessment in 862 PD cases and 1,014 ethnically matched controls highlighted variants in EEF1D and LRRK1 as the best candidates. Mutation screening of the coding regions of these genes in 862 cases and 1,014 controls revealed several novel non-synonymous variants in both genes in cases and controls. An in silico multi-model bioinformatics analysis was used to prioritize identified variants in LRRK1 for functional follow-up. However, protein expression, subcellular localization, and cell viability were not affected by the identified variants. Although it has yet to be proven conclusively that variants in LRRK1 are indeed causative of PD, our data strengthen a possible role for LRRK1 in addition to LRRK2 in the genetic underpinnings of PD but, at the same time, highlight the difficulties encountered in the study of rare variants identified by next-generation sequencing in diseases with autosomal dominant or complex patterns of inheritance.


Subject(s)
Genetic Variation , Parkinson Disease/genetics , Protein Serine-Threonine Kinases/genetics , Algorithms , Cell Survival , DNA Mutational Analysis , Exome , Family Health , Female , Gene Dosage , Gene Frequency , Genetic Predisposition to Disease , Genotype , Germany , Humans , Male , Middle Aged , Models, Genetic , Mutation , Oligonucleotide Array Sequence Analysis , Peptide Elongation Factor 1/genetics , Phenotype
7.
Mol Plant Microbe Interact ; 26(7): 781-92, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23550529

ABSTRACT

Plant small-molecule UDP-glycosyltransferases (UGT) glycosylate a vast number of endogenous substances but also act in detoxification of metabolites produced by plant-pathogenic microorganisms. The ability to inactivate the Fusarium graminearum mycotoxin deoxynivalenol (DON) into DON-3-O-glucoside is crucial for resistance of cereals. We analyzed the UGT gene family of the monocot model species Brachypodium distachyon and functionally characterized two gene clusters containing putative orthologs of previously identified DON-detoxification genes from Arabidopsis thaliana and barley. Analysis of transcription showed that UGT encoded in both clusters are highly inducible by DON and expressed at much higher levels upon infection with a wild-type DON-producing F. graminearum strain compared with infection with a mutant deficient in DON production. Expression of these genes in a toxin-sensitive strain of Saccharomyces cerevisiae revealed that only two B. distachyon UGT encoded by members of a cluster of six genes homologous to the DON-inactivating barley HvUGT13248 were able to convert DON into DON-3-O-glucoside. Also, a single copy gene from Sorghum bicolor orthologous to this cluster and one of three putative orthologs of rice exhibit this ability. Seemingly, the UGT genes undergo rapid evolution and changes in copy number, making it difficult to identify orthologs with conserved substrate specificity.


Subject(s)
Brachypodium/enzymology , Fusarium/pathogenicity , Glycosyltransferases/metabolism , Plant Diseases/microbiology , Trichothecenes/metabolism , Amino Acid Sequence , Brachypodium/genetics , Fusarium/chemistry , Gene Dosage , Gene Expression Regulation, Plant , Gene Order , Glucosides/metabolism , Glycosyltransferases/genetics , Molecular Sequence Data , Multigene Family , Mutation , Mycotoxins/genetics , Mycotoxins/metabolism , Oryza/enzymology , Oryza/genetics , Phylogeny , Plant Proteins/genetics , Plant Proteins/metabolism , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Sorghum/enzymology , Sorghum/genetics , Species Specificity , Synteny
8.
Nucleic Acids Res ; 39(Database issue): D637-9, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21051345

ABSTRACT

The MIPS Fusarium graminearum Genome Database (FGDB) was established as a comprehensive genome database on one of the most devastating fungal plant pathogens of wheat, barley and maize. The current version, FGDB v3.1, provides information on the full, manually revised gene set based on the Broad Institute assembly FG3 genome sequence. The results of gene prediction tools were integrated with the help of comparative data on related species, resulting in a set of 13,718 annotated protein-coding genes. This rigorous approach involved adding or modifying gene models and represents a coding sequence gold standard for the genus Fusarium. The improvements to the gene loci result in 2,461 genes that are either new or have different structures compared to the Broad Institute assembly 3 gene set. Moreover, the database serves as a convenient entry point to explore expression data results and to obtain information on the Affymetrix GeneChip probe sets. The resource is accessible at http://mips.gsf.de/genre/proj/FGDB/.


Subject(s)
Databases, Genetic , Fusarium/genetics , Fungal Proteins/genetics , Fusarium/metabolism , Gene Expression Profiling , Genome, Fungal , Molecular Sequence Annotation
9.
BMC Genomics ; 13: 490, 2012 Sep 18.
Article in English | MEDLINE | ID: mdl-22988944

ABSTRACT

BACKGROUND: Genome-wide association studies (GWAS) have provided a large set of genetic loci influencing the risk for many common diseases. Association studies typically analyze one specific trait in single populations in an isolated fashion without taking into account the potential phenotypic and genetic correlation between traits. However, GWA data can be efficiently used to identify overlapping loci with analogous or contrasting effects on different diseases. RESULTS: Here, we describe a new approach to systematically prioritize and interpret available GWA data. We focus on the analysis of joint and disjoint genetic determinants across diseases. Using network analysis, we show that variant-based approaches are superior to locus-based analyses. In addition, we provide a prioritization of disease loci based on network properties and discuss the roles of hub loci across several diseases. We demonstrate that, in general, agonistic associations appear to reflect current disease classifications, and present the potential use of effect sizes in refining and revising these agonistic signals. We further identify potential branching points in disease etiologies based on antagonistic variants and describe plausible small-scale models of the underlying molecular switches. CONCLUSIONS: The observation that a surprisingly high fraction (>15%) of the SNPs considered in our study are associated both agonistically and antagonistically with related as well as unrelated disorders indicates that the molecular mechanisms influencing causes and progress of human diseases are in part interrelated. Genetic overlaps between two diseases also suggest the importance of the affected entities in the specific pathogenic pathways and should be investigated further.


Subject(s)
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Cluster Analysis , Genetic Loci , Genome, Human , Humans , Odds Ratio
10.
Bioinformatics ; 27(10): 1346-50, 2011 May 15.
Article in English | MEDLINE | ID: mdl-21441577

ABSTRACT

MOTIVATION: Pairing between the target sequence and the 6-8 nt long seed sequence of the miRNA is the most important feature for miRNA target site prediction. Novel high-throughput technologies such as Argonaute HITS-CLIP now allow a detailed study of miRNA:mRNA duplexes. These interaction maps enable a first, bulk discrimination between functional and non-functional target sites. Prediction algorithms apply different seed paradigms to identify miRNA target sites; therefore, a quantitative assessment of miRNA target site prediction is of major interest. RESULTS: We identified a set of canonical seed types based on a transcriptome-wide analysis of experimentally verified functional target sites. We confirmed the specificity of long seeds, but found that the majority of functional target sites are formed by less specific seeds of only 6 nt, indicating a crucial role for this type. A substantial fraction of genuine target sites are non-conserved. Moreover, the majority of functional sites remain uncovered by common prediction methods.
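Seed matching can be sketched as a search of the mRNA for the reverse complement of miRNA nucleotides 2-7. The classification below uses the widely used 6mer, 7mer-A1, 7mer-m8 and 8mer nomenclature (an extra match to miRNA position 8 and/or an A opposite position 1); it is an assumption that this corresponds to the paper's own set of canonical seed types. Both sequences are assumed to be RNA strings written 5'→3', and the function names are hypothetical.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    return rna.translate(_COMPLEMENT)[::-1]


def find_seed_sites(mirna: str, mrna: str) -> List[Tuple[int, str]]:
    """Return (start, site_type) for every match to miRNA nucleotides 2-7 in the mRNA."""
    core = reverse_complement(mirna[1:7])  # pairs with miRNA nucleotides 2-7
    pos8 = reverse_complement(mirna[7])    # mRNA base that pairs with miRNA nucleotide 8
    sites = []
    start = mrna.find(core)
    while start != -1:
        end = start + len(core)
        m8 = start > 0 and mrna[start - 1] == pos8  # extra match on the 5' side of the core
        a1 = end < len(mrna) and mrna[end] == "A"   # A opposite miRNA nucleotide 1
        if m8 and a1:
            kind = "8mer"
        elif m8:
            kind = "7mer-m8"
        elif a1:
            kind = "7mer-A1"
        else:
            kind = "6mer"
        sites.append((start, kind))
        start = mrna.find(core, start + 1)
    return sites
```

For let-7a (UGAGGUAGUAGGUUGUAUAGUU) the core site is UACCUC, so searching "GGCUACCUCAGG" reports an 8mer at position 3, while "GGGUACCUCGG" yields only a 6mer at the same position.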


Subject(s)
Algorithms , Gene Expression Profiling , MicroRNAs/chemistry , MicroRNAs/genetics , Animals , Base Sequence , Eukaryotic Initiation Factors/metabolism , Humans , Mice , MicroRNAs/metabolism , Oligonucleotide Array Sequence Analysis , Oligonucleotides/genetics , Oligonucleotides/metabolism , RNA, Messenger/genetics , RNA, Messenger/metabolism
11.
Nature ; 440(7085): 790-4, 2006 Apr 06.
Article in English | MEDLINE | ID: mdl-16598256

ABSTRACT

Anaerobic ammonium oxidation (anammox) has become a main focus in oceanography and wastewater treatment. It is also the nitrogen cycle's major remaining biochemical enigma. Among its features, the occurrence of hydrazine as a free intermediate of catabolism, the biosynthesis of ladderane lipids and the role of cytoplasm differentiation are unique in biology. Here we use environmental genomics--the reconstruction of genomic data directly from the environment--to assemble the genome of the uncultured anammox bacterium Kuenenia stuttgartiensis from a complex bioreactor community. The genome data illuminate the evolutionary history of the Planctomycetes and allow us to expose the genetic blueprint of the organism's special properties. Most significantly, we identified candidate genes responsible for ladderane biosynthesis and biological hydrazine metabolism, and discovered unexpected metabolic versatility.


Subject(s)
Bacteria/genetics , Bacteria/metabolism , Biological Evolution , Genome, Bacterial , Quaternary Ammonium Compounds/metabolism , Anaerobiosis , Bacteria/classification , Bioreactors , Evolution, Molecular , Fatty Acids/biosynthesis , Genes, Bacterial/genetics , Hydrazines/metabolism , Hydrolases/metabolism , Operon/genetics , Oxidoreductases/metabolism , Phylogeny , Thermodynamics
12.
Nucleic Acids Res ; 38(Database issue): D223-6, 2010 Jan.
Article in English | MEDLINE | ID: mdl-19906725

ABSTRACT

The prediction of protein function, as well as the reconstruction of evolutionary genesis employing large-scale sequence comparison, is still the most powerful tool in sequence analysis. Due to the exponential growth of the number of known protein sequences and the resulting quadratic growth of the similarity matrix, computing the Similarity Matrix of Proteins (SIMAP) becomes a computationally intensive task. The SIMAP database provides a comprehensive and up-to-date pre-calculation of the protein sequence similarity matrix, sequence-based features and sequence clusters. As of September 2009, SIMAP covers 48 million proteins and more than 23 million non-redundant sequences. Novel features of SIMAP include the expansion of the sequence space by including databases such as ENSEMBL as well as the integration of metagenomes based on their consistent processing and annotation. Furthermore, protein function predictions by Blast2GO are pre-calculated for all sequences in SIMAP, and the data access and query functions have been improved. SIMAP assists biologists in querying the up-to-date sequence space systematically and facilitates large-scale downstream projects in computational biology. Access to SIMAP is freely provided through the web portal for individuals (http://mips.gsf.de/simap/) and for programmatic access through DAS (http://webclu.bio.wzw.tum.de/das/) and Web-Service (http://mips.gsf.de/webservices/services/SimapService2.0?wsdl).
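The quadratic growth mentioned here is simple arithmetic: n sequences give n(n − 1)/2 unordered pairs. The sketch below (hypothetical helper names) also counts the pairs needed when m new sequences join an already compared set, assuming only the new pairs must be computed; it illustrates the general argument and is not a description of SIMAP's update procedure.

```python
def all_pairs(n: int) -> int:
    """Unordered pairs among n sequences, excluding self-comparisons."""
    return n * (n - 1) // 2


def new_pairs(n_existing: int, m_new: int) -> int:
    """Pairs to compute when m_new sequences join n_existing already compared ones."""
    # Each new sequence against every existing one, plus the new sequences among themselves.
    return m_new * n_existing + all_pairs(m_new)
```

For about 23 million non-redundant sequences, all_pairs(23_000_000) is 264,499,988,500,000, and doubling n roughly quadruples that number. Because all_pairs(n) + new_pairs(n, m) equals all_pairs(n + m), an incremental update only needs to pay for the new pairs.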


Subject(s)
Computational Biology/methods , Databases, Genetic , Databases, Nucleic Acid , Databases, Protein , Proteins/chemistry , Animals , Computational Biology/trends , Humans , Information Storage and Retrieval/methods , Internet , Open Reading Frames , Protein Structure, Tertiary , Sequence Analysis, Protein , Software , User-Computer Interface
13.
PLoS Pathog ; 5(4): e1000376, 2009 Apr.
Article in English | MEDLINE | ID: mdl-19390696

ABSTRACT

The type III secretion system (TTSS) is a key mechanism for host cell interaction used by a variety of bacterial pathogens and symbionts of plants and animals, including humans. The TTSS represents a molecular syringe with which the bacteria deliver effector proteins directly into the host cell cytosol. Despite the importance of the TTSS for bacterial pathogenesis, recognition and targeting of type III secreted proteins have until now been poorly understood. Several hypotheses are discussed, including an mRNA-based signal, a chaperone-mediated process, or an N-terminal signal peptide. In this study, we systematically analyzed the amino acid composition and secondary structure of the N-termini of 100 experimentally verified effector proteins. Based on this, we developed a machine-learning approach for the prediction of TTSS effector proteins, taking into account N-terminal sequence features such as frequencies of amino acids, short peptides, or residues with certain physico-chemical properties. The resulting computational model revealed a strong type III secretion signal in the N-terminus that can be used to detect effectors with a sensitivity of approximately 71% and a selectivity of approximately 85%. This signal seems to be taxonomically universal and conserved among animal pathogens and plant symbionts, since we could successfully detect effector proteins when the respective group was excluded from training. The application of our prediction approach to 739 complete bacterial and archaeal genome sequences resulted in the identification of between 0% and 12% putative TTSS effector proteins. Comparison of effector proteins with orthologs that are not secreted by the TTSS showed no clear pattern of signal acquisition by fusion, suggesting convergent evolutionary processes shaping the type III secretion signal.
The newly developed program EffectiveT3 (http://www.chlamydiaedb.org) is the first universal in silico prediction program for the identification of novel TTSS effectors. Our findings will facilitate further studies on and improve our understanding of type III secretion and its role in pathogen-host interactions.
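A feature vector of the kind described, amino acid frequencies in the N-terminus, can be computed as below. The 30-residue window and the function name are assumptions for illustration; EffectiveT3's actual window, full feature set and classifier are not specified in the abstract. Such vectors would then be passed to a trained classifier.

```python
from collections import Counter
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def n_terminal_composition(protein: str, window: int = 30) -> List[float]:
    """Relative frequency of each standard amino acid in the first `window` residues."""
    head = protein[:window].upper()
    counts = Counter(res for res in head if res in AMINO_ACIDS)  # non-standard symbols ignored
    total = sum(counts.values())
    return [counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]
```

n_terminal_composition("MSSS", window=4) gives 0.25 for M, 0.75 for S and 0.0 for the other 18 residues; whenever the window contains at least one standard residue, the 20 values sum to 1 up to rounding.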


Subject(s)
Bacterial Proteins/metabolism , Computational Biology/methods , Gram-Negative Bacteria/chemistry , Protein Sorting Signals/genetics , Amino Acid Sequence , Artificial Intelligence , Bacterial Proteins/chemistry , Chlamydia , Conserved Sequence , Databases, Protein , Escherichia , Evolution, Molecular , Protein Structure, Secondary , Salmonella , Yersinia
14.
Nucleic Acids Res ; 37(Database issue): D408-11, 2009 Jan.
Article in English | MEDLINE | ID: mdl-18940859

ABSTRACT

The PEDANT genome database provides exhaustive annotation of nearly 3000 publicly available eukaryotic, eubacterial, archaeal and viral genomes with more than 4.5 million proteins by a broad set of bioinformatics algorithms. In particular, all completely sequenced genomes from the NCBI's Reference Sequence collection (RefSeq) are covered. The PEDANT processing pipeline has been sped up by an order of magnitude through the utilization of precalculated similarity information stored in the similarity matrix of proteins (SIMAP) database, making it possible to process newly sequenced genomes immediately as they become available. PEDANT is freely accessible to academic users at http://pedant.gsf.de. For programmatic access Web Services are available at http://pedant.gsf.de/webservices.jsp.


Subject(s)
Databases, Genetic , Genomics , Proteins/genetics , Genome , Internet
15.
PLoS Genet ; 4(11): e1000282, 2008 Nov.
Article in English | MEDLINE | ID: mdl-19043545

ABSTRACT

The rapidly evolving field of metabolomics aims at a comprehensive measurement of ideally all endogenous metabolites in a cell or body fluid. It thereby provides a functional readout of the physiological state of the human body. Genetic variants that associate with changes in the homeostasis of key lipids, carbohydrates, or amino acids are not only expected to display much larger effect sizes due to their direct involvement in metabolite conversion modification, but should also provide access to the biochemical context of such variations, in particular when enzyme coding genes are concerned. To test this hypothesis, we conducted what is, to the best of our knowledge, the first GWA study with metabolomics based on the quantitative measurement of 363 metabolites in serum of 284 male participants of the KORA study. We found associations of frequent single nucleotide polymorphisms (SNPs) with considerable differences in the metabolic homeostasis of the human body, explaining up to 12% of the observed variance. Using ratios of certain metabolite concentrations as a proxy for enzymatic activity, up to 28% of the variance can be explained (p-values 10⁻¹⁶ to 10⁻²¹). We identified four genetic variants in genes coding for enzymes (FADS1, LIPC, SCAD, MCAD) where the corresponding metabolic phenotype (metabotype) clearly matches the biochemical pathways in which these enzymes are active. Our results suggest that common genetic polymorphisms induce major differentiations in the metabolic make-up of the human population. This may lead to a novel approach to personalized health care based on a combination of genotyping and metabolic characterization. These genetically determined metabotypes may subscribe the risk for a certain medical phenotype, the response to a given drug treatment, or the reaction to a nutritional intervention or environmental challenge.


Subject(s)
Genome-Wide Association Study/methods , Organic Chemicals/blood , Blood Proteins/metabolism , Delta-5 Fatty Acid Desaturase , Fatty Acid Desaturases/metabolism , Genetics , Genome, Human , Humans , Male , Metabolomics/methods , Phenotype , Phosphoproteins/metabolism , Polymorphism, Single Nucleotide , Ubiquitin-Protein Ligases/metabolism
16.
Bioinformatics ; 25(6): 830-1, 2009 Mar 15.
Article in English | MEDLINE | ID: mdl-19176557

ABSTRACT

SUMMARY: The DICS database is a dynamic web repository of computationally predicted functional modules from the human protein-protein interaction network. It provides references to the CORUM, DrugBank, KEGG and Reactome pathway databases. DICS can be accessed for retrieving sets of overlapping modules and protein complexes that are significantly enriched in a gene list, thereby providing valuable information about the functional context. AVAILABILITY: Supplementary information on datasets and methods is available on the web server http://mips.gsf.de/proj/dics.
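Enrichment of a module in a gene list is commonly assessed with a one-sided hypergeometric test. The abstract does not state which statistic DICS uses, so the sketch below, with hypothetical function names and a user-supplied background set of proteins, shows only this standard test.

```python
from math import comb
from typing import Set


def hypergeom_sf(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    hits = sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, upper + 1))
    return hits / comb(population, draws)


def module_enrichment_p(module: Set[str], gene_list: Set[str], background: Set[str]) -> float:
    """One-sided p-value that gene_list overlaps module more than expected by chance."""
    module, gene_list = module & background, gene_list & background  # restrict to background
    overlap = len(module & gene_list)
    return hypergeom_sf(overlap, len(background), len(module), len(gene_list))
```

With a background of 20 proteins, a 5-protein module and a 4-gene list that falls entirely inside the module, module_enrichment_p returns 5/4845, roughly 0.001.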


Subject(s)
Computational Biology/methods , Databases, Protein , Disease/genetics , Protein Interaction Mapping , Databases, Protein/standards , Genes , Humans , Internet , Proteins/chemistry
17.
Nat Biotechnol ; 25(8): 894-8, 2007 Aug.
Article in English | MEDLINE | ID: mdl-17687370

ABSTRACT

A wealth of molecular interaction data is available in the literature, ranging from large-scale datasets to a single interaction confirmed by several different techniques. These data are all too often reported either as free text or in tables of variable format, and are often missing key pieces of information essential for a full understanding of the experiment. Here we propose MIMIx, the minimum information required for reporting a molecular interaction experiment. Adherence to these reporting guidelines will result in publications of increased clarity and usefulness to the scientific community and will support the rapid, systematic capture of molecular interaction data in public databases, thereby improving access to valuable interaction data.


Subject(s)
Databases, Protein/standards , Guidelines as Topic , Information Storage and Retrieval/standards , Protein Interaction Mapping/standards , Proteomics/standards , Research/standards , Humans , Internationality
18.
Bioinformatics ; 24(16): i56-62, 2008 Aug 15.
Article in English | MEDLINE | ID: mdl-18689840

ABSTRACT

MOTIVATION: In principle, an organism's ability to survive in a specific environment is an observable result of the organism's regulatory and metabolic capabilities. Nonetheless, current knowledge about the global relation between the metabolisms and the niches of organisms is still limited. RESULTS: In order to further investigate this relation, we grouped species showing similar metabolic capabilities and systematically mapped their habitats onto these groups. For this purpose, we predicted the metabolic capabilities for 214 sequenced genomes. Based on these predictions, we grouped the genomes by hierarchical clustering. Finally, we mapped different environmental conditions and diseases related to the genomes onto the resulting clusters. This mapping uncovered several conditions and diseases that were unexpectedly enriched in clusters of metabolically similar species. As an example, Encephalitozoon cuniculi, a microsporidian causing a multisystemic disease accompanied by CNS problems in rabbits, occurred in the same metabolism-based cluster as bacteria causing similar symptoms in humans. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
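The grouping step can be sketched as average-linkage agglomerative clustering of metabolic profiles. Here each profile is a set of predicted pathway names, compared with the Jaccard distance, and clusters are merged until the closest pair is farther apart than a cut-off; the representation, distance, linkage and cut-off are all assumptions, since the abstract does not specify them.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def jaccard_distance(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def average_linkage(profiles: Dict[str, FrozenSet[str]], cutoff: float) -> List[Set[str]]:
    """Repeatedly merge the two closest clusters until their mean pairwise distance exceeds cutoff."""
    clusters: List[Set[str]] = [{name} for name in profiles]

    def linkage(c1: Set[str], c2: Set[str]) -> float:
        total = sum(jaccard_distance(profiles[x], profiles[y]) for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        if linkage(clusters[i], clusters[j]) > cutoff:
            break
        clusters[i] |= clusters[j]
        del clusters[j]  # j > i, so index i stays valid
    return clusters
```

With hypothetical profiles A = {glycolysis, tca}, B = {glycolysis, tca, ppp}, C = {methanogenesis} and D = {methanogenesis, wood_ljungdahl} and a cut-off of 0.6, A and B end up in one cluster and C and D in another.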


Subject(s)
Chromosome Mapping/methods , Cluster Analysis , Environment , Gene Expression Regulation/genetics , Proteome/genetics , Proteome/metabolism , Selection, Genetic , Biological Evolution , Computer Simulation , Genetic Variation/genetics , Models, Genetic
19.
Bioinformatics ; 24(5): 621-8, 2008 Mar 01.
Article in English | MEDLINE | ID: mdl-18174184

ABSTRACT

MOTIVATION: Accurate automatic assignment of protein functions remains a challenge for genome annotation. We have developed and compared the automatic annotation of four bacterial genomes employing a 5-fold cross-validation procedure and several machine learning methods. RESULTS: The analyzed genomes were manually annotated with FunCat categories in MIPS, providing a gold standard. Features describing a pair of sequences rather than each sequence alone were used. The descriptors were derived from sequence alignment scores, InterPro domains, synteny information, sequence length and calculated protein properties. Following training, we scored all pairs from the validation sets, selected the pair with the highest predicted score and annotated the target protein with the functional categories of the prototype protein. Data integration using machine-learning methods provided significantly higher annotation accuracy compared to the use of individual descriptors alone. The neural network approach showed the best performance. The descriptors derived from the InterPro domains and sequence similarity provided the highest contribution to the method performance. The predicted annotation scores allow differentiation of reliable versus non-reliable annotations. The developed approach was applied to annotate the protein sequences from 180 complete bacterial genomes. AVAILABILITY: The FUNcat Annotation Tool (FUNAT) is available on-line as Web Services at http://mips.gsf.de/proj/funat.
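The final annotation step described here, choosing the best-scoring prototype and copying its categories, can be sketched as follows. The pair scores would come from the trained model; the dictionary layout, the function name, the reliability threshold and the example category codes are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

Annotation = Tuple[str, Set[str], float]  # (prototype id, transferred categories, score)


def transfer_annotation(
    pair_scores: Dict[str, Dict[str, float]],
    prototype_categories: Dict[str, Set[str]],
    min_score: float = 0.5,
) -> Dict[str, Optional[Annotation]]:
    """For each target, copy the categories of its highest-scoring prototype.

    Targets whose best score falls below min_score are left unannotated (None),
    reflecting the idea that the predicted score separates reliable from unreliable calls.
    """
    result: Dict[str, Optional[Annotation]] = {}
    for target, scores in pair_scores.items():
        if not scores:
            result[target] = None
            continue
        best = max(scores, key=scores.get)  # prototype with the highest predicted pair score
        if scores[best] < min_score:
            result[target] = None
        else:
            result[target] = (best, prototype_categories[best], scores[best])
    return result
```

For a target scored 0.9 against prototype p1 and 0.4 against p2, the categories of p1 are transferred; a target whose best score is 0.3 stays unannotated under the default threshold.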


Subject(s)
Bacterial Proteins/chemistry , Algorithms , Bacterial Proteins/genetics , Genome, Bacterial
20.
Cytometry A ; 75(10): 816-32, 2009 Oct.
Article in English | MEDLINE | ID: mdl-19739086

ABSTRACT

Recent developments in proteomics technology offer new opportunities for clinical applications in hospital or specialized laboratories, including the identification of novel biomarkers, monitoring of disease, detecting adverse effects of drugs, and environmental hazards. Advanced spectrometry technologies and the development of new protein array formats have brought these analyses to a standard that now has the potential to be used in clinical diagnostics. Besides standardization of methodologies and distribution of proteomic data into public databases, the nature of the human body fluid proteome, with its high dynamic range in protein concentrations, its quantitation problems, and its extreme complexity, presents enormous challenges. Molecular cell biology (cytomics) with its link to proteomics is a new, fast-moving scientific field, which addresses functional cell analysis and bioinformatic approaches to search for novel cellular proteomic biomarkers or their release products into body fluids that provide better insight into the enormous biocomplexity of disease processes and are suitable for patient stratification, therapeutic monitoring, and prediction of prognosis. Experience from studies of in vitro diagnostics, and especially in clinical chemistry, showed that the majority of errors occur in the preanalytical phase and the setup of the diagnostic strategy. This is also true for clinical proteomics, where similar preanalytical variables, such as inter- and intra-assay variability due to biological variations or proteolytic activities in the sample, will most likely also influence the results of proteomics studies. However, before complex proteomic analysis can be introduced at a broader level into the clinic, standardization of the preanalytical phase, including patient preparation, sample collection, sample preparation, sample storage, measurement, and data analysis, is another issue that has to be improved.
In this report, we discuss the recent advances and applications that fulfill the criteria for clinical proteomics with the focus on cellular proteomics (cytoproteomics) as related to preanalytical and analytical standardization and to quality control measures required for effective implementation of these technologies and analytes into routine laboratory testing to generate novel actionable health information. It will then be crucial to design and carry out clinical studies that can eventually identify novel clinical diagnostic strategies based on these techniques and validate their impact on clinical decision making.


Subject(s)
Cells/metabolism , Proteomics/methods , Proteomics/trends , Analytic Sample Preparation Methods , Computational Biology , Humans , Proteomics/standards , Statistics as Topic