ABSTRACT
BACKGROUND: Gene expression analyses based on complex hybridization measurements have increased rapidly in recent years and have given rise to a large number of bioinformatics tools, for example for image analysis and cluster analysis. However, relatively little work has been done to integrate and evaluate these tools and the corresponding experimental procedures. Although complex hybridization experiments rely on a data production pipeline that introduces a considerable number of error parameters, these parameters have not yet been studied in sufficient detail. RESULTS: In this paper we present simulation studies on several error parameters arising in complex hybridization experiments. We developed a general tool for designing precisely defined hybridization data that incorporate, for example, variations in spot shape and spot position as well as local and global background noise. We used this simulation environment to assess how these parameters influence subsequent data analysis, such as image analysis and the detection of differentially expressed genes. Real experimental data served as a guide for simulating expression data, and the model parameters were adapted to these data. Our results show how the analysis tools can compensate for measurement error. CONCLUSIONS: We describe an implemented model for the simulation of DNA-array experiments. This tool was used to assess the influence of critical parameters on subsequent image analysis and differential expression analysis. Furthermore, the tool can be used to guide future experiments and to improve performance through better experimental design. Series of simulated images in which specific parameters are varied can be downloaded from our web site: http://www.molgen.mpg.de/~lh_bioinf/projects/simulation/biotech/
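A minimal Python sketch of the kind of simulation described above, assuming circular spots on a regular grid: each spot gets a random radius (spot-shape variation), a random offset from its nominal grid position and a random local background, and every pixel gets a shared global background plus Gaussian noise. A naive fixed-circle quantification then stands in for the image analysis step. The function names `simulate_spot_image` and `quantify`, the parameters `radius_sd`, `jitter_sd`, `local_bg_sd` and `noise_sd`, and the Gaussian error models are hypothetical choices for illustration, not the authors' published simulation model.

```python
import math
import random
import statistics


def simulate_spot_image(intensities, pitch=10, radius=3.0, radius_sd=0.3,
                        jitter_sd=0.5, global_bg=100.0, local_bg_sd=20.0,
                        noise_sd=5.0, seed=0):
    """Render a rows x cols grid of circular spots as a 2D list of pixels.

    Parameter names, defaults and Gaussian error models are illustrative
    assumptions only, not the published model.
    """
    rng = random.Random(seed)
    rows, cols = len(intensities), len(intensities[0])
    height, width = rows * pitch, cols * pitch
    # Global background: one constant offset shared by every pixel.
    image = [[global_bg] * width for _ in range(height)]
    for r in range(rows):
        for c in range(cols):
            local = rng.gauss(0.0, local_bg_sd)              # local background of this cell
            spot_r = max(0.5, rng.gauss(radius, radius_sd))  # spot-size variation
            cy = r * pitch + pitch / 2 + rng.gauss(0.0, jitter_sd)  # position jitter
            cx = c * pitch + pitch / 2 + rng.gauss(0.0, jitter_sd)
            for y in range(r * pitch, (r + 1) * pitch):
                for x in range(c * pitch, (c + 1) * pitch):
                    image[y][x] += local
                    # Pixel centres within spot_r of the jittered centre get the
                    # signal; the spot is clipped to its own grid cell.
                    if math.hypot(y + 0.5 - cy, x + 0.5 - cx) <= spot_r:
                        image[y][x] += intensities[r][c]
    # Independent Gaussian measurement noise on every pixel.
    for row in image:
        for x in range(width):
            row[x] += rng.gauss(0.0, noise_sd)
    return image


def quantify(image, rows, cols, pitch=10, radius=3.0):
    """Naive fixed-circle image analysis on the nominal (unjittered) grid.

    Signal = mean foreground pixel minus median local-background pixel.
    """
    estimates = []
    for r in range(rows):
        row_est = []
        for c in range(cols):
            cy, cx = r * pitch + pitch / 2, c * pitch + pitch / 2
            fg, bg = [], []
            for y in range(r * pitch, (r + 1) * pitch):
                for x in range(c * pitch, (c + 1) * pitch):
                    d = math.hypot(y + 0.5 - cy, x + 0.5 - cx)
                    if d <= radius:
                        fg.append(image[y][x])
                    elif d > radius + 1:  # a one-pixel guard ring is ignored
                        bg.append(image[y][x])
            row_est.append(sum(fg) / len(fg) - statistics.median(bg))
        estimates.append(row_est)
    return estimates
```

Re-running the simulation with a non-zero `jitter_sd` or `radius_sd` while keeping `quantify` fixed is one simple way to see how spot misplacement and size variation propagate into the signal estimates of a naive fixed-circle analysis.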
Subject(s)
Computer Simulation , Models, Genetic , Nucleic Acid Hybridization/methods , Oligonucleotide Array Sequence Analysis , Algorithms , Arabidopsis/embryology , Arabidopsis/genetics , Computational Biology/methods , Computational Biology/statistics & numerical data , DNA, Complementary/analysis , DNA, Plant/analysis , Gene Expression Profiling/standards , Gene Expression Profiling/statistics & numerical data , Genome, Plant , Oligonucleotide Array Sequence Analysis/standards , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Research Design/standards , Research Design/statistics & numerical data
ABSTRACT
MOTIVATION: We have established a novel data mining procedure for identifying genes associated with predefined phenotypes and/or molecular pathways. Based on the observation that such genes are frequently expressed in the same place, or in close proximity, at about the same time, we devised an approach termed the Common Denominator Procedure. One unusual feature of this approach is that its specificity, and the probability of identifying genes linked to the desired phenotype or pathway, increase with the diversity of the input data. RESULT: To demonstrate the feasibility of our approach, we combined Cancer Genome Anatomy Project expression data with a defined set of angiogenic factors to identify additional, novel angiogenesis-associated genes. Many of these additional genes were already known from published data to be associated with angiogenesis, which validates our approach. For some of the remaining candidate genes, a high-throughput functional genomics platform (XantoScreen) provided further experimental evidence of an association with angiogenesis.
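The abstract does not state the scoring rule, so the following Python sketch only illustrates the underlying co-occurrence idea under explicit assumptions: each expression library is treated as a set of detected genes, and a candidate gene is scored by how much more of the seed set (for example, known angiogenic factors) is present in the libraries where the candidate is expressed than in the average library. The function `rank_candidates`, its enrichment score, the `min_libraries` threshold and the toy data are hypothetical; this is not the published Common Denominator Procedure.

```python
from statistics import mean


def rank_candidates(libraries, seeds, min_libraries=2):
    """Rank non-seed genes by how strongly they co-occur with a seed gene set.

    libraries: dict mapping library name -> set of genes detected in it.
    seeds: set of genes already known to belong to the phenotype/pathway.
    The enrichment score below is a hypothetical illustration.
    """
    # Fraction of the seed set detected in each library.
    seed_frac = {name: len(genes & seeds) / len(seeds)
                 for name, genes in libraries.items()}
    baseline = mean(seed_frac.values())
    candidates = set().union(*libraries.values()) - seeds
    scores = {}
    for gene in candidates:
        hits = [seed_frac[name] for name, genes in libraries.items() if gene in genes]
        if len(hits) >= min_libraries:
            # Seed coverage where the gene is expressed, relative to the average library.
            scores[gene] = mean(hits) - baseline
    # Highest enrichment first; ties broken alphabetically for reproducibility.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy data (hypothetical): two tumour libraries share the seed genes with geneX,
# geneH is detected everywhere and geneB only in normal tissues.
libraries = {
    "tumour_a": {"VEGFA", "FLT1", "KDR", "geneX", "geneH"},
    "tumour_b": {"VEGFA", "KDR", "geneX", "geneH"},
    "brain": {"geneH", "geneB"},
    "liver": {"geneH", "geneB", "geneL"},
}
seeds = {"VEGFA", "FLT1", "KDR"}
ranked = rank_candidates(libraries, seeds)
print(ranked)
```

In this toy setting a gene detected in every library, such as `geneH`, scores zero, while `geneX` ranks first; this loosely mirrors the abstract's point that more diverse input data help separate pathway-linked genes from ubiquitously expressed ones.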
Subject(s)
Angiogenic Proteins/metabolism , Databases, Protein , Gene Expression Profiling/methods , Information Storage and Retrieval/methods , Neoplasms/blood supply , Neoplasms/metabolism , Neovascularization, Pathologic , Database Management Systems , Gene Expression Regulation, Neoplastic , Gene Library , Humans , Neoplasm Proteins/metabolism , Neoplasms/genetics , Phenotype , Signal Transduction
ABSTRACT
About five years ago, ontology was almost unknown in bioinformatics, and even more so in molecular biology. Nowadays, many bioinformatics articles mention it in connection with text mining, data integration, or as a metaphysical cure for problems in the standardisation of nomenclature, among other applications. This article attempts to give an account of what concept ontologies in the domain of biology and bioinformatics are; what they are not; how they can be constructed; how they can be used; and some fallacies and pitfalls that creators and users should be aware of.
Subject(s)
Computational Biology , Molecular Biology , Information Theory
ABSTRACT
A system for "intelligent" semantic integration and querying of federated databases is being implemented using three main components: a component that provides SQL access to integrated databases by database federation (MARGBench), an ontology-based semantic metadatabase (SEMEDA) and an ontology-based query interface (SEMEDA-query). In this publication we explain and demonstrate the principles, architecture and use of SEMEDA. Because SEMEDA is implemented as a three-tiered web application, database providers can enter all relevant semantic and technical information about their databases themselves via a web browser. SEMEDA's collaborative ontology editing feature is not restricted to database integration and might also be useful for ongoing ontology developments such as the "Gene Ontology" [2]. SEMEDA can be found at http://www-bm.cs.uni-magdeburg.de/semeda/. We explain how this ontologically structured information can be used for semantic database integration. In addition, we discuss the requirements that molecular biological database integration places on ontologies and evaluate relevant existing ontologies. We further discuss how ontologies and structured knowledge sources can be used in SEMEDA and whether they can be merged, supplemented or updated to meet the requirements of semantic database integration.
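The core idea of ontology-based integration described above, annotating database columns with shared ontology concepts so that semantically equivalent fields in different databases can be linked, can be illustrated with a small Python sketch. The annotation dictionary, the database, table and column names, the helper functions `join_candidates` and `join_sql`, and the generated SQL are all hypothetical; they do not reproduce SEMEDA's metadatabase schema or MARGBench's interface.

```python
from collections import defaultdict
from itertools import combinations


def join_candidates(annotations):
    """Pair columns from different databases that are annotated with the same concept.

    annotations: dict mapping (database, table, column) -> ontology concept.
    This is a hypothetical illustration, not SEMEDA's data model.
    """
    by_concept = defaultdict(list)
    for location, concept in annotations.items():
        by_concept[concept].append(location)
    pairs = []
    for concept, locations in sorted(by_concept.items()):
        for a, b in combinations(sorted(locations), 2):
            if a[0] != b[0]:  # same-database pairs add nothing for integration
                pairs.append((concept, a, b))
    return pairs


def join_sql(a, b):
    """Render a join over two annotated columns as an illustrative federated SQL string."""
    (db_a, tab_a, col_a), (db_b, tab_b, col_b) = a, b
    return (f"SELECT * FROM {db_a}.{tab_a} JOIN {db_b}.{tab_b} "
            f"ON {db_a}.{tab_a}.{col_a} = {db_b}.{tab_b}.{col_b}")


# Hypothetical annotations, as a database provider might enter them via a web form.
annotations = {
    ("swissprot", "entry", "ac"): "protein accession",
    ("swissprot", "feature", "ac_ref"): "protein accession",
    ("pdb", "dbref", "accession"): "protein accession",
    ("swissprot", "entry", "ec"): "EC number",
    ("brenda", "enzyme", "ec_no"): "EC number",
    ("pdb", "entry", "title"): "structure title",
}
pairs = join_candidates(annotations)
for concept, a, b in pairs:
    print(concept, "->", join_sql(a, b))
```

Keeping the semantic annotation separate from the SQL layer in this way means that a new database only needs its columns annotated once; the candidate links to all previously annotated databases then follow from the shared concepts.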
Subject(s)
Databases, Genetic , Molecular Biology , Systems Integration
ABSTRACT
The rapidly increasing wealth of genomic data has driven the development of tools to assist in the task of representing and processing information about genes, their products and their functions. One of the most important of these tools is the Gene Ontology (GO), which is being developed in tandem with work on a variety of bioinformatics databases. An examination of the structure of GO, however, reveals a number of problems, which we believe can be resolved by taking account of certain organizing principles drawn from philosophical ontology. We shall explore the results of applying such principles to GO with a view to improving GO's consistency and coherence and thus its future applicability in the automated processing of biological data.