Results 1 - 20 of 24
1.
Bioinformatics ; 33(10): 1565-1567, 2017 May 15.
Article in English | MEDLINE | ID: mdl-28069593

ABSTRACT

Summary: Analysis of Next Generation Sequencing (NGS) data requires the processing of large datasets by chaining various tools with complex input and output formats. To automate data analysis, we propose to standardize NGS tasks into modular workflows. This simplifies reliable handling and processing of NGS data, and corresponding solutions become substantially more reproducible and easier to maintain. Here, we present a documented, Linux-based toolbox of 42 processing modules that are combined to construct workflows facilitating a variety of tasks such as DNAseq and RNAseq analysis. We also describe important technical extensions. The high throughput executor (HTE) helps to increase the reliability and to reduce manual interventions when processing complex datasets. We also provide a dedicated binary manager that assists users in obtaining the modules' executables and keeping them up to date. As the basis for this actively developed toolbox we use the workflow management software KNIME. Availability and Implementation: See http://ibisngs.github.io/knime4ngs for nodes and user manual (GPLv3 license). Contact: robert.kueffner@helmholtz-muenchen.de. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
High-Throughput Nucleotide Sequencing/methods , Software , Reproducibility of Results , Workflow
2.
Bioinformatics ; 31(17): 2836-43, 2015 Sep 01.
Article in English | MEDLINE | ID: mdl-25910697

ABSTRACT

MOTIVATION: Experimentally determined gene regulatory networks can be enriched by computational inference from high-throughput expression profiles. However, the prediction of regulatory interactions is severely impaired by indirect and spurious effects, particularly for eukaryotes. Recently, published methods report improved predictions by exploiting the a priori known targets of a regulator (its local topology) in addition to expression profiles. RESULTS: We find that methods exploiting known targets show an unexpectedly high rate of false discoveries. This leads to inflated performance estimates and the prediction of an excessive number of new interactions for regulators with many known targets. These issues are hidden from common evaluation and cross-validation setups, which is due to Simpson's paradox. We suggest a confidence score recalibration method (CoRe) that reduces the false discovery rate and enables a reliable performance estimation. CONCLUSIONS: CoRe considerably improves the results of network inference methods that exploit known targets. Predictions then display the biological process specificity of regulators more correctly and enable the inference of accurate genome-wide regulatory networks in eukaryotes. For yeast, we propose a network with more than 22 000 confident interactions. We point out that machine learning approaches outside of the area of network inference may be affected as well. AVAILABILITY AND IMPLEMENTATION: Results, executable code and networks are available via our website http://www.bio.ifi.lmu.de/forschung/CoRe. CONTACT: robert.kueffner@helmholtz-muenchen.de SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , False Positive Reactions , Gene Expression Regulation, Fungal , Gene Regulatory Networks , Saccharomyces cerevisiae Proteins/genetics , Systems Biology/methods , Gene Expression Profiling , Machine Learning , Oligonucleotide Array Sequence Analysis , Signal Transduction , Software
3.
Nat Methods ; 9(8): 796-804, 2012 Jul 15.
Article in English | MEDLINE | ID: mdl-22796662

ABSTRACT

Reconstructing gene regulatory networks from high-throughput data is a long-standing challenge. Through the Dialogue on Reverse Engineering Assessment and Methods (DREAM) project, we performed a comprehensive blind assessment of over 30 network inference methods on Escherichia coli, Staphylococcus aureus, Saccharomyces cerevisiae and in silico microarray data. We characterize the performance, data requirements and inherent biases of different inference approaches, and we provide guidelines for algorithm application and development. We observed that no single inference method performs optimally across all data sets. In contrast, integration of predictions from multiple inference methods shows robust and high performance across diverse data sets. We thereby constructed high-confidence networks for E. coli and S. aureus, each comprising ~1,700 transcriptional interactions at a precision of ~50%. We experimentally tested 53 previously unobserved regulatory interactions in E. coli, of which 23 (43%) were supported. Our results establish community-based methods as a powerful and robust tool for the inference of transcriptional gene regulatory networks.
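The integration of predictions from multiple inference methods mentioned above can be illustrated with a simple rank-averaging scheme. This is a minimal sketch under stated assumptions, not the procedure documented in the abstract: the function name `average_rank_consensus`, the input layout, and the rule that an edge missing from a method's list receives the rank one past the end of that list are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def average_rank_consensus(predictions):
    """Combine scored edge lists from several methods by average rank.

    predictions: dict mapping method name -> dict mapping edge -> score,
    where a higher score means higher confidence.
    Assumption: an edge a method did not predict gets the rank one past
    the end of that method's list.
    Returns a list of (edge, average_rank), best (lowest) rank first.
    """
    all_edges = set()
    for scores in predictions.values():
        all_edges.update(scores)

    rank_sums = defaultdict(float)
    for scores in predictions.values():
        # Rank 1 is the most confident edge; ties are broken by edge name.
        ordered = sorted(scores, key=lambda e: (-scores[e], e))
        ranks = {edge: i + 1 for i, edge in enumerate(ordered)}
        missing_rank = len(ordered) + 1
        for edge in all_edges:
            rank_sums[edge] += ranks.get(edge, missing_rank)

    n_methods = len(predictions)
    averaged = {edge: total / n_methods for edge, total in rank_sums.items()}
    return sorted(averaged.items(), key=lambda item: (item[1], item[0]))
```

Averaging ranks rather than raw scores avoids having to put the confidence scales of different methods on a common footing, which is one reason such a scheme is a common starting point for consensus predictions.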


Subject(s)
Computational Biology , Gene Expression Regulation, Bacterial/genetics , Gene Regulatory Networks , Oligonucleotide Array Sequence Analysis , Algorithms , Escherichia coli/genetics , Saccharomyces cerevisiae/genetics , Software , Staphylococcus aureus/genetics , Transcription, Genetic/genetics
4.
Nucleic Acids Res ; 41(18): 8452-63, 2013 Oct.
Article in English | MEDLINE | ID: mdl-23873954

ABSTRACT

Existing machine-readable resources for large-scale gene regulatory networks usually do not provide context information characterizing the activating conditions for a regulation and how targeted genes are affected. Although this information is essential for data interpretation, available networks are often restricted to condition-independent, non-quantitative, plain binary interactions as derived from high-throughput screens. In this article, we present a comprehensive Petri net based regulatory network that controls the diauxic shift in Saccharomyces cerevisiae. For 100 specific enzymatic genes, we collected regulations from public databases and identified and manually curated >400 relevant scientific articles. The resulting network consists of >300 multi-input regulatory interactions providing (i) activating conditions for the regulators; (ii) semi-quantitative effects on their targets; and (iii) classification of the experimental evidence. The diauxic shift network compiles widely distributed regulatory information and is available in an easy-to-use machine-readable form. Additionally, we developed a browsable system organizing the network into pathway maps, which allows users to inspect and trace the evidence for each annotated regulation in the model.


Subject(s)
Gene Expression Regulation, Fungal , Gene Regulatory Networks , Saccharomyces cerevisiae/genetics , Citric Acid Cycle/genetics , Fatty Acids/metabolism , Gluconeogenesis/genetics , Models, Genetic , Phosphoenolpyruvate Carboxykinase (ATP)/genetics , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/genetics
5.
Bioinformatics ; 29(20): 2603-9, 2013 Oct 15.
Article in English | MEDLINE | ID: mdl-23956305

ABSTRACT

MOTIVATION: The lack of reliable, comprehensive gold standards complicates the development of many bioinformatics tools, particularly for the analysis of expression data and biological networks. Simulation approaches can provide provisional gold standards, such as regulatory networks, for the assessment of network inference methods. However, this just defers the problem, as it is difficult to assess how closely simulators emulate the properties of real data. RESULTS: In analogy to Turing's test discriminating humans and computers based on responses to questions, we systematically compare real and artificial systems based on their gene expression output. Different expression data analysis techniques such as clustering are applied to both types of datasets. We define and extract distributions of properties from the results, for instance, distributions of cluster quality measures or transcription factor activity patterns. Distributions of properties are represented as histograms to enable the comparison of artificial and real datasets. We examine three frequently used simulators that generate expression data from parameterized regulatory networks. We identify features distinguishing real from artificial datasets that suggest how simulators could be adapted to better emulate real datasets and, thus, become more suitable for the evaluation of data analysis tools. AVAILABILITY: See http://www2.bio.ifi.lmu.de/~kueffner/attfad/ and the supplement for precomputed analyses; other compendia can be analyzed via the CRAN package attfad. The full datasets can be obtained from http://www2.bio.ifi.lmu.de/~kueffner/attfad/data.tar.gz.


Subject(s)
Gene Expression Profiling/methods , Gene Expression , Cluster Analysis , Escherichia coli/genetics , Humans , Saccharomyces cerevisiae/genetics , Software
6.
Bioinformatics ; 28(11): 1480-6, 2012 Jun 01.
Article in English | MEDLINE | ID: mdl-22492315

ABSTRACT

MOTIVATION: Several statistical tests are available to detect the enrichment of differential expression in gene sets. Such tests were originally proposed for analyzing gene sets associated with biological processes. The objective evaluation of tests on real measurements has not been possible as it is difficult to decide a priori which processes will be affected in given experiments. RESULTS: We present a first large study to rigorously assess and compare the performance of gene set enrichment tests on real expression measurements. Gene sets are defined based on the targets of given regulators such as transcription factors (TFs) and microRNAs (miRNAs). In contrast to processes, TFs and miRNAs are amenable to direct perturbations, e.g. regulator over-expression or deletion. We assess the ability of 14 different statistical tests to predict the perturbations from expression measurements in Escherichia coli, Saccharomyces cerevisiae and human. We also analyze how performance depends on the quality and comprehensiveness of the regulator targets via a permutation approach. We find that ANOVA and Wilcoxon's test consistently perform better than, for instance, Kolmogorov-Smirnov and hypergeometric tests. For scenarios where the optimal test is not known, we suggest combining all evaluated tests into an unweighted consensus, which also performs well in our assessment. Our results provide a guide for the selection of existing tests as well as a basis for the development and assessment of novel tests.
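To make concrete what one of the evaluated test families computes, the following is a minimal sketch of a hypergeometric over-representation test. It is a generic textbook formulation, not the implementation assessed in the study; the function name and parameter names are hypothetical, and it requires Python 3.8 or later for `math.comb`.

```python
from math import comb


def hypergeometric_enrichment_pvalue(n_genes, n_in_set, n_selected, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_genes:    size of the gene universe
    n_in_set:   number of genes in the gene set (e.g. targets of a regulator)
    n_selected: number of genes called differentially expressed
    n_overlap:  number of selected genes that are also in the gene set
    """
    upper = min(n_in_set, n_selected)
    total = comb(n_genes, n_selected)
    # Sum the probabilities of every overlap at least as large as observed.
    tail = sum(
        comb(n_in_set, i) * comb(n_genes - n_in_set, n_selected - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total
```

A test of this kind only sees whether a gene passed a differential-expression cutoff, not how strongly it changed, which is one plausible reason rank- or variance-based tests can behave differently on the same data.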


Subject(s)
Escherichia coli/genetics , Gene Expression Profiling/methods , Saccharomyces cerevisiae/genetics , Gene Regulatory Networks , Humans , MicroRNAs/metabolism , Transcription Factors/genetics , Transcription Factors/metabolism
7.
Bioinformatics ; 28(10): 1376-82, 2012 May 15.
Article in English | MEDLINE | ID: mdl-22467911

ABSTRACT

MOTIVATION: To improve the understanding of molecular regulation events, various approaches have been developed for deducing gene regulatory networks from mRNA expression data. RESULTS: We present a new score for network inference, η², that is derived from an analysis of variance. Candidate transcription factor:target gene (TF:TG) relationships are assumed more likely if the expression of TF and TG are mutually dependent in at least a subset of the examined experiments. We evaluate this dependency by η², a non-parametric, non-linear correlation coefficient. It is fast, easy to apply and does not require the discretization of the input data. In the recent DREAM5 blind assessment, arguably the most comprehensive evaluation of inference methods, our approach based on η² was rated the best performer on real expression compendia. It also performs better than methods tested in other recently published comparative assessments. About half of our novel predictions are true interactions as estimated from qPCR experiments performed for DREAM5. CONCLUSIONS: The score η² has a number of interesting features that enable the efficient detection of gene regulatory interactions. For most experimental setups, it is an interesting alternative to other measures of dependency such as Pearson's correlation or mutual information.
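For readers unfamiliar with η², the sketch below computes the standard ANOVA effect size (the correlation ratio): the between-group sum of squares divided by the total sum of squares. The abstract does not say how groups are formed for a TF:TG pair, so the grouping of expression values passed in here is an assumption, and the function illustrates the generic statistic rather than the authors' scoring procedure.

```python
def eta_squared(groups):
    """Correlation ratio: share of total variance explained by group membership.

    groups: list of lists of expression values, one inner list per group
    (how samples are assigned to groups is an assumption of this sketch).
    Returns a value between 0 and 1.
    """
    values = [v for group in groups for v in group]
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        return 0.0  # no variance at all, so nothing to explain
    # Between-group sum of squares, weighted by group size.
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in groups
        if group
    )
    return ss_between / ss_total
```

Because the statistic depends only on group means and variances, it can pick up non-linear dependencies that a Pearson correlation would miss.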


Subject(s)
Analysis of Variance , Gene Regulatory Networks , Escherichia coli/genetics , Escherichia coli/metabolism , Gene Expression Profiling , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Transcription Factors/genetics , Transcription Factors/metabolism
8.
Bioinformatics ; 27(13): i366-73, 2011 Jul 01.
Article in English | MEDLINE | ID: mdl-21685094

ABSTRACT

MOTIVATION: Current gene set enrichment approaches do not take interactions and associations between set members into account. Mutual activation and inhibition causing positive and negative correlation among set members are thus neglected. As a consequence, inconsistent regulations and contextless expression changes are reported and, thus, the biological interpretation of the result is impeded. RESULTS: We analyzed established gene set enrichment methods and their result sets in a large-scale investigation of 1000 expression datasets. The reported statistically significant gene sets exhibit only average consistency between the observed patterns of differential expression and known regulatory interactions. We present Gene Graph Enrichment Analysis (GGEA) to detect consistently and coherently enriched gene sets, based on prior knowledge derived from directed gene regulatory networks. Firstly, GGEA improves the concordance of pairwise regulation with individual expression changes in respective pairs of regulating and regulated genes, compared with set enrichment methods. Secondly, GGEA yields result sets where a large fraction of relevant expression changes can be explained by nearby regulators, such as transcription factors, again improving on set-based methods. Thirdly, we demonstrate in additional case studies that GGEA can be applied to human regulatory pathways, where it sensitively detects very specific regulation processes, which are altered in tumors of the central nervous system. GGEA significantly increases the detection of gene sets where measured positively or negatively correlated expression patterns coincide with directed inducing or repressing relationships, thus facilitating further interpretation of gene expression data. AVAILABILITY: The method and accompanying visualization capabilities have been bundled into an R package and tied to a graphical user interface, the Galaxy workflow environment, that is running as a web server. CONTACT: Ludwig.Geistlinger@bio.ifi.lmu.de; Ralf.Zimmer@bio.ifi.lmu.de.


Subject(s)
Gene Expression Profiling , Neoplasms, Nerve Tissue/genetics , Neoplasms, Nerve Tissue/metabolism , Software , Algorithms , Gene Expression Regulation, Neoplastic , Gene Regulatory Networks , Humans , Proteins/genetics , Signal Transduction
9.
BMC Bioinformatics ; 11: 135, 2010 Mar 16.
Article in English | MEDLINE | ID: mdl-20233441

ABSTRACT

BACKGROUND: MicroRNAs have been discovered as important regulators of gene expression. To identify the target genes of microRNAs, several databases and prediction algorithms have been developed. Only few experimentally confirmed microRNA targets are available in databases. Many of the microRNA targets stored in databases were derived from large-scale experiments that are considered not very reliable. We propose to use text mining of publication abstracts for extracting microRNA-gene associations including microRNA-target relations to complement current repositories. RESULTS: The microRNA-gene association database miRSel combines text-mining results with existing databases and computational predictions. Text mining enables the reliable extraction of microRNA, gene and protein occurrences as well as their relationships from texts. Thereby, we increased the number of human, mouse and rat miRNA-gene associations by at least three-fold as compared to e.g. TarBase, a resource for miRNA-gene associations. CONCLUSIONS: Our database miRSel offers the currently largest collection of literature derived miRNA-gene associations. Comprehensive collections of miRNA-gene associations are important for the development of miRNA target prediction tools and the analysis of regulatory networks. miRSel is updated daily and can be queried using a web-based interface via microRNA identifiers, gene and protein names, PubMed queries as well as gene ontology (GO) terms. miRSel is freely available online at http://services.bio.ifi.lmu.de/mirsel.


Subject(s)
Data Mining/methods , MicroRNAs/genetics , Software , Computational Biology/methods , Databases, Genetic , Internet , PubMed , Sequence Analysis, RNA
10.
Nucleic Acids Res ; 36(Database issue): D63-8, 2008 Jan.
Article in English | MEDLINE | ID: mdl-17933774

ABSTRACT

Alternative splicing is known to be one of the major sources for functional diversity in higher eukaryotes. Several splicing isoforms have been characterized in the literature that play important roles in cellular processes like apoptosis or signal transduction pathways. Splicing events can often be detected on the mRNA level by large-scale cDNA or EST experiments and such data is collected and annotated in several databases. Nevertheless, the effects of splicing on the structure of a protein are largely unknown. The ProSAS (Protein Structure and Alternative Splicing) database fills this gap and provides a unified resource for analyzing effects of alternative splicing events in the context of protein structures. ProSAS comprehensively annotates and models protein structures for several Ensembl genomes as well as SwissProt entries harbouring splicing events. Alternative isoforms annotated in Ensembl or SwissProt can be analyzed on the protein structure and protein function level using an intuitive user interface that provides several features and tools for a structure-based analysis of alternative splicing events. The ProSAS database is freely accessible at http://www.bio.ifi.lmu.de/ProSAS.


Subject(s)
Alternative Splicing , Databases, Protein , Protein Conformation , Internet , Protein Isoforms/chemistry , Protein Isoforms/genetics , User-Computer Interface
11.
J Cell Biochem ; 103(2): 413-33, 2008 Feb 01.
Article in English | MEDLINE | ID: mdl-17610236

ABSTRACT

The molecular events associated with the age-related gain of fatty tissue in human bone marrow are still largely unknown. Besides enhanced adipogenic differentiation of mesenchymal stem cells (MSCs), transdifferentiation of osteoblast progenitors may contribute to bone-related diseases like osteopenia. Transdifferentiation of MSC-derived osteoblast progenitors into adipocytes and vice versa has previously been proven feasible in our cell culture system. Here, we focus on mRNA species that are regulated during transdifferentiation and represent possible control factors for the initiation of transdifferentiation. Microarray analyses comparing transdifferentiated cells with normally differentiated cells exhibited large numbers of reproducibly regulated genes for both adipogenic and osteogenic transdifferentiation. To evaluate the relevance of individual genes, we designed a scoring scheme to rank genes according to reproducibility, regulation level, and reciprocity between the different transdifferentiation directions. Thereby, members of several signaling pathways like FGF, IGF, and Wnt signaling showed explicitly differential expression patterns. Additional bioinformatic analysis of microarray analyses allowed us to identify potential key factors associated with transdifferentiation of adipocytes and osteoblasts, respectively. Fibroblast growth factor 1 (FGF1) was scored as one of several lead candidate gene products to modulate the transdifferentiation process and is shown here to exert inhibitory effects on adipogenic commitment and differentiation.


Subject(s)
Cell Transdifferentiation/genetics , Gene Expression Profiling/methods , Mesenchymal Stem Cells/metabolism , Adipocytes/cytology , Adipogenesis/genetics , Adult , Aged , Aging/pathology , Bone Marrow Cells/cytology , Cells, Cultured/metabolism , Female , Fibroblast Growth Factor 1/physiology , Humans , Intercellular Signaling Peptides and Proteins/biosynthesis , Intercellular Signaling Peptides and Proteins/genetics , Intracellular Signaling Peptides and Proteins/genetics , Male , Middle Aged , Oligonucleotide Array Sequence Analysis , Osteoblasts/cytology , Osteogenesis/genetics , RNA, Messenger/biosynthesis , Signal Transduction
12.
Bioinformatics ; 23(3): 365-71, 2007 Feb 01.
Article in English | MEDLINE | ID: mdl-17142812

ABSTRACT

MOTIVATION: The discovery of regulatory pathways, signal cascades, metabolic processes or disease models requires knowledge of individual relations such as physical or regulatory interactions between genes and proteins. Most interactions mentioned in the free text of biomedical publications are not yet contained in structured databases. RESULTS: We developed RelEx, an approach for relation extraction from free text. It is based on natural language preprocessing producing dependency parse trees and applying a small number of simple rules to these trees. We applied RelEx to a comprehensive set of one million MEDLINE abstracts dealing with gene and protein relations and extracted approximately 150,000 relations with an estimated performance of both 80% precision and 80% recall. AVAILABILITY: The natural language preprocessing tools used are free for academic research. Test sets and relation term lists are available from our website (http://www.bio.ifi.lmu.de/publications/RelEx/).


Subject(s)
Gene Expression/physiology , Information Storage and Retrieval/methods , MEDLINE , Natural Language Processing , Protein Interaction Mapping/methods , Proteins/genetics , Proteins/metabolism , Algorithms , Database Management Systems , Proteins/classification , Software
13.
PLoS One ; 13(8): e0201382, 2018.
Article in English | MEDLINE | ID: mdl-30080876

ABSTRACT

MOTIVATION: Gene regulatory networks (GRN) can be determined via various experimental techniques, and also by computational methods, which infer networks from gene expression data. However, these techniques treat interactions separately such that interdependencies of interactions forming meaningful subnetworks are typically not considered. METHODS: For the investigation of network properties and for the classification of different (sub-)networks based on gene expression data, we consider biological network motifs consisting of three genes and up to three interactions, e.g. the cascade chain (CSC), feed-forward loop (FFL), and dense-overlapping regulon (DOR). We examine several conventional methods for the inference of network motifs, which typically consider each interaction individually. In addition, we propose a new method based on three-way ANOVA (ANalysis Of VAriance) (3WA) that analyzes entire subnetworks at once. To demonstrate the advantages of such a more holistic perspective, we compare the ability of 3WA and other methods to detect and categorize network motifs on large real and artificial datasets. RESULTS: We find that conventional methods perform much better on artificial data (AUC up to 80%), than on real E. coli expression datasets (AUC 50% corresponding to random guessing). To explain this observation, we examine several important properties that differ between datasets and analyze predicted motifs in detail. We find that in case of real networks our new 3WA method outperforms (AUC 70% in E. coli) previous methods by exploiting the interdependencies in the full motif structure. Because of important differences between current artificial datasets and real measurements, the construction and testing of motif detection methods should focus on real data.


Subject(s)
Databases, Genetic , Gene Expression Regulation , Gene Regulatory Networks , Models, Genetic
14.
Bioinformatics ; 22(19): 2356-63, 2006 Oct 01.
Article in English | MEDLINE | ID: mdl-16882647

ABSTRACT

MOTIVATION: Two important questions for the analysis of gene expression measurements from different sample classes are (1) how to classify samples and (2) how to identify meaningful gene signatures (ranked gene lists) exhibiting the differences between classes and sample subsets. Solutions to both questions have immediate biological and biomedical applications. To achieve optimal classification performance, a suitable combination of classifier and gene selection method needs to be specifically selected for a given dataset. The selected gene signatures can be unstable and the resulting classification accuracy unreliable, particularly when considering different subsets of samples. Both unstable gene signatures and overestimated classification accuracy can impair biological conclusions. METHODS: We address these two issues by repeatedly evaluating the classification performance of all models, i.e. pairwise combinations of various gene selection and classification methods, for random subsets of arrays (sampling). A model score is used to select the most appropriate model for the given dataset. Consensus gene signatures are constructed by extracting those genes frequently selected over many samplings. Sampling additionally permits measurement of the stability of the classification performance for each model, which serves as a measure of model reliability. RESULTS: We analyzed a large gene expression dataset with 78 measurements of four different cartilage sample classes. Classifiers trained on subsets of measurements frequently produce models with highly variable performance. Our approach provides reliable classification performance estimates via sampling. In addition to reliable classification performance, we determined stable consensus signatures (i.e. gene lists) for sample classes. Manual literature screening showed that these genes are highly relevant to our gene expression experiment with osteoarthritic cartilage. We compared our approach to others based on a publicly available dataset on breast cancer. AVAILABILITY: R package at http://www.bio.ifi.lmu.de/~davis/edaprakt
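The idea of building a consensus signature from genes that are frequently selected over many samplings can be sketched as follows. This illustrates only the selection-frequency part, under stated assumptions: two sample classes labeled 0 and 1, a hypothetical gene score equal to the absolute difference of class means, and illustrative defaults for the subset fraction and number of rounds. It does not reproduce the model scoring or the classifier comparison described in the abstract.

```python
import random
from collections import Counter


def consensus_signature(expr, labels, n_top=2, n_rounds=200, frac=0.7, seed=0):
    """Fraction of random sample subsets in which each gene ranks among the top genes.

    expr:   dict mapping gene -> list of expression values, one per sample
    labels: list of 0/1 class labels, one per sample
    The gene score (absolute difference of class means) is an assumption
    chosen for simplicity.
    """
    rng = random.Random(seed)
    sample_ids = list(range(len(labels)))
    subset_size = max(2, int(frac * len(sample_ids)))
    counts = Counter()
    rounds_used = 0

    for _ in range(n_rounds):
        subset = rng.sample(sample_ids, subset_size)
        class_a = [i for i in subset if labels[i] == 0]
        class_b = [i for i in subset if labels[i] == 1]
        if not class_a or not class_b:
            continue  # skip subsets that miss one of the classes

        def score(gene):
            values = expr[gene]
            mean_a = sum(values[i] for i in class_a) / len(class_a)
            mean_b = sum(values[i] for i in class_b) / len(class_b)
            return abs(mean_a - mean_b)

        # Select the n_top genes for this subset; ties are broken by gene name.
        top_genes = sorted(expr, key=lambda g: (-score(g), g))[:n_top]
        counts.update(top_genes)
        rounds_used += 1

    if rounds_used == 0:
        return {}
    return {gene: count / rounds_used for gene, count in counts.items()}
```

Genes with a selection frequency near 1 are stable across subsets, while genes that appear only occasionally are the unstable signature members the abstract warns about.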


Subject(s)
Algorithms , Gene Expression Profiling/methods , Gene Expression , Neoplasms/metabolism , Oligonucleotide Array Sequence Analysis/methods , Osteoarthritis/metabolism , Proteins/metabolism , Biomarkers/analysis , Cartilage/metabolism , Computer Simulation , Humans , Models, Biological , Models, Statistical , Neoplasms/genetics , Osteoarthritis/genetics , Reproducibility of Results , Sensitivity and Specificity
15.
Bioinformatics ; 21 Suppl 2: ii259-67, 2005 Sep 01.
Article in English | MEDLINE | ID: mdl-16204115

ABSTRACT

MOTIVATION: The interpretation of expression data without appropriate expert knowledge is difficult and usually limited to exploratory data analysis, such as clustering and detecting differentially regulated genes. However, comparing experimental results against manually compiled knowledge resources might limit or bias the perspective on the data. Thus, manual analysis by experts is required to obtain confident predictions about involved processes. RESULTS: We present an algorithm to simultaneously derive interpretations of expression measurements together with biological hypotheses from biomedical publications. It identifies active functional contexts ('concepts'), i.e. gene clusters that exhibit both a significant gene expression as well as a coherent literature profile. Manual intervention by an expert in specifying prior knowledge is not required. The approach scales to realistic applications and does not rely on controlled vocabularies or pathway resources. We validated our algorithm by analyzing a current juvenile arthritis dataset. A number of gene clusters and accompanying literature topics are identified as an interpretation of the data that coincide well with the phenotype and biological processes known to be involved in the disease. We demonstrate that generated clusters are both more sensitive and more specific than Gene Ontology categories detected on the same data. The method allows for in-depth investigation of subsets of genes, the associated literature topics and publications. AVAILABILITY: Supplementary data on clusters is available upon request.


Subject(s)
Expert Systems , Gene Expression Profiling/methods , Gene Expression/physiology , Information Storage and Retrieval/methods , Natural Language Processing , Periodicals as Topic , Proteome/metabolism , Systems Integration
16.
F1000Res ; 4: 1030, 2015.
Article in English | MEDLINE | ID: mdl-27134723

ABSTRACT

DREAM challenges are community competitions designed to advance computational methods and address fundamental questions in systems biology and translational medicine. Each challenge asks participants to develop and apply computational methods to either predict unobserved outcomes or to identify unknown model parameters given a set of training data. Computational methods are evaluated using an automated scoring metric, scores are posted to a public leaderboard, and methods are published to facilitate community discussions on how to build improved methods. By engaging participants from a wide range of science and engineering backgrounds, DREAM challenges can comparatively evaluate a wide range of statistical, machine learning, and biophysical methods. Here, we describe DREAMTools, a Python package for evaluating DREAM challenge scoring metrics. DREAMTools provides a command line interface that enables researchers to test new methods on past challenges, as well as a framework for scoring new challenges. As of March 2016, DREAMTools includes more than 80% of completed DREAM challenges. DREAMTools complements the data, metadata, and software tools available at the DREAM website http://dreamchallenges.org and on the Synapse platform at https://www.synapse.org. AVAILABILITY: DREAMTools is a Python package. Releases and documentation are available at http://pypi.python.org/pypi/dreamtools. The source code is available at http://github.com/dreamtools/dreamtools.

17.
Nat Biotechnol ; 33(1): 51-7, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25362243

ABSTRACT

Amyotrophic lateral sclerosis (ALS) is a fatal neurodegenerative disease with substantial heterogeneity in its clinical presentation. This makes diagnosis and effective treatment difficult, so better tools for estimating disease progression are needed. Here, we report results from the DREAM-Phil Bowen ALS Prediction Prize4Life challenge. In this crowdsourcing competition, competitors developed algorithms for the prediction of disease progression of 1,822 ALS patients from standardized, anonymized phase 2/3 clinical trials. The two best algorithms outperformed a method designed by the challenge organizers as well as predictions by ALS clinicians. We estimate that using both winning algorithms in future trial designs could reduce the required number of patients by at least 20%. The DREAM-Phil Bowen ALS Prediction Prize4Life challenge also identified several potential nonstandard predictors of disease progression including uric acid, creatinine and surprisingly, blood pressure, shedding light on ALS pathobiology. This analysis reveals the potential of a crowdsourcing competition that uses clinical trial data for accelerating ALS research and development.


Subject(s)
Amyotrophic Lateral Sclerosis/pathology , Clinical Trials as Topic , Crowdsourcing , Algorithms , Disease Progression , Humans
18.
PLoS One ; 9(2): e84596, 2014.
Article in English | MEDLINE | ID: mdl-24498260

ABSTRACT

Different ensemble voting approaches have been successfully applied for reverse-engineering of gene regulatory networks. They are based on the assumption that a good approximation of true network structure can be derived by considering the frequencies of individual interactions in a large number of predicted networks. Such approximations are typically superior in terms of prediction quality and robustness as compared to considering a single best scoring network only. Nevertheless, ensemble approaches only work well if the predicted gene regulatory networks are sufficiently similar to each other. If the topologies of predicted networks are considerably different, an ensemble of all networks obscures interesting individual characteristics. Instead, networks should be grouped according to local topological similarities and ensemble voting performed for each group separately. We argue that the presence of sets of co-occurring interactions is a suitable indicator for grouping predicted networks. A stepwise bottom-up procedure is proposed, where first mutual dependencies between pairs of interactions are derived from predicted networks. Pairs of co-occurring interactions are subsequently extended to derive characteristic interaction sets that distinguish groups of networks. Finally, ensemble voting is applied separately to the resulting topologically similar groups of networks to create distinct group-ensembles. Ensembles of topologically similar networks constitute distinct hypotheses about the reference network structure. Such group-ensembles are easier to interpret as their characteristic topology becomes clear and dependencies between interactions are known. The availability of distinct hypotheses facilitates the design of further experiments to distinguish between plausible network structures. 
The proposed procedure is a reasonable refinement step for non-deterministic reverse-engineering applications that produce a large number of candidate predictions for a gene regulatory network, e.g. due to probabilistic optimization or a cross-validation procedure.
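The basic ensemble voting idea described above, ranking interactions by how often they occur across many predicted networks, can be sketched as follows. This is only a minimal illustration, not the authors' implementation: the function name `ensemble_vote`, the `min_fraction` threshold, and the toy networks are assumptions made for the example, and the grouping of networks by co-occurring interaction sets proposed in the abstract is not shown.

```python
from collections import Counter


def ensemble_vote(networks, min_fraction=0.5):
    """Return interactions found in at least `min_fraction` of the networks.

    `networks` is a list of predicted networks, each given as a collection
    of (regulator, target) pairs. The result maps each retained interaction
    to the fraction of networks that contain it.
    (Hypothetical helper for illustration only.)
    """
    n = len(networks)
    # Count each interaction at most once per predicted network.
    counts = Counter(edge for net in networks for edge in set(net))
    return {edge: c / n for edge, c in counts.items() if c / n >= min_fraction}


# Toy example: four predicted networks over genes A-D (made-up data).
networks = [
    {("A", "B"), ("B", "C")},
    {("A", "B"), ("C", "D")},
    {("A", "B"), ("B", "C"), ("C", "D")},
    {("A", "B")},
]
consensus = ensemble_vote(networks, min_fraction=0.5)
strict = ensemble_vote(networks, min_fraction=0.75)
```

Under the abstract's proposal, a vote of this kind would be applied separately to each group of topologically similar networks rather than once to the whole collection.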


Subject(s)
Algorithms , Computational Biology/methods , Gene Regulatory Networks , Models, Genetic , Reproducibility of Results
19.
Syst Appl Microbiol ; 37(4): 287-95, 2014 Jun.
Article in English | MEDLINE | ID: mdl-24736031

ABSTRACT

The strict anaerobe Geobacter metallireducens was cultivated in retentostats under acetate and acetate plus benzoate limitation in the presence of Fe(III) citrate in order to investigate its physiology under close to natural conditions. Growth rates below 0.003 h⁻¹ were achieved in the course of cultivation. A nano-liquid chromatography-tandem mass spectrometry-based proteomic approach (nano-LC-MS/MS) with subsequent label-free quantification was performed on proteins extracted from cells sampled at different time points during retentostat cultivation. Proteins detected at low (0.002 h⁻¹) and high (0.06 h⁻¹) growth rates were compared between corresponding growth conditions (acetate or acetate plus benzoate). Carbon limitation significantly increased the abundances of several catabolic proteins involved in the degradation of substrates not present in the medium (ethanol, butyrate, fatty acids, and aromatic compounds). Growth rate-specific physiology was reflected in the changed abundances of energy-, chemotaxis-, oxidative stress-, and transport-related proteins. Mimicking natural conditions by extremely slow bacterial growth made it possible to show how G. metallireducens optimized its physiology in order to survive in its natural habitats, since it was prepared to consume several carbon sources simultaneously and to withstand various environmental stresses.


Subject(s)
Culture Media/chemistry , Geobacter/growth & development , Geobacter/metabolism , Acetates/metabolism , Adaptation, Physiological , Bacterial Proteins/analysis , Benzoates/metabolism , Chromatography, Liquid , Ferric Compounds/metabolism , Proteome/analysis , Stress, Physiological , Tandem Mass Spectrometry
20.
Syst Appl Microbiol ; 37(4): 277-86, 2014 Jun.
Article in English | MEDLINE | ID: mdl-24731775

ABSTRACT

For microorganisms that play an important role in bioremediation, adaptation to swift changes in the availability of various substrates is key to survival. The iron-reducing bacterium Geobacter metallireducens was hypothesized to repress utilization of less preferred substrates in the presence of high concentrations of easily degradable compounds. In our experiments, acetate and ethanol were preferred over benzoate, but benzoate was co-consumed with toluene and butyrate. To reveal overall physiological changes caused by different single substrates and a mixture of acetate plus benzoate, a nano-liquid chromatography-tandem mass spectrometry-based proteomic approach (nano-LC-MS/MS) was performed using label-free quantification. Significant differential expression during growth on different substrates was observed for 155 out of 1477 proteins. The benzoyl-CoA pathway was found to be subject to incomplete repression during exponential growth on acetate in the presence of benzoate and on butyrate as a single substrate. Peripheral pathways of toluene, ethanol, and butyrate degradation were highly expressed only during growth on the corresponding substrates. However, low expression of these pathways was detected in all other tested conditions. Therefore, G. metallireducens seems to lack strong carbon catabolite repression under high substrate concentrations, which might be advantageous for survival in habitats rich in fatty acids and aromatic hydrocarbons.


Subject(s)
Carbon/metabolism , Culture Media/chemistry , Geobacter/growth & development , Geobacter/metabolism , Acetates/metabolism , Adaptation, Physiological , Bacterial Proteins/analysis , Benzoates/metabolism , Chromatography, Liquid , Metabolic Networks and Pathways , Proteome/analysis , Tandem Mass Spectrometry