1.
PLoS One ; 13(8): e0201382, 2018.
Article En | MEDLINE | ID: mdl-30080876

MOTIVATION: Gene regulatory networks (GRN) can be determined via various experimental techniques, and also by computational methods, which infer networks from gene expression data. However, these techniques treat interactions separately such that interdependencies of interactions forming meaningful subnetworks are typically not considered. METHODS: For the investigation of network properties and for the classification of different (sub-)networks based on gene expression data, we consider biological network motifs consisting of three genes and up to three interactions, e.g. the cascade chain (CSC), feed-forward loop (FFL), and dense-overlapping regulon (DOR). We examine several conventional methods for the inference of network motifs, which typically consider each interaction individually. In addition, we propose a new method based on three-way ANOVA (ANalysis Of VAriance) (3WA) that analyzes entire subnetworks at once. To demonstrate the advantages of such a more holistic perspective, we compare the ability of 3WA and other methods to detect and categorize network motifs on large real and artificial datasets. RESULTS: We find that conventional methods perform much better on artificial data (AUC up to 80%) than on real E. coli expression datasets (AUC 50%, corresponding to random guessing). To explain this observation, we examine several important properties that differ between datasets and analyze predicted motifs in detail. We find that in the case of real networks our new 3WA method (AUC 70% in E. coli) outperforms previous methods by exploiting the interdependencies in the full motif structure. Because of important differences between current artificial datasets and real measurements, the construction and testing of motif detection methods should focus on real data.


Databases, Genetic , Gene Expression Regulation , Gene Regulatory Networks , Models, Genetic
2.
Bioinformatics ; 33(10): 1565-1567, 2017 05 15.
Article En | MEDLINE | ID: mdl-28069593

Summary: Analysis of Next Generation Sequencing (NGS) data requires the processing of large datasets by chaining various tools with complex input and output formats. In order to automate data analysis, we propose to standardize NGS tasks into modular workflows. This simplifies reliable handling and processing of NGS data, and corresponding solutions become substantially more reproducible and easier to maintain. Here, we present a documented, Linux-based toolbox of 42 processing modules that are combined to construct workflows facilitating a variety of tasks such as DNAseq and RNAseq analysis. We also describe important technical extensions. The high throughput executor (HTE) helps to increase the reliability and to reduce manual interventions when processing complex datasets. We also provide a dedicated binary manager that assists users in obtaining the modules' executables and keeping them up to date. As the basis for this actively developed toolbox, we use the workflow management software KNIME. Availability and Implementation: See http://ibisngs.github.io/knime4ngs for nodes and user manual (GPLv3 license). Contact: robert.kueffner@helmholtz-muenchen.de. Supplementary information: Supplementary data are available at Bioinformatics online.


High-Throughput Nucleotide Sequencing/methods , Software , Reproducibility of Results , Workflow
3.
Bioinformatics ; 31(17): 2836-43, 2015 Sep 01.
Article En | MEDLINE | ID: mdl-25910697

MOTIVATION: Experimentally determined gene regulatory networks can be enriched by computational inference from high-throughput expression profiles. However, the prediction of regulatory interactions is severely impaired by indirect and spurious effects, particularly for eukaryotes. Recently, published methods report improved predictions by exploiting the a priori known targets of a regulator (its local topology) in addition to expression profiles. RESULTS: We find that methods exploiting known targets show an unexpectedly high rate of false discoveries. This leads to inflated performance estimates and the prediction of an excessive number of new interactions for regulators with many known targets. These issues are hidden from common evaluation and cross-validation setups, which is due to Simpson's paradox. We suggest a confidence score recalibration method (CoRe) that reduces the false discovery rate and enables a reliable performance estimation. CONCLUSIONS: CoRe considerably improves the results of network inference methods that exploit known targets. Predictions then display the biological process specificity of regulators more correctly and enable the inference of accurate genome-wide regulatory networks in eukaryotes. For yeast, we propose a network with more than 22 000 confident interactions. We point out that machine learning approaches outside of the area of network inference may be affected as well. AVAILABILITY AND IMPLEMENTATION: Results, executable code and networks are available via our website http://www.bio.ifi.lmu.de/forschung/CoRe. CONTACT: robert.kueffner@helmholtz-muenchen.de SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Algorithms , False Positive Reactions , Gene Expression Regulation, Fungal , Gene Regulatory Networks , Saccharomyces cerevisiae Proteins/genetics , Systems Biology/methods , Gene Expression Profiling , Machine Learning , Oligonucleotide Array Sequence Analysis , Signal Transduction , Software
4.
F1000Res ; 4: 1030, 2015.
Article En | MEDLINE | ID: mdl-27134723

DREAM challenges are community competitions designed to advance computational methods and address fundamental questions in systems biology and translational medicine. Each challenge asks participants to develop and apply computational methods to either predict unobserved outcomes or to identify unknown model parameters given a set of training data. Computational methods are evaluated using an automated scoring metric, scores are posted to a public leaderboard, and methods are published to facilitate community discussions on how to build improved methods. By engaging participants from a wide range of science and engineering backgrounds, DREAM challenges can comparatively evaluate a wide range of statistical, machine learning, and biophysical methods. Here, we describe DREAMTools, a Python package for evaluating DREAM challenge scoring metrics. DREAMTools provides a command line interface that enables researchers to test new methods on past challenges, as well as a framework for scoring new challenges. As of March 2016, DREAMTools includes more than 80% of completed DREAM challenges. DREAMTools complements the data, metadata, and software tools available at the DREAM website http://dreamchallenges.org and on the Synapse platform at https://www.synapse.org. AVAILABILITY: DREAMTools is a Python package. Releases and documentation are available at http://pypi.python.org/pypi/dreamtools. The source code is available at http://github.com/dreamtools/dreamtools.

5.
Nat Biotechnol ; 33(1): 51-7, 2015 Jan.
Article En | MEDLINE | ID: mdl-25362243

Amyotrophic lateral sclerosis (ALS) is a fatal neurodegenerative disease with substantial heterogeneity in its clinical presentation. This makes diagnosis and effective treatment difficult, so better tools for estimating disease progression are needed. Here, we report results from the DREAM-Phil Bowen ALS Prediction Prize4Life challenge. In this crowdsourcing competition, competitors developed algorithms for the prediction of disease progression of 1,822 ALS patients from standardized, anonymized phase 2/3 clinical trials. The two best algorithms outperformed a method designed by the challenge organizers as well as predictions by ALS clinicians. We estimate that using both winning algorithms in future trial designs could reduce the required number of patients by at least 20%. The DREAM-Phil Bowen ALS Prediction Prize4Life challenge also identified several potential nonstandard predictors of disease progression including uric acid, creatinine and surprisingly, blood pressure, shedding light on ALS pathobiology. This analysis reveals the potential of a crowdsourcing competition that uses clinical trial data for accelerating ALS research and development.


Amyotrophic Lateral Sclerosis/pathology , Clinical Trials as Topic , Crowdsourcing , Algorithms , Disease Progression , Humans
6.
Syst Appl Microbiol ; 37(4): 287-95, 2014 Jun.
Article En | MEDLINE | ID: mdl-24736031

The strict anaerobe Geobacter metallireducens was cultivated in retentostats under acetate and acetate plus benzoate limitation in the presence of Fe(III) citrate in order to investigate its physiology under close to natural conditions. Growth rates below 0.003 h⁻¹ were achieved in the course of cultivation. A nano-liquid chromatography-tandem mass spectrometry-based proteomic approach (nano-LC-MS/MS) with subsequent label-free quantification was performed on proteins extracted from cells sampled at different time points during retentostat cultivation. Proteins detected at low (0.002 h⁻¹) and high (0.06 h⁻¹) growth rates were compared between corresponding growth conditions (acetate or acetate plus benzoate). Carbon limitation significantly increased the abundances of several catabolic proteins involved in the degradation of substrates not present in the medium (ethanol, butyrate, fatty acids, and aromatic compounds). Growth rate-specific physiology was reflected in the changed abundances of energy-, chemotaxis-, oxidative stress-, and transport-related proteins. Mimicking natural conditions by extremely slow bacterial growth made it possible to show how G. metallireducens optimized its physiology in order to survive in its natural habitats, since it was prepared to consume several carbon sources simultaneously and to withstand various environmental stresses.


Culture Media/chemistry , Geobacter/growth & development , Geobacter/metabolism , Acetates/metabolism , Adaptation, Physiological , Bacterial Proteins/analysis , Benzoates/metabolism , Chromatography, Liquid , Ferric Compounds/metabolism , Proteome/analysis , Stress, Physiological , Tandem Mass Spectrometry
7.
Syst Appl Microbiol ; 37(4): 277-86, 2014 Jun.
Article En | MEDLINE | ID: mdl-24731775

For microorganisms that play an important role in bioremediation, the adaptation to swift changes in the availability of various substrates is a key for survival. The iron-reducing bacterium Geobacter metallireducens was hypothesized to repress utilization of less preferred substrates in the presence of high concentrations of easily degradable compounds. In our experiments, acetate and ethanol were preferred over benzoate, but benzoate was co-consumed with toluene and butyrate. To reveal overall physiological changes caused by different single substrates and a mixture of acetate plus benzoate, a nano-liquid chromatography-tandem mass spectrometry-based proteomic approach (nano-LC-MS/MS) was performed using label-free quantification. Significant differential expression during growth on different substrates was observed for 155 out of 1477 proteins. The benzoyl-CoA pathway was found to be subjected to incomplete repression during exponential growth on acetate in the presence of benzoate and on butyrate as a single substrate. Peripheral pathways of toluene, ethanol, and butyrate degradation were highly expressed only during growth on the corresponding substrates. However, low expression of these pathways was detected in all other tested conditions. Therefore, G. metallireducens seems to lack strong carbon catabolite repression under high substrate concentrations, which might be advantageous for survival in habitats rich in fatty acids and aromatic hydrocarbons.


Carbon/metabolism , Culture Media/chemistry , Geobacter/growth & development , Geobacter/metabolism , Acetates/metabolism , Adaptation, Physiological , Bacterial Proteins/analysis , Benzoates/metabolism , Chromatography, Liquid , Metabolic Networks and Pathways , Proteome/analysis , Tandem Mass Spectrometry
8.
PLoS One ; 9(2): e84596, 2014.
Article En | MEDLINE | ID: mdl-24498260

Different ensemble voting approaches have been successfully applied for reverse-engineering of gene regulatory networks. They are based on the assumption that a good approximation of true network structure can be derived by considering the frequencies of individual interactions in a large number of predicted networks. Such approximations are typically superior in terms of prediction quality and robustness as compared to considering a single best scoring network only. Nevertheless, ensemble approaches only work well if the predicted gene regulatory networks are sufficiently similar to each other. If the topologies of predicted networks are considerably different, an ensemble of all networks obscures interesting individual characteristics. Instead, networks should be grouped according to local topological similarities and ensemble voting performed for each group separately. We argue that the presence of sets of co-occurring interactions is a suitable indicator for grouping predicted networks. A stepwise bottom-up procedure is proposed, where first mutual dependencies between pairs of interactions are derived from predicted networks. Pairs of co-occurring interactions are subsequently extended to derive characteristic interaction sets that distinguish groups of networks. Finally, ensemble voting is applied separately to the resulting topologically similar groups of networks to create distinct group-ensembles. Ensembles of topologically similar networks constitute distinct hypotheses about the reference network structure. Such group-ensembles are easier to interpret as their characteristic topology becomes clear and dependencies between interactions are known. The availability of distinct hypotheses facilitates the design of further experiments to distinguish between plausible network structures. The proposed procedure is a reasonable refinement step for non-deterministic reverse-engineering applications that produce a large number of candidate predictions for a gene regulatory network, e.g. due to probabilistic optimization or a cross-validation procedure.
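
The basic ensemble-voting idea described in this abstract (counting how often each interaction occurs across many predicted networks) can be illustrated with a minimal Python sketch. It is not the authors' grouping procedure; the function name `ensemble_vote`, the `min_fraction` threshold, and the toy networks are assumptions made purely for illustration.

```python
from collections import Counter

def ensemble_vote(networks, min_fraction=0.5):
    """Return edges found in at least `min_fraction` of the predicted networks.

    `networks` is a list of edge sets; each edge is a (regulator, target) tuple.
    The threshold is a hypothetical choice, not taken from the abstract.
    """
    n = len(networks)
    # Count in how many predicted networks each edge appears.
    counts = Counter(edge for net in networks for edge in set(net))
    return {edge: c / n for edge, c in counts.items() if c / n >= min_fraction}

# Three toy predicted networks (hypothetical gene names).
networks = [
    {("A", "B"), ("B", "C")},
    {("A", "B"), ("A", "C")},
    {("A", "B"), ("B", "C"), ("C", "D")},
]
consensus = ensemble_vote(networks)
print(consensus)
```

Applying the same vote separately to each group of topologically similar networks, rather than to all networks at once, would correspond to the group-ensembles the abstract argues for.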


Algorithms , Computational Biology/methods , Gene Regulatory Networks , Models, Genetic , Reproducibility of Results
9.
Front Genet ; 4: 262, 2013 Dec 03.
Article En | MEDLINE | ID: mdl-24348517

Networks provide a natural representation of molecular biology knowledge, in particular to model relationships between biological entities such as genes, proteins, drugs, or diseases. Because of the effort, the cost, or the lack of the experiments necessary for the elucidation of these networks, computational approaches for network inference have been frequently investigated in the literature. In this paper, we examine the assessment of supervised network inference. Supervised inference is based on machine learning techniques that infer the network from a training sample of known interacting and possibly non-interacting entities and additional measurement data. While these methods are very effective, their reliable validation in silico poses a challenge, since both prediction and validation need to be performed on the basis of the same partially known network. Cross-validation techniques need to be specifically adapted to classification problems on pairs of objects. We perform a critical review and assessment of protocols and measures proposed in the literature and derive specific guidelines how to best exploit and evaluate machine learning techniques for network inference. Through theoretical considerations and in silico experiments, we analyze in depth how important factors influence the outcome of performance estimation. These factors include the amount of information available for the interacting entities, the sparsity and topology of biological networks, and the lack of experimentally verified non-interacting pairs.

10.
Bioinformatics ; 29(20): 2603-9, 2013 Oct 15.
Article En | MEDLINE | ID: mdl-23956305

MOTIVATION: The lack of reliable, comprehensive gold standards complicates the development of many bioinformatics tools, particularly for the analysis of expression data and biological networks. Simulation approaches can provide provisional gold standards, such as regulatory networks, for the assessment of network inference methods. However, this just defers the problem, as it is difficult to assess how closely simulators emulate the properties of real data. RESULTS: In analogy to Turing's test discriminating humans and computers based on responses to questions, we systematically compare real and artificial systems based on their gene expression output. Different expression data analysis techniques such as clustering are applied to both types of datasets. We define and extract distributions of properties from the results, for instance, distributions of cluster quality measures or transcription factor activity patterns. Distributions of properties are represented as histograms to enable the comparison of artificial and real datasets. We examine three frequently used simulators that generate expression data from parameterized regulatory networks. We identify features distinguishing real from artificial datasets that suggest how simulators could be adapted to better emulate real datasets and, thus, become more suitable for the evaluation of data analysis tools. AVAILABILITY: See http://www2.bio.ifi.lmu.de/~kueffner/attfad/ and the supplement for precomputed analyses; other compendia can be analyzed via the CRAN package attfad. The full datasets can be obtained from http://www2.bio.ifi.lmu.de/~kueffner/attfad/data.tar.gz.


Gene Expression Profiling/methods , Gene Expression , Cluster Analysis , Escherichia coli/genetics , Humans , Saccharomyces cerevisiae/genetics , Software
11.
Nucleic Acids Res ; 41(18): 8452-63, 2013 Oct.
Article En | MEDLINE | ID: mdl-23873954

Existing machine-readable resources for large-scale gene regulatory networks usually do not provide context information characterizing the activating conditions for a regulation and how targeted genes are affected. Although this information is essential for data interpretation, available networks are often restricted to condition-independent, non-quantitative, plain binary interactions as derived from high-throughput screens. In this article, we present a comprehensive Petri net-based regulatory network that controls the diauxic shift in Saccharomyces cerevisiae. For 100 specific enzymatic genes, we collected regulations from public databases as well as identified and manually curated >400 relevant scientific articles. The resulting network consists of >300 multi-input regulatory interactions providing (i) activating conditions for the regulators; (ii) semi-quantitative effects on their targets; and (iii) classification of the experimental evidence. The diauxic shift network compiles widely distributed regulatory information and is available in an easy-to-use machine-readable form. Additionally, we developed a browsable system organizing the network into pathway maps, which allows users to inspect and trace the evidence for each annotated regulation in the model.


Gene Expression Regulation, Fungal , Gene Regulatory Networks , Saccharomyces cerevisiae/genetics , Citric Acid Cycle/genetics , Fatty Acids/metabolism , Gluconeogenesis/genetics , Models, Genetic , Phosphoenolpyruvate Carboxykinase (ATP)/genetics , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/genetics
12.
Nat Methods ; 9(8): 796-804, 2012 Jul 15.
Article En | MEDLINE | ID: mdl-22796662

Reconstructing gene regulatory networks from high-throughput data is a long-standing challenge. Through the Dialogue on Reverse Engineering Assessment and Methods (DREAM) project, we performed a comprehensive blind assessment of over 30 network inference methods on Escherichia coli, Staphylococcus aureus, Saccharomyces cerevisiae and in silico microarray data. We characterize the performance, data requirements and inherent biases of different inference approaches, and we provide guidelines for algorithm application and development. We observed that no single inference method performs optimally across all data sets. In contrast, integration of predictions from multiple inference methods shows robust and high performance across diverse data sets. We thereby constructed high-confidence networks for E. coli and S. aureus, each comprising ~1,700 transcriptional interactions at a precision of ~50%. We experimentally tested 53 previously unobserved regulatory interactions in E. coli, of which 23 (43%) were supported. Our results establish community-based methods as a powerful and robust tool for the inference of transcriptional gene regulatory networks.
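
One simple way to integrate predictions from several inference methods is to average the rank each method assigns to an edge. The Python sketch below illustrates that idea under stated assumptions: the abstract does not specify the integration scheme, and the function name `average_rank_consensus`, the method score tables, and the gene names are hypothetical.

```python
def average_rank_consensus(score_tables):
    """Combine per-method edge scores by averaging ranks (rank 1 = most confident).

    `score_tables` is a list of dicts mapping (regulator, target) -> score.
    Edges a method did not score are ranked last for that method.
    Ties are broken by edge name, which is an arbitrary illustrative choice.
    """
    edges = sorted(set().union(*score_tables))
    total_rank = {edge: 0.0 for edge in edges}
    for scores in score_tables:
        ordered = sorted(edges, key=lambda e: scores.get(e, float("-inf")), reverse=True)
        for rank, edge in enumerate(ordered, start=1):
            total_rank[edge] += rank
    n = len(score_tables)
    # Lowest average rank first, i.e. the most consistently supported edge.
    return sorted((total_rank[e] / n, e) for e in edges)

# Hypothetical confidence scores from two inference methods.
method_a = {("tf1", "g1"): 0.9, ("tf2", "g1"): 0.5, ("tf1", "g2"): 0.2}
method_b = {("tf1", "g1"): 0.7, ("tf2", "g1"): 0.6, ("tf1", "g2"): 0.1}
ranking = average_rank_consensus([method_a, method_b])
print(ranking)
```

Rank averaging needs no calibration between methods, since only the ordering of each method's scores is used.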


Computational Biology , Gene Expression Regulation, Bacterial/genetics , Gene Regulatory Networks , Oligonucleotide Array Sequence Analysis , Algorithms , Escherichia coli/genetics , Saccharomyces cerevisiae/genetics , Software , Staphylococcus aureus/genetics , Transcription, Genetic/genetics
13.
Bioinformatics ; 28(10): 1376-82, 2012 May 15.
Article En | MEDLINE | ID: mdl-22467911

MOTIVATION: To improve the understanding of molecular regulation events, various approaches have been developed for deducing gene regulatory networks from mRNA expression data. RESULTS: We present a new score for network inference, η², that is derived from an analysis of variance. Candidate transcription factor:target gene (TF:TG) relationships are assumed more likely if the expression of TF and TG are mutually dependent in at least a subset of the examined experiments. We evaluate this dependency by η², a non-parametric, non-linear correlation coefficient. It is fast, easy to apply and does not require the discretization of the input data. In the recent DREAM5 blind assessment, arguably the most comprehensive evaluation of inference methods, our approach based on η² was rated the best performer on real expression compendia. It also performs better than methods tested in other recently published comparative assessments. About half of our novel predictions are true interactions as estimated from qPCR experiments performed for DREAM5. CONCLUSIONS: The score η² has a number of interesting features that enable the efficient detection of gene regulatory interactions. For most experimental setups, it is an interesting alternative to other measures of dependency such as Pearson's correlation or mutual information.
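
For orientation, the textbook ANOVA effect size η² is the between-group sum of squares divided by the total sum of squares. The Python sketch below computes that generic quantity only; the abstract does not describe how samples are grouped or how the published score is derived in detail, so the grouping of target-gene expression values shown here is an assumption for illustration.

```python
from statistics import fmean

def eta_squared(groups):
    """Generic ANOVA effect size: SS_between / SS_total.

    `groups` is a list of lists of expression values, one list per group
    (the grouping itself is assumed to be given).
    """
    values = [v for g in groups for v in g]
    grand_mean = fmean(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # A constant signal has no variance to explain; report 0 by convention.
    return ss_between / ss_total if ss_total else 0.0

# Hypothetical target-gene expression in two groups of experiments.
separated = eta_squared([[1.0, 1.2, 0.8], [3.0, 3.1, 2.9]])
unrelated = eta_squared([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]])
print(separated, unrelated)
```

Values near 1 mean the grouping explains most of the variance; values near 0 mean it explains almost none.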


Analysis of Variance , Gene Regulatory Networks , Escherichia coli/genetics , Escherichia coli/metabolism , Gene Expression Profiling , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Transcription Factors/genetics , Transcription Factors/metabolism
14.
Bioinformatics ; 28(11): 1480-6, 2012 Jun 01.
Article En | MEDLINE | ID: mdl-22492315

MOTIVATION: Several statistical tests are available to detect the enrichment of differential expression in gene sets. Such tests were originally proposed for analyzing gene sets associated with biological processes. The objective evaluation of tests on real measurements has not been possible as it is difficult to decide a priori which processes will be affected in given experiments. RESULTS: We present a first large study to rigorously assess and compare the performance of gene set enrichment tests on real expression measurements. Gene sets are defined based on the targets of given regulators such as transcription factors (TFs) and microRNAs (miRNAs). In contrast to processes, TFs and miRNAs are amenable to direct perturbations, e.g. regulator over-expression or deletion. We assess the ability of 14 different statistical tests to predict the perturbations from expression measurements in Escherichia coli, Saccharomyces cerevisiae and human. We also analyze how performance depends on the quality and comprehensiveness of the regulator targets via a permutation approach. We find that ANOVA and Wilcoxon's test consistently perform better than, for instance, Kolmogorov-Smirnov and hypergeometric tests. For scenarios where the optimal test is not known, we suggest combining all evaluated tests into an unweighted consensus, which also performs well in our assessment. Our results provide a guide for the selection of existing tests as well as a basis for the development and assessment of novel tests.
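
Of the tests named in this abstract, the hypergeometric test is the simplest to write down: it asks how likely it is to see at least the observed overlap between a regulator's target set and the differentially expressed genes by chance. The following Python sketch is a generic one-sided version; the function name, parameter names, and numbers are illustrative assumptions and not taken from the study.

```python
from math import comb

def hypergeom_enrichment_p(n_genes, n_set, n_de, n_overlap):
    """One-sided hypergeometric p-value P(X >= n_overlap).

    n_genes:   genes measured in total
    n_set:     genes in the regulator's target set
    n_de:      differentially expressed genes
    n_overlap: target-set genes that are differentially expressed
    """
    upper = min(n_set, n_de)
    total = comb(n_genes, n_de)
    tail = sum(
        comb(n_set, k) * comb(n_genes - n_set, n_de - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Hypothetical example: 1000 genes, 40 targets, 50 DE genes, 8 in the overlap.
p_value = hypergeom_enrichment_p(1000, 40, 50, 8)
print(p_value)
```

Because this test only uses a differentially-expressed yes/no cut, it discards the size of expression changes, which rank- or variance-based tests such as Wilcoxon's test or ANOVA retain.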


Escherichia coli/genetics , Gene Expression Profiling/methods , Saccharomyces cerevisiae/genetics , Gene Regulatory Networks , Humans , MicroRNAs/metabolism , Transcription Factors/genetics , Transcription Factors/metabolism
15.
PLoS One ; 6(8): e22519, 2011.
Article En | MEDLINE | ID: mdl-21857930

BACKGROUND: Several expression datasets of miRNA transfection experiments are available to analyze the regulatory mechanisms downstream of miRNA effects. The miRNA induced regulatory effects can be propagated via transcription factors (TFs). We propose the method MIRTFnet to identify miRNA controlled TFs as active regulators if their downstream target genes are differentially expressed. METHODOLOGY/PRINCIPAL FINDINGS: MIRTFnet enables the determination of active transcription factors (TFs) and is sensitive enough to exploit the small expression changes induced by the activity of miRNAs. For this purpose, different statistical tests were evaluated and compared. Based on the identified TFs, databases, computational predictions and the literature we construct regulatory models downstream of miRNA actions. Transfecting miRNAs are connected to active regulators via a network of miRNA-TF, miRNA-kinase-TF as well as TF-TF relationships. Based on 43 transfection experiments involving 17 cancer relevant miRNAs we show that MIRTFnet detects active regulators reliably. CONCLUSIONS/SIGNIFICANCE: The consensus of the individual regulatory models shows that the examined miRNAs induce activity changes in a common core of transcription factors involved in cancer related processes such as proliferation or apoptosis.


Databases, Genetic , Gene Expression Regulation , MicroRNAs/genetics , Transcription Factors/genetics , Gene Expression Profiling , Humans , Models, Genetic
16.
Bioinformatics ; 27(13): i366-73, 2011 Jul 01.
Article En | MEDLINE | ID: mdl-21685094

MOTIVATION: Current gene set enrichment approaches do not take interactions and associations between set members into account. Mutual activation and inhibition causing positive and negative correlation among set members are thus neglected. As a consequence, inconsistent regulations and contextless expression changes are reported and, thus, the biological interpretation of the result is impeded. RESULTS: We analyzed established gene set enrichment methods and their result sets in a large-scale investigation of 1000 expression datasets. The reported statistically significant gene sets exhibit only average consistency between the observed patterns of differential expression and known regulatory interactions. We present Gene Graph Enrichment Analysis (GGEA) to detect consistently and coherently enriched gene sets, based on prior knowledge derived from directed gene regulatory networks. Firstly, GGEA improves the concordance of pairwise regulation with individual expression changes in respective pairs of regulating and regulated genes, compared with set enrichment methods. Secondly, GGEA yields result sets where a large fraction of relevant expression changes can be explained by nearby regulators, such as transcription factors, again improving on set-based methods. Thirdly, we demonstrate in additional case studies that GGEA can be applied to human regulatory pathways, where it sensitively detects very specific regulation processes, which are altered in tumors of the central nervous system. GGEA significantly increases the detection of gene sets where measured positively or negatively correlated expression patterns coincide with directed inducing or repressing relationships, thus facilitating further interpretation of gene expression data. AVAILABILITY: The method and accompanying visualization capabilities have been bundled into an R package and tied to a graphical user interface, the Galaxy workflow environment, that is running as a web server. 
CONTACT: Ludwig.Geistlinger@bio.ifi.lmu.de; Ralf.Zimmer@bio.ifi.lmu.de.


Gene Expression Profiling , Neoplasms, Nerve Tissue/genetics , Neoplasms, Nerve Tissue/metabolism , Software , Algorithms , Gene Expression Regulation, Neoplastic , Gene Regulatory Networks , Humans , Proteins/genetics , Signal Transduction
17.
PLoS One ; 5(9)2010 Sep 20.
Article En | MEDLINE | ID: mdl-20862218

BACKGROUND: The recent DREAM4 blind assessment provided a particularly realistic and challenging setting for network reverse engineering methods. The in silico part of DREAM4 solicited the inference of cycle-rich gene regulatory networks from heterogeneous, noisy expression data including time courses as well as knockout, knockdown and multifactorial perturbations. METHODOLOGY AND PRINCIPAL FINDINGS: We inferred and parametrized simulation models based on Petri Nets with Fuzzy Logic (PNFL). This completely automated approach correctly reconstructed networks with cycles as well as oscillating network motifs. PNFL was evaluated as the best performer on DREAM4 in silico networks of size 10 with an area under the precision-recall curve (AUPR) of 81%. Besides topology, we inferred a range of additional mechanistic details with good reliability, e.g. distinguishing activation from inhibition as well as dependent from independent regulation. Our models also performed well on new experimental conditions such as double knockout mutations that were not included in the provided datasets. CONCLUSIONS: The inference of biological networks substantially benefits from methods that are expressive enough to deal with diverse datasets in a unified way. At the same time, overly complex approaches could generate multiple different models that explain the data equally well. PNFL appears to strike the balance between expressive power and complexity. This also applies to the intuitive representation of PNFL models combining a straightforward graphical notation with colloquial fuzzy parameters.
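
The AUPR figure quoted in this abstract summarizes how well a ranked list of predicted edges recovers the true network. The Python sketch below approximates the area under the precision-recall curve by average precision; it is not the official DREAM4 scoring code, and the function name, edge scores, and gold standard are hypothetical.

```python
def average_precision(scored_edges, true_edges):
    """Step-wise approximation of the area under the precision-recall curve.

    scored_edges: dict mapping (regulator, target) -> confidence score
    true_edges:   set of gold-standard edges
    True edges that were never predicted contribute zero precision.
    """
    ranked = sorted(scored_edges, key=scored_edges.get, reverse=True)
    true_positives = 0
    area = 0.0
    for position, edge in enumerate(ranked, start=1):
        if edge in true_edges:
            true_positives += 1
            # Precision at the rank where recall increases.
            area += true_positives / position
    return area / len(true_edges) if true_edges else 0.0

# Hypothetical predictions for a small network.
scores = {("a", "b"): 0.9, ("b", "c"): 0.8, ("a", "c"): 0.3, ("c", "a"): 0.1}
gold = {("a", "b"), ("a", "c")}
ap = average_precision(scores, gold)
print(ap)
```

Precision-recall summaries are commonly preferred over ROC summaries for network inference because true edges are rare relative to all possible gene pairs.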


Fuzzy Logic , Gene Regulatory Networks , Genetic Engineering , Animals , Computational Biology/methods , Female , Gene Expression Profiling , Humans , Male , Mice , Mice, Inbred C57BL , Models, Genetic
18.
BMC Bioinformatics ; 11: 135, 2010 Mar 16.
Article En | MEDLINE | ID: mdl-20233441

BACKGROUND: MicroRNAs have been discovered as important regulators of gene expression. To identify the target genes of microRNAs, several databases and prediction algorithms have been developed. Only a few experimentally confirmed microRNA targets are available in databases. Many of the microRNA targets stored in databases were derived from large-scale experiments that are considered not very reliable. We propose to use text mining of publication abstracts for extracting microRNA-gene associations including microRNA-target relations to complement current repositories. RESULTS: The microRNA-gene association database miRSel combines text-mining results with existing databases and computational predictions. Text mining enables the reliable extraction of microRNA, gene and protein occurrences as well as their relationships from texts. Thereby, we increased the number of human, mouse and rat miRNA-gene associations by at least three-fold as compared to e.g. TarBase, a resource for miRNA-gene associations. CONCLUSIONS: Our database miRSel offers the currently largest collection of literature-derived miRNA-gene associations. Comprehensive collections of miRNA-gene associations are important for the development of miRNA target prediction tools and the analysis of regulatory networks. miRSel is updated daily and can be queried using a web-based interface via microRNA identifiers, gene and protein names, PubMed queries as well as gene ontology (GO) terms. miRSel is freely available online at http://services.bio.ifi.lmu.de/mirsel.


Data Mining/methods , MicroRNAs/genetics , Software , Computational Biology/methods , Databases, Genetic , Internet , PubMed , Sequence Analysis, RNA
19.
Nucleic Acids Res ; 36(Database issue): D63-8, 2008 Jan.
Article En | MEDLINE | ID: mdl-17933774

Alternative splicing is known to be one of the major sources for functional diversity in higher eukaryotes. Several splicing isoforms have been characterized in the literature that play important roles in cellular processes like apoptosis or signal transduction pathways. Splicing events can often be detected on the mRNA level by large-scale cDNA or EST experiments and such data is collected and annotated in several databases. Nevertheless, the effects of splicing on the structure of a protein are largely unknown. The ProSAS (Protein Structure and Alternative Splicing) database fills this gap and provides a unified resource for analyzing effects of alternative splicing events in the context of protein structures. ProSAS comprehensively annotates and models protein structures for several Ensembl genomes as well as SwissProt entries harbouring splicing events. Alternative isoforms annotated in Ensembl or SwissProt can be analyzed on the protein structure and protein function level using an intuitive user interface that provides several features and tools for a structure-based analysis of alternative splicing events. The ProSAS database is freely accessible at http://www.bio.ifi.lmu.de/ProSAS.


Alternative Splicing , Databases, Protein , Protein Conformation , Internet , Protein Isoforms/chemistry , Protein Isoforms/genetics , User-Computer Interface
20.
Bioinform Biol Insights ; 2: 291-305, 2008 May 28.
Article En | MEDLINE | ID: mdl-19812783

INTRODUCTION: Numerous methods exist for basic processing, e.g. normalization, of microarray gene expression data. These methods have an important effect on the final analysis outcome. Therefore, it is crucial to select methods appropriate for a given dataset in order to assure the validity and reliability of expression data analysis. Furthermore, biological interpretation requires expression values for genes, which are often represented by several spots or probe sets on a microarray. How to best integrate spot/probe set values into gene values has so far been a somewhat neglected problem. RESULTS: We present a case study comparing different between-array normalization methods with respect to the identification of differentially expressed genes. Our results show that it is feasible and necessary to use prior knowledge on gene expression measurements to select an adequate normalization method for the given data. Furthermore, we provide evidence that combining spot/probe set p-values into gene p-values for detecting differentially expressed genes has advantages compared to combining expression values for spots/probe sets into gene expression values. The comparison of different methods suggests using Stouffer's method for this purpose. The study has been conducted on gene expression experiments investigating human joint cartilage samples of osteoarthritis-related groups: a cDNA microarray (83 samples, four groups) and an Affymetrix (26 samples, two groups) data set. CONCLUSION: The apparently straightforward steps of gene expression data analysis, e.g. between-array normalization and detection of differentially regulated genes, can be accomplished by numerous different methods. We analyzed multiple methods and the possible effects and thereby demonstrate the importance of the individual decisions taken during data processing. We give guidelines for evaluating normalization outcomes. An overview of these effects via appropriate measures and plots compared to prior knowledge is essential for the biological interpretation of gene expression measurements.
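
As an illustration of the p-value combination step mentioned in the abstract above, here is a minimal sketch of Stouffer's unweighted Z method. It is not the authors' implementation: the function name stouffer_combine, the clipping parameter eps, and the assumption that the inputs are independent one-sided p-values (one per spot or probe set of the same gene) are all hypothetical choices made for this example.

```python
from math import sqrt
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1).
_STD_NORMAL = NormalDist()


def stouffer_combine(p_values, eps=1e-12):
    """Combine one-sided p-values into one p-value (unweighted Stouffer).

    `p_values` is assumed to hold the p-values of all spots/probe sets
    that map to a single gene; `eps` is an illustrative clipping bound.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    # Clip to the open interval (0, 1) so the inverse CDF is defined.
    clipped = [min(max(p, eps), 1.0 - eps) for p in p_values]
    # Convert each p-value to a z-score: a small p gives a large positive z.
    z_scores = [_STD_NORMAL.inv_cdf(1.0 - p) for p in clipped]
    # Sum of k independent standard normal scores, rescaled to unit variance.
    combined_z = sum(z_scores) / sqrt(len(z_scores))
    # Upper-tail probability of the combined z-score.
    return 1.0 - _STD_NORMAL.cdf(combined_z)


if __name__ == "__main__":
    # Three hypothetical probe-set p-values for one gene.
    print(stouffer_combine([0.04, 0.10, 0.03]))
```

Under these assumptions, several moderately small p-values for the same gene reinforce each other, whereas a single probe set passes through unchanged. A weighted variant (for example, weighting by probe-set quality) would multiply each z-score by its weight and divide by the square root of the sum of squared weights.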

...