1 - 20 of 27
1.
FEBS Lett ; 598(6): 635-657, 2024 Mar.
Article En | MEDLINE | ID: mdl-38366111

The response to proteotoxic stresses such as heat shock allows organisms to maintain protein homeostasis under changing environmental conditions. We asked what happens if an organism can no longer react to cytosolic proteotoxic stress. To test this, we deleted or depleted, either individually or in combination, the stress-responsive transcription factors Msn2, Msn4, and Hsf1 in Saccharomyces cerevisiae. Our study reveals a combination of survival strategies, which together protect essential proteins. Msn2 and Msn4 broadly reprogram transcription, triggering the response to oxidative stress, as well as biosynthesis of the protective sugar trehalose and glycolytic enzymes, while Hsf1 mainly induces the synthesis of molecular chaperones and reverses the transcriptional response upon prolonged mild heat stress (adaptation).


Saccharomyces cerevisiae Proteins , Transcription Factors , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism , Heat Shock Transcription Factors/genetics , Heat Shock Transcription Factors/metabolism , Heat-Shock Proteins/genetics , Heat-Shock Proteins/metabolism , Heat-Shock Response/genetics , Proteotoxic Stress , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/metabolism , Transcription Factors/metabolism
2.
Blood ; 141(6): 645-658, 2023 02 09.
Article En | MEDLINE | ID: mdl-36223592

The mechanisms of coordinated changes in proteome composition and their relevance for the differentiation of neutrophil granulocytes are not well studied. Here, we discover 2 novel human genetic defects in signal recognition particle receptor alpha (SRPRA) and SRP19, constituents of the mammalian cotranslational targeting machinery, and characterize their roles in neutrophil granulocyte differentiation. We systematically study the proteome of neutrophil granulocytes from patients with variants in the SRP genes, HAX1, and ELANE, and identify global as well as specific proteome aberrations. Using in vitro differentiation of human induced pluripotent stem cells and in vivo zebrafish models, we study the effects of SRP deficiency on neutrophil granulocyte development. In a heterologous cell-based inducible protein expression system, we validate the effects conferred by SRP dysfunction for selected proteins that we identified in our proteome screen. Thus, SRP-dependent protein processing, intracellular trafficking, and homeostasis are critically important for the differentiation of neutrophil granulocytes.


Induced Pluripotent Stem Cells , Proteome , Animals , Humans , Zebrafish , Human Genetics , Mammals , Adaptor Proteins, Signal Transducing
3.
Brief Bioinform ; 22(1): 545-556, 2021 01 18.
Article En | MEDLINE | ID: mdl-32026945

MOTIVATION: Although gene set enrichment analysis has become an integral part of high-throughput gene expression data analysis, the assessment of enrichment methods remains rudimentary and ad hoc. In the absence of suitable gold standards, evaluations are commonly restricted to selected datasets and biological reasoning on the relevance of resulting enriched gene sets. RESULTS: We develop an extensible framework for reproducible benchmarking of enrichment methods based on defined criteria for applicability, gene set prioritization and detection of relevant processes. This framework incorporates a curated compendium of 75 expression datasets investigating 42 human diseases. The compendium features microarray and RNA-seq measurements, and each dataset is associated with a precompiled GO/KEGG relevance ranking for the corresponding disease under investigation. We perform a comprehensive assessment of 10 major enrichment methods, identifying significant differences in runtime and applicability to RNA-seq data, fraction of enriched gene sets depending on the null hypothesis tested and recovery of the predefined relevance rankings. We make practical recommendations on how methods originally developed for microarray data can efficiently be applied to RNA-seq data, how to interpret results depending on the type of gene set test conducted and which methods are best suited to effectively prioritize gene sets with high phenotype relevance. AVAILABILITY: http://bioconductor.org/packages/GSEABenchmarkeR. CONTACT: ludwig.geistlinger@sph.cuny.edu.


Gene Expression Profiling/methods , Genomics/methods , RNA-Seq/methods , Animals , Benchmarking , Databases, Genetic/standards , Gene Expression Profiling/standards , Genomics/standards , Humans , RNA-Seq/standards , Software
4.
Cell Rep ; 29(13): 4593-4607.e8, 2019 12 24.
Article En | MEDLINE | ID: mdl-31875563

Life is resilient because living systems are able to respond to elevated temperatures with an ancient gene expression program called the heat shock response (HSR). In yeast, the transcription of hundreds of genes is upregulated at stress temperatures. Besides stress protection conferred by chaperones, the function of the majority of the upregulated genes under stress has remained enigmatic. We show that those genes are required to directly counterbalance increased protein turnover at stress temperatures and to maintain the metabolism. This anaplerotic reaction together with molecular chaperones allows yeast to efficiently buffer proteotoxic stress. When the capacity of this system is exhausted at extreme temperatures, aggregation processes stop translation and growth pauses. The emerging concept is that the HSR is modular with distinct programs dependent on the severity of the stress.


Heat-Shock Response , Molecular Chaperones/metabolism , Proteostasis , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/metabolism , Gene Expression Regulation, Fungal , Heat-Shock Response/genetics , Kinetics , Models, Genetic , Protein Aggregates , Protein Biosynthesis , Proteolysis , Proteome/metabolism , RNA, Messenger/genetics , RNA, Messenger/metabolism , Ribosomes/metabolism , Saccharomyces cerevisiae/genetics , Transcriptome/genetics
5.
Biotechnol Biofuels ; 12: 243, 2019.
Article En | MEDLINE | ID: mdl-31636702

BACKGROUND: One of the main obstacles preventing solventogenic clostridia from achieving higher yields in biofuel production is the toxicity of the produced solvents. Unfortunately, the regulatory mechanisms responsible for the shock response are poorly described at the transcriptomic level. Although the strain Clostridium beijerinckii NRRL B-598, a promising butanol producer, has been studied under different conditions in the past, its transcriptional response to a shock caused by butanol in the cultivation medium remains unknown. RESULTS: In this paper, we present the transcriptional response of the strain to a butanol challenge, induced by adding butanol to the cultivation medium at the very end of the acidogenic phase, measured by RNA-Seq. We resequenced and reassembled the genome sequence of the strain and prepared a novel genome and gene ontology annotation to provide the most accurate results. When compared to samples under standard cultivation conditions, samples gathered during butanol shock formed a clearly distinct group. Using reference samples gathered directly before the addition of butanol, we identified genes that were differentially expressed in butanol challenge samples. We determined clusters of 293 down-regulated and 301 up-regulated genes whose expression was affected by the cultivation conditions. The enriched term "RNA binding" among down-regulated genes corresponded to the downturn of translation, and the cluster contained a group of small acid-soluble spore proteins. This explained the phenotype of the culture, which had not sporulated. Up-regulated genes, on the other hand, were characterized by the term "protein binding", which corresponded to the activation of heat-shock proteins identified within this cluster. CONCLUSIONS: We described the overall transcriptional response of the strain C. beijerinckii NRRL B-598 to butanol shock, supplemented by auxiliary technologies, including high-pressure liquid chromatography and flow cytometry, to capture the corresponding phenotypic response. We identified genes whose regulation was affected by the addition of butanol to the cultivation medium and inferred related molecular functions that were significantly influenced. Additionally, using a high-quality genome assembly and custom-made gene ontology annotation, we demonstrated that this established terminology, widely used for the analysis of model organisms, can also be applied to non-model organisms and to research in the field of biofuels.

6.
Mol Cell Proteomics ; 18(9): 1880-1892, 2019 09.
Article En | MEDLINE | ID: mdl-31235637

Mass spectrometry-based proteomics is the method of choice for quantifying genome-wide differential changes of protein expression in a wide range of biological and biomedical applications. Protein expression changes need to be reliably derived from many measured peptide intensities and their corresponding peptide fold changes. These peptide fold changes vary considerably for a given protein. Numerous instrumental setups aim to reduce this variability, whereas current computational methods only implicitly account for this problem. We introduce a new method, MS-EmpiRe, which explicitly accounts for the noise underlying peptide fold changes. We derive data set-specific, intensity-dependent empirical error fold change distributions, which are used for individual weighting of peptide fold changes to detect differentially expressed proteins (DEPs). In a recently published proteome-wide benchmarking data set, MS-EmpiRe doubles the number of correctly identified DEPs at an estimated FDR cutoff compared with state-of-the-art tools. We additionally confirm the superior performance of MS-EmpiRe on simulated data. MS-EmpiRe requires only peptide intensities mapped to proteins and, thus, can be applied to any common quantitative proteomics setup. We apply our method to diverse MS data sets and observe consistent increases in sensitivity with more than 1000 additional significant proteins in deep data sets, including a clinical study over multiple patients. At the same time, we observe that even the proteins classified as least significant by other methods but as significant by MS-EmpiRe show very clear regulation at the peptide intensity level. MS-EmpiRe provides rapid processing (< 2 min for 6 LC-MS/MS runs (3 h gradients)) and is publicly available under github.com/zimmerlab/MS-EmpiRe with a manual including examples.


Mass Spectrometry/methods , Peptides/analysis , Proteome/analysis , Proteomics/methods , Software , Alzheimer Disease/metabolism , Benchmarking , Databases, Factual , Francisella/metabolism , Fungal Proteins/analysis , HeLa Cells , Humans , Parkinson Disease/metabolism , Plant Proteins/analysis , Reproducibility of Results , Signal-To-Noise Ratio
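The MS-EmpiRe abstract describes data set-specific, intensity-dependent empirical error distributions of peptide fold changes. The sketch below illustrates only that general idea and is not the published MS-EmpiRe algorithm: it assumes replicate log2 intensities per peptide from the same condition, bins peptides by mean intensity, builds a per-bin null distribution of replicate-vs-replicate log2 fold changes, and scores an observed fold change by its empirical tail probability. The function names, the equal-size binning and the pseudocount are hypothetical choices for illustration.

```python
from bisect import bisect_left
import math

def build_error_distributions(replicate_pairs, n_bins=3):
    """replicate_pairs: (log2 intensity in replicate A, log2 intensity in replicate B)
    per peptide, both from the SAME condition, so their difference reflects noise only.
    Returns (bin edges, per-bin sorted absolute noise log2 fold changes)."""
    # Sort peptides by mean intensity and cut them into bins of roughly equal size.
    items = sorted(replicate_pairs, key=lambda p: (p[0] + p[1]) / 2)
    size = math.ceil(len(items) / n_bins)
    bins = [items[i:i + size] for i in range(0, len(items), size)]
    edges, nulls = [], []
    for b in bins:
        nulls.append(sorted(abs(a - c) for a, c in b))
        edges.append(max((a + c) / 2 for a, c in b))  # highest mean intensity in bin
    return edges[:-1], nulls  # the last bin is open-ended

def empirical_p(log2_fc, mean_intensity, edges, nulls):
    """Empirical two-sided p-value: share of same-intensity noise fold changes
    at least as large (in absolute value) as the observed one."""
    null = nulls[bisect_left(edges, mean_intensity)]  # bin matching this intensity
    n_extreme = len(null) - bisect_left(null, abs(log2_fc))
    return (n_extreme + 1) / (len(null) + 1)  # +1 pseudocount avoids p = 0

pairs = [(10, 11), (10, 9), (10.2, 10.8), (20, 20.1), (20, 19.9), (20.1, 20.0)]
edges, nulls = build_error_distributions(pairs, n_bins=2)
print(empirical_p(0.5, 20.0, edges, nulls))  # 0.25: large relative to high-intensity noise
print(empirical_p(0.5, 10.0, edges, nulls))  # 1.0: within low-intensity noise
```

The same fold change of 0.5 is unremarkable for noisy low-intensity peptides but stands out for high-intensity peptides, which is the motivation for making the error model intensity-dependent.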
7.
Database (Oxford) ; 2019, 2019 01 01.
Article En | MEDLINE | ID: mdl-30821814

The stress response in the model organism Saccharomyces cerevisiae is a well-studied system for which many data sets are available. Already in 2000, it was discovered that yeast cells trigger a similar transcriptional response when different types of stress are applied. However, the exact regulatory mechanisms and the differences between the various types of stress are still not understood. Here, we present the Yeast Environmental Stress database (YESdb), a database containing all high-throughput experiments measuring various kinds of stress in yeast. The goal of the database is to allow the user to execute complex, integrative analyses of selected data sets, e.g. the comparison of measurements of the same stress using different platforms or differences between strains, stress strengths or types of stress. The analyses can be visualized in various ways and can be compiled into interactive reports to summarize and communicate the results. The data sets are available as differential conditions (typically stressed vs control), which are grouped into time or concentration series when an experiment includes measurements at multiple time points or concentrations. An annotation ontology has been constructed to annotate the data sets with the type, duration and strength of the applied stress, the strain and experimental platform used, and the publication date. These annotations can easily be combined to select all relevant data sets for an analysis. YESdb allows users to construct and execute Petri net-based workflows to perform predefined and custom analyses. For example, to compare two types of stress (e.g. salt vs oxidative stress), the corresponding data sets are selected from the database, the consistently changed genes are defined and combined, and the shared genes are characterized by enrichment analysis. A broad collection of visualizations is available, most of which are interactive. The results of all analyses can be summarized in an interactive report. Visualizations of individual steps (transitions) of YESdb workflows can be added to this report automatically, and customized visualizations as well as interpretive text can be added manually. Overall, YESdb aims to make all published data sets on yeast stress immediately available and comparable for integrated analysis of data sets and sets of genes in order to identify and assess hypotheses and mechanisms.


Databases, Factual , Environment , Saccharomyces cerevisiae/physiology , Stress, Physiological , Data Curation , Internet , User-Computer Interface
8.
Bioinformatics ; 35(18): 3412-3420, 2019 09 15.
Article En | MEDLINE | ID: mdl-30759193

MOTIVATION: Several gene expression-based risk scores and subtype classifiers for breast cancer have been developed to distinguish high- and low-risk patients. Evaluating the performance of these classifiers helps to decide which classifiers should be used in clinical practice for personal therapeutic recommendations. So far, studies that compared multiple classifiers in large independent patient cohorts mostly used microarray measurements. qPCR-based classifiers were not included in the comparison or had to be adapted to the different experimental platforms. RESULTS: We used a prospective study of 726 early breast cancer patients from seven certified German breast cancer centers. Patients were treated according to national guidelines, and the expression of 94 selected genes was measured with the mid-throughput qPCR platform Fluidigm. Clinical and pathological data, including five-year outcome, are available. Using these data, we compared the performance of six classifiers (scmgene and research versions of PAM50, ROR-S, recurrence score, EndoPredict and GGI). Consistent with other studies, we found similar or even higher concordance between most of the classifiers, and most were also able to differentiate high- and low-risk patients. The classifiers that were originally developed for microarray data still performed similarly on the Fluidigm data. Therefore, Fluidigm can be used to measure the gene expression values needed by several classifiers for a large cohort with little effort. In addition, we provide an interactive report of the results, which enables a transparent, in-depth comparison of classifiers and their prediction of individual patients. AVAILABILITY AND IMPLEMENTATION: https://services.bio.ifi.lmu.de/pia/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Breast Neoplasms , Humans , Neoplasm Recurrence, Local , Prospective Studies , Real-Time Polymerase Chain Reaction , Risk
9.
J Proteome Res ; 18(4): 1553-1566, 2019 04 05.
Article En | MEDLINE | ID: mdl-30793903

Spectral libraries play a central role in the analysis of data-independent-acquisition (DIA) proteomics experiments. A main assumption in current spectral library tools is that a single characteristic intensity pattern (CIP) suffices to describe the fragmentation of a peptide in a particular charge state (peptide charge pair). However, we find that this is often not the case. We carry out a systematic evaluation of spectral variability over public repositories and in-house data sets. We show that spectral variability is widespread and partly occurs under fixed experimental conditions. Using clustering of preprocessed spectra, we derive a limited number of multiple characteristic intensity patterns (MCIPs) for each peptide charge pair, which allow almost complete coverage of our heterogeneous data set without affecting the false discovery rate. We show that an MCIP library derived from public repositories performs in most cases similarly to a "custom-made" spectral library acquired under the same experimental conditions as the query spectra. We apply the MCIP approach to a DIA data set and observe a significant increase in peptide recognition. We propose the MCIP approach as an easy-to-implement addition to current spectral library search engines and as a new way to utilize the data stored in spectral repositories.


Chromatography, Liquid , Databases, Protein , Peptide Library , Proteomics/methods , Tandem Mass Spectrometry , Algorithms , Peptide Fragments/chemistry , Peptide Fragments/genetics
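The MCIP abstract describes clustering preprocessed spectra of one peptide charge pair into a limited number of characteristic intensity patterns. The abstract does not specify the preprocessing or clustering algorithm, so the following is only a hedged illustration of the idea: it assumes each spectrum is already a vector of fragment-ion intensities in a fixed ion order, greedily adds each spectrum to the first cluster whose centroid has cosine similarity of at least an assumed threshold of 0.9, and reports each cluster centroid as one pattern. The threshold and function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two intensity vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def centroid(members):
    """Element-wise mean of equal-length vectors."""
    return [sum(col) / len(members) for col in zip(*members)]

def derive_patterns(spectra, threshold=0.9):
    """spectra: equal-length fragment-intensity vectors for ONE peptide charge pair.
    Greedy single-pass clustering; returns (patterns, cluster sizes)."""
    clusters = []  # each cluster is a list of member spectra
    for s in spectra:
        for members in clusters:
            if cosine(s, centroid(members)) >= threshold:
                members.append(s)
                break
        else:  # no existing pattern is similar enough: start a new one
            clusters.append([s])
    return [centroid(m) for m in clusters], [len(m) for m in clusters]

spectra = [[1, 0, 0.5], [0.9, 0.1, 0.5], [0, 1, 0.2], [0.1, 0.9, 0.2]]
patterns, sizes = derive_patterns(spectra)
print(sizes)  # [2, 2]: two distinct fragmentation patterns
```

A single-pass greedy scheme depends on input order; it is used here only because it is short, and a production tool would likely use a more robust clustering method.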
10.
Nat Commun ; 9(1): 2645, 2018 07 06.
Article En | MEDLINE | ID: mdl-29980665

Blood flow at arterial bifurcations and curvatures is naturally disturbed. Endothelial cells (ECs) fail to adapt to disturbed flow, which transcriptionally directs ECs toward a maladapted phenotype, characterized by chronic regeneration of injured ECs. MicroRNAs (miRNAs) can regulate EC maladaptation through targeting of protein-coding RNAs. However, long noncoding RNAs (lncRNAs), known epigenetic regulators of biological processes, can also be miRNA targets, but their contribution to EC maladaptation is unclear. Here we show that hyperlipidemia- and oxLDL-induced upregulation of miR-103 inhibits EC proliferation and promotes endothelial DNA damage through targeting of novel lncWDR59. MiR-103 impedes lncWDR59 interaction with the Notch1 inhibitor Numb, thereby affecting Notch1-induced EC proliferation. Moreover, miR-103 increases the susceptibility of proliferating ECs to oxLDL-induced mitotic aberrations, characterized by increased micronucleus formation and DNA damage accumulation, by affecting Notch1-related ß-catenin co-activation. Collectively, these data indicate that miR-103 programs ECs toward a maladapted phenotype through targeting of lncWDR59, which may promote atherosclerosis.


Endothelial Cells/metabolism , MicroRNAs/metabolism , RNA, Long Noncoding/metabolism , Animals , Atherosclerosis/genetics , Atherosclerosis/pathology , Base Sequence , Cell Proliferation , DNA Damage , Gene Expression Regulation , HMGB Proteins/metabolism , Humans , Lipoproteins, LDL , Membrane Proteins/metabolism , Mice , MicroRNAs/genetics , Micronuclei, Chromosome-Defective , Nerve Tissue Proteins/metabolism , RNA, Long Noncoding/genetics , Receptors, Notch/metabolism , Ribonuclease III/metabolism , SOXF Transcription Factors/metabolism , Signal Transduction , beta Catenin/metabolism
11.
Bioinformatics ; 33(12): 1837-1844, 2017 Jun 15.
Article En | MEDLINE | ID: mdl-28165113

MOTIVATION: The goal of many genome-wide experiments is to explain the changes between the analyzed conditions. Typically, the analysis is started with a set of differential genes DG, and the first step is to identify the set of relevant biological processes BP. Current enrichment methods identify the involved biological process via statistically significant overrepresentation of differential genes in predefined sets, but do not further explain how the differential genes interact with each other or which other genes might be important for the enriched process. Other network-based methods determine subnetworks of interacting genes containing many differential genes, but do not employ process knowledge for a more focused analysis. RESULTS: RelExplain is a method to analyze a given biological process bp (e.g. identified by enrichment) in more detail by computing an explanation using the measured DG and a given network. An explanation is a subnetwork that contains the differential genes in the process bp and connects them in the best way given the experimental data, also using genes that are not differential or not in bp. RelExplain takes into account the functional annotations of nodes and the edge consistency of the measurements. Explanations are compact networks of the relevant part of the bp and additional nodes that might be important for the bp. Our evaluation showed that RelExplain is better suited to retrieve manually curated subnetworks from unspecific networks than other algorithms. The interactive RelExplain tool allows users to compute and inspect sub-optimal and alternative optimal explanations. AVAILABILITY AND IMPLEMENTATION: A webserver is available at https://services.bio.ifi.lmu.de/relexplain . CONTACT: berchtold@bio.ifi.lmu.de. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Computational Biology/methods , Metabolic Networks and Pathways , Software , Algorithms , Biological Phenomena , Breast Neoplasms/metabolism , Humans , Molecular Sequence Annotation/methods
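The RelExplain abstract refers to enrichment methods that detect statistically significant overrepresentation of differential genes in predefined sets. A common way to quantify this, shown here as a generic sketch and not as RelExplain's own procedure, is the one-sided hypergeometric test: with N measured genes, n of them differential, and a process set of K genes containing k differential genes, the p-value is the probability of drawing k or more set members by chance. The function name is hypothetical.

```python
from math import comb

def overrepresentation_p(N, K, n, k):
    """One-sided hypergeometric p-value P(X >= k).
    N: genes measured, K: genes in the process set,
    n: differential genes overall, k: differential genes inside the set."""
    total = comb(N, n)  # all ways to choose n differential genes from N
    # comb() returns 0 when the lower argument exceeds the upper, so impossible terms vanish
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / total

# All 5 differential genes out of 20 fall into a 5-gene set: probability 1/C(20, 5)
print(overrepresentation_p(20, 5, 5, 5) == 1 / 15504)  # True
```

Summing the exact hypergeometric terms with integer arithmetic keeps the example dependency-free; for large gene universes a library routine such as a hypergeometric survival function is usually preferable.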
12.
PLoS One ; 11(10): e0164513, 2016.
Article En | MEDLINE | ID: mdl-27723775

Several methods predict activity changes of transcription factors (TFs) from a given regulatory network and measured expression data. But available gene regulatory networks are incomplete and contain many condition-dependent regulations that are not relevant for the specific expression measurement. It is not known which combination of active TFs is needed to cause a change in the expression of a target gene. A method to systematically evaluate the inferred activity changes is missing. We present such an evaluation strategy that indicates for how many target genes the observed expression changes can be explained by a given set of active TFs. To overcome the problem that the exact combination of active TFs needed to activate a gene is typically not known, we consider a gene to be explained if there exists any combination for which the predicted active TFs can possibly explain the observed change of the gene. We introduce the i-score (inconsistency score), which quantifies how many genes could not be explained by the set of activity changes of TFs. We observe that, even for these minimal requirements, published methods yield many unexplained target genes, i.e. large i-scores. This holds for all methods and all expression datasets we evaluated. We provide new optimization methods to calculate the best possible (minimal) i-score given the network and measured expression data. The evaluation of this optimized i-score on a large data compendium yields many unexplained target genes for almost every case. This indicates that currently available regulatory networks are still far from complete. Both the presented Act-SAT and Act-A* methods produce optimal sets of TF activity changes, which can be used to investigate the difficult interplay of expression and network data. A web server and a command line tool to calculate our i-score and to find the active TFs associated with the minimal i-score are available from https://services.bio.ifi.lmu.de/i-score.


Databases, Genetic , Gene Expression Profiling/methods , Gene Expression Regulation , Models, Genetic , Transcription Factors/metabolism , Animals , Humans , Transcription Factors/genetics
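The i-score counts target genes whose observed expression change cannot be explained by the predicted TF activity changes, where a gene counts as explained if any combination of its regulators could account for the change. The sketch below is a deliberately simplified reading of that criterion, not the published definition or the Act-SAT/Act-A* optimizers: it assumes a signed network (+1 activation, -1 repression), predicted TF activity changes in {-1, 0, +1}, observed gene changes in {-1, +1}, and treats a gene as explained if at least one regulator's activity change multiplied by its edge sign matches the observed direction. The data layout and function name are hypothetical.

```python
def i_score(network, tf_changes, gene_changes):
    """network: dict target -> list of (tf, sign); sign +1 activates, -1 represses.
    tf_changes: dict tf -> -1/0/+1 predicted activity change (missing = 0).
    gene_changes: dict target -> -1/+1 observed expression change.
    Returns the number of changed targets that no regulator can explain."""
    unexplained = 0
    for gene, observed in gene_changes.items():
        # Simplifying assumption: one consistent regulator suffices to explain the gene.
        explained = any(
            tf_changes.get(tf, 0) * sign == observed
            for tf, sign in network.get(gene, [])
        )
        if not explained:
            unexplained += 1
    return unexplained

network = {"g1": [("A", 1)], "g2": [("A", -1), ("B", 1)], "g3": [("B", 1)], "g4": []}
observed = {"g1": 1, "g2": 1, "g3": 1, "g4": -1}
print(i_score(network, {"A": 1, "B": 0}, observed))  # 3
print(i_score(network, {"A": 1, "B": 1}, observed))  # 1: g4 has no regulator at all
```

As the second call shows, a target without any regulator in the network can never be explained, which mirrors the abstract's point that incomplete networks keep the minimal i-score above zero.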
13.
Circ Res ; 119(9): 1030-1038, 2016 Oct 14.
Article En | MEDLINE | ID: mdl-27531933

RATIONALE: Atheroprogression is a consequence of nonresolved inflammation, and currently a comprehensive overview of the mechanisms preventing resolution is missing. However, in acute inflammation, resolution is known to be orchestrated by a switch from inflammatory to resolving lipid mediators. Therefore, we hypothesized that lesional lipid mediator imbalance favors atheroprogression. OBJECTIVE: To understand the lipid mediator balance during atheroprogression and to establish an interventional strategy based on the delivery of resolving lipid mediators. METHODS AND RESULTS: Lipid mediator profiling of aortas from Apoe-/- mice fed a high-fat diet for 4 weeks, 8 weeks, or 4 months revealed an expansion of the inflammatory lipid mediators Leukotriene B4 and Prostaglandin E2 and a concomitant decrease of the resolving lipid mediators Resolvin D2 (RvD2) and Maresin 1 (MaR1) during advanced atherosclerosis. Functionally, aortic Leukotriene B4 and Prostaglandin E2 levels correlated with traits of plaque instability, whereas RvD2 and MaR1 levels correlated with signs of plaque stability. In a therapeutic context, repeated RvD2 and MaR1 delivery prevented atheroprogression, as characterized by halted necrotic core expansion and macrophage accumulation along with increased fibrous cap thickness and smooth muscle cell numbers. Mechanistically, RvD2 and MaR1 induced a shift in macrophage profile toward a reparative phenotype, which secondarily stimulated collagen synthesis in smooth muscle cells. CONCLUSIONS: We present evidence for the imbalance between inflammatory and resolving lipid mediators during atheroprogression. Delivery of RvD2 and MaR1 successfully prevented atheroprogression, suggesting that resolving lipid mediators potentially represent an innovative strategy to resolve arterial inflammation.


Atherosclerosis/metabolism , Atherosclerosis/prevention & control , Docosahexaenoic Acids/metabolism , Inflammation Mediators/metabolism , Lipid Metabolism/physiology , Animals , Atherosclerosis/etiology , Cells, Cultured , Diet, High-Fat/adverse effects , Disease Progression , Docosahexaenoic Acids/administration & dosage , Drug Delivery Systems/methods , Lipid Metabolism/drug effects , Mice , Mice, Inbred C57BL , Mice, Knockout
14.
J Mol Biol ; 428(8): 1544-57, 2016 Apr 24.
Article En | MEDLINE | ID: mdl-26953259

Alternative splicing often affects structured and highly conserved regions of proteins, generating so-called non-trivial splicing variants of unknown structure and cellular function. The human small G-protein Rab1A is involved in the regulation of vesicle transfer from the ER to the Golgi. A conserved non-trivial splice variant lacks nearly 40% of the sequence of native Rab1A, including most of the regulatory interaction sites. We show that this variant of Rab1A represents a stable and folded protein, which is still able to bind nucleotides and co-localizes with membranes. Nevertheless, compared to wild-type Rab GTPases, the measured nucleotide binding affinities are dramatically reduced in the variant studied. Furthermore, the Rab1A variant forms hetero-dimers with wild-type Rab1A, and its presence in the cell enhances the efficiency of alkaline phosphatase secretion. However, this variant shows no specificity for GXP nucleotides, a constantly enhanced GTP hydrolysis activity and is no longer controlled by GEF or GAP proteins, indicating a new regulatory mechanism for the Rab1A cycle via alternative non-trivial splicing.


rab1 GTP-Binding Proteins/chemistry , Alternative Splicing , Cell Membrane/metabolism , Evolution, Molecular , Guanosine Diphosphate/chemistry , Guanosine Triphosphate/chemistry , Humans , Hydrolysis , Nucleotides/chemistry , Protein Binding , Protein Folding , Protein Isoforms/chemistry , Protein Multimerization , Protein Structure, Tertiary , Proteome , rab GTP-Binding Proteins/chemistry
15.
BMC Bioinformatics ; 17: 45, 2016 Jan 20.
Article En | MEDLINE | ID: mdl-26791995

BACKGROUND: Enrichment analysis of gene expression data is essential to find functional groups of genes whose interplay can explain experimental observations. Numerous methods have been published that either ignore (set-based) or incorporate (network-based) known interactions between genes. However, the often subtle benefits and disadvantages of the individual methods are confusing for most biological end users, and there is currently no convenient way to combine methods for an enhanced result interpretation. RESULTS: We present the EnrichmentBrowser package as an easily applicable software that enables (1) the application of the most frequently used set-based and network-based enrichment methods, (2) their straightforward combination, and (3) a detailed and interactive visualization and exploration of the results. The package is available from the Bioconductor repository and implements additional support for standardized expression data preprocessing, differential expression analysis, and definition of suitable input gene sets and networks. CONCLUSION: The EnrichmentBrowser package implements essential functionality for the enrichment analysis of gene expression data. It combines the advantages of set-based and network-based enrichment analysis in order to derive high-confidence gene sets and biological pathways that are differentially regulated in the expression data under investigation. In addition, the package facilitates the visualization and exploration of such sets and pathways.


Gene Regulatory Networks , Microarray Analysis/methods , Software , Databases, Factual , Gene Expression Profiling , Sequence Analysis, RNA
16.
PLoS One ; 10(10): e0140487, 2015.
Article En | MEDLINE | ID: mdl-26469855

mRNA splicing is required in about 4% of protein coding genes in Saccharomyces cerevisiae. The gene structure of those genes is simple, generally comprising two exons and one intron. In order to characterize the impact of alternative splicing on the S. cerevisiae transcriptome, we perform a systematic analysis of mRNA sequencing data. We find evidence of a pervasive use of alternative splice sites and detect several novel introns both within and outside protein coding regions. We also find a predominance of alternative splicing on the 3' side of introns, a finding which is consistent with existing knowledge on conservation of exon-intron boundaries in S. cerevisiae. Some of the alternatively spliced transcripts allow for a translation into different protein products.


Alternative Splicing , Saccharomyces cerevisiae/genetics , Transcriptome , High-Throughput Nucleotide Sequencing , Introns , Sequence Analysis, RNA
17.
BMC Bioinformatics ; 16: 122, 2015 Apr 17.
Article En | MEDLINE | ID: mdl-25928589

BACKGROUND: Mapping of short sequencing reads is a crucial step in the analysis of RNA sequencing (RNA-seq) data. ContextMap is an RNA-seq mapping algorithm that uses a context-based approach to identify the best alignment for each read and allows parallel mapping against several reference genomes. RESULTS: In this article, we present ContextMap 2, a new and improved version of ContextMap. Its key novel features are: (i) a plug-in structure that allows easily integrating novel short read alignment programs with improved accuracy and runtime; (ii) context-based identification of insertions and deletions (indels); (iii) mapping of reads spanning an arbitrary number of exons and indels. ContextMap 2 using Bowtie, Bowtie 2 or BWA was evaluated on both simulated and real-life data from the recently published RGASP study. CONCLUSIONS: We show that ContextMap 2 generally combines similar or higher recall compared to other state-of-the-art approaches with significantly higher precision in read placement and junction and indel prediction. Furthermore, runtime was significantly lower than for the best competing approaches. ContextMap 2 is freely available at http://www.bio.ifi.lmu.de/ContextMap .


Algorithms , Genome, Human , High-Throughput Nucleotide Sequencing/methods , RNA, Messenger/genetics , Sequence Analysis, RNA/methods , Exons/genetics , Humans , INDEL Mutation/genetics , Transcriptome
18.
PLoS One ; 8(9): e73071, 2013.
Article En | MEDLINE | ID: mdl-24019895

RNA sequencing (RNA-seq) provides novel opportunities for transcriptomic studies at nucleotide resolution, including transcriptomics of viruses or microbes infecting a cell. However, standard approaches for mapping the resulting sequencing reads generally ignore sources of expression other than the host cell and are poorly equipped to address the problems arising from redundancies and gaps among sequenced microbe and virus genomes. We show that screening of sequencing reads for contaminations and infections can be performed easily using ContextMap, our recently developed mapping software. Based on mapping-derived statistics, mapping confidence, similarities and misidentifications (e.g. due to missing genome sequences) of species/strains can be assessed. The performance of our approach is evaluated on three real-life sequencing data sets and compared to state-of-the-art metagenomics tools. In particular, ContextMap vastly outperformed GASiC and GRAMMy in terms of runtime. In contrast to MEGAN4, it was capable of providing individual read mappings to species and resolving non-unique mappings, thus allowing the identification of misalignments caused by sequence similarities between genomes and missing genome sequences. Our study illustrates the importance and potential of routinely mining RNA-seq experiments for infections or contaminations by microbes and viruses. By using ContextMap, gene expression of infecting agents can be analyzed and novel insights into infection processes and tumorigenesis can be obtained.


Data Mining , Infections/genetics , Sequence Analysis, RNA , Colorectal Neoplasms/genetics , Colorectal Neoplasms/microbiology , HeLa Cells , Humans , Microbiota
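One simple kind of mapping-derived statistic mentioned above can be sketched as follows, assuming each read has already been mapped to a set of candidate species. The function name, input format and species labels are hypothetical and the data are invented for illustration; this is not ContextMap's actual evaluation. Reads hitting a single species are counted as unique support; reads hitting several species are tallied per species pair. Many shared reads combined with few unique reads for one species could hint at a misidentification caused by sequence similarity.

```python
from collections import Counter
from itertools import combinations


def species_statistics(read_hits):
    """Hypothetical per-species mapping statistics.

    read_hits: read_id -> set of species the read aligns to.
    Returns (unique, shared): reads unique to each species, and reads
    shared by each unordered pair of species.
    """
    unique = Counter()
    shared = Counter()
    for species in read_hits.values():
        if len(species) == 1:
            unique[next(iter(species))] += 1
        else:
            # Sort so that each pair is always counted under the same key.
            for a, b in combinations(sorted(species), 2):
                shared[(a, b)] += 1
    return unique, shared


# Invented example input, not data from the study.
hits = {
    "r1": {"virus_A"},
    "r2": {"virus_A"},
    "r3": {"virus_A", "virus_B"},
    "r4": {"bacterium_C"},
}
unique, shared = species_statistics(hits)
print(unique)
print(shared)
```

Here `virus_B` receives no unique reads and is supported only by a read shared with `virus_A`, which is the kind of pattern such statistics are meant to flag.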
19.
Nucleic Acids Res ; 41(18): 8452-63, 2013 Oct.
Article En | MEDLINE | ID: mdl-23873954

Existing machine-readable resources for large-scale gene regulatory networks usually do not provide context information characterizing the activating conditions of a regulation and how the targeted genes are affected. Although this information is essential for data interpretation, available networks are often restricted to condition-independent, non-quantitative, plain binary interactions as derived from high-throughput screens. In this article, we present a comprehensive Petri net-based regulatory network that controls the diauxic shift in Saccharomyces cerevisiae. For 100 specific enzymatic genes, we collected regulations from public databases and identified and manually curated >400 relevant scientific articles. The resulting network consists of >300 multi-input regulatory interactions providing (i) activating conditions for the regulators; (ii) semi-quantitative effects on their targets; and (iii) classification of the experimental evidence. The diauxic shift network compiles widely distributed regulatory information and is available in an easy-to-use machine-readable form. Additionally, we developed a browsable system organizing the network into pathway maps, which allows users to inspect and trace the evidence for each annotated regulation in the model.


Gene Expression Regulation, Fungal , Gene Regulatory Networks , Saccharomyces cerevisiae/genetics , Citric Acid Cycle/genetics , Fatty Acids/metabolism , Gluconeogenesis/genetics , Models, Genetic , Phosphoenolpyruvate Carboxykinase (ATP)/genetics , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/genetics
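A multi-input regulatory interaction carrying activating conditions, a semi-quantitative effect and an evidence class, as described above, could be encoded in a Petri-net style roughly as follows. This is a simplified sketch under stated assumptions, not the published network's format: conditions are modelled as read (test) arcs that must all hold but are not consumed, the target's expression level is a token count, and all class names, place names and the example regulation are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Regulation:
    """One multi-input regulatory interaction (hypothetical encoding)."""
    name: str
    conditions: dict[str, int]  # read arcs: place -> minimum tokens, not consumed
    target: str                 # place holding the target gene's expression level
    effect: int                 # semi-quantitative change, e.g. +2 strong up, -1 weak down
    evidence: str = "unclassified"


@dataclass
class RegulatoryNet:
    marking: dict[str, int] = field(default_factory=dict)

    def enabled(self, reg: Regulation) -> bool:
        # All activating conditions must hold at the same time (logical AND).
        return all(self.marking.get(p, 0) >= n for p, n in reg.conditions.items())

    def fire(self, reg: Regulation) -> None:
        if not self.enabled(reg):
            raise ValueError(f"{reg.name} is not enabled")
        level = self.marking.get(reg.target, 0) + reg.effect
        # Simplification: expression levels are clamped at zero.
        self.marking[reg.target] = max(level, 0)


# Hypothetical regulation: activator_A induces gene_Y only when glucose is low.
net = RegulatoryNet({"glucose_low": 1, "activator_A": 1, "gene_Y": 0})
reg = Regulation(
    "activator_A induces gene_Y under low glucose",
    conditions={"glucose_low": 1, "activator_A": 1},
    target="gene_Y",
    effect=+2,
    evidence="hypothetical",
)
net.fire(reg)
print(net.marking)
```

Treating conditions as read arcs keeps regulators and environmental states unchanged when a regulation fires, which matches the idea of conditions that enable, rather than get used up by, a regulatory effect.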
20.
BMC Bioinformatics ; 13 Suppl 6: S9, 2012 Apr 19.
Article En | MEDLINE | ID: mdl-22537048

BACKGROUND: Sequencing of mRNA (RNA-seq) by next-generation sequencing technologies is widely used for analyzing the transcriptomic state of a cell. Here, one of the main challenges is the mapping of a sequenced read to its transcriptomic origin. As a simple alignment to the genome will fail to identify reads crossing splice junctions and a transcriptome alignment will miss novel splice sites, several approaches have been developed for this purpose. Most of these approaches have two drawbacks. First, each read is assigned to a location independently of whether the corresponding gene is expressed or not, i.e. information from other reads is not taken into account. Second, in case of multiple possible mappings, the mapping with the fewest mismatches is usually chosen, which may lead to wrong assignments due to sequencing errors. RESULTS: To address these problems, we developed ContextMap, which efficiently uses information on the context of a read, i.e. reads mapping to the same expressed region. The context information is used to resolve possible ambiguities and, thus, a much higher degree of ambiguity can be allowed in the initial stage in order to detect all possible candidate positions. Although ContextMap can be used as a stand-alone tool using either a genome or transcriptome as input, the version presented in this article is focused on refining initial mappings provided by other mapping algorithms. Evaluation results on simulated sequencing reads showed that the application of ContextMap to either TopHat or MapSplice mappings considerably improved the mapping accuracy of both initial mappings. CONCLUSIONS: In this article, we show that the context of reads mapping to nearby locations provides valuable information for identifying the best unique mapping for a read. Using our method, mappings provided by other state-of-the-art methods can be refined and alignment accuracy can be further improved. AVAILABILITY: http://www.bio.ifi.lmu.de/ContextMap.


Algorithms , RNA, Messenger/genetics , Sequence Analysis, RNA/methods , Animals , Genome , Humans , Mice , RNA Splicing , Transcriptome
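The core idea above, using reads that map nearby to pick among several candidate positions instead of simply taking the one with the fewest mismatches, can be sketched as follows. This is a deliberately simplified assumption, not ContextMap's actual scoring: "context" is taken to be the number of uniquely mapped reads within a fixed window around a candidate, with mismatches used only as a tie-breaker. The window size and all function names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def build_context(unique_hits):
    """Index start positions of uniquely mapped reads per chromosome.

    unique_hits: list of (chrom, pos) for reads with exactly one candidate.
    """
    index = defaultdict(list)
    for chrom, pos in unique_hits:
        index[chrom].append(pos)
    for positions in index.values():
        positions.sort()  # sorted lists allow binary-search window counts
    return index


def context_support(index, chrom, pos, window):
    """Count uniquely mapped reads starting within +/- window of pos."""
    positions = index.get(chrom, [])
    return bisect_right(positions, pos + window) - bisect_left(positions, pos - window)


def resolve(candidates, index, window=1000):
    """Pick the candidate with the most context support.

    candidates: list of (chrom, pos, mismatches) for one ambiguous read.
    Ties in support are broken by fewer mismatches.
    """
    return max(
        candidates,
        key=lambda c: (context_support(index, c[0], c[1], window), -c[2]),
    )


# Invented toy example: three unique reads support a region on chr1.
index = build_context([("chr1", 5000), ("chr1", 5200), ("chr1", 5400)])
cands = [("chr1", 5100, 2), ("chr7", 90000, 1)]
print(resolve(cands, index))             # context-based choice
print(min(cands, key=lambda c: c[2]))    # fewest-mismatch choice
```

In this toy case the fewest-mismatch rule picks the chr7 position, whereas the context-based rule picks the chr1 position inside a region supported by other reads, which is the kind of sequencing-error-driven misassignment the abstract describes.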
...