Results 1 - 20 of 23
1.
Nat Methods ; 20(11): 1810-1821, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37783886

ABSTRACT

The lack of benchmark data sets with inbuilt ground truth makes it challenging to compare the performance of existing long-read isoform detection and differential expression analysis workflows. Here, we present a benchmark experiment using two human lung adenocarcinoma cell lines that were each profiled in triplicate together with synthetic, spliced, spike-in RNAs (sequins). Samples were deeply sequenced on both Illumina short-read and Oxford Nanopore Technologies long-read platforms. Alongside the ground truth available via the sequins, we created in silico mixture samples to allow performance assessment in the absence of true positives or true negatives. Our results show that StringTie2 and bambu outperformed the other tools among the six isoform detection tools tested; DESeq2, edgeR and limma-voom were best among the five differential transcript expression tools tested; and there was no clear front-runner for differential transcript usage analysis among the five tools compared, which suggests further methods development is needed for this application.


Subject(s)
Gene Expression Profiling , High-Throughput Nucleotide Sequencing , Humans , Gene Expression Profiling/methods , High-Throughput Nucleotide Sequencing/methods , Benchmarking/methods , RNA , Protein Isoforms
2.
Development ; 148(13), 2021 07 01.
Article in English | MEDLINE | ID: mdl-34121118

ABSTRACT

Development of a branching tree in the embryonic lung is crucial for the formation of a fully mature functional lung at birth. Sox9+ cells present at the tip of the primary embryonic lung endoderm are multipotent cells responsible for branch formation and elongation. We performed a genetic screen in murine primary cells and identified aurora kinase b (Aurkb) as an essential regulator of Sox9+ cells ex vivo. In vivo conditional knockout studies confirmed that Aurkb was required for lung development but was not necessary for postnatal growth and the repair of the adult lung after injury. Deletion of Aurkb in embryonic Sox9+ cells led to the formation of a stunted lung that retained the expression of Sox2 in the proximal airways, as well as Sox9 in the distal tips. Although we found no change in cell polarity, we showed that loss of Aurkb or chemical inhibition of Aurkb caused Sox9+ cells to arrest at G2/M, likely responsible for the lack of branch bifurcation. This work demonstrates the power of genetic screens in identifying novel regulators of Sox9+ progenitor cells and lung branching morphogenesis.


Subject(s)
Aurora Kinase B/genetics , Aurora Kinase B/metabolism , Embryonic Stem Cells/metabolism , Endoderm/metabolism , Lung/embryology , SOX9 Transcription Factor/metabolism , Animals , Gene Expression Regulation, Developmental , Mice , Mice, Knockout , Organogenesis , SOX9 Transcription Factor/genetics
3.
FASEB J ; 35(3): e21320, 2021 03.
Article in English | MEDLINE | ID: mdl-33660333

ABSTRACT

Influenza A virus (IAV) is rapidly detected in the airways by the immune system, with resident parenchymal cells and leukocytes orchestrating viral sensing and the induction of antiviral inflammatory responses. The airways are innervated by heterogeneous populations of vagal sensory neurons which also play an important role in pulmonary defense. How these neurons respond to IAV respiratory infection remains unclear. Here, we use a murine model to provide the first evidence that vagal sensory neurons undergo significant transcriptional changes following a respiratory IAV infection. RNA sequencing on vagal sensory ganglia showed that IAV infection induced the expression of many genes associated with an antiviral and pro-inflammatory response, and this was accompanied by a significant increase in inflammatory cell recruitment into the vagal ganglia. Assessment of gene expression in single vagal sensory neurons confirmed that IAV infection induced a neuronal inflammatory phenotype, which was most prominent in bronchopulmonary neurons and also evident in some neurons innervating other organs. The altered transcriptome could be mimicked by intranasal treatment with cytokines and the lung homogenates of infected mice, in the absence of infectious virus. These data argue that IAV pulmonary infection and subsequent inflammation induce vagal sensory ganglia neuroinflammation, and this may have important implications for IAV-induced morbidity.


Subject(s)
Inflammation/immunology , Influenza A virus , Lung/innervation , Orthomyxoviridae Infections/immunology , Sensory Receptor Cells/immunology , Vagus Nerve/immunology , Animals , Female , Lung/virology , Male , Mice , Mice, Inbred C57BL , Sensory Receptor Cells/metabolism , Transcription, Genetic , Vagus Nerve/metabolism
4.
Nucleic Acids Res ; 45(5): e30, 2017 03 17.
Article in English | MEDLINE | ID: mdl-27899618

ABSTRACT

Carefully designed control experiments provide a gold standard for benchmarking different genomics research tools. A shortcoming of many gene expression control studies is that replication involves profiling the same reference RNA sample multiple times. This leads to low, pure technical noise that is atypical of regular studies. To achieve a more realistic noise structure, we generated an RNA-sequencing mixture experiment using two cell lines of the same cancer type. Variability was added by extracting RNA from independent cell cultures and degrading particular samples. The systematic gene expression changes induced by this design allowed benchmarking of different library preparation kits (standard poly-A versus total RNA with Ribozero depletion) and analysis pipelines. Data generated using the total RNA kit had more signal for introns and various RNA classes (ncRNA, snRNA, snoRNA) and less variability after degradation. For differential expression analysis, voom with quality weights marginally outperformed other popular methods, while for differential splicing, DEXSeq was simultaneously the most sensitive and the most inconsistent method. For sample deconvolution analysis, DeMix outperformed IsoPure convincingly. Our RNA-sequencing data set provides a valuable resource for benchmarking different protocols and data pre-processing workflows. The extra noise mimics routine lab experiments more closely, ensuring any conclusions are widely applicable.


Subject(s)
Benchmarking , RNA/analysis , Sequence Analysis, RNA/standards , Cell Line, Tumor , Epithelial Cells/cytology , Epithelial Cells/metabolism , Gene Library , Genomics/instrumentation , Genomics/methods , Humans , RNA/classification , RNA/genetics , RNA Cleavage , Reagent Kits, Diagnostic/standards , Reference Standards , Sequence Analysis, RNA/methods , Sequence Analysis, RNA/statistics & numerical data
5.
Bioinformatics ; 33(13): 2050-2052, 2017 Jul 01.
Article in English | MEDLINE | ID: mdl-28203714

ABSTRACT

MOTIVATION: Graphics for RNA-sequencing and microarray gene expression analyses may contain upwards of tens of thousands of points. Details about certain genes or samples of interest are easily obscured in such dense summary displays. Incorporating interactivity into summary plots would enable additional information to be displayed on demand and facilitate intuitive data exploration. RESULTS: The open-source Glimma package creates interactive graphics for exploring gene expression analysis with a few simple R commands. It extends popular plots found in the limma package, such as multi-dimensional scaling plots and mean-difference plots, to allow individual data points to be queried and additional annotation information to be displayed upon hovering or selecting particular points. It also offers links between plots so that more information can be revealed on demand. Glimma is widely applicable, supporting data analyses from a number of well-established Bioconductor workflows (limma, edgeR and DESeq2) and uses D3/JavaScript to produce HTML pages with interactive displays that enable more effective data exploration by end-users. Results from Glimma can be easily shared between bioinformaticians and biologists, enhancing reporting capabilities while maintaining reproducibility. AVAILABILITY AND IMPLEMENTATION: The Glimma R package is available from http://bioconductor.org/packages/Glimma/. CONTACT: su.s@wehi.edu.au, law@wehi.edu.au or mritchie@wehi.edu.au.


Subject(s)
Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Software , Animals , Mice
6.
Blood ; 127(11): 1438-48, 2016 Mar 17.
Article in English | MEDLINE | ID: mdl-26729899

ABSTRACT

Aberrant expression of the oncogenic transcription factor forkhead box protein 1 (FOXP1) is a common feature of diffuse large B-cell lymphoma (DLBCL). We have combined chromatin immunoprecipitation and gene expression profiling after FOXP1 depletion with functional screening to identify targets of FOXP1 contributing to tumor cell survival. We find that the sphingosine-1-phosphate receptor 2 (S1PR2) is repressed by FOXP1 in activated B-cell (ABC) and germinal center B-cell (GCB) DLBCL cell lines with aberrantly high FOXP1 levels; S1PR2 expression is further inversely correlated with FOXP1 expression in 3 patient cohorts. Ectopic expression of wild-type S1PR2, but not a point mutant incapable of activating downstream signaling pathways, induces apoptosis in DLBCL cells and restricts tumor growth in subcutaneous and orthotopic models of the disease. The proapoptotic effects of S1PR2 are phenocopied by ectopic expression of the small G protein Gα13 but are independent of AKT signaling. We further show that low S1PR2 expression is a strong negative prognosticator of patient survival, alone and especially in combination with high FOXP1 expression. The S1PR2 locus has previously been demonstrated to be recurrently mutated in GCB DLBCL; the transcriptional silencing of S1PR2 by FOXP1 represents an alternative mechanism leading to inactivation of this important hematopoietic tumor suppressor.


Subject(s)
Forkhead Transcription Factors/physiology , Lymphoma, Large B-Cell, Diffuse/pathology , Neoplasm Proteins/physiology , Receptors, Lysosphingolipid/physiology , Repressor Proteins/physiology , Signal Transduction/physiology , Animals , Apoptosis/physiology , Cell Line, Tumor , Chromatin Immunoprecipitation , Forkhead Transcription Factors/genetics , GTP-Binding Protein alpha Subunits, G12-G13/biosynthesis , GTP-Binding Protein alpha Subunits, G12-G13/genetics , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Germinal Center/pathology , Heterografts , Humans , Kaplan-Meier Estimate , Lymphoma, Large B-Cell, Diffuse/classification , Lymphoma, Large B-Cell, Diffuse/genetics , Lymphoma, Large B-Cell, Diffuse/mortality , Mice , Neoplasm Transplantation , Prognosis , Proto-Oncogene Proteins c-akt/analysis , RNA Interference , RNA, Small Interfering/genetics , Receptors, Lysosphingolipid/biosynthesis , Receptors, Lysosphingolipid/deficiency , Receptors, Lysosphingolipid/genetics , Repressor Proteins/genetics , Sphingosine-1-Phosphate Receptors
7.
Nucleic Acids Res ; 43(7): e47, 2015 Apr 20.
Article in English | MEDLINE | ID: mdl-25605792

ABSTRACT

limma is an R/Bioconductor software package that provides an integrated solution for analysing data from gene expression experiments. It contains rich features for handling complex experimental designs and for information borrowing to overcome the problem of small sample sizes. Over the past decade, limma has been a popular choice for gene discovery through differential expression analyses of microarray and high-throughput PCR data. The package contains particularly strong facilities for reading, normalizing and exploring such data. Recently, the capabilities of limma have been significantly expanded in two important directions. First, the package can now perform both differential expression and differential splicing analyses of RNA sequencing (RNA-seq) data. All the downstream analysis tools previously restricted to microarray data are now available for RNA-seq as well. These capabilities allow users to analyse both RNA-seq and microarray data with very similar pipelines. Second, the package is now able to go past the traditional gene-wise expression analyses in a variety of ways, analysing expression profiles in terms of co-regulated sets of genes or in terms of higher-order expression signatures. This provides enhanced possibilities for biological interpretation of gene expression differences. This article reviews the philosophy and design of the limma package, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.


Subject(s)
Gene Expression Regulation , Oligonucleotide Array Sequence Analysis , Sequence Analysis, RNA , Software
8.
Genome Biol ; 24(1): 107, 2023 05 05.
Article in English | MEDLINE | ID: mdl-37147723

ABSTRACT

Group heteroscedasticity is commonly observed in pseudo-bulk single-cell RNA-seq datasets and its presence can hamper the detection of differentially expressed genes. Since most bulk RNA-seq methods assume equal group variances, we introduce two new approaches that account for heteroscedastic groups, namely voomByGroup and voomWithQualityWeights using a blocked design (voomQWB). Compared to current gold-standard methods that do not account for group heteroscedasticity, we show results from simulations and various experiments that demonstrate the superior performance of voomByGroup and voomQWB in terms of error control and power when group variances in pseudo-bulk single-cell RNA-seq data are unequal.
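The notion of group heteroscedasticity in the abstract above can be illustrated with a minimal sketch. This is not the voomByGroup or voomQWB method; it only computes the per-group sample variance for one gene to show what "unequal group variances" means. The function name `group_variances` and the example values are hypothetical.

```python
from statistics import variance


def group_variances(values, groups):
    """Return the sample variance of `values` within each group label."""
    by_group = {}
    for value, group in zip(values, groups):
        by_group.setdefault(group, []).append(value)
    return {group: variance(vals) for group, vals in by_group.items()}


# Hypothetical pseudo-bulk log-expression values for a single gene.
log_expr = [5.0, 5.1, 4.9, 6.0, 8.0, 4.0]
groups = ["ctrl", "ctrl", "ctrl", "treated", "treated", "treated"]

variances = group_variances(log_expr, groups)

# A large ratio between the largest and smallest group variance signals
# heteroscedastic groups, which an equal-variance model would ignore.
ratio = max(variances.values()) / min(variances.values())
print(variances, ratio)
```

In this toy example the treated group is far more variable than the control group, which is the situation an equal-variance method would handle poorly.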


Subject(s)
Gene Expression Profiling , Software , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Single-Cell Gene Expression Analysis , Single-Cell Analysis/methods
9.
NAR Genom Bioinform ; 3(4): lqab116, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34988439

ABSTRACT

Glimma 1.0 introduced intuitive, point-and-click interactive graphics for differential gene expression analysis. Here, we present a major update to Glimma that brings improved interactivity and reproducibility using high-level visualization frameworks for R and JavaScript. Glimma 2.0 plots are now readily embeddable in R Markdown, thus allowing users to create reproducible reports containing interactive graphics. The revamped multidimensional scaling plot features dashboard-style controls allowing the user to dynamically change the colour, shape and size of sample points according to different experimental conditions. Interactivity was enhanced in the MA-style plot for comparing differences to average expression, which now supports selecting multiple genes, export options to PNG, SVG or CSV formats and includes a new volcano plot function. Feature-rich and user-friendly, Glimma makes exploring data for gene expression analysis more accessible and intuitive and is available on Bioconductor and GitHub.

10.
Cell Rep ; 36(3): 109430, 2021 07 20.
Article in English | MEDLINE | ID: mdl-34289356

ABSTRACT

While the intrinsic apoptosis pathway is thought to play a central role in shaping the B cell lineage, its precise role in mature B cell homeostasis remains elusive. Using mice in which mature B cells are unable to undergo apoptotic cell death, we show that apoptosis constrains follicular B (FoB) cell lifespan but plays no role in marginal zone B (MZB) cell homeostasis. In these mice, FoB cells accumulate abnormally. This intensifies intercellular competition for BAFF, resulting in a contraction of the MZB cell compartment, and reducing the growth, trafficking, and fitness of FoB cells. Diminished BAFF signaling dampens the non-canonical NF-κB pathway, undermining FoB cell growth despite the concurrent triggering of a protective p53 response. Thus, MZB and FoB cells exhibit a differential requirement for the intrinsic apoptosis pathway. Homeostatic apoptosis constrains the size of the FoB cell compartment, thereby preventing competition-induced FoB cell atrophy.


Subject(s)
Apoptosis , B-Lymphocytes/pathology , Homeostasis , Animals , Antibody Formation/immunology , Atrophy , B-Cell Activating Factor/metabolism , Cell Count , Cell Differentiation/genetics , Cell Proliferation/genetics , Cell Size , Cell Survival/genetics , Cellular Senescence/genetics , Gene Deletion , Gene Expression Regulation , Mice, Knockout , Sequence Analysis, RNA , Thymus Gland/immunology , Transcription Factors/metabolism , bcl-2 Homologous Antagonist-Killer Protein/metabolism , bcl-2-Associated X Protein/metabolism
11.
NAR Genom Bioinform ; 3(2): lqab028, 2021 Jun.
Article in English | MEDLINE | ID: mdl-33937765

ABSTRACT

Application of Oxford Nanopore Technologies' long-read sequencing platform to transcriptomic analysis is increasing in popularity. However, such analysis can be challenging due to the high sequence error and small library sizes, which decrease quantification accuracy and reduce power for statistical testing. Here, we report the analysis of two nanopore RNA-seq datasets with the goal of obtaining gene- and isoform-level differential expression information. A dataset of synthetic, spliced, spike-in RNAs ('sequins') and a mouse neural stem cell dataset from samples with a null mutation of the epigenetic regulator Smchd1 were analysed using a mix of long-read specific tools for preprocessing together with established short-read RNA-seq methods for downstream analysis. We used limma-voom to perform differential gene expression analysis, and the novel FLAMES pipeline to perform isoform identification and quantification, followed by DRIMSeq and limma-diffSplice (with stageR) to perform differential transcript usage analysis. We compared results from the sequins dataset to the ground truth, and results of the mouse dataset to a previous short-read study on equivalent samples. Overall, our work shows that transcriptomic analysis of long-read nanopore data using long-read specific preprocessing methods together with short-read differential expression methods and software that are already in wide use can yield meaningful results.

12.
Genome Biol ; 22(1): 310, 2021 11 11.
Article in English | MEDLINE | ID: mdl-34763716

ABSTRACT

A modified Chromium 10x droplet-based protocol that subsamples cells for both short-read and long-read (nanopore) sequencing together with a new computational pipeline (FLAMES) is developed to enable isoform discovery, splicing analysis, and mutation detection in single cells. We identify thousands of unannotated isoforms and find conserved functional modules that are enriched for alternative transcript usage in different cell types and species, including ribosome biogenesis and mRNA splicing. Analysis at the transcript level allows data integration with scATAC-seq on individual promoters, improved correlation with protein expression data, and linked mutations known to confer drug resistance to transcriptome heterogeneity.


Subject(s)
Nanopore Sequencing/methods , Protein Isoforms/genetics , Protein Isoforms/metabolism , Alternative Splicing , Animals , Exons , Gene Expression Profiling/methods , High-Throughput Nucleotide Sequencing , Humans , Mice , RNA Splicing , RNA, Messenger , Transcriptome
13.
F1000Res ; 9: 1444, 2020.
Article in English | MEDLINE | ID: mdl-33604029

ABSTRACT

Differential expression analysis of genomic data types, such as RNA-sequencing experiments, uses linear models to determine the size and direction of the changes in gene expression. For RNA-sequencing, there are several established software packages for this purpose, accompanied by analysis pipelines that are well described. However, there are two crucial steps in the analysis process that can be a stumbling block for many: the setup of an appropriate model via design matrices and the setup of comparisons of interest via contrast matrices. These steps are particularly troublesome because an extensive catalogue for design and contrast matrices does not currently exist. One would usually search for example case studies across different platforms and mix and match the advice from those sources to suit the dataset at hand. This article guides the reader through the basics of how to set up design and contrast matrices. We take a practical approach by providing code and graphical representation of each case study, starting with simpler examples (e.g. models with a single explanatory variable) and moving on to more complex ones (e.g. interaction models, mixed effects models, higher order time series and cyclical models). Although our work has been written specifically with a limma-style pipeline in mind, most of it is also applicable to other software packages for differential expression analysis, and the ideas covered can be adapted to data analysis of other high-throughput technologies. Where appropriate, we explain the interpretation and differences between models to aid readers in their own model choices. Unnecessary jargon and theory are omitted where possible so that our work is accessible to a wide audience of readers, from beginners to those with experience in genomics data analysis.
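The simplest case mentioned in the abstract above, a model with a single explanatory variable, can be sketched as follows. This is plain Python rather than the article's own R code, and every name in it (`make_design`, `fit_group_means`, `make_contrast`, `apply_contrast`, the example data) is hypothetical. It assumes a group-means parameterisation with one indicator column per group and no intercept, for which the least-squares coefficients reduce to the group means.

```python
def make_design(groups):
    """Build a design matrix with one indicator column per group (no intercept)."""
    levels = sorted(set(groups))
    design = [[1 if g == level else 0 for level in levels] for g in groups]
    return levels, design


def fit_group_means(design, y):
    """Least-squares fit for an indicator design: each coefficient is a group mean."""
    n_coef = len(design[0])
    sums = [0.0] * n_coef
    counts = [0] * n_coef
    for row, value in zip(design, y):
        for j, indicator in enumerate(row):
            if indicator:
                sums[j] += value
                counts[j] += 1
    return [s / c for s, c in zip(sums, counts)]


def make_contrast(levels, spec):
    """Turn a spec such as {'B': 1, 'A': -1} into a vector aligned with the design columns."""
    return [spec.get(level, 0) for level in levels]


def apply_contrast(coefs, contrast):
    """Estimate the comparison of interest as a weighted sum of coefficients."""
    return sum(c * w for c, w in zip(coefs, contrast))


# Hypothetical log-expression values for one gene in three groups.
groups = ["A", "A", "B", "B", "C", "C"]
log_expr = [5.0, 5.2, 7.1, 6.9, 5.1, 4.9]

levels, design = make_design(groups)
coefs = fit_group_means(design, log_expr)
b_vs_a = apply_contrast(coefs, make_contrast(levels, {"B": 1, "A": -1}))
print(levels, coefs, b_vs_a)
```

The design matrix fixes what each coefficient means, and the contrast vector then states which coefficients are compared. Here the contrast `B - A` estimates the difference in mean log-expression between groups B and A.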


Subject(s)
Genomics , Gene Expression , Linear Models , Sequence Analysis, RNA
14.
NAR Genom Bioinform ; 2(3): lqaa073, 2020 Sep.
Article in English | MEDLINE | ID: mdl-33575621

ABSTRACT

RNA-seq datasets can contain millions of intron reads per library that are typically removed from downstream analysis. Only reads overlapping annotated exons are considered to be informative since mature mRNA is assumed to be the major component sequenced, especially for poly(A) RNA libraries. In this study, we show that intron reads are informative, and through exploratory data analysis of read coverage that intron signal is representative of both pre-mRNAs and intron retention. We demonstrate how intron reads can be utilized in differential expression analysis using our index method where a unique set of differentially expressed genes can be detected using intron counts. In exploring read coverage, we also developed the superintronic software that quickly and robustly calculates user-defined summary statistics for exonic and intronic regions. Across multiple datasets, superintronic enabled us to identify several genes with distinctly retained introns that had similar coverage levels to that of neighbouring exons. The work and ideas presented in this paper are the first of their kind to consider multiple biological sources for intron reads through exploratory data analysis, minimizing bias in discovery and interpretation of results. Our findings open up possibilities for further methods development for intron reads and RNA-seq data in general.

15.
Blood Adv ; 4(7): 1270-1283, 2020 04 14.
Article in English | MEDLINE | ID: mdl-32236527

ABSTRACT

In eukaryotic cells, messenger RNA (mRNA) molecules are exported from the nucleus to the cytoplasm, where they are translated. The highly conserved protein nuclear RNA export factor 1 (Nxf1) is an important mediator of this process. Although studies in yeast and in human cell lines have shed light on the biochemical mechanisms of Nxf1 function, its contribution to mammalian physiology is less clear. Several groups have identified recurrent NXF1 mutations in chronic lymphocytic leukemia (CLL), placing it alongside several RNA-metabolism factors (including SF3B1, XPO, RPS15) whose dysregulation is thought to contribute to CLL pathogenesis. We report here an allelic series of germline point mutations in murine Nxf1. Mice heterozygous for these loss-of-function Nxf1 mutations exhibit thrombocytopenia and lymphopenia, together with milder hematological defects. This is primarily caused by cell-intrinsic defects in the survival of platelets and peripheral lymphocytes, which are sensitized to intrinsic apoptosis. In contrast, Nxf1 mutations have almost no effect on red blood cell homeostasis. Comparative transcriptome analysis of platelets, lymphocytes, and erythrocytes from Nxf1-mutant mice shows that, in response to impaired Nxf1 function, the cytoplasmic representation of transcripts encoding regulators of RNA metabolism is altered in a unique, lineage-specific way. Thus, blood cell lineages exhibit differential requirements for Nxf1-mediated global mRNA export.


Subject(s)
Lymphopenia , Thrombocytopenia , Animals , Germ Cells , Lymphopenia/genetics , Mice , Mutation , Nucleocytoplasmic Transport Proteins/genetics , RNA, Viral , RNA-Binding Proteins/genetics , Thrombocytopenia/genetics
17.
F1000Res ; 6: 2010, 2017.
Article in English | MEDLINE | ID: mdl-29333246

ABSTRACT

Gene set enrichment analysis is a popular approach for prioritising the biological processes perturbed in genomic datasets. The Bioconductor project hosts over 80 software packages capable of gene set analysis. Most of these packages search for enriched signatures amongst differentially regulated genes to reveal higher level biological themes that may be missed when focusing only on evidence from individual genes. With so many different methods on offer, choosing the best algorithm and visualization approach can be challenging. The EGSEA package solves this problem by combining results from up to 12 prominent gene set testing algorithms to obtain a consensus ranking of biologically relevant results. This workflow demonstrates how EGSEA can extend limma-based differential expression analyses for RNA-seq and microarray data using experiments that profile 3 distinct cell populations important for studying the origins of breast cancer. Following data normalization and set-up of an appropriate linear model for differential expression analysis, EGSEA builds gene signature specific indexes that link a wide range of mouse or human gene set collections obtained from MSigDB, GeneSetDB and KEGG to the gene expression data being investigated. EGSEA is then configured and the ensemble enrichment analysis run, returning an object that can be queried using several S4 methods for ranking gene sets and visualizing results via heatmaps, KEGG pathway views, GO graphs, scatter plots and bar plots. Finally, an HTML report that combines these displays can fast-track the sharing of results with collaborators, and thus expedite downstream biological validation. EGSEA is simple to use and can be easily integrated with existing gene expression analysis pipelines for both human and mouse data.
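The idea of a consensus ranking across several gene set testing algorithms, as described in the abstract above, can be sketched with a toy average-rank scheme. This is an assumption made for illustration and not EGSEA's implementation; the function `consensus_ranking`, the method names and the gene set names are all hypothetical.

```python
def consensus_ranking(rankings):
    """Order gene sets by their mean rank across methods (lower rank is better).

    `rankings` maps a method name to a list of gene sets, best first.
    Every method is assumed to rank the same gene sets.
    """
    gene_sets = list(next(iter(rankings.values())))
    mean_rank = {}
    for gene_set in gene_sets:
        ranks = [ranking.index(gene_set) + 1 for ranking in rankings.values()]
        mean_rank[gene_set] = sum(ranks) / len(ranks)
    # Ties in mean rank are broken alphabetically so the output is deterministic.
    ordered = sorted(gene_sets, key=lambda gs: (mean_rank[gs], gs))
    return ordered, mean_rank


# Hypothetical rankings from three gene set testing methods.
rankings = {
    "method_1": ["cell_cycle", "apoptosis", "wnt_signalling"],
    "method_2": ["apoptosis", "cell_cycle", "wnt_signalling"],
    "method_3": ["cell_cycle", "wnt_signalling", "apoptosis"],
}

ordered, mean_rank = consensus_ranking(rankings)
print(ordered, mean_rank)
```

Averaging ranks rather than raw statistics keeps methods with different scales comparable, which is one plausible reason an ensemble approach would work on ranks.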

18.
Nat Cell Biol ; 19(3): 164-176, 2017 03.
Article in English | MEDLINE | ID: mdl-28192422

ABSTRACT

Despite accumulating evidence for a mammary differentiation hierarchy, the basal compartment comprising stem cells remains poorly characterized. Through gene expression profiling of Lgr5+ basal epithelial cells, we identify a new marker, Tetraspanin8 (Tspan8). Fractionation based on Tspan8 and Lgr5 expression uncovered three distinct mammary stem cell (MaSC) subsets in the adult mammary gland. These exist in a largely quiescent state but differ in their reconstituting ability, spatial localization, and their molecular and epigenetic signatures. Interestingly, the deeply quiescent MaSC subset (Lgr5+Tspan8hi) resides within the proximal region throughout life, and has a transcriptome strikingly similar to that of claudin-low tumours. Lgr5+Tspan8hi cells appear to originate from the embryonic mammary primordia before switching to a quiescent state postnatally but can be activated by ovarian hormones. Our findings reveal an unexpected degree of complexity within the adult MaSC compartment and identify a dormant subset poised for activation in response to physiological stimuli.


Subject(s)
Cell Cycle/drug effects , Hormones/pharmacology , Mammary Glands, Animal/cytology , Stem Cells/cytology , Animals , Cell Differentiation , Cell Movement , Cell Proliferation/drug effects , Female , Humans , Mammary Glands, Animal/drug effects , Mice , Stem Cells/drug effects , Stem Cells/metabolism
19.
Genome Biol ; 17: 12, 2016 Jan 26.
Article in English | MEDLINE | ID: mdl-26813113

ABSTRACT

BACKGROUND: RNA-seq has been a boon to the quantitative analysis of transcriptomes. A notable application is the detection of changes in transcript usage between experimental conditions. For example, discovery of pathological alternative splicing may allow the development of new treatments or better management of patients. From an analysis perspective, there are several ways to approach RNA-seq data to unravel differential transcript usage, such as annotation-based exon-level counting, differential analysis of the percentage spliced in, or quantitative analysis of assembled transcripts. The goal of this research is to compare and contrast current state-of-the-art methods, and to suggest improvements to commonly used work flows. RESULTS: We assess the performance of representative work flows using synthetic data and explore the effect of using non-standard counting bin definitions as input to DEXSeq, a state-of-the-art inference engine. Although the canonical counting provided the best results overall, several non-canonical approaches were as good or better in specific aspects and most counting approaches outperformed the evaluated event- and assembly-based methods. We show that an incomplete annotation catalog can have a detrimental effect on the ability to detect differential transcript usage in transcriptomes with few isoforms per gene and that isoform-level prefiltering can considerably improve false discovery rate control. CONCLUSION: Count-based methods generally perform well in the detection of differential transcript usage. Controlling the false discovery rate at the imposed threshold is difficult, particularly in complex organisms, but can be improved by prefiltering the annotation catalog.
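What "differential transcript usage" measures can be illustrated with a small sketch. It is not DEXSeq or any of the evaluated workflows; it only converts per-transcript counts for one gene into usage proportions and compares them between two conditions. The function `usage_proportions`, the transcript names and the counts are hypothetical.

```python
def usage_proportions(transcript_counts):
    """Convert per-transcript counts for one gene into usage proportions."""
    total = sum(transcript_counts.values())
    if total == 0:
        # No reads for this gene: report zero usage rather than dividing by zero.
        return {tx: 0.0 for tx in transcript_counts}
    return {tx: count / total for tx, count in transcript_counts.items()}


# Hypothetical counts for one gene with two isoforms in two conditions.
control = {"tx1": 80, "tx2": 20}
treated = {"tx1": 100, "tx2": 100}

usage_control = usage_proportions(control)
usage_treated = usage_proportions(treated)

# Change in relative usage per transcript between conditions.
shift = {tx: usage_treated[tx] - usage_control[tx] for tx in control}
print(usage_control, usage_treated, shift)
```

In this example the gene's total count doubles, yet the quantity of interest is the change in relative usage: `tx1` falls from 80% to 50% of the gene's output. A real method would also need to model count variability before calling such a shift significant.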


Subject(s)
Alternative Splicing/genetics , RNA/genetics , Transcription, Genetic , Transcriptome/genetics , Exons/genetics , High-Throughput Nucleotide Sequencing , Humans , Protein Isoforms/genetics , Sequence Analysis, RNA
20.
F1000Res ; 5, 2016.
Article in English | MEDLINE | ID: mdl-27441086

ABSTRACT

The ability to easily and efficiently analyse RNA-sequencing data is a key strength of the Bioconductor project. Starting with counts summarised at the gene-level, a typical analysis involves pre-processing, exploratory data analysis, differential expression testing and pathway analysis with the results obtained informing future experiments and validation studies. In this workflow article, we analyse RNA-sequencing data from the mouse mammary gland, demonstrating use of the popular edgeR package to import, organise, filter and normalise the data, followed by the limma package with its voom method, linear modelling and empirical Bayes moderation to assess differential expression and perform gene set testing. This pipeline is further enhanced by the Glimma package which enables interactive exploration of the results so that individual samples and genes can be examined by the user. The complete analysis offered by these three packages highlights the ease with which researchers can turn the raw counts from an RNA-sequencing experiment into biological insights using Bioconductor.
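The first transformation step in a counts-based workflow like the one described above can be sketched as a log2 counts-per-million calculation. This is a simplified illustration in Python and not the edgeR or limma code. It assumes the offsets of 0.5 on the count and 1 on the library size used in the voom log-CPM transformation, and it uses raw column sums as library sizes, skipping the filtering and normalisation steps the abstract mentions. The function `log_cpm` and the example matrix are hypothetical.

```python
import math


def log_cpm(counts):
    """Return log2 counts-per-million for a genes x samples count matrix.

    Library sizes are taken as raw column sums (no normalisation factors).
    """
    n_samples = len(counts[0])
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    return [
        [
            math.log2((row[j] + 0.5) / (lib_sizes[j] + 1) * 1e6)
            for j in range(n_samples)
        ]
        for row in counts
    ]


# Hypothetical gene-level counts: rows are genes, columns are samples.
counts = [
    [3, 0],
    [1, 7],
]

for gene_values in log_cpm(counts):
    print(gene_values)
```

The small offsets keep zero counts finite on the log scale, which matters because linear modelling is then applied to these log values.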
