Results 1 - 20 of 22
1.
NAR Genom Bioinform ; 5(4): lqad105, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38046273

ABSTRACT

scPipe is a flexible R/Bioconductor package originally developed to analyse platform-independent single-cell RNA-Seq data. To expand its preprocessing capability to accommodate new single-cell technologies, we further developed scPipe to handle single-cell ATAC-Seq and multi-modal (RNA-Seq and ATAC-Seq) data. After executing multiple data cleaning steps to remove duplicated reads, low abundance features and cells of poor quality, a SingleCellExperiment object is created that contains a sparse count matrix with features of interest in the rows and cells in the columns. Quality control information (e.g. counts per cell, features per cell, total number of fragments, fraction of fragments per peak) and any relevant feature annotations are stored as metadata. We demonstrate that scPipe can efficiently identify 'true' cells and provides flexibility for the user to fine-tune the quality control thresholds using various feature and cell-based metrics collected during data preprocessing. Researchers can then take advantage of various downstream single-cell tools available in Bioconductor for further analysis of scATAC-Seq data such as dimensionality reduction, clustering, motif enrichment, differential accessibility and cis-regulatory network analysis. The scPipe package enables a complete beginning-to-end pipeline for single-cell ATAC-Seq and RNA-Seq data analysis in R.
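The cell-filtering step summarised above, where per-cell quality control metrics are compared against user-chosen thresholds, can be illustrated with a minimal Python sketch. It is not scPipe's implementation (scPipe is an R/Bioconductor package and stores its output in a SingleCellExperiment object); the dictionary-based sparse matrix and the threshold parameters `min_counts` and `min_features` are hypothetical choices made for illustration.

```python
from collections import defaultdict


def cell_qc(counts):
    """Per-cell QC metrics from a sparse feature-by-cell count matrix.

    `counts` maps (feature, cell) -> count and stores only non-zero entries,
    mirroring a sparse matrix with features in rows and cells in columns.
    """
    total = defaultdict(int)       # counts per cell
    n_features = defaultdict(int)  # features detected (count > 0) per cell
    for (_feature, cell), value in counts.items():
        if value > 0:
            total[cell] += value
            n_features[cell] += 1
    return {cell: {"counts": total[cell], "features": n_features[cell]}
            for cell in total}


def filter_cells(counts, min_counts, min_features):
    """Drop cells failing either hypothetical threshold; return (filtered, qc)."""
    qc = cell_qc(counts)
    keep = {cell for cell, m in qc.items()
            if m["counts"] >= min_counts and m["features"] >= min_features}
    filtered = {key: v for key, v in counts.items() if key[1] in keep}
    return filtered, qc
```

For example, with `counts = {("peak1", "cellA"): 5, ("peak2", "cellA"): 3, ("peak1", "cellB"): 1}` and thresholds of 4 counts and 2 features, only `cellA` is retained, while the QC table still reports metrics for both cells so the thresholds can be revisited.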

2.
Genome Biol ; 22(1): 339, 2021 12 14.
Article in English | MEDLINE | ID: mdl-34906205

ABSTRACT

BACKGROUND: Single-cell RNA-sequencing (scRNA-seq) technologies and associated analysis methods have rapidly developed in recent years. This includes preprocessing methods, which assign sequencing reads to genes to create count matrices for downstream analysis. While several packaged preprocessing workflows have been developed to provide users with convenient tools for handling this process, how they compare to one another and how they influence downstream analysis have not been well studied. RESULTS: Here, we systematically benchmark the performance of 10 end-to-end preprocessing workflows (Cell Ranger, Optimus, salmon alevin, alevin-fry, kallisto bustools, dropSeqPipe, scPipe, zUMIs, celseq2, and scruff) using datasets yielding different biological complexity levels generated by CEL-Seq2 and 10x Chromium platforms. We compare these workflows in terms of their quantification properties directly and their impact on normalization and clustering by evaluating the performance of different method combinations. While the scRNA-seq preprocessing workflows compared vary in their detection and quantification of genes across datasets, after downstream analysis with performant normalization and clustering methods, almost all combinations produce clustering results that agree well with the known cell type labels that provided the ground truth in our analysis. CONCLUSIONS: In summary, the choice of preprocessing method was found to be less important than other steps in the scRNA-seq analysis process. Our study comprehensively compares common scRNA-seq preprocessing workflows and summarizes their characteristics to guide workflow users.


Subject(s)
Benchmarking/methods , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Workflow , Cluster Analysis , Gene Expression Profiling/methods , RNA-Seq , Software , Transcriptome
3.
Genome Biol ; 22(1): 310, 2021 11 11.
Article in English | MEDLINE | ID: mdl-34763716

ABSTRACT

A modified Chromium 10x droplet-based protocol that subsamples cells for both short-read and long-read (nanopore) sequencing together with a new computational pipeline (FLAMES) is developed to enable isoform discovery, splicing analysis, and mutation detection in single cells. We identify thousands of unannotated isoforms and find conserved functional modules that are enriched for alternative transcript usage in different cell types and species, including ribosome biogenesis and mRNA splicing. Analysis at the transcript level allows data integration with scATAC-seq on individual promoters, improved correlation with protein expression data, and linked mutations known to confer drug resistance to transcriptome heterogeneity.


Subject(s)
Nanopore Sequencing/methods , Protein Isoforms/genetics , Protein Isoforms/metabolism , Alternative Splicing , Animals , Exons , Gene Expression Profiling/methods , High-Throughput Nucleotide Sequencing , Humans , Mice , RNA Splicing , RNA, Messenger , Transcriptome
4.
PLoS Comput Biol ; 17(10): e1009524, 2021 10.
Article in English | MEDLINE | ID: mdl-34695109

ABSTRACT

A key benefit of long-read nanopore sequencing technology is the ability to detect modified DNA bases, such as 5-methylcytosine. The lack of R/Bioconductor tools for the effective visualization of nanopore methylation profiles between samples from different experimental groups led us to develop the NanoMethViz R package. Our software can handle methylation output generated from a range of different methylation callers and manages large datasets using a compressed data format. To fully explore the methylation patterns in a dataset, NanoMethViz allows plotting of data at various resolutions. At the sample-level, we use dimensionality reduction to look at the relationships between methylation profiles in an unsupervised way. We visualize methylation profiles of classes of features such as genes or CpG islands by scaling them to relative positions and aggregating their profiles. At the finest resolution, we visualize methylation patterns across individual reads along the genome using the spaghetti plot and heatmaps, allowing users to explore particular genes or genomic regions of interest. In summary, our software makes the handling of methylation signal more convenient, expands upon the visualization options for nanopore data and works seamlessly with existing methylation analysis tools available in the Bioconductor project. Our software is available at https://bioconductor.org/packages/NanoMethViz.


Subject(s)
DNA Methylation/genetics , Genomics/methods , Nanopore Sequencing/methods , Sequence Analysis, DNA/methods , Software , Animals , Humans , Mice
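The feature-level aggregation described in the NanoMethViz abstract (scaling features such as genes to relative positions and pooling their methylation profiles) can be sketched in Python as below. This is an illustration under stated assumptions, not NanoMethViz's code: the input tuples, the binary methylation calls, the strand handling and the `n_bins` parameter are all hypothetical.

```python
def relative_position(pos, start, end, strand="+"):
    """Scale a genomic position to [0, 1] within a feature; flip on minus strand."""
    if end <= start:
        raise ValueError("feature end must be greater than start")
    rel = (pos - start) / (end - start)
    return 1.0 - rel if strand == "-" else rel


def aggregate_profile(calls, features, n_bins=10):
    """Mean methylation per relative-position bin, pooled across features.

    calls: list of (chrom, pos, methylated) with methylated a bool.
    features: list of (chrom, start, end, strand).
    Returns one value per bin, or None for bins with no calls.
    A simple nested loop is used for clarity rather than speed.
    """
    sums = [0.0] * n_bins
    totals = [0] * n_bins
    for chrom, start, end, strand in features:
        for c_chrom, pos, methylated in calls:
            if c_chrom != chrom or not (start <= pos <= end):
                continue
            rel = relative_position(pos, start, end, strand)
            b = min(int(rel * n_bins), n_bins - 1)  # rel == 1.0 goes in last bin
            sums[b] += 1.0 if methylated else 0.0
            totals[b] += 1
    return [s / t if t else None for s, t in zip(sums, totals)]
```

Flipping minus-strand features keeps every profile oriented from the feature's 5' end to its 3' end, so profiles from genes on either strand can be averaged together.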
5.
Sci Immunol ; 6(60)2021 06 25.
Article in English | MEDLINE | ID: mdl-34172588

ABSTRACT

CD1c presents lipid-based antigens to CD1c-restricted T cells, which are thought to be a major component of the human T cell pool. However, the study of CD1c-restricted T cells is hampered by the presence of an abundantly expressed, non-T cell receptor (TCR) ligand for CD1c on blood cells, confounding analysis of TCR-mediated CD1c tetramer staining. Here, we identified the CD36 family (CD36, SR-B1, and LIMP-2) as ligands for CD1c, CD1b, and CD1d proteins and showed that CD36 is the receptor responsible for non-TCR-mediated CD1c tetramer staining of blood cells. Moreover, CD36 blockade clarified tetramer-based identification of CD1c-restricted T cells and improved identification of CD1b- and CD1d-restricted T cells. We used this technique to characterize CD1c-restricted T cells ex vivo and showed diverse phenotypic features, TCR repertoire, and antigen-specific subsets. Accordingly, this work will enable further studies into the biology of CD1 and human CD1-restricted T cells.


Subject(s)
Antigen Presentation , Antigens, CD1/metabolism , CD36 Antigens/metabolism , Glycoproteins/metabolism , T-Lymphocyte Subsets/immunology , Blood Buffy Coat , CD36 Antigens/antagonists & inhibitors , Healthy Volunteers , Humans , Jurkat Cells , Ligands , Lipids/immunology , Primary Cell Culture , Protein Multimerization , Receptors, Antigen, T-Cell/metabolism , T-Lymphocyte Subsets/metabolism
6.
NAR Genom Bioinform ; 3(2): lqab028, 2021 Jun.
Article in English | MEDLINE | ID: mdl-33937765

ABSTRACT

Application of Oxford Nanopore Technologies' long-read sequencing platform to transcriptomic analysis is increasing in popularity. However, such analysis can be challenging due to the high sequencing error rate and small library sizes, which decrease quantification accuracy and reduce power for statistical testing. Here, we report the analysis of two nanopore RNA-seq datasets with the goal of obtaining gene- and isoform-level differential expression information. A dataset of synthetic, spliced, spike-in RNAs ('sequins') as well as a mouse neural stem cell dataset from samples with a null mutation of the epigenetic regulator Smchd1 was analysed using a mix of long-read specific tools for preprocessing together with established short-read RNA-seq methods for downstream analysis. We used limma-voom to perform differential gene expression analysis, and the novel FLAMES pipeline to perform isoform identification and quantification, followed by DRIMSeq and limma-diffSplice (with stageR) to perform differential transcript usage analysis. We compared results from the sequins dataset to the ground truth, and results of the mouse dataset to a previous short-read study on equivalent samples. Overall, our work shows that transcriptomic analysis of long-read nanopore data using long-read specific preprocessing methods together with short-read differential expression methods and software that are already in wide use can yield meaningful results.

7.
Immunity ; 54(6): 1338-1351.e9, 2021 06 08.
Article in English | MEDLINE | ID: mdl-33862015

ABSTRACT

Despite advances in single-cell multi-omics, a single stem or progenitor cell can only be tested once. We developed clonal multi-omics, in which daughters of a clone act as surrogates of the founder, thereby allowing multiple independent assays per clone. With SIS-seq, clonal siblings in parallel "sister" assays are examined either for gene expression by RNA sequencing (RNA-seq) or for fate in culture. We identified, and then validated using CRISPR, genes that controlled fate bias for different dendritic cell (DC) subtypes. This included Bcor as a suppressor of plasmacytoid DC (pDC) and conventional DC type 2 (cDC2) numbers during Flt3 ligand-mediated emergency DC development. We then developed SIS-skew to examine development of wild-type and Bcor-deficient siblings of the same clone in parallel. We found Bcor restricted clonal expansion, especially for cDC2s, and suppressed clonal fate potential, especially for pDCs. Therefore, SIS-seq and SIS-skew can reveal the molecular and cellular mechanisms governing clonal fate.


Subject(s)
Dendritic Cells/metabolism , Proto-Oncogene Proteins/genetics , Proto-Oncogene Proteins/metabolism , Repressor Proteins/genetics , Repressor Proteins/metabolism , Animals , Cell Differentiation/genetics , Cell Line , Cell Lineage/genetics , Female , Gene Expression/genetics , HEK293 Cells , Humans , Male , Membrane Proteins/genetics , Membrane Proteins/metabolism , Mice, Inbred C57BL , Stem Cells/metabolism
8.
NAR Genom Bioinform ; 3(4): lqab116, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34988439

ABSTRACT

Glimma 1.0 introduced intuitive, point-and-click interactive graphics for differential gene expression analysis. Here, we present a major update to Glimma that brings improved interactivity and reproducibility using high-level visualization frameworks for R and JavaScript. Glimma 2.0 plots are now readily embeddable in R Markdown, thus allowing users to create reproducible reports containing interactive graphics. The revamped multidimensional scaling plot features dashboard-style controls allowing the user to dynamically change the colour, shape and size of sample points according to different experimental conditions. Interactivity was enhanced in the MA-style plot for comparing differences to average expression, which now supports selecting multiple genes, export options to PNG, SVG or CSV formats and includes a new volcano plot function. Feature-rich and user-friendly, Glimma makes exploring data for gene expression analysis more accessible and intuitive and is available on Bioconductor and GitHub.

9.
Nat Commun ; 11(1): 2420, 2020 05 15.
Article in English | MEDLINE | ID: mdl-32415101

ABSTRACT

Archetypal human pluripotent stem cells (hPSC) are widely considered to be equivalent in developmental status to mouse epiblast stem cells, which correspond to pluripotent cells at a late post-implantation stage of embryogenesis. Heterogeneity within hPSC cultures complicates this interspecies comparison. Here we show that a subpopulation of archetypal hPSC enriched for high self-renewal capacity (ESR) has distinct properties relative to the bulk of the population, including a cell cycle with a very low G1 fraction and a metabolomic profile that reflects a combination of oxidative phosphorylation and glycolysis. ESR cells are pluripotent and capable of differentiation into primordial germ cell-like cells. Global DNA methylation levels in the ESR subpopulation are lower than those in mouse epiblast stem cells. Chromatin accessibility analysis revealed a unique set of open chromatin sites in ESR cells. RNA-seq at the subpopulation and single cell levels shows that, unlike mouse epiblast stem cells, the ESR subset of hPSC displays no lineage priming, and that it can be clearly distinguished from gastrulating and extraembryonic cell populations in the primate embryo. ESR hPSC correspond to an earlier stage of post-implantation development than mouse epiblast stem cells.


Subject(s)
Embryonic Stem Cells/cytology , Germ Layers/cytology , Pluripotent Stem Cells/cytology , Animals , Cell Differentiation , Chromatin/metabolism , DNA Methylation , Epigenome , Flow Cytometry , Fluorescent Antibody Technique, Indirect , G1 Phase , Germ Layers/metabolism , Glycolysis , Humans , MAP Kinase Signaling System , Metabolomics , Mice , Mitochondria/metabolism , Oxidative Phosphorylation , RNA-Seq , Signal Transduction
10.
Genome Biol ; 21(1): 30, 2020 02 07.
Article in English | MEDLINE | ID: mdl-32033565

ABSTRACT

Long-read technologies are overcoming early limitations in accuracy and throughput, broadening their application domains in genomics. Dedicated analysis tools that take into account the characteristics of long-read data are thus required, but the fast pace of development of such tools can be overwhelming. To assist in the design and analysis of long-read sequencing projects, we review the current landscape of available tools and present an online interactive database, long-read-tools.org, to facilitate their browsing. We further focus on the principles of error correction, base modification detection, and long-read transcriptomics analysis and highlight the challenges that remain.


Subject(s)
Genomics/methods , Nanopore Sequencing/methods , Whole Genome Sequencing/methods , Animals , Data Science/methods , Data Science/standards , Genomics/standards , Humans , Nanopore Sequencing/standards , Whole Genome Sequencing/standards
11.
NAR Genom Bioinform ; 2(3): lqaa073, 2020 Sep.
Article in English | MEDLINE | ID: mdl-33575621

ABSTRACT

RNA-seq datasets can contain millions of intron reads per library that are typically removed from downstream analysis. Only reads overlapping annotated exons are considered to be informative since mature mRNA is assumed to be the major component sequenced, especially for poly(A) RNA libraries. In this study, we show that intron reads are informative and, through exploratory data analysis of read coverage, that intron signal is representative of both pre-mRNAs and intron retention. We demonstrate how intron reads can be utilized in differential expression analysis using our index method where a unique set of differentially expressed genes can be detected using intron counts. In exploring read coverage, we also developed the superintronic software that quickly and robustly calculates user-defined summary statistics for exonic and intronic regions. Across multiple datasets, superintronic enabled us to identify several genes with distinctly retained introns that had similar coverage levels to that of neighbouring exons. The work and ideas presented in this paper are the first of their kind to consider multiple biological sources for intron reads through exploratory data analysis, minimizing bias in discovery and interpretation of results. Our findings open up possibilities for further methods development for intron reads and RNA-seq data in general.
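The superintronic abstract mentions user-defined summary statistics over exonic and intronic regions and the detection of introns whose coverage is similar to that of neighbouring exons. A minimal Python sketch of that idea follows; it does not reproduce superintronic's interface, and the half-open coordinates, the `min_ratio` cut-off and both function names are hypothetical.

```python
from statistics import mean


def region_summaries(coverage, regions, stat=mean):
    """Apply a user-supplied summary statistic to per-base coverage per region.

    coverage: read depth per base, indexed by 0-based position within the gene.
    regions: list of (label, start, end) with half-open [start, end) coordinates.
    """
    return [(label, start, end, stat(coverage[start:end]))
            for label, start, end in regions]


def retained_intron_candidates(summaries, min_ratio=0.8):
    """Flag introns whose summary is at least min_ratio of both flanking exons."""
    hits = []
    for i in range(1, len(summaries) - 1):
        label, start, end, value = summaries[i]
        prev_label, _, _, prev_val = summaries[i - 1]
        next_label, _, _, next_val = summaries[i + 1]
        if label == "intron" and prev_label == "exon" and next_label == "exon":
            if value >= min_ratio * prev_val and value >= min_ratio * next_val:
                hits.append((start, end))
    return hits
```

Passing `stat` as a function keeps the statistic user-defined (for instance `statistics.median` instead of the mean), which is the flexibility the abstract emphasises.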

12.
Bioinformatics ; 36(7): 2288-2290, 2020 04 01.
Article in English | MEDLINE | ID: mdl-31778143

ABSTRACT

MOTIVATION: Bioinformatic analysis of single-cell gene expression data is a rapidly evolving field. Hundreds of bespoke methods have been developed in the past few years to deal with various aspects of single-cell analysis and consensus on the most appropriate methods to use under different settings is still emerging. Benchmarking the many methods is therefore of critical importance and since analysis of single-cell data usually involves multi-step pipelines, effective evaluation of pipelines involving different combinations of methods is required. Current benchmarks of single-cell methods are mostly implemented with ad-hoc code that is often difficult to reproduce or extend, and exhaustive manual coding of many combinations is infeasible in most instances. Therefore, new software is needed to manage pipeline benchmarking. RESULTS: The CellBench R software facilitates method comparisons in either a task-centric or combinatorial way to allow pipelines of methods to be evaluated in an effective manner. CellBench automatically runs combinations of methods, provides facilities for measuring running time and delivers output in tabular form which is highly compatible with tidyverse R packages for summary and visualization. Our software has enabled comprehensive benchmarking of single-cell RNA-seq normalization, imputation, clustering, trajectory analysis and data integration methods using various performance metrics obtained from data with available ground truth. CellBench is also amenable to benchmarking other bioinformatics analysis tasks. AVAILABILITY AND IMPLEMENTATION: Available from https://bioconductor.org/packages/CellBench.


Subject(s)
RNA-Seq , Single-Cell Analysis , Computational Biology , Sequence Analysis, RNA , Software , Exome Sequencing
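CellBench's combinatorial mode, in which every combination of methods across pipeline steps is run, timed and returned as a table, can be mimicked in a few lines of Python. This sketch only illustrates the idea; CellBench itself is an R package with its own API, and `run_combinations` and the row layout used here are hypothetical.

```python
import itertools
import time


def run_combinations(data, stages):
    """Apply every combination of methods across ordered pipeline stages.

    stages: list of (stage_name, {method_name: function}) pairs; each function
    takes the previous stage's output. Returns one row (dict) per combination
    recording the method used at each stage, the final result and the total
    running time in seconds.
    """
    names = [name for name, _ in stages]
    rows = []
    for combo in itertools.product(*(methods.items() for _, methods in stages)):
        result = data
        start = time.perf_counter()
        for _method_name, fn in combo:
            result = fn(result)
        elapsed = time.perf_counter() - start
        row = {stage: method_name for stage, (method_name, _) in zip(names, combo)}
        row["result"] = result
        row["seconds"] = elapsed
        rows.append(row)
    return rows
```

With two toy "normalisation" methods (`identity`, `double`) and two "summary" methods (`sum`, `max`) applied to `[1, 2, 3]`, the function returns four rows, one per pipeline. Returning a flat list of rows keeps the output easy to load into any tabular tool for summary and plotting.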
13.
Cell Stem Cell ; 25(2): 258-272.e9, 2019 08 01.
Article in English | MEDLINE | ID: mdl-31374198

ABSTRACT

Tumors are composed of phenotypically heterogeneous cancer cells that often resemble various differentiation states of their lineage of origin. Within this hierarchy, it is thought that an immature subpopulation of tumor-propagating cancer stem cells (CSCs) differentiates into non-tumorigenic progeny, providing a rationale for therapeutic strategies that specifically eradicate CSCs or induce their differentiation. The clinical success of these approaches depends on CSC differentiation being unidirectional rather than reversible, yet this question remains unresolved even in prototypically hierarchical malignancies, such as acute myeloid leukemia (AML). Here, we show in murine and human models of AML that, upon perturbation of endogenous expression of the lineage-determining transcription factor PU.1 or withdrawal of established differentiation therapies, some mature leukemia cells can de-differentiate and reacquire clonogenic and leukemogenic properties. Our results reveal plasticity of CSC maturation in AML, highlighting the need to therapeutically eradicate cancer cells across a range of differentiation states.


Subject(s)
Cell Differentiation/physiology , Cell Transdifferentiation/physiology , Leukemia, Myeloid, Acute/pathology , Neoplastic Stem Cells/physiology , Proto-Oncogene Proteins/metabolism , Trans-Activators/metabolism , Animals , Carcinogenesis , Cell Plasticity , Cells, Cultured , Humans , Leukemia, Myeloid, Acute/metabolism , Mice , Proto-Oncogene Proteins/genetics , Trans-Activators/genetics , Tretinoin/metabolism
14.
F1000Res ; 8: 752, 2019.
Article in English | MEDLINE | ID: mdl-31249680

ABSTRACT

Motivation: The Bioconductor project, a large collection of open source software for the comprehension of large-scale biological data, continues to grow with new packages added each week, motivating the development of software tools focused on exposing package metadata to developers and users. The resulting BiocPkgTools package facilitates access to extensive metadata in computable form covering the Bioconductor package ecosystem, facilitating downstream applications such as custom reporting, data and text mining of Bioconductor package text descriptions, graph analytics over package dependencies, and custom search approaches. Results: The BiocPkgTools package has been incorporated into the Bioconductor project, installs using standard procedures, and runs on any system supporting R. It provides functions to load detailed package metadata, longitudinal package download statistics, package dependencies, and Bioconductor build reports, all in "tidy data" form. BiocPkgTools can convert from tidy data structures to graph structures, enabling graph-based analytics and visualization. An end-user-friendly graphical package explorer aids in task-centric package discovery. Full documentation and example use cases are included. Availability: The BiocPkgTools software and complete documentation are available from Bioconductor (https://bioconductor.org/packages/BiocPkgTools).


Subject(s)
Data Mining , Software , Metadata
15.
Nat Methods ; 16(6): 479-487, 2019 06.
Article in English | MEDLINE | ID: mdl-31133762

ABSTRACT

Single cell RNA-sequencing (scRNA-seq) technology has undergone rapid development in recent years, leading to an explosion in the number of tailored data analysis methods. However, the current lack of gold-standard benchmark datasets makes it difficult for researchers to systematically compare the performance of the many methods available. Here, we generated a realistic benchmark experiment that included single cells and admixtures of cells or RNA to create 'pseudo cells' from up to five distinct cancer cell lines. In total, 14 datasets were generated using both droplet and plate-based scRNA-seq protocols. We compared 3,913 combinations of data analysis methods for tasks ranging from normalization and imputation to clustering, trajectory analysis and data integration. Evaluation revealed pipelines suited to different types of data for different tasks. Our data and analysis provide a comprehensive framework for benchmarking most common scRNA-seq analysis steps.


Subject(s)
Adenocarcinoma/genetics , Benchmarking , Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods , Lung Neoplasms/genetics , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Humans , Software , Tumor Cells, Cultured
16.
PLoS Comput Biol ; 14(8): e1006361, 2018 08.
Article in English | MEDLINE | ID: mdl-30096152

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) technology allows researchers to profile the transcriptomes of thousands of cells simultaneously. Protocols that incorporate both designed and random barcodes have greatly increased the throughput of scRNA-seq, but give rise to a more complex data structure. There is a need for new tools that can handle the various barcoding strategies used by different protocols and exploit this information for quality assessment at the sample-level and provide effective visualization of these results in preparation for higher-level analyses. To this end, we developed scPipe, an R/Bioconductor package that integrates barcode demultiplexing, read alignment, UMI-aware gene-level quantification and quality control of raw sequencing data generated by multiple protocols that include CEL-seq, MARS-seq, Chromium 10X, Drop-seq and Smart-seq. scPipe produces a count matrix that is essential for downstream analysis along with an HTML report that summarises data quality. These results can be used as input for downstream analyses including normalization, visualization and statistical testing. scPipe performs this processing in a few simple R commands, promoting reproducible analysis of single-cell data that is compatible with the emerging suite of open-source scRNA-seq analysis tools available in R/Bioconductor and beyond. The scPipe R package is available for download from https://www.bioconductor.org/packages/scPipe.


Subject(s)
Computational Biology/methods , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Animals , Base Sequence , High-Throughput Nucleotide Sequencing , Humans , RNA/genetics , Software
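The UMI-aware, gene-level quantification described in the scPipe abstract amounts to counting distinct UMIs, rather than reads, for each cell and gene. The sketch below assumes reads have already been aligned and assigned to genes, and uses exact barcode matching with no barcode or UMI error correction; it is a simplification for illustration, not scPipe's algorithm, and the function name is hypothetical.

```python
from collections import defaultdict


def umi_counts(reads, known_barcodes):
    """Count distinct UMIs per (gene, cell barcode) after demultiplexing.

    reads: iterable of (cell_barcode, umi, gene) tuples; gene is None for
    reads not assigned to any gene. Reads whose barcode is not in
    known_barcodes are discarded (exact matching only).
    """
    umis = defaultdict(set)
    for barcode, umi, gene in reads:
        if barcode in known_barcodes and gene is not None:
            umis[(gene, barcode)].add(umi)  # PCR duplicates share a UMI
    # Keys follow the count-matrix layout: genes in rows, cells in columns.
    return {key: len(s) for key, s in umis.items()}
```

Because PCR duplicates of one captured molecule carry the same UMI, collapsing reads into a set per cell and gene counts molecules rather than amplified copies.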
17.
Bioinformatics ; 33(13): 2050-2052, 2017 Jul 01.
Article in English | MEDLINE | ID: mdl-28203714

ABSTRACT

MOTIVATION: Graphics for RNA-sequencing and microarray gene expression analyses may contain upwards of tens of thousands of points. Details about certain genes or samples of interest are easily obscured in such dense summary displays. Incorporating interactivity into summary plots would enable additional information to be displayed on demand and facilitate intuitive data exploration. RESULTS: The open-source Glimma package creates interactive graphics for exploring gene expression analysis with a few simple R commands. It extends popular plots found in the limma package, such as multi-dimensional scaling plots and mean-difference plots, to allow individual data points to be queried and additional annotation information to be displayed upon hovering or selecting particular points. It also offers links between plots so that more information can be revealed on demand. Glimma is widely applicable, supporting data analyses from a number of well-established Bioconductor workflows (limma, edgeR and DESeq2), and uses D3/JavaScript to produce HTML pages with interactive displays that enable more effective data exploration by end-users. Results from Glimma can be easily shared between bioinformaticians and biologists, enhancing reporting capabilities while maintaining reproducibility. AVAILABILITY AND IMPLEMENTATION: The Glimma R package is available from http://bioconductor.org/packages/Glimma/. CONTACT: su.s@wehi.edu.au, law@wehi.edu.au or mritchie@wehi.edu.au.


Subject(s)
Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Software , Animals , Mice
18.
Cell Rep ; 17(2): 436-447, 2016 10 04.
Article in English | MEDLINE | ID: mdl-27705792

ABSTRACT

Innate lymphoid cells (ILCs) are enriched at mucosal surfaces, where they provide immune surveillance. All ILC subsets develop from a common progenitor that gives rise to pre-committed progenitors for each of the ILC lineages. Currently, the temporal control of gene expression that guides the emergence of these progenitors is poorly understood. We used global transcriptional mapping to analyze gene expression in different ILC progenitors. We identified PD-1 to be specifically expressed in PLZF+ ILCp and revealed that the timing and order of expression of the transcription factors NFIL3, ID2, and TCF-1 was critical. Importantly, induction of ILC lineage commitment required only transient expression of NFIL3 prior to ID2 and TCF-1 expression. These findings highlight the importance of the temporal program that permits commitment of progenitors to the ILC lineage, and they expand our understanding of the core transcriptional program by identifying potential regulators of ILC development.


Subject(s)
Basic-Leucine Zipper Transcription Factors/genetics , Hepatocyte Nuclear Factor 1-alpha/genetics , Immunity, Innate/immunology , Lymphocytes/immunology , Programmed Cell Death 1 Receptor/genetics , Animals , Basic-Leucine Zipper Transcription Factors/immunology , Bone Marrow Cells/immunology , Cell Differentiation/genetics , Cell Differentiation/immunology , Cell Lineage/immunology , Gene Expression Regulation , Hepatocyte Nuclear Factor 1-alpha/immunology , Immunity, Innate/genetics , Killer Cells, Natural/immunology , Mice , Programmed Cell Death 1 Receptor/immunology , Transcription Factors/genetics , Transcription Factors/immunology
19.
F1000Res ; 5, 2016.
Article in English | MEDLINE | ID: mdl-27441086

ABSTRACT

The ability to easily and efficiently analyse RNA-sequencing data is a key strength of the Bioconductor project. Starting with counts summarised at the gene-level, a typical analysis involves pre-processing, exploratory data analysis, differential expression testing and pathway analysis with the results obtained informing future experiments and validation studies. In this workflow article, we analyse RNA-sequencing data from the mouse mammary gland, demonstrating use of the popular edgeR package to import, organise, filter and normalise the data, followed by the limma package with its voom method, linear modelling and empirical Bayes moderation to assess differential expression and perform gene set testing. This pipeline is further enhanced by the Glimma package which enables interactive exploration of the results so that individual samples and genes can be examined by the user. The complete analysis offered by these three packages highlights the ease with which researchers can turn the raw counts from an RNA-sequencing experiment into biological insights using Bioconductor.

20.
Nucleic Acids Res ; 43(15): e97, 2015 Sep 03.
Article in English | MEDLINE | ID: mdl-25925576

ABSTRACT

Variations in sample quality are frequently encountered in small RNA-sequencing experiments, and pose a major challenge in a differential expression analysis. Removal of high variation samples reduces noise, but at a cost of reducing power, thus limiting our ability to detect biologically meaningful changes. Similarly, retaining these samples in the analysis may not reveal any statistically significant changes due to the higher noise level. A compromise is to use all available data, but to down-weight the observations from more variable samples. We describe a statistical approach that facilitates this by modelling heterogeneity at both the sample and observational levels as part of the differential expression analysis. At the sample level this is achieved by fitting a log-linear variance model that includes common sample-specific or group-specific parameters that are shared between genes. The estimated sample variance factors are then converted to weights and combined with observational level weights obtained from the mean-variance relationship of the log-counts-per-million using 'voom'. A comprehensive analysis involving both simulations and experimental RNA-sequencing data demonstrates that this strategy leads to a universally more powerful analysis and fewer false discoveries when compared to conventional approaches. This methodology has wide application and is implemented in the open-source 'limma' package.


Subject(s)
Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Animals , Cell Line, Tumor , Chromosomal Proteins, Non-Histone/genetics , Humans , Linear Models , Mice , Reproducibility of Results
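The weighting strategy in the abstract above combines one weight per sample, derived from its estimated variance factor, with observation-level voom weights. Assuming that a sample weight is the reciprocal of its variance factor and that the two weights are combined by multiplication, the arithmetic can be sketched in Python as follows; this illustrates the idea only and is not limma's implementation.

```python
def combine_weights(obs_weights, variance_factors):
    """Combine observation-level weights with per-sample weights.

    obs_weights: one row per gene, each row holding one weight per sample.
    variance_factors: one estimated variance factor per sample. The sample
    weight is assumed here to be 1 / factor, so a sample with factor 2.0
    contributes half as much as one with factor 1.0.
    """
    if any(len(row) != len(variance_factors) for row in obs_weights):
        raise ValueError("each row needs one weight per sample")
    sample_weights = [1.0 / v for v in variance_factors]
    # Element-wise product: observation weight x sample weight.
    return [[w * s for w, s in zip(row, sample_weights)] for row in obs_weights]
```

For observation weights `[[1.0, 2.0], [0.5, 4.0]]` and variance factors `[1.0, 2.0]`, every weight in the second (noisier) sample is halved, giving `[[1.0, 1.0], [0.5, 2.0]]`. The noisy sample is down-weighted rather than discarded, which is the compromise the abstract argues for.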