Results 1 - 20 of 22
1.
NAR Genom Bioinform ; 5(4): lqad105, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38046273

ABSTRACT

scPipe is a flexible R/Bioconductor package originally developed to analyse platform-independent single-cell RNA-Seq data. To expand its preprocessing capability to accommodate new single-cell technologies, we further developed scPipe to handle single-cell ATAC-Seq and multi-modal (RNA-Seq and ATAC-Seq) data. After executing multiple data cleaning steps to remove duplicated reads, low abundance features and cells of poor quality, a SingleCellExperiment object is created that contains a sparse count matrix with features of interest in the rows and cells in the columns. Quality control information (e.g. counts per cell, features per cell, total number of fragments, fraction of fragments per peak) and any relevant feature annotations are stored as metadata. We demonstrate that scPipe can efficiently identify 'true' cells and provides flexibility for the user to fine-tune the quality control thresholds using various feature and cell-based metrics collected during data preprocessing. Researchers can then take advantage of various downstream single-cell tools available in Bioconductor for further analysis of scATAC-Seq data such as dimensionality reduction, clustering, motif enrichment, differential accessibility and cis-regulatory network analysis. The scPipe package enables a complete beginning-to-end pipeline for single-cell ATAC-Seq and RNA-Seq data analysis in R.
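
The cell-level quality control described in this abstract (counts per cell, features per cell, user-tuned thresholds) can be illustrated with a minimal Python sketch. This is not scPipe's API: the function names, the dict-of-dicts sparse layout and the threshold parameters are all assumptions made for illustration.

```python
from typing import Dict

# Hypothetical sparse count matrix: {feature: {cell: count}}.
# Features are rows, cells are columns; zero entries are simply absent.
SparseCounts = Dict[str, Dict[str, int]]


def per_cell_qc(counts: SparseCounts) -> Dict[str, Dict[str, int]]:
    """Return total counts and number of detected features for each cell."""
    qc: Dict[str, Dict[str, int]] = {}
    for row in counts.values():
        for cell, n in row.items():
            if n <= 0:
                continue  # ignore explicit zeros if any were stored
            stats = qc.setdefault(cell, {"total_counts": 0, "n_features": 0})
            stats["total_counts"] += n
            stats["n_features"] += 1
    return qc


def filter_cells(counts: SparseCounts, min_counts: int, min_features: int) -> SparseCounts:
    """Keep only the cells that pass both (assumed) QC thresholds."""
    qc = per_cell_qc(counts)
    keep = {
        cell
        for cell, stats in qc.items()
        if stats["total_counts"] >= min_counts and stats["n_features"] >= min_features
    }
    return {
        feature: {cell: n for cell, n in row.items() if cell in keep}
        for feature, row in counts.items()
    }
```

Storing the per-cell statistics separately from the matrix mirrors the idea of keeping QC information as metadata, so thresholds can be changed without recomputing the counts.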

2.
Genome Biol ; 22(1): 339, 2021 12 14.
Article in English | MEDLINE | ID: mdl-34906205

ABSTRACT

BACKGROUND: Single-cell RNA-sequencing (scRNA-seq) technologies and associated analysis methods have rapidly developed in recent years. This includes preprocessing methods, which assign sequencing reads to genes to create count matrices for downstream analysis. While several packaged preprocessing workflows have been developed to provide users with convenient tools for handling this process, how they compare to one another and how they influence downstream analysis have not been well studied. RESULTS: Here, we systematically benchmark the performance of 10 end-to-end preprocessing workflows (Cell Ranger, Optimus, salmon alevin, alevin-fry, kallisto bustools, dropSeqPipe, scPipe, zUMIs, celseq2, and scruff) using datasets yielding different biological complexity levels generated by CEL-Seq2 and 10x Chromium platforms. We compare these workflows in terms of their quantification properties directly and their impact on normalization and clustering by evaluating the performance of different method combinations. While the scRNA-seq preprocessing workflows compared vary in their detection and quantification of genes across datasets, after downstream analysis with performant normalization and clustering methods, almost all combinations produce clustering results that agree well with the known cell type labels that provided the ground truth in our analysis. CONCLUSIONS: In summary, the choice of preprocessing method was found to be less important than other steps in the scRNA-seq analysis process. Our study comprehensively compares common scRNA-seq preprocessing workflows and summarizes their characteristics to guide workflow users.
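
Agreement between a clustering result and known cell type labels, as evaluated in this benchmark, can be quantified in several ways. The abstract does not name the metric used; the sketch below assumes the adjusted Rand index, a common choice, and implements it with the Python standard library only.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(truth: Sequence[Hashable], predicted: Sequence[Hashable]) -> float:
    """Adjusted Rand index between known labels and cluster assignments."""
    if len(truth) != len(predicted):
        raise ValueError("label vectors must have the same length")
    n = len(truth)
    if n < 2:
        return 1.0  # no pairs to disagree on

    # Pairs of cells placed together in both partitions, in truth, and in predicted.
    together_both = sum(comb(c, 2) for c in Counter(zip(truth, predicted)).values())
    together_truth = sum(comb(c, 2) for c in Counter(truth).values())
    together_pred = sum(comb(c, 2) for c in Counter(predicted).values())

    expected = together_truth * together_pred / comb(n, 2)
    maximum = (together_truth + together_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. every cell in one cluster in both partitions
    return (together_both - expected) / (maximum - expected)
```

A value of 1 means the clusters reproduce the known labels exactly (up to renaming), while values near 0 indicate agreement no better than chance.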


Subject(s)
Benchmarking/methods; Sequence Analysis, RNA/methods; Single-Cell Analysis/methods; Workflow; Cluster Analysis; Gene Expression Profiling/methods; RNA-Seq; Software; Transcriptome
3.
Genome Biol ; 22(1): 310, 2021 11 11.
Article in English | MEDLINE | ID: mdl-34763716

ABSTRACT

A modified Chromium 10x droplet-based protocol that subsamples cells for both short-read and long-read (nanopore) sequencing together with a new computational pipeline (FLAMES) is developed to enable isoform discovery, splicing analysis, and mutation detection in single cells. We identify thousands of unannotated isoforms and find conserved functional modules that are enriched for alternative transcript usage in different cell types and species, including ribosome biogenesis and mRNA splicing. Analysis at the transcript level allows data integration with scATAC-seq on individual promoters, improved correlation with protein expression data, and linked mutations known to confer drug resistance to transcriptome heterogeneity.


Subject(s)
Nanopore Sequencing/methods; Protein Isoforms/genetics; Protein Isoforms/metabolism; Alternative Splicing; Animals; Exons; Gene Expression Profiling/methods; High-Throughput Nucleotide Sequencing; Humans; Mice; RNA Splicing; RNA, Messenger; Transcriptome
4.
PLoS Comput Biol ; 17(10): e1009524, 2021 10.
Article in English | MEDLINE | ID: mdl-34695109

ABSTRACT

A key benefit of long-read nanopore sequencing technology is the ability to detect modified DNA bases, such as 5-methylcytosine. The lack of R/Bioconductor tools for the effective visualization of nanopore methylation profiles between samples from different experimental groups led us to develop the NanoMethViz R package. Our software can handle methylation output generated from a range of different methylation callers and manages large datasets using a compressed data format. To fully explore the methylation patterns in a dataset, NanoMethViz allows plotting of data at various resolutions. At the sample-level, we use dimensionality reduction to look at the relationships between methylation profiles in an unsupervised way. We visualize methylation profiles of classes of features such as genes or CpG islands by scaling them to relative positions and aggregating their profiles. At the finest resolution, we visualize methylation patterns across individual reads along the genome using the spaghetti plot and heatmaps, allowing users to explore particular genes or genomic regions of interest. In summary, our software makes the handling of methylation signal more convenient, expands upon the visualization options for nanopore data and works seamlessly with existing methylation analysis tools available in the Bioconductor project. Our software is available at https://bioconductor.org/packages/NanoMethViz.
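
The idea of scaling features to relative positions and aggregating their methylation profiles can be sketched as follows. This is an illustrative Python outline, not NanoMethViz code: the data layout, the function name and the fixed number of bins are assumptions.

```python
from typing import Dict, List, Optional, Tuple

Feature = Tuple[int, int]        # (start, end) with start < end, end exclusive
Calls = List[Tuple[int, float]]  # (genomic position, methylation value in [0, 1])


def aggregate_profile(
    features: Dict[str, Feature],
    calls: Dict[str, Calls],
    n_bins: int = 10,
) -> List[Optional[float]]:
    """Average methylation per relative-position bin across all features."""
    sums = [0.0] * n_bins
    hits = [0] * n_bins
    for name, (start, end) in features.items():
        length = end - start
        for pos, value in calls.get(name, []):
            if not start <= pos < end:
                continue  # call lies outside this feature
            relative = (pos - start) / length  # 0.0 at the start, approaching 1.0 at the end
            bin_index = min(int(relative * n_bins), n_bins - 1)
            sums[bin_index] += value
            hits[bin_index] += 1
    # Bins without any calls are reported as None rather than as zero methylation.
    return [s / h if h else None for s, h in zip(sums, hits)]
```

Because positions are expressed as a fraction of each feature's length, genes or CpG islands of very different sizes contribute to the same bins.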


Subject(s)
DNA Methylation/genetics; Genomics/methods; Nanopore Sequencing/methods; Sequence Analysis, DNA/methods; Software; Animals; Humans; Mice
5.
Sci Immunol ; 6(60), 2021 06 25.
Article in English | MEDLINE | ID: mdl-34172588

ABSTRACT

CD1c presents lipid-based antigens to CD1c-restricted T cells, which are thought to be a major component of the human T cell pool. However, the study of CD1c-restricted T cells is hampered by the presence of an abundantly expressed, non-T cell receptor (TCR) ligand for CD1c on blood cells, confounding analysis of TCR-mediated CD1c tetramer staining. Here, we identified the CD36 family (CD36, SR-B1, and LIMP-2) as ligands for CD1c, CD1b, and CD1d proteins and showed that CD36 is the receptor responsible for non-TCR-mediated CD1c tetramer staining of blood cells. Moreover, CD36 blockade clarified tetramer-based identification of CD1c-restricted T cells and improved identification of CD1b- and CD1d-restricted T cells. We used this technique to characterize CD1c-restricted T cells ex vivo and showed diverse phenotypic features, TCR repertoire, and antigen-specific subsets. Accordingly, this work will enable further studies into the biology of CD1 and human CD1-restricted T cells.


Subject(s)
Antigen Presentation; Antigens, CD1/metabolism; CD36 Antigens/metabolism; Glycoproteins/metabolism; T-Lymphocyte Subsets/immunology; Blood Buffy Coat; CD36 Antigens/antagonists & inhibitors; Healthy Volunteers; Humans; Jurkat Cells; Ligands; Lipids/immunology; Primary Cell Culture; Protein Multimerization; Receptors, Antigen, T-Cell/metabolism; T-Lymphocyte Subsets/metabolism
6.
NAR Genom Bioinform ; 3(2): lqab028, 2021 Jun.
Article in English | MEDLINE | ID: mdl-33937765

ABSTRACT

Application of Oxford Nanopore Technologies' long-read sequencing platform to transcriptomic analysis is increasing in popularity. However, such analysis can be challenging due to high sequencing error rates and small library sizes, which decrease quantification accuracy and reduce power for statistical testing. Here, we report the analysis of two nanopore RNA-seq datasets with the goal of obtaining gene- and isoform-level differential expression information. A dataset of synthetic, spliced, spike-in RNAs ('sequins') and a mouse neural stem cell dataset from samples with a null mutation of the epigenetic regulator Smchd1 were analysed using a mix of long-read specific tools for preprocessing together with established short-read RNA-seq methods for downstream analysis. We used limma-voom to perform differential gene expression analysis, and the novel FLAMES pipeline to perform isoform identification and quantification, followed by DRIMSeq and limma-diffSplice (with stageR) to perform differential transcript usage analysis. We compared results from the sequins dataset to the ground truth, and results of the mouse dataset to a previous short-read study on equivalent samples. Overall, our work shows that transcriptomic analysis of long-read nanopore data using long-read specific preprocessing methods together with short-read differential expression methods and software that are already in wide use can yield meaningful results.

7.
Immunity ; 54(6): 1338-1351.e9, 2021 06 08.
Article in English | MEDLINE | ID: mdl-33862015

ABSTRACT

Despite advances in single-cell multi-omics, a single stem or progenitor cell can only be tested once. We developed clonal multi-omics, in which daughters of a clone act as surrogates of the founder, thereby allowing multiple independent assays per clone. With SIS-seq, clonal siblings in parallel "sister" assays are examined either for gene expression by RNA sequencing (RNA-seq) or for fate in culture. We identified, and then validated using CRISPR, genes that controlled fate bias for different dendritic cell (DC) subtypes. This included Bcor as a suppressor of plasmacytoid DC (pDC) and conventional DC type 2 (cDC2) numbers during Flt3 ligand-mediated emergency DC development. We then developed SIS-skew to examine development of wild-type and Bcor-deficient siblings of the same clone in parallel. We found Bcor restricted clonal expansion, especially for cDC2s, and suppressed clonal fate potential, especially for pDCs. Therefore, SIS-seq and SIS-skew can reveal the molecular and cellular mechanisms governing clonal fate.


Subject(s)
Dendritic Cells/metabolism; Proto-Oncogene Proteins/genetics; Proto-Oncogene Proteins/metabolism; Repressor Proteins/genetics; Repressor Proteins/metabolism; Animals; Cell Differentiation/genetics; Cell Line; Cell Lineage/genetics; Female; Gene Expression/genetics; HEK293 Cells; Humans; Male; Membrane Proteins/genetics; Membrane Proteins/metabolism; Mice, Inbred C57BL; Stem Cells/metabolism
8.
NAR Genom Bioinform ; 3(4): lqab116, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34988439

ABSTRACT

Glimma 1.0 introduced intuitive, point-and-click interactive graphics for differential gene expression analysis. Here, we present a major update to Glimma that brings improved interactivity and reproducibility using high-level visualization frameworks for R and JavaScript. Glimma 2.0 plots are now readily embeddable in R Markdown, thus allowing users to create reproducible reports containing interactive graphics. The revamped multidimensional scaling plot features dashboard-style controls allowing the user to dynamically change the colour, shape and size of sample points according to different experimental conditions. Interactivity was enhanced in the MA-style plot for comparing differences to average expression, which now supports selecting multiple genes, export options to PNG, SVG or CSV formats and includes a new volcano plot function. Feature-rich and user-friendly, Glimma makes exploring data for gene expression analysis more accessible and intuitive and is available on Bioconductor and GitHub.

9.
Nat Commun ; 11(1): 2420, 2020 05 15.
Article in English | MEDLINE | ID: mdl-32415101

ABSTRACT

Archetypal human pluripotent stem cells (hPSC) are widely considered to be equivalent in developmental status to mouse epiblast stem cells, which correspond to pluripotent cells at a late post-implantation stage of embryogenesis. Heterogeneity within hPSC cultures complicates this interspecies comparison. Here we show that a subpopulation of archetypal hPSC enriched for high self-renewal capacity (ESR) has distinct properties relative to the bulk of the population, including a cell cycle with a very low G1 fraction and a metabolomic profile that reflects a combination of oxidative phosphorylation and glycolysis. ESR cells are pluripotent and capable of differentiation into primordial germ cell-like cells. Global DNA methylation levels in the ESR subpopulation are lower than those in mouse epiblast stem cells. Chromatin accessibility analysis revealed a unique set of open chromatin sites in ESR cells. RNA-seq at the subpopulation and single cell levels shows that, unlike mouse epiblast stem cells, the ESR subset of hPSC displays no lineage priming, and that it can be clearly distinguished from gastrulating and extraembryonic cell populations in the primate embryo. ESR hPSC correspond to an earlier stage of post-implantation development than mouse epiblast stem cells.


Subject(s)
Embryonic Stem Cells/cytology; Germ Layers/cytology; Pluripotent Stem Cells/cytology; Animals; Cell Differentiation; Chromatin/metabolism; DNA Methylation; Epigenome; Flow Cytometry; Fluorescent Antibody Technique, Indirect; G1 Phase; Germ Layers/metabolism; Glycolysis; Humans; MAP Kinase Signaling System; Metabolomics; Mice; Mitochondria/metabolism; Oxidative Phosphorylation; RNA-Seq; Signal Transduction
10.
Genome Biol ; 21(1): 30, 2020 02 07.
Article in English | MEDLINE | ID: mdl-32033565

ABSTRACT

Long-read technologies are overcoming early limitations in accuracy and throughput, broadening their application domains in genomics. Dedicated analysis tools that take into account the characteristics of long-read data are thus required, but the fast pace of development of such tools can be overwhelming. To assist in the design and analysis of long-read sequencing projects, we review the current landscape of available tools and present an online interactive database, long-read-tools.org, to facilitate their browsing. We further focus on the principles of error correction, base modification detection, and long-read transcriptomics analysis and highlight the challenges that remain.


Subject(s)
Genomics/methods; Nanopore Sequencing/methods; Whole Genome Sequencing/methods; Animals; Data Science/methods; Data Science/standards; Genomics/standards; Humans; Nanopore Sequencing/standards; Whole Genome Sequencing/standards
11.
NAR Genom Bioinform ; 2(3): lqaa073, 2020 Sep.
Article in English | MEDLINE | ID: mdl-33575621

ABSTRACT

RNA-seq datasets can contain millions of intron reads per library that are typically removed from downstream analysis. Only reads overlapping annotated exons are considered to be informative since mature mRNA is assumed to be the major component sequenced, especially for poly(A) RNA libraries. In this study, we show that intron reads are informative and, through exploratory data analysis of read coverage, that intron signal is representative of both pre-mRNAs and intron retention. We demonstrate how intron reads can be utilized in differential expression analysis using our index method where a unique set of differentially expressed genes can be detected using intron counts. In exploring read coverage, we also developed the superintronic software that quickly and robustly calculates user-defined summary statistics for exonic and intronic regions. Across multiple datasets, superintronic enabled us to identify several genes with distinctly retained introns that had similar coverage levels to that of neighbouring exons. The work and ideas presented in this paper are the first of their kind to consider multiple biological sources for intron reads through exploratory data analysis, minimizing bias in discovery and interpretation of results. Our findings open up possibilities for further methods development for intron reads and RNA-seq data in general.

12.
Bioinformatics ; 36(7): 2288-2290, 2020 04 01.
Article in English | MEDLINE | ID: mdl-31778143

ABSTRACT

MOTIVATION: Bioinformatic analysis of single-cell gene expression data is a rapidly evolving field. Hundreds of bespoke methods have been developed in the past few years to deal with various aspects of single-cell analysis and consensus on the most appropriate methods to use under different settings is still emerging. Benchmarking the many methods is therefore of critical importance and since analysis of single-cell data usually involves multi-step pipelines, effective evaluation of pipelines involving different combinations of methods is required. Current benchmarks of single-cell methods are mostly implemented with ad-hoc code that is often difficult to reproduce or extend, and exhaustive manual coding of many combinations is infeasible in most instances. Therefore, new software is needed to manage pipeline benchmarking. RESULTS: The CellBench R software facilitates method comparisons in either a task-centric or combinatorial way to allow pipelines of methods to be evaluated in an effective manner. CellBench automatically runs combinations of methods, provides facilities for measuring running time and delivers output in tabular form which is highly compatible with tidyverse R packages for summary and visualization. Our software has enabled comprehensive benchmarking of single-cell RNA-seq normalization, imputation, clustering, trajectory analysis and data integration methods using various performance metrics obtained from data with available ground truth. CellBench is also amenable to benchmarking other bioinformatics analysis tasks. AVAILABILITY AND IMPLEMENTATION: Available from https://bioconductor.org/packages/CellBench.
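
The combinatorial evaluation described in this abstract (run every combination of methods across pipeline steps, time each run, return a table) can be outlined in a few lines of Python. This is a conceptual sketch only and does not reproduce the CellBench R interface; `run_combinations` and the row layout are hypothetical.

```python
import itertools
import time
from typing import Any, Callable, Dict, List, Sequence


def run_combinations(data: Any, steps: Sequence[Dict[str, Callable[[Any], Any]]]) -> List[dict]:
    """Apply every combination of one method per step and record the outcome.

    `steps` is an ordered list of {method_name: function} dictionaries, one per
    pipeline step. Each returned row holds the method names used, the final
    result and the elapsed wall-clock time in seconds.
    """
    rows = []
    method_names = [list(step) for step in steps]
    for combo in itertools.product(*method_names):
        started = time.perf_counter()
        result = data
        for step, name in zip(steps, combo):
            result = step[name](result)  # output of one step feeds the next
        rows.append(
            {
                "methods": combo,
                "result": result,
                "seconds": time.perf_counter() - started,
            }
        )
    return rows
```

Returning one row per combination keeps the output tabular, so it can be summarised or plotted without knowing how many steps or methods were involved.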


Subject(s)
RNA-Seq; Single-Cell Analysis; Computational Biology; Sequence Analysis, RNA; Software; Exome Sequencing
13.
Cell Stem Cell ; 25(2): 258-272.e9, 2019 08 01.
Article in English | MEDLINE | ID: mdl-31374198

ABSTRACT

Tumors are composed of phenotypically heterogeneous cancer cells that often resemble various differentiation states of their lineage of origin. Within this hierarchy, it is thought that an immature subpopulation of tumor-propagating cancer stem cells (CSCs) differentiates into non-tumorigenic progeny, providing a rationale for therapeutic strategies that specifically eradicate CSCs or induce their differentiation. The clinical success of these approaches depends on CSC differentiation being unidirectional rather than reversible, yet this question remains unresolved even in prototypically hierarchical malignancies, such as acute myeloid leukemia (AML). Here, we show in murine and human models of AML that, upon perturbation of endogenous expression of the lineage-determining transcription factor PU.1 or withdrawal of established differentiation therapies, some mature leukemia cells can de-differentiate and reacquire clonogenic and leukemogenic properties. Our results reveal plasticity of CSC maturation in AML, highlighting the need to therapeutically eradicate cancer cells across a range of differentiation states.


Subject(s)
Cell Differentiation/physiology; Cell Transdifferentiation/physiology; Leukemia, Myeloid, Acute/pathology; Neoplastic Stem Cells/physiology; Proto-Oncogene Proteins/metabolism; Trans-Activators/metabolism; Animals; Carcinogenesis; Cell Plasticity; Cells, Cultured; Humans; Leukemia, Myeloid, Acute/metabolism; Mice; Proto-Oncogene Proteins/genetics; Trans-Activators/genetics; Tretinoin/metabolism
14.
F1000Res ; 8: 752, 2019.
Article in English | MEDLINE | ID: mdl-31249680

ABSTRACT

Motivation: The Bioconductor project, a large collection of open source software for the comprehension of large-scale biological data, continues to grow with new packages added each week, motivating the development of software tools focused on exposing package metadata to developers and users. The resulting BiocPkgTools package facilitates access to extensive metadata in computable form covering the Bioconductor package ecosystem, facilitating downstream applications such as custom reporting, data and text mining of Bioconductor package text descriptions, graph analytics over package dependencies, and custom search approaches. Results: The BiocPkgTools package has been incorporated into the Bioconductor project, installs using standard procedures, and runs on any system supporting R. It provides functions to load detailed package metadata, longitudinal package download statistics, package dependencies, and Bioconductor build reports, all in "tidy data" form. BiocPkgTools can convert from tidy data structures to graph structures, enabling graph-based analytics and visualization. An end-user-friendly graphical package explorer aids in task-centric package discovery. Full documentation and example use cases are included. Availability: The BiocPkgTools software and complete documentation are available from Bioconductor (https://bioconductor.org/packages/BiocPkgTools).


Subject(s)
Data Mining; Software; Metadata
15.
Nat Methods ; 16(6): 479-487, 2019 06.
Article in English | MEDLINE | ID: mdl-31133762

ABSTRACT

Single cell RNA-sequencing (scRNA-seq) technology has undergone rapid development in recent years, leading to an explosion in the number of tailored data analysis methods. However, the current lack of gold-standard benchmark datasets makes it difficult for researchers to systematically compare the performance of the many methods available. Here, we generated a realistic benchmark experiment that included single cells and admixtures of cells or RNA to create 'pseudo cells' from up to five distinct cancer cell lines. In total, 14 datasets were generated using both droplet and plate-based scRNA-seq protocols. We compared 3,913 combinations of data analysis methods for tasks ranging from normalization and imputation to clustering, trajectory analysis and data integration. Evaluation revealed pipelines suited to different types of data for different tasks. Our data and analysis provide a comprehensive framework for benchmarking most common scRNA-seq analysis steps.


Subject(s)
Adenocarcinoma/genetics; Benchmarking; Computational Biology/methods; High-Throughput Nucleotide Sequencing/methods; Lung Neoplasms/genetics; Sequence Analysis, RNA/methods; Single-Cell Analysis/methods; Humans; Software; Tumor Cells, Cultured
16.
PLoS Comput Biol ; 14(8): e1006361, 2018 08.
Article in English | MEDLINE | ID: mdl-30096152

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) technology allows researchers to profile the transcriptomes of thousands of cells simultaneously. Protocols that incorporate both designed and random barcodes have greatly increased the throughput of scRNA-seq, but give rise to a more complex data structure. There is a need for new tools that can handle the various barcoding strategies used by different protocols and exploit this information for quality assessment at the sample-level and provide effective visualization of these results in preparation for higher-level analyses. To this end, we developed scPipe, an R/Bioconductor package that integrates barcode demultiplexing, read alignment, UMI-aware gene-level quantification and quality control of raw sequencing data generated by multiple protocols that include CEL-seq, MARS-seq, Chromium 10X, Drop-seq and Smart-seq. scPipe produces a count matrix that is essential for downstream analysis along with an HTML report that summarises data quality. These results can be used as input for downstream analyses including normalization, visualization and statistical testing. scPipe performs this processing in a few simple R commands, promoting reproducible analysis of single-cell data that is compatible with the emerging suite of open-source scRNA-seq analysis tools available in R/Bioconductor and beyond. The scPipe R package is available for download from https://www.bioconductor.org/packages/scPipe.
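
UMI-aware gene-level quantification, as mentioned in this abstract, amounts to counting distinct molecules rather than reads. The Python sketch below shows the simplest form of that idea, assuming reads have already been demultiplexed and assigned to genes and that UMIs are compared by exact match; it is not scPipe's implementation, which may also correct UMI sequencing errors.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Read = Tuple[str, str, str]  # (cell barcode, gene, UMI), an assumed input layout


def umi_count_matrix(reads: Iterable[Read]) -> Dict[str, Dict[str, int]]:
    """Count unique (cell, gene, UMI) combinations as {gene: {cell: count}}."""
    seen = set()
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for cell, gene, umi in reads:
        key = (cell, gene, umi)
        if key in seen:
            continue  # PCR duplicate of a molecule that was already counted
        seen.add(key)
        counts[gene][cell] += 1
    # Convert nested defaultdicts to plain dicts for a predictable return type.
    return {gene: dict(row) for gene, row in counts.items()}
```

Genes form the rows and cells the columns of the resulting sparse matrix, which matches the orientation of the count matrices described in these records.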


Subject(s)
Computational Biology/methods; Sequence Analysis, RNA/methods; Single-Cell Analysis/methods; Animals; Base Sequence; High-Throughput Nucleotide Sequencing; Humans; RNA/genetics; Software
17.
Bioinformatics ; 33(13): 2050-2052, 2017 Jul 01.
Article in English | MEDLINE | ID: mdl-28203714

ABSTRACT

MOTIVATION: Graphics for RNA-sequencing and microarray gene expression analyses may contain upwards of tens of thousands of points. Details about certain genes or samples of interest are easily obscured in such dense summary displays. Incorporating interactivity into summary plots would enable additional information to be displayed on demand and facilitate intuitive data exploration. RESULTS: The open-source Glimma package creates interactive graphics for exploring gene expression analysis with a few simple R commands. It extends popular plots found in the limma package, such as multi-dimensional scaling plots and mean-difference plots, to allow individual data points to be queried and additional annotation information to be displayed upon hovering or selecting particular points. It also offers links between plots so that more information can be revealed on demand. Glimma is widely applicable, supporting data analyses from a number of well-established Bioconductor workflows (limma, edgeR and DESeq2) and uses D3/JavaScript to produce HTML pages with interactive displays that enable more effective data exploration by end-users. Results from Glimma can be easily shared between bioinformaticians and biologists, enhancing reporting capabilities while maintaining reproducibility. AVAILABILITY AND IMPLEMENTATION: The Glimma R package is available from http://bioconductor.org/packages/Glimma/. CONTACT: su.s@wehi.edu.au, law@wehi.edu.au or mritchie@wehi.edu.au.


Subject(s)
Gene Expression Profiling/methods; Sequence Analysis, RNA/methods; Software; Animals; Mice
18.
Cell Rep ; 17(2): 436-447, 2016 10 04.
Article in English | MEDLINE | ID: mdl-27705792

ABSTRACT

Innate lymphoid cells (ILCs) are enriched at mucosal surfaces, where they provide immune surveillance. All ILC subsets develop from a common progenitor that gives rise to pre-committed progenitors for each of the ILC lineages. Currently, the temporal control of gene expression that guides the emergence of these progenitors is poorly understood. We used global transcriptional mapping to analyze gene expression in different ILC progenitors. We identified PD-1 to be specifically expressed in PLZF+ ILCp and revealed that the timing and order of expression of the transcription factors NFIL3, ID2, and TCF-1 was critical. Importantly, induction of ILC lineage commitment required only transient expression of NFIL3 prior to ID2 and TCF-1 expression. These findings highlight the importance of the temporal program that permits commitment of progenitors to the ILC lineage, and they expand our understanding of the core transcriptional program by identifying potential regulators of ILC development.


Subject(s)
Basic-Leucine Zipper Transcription Factors/genetics; Hepatocyte Nuclear Factor 1-alpha/genetics; Immunity, Innate/immunology; Lymphocytes/immunology; Programmed Cell Death 1 Receptor/genetics; Animals; Basic-Leucine Zipper Transcription Factors/immunology; Bone Marrow Cells/immunology; Cell Differentiation/genetics; Cell Differentiation/immunology; Cell Lineage/immunology; Gene Expression Regulation; Hepatocyte Nuclear Factor 1-alpha/immunology; Immunity, Innate/genetics; Killer Cells, Natural/immunology; Mice; Programmed Cell Death 1 Receptor/immunology; Transcription Factors/genetics; Transcription Factors/immunology
19.
F1000Res ; 5, 2016.
Article in English | MEDLINE | ID: mdl-27441086

ABSTRACT

The ability to easily and efficiently analyse RNA-sequencing data is a key strength of the Bioconductor project. Starting with counts summarised at the gene-level, a typical analysis involves pre-processing, exploratory data analysis, differential expression testing and pathway analysis with the results obtained informing future experiments and validation studies. In this workflow article, we analyse RNA-sequencing data from the mouse mammary gland, demonstrating use of the popular edgeR package to import, organise, filter and normalise the data, followed by the limma package with its voom method, linear modelling and empirical Bayes moderation to assess differential expression and perform gene set testing. This pipeline is further enhanced by the Glimma package which enables interactive exploration of the results so that individual samples and genes can be examined by the user. The complete analysis offered by these three packages highlights the ease with which researchers can turn the raw counts from an RNA-sequencing experiment into biological insights using Bioconductor.
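
The voom method named in this abstract works on log2 counts per million. As a rough illustration of that transformation, the Python sketch below uses the offsets documented for voom (0.5 added to each count, 1 added to each library size). It assumes library sizes are plain column sums and ignores the normalisation factors a real edgeR/limma analysis would apply; the function name is hypothetical.

```python
import math
from typing import List


def log_cpm(counts: List[List[int]]) -> List[List[float]]:
    """log2 counts per million for a genes-by-samples matrix of raw counts.

    counts[g][s] is the count for gene g in sample s. Library sizes are the
    column sums (an assumption; no normalisation factors are applied).
    """
    n_samples = len(counts[0])
    lib_sizes = [sum(row[s] for row in counts) for s in range(n_samples)]
    return [
        [
            # Offsets avoid log(0) and keep the ratio strictly below one.
            math.log2((row[s] + 0.5) / (lib_sizes[s] + 1.0) * 1e6)
            for s in range(n_samples)
        ]
        for row in counts
    ]
```

Working on this scale puts samples with different sequencing depths on a comparable footing before linear modelling.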

20.
Nucleic Acids Res ; 43(15): e97, 2015 Sep 03.
Article in English | MEDLINE | ID: mdl-25925576

ABSTRACT

Variations in sample quality are frequently encountered in small RNA-sequencing experiments, and pose a major challenge in a differential expression analysis. Removal of high variation samples reduces noise, but at a cost of reducing power, thus limiting our ability to detect biologically meaningful changes. Similarly, retaining these samples in the analysis may not reveal any statistically significant changes due to the higher noise level. A compromise is to use all available data, but to down-weight the observations from more variable samples. We describe a statistical approach that facilitates this by modelling heterogeneity at both the sample and observational levels as part of the differential expression analysis. At the sample level this is achieved by fitting a log-linear variance model that includes common sample-specific or group-specific parameters that are shared between genes. The estimated sample variance factors are then converted to weights and combined with observational level weights obtained from the mean-variance relationship of the log-counts-per-million using 'voom'. A comprehensive analysis involving both simulations and experimental RNA-sequencing data demonstrates that this strategy leads to a universally more powerful analysis and fewer false discoveries when compared to conventional approaches. This methodology has wide application and is implemented in the open-source 'limma' package.
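
The down-weighting step described in this abstract can be illustrated with a small Python sketch. The abstract only states that sample variance factors are "converted to weights and combined" with observation-level weights; the sketch assumes the conversion is a reciprocal and the combination is a product, and all names are hypothetical rather than limma's.

```python
from typing import List, Sequence


def combine_weights(
    obs_weights: List[List[float]],
    sample_variance_factors: Sequence[float],
) -> List[List[float]]:
    """Scale genes-by-samples observation weights by per-sample quality weights.

    A sample with variance factor v receives sample weight 1 / v (assumed),
    so noisier samples contribute less to every gene.
    """
    sample_weights = [1.0 / v for v in sample_variance_factors]
    return [[w * sw for w, sw in zip(row, sample_weights)] for row in obs_weights]


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average, the simplest estimator that uses the combined weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)
```

With this scheme a highly variable sample is retained in the analysis but pulls the estimate less than a well-behaved one, which is the compromise the abstract describes.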


Subject(s)
Gene Expression Profiling/methods; Sequence Analysis, RNA/methods; Animals; Cell Line, Tumor; Chromosomal Proteins, Non-Histone/genetics; Humans; Linear Models; Mice; Reproducibility of Results