Search | BVS Educação Profissional em Saúde

Comparative evaluation of full-length isoform quantification from RNA-Seq.

Sarantopoulou, Dimitra; Brooks, Thomas G; Nayak, Soumyashant; Mrcela, Antonijo; Lahens, Nicholas F; Grant, Gregory R.

BMC Bioinformatics ; 22(1): 266, 2021 May 25.

Article in English | MEDLINE | ID: mdl-34034652

ABSTRACT

BACKGROUND: Full-length isoform quantification from RNA-Seq is a key goal in transcriptomics analyses and has been an area of active development since the beginning. The fundamental difficulty stems from the fact that RNA transcripts are long, while RNA-Seq reads are short. RESULTS: Here we use simulated benchmarking data that reflects many properties of real data, including polymorphisms, intron signal and non-uniform coverage, allowing for systematic comparative analyses of isoform quantification accuracy and its impact on differential expression analysis. Genome, transcriptome and pseudo alignment-based methods are included; and a simple approach is included as a baseline control. CONCLUSIONS: Salmon, kallisto, RSEM, and Cufflinks exhibit the highest accuracy on idealized data, while on more realistic data they do not perform dramatically better than the simple approach. We determine the structural parameters with the greatest impact on quantification accuracy to be length and sequence compression complexity and not so much the number of isoforms. The effect of incomplete annotation on performance is also investigated. Overall, the tested methods show sufficient divergence from the truth to suggest that full-length isoform quantification and isoform level DE should still be employed selectively.

Subjects

Gene Expression Profiling , Transcriptome , Protein Isoforms/genetics , RNA-Seq , Sequence Analysis, RNA

Yanagi: Fast and interpretable segment-based alternative splicing and gene expression analysis.

Gunady, Mohamed K; Mount, Stephen M; Corrada Bravo, Héctor.

BMC Bioinformatics ; 20(1): 421, 2019 Aug 13.

Article in English | MEDLINE | ID: mdl-31409274

ABSTRACT

BACKGROUND: Ultra-fast pseudo-alignment approaches are the tool of choice in transcript-level RNA sequencing (RNA-seq) analyses. Unfortunately, these methods couple the tasks of pseudo-alignment and transcript quantification. This coupling precludes the direct usage of pseudo-alignment to other expression analyses, including alternative splicing or differential gene expression analysis, without including a non-essential transcript quantification step. RESULTS: In this paper, we introduce a transcriptome segmentation approach to decouple these two tasks. We propose an efficient algorithm to generate maximal disjoint segments given a transcriptome reference library on which ultra-fast pseudo-alignment can be used to produce per-sample segment counts. We show how to apply these maximally unambiguous count statistics in two specific expression analyses - alternative splicing and gene differential expression - without the need of a transcript quantification step. Our experiments based on simulated and experimental data showed that the use of segment counts, like other methods that rely on local coverage statistics, provides an advantage over approaches that rely on transcript quantification in detecting and correctly estimating local splicing in the case of incomplete transcript annotations. CONCLUSIONS: The transcriptome segmentation approach implemented in Yanagi exploits the computational and space efficiency of pseudo-alignment approaches. It significantly expands their applicability and interpretability in a variety of RNA-seq analyses by providing the means to model and capture local coverage variation in these analyses.

Subjects

Algorithms , Transcriptome , Alternative Splicing , Animals , Area Under Curve , Drosophila/genetics , Humans , RNA/chemistry , RNA/metabolism , ROC Curve , Sequence Analysis, RNA
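The segmentation idea in the Yanagi abstract above can be illustrated with a minimal sketch. This is not Yanagi's algorithm, which the abstract does not specify in detail; it is a simplified stand-in that cuts exon intervals at every exon boundary so that each resulting segment is covered by one fixed set of transcripts. The function name `disjoint_segments`, the half-open interval convention, and the toy transcript identifiers are all assumptions made for illustration.

```python
def disjoint_segments(transcripts):
    """Split exon intervals into disjoint segments labelled by covering transcripts.

    transcripts: dict mapping a transcript id to a list of half-open
    (start, end) exon intervals on one shared coordinate axis.
    Returns a list of (start, end, frozenset_of_transcript_ids) sorted by start.
    Simplified illustration only; not the Yanagi implementation.
    """
    # Coverage by transcripts is constant between consecutive exon boundaries.
    boundaries = sorted(
        {p for exons in transcripts.values() for s, e in exons for p in (s, e)}
    )
    segments = []
    for start, end in zip(boundaries, boundaries[1:]):
        covering = frozenset(
            tid
            for tid, exons in transcripts.items()
            if any(s <= start and end <= e for s, e in exons)
        )
        if not covering:
            continue  # intronic or intergenic gap: no transcript covers it
        # Merge with the previous segment when it is adjacent and has the same label,
        # so that each reported segment is as long as possible.
        if segments and segments[-1][1] == start and segments[-1][2] == covering:
            segments[-1] = (segments[-1][0], end, covering)
        else:
            segments.append((start, end, covering))
    return segments


# Hypothetical toy gene with two isoforms that differ in their second exon.
toy = {"t1": [(0, 100), (200, 300)], "t2": [(0, 100), (250, 300)]}
for seg in disjoint_segments(toy):
    print(seg[0], seg[1], sorted(seg[2]))
```

Reads assigned to a segment such as (200, 250) are unambiguous evidence for `t1` alone, which is the kind of local count statistic the abstract describes using in place of full transcript quantification.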

Fast and Accurate Classification of Meta-Genomics Long Reads With deSAMBA.

Li, Gaoyang; Liu, Yongzhuang; Li, Deying; Liu, Bo; Li, Junyi; Hu, Yang; Wang, Yadong.

Front Cell Dev Biol ; 9: 643645, 2021.

Article in English | MEDLINE | ID: mdl-34012962

ABSTRACT

There is still a lack of fast and accurate classification tools to identify the taxonomies of noisy long reads, which is a bottleneck to the use of the promising long-read metagenomic sequencing technologies. Herein, we propose de Bruijn graph-based Sparse Approximate Match Block Analyzer (deSAMBA), a tailored long-read classification approach that uses a novel pseudo alignment algorithm based on sparse approximate match block (SAMB). Benchmarks on real sequencing datasets demonstrate that deSAMBA achieves high yields and fast speed simultaneously, outperforming state-of-the-art tools, and has considerable potential for cutting-edge metagenomics studies.

Reference genome and transcriptome informed by the sex chromosome complement of the sample increase ability to detect sex differences in gene expression from RNA-Seq data.

Olney, Kimberly C; Brotman, Sarah M; Andrews, Jocelyn P; Valverde-Vesling, Valeria A; Wilson, Melissa A.

Biol Sex Differ ; 11(1): 42, 2020 Jul 21.

Article in English | MEDLINE | ID: mdl-32693839

ABSTRACT

BACKGROUND: Human X and Y chromosomes share an evolutionary origin and, as a consequence, sequence similarity. We investigated whether the sequence homology between the X and Y chromosomes affects the alignment of RNA-Seq reads and estimates of differential expression. We tested the effects of using reference genomes and reference transcriptomes informed by the sex chromosome complement of the sample's genome on the measurements of RNA-Seq abundance and sex differences in expression. RESULTS: The default genome includes the entire human reference genome (GRCh38), including the entire sequence of the X and Y chromosomes. We created two sex chromosome complement informed reference genomes. One sex chromosome complement informed reference genome was used for samples that lacked a Y chromosome; for this reference genome version, we hard-masked the entire Y chromosome. For the other sex chromosome complement informed reference genome, to be used for samples with a Y chromosome, we hard-masked only the pseudoautosomal regions of the Y chromosome, because these regions are duplicated identically in the reference genome on the X chromosome. We analyzed the transcript abundance in the whole blood, brain cortex, breast, liver, and thyroid tissues from 20 genetic female (46, XX) and 20 genetic male (46, XY) samples. Each sample was aligned twice: once to the default reference genome and then independently aligned to a reference genome informed by the sex chromosome complement of the sample, repeated using two different read aligners, HISAT and STAR. We then quantified sex differences in gene expression using featureCounts to get the raw count estimates followed by Limma/Voom for normalization and differential expression. We additionally created sex chromosome complement informed transcriptome references for use in pseudo-alignment using Salmon. Transcript abundance was quantified twice for each sample: once to the default target transcripts and then independently to target transcripts informed by the sex chromosome complement of the sample. CONCLUSIONS: We show that regardless of the choice of the read aligner, using an alignment protocol informed by the sex chromosome complement of the sample results in higher expression estimates on the pseudoautosomal regions of the X chromosome in both genetic male and genetic female samples, as well as an increased number of unique genes being called as differentially expressed between the sexes. We additionally show that using a pseudo-alignment approach informed on the sex chromosome complement of the sample eliminates Y-linked expression in female XX samples.

Subjects

Chromosomes, Human, X/genetics , Chromosomes, Human, Y/genetics , Gene Expression Regulation/physiology , Genome, Human , RNA-Seq , Transcriptome , Female , Humans , Male
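The hard-masking step described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names `hard_mask` and `mask_reference`, the chromosome key `"chrY"`, and the toy coordinates are hypothetical, and real pseudoautosomal region coordinates would have to be taken from the GRCh38 annotation.

```python
def hard_mask(sequence, regions):
    """Return `sequence` with every half-open (start, end) region replaced by 'N'."""
    masked = list(sequence)
    for start, end in regions:
        # Clamp to the sequence bounds so out-of-range regions are harmless.
        for i in range(max(0, start), min(len(masked), end)):
            masked[i] = "N"
    return "".join(masked)


def mask_reference(chromosomes, has_y, par_regions_y, y_name="chrY"):
    """Build a sex chromosome complement informed reference (illustrative only).

    chromosomes: dict mapping chromosome name to its sequence string.
    has_y: True for samples with a Y chromosome, False otherwise.
    par_regions_y: half-open (start, end) pseudoautosomal regions on Y
                   (placeholder values here; use real annotation in practice).
    """
    masked = dict(chromosomes)
    if y_name in masked:
        y_seq = masked[y_name]
        if has_y:
            # Sample has a Y: mask only the regions duplicated on the X.
            masked[y_name] = hard_mask(y_seq, par_regions_y)
        else:
            # Sample lacks a Y: mask the entire Y chromosome.
            masked[y_name] = "N" * len(y_seq)
    return masked


# Toy sequences and toy PAR coordinates, for illustration only.
toy_genome = {"chrX": "ACGTACGTAC", "chrY": "ACGTTTGGCA"}
toy_par = [(0, 3), (8, 10)]
print(mask_reference(toy_genome, has_y=True, par_regions_y=toy_par)["chrY"])
print(mask_reference(toy_genome, has_y=False, par_regions_y=toy_par)["chrY"])
```

Masking with `N` rather than deleting the sequence keeps chromosome coordinates unchanged, so annotations built against the default reference remain valid.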

Using equivalence class counts for fast and accurate testing of differential transcript usage.

Cmero, Marek; Davidson, Nadia M; Oshlack, Alicia.

F1000Res ; 8: 265, 2019.

Article in English | MEDLINE | ID: mdl-31143443

ABSTRACT

Background: RNA sequencing has enabled high-throughput and fine-grained quantitative analyses of the transcriptome. While differential gene expression is the most widely used application of this technology, RNA-seq data also has the resolution to infer differential transcript usage (DTU), which can elucidate the role of different transcript isoforms between experimental conditions, cell types or tissues. DTU has typically been inferred from exon-count data, which has issues with assigning reads unambiguously to counting bins, and requires alignment of reads to the genome. Recently, approaches have emerged that use transcript quantification estimates directly for DTU. Transcript counts can be inferred from 'pseudo' or lightweight aligners, which are significantly faster than traditional genome alignment. However, recent evaluations show lower sensitivity in DTU analysis. Transcript abundances are estimated from equivalence classes (ECs), which determine the transcripts that any given read is compatible with. Recent work has proposed performing differential expression testing directly on equivalence class read counts. Methods: Here we demonstrate that ECs can be used effectively with existing count-based methods for detecting DTU. We evaluate this approach on simulated human and Drosophila data, as well as on a real dataset through subset testing. Results: We find that EC counts have similar sensitivity and false discovery rates as exon-level counts but can be generated in a fraction of the time through the use of pseudo-aligners. Conclusions: We posit that equivalence class read counts are a natural unit on which to perform many types of analysis.

Subjects

Gene Expression Profiling , Protein Isoforms , Transcriptome , Animals , Exons , Humans , Mice , Sequence Analysis, RNA
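The notion of equivalence class counts used in the abstract above can be made concrete with a short sketch. An equivalence class is the set of transcripts a read is compatible with, and its count is the number of reads sharing that set. The code assumes the per-read compatibility sets are already available (in practice a pseudo-aligner would produce them); the function name `equivalence_class_counts` and the read and transcript identifiers are hypothetical.

```python
from collections import Counter


def equivalence_class_counts(read_compatibility):
    """Count reads per equivalence class.

    read_compatibility: dict mapping a read id to an iterable of the
    transcript ids that read is compatible with.
    Returns a Counter keyed by frozenset of transcript ids.
    """
    counts = Counter()
    for transcripts in read_compatibility.values():
        ec = frozenset(transcripts)
        if ec:  # reads compatible with no transcript belong to no class
            counts[ec] += 1
    return counts


# Hypothetical compatibility sets for four reads.
reads = {
    "r1": ["tA", "tB"],
    "r2": ["tB", "tA"],  # same class as r1: order does not matter
    "r3": ["tA"],
    "r4": [],            # unmapped read, ignored
}
for ec, n in equivalence_class_counts(reads).items():
    print(sorted(ec), n)
```

A table of such counts per sample, with one row per equivalence class, is the kind of input that existing count-based DTU methods could then test directly.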

Improved RNA-seq Workflows Using CyVerse Cyberinfrastructure.

Chougule, Kapeel M; Wang, Liya; Stein, Joshua C; Wang, Xiaofei; Devisetty, Upendra Kumar; Klein, Robert R; Ware, Doreen.

Curr Protoc Bioinformatics ; 63(1): e53, 2018 Sep.

Article in English | MEDLINE | ID: mdl-30168903

ABSTRACT

RNA-seq is a vital method for understanding gene structure and expression patterns. Typical RNA-seq analysis protocols use sequencing reads of length 50 to 150 nucleotides for alignment to the reference genome and assembly of transcripts. The resultant transcripts are quantified and used for differential expression and visualization. Existing tools and protocols for RNA-seq are vast and diverse; given their differences in performance, it is critical to select an analysis protocol that is scalable, accurate, and easy to use. Tuxedo, a popular alignment-based protocol for RNA-seq analysis, has been updated with HISAT2, StringTie, StringTie-merge, and Ballgown, and the updated protocol outperforms its predecessor. Similarly, new pseudo-alignment-based protocols like Kallisto and Sleuth reduce runtime and improve performance. However, these tools are challenging for researchers lacking command-line experience. Here, we describe two new RNA-seq analysis protocols, in which all tools are deployed on CyVerse Cyberinfrastructure with user-friendly graphical user interfaces, and validate their performance using plant RNA-seq data. © 2018 by John Wiley & Sons, Inc.

Subjects

Sequence Analysis, RNA , Software , Gene Expression Profiling , Molecular Sequence Annotation , RNA, Messenger/genetics , RNA, Messenger/metabolism , Sorghum/genetics
