ABSTRACT
The Long-read RNA-Seq Genome Annotation Assessment Project Consortium was formed to evaluate the effectiveness of long-read approaches for transcriptome analysis. Using different protocols and sequencing platforms, the consortium generated over 427 million long-read sequences from complementary DNA and direct RNA datasets, encompassing human, mouse and manatee species. Developers utilized these data to address challenges in transcript isoform detection, quantification and de novo transcript detection. The study revealed that libraries with longer, more accurate sequences produce more accurate transcripts than those with increased read depth, whereas greater read depth improved quantification accuracy. In well-annotated genomes, tools based on reference sequences demonstrated the best performance. Incorporating additional orthogonal data and replicate samples is advised when aiming to detect rare and novel transcripts or using reference-free approaches. This collaborative study offers a benchmark for current practices and provides direction for future method development in transcriptome analysis.
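"More accurate transcripts" is usually quantified with two numbers measured against a set of known transcripts:

- **Sensitivity:** the fraction of reference transcripts that were recovered.
- **Precision:** the fraction of detected transcripts that are correct.

The sketch below illustrates both under a simplifying, hypothetical assumption: a detected transcript counts as correct only if its chromosome, strand and intron chain exactly match a reference transcript. It is not the consortium's evaluation code, and the names (`IntronChain`, `detection_accuracy`) are illustrative.

```python
"""Sensitivity and precision of transcript detection against a reference set.

Hypothetical simplification for illustration: a transcript is represented by
its chromosome, strand and intron chain, and a detected transcript is scored
as correct only if it exactly matches a reference transcript.
"""
from typing import FrozenSet, Tuple

# (chromosome, strand, tuple of (intron_start, intron_end) pairs)
IntronChain = Tuple[str, str, Tuple[Tuple[int, int], ...]]


def detection_accuracy(detected: FrozenSet[IntronChain],
                       reference: FrozenSet[IntronChain]) -> Tuple[float, float]:
    """Return (sensitivity, precision) for exact intron-chain matches."""
    true_positives = len(detected & reference)
    # Sensitivity: share of reference transcripts that were recovered.
    sensitivity = true_positives / len(reference) if reference else 0.0
    # Precision: share of detected transcripts that match the reference.
    precision = true_positives / len(detected) if detected else 0.0
    return sensitivity, precision


if __name__ == "__main__":
    reference = frozenset({
        ("chr1", "+", ((100, 200), (300, 400))),
        ("chr1", "+", ((100, 200), (500, 600))),
        ("chr2", "-", ((1000, 1100),)),
    })
    detected = frozenset({
        ("chr1", "+", ((100, 200), (300, 400))),  # matches the reference
        ("chr2", "-", ((1000, 1100),)),           # matches the reference
        # Absent from the reference: scored as a false positive here,
        # even if it were a genuine novel isoform.
        ("chr2", "-", ((1000, 1150),)),
    })
    sens, prec = detection_accuracy(detected, reference)
    print(f"sensitivity={sens:.2f} precision={prec:.2f}")
```

Under exact matching, a correct but unannotated isoform lowers precision. This is one reason why detecting rare and novel transcripts calls for orthogonal data rather than comparison with the annotation alone.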
Subject(s)
Gene Expression Profiling, RNA-Seq, Humans, Animals, Mice, RNA-Seq/methods, Gene Expression Profiling/methods, Transcriptome, Sequence Analysis, RNA/methods, Molecular Sequence Annotation/methods
ABSTRACT
The lack of benchmark data sets with built-in ground truth makes it challenging to compare the performance of existing long-read isoform detection and differential expression analysis workflows. Here, we present a benchmark experiment using two human lung adenocarcinoma cell lines, each profiled in triplicate together with synthetic, spliced, spike-in RNAs (sequins). Samples were deeply sequenced on both Illumina short-read and Oxford Nanopore Technologies long-read platforms. Alongside the ground truth available via the sequins, we created in silico mixture samples to allow performance assessment in the absence of true positives or true negatives. Our results show that StringTie2 and bambu outperformed the other four of the six isoform detection tools tested, and that DESeq2, edgeR and limma-voom performed best among the five differential transcript expression tools tested. No clear front-runner emerged among the five tools compared for differential transcript usage analysis, which suggests that further methods development is needed for this application.
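In silico mixtures provide a reference without true positives because of a simple linear relationship. Suppose a mixture draws a fraction p of its reads from cell line A and 1 - p from cell line B. The expected abundance of every transcript is then the same linear combination of the two pure-sample abundances. The difference between two mixtures is therefore (p2 - p1)(A - B), so the direction of change is known from the pure samples alone.

The sketch below makes these assumptions, none of which are taken from the study:

- Abundances are expressed in counts per million (CPM).
- The mixing proportions are illustrative (0.25 and 0.75).
- The function names are hypothetical.

```python
"""Expected transcript abundance in in silico mixtures of two samples.

Illustrative assumptions (not taken from the study): abundances are in counts
per million (CPM), and a mixture draws a fraction `p` of its reads from
sample A and `1 - p` from sample B.
"""
from typing import Dict


def expected_mixture_cpm(cpm_a: Dict[str, float], cpm_b: Dict[str, float],
                         p: float) -> Dict[str, float]:
    """Expected CPM per transcript when a fraction p of reads comes from A."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("mixing proportion p must lie in [0, 1]")
    transcripts = cpm_a.keys() | cpm_b.keys()
    # Transcripts absent from one sample contribute zero from that sample.
    return {t: p * cpm_a.get(t, 0.0) + (1.0 - p) * cpm_b.get(t, 0.0)
            for t in transcripts}


def expected_direction(cpm_a: Dict[str, float], cpm_b: Dict[str, float],
                       p1: float, p2: float) -> Dict[str, int]:
    """Sign (+1, 0, -1) of the expected change from mixture p1 to mixture p2.

    Since mix(p2) - mix(p1) = (p2 - p1) * (A - B), the sign follows from the
    pure samples, giving a reference for tools run on the mixtures.
    """
    mix1 = expected_mixture_cpm(cpm_a, cpm_b, p1)
    mix2 = expected_mixture_cpm(cpm_a, cpm_b, p2)
    # (bool - bool) yields +1, 0 or -1.
    return {t: (mix2[t] > mix1[t]) - (mix2[t] < mix1[t]) for t in mix1}


if __name__ == "__main__":
    cell_line_a = {"tx1": 100.0, "tx2": 10.0}
    cell_line_b = {"tx1": 20.0, "tx2": 10.0, "tx3": 50.0}
    print(expected_mixture_cpm(cell_line_a, cell_line_b, 0.75))
    print(expected_direction(cell_line_a, cell_line_b, 0.25, 0.75))
```

With these inputs, tx1 is expected to rise from the 25:75 to the 75:25 mixture. tx2 is expected to stay flat, and tx3 to fall. A differential expression tool applied to the mixtures can then be scored against these expected directions.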
Subject(s)
Gene Expression Profiling, High-Throughput Nucleotide Sequencing, Humans, Gene Expression Profiling/methods, High-Throughput Nucleotide Sequencing/methods, Benchmarking/methods, RNA, Protein Isoforms
ABSTRACT
Actinobacillus pleuropneumoniae is the cause of porcine pleuropneumonia, a severe respiratory tract infection that is responsible for major economic losses to the swine industry. Many host-adapted bacterial pathogens encode systems known as phasevarions (phase-variable regulons). Phasevarions result from variable expression of cytoplasmic DNA methyltransferases. This variable expression produces genome-wide methylation differences within a bacterial population, leading to altered expression of multiple genes via epigenetic mechanisms. Our examination of a diverse population of A. pleuropneumoniae strains determined that Type I and Type III DNA methyltransferases with the hallmarks of phase variation are present in this species. We demonstrate that phase variation is occurring in these methyltransferases, and show associations between particular Type III methyltransferase alleles and serovar. Using Pacific Biosciences Single-Molecule, Real-Time (SMRT) sequencing and Oxford Nanopore sequencing, we demonstrate the presence of the first characterised phase-variable, cytosine-specific Type III DNA methyltransferase. Phase variation of distinct Type III DNA methyltransferases in A. pleuropneumoniae regulates distinct phasevarions and produces multiple phenotypic differences relevant to pathobiology. Our characterisation of these newly described phasevarions in A. pleuropneumoniae will aid in the selection of stably expressed antigens, and will inform the development of a rationally designed subunit vaccine against this major veterinary pathogen.
Subject(s)
Actinobacillus pleuropneumoniae, Phase Variation, Animals, Swine, Actinobacillus pleuropneumoniae/genetics, Actinobacillus pleuropneumoniae/metabolism, DNA Modification Methylases/genetics, DNA Modification Methylases/metabolism, DNA Methylation, Methyltransferases/genetics, Methyltransferases/metabolism, Bacteria/genetics, DNA/metabolism
ABSTRACT
scPipe is a flexible R/Bioconductor package originally developed to analyse single-cell RNA-Seq data independently of the sequencing platform. To expand its preprocessing capability to accommodate new single-cell technologies, we further developed scPipe to handle single-cell ATAC-Seq and multi-modal (RNA-Seq and ATAC-Seq) data. After multiple data cleaning steps that remove duplicated reads, low-abundance features and poor-quality cells, scPipe creates a SingleCellExperiment object containing a sparse count matrix, with features of interest in the rows and cells in the columns. Quality control information (e.g. counts per cell, features per cell, total number of fragments, fraction of fragments per peak) and any relevant feature annotations are stored as metadata. We demonstrate that scPipe can efficiently identify 'true' cells, and that it lets the user fine-tune quality control thresholds using various feature- and cell-based metrics collected during data preprocessing. Researchers can then use the single-cell tools available in Bioconductor for downstream analysis of scATAC-Seq data, such as dimensionality reduction, clustering, motif enrichment, differential accessibility and cis-regulatory network analysis. The scPipe package enables a complete end-to-end pipeline for single-cell ATAC-Seq and RNA-Seq data analysis in R.
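The cell-level quality control step can be pictured as two stages. First, compute per-cell metrics from a sparse feature-by-cell count matrix. Second, keep only the cells that pass user-chosen thresholds.

The stdlib-only Python sketch below illustrates that idea for two of the metrics named above: counts per cell and features per cell. It is not scPipe's API, which is an R/Bioconductor package. The sparse `{(feature, cell): count}` representation, the function names and the threshold values are all hypothetical.

```python
"""Per-cell QC metrics and threshold filtering on a sparse feature-by-cell matrix.

Stdlib-only illustration; it does not reproduce scPipe's API. The matrix is
stored sparsely as {(feature, cell): count}, i.e. features in rows and cells
in columns. Threshold values used below are hypothetical.
"""
from collections import defaultdict
from typing import Dict, Set, Tuple

SparseCounts = Dict[Tuple[str, str], int]


def cell_qc_metrics(counts: SparseCounts) -> Dict[str, Dict[str, int]]:
    """Return total counts and number of detected features for each cell.

    Only non-zero entries are used, so a cell with no non-zero entries does
    not appear in the result.
    """
    total: Dict[str, int] = defaultdict(int)
    detected: Dict[str, int] = defaultdict(int)
    for (_feature, cell), value in counts.items():
        if value > 0:
            total[cell] += value
            detected[cell] += 1
    return {cell: {"counts": total[cell], "features": detected[cell]}
            for cell in total}


def filter_cells(counts: SparseCounts, min_counts: int,
                 min_features: int) -> Set[str]:
    """Keep cells whose metrics meet both user-tunable thresholds."""
    metrics = cell_qc_metrics(counts)
    return {cell for cell, m in metrics.items()
            if m["counts"] >= min_counts and m["features"] >= min_features}


if __name__ == "__main__":
    counts = {
        ("peak1", "cellA"): 5, ("peak2", "cellA"): 3, ("peak3", "cellA"): 2,
        ("peak1", "cellB"): 1,
        ("peak2", "cellC"): 4, ("peak3", "cellC"): 4,
    }
    print(cell_qc_metrics(counts))
    # Hypothetical thresholds; in practice they are tuned per data set.
    print(filter_cells(counts, min_counts=5, min_features=2))
```

Keeping the metrics separate from the filtering step lets thresholds be revised without recomputing the metrics. This mirrors the abstract's point that QC metrics are stored as metadata so the user can fine-tune cut-offs.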
ABSTRACT
The Long-read RNA-Seq Genome Annotation Assessment Project (LRGASP) Consortium was formed to evaluate the effectiveness of long-read approaches for transcriptome analysis. The consortium generated over 427 million long-read sequences from cDNA and direct RNA datasets, encompassing human, mouse, and manatee species, using different protocols and sequencing platforms. Developers used these data to address challenges in transcript isoform detection and quantification, as well as de novo transcript isoform identification. The study revealed that libraries with longer, more accurate sequences produce more accurate transcripts than those with increased read depth, whereas greater read depth improved quantification accuracy. In well-annotated genomes, tools based on reference sequences demonstrated the best performance. When aiming to detect rare and novel transcripts or when using reference-free approaches, incorporating additional orthogonal data and replicate samples is advised. This collaborative study offers a benchmark for current practices and provides direction for future method development in transcriptome analysis.
ABSTRACT
[This corrects the article DOI: 10.1016/j.dib.2022.107828.].
ABSTRACT
Radiotherapy injury to cells of the skin and subcutaneous tissue is an inevitable consequence of external beam radiation for the treatment of cancer. This sublethal injury to normal tissues plays a significant role in the development of fibrosis, lymphedema, impaired wound healing, and recurrent infections. To elucidate the transcriptional changes that occur in cells of the skin and soft tissues after radiotherapy injury, we performed genome-wide RNA sequencing comparing irradiated cells (10 Gy) with non-irradiated (0 Gy) controls. The comparison covered normal human dermal fibroblasts, normal human keratinocytes, human microvascular endothelial cells, human dermal lymphatic endothelial cells, pericytes and adipose-derived stem cell populations. These data are publicly available from the Gene Expression Omnibus database (accession number GSE184119). Further insights can be gained by comparing the radiation-injury mRNA signatures derived from these data with publicly available signatures from other studies of similar or different tissue types. The transcriptional targets identified across these cell types could potentially be manipulated to mitigate radiotherapy-induced soft tissue injury.
ABSTRACT
A modified Chromium 10x droplet-based protocol that subsamples cells for both short-read and long-read (nanopore) sequencing, together with a new computational pipeline (FLAMES), is developed to enable isoform discovery, splicing analysis, and mutation detection in single cells. We identify thousands of unannotated isoforms and find conserved functional modules, including ribosome biogenesis and mRNA splicing, that are enriched for alternative transcript usage in different cell types and species. Analysis at the transcript level allows integration with scATAC-seq data on individual promoters, improves correlation with protein expression data, and links mutations known to confer drug resistance to transcriptome heterogeneity.