ABSTRACT
Wastewater-based epidemiology has proven to be an important public health asset during the COVID-19 pandemic. It can provide less biased and more cost-effective population-level monitoring of disease burden than clinical testing. An essential component of SARS-CoV-2 wastewater monitoring is next-generation sequencing, which provides genomic data to rapidly identify and quantify circulating viral strains. However, the specific choice of sequencing method influences the quality and timeliness of the generated data and hence its usefulness for wastewater-based pathogen surveillance. Here, we systematically benchmarked Illumina NovaSeq 6000, Element AVITI, ONT R9.4.1 MinION flow cell, and ONT R9.4.1 Flongle flow cell sequencing data to facilitate the selection of sequencing technology. Using a time series of wastewater samples from the influent of six wastewater treatment plants throughout Switzerland, along with spike-in experiments, we show that the higher sequencing error rates of ONT Nanopore sequencing reduce the accuracy of estimates of the relative abundance of viral variants, but the overall trend is in good concordance among all technologies. We find that the sequencing runtime for ONT Nanopore flow cells can be reduced to as little as five hours without significant impact on the quality of variant estimates. Our findings suggest that SARS-CoV-2 variant tracking is readily achievable with all tested technologies, albeit with different tradeoffs in terms of cost, timeliness, and accuracy.
ABSTRACT
The large amount and diversity of viral genomic datasets generated by next-generation sequencing technologies pose a set of challenges for computational data analysis workflows, including rigorous quality control, scaling to large sample sizes, and tailored steps for specific applications. Here, we present V-pipe 3.0, a computational pipeline designed for analyzing next-generation sequencing data of short viral genomes. It is developed to enable reproducible, scalable, adaptable, and transparent inference of the genetic diversity of viral samples. By presenting two large-scale data analysis projects, we demonstrate the effectiveness of V-pipe 3.0 in supporting sustainable viral genomic data science.
Subjects
Genetic Variation; Genome, Viral; High-Throughput Nucleotide Sequencing; Software; High-Throughput Nucleotide Sequencing/methods; Computational Biology/methods; Genomics/methods; Viruses/genetics; Humans
ABSTRACT
In cancer, genetic and transcriptomic variations generate clonal heterogeneity, possibly leading to treatment resistance. Long-read single-cell RNA sequencing (LR scRNA-seq) has the potential to detect genetic and transcriptomic variations simultaneously. Here, we present LongSom, a computational workflow leveraging LR scRNA-seq data to call de novo somatic single-nucleotide variants (SNVs), copy-number alterations (CNAs), and gene fusions to reconstruct tumor clonal heterogeneity. For SNV calling, LongSom distinguishes somatic SNVs from germline polymorphisms by reannotating marker gene expression-based cell types using called variants and applying strict filters. Applying LongSom to ovarian cancer samples, we detected clinically relevant somatic SNVs that were validated against single-cell and bulk panel DNA-seq data and could not be detected with short-read (SR) scRNA-seq. Leveraging somatic SNVs and fusions, LongSom found subclones with different predicted treatment outcomes. In summary, LongSom enables de novo detection of SNVs, CNAs, and fusions, and thereby the study of cancer evolution, clonal heterogeneity, and treatment resistance.
ABSTRACT
Gene fusions are found as cancer drivers in diverse adult and pediatric cancers. Accurate detection of fusion transcripts is essential for cancer clinical diagnostics and prognostics, and for guiding therapeutic development. Most currently available methods for fusion transcript detection are compatible with Illumina RNA-seq involving highly accurate short-read sequences. Recent advances in long-read isoform sequencing enable the detection of fusion transcripts at unprecedented resolution in bulk and single-cell samples. Here we developed a new computational tool, CTAT-LR-fusion, to detect fusion transcripts from long-read RNA-seq with or without companion short reads, with applications to bulk or single-cell transcriptomes. We demonstrate that CTAT-LR-fusion exceeds the fusion detection accuracy of alternative methods as benchmarked with simulated and real long-read RNA-seq. Using short- and long-read RNA-seq, we further apply CTAT-LR-fusion to bulk transcriptomes of nine tumor cell lines, and to tumor single cells derived from a melanoma sample and three metastatic high-grade serous ovarian carcinoma samples. In both bulk and single-cell RNA-seq, long isoform reads yielded higher sensitivity for fusion detection than short reads, with notable exceptions. By combining short and long reads in CTAT-LR-fusion, we are able to further maximize detection of fusion splicing isoforms and fusion-expressing tumor cells. CTAT-LR-fusion is available at https://github.com/TrinityCTAT/CTAT-LR-fusion/wiki.
ABSTRACT
Understanding the complex background of cancer requires genotype-phenotype information at single-cell resolution. Here, we perform long-read single-cell RNA sequencing (scRNA-seq) on clinical samples from three ovarian cancer patients presenting with omental metastasis and increase the PacBio sequencing depth to 12,000 reads per cell. Our approach captures 152,000 isoforms, of which over 52,000 were not previously reported. Isoform-level analysis accounting for non-coding isoforms reveals a 20% overestimation of protein-coding gene expression on average. We also detect cell type-specific isoform and polyadenylation site usage in tumor and mesothelial cells, and find that mesothelial cells transition into cancer-associated fibroblasts in the metastasis, partly through the TGF-β/miR-29/Collagen axis. Furthermore, we identify gene fusions, including an experimentally validated IGF2BP2::TESPA1 fusion, which is misclassified as high TESPA1 expression in matched short-read data, and call mutations confirmed by targeted NGS cancer gene panel results. With these findings, we envision long-read scRNA-seq becoming increasingly relevant in oncology and personalized medicine.