Benchmarking of computational error-correction methods for next-generation sequencing data.

Mitchell, Keith; Brito, Jaqueline J; Mandric, Igor; Wu, Qiaozhen; Knyazev, Sergey; Chang, Sei; Martin, Lana S; Karlsberg, Aaron; Gerasimov, Ekaterina; Littman, Russell; Hill, Brian L; Wu, Nicholas C; Yang, Harry Taegyun; Hsieh, Kevin; Chen, Linus; Littman, Eli; Shabani, Taylor; Enik, German; Yao, Douglas; Sun, Ren; Schroeder, Jan; Eskin, Eleazar; Zelikovsky, Alex; Skums, Pavel; Pop, Mihai; Mangul, Serghei.

Genome Biol ; 21(1): 71, 2020 03 17.

Article in English | MEDLINE | ID: mdl-32183840

ABSTRACT

BACKGROUND: Recent advancements in next-generation sequencing have rapidly improved our ability to study genomic material at an unprecedented scale. Despite substantial improvements in sequencing technologies, errors present in the data still risk confounding downstream analysis and limiting the applicability of sequencing technologies in clinical tools. Computational error correction promises to eliminate sequencing errors, but the relative accuracy of error correction algorithms remains unknown.

RESULTS: In this paper, we evaluate the ability of error correction algorithms to fix errors across different types of datasets that contain various levels of heterogeneity. We highlight the advantages and limitations of computational error correction techniques across different domains of biology, including immunogenomics and virology. To demonstrate the efficacy of our technique, we apply the UMI-based high-fidelity sequencing protocol to eliminate sequencing errors from both simulated data and the raw reads. We then perform a realistic evaluation of error-correction methods.

CONCLUSIONS: In terms of accuracy, we find that method performance varies substantially across different types of datasets with no single method performing best on all types of examined data. Finally, we also identify the techniques that offer a good balance between precision and sensitivity.
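
NOTE: The balance between precision and sensitivity mentioned in the conclusions can be illustrated with a minimal Python sketch. It is not the authors' evaluation code. It assumes substitution errors only and three equal-length strings per read (the error-free read, the raw read, and the corrected read). The function names and the definitions of true positive (an error that was fixed), false positive (a correct base that was altered), and false negative (an error that was left wrong) are assumptions made for this illustration.

```python
def classify_bases(true_read, raw_read, corrected_read):
    """Count TP, FP, FN per base (assumed definitions, substitutions only)."""
    if not (len(true_read) == len(raw_read) == len(corrected_read)):
        raise ValueError("reads must have equal length in this sketch")
    tp = fp = fn = 0
    for t, r, c in zip(true_read, raw_read, corrected_read):
        if r != t:
            # The raw base is a sequencing error.
            if c == t:
                tp += 1  # error was fixed
            else:
                fn += 1  # error was left wrong or changed to another wrong base
        elif c != t:
            fp += 1      # a correct base was altered by the corrector
    return tp, fp, fn


def precision_sensitivity(tp, fp, fn):
    """Precision = TP / (TP + FP); sensitivity = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, sensitivity


# Hypothetical reads: two sequencing errors; one is fixed, one is missed,
# and one correct base is wrongly changed.
tp, fp, fn = classify_bases("ACGTACGT", "ACCTACGA", "ACGTTCGA")
precision, sensitivity = precision_sensitivity(tp, fp, fn)
print(tp, fp, fn, precision, sensitivity)
```

Under these assumed definitions, a corrector that changes few bases tends to score high precision and low sensitivity, and an aggressive one tends toward the reverse.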

Subject(s)

Algorithms , High-Throughput Nucleotide Sequencing , Benchmarking , Computational Biology/methods , Humans , Receptors, Antigen, T-Cell/genetics , Viruses/genetics , Whole Genome Sequencing

ROP: dumpster diving in RNA-sequencing to find the source of 1 trillion reads across diverse adult human tissues.

Mangul, Serghei; Yang, Harry Taegyun; Strauli, Nicolas; Gruhl, Franziska; Porath, Hagit T; Hsieh, Kevin; Chen, Linus; Daley, Timothy; Christenson, Stephanie; Wesolowska-Andersen, Agata; Spreafico, Roberto; Rios, Cydney; Eng, Celeste; Smith, Andrew D; Hernandez, Ryan D; Ophoff, Roel A; Santana, Jose Rodriguez; Levanon, Erez Y; Woodruff, Prescott G; Burchard, Esteban; Seibold, Max A; Shifman, Sagiv; Eskin, Eleazar; Zaitlen, Noah.

Genome Biol ; 19(1): 36, 2018 02 15.

Article in English | MEDLINE | ID: mdl-29548336

ABSTRACT

High-throughput RNA-sequencing (RNA-seq) technologies provide an unprecedented opportunity to explore the individual transcriptome. Unmapped reads are a large and often overlooked output of standard RNA-seq analyses. Here, we present Read Origin Protocol (ROP), a tool for discovering the source of all reads originating from complex RNA molecules. We apply ROP to samples across 2630 individuals from 54 diverse human tissues. Our approach can account for 99.9% of 1 trillion reads of various read length. Additionally, we use ROP to investigate the functional mechanisms underlying connections between the immune system, microbiome, and disease. ROP is freely available at https://github.com/smangul1/rop/wiki .

Subject(s)

Gene Expression Profiling/methods , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, RNA/methods , Software , Adult , Algorithms , Asthma/genetics , Bacteria/genetics , Bacteria/isolation & purification , Cell Line , Genes, Immunoglobulin , Genes, T-Cell Receptor , Humans
