Pacybara: accurate long-read sequencing for barcoded mutagenized allelic libraries.
Weile, Jochen; Ferra, Gabrielle; Boyle, Gabriel; Pendyala, Sriram; Amorosi, Clara; Yeh, Chiann-Ling; Cote, Atina G; Kishore, Nishka; Tabet, Daniel; van Loggerenberg, Warren; Rayhan, Ashyad; Fowler, Douglas M; Dunham, Maitreya J; Roth, Frederick P.
Affiliations
  • Weile J; Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON M5G 1X5, Canada.
  • Ferra G; Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada.
  • Boyle G; Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 3E1, Canada.
  • Pendyala S; Department of Computer Science, University of Toronto, Toronto, ON M5S 2E4, Canada.
  • Amorosi C; Department of Genome Sciences, University of Washington, Seattle, WA 98195, United States.
  • Yeh CL; Department of Genome Sciences, University of Washington, Seattle, WA 98195, United States.
  • Cote AG; Department of Genome Sciences, University of Washington, Seattle, WA 98195, United States.
  • Kishore N; Department of Genome Sciences, University of Washington, Seattle, WA 98195, United States.
  • Tabet D; Department of Genome Sciences, University of Washington, Seattle, WA 98195, United States.
  • van Loggerenberg W; Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON M5G 1X5, Canada.
  • Rayhan A; Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada.
  • Fowler DM; Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 3E1, Canada.
  • Dunham MJ; Department of Computer Science, University of Toronto, Toronto, ON M5S 2E4, Canada.
  • Roth FP; Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON M5G 1X5, Canada.
Bioinformatics; 40(4), 2024 Mar 29.
Article in En | MEDLINE | ID: mdl-38569896
ABSTRACT
MOTIVATION:

Long-read sequencing technologies, an attractive solution for many applications, often suffer from higher error rates. Alignment of multiple reads can improve base-calling accuracy, but some applications, e.g. sequencing mutagenized libraries where multiple distinct clones differ by one or a few variants, require the use of barcodes or unique molecular identifiers. Unfortunately, sequencing errors can interfere with correct barcode identification, and a given barcode sequence may be linked to multiple independent clones within a given library.

RESULTS:

Here we focus on the target application of sequencing mutagenized libraries in the context of multiplexed assays of variant effects (MAVEs). MAVEs are increasingly used to create comprehensive genotype-phenotype maps that can aid clinical variant interpretation. Many MAVE methods use long-read sequencing of barcoded mutant libraries for accurate association of barcode with genotype. Existing long-read sequencing pipelines do not account for inaccurate sequencing or nonunique barcodes. Here, we describe Pacybara, which handles these issues by clustering long reads based on the similarities of (error-prone) barcodes while also detecting barcodes that have been associated with multiple genotypes. Pacybara also detects recombinant (chimeric) clones and reduces false positive indel calls. In three example applications, we show that Pacybara identifies and correctly resolves these issues.

AVAILABILITY AND IMPLEMENTATION:

Pacybara, freely available at https://github.com/rothlab/pacybara, is implemented using R, Python, and bash for Linux. It runs on GNU/Linux HPC clusters via Slurm, PBS, or GridEngine schedulers. A single-machine simplex version is also available.
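The general idea described in the results (grouping reads whose error-prone barcodes are similar, then checking whether one barcode group carries more than one genotype) can be illustrated with a small toy sketch. This is not Pacybara's implementation: the greedy seeding strategy, the function names `cluster_barcodes` and `flag_nonunique`, the thresholds `max_dist`, `min_reads` and `min_fraction`, and the genotype labels are all assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def cluster_barcodes(reads, max_dist=1):
    """Group (barcode, genotype) reads by barcode similarity.

    Hypothetical greedy scheme: the most abundant barcodes become cluster
    seeds, and rarer barcodes within `max_dist` edits of a seed join it.
    """
    counts = Counter(bc for bc, _ in reads)
    seeds, assignment = [], {}
    for bc, _ in counts.most_common():
        for seed in seeds:
            if edit_distance(bc, seed) <= max_dist:
                assignment[bc] = seed
                break
        else:
            # No existing seed is close enough, so this barcode starts a cluster.
            seeds.append(bc)
            assignment[bc] = bc
    clusters = defaultdict(list)
    for bc, genotype in reads:
        clusters[assignment[bc]].append(genotype)
    return dict(clusters)


def flag_nonunique(clusters, min_reads=2, min_fraction=0.25):
    """Mark clusters in which more than one genotype is well supported."""
    report = {}
    for seed, genotypes in clusters.items():
        tally = Counter(genotypes)
        supported = [g for g, n in tally.items()
                     if n >= min_reads and n / len(genotypes) >= min_fraction]
        report[seed] = {
            "consensus": tally.most_common(1)[0][0],
            "nonunique": len(supported) > 1,
        }
    return report


# Made-up example reads: (barcode, genotype label).
reads = [
    ("ACGTACGT", "A10G"), ("ACGTACGT", "A10G"),
    ("ACGTACGA", "A10G"),  # barcode with one sequencing error
    ("TTGGCCAA", "C5T"), ("TTGGCCAA", "C5T"),
    ("TTGGCCAA", "G7A"), ("TTGGCCAA", "G7A"),  # same barcode, second genotype
]
clusters = cluster_barcodes(reads)
report = flag_nonunique(clusters)
```

In this toy data, the erroneous barcode `ACGTACGA` is absorbed into the `ACGTACGT` cluster, while `TTGGCCAA` is flagged because two genotypes each have substantial read support. A real pipeline would compare aligned reads rather than precomputed genotype labels and would need a more scalable neighbor search than all-against-all edit distances.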
Subjects

Full text: 1 Database: MEDLINE Main subject: Software / High-Throughput Nucleotide Sequencing Language: En Publication year: 2024 Document type: Article