DNA-m6A calling and integrated long-read epigenetic and genetic analysis with fibertools.
Jha, Anupama; Bohaczuk, Stephanie C; Mao, Yizi; Ranchalis, Jane; Mallory, Benjamin J; Min, Alan T; Hamm, Morgan O; Swanson, Elliott; Dubocanin, Danilo; Finkbeiner, Connor; Li, Tony; Whittington, Dale; Noble, William Stafford; Stergachis, Andrew B; Vollger, Mitchell R.
Affiliation
  • Jha A; Department of Genome Sciences, University of Washington, Seattle, WA, USA.
  • Bohaczuk SC; Division of Medical Genetics, University of Washington School of Medicine, Seattle, WA, USA.
  • Mao Y; Division of Medical Genetics, University of Washington School of Medicine, Seattle, WA, USA.
  • Ranchalis J; Division of Medical Genetics, University of Washington School of Medicine, Seattle, WA, USA.
  • Mallory BJ; Department of Genome Sciences, University of Washington, Seattle, WA, USA.
  • Min AT; Department of Statistics, University of Washington, Seattle, WA, USA.
  • Hamm MO; Department of Genome Sciences, University of Washington, Seattle, WA, USA.
  • Swanson E; Department of Genome Sciences, University of Washington, Seattle, WA, USA.
  • Dubocanin D; Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA.
  • Finkbeiner C; Department of Genome Sciences, University of Washington, Seattle, WA, USA.
  • Li T; Department of Genome Sciences, University of Washington, Seattle, WA, USA.
  • Whittington D; Department of Medicinal Chemistry, University of Washington, Seattle, WA, USA.
  • Noble WS; Department of Genome Sciences, University of Washington, Seattle, WA, USA.
  • Stergachis AB; Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA.
  • Vollger MR; Department of Genome Sciences, University of Washington, Seattle, WA, USA.
bioRxiv ; 2023 Dec 11.
Article in En | MEDLINE | ID: mdl-37131601
ABSTRACT
Long-read DNA sequencing has recently emerged as a powerful tool for studying both genetic and epigenetic architectures at single-molecule and single-nucleotide resolution. Long-read epigenetic studies encompass both the direct identification of native cytosine methylation and the identification of exogenously placed DNA N6-methyladenine (DNA-m6A). However, detecting DNA-m6A modifications using single-molecule sequencing, as well as co-processing single-molecule genetic and epigenetic architectures, is limited by computational demands and a lack of supporting tools. Here, we introduce fibertools, a state-of-the-art toolkit that features a semi-supervised convolutional neural network for fast and accurate identification of m6A-marked bases using PacBio single-molecule long-read sequencing, as well as the co-processing of long-read genetic and epigenetic data produced using either PacBio or Oxford Nanopore sequencing platforms. We demonstrate accurate DNA-m6A identification (>90% precision and recall) along >20 kilobase long DNA molecules with a ~1,000-fold improvement in speed. In addition, we demonstrate that fibertools can readily integrate genetic and epigenetic data at single-molecule resolution, including the seamless conversion between molecular and reference coordinate systems, allowing for accurate genetic and epigenetic analyses of long-read data within structurally and somatically variable genomic regions.
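
The conversion between molecular and reference coordinate systems mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the fibertools implementation; it assumes a read's alignment has already been reduced to sorted, gap-free blocks, and the names `Block`, `molecule_to_reference`, `reference_to_molecule`, and `example_blocks` are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Assumption: each block is (molecule_start, reference_start, length) for one
# gap-free aligned segment, 0-based, sorted, and non-overlapping.
Block = Tuple[int, int, int]


def _lift(blocks: List[Block], pos: int, src: int, dst: int) -> Optional[int]:
    """Map pos from coordinate system `src` to `dst` (tuple indices 0 or 1)."""
    starts = [b[src] for b in blocks]
    i = bisect_right(starts, pos) - 1  # last block starting at or before pos
    if i < 0:
        return None  # pos lies before the first aligned block
    offset = pos - blocks[i][src]
    if offset >= blocks[i][2]:
        return None  # pos falls in a gap (insertion or deletion)
    return blocks[i][dst] + offset


def molecule_to_reference(blocks: List[Block], pos: int) -> Optional[int]:
    """Lift a molecule position (e.g. an m6A call) to the reference."""
    return _lift(blocks, pos, src=0, dst=1)


def reference_to_molecule(blocks: List[Block], pos: int) -> Optional[int]:
    """Lift a reference position back onto the molecule."""
    return _lift(blocks, pos, src=1, dst=0)


# Hypothetical alignment: a 10-base insertion in the molecule after base 49,
# then a 30-base deletion relative to the reference after molecule base 99.
example_blocks: List[Block] = [(0, 1000, 50), (60, 1050, 40), (100, 1120, 30)]
```

Positions inside an insertion or deletion have no counterpart in the other coordinate system, so the sketch returns `None` for them rather than guessing a neighbouring base. How such positions are handled in structurally variable regions is a design choice that a real tool would need to make explicit.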

Full text: 1 Collection: 01-internacional Database: MEDLINE Language: En Journal: BioRxiv Year: 2023 Document type: Article Affiliation country: United States