Recovering individual haplotypes and a contiguous genome assembly from pooled long-read sequencing of the diamondback moth (Lepidoptera: Plutellidae).
Whiteford, Samuel; Van't Hof, Arjen E; Krishna, Ritesh; Marubbi, Thea; Widdison, Stephanie; Saccheri, Ilik J; Guest, Marcus; Morrison, Neil I; Darby, Alistair C.
Affiliations
  • Whiteford S; Institute of Integrative Biology, University of Liverpool, Liverpool L69 7ZB, UK.
  • Van't Hof AE; Institute of Integrative Biology, University of Liverpool, Liverpool L69 7ZB, UK.
  • Krishna R; Institute of Integrative Biology, University of Liverpool, Liverpool L69 7ZB, UK.
  • Marubbi T; IBM Research UK, STFC Daresbury Laboratory, Warrington WA4 4AD, UK.
  • Widdison S; Oxitec Ltd., Abingdon OX14 4RQ, UK.
  • Saccheri IJ; General Bioinformatics, Jealott's Hill International Research Centre, Bracknell RG42 6EY, UK.
  • Guest M; Institute of Integrative Biology, University of Liverpool, Liverpool L69 7ZB, UK.
  • Morrison NI; Syngenta, Jealott's Hill International Research Centre, Bracknell, RG42 6EY, UK.
  • Darby AC; Oxitec Ltd., Abingdon OX14 4RQ, UK.
G3 (Bethesda); 12(10), 2022 Sep 30.
Article in English | MEDLINE | ID: mdl-35980174
The assembly of divergent haplotypes using noisy long-read data presents a challenge to the reconstruction of haploid genome assemblies, due to overlapping distributions of technical sequencing error, intralocus genetic variation, and interlocus similarity within these data. Here, we present a comparative analysis of assembly algorithms representing overlap-layout-consensus, repeat graph, and de Bruijn graph methods. We examine how postprocessing strategies attempting to reduce redundant heterozygosity interact with the choice of initial assembly algorithm and ultimately produce a series of chromosome-level assemblies for an agricultural pest, the diamondback moth, Plutella xylostella (L.). We compare evaluation methods and show that BUSCO analyses may overestimate haplotig removal processing in long-read draft genomes, in comparison to a k-mer method. We discuss the trade-offs inherent in assembly algorithm and curation choices and suggest that "best practice" is research question dependent. We demonstrate a link between allelic divergence and allele-derived contig redundancy in final genome assemblies and document the patterns of coding and noncoding diversity between redundant sequences. We also document a link between an excess of nonsynonymous polymorphism and haplotigs that are unresolved by assembly or postassembly algorithms. Finally, we discuss how this phenomenon may have relevance for the usage of noisy long-read genome assemblies in comparative genomics.
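The abstract compares BUSCO with "a k-mer method" for judging how much redundant haplotype sequence remains after postprocessing, but it does not describe that method. The following is a hypothetical illustration only and is not the authors' pipeline. Dedicated tools such as KAT and Merqury compare read k-mer spectra with assembly copy number. This sketch counts canonical k-mers across a set of contigs and reports the fraction of distinct k-mers that occur more than once. That fraction is a crude proxy for duplicated content, and unlike BUSCO it is not restricted to conserved single-copy genes. The function names and the default `k=21` are assumptions. Genuine repeats also inflate the score, so without read coverage this cannot distinguish haplotigs from true multicopy sequence.

```python
from collections import Counter

# Hypothetical helper names; illustrative sketch, not the paper's method.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
VALID_BASES = frozenset("ACGT")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def assembly_kmer_counts(contigs, k):
    """Count occurrences of each canonical k-mer across all contigs."""
    counts = Counter()
    for seq in contigs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip k-mers containing N or any other non-ACGT symbol.
            if set(kmer) <= VALID_BASES:
                counts[canonical(kmer)] += 1
    return counts


def duplicated_kmer_fraction(contigs, k=21):
    """Fraction of distinct canonical k-mers present more than once in the assembly.

    A retained haplotig that duplicates a primary contig pushes this value up,
    but so do genuine repeats; read coverage is needed to tell them apart.
    """
    counts = assembly_kmer_counts(contigs, k)
    if not counts:
        return 0.0
    duplicated = sum(1 for c in counts.values() if c > 1)
    return duplicated / len(counts)


if __name__ == "__main__":
    primary = "AAAACCCC"
    # A haplotig identical to the primary contig, stored in reverse-complement orientation.
    haplotig = "GGGGTTTT"
    print(duplicated_kmer_fraction([primary], k=4))
    print(duplicated_kmer_fraction([primary, haplotig], k=4))
```

Canonicalising k-mers means that a haplotig assembled in the opposite orientation still counts as duplicated sequence. Real assemblies would need a streaming or hashed k-mer counter rather than a Python `Counter`.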

Database: MEDLINE | Main subject: Moths | Language: English | Publication year: 2022 | Document type: Article