Long-read sequence and assembly of segmental duplications.
Vollger, Mitchell R; Dishuck, Philip C; Sorensen, Melanie; Welch, AnneMarie E; Dang, Vy; Dougherty, Max L; Graves-Lindsay, Tina A; Wilson, Richard K; Chaisson, Mark J P; Eichler, Evan E.
Affiliation
  • Vollger MR; Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA.
  • Dishuck PC; Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA.
  • Sorensen M; Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA.
  • Welch AE; Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA.
  • Dang V; Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA.
  • Dougherty ML; Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA.
  • Graves-Lindsay TA; The McDonnell Genome Institute at Washington University, Washington University School of Medicine, St. Louis, MO, USA.
  • Wilson RK; Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH, USA.
  • Chaisson MJP; Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH, USA.
  • Eichler EE; University of Southern California, Los Angeles, CA, USA. mchaisso@usc.edu.
Nat Methods; 16(1): 88-94, 2019 Jan.
Article in English | MEDLINE | ID: mdl-30559433
ABSTRACT
We have developed a computational method based on polyploid phasing of long sequence reads to resolve collapsed regions of segmental duplications within genome assemblies. Segmental Duplication Assembler (SDA; https://github.com/mvollger/SDA) constructs graphs in which paralogous sequence variants define the nodes and long-read sequences provide attraction and repulsion edges, enabling the partition and assembly of long reads corresponding to distinct paralogs. We apply it to single-molecule, real-time sequence data from three human genomes and recover 33-79 megabase pairs (Mb) of duplications in which approximately half of the loci are diverged (<99.8%) compared to the reference genome. We show that the corresponding sequence is highly accurate (>99.9%) and that the diverged sequence corresponds to copy-number-variable paralogs that are absent from the human reference genome. Our method can be applied to other complex genomes to resolve the last gene-rich gaps, improve duplicate gene annotation, and better understand copy-number-variant genetic diversity at the base-pair level.
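The graph idea in the abstract can be illustrated with a toy sketch. This is not SDA's code or its clustering algorithm: the read representation (a set of PSV sites a read spans plus the set of PSVs it carries), the function names, the unit edge weights, and the use of simple union-find over net-positive edges are all assumptions made for illustration only. A read carrying two PSVs together adds an attraction between them; a read spanning both sites but carrying only one adds a repulsion.

```python
from collections import defaultdict
from itertools import combinations


def build_psv_graph(reads):
    """Net edge weight per PSV pair: +1 per attraction, -1 per repulsion (assumed scoring)."""
    weights = defaultdict(int)
    for covered, observed in reads:
        for a, b in combinations(sorted(covered), 2):
            seen = (a in observed) + (b in observed)
            if seen == 2:        # read carries both PSVs: attraction
                weights[(a, b)] += 1
            elif seen == 1:      # read spans both sites, carries one: repulsion
                weights[(a, b)] -= 1
    return weights


def cluster_psvs(weights):
    """Group PSVs joined by net-positive edges (a simplification, not SDA's method)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), w in weights.items():
        ra, rb = find(a), find(b)
        if w > 0 and ra != rb:
            parent[ra] = rb

    groups = defaultdict(set)
    for x in list(parent):
        groups[find(x)].add(x)
    return sorted(sorted(g) for g in groups.values())


def partition_reads(reads, groups):
    """Assign each read to the PSV group it shares the most PSVs with (None if none)."""
    assignment = []
    for _, observed in reads:
        overlaps = [len(observed & set(g)) for g in groups]
        best = max(range(len(groups)), key=lambda i: overlaps[i])
        assignment.append(best if overlaps[best] > 0 else None)
    return assignment


# Hypothetical data: two paralogs, one carrying PSVs 100 and 250, the other 180 and 320.
reads = [
    ({100, 180, 250}, {100, 250}),
    ({180, 250, 320}, {180, 320}),
    ({100, 180, 250, 320}, {100, 250}),
    ({100, 180, 250, 320}, {180, 320}),
]
groups = cluster_psvs(build_psv_graph(reads))
assignment = partition_reads(reads, groups)
print(groups)      # [[100, 250], [180, 320]]
print(assignment)  # [0, 1, 0, 1]
```

Each resulting read partition would then be assembled separately to recover one paralog; that assembly step is outside the scope of this sketch.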
Subjects

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Sequence Analysis, DNA / Computational Biology / Segmental Duplications, Genomic Limits: Humans Language: En Year of publication: 2019 Document type: Article