SplitMEM: a graphical algorithm for pan-genome analysis with suffix skips.
Marcus, Shoshana; Lee, Hayan; Schatz, Michael C.
Affiliation
  • Marcus S; Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA and Department of Computer Science, Stony Brook University, Stony Brook, NY 11794, USA.
  • Lee H; Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA and Department of Computer Science, Stony Brook University, Stony Brook, NY 11794, USA.
  • Schatz MC; Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA and Department of Computer Science, Stony Brook University, Stony Brook, NY 11794, USA.
Bioinformatics ; 30(24): 3476-83, 2014 Dec 15.
Article in English | MEDLINE | ID: mdl-25398610
ABSTRACT
MOTIVATION:

Genomics is expanding from a single reference per species paradigm into a more comprehensive pan-genome approach that analyzes multiple individuals together. A compressed de Bruijn graph is a sophisticated data structure for representing the genomes of entire populations. It robustly encodes shared segments, simple single-nucleotide polymorphisms and complex structural variations far beyond what can be represented in a collection of linear sequences alone.

RESULTS:

We explore deep topological relationships between suffix trees and compressed de Bruijn graphs and introduce an algorithm, splitMEM, that directly constructs the compressed de Bruijn graph in time and space linear to the total number of genomes for a given maximum genome size. We introduce suffix skips to traverse several suffix links simultaneously and use them to efficiently decompose maximal exact matches into graph nodes. We demonstrate the utility of splitMEM by analyzing the nine-strain pan-genome of Bacillus anthracis and up to 62 strains of Escherichia coli, revealing their core-genome properties.
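
To make the target data structure concrete, the following is a minimal sketch of a compressed de Bruijn graph built the naive way: enumerate every k-mer, link k-mers that occur consecutively in some genome, then merge non-branching chains into single nodes. This is not the splitMEM algorithm, which derives the same graph from a suffix tree using suffix skips; the function name `compressed_de_bruijn`, its helpers and the toy sequences are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict


def compressed_de_bruijn(sequences, k):
    """Naive compressed de Bruijn graph (illustrative, not splitMEM).

    Returns (nodes, edges): nodes is a list of merged sequence labels,
    edges is a set of (from_label, to_label) pairs.
    """
    succ, pred = defaultdict(set), defaultdict(set)
    kmers = set()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            kmers.add(kmer)
            if i > 0:
                # Link each k-mer to the one that precedes it in this genome.
                prev = seq[i - 1:i - 1 + k]
                succ[prev].add(kmer)
                pred[kmer].add(prev)

    def merge_target(a):
        # a can absorb its successor b only if a -> b is a's sole out-edge
        # and b's sole in-edge (self-loops are never merged).
        outs = succ.get(a, ())
        if len(outs) != 1:
            return None
        (b,) = outs
        if b == a or len(pred[b]) != 1:
            return None
        return b

    def is_start(b):
        # b begins a compressed node unless its unique predecessor absorbs it.
        ins = pred.get(b, ())
        if len(ins) != 1:
            return True
        (a,) = ins
        return merge_target(a) != b

    nodes, visited = [], set()
    starts = sorted(km for km in kmers if is_start(km))
    # The second pass over all k-mers picks up isolated unbranched cycles.
    for start in starts + sorted(kmers):
        if start in visited:
            continue
        label, cur = start, start
        visited.add(cur)
        while True:
            nxt = merge_target(cur)
            if nxt is None or nxt in visited:
                break
            label += nxt[-1]  # extend the node label by one base
            visited.add(nxt)
            cur = nxt
        nodes.append(label)

    # Connect compressed nodes via the successors of each node's last k-mer.
    first_kmer_to_label = {label[:k]: label for label in nodes}
    edges = set()
    for label in nodes:
        for nxt in succ.get(label[-k:], ()):
            edges.add((label, first_kmer_to_label[nxt]))
    return nodes, edges


if __name__ == "__main__":
    # Two toy "genomes" that share a prefix and differ by one base.
    nodes, edges = compressed_de_bruijn(["ATCGTA", "ATCGCA"], 3)
    print(nodes)
    print(sorted(edges))
```

In the toy input the shared prefix collapses into one node and the single differing base produces two branches, which is the kind of shared-segment and variant structure the abstract describes. The naive approach materializes every k-mer, so it only serves to illustrate the output, not the efficiency of the published method.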
Subject(s)

Full text: 1 | Collection: 01-international | Database: MEDLINE | Main subject: Algorithms / Sequence Analysis, DNA / Genomics | Study type: Prognostic_studies | Language: English | Journal: Bioinformatics | Journal subject: MEDICAL INFORMATICS | Year: 2014 | Document type: Article | Country of affiliation: United States
