KmerAperture: Retaining k-mer synteny for alignment-free extraction of core and accessory differences between bacterial genomes.
Moore, Matthew P; Laager, Mirjam; Ribeca, Paolo; Didelot, Xavier.
Affiliation
  • Moore MP; School of Life Sciences, University of Warwick, Coventry, United Kingdom.
  • Laager M; Department of Statistics, University of Warwick, Coventry, United Kingdom.
  • Ribeca P; Division of Transplant Immunology and Nephrology, University Hospital Basel, Basel, Switzerland.
  • Didelot X; UK Health Security Agency, London, United Kingdom.
PLoS Genet ; 20(4): e1011184, 2024 Apr.
Article in En | MEDLINE | ID: mdl-38683871
ABSTRACT
By decomposing genome sequences into k-mers, it is possible to estimate genome differences without alignment. Techniques such as k-mer minimisers, for example MinHash, have been developed and often accurately approximate distances based on full k-mer sets. These and other alignment-free methods avoid the large temporal and computational expense of alignment. However, these k-mer set comparisons are not entirely accurate within-species and can be completely inaccurate within-lineage. This is due, in part, to their inability to distinguish core polymorphism from accessory differences. Here we present a new approach, KmerAperture, which uses information on the k-mer relative genomic positions to determine the type of polymorphism causing differences in k-mer presence and absence between pairs of genomes. Single SNPs are expected to result in k unique contiguous k-mers per genome. On the other hand, a contiguous series of length S > k may be caused by an accessory difference of length S-k+1, when the start and end of the sequence are contiguous with homologous sequence. Alternatively, such a series may be caused by multiple SNPs within k bp of each other, and KmerAperture can determine whether that is the case. To demonstrate use cases, KmerAperture was benchmarked using datasets including a very low diversity simulated population with accessory content independent from the number of SNPs, a simulated population where SNPs are spatially dense, a moderately diverse real cluster of genomes (Escherichia coli ST1193) with a large accessory genome and a low diversity real genome cluster (Salmonella Typhimurium ST34). We show that KmerAperture can accurately distinguish both core and accessory sequence diversity without alignment, outperforming other k-mer based tools.
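The run-length arithmetic in the abstract can be checked on toy data. The sketch below is not the KmerAperture implementation. It is a minimal illustration that assumes random sequences, forward-strand k-mers only (no canonical k-mers), and k = 19. All function names are hypothetical. It counts contiguous runs of k-mers found in one genome but not the other, taking S to be the run length.

```python
import random

def kmers(seq, k):
    """Return all k-mers of seq in genomic order (forward strand only)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def unique_run_lengths(query, reference, k):
    """Lengths of contiguous runs of query k-mers absent from reference."""
    ref_kmers = set(kmers(reference, k))
    runs, current = [], 0
    for kmer in kmers(query, k):
        if kmer not in ref_kmers:
            current += 1          # extend the current run of unique k-mers
        elif current:
            runs.append(current)  # a shared k-mer closes the run
            current = 0
    if current:
        runs.append(current)
    return runs

def other_base(base, rng):
    """Pick a nucleotide different from base."""
    return rng.choice([b for b in "ACGT" if b != base])

def with_snp(seq, pos, rng):
    """Substitute the base at pos with a different one."""
    return seq[:pos] + other_base(seq[pos], rng) + seq[pos + 1:]

def with_insertion(seq, pos, length, rng):
    """Insert `length` bases before seq[pos].

    The two end bases are forced to differ from the adjacent flanking
    bases so the unique run is not shortened by a chance match.
    """
    middle = "".join(rng.choice("ACGT") for _ in range(length - 2))
    insert = other_base(seq[pos], rng) + middle + other_base(seq[pos - 1], rng)
    return seq[:pos] + insert + seq[pos:]

rng = random.Random(0)
K = 19
reference = "".join(rng.choice("ACGT") for _ in range(2000))

# One SNP: a run of K unique k-mers in each genome.
snp_genome = with_snp(reference, 1000, rng)
snp_runs = (unique_run_lengths(snp_genome, reference, K),
            unique_run_lengths(reference, snp_genome, K))

# A 300 bp insertion flanked by homologous sequence: a run of S = 300 + K - 1.
ins_genome = with_insertion(reference, 500, 300, rng)
ins_runs = unique_run_lengths(ins_genome, reference, K)

# Two SNPs 5 bp apart: a single run longer than K, with no accessory sequence.
dense_genome = with_snp(with_snp(reference, 1000, rng), 1005, rng)
dense_runs = unique_run_lengths(dense_genome, reference, K)

print(snp_runs, ins_runs, [s - K + 1 for s in ins_runs], dense_runs)
```

In this toy setting the insertion length is recovered as S-k+1, while the two nearby SNPs also yield a run longer than k. That is the ambiguity the abstract says KmerAperture resolves. How it does so is not shown here.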
Subjects

Full text: 1 Database: MEDLINE Main subject: Genome, Bacterial / Polymorphism, Single Nucleotide Language: En Publication year: 2024 Document type: Article
