The statistical power of k-mer based aggregative statistics for alignment-free detection of horizontal gene transfer.
Huang, Guan-Da; Liu, Xue-Mei; Huang, Tian-Lai; Xia, Li-C.
Affiliation
  • Huang GD; School of Physics and Optoelectronics, South China University of Technology, Guangzhou, 510640, China.
  • Liu XM; School of Physics and Optoelectronics, South China University of Technology, Guangzhou, 510640, China.
  • Huang TL; School of Physics and Optoelectronics, South China University of Technology, Guangzhou, 510640, China.
  • Xia LC; Department of Medicine, Stanford University School of Medicine, Stanford, CA, 94305, USA.
Synth Syst Biotechnol ; 4(3): 150-156, 2019 Sep.
Article in En | MEDLINE | ID: mdl-31508512
ABSTRACT
Alignment-based database search and sequence comparison are commonly used to detect horizontal gene transfer (HGT). However, with the rapid increase in sequencing depth, hundreds of thousands of contigs are routinely assembled in metagenomic studies, which challenges alignment-based HGT analysis by overwhelming the known reference sequences. Detecting HGT with k-mer statistics thus becomes an attractive alternative. These alignment-free statistics have demonstrated high performance and efficiency in whole-genome and transcriptome comparisons. To adapt k-mer statistics for HGT detection, we developed two aggregative statistics, $T_{sum}^{S}$ and $T_{sum}^{*}$, which subsample metagenome contigs by their representative regions and summarize the regional $D_2^S$ and $D_2^*$ metrics by their upper bounds. We systematically studied the power of the aggregative statistics at different k-mer sizes using simulations. Our analysis showed that, in general, the power of $T_{sum}^{S}$ and $T_{sum}^{*}$ increases with sequencing coverage and reaches a maximum of >80% at k = 6, at a 5% Type-I error rate and a coverage ratio >0.2x. The statistical power of $T_{sum}^{S}$ and $T_{sum}^{*}$ was evaluated with realistic simulations of HGT mechanism, sequencing depth, read length, and base error. We expect these statistics to be useful distance metrics for identifying HGT in metagenomic studies.
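The abstract does not define the regional metrics it aggregates. As a rough illustration only, the sketch below computes the D2S and D2* statistics for a single pair of sequences as they are commonly formulated in the alignment-free literature, using centred k-mer counts. It assumes an i.i.d. base model with frequencies pooled from both sequences and uppercase ACGT input. The function names `kmer_counts` and `d2_statistics` are hypothetical, and the aggregation over representative regions is not attempted because the abstract does not specify it.

```python
from collections import Counter
from itertools import product
from math import prod, sqrt


def kmer_counts(seq, k):
    """Count overlapping k-mers in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2_statistics(x, y, k):
    """Return (D2S, D2*) for sequences x and y.

    Assumption: an i.i.d. base model whose frequencies are pooled
    from both sequences; the paper's background model may differ.
    """
    pooled = Counter(x) + Counter(y)
    total = sum(pooled[b] for b in "ACGT")
    freq = {b: pooled[b] / total for b in "ACGT"}

    n_bar = len(x) - k + 1  # number of k-mer positions in x
    m_bar = len(y) - k + 1  # number of k-mer positions in y
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)

    d2s = 0.0
    d2star = 0.0
    for letters in product("ACGT", repeat=k):
        w = "".join(letters)
        p_w = prod(freq[b] for b in w)  # word probability under the model
        if p_w == 0:
            continue  # word cannot occur under the model
        xt = cx[w] - n_bar * p_w  # centred count in x
        yt = cy[w] - m_bar * p_w  # centred count in y
        denom = sqrt(xt * xt + yt * yt)
        if denom > 0:
            d2s += xt * yt / denom
        d2star += xt * yt / (sqrt(n_bar * m_bar) * p_w)
    return d2s, d2star
```

Calling `d2_statistics(contig_a, contig_b, 6)` would use the k = 6 setting that the abstract reports as giving maximum power. Larger values indicate more similar k-mer composition.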

Full text: 1 Collections: 01-international Database: MEDLINE Study type: Diagnostic_studies Language: En Journal: Synth Syst Biotechnol Publication year: 2019 Document type: Article Country of affiliation: China
