ABSTRACT
k-SLAM is a highly efficient algorithm for the characterization of metagenomic data. Unlike other ultra-fast metagenomic classifiers, k-SLAM performs full sequence alignment, allowing for gene identification and variant calling in addition to accurate taxonomic classification. Its k-mer-based method provides greater taxonomic accuracy than other classifiers and a speed increase of three orders of magnitude over alignment-based approaches. Using alignments to find variants and genes along with their taxonomic origins enables novel strains to be characterized. k-SLAM's speed makes full taxonomic classification and gene identification tractable on modern large data sets. A pseudo-assembly method increases classification accuracy by up to 40% for species that have high sequence homology within their genus.
Subject(s)
Computational Biology/methods , Taxonomic DNA Barcoding/methods , Metagenome , Metagenomics/methods , Algorithms , Case-Control Studies , Computational Biology/standards , Taxonomic DNA Barcoding/standards , Gastrointestinal Microbiome , Bacterial Genome , Humans , Liver Cirrhosis/microbiology , Metagenomics/standards , Reproducibility of Results , Shiga-Toxigenic Escherichia coli/classification , Shiga-Toxigenic Escherichia coli/genetics
ABSTRACT
SUMMARY: An ultrafast DNA sequence aligner (Isaac Genome Alignment Software) that takes advantage of high-memory hardware (>48 GB) and a variant caller (Isaac Variant Caller) have been developed. We demonstrate that our combined pipeline (Isaac) is four to five times faster than BWA + GATK on equivalent hardware, with comparable accuracy as measured by trio conflict rates and sensitivity. We further show that Isaac is effective in the detection of disease-causing variants and can be run easily and economically on commodity hardware. AVAILABILITY: Isaac has an open source license and can be obtained at https://github.com/sequencing.