MetaSpark: a spark-based distributed processing tool to recruit metagenomic reads to reference genomes.
Bioinformatics; 33(7): 1090-1092, 2017 Apr 01.
Article in English | MEDLINE | ID: mdl-28065898
Summary: With the advent of next-generation sequencing, traditional bioinformatics tools are challenged by massive raw metagenomic datasets. One of the bottlenecks in metagenomic studies is the lack of data analysis tools suited to large-scale and cloud computing. In this paper, we propose a Spark-based tool, called MetaSpark, to recruit metagenomic reads to reference genomes. MetaSpark benefits from Spark's resilient distributed dataset (RDD), which lets it cache datasets in memory across cluster nodes and scale well with dataset size. Compared with previous metagenomic recruitment tools, MetaSpark recruited significantly more reads than programs such as SOAP2, BWA and LAST, and increased the number of recruited reads by ∼4% compared with FR-HIT on a test with 1 million reads and 0.75 GB of reference sequences. Different test cases demonstrate MetaSpark's scalability and overall high performance.
Availability: https://github.com/zhouweiyg/metaspark.
Contact: bniu@sccas.cn, jingluo@ynu.edu.cn.
Supplementary information: Supplementary data are available at Bioinformatics online.
Database: MEDLINE
Main subject: Software / Metagenomics / High-Throughput Nucleotide Sequencing
Study type: Evaluation studies
Limit: Humans
Language: English
Publication year: 2017
Document type: Article