metaSPARSim: a 16S rRNA gene sequencing count data simulator.
Patuzzi, Ilaria; Baruzzo, Giacomo; Losasso, Carmen; Ricci, Antonia; Di Camillo, Barbara.
Affiliation
  • Patuzzi I; Department of Information Engineering, University of Padova, via Giovanni Gradenigo, 6, Padova, 35131, Italy.
  • Baruzzo G; Microbial Ecology Unit, Istituto Zooprofilattico Sperimentale delle Venezie, Viale dell'Università, 10, Legnaro (PD), 35020, Italy.
  • Losasso C; Department of Information Engineering, University of Padova, via Giovanni Gradenigo, 6, Padova, 35131, Italy.
  • Ricci A; Microbial Ecology Unit, Istituto Zooprofilattico Sperimentale delle Venezie, Viale dell'Università, 10, Legnaro (PD), 35020, Italy.
  • Di Camillo B; Istituto Zooprofilattico Sperimentale delle Venezie, Viale dell'Università, 10, Legnaro (PD), 35020, Italy.
BMC Bioinformatics ; 20(Suppl 9): 416, 2019 Nov 22.
Article in English | MEDLINE | ID: mdl-31757204
BACKGROUND: In the last few years, 16S rRNA gene sequencing (16S rDNA-seq) has been adopted at a surprisingly rapid rate as a methodology for microbial community studies. Despite the considerable popularity of this technique, very few specific tools are currently available for proper 16S rDNA-seq count data preprocessing and simulation. Indeed, the great majority of tools have been developed by adapting methodologies previously used for bulk RNA-seq data, with poor assessment of their applicability in the metagenomics field. For such tools, and for the few specifically developed for 16S rDNA-seq data, performance assessment is challenging, mainly because of the complex nature of the data and the lack of realistic simulation models. In fact, to the best of our knowledge, no data simulation software is available that directly produces synthetic 16S rDNA-seq count tables properly modeling the heavy sparsity and compositionality typical of these data.
RESULTS: In this paper we present metaSPARSim, a sparse count matrix simulator intended for use in the development of 16S rDNA-seq metagenomic data processing pipelines. metaSPARSim implements a new generative process that models the sequencing process with a Multivariate Hypergeometric distribution in order to realistically simulate 16S rDNA-seq count tables that resemble the compositionality and sparsity of real experimental data. It provides ready-to-use count matrices and makes it possible to reproduce different pre-coded scenarios and to estimate simulation parameters from real experimental data. The tool is available at http://sysbiobig.dei.unipd.it/?q=Software#metaSPARSim and https://gitlab.com/sysbiobig/metasparsim.
CONCLUSION: metaSPARSim is able to generate count matrices resembling real 16S rDNA-seq data. The availability of count data simulators is extremely valuable both for method developers, who need a ground truth for tool validation, and for users who want to assess state-of-the-art analysis tools in order to choose the most accurate one. Thus, we believe that metaSPARSim is a valuable tool for researchers involved in developing, testing and using robust and reliable data analysis methods in the context of 16S rRNA gene sequencing.
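The abstract states that the sequencing step is modeled with a Multivariate Hypergeometric distribution, that is, reads are drawn without replacement from a finite pool of molecules. The following is only a minimal illustrative sketch of that single idea in Python, not the tool's own code or interface: the function name `simulate_counts`, the taxon labels, the abundances and the library size are all hypothetical, and the parameter estimation and pre-coded scenarios mentioned in the abstract are not reproduced.

```python
import random
from collections import Counter


def simulate_counts(abundances, library_size, seed=None):
    """Draw one sample's read counts without replacement.

    `abundances` maps each taxon to the (hypothetical) number of
    molecules available in the pool; `library_size` is the number of
    reads to draw. Sampling without replacement from such a pool is a
    multivariate hypergeometric draw.
    """
    rng = random.Random(seed)
    taxa = list(abundances)
    # random.sample(..., counts=...) treats each taxon as repeated
    # `count` times and samples without replacement (Python 3.9+).
    drawn = rng.sample(taxa, k=library_size,
                       counts=[abundances[t] for t in taxa])
    tally = Counter(drawn)
    # Report every taxon, including those that received zero reads.
    return {t: tally.get(t, 0) for t in taxa}


# Hypothetical community with a few dominant and several rare taxa.
community = {"taxon_A": 5000, "taxon_B": 300, "taxon_C": 20, "taxon_D": 2}
counts = simulate_counts(community, library_size=50, seed=1)
print(counts)
```

With a shallow library relative to the pool, rare taxa often receive zero reads, which is one way sparsity arises in such a model; and because the total is fixed at `library_size`, the counts are compositional by construction.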

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Software / RNA, Ribosomal, 16S / Metagenomics / High-Throughput Nucleotide Sequencing Limits: Animals / Humans Language: En Journal: BMC Bioinformatics Journal subject: MEDICAL INFORMATICS Year of publication: 2019 Document type: Article Country of affiliation: Italy
