Natrix: a Snakemake-based workflow for processing, clustering, and taxonomically assigning amplicon sequencing reads.

Welzel, Marius; Lange, Anja; Heider, Dominik; Schwarz, Michael; Freisleben, Bernd; Jensen, Manfred; Boenigk, Jens; Beisser, Daniela

Affiliation

Welzel M; Department of Mathematics and Computer Science, University of Marburg, Marburg, Germany.
Lange A; Department of Bioinformatics and Computational Biophysics, University of Duisburg-Essen, Essen, Germany.
Heider D; Department of Mathematics and Computer Science, University of Marburg, Marburg, Germany.
Schwarz M; Department of Mathematics and Computer Science, University of Marburg, Marburg, Germany.
Freisleben B; Department of Mathematics and Computer Science, University of Marburg, Marburg, Germany.
Jensen M; Department of Biodiversity, University of Duisburg-Essen, Essen, Germany.
Boenigk J; Department of Biodiversity, University of Duisburg-Essen, Essen, Germany.
Beisser D; Department of Biodiversity, University of Duisburg-Essen, Essen, Germany. daniela.beisser@uni-due.de.

BMC Bioinformatics ; 21(1): 526, 2020 Nov 16.

Article in English | MEDLINE | ID: mdl-33198651

ABSTRACT

BACKGROUND:

Sequencing of marker genes amplified from environmental samples, known as amplicon sequencing, allows us to resolve some of the hidden diversity and elucidate evolutionary relationships and ecological processes among complex microbial communities. The analysis of large numbers of samples at high sequencing depths generated by high-throughput sequencing technologies requires efficient, flexible, and reproducible bioinformatics pipelines. Only a few existing workflows can be run in a user-friendly, scalable, and reproducible manner on different computing devices using an efficient workflow management system.

RESULTS:

We present Natrix, an open-source bioinformatics workflow for preprocessing raw amplicon sequencing data. The workflow covers all analysis steps, from quality assessment, read assembly, dereplication, chimera detection, split-sample merging, and sequence representative assignment (OTUs or ASVs) to the taxonomic assignment of sequence representatives. The workflow is written using Snakemake, a workflow management engine for developing data analysis workflows. Snakemake ensures reproducibility, and Conda provides version control of the utilized programs. The encapsulation of rules and their dependencies supports hassle-free sharing of rules between workflows and easy adaptation and extension of existing workflows. Natrix is freely available on GitHub ( https://github.com/MW55/Natrix ) or as a Docker container on DockerHub ( https://hub.docker.com/r/mw55/natrix ).

CONCLUSION:

Natrix is a user-friendly and highly extensible workflow for processing Illumina amplicon data.

Subjects

High-Throughput Nucleotide Sequencing; Software; Workflow; Cluster Analysis; DNA, Environmental/genetics; DNA, Environmental/isolation & purification; Data Analysis; Databases, Genetic; Floods; Microbiota/genetics; Reproducibility of Results

Keywords

Amplicon Sequence Variants; Amplicon sequencing; Illumina; Operational Taxonomic Units; Pipeline; Snakemake

Full text: 1 | Database: MEDLINE | Main subject: Software / Workflow / High-Throughput Nucleotide Sequencing | Study type: Prognostic_studies | Language: En | Journal: BMC Bioinformatics | Journal subject: MEDICAL INFORMATICS | Publication year: 2020 | Document type: Article | Affiliation country: Germany
