User-friendly bioinformatics pipeline gDAT (graphical downstream analysis tool) for analysing rDNA sequences.

Vasar, Martti; Davison, John; Neuenkamp, Lena; Sepp, Siim-Kaarel; Young, J Peter W; Moora, Mari; Öpik, Maarja

Affiliation

Vasar M; Department of Botany, University of Tartu, Tartu, Estonia.
Davison J; Department of Botany, University of Tartu, Tartu, Estonia.
Neuenkamp L; Institute of Plant Sciences, University of Bern, Bern, Switzerland.
Sepp SK; Department of Botany, University of Tartu, Tartu, Estonia.
Young JPW; Department of Biology, University of York, York, UK.
Moora M; Department of Botany, University of Tartu, Tartu, Estonia.
Öpik M; Department of Botany, University of Tartu, Tartu, Estonia.

Mol Ecol Resour; 21(4): 1380-1392, 2021 May.

Article in English | MEDLINE | ID: mdl-33527735

ABSTRACT

High-throughput sequencing (HTS) of multiple organisms in parallel (metabarcoding) has become a routine and cost-effective method for the analysis of microbial communities in environmental samples. However, careful data treatment is required to identify potential errors in HTS data, and the large volume of data generated by HTS requires in-house experience with command line tools for downstream analysis. This paper introduces a pipeline, gDAT, that incorporates the most common command line tools into an easy-to-use graphical interface. By using the Python scripting language, the pipeline is compatible with the latest Windows, macOS and Linux operating systems. The pipeline supports analysis of Sanger, 454, IonTorrent, Illumina and PacBio sequences, allows custom modification of quality filtering steps, and implements both open and closed-reference operational taxonomic unit-picking for sequence identification. Predefined parameters are optimized for analysis of small subunit (SSU) rRNA gene amplicons from arbuscular mycorrhizal fungi, but the pipeline is widely applicable to metabarcoding studies targeting a broad range of organisms. The pipeline was additionally tested with data using general eukaryotic primers from the SSU gene region and fungal primers from the internal transcribed spacer (ITS) marker region. We describe the pipeline design and evaluate its performance and speed by conducting analysis of example data sets using different marker regions sequenced on Illumina platforms. The graphical interface, with the option to use the command line if needed, provides an accessible tool for rapid data analysis with repeatability and logging capabilities. Keeping the software open-source maximizes code accessibility, allowing scrutiny and bug fixes by the community.

Subjects

Computational Biology; Fungi; High-Throughput Nucleotide Sequencing; Software; DNA, Fungal/genetics; DNA, Ribosomal Spacer/genetics; Fungi/genetics

Keywords

arbuscular mycorrhizal fungi; high-throughput sequencing; pipeline; sequencing data analysis; software; teaching tool

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Software / Computational Biology / High-Throughput Nucleotide Sequencing / Fungi Language: En Journal: Mol Ecol Resour Publication year: 2021 Document type: Article Affiliation country: Estonia
