RADProc: A computationally efficient de novo locus assembler for population studies using RADseq data.
Mol Ecol Resour; 19(1): 272-282, 2019 Jan.
Article in English | MEDLINE | ID: mdl-30312001
Restriction site-associated DNA sequencing (RADseq) is a powerful tool for genotyping individuals, but identifying loci and assigning sequence reads to them is a crucial and often challenging step. The optimal parameter settings for a given de novo RADseq assembly vary between data sets and can be difficult and computationally expensive to determine. Here, we introduce RADProc, a software package that uses a graph data structure to represent all sequence reads and their similarity relationships. Storing sequence-comparison results in a graph eliminates unnecessary and redundant sequence similarity calculations. De novo locus formation for a given parameter set can be performed on the precomputed graph, making parameter sweeps far more efficient. RADProc also uses a clustering approach for faster nucleotide-distance calculation. The performance of RADProc compares favourably with that of the widely used Stacks software. Run-time comparisons between RADProc and Stacks for 32 different parameter settings using 20 green crab (Carcinus maenas) samples showed that RADProc took as little as 2 hr 40 min compared with 78 hr for Stacks, while 16 brown trout (Salmo trutta L.) samples were processed by RADProc and Stacks in 23 and 263 hr, respectively. Comparisons of the de novo loci formed and the catalogs built using both methods demonstrate that the improvement in processing speed achieved by RADProc has little effect on the loci formed or on the results of downstream analyses based on those loci.
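The idea of computing read similarities once and reusing them across parameter settings can be illustrated with a minimal Python sketch. This is not RADProc's implementation: the function names, the Hamming-distance metric, the single-linkage grouping, the toy reads, and the parameters `min_depth` and `dist` are all assumptions made for illustration.

```python
from itertools import combinations


def hamming(a, b):
    """Number of mismatching positions between two equal-length reads."""
    return sum(x != y for x, y in zip(a, b))


def build_graph(read_counts, max_dist):
    """Compare every pair of unique reads once; keep edges up to max_dist."""
    edges = []
    for a, b in combinations(sorted(read_counts), 2):
        d = hamming(a, b)
        if d <= max_dist:
            edges.append((a, b, d))
    return edges


def form_loci(read_counts, edges, min_depth, dist):
    """Group reads into putative loci for one (min_depth, dist) setting.

    Only the stored edges are consulted, so no sequence comparison is
    repeated. Grouping is plain single-linkage (an assumption of this sketch).
    """
    # Keep only reads that are deep enough for this setting.
    parent = {r: r for r, c in read_counts.items() if c >= min_depth}

    def find(r):
        while parent[r] != r:
            parent[r] = parent[parent[r]]  # path halving
            r = parent[r]
        return r

    for a, b, d in edges:
        if d <= dist and a in parent and b in parent:
            parent[find(a)] = find(b)

    loci = {}
    for r in parent:
        loci.setdefault(find(r), set()).add(r)
    return sorted(sorted(group) for group in loci.values())


# Hypothetical unique reads with their observed depths.
reads = {"ACGTAC": 12, "ACGTAT": 9, "ACGAAT": 4, "TTGCAA": 7, "TTGCAG": 1}

# The expensive pairwise comparison happens once, at the largest distance.
graph = build_graph(reads, max_dist=2)

# Each parameter setting then reuses the same precomputed graph.
sweep = {(m, M): form_loci(reads, graph, m, M) for m in (2, 5) for M in (1, 2)}
```

In this sketch the pairwise comparison cost is paid once in `build_graph`, and each additional parameter setting only costs a pass over the stored edges, which is the property the abstract attributes to the graph-based approach.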
Database: MEDLINE
Main subject: Software / Sequence Analysis, DNA / Computational Biology / Genomics / Genetic Loci
Language: En
Publication year: 2019
Document type: Article