Reproducibility of Illumina platform deep sequencing errors allows accurate determination of DNA barcodes in cells.

Beltman, Joost B; Urbanus, Jos; Velds, Arno; van Rooij, Nienke; Rohr, Jan C; Naik, Shalin H; Schumacher, Ton N

Affiliation

Beltman JB; Division of Immunology, The Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX, Amsterdam, The Netherlands. j.b.beltman@lacdr.leidenuniv.nl.
Urbanus J; Division of Toxicology, Leiden Academic Centre for Drug Research, Leiden University, 2333 CC, Leiden, The Netherlands. j.b.beltman@lacdr.leidenuniv.nl.
Velds A; Division of Immunology, The Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX, Amsterdam, The Netherlands.
van Rooij N; Genomics Core Facility, The Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX, Amsterdam, The Netherlands.
Rohr JC; Division of Immunology, The Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX, Amsterdam, The Netherlands.
Naik SH; Division of Immunology, The Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX, Amsterdam, The Netherlands.
Schumacher TN; Center for Chronic Immunodeficiency (CCI), University Medical Center Freiburg and University of Freiburg, Freiburg, Germany.

BMC Bioinformatics ; 17: 151, 2016 Apr 02.

Article in English | MEDLINE | ID: mdl-27038897

ABSTRACT

BACKGROUND:

Next generation sequencing (NGS) of amplified DNA is a powerful tool to describe genetic heterogeneity within cell populations, and it can be used both to investigate the clonal structure of cell populations and to perform genetic lineage tracing. For applications in which both abundant and rare sequences are biologically relevant, the relatively high error rate of NGS techniques complicates data analysis, as it is difficult to distinguish rare true sequences from spurious sequences that are generated by PCR or sequencing errors. This issue applies, for instance, to cellular barcoding strategies that aim to follow the number and type of offspring of single cells by supplying these cells with unique heritable DNA tags.

RESULTS:

Here, we use genetic barcoding data from the Illumina HiSeq platform to show that straightforward read threshold-based filtering of data is typically insufficient to filter out spurious barcodes. Importantly, we demonstrate that specific sequencing errors occur at an approximately constant rate across different samples that are sequenced in parallel. We exploit this observation by developing a novel approach to filter out spurious sequences.
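The abstract does not spell out the filtering procedure, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that an error-derived barcode differs from an abundant "parent" barcode by a single substitution, and that its read count is a roughly constant fraction of the parent's count in every sample sequenced in parallel. The function names, the `min_parent_reads` and `max_cv` parameters, and the example counts are all hypothetical.

```python
from statistics import mean, pstdev


def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def flag_spurious(counts, min_parent_reads=1000, max_cv=0.25):
    """Flag barcodes whose reads track a one-mismatch neighbour at a constant rate.

    counts: dict mapping barcode -> list of read counts, one per sample.
    min_parent_reads and max_cv are illustrative thresholds (assumptions).
    Returns the set of barcodes flagged as likely error-derived.
    """
    spurious = set()
    for child, child_reads in counts.items():
        for parent, parent_reads in counts.items():
            if parent == child or len(parent) != len(child):
                continue
            if hamming(parent, child) != 1:
                continue
            # Only use samples where the parent is abundant enough
            # for the child/parent ratio to be meaningful.
            ratios = [
                c / p
                for c, p in zip(child_reads, parent_reads)
                if p >= min_parent_reads
            ]
            if len(ratios) < 2:
                continue
            m = mean(ratios)
            # A rare child with a near-constant ratio across samples
            # (low coefficient of variation) looks like a reproducible error.
            if 0 < m < 1 and pstdev(ratios) / m <= max_cv:
                spurious.add(child)
                break
    return spurious


# Hypothetical read counts for three samples sequenced in parallel.
example_counts = {
    "ACGTACGT": [100000, 50000, 20000],  # abundant barcode
    "ACGTACGA": [100, 50, 20],           # tracks the abundant barcode
    "ACGTACCT": [5, 4000, 0],            # one mismatch away, but varies independently
    "TTGTACGT": [300, 0, 900],           # two mismatches away
}
flagged = flag_spurious(example_counts)
```

With these made-up counts, a fixed read threshold of, say, 50 reads would retain `ACGTACGA` in the first sample even though its abundance mirrors that of `ACGTACGT` in every sample. The ratio-based check flags it, while leaving `ACGTACCT` alone because its counts do not follow the neighbouring barcode.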

CONCLUSIONS:

Application of our new method demonstrates its value in the identification of true sequences amongst spurious sequences in biological data sets.

Subjects

DNA Barcoding, Taxonomic; DNA/analysis; High-Throughput Nucleotide Sequencing; Animals; Base Sequence; DNA/chemistry; Mice; Polymerase Chain Reaction; Sequence Analysis, DNA; Stem Cells/cytology; Stem Cells/metabolism

Keywords

Cellular barcoding; Illumina; Lineage tracing; Next generation sequencing; PCR error; Sequencing error

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: DNA / DNA Barcoding, Taxonomic / High-Throughput Nucleotide Sequencing Limits: Animals Language: En Publication year: 2016 Document type: Article