Distinguishing low frequency mutations from RT-PCR and sequence errors in viral deep sequencing data.
Orton, Richard J; Wright, Caroline F; Morelli, Marco J; King, David J; Paton, David J; King, Donald P; Haydon, Daniel T.
Affiliation
  • Orton RJ; Boyd Orr Centre for Population and Ecosystem Health, Institute of Biodiversity, Animal Health and Comparative Medicine, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, G12 8QQ, United Kingdom. Richard.Orton@glasgow.ac.uk.
  • Wright CF; Medical Research Council-University of Glasgow Centre for Virus Research, Institute of Infection, Inflammation and Immunity, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, G12 8QQ, United Kingdom. Richard.Orton@glasgow.ac.uk.
  • Morelli MJ; Pirbright Institute, Ash Road, Pirbright, GU24 0NF, UK. caroline.wright79@gmail.com.
  • King DJ; Center for Genomic Science of IIT@SEMM, Istituto Italiano di Tecnologia at the IFOM-IEO Campus, Via Adamello 16, Milano, 20139, Italy. Marco.Morelli@iit.it.
  • Paton DJ; Pirbright Institute, Ash Road, Pirbright, GU24 0NF, UK. david.king@pirbright.ac.uk.
  • King DP; Pirbright Institute, Ash Road, Pirbright, GU24 0NF, UK. david.paton@pirbright.ac.uk.
  • Haydon DT; Pirbright Institute, Ash Road, Pirbright, GU24 0NF, UK. donald.king@pirbright.ac.uk.
BMC Genomics; 16: 229, 2015 Mar 24.
Article in En | MEDLINE | ID: mdl-25886445
ABSTRACT

BACKGROUND:

RNA viruses have high mutation rates and exist within their hosts as large, complex and heterogeneous populations, comprising a spectrum of related but non-identical genome sequences. Next-generation sequencing is revolutionising the study of viral populations by enabling the ultra-deep sequencing of their genomes, and the subsequent identification of the full spectrum of variants within the population. Identification of low-frequency variants is important for our understanding of mutational dynamics, disease progression and immune pressure, and for the detection of drug-resistant or pathogenic mutations. However, the current challenge is to accurately model the errors in the sequence data and to distinguish real viral variants, particularly those that exist at low frequency, from errors introduced during sequencing and sample processing, both of which can be substantial.

RESULTS:

We have created a novel set of laboratory control samples that are derived from a plasmid containing a full-length viral genome with extremely limited diversity in the starting population. One sample was sequenced without PCR amplification whilst the other samples were subjected to increasing amounts of RT and PCR amplification prior to ultra-deep sequencing. This enabled the level of error introduced by the RT and PCR processes to be assessed and minimum frequency thresholds to be set for true viral variant identification. We developed a genome-scale computational model of the sample processing and NGS calling process to gain a detailed understanding of the errors at each step, which predicted that RT and PCR errors are more likely to occur at some genomic sites than others. The model can also be used to investigate whether the number of observed mutations at a given site of interest is greater than would be expected from processing errors alone in any NGS data set. After providing basic sample processing information and the site's coverage and quality scores, the model utilises the fitted RT-PCR error distributions to simulate the number of mutations that would be observed from processing errors alone.
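
The idea of simulating the number of mutations expected from processing errors alone can be illustrated with a minimal Python sketch. This is not the authors' model. It assumes a single fixed per-base RT-PCR error rate (the abstract describes fitted error distributions instead), treats every read as independent (real PCR errors are copied into many reads), and uses hypothetical function names. It converts each read's Phred quality score into a sequencing error probability, combines it with the assumed RT-PCR rate, simulates mismatch counts at one site, and reports how often errors alone produce at least the observed count.

```python
import random


def phred_to_prob(q):
    """Convert a Phred quality score into a base-call error probability."""
    return 10 ** (-q / 10)


def simulate_error_counts(qualities, rtpcr_error_rate, n_sims=1000, seed=0):
    """Simulate mismatch counts at one site caused by errors alone.

    qualities: Phred score of each read base covering the site, so the
        coverage is len(qualities).
    rtpcr_error_rate: ASSUMED fixed per-base probability that RT-PCR
        introduced an error (a hypothetical simplification).
    """
    rng = random.Random(seed)
    # Probability that a read shows a mismatch from either error source.
    per_read = [
        1 - (1 - phred_to_prob(q)) * (1 - rtpcr_error_rate) for q in qualities
    ]
    counts = []
    for _ in range(n_sims):
        # Each read independently shows an error with its own probability.
        counts.append(sum(rng.random() < p for p in per_read))
    return counts


def empirical_p_value(observed, simulated):
    """Fraction of simulations with at least as many mismatches as observed.

    The +1 terms keep the estimate away from exactly zero.
    """
    exceed = sum(c >= observed for c in simulated)
    return (1 + exceed) / (1 + len(simulated))


if __name__ == "__main__":
    # Hypothetical site: 2000 reads, all at Phred 30, assumed RT-PCR rate 1e-4.
    sims = simulate_error_counts([30] * 2000, rtpcr_error_rate=1e-4)
    print(empirical_p_value(observed=25, simulated=sims))
```

A small empirical p-value suggests that the observed mutation count is higher than this simplified error model would produce. Because the sketch ignores the propagation of early-cycle PCR errors, it will tend to understate the variance of error counts compared with a model of the full amplification process.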

CONCLUSIONS:

These data sets and models provide an effective means of separating true viral mutations from those erroneously introduced during sample processing and sequencing.
Subjects

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Reverse Transcriptase Polymerase Chain Reaction / High-Throughput Nucleotide Sequencing Study type: Prognostic_studies Language: En Publication year: 2015 Document type: Article
