Bias from removing read duplication in ultra-deep sequencing experiments.
Zhou, Wanding; Chen, Tenghui; Zhao, Hao; Eterovic, Agda Karina; Meric-Bernstam, Funda; Mills, Gordon B; Chen, Ken.
Affiliation
  • Zhou W; Department of Bioinformatics and Computational Biology, Department of Systems Biology, Institute of Personalized Cancer Therapy and Department of Investigational Cancer Therapy, The University of Texas MD Anderson Cancer Center, Houston TX 77030, USA.
  • Chen T; Department of Bioinformatics and Computational Biology, Department of Systems Biology, Institute of Personalized Cancer Therapy and Department of Investigational Cancer Therapy, The University of Texas MD Anderson Cancer Center, Houston TX 77030, USA.
  • Zhao H; Department of Bioinformatics and Computational Biology, Department of Systems Biology, Institute of Personalized Cancer Therapy and Department of Investigational Cancer Therapy, The University of Texas MD Anderson Cancer Center, Houston TX 77030, USA.
  • Eterovic AK; Department of Bioinformatics and Computational Biology, Department of Systems Biology, Institute of Personalized Cancer Therapy and Department of Investigational Cancer Therapy, The University of Texas MD Anderson Cancer Center, Houston TX 77030, USA Department of Bioinformatics and Computational Bi
  • Meric-Bernstam F; Department of Bioinformatics and Computational Biology, Department of Systems Biology, Institute of Personalized Cancer Therapy and Department of Investigational Cancer Therapy, The University of Texas MD Anderson Cancer Center, Houston TX 77030, USA Department of Bioinformatics and Computational Bi
  • Mills GB; Department of Bioinformatics and Computational Biology, Department of Systems Biology, Institute of Personalized Cancer Therapy and Department of Investigational Cancer Therapy, The University of Texas MD Anderson Cancer Center, Houston TX 77030, USA Department of Bioinformatics and Computational Bi
  • Chen K; Department of Bioinformatics and Computational Biology, Department of Systems Biology, Institute of Personalized Cancer Therapy and Department of Investigational Cancer Therapy, The University of Texas MD Anderson Cancer Center, Houston TX 77030, USA.
Bioinformatics; 30(8): 1073-1080, 2014 Apr 15.
Article in En | MEDLINE | ID: mdl-24389657
ABSTRACT
MOTIVATION: Identifying subclonal mutations and their implications requires accurate estimation of mutant allele fractions from possibly duplicated sequencing reads. Removing duplicate reads assumes that polymerase chain reaction amplification during library construction is the primary source of duplication. The alternative, sampling coincidence from DNA fragmentation, has not been systematically investigated.

RESULTS: With sufficiently high sequencing depth, sampling-induced read duplication is non-negligible, and removing duplicate reads can overcorrect read counts, causing systemic biases in estimates of variant allele fraction and copy number variation. Overcorrection is minimal when duplicate reads are identified with their mate reads taken into account, inserts span a variety of lengths and samples are sequenced in separate batches. We investigate sampling-induced read duplication in deep sequencing data with 500× to 2000× duplicates-removed sequence coverage. We provide a quantitative solution to overcorrection and guidance for effective designs of deep sequencing platforms that facilitate accurate estimation of variant allele fraction and copy number variation.

AVAILABILITY AND IMPLEMENTATION: A Python implementation is freely available at https://bitbucket.org/wanding/duprecover/overview

CONTACT: wzhou1@mdanderson.org, kchen3@mdanderson.org

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
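
The abstract's central point, that reads can coincide by sampling alone and that removing them then overcorrects read counts, can be illustrated with a simple occupancy calculation. The sketch below is not the authors' duprecover method. It assumes, purely for illustration, that read start positions are uniformly distributed over a fixed number of distinguishable positions and that no PCR duplicates exist at all. The function names and the example numbers (1000 reads, 200 single-end positions, 200 insert lengths) are hypothetical.

```python
def expected_unique(n_reads: int, n_positions: int) -> float:
    """Expected number of distinct positions hit when n_reads independent
    fragments fall uniformly at random on n_positions (occupancy formula)."""
    return n_positions * (1.0 - (1.0 - 1.0 / n_positions) ** n_reads)


def retained_fraction(n_reads: int, n_positions: int) -> float:
    """Expected fraction of reads kept when one read per position is retained.

    Every read here is an independent fragment (assumption: no PCR
    duplicates), so any value below 1.0 is pure overcorrection.
    """
    return expected_unique(n_reads, n_positions) / n_reads


if __name__ == "__main__":
    n_reads = 1000  # hypothetical depth over a short target window

    # Single-end duplicate marking: only start and strand distinguish reads.
    single_end_positions = 100 * 2  # assumed: 100 start sites x 2 strands

    # Mate-aware marking with varied insert lengths: each start can pair
    # with many distinct mate positions, multiplying the position space.
    paired_end_positions = single_end_positions * 200  # assumed: 200 insert lengths

    print("single-end retained fraction:",
          round(retained_fraction(n_reads, single_end_positions), 3))
    print("mate-aware retained fraction:",
          round(retained_fraction(n_reads, paired_end_positions), 3))
```

Under these assumptions, single-end duplicate removal discards most of the independent reads, while accounting for mate reads and a spread of insert lengths keeps nearly all of them. This is consistent with the conditions the abstract lists for minimal overcorrection, though real fragment start sites are not uniform.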
Subjects

Full text: 1 Database: MEDLINE Main subject: Sequence Analysis, DNA / DNA Copy Number Variations / High-Throughput Nucleotide Sequencing Language: En Publication year: 2014 Document type: Article
