Detection of simple and complex de novo mutations with multiple reference sequences.
Garimella, Kiran V; Iqbal, Zamin; Krause, Michael A; Campino, Susana; Kekre, Mihir; Drury, Eleanor; Kwiatkowski, Dominic; Sá, Juliana M; Wellems, Thomas E; McVean, Gil.
Affiliations
  • Garimella KV; Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA.
  • Iqbal Z; Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, Oxfordshire, OX3 7BN, United Kingdom.
  • Krause MA; Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, Oxfordshire, OX3 7LF, United Kingdom.
  • Campino S; Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, Oxfordshire, OX3 7BN, United Kingdom.
  • Kekre M; European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, United Kingdom.
  • Drury E; Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, Oxfordshire, OX3 7BN, United Kingdom.
  • Kwiatkowski D; The Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, United Kingdom.
  • Sá JM; Laboratory of Malaria and Vector Research, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland 20892, USA.
  • Wellems TE; The Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, United Kingdom.
  • McVean G; The Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, United Kingdom.
Genome Res; 30(8): 1154-1169, 2020 Aug.
Article in English | MEDLINE | ID: mdl-32817236
ABSTRACT
The characterization of de novo mutations in regions of high sequence and structural diversity from whole-genome sequencing data remains highly challenging. Complex structural variants tend to arise in regions of high repetitiveness and low complexity, challenging both de novo assembly, in which short reads do not capture the long-range context required for resolution, and mapping approaches, in which improper alignment of reads to a reference genome that is highly diverged from that of the sample can lead to false or partial calls. Long-read technologies can potentially solve such problems but are currently unfeasible to use at scale. Here we present Corticall, a graph-based method that combines the advantages of multiple technologies and prior data sources to detect arbitrary classes of genetic variant. We construct multisample, colored de Bruijn graphs from short-read data for all samples, align long-read-derived haplotypes and multiple reference data sources to restore graph connectivity information, and call variants using graph path-finding algorithms and a model for simultaneous alignment and recombination. We validate and evaluate the approach using extensive simulations and use it to characterize the rate and spectrum of de novo mutation events in 119 progeny from four Plasmodium falciparum experimental crosses, using long-read data on the parents to inform reconstructions of the progeny and to detect several known and novel nonallelic homologous recombination events.
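To make the colored de Bruijn graph idea concrete, here is a toy sketch. It is not Corticall's implementation. Each k-mer is a graph node and each sample contributes a "color". A k-mer carrying only the child's color (absent from both parents) is treated as a candidate de novo signal, and a simple walk through the child's colored nodes reconstructs the surrounding sequence. Several simplifying assumptions apply: only forward-strand k-mers are used, whereas real tools typically use canonical k-mers. There is no sequencing-error or coverage filtering, reads are plain strings, and the long-read haplotype and reference alignment used to restore graph connectivity is not modeled. The function names (`build_colored_dbg`, `novel_kmers`, `extend_right`) are hypothetical.

```python
from collections import defaultdict

def kmers(seq, k):
    """Yield every overlapping k-mer of seq (forward strand only, for simplicity)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def build_colored_dbg(samples, k):
    """Map each k-mer (graph node) to the set of sample names ("colors") containing it.

    Edges are implicit: node X links to node Y when X[1:] == Y[:-1].
    """
    colors = defaultdict(set)
    for name, reads in samples.items():
        for read in reads:
            for km in kmers(read, k):
                colors[km].add(name)
    return colors

def novel_kmers(colors, child, parents):
    """Hypothetical criterion: k-mers in the child's color but in no parent's color."""
    parent_set = set(parents)
    return {km for km, c in colors.items() if child in c and not (c & parent_set)}

def extend_right(colors, start, color, max_steps=10_000):
    """Walk right through nodes carrying `color` until the path branches, dead-ends or loops."""
    path, node, seen = start, start, {start}
    for _ in range(max_steps):
        # Candidate successors: shift left by one base and append each nucleotide.
        nexts = [node[1:] + b for b in "ACGT" if color in colors.get(node[1:] + b, ())]
        if len(nexts) != 1 or nexts[0] in seen:
            break
        node = nexts[0]
        seen.add(node)
        path += node[-1]
    return path

# Toy data (invented for illustration): the child differs from parent1 by T->A at index 7.
samples = {
    "parent1": ["AACCGGTTACGATCC"],
    "parent2": ["AACCGGTTACGTTCC"],
    "child":   ["AACCGGTAACGATCC"],
}
colors = build_colored_dbg(samples, k=5)
novel = novel_kmers(colors, "child", ["parent1", "parent2"])
print(sorted(novel))                            # ['AACGA', 'CGGTA', 'GGTAA', 'GTAAC', 'TAACG']
print(extend_right(colors, "CGGTA", "child"))   # CGGTAACGATCC
```

A single-base change creates up to k child-only k-mers, which is why all five novel k-mers here overlap index 7. Walking from the leftmost of them recovers the child's sequence downstream of the change. In this sketch, the walk stops at branches rather than choosing between them, which is where a real method would need additional connectivity information.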
Subjects

Full text: 1
Collections: 01-internacional
Database: MEDLINE
Main subject: Plasmodium falciparum / Genome, Protozoan / High-Throughput Nucleotide Sequencing / Whole Genome Sequencing / Mutation
Study type: Diagnostic_studies / Prognostic_studies
Language: English
Year of publication: 2020
Document type: Article