Population-based structural variation discovery with Hydra-Multi.

Lindberg, Michael R; Hall, Ira M; Quinlan, Aaron R

Affiliation

Lindberg MR; Department of Biochemistry and Molecular Genetics, Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA, Department of Medicine, The Genome Institute, Washington University School of Medicine, St. Louis, MO, USA and Department of Public Health Sciences, University of Virginia, Charlottesville, VA, USA.
Hall IM; Department of Biochemistry and Molecular Genetics, Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA, Department of Medicine, The Genome Institute, Washington University School of Medicine, St. Louis, MO, USA and Department of Public Health Sciences, University of Virginia, Charlottesville, VA, USA.
Quinlan AR; Department of Biochemistry and Molecular Genetics, Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA, Department of Medicine, The Genome Institute, Washington University School of Medicine, St. Louis, MO, USA and Department of Public Health Sciences, University of Virginia, Charlottesville, VA, USA.

Bioinformatics; 31(8): 1286-9, 2015 Apr 15.

Article in English | MEDLINE | ID: mdl-25527832

ABSTRACT

Current strategies for SNP and INDEL discovery incorporate sequence alignments from multiple individuals to maximize sensitivity and specificity. It is widely accepted that this approach also improves structural variant (SV) detection. However, multisample SV analysis has been stymied by the fundamental difficulties of SV calling, e.g. library insert size variability, SV alignment signal integration and detecting long-range genomic rearrangements involving disjoint loci. Extant tools suffer from poor scalability, which limits the number of genomes that can be co-analyzed and complicates analysis workflows. We have developed an approach that enables multisample SV analysis in hundreds to thousands of human genomes using commodity hardware. Here, we describe Hydra-Multi and measure its accuracy, speed and scalability using publicly available datasets provided by The 1000 Genomes Project and by The Cancer Genome Atlas (TCGA).

AVAILABILITY AND IMPLEMENTATION: Hydra-Multi is written in C++ and is freely available at https://github.com/arq5x/Hydra.

CONTACT: aaronquinlan@gmail.com or ihall@genome.wustl.edu

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Genetics, Population; Genomic Structural Variation; Genomics/methods; Hydra/genetics; Software; Animals; Databases, Factual; Gene Deletion; Humans; Hydra/classification; Sequence Alignment

Full text: 1; Collections: 01-internacional; Database: MEDLINE; Main subject: Software / Genomics / Genomic Structural Variation / Genetics, Population / Hydra; Limits: Animals / Humans; Language: En; Publication year: 2015; Document type: Article
