Scalable linkage-disequilibrium-based selective sweep detection: a performance guide.

Alachiotis, Nikolaos; Pavlidis, Pavlos

Affiliation

Alachiotis N; Department of Electrical and Computer Engineering, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, 15213 PA USA.
Pavlidis P; Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology-Hellas, Crete, 70013 Greece.

GigaScience; 5: 7, 2016.

Article in English | MEDLINE | ID: mdl-26862394

ABSTRACT

BACKGROUND:

Linkage disequilibrium is defined as the non-random association of alleles at different loci, and it occurs when genotypes at two loci depend on each other. The model of genetic hitchhiking predicts that strong positive selection affects the patterns of linkage disequilibrium around the site of a beneficial allele, resulting in specific motifs of correlation between the neutral polymorphisms that surround the fixed beneficial allele. Levels of linkage disequilibrium are elevated between sites on the same side of a beneficial mutation and diminished between sites on different sides of it. This specific pattern of linkage disequilibrium occurs more frequently when positive selection has acted on the population than under various neutral models. Thus, detecting such patterns could accurately reveal targets of positive selection along a recombining chromosome or a genome. Calculating linkage disequilibria in whole genomes is computationally expensive because allele correlations need to be evaluated for millions of pairs of sites. To analyze large datasets efficiently, algorithmic implementations used in modern population genetics need to exploit the multiple cores of current workstations in a scalable way. However, population genomic datasets come in various types and shapes and typically show heterogeneous SNP density, which makes the implementation of generally scalable parallel algorithms a challenging task.
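
The pattern described above (high correlation between sites on the same side of a candidate position, low correlation across it) can be illustrated with a minimal Python sketch. It is not taken from OmegaPlus and makes several assumptions: haplotypes are coded as 0/1 per site, linkage disequilibrium is measured as the squared correlation r², and the score is the ratio of mean within-side r² to mean between-side r², which is the usual form of the omega statistic named in the keywords. The function names `r_squared` and `omega` are hypothetical.

```python
from itertools import combinations


def r_squared(a, b):
    """Squared correlation (r^2) between two biallelic sites.

    a, b: equal-length sequences of 0/1 alleles, one entry per haplotype.
    Returns 0.0 when either site is monomorphic (r^2 is undefined there).
    """
    n = len(a)
    p_a = sum(a) / n                              # frequency of allele 1 at site a
    p_b = sum(b) / n                              # frequency of allele 1 at site b
    p_ab = sum(x & y for x, y in zip(a, b)) / n   # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                          # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return 0.0 if denom == 0 else d * d / denom


def omega(sites, split):
    """Ratio of mean within-side r^2 to mean between-side r^2.

    sites: list of SNP columns ordered along the chromosome.
    split: number of SNPs assigned to the left side of the candidate position.
    """
    left, right = sites[:split], sites[split:]
    within = list(combinations(left, 2)) + list(combinations(right, 2))
    between = [(a, b) for a in left for b in right]
    if not within or not between:
        raise ValueError("need at least one within-side and one between-side pair")
    mean_within = sum(r_squared(a, b) for a, b in within) / len(within)
    mean_between = sum(r_squared(a, b) for a, b in between) / len(between)
    return float("inf") if mean_between == 0 else mean_within / mean_between


if __name__ == "__main__":
    # Toy data (made up): four haplotypes, two SNPs on each side of the split.
    left_snp = [1, 1, 0, 0]
    right_snp = [1, 1, 1, 0]
    print(omega([left_snp, left_snp, right_snp, right_snp], split=2))
```

Because every pair of sites in a window contributes one r² value, the number of evaluations grows quadratically with the number of SNPs, which is the cost the abstract refers to.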

FINDINGS:

Here we present a series of four parallelization strategies targeting shared-memory systems for the computationally intensive problem of detecting genomic regions that have contributed to the past adaptation of the species, also referred to as regions that have undergone a selective sweep, based on linkage disequilibrium patterns. We provide a thorough performance evaluation of the proposed parallel algorithms for computing linkage disequilibrium, and outline the benefits of each approach. Furthermore, we compare the accuracy of our open-source sweep-detection software OmegaPlus, which implements all four parallelization strategies presented here, with a variety of neutrality tests.

CONCLUSIONS:

The computational demands of selective sweep detection algorithms depend greatly on the SNP density heterogeneity and the data representation. Choosing the right parallel algorithm for the analysis can lead to significant processing time reduction and major energy savings. However, determining which parallel algorithm will execute more efficiently on a specific processor architecture and number of available cores for a particular dataset is not straightforward.

Subjects

Algorithms; Computational Biology/methods; Linkage Disequilibrium; Selection, Genetic; Alleles; Chromosomes, Human, Pair 1/genetics; Computer Simulation; Gene Frequency; Genetics, Population/methods; Genome, Human/genetics; Genotype; Humans; Models, Genetic; Mutation; Polymorphism, Single Nucleotide; Reproducibility of Results; Sequence Analysis, DNA

Keywords

High performance; Linkage disequilibrium; Omega statistic; OmegaPlus

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Selection, Genetic / Algorithms / Linkage Disequilibrium / Computational Biology Type of study: Diagnostic_studies / Prognostic_studies Limits: Humans Language: En Journal: Gigascience Year of publication: 2016 Document type: Article