The parallelism motifs of genomic data analysis.

Yelick, Katherine; Buluç, Aydin; Awan, Muaaz; Azad, Ariful; Brock, Benjamin; Egan, Rob; Ekanayake, Saliya; Ellis, Marquita; Georganas, Evangelos; Guidi, Giulia; Hofmeyr, Steven; Selvitopi, Oguz; Teodoropol, Cristina; Oliker, Leonid

Affiliation

Yelick K; Lawrence Berkeley National Laboratory, Berkeley, CA, USA.
Buluç A; Department of Electrical Engineering and Computer Sciences, University of California at Berkeley, Berkeley, CA, USA.
Awan M; Lawrence Berkeley National Laboratory, Berkeley, CA, USA.
Azad A; Department of Electrical Engineering and Computer Sciences, University of California at Berkeley, Berkeley, CA, USA.
Brock B; Lawrence Berkeley National Laboratory, Berkeley, CA, USA.
Egan R; School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN, USA.
Ekanayake S; Lawrence Berkeley National Laboratory, Berkeley, CA, USA.
Ellis M; Department of Electrical Engineering and Computer Sciences, University of California at Berkeley, Berkeley, CA, USA.
Georganas E; DOE Joint Genome Institute, Walnut Creek, CA, USA.
Guidi G; Lawrence Berkeley National Laboratory, Berkeley, CA, USA.
Hofmeyr S; Lawrence Berkeley National Laboratory, Berkeley, CA, USA.
Selvitopi O; Department of Electrical Engineering and Computer Sciences, University of California at Berkeley, Berkeley, CA, USA.
Teodoropol C; Intel Labs, Santa Clara, CA, USA.
Oliker L; Lawrence Berkeley National Laboratory, Berkeley, CA, USA.

Philos Trans A Math Phys Eng Sci; 378(2166): 20190394, 2020 Mar 06.

Article in English | MEDLINE | ID: mdl-31955674

ABSTRACT

Genomic datasets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share these data with the research community, but some of these genomic data analysis problems require large-scale computational platforms to meet both the memory and computational requirements. These applications differ from scientific simulations that dominate the workload on high-end parallel systems today and place different requirements on programming support, software libraries and parallel architectural design. For example, they involve irregular communication patterns such as asynchronous updates to shared data structures. We consider several problems in high-performance genomics analysis, including alignment, profiling, clustering and assembly for both single genomes and metagenomes. We identify some of the common computational patterns or 'motifs' that help inform parallelization strategies and compare our motifs to some of the established lists, arguing that at least two key patterns, sorting and hashing, are missing. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'.
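
The abstract names hashing and sorting as two motifs missing from established lists. The following is a minimal single-process sketch of how the two motifs can solve the same task, assuming k-mer counting (tallying every length-k substring of the reads) as the example application. The function names and the simulated "ranks" are hypothetical illustrations, not the authors' code. In the hashing variant, each k-mer is routed to an owner by its hash, mimicking a distributed hash table in which remote updates would be sent asynchronously. In the sorting variant, sorting places equal k-mers next to each other so they can be counted as runs.

```python
from collections import defaultdict
from itertools import groupby


def kmers(read, k):
    """Yield every length-k substring (k-mer) of a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def count_by_hashing(reads, k, num_ranks):
    """Hashing motif (hypothetical sketch): each k-mer is sent to the
    'rank' that owns it, chosen by hash, and counted there. Ranks are
    simulated as one dict per owner inside a single process."""
    tables = [defaultdict(int) for _ in range(num_ranks)]
    for read in reads:
        for km in kmers(read, k):
            owner = hash(km) % num_ranks  # deterministic within one process
            # In a distributed system this would be an asynchronous
            # update to a remote partition of a shared hash table.
            tables[owner][km] += 1
    merged = {}
    for table in tables:
        merged.update(table)  # owners hold disjoint key sets
    return merged


def count_by_sorting(reads, k):
    """Sorting motif (hypothetical sketch): sort all k-mers so equal keys
    become adjacent, then count each run of identical keys."""
    all_kmers = sorted(km for read in reads for km in kmers(read, k))
    return {key: sum(1 for _ in run) for key, run in groupby(all_kmers)}


if __name__ == "__main__":
    reads = ["ACGTAC", "CGTACG"]
    print(count_by_hashing(reads, 3, num_ranks=4) == count_by_sorting(reads, 3))
```

For the two toy reads above with k = 3, both functions return {"ACG": 2, "CGT": 2, "GTA": 2, "TAC": 2}, so the script prints True. The hashing version needs no global ordering but relies on fine-grained, irregular updates; the sorting version replaces those updates with a bulk data movement step.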

Keywords

bioinformatics; high-performance data analytics; parallel computing

Collections: 01-internacional | Database: MEDLINE | Language: English | Journal: Philos Trans A Math Phys Eng Sci | Journal subject: Biophysics / Biomedical Engineering | Publication year: 2020 | Document type: Article | Country of affiliation: United States
