Search | BVS Regional Portal

Correction: Memory hierarchy characterization of SPEC CPU2006 and SPEC CPU2017 on the Intel Xeon Skylake-SP.

Navarro-Torres, Agustín; Alastruey-Benedé, Jesús; Ibáñez-Marín, Pablo; Viñals-Yúfera, Víctor.

PLoS One ; 19(5): e0303712, 2024.

Article in English | MEDLINE | ID: mdl-38722938

ABSTRACT

[This corrects the article DOI: 10.1371/journal.pone.0220135.].

Porting and Optimizing BWA-MEM2 Using the Fujitsu A64FX Processor.

Langarita, Ruben; Armejach, Adria; Ibanez, Pablo; Alastruey-Benede, Jesus; Moreto, Miquel.

IEEE/ACM Trans Comput Biol Bioinform ; 20(5): 3139-3153, 2023.

Article in English | MEDLINE | ID: mdl-37018085

ABSTRACT

Sequence alignment pipelines for human genomes are an emerging workload that will dominate in the precision medicine field. BWA-MEM2 is a tool widely used in the scientific community to perform read mapping studies. In this paper, we port BWA-MEM2 to the AArch64 architecture using the ARMv8-A specification, and we compare the resulting version against an Intel Skylake system in both performance and energy-to-solution. The porting effort entails numerous code modifications, since BWA-MEM2 implements certain kernels using x86_64-specific intrinsics, e.g., AVX-512. To adapt this code we use Arm's recently introduced Scalable Vector Extension (SVE). More specifically, we use Fujitsu's A64FX processor, the first to implement SVE. The A64FX powers the Fugaku supercomputer, which led the Top500 ranking from June 2020 to November 2021. After porting BWA-MEM2, we define and implement a number of optimizations to improve performance on the A64FX target architecture. We show that while the A64FX performance is lower than that of the Skylake system, the A64FX delivers 11.6% better energy-to-solution on average. All the code used for this article is available at https://gitlab.bsc.es/rlangari/bwa-a64fx.
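As a rough illustration of how a slower machine can still win on energy-to-solution, the sketch below assumes the usual definition (average power multiplied by run time). The function names and all numbers are hypothetical and are not taken from the paper.

```python
def energy_to_solution(avg_power_watts: float, runtime_seconds: float) -> float:
    """Energy in joules spent to finish one run (average power x run time)."""
    return avg_power_watts * runtime_seconds


def relative_saving(baseline_joules: float, candidate_joules: float) -> float:
    """Fraction of the baseline energy that the candidate system saves."""
    return (baseline_joules - candidate_joules) / baseline_joules


# Hypothetical figures for illustration only (not measurements from the paper):
# the candidate is 1.8x slower but draws half the power.
baseline = energy_to_solution(avg_power_watts=200.0, runtime_seconds=100.0)
candidate = energy_to_solution(avg_power_watts=100.0, runtime_seconds=180.0)
saving = relative_saving(baseline, candidate)
print(f"baseline={baseline:.0f} J, candidate={candidate:.0f} J, saving={saving:.1%}")
```

With these made-up inputs the slower system still uses less energy overall, which is the kind of trade-off the abstract reports.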

Subjects

Algorithms, Software, Humans, DNA Sequence Analysis/methods, Computers, Sequence Alignment, High-Throughput Nucleotide Sequencing/methods

Compressed Sparse FM-Index: Fast Sequence Alignment Using Large K-Steps.

Langarita, Ruben; Armejach, Adria; Setoain, Javier; Ibanez-Marin, Pablo; Alastruey-Benede, Jesus; Moreto, Miquel.

IEEE/ACM Trans Comput Biol Bioinform ; 19(1): 355-368, 2022.

Article in English | MEDLINE | ID: mdl-32750858

ABSTRACT

The FM-index is a data structure used in genomics for exact search of input sequences over large reference genomes. Algorithms based on the FM-index show an irregular memory access pattern, resulting in a memory-bound problem. We analyze a recent implementation of the FM-index and highlight existing throughput-memory trade-offs, showing that memory requirements limit the implementation of large k-steps. We propose COFI, a COmpressed FM-Index for large K-steps. COFI enables a 15-step FM-index using less than 16 GB for a human genome reference of 3 giga base pairs. An algorithm based on this new layout is evaluated on both a Knights Landing (KNL) and a Skylake-based system (SKX). We achieve average speed-ups of 1.46× and 1.39×, respectively, with respect to a state-of-the-art FM-index implementation that is already well optimized.
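To make the k-step idea concrete, here is a minimal, uncompressed sketch of FM-index backward search that consumes k pattern symbols per iteration. It is not COFI's layout: the class name `KStepFMIndex`, the dictionary-based tables, and the fallback to 1-steps for leftover symbols are all assumptions made for illustration, and the tables are built by naively sorting rotations, which is only practical for tiny texts.

```python
from bisect import bisect_left
from collections import defaultdict


class KStepFMIndex:
    """Uncompressed k-step FM-index over text + '$' (illustrative only)."""

    def __init__(self, text, k):
        self.k = k
        t = text + "$"  # '$' sorts before A, C, G, T in ASCII
        self.n = len(t)
        # Sorted cyclic rotations, identified by their start offset.
        rows = sorted(range(self.n), key=lambda i: t[i:] + t[:i])
        # One table set for k-steps and one for 1-steps (pattern remainder).
        self.tables = {step: self._build(t, rows, step) for step in {1, k}}

    @staticmethod
    def _build(t, rows, step):
        n = len(t)
        doubled = t + t
        # Block of `step` symbols that cyclically precedes each sorted rotation.
        blocks = [doubled[(i - step) % n:(i - step) % n + step] for i in rows]
        first = sorted(blocks)     # same multiset as the step-long row prefixes
        where = defaultdict(list)  # block -> ascending rows where it occurs
        for row, block in enumerate(blocks):
            where[block].append(row)
        return first, where

    def count(self, pattern):
        """Number of occurrences of `pattern` in the indexed text."""
        lo, hi = 0, self.n
        end = len(pattern)
        while end > 0 and lo < hi:
            step = self.k if end >= self.k else 1
            block = pattern[end - step:end]
            first, where = self.tables[step]
            base = bisect_left(first, block)  # rows with a smaller prefix
            hits = where.get(block, [])
            lo = base + bisect_left(hits, lo)  # rank of block before row lo
            hi = base + bisect_left(hits, hi)  # rank of block before row hi
            end -= step
        return hi - lo


index = KStepFMIndex("GATTACAGATTACA", k=2)
print(index.count("ATTA"))  # 2
```

Each k-step replaces k dependent table lookups with one, but the number of distinct blocks grows exponentially with k, which is the memory pressure the abstract refers to.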

Subjects

Genomics, High-Throughput Nucleotide Sequencing, Algorithms, Human Genome, Humans, Sequence Alignment, DNA Sequence Analysis, Software

Expanding the Limits of Computer-Assisted Sperm Analysis through the Development of Open Software.

Yániz, Jesús; Alquézar-Baeta, Carlos; Yagüe-Martínez, Jorge; Alastruey-Benedé, Jesús; Palacín, Inmaculada; Boryshpolets, Sergii; Kholodnyy, Vitaliy; Gadêlha, Hermes; Pérez-Pe, Rosaura.

Biology (Basel) ; 9(8), 2020 Aug 05.

Article in English | MEDLINE | ID: mdl-32764457

ABSTRACT

Computer-assisted sperm analysis (CASA) systems can reduce errors occurring in manual analysis. However, commercial CASA systems are frequently not applicable at the forefront of challenging research endeavors. The development of open-source software may offer important solutions for researchers working in related areas. Here, we present an example of this, with the development of three new modules for the OpenCASA software (hosted on GitHub). The first is the Chemotactic Sperm Accumulation Module, a powerful tool for studying sperm chemotactic behavior by analyzing sperm accumulation in the direct vicinity of the stimuli. This module was validated by comparing fish sperm accumulation with or without the influence of an attractant. The analysis clearly indicated cell accumulation in the treatment group, while the distribution of sperm was random in the control group. The second is the Sperm Functionality Module, based on the ability to recognize five sperm subpopulations according to their fluorescence patterns associated with the plasma membrane and acrosomal status. The last module is the Sperm Concentration Module, which expands the utilities of OpenCASA. These last two modules were validated, using bull sperm, by comparing them with visual counting by an observer. A high level of correlation was achieved in almost all the data, and good agreement between the two methods was obtained. With these newly developed modules, OpenCASA is consolidated as a powerful free and open-source tool that allows different aspects of sperm quality to be evaluated, with many potential applications for researchers.

Accelerating Sequence Alignments Based on FM-Index Using the Intel KNL Processor.

Herruzo, Jose M; Gonzalez-Navarro, Sonia; Ibanez-Marin, Pablo; Vinals-Yufera, Victor; Alastruey-Benede, Jesus; Plata, Oscar.

IEEE/ACM Trans Comput Biol Bioinform ; 17(4): 1093-1104, 2020.

Article in English | MEDLINE | ID: mdl-30530369

ABSTRACT

The FM-index is a compact data structure suitable for fast matching of short reads to large reference genomes. The matching algorithm using this index exhibits irregular memory access patterns that cause frequent cache misses, resulting in a memory-bound problem. This paper analyzes different FM-index versions presented in the literature, focusing on the computing aspects related to data access. As a result of the analysis, we propose a new organization of the FM-index that minimizes the demand for memory bandwidth, allowing a large performance improvement on processors with high-bandwidth memory, such as the second-generation Intel Xeon Phi (Knights Landing, or KNL), which integrates ultra-high-bandwidth stacked memory technology. As the roofline model shows, our implementation reaches 95 percent of the peak random access bandwidth limit when executed on the KNL and almost all of the available bandwidth when executed on other Intel Xeon architectures with conventional DDR memory. In addition, the throughput obtained on the KNL is much higher than the results reported for GPUs in the literature.
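The roofline argument in this abstract can be sketched with the standard roofline formula, in which attainable throughput is the smaller of the compute peak and the bandwidth peak multiplied by operational intensity. The function names and every number below are hypothetical placeholders, not values from the paper.

```python
def roofline_bound(peak_ops_per_s: float,
                   peak_bytes_per_s: float,
                   ops_per_byte: float) -> float:
    """Attainable throughput (ops/s) under the standard roofline model."""
    return min(peak_ops_per_s, peak_bytes_per_s * ops_per_byte)


def fraction_of_bound(measured_ops_per_s: float, bound_ops_per_s: float) -> float:
    """How close a measured throughput comes to the roofline bound."""
    return measured_ops_per_s / bound_ops_per_s


# Hypothetical machine and kernel, for illustration only.
bound = roofline_bound(peak_ops_per_s=1.0e12,
                       peak_bytes_per_s=4.0e11,
                       ops_per_byte=0.5)
memory_bound = bound < 1.0e12          # True here: bandwidth is the limiter
efficiency = fraction_of_bound(1.9e11, bound)
print(f"bound={bound:.3g} ops/s, memory_bound={memory_bound}, efficiency={efficiency:.0%}")
```

For a kernel with low operational intensity such as random table lookups, the bandwidth term is the binding limit, so the useful optimization target is bytes moved per operation rather than arithmetic.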

Subjects

Genomics, Sequence Alignment, Algorithms, Computers, DNA/genetics, Human Genome/genetics, Genomics/instrumentation, Genomics/methods, Humans, Sequence Alignment/instrumentation, Sequence Alignment/methods

Memory hierarchy characterization of SPEC CPU2006 and SPEC CPU2017 on the Intel Xeon Skylake-SP.

Navarro-Torres, Agustín; Alastruey-Benedé, Jesús; Ibáñez-Marín, Pablo; Viñals-Yúfera, Víctor.

PLoS One ; 14(8): e0220135, 2019.

Article in English | MEDLINE | ID: mdl-31369592

ABSTRACT

SPEC CPU is one of the most common benchmark suites used in computer architecture research. CPU2017 has recently been released to replace CPU2006. In this paper, we present a detailed evaluation of the memory hierarchy performance for both the CPU2006 and single-threaded CPU2017 benchmarks. The experiments were executed on an Intel Xeon Skylake-SP, which is the first Intel processor to implement a mostly non-inclusive last-level cache (LLC). We present a classification of the benchmarks according to their memory pressure and analyze the performance impact of different LLC sizes. We also test all the hardware prefetchers, showing that they improve performance in most of the benchmarks. After comprehensive experimentation, we can highlight the following conclusions: i) almost half of the SPEC CPU benchmarks have very low miss ratios in the second- and third-level caches, even with small LLC sizes and without hardware prefetching, ii) overall, the SPEC CPU2017 benchmarks demand even fewer memory hierarchy resources than the SPEC CPU2006 ones, iii) hardware prefetching is very effective in reducing LLC misses for most benchmarks, even with the smallest LLC size, and iv) from the memory hierarchy standpoint, the methodologies commonly used to select benchmarks or simulation points do not guarantee representative workloads.
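The metrics behind such a characterization can be sketched as follows. Miss ratio and misses per kilo-instruction (MPKI) are standard definitions, but the `classify_pressure` function and its thresholds are hypothetical and do not reproduce the paper's classification; the counter values are made up.

```python
def miss_ratio(misses: int, accesses: int) -> float:
    """Fraction of cache accesses that miss (0.0 when there are no accesses)."""
    return misses / accesses if accesses else 0.0


def mpki(misses: int, instructions: int) -> float:
    """Misses per thousand retired instructions."""
    return 1000.0 * misses / instructions


def classify_pressure(llc_mpki: float, low: float = 1.0, high: float = 10.0) -> str:
    """Bucket a benchmark by LLC MPKI; the thresholds are hypothetical."""
    if llc_mpki < low:
        return "low"
    if llc_mpki < high:
        return "medium"
    return "high"


# Made-up counter values for one benchmark run, for illustration only.
llc_misses, llc_accesses, instructions = 2_000_000, 40_000_000, 1_000_000_000
print(miss_ratio(llc_misses, llc_accesses),
      mpki(llc_misses, instructions),
      classify_pressure(mpki(llc_misses, instructions)))
```

MPKI normalizes by instruction count rather than by accesses, so it reflects how much the misses can affect execution time even when the miss ratio looks high on a rarely accessed cache level.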

Subjects

Algorithms, Benchmarking, Computer Systems/standards, Computers/standards, Software
