Results 1 - 20 of 43
1.
Bioinformatics; 39(5), 2023 05 04.
Article in English | MEDLINE | ID: mdl-36971586

ABSTRACT

MOTIVATION: Sequence alignment is a memory-bound computation whose performance in modern systems is limited by the memory bandwidth bottleneck. Processing-in-memory (PIM) architectures alleviate this bottleneck by providing the memory with computing capabilities. We propose Alignment-in-Memory (AIM), a framework for high-throughput sequence alignment using PIM, and evaluate it on UPMEM, the first publicly available general-purpose programmable PIM system. RESULTS: Our evaluation shows that a real PIM system can substantially outperform server-grade multi-threaded CPU systems running at full-scale when performing sequence alignment for a variety of algorithms, read lengths, and edit distance thresholds. We hope that our findings inspire more work on creating and accelerating bioinformatics algorithms for such real PIM systems. AVAILABILITY AND IMPLEMENTATION: Our code is available at https://github.com/safaad/aim.
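To make the bottleneck concrete, the following is a minimal sketch of the kind of memory-bound dynamic-programming kernel that PIM-based alignment frameworks offload to in-memory processors. It is a generic banded edit-distance routine, not the AIM code; the function name and the band heuristic are our own illustrative choices.

```python
def banded_edit_distance(read: str, ref: str, k: int) -> int:
    """Edit distance restricted to a band of +/- k diagonals.

    Returns the edit distance if it is <= k, otherwise a value > k.
    A generic illustration of the memory-bound DP kernel that PIM-based
    aligners offload to in-memory processors; not the AIM code itself.
    """
    n, m = len(read), len(ref)
    if abs(n - m) > k:
        return k + 1
    INF = k + 1
    prev = [j if j <= k else INF for j in range(m + 1)]
    for i in range(1, n + 1):
        curr = [INF] * (m + 1)
        curr[0] = i if i <= k else INF
        for j in range(max(1, i - k), min(m, i + k) + 1):
            cost = 0 if read[i - 1] == ref[j - 1] else 1
            curr[j] = min(prev[j - 1] + cost,  # match / substitution
                          prev[j] + 1,         # deletion from the read
                          curr[j - 1] + 1)     # insertion into the read
        prev = curr
    return prev[m]


if __name__ == "__main__":
    print(banded_edit_distance("ACGTACGT", "ACGAACGT", k=2))  # -> 1
```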


Subjects
Algorithms; Software; Sequence Alignment; Computational Biology; Sequence Analysis, DNA; High-Throughput Nucleotide Sequencing
2.
Bioinformatics; 39(Suppl 1): i297-i307, 2023 06 30.
Article in English | MEDLINE | ID: mdl-37387139

ABSTRACT

Nanopore sequencers generate electrical raw signals in real-time while sequencing long genomic strands. These raw signals can be analyzed as they are generated, providing an opportunity for real-time genome analysis. An important feature of nanopore sequencing, Read Until, can eject strands from sequencers without fully sequencing them, which provides opportunities to computationally reduce the sequencing time and cost. However, existing works utilizing Read Until either (i) require powerful computational resources that may not be available for portable sequencers or (ii) lack scalability for large genomes, rendering them inaccurate or ineffective. We propose RawHash, the first mechanism that can accurately and efficiently perform real-time analysis of nanopore raw signals for large genomes using a hash-based similarity search. To enable this, RawHash ensures the signals corresponding to the same DNA content lead to the same hash value, regardless of the slight variations in these signals. RawHash achieves an accurate hash-based similarity search via an effective quantization of the raw signals such that signals corresponding to the same DNA content have the same quantized value and, subsequently, the same hash value. We evaluate RawHash on three applications: (i) read mapping, (ii) relative abundance estimation, and (iii) contamination analysis. Our evaluations show that RawHash is the only tool that can provide high accuracy and high throughput for analyzing large genomes in real-time. When compared to the state-of-the-art techniques, UNCALLED and Sigmap, RawHash provides (i) 25.8× and 3.4× better average throughput and (ii) significantly better accuracy for large genomes, respectively. Source code is available at https://github.com/CMU-SAFARI/RawHash.
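The quantize-then-hash idea described above can be illustrated in a few lines. The bucket width, the number of events packed into one hash, and the hash function below are illustrative assumptions, not RawHash's actual parameters; the point is only that two noisy signals from the same DNA content collapse to identical hash values after quantization.

```python
import hashlib

def quantize(event_mean: float, bucket_width: float = 0.2) -> int:
    """Map a (normalized) signal event to a coarse bucket so that small
    signal variations land in the same bucket."""
    return int(event_mean // bucket_width)

def hash_events(event_means: list[float], n_events: int = 6) -> list[int]:
    """Hash each window of n_events consecutive quantized events.

    Two reads of the same DNA content produce slightly different raw
    signals, but after quantization their windows hash to the same values,
    enabling hash-table lookup instead of costly signal alignment.
    (Illustrative sketch only; parameters are assumptions, not RawHash's.)
    """
    q = [quantize(e) for e in event_means]
    hashes = []
    for i in range(len(q) - n_events + 1):
        key = ",".join(map(str, q[i:i + n_events])).encode()
        hashes.append(int.from_bytes(hashlib.blake2b(key, digest_size=8).digest(), "big"))
    return hashes

if __name__ == "__main__":
    a = [0.51, 0.93, 0.12, 0.77, 0.31, 0.66, 0.55]
    b = [0.49, 0.95, 0.14, 0.79, 0.29, 0.68, 0.57]  # same DNA content, slightly noisier
    print(hash_events(a) == hash_events(b))  # -> True
```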


Subjects
Nanopore Sequencing; Nanopores; Genomics; Ploidies; DNA
3.
Bioinformatics; 39(5), 2023 05 04.
Article in English | MEDLINE | ID: mdl-36961334

ABSTRACT

MOTIVATION: Pairwise sequence alignment is a very time-consuming step in common bioinformatics pipelines. Speeding up this step requires heuristics, efficient implementations, and/or hardware acceleration. A promising candidate for all of the above is the recently proposed GenASM algorithm. We identify and address three inefficiencies in the GenASM algorithm: it has a high amount of data movement, a large memory footprint, and does some unnecessary work. RESULTS: We propose Scrooge, a fast and memory-frugal genomic sequence aligner. Scrooge includes three novel algorithmic improvements which reduce the data movement, memory footprint, and the number of operations in the GenASM algorithm. We provide efficient open-source implementations of the Scrooge algorithm for CPUs and GPUs, which demonstrate the significant benefits of our algorithmic improvements. For long reads, the CPU version of Scrooge achieves a 20.1×, 1.7×, and 2.1× speedup over KSW2, Edlib, and a CPU implementation of GenASM, respectively. The GPU version of Scrooge achieves a 4.0×, 80.4×, 6.8×, 12.6×, and 5.9× speedup over the CPU version of Scrooge, KSW2, Edlib, Darwin-GPU, and a GPU implementation of GenASM, respectively. We estimate an ASIC implementation of Scrooge to use 3.6× less chip area and 2.1× less power than a GenASM ASIC while maintaining the same throughput. Further, we systematically analyze the throughput and accuracy behavior of GenASM and Scrooge under various configurations. As the best configuration of Scrooge depends on the computing platform, we make several observations that can help guide future implementations of Scrooge. AVAILABILITY AND IMPLEMENTATION: https://github.com/CMU-SAFARI/Scrooge.
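For background, GenASM is derived from the Bitap family of bit-vector string matchers, in which the dynamic-programming state is packed into machine words. The sketch below shows only the exact-match shift-and (Bitap) kernel, to illustrate the bit-parallel style of computation; GenASM and Scrooge extend this idea to approximate matching with traceback, which is not shown, and the code here is ours rather than the Scrooge implementation.

```python
def bitap_exact(text: str, pattern: str) -> list[int]:
    """Minimal shift-and (Bitap) exact matcher using Python ints as bitvectors.

    Background sketch of the bit-parallel style that GenASM-class aligners
    extend to approximate matching; not the Scrooge code.
    """
    m = len(pattern)
    # Pattern bitmasks: bit i of mask[c] is set if pattern[i] == c.
    mask = {}
    for i, c in enumerate(pattern):
        mask[c] = mask.get(c, 0) | (1 << i)
    match_bit = 1 << (m - 1)
    occurrences = []
    state = 0
    for j, c in enumerate(text):
        state = ((state << 1) | 1) & mask.get(c, 0)
        if state & match_bit:
            occurrences.append(j - m + 1)  # start index of the match
    return occurrences

if __name__ == "__main__":
    print(bitap_exact("ACGTTACGTACG", "ACGT"))  # -> [0, 5]
```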


Subjects
Algorithms; Computers; Genome; Genomics; Computational Biology
4.
Dev Dyn; 252(10): 1247-1268, 2023 10.
Article in English | MEDLINE | ID: mdl-37002896

ABSTRACT

High resolution assessment of cardiac functional parameters is crucial in translational animal research. The chick embryo is a historically well-used in vivo model for cardiovascular research due to its many practical advantages, and the conserved form and function of the chick and human cardiogenesis programs. This review aims to provide an overview of several different technical approaches for chick embryo cardiac assessment. Doppler echocardiography, optical coherence tomography, micromagnetic resonance imaging, microparticle image velocimetry, real-time pressure monitoring, and associated issues with the techniques will be discussed. Alongside this discussion, we also highlight recent advances in cardiac function measurements in chick embryos.


Subjects
Cardiovascular Physiological Phenomena; Heart; Animals; Chick Embryo; Humans; Blood Flow Velocity/physiology; Heart/physiology; Tomography, Optical Coherence/methods; Hemodynamics
5.
Bioinformatics; 38(19): 4633-4635, 2022 09 30.
Article in English | MEDLINE | ID: mdl-35976109

ABSTRACT

MOTIVATION: A genome read dataset can be quickly and efficiently remapped from one reference to another similar reference (e.g., between two reference versions or two similar species) using a variety of tools, e.g., the commonly used CrossMap tool. With the explosion of available genomic datasets and references, high-performance remapping tools will be even more important for keeping up with the computational demands of genome assembly and analysis. RESULTS: We provide FastRemap, a fast and efficient tool for remapping reads between genome assemblies. FastRemap provides up to a 7.82× speedup (6.47×, on average) and uses as little as 61.7% (80.7%, on average) of the peak memory consumption compared to the state-of-the-art remapping tool, CrossMap. AVAILABILITY AND IMPLEMENTATION: FastRemap is written in C++. Source code and user manual are freely available at: github.com/CMU-SAFARI/FastRemap. Docker image available at: https://hub.docker.com/r/alkanlab/fastremap. Also available in Bioconda at: https://anaconda.org/bioconda/fastremap-bio.
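Remapping between assemblies ultimately rests on interval-based coordinate liftover, of the kind encoded in a chain file of aligned blocks. The sketch below shows that core lookup with a hypothetical block list; FastRemap itself operates on full read alignments and real chain files, so treat this purely as an illustration of the concept.

```python
def remap_position(chain_blocks, pos):
    """Lift a coordinate from the source assembly to the target assembly
    using a list of aligned blocks (src_start, src_end, dst_start), the
    kind of interval mapping a chain file encodes. Returns None if the
    position falls outside every aligned block. Illustrative sketch only;
    FastRemap itself remaps whole read alignments, not single coordinates.
    """
    for src_start, src_end, dst_start in chain_blocks:
        if src_start <= pos < src_end:
            return dst_start + (pos - src_start)
    return None  # position is unaligned / deleted in the target assembly

if __name__ == "__main__":
    # Hypothetical aligned blocks between two assembly versions.
    blocks = [(0, 1_000, 0), (1_000, 5_000, 1_200), (5_000, 9_000, 5_150)]
    print(remap_position(blocks, 4_321))  # -> 4521 (offset by +200 in the new assembly)
```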


Subjects
High-Throughput Nucleotide Sequencing; Software; Sequence Analysis, DNA/methods; High-Throughput Nucleotide Sequencing/methods; Genomics/methods; Genome
6.
Bioinformatics; 38(2): 453-460, 2022 01 03.
Article in English | MEDLINE | ID: mdl-34529036

ABSTRACT

MOTIVATION: Agent-based modeling is an indispensable tool for studying complex biological systems. However, existing simulation platforms do not always take full advantage of modern hardware and often have a field-specific software design. RESULTS: We present a novel simulation platform called BioDynaMo that alleviates both of these problems. BioDynaMo features a modular and high-performance simulation engine. We demonstrate that BioDynaMo can be used to simulate use cases in: neuroscience, oncology and epidemiology. For each use case, we validate our findings with experimental data or an analytical solution. Our performance results show that BioDynaMo performs up to three orders of magnitude faster than the state-of-the-art baselines. This improvement makes it feasible to simulate each use case with one billion agents on a single server, showcasing the potential BioDynaMo has for computational biology research. AVAILABILITY AND IMPLEMENTATION: BioDynaMo is an open-source project under the Apache 2.0 license and is available at www.biodynamo.org. Instructions to reproduce the results are available in the supplementary information. SUPPLEMENTARY INFORMATION: Available at https://doi.org/10.5281/zenodo.5121618.
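As a rough illustration of what an "agent" and a simulation timestep mean in this setting, here is a toy agent-based loop: each agent updates its own state every step and may create new agents. The Cell behavior, parameters, and random-walk rule are made up for illustration; BioDynaMo's actual C++ engine, spatial neighbor queries, and parallelization are not represented here.

```python
import random

class Cell:
    """A minimal agent: a cell with a 1-D position that migrates randomly
    and occasionally divides. Purely illustrative behavior."""

    def __init__(self, x: float):
        self.x = x

    def step(self, newborns: list) -> None:
        self.x += random.uniform(-1.0, 1.0)   # random migration
        if random.random() < 0.01:            # rare division event
            newborns.append(Cell(self.x))

def simulate(n_agents: int = 1_000, n_steps: int = 100) -> int:
    """Run a toy agent-based simulation and return the final population size."""
    random.seed(42)
    agents = [Cell(random.uniform(0.0, 100.0)) for _ in range(n_agents)]
    for _ in range(n_steps):
        newborns: list = []
        for agent in agents:
            agent.step(newborns)
        agents.extend(newborns)
    return len(agents)

if __name__ == "__main__":
    print(simulate())  # number of agents after 100 steps
```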


Subjects
Algorithms; Software; Computer Simulation; Computational Biology/methods; Software Design
7.
Bioinformatics; 36(22-23): 5282-5290, 2021 Apr 01.
Article in English | MEDLINE | ID: mdl-33315064

ABSTRACT

MOTIVATION: We introduce SneakySnake, a highly parallel and highly accurate pre-alignment filter that remarkably reduces the need for computationally costly sequence alignment. The key idea of SneakySnake is to reduce the approximate string matching (ASM) problem to the single net routing (SNR) problem in VLSI chip layout. In the SNR problem, we are interested in finding the optimal path that connects two terminals with the least routing cost on a special grid layout that contains obstacles. The SneakySnake algorithm quickly solves the SNR problem and uses the found optimal path to decide whether or not performing sequence alignment is necessary. Reducing the ASM problem to SNR also makes SneakySnake efficient to implement on CPUs, GPUs and FPGAs. RESULTS: SneakySnake significantly improves the accuracy of pre-alignment filtering by up to four orders of magnitude compared to the state-of-the-art pre-alignment filters, Shouji, GateKeeper and SHD. For short sequences, SneakySnake accelerates Edlib (state-of-the-art implementation of Myers's bit-vector algorithm) and Parasail (state-of-the-art sequence aligner with a configurable scoring function), by up to 37.7× and 43.9× (>12× on average), respectively, with its CPU implementation, and by up to 413× and 689× (>400× on average), respectively, with FPGA and GPU acceleration. For long sequences, the CPU implementation of SneakySnake accelerates Parasail and KSW2 (sequence aligner of minimap2) by up to 979× (276.9× on average) and 91.7× (31.7× on average), respectively. As SneakySnake does not replace sequence alignment, users can still obtain all capabilities (e.g. configurable scoring functions) of the aligner of their choice, unlike existing acceleration efforts that sacrifice some aligner capabilities. AVAILABILITY AND IMPLEMENTATION: https://github.com/CMU-SAFARI/SneakySnake. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
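The grid-with-obstacles idea can be sketched as a greedy walk over the 2E+1 diagonals of the read-versus-reference comparison: repeatedly follow the longest obstacle-free (matching) segment available at the current column and pay one edit whenever an obstacle must be crossed. The Python below is our simplified reimplementation of that idea for illustration only; it omits the optimizations and the formal guarantees described in the paper.

```python
def sneakysnake_like_filter(read: str, ref: str, edit_threshold: int) -> bool:
    """Simplified sketch of the single-net-routing idea: walk left-to-right
    through a grid whose rows are the diagonals of read vs. ref shifted by
    -E..+E, greedily following the longest obstacle-free (matching) segment
    and spending one edit each time an obstacle has to be crossed.

    Returns True if the pair may be within edit_threshold edits (keep it
    for full alignment), False if it is rejected. Illustrative
    reimplementation of the idea, not the authors' code.
    """
    E = edit_threshold
    n = len(read)
    col, edits = 0, 0
    while col < n:
        # Longest run of matches starting at this column over all 2E+1 diagonals.
        best_run = 0
        for shift in range(-E, E + 1):
            run = 0
            while (col + run < n and 0 <= col + run + shift < len(ref)
                   and read[col + run] == ref[col + run + shift]):
                run += 1
            best_run = max(best_run, run)
        col += best_run
        if col < n:       # hit an obstacle: spend one edit and skip past it
            edits += 1
            col += 1
        if edits > E:
            return False  # too many obstacles on this path: filter out
    return True

if __name__ == "__main__":
    print(sneakysnake_like_filter("ACGTACGTAC", "ACGAACGTAC", 1))  # True  (1 substitution)
    print(sneakysnake_like_filter("ACGTACGTAC", "TTTTTTTTTT", 1))  # False (too dissimilar)
```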

8.
Tuberk Toraks; 70(2): 149-156, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35785879

ABSTRACT

Introduction: Pneumococcal infections and exacerbations are important causes of mortality and morbidity in chronic obstructive pulmonary disease (COPD). The use of inhaled corticosteroids and pneumococcal vaccination are suggested for the control of disease progression and exacerbations. The aim of this study is to assess the effect of pneumococcal conjugate vaccine on pneumonia and exacerbation in COPD patients using inhaled corticosteroids (ICSs). The secondary aim is to analyze the effect of ICS use and different ICS types, if administered, on exacerbation and pneumonia incidence in the study population. Materials and Methods: Medical records of 108 adult patients with COPD who were vaccinated with the pneumococcal conjugate vaccine (PCV13) were retrospectively evaluated. The numbers of acute exacerbations and pneumonia episodes within one year before and after vaccination were evaluated in all included COPD patients. A comparison analysis was also performed based on the ICS types. Results: There were statistically significant differences between the mean numbers of pneumonia episodes and exacerbations before and after vaccination (p<0.05). There were no significant differences in the mean pneumonia attacks and acute exacerbations between patients using ICS and not using ICS (p>0.05). Conclusions: This study revealed that PCV13 provides a significant decrease in both exacerbation and pneumonia episodes in COPD patients. On the other hand, the use of ICSs and the types of ICSs were not found to have adverse effects on pneumonia and acute exacerbations in vaccinated COPD patients.


Assuntos
Corticosteroides , Vacinas Pneumocócicas , Pneumonia , Doença Pulmonar Obstrutiva Crônica , Administração por Inalação , Corticosteroides/uso terapêutico , Adulto , Humanos , Vacinas Pneumocócicas/uso terapêutico , Pneumonia/complicações , Pneumonia/prevenção & controle , Doença Pulmonar Obstrutiva Crônica/complicações , Doença Pulmonar Obstrutiva Crônica/tratamento farmacológico , Estudos Retrospectivos , Vacinação , Vacinas Conjugadas/uso terapêutico
9.
Brief Bioinform; 20(4): 1542-1559, 2019 07 19.
Article in English | MEDLINE | ID: mdl-29617724

ABSTRACT

Nanopore sequencing technology has the potential to render other sequencing technologies obsolete with its ability to generate long reads and provide portability. However, high error rates of the technology pose a challenge while generating accurate genome assemblies. The tools used for nanopore sequence analysis are of critical importance, as they should overcome the high error rates of the technology. Our goal in this work is to comprehensively analyze current publicly available tools for nanopore sequence analysis to understand their advantages, disadvantages and performance bottlenecks. It is important to understand where the current tools do not perform well to develop better tools. To this end, we (1) analyze the multiple steps and the associated tools in the genome assembly pipeline using nanopore sequence data, and (2) provide guidelines for determining the appropriate tools for each step. Based on our analyses, we make four key observations: (1) the choice of the tool for basecalling plays a critical role in overcoming the high error rates of nanopore sequencing technology. (2) Read-to-read overlap finding tools, GraphMap and Minimap, perform similarly in terms of accuracy. However, Minimap has a lower memory usage, and it is faster than GraphMap. (3) There is a trade-off between accuracy and performance when deciding on the appropriate tool for the assembly step. The fast but less accurate assembler Miniasm can be used for quick initial assembly, and further polishing can be applied on top of it to increase the accuracy, which leads to faster overall assembly. (4) The state-of-the-art polishing tool, Racon, generates high-quality consensus sequences while providing a significant speedup over another polishing tool, Nanopolish. We analyze various combinations of different tools and expose the trade-offs between accuracy, performance, memory usage and scalability. We conclude that our observations can guide researchers and practitioners in making conscious and effective choices for each step of the genome assembly pipeline using nanopore sequence data. Also, with the help of bottlenecks we have found, developers can improve the current tools or build new ones that are both accurate and fast, to overcome the high error rates of the nanopore sequencing technology.


Subjects
Genomics/methods; Nanopore Sequencing/methods; Animals; Chromosome Mapping; Computational Biology; Escherichia coli/genetics; Genome, Bacterial; Genomics/statistics & numerical data; Genomics/trends; Humans; Nanopore Sequencing/statistics & numerical data; Nanopore Sequencing/trends; Sequence Analysis, DNA; Software
10.
Bioinformatics; 36(12): 3669-3679, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32167530

ABSTRACT

MOTIVATION: Third-generation sequencing technologies can sequence long reads that contain as many as 2 million base pairs. These long reads are used to construct an assembly (i.e. the subject's genome), which is further used in downstream genome analysis. Unfortunately, third-generation sequencing technologies have high sequencing error rates and a large proportion of base pairs in these long reads is incorrectly identified. These errors propagate to the assembly and affect the accuracy of genome analysis. Assembly polishing algorithms minimize such error propagation by polishing or fixing errors in the assembly by using information from alignments between reads and the assembly (i.e. read-to-assembly alignment information). However, current assembly polishing algorithms can only polish an assembly using reads from either a certain sequencing technology or a small assembly. Such technology-dependency and assembly-size dependency require researchers to (i) run multiple polishing algorithms and (ii) use small chunks of a large genome to use all available readsets and polish large genomes, respectively. RESULTS: We introduce Apollo, a universal assembly polishing algorithm that scales well to polish an assembly of any size (i.e. both large and small genomes) using reads from all sequencing technologies (i.e. second- and third-generation). Our goal is to provide a single algorithm that uses read sets from all available sequencing technologies to improve the accuracy of assembly polishing and that can polish large genomes. Apollo (i) models an assembly as a profile hidden Markov model (pHMM), (ii) uses read-to-assembly alignment to train the pHMM with the Forward-Backward algorithm and (iii) decodes the trained model with the Viterbi algorithm to produce a polished assembly. Our experiments with real readsets demonstrate that Apollo is the only algorithm that (i) uses reads from any sequencing technology within a single run and (ii) scales well to polish large assemblies without splitting the assembly into multiple parts. AVAILABILITY AND IMPLEMENTATION: Source code is available at https://github.com/CMU-SAFARI/Apollo. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
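The decoding step named above is standard Viterbi decoding of a hidden Markov model. The toy HMM below (two states, made-up transition and emission probabilities) isolates that step for illustration; Apollo's actual profile HMM over an assembly is far larger and is first trained with the Forward-Backward algorithm, which is not shown.

```python
import math

def viterbi(observations, states, log_start, log_trans, log_emit):
    """Generic Viterbi decoding: most likely state path for an observation
    sequence under an HMM given in log-probabilities.

    Toy illustration of the decoding step used by pHMM-based polishers;
    Apollo's actual model and its Forward-Backward training are not shown.
    """
    V = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        V.append({})
        back.append({})
        for s in states:
            best_prev = max(states, key=lambda p: V[t - 1][p] + log_trans[p][s])
            V[t][s] = V[t - 1][best_prev] + log_trans[best_prev][s] + log_emit[s][observations[t]]
            back[t][s] = best_prev
    # Trace back the best path.
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

if __name__ == "__main__":
    lg = math.log
    states = ["match", "error"]
    obs = ["A", "A", "C", "A"]
    log_start = {"match": lg(0.9), "error": lg(0.1)}
    log_trans = {"match": {"match": lg(0.95), "error": lg(0.05)},
                 "error": {"match": lg(0.90), "error": lg(0.10)}}
    log_emit = {"match": {"A": lg(0.85), "C": lg(0.05), "G": lg(0.05), "T": lg(0.05)},
                "error": {"A": lg(0.25), "C": lg(0.25), "G": lg(0.25), "T": lg(0.25)}}
    print(viterbi(obs, states, log_start, log_trans, log_emit))
    # -> ['match', 'match', 'match', 'match']
```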


Subjects
Algorithms; Software; High-Throughput Nucleotide Sequencing; Poland; Sequence Analysis, DNA; Technology
11.
Bioinformatics; 35(21): 4255-4263, 2019 11 01.
Article in English | MEDLINE | ID: mdl-30923804

ABSTRACT

MOTIVATION: The ability to generate massive amounts of sequencing data continues to overwhelm the processing capability of existing algorithms and compute infrastructures. In this work, we explore the use of hardware/software co-design and hardware acceleration to significantly reduce the execution time of short sequence alignment, a crucial step in analyzing sequenced genomes. We introduce Shouji, a highly parallel and accurate pre-alignment filter that remarkably reduces the need for computationally-costly dynamic programming algorithms. The first key idea of our proposed pre-alignment filter is to provide high filtering accuracy by correctly detecting all common subsequences shared between two given sequences. The second key idea is to design a hardware accelerator that adopts modern field-programmable gate array (FPGA) architectures to further boost the performance of our algorithm. RESULTS: Shouji significantly improves the accuracy of pre-alignment filtering by up to two orders of magnitude compared to the state-of-the-art pre-alignment filters, GateKeeper and SHD. Our FPGA-based accelerator is up to three orders of magnitude faster than the equivalent CPU implementation of Shouji. Using a single FPGA chip, we benchmark the benefits of integrating Shouji with five state-of-the-art sequence aligners, designed for different computing platforms. The addition of Shouji as a pre-alignment step reduces the execution time of the five state-of-the-art sequence aligners by up to 18.8×. Shouji can be adapted for any bioinformatics pipeline that performs sequence alignment for verification. Unlike most existing methods that aim to accelerate sequence alignment, Shouji does not sacrifice any of the aligner capabilities, as it does not modify or replace the alignment step. AVAILABILITY AND IMPLEMENTATION: https://github.com/CMU-SAFARI/Shouji. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Software; Algorithms; Genome; Sequence Alignment; Sequence Analysis, DNA; Software Design
12.
BMC Genomics; 19(Suppl 2): 89, 2018 May 09.
Article in English | MEDLINE | ID: mdl-29764378

ABSTRACT

BACKGROUND: Seed location filtering is critical in DNA read mapping, a process where billions of DNA fragments (reads) sampled from a donor are mapped onto a reference genome to identify genomic variants of the donor. State-of-the-art read mappers 1) quickly generate possible mapping locations for seeds (i.e., smaller segments) within each read, 2) extract reference sequences at each of the mapping locations, and 3) check similarity between each read and its associated reference sequences with a computationally-expensive algorithm (i.e., sequence alignment) to determine the origin of the read. A seed location filter comes into play before alignment, discarding seed locations that alignment would deem a poor match. The ideal seed location filter would discard all poor match locations prior to alignment such that there is no wasted computation on unnecessary alignments. RESULTS: We propose a novel seed location filtering algorithm, GRIM-Filter, optimized to exploit 3D-stacked memory systems that integrate computation within a logic layer stacked under memory layers, to perform processing-in-memory (PIM). GRIM-Filter quickly filters seed locations by 1) introducing a new representation of coarse-grained segments of the reference genome, and 2) using massively-parallel in-memory operations to identify read presence within each coarse-grained segment. Our evaluations show that for a sequence alignment error tolerance of 0.05, GRIM-Filter 1) reduces the false negative rate of filtering by 5.59x-6.41x, and 2) provides an end-to-end read mapper speedup of 1.81x-3.65x, compared to a state-of-the-art read mapper employing the best previous seed location filtering algorithm. CONCLUSION: GRIM-Filter exploits 3D-stacked memory, which enables the efficient use of processing-in-memory, to overcome the memory bandwidth bottleneck in seed location filtering. We show that GRIM-Filter significantly improves the performance of a state-of-the-art read mapper. GRIM-Filter is a universal seed location filter that can be applied to any read mapper. We hope that our results provide inspiration for new works to design other bioinformatics algorithms that take advantage of emerging technologies and new processing paradigms, such as processing-in-memory using 3D-stacked memory devices.
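The core check, whether enough of a read's k-mers are present in the coarse-grained reference segment (bin) around a candidate location, can be sketched as follows. The bin size, k-mer length, and acceptance threshold are illustrative assumptions, and Python sets stand in for the per-bin existence bitvectors that GRIM-Filter queries in parallel inside 3D-stacked memory.

```python
import random

def build_bin_kmer_sets(reference: str, bin_size: int = 100, k: int = 5):
    """Build one k-mer existence set per coarse-grained reference segment (bin).
    GRIM-Filter stores these as bitvectors in 3D-stacked memory; plain Python
    sets are used here purely for illustration."""
    bins = []
    for start in range(0, len(reference), bin_size):
        segment = reference[start:start + bin_size + k - 1]  # overlap so boundary k-mers are kept
        bins.append({segment[i:i + k] for i in range(len(segment) - k + 1)})
    return bins

def passes_filter(read: str, bins, location: int,
                  bin_size: int = 100, k: int = 5, min_hits: int = 20) -> bool:
    """Keep a candidate mapping location only if enough of the read's k-mers
    exist in the bin covering that location; otherwise skip the costly
    alignment for this location."""
    bin_kmers = bins[location // bin_size]
    hits = sum(1 for i in range(len(read) - k + 1) if read[i:i + k] in bin_kmers)
    return hits >= min_hits

if __name__ == "__main__":
    random.seed(0)
    ref = "".join(random.choice("ACGT") for _ in range(1000))
    bins = build_bin_kmer_sets(ref)
    good_read = ref[230:280]                                      # truly comes from bin 2
    bad_read = "".join(random.choice("ACGT") for _ in range(50))  # unrelated read
    print(passes_filter(good_read, bins, 230))  # True: location is sent to alignment
    print(passes_filter(bad_read, bins, 230))   # False: too few shared k-mers, filtered out
```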


Subjects
High-Throughput Nucleotide Sequencing/methods; Sequence Analysis, DNA/methods; Algorithms; Databases, Genetic; Genome, Human; Humans; Software
13.
Bioinformatics; 33(21): 3355-3363, 2017 Nov 01.
Article in English | MEDLINE | ID: mdl-28575161

ABSTRACT

MOTIVATION: High throughput DNA sequencing (HTS) technologies generate an excessive number of small DNA segments -called short reads- that cause significant computational burden. To analyze the entire genome, each of the billions of short reads must be mapped to a reference genome based on the similarity between a read and 'candidate' locations in that reference genome. The similarity measurement, called alignment, formulated as an approximate string matching problem, is the computational bottleneck because: (i) it is implemented using quadratic-time dynamic programming algorithms and (ii) the majority of candidate locations in the reference genome do not align with a given read due to high dissimilarity. Calculating the alignment of such incorrect candidate locations consumes an overwhelming majority of a modern read mapper's execution time. Therefore, it is crucial to develop a fast and effective filter that can detect incorrect candidate locations and eliminate them before invoking computationally costly alignment algorithms. RESULTS: We propose GateKeeper, a new hardware accelerator that functions as a pre-alignment step that quickly filters out most incorrect candidate locations. GateKeeper is the first design to accelerate pre-alignment using Field-Programmable Gate Arrays (FPGAs), which can perform pre-alignment much faster than software. When implemented on a single FPGA chip, GateKeeper maintains high accuracy (on average >96%) while providing, on average, 90-fold and 130-fold speedup over the state-of-the-art software pre-alignment techniques, Adjacency Filter and Shifted Hamming Distance (SHD), respectively. The addition of GateKeeper as a pre-alignment step can reduce the verification time of the mrFAST mapper by a factor of 10. AVAILABILITY AND IMPLEMENTATION: https://github.com/BilkentCompGen/GateKeeper. CONTACT: mohammedalser@bilkent.edu.tr or onur.mutlu@inf.ethz.ch or calkan@cs.bilkent.edu.tr. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
High-Throughput Nucleotide Sequencing/methods; Sequence Analysis, DNA/methods; Software; Algorithms; Genome, Human; Humans; Sequence Alignment/methods
14.
Tuberk Toraks; 66(4): 340-344, 2018 Dec.
Article in Turkish | MEDLINE | ID: mdl-30683030

ABSTRACT

Leptomeningeal metastasis, the infiltration of the leptomeninges and subarachnoid space by malignant cells, is a very rare complication and an indicator of poor prognosis. Its incidence in non-small cell lung carcinoma (NSCLC) is 3.8%, and the rate is higher in patients with an epidermal growth factor receptor (EGFR) mutation. Brain magnetic resonance imaging (MRI) is the first choice for diagnosis. Leptomeningeal metastasis is difficult to diagnose and often missed because it is rare and, unlike brain metastasis, does not cause gross mass lesions. Systemic chemotherapy, intrathecal therapy, cranial radiotherapy and targeted treatment agents are the treatment options. Targeted therapies have been shown to be promising in appropriate patients because of their ability to pass into the cerebrospinal fluid. We present a case of EGFR-positive lung adenocarcinoma with leptomeningeal metastasis (LM), notable for its rarity, its diagnostic difficulty and its association with EGFR mutation.


Subjects
Adenocarcinoma of Lung/secondary; Lung Neoplasms/pathology; Meningeal Neoplasms/secondary; Adenocarcinoma of Lung/diagnosis; Adenocarcinoma of Lung/genetics; Biomarkers, Tumor/genetics; Biomarkers, Tumor/metabolism; DNA Mutational Analysis; DNA, Neoplasm; ErbB Receptors/genetics; ErbB Receptors/metabolism; Female; Humans; Lung Neoplasms/genetics; Lung Neoplasms/metabolism; Meningeal Neoplasms/diagnosis; Meningeal Neoplasms/genetics; Middle Aged; Mutation; Retrospective Studies
15.
Bioinformatics; 32(11): 1632-42, 2016 06 01.
Article in English | MEDLINE | ID: mdl-26568624

ABSTRACT

MOTIVATION: Optimizing seed selection is an important problem in read mapping. The number of non-overlapping seeds a mapper selects determines the sensitivity of the mapper while the total frequency of all selected seeds determines the speed of the mapper. Modern seed-and-extend mappers usually select seeds with either an equal and fixed-length scheme or with an inflexible placement scheme, both of which limit the ability of the mapper in selecting less frequent seeds to speed up the mapping process. Therefore, it is crucial to develop a new algorithm that can adjust both the individual seed length and the seed placement, as well as derive less frequent seeds. RESULTS: We present the Optimal Seed Solver (OSS), a dynamic programming algorithm that discovers the least frequently-occurring set of x seeds in an L-base-pair read in [Formula: see text] operations on average and in [Formula: see text] operations in the worst case, while generating a maximum of [Formula: see text] seed frequency database lookups. We compare OSS against four state-of-the-art seed selection schemes and observe that OSS provides a 3-fold reduction in average seed frequency over the best previous seed selection optimizations. AVAILABILITY AND IMPLEMENTATION: We provide an implementation of the Optimal Seed Solver in C++ at: https://github.com/CMU-SAFARI/Optimal-Seed-Solver CONTACT: hxin@cmu.edu, calkan@cs.bilkent.edu.tr or onur@cmu.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
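The underlying optimization can be written as a small dynamic program: choose x non-overlapping, variable-length seeds from an L-base-pair read so that the sum of their database frequencies is minimal. The sketch below is the naive formulation with a toy frequency function standing in for the seed frequency database; OSS's contribution is a set of optimizations that make this search much cheaper on average, which are not reproduced here.

```python
from functools import lru_cache

def optimal_seeds(read: str, x: int, seed_freq, min_seed_len: int = 10):
    """Pick x non-overlapping seeds from `read` minimizing total frequency.

    Naive dynamic program over (number of seeds, prefix length); a sketch of
    the problem OSS solves, not the optimized Optimal Seed Solver itself.
    `seed_freq(s)` is assumed to return the occurrence count of seed `s`
    in the reference index.
    """
    L = len(read)
    INF = float("inf")

    @lru_cache(maxsize=None)
    def best(k: int, end: int):
        """Minimum total frequency of k non-overlapping seeds inside read[:end]."""
        if k == 0:
            return 0.0, ()
        if end < k * min_seed_len:
            return INF, ()
        candidates = [best(k, end - 1)]                  # option: no seed ends exactly at `end`
        for start in range(0, end - min_seed_len + 1):   # option: last seed is read[start:end]
            prev_cost, prev_seeds = best(k - 1, start)
            if prev_cost < INF:
                cost = prev_cost + seed_freq(read[start:end])
                candidates.append((cost, prev_seeds + ((start, end),)))
        return min(candidates, key=lambda c: c[0])

    return best(x, L)

if __name__ == "__main__":
    # Toy frequency model: seeds containing a repeat-like "AAAA" are very frequent.
    toy_freq = lambda s: 500 if "AAAA" in s else len(s)
    cost, seeds = optimal_seeds("ACGTTGCAAAAAGTCCGTAGGCTTACGATTGA",
                                x=2, seed_freq=toy_freq, min_seed_len=10)
    print(cost, seeds)  # total frequency and the (start, end) positions of the chosen seeds
```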


Subjects
Algorithms
17.
Bioinformatics; 31(10): 1553-60, 2015 May 15.
Article in English | MEDLINE | ID: mdl-25577434

ABSTRACT

MOTIVATION: Calculating the edit-distance (i.e. minimum number of insertions, deletions and substitutions) between short DNA sequences is the primary task performed by seed-and-extend based mappers, which compare billions of sequences. In practice, only sequence pairs with a small edit-distance provide useful scientific data. However, the majority of sequence pairs analyzed by seed-and-extend based mappers differ by significantly more errors than what is typically allowed. Such error-abundant sequence pairs needlessly waste resources and severely hinder the performance of read mappers. Therefore, it is crucial to develop a fast and accurate filter that can rapidly and efficiently detect error-abundant string pairs and remove them from consideration before more computationally expensive methods are used. RESULTS: We present a simple and efficient algorithm, Shifted Hamming Distance (SHD), which accelerates the alignment verification procedure in read mapping, by quickly filtering out error-abundant sequence pairs using bit-parallel and SIMD-parallel operations. SHD only filters string pairs that contain more errors than a user-defined threshold, making it fully comprehensive. It also maintains high accuracy with moderate error threshold (up to 5% of the string length) while achieving a 3-fold speedup over the best previous algorithm (Gene Myers's bit-vector algorithm). SHD is compatible with all mappers that perform sequence alignment for verification.
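The filter's core computation is easy to sketch: build a Hamming (mismatch) mask between the two sequences at every shift within the error budget, AND the masks together, and count the positions at which every shift disagrees. The Python below illustrates only that logic; the real SHD additionally refines the masks and packs them into SIMD registers, which is not shown.

```python
def shifted_hamming_filter(read: str, ref: str, e: int) -> bool:
    """Keep the pair (return True) unless the shifted Hamming masks show it
    must contain more than `e` edits. Pure-Python sketch of the SHD idea;
    the real implementation uses bit-parallel and SIMD operations.
    """
    n = len(read)
    combined = (1 << n) - 1           # bit i = 1 means "mismatch at i under every shift so far"
    for shift in range(-e, e + 1):
        mask = 0
        for i in range(n):
            j = i + shift
            if not (0 <= j < len(ref)) or read[i] != ref[j]:
                mask |= 1 << i        # mismatch between read[i] and ref shifted by `shift`
        combined &= mask
    # A position that mismatches under every shift in [-e, e] must be touched
    # by an edit, so more than e such positions rules out <= e edits.
    return bin(combined).count("1") <= e

if __name__ == "__main__":
    print(shifted_hamming_filter("ACGTACGTACGT", "ACGTTACGTACGT", 1))  # True: one insertion apart
    print(shifted_hamming_filter("ACGTACGTACGT", "TGCATGCATGCA", 1))   # False: too dissimilar, filtered
```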


Subjects
Computational Biology/methods; Sequence Alignment/methods; Sequence Analysis, DNA/methods; Software; Algorithms; Base Sequence; Humans; Molecular Sequence Data; Sequence Homology, Nucleic Acid
18.
Methods; 79-80: 3-10, 2015 Jun.
Article in English | MEDLINE | ID: mdl-25461772

ABSTRACT

Many recent advances in genomics and the expectations of personalized medicine are made possible by the power of high throughput sequencing (HTS) in sequencing large collections of human genomes. There are tens of different sequencing technologies currently available, and each HTS platform has different strengths and biases. This diversity makes it possible to use different technologies to correct for each other's shortcomings, but it also requires developing different algorithms for each platform due to differences in data types and error models. The first problem to tackle in analyzing HTS data for resequencing applications is the read mapping stage. Many mapping tools have been developed for the most popular HTS methods, but publicly available, open-source aligners are still lacking for the Complete Genomics (CG) platform. Unfortunately, Burrows-Wheeler-based methods are not practical for CG data due to the gapped nature of the reads generated by this method. Here we provide a sensitive read mapper (sirFAST) for the CG technology based on the seed-and-extend paradigm that can quickly map CG reads to a reference genome. We evaluate the performance and accuracy of sirFAST using both simulated and publicly available real data sets, showing high precision and recall rates.


Subjects
Genomics/methods; High-Throughput Nucleotide Sequencing; Algorithms; Electronic Data Processing/methods; Genome, Human; Humans; Sequence Alignment; Sequence Analysis, DNA; Software
19.
BMC Biomed Eng; 6(1): 3, 2024 Apr 23.
Article in English | MEDLINE | ID: mdl-38654382

ABSTRACT

Compared to classical techniques of morphological analysis, micro-CT (µ-CT) has become an effective approach that allows rapid screening of morphological changes. In the present work, we aimed to provide an optimized micro-CT dense-agent perfusion protocol and µ-CT guidelines for different stages of chick embryo cardiogenesis. Our study was conducted on chick embryo hearts over a period of 10 embryonic days (Hamburger-Hamilton stage HH36). During perfusion of the micro-CT dense agent at different developmental stages (HH19, HH24, HH27, HH29, HH31, HH34, HH35, and HH36), we found that the durations and volumes of the injected contrast agent gradually increased with developmental stage, whereas the flow rate remained unchanged throughout the experiment. Analysis of the CT images confirmed the efficiency of the optimized heart perfusion parameters.

20.
Genome Biol; 25(1): 49, 2024 02 16.
Article in English | MEDLINE | ID: mdl-38365730

ABSTRACT

Nanopore sequencing generates noisy electrical signals that need to be converted into a standard string of DNA nucleotide bases using a computational step called basecalling. The performance of basecalling has critical implications for all later steps in genome analysis. Therefore, there is a need to reduce the computation and memory cost of basecalling while maintaining accuracy. We present RUBICON, a framework to develop efficient hardware-optimized basecallers. We demonstrate the effectiveness of RUBICON by developing RUBICALL, the first hardware-optimized mixed-precision basecaller that performs efficient basecalling, outperforming the state-of-the-art basecallers. We believe RUBICON offers a promising path to develop future hardware-optimized basecallers.


Subjects
Deep Learning; Nanopores; Sequence Analysis, DNA; Genomics; Nucleotides; DNA/genetics