1.
Bioinformatics ; 39(3), 2023 Mar 01.
Article in English | MEDLINE | ID: mdl-36924420

ABSTRACT

MOTIVATION: Long non-coding RNA (lncRNA) plays a key role in many biological processes. For instance, lncRNA regulates chromatin using different molecular mechanisms, including direct RNA-DNA hybridization via triplexes, cotranscriptional RNA-RNA interactions, and RNA-DNA binding mediated by protein complexes. While the functional annotation of lncRNA transcripts has been widely studied over the last 20 years, barely a handful of tools have been developed with the specific purpose of detecting and evaluating lncRNA-DNA triple helices. What is worse, some of these tools are now nearly a decade old, so new triplex-centric pipelines depend on legacy software that cannot thoroughly process all the data made available by next-generation sequencing (NGS) technologies. RESULTS: We present PATO, a modern, fast, and efficient tool for the detection of lncRNA-DNA triplexes that matches NGS processing capabilities. PATO enables the prediction of triple helices at the genome scale and can process more than 60 GB of sequence data in as little as 1 h using a two-socket server. Moreover, PATO's efficiency allows a more exhaustive search of the triplex-forming solution space, and so PATO achieves higher levels of prediction accuracy in far less time than other state-of-the-art tools. AVAILABILITY AND IMPLEMENTATION: Source code, user manual, and tests are freely available to download under the MIT License at https://github.com/UDC-GAC/pato.


Subjects
Long Noncoding RNA , Long Noncoding RNA/genetics , Long Noncoding RNA/metabolism , DNA/metabolism , Software
2.
BMC Bioinformatics ; 23(1): 117, 2022 Apr 02.
Article in English | MEDLINE | ID: mdl-35366804

ABSTRACT

BACKGROUND: Epistasis is the interaction between different genes when expressing a certain phenotype. If epistasis involves more than two loci it is called high-order epistasis. High-order epistasis is an area under active research because it could be the cause of many complex traits. The most common way to specify an epistasis interaction is through a penetrance table. RESULTS: This paper presents PyToxo, a Python tool for generating penetrance tables from any-order epistasis models. Unlike other tools available in the literature, PyToxo is able to work with high-order models and realistic penetrance and heritability values, achieving high-precision results in a short time. In addition, PyToxo is distributed as open-source software and includes several interfaces to ease its use. CONCLUSIONS: PyToxo provides the scientific community with a useful tool to evaluate algorithms and methods that can detect high-order epistasis to continue advancing in the discovery of the causes behind complex diseases.


Subjects
Genetic Epistasis , Genetic Models , Penetrance , Phenotype , Software
3.
BMC Bioinformatics ; 21(1): 138, 2020 Apr 09.
Article in English | MEDLINE | ID: mdl-32272874

ABSTRACT

BACKGROUND: Epistasis is defined as the interaction between different genes when expressing a specific phenotype. The most common way to characterize an epistatic relationship is using a penetrance table, which contains the probability of expressing the phenotype under study given a particular allele combination. Available simulators can only create penetrance tables for well-known epistasis models involving a small number of genes and under a large number of limitations. RESULTS: Toxo is a MATLAB library designed to calculate penetrance tables of epistasis models of any interaction order which resemble real data more closely. The user specifies the desired heritability (or prevalence) and the program maximizes the table's prevalence (or heritability) according to the input epistatic model boundaries. CONCLUSIONS: Toxo extends the capabilities of existing simulators that define epistasis using penetrance tables. These tables can be directly used as input for software simulators such as GAMETES so that they are able to generate data samples with larger interactions and more realistic prevalences/heritabilities.
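
To make the notion of a penetrance table concrete, the following minimal Python sketch computes the prevalence and heritability implied by a given table. It is not taken from Toxo or PyToxo. It assumes biallelic loci in Hardy-Weinberg equilibrium, and it takes heritability to be the variance of the penetrance across genotype combinations divided by K(1 - K), where K is the prevalence; the abstract does not spell out this formula, so treat it as an assumption. The function names and the toy table are hypothetical.

```python
from itertools import product


def genotype_freqs(maf):
    """Hardy-Weinberg frequencies for 0, 1 and 2 copies of the minor allele."""
    p, q = 1.0 - maf, maf
    return [p * p, 2 * p * q, q * q]


def prevalence_and_heritability(penetrance, mafs):
    """Return (prevalence, heritability) for a penetrance table.

    penetrance maps a tuple of per-locus genotypes (0, 1 or 2 minor alleles)
    to P(phenotype | genotype combination); mafs lists one minor allele
    frequency per locus. Loci are assumed independent (an assumption).
    """
    per_locus = [genotype_freqs(m) for m in mafs]
    prevalence = 0.0
    second_moment = 0.0
    for combo in product(range(3), repeat=len(mafs)):
        prob = 1.0
        for locus, genotype in enumerate(combo):
            prob *= per_locus[locus][genotype]
        f = penetrance[combo]
        prevalence += prob * f
        second_moment += prob * f * f
    variance = second_moment - prevalence ** 2
    heritability = variance / (prevalence * (1.0 - prevalence))
    return prevalence, heritability


# Hypothetical two-locus toy model: high risk only when both loci carry
# at least one minor allele.
toy_table = {
    combo: (0.9 if combo[0] >= 1 and combo[1] >= 1 else 0.1)
    for combo in product(range(3), repeat=2)
}
k, h2 = prevalence_and_heritability(toy_table, [0.5, 0.5])
print(f"prevalence={k:.4f} heritability={h2:.4f}")
```

A tool of the kind described in the abstract works in the opposite direction: it fixes one of the two quantities and searches for table values that maximize the other.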


Subjects
Genetic Epistasis , User-Computer Interface , Genotype , Genetic Models , Penetrance , Phenotype
4.
Bioinformatics ; 33(17): 2762-2764, 2017 Sep 01.
Article in English | MEDLINE | ID: mdl-28475668

ABSTRACT

SUMMARY: This article presents MarDRe, a de novo cloud-ready duplicate and near-duplicate removal tool that can process single- and paired-end reads from FASTQ/FASTA datasets. MarDRe takes advantage of the widely adopted MapReduce programming model to fully exploit Big Data technologies on cloud-based infrastructures. Written in Java to maximize cross-platform compatibility, MarDRe is built upon the open-source Apache Hadoop project, the most popular distributed computing framework for scalable Big Data processing. On a 16-node cluster deployed on the Amazon EC2 cloud platform, MarDRe is up to 8.52 times faster than a representative state-of-the-art tool. AVAILABILITY AND IMPLEMENTATION: Source code in Java and Hadoop as well as a user's guide are freely available under the GNU GPLv3 license at http://mardre.des.udc.es . CONTACT: rreye@udc.es.


Subjects
DNA Sequence Analysis/methods , Software , Algorithms
5.
Bioinformatics ; 32(10): 1562-4, 2016 May 15.
Article in English | MEDLINE | ID: mdl-26803159

ABSTRACT

Current next-generation sequencing technologies often generate duplicated or near-duplicated reads that (depending on the application scenario) do not provide any interesting biological information but can increase memory requirements and computational time of downstream analysis. In this work we present ParDRe, a de novo parallel tool to remove duplicated and near-duplicated reads through the clustering of single-end or paired-end sequences from FASTA or FASTQ files. It uses a novel bitwise approach to compare the suffixes of DNA strings and employs hybrid MPI/multithreading to reduce runtime on multicore systems. We show that ParDRe is up to 27.29 times faster than Fulcrum (a representative state-of-the-art tool) on a platform with two 8-core Sandy-Bridge processors. AVAILABILITY AND IMPLEMENTATION: Source code in C++ and MPI running on Linux systems as well as a reference manual are available at https://sourceforge.net/projects/pardre/ . CONTACT: jgonzalezd@udc.es.
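
The idea of comparing DNA strings bitwise can be illustrated with a short Python sketch. This is not ParDRe's implementation, which is written in C++ and whose exact encoding the abstract does not describe. The sketch assumes a 2-bit code per base (A=00, C=01, G=10, T=11), packs each sequence into an integer, and counts mismatching positions with one XOR and a population count. The names `pack`, `mismatches` and `near_duplicate` are hypothetical.

```python
# Assumed 2-bit encoding of the four bases (not taken from the paper).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def pack(seq):
    """Pack a DNA string into an integer, two bits per base."""
    value = 0
    for base in seq:
        value = (value << 2) | CODE[base]
    return value


def mismatches(a, b):
    """Count positions where two equal-length DNA strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    if not a:
        return 0
    diff = pack(a) ^ pack(b)
    # A base differs if either of its two bits differs: fold the high bit
    # of every 2-bit slot onto the low bit, then keep only the low bits.
    low_bits = int("01" * len(a), 2)
    collapsed = (diff | (diff >> 1)) & low_bits
    return bin(collapsed).count("1")


def near_duplicate(a, b, allowed):
    """True when the two sequences differ in at most `allowed` positions."""
    return mismatches(a, b) <= allowed


print(mismatches("ACGT", "ACGA"), near_duplicate("ACGT", "TCGA", 1))
```

In a compiled language the same trick operates on fixed-width machine words, so many bases are compared per instruction.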


Subjects
Algorithms , DNA Sequence Analysis/methods , Cluster Analysis , High-Throughput Nucleotide Sequencing
6.
Bioinformatics ; 32(24): 3826-3828, 2016 Dec 15.
Article in English | MEDLINE | ID: mdl-27638400

ABSTRACT

MSAProbs is a state-of-the-art protein multiple sequence alignment tool based on hidden Markov models. It can achieve high alignment accuracy at the expense of relatively long runtimes for large-scale input datasets. In this work we present MSAProbs-MPI, a distributed-memory parallel version of the multithreaded MSAProbs tool that is able to reduce runtimes by exploiting the compute capabilities of common multicore CPU clusters. Our performance evaluation on a cluster with 32 nodes (each containing two Intel Haswell processors) shows reductions in execution time of over one order of magnitude for typical input datasets. Furthermore, MSAProbs-MPI using eight nodes is faster than the GPU-accelerated QuickProbs running on a Tesla K20. Another strong point is that MSAProbs-MPI can deal with large datasets for which MSAProbs and QuickProbs might fail due to time and memory constraints, respectively. AVAILABILITY AND IMPLEMENTATION: Source code in C++ and MPI running on Linux systems as well as a reference manual are available at http://msaprobs.sourceforge.net . CONTACT: jgonzalezd@udc.es. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology/methods , Proteins , Sequence Alignment/methods , Algorithms , Amino Acid Sequence , Markov Chains , Software
7.
IEEE/ACM Trans Comput Biol Bioinform ; 20(3): 2041-2049, 2023.
Article in English | MEDLINE | ID: mdl-37015593

ABSTRACT

The discovery of Differentially Methylated (DM) regions is an important research field in biology, as it can help to anticipate the risk of suffering from specific diseases. Nevertheless, the high computational cost of the bioinformatic tools developed for this purpose prevents their application to large-scale datasets. Hence, much faster tools are required to further progress in this research field. In this work we present ParRADMeth, a parallel tool that applies beta-binomial regression for the identification of these DM regions. It is based on the state-of-the-art sequential tool RADMeth, which showed superior biological accuracy compared to its counterparts in previous experimental evaluations. ParRADMeth provides the same DM regions as RADMeth but at a significantly reduced runtime thanks to exploiting the compute capabilities of common multicore CPU clusters. For example, our tool is up to 189 times faster for real data experiments on a cluster with 16 nodes, each one containing two eight-core processors. The source code of ParRADMeth and a reference manual are available at https://github.com/UDC-GAC/ParRADMeth.


Subjects
Computational Biology , Software , Algorithms
8.
Article in English | MEDLINE | ID: mdl-33055017

ABSTRACT

Finding epistatic interactions among loci when expressing a phenotype is a widely employed strategy to understand the genetic architecture of complex traits in GWAS. The abundance of methods dedicated to the same purpose, however, makes it increasingly difficult for scientists to decide which method is more suitable for their studies. This work compares the different epistasis detection methods published during the last decade in terms of runtime, detection power and type I error rate, with a special emphasis on high-order interactions. Results show that in terms of detection power, the only methods that perform well across all experiments are the exhaustive methods, although their computational cost may be prohibitive in large-scale studies. Regarding non-exhaustive methods, not one could consistently find epistasis interactions when marginal effects are absent. If marginal effects are present, there are methods that perform well for high-order interactions, such as BADTrees, FDHE-IW, SingleMI or SNPHarvester. As for false-positive control, only SNPHarvester, FDHE-IW and DCHE show good results. The study concludes that there is no single epistasis detection method to recommend in all scenarios. Authors should prioritize exhaustive methods when sufficient computational resources are available considering the data set size, and resort to non-exhaustive methods when the analysis time is prohibitive.


Subjects
Genetic Epistasis , Genome-Wide Association Study , Genetic Epistasis/genetics , Genome-Wide Association Study/methods , Phenotype , Single Nucleotide Polymorphism
9.
Methods Mol Biol ; 2231: 39-47, 2021.
Article in English | MEDLINE | ID: mdl-33289885

ABSTRACT

Multiple sequence alignment (MSA) is a central step in many bioinformatics and computational biology analyses. Although there exist many methods to perform MSA, most of them fail when dealing with large datasets due to their high computational cost. MSAProbs-MPI is a publicly available tool ( http://msaprobs.sourceforge.net ) that provides highly accurate results in relatively short runtime thanks to exploiting the hardware resources of multicore clusters. In this chapter, I explain the statistical and biological concepts employed in MSAProbs-MPI to complete the alignments, as well as the high-performance computing techniques used to accelerate it. Moreover, I provide some hints about the configuration parameters that should be used to guarantee high-performance executions.


Subjects
Computational Biology/methods , Sequence Alignment/methods , Software , Algorithms , Computational Biology/instrumentation , Computing Methodologies , Sequence Alignment/instrumentation
10.
Methods Mol Biol ; 1986: 227-243, 2019.
Article in English | MEDLINE | ID: mdl-31115891

ABSTRACT

Parallel and high-performance computing has been steadily gaining attention in recent years as a means to accelerate several kinds of computationally expensive applications. This chapter reviews research works and publicly available tools that aim to accelerate microarray data analysis by exploiting high-performance computing systems.


Subjects
Computing Methodologies , Microarray Analysis/methods , Cloud Computing , Genetic Epistasis , Gene Regulatory Networks
11.
IEEE/ACM Trans Comput Biol Bioinform ; 15(5): 1732-1737, 2018.
Article in English | MEDLINE | ID: mdl-29028205

ABSTRACT

In this work, we present MPIGeneNet, a parallel tool that applies Pearson's correlation and Random Matrix Theory to construct gene co-expression networks. It is based on the state-of-the-art sequential tool RMTGeneNet, which provides networks with high robustness and sensitivity at the expense of relatively long runtimes for large-scale input datasets. MPIGeneNet returns the same results as RMTGeneNet but improves the memory management, reduces the I/O cost, and accelerates the two most computationally demanding steps of co-expression network construction by exploiting the compute capabilities of common multicore CPU clusters. Our performance evaluation on two different systems using three typical input datasets shows that MPIGeneNet is significantly faster than RMTGeneNet. As an example, our tool is up to 175.41 times faster on a cluster with eight nodes, each one containing two 12-core Intel Haswell processors. The source code of MPIGeneNet and a reference manual are available at https://sourceforge.net/projects/mpigenenet/.
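
The correlation step of co-expression network construction can be sketched in a few lines of Python. This is only an illustration, not MPIGeneNet's code: it computes Pearson's correlation for every gene pair and keeps the pairs whose absolute correlation reaches a threshold. In the tool described above the threshold is derived with Random Matrix Theory; here it is simply a parameter supplied by the caller, which is an assumption made for brevity. The function names and the toy expression data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / math.sqrt(var_x * var_y)


def coexpression_edges(expression, threshold):
    """Return (gene_a, gene_b, r) for every pair with |r| >= threshold.

    expression maps a gene name to its list of expression values,
    one value per sample.
    """
    genes = sorted(expression)
    edges = []
    for i, gene_a in enumerate(genes):
        for gene_b in genes[i + 1:]:
            r = pearson(expression[gene_a], expression[gene_b])
            if abs(r) >= threshold:
                edges.append((gene_a, gene_b, r))
    return edges


# Hypothetical toy data: g1 and g2 are perfectly correlated, g3 is not.
toy_expression = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [1.0, -1.0, 1.0, -1.0],
}
print(coexpression_edges(toy_expression, 0.9))
```

The double loop over gene pairs is the quadratic part of the computation, which is why this step is a natural target for distribution across cluster nodes.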


Subjects
Computational Biology/methods , Gene Expression/genetics , Gene Regulatory Networks/genetics , Software , Statistical Models , Sequence Alignment , DNA Sequence Analysis
12.
PLoS One ; 13(4): e0194361, 2018.
Article in English | MEDLINE | ID: mdl-29608567

ABSTRACT

Biclustering techniques are gaining attention in the analysis of large-scale datasets as they identify two-dimensional submatrices where both rows and columns are correlated. In this work we present ParBiBit, a parallel tool to accelerate the search for interesting biclusters in binary datasets, which are very popular in different fields such as genetics, marketing or text mining. It is based on the state-of-the-art sequential Java tool BiBit, which has been proved accurate by several studies, especially in scenarios that result in many large biclusters. ParBiBit uses the same methodology as BiBit (grouping the binary information into patterns) and provides the same results. Nevertheless, our tool significantly improves performance thanks to an efficient implementation based on C++11 that includes support for threads and MPI processes in order to exploit the compute capabilities of modern distributed-memory systems, which provide several multicore CPU nodes interconnected through a network. Our performance evaluation with 18 representative input datasets on two different eight-node systems shows that our tool is significantly faster than the original BiBit. Source code in C++ and MPI running on Linux systems as well as a reference manual are available at https://sourceforge.net/projects/parbibit/.
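
The phrase "grouping the binary information into patterns" can be illustrated with a small Python sketch. It is a loose, simplified reading of pattern-based biclustering on binary data and should not be taken as the actual BiBit or ParBiBit algorithm: each row is stored as an integer bitmask, the bitwise AND of two rows yields a candidate column pattern, and every row that contains the pattern joins the bicluster. The function name and the `min_rows` and `min_cols` parameters are hypothetical.

```python
from itertools import combinations


def pattern_biclusters(rows, min_rows=2, min_cols=2):
    """Find biclusters of ones in a binary matrix stored as row bitmasks.

    rows is a list of integers; bit c of rows[r] is 1 when the matrix has
    a one at row r, column c. Returns a list of (row_indices, pattern).
    """
    seen = set()
    found = []
    for i, j in combinations(range(len(rows)), 2):
        pattern = rows[i] & rows[j]  # columns shared by both rows
        if pattern in seen or bin(pattern).count("1") < min_cols:
            continue
        seen.add(pattern)
        # Every row whose ones cover the pattern belongs to the bicluster.
        members = [k for k, row in enumerate(rows) if row & pattern == pattern]
        if len(members) >= min_rows:
            found.append((members, pattern))
    return found


# Hypothetical 4x4 binary matrix, one bitmask per row.
toy_rows = [0b1110, 0b0111, 0b0110, 0b1001]
for members, pattern in pattern_biclusters(toy_rows):
    print(members, format(pattern, "04b"))
```

Because every pair of rows is an independent unit of work, a search of this shape lends itself to being split across threads and processes.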


Subjects
Software , Cluster Analysis , Programming Languages , Web Browser
13.
PLoS One ; 13(7): e0201483, 2018.
Article in English | MEDLINE | ID: mdl-30063721

ABSTRACT

Nowadays, the analysis of transcriptome sequencing (RNA-seq) data has become the standard method for quantifying the levels of gene expression. In RNA-seq experiments, the mapping of short reads to a reference genome or transcriptome is considered a crucial step that remains one of the most time-consuming. With the steady development of Next Generation Sequencing (NGS) technologies, unprecedented amounts of genomic data introduce significant challenges in terms of storage, processing and downstream analysis. As cost and throughput continue to improve, there is a growing need for new software solutions that minimize the impact of increasing data volume on RNA read alignment. In this work we introduce HSRA, a Big Data tool that takes advantage of the MapReduce programming model to extend the multithreading capabilities of a state-of-the-art spliced read aligner for RNA-seq data (HISAT2) to distributed memory systems such as multi-core clusters or cloud platforms. HSRA has been built upon the Hadoop MapReduce framework and supports both single- and paired-end reads from FASTQ/FASTA datasets, providing output alignments in SAM format. The design of HSRA has been carefully optimized to avoid the main limitations and major causes of inefficiency found in previous Big Data mapping tools, which cannot fully exploit the raw performance of the underlying aligner. On a 16-node multi-core cluster, HSRA is on average 2.3 times faster than previous Hadoop-based tools. Source code in Java as well as a user's guide are publicly available for download at http://hsra.dec.udc.es.


Subjects
Big Data , High-Throughput Nucleotide Sequencing , RNA Folding , Sequence Alignment/methods , RNA Sequence Analysis/methods , Software
14.
Comput Methods Programs Biomed ; 139: 51-60, 2017 Feb.
Article in English | MEDLINE | ID: mdl-28187895

ABSTRACT

BACKGROUND AND OBJECTIVES: The analysis of the interference patterns on the tear film lipid layer is a useful clinical test to diagnose dry eye syndrome. This task can be automated with a high degree of accuracy using tear film maps. However, the time required by the existing applications to generate them prevents a wider acceptance of this method by medical experts. Multithreading has previously been employed successfully by the authors to accelerate the tear film map definition on multicore single-node machines. In this work, we propose a hybrid message-passing and multithreading parallel approach that further accelerates the generation of tear film maps by exploiting the computational capabilities of distributed-memory systems such as multicore clusters and supercomputers. METHODS: The algorithm for drawing tear film maps is parallelized using Message Passing Interface (MPI) for inter-node communications and the multithreading support available in the C++11 standard for intra-node parallelization. The original algorithm is modified to reduce the communications and increase the scalability. RESULTS: The hybrid method has been tested on 32 nodes of an Intel cluster (with two 12-core Haswell 2680v3 processors per node) using 50 representative images. Results show that maximum runtime is reduced from almost two minutes using the previous multithreaded-only approach to less than ten seconds using the hybrid method. CONCLUSIONS: The hybrid MPI/multithreaded implementation can be used by medical experts to obtain tear film maps in only a few seconds, which will significantly accelerate and facilitate the diagnosis of dry eye syndrome.


Subjects
Dry Eye Syndromes/diagnosis , Tears , Algorithms , Humans
15.
PLoS One ; 11(1): e0145490, 2016.
Article in English | MEDLINE | ID: mdl-26731399

ABSTRACT

The growth of next-generation sequencing (NGS) datasets poses a challenge to the alignment of reads to reference genomes in terms of alignment quality and execution speed. Some available aligners have been shown to obtain high quality mappings at the expense of long execution times. Finding fast yet accurate software solutions is of high importance to research, since availability and size of NGS datasets continue to increase. In this work we present an efficient parallelization approach for NGS short-read alignment on multi-core clusters. Our approach takes advantage of a distributed shared memory programming model based on the new UPC++ language. Experimental results using the CUSHAW3 aligner show that our implementation based on dynamic scheduling obtains good scalability on multi-core clusters. Through our evaluation, we are able to complete the single-end and paired-end alignments of 246 million reads of length 150 base-pairs in 11.54 and 16.64 minutes, respectively, using 32 nodes with four AMD Opteron 6272 16-core CPUs per node. In contrast, the multi-threaded original tool needs 2.77 and 5.54 hours to perform the same alignments on the 64 cores of one node. The source code of our parallel implementation is publicly available at the CUSHAW3 homepage (http://cushaw3.sourceforge.net).


Subjects
Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods , Programming Languages , Sequence Alignment/methods , Algorithms , Human Genome/genetics , Humans , Internet , Reproducibility of Results
16.
Article in English | MEDLINE | ID: mdl-26451813

ABSTRACT

High-throughput genotyping technologies (such as SNP-arrays) allow the rapid collection of up to a few million genetic markers of an individual. Detecting epistasis (based on 2-SNP interactions) in Genome-Wide Association Studies is an important but time-consuming operation since statistical computations have to be performed for each pair of measured markers. Computational methods to detect epistasis therefore suffer from prohibitively long runtimes; e.g., processing a moderately-sized dataset consisting of about 500,000 SNPs and 5,000 samples requires several days using state-of-the-art tools on a standard 3 GHz CPU. In this paper, we demonstrate how this task can be accelerated using a combination of fine-grained and coarse-grained parallelism on two different computing systems. The first architecture is based on reconfigurable hardware (FPGAs) while the second architecture uses multiple GPUs connected to the same host. We show that both systems can achieve speedups of around four orders of magnitude compared to the sequential implementation. This significantly reduces the runtimes for detecting epistasis to only a few minutes for moderately-sized datasets and to a few hours for large-scale datasets.
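
The cost of exhaustive 2-SNP analysis follows directly from the number of marker pairs, which the short Python sketch below makes explicit. It is an illustration only and does not reproduce the FPGA or GPU implementations: it counts the pairs for a dataset of the size quoted above and builds, for one pair, the 3 x 3 x 2 genotype-by-phenotype contingency table on which a pairwise statistic would then be evaluated. The choice of statistic is left open here, and the function names and toy genotypes are hypothetical.

```python
import math
from itertools import combinations


def pair_count(num_snps):
    """Number of distinct SNP pairs that an exhaustive 2-SNP search visits."""
    return math.comb(num_snps, 2)


def contingency_table(snp_a, snp_b, phenotype):
    """Count samples per (genotype_a, genotype_b, phenotype) cell.

    Genotypes are coded 0, 1 or 2; phenotype is 0 (control) or 1 (case).
    Returns a nested list indexed as table[genotype_a][genotype_b][phenotype].
    """
    table = [[[0, 0] for _ in range(3)] for _ in range(3)]
    for a, b, y in zip(snp_a, snp_b, phenotype):
        table[a][b][y] += 1
    return table


def all_pair_tables(genotypes, phenotype):
    """Build the contingency table of every SNP pair (quadratic in SNPs)."""
    return {
        (i, j): contingency_table(genotypes[i], genotypes[j], phenotype)
        for i, j in combinations(range(len(genotypes)), 2)
    }


# Hypothetical toy data: 3 SNPs genotyped in 4 samples.
toy_genotypes = [[0, 1, 2, 1], [0, 0, 2, 1], [2, 1, 0, 0]]
toy_phenotype = [0, 0, 1, 1]
print(pair_count(500_000))
print(all_pair_tables(toy_genotypes, toy_phenotype)[(0, 1)][2][2])
```

Each pair's table is independent of all the others, so the work divides cleanly across many processing elements.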


Subjects
Computer Graphics/instrumentation , DNA Mutational Analysis/instrumentation , Genetic Epistasis/genetics , Genome-Wide Association Study/instrumentation , High-Throughput Nucleotide Sequencing/instrumentation , Single Nucleotide Polymorphism/genetics , Chromosome Mapping/instrumentation , Chromosome Mapping/methods , Equipment Design , Equipment Failure Analysis , Genome-Wide Association Study/methods , Reproducibility of Results , Sensitivity and Specificity , Computer-Assisted Signal Processing/instrumentation