Search | BVS Colombia Search Portal

Genome-scale de novo assembly using ALGA.

Swat, Sylwester; Laskowski, Artur; Badura, Jan; Frohmberg, Wojciech; Wojciechowski, Pawel; Swiercz, Aleksandra; Kasprzak, Marta; Blazewicz, Jacek.

Bioinformatics ; 37(12): 1644-1651, 2021 Jul 19.

Article in English | MEDLINE | ID: mdl-33471088

ABSTRACT

MOTIVATION: Very few methods for de novo genome assembly are based on the overlap graph approach. This approach is considered to give more exact results than the de Bruijn graph approach, but at the cost of much longer running time and much higher memory usage. Assembly methods based on the overlap graph model often cannot process larger datasets, mainly because of the memory limits of the computer. For this reason, the assembly methods developed in recent decades have mainly been de Bruijn-based, which are fast and fairly accurate. However, these methods can fail for longer or more repetitive genomes, because they decompose reads into shorter fragments and lose part of the information. An efficient assembler that processes big datasets using the overlap graph model is still sought.
RESULTS: We propose a new genome-scale de novo assembler based on the overlap graph approach, designed for short-read sequencing data. The method, ALGA, incorporates several new ideas that yield more exact contigs in a short time. These ideas include the creation of a sparse but quite informative graph, a graph reduction that includes a procedure related to the minimum spanning tree problem on a local subgraph, and a graph traversal combined with simultaneous analysis of the contigs stored so far. Unusually for genome assembly, the algorithm is almost parameter-free, with only one optional parameter to be set by the user. ALGA was compared with nine state-of-the-art assemblers in tests on genome-scale sequencing data obtained from real experiments on six organisms differing in size, coverage, GC content and repetition rate. ALGA produced the best results in terms of the overall quality of genome reconstruction, understood as a good balance between genome coverage, accuracy and length of the resulting sequences. The algorithm is one of the tools used to process data in the ongoing national project Genomic Map of Poland.
AVAILABILITY AND IMPLEMENTATION: ALGA is available at http://alga.put.poznan.pl.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
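
To make the overlap graph model named in the abstract concrete, here is a minimal sketch of the general idea. It is not ALGA's algorithm: the function names `longest_overlap` and `build_overlap_graph` and the `min_len` threshold are assumptions made for illustration, and the quadratic all-pairs comparison would not scale to genome-sized data. Each read is a vertex, and a directed edge i -> j records that a suffix of read i exactly matches a prefix of read j.

```python
def longest_overlap(a, b, min_len):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when no exact overlap of at least `min_len` characters exists.
    """
    max_len = min(len(a), len(b))
    # Try the longest candidate first so the first hit is the longest overlap.
    for k in range(max_len, max(min_len, 1) - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def build_overlap_graph(reads, min_len):
    """Build a directed overlap graph as {read_index: [(other_index, overlap_len)]}.

    Naive O(n^2) comparison of all ordered read pairs; illustrative only.
    """
    graph = {}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i == j:
                continue
            k = longest_overlap(a, b, min_len)
            if k > 0:
                graph.setdefault(i, []).append((j, k))
    return graph


reads = ["ACGTAC", "GTACGG", "ACGGTT"]
graph = build_overlap_graph(reads, min_len=3)
```

In this toy input the graph forms a single path 0 -> 1 -> 2, so walking it and merging the overlaps spells out one contig. Real assemblers must additionally handle sequencing errors, reverse complements and repeats, none of which this sketch attempts.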

GRASShopPER-An algorithm for de novo assembly based on GPU alignments.

Swiercz, Aleksandra; Frohmberg, Wojciech; Kierzynka, Michal; Wojciechowski, Pawel; Zurkowski, Piotr; Badura, Jan; Laskowski, Artur; Kasprzak, Marta; Blazewicz, Jacek.

PLoS One ; 13(8): e0202355, 2018.

Article in English | MEDLINE | ID: mdl-30114279

ABSTRACT

Next-generation sequencers produce billions of short DNA sequences in a massively parallel manner, which poses a great computational challenge for accurately reconstructing a genome sequence de novo from these short sequences. Here, we propose the GRASShopPER assembler, which follows the overlap-layout-consensus approach. It uses an efficient GPU implementation for sequence alignment during the graph construction stage and a greedy hyper-heuristic algorithm at the fork detection stage. A two-part fork detection method allows us to identify repeated fragments of a genome and to reconstruct them without misassemblies. The assemblies of data sets from the bacterium Candidatus Microthrix, the nematode Caenorhabditis elegans, and human chromosome 14 were evaluated with the gold-standard tool QUAST. In comparison with other assemblers, GRASShopPER provided contigs that covered the largest part of the genomes while maintaining good values of other metrics, e.g., NG50 and misassembly rate.
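
The NG50 metric mentioned in the abstract can be illustrated with a short sketch. It assumes the common definition: the length of the contig at which the contigs, taken from longest to shortest, first cover at least half of the reference genome length. The function name `ng50` and the choice to return `None` when the assembly never reaches that threshold are assumptions made for this example, not taken from the paper or from QUAST's code.

```python
def ng50(contig_lengths, genome_size):
    """Return the NG50 of an assembly, or None if it covers less than half the genome.

    contig_lengths: iterable of contig lengths in base pairs.
    genome_size: length of the reference (or estimated) genome in base pairs.
    """
    target = genome_size / 2
    covered = 0
    # Accumulate contigs from longest to shortest until half the genome is covered.
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if covered >= target:
            return length
    return None


example = ng50([50, 40, 30, 20, 10], genome_size=200)
```

With a 200 bp genome the threshold is 100 bp; the two longest contigs cover 90 bp and the third brings the total to 120 bp, so `example` is 30. Unlike N50, which uses the total assembly length as the denominator, NG50 is comparable across assemblies of the same genome.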

Subject(s)

Algorithms; High-Throughput Nucleotide Sequencing/methods; Sequence Analysis, DNA/methods; Actinomycetales/genetics; Animals; Caenorhabditis elegans/genetics; Chromosomes, Human, Pair 14; Genomics/methods; Humans

Datasets for Benchmarking RNA Design Algorithms.

Badura, Jan; Zok, Tomasz; Rybarczyk, Agnieszka.

Methods Mol Biol ; 2847: 229-240, 2025.

Article in English | MEDLINE | ID: mdl-39312148

ABSTRACT

RNA molecules play vital roles in many biological processes, such as gene regulation or protein synthesis. The adoption of a specific secondary and tertiary structure by RNA is essential to perform these diverse functions, making RNA a popular tool in bioengineering therapeutics. The field of RNA design responds to the need to develop novel RNA molecules that possess specific functional attributes. In recent years, computational tools for predicting RNA sequences with desired folding characteristics have improved and expanded. However, there is still a lack of well-defined and standardized datasets to assess these programs. Here, we present a large dataset of internal and multibranched loops extracted from PDB-deposited RNA structures that encompass a wide spectrum of design difficulties. Furthermore, we conducted benchmarking tests of widely utilized open-source RNA design algorithms employing this dataset.
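
As a rough illustration of what internal and multibranched loops are, the following sketch classifies the loop closed by each base pair in a secondary structure written in dot-bracket notation. This is an assumption-laden example, not the authors' extraction procedure, which works on PDB-deposited structures; the function names `parse_pairs` and `classify_loops` are hypothetical, and pseudoknot-free input is assumed.

```python
def parse_pairs(dot_bracket):
    """Map every paired position to its partner (pseudoknot-free input assumed)."""
    partner = {}
    stack = []
    for pos, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            opening = stack.pop()
            partner[opening] = pos
            partner[pos] = opening
    if stack:
        raise ValueError("unbalanced structure")
    return partner


def classify_loops(dot_bracket):
    """Return {(i, j): loop_type} for the loop closed by each base pair (i, j)."""
    partner = parse_pairs(dot_bracket)
    loops = {}
    for i, j in partner.items():
        if i > j:
            continue  # handle each pair once, from its opening position
        # Collect the base pairs directly enclosed by (i, j).
        children = []
        k = i + 1
        while k < j:
            if k in partner:
                children.append((k, partner[k]))
                k = partner[k] + 1  # skip over the enclosed helix
            else:
                k += 1
        if not children:
            kind = "hairpin"
        elif len(children) == 1:
            left = children[0][0] - i - 1   # unpaired bases on the 5' side
            right = j - children[0][1] - 1  # unpaired bases on the 3' side
            if left == 0 and right == 0:
                kind = "stack"
            elif left == 0 or right == 0:
                kind = "bulge"
            else:
                kind = "internal"
        else:
            kind = "multibranch"
        loops[(i, j)] = kind
    return loops


internal_example = classify_loops("((..((...))..))")
multibranch_example = classify_loops("((...)(...))")
```

In the first structure the pair (1, 13) closes an internal loop, with two unpaired bases on each side of the inner helix; in the second, the pair (0, 11) encloses two helices and therefore closes a multibranched loop. The exterior loop is not classified by this sketch.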

Subject(s)

Algorithms; Benchmarking; Computational Biology; Nucleic Acid Conformation; RNA; RNA/genetics; RNA/chemistry; Computational Biology/methods; Software
