Search | BVS CLAP/SMR-OPAS/OMS

Genome-scale de novo assembly using ALGA.

Swat, Sylwester; Laskowski, Artur; Badura, Jan; Frohmberg, Wojciech; Wojciechowski, Pawel; Swiercz, Aleksandra; Kasprzak, Marta; Blazewicz, Jacek.

Bioinformatics ; 37(12): 1644-1651, 2021 Jul 19.

Article in English | MEDLINE | ID: mdl-33471088

ABSTRACT

MOTIVATION: There are very few methods for de novo genome assembly based on the overlap graph approach. This approach is considered to give more exact results than the so-called de Bruijn graph approach, but at the cost of much longer running time and much higher memory usage. It is not uncommon for assembly methods based on the overlap graph model to fail on larger datasets, mainly because of the memory limits of the computer. For this reason, the assembly methods developed in recent decades have mainly been de Bruijn-based, fast and fairly accurate. However, these methods can fail for longer or more repetitive genomes, as they decompose reads into shorter fragments and lose part of the information. An efficient assembler that processes big datasets using the overlap graph model is still sought.
RESULTS: We propose a new genome-scale de novo assembler based on the overlap graph approach, designed for short-read sequencing data. The method, ALGA, incorporates several new ideas that result in more exact contigs produced in a short time. These ideas include the creation of a sparse but quite informative graph, a reduction of the graph that includes a procedure related to the minimum spanning tree problem on a local subgraph, and a graph traversal combined with simultaneous analysis of the contigs stored so far. Unusually for genome assembly, the algorithm is almost parameter-free, with only one optional parameter to be set by the user. ALGA was compared with nine state-of-the-art assemblers in tests on genome-scale sequencing data obtained from real experiments on six organisms differing in size, coverage, GC content and repetition rate. ALGA produced the best results in terms of overall quality of genome reconstruction, understood as a good balance between genome coverage, accuracy and length of the resulting sequences. The algorithm is one of the tools used to process data in the ongoing national project Genomic Map of Poland.
AVAILABILITY AND IMPLEMENTATION: ALGA is available at http://alga.put.poznan.pl.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
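
For readers unfamiliar with the overlap graph model mentioned in the abstract above, the following is a minimal illustrative sketch, not ALGA's actual algorithm. It assumes exact suffix-prefix matches and a naive all-pairs comparison. The function names and the `min_overlap` parameter are hypothetical, introduced only for this example.

```python
from itertools import permutations


def suffix_prefix_overlap(a: str, b: str, min_overlap: int) -> int:
    """Return the length of the longest suffix of `a` equal to a prefix of `b`.

    Returns 0 when no exact overlap of at least `min_overlap` characters exists.
    """
    max_len = min(len(a), len(b))
    # Try the longest candidate first so the first hit is the maximum overlap.
    for length in range(max_len, min_overlap - 1, -1):
        if length > 0 and a.endswith(b[:length]):
            return length
    return 0


def build_overlap_graph(reads, min_overlap):
    """Build a directed graph: reads are vertices, arcs are weighted by overlap length.

    Naive O(n^2) pairwise comparison, used here only for illustration;
    a genome-scale assembler needs indexing to avoid comparing all pairs.
    """
    graph = {}
    for a, b in permutations(reads, 2):
        length = suffix_prefix_overlap(a, b, min_overlap)
        if length:
            graph.setdefault(a, {})[b] = length
    return graph


reads = ["ACGTAC", "GTACGG", "CGGTTA"]
overlap_graph = build_overlap_graph(reads, min_overlap=3)
```

In this toy input, the arcs chain the three reads in order, which is the path an assembler would follow to spell out a contig. Unlike a de Bruijn graph, each vertex keeps the whole read, so no read-level information is discarded.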

Protein alignment algorithms with an efficient backtracking routine on multiple GPUs.

Blazewicz, Jacek; Frohmberg, Wojciech; Kierzynka, Michal; Pesch, Erwin; Wojciechowski, Pawel.

BMC Bioinformatics ; 12: 181, 2011 May 20.

Article in English | MEDLINE | ID: mdl-21599912

ABSTRACT

BACKGROUND: Pairwise sequence alignment methods are widely used in biological research. The increasing number of sequences is perceived as one of the upcoming challenges for sequence alignment methods in the near future. To overcome this challenge, several GPU (Graphics Processing Unit) computing approaches have been proposed lately. These solutions show the great potential of the GPU platform, but in most cases they address the problem of sequence database scanning and compute only the alignment score, whereas the alignment itself is omitted. Thus, the need arose to implement the global and semiglobal Needleman-Wunsch algorithms and the Smith-Waterman algorithm with a backtracking procedure, which is needed to construct the alignment.
RESULTS: In this paper we present a solution that performs the alignment of every given sequence pair, which is a required step for progressive multiple sequence alignment methods, as well as for DNA recognition at the DNA assembly stage. The tests performed show that the implementation, with performance of up to 6.3 GCUPS on a single GPU for affine gap penalties, is very efficient in comparison with other CPU- and GPU-based solutions. Moreover, support for multiple GPUs with load balancing makes the application very scalable.
CONCLUSIONS: The article shows that the backtracking procedure of sequence alignment algorithms can be designed to fit the GPU architecture. Therefore, our algorithm is able to compute pairwise alignments in addition to scores. This opens a wide range of new possibilities, allowing other methods from the area of molecular biology to take advantage of the new computational architecture. The tests performed show that the efficiency of the implementation is excellent. Moreover, the speed of our GPU-based algorithms increases almost linearly when more than one graphics card is used.
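
To illustrate the difference between computing only an alignment score and also recovering the alignment by backtracking, here is a small CPU-only sketch of global Needleman-Wunsch alignment. It is not the authors' GPU implementation. It assumes a simple linear gap penalty, whereas the abstract reports results for affine gap penalties, and the scoring values are arbitrary defaults chosen for the example.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty (an assumption of this sketch).

    Returns (score, aligned_a, aligned_b).
    """
    n, m = len(a), len(b)

    def sub(x, y):
        return match if x == y else mismatch

    # Fill the dynamic-programming score matrix.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # (mis)match
                score[i - 1][j] + gap,                          # gap in b
                score[i][j - 1] + gap,                          # gap in a
            )

    # Backtracking: walk from the bottom-right cell to the origin,
    # re-deriving which move produced each cell's value.
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


best_score, aligned_a, aligned_b = needleman_wunsch("GATTACA", "GATACA")
```

The score alone needs only the last cell of the matrix, while backtracking needs the whole matrix (or stored move directions), which is why score-only database scanning is cheaper in memory than full alignment.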

Subjects

Proteins/chemistry; Sequence Alignment/methods; Sequence Analysis, Protein/methods; Software; Algorithms

GRASShopPER-An algorithm for de novo assembly based on GPU alignments.

Swiercz, Aleksandra; Frohmberg, Wojciech; Kierzynka, Michal; Wojciechowski, Pawel; Zurkowski, Piotr; Badura, Jan; Laskowski, Artur; Kasprzak, Marta; Blazewicz, Jacek.

PLoS One ; 13(8): e0202355, 2018.

Article in English | MEDLINE | ID: mdl-30114279

ABSTRACT

Next-generation sequencers produce billions of short DNA sequences in a massively parallel manner, which poses a great computational challenge for accurately reconstructing a genome sequence de novo from these short sequences. Here, we propose the GRASShopPER assembler, which follows the overlap-layout-consensus approach. It uses an efficient GPU implementation for sequence alignment during the graph construction stage and a greedy hyper-heuristic algorithm at the fork detection stage. A two-part fork detection method allows us to identify repeated fragments of a genome and to reconstruct them without misassemblies. The assemblies of data sets from the bacterium Candidatus Microthrix, the nematode Caenorhabditis elegans, and human chromosome 14 were evaluated with the gold-standard tool QUAST. In comparison with other assemblers, GRASShopPER provided contigs that covered the largest part of the genomes and, at the same time, kept good values of other metrics, e.g., NG50 and misassembly rate.
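
The NG50 metric mentioned above can be sketched in a few lines. This is an illustrative implementation of the standard definition, not code taken from QUAST or GRASShopPER: NG50 is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the reference genome length. The function name and the choice to return `None` when that threshold is never reached are assumptions of this sketch.

```python
def ng50(contig_lengths, genome_length):
    """Return NG50, or None if the contigs cover less than half of the genome."""
    target = genome_length / 2
    covered = 0
    # Accumulate contigs from longest to shortest until half the genome is covered.
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if covered >= target:
            return length
    return None


contigs = [100, 80, 60, 40, 20]
example_ng50 = ng50(contigs, genome_length=400)
```

For comparison, N50 uses the total assembly length instead of the reference genome length as the denominator, so passing `sum(contigs)` as `genome_length` gives N50 for the same contigs.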

Subjects

Algorithms; High-Throughput Nucleotide Sequencing/methods; Sequence Analysis, DNA/methods; Actinomycetales/genetics; Animals; Caenorhabditis elegans/genetics; Chromosomes, Human, Pair 14; Genomics/methods; Humans

Quantitative Trait Loci for Yield and Yield-Related Traits in Spring Barley Populations Derived from Crosses between European and Syrian Cultivars.

Mikolajczak, Krzysztof; Ogrodowicz, Piotr; Gudys, Kornelia; Krystkowiak, Karolina; Sawikowska, Aneta; Frohmberg, Wojciech; Górny, Andrzej; Kedziora, Andrzej; Jankowiak, Janusz; Józefczyk, Damian; Karg, Grzegorz; Andrusiak, Joanna; Krajewski, Pawel; Szarejko, Iwona; Surma, Maria; Adamski, Tadeusz; Guzy-Wróbelska, Justyna; Kuczynska, Anetta.

PLoS One ; 11(5): e0155938, 2016.

Article in English | MEDLINE | ID: mdl-27227880

ABSTRACT

In response to climatic changes, breeding programmes should be aimed at creating new cultivars with improved resistance to water scarcity. The objective of this study was to examine the yield potential of barley recombinant inbred lines (RILs) derived from three cross-combinations of European and Syrian spring cultivars, and to identify quantitative trait loci (QTLs) for yield-related traits in these populations. RILs were evaluated in field experiments over a period of three years (2011 to 2013) and genotyped with simple sequence repeat (SSR) and single nucleotide polymorphism (SNP) markers; a genetic map for each population was constructed and then one consensus map was developed. Biological interpretation of the identified QTLs was achieved by reference to the Ensembl Plants barley gene space. Twelve regions in the genomes of the studied RILs were distinguished after QTL analysis. Most of the QTLs were identified on the 2H chromosome, which was the hotspot region in all three populations. Syrian parental cultivars contributed alleles decreasing trait values at the majority of QTLs for grain weight, grain number, spike length and time to heading, and numerous alleles increasing stem length. The phenomic and molecular approaches distinguished the lines with an acceptable grain yield potential combining desirable features or alleles from their parents, that is, early heading from the Syrian breeding line (Cam/B1/CI08887//CI05761) and short plant stature from the European semidwarf cultivar (Maresi).

Subjects

Chromosomes, Plant/genetics; Crosses, Genetic; Genes, Plant/genetics; Hordeum/genetics; Quantitative Trait Loci/genetics; Chromosome Mapping; Europe; Phenotype; Syria

Measures for interoperability of phenotypic data: minimum information requirements and formatting.

Cwiek-Kupczynska, Hanna; Altmann, Thomas; Arend, Daniel; Arnaud, Elizabeth; Chen, Dijun; Cornut, Guillaume; Fiorani, Fabio; Frohmberg, Wojciech; Junker, Astrid; Klukas, Christian; Lange, Matthias; Mazurek, Cezary; Nafissi, Anahita; Neveu, Pascal; van Oeveren, Jan; Pommier, Cyril; Poorter, Hendrik; Rocca-Serra, Philippe; Sansone, Susanna-Assunta; Scholz, Uwe; van Schriek, Marco; Seren, Ümit; Usadel, Björn; Weise, Stephan; Kersey, Paul; Krajewski, Pawel.

Plant Methods ; 12: 44, 2016.

Article in English | MEDLINE | ID: mdl-27843484

ABSTRACT

BACKGROUND: Plant phenotypic data conceal a wealth of information which, when accurately analysed and linked to other data types, brings to light knowledge about the mechanisms of life. As phenotyping is a field of research comprising manifold, diverse and time-consuming experiments, the findings can be enhanced by reusing and combining existing datasets. Their correct interpretation, and thus replicability, comparability and interoperability, is possible provided that the collected observations are accompanied by an adequate set of metadata. So far there have been no common standards governing phenotypic data description, which has hampered data exchange and reuse.
RESULTS: In this paper we propose guidelines for the proper handling of information about plant phenotyping experiments, in terms of both the recommended content of the description and its formatting. We provide a document called "Minimum Information About a Plant Phenotyping Experiment", which specifies what information about each experiment should be given, and a Phenotyping Configuration for the ISA-Tab format, which allows this information to be organised in practice within a dataset. We provide examples of ISA-Tab-formatted phenotypic data, and a general description of a few systems where the recommendations have been implemented.
CONCLUSIONS: Acceptance of the rules described in this paper by the plant phenotyping community will help to achieve findable, accessible, interoperable and reusable data.
