Search | VHL Search Portal

Fastphylo: fast tools for phylogenetics.

Khan, Mehmood Alam; Elias, Isaac; Sjölund, Erik; Nylander, Kristina; Guimera, Roman Valls; Schobesberger, Richard; Schmitzberger, Peter; Lagergren, Jens; Arvestad, Lars.

BMC Bioinformatics ; 14: 334, 2013 Nov 20.

Article in English | MEDLINE | ID: mdl-24255987

ABSTRACT

BACKGROUND: Distance methods are ubiquitous tools in phylogenetics. Their primary purpose may be to reconstruct evolutionary history, but they are also used as components in bioinformatic pipelines. However, poor computational efficiency has been a constraint on the applicability of distance methods to very large problem instances. RESULTS: We present fastphylo, a software package containing implementations of efficient algorithms for two common problems in phylogenetics: estimating DNA/protein sequence distances and reconstructing a phylogeny from a distance matrix. We compare fastphylo with other neighbor-joining-based methods and report the results in terms of speed and memory efficiency. CONCLUSIONS: Fastphylo is a fast, memory-efficient, and easy-to-use software suite. Due to its modular architecture, fastphylo is a flexible tool for many phylogenetic studies.

Subject(s)

Computational Biology/instrumentation , Computational Biology/methods , Phylogeny , Algorithms , Amino Acid Sequence , Biological Evolution , Language , Memory , Multigene Family , Software
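
Illustrative note on the first problem named in the fastphylo abstract above (estimating sequence distances): the following is a minimal sketch, not fastphylo's own code or interface. It assumes pre-aligned DNA sequences and uses the Jukes-Cantor correction as one common distance model; the function names (p_distance, jukes_cantor, distance_matrix) are hypothetical and chosen only for this example.

```python
import math


def p_distance(a: str, b: str) -> float:
    """Fraction of differing sites between two aligned sequences.

    Sites where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(a: str, b: str) -> float:
    """Jukes-Cantor distance: d = -3/4 * ln(1 - 4p/3)."""
    p = p_distance(a, b)
    if p == 0.0:
        return 0.0
    if p >= 0.75:
        # The correction is undefined at saturation.
        return math.inf
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


def distance_matrix(seqs: dict) -> tuple:
    """Return (names, symmetric matrix) of pairwise distances."""
    names = list(seqs)
    n = len(names)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = jukes_cantor(seqs[names[i]], seqs[names[j]])
    return names, d
```

A matrix produced this way is the kind of input a neighbor-joining step would consume. The pairwise loop is quadratic in the number of sequences, which is the cost the abstract refers to when it mentions very large problem instances.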

Arteria: An automation system for a sequencing core facility.

Dahlberg, Johan; Hermansson, Johan; Sturlaugsson, Steinar; Lysenkova, Mariya; Smeds, Patrik; Ladenvall, Claes; Guimera, Roman Valls; Reisinger, Florian; Hofmann, Oliver; Larsson, Pontus.

Gigascience ; 8(12), 2019 Dec 01.

Article in English | MEDLINE | ID: mdl-31825479

ABSTRACT

BACKGROUND: In recent years, nucleotide sequencing has become increasingly instrumental in both research and clinical settings. This has led to an explosive growth in sequencing data produced worldwide. As the amount of data increases, so does the need for automated solutions for data processing and analysis. The concept of workflows has gained favour in the bioinformatics community, but there is little in the scientific literature describing end-to-end automation systems. Arteria is an automation system that aims at providing a solution to the data-related operational challenges that face sequencing core facilities. FINDINGS: Arteria is built on existing open source technologies, with a modular design allowing for a community-driven effort to create plug-and-play micro-services. In this article we describe the system, elaborate on the underlying conceptual framework, and present an example implementation. Arteria can be reduced to 3 conceptual levels: orchestration (using an event-based model of automation), process (the steps involved in processing sequencing data, modelled as workflows), and execution (using a series of RESTful micro-services). This creates a system that is both flexible and scalable. Arteria-based systems have been successfully deployed at 3 sequencing core facilities. The Arteria Project code, written largely in Python, is available as open source software, and more information can be found at https://arteria-project.github.io/ . CONCLUSIONS: We describe the Arteria system and the underlying conceptual framework, demonstrating how this model can be used to automate data handling and analysis in the context of a sequencing core facility.

Subject(s)

Electronic Data Processing/methods , High-Throughput Nucleotide Sequencing/methods , Humans , Software , Workflow
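
Illustrative note on the three conceptual levels named in the Arteria abstract above (orchestration, process, execution): the following is a toy sketch under stated assumptions, not Arteria's code. Plain Python functions stand in for the RESTful micro-services, and every name here (Event, WORKFLOWS, handle, the "runfolder_ready" event, the two step functions) is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Event:
    kind: str       # hypothetical event name, e.g. "runfolder_ready"
    payload: dict   # data carried by the event


# Execution level: in a real deployment these would be calls to
# micro-services; here they are local functions that update a context.
def demultiplex(ctx: dict) -> dict:
    return {**ctx, "demultiplexed": True}


def quality_check(ctx: dict) -> dict:
    return {**ctx, "qc_passed": ctx.get("demultiplexed", False)}


Step = Callable[[dict], dict]

# Process level: a workflow is modelled as an ordered list of steps.
WORKFLOWS: Dict[str, List[Step]] = {
    "runfolder_ready": [demultiplex, quality_check],
}


# Orchestration level: an incoming event selects and runs a workflow.
def handle(event: Event) -> dict:
    ctx = dict(event.payload)
    for step in WORKFLOWS.get(event.kind, []):
        ctx = step(ctx)
    return ctx
```

Calling handle(Event("runfolder_ready", {"runfolder": "r1"})) runs both steps in order, while an event kind with no registered workflow returns its payload unchanged. Keeping the three levels separate is what lets steps or services be swapped without touching the event handling.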

Journal of Open Source Software (JOSS): design and first-year review.

Smith, Arfon M; Niemeyer, Kyle E; Katz, Daniel S; Barba, Lorena A; Githinji, George; Gymrek, Melissa; Huff, Kathryn D; Madan, Christopher R; Mayes, Abigail Cabunoc; Moerman, Kevin M; Prins, Pjotr; Ram, Karthik; Rokem, Ariel; Teal, Tracy K; Guimera, Roman Valls; Vanderplas, Jacob T.

PeerJ Comput Sci ; 4: e147, 2018.

Article in English | MEDLINE | ID: mdl-32704456

ABSTRACT

This article describes the motivation, design, and progress of the Journal of Open Source Software (JOSS). JOSS is a free and open-access journal that publishes articles describing research software. It has the dual goals of improving the quality of the software submitted and providing a mechanism for research software developers to receive credit. While designed to work within the current merit system of science, JOSS addresses the dearth of rewards for key contributions to science made in the form of software. JOSS publishes articles that encapsulate scholarship contained in the software itself, and its rigorous peer review targets the software components: functionality, documentation, tests, continuous integration, and the license. A JOSS article contains an abstract describing the purpose and functionality of the software, references, and a link to the software archive. The article is the entry point of a JOSS submission, which encompasses the full set of software artifacts. Submission and review proceed in the open, on GitHub. Editors, reviewers, and authors work collaboratively and openly. Unlike other journals, JOSS does not reject articles requiring major revision; while not yet accepted, articles remain visible and under review until the authors make adequate changes (or withdraw, if unable to meet requirements). Once an article is accepted, JOSS gives it a digital object identifier (DOI), deposits its metadata in Crossref, and the article can begin collecting citations on indexers like Google Scholar and other services. Authors retain copyright of their JOSS article, releasing it under a Creative Commons Attribution 4.0 International License. In its first year, starting in May 2016, JOSS published 111 articles, with more than 40 additional articles under review. JOSS is a sponsored project of the nonprofit organization NumFOCUS and is an affiliate of the Open Source Initiative (OSI).

Experiences with workflows for automating data-intensive bioinformatics.

Spjuth, Ola; Bongcam-Rudloff, Erik; Hernández, Guillermo Carrasco; Forer, Lukas; Giovacchini, Mario; Guimera, Roman Valls; Kallio, Aleksi; Korpelainen, Eija; Kandula, Maciej M; Krachunov, Milko; Kreil, David P; Kulev, Ognyan; Labaj, Pawel P; Lampa, Samuel; Pireddu, Luca; Schönherr, Sebastian; Siretskiy, Alexey; Vassilev, Dimitar.

Biol Direct ; 10: 43, 2015 Aug 19.

Article in English | MEDLINE | ID: mdl-26282399

ABSTRACT

High-throughput technologies, such as next-generation sequencing, have turned molecular biology into a data-intensive discipline, requiring bioinformaticians to use high-performance computing resources and carry out data management and analysis tasks on a large scale. Workflow systems can be useful to simplify construction of analysis pipelines that automate tasks, support reproducibility and provide measures for fault-tolerance. However, workflow systems can incur significant development and administration overhead, so bioinformatics pipelines are often still built without them. We present the experiences with workflows and workflow systems within the bioinformatics community participating in a series of hackathons and workshops of the EU COST action SeqAhead. The organizations are working on similar problems, but we have addressed them with different strategies and solutions. This fragmentation of efforts is inefficient and leads to redundant and incompatible solutions. Based on our experiences we define a set of recommendations for future systems to enable efficient yet simple bioinformatics workflow construction and execution.

Subject(s)

Computational Biology/methods , Electronic Data Processing/methods , Workflow , High-Throughput Nucleotide Sequencing , Reproducibility of Results
