ABSTRACT
Unipept Desktop 2.0 is the most recent iteration of the Unipept Desktop tool that adds support for the analysis of metaproteogenomics datasets. Unipept Desktop now supports the automatic construction of targeted protein reference databases that only contain proteins (originating from the UniProtKB resource) associated with a predetermined list of taxa. This improves both the taxonomic and functional resolution of a metaproteomic analysis and yields several technical advantages. By limiting the proteins present in a reference database, it is also possible to perform (meta)proteogenomics analyses. Since the protein reference database resides on the user's local machine, they have complete control over the database used during an analysis. Data no longer need to be transmitted over the Internet, decreasing the time required for an analysis and better safeguarding privacy-sensitive data. As a proof of concept, we present a case study in which a human gut metaproteome dataset is analyzed with Unipept Desktop 2.0 using different targeted databases based on matched 16S rRNA gene sequencing data.
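The construction of a targeted reference database described above can be pictured as a filter over protein records by taxon. The following is only a minimal sketch under stated assumptions: the toy taxonomy, the taxon ids, the record layout and the function names are all hypothetical and do not reflect the actual Unipept Desktop implementation.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical toy taxonomy: taxon id -> parent taxon id (the root has no parent).
PARENT: Dict[int, Optional[int]] = {1: None, 10: 1, 11: 1, 100: 10, 110: 11}

# Hypothetical protein record: (accession, taxon id, sequence).
Protein = Tuple[str, int, str]


def lineage(taxon: Optional[int], parent: Dict[int, Optional[int]]) -> List[int]:
    """Return the taxon followed by all of its ancestors up to the root."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = parent.get(taxon)
    return path


def build_targeted_db(
    proteins: Iterable[Protein],
    selected_taxa: Iterable[int],
    parent: Dict[int, Optional[int]],
) -> List[Protein]:
    """Keep proteins whose taxon is a selected taxon or one of its descendants."""
    selected = set(selected_taxa)
    return [p for p in proteins if selected.intersection(lineage(p[1], parent))]


proteins = [("P1", 100, "MKAAR"), ("P2", 110, "AARLLK"), ("P3", 11, "GGGK")]
print(build_targeted_db(proteins, [11], PARENT))
```

Selecting taxon 11 keeps the records annotated with 11 itself and with its descendant 110, which mirrors the idea of restricting the database to a predetermined list of taxa.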
Subjects
Metagenomics, Proteins, Humans, Protein Databases, 16S Ribosomal RNA
ABSTRACT
SUMMARY: The Unipept Visualizations library is a JavaScript package to generate interactive visualizations of both hierarchical and non-hierarchical quantitative data. It provides four different visualizations: a sunburst, a treemap, a treeview and a heatmap. Every visualization is fully configurable, supports TypeScript and uses the excellent D3.js library. AVAILABILITY AND IMPLEMENTATION: The Unipept Visualizations library is available for download on NPM: https://npmjs.com/unipept-visualizations. All source code is freely available from GitHub under the MIT license: https://github.com/unipept/unipept-visualizations.
Subjects
Data Visualization, Software, Computational Biology
ABSTRACT
BACKGROUND: FragGeneScan is currently the most accurate and popular tool for gene prediction in short and error-prone reads, but its execution speed is insufficient for use on larger data sets. The parallelization which should have addressed this is inefficient. Its alternative implementation FragGeneScan+ is faster, but introduced a number of bugs related to memory management, race conditions and even output accuracy. RESULTS: This paper introduces FragGeneScanRs, a faster Rust implementation of the FragGeneScan gene prediction model. Its command line interface is backward compatible and adds extra features for more flexible usage. Its output is equivalent to the original FragGeneScan implementation. CONCLUSIONS: Compared to the current C implementation, shotgun metagenomic reads are processed up to 22 times faster using a single thread, with better scaling for multithreaded execution. The Rust code of FragGeneScanRs is freely available from GitHub under the GPL-3.0 license with instructions for installation, usage and other documentation ( https://github.com/unipept/FragGeneScanRs ).
Subjects
Algorithms, Software, Metagenome, Metagenomics
ABSTRACT
BACKGROUND: Shotgun metagenomics yields ever richer and larger data volumes on the complex communities living in diverse environments. Extracting deep insights from the raw reads heavily depends on the availability of fast, accurate and user-friendly biodiversity analysis tools. RESULTS: Because environmental samples may contain strains and species that are not covered in reference databases and because protein sequences are more conserved than the genes encoding them, we explore the alternative route of taxonomic profiling based on protein coding regions translated from the shotgun metagenomics reads, instead of directly processing the DNA reads. We therefore developed the Unipept MetaGenomics Analysis Pipeline (UMGAP), a highly versatile suite of open source tools that are implemented in Rust and support parallelization to achieve optimal performance. Six preconfigured pipelines with different performance trade-offs were carefully selected and benchmarked against a selection of state-of-the-art shotgun metagenomics taxonomic profiling tools. CONCLUSIONS: UMGAP's protein space detour for taxonomic profiling makes it competitive with state-of-the-art shotgun metagenomics tools. Despite our design choices of an extra protein translation step, a broad spectrum index that can identify archaea, bacteria, eukaryotes and viruses, and a highly configurable non-monolithic design, UMGAP achieves low runtime, manageable memory footprint and high accuracy. Its interactive visualizations allow for easy exploration and comparison of complex communities.
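The abstract does not state how protein coding regions are obtained from the reads. One simple option, assumed here purely for illustration, is a six-frame translation with the standard genetic code. The sketch below is written in Python rather than Rust, and its function names are hypothetical; it is not UMGAP code.

```python
from itertools import product
from typing import List

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons; unknown codons (e.g. containing N) become X."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(read: str) -> List[str]:
    """Translate a read in three forward and three reverse-complement frames."""
    read = read.upper()
    frames = []
    for strand in (read, reverse_complement(read)):
        for offset in range(3):
            frames.append(translate(strand[offset:]))
    return frames


print(six_frame_translations("ATGGCC"))
```

Each translated frame could then be split into peptides and matched against a protein index, which is the "protein space detour" the abstract refers to.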
Subjects
Metagenomics, Viruses, Algorithms, Bacteria/genetics, DNA Sequence Analysis, Software, Viruses/genetics
ABSTRACT
Metaproteomics has become an important research tool to study microbial systems, which has resulted in increased metaproteomics data generation. However, efficient tools for processing the acquired data have lagged behind. One widely used tool for metaproteomics data interpretation is Unipept, a web-based tool that provides, among others, interactive and insightful visualizations. Due to its web-based implementation, however, the Unipept web application is limited in the amount of data that can be analyzed. In this manuscript we therefore present Unipept Desktop, a desktop application version of Unipept that is designed to drastically increase the throughput and capacity of metaproteomics data analysis. Moreover, it provides a novel comparative analysis pipeline and improves the organization of experimental data into projects, thus addressing the growing need for more efficient and versatile analysis tools for metaproteomics data.
Subjects
Data Analysis, Software
ABSTRACT
SUMMARY: Unipept is an ecosystem of tools developed for fast metaproteomics data analysis, consisting of a web application, a set of web services (application programming interface, API) and a command-line interface (CLI). After the successful introduction of version 4 of the Unipept web application, we here introduce version 2.0 of the API and CLI. Next to the existing taxonomic analysis, version 2.0 of the API and CLI provides access to Unipept's powerful functional analysis for metaproteomics samples. The functional analysis pipeline supports retrieval of Enzyme Commission numbers, Gene Ontology terms and InterPro entries for the individual peptides in a metaproteomics sample. This paves the way for other applications and developers to integrate these new information sources into their data processing pipelines, which greatly increases insight into the functions performed by the organisms in a specific environment. Both the API and CLI have also been expanded with the ability to render interactive visualizations from a list of taxon ids. These visualizations are automatically made available on a dedicated website and can easily be shared by users. AVAILABILITY AND IMPLEMENTATION: The API is available at http://api.unipept.ugent.be. Information regarding the CLI can be found at https://unipept.ugent.be/clidocs. Both interfaces are freely available and open-source under the MIT license. CONTACT: pieter.verschaffelt@ugent.be. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Ecosystem, Software, Data Analysis, Peptides
ABSTRACT
Although metaproteomics, the study of the collective proteome of microbial communities, has become increasingly powerful and popular over the past few years, the field has lagged behind on the availability of user-friendly, end-to-end pipelines for data analysis. We therefore describe the connection from two commonly used metaproteomics data processing tools in the field, MetaProteomeAnalyzer and PeptideShaker, to Unipept for downstream analysis. Through these connections, direct end-to-end pipelines are built from database searching to taxonomic and functional annotation.
Subjects
Data Analysis, Microbiota, Proteome, Proteomics, Software
ABSTRACT
Unipept ( https://unipept.ugent.be ) is a web application for metaproteome data analysis, with an initial focus on tryptic-peptide-based biodiversity analysis of MS/MS samples. Because the true potential of metaproteomics lies in gaining insight into the expressed functions of complex environmental samples, the 4.0 release of Unipept introduces complementary functional analysis based on GO terms and EC numbers. Integration of this new functional analysis with the existing biodiversity analysis is an important asset of the extended pipeline. As a proof of concept, a human faecal metaproteome data set from 15 healthy subjects was reanalyzed with Unipept 4.0, yielding fast, detailed, and straightforward characterization of taxon-specific catalytic functions that is shown to be consistent with previous results from a BLAST-based functional analysis of the same data.
Subjects
Data Analysis, Proteomics/methods, Software, Biodiversity, Complex Mixtures/analysis, Feces/chemistry, Healthy Volunteers, Humans, Proof of Concept Study, Tandem Mass Spectrometry
ABSTRACT
BACKGROUND: This chapter reports the evaluation of two shotgun metaproteomic workflows. The methods were developed to investigate gut dysbiosis via analysis of the faecal microbiota from patients with cystic fibrosis (CF). We aimed to set up an unbiased and effective method to extract the entire proteome, i.e. to extract sufficient bacterial proteins from the faecal samples in combination with a maximum of host proteins giving information on the disease state. METHODS: Two protocols were compared: the first method involves an enrichment of the bacterial proteins, while the second is a more direct method to generate a whole faecal proteome extract. The different extracts were analysed using denaturing polyacrylamide gel electrophoresis followed by liquid chromatography-tandem mass spectrometry, aiming for maximal coverage of the bacterial protein content in faecal samples. RESULTS AND CONCLUSIONS: Microbial proteins are detected in all extracts, and nonbacterial proteins are also detected in all samples, providing information about the host status. Our study demonstrates the huge influence of the protein extraction method on the results obtained and shows the need for standardised and appropriate sample preparation for metaproteomic analysis. To address questions on the health status of the patients, a whole protein extract is preferred over a method to enrich the bacterial fraction. In addition, the whole protein fraction method is faster, which makes it possible to analyse more biological replicates.
Subjects
Cystic Fibrosis/complications, Dysbiosis/diagnosis, Feces/chemistry, Proteome, Proteomics/methods, Bacterial Proteins/analysis, Liquid Chromatography, Humans, Tandem Mass Spectrometry
ABSTRACT
Unipept is an open source web application that is designed for metaproteomics analysis with a focus on interactive data visualization. It is underpinned by a fast index built from UniProtKB and the NCBI taxonomy that enables quick retrieval of all UniProt entries in which a given tryptic peptide occurs. Unipept version 2.4 introduced web services that provide programmatic access to the metaproteomics analysis features. This enables integration of Unipept functionality in custom applications and data processing pipelines. AVAILABILITY AND IMPLEMENTATION: The web services are freely available at http://api.unipept.ugent.be and are open sourced under the MIT license. CONTACT: Unipept@ugent.be SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
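The idea of an index from tryptic peptides to protein entries, combined with a taxonomy, can be sketched in memory as follows. This is an illustrative toy under stated assumptions, not the Unipept index itself: the entries, taxon names and lineages are made up, the cleavage rule (after K or R, not before P) is the common textbook trypsin rule, and the lowest common ancestor is computed as a plain common prefix of root-first lineages.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple


def tryptic_digest(protein: str) -> List[str]:
    """Cleave after K or R, except when the next residue is P (assumed rule)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def build_index(entries: Dict[str, Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each tryptic peptide to the accessions of the entries containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for accession, (_taxon, sequence) in entries.items():
        for peptide in tryptic_digest(sequence):
            index[peptide].add(accession)
    return index


def lowest_common_ancestor(lineages: List[List[str]]) -> Optional[str]:
    """Return the deepest taxon shared by all root-first lineages."""
    lca = None
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        lca = ranks[0]
    return lca


# Hypothetical toy data: accession -> (taxon, sequence), and taxon -> lineage.
ENTRIES = {
    "E1": ("species_a", "MKAAR"),
    "E2": ("species_b", "AARLLK"),
    "E3": ("species_c", "GGGK"),
}
LINEAGES = {
    "species_a": ["root", "genus_x", "species_a"],
    "species_b": ["root", "genus_x", "species_b"],
    "species_c": ["root", "genus_y", "species_c"],
}

index = build_index(ENTRIES)
hits = sorted(index["AAR"])
print(hits, lowest_common_ancestor([LINEAGES[ENTRIES[a][0]] for a in hits]))
```

In this toy data the peptide AAR occurs in two entries from different species of the same genus, so its lowest common ancestor is that genus.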
Subjects
Metabolomics, Computational Biology, Genetic Databases, Information Storage and Retrieval, Internet, Knowledge Bases, Peptides, Software, User-Computer Interface, Controlled Vocabulary
ABSTRACT
Microbes hold the key to life. They hold the secrets to our past (as the descendants of the earliest forms of life) and the prospects for our future (as we mine their genes for solutions to some of the planet's most pressing problems, from global warming to antibiotic resistance). However, the piecemeal approach that has defined efforts to study microbial genetic diversity for over 20 years and in over 30,000 genome projects risks squandering that promise. These efforts have covered less than 20% of the diversity of the cultured archaeal and bacterial species, which represent just 15% of the overall known prokaryotic diversity. Here we call for the funding of a systematic effort to produce a comprehensive genomic catalog of all cultured Bacteria and Archaea by sequencing, where available, the type strain of each species with a validly published name (currently ~11,000). This effort will provide an unprecedented level of coverage of our planet's genetic diversity, allow for the large-scale discovery of novel genes and functions, and lead to an improved understanding of microbial evolution and function in the environment.
Subjects
Archaeal Genome/genetics, Bacterial Genome/genetics, Genomics, DNA Sequence Analysis, Archaea/classification, Archaea/genetics, Bacteria/classification, Bacteria/genetics, Genetic Databases, Phylogeny
ABSTRACT
Despite growing evidence that biofilm formation on plastic debris in the marine environment may be essential for its biodegradation, the underlying processes have yet to be fully understood. Thus far, bacterial biofilm formation has only been studied after short-term exposure or on floating plastic, yet a prominent share of plastic litter accumulates on the seafloor. In this study, we explored the taxonomic composition of bacterial and fungal communities on polyethylene plastic sheets and dolly ropes during long-term exposure on the seafloor, both at a harbor and an offshore location in the Belgian part of the North Sea. We reconstructed the sequence of events during biofilm formation on plastic in the harbor environment and identified a core bacteriome and subsets of bacterial indicator species for early, intermediate, and late stages of biofilm formation. Additionally, by implementing ITS2 metabarcoding on plastic debris, we identified and characterized fungal genera on plastic debris for the first time. Surprisingly, none of the plastics exposed to offshore conditions displayed the typical signature of a late-stage biofilm, suggesting that biofilm formation is severely hampered in the natural environment where most plastic debris accumulates.
Subjects
Environmental Biodegradation, Plastics, Waste Products, Belgium, North Sea
ABSTRACT
The Unique Peptide Finder (http://unipept.ugent.be/peptidefinder) is an interactive web application to quickly hunt for tryptic peptides that are unique to a particular species, genus, or any other taxon. Biodiversity within the target taxon is represented by a set of proteomes selected from a monthly updated list of complete and nonredundant UniProt proteomes, supplemented with proprietary proteomes loaded into persistent local browser storage. The software computes and visualizes pan and core peptidomes as unions and intersections of tryptic peptides occurring in the selected proteomes. In addition, it also computes and displays unique peptidomes as the set of all tryptic peptides that occur in all selected proteomes but not in any UniProt record not assigned to the target taxon. As a result, the unique peptides can serve as robust biomarkers for the target taxon, for example, in targeted metaproteomics studies. Computations are extremely fast since they are underpinned by the Unipept database, the lowest common ancestor algorithm implemented in Unipept and modern web technologies that facilitate in-browser data storage and parallel processing.
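The pan, core and unique peptidomes defined above reduce to plain set operations. The sketch below assumes that each proteome has already been digested into a set of tryptic peptides; the proteome names, the peptides and the function name are hypothetical, and the "outside" set stands in for all peptides found in records not assigned to the target taxon.

```python
from typing import Dict, Set, Tuple


def peptidomes(
    selected: Dict[str, Set[str]], outside: Set[str]
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (pan, core, unique) peptidomes for the selected proteomes.

    pan    = union of the peptides of all selected proteomes
    core   = intersection of the peptides of all selected proteomes
    unique = core peptides that do not occur outside the target taxon
    """
    peptide_sets = list(selected.values())
    pan = set().union(*peptide_sets)
    core = set.intersection(*peptide_sets) if peptide_sets else set()
    unique = core - outside
    return pan, core, unique


# Hypothetical toy input: two proteomes of the target taxon.
selected = {
    "proteome_1": {"AAR", "LLK", "MK"},
    "proteome_2": {"AAR", "LLK", "GGK"},
}
outside = {"LLK", "TTR"}

pan, core, unique = peptidomes(selected, outside)
print(sorted(pan), sorted(core), sorted(unique))
```

Here LLK is shared by both proteomes but also occurs outside the target taxon, so only AAR remains as a candidate biomarker peptide.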
Subjects
Peptides/analysis, Proteome/chemistry, Proteomics/methods, Animals, Bacteria/chemistry, Bacterial Proteins/chemistry, Protein Databases, Humans, Software
ABSTRACT
Unipept (http://unipept.ugent.be) is a web application that offers a user-friendly way to explore the biodiversity of complex metaproteome samples by providing interactive visualizations. In this article, the updates and changes to Unipept since its initial release are presented. This includes the addition of interactive sunburst and treeview visualizations to the multipeptide analysis, the foundations of an application programming interface (API) and a command line interface, updated data sources, and the open-sourcing of the entire application under the MIT license.
Subjects
Proteomics, User-Computer Interface, Computer Graphics, Humans, Metagenome, Microbiota, Molecular Sequence Annotation, Peptide Fragments/chemistry, Phylogeny
ABSTRACT
BACKGROUND: Rapid evolutions in sequencing technology force read mappers into flexible adaptation to longer reads, changing error models, memory barriers and novel applications. RESULTS: ALFALFA achieves a high performance in accurately mapping long single-end and paired-end reads to gigabase-scale reference genomes, while remaining competitive for mapping shorter reads. Its seed-and-extend workflow is underpinned by fast retrieval of super-maximal exact matches from an enhanced sparse suffix array, with flexible parameter tuning to balance performance, memory footprint and accuracy. CONCLUSIONS: ALFALFA is open source and available at http://alfalfa.ugent.be .
Subjects
Human Genome, High-Throughput Nucleotide Sequencing/methods, Sequence Alignment/methods, DNA Sequence Analysis/methods, Software, Algorithms, Biological Evolution, Humans, Workflow
ABSTRACT
Bacterial colonization of marine plastic litter (MPL) has been known for over four decades. Still, only a few studies on the plastic colonization process and its influencing factors have been reported. In this study, seafloor MPL was sampled at different locations across the Belgian part of the North Sea to study bacterial community structure using 16S metabarcoding. These marine plastic bacterial communities were compared with those of sediment and seawater, and of resin pellets sampled on the beach, to investigate the origin and uniqueness of plastic bacterial communities. Plastics display great variation in bacterial community composition, while each showed significant differences from those of sediment and seawater, indicating that plastics represent a distinct environmental niche. Various environmental factors correlate with the diversity of MPL bacterial composition across plastics. In addition, intrinsic plastic-related factors such as pigment content may contribute to the differences in bacterial colonization. Furthermore, the differential abundance of known primary and secondary colonizers across the various plastics may indicate different stages of bacterial colonization, and may confound comparisons of free-floating plastics. Our studies provide insights into the factors that shape plastic bacterial colonization and shed light on the possible role of plastic as a transport vehicle for bacteria through the aquatic environment.
Subjects
Bacteria/genetics, Plastics, Seawater/microbiology, Bacteria/classification, Belgium, Biodiversity, Ribosomal DNA, North Sea, Plastics/chemistry
ABSTRACT
Rotors occurring in the heart underlie the mechanisms of cardiac arrhythmias. Answering the question of whether the location of rotors is related to local properties of cardiac tissue has important practical applications, because ablation of rotors has been shown to be an effective way to fight cardiac arrhythmias. In this study, we investigate, in silico, the dynamics of rotors in a two-dimensional model and in an anatomical model of human ventricles using a Ten Tusscher-Noble-Noble-Panfilov (TNNP) model for ventricular cells. We study the effect of small ionic heterogeneities, similar to those measured experimentally. It is shown that such heterogeneities can not only anchor rotors, but also attract rotors rotating at a substantial distance from the heterogeneity. This attraction distance depends on the extent of the heterogeneities and can be as large as 5-6 cm in realistic conditions. We conclude that small ionic heterogeneities can be preferred localization points for rotors and discuss their possible mechanism and value for applications.
Subjects
Cardiac Arrhythmias/physiopathology, Heart Conduction System/physiopathology, Heart Ventricles/physiopathology, Action Potentials, Cardiac Arrhythmias/pathology, Computer Graphics, Computer Simulation, Heart Conduction System/pathology, Heart Ventricles/pathology, Humans, Kinetics, Anatomic Models, Cardiovascular Models
ABSTRACT
We have developed essaMEM, a tool for finding maximal exact matches that can be used in genome comparison and read mapping. essaMEM enhances an existing sparse suffix array implementation with a sparse child array. Tests indicate that the enhanced algorithm for finding maximal exact matches is much faster, while maintaining the same memory footprint. In this way, sparse suffix arrays remain competitive with the more complex compressed suffix arrays.
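To make the notion of a maximal exact match concrete, the sketch below enumerates them by brute force. It deliberately does not use a sparse suffix array or child array as essaMEM does; it runs in time proportional to the product of the sequence lengths (times the match length) and only illustrates the definition. The function name and the example sequences are hypothetical.

```python
from typing import List, Tuple


def maximal_exact_matches(a: str, b: str, min_length: int) -> List[Tuple[int, int, int]]:
    """Return (start_in_a, start_in_b, length) for every maximal exact match.

    A match is maximal when it cannot be extended to the left or to the
    right without introducing a mismatch or running off either sequence.
    """
    mems = []
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] != b[j]:
                continue
            # Skip starts that could be extended to the left (not left-maximal).
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend to the right as far as the characters agree.
            length = 0
            while (
                i + length < len(a)
                and j + length < len(b)
                and a[i + length] == b[j + length]
            ):
                length += 1
            if length >= min_length:
                mems.append((i, j, length))
    return mems


print(maximal_exact_matches("GATTACA", "TTACAG", 3))
```

An index such as a suffix array replaces the inner scan over every position pair with a lookup, which is where the speed-ups reported above would come from.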
Subjects
Sequence Analysis/methods, Software, Algorithms, Animals, Drosophila/genetics, Genome
ABSTRACT
A vast and rich body of information has grown up as a result of the world's enthusiasm for 'omics technologies. Finding ways to describe and make available this information that maximise its usefulness has become a major effort across the 'omics world. At the heart of this effort is the Genomic Standards Consortium (GSC), an open-membership organization that drives community-based standardization activities. Here we provide a short history of the GSC, give an overview of its range of current activities, and make a call for the scientific community to join forces to improve the quality and quantity of contextual information about our public collections of genomes, metagenomes, and marker gene sequences.
Subjects
Genetic Databases, Genomics/standards, International Cooperation, Metagenome
ABSTRACT
The combination of incessant advances in sequencing technology producing large amounts of data and innovative bioinformatics approaches, designed to cope with this data flood, has led to new interesting results in the life sciences. Given the magnitude of sequence data to be processed, many bioinformatics tools rely on efficient solutions to a variety of complex string problems. These solutions include fast heuristic algorithms and advanced data structures, generally referred to as index structures. Although the importance of index structures is generally known to the bioinformatics community, the design and potency of these data structures, as well as their properties and limitations, are less understood. Moreover, the last decade has seen a boom in the number of variant index structures featuring complex and diverse memory-time trade-offs. This article brings a comprehensive state-of-the-art overview of the most popular index structures and their recently developed variants. Their features, interrelationships, the trade-offs they impose, but also their practical limitations, are explained and compared.