1.
BMC Bioinformatics ; 25(1): 228, 2024 Jul 02.
Article in English | MEDLINE | ID: mdl-38956506

ABSTRACT

BACKGROUND: Fungi play a key role in several important ecological functions, ranging from organic matter decomposition to symbiotic associations with plants. Moreover, fungi naturally inhabit the human body and can be beneficial when administered as probiotics. In mycology, the internal transcribed spacer (ITS) region was adopted as the universal marker for classifying fungi. Hence, an accurate and robust method for ITS classification is not only desired for the purpose of better diversity estimation, but it can also help us gain a deeper insight into the dynamics of environmental communities and ultimately comprehend whether the abundance of certain species correlates with health and disease. Although many methods have been proposed for taxonomic classification, to the best of our knowledge, none of them fully explore the taxonomic tree hierarchy when building their models. This, in turn, leads to lower generalization power and a higher risk of committing classification errors. RESULTS: Here we introduce HiTaC, a robust hierarchical machine learning model for accurate ITS classification, which requires a small amount of data for training and can handle imbalanced datasets. HiTaC was thoroughly evaluated with the established TAXXI benchmark and could correctly classify fungal ITS sequences of varying lengths and a range of identity differences between the training and test data. HiTaC outperforms state-of-the-art methods when trained over noisy data, consistently achieving higher F1-score and sensitivity across different taxonomic ranks, improving sensitivity by 6.9 percentage points over top methods in the noisiest dataset available on TAXXI. CONCLUSIONS: HiTaC is publicly available at the Python package index, BIOCONDA and Docker Hub. It is released under the new BSD license, allowing free use in academia and industry. Source code and documentation, which includes installation and usage instructions, are available at https://gitlab.com/dacs-hpi/hitac .


Subject(s)
Fungi , Machine Learning , Fungi/genetics , Fungi/classification , DNA, Ribosomal Spacer/genetics , Software
2.
Gut Pathog ; 16(1): 27, 2024 May 12.
Article in English | MEDLINE | ID: mdl-38735967

ABSTRACT

BACKGROUND: Enhancing our understanding of the underlying influences of medical interventions on the microbiome, resistome and mycobiome of preterm born infants holds significant potential for advancing infection prevention and treatment strategies. We conducted a prospective quasi-intervention study to better understand how antibiotics, probiotics, and other medical factors influence the gut development of preterm infants. A controlled neonatal mouse model study was conducted in parallel, designed to closely reflect and predict exposures. Preterm infants and neonatal mice were stratified into four groups: antibiotics only, probiotics only, antibiotics followed by probiotics, and none of these interventions. Stool samples from both preterm infants and neonatal mice were collected at varying time points and analyzed by 16S rRNA amplicon sequencing, ITS amplicon sequencing and whole genome shotgun sequencing. RESULTS: The human infant microbiomes showed an unexpectedly high degree of heterogeneity. Little impact from medical exposure (antibiotics/probiotics) was observed on the strain patterns; however, Bifidobacterium bifidum was found to be more abundant after exposure to probiotics, regardless of prior antibiotic administration. Twenty-seven antibiotic resistance genes were identified in the resistome. High intra-variability was evident within the different treatment groups. Lastly, we found significant effects of antibiotics and probiotics on the mycobiome but not on the microbiome and resistome of preterm infants. CONCLUSIONS: Although our analyses showed transient effects, these results provide positive motivation to continue the research on the effects of medical interventions on the microbiome, resistome and mycobiome of preterm infants.

3.
Gigascience ; 12, 2022 12 28.
Article in English | MEDLINE | ID: mdl-36994872

ABSTRACT

BACKGROUND: Contamination detection is an important step that should be carefully considered in early stages when designing and performing microbiome studies to avoid biased outcomes. Detecting and removing true contaminants is challenging, especially in low-biomass samples or in studies lacking proper controls. Interactive visualizations and analysis platforms are crucial to better guide this step, helping to identify and detect noisy patterns that could potentially be contamination. Additionally, external evidence, like aggregation of several contamination detection methods and the use of common contaminants reported in the literature, could help to discover and mitigate contamination. RESULTS: We propose GRIMER, a tool that performs automated analyses and generates a portable and interactive dashboard integrating annotation, taxonomy, and metadata. It unifies several sources of evidence to help detect contamination. GRIMER is independent of quantification methods and directly analyzes contingency tables to create an interactive and offline report. Reports can be created in seconds and are accessible for nonspecialists, providing an intuitive set of charts to explore data distribution among observations and samples and its connections with external sources. Further, we compiled and used an extensive list of possible external contaminant taxa and common contaminants with 210 genera and 627 species reported in 22 published articles. CONCLUSION: GRIMER enables visual data exploration and analysis, supporting contamination detection in microbiome studies. The tool and data presented are open source and available at https://gitlab.com/dacs-hpi/grimer.


Subject(s)
Microbiota , Biomass , Metadata
4.
Bioinformatics ; 36(Suppl_1): i12-i20, 2020 07 01.
Article in English | MEDLINE | ID: mdl-32657362

ABSTRACT

MOTIVATION: The exponential growth of assembled genome sequences greatly benefits metagenomics studies. However, currently available methods struggle to manage the increasing number of sequences and their frequent updates. Indexing the current RefSeq can take days and hundreds of GB of memory on large servers. Few methods address these issues thus far, and even though many can theoretically handle large amounts of references, time/memory requirements are prohibitive in practice. As a result, many studies that require sequence classification use indices that are often outdated and almost never truly up-to-date. RESULTS: Motivated by those limitations, we created ganon, a k-mer-based read classification tool that uses Interleaved Bloom Filters in conjunction with a taxonomic clustering and a k-mer counting/filtering scheme. Ganon provides an efficient method for indexing references, keeping them updated. It requires <55 min to index the complete RefSeq of bacteria, archaea, fungi and viruses. The tool can further keep these indices up-to-date in a fraction of the time necessary to create them. Ganon makes it possible to query against very large reference sets and therefore it classifies significantly more reads and identifies more species than similar methods. When classifying a high-complexity CAMI challenge dataset against complete genomes from RefSeq, ganon shows strongly increased precision with equal or better sensitivity compared with state-of-the-art tools. With the same dataset against the complete RefSeq, ganon improved the F1-score by 65% at the genus level. It supports taxonomy- and assembly-level classification, multiple indices and hierarchical classification. AVAILABILITY AND IMPLEMENTATION: The software is open-source and available at: https://gitlab.com/rki_bioinformatics/ganon. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Metagenomics , Archaea , Sequence Analysis, DNA , Software
5.
NAR Genom Bioinform ; 2(3): lqaa058, 2020 Sep.
Article in English | MEDLINE | ID: mdl-33575609

ABSTRACT

The study of bacterial symbioses has grown exponentially in the recent past. However, existing bioinformatic workflows of microbiome data analysis commonly do not integrate multiple meta-omics levels and are mainly geared toward human microbiomes. Microbiota are better understood when analyzed in their biological context; that is, together with their host or environment. Nevertheless, this is a limitation when studying non-model organisms, mainly due to the lack of well-annotated sequence references. Here, we present gNOMO, a bioinformatic pipeline that is specifically designed to process and analyze non-model organism samples of up to three meta-omics levels: metagenomics, metatranscriptomics and metaproteomics in an integrative manner. The pipeline has been developed using the workflow management framework Snakemake in order to obtain an automated and reproducible pipeline. Using experimental datasets of the German cockroach Blattella germanica, a non-model organism with a very complex gut microbiome, we show the capabilities of gNOMO with regard to meta-omics data integration, expression ratio comparison, taxonomic and functional analysis as well as intuitive output visualization. In conclusion, gNOMO is a bioinformatic pipeline that can easily be configured for integrating and analyzing multiple meta-omics data types and for producing output visualizations, specifically designed for integrating paired-end sequencing data with mass spectrometry from non-model organisms.

7.
Bioinformatics ; 34(17): i766-i772, 2018 09 01.
Article in English | MEDLINE | ID: mdl-30423080

ABSTRACT

Motivation: Mapping-based approaches have become limited in their application to very large sets of references since computing an FM-index for very large databases (e.g. >10 GB) has become a bottleneck. This affects many analyses that need such an index as an essential step for approximate matching of the NGS reads to reference databases. For instance, in typical metagenomics analysis, the size of the reference sequences has become prohibitive to compute a single full-text index on standard machines. Even on large memory machines, computing such an index takes about 1 day of computing time. As a result, updates of indices are rarely performed. Hence, it is desirable to create an alternative way of indexing while preserving fast search times. Results: To solve the index construction and update problem we propose the DREAM (Dynamic seaRchablE pArallel coMpressed index) framework and provide an implementation. The first main contribution is the introduction of an approximate search distributor via a novel use of Bloom filters. We combine several Bloom filters to form an interleaved Bloom filter and use this new data structure to quickly exclude reads for parts of the databases where they cannot match. This allows us to keep the databases in several indices which can be easily rebuilt if parts are updated while maintaining a fast search time. The second main contribution is an implementation of DREAM-Yara, a distributed version of a fully sensitive read mapper under the DREAM framework. Availability and implementation: https://gitlab.com/pirovc/dream_yara/.


Subject(s)
Databases, Factual , Software , Humans , Time Factors
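
The interleaved Bloom filter idea described in the abstract above can be illustrated with a toy sketch. This is not the DREAM-Yara or ganon implementation: the class name, the BLAKE2-based hashing, the parameter values, and the hit-count threshold are all assumptions chosen for illustration. The sketch stores, for each hash position, one bit per database bin, so a single lookup reports every bin that may contain a k-mer.

```python
import hashlib


class InterleavedBloomFilter:
    """Toy interleaved Bloom filter (hypothetical, for illustration only)."""

    def __init__(self, num_bins, bits_per_bin, num_hashes=2):
        self.num_bins = num_bins
        self.bits_per_bin = bits_per_bin
        self.num_hashes = num_hashes
        # rows[p] is an integer bitmask: bit b is set when bin b has position p set.
        self.rows = [0] * bits_per_bin

    def _positions(self, kmer):
        # Derive num_hashes positions by salting one hash function (an assumption).
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(kmer.encode(), salt=bytes([seed])).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits_per_bin

    def add(self, kmer, bin_index):
        for pos in self._positions(kmer):
            self.rows[pos] |= 1 << bin_index

    def query(self, kmer):
        """Return a bitmask of bins that may contain kmer (no false negatives)."""
        mask = (1 << self.num_bins) - 1
        for pos in self._positions(kmer):
            mask &= self.rows[pos]
        return mask


def kmers(seq, k):
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def candidate_bins(ibf, read, k, min_hits):
    """Bins where at least min_hits k-mers of the read may occur."""
    counts = [0] * ibf.num_bins
    for kmer in kmers(read, k):
        mask = ibf.query(kmer)
        for b in range(ibf.num_bins):
            if (mask >> b) & 1:
                counts[b] += 1
    return [b for b, c in enumerate(counts) if c >= min_hits]


if __name__ == "__main__":
    ibf = InterleavedBloomFilter(num_bins=3, bits_per_bin=1024)
    for km in kmers("ACGTACGTGG", 4):
        ibf.add(km, 0)
    for km in kmers("TTTTGGGGCCCC", 4):
        ibf.add(km, 2)
    print(candidate_bins(ibf, "ACGTACGT", k=4, min_hits=5))
```

Bins that fall below the threshold can be skipped entirely, which is the "quickly exclude reads" step; a read mapper would then search only the indices of the remaining bins. Because Bloom filters admit false positives, a reported bin is only a candidate, never a confirmed match.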
8.
Microbiome ; 5(1): 101, 2017 08 14.
Article in English | MEDLINE | ID: mdl-28807044

ABSTRACT

BACKGROUND: Many metagenome analysis tools are presently available to classify sequences and profile environmental samples. In particular, taxonomic profiling and binning methods are commonly used for such tasks. Tools available among these two categories make use of several techniques, e.g., read mapping, k-mer alignment, and composition analysis. Variations on the construction of the corresponding reference sequence databases are also common. In addition, different tools provide good results in different datasets and configurations. All this variation creates a complicated scenario for researchers deciding which methods to use. Installation, configuration and execution can also be difficult, especially when dealing with multiple datasets and tools. RESULTS: We propose MetaMeta: a pipeline to execute and integrate results from metagenome analysis tools. MetaMeta provides an easy workflow to run multiple tools with multiple samples, producing a single enhanced output profile for each sample. MetaMeta includes database generation, pre-processing, execution, and integration steps, allowing easy execution and parallelization. The integration relies on the co-occurrence of organisms from different methods as the main feature to improve community profiling while accounting for differences in their databases. CONCLUSIONS: In a controlled case with simulated and real data, we show that the integrated profiles of MetaMeta outperform the best single profile. Using the same input data, it provides more sensitive and reliable results with the presence of each organism being supported by several methods. MetaMeta uses Snakemake and has six pre-configured tools, all available on the BioConda channel for easy installation (conda install -c bioconda metameta). The MetaMeta pipeline is open-source and can be downloaded at: https://gitlab.com/rki_bioinformatics .


Subject(s)
Bacteria/classification , Computational Biology/methods , Metagenome , Software , Algorithms , Computer Simulation , Databases, Genetic , Humans , Metagenomics/methods , Phylogeny , Sequence Analysis, DNA/methods
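
The co-occurrence idea in the MetaMeta abstract above, where an organism counts as present when several methods report it, can be sketched as a simple voting scheme. This is not MetaMeta's actual integration algorithm, which also accounts for differences between tool databases; the function name, the `min_tools` threshold, and the mean-abundance rule are assumptions made for illustration.

```python
from collections import defaultdict


def integrate_profiles(profiles, min_tools=2):
    """Merge per-tool taxonomic profiles by co-occurrence (hypothetical sketch).

    profiles: dict mapping tool name -> dict of taxon -> relative abundance.
    A taxon is kept only if at least min_tools tools report it with a
    non-zero abundance; kept abundances are averaged and renormalized.
    """
    support = defaultdict(list)
    for profile in profiles.values():
        for taxon, abundance in profile.items():
            if abundance > 0:
                support[taxon].append(abundance)

    # Keep taxa supported by enough tools, using the mean reported abundance.
    kept = {
        taxon: sum(values) / len(values)
        for taxon, values in support.items()
        if len(values) >= min_tools
    }

    total = sum(kept.values())
    if total == 0:
        return {}
    # Renormalize so the merged profile sums to 1.
    return {taxon: abundance / total for taxon, abundance in kept.items()}


if __name__ == "__main__":
    example = {
        "tool_a": {"E. coli": 0.6, "B. subtilis": 0.4},
        "tool_b": {"E. coli": 0.5, "B. subtilis": 0.3, "S. aureus": 0.2},
        "tool_c": {"E. coli": 1.0},
    }
    print(integrate_profiles(example, min_tools=2))
```

In this example the taxon reported by a single tool is dropped, while taxa seen by two or more tools survive. Raising `min_tools` trades sensitivity for precision.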
9.
Bioinformatics ; 32(15): 2272-80, 2016 08 01.
Article in English | MEDLINE | ID: mdl-27153591

ABSTRACT

MOTIVATION: Species identification and quantification are common tasks in metagenomics and pathogen detection studies. The most recent techniques are built on mapping the sequenced reads against a reference database (e.g. whole genomes, marker genes, proteins) followed by application-dependent analysis steps. Although these methods have been proven to be useful in many scenarios, there is still room for improvement in species and strain level detection, mainly for low-abundance organisms. RESULTS: We propose a new method: DUDes, a reference-based taxonomic profiler that introduces a novel top-down approach to analyze metagenomic Next-generation sequencing (NGS) samples. Rather than predicting an organism's presence in the sample based only on relative abundances, DUDes first identifies possible candidates by comparing the strength of the read mapping in each node of the taxonomic tree in an iterative manner. Instead of using the lowest common ancestor we propose a new approach: the deepest uncommon descendent. We showed in experiments that DUDes works for single and multiple organisms and can identify low-abundance taxonomic groups with high precision. AVAILABILITY AND IMPLEMENTATION: DUDes is open source and it is available at http://sf.net/p/dudes . SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. CONTACT: renardB@rki.de.


Subject(s)
Algorithms , High-Throughput Nucleotide Sequencing , Metagenomics , Genome
10.
BMC Bioinformatics ; 16: 240, 2015 Jul 30.
Article in English | MEDLINE | ID: mdl-26224355

ABSTRACT

BACKGROUND: Evaluating the quality and reliability of a de novo assembly and of single contigs in particular is challenging since commonly a ground truth is not readily available and numerous factors may influence results. Currently available procedures provide assembly scores but lack a comparative quality ranking of contigs within an assembly. RESULTS: We present SuRankCo, which relies on a machine learning approach to predict quality scores for contigs and to enable the ranking of contigs within an assembly. The result is a sorted contig set which allows selective contig usage in downstream analysis. Benchmarking on datasets with known ground truth shows promising sensitivity and specificity and favorable comparison to existing methodology. CONCLUSIONS: SuRankCo analyzes the reliability of de novo assemblies on the contig level and thereby allows quality control and ranking prior to further downstream and validation experiments.


Subject(s)
Contig Mapping/methods , Software , Algorithms , Escherichia coli/genetics , Escherichia coli/metabolism , ROC Curve
11.
BMC Res Notes ; 7: 371, 2014 Jun 18.
Article in English | MEDLINE | ID: mdl-24938749

ABSTRACT

BACKGROUND: The rapid fall in DNA sequencing prices has allowed rapid accumulation of genome data. However, the process of obtaining complete genome sequences is still very time-consuming and labor-intensive. In addition, data produced from various sequencing technologies or alternative assemblies remain underexplored as a means to improve the assembly of incomplete genome sequences. FINDINGS: We have developed FGAP, a tool for closing gaps of draft genome sequences that takes advantage of different datasets. FGAP uses BLAST to align multiple contigs against a draft genome assembly aiming to find sequences that overlap gaps. The algorithm selects the best sequence to fill and eliminate the gap. CONCLUSIONS: FGAP reduced the number of gaps by 78% in an E. coli draft genome assembly using two different sequencing technologies, Illumina and 454. Using PacBio long reads, 98% of gaps were closed. In human chromosome 14 assemblies, FGAP reduced the number of gaps by 35%. All the inserted sequences were validated with a reference genome using QUAST. The source code and a web tool are available at http://www.bioinfo.ufpr.br/fgap/.


Subject(s)
Contig Mapping/methods , Escherichia coli/genetics , Genome, Bacterial , Genome, Human , Software , Algorithms , Base Sequence , Chromosomes, Human, Pair 14 , Contig Mapping/statistics & numerical data , High-Throughput Nucleotide Sequencing , Humans , Molecular Sequence Data