1.
PLoS Comput Biol ; 19(1): e1010752, 2023 01.
Article in English | MEDLINE | ID: mdl-36622853

ABSTRACT

There is an ongoing explosion of scientific datasets being generated, brought on by recent technological advances in many areas of the natural sciences. As a result, the life sciences have become increasingly computational in nature, and bioinformatics has taken on a central role in research studies. However, basic computational skills, data analysis, and stewardship are still rarely taught in life science educational programs, resulting in a skills gap in many of the researchers tasked with analysing these big datasets. In order to address this skills gap and empower researchers to perform their own data analyses, the Galaxy Training Network (GTN) has previously developed the Galaxy Training Platform (https://training.galaxyproject.org), an open access, community-driven framework for the collection of FAIR (Findable, Accessible, Interoperable, Reusable) training materials for data analysis utilizing the user-friendly Galaxy framework as its primary data analysis platform. Since its inception, this training platform has thrived, with the number of tutorials and contributors growing rapidly, and the range of topics extending beyond life sciences to include topics such as climatology, cheminformatics, and machine learning. While initially aimed at supporting researchers directly, the GTN framework has proven to be an invaluable resource for educators as well. We have focused our efforts in recent years on adding increased support for this growing community of instructors. We have added new features to facilitate the use of the materials in a classroom setting, simplified the contribution flow for new materials, and added a set of train-the-trainer lessons. Here, we present the latest developments in the GTN project, aimed at facilitating the use of the Galaxy Training materials by educators, and its usage in different learning environments.


Subject(s)
Computational Biology , Software , Humans , Computational Biology/methods , Data Analysis , Research Personnel
2.
PLoS Comput Biol ; 17(5): e1008923, 2021 05.
Article in English | MEDLINE | ID: mdl-33983944

ABSTRACT

The COVID-19 pandemic is shifting teaching to an online setting all over the world. The Galaxy framework facilitates the online learning process and makes it accessible by providing a library of high-quality community-curated training materials, enabling easy access to data and tools, and facilitating the sharing of achievements and progress between students and instructors. By combining Galaxy with robust communication channels, effective instruction can be designed inclusively, regardless of the students' environments.


Subject(s)
COVID-19/epidemiology , Computer-Assisted Instruction , Education, Distance/organization & administration , COVID-19/virology , Computational Biology , Humans , Information Dissemination , Pandemics , SARS-CoV-2/isolation & purification
3.
BMC Microbiol ; 21(1): 171, 2021 06 07.
Article in English | MEDLINE | ID: mdl-34098864

ABSTRACT

BACKGROUND: Bacterial plasmids often carry antibiotic resistance genes and are a significant factor in the spread of antibiotic resistance. The ability to completely assemble plasmid sequences would facilitate the localization of antibiotic resistance genes, the identification of genes that promote plasmid transmission and the accurate tracking of plasmid mobility. However, the complete assembly of plasmid sequences using the currently most widely used sequencing platform (Illumina-based sequencing) is restricted due to the generation of short sequence lengths. The long-read Oxford Nanopore Technologies (ONT) sequencing platform overcomes this limitation. Still, the assembly of plasmid sequence data remains challenging due to software incompatibility with long-reads and the error rate generated using ONT sequencing. Bioinformatics pipelines have been developed for ONT-generated sequencing but require computational skills that frequently are beyond the abilities of scientific researchers. To overcome this challenge, the authors developed 'WeFaceNano', a user-friendly Web interFace for rapid assembly and analysis of plasmid DNA sequences generated using the ONT platform. WeFaceNano includes: a read statistics report; two assemblers (Miniasm and Flye); BLAST searching; the detection of antibiotic resistance- and replicon genes and several plasmid visualizations. A user-friendly interface displays the main features of WeFaceNano and gives access to the analysis tools. RESULTS: Publicly available ONT sequence data of 21 plasmids were used to validate WeFaceNano, with plasmid assemblages and anti-microbial resistance gene detection being concordant with the published results. Interestingly, the "Flye" assembler with "meta" settings generated the most complete plasmids. 
CONCLUSIONS: WeFaceNano is a user-friendly open-source software pipeline suitable for accurate plasmid assembly and the detection of anti-microbial resistance genes in (clinical) samples where multiple plasmids can be present.


Subject(s)
Bacteria/genetics , Molecular Sequence Annotation/methods , Plasmids/genetics , Software , Bacteria/classification , Bacteria/drug effects , Bacteria/isolation & purification , Bacterial Proteins/genetics , Computational Biology/instrumentation , Computational Biology/methods , Drug Resistance, Bacterial , High-Throughput Nucleotide Sequencing
4.
Nucleic Acids Res ; 46(W1): W537-W544, 2018 07 02.
Article in English | MEDLINE | ID: mdl-29790989

ABSTRACT

Galaxy (homepage: https://galaxyproject.org, main public server: https://usegalaxy.org) is a web-based scientific analysis platform used by tens of thousands of scientists across the world to analyze large biomedical datasets such as those found in genomics, proteomics, metabolomics and imaging. Started in 2005, Galaxy continues to focus on three key challenges of data-driven biomedical science: making analyses accessible to all researchers, ensuring analyses are completely reproducible, and making it simple to communicate analyses so that they can be reused and extended. During the last two years, the Galaxy team and the open-source community around Galaxy have made substantial improvements to Galaxy's core framework, user interface, tools, and training materials. Framework and user interface improvements now enable Galaxy to be used for analyzing tens of thousands of datasets, and >5500 tools are now available from the Galaxy ToolShed. The Galaxy community has led an effort to create numerous high-quality tutorials focused on common types of genomic analyses. The Galaxy developer and user communities continue to grow and be integral to Galaxy's development. The number of Galaxy public servers, developers contributing to the Galaxy framework and its tools, and users of the main Galaxy server have all increased substantially.


Subject(s)
Genomics/statistics & numerical data , Metabolomics/statistics & numerical data , Molecular Imaging/statistics & numerical data , Proteomics/statistics & numerical data , User-Computer Interface , Datasets as Topic , Humans , Information Dissemination , International Cooperation , Internet , Reproducibility of Results
5.
Genome Res ; 25(9): 1382-90, 2015 Sep.
Article in English | MEDLINE | ID: mdl-26209359

ABSTRACT

Tumor analyses commonly employ a correction with a matched normal (MN), a sample from healthy tissue of the same individual, in order to distinguish germline mutations from somatic mutations. Since the majority of variants found in an individual are thought to be common within the population, we constructed a set of 931 samples from healthy, unrelated individuals, originating from two different sequencing platforms, to serve as a virtual normal (VN) in the absence of such an associated normal sample. Our approach removed (1) >96% of the germline variants also removed by the MN sample and (2) a large number (2%-8%) of additional variants not corrected for by the associated normal. The combination of the VN with the MN improved the correction for polymorphisms significantly, with up to ∼30% compared with MN and ∼15% compared with VN only. We determined the number of unrelated genomes needed in order to correct at least as efficiently as the MN is about 200 for structural variations (SVs) and about 400 for single-nucleotide variants (SNVs) and indels. In addition, we propose that the removal of common variants with purely position-based methods is inaccurate and incurs additional false-positive somatic variants, and more sophisticated algorithms, which are capable of leveraging information about the area surrounding variants, are needed for optimal accuracy. Our VN correction method can be used to analyze any list of variants, regardless of sequencing platform of origin. This VN methodology is available for use on our public Galaxy server.


Subject(s)
DNA, Neoplasm , Germ-Line Mutation , Mutation , Neoplasms/genetics , Breast Neoplasms/genetics , Computational Biology/methods , Databases, Nucleic Acid , Female , Genomics/methods , Humans , INDEL Mutation , Male , Polymorphism, Single Nucleotide , Prostatic Neoplasms/genetics , Reproducibility of Results , Web Browser
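
Illustrative sketch for entry 5: the abstract describes removing germline variants from a tumor call set using a panel of healthy genomes (the virtual normal, VN), optionally combined with a matched normal (MN). The Python below shows only the simplest form of that idea, an exact lookup on position and allele, which the abstract itself flags as less accurate than context-aware methods. It is not the authors' implementation; the variant tuple layout, the function names, and the `min_samples` threshold are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

# Hypothetical variant representation: (chromosome, position, ref, alt).
Variant = Tuple[str, int, str, str]


def build_virtual_normal(
    normal_samples: Iterable[Iterable[Variant]], min_samples: int = 1
) -> Set[Variant]:
    """Collect variants seen in at least `min_samples` healthy genomes."""
    counts: Counter = Counter()
    for sample in normal_samples:
        # Count each variant once per sample, even if it is listed twice.
        counts.update(set(sample))
    return {variant for variant, n in counts.items() if n >= min_samples}


def filter_candidates(
    tumor_variants: Iterable[Variant],
    virtual_normal: Set[Variant],
    matched_normal: Iterable[Variant] = (),
) -> List[Variant]:
    """Keep tumor variants absent from both the VN and the optional MN."""
    known_germline = set(virtual_normal) | set(matched_normal)
    return [v for v in tumor_variants if v not in known_germline]


# Toy data, invented for the example.
normals = [
    [("chr1", 100, "A", "G")],
    [("chr1", 100, "A", "G"), ("chr2", 200, "C", "T")],
]
tumor = [
    ("chr1", 100, "A", "G"),
    ("chr5", 500, "G", "A"),
    ("chr2", 200, "C", "T"),
]

vn = build_virtual_normal(normals, min_samples=2)
vn_only = filter_candidates(tumor, vn)
vn_plus_mn = filter_candidates(tumor, vn, matched_normal=[("chr2", 200, "C", "T")])
```

In this toy data the VN alone removes the variant shared by both healthy genomes, and adding the MN removes one more, mirroring the abstract's point that the two corrections are complementary.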
6.
Eur J Clin Microbiol Infect Dis ; 37(6): 1081-1089, 2018 Jun.
Article in English | MEDLINE | ID: mdl-29549470

ABSTRACT

Microbiota profiling has the potential to greatly impact on routine clinical diagnostics by detecting DNA derived from live, fastidious, and dead bacterial cells present within clinical samples. Such results could potentially be used to benefit patients by influencing antibiotic prescribing practices or to generate new classical-based diagnostic methods, e.g., culture or PCR. However, technical flaws in 16S rRNA gene next-generation sequencing (NGS) protocols, together with the requirement for access to bioinformatics, currently hinder the introduction of microbiota analysis into clinical diagnostics. Here, we report on the development and evaluation of an "end-to-end" microbiota profiling platform (MYcrobiota), which combines our previously validated micelle PCR/NGS (micPCR/NGS) methodology with an easy-to-use, dedicated bioinformatics pipeline. The newly designed bioinformatics pipeline processes micPCR/NGS data automatically and summarizes the results in interactive, but simple web reports. In order to explore the utility of MYcrobiota in clinical diagnostics, 47 clinical samples (40 "damaged skin" samples and 7 synovial fluids) were investigated using routine bacterial culture as comparator. MYcrobiota confirmed the presence of bacterial DNA in 37/37 culture-positive samples and detected bacterial taxa in 2/10 culture-negative samples. Moreover, 36/38 potentially relevant aerobic bacterial taxa and 3/3 mixtures of anaerobic bacteria were identified using culture and MYcrobiota, with the sensitivity and specificity being 95%. Interestingly, the majority of the 448 bacterial taxa identified using MYcrobiota were not identified using culture, which could potentially have an impact on clinical decision-making. Taken together, the development of MYcrobiota is a promising step towards the introduction of microbiota analysis into clinical diagnostic laboratories.


Subject(s)
Bacteria/genetics , Clinical Laboratory Techniques/methods , Computational Biology/methods , DNA, Bacterial/genetics , Microbiota/genetics , Bacteria/isolation & purification , Clinical Laboratory Techniques/instrumentation , High-Throughput Nucleotide Sequencing/methods , Humans , Molecular Diagnostic Techniques/instrumentation , Molecular Diagnostic Techniques/methods , Phylogeny , Polymerase Chain Reaction/methods , RNA, Ribosomal, 16S/genetics , Retrospective Studies , Sequence Analysis, DNA/methods , Ulcer/microbiology , Wounds and Injuries/microbiology
7.
Bioinformatics ; 32(8): 1226-8, 2016 04 15.
Article in English | MEDLINE | ID: mdl-26656567

ABSTRACT

UNLABELLED: A new generation of tools that identify fusion genes in RNA-seq data is limited in sensitivity and/or specificity. To allow further downstream analysis and to estimate performance, predicted fusion genes from different tools have to be compared. However, the transcriptomic context complicates genomic location-based matching. FusionMatcher (FuMa) is a program that reports identical fusion genes based on gene-name annotations. FuMa automatically compares and summarizes all combinations of two or more datasets in a single run, without additional programming necessary. FuMa uses one gene annotation, avoiding mismatches caused by tool-specific gene annotations. FuMa matches 10% more fusion genes compared with exact gene matching due to overlapping genes and accepts intermediate output files that allow a stepwise analysis of corresponding tools. AVAILABILITY AND IMPLEMENTATION: The code is available at https://github.com/ErasmusMC-Bioinformatics/fuma; it is also available for Galaxy in the tool sheds and directly accessible at https://bioinf-galaxian.erasmusmc.nl/galaxy/. CONTACT: y.hoogstrate@erasmusmc.nl or a.stubbs@erasmusmc.nl. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Sequence Analysis, RNA , Software , Genome , Genomics , Humans , RNA
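
Illustrative sketch for entry 7: the abstract says fusion predictions are matched on gene-name annotations rather than genomic coordinates, and that allowing for overlapping genes finds more matches than exact gene matching. The Python below is a simplified illustration of that idea, not FuMa's actual code or API. It assumes each breakpoint is annotated with a set of gene names, that two fusions match when both partners share at least one name, and that orientation is ignored; the `Fusion` type and all function names are hypothetical.

```python
from typing import FrozenSet, Iterable, List, NamedTuple, Tuple


class Fusion(NamedTuple):
    """A predicted fusion: the gene names annotated at each breakpoint."""

    left_genes: FrozenSet[str]
    right_genes: FrozenSet[str]


def make_fusion(left: Iterable[str], right: Iterable[str]) -> Fusion:
    """Build a Fusion from any iterables of gene names."""
    return Fusion(frozenset(left), frozenset(right))


def fusions_match(a: Fusion, b: Fusion) -> bool:
    """True if both partners share a gene name, in either orientation."""
    same = bool(a.left_genes & b.left_genes) and bool(a.right_genes & b.right_genes)
    swapped = bool(a.left_genes & b.right_genes) and bool(a.right_genes & b.left_genes)
    return same or swapped


def match_datasets(
    xs: Iterable[Fusion], ys: Iterable[Fusion]
) -> List[Tuple[Fusion, Fusion]]:
    """Return every pair of predictions from two tools that match."""
    ys = list(ys)
    return [(x, y) for x in xs for y in ys if fusions_match(x, y)]


# Toy predictions from two hypothetical tools. OVERLAPPING_GENE is a made-up
# name standing in for a second gene annotated at the same breakpoint.
tool_a = [make_fusion({"TMPRSS2"}, {"ERG"}), make_fusion({"NDUFAF2"}, {"MAST4"})]
tool_b = [
    make_fusion({"ERG"}, {"TMPRSS2", "OVERLAPPING_GENE"}),
    make_fusion({"BCR"}, {"ABL1"}),
]

shared = match_datasets(tool_a, tool_b)
```

Here the TMPRSS2-ERG call is matched even though the second tool reports the partners in the opposite order and annotates an extra overlapping gene, a case that exact gene matching would miss.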
8.
BMC Immunol ; 15: 59, 2014 Dec 13.
Article in English | MEDLINE | ID: mdl-25495099

ABSTRACT

BACKGROUND: Sequence analysis of immunoglobulin heavy chain (IGH) gene rearrangements and frequency analysis is a powerful tool for studying the immune repertoire, immune responses and immune dysregulation in health and disease. The challenge is to provide user-friendly, secure and reproducible analytical services that are available for both small and large laboratories which are determining VDJ repertoire using NGS technology. RESULTS: In this study we describe ImmunoGlobulin Galaxy (IGGalaxy), a convenient web-based application for analyzing next-generation sequencing results and reporting IGH gene rearrangements for both repertoire and clonality studies. IGGalaxy has two analysis options: one using the built-in igBLAST algorithm and the other using output from IMGT; in either case repertoire summaries for the B-cell populations tested are available. IGGalaxy supports multi-sample and multi-replicate input analysis for both igBLAST and IMGT/HIGHV-QUEST. We demonstrate the technical validity of this platform using a standard dataset, S22, used for benchmarking the performance of antibody alignment utilities, with a 99.9% concordance with previous results. Re-analysis of NGS data from our samples of RAG-deficient patients demonstrated the validity and user-friendliness of this tool. CONCLUSIONS: IGGalaxy provides clinical researchers with detailed insight into the repertoire of the B-cell population per individual sequenced and between control and pathogenic genomes. IGGalaxy was developed for 454 NGS results but is capable of analyzing alternative NGS data (e.g. Illumina, Ion Torrent). We demonstrate the use of a Galaxy virtual machine to determine the VDJ repertoire for reference data and from B-cells taken from immune-deficient patients. IGGalaxy is available as a VM for download and use on a desktop PC or on a server.


Subject(s)
Databases, Nucleic Acid , Gene Rearrangement, B-Lymphocyte, Heavy Chain , Immunologic Deficiency Syndromes/genetics , Sequence Analysis, DNA/methods , Software , Humans , Immunologic Deficiency Syndromes/immunology
9.
Bioinformatics ; 29(13): 1700-1, 2013 Jul 01.
Article in English | MEDLINE | ID: mdl-23661695

ABSTRACT

UNLABELLED: We present iFUSE (integrated fusion gene explorer), an online visualization tool that provides a fast and informative view of structural variation data and prioritizes those breaks likely representing fusion genes. This application uses calculated break points to determine fusion genes based on the latest annotation for genomic sequence information, and where relevant the structural variation (SV) events are annotated with predicted RNA and protein sequences. iFUSE takes as input a Complete Genomics (CG) junction file, a FusionMap fusion detection report file or a file already analysed and annotated by the iFUSE application on a previous occasion. RESULTS: We demonstrate the use of iFUSE with case studies from tumour-normal SV detection derived from Complete Genomics whole-genome sequencing results. AVAILABILITY: iFUSE is available as a web service at http://ifuse.erasmusmc.nl.


Subject(s)
Gene Fusion , Genomic Structural Variation , Software , Genes, Neoplasm , Genomics/methods , Humans
10.
Gigascience ; 13, 2024 01 02.
Article in English | MEDLINE | ID: mdl-38280189

ABSTRACT

BACKGROUND: In clinical research, data have to be accessible and reproducible, but the generated data are becoming larger and analysis complex. Here we propose a platform for Findable, Accessible, Interoperable, and Reusable (FAIR) data access and creating reproducible findings. Standardized access to a major genomic repository, the European Genome-Phenome Archive (EGA), has been achieved with API services like PyEGA3. We aim to provide a FAIR data analysis service in Galaxy by retrieving genomic data from the EGA and provide a generalized "omics" platform for FAIR data analysis. RESULTS: To demonstrate this, we implemented an end-to-end Galaxy workflow to replicate the findings from an RD-Connect synthetic dataset Beyond the 1 Million Genomes (synB1MG) available from the EGA. We developed the PyEGA3 connector within Galaxy to easily download multiple datasets from the EGA. We added the gene.iobio tool, a diagnostic environment for precision genomics, to Galaxy and demonstrate that it provides a more dynamic and interpretable view for trio analysis results. We developed a Galaxy trio analysis workflow to determine the pathogenic variants from the synB1MG trios using the GEMINI and gene.iobio tool. The complete workflow is available at WorkflowHub, and an associated tutorial was created in the Galaxy Training Network, which helps researchers unfamiliar with Galaxy to run the workflow. CONCLUSIONS: We showed the feasibility of reusing data from the EGA in Galaxy via PyEGA3 and validated the workflow by rediscovering spiked-in variants in synthetic data. Finally, we improved existing tools in Galaxy and created a workflow for trio analysis to demonstrate the value of FAIR genomics analysis in Galaxy.


Subject(s)
Genomics , Software , Genomics/methods , Genome , Workflow
11.
Hum Genet ; 132(6): 709-13, 2013 Jun.
Article in English | MEDLINE | ID: mdl-23615946

ABSTRACT

The VCaP cell line is widely used in prostate cancer research as it is a unique model to study castrate-resistant disease expressing high levels of the wild-type androgen receptor and the TMPRSS2-ERG fusion transcript. Using next-generation sequencing, we assembled the structural variations in VCaP genomic DNA and observed a massive number of genomic rearrangements along the q arm of chromosome 5, characteristic of chromothripsis. Chromothripsis is a recently recognized phenomenon characterized by extensive chromosomal shattering in a single catastrophic event, mainly detected in cancer cells. Various structural events identified on chromosome 5q of VCaP resulted in gene fusions. Out of the 18 gene fusion candidates tested, 15 were confirmed at the genomic level. In our set of gene fusions, we only rarely observed microhomology flanking the breakpoints. At the RNA level, only five transcripts were detected, and NDUFAF2-MAST4 was the only one resulting in an in-frame fusion transcript. Our data indicate that, although a marker of genomic instability, chromothripsis might lead to only a limited number of functionally relevant fusion genes.


Subject(s)
Chromosomes, Human, Pair 5/genetics , Gene Fusion , Gene Rearrangement , Prostatic Neoplasms/genetics , Cell Line, Tumor , Gene Dosage , Heterozygote , Humans , Male , Translocation, Genetic
12.
Gigascience ; 12, 2022 12 28.
Article in English | MEDLINE | ID: mdl-37395629

ABSTRACT

BACKGROUND: Hands-on training, whether in bioinformatics or other domains, often requires significant technical resources and knowledge to set up and run. Instructors must have access to powerful compute infrastructure that can support resource-intensive jobs running efficiently. Often this is achieved using a private server where there is no contention for the queue. However, this places a significant prerequisite knowledge or labor barrier for instructors, who must spend time coordinating deployment and management of compute resources. Furthermore, with the increase of virtual and hybrid teaching, where learners are located in separate physical locations, it is difficult to track student progress as efficiently as during in-person courses. FINDINGS: Originally developed by Galaxy Europe and the Gallantries project, together with the Galaxy community, we have created Training Infrastructure-as-a-Service (TIaaS), aimed at providing user-friendly training infrastructure to the global training community. TIaaS provides dedicated training resources for Galaxy-based courses and events. Event organizers register their course, after which trainees are transparently placed in a private queue on the compute infrastructure, which ensures jobs complete quickly, even when the main queue is experiencing high wait times. A built-in dashboard allows instructors to monitor student progress. CONCLUSIONS: TIaaS provides a significant improvement for instructors and learners, as well as infrastructure administrators. The instructor dashboard makes remote events not only possible but also easy. Students experience continuity of learning, as all training happens on Galaxy, which they can continue to use after the event. In the past 60 months, 504 training events with over 24,000 learners have used this infrastructure for Galaxy training.


Subject(s)
Learning , Software , Humans , Europe , Computational Biology
13.
Sci Data ; 9(1): 169, 2022 04 13.
Article in English | MEDLINE | ID: mdl-35418585

ABSTRACT

The genomes of thousands of individuals are profiled within Dutch healthcare and research each year. However, this valuable genomic data, associated clinical data and consent are captured in different ways and stored across many systems and organizations. This makes it difficult to discover rare disease patients, reuse data for personalized medicine and establish research cohorts based on specific parameters. FAIR Genomes aims to enable NGS data reuse by developing metadata standards for the data descriptions needed to FAIRify genomic data while also addressing ELSI issues. We developed a semantic schema of essential data elements harmonized with international FAIR initiatives. The FAIR Genomes schema v1.1 contains 110 elements in 9 modules. It reuses common ontologies such as NCIT, DUO and EDAM, only introducing new terms when necessary. The schema is represented by a YAML file that can be transformed into templates for data entry software (EDC) and programmatic interfaces (JSON, RDF) to ease genomic data sharing in research and healthcare. The schema, documentation and MOLGENIS reference implementation are available at https://fairgenomes.org .


Subject(s)
High-Throughput Nucleotide Sequencing , Metadata , Delivery of Health Care , Genomics , Humans , Software
14.
PLoS One ; 17(4): e0267140, 2022.
Article in English | MEDLINE | ID: mdl-35436301

ABSTRACT

BACKGROUND: The ability to accurately distinguish bacterial from viral infection would help clinicians better target antimicrobial therapy during suspected lower respiratory tract infections (LRTI). Although technological developments make it feasible to rapidly generate patient-specific microbiota profiles, evidence is required to show the clinical value of using microbiota data for infection diagnosis. In this study, we investigated whether adding nasal cavity microbiota profiles to readily available clinical information could improve machine learning classifiers to distinguish bacterial from viral infection in patients with LRTI. RESULTS: Various multi-parametric Random Forests classifiers were evaluated on the clinical and microbiota data of 293 LRTI patients for their prediction accuracies to differentiate bacterial from viral infection. The most predictive variable was C-reactive protein (CRP). We observed a marginal prediction improvement when the 7 most prevalent nasal microbiota genera were added to the CRP model. In contrast, adding three clinical variables (absolute neutrophil count, consolidation on X-ray, and age group) to the CRP model significantly improved the prediction. The best model correctly predicted 85% of the 'bacterial' patients and 82% of the 'viral' patients using 13 clinical variables and 3 nasal cavity microbiota genera (Staphylococcus, Moraxella, and Streptococcus). CONCLUSIONS: We developed high-accuracy multi-parametric machine learning classifiers to differentiate bacterial from viral infections in LRTI patients of various ages. We demonstrated the predictive value of four easy-to-collect clinical variables which facilitate personalized and accurate clinical decision-making. We observed that nasal cavity microbiota correlate with the clinical variables and thus may not add significant value to diagnostic algorithms that aim to differentiate bacterial from viral infections.


Subject(s)
Bacterial Infections , Microbiota , Respiratory Tract Infections , Virus Diseases , Bacterial Infections/drug therapy , C-Reactive Protein/metabolism , Humans , Nose/microbiology , Respiratory Tract Infections/drug therapy , Virus Diseases/diagnosis
15.
F1000Res ; 10: 103, 2021.
Article in English | MEDLINE | ID: mdl-34484688

ABSTRACT

The Earth Microbiome Project (EMP) aided in understanding the role of microbial communities and the influence of collective genetic material (the 'microbiome') and microbial diversity patterns across the habitats of our planet. With the evolution of new sequencing technologies, researchers can now investigate the microbiome and map its influence on the environment and human health. Advances in bioinformatics methods for next-generation sequencing (NGS) data analysis have helped researchers to gain an in-depth knowledge about the taxonomic and genetic composition of microbial communities. Metagenomic-based methods have been the most commonly used approaches for microbiome analysis; however, they primarily extract information about the taxonomic composition and genetic potential of the microbiome under study, lacking quantification of the gene products (RNA and proteins). On the other hand, metatranscriptomics, the study of a microbial community's RNA expression, can reveal the dynamic gene expression of individual microbial populations and the community as a whole, ultimately providing information about the active pathways in the microbiome. In order to address the analysis of NGS data, the ASaiM analysis framework was previously developed and made available via the Galaxy platform. Although developed for both metagenomics and metatranscriptomics, the original publication demonstrated the use of ASaiM only for metagenomics, while thorough testing for metatranscriptomics data was lacking. In the current study, we have focused on validating and optimizing the tools within ASaiM for metatranscriptomics data. As a result, we deliver a robust workflow that will enable researchers to understand the dynamic functional response of the microbiome in a wide variety of metatranscriptomics studies.
This improved and optimized ASaiM-metatranscriptomics (ASaiM-MT) workflow is publicly available via the ASaiM framework, documented and supported with training material so that users can interrogate and characterize metatranscriptomic data, as part of larger meta-omic studies of microbiomes.


Subject(s)
Metagenomics , Microbiota , High-Throughput Nucleotide Sequencing , Humans , Metagenome , Microbiota/genetics , Workflow
16.
Gigascience ; 9(6), 2020 06 01.
Article in English | MEDLINE | ID: mdl-32530465

ABSTRACT

BACKGROUND: Circos is a popular, highly flexible software package for the circular visualization of complex datasets. While especially popular in the field of genomic analysis, Circos enables interactive graphing of any analytical data, including alternative scientific domain data and non-scientific data. This high degree of flexibility also comes with a high degree of complexity, which may present an obstacle for researchers not trained in programming or the UNIX command line. The Galaxy platform provides a user-friendly browser-based graphical interface incorporating a broad range of "wrapped" command line tools to facilitate accessibility. FINDINGS: We have developed a Galaxy wrapper for Circos, thus combining the power of Circos with the accessibility and ease of use of the Galaxy platform. The combination substantially simplifies the specification and configuration of Circos plots for end users while retaining the power to produce publication-quality visualizations of complex multidimensional datasets. CONCLUSIONS: Galactic Circos enables the creation of publication-ready Circos plots using only a web browser, via the Galaxy platform. Users may download the full set of Circos configuration files of their plots for further manual customization. This version of Circos is available as an open-source installable application from the Galaxy ToolShed, with its use clarified in a training manual hosted by the Galaxy Training Network.


Subject(s)
Computational Biology/methods , Genomics/methods , Software , Computational Biology/standards , Genomics/standards , Workflow
17.
Gigascience ; 9(10), 2020 10 17.
Article in English | MEDLINE | ID: mdl-33068114

ABSTRACT

BACKGROUND: Long-read sequencing can be applied to generate very long contigs and even completely assembled genomes at relatively low cost and with minimal sample preparation. As a result, long-read sequencing platforms are becoming more popular. In this respect, the Oxford Nanopore Technologies-based long-read sequencing "nanopore" platform is becoming a widely used tool with a broad range of applications and end-users. However, the need to explore and manipulate the complex data generated by long-read sequencing platforms necessitates accompanying specialized bioinformatics platforms and tools to process the long-read data correctly. Importantly, such tools should additionally help democratize bioinformatics analysis by enabling easy access and ease-of-use solutions for researchers. RESULTS: The Galaxy platform provides a user-friendly interface to computational command line-based tools, handles the software dependencies, and provides refined workflows. The users do not have to possess programming experience or extended computer skills. The interface enables researchers to perform powerful bioinformatics analysis, including the assembly and analysis of short- or long-read sequence data. The newly developed "NanoGalaxy" is a Galaxy-based toolkit for analysing long-read sequencing data, which is suitable for diverse applications, including de novo genome assembly from genomic, metagenomic, and plasmid sequence reads. CONCLUSIONS: A range of best-practice tools and workflows for long-read sequence genome assembly has been integrated into a NanoGalaxy platform to facilitate easy access and use of bioinformatics tools for researchers. NanoGalaxy is freely available at the European Galaxy server https://nanopore.usegalaxy.eu with supporting self-learning training material available at https://training.galaxyproject.org.


Subject(s)
Nanopore Sequencing , Nanopores , Data Analysis , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA , Software
18.
Genes (Basel) ; 11(9)2020 09 21.
Article in English | MEDLINE | ID: mdl-32967250

ABSTRACT

Illumina and nanopore sequencing technologies are powerful tools that can be used to determine the bacterial composition of complex microbial communities. In this study, we compared nasal microbiota results at genus level using both Illumina and nanopore 16S rRNA gene sequencing. We also monitored the progression of nanopore sequencing in the accurate identification of species, using pure, single species cultures, and evaluated the performance of the nanopore EPI2ME 16S data analysis pipeline. Fifty-nine nasal swabs were sequenced using Illumina MiSeq and Oxford Nanopore 16S rRNA gene sequencing technologies. In addition, five pure cultures of relevant bacterial species were sequenced with the nanopore sequencing technology. The Illumina MiSeq sequence data were processed using bioinformatics modules present in the Mothur software package. Albacore and Guppy base calling, a workflow in nanopore EPI2ME (Oxford Nanopore Technologies-ONT, Oxford, UK) and an in-house developed bioinformatics script were used to analyze the nanopore data. At genus level, similar bacterial diversity profiles were found, and five main and established genera were identified by both platforms. However, probably due to mismatching of the nanopore sequence primers, the nanopore sequencing platform identified Corynebacterium in much lower abundance compared to Illumina sequencing. Further, when using default settings in the EPI2ME workflow, almost all sequence reads that seem to belong to the bacterial genus Dolosigranulum and a considerable part to the genus Haemophilus were only identified at family level. 
Nanopore sequencing of single species cultures demonstrated at least 88% accurate identification at genus and species level for 4/5 strains tested, including improvements in accurate sequence read identification when the basecallers Guppy and Albacore were compared, and when flowcell versions R9.4 (Oxford Nanopore Technologies-ONT, Oxford, UK) and R9.2 (Oxford Nanopore Technologies-ONT, Oxford, UK) were compared. In conclusion, the current study shows that the nanopore sequencing platform is comparable with the Illumina platform in detecting bacterial genera of the nasal microbiota, but the nanopore platform does have problems in detecting bacteria within the genus Corynebacterium. Although advances are being made, thorough validation of the nanopore platform is still recommended.


Subject(s)
Genes, rRNA/genetics , High-Throughput Nucleotide Sequencing/methods , Microbiota/genetics , Nanopore Sequencing/methods , Nasal Cavity/microbiology , RNA, Ribosomal, 16S/genetics , Adolescent , Adult , Aged , Child , Child, Preschool , Computational Biology/methods , DNA Primers/genetics , DNA, Bacterial/genetics , Female , Humans , Infant , Male , Middle Aged , Nanopores , Young Adult
19.
Gigascience ; 8(2)2019 02 01.
Article in English | MEDLINE | ID: mdl-30597007

ABSTRACT

BACKGROUND: The determination of microbial communities using the mothur tool suite (https://www.mothur.org) is well established. However, mothur requires bioinformatics-based proficiency in order to perform calculations via the command-line. Galaxy is a project dedicated to providing a user-friendly web interface for such command-line tools (https://galaxyproject.org/). RESULTS: We have integrated the full set of 125+ mothur tools into Galaxy as the Galaxy mothur Toolset (GmT) and provided a set of workflows to perform end-to-end 16S rRNA gene analyses and integrate with third-party visualization and reporting tools. We demonstrate the utility of GmT by analyzing the mothur MiSeq standard operating procedure (SOP) dataset (https://www.mothur.org/wiki/MiSeq_SOP). CONCLUSIONS: GmT is available from the Galaxy Tool Shed, and a workflow definition file and full Galaxy training manual for the mothur SOP have been created. A Docker image with a fully configured GmT Galaxy is also available.


Subject(s)
Computational Biology/methods , Microbiota/genetics , RNA, Ribosomal, 16S , Sequence Analysis, DNA/methods , Software
20.
Gigascience ; 7(6)2018 06 01.
Article in English | MEDLINE | ID: mdl-29790941

ABSTRACT

Background: New generations of sequencing platforms coupled to numerous bioinformatics tools have led to rapid technological progress in metagenomics and metatranscriptomics to investigate complex microorganism communities. Nevertheless, a combination of different bioinformatic tools remains necessary to draw conclusions from microbiota studies. Modular and user-friendly tools would greatly improve such studies. Findings: We therefore developed ASaiM, an Open-Source Galaxy-based framework dedicated to microbiota data analyses. ASaiM provides an extensive collection of tools to assemble, extract, explore, and visualize microbiota information from raw metataxonomic, metagenomic, or metatranscriptomic sequences. To guide the analyses, several customizable workflows are included and are supported by tutorials and Galaxy interactive tours, which guide users through the analyses step by step. ASaiM is implemented as a Galaxy Docker flavour. It is scalable to thousands of datasets but can also be used on a normal PC. The associated source code is available under Apache 2 license at https://github.com/ASaiM/framework and documentation can be found online (http://asaim.readthedocs.io). Conclusions: Based on the Galaxy framework, ASaiM offers a sophisticated environment with a variety of tools, workflows, documentation, and training to scientists working on complex microorganism communities. It makes analysis and exploration of microbiota data easy, quick, transparent, reproducible, and shareable.


Subject(s)
Microbiota , Software , Statistics as Topic , Base Sequence , Metagenomics