Search | VHL Regional Portal

1.

The Planemo toolkit for developing, deploying, and executing scientific data analyses in Galaxy and beyond.

Bray, Simon; Chilton, John; Bernt, Matthias; Soranzo, Nicola; van den Beek, Marius; Batut, Bérénice; Rasche, Helena; Cech, Martin; Cock, Peter J A; Grüning, Björn; Nekrutenko, Anton.

Genome Res ; 33(2): 261-268, 2023 02.

Article in English | MEDLINE | ID: mdl-36828587

ABSTRACT

There are thousands of well-maintained high-quality open-source software utilities for all aspects of scientific data analysis. For more than a decade, the Galaxy Project has been providing computational infrastructure and a unified user interface for these tools to make them accessible to a wide range of researchers. To streamline the process of integrating tools and constructing workflows as much as possible, we have developed Planemo, a software development kit for tool and workflow developers and Galaxy power users. Here we outline Planemo's implementation and describe its broad range of functionality for designing, testing, and executing Galaxy tools, workflows, and training material. In addition, we discuss the philosophy underlying Galaxy tool and workflow development, and how Planemo encourages the use of development best practices, such as test-driven development, by its users, including those who are not professional software developers.

Subject(s)

Computational Biology , Software , Workflow , Data Analysis

2.

Galaxy Training: A powerful framework for teaching!

Hiltemann, Saskia; Rasche, Helena; Gladman, Simon; Hotz, Hans-Rudolf; Larivière, Delphine; Blankenberg, Daniel; Jagtap, Pratik D; Wollmann, Thomas; Bretaudeau, Anthony; Goué, Nadia; Griffin, Timothy J; Royaux, Coline; Le Bras, Yvan; Mehta, Subina; Syme, Anna; Coppens, Frederik; Droesbeke, Bert; Soranzo, Nicola; Bacon, Wendi; Psomopoulos, Fotis; Gallardo-Alba, Cristóbal; Davis, John; Föll, Melanie Christine; Fahrner, Matthias; Doyle, Maria A; Serrano-Solano, Beatriz; Fouilloux, Anne Claire; van Heusden, Peter; Maier, Wolfgang; Clements, Dave; Heyl, Florian; Grüning, Björn; Batut, Bérénice.

PLoS Comput Biol ; 19(1): e1010752, 2023 01.

Article in English | MEDLINE | ID: mdl-36622853

ABSTRACT

There is an ongoing explosion of scientific datasets being generated, brought on by recent technological advances in many areas of the natural sciences. As a result, the life sciences have become increasingly computational in nature, and bioinformatics has taken on a central role in research studies. However, basic computational skills, data analysis, and stewardship are still rarely taught in life science educational programs, resulting in a skills gap in many of the researchers tasked with analysing these big datasets. In order to address this skills gap and empower researchers to perform their own data analyses, the Galaxy Training Network (GTN) has previously developed the Galaxy Training Platform (https://training.galaxyproject.org), an open access, community-driven framework for the collection of FAIR (Findable, Accessible, Interoperable, Reusable) training materials for data analysis utilizing the user-friendly Galaxy framework as its primary data analysis platform. Since its inception, this training platform has thrived, with the number of tutorials and contributors growing rapidly, and the range of topics extending beyond life sciences to include topics such as climatology, cheminformatics, and machine learning. While initially aimed at supporting researchers directly, the GTN framework has proven to be an invaluable resource for educators as well. We have focused our efforts in recent years on adding increased support for this growing community of instructors. We have added new features to facilitate the use of the materials in a classroom setting, simplified the contribution flow for new materials, and added a set of train-the-trainer lessons. Here, we present the latest developments in the GTN project, aimed at facilitating the use of the Galaxy Training materials by educators, and its usage in different learning environments.

Subject(s)

Computational Biology , Software , Humans , Computational Biology/methods , Data Analysis , Research Personnel

3.

LotuS2: an ultrafast and highly accurate tool for amplicon sequencing analysis.

Özkurt, Ezgi; Fritscher, Joachim; Soranzo, Nicola; Ng, Duncan Y K; Davey, Robert P; Bahram, Mohammad; Hildebrand, Falk.

Microbiome ; 10(1): 176, 2022 10 19.

Article in English | MEDLINE | ID: mdl-36258257

ABSTRACT

BACKGROUND: Amplicon sequencing is an established and cost-efficient method for profiling microbiomes. However, many available tools to process this data require both bioinformatics skills and high computational power to process big datasets. Furthermore, there are only a few tools that allow for long-read amplicon data analysis. To bridge this gap, we developed the LotuS2 (less OTU scripts 2) pipeline, enabling user-friendly, resource-friendly, and versatile analysis of raw amplicon sequences. RESULTS: In LotuS2, six different sequence clustering algorithms as well as extensive pre- and post-processing options allow for flexible data analysis by both experts, where parameters can be fully adjusted, and novices, where defaults are provided for different scenarios. We benchmarked three independent gut and soil datasets, where LotuS2 was on average 29 times faster compared to other pipelines, yet could better reproduce the alpha- and beta-diversity of technical replicate samples. Further benchmarking a mock community with known taxon composition showed that, compared to the other pipelines, LotuS2 recovered a higher fraction of correctly identified taxa and a higher fraction of reads assigned to true taxa (48% and 57% at species; 83% and 98% at genus level, respectively). At ASV/OTU level, precision and F-score were highest for LotuS2, as was the fraction of correctly reported 16S sequences. CONCLUSION: LotuS2 is a lightweight and user-friendly pipeline that is fast, precise, and streamlined, using extensive pre- and post-ASV/OTU clustering steps to further increase data quality. High data usage rates and reliability enable high-throughput microbiome analysis in minutes. AVAILABILITY: LotuS2 is available from GitHub, conda, or via a Galaxy web interface, documented at http://lotus2.earlham.ac.uk/.

Subject(s)

Software , Soil , RNA, Ribosomal, 16S , Reproducibility of Results , Sequence Analysis , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods

4.

Expanding the Galaxy's reference data.

VijayKrishna, Nagampalli; Joshi, Jayadev; Coraor, Nate; Hillman-Jackson, Jennifer; Bouvier, Dave; van den Beek, Marius; Eguinoa, Ignacio; Coppens, Frederik; Davis, John; Stolarczyk, Michal; Sheffield, Nathan C; Gladman, Simon; Cuccuru, Gianmauro; Grüning, Björn; Soranzo, Nicola; Rasche, Helena; Langhorst, Bradley W; Bernt, Matthias; Fornika, Dan; de Lima Morais, David Anderson; Barrette, Michel; van Heusden, Peter; Petrillo, Mauro; Puertas-Gallardo, Antonio; Patak, Alex; Hotz, Hans-Rudolf; Blankenberg, Daniel.

Bioinform Adv ; 2(1): vbac030, 2022.

Article in English | MEDLINE | ID: mdl-35669346

ABSTRACT

Summary: Properly and effectively managing reference datasets is an important task for many bioinformatics analyses. Refgenie is a reference asset management system that allows users to easily organize, retrieve and share such datasets. Here, we describe the integration of refgenie into the Galaxy platform. Server administrators are able to configure Galaxy to make use of reference datasets made available on a refgenie instance. In addition, a Galaxy Data Manager tool has been developed to provide a graphical interface to refgenie's remote reference retrieval functionality. A large collection of reference datasets has also been made available using the CVMFS (CernVM File System) repository from GalaxyProject.org, with mirrors across the USA, Canada, Europe and Australia, enabling easy use outside of Galaxy. Availability and implementation: The ability of Galaxy to use refgenie assets was added to the core Galaxy framework in version 22.01, which is available from https://github.com/galaxyproject/galaxy under the Academic Free License version 3.0. The refgenie Data Manager tool can be installed via the Galaxy ToolShed, with source code managed at https://github.com/BlankenbergLab/galaxy-tools-blankenberg/tree/main/data_managers/data_manager_refgenie_pull and released using an MIT license. Access to existing data is also available through CVMFS, with instructions at https://galaxyproject.org/admin/reference-data-repo/. No new data were generated or analyzed in support of this research.

5.

RNA-Seq Data Analysis in Galaxy.

Batut, Bérénice; van den Beek, Marius; Doyle, Maria A; Soranzo, Nicola.

Methods Mol Biol ; 2284: 367-392, 2021.

Article in English | MEDLINE | ID: mdl-33835453

ABSTRACT

A complete RNA-Seq analysis involves the use of several different tools, with substantial software and computational requirements. The Galaxy platform simplifies the execution of such bioinformatics analyses by embedding the needed tools in its web interface, while also providing reproducibility. Here, we describe how to perform a reference-based RNA-Seq analysis using Galaxy, from data upload to visualization and functional enrichment analysis of differentially expressed genes.

Subject(s)

RNA-Seq/methods , Software , Animals , Computational Biology/methods , Data Analysis , Datasets as Topic/statistics & numerical data , Gene Expression Profiling/methods , Gene Expression Profiling/statistics & numerical data , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/statistics & numerical data , Humans , Reproducibility of Results , Sequence Analysis, RNA/methods , Sequence Analysis, RNA/statistics & numerical data , Exome Sequencing/methods , Exome Sequencing/statistics & numerical data

6.

A single-cell RNA-sequencing training and analysis suite using the Galaxy framework.

Tekman, Mehmet; Batut, Bérénice; Ostrovsky, Alexander; Antoniewski, Christophe; Clements, Dave; Ramirez, Fidel; Etherington, Graham J; Hotz, Hans-Rudolf; Scholtalbers, Jelle; Manning, Jonathan R; Bellenger, Lea; Doyle, Maria A; Heydarian, Mohammad; Huang, Ni; Soranzo, Nicola; Moreno, Pablo; Mautner, Stefan; Papatheodorou, Irene; Nekrutenko, Anton; Taylor, James; Blankenberg, Daniel; Backofen, Rolf; Grüning, Björn.

Gigascience ; 9(10), 2020 10 20.

Article in English | MEDLINE | ID: mdl-33079170

ABSTRACT

BACKGROUND: The vast ecosystem of single-cell RNA-sequencing tools has until recently been plagued by an excess of diverging analysis strategies, inconsistent file formats, and compatibility issues between different software suites. The uptake of 10x Genomics datasets has begun to calm this diversity, and the bioinformatics community leans once more towards the large computing requirements and the statistically driven methods needed to process and understand these ever-growing datasets. RESULTS: Here we outline several Galaxy workflows and learning resources for single-cell RNA-sequencing, with the aim of providing a comprehensive analysis environment paired with a thorough user learning experience that bridges the knowledge gap between the computational methods and the underlying cell biology. The Galaxy reproducible bioinformatics framework provides tools, workflows, and trainings that not only enable users to perform 1-click 10x preprocessing but also empower them to demultiplex raw sequencing from custom tagged and full-length sequencing protocols. The downstream analysis supports a range of high-quality interoperable suites separated into common stages of analysis: inspection, filtering, normalization, confounder removal, and clustering. The teaching resources cover concepts from computer science to cell biology. Access to all resources is provided at the singlecell.usegalaxy.eu portal. CONCLUSIONS: The reproducible and training-oriented Galaxy framework provides a sustainable high-performance computing environment for users to run flexible analyses on both 10x and alternative platforms. The tutorials from the Galaxy Training Network along with the frequent training workshops hosted by the Galaxy community provide a means for users to learn, publish, and teach single-cell RNA-sequencing analysis.

Subject(s)

Ecosystem , Software , Computational Biology , RNA , Sequence Analysis, RNA

7.

A Galaxy-based training resource for single-cell RNA-sequencing quality control and analyses.

Etherington, Graham J; Soranzo, Nicola; Mohammed, Suhaib; Haerty, Wilfried; Davey, Robert P; Palma, Federica Di.

Gigascience ; 8(12), 2019 12 01.

Article in English | MEDLINE | ID: mdl-31825480

ABSTRACT

BACKGROUND: It is not a trivial step to move from single-cell RNA-sequencing (scRNA-seq) data production to data analysis. There is a lack of intuitive training materials and easy-to-use analysis tools, and researchers can find it difficult to master the basics of scRNA-seq quality control and the later analysis. RESULTS: We have developed a range of practical scripts, together with their corresponding Galaxy wrappers, that make scRNA-seq training and quality control accessible to researchers previously daunted by the prospect of scRNA-seq analysis. We implement a "visualize-filter-visualize" paradigm through simple command line tools that use the Loom format to exchange data between the tools. The point-and-click nature of Galaxy makes it easy to assess, visualize, and filter scRNA-seq data from short-read sequencing data. CONCLUSION: We have developed a suite of scRNA-seq tools that can be used for both training and more in-depth analyses.

Subject(s)

Computational Biology/education , Sequence Analysis, RNA/standards , Single-Cell Analysis/standards , Data Analysis , High-Throughput Nucleotide Sequencing , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Software , User-Computer Interface

8.

Aequatus: an open-source homology browser.

Thanki, Anil S; Soranzo, Nicola; Herrero, Javier; Haerty, Wilfried; Davey, Robert P.

Gigascience ; 7(11), 2018 11 01.

Article in English | MEDLINE | ID: mdl-30395211

ABSTRACT

Background: Phylogenetic information inferred from the study of homologous genes helps us to understand the evolution of genes and gene families, including the identification of ancestral gene duplication events as well as regions under positive or purifying selection within lineages. Gene family and orthogroup characterization enables the identification of syntenic blocks, which can then be visualized with various tools. Unfortunately, currently available tools display only an overview of syntenic regions as a whole, limited to the gene level, and none provide further details about structural changes within genes, such as the conservation of ancestral exon boundaries amongst multiple genomes. Findings: We present Aequatus, an open-source web-based tool that provides an in-depth view of gene structure across gene families, with various options to render and filter visualizations. It relies on precalculated alignment and gene feature information typically held in, but not limited to, the Ensembl Compara and Core databases. We also offer Aequatus.js, a reusable JavaScript module that fulfills the visualization aspects of Aequatus, available within the Galaxy web platform as a visualization plug-in, which can be used to visualize gene trees generated by the GeneSeqToFamily workflow.

Subject(s)

Computational Biology/methods , Genome/genetics , Genomics/methods , Software , Information Storage and Retrieval/methods , Internet , Phylogeny , Proteins/classification , Proteins/genetics , Reproducibility of Results , Sequence Alignment/methods

9.

Practical Computational Reproducibility in the Life Sciences.

Grüning, Björn; Chilton, John; Köster, Johannes; Dale, Ryan; Soranzo, Nicola; van den Beek, Marius; Goecks, Jeremy; Backofen, Rolf; Nekrutenko, Anton; Taylor, James.

Cell Syst ; 6(6): 631-635, 2018 06 27.

Article in English | MEDLINE | ID: mdl-29953862

ABSTRACT

Many areas of research suffer from poor reproducibility, particularly in computationally intensive domains where results rely on a series of complex methodological decisions that are not well captured by traditional publication approaches. Various guidelines have emerged for achieving reproducibility, but implementation of these practices remains difficult due to the challenge of assembling software tools plus associated libraries, connecting tools together into pipelines, and specifying parameters. Here, we discuss a suite of cutting-edge technologies that make computational reproducibility not just possible, but practical in both time and effort. This suite combines three well-tested components to achieve an unprecedented level of computational reproducibility: a system for building highly portable packages of bioinformatics software, containerization and virtualization technologies for isolating reusable execution environments for these packages, and workflow systems that automatically orchestrate the composition of these packages for entire pipelines. We also provide a practical implementation and five recommendations to help set a typical researcher on the path to performing data analyses reproducibly.

Subject(s)

Computational Biology/methods , Reproducibility of Results , Biological Science Disciplines , Humans , Research Personnel , Software , Technology , User-Computer Interface , Workflow

10.

Community-Driven Data Analysis Training for Biology.

Batut, Bérénice; Hiltemann, Saskia; Bagnacani, Andrea; Baker, Dannon; Bhardwaj, Vivek; Blank, Clemens; Bretaudeau, Anthony; Brillet-Guéguen, Loraine; Cech, Martin; Chilton, John; Clements, Dave; Doppelt-Azeroual, Olivia; Erxleben, Anika; Freeberg, Mallory Ann; Gladman, Simon; Hoogstrate, Youri; Hotz, Hans-Rudolf; Houwaart, Torsten; Jagtap, Pratik; Larivière, Delphine; Le Corguillé, Gildas; Manke, Thomas; Mareuil, Fabien; Ramírez, Fidel; Ryan, Devon; Sigloch, Florian Christoph; Soranzo, Nicola; Wolff, Joachim; Videm, Pavankumar; Wolfien, Markus; Wubuli, Aisanjiang; Yusuf, Dilmurat; Taylor, James; Backofen, Rolf; Nekrutenko, Anton; Grüning, Björn.

Cell Syst ; 6(6): 752-758.e1, 2018 06 27.

Article in English | MEDLINE | ID: mdl-29953864

ABSTRACT

The primary problem with the explosion of biomedical datasets is not the data, not computational resources, and not the required storage space, but the general lack of trained and skilled researchers to manipulate and analyze these data. Eliminating this problem requires development of comprehensive educational resources. Here we present a community-driven framework that enables modern, interactive teaching of data analytics in life sciences and facilitates the development of training materials. The key feature of our system is that it is not a static but a continuously improved collection of tutorials. By coupling tutorials with a web-based analysis framework, biomedical researchers can learn by performing computation themselves through a web browser without the need to install software or search for example datasets. Our ultimate goal is to expand the breadth of training materials to include fundamental statistical and data science topics and to precipitate a complete re-engineering of undergraduate and graduate curricula in life sciences. This project is accessible at https://training.galaxyproject.org.

Subject(s)

Computational Biology/education , Computational Biology/methods , Research Personnel/education , Curriculum , Data Analysis , Education, Distance/methods , Education, Distance/trends , Humans , Software

11.

The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2018 update.

Afgan, Enis; Baker, Dannon; Batut, Bérénice; van den Beek, Marius; Bouvier, Dave; Cech, Martin; Chilton, John; Clements, Dave; Coraor, Nate; Grüning, Björn A; Guerler, Aysam; Hillman-Jackson, Jennifer; Hiltemann, Saskia; Jalili, Vahid; Rasche, Helena; Soranzo, Nicola; Goecks, Jeremy; Taylor, James; Nekrutenko, Anton; Blankenberg, Daniel.

Nucleic Acids Res ; 46(W1): W537-W544, 2018 07 02.

Article in English | MEDLINE | ID: mdl-29790989

ABSTRACT

Galaxy (homepage: https://galaxyproject.org, main public server: https://usegalaxy.org) is a web-based scientific analysis platform used by tens of thousands of scientists across the world to analyze large biomedical datasets such as those found in genomics, proteomics, metabolomics and imaging. Started in 2005, Galaxy continues to focus on three key challenges of data-driven biomedical science: making analyses accessible to all researchers, ensuring analyses are completely reproducible, and making it simple to communicate analyses so that they can be reused and extended. During the last two years, the Galaxy team and the open-source community around Galaxy have made substantial improvements to Galaxy's core framework, user interface, tools, and training materials. Framework and user interface improvements now enable Galaxy to be used for analyzing tens of thousands of datasets, and >5500 tools are now available from the Galaxy ToolShed. The Galaxy community has led an effort to create numerous high-quality tutorials focused on common types of genomic analyses. The Galaxy developer and user communities continue to grow and be integral to Galaxy's development. The number of Galaxy public servers, developers contributing to the Galaxy framework and its tools, and users of the main Galaxy server have all increased substantially.

Subject(s)

Genomics/statistics & numerical data , Metabolomics/statistics & numerical data , Molecular Imaging/statistics & numerical data , Proteomics/statistics & numerical data , User-Computer Interface , Datasets as Topic , Humans , Information Dissemination , International Cooperation , Internet , Reproducibility of Results

12.

GeneSeqToFamily: a Galaxy workflow to find gene families based on the Ensembl Compara GeneTrees pipeline.

Thanki, Anil S; Soranzo, Nicola; Haerty, Wilfried; Davey, Robert P.

Gigascience ; 7(3): 1-10, 2018 03 01.

Article in English | MEDLINE | ID: mdl-29425291

ABSTRACT

Background: Gene duplication is a major factor contributing to evolutionary novelty, and the contraction or expansion of gene families has often been associated with morphological, physiological, and environmental adaptations. The study of homologous genes helps us to understand the evolution of gene families. It plays a vital role in finding ancestral gene duplication events as well as identifying genes that have diverged from a common ancestor under positive selection. There are various tools available, such as MSOAR, OrthoMCL, and HomoloGene, to identify gene families and visualize syntenic information between species, providing an overview of the evolution of syntenic regions at the family level. Unfortunately, none of them provide information about structural changes within genes, such as the conservation of ancestral exon boundaries among multiple genomes. The Ensembl GeneTrees computational pipeline generates gene trees based on coding sequences, provides details about exon conservation, and is used in the Ensembl Compara project to discover gene families. Findings: A certain amount of expertise is required to configure and run the Ensembl Compara GeneTrees pipeline via the command line. Therefore, we converted this pipeline into a Galaxy workflow, called GeneSeqToFamily, and provided additional functionality. This workflow uses existing tools from the Galaxy ToolShed, as well as providing additional wrappers and tools that are required to run the workflow. Conclusions: GeneSeqToFamily represents the Ensembl GeneTrees pipeline as a set of interconnected Galaxy tools, so they can be run interactively within Galaxy's user-friendly workflow environment while still providing the flexibility to tailor the analysis by changing configurations and tools if necessary. Additional tools allow users to subsequently visualize the gene families produced by the workflow, using the Aequatus.js interactive tool, which has been developed as part of the Aequatus software project.

Subject(s)

Computational Biology , Genome/genetics , Phylogeny , Software , Algorithms , User-Computer Interface , Workflow

13.

ReGaTE: Registration of Galaxy Tools in Elixir.

Doppelt-Azeroual, Olivia; Mareuil, Fabien; Deveaud, Eric; Kalas, Matús; Soranzo, Nicola; van den Beek, Marius; Grüning, Björn; Ison, Jon; Ménager, Hervé.

Gigascience ; 6(6): 1-4, 2017 06 01.

Article in English | MEDLINE | ID: mdl-28402416

ABSTRACT

Background: Bioinformaticians routinely use multiple software tools and data sources in their day-to-day work and have been guided in their choices by a number of cataloguing initiatives. The ELIXIR Tools and Data Services Registry (bio.tools) aims to provide a central information point, independent of any specific scientific scope within bioinformatics or technological implementation. Meanwhile, efforts to integrate bioinformatics software in workbench and workflow environments have accelerated to enable the design, automation, and reproducibility of bioinformatics experiments. One such popular environment is the Galaxy framework, with currently more than 80 publicly available Galaxy servers around the world. In the context of a generic registry for bioinformatics software, such as bio.tools, Galaxy instances constitute a major source of valuable content. Yet there has been, to date, no convenient mechanism to register such services en masse. We present ReGaTE (Registration of Galaxy Tools in Elixir), a software utility that automates the process of registering the services available in a Galaxy instance. This utility uses the BioBlend application program interface to extract service metadata from a Galaxy server, enhance the metadata with the scientific information required by bio.tools, and push it to the registry. ReGaTE provides a fast and convenient way to publish Galaxy services in bio.tools. By doing so, service providers may increase the visibility of their services while enriching the software discovery function that bio.tools provides for its users. The source code of ReGaTE is freely available on GitHub at https://github.com/C3BI-pasteur-fr/ReGaTE.

Subject(s)

Computational Biology/methods , Automation , Computer Systems , Internet , Reproducibility of Results , Software , User-Computer Interface , Workflow

14.

The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2016 update.

Afgan, Enis; Baker, Dannon; van den Beek, Marius; Blankenberg, Daniel; Bouvier, Dave; Cech, Martin; Chilton, John; Clements, Dave; Coraor, Nate; Eberhard, Carl; Grüning, Björn; Guerler, Aysam; Hillman-Jackson, Jennifer; Von Kuster, Greg; Rasche, Eric; Soranzo, Nicola; Turaga, Nitesh; Taylor, James; Nekrutenko, Anton; Goecks, Jeremy.

Nucleic Acids Res ; 44(W1): W3-W10, 2016 07 08.

Article in English | MEDLINE | ID: mdl-27137889

ABSTRACT

High-throughput data production technologies, particularly 'next-generation' DNA sequencing, have ushered in widespread and disruptive changes to biomedical research. Making sense of the large datasets produced by these technologies requires sophisticated statistical and computational methods, as well as substantial computational power. This has led to an acute crisis in life sciences, as researchers without informatics training attempt to perform computation-dependent analyses. Since 2005, the Galaxy project has worked to address this problem by providing a framework that makes advanced computational tools usable by non-experts. Galaxy seeks to make data-intensive research more accessible, transparent and reproducible by providing a Web-based environment in which users can perform computational analyses and have all of the details automatically tracked for later inspection, publication, or reuse. In this report we highlight recently added features enabling biomedical analyses on a large scale.

Subject(s)

Computational Biology/statistics & numerical data , Datasets as Topic/statistics & numerical data , User-Computer Interface , Biomedical Research , Computational Biology/methods , Databases, Genetic , Humans , Internet , Reproducibility of Results

15.

NCBI BLAST+ integrated into Galaxy.

Cock, Peter J A; Chilton, John M; Grüning, Björn; Johnson, James E; Soranzo, Nicola.

Gigascience ; 4: 39, 2015.

Article in English | MEDLINE | ID: mdl-26336600

ABSTRACT

BACKGROUND: The NCBI BLAST suite has become ubiquitous in modern molecular biology and is used for small tasks such as checking capillary sequencing results of single PCR products, genome annotation or even larger scale pan-genome analyses. For early adopters of the Galaxy web-based biomedical data analysis platform, integrating BLAST into Galaxy was a natural step for sequence comparison workflows. FINDINGS: The command line NCBI BLAST+ tool suite was wrapped for use within Galaxy. Appropriate datatypes were defined as needed. The integration of the BLAST+ tool suite into Galaxy has the goal of making common BLAST tasks easy and advanced tasks possible. CONCLUSIONS: This project is an informal international collaborative effort, and is deployed and used on Galaxy servers worldwide. Several examples of applications are described here.

Subject(s)

Computational Biology , Internet , National Institutes of Health (U.S.) , United States

16.

Alterations of a Cellular Cholesterol Metabolism Network Are a Molecular Feature of Obesity-Related Type 2 Diabetes and Cardiovascular Disease.

Ding, Jingzhong; Reynolds, Lindsay M; Zeller, Tanja; Müller, Christian; Lohman, Kurt; Nicklas, Barbara J; Kritchevsky, Stephen B; Huang, Zhiqing; de la Fuente, Alberto; Soranzo, Nicola; Settlage, Robert E; Chuang, Chia-Chi; Howard, Timothy; Xu, Ning; Goodarzi, Mark O; Chen, Y-D Ida; Rotter, Jerome I; Siscovick, David S; Parks, John S; Murphy, Susan; Jacobs, David R; Post, Wendy; Tracy, Russell P; Wild, Philipp S; Blankenberg, Stefan; Hoeschele, Ina; Herrington, David; McCall, Charles E; Liu, Yongmei.

Diabetes ; 64(10): 3464-74, 2015 Oct.

Article in English | MEDLINE | ID: mdl-26153245

ABSTRACT

Obesity is linked to type 2 diabetes (T2D) and cardiovascular diseases; however, the underlying molecular mechanisms remain unclear. We aimed to identify obesity-associated molecular features that may contribute to obesity-related diseases. Using circulating monocytes from 1,264 Multi-Ethnic Study of Atherosclerosis (MESA) participants, we quantified the transcriptome and epigenome. We discovered that alterations in a network of coexpressed cholesterol metabolism genes are a signature feature of obesity and inflammatory stress. This network included 11 BMI-associated genes related to sterol uptake (↑LDLR, ↓MYLIP), synthesis (↑SCD, FADS1, HMGCS1, FDFT1, SQLE, CYP51A1, SC4MOL), and efflux (↓ABCA1, ABCG1), producing a molecular profile expected to increase intracellular cholesterol. Importantly, these alterations were associated with T2D and coronary artery calcium (CAC), independent from cardiometabolic factors, including serum lipid profiles. This network mediated the associations between obesity and T2D/CAC. Several genes in the network harbored cytosine-phosphate-guanine (CpG) dinucleotides (e.g., ABCG1/cg06500161), which overlapped Encyclopedia of DNA Elements (ENCODE)-annotated regulatory regions and had methylation profiles that mediated the associations between BMI/inflammation and expression of their cognate genes. Taken together with several lines of previous experimental evidence, these data suggest that alterations of the cholesterol metabolism gene network represent a molecular link between obesity/inflammation and T2D/CAC.

Subject(s)

Cardiovascular Diseases/etiology; Cholesterol/metabolism; Diabetes Mellitus, Type 2/etiology; Obesity/complications; Aged; Aged, 80 and over; Delta-5 Fatty Acid Desaturase; Female; Gene Dosage; Gene Expression Regulation; Humans; Male; Transcriptome; Weight Loss/physiology

17.

Transcriptomic profiles of aging in purified human immune cells.

Reynolds, Lindsay M; Ding, Jingzhong; Taylor, Jackson R; Lohman, Kurt; Soranzo, Nicola; de la Fuente, Alberto; Liu, Tie Fu; Johnson, Craig; Barr, R Graham; Register, Thomas C; Donohue, Kathleen M; Talor, Monica V; Cihakova, Daniela; Gu, Charles; Divers, Jasmin; Siscovick, David; Burke, Gregory; Post, Wendy; Shea, Steven; Jacobs, David R; Hoeschele, Ina; McCall, Charles E; Kritchevsky, Stephen B; Herrington, David; Tracy, Russell P; Liu, Yongmei.

BMC Genomics ; 16: 333, 2015 Apr 22.

Article in English | MEDLINE | ID: mdl-25898983

ABSTRACT

BACKGROUND: Transcriptomic studies hold great potential towards understanding the human aging process. Previous transcriptomic studies have identified many genes with age-associated expression levels; however, small sample sizes and mixed cell types often make these results difficult to interpret. RESULTS: Using transcriptomic profiles in CD14+ monocytes from 1,264 participants of the Multi-Ethnic Study of Atherosclerosis (aged 55-94 years), we identified 2,704 genes differentially expressed with chronological age (false discovery rate, FDR ≤ 0.001). We further identified six networks of co-expressed genes that included prominent genes from three pathways: protein synthesis (particularly mitochondrial ribosomal genes), oxidative phosphorylation, and autophagy, with expression patterns suggesting these pathways decline with age. Expression of several chromatin remodeler and transcriptional modifier genes strongly correlated with expression of oxidative phosphorylation and ribosomal protein synthesis genes. 17% of genes with age-associated expression harbored CpG sites whose degree of methylation significantly mediated the relationship between age and gene expression (p < 0.05). Lastly, 15 genes with age-associated expression were also associated (FDR ≤ 0.01) with pulse pressure independent of chronological age. Comparing transcriptomic profiles of CD14+ monocytes to CD4+ T cells from a subset (n = 423) of the population, we identified 30 age-associated (FDR < 0.01) genes in common, while larger sets of differentially expressed genes were unique to either T cells (188 genes) or monocytes (383 genes). At the pathway level, a decline in ribosomal protein synthesis machinery gene expression with age was detectable in both cell types. CONCLUSIONS: An overall decline in expression of ribosomal protein synthesis genes with age was detected in CD14+ monocytes and CD4+ T cells, demonstrating that some patterns of aging are likely shared between different cell types. Our findings also support cell-specific effects of age on gene expression, illustrating the importance of using purified cell samples for future transcriptomic studies. Longitudinal work is required to establish the relationship between identified age-associated genes/pathways and aging-related diseases.

Subject(s)

Aging/genetics; Monocytes/metabolism; Transcriptome; Aged; Aged, 80 and over; Autophagy/genetics; CpG Islands/genetics; DNA Methylation/genetics; Female; Humans; Lipopolysaccharide Receptors/metabolism; Male; Middle Aged; Monocytes/cytology; Oxidative Phosphorylation; Protein Biosynthesis/genetics; Ribosomes/genetics; Ribosomes/metabolism; T-Lymphocytes/cytology; T-Lymphocytes/metabolism

18.

BioBlend.objects: metacomputing with Galaxy.

Leo, Simone; Pireddu, Luca; Cuccuru, Gianmauro; Lianas, Luca; Soranzo, Nicola; Afgan, Enis; Zanetti, Gianluigi.

Bioinformatics ; 30(19): 2816-7, 2014 Oct.

Article in English | MEDLINE | ID: mdl-24928211

ABSTRACT

SUMMARY: BioBlend.objects is a new component of the BioBlend package, adding an object-oriented interface for the Galaxy REST-based application programming interface. It improves support for metacomputing on Galaxy entities by providing higher-level functionality and allowing users to more easily create programs to explore, query and create Galaxy datasets and workflows. AVAILABILITY AND IMPLEMENTATION: BioBlend.objects is available online at https://github.com/afgane/bioblend. The new object-oriented API is implemented by the galaxy/objects subpackage.

Subject(s)

Computational Biology/methods; Algorithms; Automation; Computer Graphics; Computer Systems; Programming Languages; Software; User-Computer Interface

19.

Orione, a web-based framework for NGS analysis in microbiology.

Cuccuru, Gianmauro; Orsini, Massimiliano; Pinna, Andrea; Sbardellati, Andrea; Soranzo, Nicola; Travaglione, Antonella; Uva, Paolo; Zanetti, Gianluigi; Fotia, Giorgio.

Bioinformatics ; 30(13): 1928-9, 2014 Jul 01.

Article in English | MEDLINE | ID: mdl-24618473

ABSTRACT

End-to-end next-generation sequencing microbiology data analysis requires a diversity of tools covering bacterial resequencing, de novo assembly, scaffolding, bacterial RNA-Seq, gene annotation and metagenomics. However, the construction of computational pipelines that use different software packages is difficult owing to a lack of interoperability, reproducibility and transparency. To overcome these limitations we present Orione, a Galaxy-based framework consisting of publicly available research software and specifically designed pipelines to build complex, reproducible workflows for next-generation sequencing microbiology data analysis. Enabling microbiology researchers to conduct their own custom analysis and data manipulation without software installation or programming, Orione provides new opportunities for data-intensive computational analyses in microbiology and metagenomics. AVAILABILITY AND IMPLEMENTATION: Orione is available online at http://orione.crs4.it.

Subject(s)

Software; High-Throughput Nucleotide Sequencing; Internet; Metagenomics; Microbiological Techniques; Reproducibility of Results

20.

Decompositions of large-scale biological systems based on dynamical properties.

Soranzo, Nicola; Ramezani, Fahimeh; Iacono, Giovanni; Altafini, Claudio.

Bioinformatics ; 28(1): 76-83, 2012 Jan 01.

Article in English | MEDLINE | ID: mdl-22072388

ABSTRACT

MOTIVATION: Given a large-scale biological network represented as an influence graph, in this article we investigate possible decompositions of the network aimed at highlighting specific dynamical properties. RESULTS: The first decomposition we study consists in finding a maximal directed acyclic subgraph of the network, which dynamically corresponds to searching for a maximal open-loop subsystem of the given system. Another dynamical property investigated is strong monotonicity. We propose two methods to deal with this property, both aimed at decomposing the system into strongly monotone subsystems, but with different structural characteristics: one method tends to produce a single large strongly monotone component, while the other typically generates a set of smaller disjoint strongly monotone subsystems. AVAILABILITY: Original heuristics for the methods investigated are described in the article. CONTACT: altafini@sissa.it

Subject(s)

Computational Biology/methods; Systems Biology/methods; Artificial Intelligence; Escherichia coli/metabolism; Models, Biological; Saccharomyces cerevisiae/cytology; Saccharomyces cerevisiae/genetics; Saccharomyces cerevisiae/metabolism
