Results 1 - 20 of 27
1.
Genome Res ; 33(2): 261-268, 2023 02.
Article in English | MEDLINE | ID: mdl-36828587

ABSTRACT

There are thousands of well-maintained high-quality open-source software utilities for all aspects of scientific data analysis. For more than a decade, the Galaxy Project has been providing computational infrastructure and a unified user interface for these tools to make them accessible to a wide range of researchers. To streamline the process of integrating tools and constructing workflows as much as possible, we have developed Planemo, a software development kit for tool and workflow developers and Galaxy power users. Here we outline Planemo's implementation and describe its broad range of functionality for designing, testing, and executing Galaxy tools, workflows, and training material. In addition, we discuss the philosophy underlying Galaxy tool and workflow development, and how Planemo encourages the use of development best practices, such as test-driven development, by its users, including those who are not professional software developers.


Subjects
Computational Biology , Software , Workflow , Data Analysis
2.
PLoS Comput Biol ; 19(1): e1010752, 2023 01.
Article in English | MEDLINE | ID: mdl-36622853

ABSTRACT

There is an ongoing explosion of scientific datasets being generated, brought on by recent technological advances in many areas of the natural sciences. As a result, the life sciences have become increasingly computational in nature, and bioinformatics has taken on a central role in research studies. However, basic computational skills, data analysis, and stewardship are still rarely taught in life science educational programs, resulting in a skills gap in many of the researchers tasked with analysing these big datasets. In order to address this skills gap and empower researchers to perform their own data analyses, the Galaxy Training Network (GTN) has previously developed the Galaxy Training Platform (https://training.galaxyproject.org), an open access, community-driven framework for the collection of FAIR (Findable, Accessible, Interoperable, Reusable) training materials for data analysis utilizing the user-friendly Galaxy framework as its primary data analysis platform. Since its inception, this training platform has thrived, with the number of tutorials and contributors growing rapidly, and the range of topics extending beyond life sciences to include topics such as climatology, cheminformatics, and machine learning. While initially aimed at supporting researchers directly, the GTN framework has proven to be an invaluable resource for educators as well. We have focused our efforts in recent years on adding increased support for this growing community of instructors. New features have been added to facilitate the use of the materials in a classroom setting and to simplify the contribution flow for new materials, and a set of train-the-trainer lessons has been added. Here, we present the latest developments in the GTN project, aimed at facilitating the use of the Galaxy Training materials by educators, and its usage in different learning environments.


Subjects
Computational Biology , Software , Humans , Computational Biology/methods , Data Analysis , Research Personnel
3.
Nucleic Acids Res ; 46(W1): W537-W544, 2018 07 02.
Article in English | MEDLINE | ID: mdl-29790989

ABSTRACT

Galaxy (homepage: https://galaxyproject.org, main public server: https://usegalaxy.org) is a web-based scientific analysis platform used by tens of thousands of scientists across the world to analyze large biomedical datasets such as those found in genomics, proteomics, metabolomics and imaging. Started in 2005, Galaxy continues to focus on three key challenges of data-driven biomedical science: making analyses accessible to all researchers, ensuring analyses are completely reproducible, and making it simple to communicate analyses so that they can be reused and extended. During the last two years, the Galaxy team and the open-source community around Galaxy have made substantial improvements to Galaxy's core framework, user interface, tools, and training materials. Framework and user interface improvements now enable Galaxy to be used for analyzing tens of thousands of datasets, and >5500 tools are now available from the Galaxy ToolShed. The Galaxy community has led an effort to create numerous high-quality tutorials focused on common types of genomic analyses. The Galaxy developer and user communities continue to grow and be integral to Galaxy's development. The number of Galaxy public servers, developers contributing to the Galaxy framework and its tools, and users of the main Galaxy server have all increased substantially.


Subjects
Genomics/statistics & numerical data , Metabolomics/statistics & numerical data , Molecular Imaging/statistics & numerical data , Proteomics/statistics & numerical data , User-Computer Interface , Datasets as Topic , Humans , Information Dissemination , International Cooperation , Internet , Reproducibility of Results
4.
Nucleic Acids Res ; 44(W1): W3-W10, 2016 07 08.
Article in English | MEDLINE | ID: mdl-27137889

ABSTRACT

High-throughput data production technologies, particularly 'next-generation' DNA sequencing, have ushered in widespread and disruptive changes to biomedical research. Making sense of the large datasets produced by these technologies requires sophisticated statistical and computational methods, as well as substantial computational power. This has led to an acute crisis in life sciences, as researchers without informatics training attempt to perform computation-dependent analyses. Since 2005, the Galaxy project has worked to address this problem by providing a framework that makes advanced computational tools usable by non-experts. Galaxy seeks to make data-intensive research more accessible, transparent and reproducible by providing a Web-based environment in which users can perform computational analyses and have all of the details automatically tracked for later inspection, publication, or reuse. In this report we highlight recently added features enabling biomedical analyses on a large scale.


Subjects
Computational Biology/statistics & numerical data , Datasets as Topic/statistics & numerical data , User-Computer Interface , Biomedical Research , Computational Biology/methods , Databases, Genetic , Humans , Internet , Reproducibility of Results
5.
BMC Genomics ; 16: 333, 2015 Apr 22.
Article in English | MEDLINE | ID: mdl-25898983

ABSTRACT

BACKGROUND: Transcriptomic studies hold great potential towards understanding the human aging process. Previous transcriptomic studies have identified many genes with age-associated expression levels; however, small sample sizes and mixed cell types often make these results difficult to interpret. RESULTS: Using transcriptomic profiles in CD14+ monocytes from 1,264 participants of the Multi-Ethnic Study of Atherosclerosis (aged 55-94 years), we identified 2,704 genes differentially expressed with chronological age (false discovery rate, FDR ≤ 0.001). We further identified six networks of co-expressed genes that included prominent genes from three pathways: protein synthesis (particularly mitochondrial ribosomal genes), oxidative phosphorylation, and autophagy, with expression patterns suggesting these pathways decline with age. Expression of several chromatin remodeler and transcriptional modifier genes strongly correlated with expression of oxidative phosphorylation and ribosomal protein synthesis genes. 17% of genes with age-associated expression harbored CpG sites whose degree of methylation significantly mediated the relationship between age and gene expression (p < 0.05). Lastly, 15 genes with age-associated expression were also associated (FDR ≤ 0.01) with pulse pressure independent of chronological age. Comparing transcriptomic profiles of CD14+ monocytes to CD4+ T cells from a subset (n = 423) of the population, we identified 30 age-associated (FDR < 0.01) genes in common, while larger sets of differentially expressed genes were unique to either T cells (188 genes) or monocytes (383 genes). At the pathway level, a decline in ribosomal protein synthesis machinery gene expression with age was detectable in both cell types. CONCLUSIONS: An overall decline in expression of ribosomal protein synthesis genes with age was detected in CD14+ monocytes and CD4+ T cells, demonstrating that some patterns of aging are likely shared between different cell types. Our findings also support cell-specific effects of age on gene expression, illustrating the importance of using purified cell samples for future transcriptomic studies. Longitudinal work is required to establish the relationship between identified age-associated genes/pathways and aging-related diseases.


Subjects
Aging/genetics , Monocytes/metabolism , Transcriptome , Aged , Aged, 80 and over , Autophagy/genetics , CpG Islands/genetics , DNA Methylation/genetics , Female , Humans , Lipopolysaccharide Receptors/metabolism , Male , Middle Aged , Monocytes/cytology , Oxidative Phosphorylation , Protein Biosynthesis/genetics , Ribosomes/genetics , Ribosomes/metabolism , T-Lymphocytes/cytology , T-Lymphocytes/metabolism
6.
Bioinformatics ; 30(13): 1928-9, 2014 Jul 01.
Article in English | MEDLINE | ID: mdl-24618473

ABSTRACT

End-to-end next-generation sequencing microbiology data analysis requires a diversity of tools covering bacterial resequencing, de novo assembly, scaffolding, bacterial RNA-Seq, gene annotation and metagenomics. However, the construction of computational pipelines that use different software packages is difficult owing to a lack of interoperability, reproducibility and transparency. To overcome these limitations we present Orione, a Galaxy-based framework consisting of publicly available research software and specifically designed pipelines to build complex, reproducible workflows for next-generation sequencing microbiology data analysis. Enabling microbiology researchers to conduct their own custom analysis and data manipulation without software installation or programming, Orione provides new opportunities for data-intensive computational analyses in microbiology and metagenomics. AVAILABILITY AND IMPLEMENTATION: Orione is available online at http://orione.crs4.it.


Subjects
Software , High-Throughput Nucleotide Sequencing , Internet , Metagenomics , Microbiological Techniques , Reproducibility of Results
7.
Bioinformatics ; 30(19): 2816-7, 2014 Oct.
Article in English | MEDLINE | ID: mdl-24928211

ABSTRACT

SUMMARY: BioBlend.objects is a new component of the BioBlend package, adding an object-oriented interface for the Galaxy REST-based application programming interface. It improves support for metacomputing on Galaxy entities by providing higher-level functionality and allowing users to more easily create programs to explore, query and create Galaxy datasets and workflows. AVAILABILITY AND IMPLEMENTATION: BioBlend.objects is available online at https://github.com/afgane/bioblend. The new object-oriented API is implemented by the galaxy/objects subpackage.


Subjects
Computational Biology/methods , Algorithms , Automation , Computer Graphics , Computer Systems , Programming Languages , Software , User-Computer Interface
8.
Bioinformatics ; 28(1): 76-83, 2012 Jan 01.
Article in English | MEDLINE | ID: mdl-22072388

ABSTRACT

MOTIVATION: Given a large-scale biological network represented as an influence graph, in this article we investigate possible decompositions of the network aimed at highlighting specific dynamical properties. RESULTS: The first decomposition we study consists in finding a maximal directed acyclic subgraph of the network, which dynamically corresponds to searching for a maximal open-loop subsystem of the given system. Another dynamical property investigated is strong monotonicity. We propose two methods to deal with this property, both aimed at decomposing the system into strongly monotone subsystems, but with different structural characteristics: one method tends to produce a single large strongly monotone component, while the other typically generates a set of smaller disjoint strongly monotone subsystems. AVAILABILITY: Original heuristics for the methods investigated are described in the article. CONTACT: altafini@sissa.it


Subjects
Computational Biology/methods , Systems Biology/methods , Artificial Intelligence , Escherichia coli/metabolism , Models, Biological , Saccharomyces cerevisiae/cytology , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism
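
Note on entry 8: the first decomposition named in the abstract above, extracting an acyclic subgraph from a directed influence graph, can be illustrated with a small greedy sketch. This is not the heuristic from the article, which the abstract does not describe. It is a generic edge-by-edge construction, and the function names `reaches` and `maximal_acyclic_subgraph` as well as the toy edge list are assumptions made purely for illustration. The result is maximal only in the inclusion sense (no rejected edge can be added back without closing a cycle); it is not guaranteed to keep the largest possible number of edges.

```python
def reaches(adj, src, dst):
    """Return True if dst can be reached from src along kept edges (iterative DFS)."""
    stack, seen = [src], {src}
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def maximal_acyclic_subgraph(edges):
    """Greedily keep each directed edge (u, v) unless it would close a cycle.

    An edge u -> v closes a cycle exactly when v already reaches u
    through the edges kept so far (self-loops are always rejected).
    """
    adj = {}   # adjacency lists of the kept edges only
    kept = []
    for u, v in edges:
        if u == v or reaches(adj, v, u):
            continue  # adding this edge would create a cycle
        adj.setdefault(u, []).append(v)
        kept.append((u, v))
    return kept


if __name__ == "__main__":
    # Hypothetical influence graph with one feedback loop a -> b -> c -> a.
    toy_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
    print(maximal_acyclic_subgraph(toy_edges))
```

The outcome depends on the order in which edges are visited, which is one reason dedicated heuristics are needed for large networks.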
9.
Bioinformatics ; 27(17): 2459-62, 2011 Sep 01.
Article in English | MEDLINE | ID: mdl-21737438

ABSTRACT

SUMMARY: SysGenSIM is a software package to simulate Systems Genetics (SG) experiments in model organisms, for the purpose of evaluating and comparing statistical and computational methods and their implementations for analyses of SG data [e.g. methods for expression quantitative trait loci (eQTL) mapping and network inference]. SysGenSIM allows the user to select a variety of network topologies, genetic and kinetic parameters to simulate SG data (genotyping, gene expression and phenotyping) with large gene networks with thousands of nodes. The software is encoded in MATLAB, and a user-friendly graphical user interface is provided. AVAILABILITY: The open-source software code and user manual can be downloaded at: http://sysgensim.sourceforge.net/ CONTACT: alf@crs4.it.


Subjects
Gene Regulatory Networks , Genotype , Software , Computer Simulation , Gene Expression , Phenotype
10.
Microbiome ; 10(1): 176, 2022 10 19.
Article in English | MEDLINE | ID: mdl-36258257

ABSTRACT

BACKGROUND: Amplicon sequencing is an established and cost-efficient method for profiling microbiomes. However, many available tools to process this data require both bioinformatics skills and high computational power to process big datasets. Furthermore, there are only a few tools that allow for long read amplicon data analysis. To bridge this gap, we developed the LotuS2 (less OTU scripts 2) pipeline, enabling user-friendly, resource friendly, and versatile analysis of raw amplicon sequences. RESULTS: In LotuS2, six different sequence clustering algorithms as well as extensive pre- and post-processing options allow for flexible data analysis by both experts, where parameters can be fully adjusted, and novices, where defaults are provided for different scenarios. We benchmarked three independent gut and soil datasets, where LotuS2 was on average 29 times faster compared to other pipelines, yet could better reproduce the alpha- and beta-diversity of technical replicate samples. Further benchmarking a mock community with known taxon composition showed that, compared to the other pipelines, LotuS2 recovered a higher fraction of correctly identified taxa and a higher fraction of reads assigned to true taxa (48% and 57% at species; 83% and 98% at genus level, respectively). At ASV/OTU level, precision and F-score were highest for LotuS2, as was the fraction of correctly reported 16S sequences. CONCLUSION: LotuS2 is a lightweight and user-friendly pipeline that is fast, precise, and streamlined, using extensive pre- and post-ASV/OTU clustering steps to further increase data quality. High data usage rates and reliability enable high-throughput microbiome analysis in minutes. AVAILABILITY: LotuS2 is available from GitHub, conda, or via a Galaxy web interface, documented at http://lotus2.earlham.ac.uk/.


Subjects
Software , Soil , RNA, Ribosomal, 16S , Reproducibility of Results , Sequence Analysis , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods
11.
Bioinform Adv ; 2(1): vbac030, 2022.
Article in English | MEDLINE | ID: mdl-35669346

ABSTRACT

Summary: Properly and effectively managing reference datasets is an important task for many bioinformatics analyses. Refgenie is a reference asset management system that allows users to easily organize, retrieve and share such datasets. Here, we describe the integration of refgenie into the Galaxy platform. Server administrators are able to configure Galaxy to make use of reference datasets made available on a refgenie instance. In addition, a Galaxy Data Manager tool has been developed to provide a graphical interface to refgenie's remote reference retrieval functionality. A large collection of reference datasets has also been made available using the CVMFS (CernVM File System) repository from GalaxyProject.org, with mirrors across the USA, Canada, Europe and Australia, enabling easy use outside of Galaxy. Availability and implementation: The ability of Galaxy to use refgenie assets was added to the core Galaxy framework in version 22.01, which is available from https://github.com/galaxyproject/galaxy under the Academic Free License version 3.0. The refgenie Data Manager tool can be installed via the Galaxy ToolShed, with source code managed at https://github.com/BlankenbergLab/galaxy-tools-blankenberg/tree/main/data_managers/data_manager_refgenie_pull and released using an MIT license. Access to existing data is also available through CVMFS, with instructions at https://galaxyproject.org/admin/reference-data-repo/. No new data were generated or analyzed in support of this research.

12.
Methods Mol Biol ; 2284: 367-392, 2021.
Article in English | MEDLINE | ID: mdl-33835453

ABSTRACT

A complete RNA-Seq analysis involves the use of several different tools, with substantial software and computational requirements. The Galaxy platform simplifies the execution of such bioinformatics analyses by embedding the needed tools in its web interface, while also providing reproducibility. Here, we describe how to perform a reference-based RNA-Seq analysis using Galaxy, from data upload to visualization and functional enrichment analysis of differentially expressed genes.


Subjects
RNA-Seq/methods , Software , Animals , Computational Biology/methods , Data Analysis , Datasets as Topic/statistics & numerical data , Gene Expression Profiling/methods , Gene Expression Profiling/statistics & numerical data , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/statistics & numerical data , Humans , Reproducibility of Results , Sequence Analysis, RNA/methods , Sequence Analysis, RNA/statistics & numerical data , Exome Sequencing/methods , Exome Sequencing/statistics & numerical data
13.
Bioinformatics ; 25(21): 2853-4, 2009 Nov 01.
Article in English | MEDLINE | ID: mdl-19713415

ABSTRACT

SUMMARY: ERNEST Reaction Network Equilibria Study Toolbox is a MATLAB package which, by checking various different criteria on the structure of a chemical reaction network, can exclude the multistationarity of the corresponding reaction system. The results obtained are independent of the rate constants of the reactions, and can be used for model discrimination. AVAILABILITY AND IMPLEMENTATION: The software, implemented in MATLAB, is available under the GNU GPL free software license from http://people.sissa.it/~altafini/papers/SoAl09/. It requires the MATLAB Optimization Toolbox. CONTACT: altafini@sissa.it.


Subjects
Computational Biology/methods , Models, Chemical , Software , Stochastic Processes
14.
Gigascience ; 9(10)2020 10 20.
Article in English | MEDLINE | ID: mdl-33079170

ABSTRACT

BACKGROUND: The vast ecosystem of single-cell RNA-sequencing tools has until recently been plagued by an excess of diverging analysis strategies, inconsistent file formats, and compatibility issues between different software suites. The uptake of 10x Genomics datasets has begun to calm this diversity, and the bioinformatics community leans once more towards the large computing requirements and the statistically driven methods needed to process and understand these ever-growing datasets. RESULTS: Here we outline several Galaxy workflows and learning resources for single-cell RNA-sequencing, with the aim of providing a comprehensive analysis environment paired with a thorough user learning experience that bridges the knowledge gap between the computational methods and the underlying cell biology. The Galaxy reproducible bioinformatics framework provides tools, workflows, and trainings that not only enable users to perform 1-click 10x preprocessing but also empower them to demultiplex raw sequencing from custom tagged and full-length sequencing protocols. The downstream analysis supports a range of high-quality interoperable suites separated into common stages of analysis: inspection, filtering, normalization, confounder removal, and clustering. The teaching resources cover concepts from computer science to cell biology. Access to all resources is provided at the singlecell.usegalaxy.eu portal. CONCLUSIONS: The reproducible and training-oriented Galaxy framework provides a sustainable high-performance computing environment for users to run flexible analyses on both 10x and alternative platforms. The tutorials from the Galaxy Training Network along with the frequent training workshops hosted by the Galaxy community provide a means for users to learn, publish, and teach single-cell RNA-sequencing analysis.


Subjects
Ecosystem , Software , Computational Biology , RNA , Sequence Analysis, RNA
15.
Bioinformatics ; 24(13): 1510-5, 2008 Jul 01.
Article in English | MEDLINE | ID: mdl-18467346

ABSTRACT

BACKGROUND: In recent years, devising methods for discovering gene regulatory mechanisms at a genome-wide level has become a fundamental topic in the field of systems biology. The aim is to infer gene-gene interactions in an increasingly sophisticated and reliable way through the continuous improvement of reverse engineering algorithms exploiting microarray data. MOTIVATION: This work is inspired by the several studies suggesting that coexpression is mostly related to 'static' stable binding relationships, like belonging to the same protein complex, rather than other types of interactions more of a 'causal' and transient nature (e.g. transcription factor-binding site interactions). The aim of this work is to verify if direct or conditional network inference algorithms (e.g. Pearson correlation for the former, partial Pearson correlation for the latter) are indeed useful in discerning static from causal dependencies in artificial and real gene networks (derived from Escherichia coli and Saccharomyces cerevisiae). RESULTS: Even in the regime of weak inference power we have to work in, our analysis confirms the differences in the performances of the algorithms: direct methods are more robust in detecting stable interactions, conditional ones are better for causal interactions especially in the presence of combinatorial transcriptional regulation. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Chromosome Mapping/methods , Genetic Engineering/methods , Models, Genetic , Protein Interaction Mapping/methods , Proteome/genetics , Signal Transduction/genetics , Computer Simulation
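
Note on entry 15: the abstract above contrasts a direct measure (Pearson correlation) with a conditional one (partial Pearson correlation). A minimal sketch of the difference for three variables follows, using the standard first-order formula r_xy.z = (r_xy - r_xz r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)). The function names and the toy expression profiles are assumptions for illustration only; the article's own implementation and data are not shown in the abstract.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_pearson(x, y, z):
    """First-order partial correlation of x and y, conditioning on z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def demo():
    """Hypothetical profiles: regulator z drives both x and y, which do not interact."""
    z = [1, 2, 3, 4, 5, 6, 7, 8]
    noise_x = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
    noise_y = [0.5, 0.5, -0.5, -0.5, 0.5, 0.5, -0.5, -0.5]
    x = [a + e for a, e in zip(z, noise_x)]
    y = [a + e for a, e in zip(z, noise_y)]
    return pearson(x, y), partial_pearson(x, y, z)


if __name__ == "__main__":
    direct, conditional = demo()
    print(f"direct r_xy = {direct:.3f}, partial r_xy.z = {conditional:.3f}")
```

In this toy case the direct correlation between x and y is high because both follow z, while the partial correlation is close to zero once z is conditioned on, which is the kind of distinction a conditional method is meant to expose.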
16.
Gigascience ; 8(12)2019 12 01.
Article in English | MEDLINE | ID: mdl-31825480

ABSTRACT

BACKGROUND: It is not a trivial step to move from single-cell RNA-sequencing (scRNA-seq) data production to data analysis. There is a lack of intuitive training materials and easy-to-use analysis tools, and researchers can find it difficult to master the basics of scRNA-seq quality control and the later analysis. RESULTS: We have developed a range of practical scripts, together with their corresponding Galaxy wrappers, that make scRNA-seq training and quality control accessible to researchers previously daunted by the prospect of scRNA-seq analysis. We implement a "visualize-filter-visualize" paradigm through simple command line tools that use the Loom format to exchange data between the tools. The point-and-click nature of Galaxy makes it easy to assess, visualize, and filter scRNA-seq data from short-read sequencing data. CONCLUSION: We have developed a suite of scRNA-seq tools that can be used for both training and more in-depth analyses.


Subjects
Computational Biology/education , Sequence Analysis, RNA/standards , Single-Cell Analysis/standards , Data Analysis , High-Throughput Nucleotide Sequencing , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Software , User-Computer Interface
17.
Bioinformatics ; 23(13): 1640-7, 2007 Jul 01.
Article in English | MEDLINE | ID: mdl-17485431

ABSTRACT

MOTIVATION: Inferring a gene regulatory network exclusively from microarray expression profiles is a difficult but important task. The aim of this work is to compare the predictive power of some of the most popular algorithms in different conditions (like data taken at equilibrium or time courses) and on both synthetic and real microarray data. We are in particular interested in comparing similarity measures both of linear type (like correlations and partial correlations) and of non-linear type (mutual information and conditional mutual information), and in investigating the underdetermined case (fewer samples than genes). RESULTS: In our simulations we see that all network inference algorithms obtain better performances from data produced with 'structural' perturbations, like gene knockouts at steady state, than with any dynamical perturbation. The predictive power of all algorithms is confirmed on a reverse engineering problem from Escherichia coli gene profiling data: the edges of the 'physical' network of transcription factor-binding sites are significantly overrepresented among the highest weighting edges of the graph that we infer directly from the data without any structure supervision. Comparing synthetic and in vivo data on the same network graph allows us to give an indication of how much more complex a real transcriptional regulation program is with respect to an artificial model. AVAILABILITY: Software is freely available at the URL http://people.sissa.it/~altafini/papers/SoBiAl07/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Gene Expression Regulation/physiology , Models, Biological , Oligonucleotide Array Sequence Analysis/methods , Proteome/metabolism , Signal Transduction/physiology , Biomedical Engineering/methods , Computer Simulation , Databases, Protein , Gene Expression Profiling
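
Note on entry 17: among the similarity measures listed in the abstract above, mutual information is the non-linear counterpart of correlation. The sketch below computes the plug-in estimate I(X;Y) = sum over (x, y) of p(x,y) log(p(x,y) / (p(x) p(y))) for already-discretized profiles. How the article discretizes or estimates densities for continuous microarray values is not stated in the abstract, so the discretized inputs, the function name `mutual_information`, and the toy profiles are all assumptions for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in mutual information (in nats) of two equal-length discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) / (p(x) p(y)) simplifies to c * n / (count_x * count_y)
        mi += (c / n) * math.log(c * n / (count_x[x] * count_y[y]))
    return mi


if __name__ == "__main__":
    # Hypothetical binarized expression profiles (0 = low, 1 = high).
    gene_a = [0, 0, 1, 1]
    gene_b = [0, 1, 0, 1]
    print(mutual_information(gene_a, gene_a))  # identical profiles
    print(mutual_information(gene_a, gene_b))  # unrelated profiles
```

Identical balanced binary profiles give log 2 nats, and profiles whose joint distribution factorizes give zero; with few samples the plug-in estimate is biased, which matters in the underdetermined case the abstract mentions.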
18.
Gigascience ; 7(11)2018 11 01.
Article in English | MEDLINE | ID: mdl-30395211

ABSTRACT

Background: Phylogenetic information inferred from the study of homologous genes helps us to understand the evolution of genes and gene families, including the identification of ancestral gene duplication events as well as regions under positive or purifying selection within lineages. Gene family and orthogroup characterization enables the identification of syntenic blocks, which can then be visualized with various tools. Unfortunately, currently available tools display only an overview of syntenic regions as a whole, limited to the gene level, and none provide further details about structural changes within genes, such as the conservation of ancestral exon boundaries amongst multiple genomes. Findings: We present Aequatus, an open-source web-based tool that provides an in-depth view of gene structure across gene families, with various options to render and filter visualizations. It relies on precalculated alignment and gene feature information typically held in, but not limited to, the Ensembl Compara and Core databases. We also offer Aequatus.js, a reusable JavaScript module that fulfills the visualization aspects of Aequatus, available within the Galaxy web platform as a visualization plug-in, which can be used to visualize gene trees generated by the GeneSeqToFamily workflow.


Subjects
Computational Biology/methods , Genome/genetics , Genomics/methods , Software , Information Storage and Retrieval/methods , Internet , Phylogeny , Proteins/classification , Proteins/genetics , Reproducibility of Results , Sequence Alignment/methods
19.
Gigascience ; 7(3): 1-10, 2018 03 01.
Article in English | MEDLINE | ID: mdl-29425291

ABSTRACT

Background: Gene duplication is a major factor contributing to evolutionary novelty, and the contraction or expansion of gene families has often been associated with morphological, physiological, and environmental adaptations. The study of homologous genes helps us to understand the evolution of gene families. It plays a vital role in finding ancestral gene duplication events as well as identifying genes that have diverged from a common ancestor under positive selection. There are various tools available, such as MSOAR, OrthoMCL, and HomoloGene, to identify gene families and visualize syntenic information between species, providing an overview of syntenic regions evolution at the family level. Unfortunately, none of them provide information about structural changes within genes, such as the conservation of ancestral exon boundaries among multiple genomes. The Ensembl GeneTrees computational pipeline generates gene trees based on coding sequences, provides details about exon conservation, and is used in the Ensembl Compara project to discover gene families. Findings: A certain amount of expertise is required to configure and run the Ensembl Compara GeneTrees pipeline via command line. Therefore, we converted this pipeline into a Galaxy workflow, called GeneSeqToFamily, and provided additional functionality. This workflow uses existing tools from the Galaxy ToolShed, as well as providing additional wrappers and tools that are required to run the workflow. Conclusions: GeneSeqToFamily represents the Ensembl GeneTrees pipeline as a set of interconnected Galaxy tools, so they can be run interactively within the Galaxy's user-friendly workflow environment while still providing the flexibility to tailor the analysis by changing configurations and tools if necessary. Additional tools allow users to subsequently visualize the gene families produced by the workflow, using the Aequatus.js interactive tool, which has been developed as part of the Aequatus software project.


Subjects
Computational Biology , Genome/genetics , Phylogeny , Software , Algorithms , User-Computer Interface , Workflow
20.
Cell Syst ; 6(6): 631-635, 2018 06 27.
Article in English | MEDLINE | ID: mdl-29953862

ABSTRACT

Many areas of research suffer from poor reproducibility, particularly in computationally intensive domains where results rely on a series of complex methodological decisions that are not well captured by traditional publication approaches. Various guidelines have emerged for achieving reproducibility, but implementation of these practices remains difficult due to the challenge of assembling software tools plus associated libraries, connecting tools together into pipelines, and specifying parameters. Here, we discuss a suite of cutting-edge technologies that make computational reproducibility not just possible, but practical in both time and effort. This suite combines three well-tested components (a system for building highly portable packages of bioinformatics software, containerization and virtualization technologies for isolating reusable execution environments for these packages, and workflow systems that automatically orchestrate the composition of these packages for entire pipelines) to achieve an unprecedented level of computational reproducibility. We also provide a practical implementation and five recommendations to help set a typical researcher on the path to performing data analyses reproducibly.


Subjects
Computational Biology/methods , Reproducibility of Results , Biological Science Disciplines , Humans , Research Personnel , Software , Technology , User-Computer Interface , Workflow