Results 1 - 20 of 24
2.
bioRxiv ; 2023 Jun 30.
Article in English | MEDLINE | ID: mdl-37425881

ABSTRACT

Improvements in genome sequencing and assembly are enabling high-quality reference genomes for all species. However, the assembly process is still laborious, computationally and technically demanding, lacks standards for reproducibility, and is not readily scalable. Here we present the latest Vertebrate Genomes Project assembly pipeline and demonstrate that it delivers high-quality reference genomes at scale across a set of vertebrate species arising over the last ~500 million years. The pipeline is versatile and combines PacBio HiFi long reads and Hi-C-based haplotype phasing in a new graph-based paradigm. Standardized quality control is performed automatically to troubleshoot assembly issues and assess biological complexities. We make the pipeline freely accessible through Galaxy, accommodating researchers even without local computational resources and enhancing reproducibility by democratizing the training and assembly process. We demonstrate the flexibility and reliability of the pipeline by assembling reference genomes for 51 vertebrate species from major taxonomic groups (fish, amphibians, reptiles, birds, and mammals).

3.
BMC Bioinformatics ; 24(1): 263, 2023 Jun 23.
Article in English | MEDLINE | ID: mdl-37353753

ABSTRACT

BACKGROUND: Protein-protein interactions play a crucial role in almost all cellular processes. Identifying interacting proteins reveals insight into living organisms and yields novel drug targets for disease treatment. Here, we present a publicly available, automated pipeline to predict genome-wide protein-protein interactions and produce high-quality multimeric structural models. RESULTS: Application of our method to the Human and Yeast genomes yields protein-protein interaction networks similar in quality to common experimental methods. We identified and modeled Human proteins likely to interact with the papain-like protease of SARS-CoV-2's non-structural protein 3. We also produced models of SARS-CoV-2's spike protein (S) interacting with myelin-oligodendrocyte glycoprotein receptor and dipeptidyl peptidase-4. CONCLUSIONS: The presented method is capable of confidently identifying interactions while providing high-quality multimeric structural models for experimental validation. The interactome modeling pipeline is available at usegalaxy.org and usegalaxy.eu.


Subjects
COVID-19 , Protein Interaction Mapping , Humans , Viral RNA/metabolism , SARS-CoV-2 , Saccharomyces cerevisiae/metabolism
4.
Genome Res ; 33(2): 261-268, 2023 02.
Article in English | MEDLINE | ID: mdl-36828587

ABSTRACT

There are thousands of well-maintained high-quality open-source software utilities for all aspects of scientific data analysis. For more than a decade, the Galaxy Project has been providing computational infrastructure and a unified user interface for these tools to make them accessible to a wide range of researchers. To streamline the process of integrating tools and constructing workflows as much as possible, we have developed Planemo, a software development kit for tool and workflow developers and Galaxy power users. Here we outline Planemo's implementation and describe its broad range of functionality for designing, testing, and executing Galaxy tools, workflows, and training material. In addition, we discuss the philosophy underlying Galaxy tool and workflow development, and how Planemo encourages the use of development best practices, such as test-driven development, by its users, including those who are not professional software developers.


Subjects
Computational Biology , Software , Workflow , Data Analysis
5.
Methods Mol Biol ; 2607: 311-327, 2023.
Article in English | MEDLINE | ID: mdl-36449168

ABSTRACT

The extent of transposable element (TE) mobilization in different somatic tissues and throughout diverse species is not well understood. Somatic transposition is often challenging to study as it generates de novo TE insertions that represent rare genetic variants present in heterogeneous tissues. Here, we describe experimental approaches that can be applied to address TE mobility in somatic tissues with the use of short- and long-read whole-genome DNA sequencing. Focusing on the analysis of the Drosophila melanogaster intestinal and head tissues, we provide instructions on how to design, perform, and validate experiments that aim at detecting somatic transposition. In addition to providing examples of protocols, this chapter intends to deliver general experimental guidelines that may be adapted to other fly tissues or to other species.


Subjects
Drosophila melanogaster , Drosophila , Animals , Drosophila melanogaster/genetics , Whole Genome Sequencing , DNA Transposable Elements/genetics
6.
Nat Commun ; 13(1): 3695, 2022 06 27.
Article in English | MEDLINE | ID: mdl-35760813

ABSTRACT

Millions of transcriptomic profiles have been deposited in public archives, yet remain underused for the interpretation of new experiments. We present a method for interpreting new transcriptomic datasets through instant comparison to public datasets without high-performance computing requirements. We apply Principal Component Analysis on 536 studies comprising 44,890 human RNA sequencing profiles and aggregate sufficiently similar loading vectors to form Replicable Axes of Variation (RAV). RAVs are annotated with metadata of originating studies and by gene set enrichment analysis. Functionality to associate new datasets with RAVs, extract interpretable annotations, and provide intuitive visualization is implemented as the GenomicSuperSignature R/Bioconductor package. We demonstrate the efficient and coherent database search, robustness to batch effects and heterogeneous training data, and transfer learning capacity of our method using TCGA and rare disease datasets. GenomicSuperSignature aids in analyzing new gene expression data in the context of existing databases using minimal computing resources.


Subjects
Genetic Databases , Software , Humans , RNA-Seq , Transcriptome/genetics
7.
Bioinform Adv ; 2(1): vbac030, 2022.
Article in English | MEDLINE | ID: mdl-35669346

ABSTRACT

Summary: Properly and effectively managing reference datasets is an important task for many bioinformatics analyses. Refgenie is a reference asset management system that allows users to easily organize, retrieve and share such datasets. Here, we describe the integration of refgenie into the Galaxy platform. Server administrators are able to configure Galaxy to make use of reference datasets made available on a refgenie instance. In addition, a Galaxy Data Manager tool has been developed to provide a graphical interface to refgenie's remote reference retrieval functionality. A large collection of reference datasets has also been made available using the CVMFS (CernVM File System) repository from GalaxyProject.org, with mirrors across the USA, Canada, Europe and Australia, enabling easy use outside of Galaxy. Availability and implementation: The ability of Galaxy to use refgenie assets was added to the core Galaxy framework in version 22.01, which is available from https://github.com/galaxyproject/galaxy under the Academic Free License version 3.0. The refgenie Data Manager tool can be installed via the Galaxy ToolShed, with source code managed at https://github.com/BlankenbergLab/galaxy-tools-blankenberg/tree/main/data_managers/data_manager_refgenie_pull and released using an MIT license. Access to existing data is also available through CVMFS, with instructions at https://galaxyproject.org/admin/reference-data-repo/. No new data were generated or analyzed in support of this research.

9.
Genome Res ; 31(8): 1419-1432, 2021 08.
Article in English | MEDLINE | ID: mdl-34168010

ABSTRACT

Spontaneous mutations can alter tissue dynamics and lead to cancer initiation. Although large-scale sequencing projects have illuminated processes that influence somatic mutation and subsequent tumor evolution, the mutational dynamics operating in the very early stages of cancer development are currently not well understood. To explore mutational processes in the early stages of cancer evolution, we exploited neoplasia arising spontaneously in the Drosophila intestine. Analysing whole-genome sequencing data with a dedicated bioinformatic pipeline, we found neoplasia formation to be driven largely through the inactivation of Notch by structural variants, many of which involve highly complex genomic rearrangements. The genome-wide mutational burden in neoplasia was found to be similar to that of several human cancers. Finally, we identified genomic features associated with spontaneous mutation, and defined the evolutionary dynamics and mutational landscape operating within intestinal neoplasia over the short lifespan of the adult fly. Our findings provide unique insight into mutational dynamics operating over a short timescale in the genetic model system, Drosophila melanogaster.


Subjects
Drosophila melanogaster , Drosophila , Animals , Drosophila/genetics , Drosophila melanogaster/genetics , Genomics , Intestines , Mutation , Stem Cells
10.
bioRxiv ; 2021 Mar 25.
Article in English | MEDLINE | ID: mdl-33791701

ABSTRACT

The COVID-19 pandemic is the first global health crisis to occur in the age of big genomic data. Although data generation capacity is well established and sufficiently standardized, analytical capacity is not. To establish analytical capacity it is necessary to pull together global computational resources and deliver the best open source tools and analysis workflows within a ready to use, universally accessible resource. Such a resource should not be controlled by a single research group, institution, or country. Instead it should be maintained by a community of users and developers who ensure that the system remains operational and populated with current tools. A community is also essential for facilitating the types of discourse needed to establish best analytical practices. Bringing together public computational research infrastructure from the USA, Europe, and Australia, we developed a distributed data analysis platform that accomplishes these goals. It is immediately accessible to anyone in the world and is designed for the analysis of rapidly growing collections of deep sequencing datasets. We demonstrate its utility by detecting allelic variants in high-quality existing SARS-CoV-2 sequencing datasets and by continuous reanalysis of COG-UK data. All workflows, data, and documentation are available at https://covid19.galaxyproject.org.

11.
Methods Mol Biol ; 2284: 367-392, 2021.
Article in English | MEDLINE | ID: mdl-33835453

ABSTRACT

A complete RNA-Seq analysis involves the use of several different tools, with substantial software and computational requirements. The Galaxy platform simplifies the execution of such bioinformatics analyses by embedding the needed tools in its web interface, while also providing reproducibility. Here, we describe how to perform a reference-based RNA-Seq analysis using Galaxy, from data upload to visualization and functional enrichment analysis of differentially expressed genes.


Subjects
RNA-Seq/methods , Software , Animals , Computational Biology/methods , Data Analysis , Datasets as Topic/statistics & numerical data , Gene Expression Profiling/methods , Gene Expression Profiling/statistics & numerical data , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/statistics & numerical data , Humans , Reproducibility of Results , RNA Sequence Analysis/methods , RNA Sequence Analysis/statistics & numerical data , Exome Sequencing/methods , Exome Sequencing/statistics & numerical data
12.
EMBO J ; 40(9): e106388, 2021 05 03.
Article in English | MEDLINE | ID: mdl-33634906

ABSTRACT

Transposable elements (TEs) play a significant role in evolution, contributing to genetic variation. However, TE mobilization in somatic cells is not well understood. Here, we address the prevalence of transposition in a somatic tissue, exploiting the Drosophila midgut as a model. Using whole-genome sequencing of in vivo clonally expanded gut tissue, we have mapped hundreds of high-confidence somatic TE integration sites genome-wide. We show that somatic retrotransposon insertions are associated with inactivation of the tumor suppressor Notch, likely contributing to neoplasia formation. Moreover, applying Oxford Nanopore long-read sequencing technology we provide evidence for tissue-specific differences in retrotransposition. Comparing somatic TE insertional activity with transcriptomic and small RNA sequencing data, we demonstrate that transposon mobility cannot be simply predicted by whole tissue TE expression levels or by small RNA pathway activity. Finally, we reveal that somatic TE insertions in the adult fly intestine are enriched in genic regions and in transcriptionally active chromatin. Together, our findings provide clear evidence of ongoing somatic transposition in Drosophila and delineate previously unknown features underlying somatic TE mobility in vivo.


Subjects
DNA Transposable Elements , Drosophila Proteins/genetics , Drosophila melanogaster/genetics , Intestinal Neoplasms/genetics , Notch Receptors/genetics , Animals , Clonal Evolution , Female , Gene Expression Profiling , Gene Silencing , Male , Organ Specificity , Genetic Recombination , RNA Sequence Analysis/methods , Whole Genome Sequencing
13.
Gigascience ; 9(10), 2020 10 17.
Article in English | MEDLINE | ID: mdl-33068114

ABSTRACT

BACKGROUND: Long-read sequencing can be applied to generate very long contigs and even completely assembled genomes at relatively low cost and with minimal sample preparation. As a result, long-read sequencing platforms are becoming more popular. In this respect, the Oxford Nanopore Technologies-based long-read sequencing "nanopore" platform is becoming a widely used tool with a broad range of applications and end-users. However, the need to explore and manipulate the complex data generated by long-read sequencing platforms necessitates accompanying specialized bioinformatics platforms and tools to process the long-read data correctly. Importantly, such tools should additionally help democratize bioinformatics analysis by enabling easy access and ease-of-use solutions for researchers. RESULTS: The Galaxy platform provides a user-friendly interface to computational command line-based tools, handles the software dependencies, and provides refined workflows. The users do not have to possess programming experience or extended computer skills. The interface enables researchers to perform powerful bioinformatics analysis, including the assembly and analysis of short- or long-read sequence data. The newly developed "NanoGalaxy" is a Galaxy-based toolkit for analysing long-read sequencing data, which is suitable for diverse applications, including de novo genome assembly from genomic, metagenomic, and plasmid sequence reads. CONCLUSIONS: A range of best-practice tools and workflows for long-read sequence genome assembly has been integrated into a NanoGalaxy platform to facilitate easy access and use of bioinformatics tools for researchers. NanoGalaxy is freely available at the European Galaxy server https://nanopore.usegalaxy.eu with supporting self-learning training material available at https://training.galaxyproject.org.


Subjects
Nanopore Sequencing , Nanopores , Data Analysis , High-Throughput Nucleotide Sequencing , DNA Sequence Analysis , Software
14.
PLoS Pathog ; 16(8): e1008643, 2020 08.
Article in English | MEDLINE | ID: mdl-32790776

ABSTRACT

The current state of much of the Wuhan pneumonia virus (severe acute respiratory syndrome coronavirus 2 [SARS-CoV-2]) research shows a regrettable lack of data sharing and considerable analytical obfuscation. This impedes global research cooperation, which is essential for tackling public health emergencies and requires unimpeded access to data, analysis tools, and computational infrastructure. Here, we show that community efforts in developing open analytical software tools over the past 10 years, combined with national investments into scientific computational infrastructure, can overcome these deficiencies and provide an accessible platform for tackling global health emergencies in an open and transparent manner. Specifically, we use all SARS-CoV-2 genomic data available in the public domain so far to (1) underscore the importance of access to raw data and (2) demonstrate that existing community efforts in curation and deployment of biomedical software can reliably support rapid, reproducible research during global health crises. All our analyses are fully documented at https://github.com/galaxyproject/SARS-CoV-2.


Subjects
Betacoronavirus/pathogenicity , Coronavirus Infections/virology , Viral Pneumonia/virology , Public Health , Severe Acute Respiratory Syndrome/virology , COVID-19 , Data Analysis , Humans , Pandemics , SARS-CoV-2
15.
Dev Cell ; 49(4): 556-573.e6, 2019 05 20.
Article in English | MEDLINE | ID: mdl-31112698

ABSTRACT

Chromatin remodeling accompanies differentiation; however, its role in self-renewal is less well understood. We report that in Drosophila, the chromatin remodeler Kismet/CHD7/CHD8 limits intestinal stem cell (ISC) number and proliferation without affecting differentiation. Stem-cell-specific whole-genome profiling of Kismet revealed its enrichment at transcriptionally active regions bound by RNA polymerase II and Brahma, its recruitment to the transcription start site of activated genes and developmental enhancers, and its depletion from regions bound by Polycomb, Histone H1, and Heterochromatin Protein 1. We demonstrate that the Trithorax-related/MLL3/4 chromatin modifier regulates ISC proliferation, colocalizes extensively with Kismet throughout the ISC genome, and co-regulates genes in ISCs, including Cbl, a negative regulator of Epidermal Growth Factor Receptor (EGFR). Loss of kismet or trr leads to elevated levels of EGFR protein and signaling, thereby promoting ISC self-renewal. We propose that Kismet with Trr establishes a chromatin state that limits EGFR proliferative signaling, preventing tumor-like stem cell overgrowths.


Subjects
Chromatin/metabolism , DNA Helicases/metabolism , Drosophila Proteins/metabolism , Histone-Lysine N-Methyltransferase/metabolism , Homeodomain Proteins/metabolism , Animals , Cell Differentiation/physiology , Cell Proliferation/physiology , Chromatin Assembly and Disassembly/physiology , DNA Helicases/physiology , Drosophila Proteins/physiology , Drosophila melanogaster/metabolism , ErbB Receptors/metabolism , Histone-Lysine N-Methyltransferase/physiology , Histones/metabolism , Homeodomain Proteins/physiology , RNA Polymerase II/genetics , RNA Polymerase II/metabolism , Invertebrate Peptide Receptors/metabolism , Signal Transduction/physiology , Stem Cells/metabolism , Transcription Factors/metabolism
16.
RNA ; 24(12): 1749-1760, 2018 12.
Article in English | MEDLINE | ID: mdl-30217866

ABSTRACT

piRNA-mediated repression of transposable elements (TE) in the germline limits the accumulation of mutations caused by their transposition. It is not clear whether the piRNA pathway plays a role in adult, nongonadal tissues in Drosophila melanogaster. To address this question, we analyzed the small RNA content of adult Drosophila melanogaster heads. We found that the varying amount of piRNA-sized, ping-pong positive molecules in heads correlates with contamination by gonadal tissue during RNA extraction, suggesting that most of the piRNAs detected in heads originate from gonads. We next sequenced the heads of wild-type and piwi mutants to address whether piwi loss of function would affect the low amount of piRNA-sized, ping-pong negative molecules that are still detected in heads hand-checked to avoid gonadal contamination. We find that loss of piwi does not significantly affect these 24-28 nt RNAs. Instead, we observe increased siRNA levels against the majority of Drosophila TE families. To determine the effect of this siRNA level change on transposon expression, we sequenced the transcriptome of wild-type, piwi, dicer-2 and piwi, dicer-2 double-mutant heads. We find that RNA expression levels of the majority of TE in piwi or dicer-2 mutants remain unchanged and that TE transcripts increase only in piwi, dicer-2 double-mutants. These results lead us to suggest a dual-layer model for TE repression in adult somatic tissues. Piwi-mediated gene silencing established during embryogenesis constitutes the first layer of TE repression whereas Dicer-2-dependent siRNA-mediated silencing provides a backup mechanism to repress TEs that escape silencing by Piwi.


Subjects
DNA Transposable Elements/genetics , Drosophila melanogaster/genetics , Head/growth & development , Small Interfering RNA/genetics , Animals , Drosophila Proteins/genetics , Drosophila melanogaster/growth & development , Developmental Gene Expression Regulation/genetics , Gene Silencing , Germ Cells , Germ-Line Mutation/genetics , Gonads/growth & development , Gonads/metabolism , RNA Helicases/genetics , Ribonuclease III/genetics
17.
Cell Syst ; 6(6): 631-635, 2018 06 27.
Article in English | MEDLINE | ID: mdl-29953862

ABSTRACT

Many areas of research suffer from poor reproducibility, particularly in computationally intensive domains where results rely on a series of complex methodological decisions that are not well captured by traditional publication approaches. Various guidelines have emerged for achieving reproducibility, but implementation of these practices remains difficult due to the challenge of assembling software tools plus associated libraries, connecting tools together into pipelines, and specifying parameters. Here, we discuss a suite of cutting-edge technologies that make computational reproducibility not just possible, but practical in both time and effort. This suite combines three well-tested components (a system for building highly portable packages of bioinformatics software, containerization and virtualization technologies for isolating reusable execution environments for these packages, and workflow systems that automatically orchestrate the composition of these packages for entire pipelines) to achieve an unprecedented level of computational reproducibility. We also provide a practical implementation and five recommendations to help set a typical researcher on the path to performing data analyses reproducibly.


Subjects
Computational Biology/methods , Reproducibility of Results , Biological Science Disciplines , Humans , Research Personnel , Software , Technology , User-Computer Interface , Workflow
18.
Nucleic Acids Res ; 46(W1): W537-W544, 2018 07 02.
Article in English | MEDLINE | ID: mdl-29790989

ABSTRACT

Galaxy (homepage: https://galaxyproject.org, main public server: https://usegalaxy.org) is a web-based scientific analysis platform used by tens of thousands of scientists across the world to analyze large biomedical datasets such as those found in genomics, proteomics, metabolomics and imaging. Started in 2005, Galaxy continues to focus on three key challenges of data-driven biomedical science: making analyses accessible to all researchers, ensuring analyses are completely reproducible, and making it simple to communicate analyses so that they can be reused and extended. During the last two years, the Galaxy team and the open-source community around Galaxy have made substantial improvements to Galaxy's core framework, user interface, tools, and training materials. Framework and user interface improvements now enable Galaxy to be used for analyzing tens of thousands of datasets, and >5500 tools are now available from the Galaxy ToolShed. The Galaxy community has led an effort to create numerous high-quality tutorials focused on common types of genomic analyses. The Galaxy developer and user communities continue to grow and be integral to Galaxy's development. The number of Galaxy public servers, developers contributing to the Galaxy framework and its tools, and users of the main Galaxy server have all increased substantially.


Subjects
Genomics/statistics & numerical data , Metabolomics/statistics & numerical data , Molecular Imaging/statistics & numerical data , Proteomics/statistics & numerical data , User-Computer Interface , Datasets as Topic , Humans , Information Dissemination , International Cooperation , Internet , Reproducibility of Results
19.
Gigascience ; 6(6): 1-4, 2017 06 01.
Article in English | MEDLINE | ID: mdl-28402416

ABSTRACT

Background: Bioinformaticians routinely use multiple software tools and data sources in their day-to-day work and have been guided in their choices by a number of cataloguing initiatives. The ELIXIR Tools and Data Services Registry (bio.tools) aims to provide a central information point, independent of any specific scientific scope within bioinformatics or technological implementation. Meanwhile, efforts to integrate bioinformatics software in workbench and workflow environments have accelerated to enable the design, automation, and reproducibility of bioinformatics experiments. One such popular environment is the Galaxy framework, with currently more than 80 publicly available Galaxy servers around the world. In the context of a generic registry for bioinformatics software, such as bio.tools, Galaxy instances constitute a major source of valuable content. Yet there has been, to date, no convenient mechanism to register such services en masse. We present ReGaTE (Registration of Galaxy Tools in Elixir), a software utility that automates the process of registering the services available in a Galaxy instance. This utility uses the BioBlend application program interface to extract service metadata from a Galaxy server, enhance the metadata with the scientific information required by bio.tools, and push it to the registry. ReGaTE provides a fast and convenient way to publish Galaxy services in bio.tools. By doing so, service providers may increase the visibility of their services while enriching the software discovery function that bio.tools provides for its users. The source code of ReGaTE is freely available on GitHub at https://github.com/C3BI-pasteur-fr/ReGaTE.


Subjects
Computational Biology/methods , Automation , Computer Systems , Internet , Reproducibility of Results , Software , User-Computer Interface , Workflow
20.
PLoS One ; 12(1): e0168397, 2017.
Article in English | MEDLINE | ID: mdl-28045932

ABSTRACT

Metavisitor is a software package that allows biologists and clinicians without specialized bioinformatics expertise to detect and assemble viral genomes from deep sequence datasets. The package is composed of a set of modular bioinformatic tools and workflows that are implemented in the Galaxy framework. Using the graphical Galaxy workflow editor, users with minimal computational skills can use existing Metavisitor workflows or adapt them to suit specific needs by adding or modifying analysis modules. Metavisitor works with DNA, RNA or small RNA sequencing data over a range of read lengths and can use a combination of de novo and guided approaches to assemble genomes from sequencing reads. We show that the software has the potential for quick diagnosis as well as discovery of viruses from a vast array of organisms. Importantly, we provide here executable Metavisitor use cases, which increase the accessibility and transparency of the software, ultimately enabling biologists or clinicians to focus on biological or medical questions.


Subjects
Computational Biology , High-Throughput Nucleotide Sequencing , RNA Sequence Analysis , Software , Viruses/genetics , Ebolavirus/genetics , Gene Library , Viral Genome , Humans , Internet , Lassa virus/genetics , Workflow