ABSTRACT
SUMMARY: Many aspects of the global response to the COVID-19 pandemic are enabled by the fast and open publication of SARS-CoV-2 genetic sequence data. The European Nucleotide Archive (ENA) is the recommended European open repository for genetic sequences. In this work, we present a tool for submitting raw sequencing reads of SARS-CoV-2 to ENA. The tool features a single-step submission process, a graphical user interface, tabular-formatted metadata and the option to remove human reads prior to submission. A Galaxy wrapper of the tool allows users with little or no bioinformatics knowledge to submit sequencing reads in bulk. The tool is also packaged in a Docker container to ease deployment. AVAILABILITY AND IMPLEMENTATION: The CLI ENA upload tool is available at github.com/usegalaxy-eu/ena-upload-cli (DOI 10.5281/zenodo.4537621); the Galaxy ENA upload tool at toolshed.g2.bx.psu.edu/view/iuc/ena_upload/382518f24d6d and github.com/galaxyproject/tools-iuc/tree/master/tools/ena_upload (development); and the ENA upload Galaxy container at github.com/ELIXIR-Belgium/ena-upload-container (DOI 10.5281/zenodo.4730785).
Subjects
COVID-19, Software, Humans, SARS-CoV-2, Nucleotides, Pandemics
ABSTRACT
The current state of much of the Wuhan pneumonia virus (severe acute respiratory syndrome coronavirus 2 [SARS-CoV-2]) research shows a regrettable lack of data sharing and considerable analytical obfuscation. This impedes global research cooperation, which is essential for tackling public health emergencies and requires unimpeded access to data, analysis tools, and computational infrastructure. Here, we show that community efforts in developing open analytical software tools over the past 10 years, combined with national investments into scientific computational infrastructure, can overcome these deficiencies and provide an accessible platform for tackling global health emergencies in an open and transparent manner. Specifically, we use all SARS-CoV-2 genomic data available in the public domain so far to (1) underscore the importance of access to raw data and (2) demonstrate that existing community efforts in curation and deployment of biomedical software can reliably support rapid, reproducible research during global health crises. All our analyses are fully documented at https://github.com/galaxyproject/SARS-CoV-2.
Subjects
Betacoronavirus/pathogenicity, Coronavirus Infections/virology, Viral Pneumonia/virology, Public Health, Severe Acute Respiratory Syndrome/virology, COVID-19, Data Analysis, Humans, Pandemics, SARS-CoV-2
ABSTRACT
Condensins are best known for their role in shaping chromosomes. Other functions such as organizing interphase chromatin and transcriptional control have been reported in yeasts and animals, but little is known about their function in plants. To elucidate the specific composition of condensin complexes and the expression of CAP-D2 (condensin I) and CAP-D3 (condensin II), we performed biochemical analyses in Arabidopsis. The role of CAP-D3 in interphase chromatin organization and function was evaluated using cytogenetic and transcriptome analysis in cap-d3 T-DNA insertion mutants. CAP-D2 and CAP-D3 are highly expressed in mitotically active tissues. In silico and pull-down experiments indicate that both CAP-D proteins interact with the other condensin I and II subunits. In cap-d3 mutants, an association of heterochromatic sequences occurs, but the nuclear size and the general histone and DNA methylation patterns remain unchanged. Also, CAP-D3 influences the expression of genes affecting the response to water, chemicals, and stress. The expression and composition of the condensin complexes in Arabidopsis are similar to those in other higher eukaryotes. We propose a model for the CAP-D3 function during interphase in which CAP-D3 localizes in euchromatin loops to stiffen them and consequently separates centromeric regions and 45S rDNA repeats.
Subjects
Arabidopsis, Chromatin, Adenosine Triphosphatases/genetics, Animals, Arabidopsis/genetics, DNA-Binding Proteins, Interphase, Multiprotein Complexes
ABSTRACT
Summary: Properly and effectively managing reference datasets is an important task for many bioinformatics analyses. Refgenie is a reference asset management system that allows users to easily organize, retrieve and share such datasets. Here, we describe the integration of refgenie into the Galaxy platform. Server administrators are able to configure Galaxy to make use of reference datasets made available on a refgenie instance. In addition, a Galaxy Data Manager tool has been developed to provide a graphical interface to refgenie's remote reference retrieval functionality. A large collection of reference datasets has also been made available using the CVMFS (CernVM File System) repository from GalaxyProject.org, with mirrors across the USA, Canada, Europe and Australia, enabling easy use outside of Galaxy. Availability and implementation: The ability of Galaxy to use refgenie assets was added to the core Galaxy framework in version 22.01, which is available from https://github.com/galaxyproject/galaxy under the Academic Free License version 3.0. The refgenie Data Manager tool can be installed via the Galaxy ToolShed, with source code managed at https://github.com/BlankenbergLab/galaxy-tools-blankenberg/tree/main/data_managers/data_manager_refgenie_pull and released using an MIT license. Access to existing data is also available through CVMFS, with instructions at https://galaxyproject.org/admin/reference-data-repo/. No new data were generated or analyzed in support of this research.
ABSTRACT
For mass spectrometry-based peptide and protein quantification, label-free quantification (LFQ) based on precursor mass peak (MS1) intensities is considered reliable due to its dynamic range, reproducibility, and accuracy. LFQ enables peptide-level quantitation, which is useful in proteomics (analyzing peptides carrying post-translational modifications) and multi-omics studies such as metaproteomics (analyzing taxon-specific microbial peptides) and proteogenomics (analyzing non-canonical sequences). Bioinformatics workflows accessible via the Galaxy platform have proven useful for analysis of such complex multi-omic studies. However, workflows within the Galaxy platform have lacked well-tested LFQ tools. In this study, we have evaluated moFF and FlashLFQ, two open-source LFQ tools, and implemented them within the Galaxy platform to offer access and use via established workflows. Through rigorous testing and communication with the tool developers, we have optimized the performance of each tool. Software features evaluated include: (a) match-between-runs (MBR); (b) using multiple file-formats as input for improved quantification; (c) use of containers and/or conda packages; (d) parameters needed for analyzing large datasets; and (e) optimization and validation of software performance. This work establishes a process for software implementation, optimization, and validation, and offers access to two robust software tools for LFQ-based analysis within the Galaxy platform.