ABSTRACT
The main goals and challenges for the life science communities in the Open Science framework are to increase the reuse and sustainability of data resources, software tools, and workflows, especially in large-scale data-driven research and computational analyses. Here, we present key findings, procedures, effective measures and recommendations for generating and establishing sustainable life science resources based on the collaborative, cross-disciplinary work done within the EOSC-Life (European Open Science Cloud for Life Sciences) consortium. Bringing together 13 European life science research infrastructures, the consortium has laid the foundation for an open, digital space to support biological and medical research. Using lessons learned from 27 selected projects, we describe the organisational, technical, financial and legal/ethical challenges that represent the main barriers to sustainability in the life sciences. We show how EOSC-Life provides a model for sustainable data management according to FAIR (findability, accessibility, interoperability, and reusability) principles, including solutions for sensitive and industry-related resources, by means of cross-disciplinary training and the sharing of best practices. Finally, we illustrate how data harmonisation and collaborative work facilitate the interoperability of tools, data, and solutions, and lead to a better understanding of concepts, semantics and functionalities in the life sciences.
Subject(s)
Biological Science Disciplines , Biomedical Research , Software , Workflow
ABSTRACT
There is an ongoing explosion of scientific datasets being generated, brought on by recent technological advances in many areas of the natural sciences. As a result, the life sciences have become increasingly computational in nature, and bioinformatics has taken on a central role in research studies. However, basic computational skills, data analysis, and stewardship are still rarely taught in life science educational programs, resulting in a skills gap in many of the researchers tasked with analysing these big datasets. In order to address this skills gap and empower researchers to perform their own data analyses, the Galaxy Training Network (GTN) has previously developed the Galaxy Training Platform (https://training.galaxyproject.org), an open-access, community-driven framework for the collection of FAIR (Findable, Accessible, Interoperable, Reusable) training materials for data analysis, utilizing the user-friendly Galaxy framework as its primary data analysis platform. Since its inception, this training platform has thrived, with the number of tutorials and contributors growing rapidly, and the range of topics extending beyond the life sciences to include topics such as climatology, cheminformatics, and machine learning. While initially aimed at supporting researchers directly, the GTN framework has proven to be an invaluable resource for educators as well. We have focused our efforts in recent years on supporting this growing community of instructors: new features have been added to facilitate the use of the materials in a classroom setting, the contribution flow for new materials has been simplified, and a set of train-the-trainer lessons has been added. Here, we present the latest developments in the GTN project, aimed at facilitating the use of the Galaxy Training materials by educators, and their usage in different learning environments.
Subject(s)
Computational Biology , Software , Humans , Computational Biology/methods , Data Analysis , Research Personnel
ABSTRACT
Bioimaging has now entered the era of big data, with faster-than-ever development of complex microscopy technologies leading to increasingly complex datasets. This enormous increase in data size and informational complexity has brought with it several difficulties in terms of common and harmonized data handling, analysis, and management practices, which currently prevent the full potential of image data from being realized. Here, we outline a wide range of efforts and solutions currently being developed by the microscopy community to address these challenges on the path towards FAIR bioimaging data. We also highlight how different actors in the microscopy ecosystem are working together, creating synergies that give rise to new approaches, and how research infrastructures, such as Euro-BioImaging, are fostering these interactions to shape the field.
Subject(s)
Ecosystem , Microscopy
ABSTRACT
The Coronavirus Disease 2019 (COVID-19) outbreak has caused universities across the globe to close their campuses and forced them to move teaching online. This article reviews the pedagogical foundations for developing effective distance education practices, starting from the assumption that promoting autonomous thinking is an essential element to guarantee full citizenship in a democracy and for moral decision-making in situations of rapid change, which has become a pressing need in the context of a pandemic. In addition, the main obstacles related to this new context are identified, and solutions are proposed based on the existing literature in the learning sciences.
Subject(s)
COVID-19/epidemiology , Computational Biology , Education, Distance/organization & administration , Quarantine , Teaching , COVID-19/virology , Decision Making , Humans , Pandemics , SARS-CoV-2/isolation & purification
ABSTRACT
The COVID-19 pandemic is shifting teaching to an online setting all over the world. The Galaxy framework facilitates the online learning process and makes it accessible by providing a library of high-quality, community-curated training materials, enabling easy access to data and tools, and facilitating the sharing of achievements and progress between students and instructors. By combining Galaxy with robust communication channels, effective instruction can be designed inclusively, regardless of the students' environments.
Subject(s)
COVID-19/epidemiology , Computer-Assisted Instruction , Education, Distance/organization & administration , COVID-19/virology , Computational Biology , Humans , Information Dissemination , Pandemics , SARS-CoV-2/isolation & purification
ABSTRACT
BACKGROUND: Loss-of-function phenotypes are widely used to infer gene function using the principle that similar phenotypes are indicative of similar functions. However, converting phenotypic to functional annotations requires careful interpretation of phenotypic descriptions and assessment of phenotypic similarity. Understanding how functions and phenotypes are linked will be crucial for the development of methods for the automatic conversion of gene loss-of-function phenotypes to gene functional annotations. RESULTS: We explored the relation between cellular phenotypes from RNAi-based screens in human cells and gene annotations of cellular functions as provided by the Gene Ontology (GO). Comparing different similarity measures, we found that information content-based measures of phenotypic similarity were the best at capturing gene functional similarity. However, phenotypic similarities did not map to the Gene Ontology organization of gene function but to functions defined as groups of GO terms with shared gene annotations. CONCLUSIONS: Our observations have implications for the use and interpretation of phenotypic similarities as a proxy for gene functions both in RNAi screen data analysis and curation and in the prediction of disease genes.
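The information content-based similarity measures discussed above can be illustrated with a minimal sketch. The example below implements Resnik-style similarity (information content of the most informative common ancestor) over a toy annotation corpus and a toy is-a hierarchy; the term and gene names are purely illustrative, not data from the study, and the study compared several such measures rather than this one alone.

```python
import math

# Hypothetical toy annotation corpus: term -> set of annotated genes.
annotations = {
    "mitosis": {"g1", "g2", "g3", "g4"},
    "spindle_assembly": {"g1", "g2"},
    "cytokinesis": {"g3", "g4"},
    "cell_cycle": {"g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"},
}
TOTAL_GENES = 8

# Toy is-a hierarchy: term -> parents (terms near the root are more general).
parents = {
    "spindle_assembly": {"mitosis"},
    "cytokinesis": {"mitosis"},
    "mitosis": {"cell_cycle"},
    "cell_cycle": set(),
}

def information_content(term):
    """IC(t) = -log p(t), where p(t) is the annotation frequency of t.
    Rarely used (more specific) terms carry more information."""
    p = len(annotations[term]) / TOTAL_GENES
    return -math.log(p)

def ancestors(term):
    """Return the term itself plus all of its ancestors."""
    result = {term}
    stack = [term]
    while stack:
        for p in parents[stack.pop()]:
            if p not in result:
                result.add(p)
                stack.append(p)
    return result

def resnik_similarity(t1, t2):
    """IC of the most informative common ancestor of t1 and t2."""
    common = ancestors(t1) & ancestors(t2)
    return max(information_content(t) for t in common)

# Sibling terms share the informative ancestor "mitosis", so their
# similarity is IC("mitosis") = -log(4/8), not IC of the root (which is 0).
print(resnik_similarity("spindle_assembly", "cytokinesis"))
```

The key property this captures is that similarity is driven by how specific the shared annotation context is: sharing a root-level term contributes nothing, while sharing a rarely annotated term contributes a lot.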
Subject(s)
Computational Biology/methods , Area Under Curve , Cluster Analysis , Humans , Phenotype , RNA Interference , ROC Curve
ABSTRACT
We previously described a protocol for genome engineering of mammalian cultured cells with clustered regularly interspaced short palindromic repeats and associated protein 9 (CRISPR-Cas9) to generate homozygous knock-ins of fluorescent tags into endogenous genes. Here we are updating this former protocol to reflect major improvements in the workflow regarding efficiency and throughput. In brief, we have improved our method by combining high-efficiency electroporation of optimized CRISPR-Cas9 reagents, screening of single cell-derived clones by automated bright-field and fluorescence imaging, rapid assessment of the number of tagged alleles and potential off-targets using digital polymerase chain reaction (PCR), and automated data analysis. Compared with the original protocol, our current procedure (1) substantially increases the efficiency of tag integration, (2) automates the identification of clones derived from single cells with correct subcellular localization of the tagged protein and (3) provides a quantitative and high-throughput assay to measure the number of on- and off-target integrations with digital PCR. The increased efficiency of the new procedure reduces the number of clones that need to be analyzed in depth by more than tenfold and yields more than 26% homozygous clones in polyploid cancer cell lines in a single genome engineering round. Overall, we were able to dramatically reduce the hands-on time from 30 d to 10 d during the overall ~10 week procedure, allowing a single person to process up to five genes in parallel, assuming that validated reagents (for example, PCR primers, digital PCR assays and western blot antibodies) are available.
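The digital PCR readout underlying the allele-counting step can be quantified with standard Poisson partition statistics: the fraction of negative partitions estimates exp(-lambda), so the mean copies per partition is lambda = -ln(negatives/total), and the ratio between a tag assay and a reference assay estimates the number of tagged alleles. The sketch below uses illustrative partition counts, not the protocol's actual assay design.

```python
import math

def dpcr_concentration(positive, total):
    """Estimate mean target copies per partition from a digital PCR run.

    Targets distribute over partitions following a Poisson distribution,
    so the fraction of negative partitions is exp(-lambda), giving
    lambda = -ln(negatives / total).
    """
    negatives = total - positive
    if negatives == 0:
        raise ValueError("all partitions positive: target too concentrated")
    return -math.log(negatives / total)

def allele_copy_ratio(tag_positive, ref_positive, total):
    """Tagged-allele copies relative to a single-copy reference assay.

    In a diploid line, a ratio near 1.0 suggests both alleles carry the
    tag (homozygous knock-in); near 0.5 suggests a heterozygous clone.
    """
    tag = dpcr_concentration(tag_positive, total)
    ref = dpcr_concentration(ref_positive, total)
    return tag / ref

# Illustrative partition counts (hypothetical, not data from the protocol):
# equal positive counts in the tag and reference assays give a ratio of 1.0,
# consistent with a homozygously tagged diploid clone.
print(allele_copy_ratio(tag_positive=5000, ref_positive=5000, total=20000))  # -> 1.0
```

The Poisson correction matters because a partition with two or more target copies still scores as a single positive; simply dividing positive counts would underestimate concentrated targets.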
ABSTRACT
Many bioimage analysis projects produce quantitative descriptors of regions of interest in images. Associating these descriptors with visual characteristics of the objects they describe is a key step in understanding the data at hand. However, as many bioimage data and their analysis workflows are moving to the cloud, addressing interactive data exploration in remote environments has become a pressing issue. To address it, we developed the Image Data Explorer (IDE) as a web application that integrates interactive linked visualization of images and derived data points with exploratory data analysis methods, annotation, classification and feature selection functionalities. The IDE is written in R using the shiny framework. It can be easily deployed on a remote server or on a local computer. The IDE is available at https://git.embl.de/heriche/image-data-explorer and a cloud deployment is accessible at https://shiny-portal.embl.de/shinyapps/app/01_image-data-explorer.
Subject(s)
Software
ABSTRACT
BACKGROUND: Hands-on training, whether in bioinformatics or other domains, often requires significant technical resources and knowledge to set up and run. Instructors must have access to powerful compute infrastructure that can support resource-intensive jobs running efficiently. Often this is achieved using a private server where there is no contention for the queue. However, this imposes a significant knowledge and labor barrier on instructors, who must spend time coordinating the deployment and management of compute resources. Furthermore, with the rise of virtual and hybrid teaching, where learners are located in separate physical locations, it is difficult to track student progress as efficiently as during in-person courses. FINDINGS: Working together with the Galaxy community, Galaxy Europe and the Gallantries project have created Training Infrastructure-as-a-Service (TIaaS), aimed at providing user-friendly training infrastructure to the global training community. TIaaS provides dedicated training resources for Galaxy-based courses and events. Event organizers register their course, after which trainees are transparently placed in a private queue on the compute infrastructure, which ensures jobs complete quickly, even when the main queue is experiencing high wait times. A built-in dashboard allows instructors to monitor student progress. CONCLUSIONS: TIaaS provides a significant improvement for instructors and learners, as well as infrastructure administrators. The instructor dashboard makes remote events not only possible but also easy. Students experience continuity of learning, as all training happens on Galaxy, which they can continue to use after the event. In the past 60 months, 504 training events with over 24,000 learners have used this infrastructure for Galaxy training.