ABSTRACT
Approaches to rapidly collecting global biodiversity data are increasingly important, but biodiversity blind spots persist. We organized a three-day Datathon event to improve the openness of local biodiversity data and facilitate data reuse by local researchers. The first Datathon, organized among microbial ecologists in Uruguay and Argentina, assembled the largest microbiome dataset in the region to date and formed collaborative consortia for microbiome data synthesis.
Subjects
Biodiversity, Ecology, Microbiota, Argentina, Uruguay
ABSTRACT
Generative artificial intelligence (AI) models will have broad impacts on society including the scientific enterprise; ecology and environmental science will be no exception. Here, we discuss the potential opportunities and risks of advanced generative AI for visual material (images and video) for the science of ecology and the environment itself. There are clearly opportunities for positive impacts, related to improved communication, for example; we also see possibilities for ecological research to benefit from generative AI (e.g., image gap filling, biodiversity surveys, and improved citizen science). However, there are also risks, threatening to undermine the credibility of our science, mostly related to actions of bad actors, for example in terms of spreading fake information or committing fraud. Risks need to be mitigated at the level of government regulatory measures, but we also highlight what can be done right now, including discussing issues with the next generation of ecologists and transforming towards radically open science workflows.
Subjects
Artificial Intelligence, Biodiversity
ABSTRACT
Scientific workflows facilitate the automation of data analysis tasks by integrating various software and tools executed in a particular order. To enable transparency and reusability in workflows, it is essential to implement the FAIR principles. Here, we describe our experiences implementing the FAIR principles for metabolomics workflows using the Metabolome Annotation Workflow (MAW) as a case study. MAW is specified using the Common Workflow Language (CWL), allowing for the subsequent execution of the workflow on different workflow engines. MAW is registered using a CWL description on WorkflowHub. During the submission process on WorkflowHub, a CWL description is used for packaging MAW using the Workflow RO-Crate profile, which includes metadata in Bioschemas. Researchers can use this narrative discussion as a guideline to commence using FAIR practices for their bioinformatics or cheminformatics workflows while incorporating necessary amendments specific to their research area.
ABSTRACT
Macro- and microscopic images of organisms are pivotal in biodiversity research. Although bioimages have manifold applications, such as assessing the diversity of form and function, FAIR bioimaging data in the context of biodiversity are still very scarce, especially for difficult taxonomic groups such as bryophytes. Here, we present a high-quality reference dataset containing macroscopic and bright-field microscopic images documenting various phenotypic characters of the species belonging to the liverwort family Scapaniaceae occurring in Europe. To encourage data reuse in biodiversity and adjacent research areas, we annotated the imaging data with machine-actionable metadata using community-accepted semantics. Furthermore, raw imaging data are retained, and any contextual image processing, such as multi-focus image fusion and stitching, was documented to foster good scientific practices through source tracking and provenance. The information contained in the raw images is also of particular interest for machine learning and image segmentation used in bioinformatics and computational ecology. We expect that this richly annotated reference dataset will encourage future studies to follow our principles.
Subjects
Bryophyta, Hepatophyta, Biodiversity, Computational Biology/methods, Computer-Assisted Image Processing/methods
ABSTRACT
Scientific data management plays a key role in the reproducibility of scientific results. To reproduce results, not only the results but also the data and steps of scientific experiments must be made findable, accessible, interoperable, and reusable. Tracking, managing, describing, and visualizing provenance helps in the understandability, reproducibility, and reuse of experiments for the scientific community. Current systems lack a link between the data, steps, and results from the computational and non-computational processes of an experiment. Such a link, however, is vital for the reproducibility of results. We present a novel solution for the end-to-end provenance management of scientific experiments. We provide a framework, CAESAR (CollAborative Environment for Scientific Analysis with Reproducibility), which allows scientists to capture, manage, query and visualize the complete path of a scientific experiment consisting of computational and non-computational data and steps in an interoperable way. CAESAR integrates the REPRODUCE-ME provenance model, extended from existing semantic web standards, to represent the whole picture of an experiment describing the path it took from its design to its result. ProvBook, an extension for Jupyter Notebooks, is developed and integrated into CAESAR to support computational reproducibility. We have applied and evaluated our contributions to a set of scientific experiments in microscopy research projects.
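The link between data, steps, and results described above can be illustrated with a minimal sketch. It is not CAESAR or the REPRODUCE-ME model: it only assumes that provenance is stored as subject-predicate-object statements, borrows two property names from PROV-O (`prov:used`, `prov:wasGeneratedBy`), and uses invented identifiers such as `plot.png` and `sample_A`.

```python
from collections import defaultdict

# Hypothetical provenance statements as (subject, predicate, object) triples.
# Predicates borrow PROV-O names; every identifier is invented for illustration.
TRIPLES = [
    ("plot.png", "prov:wasGeneratedBy", "notebook_run_1"),          # computational step
    ("notebook_run_1", "prov:used", "image_01.tif"),
    ("image_01.tif", "prov:wasGeneratedBy", "microscope_acquisition"),  # non-computational step
    ("microscope_acquisition", "prov:used", "sample_A"),
]


def trace(node, triples):
    """Return everything upstream of `node` by following provenance links."""
    edges = defaultdict(list)
    for subject, _predicate, obj in triples:
        edges[subject].append(obj)
    seen, stack = [], [node]
    while stack:
        current = stack.pop()
        for parent in edges[current]:
            if parent not in seen:
                seen.append(parent)
                stack.append(parent)
    return seen


lineage = trace("plot.png", TRIPLES)
print(lineage)
```

Walking the statements backwards from a result reaches both the notebook run and the laboratory steps, which is the kind of end-to-end path the abstract refers to.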
ABSTRACT
BACKGROUND: The advancement of science and technologies plays an immense role in the way scientific experiments are being conducted. Understanding how experiments are performed and how results are derived has become significantly more complex with the recent explosive growth of heterogeneous research data and methods. Therefore, it is important that the provenance of results is tracked, described, and managed throughout the research lifecycle, from the beginning of an experiment to its end, to ensure reproducibility of results described in publications. However, there is a lack of interoperable representation of end-to-end provenance of scientific experiments that interlinks data, processing steps, and results from an experiment's computational and non-computational processes. RESULTS: We present the "REPRODUCE-ME" data model and ontology to describe the end-to-end provenance of scientific experiments by extending existing standards in the semantic web. The ontology brings together different aspects of the provenance of scientific studies by interlinking non-computational data and steps with computational data and steps to achieve understandability and reproducibility. We explain the important classes and properties of the ontology and how they are mapped to existing ontologies like PROV-O and P-Plan. The ontology is evaluated by answering competency questions over the knowledge base of scientific experiments consisting of computational and non-computational data and steps. CONCLUSION: We have designed and developed an interoperable way to represent the complete path of a scientific experiment consisting of computational and non-computational steps. We have applied and evaluated our approach to a set of scientific experiments in different subject domains like computational science, biological imaging, and microscopy.
Subjects
Knowledge Bases, Semantics, Reproducibility of Results, Semantic Web
ABSTRACT
Background: Biodiversity is the assortment of life on earth covering evolutionary, ecological, biological, and social forms. To preserve life in all its variety and richness, it is imperative to monitor the current state of biodiversity and its change over time and to understand the forces driving it. This need has resulted in numerous works being published in this field. With this, a large amount of textual data (publications) and metadata (e.g. dataset description) has been generated. To support the management and analysis of these data, two techniques from computer science are of interest, namely Named Entity Recognition (NER) and Relation Extraction (RE). While the former enables better content discovery and understanding, the latter fosters the analysis by detecting connections between entities and, thus, allows us to draw conclusions and answer relevant domain-specific questions. To automatically predict entities and their relations, machine/deep learning techniques could be used. The training and evaluation of those techniques require labelled corpora. New information: In this paper, we present two gold-standard corpora for Named Entity Recognition (NER) and Relation Extraction (RE) generated from biodiversity datasets metadata and abstracts that can be used as evaluation benchmarks for the development of new computer-supported tools that require machine learning or deep learning techniques. These corpora are manually labelled and verified by biodiversity experts. In addition, we explain the detailed steps of constructing these datasets. Moreover, we demonstrate the underlying ontology for the classes and relations used to annotate such corpora.
ABSTRACT
BACKGROUND: Obtaining fit-to-use data associated with diverse aspects of biodiversity, ecology and environment is challenging since the data are often fragmented, sub-optimally managed and available in heterogeneous formats. Recently, with the universal acceptance of the FAIR data principles, the requirements and standards of data publications have changed substantially. Researchers are encouraged to manage the data as per the FAIR data principles and ensure that the raw data, metadata, processed data, software, codes and associated material are securely stored and that the data are made available upon completion of the research. NEW INFORMATION: We have developed BEXIS2 as an open-source community-driven web-based research data management system to support research data management needs of mid to large-scale research projects with multiple sub-projects and up to several hundred researchers. BEXIS2 is a modular and extensible system providing a range of functions to realise the complete data lifecycle from data structure design to data collection, data discovery, dissemination, integration, quality assurance and research planning. It is an extensible and customisable system that allows for the development of new functions and customisation of its various components from database schemas to the user interface layout, elements and look and feel. During the development of BEXIS2, we aimed to incorporate key aspects of what is encoded in FAIR data principles. To investigate the extent to which BEXIS2 conforms to these principles, we conducted a self-assessment using the FAIR indicators, definitions and criteria provided in the FAIR Data Maturity Model. Even though the FAIR Data Maturity Model was initially developed to judge the conformance of datasets, the self-assessment results indicated that BEXIS2 conforms to and supports the FAIR indicators remarkably well. BEXIS2 strongly conforms to the indicators Findability and Accessibility. The indicator Interoperability is moderately supported as of now; however, for many of the less-supported facets, we have concrete plans for improvement. Reusability (as defined by the FAIR data principles) is partially achieved. This paper also illustrates community deployment examples of the BEXIS2 instances as success stories to exemplify its capacity to meet the biodiversity and ecological data management needs of differently sized projects and serve as an organisational research data management system.
ABSTRACT
Earthworms are an important soil taxon as ecosystem engineers, providing a variety of crucial ecosystem functions and services. Little is known about their diversity and distribution at large spatial scales, despite the availability of considerable amounts of local-scale data. Earthworm diversity data, obtained from the primary literature or provided directly by authors, were collated with information on site locations, including coordinates, habitat cover, and soil properties. Datasets were required, at a minimum, to include abundance or biomass of earthworms at a site. Where possible, site-level species lists were included, as well as the abundance and biomass of individual species and ecological groups. This global dataset contains 10,840 sites, with 184 species, from 60 countries and all continents except Antarctica. The data were obtained from 182 articles published between 1973 and 2017 and from 17 unpublished datasets. Amalgamating data into a single global database will assist researchers in investigating and answering a wide variety of pressing questions, for example, jointly assessing aboveground and belowground biodiversity distributions and drivers of biodiversity change.
Subjects
Biodiversity, Oligochaeta/classification, Animals, Biomass
ABSTRACT
Scientific experiments and research practices vary across disciplines. The research practices followed by scientists in each domain play an essential role in the understandability and reproducibility of results. The "Reproducibility Crisis", where researchers find difficulty in reproducing published results, is currently faced by several disciplines. To understand the underlying problem in the context of the reproducibility crisis, it is important to first know the different research practices followed in their domain and the factors that hinder reproducibility. We performed an exploratory study by conducting a survey addressed to researchers representing a range of disciplines to understand scientific experiments and research practices for reproducibility. The survey findings identify a reproducibility crisis and a strong need for sharing data, code, methods, steps, and negative and positive results. Insufficient metadata, lack of publicly available data, and incomplete information in study methods are considered to be the main reasons for poor reproducibility. The survey results also address a wide number of research questions on the reproducibility of scientific results. Based on the results of our explorative study and supported by the existing published literature, we offer general recommendations that could help the scientific community to understand, reproduce, and reuse experimental data and results in the research data lifecycle.
ABSTRACT
The increasing amount of publicly available research data provides the opportunity to link and integrate data in order to create and prove novel hypotheses, to repeat experiments or to compare recent data to data collected at a different time or place. However, recent studies have shown that retrieving relevant data for data reuse is a time-consuming task in daily research practice. In this study, we explore what hampers dataset retrieval in biodiversity research, a field that produces a large amount of heterogeneous data. In particular, we focus on scholarly search interests and metadata, the primary source of data in a dataset retrieval system. We show that existing metadata currently poorly reflect information needs and therefore are the biggest obstacle in retrieving relevant data. Our findings indicate that, for data seekers in the biodiversity domain, environments, materials and chemicals, species, biological and chemical processes, locations, data parameters and data types are important information categories. These interests are well covered in metadata elements of domain-specific standards. However, instead of utilizing these standards, large data repositories tend to use metadata standards with domain-independent metadata fields that cover search interests only to some extent. A second problem is the arbitrary keywords used in descriptive fields such as title, description or subject. Keywords support scholars in a full text search only if the provided terms syntactically match or their semantic relationship to terms used in a user query is known.
Subjects
Biodiversity, Data Mining, Metadata, Research
ABSTRACT
Soil is one of the most biodiverse terrestrial habitats. Yet, we lack an integrative conceptual framework for understanding the patterns and mechanisms driving soil biodiversity. One of the underlying reasons for our poor understanding of soil biodiversity patterns relates to whether key biodiversity theories (historically developed for aboveground and aquatic organisms) are applicable to patterns of soil biodiversity. Here, we present a systematic literature review to investigate whether and how key biodiversity theories (species-energy relationship, theory of island biogeography, metacommunity theory, niche theory and neutral theory) can explain observed patterns of soil biodiversity. We then discuss two spatial compartments nested within soil at which biodiversity theories can be applied to acknowledge the scale-dependent nature of soil biodiversity.
Subjects
Biodiversity, Soil, Animals, Soil Microbiology
ABSTRACT
Soil organisms, including earthworms, are a key component of terrestrial ecosystems. However, little is known about their diversity, their distribution, and the threats affecting them. We compiled a global dataset of sampled earthworm communities from 6928 sites in 57 countries as a basis for predicting patterns in earthworm diversity, abundance, and biomass. We found that local species richness and abundance typically peaked at higher latitudes, displaying patterns opposite to those observed in aboveground organisms. However, high species dissimilarity across tropical locations may cause diversity across the entirety of the tropics to be higher than elsewhere. Climate variables were found to be more important in shaping earthworm communities than soil properties or habitat cover. These findings suggest that climate change may have serious implications for earthworm communities and for the functions they provide.
Subjects
Biodiversity, Oligochaeta, Animal Distribution, Animals, Biomass, Climate, Earth (Planet), Ecosystem, Linear Models, Biological Models, Soil
ABSTRACT
The study of biodiversity has grown exponentially in the last thirty years in response to demands for greater understanding of the function and importance of Earth's biodiversity and finding solutions to conserve it. Here, we test the hypothesis that biodiversity science has become more interdisciplinary over time. To do so, we analyze 97,945 peer-reviewed articles over a twenty-two-year time period (1990-2012) with a continuous time dynamic model, which classifies articles into concepts (i.e., topics and ideas) based on word co-occurrences. Using the model output, we then quantify different aspects of interdisciplinarity: concept diversity, that is, the diversity of topics and ideas across subdisciplines in biodiversity science, subdiscipline diversity, that is, the diversity of subdisciplines across concepts, and network structure, which captures interactions between concepts and subdisciplines. We found that, on average, concept and subdiscipline diversity in biodiversity science were either stable or declining, patterns which were driven by the persistence of rare concepts and subdisciplines and a decline in the diversity of common concepts and subdisciplines, respectively. Moreover, our results provide evidence that conceptual homogenization, that is, decreases in temporal β concept diversity, underlies the observed trends in interdisciplinarity. Together, our results reveal that biodiversity science is undergoing a dynamic phase as a scientific discipline that is consolidating around a core set of concepts. Our results suggest that progress toward addressing the biodiversity crisis via greater interdisciplinarity during the study period may have been slowed by extrinsic factors, such as the failure to invest in research spanning across concepts and disciplines. However, recent initiatives such as the Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services (IPBES) may attract broader support for biodiversity-related issues and hence interdisciplinary approaches to address scientific, political, and societal challenges in the coming years.
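To make the notions of concept diversity and temporal β diversity more concrete, the following is a minimal sketch. The abstract does not name the indices used, so Shannon diversity and Bray-Curtis dissimilarity are assumptions chosen here as common stand-ins, and the article counts per concept are invented.

```python
import math


def shannon(counts):
    """Shannon diversity H' = -sum(p * ln p) over non-zero proportions."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two equally long count vectors."""
    shared = sum(min(x, y) for x, y in zip(a, b))
    return 1 - 2 * shared / (sum(a) + sum(b))


# Invented article counts per concept in two years (same concept order).
year_1990 = [10, 10, 10, 10]   # articles spread evenly over four concepts
year_2012 = [37, 1, 1, 1]      # articles concentrated on one concept

h_early = shannon(year_1990)   # equals ln(4) for an even distribution
h_late = shannon(year_2012)    # lower, because one concept dominates
turnover = bray_curtis(year_1990, year_2012)

print(round(h_early, 3), round(h_late, 3), round(turnover, 3))
```

In this toy case diversity within a year falls when articles concentrate on one concept; tracking the dissimilarity between successive years over time would be one way to look for the homogenization the abstract describes.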
ABSTRACT
Concern about the functional consequences of unprecedented loss in biodiversity has prompted biodiversity-ecosystem functioning (BEF) research to become one of the most active fields of ecological research in the past 25 years. Hundreds of experiments have manipulated biodiversity as an independent variable and found compelling support that the functioning of ecosystems increases with the diversity of their ecological communities. This research has also identified some of the mechanisms underlying BEF relationships, some context-dependencies of the strength of relationships, as well as implications for various ecosystem services that mankind depends upon. In this paper, we argue that a multitrophic perspective of biotic interactions in random and non-random biodiversity change scenarios is key to advancing future BEF research and to addressing some of its most important remaining challenges. We discuss how the study and quantification of multitrophic interactions in space and time facilitate scaling up from small-scale biodiversity manipulations and ecosystem function assessments to management-relevant spatial scales across ecosystem boundaries. We specifically consider multitrophic conceptual frameworks to understand and predict the context-dependency of BEF relationships. Moreover, we highlight the importance of the eco-evolutionary underpinnings of multitrophic BEF relationships. We outline that FAIR data (meeting the standards of findability, accessibility, interoperability, and reusability) and reproducible processing will be key to advancing this field of research by making it more integrative. Finally, we show how these BEF insights may be implemented for ecosystem management, society, and policy. Given that human well-being critically depends on the multiple services provided by diverse, multitrophic communities, integrating the approaches of evolutionary ecology, community ecology, and ecosystem ecology in future BEF research will be key to refining conservation targets and developing sustainable management strategies.
ABSTRACT
Over the last decades, ecology has become a data-intensive science that often relies on the reuse of data in cross-experimental analyses. However, finding data that qualify for reuse in a specific context can be challenging. It requires good-quality metadata and annotations as well as efficient search strategies. To date, full text search (often on the metadata only) is the most widely used search strategy although it is known to be inaccurate. Faceted navigation provides a filter mechanism based on fine-grained metadata, categorizing search objects along numeric and categorical parameters relevant for their discovery. Selecting from these parameters during a full text search creates a system of filters that allows users to refine the results toward greater relevance. We developed a framework for efficient annotation and faceted navigation in ecology. It consists of an XML schema for storing the annotation of search objects and is accompanied by a vocabulary focused on ecology to support the annotation process. The framework consolidates ideas which originate from widely accepted metadata standards, textbooks, scientific literature, and vocabularies as well as from expert knowledge contributed by researchers from ecology and adjacent disciplines.
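The interplay of full text search and facet filters described above can be sketched as follows. This is not the framework's XML schema or vocabulary: the records, the facet names `habitat` and `year`, and the `search` helper are hypothetical and only illustrate how categorical and numeric facets narrow a text query.

```python
# Hypothetical annotated datasets; field names are illustrative only.
DATASETS = [
    {"title": "Tree growth in subtropical forest", "habitat": "forest", "year": 2010},
    {"title": "Grassland soil respiration", "habitat": "grassland", "year": 2012},
    {"title": "Forest litter decomposition", "habitat": "forest", "year": 2014},
]


def search(records, text, categorical=None, numeric=None):
    """Full text match on the title, refined by categorical and numeric facets."""
    categorical = categorical or {}
    numeric = numeric or {}
    hits = []
    for record in records:
        # Step 1: plain full text search (case-insensitive substring match).
        if text.lower() not in record["title"].lower():
            continue
        # Step 2: categorical facets must match exactly.
        if any(record.get(key) != value for key, value in categorical.items()):
            continue
        # Step 3: numeric facets must fall inside the selected range.
        if any(
            key not in record or not (low <= record[key] <= high)
            for key, (low, high) in numeric.items()
        ):
            continue
        hits.append(record["title"])
    return hits


broad = search(DATASETS, "forest")
narrow = search(DATASETS, "forest", {"habitat": "forest"}, {"year": (2012, 2020)})
print(broad)
print(narrow)
```

The text query alone returns two datasets, while adding the facet selections leaves only the one that also satisfies the habitat and year filters.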
Subjects
Data Curation/methods, Ecology, Internet, Controlled Vocabulary
ABSTRACT
We are witnessing a growing gap separating primary research data from derived data products presented as knowledge in publications. Although journals today more often require the underlying data products used to derive the results as a prerequisite for a publication, the important link to the primary data is lost. However, documenting the postprocessing steps that link the primary data with derived data products has the potential to increase the accuracy and the reproducibility of scientific findings significantly. Here, we introduce the rBEFdata R package as a companion to the collaborative data management platform BEFdata. The R package provides programmatic access to features of the platform. It allows users to search for data and integrates the search with external thesauri to improve data discovery. It also allows users to download and import data and metadata into R for analysis. A batched download is available as well, which works along a paper proposal mechanism implemented by BEFdata. This feature of BEFdata makes it possible to group primary data and metadata and streamlines discussions and collaborations revolving around a certain research idea. The upload functionality of the R package, in combination with the paper proposal mechanism of the portal, makes it possible to attach derived data products and scripts directly from R, thus addressing major aspects of documenting data postprocessing. We present the core features of the rBEFdata R package along an ecological analysis example and further discuss the potential of postprocessing documentation for linking primary data with derived data products and knowledge.
ABSTRACT
Data management in the life sciences has evolved from simple storage of data to complex information systems providing additional functionalities like analysis and visualization capabilities, demanding the integration of statistical tools. In many cases the statistical tools used are hard-coded within the system. This makes integrating, substituting, or extending tools expensive because all changes have to be made in program code. Other systems use generic solutions for tool integration, but adapting them to another system usually requires extensive work. This paper shows a way to provide statistical functionality over a statistics web service, which can be easily integrated into any information system and set up using XML configuration files. The statistical functionality is extendable by simply adding the description of a new application to a configuration file. The service architecture as well as the data exchange process between client and service and the adding of analysis applications to the underlying service provider are described. Furthermore, a practical example demonstrates the functionality of the service.
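The idea of extending statistical functionality through a configuration file rather than program code can be sketched as follows. This is not the service described in the paper: the XML element and attribute names, the `AVAILABLE` table, and the helper functions are assumptions made for illustration, and the web service layer is left out entirely.

```python
import statistics
import xml.etree.ElementTree as ET

# Hypothetical configuration: each <application> entry exposes one analysis.
CONFIG = """
<applications>
  <application name="mean" function="mean"/>
  <application name="median" function="median"/>
</applications>
"""

# Functions the provider could offer; only configured ones become callable.
AVAILABLE = {
    "mean": statistics.mean,
    "median": statistics.median,
    "stdev": statistics.stdev,
}


def load_registry(xml_text):
    """Build a name-to-function registry from the XML configuration."""
    root = ET.fromstring(xml_text.strip())
    return {
        app.get("name"): AVAILABLE[app.get("function")]
        for app in root.iter("application")
    }


def run(registry, name, values):
    """Dispatch a request to the configured analysis application."""
    if name not in registry:
        raise KeyError(f"analysis '{name}' is not configured")
    return registry[name](values)


registry = load_registry(CONFIG)
print(run(registry, "mean", [1, 2, 3, 6]))
print(run(registry, "median", [1, 2, 3, 6]))
```

In this sketch, exposing the standard deviation would only require one more `<application>` line in the configuration, with no change to the dispatch code.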