ABSTRACT
The main goals and challenges for the life science communities in the Open Science framework are to increase reuse and sustainability of data resources, software tools, and workflows, especially in large-scale data-driven research and computational analyses. Here, we present key findings, procedures, effective measures and recommendations for generating and establishing sustainable life science resources based on the collaborative, cross-disciplinary work done within the EOSC-Life (European Open Science Cloud for Life Sciences) consortium. Bringing together 13 European life science research infrastructures, it has laid the foundation for an open, digital space to support biological and medical research. Using lessons learned from 27 selected projects, we describe the organisational, technical, financial and legal/ethical challenges that represent the main barriers to sustainability in the life sciences. We show how EOSC-Life provides a model for sustainable data management according to FAIR (findability, accessibility, interoperability, and reusability) principles, including solutions for sensitive and industry-related resources, by means of cross-disciplinary training and best practices sharing. Finally, we illustrate how data harmonisation and collaborative work facilitate interoperability of tools, data and solutions, and lead to a better understanding of concepts, semantics and functionalities in the life sciences.
Subject(s)
Biological Science Disciplines; Biomedical Research; Software; Workflow
ABSTRACT
Biomedical data are generated and collected from various sources, including medical imaging, laboratory tests and genome sequencing. Sharing these data for research can help address unmet health needs, contribute to scientific breakthroughs, accelerate the development of more effective treatments and inform public health policy. Due to the potential sensitivity of such data, however, privacy concerns have led to policies that restrict data sharing. In addition, sharing sensitive data requires a secure and robust infrastructure with appropriate storage solutions. Here, we examine and compare the centralized and federated data sharing models through the prism of five large-scale and real-world use cases of strategic significance within the European data sharing landscape: the French Health Data Hub, the BBMRI-ERIC Colorectal Cancer Cohort, the federated European Genome-phenome Archive, the Observational Medical Outcomes Partnership/OHDSI network and the EBRAINS Medical Informatics Platform. Our analysis indicates that centralized models facilitate data linkage, harmonization and interoperability, while federated models facilitate scaling up and legal compliance, as the data typically reside on the data generator's premises, allowing for better control of how data are shared. This comparative study thus offers guidance on the selection of the most appropriate sharing strategy for sensitive datasets and provides key insights for informed decision-making in data sharing efforts.
Subject(s)
Biological Science Disciplines; Information Dissemination; Humans; Medical Informatics/methods
ABSTRACT
Human genomics is undergoing a step change from being a predominantly research-driven activity to one driven through health care as many countries in Europe now have nascent precision medicine programmes. To maximize the value of the genomic data generated, these data will need to be shared between institutions and across countries. In recognition of this challenge, 21 European countries recently signed a declaration to transnationally share data on at least 1 million human genomes by 2022. In this Roadmap, we identify the challenges of data sharing across borders and demonstrate that European research infrastructures are well-positioned to support the rapid implementation of widespread genomic data access.
Subject(s)
Biomedical Research; Genome, Human; Human Genome Project; Europe; Humans
ABSTRACT
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
ABSTRACT
Polyploidy plays an important role in plant evolution, but knowledge of its eco-physiological consequences, such as those of the putatively enlarged stomata of polyploid plants, remains limited. Enlarged stomata should disadvantage polyploids at low CO2 concentrations (namely during the Quaternary glacial periods) because larger stomata are viewed as less effective at CO2 uptake. We observed the growth, physiology, and epidermal cell features of 15 diploids and their polyploid relatives cultivated under glacial, present-day, and potential future atmospheric CO2 concentrations (200, 400, and 800 ppm, respectively). We demonstrated some well-known polyploidy effects, such as faster growth and larger leaves, seeds, stomata, and other epidermal cells. The stomata of polyploids, however, tended to be more elongated than those of diploids, and contrary to common belief, they had no negative effect on the CO2 uptake capacity of polyploids. Moreover, polyploids grew comparatively better than diploids even at low, glacial CO2 concentrations. Higher polyploids with large genomes also showed increased operational stomatal conductance and, consequently, a lower water-use efficiency. Our results point to a possible decrease in the growth superiority of polyploids over diploids in current and future high-CO2 climatic scenarios, as well as the possible water and/or nutrient dependency of higher polyploids.
Subject(s)
Photosynthesis; Plant Stomata; Plant Stomata/physiology; Photosynthesis/physiology; Carbon Dioxide/pharmacology; Plant Leaves/physiology; Water
ABSTRACT
MOTIVATION: Biobanks are indispensable for large-scale genetic/epidemiological studies, yet it remains difficult for researchers to determine which biobanks contain data matching their research questions. RESULTS: To overcome this, we developed a new matching algorithm that identifies pairs of related data elements between biobanks and research variables with high precision and recall. It integrates lexical comparison, Unified Medical Language System ontology tagging and semantic query expansion. The result is BiobankUniverse, a fast matchmaking service for biobanks and researchers. Biobankers upload their data elements and researchers upload their desired study variables; BiobankUniverse then automatically shortlists matching attributes between them. Users can quickly explore matching potential and search for biobanks/data elements matching their research. They can also curate matches and define personalized data-universes. AVAILABILITY AND IMPLEMENTATION: BiobankUniverse is available at http://biobankuniverse.com or can be downloaded as part of the open source MOLGENIS suite at http://github.com/molgenis/molgenis. CONTACT: m.a.swertz@rug.nl. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
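The lexical-comparison step mentioned in the abstract can be illustrated with a minimal sketch. This is not the BiobankUniverse algorithm: it substitutes a plain token-set Jaccard similarity, leaves out ontology tagging and query expansion entirely, and all function names, the threshold and the example labels are hypothetical.

```python
import re

def tokens(label: str) -> set[str]:
    """Lower-case a label and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", label.lower()))

def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity between two labels (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def shortlist(variables, elements, threshold=0.5):
    """Return (variable, element, score) triples at or above the threshold, best first."""
    pairs = [(v, e, jaccard(v, e)) for v in variables for e in elements]
    hits = [p for p in pairs if p[2] >= threshold]
    return sorted(hits, key=lambda p: p[2], reverse=True)

# Hypothetical labels, not taken from any real biobank or study.
research_variables = ["body mass index", "systolic blood pressure"]
biobank_elements = [
    "Body Mass Index (kg/m2)",
    "Blood pressure, systolic",
    "Smoking status",
]
matches = shortlist(research_variables, biobank_elements)
for variable, element, score in matches:
    print(f"{variable!r} <-> {element!r}: {score:.2f}")
```

A purely lexical score like this misses synonyms (for example "BMI" versus "body mass index"), which is presumably why the abstract combines lexical comparison with ontology tagging and semantic query expansion.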
Subject(s)
Computational Biology/methods; Databases, Factual; Software; Algorithms
ABSTRACT
Climatic changes are altering Earth's hydrological cycle, resulting in altered precipitation amounts, increased interannual variability of precipitation, and more frequent extreme precipitation events. These trends will likely continue into the future, having substantial impacts on net primary productivity (NPP) and associated ecosystem services such as food production and carbon sequestration. Frequently, experimental manipulations of precipitation have linked altered precipitation regimes to changes in NPP. Yet, findings have been diverse and substantial uncertainty still surrounds generalities describing patterns of ecosystem sensitivity to altered precipitation. Additionally, we do not know whether previously observed correlations between NPP and precipitation remain accurate when precipitation changes become extreme. We synthesized results from 83 case studies of experimental precipitation manipulations in grasslands worldwide. We used meta-analytical techniques to search for generalities and asymmetries of aboveground NPP (ANPP) and belowground NPP (BNPP) responses to both the direction and magnitude of precipitation change. Sensitivity (i.e., productivity response standardized by the amount of precipitation change) of BNPP was similar under precipitation additions and reductions, but ANPP was more sensitive to precipitation additions than reductions; this was especially evident in drier ecosystems. Additionally, overall relationships between the magnitude of productivity responses and the magnitude of precipitation change were saturating in form. The saturating form of this relationship was likely driven by ANPP responses to very extreme precipitation increases, although there were limited studies imposing extreme precipitation change, and there was considerable variation among experiments. 
This highlights the importance of incorporating gradients of manipulations, ranging from extreme drought to extreme precipitation increases into future climate change experiments. Additionally, policy and land management decisions related to global change scenarios should consider how ANPP and BNPP responses may differ, and that ecosystem responses to extreme events might not be predicted from relationships found under moderate environmental changes.
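The sensitivity measure defined in the abstract (productivity response standardized by the amount of precipitation change) can be written out as a short sketch. The abstract does not say whether the response is absolute or relative, so the absolute-difference form below is an assumption, and the function name and all numbers are hypothetical.

```python
def sensitivity(npp_treatment, npp_control, ppt_treatment, ppt_control):
    """Change in productivity per unit change in precipitation.

    Assumes an absolute-difference response; the study may use another metric.
    """
    delta_ppt = ppt_treatment - ppt_control
    if delta_ppt == 0:
        raise ValueError("precipitation change must be non-zero")
    return (npp_treatment - npp_control) / delta_ppt

# Hypothetical values: ANPP in g m^-2 yr^-1, precipitation in mm yr^-1.
s_addition = sensitivity(260.0, 200.0, 600.0, 400.0)   # +60 ANPP for +200 mm
s_reduction = sensitivity(170.0, 200.0, 200.0, 400.0)  # -30 ANPP for -200 mm

print(f"addition:  {s_addition:.2f} g m^-2 per mm")
print(f"reduction: {s_reduction:.2f} g m^-2 per mm")
```

In this made-up case the same 200 mm change produces a larger productivity response when precipitation is added than when it is removed, which is the kind of asymmetry the abstract reports for ANPP.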
Subject(s)
Climate Change; Ecosystem; Grassland; Poaceae; Rain
ABSTRACT
BBMRI-ERIC, the Biobanking and BioMolecular Resources Research Infrastructure-European Research Infrastructure Consortium, is a new form of umbrella organization for biobanking in Europe. For rare and common diseases alike, it aims at providing fair access to quality-controlled human biological samples and associated biomedical and biomolecular data. Such access enables basic mechanisms underlying diseases to be studied, which is indispensable for the development of new biomarkers and drugs. In the context of the European Research Area (ERA), biobanks, which were identified as a particular European strength, contribute to Europe's cohesion policy through capacity-building in the BBMRI-ERIC member countries.
Subject(s)
Biological Specimen Banks/organization & administration; Biomedical Research/organization & administration; Database Management Systems/organization & administration; Databases, Factual; Interinstitutional Relations; Registries; Europe; Forecasting; Information Dissemination/methods; Information Storage and Retrieval/methods; Models, Organizational; Specimen Handling/methods
ABSTRACT
This study examined the effect of the interactions of key factors associated with predicted climate change (increased temperature and drought) and elevated CO2 concentration on C3 and C4 crop representatives, barley and sorghum. The effect of two levels of atmospheric CO2 concentration (400 and 800 ppm), three levels of temperature regime (21/7, 26/12 and 33/19°C) and two regimes of water availability (simulation of drought by gradual reduction of irrigation and well-watered control) in all combinations was investigated in a pot experiment within growth chambers for barley variety Bojos and sorghum variety Ruby. Due to differences in photosynthetic metabolism in C3 barley and C4 sorghum, leading to different responses to elevated CO2 concentration, we hypothesized mitigation of the negative drought impact in barley under elevated CO2 concentration and, conversely, improved performance of sorghum at high temperatures. The results demonstrate the decoupling of photosynthetic CO2 assimilation and production parameters in sorghum. High temperatures and elevated CO2 concentration resulted in a significant increase in sorghum above- and below-ground biomass under sufficient water availability despite the enhanced sensitivity of photosynthesis to high temperatures. However, the negative effect of drought was amplified by high temperature, similarly for biomass and photosynthetic rates. Sorghum also showed a mitigating effect of elevated CO2 concentration on the negative drought impact, particularly in reducing the decrease of relative water content in leaves. In barley, no significant factor interactions were observed, indicating that elevated CO2 concentration did not mitigate the negative drought effects. These complex interactions imply that, unlike barley, sorghum can be predicted to have a much higher variability in response to climate change.
However, under conditions combining elevated CO2 concentration, high temperature, and sufficient water availability, C4 crops can be expected to outperform. Conversely, C3 crops can be expected to perform even better under drought conditions when accompanied by lower temperatures.
ABSTRACT
Background: There is much value to be gained by linking clinical studies and (biosample-) collections that have been generated in the context of a clinical study. However, the linking problem is hard because usually no direct references between a clinical study and an associated collection are available. Methods: The BBMRI-ERIC Directory and the ECRIN Metadata Repository (MDR) already include much of the information required to link clinical studies and related sample collections. In this study, we present the work performed to find and implement those links across existing corresponding records in the two systems. The linking process between MDR studies and related collections in the BBMRI-ERIC Directory started with exploring linkage in both directions - searching the BBMRI-ERIC Directory for candidate hits to try to link with MDR records, and searching the ECRIN MDR for candidate hits to try to link with Directory collections. Thereafter, a systematic search through the BBMRI-ERIC Directory was performed. Results: The investigation of linkages in both directions resulted in a limited but promising number of linkages. The systematic search of the Directory identified linkage of 202 studies, spanning 284 collections. Conclusions: The analysis with existing data sources indicated that links between the BBMRI-ERIC and ECRIN collections exist, but also that they would be difficult to continuously identify and maintain without a great deal of manual work, which neither organisation could support. The question arises whether, in the future, systems could be put into place to make the exchange of information and the linkage of identifiers almost automatic.
ABSTRACT
Introduction: The Minimum Information About BIobank Data Sharing (MIABIS) is a biobank-specific terminology enabling the sharing of biobank-related data for different purposes across a wide range of database implementations. After 4 years in use and with the first version of the individual-level MIABIS components Sample, Sample donor, and Event, it was necessary to revise the terminology, especially to include biobanks that work more in the data domain than with samples. Materials & Methods: Nine use-cases representing different types of biobanks, studies, and networks participated in the development work. They represent types of data, specific sample types, or levels of organization that were not included earlier in MIABIS. To support our revision of the Biobank entity, we conducted a survey of European biobanks to chart the services they provide. An important stakeholder group for biobanks includes researchers as the main users of biobanks. To be able to render MIABIS more researcher-friendly, we collected different sample/data requests to analyze the terminology adjustment needs in detail. During the update process, the Core terminology was iteratively reviewed by a large group of experts until a consensus was reached. Results: With this update, MIABIS was adjusted to encompass data-driven biobanks and to include data collections, while also describing the services and capabilities biobanks offer to their users, besides the retrospective samples. The terminology was also extended to accommodate sample and data collections of nonhuman origin. Additionally, a set of organizational attributes was compiled to describe networks. Discussion: The usability of MIABIS Core v3 was increased by extending it to cover more topics of the biobanking domain. Additionally, the focus was on a more general terminology and harmonization of attributes with the individual-level entities Sample, Sample donor, and Event to keep the overall terminology minimal.
With this work, the internal semantics of the MIABIS terminology was improved.
Subject(s)
Biological Specimen Banks; Information Dissemination; Terminology as Topic; Biological Specimen Banks/standards; Humans; Databases, Factual
ABSTRACT
Improving patient care and advancing scientific discovery requires responsible sharing of research data, healthcare records, biosamples, and biomedical resources that must also respect applicable use conditions. Defining a standard to structure and manage these use conditions is a complex and challenging task. This is exemplified by a nearly unlimited range of asset types, a high variability of applicable conditions, and differing applications at the individual or collective level. Furthermore, the specifics and granularity required are likely to vary depending on the ultimate contexts of use. All these factors confound alignment of institutional missions, funding objectives, regulatory and technical requirements to facilitate effective sharing. The presented work highlights the complexity and diversity of the problem, reviews the current state of the art, and emphasises the need for a flexible and adaptable approach. We propose Digital Use Conditions (DUC) as a framework that addresses these needs by leveraging existing standards, striking a balance between expressiveness and ambiguity, and considering the breadth of applicable information with their context of use.
Subject(s)
Information Dissemination; Humans
ABSTRACT
Open and practical exchange, dissemination, and reuse of specimens and data have become a fundamental requirement for life sciences research. The quality of the data obtained, and thus of the findings and knowledge derived, is significantly influenced by the quality of the samples, the experimental methods, and the data analysis. Therefore, comprehensive and precise documentation of the pre-analytical conditions, the analytical procedures, and the data processing is essential to be able to assess the validity of the research results. With the increasing importance of the exchange, reuse, and sharing of data and samples, procedures are required that enable cross-organizational documentation, traceability, and non-repudiation. At present, this information on the provenance of samples and data is mostly either sparse, incomplete, or incoherent. Since there is no uniform framework, this information is usually only provided within the organization and not interoperably. At the same time, the collection and sharing of biological and environmental specimens increasingly require definition and documentation of benefit sharing and compliance with regulatory requirements rather than consideration of pure scientific needs. In this publication, we present an ongoing standardization effort to provide trustworthy machine-actionable documentation of the data lineage and specimens. We would like to invite experts from the biotechnology and biomedical fields to further contribute to the standard.
ABSTRACT
Due to popular successes (e.g., ChatGPT), Artificial Intelligence (AI) is on everyone's lips today. When advances in biotechnology are combined with advances in AI, unprecedented new potential solutions become available. This can help with many global problems and contribute to important Sustainable Development Goals. Current examples include Food Security, Health and Well-being, Clean Water, Clean Energy, Responsible Consumption and Production, Climate Action, Life below Water, and the goal to protect, restore and promote sustainable use of terrestrial ecosystems, sustainably manage forests, combat desertification, halt and reverse land degradation, and halt biodiversity loss. AI is ubiquitous in the life sciences today. Topics span a wide range, including machine learning and Big Data analytics, knowledge discovery and data mining, biomedical ontologies, knowledge-based reasoning, natural language processing, decision support and reasoning under uncertainty, temporal and spatial representation and inference, and methodological aspects of explainable AI (XAI) with applications of biotechnology. In this pre-Editorial paper, we provide an overview of open research issues and challenges for each of the topics addressed in this special issue. Potential authors can directly use this as a guideline for developing their paper.
Subject(s)
Artificial Intelligence; Ecosystem; Biotechnology; Data Mining; Knowledge Bases
ABSTRACT
Diagnostic histopathology faces increasing demands due to aging populations and expanding healthcare programs. Semi-automated diagnostic systems employing deep learning methods are one approach to alleviate this pressure. The learning models for histopathology are inherently complex and opaque from the user's perspective. Hence different methods have been developed to interpret their behavior. However, relatively limited attention has been devoted to the connection between interpretation methods and the knowledge of experienced pathologists. The main contribution of this paper is a method for comparing morphological patterns used by expert pathologists to detect cancer with the patterns identified as important for inference of learning models. Given the patch-based nature of processing large-scale histopathological imaging, we have been able to show statistically that the VGG16 model could utilize all the structures that are observable by the pathologist, given the patch size and scan resolution. The results show that the neural network approach to recognizing prostatic cancer is similar to that of a pathologist at medium optical resolution. The saliency maps identified several prevailing histomorphological features characterizing carcinoma, e.g., single-layered epithelium, small lumina, and hyperchromatic nuclei with halo. A convincing finding was the recognition of their mimickers in non-neoplastic tissue. The method can also identify differences, i.e., standard patterns not used by the learning models and new patterns not yet used by pathologists. Saliency maps provide added value for automated digital pathology to analyze and fine-tune deep learning systems and improve trust in computer-based decisions.
Subject(s)
Neural Networks, Computer; Prostatic Neoplasms; Male; Humans; Prostatic Neoplasms/diagnostic imaging; Prostatic Neoplasms/pathology; Pathologists
ABSTRACT
AI development in biotechnology relies on high-quality data to train and validate algorithms. The FAIR principles (Findable, Accessible, Interoperable, and Reusable) and regulatory frameworks such as the In Vitro Diagnostic Regulation (IVDR) and the Medical Device Regulation (MDR) specify requirements on specimen and data provenance to ensure the quality and traceability of data used in AI development. In this paper, a framework is presented for recording and publishing provenance information to meet these requirements. The framework is based on the use of standardized models and protocols, such as the W3C PROV model and the ISO 23494 series, to capture and record provenance information at various stages of the data generation and analysis process. The framework and use case illustrate the role of provenance information in supporting the development of high-quality AI algorithms in biotechnology. Finally, the principles of the framework are illustrated in a simple computational pathology use case, showing how specimen and data provenance can be used in the development and documentation of an AI algorithm. The use case demonstrates the importance of managing and integrating distributed provenance information and highlights the complex task of considering factors such as semantic interoperability, confidentiality, and the verification of authenticity and integrity.
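To make the idea of recording provenance across the stages of data generation and analysis more concrete, the following is a minimal sketch using plain Python dictionaries laid out loosely in the style of the W3C PROV data model (entities, activities, and the `used` and `wasGeneratedBy` relations). It is not the framework described in the abstract and does not implement the ISO 23494 series; the `ex:` identifiers, the `lineage` helper and the three-step pathology chain are all hypothetical.

```python
# Hypothetical provenance document: specimen -> scanned slide image -> trained model.
doc = {
    "prefix": {"ex": "http://example.org/"},
    "entity": {"ex:specimen": {}, "ex:slide_image": {}, "ex:model": {}},
    "activity": {"ex:scanning": {}, "ex:training": {}},
    "used": {
        "_:u1": {"prov:activity": "ex:scanning", "prov:entity": "ex:specimen"},
        "_:u2": {"prov:activity": "ex:training", "prov:entity": "ex:slide_image"},
    },
    "wasGeneratedBy": {
        "_:g1": {"prov:entity": "ex:slide_image", "prov:activity": "ex:scanning"},
        "_:g2": {"prov:entity": "ex:model", "prov:activity": "ex:training"},
    },
}

def lineage(document, entity):
    """Walk wasGeneratedBy/used links back from an entity to its source entities."""
    generated_by = {
        rel["prov:entity"]: rel["prov:activity"]
        for rel in document["wasGeneratedBy"].values()
    }
    inputs = {}
    for rel in document["used"].values():
        inputs.setdefault(rel["prov:activity"], []).append(rel["prov:entity"])

    chain, frontier, seen = [], [entity], set()
    while frontier:
        current = frontier.pop()
        if current in seen:
            continue  # guard against revisiting an entity
        seen.add(current)
        activity = generated_by.get(current)
        if activity is None:
            continue  # no recorded generating activity: treat as an origin
        for source in inputs.get(activity, []):
            chain.append((current, activity, source))
            frontier.append(source)
    return chain

trace = lineage(doc, "ex:model")
for derived, activity, source in trace:
    print(f"{derived} was generated by {activity}, which used {source}")
```

In a distributed setting each organisation would hold only part of such a document, which is where the integration, confidentiality and integrity-verification issues raised in the abstract come in; this sketch sidesteps all of them by keeping everything in one dictionary.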
Subject(s)
Algorithms; Biotechnology; Artificial Intelligence
ABSTRACT
It is assumed that the stimulatory effects of elevated CO2 concentration ([CO2]) on photosynthesis and growth may be substantially reduced by co-occurring environmental factors and the length of CO2 treatment. Here, we present a study exploring the interactive effects of three manipulated factors ([CO2], nitrogen supply and water availability) on physiological (gas-exchange and chlorophyll fluorescence), morphological and stoichiometric traits of Norway spruce (Picea abies) saplings after 2 and 3 years of treatment under natural field conditions. Such multifactorial studies, going beyond two-way interactions, have received only limited attention until now. Our findings imply a significant reduction of the [CO2]-enhanced rate of CO2 assimilation under reduced water availability, which deepens with the severity of water depletion. Similarly, insufficient nitrogen availability leads to a down-regulation of photosynthesis under elevated [CO2], being particularly associated with reduced carboxylation efficiency of the Rubisco enzyme. Such adjustments in the photosynthesis machinery result in the stimulation of water-use efficiency under elevated [CO2] only when it is combined with a high nitrogen supply and reduced water availability. These findings indicate limited effects of elevated [CO2] on carbon uptake in temperate coniferous forests when combined with naturally low nitrogen availability and intensifying droughts during the summer periods. Such interactions have to be incorporated into the mechanistic models predicting changes in terrestrial carbon sequestration and forest growth in the future.
Subject(s)
Abies; Picea; Carbon Dioxide/physiology; Picea/physiology; Nitrogen; Water; Temperature; Photosynthesis; Plant Leaves/physiology
ABSTRACT
Access to large volumes of so-called whole-slide images (high-resolution scans of complete pathological slides) has become a cornerstone of the development of novel artificial intelligence methods in pathology for diagnostic use, education/training of pathologists, and research. Nevertheless, a methodology based on risk analysis for evaluating the privacy risks associated with sharing such imaging data and applying the principle "as open as possible and as closed as necessary" is still lacking. In this article, we develop a model for privacy risk analysis for whole-slide images which focuses primarily on identity disclosure attacks, as these are the most important from a regulatory perspective. We introduce a taxonomy of whole-slide images with respect to privacy risks and a mathematical model for risk assessment and design. Based on this risk assessment model and the taxonomy, we conduct a series of experiments to demonstrate the risks using real-world imaging data. Finally, we develop guidelines for risk assessment and recommendations for low-risk sharing of whole-slide image data.
Subject(s)
Artificial Intelligence; Privacy; Image Processing, Computer-Assisted/methods; Diagnostic Imaging/methods
ABSTRACT
Data quality has recently become a critical topic for the research community. European guidelines recommend that scientific data should be made FAIR: findable, accessible, interoperable and reusable. However, as FAIR guidelines do not specify how the stated principles should be implemented, it might not be straightforward for researchers to know how to actually make their data FAIR. This can prevent life-science researchers from sharing their datasets and pipelines, ultimately hindering the progress of research. To address this difficulty, we developed the BIBBOX, a platform that supports researchers in publishing their datasets and the associated software in a FAIR manner.