Results 1 - 20 of 595
1.
Physiol Rev ; 104(3): 1387-1408, 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38451234

ABSTRACT

Effective data management is crucial for scientific integrity and reproducibility, a cornerstone of scientific progress. Well-organized and well-documented data enable validation and building on results. Data management encompasses activities including organization, documentation, storage, sharing, and preservation. Robust data management establishes credibility, fostering trust within the scientific community and benefiting researchers' careers. In experimental biomedicine, comprehensive data management is vital due to the typically intricate protocols, extensive metadata, and large datasets. Low-throughput experiments, in particular, require careful management to address variations and errors in protocols and raw data quality. Transparent and accountable research practices rely on accurate documentation of procedures, data collection, and analysis methods. Proper data management ensures long-term preservation and accessibility of valuable datasets. Well-managed data can be revisited, contributing to cumulative knowledge and potential new discoveries. Publicly funded research has an added responsibility for transparency, resource allocation, and avoiding redundancy. Meeting funding agency expectations increasingly requires rigorous methodologies, adherence to standards, comprehensive documentation, and widespread sharing of data, code, and other auxiliary resources. This review provides critical insights into raw and processed data, metadata, high-throughput versus low-throughput datasets, a common language for documentation, experimental and reporting guidelines, efficient data management systems, sharing practices, and relevant repositories. We systematically present available resources and optimal practices for wide use by experimental biomedical researchers.


Subject(s)
Biomedical Research , Data Management , Information Dissemination , Biomedical Research/standards , Biomedical Research/methods , Information Dissemination/methods , Humans , Animals , Data Management/methods
2.
BMC Bioinformatics ; 25(1): 184, 2024 May 09.
Article in English | MEDLINE | ID: mdl-38724907

ABSTRACT

BACKGROUND: Major advances in sequencing technologies and the sharing of data and metadata in science have resulted in a wealth of publicly available datasets. However, working with and especially curating public omics datasets remains challenging despite these efforts. While a growing number of initiatives aim to re-use previous results, these present limitations that often lead to the need for further in-house curation and processing. RESULTS: Here, we present the Omics Dataset Curation Toolkit (OMD Curation Toolkit), a python3 package designed to accompany and guide the researcher during the curation process of metadata and fastq files of public omics datasets. This workflow provides a standardized framework with multiple capabilities (collection, control check, treatment and integration) to facilitate the arduous task of curating public sequencing data projects. While centered on the European Nucleotide Archive (ENA), the majority of the provided tools are generic and can be used to curate datasets from different sources. CONCLUSIONS: Thus, it offers valuable tools for the in-house curation previously needed to re-use public omics data. Due to its workflow structure and capabilities, it can be easily used and benefit investigators in developing novel omics meta-analyses based on sequencing data.


Subject(s)
Data Curation , Software , Workflow , Data Curation/methods , Metadata , Databases, Genetic , Genomics/methods , Computational Biology/methods
3.
J Synchrotron Radiat ; 31(Pt 2): 312-321, 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-38300131

ABSTRACT

In recent years, China's advanced light sources have entered a period of rapid construction and development. As modern X-ray detectors and data acquisition technologies advance, these facilities are expected to generate massive volumes of data annually, presenting significant challenges in data management and utilization. These challenges encompass data storage, metadata handling, data transfer and user data access. In response, the Data Organization Management Access Software (DOMAS) has been designed as a framework to address these issues. DOMAS encapsulates four fundamental modules of data management software, including metadata catalogue, metadata acquisition, data transfer and data service. For light source facilities, building a data management system only requires parameter configuration and minimal code development within DOMAS. This paper firstly discusses the development of advanced light sources in China and the associated demands and challenges in data management, prompting a reconsideration of data management software framework design. It then outlines the architecture of the framework, detailing its components and functions. Lastly, it highlights the application progress and effectiveness of DOMAS when deployed for the High Energy Photon Source (HEPS) and Beijing Synchrotron Radiation Facility (BSRF).

4.
Appl Environ Microbiol ; 90(2): e0171923, 2024 Feb 21.
Article in English | MEDLINE | ID: mdl-38193672

ABSTRACT

Application of organic fertilizers is an important strategy for sustainable agriculture. The biological source of organic fertilizers determines their specific functional characteristics, but few studies have systematically examined these functions or assessed their health risk to soil ecology. To fill this gap, we analyzed 16S rRNA gene amplicon sequencing data from 637 soil samples amended with plant- and animal-derived organic fertilizers (hereafter plant fertilizers and animal fertilizers). Results showed that animal fertilizers increased the diversity of soil microbiome, while plant fertilizers maintained the stability of soil microbial community. Microcosm experiments verified that plant fertilizers were beneficial to plant root development and increased carbon cycle pathways, while animal fertilizers enriched nitrogen cycle pathways. Compared with animal fertilizers, plant fertilizers harbored a lower abundance of risk factors such as antibiotic resistance genes and viruses. Consequently, plant fertilizers might be more suitable for long-term application in agriculture. This work provides a guide for organic fertilizer selection from the perspective of soil microecology and promotes sustainable development of organic agriculture. IMPORTANCE: This study provides valuable guidance for use of organic fertilizers in agricultural production from the perspective of the microbiome and ecological risk.


Subject(s)
Microbiota , Rhizosphere , Animals , Fertilizers , RNA, Ribosomal, 16S/genetics , Microbiota/genetics , Soil , Plants/genetics , Soil Microbiology , Plant Roots
5.
J Microsc ; 295(2): 93-101, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38532662

ABSTRACT

As microscopy diversifies and becomes ever more complex, the problem of quantification of microscopy images has emerged as a major roadblock for many researchers. All researchers must face certain challenges in turning microscopy images into answers, independent of their scientific question and the images they have generated. Challenges may arise at many stages throughout the analysis process, including handling of the image files, image pre-processing, object finding, or measurement, and statistical analysis. While the exact solution required for each obstacle will be problem-specific, by keeping analysis in mind, optimizing data quality, understanding tools and tradeoffs, breaking workflows and data sets into chunks, talking to experts, and thoroughly documenting what has been done, analysts at any experience level can learn to overcome these challenges and create better and easier image analyses.

6.
J Biomed Inform ; 157: 104669, 2024 Jun 15.
Article in English | MEDLINE | ID: mdl-38880237

ABSTRACT

BACKGROUND: Studies confirm that significant biases exist in online recommendation platforms, exacerbating pre-existing disparities and leading to less-than-optimal outcomes for underrepresented demographics. We study issues of bias in inclusion and representativeness in the context of healthcare information disseminated via videos on the YouTube social media platform, a widely used online channel for multi-media rich information. With one in three US adults using the Internet to learn about a health concern, it is critical to assess inclusivity and representativeness regarding how health information is disseminated by digital platforms such as YouTube. METHODS: Leveraging methods from fair machine learning (ML), natural language processing and voice and facial recognition methods, we examine inclusivity and representativeness of video content presenters using a large corpus of videos and their metadata on a chronic condition (diabetes) extracted from the YouTube platform. Regression models are used to determine whether presenter demographics impact video popularity, measured by the video's average daily view count. A video that generates a higher view count is considered to be more popular. RESULTS: The voice and facial recognition methods predicted the gender and race of the presenter with reasonable success. Gender is predicted through voice recognition (accuracy = 78%, AUC = 76%), while the gender and race predictions use facial recognition (accuracy = 93%, AUC = 92% and accuracy = 82%, AUC = 80%, respectively). The gender of the presenter is more significant for video views only when the face of the presenter is not visible, while videos with male presenters with no face visibility have a positive relationship with view counts. Furthermore, videos with white and male presenters have a positive influence on view counts, while videos with female and non-white presenters have high view counts.
CONCLUSION: Presenters' demographics do have an influence on average daily view count of videos viewed on social media platforms as shown by advanced voice and facial recognition algorithms used for assessing inclusion and representativeness of the video content. Future research can explore short videos and those at the channel level because popularity of the channel name and the number of videos associated with that channel do have an influence on view counts.

7.
Pharmacoepidemiol Drug Saf ; 33(8): e5871, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39145406

ABSTRACT

PURPOSE: Metadata for data dIscoverability aNd study rEplicability in obseRVAtional studies (MINERVA), a European Medicines Agency-funded project (EUPAS39322), defined a set of metadata to describe real-world data sources (RWDSs) and piloted metadata collection in a prototype catalogue to assist investigators from data source discoverability through study conduct. METHODS: A list of metadata was created from a review of existing metadata catalogues and recommendations, structured interviews, a stakeholder survey, and a technical workshop. The prototype was designed to comply with the FAIR principles (findable, accessible, interoperable, reusable), using MOLGENIS software. Metadata collection was piloted by 15 data access partners (DAPs) from across Europe. RESULTS: A total of 442 metadata variables were defined in six domains: institutions (organizations connected to a data source); data banks (data collections sustained by an organization); data sources (collections of linkable data banks covering a common underlying population); studies; networks (of institutions); and common data models (CDMs). A total of 26 institutions were recorded in the prototype. Each DAP populated the metadata of one data source and its selected data banks. The number of data banks varied by data source; the most common data banks were hospital administrative records and pharmacy dispensation records (10 data sources each). Quantitative metadata were successfully extracted from three data sources conforming to different CDMs and entered into the prototype. CONCLUSIONS: A metadata list was finalized, a prototype was successfully populated, and a good practice guide was developed. Setting up and maintaining a metadata catalogue on RWDSs will require substantial effort to support discoverability of data sources and reproducibility of studies in Europe.


Subject(s)
Metadata , Observational Studies as Topic , Europe , Humans , Pilot Projects , Reproducibility of Results , Observational Studies as Topic/methods , Data Collection/methods , Data Collection/standards , Databases, Factual/statistics & numerical data , Software , Pharmacoepidemiology/methods
8.
Proc Natl Acad Sci U S A ; 118(34)2021 08 24.
Article in English | MEDLINE | ID: mdl-34404731

ABSTRACT

Genomic data are being produced and archived at a prodigious rate, and current studies could become historical baselines for future global genetic diversity analyses and monitoring programs. However, when we evaluated the potential utility of genomic data from wild and domesticated eukaryote species in the world's largest genomic data repository, we found that most archived genomic datasets (86%) lacked the spatiotemporal metadata necessary for genetic biodiversity surveillance. Labor-intensive scouring of a subset of published papers yielded geospatial coordinates and collection years for only 33% (39% if place names were considered) of these genomic datasets. Streamlined data input processes, updated metadata deposition policies, and enhanced scientific community awareness are urgently needed to preserve these irreplaceable records of today's genetic biodiversity and to plug the growing metadata gap.
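
The kind of completeness check behind a figure like "86% lacked spatiotemporal metadata" can be sketched as follows. This is a minimal illustration, not the authors' method: the field names (`latitude`, `longitude`, `collection_year`) and the example records are hypothetical, and real archives use their own attribute names.

```python
from typing import Iterable, Mapping

# Hypothetical field names chosen for illustration only.
REQUIRED_FIELDS = ("latitude", "longitude", "collection_year")


def has_spatiotemporal_metadata(record: Mapping[str, object]) -> bool:
    """Return True when every required field is present and non-empty."""
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            return False
    return True


def fraction_missing(records: Iterable[Mapping[str, object]]) -> float:
    """Share of records that lack at least one required field."""
    records = list(records)
    if not records:
        raise ValueError("no records supplied")
    missing = sum(1 for r in records if not has_spatiotemporal_metadata(r))
    return missing / len(records)


# Made-up records, not data from the study.
example_records = [
    {"accession": "EX0001", "latitude": -33.9, "longitude": 18.4, "collection_year": 2015},
    {"accession": "EX0002", "latitude": "", "longitude": "", "collection_year": 2012},
    {"accession": "EX0003", "collection_year": 2018},
    {"accession": "EX0004", "latitude": 51.5, "longitude": -0.1, "collection_year": None},
]

share = fraction_missing(example_records)
print(f"{share:.0%} of example records lack spatiotemporal metadata")
```

A record counts as complete only if all three fields are filled in, which mirrors the abstract's requirement for both spatial and temporal context.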


Subject(s)
Biodiversity , Data Accuracy , Eukaryota/genetics , Genetic Variation , Genome , Genomics/methods , Population Dynamics
9.
BMC Med Inform Decis Mak ; 24(1): 136, 2024 May 27.
Article in English | MEDLINE | ID: mdl-38802886

ABSTRACT

BACKGROUND: The selection of data elements is a decisive task within the development of a health registry. Having the right metadata is crucial for answering the particular research questions. Furthermore, the set of data elements determines the registries' readiness for interoperability and data reusability to a major extent. Six health registries shared and published their metadata within a German funding initiative. As one step in the direction of a common set of data elements, a selection of those metadata was evaluated with regard to their appropriateness for a broader usage. METHODS: Each registry was asked to contribute a 10%-selection of their data elements to an evaluation sample. The survey was set up with the online survey tool "LimeSurvey Cloud". The registries and an accompanying project participated in the survey with one vote for each project. The data elements were offered in content groups along with the question of whether the data element is appropriate for health registries on a broader scale. The question could be answered using a Likert scale with five options. Furthermore, "no answer" was allowed. The level of agreement was assessed using weighted Cohen's kappa and Kendall's coefficient of concordance. RESULTS: The evaluation sample consisted of 269 data elements. With a grade of "perhaps recommendable" or higher in the mean, 169 data elements were selected. These data elements belong mainly to the groups demography, education/occupation, medication, and nutrition. Half of the registries lost significance compared with their percentage of data elements in the evaluation sample, one remained stable. The level of concordance was adequate. CONCLUSIONS: The survey revealed a set of 169 data elements recommended for health registries. When developing a registry, this set could be a valuable help in selecting the metadata appropriate to answer the registry's research questions.
However, due to the high specificity of research questions, data elements beyond this set will be needed to cover the whole range of interests of a register. A broader discussion and subsequent surveys are needed to establish a common set of data elements on an international scale.
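
Kendall's coefficient of concordance, named in the methods above, can be sketched as follows. The sketch assumes each rater's answers are already converted to ranks with no ties; Likert-scale answers typically produce ties and need a tie-corrected formula that is not shown here. The function name and the example ranks are illustrative and not taken from the study.

```python
from typing import Sequence


def kendalls_w(rank_rows: Sequence[Sequence[float]]) -> float:
    """Kendall's W for m raters (rows) ranking n items (columns), without ties.

    W = 12 * S / (m^2 * (n^3 - n)), where S is the sum of squared deviations
    of the per-item rank sums from their mean.
    """
    m = len(rank_rows)
    if m < 2:
        raise ValueError("need at least two raters")
    n = len(rank_rows[0])
    if n < 2 or any(len(row) != n for row in rank_rows):
        raise ValueError("every rater must rank the same two or more items")

    # Sum the ranks each item received across all raters.
    rank_sums = [sum(row[i] for row in rank_rows) for i in range(n)]
    mean_sum = sum(rank_sums) / n
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Made-up ranks: three raters, four items.
full_agreement = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
one_dissenter = [[1, 2, 3, 4], [1, 2, 3, 4], [4, 3, 2, 1]]

print(kendalls_w(full_agreement))
print(kendalls_w(one_dissenter))
```

W ranges from 0 (no agreement) to 1 (identical rankings), so a single rater who reverses the order pulls the value down sharply in a small panel.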


Subject(s)
Registries , Registries/standards , Germany , Humans , Surveys and Questionnaires , Metadata
10.
Sensors (Basel) ; 24(4)2024 Feb 14.
Article in English | MEDLINE | ID: mdl-38400376

ABSTRACT

In this paper, we address the challenge of detecting small moving targets in dynamic environments characterized by the concurrent movement of both platform and sensor. In such cases, simple image-based frame registration and optical flow analysis cannot be used to detect moving targets. To tackle this, it is necessary to use sensor and platform meta-data in addition to image analysis for temporal and spatial anomaly detection. To this end, we investigate techniques that utilize inertial data to enhance frame-to-frame registration, consistently yielding improved detection outcomes when compared against purely feature-based techniques. For cases where image registration is not possible even with metadata, we propose single-frame spatial anomaly detection and then estimate the range to the target using the platform velocity. The behavior of the estimated range over time helps us to discern targets from clutter. Finally, we show that a KNN classifier can be used to further reduce the false alarm rate without a significant reduction in detection performance. The proposed strategies offer a robust solution for the detection of moving targets in dynamically challenging settings.

11.
BMC Bioinformatics ; 24(1): 159, 2023 Apr 20.
Article in English | MEDLINE | ID: mdl-37081398

ABSTRACT

BACKGROUND: Biomedical researchers are strongly encouraged to make their research outputs more Findable, Accessible, Interoperable, and Reusable (FAIR). While many biomedical research outputs are more readily accessible through open data efforts, finding relevant outputs remains a significant challenge. Schema.org is a metadata vocabulary standardization project that enables web content creators to make their content more FAIR. Leveraging Schema.org could benefit biomedical research resource providers, but it can be challenging to apply Schema.org standards to biomedical research outputs. We created an online browser-based tool that empowers researchers and repository developers to utilize Schema.org or other biomedical schema projects. RESULTS: Our browser-based tool includes features which can help address many of the barriers towards Schema.org-compliance such as: The ability to easily browse for relevant Schema.org classes, the ability to extend and customize a class to be more suitable for biomedical research outputs, the ability to create data validation to ensure adherence of a research output to a customized class, and the ability to register a custom class to our schema registry enabling others to search and re-use it. We demonstrate the use of our tool with the creation of the Outbreak.info schema-a large multi-class schema for harmonizing various COVID-19 related resources. CONCLUSIONS: We have created a browser-based tool to empower biomedical research resource providers to leverage Schema.org classes to make their research outputs more FAIR.
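
To make the idea of Schema.org markup concrete, here is a minimal sketch that builds a JSON-LD description using the standard Schema.org `Dataset` type. It is not output from the tool described above; the helper name `build_dataset_jsonld` and all example values are assumptions for illustration.

```python
import json
from typing import Dict, List


def build_dataset_jsonld(
    name: str,
    description: str,
    identifier: str,
    keywords: List[str],
    license_url: str,
) -> Dict[str, object]:
    """Assemble a JSON-LD document that uses the Schema.org Dataset type."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "identifier": identifier,
        "keywords": keywords,
        "license": license_url,
    }


# Placeholder values, not a real dataset.
doc = build_dataset_jsonld(
    name="Example amplicon sequencing dataset",
    description="Illustrative record showing Schema.org Dataset markup.",
    identifier="example-0001",
    keywords=["metadata", "FAIR", "sequencing"],
    license_url="https://creativecommons.org/licenses/by/4.0/",
)

print(json.dumps(doc, indent=2))
```

Embedding such a block in a landing page inside a `<script type="application/ld+json">` element is the usual way to expose it to web crawlers, which is what makes the output easier to find.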


Subject(s)
Biomedical Research , COVID-19 , Humans , Metadata
12.
BMC Bioinformatics ; 24(1): 299, 2023 Jul 24.
Article in English | MEDLINE | ID: mdl-37482620

ABSTRACT

BACKGROUND: An updated version of the mwtab Python package for programmatic access to the Metabolomics Workbench (MetabolomicsWB) data repository was released at the beginning of 2021. Along with updating the package to match the changes to MetabolomicsWB's 'mwTab' file format specification and enhancing the package's functionality, the included validation facilities were used to detect and catalog file inconsistencies and errors across all publicly available datasets in MetabolomicsWB. RESULTS: The MetabolomicsWB File Status website was developed to provide continuous validation of MetabolomicsWB data files and a useful interface to all found inconsistencies and errors. This list of detectable issues/errors include format parsing errors, format compliance issues, access problems via MetabolomicsWB's REST interface, and other small inconsistencies that can hinder reusability. The website uses the mwtab Python package to pull down and validate each available analysis file and then generates an html report. The website is updated on a weekly basis. Moreover, the Python website design utilizes GitHub and GitHub.io, providing an easy to replicate template for implementing other metadata, virtual, and meta- repositories. CONCLUSIONS: The MetabolomicsWB File Status website provides a metadata repository of validation metadata to promote the FAIR use of existing metabolomics datasets from the MetabolomicsWB data repository.


Subject(s)
Metadata , Software , Metabolomics , Information Storage and Retrieval
13.
Biol Chem ; 404(5): 433-439, 2023 04 25.
Article in English | MEDLINE | ID: mdl-36853922

ABSTRACT

While the FAIR (Findable, Accessible, Interoperable, and Re-usable) principles are well accepted in the scientific community, there are still many challenges in implementing them in the day-to-day scientific process. Data management of microscopy images poses special challenges due to the volume, variety, and many proprietary formats. In particular, appropriate metadata collection, a basic requirement for FAIR data, is a real challenge for scientists due to the technical and content-related aspects. Researchers benefit here from interdisciplinary research network with centralized data management. The typically multimodal structure requires generalized data management and the corresponding acquisition of metadata. Here we report on the establishment of an appropriate infrastructure for the research network by a Core Facility and the development and integration of a software tool MDEmic that allows easy and convenient processing of metadata of microscopy images while providing high flexibility in terms of customization of metadata sets. Since it is also in the interest of the core facility to apply standards regarding the scope and serialization formats to realize successful and sustainable data management for bioimaging, we report on our efforts within the community to define standards in metadata, interfaces, and to reduce the barriers of daily data management.


Subject(s)
Data Management , Software , Metadata
14.
Brief Bioinform ; 22(1): 30-44, 2021 01 18.
Article in English | MEDLINE | ID: mdl-32496509

ABSTRACT

Thousands of new experimental datasets are becoming available every day; in many cases, they are produced within the scope of large cooperative efforts, involving a variety of laboratories spread all over the world, and typically open for public use. Although the potential collective amount of available information is huge, the effective combination of such public sources is hindered by data heterogeneity, as the datasets exhibit a wide variety of notations and formats, concerning both experimental values and metadata. Thus, data integration is becoming a fundamental activity, to be performed prior to data analysis and biological knowledge discovery, consisting of subsequent steps of data extraction, normalization, matching and enrichment; once applied to heterogeneous data sources, it builds multiple perspectives over the genome, leading to the identification of meaningful relationships that could not be perceived by using incompatible data formats. In this paper, we first describe a technological pipeline from data production to data integration; we then propose a taxonomy of genomic data players (based on the distinction between contributors, repository hosts, consortia, integrators and consumers) and apply the taxonomy to describe about 30 important players in genomic data management. We specifically focus on the integrator players and analyse the issues in solving the genomic data integration challenges, as well as evaluate the computational environments that they provide to follow up data integration by means of visualization and analysis tools.


Subject(s)
Data Management/methods , Genome, Human , Genomics/methods , Humans , Metadata
15.
Brief Bioinform ; 22(2): 664-675, 2021 03 22.
Article in English | MEDLINE | ID: mdl-33348368

ABSTRACT

With the outbreak of the COVID-19 disease, the research community is producing unprecedented efforts dedicated to better understand and mitigate the effects of the pandemic. In this context, we review the data integration efforts required for accessing and searching genome sequences and metadata of SARS-CoV2, the virus responsible for the COVID-19 disease, which have been deposited into the most important repositories of viral sequences. Organizations that were already present in the virus domain are now dedicating special interest to the emergence of COVID-19 pandemics, by emphasizing specific SARS-CoV2 data and services. At the same time, novel organizations and resources were born in this critical period to serve specifically the purposes of COVID-19 mitigation while setting the research ground for contrasting possible future pandemics. Accessibility and integration of viral sequence data, possibly in conjunction with the human host genotype and clinical data, are paramount to better understand the COVID-19 disease and mitigate its effects. Few examples of host-pathogen integrated datasets exist so far, but we expect them to grow together with the knowledge of COVID-19 disease; once such datasets will be available, useful integrative surveillance mechanisms can be put in place by observing how common variants distribute in time and space, relating them to the phenotypic impact evidenced in the literature.


Subject(s)
COVID-19/therapy , COVID-19/epidemiology , COVID-19/virology , Genes, Viral , Humans , Information Storage and Retrieval , Pandemics , SARS-CoV-2/genetics , SARS-CoV-2/isolation & purification
16.
Histochem Cell Biol ; 160(3): 211-221, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37537341

ABSTRACT

Biological imaging is one of the primary tools by which we understand living systems across scales from atoms to organisms. Rapid advances in imaging technology have increased both the spatial and temporal resolutions at which we examine those systems, as well as enabling visualisation of larger tissue volumes. These advances have huge potential but also generate ever increasing amounts of imaging data that must be stored and analysed. Public image repositories provide a critical scientific service through open data provision, supporting reproducibility of scientific results, access to reference imaging datasets and reuse of data for new scientific discovery and acceleration of image analysis methods development. The scale and scope of imaging data provides both challenges and opportunities for open sharing of image data. In this article, we provide a perspective influenced by decades of provision of open data resources for biological information, suggesting areas to focus on and a path towards global interoperability.


Subject(s)
Image Processing, Computer-Assisted , Reproducibility of Results
17.
Histochem Cell Biol ; 160(3): 199-209, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37341795

ABSTRACT

Bioimaging has now entered the era of big data with faster-than-ever development of complex microscopy technologies leading to increasingly complex datasets. This enormous increase in data size and informational complexity within those datasets has brought with it several difficulties in terms of common and harmonized data handling, analysis, and management practices, which are currently hampering the full potential of image data being realized. Here, we outline a wide range of efforts and solutions currently being developed by the microscopy community to address these challenges on the path towards FAIR bioimaging data. We also highlight how different actors in the microscopy ecosystem are working together, creating synergies that develop new approaches, and how research infrastructures, such as Euro-BioImaging, are fostering these interactions to shape the field.


Subject(s)
Ecosystem , Microscopy
18.
Histochem Cell Biol ; 160(3): 169-192, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37052655

ABSTRACT

The second decade of the twenty-first century witnessed a new challenge in the handling of microscopy data. Big data, data deluge, large data, data compliance, data analytics, data integrity, data interoperability, data retention and data lifecycle are terms that have introduced themselves to the electron microscopy sciences. This is largely attributed to the booming development of new microscopy hardware tools. As a result, large digital image files with an average size of one terabyte within one single acquisition session are not uncommon nowadays, especially in the field of cryogenic electron microscopy. This brings along numerous challenges in data transfer, compute and management. In this review, we will discuss in detail the current state of international knowledge on big data in contemporary electron microscopy and how big data can be transferred, computed and managed efficiently and sustainably. Workflows, solutions, approaches and suggestions will be provided, with the example of the latest experiences in Australia. Finally, important principles such as data integrity, data lifetime and the FAIR and CARE principles will be considered.


Subject(s)
Big Data , Microscopy, Electron
19.
Biol Lett ; 19(11): 20230358, 2023 11.
Article in English | MEDLINE | ID: mdl-37964576

ABSTRACT

Africa experiences frequent emerging disease outbreaks among humans, with bats often proposed as zoonotic pathogen hosts. We comprehensively reviewed virus-bat findings from papers published between 1978 and 2020 to evaluate the evidence that African bats are reservoir and/or bridging hosts for viruses that cause human disease. We present data from 162 papers (of 1322) with original findings on (1) numbers and species of bats sampled across bat families and the continent, (2) how bats were selected for study inclusion, (3) if bats were terminally sampled, (4) what types of ecological data, if any, were recorded and (5) which viruses were detected and with what methodology. We propose a scheme for evaluating presumed virus-host relationships by evidence type and quality, using the contrasting available evidence for Orthoebolavirus versus Orthomarburgvirus as an example. We review the wording in abstracts and discussions of all 162 papers, identifying key framing terms, how these refer to findings, and how they might contribute to people's beliefs about bats. We discuss the impact of scientific research communication on public perception and emphasize the need for strategies that minimize human-bat conflict and support bat conservation. Finally, we make recommendations for best practices that will improve virological study metadata.


Subject(s)
Chiroptera , Viruses , Animals , Humans , Disease Reservoirs , Africa
20.
Conserv Biol ; 37(4): e14061, 2023 08.
Article in English | MEDLINE | ID: mdl-36704891

ABSTRACT

Genetic diversity within species represents a fundamental yet underappreciated level of biodiversity. Because genetic diversity can indicate species resilience to changing climate, its measurement is relevant to many national and global conservation policy targets. Many studies produce large amounts of genome-scale genetic diversity data for wild populations, but most (87%) do not include the associated spatial and temporal metadata necessary for them to be reused in monitoring programs or for acknowledging the sovereignty of nations or Indigenous peoples. We undertook a distributed datathon to quantify the availability of these missing metadata and to test the hypothesis that their availability decays with time. We also worked to remediate missing metadata by extracting them from associated published papers, online repositories, and direct communication with authors. Starting with 848 candidate genomic data sets (reduced representation and whole genome) from the International Nucleotide Sequence Database Collaboration, we determined that 561 contained mostly samples from wild populations. We successfully restored spatiotemporal metadata for 78% of these 561 data sets (n = 440 data sets with data on 45,105 individuals from 762 species in 17 phyla). Examining papers and online repositories was much more fruitful than contacting 351 authors, who replied to our email requests 45% of the time. Overall, 23% of our email queries to authors unearthed useful metadata. The probability of retrieving spatiotemporal metadata declined significantly as age of the data set increased. There was a 13.5% yearly decrease in metadata associated with published papers or online repositories and up to a 22% yearly decrease in metadata that were only available from authors. 
This rapid decay in metadata availability, mirrored in studies of other types of biological data, should motivate swift updates to data-sharing policies and researcher practices to ensure that the valuable context provided by metadata is not lost to conservation science forever.


Importance of timely metadata curation for the global surveillance of genetic diversity


Subject(s)
Conservation of Natural Resources, Metadata, Humans, Biodiversity, Probability, Genetic Variation