Results 1 - 20 of 693
1.
BMC Med ; 22(1): 296, 2024 Jul 18.
Article in English | MEDLINE | ID: mdl-39020355

ABSTRACT

BACKGROUND: Sexually transmitted infections (STIs) pose a significant global public health challenge. Early diagnosis and treatment reduce STI transmission, but rely on recognising symptoms and care-seeking behaviour of the individual. Digital health software that distinguishes STI skin conditions could improve health-seeking behaviour. We developed and evaluated a deep learning model to differentiate STIs from non-STIs based on clinical images and symptoms. METHODS: We used 4913 clinical images of genital lesions and metadata from the Melbourne Sexual Health Centre collected during 2010-2023. We developed two binary classification models to distinguish STIs from non-STIs: (1) a convolutional neural network (CNN) using images only and (2) an integrated model combining both CNN and fully connected neural network (FCN) using images and metadata. We evaluated the model performance by the area under the ROC curve (AUC) and assessed metadata contributions to the Image-only model. RESULTS: Our study included 1583 STI and 3330 non-STI images. Common STI diagnoses were syphilis (34.6%), genital warts (24.5%) and herpes (19.4%), while most non-STIs (80.3%) were conditions such as dermatitis, lichen sclerosus and balanitis. In both STI and non-STI groups, the most frequently observed groups were 25-34 years (48.6% and 38.2%, respectively) and heterosexual males (60.3% and 45.9%, respectively). The Image-only model showed a reasonable performance with an AUC of 0.859 (SD 0.013). The Image + Metadata model achieved a significantly higher AUC of 0.893 (SD 0.018) compared to the Image-only model (p < 0.01). Of the 21 metadata variables, the integration of demographic and dermatological metadata led to the most significant improvement in model performance, increasing AUC by 6.7% compared to the baseline Image-only model. CONCLUSIONS: The Image + Metadata model outperformed the Image-only model in distinguishing STIs from other skin conditions. Using it as a screening tool in a clinical setting may require further development and evaluation with larger datasets.
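The image + metadata integration described above can be sketched as a late-fusion step: an image embedding (as a CNN would produce) is concatenated with encoded metadata and passed to a classifier head. The minimal sketch below uses toy numbers and a hand-weighted logistic head; the feature values, dimensions, and weights are illustrative assumptions, not the paper's actual architecture.

```python
import math

def fuse_features(image_embedding, metadata_features):
    """Late fusion: concatenate an image embedding with encoded metadata."""
    return image_embedding + metadata_features

def logistic_score(features, weights, bias=0.0):
    """Toy linear classifier head over the fused feature vector."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # probability the lesion is an STI

# Illustrative numbers only: a 3-d image embedding plus 2 metadata features
# (e.g. age group and anatomical site, one-hot or scaled).
fused = fuse_features([0.4, -1.2, 0.7], [1.0, 0.0])
prob = logistic_score(fused, weights=[0.5, -0.3, 0.8, 0.6, -0.1])
print(len(fused), round(prob, 3))
```

In a trained model the weights would of course be learned jointly with the CNN and FCN, rather than fixed by hand as here.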


Subject(s)
Metadata , Sexually Transmitted Diseases , Humans , Sexually Transmitted Diseases/diagnosis , Male , Female , Adult , Artificial Intelligence , Middle Aged , Neural Networks, Computer , Young Adult , Mass Screening/methods , Skin Diseases/diagnosis , Deep Learning
2.
Clin Transl Sci ; 17(7): e13886, 2024 Jul.
Article in English | MEDLINE | ID: mdl-39046315

ABSTRACT

Real-world evidence (RWE) trials have a key advantage over conventional randomized controlled trials (RCTs) due to their potentially better generalizability. High generalizability of study results facilitates new biological insights and enables targeted therapeutic strategies. Random sampling of RWE trial participants is regarded as the gold standard for generalizability. Additionally, the use of sample correction procedures can increase the generalizability of trial results, even when using nonrandomly sampled real-world data (RWD). This study presents descriptive evidence on the extent to which the design of currently planned or already conducted RWE trials takes sampling into account. It also examines whether random sampling or procedures for correcting nonrandom samples are considered. Based on text mining of publicly available metadata provided during registrations of RWE trials on clinicaltrials.gov, EU-PAS, and the OSF-RWE registry, it is shown that the share of RWE trial registrations with information on sampling increased from 65.27% in 2002 to 97.43% in 2022, with a corresponding increase from 14.79% to 28.30% for trials with random samples. For RWE trials with nonrandom samples, there is an increase from 0.00% to 0.95% of trials in which sample correction procedures are used. We conclude that the potential benefits of RWD in terms of generalizing trial results are not yet being fully realized.


Subject(s)
Data Mining , Research Design , Humans , Data Mining/methods , Randomized Controlled Trials as Topic/statistics & numerical data , Registries/statistics & numerical data , Clinical Trials as Topic/statistics & numerical data , Pragmatic Clinical Trials as Topic/methods , Metadata/statistics & numerical data
3.
Nat Commun ; 15(1): 6241, 2024 Jul 24.
Article in English | MEDLINE | ID: mdl-39048577

ABSTRACT

Studying the neural basis of human dynamic visual perception requires extensive experimental data to evaluate the large swathes of functionally diverse brain neural networks driven by perceiving visual events. Here, we introduce the BOLD Moments Dataset (BMD), a repository of whole-brain fMRI responses to over 1000 short (3 s) naturalistic video clips of visual events across ten human subjects. We use the videos' extensive metadata to show how the brain represents word- and sentence-level descriptions of visual events and identify correlates of video memorability scores extending into the parietal cortex. Furthermore, we reveal a match in hierarchical processing between cortical regions of interest and video-computable deep neural networks, and we showcase that BMD successfully captures temporal dynamics of visual events at second resolution. With its rich metadata, BMD offers new perspectives and accelerates research on the human brain basis of visual event perception.


Subject(s)
Brain Mapping , Brain , Magnetic Resonance Imaging , Metadata , Visual Perception , Humans , Magnetic Resonance Imaging/methods , Visual Perception/physiology , Male , Female , Brain Mapping/methods , Adult , Brain/physiology , Brain/diagnostic imaging , Parietal Lobe/physiology , Parietal Lobe/diagnostic imaging , Young Adult , Photic Stimulation , Video Recording
4.
Sci Data ; 11(1): 772, 2024 Jul 13.
Article in English | MEDLINE | ID: mdl-39003329

ABSTRACT

The German initiative "National Research Data Infrastructure for Personal Health Data" (NFDI4Health) focuses on research data management in health research. It aims to foster and develop harmonized informatics standards for public health, epidemiological studies, and clinical trials, facilitating access to relevant data and metadata standards. This publication lists syntactic and semantic data standards of potential use for NFDI4Health and beyond, based on interdisciplinary meetings and workshops, mappings of study questionnaires and the NFDI4Health metadata schema, and a literature search. Included are 7 syntactic, 32 semantic and 9 combined syntactic and semantic standards. In addition, 101 ISO standards from ISO/TC 215 Health Informatics and ISO/TC 276 Biotechnology were identified as potentially relevant. The work emphasizes the use of standards for epidemiological and health research data to ensure interoperability as well as compatibility with NFDI4Health, its use cases, and (inter-)national efforts within these sectors. The goal is to foster collaborative and inter-sectoral work in health research and to initiate a debate around the potential of using common standards.


Subject(s)
Health Information Interoperability , Humans , Metadata , Germany , Health Records, Personal , Data Management
5.
Gigascience ; 13: 2024 Jan 02.
Article in English | MEDLINE | ID: mdl-38991851

ABSTRACT

BACKGROUND: As biological data increase, we need additional infrastructure to share them and promote interoperability. While major effort has been put into sharing data, relatively less emphasis is placed on sharing metadata. Yet, sharing metadata is also important and in some ways has a wider scope than sharing the data themselves. RESULTS: Here, we present PEPhub, an approach to improve sharing and interoperability of biological metadata. PEPhub provides an API, natural-language search, and user-friendly web-based sharing and editing of sample metadata tables. We used PEPhub to process more than 100,000 published biological research projects and index them with fast semantic natural-language search. PEPhub thus provides a fast and user-friendly way to find existing biological research data or to share new data. AVAILABILITY: https://pephub.databio.org.
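Sample metadata tables of the kind PEPhub serves can be illustrated with a few lines of standard-library Python. The column names and the attribute-based lookup below are illustrative stand-ins, not PEPhub's actual API or schema.

```python
import csv
import io

# A minimal sample metadata table (column names are illustrative).
sample_table = """sample_name,organism,protocol
liver_1,human,RNA-seq
liver_2,human,RNA-seq
brain_1,mouse,ATAC-seq
"""

samples = list(csv.DictReader(io.StringIO(sample_table)))

# Simple attribute-based filtering, a toy stand-in for metadata search.
def find(samples, **attrs):
    return [s for s in samples if all(s.get(k) == v for k, v in attrs.items())]

human_rnaseq = find(samples, organism="human", protocol="RNA-seq")
print([s["sample_name"] for s in human_rnaseq])
```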


Subject(s)
Databases, Factual , Information Dissemination , Internet , Metadata , Software , User-Computer Interface , Information Dissemination/methods , Computational Biology/methods
6.
Sci Data ; 11(1): 754, 2024 Jul 10.
Article in English | MEDLINE | ID: mdl-38987254

ABSTRACT

Ancient DNA is producing a rich record of past genetic diversity in humans and other species. However, unless the primary data is appropriately archived, its long-term value will not be fully realised. I surveyed publicly archived data from 42 recent ancient genomics studies. Half of the studies archived incomplete datasets, preventing accurate replication and representing a loss of data of potential future use. No studies met all criteria that could be considered best practice. Based on these results, I make six recommendations for data producers: (1) archive all sequencing reads, not just those that aligned to a reference genome, (2) archive read alignments too, but as secondary analysis files, (3) provide correct experiment metadata on samples, libraries and sequencing runs, (4) provide informative sample metadata, (5) archive data from low-coverage and negative experiments, and (6) document archiving choices in papers, and peer review these. Given the reliance on destructive sampling of finite material, ancient genomics studies have a particularly strong responsibility to ensure the longevity and reusability of generated data.
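Recommendations (3) and (4) above amount to checking sample, library, and run metadata for required fields before archiving. A minimal sketch, assuming illustrative field names rather than any formal metadata standard:

```python
# Required fields loosely following recommendations (3) and (4) above;
# the exact field names are illustrative, not a formal standard.
REQUIRED_SAMPLE_FIELDS = {"sample_id", "material", "site", "date_estimate"}
REQUIRED_RUN_FIELDS = {"library_id", "platform", "read_count"}

def missing_fields(record, required):
    """Return the required keys that are absent or empty in a metadata record."""
    return sorted(k for k in required if not record.get(k))

sample = {"sample_id": "S1", "material": "petrous bone", "site": "Cave A"}
run = {"library_id": "L1", "platform": "Illumina NovaSeq", "read_count": 1.2e8}

print(missing_fields(sample, REQUIRED_SAMPLE_FIELDS))  # date_estimate missing
print(missing_fields(run, REQUIRED_RUN_FIELDS))        # complete
```

A check like this, run before submission to an archive, would catch the incomplete records the survey describes.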


Subject(s)
DNA, Ancient , Genomics , Humans , DNA, Ancient/analysis , Animals , Metadata
7.
Sci Data ; 11(1): 732, 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38969627

ABSTRACT

To explore complex biological questions, it is often necessary to access various data types from public data repositories. As the volume and complexity of biological sequence data grow, public repositories face significant challenges in ensuring that the data is easily discoverable and usable by the biological research community. To address these challenges, the National Center for Biotechnology Information (NCBI) has created NCBI Datasets. This resource provides straightforward, comprehensive, and scalable access to biological sequences, annotations, and metadata for a wide range of taxa. Following the FAIR (Findable, Accessible, Interoperable, and Reusable) data management principles, NCBI Datasets offers user-friendly web interfaces, command-line tools, and documented APIs, empowering researchers to access NCBI data seamlessly. The data is delivered as packages of sequences and metadata, thus facilitating improved data retrieval, sharing, and usability in research. Moreover, this data delivery method fosters effective data attribution and promotes its further reuse. This paper outlines the current scope of data accessible through NCBI Datasets and explains various options for exploring and downloading the data.
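The "packages of sequences and metadata" delivery model can be sketched with the standard library: sequence files bundled with a machine-readable catalog in a single archive. The layout and file names below are illustrative assumptions, not NCBI's actual package format.

```python
import io
import json
import zipfile

def build_package(sequences, metadata):
    """Bundle FASTA-style sequence files with a JSON metadata catalog."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        for name, fasta in sequences.items():
            z.writestr(f"data/{name}.fna", fasta)
        z.writestr("data/catalog.json", json.dumps(metadata, indent=2))
    return buf.getvalue()

pkg = build_package(
    {"GCF_EXAMPLE": ">chr1\nACGT\n"},
    {"GCF_EXAMPLE": {"taxon": "example organism", "annotation": "v1"}},
)
with zipfile.ZipFile(io.BytesIO(pkg)) as z:
    print(sorted(z.namelist()))
```

Shipping the catalog inside the package is what keeps the metadata attached to the data through downloads and reuse, the attribution benefit the abstract notes.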


Subject(s)
Metadata , Databases, Genetic , United States , Information Storage and Retrieval
8.
Curr Protoc ; 4(7): e1066, 2024 Jul.
Article in English | MEDLINE | ID: mdl-39073034

ABSTRACT

Image data from a single animal in neuroscientific experiments can comprise terabytes of information. Full studies can thus be challenging to analyze, store, view, and manage. What follows is an updated guide for preparing and sharing big neuroanatomical image data. © 2024 Wiley Periodicals LLC. Basic Protocol 1: Naming and organizing images and metadata. Basic Protocol 2: Preparing and annotating images for presentations and figures. Basic Protocol 3: Assessing the internet environment and optimizing images.


Subject(s)
Image Processing, Computer-Assisted , Neuroanatomy , Neuroanatomy/methods , Image Processing, Computer-Assisted/methods , Animals , Internet , Humans , Metadata
9.
Health Informatics J ; 30(2): 14604582241262961, 2024.
Article in English | MEDLINE | ID: mdl-38881290

ABSTRACT

Objectives: This study aims to address the critical challenges of data integrity, accuracy, consistency, and precision in the application of electronic medical record (EMR) data within the healthcare sector, particularly within the context of Chinese medical information data management. The research seeks to propose a solution in the form of a medical metadata governance framework that is efficient and suitable for clinical research and transformation. Methods: The article begins by outlining the background of medical information data management and reviews the advancements in artificial intelligence (AI) technology relevant to the field. It then introduces the "Service, Patient, Regression, base/Away, Yeast" (SPRAY)-type AI application as a case study to illustrate the potential of AI in EMR data management. Results: The research identifies the scarcity of scientific research on the transformation of EMR data in Chinese hospitals and proposes a medical metadata governance framework as a solution. This framework is designed to achieve scientific governance of clinical data by integrating metadata management and master data management, grounded in clinical practices, medical disciplines, and scientific exploration. Furthermore, it incorporates an information privacy security architecture to ensure data protection. Conclusion: The proposed medical metadata governance framework, supported by AI technology, offers a structured approach to managing and transforming EMR data into valuable scientific research outcomes. This framework provides guidance for the identification, cleaning, mining, and deep application of EMR data, thereby addressing the bottlenecks currently faced in the healthcare scenario and paving the way for more effective clinical research and data-driven decision-making.


Subject(s)
Artificial Intelligence , Electronic Health Records , Artificial Intelligence/trends , China , Humans , Electronic Health Records/trends , Data Management/methods , Metadata
10.
Sci Data ; 11(1): 574, 2024 Jun 04.
Article in English | MEDLINE | ID: mdl-38834597

ABSTRACT

Experts from 18 consortia are collaborating on the Human Reference Atlas (HRA), which aims to map the 37 trillion cells in the healthy human body. Information relevant for HRA construction and usage is held by experts, published in scholarly papers, and captured in experimental data. However, these data sources use different metadata schemas and cannot be cross-searched efficiently. This paper documents the compilation of a dataset, named HRAlit, that links the 136 HRA v1.4 digital objects (31 organs with 4,279 anatomical structures, 1,210 cell types, 2,089 biomarkers) to 583,117 experts; 7,103,180 publications; 896,680 funded projects; and 1,816 experimental datasets. The resulting HRAlit has 22 tables with 20,939,937 records, including 6 junction tables with 13,170,651 relationships. HRAlit can be mined to identify leading experts, major papers, funding trends, or alignment with existing ontologies in support of systematic HRA construction and usage.
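The junction-table linkage that HRAlit relies on can be illustrated with an in-memory SQLite database; the table and column names below are invented for illustration and are not HRAlit's actual schema.

```python
import sqlite3

# Toy version of HRAlit-style linkage: digital objects joined to publications
# through a junction table.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE anatomical_structure (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE publication (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE structure_publication (structure_id INTEGER, publication_id INTEGER);
""")
con.executemany("INSERT INTO anatomical_structure VALUES (?, ?)",
                [(1, "left kidney"), (2, "pancreas")])
con.executemany("INSERT INTO publication VALUES (?, ?)",
                [(10, "Kidney atlas paper"), (11, "Pancreas cell types")])
con.executemany("INSERT INTO structure_publication VALUES (?, ?)",
                [(1, 10), (2, 11), (1, 11)])

# All publications linked to structure 1 via the junction table.
rows = con.execute("""
SELECT p.title FROM publication p
JOIN structure_publication sp ON sp.publication_id = p.id
WHERE sp.structure_id = 1 ORDER BY p.title
""").fetchall()
print([t for (t,) in rows])
```

Mining queries of this shape (joins across junction tables) are what make the expert, paper, and funding summaries in the abstract possible.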


Subject(s)
Cells , Metadata , Humans
11.
Sci Data ; 11(1): 634, 2024 Jun 15.
Article in English | MEDLINE | ID: mdl-38879585

ABSTRACT

In low- and middle-income countries, the substantial costs associated with traditional data collection pose an obstacle to facilitating decision-making in the field of public health. Satellite imagery offers a potential solution, but the image extraction and analysis can be costly and require specialized expertise. We introduce SatelliteBench, a scalable framework for satellite image extraction and vector embedding generation. We also propose a novel multimodal fusion pipeline that utilizes a series of satellite images and metadata. The framework was evaluated by generating a dataset with a collection of 12,636 images and embeddings, accompanied by comprehensive metadata, from 81 municipalities in Colombia between 2016 and 2018. The dataset was then evaluated on three tasks: dengue case prediction, poverty assessment, and access to education. The performance showcases the versatility and practicality of SatelliteBench, offering a reproducible, accessible and open tool to enhance decision-making in public health.


Subject(s)
Dengue , Public Health , Satellite Imagery , Colombia , Humans , Metadata
12.
PLoS One ; 19(6): e0306100, 2024.
Article in English | MEDLINE | ID: mdl-38917182

ABSTRACT

Making data FAIR (findable, accessible, interoperable, reusable) has become the recurring theme behind many research data management efforts. dtool is a lightweight data management tool that packages metadata with immutable data to promote accessibility, interoperability, and reproducibility. Each dataset is self-contained and does not require metadata to be stored in a centralised system. This decentralised approach means that finding datasets can be difficult. dtool's lookup server, dserver for short, as defined by a REST API, makes dtool datasets findable, hence rendering the dtool ecosystem fit for a FAIR data management world. Its simplicity, modularity, accessibility and standardisation via an API distinguish dtool and dserver from other solutions and enable them to serve as a common denominator for cross-disciplinary research data management. The dtool ecosystem bridges the gap between standardisation-free data management by individuals and FAIR platform solutions with rigid metadata requirements.
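The self-contained, decentralised dataset idea behind dtool can be sketched as data items plus a checksum manifest that travels with them, so integrity can be verified without any central registry. The manifest keys below are illustrative, not dtool's actual on-disk format.

```python
import hashlib
import json

def build_manifest(items):
    """Record size and checksum for each data item (illustrative layout)."""
    return {
        "items": {
            name: {"size": len(data), "md5": hashlib.md5(data).hexdigest()}
            for name, data in items.items()
        }
    }

items = {"reading_1.csv": b"t,temp\n0,21.3\n", "README.txt": b"field trial 7\n"}
manifest = build_manifest(items)

def verify(items, manifest):
    """Check every item against the checksums packaged alongside it."""
    return all(
        hashlib.md5(items[n]).hexdigest() == meta["md5"]
        for n, meta in manifest["items"].items()
    )

print(verify(items, manifest))
print(json.dumps(manifest["items"]["README.txt"]))
```

Because the manifest lives with the data, any copy of the dataset carries enough information to prove it is complete and unmodified.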


Subject(s)
Software , Data Management/methods , Metadata , Ecosystem , Reproducibility of Results , Internet
13.
BMC Bioinformatics ; 25(1): 184, 2024 May 09.
Article in English | MEDLINE | ID: mdl-38724907

ABSTRACT

BACKGROUND: Major advances in sequencing technologies and the sharing of data and metadata in science have resulted in a wealth of publicly available datasets. However, working with, and especially curating, public omics datasets remains challenging despite these efforts. While a growing number of initiatives aim to re-use previous results, these efforts have limitations that often lead to the need for further in-house curation and processing. RESULTS: Here, we present the Omics Dataset Curation Toolkit (OMD Curation Toolkit), a python3 package designed to accompany and guide the researcher during the curation process of metadata and fastq files of public omics datasets. This workflow provides a standardized framework with multiple capabilities (collection, control check, treatment and integration) to facilitate the arduous task of curating public sequencing data projects. While centered on the European Nucleotide Archive (ENA), the majority of the provided tools are generic and can be used to curate datasets from other sources. CONCLUSIONS: The toolkit thus offers valuable tools for the in-house curation previously needed to re-use public omics data. Due to its workflow structure and capabilities, it can be easily used and can benefit investigators developing novel omics meta-analyses based on sequencing data.


Subject(s)
Data Curation , Software , Workflow , Data Curation/methods , Metadata , Databases, Genetic , Genomics/methods , Computational Biology/methods
14.
Front Cell Infect Microbiol ; 14: 1384809, 2024.
Article in English | MEDLINE | ID: mdl-38774631

ABSTRACT

Introduction: Sharing microbiome data among researchers fosters new innovations and reduces the cost of research. Practically, this means that the (meta)data will have to be standardized, transparent and readily available for researchers. The microbiome data and associated metadata will then be described with regards to composition and origin, in order to maximize the possibilities for application in various contexts of research. Here, we propose a set of tools and protocols to develop a real-time FAIR (Findable, Accessible, Interoperable and Reusable) compliant database for the handling and storage of human microbiome and host-associated data. Methods: The conflicts arising from privacy laws with respect to metadata, possible human genome sequences in the metagenome shotgun data and FAIR implementations are discussed. Alternate pathways for achieving compliance in such conflicts are analyzed. Sample-traceable and sensitive microbiome data, such as DNA sequences or geolocalized metadata, are identified, and the role of the GDPR (General Data Protection Regulation) data regulations is considered. For the construction of the database, procedures have been realized to make data FAIR compliant, while preserving the privacy of the participants providing the data. Results and discussion: An open-source development platform, Supabase, was used to implement the microbiome database. Researchers can deploy this real-time database to access, upload, download and interact with human microbiome data in a FAIR-compliant manner. In addition, a large language model (LLM) interface powered by ChatGPT was developed and deployed to enable knowledge dissemination and non-expert usage of the database.


Subject(s)
Microbiota , Humans , Microbiota/genetics , Databases, Factual , Metadata , Metagenome , Information Dissemination , Computational Biology/methods , Metagenomics/methods , Databases, Genetic
15.
Sci Data ; 11(1): 503, 2024 May 16.
Article in English | MEDLINE | ID: mdl-38755173

ABSTRACT

Nanomaterials hold great promise for improving our society, and it is crucial to understand their effects on biological systems in order to enhance their properties and ensure their safety. However, the lack of consistency in experimental reporting, the absence of universally accepted machine-readable metadata standards, and the challenge of combining such standards hamper the reusability of previously produced data for risk assessment. Fortunately, the research community has responded to these challenges by developing minimum reporting standards that address several of these issues. By converting twelve published minimum reporting standards into a machine-readable representation using FAIR maturity indicators, we have created a machine-friendly approach to annotate and assess datasets' reusability according to those standards. Furthermore, our NanoSafety Data Reusability Assessment (NSDRA) framework includes a metadata generator web application that can be integrated into experimental data management, and a new web application that can summarize the reusability of nanosafety datasets for one or more subsets of maturity indicators, tailored to specific computational risk assessment use cases. This approach enhances the transparency, communication, and reusability of experimental data and metadata. With this improved FAIR approach, we can facilitate the reuse of nanosafety research for exploration, toxicity prediction, and regulation, thereby advancing the field and benefiting society as a whole.
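Summarising reusability "for one or more subsets of maturity indicators" can be sketched as scoring a dataset's metadata against a chosen subset of machine-readable checks. The indicator names and checks below are invented for illustration and are not the actual FAIR maturity indicators or NSDRA's implementation.

```python
# Each indicator is a machine-readable predicate over dataset metadata
# (names and checks are illustrative only).
INDICATORS = {
    "has_license": lambda m: "license" in m,
    "has_persistent_id": lambda m: m.get("doi", "").startswith("10."),
    "reports_particle_size": lambda m: "particle_size_nm" in m,
}

def assess(metadata, subset=None):
    """Evaluate a subset of indicators and compute a fraction-passed score."""
    chosen = subset or list(INDICATORS)
    results = {name: INDICATORS[name](metadata) for name in chosen}
    results["score"] = sum(results.values()) / len(chosen)
    return results

meta = {"license": "CC-BY-4.0", "doi": "10.1234/example", "material": "TiO2"}
report = assess(meta, subset=["has_license", "has_persistent_id",
                              "reports_particle_size"])
print(report["score"])
```

Tailoring the `subset` argument to a use case mirrors the abstract's idea of summarising reusability for specific computational risk assessment scenarios.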


Subject(s)
Nanostructures , Metadata , Nanostructures/toxicity , Risk Assessment
16.
J Am Med Inform Assoc ; 31(7): 1578-1582, 2024 Jun 20.
Article in English | MEDLINE | ID: mdl-38700253

ABSTRACT

OBJECTIVE: Leverage electronic health record (EHR) audit logs to develop a machine learning (ML) model that predicts which notes a clinician wants to review when seeing oncology patients. MATERIALS AND METHODS: We trained logistic regression models using note metadata and a Term Frequency-Inverse Document Frequency (TF-IDF) text representation. We evaluated performance with precision, recall, F1, AUC, and a qualitative clinical assessment. RESULTS: The metadata-only model achieved an AUC of 0.930 and the metadata plus TF-IDF model an AUC of 0.937. Qualitative assessment revealed a need for better text representation and for further customizing predictions to the user. DISCUSSION: Our model effectively surfaces the top 10 notes a clinician wants to review when seeing an oncology patient. Further studies can characterize different types of clinician users and better tailor the task for different care settings. CONCLUSION: EHR audit logs can provide important relevance data for training ML models that assist with note-writing in the oncology setting.
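The TF-IDF representation named above can be sketched in a few lines of standard-library Python: term frequency in a note, down-weighted by how many notes contain the term. A real pipeline would use a library implementation and append the note-metadata features; the example notes here are invented.

```python
import math
from collections import Counter

def tfidf(docs):
    """Naive TF-IDF: tf(t, d) * log(N / df(t)), no smoothing or normalisation."""
    df = Counter(t for d in docs for t in set(d.split()))
    n = len(docs)
    vectors = []
    for d in docs:
        tf = Counter(d.split())
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

notes = ["oncology visit progress note",
         "pathology report oncology",
         "discharge summary"]
vecs = tfidf(notes)
# "oncology" appears in 2 of 3 notes, so it is weighted below "pathology",
# which appears in only 1.
print(round(vecs[1]["oncology"], 3), round(vecs[1]["pathology"], 3))
```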


Subject(s)
Electronic Health Records , Machine Learning , Medical Oncology , Humans , Logistic Models , Metadata , Medical Audit , Proof of Concept Study
17.
J Am Med Inform Assoc ; 31(7): 1463-1470, 2024 Jun 20.
Article in English | MEDLINE | ID: mdl-38722233

ABSTRACT

OBJECTIVE: ModelDB (https://modeldb.science) is a discovery platform for computational neuroscience, containing over 1850 published model codes with standardized metadata. These codes were mainly supplied through unsolicited model author submissions, but this approach is inherently limited. For example, we estimate we have captured only around one-third of NEURON models, the most common type of model in ModelDB. To more completely characterize the state of computational neuroscience modeling work, we aim to identify works containing results derived from computational neuroscience approaches, along with their standardized associated metadata (eg, cell types, research topics). MATERIALS AND METHODS: Known computational neuroscience work from ModelDB and neuroscience work queried from PubMed were included in our study. After pre-screening with SPECTER2 (a free document embedding method), GPT-3.5 and GPT-4 were used to identify likely computational neuroscience work and relevant metadata. RESULTS: SPECTER2, GPT-4, and GPT-3.5 demonstrated varied but high ability in identifying computational neuroscience work. GPT-4 achieved 96.9% accuracy, and GPT-3.5 improved from 54.2% to 85.5% through instruction-tuning and Chain of Thought prompting. GPT-4 also showed high potential in identifying relevant metadata annotations. DISCUSSION: Accuracy in identification and extraction might be further improved by addressing the ambiguity of what counts as a computational element, including more information from papers (eg, the Methods section), improving prompts, etc. CONCLUSION: Natural language processing and large language model techniques can be added to ModelDB to facilitate further model discovery, and will contribute to a more standardized and comprehensive framework for establishing domain-specific resources.


Subject(s)
Computational Biology , Neurosciences , Computational Biology/methods , Humans , Metadata , Data Curation/methods , Models, Neurological , Data Mining/methods , Databases, Factual
19.
Nat Ecol Evol ; 8(7): 1224-1232, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38789640

ABSTRACT

Genetic and genomic data are collected for a vast array of scientific and applied purposes. Despite mandates for public archiving, data are typically used only by the generating authors. The reuse of genetic and genomic datasets remains uncommon because it is difficult, if not impossible, due to non-standard archiving practices and lack of contextual metadata. But as the new field of macrogenetics is demonstrating, if genetic data and their metadata were more accessible and FAIR (findable, accessible, interoperable and reusable) compliant, they could be reused for many additional purposes. We discuss the main challenges with existing genetic and genomic data archives, and suggest best practices for archiving genetic and genomic data. Recognizing that this is a longstanding issue due to little formal data management training within the fields of ecology and evolution, we highlight steps that research institutions and publishers could take to improve data archiving.


Subject(s)
Genomics , Databases, Genetic , Data Management , Metadata
20.
Sci Data ; 11(1): 524, 2024 May 22.
Article in English | MEDLINE | ID: mdl-38778016

ABSTRACT

Datasets consist of measurement data and metadata. Metadata provides context, essential for understanding and (re-)using data. Various metadata standards exist for different methods, systems and contexts. However, relevant information resides at differing stages across the data lifecycle. Often, this information is defined and standardized only at the publication stage, which can lead to data loss and increased workload. In this study, we developed the Metadatasheet, a metadata standard based on interviews with members of two biomedical consortia and systematic screening of data repositories. It aligns with the data lifecycle, allowing synchronous metadata recording within Microsoft Excel, a widely used data recording tool. Additionally, we provide an implementation, the Metadata Workbook, that offers user-friendly features like automation, dynamic adaptation, metadata integrity checks, and export options for various metadata standards. By design and due to its extensive documentation, the proposed metadata standard simplifies recording and structuring of metadata for biomedical scientists, promoting practicality and convenience in data management. This framework can accelerate scientific progress by enhancing collaboration and knowledge transfer throughout the intermediate steps of data creation.


Subject(s)
Data Management , Metadata , Biomedical Research , Data Management/standards , Metadata/standards , Software