ABSTRACT
A core dataset is a compilation of the essential data items for a given research scope. Because core datasets capture the commonalities between heterogeneous data collections, they serve as a basis for cross-site and cross-disease research, and researchers at the national and international levels have therefore addressed the problem of missing core datasets. The German Center for Lung Research (DZL) comprises five sites and eight disease areas and aims to gain further scientific knowledge by continuously promoting collaboration. In this study, we developed a methodology for defining core datasets in the field of lung health science. With the support of domain experts, we then applied the method and compiled a core dataset for each DZL disease area as well as a general core dataset for lung research. All included data items were annotated with metadata and, where possible, assigned references to international classification systems. Our findings will support future scientific collaborations and meaningful data collections.
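As an illustration of what such an annotated data item might look like, here is a minimal sketch in Python; the field names and the code shown are illustrative assumptions, not items taken from the DZL core datasets.

```python
# Hypothetical sketch of a metadata-annotated core-dataset item; field
# names and codes are illustrative, not taken from the DZL core datasets.
core_dataset_item = {
    "name": "FEV1",
    "description": "Forced expiratory volume in one second",
    "data_type": "decimal",
    "unit": "L",
    # Reference to an international classification system (assumed code;
    # verify against the current LOINC release):
    "codes": [{"system": "LOINC", "code": "20150-9"}],
    "disease_areas": ["all"],  # placeholder scope flag
}
```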
Subjects
Lung, Metadata, Data Collection

ABSTRACT
The OMOP Common Data Model (CDM) is designed for analyzing large clinical datasets and building cohorts for medical research, which requires Extract-Transform-Load (ETL) processes for local heterogeneous medical data. We present a concept for developing and evaluating a modularized, metadata-driven ETL process that can transform data into the OMOP CDM regardless of (1) the source data format, (2) its version, and (3) its context of use.
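To make the metadata-driven idea concrete, here is a minimal sketch (not the authors' implementation): the mapping table is the ETL metadata, and swapping it adapts the process to another source format or version. The source field names are invented; the OMOP concept IDs for gender (8507, 8532) are standard vocabulary values.

```python
# Minimal sketch of a metadata-driven ETL step; the source schema and
# mapping table are hypothetical, not the authors' implementation.
from typing import Any

# Declarative mapping: source field -> (OMOP CDM table, column, transform)
FIELD_MAP: dict[str, tuple[str, str, Any]] = {
    "patient_id": ("person", "person_id", int),
    "sex":        ("person", "gender_concept_id",
                   lambda v: {"M": 8507, "F": 8532}[v]),  # OMOP gender concepts
    "birth_year": ("person", "year_of_birth", int),
}

def transform(record: dict) -> dict:
    """Route one source record into OMOP tables using the mapping metadata."""
    out: dict[str, dict] = {}
    for src_field, (table, column, fn) in FIELD_MAP.items():
        if src_field in record:
            out.setdefault(table, {})[column] = fn(record[src_field])
    return out

print(transform({"patient_id": "42", "sex": "F", "birth_year": "1980"}))
# {'person': {'person_id': 42, 'gender_concept_id': 8532, 'year_of_birth': 1980}}
```

Adapting to a new source format or version then means exchanging FIELD_MAP rather than rewriting the transform logic, which is the modularity the abstract describes.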
Subjects
Biomedical Research, Metadata, Electronic Health Records, Factual Databases

ABSTRACT
Semantic interoperability, i.e., the ability to automatically interpret shared information in a meaningful way, is one of the most important requirements for analyzing data from different sources. In the area of clinical and epidemiological studies, the focus of the National Research Data Infrastructure for Personal Health Data (NFDI4Health), interoperability of data collection instruments such as case report forms (CRFs), data dictionaries, and questionnaires is critical. Retrospective integration of semantic codes into study metadata at the item level is important, as ongoing or completed studies contain valuable information that should be preserved. We present a first version of a Metadata Annotation Workbench to support annotators in dealing with a variety of complex terminologies and ontologies. User-driven development with users from the fields of nutritional epidemiology and chronic diseases ensured that the service fulfills the basic requirements of semantic metadata annotation software for these NFDI4Health use cases. The web application can be accessed using a web browser, and the source code of the software is available under an open-source MIT license.
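A minimal sketch of the kind of item-level annotation such a workbench produces might look as follows; the CRF item, the terminology choice, and the concept placeholder are all hypothetical, not output of the workbench.

```python
# Hypothetical item-level semantic annotation; the CRF item and the
# concept identifier are placeholders, not output of the workbench.
crf_item = {
    "id": "item_0042",
    "label": "How many servings of vegetables do you eat per day?",
    "annotations": [],
}

def annotate(item: dict, system: str, code: str, label: str) -> None:
    """Attach a terminology concept to a data collection item."""
    item["annotations"].append({"system": system, "code": code, "label": label})

# The concept code would be looked up interactively in the workbench;
# "<concept-id>" is deliberately left as a placeholder.
annotate(crf_item, "SNOMED CT", "<concept-id>", "Vegetable intake")
print(crf_item)
```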
Subjects
Semantics, Software, Retrospective Studies, Web Browser, Metadata

ABSTRACT
Metadata standards are well established for many types of electrophysiological methods but are still lacking for microneurographic recordings of peripheral sensory nerve fibers in humans. Finding a solution that fits daily work in the laboratory is a complex process. We have designed templates based on odML and odML-tables to structure and capture metadata, and we have provided an extension to the existing GUI to enable database searching.
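A minimal sketch of structuring such metadata with the odml Python package follows (assuming its 1.4+ API); the section and property names are illustrative, not the published templates.

```python
# Sketch of hierarchical metadata capture with odML (assumes the odml
# Python package, v1.4+ API); names are illustrative, not the templates.
import odml

doc = odml.Document(author="Lab member", version="1.0")

subject = odml.Section(name="Subject", type="subject", parent=doc)
subject.append(odml.Property(name="Species", values="Homo sapiens"))

recording = odml.Section(name="Recording", type="microneurography", parent=doc)
recording.append(odml.Property(name="NerveFiberType", values="C-fiber"))
recording.append(odml.Property(name="SamplingRate", values=10000, unit="Hz"))

# Stored files can later be indexed and searched from a database.
odml.save(doc, "recording_metadata.odml.xml")
```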
Subjects
Metadata, Palliative Care, Humans

ABSTRACT
Extensive workflows have been designed to FAIRify data from various domains, but they tend to be cumbersome and overwhelming. This work summarises our own experiences with FAIRification in health data management and provides simple steps that can be implemented to achieve a relatively low but improved level of FAIRness. The steps lead the data steward to register the data in a repository and annotate it with the metadata recommended by that repository; to provide the data in a machine-readable format using an established and accessible language; to establish a well-defined framework to describe and structure the (meta)data; and to publish the (meta)data. We hope that following the simple roadmap described in this work helps to demystify the FAIR data principles in the health domain.
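As one concrete illustration of the "machine-readable format" step, a minimal sketch follows; the identifier and field values are placeholders, not a specific repository's actual requirements.

```python
# Minimal machine-readable metadata record in JSON, in the spirit of the
# steps above; the identifier and values are placeholders.
import json

metadata = {
    "identifier": "doi:10.xxxx/example",   # would be assigned by the repository
    "title": "Example health dataset",
    "creator": "Example Research Group",
    "license": "CC-BY-4.0",
    "format": "text/csv",                  # established, accessible format
    "keywords": ["health", "FAIR", "metadata"],
}

with open("dataset_metadata.json", "w") as f:
    json.dump(metadata, f, indent=2)
```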
Subjects
Data Management, Metadata

ABSTRACT
OBJECTIVE: To review the seroprevalence of toxoplasmosis in Pakistan. METHODS: The systematic review comprised a search of the Science Direct, Google Scholar, PubMed, and Scopus databases for studies on the seroprevalence of toxoplasmosis in Pakistan published between 2006 and 2020 that used serological diagnostic tests to detect Toxoplasma gondii. Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were followed throughout the review, and statistical analysis was performed using a forest plot and a random-effects model. RESULTS: Of the 7093 human studies initially found, 20 (0.28%) were reviewed. Of the 16,432 animal studies, 16 (0.09%) were selected for detailed review. The pooled seroprevalence of toxoplasmosis in humans was 76% (95% confidence interval: 69-83%). Seroprevalence of human toxoplasmosis was higher in Khyber Pakhtunkhwa (31.7%) than in Punjab (20.4%). The pooled seroprevalence in animals was 69% (95% confidence interval: 64-74%). Seroprevalence in animals was higher in Khyber Pakhtunkhwa (44.7%) than in Punjab (29.4%). CONCLUSIONS: The seroprevalence of toxoplasmosis in both humans and animals should be studied in other parts of Pakistan as well.
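For readers unfamiliar with the pooling step, the following sketch shows a DerSimonian-Laird random-effects pooled prevalence on the logit scale; the study counts are invented, not the review's data.

```python
# DerSimonian-Laird random-effects pooled prevalence (logit scale);
# the counts below are made-up examples, not the review's studies.
import numpy as np

events = np.array([80, 55, 120])   # hypothetical seropositive counts
n      = np.array([100, 80, 150])  # hypothetical sample sizes

p = events / n
y = np.log(p / (1 - p))                    # logit-transformed prevalence
v = 1 / events + 1 / (n - events)          # approximate variance of logit

w = 1 / v                                  # fixed-effect weights
y_fixed = np.sum(w * y) / np.sum(w)
Q = np.sum(w * (y - y_fixed) ** 2)         # Cochran's Q heterogeneity statistic
tau2 = max(0.0, (Q - (len(y) - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))

w_re = 1 / (v + tau2)                      # random-effects weights
y_re = np.sum(w_re * y) / np.sum(w_re)
se = np.sqrt(1 / np.sum(w_re))

inv_logit = lambda x: 1 / (1 + np.exp(-x))
print(f"pooled prevalence: {inv_logit(y_re):.2f} "
      f"(95% CI {inv_logit(y_re - 1.96*se):.2f}-{inv_logit(y_re + 1.96*se):.2f})")
```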
Subjects
Metadata, Toxoplasmosis, Animals, Humans, Pakistan/epidemiology, Seroepidemiologic Studies, Protozoan Antibodies, Toxoplasmosis/epidemiology, Risk Factors

ABSTRACT
The Swiss Pathogen Surveillance Platform (SPSP) is a shared, secure surveillance platform spanning human and veterinary medicine that also includes environmental and foodborne isolates. It enables rapid and detailed transmission monitoring and outbreak surveillance of pathogens using whole genome sequencing data and associated metadata. It features controlled data access, complex dynamic queries, dedicated dashboards, and automated data sharing with international repositories, providing actionable results for public health, with the vision of improving societal well-being and health.
Subjects
Bacterial Genome, One Health, Humans, Switzerland/epidemiology, Metadata, Genomics/methods

ABSTRACT
Trichomycete fungi are gut symbionts of arthropods living in aquatic habitats. The lack of a central platform with accessible collection records and associated ecological metadata has limited ecological investigations of trichomycetes. We present CIGAF (short for Collections of Insect Gut-Associated Fungi), a trichomycetes-focused digital database with interactive visualization functions enabled by the R Shiny web framework. CIGAF curates 3120 collection records of trichomycetes from across the globe, spanning 1929 to 2022, and allows the exploration of nearly 100 years of field collection data through its web interface, including primary published data such as insect host information, collection site coordinates, descriptions, and dates of collection. Where possible, specimen records are supplemented with climatic measures at the collection sites. As a central platform for field collection records, multiple interactive tools allow users to analyze and plot data at various levels. CIGAF provides a comprehensive resource hub for the research community for further studies in mycology, entomology, symbiosis, and biogeography.
Subjects
Fungi, Insects, Animals, Factual Databases, Metadata, Software

ABSTRACT
Preclinical imaging is a critical component in translational research with significant complexities in workflow and site differences in deployment. Importantly, the National Cancer Institute's (NCI) precision medicine initiative emphasizes the use of translational co-clinical oncology models to address the biological and molecular bases of cancer prevention and treatment. The use of oncology models, such as patient-derived tumor xenografts (PDX) and genetically engineered mouse models (GEMMs), has ushered in an era of co-clinical trials by which preclinical studies can inform clinical trials and protocols, thus bridging the translational divide in cancer research. Similarly, preclinical imaging fills a translational gap as an enabling technology for translational imaging research. Unlike clinical imaging, where equipment manufacturers strive to meet standards in practice at clinical sites, standards are neither fully developed nor implemented in preclinical imaging. This fundamentally limits the collection and reporting of metadata to qualify preclinical imaging studies, thereby hindering open science and impacting the reproducibility of co-clinical imaging research. To begin to address these issues, the NCI co-clinical imaging research program (CIRP) conducted a survey to identify metadata requirements for reproducible quantitative co-clinical imaging. The enclosed consensus-based report summarizes co-clinical imaging metadata information (CIMI) to support quantitative co-clinical imaging research with broad implications for capturing co-clinical data, enabling interoperability and data sharing, as well as potentially leading to updates to the preclinical Digital Imaging and Communications in Medicine (DICOM) standard.
Subjects
Metadata, Neoplasms, Animals, Mice, Humans, Reproducibility of Results, Diagnostic Imaging, Neoplasms/diagnostic imaging, Reference Standards

ABSTRACT
BACKGROUND: Biomedical researchers are strongly encouraged to make their research outputs more Findable, Accessible, Interoperable, and Reusable (FAIR). While many biomedical research outputs are more readily accessible through open data efforts, finding relevant outputs remains a significant challenge. Schema.org is a metadata vocabulary standardization project that enables web content creators to make their content more FAIR. Leveraging Schema.org could benefit biomedical research resource providers, but it can be challenging to apply Schema.org standards to biomedical research outputs. We created an online browser-based tool that empowers researchers and repository developers to utilize Schema.org or other biomedical schema projects. RESULTS: Our browser-based tool includes features that can help address many of the barriers to Schema.org compliance, such as the ability to easily browse for relevant Schema.org classes, to extend and customize a class to be more suitable for biomedical research outputs, to create data validation rules that ensure adherence of a research output to a customized class, and to register a custom class in our schema registry so that others can search and reuse it. We demonstrate the use of our tool with the creation of the Outbreak.info schema, a large multi-class schema for harmonizing various COVID-19 related resources. CONCLUSIONS: We have created a browser-based tool that empowers biomedical research resource providers to leverage Schema.org classes to make their research outputs more FAIR.
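For context, a Schema.org-compliant record for a research output is typically expressed as JSON-LD; the following sketch uses standard schema.org Dataset properties with placeholder values and is not output of the tool itself.

```python
# Sketch of a Schema.org-style JSON-LD record for a research output;
# the properties are standard schema.org Dataset fields, values are
# placeholders rather than a real resource.
import json

dataset_jsonld = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example COVID-19 resource",
    "description": "Placeholder description of a biomedical dataset.",
    "identifier": "https://example.org/dataset/123",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "keywords": ["COVID-19", "biomedical research"],
}

print(json.dumps(dataset_jsonld, indent=2))
```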
Subjects
Biomedical Research, COVID-19, Humans, Metadata

ABSTRACT
Studying the associations among gene function, diseases, and regulatory gene network reconstruction demands data compatibility. Data from different databases follow distinct schemas and are accessible in heterogeneous ways. Although the experiments differ, the data may still relate to the same biological entities. Some entities may not be strictly biological, such as the geolocations of habitats or paper references, but they provide a broader context for other entities. The same entities from different datasets can share similar properties, which may or may not be found in other datasets. Joint, simultaneous data fetching from multiple sources is complicated for the end user and, in many cases, unsupported or inefficient due to differences in data structures and access methods. We propose BioGraph, a new model that enables connecting and retrieving information from linked biological data originating from diverse datasets. We tested the model on metadata collected from five diverse public datasets and successfully constructed a knowledge graph containing more than 17 million model objects, of which 2.5 million are individual biological entity objects. The model enables the selection of complex patterns and the retrieval of matched results that can be discovered only by joining data from multiple sources.
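To make the pattern-matching idea concrete, here is a toy sketch with networkx; the entities and relations are invented and do not reflect BioGraph's actual data model.

```python
# Toy knowledge-graph pattern query with networkx; entities and relation
# names are invented, not BioGraph's actual model.
import networkx as nx

g = nx.MultiDiGraph()
g.add_node("GeneA", kind="gene")
g.add_node("Disease1", kind="disease")
g.add_node("Paper42", kind="reference")  # non-biological context node
g.add_edge("GeneA", "Disease1", relation="associated_with")
g.add_edge("Disease1", "Paper42", relation="reported_in")

# A two-hop pattern: gene -> disease -> reference, i.e., a match that is
# only discoverable by joining data from both sources.
for gene, disease, d in g.edges(data=True):
    if d["relation"] != "associated_with":
        continue
    for _, ref, d2 in g.edges(disease, data=True):
        if d2["relation"] == "reported_in":
            print(gene, "->", disease, "->", ref)
```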
Subjects
Metadata, Factual Databases

ABSTRACT
We present a database resulting from high-throughput experimentation, primarily on metal oxide solid-state materials. The central relational database, the Materials Provenance Store (MPS), manages the metadata and experimental provenance from the acquisition of raw materials, through synthesis, to a broad range of materials characterization techniques. Given the primary research goal of discovering solar fuels materials, many of the characterization experiments involve electrochemistry, along with optical, structural, and compositional characterization. The MPS is populated with all information required for executing common data queries, which typically do not involve direct queries of raw data. The result is a database file that can be distributed to users so that they can independently execute queries and subsequently download the data of interest. We propose this strategy as an approach to managing the highly heterogeneous and distributed data that arise from materials science experiments, as demonstrated by the management of over 30 million experiments run on over 12 million samples in the present MPS release.
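The distributed-database-file strategy can be illustrated with a small sqlite3 sketch; the file, table, and column names are assumptions, not the actual MPS schema.

```python
# Sketch of the distributed-database-file usage pattern with sqlite3;
# file, table, and column names are invented, not the MPS schema.
import sqlite3

con = sqlite3.connect("mps_release.db")  # hypothetical distributed DB file
rows = con.execute(
    """
    SELECT s.sample_id, e.technique, e.experiment_id
    FROM samples s
    JOIN experiments e ON e.sample_id = s.sample_id
    WHERE e.technique = ?
    """,
    ("electrochemistry",),
).fetchall()

# Users run such metadata queries locally, then download only the raw
# data files referenced by the matching experiment IDs.
print(len(rows), "matching experiments")
```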
Subjects
Metadata, Semantics, Factual Databases

ABSTRACT
The development of phenotypes using electronic health records is a resource-intensive process. Therefore, cataloging phenotype algorithm metadata for reuse is critical to accelerating clinical research. The Department of Veterans Affairs (VA) has developed a standard for phenotype metadata collection, which is currently used in the VA phenomics knowledgebase library, CIPHER (Centralized Interactive Phenomics Resource), to capture over 5000 phenotypes. The CIPHER standard improves upon existing phenotype library metadata collection by capturing the context of algorithm development, the phenotyping method used, and the approach to validation. While the standard was iteratively developed with VA phenomics experts, it is applicable to the capture of phenotypes across healthcare systems. We describe the framework of the CIPHER standard for phenotype metadata collection, the rationale for its development, and its current application to the largest healthcare system in the United States.
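A hypothetical sketch of a metadata record along the axes described (development context, method, validation) might look as follows; the field names and example values are assumptions, not the CIPHER standard itself.

```python
# Hypothetical phenotype-metadata record along the axes the abstract
# describes; field names and values are assumptions, not CIPHER fields.
from dataclasses import dataclass, field

@dataclass
class PhenotypeMetadata:
    name: str
    development_context: str      # why/where the algorithm was developed
    phenotyping_method: str       # e.g., rule-based, ML-based
    validation_approach: str      # e.g., chart review, PPV estimate
    data_sources: list[str] = field(default_factory=list)

entry = PhenotypeMetadata(
    name="Type 2 diabetes",
    development_context="Cohort identification for a VA study",
    phenotyping_method="Rule-based (codes + labs + medications)",
    validation_approach="Manual chart review of a random sample",
    data_sources=["ICD-10-CM", "LOINC", "RxNorm"],
)
print(entry)
```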
Assuntos
Registros Eletrônicos de Saúde , Fenômica , Estados Unidos , Fenótipo , Algoritmos , MetadadosRESUMO
BACKGROUND: Data provenance refers to the origin, processing, and movement of data. Reliable and precise knowledge about data provenance has great potential to improve reproducibility as well as quality in biomedical research and, therefore, to foster good scientific practice. However, despite the increasing interest in data provenance technologies in the literature and their implementation in other disciplines, these technologies have not yet been widely adopted in biomedical research. OBJECTIVE: The aim of this scoping review was to provide a structured overview of the body of knowledge on provenance methods in biomedical research by systematizing articles covering data provenance technologies developed for or used in this application area; describing and comparing the functionalities as well as the design of the provenance technologies used; and identifying gaps in the literature that could provide opportunities for future research on technologies that could receive more widespread adoption. METHODS: Following a methodological framework for scoping studies and the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) guidelines, articles were identified by searching the PubMed, IEEE Xplore, and Web of Science databases and subsequently screened for eligibility. We included original articles covering software-based provenance management for scientific research published between 2010 and 2021. A set of data items was defined along the following five axes: publication metadata, application scope, provenance aspects covered, data representation, and functionalities. The data items were extracted from the articles, stored in a charting spreadsheet, and summarized in tables and figures. RESULTS: We identified 44 original articles published between 2010 and 2021. We found that the solutions described were heterogeneous along all axes. We also identified relationships among motivations for the use of provenance information, feature sets (capture, storage, retrieval, visualization, and analysis), and implementation details such as the data models and technologies used. An important gap we identified is that only a few publications address the analysis of provenance data or use established provenance standards, such as PROV. CONCLUSIONS: The heterogeneity of provenance methods, models, and implementations found in the literature points to the lack of a unified understanding of provenance concepts for biomedical data. Providing a common framework, a biomedical reference, and benchmarking datasets could foster the development of more comprehensive provenance solutions.
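As background on the PROV standard mentioned above, here is a minimal example using the prov Python package (available on PyPI); the namespace and resource names are invented for illustration.

```python
# Minimal W3C PROV example using the 'prov' Python package; the 'ex'
# namespace and resource names are invented for illustration.
from prov.model import ProvDocument

doc = ProvDocument()
doc.add_namespace("ex", "http://example.org/")

raw = doc.entity("ex:raw-dataset")
clean = doc.entity("ex:cleaned-dataset")
cleaning = doc.activity("ex:data-cleaning")
steward = doc.agent("ex:data-steward")

doc.used(cleaning, raw)                   # the activity consumed raw data
doc.wasGeneratedBy(clean, cleaning)       # and produced the cleaned data
doc.wasAssociatedWith(cleaning, steward)  # under a responsible agent
doc.wasDerivedFrom(clean, raw)

print(doc.get_provn())                    # PROV-N serialization
```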
Subjects
Biomedical Research, Humans, Metadata, PubMed, Reproducibility of Results, Software

ABSTRACT
Accelerating the development of synthetic biology applications requires reproducible experimental findings. Different standards and repositories exist for exchanging experimental data and metadata. However, the associated software tools often do not support uniform data capture, encoding, and exchange of information, and a connection between digital repositories is required to prevent siloing and loss of information. To this end, we developed the Experimental Data Connector (XDC). It captures experimental data and related metadata by encoding them in standard formats and storing the converted data in digital repositories. Experimental data are then uploaded to Flapjack and the metadata to SynBioHub in a consistent manner, linking these repositories and producing complete, connected, exchangeable experimental datasets. The information is captured using a single template Excel workbook, which can be integrated into existing experimental workflow automation processes and supports semiautomated capture of results.
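The single-workbook capture step might be approximated as follows with pandas; the file and sheet names are assumptions, since the abstract does not specify the template layout.

```python
# Sketch of reading one template workbook and splitting measurements
# from metadata with pandas; file and sheet names are invented, as the
# XDC template layout is not given in the abstract.
import pandas as pd

workbook = "xdc_template.xlsx"  # hypothetical file name
metadata = pd.read_excel(workbook, sheet_name="Metadata")
measurements = pd.read_excel(workbook, sheet_name="Data")

# In the XDC workflow, measurements would go to Flapjack and metadata to
# SynBioHub; here we only show the consistent split from one workbook.
print(metadata.head())
print(measurements.head())
```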
Subjects
Metadata, Software, Synthetic Biology/methods, Workflow, Automation

ABSTRACT
MOTIVATION: The Gene Expression Omnibus (GEO) has become an important source of biological data for secondary analysis. However, there is no simple, programmatic way to download data and metadata from GEO in a standardized annotation format. RESULTS: To address this, we present GEOfetch, a command-line tool that downloads and organizes data and metadata from GEO and SRA. GEOfetch formats the downloaded metadata as a Portable Encapsulated Project, providing a universal format for the reanalysis of public data. AVAILABILITY AND IMPLEMENTATION: GEOfetch is available on Bioconda and the Python Package Index (PyPI).
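A usage sketch for GEOfetch's Python interface follows; the class and argument names reflect a reading of the geofetch documentation and should be treated as assumptions to verify against the current docs, and the accession is only an example.

```python
# Usage sketch for geofetch's Python interface; class and argument names
# are assumptions based on the package docs and may differ by version.
from geofetch import Geofetcher

geof = Geofetcher(just_metadata=True)      # fetch metadata only (assumed flag)
projects = geof.get_projects("GSE95654")   # example GEO accession

# Each returned project is a Portable Encapsulated Project (PEP).
for name, project in projects.items():
    print(name, len(project.samples), "samples")
```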
Subjects
Gene Expression, Metadata, Computational Biology

ABSTRACT
INTRODUCTION: Neurocognitive deficits after stroke are common and have a significant impact on the quality of life of patients and their families; however, little attention is given to the burden and associated impact of cognitive impairment following stroke. This study aims to determine the prevalence and predictors of post-stroke cognitive impairment (PSCI) among adult stroke patients admitted to tertiary hospitals in Dodoma, Tanzania. METHODOLOGY: A prospective longitudinal study is being conducted at tertiary hospitals in the Dodoma region, central Tanzania. Participants aged ≥ 18 years with a first stroke confirmed by brain CT/MRI who meet the inclusion criteria are enrolled and followed up. Baseline sociodemographic and clinical factors are identified at admission, while other clinical variables are determined during the three-month follow-up period. Descriptive statistics will be used to summarize the data; continuous data will be reported as mean (SD) or median (IQR), and categorical data as frequencies and proportions. Univariate and multivariate logistic regression analyses will be used to determine predictors of PSCI.
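The planned regression analysis could be sketched with statsmodels as follows; the dataset and variable names are invented placeholders, not the study's actual variables.

```python
# Sketch of a multivariate logistic regression for predictors of PSCI;
# the file and column names are placeholders, not the study's variables.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("psci_cohort.csv")  # hypothetical dataset

# PSCI (0/1) regressed on hypothetical baseline predictors
model = smf.logit("psci ~ age + sex + nihss_score + education_years",
                  data=df).fit()

print(model.summary())
print(np.exp(model.params))  # odds ratios for each predictor
```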
Subjects
Cognitive Dysfunction, Stroke, Adult, Humans, Tertiary Care Centers, Longitudinal Studies, Metadata, Prospective Studies, Quality of Life, Tanzania/epidemiology, Cognitive Dysfunction/epidemiology, Cognitive Dysfunction/etiology, Stroke/complications, Stroke/epidemiology, Observational Studies as Topic

ABSTRACT
BACKGROUND: The advancement of sequencing technologies has made a plethora of whole-genome re-sequencing (WGRS) data publicly available. However, utilizing WGRS data without further processing is nearly impossible. To solve this problem, our research group developed an interactive Allele Catalog Tool to enable researchers to explore the coding-region allelic variation present in over 1,000 re-sequenced accessions each for soybean, Arabidopsis, and maize. RESULTS: The Allele Catalog Tool was originally designed using soybean genomic data and resources. The Allele Catalog datasets were generated using our variant calling pipeline (SnakyVC) and the Allele Catalog pipeline (AlleleCatalog). The variant calling pipeline processes raw sequencing reads in parallel to generate Variant Call Format (VCF) files, and the Allele Catalog pipeline takes VCF files to perform imputation, predict functional effects, and assemble alleles for each gene into curated Allele Catalog datasets. Both pipelines were utilized to generate the data panels (VCF files and Allele Catalog files), with the accessions of the WGRS datasets collected from various sources, currently representing over 1,000 diverse accessions each for soybean, Arabidopsis, and maize. The main features of the Allele Catalog Tool include data querying, visualization of results, categorical filtering, and download functions. Queries are performed from user input, and results are returned in tabular form: summary results organized by categorical description, and genotype results listing the alleles of each gene. The categorical information is specific to each species, and available detailed meta-information is provided in modal popups. The genotypic information contains the variant positions, the reference or alternate genotypes, the functional effect classes, and the amino-acid changes for each accession. Results can also be downloaded for other research purposes. CONCLUSIONS: The Allele Catalog Tool is a web-based tool that currently supports three species: soybean, Arabidopsis, and maize. The Soybean Allele Catalog Tool is hosted on the SoyKB website ( https://soykb.org/SoybeanAlleleCatalogTool/ ), while the Allele Catalog Tools for Arabidopsis and maize are hosted on the KBCommons website ( https://kbcommons.org/system/tools/AlleleCatalogTool/Zmays and https://kbcommons.org/system/tools/AlleleCatalogTool/Athaliana ). Researchers can use this tool to connect variant alleles of genes with meta-information for each species.
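To illustrate the allele-assembly idea at its simplest, here is a toy sketch that groups per-accession genotypes by gene from a VCF; the INFO "GENE" tag is an assumption for illustration, not the pipelines' actual annotation format.

```python
# Toy sketch of the Allele Catalog idea: group variant genotypes by gene
# to form per-accession alleles. Plain-text VCF parsing; the INFO 'GENE'
# tag and file name are assumptions, not the pipelines' actual format.
from collections import defaultdict

alleles = defaultdict(dict)  # gene -> accession -> list of (pos, allele)

with open("variants.vcf") as vcf:  # hypothetical input
    for line in vcf:
        if line.startswith("##"):
            continue
        if line.startswith("#CHROM"):
            accessions = line.rstrip("\n").split("\t")[9:]
            continue
        fields = line.rstrip("\n").split("\t")
        pos, ref, alt, info = fields[1], fields[3], fields[4], fields[7]
        tags = dict(kv.split("=") for kv in info.split(";") if "=" in kv)
        gene = tags.get("GENE", "NA")
        for acc, gt_field in zip(accessions, fields[9:]):
            gt = gt_field.split(":")[0]
            allele = ref if gt in ("0/0", "0|0") else alt
            alleles[gene].setdefault(acc, []).append((pos, allele))

# alleles["<gene-id>"]["<accession>"] -> ordered variant alleles per gene
```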