Results 1 - 20 of 393
1.
Stud Health Technol Inform ; 287: 73-77, 2021 Nov 18.
Article in English | MEDLINE | ID: mdl-34795084

ABSTRACT

Adopting international standards within health research communities can elevate data FAIRness and widen analysis possibilities. The purpose of this study was to evaluate the feasibility of mapping a generic metadata schema (MDS), created for a central search hub gathering COVID-19 health research (studies, questionnaires, documents = MDS resource types), to HL7® Fast Healthcare Interoperability Resources (FHIR®). Mapping results were rated by calculating the percentage of FHIR coverage. Among the 86 items to map, total mapping coverage was 94%: 50 items (58%) were available as standard resources in FHIR and 31 (36%) could be mapped using extensions. Five items (6%) could not be mapped to FHIR. Analyzed by MDS resource type, total mapping coverage was 93% for studies and 95% for questionnaires and documents, with 61% of the MDS items available as standard FHIR resources for studies, 57% for questionnaires and 52% for documents. Extensions were used for 32%, 38% and 43% of items in studies, questionnaires and documents, respectively. This work shows that FHIR can be used as a standardized format in registries for clinical, epidemiological and public health research. However, further adjustments to the initial MDS are recommended, and two additional items are even needed when implementing FHIR. Developing an MDS based on the FHIR standard could be a future approach to reducing data ambiguity and fostering interoperability.
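
For illustration, a minimal sketch of the mapping pattern the study evaluates: an MDS item with no standard FHIR element is carried by a FHIR extension on a ResearchStudy resource. The extension URL and field values below are hypothetical, not taken from the published MDS.

    import json

    # Sketch of the extension mechanism used for MDS items without a
    # standard FHIR element: a ResearchStudy resource carrying one custom
    # extension. The extension URL and values are hypothetical.
    study = {
        "resourceType": "ResearchStudy",
        "status": "active",
        "title": "Example COVID-19 cohort study",
        "extension": [{
            "url": "https://example.org/fhir/StructureDefinition/mds-item",  # hypothetical
            "valueString": "MDS item without a standard FHIR counterpart",
        }],
    }
    print(json.dumps(study, indent=2))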


Subjects
COVID-19, Metadata, Delivery of Health Care, Electronic Health Records, Health Level Seven, Humans, Registries, SARS-CoV-2
2.
Stud Health Technol Inform ; 287: 78-82, 2021 Nov 18.
Article in English | MEDLINE | ID: mdl-34795085

ABSTRACT

The German Central Health Study Hub COVID-19 is an online service that offers bundled access to COVID-19-related studies conducted in Germany. It combines metadata and other information from epidemiologic, public health and clinical studies into a single data repository for FAIR data access. In addition to study characteristics, the system also provides easy access to study documents as well as instruments for data collection. Study metadata and survey instruments are decomposed into individual data items and semantically enriched to ease findability. Data from existing clinical trial registries (DRKS, clinicaltrials.gov and WHO ICTRP) are merged with manually collected and entered epidemiological and public health studies. More than 850 studies were listed as of September 2021.


Subjects
COVID-19, Germany, Humans, Metadata, SARS-CoV-2, Surveys and Questionnaires
3.
Stud Health Technol Inform ; 287: 124-125, 2021 Nov 18.
Article in English | MEDLINE | ID: mdl-34795095

ABSTRACT

The term 'metadata' is mentioned in every one of the FAIR principles. Metadata is without question important for findability, accessibility, and reusability, but it is essential for interoperability. Standardized schemas have been developed by various stakeholders for decades, yet they too rarely come into practical use. The reason for this is that the application domain is not clearly understood. In many biomedical research projects, the need for metadata is recognized at some point, but there is not only a lack of overview of existing standards but also a lack of correct assessment of what individual metadata schemas were actually made for. This paper differentiates application scenarios for metadata in clinical research.


Subjects
Medical Informatics, Metadata
4.
Gesundheitswesen ; 83(S 01): S54-S59, 2021 Nov.
Article in German | MEDLINE | ID: mdl-34731894

ABSTRACT

OBJECTIVE: The German Federal Ministry of Education and Research funded a project accompanying a funding initiative for registries in health services research. The aim was to provide cross-registry support, initially for 16 and later for 6 projects, with regard to methodological, technical and structural standards. METHODS: The 16 projects were initially guided in concept development, e.g., by providing a template for a registry protocol. Furthermore, an expert consultation was organized and carried out. To assist in the selection of an IT solution, a challenge workshop was hosted at which different vendors presented their registry software. The projects' catalogs of data elements were migrated into a metadata catalog and transferred to the standard model of ISO/IEC 11179. A set of quality indicators was defined for a cross-registry quality management approach to be implemented during the operational phase. To improve data quality, the indicators were to be transmitted and evaluated on a regular basis. RESULTS: The template for a registry protocol was used by the majority of projects when applying for funding of their operational phase. At the workshop on IT solutions, 12 registry software products were presented; however, the projects opted for other solutions for various reasons. Transferring the catalogs of data elements into a standard model enabled a comparison of attributes and value sets, which in turn enabled the formulation of recommendations for important elements. A set of five quality indicators was defined for quality management, for which an initial evaluation was carried out for 2020. CONCLUSION: The template of a registry protocol supports the systematic development of a concept. The use of a uniformly structured catalog of data elements supports compliance with the FAIR principles. Monitoring of data quality can be achieved by regularly evaluating quality indicators across registries.
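
As a sketch of what transferring a data element catalog to the ISO/IEC 11179 standard model involves, the following Python dataclasses pair a data element with its value domain; the field names follow the standard loosely and the example element is invented.

    from dataclasses import dataclass

    # ISO/IEC 11179-style data element: an identified concept paired with
    # a value domain that constrains its permissible values.
    @dataclass
    class ValueDomain:
        datatype: str
        permissible_values: tuple

    @dataclass
    class DataElement:
        name: str
        definition: str
        value_domain: ValueDomain

    # Invented example element, for illustration only.
    smoking_status = DataElement(
        name="smoking_status",
        definition="Current tobacco smoking status of the patient",
        value_domain=ValueDomain("code", ("never", "former", "current")),
    )
    print(smoking_status)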


Subjects
Data Accuracy, Metadata, Germany, Health Services Research, Registries
5.
BMC Med Inform Decis Mak ; 21(Suppl 7): 275, 2021 11 09.
Article in English | MEDLINE | ID: mdl-34753474

ABSTRACT

BACKGROUND: Fast food, with its abundance and availability to consumers, may have health consequences due to its high calorie intake, a major contributor to life-threatening diseases. Providing nutritional information has some impact on consumers' decisions to self-regulate and adopt healthier diets, and government regulations have therefore mandated the publishing of nutritional content to assist consumers, including for fast food. However, fast food nutritional information is fragmented, and we see a benefit in collating nutritional data to synthesize knowledge for individuals. METHODS: We developed the ontology of fast food facts as an opportunity to standardize knowledge of fast food and link nutritional data that can be analyzed and aggregated for the information needs of consumers and experts. The ontology is based on metadata from 21 fast food establishments' nutritional resources and authored in OWL2 using Protégé. RESULTS: Three evaluators reviewed the logical structure of the ontology through natural language translation of the axioms. While there was majority agreement (76.1% pairwise agreement) on the veracity of the ontology, we identified 103 of the 430 statements as erroneous. We revised the ontology and publicly released its initial version. The ontology has 413 classes, 21 object properties, 13 data properties, and 494 logical axioms. CONCLUSION: With the initial release of the ontology of fast food facts, we discuss some future visions for the continued evolution of this knowledge base and the challenges we plan to address, such as the management and publication of voluminous amounts of semantically linked fast food nutritional data.
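
A minimal owlready2 sketch of the kind of OWL2 modeling the abstract describes; the IRI, class names and property below are invented for illustration and are not taken from the published ontology.

    from owlready2 import Thing, get_ontology

    # Invented classes and one data property, sketching the kind of OWL2
    # hierarchy an ontology of fast food facts might define in Protégé.
    onto = get_ontology("http://example.org/fastfoodfacts.owl")  # hypothetical IRI

    with onto:
        class MenuItem(Thing):
            pass

        class Burger(MenuItem):
            pass

        class has_calories(MenuItem >> float):  # data property: MenuItem -> float
            pass

    onto.save(file="fastfoodfacts.owl", format="rdfxml")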


Subjects
Concept Formation, Semantic Web, Fast Foods, Humans, Language, Metadata
6.
Sensors (Basel) ; 21(19), 2021 Sep 28.
Article in English | MEDLINE | ID: mdl-34640782

ABSTRACT

The annotation of sensor data with semantic metadata is essential to the goals of automation and interoperability in the context of Industry 4.0. In this contribution, we outline a semantic description of quality of data in sensor networks in terms of indicators, metrics and interpretations. The concepts thus defined are consolidated into an ontology that describes quality-of-data metainformation in heterogeneous sensor networks, and methods for determining the corresponding quality-of-data dimensions are outlined. By incorporating support for sensor calibration models and measurement uncertainty via a previously derived ontology, conformity with metrological requirements for sensor data is ensured. A quality description for a calibrated sensor generated using the resulting ontology is presented in the JSON-LD format, using the battery level and calibration data as quality indicators. Finally, the general applicability of the model is demonstrated using a series of competency questions.
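
The JSON-LD quality description might look roughly as follows, with battery level as one quality indicator; the context IRI and term names are placeholders rather than the ontology's published vocabulary.

    import json

    # Hedged sketch of a JSON-LD quality-of-data description for a
    # calibrated sensor. Context IRI and term names are placeholders.
    quality_doc = {
        "@context": {"ex": "https://example.org/sensor-quality#"},  # hypothetical
        "@id": "ex:sensor-42",
        "ex:qualityIndicator": [
            {"ex:indicatorType": "batteryLevel", "ex:value": 0.87},
            {"ex:indicatorType": "calibrationAge", "ex:value": "P30D"},
        ],
    }
    print(json.dumps(quality_doc, indent=2))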


Subjects
Metadata, Semantics
7.
J Integr Bioinform ; 18(3), 2021 Oct 20.
Article in English | MEDLINE | ID: mdl-34668356

ABSTRACT

A standardized approach to annotating computational biomedical models and their associated files can facilitate model reuse and reproducibility among research groups, enhance search and retrieval of models and data, and enable semantic comparisons between models. Motivated by these potential benefits and guided by consensus across the COmputational Modeling in BIology NEtwork (COMBINE) community, we have developed a specification for encoding annotations in Open Modeling and EXchange (OMEX)-formatted archives. This document details version 1.2 of the specification, which builds on version 1.0, published last year in this journal. In particular, this version includes a set of initial model-level annotations (whereas v1.0 described annotations exclusively at a smaller scale). Additionally, this version uses best practices for namespaces and introduces omex-library.org as a common root for all annotations. Distributing modeling projects within an OMEX archive is a best practice established by COMBINE, and the OMEX metadata specification presented here provides a harmonized, community-driven approach for annotating a variety of standardized model representations. This specification acts as a technical guideline for developing software tools that can support this standard, and thereby encourages broad advances in model reuse, discovery, and semantic analyses.
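
As a sketch of the specification's omex-library.org convention, the following rdflib snippet attaches a model-level Dublin Core annotation to an archive entry; the archive and model names are invented.

    from rdflib import Graph, Literal, URIRef
    from rdflib.namespace import DCTERMS

    # Attach a model-level Dublin Core annotation to an archive entry,
    # rooted at omex-library.org per the specification; names are invented.
    g = Graph()
    model = URIRef("http://omex-library.org/example.omex/model.xml")
    g.add((model, DCTERMS.creator, Literal("Jane Modeler")))
    g.add((model, DCTERMS.description, Literal("Example model-level annotation")))
    print(g.serialize(format="turtle"))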


Subjects
Metadata, Software, Computational Biology, Reproducibility of Results, Semantics
8.
J Integr Bioinform ; 18(3)2021 Oct 22.
Artigo em Inglês | MEDLINE | ID: mdl-34674411

RESUMO

This special issue of the Journal of Integrative Bioinformatics contains updated specifications of COMBINE standards in systems and synthetic biology. The 2021 special issue presents four updates of standards: Synthetic Biology Open Language Visual Version 2.3, Synthetic Biology Open Language Visual Version 3.0, Simulation Experiment Description Markup Language Level 1 Version 4, and OMEX Metadata specification Version 1.2. This document can also be consulted to identify the latest specifications of all COMBINE standards.


Subjects
Computational Biology, Synthetic Biology, Computer Simulation, Metadata, Programming Languages, Software
9.
Nat Commun ; 12(1): 5854, 2021 10 06.
Article in English | MEDLINE | ID: mdl-34615866

ABSTRACT

The amount of public proteomics data is rapidly increasing but there is no standardized format to describe the sample metadata and their relationship with the dataset files in a way that fully supports their understanding or reanalysis. Here we propose to develop the transcriptomics data format MAGE-TAB into a standard representation for proteomics sample metadata. We implement MAGE-TAB-Proteomics in a crowdsourcing project to manually curate over 200 public datasets. We also describe tools and libraries to validate and submit sample metadata-related information to the PRIDE repository. We expect that these developments will improve the reproducibility and facilitate the reanalysis and integration of public proteomics datasets.
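
An illustrative sketch of the tabular sample-to-file mapping that MAGE-TAB-Proteomics captures; the column set is abridged and the rows are invented, not drawn from a curated PRIDE dataset.

    import csv

    # Abridged, invented rows in the SDRF-like sample-to-file layout used
    # by MAGE-TAB-Proteomics; real curated files carry many more columns.
    rows = [
        ["source name", "characteristics[organism]", "characteristics[disease]", "comment[data file]"],
        ["sample 1", "Homo sapiens", "normal", "sample1.raw"],
        ["sample 2", "Homo sapiens", "COVID-19", "sample2.raw"],
    ]
    with open("sdrf_sketch.tsv", "w", newline="") as fh:
        csv.writer(fh, delimiter="\t").writerows(rows)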


Subjects
Data Analysis, Protein Databases, Metadata, Proteomics, Big Data, Humans, Reproducibility of Results, Software, Transcriptome
10.
Stud Health Technol Inform ; 283: 59-68, 2021 Sep 21.
Article in English | MEDLINE | ID: mdl-34545820

ABSTRACT

INTRODUCTION: Ensuring scientific reproducibility and compliance with the documentation guidelines of funding bodies and journals is a topic of greatly increasing importance in biomedical research. Failure to comply with, or unawareness of, documentation standards can have adverse effects on the translation of research into patient treatments, as well as economic implications. In the context of the German Research Foundation-funded collaborative research center (CRC) 1002, an IT-infrastructure sub-project was designed. Its goal has been to establish standardized metadata documentation and information exchange benefitting the participating research groups with minimal additional documentation effort. METHODS: Implementation of the self-developed menoci-based research data platform (RDP) was driven by close communication and collaboration with researchers as early adopters and experts. Requirements analysis and concept development involved in-person observation of experimental procedures, interviews and collaboration with researchers and experts, as well as the investigation of available and applicable metadata standards and tools. The Drupal-based RDP features distinct modules for the different documented data and workflow types, and both the development and the types of collected metadata were continuously reviewed and evaluated with the early adopters. RESULTS: The menoci-based RDP allows for standardized documentation, sharing and cross-referencing of different data types, workflows, and scientific publications. Different modules have been implemented for specific data types and workflows, allowing for the enrichment of entries with specific metadata and linking to further relevant entries in different modules. DISCUSSION: Taking the workflows and datasets of the frequently involved experimental service projects as a starting point for (meta-)data types to overcome irreproducibility of research data results in increased benefits for researchers with minimized effort. While the menoci-based RDP with its data models and metadata schema was originally developed in a cardiological context, it has been implemented and extended to other consortia at Göttingen Campus and beyond, in different life science research areas.


Subjects
Biomedical Research, Metadata, Documentation, Humans, Reproducibility of Results, Workflow
11.
Eur Rev Med Pharmacol Sci ; 25(17): 5556-5560, 2021 Sep.
Article in English | MEDLINE | ID: mdl-34533806

ABSTRACT

OBJECTIVE: This paper aims to develop four prediction models for recovered and unrecovered cases using descriptive data and symptoms of COVID-19 patients. The developed prediction models aim to extract the variables that are important in predicting recovered cases, using binary values for recovery status. MATERIALS AND METHODS: The data were collected from different countries all over the world. The input of the prediction models contains 28 symptoms and four variables of patient information. Symptoms of COVID-19 include high fever, low fever, sore throat, cough, and so on, while patient metadata includes province, county, sex, and age. The dataset contains 1254 patients, of whom 664 are recovered cases. Four models were used: neural network, support vector machine, CHAID, and QUEST. To develop the prediction models, the dataset was divided into training and test sets with splitting ratios of 70% and 30%, respectively. RESULTS: The results showed that the neural network was the most effective model for COVID-19 prediction, with the highest performance metrics on both the training and test datasets. Recovered cases were found to be associated mainly with the place of the patients, particularly the patient's province. The results also showed that high fever is not strongly associated with recovered cases, whereas cough and low fever are. In addition, the country, sex, and age of the patients have higher importance than the other patient symptoms in the prediction models. CONCLUSIONS: The results revealed that recovered COVID-19 cases can be effectively predicted using patient characteristics and symptoms, and that the neural network is the most effective model for creating a COVID-19 prediction model. Finally, the research provides empirical evidence that recovered cases of COVID-19 are closely related to patients' provinces.
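
A hedged sketch of the neural-network variant of this setup with scikit-learn: binary recovery labels, a 70/30 train/test split, and 32 input variables (28 symptoms plus 4 patient variables). The data below are random placeholders, not the authors' dataset.

    import numpy as np
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    # Random placeholder data: 1254 patients, 28 symptoms + 4 patient
    # variables as binary/coded features; label 1 = recovered.
    rng = np.random.default_rng(0)
    X = rng.integers(0, 2, size=(1254, 32))
    y = rng.integers(0, 2, size=1254)

    # 70/30 split as in the paper.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.30, random_state=0)

    clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
    clf.fit(X_train, y_train)
    print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))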


Subjects
COVID-19, Theoretical Models, Neural Networks (Computer), SARS-CoV-2, Support Vector Machine, Symptom Assessment, Humans, Metadata
12.
Sensors (Basel) ; 21(18), 2021 Sep 16.
Article in English | MEDLINE | ID: mdl-34577422

ABSTRACT

This work considers the design and practical implementation of JSCC-Cast, a comprehensive analog video encoding and transmission system requiring a reduced amount of digital metadata. Suitable applications for JSCC-Cast are multicast transmissions over time-varying channels and Internet of Things wireless connectivity of end devices with severe constraints on their computational capabilities. The proposed system exhibits an image quality similar to that of existing analog and hybrid encoding alternatives such as Softcast. Its design is based on the use of linear transforms that exploit spatial and temporal redundancy, and on the analog encoding of the transformed coefficients with different protection levels depending on their relevance. JSCC-Cast is compared to Softcast, which is considered the benchmark for analog and hybrid video coding, and to an all-digital H.265-based encoder. The results show that, depending on the scenario and considering image quality metrics such as the structural similarity index measure, the peak signal-to-noise ratio, and the perceived quality of the video, JSCC-Cast exhibits a performance close to that of Softcast but with less metadata and without requiring a feedback channel to track channel variations. Moreover, in some circumstances, JSCC-Cast achieves a perceived frame quality comparable to that of the digital encoder.
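
Of the quality metrics mentioned, PSNR is the most direct to reproduce; the following is a worked implementation of the standard formula for 8-bit frames, not code from the paper.

    import numpy as np

    def psnr(reference: np.ndarray, decoded: np.ndarray, peak: float = 255.0) -> float:
        """Peak signal-to-noise ratio in dB for 8-bit frames."""
        mse = np.mean((reference.astype(np.float64) - decoded.astype(np.float64)) ** 2)
        return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

    # Compare a frame against a noisy reconstruction of itself.
    ref = np.random.randint(0, 256, (480, 640), dtype=np.uint8)
    noisy = np.clip(ref + np.random.normal(0, 5, ref.shape), 0, 255).astype(np.uint8)
    print(f"PSNR: {psnr(ref, noisy):.2f} dB")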


Subjects
Algorithms, Metadata, Signal-to-Noise Ratio, Software, Video Recording
13.
Nutrients ; 13(9), 2021 Sep 21.
Article in English | MEDLINE | ID: mdl-34579168

ABSTRACT

In any research field, data access and data integration are major challenges that even large, well-established consortia face. Although data sharing initiatives are increasing, joint data analyses on nutrition and microbiomics in health and disease are still scarce. We aimed to identify observational studies with data on nutrition and gut microbiome composition from the Intestinal Microbiomics (INTIMIC) Knowledge Platform following the findable, accessible, interoperable, and reusable (FAIR) principles. An adapted template from the European Nutritional Phenotype Assessment and Data Sharing Initiative (ENPADASI) consortium was used to collect microbiome-specific information and other related factors. In total, 23 studies (17 longitudinal and 6 cross-sectional) were identified from Italy (7), Germany (6), the Netherlands (3), Spain (2), Belgium (1), and France (1), or from multiple countries (3). Of these, 21 studies collected information on both dietary intake (24-h dietary recall, food frequency questionnaire (FFQ), or food records) and the gut microbiome. All studies collected stool samples. The most often used sequencing platform was Illumina MiSeq, and the preferred hypervariable regions of the 16S rRNA gene were V3-V4 or V4. The combination of datasets will allow for sufficiently powered investigations to increase the knowledge and understanding of the relationship between food and the gut microbiome in health and disease.


Subjects
Gastrointestinal Microbiome, Nutrition Surveys, Nutritional Sciences, Observational Studies as Topic, Diet Surveys/methods, Eating, Europe, Humans, Information Dissemination, Metadata, Nutrition Surveys/methods, Nutritional Sciences/methods
14.
Database (Oxford) ; 2021, 2021 09 29.
Article in English | MEDLINE | ID: mdl-34585726

ABSTRACT

EpiSurf is a Web application for selecting viral populations of interest and then analyzing how their amino acid changes are distributed along epitopes. Viral sequences are searched within ViruSurf, which stores curated metadata and amino acid changes imported from the most widely used deposition sources for viral databases (GenBank, COVID-19 Genomics UK (COG-UK) and the Global Initiative on Sharing All Influenza Data (GISAID)). Epitopes are searched within the open-source Immune Epitope Database or directly proposed by users by indicating their start and stop positions in the context of a given viral protein. Amino acid changes of selected populations are joined with epitopes of interest; a result table summarizes, for each epitope, statistics about the overlapping amino acid changes and about the sequences carrying such alterations. The results may also be inspected with the VirusViz Web application; epitope regions are highlighted within the given viral protein, and changes can be comparatively inspected. For sequences mutated within the epitope, we also offer a complete view of the distribution of amino acid changes, optionally grouped by location, collection date or lineage. Thanks to these functionalities, EpiSurf supports the user-friendly testing of epitope conservancy within selected populations of interest, which can be of utmost relevance for designing vaccines, drugs or serological assays. EpiSurf is available at two endpoints. Database URL: http://gmql.eu/episurf/ (for searching GenBank and COG-UK sequences) and http://gmql.eu/episurf_gisaid/ (for GISAID sequences).


Subjects
Amino Acid Substitution, Viral Antigens/chemistry, Epitopes/chemistry, Internet, Metadata, SARS-CoV-2/chemistry, Search Engine, Software, Amino Acids/chemistry, Amino Acids/immunology, Viral Antigens/immunology, COVID-19/virology, Epitopes/immunology, Humans, SARS-CoV-2/immunology
15.
RECIIS (Online) ; 15(3): 722-735, Jul.-Sep. 2021. illus, tab
Article in English | LILACS | ID: biblio-1342698

ABSTRACT

The FAIR principles have become a data management instrument for the academic and scientific community, since they provide a set of guiding principles to bring findability, accessibility, interoperability and reusability to data and metadata stewardship. Since their official publication in 2016 by Scientific Data – Nature, these principles have received worldwide recognition and have been quickly endorsed and adopted as a cornerstone of data stewardship and research policy. However, when put into practice, they occasionally result in organisational, legal and technological challenges that can lead to doubts and uncertainty as to whether the effort of implementing them is worthwhile. Soon after their publication, the European Commission and other funding agencies started to require that project proposals include a Data Management Plan (DMP) based on the FAIR principles. This paper reports on the adherence of DMPs to the FAIR principles, critically evaluating ten European DMP templates. We observed that the current FAIRness of most of these DMPs is only partly satisfactory, in that they address data best practices, findability, accessibility and sometimes preservation, but pay much less attention to metadata and interoperability.




Subjects
Humans, Metadata, Scholarly Communication, Health Information Interoperability, Data Management, Comment, Health Research Policy, Scientific Domains, Data Analysis
16.
Sci Data ; 8(1): 214, 2021 08 11.
Article in English | MEDLINE | ID: mdl-34381057

ABSTRACT

Transcriptional profiling of pre- and post-malignant colorectal cancer (CRC) lesions enables temporal monitoring of molecular events underlying neoplastic progression. However, the most widely used transcriptomic dataset for CRC, TCGA-COAD, is devoid of adenoma samples, which increases reliance on an assortment of disparate microarray studies and hinders consensus building. To address this, we developed a microarray meta-dataset comprising 231 healthy, 132 adenoma, and 342 CRC tissue samples from twelve independent studies. Utilizing a stringent analytic framework, select datasets were downloaded from the Gene Expression Omnibus, normalized by frozen robust multiarray averaging and subsequently merged. Batch effects were then identified and removed by empirical Bayes estimation (ComBat). Finally, the meta-dataset was filtered for low variant probes, enabling downstream differential expression as well as quantitative and functional validation through cross-platform correlation and enrichment analyses, respectively. Overall, our meta-dataset provides a robust tool for investigating colorectal adenoma formation and malignant transformation at the transcriptional level with a pipeline that is modular and readily adaptable for similar analyses in other cancer types.
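
The final filtering step can be sketched in a few lines; the expression matrix and variance cutoff below are placeholders, not the study's actual values.

    import numpy as np
    import pandas as pd

    # Placeholder expression matrix: probes x samples (705 = 231 + 132 + 342).
    expr = pd.DataFrame(np.random.rand(1000, 705))

    # Drop the least variable quarter of probes; the cutoff is hypothetical.
    probe_var = expr.var(axis=1)
    filtered = expr.loc[probe_var > probe_var.quantile(0.25)]
    print(f"kept {filtered.shape[0]} of {expr.shape[0]} probes")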


Subjects
Adenoma/genetics, Adenoma/pathology, Neoplastic Cell Transformation/genetics, Colorectal Neoplasms/genetics, Colorectal Neoplasms/pathology, Gene Expression Profiling, Aged, Female, Humans, Male, Metadata, Middle Aged, Oligonucleotide Array Sequence Analysis, Transcriptome
17.
PLoS Biol ; 19(8): e3001319, 2021 08.
Article in English | MEDLINE | ID: mdl-34437530

ABSTRACT

Cryo-electron tomography (cryo-ET) and subtomogram averaging (STA) are increasingly used for macromolecular structure determination in situ. Here, we introduce a set of computational tools and resources designed to enable flexible approaches to STA through increased automation and simplified metadata handling. We create a bidirectional interface between the Dynamo software package and the Warp-Relion-M pipeline, providing a framework for ab initio and geometrical approaches to multiparticle refinement in M. We illustrate the power of working within this framework by applying it to EMPIAR-10164, a publicly available dataset containing immature HIV-1 virus-like particles (VLPs), and a challenging in situ dataset containing chemosensory arrays in bacterial minicells. Additionally, we provide a comprehensive, step-by-step guide to obtaining a 3.4-Å reconstruction from EMPIAR-10164. The guide is hosted on https://teamtomo.org/, a collaborative online platform we establish for sharing knowledge about cryo-ET.


Subjects
Cryoelectron Microscopy, Electron Microscope Tomography, Computer-Assisted Image Processing/methods, Software, Escherichia coli, Metadata
18.
Database (Oxford) ; 2021, 2021 08 14.
Article in English | MEDLINE | ID: mdl-34389844

ABSTRACT

Researchers are seeking cost-effective solutions for management and analysis of large-scale genotypic and phenotypic data. Open-source software is uniquely positioned to fill this need through user-focused, crowd-sourced development. Tripal, an open-source toolkit for developing biological data web portals, uses the GMOD Chado database schema to achieve flexible, ontology-driven storage in PostgreSQL. Tripal also aids research-focused web portals in providing data according to findable, accessible, interoperable, reusable (FAIR) principles. We describe here a fully relational PostgreSQL solution to handle large-scale genotypic and phenotypic data that is implemented as a collection of freely available, open-source modules. These Tripal extension modules provide a holistic approach for importing, storage, display and analysis within a relational database schema. Furthermore, they embody the Tripal approach to FAIR data by providing multiple search tools and ensuring metadata is fully described and interoperable. Our solution focuses on data integrity, as well as optimizing performance to provide a fully functional system that is currently being used in the production of Tripal portals for crop species. We fully describe the implementation of our solution and discuss why a PostgreSQL-powered web portal provides an efficient environment for researcher-driven genotypic and phenotypic data analysis.


Subjects
Genetic Databases, Software, Genotype, Metadata
19.
J Biomed Inform ; 122: 103897, 2021 10.
Article in English | MEDLINE | ID: mdl-34454078

ABSTRACT

INTRODUCTION: Existing methods to make data Findable, Accessible, Interoperable, and Reusable (FAIR) are usually carried out in a post hoc manner: after the research project is conducted and data are collected. De novo FAIRification, on the other hand, incorporates the FAIRification steps in the process of a research project. In medical research, data are often collected and stored via electronic Case Report Forms (eCRFs) in Electronic Data Capture (EDC) systems. By implementing a de novo FAIRification process in such a system, the reusability and, thus, scalability of FAIRification across research projects can be greatly improved. In this study, we developed and implemented a novel method for de novo FAIRification via an EDC system. We evaluated our method by applying it to the Registry of Vascular Anomalies (VASCA). METHODS: Our EDC- and research project-independent method ensures that eCRF data entered into an EDC system can be transformed into machine-readable, FAIR data using a semantic data model (a canonical representation of the data, based on ontology concepts and semantic web standards) and mappings from the model to questions on the eCRF. The FAIRified data are stored in a triple store and can, together with associated metadata, be accessed and queried through a FAIR Data Point. The method was implemented in Castor EDC, an EDC system, through a data transformation application. The FAIRness of the method's output, the FAIRified data and metadata, was evaluated using the FAIR Evaluation Services. RESULTS: We successfully applied our FAIRification method to the VASCA registry. Data entered on eCRFs are automatically transformed into machine-readable data and can be accessed and queried using SPARQL queries in the FAIR Data Point. Twenty-one FAIR Evaluator tests pass and one test, regarding the metadata persistence policy, fails because this policy is not yet in place. CONCLUSION: In this study, we developed a novel method for de novo FAIRification via an EDC system. Its application in the VASCA registry and the automated FAIR evaluation show that the method can be used to make clinical research data FAIR when they are entered in an eCRF, without any intervention from data management and data entry personnel. Due to the generic approach and the tooling developed, we believe that our method can be used in other registries and clinical trials as well.
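
Querying the FAIRified data through a FAIR Data Point's SPARQL endpoint might look as follows; the endpoint URL and vocabulary are assumptions for illustration, not the VASCA registry's actual ones.

    from SPARQLWrapper import JSON, SPARQLWrapper

    # Hypothetical FAIR Data Point endpoint and vocabulary.
    sparql = SPARQLWrapper("https://example.org/fdp/sparql")
    sparql.setQuery("""
        SELECT ?subject ?diagnosis
        WHERE { ?subject <https://example.org/vocab#diagnosis> ?diagnosis }
        LIMIT 10
    """)
    sparql.setReturnFormat(JSON)

    for row in sparql.query().convert()["results"]["bindings"]:
        print(row["subject"]["value"], row["diagnosis"]["value"])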


Subjects
Biomedical Research, Metadata, Data Management, Electronics, Registries
20.
J Proteome Res ; 20(9): 4621-4624, 2021 09 03.
Article in English | MEDLINE | ID: mdl-34342226

ABSTRACT

The volume of proteomics and mass spectrometry data available in public repositories continues to grow at a rapid pace as more researchers embrace open science practices. Open access to the data behind scientific discoveries has become critical to validate published findings and develop new computational tools. Here, we present ppx, a Python package that provides easy, programmatic access to the data stored in ProteomeXchange repositories, such as PRIDE and MassIVE. The ppx package can be used as either a command line tool or a Python package to retrieve the files and metadata associated with a project when provided its identifier. To demonstrate how ppx enhances reproducible research, we used ppx within a Snakemake workflow to reanalyze a published data set with the open modification search tool ANN-SoLo and compared our reanalysis to the original results. We show that ppx readily integrates into workflows, and our reanalysis produced results consistent with the original analysis. We envision that ppx will be a valuable tool for creating reproducible analyses, providing tool developers easy access to data for development, testing, and benchmarking, and enabling the use of mass spectrometry data in data-intensive analyses. The ppx package is freely available and open source under the MIT license at https://github.com/wfondrie/ppx.
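
A quick usage sketch following ppx's documented find_project interface; the accession is ProteomeXchange's public example dataset and the download choice is illustrative.

    import ppx

    # Look up a ProteomeXchange project by accession and list the files
    # it holds remotely (PXD000001 is the public example dataset).
    proj = ppx.find_project("PXD000001")
    print(proj.remote_files())

    # Fetch the first remote file into ppx's local data directory.
    proj.download(proj.remote_files()[:1])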


Subjects
Proteomics, Software, Mass Spectrometry, Metadata, Search Engine