Results 1 - 20 of 469
1.
Sci Rep ; 12(1): 5767, 2022 Apr 06.
Article in English | MEDLINE | ID: mdl-35388080

ABSTRACT

Accumulation of beta-amyloid in the brain and cognitive decline are considered hallmarks of Alzheimer's disease. Since previous studies have shown that both factors can manifest in the retina, this study investigated whether a deep learning method could predict an individual's cognition from an RGB image of their retina together with metadata. A deep learning model, EfficientNet, was used to predict cognitive scores from the Canadian Longitudinal Study on Aging (CLSA) database. The proposed model explained 22.4% of the variance in cognitive scores on the test dataset using fundus images and metadata. Metadata alone explained more of the variance (20.4%) than fundus images alone (9.3%). Attention maps highlighted the optic nerve head as the most influential feature in predicting cognitive scores. The results demonstrate that RGB fundus images are of limited use in predicting cognition.
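The "variance explained" figures quoted in this abstract are the coefficient of determination (R²). A minimal stdlib sketch of the metric, not the authors' code:

```python
def variance_explained(y_true, y_pred):
    """Coefficient of determination R^2: share of variance in y_true captured by y_pred."""
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot

# Predictions that always return the mean explain 0% of the variance;
# perfect predictions explain 100%.
print(variance_explained([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # 0.0
print(variance_explained([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 1.0
```

On this scale, the paper's 22.4% corresponds to an R² of 0.224.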


Subjects
Deep Learning, Canada, Cognition, Fundus Oculi, Longitudinal Studies, Metadata
2.
BMC Bioinformatics ; 23(1): 123, 2022 Apr 07.
Article in English | MEDLINE | ID: mdl-35392801

ABSTRACT

BACKGROUND: Heterogeneous omics data, increasingly collected through high-throughput technologies, can contain hidden answers to very important and still unsolved biomedical questions. Their integration and processing are crucial mostly for tertiary analysis of Next Generation Sequencing data, although suitable big data strategies still address mainly primary and secondary analysis. Hence, there is a pressing need for algorithms specifically designed to explore big omics datasets, capable of ensuring scalability and interoperability, possibly relying on high-performance computing infrastructures. RESULTS: We propose RGMQL, an R/Bioconductor package conceived to provide a set of specialized functions to extract, combine, process and compare omics datasets and their metadata from different and differently localized sources. RGMQL is built over the GenoMetric Query Language (GMQL) data management and computational engine, and can leverage its open curated repository as well as its cloud-based resources, with the possibility of outsourcing computational tasks to GMQL remote services. Furthermore, it overcomes the limits of the GMQL declarative syntax by guaranteeing a procedural approach to dealing with omics data within the R/Bioconductor environment. Above all, it provides full interoperability with other packages of the R/Bioconductor framework and extensibility over the most used genomic data structures and processing functions. CONCLUSIONS: RGMQL combines the query expressiveness and computational efficiency of GMQL with a complete processing flow in the R environment, being a fully integrated extension of the R/Bioconductor framework. Here we provide three fully reproducible example use cases of biological relevance that are particularly illustrative of its flexibility of use and interoperability with other R/Bioconductor packages. They show how RGMQL can easily scale up from local to parallel and cloud computing while combining and analyzing heterogeneous omics data from local or remote datasets, both public and private, in a way that is completely transparent to the user.


Subjects
Metadata, Software, Big Data, Cloud Computing, Genomics
3.
Sci Data ; 9(1): 169, 2022 04 13.
Article in English | MEDLINE | ID: mdl-35418585

ABSTRACT

The genomes of thousands of individuals are profiled within Dutch healthcare and research each year. However, this valuable genomic data, the associated clinical data and consent are captured in different ways and stored across many systems and organizations. This makes it difficult to discover rare-disease patients, reuse data for personalized medicine and establish research cohorts based on specific parameters. FAIR Genomes aims to enable NGS data reuse by developing metadata standards for the data descriptions needed to FAIRify genomic data, while also addressing ELSI issues. We developed a semantic schema of essential data elements harmonized with international FAIR initiatives. The FAIR Genomes schema v1.1 contains 110 elements in 9 modules. It reuses common ontologies such as NCIT, DUO and EDAM, introducing new terms only when necessary. The schema is represented by a YAML file that can be transformed into templates for data entry software (EDC) and programmatic interfaces (JSON, RDF) to ease genomic data sharing in research and healthcare. The schema, documentation and MOLGENIS reference implementation are available at https://fairgenomes.org.
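The schema-to-template transformation described here can be pictured roughly as follows; the module and element names below are invented stand-ins, not actual FAIR Genomes v1.1 content:

```python
import json

# Invented, much-simplified stand-in for one schema module; the real schema
# (9 modules, 110 elements, defined in YAML) lives at fairgenomes.org.
schema = {
    "module": "Sequencing",
    "elements": [
        {"name": "platform", "ontology": "EDAM", "required": True},
        {"name": "readLength", "ontology": None, "required": False},
    ],
}

# Derive an empty data-entry template from the schema, then serialize it
# to JSON for a programmatic interface:
template = {e["name"]: None for e in schema["elements"]}
print(json.dumps(template))  # {"platform": null, "readLength": null}
```

The same traversal could just as easily emit RDF triples or EDC form definitions, which is the point of keeping the schema in one machine-readable file.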


Subjects
High-Throughput Nucleotide Sequencing, Metadata, Delivery of Health Care, Genomics, Humans, Software
4.
J Med Internet Res ; 24(4): e28114, 2022 04 22.
Article in English | MEDLINE | ID: mdl-35451980

ABSTRACT

BACKGROUND: Advances in biomedical research using deep learning techniques have generated a large volume of related literature. However, there is a lack of scientometric studies that provide a bird's-eye view of them. This absence has led to a partial and fragmented understanding of the field and its progress. OBJECTIVE: This study aimed to gain a quantitative and qualitative understanding of the scientific domain by analyzing diverse bibliographic entities that represent the research landscape from multiple perspectives and levels of granularity. METHODS: We searched and retrieved 978 deep learning studies in biomedicine from the PubMed database. A scientometric analysis was performed by analyzing the metadata, content of influential works, and cited references. RESULTS: In the process, we identified the current leading fields, major research topics and techniques, knowledge diffusion, and research collaboration. There was a predominant focus on applying deep learning, especially convolutional neural networks, to radiology and medical imaging, whereas a few studies focused on protein or genome analysis. Radiology and medical imaging also appeared to be the most significant knowledge sources and an important field in knowledge diffusion, followed by computer science and electrical engineering. A coauthorship analysis revealed various collaborations among engineering-oriented and biomedicine-oriented clusters of disciplines. CONCLUSIONS: This study investigated the landscape of deep learning research in biomedicine and confirmed its interdisciplinary nature. Although it has been successful, we believe that there is a need for diverse applications in certain areas to further boost the contributions of deep learning in addressing biomedical research problems. We expect the results of this study to help researchers and communities better align their present and future work.


Subjects
Biomedical Research, Deep Learning, Bibliometrics, Humans, Metadata, Neural Networks (Computer), Publications
5.
J Biomed Semantics ; 13(1): 12, 2022 Apr 25.
Article in English | MEDLINE | ID: mdl-35468846

ABSTRACT

BACKGROUND: The COVID-19 pandemic has challenged healthcare systems and research worldwide. Data are collected all over the world and need to be integrated and made available to other researchers quickly. However, the various heterogeneous information systems used in hospitals can result in fragmentation of health data over multiple data 'silos' that are not interoperable for analysis. Consequently, clinical observations in hospitalised patients are not prepared to be reused efficiently and in a timely manner. There is a need to adapt research data management in hospitals to make COVID-19 observational patient data machine actionable, i.e. more Findable, Accessible, Interoperable and Reusable (FAIR) for humans and machines. We therefore applied the FAIR principles in the hospital to make patient data more FAIR. RESULTS: In this paper, we present our FAIR approach to transform COVID-19 observational patient data collected in the hospital into machine-actionable digital objects that answer medical doctors' research questions. With this objective, we conducted a coordinated FAIRification among stakeholders, based on ontological models for data and metadata and a FAIR-based architecture that complements the existing data management. We applied FAIR Data Points for metadata exposure, turning investigational parameters into a FAIR dataset. We demonstrated that this dataset is machine actionable by means of three different computational activities: federated querying of patient data alongside existing open knowledge sources across the world through the Semantic Web, implementing Web APIs for data query interoperability, and building applications on top of these FAIR patient data for FAIR data analytics in the hospital. CONCLUSIONS: Our work demonstrates that a FAIR research data management plan based on ontological models for data and metadata, open science, Semantic Web technologies, and FAIR Data Points provides a data infrastructure in the hospital for machine-actionable FAIR digital objects. These FAIR data are prepared to be reused for federated analysis, linkable to other FAIR data such as Linked Open Data, and reusable for developing software applications on top of them for hypothesis generation and knowledge discovery.


Subjects
COVID-19, Pandemics, COVID-19/epidemiology, Hospitals, Humans, Metadata, Semantic Web
6.
Sensors (Basel) ; 22(8), 2022 Apr 15.
Article in English | MEDLINE | ID: mdl-35459018

ABSTRACT

The work aims to propose a novel approach for automatically identifying all instruments present in an audio excerpt, using a set of individual convolutional neural networks (CNNs), one per tested instrument. The paper begins with background material: a description of the metadata and a review of related work on musical instrument identification, covering the tasks performed, input types, algorithms employed, and metrics used. It then presents the dataset prepared for the experiment and its division into training, validation, and evaluation subsets, followed by the architecture of the analyzed neural network model. Based on the described model, training is performed, and several quality metrics are determined for the training and validation sets. The results of evaluating the trained network on a separate set are then shown, with detailed values for precision, recall, and the numbers of true and false positive and negative detections. Model efficiency is high, with metric values ranging from 0.86 for the guitar to 0.99 for drums. A discussion and summary of the results follows.


Subjects
Deep Learning, Algorithms, Benchmarking, Metadata, Neural Networks (Computer)
7.
BMC Res Notes ; 15(1): 106, 2022 Mar 18.
Article in English | MEDLINE | ID: mdl-35303952

ABSTRACT

Open science and open data within scholarly research programs are growing both in popularity and by requirement from grant funding agencies and journal publishers. A central component of open data management, especially on collaborative, multidisciplinary, and multi-institutional science projects, is documentation of complete and accurate metadata, workflow, and source code, in addition to access to raw data and data products, to uphold FAIR (Findable, Accessible, Interoperable, Reusable) principles. Although best practice in data/metadata management is to use established, internationally accepted metadata schemata, many of these standards are discipline-specific, making it difficult to catalog multidisciplinary data and data products in a way that is easily findable and accessible. Consequently, scattered and incompatible metadata records create a barrier to scientific innovation, as researchers are burdened with finding and linking multidisciplinary datasets. One possible solution for increasing data findability, accessibility, interoperability, reproducibility, and integrity within multi-institutional and interdisciplinary projects is a centralized and integrated data management platform. Overall, this type of interoperable framework supports reproducible open science and its dissemination to various stakeholders and the public in a FAIR manner by providing direct access to raw data and linking protocols, metadata, and supporting workflow materials.


Subjects
Data Management, Metadata, Interdisciplinary Research, Reproducibility of Results, Software
8.
Sensors (Basel) ; 22(6), 2022 Mar 09.
Article in English | MEDLINE | ID: mdl-35336307

ABSTRACT

Sensor data from digital health technologies (DHTs) used in clinical trials provide a valuable source of information, because datasets can be combined across studies, combined with other data types, and reused multiple times for various purposes. To date, no standards exist for capturing or storing DHT biosensor data that are applicable across modalities and disease areas and that can also capture the clinical-trial- and environment-specific aspects, the so-called metadata. In this perspectives paper, we propose a metadata framework that divides DHT metadata into metadata that are independent of the therapeutic area or clinical trial design (the concept of interest and the context of use), and metadata that are dependent on these factors. We demonstrate how this framework can be applied to data collected with different types of DHTs deployed in the WATCH-PD clinical study of Parkinson's disease. This framework provides a means to pre-specify and therefore standardize aspects of the use of DHTs, promoting comparability of DHTs across future studies.
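The proposed split between trial-independent and trial-dependent metadata might be sketched as two record types; the field names here are hypothetical illustrations, not taken from the paper (only "concept of interest", "context of use", and the WATCH-PD study name appear in the abstract):

```python
from dataclasses import dataclass

@dataclass
class ConceptMetadata:
    """Trial-independent: travels with the sensor concept itself."""
    concept_of_interest: str   # e.g. "gait speed" (hypothetical example)
    context_of_use: str        # e.g. "at-home monitoring" (hypothetical example)

@dataclass
class TrialMetadata:
    """Trial-dependent: tied to one study's design and disease area."""
    study_id: str
    therapeutic_area: str

concept = ConceptMetadata("gait speed", "at-home monitoring")
trial = TrialMetadata("WATCH-PD", "Parkinson's disease")
print(concept.concept_of_interest, "|", trial.study_id)  # gait speed | WATCH-PD
```

Keeping the two record types separate is what lets the trial-independent half be pre-specified once and reused across studies.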


Subjects
Metadata, Parkinson Disease, Humans
9.
J Chem Inf Model ; 62(7): 1633-1643, 2022 04 11.
Article in English | MEDLINE | ID: mdl-35349259

ABSTRACT

The layout of a portable document format (PDF) file is fixed regardless of screen, and, unlike in mark-up languages such as HTML and XML, the metadata therein are latent. Semantic tags are usually absent, and a PDF file is not designed to have its content edited or its data interpreted by software. However, data held in PDF files need to be extracted in order to comply with open-source data requirements that are now government-regulated. In the chemical domain, related chemical and property data also need to be found, and their correlations need to be exploited to enable data science in areas such as data-driven materials discovery. Such relationships may be realized using text-mining software such as the "chemistry-aware" natural-language-processing tool, ChemDataExtractor; however, this tool has limited data-extraction capabilities from PDF files. This study presents the PDFDataExtractor tool, which can act as a plug-in to ChemDataExtractor. It outperforms other PDF-extraction tools for the chemical literature by coupling its functionalities to the chemical named-entity-recognition capabilities of ChemDataExtractor, much improving the intrinsic PDF-reading abilities of ChemDataExtractor. The system features a template-based architecture, which enables semantic information to be extracted from the PDF files of scientific articles in order to reconstruct the logical structure of articles. While other existing PDF-extraction tools focus on quantity mining, this template-based system focuses on quality mining across different layouts. PDFDataExtractor outputs information in JSON and plain text, including the metadata of a PDF file, such as paper title, authors, affiliation, email, abstract, keywords, journal, year, document object identifier (DOI), reference, and issue number. With a self-created evaluation article set, PDFDataExtractor achieved promising precision for all key assessed metadata areas of the document text.
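The JSON output fields listed above (title, authors, DOI, journal, year, ...) suggest records of roughly the following shape; the exact PDFDataExtractor output layout is an assumption here, and the record content is invented:

```python
import json

# Illustrative record; field names follow the metadata areas named in the
# abstract, not a verified PDFDataExtractor output sample.
raw = """
{
  "title": "An example article",
  "authors": ["A. Author", "B. Author"],
  "doi": "10.0000/example.2022",
  "journal": "J Chem Inf Model",
  "year": 2022
}
"""
record = json.loads(raw)
print(record["doi"])           # 10.0000/example.2022
print(len(record["authors"]))  # 2
```

Structured output like this is what makes the extracted metadata immediately consumable by downstream data-science pipelines, in contrast to the flat text locked inside the PDF.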


Subjects
Metadata, Reading, Data Mining, Natural Language Processing, Software
10.
J Biomed Semantics ; 13(1): 10, 2022 03 18.
Article in English | MEDLINE | ID: mdl-35303946

ABSTRACT

BACKGROUND: Health data from different specialties or domains generally have diverse formats and meanings, which can cause semantic communication barriers when these data are exchanged among heterogeneous systems. As such, this study is intended to develop a national health concept data model (HCDM) and a corresponding system to facilitate healthcare data standardization and centralized metadata management. METHODS: Based on 55 data sets (4640 data items) from 7 health business domains in China, a bottom-up approach was employed to build the structure and metadata for HCDM by referencing HL7 RIM. According to ISO/IEC 11179, a top-down approach was used to develop and standardize the data elements. RESULTS: HCDM adopted a three-level architecture of class, attribute and data type, and consisted of 6 classes and 15 sub-classes. Each class had a set of descriptive attributes, and every attribute was assigned a data type. 100 initial data elements (DEs) were extracted from HCDM, and 144 general DEs were derived from the corresponding initial DEs. Domain DEs were transformed by specializing general DEs using 12 controlled vocabularies, which were developed from HL7 vocabularies and actual health demands. A model-based system was successfully established to evaluate and manage the NHDD. CONCLUSIONS: HCDM provided a unified metadata reference for multi-source data standardization and management. This approach of defining health data elements is a feasible solution for healthcare information standardization, enabling healthcare interoperability in China.


Subjects
Metadata, Controlled Vocabulary, Delivery of Health Care, Semantics
11.
Sci Rep ; 12(1): 3933, 2022 03 10.
Article in English | MEDLINE | ID: mdl-35273188

ABSTRACT

Forest tree improvement helps provide adapted planting stock to ensure growth productivity, fibre quality and carbon sequestration through reforestation and afforestation activities. However, there is increasing doubt that conventional pedigree analysis provides the most accurate estimates for selection and prediction of the performance of improved planting stock. When the additive genetic relationships among relatives are estimated using pedigree information, it is not possible to take account of Mendelian sampling due to the random segregation of parental alleles. The use of DNA markers distributed genome-wide (multi-locus genotypes) makes it possible to estimate the realized additive genomic relationships, which take account of Mendelian sampling and possible pedigree errors. We reviewed a series of papers on conifer and broadleaf tree species in which both pedigree-based and marker-based estimates of genetic parameters have been reported. Using metadata analyses, we show that for heritability and genetic gains, the estimates obtained using only pedigree information are generally biased upward compared to those obtained using DNA markers distributed genome-wide, and that genotype-by-environment (GxE) interaction can be underestimated for traits of low to moderate heritability. As high-throughput genotyping becomes economically affordable, we recommend expanding the use of genomic selection to obtain more accurate estimates of genetic parameters and gains.


Subjects
Metadata, Trees, Alleles, Genetic Markers, Genotype, Genetic Models, Phenotype, Plant Breeding
12.
J Acoust Soc Am ; 151(2): 1125, 2022 02.
Article in English | MEDLINE | ID: mdl-35232080

ABSTRACT

Knowledge of hearing ability, as represented in audiograms, is essential for understanding how animals acoustically perceive their environment, predicting and counteracting the effects of anthropogenic noise, and managing wildlife. Audiogram data and relevant background information are currently only available embedded in the text of individual scientific publications in various unstandardized formats. This heterogeneity makes it hard to access, compare, and integrate audiograms. The Animal Audiogram Database (https://animalaudiograms.org) assembles published audiogram data, metadata about the corresponding experiments, and links to the original publications in a consistent format. The database content is the result of an extensive survey of the scientific literature and manual curation of the audiometric data found therein. As of November 1, 2021, the database contains 306 audiogram datasets from 34 animal species. The scope and format of the provided metadata and design of the database interface were established by active research community involvement. Options to compare audiograms and download datasets in structured formats are provided. With the focus currently on vertebrates and hearing in underwater environments, the database is drafted as a free and open resource for facilitating the review and correction of the contained data and collaborative extension with audiogram data from any taxonomic group and habitat.


Subjects
Audiometry, Metadata, Animals, Hearing, Hearing Tests, Noise
13.
Database (Oxford) ; 2022, 2022 03 09.
Article in English | MEDLINE | ID: mdl-35262674

ABSTRACT

To meet the increasing demand for data sharing, data reuse and meta-analysis in the immunology research community, we have developed the data discovery system ImmuneData. The system provides integrated access to five immunology data repositories funded by the National Institute of Allergy and Infectious Diseases, Division of Allergy, Immunology and Transplantation: ImmPort, ImmuneSpace, ITN TrialShare, ImmGen and IEDB. ImmuneData restructures the repositories' metadata into a uniform schema using domain experts' knowledge and state-of-the-art Natural Language Processing (NLP) technologies. It comes with a user-friendly web interface, accessible at http://www.immunedata.org/, and a Google-like search engine that lets biological researchers find and access data easily. The vast quantity of synonyms used in biomedical research increases the likelihood of incomplete search results; our search engine therefore converts queries submitted by users into ontology terms, which are then expanded by NLP technologies to ensure that the search results include all synonyms for a particular concept. The system also includes an advanced search function for building customized queries to meet advanced users' needs. ImmuneData upholds the FAIR principles (Findability, Accessibility, Interoperability and Reusability) for the five data repositories, to benefit data reuse in the immunology research community. The data pipeline behind our system can be extended to other data repositories to build a more comprehensive biological data discovery system. DATABASE URL: http://www.immunedata.org/.
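The ontology-based synonym expansion described above can be sketched in a few lines; the concept ID and synonym list below are invented for illustration and are not drawn from ImmuneData:

```python
# Toy version of the synonym expansion: resolve a free-text query term to an
# ontology concept, then search with every synonym of that concept.
ONTOLOGY = {
    "CL:0000084": {"label": "T cell",
                   "synonyms": ["T cell", "T lymphocyte", "T-cell"]},
}
# Reverse index from lowercased synonym to concept ID:
TERM_TO_CONCEPT = {s.lower(): cid
                   for cid, c in ONTOLOGY.items() for s in c["synonyms"]}

def expand_query(term):
    """Return every synonym of the matched concept, or the term itself if unmatched."""
    cid = TERM_TO_CONCEPT.get(term.lower())
    return set(ONTOLOGY[cid]["synonyms"]) if cid else {term}

print(sorted(expand_query("t lymphocyte")))  # ['T cell', 'T lymphocyte', 'T-cell']
```

Searching the repositories with the full expanded set is what prevents a query phrased with one synonym from missing records annotated with another.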


Subjects
Metadata, Natural Language Processing, Factual Databases, Information Dissemination, Search Engine
14.
J Biomed Inform ; 127: 104007, 2022 03.
Article in English | MEDLINE | ID: mdl-35124236

ABSTRACT

Biomedical research data reuse and sharing is essential for fostering research progress. To this end, data producers need to master data management and reporting through standard and rich metadata, as encouraged by open data initiatives such as the FAIR (Findable, Accessible, Interoperable, Reusable) guidelines. This helps data re-users understand and reuse the shared data with confidence; therefore, dedicated frameworks are required. Provenance reporting throughout a biomedical study lifecycle has been proposed as a way to increase confidence in data while reusing it. The Biomedical Study - Lifecycle Management (BMS-LM) data model has implemented provenance and lifecycle traceability for several multimodal-imaging techniques, but this is not enough for understanding data while reusing it. In the large scope of biomedical research, a multitude of metadata sources, also called Knowledge Organization Systems (KOSs), are available for data annotation. In addition, data producers use local terminologies or KOSs, containing vernacular terms, for data reporting. The result is a set of heterogeneous KOSs (local and published) with different formats and levels of granularity. To manage the inherent heterogeneity, semantic interoperability is encouraged by the Research Data Management (RDM) community. Ontologies, and more specifically top ontologies such as BFO and DOLCE, make the metadata semantics explicit and enhance semantic interoperability. Based on the BMS-LM data model and the BFO top ontology, the BioMedical Study - Lifecycle Management (BMS-LM) core ontology is proposed, together with an associated framework for semantic interoperability between heterogeneous KOSs. It comprises four ontological levels (top/core/domain/local) and aims to build bridges between local and published KOSs. In this paper, the conversion of the BMS-LM data model to a core ontology is detailed, and the implementation of its semantic interoperability in a specific domain context is explained and illustrated with examples from small-animal preclinical research.


Subjects
Biological Ontologies, Biomedical Research, Animals, Data Curation, Metadata, Research Design, Semantics
15.
BMJ Health Care Inform ; 29(1), 2022 Feb.
Article in English | MEDLINE | ID: mdl-35193857

ABSTRACT

OBJECTIVE: How health researchers find secondary data to analyse is unclear. We sought to describe the approaches that UK organisations take to help researchers find data and to assess the findability of health data that are available for research. METHODS: We surveyed established organisations about how they make data findable. We derived measures of findability based on the first element of the FAIR principles (Findable, Accessible, Interoperable, Reusable). We applied these to 13 UK health datasets and measured their findability via two major internet search engines in 2018, repeating the measurement in 2021. RESULTS: Among 12 survey respondents, 11 indicated that they made metadata publicly available. Respondents said internet presence was important for findability but needed improvement. In 2018, 8 of the 13 datasets were listed in the top 100 search results of 10 searches repeated on both search engines, while the remaining 5 were found one click away from those search results. By 2021, this had reduced to 7 datasets directly listed and 1 dataset one click away. In 2021, Google Dataset Search had become available, and it listed 3 of the 13 datasets within its top 100 search results. DISCUSSION: Measuring findability via online search engines is one method for evaluating efforts to improve findability. Findability could perhaps be improved with catalogues that have greater inclusion of datasets, field-level metadata and persistent identifiers. CONCLUSION: UK organisations recognised the importance of the internet for finding data for research. However, health datasets available for research were no more findable in 2021 than in 2018.


Subjects
Metadata, Humans, United Kingdom
16.
PLoS One ; 17(2): e0263891, 2022.
Article in English | MEDLINE | ID: mdl-35148341

ABSTRACT

Crowdfunding platforms allow entrepreneurs to publish projects and raise funds for realizing them; hence, the question of what influences projects' fundraising success is very important. Previous studies examined various factors, such as project goals and project duration, that may influence the outcomes of fundraising campaigns. We present a novel model for predicting the success of crowdfunding projects in meeting their funding goals. Our model focuses on semantic features only, and its performance is comparable to that of previous models. In an additional model, we examine both project metadata and project semantics, delivering a comprehensive study of factors influencing crowdfunding success. Further, we analyze a dataset of crowdfunding projects larger than those previously reported in the literature. Finally, we show that when combining semantics and metadata, we achieve an F1 score of 96.2%. We compare our model's accuracy to that of previous research models by applying their methods to our dataset, and demonstrate the higher accuracy of our model. In addition to our scientific contribution, we provide practical recommendations that may increase a project's chances of funding success.
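For reference, the F1 score reported here is the harmonic mean of precision and recall; the confusion counts in the sketch below are made up, chosen only to land near the paper's 96.2%:

```python
def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall, computed from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Made-up counts: 50 true positives, 2 false positives, 2 false negatives.
print(round(f1_score(tp=50, fp=2, fn=2), 3))  # 0.962
```

Because F1 ignores true negatives, it is a more informative headline number than raw accuracy when the success/failure classes are imbalanced.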


Subjects
Crowdsourcing/methods, Fund Raising/methods, Algorithms, Humans, Metadata, Theoretical Models, Semantics
17.
BMC Bioinformatics ; 23(1): 61, 2022 Feb 07.
Article in English | MEDLINE | ID: mdl-35130839

ABSTRACT

BACKGROUND: As technical developments in omics and biomedical imaging increase the throughput of data generation in the life sciences, the need for information systems capable of managing heterogeneous digital assets is increasing, in particular for systems supporting the findability, accessibility, interoperability, and reusability (FAIR) principles of scientific data management. RESULTS: We propose a Service Oriented Architecture approach for integrated management and analysis of multi-omics and biomedical imaging data. Our architecture introduces an image management system into a FAIR-supporting, web-based platform for omics data management. Interoperable metadata models and middleware components implement the required data management operations. The resulting architecture allows for FAIR management of omics and imaging data, facilitating metadata queries from software applications. The applicability of the proposed architecture is demonstrated using two technical proofs of concept and a use case, aimed at molecular plant biology and clinical liver cancer research, which integrate various imaging and omics modalities. CONCLUSIONS: We describe a data management architecture for integrated, FAIR-supporting management of omics and biomedical imaging data, and exemplify its applicability for basic biology research and clinical studies. We anticipate that FAIR data management systems for multi-modal data repositories will play a pivotal role in data-driven research, including studies which leverage advanced machine learning methods, as the joint analysis of omics and imaging data, in conjunction with phenotypic metadata, becomes not only desirable but necessary to derive novel insights into biological processes.


Subjects
Biological Science Disciplines, Data Management, Information Management, Metadata, Software
18.
Gigascience ; 11, 2022 02 16.
Article in English | MEDLINE | ID: mdl-35169842

ABSTRACT

BACKGROUND: The Public Health Alliance for Genomic Epidemiology (PHA4GE) (https://pha4ge.org) is a global coalition that is actively working to establish consensus standards, document and share best practices, improve the availability of critical bioinformatics tools and resources, and advocate for greater openness, interoperability, accessibility, and reproducibility in public health microbial bioinformatics. In the face of the current pandemic, PHA4GE has identified a need for a fit-for-purpose, open-source SARS-CoV-2 contextual data standard. RESULTS: As such, we have developed a SARS-CoV-2 contextual data specification package based on harmonizable, publicly available community standards. The specification can be implemented via a collection template, as well as an array of protocols and tools to support both the harmonization and submission of sequence data and contextual information to public biorepositories. CONCLUSIONS: Well-structured, rich contextual data add value, promote reuse, and enable aggregation and integration of disparate datasets. Adoption of the proposed standard and practices will better enable interoperability between datasets and systems, improve the consistency and utility of generated data, and ultimately facilitate novel insights and discoveries in SARS-CoV-2 and COVID-19. The package is now supported by the NCBI's BioSample database.


Subjects
COVID-19, SARS-CoV-2, Genomics, Humans, Metadata, Public Health, Reproducibility of Results
19.
ACS Synth Biol ; 11(3): 1373-1376, 2022 03 18.
Article in English | MEDLINE | ID: mdl-35226470

ABSTRACT

As synthetic biology becomes increasingly automated and data-driven, tools that help researchers implement FAIR (findable, accessible, interoperable, reusable) data management practices are needed. Crucially, in order to support machine processing and reusability of data, it is important that data artifacts are appropriately annotated with metadata drawn from controlled vocabularies. Unfortunately, standardized annotation practices are difficult for many research groups to adopt, given the specialized database science skills usually required to interface with ontologies. In response to this need, Take Your Terms from Ontologies (Tyto) is a lightweight Python tool that supports the use of controlled vocabularies in everyday scripting practice. While Tyto has been developed for synthetic biology applications, its utility may extend to users working in other areas of bioinformatics research as well. Tyto is distributed as a Python package and available as source at https://github.com/SynBioDex/tyto.
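The pattern Tyto enables — resolving human-readable labels to stable ontology identifiers from ordinary scripts — looks roughly like this. This is an illustrative stand-in, not Tyto's real API; the two hard-coded Sequence Ontology entries replace the live ontology lookups Tyto performs:

```python
# Two-entry slice of the Sequence Ontology, hard-coded for illustration;
# a tool like Tyto resolves such labels against the full ontologies instead.
SO_TERMS = {
    "promoter": "SO:0000167",
    "terminator": "SO:0000141",
}

def annotate(label):
    """Map a human-readable part label to its controlled-vocabulary term ID."""
    try:
        return SO_TERMS[label]
    except KeyError as exc:
        raise ValueError(f"unknown Sequence Ontology label: {label!r}") from exc

print(annotate("promoter"))  # SO:0000167
```

Annotating data artifacts with stable term IDs rather than free-text labels is what makes the resulting metadata machine-processable and reusable.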


Subjects
Software, Synthetic Biology, Computational Biology, Factual Databases, Metadata
20.
J Med Internet Res ; 24(1): e25440, 2022 01 11.
Article in English | MEDLINE | ID: mdl-35014967

ABSTRACT

BACKGROUND: Metadata are created to describe the corresponding data in a detailed and unambiguous way and are used for various applications in different research areas, for example, data identification and classification. However, a clear definition of metadata is crucial for further use. Unfortunately, extensive experience with the processing and management of metadata has shown that the term "metadata" and its use are not always unambiguous. OBJECTIVE: This study aimed to understand the definition of metadata and the challenges resulting from metadata reuse. METHODS: A systematic literature search was performed following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines for reporting on systematic reviews. Five research questions were identified to streamline the review process, addressing metadata characteristics, metadata standards, use cases, and problems encountered. The review was preceded by a harmonization process to achieve a general understanding of the terms used. RESULTS: The harmonization process resulted in a clear set of definitions for metadata processing focusing on data integration. The literature review was then conducted by 10 reviewers with different backgrounds using the harmonized definitions. This study included 81 peer-reviewed papers from the last decade after applying various filtering steps to identify the most relevant papers. All 5 research questions could be answered, resulting in a broad overview of the standards, use cases, problems, and corresponding solutions for the application of metadata in different research areas. CONCLUSIONS: Metadata can be a powerful tool for identifying, describing, and processing information, but its meaningful creation is costly and challenging. The review process uncovered many standards, use cases, problems, and solutions for dealing with metadata, and the presented harmonized definitions and the new schema have the potential to improve the classification and generation of metadata by creating a shared understanding of metadata and its context.


Subjects
Metadata, Publications, Humans, Reference Standards