Search | BVS (Virtual Health Library) Search Portal

CEDAR OnDemand: a browser extension to generate ontology-based scientific metadata.

Bukhari, Syed Ahmad Chan; Martínez-Romero, Marcos; O'Connor, Martin J; Egyedi, Attila L; Willrett, Debra; Graybeal, John; Musen, Mark A; Cheung, Kei-Hoi; Kleinstein, Steven H.

BMC Bioinformatics ; 19(1): 268, 2018 07 16.

Article in English | MEDLINE | ID: mdl-30012108

ABSTRACT

BACKGROUND: Public biomedical data repositories often provide web-based interfaces to collect experimental metadata. However, these interfaces typically reflect the ad hoc metadata specification practices of the associated repositories, leading to a lack of standardization in the collected metadata. This lack of standardization limits the ability of the source datasets to be broadly discovered, reused, and integrated with other datasets. To increase reuse, discoverability, and reproducibility of the described experiments, datasets should be appropriately annotated by using agreed-upon terms, ideally from ontologies or other controlled term sources. RESULTS: This work presents "CEDAR OnDemand", a browser extension powered by the NCBO (National Center for Biomedical Ontology) BioPortal that enables users to seamlessly enter ontology-based metadata through existing web forms native to individual repositories. CEDAR OnDemand analyzes the web page contents to identify the text input fields and associate them with relevant ontologies which are recommended automatically based upon input fields' labels (using the NCBO ontology recommender) and a pre-defined list of ontologies. These field-specific ontologies are used for controlling metadata entry. CEDAR OnDemand works for any web form designed in the HTML format. We demonstrate how CEDAR OnDemand works through the NCBI (National Center for Biotechnology Information) BioSample web-based metadata entry. CONCLUSION: CEDAR OnDemand helps lower the barrier of incorporating ontologies into standardized metadata entry for public data repositories. CEDAR OnDemand is available freely on the Google Chrome store https://chrome.google.com/webstore/search/CEDAROnDemand.
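
The field-to-ontology association described in this abstract can be illustrated with a minimal sketch. It is not the extension's code: the `PREDEFINED` keyword table is a hypothetical stand-in for the NCBO ontology recommender (which is not called here), and `FieldFinder` and `suggest_ontologies` are illustrative names. The sketch scans an HTML form for text inputs, reads their labels, and proposes ontologies whose keyword appears in the label.

```python
from html.parser import HTMLParser

# Hypothetical label-keyword -> ontology acronym table. It stands in for
# the NCBO ontology recommender, which this sketch does not call.
PREDEFINED = {"tissue": "UBERON", "disease": "DOID", "organism": "NCBITAXON"}


class FieldFinder(HTMLParser):
    """Collect <label for=...> text and the ids of text <input> fields."""

    def __init__(self):
        super().__init__()
        self.labels = {}        # input id -> label text
        self.text_inputs = []   # ids of text inputs, in page order
        self._for = None        # id targeted by the label being parsed

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "label":
            self._for = a.get("for")
        elif tag == "input" and a.get("type", "text") == "text" and a.get("id"):
            self.text_inputs.append(a["id"])

    def handle_data(self, data):
        if self._for and data.strip():
            self.labels[self._for] = data.strip()

    def handle_endtag(self, tag):
        if tag == "label":
            self._for = None


def suggest_ontologies(html):
    """Map each text input id to the ontologies suggested by its label."""
    finder = FieldFinder()
    finder.feed(html)
    result = {}
    for field_id in finder.text_inputs:
        label = finder.labels.get(field_id, "").lower()
        result[field_id] = [o for kw, o in PREDEFINED.items() if kw in label]
    return result


FORM = (
    '<form><label for="t">Tissue type</label><input type="text" id="t">'
    '<label for="n">Notes</label><input id="n">'
    '<input type="checkbox" id="c"></form>'
)
print(suggest_ontologies(FORM))
```

In this sketch a field with no matching keyword gets an empty list, which a real tool would presumably treat as free text.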

Subject(s)

Biological Ontologies; Internet; Metadata; Software; Algorithms; Humans

Using association rule mining and ontologies to generate metadata recommendations from multiple biomedical databases.

Martínez-Romero, Marcos; O'Connor, Martin J; Egyedi, Attila L; Willrett, Debra; Hardi, Josef; Graybeal, John; Musen, Mark A.

Database (Oxford) ; 2019, 2019 01 01.

Article in English | MEDLINE | ID: mdl-31210270

ABSTRACT

Metadata, the machine-readable descriptions of the data, are increasingly seen as crucial for describing the vast array of biomedical datasets that are currently being deposited in public repositories. While most public repositories have firm requirements that metadata must accompany submitted datasets, the quality of those metadata is generally very poor. A key problem is that the typical metadata acquisition process is onerous and time-consuming, with little interactive guidance or assistance provided to users. Secondary problems include the lack of validation and sparse use of standardized terms or ontologies when authoring metadata. There is a pressing need for improvements to the metadata acquisition process that will help users to enter metadata quickly and accurately. In this paper, we outline a recommendation system for metadata that aims to address this challenge. Our approach uses association rule mining to uncover hidden associations among metadata values and to represent them in the form of association rules. These rules are then used to present users with real-time recommendations when authoring metadata. The novelties of our method are that it is able to combine analyses of metadata from multiple repositories when generating recommendations and can enhance those recommendations by aligning them with ontology terms. We implemented our approach as a service integrated into the CEDAR Workbench metadata authoring platform, and evaluated it using metadata from two public biomedical repositories: US-based National Center for Biotechnology Information BioSample and European Bioinformatics Institute BioSamples. The results show that our approach is able to use analyses of previously entered metadata coupled with ontology-based mappings to present users with accurate recommendations when authoring metadata.
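
As a rough illustration of how association rules can drive value recommendations, the sketch below mines one-to-one rules of the form (field=value) -> (field=value) from previously entered records and ranks candidate values by rule confidence. It is a simplified sketch, not the authors' implementation: the function names, thresholds, and sample records are assumptions, and the ontology alignment step mentioned in the abstract is left out.

```python
from collections import Counter
from itertools import permutations


def mine_rules(records, min_support=2, min_confidence=0.6):
    """Mine one-to-one rules (field=value) -> (field=value) from dict records.

    Support is the number of records containing both items; confidence is
    that count divided by the number of records containing the left side.
    """
    item_counts = Counter()
    pair_counts = Counter()
    for rec in records:
        items = sorted(rec.items())
        item_counts.update(items)
        pair_counts.update(permutations(items, 2))  # ordered (lhs, rhs) pairs
    rules = {}
    for (lhs, rhs), n in pair_counts.items():
        confidence = n / item_counts[lhs]
        if n >= min_support and confidence >= min_confidence:
            rules[(lhs, rhs)] = confidence
    return rules


def recommend(rules, entered, target_field):
    """Rank values for target_field by the best confidence of a matching rule."""
    scores = {}
    for (lhs, (field, value)), conf in rules.items():
        if field == target_field and lhs in entered.items():
            scores[value] = max(scores.get(value, 0.0), conf)
    return sorted(scores.items(), key=lambda kv: -kv[1])


# Made-up sample records, for illustration only.
RECORDS = [
    {"tissue": "liver", "disease": "hepatitis"},
    {"tissue": "liver", "disease": "hepatitis"},
    {"tissue": "liver", "disease": "cirrhosis"},
    {"tissue": "lung", "disease": "asthma"},
]

rules = mine_rules(RECORDS)
print(recommend(rules, {"tissue": "liver"}, "disease"))
```

With these sample records, only "hepatitis" passes the support threshold for `tissue=liver`, so it is the sole suggestion for the `disease` field.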

Subject(s)

Data Mining/methods; Data Mining/standards; Databases, Factual/standards; Metadata; Computational Biology/standards

Unleashing the value of Common Data Elements through the CEDAR Workbench.

O'Connor, Martin J; Warzel, Denise B; Martínez-Romero, Marcos; Hardi, Josef; Willrett, Debra; Egyedi, Attila L; Eftekhari, Aras; Graybeal, John; Musen, Mark A.

AMIA Annu Symp Proc ; 2019: 681-690, 2019.

Article in English | MEDLINE | ID: mdl-32308863

ABSTRACT

Developing promising treatments in biomedicine often requires aggregation and analysis of data from disparate sources across the healthcare and research spectrum. To facilitate these approaches, there is a growing focus on supporting interoperation of datasets by standardizing data-capture and reporting requirements. Common Data Elements (CDEs), precise specifications of questions and the set of allowable answers to each question, are increasingly being adopted to help meet these standardization goals. While CDEs can provide a strong conceptual foundation for interoperation, there are no widely recognized serialization or interchange formats to describe and exchange their definitions. As a result, CDEs defined in one system cannot easily be reused by other systems. An additional problem is that current CDE-based systems tend to be rather heavyweight and cannot be easily adopted and used by third parties. To address these problems, we developed extensions to a metadata management system called the CEDAR Workbench to provide a platform to simplify the creation, exchange, and use of CDEs. We show how the resulting system allows users to quickly define and share CDEs and to immediately use these CDEs to build and deploy Web-based forms to acquire conforming metadata. We also show how we incorporated a large CDE library from the National Cancer Institute's caDSR system and made these CDEs publicly available for general use.
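
To make the notion of a CDE concrete, here is a minimal sketch of a question paired with its allowable answers, plus a JSON round trip. The class layout, field names, and example identifier are hypothetical; this is not the caDSR or CEDAR representation, only an illustration of why a shared serialization matters for reuse.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class CommonDataElement:
    """A question plus the set of allowable answers (hypothetical layout)."""

    identifier: str
    question: str
    permissible_values: tuple

    def is_valid(self, answer):
        """True if the answer is one of the permissible values."""
        return answer in self.permissible_values

    def to_json(self):
        """Serialize to JSON; the tuple is written as a JSON array."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text):
        """Rebuild a CDE from JSON, restoring the tuple of values."""
        data = json.loads(text)
        data["permissible_values"] = tuple(data["permissible_values"])
        return cls(**data)


# Made-up example element.
cde = CommonDataElement(
    identifier="example-001",
    question="What is the participant's smoking status?",
    permissible_values=("current", "former", "never"),
)
print(cde.is_valid("former"), cde.is_valid("sometimes"))
print(CommonDataElement.from_json(cde.to_json()) == cde)
```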

Subject(s)

Biomedical Research; Common Data Elements; Data Collection/standards; Data Management/methods; Common Data Elements/standards; Data Management/standards; Humans; Internet; Metadata; National Institutes of Health (U.S.); Registries; United States; User-Computer Interface

The CAIRR Pipeline for Submitting Standards-Compliant B and T Cell Receptor Repertoire Sequencing Studies to the National Center for Biotechnology Information Repositories.

Bukhari, Syed Ahmad Chan; O'Connor, Martin J; Martínez-Romero, Marcos; Egyedi, Attila L; Willrett, Debra; Graybeal, John; Musen, Mark A; Rubelt, Florian; Cheung, Kei-Hoi; Kleinstein, Steven H.

Front Immunol ; 9: 1877, 2018.

Article in English | MEDLINE | ID: mdl-30166985

ABSTRACT

The adaptation of high-throughput sequencing to the B cell receptor and T cell receptor has made it possible to characterize the adaptive immune receptor repertoire (AIRR) at unprecedented depth. These AIRR sequencing (AIRR-seq) studies offer tremendous potential to increase the understanding of adaptive immune responses in vaccinology, infectious disease, autoimmunity, and cancer. The increasingly wide application of AIRR-seq is leading to a critical mass of studies being deposited in the public domain, offering the possibility of novel scientific insights through secondary analyses and meta-analyses. However, effective sharing of these large-scale data remains a challenge. The AIRR community has proposed minimal information about adaptive immune receptor repertoire (MiAIRR), a standard for reporting AIRR-seq studies. The MiAIRR standard has been operationalized using the National Center for Biotechnology Information (NCBI) repositories. Submissions of AIRR-seq data to the NCBI repositories typically use a combination of web-based and flat-file templates and include only a minimal amount of terminology validation. As a result, AIRR-seq studies at the NCBI are often described using inconsistent terminologies, limiting scientists' ability to access, find, interoperate, and reuse the data sets. In order to improve metadata quality and ease submission of AIRR-seq studies to the NCBI, we have leveraged the software framework developed by the Center for Expanded Data Annotation and Retrieval (CEDAR), which develops technologies involving the use of data standards and ontologies to improve metadata quality. The resulting CEDAR-AIRR (CAIRR) pipeline enables data submitters to: (i) create web-based templates whose entries are controlled by ontology terms, (ii) generate and validate metadata, and (iii) submit the ontology-linked metadata and sequence files (FASTQ) to the NCBI BioProject, BioSample, and Sequence Read Archive databases. Overall, CAIRR provides a web-based metadata submission interface that supports compliance with the MiAIRR standard. This pipeline is available at http://cairr.miairr.org, and will facilitate the NCBI submission process and improve the metadata quality of AIRR-seq studies.
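
Step (ii) of the pipeline, validating metadata against a template whose entries are controlled by terms, can be sketched as follows. This is an assumption-laden illustration rather than CAIRR code: the `TEMPLATE` fields and allowed terms are hypothetical and are not taken from the MiAIRR standard, and `validate` is an illustrative name.

```python
# Hypothetical template: field name -> allowed terms (None means free text).
# These fields and terms are illustrative, not the MiAIRR field list.
TEMPLATE = {
    "organism": {"Homo sapiens", "Mus musculus"},
    "cell_subset": {"B cell", "T cell"},
    "study_title": None,
}


def validate(record, template):
    """Return a sorted list of problems; an empty list means the record is valid."""
    problems = []
    for field, allowed in template.items():
        if field not in record or record[field] == "":
            problems.append(f"{field}: missing")
        elif allowed is not None and record[field] not in allowed:
            problems.append(f"{field}: '{record[field]}' is not a controlled term")
    # Fields that the template does not define are reported as well.
    for field in record.keys() - template.keys():
        problems.append(f"{field}: not in template")
    return sorted(problems)


good = {"organism": "Homo sapiens", "cell_subset": "B cell", "study_title": "Pilot"}
bad = {"organism": "human", "cell_subset": "B cell"}
print(validate(good, TEMPLATE))
print(validate(bad, TEMPLATE))
```

A record would only move on to submission, step (iii), once the problem list is empty.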

Subject(s)

Computational Biology/methods; Databases, Nucleic Acid; Receptors, Antigen, B-Cell/genetics; Receptors, Antigen, T-Cell/genetics; Software; Computational Biology/organization & administration; Data Mining; Gene Ontology; Humans; Metadata; Reproducibility of Results; User-Computer Interface; Workflow

Fast and Accurate Metadata Authoring Using Ontology-Based Recommendations.

Martínez-Romero, Marcos; O'Connor, Martin J; Shankar, Ravi D; Panahiazar, Maryam; Willrett, Debra; Egyedi, Attila L; Gevaert, Olivier; Graybeal, John; Musen, Mark A.

AMIA Annu Symp Proc ; 2017: 1272-1281, 2017.

Article in English | MEDLINE | ID: mdl-29854196

ABSTRACT

In biomedicine, high-quality metadata are crucial for finding experimental datasets, for understanding how experiments were performed, and for reproducing those experiments. Despite the recent focus on metadata, the quality of metadata available in public repositories continues to be extremely poor. A key difficulty is that the typical metadata acquisition process is time-consuming and error-prone, with weak or nonexistent support for linking metadata to ontologies. There is a pressing need for methods and tools to speed up the metadata acquisition process and to increase the quality of metadata that are entered. In this paper, we describe a methodology and set of associated tools that we developed to address this challenge. A core component of this approach is a value recommendation framework that uses analysis of previously entered metadata and ontology-based metadata specifications to help users rapidly and accurately enter their metadata. We performed an initial evaluation of this approach using metadata from a public metadata repository.

Subject(s)

Metadata; Biological Ontologies; Biomedical Research; Data Accuracy; Data Analysis; Metadata/standards; Methods

The CEDAR Workbench: An Ontology-Assisted Environment for Authoring Metadata that Describe Scientific Experiments.

Gonçalves, Rafael S; O'Connor, Martin J; Martínez-Romero, Marcos; Egyedi, Attila L; Willrett, Debra; Graybeal, John; Musen, Mark A.

Semant Web ISWC ; 10588: 103-110, 2017 Oct.

Article in English | MEDLINE | ID: mdl-32219223

ABSTRACT

The Center for Expanded Data Annotation and Retrieval (CEDAR) aims to revolutionize the way that metadata describing scientific experiments are authored. The software we have developed, the CEDAR Workbench, is a suite of Web-based tools and REST APIs that allows users to construct metadata templates, to fill in templates to generate high-quality metadata, and to share and manage these resources. The CEDAR Workbench provides a versatile, REST-based environment for authoring metadata that are enriched with terms from ontologies. The metadata are available as JSON, JSON-LD, or RDF for easy integration in scientific applications and reusability on the Web. Users can leverage our APIs for validating and submitting metadata to external repositories. The CEDAR Workbench is freely available and open-source.
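
To show what ontology-enriched metadata in JSON-LD might look like, the sketch below wraps plain field values in a minimal JSON-LD document. It does not reproduce the CEDAR template model or call its REST APIs: the function name, the `example.org` IRIs, and the sample values are all placeholders chosen for illustration.

```python
import json

RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def to_json_ld(values, field_iris):
    """Wrap plain field values in a minimal JSON-LD document.

    values: field -> literal, or {"iri": ..., "label": ...} for an ontology term.
    field_iris: field -> property IRI, used to build the @context.
    """
    context = {"rdfs": RDFS}
    context.update(field_iris)
    doc = {"@context": context}
    for field, value in values.items():
        if isinstance(value, dict):
            # Ontology term: link by IRI and keep the human-readable label.
            doc[field] = {"@id": value["iri"], "rdfs:label": value["label"]}
        else:
            doc[field] = {"@value": value}
    return doc


# Placeholder IRIs; a real instance would point at actual ontology terms.
doc = to_json_ld(
    values={
        "tissue": {"iri": "http://example.org/term/liver", "label": "liver"},
        "sample_name": "S1",
    },
    field_iris={
        "tissue": "http://example.org/prop/tissue",
        "sample_name": "http://example.org/prop/sampleName",
    },
)
print(json.dumps(doc, indent=2))
```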
