Búsqueda | Portal de Búsqueda de la BVS

DataMed - an open source discovery index for finding biomedical datasets.

Chen, Xiaoling; Gururaj, Anupama E; Ozyurt, Burak; Liu, Ruiling; Soysal, Ergin; Cohen, Trevor; Tiryaki, Firat; Li, Yueling; Zong, Nansu; Jiang, Min; Rogith, Deevakar; Salimi, Mandana; Kim, Hyeon-Eui; Rocca-Serra, Philippe; Gonzalez-Beltran, Alejandra; Farcas, Claudiu; Johnson, Todd; Margolis, Ron; Alter, George; Sansone, Susanna-Assunta; Fore, Ian M; Ohno-Machado, Lucila; Grethe, Jeffrey S; Xu, Hua.

J Am Med Inform Assoc ; 25(3): 300-308, 2018 Mar 01.

Artículo en Inglés | MEDLINE | ID: mdl-29346583

RESUMEN

OBJECTIVE: Finding relevant datasets is important for promoting data reuse in the biomedical domain, but it is challenging given the volume and complexity of biomedical data. Here we describe the development of an open source biomedical data discovery system called DataMed, with the goal of promoting the building of additional data indexes in the biomedical domain. MATERIALS AND METHODS: DataMed, which can efficiently index and search diverse types of biomedical datasets across repositories, is developed through the National Institutes of Health-funded biomedical and healthCAre Data Discovery Index Ecosystem (bioCADDIE) consortium. It consists of 2 main components: (1) a data ingestion pipeline that collects and transforms original metadata information to a unified metadata model, called DatA Tag Suite (DATS), and (2) a search engine that finds relevant datasets based on user-entered queries. In addition to describing its architecture and techniques, we evaluated individual components within DataMed, including the accuracy of the ingestion pipeline, the prevalence of the DATS model across repositories, and the overall performance of the dataset retrieval engine. RESULTS AND CONCLUSION: Our manual review shows that the ingestion pipeline could achieve an accuracy of 90% and core elements of DATS had varied frequency across repositories. On a manually curated benchmark dataset, the DataMed search engine achieved an inferred average precision of 0.2033 and a precision at 10 (P@10, the number of relevant results in the top 10 search results) of 0.6022, by implementing advanced natural language processing and terminology services. Currently, we have made the DataMed system publically available as an open source package for the biomedical community.

DATS, the data tag suite to enable discoverability of datasets.

Sansone, Susanna-Assunta; Gonzalez-Beltran, Alejandra; Rocca-Serra, Philippe; Alter, George; Grethe, Jeffrey S; Xu, Hua; Fore, Ian M; Lyle, Jared; Gururaj, Anupama E; Chen, Xiaoling; Kim, Hyeon-Eui; Zong, Nansu; Li, Yueling; Liu, Ruiling; Ozyurt, I Burak; Ohno-Machado, Lucila.

Sci Data ; 4: 170059, 2017 06 06.

Artículo en Inglés | MEDLINE | ID: mdl-28585923

RESUMEN

Today's science increasingly requires effective ways to find and access existing datasets that are distributed across a range of repositories. For researchers in the life sciences, discoverability of datasets may soon become as essential as identifying the latest publications via PubMed. Through an international collaborative effort funded by the National Institutes of Health (NIH)'s Big Data to Knowledge (BD2K) initiative, we have designed and implemented the DAta Tag Suite (DATS) model to support the DataMed data discovery index. DataMed's goal is to be for data what PubMed has been for the scientific literature. Akin to the Journal Article Tag Suite (JATS) used in PubMed, the DATS model enables submission of metadata on datasets to DataMed. DATS has a core set of elements, which are generic and applicable to any type of dataset, and an extended set that can accommodate more specialized data types. DATS is a platform-independent model also available as an annotated serialization in schema.org, which in turn is widely used by major search engines like Google, Microsoft, Yahoo and Yandex.

Finding useful data across multiple biomedical data repositories using DataMed.

Ohno-Machado, Lucila; Sansone, Susanna-Assunta; Alter, George; Fore, Ian; Grethe, Jeffrey; Xu, Hua; Gonzalez-Beltran, Alejandra; Rocca-Serra, Philippe; Gururaj, Anupama E; Bell, Elizabeth; Soysal, Ergin; Zong, Nansu; Kim, Hyeon-Eui.

Nat Genet ; 49(6): 816-819, 2017 05 26.

Artículo en Inglés | MEDLINE | ID: mdl-28546571

Asunto(s)

Ontologías Biológicas , Investigación Biomédica , Biología Computacional/métodos , Bases de Datos Factuales , Metadatos , Humanos , Programas Informáticos , Integración de Sistemas

Life sciences domain analysis model.

Freimuth, Robert R; Freund, Elaine T; Schick, Lisa; Sharma, Mukesh K; Stafford, Grace A; Suzek, Baris E; Hernandez, Joyce; Hipp, Jason; Kelley, Jenny M; Rokicki, Konrad; Pan, Sue; Buckler, Andrew; Stokes, Todd H; Fernandez, Anna; Fore, Ian; Buetow, Kenneth H; Klemm, Juli D.

J Am Med Inform Assoc ; 19(6): 1095-102, 2012.

Artículo en Inglés | MEDLINE | ID: mdl-22744959

RESUMEN

OBJECTIVE: Meaningful exchange of information is a fundamental challenge in collaborative biomedical research. To help address this, the authors developed the Life Sciences Domain Analysis Model (LS DAM), an information model that provides a framework for communication among domain experts and technical teams developing information systems to support biomedical research. The LS DAM is harmonized with the Biomedical Research Integrated Domain Group (BRIDG) model of protocol-driven clinical research. Together, these models can facilitate data exchange for translational research. MATERIALS AND METHODS: The content of the LS DAM was driven by analysis of life sciences and translational research scenarios and the concepts in the model are derived from existing information models, reference models and data exchange formats. The model is represented in the Unified Modeling Language and uses ISO 21090 data types. RESULTS: The LS DAM v2.2.1 is comprised of 130 classes and covers several core areas including Experiment, Molecular Biology, Molecular Databases and Specimen. Nearly half of these classes originate from the BRIDG model, emphasizing the semantic harmonization between these models. Validation of the LS DAM against independently derived information models, research scenarios and reference databases supports its general applicability to represent life sciences research. DISCUSSION: The LS DAM provides unambiguous definitions for concepts required to describe life sciences research. The processes established to achieve consensus among domain experts will be applied in future iterations and may be broadly applicable to other standardization efforts. CONCLUSIONS: The LS DAM provides common semantics for life sciences research. Through harmonization with BRIDG, it promotes interoperability in translational science.

Asunto(s)

Disciplinas de las Ciencias Biológicas , Difusión de la Información , Sistemas de Información , Integración de Sistemas , Investigación Biomédica Traslacional , Humanos , Almacenamiento y Recuperación de la Información , Estándares de Referencia , Semántica , Unified Medical Language System

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

ENVIAR RESULTADO:

SELECCIÓN DE REFERENCIAS

DETALLE DE LA BÚSQUEDA