
Flexible data integration and curation using a graph-based approach.

Croset, Samuel; Rupp, Joachim; Romacker, Martin.

Bioinformatics ; 32(6): 918-25, 2016 03 15.

Article in English | MEDLINE | ID: mdl-26556384

ABSTRACT

MOTIVATION: The increasing diversity of data available to the biomedical scientist holds promise for better understanding of diseases and discovery of new treatments for patients. To provide a complete picture of a biomedical question, data from many different origins needs to be combined into a unified representation. During this data integration process, inevitable errors and ambiguities present in the initial sources compromise the quality of the resulting data warehouse and greatly diminish the scientific value of the content. Expensive and time-consuming manual curation is then required to improve the quality of the information. However, it becomes increasingly difficult to dedicate and optimize the resources for data integration projects, as available repositories are growing both in size and in number every day.

RESULTS: We present a new generic methodology to identify problematic records that cause what we describe as 'data hairball' structures. The approach is graph-based and relies on two metrics traditionally used in the social sciences: graph density and betweenness centrality. We evaluate and discuss these measures and show their relevance for flexible, optimized and automated data curation and linkage. The methodology focuses on information coherence and correctness to improve the scientific meaningfulness of data integration endeavors, such as knowledge bases and large data warehouses.

CONTACT: samuel.croset@roche.com

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
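
The two metrics named in the Results can be illustrated with a short, standard-library-only sketch. This is not the authors' implementation, which the abstract does not describe: the function names (`build_graph`, `density`, `betweenness`), the toy record identifiers, and the idea of reading the highest-scoring node as a candidate for curator review are all assumptions made for illustration. Density is computed as 2m / (n(n - 1)) for an undirected graph with n nodes and m edges, and betweenness uses Brandes' algorithm on unweighted edges.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def density(adj):
    """Undirected graph density: 2m / (n * (n - 1))."""
    n = len(adj)
    if n < 2:
        return 0.0
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    return 2 * m / (n * (n - 1))


def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes, unweighted graph)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        order = []                      # nodes in order of BFS discovery
        preds = {v: [] for v in adj}    # predecessors on shortest paths
        sigma = {v: 0 for v in adj}     # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back to s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: c / 2 for v, c in score.items()}


# Hypothetical toy data: two tight clusters of records joined only via "x".
edges = [
    ("a1", "a2"), ("a2", "a3"), ("a1", "a3"),
    ("b1", "b2"), ("b2", "b3"), ("b1", "b3"),
    ("a3", "x"), ("x", "b1"),
]
adj = build_graph(edges)
bc = betweenness(adj)
suspect = max(bc, key=bc.get)
print(f"density={density(adj):.3f}, highest betweenness: {suspect}")
```

In this toy graph the bridging node `x` lies on every shortest path between the two clusters, so it receives the highest betweenness score. Whether such a node is truly an erroneous link is a curation decision; the score only suggests where to look first.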

Subject(s)

Data Curation, Humans, Knowledge Bases
