Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 1 de 1
Filtrar
Más filtros

Banco de datos
Asunto principal
Tipo del documento
Asunto de la revista
País de afiliación
Intervalo de año de publicación
1.
Bioinformatics ; 32(6): 918-25, 2016 03 15.
Artículo en Inglés | MEDLINE | ID: mdl-26556384

RESUMEN

MOTIVATION: The increasing diversity of data available to the biomedical scientist holds promise for better understanding of diseases and discovery of new treatments for patients. In order to provide a complete picture of a biomedical question, data from many different origins needs to be combined into a unified representation. During this data integration process, inevitable errors and ambiguities present in the initial sources compromise the quality of the resulting data warehouse, and greatly diminish the scientific value of the content. Expensive and time-consuming manual curation is then required to improve the quality of the information. However, it becomes increasingly difficult to dedicate and optimize the resources for data integration projects as available repositories are growing both in size and in number everyday. RESULTS: We present a new generic methodology to identify problematic records, causing what we describe as 'data hairball' structures. The approach is graph-based and relies on two metrics traditionally used in social sciences: the graph density and the betweenness centrality. We evaluate and discuss these measures and show their relevance for flexible, optimized and automated data curation and linkage. The methodology focuses on information coherence and correctness to improve the scientific meaningfulness of data integration endeavors, such as knowledge bases and large data warehouses. CONTACT: samuel.croset@roche.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Asunto(s)
Curaduría de Datos , Humanos , Bases del Conocimiento
SELECCIÓN DE REFERENCIAS
DETALLE DE LA BÚSQUEDA