Search | VHL Regional Portal (Portal Regional de la BVS)

Knowledge Graph-Enabled Cancer Data Analytics.

Hasan, S M Shamimul; Rivera, Donna; Wu, Xiao-Cheng; Durbin, Eric B; Christian, J Blair; Tourassi, Georgia.

IEEE J Biomed Health Inform ; 24(7): 1952-1967, 2020 Jul.

Article in English | MEDLINE | ID: mdl-32386166

ABSTRACT

Cancer registries collect unstructured and structured cancer data for surveillance purposes, providing important insights into cancer characteristics, treatments, and outcomes. Cancer registry data typically (1) categorize each reportable cancer case or tumor at the time of diagnosis, (2) contain demographic information about the patient such as age, gender, and location at time of diagnosis, (3) include planned and completed primary treatment information, and (4) may contain survival outcomes. As structured data are extracted from various unstructured sources, such as pathology reports, radiology reports, and medical records, and stored for reporting and other needs, the associated information representing a reportable cancer is constantly expanding and evolving. While some popular analytic approaches including SEER*Stat and SAS exist, we provide a knowledge graph approach to organizing cancer registry data. Our approach offers unique advantages for timely data analysis and for the presentation and visualization of valuable information. This knowledge graph approach semantically enriches the data and readily enables linking with third-party data, which can help explain variation in cancer incidence patterns, disparities, and outcomes. We developed a prototype knowledge graph based on the Louisiana Tumor Registry dataset. We present the advantages of the knowledge graph approach by examining: i) scenario-specific queries, ii) links with openly available external datasets, iii) schema evolution for iterative analysis, and iv) data visualization. Our results demonstrate that this graph-based solution can perform complex queries, improve query run-time performance by up to 76%, and more easily conduct iterative analyses to enhance researchers' understanding of cancer registry data.
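
The idea of linking registry cases to an external dataset through a graph can be illustrated with a minimal sketch. This is not the authors' prototype: the triple layout, the `match` helper, the predicate names, and every identifier and value below are hypothetical and invented for illustration only (none of it is Louisiana Tumor Registry data).

```python
# Toy knowledge graph as a set of (subject, predicate, object) triples.
# All identifiers and values are made up for illustration.
registry_triples = {
    ("case:1", "hasPrimarySite", "site:lung"),
    ("case:1", "diagnosedInCounty", "county:A"),
    ("case:2", "hasPrimarySite", "site:breast"),
    ("case:2", "diagnosedInCounty", "county:B"),
    ("case:3", "hasPrimarySite", "site:lung"),
    ("case:3", "diagnosedInCounty", "county:B"),
}

# Hypothetical third-party dataset that happens to share county identifiers.
external_triples = {
    ("county:A", "smokingPrevalencePct", 21.0),
    ("county:B", "smokingPrevalencePct", 17.5),
}

# Linking is a set union: new predicates appear without a schema migration.
graph = registry_triples | external_triples


def match(store, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def lung_cases_with_county_smoking(store):
    """Walk case -> county -> external attribute for lung cases."""
    rows = []
    for case, _, _ in match(store, p="hasPrimarySite", o="site:lung"):
        for _, _, county in match(store, s=case, p="diagnosedInCounty"):
            for _, _, pct in match(store, s=county, p="smokingPrevalencePct"):
                rows.append((case, county, pct))
    return sorted(rows)


rows = lung_cases_with_county_smoking(graph)
```

In this sketch, adding another attribute later (for example a treatment edge) only means adding triples with a new predicate, which is the kind of schema evolution the abstract refers to.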

Subject(s)

Knowledge Bases; Neoplasms; Registries; Adult; Aged; Aged, 80 and over; Algorithms; Databases, Factual; Female; Humans; Incidence; Male; Middle Aged; Neoplasms/diagnosis; Neoplasms/epidemiology; Neoplasms/physiopathology

EpiK: A Knowledge Base for Epidemiological Modeling and Analytics of Infectious Diseases.

Hasan, S M Shamimul; Fox, Edward A; Bisset, Keith; Marathe, Madhav V.

J Healthc Inform Res ; 1(2): 260-303, 2017 Dec.

Article in English | MEDLINE | ID: mdl-35415398

ABSTRACT

Computational epidemiology seeks to develop computational methods to study the distribution and determinants of health-related states or events (including disease), and the application of this study to the control of diseases and other health problems. Recent advances in computing and data sciences have led to the development of innovative modeling environments to support this important goal. The datasets used to drive the dynamic models, as well as the data produced by these models, present unique challenges owing to their size, heterogeneity, and diversity. These datasets form the basis of effective and easy-to-use decision support and analytical environments. As a result, it is important to develop scalable data management systems to store, manage, and integrate these datasets. In this paper, we develop EpiK, a knowledge base that facilitates the development of decision support and analytical environments to support epidemic science. An important goal is to develop a framework that links the input and output datasets to facilitate the effective spatio-temporal and social reasoning that is critical in planning and intervention analysis before and during an epidemic. The data management framework links modeling workflow data and its metadata using a controlled vocabulary. The metadata captures information about storage, the mapping between the linked model and the physical layout, and relationships to support services. EpiK is designed to support agent-based modeling and analytics frameworks; aggregate models can be seen as special cases and are thus supported. We use semantic web technologies to create a representation of the datasets that encapsulates both the location and the schema heterogeneity. The choice of RDF as a representation language is motivated by the diversity and growth of the datasets that need to be integrated.
A query bank is developed; its queries capture a broad range of questions that can be posed and answered during a typical case study pertaining to disease outbreaks. The queries are constructed using SPARQL Protocol and RDF Query Language (SPARQL) over EpiK. EpiK can hide schema and location heterogeneity while efficiently supporting queries that span the computational epidemiology modeling pipeline: from model construction to simulation output. We show that the performance of benchmark queries varies significantly with respect to the choice of hardware underlying the database and Resource Description Framework (RDF) engine.
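
How a controlled vocabulary can hide schema and location heterogeneity behind a single query can be sketched as follows. This is a minimal illustration under stated assumptions, not EpiK itself and not a SPARQL engine: the `ex:` predicates, the two source layouts, the store names, and the `to_triples` and `query` helpers are all hypothetical, and the pattern matcher only mimics the conjunctive triple patterns that a SPARQL query would express.

```python
# Two hypothetical sources describing simulation runs with different schemas.
source_a = [{"run": "run1", "region_name": "regionX", "model": "agent-based"}]
source_b = [{"sim_id": "run2", "loc": "regionX", "model_type": "agent-based"}]

# Controlled vocabulary: each source's field names map to shared predicates.
VOCAB_A = {"region_name": "ex:region", "model": "ex:modelType"}
VOCAB_B = {"loc": "ex:region", "model_type": "ex:modelType"}


def to_triples(records, id_field, vocab, location):
    """Convert source records to triples, recording where each one is stored."""
    out = []
    for rec in records:
        subject = rec[id_field]
        out.append((subject, "ex:storedAt", location))
        for field, predicate in vocab.items():
            out.append((subject, predicate, rec[field]))
    return out


graph = (to_triples(source_a, "run", VOCAB_A, "store:a")
         + to_triples(source_b, "sim_id", VOCAB_B, "store:b"))


def is_var(term):
    """Terms starting with '?' are variables, as in SPARQL."""
    return isinstance(term, str) and term.startswith("?")


def query(triples, patterns):
    """Evaluate a conjunction of triple patterns; return variable bindings."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in triples:
                candidate = dict(binding)
                ok = True
                for term, value in zip(pattern, triple):
                    if is_var(term):
                        # Bind the variable, or check an earlier binding.
                        if candidate.setdefault(term, value) != value:
                            ok = False
                            break
                    elif term != value:
                        ok = False
                        break
                if ok:
                    next_solutions.append(candidate)
        solutions = next_solutions
    return solutions


# One query spans both sources without mentioning their native field names.
results = query(graph, [("?run", "ex:region", "regionX"),
                        ("?run", "ex:modelType", "agent-based")])
runs = sorted(b["?run"] for b in results)
```

The query never refers to `region_name`, `loc`, or either store, which is the sense in which the mapping layer hides both kinds of heterogeneity in this toy setting.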
