Biomedical heterogeneous data categorization and schema mapping toward data integration.

Deshpande, Priya; Rasin, Alexander; Tchoua, Roselyne; Furst, Jacob; Raicu, Daniela; Schinkel, Michiel; Trivedi, Hari; Antani, Sameer

Affiliation

Deshpande P; Marquette University, Milwaukee, WI, United States.
Rasin A; DePaul University, Chicago, IL, United States.
Tchoua R; DePaul University, Chicago, IL, United States.
Furst J; DePaul University, Chicago, IL, United States.
Raicu D; DePaul University, Chicago, IL, United States.
Schinkel M; Center for Experimental and Molecular Medicine (CEMM), University of Amsterdam, Amsterdam, Netherlands.
Trivedi H; Emory University, Atlanta, GA, United States.
Antani S; National Library of Medicine, National Institutes of Health, Bethesda, MD, United States.

Front Big Data ; 6: 1173038, 2023.

Article in English | MEDLINE | ID: mdl-37139170

ABSTRACT

Data integration is a well-motivated problem in the clinical data science domain. The availability of patient data, reference clinical cases, and research datasets has the potential to advance the healthcare industry. However, the unstructured (text, audio, or video) and heterogeneous nature of the data, the variety of data standards and formats, and patient privacy constraints make data interoperability and integration a challenge. Clinical text is further categorized into different semantic groups and may be stored in different files and formats. Even the same organization may store cases in different data structures, making data integration more challenging. With such inherent complexity, domain experts and domain knowledge are often necessary to perform data integration. However, expert human labor is time- and cost-prohibitive. To overcome the variability in the structure, format, and content of the different data sources, we map the text into common categories and compute similarity within those categories. In this paper, we present a method to categorize and merge clinical data by considering the underlying semantics of the cases and using reference information about the cases to perform data integration. Evaluation shows that we were able to merge 88% of clinical data from five different sources.
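
The idea of mapping text into common categories and comparing within them can be illustrated with a minimal sketch. This is not the authors' method: the field names, the category map (`FIELD_TO_CATEGORY`), and the use of Jaccard token overlap as the similarity measure are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import re

# Hypothetical mapping from source-specific field names to common categories.
# These names are illustrative assumptions, not the schema used in the paper.
FIELD_TO_CATEGORY = {
    "history": "history",
    "clinical_history": "history",
    "findings": "findings",
    "imaging_findings": "findings",
    "dx": "diagnosis",
    "diagnosis": "diagnosis",
}


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def categorize(record):
    """Group a record's tokens by common category.

    Field names are matched case-insensitively; fields with no known
    category are collected under "other".
    """
    categorized = {}
    for field, text in record.items():
        category = FIELD_TO_CATEGORY.get(field.lower(), "other")
        categorized.setdefault(category, set()).update(tokenize(text))
    return categorized


def jaccard(a, b):
    """Jaccard overlap of two token sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def category_similarity(rec_a, rec_b):
    """Compute similarity only within categories present in both records."""
    cat_a, cat_b = categorize(rec_a), categorize(rec_b)
    shared = cat_a.keys() & cat_b.keys()
    return {c: jaccard(cat_a[c], cat_b[c]) for c in shared}


# Two made-up cases from sources that name their fields differently.
case_a = {"clinical_history": "Chest pain and fever", "dx": "Lobar pneumonia"}
case_b = {"History": "fever and chest pain", "Diagnosis": "pneumonia",
          "notes": "follow up"}

scores = category_similarity(case_a, case_b)
```

Here `scores` holds one value per category shared by both cases, so a decision to merge two cases could be based on per-category thresholds rather than on whole-document similarity. A semantic measure (for example, embedding-based similarity) could replace `jaccard` without changing the surrounding structure.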

Keywords

data categorization; data integration; datasets; heterogeneous data; schema mapping; semantic similarity; unstructured data

Full text: 1 | Database: MEDLINE | Study type: Guideline | Language: En | Journal: Front Big Data | Publication year: 2023 | Document type: Article | Country of affiliation: United States