Search | VHL Regional Portal

Use of Metadata-Driven Approaches for Data Harmonization in the Medical Domain: Scoping Review.

Peng, Yuan; Bathelt, Franziska; Gebler, Richard; Gött, Robert; Heidenreich, Andreas; Henke, Elisa; Kadioglu, Dennis; Lorenz, Stephan; Vengadeswaran, Abishaa; Sedlmayr, Martin.

JMIR Med Inform ; 12: e52967, 2024 Feb 14.

Article in English | MEDLINE | ID: mdl-38354027

ABSTRACT

BACKGROUND: Multisite clinical studies are increasingly using real-world data to gain real-world evidence. However, due to the heterogeneity of source data, it is difficult to analyze such data in a unified way across clinics. Therefore, the implementation of Extract-Transform-Load (ETL) or Extract-Load-Transform (ELT) processes for harmonizing local health data is necessary in order to guarantee the data quality for research. However, the development of such processes is time-consuming and unsustainable. A promising way to ease this is the generalization of ETL/ELT processes. OBJECTIVE: In this work, we investigate existing possibilities for the development of generic ETL/ELT processes. In particular, we focus on approaches with low development complexity that use descriptive metadata and structural metadata. METHODS: We conducted a literature review following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. We used 4 publication databases (ie, PubMed, IEEE Xplore, Web of Science, and Biomed Center) to search for relevant publications from 2012 to 2022. The PRISMA flow was then visualized using an R-based tool (Evidence Synthesis Hackathon). All relevant contents of the publications were extracted into a spreadsheet for further analysis and visualization. RESULTS: Following the PRISMA guidelines, we included 33 publications in this literature review. All included publications were categorized into 7 different focus groups (ie, medicine, data warehouse, big data, industry, geoinformatics, archaeology, and military). Based on the extracted data, ontology-based and rule-based approaches were the 2 most used approaches in different thematic categories. Different approaches and tools were chosen to achieve different purposes within the use cases. CONCLUSIONS: Our literature review shows that using metadata-driven (MDD) approaches to develop an ETL/ELT process can serve different purposes in different thematic categories. The results show that it is promising to implement an ETL/ELT process by applying an MDD approach to automate the data transformation from Fast Healthcare Interoperability Resources to the Observational Medical Outcomes Partnership Common Data Model. However, determining an appropriate MDD approach and tool to implement such an ETL/ELT process remains a challenge. This is due to the lack of comprehensive insight into the characteristics of the MDD approaches presented in this study. Therefore, our next step is to evaluate the MDD approaches presented in this study and to determine the most appropriate MDD approaches and the way to integrate them into the ETL/ELT process. This could verify the ability of MDD approaches to generalize the ETL process for harmonizing medical data.
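
As a rough illustration of what "metadata-driven" can mean for an ETL step, the minimal Python sketch below keeps the mapping rules as data instead of code. Everything in it is an assumption made for illustration: the rule table, the `apply_mapping` helper, and the flattened source record and target row, which are simplified stand-ins loosely inspired by a patient resource and a person table rather than the actual FHIR or OMOP CDM specifications or any approach evaluated in the review.

```python
from datetime import date

# Hypothetical mapping metadata: each rule names a source field, a target
# column, and an optional named transformation. Supporting a new field means
# adding a rule here, not writing new ETL code.
MAPPING_RULES = [
    {"source": "id", "target": "person_source_value", "transform": None},
    {"source": "gender", "target": "gender_source_value", "transform": "lowercase"},
    {"source": "birthDate", "target": "year_of_birth", "transform": "year_of_date"},
]

# Small registry of reusable transformations referenced by name in the rules.
TRANSFORMS = {
    "lowercase": lambda value: value.lower(),
    "year_of_date": lambda value: date.fromisoformat(value).year,
}


def apply_mapping(record, rules):
    """Build one target row from one source record by interpreting the rules."""
    row = {}
    for rule in rules:
        value = record.get(rule["source"])
        # Missing source values stay None; transformations only run on real values.
        if value is not None and rule["transform"] is not None:
            value = TRANSFORMS[rule["transform"]](value)
        row[rule["target"]] = value
    return row


# Illustrative source record (simplified, not a complete FHIR resource).
patient = {"id": "p-001", "gender": "Female", "birthDate": "1980-05-17"}
row = apply_mapping(patient, MAPPING_RULES)
print(row)
```

The generic part is `apply_mapping`; site-specific knowledge lives only in `MAPPING_RULES`, which is the property that makes such a process easier to reuse across sites.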

DistSNE: Distributed computing and online visualization of DNA methylation-based central nervous system tumor classification.

Schmid, Kai; Sehring, Jannik; Németh, Attila; Harter, Patrick N; Weber, Katharina J; Vengadeswaran, Abishaa; Storf, Holger; Seidemann, Christian; Karki, Kapil; Fischer, Patrick; Dohmen, Hildegard; Selignow, Carmen; von Deimling, Andreas; Grau, Stefan; Schröder, Uwe; Plate, Karl H; Stein, Marco; Uhl, Eberhard; Acker, Till; Amsel, Daniel.

Brain Pathol ; 34(3): e13228, 2024 May.

Article in English | MEDLINE | ID: mdl-38012085

ABSTRACT

The current state-of-the-art analysis of central nervous system (CNS) tumors through DNA methylation profiling relies on the tumor classifier developed by Capper and colleagues, which centrally harnesses DNA methylation data provided by users. Here, we present a distributed-computing-based approach for CNS tumor classification that achieves performance comparable to centralized systems while safeguarding privacy. We utilize the t-distributed stochastic neighbor embedding (t-SNE) model for dimensionality reduction and visualization of tumor classification results in two-dimensional graphs in a distributed approach across multiple sites (DistSNE). DistSNE provides an intuitive web interface (https://gin-tsne.med.uni-giessen.de) for user-friendly local data management and federated methylome-based tumor classification calculations for multiple collaborators in a DataSHIELD environment. The freely accessible web interface supports convenient data upload, result review, and summary report generation. Importantly, increasing the sample size through distributed access to additional datasets allows DistSNE to improve cluster analysis and enhance predictive power. Collectively, DistSNE enables simple and fast classification of CNS tumors using large-scale methylation data from distributed sources, while maintaining privacy and allowing easy and flexible network expansion to other institutes. This approach holds great potential for advancing human brain tumor classification and fostering collaborative precision medicine in neuro-oncology.

Subjects

Brain Neoplasms; Central Nervous System Neoplasms; Humans; DNA Methylation; Central Nervous System Neoplasms/genetics; Brain Neoplasms/genetics

Linking a Consortium-Wide Data Quality Assessment Tool with the MIRACUM Metadata Repository.

Kapsner, Lorenz A; Mang, Jonathan M; Mate, Sebastian; Seuchter, Susanne A; Vengadeswaran, Abishaa; Bathelt, Franziska; Deppenwiese, Noemi; Kadioglu, Dennis; Kraska, Detlef; Prokosch, Hans-Ulrich.

Appl Clin Inform ; 12(4): 826-835, 2021 08.

Article in English | MEDLINE | ID: mdl-34433217

ABSTRACT

BACKGROUND: Many research initiatives aim at using data from electronic health records (EHRs) in observational studies. Participating sites of the German Medical Informatics Initiative (MII) established data integration centers to integrate EHR data within research data repositories to support local and federated analyses. To address concerns regarding possible data quality (DQ) issues of hospital routine data compared with data specifically collected for scientific purposes, we have previously presented a data quality assessment (DQA) tool providing a standardized approach to assess DQ of the research data repositories at the MIRACUM consortium's partner sites. OBJECTIVES: Major limitations of the former approach included manual interpretation of the results and hard coding of analyses, making their expansion to new data elements and databases time-consuming and error-prone. We here present an enhanced version of the DQA tool by linking it to common data element definitions stored in a metadata repository (MDR), adopting the harmonized DQA framework from Kahn et al. and its application within the MIRACUM consortium. METHODS: Data quality checks were consequently aligned to a harmonized DQA terminology. Database-specific information was systematically identified and represented in an MDR. Furthermore, a structured representation of logical relations between data elements was developed to model plausibility statements in the MDR. RESULTS: The MIRACUM DQA tool was linked to data element definitions stored in a consortium-wide MDR. Additional databases used within MIRACUM were linked to the DQ checks by extending the respective data elements in the MDR with the required information. The evaluation of DQ checks was automated. An adaptable software implementation is provided with the R package DQAstats. CONCLUSION: The enhancements of the DQA tool facilitate the future integration of new data elements and make the tool scalable to other databases and data models. It has been provided to all ten MIRACUM partners and was successfully deployed and integrated into their respective data integration center infrastructure.
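
The idea of driving data quality checks from element definitions rather than hard-coded analyses can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: it is not the DQAstats API (which is an R package), and the element definitions, the rule format, and the functions `check_conformance` and `check_plausibility` are hypothetical names invented for illustration.

```python
# Hypothetical data element definitions, as they might be exported from an MDR.
DATA_ELEMENTS = {
    "age_years": {"type": "range", "min": 0, "max": 120},
    "sex": {"type": "allowed_values", "values": {"female", "male", "other", "unknown"}},
}

# Hypothetical plausibility statement: a logical relation between two elements,
# stored as data so that new rules need no new code.
PLAUSIBILITY_RULES = [
    {
        "name": "pregnancy_requires_female",
        "if": ("diagnosis", "pregnancy"),
        "then": ("sex", "female"),
    },
]


def check_conformance(records, elements):
    """Count, per data element, the values that violate its definition."""
    violations = {name: 0 for name in elements}
    for record in records:
        for name, spec in elements.items():
            value = record.get(name)
            if value is None:
                continue  # missing values would belong to a separate completeness check
            if spec["type"] == "range":
                ok = spec["min"] <= value <= spec["max"]
            else:
                ok = value in spec["values"]
            if not ok:
                violations[name] += 1
    return violations


def check_plausibility(records, rules):
    """Count, per rule, the records where the condition holds but the consequence fails."""
    violations = {rule["name"]: 0 for rule in rules}
    for record in records:
        for rule in rules:
            if_field, if_value = rule["if"]
            then_field, then_value = rule["then"]
            if record.get(if_field) == if_value and record.get(then_field) != then_value:
                violations[rule["name"]] += 1
    return violations


# Illustrative records only; not real patient data.
records = [
    {"age_years": 34, "sex": "female", "diagnosis": "pregnancy"},
    {"age_years": 150, "sex": "male", "diagnosis": "pregnancy"},
    {"age_years": 61, "sex": "m", "diagnosis": "fracture"},
]
conformance = check_conformance(records, DATA_ELEMENTS)
plausibility = check_plausibility(records, PLAUSIBILITY_RULES)
print(conformance, plausibility)
```

Linking another database to the same checks would, in this sketch, only require new entries in `DATA_ELEMENTS` and `PLAUSIBILITY_RULES`, which mirrors the scalability argument made in the abstract.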

Subjects

Data Accuracy; Medical Informatics; Databases, Factual; Electronic Health Records; Metadata

Semantically Annotated Metadata: Interconnecting Samply.MDR and MDM-Portal.

Vengadeswaran, Abishaa; Neuhaus, Philipp; Hegselmann, Stefan; Storf, Holger; Kadioglu, Dennis.

Stud Health Technol Inform ; 267: 86-92, 2019 Sep 03.

Article in English | MEDLINE | ID: mdl-31483259

ABSTRACT

Interoperability is a growing demand in healthcare, driven by heterogeneous sources that complicate information transfer. Metadata repositories can address these interoperability issues. They help ensure syntactic interoperability, such as compatible data formats or value ranges; semantic interoperability, however, remains challenging. Semantic annotation with standardized terminologies and classifications fosters semantic interoperability. This work aims to interconnect Samply.MDR and the Portal of Medical Data Models (MDM-Portal) to facilitate semantic annotation with UMLS. To this end, Samply.MDR was extended to store semantic information. While a data element is being created, a request is sent to the MDM-Portal, which returns possible UMLS codes. The user can then adopt the most suitable code and select a link type between the code and the element itself. Interconnecting Samply.MDR and the MDM-Portal demonstrated a successful enrichment of data elements with UMLS codes.
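
The annotation workflow described above (suggest codes, let the user adopt one, store it with a link type) could be modeled roughly as follows. This Python sketch is purely illustrative: the classes, the `suggest_codes` and `annotate` helpers, the link type names, and the lookup table standing in for the MDM-Portal request are all assumptions, and the codes are placeholders rather than real UMLS concept identifiers. It does not reflect the actual Samply.MDR data model or the MDM-Portal API.

```python
from dataclasses import dataclass, field

# Hypothetical link types between a code and a data element.
LINK_TYPES = {"equal", "broader", "narrower", "related"}


@dataclass
class SemanticAnnotation:
    code: str       # terminology code adopted by the user
    system: str     # terminology the code belongs to, e.g. "UMLS"
    link_type: str  # how the code relates to the data element


@dataclass
class DataElement:
    designation: str
    annotations: list = field(default_factory=list)


# Stand-in for the remote suggestion service. The codes are placeholders,
# not real UMLS concept identifiers.
PLACEHOLDER_SUGGESTIONS = {
    "blood pressure": ["C0000001", "C0000002"],
}


def suggest_codes(designation):
    """Return candidate codes for a designation (local lookup in this sketch)."""
    return PLACEHOLDER_SUGGESTIONS.get(designation.lower(), [])


def annotate(element, code, link_type):
    """Attach a user-selected code to the element after basic validation."""
    if code not in suggest_codes(element.designation):
        raise ValueError(f"{code} was not suggested for {element.designation!r}")
    if link_type not in LINK_TYPES:
        raise ValueError(f"unknown link type: {link_type}")
    element.annotations.append(SemanticAnnotation(code, "UMLS", link_type))
    return element


element = DataElement("Blood pressure")
candidates = suggest_codes(element.designation)
annotate(element, candidates[0], "equal")
print(element)
```

Storing the link type next to the code keeps the relation explicit, so a consumer can distinguish an exact match from a broader or narrower concept.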

Subjects

Metadata; Semantics
