Search | BVS - Ministry of Health

Application of Bayesian networks to generate synthetic health data.

Kaur, Dhamanpreet; Sobiesk, Matthew; Patil, Shubham; Liu, Jin; Bhagat, Puran; Gupta, Amar; Markuzon, Natasha.

J Am Med Inform Assoc; 28(4): 801-811, 2021 Mar 18.

Article in English | MEDLINE | ID: mdl-33367620

ABSTRACT

OBJECTIVE: This study seeks to develop a fully automated method of generating synthetic data from a real dataset that could be employed by medical organizations to distribute health data to researchers, reducing the need for access to real data. We hypothesize the application of Bayesian networks will improve upon the predominant existing method, medBGAN, in handling the complexity and dimensionality of healthcare data. MATERIALS AND METHODS: We employed Bayesian networks to learn probabilistic graphical structures and simulated synthetic patient records from the learned structure. We used the University of California Irvine (UCI) heart disease and diabetes datasets as well as the MIMIC-III diagnoses database. We evaluated our method through statistical tests, machine learning tasks, preservation of rare events, disclosure risk, and the ability of a machine learning classifier to discriminate between the real and synthetic data. RESULTS: Our Bayesian network model outperformed or equaled medBGAN in all key metrics. Notable improvement was achieved in capturing rare variables and preserving association rules. DISCUSSION: Bayesian networks generated data sufficiently similar to the original data with minimal risk of disclosure, while offering additional transparency, computational efficiency, and capacity to handle more data types in comparison to existing methods. We hope this method will allow healthcare organizations to efficiently disseminate synthetic health data to researchers, enabling them to generate hypotheses and develop analytical tools. CONCLUSION: We conclude the application of Bayesian networks is a promising option for generating realistic synthetic health data that preserves the features of the original data without compromising data privacy.
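The generation step described in the abstract (fit a probabilistic graphical model, then simulate records from it) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the two-variable toy records, the variable names, and the hand-fixed structure smoker -> heart_disease. The paper learns the structure from data, which this sketch does not attempt.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy records (not from the paper): (smoker, heart_disease)
REAL = [(1, 1), (1, 1), (1, 0), (0, 0), (0, 0), (0, 0), (0, 1), (1, 1)]

def fit(records):
    """Estimate P(smoker) and P(heart_disease | smoker) by counting.

    The structure smoker -> heart_disease is fixed by hand (an assumption);
    structure learning is out of scope for this sketch.
    """
    n = len(records)
    p_smoker = sum(s for s, _ in records) / n
    counts = defaultdict(Counter)
    for s, d in records:
        counts[s][d] += 1
    p_disease = {s: c[1] / (c[0] + c[1]) for s, c in counts.items()}
    return p_smoker, p_disease

def sample(p_smoker, p_disease, k, rng):
    """Ancestral sampling: draw the parent first, then the child given it."""
    out = []
    for _ in range(k):
        s = int(rng.random() < p_smoker)
        d = int(rng.random() < p_disease[s])
        out.append((s, d))
    return out

p_smoker, p_disease = fit(REAL)
synthetic = sample(p_smoker, p_disease, 1000, random.Random(0))
```

The synthetic records are drawn from the fitted conditional probabilities and are not copies of real rows, which is the property the abstract relies on when it discusses disclosure risk.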

Subjects

Bayes Theorem, Data Anonymization, Data Management, Machine Learning, Neural Networks, Computer, Confidentiality, Datasets as Topic, Disclosure, Humans, Information Dissemination

Automated mutual exclusion rules discovery for structured observational codes in echocardiography reporting.

Forsberg, Thomas A; Sevenster, Merlijn; Bieganski, Szymon; Bhagat, Puran; Kanasseril, Melvin; Jia, Yugang; Spencer, Kirk T.

AMIA Annu Symp Proc; 2015: 570-9, 2015.

Article in English | MEDLINE | ID: mdl-26958191

ABSTRACT

Structured reporting in medicine has been argued to support and enhance machine-assisted processing and communication of pertinent information. Retrospective studies showed that structured echocardiography reports, constructed through point-and-click selection of finding codes (FCs), contain pair-wise contradictory FCs (e.g., "No tricuspid regurgitation" and "Severe regurgitation") downgrading report quality and reliability thereof. In a prospective study, contradictions were detected automatically using an extensive rule set that encodes mutual exclusion patterns between FCs. Rules creation is a labor and knowledge-intensive task that could benefit from automation. We propose a machine-learning approach to discover mutual exclusion rules in a corpus of 101,211 structured echocardiography reports through semantic and statistical analysis. Ground truth is derived from the extensive prospectively evaluated rule set. On the unseen test set, F-measure (0.439) and above-chance level AUC (0.885) show that our approach can potentially support the manual rules creation process. Our methods discovered previously unknown rules per expert review.
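As a rough illustration of the statistical side of rule discovery, the sketch below flags pairs of finding codes that are each common in a corpus but never appear in the same report. This is a simple co-occurrence heuristic chosen for illustration, not the authors' method, which also uses semantic analysis. The code names, the toy reports, and the `min_support` threshold are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical reports, each a set of finding codes (FCs); not real data.
REPORTS = [
    {"no_TR", "normal_LV"},
    {"severe_TR", "dilated_LV"},
    {"no_TR", "normal_LV"},
    {"severe_TR", "normal_LV"},
    {"no_TR", "dilated_LV"},
    {"severe_TR", "dilated_LV"},
]

def candidate_exclusions(reports, min_support=2):
    """Return FC pairs that each occur at least min_support times
    but never occur together in the same report."""
    single = Counter(fc for r in reports for fc in r)
    # Sort each report so every pair is stored in one canonical order.
    together = Counter(
        pair for r in reports for pair in combinations(sorted(r), 2)
    )
    frequent = sorted(fc for fc, n in single.items() if n >= min_support)
    return [p for p in combinations(frequent, 2) if together[p] == 0]

candidates = candidate_exclusions(REPORTS)
```

Pairs returned this way are only candidates: absence of co-occurrence in a corpus does not prove clinical contradiction, which is why the abstract frames the approach as support for manual rule creation with expert review.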

Subjects

Data Mining/methods, Echocardiography, Machine Learning, Area Under Curve, Diagnostic Errors, Humans, Prospective Studies, Reproducibility of Results
