Pesquisa | BVS Integralidade em Saúde

Privacy preserving data anonymization of spontaneous ADE reporting system dataset.

Lin, Wen-Yang; Yang, Duen-Chuan; Wang, Jie-Teng.

BMC Med Inform Decis Mak ; 16 Suppl 1: 58, 2016 07 18.

Artigo em Inglês | MEDLINE | ID: mdl-27454754

RESUMO

BACKGROUND: To facilitate long-term safety surveillance of marketing drugs, many spontaneously reporting systems (SRSs) of ADR events have been established world-wide. Since the data collected by SRSs contain sensitive personal health information that should be protected to prevent the identification of individuals, it procures the issue of privacy preserving data publishing (PPDP), that is, how to sanitize (anonymize) raw data before publishing. Although much work has been done on PPDP, very few studies have focused on protecting privacy of SRS data and none of the anonymization methods is favorable for SRS datasets, due to which contain some characteristics such as rare events, multiple individual records, and multi-valued sensitive attributes. METHODS: We propose a new privacy model called MS(k, Î¸ (*) )-bounding for protecting published spontaneous ADE reporting data from privacy attacks. Our model has the flexibility of varying privacy thresholds, i.e., Î¸ (*) , for different sensitive values and takes the characteristics of SRS data into consideration. We also propose an anonymization algorithm for sanitizing the raw data to meet the requirements specified through the proposed model. Our algorithm adopts a greedy-based clustering strategy to group the records into clusters, conforming to an innovative anonymization metric aiming to minimize the privacy risk as well as maintain the data utility for ADR detection. Empirical study was conducted using FAERS dataset from 2004Q1 to 2011Q4. We compared our model with four prevailing methods, including k-anonymity, (X, Y)-anonymity, Multi-sensitive l-diversity, and (α, k)-anonymity, evaluated via two measures, Danger Ratio (DR) and Information Loss (IL), and considered three different scenarios of threshold setting for Î¸ (*) , including uniform setting, level-wise setting and frequency-based setting. We also conducted experiments to inspect the impact of anonymized data on the strengths of discovered ADR signals. RESULTS: With all three different threshold settings for sensitive value, our method can successively prevent the disclosure of sensitive values (nearly all observed DRs are zeros) without sacrificing too much of data utility. With non-uniform threshold setting, level-wise or frequency-based, our MS(k, Î¸ (*))-bounding exhibits the best data utility and the least privacy risk among all the models. The experiments conducted on selected ADR signals from MedWatch show that only very small difference on signal strength (PRR or ROR) were observed. The results show that our method can effectively prevent the disclosure of patient sensitive information without sacrificing data utility for ADR signal detection. CONCLUSIONS: We propose a new privacy model for protecting SRS data that possess some characteristics overlooked by contemporary models and an anonymization algorithm to sanitize SRS data in accordance with the proposed model. Empirical evaluation on the real SRS dataset, i.e., FAERS, shows that our method can effectively solve the privacy problem in SRS data without influencing the ADR signal strength.

Assuntos

Sistemas de Notificação de Reações Adversas a Medicamentos/normas , Anonimização de Dados , Modelos Teóricos , Privacidade , Humanos

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

Detalhe da pesquisa