Anonymization Through Data Synthesis Using Generative Adversarial Networks (ADS-GAN).
IEEE J Biomed Health Inform; 24(8): 2378-2388, 2020 08.
Article | En | MEDLINE | ID: mdl-32167919
The medical and machine learning communities are relying on the promise of artificial intelligence (AI) to transform medicine by enabling more accurate decisions and personalized treatment. However, progress is slow. Legal and ethical issues around unconsented patient data and privacy are among the limiting factors in data sharing, creating a significant barrier to the machine learning community's access to routinely collected electronic health records (EHR). We propose a novel framework for generating synthetic data that closely approximates the joint distribution of variables in an original EHR dataset, providing a readily accessible, legally and ethically appropriate solution that supports more open data sharing and enables the development of AI solutions. To address the lack of clarity in defining sufficient anonymization, we created a quantifiable, mathematical definition of "identifiability". We used a conditional generative adversarial network (GAN) framework to generate synthetic data while minimizing patient identifiability, which is defined based on the probability of re-identification given the combination of all data on any individual patient. We compared models fitted to our synthetically generated data with those fitted to the real data across four independent datasets to evaluate similarity in model performance, while assessing the extent to which original observations can be identified from the synthetic data. Our model, ADS-GAN, consistently outperformed state-of-the-art methods and demonstrated reliability in the joint distributions. We propose that this method could be used to develop datasets that can be made publicly available while considerably lowering the risk of breaching patient confidentiality.
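The abstract does not give the formula behind its "identifiability" measure, so the following is only an illustrative sketch of one way such a quantity could be computed, not the authors' definition. It assumes a nearest-neighbour proxy: a real record counts as identifiable when some synthetic record lies closer to it than its nearest other real record. The function name `identifiability`, the plain Euclidean distance, and the toy arrays are all assumptions made for this example.

```python
import numpy as np

def identifiability(real: np.ndarray, synth: np.ndarray) -> float:
    """Hypothetical proxy: fraction of real records whose nearest synthetic
    record is closer than their nearest *other* real record."""
    # Pairwise Euclidean distances between real records (n_real x n_real).
    d_rr = np.linalg.norm(real[:, None, :] - real[None, :, :], axis=2)
    np.fill_diagonal(d_rr, np.inf)  # ignore each record's distance to itself
    nearest_real = d_rr.min(axis=1)

    # Distances from each real record to every synthetic record.
    d_rs = np.linalg.norm(real[:, None, :] - synth[None, :, :], axis=2)
    nearest_synth = d_rs.min(axis=1)

    # A record is flagged when a synthetic point sits inside its "real" radius.
    return float(np.mean(nearest_synth < nearest_real))

# Toy data (assumed): four patients described by two numeric features.
real = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
leaky = real.copy()    # synthetic data that simply copies the real records
distant = real + 100.0  # synthetic data far from every real record

score_leaky = identifiability(real, leaky)
score_distant = identifiability(real, distant)
```

Under this proxy, copying the real records yields the maximum score and far-away synthetic records yield the minimum, which is the behaviour one would want from any measure that a generator is trained to keep low. Real EHR data would also need feature scaling and handling of categorical variables before a distance like this is meaningful.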
Full text: 1
Collection: 01-internacional
Database: MEDLINE
Health context: 1_ASSA2030
Health problem: 1_sistemas_informacao_saude
Main subject: Neural Networks, Computer / Information Dissemination / Electronic Health Records / Data Anonymization
Study type: Prognostic_studies
Aspect: Ethics
Limits: Female / Humans / Male
Language: En
Journal: IEEE J Biomed Health Inform
Year: 2020
Document type: Article