De-Identifying Swedish EHR Text Using Public Resources in the General Domain.
Stud Health Technol Inform; 270: 148-152, 2020 Jun 16.
Article | En | MEDLINE | ID: mdl-32570364
ABSTRACT
Developing rule-based models or training machine learning-based models for de-identifying electronic health record (EHR) clinical notes normally requires sensitive data, which poses significant problems for patient privacy. In this study, we add non-sensitive public datasets to the EHR training data: (i) scientific medical text and (ii) Wikipedia word vectors. The data, all in Swedish, is used to train a deep learning model based on recurrent neural networks. Tests on pseudonymized Swedish EHR clinical notes showed that precision and recall improved from 55.62% and 80.02%, respectively, with the base EHR embedding layer to 85.01% and 87.15% when Wikipedia word vectors were added. These results suggest that non-sensitive text from the general domain can be used to train robust models for de-identifying Swedish clinical text, which could be useful when the data is both sensitive and in a low-resource language.
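The abstract reports precision and recall but no combined score. As a minimal sketch, the harmonic mean (F1) of the two reported figures can be derived as follows. The F1 values are computed here from the abstract's numbers and are not reported by the authors; the function name `f1_score` and the dictionary labels are illustrative assumptions.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall (in percent) as stated in the abstract.
# The labels are illustrative, not the authors' configuration names.
reported = {
    "base EHR embedding layer": (55.62, 80.02),
    "with Wikipedia word vectors": (85.01, 87.15),
}

# Derived values: not reported in the source, only computed from it.
f1_by_setup = {name: f1_score(p, r) for name, (p, r) in reported.items()}

for name, f1 in f1_by_setup.items():
    print(f"{name}: F1 = {f1:.2f}%")
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two inputs, so the low precision of the base configuration weighs on its combined score more than an arithmetic average would suggest.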
Full text: 1
Collection: 01-internacional
Database: MEDLINE
Main subject: Electronic Health Records
Type of study: Prognostic_studies
Country/Region as subject: Europe
Language: En
Journal: Stud Health Technol Inform
Journal subject: MEDICAL INFORMATICS / HEALTH SERVICES RESEARCH
Year: 2020
Document type: Article
Affiliation country: Norway