De-Identifying Swedish EHR Text Using Public Resources in the General Domain.

Chomutare, Taridzo; Yigzaw, Kassaye Yitbarek; Budrionis, Andrius; Makhlysheva, Alexandra; Godtliebsen, Fred; Dalianis, Hercules

Affiliation

Chomutare T; Norwegian Centre for E-health Research, Tromsø, Norway.
Yigzaw KY; Norwegian Centre for E-health Research, Tromsø, Norway.
Budrionis A; Norwegian Centre for E-health Research, Tromsø, Norway.
Makhlysheva A; Norwegian Centre for E-health Research, Tromsø, Norway.
Godtliebsen F; Norwegian Centre for E-health Research, Tromsø, Norway.
Dalianis H; Faculty of Science & Technology, UiT - The Arctic University of Norway.

Stud Health Technol Inform ; 270: 148-152, 2020 Jun 16.

Article in English | MEDLINE | ID: mdl-32570364

ABSTRACT

Developing rule-based models or training machine learning-based models for de-identifying electronic health record (EHR) clinical notes normally requires sensitive data, which presents important problems for patient privacy. In this study, we add non-sensitive public datasets to EHR training data: (i) scientific medical text and (ii) Wikipedia word vectors. The data, all in Swedish, is used to train a deep learning model using recurrent neural networks. Tests on pseudonymized Swedish EHR clinical notes showed that precision and recall improved from 55.62% and 80.02%, respectively, with the base EHR embedding layer to 85.01% and 87.15% when Wikipedia word vectors were added. These results suggest that non-sensitive text from the general domain can be used to train robust models for de-identifying Swedish clinical text, which could be useful where the data is sensitive and the language is low-resource.
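For readers unfamiliar with the two metrics reported in the abstract, the following is a minimal sketch of how precision and recall might be computed for a de-identification tagger. It assumes token-level scoring, which the abstract does not specify; the function name `precision_recall`, the label names, and the toy sequences are all hypothetical and are not taken from the study.

```python
def precision_recall(gold, predicted, outside="O"):
    """Token-level precision and recall over sensitive (non-'O') labels.

    Assumption: a prediction counts as correct only when it is a
    sensitive label and matches the gold label exactly.
    """
    true_pos = sum(
        1 for g, p in zip(gold, predicted) if p != outside and g == p
    )
    predicted_pos = sum(1 for p in predicted if p != outside)
    gold_pos = sum(1 for g in gold if g != outside)
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / gold_pos if gold_pos else 0.0
    return precision, recall


# Toy sequences (invented for illustration, not data from the study).
gold = ["O", "NAME", "NAME", "O", "DATE", "O", "LOCATION", "O"]
predicted = ["O", "NAME", "O", "O", "DATE", "DATE", "O", "O"]

precision, recall = precision_recall(gold, predicted)
print(f"precision={precision:.2%} recall={recall:.2%}")
```

In de-identification, recall is often the more safety-relevant figure, because every missed sensitive token is a potential privacy leak, while low precision mainly costs readability of the de-identified note.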

Subject(s)

Electronic Health Records; Language; Machine Learning; Natural Language Processing; Sweden

Keywords

EHR; clinical text; de-identification; deep learning; wiki word vectors

Full text: 1; Collection: 01-international; Database: MEDLINE; Main subject: Electronic Health Records; Study type: Prognostic_studies; Country/Region as subject: Europe; Language: English; Journal: Stud Health Technol Inform; Journal subject: MEDICAL INFORMATICS / HEALTH SERVICES RESEARCH; Year: 2020; Document type: Article; Affiliation country: Norway
