Annotating German Clinical Documents for De-Identification.
Stud Health Technol Inform; 264: 203-207, 2019 Aug 21.
Article in English | MEDLINE | ID: mdl-31437914
ABSTRACT
We devised annotation guidelines for the de-identification of German clinical documents and assembled a corpus of 1,106 discharge summaries and transfer letters with 44K annotated protected health information (PHI) items. After three rounds of iteration, our annotation team reached an inter-annotator agreement (averaged pair-wise F1 score) of 0.96 at the instance level and 0.97 at the token level. To establish a baseline for automatic de-identification on our corpus, we trained a recurrent neural network (RNN) and achieved F1 scores greater than 0.9 on most major PHI categories.
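The agreement figures above are averaged pair-wise F1 scores. The sketch below shows one way such a score can be computed. It assumes that each annotator's output is a set of `(start_token, end_token, label)` PHI spans, that two instances match only when span and label agree exactly, and that the token-level score comes from expanding each span into per-token `(token_index, label)` items. For two annotators with item sets A and B, F1 reduces to 2|A∩B| / (|A| + |B|), which is symmetric, so neither annotator has to serve as the reference. The function names and toy data are hypothetical and not taken from the paper, whose exact matching criteria are not given in this record.

```python
from itertools import combinations
from statistics import mean

# Hypothetical representation: a PHI instance is (start_token, end_token, label),
# with end_token exclusive. Exact match on all three fields is assumed.
Span = tuple[int, int, str]


def pairwise_f1(a: set, b: set) -> float:
    """F1 between two annotators: 2|A & B| / (|A| + |B|), symmetric in A and B."""
    if not a and not b:
        return 1.0  # both annotated nothing: treated as full agreement (assumption)
    return 2 * len(a & b) / (len(a) + len(b))


def to_tokens(spans: set[Span]) -> set[tuple[int, str]]:
    """Expand each span into (token_index, label) items for token-level agreement."""
    return {(i, label) for start, end, label in spans for i in range(start, end)}


def averaged_pairwise_f1(annotations: list[set[Span]], level: str = "instance") -> float:
    """Mean F1 over all annotator pairs, at the 'instance' or 'token' level."""
    if level == "token":
        annotations = [to_tokens(spans) for spans in annotations]
    return mean(pairwise_f1(a, b) for a, b in combinations(annotations, 2))


# Toy example: three annotators; the second labels one DATE span as LOCATION.
team = [
    {(0, 2, "NAME"), (5, 6, "DATE")},
    {(0, 2, "NAME"), (5, 6, "LOCATION")},
    {(0, 2, "NAME"), (5, 6, "DATE")},
]
print(round(averaged_pairwise_f1(team), 3))                 # 0.667
print(round(averaged_pairwise_f1(team, level="token"), 3))  # 0.778
```

In this toy case the token-level score is higher than the instance-level score because the multi-token NAME span contributes more matching items once it is expanded, which mirrors (without reproducing) the pattern of token-level agreement slightly exceeding instance-level agreement.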
Database: MEDLINE
Main subject: Electronic Health Records / Data Anonymization
Study type: Diagnostic studies / Guideline / Prognostic studies
Language: English
Journal: Stud Health Technol Inform
Journal subject: Medical Informatics / Health Services Research
Publication year: 2019
Document type: Article
Country of affiliation: Germany