Clustering Similar Diagnosis Terms.
Stud Health Technol Inform; 302: 837-838, 2023 May 18.
Article in En | MEDLINE | ID: mdl-37203513
A large clinical diagnosis list is explored with the goal of clustering syntactic variants. A string similarity heuristic is compared with a deep learning-based approach. Levenshtein distance (LD) applied to common words only (not tolerating deviations in acronyms and tokens with numerals), together with pair-wise substring expansions, raised F1 to 13% above the baseline (plain LD), with a maximum F1 of 0.71. In contrast, the model-based approach, trained on a German medical language model, did not perform better than the baseline and did not exceed an F1 value of 0.42.
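The restricted string heuristic described in the abstract can be illustrated with a minimal sketch. The abstract does not give the implementation, so everything below is an assumption made for illustration: the token-by-token alignment, the `is_strict` rule for spotting acronyms and numeral-bearing tokens, the `max_dist` threshold, and the function names are all hypothetical. The pair-wise substring expansion step is not covered.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def is_strict(token: str) -> bool:
    """Hypothetical rule: tokens containing a digit, or written entirely in
    upper case (treated here as acronyms), must match exactly."""
    has_digit = any(ch.isdigit() for ch in token)
    looks_like_acronym = len(token) >= 2 and token.isupper()
    return has_digit or looks_like_acronym


def similar_terms(t1: str, t2: str, max_dist: int = 1) -> bool:
    """Assumed comparison: align tokens by position; strict tokens must be
    identical, common words may differ by at most max_dist edits."""
    tok1, tok2 = t1.split(), t2.split()
    if len(tok1) != len(tok2):
        return False
    for a, b in zip(tok1, tok2):
        if is_strict(a) or is_strict(b):
            if a != b:
                return False
        elif levenshtein(a.lower(), b.lower()) > max_dist:
            return False
    return True
```

Under these assumptions, `similar_terms("Diabetes mellitus Typ 2", "Diabetis mellitus Typ 2")` accepts the spelling variant, while `similar_terms("Diabetes mellitus Typ 2", "Diabetes mellitus Typ 1")` rejects the pair because the numeral token differs. Plain LD would assign both pairs the same distance of 1, which is the kind of false merge the restriction is meant to avoid.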
Full text: 1
Collections: 01-internacional
Health context: 1_ASSA2030
Database: MEDLINE
Main subject: Natural Language Processing / Language
Study type: Diagnostic_studies / Prognostic_studies
Language: En
Journal: Stud Health Technol Inform
Publication year: 2023
Document type: Article