Term Candidate Generation to Enrich Clinical Terminologies with Large Language Models.
Stud Health Technol Inform; 316: 695-699, 2024 Aug 22.
Article | En | MEDLINE | ID: mdl-39176890
ABSTRACT
Annotated language resources derived from routine clinical documentation are an intriguing asset for secondary use. In this investigation, we report on how such a resource can be leveraged to identify additional term candidates for a chosen set of ICD-10 codes. We conducted a log-likelihood analysis of the co-occurrence of approximately 1.9 million de-identified ICD-10 codes with their corresponding brief textual entries from German-language problem lists. This analysis identified term candidates significantly associated with a code (p < 0.01), which in a second step served as seed terms for harvesting additional candidates by querying a large language model. The proposed approach identifies additional term candidates with suitable performance: MAP@5 = 0.801 for hypernyms, MAP@5 = 0.723 for synonyms, and MAP@5 = 0.507 for hyponyms. The re-use of existing annotated clinical datasets, in combination with large language models, presents an interesting strategy to bridge the lexical gap between standardized clinical terminologies and real-world jargon.
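The two quantitative ingredients named in the abstract, a log-likelihood test at p < 0.01 and MAP@5, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the log-likelihood analysis is a G² test on a 2x2 contingency table of code/term co-occurrence counts (with the chi-square critical value 6.635 for one degree of freedom), and that MAP@5 averages precision over the relevant items found in the top five candidates of each ranked list. All function names and example counts are hypothetical.

```python
import math

# Chi-square critical value for 1 degree of freedom at p = 0.01.
CHI2_CRIT_P01 = 6.635


def g2(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table.

    k11: entries with the code and the term
    k12: entries with the code, without the term
    k21: entries without the code, with the term
    k22: entries with neither
    """
    total = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))
    score = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = rows[i] * cols[j] / total
                score += o * math.log(o / expected)
    return 2.0 * score


def is_seed_term(k11, k12, k21, k22):
    """Assumed selection rule: significant and over-represented with the code."""
    total = k11 + k12 + k21 + k22
    expected_k11 = (k11 + k12) * (k11 + k21) / total
    return g2(k11, k12, k21, k22) >= CHI2_CRIT_P01 and k11 > expected_k11


def average_precision_at_k(relevance, k=5):
    """Average precision over the top-k items of one ranked, judged list."""
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance[:k], start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def map_at_k(judged_lists, k=5):
    """Mean of the per-list average precision values."""
    if not judged_lists:
        return 0.0
    return sum(average_precision_at_k(r, k) for r in judged_lists) / len(judged_lists)


if __name__ == "__main__":
    # Hypothetical counts: a term seen 50 times with a code, 5 times elsewhere.
    print(is_seed_term(50, 5, 5, 940))
    # Hypothetical expert judgments for two ranked candidate lists.
    print(map_at_k([[True, False, True, False, False], [True, True, False, False, False]]))
```

Other MAP@k variants normalize by the total number of relevant items or by k, so scores from this sketch are not directly comparable with the values reported above.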
Keywords
Full text:
1
Collection:
01-international
Database:
MEDLINE
Main subject:
Natural Language Processing
/
International Classification of Diseases
/
Vocabulary, Controlled
Limits:
Humans
Country/Region as subject:
Europe
Language:
En
Journal:
Stud Health Technol Inform
Journal subject:
MEDICAL INFORMATICS
/
HEALTH SERVICES RESEARCH
Year:
2024
Document type:
Article
Country of affiliation:
Austria
Country of publication:
Netherlands