Term Candidate Generation to Enrich Clinical Terminologies with Large Language Models.

Kugic, Amila; Schulz, Stefan; Kreuzthaler, Markus

Affiliation

Kugic A; Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Austria.
Schulz S; Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Austria.
Kreuzthaler M; Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Austria.

Stud Health Technol Inform ; 316: 695-699, 2024 Aug 22.

Article in English | MEDLINE | ID: mdl-39176890

ABSTRACT

Annotated language resources derived from clinical routine documentation form an intriguing asset for secondary use case scenarios. In this investigation, we report on how such a resource can be leveraged to identify additional term candidates for a chosen set of ICD-10 codes. We conducted a log-likelihood analysis of the co-occurrence of approximately 1.9 million de-identified ICD-10 codes with their corresponding brief textual entries from problem lists in German. This analysis aimed to identify potential candidates, with the statistical significance threshold set at p < 0.01. In a second step, these candidates were used as seed terms to harvest additional candidates by interfacing with a large language model. The proposed approach can identify additional term candidates at suitable performance values: hypernyms MAP@5 = 0.801, synonyms MAP@5 = 0.723, and hyponyms MAP@5 = 0.507. The re-use of existing annotated clinical datasets, in combination with large language models, presents an interesting strategy to bridge the lexical gap between standardized clinical terminologies and real-world jargon.
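The two quantitative ingredients named in the abstract, a log-likelihood test on code/term co-occurrence and MAP@5, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the log-likelihood analysis is Dunning's G² statistic on a 2x2 contingency table, compared against the chi-square critical value for one degree of freedom at p = 0.01 (about 6.635), and it assumes one common definition of average precision at k (normalized by the number of relevant items within the top k); the abstract does not specify either detail. All function names and the example counts are hypothetical.

```python
import math

# Chi-square critical value for 1 degree of freedom at p = 0.01.
CHI2_CRIT_P01 = 6.635


def g2(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table.

    k11: entries with the ICD-10 code and the term
    k12: entries with the code but without the term
    k21: entries without the code but with the term
    k22: entries with neither
    """
    total = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = rows[i] * cols[j] / total
                stat += o * math.log(o / expected)
    return 2.0 * stat


def is_seed_candidate(k11, k12, k21, k22):
    """Assumed selection rule: significant at p < 0.01 and positively associated."""
    total = k11 + k12 + k21 + k22
    expected_k11 = (k11 + k12) * (k11 + k21) / total
    return k11 > expected_k11 and g2(k11, k12, k21, k22) > CHI2_CRIT_P01


def average_precision_at_k(relevance, k=5):
    """AP@k for one ranked list of 0/1 relevance judgements (assumed definition)."""
    hits = 0
    score = 0.0
    for rank, rel in enumerate(relevance[:k], start=1):
        if rel:
            hits += 1
            score += hits / rank  # precision at this rank
    return score / hits if hits else 0.0


def map_at_k(runs, k=5):
    """Mean of AP@k over several ranked lists, e.g. one list per seed term."""
    return sum(average_precision_at_k(r, k) for r in runs) / len(runs)
```

In such a setup, `is_seed_candidate` would be evaluated once per code/term pair, and `map_at_k` would be computed separately for the hypernym, synonym, and hyponym suggestions returned by the language model, with each suggestion judged relevant or not by a human rater.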

Subject(s)

International Classification of Diseases; Natural Language Processing; Vocabulary, Controlled; Humans; Terminology as Topic; Electronic Health Records/classification; Germany

Keywords

Electronic Health Records; Large Language Models; Machine Learning; Natural Language Processing; Terminologies


Full text: 1 | Collection: 01-internacional | Database: MEDLINE | Main subject: Natural Language Processing / International Classification of Diseases / Vocabulary, Controlled | Limits: Humans | Country/Region as subject: Europe | Language: English | Journal: Stud Health Technol Inform | Journal subject: MEDICAL INFORMATICS / HEALTH SERVICES RESEARCH | Year: 2024 | Document type: Article | Affiliation country: Austria | Country of publication: Netherlands
