Search | BVS CLAP/SMR-OPS/OMS

Inferring cancer disease response from radiology reports using large language models with data augmentation and prompting.

Tan, Ryan Shea Ying Cong; Lin, Qian; Low, Guat Hwa; Lin, Ruixi; Goh, Tzer Chew; Chang, Christopher Chu En; Lee, Fung Fung; Chan, Wei Yin; Tan, Wei Chong; Tey, Han Jieh; Leong, Fun Loon; Tan, Hong Qi; Nei, Wen Long; Chay, Wen Yee; Tai, David Wai Meng; Lai, Gillianne Geet Yi; Cheng, Lionel Tim-Ee; Wong, Fuh Yong; Chua, Matthew Chin Heng; Chua, Melvin Lee Kiang; Tan, Daniel Shao Weng; Thng, Choon Hua; Tan, Iain Bee Huat; Ng, Hwee Tou.

J Am Med Inform Assoc; 30(10): 1657-1664, 2023 Sep 25.

Article in English | MEDLINE | ID: mdl-37451682

ABSTRACT

OBJECTIVE: To assess large language models on their ability to accurately infer cancer disease response from free-text radiology reports.

MATERIALS AND METHODS: We assembled 10,602 computed tomography reports from cancer patients seen at a single institution. Each report was classified into one of four categories: no evidence of disease, partial response, stable disease, or progressive disease. We applied transformer models, a bidirectional long short-term memory model, a convolutional neural network model, and conventional machine learning methods to this task. Data augmentation using sentence permutation with consistency loss, as well as prompt-based fine-tuning, was applied to the best-performing models. Models were validated on a hold-out test set and on an external validation set based on Response Evaluation Criteria in Solid Tumors (RECIST) classifications.

RESULTS: The best-performing model was the GatorTron transformer, which achieved an accuracy of 0.8916 on the test set and 0.8919 on the RECIST validation set. Data augmentation further improved accuracy to 0.8976. Prompt-based fine-tuning did not further improve accuracy, but it still achieved good performance with the training set reduced to 500 reports.

DISCUSSION: Researchers could use these models to derive progression-free survival in large datasets. The models may also serve as a decision support tool by giving clinicians an automated second opinion on disease response.

CONCLUSIONS: Large clinical language models show potential to infer cancer disease response from radiology reports at scale. Data augmentation techniques are useful for further improving performance. Prompt-based fine-tuning can significantly reduce the size of the training dataset.

Subject(s)

Neoplasms, Radiology, Humans, Machine Learning, Neural Networks, Computer, Neoplasms/diagnostic imaging, Research Report, Natural Language Processing
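The abstract names sentence permutation with a consistency loss but does not give its exact form. The Python sketch below shows one plausible reading under stated assumptions: sentences are split with a naive regular expression, the consistency term is assumed to be a symmetric KL divergence between the classifier's predictions on the original and the permuted report, and it is added to a cross-entropy loss with a hypothetical `weight`. The logit vectors stand in for a real model's output; none of the function names come from the paper.

```python
import math
import random
import re

# The four response categories named in the abstract.
LABELS = ["no evidence of disease", "partial response", "stable disease", "progressive disease"]


def split_sentences(report: str) -> list[str]:
    """Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    Real reports with abbreviations such as 'Dr.' would need a proper tokenizer."""
    parts = re.split(r"(?<=[.!?])\s+", report.strip())
    return [p for p in parts if p]


def permute_sentences(report: str, rng: random.Random) -> str:
    """Return the same report with its sentences shuffled into a random order."""
    sentences = split_sentences(report)
    rng.shuffle(sentences)
    return " ".join(sentences)


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: list[float], q: list[float], eps: float = 1e-12) -> float:
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def consistency_loss(logits_orig: list[float], logits_perm: list[float]) -> float:
    """Assumed form: symmetric KL between predictions on the original and permuted report."""
    p, q = softmax(logits_orig), softmax(logits_perm)
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


def cross_entropy(logits: list[float], label: int) -> float:
    return -math.log(softmax(logits)[label])


def total_loss(logits_orig: list[float], logits_perm: list[float],
               label: int, weight: float = 1.0) -> float:
    """Supervised loss on the original report plus a weighted consistency term.
    `weight` is a hypothetical hyperparameter, not a value reported in the paper."""
    return cross_entropy(logits_orig, label) + weight * consistency_loss(logits_orig, logits_perm)


if __name__ == "__main__":
    rng = random.Random(0)
    report = "Liver lesion is smaller. No new lesions. Lungs are clear."
    print(permute_sentences(report, rng))
    # Made-up logits; in practice both would come from the classifier.
    orig = [0.2, 2.1, 0.4, -1.0]
    perm = [0.1, 1.7, 0.9, -0.8]
    print(LABELS[orig.index(max(orig))], round(total_loss(orig, perm, label=1), 4))
```

The consistency term is zero when the model gives the same prediction regardless of sentence order, so minimizing it pushes the classifier toward order-invariant judgments of disease response.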

Domain adaptation for semantic role labeling in the biomedical domain.

Dahlmeier, Daniel; Ng, Hwee Tou.

Bioinformatics; 26(8): 1098-1104, 2010 Apr 15.

Article in English | MEDLINE | ID: mdl-20179074

ABSTRACT

MOTIVATION: Semantic role labeling (SRL) is a natural language processing (NLP) task that extracts a shallow meaning representation from free-text sentences. Several efforts to build SRL systems for the biomedical domain have been made in recent years. However, state-of-the-art SRL relies on manually annotated training instances, which are scarce and expensive to prepare. In this article, we treat SRL for the biomedical domain as a domain adaptation problem in order to leverage existing SRL resources from the newswire domain.

RESULTS: We evaluate the performance of three recently proposed domain adaptation algorithms for SRL. Our results show that domain adaptation can significantly reduce the cost of developing an SRL system for the biomedical domain. Using domain adaptation, our system can achieve 97% of the performance with as few as 60 annotated target-domain abstracts.

AVAILABILITY: Our BioKIT system, which performs SRL in the biomedical domain as described in this article, is implemented in Python and C and runs under the Linux operating system. BioKIT can be downloaded at http://nlp.comp.nus.edu.sg/software. The domain adaptation software is available for download at http://www.mysmu.edu/faculty/jingjiang/software/DALR.html. The BioProp corpus is available from the Linguistic Data Consortium at http://www.ldc.upenn.edu.

Subject(s)

Abstracting and Indexing/methods, Natural Language Processing, Semantics, Artificial Intelligence
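The abstract does not name the three domain adaptation algorithms it evaluates. To illustrate the general idea of reusing newswire SRL data for a biomedical target, the Python sketch below uses feature augmentation (Daumé III, 2007), a well-known technique that is not necessarily one of the algorithms in the paper. Each feature is copied into a shared version and a domain-specific version; the feature strings and PropBank-style role labels ("A0", "A1") are hypothetical examples.

```python
# Feature augmentation for domain adaptation: a swapped-in illustrative technique,
# not necessarily one of the algorithms evaluated in the paper.

Features = dict[str, float]


def augment_features(features: Features, domain: str) -> Features:
    """Copy every feature into a shared version and a domain-specific version."""
    augmented: Features = {}
    for name, value in features.items():
        augmented[f"shared:{name}"] = value
        augmented[f"{domain}:{name}"] = value
    return augmented


def build_training_set(source, target):
    """Pool labelled newswire (source) and biomedical (target) instances.
    Each instance is a (features, role_label) pair; the inputs are hypothetical."""
    data = [(augment_features(f, "source"), y) for f, y in source]
    data += [(augment_features(f, "target"), y) for f, y in target]
    return data


if __name__ == "__main__":
    newswire = [({"predicate=buy": 1.0, "position=before": 1.0}, "A0")]
    biomedical = [({"predicate=inhibit": 1.0, "position=after": 1.0}, "A1")]
    for feats, role in build_training_set(newswire, biomedical):
        print(role, sorted(feats))
```

A linear classifier trained on the pooled data can place weight on the `shared:` features where newswire and biomedical usage agree, and on the `source:` or `target:` copies where they differ, which is one way a small amount of annotated target-domain text can go further.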
