
Local Large Language Models for Complex Structured Tasks.

Bumgardner, V K Cody; Mullen, Aaron; Armstrong, Samuel E; Hickey, Caylin; Marek, Victor; Talbert, Jeff.

AMIA Jt Summits Transl Sci Proc ; 2024: 105-114, 2024.

Article in English | MEDLINE | ID: mdl-38827047

ABSTRACT

This paper introduces an approach that combines the language reasoning capabilities of large language models (LLMs) with the benefits of local training to tackle complex language tasks. The authors demonstrate the approach by extracting structured condition codes from pathology reports. The approach uses local, fine-tuned LLMs that respond to specific generative instructions and produce structured outputs. The study used over 150k uncurated surgical pathology reports containing gross descriptions, final diagnoses, and condition codes. Several model architectures were trained and evaluated, including LLaMA, BERT, and Longformer. The results show that LLaMA-based models significantly outperformed BERT-style models across all evaluated metrics. LLaMA models performed especially well with large datasets, demonstrating their ability to handle complex, multi-label tasks. Overall, this work presents an effective approach for using LLMs to perform structured generative tasks on domain-specific language in the medical domain.
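Because each report can carry several condition codes, scoring a generative model's structured output reduces to a multi-label comparison between predicted and reference code sets. The sketch below is illustrative only and is not the authors' pipeline: it assumes, hypothetically, that the model emits codes as a comma-separated string, and the names `parse_codes` and `micro_prf` and the codes `A01`, `B02`, and so on are invented placeholders. It computes micro-averaged precision, recall, and F1, one common choice for multi-label evaluation; the paper's exact metrics are not stated in this abstract.

```python
"""Hypothetical sketch: scoring structured LLM output as a multi-label task."""
from typing import Iterable


def parse_codes(generated: str) -> set[str]:
    """Split an assumed comma-separated code list into a normalized set."""
    return {code.strip().upper() for code in generated.split(",") if code.strip()}


def micro_prf(
    predicted: Iterable[set[str]], gold: Iterable[set[str]]
) -> tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 pooled over all reports."""
    tp = fp = fn = 0
    for pred, true in zip(predicted, gold):
        tp += len(pred & true)  # codes predicted and present in the reference
        fp += len(pred - true)  # codes predicted but not in the reference
        fn += len(true - pred)  # reference codes the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy generations and references with placeholder codes.
    preds = [parse_codes("A01, B02"), parse_codes("c03")]
    refs = [{"A01"}, {"C03", "D04"}]
    # Precision, recall, and F1 are each 2/3 for this toy input.
    print(micro_prf(preds, refs))
```

Micro-averaging pools counts across all reports, so frequent codes dominate the score; macro-averaging per code would weight rare codes more heavily.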

CLASSify: A Web-Based Tool for Machine Learning.

Mullen, Aaron D; Armstrong, Samuel E; Talbert, Jeff; Bumgardner, V K Cody.

AMIA Jt Summits Transl Sci Proc ; 2024: 364-373, 2024.

Article in English | MEDLINE | ID: mdl-38827105

ABSTRACT

Machine learning classification problems are widespread in bioinformatics, but the technical knowledge required for model training, optimization, and inference can keep researchers from using this technology. This article presents an automated tool for machine learning classification problems that simplifies training models and producing results while providing informative visualizations and insights into the data. The tool supports both binary and multiclass classification problems and provides access to a variety of models and methods. Synthetic data can be generated within the interface to fill missing values, balance class labels, or generate entirely new datasets. The tool also supports feature evaluation and generates explainability scores that indicate which features most influence the output. We present CLASSify, an open-source tool that simplifies solving classification problems without requiring machine learning expertise.
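To make "generating synthetic data to balance class labels" concrete, here is a minimal sketch of one generic technique. It is not CLASSify's implementation, which this abstract does not describe. The function `balance_by_interpolation` is hypothetical and assumes purely numeric feature rows. It tops up each minority class by interpolating between two randomly chosen rows of that class. This is a simplified idea loosely inspired by SMOTE, except that it picks a random same-class partner rather than a nearest neighbor.

```python
"""Hypothetical sketch: balancing class labels with interpolated synthetic rows."""
import random
from collections import Counter


def balance_by_interpolation(rows, labels, seed=0):
    """Return (rows, labels) with every class topped up to the majority count.

    Each synthetic row is a random point on the segment between two real rows
    of the same class (a simplified, SMOTE-like idea used only for illustration).
    Assumes every feature is numeric. The input lists are not modified.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    new_rows, new_labels = list(rows), list(labels)
    for cls, count in counts.items():
        members = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - count):
            a, b = rng.choice(members), rng.choice(members)
            t = rng.random()  # position along the segment from a to b
            new_rows.append([x + t * (z - x) for x, z in zip(a, b)])
            new_labels.append(cls)
    return new_rows, new_labels


if __name__ == "__main__":
    X = [[0.0], [10.0], [1.0], [2.0], [3.0], [4.0]]
    y = ["a", "a", "b", "b", "b", "b"]
    X_bal, y_bal = balance_by_interpolation(X, y)
    print(Counter(y_bal))  # each class now has 4 rows
```

Interpolated rows stay inside the range spanned by their class, so they add variety without inventing out-of-range values. A real tool would typically also handle categorical features and may use a dedicated synthetic-data generator instead.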
