Results 1 - 3 of 3
1.
J Med Internet Res ; 26: e54580, 2024 Mar 29.
Article in English | MEDLINE | ID: mdl-38551633

ABSTRACT

BACKGROUND: The study of disease progression relies on clinical data, including text data, and extracting valuable features from text has become a research focus. With the rise of large language models (LLMs), semantic extraction pipelines are gaining acceptance in clinical research. However, the security and feature hallucination issues of LLMs require further attention.

OBJECTIVE: This study aimed to introduce a novel modular LLM pipeline that semantically extracts features from textual patient admission records.

METHODS: The pipeline was designed to run a systematic succession of concept extraction, aggregation, question generation, corpus extraction, and question-and-answer scale extraction, and was tested with 2 low-parameter LLMs: Qwen-14B-Chat (QWEN) and Baichuan2-13B-Chat (BAICHUAN). A data set of 25,709 pregnancy cases from the People's Hospital of Guangxi Zhuang Autonomous Region, China, was used for evaluation, with annotations provided by local experts. The pipeline was evaluated on accuracy, precision, null ratio, and time consumption. Additionally, we evaluated its performance with a quantized version of Qwen-14B-Chat on a consumer-grade GPU.

RESULTS: The pipeline demonstrated a high level of precision in feature extraction, as evidenced by the accuracy and precision of Qwen-14B-Chat (95.52% and 92.93%, respectively) and Baichuan2-13B-Chat (95.86% and 90.08%, respectively). Furthermore, the pipeline exhibited low null ratios and variable time consumption. The INT4-quantized version of QWEN delivered enhanced performance, with 97.28% accuracy and a 0% null ratio.

CONCLUSIONS: The pipeline performed consistently across different LLMs and efficiently extracted clinical features from textual data, including on consumer-grade hardware. This approach offers a viable and effective solution for mining clinical research data from textual records.
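
The staged design in METHODS maps naturally onto a chain of LLM calls, one per stage. The sketch below is a minimal illustration of that idea, assuming a placeholder ask_llm function that the reader would wire to a locally hosted Qwen-14B-Chat or Baichuan2-13B-Chat; the prompt wording and the extract_features helper are illustrative assumptions, not the authors' implementation.

    # Hypothetical sketch of the five-stage extraction pipeline described above.
    # ask_llm is a placeholder; connect it to a local chat LLM (e.g., Qwen-14B-Chat).
    def ask_llm(prompt: str) -> str:
        """Placeholder for a call to a locally hosted chat LLM."""
        raise NotImplementedError("Wire this to your local LLM endpoint.")

    def extract_features(admission_record: str, target_concepts: list[str]) -> dict:
        # 1. Concept extraction: which target concepts does the record mention?
        mentioned = ask_llm(
            f"List which of these concepts appear in the record: {target_concepts}\n"
            f"Record: {admission_record}"
        )
        # 2. Aggregation: merge synonymous mentions into canonical concepts.
        aggregated = ask_llm(f"Merge synonymous mentions into canonical concepts: {mentioned}")
        # 3. Question generation: turn each concept into a closed question.
        questions = ask_llm(f"Write one closed question per concept: {aggregated}")
        # 4. Corpus extraction: quote the record sentences relevant to each question.
        corpus = ask_llm(
            f"Quote the sentences of the record relevant to each question:\n"
            f"{questions}\nRecord: {admission_record}"
        )
        # 5. Question-and-answer scale extraction: answer strictly from the quoted text.
        answers = ask_llm(
            f"Answer each question using only the quoted sentences; "
            f"reply 'unknown' if the information is absent:\n{corpus}"
        )
        return {"questions": questions, "answers": answers}

Constraining each answer to the quoted sentences is one way to address the feature hallucination concern raised in the BACKGROUND: every stage restricts what the next stage may draw on.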


Subject(s)
Data Mining, Electronic Health Records, Humans, Data Mining/methods, Natural Language Processing, China, Language
2.
JMIR Form Res ; 8: e53216, 2024 Feb 08.
Article in English | MEDLINE | ID: mdl-38329787

ABSTRACT

BACKGROUND: The large volumes of electronic medical records (EMRs) accumulated through medical informatization hold significant research value, particularly in obstetrics. Standardizing diagnoses across different health care institutions and regions is vital for medical data analysis. Large language models (LLMs) have been used extensively for various medical tasks, and prompt engineering is key to using them effectively.

OBJECTIVE: This study aims to evaluate and compare the performance of LLMs with various prompt engineering techniques on the task of standardizing obstetric diagnostic terminology using real-world obstetric data.

METHODS: The paper describes a 4-step approach for mapping diagnoses in electronic medical records to the International Classification of Diseases, 10th revision, observation domain. First, similarity measures were used to map the diagnoses. Second, candidate mapping terms with similarity scores above a threshold were collected to form the training data set. Third, two LLMs (ChatGLM2 and Qwen-14B-Chat [QWEN]) were used in a zero-shot setting to generate optimal mapping terms. Finally, performance was compared against 3 pretrained bidirectional encoder representations from transformers (BERT) models, including BERT, whole word masking BERT, and momentum contrastive learning with BERT (MC-BERT), for unsupervised optimal mapping term generation.

RESULTS: LLMs and BERT demonstrated comparable performance at their respective optimal levels, with the LLMs showing clear advantages in performance and efficiency in unsupervised settings. Notably, LLM performance varied significantly across prompt engineering setups: applying the self-consistency approach with QWEN improved the F1-score by 5% and precision by 7.9% over the zero-shot method, and ChatGLM2 delivered similar rates of accurately generated responses. The BERT series served as the comparison models, with MC-BERT achieving the highest performance among the 3; however, the differences among the BERT variants in this study were relatively small.

CONCLUSIONS: After applying LLMs to standardize diagnoses and designing 4 different prompts, we compared the results to those generated by the BERT models. Our findings indicate that the QWEN prompts largely outperformed the other prompts, with precision comparable to that of the BERT models. These results demonstrate the potential of unsupervised approaches for improving the efficiency of aligning diagnostic terms in daily research and for uncovering hidden information value in patient data.
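
The first 3 steps above (similarity scoring, thresholded candidate collection, and zero-shot generation of the mapping term) can be sketched as follows. This is a minimal illustration assuming difflib's string-similarity ratio, an arbitrary 0.6 threshold, and freely worded prompt text; none of these reflect the study's exact configuration, and the resulting prompt would then be sent to ChatGLM2 or QWEN.

    # Hypothetical sketch: retrieve candidate ICD-10 terms by string similarity,
    # then build a zero-shot prompt asking the LLM for the standardized term.
    from difflib import SequenceMatcher

    def candidate_terms(raw_diagnosis: str, icd10_terms: list[str],
                        threshold: float = 0.6) -> list[str]:
        """Steps 1-2: score every ICD-10 term and keep those above the threshold."""
        scored = [(SequenceMatcher(None, raw_diagnosis, term).ratio(), term)
                  for term in icd10_terms]
        return [term for score, term in sorted(scored, reverse=True) if score >= threshold]

    def zero_shot_prompt(raw_diagnosis: str, candidates: list[str]) -> str:
        """Step 3: zero-shot prompt for the LLM to pick the standardized term."""
        return ("You are standardizing obstetric diagnoses to ICD-10 terms.\n"
                f"Raw diagnosis: {raw_diagnosis}\n"
                f"Candidate standard terms: {'; '.join(candidates)}\n"
                "Return the single best-matching standard term.")

A prompt variant such as the self-consistency setup mentioned in the RESULTS would sample the model several times with this prompt and keep the majority answer.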

3.
Hypertens Res ; 47(4): 1051-1062, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38326453

ABSTRACT

To provide a reliable, low-cost screening model for preeclampsia, this study developed an early screening model on a retrospective cohort (25,709 pregnancies) and validated it on an independent validation cohort (1760 pregnancies). A data augmentation method (α-inverse weighted-GMM + RUS) was applied to the retrospective cohort, 10 machine learning models were then trained on the augmented data, and the optimal model was selected by sensitivity at a 10% false positive rate. The AdaBoost model, using 16 predictors, was chosen as the final model; it achieved performance above the acceptable threshold, with an area under the receiver operating characteristic curve (AUC) of 0.8008 and a sensitivity of 0.5190. All predictors were derived from clinical characteristics, some of which had not previously been reported (such as nausea and vomiting in pregnancy and menstrual cycle irregularity). Compared with previous studies, our model demonstrated superior performance: at least a 50% improvement in sensitivity over checklist-based approaches and at least a 28% increase over multivariable models that used only maternal predictors. We validated an effective approach to early preeclampsia screening that incorporates zero-cost predictors and outperforms similar studies. We believe that applying this approach, in combination with high-performance screening methods, could substantially increase screening participation among pregnant women. In summary, a machine learning model for early preeclampsia screening, using 16 zero-cost predictors derived from clinical characteristics, was built on a 10-year Chinese cohort; it outperforms similar work by at least 28% and was validated on an independent cohort.
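
The modelling and selection step (rebalancing, training, and reading sensitivity off the ROC curve at a 10% false positive rate) can be illustrated with scikit-learn and imbalanced-learn. The sketch below substitutes plain random undersampling for the α-inverse weighted-GMM augmentation and uses synthetic data in place of the cohort's 16 clinical predictors, so it shows the evaluation mechanics only, not the published model.

    # Hypothetical sketch: undersample the majority class, fit AdaBoost, and report
    # AUC plus sensitivity at a fixed 10% false positive rate on a held-out set.
    import numpy as np
    from imblearn.under_sampling import RandomUnderSampler
    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.metrics import roc_auc_score, roc_curve
    from sklearn.model_selection import train_test_split

    # Imbalanced synthetic stand-in: ~3% positives, 16 numeric features.
    X, y = make_classification(n_samples=20000, n_features=16,
                               weights=[0.97, 0.03], random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

    # Rebalance the training set only (RUS), then fit the classifier.
    X_rus, y_rus = RandomUnderSampler(random_state=0).fit_resample(X_train, y_train)
    model = AdaBoostClassifier(random_state=0).fit(X_rus, y_rus)

    # Sensitivity at a 10% false positive rate, plus overall AUC.
    scores = model.predict_proba(X_test)[:, 1]
    fpr, tpr, _ = roc_curve(y_test, scores)
    sensitivity_at_10_fpr = np.interp(0.10, fpr, tpr)
    print(f"AUC = {roc_auc_score(y_test, scores):.4f}, "
          f"sensitivity at 10% FPR = {sensitivity_at_10_fpr:.4f}")

Repeating this loop for each of the 10 candidate learners and keeping the one with the highest sensitivity at the fixed false positive rate mirrors the selection rule described above.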


Subject(s)
Pre-Eclampsia, Pregnancy, Female, Humans, Pre-Eclampsia/diagnosis, Pregnancy Trimester, First, Retrospective Studies, Risk Assessment/methods, Prospective Studies, Biomarkers