Results 1 - 3 of 3
1.
J Med Internet Res ; 26: e60601, 2024 Oct 03.
Article in English | MEDLINE | ID: mdl-39361955

ABSTRACT

BACKGROUND: Medical texts present significant domain-specific challenges, and manually curating them is time-consuming and labor-intensive. To address this, natural language processing (NLP) algorithms have been developed to automate text processing. Various biomedical text-processing toolkits exist and have greatly improved the efficiency of handling unstructured text, but each emphasizes a different perspective and none offers generation capabilities, leaving a significant gap in the current offerings.

OBJECTIVE: This study aims to describe the development and preliminary evaluation of Ascle, an easy-to-use, all-in-one toolkit for biomedical researchers and clinical staff that requires minimal programming expertise. For the first time, Ascle provides 4 advanced and challenging generative functions: question answering, text summarization, text simplification, and machine translation. In addition, Ascle integrates 12 essential NLP functions, along with query and search capabilities for clinical databases.

METHODS: We fine-tuned 32 domain-specific language models and evaluated them thoroughly on 27 established benchmarks. For the question-answering task, we developed a retrieval-augmented generation (RAG) framework for large language models that incorporates a medical knowledge graph with ranking techniques to enhance the reliability of generated answers. We also conducted a physician validation to assess the quality of generated content beyond automated metrics.

RESULTS: The fine-tuned models and the RAG framework consistently enhanced text generation tasks. For example, fine-tuning improved machine translation by 20.27 BLEU points, and the RAG framework raised the ROUGE-L score for question answering by 18% over the vanilla models. Physician validation of generated answers showed high scores for readability (4.95/5) and relevancy (4.43/5), with lower scores for accuracy (3.90/5) and completeness (3.31/5).

CONCLUSIONS: This study introduces the development and evaluation of Ascle, a user-friendly NLP toolkit designed for medical text generation. All code is publicly available through the Ascle GitHub repository, and all fine-tuned language models can be accessed through Hugging Face.


Subjects
Natural Language Processing, Humans, Algorithms, Software
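The RAG framework in this entry pairs a retriever with a generative model so that answers are grounded in retrieved evidence. The Python sketch below is a hedged illustration of that retrieve-then-prompt pattern only, not the Ascle implementation (which uses a medical knowledge graph and ranking techniques); the toy corpus, the TF-IDF retriever, and the answer() stub are all assumptions for illustration.

```python
# Minimal retrieve-then-prompt sketch of retrieval-augmented generation (RAG).
# Hypothetical throughout: the corpus, top_k, and the placeholder answer().
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "Heart failure with reduced ejection fraction is defined by LVEF below 40%.",
    "Beta-blockers reduce mortality in chronic heart failure.",
    "BLEU measures n-gram overlap between candidate and reference translations.",
]

def retrieve(query: str, top_k: int = 2) -> list[str]:
    """Rank corpus passages by TF-IDF cosine similarity to the query."""
    vectorizer = TfidfVectorizer()
    doc_matrix = vectorizer.fit_transform(corpus + [query])
    scores = cosine_similarity(doc_matrix[-1], doc_matrix[:-1]).ravel()
    ranked = scores.argsort()[::-1][:top_k]
    return [corpus[i] for i in ranked]

def answer(query: str) -> str:
    """Assemble a grounded prompt; a real system would pass this to an LLM."""
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(answer("What ejection fraction defines HFrEF?"))
```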
2.
Commun Biol ; 7(1): 217, 2024 Feb 21.
Article in English | MEDLINE | ID: mdl-38383808

ABSTRACT

Associations between datasets can be discovered through multivariate methods like Canonical Correlation Analysis (CCA) or Partial Least Squares (PLS). A requisite property for the interpretability and generalizability of CCA/PLS associations is the stability of their feature patterns, yet empirical characterizations suggest that this stability is questionable in high-dimensional datasets. To study these issues systematically, we developed a generative modeling framework for simulating synthetic datasets. We found that when the sample size is relatively small, but comparable to typical studies, CCA/PLS associations are highly unstable and inaccurate, both in their magnitude and, importantly, in the feature pattern underlying the association. We confirmed these trends across two neuroimaging modalities and in independent datasets with n ≈ 1000 and n = 20,000, and found that only the latter contained sufficient observations for stable mappings between imaging-derived and behavioral features. We further developed a power calculator that provides the sample sizes required for stability and reliability of multivariate analyses. Collectively, we characterize how to limit the detrimental effects of overfitting on CCA/PLS stability, and provide recommendations for future studies.


Subjects
Algorithms, Canonical Correlation Analysis, Least-Squares Analysis, Reproducibility of Results, Brain/diagnostic imaging
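To make the feature-pattern stability question concrete, here is a minimal Python sketch, not the authors' generative framework: it simulates two weakly linked datasets, fits CCA on each half of the sample, and compares the first canonical weight patterns. The sample size, dimensionalities, and signal strength are arbitrary choices for illustration.

```python
# Split-half stability of CCA weights on synthetic data; with small n,
# the two half-sample weight patterns often disagree badly.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n, p, q = 200, 50, 50            # samples, features in X, features in Y

# One shared latent variable weakly links X and Y.
z = rng.standard_normal(n)
X = 0.3 * np.outer(z, rng.standard_normal(p)) + rng.standard_normal((n, p))
Y = 0.3 * np.outer(z, rng.standard_normal(q)) + rng.standard_normal((n, q))

def first_weights(Xs, Ys):
    """Fit a one-component CCA and return the first X weight vector."""
    return CCA(n_components=1).fit(Xs, Ys).x_weights_.ravel()

half = n // 2
w1 = first_weights(X[:half], Y[:half])
w2 = first_weights(X[half:], Y[half:])

# Absolute cosine similarity of the two half-sample weight patterns;
# values near 1 indicate a stable feature pattern, near 0 instability.
stability = abs(w1 @ w2) / (np.linalg.norm(w1) * np.linalg.norm(w2))
print(f"split-half weight similarity: {stability:.2f}")
```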
3.
medRxiv ; 2023 Sep 11.
Article in English | MEDLINE | ID: mdl-37745445

ABSTRACT

Background: The lack of automated tools for measuring care quality has limited the implementation of a national program to assess and improve guideline-directed care in heart failure with reduced ejection fraction (HFrEF). A key challenge in constructing such a tool has been an accurate, accessible approach for identifying patients with HFrEF at hospital discharge, an opportunity to evaluate and improve the quality of care.

Methods: We developed a novel deep learning-based language model for identifying patients with HFrEF from discharge summaries using a semi-supervised learning framework. For this purpose, hospitalizations with heart failure at Yale New Haven Hospital (YNHH) between 2015 and 2019 were labeled as HFrEF if the left ventricular ejection fraction was under 40% on antecedent echocardiography. The model was internally validated with model-based net reclassification improvement (NRI) assessed against chart-based diagnosis codes. We externally validated the model on discharge summaries from hospitalizations with heart failure at Northwestern Medicine, at community hospitals of Yale New Haven Health in Connecticut and Rhode Island, and in the publicly accessible MIMIC-III database, confirmed with chart abstraction.

Results: A total of 13,251 notes from 5,392 unique individuals (mean age 73 ± 14 years, 48% female), including 2,487 patients with HFrEF (46.1%), were used for model development (train/held-out test split: 70%/30%). The deep learning model achieved an area under the receiver operating characteristic curve (AUROC) of 0.97 and an area under the precision-recall curve (AUPRC) of 0.97 in detecting HFrEF on the held-out set. In external validation, the model had high performance in identifying HFrEF from discharge summaries, with an AUROC of 0.94 and an AUPRC of 0.91 on 19,242 notes from Northwestern Medicine, an AUROC of 0.95 and an AUPRC of 0.96 on 139 manually abstracted notes from Yale community hospitals, and an AUROC of 0.91 and an AUPRC of 0.92 on 146 manually reviewed notes from MIMIC-III. Model-based prediction of HFrEF corresponded to an overall NRI of 60.2 ± 1.9% compared with chart diagnosis codes (p < 0.001) and an increase in AUROC from 0.61 [95% CI: 0.60-0.63] to 0.91 [95% CI: 0.90-0.92].

Conclusions: We developed and externally validated a deep learning language model that automatically identifies HFrEF from clinical notes with high precision and accuracy, representing a key element in automating quality assessment and improvement for individuals with HFrEF.
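The Methods describe a simple weak-labeling rule: a hospitalization counts as HFrEF when the antecedent echocardiogram shows LVEF under 40%. The snippet below sketches that rule as a labeling function over report text; it is a hypothetical illustration, not the study's pipeline. The regex and the example strings are invented, and real echocardiography reports would need far more robust extraction (or structured echo fields, as the study likely used).

```python
# Hedged sketch of the LVEF < 40% weak-labeling rule; pattern and examples
# are hypothetical, not taken from the study.
import re

LVEF_PATTERN = re.compile(
    r"(?:LVEF|ejection fraction)[^\d]{0,20}(\d{1,2})\s*%", re.IGNORECASE
)

def label_hfref(echo_report: str) -> bool | None:
    """True if LVEF < 40%, False otherwise, None if no LVEF is found."""
    match = LVEF_PATTERN.search(echo_report)
    if match is None:
        return None
    return int(match.group(1)) < 40

print(label_hfref("Echo today: LVEF estimated at 35%."))  # True -> HFrEF
print(label_hfref("Ejection fraction 55%, normal LV."))   # False
```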
