Results 1 - 6 of 6
1.
medRxiv ; 2024 May 06.
Article in English | MEDLINE | ID: mdl-38633810

ABSTRACT

Background: Large language models (LLMs) have shown promising performance in various healthcare domains, but their effectiveness in identifying specific clinical conditions in real medical records is less explored. This study evaluates LLMs for detecting signs of cognitive decline in real electronic health record (EHR) clinical notes, comparing their error profiles with those of traditional models. The insights gained will inform strategies for performance enhancement.

Methods: This study, conducted at Mass General Brigham in Boston, MA, analyzed clinical notes from the four years prior to a 2019 diagnosis of mild cognitive impairment in patients aged 50 and older. For model development, we used a randomly selected, annotated sample of 4,949 note sections filtered with keywords related to cognitive functions. For testing, a randomly selected, annotated sample of 1,996 note sections without keyword filtering was used. We developed prompts for two LLMs, Llama 2 and GPT-4, on HIPAA-compliant cloud-computing platforms, using multiple approaches (e.g., hard and soft prompting and error analysis-based instructions) to select the optimal LLM-based method. Baseline models included a hierarchical attention-based neural network and XGBoost. We then constructed an ensemble of the three models using a majority-vote approach.

Results: GPT-4 demonstrated superior accuracy and efficiency compared with Llama 2 but did not outperform the traditional models. The ensemble outperformed the individual models, achieving a precision of 90.3%, a recall of 94.2%, and an F1-score of 92.2%. Notably, the ensemble improved precision from the 70%-79% range of the individual models to above 90%. Error analysis revealed that 63 samples were incorrectly predicted by at least one model; however, only 2 cases (3.2%) were errors shared by all models, indicating diverse error profiles among them.

Conclusions: LLMs and traditional machine learning models trained on local EHR data exhibited diverse error profiles. An ensemble of these models was complementary and enhanced diagnostic performance. Future research should investigate integrating LLMs with smaller, localized models and incorporating medical data and domain knowledge to enhance performance on specific tasks.
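To make the majority-vote ensembling described above concrete, here is a minimal Python sketch; the per-model predictions and the vote logic are illustrative assumptions, not the study's implementation.

```python
# Hypothetical sketch of a majority-vote ensemble over three binary classifiers.
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by at least two of the three models."""
    return Counter(predictions).most_common(1)[0][0]

# Example: per-note-section labels from an LLM, a hierarchical attention
# network, and XGBoost (1 = cognitive decline present). Values are made up.
per_model_preds = [
    (1, 1, 0),  # two of three agree -> 1
    (0, 1, 0),  # two of three agree -> 0
    (1, 1, 1),
]

ensemble_labels = [majority_vote(p) for p in per_model_preds]
print(ensemble_labels)  # [1, 0, 1]
```

With three models, a simple majority requires agreement from at least two, which is why the diverse error profiles reported above (few shared errors) translate directly into higher ensemble precision.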

2.
Appl Clin Inform ; 2024 Apr 18.
Article in English | MEDLINE | ID: mdl-38636542

ABSTRACT

OBJECTIVE: To assess primary care physicians' (PCPs') perception of the need for serious illness conversations (SIC) or other palliative care interventions in patients flagged by a machine learning tool as having high one-year mortality risk.

MATERIALS AND METHODS: We surveyed PCPs from four Brigham and Women's Hospital primary care practice sites. Multiple mortality prediction algorithms were ensembled to assess the adult patients of these PCPs who were either enrolled in the hospital's integrated care management program or had one of several chronic conditions. Patients were classified as at high or low risk of one-year mortality. In a blinded survey, PCPs evaluated these patients for palliative care needs, and we measured agreement between the PCPs and the machine learning tool regarding patients' need for an SIC (i.e., elevated mortality risk).

RESULTS: Of 66 PCPs, 20 (30.3%) participated in the survey. Of the 312 patients evaluated, 60.6% were female, with a mean (SD) age of 69.3 (17.5) years and a mean (SD) Charlson comorbidity index of 2.80 (2.89). The machine learning tool identified 162 (51.9%) patients as high risk. After excluding deceased patients and those unfamiliar to the PCPs, PCPs felt that an SIC was appropriate for 179 patients; the tool flagged 123 of these as high risk (68.7% concordance). Of the 105 patients for whom PCPs deemed an SIC unnecessary, the tool classified 83 as low risk (79.1% concordance). Agreement between PCPs and the tool was substantial (Gwet's agreement coefficient, 0.640).

CONCLUSIONS AND RELEVANCE: A machine learning mortality prediction tool offers promise as a clinical decision aid, helping clinicians pinpoint patients who may need palliative care interventions.
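As a rough illustration of the agreement statistics reported above, the sketch below computes raw concordance and Gwet's first-order agreement coefficient (AC1) for two binary raters; the patient labels are made up, and the implementation is a standard textbook formulation rather than the study's analysis code.

```python
# Illustrative sketch: percent concordance and Gwet's AC1 between clinician
# judgments and a binary risk flag. Inputs are hypothetical.

def gwet_ac1(rater_a, rater_b):
    """Gwet's first-order agreement coefficient for two raters."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    categories = set(rater_a) | set(rater_b)
    k = len(categories)
    chance = 0.0
    for c in categories:
        # Category prevalence averaged over both raters.
        pi = (rater_a.count(c) + rater_b.count(c)) / (2 * n)
        chance += pi * (1 - pi)
    chance /= (k - 1)
    return (observed - chance) / (1 - chance)

# 1 = PCP felt an SIC was appropriate / tool flagged high risk, 0 = otherwise.
pcp = [1, 1, 0, 1, 0, 0, 1, 0]
tool = [1, 0, 0, 1, 0, 1, 1, 0]
print(f"concordance = {sum(a == b for a, b in zip(pcp, tool)) / len(pcp):.2f}")
print(f"Gwet AC1 = {gwet_ac1(pcp, tool):.3f}")
```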

3.
J Med Internet Res ; 26: e47739, 2024 Feb 13.
Article in English | MEDLINE | ID: mdl-38349732

ABSTRACT

BACKGROUND: Assessment of activities of daily living (ADLs) and instrumental ADLs (iADLs) is key to determining the severity of dementia and the care needs of older adults. However, such information is often documented only in free-text clinical notes within the electronic health record and can be challenging to find.

OBJECTIVE: This study aims to develop and validate machine learning models that determine the status of ADL and iADL impairments from clinical notes.

METHODS: This cross-sectional study leveraged electronic health record clinical notes from Mass General Brigham's Research Patient Data Repository, linked with Medicare fee-for-service claims data from 2007 to 2017, to identify individuals aged 65 years or older with at least 1 diagnosis of dementia. Notes from encounters in the 180 days before and after the first date of dementia diagnosis were randomly sampled. Models were trained and validated using note sentences filtered by expert-curated keywords (filtered cohort) and further evaluated using unfiltered sentences (unfiltered cohort). Model performance was compared using the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC).

RESULTS: The study included 10,000 keyword-filtered sentences representing 441 people (n=283, 64.2% women; mean age 82.7, SD 7.9 years) and 1000 unfiltered sentences representing 80 people (n=56, 70% women; mean age 82.8, SD 7.5 years). AUROC was high (>0.97) for the best-performing ADL and iADL models on both cohorts. For ADL impairment identification, the random forest model achieved the best AUPRC (0.89, 95% CI 0.86-0.91) on the filtered cohort, and the support vector machine model achieved the highest AUPRC (0.82, 95% CI 0.75-0.89) on the unfiltered cohort. For iADL impairment, the Bio+Clinical bidirectional encoder representations from transformers (BERT) model had the highest AUPRC (filtered: 0.76, 95% CI 0.68-0.82; unfiltered: 0.58, 95% CI 0.001-1.0). Compared with a keyword-search approach on the unfiltered cohort, machine learning reduced false-positive rates from 4.5% to 0.2% for ADL and from 1.8% to 0.1% for iADL.

CONCLUSIONS: In this study, we demonstrated the ability of machine learning models to accurately identify ADL and iADL impairment from free-text clinical notes, which could be useful in determining the severity of dementia.


Subjects
Dementia; Natural Language Processing; United States; Humans; Aged; Female; Aged, 80 and over; Male; Cross-Sectional Studies; Activities of Daily Living; Functional Status; Medicare
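A minimal sketch of the kind of sentence-level classification and AUROC/AUPRC evaluation described in this record, using TF-IDF features and a random forest as a stand-in; the sentences, labels, and in-sample evaluation are illustrative assumptions rather than the authors' pipeline.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import roc_auc_score, average_precision_score

# Tiny illustrative corpus; 1 = ADL impairment documented in the sentence.
sentences = [
    "Patient needs assistance with bathing and dressing.",
    "He is independent in all activities of daily living.",
    "Requires help transferring from bed to chair.",
    "Ambulates without difficulty, no functional concerns.",
]
labels = [1, 0, 1, 0]

features = TfidfVectorizer().fit_transform(sentences)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(features, labels)
scores = model.predict_proba(features)[:, 1]

# The paper reports AUROC and AUPRC on held-out filtered and unfiltered cohorts;
# here the training set is scored only to keep the sketch self-contained.
print("AUROC:", roc_auc_score(labels, scores))
print("AUPRC:", average_precision_score(labels, scores))
```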
4.
AMIA Annu Symp Proc ; 2023: 339-348, 2023.
Article in English | MEDLINE | ID: mdl-38222335

ABSTRACT

Venous thromboembolism (VTE) is a serious, preventable public health problem that requires timely treatment. Because its signs and symptoms are nonspecific, patients often present to primary care providers with VTE symptoms prior to diagnosis. There are currently no federal measurement tools in place to track delayed diagnosis of VTE. We developed and tested an electronic clinical quality measure (eCQM) to quantify Diagnostic Delay of Venous Thromboembolism (DOVE): the rate of avoidable delayed VTE events among patients with a VTE who had reported VTE symptoms in primary care within 30 days of diagnosis. DOVE uses routinely collected EHR data without contributing to documentation burden and was tested in two geographically distant healthcare systems. Overall DOVE rates were 72.60% (site 1) and 77.14% (site 2). This novel, data-driven eCQM could inform healthcare providers and facilities about opportunities to improve care, strengthen incentives for quality improvement, and ultimately improve patient safety.


Subjects
Venous Thromboembolism; Humans; Venous Thromboembolism/diagnosis; Venous Thromboembolism/drug therapy; Delayed Diagnosis; Quality Indicators, Health Care; Quality Improvement; Primary Health Care; Anticoagulants/therapeutic use
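The sketch below illustrates one way the DOVE logic described above could be computed from structured encounter data: among patients with a VTE diagnosis, count those with a VTE-related symptom documented in primary care within the preceding 30 days. The field names and toy cohort are assumptions, not the published eCQM specification.

```python
from datetime import date, timedelta

# Hypothetical per-patient records; dates are illustrative only.
patients = [
    {"vte_diagnosis": date(2023, 5, 10),
     "primary_care_symptom_dates": [date(2023, 4, 25)]},  # symptoms 15 days before diagnosis
    {"vte_diagnosis": date(2023, 6, 1),
     "primary_care_symptom_dates": []},                    # no prior symptom visit
]

def dove_numerator(patient, window_days=30):
    """True if a VTE symptom was documented in primary care within the window before diagnosis."""
    start = patient["vte_diagnosis"] - timedelta(days=window_days)
    return any(start <= d <= patient["vte_diagnosis"]
               for d in patient["primary_care_symptom_dates"])

denominator = len(patients)
numerator = sum(dove_numerator(p) for p in patients)
print(f"DOVE rate: {numerator / denominator:.1%}")  # 50.0% for this toy cohort
```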
5.
Stud Health Technol Inform ; 290: 433-437, 2022 Jun 06.
Article in English | MEDLINE | ID: mdl-35673051

ABSTRACT

Cancer screening and timely follow-up of abnormal results can reduce mortality. One barrier to follow-up is failure to identify abnormal results. While EHRs contain coded results for certain tests, cancer screening results are often stored in free-text reports, which limits the capabilities for automated decision support. As part of the multilevel Follow-up of Cancer Screening (mFOCUS) trial, we developed and implemented a natural language processing (NLP) tool to assist with real-time detection of abnormal cancer screening test results (including mammograms, low-dose chest CT scans, and Pap smears) and identification of gynecological follow-up for higher-risk abnormalities (i.e., colposcopy) from free-text reports. We demonstrate the integration and implementation of NLP within the mFOCUS system to improve the follow-up of abnormal cancer screening results in a large integrated healthcare system. The NLP pipelines have detected scenarios in which guideline-recommended care was not delivered, in part because providers misidentified the text-based result reports.


Subjects
Natural Language Processing; Uterine Cervical Neoplasms; Early Detection of Cancer/methods; Female; Follow-Up Studies; Humans; Lung; Uterine Cervical Neoplasms/diagnosis
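As a hedged illustration of flagging abnormal screening results from free-text reports, the sketch below applies a few regular-expression rules to report impressions; the patterns are toy examples and are not the mFOCUS NLP pipeline, which the abstract does not describe in detail.

```python
import re

# Toy patterns for higher-risk findings in screening reports (assumptions,
# not a validated rule set).
ABNORMAL_PATTERNS = [
    r"\bBI-RADS\s*(4|5)\b",      # suspicious mammogram categories
    r"\bLung-RADS\s*4",          # suspicious low-dose chest CT findings
    r"\bHSIL\b|\bASC-H\b",       # higher-risk Pap smear results
]

def flag_abnormal(report_text: str) -> bool:
    """Return True if any abnormal-result pattern appears in the report text."""
    return any(re.search(p, report_text, flags=re.IGNORECASE)
               for p in ABNORMAL_PATTERNS)

print(flag_abnormal("IMPRESSION: BI-RADS 4 - suspicious abnormality, biopsy recommended."))  # True
print(flag_abnormal("IMPRESSION: Negative screening mammogram, BI-RADS 1."))                 # False
```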
6.
JAMA Netw Open ; 4(11): e2135174, 2021 11 01.
Article in English | MEDLINE | ID: mdl-34792589

ABSTRACT

Importance: Detecting cognitive decline earlier among older adults can facilitate enrollment in clinical trials and early interventions. Clinical notes in longitudinal electronic health records (EHRs) provide opportunities to detect cognitive decline earlier than it appears as a formal diagnosis in structured EHR fields.

Objective: To develop and validate a deep learning model to detect evidence of cognitive decline from clinical notes in the EHR.

Design, Setting, and Participants: Notes documented in the 4 years preceding the initial mild cognitive impairment (MCI) diagnosis were extracted from Mass General Brigham's Enterprise Data Warehouse for patients aged 50 years or older with an initial MCI diagnosis during 2019. The study was conducted from March 1, 2020, to June 30, 2021. Note sections were manually labeled for cognitive decline, and 2 reference data sets were created. Data set I contained a random sample of 4950 note sections filtered by a list of keywords related to cognitive functions and was used for model training and testing. Data set II contained 2000 randomly selected sections without keyword filtering, for assessing whether model performance depended on specific keywords.

Main Outcomes and Measures: A deep learning model and 4 baseline models were developed, and their performance was compared using the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC).

Results: Data set I represented 1969 patients (1046 [53.1%] women; mean [SD] age, 76.0 [13.3] years) and data set II comprised 1161 patients (619 [53.3%] women; mean [SD] age, 76.5 [10.2] years); accounting for patients appearing in both data sets, the unique population was 2166. Cognitive decline was noted in 1453 sections (29.4%) in data set I and 69 sections (3.45%) in data set II. Compared with the 4 baseline models, the deep learning model achieved the best performance in both data sets, with an AUROC of 0.971 (95% CI, 0.967-0.976) and an AUPRC of 0.933 (95% CI, 0.921-0.944) for data set I, and an AUROC of 0.997 (95% CI, 0.994-0.999) and an AUPRC of 0.929 (95% CI, 0.870-0.969) for data set II.

Conclusions and Relevance: In this diagnostic study, a deep learning model accurately detected cognitive decline from clinical notes preceding MCI diagnosis and performed better than keyword-based search and other machine learning models. These results suggest that a deep learning model could be used for earlier detection of cognitive decline in the EHR.


Subjects
Cognitive Dysfunction/diagnosis; Deep Learning; Early Diagnosis; Risk Assessment/methods; Aged; Aged, 80 and over; Electronic Health Records/statistics & numerical data; Female; Humans; Longitudinal Studies; Male; Massachusetts; Middle Aged
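To illustrate the comparison between a keyword-based search and a learned model on note sections, here is a small sketch contrasting a keyword flag with a tiny neural text classifier scored by AUROC; the keywords, features, architecture, and data are assumptions and are not the paper's deep learning model.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import roc_auc_score
from sklearn.neural_network import MLPClassifier

# Toy note sections; 1 = evidence of cognitive decline.
sections = [
    "Patient reports increasing forgetfulness and word-finding difficulty.",
    "Memory intact, no cognitive concerns at this visit.",
    "Family notes progressive confusion and getting lost while driving.",
    "Cognition at baseline; reviews medications independently.",
]
labels = [1, 0, 1, 0]

# Keyword-based baseline: flag a section if any keyword appears.
keywords = ("forgetful", "confusion", "memory loss", "getting lost")
keyword_scores = [int(any(k in s.lower() for k in keywords)) for s in sections]

# Small neural classifier over TF-IDF features as a stand-in for a deep model.
X = TfidfVectorizer().fit_transform(sections)
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0).fit(X, labels)
nn_scores = clf.predict_proba(X)[:, 1]

print("keyword AUROC:", roc_auc_score(labels, keyword_scores))
print("neural  AUROC:", roc_auc_score(labels, nn_scores))
```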