Search | BVS Search Portal

Using Large Language Models to Annotate Complex Cases of Social Determinants of Health in Longitudinal Clinical Records.

Ralevski, Alexandra; Taiyab, Nadaa; Nossal, Michael; Mico, Lindsay; Piekos, Samantha N; Hadlock, Jennifer.

medRxiv ; 2024 Apr 27.

Article in English | MEDLINE | ID: mdl-38712224

ABSTRACT

Social Determinants of Health (SDoH) are an important part of the exposome and are known to have a large impact on variation in health outcomes. In particular, housing stability is known to be intricately linked to a patient's health status, and pregnant women experiencing housing instability (HI) are known to have worse health outcomes. Most SDoH information is stored in electronic health records (EHRs) as free-text (unstructured) clinical notes, which have traditionally required natural language processing (NLP) for automatic identification of relevant text or keywords. A patient's housing status can be ambiguous or subjective, and can change from note to note or within the same note, making it difficult to use existing NLP solutions. Recent developments in NLP, notably large language models (LLMs) such as GPT (Generative Pre-trained Transformer), allow researchers to analyze complex, unstructured data using simple prompts and to perform complex, subjective annotation tasks requiring reasoning that previously only human annotators could attempt. We used a secure platform within a large healthcare system to compare the ability of GPT-3.5 and GPT-4 to identify instances of both current and past housing instability, as well as general housing status, in 25,217 notes from 795 pregnant women. Results from these LLMs were compared with results from manual annotation, a named entity recognition (NER) model, and regular expressions (RegEx). We developed a chain-of-thought prompt that required the LLMs to provide evidence and justification for each note, to help maximize the chances of finding relevant text related to HI while minimizing hallucinations and false positives.
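The abstract does not publish the prompt or the output format. As a minimal sketch of how an "evidence and justification" requirement can be checked mechanically, the Python below assumes a hypothetical JSON response with `housing_instability`, `evidence`, and `justification` fields and a hypothetical label set, and flags any quoted evidence that does not appear verbatim in the source note. This is one simple guard against hallucinated evidence, not the authors' pipeline; the note text and response are invented for illustration.

```python
import json

# Hypothetical label set, assumed for illustration (not taken from the paper).
ALLOWED_LABELS = {"current", "past", "none", "unknown"}


def validate_annotation(note_text: str, raw_response: str) -> dict:
    """Parse a JSON annotation and flag evidence quotes missing from the note."""
    annotation = json.loads(raw_response)
    label = annotation.get("housing_instability")
    if label not in ALLOWED_LABELS:
        raise ValueError(f"unexpected label: {label!r}")
    evidence = annotation.get("evidence", [])
    # Collapse whitespace and lowercase so line breaks or casing in the note
    # do not make a genuine quote look unsupported.
    normalized_note = " ".join(note_text.split()).lower()
    unsupported = [
        quote for quote in evidence
        if " ".join(quote.split()).lower() not in normalized_note
    ]
    # Route to human review if any quote is unsupported, or if a positive
    # label comes with no evidence at all.
    needs_review = bool(unsupported) or (label in {"current", "past"} and not evidence)
    return {"label": label, "unsupported_evidence": unsupported, "needs_review": needs_review}


# Hypothetical note and model response, invented for illustration.
note = "Social work note: pt reports staying with friends since eviction in March."
response = json.dumps({
    "housing_instability": "current",
    "evidence": ["pt reports staying with friends since eviction"],
    "justification": "Evicted and currently without a stable residence.",
})
print(validate_annotation(note, response))
# {'label': 'current', 'unsupported_evidence': [], 'needs_review': False}
```

A verbatim-substring check only catches fabricated quotes; it cannot tell whether a real quote actually supports the label, which is why flagged and ambiguous cases still go to a human reviewer.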
Compared with GPT-3.5 and the NER model, GPT-4 had the highest performance, with a much higher recall (0.924) than human annotators (0.702) in identifying patients experiencing current or past housing instability, although its precision was lower (0.850 vs 0.971). In most cases, the evidence output by GPT-4 was similar or identical to that of human annotators, and there was no evidence of hallucinations in any of the outputs from GPT-4. Most cases where the annotators and GPT-4 differed were ambiguous or subjective, such as "living in an apartment with too many people". We also examined GPT-4 performance on de-identified versions of the same notes and found that precision improved slightly (0.936 original, 0.939 de-identified), while recall dropped (0.781 original, 0.704 de-identified). This work demonstrates that, while manual annotation is likely to yield slightly more accurate results overall, LLMs provide a scalable, cost-effective alternative with the advantage of greater recall. At the same time, further evaluation is needed to address the risk of missed cases and bias in the initial selection of housing-related notes. Additionally, although confabulation could be reduced, some unusual justifications remained. Given these factors, together with changes in both LLMs and charting practices over time, this approach is not yet appropriate for use as a fully automated process. However, these results demonstrate the potential of LLMs for computer-assisted annotation with human review, reducing cost and increasing recall. More efficient methods for obtaining structured SDoH data can help accelerate the inclusion of exposome variables in biomedical research and support healthcare systems in identifying patients who could benefit from proactive outreach.
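The precision/recall trade-off between GPT-4 and human annotators can be summarized with the F1 score, the harmonic mean of the two. The abstract does not report F1; the values printed below are derived here from the reported precision and recall, and the `precision_recall` call uses hypothetical counts only to show the definitions.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, only to illustrate the definitions.
print(precision_recall(tp=9, fp=1, fn=3))  # (0.9, 0.75)

# Precision and recall as reported in the abstract; F1 is derived here.
reported = {"GPT-4": (0.850, 0.924), "Human annotators": (0.971, 0.702)}
for name, (p, r) in reported.items():
    print(f"{name}: F1 = {f1(p, r):.3f}")
# GPT-4: F1 = 0.885
# Human annotators: F1 = 0.815
```

F1 weights precision and recall equally, which may not match the relative cost of missed cases versus false positives in a computer-assisted review workflow, where extra false positives are filtered out by the human reviewer.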

Physician documentation deficiencies in abdominal ultrasound reports: frequency, characteristics, and financial impact.

Duszak, Richard; Nossal, Michael; Schofield, Lyle; Picus, Daniel.

J Am Coll Radiol ; 9(6): 403-8, 2012 Jun.

Article in English | MEDLINE | ID: mdl-22632666

ABSTRACT

PURPOSE: The aim of this study was to assess the frequency, characteristics, and financial impact of physician documentation deficiencies in abdominal ultrasound reports. METHODS: Using a multi-institutional coding and billing database and natural language processing software, 12,699,502 radiology reports from 37 practices were searched to identify and analyze abdominal ultrasound reports. Using standard Current Procedural Terminology® (CPT®) criteria, examinations were categorized as complete (all 8 required elements documented) or limited (<8 elements). To account for complete examinations with incomplete documentation, examinations were categorized as very likely, likely, or possibly complete depending on whether a minimum of 7, 6, or 5 elements were reported. Frequency and financial impact were assessed using all 3 models, and presumed documentation deficiencies were characterized. RESULTS: Of 366,062 abdominal ultrasound reports by 1,136 radiologists, 252,478 (69.0%) documented all 8 elements required for CPT coding as complete examinations, 25,925 (7.1%) documented 7 elements, 20,559 (5.6%) documented 6 elements, 17,521 (4.8%) documented 5 elements, and 49,579 (13.5%) documented ≤4 elements. For the very likely, likely, and possibly complete examination models, deficiencies were present in 9.3%, 15.5%, and 20.2% of cases, resulting in 2.5%, 4.2%, and 5.5% decreases in legitimate professional payments. The spleen (41.2%) was the most frequently omitted element. Of 106,168 examinations titled complete, only 92,824 (87.4%) fulfilled complete CPT criteria. In 221,887 (60.6%), examination titles were clearly erroneous or too ambiguous for code assignment. Documentation deficiencies were less frequent for high-volume radiologists (P < .0001). CONCLUSIONS: Incomplete physician documentation in abdominal ultrasound reports is common (9.3%-20.2% of cases) and results in a 2.5% to 5.5% loss of professional income. Structured reporting may improve documentation and mitigate lost revenue.
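The three completeness models can be reproduced from the reported element counts, whose five categories sum to 366,062 reports. The sketch below assumes that, for each model, the deficiency rate is the share of presumed-complete examinations (those with at least 7, 6, or 5 of the 8 elements) that did not document all 8. That reading is an interpretation rather than a formula stated in the abstract, but it reproduces the reported 9.3%, 15.5%, and 20.2%.

```python
# Reports by number of the 8 required CPT elements documented (from the abstract).
COUNTS = {8: 252_478, 7: 25_925, 6: 20_559, 5: 17_521}
FOUR_OR_FEWER = 49_579
TOTAL = sum(COUNTS.values()) + FOUR_OR_FEWER


def deficiency_rate(min_elements: int) -> float:
    """Share of presumed-complete exams (>= min_elements documented) lacking some of the 8."""
    presumed_complete = sum(n for k, n in COUNTS.items() if k >= min_elements)
    deficient = presumed_complete - COUNTS[8]
    return deficient / presumed_complete


MODELS = {"very likely": 7, "likely": 6, "possibly": 5}

print(f"all 8 elements: {COUNTS[8] / TOTAL:.1%} of {TOTAL:,} reports")
for name, threshold in MODELS.items():
    print(f"{name} complete (>= {threshold}): {deficiency_rate(threshold):.1%}")
# all 8 elements: 69.0% of 366,062 reports
# very likely complete (>= 7): 9.3%
# likely complete (>= 6): 15.5%
# possibly complete (>= 5): 20.2%
```

The payment losses (2.5%, 4.2%, 5.5%) cannot be recomputed from the abstract alone, because they depend on the professional fees for complete versus limited examinations, which are not given.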

Subjects

Abdomen/diagnostic imaging, Diagnostic Errors/economics, Diagnostic Errors/statistics & numerical data, Documentation/economics, Documentation/statistics & numerical data, Ultrasonography/economics, Ultrasonography/statistics & numerical data, Humans, Physicians/economics, Physicians/statistics & numerical data, Sensitivity and Specificity, United States