Search | VHL Regional Portal

1.

Less Likely Brainstorming: Using Language Models to Generate Alternative Hypotheses.

Tang, Liyan; Peng, Yifan; Wang, Yanshan; Ding, Ying; Durrett, Greg; Rousseau, Justin F.

Proc Conf Assoc Comput Linguist Meet ; 2023: 12532-12555, 2023 Jul.

Article in English | MEDLINE | ID: mdl-37701928

ABSTRACT

A human decision-maker benefits the most from an AI assistant that corrects for their biases. For problems such as generating interpretation of a radiology report given findings, a system predicting only highly likely outcomes may be less useful, where such outcomes are already obvious to the user. To alleviate biases in human decision-making, it is worth considering a broad differential diagnosis, going beyond the most likely options. We introduce a new task, "less likely brainstorming," that asks a model to generate outputs that humans think are relevant but less likely to happen. We explore the task in two settings: a brain MRI interpretation generation setting and an everyday commonsense reasoning setting. We found that a baseline approach of training with less likely hypotheses as targets generates outputs that humans evaluate as either likely or irrelevant nearly half of the time; standard MLE training is not effective. To tackle this problem, we propose a controlled text generation method that uses a novel contrastive learning strategy to encourage models to differentiate between generating likely and less likely outputs according to humans. We compare our method with several state-of-the-art controlled text generation models via automatic and human evaluations and show that our models' capability of generating less likely outputs is improved.

2.

Evaluating large language models on medical evidence summarization.

Tang, Liyan; Sun, Zhaoyi; Idnay, Betina; Nestor, Jordan G; Soroush, Ali; Elias, Pierre A; Xu, Ziyang; Ding, Ying; Durrett, Greg; Rousseau, Justin F; Weng, Chunhua; Peng, Yifan.

NPJ Digit Med ; 6(1): 158, 2023 Aug 24.

Article in English | MEDLINE | ID: mdl-37620423

ABSTRACT

Recent advances in large language models (LLMs) have demonstrated remarkable successes in zero- and few-shot performance on various downstream tasks, paving the way for applications in high-stakes domains. In this study, we systematically examine the capabilities and limitations of LLMs, specifically GPT-3.5 and ChatGPT, in performing zero-shot medical evidence summarization across six clinical domains. We conduct both automatic and human evaluations, covering several dimensions of summary quality. Our study demonstrates that automatic metrics often do not strongly correlate with the quality of summaries. Furthermore, informed by our human evaluations, we define a terminology of error types for medical evidence summarization. Our findings reveal that LLMs could be susceptible to generating factually inconsistent summaries and making overly convincing or uncertain statements, leading to potential harm due to misinformation. Moreover, we find that models struggle to identify the salient information and are more error-prone when summarizing over longer textual contexts.

3.

Evaluating Large Language Models on Medical Evidence Summarization.

Tang, Liyan; Sun, Zhaoyi; Idnay, Betina; Nestor, Jordan G; Soroush, Ali; Elias, Pierre A; Xu, Ziyang; Ding, Ying; Durrett, Greg; Rousseau, Justin; Weng, Chunhua; Peng, Yifan.

medRxiv ; 2023 Apr 24.

Article in English | MEDLINE | ID: mdl-37162998

ABSTRACT

Recent advances in large language models (LLMs) have demonstrated remarkable successes in zero- and few-shot performance on various downstream tasks, paving the way for applications in high-stakes domains. In this study, we systematically examine the capabilities and limitations of LLMs, specifically GPT-3.5 and ChatGPT, in performing zero-shot medical evidence summarization across six clinical domains. We conduct both automatic and human evaluations, covering several dimensions of summary quality. Our study has demonstrated that automatic metrics often do not strongly correlate with the quality of summaries. Furthermore, informed by our human evaluations, we define a terminology of error types for medical evidence summarization. Our findings reveal that LLMs could be susceptible to generating factually inconsistent summaries and making overly convincing or uncertain statements, leading to potential harm due to misinformation. Moreover, we find that models struggle to identify the salient information and are more error-prone when summarizing over longer textual contexts.

4.

EchoGen: A New Benchmark Study on Generating Conclusions from Echocardiogram Notes.

Tang, Liyan; Kooragayalu, Shravan; Wang, Yanshan; Ding, Ying; Durrett, Greg; Rousseau, Justin F; Peng, Yifan.

Proc Conf Assoc Comput Linguist Meet ; 2022: 359-368, 2022 May.

Article in English | MEDLINE | ID: mdl-36339656

ABSTRACT

Generating a summary from findings has been recently explored (Zhang et al., 2018, 2020) in note types such as radiology reports, which are typically short. In this work, we focus on echocardiogram notes, which are longer and more complex than previous note types. We formally define the task of echocardiography conclusion generation (EchoGen) as generating a conclusion given the findings section, with emphasis on key cardiac findings. To promote the development of EchoGen methods, we present a new benchmark, which consists of two datasets collected from two hospitals. We further compare both standard and state-of-the-art methods on this new benchmark, with an emphasis on factual consistency. To accomplish this, we develop a tool to automatically extract concept-attribute tuples from the text. We then propose an evaluation metric, FactComp, to compare concept-attribute tuples between the human reference and generated conclusions. Both automatic and human evaluations show that there is still a significant gap between human-written and machine-generated conclusions on echo reports in terms of factuality and overall quality.

5.

Nonoperative management of pectus carinatum.

Frey, Ala Stanford; Garcia, Victor F; Brown, Rebeccah L; Inge, Thomas H; Ryckman, Frederick C; Cohen, Aliza P; Durrett, Greg; Azizkhan, Richard G.

J Pediatr Surg ; 41(1): 40-5; discussion 40-5, 2006 Jan.

Article in English | MEDLINE | ID: mdl-16410105

ABSTRACT

BACKGROUND: Although surgery has been the mainstay of treatment of chondrogladiolar pectus carinatum (PC), several authors have advocated the benefits of nonoperative approaches to induce chest wall remodeling. Based on our initial success with compression bracing, we have integrated this modality into our treatment algorithm. METHOD: We reviewed the charts of all patients treated for PC at our pediatric hospital between 1997 and 2004. Patients were managed with observation, operative repair, and orthotic bracing that provides continuous anteroposterior sternal compression. The brace was worn for 14 to 16 hours per day until linear growth was complete or for a minimum of 2 years. RESULTS: One hundred patients were diagnosed with PC. Fifty-seven patients had no treatment and were monitored. Twenty-nine patients were fitted with a brace. Of these 29 patients, 3 were noncompliant, resulting in a compliance rate of 90%. Of the remaining brace patients, all have had positive outcomes with no observed complications. Seventeen patients underwent surgical repair. Their outcomes were also positive with no major complications. CONCLUSION: Our findings clearly demonstrate that compression bracing is a safe and effective treatment for children with chondrogladiolar PC. We currently offer this approach as a first-line treatment, reserving surgery for patients who are noncompliant and those who fail the nonoperative modality.

Subject(s)

Braces, Sternum/abnormalities, Thoracic Wall/abnormalities, Adolescent, Child, Congenital Abnormalities/therapy, Female, Humans, Male, Pressure, Retrospective Studies, Treatment Outcome
