Search | Virtual Health Library

Evaluating large language models on medical, lay language, and self-reported descriptions of genetic conditions.

Flaharty, Kendall A; Hu, Ping; Hanchard, Suzanna Ledgister; Ripper, Molly E; Duong, Dat; Waikel, Rebekah L; Solomon, Benjamin D.

Am J Hum Genet ; 2024 Jul 31.

Article in English | MEDLINE | ID: mdl-39146935

ABSTRACT

Large language models (LLMs) are generating interest in medical settings. For example, LLMs can respond coherently to medical queries by providing plausible differential diagnoses based on clinical notes. However, many questions remain open, such as how open- and closed-source LLMs differ and how LLMs perform on queries from both medical and non-medical users. In this study, we assessed multiple LLMs, including Llama-2-chat, Vicuna, Medllama2, Bard/Gemini, Claude, ChatGPT3.5, and ChatGPT-4, as well as non-LLM approaches (Google search and Phenomizer). We tested their ability to identify genetic conditions from textbook-like clinician questions and their corresponding layperson translations, covering 63 genetic conditions. For open-source LLMs, larger models were more accurate than smaller ones: models with 7b, 13b, and more than 33b parameters achieved accuracies of 21%-49%, 41%-51%, and 54%-68%, respectively. Closed-source LLMs outperformed open-source LLMs, with ChatGPT-4 performing best (89%-90%). Three of 11 LLMs and Google search showed significant performance gaps between clinician and layperson prompts. We also evaluated how in-context prompting and keyword removal affected open-source LLM performance. Models were given 2 types of in-context prompts: list-type prompts, which improved LLM performance, and definition-type prompts, which did not. We further analyzed the removal of rare terms from descriptions, which decreased accuracy for 5 of 7 evaluated LLMs. Finally, we observed much lower performance with real individuals' descriptions; LLMs answered these questions with a maximum accuracy of 21%.
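The abstract above reports "significant performance gaps" between clinician and layperson prompts but does not name the statistical test. Each of the 63 conditions has both a clinician version and a layperson version, so one model's correct/incorrect outcomes are paired. As an illustration only, an exact McNemar test is one standard way to compare such paired binary outcomes. It is not presented as the authors' analysis, and the function names `accuracy` and `mcnemar_exact` are hypothetical.

```python
from math import comb


def accuracy(correct: list[bool]) -> float:
    """Fraction of conditions answered correctly."""
    return sum(correct) / len(correct)


def mcnemar_exact(clinician_correct: list[bool], layperson_correct: list[bool]) -> float:
    """Two-sided exact McNemar p-value for paired correct/incorrect outcomes.

    Only discordant pairs matter: b = correct on the clinician prompt only,
    c = correct on the layperson prompt only. Under the null hypothesis of no
    difference, b ~ Binomial(b + c, 0.5).
    """
    pairs = list(zip(clinician_correct, layperson_correct))
    b = sum(1 for cl, lay in pairs if cl and not lay)
    c = sum(1 for cl, lay in pairs if lay and not cl)
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a gap
    # Probability of a split at least as lopsided as the observed one, doubled.
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)
```

For one model, the two lists would hold one entry per condition in the same order. Comparing `accuracy` on each list gives the size of the gap, and `mcnemar_exact` indicates whether it is larger than chance would explain.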

Comparison of clinical geneticist and computer visual attention in assessing genetic conditions.

Duong, Dat; Johny, Anna Rose; Ledgister Hanchard, Suzanna; Fortney, Christopher; Flaharty, Kendall; Hellmann, Fabio; Hu, Ping; Javanmardi, Behnam; Moosa, Shahida; Patel, Tanviben; Persky, Susan; Sümer, Ömer; Tekendo-Ngongang, Cedrik; Lesmann, Hellen; Hsieh, Tzung-Chien; Waikel, Rebekah L; André, Elisabeth; Krawitz, Peter; Solomon, Benjamin D.

PLoS Genet ; 20(2): e1011168, 2024 Feb.

Article in English | MEDLINE | ID: mdl-38412177

ABSTRACT

Artificial intelligence (AI) for facial diagnostics is increasingly used in the genetics clinic to evaluate patients with potential genetic conditions. Current approaches focus on one type of AI called Deep Learning (DL). DL-based facial diagnostic platforms have a high accuracy rate for many conditions. However, less is understood about how this technology assesses and classifies (categorizes) images, and how this compares to humans. To compare human and computer attention, we performed eye-tracking analyses of geneticist clinicians (n = 22) and non-clinicians (n = 22). Participants viewed images of people with 10 different genetic conditions, as well as images of unaffected individuals. We calculated the Intersection-over-Union (IoU) and Kullback-Leibler divergence (KL) to compare the visual attention of the two participant groups. We then compared the clinician group against the saliency maps of our deep learning classifier. We found that human visual attention differs greatly from the DL model's saliency results. Averaged over all test images, the IoU and KL values for successful (accurate) clinician visual attention versus the saliency maps were 0.15 and 11.15, respectively. Individuals also tend to follow a specific pattern of image inspection, and clinicians demonstrate different visual attention patterns than non-clinicians (IoU and KL of clinicians versus non-clinicians were 0.47 and 2.73, respectively). This study shows that humans (at different levels of expertise) and a computer vision model examine images differently. Understanding these differences can improve the design and use of AI tools, and lead to more meaningful interactions between clinicians and AI technologies.

Subject(s)

Artificial Intelligence, Computers, Humans, Computer Simulation
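The IoU and KL comparison of attention maps described in the preceding abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline, and it rests on several assumptions:

- Each eye-tracking attention map or saliency map is a non-negative 2D array, and the two maps being compared have the same shape.
- IoU is computed on binary masks that keep the top `fraction` of pixels. The 20% default is an arbitrary choice.
- KL is computed after normalizing each map to a probability distribution.

The function names are hypothetical.

```python
import numpy as np


def top_fraction_mask(attention: np.ndarray, fraction: float = 0.2) -> np.ndarray:
    """Boolean mask of the most-attended pixels (assumed thresholding rule).

    Pixels tied with the k-th largest value are also kept.
    """
    flat = attention.ravel()
    k = max(1, int(round(fraction * flat.size)))
    threshold = np.sort(flat)[-k]
    return attention >= threshold


def iou(map_a: np.ndarray, map_b: np.ndarray, fraction: float = 0.2) -> float:
    """Intersection-over-Union of the two thresholded attention masks."""
    a = top_fraction_mask(map_a, fraction)
    b = top_fraction_mask(map_b, fraction)
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 0.0


def kl_divergence(p_map: np.ndarray, q_map: np.ndarray, eps: float = 1e-12) -> float:
    """KL(P || Q) after normalizing each non-negative map to sum to 1."""
    p = p_map.astype(float).ravel() + eps  # eps avoids log(0) and division by 0
    q = q_map.astype(float).ravel() + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))
```

Higher IoU and lower KL both indicate more similar maps. This is why the reported clinician-versus-non-clinician values (0.47 and 2.73) indicate closer agreement than the clinician-versus-saliency values (0.15 and 11.15). KL is asymmetric, so the argument order must be fixed, for example with the clinician map as P and the saliency map as Q. The abstract does not state which order was used.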

Approximating facial expression effects on diagnostic accuracy via generative AI in medical genetics.

Patel, Tanviben; Othman, Amna A; Sümer, Ömer; Hellman, Fabio; Krawitz, Peter; André, Elisabeth; Ripper, Molly E; Fortney, Chris; Persky, Susan; Hu, Ping; Tekendo-Ngongang, Cedrik; Hanchard, Suzanna Ledgister; Flaharty, Kendall A; Waikel, Rebekah L; Duong, Dat; Solomon, Benjamin D.

Bioinformatics ; 40(Supplement_1): i110-i118, 2024 Jun 28.

Article in English | MEDLINE | ID: mdl-38940144

ABSTRACT

Artificial intelligence (AI) is increasingly used in genomics research and practice, and generative AI has garnered significant recent attention. In clinical applications of generative AI, aspects of the underlying datasets can impact results, and confounders should be studied and mitigated. One example involves the facial expressions of people with genetic conditions. Stereotypically, Williams (WS) and Angelman (AS) syndromes are associated with a "happy" demeanor, including a smiling expression. Clinical geneticists may be more likely to identify these conditions in images of smiling individuals. To study the impact of facial expression, we analyzed publicly available facial images of approximately 3500 individuals with genetic conditions. Using a deep learning (DL) image classifier, we found that WS and AS images with non-smiling expressions had significantly lower prediction probabilities for the correct syndrome labels than those with smiling expressions. This was not seen for 22q11.2 deletion and Noonan syndromes, which are not associated with a smiling expression. To further explore the effect of facial expressions, we computationally altered the facial expressions for these images. We trained HyperStyle, a GAN-inversion technique compatible with StyleGAN2, to determine the vector representations of our images. Then, following the concept of InterfaceGAN, we edited these vectors to recreate the original images in a phenotypically accurate way but with a different facial expression. Through online surveys and an eye-tracking experiment, we examined how altered facial expressions affect the performance of human experts. Overall, we found that the association between facial expression and diagnostic accuracy varies across genetic conditions.

Subject(s)

Facial Expression, Humans, Deep Learning, Artificial Intelligence, Medical Genetics/methods, Williams Syndrome/genetics
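The latent-editing step in the preceding abstract follows the concept of InterfaceGAN. It amounts to finding a direction in the generator's latent space that separates one attribute value from another, then moving a latent code along that direction.

The sketch below makes these assumptions:

- Latent codes are already available as NumPy vectors, for example from an inversion method such as HyperStyle.
- Each code has a binary label (1 = smiling, 0 = non-smiling).
- The boundary is fitted with plain logistic regression, a simplified stand-in for the linear SVM used in InterfaceGAN.

The generator itself is omitted, and the function names are hypothetical.

```python
import numpy as np


def fit_attribute_direction(latents: np.ndarray, labels: np.ndarray,
                            lr: float = 0.1, steps: int = 500) -> np.ndarray:
    """Unit normal of a linear boundary separating label 1 (e.g. smiling) from 0.

    Fitted by batch gradient descent on the logistic loss; InterfaceGAN itself
    uses a linear SVM, so this is a simplified stand-in.
    """
    x = np.asarray(latents, dtype=float)
    y = np.asarray(labels, dtype=float)
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        z = np.clip(x @ w + b, -30.0, 30.0)  # keep exp() numerically safe
        p = 1.0 / (1.0 + np.exp(-z))          # predicted P(label == 1)
        err = p - y                           # gradient of log-loss w.r.t. z
        w -= lr * (x.T @ err) / len(y)
        b -= lr * err.mean()
    return w / np.linalg.norm(w)


def edit_latent(latent: np.ndarray, direction: np.ndarray, alpha: float) -> np.ndarray:
    """Move a latent code along the attribute direction; alpha sets strength and sign."""
    return np.asarray(latent, dtype=float) + alpha * direction
```

A positive `alpha` pushes a code toward the label-1 side of the boundary, and a negative `alpha` pushes it toward the label-0 side. The edited code would then be passed back through the generator to render the altered image.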
