Search | BVS IEC

LLM Comparator: Interactive Analysis of Side-by-Side Evaluation of Large Language Models.

Kahng, Minsuk; Tenney, Ian; Pushkarna, Mahima; Liu, Michael Xieyang; Wexler, James; Reif, Emily; Kallarackal, Krystal; Chang, Minsuk; Terry, Michael; Dixon, Lucas.

IEEE Trans Vis Comput Graph ; PP, 2024 Sep 10.

Article in English | MEDLINE | ID: mdl-39255096

ABSTRACT

Evaluating large language models (LLMs) presents unique challenges. While automatic side-by-side evaluation, also known as LLM-as-a-judge, has become a promising solution, model developers and researchers face difficulties with scalability and interpretability when analyzing these evaluation outcomes. To address these challenges, we introduce LLM Comparator, a new visual analytics tool designed for side-by-side evaluations of LLMs. This tool provides analytical workflows that help users understand when and why one LLM outperforms or underperforms another, and how their responses differ. Through close collaboration with practitioners developing LLMs at Google, we have iteratively designed, developed, and refined the tool. Qualitative feedback from these users highlights that the tool facilitates in-depth analysis of individual examples while enabling users to visually overview and flexibly slice data. This empowers users to identify undesirable patterns, formulate hypotheses about model behavior, and gain insights for model improvement. LLM Comparator has been integrated into Google's LLM evaluation platforms and open-sourced.

The What-If Tool: Interactive Probing of Machine Learning Models.

Wexler, James; Pushkarna, Mahima; Bolukbasi, Tolga; Wattenberg, Martin; Viegas, Fernanda; Wilson, Jimbo.

IEEE Trans Vis Comput Graph ; 26(1): 56-65, 2020 01.

Article in English | MEDLINE | ID: mdl-31442996

ABSTRACT

A key challenge in developing and deploying Machine Learning (ML) systems is understanding their performance across a wide range of inputs. To address this challenge, we created the What-If Tool, an open-source application that allows practitioners to probe, visualize, and analyze ML systems, with minimal coding. The What-If Tool lets practitioners test performance in hypothetical situations, analyze the importance of different data features, and visualize model behavior across multiple models and subsets of input data. It also lets practitioners measure systems according to multiple ML fairness metrics. We describe the design of the tool, and report on real-life usage at different organizations.

Subjects

Computer Graphics; Computer Simulation; Machine Learning; Software; User-Computer Interface; Databases, Factual; Humans

Visualizing Dataflow Graphs of Deep Learning Models in TensorFlow.

Wongsuphasawat, Kanit; Smilkov, Daniel; Wexler, James; Wilson, Jimbo; Mane, Dandelion; Fritz, Doug; Krishnan, Dilip; Viegas, Fernanda B; Wattenberg, Martin.

IEEE Trans Vis Comput Graph ; 24(1): 1-12, 2018 01.

Article in English | MEDLINE | ID: mdl-28866562

ABSTRACT

We present a design study of the TensorFlow Graph Visualizer, part of the TensorFlow machine intelligence platform. This tool helps users understand complex machine learning architectures by visualizing their underlying dataflow graphs. The tool works by applying a series of graph transformations that enable standard layout techniques to produce a legible interactive diagram. To declutter the graph, we decouple non-critical nodes from the layout. To provide an overview, we build a clustered graph using the hierarchical structure annotated in the source code. To support exploration of nested structure on demand, we perform edge bundling to enable stable and responsive cluster expansion. Finally, we detect and highlight repeated structures to emphasize a model's modular composition. To demonstrate the utility of the visualizer, we describe example usage scenarios and report user feedback. Overall, users find the visualizer useful for understanding, debugging, and sharing the structures of their models.
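
As a rough illustration of the clustering step described in this abstract, the sketch below nests operation names into a hierarchy by splitting on "/", the separator TensorFlow name scopes use. The function name `build_cluster_tree` and the sample node names are hypothetical, and this is not the Graph Visualizer's actual implementation.

```python
from typing import Dict, List


def build_cluster_tree(node_names: List[str]) -> Dict[str, dict]:
    """Nest slash-separated node names into a dict-of-dicts hierarchy.

    Each path component becomes a key; leaf operations map to empty dicts.
    """
    root: Dict[str, dict] = {}
    for name in node_names:
        level = root
        for part in name.split("/"):
            # Create the cluster (or leaf) on first sight, then descend into it.
            level = level.setdefault(part, {})
    return root


# Hypothetical node names, written in the style of TensorFlow name scopes.
nodes = ["layer1/weights", "layer1/bias", "layer2/weights", "loss"]
tree = build_cluster_tree(nodes)
top_level_clusters = sorted(tree)
```

Collapsing the graph to `top_level_clusters` gives the kind of overview the abstract mentions; expanding a cluster corresponds to descending one level in the nested dictionary.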

Scalable and accurate deep learning with electronic health records.

Rajkomar, Alvin; Oren, Eyal; Chen, Kai; Dai, Andrew M; Hajaj, Nissan; Hardt, Michaela; Liu, Peter J; Liu, Xiaobing; Marcus, Jake; Sun, Mimi; Sundberg, Patrik; Yee, Hector; Zhang, Kun; Zhang, Yi; Flores, Gerardo; Duggan, Gavin E; Irvine, Jamie; Le, Quoc; Litsch, Kurt; Mossin, Alexander; Tansuwan, Justin; Wang, De; Wexler, James; Wilson, Jimbo; Ludwig, Dana; Volchenboum, Samuel L; Chou, Katherine; Pearson, Michael; Madabushi, Srinivasan; Shah, Nigam H; Butte, Atul J; Howell, Michael D; Cui, Claire; Corrado, Greg S; Dean, Jeffrey.

NPJ Digit Med ; 1: 18, 2018.

Article in English | MEDLINE | ID: mdl-31304302

ABSTRACT

Predictive modeling with electronic health record (EHR) data is anticipated to drive personalized medicine and improve healthcare quality. Constructing predictive statistical models typically requires extraction of curated predictor variables from normalized EHR data, a labor-intensive process that discards the vast majority of information in each patient's record. We propose a representation of patients' entire raw EHR records based on the Fast Healthcare Interoperability Resources (FHIR) format. We demonstrate that deep learning methods using this representation are capable of accurately predicting multiple medical events from multiple centers without site-specific data harmonization. We validated our approach using de-identified EHR data from two US academic medical centers with 216,221 adult patients hospitalized for at least 24 h. In the sequential format we propose, this volume of EHR data unrolled into a total of 46,864,534,945 data points, including clinical notes. Deep learning models achieved high accuracy for tasks such as predicting: in-hospital mortality (area under the receiver operator curve [AUROC] across sites 0.93-0.94), 30-day unplanned readmission (AUROC 0.75-0.76), prolonged length of stay (AUROC 0.85-0.86), and all of a patient's final discharge diagnoses (frequency-weighted AUROC 0.90). These models outperformed traditional, clinically-used predictive models in all cases. We believe that this approach can be used to create accurate and scalable predictions for a variety of clinical scenarios. In a case study of a particular prediction, we demonstrate that neural networks can be used to identify relevant information from the patient's chart.
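
For readers unfamiliar with the AUROC figures quoted in this abstract, here is a minimal sketch of one way the metric can be computed: it equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `auroc` and the toy labels and scores are illustrative assumptions, not data or code from the study.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(positives) * len(negatives))


# Toy example (made-up values): 1 = event occurred, 0 = it did not.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auroc(toy_labels, toy_scores)
```

The quadratic pairwise loop is chosen for clarity; a rank-based formulation gives the same value more efficiently on large datasets.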
