Búsqueda | Portal Regional de la BVS

Evaluating language models for mathematics through interactions.

Collins, Katherine M; Jiang, Albert Q; Frieder, Simon; Wong, Lionel; Zilka, Miri; Bhatt, Umang; Lukasiewicz, Thomas; Wu, Yuhuai; Tenenbaum, Joshua B; Hart, William; Gowers, Timothy; Li, Wenda; Weller, Adrian; Jamnik, Mateja.

Proc Natl Acad Sci U S A ; 121(24): e2318124121, 2024 Jun 11.

Artículo en Inglés | MEDLINE | ID: mdl-38830100

RESUMEN

There is much excitement about the opportunity to harness the power of large language models (LLMs) when building problem-solving assistants. However, the standard methodology of evaluating LLMs relies on static pairs of inputs and outputs; this is insufficient for making an informed decision about which LLMs are best to use in an interactive setting, and how that varies by setting. Static assessment therefore limits how we understand language model capabilities. We introduce CheckMate, an adaptable prototype platform for humans to interact with and evaluate LLMs. We conduct a study with CheckMate to evaluate three language models (InstructGPT, ChatGPT, and GPT-4) as assistants in proving undergraduate-level mathematics, with a mixed cohort of participants from undergraduate students to professors of mathematics. We release the resulting interaction and rating dataset, MathConverse. By analyzing MathConverse, we derive a taxonomy of human query behaviors and uncover that despite a generally positive correlation, there are notable instances of divergence between correctness and perceived helpfulness in LLM generations, among other findings. Further, we garner a more granular understanding of GPT-4 mathematical problem-solving through a series of case studies, contributed by experienced mathematicians. We conclude with actionable takeaways for ML practitioners and mathematicians: models that communicate uncertainty, respond well to user corrections, and can provide a concise rationale for their recommendations, may constitute better assistants. Humans should inspect LLM output carefully given their current shortcomings and potential for surprising fallibility.

Asunto(s)

Lenguaje , Matemática , Solución de Problemas , Humanos , Solución de Problemas/fisiología , Estudiantes/psicología

Unsupervised construction of computational graphs for gene expression data with explicit structural inductive biases.

Scherer, Paul; Trebacz, Maja; Simidjievski, Nikola; Viñas, Ramon; Shams, Zohreh; Terre, Helena Andres; Jamnik, Mateja; Liò, Pietro.

Bioinformatics ; 38(5): 1320-1327, 2022 02 07.

Artículo en Inglés | MEDLINE | ID: mdl-34888618

RESUMEN

MOTIVATION: Gene expression data are commonly used at the intersection of cancer research and machine learning for better understanding of the molecular status of tumour tissue. Deep learning predictive models have been employed for gene expression data due to their ability to scale and remove the need for manual feature engineering. However, gene expression data are often very high dimensional, noisy and presented with a low number of samples. This poses significant problems for learning algorithms: models often overfit, learn noise and struggle to capture biologically relevant information. In this article, we utilize external biological knowledge embedded within structures of gene interaction graphs such as protein-protein interaction (PPI) networks to guide the construction of predictive models. RESULTS: We present Gene Interaction Network Constrained Construction (GINCCo), an unsupervised method for automated construction of computational graph models for gene expression data that are structurally constrained by prior knowledge of gene interaction networks. We employ this methodology in a case study on incorporating a PPI network in cancer phenotype prediction tasks. Our computational graphs are structurally constructed using topological clustering algorithms on the PPI networks which incorporate inductive biases stemming from network biology research on protein complex discovery. Each of the entities in the GINCCo computational graph represents biological entities such as genes, candidate protein complexes and phenotypes instead of arbitrary hidden nodes of a neural network. This provides a biologically relevant mechanism for model regularization yielding strong predictive performance while drastically reducing the number of model parameters and enabling guided post-hoc enrichment analyses of influential gene sets with respect to target phenotypes. Our experiments analysing a variety of cancer phenotypes show that GINCCo often outperforms support vector machine, Fully Connected Multi-layer Perceptrons (MLP) and Randomly Connected MLPs despite greatly reduced model complexity. AVAILABILITY AND IMPLEMENTATION: https://github.com/paulmorio/gincco contains the source code for our approach. We also release a library with algorithms for protein complex discovery within PPI networks at https://github.com/paulmorio/protclus. This repository contains implementations of the clustering algorithms used in this article. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Asunto(s)

Algoritmos , Neoplasias , Humanos , Redes Neurales de la Computación , Programas Informáticos , Neoplasias/genética , Sesgo , Expresión Génica , Biología Computacional/métodos

Human inference beyond syllogisms: an approach using external graphical representations.

Sato, Yuri; Stapleton, Gem; Jamnik, Mateja; Shams, Zohreh.

Cogn Process ; 20(1): 103-115, 2019 Feb.

Artículo en Inglés | MEDLINE | ID: mdl-30076513

RESUMEN

Research in psychology about reasoning has often been restricted to relatively inexpressive statements involving quantifiers (e.g. syllogisms). This is limited to situations that typically do not arise in practical settings, like ontology engineering. In order to provide an analysis of inference, we focus on reasoning tasks presented in external graphic representations where statements correspond to those involving multiple quantifiers and unary and binary relations. Our experiment measured participants' performance when reasoning with two notations. The first notation used topological constraints to convey information via node-link diagrams (i.e. graphs). The second used topological and spatial constraints to convey information (Euler diagrams with additional graph-like syntax). We found that topo-spatial representations were more effective for inferences than topological representations alone. Reasoning with statements involving multiple quantifiers was harder than reasoning with single quantifiers in topological representations, but not in topo-spatial representations. These findings are compared to those in sentential reasoning tasks.

Asunto(s)

Presentación de Datos , Solución de Problemas , Humanos

Variational Autoencoders for Cancer Data Integration: Design Principles and Computational Practice.

Simidjievski, Nikola; Bodnar, Cristian; Tariq, Ifrah; Scherer, Paul; Andres Terre, Helena; Shams, Zohreh; Jamnik, Mateja; Liò, Pietro.

Front Genet ; 10: 1205, 2019.

Artículo en Inglés | MEDLINE | ID: mdl-31921281

RESUMEN

International initiatives such as the Molecular Taxonomy of Breast Cancer International Consortium are collecting multiple data sets at different genome-scales with the aim to identify novel cancer bio-markers and predict patient survival. To analyze such data, several machine learning, bioinformatics, and statistical methods have been applied, among them neural networks such as autoencoders. Although these models provide a good statistical learning framework to analyze multi-omic and/or clinical data, there is a distinct lack of work on how to integrate diverse patient data and identify the optimal design best suited to the available data.In this paper, we investigate several autoencoder architectures that integrate a variety of cancer patient data types (e.g., multi-omics and clinical data). We perform extensive analyses of these approaches and provide a clear methodological and computational framework for designing systems that enable clinicians to investigate cancer traits and translate the results into clinical applications. We demonstrate how these networks can be designed, built, and, in particular, applied to tasks of integrative analyses of heterogeneous breast cancer data. The results show that these approaches yield relevant data representations that, in turn, lead to accurate and stable diagnosis.

What is a proof?

Bundy, Alan; Jamnik, Mateja; Fugard, Andrew.

Philos Trans A Math Phys Eng Sci ; 363(1835): 2377-88; discussion 2388-91, 2005 Oct 15.

Artículo en Inglés | MEDLINE | ID: mdl-16188611

RESUMEN

To those brought up in a logic-based tradition there seems to be a simple and clear definition of proof. But this is largely a twentieth century invention; many earlier proofs had a different nature. We will look particularly at the faulty proof of Euler's Theorem and Lakatos' rational reconstruction of the history of this proof. We will ask: how is it possible for the errors in a faulty proof to remain undetected for several years-even when counter-examples to it are known? How is it possible to have a proof about concepts that are only partially defined? And can we give a logic-based account of such phenomena? We introduce the concept of schematic proofs and argue that they offer a possible cognitive model for the human construction of proofs in mathematics. In particular, we show how they can account for persistent errors in proofs.

Asunto(s)

Algoritmos , Cultura , Cómputos Matemáticos , Modelos Teóricos , Análisis Numérico Asistido por Computador , Validación de Programas de Computación , Programas Informáticos

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

Asunto(s)

ENVIAR RESULTADO:

SELECCIÓN DE REFERENCIAS

DETALLE DE LA BÚSQUEDA