Arithmetic with language models: From memorization to computation.
Neural Netw; 179: 106550, 2024 Nov.
Article in En | MEDLINE | ID: mdl-39068682
ABSTRACT
A better understanding of the emergent computation and problem-solving capabilities of recent large language models is of paramount importance to further improve them and broaden their applicability. This work investigates how a language model, trained to predict the next token, can perform arithmetic computations generalizing beyond training data. Binary addition and multiplication constitute a good testbed for this purpose, since they require a very small vocabulary and exhibit relevant input/output discontinuities making smooth input interpolation ineffective for novel data. We successfully trained a light language model to learn these tasks and ran a number of experiments to investigate the extrapolation capabilities and internal information processing. Our findings support the hypothesis that the language model works as an Encoding-Regression-Decoding machine where the computation takes place in the value space once the input token representation is mapped to an appropriate internal representation.
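To make the testbed concrete, the sketch below shows one plausible way to generate binary-addition samples as character-level sequences for next-token prediction. The exact formatting and tokenization used in the paper are not given here, so the "a+b=c" layout, the fixed bit width, and the function name are illustrative assumptions, not the authors' actual data pipeline.

import random

def make_binary_addition_example(n_bits=8):
    # Hypothetical sample format "a+b=c" with fixed-width binary operands;
    # the paper's own encoding may differ.
    a = random.randrange(2 ** n_bits)
    b = random.randrange(2 ** n_bits)
    c = a + b
    # The sum needs at most one extra bit, hence width n_bits + 1.
    return f"{a:0{n_bits}b}+{b:0{n_bits}b}={c:0{n_bits + 1}b}"

# A character-level vocabulary of just {'0', '1', '+', '='} suffices,
# which reflects the abstract's point that the task needs a very small vocabulary.
if __name__ == "__main__":
    for _ in range(3):
        print(make_binary_addition_example())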
Keywords:
Full text: 1
Database: MEDLINE
Main subject:
Language: En
Year of publication: 2024
Document type: Article