Distilling mathematical reasoning capabilities into Small Language Models.
Neural Netw. 2024 Nov; 179: 106594.
Article in English | MEDLINE | ID: mdl-39121788
ABSTRACT
This work addresses the challenge of democratizing advanced Large Language Models (LLMs) by compressing their mathematical reasoning capabilities into sub-billion parameter Small Language Models (SLMs) without compromising performance. We introduce Equation-of-Thought Distillation (EoTD), a novel technique that encapsulates the reasoning process in equation-based representations, which are used to construct an EoTD dataset for fine-tuning SLMs. Additionally, we propose the Ensemble Thoughts Distillation (ETD) framework to further enhance the reasoning performance of SLMs. This involves creating a reasoning dataset with multiple thought formats, including Chain-of-Thought (CoT), Program-of-Thought (PoT), and Equation-of-Thought (EoT), and using it for fine-tuning. Our experimental results demonstrate that EoTD significantly boosts the reasoning abilities of SLMs, while ETD enables these models to achieve state-of-the-art reasoning performance.
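The abstract itself contains no code; the sketch below illustrates how an ETD-style fine-tuning corpus mixing CoT, PoT, and EoT rationales might be assembled and used to fine-tune a small model. GPT-2 stands in for a sub-billion-parameter SLM, and the prompt template, format tags, and training settings are assumptions for illustration, not the authors' exact setup.

```python
# A minimal sketch of Ensemble Thoughts Distillation (ETD) fine-tuning.
# Assumes teacher-generated rationales already exist; templates and
# hyperparameters below are illustrative, not the paper's exact recipe.
from dataclasses import dataclass
from typing import List

from torch.utils.data import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)


@dataclass
class ThoughtExample:
    question: str
    rationale: str   # CoT text, PoT program, or EoT equation derivation
    fmt: str         # "cot" | "pot" | "eot"


class ETDDataset(Dataset):
    """Mixes CoT, PoT, and EoT rationales into one fine-tuning corpus."""

    def __init__(self, examples: List[ThoughtExample], tokenizer, max_len=512):
        self.examples, self.tok, self.max_len = examples, tokenizer, max_len

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, i):
        ex = self.examples[i]
        # Tag each sample with its thought format so the SLM learns to
        # produce all three reasoning styles.
        text = f"[{ex.fmt.upper()}] Question: {ex.question}\nAnswer: {ex.rationale}"
        enc = self.tok(text, truncation=True, max_length=self.max_len,
                       padding="max_length", return_tensors="pt")
        input_ids = enc["input_ids"].squeeze(0)
        attention_mask = enc["attention_mask"].squeeze(0)
        labels = input_ids.clone()
        labels[attention_mask == 0] = -100  # ignore padding in the loss
        return {"input_ids": input_ids,
                "attention_mask": attention_mask,
                "labels": labels}


# Toy teacher rationales in the three thought formats.
examples = [
    ThoughtExample("If x + 3 = 7, what is x?",
                   "x + 3 = 7 -> x = 7 - 3 -> x = 4", "eot"),
    ThoughtExample("Tom has 3 apples and buys 2 more. How many now?",
                   "He starts with 3 and adds 2, so 3 + 2 = 5.", "cot"),
    ThoughtExample("What is 12 * 7?",
                   "print(12 * 7)", "pot"),
]

tok = AutoTokenizer.from_pretrained("gpt2")  # stand-in sub-billion SLM
tok.pad_token = tok.eos_token
model = AutoModelForCausalLM.from_pretrained("gpt2")

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="etd-slm", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=ETDDataset(examples, tok),
)
trainer.train()
```

Training on all three formats at once lets the fine-tuned SLM be prompted for whichever reasoning style suits a given problem, which is the diversity the ETD framework exploits.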
Full text:
1
Collection:
01-internacional
Database:
MEDLINE
Main subject:
Language
Limit:
Humans
Language:
En
Journal:
Neural Netw
Journal subject:
NEUROLOGY
Year:
2024
Document type:
Article