https://arxiv.org/html/2504.11536v1
Introducing ReTool, a reinforcement learning framework for strategic tool use in LLMs.
Enhances long-form reasoning by integrating real-time code execution.
Develops an automated RL paradigm for multi-turn code execution and tool invocation.
https://arxiv.org/abs/2504.15777
Tina demonstrates that substantial reasoning performance can be developed with minimal resources by applying parameter-efficient low-rank adaptation (LoRA) updates during reinforcement learning (RL) to an already tiny 1.5B-parameter base model.
Tina: Tiny Reasoning Models via LoRA
How cost-effectively can strong reasoning abilities be achieved in language models? Driven by this fundamental question, we present Tina, a family of tiny reasoning models achieved with high...
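For anyone unfamiliar with why LoRA makes the updates "parameter-efficient", here is a minimal pure-Python sketch of the general idea. It is not Tina's code: the class name `LoRALinear`, the helper `matvec`, and the layer sizes, rank, and alpha are all made up for illustration. The frozen weight `W` stays fixed, and only the two small matrices `A` and `B` would be trained.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """Hypothetical linear layer: y = W x + (alpha / rank) * B (A x)."""

    def __init__(self, d_in, d_out, rank, alpha, seed=0):
        rng = random.Random(seed)
        # Frozen pretrained weight (random here, purely for illustration).
        self.W = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(d_out)]
        # Trainable low-rank factors: A is rank x d_in, B is d_out x rank.
        self.A = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(rank)]
        # B starts at zero, so the adapter initially leaves the layer unchanged.
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.W, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

    def frozen_params(self):
        return len(self.W) * len(self.W[0])


layer = LoRALinear(d_in=64, d_out=64, rank=4, alpha=8)
print(layer.trainable_params(), layer.frozen_params())
```

With these toy sizes the adapter has rank * (d_in + d_out) trainable values against d_in * d_out frozen ones, and the gap widens as the layer gets larger relative to the rank.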