CM2: Reinforcement Learning with Checklist Rewards for Multi-Turn and Multi-Step Agentic Tool Use

Researchers have introduced CM2, a reinforcement learning framework for training AI agents on multi-turn, multi-step tool use. CM2 replaces traditional verifiable rewards with checklist-based criteria, which makes judging agent behavior more stable. Trained in a simulated environment, CM2 improved on supervised fine-tuning on benchmarks including τ²-Bench and ToolSandbox. The approach offers a scalable way to improve AI tool use without extensive reward engineering. The code is publicly available on GitHub.
CM2: A New Framework for Reinforcement Learning in Multi-Turn Tool Use
Researchers have introduced CM2, a reinforcement learning (RL) framework designed to enhance AI agents' performance in multi-turn interactions and tool use. This approach addresses critical challenges in RL, particularly the complexities of building and maintaining executable tool environments.
Checklist Rewards and Evaluation Criteria
CM2 replaces conventional outcome rewards with checklist rewards, allowing for a systematic assessment of agent performance. It decomposes intended behavior into detailed binary criteria, transforming performance evaluations into stable, classification-style decisions. The framework employs sparse reward assignment while maintaining dense evaluation criteria.
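To make the idea concrete, here is a minimal sketch of a checklist reward. It is an illustration under stated assumptions, not CM2's actual implementation: the names `Criterion`, `checklist_reward`, and `assign_sparse_rewards` are hypothetical, the simple string checks stand in for whatever judge produces each yes/no verdict, and averaging the verdicts and placing the result on the final step is just one possible reading of "sparse reward assignment with dense evaluation criteria".

```python
from dataclasses import dataclass
from typing import Callable, List


# Hypothetical types for illustration; none of these names come from CM2's code.
@dataclass
class Criterion:
    description: str
    check: Callable[[List[str]], bool]  # stand-in for a judge's yes/no verdict


def checklist_reward(trajectory: List[str], checklist: List[Criterion]) -> float:
    """Fraction of binary criteria satisfied (assumed aggregation: simple mean)."""
    if not checklist:
        return 0.0
    passed = sum(1 for c in checklist if c.check(trajectory))
    return passed / len(checklist)


def assign_sparse_rewards(trajectory: List[str], checklist: List[Criterion]) -> List[float]:
    """Dense criteria, sparse assignment: one scalar placed on the final step only."""
    rewards = [0.0] * len(trajectory)
    if trajectory:
        rewards[-1] = checklist_reward(trajectory, checklist)
    return rewards


# Toy trajectory: two tool calls followed by a reply to the user.
trajectory = [
    "call:search_flights(origin='SFO', dest='JFK')",
    "call:book_flight(flight_id='UA123')",
    "reply:Your flight UA123 is booked.",
]
checklist = [
    Criterion("searched before booking", lambda t: any("search_flights" in s for s in t)),
    Criterion("booked a flight", lambda t: any("book_flight" in s for s in t)),
    Criterion("confirmed to the user", lambda t: t[-1].startswith("reply:")),
    Criterion("asked for payment method", lambda t: any("payment" in s for s in t)),
]

rewards = assign_sparse_rewards(trajectory, checklist)
print(rewards)  # [0.0, 0.0, 0.75]
```

Each criterion is a separate binary decision, so the judging task resembles classification rather than open-ended scoring. In this toy run three of the four criteria pass, and the single resulting score is attached only to the last step.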
Performance Outcomes
In testing, CM2 substantially outperformed supervised fine-tuning. Starting from an 8-billion-parameter base model and training on an 8,000-example RL dataset, CM2 achieved:
- An 8-point increase on τ²-Bench.
- A 10-point improvement on BFCL-V4.
- A 12-point gain on ToolSandbox.
These gains over supervised fine-tuning put CM2 on par with, or ahead of, similarly sized open-source models.
CM2's framework is accessible through the open-source community: CM2-RLCR-Tool-Agent on GitHub.
📰 Original Source: https://arxiv.org/abs/2602.12268v1
All rights and credit belong to the original publisher.