On-Policy Context Distillation for Language Models

A new framework called On-Policy Context Distillation (OPCD) enhances language models by training them on their own generated outputs so that knowledge supplied in context becomes internalized in the model's weights. The method consolidates experiential knowledge and distills system prompts, improving accuracy on tasks such as mathematical reasoning and text-based games. OPCD also enables knowledge transfer from larger to smaller models, outperforming existing baseline techniques.
On-Policy Context Distillation Framework Introduced for Language Models
A new framework, On-Policy Context Distillation (OPCD), has been proposed to enhance language models by enabling them to internalize in-context knowledge more effectively. The OPCD framework trains a student model using its own generated trajectories while minimizing the reverse Kullback-Leibler divergence against a context-conditioned teacher model. This method has shown promise in experiential knowledge distillation and system prompt distillation.
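The core training signal described above can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's implementation: it assumes the student has generated a trajectory of tokens, and that both the student and the context-conditioned teacher produce per-position logits over the vocabulary for that trajectory. The loss is the reverse Kullback-Leibler divergence KL(student || teacher), averaged over positions. All array shapes and names here are illustrative assumptions.

```python
import numpy as np

def reverse_kl(student_logits, teacher_logits):
    """KL(student || teacher) per position, summed over the vocabulary.

    Reverse KL is mode-seeking: the student is penalized for placing
    probability mass where the context-conditioned teacher assigns little.
    """
    s = np.exp(student_logits - student_logits.max(-1, keepdims=True))
    s /= s.sum(-1, keepdims=True)
    t = np.exp(teacher_logits - teacher_logits.max(-1, keepdims=True))
    t /= t.sum(-1, keepdims=True)
    return (s * (np.log(s) - np.log(t))).sum(-1)

# Hypothetical on-policy step: the student samples a trajectory, then
# both models score the student's own tokens (teacher sees the context).
rng = np.random.default_rng(0)
T, V = 4, 8  # trajectory length and vocabulary size (toy values)
student_logits = rng.normal(size=(T, V))
teacher_logits = student_logits + 0.1 * rng.normal(size=(T, V))
loss = reverse_kl(student_logits, teacher_logits).mean()
```

Because the trajectories come from the student itself (on-policy), the divergence is evaluated exactly where the student actually places probability mass, which is what distinguishes this setup from standard off-policy distillation on teacher-generated data.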
Performance Outcomes
The effectiveness of OPCD has been validated across multiple domains, including:
- Mathematical reasoning
- Text-based games
- Domain-specific tasks
In these applications, OPCD consistently outperformed baseline methods, achieving higher task accuracy and demonstrating better preservation of out-of-distribution capabilities.
📰 Original Source: https://arxiv.org/abs/2602.12275v1