On-Policy Context Distillation for Language Models

A new framework called On-Policy Context Distillation (OPCD) enhances language models by training them on their own generated outputs so that knowledge supplied in context becomes internalized in the model's weights. The method consolidates experiential knowledge and distills system prompts, improving accuracy on tasks such as mathematical reasoning and text-based games. OPCD also enables knowledge transfer from larger to smaller models, outperforming existing baseline techniques.
On-Policy Context Distillation Framework Introduced for Language Models
A new framework, On-Policy Context Distillation (OPCD), has been proposed to enhance language models by enabling them to internalize in-context knowledge more effectively. The OPCD framework trains a student model using its own generated trajectories while minimizing the reverse Kullback-Leibler divergence against a context-conditioned teacher model. This method has shown promise in experiential knowledge distillation and system prompt distillation.
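The core training signal described above can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's implementation: it assumes the student has generated a trajectory of tokens, and that both the student and the context-conditioned teacher produce per-position logits over the vocabulary for that trajectory. The loss is the reverse Kullback-Leibler divergence KL(student || teacher), averaged over positions. All array shapes and names here are illustrative assumptions.

```python
import numpy as np

def reverse_kl(student_logits, teacher_logits):
    """KL(student || teacher) per position, summed over the vocabulary.

    Reverse KL is mode-seeking: the student is penalized for placing
    probability mass where the context-conditioned teacher assigns little.
    """
    s = np.exp(student_logits - student_logits.max(-1, keepdims=True))
    s /= s.sum(-1, keepdims=True)
    t = np.exp(teacher_logits - teacher_logits.max(-1, keepdims=True))
    t /= t.sum(-1, keepdims=True)
    return (s * (np.log(s) - np.log(t))).sum(-1)

# Hypothetical on-policy step: the student samples a trajectory, then
# both models score the student's own tokens (teacher sees the context).
rng = np.random.default_rng(0)
T, V = 4, 8  # trajectory length and vocabulary size (toy values)
student_logits = rng.normal(size=(T, V))
teacher_logits = student_logits + 0.1 * rng.normal(size=(T, V))
loss = reverse_kl(student_logits, teacher_logits).mean()
```

Because the trajectories come from the student itself (on-policy), the divergence is evaluated exactly where the student actually places probability mass, which is what distinguishes this setup from standard off-policy distillation on teacher-generated data.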
Performance Outcomes
The effectiveness of OPCD has been validated across multiple domains, including:
- Mathematical reasoning
- Text-based games
- Domain-specific tasks
In these applications, OPCD consistently outperformed baseline methods, achieving higher task accuracy and demonstrating better preservation of out-of-distribution capabilities.
📰 Original Source: https://arxiv.org/abs/2602.12275v1