Beyond VLM-Based Rewards: Diffusion-Native Latent Reward Modeling

Researchers have introduced DiNa-LRM, a diffusion-native latent reward model that formulates preference learning directly on noisy diffusion states. The approach uses a noise-calibrated Thurstone likelihood to account for the uncertainty that diffusion noise introduces into pairwise comparisons. DiNa-LRM outperforms existing diffusion-based reward models and is competitive with leading Vision-Language Models, while substantially reducing the time and memory required for model alignment.
New Diffusion-Native Reward Model Outperforms Vision-Language Models
DiNa-LRM, a new approach to preference optimization for diffusion models, improves on reward functions built from Vision-Language Models (VLMs) in both computational efficiency and alignment performance. Rather than scoring fully decoded images, it formulates preference learning directly on noisy diffusion states.
Current VLM-based reward functions impose high computational and memory costs during alignment. DiNa-LRM avoids these by introducing a noise-calibrated Thurstone likelihood, which adapts the preference model to the noise level of the latents being compared.
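The flavor of a noise-calibrated Thurstone likelihood can be sketched as a probit pairwise model whose comparison noise grows with the diffusion noise level. The function names and the specific scaling below are illustrative assumptions, not the paper's exact formulation:

```python
import math

def normal_cdf(x: float) -> float:
    # Standard normal CDF, computed via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def thurstone_nll(r_preferred: float, r_other: float, sigma_t: float) -> float:
    """Negative log-likelihood of a pairwise preference under a Thurstone
    model whose comparison noise is tied to the diffusion noise level
    sigma_t (a hypothetical calibration choice, not the paper's)."""
    # Each latent reward is treated as Gaussian with std sigma_t, so the
    # score difference has std sigma_t * sqrt(2).
    p_win = normal_cdf((r_preferred - r_other) / (sigma_t * math.sqrt(2.0)))
    return -math.log(max(p_win, 1e-12))

# At high noise levels (early diffusion steps) the same reward margin maps
# to a preference probability closer to 0.5, so ambiguous comparisons on
# very noisy latents are penalized less than the same margin at low noise.
```

In training, a loss of this shape would be averaged over preference pairs and diffusion timesteps, encouraging the latent reward model to separate preferred from dispreferred noisy states by margins judged relative to the noise at each step.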
Performance Metrics and Comparisons
On image alignment benchmarks, DiNa-LRM substantially outperforms existing diffusion-based reward models and reaches performance competitive with state-of-the-art VLMs at a fraction of the computational cost. This positions DiNa-LRM as a compelling alternative for preference optimization in diffusion model alignment.
📰 Original Source: https://arxiv.org/abs/2602.11146v1