Beyond VLM-Based Rewards: Diffusion-Native Latent Reward Modeling

Researchers have introduced DiNa-LRM, a diffusion-native latent reward model that formulates preference learning directly on noisy diffusion states. The approach uses a noise-calibrated Thurstone likelihood to account for the uncertainty that diffusion noise introduces into pairwise comparisons. DiNa-LRM outperforms existing diffusion-based reward models and is competitive with leading Vision-Language Models, while substantially reducing the time and memory required for model alignment.
New Diffusion-Native Reward Model Outperforms Vision-Language Models
DiNa-LRM, a new approach to preference optimization for diffusion models, improves on reward functions built from Vision-Language Models (VLMs) in both computational efficiency and alignment performance. Rather than scoring fully decoded images, it formulates preference learning directly on noisy diffusion states.
Current VLM-based reward functions impose high computational and memory costs during alignment. DiNa-LRM avoids these by introducing a noise-calibrated Thurstone likelihood, which adapts the preference model to the noise level of the latents being compared.
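The flavor of a noise-calibrated Thurstone likelihood can be sketched as a probit pairwise model whose comparison noise grows with the diffusion noise level. The function names and the specific scaling below are illustrative assumptions, not the paper's exact formulation:

```python
import math

def normal_cdf(x: float) -> float:
    # Standard normal CDF, computed via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def thurstone_nll(r_preferred: float, r_other: float, sigma_t: float) -> float:
    """Negative log-likelihood of a pairwise preference under a Thurstone
    model whose comparison noise is tied to the diffusion noise level
    sigma_t (a hypothetical calibration choice, not the paper's)."""
    # Each latent reward is treated as Gaussian with std sigma_t, so the
    # score difference has std sigma_t * sqrt(2).
    p_win = normal_cdf((r_preferred - r_other) / (sigma_t * math.sqrt(2.0)))
    return -math.log(max(p_win, 1e-12))

# At high noise levels (early diffusion steps) the same reward margin maps
# to a preference probability closer to 0.5, so ambiguous comparisons on
# very noisy latents are penalized less than the same margin at low noise.
```

In training, a loss of this shape would be averaged over preference pairs and diffusion timesteps, encouraging the latent reward model to separate preferred from dispreferred noisy states by margins judged relative to the noise at each step.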
Performance Metrics and Comparisons
On image alignment benchmarks, DiNa-LRM substantially outperforms existing diffusion-based reward models and reaches performance competitive with state-of-the-art VLMs at a fraction of the computational cost. This positions DiNa-LRM as a compelling alternative for preference optimization in diffusion model alignment.
📰 Original Source: https://arxiv.org/abs/2602.11146v1