
JUST-DUB-IT: Video Dubbing via Joint Audio-Visual Diffusion

Source: arXiv
Original Author: Anthony Chen et al.

Image generated by Gemini AI

Researchers have developed a novel approach to video dubbing built on a single audio-video diffusion model augmented with a lightweight LoRA. The method jointly generates translated speech and synchronized facial motion, switching a video's spoken language while preserving speaker identity and lip sync and maintaining visual quality. It outperforms traditional task-specific dubbing pipelines in real-world scenarios.

JUST-DUB-IT: Advancements in Video Dubbing Technology

A new approach to video dubbing, termed JUST-DUB-IT, leverages a foundation audio-video diffusion model to improve the quality and efficiency of the dubbing process. The method addresses the limitations of task-specific pipelines, which often falter in real-world conditions.
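
To make the joint-generation idea concrete, here is a minimal sketch of one denoising step over paired audio and video latents. The backbone `joint_model`, its signature, the step count, and the Euler-style update are illustrative assumptions, not details from the paper.

```python
import torch

def joint_denoise_step(joint_model, video_lat, audio_lat, t, cond):
    """One illustrative denoising step over paired latents.

    `joint_model` is a hypothetical backbone that attends across both
    modalities and returns a noise/velocity estimate for each, so lip
    motion and translated speech stay coupled at every timestep.
    """
    v_video, v_audio = joint_model(video_lat, audio_lat, t, cond)
    dt = 1.0 / 50  # step size for an assumed 50-step sampler
    # Euler update applied to each modality with the same timestep,
    # keeping the two latent trajectories synchronized.
    video_lat = video_lat - dt * v_video
    audio_lat = audio_lat - dt * v_audio
    return video_lat, audio_lat
```

Because one backbone predicts updates for both modalities at once, synchronization is built into sampling itself rather than enforced by a separate lip-sync stage.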

JUST-DUB-IT fine-tunes the base model with a low-rank adaptation (LoRA) for video-to-video dubbing, enabling the simultaneous generation of translated audio and synchronized facial movements and significantly improving the dubbing experience.
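
As a rough illustration of why LoRA keeps this adaptation lightweight, the sketch below wraps a frozen linear layer with trainable rank-r factors. The class name, rank, and scaling follow generic LoRA conventions and are not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Low-rank adapter wrapped around a frozen pretrained linear layer.

    Only the rank-r factors A and B are trained, so adapting a large
    audio-video diffusion backbone touches a tiny fraction of weights.
    """
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # freeze the pretrained weights
        # A is initialized small, B at zero, so training starts from
        # the unmodified pretrained behavior.
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus the trainable low-rank update: Wx + (BA)x
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```

In practice, adapters like this would replace the attention projections of the pretrained backbone, so dubbing-specific training updates only the small A and B matrices while the foundation model stays intact.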

Key benefits include:

  • High-quality dubbed videos with enhanced visual fidelity.
  • Improved lip synchronization, crucial for viewer engagement.
  • Robustness against complex motions and real-world dynamics.

Comparative evaluations demonstrate that this model surpasses existing dubbing pipelines, offering a more coherent and realistic dubbing experience.

Related Topics:

Video Dubbing, Audio-Visual Foundation Models, Multi-Modal Generation, LoRA, Lip Synchronization

📰 Original Source: https://arxiv.org/abs/2601.22143v1

All rights and credit belong to the original publisher.
