
Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation
Researchers have developed TransPhy3D, a synthetic video dataset of 11,000 sequences of transparent and reflective scenes rendered with Blender/Cycles. The dataset is used to train DKT, a video-to-video translation model that improves depth and surface-normal estimation for transparent objects. DKT achieves state-of-the-art performance on benchmarks such as ClearPose and raises grasping success rates on complex surfaces, demonstrating the potential of repurposing video diffusion models for perception tasks in robotics.










