ExposeAnyone: Personalized Audio-to-Expression Diffusion Models Are Robust Zero-Shot Face Forgery Detectors

Researchers have introduced ExposeAnyone, a self-supervised method for detecting deepfake manipulations using a diffusion model that generates expression sequences from audio. By personalizing the model to specific subjects, it computes identity distances between suspect videos and reference footage to identify forgeries. The approach outperforms existing methods by 4.22 AUC points, detects challenging Sora2-generated videos, and remains robust against distortions such as blur and compression, improving real-world applicability in face forgery detection.
ExposeAnyone: A Breakthrough in Face Forgery Detection
ExposeAnyone, a self-supervised model, significantly improves the detection of deepfake manipulations, particularly forgery types unseen during training. The approach uses a diffusion model that generates facial expression sequences from audio, enabling robust face forgery detection.
Most current methods rely on supervised training on known forgery types, which can lead to overfitting and poor generalization. ExposeAnyone instead employs a fully self-supervised framework: the model is personalized to a specific subject using a set of reference videos, and a suspect video is scored by its identity distance to that subject, derived from the diffusion model's reconstruction errors.
Performance Metrics
Experiments demonstrate ExposeAnyone's effectiveness across multiple datasets, including DF-TIMIT and DFDCP. Key findings include:
- A 4.22 percentage point improvement in average Area Under the Curve (AUC) over previous methods.
- Enhanced capability to detect videos generated by the Sora2 model.
- Strong robustness against common distortions such as blur and compression.
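For context on the headline metric: AUC (Area Under the ROC Curve) measures how often a randomly chosen fake video receives a higher forgery score than a randomly chosen real one. The standard rank-based (Mann-Whitney) computation can be written directly; this sketch is the generic definition, not code from the paper.

```python
def auc(real_scores, fake_scores):
    """AUC = probability that a random fake outscores a random real video.

    Counts pairwise wins (ties count as half) over all real/fake pairs;
    1.0 means perfect separation, 0.5 means chance-level detection.
    """
    wins = sum(
        (f > r) + 0.5 * (f == r)
        for f in fake_scores
        for r in real_scores
    )
    return wins / (len(real_scores) * len(fake_scores))
```

A detector assigning scores 0.8 and 0.9 to fakes and 0.1 and 0.2 to reals achieves an AUC of 1.0; a 4.22-point gain on this scale is substantial when baselines are already high.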
📰 Original Source: https://arxiv.org/abs/2601.02359v1
All rights and credit belong to the original publisher.