[WACV 2026] AuViRe: Audio-visual Speech Representation Reconstruction for Deepfake Temporal Localization (with Model Checkpoints)
Authors/Creators
Description
Model checkpoints for the WACV 2026 paper "AuViRe: Audio-visual Speech Representation Reconstruction for Deepfake Temporal Localization".
Abstract. With the rapid advancement of sophisticated synthetic audio-visual content, which enables, among other things, subtle malicious manipulations, ensuring the integrity of digital media has become paramount. This work presents a novel approach to temporal localization of deepfakes by leveraging Audio-Visual Speech Representation Reconstruction (AuViRe). Specifically, our approach reconstructs the speech representations of one modality (e.g., lip movements) based on the other (e.g., the audio waveform). Cross-modal reconstruction is significantly more challenging in manipulated video segments, leading to amplified discrepancies that provide robust discriminative cues for precise temporal forgery localization. AuViRe outperforms the state of the art by +8.9 AP@0.95 on LAV-DF, +9.6 AP@0.5 on AV-Deepfake1M, and +5.1 AUC in an in-the-wild experiment. Code is available at https://github.com/mever-team/auvire.
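To illustrate the core idea described above, the following is a minimal, hypothetical sketch of cross-modal reconstruction scoring: a toy module predicts visual (lip-movement) speech representations from audio features, and the per-frame reconstruction error serves as a localization cue. All module names, feature dimensions, and the use of PyTorch are illustrative assumptions and do not reflect the released implementation in the repository.

```python
import torch
import torch.nn as nn


class CrossModalReconstructor(nn.Module):
    """Toy reconstructor: predicts visual (lip-movement) speech representations
    from audio features. Architecture and dimensions are illustrative only."""

    def __init__(self, audio_dim=768, visual_dim=512, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(audio_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, visual_dim),
        )

    def forward(self, audio_feats):
        # audio_feats: (batch, time, audio_dim) -> (batch, time, visual_dim)
        return self.net(audio_feats)


def framewise_discrepancy(reconstructed, target):
    """Per-frame reconstruction error; larger values hint at manipulated segments."""
    return ((reconstructed - target) ** 2).mean(dim=-1)  # (batch, time)


if __name__ == "__main__":
    model = CrossModalReconstructor()
    audio = torch.randn(1, 100, 768)   # stand-in audio speech representations, 100 frames
    visual = torch.randn(1, 100, 512)  # stand-in lip-movement representations
    scores = framewise_discrepancy(model(audio), visual)
    print(scores.shape)  # torch.Size([1, 100]): one discrepancy score per frame
```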
Files (257.1 MB)

| Name | MD5 checksum | Size |
|---|---|---|
| (model checkpoint) | md5:098cb9d0676276e3705a5ab5f57507ad | 107.4 MB |
| (model checkpoint) | md5:a694ab3fce5a1706f03d51d7f04e0261 | 145.5 MB |
| WACV_2026_AuViRe.pdf | md5:d42c25bcd40a3a24db83815d392478d1 | 4.2 MB |
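The MD5 checksums listed above can be used to verify downloaded files. Below is a minimal verification sketch in Python; the two checkpoint file names are placeholders (only the PDF name is known here) and should be replaced with the actual names of the downloaded files.

```python
import hashlib
from pathlib import Path

# Expected checksums from the table above. The checkpoint names are placeholders;
# substitute the real file names after downloading from this record.
EXPECTED = {
    "checkpoint_1.ckpt": "098cb9d0676276e3705a5ab5f57507ad",   # placeholder name, 107.4 MB
    "checkpoint_2.ckpt": "a694ab3fce5a1706f03d51d7f04e0261",   # placeholder name, 145.5 MB
    "WACV_2026_AuViRe.pdf": "d42c25bcd40a3a24db83815d392478d1",
}


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large checkpoints are not loaded fully into memory."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    for name, expected in EXPECTED.items():
        path = Path(name)
        if not path.exists():
            print(f"{name}: missing")
            continue
        status = "OK" if md5sum(path) == expected else "MISMATCH"
        print(f"{name}: {status}")
```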
Additional details
Funding
Software
- Repository URL: https://github.com/mever-team/auvire
- Programming language: Python