Detecting tampered media across image, video, and audio with explainable reasoning.
Omni-modal VLMs trained to verify media across image, video, and audio by detecting semantic inconsistencies, temporal artifacts, and audio-visual mismatches. The model outputs chain-of-thought reasoning to make decisions explainable, and generalizes to attacks unseen at training time.
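The VLM itself reasons in language, so what follows is not its method. To make one cue concrete, an audio-visual mismatch, here is a classical baseline. It cross-correlates a per-frame audio-energy track with a per-frame mouth-opening track and finds the offset that aligns them best. This is a minimal sketch that assumes both signals have already been extracted at the same frame rate. The function names and the thresholds in `flag_av_mismatch` are hypothetical and chosen only for illustration.

```python
import numpy as np


def estimate_av_lag(audio_energy: np.ndarray, mouth_open: np.ndarray,
                    max_lag: int) -> tuple[int, float]:
    """Find the frame offset that best aligns audio with lip motion.

    A positive lag means the audio trails the video by `lag` frames,
    i.e. audio_energy[t + lag] best matches mouth_open[t].
    Both inputs must be 1-D arrays of equal length, sampled per video frame.
    """
    if audio_energy.shape != mouth_open.shape:
        raise ValueError("signals must have the same length")
    # z-score each track so the score ignores scale and offset
    a = (audio_energy - audio_energy.mean()) / (audio_energy.std() + 1e-8)
    m = (mouth_open - mouth_open.mean()) / (mouth_open.std() + 1e-8)
    n = len(a)
    best_lag, best_corr = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = a[lag:], m[:n - lag]   # audio shifted later than video
        else:
            x, y = a[:n + lag], m[-lag:]  # audio shifted earlier than video
        corr = float(np.mean(x * y))      # mean product over overlapping frames
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag, best_corr


def flag_av_mismatch(lag: int, corr: float,
                     max_offset: int = 2, min_corr: float = 0.3) -> bool:
    """Hypothetical decision rule: flag if the best alignment is far off or weak."""
    return bool(abs(lag) > max_offset or corr < min_corr)
```

In practice, a learned model would replace the hand-set thresholds. The sketch only shows that a "mismatch" is a measurable quantity, one that an explanation can point to.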
The thread connecting this to the rest of my work: trust models decay because attackers adapt. The right architectural answer is models that reason about why a piece of content is or isn't trustworthy, so that an update shifts the reasoning rather than requiring a black-box classifier to be retrained.
The closest published work is AuthGuard (Shen et al., 2026), which uses language-guided commonsense reasoning for deepfake detection and reports AUC gains of 6.15% on DFDC and 16.68% on DF40. Earlier, self-consistency learning (Zhao et al., 2021) detected deepfakes by testing whether source features are consistent across an image, since manipulated regions carry features from a different source than the rest. The model-diagnosis-and-correction framework (Chen et al., 2025) closes the loop on this thread: when the model errs, an automated system localizes the cause via attribute editing and synthesizes counterfactual training data to fix it.
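The self-consistency idea above is pairwise. Patches from the same source should have similar features, and patches from different sources should not. The minimal sketch below shows that objective. It assumes patch features are already extracted and that a manipulation mask is available at training time. Two choices here are simplifying assumptions, not necessarily the exact formulation in Zhao et al. (2021): mapping cosine similarity to [0, 1], and deriving targets as 1 - |m_i - m_j| from per-patch mask fractions. All function names are hypothetical.

```python
import numpy as np


def patch_labels(mask: np.ndarray, patch: int) -> np.ndarray:
    """Fraction of manipulated pixels in each non-overlapping patch, row-major."""
    H, W = mask.shape
    h, w = H // patch, W // patch
    cropped = mask[:h * patch, :w * patch]
    return cropped.reshape(h, patch, w, patch).mean(axis=(1, 3)).ravel()


def target_consistency(mask: np.ndarray, patch: int) -> np.ndarray:
    """Pairwise target: 1 when two patches share a source, 0 when they do not."""
    m = patch_labels(mask, patch)
    return 1.0 - np.abs(m[:, None] - m[None, :])


def predicted_consistency(features: np.ndarray) -> np.ndarray:
    """Map pairwise cosine similarity of patch features (N, D) to [0, 1]."""
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-8)
    return (f @ f.T + 1.0) / 2.0


def consistency_loss(pred: np.ndarray, target: np.ndarray,
                     eps: float = 1e-7) -> float:
    """Binary cross-entropy between predicted and target consistency maps."""
    p = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(p) + (1.0 - target) * np.log(1.0 - p)))
```

This supervision works over pairs of patches rather than a single real/fake label. As a result, training pushes the features to encode where an image's sources disagree, and that pattern can be inspected as a patch-by-patch map.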
@inproceedings{shen2026authguard,title={AuthGuard: Generalizable Deepfake Detection via Language Guidance},author={Shen, Guangyu and Li, Zhihua and Xu, Xiang and Zhao, Tianchen and Zhang, Zheng and An, Dongsheng and Tu, Zhuowen and Xing, Yifan and Zhang, Qin},booktitle={IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},year={2026}}
@inproceedings{chen2025model,title={Model Diagnosis and Correction via Linguistic and Implicit Attribute Editing},author={Chen, Xuanbai and Xu, Xiang and Li, Zhihua and Zhao, Tianchen and Perona, Pietro and Zhang, Qin and Xing, Yifan},booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},year={2025}}
@inproceedings{zhao2021selfconsistency,title={Learning Self-Consistency for Deepfake Detection},author={Zhao, Tianchen and Xu, Xiang and Xu, Mingze and Ding, Hui and Xiong, Yuanjun and Xia, Wei},booktitle={IEEE/CVF International Conference on Computer Vision (ICCV)},year={2021},note={Oral}}