Omni-Modal Trust & Verification

Omni-modal VLMs are trained to verify media across image, video, and audio by detecting semantic inconsistencies, temporal artifacts, and audio-visual mismatches. These models output chain-of-thought reasoning to make their decisions explainable, and generalize to attacks unseen at training time.

The thread connecting this to the rest of my work: trust models decay because attackers keep changing tactics. The right architectural answer is a model that reasons about why a piece of content is or isn’t trustworthy, so that an update shifts the reasoning instead of retraining a black-box classifier.

The closest published work is AuthGuard (Shen et al., 2026), which uses language-guided commonsense reasoning for deepfake detection (AUC gains of 6.15% on DFDC and 16.68% on DF40). Earlier deepfake work used self-consistency learning (Zhao et al., 2021) as the basis for source-feature inconsistency detection. The model-diagnosis-and-correction framework (Chen et al., 2025) closes the loop on this thread: when the model errs, an automated system localizes the cause via attribute editing and synthesizes counterfactual training data to fix it.
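For readers less familiar with the metric behind those numbers, here is a minimal sketch of how ROC AUC can be computed from detector scores, using the pairwise-ranking definition (the probability that a randomly chosen fake scores higher than a randomly chosen real sample). The function name `roc_auc` and all scores below are made up for illustration; they are not taken from AuthGuard, DFDC, or DF40, and the sketch makes no claim about whether the gains quoted above are absolute points or relative improvements.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the chance that a random fake outscores a random real.

    labels: 1 = manipulated (positive), 0 = authentic (negative).
    scores: higher means "more likely manipulated".
    A tie between a positive and a negative counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy, hypothetical scores for two detectors on the same six samples.
labels = [1, 1, 1, 0, 0, 0]
baseline_scores = [0.9, 0.4, 0.6, 0.5, 0.3, 0.7]
improved_scores = [0.9, 0.6, 0.8, 0.5, 0.3, 0.7]

baseline_auc = roc_auc(labels, baseline_scores)
improved_auc = roc_auc(labels, improved_scores)

# Difference expressed in absolute percentage points.
gain_points = 100.0 * (improved_auc - baseline_auc)
print(f"baseline={baseline_auc:.4f} improved={improved_auc:.4f} gain={gain_points:.2f} pts")
```

The quadratic loop is fine for a toy example; on a real benchmark a sort-based implementation or a library routine would be the more practical choice.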

References

2026

WACV

AuthGuard: Generalizable Deepfake Detection via Language Guidance

Guangyu Shen, Zhihua Li, Xiang Xu, and 6 more authors

In IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2026


@inproceedings{shen2026authguard,
  title = {AuthGuard: Generalizable Deepfake Detection via Language Guidance},
  author = {Shen, Guangyu and Li, Zhihua and Xu, Xiang and Zhao, Tianchen and Zhang, Zheng and An, Dongsheng and Tu, Zhuowen and Xing, Yifan and Zhang, Qin},
  booktitle = {IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
  year = {2026},
}

2025

CVPR

Model Diagnosis and Correction via Linguistic and Implicit Attribute Editing

Xuanbai Chen, Xiang Xu, Zhihua Li, and 4 more authors

In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025


@inproceedings{chen2025model,
  title = {Model Diagnosis and Correction via Linguistic and Implicit Attribute Editing},
  author = {Chen, Xuanbai and Xu, Xiang and Li, Zhihua and Zhao, Tianchen and Perona, Pietro and Zhang, Qin and Xing, Yifan},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2025},
}

2021

ICCV

Learning Self-Consistency for Deepfake Detection

Tianchen Zhao, Xiang Xu, Mingze Xu, and 3 more authors

In IEEE/CVF International Conference on Computer Vision (ICCV), 2021

Oral


@inproceedings{zhao2021selfconsistency,
  title = {Learning Self-Consistency for Deepfake Detection},
  author = {Zhao, Tianchen and Xu, Xiang and Xu, Mingze and Ding, Hui and Xiong, Yuanjun and Xia, Wei},
  booktitle = {IEEE/CVF International Conference on Computer Vision (ICCV)},
  year = {2021},
  note = {Oral}
}