Super Co-Alignment | Xiang (Sean) Xu

An end-to-end alignment pipeline for multi-modal foundation models. It combines multi-modal red-teaming, distributed evaluation with LLM-as-judge scoring, and adversarial post-training using preference optimization. It reduces attack success rate by an order of magnitude across text and vision while preserving task quality.
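To make the headline metric concrete, here is a minimal sketch of how attack success rate could be tallied per modality from judge verdicts. The `JudgedAttempt` record and the `attack_success_rate` helper are hypothetical names introduced for illustration; the page does not describe the actual data schema or judge interface.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class JudgedAttempt:
    """One red-team attempt plus the judge's verdict (hypothetical schema)."""
    modality: str           # e.g. "text" or "vision"
    attack_succeeded: bool  # True if the judge flagged the response as unsafe


def attack_success_rate(attempts: Iterable[JudgedAttempt]) -> Dict[str, float]:
    """Return, per modality, the fraction of attempts judged successful."""
    totals: Dict[str, int] = {}
    hits: Dict[str, int] = {}
    for attempt in attempts:
        totals[attempt.modality] = totals.get(attempt.modality, 0) + 1
        hits[attempt.modality] = hits.get(attempt.modality, 0) + int(attempt.attack_succeeded)
    return {modality: hits[modality] / totals[modality] for modality in totals}
```

Comparing the value returned before and after post-training, on the same attack set, is one way to check an "order of magnitude" claim: the post-training rate should be roughly a tenth of the baseline or lower.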

The thing the literature underweights: alignment in production is a moving target. Static reward models decay against adapting attackers, which is why I treat alignment as a continual learning problem and pair this pipeline with the self-evolving agent for continuous updates.
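One way to picture the continual-learning framing is a drift check that triggers another round of post-training when recent attack success creeps back up. The sketch below is an assumption for illustration only: the `needs_update` name, the threshold, and the window size are hypothetical and are not taken from the pipeline itself.

```python
from typing import Sequence


def needs_update(asr_history: Sequence[float],
                 threshold: float = 0.05,
                 window: int = 3) -> bool:
    """Return True when the mean attack success rate over the last
    `window` evaluation rounds exceeds `threshold`.

    `asr_history` holds one attack success rate per evaluation round,
    oldest first. Both defaults are illustrative, not tuned values.
    """
    if len(asr_history) < window:
        # Not enough rounds yet to judge a trend.
        return False
    recent = asr_history[-window:]
    return sum(recent) / window > threshold
```

Averaging over a window rather than reacting to a single round keeps one noisy evaluation from triggering a full update.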

The principle: a one-shot RLHF pass is the start of alignment, not the end. Production alignment lives in the loop. The adversarial component connects to the sharpness-aware optimization work for transferable attacks (Ye et al., 2024) and the principles-of-design work for remote anti-spoofing systems (Xu et al., 2024).

References

2024

CVPRW

Sharpness-Aware Optimization for Real-World Adversarial Attacks for Diverse Compute Platforms with Enhanced Transferability

Muchao Ye, Xiang Xu, Qin Zhang, and Jon Wu

In IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2024

@inproceedings{ye2024sharpness,
  title = {Sharpness-Aware Optimization for Real-World Adversarial Attacks for Diverse Compute Platforms with Enhanced Transferability},
  author = {Ye, Muchao and Xu, Xiang and Zhang, Qin and Wu, Jon},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)},
  year = {2024},
}

arXiv

Principles of Designing Robust Remote Face Anti-Spoofing Systems

Xiang Xu, Tianchen Zhao, Zheng Zhang, Zhihua Li, Jon Wu, Alessandro Achille, and Mani Srivastava

arXiv preprint arXiv:2406.03684, 2024

@article{xu2024principles,
  title = {Principles of Designing Robust Remote Face Anti-Spoofing Systems},
  author = {Xu, Xiang and Zhao, Tianchen and Zhang, Zheng and Li, Zhihua and Wu, Jon and Achille, Alessandro and Srivastava, Mani},
  journal = {arXiv preprint arXiv:2406.03684},
  year = {2024},
}