Xiangxin Zhou

I am a fifth-year Ph.D. student at the Institute of Automation, Chinese Academy of Sciences (CASIA) and the University of Chinese Academy of Sciences (UCAS), advised by Liang Wang.

I work on LLM RL and agents — the engine (principled algorithms that scale), the loop (intelligence that improves itself), and the interface (agents that act on the world).

I received my B.Eng. from Tsinghua University in 2021. I have interned at Tencent Hunyuan, Sea AI Lab, Xiaohongshu Hi Lab, ByteDance Seed, ByteDance AI Lab, and ByteDance AML.


Publications

* Equal Contribution   † Project Lead   # Corresponding Author
Rethinking the Divergence Regularization in LLM RL
Jiarui Yao*, Xiangxin Zhou*, Penghui Qi*, Wee Sun Lee, Liefeng Bo, Tianyu Pang#
arXiv Preprint, 2026
Rethinking the Trust Region in LLM Reinforcement Learning
Penghui Qi*, Xiangxin Zhou*, Zichen Liu, Tianyu Pang, Chao Du, Min Lin, Wee Sun Lee
International Conference on Machine Learning (ICML), 2026
Variational Reasoning for Language Models
Xiangxin Zhou*, Zichen Liu, Haonan Wang, Chao Du, Min Lin, Chongxuan Li, Liang Wang, Tianyu Pang*#
International Conference on Learning Representations (ICLR), 2026
Reinforcing General Reasoning Without Verifiers
Xiangxin Zhou*, Zichen Liu*, Anya Sims*, Haonan Wang, Tianyu Pang, Chongxuan Li, Liang Wang, Min Lin, Chao Du#
International Conference on Learning Representations (ICLR), 2026
Defeating the Training-Inference Mismatch via FP16
Penghui Qi*, Zichen Liu*, Xiangxin Zhou*, Tianyu Pang, Chao Du, Wee Sun Lee, Min Lin
Preprint, 2025
Stabilizing Policy Gradients for Stochastic Differential Equations via Consistency with Perturbation Process
Xiangxin Zhou, Liang Wang, Yichi Zhou#
International Conference on Machine Learning (ICML), 2024