Xiangxin Zhou

Email Scholar GitHub Twitter

I am a fifth-year Ph.D. student at the Institute of Automation, Chinese Academy of Sciences (CASIA) and the University of Chinese Academy of Sciences (UCAS), advised by Liang Wang.

I work on LLM RL and agents: the engine (principled algorithms that scale), the loop (intelligence that improves itself), and the interface (agents that act on the world).

I received my B.Eng. from Tsinghua University in 2021. I have interned at Tencent Hunyuan, Sea AI Lab, Xiaohongshu Hi Lab, ByteDance Seed, ByteDance AI Lab, and ByteDance AML.

Publications

* Equal Contribution, † Project Lead, # Corresponding Author

Rethinking the Divergence Regularization in LLM RL

Jiarui Yao*, Xiangxin Zhou*†, Penghui Qi*†, Wee Sun Lee, Liefeng Bo, Tianyu Pang#

arXiv Preprint, 2026

Rethinking the Trust Region in LLM Reinforcement Learning

Penghui Qi*, Xiangxin Zhou*, Zichen Liu, Tianyu Pang, Chao Du, Min Lin, Wee Sun Lee

International Conference on Machine Learning (ICML), 2026

Variational Reasoning for Language Models

Xiangxin Zhou*, Zichen Liu, Haonan Wang, Chao Du, Min Lin, Chongxuan Li, Liang Wang, Tianyu Pang*#

International Conference on Learning Representations (ICLR), 2026

Reinforcing General Reasoning Without Verifiers

Xiangxin Zhou*, Zichen Liu*, Anya Sims*, Haonan Wang, Tianyu Pang, Chongxuan Li, Liang Wang, Min Lin, Chao Du#

International Conference on Learning Representations (ICLR), 2026

Defeating the Training-Inference Mismatch via FP16

Penghui Qi*, Zichen Liu*, Xiangxin Zhou*, Tianyu Pang, Chao Du, Wee Sun Lee, Min Lin

arXiv Preprint, 2025

Stabilizing Policy Gradients for Stochastic Differential Equations via Consistency with Perturbation Process

Xiangxin Zhou, Liang Wang, Yichi Zhou#

International Conference on Machine Learning (ICML), 2024