Towards Robust Zero-Shot Reinforcement Learning

Abstract

The recent development of zero-shot reinforcement learning (RL) has opened a new avenue for learning pre-trained generalist policies that can adapt to arbitrary new tasks in a zero-shot manner. While the popular Forward-Backward (FB) representations and related methods have shown promise in zero-shot RL, we find empirically that their insufficiently expressive architectures, together with the extrapolation errors induced by out-of-distribution (OOD) actions during offline learning, often lead to biased and non-robust representations and, ultimately, suboptimal performance. To address these issues, we propose Behavior-Regularized Zero-shot RL with Expressivity enhancement (BREEZE), an upgraded FB-based framework that simultaneously improves offline learning stability, FB representation quality, and policy extraction capability. Specifically, BREEZE introduces behavioral regularization into zero-shot RL policy learning, transforming policy optimization into a stable in-sample learning paradigm. Additionally, BREEZE employs expressive self-attention-based architectures for the forward and backward representations, yielding more accurate successor-measure estimates that capture complicated environment-task relationships. Moreover, BREEZE extracts the policy with a guided task-conditioned diffusion model, enabling high-quality action synthesis while capturing the highly multimodal action distributions arising in zero-shot RL settings. Extensive experiments on ExORL and FrankaKitchen demonstrate that BREEZE achieves state-of-the-art performance while exhibiting superior robustness compared to prior offline zero-shot RL methods.
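For context, BREEZE builds on the standard FB factorization of the successor measure. The sketch below, with notation assumed from the original FB literature (forward map F, backward map B, data distribution ρ, task vector z, discount γ), recalls how a single pre-trained (F, B) pair yields a zero-shot policy for any reward; it is background rather than BREEZE's exact training objective.

```latex
% Successor measure of policy \pi_z, factorized by the forward/backward maps
% (standard FB formulation; BREEZE parameterizes F and B with attention-based networks):
M^{\pi_z}(s_0, a_0, X)
  = \sum_{t \ge 0} \gamma^t \, \Pr\!\left(s_{t+1} \in X \mid s_0, a_0, \pi_z\right)
  \approx \int_X F(s_0, a_0, z)^{\top} B(s') \, \rho(\mathrm{d}s').

% Zero-shot inference for a new reward r: embed r as a task vector, then act greedily.
z_r = \mathbb{E}_{s \sim \rho}\!\left[ r(s) \, B(s) \right],
\qquad
\pi_{z_r}(s) = \arg\max_{a} F(s, a, z_r)^{\top} z_r .
```

Under this view, the OOD-action issue the abstract describes arises because the maximization ranges over actions unsupported by the offline dataset; BREEZE's behavioral regularization constrains this step toward in-sample actions.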

Publication
In the Thirty-Ninth Conference on Neural Information Processing Systems (NeurIPS 2025)
Kexin ZHENG
Research Intern

Undergraduate student at The Chinese University of Hong Kong, Hong Kong

Yinan Zheng
PhD Candidate
Yu Luo
Research Scientist at Huawei
Xianyuan Zhan
Faculty Member