AIR-DREAM Lab
Paper-Conference
Query-Policy Misalignment in Preference-Based Reinforcement Learning
Preference-based reinforcement learning (PbRL) provides a natural way to align RL agents’ behavior with human desired outcomes, but is …
Xiao Hu, Jianxiong Li, Xianyuan Zhan, Qing-Shan Jia, Ya-Qin Zhang
Revealing the Mystery of Distribution Correction Estimation via Orthogonal-gradient Update
In this study, we investigate the DIstribution Correction Estimation (DICE) methods, an important line of work in offline reinforcement …
Liyuan Mao, Haoran Xu, Weinan Zhang, Xianyuan Zhan
Safe Offline Reinforcement Learning with Feasibility-Guided Diffusion Model
Safe offline reinforcement learning is a promising way to bypass risky online interactions towards safe policy learning. Most existing …
Yinan Zheng, Jianxiong Li, Dongjie Yu, Yujie Yang, Shengbo Eben Li, Xianyuan Zhan, Jingjing Liu
Look Beneath the Surface: Exploiting Fundamental Symmetry for Sample-Efficient Offline RL
Offline reinforcement learning (RL) offers an appealing approach to real-world tasks by learning policies from pre-collected datasets …
Peng Cheng, Xianyuan Zhan, Zhihao Wu, Wenjia Zhang, Shoucheng Song, Han Wang, Youfang Lin, Li Jiang
Offline Multi-Agent Reinforcement Learning with Implicit Global-to-Local Value Regularization
Offline reinforcement learning (RL) has received considerable attention in recent years due to its attractive capability of learning …
Xiangsen Wang, Haoran Xu, Yinan Zheng, Xianyuan Zhan
Offline RL with No OOD Actions: In-Sample Learning via Implicit Value Regularization
Based on the IVR framework, we further propose two practical algorithms, Sparse Q-learning (SQL) and Exponential Q-learning (EQL), which adopt the same value regularization used in existing works, but in a completely in-sample manner.
Haoran Xu, Li Jiang, Jianxiong Li, Zhuoran Yang, Zhaoran Wang, Victor Wai Kin Chan, Xianyuan Zhan
When Data Geometry Meets Deep Function: Generalizing Offline Reinforcement Learning
DOGE marries dataset geometry with deep function approximators in offline RL, and enables exploitation in generalizable OOD areas rather than strictly constraining the policy within the data distribution.
Jianxiong Li, Xianyuan Zhan, Haoran Xu, Xiangyu Zhu, Jingjing Liu, Ya-Qin Zhang
An Efficient Multi-Agent Optimization Approach for Coordinated Massive MIMO Beamforming
Beamforming plays an important role in 5G Massive Multiple-Input Multiple-Output (MMIMO) communications. Optimizing beamforming …
Li Jiang, Xiangsen Wang, Aidong Yang, Xidong Wang, Xiaojia Jin, Wei Wang, Xiaozhou Ye, Ye Ouyang, Xianyuan Zhan
Mind the Gap: Offline Policy Optimization for Imperfect Rewards
This paper proposes an offline policy optimization approach for imperfect rewards. Abstract: Reward function is essential in …
Jianxiong Li, Xiao Hu, Haoran Xu, Jingjing Liu, Xianyuan Zhan, Qing-Shan Jia, Ya-Qin Zhang
Offline Multi-Agent Reinforcement Learning with Coupled Value Factorization
Offline reinforcement learning (RL) that learns policies from offline datasets without environment interaction has received …
Xiangsen Wang, Xianyuan Zhan