AIR-DREAM Lab
Algorithms
Are Expressive Models Truly Necessary for Offline RL?
Among various branches of offline reinforcement learning (RL) methods, goal-conditioned supervised learning (GCSL) has gained …
Guan Wang, Haoyi Niu, Jianxiong Li, Li Jiang, Jianming HU, Xianyuan Zhan
Diffusion-DICE: In-Sample Diffusion Guidance for Offline Reinforcement Learning
One important property of DIstribution Correction Estimation (DICE) methods is that the solution is the optimal stationary distribution …
Liyuan Mao, Haoran Xu, Weinan Zhang, Xianyuan Zhan, Amy Zhang
Instruction-Guided Visual Masking
Instruction following is crucial in contemporary LLMs. However, when extended to a multimodal setting, it often suffers from misalignment …
Jinliang Zheng, Jianxiong Li, Sijie Cheng, Yinan Zheng, Jiaming Li, Jihao Liu, Yu Liu, Jingjing Liu, Xianyuan Zhan
Bidirectional-Reachable Hierarchical Reinforcement Learning with Mutually Responsive Policies
Hierarchical reinforcement learning (HRL) addresses complex long-horizon tasks by skillfully decomposing them into subgoals. Therefore, …
Yu Luo, Fuchun Sun, Tianying Ji, Xianyuan Zhan
OMPO: A Unified Framework for RL under Policy and Dynamics Shifts
Training reinforcement learning policies using environment interaction data collected from varying policies or dynamics presents a …
Yu Luo, Tianying Ji, Fuchun Sun, Jianwei Zhang, Huazhe Xu, Xianyuan Zhan
Offline-Boosted Actor-Critic: Adaptively Blending Optimal Historical Behaviors in Deep Off-Policy RL
Off-policy reinforcement learning (RL) has achieved notable success in tackling many complex real-world tasks, by leveraging previously …
Yu Luo, Tianying Ji, Fuchun Sun, Jianwei Zhang, Huazhe Xu, Xianyuan Zhan
DecisionNCE: Embodied Multimodal Representations via Implicit Preference Learning
Multimodal pretraining has emerged as an effective strategy for the trinity of goals of representation learning in autonomous robots: …
Jianxiong Li, Jinliang Zheng, Yinan Zheng, Liyuan Mao, Xiao Hu, Sijie Cheng, Haoyi Niu, Jihao Liu, Yu Liu, Jingjing Liu, and others
Seizing Serendipity: Exploiting the Value of Past Success in Off-Policy Actor-Critic
Learning high-quality Q-value functions plays a key role in the success of many modern off-policy deep reinforcement learning (RL) …
Tianying Ji, Yu Luo, Fuchun Sun, Xianyuan Zhan, Jianwei Zhang, Huazhe Xu
A Comprehensive Survey of Cross-Domain Policy Transfer for Embodied Agents
The burgeoning fields of robot learning and embodied AI have triggered an increasing demand for large quantities of data. However, …
Haoyi Niu, Jianming HU, Guyue Zhou, Xianyuan Zhan
Query-Policy Misalignment in Preference-Based Reinforcement Learning
Preference-based reinforcement learning (PbRL) provides a natural way to align RL agents’ behavior with human desired outcomes, but is …
Xiao Hu, Jianxiong Li, Xianyuan Zhan, Qing-Shan Jia, Ya-Qin Zhang