AIR-DREAM Lab
Algorithms
Bidirectional-Reachable Hierarchical Reinforcement Learning with Mutually Responsive Policies
Hierarchical reinforcement learning (HRL) addresses complex long-horizon tasks by skillfully decomposing them into subgoals. Therefore, …
Yu Luo, Fuchun Sun, Tianying Ji, Xianyuan Zhan
Instruction-Guided Visual Masking
Instruction following is crucial in contemporary LLMs. However, when extended to the multimodal setting, it often suffers from misalignment …
Jinliang Zheng, Jianxiong Li, Sijie Cheng, Yinan Zheng, Jiaming Li, Jihao Liu, Yu Liu, Jingjing Liu, Xianyuan Zhan
OMPO: A Unified Framework for RL under Policy and Dynamics Shifts
Training reinforcement learning policies using environment interaction data collected from varying policies or dynamics presents a …
Yu Luo, Tianying Ji, Fuchun Sun, Jianwei Zhang, Huazhe Xu, Xianyuan Zhan
Offline-Boosted Actor-Critic: Adaptively Blending Optimal Historical Behaviors in Deep Off-Policy RL
Off-policy reinforcement learning (RL) has achieved notable success in tackling many complex real-world tasks, by leveraging previously …
Yu Luo, Tianying Ji, Fuchun Sun, Jianwei Zhang, Huazhe Xu, Xianyuan Zhan
DecisionNCE: Embodied Multimodal Representations via Implicit Preference Learning
Multimodal pretraining has emerged as an effective strategy for the trinity of goals of representation learning in autonomous robots: …
Jianxiong Li, Jinliang Zheng, Yinan Zheng, Liyuan Mao, Xiao Hu, Sijie Cheng, Haoyi Niu, Jihao Liu, Yu Liu, Jingjing Liu, and others
Seizing Serendipity: Exploiting the Value of Past Success in Off-Policy Actor-Critic
Learning high-quality Q-value functions plays a key role in the success of many modern off-policy deep reinforcement learning (RL) …
Tianying Ji, Yu Luo, Fuchun Sun, Xianyuan Zhan, Jianwei Zhang, Huazhe Xu
A Comprehensive Survey of Cross-Domain Policy Transfer for Embodied Agents
The burgeoning fields of robot learning and embodied AI have triggered an increasing demand for large quantities of data. However, …
Haoyi Niu, Jianming Hu, Guyue Zhou, Xianyuan Zhan
Query-Policy Misalignment in Preference-Based Reinforcement Learning
Preference-based reinforcement learning (PbRL) provides a natural way to align RL agents’ behavior with human desired outcomes, but is …
Xiao Hu, Jianxiong Li, Xianyuan Zhan, Qing-Shan Jia, Ya-Qin Zhang
Revealing the Mystery of Distribution Correction Estimation via Orthogonal-gradient Update
In this study, we investigate the DIstribution Correction Estimation (DICE) methods, an important line of work in offline reinforcement …
Liyuan Mao, Haoran Xu, Weinan Zhang, Xianyuan Zhan
Safe Offline Reinforcement Learning with Feasibility-Guided Diffusion Model
Safe offline reinforcement learning is a promising way to bypass risky online interactions towards safe policy learning. Most existing …
Yinan Zheng, Jianxiong Li, Dongjie Yu, Yujie Yang, Shengbo Eben Li, Xianyuan Zhan, Jingjing Liu