AIR-DREAM Lab
Qing-Shan Jia
Latest
Query-Policy Misalignment in Preference-Based Reinforcement Learning
Mind the Gap: Offline Policy Optimization for Imperfect Rewards