Online Prototype Alignment for Few-shot Policy Transfer
Abstract
Domain adaptation in reinforcement learning (RL) mainly deals with changes in observations when a policy is transferred to a new environment. Many traditional domain adaptation approaches in RL learn a mapping function between the source and target domains, either explicitly or implicitly. However, they typically require access to abundant data from the target domain. Moreover, they often rely on visual cues to learn the mapping function and may fail when the source domain looks quite different from the target domain. To address these problems, we propose Online Prototype Alignment (OPA), a novel framework that learns the mapping function based on the functional similarity of elements and achieves few-shot policy transfer within only a few episodes. The key insight of OPA is an exploration mechanism that interacts with the unseen elements of the target domain in an efficient and purposeful manner and then connects them with the seen elements of the source domain according to their functionalities rather than visual cues. Experimental results show that when the target domain looks visually different from the source domain, OPA achieves better transfer performance with far fewer samples from the target domain, outperforming prior methods.
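The abstract describes matching unseen target-domain elements to known source-domain elements by how they behave rather than how they look. The sketch below is not the paper's algorithm; it is a minimal illustration, assuming each element has already been summarized into a hypothetical "functional embedding" (e.g. its observed effect on rewards and transitions during exploration), of assigning each target element to its nearest source prototype by cosine similarity. All names and vectors are invented for illustration.

import numpy as np

def align_prototypes(source_protos, target_embeds):
    # Map each unseen target element to the source prototype whose
    # functional embedding is most similar (cosine similarity).
    mapping = {}
    for t_name, t_vec in target_embeds.items():
        sims = {
            s_name: float(np.dot(t_vec, s_vec)
                          / (np.linalg.norm(t_vec) * np.linalg.norm(s_vec) + 1e-8))
            for s_name, s_vec in source_protos.items()
        }
        mapping[t_name] = max(sims, key=sims.get)
    return mapping

# Toy usage: embeddings summarizing how each element behaved during exploration.
source_protos = {"wall": np.array([1.0, 0.0]), "key": np.array([0.0, 1.0])}
target_embeds = {"blue_block": np.array([0.9, 0.1]), "gem": np.array([0.1, 0.95])}
print(align_protypes := align_prototypes(source_protos, target_embeds))
# {'blue_block': 'wall', 'gem': 'key'}

With such a mapping in hand, target-domain observations could be translated back into source-domain terms so the original policy can be reused, which is the general idea the abstract attributes to OPA.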