In reinforcement learning (RL), an agent aims to learn an optimal policy by interacting with the environment and collecting a reward for each action it takes. With the aid of powerful function approximators such as neural networks, RL has achieved tremendous empirical success in various scenarios, including game playing \citep{silver2016mastering, silver2017mastering},...