In this dissertation, we aim to develop algorithms that achieve optimality with provable complexity guarantees in several reinforcement learning (RL) settings. Specifically, within the framework of Markov decision processes (MDPs), we study single-agent online RL, multi-agent online RL, and offline RL in the presence of unobserved confounders. Single-agent online RL. We design...