In this dissertation, we aim to develop algorithms that achieve optimality with provable complexity guarantees in several reinforcement learning (RL) settings. Specifically, in Markov decision processes (MDPs), we study single-agent online RL, multi-agent online RL, and offline RL in the presence of unobserved confounders. Single-agent online RL. We design...
In this thesis, we aim to develop efficient algorithms with theoretical guarantees for noisy nonlinear optimization problems, with and without constraints, under various assumptions. Apart from Chapter 1, which provides relevant background, the remainder of the thesis is divided into four chapters. In Chapter 2, we establish the theoretical convergence...