Fast policy learning for linear quadratic control with entropy regularization
Xin Guo, University of California, Berkeley
Reinforcement Learning (RL) is a powerful framework for solving sequential decision-making problems. Fast convergence and sample efficiency are critical in many applied RL settings, such as financial trading and healthcare treatment recommendations, where acquiring new samples is costly or opportunities to explore new actions in the system are limited, and the cost of incorrect decisions can be prohibitively high. In this talk, we present two new policy learning methods for a class of discounted linear-quadratic control (LQC) problems over an infinite time horizon with entropy regularization, and show their linear and super-linear convergence in finding optimal policies. We also discuss how to apply transfer learning techniques in this RL setting. Based on joint work with Xinyu Li of UC Berkeley and Renyuan Xu of USC.
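For orientation, a minimal sketch of one common discrete-time formulation of entropy-regularized discounted LQC is given below; the notation (A, B, Q, R, the discount factor gamma, and the entropy weight tau) is introduced here only for illustration and is not necessarily the formulation used in the talk:

\[
\min_{\pi}\; J(\pi) \;=\; \mathbb{E}\!\left[\,\sum_{t=0}^{\infty}\gamma^{t}\Bigl(x_t^{\top} Q x_t + u_t^{\top} R u_t \;-\; \tau\,\mathcal{H}\bigl(\pi(\cdot\mid x_t)\bigr)\Bigr)\right],
\qquad x_{t+1} = A x_t + B u_t + w_t,\quad u_t \sim \pi(\cdot\mid x_t),
\]

where the minimization is over randomized (stochastic) policies and \(\mathcal{H}\) denotes differential entropy. In formulations of this type, the optimal randomized policy is known to be Gaussian with a mean that is linear in the state, so policy learning can be parameterized by a feedback gain together with an exploration covariance.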