Actor-Critic Reinforcement Learning in Continuous Time and Space
Host: 董玉超

Posted: 2022-09-03

Title: Actor-Critic Reinforcement Learning in Continuous Time and Space

Speaker: Dr. 贾颜玮 (Columbia University)

Venue: Online via Zoom (Meeting ID: 86710704222, Passcode: 413890)

Time: September 7, 2022, 21:30-22:30

Abstract: We study actor-critic (AC) algorithms for reinforcement learning (RL) in continuous time and space under the regularized exploratory formulation developed by Wang et al. (2020). The first part of the talk introduces a unified martingale perspective on policy evaluation (PE) and the temporal-difference error; PE is the fundamental step of AC algorithms, and this perspective does not depend on time discretization. We show that PE is equivalent to maintaining the martingale property of a certain process, and that the martingale characterization can be exploited in different ways to design PE algorithms. The second part develops a representation of the policy gradient (PG): computing the PG is transformed into an auxiliary PE problem, so the gradient can be computed from observed samples together with the value function. PG-based AC algorithms update the policy along the gradient-ascent direction and hence improve it. The last part studies the continuous-time counterpart of the Q-function, which we refer to as the "(little) q-function". This function is related to the instantaneous advantage rate function as well as the Hamiltonian. We jointly characterize the q-function and the value function by martingale conditions, so PE algorithms can be applied to learn them; we also establish a policy improvement theorem in terms of the q-function. Through this martingale perspective, we can recover and re-interpret many well-known RL algorithms and, more importantly, pin down the key element that bridges continuous-time RL to the existing theory of stochastic control and discrete-time RL. We demonstrate our RL algorithms on two toy examples (mean-variance portfolio selection and linear-quadratic control) with simulated data. This talk is based on several joint works with Prof. Xun Yu Zhou.
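
As a rough illustration of the martingale perspective (a schematic sketch with assumed notation, not the precise statements from the talk): let π be a fixed stochastic policy with state process X, running reward r, discount rate β, and entropy-regularization temperature γ as in the exploratory formulation of Wang et al. (2020). Policy evaluation then amounts to finding a function J such that

\[
M_s \;=\; e^{-\beta s}\, J(s, X_s) \;+\; \int_0^s e^{-\beta u}\Big( r(u, X_u, a_u) \;-\; \gamma \log \pi(a_u \mid u, X_u) \Big)\, du
\]

is a martingale, and the temporal-difference error measures the deviation from this condition along observed sample paths, with no time discretization involved. In the same spirit, the (little) q-function mentioned above can be thought of, loosely, as the instantaneous advantage rate

\[
q(t, x, a) \;\approx\; \partial_t J(t, x) \;+\; H\big(t, x, a, \partial_x J(t, x), \partial_{xx} J(t, x)\big) \;-\; \beta J(t, x),
\]

where H is the Hamiltonian of the underlying stochastic control problem.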

All are welcome to attend!