Optimal PhiBE — A Model-Free PDE-Based Framework for Continuous-Time Reinforcement Learning
Dr. Yuhua Zhu
January 7th, 2026
Abstract: This talk addresses continuous-time reinforcement learning (RL) in settings where the system dynamics are governed by a stochastic differential equation but remain unknown, with only discrete-time observations available. We introduce Optimal-PhiBE, an equation that integrates discrete-time information into a PDE, combining the strengths of both RL and PDE formulations. In linear-quadratic control, Optimal-PhiBE can even recover an accurate continuous-time optimal policy from discrete-time information alone. For general dynamics, Optimal-PhiBE is less sensitive to reward oscillations, leading to smaller discretization errors. Furthermore, we extend Optimal-PhiBE to higher orders, yielding increasingly accurate approximations. At the end of the talk, I will discuss how this technique can be leveraged to generate time-dependent samples and tackle goal-oriented inverse problems.
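For orientation, here is a plausible sketch in LaTeX of what a first-order Optimal-PhiBE equation could look like. This is an assumption based on the standard Hamilton-Jacobi-Bellman structure for discounted control of a diffusion, not the talk's own definition: the discount rate \(\beta\), the estimators \(\hat b_{\Delta t}\) and \(\hat\Sigma_{\Delta t}\), and the conditioning on a held action \(a\) are all assumed notation.

% Hedged sketch (assumed form, not taken from the talk). The idea: keep
% the PDE structure of the HJB equation, but replace the unknown drift
% and diffusion with moment estimators built from discrete-time
% observations of the state, sampled a time Delta t apart.
\[
  \beta V(s) \;=\; \max_{a}\Big[\, r(s,a)
    \;+\; \hat b_{\Delta t}(s,a)\cdot \nabla V(s)
    \;+\; \tfrac{1}{2}\,\hat\Sigma_{\Delta t}(s,a) : \nabla^2 V(s) \Big],
\]
% with first and second moments of the observed increment standing in
% for the true drift b(s,a) and diffusion matrix Sigma(s,a):
\[
  \hat b_{\Delta t}(s,a) \;=\; \frac{1}{\Delta t}\,
     \mathbb{E}\big[\, s_{\Delta t}-s_0 \;\big|\; s_0=s,\ a \,\big],
  \qquad
  \hat\Sigma_{\Delta t}(s,a) \;=\; \frac{1}{\Delta t}\,
     \mathbb{E}\big[\,(s_{\Delta t}-s_0)(s_{\Delta t}-s_0)^{\top} \;\big|\; s_0=s,\ a \,\big].
\]

Under this reading, the higher-order extensions mentioned in the abstract would presumably replace these first-order difference quotients with higher-order approximations of the generator, shrinking the discretization error in \(\Delta t\).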
Dr. Yuhua Zhu
Assistant Professor in the Department of Statistics and Data Science, UCLA
yuhua.zhu@stat.ucla.edu
Hosts: Yunbei Pan & Jiahang Sha