Paper: | PS-1A.35 |
Session: | Poster Session 1A |
Location: | Symphony/Overture |
Session Time: | Thursday, September 6, 16:30 - 18:30 |
Presentation Time: | Thursday, September 6, 16:30 - 18:30 |
Presentation: | Poster |
Publication: | 2018 Conference on Cognitive Computational Neuroscience, 5-8 September 2018, Philadelphia, Pennsylvania |
Paper Title: | Shaping Model-Free Reinforcement-Learning with Model-Based Pseudorewards |
DOI: | https://doi.org/10.32470/CCN.2018.1191-0 |
Authors: | Paul Krueger, Thomas Griffiths, Princeton University, United States |
Abstract: | Model-free (MF) and model-based (MB) reinforcement learning (RL) provide a successful framework for understanding human behavior and neural data. These two systems are usually thought to compete for control of behavior, but it has also been proposed that they can be integrated cooperatively. The Dyna algorithm uses MB replay of past experience to train the MF system, and has inspired research examining whether human learners do something similar. Here we introduce an approach that links MF and MB learning in a new way: via the reward function. Given a model of the learning environment, dynamic programming is used to iteratively approximate state values that converge monotonically to the state values under the optimal policy. Pseudorewards calculated from these values are used to shape the reward function of an MF learner in a way that is guaranteed not to change the optimal policy. We show that this method offers computational advantages over Dyna. It also offers a new way to think about integrating MF and MB RL: our knowledge of the world doesn't just provide a source of simulated experience for training our instincts; it shapes the rewards that those instincts latch onto. We discuss psychological phenomena to which this theory could apply. |
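The guarantee described in the abstract (shaping that cannot change the optimal policy) matches the standard potential-based shaping construction of Ng, Harada & Russell (1999). The sketch below is illustrative only, not the authors' code: a few value-iteration sweeps over an assumed tabular model yield approximate state values, which define pseudorewards added to the rewards seen by a model-free Q-learner. All function names and the random MDP are hypothetical.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, n_sweeps=20):
    """Approximate state values with a fixed number of DP sweeps.
    P: transition probabilities, shape (S, A, S); R: rewards, shape (S, A)."""
    S, A, _ = P.shape
    V = np.zeros(S)
    for _ in range(n_sweeps):
        Q = R + gamma * (P @ V)   # expected backup, shape (S, A)
        V = Q.max(axis=1)         # greedy over actions
    return V

def shaped_reward(r, s, s_next, V, gamma=0.95):
    """Environment reward plus model-based pseudoreward gamma*V[s'] - V[s];
    potential-based shaping of this form leaves the optimal policy unchanged."""
    return r + gamma * V[s_next] - V[s]

def q_learning_step(Q, s, a, r_shaped, s_next, alpha=0.1, gamma=0.95):
    """One model-free Q-learning update driven by the shaped reward."""
    target = r_shaped + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

if __name__ == "__main__":
    # Tiny random MDP, purely for demonstration.
    rng = np.random.default_rng(0)
    S, A = 5, 2
    P = rng.dirichlet(np.ones(S), size=(S, A))  # transition model (S, A, S)
    R = rng.normal(size=(S, A))                 # reward model (S, A)
    V = value_iteration(P, R)                   # model-based value estimates
    Q = np.zeros((S, A))
    s = 0
    for _ in range(1000):
        a = rng.integers(A)                     # random exploration policy
        s_next = rng.choice(S, p=P[s, a])
        r = shaped_reward(R[s, a], s, s_next, V)
        Q = q_learning_step(Q, s, a, r, s_next)
        s = s_next
```

In this sketch the model-based system contributes only through the pseudoreward term, so the MF update rule itself is untouched; how closely this mirrors the paper's experiments is an assumption based on the abstract.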