Paper: PS-1A.12
Session: Poster Session 1A
Location: Symphony/Overture
Session Time: Thursday, September 6, 16:30 - 18:30
Presentation Time: Thursday, September 6, 16:30 - 18:30
Presentation: Poster
Publication: 2018 Conference on Cognitive Computational Neuroscience, 5-8 September 2018, Philadelphia, Pennsylvania
Paper Title: Humans can outperform Q-learning in terms of learning efficiency
DOI: https://doi.org/10.32470/CCN.2018.1045-0
Authors: Holger Mohr, Katharina Zwosta, Dimitrije Markovic, Sebastian Bitzer, Uta Wolfensteller, Hannes Ruge, Technische Universität Dresden, Germany
Abstract: Reinforcement learning algorithms such as Q-learning maximize cumulative reward by strengthening or weakening stimulus-response associations based on feedback signals. It has recently been shown that Q-learning can achieve super-human performance on complex tasks (Mnih et al., 2015). However, this requires extensive training involving thousands of episodes, suggesting that humans might outperform Q-learning in terms of learning efficiency. Here we show this explicitly by analyzing human learning strategies on a simple stimulus-response learning task involving only four stimuli and four response options (Mohr et al., 2018). By comparing response data from N = 85 subjects with response data generated by the Q-learning algorithm, we show that humans explore the space of stimulus-response pairings more efficiently than Q-learning on the presented learning task. Moreover, using additional computational models, we show that the subjects accomplished this by integrating implicit task structure into their learning strategies, with some subjects implementing specific response heuristics to maximize learning efficiency while keeping memory and computational resources bounded. We conclude that by engaging high-level cognitive processes, humans can minimize the number of errors during learning and thus outperform Q-learning in terms of learning efficiency.
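To illustrate the contrast the abstract draws, the sketch below contrasts one-step tabular Q-learning with a simple elimination heuristic that exploits the implicit one-to-one task structure on a 4-stimulus / 4-response task. This is a minimal illustration, not the paper's models: the learning rate, exploration rate, trial count, reward scheme, and the specific heuristic are all illustrative assumptions.

```python
import numpy as np

N = 4  # four stimuli, four response options

def run_q_learning(rng, trials=100, alpha=0.5, epsilon=0.1):
    """One-step tabular Q-learning with epsilon-greedy responding.
    alpha and epsilon are assumed values, not the paper's settings."""
    correct = rng.permutation(N)            # hidden one-to-one S-R mapping
    Q = np.zeros((N, N))                    # association strength per pairing
    errors = 0
    for _ in range(trials):
        s = int(rng.integers(N))            # stimulus presented on this trial
        if rng.random() < epsilon:
            a = int(rng.integers(N))        # explore
        else:
            a = int(np.argmax(Q[s]))        # exploit strongest association
        r = 1.0 if a == correct[s] else 0.0
        Q[s, a] += alpha * (r - Q[s, a])    # strengthen/weaken on feedback
        errors += r == 0.0
    return errors

def run_elimination(rng, trials=100):
    """Heuristic exploiting the one-to-one task structure: never retry a
    response that failed for a stimulus, and never try a response already
    confirmed for another stimulus."""
    correct = rng.permutation(N)
    known = {}                              # stimulus -> confirmed response
    ruled_out = {s: set() for s in range(N)}
    errors = 0
    for _ in range(trials):
        s = int(rng.integers(N))
        if s in known:
            continue                        # mapping confirmed, no error possible
        candidates = [a for a in range(N)
                      if a not in ruled_out[s] and a not in known.values()]
        a = int(rng.choice(candidates))
        if a == correct[s]:
            known[s] = a
        else:
            ruled_out[s].add(a)
            errors += 1
    return errors

q_err = np.mean([run_q_learning(np.random.default_rng(i)) for i in range(100)])
h_err = np.mean([run_elimination(np.random.default_rng(i)) for i in range(100)])
print(f"mean errors -- Q-learning: {q_err:.1f}, elimination heuristic: {h_err:.1f}")
```

Under these assumptions, the elimination heuristic commits at most three errors per stimulus and typically far fewer, whereas the Q-learner must relearn each pairing independently through trial and error, which is the kind of efficiency gap the abstract attributes to structure-aware human strategies.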