Rangachari Anand, Kishan Mehrotra, et al.
IEEE Transactions on Neural Networks
This paper describes a general approach for automatically programming a behavior-based robot. New behaviors are learned by trial and error, using a performance feedback function as reinforcement. Two algorithms for behavior learning are described that combine Q-learning, a well-known scheme for propagating reinforcement values temporally across actions, with statistical clustering and Hamming distance, two ways of propagating reinforcement values spatially across states. A real behavior-based robot called OBELIX is described that learns several component behaviors in an example task involving pushing boxes. A simulator for the box-pushing task is also used to gather data on the learning techniques. A detailed experimental study using the real robot and the simulator suggests two conclusions: (1) the learning techniques are able to learn the individual behaviors, sometimes outperforming a hand-coded program; and (2) using a behavior-based architecture speeds up reinforcement learning by converting the problem of learning a complex task into that of learning a simpler set of special-purpose reactive subtasks.
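The abstract's combination of temporal credit propagation (Q-learning) with spatial propagation across similar states (Hamming distance) can be illustrated with a short sketch. The bit-vector state encoding, the distance threshold, the linear distance weighting, and all constants below are illustrative assumptions; the abstract names the two mechanisms but not the exact update rule.

```python
"""Sketch of Q-learning with Hamming-distance generalization.

Assumptions (not from the paper): binary sensor states of N_BITS bits,
a linear distance-based weighting, and the constants below.
"""
import random

N_BITS = 8          # assumed width of the binary sensor-state vector
ACTIONS = ["forward", "left", "right"]
ALPHA = 0.5         # learning rate (assumed)
GAMMA = 0.9         # discount factor (assumed)
H_MAX = 2           # spread updates to states within this Hamming distance

# Q-table over all 2^N_BITS binary states x actions, initialized to 0.
Q = {(s, a): 0.0 for s in range(2 ** N_BITS) for a in ACTIONS}

def hamming(s1: int, s2: int) -> int:
    """Number of differing bits between two binary state codes."""
    return bin(s1 ^ s2).count("1")

def update(s: int, a: str, reward: float, s_next: int) -> None:
    """One temporal-difference backup, spread spatially to similar states.

    The temporal part is standard Q-learning; the spatial part weights
    each neighbor's update by how many sensor bits it shares with the
    visited state (an assumed weighting scheme).
    """
    target = reward + GAMMA * max(Q[(s_next, b)] for b in ACTIONS)
    for s2 in range(2 ** N_BITS):
        d = hamming(s, s2)
        if d <= H_MAX:
            w = (N_BITS - d) / N_BITS   # closer states take larger steps
            Q[(s2, a)] += ALPHA * w * (target - Q[(s2, a)])

def act(s: int, epsilon: float = 0.1) -> str:
    """Epsilon-greedy action selection over the learned Q-values."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(s, a)])

# Example: one simulated experience tuple updates the 37 states within
# Hamming distance 2 of the visited state, not just the state itself.
update(s=0b00001011, a="forward", reward=1.0, s_next=0b00001111)
```

The point of the spatial spread is sample efficiency on a real robot: a single experienced transition adjusts the value estimates of many structurally similar sensor states at once, rather than waiting to visit each one.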
Aditya Malik, Nalini Ratha, et al.
CAI 2024
Annina Riedhauser, Viacheslav Snigirev, et al.
CLEO 2023
Erik Altman, Jovan Blanusa, et al.
NeurIPS 2023