Feb 15, 2009 21:29
Stout, Konidaris, Barto, "Intrinsically Motivated Reinforcement Learning: A Promising Framework for Developmental Robot Learning." and,
Konidaris, Hayes, "An Architecture for Behavior-Based Reinforcement Learning."
I mentioned that one of the elements missing from Brooks' experiments (at least before Cog and Kismet, which have other issues) was learning; the agents displayed an ability to act reminiscent of animal-level intelligence, but no part of their control structure adapted to their experience and environment. Both papers try to address reinforcement learning (apparently the most biologically plausible method) in an embodied AI. ("Reinforcement learning" refers to methods in which a system is "rewarded" when certain goals are met, as opposed to methods which explicitly specify the correct behavior to the system (supervised learning) or provide no feedback whatsoever (unsupervised learning).) By putting a learning-capable "layer" on top of more primitive layers like those implemented in Brooks' experiments, the learning system can make use of the sensory abstraction and motor capabilities of the lower layers; for example, simpler non-learning layers might recognize the sensory patterns corresponding to a giant monster, and likewise contain methods for running away from a specified direction, so that learning "run away from monsters" becomes simply a matter of recognizing a correspondence between the inputs and actions the lower layers already provide. It is also interesting to note that the researchers differentiate between "intrinsic" and "extrinsic" reward signals: a robot may be rewarded by certain external events it is supposed to try to achieve (finding a "food puck," for example), but it can also contain within itself conditions that trigger a reward signal (for example, encountering a situation from which it expects to find a "food puck" soon).
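To make the intrinsic/extrinsic distinction concrete, here is a minimal sketch of my own (not code from the papers; the function names and the 0.5 weighting are invented) of an agent whose total reward sums an externally delivered signal with one it generates from its own predictions:

```python
# Hypothetical illustration of intrinsic vs. extrinsic reward, not the papers' code.

def extrinsic_reward(observation):
    # Reward delivered by the external task, e.g. actually reaching a food puck.
    return 1.0 if observation.get("found_food_puck") else 0.0

def intrinsic_reward(predicted_food_probability):
    # Internally generated signal: the robot rewards itself when its own model
    # says a food puck is likely nearby, even before one is actually found.
    # The 0.5 weighting is an arbitrary choice for this sketch.
    return 0.5 * predicted_food_probability

def total_reward(observation, predicted_food_probability):
    return extrinsic_reward(observation) + intrinsic_reward(predicted_food_probability)

print(total_reward({"found_food_puck": False}, predicted_food_probability=0.8))  # 0.4
print(total_reward({"found_food_puck": True}, predicted_food_probability=0.1))   # 1.05
```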
The experiments described in the papers, however, have considerable shortcomings of their own. The first doesn't use a real robot at all; instead it uses a simulated agent acting within a grid-world in which the locations and identities of everything it can see are specified to it, and in which the robot's fundamental motor actions are as high-level as "move to [thing]." The experimenters argue that such an abstraction could be achieved by lower perceptual and motor layers, but many of the things they assume to work perfectly (object identification, navigation, recognition of external variables like lighting and sound) would likely be much less reliable in a real robot, which could have unknown effects on their model. The second experiment does use a robot, to their credit, but does so in a drastically simplified maze environment of orthogonal walls and standard distances, in which the robot's ability to locate itself is far easier than it would be in a "real-world" environment; given that their control model relies utterly on the robot knowing where it is and recognizing "landmarks," this seems like a very important limitation. (I suppose the rhetoric of situated AI researchers has made me sensitive to reliance on experimental cleanliness.) Essentially, the learning method in both experiments relies on the robot recognizing given "states" (such as being in a given place facing a given direction), learning how the various states can be navigated between (from here one can go there, and so on), and learning which states entail rewards. But if state recognition breaks down, or even if the transitions between states become unpredictable, the method would likely turn out to be fragile. Regardless, both experiments work, so there is certainly something to the concept, if not the method itself.
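For concreteness, here is a minimal tabular Q-learning sketch of that state/transition/reward scheme (again my own illustration, not the papers' method; the state names, transitions, and parameters are invented). The point is that everything hinges on the state labels being recognized reliably:

```python
# Minimal tabular Q-learning over hypothetical, hand-named states.
# In the papers the "states" would be recognized places or landmarks.
import random
from collections import defaultdict

states = ["hall", "junction", "dead_end", "food"]
actions = ["forward", "left", "right"]

def step(state, action):
    # Hypothetical deterministic transitions; a real robot's would be noisy.
    transitions = {
        ("hall", "forward"): "junction",
        ("junction", "left"): "food",
        ("junction", "right"): "dead_end",
    }
    next_state = transitions.get((state, action), state)
    reward = 1.0 if next_state == "food" else 0.0
    return next_state, reward

Q = defaultdict(float)            # Q[(state, action)] -> expected future reward
alpha, gamma, epsilon = 0.1, 0.9, 0.2

for episode in range(500):
    state = "hall"
    for _ in range(10):
        # Epsilon-greedy action selection.
        if random.random() < epsilon:
            action = random.choice(actions)
        else:
            action = max(actions, key=lambda a: Q[(state, a)])
        next_state, reward = step(state, action)
        # Standard Q-learning update toward reward plus discounted best next value.
        best_next = max(Q[(next_state, a)] for a in actions)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        if next_state == "food":
            break
        state = next_state

print(max(actions, key=lambda a: Q[("junction", a)]))  # should learn "left"
```

If "junction" were sometimes misrecognized as "hall," the learned values would smear across states, which is exactly the fragility I'm worried about above.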
As the recent direction of my research may indicate, I'm considering building a robot for my senior project. But I would consider it a major point of the project for it to be capable of acting in at least somewhat complicated environments (such as the hallways of Dickenson, or perhaps even Commons Lawn; VAPA would probably be asking too much).