
Reinforcement Learning with Physical Robots

During my stay in Denmark in 2002 and 2003 I was involved in a project on reinforcement learning (besides doing my graduation project on robot localization and Kalman filters). I did this project with Stine Søndergaard, Katrine Hommelhoff Jensen, and Anders Christian Kølle. The goal of the project was to explore and apply reinforcement learning techniques to practical robots. This page gives a non-technical impression of what the project was about.

Project Goal

The goal of the project was to look at reinforcement learning methods and apply them in practice to physical robots. We chose the maze problem as the problem to be solved by reinforcement learning. In the maze problem a robot is situated in an environment that is unknown to the robot. The environment may be filled with obstacles. The goal of the robot is to find its way through the environment to some goal location; however, it does not know this location in advance. By trial and error it has to find out how to get to the goal location, building up experience about the environment. Over time the robot becomes more and more familiar with driving around in the environment, allowing it to find the goal location more and more quickly.


For the project we used so-called Eyebots. At the time of the project three Eyebots lived in the robolab. Each of the robots is equipped with a camera, three infrared distance sensors, two wheels, a kicker to push something away, and a radio communication module. For the purpose of our project we created several hats that one of the robots (the middle one in the picture) could wear in order to determine its location in the environment more easily. See also a little movie, and another one.

Besides the robots, the robolab contains a camera overlooking a region of about 1 m × 1.5 m. This camera is connected to a computer, which also has a radio communication module to communicate with the robots.

We used white polystyrene foam blocks ("flamingo" in Danish) to create a closed environment with obstacles in it in the region under the top camera. The camera monitored the environment constantly, while the computer analyzed the resulting images.

In order to perform reinforcement learning, a robot driving through the environment has to know its location. Due to inaccuracies in the driving of robots it is not a trivial task to obtain an accurate location estimate. One way to deal with this is by using Kalman filters, as I describe in my Master's thesis. In this project, however, we assumed that a robot has access to its location through some sort of GPS system. We used the hat with lights that we specifically designed for tracking the robot through the images made by the top camera, analyzed by the computer. Once the computer had detected a robot, it used the radio communication to send this location to the robot.
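The idea of tracking the light hat in the camera images can be sketched roughly as follows. This is a simplified illustration, not the project's actual tracking code: it treats a frame as a grid of brightness values and takes the centroid of all sufficiently bright pixels; the function name and threshold are assumptions.

```python
# Hypothetical sketch: locate the robot's light hat in a grayscale camera
# frame by taking the centroid of all pixels above a brightness threshold.
def locate_robot(frame, threshold=200):
    """Return the (row, col) centroid of bright pixels, or None."""
    bright = [(r, c)
              for r, row in enumerate(frame)
              for c, value in enumerate(row)
              if value >= threshold]
    if not bright:
        return None  # hat not visible in this frame
    n = len(bright)
    return (sum(r for r, _ in bright) / n,
            sum(c for _, c in bright) / n)

# A tiny 4x4 "frame" with one bright 2x2 patch in the top-left corner.
frame = [[255, 255, 0, 0],
         [255, 255, 0, 0],
         [0,   0,   0, 0],
         [0,   0,   0, 0]]
print(locate_robot(frame))  # → (0.5, 0.5)
```

A real tracker would also have to deal with noise, multiple light blobs, and camera calibration, but the centroid idea captures the essence.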


For easy monitoring of the activities of the robot while driving through the maze, and for dealing with the GPS system, we developed software running on the PC. The figure shows the area being monitored by the top camera. The software shows (in real time) the image given by the camera and plots the trajectory driven by the robot on top of that. Furthermore, the software discretizes the space of locations into squares, to reduce the computational complexity of the learning. The location of a robot is then simply the square in which it is located. This location is used as the GPS signal, sent over the radio communication link.
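The discretization step can be made concrete with a small sketch. The area dimensions come from the text; the grid resolution and function name are illustrative assumptions, not the project's actual values:

```python
# Illustrative sketch (not the original code): map a continuous position
# in the monitored 1 m x 1.5 m area onto a grid of discrete square cells.
WIDTH, HEIGHT = 1.0, 1.5   # monitored area in metres (from the text)
COLS, ROWS = 8, 12         # grid resolution is an assumption

def to_cell(x, y):
    """Return the (row, col) grid square containing position (x, y)."""
    col = min(int(x / WIDTH * COLS), COLS - 1)
    row = min(int(y / HEIGHT * ROWS), ROWS - 1)
    return row, col

print(to_cell(0.0, 0.0))    # → (0, 0)
print(to_cell(0.99, 1.49))  # → (11, 7)
```

With such a mapping, the learning problem shrinks from a continuous space to a small, finite set of states, which is what makes a simple table-based learner feasible.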

Besides monitoring the activities of the robot and providing it with location information, the software can also download the experience built up by the robot. This experience is stored in the form of a Q-table. The software can also detect the location squares in which there is an obstacle in the environment.

With this information and the experience from the robot, the software may perform several learning iterations. This speeds up learning significantly compared to learning in the real environment alone. After a number of learning iterations the Q-table may be uploaded back to the robot, after which it continues driving with the newly gained experience.
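The extra learning iterations on the PC can be sketched as replaying stored transitions through the standard Q-learning update. This is a hedged illustration of the idea, not the project's code: the function names, the learning rate, and the discount factor are all assumptions.

```python
from collections import defaultdict

ALPHA, GAMMA = 0.5, 0.9  # learning rate and discount factor (assumed values)

def replay(q, experience, sweeps=50):
    """Sweep the Q-learning update repeatedly over stored transitions."""
    for _ in range(sweeps):
        for s, a, r, s_next in experience:
            best_next = max(q[s_next].values(), default=0.0)
            q[s][a] += ALPHA * (r + GAMMA * best_next - q[s][a])
    return q

# Toy example: from cell 0, action "right" reached the goal (cell 1) with
# reward +100; repeated sweeps drive the table entry toward that value.
q = defaultdict(lambda: defaultdict(float))
experience = [(0, "right", 100.0, 1)]
replay(q, experience)
print(q[0]["right"])  # → close to 100.0
```

Running many such sweeps on the PC is cheap compared to physically driving the robot, which is why this sped up the learning so much.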

We also developed software in C to run on the robot. Through the LCD screen we could obtain some information about the activities of the robot, in particular the choice of actions, its location, and gained experiences.
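The robot's choice of actions can be illustrated with epsilon-greedy selection, a standard way for a reinforcement learner to balance exploring the maze against exploiting what it has already learned. The text does not say which exploration strategy the project used, so the strategy, action names, and parameter values here are assumptions:

```python
import random

# Assumed action set for a two-wheeled robot driving on a grid.
ACTIONS = ["forward", "left", "right"]

def choose_action(q_row, epsilon=0.1):
    """With probability epsilon pick a random action (exploration),
    otherwise pick the action with the highest Q-value (exploitation)."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_row.get(a, 0.0))

# With epsilon=0.0 the choice is purely greedy.
print(choose_action({"forward": 1.0, "left": 0.0, "right": 0.0},
                    epsilon=0.0))  # → forward
```

On the real robot, displaying the chosen action and the current cell on the LCD screen, as described above, is what made this decision process observable.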


To experiment with the use of reinforcement learning in the maze problem we constructed several test environments. One of them is shown in the right image. Whenever the robot tries to drive into a wall or into an object it receives a highly negative reward. Whenever the robot reaches the goal location it obtains a highly positive reward.
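The reward scheme described above can be written down as a small function. The text only says "highly negative" and "highly positive", so the numeric values here are illustrative assumptions:

```python
# Hedged sketch of the reward scheme: the structure follows the text,
# the magnitudes (+/-100) are assumptions.
def reward(cell, walls, goal):
    """Reward for the robot arriving at a given grid cell."""
    if cell in walls:
        return -100.0   # driving into a wall or obstacle
    if cell == goal:
        return 100.0    # reaching the goal location
    return 0.0          # an ordinary driving step

walls = {(0, 1), (1, 1)}
goal = (2, 2)
print(reward((0, 1), walls, goal))  # → -100.0
print(reward((2, 2), walls, goal))  # → 100.0
```

Sparse rewards like these are exactly why the robot's first trials look aimless: until it stumbles on the goal once, every step looks equally (un)promising.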

Consider the trial in the image on the right. The robot started at the bottom-right corner and the goal was to find the cylinder placed in the top-left corner. The robot started with no information at all about the environment. This clearly resulted in the robot driving around without knowing where to go, without any real incentive toward a particular direction.

After the first run we placed the robot at several other locations to let it experience more of the environment. This finally resulted in the robot being able to directly drive from the starting position of the first trial to the goal location, as the figure on the left shows.

This result shows that we were able to make a robot find its way to a goal location in an unknown environment using reinforcement learning. It should be noted that applying a technique like reinforcement learning in a practical setting brings a large number of practical problems. These problems have to be faced before the actual theory under consideration can be applied. In this case these problems included, among others, obstacle avoidance, object recognition, object tracking, radio and serial communication, and uncertainty in driving and sensing. However, once these problems have been faced, the robot is indeed capable of finding its way. :)

For further questions or comments, contact me.