The PinBall Domain

The PinBall domain is a 4-dimensional continuous test domain for Reinforcement Learning (RL) algorithms. A small blue ball is placed in an arena and must be maneuvered into a red hole. The ball is dynamic (drag coefficient 0.995), so its state is described by four variables: x, y, x velocity, and y velocity. Collisions with obstacles are fully elastic and cause the ball to bounce, so rather than merely avoiding obstacles the agent may choose to use them to reach the hole efficiently.
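
As a rough illustration of the dynamics described above, here is a minimal Java sketch of a four-variable ball state with drag and a fully elastic bounce. It is not taken from the PinBall source: the class name BallSketch, the time step dt, the choice to apply drag as a per-step multiplier, and the representation of an obstacle surface by its unit normal are all assumptions made for this example.

```java
/** Hypothetical sketch of a PinBall-style ball; not the domain's actual classes. */
final class BallSketch {
    static final double DRAG = 0.995; // drag coefficient quoted in the text

    double x, y, xDot, yDot; // the four state variables

    BallSketch(double x, double y, double xDot, double yDot) {
        this.x = x;
        this.y = y;
        this.xDot = xDot;
        this.yDot = yDot;
    }

    /** Advance one step: move, then apply drag (assumed to be a per-step multiplier). */
    void step(double dt) {
        x += xDot * dt;
        y += yDot * dt;
        xDot *= DRAG;
        yDot *= DRAG;
    }

    /** Fully elastic bounce off a surface with unit normal (nx, ny): v' = v - 2 (v . n) n. */
    void bounce(double nx, double ny) {
        double dot = xDot * nx + yDot * ny;
        xDot -= 2.0 * dot * nx;
        yDot -= 2.0 * dot * ny;
    }

    double speed() {
        return Math.hypot(xDot, yDot);
    }

    public static void main(String[] args) {
        BallSketch ball = new BallSketch(0.5, 0.5, 0.2, -0.1);
        ball.bounce(0.0, 1.0); // reflect off a horizontal surface
        ball.step(1.0);
        System.out.println(ball.x + " " + ball.y + " " + ball.xDot + " " + ball.yDot);
    }
}
```

Because the reflection only flips the velocity component along the normal, the ball's speed is unchanged by a bounce, which is what makes bouncing off obstacles a usable strategy rather than a penalty.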

There are five primitive actions: adding or subtracting a small force to the x velocity or the y velocity (each of which incurs a reward of -5 per action), or leaving both unchanged (which incurs a reward of -1 per timestep). Reaching the goal obtains a reward of 10,000.
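
The action set and reward scheme above can be sketched in Java as follows. This is an illustration only: the names PinballRewardSketch, Action, and applyAction are hypothetical, the size of the "small force" is left as a parameter (delta) because the text does not give it, and the sketch assumes the goal reward replaces the per-step cost on the final step.

```java
/** Hypothetical sketch of the five actions and their rewards; not the domain's actual code. */
final class PinballRewardSketch {
    enum Action {
        INC_X(1, 0), DEC_X(-1, 0), INC_Y(0, 1), DEC_Y(0, -1), NONE(0, 0);

        final int dx, dy; // direction of the velocity change

        Action(int dx, int dy) {
            this.dx = dx;
            this.dy = dy;
        }

        boolean appliesForce() {
            return dx != 0 || dy != 0;
        }
    }

    static final double FORCE_COST = -5.0;     // reward for a force action
    static final double STEP_COST = -1.0;      // reward for leaving velocities unchanged
    static final double GOAL_REWARD = 10000.0; // reward for reaching the goal

    /** Assumption: the goal reward replaces the step cost on the final step. */
    static double reward(Action a, boolean reachedGoal) {
        if (reachedGoal) {
            return GOAL_REWARD;
        }
        return a.appliesForce() ? FORCE_COST : STEP_COST;
    }

    /** Returns {xDot, yDot} after the action; delta is an assumed force size. */
    static double[] applyAction(double xDot, double yDot, Action a, double delta) {
        return new double[] {xDot + a.dx * delta, yDot + a.dy * delta};
    }

    public static void main(String[] args) {
        for (Action a : Action.values()) {
            System.out.println(a + " -> " + reward(a, false));
        }
    }
}
```

With these rewards, coasting costs a fifth as much as pushing, so an agent is encouraged to let the ball's momentum and bounces do part of the work.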

The PinBall domain is an interesting test domain for RL algorithms because its dynamic aspects, sharp discontinuities, and extended dynamic control characteristics make both control and function approximation difficult, much more so than in a simple navigation task or in other common benchmark tasks (e.g., Mountain Car and Acrobot).

The domain is written in Java (full source code and JavaDoc are available under the GPL) and reads in a configuration file that specifies the sizes and locations of the ball and target, and describes the obstacles. Users can therefore create customized versions of the domain to suit their own needs; a GUI configuration editor is provided for this purpose. Below are two example configurations, which are available from the downloads section.


Figures: a simple configuration (left) and a slightly harder configuration (right).

Features:

Full source code and documentation (under the GPL).
RL-Glue interface.
GUI program for testing domains.
GUI program for saving domain snapshots to disk.
GUI program for viewing saved trajectories, and saving frame images to disk for conversion to an animation.
GUI domain configuration editor.

Note: if you use RL-Glue with my Fourier Basis Sarsa(λ) agent, be aware that the agent will issue an error message and shut down if function approximation diverges. This is not always obvious when all of the RL-Glue services are running in one terminal, so I suggest running each service (RL-Glue, agent, environment, and experiment) in a separate terminal. I find that Sarsa(λ) will often diverge in Pinball when λ is non-zero. For reference, λ=0, α=0.0075, γ=1.0, and a 5th-order Fourier Basis seem to be a good place to start for the simple configuration.
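
To make the suggested settings concrete, here is a minimal Java sketch of linear Sarsa(0) over a Fourier basis, using the parameters quoted above (5th order, four state variables, α=0.0075, γ=1.0). It is not the agent's actual code. The class name FourierSarsaSketch is hypothetical, states are assumed to be scaled to [0, 1] in each dimension, and the per-feature step size α/||c|| is an assumption about how α is applied. The update returns false when a weight becomes non-finite, as a stand-in for the divergence check described in the note.

```java
/** Hypothetical sketch of linear Sarsa(0) with a Fourier basis; not the agent's actual code. */
final class FourierSarsaSketch {
    final int numFeatures;
    final int[][] coeffs;     // one integer coefficient vector c per basis function
    final double[] stepSize;  // per-feature step sizes
    final double[][] weights; // weights[action][feature]
    final double gamma;

    FourierSarsaSketch(int order, int dims, int numActions, double alpha, double gamma) {
        int base = order + 1;
        int n = 1;
        for (int d = 0; d < dims; d++) {
            n *= base; // (order + 1)^dims basis functions
        }
        this.numFeatures = n;
        this.gamma = gamma;
        this.coeffs = new int[n][dims];
        this.stepSize = new double[n];
        this.weights = new double[numActions][n];
        for (int i = 0; i < n; i++) {
            int rest = i;
            double sumSq = 0.0;
            for (int d = 0; d < dims; d++) {
                coeffs[i][d] = rest % base; // enumerate every vector in {0..order}^dims
                rest /= base;
                sumSq += coeffs[i][d] * coeffs[i][d];
            }
            // Assumed scaling: alpha / ||c||, with plain alpha for the constant feature.
            stepSize[i] = sumSq == 0.0 ? alpha : alpha / Math.sqrt(sumSq);
        }
    }

    /** Features phi_i(s) = cos(pi * c_i . s), for a state s scaled to [0, 1]^dims. */
    double[] features(double[] s) {
        double[] phi = new double[numFeatures];
        for (int i = 0; i < numFeatures; i++) {
            double dot = 0.0;
            for (int d = 0; d < s.length; d++) {
                dot += coeffs[i][d] * s[d];
            }
            phi[i] = Math.cos(Math.PI * dot);
        }
        return phi;
    }

    double q(double[] phi, int action) {
        double sum = 0.0;
        for (int i = 0; i < numFeatures; i++) {
            sum += weights[action][i] * phi[i];
        }
        return sum;
    }

    /** One Sarsa(0) update; returns false if any updated weight is non-finite. */
    boolean update(double[] s, int a, double r, double[] sNext, int aNext, boolean terminal) {
        double[] phi = features(s);
        double target = r + (terminal ? 0.0 : gamma * q(features(sNext), aNext));
        double delta = target - q(phi, a);
        boolean finite = true;
        for (int i = 0; i < numFeatures; i++) {
            weights[a][i] += stepSize[i] * delta * phi[i];
            finite &= Double.isFinite(weights[a][i]);
        }
        return finite;
    }

    public static void main(String[] args) {
        FourierSarsaSketch agent = new FourierSarsaSketch(5, 4, 5, 0.0075, 1.0);
        double[] s = {0.2, 0.9, 0.5, 0.5};
        double[] sNext = {0.21, 0.88, 0.5, 0.5};
        boolean ok = agent.update(s, 4, -1.0, sNext, 4, false);
        System.out.println("features=" + agent.numFeatures + " finite=" + ok);
    }
}
```

A 5th-order basis over four state variables gives (5+1)^4 = 1296 features per action, which is part of why the step size has to be small.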

Documentation:

You can browse the JavaDoc documentation here. It can also be downloaded as part of the source code archive.

Downloads:

Java source code and JavaDoc documentation (165KB).
Example domain configuration files: simple configuration (multiple start states), hard configuration (multiple start states).
RL-Glue interface code (using the Java codec).

Python Version:

Pierre-Luc Bacon has ported Pinball to Python. This version also has a GPL license and an interface to RL-Glue, and has been incorporated into Will Dabney's PyRL code. It requires the RLGlue codec for Python and the pygame package.

Papers:

The Pinball Domain was introduced in:

G.D. Konidaris and A.G. Barto. Skill Discovery in Continuous Reinforcement Learning Domains using Skill Chaining. Advances in Neural Information Processing Systems 22, pages 1015-1023, December 2009. [PDF, BibTeX]

Please cite this paper if you use Pinball in your own papers, since it includes a complete description and a link to this website. I would greatly appreciate an email (gdk at csail dot mit dot edu) with the reference information (and, if you like, a link to the PDF) so that I can list it here.

The Pinball Domain has also appeared in:

G.D. Konidaris, S.R. Kuindersma, A.G. Barto and R.A. Grupen. Constructing Skill Trees for Reinforcement Learning Agents from Demonstration Trajectories. Advances in Neural Information Processing Systems 23, pages 1162-1170, December 2010.

G.D. Konidaris. Autonomous Robot Skill Acquisition. PhD Thesis, Department of Computer Science, University of Massachusetts Amherst, May 2011.

G.D. Konidaris, S.R. Kuindersma, R.A. Grupen and A.G. Barto. CST: Constructing Skill Trees by Demonstration. In Proceedings of the ICML Workshop on New Developments in Imitation Learning, July 2011.

G.D. Konidaris, S.R. Kuindersma, R.A. Grupen and A.G. Barto. Robot Learning from Demonstration by Constructing Skill Trees. The International Journal of Robotics Research 31(3), pages 360-375, March 2012. (Freely accessible draft.)

S. Kelly, P. Lichodzijewski and M.I. Heywood. On run time libraries and hierarchical symbiosis. In Proceedings of the 2012 IEEE Congress on Evolutionary Computation, pages 1-8, June 2012.

S. Kelly. On Developmental Variation in Hierarchical Symbiotic Policy Search. Master's Thesis, Dalhousie University, August 2012.

A. Tamar, D. Di Castro and S. Mannor. Temporal Difference Methods for the Variance of the Reward To Go. In Proceedings of the 30th International Conference on Machine Learning, June 2013.

George Konidaris.