Introduction

In the Subsymbolic Methods in AI course at NTNU, one of the assignments is to evolve a fully connected feed-forward artificial neural network (ANN) which represents an agent strategy for a Flatland environment. This Flatland environment is a \(10 \times 10\) toroidal grid with cells containing either food, poison, or nothing. By default a third of the cells are filled with food and a third of the remaining cells are filled with poison. For fun; a fairly pixel-perfect version of the original Pac-Man was built through recording screenshots in a MAME-emulator (enemies represent poison -- delicious cherries, melons, apples and oranges represent food). The extra dots in the visualization are purely cosmetic.

An agent network is fed six binary inputs with indications of whether there is food or poison for any of the cells in its left, forward, and right direction. It can respond with one of the actions move left, which turns it \(90^\circ\) to the left and moves it forward one step, move forward, which moves the agent one step forward, or move right, which turns it \(90^\circ\) to the right and moves it forward one step. An agent can also choose to do nothing, but as it has no internal state, choosing to do nothing for a set of inputs will cause it to keep doing nothing until time runs out. By default there are 60 time steps in the game, with each action taking 1 time step.

Functionally, there are three possible outcomes for each of the three cells which the agent receives information about; they can either be empty, contain food, or contain poison. Discounting non-action, there are three possible actions an agent can respond with. Even in this simple environment where there are \(3^3=27\) possible input cases, this means that the number of functionally distinct agent strategies is \(3^{3^3} \approx 7.6 \cdot 10^{12}\), since one out of three actions must be chosen for each of the input cases.

Note the special case where the agent is surrounded by poison on the left, in front, and on the right. In this case, the agent must either forfeit any future possible reward by remaining immobile, or eat poison to progress. Given that the agent does not known the remaining time and that eating poison is non-fatal, the rational choice is to eat poison.

Fitness

When formulating a fitness function, it is important that eating poison is not punished too harshly, as exploration will be deterred, giving less room for constructive evolution. If it is not punished enough, evolution will only value food eaten, no matter how much poison is also eaten. Given a \(10 \times 10\) grid, \(\frac{1}{3}\) food coverage, 60 time steps, and awarding one point for each food eaten; the normalized upper fitness boundary is 0.55. In practise, a normalized fitness score of around 0.45 or greater results in good performance.

Agent fitness is either evaluated statically or dynamically. When evaluated statically, a given number of random environments are generated, and the same environments are used to evaluate all individuals in all generations. This gives the evolution process little opportunity to generalize well as individuals may find limited strategies which work well in this small number of environments, but which may lack adaptation to important edge cases. When evaluated dynamically, each individual's fitness is evaluated using a unique set of randomly generated environments. In addition, the fitness of all individuals is retested using new random environments for every generation. This provides great opportunity to find general behavior, as individuals continually needs to perform well in new environments to maintain their fitness within the population.

Implementation

The implementation uses a bit-vector genotype, single uniform random crossover, and full generational replacement with elitism. Switch bits can be used which adds one bit per weight which can enable or disable the weight, which is equivalent to a weight value of zero. This allows for neutral evolution of the weight value when disabled, and easier changes to topology. Note that this implementation is not efficient as it is appropriately abstracted for educational purposes. It is also my first time using Javascript.

To start testing, click Evolve, then when the evolution is complete the buttons directly below the Flatland widget can be used to run the agent in the available environments.

Evolving Neural Networks for a Flatland Agent

Introduction

Fitness

Implementation

References