Based on the DeepMind paper on deep Q reinforcement learning (DQN). The agent learns an optimal action policy from pixel-level data.
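At its core, DQN trains a network to approximate the Q-learning target. A minimal tabular sketch of that update rule (the network itself is omitted, and the function and variable names here are illustrative, not taken from the project's code):

```python
def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.99):
    """Move Q(s, a) towards the bootstrapped target r + gamma * max_a' Q(s', a').

    Q is a dict of dicts: Q[state][action] -> estimated value.
    alpha is the learning rate, gamma the discount factor.
    """
    target = reward + gamma * max(Q[next_state].values())
    Q[state][action] += alpha * (target - Q[state][action])
    return Q[state][action]
```

DQN replaces the table with a convolutional network over raw pixels and minimizes the squared difference between the prediction and this same target.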
Python interface instructions are available here: https://github.com/bbitmaster/ale_python_interface/wiki/Code-Tutorial
We tested the agent on a really simple game, Dodge the Brick. The aim of the game is for the green brick to dodge the onslaught of falling red bricks. The scoring policy is +1/-30, chosen to weed out random agents: +1 point for surviving each frame and -30 for colliding with a red brick and dying.
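The scoring policy above can be sketched as a per-frame reward function; the names `step_reward` and `episode_score` are hypothetical, not the project's actual code:

```python
SURVIVE_REWARD = 1       # +1 for every frame the green brick stays alive
COLLISION_PENALTY = -30  # -30 for colliding with a red brick and dying

def step_reward(collided):
    """Reward for a single frame: -30 on collision, +1 otherwise."""
    return COLLISION_PENALTY if collided else SURVIVE_REWARD

def episode_score(collision_frames, total_frames):
    """Total episode score is just the sum of per-frame rewards."""
    return sum(step_reward(t in collision_frames) for t in range(total_frames))
```

The large -30 penalty makes a single collision wipe out 30 frames of survival reward, so an agent that moves randomly into bricks scores far worse than one that actively dodges.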
The untrained agent just sits around doing nothing. It doesn't take any action, left or right. Looks like someone needs a little motivation in life.

Next is an agent which oscillates a lot, roughly staying in the same place. It also runs into the red brick a lot and isn't really affected by the red brick's presence.

Then comes an agent which tries to move away from the red brick and find a safe spot. It gets a score of 2529/10000 = 25% accuracy. The agent still oscillates, though less than in the first case, and it does run into the red brick occasionally.

Finally, an agent which has learnt to dodge the red brick effectively. It gets a score of 8698/10000 = 86% accuracy. The agent has learnt that oscillating heavily to and fro is not a really helpful policy and hence is a lot calmer. It also never runs into the red brick anymore, dodging it like a master samurai.
Coming soon