Hack 52. Use Random Selection as Artificial Intelligence | Statistics Hacks: Tips & Tools for Measuring the World and Beating the Odds

Statisticians have been able to build intelligent, learning computers long before the advent of the microprocessor. You can use coconut shells and the laws of probability to build a machine that will learn to never lose at Tic-Tac-Toe.

A common joke about the 1960s TV show Gilligan's Island is that the Professor was always building computers or washing machines or rocket ships out of coconuts and vines. I don't know about washing machines and rocket shipsthose do sound pretty far-fetchedbut the castaways certainly could have built computers out of coconuts. You can, too. If you are ever stranded on a desert island and want a companion, build one.

You won't need a volleyball like Tom Hanks's buddy in Castaway, and it won't have much personality, but your computer will be able to play games with you, and it will even learn and get smarter. The driving forces behind the learning algorithm are chance and the power of random selection.

Trial-and-Error Learning

According to behavioral psychologists, all animals (including humans, otters, and single-celled creatures) learn essentially the same way. Experience presents situations in which choices lead to outcomes. As the animal receives feedback about the outcome, it adapts. If the outcome was positive, the creature is more likely to make the same choice in the future. If the outcome was negative, the creature is less likely to make that choice again.

Notice that there is no guarantee that a "good" behavior is always repeated or that a bad behavior becomes extinct; it is only a matter of probability. The right decision is more likely to be made and the wrong decision is less likely to be made. To make a machine that mimics the way that animals learn, we must build on this probability angle.

Game playing reflects much of the trial-and-error learning process because outcomes are easily interpreted as positive (a win) or negative (a loss). In games, the feedback is often immediate, and studies show that the closeness in time between the choice and the feedback is a key factor in whether learning has occurred. And learning, remember, is defined here as an increase in the likelihood of correct choices or a decrease in the likelihood of incorrect choices.

Building a Tic-Tac-Toe Machine

Stuck on this island with no friends, you might want to fight boredom by playing games with a smart opponent. Here are instructions for building a contraption that does not use any electricity or silicon, but will play a game and provide decent competition.

This machine learns: the more times you play against it, the better it will be. The game this machine plays is Tic-Tac-Toe, but theoretically, you could build a device for any two-person strategy game using the same principles. Tic-Tac-Toe is simple enough that it demonstrates well the methods of design, construction, and operation.

If the Professor on Gilligan's Island ever did build a computer out of coconuts, he was likely influenced by the pioneering work of biologist Donald Michie and his matchboxes. Michie published an article in the very first issue of the Computer Journal in 1963, a few years before Gilligan and his pals were stranded on their island. Michie describes how he designed and actually built a nonelectric computer with the following complete list of parts:

287 matchboxes: Each matchbox has a little drawer that can be opened. Michie labeled each matchbox with one of 287 different possible Tic-Tac-Toe configurations throughout a game. There are actually many more possible positions, but because the standard Tic-Tac-Toe layout of three rows and three columns is symmetrical, four different unique positions can be summarized with just one position. At any point in the game, the current layout of the "board" directs the human operator to the corresponding matchbox.
A large supply of beads of nine different colors: The nine colors represent each of the nine different spaces on the Tic-Tac-Toe board. Each matchbox begins with an equal supply of beads for each of the possible next moves. Only beads representing legal moves are put in each box. Different positions and matchboxes, of course, correspond to only a small set of legal next moves, so each box has a slightly different mixture of beads.

The Professor would have used coconut shells instead of matchboxes and sand pebbles or seeds (or perhaps Mr. Howell's coin collection, which he never goes anywhere without) instead of beads. Gather these supplies from your tropical surroundings, organize the pebble-filled coconuts in an efficient grouping, and you have your desert island game-playing computer. Yes, you'll need to find 287 coconuts, but do you have anything better to do?

Operating the Computer

To play a game of Tic-Tac-Toe against your pebble-powered PC, follow these instructions:

The computer goes first. Find the coconut that is labeled with the current position. (For the first move, it is a blank layout.) Close your eyes and randomly draw out a pebble.
Mark an X on your board (drawn in the sand, I'm assuming) in the space indicated by the color of the pebble. Set the pebble aside in a safe place.
Make your move, marking an O in your chosen space.
There is a new position on the board now. Go to the corresponding coconut and randomly draw out a pebble from it. Return to step 2.
Repeat steps 2 through 4 until there is a winner or a draw.

What happens next is the most important part because it results in the computer learning to play better. Behavioral psychologists call this final stage reinforcement.

If the computer loses, "punish" it by taking the pebbles that you drew randomly from the coconuts and throwing them into the ocean.

If the machine wins or draws the game, return the pebbles to the coconuts from which they came and "reward" it by adding an additional pebble of the same color.

Why It Works

The process of rewarding or punishing the computer essentially duplicates the process by which animals learn. Positive results lead to an increase in the likelihood of the rewarded behavior, while negative results lead to a decrease in the likelihood of the punished behavior. By adding or removing pebbles, you are literally increasing or decreasing the true probability of the machine making certain moves in the game.

Consider this stage of a game, where the computer, playing X, must make its move:

X	O	X
	O

You probably recognize that the best movereally, the only move to consideris for the computer to block your impending win by putting its X in the bottom center space. The computer, though, recognizes several possibilities. It considers any legal move. Two moves that it would consider (which means, literally, that it would allow to be drawn randomly out of the coconut shell) are the best move and a bad move:

X	O	X	X	O	X
	O			O
	X		X

When the computer first starts playing the game, both these moves (or behaviors) are equally likely. Other moves are also possible in this situation, and they are also equally likely. The move on the left probably won't result in a loss, at least not immediately, so as pebbles representing that move are added to the coconut, the relative probability of that move increases compared to other moves. The move on the right probably ends in a loss (except against Gilligan, maybe), so the chance of that move being selected next time mathematically decreases, as there are fewer pebbles of that color to be randomly selected.

The probability of any given move being selected can be represented by this simple expression:

The machine begins with an equal number of pebbles or, in other words, an equal likelihood of any of a variety of moves being chosen. Of course, some moves look foolish to our experienced game-playing eye and would never be made in a real game except by the most naive of players. The point that behavioral psychologists argue, though, is that all creatures are novices until they have built up a large pool of experiences that have shaped the basic probabilities that they will engage in a behavior.

Hacking the Hack

There are several ways to modify your machine to make it smarter. For example, you can choose to reward moves that lead to wins more than moves that lead to ties. This should produce a good player more quickly. Michie suggested three beads for a win and one bead for a tie.

If you want to simulate the way animal learning occurs, you can adjust the system so that moves near the end of the game are more crucial than those made at the beginning. This is meant to mirror the observation that reinforcement that comes closest in time to when the behavior occurs is most effective. In the case of Tic-Tac-Toe, mistakes that lead to immediate losses should be dealt with and punished more effectively. By having fewer total beads in use for moves late in the game, the learning will occur more quickly.

An obvious upgrade is to make your computer smarter by not even allowing bad moves. Don't even place pebbles representing moves that will result in immediate defeat into your containers. This will solve the problem of your computer's initial low intelligence, but it doesn't really reflect the way animals learn. So, while this might make for a stronger competitor, the Professor would be disappointed in your lack of scientific rigor.