Hack 80. Play Six Degrees of Kevin Bacon

In the Kevin Bacon game, you pick an actor and try to connect that actor to Kevin Bacon through movies that they both have been in.

The Kevin Bacon game is a fun way to explore network connection concepts. For example, if you were to choose fellow brat-packer Demi Moore the game would be easy. Both Kevin and Demi were in A Few Good Men (1992), so Demi's Kevin Bacon number is 1.

Had you chosen Steve McQueen, you would have to notice that Kevin Bacon worked with Dustin Hoffman in Sleepers (1996) and Dustin Hoffman costarred with McQueen in Papillon (1973). That gives Steve McQueen a Kevin Bacon number of 2.

Mathematicians play this game, but relative to Paul Erdös rather than Kevin Bacon. They calculate their Erdös numbers by following a chain of coauthored papers. The average number of coauthors is generally much smaller than the cast list of a movie, so the game is much harder for mathematicians.

To get started you need a database of movies, actors, and casting, such as the one at http://sqlzoo.net/h.htm#data. This example is based on a simplification of that data set.

The performsIn table contains (actor, movie) pairs. An entry in this table indicates that the actor was in the movie:

mysql> SELECT * FROM performsIn WHERE actor = 'Kevin Bacon';
+-------------+---------------------+
| actor | movie |
+-------------+---------------------+
| Kevin Bacon | Friday the 13th |
| Kevin Bacon | Murder in the First |
| Kevin Bacon | Footloose |
| Kevin Bacon | Diner |
... 

You can find the actors with a Kevin Bacon number of 1 by joining this table to itself:

mysql> SELECT x.actor,x.movie,y.actor
 -> FROM performsIn x JOIN performsIn y ON (x.movie=y.movie)
 -> WHERE x.actor='Kevin Bacon'
 -> ORDER BY y.actor;
+-------------+------------------------------+-----------------------+
| actor | movie | actor |
+-------------+------------------------------+-----------------------+
| Kevin Bacon | Friday the 13th | Adrienne King |
| Kevin Bacon | Flatliners | Aeryk Egan |
| Kevin Bacon | She's Having a Baby | Alec Baldwin |
...

You can make a VIEW based on this pattern that makes the next stage a little simpler. This VIEW connects all actors who have been in the same movie. The two actor columns are called actorIn and actorOut because you can think of this kind of view as a kind of function. You can also exclude the connection between each actor and him or herself:

CREATE VIEW actorToActor AS
 SELECT x.actor actorIn ,x.movie, y.actor actorOut
 FROM performsIn x JOIN performsIn y
 ON (x.movie=y.movie AND x.actor!=y.actor);

Running the Kevin Bacon 1 query using this view now gives the following results:

mysql> SELECT movie, actorOut
 -> FROM actorToActor
 -> WHERE actorIn='Kevin Bacon'
 -> ORDER BY actorOut;
+------------------------------+-----------------------+
| movie | actorOut |
+------------------------------+-----------------------+
| Friday the 13th | Adrienne King |
| Flatliners | Aeryk Egan |
| She's Having a Baby | Alec Baldwin |
...

To get to Kevin Bacon 2 actors you can JOIN this view to itself. You tie up the output of one instance into the input of the other. The result is pretty big by this stage, so it is best to specify your target. In this example, the target is Alan Alda:

mysql> SELECT LEFT(x.movie,20) movie1
 -> ,LEFT(x.actorOut,20) actor1
 -> ,LEFT(y.movie,20) movie2
 -> ,LEFT(y.actorOut,10) actor2
 -> FROM actorToActor x JOIN actorToActor y ON (x.actorOut=y.actorIn)
 -> WHERE x.actorIn = 'Kevin Bacon'
 -> AND y.actorOut = 'Alan Alda';
+--------------------+------------------+----------------------+-----------+
| movie1 | actor1 | movie2 | actor2 |
+--------------------+------------------+----------------------+-----------+
| Diner | Timothy Daly | Object of My Affecti | Alan Alda |
| Picture Perfect | Jennifer Aniston | Object of My Affecti | Alan Alda |
| My Dog Skip | Diane Lane | Murder at 1600 | Alan Alda |
| Few Good Men, A | Kevin Pollak | Canadian Bacon | Alan Alda |
| Hollow Man | Josh Brolin | Flirting with Disast | Alan Alda |
|Planes, Trains & Aut| John Candy | Canadian Bacon | Alan Alda |
+--------------------+------------------+----------------------+-----------+
6 rows in set (0.13 sec)

The six rows indicate that there are six different ways to connect Kevin Bacon to Alan Alda. The LEFT function has been used to truncate the actor and movie names.

All it takes is another join to connect Kevin Bacon to "Weird Al" Yankovic, who has a Kevin Bacon number of 3:

mysql> SELECT LEFT(x.movie,10) movie1
 -> ,LEFT(x.actorOut,10) actor1
 -> ,LEFT(y.movie,11) movie2
 -> ,LEFT(y.actorOut,10) actor2
 -> ,LEFT(z.movie,11) movie3
 -> ,LEFT(z.actorOut,10) actor3
 -> FROM actorToActor x JOIN actorToActor y ON x.actorOut=y.actorIn
 -> JOIN actorToActor z ON y.actorOut=z.actorIn
 -> WHERE x.actorIn = 'Kevin Bacon'
 -> AND z.actorOut = '''Weird Al'' Yankovic';
+-----------+-----------+------------+-----------+------------+------------+
| movie1 | actor1 | movie2 | actor2 | movie3 | actor3 |
+-----------+-----------+------------+-----------+------------+------------+
| Apollo 13 | Ed Harris | Creepshow | Leslie Nie| Naked Gun: | 'Weird Al' |
| Tremors | Fred Ward | Naked Gun 3| Leslie Nie| Naked Gun: | 'Weird Al' |
| Tremors | Fred Ward | Naked Gun 3| George Ken| Naked Gun: | 'Weird Al' |

You can continue the number of links, but it gets more expensive at every level. The number of different paths grows exponentially. You very quickly get to the stage where the number of rows generated is greater than not just the number of actors in your database, but also the number of people on the planet.

The following query connects Kevin Bacon back to Kevin Bacon in four steps. It takes nearly five minutes and includes nearly 400,000 rows. There are fewer than 2,000 movies in the database.

mysql> SELECT COUNT(*) FROM(
 -> SELECT LEFT(w.movie,10) movie1
 -> ,LEFT(w.actorOut,10) actor1
 -> ,LEFT(z.movie,11) movie3
 -> ,LEFT(z.actorOut,10) actor3
 -> FROM actorToActor w JOIN actorToActor x ON w.actorOut=x.actorIn
 -> JOIN actorToActor y ON x.actorOut=y.actorIn
 -> JOIN actorToActor z ON y.actorOut=z.actorIn
 -> WHERE w.actorIn = 'Kevin Bacon'
 -> AND z.actorOut = 'Kevin Bacon'
 -> ) t;
+----------+
| COUNT(*) |
+----------+
| 400637 |
+----------+
1 row in set (4 min 43.02 sec)

There are limits on how far you can take this technique without using some shortcuts. Some large problems, such as calculating tag clouds or determining page rank calculations, require transitive links such as these to be calculated. As the numbers grow, you have to store intermediate results in tables so that you can effectively use indexes and remove duplicates.

SQL Fundamentals

Joins, Unions, and Views

Text Handling

Date Handling

Number Crunching

Online Applications

Organizing Data

Storing Small Amounts of Data

Locking and Performance

Reporting

Users and Administration

Wider Access

Index



SQL Hacks
SQL Hacks
ISBN: 0596527993
EAN: 2147483647
Year: 2004
Pages: 147

Flylib.com © 2008-2020.
If you may any questions please contact us: flylib@qtcs.net