Randomizing a Set of Rows

Table of contents:

13.8.1 Problem

You want to randomize a set of rows or values.

13.8.2 Solution

Use ORDER BY RAND( ).

13.8.3 Discussion

MySQL's RAND( ) function can be used to randomize the order in which a query returns its rows. Somewhat paradoxically, this randomization is achieved by adding an ORDER BY clause to the query. The technique is roughly equivalent to a spreadsheet randomization method. Suppose you have a set of values in a spreadsheet that looks like this:

Patrick
Penelope
Pertinax
Polly

To place these in random order, first add another column that contains randomly chosen numbers:

Patrick .73
Penelope .37
Pertinax .16
Polly .48

Then sort the rows according to the values of the random numbers:

Pertinax .16
Penelope .37
Polly .48
Patrick .73

At this point, the original values have been placed in random order, because the effect of sorting the random numbers is to randomize the values associated with them. To re-randomize the values, choose another set of random numbers and sort the rows again.

In MySQL, a similar effect is achieved by associating a set of random numbers with a query result and sorting the result by those numbers. For MySQL 3.23.2 and up, this is done with an ORDER BY RAND( ) clause:

mysql> SELECT name FROM t ORDER BY RAND( );
+----------+
| name |
+----------+
| Pertinax |
| Penelope |
| Patrick |
| Polly |
+----------+
mysql> SELECT name FROM t ORDER BY RAND( );
+----------+
| name |
+----------+
| Patrick |
| Pertinax |
| Penelope |
| Polly |
+----------+ 

How Random Is RAND()?

Does the RAND( ) function generate evenly distributed numbers? Check it out for yourself with the following Python script, rand_test.py, from the stats directory of the recipes distribution. It uses RAND( ) to generate random numbers and constructs a frequency distribution from them, using .1-sized categories. This provides a means of assessing how evenly distributed the values are.

#! /usr/bin/python
# rand_test.pl - create a frequency distribution of RAND( ) values.
# This provides a test of the randomness of RAND( ).
# Method is to draw random numbers in the range from 0 to 1.0,
# and count how many of them occur in .1-sized intervals (0 up
# to .1, .1 up to .2, ..., .9 up *through* 1.0).
import sys
sys.path.insert (0, "/usr/local/apache/lib/python")
import MySQLdb
import Cookbook
npicks = 1000 # number of times to pick a number
bucket = [0] * 10
conn = Cookbook.connect ( )
cursor = conn.cursor ( )
for i in range (0, npicks):
 cursor.execute ("SELECT RAND( )")
 (val,) = cursor.fetchone ( )
 slot = int (val * 10)
 if slot > 9:
 slot = 9 # put 1.0 in last slot
 bucket[slot] = bucket[slot] + 1
cursor.close ( )
conn.close ( )
# Print the resulting frequency distribution
for slot in range (0, 9):
 print "%2d %d" % (slot+1, bucket[slot])
sys.exit (0)

The stats directory also contains equivalent scripts in other languages.

For versions of MySQL older than 3.23.2, ORDER BY clauses cannot refer to expressions, so you cannot use RAND( ) there (see Recipe 6.4). As a workaround, add a column of random numbers to the column output list, alias it, and refer to the alias for sorting:

mysql> SELECT name, name*0+RAND( ) AS rand_num FROM t ORDER BY rand_num;
+----------+-------------------+
| name | rand_num |
+----------+-------------------+
| Penelope | 0.372227413926485 |
| Patrick | 0.431537678867148 |
| Pertinax | 0.566524063764628 |
| Polly | 0.715938107777329 |
+----------+-------------------+

Note that the expression for the random number column is name*0+RAND( ), not just RAND( ). If you try using the latter, the pre-3.23 MySQL optimizer notices that the column contains only a function, assumes that the function returns a constant value for each row, and optimizes the corresponding ORDER BY clause out of existence. As a result, no sorting is done. The workaround is to fool the optimizer by adding extra factors to the expression that don't change its value, but make the column look like a non-constant. The query just shown illustrates one easy way to do this: Take any column name, multiply it by zero, and add the result to RAND( ). Granted, it may seem a little strange to use name in a mathematical expression, because that column's values aren't numeric. That doesn't matter; MySQL sees the * multiplication operator and performs a string-to-number conversion of the name values before the multiply operation. The important thing is that the result of the multiplication is zero, which means that name*0+RAND( ) has the same value as RAND( ).

Applications for randomizing a set of rows include any scenario that uses selection without replacement (choosing each item from a set of items, until there are no more items left). Some examples of this are:

  • Determining the starting order for participants in an event. List the participants in a table and select them in random order.
  • Assigning starting lanes or gates to participants in a race. List the lanes in a table and select a random lane order.
  • Choosing the order in which to present a set of quiz questions.
  • Shuffling a deck of cards. Represent each card by a row in a table and shuffle the deck by selecting the rows in random order. Deal them one by one until the deck is exhausted.

To use the last example as an illustration, let's implement a card deck shuffling algorithm. Shuffling and dealing cards is randomization plus selection without replacement: each card is dealt once before any is dealt twice; when the deck is used up, it is reshuffled to re-randomize it for a new dealing order. Within a program, this task can be performed with MySQL using a table deck that has 52 rows, assuming a set of cards with each combination of 13 face values and 4 suits:

  • Select the entire table and store it into an array.
  • Each time a card is needed, take the next element from the array.
  • When the array is exhausted, all the cards have been dealt. "Reshuffle" the table to generate a new card order.

Setting up the deck table is a tedious task if you insert the 52 card records by writing out all the INSERT statements manually. The deck contents can be generated more easily in combinatorial fashion within a program by generating each pairing of face value with suit. Here's some PHP code that creates a deck table with face and suit columns, then populates the table using nested loops to generate the pairings for the INSERT statements:

mysql_query ("
 CREATE TABLE deck
 (
 face ENUM('A', 'K', 'Q', 'J', '10', '9', '8',
 '7', '6', '5', '4', '3', '2') NOT NULL,
 suit ENUM('hearts', 'diamonds', 'clubs', 'spades') NOT NULL
 )", $conn_id)
 or die ("Cannot issue CREATE TABLE statement
");

$face_array = array ("A", "K", "Q", "J", "10", "9", "8",
 "7", "6", "5", "4", "3", "2");
$suit_array = array ("hearts", "diamonds", "clubs", "spades");

# insert a "card" into the deck for each combination of suit and face

reset ($face_array);
while (list ($index, $face) = each ($face_array))
{
 reset ($suit_array);
 while (list ($index2, $suit) = each ($suit_array))
 {
 mysql_query ("INSERT INTO deck (face,suit) VALUES('$face','$suit')",
 $conn_id)
 or die ("Cannot insert card into deck
");
 }
}

Shuffling the cards is a matter of issuing this statement:

SELECT face, suit FROM deck ORDER BY RAND( );

To do that and store the results in an array within a script, write a shuffle_deck( ) function that issues the query and returns the resulting values in an array (again shown in PHP):

function shuffle_deck ($conn_id)
{
 $query = "SELECT face, suit FROM deck ORDER BY RAND( )";
 $result_id = mysql_query ($query, $conn_id)
 or die ("Cannot retrieve cards from deck
");
 $card = array ( );
 while ($obj = mysql_fetch_object ($result_id))
 $card[ ] = $obj; # add card record to end of $card array
 mysql_free_result ($result_id);
 return ($card);
}

Deal the cards by keeping a counter that ranges from 0 to 51 to indicate which card to select. When the counter reaches 52, the deck is exhausted and should be shuffled again.

Using the mysql Client Program

Writing MySQL-Based Programs

Record Selection Techniques

Working with Strings

Working with Dates and Times

Sorting Query Results

Generating Summaries

Modifying Tables with ALTER TABLE

Obtaining and Using Metadata

Importing and Exporting Data

Generating and Using Sequences

Using Multiple Tables

Statistical Techniques

Handling Duplicates

Performing Transactions

Introduction to MySQL on the Web

Incorporating Query Resultsinto Web Pages

Processing Web Input with MySQL

Using MySQL-Based Web Session Management

Appendix A. Obtaining MySQL Software

Appendix B. JSP and Tomcat Primer

Appendix C. References



MySQL Cookbook
MySQL Cookbook
ISBN: 059652708X
EAN: 2147483647
Year: 2005
Pages: 412
Authors: Paul DuBois

Flylib.com © 2008-2020.
If you may any questions please contact us: flylib@qtcs.net