Running a Giveaway Contest | MySQL and Perl for the Web

only for RuBoard - do not distribute or recompile

Some Web sites are fairly stable, with content that doesn t change very much. Suppose, however, that you operate a site that provides frequently changing information, such as a news site for a radio or television station. You probably want people to visit often to see new material, and you may be looking for some sort of attraction to lure them to the site on a recurring basis. Then it hits you: Offer a periodic giveaway contest, where people can enter their name once for each giveaway. This serves multiple purposes:

It encourages people to visit your site regularly.
You get people to divulge information that you can use later to send them annoying junk mail telling them how wonderful your site is and urging them to visit even more frequently!

Okay, I m just kidding about that second purpose. I don t condone gathering information and then misusing it, although that is often the reason giveaways are held. (Ever notice how you end up on a new mailing list after you put your business card into one of those Win a Free Dinner! fishbowls at a restaurant? Now you know why; those things are really a trap for the unwary.)

There are several aspects to running a giveaway contest such as this:

Obviously, you have to have prizes to give away. I leave this part up to you.
You need to collect entries.This can be done using a form-based script.
You have to perform periodic drawings to determine who the winners are. Initially, we ll write our application to handle a daily giveaway, because the queries are simpler to write. Later we ll consider the issues involved in modifying the application for longer periods of time, such as weekly or monthly.
You must notify the winners of their good fortune.
You may want to run summary queries to provide you with information about the popularity of your contest, or perhaps delete old entries.
When you re ready to retire the application, remember to remove it from the Web site. Otherwise, you ll have one of those hideous Most recent winner: Jane Doe, February 13, XXXX situations, where XXXX is about 10 years ago. A good way to disgust your visitors is to awaken their interest in a contest and then have them realize that it s an ancient promotion that you re no longer maintaining.

We ll discuss how to implement each of these phases of the contest shortly (except for acquiring the prizes and deleting the application), but first we should consider something else: fraud.

Detecting Fraud

One issue to be concerned about when you develop this kind of application is the potential for ballot-box stuffing. You don t want people to submit multiple entries to gain an unfair advantage over other contestants. Detection of fraudulent entries is a very difficult problem to solve in general, because in a Web environment it s difficult to uniquely identify visitors to your site. Fortunately, one thing you have working for you in a giveaway situation is that contestants have an incentive to submit correct information. (Only a silly or stupid contestant would submit an entry containing fake information, because you d have no way to contact the contestant if you drew that entry as a winner.) You can use this fact to help you determine whether one entry duplicates another.

One general bad guy strategy for submitting multiple entries is to enter information each time that is correct yet slightly different from earlier entries. Conversely, a general strategy for defeating these attempts is to remove variation by converting values to a single standard format before storing them. This gives you a better chance of detecting duplicates. The following list discusses some of these techniques:

Trim leading and trailing whitespace to prevent people from attempting to submit multiple entries using values that differ only in the amount of whitespace:^[6]

^[6] We used this whitespace-removal technique earlier, but for another reason: to prevent our applications from thinking that a field containing only tabs or spaces has a real value in it.
```
 $val =~ s/^\s+//;  $val =~ s/\s+$//; 
```
Convert sequences of multiple whitespace characters to single spaces. This defeats attempts to make values unique that are based on putting differing numbers of spaces between words:
```
 $val =~ s/\s+/ /g; 
```
Don t be fooled by letter-case variations. Paul , PAUL , and paul are all the same name, and you d want to recognize them as such. To that end, don t store text values in case-sensitive table columns, or else convert values to a known case before storing them. For example, you can convert strings to uppercase or lowercase like this:
```
 $val = uc ($val);  $val = lc ($val); 
```
CHAR and VARCHAR columns are not case sensitive. CHAR BINARY and VARCHAR BINARY are case sensitive, as are TEXT and BLOB columns. If you do use a column type that is case sensitive, you should force values to a given case before storing them.

Remove punctuation if you can. The following pattern removes all characters but letters and spaces, which converts differing values, such as John A. Smith and John A Smith , to the same value:
```
 $val =~ s/[^ a-zA-Z]//g; 
```
Convert values that are partially numeric to strictly numeric form when possible. A user might attempt to submit three entries by using variant forms of the same telephone number, such as 123-4567 , 123 4567 , and 1234567 . A string comparison would consider these all different, even though you d easily recognize them as the same. By stripping out the non-digit characters, all three forms become the same value 1234567 . You can do this by performing the following substitution:
```
 $val =~ s/\D//g; 
```
The same kind of conversion can be useful with other types of values, such as credit card or social security numbers.

We ll use several of these techniques in our giveaway application. They aren t a complete solution to the duplicate-entry problem, but they do help make entries more uniform. These value-transformation operations are likely to be useful in other scripts as well, so we ll add them as utility functions to our WebDB.pm module file (see Chapter 2, Getting Connected Putting Your Database on the Web ). For example, a function to trim whitespace from the ends of a value can be written like this:

 sub trim  { my $str = shift;      return "" if !defined $str;      $str =~ s/^\s+//;      $str =~ s/\s+$//;      return ($str);  }

trim() is likely to be the first transformation we apply to field values, so we also can have it convert undef to the empty string. That way it takes care of two value-checking operations automatically. To use trim(), invoke it as follows:

 $val = WebDB::trim ($val);

The other functions are written in similar fashion; we ll call them collapse_whites-pace(), strip_punctuation(), and strip_non_digits(). You ll find all these functions in the version of WebDB.pm provided in this book s webdb source distribution. ^[7]

^[7] If you decide to add other validation functions to WebDB.pm and you re using mod_perl, remember that you ll need to restart Apache. mod_perl notices only changes to scripts that are invoked directly, not changes to library modules.

Making entries more uniform helps us in two ways. First, we can attempt to eliminate as many duplicate entries as possible at the time they are submitted. To do this, we ll require contestants to provide some piece of information that makes entries unique. If you consider telephone numbers unique and want to allow only one entry per person per day, for example, you can construct a unique index on the contestant record table that consists of the entry date and the telephone number. Then if someone attempts to enter twice on the same day, we can toss the second entry by noticing that an entry with the same phone number has already been received. (There are at least three problems with this strategy: A person might have multiple phone numbers and could submit one entry per number; a person might enter once with the area code specified and once without; and people who legitimately share the same phone number, such as household members, are prevented from each submitting separate entries. However, I m going to ignore these problems and just observe that they illustrate the difficulty of arriving at bulletproof uniqueness criteria. You will of course want to determine specific fraud-detection guidelines for your own situation.)

Second, entries that are more uniform help us each time we draw a winning entry. When we select a specific contestant record, we can use it to locate other entries that are near matches and examine them to see whether they look suspiciously similar to the winning entry.

Designing the Contestant Table

The table for storing contestant entries is not complicated. As usual, we ll have an AUTO_INCREMENT column named id to which MySQL will assign unique identifier numbers, and a column to record the record-creation time. We ll also have a few columns that identify the contestant:

 CREATE TABLE contestant  (     id          INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,      entry_date  DATE NOT NULL,      name        VARCHAR(50) NOT NULL,      email       VARCHAR(60) NOT NULL,      telephone   VARCHAR(20) NOT NULL,      UNIQUE (entry_date, telephone)  )

The contestant table includes a unique index composed of values in the entry_date and telephone columns. This enforces the constraint that only one entry per day can be submitted for any given phone number. The date can be assigned by our application, giveaway.pl, when it receives an entry. The telephone number must be provided by the contestant, but might be entered using varying formats, so we ll perform some preprocessing on it to increase our chances of duplicate detection.

Collecting Contest Entries

The main logic of the giveaway.pl script is pretty much identical to that of prod_reg.pl, so it need not be shown here. Basically, it checks the choice parameter to see whether it s empty or submit, and then displays the entry form or processes a submitted form accordingly.

The contestant entry form contains only three text-input fields (contestant s name, email address, and telephone number) and a Submit button. Here again, as with the product-registration application, we can use a hash to describe each field, and then pass the hash array to the form-generating and record-entry functions. The field array looks like this:

 my @field_list =  (     { name => "name", label => "Name:" },      { name => "email", label => "Email address:" },      { name => "telephone", label => "Telephone:" }  );

The array contains field hashes that differ slightly from those we used for the product-registration form. First, there is no req hash attribute to indicate which fields must be non-empty. We re going to require all fields to be filled in, so the form-validation code can enforce that requirement uniformly for all fields. Second, we ll make all fields the same length, so there s no need for a size attribute, either. The code to generate the entry form ends up much like that for product registrations, with some small modifications:

 sub display_entry_form  { my $field_ref = shift;      # reference to field list  my @row;      print start_form (-action => url ()),              p ("Please complete the following form to submit your\n"                  . "entry for our giveaway contest. Please, only\n"                  . "one entry per person per day.");      foreach my $f (@{$field_ref})      {         push (@row, Tr (                 td (escapeHTML ($f->{label})),                  td (textfield (-name => $f->{name}, -size => 60 ))              ));      }      print table (@row),              submit (-name => "choice", -value => "Submit"),              end_form ();  }

When a contestant submits an entry, we ll perform some tests on it. If the entry is unsuitable, we ll redisplay the form with appropriate feedback indicating how to fix the problems. Otherwise, we ll use the form contents to create a new contestant table record and thank the user for participating:

 sub process_form  { my $field_ref = shift;      # reference to field list  my @errors;  my ($name, $email, $telephone);  my $dbh;      # First, make sure all fields have a value      foreach my $f (@{$field_ref})      {         my $val = WebDB::trim (param ($f->{name})); # get trimmed field value          push (@errors, $f->{label}) if $val eq "";  # it's empty!          # put modified value back into environment          param (-name => $f->{name}, -value => $val);      }      if (@errors)      {         print p ("Some information is missing."                  . " Please fill in the following fields:");          s/:$// foreach (@errors);   # strip colons from end of labels          print ul (li (\@errors));           # print column names          display_entry_form ($field_ref);    # redisplay entry form          return;      }      # Re-extract the modified values from the environment and perform some      # field-specific checks. Also, transform values to more uniform format      # to make it easier to catch duplicates.      $name = param ("name");      $email = param ("email");      $telephone = param ("telephone");      # Collapse runs of white space, eliminate punctuation in the name field      $name = WebDB::collapse_whitespace ($name);      $name = WebDB::strip_punctuation ($name);      # Check the email value using a rudimentary pattern that requires      # a user name, an @ character, and a hostname containing at least      # two components.      if (!WebDB::looks_like_email ($email))      {         push (@errors, "Email address must be in user\@host.name format");      }      # Strip non-digit characters from telephone number      $telephone = WebDB::strip_non_digits ($telephone);      if (length ($telephone) < 7)      {         push (@errors, "Telephone number must contain at least seven digits");      }      if (@errors)      {         print p ("Your entry could not be processed. Please\n"                  . "correct the following problem(s) and resubmit the entry:");          print ul (li (\@errors));           # print error messages          display_entry_form ($field_ref);    # redisplay entry form          return;      }      # Everything looks okay. Insert record, thank user.      # Use INSERT IGNORE to ignore attempts to enter twice.      # (If an entry contains entry_date and telephone values      # that duplicate an existing record, it will be rejected.)      $dbh = WebDB::connect();      $dbh->do ("INSERT IGNORE INTO contestant (entry_date,name,email,telephone)                  VALUES(CURRENT_DATE,?,?,?)",              undef,              $name, $email, $telephone);      $dbh->disconnect ();      print p ("We have received your entry. Thank you.");      print p ("We will notify you if you are a winner!");  }

The test to verify that each field has been filled in is similar to what we ve done before: Trim whitespace from the ends and check whether the result is empty. To avoid having to strip whitespace again later when performing more specific tests, the loop stores the modified value back into the environment:

 # put modified value back into environment  param (-name => $f->{name}, -value => $val);

That way, when we extract the values again later, we get the already-stripped versions. In effect, we re treating the parameter space as a global clipboard that can be used to pass data around between different phases of an application.^[8]

^[8] We could also have used the technique of storing the stripped variable values in a value attribute of the field information hashes, the way we did for the product-registration application. I m using param() here to illustrate another technique that accomplishes the same end without modifying the field hashes.

For fields found to be missing, we collect error messages and display them if the form is not complete. If values were supplied for all fields, further processing is performed on each field according to the type of information it s expected to contain. The field-specific operations are designed to verify that the content matches some expected format and to put values into a standardized format that will help us identify duplicate entries.

For the name, we collapse runs of whitespace characters to single spaces and eliminate punctuation. This isn t necessary for validation of the entry, but it s useful for fraud detection later: After we pick a winning entry, standardizing the name values will help us find possible duplicates on the name field.

For the email value, we call looks_like_email(), a utility routine that checks whether a value looks like an email address in user@host.name form. Email address validation can be a really complicated task to perform in any exhaustive sense. For example, Friedl (Mastering Regular Expressions, by Jeffrey E. F. Friedl) shows a regular expression that works for the general case, but it s 6598 characters long! I prefer something a bit less complex that just weeds out obvious clunkers. We ll require a string containing a username, an @ character, and a domain name consisting of at least two components separated by periods:

 sub looks_like_email  { my $str = shift;      return ($str =~ /^[^@]+@[^.]+\.[^.]/);  }

As an explanation of the pattern, here it is again, this time expressed using Perl s /.../x pattern notation that allows embedded comments:

 return ($str =~ /              ^       # match beginning of string              [^@]+   # match non-empty sequence of non-@ characters              @       # match literal @ character              [^.]+   # match non-empty sequence of non-period characters              \.      # match literal period              [^.]    # require at least one more non-period character  /x);

The pattern match passes correct addresses and flunks several forms of malformed addresses:

 fred@user-surly.com               okay  fred@central.user-surly.com       okay  fred.rumblebuffin@user-surly.com  okay  fred@user-surly                   bad; domain name has only one component  fred@                             bad; no domain name  @user-surly.com                   bad; no user name  @                                 bad; no user name or domain name

looks_like_email() is the kind of routine that s likely to be useful in several scripts, so I put it in WebDB.pm to make it easily accessible. Putting the address-checking test in a function also is beneficial if you decide at some point that you prefer to use a different pattern for checking addresses perhaps one that s stricter. Just modify looks_like_email() and all your scripts that use it immediately become more strict in what they consider a valid address. You don t need to modify each script individually.^[9]

^[9] One caveat: Remember that in a mod_perl environment, you d need to restart Apache to get it to notice that the library file containing the function has been changed.

Telephone numbers are one of those types of information that can be specified different ways, so we strip non-digit characters to convert the value to a standard format before storing it. Also, it s a good idea to check the result to make sure it contains at least seven digits, because anything shorter is sure to be malformed:

 $telephone = WebDB::strip_non_digits ($telephone);  if (length ($telephone) < 7)  {     push (@errors, "Telephone number must contain at least seven digits");  }

If the @errors array is non-empty after the field-specific tests, one or more problems were found, so process_form() displays the error messages and shows the form again so that the user can correct the problems. Otherwise, everything looks okay and the entry can be added to the contestant table.The query uses INSERT IGNORE rather than just INSERT. The IGNORE keyword is a MySQL-specific extension to the INSERT statement that s very helpful in this case. It causes the new entry to be discarded silently if it contains the same telephone number as an existing record entered on the same day. Without IGNORE, the script would abort with an error if the entry happens to duplicate an existing record.

An alternative to INSERT IGNORE is REPLACE (another MySQL extension), which would kick out any earlier entry that matches the new one. However, INSERT IGNORE seems a bit fairer in this case. It keeps the existing record, which gives precedence to the earliest entry.

process_form() could test whether the new entry was inserted or ignored, by checking the row count returned by the do() call. If the result is zero, the record was a duplicate and was discarded. However, we don t bother checking. What for? If someone really is attempting to submit multiple entries, why provide any feedback about the success or failure of the attempt? Letting such users know they failed only serves to alert them that they need to try harder. The principle here is to provide as little information as possible to the bad guys.

Picking Winning Entries

It s necessary to pick a winning entry for each contest period. Our giveaway is a daily one, so picking an entry is a matter of selecting a random row from the entries submitted on a given day. For example, the contest for January 17, 2001 closes at the end of the day, so as of January 18, 2001 on, we can pick a winner any time, like this:

 mysql> SELECT * FROM contestant WHERE entry_date = '2001-01-17'      -> ORDER BY RAND() LIMIT 1;  +-----+------------+-------------+-----------------+-----------+  | id  | entry_date | name        | email           | telephone |  +-----+------------+-------------+-----------------+-----------+  | 97  | 2001-01-17 | PaulDuBois  | paul@snake.net  | 5551212   |  +-----+------------+-------------+-----------------+-----------+

The WHERE clause in the query identifies the candidate rows from which you want to make a selection, ORDER BY RAND() sorts the rows into a random order, and LIMIT 1 restricts the result set to the first of these rows. The effect is to pick one of the given day s rows at random.

You cannot use RAND() in an ORDER BY clause prior to MySQL 3.23.2. However, there is a workaround available for older versions. It involves selecting an additional column containing random numbers and sorting the result on that column. This randomizes the result set with respect to the other columns:

 mysql> SELECT *, id*0+RAND() AS rand_num      -> FROM contestant WHERE entry_date = '2001-01-17'      -> ORDER BY rand_num LIMIT 1;  +-----+------------+-------------+-----------------+------------+--------------+  | id  | entry_date | name        | email           | telephone  | rand_num     |  +-----+------------+-------------+-----------------+------------+--------------+  | 53  | 2001-01-17 | PaulDuBois  | paul@snake.net  | 6085551212 | 0.2591492526 |  +-----+------------+-------------+-----------------+------------+--------------+

The inclusion of id*0 in the expression defeats the query optimizer, which would otherwise think that the additional rand_num column contains constant values and optimizes the ORDER BY rand_num clause by eliminating it from the query.

Now that we have a query for selecting a random row, we can begin picking winning entries.To do so manually, you can run one of the SELECT queries just shown, substituting the appropriate date. Or the computer can do the work automatically if you use a cron job that runs each day to choose a winner from among the preceding day s entries. For example, I might put the following entry in my crontab file to run a script named pick_winner.pl at 5 a.m. every day:

 0 5 * * * /u/paul/bin/pick_winner.pl

Assuming that cron is of the variety that automatically mails job output to the crontab file owner, pick_winner.pl can select a winning entry and display its contents using normal print statements. We can test the script easily by running it from the command line, yet receive the results by mail when the script runs under cron.

pick_winner.pl begins by connecting to the database (not shown), and then determines whether there are any relevant entries for the contest date:

 # default date is "yesterday"  my $contest_date = "DATE_SUB(CURRENT_DATE,INTERVAL 1 DAY)";  # select date in CCYY-MM-DD form  my $date = $dbh->selectrow_array ("SELECT $contest_date");  my $count = $dbh->selectrow_array (                 "SELECT COUNT(*) FROM contestant                  WHERE entry_date = $contest_date");  print "Giveaway contest results for $date:\n";  print "There were $count entries.\n";

The DATE_SUB() expression evaluates to the date for yesterday no matter what the current date is, so we don t need to figure it out ourselves. However, the output from pick_winner.pl should indicate which date it s picking a winner for, and printing a DATE_SUB() expression literally isn t very useful for that, to say the least. The first SELECT query solves this problem.^[10] MySQL evaluates DATE_SUB() and returns the resulting date in standard CCYY-MM-DD format that will be more meaningful for display purposes. (You could put the DATE_SUB() expression inside a call to DATE_FORMAT() if you want a different output format, of course.)

^[10] Some database engines require that a SELECT statement always use at least one table. MySQL does not, so we can use SELECT as a calculator that evaluates expressions without reference to any table.

Next, pick_winner.pl counts the number of entries for the given date and reports the date and entry count. If there are any entries, it picks one at random and displays its contents:

 if ($count > 0)     # don't bother picking winner if there are no entries  {     my ($id, $name, $email, $telephone) = $dbh->selectrow_array (                 "SELECT id, name, email, telephone FROM contestant                  WHERE entry_date = $contest_date                  ORDER BY RAND() LIMIT 1");      # Paranoia check; this shouldn't happen      die "Error, couldn't select winning entry: $DBI::errstr\n"                                          unless defined ($id);      print "The winning entry is:\n";      print "id: $id\n";      print "name: $name\n";      print "email: $email\n";      print "telephone: $telephone\n";  }

Run the script from the command line to make sure it works:

 % /u/paul/bin/pick_winner.pl  Giveaway contest results for 2001-01-17:  There were 4 entries.  The winning entry is:  id: 397  name: Wendell Treestump  email: wt@stumpnet.org  telephone: 1234567

If you have more than one entry, you can run the script several times to verify that it doesn t always pick the same entry.

To make pick_winner.pl more useful, we can modify it slightly to accept an optional date from the command line. That way we can override the default date and pick winners for any day. (And, as it happens, by specifying today s date, we can use pick_winner.pl to see how many entries have been submitted for the current date so far. In this case we ignore the winner entry, of course, because other entries might still arrive.) The modification is quite trivial. Following the line that sets the default date, add another line that checks the @ARGV array for a command-line argument:

 # default date is "yesterday"  my $contest_date = "DATE_SUB(CURRENT_DATE,INTERVAL 1 DAY)";  $contest_date = "'" . shift (@ARGV) . "'" if @ARGV;

Now we can invoke the script with an explicit date:

 % /u/paul/bin/pick_winner.pl 2001-1-15

With a little more work, we can have pick_winner.pl recognize dates such as -n to mean n days ago. Add the following lines after the @ARGV -checking line:

 if ($contest_date =~ /^'-(\d+)'$/)  # convert -n to n days ago  {     $contest_date = "DATE_SUB(CURRENT_DATE,INTERVAL $1 DAY)";  }

That makes it easier to specify dates on the command line. To pick a winner for three days ago or to see how many entries have been submitted today, use the following commands:

 % /u/paul/bin/pick_winner.pl -3  % /u/paul/bin/pick_winner.pl -0

We attempted to eliminate exact duplicates at record-entry time by standardizing the telephone number that forms part of the unique table index. If you like, you can also look for duplicates at winner-picking time by modifying pick_winner.pl to produce additional information that may be helpful for assessing whether your winner is legitimate or fraudulent. To do this, run some queries that look for near matches to the winning entry. If these queries produce any output, you can investigate further. (These results should be considered advisory only; you should examine them manually to evaluate for yourself whether you think something funny is going on.)

One approximate-matching technique can be used to locate questionable entries based on telephone numbers. The number is supposed to be unique among all entries for a given day, but we can eliminate duplicates at entry-submission time only if the number is exactly the same as the number in an existing entry. How about the following entries; do they look like duplicates to you?

 +------+------------+----------------+---------------------+-------------+  |  id  | entry_date | name           | email               | telephone   |  +------+------------+----------------+---------------------+-------------+  | 407  | 2001-01-17 | BillBeaneater  | bill@beaneater.net  | 5551212     |  | 413  | 2001-01-17 | BillBeaneater  | bill@beaneater.net  | 6085551212  |  | 498  | 2001-01-17 | BillBeaneater  | bill@beaneater.net  | 16085551212 |  | 507  | 2001-01-17 | Fred Franklin  | fred@mint.gov       | 4145551212  |  +------+------------+----------------+---------------------+-------------+

The telephone numbers in the second and third entries aren t exact matches to the number in the first entry, so our giveaway.pl script didn t reject them. But as humans, we can easily see that they represent the same telephone number (assuming that 608 is the local area code). The contestant just used the country and area codes to construct phone numbers that are valid but not exactly the same. We can also see that the fourth entry comes from a different area code, so it s not a duplicate which illustrates why we need to examine approximate-match query output manually.

To find entries with a telephone number that is close to the number in the winning entry, use the following query ($id represents the id value of the winning entry):

 SELECT contestant.*  FROM contestant, contestant AS winner  WHERE winner.id = $id  AND contestant.entry_date = winner.entry_date  AND contestant.id != winner.id  AND RIGHT(contestant.telephone,7) = RIGHT(winner.telephone,7)

The query involves a self-join. It locates the winning entry in the contestant table (under the alias winner), and then compares that entry to all others submitted on the same day, looking for a match on the rightmost seven digits of the telephone number. (As with all self-joins, we must refer to the table two different ways to make it clear to MySQL which instance of the table we mean at each point in the query.)

Use the name column instead if you want to see whether the winner submitted several entries under the same name. (Perhaps the winner has several phone numbers and submitted one entry per number.) The query is similar to the preceding one except that it looks for matches in the name value:

 SELECT contestant.*  FROM contestant, contestant AS winner  WHERE winner.id = $id  AND contestant.entry_date = winner.entry_date  AND contestant.id != winner.id  AND contestant.name = winner.name

The query to look for duplicates on the email address is the same except that both references to the name column in the last line should be changed to email.

If you decide that a contest winner should be disqualified, just rerun pick_winner.pl manually to choose another winner for the appropriate date. (This was in fact one of the reasons for writing it to take an optional date on the command line: That enables you to run the script whenever you want and tell it the day for which you want it to pick a winner.)

Other Uses for Near-Match Detection

The techniques discussed for finding similar records in the contestant table can be adapted for other contexts. Suppose you use your database to generate mailing labels from a list of newsletter subscribers or people to whom you send advertising flyers. If you want to cut down on postage costs, you might try modifying these queries to find instances where you re sending multiple mailings to the same person.

Other more general queries can be used to help you assess the extent to which duplicates are present over the entire table (not just for the winning entry). The following queries attempt to identify duplicates for each day, based on the three contestant-supplied values:

 SELECT entry_date, name, COUNT(*) AS count  FROM contestant  GROUP BY entry_date, name HAVING count > 1  SELECT entry_date, email, COUNT(*) AS count  FROM contestant  GROUP BY entry_date, email HAVING count > 1  SELECT entry_date, RIGHT(telephone,7) AS phone, COUNT(*) AS count  FROM contestant  GROUP BY entry_date, phone HAVING count > 1

Notifying the Winner

If you want to contact a contest winner by telephone, you can do that by looking at the phone number in the winner s contestant table entry. Of course, that doesn t involve any programming, so it s not very interesting! Let s assume, therefore, that you want to issue the you re a winner! notification by email, using the email value in the entry. We ll write a notify_winner.pl script that uses the Mail::Sendmail module discussed earlier in the chapter ( Using Email from Within Applications ). The only piece of information this script needs from us is the winning entry ID number. notify_winner.pl can look up the appropriate record and determine from its contents where the message should be sent:

 #! /usr/bin/perl -w  # notify_winner.pl - Notify giveaway contest winner, given winning entry ID  use strict;  use lib qw(/usr/local/apache/lib/perl);  use Mail::Sendmail;  use WebDB;  # Make sure there's a command-line argument and that it's an integer  @ARGV or die "Usage: $0 winning_entry_id\n";  my $id = shift (@ARGV);  $id =~ /^\d+$/ or die "Entry ID $id is not an integer.\n";

 # Retrieve winning entry from database  my $dbh = WebDB::connect();  my ($entry_date, $name, $email, $telephone) = $dbh->selectrow_array (                 "SELECT entry_date, name, email, telephone FROM contestant                  WHERE id = ?",                  undef, $id);  $dbh->disconnect ();  die "Sorry, there's no entry number $id\n" unless defined ($entry_date);  # Construct mail message and send it (use login name of the user who's  # running the script for the From: address)  my $login = getpwuid ($>) or die "Cannot determine your user name: $!\n";  my %mail =  (     To => $email,      From => "$login\@localhost",      Subject => "Congratulations, you're a winner!",      Message => "  Congratulations, $name!  You are the winner of our giveaway  contest for the date $entry_date.  To claim your fabulous prize,  please follow these instructions:  <insert instructions for claiming fabulous prize here>."  );  sendmail (%mail) or die "Attempt to send mail failed\n";  exit (0);

The script constructs the From: address using the login name of the person running it. You might want to change that. (You should also modify the message body to provide appropriate instructions for claiming the prize.)

Dealing with Old Entries

Eventually the contestant table may grow quite large. That might not bother you if you want to maintain the entries for statistical purposes. For example, you can run queries such as the following ones to generate a summary of submission activity to assess how popular the giveaway is. The first produces a daily summary, and the second summarizes by month:

 SELECT      entry_date,      COUNT(*) AS count  FROM contestant GROUP BY entry_date  SELECT      YEAR(entry_date) AS year,      MONTH(entry_date) AS month,      COUNT(*) AS count  FROM contestant GROUP BY year, month

The webdb distribution accompanying the book includes a script giveaway_summary.pl that you can install in your Web server s script directory and invoke to display the result of these queries in a browser window.

You may prefer to delete rather than retain old entries. The following query clobbers all submissions from before January 1, 2001:

 mysql> DELETE FROM contestant WHERE entry_date < '2001-01-01';

It s likely you d want this kind of query to be run automatically so that you don t have to remember to do it yourself. Here s a script, expire_contestant.pl, that does so. (To retain old entries for a different number of days, change the $days value):

 #! /usr/bin/perl -w  # expire_contestant.pl - clobber old contestant entries  use strict;  use lib qw(/usr/local/apache/lib/perl);  use WebDB;  # how long to keep entries (in days)  my $days = 30;  my $dbh = WebDB::connect();  $dbh->do (         "DELETE FROM contestant          WHERE entry_date < DATE_SUB(CURRENT_DATE,INTERVAL ? DAY)",          undef, $days);  $dbh->disconnect ();  exit (0);

Now, set up an entry in your crontab file to run expire_contestant.pl as a cron job. The format of the entry depends on how often you want expiration to occur. The following entries all run the job at 4 a.m., but vary in the frequency of expiration choose the form you prefer. The first runs daily, the second each Sunday, and the third on the first day of each month:

 0 4 * * * /u/paul/bin/expire_contestant.pl  0 4 * * 0 /u/paul/bin/expire_contestant.pl  0 4 1 * * /u/paul/bin/expire_contestant.pl

Suggested Modifications

The contestant table contains no information indicating who the winners are, so you ll have to remember whom you pick each day. Add a column to the table and modify notify_winner.pl to update the appropriate entry by marking it as a winner.

Consider what modifications you d need to make to change the drawing period from daily to weekly or monthly. Changing the period of time over which contestants are allowed a single entry affects several things:

What you d use for the unique key in the contestant table
How often you draw entries, and the date range you use to select candidate rows
The notification message in notify_winner.pl
Any approximate-match duplicate detection queries you may be using
cron job scheduling for pick_winner.pl and expire_contestant.pl

In general, these changes will follow from any changes you make to the contestant table to determine the unique key. If you want to conduct a monthly giveaway, for example, you could add year and month columns to the contestant table, and change the unique index from entry_date and telephone to year, month, and telephone. Then, at entry submission time, use YEAR(CURRENT_DATE) and MONTH(CURRENT_DATE) to insert the current year and month values. That way, you can continue to use INSERT IGNORE to discard duplicate records automatically at entry submission time. Other queries later on in the contest process would select records based on year and month rather than entry_date.

Use MySQL to provide you with information about how your Web site is being used and to help you evaluate its effectiveness: Log apparent attempts at submitting duplicate entries. (Check whether the row count returned by the INSERT IGNORE query in giveaway.pl is zero, and log the entry if so.) This gives you information that might indicate widespread abuse attempts and that can help you decide to implement stricter entry requirements, or perhaps just to terminate the contest altogether.

The INSERT IGNORE statement in giveaway.pl uses the MySQL-specific IGNORE extension. If you want to make the script more portable so that it runs under other databases, yet doesn t die with an error on attempts to enter duplicate entries, how do you do it?

only for RuBoard - do not distribute or recompile