Tables or result sets sometimes contain duplicate records. In some cases this is acceptable. For example, if you conduct a web poll that records dates and client IP numbers along with the votes, duplicate records may be allowable, because it's possible for large numbers of votes to appear to originate from the same IP number for an Internet service that routes traffic from its customers through a single proxy host. In other cases, duplicates will be unacceptable, and you'll want to take steps to avoid them. Operations related to handling of duplicate records include the following:
Several tools are at your disposal for dealing with duplicate records. These can be chosen according to the objective you're trying to achieve:
This chapter describes how each of these techniques applies to duplicate identification and removal, but before proceeding further, I should define what "duplicate" means here. When people say "duplicate record," they may mean different things. For purposes of this chapter, one record is a duplicate of another if both rows contain the same values in columns that are supposed to distinguish them. Consider the following table:
mysql> SELECT * FROM person; +------+-----------+------------+---------------+------+ | id | last_name | first_name | address | age | +------+-----------+------------+---------------+------+ | 1 | Smith | Jim | 428 Mill Road | 36 | | 2 | Smith | Joan | 428 Mill Road | 36 | | 3 | Smith | Junior | 428 Mill Road | 12 | +------+-----------+------------+---------------+------+
None of these records are duplicates if you compare rows using all the columns, because then the records contain the id and first_name columns, each of which happen to contain only unique values. However, if you look only at the last_name or address columns, all the records contain duplicated values. Lying between these extremes, a result set consisting of the age column contains a mix of unique and duplicated values.
Scripts related to the examples shown in this chapter are located in the dups directory of the recipes distribution. For scripts that create the tables used here, look in the tables directory.
Using the mysql Client Program
Writing MySQL-Based Programs
Record Selection Techniques
Working with Strings
Working with Dates and Times
Sorting Query Results
Generating Summaries
Modifying Tables with ALTER TABLE
Obtaining and Using Metadata
Importing and Exporting Data
Generating and Using Sequences
Using Multiple Tables
Statistical Techniques
Handling Duplicates
Performing Transactions
Introduction to MySQL on the Web
Incorporating Query Resultsinto Web Pages
Processing Web Input with MySQL
Using MySQL-Based Web Session Management
Appendix A. Obtaining MySQL Software
Appendix B. JSP and Tomcat Primer
Appendix C. References