Section 14.3. Dealing with Duplicate Rows


14.3. Dealing with Duplicate Rows

Hard- core table types know that every once in a while, despite the utmost caution, a duplicate value slips into a table. Fortunately, Excel has tools that let you find duplicates, wherever they're hiding, and remove them.

14.3.1. Highlighting Duplicates

It's not too hard to fish out these duplicatesone option is to use sorting (described earlier) on the column where you suspect a duplicate exists. Then, if you spot two identical values, you can delete one of the table rows (Section 14.1.3). Of course, in order for this technique to work, you have to be ready to scroll through all the records and check each one. In a supremely long list, that job could take some time.

Fortunately, Excel has another solutionconditional formatting. You've already used it to highlight important data (in Chapter 6), but you can also use it to make repeating values stand out like sore thumbs. Here's how:

  1. Select the table column (as described in Section 14.1.4) that you want to check for duplicate values .

    For example, you could select the Product ID column to look for products that have the same ID value.


    Note: You can highlight more than one column, but if you do, Excel highlights identical values that appear in more than one column. For example, if the same number appears in the Product ID column and in the Price column, Excel highlights it even though it isn't really a duplicate.
  2. Choose Home Styles Conditional Formatting Highlight Cells Rules Duplicate Values .

    When the Duplicate Values dialog box appears, choose the type of formatting you'd like to use to highlight repeated values. People often choose to change the background color .

  3. Click OK .

    Excel changes the background color of all values that appear more than once in the selected column (or columns ), as shown in Figure 14-19. Conditional formatting keeps working even after you've applied it. So, if you add a new record that duplicates the value of an existing record in the column you're checking, Excel immediately highlights it. It's like having a duplicate value cop around at all times.

Figure 14-19. Conditional formatting helps smoke out a product with the same name "Escape Vehicle (Water)"that happens to be in two different places in the table.



Tip: If you have an extremely large table, you may like to use color-based sorting (Section 14.2.3) to bring the duplicate records to the top of the table. For example, if you highlighted duplicate Product ID values with a light red background fill, you'd click the Product ID column's drop-down arrow, choose Sort By Color, and then pick the same color (which automatically appears in the menu).

14.3.2. Removing Duplicates Automatically

Once you've found duplicate records, it's up to you what to do with them. You can leave them in your table or delete them by hand (Section 14.1.3). However, if you don't want to keep the duplicates, Excel has a quicker solution, thanks to a built-in feature that hunts for duplicates and automatically removes the offending rows. Here's how to use it:

  1. Move to any one of the table cells .

    Although you can technically use the duplicate removal feature with any range of cells, it works best with tables because you don't need to select the full range of data yourself.

  2. Choose Data Data Tools Remove Duplicates .

    Excel shows the Remove Duplicates dialog box, where you can choose which columns to search for duplicates.

  3. Decide how many columns need to match in order for Excel to consider the record a duplicate. Add a checkmark next to each column you want to inspect .

    For example, you may decide that any record with the same Product ID should be considered a duplicate. When Excel scans the table, it removes any subsequent record that has the same Product ID as an earlier record, even if the rest of the data is different.

    On the other hand, you may decide that you want to match several columns to prevent the possibility for error. For example, there may be two different products that share the same Product ID due to a minor typo. To avoid this possibility, you could inspect several columns (as shown in Figure 14-20), so that records are removed only if every column value matches. (You can't do this maneuver with conditional formatting and the Highlight Duplicates rule. When you use conditional formatting, you're limited to finding duplicates in one columnif you include more than one column, Excel treats them as one big batch of cells.)

    Figure 14-20. In order to be considered a duplicate in this operation, the record must have the same Product ID, Model Name, and Category. (The Price can vary.) If you need to select all the columns in a hurry, use the Select All button.


  4. Click OK to remove the duplicates .

    Excel scans your table looking for duplicates. If it finds any, it keeps the first copy and deletes those that occur later on in the table. When Excel is finished, it pops up a message box telling you how many rows it removed and how many still remain .

    You don't get a chance to confirm the deletion process, but you can reverse it by using the Undo feature (hit Ctrl+Z) immediately after it finishes.

Remember, when using the Remove Duplicates feature, you have no way of knowing what records Excel's deleted. For that reason, people often use the Highlight Duplicates rule first to check out the duplicates and make sure they don't belong.


Note: The Remove Duplicates feature and the Highlight Duplicates rule don't work in exactly the same way. For instance, Remove Duplicates treats repeated empty cells (blank values) as duplicates, and removes them. The Highlight Duplicates feature ignores empty cells and doesn't highlight them.


Excel 2007[c] The Missing Manual
Excel 2007[c] The Missing Manual
ISBN: 596527594
EAN: N/A
Year: 2007
Pages: 173

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net