Grouping Your Output | Perl Database Programming

Grouping (GROUP BY) your data is different from ordering (ORDER BY) your data. When you use the GROUP BY statement, your data is grouped together in the field(s) that you specify in the SQL statement.

Grouped together means the data is aggregated so that you do not get multiple records with the same data. Think of this like data buckets. Each bucket can hold a number of items, and we are able to count how many items are in each bucket.

Take a look at the following example; this concept should become much clearer:

01: #!/usr/bin/perl -w 02: # 03: # select5.pl 04: # Chapter 4 05: # Listing 4-5 06: # 07: use strict; 08: use DBI; 09: my $dbh = DBI->connect ("DBI:mysql:BibleBook","bookuser","testpass")  10:     or die("Cannot connect: $DBI::errstr");

Lines 1-10 reuse code from the earlier examples in this chapter.

11: my $sql = qq(SELECT COUNT(*), publisher FROM library GROUP BY publisher);

Line 11 is our SQL statement, but it looks a bit different from the others.

This SQL statement shows a feature of the SQL SELECT statement that we have not used yet. After the SELECT and before the FROM statement, instead of putting an asterisk, we place a COUNT(*) statement and a database field name.

In English, this SQL statement sounds something like this: Select the total number of matching items along with the publisher field from the library table, and then group the results by publisher.

The COUNT(*) is the number of matches for each item we group on (publisher in this case). The publisher field is exactly that-the publisher field. We end up with the number of books from each publisher (because of the count(*) statement) and each publisher's name (because of the publisher field after the SELECT statement).

12: my $sth = $dbh->prepare($sql); 13: $sth->execute; 14: while(my $record = $sth->fetch){ 15:     for my $field(@$record){ 16:         print "$field "; 17:     } 18:     print "\n"; 19: }

Lines 12-19 are very similar to the code we have already covered in this chapter. We have, however, removed the counter code, since we've decided to let the DBMS handle the counting for us in this example.

As you can see in the following code, this program produces an output that is very different from that of the other programs we've been working with. The first column is the total number of books from the publisher, the name of which is listed in the second column.

8  Addison Wesley Longman, Inc.  1  Coriolis Group  4  IDG Books Worldwide  3  Macmillan Computer Publishing  1  Manning Publications Company  5  McGraw-Hill Companies, The  1  New Riders Publishing  22  O'Reilly & Associates, Incorporated  2  Prentice Hall  8  Sams  1  Specialized Systems Consultants  3  Wiley, John & Sons, Incorporated  1  Wrox Press, Inc.

Why use the COUNT function instead of just setting up a counter and incrementing it? Setting up and incrementing a counter means you have to go through each data item and increment the counter each time to maintain an accurate count. If you have a small data-set, that may be no big deal. But if you have a large data-set, it can take quite a bit longer. The COUNT function in SQL is optimized internally and doesn't have to iterate through each data element to get the count. Whenever you can, use COUNT; it can definitely make a difference in program-execution speed.

Also, consider this: if you use a counter for the preceding program, you have to keep track of each publisher somehow, incrementing the counter for that publisher only. If you have books from 100 publishers, your code will get pretty unmanageable!