Section 16.5. Slices | Learning Perl, 5th Edition

16.5. Slices

It often happens that we need to work with a few elements from a given list. For example, the Bedrock Library keeps information about its patrons in a large file.^[*] Each line in the file describes one patron with six colon-separated fields: a person's name, library card number, home address, home phone number, work phone number, and number of items currently checked out. A little bit of the file looks something like this:

^[*] It should be a full-featured database rather than a flat file. They plan to upgrade their system, right after the next Ice Age.

     fred flintstone:2168:301 Cobblestone Way:555-1212:555-2121:3
     barney rubble:709918:3128 Granite Blvd:555-3333:555-3438:0

One of the library's applications needs the card numbers and number of items checked out; it doesn't use any of the other data. It could use code something like this to get the fields it needs:

     while (<FILE>) {
       chomp;
       my @items = split /:/;
       my($card_num, $count) = ($items[1], $items[5]);
       ...  # now work with those two variables
     }

The array @items isn't needed for anything else, though, so it seems like a waste.^[*] Maybe it would be better to assign the result of split to a list of scalars, like this:

^[*] It's not much of a waste, but stay with us. All of these techniques are used by programmers who don't understand slices, so it's worthwhile to see all of them here.

     my($name, $card_num, $addr, $home, $work, $count) = split /:/;

That avoids the unneeded array @items, but now we have four scalar variables that we didn't need. For this situation, some people used to make up a number of dummy variable names, like $dummy_1, that showed they didn't care about that element from the split. Larry thought that was too much trouble, so he added a special use of undef. If an item in a list being assigned to is undef, that means to ignore the corresponding element of the source list:

     my(undef, $card_num, undef, undef, undef, $count) = split /:/;
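Here is a small sketch of the undef placeholders at work. To keep it self-contained, it splits one line of the sample data explicitly with `split /:/, $line` instead of relying on `$_` inside a file-reading loop. The variable `$line` is only for illustration.

```perl
use strict;
use warnings;

# One record in the patron-file format (sample data from the text).
my $line = "fred flintstone:2168:301 Cobblestone Way:555-1212:555-2121:3";

# Each undef discards the field in the matching position of split's result.
my(undef, $card_num, undef, undef, undef, $count) = split /:/, $line;

print "card $card_num has $count item(s) checked out\n";
```

Running it prints `card 2168 has 3 item(s) checked out`.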

Is this any better? Its advantage is having no unneeded variables. Its disadvantage is that you have to count undefs to tell which element is $count. This becomes unwieldy if the list has more elements. For example, some people who wanted just the mtime value from stat were writing code like this:

     my(undef, undef, undef, undef, undef, undef, undef,
        undef, undef, $mtime) = stat $some_file;

If you use the wrong number of undefs, you'll get the atime or ctime by mistake, and that's a tough one to debug. There's a better way. Perl can index into a list as if it were an array. This is a list slice. Since the mtime is item 9 in the list returned by stat,^[†] we can get it with a subscript:

^[†] It's the tenth item, but the index number is 9, since the first item is at index 0. This is the same kind of zero-based indexing we've used with arrays.

     my $mtime = (stat $some_file)[9];

Those parentheses are required around the list of items (in this case, the return value from stat). If you wrote it like this, it wouldn't work:

     my $mtime = stat($some_file)[9];  # Syntax error!

A list slice must have a subscript expression in square brackets after a list in parentheses. The parentheses holding the arguments to a function call don't count.
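The same rule applies to any function that returns a list. The sketch below uses gmtime and localtime instead of stat, so it doesn't depend on any particular file existing. In list context, both return (sec, min, hour, mday, mon, year, wday, yday, isdst), and element 5 is the year counted from 1900.

```perl
use strict;
use warnings;

# The parentheses around the call make a list that can be sliced.
my $epoch_year = (gmtime(0))[5] + 1900;   # the Unix epoch, in UTC
my $this_year  = (localtime)[5] + 1900;   # the current local year

# Writing gmtime(0)[5] instead would be a syntax error, just like stat above.
print "The epoch began in $epoch_year; it is now $this_year.\n";
```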

Going back to the Bedrock Library, the list we're working with is the return value from split. We use a slice to pull out item 1 and item 5 with subscripts:

     my $card_num = (split /:/)[1];
     my $count = (split /:/)[5];

Using a scalar-context slice like this (pulling a single element from the list) isn't bad, but it would be more efficient and simpler if we didn't have to do the split twice. So let's not do it twice; let's get both values at once by using a list slice in list context:

     my($card_num, $count) = (split /:/)[1, 5];

The indices pull out elements 1 and 5 from the list, returning those as a two-element list. When that's assigned to the two my variables, we get exactly what we wanted. We do the slice once, and we set the two variables with a simple notation.
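Here is a runnable version of the library loop. It stands the two sample records from the text in for the patron file. The `%checked_out` hash, which maps card numbers to item counts, is a hypothetical summary added only so the result can be printed.

```perl
use strict;
use warnings;

# The two sample records from the text, standing in for the patron file.
my @lines = (
  "fred flintstone:2168:301 Cobblestone Way:555-1212:555-2121:3",
  "barney rubble:709918:3128 Granite Blvd:555-3333:555-3438:0",
);

my %checked_out;   # hypothetical: card number => items checked out
foreach (@lines) {
  # One split of $_, one slice: keep only fields 1 and 5.
  my($card_num, $count) = (split /:/)[1, 5];
  $checked_out{$card_num} = $count;
}

print "$_: $checked_out{$_}\n" for sort keys %checked_out;
```

Each line is split exactly once, and only the two wanted fields are kept.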

A slice is often the simplest way to pull a few items from a list. Here, we can pull the first and last items from a list, knowing that index -1 means the last element:^[*]

^[*] Sorting a list to find the extreme elements isn't likely to be the most efficient way. But Perl's sort is fast enough that this is generally acceptable as long as the list doesn't have more than a few hundred elements.

     my($first, $last) = (sort @names)[0, -1];
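As a quick check, here is that line with a concrete list of names. The names are only illustrative.

```perl
use strict;
use warnings;

my @names = qw/ fred barney wilma dino betty /;

# Index 0 is the first element of the sorted list; index -1 is the last.
my($first, $last) = (sort @names)[0, -1];

print "first: $first, last: $last\n";   # first: barney, last: wilma
```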

The subscripts of a slice may be in any order and may even repeat values. This example pulls five items from a list of ten:

     my @names = qw{ zero one two three four five six seven eight nine };
     my @numbers = ( @names )[ 9, 0, 2, 1, 0 ];
     print "Bedrock @numbers\n";  # says Bedrock nine zero two one zero

16.5.1. Array Slice

That previous example could be simplified. When slicing elements from an array (as opposed to a list), the parentheses aren't needed, so we could have done the slice like this:

     my @numbers = @names[ 9, 0, 2, 1, 0 ];

This isn't merely a matter of omitting the parentheses; this is actually a different notation for accessing array elements: an array slice. In Chapter 3, we said that the at sign on @names meant "all of the elements." Actually, in a linguistic sense, it's more like a plural marker, much like the letter "s" in words like "cats" and "dogs." In Perl, the dollar sign means there's one of something, but the at sign means there's a list of items.

A slice is always a list, so the array slice notation uses an at sign to indicate that. When you see something like @names[ ... ] in a Perl program, you'll need to do as Perl does and look at the at sign at the beginning as well as the square brackets at the end. The square brackets mean you're indexing into an array, and the at sign means you're getting a whole list^[*] of elements. A dollar sign would mean a single one. See Figure 16-1.

^[*] When we say "a whole list," that doesn't necessarily mean more elements than one since the list could be empty, after all.

Figure 16-1. Array slices versus single elements

The punctuation mark at the front of the variable reference (the dollar sign or at sign) determines the context of the subscript expression. If there's a dollar sign in front, the subscript expression is evaluated in a scalar context to get an index. If there's an at sign in front, the subscript expression is evaluated in a list context to get a list of indices.

So we see that @names[ 2, 5 ] means the same list as ($names[2], $names[5]) does. If you want that list of values, you can use the array slice notation. Any place you might want to write the list, you can use the array slice.
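The sketch below shows both points from the last two paragraphs. The same array expression used as a subscript gives a count under a dollar sign but a list of indices under an at sign, and a slice matches the spelled-out list of elements. The `@indices` array is a hypothetical name used only for this demonstration.

```perl
use strict;
use warnings;

my @names   = qw{ zero one two three four five six seven eight nine };
my @indices = (3, 1);

# Dollar sign: the subscript is in scalar context, so @indices yields its
# element count (2), and this is simply $names[2].
my $one = $names[ @indices ];

# At sign: the subscript is in list context, so @indices supplies the indices.
my @some = @names[ @indices ];          # ("three", "one")

# An array slice is the same list as the corresponding single elements.
my @slice    = @names[ 2, 5 ];
my @longhand = ( $names[2], $names[5] );

print "$one | @some | @slice | @longhand\n";   # two | three one | two five | two five
```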

The slice can be used in one place where the list can't: a slice may be interpolated directly into a string:

     my @names = qw{ zero one two three four five six seven eight nine };
     print "Bedrock @names[ 9, 0, 2, 1, 0 ]\n";
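When Perl interpolates a list into a string, it joins the elements with the contents of `$"`, which is a space by default. The sketch below checks that an interpolated slice matches an explicit `join`.

```perl
use strict;
use warnings;

my @names = qw{ zero one two three four five six seven eight nine };

# Interpolating a slice joins its elements with $" (a space by default).
my $interpolated = "Bedrock @names[ 9, 0, 2, 1, 0 ]";
my $joined       = "Bedrock " . join($", @names[ 9, 0, 2, 1, 0 ]);

print "$interpolated\n";   # Bedrock nine zero two one zero
print "same\n" if $interpolated eq $joined;
```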

If we were to interpolate @names, that would give all of the items from the array, separated by spaces. If we interpolate @names[ 9, 0, 2, 1, 0 ], that gives just those items from the array, separated by spaces.^[†]

Let's go back to the Bedrock Library for a moment. Maybe our program is updating Mr. Slate's address and phone number in the patron file because he has moved into a large new place in the Hollyrock hills. If we have a list of information about him in @items, we could do something like this to update those two elements of the array:

^[†] More accurately, the items of the list are separated by the contents of Perl's $" variable, whose default is a space. This should not be changed. When interpolating a list of values, Perl internally does join $", @list, where @list stands in for the list expression.

     my $new_home_phone = "555-6099";
     my $new_address = "99380 Red Rock West";
     @items[2, 3] = ($new_address, $new_home_phone);

Once again, the array slice makes a more compact notation for a list of elements. In this case, that last line is the same as an assignment to ($items[2], $items[3]) but more compact and efficient.
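To see the whole update in context, here is a sketch that starts from a record in the patron-file format, updates it through the slice, and joins it back into a line. Mr. Slate's original card number, address, and phone numbers below are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical record for Mr. Slate: name, card number, address,
# home phone, work phone, items checked out.
my @items = split /:/, "mr slate:1234:12 Quarry Lane:555-0101:555-0102:1";

my $new_home_phone = "555-6099";
my $new_address    = "99380 Red Rock West";

# Assigning to the slice replaces elements 2 and 3 and leaves the rest alone.
@items[2, 3] = ($new_address, $new_home_phone);

print join(":", @items), "\n";
# mr slate:1234:99380 Red Rock West:555-6099:555-0102:1
```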

16.5.2. Hash Slice

Analogous to an array slice, we can slice some elements from a hash in a hash slice. Remember when three of our characters went bowling, and we kept their bowling scores in the %score hash? We could pull those scores with a list of hash elements or with a slice. These two techniques are equivalent, though the second is more concise and efficient:

     my @three_scores = ($score{"barney"}, $score{"fred"}, $score{"dino"});  # a list of elements
     my @three_scores = @score{ qw/ barney fred dino / };                   # or, equivalently, a slice
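Here is a self-contained check that the two forms give the same list. The scores in `%score` are hypothetical values chosen for this sketch, and the extra `wilma` entry shows that a slice picks out only the keys it names.

```perl
use strict;
use warnings;

# Hypothetical scores for this sketch.
my %score = (barney => 195, fred => 205, dino => 30, wilma => 210);

my @long_way   = ($score{"barney"}, $score{"fred"}, $score{"dino"});
my @with_slice = @score{ qw/ barney fred dino / };

print "@with_slice\n";   # 195 205 30
print "equal\n" if "@long_way" eq "@with_slice";
```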

A slice is always a list, so the hash slice notation uses an at sign to indicate that.^[*] When you see something like @score{ ... } in a Perl program, you'll need to do as Perl does and look at the at sign at the beginning as well as the curly braces at the end. The curly braces mean that you're indexing into a hash; the at sign means that you're getting a whole list of elements. A dollar sign would mean a single one. See Figure 16-2.

^[*] If it sounds as if we're repeating ourselves, it's because we want to emphasize that hash slices are analogous to array slices. If it sounds as if we're not repeating ourselves, it's because we want to emphasize that hash slices are analogous to array slices.

As with the array slice, the punctuation mark at the front of the variable reference (the dollar sign or at sign) determines the context of the subscript expression. If there's a dollar sign in front, the subscript expression is evaluated in a scalar context to get a single key.^[†] If there's an at sign in front, the subscript expression is evaluated in a list context to get a list of keys.

^[†] There's an exception you're not likely to run across, since it isn't used much in modern Perl code. See the entry for $; in the perlvar manpage.

Figure 16-2. Hash slices versus single elements

It's normal at this point to wonder why there's no percent sign (%) here, when we're talking about a hash. That's the marker that means there's a whole hash; a hash slice (like any other slice) is always a list and not a hash.^[*] In Perl, the dollar sign means there's one of something, the at sign means there's a list of items, and the percent sign means there's an entire hash.

^[*] A hash slice is a slice (not a hash) in the same way that a house fire is a fire (not a house), while a firehouse is a house (not a fire). More or less.

As you saw with array slices, a hash slice may be used instead of the corresponding list of elements from the hash, anywhere within Perl. So, we can set our friends' bowling scores in the hash (without disturbing any other elements in the hash) in this way:

     my @players = qw/ barney fred dino /;
     my @bowling_scores = (195, 205, 30);
     @score{ @players } = @bowling_scores;

That last line does the same thing as if we had assigned to a three-element list: ($score{"barney"}, $score{"fred"}, $score{"dino"}).

A hash slice may be interpolated, too. Here, we print out the scores for our favorite bowlers:

     print "Tonight's players were: @players\n";
     print "Their scores were: @score{@players}\n";
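The last two snippets can be combined into one complete program. The sketch adds a hypothetical existing entry for `wilma` to show that assigning to a hash slice leaves the other elements of the hash undisturbed.

```perl
use strict;
use warnings;

# Hypothetical prior contents: a score for someone not playing tonight.
my %score = (wilma => 210);

my @players        = qw/ barney fred dino /;
my @bowling_scores = (195, 205, 30);

# Keys and values are paired by position; wilma's entry is untouched.
@score{ @players } = @bowling_scores;

print "Tonight's players were: @players\n";
print "Their scores were: @score{@players}\n";   # 195 205 30
print "Wilma still has $score{wilma}\n";          # 210
```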