2.2. Strings

Strings are sequences of characters (like hello). Strings may contain any combination of any characters.[†] The shortest possible string has no characters. The longest string fills all of your available memory, though you wouldn't be able to do much with that. This is in accordance with the principle of "no built-in limits" that Perl follows at every opportunity. Typical strings are printable sequences of letters, digits, and punctuation in the ASCII 32 to ASCII 126 range. However, the ability to have any character in a string means you can create, scan, and manipulate raw binary data as strings, something with which many other utilities would have great difficulty. For example, you could update a graphical image or compiled program by reading it into a Perl string, making the change, and writing the result back out.

[†] Unlike C or C++, there's nothing special about the NUL character in Perl because Perl uses length counting, not a null byte, to determine the end of the string.
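As a small illustration of length counting, the following sketch builds a string with a NUL byte in the middle and checks that Perl still sees every character. The variable names are made up for the example, and it uses the built-in length and substr functions, which this section has not introduced.

```perl
use strict;
use warnings;

# A five-character string whose third character is NUL (\0).
my $data = "ab\0cd";

# Perl counts characters; it does not stop at the NUL.
my $count = length($data);

# The characters after the NUL are still there and can be extracted.
my $tail = substr($data, 3);

print "$count characters, tail is $tail\n";   # 5 characters, tail is cd
```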

Like numbers, strings have a literal representation, which is the way you represent the string in a Perl program. Literal strings come in two different flavors: single-quoted string literals and double-quoted string literals.

2.2.1. Single-Quoted String Literals

A single-quoted string literal is a sequence of characters enclosed in single quotes. The single quotes are not part of the string itself but are there to let Perl identify the beginning and the ending of the string. Any character other than a single quote or a backslash between the quote marks (including newline characters, if the string continues onto successive lines) stands for itself inside a string. To get a backslash, put two backslashes in a row; to get a single quote, put a backslash followed by a single quote:

     'fred'    # those four characters: f, r, e, and d
     'barney'  # those six characters
     ''        # the null string (no characters)
     'Don\'t let an apostrophe end this string prematurely!'
     'the last character of this string is a backslash: \\'
     'hello\n' # hello followed by backslash followed by n
     'hello
     there'    # hello, newline, there (11 characters total)
     '\'\\'    # single quote followed by backslash

The \n within a single-quoted string is not interpreted as a newline but as the two characters backslash and n. Only when the backslash is followed by another backslash or a single quote does it have special meaning.
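One way to check these rules is to count characters. The sketch below is only an illustration (the variable names are assumptions, and length is a built-in that this section has not introduced yet):

```perl
use strict;
use warnings;

# In single quotes, \n stays as two characters: backslash and n.
my $literal = 'hello\n';
my $len     = length($literal);          # h, e, l, l, o, \, n

# \' and \\ are the only two escapes; each yields one character.
my $quote_backslash = '\'\\';
my $len2            = length($quote_backslash);

print $len, " ", $len2, "\n";            # 7 2
```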

2.2.2. Double-Quoted String Literals

A double-quoted string literal is similar to the strings you may have seen in other languages. Once again, it's a sequence of characters, though this time enclosed in double quotes. But now the backslash takes on its full power to specify certain control characters or any character through octal and hex representations. Here are some double-quoted strings:

     "barney"        # just the same as 'barney'
     "hello world\n" # hello world, and a newline
     "The last character of this string is a quote mark: \""
     "coke\tsprite"  # coke, a tab, and sprite

The double-quoted literal string "barney" means the same six-character string to Perl as does the single-quoted literal string 'barney'. It's like what you saw with numeric literals, where 0377 was another way to write 255.0. Perl lets you write the literal in the way that makes more sense to you. Of course, if you wish to use a backslash escape (like \n to mean a newline character), you'll need to use the double quotes.

The backslash can precede different characters to mean different things (generally called a backslash escape). The nearly complete[*] list of double-quoted string escapes is given in Table 2-1.

[*] Recent versions of Perl have introduced Unicode escapes, which we aren't going to show you here.

Table 2-1. Double-quoted string backslash escapes

Construct   Meaning
\n          Newline
\r          Return
\t          Tab
\f          Formfeed
\b          Backspace
\a          Bell
\e          Escape (ASCII escape character)
\007        Any octal ASCII value (here, 007 = bell)
\x7f        Any hex ASCII value (here, 7f = delete)
\cC         A "control" character (here, Ctrl-C)
\\          Backslash
\"          Double quote
\l          Lowercase next letter
\L          Lowercase all following letters until \E
\u          Uppercase next letter
\U          Uppercase all following letters until \E
\Q          Quote non-word characters by adding a backslash until \E
\E          End \L, \U, or \Q

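A few of the escapes in Table 2-1 can be tried out in a short script. This is a minimal sketch, not an exhaustive tour. The variable names are made up, and ord (a built-in that returns a character's numeric code) is used only to check the results.

```perl
use strict;
use warnings;

# Octal 007 and \a both name the bell character.
my $bell_octal = "\007";
my $bell_named = "\a";

# Hex 7f is delete (127); Ctrl-C is character 3.
my $del    = "\x7f";
my $ctrl_c = "\cC";

# Case-changing escapes act on the letters that follow them.
my $mixed = "\Uhello\E \LWORLD\E";   # HELLO world
my $first = "\ufred";                # Fred

# \Q puts a backslash in front of each non-word character.
my $quoted = "\Qa.b\E";              # a\.b

print "$mixed, $first, $quoted\n";
```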

Another feature of double-quoted strings is that they are variable interpolated, meaning that some variable names within the string are replaced with their current values when the strings are used. You haven't formally been introduced to what a variable looks like yet, so we'll get back to this later in this chapter.

2.2.3. String Operators

String values can be concatenated with the . operator. (Yes, that's a single period.) This doesn't alter either string, any more than 2+3 alters either 2 or 3. The resulting (longer) string is then available for further computation or assignment to a variable:

     "hello" . "world"       # same as "helloworld"
     "hello" . ' ' . "world" # same as 'hello world'
     'hello world' . "\n"    # same as "hello world\n"

The concatenation must be explicitly requested with the . operator, unlike in some other languages where you merely have to stick the two values next to each other.

A special string operator is the string repetition operator, consisting of the single lowercase letter x. This operator takes its left operand (a string) and makes as many concatenated copies of that string as indicated by its right operand (a number):

     "fred" x 3       # is "fredfredfred"
     "barney" x (4+1) # is "barney" x 5, or "barneybarneybarneybarneybarney"
     5 x 4            # is really "5" x 4, which is "5555"

That last example is worth spelling out. The string repetition operator wants a string for a left operand, so the number 5 is converted to the string "5" (using rules described in detail in the next section), giving a one-character string. This new string is then copied four times, yielding the four-character string 5555. If you had reversed the order of the operands, as 4 x 5, you would have made five copies of the string 4, yielding 44444. This shows that string repetition is not commutative.

The copy count (the right operand) is first truncated to an integer value (4.8 becomes 4) before being used. A copy count of less than one results in an empty (zero-length) string.
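The truncation rule can be seen with a fractional copy count. The following is an illustrative sketch with made-up variable names:

```perl
use strict;
use warnings;

# 4.8 is truncated to 4 before the copies are made.
my $truncated = "fred" x 4.8;    # fredfredfredfred

# Counts below one (0.9 truncates to 0) give the empty string.
my $fraction = "fred" x 0.9;
my $zero     = "fred" x 0;

print "[$truncated] [$fraction] [$zero]\n";
```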

2.2.4. Automatic Conversion Between Numbers and Strings

For the most part, Perl automatically converts between numbers and strings as needed. How does it know whether a number or a string is needed? It all depends on the operator being used on the scalar value. If an operator expects a number (as + does), Perl will see the value as a number. If an operator expects a string (like . does), Perl will see the value as a string. You don't need to worry about the difference between numbers and strings; use the proper operators, and Perl will make it all work.
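As a small sketch of how the operator decides the interpretation, the same two values can be added or concatenated. The variable names here are only for illustration.

```perl
use strict;
use warnings;

my $first  = "5";
my $second = "6";

# + expects numbers, so both values are treated as numbers.
my $sum = $first + $second;       # 11

# . expects strings, so both values are treated as strings.
my $joined = $first . $second;    # "56"

print "$sum $joined\n";           # 11 56
```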

When a string value is used where an operator needs a number (say, for multiplication), Perl automatically converts the string to its equivalent numeric value as if it had been entered as a decimal floating-point value.[*] So "12" * "3" gives the value 36. Trailing nonnumber stuff and leading whitespace are discarded, so "12fred34" * " 3" will give 36 without any complaints.[†] At the extreme end of this, something that isn't a number at all converts to zero. This would happen if you used the string "fred" as a number.

[*] The trick of using a leading zero to mean a non-decimal value works for literals but never for automatic conversion. Use hex( ) or oct( ) to convert those kinds of strings.

[†] Unless you request warnings, which we'll discuss in a moment.
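These conversion rules can be checked with a short script. It is a sketch only: the variable names are made up, and warnings are deliberately left off so that the conversions stay silent, as the text describes.

```perl
use strict;
# No "use warnings" here: with warnings on, Perl would complain about
# the non-numeric strings below.

# Trailing non-number text and leading whitespace are discarded.
my $product = "12fred34" * " 3";    # 36

# A string with no leading number converts to zero.
my $nothing = "fred" + 0;           # 0

# A leading zero does not mean octal in automatic conversion.
my $not_octal = "0377" + 0;         # 377

# oct() and hex() convert such strings explicitly.
my $octal = oct("0377");            # 255
my $hex   = hex("ff");              # 255

print "$product $nothing $not_octal $octal $hex\n";
```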

Likewise, if a numeric value is given when a string value is needed (say, for string concatenation), the numeric value expands into whatever string would have been printed for that number. For example, if you want to concatenate the string Z followed by the result of 5 multiplied by 7,[‡] you can say it this way:

[‡] You'll see about precedence and parentheses shortly.

     "Z" . 5 * 7 # same as "Z" . 35, or "Z35" 

In other words, you don't have to worry about whether you have a number or a string (most of the time). Perl performs all the conversions for you.[§]

[§] And if you're worried about efficiency, don't be. Perl generally remembers the result of a conversion so it's done only once.



Learning Perl
Learning Perl, 5th Edition
ISBN: 0596520107
EAN: 2147483647
Year: 2003
Pages: 232
