3.2. Strings

A string is a sequence of Unicode letters, digits, punctuation characters, and so on; it is the JavaScript datatype for representing text. As you'll see shortly, you can include string literals in your programs by enclosing them in matching pairs of single or double quotation marks. Note that JavaScript does not have a character datatype such as char, as C, C++, and Java do. To represent a single character, you simply use a string that has a length of 1.

3.2.1. String Literals

A string literal is a sequence of zero or more Unicode characters enclosed within single or double quotes (' or "). Double-quote characters may be contained within strings delimited by single-quote characters, and single-quote characters may be contained within strings delimited by double quotes. String literals must be written on a single line; they may not be broken across two lines. If you need to include a newline character in a string literal, use the character sequence \n, which is documented in the next section. Here are examples of string literals:

 ""  // The empty string: it has zero characters
 'testing'
 "3.14"
 'name="myform"'
 "Wouldn't you prefer O'Reilly's book?"
 "This string\nhas two lines"
 "π is the ratio of a circle's circumference to its diameter"

As illustrated in the last example string shown, the ECMAScript v1 standard allows Unicode characters within string literals. Implementations prior to JavaScript 1.3, however, typically support only ASCII or Latin-1 characters in strings. As explained in the next section, you can also include Unicode characters in your string literals using special escape sequences. This is useful if your text editor does not provide complete Unicode support.

Note that when you use single quotes to delimit your strings, you must be careful with English contractions and possessives, such as can't and O'Reilly's. Since the apostrophe is the same as the single-quote character, you must use the backslash character (\) to escape any apostrophes that appear in single-quoted strings (this is explained in the next section).

In client-side JavaScript programming, JavaScript code often contains strings of HTML code, and HTML code often contains strings of JavaScript code. Like JavaScript, HTML uses either single or double quotes to delimit its strings. Thus, when combining JavaScript and HTML, it is a good idea to use one style of quotes for JavaScript and the other style for HTML. In the following example, the string "Thank you" is single-quoted within a JavaScript expression, which is double-quoted within an HTML event-handler attribute:

 <a href="" onclick="alert('Thank you')">Click Me</a>
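
The same convention helps in the other direction, when JavaScript code builds a string of HTML. The following is a minimal sketch; the variable names and the URL page.html are purely illustrative:

```javascript
// Single quotes delimit the JavaScript string, so the double quotes
// around the HTML attribute value need no escaping.
var link_html = '<a href="page.html">Click Me</a>';

// The same markup in a double-quoted JavaScript string needs a
// backslash before each embedded double quote.
var escaped_html = "<a href=\"page.html\">Click Me</a>";

// Both literals produce exactly the same string.
var same = (link_html === escaped_html);   // true
```

Choosing the quote style that does not appear in the embedded text keeps the literal free of backslashes and easier to read.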

3.2.2. Escape Sequences in String Literals

The backslash character (\) has a special purpose in JavaScript strings. Combined with the character that follows it, it represents a character that is not otherwise representable within the string. For example, \n is an escape sequence that represents a newline character.^[*]

^[*] C, C++, and Java programmers will already be familiar with this and other JavaScript escape sequences.

Another example, mentioned in the previous section, is the \' escape, which represents the single quote (or apostrophe) character. This escape sequence is useful when you need to include an apostrophe in a string literal that is contained within single quotes. You can see why these are called escape sequences: the backslash allows you to escape from the usual interpretation of the single-quote character. Instead of using it to mark the end of the string, you use it as an apostrophe:

 'You\'re right, it can\'t be a quote'

Table 3-2 lists the JavaScript escape sequences and the characters they represent. Two escape sequences are generic and can be used to represent any character by specifying its Latin-1 or Unicode character code as a hexadecimal number. For example, the sequence \xA9 represents the copyright symbol, which has the Latin-1 encoding given by the hexadecimal number A9. Similarly, the \u escape represents an arbitrary Unicode character specified by four hexadecimal digits; \u03c0 represents the character π, for example. Note that Unicode escapes are required by the ECMAScript v1 standard but are not typically supported in implementations prior to JavaScript 1.3. Some implementations of JavaScript also allow a Latin-1 character to be specified by three octal digits following a backslash, but this escape sequence is not supported in the ECMAScript v3 standard and should no longer be used.

Table 3-2. JavaScript escape sequences
Sequence	Character represented
`\0`	The NUL character (`\u0000`).
`\b`	Backspace (`\u0008`).
`\t`	Horizontal tab (`\u0009`).
`\n`	Newline (`\u000A`).
`\v`	Vertical tab (`\u000B`).
`\f`	Form feed (`\u000C`).
`\r`	Carriage return (`\u000D`).
`\"`	Double quote (`\u0022`).
`\'`	Apostrophe or single quote (`\u0027`).
`\\`	Backslash (`\u005C`).
`\xXX`	The Latin-1 character specified by the two hexadecimal digits `XX`.
`\uXXXX`	The Unicode character specified by the four hexadecimal digits `XXXX`.
`\XXX`	The Latin-1 character specified by the octal digits `XXX`, between 1 and 377. Not supported by ECMAScript v3; do not use this escape sequence.

Finally, note that the backslash escape cannot be used before a line break to continue a string (or other JavaScript) token across two lines or to include a literal line break in a string. If the \ character precedes any character other than those shown in Table 3-2, the backslash is simply ignored (although future versions of the language may, of course, define new escape sequences). For example, \# is the same as #.
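
As a brief illustration of several entries in Table 3-2, the following sketch compares escaped and unescaped forms of the same characters. The variable names are illustrative only:

```javascript
var copyright_latin1 = "\xA9";       // Latin-1 escape: two hexadecimal digits
var copyright_unicode = "\u00A9";    // Unicode escape: four hexadecimal digits
var pi = "\u03c0";                   // The character π

// Both escapes name the same copyright character.
var same_char = (copyright_latin1 === copyright_unicode);   // true

// An escape sequence stands for a single character in the string.
var pi_length = pi.length;           // 1
var two_lines = "first\nsecond";     // 12 characters, including the newline

// A backslash before a character not listed in the table is ignored.
var ignored = ("\#" === "#");        // true
```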

3.2.3. Working with Strings

One of the built-in features of JavaScript is the ability to concatenate strings. If you use the + operator with numbers, it adds them. But if you use this operator on strings, it joins them by appending the second to the first. For example:

 msg = "Hello, " + "world";   // Produces the string "Hello, world"
 greeting = "Welcome to my blog," + " " + name;

To determine the length of a string (the number of characters it contains), use the length property of the string. If the variable s contains a string, you access its length like this:

 s.length

You can use a number of methods to operate on strings. For example, to get the last character of a string s:

 last_char = s.charAt(s.length - 1);

To extract the second, third, and fourth characters from a string s:

 sub = s.substring(1,4);

To find the position of the first letter "a" in a string s:

 i = s.indexOf('a');

You can use quite a few other methods to manipulate strings. You'll find full documentation of these methods in Part III, under the String object and subsequent listings.

As you can tell from the previous examples, JavaScript strings (and JavaScript arrays, as we'll see later) are indexed starting with zero. That is, the first character in a string is character 0. C, C++, and Java programmers should be perfectly comfortable with this convention, but programmers familiar with languages that have 1-based strings and arrays may find that it takes some getting used to.
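
To make the zero-based indexing concrete, here is a short sketch that applies the property and methods shown above to one sample string. The sample value is an assumption chosen for illustration:

```javascript
var s = "JavaScript";

var len = s.length;                    // 10
var first = s.charAt(0);               // "J": the first character is at index 0
var last = s.charAt(s.length - 1);     // "t": the last is at index length - 1
var sub = s.substring(1, 4);           // "ava": the characters at indexes 1, 2, and 3
var i = s.indexOf('a');                // 1: position of the first "a"
```

Note that the second argument to substring( ) is the index just past the last character to extract, which is why substring(1,4) returns three characters rather than four.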

In some implementations of JavaScript, individual characters can be read from strings (but not written into strings) using array notation, so the earlier call to charAt( ) can also be written like this:

 last_char = s[s.length - 1];

Note, however, that this syntax is not part of the ECMAScript v3 standard, is not portable, and should be avoided.

When we discuss the object datatype, you'll see that object properties and methods are used in the same way that string properties and methods are used in the previous examples. This does not mean that strings are a type of object. In fact, strings are a distinct JavaScript datatype: they use object syntax for accessing properties and methods, but they are not themselves objects. You'll see just why this is so at the end of this chapter.

3.2.4. Converting Numbers to Strings

Numbers are automatically converted to strings when needed. If a number is used in a string concatenation expression, for example, the number is converted to a string first:

 var n = 100;
 var s = n + " bottles of beer on the wall.";

This conversion-through-concatenation feature of JavaScript results in an idiom that you may occasionally see: to convert a number to a string, simply add the empty string to it:

 var n_as_string = n + "";

To make number-to-string conversions more explicit, use the String( ) function:

 var string_value = String(number);

Another technique for converting numbers to strings uses the toString( ) method:

 string_value = number.toString( );
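
The three techniques can be compared side by side. In this small sketch, the variable names are illustrative; each conversion yields the same primitive string:

```javascript
var n = 100;

var by_concatenation = n + "";     // "100"
var by_function = String(n);       // "100"
var by_method = n.toString();      // "100"

// All three results are strings, and all three are equal.
var all_equal = (by_concatenation === by_function) &&
                (by_function === by_method);        // true
var result_type = typeof by_method;                 // "string"
```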

The toString( ) method of the Number object (primitive numbers are converted to Number objects so that this method can be called) takes an optional argument that specifies a radix, or base, for the conversion. If you do not specify the argument, the conversion is done in base 10. However, you can also convert numbers to strings in other bases (between 2 and 36).^[*] For example:

^[*] The ECMAScript specification supports the radix argument to the toString( ) method, but it allows the method to return an implementation-defined string for any radix other than 10. Thus, conforming implementations may simply ignore the argument and always return a base-10 result. In practice, however, implementations do honor the requested radix.

 var n = 17;
 binary_string = n.toString(2);        // Evaluates to "10001"
 octal_string = "0" + n.toString(8);   // Evaluates to "021"
 hex_string = "0x" + n.toString(16);   // Evaluates to "0x11"

A shortcoming of JavaScript prior to JavaScript 1.5 is that there is no built-in way to convert a number to a string and specify the number of decimal places to be included, or to specify whether exponential notation should be used. This can make it difficult to display numbers that have traditional formats, such as numbers that represent monetary values.

ECMAScript v3 and JavaScript 1.5 solve this problem by adding three new number-to-string methods to the Number class. toFixed( ) converts a number to a string and displays a specified number of digits after the decimal point. It does not use exponential notation. toExponential( ) converts a number to a string using exponential notation, with one digit before the decimal point and a specified number of digits after the decimal point. toPrecision( ) displays a number using the specified number of significant digits. It uses exponential notation if the number of significant digits is not large enough to display the entire integer portion of the number. Note that all three methods round the trailing digits of the resulting string as appropriate. Consider the following examples:

 var n = 123456.789;
 n.toFixed(0);         // "123457"
 n.toFixed(2);         // "123456.79"
 n.toExponential(1);   // "1.2e+5"
 n.toExponential(3);   // "1.235e+5"
 n.toPrecision(4);     // "1.235e+5"
 n.toPrecision(7);     // "123456.8"
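
Because monetary values are the motivating case mentioned above, here is a minimal sketch of a formatting helper built on toFixed( ). The function name format_price and the dollar-sign prefix are assumptions made for illustration, not part of the language:

```javascript
// Hypothetical helper: format a number as a dollar amount
// with exactly two digits after the decimal point.
function format_price(amount) {
    return "$" + amount.toFixed(2);
}

var a = format_price(1234.5678);   // "$1234.57": trailing digits are rounded
var b = format_price(19.5);        // "$19.50": padded with a trailing zero
var c = format_price(7);           // "$7.00"
```

This sketch does not insert thousands separators or handle negative amounts specially; those would need additional code.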

3.2.5. Converting Strings to Numbers

When a string is used in a numeric context, it is automatically converted to a number. This means, for example, that the following code actually works:

 var product = "21" * "2"; // product is the number 42.

You can take advantage of this fact to convert a string to a number by simply subtracting zero from it:

 var number = string_value - 0;

(Note, however, that adding zero to a string value results in string concatenation rather than type conversion.)
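
The difference between subtracting zero and adding zero is easy to see with a concrete value. In this sketch, the sample string "42" is an assumption chosen for illustration:

```javascript
var string_value = "42";

var difference = string_value - 0;   // 42: the string is converted to a number
var sum = string_value + 0;          // "420": the number is converted to a string

var difference_type = typeof difference;   // "number"
var sum_type = typeof sum;                 // "string"
```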

A less tricky and more explicit way to convert a string to a number is to call the Number( ) constructor as a function:

 var number = Number(string_value);

The trouble with this sort of string-to-number conversion is that it is overly strict. It works only with base-10 numbers, and although it allows leading and trailing spaces, it does not allow any nonspace characters to appear in the string following the number.
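
A short sketch of this strictness follows. The sample strings are assumptions chosen for illustration, and the NaN result is tested with the global isNaN( ) function:

```javascript
var plain = Number("42");           // 42
var padded = Number("  3.14  ");    // 3.14: leading and trailing spaces are allowed
var with_units = Number("42px");    // NaN: nonspace characters follow the number

var rejected = isNaN(with_units);   // true
```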

To allow more flexible conversions, you can use parseInt( ) and parseFloat( ). These functions convert and return any number at the beginning of a string, ignoring any trailing nonnumbers. parseInt( ) parses only integers, while parseFloat( ) parses both integers and floating-point numbers. If a string begins with "0x" or "0X", parseInt( ) interprets it as a hexadecimal number.^[*] For example:

^[*] The ECMAScript specification says that if a string begins with "0" (but not "0x" or "0X"), parseInt( ) may parse it as an octal number or as a decimal number. Because the behavior is unspecified, you should never use parseInt( ) to parse numbers with leading zeros, unless you explicitly specify the radix to be used!

 parseInt("3 blind mice");    // Returns 3
 parseFloat("3.14 meters");   // Returns 3.14
 parseInt("12.34");           // Returns 12
 parseInt("0xFF");            // Returns 255

parseInt( ) can even take a second argument specifying the radix (base) of the number to be parsed. Legal values are between 2 and 36. For example:

 parseInt("11", 2);           // Returns 3 (1*2 + 1)
 parseInt("ff", 16);          // Returns 255 (15*16 + 15)
 parseInt("zz", 36);          // Returns 1295 (35*36 + 35)
 parseInt("077", 8);          // Returns 63 (7*8 + 7)
 parseInt("077", 10);         // Returns 77 (7*10 + 7)

If parseInt( ) or parseFloat( ) cannot convert the specified string to a number, it returns NaN:

 parseInt("eleven");          // Returns NaN
 parseFloat("$72.47");        // Returns NaN
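
Because NaN never compares equal to any value, including itself, a failed conversion is normally detected with the global isNaN( ) function. The following is a minimal sketch of one way to supply a fallback value; the helper name parse_int_or_default is hypothetical:

```javascript
// Hypothetical helper: parse a base-10 integer from the start of a string,
// or return the fallback when the string does not begin with a number.
function parse_int_or_default(text, fallback) {
    var value = parseInt(text, 10);   // Explicit radix avoids octal ambiguity
    return isNaN(value) ? fallback : value;
}

var a = parse_int_or_default("3 blind mice", 0);   // 3
var b = parse_int_or_default("eleven", 0);         // 0: the fallback
var c = parse_int_or_default("077", 0);            // 77: parsed as decimal
```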