10.2. Strings

We'll start by taking a closer look at the Java String class (or, more specifically, java.lang.String). Since working with Strings is so fundamental, understanding the way Strings are implemented in Java and what you can do with them is very important.

A String object encapsulates a sequence of Unicode characters. Internally, these characters are stored in a regular Java array, but the String object guards this array jealously and gives you access to it only through its own API. This is to support the idea that Strings are immutable; once you create a String object, you can't change its value. Lots of operations on a String object appear to change the characters or length of a string, but what they really do is return a new String object that copies or internally references the needed characters of the original. Java implementations make an effort to consolidate identical strings used in the same class into a shared-string pool and to share parts of Strings where possible.

The original motivation for all of this was performance. Immutable Strings can save memory and be optimized for speed by the Java VM. The flip side is that a programmer should have a basic understanding of the String class in order to avoid creating an excessive number of String objects in places where performance is an issue. That was especially true in the past, when VMs were slow and handled memory poorly. Nowadays string usage is not usually an issue in the overall performance of a real application.[10]
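To make the idea of immutability concrete, here is a minimal sketch. The class name ImmutableStringDemo and the helper method shout( ) are our own, chosen only for illustration; the point is simply that methods such as toUpperCase( ) and concat( ) hand back a new String and leave the original untouched.

```java
final class ImmutableStringDemo {

    // Returns an upper-cased copy; the argument itself is never modified.
    static String shout(String s) {
        return s.toUpperCase();
    }

    public static void main(String[] args) {
        String original = "to be";
        String shouted = shout(original);
        String longer = original.concat(" or not to be");

        System.out.println(original); // to be
        System.out.println(shouted);  // TO BE
        System.out.println(longer);   // to be or not to be
    }
}
```

Because none of these calls can alter original, it is safe to hand the same String to any number of methods or threads without making defensive copies.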
10.2.1. Constructing Strings

Literal strings, defined in your source code, are declared with double quotes and can be assigned to a String variable:

    String quote = "To be or not to be";

Java automatically converts the literal string into a String object and assigns it to the variable.

Strings keep track of their own length, so String objects in Java don't require special terminators. You can get the length of a String with the length( ) method:

    int length = quote.length( );

Strings can take advantage of the only overloaded operator in Java, the + operator, for string concatenation. The following two statements produce equivalent strings:

    String name = "John " + "Smith";
    String same = "John ".concat("Smith");

Literal strings can't span lines in Java source files, but we can concatenate lines to produce the same effect:

    String poem =
        "'Twas brillig, and the slithy toves\n" +
        " Did gyre and gimble in the wabe:\n" +
        "All mimsy were the borogoves,\n" +
        " And the mome raths outgrabe.\n";

Embedding lengthy text in source code is not normally something you want to do. In this and the following chapter, we'll talk about ways to load Strings from files, special packages called resource bundles, and URLs. Technologies like JavaServer Pages and template engines also provide a way to factor out large amounts of text from your code.
For example, in Chapter 14, we'll see how to load our poem from a web server by opening a URL like this:

    InputStream poem = new URL(
        "http://myserver/~dodgson/jabberwocky.txt").openStream( );

In addition to making strings from literal expressions, you can construct a String directly from an array of characters:

    char [] data = new char [] { 'L', 'e', 'm', 'm', 'i', 'n', 'g' };
    String lemming = new String( data );

You can also construct a String from an array of bytes:

    byte [] data = new byte [] { (byte)97, (byte)98, (byte)99 };
    String abc = new String(data, "ISO8859_1");

In this case, the second argument to the String constructor is the name of a character-encoding scheme. The String constructor uses it to convert the raw bytes to standard 2-byte Unicode characters. If you don't specify a character encoding, the default encoding scheme on your system is used.[10]
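The byte-array constructor can be sketched in a complete, runnable form as follows. This version assumes Java 7 or later, where the java.nio.charset.StandardCharsets constants are available; passing a Charset object instead of an encoding name avoids the checked UnsupportedEncodingException that the name-based constructor declares. The class name BytesToStringDemo and the method decodeLatin1( ) are hypothetical names used only for this illustration.

```java
import java.nio.charset.StandardCharsets;

final class BytesToStringDemo {

    // Decodes the bytes as ISO-8859-1, where each byte maps to exactly one character.
    static String decodeLatin1(byte[] data) {
        return new String(data, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] data = { 97, 98, 99 };
        String abc = decodeLatin1(data);
        System.out.println(abc); // abc

        // Round trip: encoding with the same charset yields the original bytes.
        byte[] back = abc.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(back.length); // 3
    }
}
```

Naming the encoding explicitly, rather than relying on the platform default, keeps the result the same from one machine to the next.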
Conversely, the charAt( ) method of the String class lets you access the characters of a String in an array-like fashion:

    String s = "Newton";
    for ( int i = 0; i < s.length( ); i++ )
        System.out.println( s.charAt( i ) );

This code prints the characters of the string one at a time. Alternatively, we can get the characters all at once with toCharArray( ). Here's a way to save typing a bunch of single quotes and get an array holding the alphabet:

    char [] abcs = "abcdefghijklmnopqrstuvwxyz".toCharArray( );

The notion that a String is a sequence of characters is also codified by the String class implementing the interface java.lang.CharSequence, which prescribes the methods length( ) and charAt( ) as well as a way to get a subset of the characters.

10.2.2. Strings from Things

Objects and primitive types in Java can be turned into a default textual representation as a String. For primitive types like numbers, the string should be fairly obvious; for object types it is under the control of the object itself. We can get the string representation of an item with the static String.valueOf( ) method. Various overloaded versions of this method accept each of the primitive types:

    String one = String.valueOf( 1 );         // integer, "1"
    String two = String.valueOf( 2.384f );    // float, "2.384"
    String notTrue = String.valueOf( false ); // boolean, "false"

All objects in Java have a toString( ) method, inherited from the Object class. For many objects, this method returns a useful result that displays the contents of the object. For example, a java.util.Date object's toString( ) method returns the date it represents formatted as a string. For objects that do not provide a representation, the string result is just a unique identifier that can be used for debugging. The String.valueOf( ) method, when called for an object, invokes the object's toString( ) method and returns the result.
The only real difference in using this method is that if you pass it a null object reference, it returns the String "null" for you, instead of producing a NullPointerException:

    Date date = new Date( );
    // Equivalent, e.g., "Sun Dec 19 05:45:34 CST 1969"
    String d1 = String.valueOf( date );
    String d2 = date.toString( );

    date = null;
    d1 = String.valueOf( date ); // "null"
    d2 = date.toString( );       // NullPointerException!

String concatenation uses the valueOf( ) method internally, so if you "add" an object or primitive using the plus operator (+), you get a String:

    String today = "Today's date is :" + date;

You'll sometimes see people use the empty string and the plus operator (+) as shorthand to get the string value of an object. For example:

    String two = "" + 2.384f;
    String today = "" + new Date( );

10.2.3. Comparing Strings

The standard equals( ) method compares strings for equality; two strings are equal if they contain exactly the same characters. You can use a different method, equalsIgnoreCase( ), to check the equivalence of strings in a case-insensitive way:

    String one = "FOO";
    String two = "foo";
    one.equals( two );           // false
    one.equalsIgnoreCase( two ); // true

A common mistake for novice programmers in Java is to compare strings with the == operator when they mean to use the equals( ) method. Remember that strings are objects in Java, and == tests for object identity, that is, whether the two arguments being tested are the same object. In Java, it's easy to make two strings that have the same characters but are not the same string object. For example:

    String foo1 = "foo";
    String foo2 = String.valueOf( new char [] { 'f', 'o', 'o' } );
    foo1 == foo2         // false!
    foo1.equals( foo2 )  // true

This mistake is particularly dangerous because it often works for the common case in which you are comparing literal strings (strings declared with double quotes right in the code). The reason for this is that Java tries to manage strings efficiently by combining them.
At compile time, Java finds all the identical strings within a given class and makes only one object for them. This is safe because strings are immutable and cannot change. You can coalesce strings yourself in this way at runtime using the String intern( ) method. Interning a string returns an equivalent string reference that is unique across the VM.

The compareTo( ) method compares the lexical value of the String to another String, determining whether it sorts alphabetically earlier than, the same as, or later than the target string. It returns an integer that is less than, equal to, or greater than zero:

    String abc = "abc";
    String def = "def";
    String num = "123";
    if ( abc.compareTo( def ) < 0 )  // true
    if ( abc.compareTo( abc ) == 0 ) // true
    if ( abc.compareTo( num ) > 0 )  // true

The compareTo( ) method compares strings strictly by their characters' positions in the Unicode specification. This works for simple text but does not handle all language variations well. The Collator class, discussed next, can be used for more sophisticated comparisons.

10.2.3.1 The Collator class

The java.text package provides a sophisticated set of classes for comparing strings in specific languages. German, for example, has vowels with umlauts and another character that resembles the Greek letter beta and represents a double "s." How should we sort these? Although the rules for sorting such characters are precisely defined, you can't assume that the lexical comparison we used earlier has the correct meaning for languages other than English. Fortunately, the Collator class takes care of these complex sorting problems.

In the following example, we use a Collator designed to compare German strings. You can obtain a default Collator by calling the Collator.getInstance( ) method with no arguments. Once you have an appropriate Collator instance, you can use its compare( ) method, which returns values just like String's compareTo( ) method.
The following code creates two strings for the German translations of "fun" and "later," using Unicode constants for these two special characters. It then compares them, using a Collator for the German locale. (Locales help you deal with issues relevant to particular languages and cultures; we'll talk about them in detail later in this chapter.) The result in this case is that "fun" (Spaß) sorts before "later" (später):

    String fun = "Spa\u00df";
    String later = "sp\u00e4ter";
    Collator german = Collator.getInstance(Locale.GERMAN);
    if (german.compare(fun, later) < 0) // true

Using collators is essential if you're working with languages other than English. In Spanish, for example, "ll" and "ch" are treated as unique characters and alphabetized separately. A collator handles cases like these automatically.

10.2.4. Searching

The String class provides several simple methods for finding fixed substrings within a string. The startsWith( ) and endsWith( ) methods compare an argument string with the beginning and end of the String, respectively:

    String url = "http://foo.bar.com/";
    if ( url.startsWith("http:") ) // true

The indexOf( ) method searches for the first occurrence of a character or substring and returns the starting character position, or -1 if the substring is not found:

    String abcs = "abcdefghijklmnopqrstuvwxyz";
    int i = abcs.indexOf( 'p' );    // 15
    int j = abcs.indexOf( "def" );  // 3
    int k = abcs.indexOf( "Fang" ); // -1

Similarly, lastIndexOf( ) searches backward through the string for the last occurrence of a character or substring.

Java 5.0 added a contains( ) method to handle the very common task of checking to see whether a given substring is contained in the target string:

    String log = "There is an emergency in sector 7!";
    if ( log.contains("emergency") ) pageSomeone( );
    // equivalent to
    if ( log.indexOf("emergency") != -1 ) ...

For more complex searching, you can use the Regular Expression API, which allows you to look for and parse complex patterns.
We'll talk about regular expressions later in this chapter.

10.2.5. Editing

A number of methods operate on the String and return a new String as a result. While this is useful, you should be aware that creating lots of strings in this manner can affect performance. If you need to modify a string often or build a complex string from components, you should use the StringBuilder class, as we'll discuss shortly.

trim( ) is a useful method that removes leading and trailing whitespace (e.g., spaces, carriage returns, newlines, and tabs) from the String:

    String str = " abc ";
    str = str.trim( ); // "abc"

In this example, we have thrown away the original String (with excess whitespace), and it will be garbage-collected.

The toUpperCase( ) and toLowerCase( ) methods return a new String of the appropriate case:

    String down = "FOO".toLowerCase( ); // "foo"
    String up = down.toUpperCase( );    // "FOO"

substring( ) returns a specified range of characters. The starting index is inclusive; the ending is exclusive:

    String abcs = "abcdefghijklmnopqrstuvwxyz";
    String cde = abcs.substring( 2, 5 ); // "cde"

The replace( ) method was added in Java 5.0, providing simple, literal string substitution. Every occurrence of the target string is replaced with the replacement string, moving from beginning to end. For example:

    String message = "Hello NAME, how are you?".replace( "NAME", "Penny" );
    // "Hello Penny, how are you?"
    String xy = "xxooxxxoo".replace( "xx", "X" );
    // "XooXxoo"

The String class also has two methods that allow you to do more complex pattern substitution: replaceAll( ) and replaceFirst( ). Unlike the simple replace( ) method, these methods use regular expressions to describe the replacement pattern, which we'll cover later in this chapter.

10.2.6. String Method Summary

Table 10-2 summarizes the methods provided by the String class.
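As a quick illustration of how the searching and editing methods combine in practice, the following sketch tidies up a line of "key = value" text using only indexOf( ), substring( ), trim( ), and toLowerCase( ). The class StringEditingDemo and the method normalize( ) are hypothetical names invented for this example, and the "key = value" format is simply an assumption we made to have something to parse.

```java
import java.util.Locale;

final class StringEditingDemo {

    // Normalizes a raw "key = value" line: trims it, lower-cases the key,
    // and returns "key=value". Returns null when there is no '=' separator.
    static String normalize(String line) {
        String trimmed = line.trim();
        int eq = trimmed.indexOf('=');
        if (eq == -1) {
            return null;
        }
        // substring( ) is start-inclusive and end-exclusive, so the '=' is skipped.
        String key = trimmed.substring(0, eq).trim().toLowerCase(Locale.ROOT);
        String value = trimmed.substring(eq + 1).trim();
        return key + "=" + value;
    }

    public static void main(String[] args) {
        System.out.println(normalize("  Host = foo.bar.com  ")); // host=foo.bar.com
        System.out.println(normalize("no separator here"));      // null
    }
}
```

Each call in normalize( ) produces a new String, which is fine for a short line like this; for heavier editing, a mutable buffer is usually the better tool.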
10.2.7. StringBuilder and StringBuffer

In contrast to the immutable string, the java.lang.StringBuilder class is a modifiable and expandable buffer for characters. You can use it to create a big string efficiently. StringBuilder and StringBuffer are twins; they have exactly the same API. StringBuilder was added in Java 5.0 as a drop-in, unsynchronized replacement for StringBuffer. We'll come back to that in a bit. First, let's look at some examples of String construction:

    // Could be better
    String ball = "Hello";
    ball = ball + " there.";
    ball = ball + " How are you?";

This example creates an unnecessary String object each time we use the concatenation operator (+). Whether this is significant depends on how often this code is run and how big the string actually gets. Here's a more extreme example:

    // Bad use of + ...
    while( (line = readLine( )) != EOF )
        text += line;

This example repeatedly produces new String objects. The character array must be copied over and over, which can adversely affect performance. The solution is to use a StringBuilder object and its append( ) method:

    StringBuilder sb = new StringBuilder("Hello");
    sb.append(" there.");
    sb.append(" How are you?");

    StringBuilder text = new StringBuilder( );
    while( (line = readLine( )) != EOF )
        text.append( line );

Here, the StringBuilder efficiently handles expanding the array as necessary. We can get a String back from the StringBuilder with its toString( ) method:

    String message = sb.toString( );

You can also retrieve part of a StringBuilder, as a String, using one of the substring( ) methods.

You might be interested to know that when you write a long expression using string concatenation with at least one non-constant operand, the compiler has traditionally generated code that uses a StringBuilder behind the scenes:

    String verb = "be ";
    String foo = "To " + verb + "or";

It is roughly equivalent to:

    String foo = new StringBuilder( ).append("To ").append(verb).append("or").toString( );

In this case, the compiler knows what you are trying to do and takes care of it for you. (An expression made up only of literals, such as "To " + "be " + "or", is simply folded into a single constant at compile time.)
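The readLine( ) loop above is pseudocode, so here is a small self-contained sketch of the same append( ) pattern. It assumes the lines have already been read into a java.util.List; the class JoinLinesDemo and the method joinLines( ) are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.List;

final class JoinLinesDemo {

    // Concatenates the lines, each followed by a newline, using a single builder
    // instead of creating a new String on every pass through the loop.
    static String joinLines(List<String> lines) {
        StringBuilder text = new StringBuilder();
        for (String line : lines) {
            text.append(line).append('\n');
        }
        return text.toString();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("'Twas brillig,", "and the slithy toves");
        System.out.print(joinLines(lines));
    }
}
```

Because append( ) returns the StringBuilder itself, calls can be chained as shown, and only the final toString( ) creates a String.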
The StringBuilder class provides a number of overloaded append( ) methods for adding any type of data to the buffer. StringBuilder also provides a number of overloaded insert( ) methods for inserting various types of data at a particular location in the string buffer. Furthermore, you can remove a single character or a range of characters with the deleteCharAt( ) and delete( ) methods. Finally, you can replace part of the StringBuilder with the contents of a String using the replace( ) method. The String and StringBuilder classes cooperate so that, in some cases, no copy of the data has to be made; the string data is shared between the objects.

You should use a StringBuilder instead of a String any time you need to keep adding characters to a string; it's designed to handle such modifications efficiently. You can convert the StringBuilder to a String when you need it, or simply concatenate or print it anywhere you'd use a String.

As we said earlier, StringBuilder was added in Java 5.0 as a replacement for StringBuffer. The only real difference between the two is that the methods of StringBuffer are synchronized and the methods of StringBuilder are not. This means that if you wish to use StringBuilder from multiple threads concurrently, you must synchronize the access yourself (which is easily accomplished). The reason for the change is that most simple usage does not require any synchronization and shouldn't have to pay the associated penalty (slight as it is).
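The editing methods just described can be sketched as follows. The class BuilderEditDemo and the method edit( ) are hypothetical names for this illustration; the comments show the contents of the buffer after each step.

```java
final class BuilderEditDemo {

    // Applies insert, replace, deleteCharAt, delete, and append to one buffer
    // and returns the final contents.
    static String edit() {
        StringBuilder sb = new StringBuilder("Hello there.");
        sb.insert(5, ",");                // "Hello, there."
        sb.replace(7, 12, "world");       // "Hello, world."
        sb.deleteCharAt(sb.length() - 1); // "Hello, world"
        sb.delete(0, 7);                  // "world"
        sb.append('!').append(42);        // "world!42"
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(edit()); // world!42
    }
}
```

As with substring( ), the index ranges taken by replace( ) and delete( ) are start-inclusive and end-exclusive. All of these calls modify the same buffer in place, which is exactly what distinguishes StringBuilder from the immutable String.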