4.4 STRINGS IN JAVA


Java provides two classes, String and StringBuffer, for representing strings and for string processing. An object of type String cannot be modified after it is created.[20] It can be deleted by the garbage collector if there are no variables holding references to it, but it cannot be changed. For this reason, string objects of type String are called immutable. If you want to carry out an in-place modification of a string, the string needs to be an object of type StringBuffer.

As in C++, a string literal in Java is double-quoted. String literals in Java are objects of type String. Two string literals consisting of the same sequence of characters are one and the same object in the memory. (C++ permits such sharing of identical literals but leaves it to the implementation; Java guarantees it.) That is, there is only one String object stored for each string literal even when that literal is mentioned at different places in a program, in different classes, or even in different packages of a Java program.

That a string literal consisting of a given sequence of characters is stored only once in the memory is made clear by the following program. Lines (A) and (B) of the program define two different String variables, strX and strY, in two different classes; both strX and strY are initialized with string literals consisting of the same sequence of characters. Nonetheless, a comparison of the two with the '==' operator in line (D) tests true. Recall that, when applied to object references, the operator '==' returns true only when its two operands are one and the same object in the memory.

Line (C) of the program illustrates the following string-valued constant expression on the right-hand side of the assignment operator:

     "hell" + "o" 

In such cases, the Java compiler creates a new string literal by joining the two string literals "hell" and "o". Since the result is a compile-time constant, it is treated exactly like a literal and is not stored separately in the memory if a literal with the same characters already exists. So in our case, the variable strZ in line (C) will point to the same location in the memory as the variables strX in line (A) and strY in line (B). This is borne out by the fact that the '==' comparison in line (E) tests true.

While joining two string literals together results in a constant expression that is resolved at compile time, the value assigned to the variable s3 in the following three statements can only be computed at run time, because s1 and s2 are ordinary variables. Therefore, the string "hello" constructed on the right-hand side of the third statement will have a separate existence as a String object in the memory even if a string literal consisting of the same sequence of characters was created previously by the program. That explains why the comparison in line (F) of the program tests false.

     String s1 = "hel";
     String s2 = "lo";
     String s3 = s1 + s2;

However, Java provides a mechanism, through the method intern() defined for the String class, that allows a string created at run time to be added to the pool of string literals (if it was not in the pool already). If the above three statements are replaced with

     String s1 = "hel";
     String s2 = "lo";
     String s3 = (s1 + s2).intern();

Java will compare the character sequence in the string object returned by s1 + s2 with the string literals already in the pool. If a match is found, intern() returns a reference to that literal. If a match is not found, the string returned by s1 + s2 is added to the pool of string literals and a reference to the new pool entry is returned. That explains why the '==' comparison in line (G) of the program tests true: the reference returned by (s1 + s2).intern() points to the same string literal as the data member strX of class X.

Here is the program:

 
     //StringLiteralUniqueness.java

     class X { public static String strX = "hello"; }              //(A)
     class Y { public static String strY = "hello"; }              //(B)
     class Z { public static String strZ = "hell" + "o"; }         //(C)

     class Test {
         public static void main( String[] args ) {
             // output: true
             System.out.println( X.strX == Y.strY );                 //(D)
             // output: true
             System.out.println( X.strX == Z.strZ );                 //(E)
             String s1 = "hel";
             String s2 = "lo";
             // output: false
             System.out.println( X.strX == (s1 + s2 ) );             //(F)
             // output: true
             System.out.println( X.strX == (s1 + s2).intern() );     //(G)
         }
     }

4.4.1 Constructing String and StringBuffer Objects

String objects are commonly constructed using the following syntax

      String str = "hello there"; 

or

     String str = new String( "hello there" ); 

For constructing a StringBuffer object, the first form does not work: the right-hand side would be a String object and the left-hand side a StringBuffer variable, and the two types are incompatible.

     StringBuffer strbuf = "hello there";        //WRONG 

StringBuffer objects are commonly constructed using the following syntax

     StringBuffer strbuf = new StringBuffer( "hello there" ); 

An empty String object, meaning a String object with no characters stored in it, can be created by

     String s0 = ""; 

or by

     String s0 = new String(); 

To create an empty StringBuffer object, use either

     StringBuffer sb0 = new StringBuffer( "" ); 

or

     StringBuffer sb0 = new StringBuffer(); 

When a String object is created with a nonempty initialization, the amount of memory allocated to the object for the storage of the characters equals exactly what's needed for the characters. On the other hand, when a new StringBuffer object is created from a string, the memory allocated for the characters is 16 characters larger than what is needed. This reduces the memory allocation overhead for modifications that add a small number of characters to the string at a time. The number of characters that a StringBuffer object can accommodate without additional memory allocation is called its capacity. The number of characters stored in a String or a StringBuffer object can be ascertained by invoking the method length(), and the capacity of a StringBuffer object by invoking the method capacity():

     String str = "hello there";
     System.out.println( str.length() );                  // 11
     StringBuffer strbuf = new StringBuffer( "hello there" );
     System.out.println( strbuf.length() );               // 11
     System.out.println( strbuf.capacity() );             // 27

One is, of course, not limited to the capacity that comes with the default initialization of a StringBuffer object (16 over what is needed for the initialization string). If we invoke the StringBuffer constructor with an int argument, it constructs a string buffer with no characters in it but with a capacity as specified by the argument. So the following invocation

     StringBuffer strbuf = new StringBuffer( 1024 ); 

would create a string buffer of capacity 1024. Characters may then be inserted into the buffer by using, say, the append method that we will discuss later in this section.
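The constructor forms and the two size queries discussed above can be tried together in a small program. This is a sketch; the class name CapacityDemo is our own, and the last figure assumes the growth rule documented for StringBuffer.ensureCapacity (the new capacity is the larger of the requested minimum and twice the old capacity plus 2).

```java
// CapacityDemo.java (hypothetical name): length() versus capacity()
class CapacityDemo {
    public static void main( String[] args ) {
        StringBuffer empty = new StringBuffer();                 // no characters
        StringBuffer init  = new StringBuffer( "hello there" );  // 11 characters
        StringBuffer big   = new StringBuffer( 1024 );           // explicit capacity

        System.out.println( empty.length() + " " + empty.capacity() );  // 0 16
        System.out.println( init.length() + " " + init.capacity() );    // 11 27
        System.out.println( big.length() + " " + big.capacity() );      // 0 1024

        // Asking for one more than the current capacity: the new capacity
        // is the larger of 17 and 2 * 16 + 2, that is, 34
        empty.ensureCapacity( 17 );
        System.out.println( empty.capacity() );                          // 34
    }
}
```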

While we have shown the commonly used constructor invocations for the StringBuffer class, the String class allows for many more, all with different types of arguments. In the rest of this subsection, we will show a few more of the String constructors. One of the String constructors takes a char array argument to construct a String object from an array of characters, as in the following example:[21]

     char[] charArr = { 'h', 'e', 'l', 'l', 'o' };
     String str4 = new String( charArr );                 // "hello"
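The String class also has a constructor that takes an offset and a count, so that a String can be built from part of a char array. The short sketch below (class name hypothetical) shows that form, String.valueOf, and the reverse conversion toCharArray. It also shows that the String keeps its own copy of the characters.

```java
// CharArrayDemo.java (hypothetical name): Strings from char arrays and back
class CharArrayDemo {
    public static void main( String[] args ) {
        char[] charArr = { 'h', 'e', 'l', 'l', 'o' };
        String whole = new String( charArr );         // "hello"
        String part  = new String( charArr, 1, 3 );   // offset 1, count 3: "ell"
        String same  = String.valueOf( charArr );     // "hello"

        // The characters are copied, so changing the array afterward
        // does not change the strings built from it
        charArr[0] = 'j';
        System.out.println( whole + " " + part + " " + same );  // hello ell hello

        // toCharArray() goes the other way and returns a new array
        char[] back = whole.toCharArray();
        System.out.println( back.length );                       // 5
    }
}
```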

A String object can also be constructed from an array of bytes, as in

     byte[] byteArr = { 'h', 'e', 'l', 'l', 'o' };
     String str5 = new String( byteArr );                 // "hello"

Each element of the byte array byteArr will be set to the ASCII code of the corresponding character in the initializer. When constructing a String from the byte array, the Java Virtual Machine translates the bytes into characters using the platform's default encoding. The common default encodings agree with ASCII for byte values below 128, so on most platforms these bytes yield "hello". The String object is then constructed from the decoded characters.

If the default encoding will not do the job for constructing a String from a byte array, it is possible to specify the encoding to be used.[22] In the following example, the byte array is specified so that each pair of bytes, starting from the beginning, is the 16-bit UTF-16 code for the character given by the second byte of the pair. For example, the 16-bit pattern obtained by joining together the one-byte representations of '\0' and 'h' is the big-endian UTF-16 encoding of the character 'h'. As a result, the string formed by the constructor is again "hello".

     byte[] byteArr2 = { '\0', 'h', '\0', 'e', '\0', 'l',
                         '\0', 'l', '\0', 'o' };
     String str6 = new String( byteArr2, "UTF-16BE" );    // "hello"

If we wanted to specify the byte order in the little-endian representation, we'd need to use the "UTF-16LE" encoding, as shown below:

     byte[] byteArr3 = { 'h', '\0', 'e', '\0', 'l', '\0',
                         'l', '\0', 'o', '\0' };
     String str7 = new String( byteArr3, "UTF-16LE" );    // "hello"

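The last two invocations of the String constructor are declared to throw UnsupportedEncodingException, which is thrown if the named encoding is not supported by the JVM. Since this is a checked exception, such calls must be placed inside a try block or in a method that declares the exception. The topic of exceptions and how to deal with them will be discussed in Chapter 10.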
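A minimal sketch of handling that checked exception follows. The helper decodeUtf16be is hypothetical, and the second half assumes Java 7 or later, where java.nio.charset.StandardCharsets supplies Charset constants and the String(byte[], Charset) constructor declares no checked exception.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

// EncodingDemo.java (hypothetical name): decoding bytes with a named encoding
class EncodingDemo {
    // Hypothetical helper: deals with the checked exception at the call site
    static String decodeUtf16be( byte[] bytes ) {
        try {
            return new String( bytes, "UTF-16BE" );
        } catch ( UnsupportedEncodingException e ) {
            // UTF-16BE is a required encoding, so this should not happen
            throw new IllegalStateException( e );
        }
    }

    public static void main( String[] args ) {
        byte[] byteArr2 = { '\0', 'h', '\0', 'e', '\0', 'l', '\0', 'l', '\0', 'o' };
        System.out.println( decodeUtf16be( byteArr2 ) );                         // hello

        // Java 7+: a Charset argument means no checked exception to handle
        byte[] byteArr3 = { 'h', '\0', 'e', '\0', 'l', '\0', 'l', '\0', 'o', '\0' };
        System.out.println( new String( byteArr3, StandardCharsets.UTF_16LE ) ); // hello

        // Encoding goes the other way: 5 chars become 10 bytes in UTF-16LE
        System.out.println( "hello".getBytes( StandardCharsets.UTF_16LE ).length ); // 10
    }
}
```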

4.4.2 Accessing Individual Characters

The individual characters of a Java string can be accessed by invoking the charAt method with an int argument:

     String str = "hello";
     char ch = str.charAt( 1 );                           // 'e'
     StringBuffer strbuf = new StringBuffer( "hello" );
     ch = strbuf.charAt( 1 );                             // 'e'

Since the strings created through the StringBuffer class are mutable, it is possible to write into each character position in such a string, as the following example illustrates:

     StringBuffer strbuf = new StringBuffer( "hello" );
     strbuf.setCharAt( 0, 'j' );

which would convert "hello" into "jello".

Indexing for accessing the individual characters of a string is always range checked in Java. If you try to access an index that is outside the valid limits for a string, the JVM will throw an exception of type StringIndexOutOfBoundsException:

     String str = "hello";
     char ch = str.charAt( 100 );                         // ERROR
     StringBuffer strbuf = new StringBuffer( "hello" );
     ch = strbuf.charAt( 100 );                           // ERROR

For a StringBuffer string, it is a range violation if you try to access an index that is outside the length of the string even if the index is inside the capacity.

     StringBuffer strbuf = new StringBuffer( "hello" );
     System.out.println( strbuf.capacity() );             // 21
     ch = strbuf.charAt( 20 );                            // ERROR

For a StringBuffer string, you can delete a character by invoking deleteCharAt:

     StringBuffer strbuf = new StringBuffer( "hello" );
     strbuf.deleteCharAt( 0 );
     System.out.println( strbuf.length() );               // 4, was 5
     System.out.println( strbuf.capacity() );             // 21, was 21

By deleting a character, the deleteCharAt method shrinks the length of the string by one, but note that the capacity of the string buffer remains unaltered.
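The in-place modifications and the range checking described above can be combined in one small program. This sketch uses a hypothetical helper, charAtOrDefault, that catches the exception. It catches IndexOutOfBoundsException, the superclass of StringIndexOutOfBoundsException, since that is what the charAt documentation promises. Note that index 20 is rejected even though it lies within the buffer's capacity of 21.

```java
// MutateDemo.java (hypothetical name): in-place edits and range checking
class MutateDemo {
    // Hypothetical helper: the character at index, or '?' if index is out of range
    static char charAtOrDefault( CharSequence s, int index ) {
        try {
            return s.charAt( index );
        } catch ( IndexOutOfBoundsException e ) {  // also catches StringIndexOutOfBoundsException
            return '?';
        }
    }

    public static void main( String[] args ) {
        StringBuffer strbuf = new StringBuffer( "hello" );
        strbuf.setCharAt( 0, 'j' );               // "jello"
        strbuf.deleteCharAt( 4 );                 // "jell"
        System.out.println( strbuf + " " + strbuf.length() + " " + strbuf.capacity() ); // jell 4 21
        System.out.println( charAtOrDefault( strbuf, 1 ) );    // e
        System.out.println( charAtOrDefault( strbuf, 20 ) );   // ? (below capacity, not below length)
        System.out.println( charAtOrDefault( "jell", -1 ) );   // ?
    }
}
```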

4.4.3 String Comparison

Java strings are compared using the equals and compareTo methods, and the '==' operator. The method equals returns a boolean answer, whereas the method compareTo returns an integer that is negative, zero, or positive according to whether the String on which the method is invoked is less than, equal to, or greater than the argument String. For example, in the following program fragment

     String str1 = "stint";
     String str2 = "stink";
     System.out.println( str1.equals( str2 ) );           // false
     String str3 = "stint";
     String str4 = "stink";
     System.out.println( str3.compareTo( str4 ) > 0 );    // true

the first print statement outputs false because the strings pointed to by str1 and str2 are composed of different character sequences. The second print statement outputs true because the string str3 is indeed "greater" than the string str4: the two strings first differ at their last characters, and 't' comes after 'k'. We'll have more to say on the compareTo method later in this subsection when we talk about sorting arrays of strings.
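The exact integer returned by compareTo is spelled out in the String documentation: at the first index where the two strings differ, it is the difference between the two char values; if one string is a prefix of the other, it is the difference of their lengths. Here is a small sketch (class name hypothetical). Code that relies only on the general Comparable contract should test just the sign.

```java
// CompareDemo.java (hypothetical name): values returned by String.compareTo
class CompareDemo {
    public static void main( String[] args ) {
        System.out.println( "stint".compareTo( "stink" ) );   // 9: 't' - 'k' = 116 - 107
        System.out.println( "stink".compareTo( "stint" ) );   // -9
        System.out.println( "apple".compareTo( "apples" ) );  // -1: a prefix, 5 - 6
        System.out.println( "stint".compareTo( "stint" ) );   // 0
        // uppercase letters have smaller Unicode values than lowercase ones
        System.out.println( "Stint".compareTo( "stint" ) < 0 );        // true
        System.out.println( "Stint".compareToIgnoreCase( "stint" ) );  // 0
    }
}
```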

With regard to the '==' operator, as we have already mentioned, the operator can only be used for testing whether two String variables are pointing to the same String object. Suppose we have the following statements in a program:

     String s1 = new String("Hello");
     String s2 = s1;

then s1 == s2 would evaluate to true because both s1 and s2 will be holding references to the same string object, meaning an object that resides at the same place in the memory. On the other hand, if we say

     String s1 = new String("hello");
     String s2 = new String("hello");

then s1 == s2 will evaluate to false because we now have two distinct String objects at two different places in the memory, even though the two objects hold identical character sequences (both are formed from the same string literal).

As was mentioned in Chapter 3, both equals and '==' can be applied to any object, and the system-supplied definition of equals in the Object class, the root class in the Java hierarchy of classes, has the same meaning as '==': comparison on the basis of equality of reference. So, as defined for Object, both these predicates tell us whether the two references point to exactly the same object in the memory. However, while equals can be overridden, '==' cannot because it is an operator. The method equals has already been overridden for us in the String class, so it carries out its comparisons on the basis of equality of content for String type strings. But since operators cannot be overridden in Java, the operator '==' retains its meaning of reference comparison.
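The difference between '==' and the overridden equals, and the way intern() maps equal strings to a single pooled object, can be seen side by side in the following sketch (class name hypothetical).

```java
// IdentityDemo.java (hypothetical name): reference equality versus content equality
class IdentityDemo {
    public static void main( String[] args ) {
        String s1 = new String( "hello" );
        String s2 = new String( "hello" );
        String s3 = s1;

        System.out.println( s1 == s2 );         // false: two distinct objects
        System.out.println( s1.equals( s2 ) );  // true: same character sequence
        System.out.println( s1 == s3 );         // true: s3 refers to the object s1 refers to

        // intern() returns the pooled String with the same characters,
        // which here is the object for the literal "hello"
        System.out.println( s1.intern() == s2.intern() );  // true
        System.out.println( s1.intern() == "hello" );      // true
    }
}
```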

A word of caution about comparing objects of type StringBuffer: While the system provides us with an overridden definition for the equals method for the String class, it does not do so for the StringBuffer class. In other words, while for the String class you can use the equals method to test for the equality of content, you cannot do so for the StringBuffer class, as borne out by the following code:

     String s1 = new String( "Hello" );
     String s2 = new String( "Hello" );
     System.out.println( ( s1.equals( s2 ) ) + "" );      // true
     StringBuffer s3 = new StringBuffer( "Hello" );
     StringBuffer s4 = new StringBuffer( "Hello" );
     System.out.println( ( s3.equals( s4 ) ) + "" );      // false

If you must compare two StringBuffer objects for equality of content, you can do so by first constructing String objects out of them via the toString method, as in

     StringBuffer sb = new StringBuffer( "Hello" );
     if ( sb.toString().equals( "jello" ) )
         ....
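Besides converting through toString, the String class offers contentEquals, which compares a String with the contents of a StringBuffer (available since Java 1.4). The sketch below wraps the toString approach in a hypothetical helper, sameContent, and shows both.

```java
// BufferEqualityDemo.java (hypothetical name): comparing StringBuffer contents
class BufferEqualityDemo {
    // Hypothetical helper: compares two buffers by content via toString()
    static boolean sameContent( StringBuffer a, StringBuffer b ) {
        return a.toString().equals( b.toString() );
    }

    public static void main( String[] args ) {
        StringBuffer s3 = new StringBuffer( "Hello" );
        StringBuffer s4 = new StringBuffer( "Hello" );
        System.out.println( s3.equals( s4 ) );              // false: Object.equals, reference test
        System.out.println( sameContent( s3, s4 ) );        // true
        System.out.println( "Hello".contentEquals( s3 ) );  // true
        System.out.println( "jello".contentEquals( s3 ) );  // false
    }
}
```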

We will now revisit the compareTo method for the String class. The String class implements the Comparable interface by providing an implementation for the compareTo method. The compareTo method as provided for the String class compares two strings lexicographically using the Unicode values associated with the characters in the string.[23] Because the String class comes equipped with a compareTo method, we say that String objects possess a natural ordering, which implies that we are allowed to sort an array of Strings by invoking, say, java.util.Arrays.sort without having to explicitly supply a comparison function to the sort method. This is in accord with our Chapter 3 discussion on comparing objects in Java. The following example illustrates invoking java.util.Arrays.sort for sorting an array of strings.

If we do not want the array of strings to be sorted according to the compareTo comparison function, we can invoke a two-argument version of java.util.Arrays.sort and supply for its second argument an object of type Comparator that has an implementation for a method called compare that tells the sort function how to carry out comparisons.[24] If all you want to do is to carry out a case-insensitive comparison, you can use the Comparator object CASE_INSENSITIVE_ORDER that comes as a static data member of the String class. In the code example shown below, the second sort is a case-insensitive sort. For arrays of objects, java.util.Arrays.sort uses a stable sort based on merge sort (TimSort, a merge-sort hybrid, since Java 7), so strings that compare as equal keep their relative order.

 
     //StringSort.java

     import java.util.*;

     class StringSort {
         public static void main( String[] args ) {
             String[] strArr = { "apples", "bananas", "Apricots", "Berries",
                                 "oranges", "Oranges", "APPLES", "peaches" };
             // Note: strArr2 refers to the same array as strArr; it is not a copy,
             // so the second sort starts from the case-sensitively sorted order
             String[] strArr2 = strArr;
             System.out.println( "Case sensitive sort with Arrays.sort:" );
             Arrays.sort( strArr );
             for ( int i = 0; i < strArr.length; i++ )
                 System.out.println( strArr[i] );
             System.out.println( "\nCase insensitive sort:" );
             Arrays.sort( strArr2, String.CASE_INSENSITIVE_ORDER );
             for ( int i = 0; i < strArr2.length; i++ )
                 System.out.println( strArr2[i] );
         }
     }

The output of this program is

     Case sensitive sort with Arrays.sort:
     APPLES
     Apricots
     Berries
     Oranges
     apples
     bananas
     oranges
     peaches

     Case insensitive sort:
     APPLES
     apples
     Apricots
     bananas
     Berries
     Oranges
     oranges
     peaches
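A Comparator of one's own follows the same pattern as String.CASE_INSENSITIVE_ORDER. The sketch below assumes Java 7 or later (for generics and Integer.compare) and uses a hypothetical comparator, LengthThenAlpha, that orders strings by length and breaks ties with the natural ordering. It also sorts a copy made with clone(), because an assignment such as strArr2 = strArr in StringSort.java copies only the reference, not the array.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical comparator: shorter strings first, ties broken by compareTo
class LengthThenAlpha implements Comparator<String> {
    public int compare( String a, String b ) {
        int byLength = Integer.compare( a.length(), b.length() );
        return byLength != 0 ? byLength : a.compareTo( b );
    }
}

// CustomSortDemo.java (hypothetical name)
class CustomSortDemo {
    public static void main( String[] args ) {
        String[] original = { "apples", "bananas", "Apricots", "Berries", "figs" };
        String[] copy = original.clone();        // a separate array, not an alias
        Arrays.sort( copy, new LengthThenAlpha() );
        System.out.println( Arrays.toString( copy ) );
        // [figs, apples, Berries, bananas, Apricots]
        System.out.println( Arrays.toString( original ) );
        // [apples, bananas, Apricots, Berries, figs]  (unchanged)
    }
}
```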

4.4.4 Joining Strings Together

In general, Java does not allow its operators to be overloaded. But there is one exception to that general rule: the operator '+' takes on the meaning of string concatenation when one of its operands is of type String (no such meaning exists for two StringBuffer operands). With this overloaded definition, the variable str3 in the following code fragment will hold the string "hello there".

     String str1 = "hello";
     String str2 = " there";
     String str3 = str1 + str2;                           // "hello there"

Strings of type StringBuffer can be joined by invoking the append method, as in

     StringBuffer strbuf = new StringBuffer( "hello" );
     StringBuffer strbuf2 = new StringBuffer( " there" );
     strbuf.append( strbuf2 );
     System.out.println( strbuf );                        // "hello there"
     String str = "!";
     strbuf.append( str );
     System.out.println( strbuf );                        // "hello there!"

The capacity of a string buffer is automatically increased if it runs out of space as additional characters are added to the string already there.

In addition to invoking the append method with either the String or the StringBuffer arguments, you can also invoke it with some of the other types that Java supports, as illustrated by:

     StringBuffer strbuf = new StringBuffer( "hello" );
     int x = 123;
     strbuf.append( x );
     System.out.println( strbuf );                        // "hello123"
     double d = 9.87;
     strbuf.append( d );
     System.out.println( strbuf );                        // "hello1239.87"

As you can see, append first converts its argument to a string representation and then appends the new string to the one already in the buffer. This permits append to be invoked for any object, even a programmer-defined object, as long as it is possible to convert the object into its string representation. As we saw in Chapter 3, when a class is supplied with an override definition for the toString method, the system can automatically create string representations of the objects made from the class.
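The following sketch appends an object of a programmer-defined class, PointXY (hypothetical), whose toString override supplies the text. It also shows that append returns the buffer itself, which permits calls to be chained.

```java
// Hypothetical programmer-defined class with a toString override
class PointXY {
    private final int x, y;
    PointXY( int x, int y ) { this.x = x; this.y = y; }
    @Override
    public String toString() { return "(" + x + ", " + y + ")"; }
}

// AppendDemo.java (hypothetical name)
class AppendDemo {
    public static void main( String[] args ) {
        StringBuffer strbuf = new StringBuffer( "point: " );
        strbuf.append( new PointXY( 3, 4 ) );   // append(Object) uses toString()
        // append returns this same buffer, so calls can be chained
        strbuf.append( ' ' ).append( true ).append( 'c' );
        System.out.println( strbuf );            // point: (3, 4) truec
    }
}
```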

Going back to the joining of String type strings, an immutable string class is inefficient for serial concatenation of substrings, as in

     String s = "hello" + " there" + " how" + " are" + " you"; 

Since '+' associates from left to right, the string concatenations on the right are equivalent to

     String s = (((("hello" + " there") + " how") + " are") + " you");

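If the Java compiler had available to it only the immutable String class for string processing, each parenthesized concatenation on the right would demand that a new String object be created. Therefore, this example would entail the creation of four new String objects, of which only the last would be used. And then there would be further work entailed in the garbage collection of the three intermediate String objects. Fortunately, the Java compiler does not really use the String class for such operations. Instead, it translates a chain of concatenations into calls to the append method of a mutable buffer (StringBuffer in the compilers of the time; later compilers use StringBuilder and, since Java 9, a strategy chosen at run time), and the final result is then converted back to a String. (For this particular statement, in which every operand is a literal, the compiler goes further: as we saw for line (C) of StringLiteralUniqueness.java, a concatenation of literals is a constant expression that is joined at compile time.)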
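The compiler's translation applies to a single concatenation expression. When a string is built up across the iterations of a loop, each s += ... still produces a new String, so building the result in one explicitly managed buffer is the usual remedy. Below is a minimal sketch with two hypothetical helpers that produce the same result, one with a single StringBuffer and one with repeated +=.

```java
// ConcatDemo.java (hypothetical name): loop concatenation with and without a buffer
class ConcatDemo {
    // Hypothetical helper: one buffer for the whole loop
    static String joinWithBuffer( String[] words ) {
        StringBuffer sb = new StringBuffer();
        for ( int i = 0; i < words.length; i++ ) {
            if ( i > 0 ) sb.append( ' ' );
            sb.append( words[i] );
        }
        return sb.toString();      // a single String at the end
    }

    // Hypothetical helper: same result, but every += creates a new String
    static String joinWithPlus( String[] words ) {
        String s = "";
        for ( int i = 0; i < words.length; i++ ) {
            if ( i > 0 ) s += " ";
            s += words[i];
        }
        return s;
    }

    public static void main( String[] args ) {
        String[] words = { "hello", "there", "how", "are", "you" };
        System.out.println( joinWithBuffer( words ) );   // hello there how are you
        System.out.println( joinWithBuffer( words ).equals( joinWithPlus( words ) ) );  // true
    }
}
```

In code that does not share the buffer between threads, StringBuilder (Java 5 and later) offers the same methods without StringBuffer's synchronization.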

4.4.5 Searching and Replacing

One can search for individual characters and substrings in a String type string by invoking the indexOf method:

     String str = "hello there";
     int n = str.indexOf( "the" );                        // 6

By supplying indexOf with a second int argument, it is also possible to specify the index of the starting position for the search. This can be used to search for all occurrences of a character or a substring, as the following code fragment illustrates:

     String mystr = new String( "one hello is like any other hello" );
     String search = "hello";
     int pos = 0;
     while ( true ) {
         pos = mystr.indexOf( search, pos );
         if ( pos == -1 ) break;
         System.out.println( "hello found at: " + pos );  // 4 and 28
         pos++;
     }

To parallel our C++ program StringFind.cc, we show next a program that searches for all occurrences of a substring and replaces each occurrence by another string. Since a String is immutable, we'll have to use a StringBuffer for representing the original string. The program combines the mutability of a StringBuffer with the searching capability of a String by converting the buffer into a String each time it searches. (Since Java 1.4, StringBuffer has its own indexOf methods, which would make that conversion unnecessary.) The program converts "one hello is like any other hello" into "one armadillo is like any other armadillo".

 
     //StringFind.java

     class StringFind {
         public static void main( String[] args ) {
             StringBuffer strbuf =
                 new StringBuffer( "one hello is like any other hello" );
             String searchString = "hello";
             String replacementString = "armadillo";
             int pos = 0;
             while ( ( pos = (new String(strbuf)).indexOf( searchString, pos ) ) != -1 ) {
                 strbuf.replace( pos, pos + searchString.length(), replacementString );
                 // resume the search just past the inserted text
                 pos += replacementString.length();
             }
             System.out.println( strbuf );   // one armadillo is like any other armadillo
         }
     }

There is also the method lastIndexOf that searches for the rightmost occurrence of a character or a substring:

     String str = "hello there";
     int n = str.lastIndexOf( "he" );                     // 7

The methods endsWith and startsWith can be invoked to check for suffixes and prefixes in strings:

     String str = "hello there";
     if ( str.startsWith( "he" ) )                        // true
         ....
     if ( str.endsWith( "re" ) )                          // true
         ....
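Like indexOf, lastIndexOf accepts a second argument, here the index from which to search backward, and startsWith accepts an offset at which to test for the prefix. The sketch below (the helper name positionsFromRight is hypothetical) uses the two-argument lastIndexOf to collect every occurrence of a substring from right to left.

```java
import java.util.ArrayList;
import java.util.List;

// SearchDemo.java (hypothetical name): backward search and offset prefix tests
class SearchDemo {
    // Hypothetical helper: start positions of every occurrence, rightmost first
    static List<Integer> positionsFromRight( String s, String target ) {
        List<Integer> found = new ArrayList<Integer>();
        int pos = s.lastIndexOf( target );
        while ( pos != -1 ) {
            found.add( pos );
            // search backward again, starting just left of the match found
            pos = s.lastIndexOf( target, pos - 1 );
        }
        return found;
    }

    public static void main( String[] args ) {
        String mystr = "one hello is like any other hello";
        System.out.println( positionsFromRight( mystr, "hello" ) );  // [28, 4]

        String str = "hello there";
        System.out.println( str.lastIndexOf( 'e' ) );        // 10
        System.out.println( str.startsWith( "the", 6 ) );    // true: tested at offset 6
    }
}
```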

4.4.6 Erasing and Inserting Substrings

The following example shows how we can search for a substring, erase it, and then insert in its place another substring. What erase did for C++ is now done by delete, whose two int arguments are the beginning index (inclusive) and the ending index (exclusive) of the character sequence to be deleted. Insertion of a substring is carried out with the insert method, whose first argument, of type int, specifies the index where the new substring is to be spliced in.

 
     // StringInsert.java

     class StringInsert {
         public static void main( String[] args ) {
             int pos = 0;
             StringBuffer quote = new StringBuffer(
                 "Some cause happiness wherever they go,"
                 + " others whenever they go - Oscar Wilde" );
             String search = "happiness";
             if ( ( pos = ( new String(quote) ).indexOf( search ) ) != -1 ) {
                 quote.delete( pos, pos + search.length() );   // erase "happiness"
                 quote.insert( pos, "excitement" );            // splice in the new word
             }
             System.out.println( quote );
             // Some cause excitement wherever they go, others whenever they go - Oscar Wilde
         }
     }
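StringBuffer also has a replace method that takes the same beginning and ending indices as delete and does the erasure and the insertion in one call. The sketch below (helper names hypothetical) produces the same result both ways and, assuming Java 1.4 or later, searches with StringBuffer's own indexOf.

```java
// ReplaceDemo.java (hypothetical name): delete + insert versus replace
class ReplaceDemo {
    // Hypothetical helper: erase, then insert, as in StringInsert.java
    static String viaDeleteInsert( String text, String target, String repl ) {
        StringBuffer sb = new StringBuffer( text );
        int pos = sb.indexOf( target );          // StringBuffer.indexOf (Java 1.4+)
        if ( pos != -1 ) {
            sb.delete( pos, pos + target.length() );
            sb.insert( pos, repl );
        }
        return sb.toString();
    }

    // Hypothetical helper: the same edit with one call to replace
    static String viaReplace( String text, String target, String repl ) {
        StringBuffer sb = new StringBuffer( text );
        int pos = sb.indexOf( target );
        if ( pos != -1 )
            sb.replace( pos, pos + target.length(), repl );
        return sb.toString();
    }

    public static void main( String[] args ) {
        String quote = "Some cause happiness wherever they go";
        System.out.println( viaDeleteInsert( quote, "happiness", "excitement" ) );
        // Some cause excitement wherever they go
        System.out.println( viaReplace( quote, "happiness", "excitement" ) );
        // Some cause excitement wherever they go
    }
}
```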

4.4.7 Extracting Substrings

Both String and StringBuffer support substring extraction by invoking the substring method with either one int argument or two int arguments. When only one argument is supplied to substring, it is the beginning index for the substring to be extracted, and the substring extracted will include all of the characters from the beginning index to the end. When two arguments are supplied, the second argument is the ending index, which is exclusive: the extracted substring stops just before the character at that index. In all cases, for both String and StringBuffer, the returned object is a new String. For illustration:

     String str = "0123456789abc";
     System.out.println( str.substring( 5 ) );            // "56789abc"
     System.out.println( str.substring( 5, 9 ) );         // "5678"
     StringBuffer stb = new StringBuffer( "0123456789abc" );
     System.out.println( stb.substring( 5 ) );            // "56789abc"
     System.out.println( stb.substring( 5, 9 ) );         // "5678"
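Because the ending index is exclusive, substring(begin, end) always has length end - begin, and two adjacent ranges split a string with no overlap and no gap. Here is a quick check of these properties (class name hypothetical).

```java
// SubstringDemo.java (hypothetical name): half-open index ranges
class SubstringDemo {
    public static void main( String[] args ) {
        String str = "0123456789abc";
        String sub = str.substring( 5, 9 );        // indices 5, 6, 7, 8
        System.out.println( sub );                 // 5678
        System.out.println( sub.length() );        // 4, that is, 9 - 5
        // adjacent half-open ranges rebuild the original exactly
        System.out.println( str.substring( 0, 5 ) + str.substring( 5 ) );  // 0123456789abc
        System.out.println( str.substring( 5, 5 ).length() );              // 0: empty range
        StringBuffer stb = new StringBuffer( str );
        System.out.println( stb.substring( 10 ) ); // abc
    }
}
```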

[20]Operations on String type objects sometimes have the appearance that you might be changing an object of type String, but that is never the case. In all such operations, a new String object is usually formed. For example, in the following statements the string literal "jello" in line (A) did not get changed into "hello" in line (B). The string literals "jello" and "hello" occupy separate places in the memory. Initially, s1 holds a reference to the former literal and then to the latter literal. After s1 changes its reference to "hello", the literal "jello" itself is unaffected; it stays in the pool of string literals (literals are generally not garbage collected while the class that uses them remains loaded). The statement in line (C) results in the creation of a new String object whose reference is held by the variable s2.

     String s1 = "jello";                                 //(A)
     s1 = "hello";                                        //(B)
     String s2 = s1 + " there";                           //(C)

By the same token, in lines (D) and (E) below, s2 refers to a new String object, as opposed to an extension of the object s1:

     String s1 = "hello";                                 //(D)
     String s2 = s1.concat( "there" );                    //(E)

The invocation of the concat method in line (E) returns a new string that is a concatenation of the string on which the method is invoked and the argument string.

[21]The reader may wish to read the rest of this subsection after we discuss the different primitive types in Java in Chapter 6.

[22]Every Java implementation is required to support the following character encodings, which we will discuss further in Chapter 6:

US-ASCII (this is the seven-bit ASCII)

ISO-8859-1 (ISO-Latin-1)

UTF-8 (8-bit Unicode Transformation Format)

UTF-16BE (16-bit Unicode in big-endian byte order)

UTF-16LE (16-bit Unicode in little-endian byte order)

UTF-16 (16-bit Unicode in which the byte order is identified by an optional initial byte-order mark; input without one is decoded as big-endian)

[23]For the characters represented in the ASCII code, these comparisons essentially boil down to ASCII-code based comparisons, as was the case earlier with C++ strings. We will have more to say about the Unicode in Chapter 6.

[24]An example of how to set up a Comparator object for a sorting function is shown in Chapter 5. Comparator is defined as an interface in the java.util package. Interfaces, presented briefly in Chapter 3, are discussed more fully in Chapter 15.




Programming with Objects: A Comparative Presentation of Object Oriented Programming with C++ and Java
ISBN: 0471268526
Year: 2005
Pages: 273
Authors: Avinash Kak
