Strings


Strings are sequences of characters. However, what constitutes a character depends greatly on the language being used and the settings of the operating system on which the application runs. Gone are the days when a string was just a set of bytes, with each byte representing a character from the ASCII character encoding. Multibyte encodings (either fixed length or variable length) are needed to accurately store text in today’s global economy.

With that said, most interview problems will avoid variable-length character encodings to simplify matters. The individual characters will be referred to as characters or bytes depending mostly on the language being used: Languages such as Java and C# have a built-in Unicode character type, whereas C/C++ does not. In general, most programming examples involving strings will use the natural character type for the language in question.
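To make the difference between characters, char values, and bytes concrete, here is a small Java sketch (the class and method names are illustrative, not from any library). It counts the same strings three ways: as Java chars (UTF-16 code units), as Unicode code points, and as bytes once the string is encoded in UTF-8. Only the plain ASCII string gives the same answer all three ways.

```java
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Number of UTF-16 code units (Java chars) in the string.
    static int charCount(String s) {
        return s.length();
    }

    // Number of Unicode code points; a character outside the Basic
    // Multilingual Plane occupies two chars (a surrogate pair).
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    // Number of bytes needed to store the string in UTF-8.
    static int utf8ByteCount(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // "abc" is ASCII, "café" has one accented letter,
        // and the last sample is a single emoji (U+1F600).
        String[] samples = { "abc", "caf\u00e9", "\uD83D\uDE00" };
        for (String s : samples) {
            System.out.println(charCount(s) + " chars, "
                + codePointCount(s) + " code points, "
                + utf8ByteCount(s) + " UTF-8 bytes");
        }
    }
}
```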

If you have specific experience with internationalization and localization, don’t hesitate to point this out during the interview. You can explain what would have to be done differently to handle a variable-length character encoding, for example, even as you code the solution to only work (as the interviewer requested) with a single-byte character encoding such as ASCII.
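As one illustration of what handling a variable-length encoding differently can mean, the hypothetical methods below reverse a string in Java two ways. The first swaps individual chars, which is the usual interview answer and is correct for single-unit text such as ASCII; the second works one code point at a time, so a character stored as a surrogate pair stays intact.

```java
public class ReverseDemo {
    // Naive reversal: swaps individual chars. Correct for text where every
    // character is one char (e.g. ASCII), but splits surrogate pairs apart.
    static String reverseChars(String s) {
        char[] a = s.toCharArray();
        for (int i = 0, j = a.length - 1; i < j; i++, j--) {
            char tmp = a[i];
            a[i] = a[j];
            a[j] = tmp;
        }
        return new String(a);
    }

    // Code-point-aware reversal: collects whole code points first, then
    // appends them in reverse order, so multi-char characters stay intact.
    static String reverseCodePoints(String s) {
        int[] cps = s.codePoints().toArray();
        StringBuilder b = new StringBuilder(s.length());
        for (int i = cps.length - 1; i >= 0; i--) {
            b.appendCodePoint(cps[i]);
        }
        return b.toString();
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b"; // 'a', an emoji (surrogate pair), 'b'
        System.out.println(reverseCodePoints(s).equals("b\uD83D\uDE00a")); // true
        System.out.println(reverseChars(s).equals("b\uD83D\uDE00a"));      // false
    }
}
```

In practice, StringBuilder.reverse() already treats surrogate pairs as units. Neither version handles combining sequences (such as a letter followed by a separate accent mark); those need grapheme-aware processing, for example with java.text.BreakIterator.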

No matter how they’re encoded, most languages store strings internally as arrays, even if they differ greatly in how they treat arrays and strings. As before, we’ll look at each language separately.

C

A C string is nothing more than a char array. Just as C doesn’t track the size of arrays, it doesn’t track the size of strings. Instead, the end of the string is marked with a null character, written in the language as '\0'. (The null character is sometimes referred to as NUL, its ASCII name, or NULLCHAR. Using NULL is incorrect because NULL is a null pointer constant, reserved for use with pointers.) The character array must have room for the terminator: a 10-character string requires an 11-character array. This scheme makes finding the length of a string an O(n) operation rather than the O(1) you might expect: strlen() (the library function that returns the length of a string) must scan through the string until it finds the terminator.
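Since a C string is just an array with a terminator, its behavior can be modeled in Java with a char[] that ends in '\0'. The sketch below uses a hypothetical strlen-style method (a model, not C's actual implementation) to show why finding the length is O(n): it has to walk the array until it reaches the terminator, regardless of how large the array is.

```java
public class CStringLength {
    // Hypothetical model of C's strlen(): count chars until the '\0'
    // terminator. This is an O(n) scan; the array's capacity is irrelevant.
    // Assumes buf actually contains a terminator.
    static int strlen(char[] buf) {
        int n = 0;
        while (buf[n] != '\0') {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // A 5-character string needs a 6-element array; the extra slot holds '\0'.
        char[] hello = { 'h', 'e', 'l', 'l', 'o', '\0' };
        System.out.println(strlen(hello)); // 5

        // A larger buffer: Java zero-fills new char arrays, so every unused
        // slot is already '\0' and the length is 2, not 16.
        char[] buf = new char[16];
        buf[0] = 'h';
        buf[1] = 'i';
        System.out.println(strlen(buf)); // 2
    }
}
```

Where C would read past the end of an unterminated array (undefined behavior), this Java model throws ArrayIndexOutOfBoundsException instead.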

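For the same reason that you can’t assign one C array to another, you cannot copy C strings using the = operator (assigning one char pointer to another copies only the address, not the characters). Instead, you generally use the strcpy() function, which copies characters up to and including the null terminator; the destination array must be large enough to hold them all.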
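Continuing the same Java model, copying a C-style string means copying each character and then the terminator. The hypothetical strcpy method below mirrors what the C library function does. The array assignment at the end shows the Java counterpart of the = pitfall: only the reference is copied, so both names refer to the same array.

```java
public class CStringCopy {
    // Hypothetical model of C's strcpy(dest, src): copy characters up to
    // and including the '\0' terminator. Like the C function, it assumes
    // dest is large enough; here an overflow throws instead of corrupting memory.
    static void strcpy(char[] dest, char[] src) {
        int i = 0;
        while (src[i] != '\0') {
            dest[i] = src[i];
            i++;
        }
        dest[i] = '\0'; // the terminator must be copied too
    }

    public static void main(String[] args) {
        char[] src = { 'c', 'a', 't', '\0' };
        char[] dest = new char[8];
        strcpy(dest, src);
        src[0] = 'b';                               // change the original
        System.out.println(new String(dest, 0, 3)); // cat: dest is an independent copy

        char[] alias = src;                         // copies the reference only
        alias[0] = 'r';
        System.out.println(new String(src, 0, 3));  // rat: src changed through alias
    }
}
```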

It is often convenient to read or alter a string by addressing individual characters of the array. If you change the length of a string in this manner, make sure you write a null character after the new last character in the string, and that the character array you are working in is large enough to accommodate the new string and terminator. It’s easy to truncate a C string: Just place a null character immediately after the new end of the string.
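In the same Java model, truncating and extending a C-style string in place look like this. The hypothetical truncate and appendChar methods write the terminator explicitly, and appendChar refuses to write when the array has no room for both the new character and the '\0' that must follow it.

```java
public class CStringEdit {
    // Same strlen model as before: scan for the '\0' terminator.
    static int strlen(char[] buf) {
        int n = 0;
        while (buf[n] != '\0') {
            n++;
        }
        return n;
    }

    // Truncate in place: place '\0' right after the new last character.
    // Assumes 0 <= newLength <= strlen(buf).
    static void truncate(char[] buf, int newLength) {
        buf[newLength] = '\0';
    }

    // Append one character in place, keeping the string terminated.
    // Returns false if there is no room for the character plus '\0'.
    static boolean appendChar(char[] buf, char c) {
        int n = strlen(buf);
        if (n + 2 > buf.length) {
            return false;
        }
        buf[n] = c;
        buf[n + 1] = '\0';
        return true;
    }

    public static void main(String[] args) {
        char[] buf = { 'h', 'e', 'l', 'l', 'o', '\0', '\0', '\0' };
        truncate(buf, 2);                                   // "he"
        appendChar(buf, '!');                               // "he!"
        System.out.println(new String(buf, 0, strlen(buf))); // he!

        char[] full = { 'a', 'b', '\0' };
        System.out.println(appendChar(full, 'c'));          // false: no room
    }
}
```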

C++

C-style strings can be used with C++, but the preferred approach is to use the string class from the standard library whenever possible. The string class is a specialization of the basic_string template class for the char data type (basic_string<char>). If you want strings with wider characters for storing Unicode values (as in Java or C#), you can use wstring, the specialization of basic_string based on the wchar_t (wide character) type, whose size is platform dependent.

The string class is well integrated into the C++ standard library: strings work with streams and iterators. In addition, a C++ string tracks its length explicitly instead of relying on a terminator, so, unlike a C string, it can contain null bytes. Some older implementations let copies of the same string share one underlying buffer, creating a new buffer only when a copy is modified (copy-on-write); the C++11 standard effectively rules this out for std::string, so modern implementations copy the characters instead. For compatibility with older code, you can obtain a C-style string from a C++ string with c_str(), and construct a C++ string from a C-style string.

Java

Java strings are objects of the String class, a special system class. Although strings can be readily converted to and from character and byte arrays (the classic implementation holds the characters in a char array, and since Java 9 the class may use a more compact byte array internally), they are a distinct type. Java’s char type holds a 16-bit UTF-16 code unit; characters outside the Basic Multilingual Plane take two chars (a surrogate pair). The individual characters of a string cannot be accessed directly, only through methods on the String class. String literals in program source code are automatically converted into String instances by the Java compiler. In older JVMs, the underlying array was shared between instances where possible (for example, between a string and its substrings); since Java 7 update 6, substring() copies the characters instead. The length of a string can be retrieved via the length() method. Various methods are available to search and return substrings, extract individual characters, trim whitespace characters, and so on.
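The following Java snippet exercises several of the String methods mentioned above on an arbitrary sample string. The comments show what each call returns, and the last lines convert the string to a char array and to UTF-8 bytes and back.

```java
import java.nio.charset.StandardCharsets;

public class StringApiDemo {
    public static void main(String[] args) {
        String t = "  Hello, World  ".trim();   // "Hello, World"
        System.out.println(t.length());         // 12
        System.out.println(t.charAt(0));        // H
        System.out.println(t.indexOf("World")); // 7
        System.out.println(t.substring(7));     // World

        // Round trips through a char array and a byte array produce equal,
        // but distinct, String instances.
        String fromChars = new String(t.toCharArray());
        byte[] utf8 = t.getBytes(StandardCharsets.UTF_8);
        String fromBytes = new String(utf8, StandardCharsets.UTF_8);
        System.out.println(fromChars.equals(t) && fromBytes.equals(t)); // true
    }
}
```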

Java strings are immutable: they cannot be changed once the string has been constructed. Methods that appear to modify a string actually return a new string instance. The StringBuffer and StringBuilder classes create mutable strings that can be converted to a String instance as necessary. StringBuffer is available in all versions of Java and is thread safe; StringBuilder was introduced in Java 5 and is not thread safe, which usually makes it faster when only one thread builds the string. The compiler implicitly uses StringBuffer instances (newer compilers use StringBuilder or an equivalent strategy) when two String instances are concatenated using the + operator, which is convenient but can lead to inefficient code if you’re not careful. For example, the code

 String s = "";
 for( int i = 0; i < 10; ++i ){
     s = s + i + " ";
 }

is equivalent to

 String s = "";
 for( int i = 0; i < 10; ++i ){
     StringBuffer t = new StringBuffer();
     t.append( s );
     t.append( i );
     t.append( " " );
     s = t.toString();
 }

which would be much more efficiently coded as:

 StringBuffer b = new StringBuffer();
 for( int i = 0; i < 10; ++i ){
     b.append( i );
     b.append( ' ' );
 }
 String s = b.toString();

Watch for this case whenever you’re manipulating strings within a loop.
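To see the immutability described above in action, this Java sketch calls a String method that appears to modify the string, then does a similar edit with a StringBuilder, the non-thread-safe counterpart of StringBuffer. The String call returns a new instance and leaves the original untouched; the StringBuilder is changed in place.

```java
public class ImmutabilityDemo {
    public static void main(String[] args) {
        String original = "java";
        String upper = original.toUpperCase();  // returns a new String
        System.out.println(original);           // java (unchanged)
        System.out.println(upper);              // JAVA
        System.out.println(original == upper);  // false: two different objects

        StringBuilder b = new StringBuilder("java");
        b.append('5');         // modifies b itself
        b.setCharAt(0, 'J');   // so does this
        System.out.println(b); // Java5
    }
}
```

Because StringBuilder is not synchronized, it is the usual choice for a loop like the StringBuffer one above when only a single thread builds the string.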

C#

C# strings are almost identical to Java strings. They are instances of the System.String class (the keyword string is an alias for it), which is very similar to Java’s String class. Like Java strings, C# strings are immutable. Mutable strings are created with the StringBuilder class, and the same caveats about concatenation in loops apply.

JavaScript

Although JavaScript defines a String object, many developers are unaware of its existence because of JavaScript’s dynamic typing. Nevertheless, the usual string operations are available, as well as more advanced capabilities such as regular expressions for string matching and replacement.




Programming Interviews Exposed: Secrets to Landing Your Next Job, 2nd Edition (Programmer to Programmer)
ISBN: 047012167X
Year: 2007
Pages: 94
