Many string problems involve operations that require converting the string to an array and back for efficient processing of the string.
| Important |
Write an efficient function to find the first nonrepeated character in a string. For instance, the first nonrepeated character in “total” is ‘o’ and the first nonrepeated character in “
|
At first, this task seems almost trivial. If a character is repeated, it must appear in at least two places in the string. Therefore, you can determine whether a particular character is repeated by comparing it with all other
What’s the time order of this solution? If the string is
n
characters long, then in the worst case you’ll make almost
n
comparisons for each of the
n
characters. That gives worst case O(
n
2
) for this algorithm. You are
| Tip |
The algorithm described can be improved somewhat by comparing each character with only the characters following it because it has already been compared with the characters
|
Why was the previous algorithm O(
n
2
)? One factor of
n
came from checking each character in the string to determine whether it was nonrepeated. Because the nonrepeated character could be
You’ll want to be able to quickly determine whether a character is repeated, so you need to be able to search the data structure by character. This means you have to use the character as the index (in an array) or key (in a hash table). What values would you store in these data structures? A nonrepeated character appears only once in the string, so if you stored the number of times each character appeared, it would help you identify nonrepeating characters. You’ll have to scan the entire string before you have the final counts for each character.
| Tip |
You can convert a character to an integer in order to use it as an index. If the strings are restricted to ASCII characters, this gives you 128 different possible character values. Unicode characters as used in Java or C#, however, have 65,536 possible values. |
Once you’ve completed this, you could scan through all the count values in the array or hash table looking for a 1. That would find a nonrepeated character, but it wouldn’t
Therefore, you need to search your count values in the order of the characters in the original string. This isn’t difficult - you just look up the count value for each character until you find a 1. When you find a 1, you’ve located the first nonrepeated character.
Consider whether this new algorithm is actually an improvement. You will always have to go through the entire string to build the count data structure. In the worst case, you might have to look up the count value for each character in the string to find the first nonrepeated character. Because the operations on the array or hash you’re using to hold the counts are constant time, the worst case would be two operations for each character in the string, giving 2 n, which is O( n ) - a major improvement over the previous attempt.
Both hash tables and arrays provide constant-time lookup; you need to decide which one you will use. On the one hand, hash tables have a higher lookup overhead than arrays. On the other hand, an array would initially contain random values that you would have to take time to set to zero, whereas a hash table initially has no values. Perhaps the greatest difference is in memory requirements. An array would need an element for every possible value of a character. This would amount to a relatively reasonable 128 elements if you were processing ASCII strings, but if you had to process Unicode strings you would need more than 65,000 elements,
You could implement the solution either way. Assume the code may need to process Unicode strings (a safe bet these days) and choose the hash table implementation. You might choose to write the function in Java or C#, which have built-in support for both hash tables and Unicode. In outline form, the function you’ll be writing looks like this:
First, build the character count hash table: For each character If no value is stored for the character, store 1 Otherwise, increment the value Second, scan the string: For each character Return character if count in hash table is 1 If no characters have count 1, return null
Now implement the function. Because you don’t know what class your function would be part of, implement it as a public static function (this is equivalent to a normal C function). Remember that the Java Hashtable stores references to Object , which means you can store the reference type Integer , but not the fundamental type int :
public static Character firstNonRepeated( String str ){ Hashtable charHash = new Hashtable(); int i, length; Character c; Integer intgr; length = str.length(); // Scan str, building hash table for (i = 0; i < length; i++) { c = new Character(str.charAt(i)); intgr = (Integer) charHash.get(c); if (intgr == null) { charHash.put(c, new Integer(1)); } else { // Increment count corresponding to c charHash.put(c, new Integer(intgr.intValue() + 1)); } } // Search hashtable in order of str for (i = 0; i < length; i++) { c = new Character(str.charAt(i)); if (((Integer)charHash.get(c)).intValue() == 1) return c; } return null; }
The preceding code is actually quite inefficient from a memory standpoint, however. It
public static Character firstNonRepeated( String str ){ Hashtable charHash = new Hashtable(); int i, length; Character c; Object seenOnce = new Object(); Object seenTwice = new Object(); length = str.length(); // Scan str, building hash table for( i = 0; i < length; i++ ){ c = new Character(str.charAt(i)); Object o = charHash.get(c); if( o == null ){ charHash.put( c, seenOnce ); } else if( o == seenOnce ){ // Increment count corresponding to c charHash.put( c, seenTwice ); } } // Search hashtable in order of str for( i = 0; i < length; i++ ){ c = new Character(str.charAt(i)); if( charHash.get(c) == seenOnce ){ return c; } } return null; }
A (significantly) further
| Important |
Write an efficient function in C# that deletes characters from a string. Use the prototype
string removeChars( string str, string remove ); where any character existing in remove must be deleted from str . For example, given a str of “Battle of the Vowels: Hawaii vs. Grozny” and a remove of “aeiou”, the function should transform str to “Bttl f th Vwls: Hw vs. Grzny”. Justify any design decisions you make and discuss the efficiency of your solution. |
Because this problem involves string manipulation and the string class is immutable, you’ll need to either convert the string to a character array or use a StringBuilder for your manipulations. Using an array will save you a lot of overhead, so that’s what you should use.
This problem breaks down into two separate
Your initial task is to delete an element from an array. An array is a contiguous block of memory, so you can’t simply remove an element from the middle as you might with a linked list. Instead, you’ll have to rearrange the data in the array so it remains a contiguous sequence of characters after the deletion. For example, if you want to delete “c” from the string “abcd” you could either shift “a” and “b” forward one position (toward the end) or shift “d” back one position (toward the beginning). Either approach
In addition to shifting the data, you need to decrease the
How would the proposed algorithm fare in the worst-case scenario in which you need to delete all the characters in
str
? For each deletion, you would shift all the remaining characters back one position. If
str
were
n
characters long, you would move the last character
n
– 1 times, the
| Tip |
If you start at the end of the string and work back toward the beginning, it’s somewhat more efficient but still O(n 2 ) in the worst case. |
What if you allocated a temporary string buffer and built your modified string there instead of in place? Then you could simply copy the characters you need to keep into the temporary string, skipping the characters you want to delete. When you finish building the modified string, you can copy it from the temporary buffer back into str . This way, you move each character at most twice, giving O( n ) deletion. However, you’ve incurred the memory overhead of a temporary buffer the same size as the original string, and the time overhead of copying the modified string back over the original string. Is there any way you could avoid these penalties while retaining your O( n ) algorithm?
To implement the O(
n
) algorithm just described, you need to track a source position for the read location in the original string and a destination position for the write position in the temporary buffer. These positions both start at zero. The source position is incremented every time you read, and the destination position is incremented every time you write. In other words, when you copy a character you increment both
Now that you know how to delete characters, consider the task of deciding whether to delete a particular character. The easiest way to do this is to compare the character to each character in remove and delete it if it matches any of them. How efficient is this? If str is n characters long and remove is m characters long, then in the worst case you make m comparisons for each of n characters, so the algorithm is O( nm ). You can’t avoid checking each of the n characters in str , but perhaps you can make the lookup that determines whether a given character is in remove better than O( m ).
If you’ve already read the solution to the section “Find the First Nonrepeated Character,” this should sound very familiar. Just as you did in that problem, you can use remove to build an array or hash table that has constant time lookup, thus giving an O( n ) solution. The trade-offs between hashes and arrays have been discussed. In this case, an array is most appropriate when str and remove are long and characters have relatively few possible values (for example, ASCII strings). A hash table may be a better choice when str and remove are short or characters have many possible values (for example, Unicode strings). This time, the assumption is that you’re processing long ASCII strings and using an array instead of a hash table.
| Tip |
Why build an array? Isn’t remove already an array? Yes, it is, but it is an array of characters indexed by an arbitrary (that is, meaningless for this problem) position, requiring you to search through each element. The array that is referred to here would be an array of Boolean values indexed by all the possible values for a char . This enables you to determine whether a character is in remove by checking a single element. |
Your function will have three
Set all the elements in your lookup array to false .
Iterate through each character in remove , setting the corresponding value in the lookup array to true .
Iterate through str with a source and destination index, copying each character only if its corresponding value in the lookup array is false .
Now that you’ve combined both
Having justified and
stri ng removeChars( string str, string remove ){ char[] s = str.toCharArray(); char[] r = remove.toCharArray(); bool[] flags = new bool[128]; // assumes ASCII! int len = s.Length; int src, dst; // Set flags for characters to be removed for( src = 0; src < len; ++src ){ flags[r[src]] = true; } src = 0; dst = 0; // Now loop through all the characters, // copying only if they aren't flagged while( src < len ){ if( !flags[ (int)s[src] ] ){ s[dst++] = s[src]; } ++src; } return new string( s, 0, dst ); }
| Important |
Write a function that reverses the order of the words in a string. For example, your function should transform the string “Do or do not, there is no try.” to “try. no is there not, do or Do”. Assume that all words are space delimited and treat punctuation the same as
|
Just for variety, solve this problem in C, not Java or C#, and assume that you’re only dealing with ASCII characters that can be safely stored in byte arrays.
You probably already have a pretty good idea how you’re going to start this problem. Because you need to
The most obvious approach is to use your scanner to identify words, write these words into a temporary buffer, and then copy the buffer back over the original string. To reverse the order of the words, you will have to either scan the string backward to identify the words in reverse order or write the words into the buffer in reverse order (starting at the end of the buffer). It doesn’t matter which method you choose; the following discussion identifies the words in reverse order.
As always, consider the mechanics of how this works before you begin coding. First, you need to allocate a temporary buffer of the appropriate size. Next, you’ll enter the scanning loop, starting with the last character of the string. When you find a nonword character, you can write it directly to the buffer. When you find a word character, however, you can’t write it immediately to the temporary buffer. Because you’re scanning the string in reverse, the first word character you encounter is the last character of the word, so if you were to copy the characters in the order you find them, you’d write the characters within each word backward. Instead, you need to keep scanning until you find the first character of the word and then copy each character of the word in the correct, nonreversed order. When you’re copying the characters of a word, you need to identify the end of the word so that you know when to stop. You could do this by checking whether each character is a word character, but because you already know the position of the last character in the word, a better solution is to continue copying until you reach that position.
| Tip |
You may think you could avoid this complication by scanning the string forward and writing the words in reverse. However, you then have to solve a similar,
|
An example may help to clarify this. Suppose you are given the string “piglet quantum.” The first word character you encounter is ‘m’. If you copy the characters as you found them, you end up with the string “mutnauq telgip”, which is not nearly as good a
Finally, after you scan and copy the whole string, write a null character to terminate the string in the temporary buffer and call
Figure 6-1
It’s obviously important that your scanner stop when it gets to the first character of the string. Although this sounds simple, it can be easy to forget to check that the read position is still in the string,
Programmed in C, the algorithm described so far looks like the following:
bool reverseWords( char str[] ){ char *buffer; int tokenReadPos, wordReadPos, wordEnd, writePos = 0; /* Position of the last character is length - 1 */ tokenReadPos = strlen(str) - 1; buffer = (char *) malloc(tokenReadPos + 2); if( !buffer ) return false; /* reverseWords failed */ while( tokenReadPos >= 0 ){ if( str[tokenReadPos] == ' ' ){ /* Non-word characters */ /* Write character */ buffer[writePos++] = str[tokenReadPos--]; } else { /* Word characters */ /* Store position of end of word */ wordEnd = tokenReadPos; /* Scan to next non-word character */ while( tokenReadPos >= 0 && str[tokenReadPos] != ' ' ) tokenReadPos--; /* tokenReadPos went past the start of the word */ wordReadPos = tokenReadPos + 1; /* Copy the characters of the word */ while( wordReadPos <= wordEnd ){ buffer[writePos++] = str[wordReadPos++]; } } } /* null terminate buffer and copy over str */ buffer[writePos] = '\0'; strcpy(str, buffer); free(buffer); return true; /* ReverseWords successful */ }
The preceding token scanner-based implementation is the general-case solution for this type of problem. It is reasonably efficient, and its functionality could easily be extended. It is important that you are able to implement this type of solution, but the solution is not perfect. All the scanning backward, storing positions, and copying forward is somewhat lacking in algorithmic
Often, interview problems have obvious general solutions and less-obvious special-case solutions. The special-case solution may be less extensible than a general solution, but more efficient or elegant. Reversing the words of a string is such a problem. You have seen the general solution, but a special-case solution also exists. In an interview, you might have been steered away from the general solution before you got to coding it. The general solution is followed through to code because token scanning and string scanning are important techniques to
One way to improve an algorithm is to focus on a particular, concrete deficiency and try to remedy that. Because elegance, or lack thereof, is hard to quantify, you might try to eliminate the need for a temporary buffer from your algorithm. You can probably see that this is going to require a significantly different algorithm. You can’t simply alter the preceding approach to write to the same string it reads from - by the time you get halfway through you will have overwritten the rest of the data you need to read.
Rather than focus on what you can’t do without a buffer, you should
Now you just have to design an in-place reverse string function. The only trick is to remember that there’s no one-statement method of exchanging two values in C - you have to use a temporary variable and three assignments. Your reverse string function should take a string, a start index, and an end index as arguments. Begin by exchanging the character at the start index with the character at the end index, and then increment the start index and decrement the end index. Continue like this until the start and end index meet in the middle (in a string with odd length) or end is less than start (in a string with even length) - put more succinctly, continue while end is greater than start.
In C, these functions would look like the following:
void reverseWords( char str[] ){ int start = 0, end = 0, length; length = strlen(str); /* Reverse entire string */ reverseString(str, start, length - 1); while( end < length ){ if( str[end] != ' ' ){ /* Skip non-word characters */ /* Save position of beginning of word */ start = end; /* Scan to next non-word character */ while( end < length && str[end] != ' ' ) end++; /* Back up to end of word */ end--; /* Reverse word */ reverseString( str, start, end ); } end++; /* Advance to next token */ } } void reverseString( char str[], int start, int end ){ char temp; while( end > start ){ /* Exchange characters */ temp = str[start]; str[start] = str[end]; str[end] = temp; /* Move indices towards middle */ start++; end--; } }
This solution does not need a temporary buffer and is considerably more elegant than the previous solution. It’s also more efficient, mostly because it doesn’t suffer from dynamic memory overhead and doesn’t need to copy a result back from a temporary buffer.
| Important |
Write two conversion routines. The first routine converts a string to a signed integer. You may assume that the string only contains digits and the minus character (‘-’), that it is a properly formatted integer number, and that the number is within the range of an int type. The second routine converts a signed integer stored as an int back to a string. |
Every language has library routines to do these conversions. For example, in C# the
Convert.ToInt32()
and
Convert.ToString()
You can start with the string-to-integer routine, which is passed a valid string representation of an integer. Think about what that gives you to work with. Suppose you were given ‘137’. You would have a three-character string with the character encoding for ‘1’ at position 0, ‘3’ at position 1, and ‘7’ at position 2. Recall from grade school that the 1 represents 100 because it is in the hundreds place, the 3 represents 30 because it is in the tens place, and the 7 is just 7 because it is in the ones place. Summing these values gives the complete number: 100 + 30 + 7 = 137.
This gives you a framework for dissecting the string representation and building it back into a single integer value. You need to determine the numeric (integer) value of the digit represented by each character, multiply that value by the appropriate place value, and then sum these products.
Consider the character-to-numeric-value conversion first. What do you know about the values of digit characters? In all common character encodings the values are sequential: ‘0’ has a value one less than ‘1’, which in turn is followed by ‘2’, ‘3’, and so on (of course, if you didn’t know this, you’d have to ask the interviewer). Therefore, the value of a digit character is equal to the digit plus the value of 0 (note that ‘the value of 0’ is the nonzero code number representing the character ‘0’, not the number zero). This means you subtract the value of ‘0’ from a digit character to find the numeric value of the digit. You don’t even have to know what the value of ‘0’ is, just say -’0’ , which the compiler will interpret as “subtract the value of ‘0’.”
Next you need to know what place value each digit must be multiplied by. Working through the digits left to right seems
Perhaps the alternative of working through the characters left to right was too hastily dismissed. Is there a way you could get around the problem of not knowing the place value for a digit until you’ve scanned the whole string? Returning to the example of “367”, when you encounter the first character, ‘3’, you register a value of 3. If the next character were the end of the string, the number’s value would be 3. However, you encounter ‘6’ as the next character of the string. Now the ‘3’ represents 30 and the 6 represents ‘6’. On the next iteration, you read the last character, ‘7,’ so the ‘3’ represents 300, the ‘6’ represents 60 and the ‘7’ represents 7. In summary, the value of the number you’ve scanned so far
Up to this point, the discussion has only touched on positive numbers. How can you expand your strategy to include negative
In summary, the algorithm is as
Start number at 0 If the first character is '-' Set the negative flag Start scanning with the next character For each character in the string Multiply number by 10 Add (digit character – '0') to number Return number
Coding this in C# results in the following:
int strToInt( string str ){ int i = 0, num = 0; bool isNeg = false; int len = str.Length(); if( str[0] == '-' ){ isNeg = true; i = 1; } while( i < len ){ num *= 10; num += ( str[i++] - '0' ); } if( isNeg ) num *= -1; return num; }
Before you declare this function finished, check it for cases that may be problematic. At minimum, you should check –1, 0, and 1, so you’ve checked a positive value, a negative value, and a value that’s neither positive nor negative. You should also check a multidigit value like 324 to ensure that the loop has no problems. The function appears to work properly for these cases, so you can move on to
intToStr
, its
In
intToStr
, you will be performing the inverse of the conversion you did in
strToInt
. Given this, much of what you
Before you can convert values to characters, you need to know what those values are. Consider how you might do this. Suppose you have the number 732. Looking at this number’s decimal representation on paper, it seems a simple matter to identify the digit values 7, 3, and 2. However, you must remember that the computer isn’t using a decimal representation, but rather the binary representation 1011011100.
Because you can’t select decimal digits directly from a binary number, you’ll have to calculate the value of each digit. It seems logical to try to find the digit values either left to right or right to left.
Try left to right first. Integer dividing 732 by the place value (100) gives the first digit, 7. However, now if you integer divide by the next place value (10), you get 73, not 3. It looks as if you need to subtract the hundreds value you found before moving on. Starting over with this new process gives you the following:
732 ÷ 100 = 7 (first digit); 732 – 7 _ 100 = 32 32 ÷ 10 = 3 (second digit); 32 – 3 _ 10 = 2 2 ÷ 1 = 2 (third digit)
To implement this algorithm, you’re going to have to find the place value of the first digit and divide the place value by 10 for each new digit. This algorithm seems
Starting again with 732, what arithmetic operation can you perform to yield 2, the rightmost digit? 732 modulo 10 will give you 2. Now how can you get the next digit? 732 modulo 100 gives you 32. You could integer divide this by 10 to get the second digit, 3, but now you have to track two separate place values.
| Tip |
Modulo gives the remainder of an integer division. In C/C++ and C-like languages such as Java and C#, the modulo operator is %. |
What if you did the integer divide before the modulo? Then you’d have 732 integer divide by 10 is 73; 73 modulo 10 is 3. Repeating this for the third digit you have 73 / 10 = 7; 7 % 10 = 7. This seems like an easier solution - you don’t even have to track place values, you just divide and modulo until there’s nothing left.
The major downside of this approach is that you find the digits in reverse order. Because you don’t know how many there will be until you’ve found them all, you don’t know where in the string to begin writing. You could run through the calculations twice - once to find the number of digits so you know where to start writing them and again to actually write the digits - but this seems
Again, negative numbers have been ignored so far. Unfortunately, the modulo of a negative number is not handled consistently across different languages. The easiest way around this problem is to avoid the problem entirely. In
strToInt
, you treated the number as if it were positive and then made an adjustment at the end if it were, in fact, negative. How might you
You’ve
If number less than zero: Multiply number by –1 Set negative flag While number not equal to 0 Add '0' to number % 10 and write this to temp buffer Integer divide number by 10 If negative flag is set Write '-' into next position in temp buffer Write characters in temp buffer into output string in reverse order:
Rendering this in Java might give the following:
public static final int MAX_DIGITS = 10; String intToStr( int num ){ int i = 0; boolean isNeg = false; /* Buffer big enough for largest int and - sign */ char[] temp = new char[ MAX_DIGITS + 1 ]; /* Check to see if the number is negative */ if( num < 0 ){ num *= -1; isNeg = true; } /* Fill buffer with digit characters in reverse order */ while( num != 0 ){ temp[i++] = (char)((num % 10) + '0'); num /= 10; } StringBuffer b = new StringBuffer(); if( isNeg ) b.append( '-' ); while( i > 0 ){ b.append( temp[--i] ); } return b.toString(); }
Again, check the same
public static final int MAX_DIGITS = 10; String intToStr( int num ){ int i = 0; boolean isNeg = false; /* Buffer big enough for largest int and - sign */ char[] temp = new char[ MAX_DIGITS + 1 ]; /* Check to see if the number is negative */ if( num < 0 ){ num *= -1; isNeg = true; } /* Fill buffer with digit characters in reverse order */
do {
temp[i++] = (char)((num % 10) + '0'); num /= 10;
} while( num != 0 );
StringBuffer b = new StringBuffer(); if( isNeg ) b.append( '-' ); while( i > 0 ){ b.append( temp[--i] ); } return b.toString(); }