Array and String Problems


Many string problems involve operations that require converting the string to an array and back for efficient processing of the string.

Find the First Nonrepeated Character

Important 

Write an efficient function to find the first nonrepeated character in a string. For instance, the first nonrepeated character in “total” is ‘o’ and the first nonrepeated character in “teeter” is ‘r’. Discuss the efficiency of your algorithm.

At first, this task seems almost trivial. If a character is repeated, it must appear in at least two places in the string. Therefore, you can determine whether a particular character is repeated by comparing it with all other characters in the string. It’s a simple matter to perform this search for each character in the string, starting with the first. When you find a character that has no match elsewhere in the string you’ve found the first nonrepeated character.

What’s the time order of this solution? If the string is n characters long, then in the worst case you’ll make almost n comparisons for each of the n characters. That gives worst case O(n2) for this algorithm. You are unlikely to encounter the worst case for single-word strings, but for longer strings, such as a paragraph of text, it’s likely that most characters would be repeated, and the most common case might be close to the worst case. The ease with which you arrived at this solution suggests that there are better alternatives - if the answer were truly this trivial, the interviewer wouldn’t bother you with the problem. There must be an algorithm with a worst case better than O(n2).

Tip 

The algorithm described can be improved somewhat by comparing each character with only the characters following it because it has already been compared with the characters preceding it. This would give you a total of (n – 1) + (n – 2)+ + 1 comparisons. As discussed in Chapter 3, this is still O(n2).

Why was the previous algorithm O(n2)? One factor of n came from checking each character in the string to determine whether it was nonrepeated. Because the nonrepeated character could be anywhere in the string, it seems unlikely that you’ll be able to improve efficiency here. The other factor of n was due to searching the entire string when trying to look up matches for each character. If you improve the efficiency of this search, you’ll improve the efficiency of the overall algorithm. The easiest way to improve search efficiency on a set of data is to put it in a data structure that allows more efficient searching. What data structures can be searched more efficiently than O(n)? Binary trees can be searched in O(log(n)). Arrays and hash tables both have constant time element lookup. (Hash tables have worst-case lookup of O(n) but the average case is O(1).) Begin by trying to take advantage of an array or hash table because these data structures offer the greatest potential for improvement.

You’ll want to be able to quickly determine whether a character is repeated, so you need to be able to search the data structure by character. This means you have to use the character as the index (in an array) or key (in a hash table). What values would you store in these data structures? A nonrepeated character appears only once in the string, so if you stored the number of times each character appeared, it would help you identify nonrepeating characters. You’ll have to scan the entire string before you have the final counts for each character.

Tip 

You can convert a character to an integer in order to use it as an index. If the strings are restricted to ASCII characters, this gives you 128 different possible character values. Unicode characters as used in Java or C#, however, have 65,536 possible values.

Once you’ve completed this, you could scan through all the count values in the array or hash table looking for a 1. That would find a nonrepeated character, but it wouldn’t necessarily be the first one in the original string.

Therefore, you need to search your count values in the order of the characters in the original string. This isn’t difficult - you just look up the count value for each character until you find a 1. When you find a 1, you’ve located the first nonrepeated character.

Consider whether this new algorithm is actually an improvement. You will always have to go through the entire string to build the count data structure. In the worst case, you might have to look up the count value for each character in the string to find the first nonrepeated character. Because the operations on the array or hash you’re using to hold the counts are constant time, the worst case would be two operations for each character in the string, giving 2n, which is O(n) - a major improvement over the previous attempt.

Both hash tables and arrays provide constant-time lookup; you need to decide which one you will use. On the one hand, hash tables have a higher lookup overhead than arrays. On the other hand, an array would initially contain random values that you would have to take time to set to zero, whereas a hash table initially has no values. Perhaps the greatest difference is in memory requirements. An array would need an element for every possible value of a character. This would amount to a relatively reasonable 128 elements if you were processing ASCII strings, but if you had to process Unicode strings you would need more than 65,000 elements, assuming a 16-bit Unicode encoding. In contrast, a hash table would require storage for only the characters that actually exist in the input string. Therefore, arrays are a better choice for long strings with a limited set of possible character values; hash tables are more efficient for shorter strings or when there are many possible character values.

You could implement the solution either way. Assume the code may need to process Unicode strings (a safe bet these days) and choose the hash table implementation. You might choose to write the function in Java or C#, which have built-in support for both hash tables and Unicode. In outline form, the function you’ll be writing looks like this:

 First, build the character count hash table:     For each character         If no value is stored for the character, store 1         Otherwise, increment the value Second, scan the string:     For each character         Return character if count in hash table is 1     If no characters have count 1, return null

Now implement the function. Because you don’t know what class your function would be part of, implement it as a public static function (this is equivalent to a normal C function). Remember that the Java Hashtable stores references to Object, which means you can store the reference type Integer, but not the fundamental type int:

 public static Character firstNonRepeated( String str ){     Hashtable charHash = new Hashtable();     int i, length;     Character c;     Integer intgr;     length = str.length();     // Scan str, building hash table     for (i = 0; i < length; i++) {         c = new Character(str.charAt(i));         intgr = (Integer) charHash.get(c);         if (intgr == null) {             charHash.put(c, new Integer(1));         } else {             // Increment count corresponding to c             charHash.put(c, new Integer(intgr.intValue() + 1));         }     }     // Search hashtable in order of str     for (i = 0; i < length; i++) {         c = new Character(str.charAt(i));         if (((Integer)charHash.get(c)).intValue() == 1)             return c;     }     return null; }

The preceding code is actually quite inefficient from a memory standpoint, however. It allocates a lot of objects unnecessarily. The emphasis is not actually on how many times a character is repeated. You just want to know whether it occurs zero times, one time, or more than one time. Instead of storing integers in the hash table, why not just reserve two Object values for use as your “one time” and “more than one time” flags (with the null object meaning “zero times,” of course) and store those in the hash table. Here’s the simplified version:

 public static Character firstNonRepeated( String str ){     Hashtable charHash = new Hashtable();     int i, length;     Character c;     Object seenOnce = new Object();     Object seenTwice = new Object();     length = str.length();     // Scan str, building hash table     for( i = 0; i < length; i++ ){         c = new Character(str.charAt(i));         Object o = charHash.get(c);         if( o == null ){             charHash.put( c, seenOnce );         } else if( o == seenOnce ){             // Increment count corresponding to c             charHash.put( c, seenTwice );         }     }     // Search hashtable in order of str     for( i = 0; i < length; i++ ){         c = new Character(str.charAt(i));         if( charHash.get(c) == seenOnce ){             return c;         }      }     return null; }

A (significantly) further speedup could be achieved by implementing a faster char to Character mapping, possibly using an array to cache the mappings, or at least the most frequent mappings (such as for ASCII characters). Or use a hash table implementation that could directly store character char values.

Remove Specified Characters

Important 

Write an efficient function in C# that deletes characters from a string. Use the prototype

 string removeChars( string str, string remove );

where any character existing in remove must be deleted from str. For example, given a str of “Battle of the Vowels: Hawaii vs. Grozny” and a remove of “aeiou”, the function should transform str to “Bttl f th Vwls: Hw vs. Grzny”. Justify any design decisions you make and discuss the efficiency of your solution.

Because this problem involves string manipulation and the string class is immutable, you’ll need to either convert the string to a character array or use a StringBuilder for your manipulations. Using an array will save you a lot of overhead, so that’s what you should use.

This problem breaks down into two separate tasks. For each character in str, you must determine whether it should be deleted. Then, if appropriate, you must delete the character. The second task, deletion, is discussed first.

Your initial task is to delete an element from an array. An array is a contiguous block of memory, so you can’t simply remove an element from the middle as you might with a linked list. Instead, you’ll have to rearrange the data in the array so it remains a contiguous sequence of characters after the deletion. For example, if you want to delete “c” from the string “abcd” you could either shift “a” and “b” forward one position (toward the end) or shift “d” back one position (toward the beginning). Either approach leaves you with the characters “abd” in contiguous elements of the array.

In addition to shifting the data, you need to decrease the size of the string by one character. If you shift characters before the deletion forward, you need to eliminate the first element; if you shift the characters after the deletion backward you need to eliminate the last element. Because you’re already keeping track of the string’s length (strings are not null-terminated as they are in C), shifting characters backward and decrementing the string length is the cleanest choice.

How would the proposed algorithm fare in the worst-case scenario in which you need to delete all the characters in str? For each deletion, you would shift all the remaining characters back one position. If str were n characters long, you would move the last character n – 1 times, the next to last n – 2 times, and so on, giving worst-case O(n2) for the deletion. Moving the same characters many times seems extremely inefficient. How might you avoid this?

Tip 

If you start at the end of the string and work back toward the beginning, it’s somewhat more efficient but still O(n2) in the worst case.

What if you allocated a temporary string buffer and built your modified string there instead of in place? Then you could simply copy the characters you need to keep into the temporary string, skipping the characters you want to delete. When you finish building the modified string, you can copy it from the temporary buffer back into str. This way, you move each character at most twice, giving O(n) deletion. However, you’ve incurred the memory overhead of a temporary buffer the same size as the original string, and the time overhead of copying the modified string back over the original string. Is there any way you could avoid these penalties while retaining your O(n) algorithm?

To implement the O(n) algorithm just described, you need to track a source position for the read location in the original string and a destination position for the write position in the temporary buffer. These positions both start at zero. The source position is incremented every time you read, and the destination position is incremented every time you write. In other words, when you copy a character you increment both positions, but when you delete a character you increment only the source position. This means the source position will always be the same as or ahead of the destination position. Once you read a character from the original string (that is, the source position has advanced past it), you no longer need that character - in fact, you’re just going to copy the modified string over it. Because the destination position in the original string is always a character you don’t need anymore, you can write directly into the original string, eliminating the temporary buffer entirely. This is still an O(n) algorithm, but without the memory and time overhead of the earlier version.

Now that you know how to delete characters, consider the task of deciding whether to delete a particular character. The easiest way to do this is to compare the character to each character in remove and delete it if it matches any of them. How efficient is this? If str is n characters long and remove is m characters long, then in the worst case you make m comparisons for each of n characters, so the algorithm is O(nm). You can’t avoid checking each of the n characters in str, but perhaps you can make the lookup that determines whether a given character is in remove better than O(m).

If you’ve already read the solution to the section “Find the First Nonrepeated Character,” this should sound very familiar. Just as you did in that problem, you can use remove to build an array or hash table that has constant time lookup, thus giving an O(n) solution. The trade-offs between hashes and arrays have been discussed. In this case, an array is most appropriate when str and remove are long and characters have relatively few possible values (for example, ASCII strings). A hash table may be a better choice when str and remove are short or characters have many possible values (for example, Unicode strings). This time, the assumption is that you’re processing long ASCII strings and using an array instead of a hash table.

Tip 

Why build an array? Isn’t remove already an array? Yes, it is, but it is an array of characters indexed by an arbitrary (that is, meaningless for this problem) position, requiring you to search through each element. The array that is referred to here would be an array of Boolean values indexed by all the possible values for a char. This enables you to determine whether a character is in remove by checking a single element.

Your function will have three parts:

  1. Set all the elements in your lookup array to false.

  2. Iterate through each character in remove, setting the corresponding value in the lookup array to true.

  3. Iterate through str with a source and destination index, copying each character only if its corresponding value in the lookup array is false.

Now that you’ve combined both subtasks into a single algorithm, analyze the overall efficiency for str of length n and remove of length m. Because the size of a character is fixed for a given platform, zeroing the array is constant time. You perform a constant time assignment for each character in remove, so building the lookup array is O(m). Finally, you do at most one constant time lookup and one constant time copy for each character in str, giving O(n) for this stage. Summing these parts yields O(n + m), so the algorithm has linear running time.

Having justified and analyzed your solution, you’re ready to code it:

 stri ng removeChars( string str, string remove ){     char[] s = str.toCharArray();     char[] r = remove.toCharArray();     bool[] flags = new bool[128]; // assumes ASCII!     int    len = s.Length;     int    src, dst;     // Set flags for characters to be removed     for( src = 0; src < len; ++src ){         flags[r[src]] = true;     }     src = 0;     dst = 0;     // Now loop through all the characters,     // copying only if they aren't flagged     while( src < len ){         if( !flags[ (int)s[src] ] ){             s[dst++] = s[src];         }         ++src;     }     return new string( s, 0, dst ); }

Reverse Words

Important 

Write a function that reverses the order of the words in a string. For example, your function should transform the string “Do or do not, there is no try.” to “try. no is there not, do or Do”. Assume that all words are space delimited and treat punctuation the same as letters.

Just for variety, solve this problem in C, not Java or C#, and assume that you’re only dealing with ASCII characters that can be safely stored in byte arrays.

You probably already have a pretty good idea how you’re going to start this problem. Because you need to operate on words, you have to be able to recognize where words start and end. You can do this with a simple token scanner that iterates through each character of the string. Based on the definition given in the problem statement, your scanner will differentiate between nonword characters - namely, the space character - and word characters, which for this problem are all characters except space. A word begins, not surprisingly, with a word character and ends at the next nonword character or the end of the string.

The most obvious approach is to use your scanner to identify words, write these words into a temporary buffer, and then copy the buffer back over the original string. To reverse the order of the words, you will have to either scan the string backward to identify the words in reverse order or write the words into the buffer in reverse order (starting at the end of the buffer). It doesn’t matter which method you choose; the following discussion identifies the words in reverse order.

As always, consider the mechanics of how this works before you begin coding. First, you need to allocate a temporary buffer of the appropriate size. Next, you’ll enter the scanning loop, starting with the last character of the string. When you find a nonword character, you can write it directly to the buffer. When you find a word character, however, you can’t write it immediately to the temporary buffer. Because you’re scanning the string in reverse, the first word character you encounter is the last character of the word, so if you were to copy the characters in the order you find them, you’d write the characters within each word backward. Instead, you need to keep scanning until you find the first character of the word and then copy each character of the word in the correct, nonreversed order. When you’re copying the characters of a word, you need to identify the end of the word so that you know when to stop. You could do this by checking whether each character is a word character, but because you already know the position of the last character in the word, a better solution is to continue copying until you reach that position.

Tip 

You may think you could avoid this complication by scanning the string forward and writing the words in reverse. However, you then have to solve a similar, related problem of calculating the start position of each word when writing to the temporary buffer.

An example may help to clarify this. Suppose you are given the string “piglet quantum.” The first word character you encounter is ‘m’. If you copy the characters as you found them, you end up with the string “mutnauq telgip”, which is not nearly as good a name for a techno group as the string you were supposed to produce, “quantum piglet”. To get “quantum piglet” from “piglet quantum” you need to scan until you get to ‘q’ and then copy the letters in the word in the forward direction until you get back to ‘m’ at position 13. Next, copy the space character immediately because it’s a nonword character. Then, just as for “quantum”, you would recognize the character ‘t’ as a word character, store position 5 as the end of the word, scan backward to ‘p’, and finally write the characters of “piglet” until you got to position 5.

Finally, after you scan and copy the whole string, write a null character to terminate the string in the temporary buffer and call strcpy to copy the buffer back over the original string. Then you can deallocate the temporary buffer and return from the function. This process is illustrated graphically in Figure 6-1.

image from book
Figure 6-1

It’s obviously important that your scanner stop when it gets to the first character of the string. Although this sounds simple, it can be easy to forget to check that the read position is still in the string, especially when the read position is changed at more than one place in your code. In this function, you move the read position in the main token scanning loop to get to the next token and in the word scanning loop to get to the next character of the word. Make sure neither loop runs past the beginning of the string.

Programmed in C, the algorithm described so far looks like the following:

 bool reverseWords( char str[] ){     char *buffer;     int tokenReadPos, wordReadPos, wordEnd, writePos = 0;     /* Position of the last character is length - 1 */     tokenReadPos = strlen(str) - 1;     buffer = (char *) malloc(tokenReadPos + 2);     if( !buffer )         return false; /* reverseWords failed */     while( tokenReadPos >= 0 ){         if( str[tokenReadPos] == ' ' ){ /* Non-word characters */             /* Write character */             buffer[writePos++] = str[tokenReadPos--];         } else {  /* Word characters */             /* Store position of end of word */             wordEnd = tokenReadPos;             /* Scan to next non-word character */             while( tokenReadPos >= 0 && str[tokenReadPos] != ' ' )                 tokenReadPos--;             /* tokenReadPos went past the start of the word */             wordReadPos = tokenReadPos + 1;            /* Copy the characters of the word */            while( wordReadPos <= wordEnd ){                buffer[writePos++] = str[wordReadPos++];            }         }     }     /* null terminate buffer and copy over str */     buffer[writePos] = '\0';     strcpy(str, buffer);     free(buffer);     return true; /* ReverseWords successful */ }

The preceding token scanner-based implementation is the general-case solution for this type of problem. It is reasonably efficient, and its functionality could easily be extended. It is important that you are able to implement this type of solution, but the solution is not perfect. All the scanning backward, storing positions, and copying forward is somewhat lacking in algorithmic elegance. The need for a temporary buffer is also less than desirable.

Often, interview problems have obvious general solutions and less-obvious special-case solutions. The special-case solution may be less extensible than a general solution, but more efficient or elegant. Reversing the words of a string is such a problem. You have seen the general solution, but a special-case solution also exists. In an interview, you might have been steered away from the general solution before you got to coding it. The general solution is followed through to code because token scanning and string scanning are important techniques to illustrate.

One way to improve an algorithm is to focus on a particular, concrete deficiency and try to remedy that. Because elegance, or lack thereof, is hard to quantify, you might try to eliminate the need for a temporary buffer from your algorithm. You can probably see that this is going to require a significantly different algorithm. You can’t simply alter the preceding approach to write to the same string it reads from - by the time you get halfway through you will have overwritten the rest of the data you need to read.

Rather than focus on what you can’t do without a buffer, you should turn your attention to what you can do. It is possible to reverse an entire string in place by exchanging characters. Try an example to see whether this might be helpful: “in search of algorithmic elegance” would become “ecnagele cimhtirogla fo hcraes ni.” Look at that! The words are in exactly the order you need them, but the characters in the words are backward. All you have to do is reverse each word in the reversed string. You can do that by locating the beginning and end of each word using a scanner similar to the one used in the preceding implementation and calling a reverse function on each word substring.

Now you just have to design an in-place reverse string function. The only trick is to remember that there’s no one-statement method of exchanging two values in C - you have to use a temporary variable and three assignments. Your reverse string function should take a string, a start index, and an end index as arguments. Begin by exchanging the character at the start index with the character at the end index, and then increment the start index and decrement the end index. Continue like this until the start and end index meet in the middle (in a string with odd length) or end is less than start (in a string with even length) - put more succinctly, continue while end is greater than start.

In C, these functions would look like the following:

 void reverseWords( char str[] ){     int start = 0, end = 0, length;     length = strlen(str);     /* Reverse entire string */     reverseString(str, start, length - 1);     while( end < length ){        if( str[end] != ' ' ){ /* Skip non-word characters */            /* Save position of beginning of word */            start = end;            /* Scan to next non-word character */            while( end < length && str[end] != ' ' )                end++;             /* Back up to end of word */             end--;             /* Reverse word */             reverseString( str, start, end );         }         end++; /* Advance to next token */     } } void reverseString( char str[], int start, int end ){     char temp;     while( end > start ){         /* Exchange characters */         temp = str[start];         str[start] = str[end];         str[end] = temp;         /* Move indices towards middle */         start++; end--;     } }

This solution does not need a temporary buffer and is considerably more elegant than the previous solution. It’s also more efficient, mostly because it doesn’t suffer from dynamic memory overhead and doesn’t need to copy a result back from a temporary buffer.

Integer/String Conversions

Important 

Write two conversion routines. The first routine converts a string to a signed integer. You may assume that the string only contains digits and the minus character (‘-’), that it is a properly formatted integer number, and that the number is within the range of an int type. The second routine converts a signed integer stored as an int back to a string.

Every language has library routines to do these conversions. For example, in C# the Convert.ToInt32() and Convert.ToString() methods are available. Java uses the Integer.parseInt() and Integer.toString() methods. You should mention to the interviewer that under normal circumstances, you know better than to duplicate functionality provided by standard libraries. This doesn’t get you off the hook - you still have to implement the functions called for by the problem.

From String to Integer

You can start with the string-to-integer routine, which is passed a valid string representation of an integer. Think about what that gives you to work with. Suppose you were given ‘137’. You would have a three-character string with the character encoding for ‘1’ at position 0, ‘3’ at position 1, and ‘7’ at position 2. Recall from grade school that the 1 represents 100 because it is in the hundreds place, the 3 represents 30 because it is in the tens place, and the 7 is just 7 because it is in the ones place. Summing these values gives the complete number: 100 + 30 + 7 = 137.

This gives you a framework for dissecting the string representation and building it back into a single integer value. You need to determine the numeric (integer) value of the digit represented by each character, multiply that value by the appropriate place value, and then sum these products.

Consider the character-to-numeric-value conversion first. What do you know about the values of digit characters? In all common character encodings the values are sequential: ‘0’ has a value one less than ‘1’, which in turn is followed by ‘2’, ‘3’, and so on (of course, if you didn’t know this, you’d have to ask the interviewer). Therefore, the value of a digit character is equal to the digit plus the value of 0 (note that ‘the value of 0’ is the nonzero code number representing the character ‘0’, not the number zero). This means you subtract the value of ‘0’ from a digit character to find the numeric value of the digit. You don’t even have to know what the value of ‘0’ is, just say -’0’, which the compiler will interpret as “subtract the value of ‘0’.”

Next you need to know what place value each digit must be multiplied by. Working through the digits left to right seems problematic because you don’t know what the place value of the first digit is until you know how long the number is. For example, the first character of “367” is identical to that of “31”, although it represents 300 in the first case and 30 in the second case. The most obvious solution is to scan the digits from right to left because the rightmost position is always the ones place, the next to rightmost is always the tens, and so on. This enables you to start at the right end of the string with a place value of 1 and work backward through the string, multiplying the place value by 10 each time you move to a new place. This method, however, requires two multiplications per iteration, one for multiplying the digit by the place value and another for increasing the place value. That seems a little inefficient.

Perhaps the alternative of working through the characters left to right was too hastily dismissed. Is there a way you could get around the problem of not knowing the place value for a digit until you’ve scanned the whole string? Returning to the example of “367”, when you encounter the first character, ‘3’, you register a value of 3. If the next character were the end of the string, the number’s value would be 3. However, you encounter ‘6’ as the next character of the string. Now the ‘3’ represents 30 and the 6 represents ‘6’. On the next iteration, you read the last character, ‘7,’ so the ‘3’ represents 300, the ‘6’ represents 60 and the ‘7’ represents 7. In summary, the value of the number you’ve scanned so far increases by a factor of 10 every time you encounter a new character. It really doesn’t matter that you don’t initially know whether the ‘3’ represents 3, 30, or 30,000 - every time you find a new digit you just multiply the value you’ve already read by 10 and add the value of the new digit. You’re no longer tracking a place value, so this algorithm saves you a multiplication on each iteration. The optimization described in this algorithm is frequently useful in computing checksums and is considered clever enough to merit a name: Horner’s Rule.

Up to this point, the discussion has only touched on positive numbers. How can you expand your strategy to include negative numbers? A negative number will have a ‘-’ character in the first position. You’ll want to skip over the ‘-’ character so you don’t interpret it as a digit. After you’ve scanned all the digits and built the number, you’ll need to change the number’s sign so that it’s negative. You can change the sign by multiplying by –1. You have to check for the ‘-’ character before you scan the digits so you know whether to skip the first character, but you can’t multiply by negative 1 until after you’ve scanned all the digits. One way around this problem is to set a flag if you find the ‘-’ character and then multiply by –1 only if the flag is set.

In summary, the algorithm is as follows:

 Start number at 0 If the first character is '-'     Set the negative flag     Start scanning with the next character For each character in the string     Multiply number by 10     Add (digit character – '0') to number Return number

Coding this in C# results in the following:

 int strToInt( string str ){     int i = 0, num = 0;     bool isNeg = false;     int len = str.Length();     if( str[0] == '-' ){         isNeg = true;         i = 1;     }     while( i < len ){         num *= 10;         num += ( str[i++] - '0' );     }     if( isNeg )         num *= -1;     return num; }

Before you declare this function finished, check it for cases that may be problematic. At minimum, you should check –1, 0, and 1, so you’ve checked a positive value, a negative value, and a value that’s neither positive nor negative. You should also check a multidigit value like 324 to ensure that the loop has no problems. The function appears to work properly for these cases, so you can move on to intToStr, its opposite.

From Integer to String

In intToStr, you will be performing the inverse of the conversion you did in strToInt. Given this, much of what you discovered in writing strToInt should be of use to you here. For example, just as you converted digits to integer values by subtracting ‘0’ from each digit, you can convert integer values back to digits by adding ‘0’ to each digit.

Before you can convert values to characters, you need to know what those values are. Consider how you might do this. Suppose you have the number 732. Looking at this number’s decimal representation on paper, it seems a simple matter to identify the digit values 7, 3, and 2. However, you must remember that the computer isn’t using a decimal representation, but rather the binary representation 1011011100.

Because you can’t select decimal digits directly from a binary number, you’ll have to calculate the value of each digit. It seems logical to try to find the digit values either left to right or right to left.

Try left to right first. Integer dividing 732 by the place value (100) gives the first digit, 7. However, now if you integer divide by the next place value (10), you get 73, not 3. It looks as if you need to subtract the hundreds value you found before moving on. Starting over with this new process gives you the following:

 732 ÷ 100 = 7 (first digit); 732 – 7 _ 100 = 32 32 ÷ 10 = 3 (second digit); 32 – 3 _ 10 = 2 2 ÷ 1 = 2 (third digit)

To implement this algorithm, you’re going to have to find the place value of the first digit and divide the place value by 10 for each new digit. This algorithm seems workable but complicated. What about working right to left?

Starting again with 732, what arithmetic operation can you perform to yield 2, the rightmost digit? 732 modulo 10 will give you 2. Now how can you get the next digit? 732 modulo 100 gives you 32. You could integer divide this by 10 to get the second digit, 3, but now you have to track two separate place values.

Tip 

Modulo gives the remainder of an integer division. In C/C++ and C-like languages such as Java and C#, the modulo operator is %.

What if you did the integer divide before the modulo? Then you’d have 732 integer divide by 10 is 73; 73 modulo 10 is 3. Repeating this for the third digit you have 73 / 10 = 7; 7 % 10 = 7. This seems like an easier solution - you don’t even have to track place values, you just divide and modulo until there’s nothing left.

The major downside of this approach is that you find the digits in reverse order. Because you don’t know how many there will be until you’ve found them all, you don’t know where in the string to begin writing. You could run through the calculations twice - once to find the number of digits so you know where to start writing them and again to actually write the digits - but this seems wasteful. Perhaps a better solution is to write the digits out backward as you discover them and then reverse them into the proper order when you’re done. Because the largest possible value of an integer yields a relatively short string, you could write the digits into a temporary buffer and then reverse them into the final string.

Again, negative numbers have been ignored so far. Unfortunately, the modulo of a negative number is not handled consistently across different languages. The easiest way around this problem is to avoid the problem entirely. In strToInt, you treated the number as if it were positive and then made an adjustment at the end if it were, in fact, negative. How might you employ this type of strategy here? You could start by multiplying the number by –1 if it were negative. Then it would be positive, so treating it as a positive number wouldn’t be a problem. The only wrinkle would be that you’d need to write a ‘-’ if the number had originally been negative, but that isn’t difficult - just set a flag indicating that the number is negative when you multiply by –1.

You’ve solved all the important subproblems in intToStr - now assemble these solutions into an outline you can use to write your code.

 If number less than zero:      Multiply number by –1      Set negative flag While number not equal to 0      Add '0' to number % 10 and write this to temp buffer      Integer divide number by 10 If negative flag is set      Write '-' into next position in temp buffer Write characters in temp buffer into output string in reverse order:

Rendering this in Java might give the following:

 public static final int MAX_DIGITS = 10; String intToStr( int num ){     int i = 0;     boolean isNeg = false;     /* Buffer big enough for largest int and - sign */     char[] temp = new char[ MAX_DIGITS + 1 ];     /* Check to see if the number is negative */     if( num < 0 ){         num *= -1;         isNeg = true;     }     /* Fill buffer with digit characters in reverse order */     while( num != 0 ){         temp[i++] = (char)((num % 10) + '0');         num /= 10;     }     StringBuffer b = new StringBuffer();     if( isNeg )         b.append( '-' );     while( i > 0 ){         b.append( temp[--i] );     }     return b.toString(); }

Again, check the same potentially problematic cases you tried for strToInt (multidigit, –1, 0, and 1). Multidigit numbers, –1, and 1 cause no problems, but if num is 0 you never go through the body of the while loop. This causes the function to write an empty string instead of “0.” How can you fix this bug? You need to go through the body of the while loop at least once, so that you write a ‘0’ even if num starts at 0. You can ensure that the body of the loop is executed at least once by changing it from a while loop to a do...while loop. This fix yields the following code, which can handle converting 0 as well as positive and negative values to strings:

 public static final int MAX_DIGITS = 10; String intToStr( int num ){     int i = 0;     boolean isNeg = false;     /* Buffer big enough for largest int and - sign */     char[] temp = new char[ MAX_DIGITS + 1 ];     /* Check to see if the number is negative */     if( num < 0 ){         num *= -1;         isNeg = true;     }     /* Fill buffer with digit characters in reverse order */     do {         temp[i++] = (char)((num % 10) + '0');         num /= 10;     } while( num != 0 );     StringBuffer b = new StringBuffer();     if( isNeg )         b.append( '-' );     while( i > 0 ){         b.append( temp[--i] );     }     return b.toString(); }




Programming Interviews Exposed. Secrets to Landing Your Next Job
Programming Interviews Exposed: Secrets to Landing Your Next Job, 2nd Edition (Programmer to Programmer)
ISBN: 047012167X
EAN: 2147483647
Year: 2007
Pages: 94

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net