Chapter 6: Arrays and Strings | Programming Interviews Exposed: Secrets to Landing Your Next Job, 2nd Edition (Programmer to Programmer)

Arrays and strings are closely related. In the abstract sense, a string is really just an array (possibly read-only) of characters. Most of the string-manipulation problems you’ll encounter are therefore based on your understanding of array data types, particularly in languages such as C and C++ in which strings and character arrays are essentially identical. Although other languages - especially the object-oriented ones such as C# and Java - consider strings and character arrays to be separate, there’s always a way to convert a string to an array and vice versa. When the two are different, however, it’s very important to understand where and why they diverge. In addition, not all array problems will involve strings, so understanding how arrays work in the abstract and how they’re implemented by the language you’re using is absolutely crucial to answering those array-focused problems.

Arrays

An array is a sequence of variables of the same type arranged contiguously in a block of memory. Because arrays play an important role in every major language used in commercial development, we assume you’re at least somewhat familiar with their syntax and usage. With that in mind, this discussion focuses on the theory and application of arrays.

Like a linked list, an array provides an essentially linear form of storage, but its properties are significantly different. In a linked list, lookup is always an O(n) operation, but array lookup is O(1) as long as you know the index of the element you want. The provision regarding the index is important - if you know only the value, lookup is still O(n) in the worst case. For example, suppose you have an array of characters. Locating the sixth character is O(1), but locating the character with value ‘w’ is O(n).

Tip

Of course, multidimensional arrays are not exactly linear, but they are implemented as linear arrays of linear arrays (of linear arrays … repeated as needed), so even multidimensional arrays are linear in each dimension.

The price for this improved lookup is significantly decreased efficiency in the insertion and deletion of data in the middle of the array. Because an array is essentially a block of contiguous memory, it’s not possible to create or eliminate storage between any two elements as it is with a linked list. Instead, you must physically move data within the array to make room for an insertion or to close the gap left by a deletion.

Arrays are not dynamic data structures: They have a finite, fixed number of elements. Memory must be allocated for every element in an array, even if only part of the array is used. Arrays are best used when you know how many elements you need to store before the program executes. When the program needs a variable amount of storage, the size of the array imposes an arbitrary limit on the amount of data that can be stored. Making the array large enough so that the program always operates below the limit doesn’t solve the problem: Either memory will be wasted or there won’t be enough memory to handle the largest data sizes possible.

Some languages do support dynamic arrays, which are arrays that can change size to efficiently store as much or as little data as necessary. (Note that in this case we’re referring to dynamic arrays as a language feature. Dynamic arrays can be simulated in code by other languages.) This discussion won’t go into the details of implementing a dynamic array, but it is important to know that most dynamic array implementations use static arrays internally. A static array cannot be resized, so dynamic arrays are resized by allocating a new array of the appropriate size, copying every element from the old array into the new array, and freeing the old array. This is an expensive operation that must be done as infrequently as possible. (Many strings are implemented as dynamic arrays of characters.)

Each language handles arrays somewhat differently, giving each language a different set of array programming pitfalls. Let’s take a look at array usage across several different languages.

C/C++

Despite the differences between C and C++, they are very similar in their treatment of arrays. In most cases, an array name is equivalent to a pointer constant to the first element of the array.^[1] This means that you can’t initialize the elements of one array with another array using a simple assignment.

Tip

Pointers and constants can be confusing concepts separately; they are often nearly incomprehensible in combination. When we say pointer constant we mean a pointer declared like char *const chrPtr that cannot be altered to point at a different place in memory, but that can be used to change the contents of the memory it points to. This is not the same as the more-commonly seen constant pointer, declared like const char *chrPtr, which can be changed to point at a different memory location but cannot be used to change the contents of a memory location. If you find this confusing, you’re certainly not the only one.

For example,

 arrayA = arrayB;  /* Compile error: arrayA is not an lvalue */

is interpreted as an attempt to make arrayA refer to the same area of memory as arrayB. If arrayA has been declared as an array, this causes a compile error because you can’t change the memory location to which arrayA refers. To copy arrayB into arrayA, you have to write a loop that does an element-by-element assignment or use a library function such as memcpy that does the copying for you (and usually much more efficiently).

In C and C++, the compiler tracks only the location of arrays, not their size. The programmer is responsible for tracking array sizes, and there is no bounds checking on array accesses - the language won’t complain if you store something in the twentieth element of a ten-element array. As you can imagine, writing outside of the bounds of an array will probably overwrite some other data structure, leading to all manner of curious and difficult-to-find bugs. Development tools are available to help programmers identify out-of-bounds array accesses and other memory-related problems in their C/C++ programs.

Java

Unlike a C array, a Java array is an object in and of itself, separate from the data type it holds. A reference to an array is therefore not interchangeable with a reference to an element of the array. As in C, arrays cannot be copied with a simple assignment: If two array references have the same type, assignment of one to the other is allowed, but it results in both symbols referring to the same array, as shown in the following example:

 byte[] arrayA = new byte[10]; byte[] arrayB = new byte[10]; arrayA = arrayB; // arrayA now refers to the same array as arrayB

Java arrays are static and the language tracks the size of each array, which can be accessed via the implicit length data member:

 if( arrayA.length <= arrayB.length ){     System.arraycopy( arrayA, 0, arrayB, 0, array.length ); }

Each access to an array index is checked against the current size of the array and an exception is thrown if the index is out of bounds. This makes array access a relatively expensive operation when compared to C/C++ arrays, especially with code like this:

 void parse( char[] arr ){     for( int i = 0; i < arr.length; ++i ){         int val;         if( arr[i] >= '0' && arr[i] <= '9' ){             val = arr[i] - '0';         } else if( arr[i] >= 'A' && arr[i] <= 'F' ){             val = arr[i] - 'A';         } else if( arr[i] >= 'a' && arr[i] <= 'f' ){             val = arr[i] - 'a';         } else {             // error         }         ..... // do something     } }

Multiple references to the same array element should be simplified by storing the value of the array ele-ment into a single variable:

 void fasterparse( char[] arr ){     for( int i = 0; i < arr.length; ++i ){         int  val;         char ch = arr[i];         if( ch >= '0' && ch <= '9' ){             val = ch - '0';         } else if( ch >= 'A' && ch <= 'F' ){             val = ch - 'A';         } else if( ch >= 'a' && ch <= 'f' ){             val = ch - 'a';         } else {             // error         }         ..... // do something     } }

When arrays are allocated, note that the elements in the array are not initialized, even if the array is declared to hold an object type. You must construct the objects yourself and assign them to the elements of the array:

 Button myButtons[] = new Button[5]; // Buttons not yet constructed for (int i = 0; i < 5; i++) {     myButtons[i] = new Button();  // Constructing Buttons } // All Buttons constructed

C#

C# arrays are similar to Java arrays, but there are some differences. The Java concept of a multidimensional array - an array of arrays such as int[2][3] - is called a jagged array in C#, and multidimensional arrays are specified using comma-separated arguments, as in int[2,3]. Arrays can also be declared to be read-only. All arrays also derive from the System.array abstract base class, which defines a number of useful methods for array manipulation.

JavaScript

Arrays in JavaScript are instances of the Array object. JavaScript arrays are completely dynamic and resize themselves automatically:

 Array cities = new Array(); // zero length array cities[0] = "New York"; cities[1] = "Los Angeles"; // now array is length 2

You can change the size of an array simply by modifying its length property:

 cities.length = 1; // drop Los Angeles... cities[ cities.length ] = "San Francisco"; // new cities[1] value

Methods on the Array object can be used to split, combine, and sort arrays.

^[1] For an excellent discussion of when this analogy breaks down, see Expert C Programming: Deep C Secrets by Peter Van Der Linden (Prentice Hall, 1994).