Native C++ Arrays

Native arrays are those provided as part of the C++ language. They’re based on the arrays that C++ inherits from C. Although native arrays are designed to be fast and efficient, there are drawbacks associated with using them, as you’ll see shortly.

This first exercise will introduce you to C++ native arrays by showing you how to create an array of value types and how to use the array.

Open Microsoft Visual Studio .NET, and create a new Visual C++ Console Application (.NET) project named Trad.

Open the source file Trad.cpp, and add the following code to the _tmain function:

```
int _tmain()
{
    Console::WriteLine(S"Traditional Arrays");

    // Create an array
    int arr[10];

    // Fill the array
    for(int i=0; i<10; i++)
        arr[i] = i*2;

    return 0;
}
```

The array is created by giving a type, a name, and a size enclosed in square brackets ([]). Here the array is named arr and it holds ten int values. All arrays are created using the same syntax, as shown here:

```
// Create an array of six doubles
double darr[6];

// Create an array of two char* pointers
char* parr[2];
```

Here’s the first important point about native arrays: once you’ve created an array, you can’t resize it, so you have to know how many elements you’re going to need before you start. If you don’t know how many elements you’re going to need, you might be better off using a .NET array, discussed later in this chapter.

Note

The array size has to be known at compile time, so, for example, you can’t ask the user for a value and then use that value to specify an array dimension at run time. However, it’s common to create constants, either with preprocessor #define directives or by declaring const int variables, and to use them to specify array dimensions.
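As a minimal sketch of the two kinds of constant mentioned in the note, the following standard C++ fragment dimensions one array with a #define and another with a const int. The names MAX_ITEMS, kRows, items, table, and fillItems are hypothetical, chosen only for illustration.

```cpp
// Preprocessor constant: the preprocessor replaces MAX_ITEMS with 10
// before the compiler sees the code.
#define MAX_ITEMS 10

// Typed constant: a const int initialized with a constant value can
// also be used as an array dimension.
const int kRows = 4;

int items[MAX_ITEMS];     // dimension taken from the #define
double table[kRows];      // dimension taken from the const int

// Uses the same constant as the loop bound, so changing MAX_ITEMS
// updates both the array size and the loop in one place.
int fillItems()
{
    for (int i = 0; i < MAX_ITEMS; i++)
        items[i] = i * 2;
    return items[MAX_ITEMS - 1];   // last valid element
}
```

Keeping the dimension in one named constant avoids the common bug of resizing an array but forgetting to update a loop that walks it.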

As you can see from the loop in the preceding code, array elements are accessed using square brackets that contain the index. Here’s the second important point about native arrays: indexing starts from zero rather than one, so the valid range of indices for an array is from zero to one less than the size of the array. In other words, for a 10-element array, valid indices are [0] to [9].

Add a second loop to print out the array’s contents after filling it.
```
// Print its contents
for(int j=0; j<10; j++)
    Console::WriteLine(arr[j]);
```
You should find that the values print, one to a line, as shown in the following figure.
What happens if you change the range of the second loop so that it tries to print the element at [10]? Alter the code in the second loop to look like this:
```
// Print its contents
for(int j=0; j<=10; j++)
    Console::WriteLine(arr[j]);
```
Notice the less than or equal to (<=) condition. The effect of this condition is to try to print 11 elements rather than 10. Compile and run the program, and you should see output similar to the following figure.

Notice the unpredictable value that’s been printed at the end of the list: it’s whatever happens to be stored in the memory just past the end of the array. Here’s the third important point about native arrays: bounds aren’t checked. Native arrays in C++ aren’t objects, and therefore they have no knowledge of how many elements they contain. It’s up to you to keep within the bounds of the array, and if you don’t, you risk corrupting data or crashing your program.
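Because the array itself can’t check its bounds, any checking has to be done by your own code. The following standard C++ sketch shows one way to do that; checkedGet is a hypothetical helper, not a library function, and it relies on the caller supplying the correct element count.

```cpp
#include <cstddef>

// Hypothetical helper: native arrays never check their bounds, so the
// caller passes the element count and this function checks the index.
// Returns true and stores the element in 'out' only if index is valid.
bool checkedGet(const int arr[], std::size_t size, std::size_t index, int& out)
{
    if (index >= size)
        return false;      // e.g. index 10 of a 10-element array
    out = arr[index];
    return true;
}
```

With the exercise’s 10-element array, asking for index 9 succeeds, while asking for index 10 returns false and leaves out untouched instead of reading past the end.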

Passing Arrays to Functions

Passing an array to a function introduces a complication, because an array has no knowledge of its own size or contents. As you’ll see shortly, when you pass an array to a function, you pass only its starting address, which means that you have to find some way of passing the size information along with the array when you call the function. Normally this is accomplished in one of two ways:

Pass the size as an explicit parameter to the function call.
Make sure that the array is always terminated by a unique marker value so that the function can tell when the end of the data has been reached.
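Both approaches can be sketched in standard C++ as follows. The function names are hypothetical, and the sentinel version assumes that -1 never occurs as real data; C-style strings use the same idea, with a null character marking the end of the text.

```cpp
#include <cstddef>

// Way 1: pass the size as an explicit parameter.
int sumWithSize(const int arr[], std::size_t size)
{
    int total = 0;
    for (std::size_t i = 0; i < size; i++)
        total += arr[i];
    return total;
}

// Way 2: terminate the data with a marker value. -1 is assumed
// never to appear as real data, so it can mark the end.
const int END_MARKER = -1;

int sumToMarker(const int arr[])
{
    int total = 0;
    for (std::size_t i = 0; arr[i] != END_MARKER; i++)
        total += arr[i];
    return total;
}
```

The marker approach costs one extra element and fails badly if the marker is missing, because the loop then runs off the end of the array.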

How Do Native Arrays Work?

A native array in C++ isn’t an object; it’s simply a collection of values strung together in memory. So a 10-element array of integers consists of 10 integers one after the other in memory. When the name of the array is used in an expression, it acts as a pointer to the first element, so when you declare an array like this

int foo[10];

you’re telling the compiler to reserve memory large enough to hold 10 integers and to make the starting address available through the name foo. When you access an array element, you’re actually specifying the offset from this address, so that foo[1] means “offset one int from the address foo, and use what is stored there.” This explains why array indexing starts from 0: an index of 0 denotes an offset of zero from the start address, so it means the first element.

Once the compiler has allocated the space, pretty much all it knows about an array is its starting address. When you provide an offset in terms of an array index, the compiler generates code to access that piece of memory. And if you’ve got it wrong and stepped outside the bounds of the array, you can end up reading or writing somewhere inappropriate. In fact, deliberately accessing outside the bounds of arrays has been the basis for many security attacks on programs and systems over the years.

To finish this brief explanation, note that there’s a close link between arrays and pointers—so close, in fact, that any array access can be written using pointer notation instead of array notation, as shown here:

```
// These two are equivalent
n = arr[3];
n = *(arr + 3);
```

In the second example, the compiler is being told to dereference the location in memory whose distance from address arr is the size of three int variables.
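A short standard C++ sketch can confirm this equivalence. The array values and, in particular, the function names below are illustrative only; byteOffset measures how far element i lies from the start of the array in bytes.

```cpp
#include <cstddef>

int values[5] = { 10, 20, 30, 40, 50 };

// Reads element i using array notation.
int byIndex(std::size_t i) { return values[i]; }

// Reads element i using pointer notation: values converts to a pointer
// to values[0], and adding i moves forward by i ints (i * sizeof(int) bytes).
int byPointer(std::size_t i) { return *(values + i); }

// Distance in bytes between element i and the start of the array.
std::ptrdiff_t byteOffset(std::size_t i)
{
    return reinterpret_cast<const char*>(&values[i]) -
           reinterpret_cast<const char*>(values);
}
```

For every valid index the two reads agree, and byteOffset(3) equals three times sizeof(int).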

Let’s investigate passing an array to a function.

Continue with the project from the previous exercise; reopen it if necessary.
Add the following function definition immediately after the using namespace System; line:
```
void func(int arr[], size_t size)
{
    for(size_t i=0; i<size; i++)
        Console::WriteLine(arr[i]);
}
```
The first parameter tells the compiler that the address of an array is going to be passed, which is equivalent to passing a pointer; it’s very common to see int* used instead. The second parameter passes the number of elements in the array, in effect telling the function how much memory the first parameter points to. The size_t type is an unsigned integer type (in 32-bit Visual C++ it’s a typedef for unsigned int), and it’s good practice to use it for parameters that denote sizes, lengths, or dimensions. The function prints out the array by using the size, just as before.
Call the function from the _tmain routine like this:
```
func(arr, 10);
```
What if the array size was changed at some point? You can make your code more robust by calculating the number of elements in the array automatically using the sizeof operator, like this:
```
func(arr, sizeof(arr)/sizeof(arr[0]));
```
The sizeof operator returns the size of its operand in bytes, where the operand can be a variable name or a type name (a type name must be enclosed in parentheses). Applied to an array, sizeof returns the total size of the array in bytes: 40 bytes in this case, because an int occupies 4 bytes in Visual C++. Dividing by the size of one element leaves the number of elements in the array.
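The sizeof calculation works only where the compiler can see the array’s declaration. Because passing an array passes only its starting address, inside a function such as func the parameter is really a pointer, and sizeof reports the size of the pointer instead. The following standard C++ sketch shows both cases; the names numbers, numbersCount, and sizeOfParameter are illustrative only, and many compilers warn about the second case.

```cpp
#include <cstddef>

int numbers[10];

// In the scope where the array is declared, sizeof sees the whole array:
// total bytes divided by bytes per element gives the element count.
const std::size_t numbersCount = sizeof(numbers) / sizeof(numbers[0]);

// Inside a function, an array parameter is really a pointer, so sizeof
// gives the size of the pointer, not the size of the caller's array.
std::size_t sizeOfParameter(int arr[])
{
    return sizeof(arr);   // same as sizeof(int*)
}
```

This is why the element count is calculated at the call site, as in the call to func shown above, rather than inside the function.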

Initializing Arrays

It’s possible to initialize arrays at the point of declaration, as shown in the following syntax fragment:

int arr[4] = { 1, 2, 3, 4 };

The values used for initialization are provided as a comma-separated list in braces ({}) after the equal sign; this list is known as an aggregate initializer. The compiler is clever enough to count the values in the list, and it will dimension the array to fit if you don’t provide a dimension:

```
// Dimension the array automatically
int arr[] = { 1, 2, 3, 4 };
```

If you give a dimension and then provide too many values, you’ll get a compiler error. If you provide too few values, the initial values you give will be used to initialize the array starting from element zero, and the remaining elements will be set to zero.
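A minimal standard C++ sketch of these rules follows; the names are hypothetical. The partial initialization is done on a local array deliberately, because an uninitialized local array would otherwise hold unpredictable values, so the zero-filled elements come from the initializer rule rather than from chance.

```cpp
// Too few initializers: elements 2 and 3 are set to zero.
int partialElement(int i)
{
    int partial[4] = { 7, 8 };
    return partial[i];
}

// No dimension given: the compiler counts four values.
int sized[] = { 1, 2, 3, 4 };

// Too many initializers would not compile:
// int tooMany[2] = { 1, 2, 3 };   // error: too many initializers
```

A consequence of the zero-fill rule is the common idiom int zeros[10] = { 0 };, which sets every element to zero.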

Multidimensional Arrays

Multidimensional arrays in C++ are an extension of the single-dimensional variety, in that a two-dimensional array is actually an array of single-dimensional arrays. In the same way, arrays of higher dimensions are all built out of single-dimensional arrays. The following short exercise shows how to create and use a two-dimensional array.

Open Visual Studio, and create a new Visual C++ Console Application (.NET) project named MultiD.
Open the source file MultiD.cpp, and add the following code to the _tmain function:
```
int _tmain()
{
    Console::WriteLine(S"Multidimensional Arrays");

    // Create a 2D array
    int arr[2][3];

    // Fill the array
    for(int i=0; i<2; i++)
        for(int j=0; j<3; j++)
            arr[i][j] = (i+1)*(j+1);

    return 0;
}
```
Note that a two-dimensional array is declared by using two sets of square brackets. You don’t put the two values inside one set of brackets, as you do in many other languages, and for higher order arrays, you simply add more sets of square brackets. As with single-dimensional arrays, you have to give the size at compile time, and the indices of each dimension vary from zero to one less than the declared size. Array elements are also accessed using two sets of square brackets.
Print out the array using an extension of the method for printing out the elements of the single-dimensional array, as follows:
```
// Print the array content
for(int i=0; i<2; i++)
{
    for(int j=0; j<3; j++)
        Console::Write("{0} ", __box(arr[i][j]));
    Console::WriteLine();
}
```
Notice that each row of the array is printed on its own line. The inner loop prints a single row using repeated calls to Console::Write. Because the formatting version of Console::Write takes its arguments as objects, each int element has to be boxed with the __box keyword. After each row has been output, a call to Console::WriteLine outputs a new line.
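To see that a two-dimensional array really is an array of one-dimensional arrays, here is a standard C++ sketch that fills an array the same way as the exercise and then inspects it with sizeof. The names grid and fillGrid are hypothetical.

```cpp
int grid[2][3];   // an array of 2 elements, each an array of 3 ints

// Fills grid the same way as the exercise: element [i][j] = (i+1)*(j+1).
void fillGrid()
{
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 3; j++)
            grid[i][j] = (i + 1) * (j + 1);
}
```

After fillGrid runs, grid[1][2] holds 6. Each row grid[i] is itself an array of three ints, so sizeof(grid[0]) is 3 * sizeof(int), and the whole array is exactly two such rows laid out one after the other.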

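To pass a multidimensional array to a function, leave only the first set of square brackets empty and give the sizes of all the other dimensions (for example, int arr[][3]); the compiler needs those sizes to work out where each row starts. Pass the first dimension as a separate argument, as before.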
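A standard C++ sketch of such a function follows. The names COLS and sumGrid are hypothetical; the column count is fixed in the parameter type, while the row count travels as an ordinary argument.

```cpp
#include <cstddef>

const std::size_t COLS = 3;

// Only the first dimension may be left empty; the compiler needs COLS
// to work out where each row starts. The row count is passed separately.
int sumGrid(int arr[][COLS], std::size_t rows)
{
    int total = 0;
    for (std::size_t i = 0; i < rows; i++)
        for (std::size_t j = 0; j < COLS; j++)
            total += arr[i][j];
    return total;
}
```

Calling sumGrid with the exercise’s 2-by-3 array and a row count of 2 visits all six elements.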

Dynamic Allocation and Arrays

So far, all arrays in this chapter have had a fixed size allocated at compile time. It is possible—and very common—to create arrays dynamically at run time using the new operator. The array you create still has a fixed size, but this size can be specified at run time when you know how many elements you need. The following exercise shows how to create an array dynamically and then use it.

Open Visual Studio, and create a new Visual C++ Console Application (.NET) project named Dynamic.
Open the source file Dynamic.cpp, and add the following code to the _tmain function:
```
int _tmain()
{
    Console::WriteLine(S"Dynamic Arrays");

    // Create an array dynamically
    int* pa = new int[10];

    // Fill the array
    for(int i=0; i<10; i++)
        pa[i] = i*2;

    // Print the array content
    for(int j=0; j<10; j++)
        Console::WriteLine(pa[j]);

    // Get rid of the array once we’re finished with it;
    // memory allocated with new[] must be freed with delete []
    delete [] pa;
    return 0;
}
```
You’ve previously used the new operator to create .NET reference types, but the operator is also used in traditional C++ code to allocate memory dynamically at run time. The syntax is new, followed by the element type and the number of elements in square brackets. Once the array has been created, new returns a pointer to its first element.

You can see that dynamic arrays are accessed in exactly the same way as statically allocated arrays, using the square brackets notation. This use of a pointer with array notation underlines the relationship between pointers and arrays, as explained in the sidebar “How Do Native Arrays Work?” earlier in this chapter.

Notice the call to delete [] just before the program exits. The square brackets matter: memory allocated with new[] must be released with delete [], because using plain delete on an array gives undefined behavior. Allocating an array dynamically in traditional C++ doesn’t create a managed object, so there is no garbage collection associated with this array. To use memory efficiently, you have to remember to deallocate memory once you’ve finished with the array. Strictly speaking, the call is unnecessary here because all allocated memory is freed up when the program exits. However, in any real-world program, you need to manage your memory carefully to make sure all memory is freed up at an appropriate point.

Note

Once you’ve called delete on a pointer, you must not use the pointer again because the memory it points to is no longer allocated to you. Using a pointer after freeing the memory it points to gives undefined behavior: you might get a run-time error, or the program might appear to work while silently corrupting data.
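The following standard C++ sketch repeats the pattern of the Dynamic exercise in a form you can compile anywhere: the array size is a run-time parameter, and new[] is paired with delete []. The function name sumOfDoubled is hypothetical.

```cpp
#include <cstddef>

// Allocates an array whose size is known only at run time, fills it as
// the exercise does (element i holds i*2), sums it, and frees it again.
long sumOfDoubled(std::size_t count)
{
    int* pa = new int[count];          // size chosen at run time
    for (std::size_t i = 0; i < count; i++)
        pa[i] = static_cast<int>(i * 2);

    long total = 0;
    for (std::size_t j = 0; j < count; j++)
        total += pa[j];

    delete [] pa;                      // array form of delete matches new[]
    return total;
}
```

With a count of 10, the elements are 0, 2, ..., 18 and the sum is 90. Note that pa is not used after the delete [] call.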

Problems with Manual Memory Management

Manual memory management is widely considered to be the single biggest cause of bugs in C and C++ programs, and it’s the driving force behind the development of the garbage collection mechanisms in languages such as C#. If it’s up to the programmers to call delete on every piece of memory they allocate, mistakes are going to be made.

Two main problems are associated with manual memory management:

Not freeing up memory. This problem is normally the less serious of the two, and it results in a program taking up more memory than it needs, a process known as memory leakage. In extreme cases, the amount of extra memory consumed by an application can reach the point where memory leakage starts to interfere with other applications or even the operating system.
Freeing up memory inappropriately. In a complex program, it might not be obvious where a particular piece of memory should be freed up or whose responsibility it is to free it. If delete gets called too soon and another piece of code then tries to use the dynamically allocated array, the behavior is undefined, and a run-time error or corrupted data is the likely result. The same is true if anyone calls delete on the same pointer more than once.
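One common defensive habit against the double-delete problem is to set a pointer to null as soon as the memory it points to has been freed, because calling delete [] on a null pointer does nothing. The following standard C++ sketch wraps that habit in a hypothetical helper called freeArray. It protects only the pointer variable passed in; any other copies of the same pointer still dangle.

```cpp
// Hypothetical helper: frees an array allocated with new[] and sets the
// caller's pointer to null. Because delete [] on a null pointer does
// nothing, an accidental second call through the same variable is harmless.
void freeArray(int*& p)
{
    delete [] p;
    p = 0;
}
```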

Although manual memory allocation using new and delete lets you do some very clever things, these two problems were the impetus behind the development of garbage collectors, which make the system track the use of dynamically allocated memory and free it up when no one else is using it.

__gc Arrays

The .NET Framework has extended the C++ array model by adding __gc arrays. As you might expect from the use of the __gc keyword, a __gc array is a dynamic array that is allocated on the .NET heap and is subject to the usual garbage collection rules.

Note

Unlike standard C++ arrays, subscripting in __gc arrays is not a synonym for pointer arithmetic.

You can create a __gc array in a very similar way to a traditional dynamic array:

Int32 gcArray[] = new Int32[10];

Notice that the array type is the .NET Int32 rather than the built-in int type. Notice also the way that gcArray has been declared: it’s no longer a native pointer, as with traditional arrays, but a reference to a managed array object. All __gc arrays inherit from System::Array, so any method or property of System::Array can be applied directly to a __gc array. See the section “The .NET Array Class” later in this chapter for details about System::Array and how to use it.

Using the __gc and __nogc Keywords

You can use the __gc and __nogc keywords to determine whether a managed or an unmanaged array is going to be created. Normally, creating an array of primitive types will result in an unmanaged array. However, you can use the __gc keyword to create a managed array of a primitive type, as shown in the following code:

```
// Create an unmanaged array of ints
int* arr = new int[10];

// Create a managed array of ints
int arr1 __gc[] = new int __gc[10];
```

The __gc[] syntax will create an array of primitive types that is subject to the usual garbage collection mechanism. In a similar way, you can use __nogc to create “traditional” unmanaged arrays of .NET types, provided that the type corresponds to one of the C++ primitive types.

```
// Create an unmanaged array of Int32
Int32 arr1 __nogc[10];
```

This array is not a managed array object, and it won’t be garbage collected. In addition, because it isn’t an array object, it doesn’t support any of the functionality of the System::Array class.

Arrays and Reference Types

Because reference types are always accessed using references, creating and initializing arrays of reference types is slightly different from creating arrays of value types. The following exercise shows how to create and use an array of reference types. In this example, you’ll use System::String as the reference type, but you can easily substitute a reference type of your own.

Open Visual Studio, and create a new Visual C++ Console Application (.NET) project named RefArray.

Open the RefArray.cpp source file, and add the following code to the _tmain function:

```
int _tmain()
{
    Console::WriteLine(S"Arrays of Reference Types");

    // Create an array of String references
    String* pa[] = new String*[5];

    // Explicitly assign a new String to element zero
    pa[0] = new String("abc");

    // Implicitly assign a new String to element one
    pa[1] = "def";

    // Print the array content
    for(int i=0; i<5; i++)
    {
        if (pa[i] == 0)
            Console::WriteLine("null");
        else
            Console::WriteLine(pa[i]);
    }
    return 0;
}
```

Compile and run the code. You should see the two strings abc and def printed, followed by three null entries.

The declaration of pa creates a new array of string references, not of string objects. The references in the array are initialized with nulls, and you need to create objects and assign them to the references in the array. In the example, you’ve assigned values to the first two out of five values, so when you print the array, you see the two strings printed, followed by three nulls.

Multidimensional __gc Arrays

The Managed Extensions for C++ provide a new model for creating multidimensional __gc arrays, as shown in the following code fragment:

```
// Create a two-dimensional array of Int32 values
Int32 pn[,] = new Int32[3,2];

// Initialize two members
pn[0,0] = 3;
pn[1,1] = 4;
```

The declaration of the array reference—in this case, pn—uses square brackets containing zero or more commas to denote the number of dimensions in the array. There’s always one fewer comma than the number of dimensions, so if you wanted to create a three-dimensional array, you’d declare it as Int32 p3d[,,]. The new operator also uses square brackets, but with the list of dimensions inside one set of brackets, which makes it easy to tell when you’re dealing with a multidimensional managed array as opposed to a traditional multidimensional array.

You’ll meet multidimensional __gc arrays in the “Basic Operations on Arrays” section later in this chapter, when we talk about the System::Array class in more detail.
