Pointers and Addresses | Essential C# 2.0

On occasion, developers will want to be able to access and work with memory, and with pointers to memory locations, directly. This is necessary for certain operating system interaction as well as with certain types of time-critical algorithms. To support this, C# requires use of the unsafe code construct.

Unsafe Code

One of C#'s great features is the fact that it is strongly typed and supports type checking throughout the runtime execution. What makes this feature especially great is that it is possible to circumvent this support and manipulate memory and addresses directly. You would do this when working with things like memory-mapped devices, or if you wanted to implement time-critical algorithms. The key is to designate a portion of the code as unsafe.

Unsafe code is an explicit code block and compilation option, as shown in Listing 17.11. The unsafe modifier has no effect on the generated CIL code itself. It is only a directive to the compiler to permit pointer and address manipulation within the unsafe block. Furthermore, unsafe does not imply unmanaged.

Listing 17.11. Designating a Method for Unsafe Code

class Program
{
    unsafe static int Main(string[] args)
    {
        // ...
    }
}

You can use unsafe as a modifier to the type or to specific members within the type.

In addition, C# allows unsafe as a statement that flags a code block to allow unsafe code (see Listing 17.12).

Listing 17.12. Designating a Code Block for Unsafe Code

class Program
{
    static int Main(string[] args)
    {
        unsafe
        {
            // ...
        }
    }
}

Code within the unsafe block can include unsafe constructs such as pointers.

Note

You must explicitly tell the compiler that unsafe code is permitted.

From the command line, this requires the /unsafe switch. For example, to compile the previous code, you need to use the command shown in Output 17.1.

Output 17.1.

        csc.exe /unsafe Program.cs

You need to use the /unsafe switch because unsafe code opens up the possibility of buffer overflows and similar flaws that can expose security holes. The /unsafe switch enables code that directly manipulates memory and executes unmanaged instructions. Requiring /unsafe, therefore, makes the choice of potential exposure explicit.

Pointer Declaration

Now that you have marked a code block as unsafe, it is time to look at how to write unsafe code. First, unsafe code allows the declaration of a pointer. Consider the following example.

byte* pData;

Assuming pData is not null, its value points to a location that contains one or more sequential bytes; the value of pData represents the memory address of the bytes. The type specified before the * is the referent type, or the type of the data located at the address the pointer holds. In this example, pData is the pointer and byte is the referent type, as shown in Figure 17.1.

Figure 17.1. Pointers Contain the Address of the Data

Because pointers (which are just numeric address values) are not subject to garbage collection, C# does not allow referent types other than unmanaged types, which are types that are not reference types, are not generics, and do not contain reference types. Therefore, the following is not valid:

 string* pMessage;

Neither is this:

 ServiceStatus* pStatus;

where ServiceStatus is defined as shown in Listing 17.13; the problem again is that ServiceStatus includes a string field.

Listing 17.13. Invalid Referent Type Example

struct ServiceStatus
{
    int State;
    string Description; // Description is a reference type
}

Language Contrast: C/C++ Pointer Declaration

In C/C++, multiple pointers within the same declaration are declared as follows:

 int *p1, *p2;

Notice the * on p2; this makes p2 an int* rather than an int. In contrast, C# always places the * with the data type:

 int* p1, p2;

The result is two variables of type int*. The syntax matches that of declaring multiple arrays in a single statement:

 int[] array1, array2;

Pointers are an entirely new category of type. Unlike structs, enums, and classes, pointers don't ultimately derive from System.Object.

In addition to custom structs that contain only unmanaged types, valid referent types include enums, predefined value types (sbyte, byte, short, ushort, int, uint, long, ulong, char, float, double, decimal, and bool), and pointer types (such as byte**). Lastly, valid syntax includes void* pointers, which represent pointers to an unknown type.
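The valid referent types listed above can be sketched in a few declarations. This is a minimal illustration, not code from the book: the Coordinate struct, the Status enum, and the ReferentTypeDemo class are hypothetical names introduced only for this example, and the code assumes compilation with the /unsafe switch.

```csharp
using System;

// Hypothetical struct: it contains only unmanaged fields, so Coordinate* is legal.
struct Coordinate
{
    public int X;
    public int Y;
}

// Hypothetical enum: enums are valid referent types.
enum Status : byte { Stopped, Running }

static class ReferentTypeDemo
{
    public static unsafe int SumViaPointers()
    {
        Coordinate point;
        point.X = 3;
        point.Y = 4;
        Status status = Status.Running;

        Coordinate* pPoint = &point;     // pointer to a custom unmanaged struct
        Status* pStatus = &status;       // pointer to an enum
        Coordinate** ppPoint = &pPoint;  // pointer to a pointer
        void* pUnknown = pPoint;         // any pointer converts implicitly to void*

        // A void* must be cast back to a typed pointer before it is read.
        Coordinate viaVoid = *(Coordinate*)pUnknown;
        Coordinate viaDouble = **ppPoint;

        return viaVoid.X + viaDouble.Y + (int)(*pStatus);
    }

    static void Main()
    {
        Console.WriteLine(SumViaPointers()); // 3 + 4 + 1 = 8
    }
}
```

Replacing one of Coordinate's fields with a string would make Coordinate* a compile error, for the same reason that ServiceStatus* is rejected.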

Assigning a Pointer

Once code defines a pointer, it needs to assign a value before accessing it. Just like reference types, pointers can hold the value null; this is their default value. The value stored by the pointer is the address of a location. Therefore, in order to assign it, you must first retrieve the address of the data.

You could explicitly cast an integer or a long into a pointer, but this rarely occurs without a means of determining the address of a particular data value at execution time. Instead, you need to use the address operator (&) to retrieve the address of the value type:

 byte* pData = &bytes[0]; // Compile error

The problem is that in a managed environment, data can move, thereby invalidating the address. The error message is "You can only take the address of [an] unfixed expression inside of a fixed statement initializer." In this case, the byte referenced appears within an array and an array is a reference type (a moveable type). Reference types appear on the heap and are subject to garbage collection or relocation. A similar problem occurs when referring to a value type field on a moveable type:

 int* a = &"message".Length;

Either way, to complete the assignment, the data needs to be a value type, fixed, or explicitly allocated on the call stack.
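By contrast, taking the address of a local value type variable compiles without a fixed statement, because a local already lives on the call stack and cannot move. The following is a minimal sketch under that assumption; AddressOfDemo and its method are hypothetical names, and the code must be compiled with /unsafe.

```csharp
using System;

static class AddressOfDemo
{
    public static unsafe int IncrementThroughPointer()
    {
        int count = 42;         // local value type: not moveable
        int* pCount = &count;   // allowed without a fixed statement
        *pCount = *pCount + 1;  // write to count through the pointer
        return count;
    }

    static void Main()
    {
        Console.WriteLine(IncrementThroughPointer()); // 43
    }
}
```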

Fixing Data

To retrieve the address of a moveable data item, it is necessary to fix, or pin, the data, as demonstrated in Listing 17.14.

Listing 17.14. Fixed Statement

byte[] bytes = new byte[24];
fixed (byte* pData = &bytes[0]) // pData = bytes also allowed
{
    // ...
}

Within the code block of a fixed statement, the assigned data will not move. In this example, bytes will remain at the same address, at least until the end of the fixed statement.

The fixed statement requires the declaration of the pointer variable within its scope. This avoids accessing the variable outside of the fixed statement, when the data is no longer fixed. However, it is the programmer's responsibility to ensure that the pointer is not assigned to another variable that survives beyond the scope of the fixed statement (possibly in an API call, for example). Similarly, using ref or out parameters will be problematic for data that will not survive beyond the method call.

Since a string is an invalid referent type, it would appear invalid to define pointers to strings. However, as in C++, internally a string is a pointer to the first character of an array of characters, and it is possible to declare pointers to characters using char*. Therefore, C# allows declaring a pointer of type char* and assigning it to a string within a fixed statement. The fixed statement prevents the movement of the string during the life of the pointer. Similarly, it allows any moveable type that supports an implicit conversion to a pointer of another type, given a fixed statement.

You can replace the verbose assignment of &bytes[0] with the abbreviated bytes, as shown in Listing 17.15.

Listing 17.15. Fixed Statement without Address or Array Indexer

byte[] bytes = new byte[24];
fixed (byte* pData = bytes)
{
    // ...
}

Depending on the frequency and time to execute, fixed statements have the potential to cause fragmentation in the heap because the garbage collector cannot compact fixed objects. To reduce this problem, the best practice is to pin blocks early in the execution and to pin fewer large blocks rather than many small blocks. .NET 2.0 also reduces this problem within the .NET Framework, thanks to some additional fragmentation-aware code.
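To show a fixed statement doing some work, the sketch below pins a byte array and a string and reads each through a pointer. It is an illustrative example rather than one of the book's listings: FixedDemo, Sum, and CountChar are hypothetical names, and the code assumes the /unsafe switch. Each pointer is used only inside its fixed block, so nothing refers to the data after it is unpinned.

```csharp
using System;

static class FixedDemo
{
    // Pins the array and sums its elements through a byte*.
    public static unsafe int Sum(byte[] bytes)
    {
        int total = 0;
        fixed (byte* pData = bytes)
        {
            for (int i = 0; i < bytes.Length; i++)
            {
                total += *(pData + i);
            }
        }
        return total;
    }

    // Pins the string and reads its characters through a char*.
    public static unsafe int CountChar(string text, char target)
    {
        int count = 0;
        fixed (char* pText = text)
        {
            for (int i = 0; i < text.Length; i++)
            {
                if (*(pText + i) == target)
                {
                    count++;
                }
            }
        }
        return count;
    }

    static void Main()
    {
        byte[] bytes = new byte[] { 1, 2, 3, 4 };
        Console.WriteLine(Sum(bytes));                 // 10
        Console.WriteLine(CountChar("S5280ft", '5'));  // 1
    }
}
```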

Allocating on the Stack

You should use the fixed statement on an array to prevent the garbage collector from moving the data. However, an alternative is to allocate the array on the call stack. Stack allocated data is not subject to garbage collection or to the finalizer patterns that accompany it. Like referent types, the requirement is that the stackalloc data is an array of unmanaged types. For example, instead of allocating an array of bytes on the heap, you can place it onto the call stack, as shown in Listing 17.16.

Listing 17.16. Allocating Data on the Call Stack

byte* bytes = stackalloc byte[42];

Because the data type is an array of unmanaged types, it is possible for the runtime to allocate a fixed buffer size for the array and then to reclaim that buffer when the method returns. Specifically, it allocates sizeof(T) * E, where E is the array size and T is the referent type. Given the requirement of using stackalloc only on an array of unmanaged types, the runtime restores the buffer back to the system simply by unwinding the stack, eliminating the complexities of iterating over the f-reachable queue and compacting reachable data. Therefore, there is no way to explicitly free stackalloc data.
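A short sketch of stackalloc in use follows. StackAllocDemo and SumOfSquares are hypothetical names, and the upper bound of 256 elements is an arbitrary assumption added here to keep the stack allocation small; the code assumes the /unsafe switch.

```csharp
using System;

static class StackAllocDemo
{
    public static unsafe int SumOfSquares(int count)
    {
        // Assumed guard: stack space is limited, so cap the requested size.
        if (count < 0 || count > 256)
        {
            throw new ArgumentOutOfRangeException("count");
        }

        // Allocates sizeof(int) * count bytes on the call stack.
        int* squares = stackalloc int[count];

        // stackalloc memory is not guaranteed to be zeroed, so write before reading.
        for (int i = 0; i < count; i++)
        {
            *(squares + i) = i * i;
        }

        int total = 0;
        for (int i = 0; i < count; i++)
        {
            total += *(squares + i);
        }

        // No explicit free: the buffer disappears when the method returns.
        return total;
    }

    static void Main()
    {
        Console.WriteLine(SumOfSquares(5)); // 0 + 1 + 4 + 9 + 16 = 30
    }
}
```

Because the buffer is discarded when SumOfSquares returns, the method returns a computed value rather than the pointer itself.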

Dereferencing a Pointer

Accessing the value of a type referred to by a pointer requires that you dereference the pointer, placing the indirection operator (*) before the pointer expression. For example, byte data = *pData; dereferences the location referred to by pData and returns the single byte at that location.

Using this principle in unsafe code allows the unorthodox behavior of modifying the "immutable" string, as shown in Listing 17.17. In no way is this recommended, but it does expose the potential of low-level memory manipulation.

Listing 17.17. Modifying an Immutable String

string text = "S5280ft";
Console.Write("{0} = ", text);
unsafe // Requires /unsafe switch.
{
    fixed (char* pText = text)
    {
        char* p = pText;
        *++p = 'm';
        *++p = 'i';
        *++p = 'l';
        *++p = 'e';
        *++p = ' ';
        *++p = ' ';
    }
}
Console.WriteLine(text);

The results of Listing 17.17 appear in Output 17.2.

Output 17.2.

S5280ft = Smile

In this case, you take the original address and increment it by the size of the referent type (sizeof(char)), using the preincrement operator. Next, you dereference the address using the indirection operator and then assign a different character to that location. Similarly, using the + and - operators on a pointer changes the address by the operand multiplied by sizeof(T), where T is the referent type.

Similarly, the comparison operators (==, !=, <, >, <=, and >=) work to compare pointers, translating effectively to the comparison of address location values.

One restriction on the dereferencing operator is the inability to dereference a void*. The void* data type represents a pointer to an unknown type. Since the data type is unknown, it can't be dereferenced. Instead, to access the data referenced by a void*, you must first cast it to another pointer type and then dereference the resulting pointer.
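The arithmetic, comparison, and void* rules described above can be sketched together. PointerArithmeticDemo and its methods are hypothetical names chosen for this illustration, and the code assumes the /unsafe switch.

```csharp
using System;

static class PointerArithmeticDemo
{
    // Adding 2 to an int* moves the address by 2 * sizeof(int) bytes.
    public static unsafe long ByteDistance()
    {
        int* numbers = stackalloc int[4];
        int* third = numbers + 2;
        // Subtracting byte* values yields the distance in bytes.
        return (byte*)third - (byte*)numbers;
    }

    // Walks a buffer by comparing the moving pointer against an end address.
    public static unsafe int SumWithPointerLoop()
    {
        int* numbers = stackalloc int[4];
        for (int i = 0; i < 4; i++)
        {
            *(numbers + i) = i + 1;
        }

        int total = 0;
        int* end = numbers + 4; // one past the last element; never dereferenced
        for (int* p = numbers; p < end; p++)
        {
            total += *p;
        }
        return total;
    }

    // A void* cannot be dereferenced until it is cast to a typed pointer.
    public static unsafe int ReadThroughVoidPointer()
    {
        int value = 1234;
        void* pUnknown = &value;
        // int bad = *pUnknown;  // would not compile
        int* pValue = (int*)pUnknown;
        return *pValue;
    }

    static void Main()
    {
        Console.WriteLine(ByteDistance());            // 8, since sizeof(int) is 4
        Console.WriteLine(SumWithPointerLoop());      // 1 + 2 + 3 + 4 = 10
        Console.WriteLine(ReadThroughVoidPointer());  // 1234
    }
}
```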

You can achieve the same behavior as Listing 17.17 by using the index operator rather than the indirection operator (see Listing 17.18).

Listing 17.18. Modifying an Immutable String with the Index Operator in Unsafe Code

string text;
text = "S5280ft";
Console.Write("{0} = ", text);
unsafe // Requires /unsafe switch.
{
    fixed (char* pText = text)
    {
        pText[1] = 'm';
        pText[2] = 'i';
        pText[3] = 'l';
        pText[4] = 'e';
        pText[5] = ' ';
        pText[6] = ' ';
    }
}
Console.WriteLine(text);

The results of Listing 17.18 appear in Output 17.3.

Output 17.3.

        S5280ft = Smile

Modifications such as those in Listing 17.17 and Listing 17.18 lead to unexpected behavior. For example, if you reassigned text to "S5280ft" following the Console.WriteLine() statement and then redisplayed text, the output would still be Smile because the address of two equal string literals is optimized to one string literal referenced by both variables. In spite of the apparent assignment

 text = "S5280ft";

after the unsafe code in Listing 17.17, the internals of the string assignment are an address assignment of the modified "S5280ft" location, so text is never set to the intended value.

Accessing the Member of a Referent Type

Dereferencing a pointer makes it possible for code to access the members of the referent type. However, this is possible without the indirection operator (*). As Listing 17.19 shows, it is possible to directly access a referent type's members using the -> operator (p->member is shorthand for (*p).member).

Listing 17.19. Directly Accessing a Referent Type's Members

unsafe
{
    Angle angle = new Angle(30, 18, 0);
    Angle* pAngle = &angle;
    System.Console.WriteLine("{0}° {1}' {2}",
        pAngle->Hours, pAngle->Minutes, pAngle->Seconds);
}

The results of Listing 17.19 appear in Output 17.4.

Output 17.4.

        30° 18' 0
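Listing 17.19 relies on an Angle type defined elsewhere. The self-contained sketch below uses a hypothetical stand-in with public fields (an assumption; the real Angle may expose properties instead) to show that p->member and (*p).member read and write the same location. MemberAccessDemo is likewise a hypothetical name, and the code assumes the /unsafe switch.

```csharp
using System;

// Hypothetical stand-in for the Angle type used in Listing 17.19.
struct Angle
{
    public int Hours;
    public int Minutes;
    public int Seconds;

    public Angle(int hours, int minutes, int seconds)
    {
        Hours = hours;
        Minutes = minutes;
        Seconds = seconds;
    }
}

static class MemberAccessDemo
{
    public static unsafe bool ArrowMatchesDereference()
    {
        Angle angle = new Angle(30, 18, 0);
        Angle* pAngle = &angle; // a local struct needs no fixed statement

        pAngle->Seconds = 45;   // write through the pointer with ->

        // Both forms read the same fields of the same struct.
        return pAngle->Hours == (*pAngle).Hours
            && (*pAngle).Minutes == 18
            && angle.Seconds == 45;
    }

    static void Main()
    {
        Console.WriteLine(ArrowMatchesDereference()); // True
    }
}
```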