Derived Types | C Pocket Reference

1.10 Derived Types

A programmer can also define new types, including enumerated types and derived types. Derived types include pointers, arrays, structures, and unions.

The basic types and the enumerated types are collectively called the arithmetic types. The arithmetic types and the pointer types in turn make up the scalar types. The array and structure types are known collectively as the aggregate types.

1.10.1 Enumeration Types

Enumeration types are used to define variables that can only be assigned certain discrete integer values throughout the program. The possible values and names for them are defined in an enumeration. The type specifier begins with the keyword enum; for example:

enum toggle { OFF, ON, NO = 0, YES };

The list of enumerators inside the braces defines the new enumeration type. The identifier toggle is the tag of this enumeration. This enumeration defines the identifiers in the list (OFF, ON, NO, and YES) as constants with type int.

The value of each identifier in the list may be determined explicitly, as in NO = 0 in the example above. Identifiers for which no explicit value is specified are assigned a value automatically based on their position in the list, as follows: An enumerator without an explicit value has the value 0 if it is the first in the list; otherwise its value is 1 greater than that of the preceding enumerator. Thus in the example above, the constants OFF and NO have the value 0, while ON and YES have the value 1.

Once an enumeration type has been defined, variables with the type can be declared within its scope. For example:

enum toggle t1, t2 = ON;

This declaration defines t1 and t2 as variables with type enum toggle, and also initializes t2 with the value ON, or 1.

Following is an enumeration without a tag:

enum { black, blue, green, cyan, red, magenta, white };

As this example illustrates, the definition of an enumeration does not necessarily include a tag. In this case, the enumeration type cannot be used to declare variables, but the enumeration constants can be used to designate a set of discrete values. This technique can be used as an alternative to the #define directive. The constants in the example above have the following values: black = 0, blue = 1, ... , white = 6.

Variables with an enumeration type can generally be used in comparisons and arithmetic expressions in the same way as ordinary int variables.

1.10.2 Structures, Unions, and Bit-Fields

Different data items that make up a logical unit are generally grouped together in a record. The structure of a record (i.e., the names, types, and order of its components) is represented in C by a structure type.

The components of a record are called the members of the structure. Each member can be of any type. The type specifier begins with the keyword struct; for example:

struct article { char   name[40];
                 int    quantity;
                 double price;
               };

This example declares a structure type with three members. The identifier article is the tag of the structure, and name, quantity, and price are the names of its members. Within the scope of a structure declaration, variables can be declared with the structure type:

struct article  a1, a2, *pArticle, arrArticle[100];

a1 and a2 are variables of type struct article, and pArticle is a pointer to an object of type struct article. The array arrArticle has 100 elements of type struct article.

Structure variables can also be declared simultaneously with the structure type definition. If no further reference is made to a structure type, then its declaration need not include a tag. For example:

struct { unsigned char character, attribute; } xchar, xstr[100];

The structure type defined here has the members character and attribute, both of which have the type unsigned char. The variable xchar and the elements of the array xstr have the type of the new tagless structure.

The members of a structure variable are located in memory in order of their declaration within the structure. The address of the first member is identical to the address of the entire structure. The addresses of the other members and the total storage space required by the structure may vary, however, since the compiler can insert unnamed gaps (padding) between the individual members and after the last one, typically to satisfy alignment requirements. For this reason the storage size of a structure should always be obtained using the sizeof operator.

The macro offsetof, defined in the header file stddef.h, can be used to obtain the location of a member within a structure. The expression:

offsetof( structure_type, member )

has the type size_t, and yields the distance in bytes between the beginning of the structure and member.

Structure variables can be initialized by an initialization list containing a value for each member:

struct article flower =      // Declare and initialize the
    { "rose", 7, 2.49 };     // structure variable flower

A structure variable with automatic storage duration can also be initialized with the value of an existing structure variable. The assignment operator can be used on variables of the same structure type. For example:

arrArticle[0] = flower;

This operation copies the value of each member of flower to the corresponding member of arrArticle[0].

A specific structure member can be accessed by means of the dot operator, which has a structure variable and the name of a member as its operands:

 flower.name  // The array 'name'

 flower.price  // The double variable 'price'

Efficient data handling often requires the use of pointers to structures. The arrow operator provides convenient access to a member of a structure identified by a pointer. The left operand of the arrow operator is a pointer to a structure. Some examples follow:

pArticle = &flower;    // Let pArticle point to flower
pArticle->quantity     // Access members of flower
pArticle->price        // using the pointer pArticle

A structure cannot have itself as a member. Recursive structures can be defined, however, by means of members that are pointers to the structure's own type. Such recursive structures are used to implement linked lists and binary trees, for example.

1.10.2.1 Unions

A union permits references to the same location in memory to have different types. The declaration of a union differs from that of a structure only in the keyword union:

union number {long n; double x;};

This declaration creates a new union type with the tag number and the two members n and x.

Unlike the members of a structure, all the members of a union begin at the same address. Hence the size of a union is at least that of its largest member. In the example above, a variable of type union number occupies 8 bytes on a system where double is 8 bytes wide and long is no larger.

Once a union type has been defined, variables of that type can be declared. Thus:

union number  nx[10];

declares an array nx with ten elements of type union number. At any given time, each such element contains either a long or a double value. The members of a union can be accessed in the same ways as structure members. For example:

nx[0].x = 1.234;  // Assign a double value to nx[0]

Like structures, union variables are initialized by an initializer list. For a union, however, the list contains only one initializer. If no union member is explicitly designated, the first member named in the union type declaration is initialized:

union number length = { 100L };

After this declaration, length.n has the value 100.

1.10.2.2 Bit-fields

Members of structures or unions can also be bit-fields. Bit-fields are integers which consist of a defined number of bits. The declaration of a bit-field has the form:

type   [identifier] : width;

where type is either unsigned int or signed int, identifier is the optional name of the bit-field, and width is the number of bits occupied by the bit-field in memory.

A bit-field is normally stored in a machine word that is a storage unit of length sizeof(int). The width of a bit-field cannot be greater than that of a machine word. If a smaller bit-field leaves sufficient room, subsequent bit-fields may be packed into the same storage unit. A bit-field with width zero is a special case, and indicates that the subsequent bit-field is to be stored in a new storage unit regardless of whether there's room in the current storage unit. Here's an example of a structure made up of bit fields:

struct { unsigned int b0_2 : 3;
         signed   int b3_7 : 5;
         unsigned int      : 7;
         unsigned int b15  : 1;
       } var;

The structure variable var occupies at least two bytes, or 16 bits. It is divided into four bit-fields: var.b0_2 occupies the lowest three bits, var.b3_7 occupies the next five bits, and var.b15 occupies the highest bit. The third member has no name, and only serves to define a gap of seven bits, as shown in Figure 1-5.

Figure 1-5. Bit assignments in the example struct

Bit-fields with the type unsigned int are interpreted as unsigned. Bit-fields of type signed int can have negative values in two's-complement encoding. In the example above, var.b0_2 can hold values in the range from 0 to 7, and var.b3_7 can take values in the range from -16 to 15.

Bit-fields also differ from ordinary integer variables in the following ways:

The address operator (&) cannot be applied to bit-fields (but it can be applied to a structure variable that contains bit-fields).

Some uses of bit-fields may lead to portability problems, since the interpretation of the bits within a word can differ from one machine to another.

1.10.3 Arrays

Arrays are used to manage large numbers of objects of the same type. Arrays in C can have elements of any type except a function type. The definition of an array specifies the array name, the type, and, optionally, the number of array elements. For example:

  char line[81];

The array line consists of 81 elements with the type char. The variable line itself has the derived type "array of char" (or "char array").

In a statically defined array, the number of array elements (i.e., the length of the array) must be a constant expression. In ANSI C99, any integer expression with a positive value can be used to specify the length of a non-static array with block scope. This is also referred to as a variable-length array.

An array always occupies a contiguous region of memory. The size of an array is thus the number of elements times the size of the element type:

sizeof( line ) == 81 * sizeof( char ) == 81 bytes

The individual array elements can be accessed using an index. In C, the first element of an array has the index 0. Thus the 81 elements of the array line are line[0], line[1], ... , line[80].

Any integer expression can be used as an index. It is up to the programmer to ensure that the value of the index lies within the valid range for the given array.

A string is a sequence of consecutive elements of type char that ends with the null character, '\0'. The length of the string is the number of characters excluding the string terminator '\0'. A string is stored in a char array, which must be at least one byte longer than the string.

A wide string consists of characters of type wchar_t and is terminated by the wide null character, L'\0'. The length of a wide string is the number of wchar_t characters in the string, excluding the wide string terminator. For example:

wchar_t wstr[20] = L"Mister Fang";  // length: 11
                                    // wide characters

A multi-dimensional array in C is an array whose elements are themselves arrays. For example:

  short point[50][20][10];

The three-dimensional array point consists of 50 elements that are two-dimensional arrays. The declaration above defines a total of 50*20*10 = 10,000 elements of type short, each of which is uniquely identified by three indices:

point[0][0][9] = 7;  // Assign the value 7 to the "point"
                     // with the "coordinates" (0,0,9).

Two-dimensional arrays, also called matrices, are the most common multi-dimensional arrays. The elements of a matrix can be thought of as being arranged in rows (first index) and columns (second index).

Arrays in C are closely related to pointers: in almost all expressions, the name of an array is converted to a pointer to the first element of the array. The sizeof operator is an exception, however: if its operand is an array, it yields the number of bytes occupied, not by a pointer, but by the array itself. After the declaration:

char msg[] = "Hello, world!";

the array name msg points to the character 'H'. In other words, msg is equivalent to &msg[0]. Thus in a statement such as:

puts( msg ); // Print string to display

only the address of the beginning of the string is passed to the function puts(). Internally, the function processes the characters in the string until it encounters the terminator character '\0'.

An array is initialized by an initialization list containing a constant initial value for each of the individual array elements:

double x[3] = { 0.0, 0.5, 1.0 };

After this definition, x[0] has the value 0.0, x[1] the value 0.5, and x[2] the value 1.0. If the length of the array is greater than the number of values in the list, then all remaining array elements are initialized with 0. An initialization list that is longer than the array is an error: the compiler must issue a diagnostic, although some compilers only warn and discard the surplus values.

The length of the array need not be explicitly specified, however:

double x[] = { 0.0, 0.5, 1.0 };

In this definition, the length of the array is determined by the number of values in the initialization list.

A char array can be initialized by a string literal:

char str[] = "abc";

This definition allocates and initializes an array of four bytes, and is equivalent to:

char str[] = { 'a', 'b', 'c', '\0' } ;

In the initialization of a multi-dimensional array, the size of all dimensions except the first must be specified. In the case of a two-dimensional array, the number of rows can be omitted. For example:

char error_msg[][40] = { "Error opening file!",
                         "Error reading file!",
                         "Error writing to file!" };

The array error_msg consists of three rows, each of which contains a string.

1.10.4 Pointers

A pointer represents the address and type of a variable or a function. In other words, for a variable x, &x is a pointer to x.

A pointer refers to a location in memory, and its type indicates how the data at this location is to be interpreted. Thus the pointer types are called pointer to char, pointer to int, and so on, or for short, char pointer, int pointer, etc.

Array names and expressions such as &x are address constants or constant pointers, and cannot be changed. Pointer variables, on the other hand, store the address of the object to which they refer, and that address may be changed. A pointer variable is declared by an asterisk (*) prefixed to the identifier. For example:

float  x, y, *pFloat;

pFloat = &x;  // Let pFloat point to x.

After this declaration, x and y are variables of type float, and pFloat is a variable of type float * (pronounced "pointer to float"). After the assignment operation, the value of pFloat is the address of x.

The indirection operator * is used to access data by means of pointers. If ptr is a pointer, for example, then *ptr is the object to which ptr points. For example:

y = *pFloat;  //  equivalent to  y = x;

As long as pFloat points to x, the expression *pFloat can be used in place of the variable x. Of course, the indirection operator * must only be used with a pointer which contains a valid address.

A pointer with the value 0 is called a null pointer. Null pointers have a special significance in C. Because all objects and functions have non-zero addresses, a null pointer always represents an invalid address. Functions that return a pointer can therefore return a null pointer to indicate a failure condition. The constant NULL is defined in stdio.h, stddef.h, and other header files as a null pointer (i.e., a pointer with a value of zero).

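On most systems, all object pointer variables have the same storage size, regardless of their type, although the C standard does not guarantee this. Four or eight bytes are usually required to store an address on current 32-bit and 64-bit systems; two bytes are common on small 16-bit targets.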

Parentheses are sometimes necessary in complex pointer declarations. For example:

long arr[10];       // Array arr with ten elements
long (*pArr)[10];   // Pointer pArr to an array
                    // of ten long elements

Without the parentheses, the declaration long *pArr[10]; would create an array of ten pointers to long. Parentheses are always necessary in order to declare pointers to arrays or functions.

1.10.4.1 Pointer arithmetic

Two arithmetic operations can be performed on pointers:

An integer can be added to or subtracted from a pointer.

One pointer can be subtracted from another of the same type.

These operations are generally useful only when the pointers point to elements of the same array. In arithmetic operations on pointers, the size of the objects pointed to is automatically taken into account. For example:

int a[3] = { 0, 10, 20 };  // An array with three elements

int *pa = a;  // Let pa point to a[0]

Since pa points to a[0], the expression pa + 1 yields a pointer to the next array element, a[1], which is sizeof( int ) bytes away in memory. Furthermore, because the array name a likewise points to a[0], a+1 also yields a pointer to a[1].

Thus for any integer i, the following expressions are equivalent:

&a[i] , a+i , pa+i // pointers to the i-th array element

By the same token, the following expressions are also equivalent:

a[i] , *(a+i) , *(pa+i) , pa[i]  // the i-th array element

Thus a pointer can be treated as an array name: pa[i] and *(pa+i) are equivalent. Unlike the array name, however, pa is a variable, not an address constant. For example:

pa = a+2;  // Let pa point to a[2]

int n = pa-a;  //  n = 2

The subtraction of two pointers yields the number of array elements between the pointers. For example, the expression pa-a yields the integer value 2 if pa points to a[2]. This value has the signed integer type ptrdiff_t, which is defined in stddef.h (usually as int or long).

The addition of two pointers is not a useful operation, and hence is not permitted. It is possible, however, to compare two pointers of the same type, as the following example illustrates:

// Formatted output of the elements of an array
#define  LEN  10
float numbers[LEN], *pn;
  . . .
for ( pn = numbers; pn < numbers+LEN; ++pn )
    printf( "%16.4f", *pn );

1.10.4.2 Function pointers

The name of a function is a constant pointer to the function. Its value is the address of the function's machine code in memory. For example, the name puts is a pointer to the function puts(), which outputs a string:

#include <stdio.h>           // Include declaration of puts()
int (*pFunc)(const char*);   // Pointer to a function
 . . .                       // whose parameter is a string
                             // and whose return value
                             // has type int
pFunc = puts;                // Let pFunc point to puts()
(*pFunc)("Any questions?");  // Call puts() using the
                             // pointer

Note that the first pair of parentheses is required in the declaration of the variable pFunc. Without it, int *pFunc( const char* ); would declare pFunc as a function that returns a pointer to int.

1.10.5 Type Qualifiers and Type Definitions

The type of an object can be qualified by the keywords const and volatile in the declaration.

The type qualifier const indicates that the program can no longer modify an object after its declaration. For example:

const double pi = 3.1415927;

After this declaration, a statement that modifies the object pi, such as pi = pi+1;, is illegal and results in a compiler error.

The type qualifier volatile indicates variables that can be modified by processes other than the present program, such as hardware or an interrupt handler. Based on this information, the compiler refrains from optimizing away accesses to the variable.

The type qualifiers volatile and const can also be combined:

extern const volatile unsigned  clock_ticks;

After this declaration, clock_ticks cannot be modified by the program, but may be modified by another process, such as a hardware clock interrupt handler.

Type qualifiers are generally prefixed to the type specifier. In pointer declarations, however, type qualifiers may be applied both to the pointer itself and to the object it addresses. If the type qualifier is to be applied to the pointer itself, it must be placed immediately before the identifier.

The most common example of such a declaration is the "pointer to a constant object." Such a pointer may point to a variable, but cannot be used to modify it. For this reason, such pointers are also called "read-only" pointers. For example:

int var1 = 1, var2 = 2, *ptr;
const int cArr[2];
const int *ptrToConst;  // "Read-only pointer" to int

The following statements are now permitted:

ptrToConst = &cArr[0];  // Change the value of
++ptrToConst;           // the pointer variable
ptrToConst = &var1;
var2 = *ptrToConst;     // "Read" access

The following statements are not permitted:

ptr = ptrToConst;   // "Read-only" cannot be copied to
                    // "read-write"
*ptrToConst = 5;    // "Write" access not allowed!

restrict

ANSI C99 introduces the type qualifier restrict, which is only applicable to pointers. If a pointer declared with the restrict qualifier points to an object that is to be modified, then the object can only be accessed using that pointer. This information allows the compiler to generate optimized machine code. It is up to the programmer to ensure that the restriction is respected!

Example:

void *memcpy( void * restrict dest,       // destination
              const void * restrict src,  // source
              size_t n );

In using the standard function memcpy() to copy a memory block of n bytes, the programmer must ensure that the source and destination blocks do not overlap.

typedef

The keyword typedef is used to give a type a new name.

Examples:

typedef unsigned char UCHAR;

typedef struct { double x, y; } POINT;

After these type definitions, the identifier UCHAR can be used as an abbreviation for the type unsigned char, and the identifier POINT can be used to specify the given structure type.

Examples:

UCHAR  c1, c2, tab[100];

POINT point, *pPoint;

In a typedef declaration, the identifier is declared as the new type name. The same declaration without the typedef keyword would declare a variable and not a type name.