Item 4: Make sure that objects are initialized before they're used

C++ can seem rather fickle about initializing the values of objects. For example, if you say this,

 int x; 

in some contexts, x is guaranteed to be initialized (to zero), but in others, it's not. If you say this,

 class Point {
   int x, y;
 };
 ...
 Point p;

p's data members are sometimes guaranteed to be initialized (to zero), but sometimes they're not. If you're coming from a language where uninitialized objects can't exist, pay attention, because this is important.
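As a rough illustration of two contexts where zero is guaranteed, the sketch below uses objects at namespace scope (which have static storage duration) and an explicitly brace-initialized local. The names globalCount, globalPoint, sumOfGlobals, and makeZeroedPoint are invented for this example, and the empty-brace syntax assumes a C++11 or later compiler.

```cpp
struct Point {
    int x, y;
};

// Objects with static storage duration are zero-initialized before
// any other initialization takes place.
int globalCount;
Point globalPoint;

int sumOfGlobals() {
    return globalCount + globalPoint.x + globalPoint.y;
}

Point makeZeroedPoint() {
    // A plain "Point p;" here would leave x and y with indeterminate values.
    // The empty braces (C++11 syntax) initialize every member, zeroing x and y.
    Point p{};
    return p;
}
```

The sketch deliberately never reads from a plain local "Point p;", because that read is exactly the undefined behavior the text warns about.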

Reading uninitialized values yields undefined behavior. On some platforms, the mere act of reading an uninitialized value can halt your program. More typically, the result of the read will be semi-random bits, which will then pollute the object you read the bits into, eventually leading to inscrutable program behavior and a lot of unpleasant debugging.

Now, there are rules that describe when object initialization is guaranteed to take place and when it isn't. Unfortunately, the rules are complicated: too complicated to be worth memorizing, in my opinion. In general, if you're in the C part of C++ (see Item 1) and initialization would probably incur a runtime cost, it's not guaranteed to take place. If you cross into the non-C parts of C++, things sometimes change. This explains why an array (from the C part of C++) isn't necessarily guaranteed to have its contents initialized, but a vector (from the STL part of C++) is.
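The array/vector contrast can be sketched as follows. The helper names makeCounters and sumOfZeroedArray are made up for illustration; the point is that a vector constructed with a size zeroes its int elements for you, while a local array has to be asked to.

```cpp
#include <cstddef>
#include <vector>

// std::vector<int>(n) value-initializes its n elements, so each one is 0.
std::vector<int> makeCounters(std::size_t n) {
    return std::vector<int>(n);
}

int sumOfZeroedArray() {
    // A bare "int raw[4];" at block scope would hold indeterminate values.
    // The empty initializer asks for every element to be zeroed.
    int raw[4] = {};
    int sum = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        sum += raw[i];
    }
    return sum;
}
```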

The best way to deal with this seemingly indeterminate state of affairs is to always initialize your objects before you use them. For non-member objects of built-in types, you'll need to do this manually. For example:

 int x = 0;                                // manual initialization of an int
 const char * text = "A C-style string";   // manual initialization of a
                                           // pointer (see also Item 3)
 double d;                                 // "initialization" by reading from
 std::cin >> d;                            // an input stream

For almost everything else, the responsibility for initialization falls on constructors. The rule there is simple: make sure that all constructors initialize everything in the object.

The rule is easy to follow, but it's important not to confuse assignment with initialization. Consider a constructor for a class representing entries in an address book:

 class PhoneNumber { ... };

 class ABEntry {                       // ABEntry = "Address Book Entry"
 public:
   ABEntry(const std::string& name, const std::string& address,
           const std::list<PhoneNumber>& phones);
 private:
   std::string theName;
   std::string theAddress;
   std::list<PhoneNumber> thePhones;
   int numTimesConsulted;
 };

 ABEntry::ABEntry(const std::string& name, const std::string& address,
                  const std::list<PhoneNumber>& phones)
 {
   theName = name;                       // these are all assignments,
   theAddress = address;                 // not initializations
   thePhones = phones;
   numTimesConsulted = 0;
 }

This will yield ABEntry objects with the values you expect, but it's still not the best approach. The rules of C++ stipulate that data members of an object are initialized before the body of a constructor is entered. Inside the ABEntry constructor, theName, theAddress, and thePhones aren't being initialized, they're being assigned. Initialization took place earlier when their default constructors were automatically called prior to entering the body of the ABEntry constructor. This isn't true for numTimesConsulted, because it's a built-in type. For it, there's no guarantee it was initialized at all prior to its assignment.

A better way to write the ABEntry constructor is to use the member initialization list instead of assignments:

 ABEntry::ABEntry(const std::string& name, const std::string& address,
                  const std::list<PhoneNumber>& phones)
 : theName(name),
   theAddress(address),                  // these are now all initializations
   thePhones(phones),
   numTimesConsulted(0)
 {}                                      // the ctor body is now empty

This constructor yields the same end result as the one above, but it will often be more efficient. The assignment-based version first called default constructors to initialize theName, theAddress, and thePhones, then promptly assigned new values on top of the default-constructed ones. All the work performed in those default constructions was therefore wasted. The member initialization list approach avoids that problem, because the arguments in the initialization list are used as constructor arguments for the various data members. In this case, theName is copy-constructed from name, theAddress is copy-constructed from address, and thePhones is copy-constructed from phones. For most types, a single call to a copy constructor is more efficient (sometimes much more efficient) than a call to the default constructor followed by a call to the copy assignment operator.
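One way to see the difference is to count the special member functions that run in each style. The following is only a sketch: Traced, Counts, ByAssignment, ByInitList, and measure are hypothetical names invented here, and the default member initializers in Counts assume C++11 or later.

```cpp
// Tallies of which special member functions have run.
struct Counts {
    int defaultCtors = 0;
    int copyCtors = 0;
    int copyAssigns = 0;
};

Counts counts;

// A member type that records how it is created and assigned.
struct Traced {
    Traced() { ++counts.defaultCtors; }
    Traced(const Traced&) { ++counts.copyCtors; }
    Traced& operator=(const Traced&) { ++counts.copyAssigns; return *this; }
};

// Assignment in the constructor body: member is default-constructed first.
struct ByAssignment {
    Traced member;
    explicit ByAssignment(const Traced& t) { member = t; }
};

// Member initialization list: member is copy-constructed directly.
struct ByInitList {
    Traced member;
    explicit ByInitList(const Traced& t) : member(t) {}
};

// Builds one Holder from a source object and reports what that cost.
template <class Holder>
Counts measure() {
    Traced source;
    counts = Counts();   // reset, so only the Holder's construction is counted
    Holder h(source);
    (void)h;             // silence unused-variable warnings
    return counts;
}
```

Under these assumptions, measure<ByAssignment>() reports one default construction plus one copy assignment, while measure<ByInitList>() reports a single copy construction.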

For objects of built-in type like numTimesConsulted, there is no difference in cost between initialization and assignment, but for consistency, it's often best to initialize everything via member initialization. Similarly, you can use the member initialization list even when you want to default-construct a data member; just specify nothing as an initialization argument. For example, if ABEntry had a constructor taking no parameters, it could be implemented like this:

 ABEntry::ABEntry()
 : theName(),                         // call theName's default ctor;
   theAddress(),                      // do the same for theAddress;
   thePhones(),                       // and for thePhones;
   numTimesConsulted(0)               // but explicitly initialize
 {}                                   // numTimesConsulted to zero

Because compilers will automatically call default constructors for data members of user-defined types when those data members have no initializers on the member initialization list, some programmers consider the above approach overkill. That's understandable, but having a policy of always listing every data member on the initialization list avoids having to remember which data members may go uninitialized if they are omitted. Because numTimesConsulted is of a built-in type, for example, leaving it off a member initialization list could open the door to undefined behavior.

Sometimes the initialization list must be used, even for built-in types. For example, data members that are const or are references must be initialized; they can't be assigned (see also Item 5). To avoid having to memorize when data members must be initialized in the member initialization list and when it's optional, the easiest choice is to always use the initialization list. It's sometimes required, and it's often more efficient than assignments.
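A small sketch of the "must be initialized" case: the class below has a reference member and a const member, neither of which could be given a value by assignment in the constructor body. StepCounter and its members are hypothetical names chosen for this example.

```cpp
#include <string>

class StepCounter {
public:
    StepCounter(const std::string& label, int& total, int step)
    : theLabel(label),
      theTotal(total),     // a reference must be bound here
      theStep(step)        // a const member must be initialized here
    {}

    void bump() { theTotal += theStep; }
    const std::string& label() const { return theLabel; }

private:
    std::string theLabel;
    int& theTotal;         // refers to a counter owned by the caller
    const int theStep;
};
```

Moving either theTotal or theStep out of the initialization list and into the constructor body would be rejected by the compiler, which is the situation the paragraph above describes.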

Many classes have multiple constructors, and each constructor has its own member initialization list. If there are many data members and/or base classes, the existence of multiple initialization lists introduces undesirable repetition (in the lists) and boredom (in the programmers). In such cases, it's not unreasonable to omit entries in the lists for data members where assignment works as well as true initialization, moving the assignments to a single (typically private) function that all the constructors call. This approach can be especially helpful if the true initial values for the data members are to be read from a file or looked up in a database. In general, however, true member initialization (via an initialization list) is preferable to pseudo-initialization via assignment.
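The shared-function idea might look like the sketch below. Connection, init, and the particular default values are assumptions made for illustration; only the built-in members, where assignment costs the same as initialization, are moved into the private function.

```cpp
#include <string>

class Connection {
public:
    Connection() : theHost("localhost") { init(); }
    explicit Connection(const std::string& host) : theHost(host) { init(); }

    const std::string& host() const { return theHost; }
    int retries() const { return theRetries; }
    int timeoutMs() const { return theTimeoutMs; }

private:
    // Pseudo-initialization by assignment, shared by every constructor.
    void init() {
        theRetries = 3;
        theTimeoutMs = 1000;
    }

    std::string theHost;   // still truly initialized in each list
    int theRetries;
    int theTimeoutMs;
};
```

Note that theHost stays on each initialization list, so the string is still constructed once rather than default-constructed and then assigned.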

One aspect of C++ that isn't fickle is the order in which an object's data is initialized. This order is always the same: base classes are initialized before derived classes (see also Item 12), and within a class, data members are initialized in the order in which they are declared. In ABEntry, for example, theName will always be initialized first, theAddress second, thePhones third, and numTimesConsulted last. This is true even if they are listed in a different order on the member initialization list (something that's unfortunately legal). To avoid reader confusion, as well as the possibility of some truly obscure behavioral bugs, always list members in the initialization list in the same order as they're declared in the class.
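The fixed order can be observed by having each piece of an object append a tag to a log as it is constructed. This is a minimal sketch; initLog, Step, Base, Widget, and constructionOrder are names invented for the example.

```cpp
#include <string>

std::string initLog;   // records the order in which pieces are constructed

struct Step {
    explicit Step(char tag) { initLog += tag; }
};

struct Base {
    Base() { initLog += 'B'; }
};

struct Widget : Base {
    Step first;    // declared first, so initialized before second
    Step second;

    // The list matches declaration order. Reordering the entries would not
    // change the order in which the members are actually initialized.
    Widget() : Base(), first('1'), second('2') {}
};

std::string constructionOrder() {
    initLog.clear();
    Widget w;
    (void)w;       // silence unused-variable warnings
    return initLog;
}
```

Here the base class part is logged first, then the members in declaration order, so constructionOrder() yields "B12".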

Once you've taken care of explicitly initializing non-member objects of built-in types and you've ensured that your constructors initialize their base classes and data members using the member initialization list, there's only one more thing to worry about. That thing is (take a deep breath) the order of initialization of non-local static objects defined in different translation units.

Let's pick that phrase apart bit by bit.

A static object is one that exists from the time it's constructed until the end of the program. Stack and heap-based objects are thus excluded. Included are global objects, objects defined at namespace scope, objects declared static inside classes, objects declared static inside functions, and objects declared static at file scope. Static objects inside functions are known as local static objects (because they're local to a function), and the other kinds of static objects are known as non-local static objects. Static objects are automatically destroyed when the program exits, i.e., their destructors are automatically called when main finishes executing.

A translation unit is the source code giving rise to a single object file. It's basically a single source file, plus all of its #include files.

The problem we're concerned with, then, involves at least two separately compiled source files, each of which contains at least one non-local static object (i.e., an object that's global, at namespace scope, or static in a class or at file scope). And the actual problem is this: if initialization of a non-local static object in one translation unit uses a non-local static object in a different translation unit, the object it uses could be uninitialized, because the relative order of initialization of non-local static objects defined in different translation units is undefined.

An example will help. Suppose you have a FileSystem class that makes files on the Internet look like they're local. Since your class makes the world look like a single file system, you might create a special object at global or namespace scope representing the single file system:

 class FileSystem {                    // from your library
 public:
   ...
   std::size_t numDisks() const;       // one of many member functions
   ...
 };

 extern FileSystem tfs;                // object for clients to use;
                                       // "tfs" = "the file system"

A FileSystem object is decidedly non-trivial, so use of the tfs object before it has been constructed would be disastrous.

Now suppose some client creates a class for directories in a file system. Naturally, their class uses the tfs object:

 class Directory {                       // created by library client
 public:
   Directory( params );
   ...
 };

 Directory::Directory( params )
 {
   ...
   std::size_t disks = tfs.numDisks();   // use the tfs object
   ...
 }

Further suppose this client decides to create a single Directory object for temporary files:

 Directory tempDir( params );           // directory for temporary files 

Now the importance of initialization order becomes apparent: unless tfs is initialized before tempDir, tempDir's constructor will attempt to use tfs before it's been initialized. But tfs and tempDir were created by different people at different times in different source files; they're non-local static objects defined in different translation units. How can you be sure that tfs will be initialized before tempDir?

You can't. Again, the relative order of initialization of non-local static objects defined in different translation units is undefined. There is a reason for this. Determining the "proper" order in which to initialize non-local static objects is hard. Very hard. Unsolvably hard. In its most general form, with multiple translation units and non-local static objects generated through implicit template instantiations (which may themselves arise via implicit template instantiations), it's not only impossible to determine the right order of initialization, it's typically not even worth looking for special cases where it is possible to determine the right order.

Fortunately, a small design change eliminates the problem entirely. All that has to be done is to move each non-local static object into its own function, where it's declared static. These functions return references to the objects they contain. Clients then call the functions instead of referring to the objects. In other words, non-local static objects are replaced with local static objects. (Aficionados of design patterns will recognize this as a common implementation of the Singleton pattern.)

This approach is founded on C++'s guarantee that local static objects are initialized when the object's definition is first encountered during a call to that function. So if you replace direct accesses to non-local static objects with calls to functions that return references to local static objects, you're guaranteed that the references you get back will refer to initialized objects. As a bonus, if you never call a function emulating a non-local static object, you never incur the cost of constructing and destructing the object, something that can't be said for true non-local static objects.
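The guarantee, and the "never called, never constructed" bonus, can be sketched with a counter that records how many times the object is built. Cache, cache, and constructions are hypothetical names used only for this illustration.

```cpp
int constructions = 0;   // how many Cache objects have been constructed

class Cache {
public:
    Cache() { ++constructions; }
};

Cache& cache() {
    static Cache instance;   // constructed the first time control reaches here
    return instance;         // later calls return the same object
}
```

Before the first call to cache(), constructions is still zero; after any number of calls it is one, and every call hands back a reference to the same object.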

Here's the technique applied to both tfs and tempDir:

 class FileSystem { ... };           // as before

 FileSystem& tfs()                   // this replaces the tfs object; it could be
 {                                   // static in the FileSystem class
   static FileSystem fs;             // define and initialize a local static object
   return fs;                        // return a reference to it
 }

 class Directory { ... };            // as before

 Directory::Directory( params )      // as before, except references to tfs are
 {                                   // now to tfs()
   ...
   std::size_t disks = tfs().numDisks();
   ...
 }

 Directory& tempDir()                // this replaces the tempDir object; it
 {                                   // could be static in the Directory class
   static Directory td;              // define/initialize local static object
   return td;                        // return reference to it
 }

Clients of this modified system program exactly as they used to, except they now refer to tfs() and tempDir() instead of tfs and tempDir. That is, they use functions returning references to objects instead of using the objects themselves.
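Because the FileSystem and Directory code above is elided, here is a separate, compilable sketch of the same dependency pattern using stand-in types. Settings, Logger, settings, logger, and the order log are all invented for the example; the point is that the object being depended on is constructed first, on demand, regardless of where the two functions are defined.

```cpp
#include <string>

std::string order;   // records which object finished construction first

class Settings {
public:
    Settings() { order += "settings;"; }
    int verbosity() const { return 2; }
};

Settings& settings() {
    static Settings s;
    return s;
}

class Logger {
public:
    // Calling settings() here forces the Settings object to exist
    // before this constructor reads from it.
    Logger() : theLevel(settings().verbosity()) { order += "logger;"; }
    int level() const { return theLevel; }

private:
    int theLevel;
};

Logger& logger() {
    static Logger l;
    return l;
}
```

A first call to logger() therefore constructs the Settings object, then the Logger, so the log reads "settings;logger;".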

The reference-returning functions dictated by this scheme are always simple: define and initialize a local static object on line 1, return it on line 2. This simplicity makes them excellent candidates for inlining, especially if they're called frequently (see Item 30). On the other hand, the fact that these functions contain static objects makes them problematic in multithreaded systems. Then again, any kind of non-const static object (local or non-local) is trouble waiting to happen in the presence of multiple threads. One way to deal with such trouble is to manually invoke all the reference-returning functions during the single-threaded startup portion of the program. This eliminates initialization-related race conditions.
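The startup-time invocation could be sketched as below. The FileSystem and Directory classes here are simplified stand-ins (the real ones are elided in the text), and initializeStatics is a hypothetical helper name; it is meant to be called once, early in main, before any threads are created.

```cpp
#include <cstddef>

class FileSystem {                      // simplified stand-in
public:
    std::size_t numDisks() const { return 1; }
};

FileSystem& tfs() {
    static FileSystem fs;
    return fs;
}

class Directory {                       // simplified stand-in
public:
    Directory() : theDisks(tfs().numDisks()) {}
    std::size_t disks() const { return theDisks; }

private:
    std::size_t theDisks;
};

Directory& tempDir() {
    static Directory td;
    return td;
}

// Touch every reference-returning function once, while the program is
// still single-threaded, so no later call has to construct anything.
void initializeStatics() {
    tfs();
    tempDir();
}
```

As an aside beyond the original text: since C++11 the language itself guarantees that a local static is initialized exactly once even when several threads reach it concurrently, so this warm-up matters mainly for older compilers. It does nothing to protect later concurrent use of the objects.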

Of course, the idea of using reference-returning functions to prevent initialization order problems is dependent on there being a reasonable initialization order for your objects in the first place. If you have a system where object A must be initialized before object B, but A's initialization is dependent on B's having already been initialized, you are going to have problems, and frankly, you deserve them. If you steer clear of such pathological scenarios, however, the approach described here should serve you nicely, at least in single-threaded applications.

To avoid using objects before they're initialized, then, you need to do only three things. First, manually initialize non-member objects of built-in types. Second, use member initialization lists to initialize all parts of an object. Finally, design around the initialization order uncertainty that afflicts non-local static objects defined in separate translation units.

Things to Remember

  • Manually initialize objects of built-in type, because C++ only sometimes initializes them itself.

  • In a constructor, prefer use of the member initialization list to assignment inside the body of the constructor. List data members in the initialization list in the same order they're declared in the class.

  • Avoid initialization order problems across translation units by replacing non-local static objects with local static objects.




Effective C++ Third Edition 55 Specific Ways to Improve Your Programs and Designs
ISBN: 321334876
EAN: N/A
Year: 2006
Pages: 102