Item 21: Don't try to return a reference when you must return an object

Once programmers grasp the efficiency implications of pass-by-value for objects (see Item 20), many become crusaders, determined to root out the evil of pass-by-value wherever it may hide. Unrelenting in their pursuit of pass-by-reference purity, they invariably make a fatal mistake: they start to pass references to objects that don't exist. This is not a good thing.

Consider a class for representing rational numbers, including a function for multiplying two rationals together:

    class Rational {
    public:
      Rational(int numerator = 0,               // see Item 24 for why this
               int denominator = 1);            // ctor isn't declared explicit
      ...

    private:
      int n, d;                                 // numerator and denominator

      friend
        const Rational                          // see Item 3 for why the
          operator*(const Rational& lhs,        // return type is const
                    const Rational& rhs);
    };

This version of operator* is returning its result object by value, and you'd be shirking your professional duties if you failed to worry about the cost of that object's construction and destruction. You don't want to pay for such an object if you don't have to. So the question is this: do you have to pay?

Well, you don't have to if you can return a reference instead. But remember that a reference is just a name, a name for some existing object. Whenever you see the declaration for a reference, you should immediately ask yourself what it is another name for, because it must be another name for something. In the case of operator*, if the function is to return a reference, it must return a reference to some Rational object that already exists and that contains the product of the two objects that are to be multiplied together. There is certainly no reason to expect that such an object exists prior to the call to operator*.
That is, if you have

    Rational a(1, 2);                           // a = 1/2
    Rational b(3, 5);                           // b = 3/5

    Rational c = a * b;                         // c should be 3/10

it seems unreasonable to expect that there already happens to exist a rational number with the value three-tenths. No, if operator* is to return a reference to such a number, it must create that number object itself.

A function can create a new object in only two ways: on the stack or on the heap. Creation on the stack is accomplished by defining a local variable. Using that strategy, you might try to write operator* this way:

    const Rational& operator*(const Rational& lhs,   // warning! bad code!
                              const Rational& rhs)
    {
      Rational result(lhs.n * rhs.n, lhs.d * rhs.d);
      return result;
    }

You can reject this approach out of hand, because your goal was to avoid a constructor call, and result will have to be constructed just like any other object. A more serious problem is that this function returns a reference to result, but result is a local object, and local objects are destroyed when the function exits. This version of operator*, then, doesn't return a reference to a Rational: it returns a reference to an ex-Rational; a former Rational; the empty, stinking, rotting carcass of what used to be a Rational but is no longer, because it has been destroyed. Any caller so much as glancing at this function's return value would instantly enter the realm of undefined behavior. The fact is, any function returning a reference to a local object is broken. (The same is true for any function returning a pointer to a local object.)

Let us consider, then, the possibility of constructing an object on the heap and returning a reference to it. Heap-based objects come into being through the use of new, so you might write a heap-based operator* like this:

    const Rational& operator*(const Rational& lhs,   // warning! more bad
                              const Rational& rhs)   // code!
    {
      Rational *result = new Rational(lhs.n * rhs.n, lhs.d * rhs.d);
      return *result;
    }

Well, you still have to pay for a constructor call, because the memory allocated by new is initialized by calling an appropriate constructor, but now you have a different problem: who will apply delete to the object conjured up by your use of new?

Even if callers are conscientious and well intentioned, there's not much they can do to prevent leaks in reasonable usage scenarios like this:

    Rational w, x, y, z;

    w = x * y * z;                              // same as operator*(operator*(x, y), z)

Here, there are two calls to operator* in the same statement, hence two uses of new that need to be undone with uses of delete. Yet there is no reasonable way for clients of operator* to make those calls, because there's no reasonable way for them to get at the pointers hidden behind the references being returned from the calls to operator*. This is a guaranteed resource leak.

But perhaps you notice that both the on-the-stack and on-the-heap approaches suffer from having to call a constructor for each result returned from operator*. Perhaps you recall that our initial goal was to avoid such constructor invocations. Perhaps you think you know a way to avoid all but one constructor call. Perhaps the following implementation occurs to you, an implementation based on operator* returning a reference to a static Rational object, one defined inside the function:

    const Rational& operator*(const Rational& lhs,   // warning! yet more
                              const Rational& rhs)   // bad code!
    {
      static Rational result;                   // static object to which a
                                                // reference will be returned

      result = ... ;                            // multiply lhs by rhs and put the
                                                // product inside result
      return result;
    }

Like all designs employing the use of static objects, this one immediately raises our thread-safety hackles, but that's its more obvious weakness.
To see its deeper flaw, consider this perfectly reasonable client code:

    bool operator==(const Rational& lhs,        // an operator==
                    const Rational& rhs);       // for Rationals

    Rational a, b, c, d;

    ...

    if ((a * b) == (c * d)) {
      // do whatever's appropriate when the products are equal
    } else {
      // do whatever's appropriate when they're not
    }

Guess what? The expression ((a*b) == (c*d)) will always evaluate to true, regardless of the values of a, b, c, and d!

This revelation is easiest to understand when the code is rewritten in its equivalent functional form:

    if (operator==(operator*(a, b), operator*(c, d)))

Notice that when operator== is called, there will already be two active calls to operator*, each of which will return a reference to the static Rational object inside operator*. Thus, operator== will be asked to compare the value of the static Rational object inside operator* with the value of the static Rational object inside operator*. It would be surprising indeed if they did not compare equal. Always.

This should be enough to convince you that returning a reference from a function like operator* is a waste of time, but some of you are now thinking, "Well, if one static isn't enough, maybe a static array will do the trick...."

I can't bring myself to dignify this design with example code, but I can sketch why the notion should cause you to blush in shame. First, you must choose n, the size of the array. If n is too small, you may run out of places to store function return values, in which case you'll have gained nothing over the single-static design we just discredited. But if n is too big, you'll decrease the performance of your program, because every object in the array will be constructed the first time the function is called. That will cost you n constructors and n destructors, even if the function in question is called only once. If "optimization" is the process of improving software performance, this kind of thing should be called "pessimization."
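Returning for a moment to the single-static design, the always-true comparison can be reproduced with a small sketch. The cut-down Rational here, the cross-multiplying operator==, and the helper productsCompareEqual are assumptions made for illustration, not code from this Item; the body of operator* simply fills in the elided assignment in the most direct way.

```cpp
// Hypothetical, cut-down Rational: just enough to reproduce the flaw.
class Rational {
public:
    Rational(int numerator = 0, int denominator = 1)
        : n(numerator), d(denominator) {}

private:
    int n, d;   // numerator and denominator

    friend const Rational& operator*(const Rational& lhs, const Rational& rhs);
    friend bool operator==(const Rational& lhs, const Rational& rhs);
};

// Warning! Bad code, shown only to demonstrate the problem:
// every call returns a reference to the same static object.
const Rational& operator*(const Rational& lhs, const Rational& rhs)
{
    static Rational result;
    result = Rational(lhs.n * rhs.n, lhs.d * rhs.d);
    return result;
}

// Assumed equality test: compare by cross-multiplication.
bool operator==(const Rational& lhs, const Rational& rhs)
{
    return lhs.n * rhs.d == rhs.n * lhs.d;
}

// Hypothetical helper wrapping the client expression (a * b) == (c * d).
bool productsCompareEqual(const Rational& a, const Rational& b,
                          const Rational& c, const Rational& d)
{
    // Both operands refer to the one static object, so operator==
    // ends up comparing that object with itself.
    return (a * b) == (c * d);
}
```

With this sketch, productsCompareEqual(Rational(1, 2), Rational(3, 5), Rational(2, 3), Rational(4, 7)) returns true even though 3/10 and 8/21 are different values, because whichever product is computed last overwrites the other before the comparison runs.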
Finally, think about how you'd put the values you need into the array's objects and what it would cost you to do it. The most direct way to move a value between objects is via assignment, but what is the cost of an assignment? For many types, it's about the same as a call to a destructor (to destroy the old value) plus a call to a constructor (to copy over the new value). But your goal is to avoid the costs of construction and destruction! Face it: this approach just isn't going to pan out. (No, using a vector instead of an array won't improve matters much.)

The right way to write a function that must return a new object is to have that function return a new object. For Rational's operator*, that means either the following code or something essentially equivalent:

    inline const Rational operator*(const Rational& lhs, const Rational& rhs)
    {
      return Rational(lhs.n * rhs.n, lhs.d * rhs.d);
    }

Sure, you may incur the cost of constructing and destructing operator*'s return value, but in the long run, that's a small price to pay for correct behavior. Besides, the bill that so terrifies you may never arrive. Like all programming languages, C++ allows compiler implementers to apply optimizations to improve the performance of the generated code without changing its observable behavior, and it turns out that in some cases, construction and destruction of operator*'s return value can be safely eliminated. When compilers take advantage of that fact (and compilers often do), your program continues to behave the way it's supposed to, just faster than you expected.

It all boils down to this: when deciding between returning a reference and returning an object, your job is to make the choice that offers correct behavior. Let your compiler vendors wrestle with figuring out how to make that choice as inexpensive as possible.

Things to Remember