Solution

I l @ ve RuBoard

graphics/bulb.gif

Prelude

Little Stanislaus watched his mother as she rummaged busily in the refrigerator. "Be a dear, Stan," she said to him, without looking around, "and get me a container I can pour this juice into."

Stanislaus was a good boy, and he was pretty sure he knew what a container was. He looked about and saw just the thing ”a bright and shiny plastic bowl with a mesh bottom. Well, little Stanislaus knew that a bowl was a container, so happily he fetched it and held it out for his mother. She, busy, started to pour the juice before she'd quite got a good look at what she was pouring it into, and as Stanislaus held the sieve with a helpful smile, the juice ran merrily into the sieve and through it, onto the floor.

Stanislaus got a scolding that day. He didn't really think it was his fault, though. The bowl with the mesh bottom looked a lot like the other bowls. How was he to know it didn't meet his mother's container requirements?

In this Item, we'll take a look at some of the C++ standard's container requirements, some reasons to use the technique of taking pointers into a container, and ”to your possible consternation and likely surprise ”a standard container that isn't really a container at all.

Why Take Pointers or References into Containers?

1. Consider the following code:

 // Example 6-1(a): Is this code valid? safe? good? //  vector<char> v;   // ... populate v ...   char* p = &v[0];   // ... do something with *p ...  

Is this code valid?

The standard guarantees that if a conforming sequence (such as vector<char> ) provides operator[]() , that operator must return an lvalue of type char , which in turn can have its address taken. [6] So the answer to the question is: Yes, as long as v is not empty, the code in Example 6-1(a) is perfectly valid.

[6] In this Item, I use different phrases for this term ”for example, "pointer into a container," "pointer to an object inside a container," and "pointer to a contained object" ”but they are all intended to mean the same thing.

Is this code safe? Some C++ programmers are initially surprised at the idea of taking a pointer into a container, but the answer is: Yes, it's safe, as long as we remain aware of when the pointer might be invalidated, which is pretty much whenever an equivalent iterator would be invalidated. For example, clearly, if we decide to start inserting or erasing elements of the container, not only iterators but also any pointers into the container will be invalidated as the underlying memory moves.

But does this code make sense? Again, yes; it can make perfect sense to have pointers or references into a container. One common case is when you read a data structure into memory once, say on program startup, and thereafter you never modify it, but you do need to access it in different ways. In that case, it can make sense to have additional data structures that contain pointers into the main container to optimize different access methods . I'll give a simple example in the next section.

How Could the Code Be Improved?

Whether it's valid or not, how could it be improved?

Example 6-1(a) could be improved in the following way:

 // Example 6-1(b): An improvement (when it's possible) // vector<char> v; // ... populate v ...  vector<char>::iterator i = v.begin();  // ... do something with *i ... 

In general, it's not a bad guideline to prefer using iterators instead of pointers when you want to point at an object that's inside a container. After all, iterators are invalidated at mostly the same times and in the same ways as pointers, and one reason iterators exist is to provide a way to "point" at a contained object. If you have a choice, prefer to use iterators into containers.

Unfortunately, you can't always get the same effect with iterators that you get with pointers into a container. There are two main potential drawbacks to the iterator method, and when either applies we have to continue to use pointers:

  1. You can't always conveniently use an iterator where you can use a pointer (see example below).

  2. Using iterators might incur extra space and performance overhead when the iterator is an object and not just a bald pointer.

For example, say you have a map< Name ,PhoneNumber> that is loaded once at program startup and thereafter is only queried. The idea is that, given a name, this makes it easy to look up the corresponding phone number in the fixed dictionary. But what if you need to do the reverse lookup too? A clean solution could be to build a second structure, perhaps a map<PhoneNumber*,Name*,Deref> that enables the reverse lookup but avoids doubling the storage overhead, because, using pointers, there's no need to store each name and phone number twice. The second structure simply has pointers into the first.

That's fine, and it will work well, but note that this effect would be more difficult to achieve using iterators instead of pointers. Why? Because the natural iterator candidate, map<Name,PhoneNumber>::iterator , points to a pair<Name,PhoneNumber> , and there's no handy way to get an iterator to just the name or phone number part individually. (We could store the entire iterator and always explicitly say ->first and ->second everywhere, but that's inconvenient to code, and it would mean that the reverse-lookup map would have to be redesigned or replaced by a different structure.)

This brings us to the next (and in my opinion, the most interesting) point of this Item's theme:

2. Now consider the following code:

 // Example 6-2: Is this code valid? //  template<typename T>   void f( T& t )   {   typename T::value_type* p1 = &t[0];   typename T::value_type* p2 = &*t.begin();   // ... do something with *p1 and *p2 ...   }  

Before reading on, think about these questions: Is this code valid? If yes, under what conditions is it valid? Specifically, what kinds of things can we say about a T that makes this code valid? (Ignore runtime considerations, such as whether t happens to be in a suitable state to have t[0] called on it. We're interested in program legality here.)

When Is a Container Not a Container?

Well, is Example 6-2 valid? In short, yes, it can be valid.

The longer answer involves thinking about what kinds of T would make the code valid: What characteristics and abilities must a suitable T have? Let's do some detective work:

  1. To make the expression &t[0] valid, T::operator[]() must exist and must return something that understands operator&() , which in turn must return a valid T::value_type* (or something that can be meaningfully converted to a valid T::value_type* ).

    In particular, this is true of containers that meet the standard's container and sequence requirements and implement the optional operator[]() , because that operator must return a reference to the contained object. By definition, you can then take the contained object's address.

  2. To make the expression &*t . begin() valid, T::begin() must exist, and it must return something that understands operator*() , which in turn must return something that understands operator& , which in turn must return a valid T::value_type* (or something that can be meaningfully converted to a valid T::value_type* ).

    In particular, this is true of containers whose iterators meet the standard's iterator requirements, because the iterator returned by begin() must, when dereferenced using operator*() , return a reference to the contained object. By definition, you can then take the contained object's address.

Which Brings Us to the Awkward Part

So here's the thing: The code in Example 6-2 will work for any container in the standard library that supports operator[]() . If you take away the line containing " &t[0] ", it will work for every container in the standard library, except for std::vector<bool> .

To see why, note that the following template works for every type T except bool .

 // Example 6-3: Also works for every T except bool // template<typename T> void g( vector<T>& v ) {   T* p = &v.front();   // ... do something with *p ... } 

Do you see why? At first (or even second and third) glance, it may seem odd that this is valid for all types but one. What makes vector<bool> so special?

The reason is as simple as it is unfortunate: Not all the templates defined in the C++ standard library that look like containers actually are containers. In particular, the standard library requires a specialization of vector for bool , and the specialization

vector<bool> is not a container

in that it does not meet the standard library's requirements for containers. True, vector<bool> does appear in the "containers and sequences" part of the standard; and, true, there's no note to indicate that it's really neither a container nor a sequence. But the fact is that vector<bool> is not a container, and so it should not be surprising that it cannot always be used as one. [7]

[7] If anyone else had written vector<bool> , it would have been called " nonconforming " and "nonstandard." But it's in the standard, so it's harder to call it those names at this point. One correct solution to this discrepancy would be to remove clause 23.2.5, the vector<bool> specialization requirement, so that vector<bool> would be just another instantiation of the primary vector<> template, and so would really be what it claims to be ” a vector of plain old bool s. (Besides, many aspects of vector<bool> are redundant; std::bitset was designed for this kind of packed-representation work.)

If this state of affairs seems a little strange to you, you're not alone, but there is a logical explanation.

The Life and Times of vector<bool>

The vector<bool> specialization is commonly (mis)used as an example right from the C++ standard showing how to write a proxied container. A "proxied container" is a container whose objects aren't accessed or manipulated directly. Instead of giving you pointers or references to a contained object, a proxied container gives you proxy objects that can be used to indirectly access or manipulate a contained object. Proxied collections can be useful in cases where the objects within the collection cannot always be reliably accessed directly as though they were in memory ”as, for example, with a disk-based collection that automatically pages pieces of itself in and out of memory under the covers as needed. So the mistaken idea that many people have is that vector<bool> shows how to make such a proxied collection meet the requirements of a "container" in the sense defined by the standard library.

The fly in the ointment is that vector<bool> is not a "container" in the sense of meeting the standard's container and iterator requirements. [8] Proxied containers are categorically disallowed by the standard's container and iterator requirements. For example, a container<T>::reference must be a true reference ( T& ), never a proxy. Iterators have similar requirements, so dereferencing a forward, bidirectional, or random-access iterator must yield a true reference ( T& ), never a proxy. These points preclude any proxy-based containers from meeting the standard's current container requirements.

[8] Obviously it's a container in the general sense of being a collection that you can put things into and take things out of.

The reason vector<bool> is of necessity nonconforming is that it attempts to do extra work invisibly , under the covers, in an attempt to optimize for space. A normal bool object is at least as big as a char ; sizeof(bool) is implementation-defined and will vary from compiler to compiler, but it must be at least 1. Does that seem extravagant? Some people thought so, so vector<bool> tries to be more efficient about the amount of storage it uses. Instead of storing a full char or int for every contained " bool ," it packs the bool s and stores them as individual bits (inside, say, char s or int s) in its internal representation. This gives vector<bool> a space advantage of at least 8:1 on platforms with 8-bit "normal" bool s, and possibly more. (By this point, some of you may already have noticed that this optimization makes the name vector<bool> rather misleading. The things inside aren't really normal bool s at all.)

One consequence of this packed representation is that vector<bool> clearly can't just return a normal bool& from its operator[] or its dereferenced iterators. [9] Instead, it has to return a "reference-like" proxy object that in many ways looks and smells like a bool& but which is not a bool& . Unfortunately, that can also make access into a vector<bool> slower, because we have to deal with proxies instead of direct pointers and references (not to mention the extra bit-fiddling). So vector<bool> is not a pure optimization, but a trade-off that favors "less space" at the expense of " potentially slower speed." More about this in a moment.

[9] This is because there is no standard way to express a pointer or a reference to a bit.

For example, whereas the functions vector<T>::operator[]() and vector<T>:: front() normally return simply a T& (a reference to the contained T ), vector<bool>::operator[]() and vector<bool>::front() actually return a "reference-like" proxy object of type vector<bool>::reference . To make this proxy object look and feel as much like a real bool& as possible, reference provides an implicit conversion to bool , operator bool() , so that a reference can be used most anywhere a bool can be used, at least as a value. It also provides member functions operator=(bool) , operator=(reference&) , and flip() , which can be used to indirectly change the value of the bool that's actually inside the container, enabling natural coding styles such as "v[0] = v[1];" . (For greater amusement , note that the one service vector<bool>::reference can't ever provide is a bool* operator&(); . That is, it can't provide a suitable address-of operator because there's no way to take the address of the individual bit inside the vector<bool> for which the reference object is serving as proxy.)

Proxied Containers and STL Assumptions

Bottom line: std::vector<bool> may be standard, but it's not a container. (It's also not a particularly good bit vector, because it's missing a number of bit-twiddling operations that would be appropriate and which are present in std::bitset .)

Further, both the original STL's container requirements and the C++ standard's container requirements are based on the implicit assumption (among others) that dereferencing an iterator: (a) is a constant-time operation; and (b) requires negligible time compared to other operations. Neither of these assumptions is true for a disk-based container or a packed-representation container. For a disk-based container, access through the proxy object may require a disk seek, which can be orders of magnitude slower than main memory access. To illustrate , consider that you would be unlikely to apply a standard algorithm such as std::find() to a disk-based container; the performance would be abysmal compared with a special-purpose replacement, largely because the fundamental performance assumptions for in-memory containers do not apply to disk-based containers. (For similar reasons, even in-memory containers, such as std::map , provide their own find() as a member function.) For a packed-representation container like vector<bool> , access through the proxy object requires bitwise operations, which are generally much slower than manipulation of a native type like an int . Further, whenever the proxy object construction and destruction can't be completely optimized away by the compiler, managing the proxies themselves adds further overhead.

Although vector<bool> was in part intended to be an example of how to write a proxied container, so that other programmers could follow its example when writing disk-based or other containers whose contained objects cannot reliably be accessed directly, it also serves to demonstrate that the standard's current container requirements do not allow proxied containers.

Proxied collections are a useful tool and can often be appropriate, especially for collections that can become very large. Every programmer should know about them. They just don't fit as well into STL as many people thought, that's all. Even vector<bool> is still a perfectly good model for how to write a proxied collection; it's just not a "container" in the STL sense, and it should be called something else (some people asked for " bitvector " ) so that people wouldn't think that it has anything to do with a conforming container.

Beware Premature Optimization

If you've read Exceptional C++ , you're already familiar with my regular harangues against premature optimization. [10] The rules boil down to: "(1) Don't optimize early. (2) Don't optimize until you know that it's needed. (3) Even then, don't optimize until you know what's needed, and where . " As others have put it even more succinctly:

[10] See also Item 12.

  • The First Rule of Optimization: Don't do it.

  • The Second Rule of Optimization: Don't do it yet.

By and large, programmers ”that includes you and me ”are notoriously bad at guessing the actual space/time performance bottlenecks in their own code. If you don't have performance profiles or other empirical evidence to guide you, you can easily spend days optimizing something that doesn't need optimizing and that won't measurably affect runtime space or time performance. What's even worse , however, is that when you don't understand what needs optimizing, you may actually end up pessimizing (degrading your program) by saving a small cost while unintentionally incurring a large cost. [11] Once you've run performance profiles and other tests, and you actually know that a particular optimization will help you in your situation, then it's the right time to optimize.

[11] For an interesting example of this, see also Appendix A "Optimizations That Aren't (in a Multithreaded World)" for a case in point. That appendix dissects a popular optimization that turns out to be, quite unintentionally, an optimization principally for single-threaded environments, and which can actually be a distinct pessimization for (possibly) multithreaded code.

By now, you can probably see where I'm going with this: The most premature optimization of all is an optimization that's enshrined in the standard and that can't easily be avoided. In particular, vector<bool> intentionally favors "less space" at the expense of "slower speed," and forces this optimization choice on all programs. This optimization choice implicitly assumes that virtually all users of a vector of bool s will prefer using less space at the expense of potentially slower speed, that they will be more space-constrained than performance-constrained, and so on. This is clearly untrue; on many popular implementations , a bool is the same size as an 8-bit char , and a vector of 1,000 bool s consumes about 1K of memory. Saving that 1K is unimportant in many applications. (Yes, clearly a 1K space saving is important in some environments, such as embedded systems, and clearly some applications may manipulate vector s containing millions of bool s, where the saved megabytes are real and significant.) The point is that the correct optimization depends on the application. If you are writing an application that manipulates a vector<bool> with 1,000 entries in an inner loop, it is much more likely that you would benefit from potentially faster raw performance due to reduced CPU workload (without the overhead of proxy objects and bit-fiddling) than from the marginal space savings, even if the space savings would reduce cache misses.

So What Should You Do?

If you are using vector<bool> in your code, you may be using it with algorithms that expect a real container (and that, hence, may or may not work portably), and you are using an optimization that favors space over time (possibly imperceptibly). Either point might be inappropriate for your application. The easy way to discover the first is to use the code until it fails to compile; that day may never come. The only way to discover the second is by running empirical tests to measure your code's performance the way it uses vector<bool> . In many cases, the performance difference compared to something like vector<int> , if there is one, may well be negligible.

If you are being affected by vector<bool> 's noncontainerness, or if the measured performance difference is not negligible in your environment and the difference is material in your application, then don't use vector<bool> . You might at first think of using a vector<char> or vector<int> instead (that is, store a bool as a char or an int ), and use casts when setting values in the container, but that's tedious . A superior workaround is to use a deque<bool> ; it's a far simpler solution, and ties in with the advice about deque given in Item 7.

In summary:

  1. vector<bool> is not a container.

  2. vector<bool> attempts to illustrate how to write standard-conforming proxied containers that do extra work invisibly under the covers. Unfortunately, that's not a sound idea, because although a proxied collection can be an important and useful tool, by definition it must violate the standard's container requirements and therefore can never be a conforming container (see #1).

    Corollary: Go ahead and write proxied collections, but don't attempt to write ones that still meet the standard's container or sequence requirements. First, it's not possible. Second, the main reason for collections to conform to the standard container requirements is to be used with the standard algorithms, yet the standard algorithms are typically inappropriate for proxied containers because proxied containers have different performance characteristics than do plain-and-in-memory containers.

  3. vector<bool> 's name is a trifle misleading because the things inside aren't even standard bool s. A standard bool is at least as big as a char so that it can be used "normally." So, in fact, vector<bool> does not even really store bool s, despite the name.

  4. vector<bool> forces a specific optimization on all users by enshrining it in the standard. That's probably not a good idea, even if the actual performance overhead turns out to be negligible for a given compiler for most applications; different users have different requirements.

A last word of advice: Optimize wisely . Never succumb to the temptation to perform premature optimizations until you have empirical evidence that the optimization is beneficial in your situation.

I l @ ve RuBoard


More Exceptional C++
More Exceptional C++: 40 New Engineering Puzzles, Programming Problems, and Solutions
ISBN: 020170434X
EAN: 2147483647
Year: 2001
Pages: 118
Authors: Herb Sutter

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net