Solution

I l @ ve RuBoard

graphics/bulb.gif

This Item answers the following questions:

  • Can you use a vector interchangeably with a C-style array (say, with an existing function that expects an array)? How? What are the issues?

  • Why would you usually prefer using a vector instead of a deque ?

  • How do you shrink a vector whose capacity has grown larger than you need? In particular, how do you "shrink-to-fit" (so that the vector's size is just big enough to hold only its current contents), or clear a vector 's internal storage entirely (so that the vector is truly empty, with no contents and little or no under-the-covers preallocated storage)? Can the same technique be used for other containers?

Using a vector as an Array: A Motivating Example

C++ programmers are strongly encouraged to use the standard vector template instead of C-style arrays. Using a vector is easier and safer than using an array, because a vector is a better-abstracted and better-encapsulated form of containerfor example, a vector can grow and shrink as needed, whereas an array has a fixed size.

Nonetheless, there's still a lot of code out there that is designed to work with C-style arrays. For example, consider the following code:

 // Example 7-1(a): A function that operates // on a C-style array // int FindCustomer(const char* szName,     // name to search for   Customer*   pCustomers, // pointer to beginning                           //  of a Customer array   size_t      nLength)   //  with nLength entries {   // performs a (possibly optimized) search and   // returns the index at which the specified   // name was found } 

To use the above function, we might create, populate, and pass in an array as follows :

 // Example 7-1(b): Using an array // Customer c[100];   //-- populate the contents of c somehow - int i = FindCustomer("Fred Jones", &c[0], 100); 

Example 7-1(b) is all well and good, but as modern C++ programmers aren't we supposed to use vector s instead of arrays? Indeed, wouldn't it be nice if we could have the best of both worlds use a vector for its convenience and safety, yet still use the vector 's contents with functions that expect an array? This desire comes up most often when an organization is migrating to C++ and people want to write new parts of the system in the "new and better" way while still tying in easily with a legacy code base.

So let's use a vector . Our first attempt could look something like this:

 // Example 7-1(c): Using a vector -- will this work? // vector<Customer> c; //-- populate the contents of c somehow - int i = FindCustomer("Fred Jones", &c[0], c.size()); 

In Example 7-1(c), when it comes time to call FindCustomer() , the idea is to pass in the address of the first element, and the number of elements, which is pretty much just what we did in Example 7-1(b).

Before reading on, stop for a moment and consider these questions: What potential benefits does the code in Example 7-1(c) have over the code in Example 7-1(b)? Do you see any potential problems with Example 7-1(c), or do you think that it will work as intended?

A Fly in the Ointment?

I didn't just pull Example 7-1(c) out of a hat. It turns out there are a lot of real-world C++ programmers who have done exactly this, and with good reason, because code like that in Example 7-1(c) illustrates some of the important benefits that come from using a vector :

  • We don't have to know the potential size of c in advance. In many C programs, arrays are allocated pessimistically so that they'll be big enough for most intended uses. Unfortunately, this means that space is needlessly wasted much of the time when the full size isn't needed, after all. (Worse, it means the program breaks if it ever ends up needing more space than we'd thought.) With a C++ vector , we can grow the container as needed instead of just guessing. Also, if we do happen to know in advance how many objects we're going to put into the vector , we can simply say so (by first calling c . reserve(100) , if we want to reserve space for 100 elements), so there's no performance disadvantage with a vector compared to an array, because we have a way to avoid repeated reallocations as we insert new elements into the vector .

  • We don't have to separately track the actual length of the array so that we can pass it as the final ( nLength ) parameter to FindCustomer() . Yes, you can use a C array and use workarounds that make remembering or recalculating the length easier, [12] but none of those workarounds are as easy and safe as just writing c . size() .

    [12] The most typical approaches are to use a const value or a #defined name for the size of the array, or else a macro that expands to sizeof(c)/sizeof(c[0]) to compute the size of the array.

In short, as far as I know, Example 7-1(c) has always worked on every implementation of std::vector that has ever existed.

Until 2001, a minor fly in the ointment was that the 1998 C++ standard [C++98] didn't actually guarantee that Example 7-1(c) would work portably on every C++ platform you might buy. This is because Example 7-1(c) assumes that the internal contents of the vector are stored contiguously in the same format as is an array, but the original C++ standard didn't require vendors to implement vector that way. If, for example, a vector implementation were to play games and store its contents in some other way (such as contiguously but in backwards order, or in sequential order but with an extra byte of housekeeping information between elements), then Example 7-1(c) would, in fact, fail miserably. This was fixed in 2001 as part of Technical Corrigendum #1 to the C++ standard, which now requires that the contents of a vector are stored contiguously, just like an array.

Even if your compiler doesn't include support for the Technical Corrigendum yet, though, Example 7-1(c) should work in practice today. Again, I don't know of a commercial vector implementation that doesn't store its elements contiguously.

Bottom line: Prefer using vector (or deque ; see below) instead of arrays.

In Most Cases, Prefer Using vector

1. In the standard library, vector and deque provide similar services. Which should you typically use? Why? Under what circumstances would you use the other?

The C++ standard [C++98], in section 23.1.1, offers some advice on which containers to prefer. It says:

vector is the type of sequence that should be used by default. deque is the data structure of choice when most insertions and deletions take place at the beginning or at the end of the sequence.

vector and deque offer nearly identical interfaces and are generally interchangeable. deque also offers push_front() and pop_front() , which vector does not. (True, vector offers capacity() and reserve() , which deque does not, but that's no loss. Those functions are actually something of a weakness in vector , as I'll demonstrate in a moment.)

The main structural difference between vector and deque lies in the way the two containers organize their internal storage. Under the covers, a deque allocates its storage in pages, or " chunks ," with a fixed number of contained elements in each page. This is why a deque is often compared to (and pronounced as) a "deck" of cards, [13] although its name originally came from "double-ended queue" because of its ability to insert elements efficiently at either end of the sequence. On the other hand, a vector allocates a contiguous block of memory, and can only insert elements efficiently at the end of the sequence.

[13] The first use I know of the "deck of cards" analogy for deque was by Donald Knuth in [Knuth97].

The paged organization of a deque offers several benefits:

  • A deque offers constant-time insert() and erase() operations at the front of the container, whereas a vector does not, hence the note in the standard about using a deque if you need to insert or erase at both ends of the sequence.

  • A deque uses memory in a more operating system-friendly way. For example, a 10-megabyte vector uses a single 10-megabyte block of memory. On some operating systems that single block can be less efficient in practice than a 10-megabyte deque that can fit in a series of smaller blocks of memory. On other operating systems, a contiguous address space can be built of smaller blocks under the covers anyway. On such platforms, this deque noncontiguity advantage won't matter.

  • A deque is somewhat easier to use, and inherently more efficient for growth, than a vector . The only operations supplied by vector that deque doesn't have are capacity() and reserve() . That's because deque doesn't need them! For vector , calling reserve() before a large number of push_back() s can eliminate reallocating ever-larger versions of the same buffer every time it finds out that the current one isn't big enough, after all. A deque has no such problem, and having a deque::reserve() before a large number of push_back() s would not eliminate any allocations (or any other work), because none of the allocations is redundant. The deque has to allocate the same number of extra pages whether it does it all at once or as elements are actually appended.

Interestingly, note that within the C++ standard itself, the stack adapter prefers deque . Even though a stack can only grow in one direction and so never needs insertion in the middle or at the other end, the default implementation is required to be a deque :

 namespace std {   template<typename T, typename Container =  deque<T> >  class stack   {      // ...   }; } 

Despite this, for readability and simplicity, prefer using vector by default in your programs unless you know you need to efficiently insert and erase at the beginning of the container and don't need the underlying objects to be stored contiguously. Note that vector is to C++ what the array is to Cthe default container type to use in your programs, and one that does a reasonably good all-around job unless you know you need something different.

The Incredible Shrinking vector

As already noted, it's nice that vector manages its own storage, and that we can give it hints about how much capacity to keep ready under the covers as it grows. If we have a vector<T> c that we're about to grow to contain 100 elements, we can first call c . reserve(100) and incur, at most, a single reallocation hit. This allows optimally efficient growth.

But what if we're doing the opposite ? What if we're using a vector that's pretty big, and then we remove elements we no longer need and want the vector to shrink to fitthat is, we want it to get rid of the now-unneeded extra capacity? You might think that the following would work:

2. What does the following code do?

 // Example 7-2(a): A nave attempt at shrinking a vector. //  vector<C> c(10000);  

Now c.size() == 10000 , and c.capacity() >= 10000 .

Next , we erase all but the first ten elements:

  c.erase(c.begin()+10, c.end());  

Now c . size() == 10 , and we know we don't need all that extra capacity, but the following line does not shrink c 's internal buffer to fit:

  c.reserve(10);  

The reason that the last line in Example 7-2(a) does not do what one might expect is that calling reserve() will never shrink the vector 's capacity. Calling reserve() can only increase the capacity, or do nothing if the capacity is already sufficient.

Fortunately, there is a right way to get the intended effect:

 // Example 7-2(b): The right way to shrink-to-fit a vector. // vector<Customer> c(10000); // ...now c.capacity() >= 10000... // erase all but the first 10 elements c.erase(c.begin()+10, c.end()); // the following line does shrink c's // internal buffer to fit (or close)  vector<Customer>(c).swap(c);  // ...now c.capacity() == c.size(), or // perhaps a little more than c.size() 

Do you see how the shrink-to-fit statement works? It's a little subtle:

  1. First, we create a temporary (unnamed) vector<Customer> and initialize it to hold the same contents as c . The salient difference between the temporary vector and c is that, while c still carries around a lot of extra capacity in its oversize internal buffer, the temporary vector has just enough capacity to hold its copy of c 's contents. (Some implementations may choose to round up the capacity slightly to their next larger internal " chunk size," with the result that the capacity actually ends up being slightly larger than the size of the contents.)

  2. Next, we call swap() to exchange the internals of c with the temporary vector . Now the temporary vector owns the oversize buffer with the extra capacity that we're trying to get rid of, and c owns the "right- sized " buffer with the just-enough capacity.

  3. Finally, the temporary vector goes out of scope, carrying away the old oversize buffer with it. The old buffer is deleted when the temporary vector is destroyed . All we're left with is c itself, but now c has a "right-sized" capacity.

Note that this procedure is not needlessly inefficient. Even if vector had a special-purpose shrink_to_fit() member function, it would have to do pretty much all the same work just described.

3. A vector or deque typically reserves extra internal storage as a hedge against future growth, to prevent too-frequent reallocation as new elements are added. Is it possible to completely clear a vector or deque (that is, not only remove all contained elements, but also free all internally reserved storage)? Demonstrate why or why not.

To completely clear a vector so that it has no contents and no extra capacity at all, the code is nearly identical. We just initialize the temporary vector to be empty instead of making it a copy of c :

 // Example 7-3: The right way to clear out a vector. // vector<Customer> c(10000); // ...now c.capacity() >= 10000... // the following line makes // c be truly empty  vector<Customer>().swap(c);  // ...now c.capacity() == 0, unless the // implementation happens to enforce a // minimum size even for empty vectors 

Again, note that the vector implementation you use may choose to make even empty vector s have some slight capacity, but now you're guaranteed that c 's capacity will be the smallest possible allowed by your implementation. It will be the same as the capacity of an empty vector .

These techniques work for deque , too, but you don't need to do this kind of thing for list , set , map , multiset , or multimap , because they allocate storage on an "exactly-as-needed" basis and so they should never have excess capacity.

Summary

It's safe to use vector s interchangeably with C-style arrays, and you should always prefer to use vector s or deque s instead of arrays. Use a vector as your default container type, unless you know you need something different and you don't need all the contained elements to be contiguous in memory. Use the swap-with-a-temporary-container idiom if you want to shrink an existing vector or deque . Finally, as noted in Item 6, unless you know that you really need the space optimization, always use deque<bool> instead of vector<bool> .

I l @ ve RuBoard


More Exceptional C++
More Exceptional C++: 40 New Engineering Puzzles, Programming Problems, and Solutions
ISBN: 020170434X
EAN: 2147483647
Year: 2001
Pages: 118
Authors: Herb Sutter

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net