Merging Data

Problem

You have two sorted sequences and you need to merge them.

Solution

Use either the merge or inplace_merge function template. merge merges two sequences and puts the results in a third, and inplace_merge merges two contiguous sequences. Example 7-5 shows how.

Example 7-5. Merging two sequences

#include 
#include 
#include 
#include 
#include 
#include 
#include "utils.h" // For printContainer( ): see 7.10

using namespace std;

int main( ) {

 vector v1, v2, v3;

 v1.push_back("a");
 v1.push_back("c");
 v1.push_back("e");

 v2.push_back("b");
 v2.push_back("d");
 v2.push_back("f");

 v3.reserve(v1.size( ) + v2.size( ) + 1);

 // Use a back_inserter from iterator to avoid having to put
 // a bunch of default objects in the container. But this doesn't
 // mean you don't have to use reserve!
 merge(v1.begin( ), v1.end( ),
 v2.begin( ), v2.end( ),
 back_inserter >(v3));

 printContainer(v3);

 // Now make a mess
 random_shuffle(v3.begin( ), v3.end( ));
 sort(v3.begin( ), v3.begin( ) + v3.size( ) / 2);
 sort(v3.begin( ) + v3.size( ) / 2, v3.end( ));

 printContainer(v3);

 inplace_merge(v3.begin( ), v3.begin( ) + 3, v3.end( ));

 printContainer(v3);

 // If you are using two lists, though, use list::merge instead.
 // As a general rule, blah blah... 
 list lstStr1, lstStr2;

 lstStr1.push_back("Frank");
 lstStr1.push_back("Rizzo");
 lstStr1.push_back("Bill");
 lstStr1.push_back("Cheetoh");

 lstStr2.push_back("Allie");
 lstStr2.push_back("McBeal");
 lstStr2.push_back("Slick");
 lstStr2.push_back("Willie");

 lstStr1.sort( ); // Sort these or merge makes garbage!
 lstStr2.sort( );

 lstStr1.merge(lstStr2); // Note that this only works with other
 // lists of the same type

 printContainer(lstStr1);
}

The output of Example 7-5 looks like this:

-----
a
b
c
d
e
f
-----
b
d
e
a
c
f
-----
a
b
c
d
e
f
-----
Allie
Bill
Cheetoh
Frank
McBeal
Rizzo
Slick
Willie

 

Discussion

merge merges two sorted sequences and places the result into a third, optionally using a caller-supplied comparison functor to determine when one element is less than anotherit uses operator< by default. The complexity is linear: the number of comparisons performed during the merge is the sum of the two sequence lengths minus one. The types of the elements in each sequence must be comparable with operator< (or the comparison functor you supply), and they must be convertible to the type of element in the output sequence via copy constructor or assignment; or there must be conversion operators defined such that the type of the element in the output sequence has assignment and copy construction defined for both types.

The declarations for merge look like this:

void merge(In1 first1, In1 last1, In2 first2, In2 last2, Out result)
void merge(In1 first1, In1 last1, In2 first2, In2 last2, Out result,
 BinPred comp)

Using merge is simple enough. Both sequences must be sorted (or the output will be garbage), and neither is modified by merge. The output iterator where the results are going to go must have enough room to accommodate the sum of the lengths of each input sequence. You can do this by explicitly reserving enough storage, or, as I did in Example 7-5, by using a back_inserter:

merge(v1.begin( ), v1.end( ),
 v2.begin( ), v2.end( ),
 back_inserter(v3));

A back_inserter is a class defined in that provides a convenient way to create an output iterator that calls push_back on a sequence every time you assign a value to it. This way, you don't have to explicitly size the output sequence. The following call creates a back_inserter for a vector named v3.

back_inserter(v3);

You don't have to specify the template arguments because a back_inserter is a function template, not a class template, so the type of the call arguments can be deduced. An equivalent call with explicit template arguments would look like this:

back_inserter >(v3);

Note, however, that sometimes you ought to explicitly size the output sequence, especially when the output sequence is a vector. A vector may need to keep resizing itself if you simply add items to it with push_back, and resizing is an expensive operation. See Recipe 6.2 for more details.

If there are two equivalent elements in the sequences, the one from the first sequence will precede the one from the second. Therefore, if you call merge twice with the input sequences switched, the resulting output sequences may be different (predictable and correct, but different).

Merging lists is a good example of a situation where you can use a sequence's member function or a similar standard algorithm. You should prefer a member function over a standard algorithm that does the same thing, but this doesn't always work, and here's an example of why.

Consider your list of strings from Example 7-5:

lstStr1.sort( ); // Sort these or merge makes garbage!
lstStr2.sort( );
lstStr1.merge(lstStr2); // This is list::merge

There are two reasons why this is different than calling std::merge. To begin with, both lists must have the same type of elements. This is because list::merge is declared like this:

void merge(list& lst)
template 
void merge(list& lst, Compare comp)

Where T is the same type as in the list class template itself. So you can't, for example, merge a list of null-terminated character arrays into a list of strings.

The other thing that's different is that list::merge erases the input sequence, while std::merge leaves the two input sequences untouched. Most likely, list::merge will have better performance, since in most cases the elements in the list are relinked instead of copied; but relinking is not guaranteed, so step into the source or experiment to be sure.

You can also merge two contiguous sequences with inplace_merge. inplace_merge is different from merge because it merges its two sequence arguments, well, in place. In other words, if you have two sequences that are contiguous (i.e., they are parts of the same sequence), and they are sorted, and you want the entire sequence sorted, you can use inplace_merge instead of a sort algorithm. The advantage is that inplace_merge can run in linear time if there is enough additional memory available. If there isn't, it runs in n log n, which is the average complexity of sort anyway.

The declaration for inplace_merge is a little different from merge:

void inplace_merge(Bid first, Bid mid, Bid last)
void inplace_merge(Bid first, Bid mid, Bid last, BinPred comp)

inplace_merge requires bidirectional iterators, so you can't use it interchangeably with merge, but in most cases either should work. Like merge, it uses operator< by default to determine elements' relative order, and comp if you supply it.

Building C++ Applications

Code Organization

Numbers

Strings and Text

Dates and Times

Managing Data with Containers

Algorithms

Classes

Exceptions and Safety

Streams and Files

Science and Mathematics

Multithreading

Internationalization

XML

Miscellaneous

Index



C++ Cookbook
Secure Programming Cookbook for C and C++: Recipes for Cryptography, Authentication, Input Validation & More
ISBN: 0596003943
EAN: 2147483647
Year: 2006
Pages: 241

Flylib.com © 2008-2020.
If you may any questions please contact us: flylib@qtcs.net