4.2 Reusable Code | Applied C++: Practical Techniques for Building Better Software

As software evolves through the addition of new features or routine maintenance, the intent of the original software tends to change. By intent we mean the original purpose and design of the software. Well-designed software should facilitate future design changes without breaking the existing system. For example, in our image framework, we certainly don't want the whole system to break if we define a 32-bit grayscale image type when the system already supports an 8-bit grayscale image. Adding a binary (i.e., 1-bit) image type is another story; some major design changes are required in order to support this type. Poorly designed software can make any small design change difficult or impossible to implement.

In this section we discuss reusable software components. By reusable, we mean not just software that can be used on multiple projects and/or platforms, but also software that can serve multiple purposes in the same application. A component's reusability does not guarantee a well-designed or cleanly written object, but it certainly increases the odds. It is not uncommon to have developers whose entire mission is to develop reusable components for an organization or group.

To see what we mean, let us look at a well-known and well-designed reusable component: the string class in the C++ standard library. String classes have been around since the beginning of C++. The std::string class is an elegantly designed package that has almost everything you could need. But even with this functionality, you don't have to search too hard to find somebody who thinks the library has too little functionality or, conversely, too much. Let's use this class to explore the issues surrounding reusable software.

Reusable software solves a generic problem. It should address the needs of a company, workgroup, or class of applications. For example, the standard library string class is globally generic. It not only handles character manipulation of char-sized data, it can also work with characters of various sizes:

 typedef basic_string<char>    string;
 typedef basic_string<wchar_t> wstring;

We will discuss this issue in more detail later when we talk about coding for internationalization, but suffice it to say, a character is frequently larger than what a char can store. From the standpoint of the std::string class (or more precisely, the std::basic_string class), a string is simply an array of characters whose size can be specified by the user. The class makes no other assumptions about what is contained. This raises an interesting question: can nulls or other binary characters be embedded in a std::string? The answer is yes. The string class maintains the length separately from the actual character data, so length() and size() return this value rather than calling something like strlen(). The class appends a trailing null character, so that c_str() returns a C-like string, but this does not corrupt any data you wrote. We will explore a binary string class shortly.
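The difference between the stored length and what C-style code sees can be checked with a short sketch. The helper names makeEmbedded() and cLength() are hypothetical and exist only for this illustration.

```cpp
#include <cstring>
#include <string>

// Builds a 7-character string with a null embedded in the middle.
// The (pointer, length) constructor copies exactly 7 characters and
// does not stop at the embedded '\0'.
std::string makeEmbedded ()
{
  return std::string ("abc\0def", 7);
}

// Length as seen by C-style code, which stops at the first null.
std::size_t cLength (const std::string& s)
{
  return std::strlen (s.c_str());
}
```

Here size() reports all seven characters, while strlen() on the c_str() pointer stops after "abc". The characters after the null are still stored and can be reached by index.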

Let us assume that the standard library did not exist and we were forced to design our own string class. What functionality would it support and what limitations would we impose? There is no single correct answer, because you have to consider a number of factors:

Does binary data need to be encoded?
Is multi-byte character encoding acceptable, or is Unicode necessary?
Can C-style string functions be used to provide some of the needed functionality?
What kind of insertion and extraction operations are required?

The answers to these questions may not be obvious until you examine how and where you could use a string class. Let's face it, if no string class existed at all, software would still get written, and it might look like this:

 #include <cstring>

 // Returns a newly allocated string; the caller must delete[] it
 char* append (const char* src1, const char* src2)
 {
   char* dst = new char [strlen(src1) + strlen(src2) + 1];
   strcpy (dst, src1);
   strcat (dst, src2);
   return dst;
 }
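One cost of this style shows up at the call site: every returned buffer is owned by the caller. The sketch below repeats append() so that it is self-contained and adds a hypothetical helper, joinThree(), to show the bookkeeping that results.

```cpp
#include <cstring>

// Same function as in the text: allocates a new buffer that the
// caller must release with delete[].
char* append (const char* src1, const char* src2)
{
  char* dst = new char [std::strlen(src1) + std::strlen(src2) + 1];
  std::strcpy (dst, src1);
  std::strcat (dst, src2);
  return dst;
}

// Hypothetical helper: joins three strings. The intermediate buffer
// must be freed explicitly; writing append (append (a, b), c) would
// leak it because nothing keeps a pointer to the inner result.
char* joinThree (const char* a, const char* b, const char* c)
{
  char* ab  = append (a, b);
  char* abc = append (ab, c);
  delete [] ab;
  return abc;
}
```

The caller of joinThree() still has to delete[] the result. A string class removes this burden by tying the buffer's lifetime to an object.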

Writing a reusable string class is not the difficult part, since strings are very well understood. Sure, there may be performance and efficiency issues, but these are fairly easy to observe and correct. The hard part is deciding how extensive an object to create. If the object of your design is to create a general purpose string class for a class library you want to sell, customers will expect a feature-rich object. For example, the CString class from the Microsoft Foundation Classes (MFC) has almost 100 public member functions and operators. With this many functions, it is quite possible that some of them have never been used in any real application, other than in the unit test code that validates the class.

At the other extreme, you can write a string class that does only what is necessary to get the job done, as follows:

 class apString
 {
 public:
   apString  (int size);
   ~apString ();

   apString  (const apString& src);
   apString& operator= (const apString& src);

   int size () const;
   char& operator[] (int index);

 private:
   char* p_;
 };

Is this a string class? Yes it is, but as you can see it does very little. Such an object may prove invaluable in one application, but we do not consider it reusable because it does not do anything very useful. Let's extend our definition of reusable code to include the following attributes:

Provides useful and generic functionality.
Can be described in a few sentences using concrete terms. If you can describe an object in simple terms, it will tend to be a simple object. For example, we can describe a string class by saying, "An object that manipulates character arrays, replacing C-style string functionality in most applications."
Makes few assumptions about how or when the class can be used. For example, what if our string class only worked on strings with fewer than 100 characters? Or, what if the class only supported the storage of letters but no numbers? All designs have some limitations, but these limitations should not limit the object's general-purpose use.

4.2.1 The Economics of Reusability

Believe it or not, reusable code is expensive to develop and maintain. Once a reusable component exists, it is "free" to use, and hopefully pays for itself many times over. But the path to reusable software isn't free. There is a lot more to the process than just designing and implementing the code. For example, what happens when the software is used by two developers and one of them wants to change the interface? Should it be easy to add new member functions? Who is responsible for preventing the object from getting too complex?

If maintenance and enhancement responsibilities fall to the development team of an application, it makes sense for the responsibilities of reusable components to fall to a similar group. Larger organizations can potentially allocate one or more resources to develop, maintain, and enhance reusable components. This is the very charter of this group and their success or failure depends on getting their peers to adopt and use the software they create. Think of this group as third-party developers with a big advantage. Since you share the same employer, you ultimately share the same goals. And it is much more likely that you can talk to the developers directly if necessary.

Smaller groups can handle the management of reusable components internally. If the software is only used within a single group, a single individual can be charged with maintaining the code, just like any other piece of software. The group's manager can act as the clearing house for requests and decide when it is appropriate to make changes.

The process is more interesting for mid-sized organizations that consist of more than one development team. Each team has different schedules and requirements, and there is no centralized group to manage the shared software. Control of the reusable code should belong to a team created from members of all the separate development groups. One developer from each group should act as spokesperson for his/her group to ensure that changes and enhancements will not negatively impact their project. Having multiple developers from the same project attend these meetings is discouraged, because it tends to slow down and complicate the process. After all, you don't want an entirely new bureaucracy to develop that will slow down your development cycle. This group should be led by a development manager or project manager to keep it on track.

Here are some common situations you might be faced with. Imagine there are two development project teams, team A and team B, that are each using the same reusable components.

Team A discovers that some enhancements are needed to continue to use a reusable component in their product. There is sufficient time to make these changes before the next release. Team B, however, has no desire to see any changes made to their software base.
Team A makes a small bug fix to the reusable component and commits the change to the source control system. Team B unknowingly uses this version and strange problems appear in their application.
The group in charge of reusable components approves a number of enhancements. However, the developer in team A charged with making these changes cannot complete the task because of other schedules and commitments. Team B is facing a deadline and needs these enhancements ASAP.

These issues are not unique to reusable software components, but they can create situations that cannot be resolved by a single development team. In the first example above, configuration management software can allow team B to use the current version of a component while team A uses a different version. These changes can consist of a combination of new functionality, extensions to existing functionality, or bug fixes. Whatever the reason, no changes to the software should be made by team A without the knowledge and approval of team B. Version control is very important for reusable components, so much so that using timestamps or other simple methods to select which version of software to use may not be sufficient. For this reason, we recommend adding version or release tags to new versions. This treats an internally developed component the same way as one purchased from a third-party source. It also impresses on the team the notion that changes to a reusable component should happen at the same frequency as one would expect new versions of third-party libraries.
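A release tag can also be made visible to client code. The sketch below is one possible convention and not something the text prescribes: the namespace apComponent, the constants, and the compatibility rule (same major number, minor number not older than the one the client validated against) are all assumptions made for illustration.

```cpp
// Hypothetical version tag published by a shared component.
namespace apComponent
{
  const int versionMajor = 2;
  const int versionMinor = 1;

  // Returns true if this release can stand in for the (major, minor)
  // release a client was validated against. Assumed rule: the major
  // numbers must be equal and the client's minor number must not be
  // newer than ours.
  inline bool isCompatible (int major, int minor)
  {
    return major == versionMajor && minor <= versionMinor;
  }
}
```

A client such as team B could call isCompatible() at startup with the release it tested against, and fail loudly rather than silently running against a different release.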

The second issue can be caused by assuming that the most recent version of a component is always the one to select. The solution to this problem is easy: don't make this assumption. This is no different than taking a new compiler or library update from a vendor and assuming it will work. However, this scenario can easily happen because most changes made to mature software involve fixing bugs. If we had developed the standard library string class internally, who would not just want to use the most recent version when it becomes available? We sometimes place more trust in something or someone than we should. Consider this simple function as part of a reusable component:

 #include <cctype>
 #include <string>

 bool isValid (const std::string& src)
 {
   for (std::string::size_type i=0; i<src.size(); i++)
     if (!isalpha(static_cast<unsigned char>(src[i])))   // Verify [a-z,A-Z]
       return false;
   return true;
 }

Because this is such a simple function, it is clear that isValid() will return true if the string only consists of alphabetic characters, or is a null string. However, developers seldom look at the actual source code unless they suspect a problem. Instead, they consult the header file for the function declaration and the associated documentation. In this case, the header file contains:

 bool isValid (const std::string& src);
 // Returns true if the string contains only alphabetic characters

Perhaps you have already recognized the issue: by comparing the description in the header file with the actual function, there is an undocumented behavior when a null string is passed. The function will return true, but the documentation says nothing about this. The big question is: will a developer make use of this undocumented functionality? The answer is: you have no idea. The chance of this happening increases as more projects start using this code.

Now let us assume that the person maintaining the isValid() function makes an enhancement to the function as shown:

 bool isValid (const std::string& src)
 {
   if (src.size() == 0)
     return false;
   for (std::string::size_type i=0; i<src.size(); i++)
     if (!isalnum(static_cast<unsigned char>(src[i])))   // Verify [a-z,A-Z,0-9]
       return false;
   return true;
 }

By examining this function you will see that two changes have been made. The first is to specify the behavior for null strings. A null string will now return false. Second, this function now considers alphanumeric characters to be part of a valid string. The documentation is also changed to either:

 bool isValid (const std::string& src);
 // Returns true if the string contains only alphanumeric characters

or:

 bool isValid (const std::string& src);
 // Returns true if the string contains only alphanumeric characters.
 // Returns false for null strings.

We certainly hope the change looks more like the second possibility, so that the documentation better reflects what the function actually does. It would not surprise us if the documentation continues to ignore the behavior of null strings. This can happen because documentation changes do not always occur when changes are made to the code. Many developers will argue that the code itself is the documentation, and any documentation included in the header file is just a synopsis of what the function does. We won't get into this issue right now because it will only get us off track.

Let us look at what happens when a newer version of isValid() is available in a software update. The decision to use the updated version of isValid() will most likely be made based on the function's accepting alphanumeric characters and not just alphabetic characters. Even if the update includes information regarding the behavior of nulls, this change can easily fall through the cracks and not get noticed. It may not even be obvious to the developer whether this special null case is an issue. When multiple teams use the same piece of reusable code, it is possible that all the developers on team A will use the null behavior of isValid() while team B never does. If the person responsible for maintaining isValid() is part of team B, they may change the null behavior of isValid(), thinking it has no effect on the code, when in actuality it could greatly affect team A.
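One inexpensive guard against this kind of silent change is a unit test that pins down the edge case, so that a change in the null behavior fails a test instead of surfacing in another team's application. The sketch below places the two versions from the text side by side under the hypothetical names isValidV1() and isValidV2() so that their behavior can be compared directly.

```cpp
#include <cctype>
#include <string>

// Original version: alphabetic characters only. An empty string
// never enters the loop, so the function returns true.
bool isValidV1 (const std::string& src)
{
  for (std::string::size_type i = 0; i < src.size(); i++)
    if (!std::isalpha (static_cast<unsigned char>(src[i])))
      return false;
  return true;
}

// Updated version: alphanumeric characters are accepted and an
// empty string is now rejected.
bool isValidV2 (const std::string& src)
{
  if (src.size() == 0)
    return false;
  for (std::string::size_type i = 0; i < src.size(); i++)
    if (!std::isalnum (static_cast<unsigned char>(src[i])))
      return false;
  return true;
}
```

For the input "abc1" the two versions differ in the advertised way (false, then true). For the empty string they differ in the unadvertised way (true, then false), which is exactly the case a test written by team A would have caught.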

The real costs of creating a reusable component are now becoming clear. A piece of software goes through a number of steps before it becomes a reusable component. For smaller organizations, it all starts with the creation of a piece of software that solves a particular problem, but may be generic enough to be used in other places. Once it is recognized as a possible reusable component, a set of well-defined steps should be followed:

Proposal. A short proposal should describe the overall functionality, and explain why this component is necessary to the project and why it can be used by other projects. A short example of how the code is used is helpful. At this stage, the proposal is only meant to see if there is enough interest in the component.
Approval and assignment of a maintainer (most likely the developer) and the group to review it.
Review of the interface.
Unit test review and other automated tests.
Documentation. In addition to functional and design documentation, you should also include change logs, such that any changes made are documented in a text file.
Initial check-in.

Binary String Class

A binary string class may not sound very interesting until you realize all the uses there are for it. If we renamed our example to be something like "object streaming" or "object persistence," it might appear more important. To achieve either goal, software is needed to collect and manipulate binary streams of data. These data streams can represent anything from image data to the contents of objects. We aren't going to tackle a complete object persistence mechanism here, but we will show you one important reusable component.

To be more precise, our binary string object manages tagged data. By this we mean that every item written to our stream consists of two parts. The first part is a tag, which specifies the type of data written, and is followed by the data itself. A raw binary stream of data can be difficult to decode, especially if its format is modified over time. Tagging the data makes it easier to interpret and allows anyone to read a stream of data, even when its meaning is not known. Our object, apBString, tags data in the following formats:

Byte (unsigned, 1 byte)
Word (unsigned, 2 bytes)
Integer (signed, 4 bytes)
Unsigned Integer (unsigned, 4 bytes)
Float (4 bytes)
Double (8 bytes)
String
Data
apBString

The tag field is one byte and precedes the data. Most tagged formats are pretty obvious, because they follow the way the underlying data types are stored in memory. A string is written as a length (4 bytes), followed by the string data. A data block is written in the same way and is used to represent arbitrary data. apBString objects can also be nested inside of other apBString objects, allowing this object to encapsulate other binary objects. It is this behavior that paves the way for object streaming. The definition of apBString is shown here.

 #include <string>

 typedef unsigned char  Pel8;     // 1 byte
 typedef unsigned short Pel16;    // 2 bytes
 typedef unsigned int   Pel32;    // 4 bytes (unsigned)
 typedef int            Pel32s;   // 4 bytes (signed)

 class apBString
 {
 public:
   apBString  ();
   ~apBString ();

   apBString            (const apBString& src);
   apBString& operator= (const apBString& src);

   size_t      size () const { return string_.size();}
   const void* base () const { return string_.c_str();}
   // Return pointer, and size of our data

   void rewind () { offset_ = 0;}
   // Reset our output pointer to the beginning

   bool eof () const { return offset_ >= string_.size();}
   // Return true if the stream is at the end

   bool match () const { return match_;}
   // Return true if all extraction resulted in a match between
   // the requested data type and the stored data type

   const std::string& str () const { return string_;}
   // Access to our underlying string data

   // Insertion operators
   apBString& operator<< (Pel8   b);
   apBString& operator<< (Pel16  w);
   apBString& operator<< (Pel32s l);
   apBString& operator<< (Pel32  l);
   apBString& operator<< (float  f);
   apBString& operator<< (double d);
   apBString& operator<< (const std::string& s);
   apBString& operator<< (const apBString& bstr);

   void append (const void* data, long size);

   // Extraction operators
   apBString& operator>> (Pel8&   b);
   apBString& operator>> (Pel16&  w);
   apBString& operator>> (Pel32s& l);
   apBString& operator>> (Pel32&  l);
   apBString& operator>> (float&  f);
   apBString& operator>> (double& d);
   apBString& operator>> (std::string& s);
   apBString& operator>> (apBString& bstr);

   bool fetch (const void*& data, unsigned int& size);

   std::string dump ();
   // Ascii dump of data, from the current offset

 private:
   std::string  string_;
   int          offset_;
   bool         match_;

   enum eTypes {eNone=0, ePel8=1, ePel16=2, ePel32s=3, ePel32=4,
                eFloat=5, eDouble=6, eString=7, eData=8, eBstr=9};
   // Supported datatypes. These values can never be changed but
   // more can be added

   apBString  (const void* data, unsigned int size);

   void add (eTypes type, const void* data, unsigned int size);
   // Add the specified data to our buffer

   const void* extract (eTypes& type);
   // Return a pointer to the next data type and return its type.
   // Returns null if you attempt to read past the end

   Pel8        readPel8   (const void* p);
   Pel16       readPel16  (const void* p);
   Pel32s      readPel32s (const void* p);
   Pel32       readPel32  (const void* p);
   float       readFloat  (const void* p);
   double      readDouble (const void* p);
   std::string readString (const void* p);
   // Read a particular quantity from our string.

   std::string dumpBString (unsigned int indent);
   // Text dump the contents from a single BString
 };

The complete source code can be found on the CD-ROM. Our binary string is stored as a std::string object. As we mentioned earlier in this section, std::string makes no assumptions regarding the data it contains. A trailing null character is written at the end of the data so that code that calls the c_str() method can treat it as a C-style string. Note that if the data itself contains a null character, C-style functions will stop at that character, so you'll need to use the data() method together with size() instead.
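To make the tagged layout concrete, here is a minimal sketch of how one string item could be laid out: a 1-byte tag, a length field, then the characters. The free function encodeTaggedString() is hypothetical and is not part of apBString. It also assumes that the length is copied in the host's native byte order and that unsigned int is 4 bytes on the target platform.

```cpp
#include <string>

// Hypothetical free function: builds the bytes for one tagged string
// item as described in the text (tag, length, then the string data).
std::string encodeTaggedString (const std::string& s)
{
  const unsigned char tag = 7;   // value of eString in the enum above
  unsigned int length = static_cast<unsigned int>(s.size());

  std::string out;
  out.append (1, static_cast<char>(tag));
  // Assumption: length is copied byte for byte in native order
  out.append (reinterpret_cast<const char*>(&length), sizeof (length));
  out.append (s);
  return out;
}
```

On a platform with a 4-byte unsigned int, encoding "hi" produces seven bytes: the tag, four length bytes, and the two characters.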

Let us look at how a variable of type Pel16 is treated. First, there needs to be a way to add data to our binary string:

 apBString& apBString::operator<< (Pel16 w)
 {
   add (ePel16, &w, sizeof (w));
   return *this;
 }

 void apBString::add (eTypes type, const void* data,
                      unsigned int size)
 {
   // Append the type
   Pel8 t = static_cast<Pel8>(type);
   string_.append (reinterpret_cast<char*>(&t), sizeof (Pel8));

   // Append the data
   string_.append (reinterpret_cast<const char*>(data), size);
 }

Most of our insertion operators, operator<<, call a private method add() to actually append data to our string (which calls the append() method of std::string). The extraction operator, operator>>, is much more complicated. When it reads data from the stream, it first reads the tag to figure out what kind of data is in the stream, and then tries to convert this quantity to a Pel16 and return it. The current offset in the string is stored in offset_, and this allows string_ to be parsed without having to modify it, as shown:

 apBString& apBString::operator>> (Pel16& w)
 {
   eTypes type;
   w = 0;
   bool match = false;

   const void* p = extract (type);
   if (p == 0) return *this;

   switch (type) {
   case ePel8:
     w = readPel8 (p);
     break;
   case ePel16:
     w = readPel16 (p);
     match = true;
     break;
   case ePel32s:
     w = (Pel16) readPel32s (p);
     break;
   case ePel32:
     w = (Pel16) readPel32 (p);
     break;
   case eFloat:
     w = (Pel16) readFloat (p);
     break;
   case eDouble:
     w = (Pel16) readDouble (p);
     break;
   case eString:
     w = (Pel16) atoi (readString(p).c_str());
     break;
   default:
     // Unsupported type. We don't have to do anything
     break;
   }

   match_ &= match;
   return *this;
 }

This function checks the data type of the next member in the stream using the extract() method, and attempts to read and then convert the data to a Pel16. This is true for the String data type as well, because the atoi() function is called to convert the string to an integer quantity. Data types that cannot be converted to a Pel16, our eData (block of binary data) and eBstr (binary string) types, return 0. The extraction operator also keeps track of whether the data type read from the stream exactly matches the requested data type. You can query the match() method to see if any previous extraction required a conversion. Depending upon the application, a data mismatch can imply that the data is corrupted. Note that apBString does not detect whether an overflow occurs during extraction, because data is typically extracted to variables of equal or larger size.
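The tag check and the match flag can be seen in isolation in a much smaller sketch. TinyTagged is a hypothetical, stripped-down stand-in for apBString that supports only two tags (1 for a byte, 2 for a 16-bit word); it is not the implementation found on the CD-ROM.

```cpp
#include <cstring>
#include <string>

// Hypothetical miniature of the tagged-stream idea: two types only.
class TinyTagged
{
public:
  TinyTagged () : offset_ (0), match_ (true) {}

  void putByte (unsigned char b)  { add (1, &b, sizeof (b)); }
  void putWord (unsigned short w) { add (2, &w, sizeof (w)); }

  // Extracts the next item as a 16-bit value. A stored byte is
  // widened and recorded as a mismatch. Reading past the end
  // returns 0 and leaves the match flag untouched.
  unsigned short getWord ()
  {
    if (offset_ >= buffer_.size()) return 0;

    unsigned char tag = static_cast<unsigned char>(buffer_[offset_++]);
    unsigned short w = 0;
    if (tag == 2) {
      std::memcpy (&w, buffer_.data() + offset_, sizeof (w));
      offset_ += sizeof (w);
    }
    else {
      // Only tags 1 and 2 can be written, so this is a stored byte
      w = static_cast<unsigned char>(buffer_[offset_]);
      offset_ += 1;
      match_ = false;
    }
    return w;
  }

  // True if every extraction so far matched the stored type
  bool match () const { return match_; }

private:
  void add (unsigned char tag, const void* data, std::size_t size)
  {
    buffer_.append (1, static_cast<char>(tag));
    buffer_.append (static_cast<const char*>(data), size);
  }

  std::string buffer_;
  std::size_t offset_;
  bool        match_;
};
```

Writing a word and then a byte, and reading both back with getWord(), returns both values, but match() turns false after the second read because a conversion was needed.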

So, how reusable is apBString? In general terms, it is a very basic building block to manage binary data that satisfies all of our definitions for a reusable object. But, as we have designed it, it is only portable on machines with the same memory architecture. Our class copies data, byte for byte, into our binary string. If we use apBString on a machine in little-endian format (the low-order byte is stored in memory at the lowest address, such as with Intel processors), it cannot be read properly on a big-endian machine (the high-order byte is stored in memory at the lowest address, such as with Sun SPARC). We could have chosen to address the endian issue in our design by making sure all of our string data was written in the chosen endian format; however, that was not one of our design guidelines. In this regard, apBString is not reusable between different types of machines. The code is portable, but the data files are not.
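For comparison, here is a sketch of what writing in a chosen endian format could look like for a 16-bit value. This is not how apBString works. The helpers appendWordLE() and readWordLE() are hypothetical, and little-endian is picked arbitrarily as the stored order.

```cpp
#include <string>

// Hypothetical helper: appends a 16-bit value low-order byte first,
// regardless of how the host stores it in memory. Shifts operate on
// the value, not on its memory layout.
void appendWordLE (std::string& out, unsigned short w)
{
  out.append (1, static_cast<char>(w & 0xff));
  out.append (1, static_cast<char>((w >> 8) & 0xff));
}

// Hypothetical helper: reads the two bytes at offset back into a
// 16-bit value. The caller must ensure offset + 1 is within range.
unsigned short readWordLE (const std::string& in,
                           std::string::size_type offset)
{
  unsigned int lo = static_cast<unsigned char>(in[offset]);
  unsigned int hi = static_cast<unsigned char>(in[offset + 1]);
  return static_cast<unsigned short>(lo | (hi << 8));
}
```

Because the bytes are produced by arithmetic rather than by copying memory, a stream written this way reads back the same on little-endian and big-endian machines. The cost is extra work for every item written and read.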