3.2 Basic functions | Implementing and Integrating Product Data Management and Software Configuration Management (Artech House Computing Library)


SCM is very much about supporting developers in their daily routines. For developers, SCM maintains the current product components, keeps old versions and stores information of their history, provides a stable development environment, and coordinates simultaneous changes in the product. From the developer’s point of view, much of this work may be considerably facilitated by the use of suitable tools. Of course, defined processes and detailed routines will help to guide developers. However, to do this efficiently, tool support is necessary. The most common functions supported by SCM tools are:

Version management —making possible the storage of different versions and variants of a document and their subsequent retrieval and comparison;
Configuration selection —providing functions for the creation or selection of associated versions (or branches) of different documents;
Concurrent development —controlling simultaneous access by several users (either by prevention or by providing support);
Distributed development —supporting geographically dispersed developers working on the same system;
Build management —providing mechanisms for building software (for instance, compiling and linking) and keeping generated software up to date, preferably without unnecessary rebuilding;
Release management —packaging software in a form suitable for distribution and generating documentation to inform users and developers of changes included in the product release;
Workspace management —providing each user with a private sandbox in which the user can work in isolation but still under the control of the SCM tool;
Change management —keeping track of changes introduced in the product and providing support for the process of entering and implementing changes in the product;
Integration with other tools —the integration of SCM tools with the development environment and with other tools.

These functions are discussed in more detail in the sections that follow. First, however, we briefly introduce other SCM-related functionality that is relevant to tool support. Specific terminology is frequently used in connection with some of these functions and is therefore introduced here:

Reporting status. This is the reporting of current status, with lists of the files that have been changed during a certain time period, who made the changes, and the differences between products. These are important functions, particularly in the support of the overall view as seen by the project management.
Process support. This is support made available to developers in following the development model and performing actions specified during the planning phase to ensure that the components are progressed through the chosen life cycle phases before being released.
Accessibility control (security). This prevents inappropriate access to information without complicating everyday work.

3.2.1 Version management

Version management is the core functionality of all SCM tools. Many developers believe, incorrectly, that version control is the equivalent of SCM. Although version management is important, SCM is more than versioning, as explained in the other sections.

An element of software or a document placed under version control is designated a configuration item (CI). The most common example of a CI is a source code file, but executables, products, and documents are also CIs. A group of CIs can be defined as a CI (i.e., the group is version controlled itself). The ability to store, recreate, and register the historical development of CIs is a fundamental characteristic of an SCM system. The most important property of a version is its immutability (i.e., when a version has been frozen, its content can never be modified). Instead, new versions must be created.

As an important aid to developers, all SCM systems offer support for the synchronization of simultaneous, concurrent changes of the CIs from different users. Depending on the tool, this support is given in different ways according to different synchronization models (presented as CM models by Feiler in [7]). The most basic model is the checkout/checkin model, in which individual files are stored in a compact form on a version-control basis in a small database, a repository (also called a vault). The repositories contain only one complete version. The differences between the versions are saved using delta algorithms (i.e., the algorithms by which it is possible to recreate a complete file version by parsing saved differences). Many tools use line-based delta algorithms, calculating the differences, in terms of lines, between two versions. There are also binary-based delta algorithms, calculating binary differences, byte by byte [8]. The main purpose of using a delta technique is to save disk space in the repository.
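As a rough illustration of a line-based delta technique, the following Python sketch keeps one complete version and recreates a second version from a saved delta. It relies on the standard difflib module; the function names make_delta and apply_delta are hypothetical, and the choice to keep the older version complete is an assumption made for the example. Real tools differ in which version is stored complete and in the direction of the deltas.

```python
from difflib import SequenceMatcher

def make_delta(old_lines, new_lines):
    """Return the edit operations that turn old_lines into new_lines."""
    delta = []
    matcher = SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged lines are not stored in the delta
        # Store the old range to be replaced and only the new lines.
        delta.append((i1, i2, new_lines[j1:j2]))
    return delta

def apply_delta(old_lines, delta):
    """Recreate the newer version from the complete version plus a delta."""
    result = []
    pos = 0
    for i1, i2, replacement in delta:
        result.extend(old_lines[pos:i1])  # copy the unchanged stretch
        result.extend(replacement)        # inserted or replacing lines
        pos = i2                          # skip deleted or replaced lines
    result.extend(old_lines[pos:])
    return result

# Version 1 is stored complete; version 2 is stored only as a delta.
v1 = ["int main() {", "  return 0;", "}"]
v2 = ["int main() {", '  puts("hi");', "  return 0;", "}"]
delta_1_to_2 = make_delta(v1, v2)
rebuilt_v2 = apply_delta(v1, delta_1_to_2)
```

In this example the delta holds a single inserted line rather than a second full copy of the file, which is where the saving in disk space comes from.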

Files are not read or changed directly in the repository without being checked out first. To check out means that a particular version of a file is copied into the developer’s working directory, and if write access is required, the file is locked in the repository. Locking prevents other developers from checking out that particular file (or, more specifically, that branch) in the write mode. When the file is checked in, a new version of the file is created in the repository and the lock is released. In this way, each file in the repository will be given its own version history with a new version for each check out and check in.
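The check out and check in cycle with locking can be sketched as follows. This is a minimal in-memory model, not the behavior of any particular tool; the class and method names (Repository, check_out, check_in, LockedError) are hypothetical, and one lock per file is assumed rather than one lock per branch.

```python
class LockedError(Exception):
    """Raised when a write lock prevents the requested operation."""

class Repository:
    def __init__(self):
        self.versions = {}  # file name -> list of frozen contents
        self.locks = {}     # file name -> user holding the write lock

    def add(self, name, content):
        self.versions[name] = [content]

    def check_out(self, name, user, write=False):
        """Return a copy of the latest version; lock the file for write access."""
        if write:
            holder = self.locks.get(name)
            if holder is not None:
                raise LockedError(f"{name} is locked by {holder}")
            self.locks[name] = user
        return self.versions[name][-1]

    def check_in(self, name, user, content):
        """Create a new version and release the lock; old versions stay untouched."""
        if self.locks.get(name) != user:
            raise LockedError(f"{user} does not hold the lock on {name}")
        self.versions[name].append(content)
        del self.locks[name]
        return len(self.versions[name])  # version number, counted from 1

repo = Repository()
repo.add("main.c", "v1 text")
work = repo.check_out("main.c", "alice", write=True)
try:
    repo.check_out("main.c", "bob", write=True)
    bob_blocked = False
except LockedError:
    bob_blocked = True                        # the write lock keeps bob out
read_only = repo.check_out("main.c", "bob")   # reading is still possible
new_number = repo.check_in("main.c", "alice", work + " plus fix")
```

Note that check_in appends a new version instead of overwriting the old one, which reflects the immutability of frozen versions.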

Versions of a file may be organized in a number of different ways. When organized in a sequence, they are often called revisions. They may also be organized as parallel development lines called branches. Branches can be merged into a new version, which then has two or more predecessors, as shown in Figure 3.1.

Figure 3.1: Basic version control.

Revisions are usually deliberately created by a developer (e.g., when a developer completes a particular change in the file). In addition, many text editors maintain one or several micro revisions of a file to facilitate its recovery following unsuccessful editing. These revisions are not managed by the SCM tool.

Branches are created for several reasons. The primary branches are permanent, adjusting the file according to diverging demands, such as different operating or window systems. A second important purpose of branches is to enable parallel (concurrent) work. In the latter case, temporary branches are created and then merged when concurrent work is no longer required. Usually, a branch consists of a series of revisions and additional branches that can be created from the original branches. The development of strategies for creating and merging branches is often an important task for an SCM manager.

A tool for version control identifies revisions internally, usually with a multilevel numbering scheme, which may be more or less user friendly. In addition, the users themselves can usually give the revisions one or several optional names in the form of text, often called a tag or a label. The tool can return a version identified by such a text. This facility (tagging) can be used to realize a simple selection mechanism (see Section 3.2.3).
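To make revisions, branches, merges, and tags concrete, the sketch below models the version history of a single file as a graph. The names Version and FileHistory are hypothetical, and the dotted numbers such as 1.2.1.1 are only one possible multilevel numbering convention, assumed here for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Version:
    number: str               # tool-assigned identifier, e.g. "1.2.1.1"
    predecessors: tuple = ()  # one entry for a revision, two or more for a merge

class FileHistory:
    def __init__(self):
        self.versions = {}  # number -> Version
        self.tags = {}      # user-chosen label -> number

    def add(self, number, *predecessors):
        if number in self.versions:
            raise ValueError(f"version {number} already exists and is immutable")
        for p in predecessors:
            if p not in self.versions:
                raise KeyError(f"unknown predecessor {p}")
        self.versions[number] = Version(number, tuple(predecessors))

    def tag(self, label, number):
        if number not in self.versions:
            raise KeyError(number)
        self.tags[label] = number

    def by_tag(self, label):
        """Return the version identified by a user-chosen label."""
        return self.versions[self.tags[label]]

history = FileHistory()
history.add("1.1")
history.add("1.2", "1.1")              # next revision in the sequence
history.add("1.2.1.1", "1.2")          # first revision on a branch from 1.2
history.add("1.3", "1.2", "1.2.1.1")   # merge: two predecessors
history.tag("release_2.3", "1.3")
tagged = history.by_tag("release_2.3")
```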

3.2.1.1 Variants

The management of variants is a difficult problem, not yet completely solved. A common misunderstanding is to draw a parallel between a variant and a branch. Let us see what variants are and why they are used. The adjustment of entire products or configurations according to diverging demands is managed with variants. For example, different variants of a product may be developed for different operating systems or with different customer adaptations. These variants can be created and maintained in at least four ways:

With permanent branches of the included files. For a variant, file versions are primarily selected from the permanent branch created for the purpose. A file version from a variant-independent branch is selected next. If this independent branch does not exist, a file version from a main branch can be selected. In this case, the main branch covers all variants.
With conditional compilation (compiling directives). This means that all variants are managed in the same version of the file and are therefore easier to keep together. However, the variant management will not be visible to SCM.
With installation descriptions clarifying which functionality should be included in a certain variant. Variant-dependent functionalities are implemented in different files, one for each variant.
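With run-time checks, in which the software itself determines during execution which variant-specific behavior applies.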

Thus, the creation of branches is only one way to implement variants. The most important thing is not the choice of implementation technique to be used, but the management of the many variants resulting from the combinatory explosion of several optional parameters. Read more about variants in [9].
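The combinatory explosion is easy to underestimate. The following sketch enumerates the variants produced by a handful of optional parameters; the parameter names and values are invented for the example.

```python
from itertools import product

# Hypothetical optional parameters of a product.
parameters = {
    "operating_system": ["linux", "windows", "macos"],
    "window_system": ["x11", "native"],
    "customer": ["standard", "customer_a", "customer_b"],
    "language": ["en", "sv"],
}

# Every combination of parameter values is a potential variant.
variants = [dict(zip(parameters, combo)) for combo in product(*parameters.values())]
variant_count = len(variants)  # 3 * 2 * 3 * 2 = 36
```

Four parameters with two or three values each already give 36 potential variants, and each further parameter multiplies that number again, whichever implementation technique is chosen.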

3.2.2 Workspace management

Workspace management makes it possible for developers to work transparently with respect to SCM. Because developers are focused on solving particular problems and have less interest in administrative tasks, a workspace functions as a sandbox in which they can work in isolation while remaining under the control of the SCM tool. Versions of files are checked out and temporarily stored in the workspace, with a mapping remaining between the versioned objects in the repository and the user files and directories in the workspace.

Files to be modified are not the only items checked out to the workspace. Often all files needed in the build or test procedures, or those that are part of the product, are checked out (possibly, some of them read-only). Thus, the workspace also makes it possible for the files checked in to the common repository to maintain a certain degree of quality (e.g., that all files changed due to the same change request actually work together).

When several developers are working concurrently in their private workspaces, control is needed between the different copies of the same object, as described in Section 3.2.6.

Some tools also support cooperative versioning, as described in [10]. In short, this means that local versioning within the workspace is provided. When a file is checked in to the repository again, only the latest local version is checked in. The intermediate versions are deleted.

An example of integrated features is when the developer “logs in” to a project environment in which project structures and repositories are already prepared for the developer (e.g., by the SCM group). The developer then enters a transparent environment in which the development is done with SCM handled behind the scenes. Examples of tools supporting this are ClearCase [11], through “Views,” and CM Synergy (formerly Continuus) [12].

3.2.3 Configuration selection

As shown in Figure 3.1, a file can include a number of versions, and the one that should be used in a given situation is not always obvious. The situation is further complicated by the fact that a system consists of a large number of files, so that the number of possible combinations is enormous. In everyday work during development, a developer usually wishes to have the latest revisions of the files being changed from a particular branch. For other unchanged files, the developer typically wants an older, stable version, such as that included in the latest product release or the most recently published stable version developed by another group. The development in the developer’s own group should be particularly flexible to make it possible to change between different levels of collaboration. For example, a change may require the modification of more than one file. In all situations, a consistent selection, in terms of the inclusion of versions with related modifications, should be ensured. A set of particular file versions, or, more generally, particular versions of CIs, is designated a configuration.

A useful technique for the specification of a configuration supported by several systems is to offer a rule-based selection mechanism. Typical examples of rules that can be advantageous to specify include:

The latest revision in my own branch (for files that I myself/the group work with);
The latest revision in a named temporary branch (for files that other groups work with);
The latest revision in a named permanent branch (for files that differ depending on the product);
The fixed, named, version (e.g., the latest release for other subsystems).

A system that is built using a rule specifying the latest version is called a partially bound (or generic) configuration, as the exact versions included will vary in time when new versions are checked in. A system built without such a rule is called a bound configuration and is particularly suitable for deliveries, as the versions of all files included are fixed and it can therefore be guaranteed that the system can be recreated.
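A rule-based selection mechanism can be sketched in a few lines. In the example below each rule is a triple of a file-name predicate, a rule kind, and an argument, and the first rule that matches a file decides its version. The rule kinds latest_on_branch and tagged, the data layout, and the function names are all hypothetical and do not correspond to the rule language of any specific tool.

```python
def select(files, rules):
    """Apply the first matching rule to each file and return {file: version}."""
    configuration = {}
    for name, info in files.items():
        for matches, kind, arg in rules:
            if not matches(name):
                continue
            if kind == "latest_on_branch" and arg in info["branches"]:
                configuration[name] = info["branches"][arg][-1]
                break
            if kind == "tagged" and arg in info["tags"]:
                configuration[name] = info["tags"][arg]
                break
    return configuration

def is_bound(rules):
    """A configuration is bound only if no rule asks for a latest version."""
    return all(kind != "latest_on_branch" for _, kind, _ in rules)

files = {
    "gui/window.c": {
        "branches": {"main": ["1.1", "1.2"], "my_fix": ["1.2.1.1", "1.2.1.2"]},
        "tags": {"release_2.3": "1.1"},
    },
    "net/socket.c": {
        "branches": {"main": ["1.1", "1.2", "1.3"]},
        "tags": {"release_2.3": "1.2"},
    },
}

# Everyday work: latest revision on my own branch for my files,
# the latest release for everything else.
work_rules = [
    (lambda name: name.startswith("gui/"), "latest_on_branch", "my_fix"),
    (lambda name: True, "tagged", "release_2.3"),
]
# Delivery: only fixed, named versions.
release_rules = [(lambda name: True, "tagged", "release_2.3")]

working = select(files, work_rules)
release = select(files, release_rules)
```

The working configuration changes as soon as a new revision is checked in on my_fix, whereas the release configuration can be recreated exactly at any later time.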

A certain bound configuration can form a baseline that functions as a basis for further development with formal change management. It can also form a release that is delivered to an internal or external customer.

In the same way that individual files have a version history that describes their evolution, configurations create their own version history as they evolve. Users and customers see the development of a system in large steps, namely the configurations (releases) that are distributed. Developers and project managers see many more stages in the development of the system, as well as the division into subsystems and configurations, each with their own version histories. Therefore, regarding the evolution of a system and its subsystems as a succession of bound configurations may be useful at several levels.

The naming of versions (tagging) can be used to manage the selection of bound configurations in that all of the files in the configurations are tagged with the same name (e.g., release 2.3). This is illustrated in Figure 3.2.

Figure 3.2: A bound configuration can be defined by tagging all files with the same label.

Relations between such configurations (e.g., that release 2.3 is a successor to release 2.2) are rarely supported by the SCM tools but must be managed in a different manner (e.g., in a release document).

Consistent naming may also be used to represent logical changes (i.e., changes arising from a change request and resulting in the modification of several files).

3.2.4 Build management

Build management supports the user in building the product or part of the product (e.g., a component or a library). Build tools such as Make [1] are used to create the product automatically. The correct versions of appropriate files are first collected for a particular build, as described in Section 3.2.3, and are then compiled and linked in the correct order. Make describes the dependencies between source code files at build time and ensures that the dependent source code is built in the correct order.

As building large systems may take days and an inefficient build process can waste hours of developer time, it is important to reuse components that have not changed since the most recent build as much as possible. This is particularly important during test and integration, when the entire system must be built to test a small change. An effective build process can reduce build time dramatically by reusing partially built items from previous builds.
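The idea of rebuilding only what has changed can be sketched without a real compiler. The example below orders targets by their dependencies and rebuilds a target only when the fingerprint of its inputs differs from the previous build. It uses content hashes, which is an assumption made for the example (classic Make compares file time stamps instead); the function names are hypothetical, and graphlib requires Python 3.9 or later.

```python
import hashlib
from graphlib import TopologicalSorter  # Python 3.9 or later

def fingerprint(target, deps, sources, built):
    """Hash everything a target depends on: source text or fingerprints of built inputs."""
    h = hashlib.sha256(target.encode())
    for dep in sorted(deps.get(target, ())):
        h.update(sources[dep].encode() if dep in sources else built[dep].encode())
    return h.hexdigest()

def build(deps, sources, previous):
    """Return (new fingerprints, list of targets actually rebuilt)."""
    built, rebuilt = {}, []
    # static_order yields each node after all of its dependencies.
    for node in TopologicalSorter(deps).static_order():
        if node in sources:
            continue  # plain source file, nothing to build
        built[node] = fingerprint(node, deps, sources, built)
        if previous.get(node) != built[node]:
            rebuilt.append(node)  # a real tool would compile or link here
    return built, rebuilt

deps = {
    "a.o": {"a.c", "common.h"},
    "b.o": {"b.c", "common.h"},
    "app": {"a.o", "b.o"},
}
sources = {"a.c": "int a;", "b.c": "int b;", "common.h": "#define N 1"}

state, first = build(deps, sources, {})      # everything is built
sources["b.c"] = "int b = 2;"                # a small change in one file
state, second = build(deps, sources, state)  # only b.o and app are rebuilt
```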

Many SCM tools have further developed ideas from Make. The build procedure is automatically created by the tool and often stored in a project file managed by the development environment.

3.2.5 Release management

The identification and organization of all deliverables (e.g., documents, executables, and libraries) incorporated in a product release is designated release management. Release management is closely related to configuration selection, build management, and change management. A particular configuration of items selected is used in the build process, and the items created in the build process are inputs to the release process.

Release management has a double role. First, it must prepare deliverables and all documentation for the users. The information accompanying a deliverable includes a list of (new) functions, changes implemented since the previous release, and demands on the run-time environment. Second, it provides information used internally for test purposes, maintenance, or further development. For example, release support makes it possible to determine which users have developed which versions of which components, and therefore which of these will be affected by a particular change.

With appropriate release management, installation kits can be created automatically, easing the task of the build manager. The build manager is responsible for providing the packed product with the correct configuration and features. Products such as Windows Installer [13] and InstallShield can be used to create installation kits. Hoek et al. [14] describe a prototype, designated Software Release Manager (SRM), which supports both developers and users in the software release management process. SRM incorporates the notion of components and helps in assembling them into systems. Dependencies are explicitly recorded so that users can understand and investigate them.

3.2.6 Concurrent development

One major advantage of using an SCM system is that it enables teams to work concurrently on a single project. This is advantageous for many reasons. Different developers may be working concurrently on the same files, correcting different errors, or one developer may be working on the latest release while another is correcting an error in a previous release. A test team can test the latest stable version while the development team is working on the latest (unstable) versions (see Figure 3.3).

Figure 3.3: Developers use different configurations concurrently.

The SCM system makes this possible by providing:

Selection of versions building specific configurations for different needs;
A model for synchronization of concurrent changes (e.g., by locking the files edited or by permitting changes to be made at all times but detecting conflicts at check in and then at merge, often called optimistic check out).
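The optimistic alternative to locking can be sketched as follows. No lock is taken at check out; instead, the repository remembers which version a change was based on and refuses a check in that is based on an outdated version, so that the developer must merge first. The class and method names are hypothetical, and the merge itself is represented only by a hand-written string.

```python
class Conflict(Exception):
    """Raised when a check in is based on a version that is no longer the latest."""

class OptimisticRepository:
    def __init__(self, content):
        self.versions = [content]

    def check_out(self):
        """Return (base version number, content); no lock is taken."""
        return len(self.versions), self.versions[-1]

    def check_in(self, base, content):
        """Accept the change only if nobody else has checked in since base."""
        if base != len(self.versions):
            raise Conflict(f"based on version {base}, latest is {len(self.versions)}")
        self.versions.append(content)
        return len(self.versions)

repo = OptimisticRepository("line A\nline B")
alice_base, alice_text = repo.check_out()
bob_base, bob_text = repo.check_out()       # both may edit at the same time

repo.check_in(alice_base, "line A fixed\nline B")
try:
    repo.check_in(bob_base, "line A\nline B fixed")
    conflict_detected = False
except Conflict:
    conflict_detected = True                # bob must merge before checking in

bob_base, latest = repo.check_out()
merged = "line A fixed\nline B fixed"       # result of merging, done by hand here
final_number = repo.check_in(bob_base, merged)
```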

For a more detailed description of synchronization models such as check out and check in, long transactions, and change sets and their use in different distributed development situations, we refer to [15].

3.2.7 Distributed development

The developers of a software system are often geographically dispersed. This situation is designated distributed development or remote development. Many SCM tools provide replicated repositories to support this process. There is no global master repository in most of the tools supporting replication, but all replicas are copies of the same repository, automatically kept synchronized. When a replica is modified by clients at one site, the updates are sent to the other replicas (in batches at a predefined frequency). When data is replicated between different servers for the first time, all the data in the repository must be sent/copied. Data sizes could be as large as several gigabytes. The next time synchronization/replication is performed, only update packages with a typical size of 4 to 5 MB need to be sent.

The implementation must avoid conflicts, and the synchronization cannot always be made totally automatic. In ClearCase [11], for example, the synchronization problem is left to the users by managing different branches at different sites. Only the site holding the ownership can create versions on that branch. In this way, it is always possible to send new versions created on a branch and “install” them at other sites without conflict. Versions on branches from other sites can still be viewed and used to merge from, creating a merged version on a branch owned by the site (see Figure 3.4).

Figure 3.4: Replication of repositories.
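The principle of branch ownership can be illustrated with a small model of two replicas. Each site may create versions only on the branches it owns, collects its updates in a batch, and installs the batches received from the other site. This is a simplified sketch of the idea, not a description of how ClearCase or any other tool implements replication; the class, method, and site names are hypothetical.

```python
class Replica:
    def __init__(self, site, owned_branches):
        self.site = site
        self.owned = set(owned_branches)  # branches this site may extend
        self.branches = {}                # branch -> list of versions
        self.outbox = []                  # updates not yet sent

    def create_version(self, branch, content):
        if branch not in self.owned:
            raise PermissionError(f"{self.site} does not own branch {branch}")
        self.branches.setdefault(branch, []).append(content)
        self.outbox.append((branch, content))

    def make_packet(self):
        """Batch all pending updates, as sent at a predefined frequency."""
        packet, self.outbox = self.outbox, []
        return packet

    def install(self, packet):
        """Apply versions created elsewhere; they never collide with local ones."""
        for branch, content in packet:
            self.branches.setdefault(branch, []).append(content)

site_a = Replica("site_a", ["main"])
site_b = Replica("site_b", ["site_b_dev"])
site_a.create_version("main", "main v1")
site_b.create_version("site_b_dev", "dev v1")

# Exchange update packets in both directions.
site_b.install(site_a.make_packet())
site_a.install(site_b.make_packet())

try:
    site_b.create_version("main", "not allowed")
    denied = False
except PermissionError:
    denied = True  # only the owning site can extend the branch
```

Because only one site can add versions to a given branch, installing a packet never has to reconcile two competing versions; merging between branches remains a task for the users.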

3.2.8 Change management

Change management keeps track of all changes in a product under development. The reason for a change can be the correction of an error, the improvement of a component, or the addition of functionality. Change management is often supported by separate tools integrated with the main SCM tool. Examples of such tools are PVCS tracker [16], Visual Intercept [17], and Rational ClearQuest [11].

Change management has two main objectives. The first is to provide support for the processing of changes. Such processing includes change identification, analysis, prioritizing, planning for their implementation, decisions for their rejection or postponement, and decisions for their integration in new product releases. The second objective is traceability (i.e., to make it possible to list all active and implemented changes). It should also be possible to track the modifications of different items due to a specific change.

3.2.8.1 Change management process

In a change management process, a change is usually identified by a change request (CR). When a change is initiated, a CR is created to track the change until it is resolved and closed. Figure 3.5 shows how a change proposal creates a CR as defined in [18]. The change control board (CCB), a group responsible for operational aspects of the project, analyzes the CR and determines the action to be taken. If the change is approved, the CR is assigned to the developer responsible for implementing the change. When the developer has performed the change, the status of the CR becomes implemented and testing can be performed. The CCB also decides which changes are to be included in the new product release or if the change will be included in existing product versions in the form of a service pack. The latter is also part of release management.

Figure 3.5: An example of a CR process.
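A CR process of this kind is naturally modeled as a small state machine. The sketch below is one possible reading of the process described above; apart from implemented, the state names and the allowed transitions are assumptions made for the example and are not taken from [18].

```python
# Allowed transitions between CR states (hypothetical).
TRANSITIONS = {
    "created": {"approved", "rejected", "postponed"},  # decided by the CCB
    "postponed": {"approved", "rejected"},
    "approved": {"assigned"},                # handed to the responsible developer
    "assigned": {"implemented"},
    "implemented": {"tested", "assigned"},   # failed tests send the CR back
    "tested": {"closed"},
    "rejected": {"closed"},
    "closed": set(),
}

class ChangeRequest:
    def __init__(self, cr_id, description):
        self.cr_id = cr_id
        self.description = description
        self.status = "created"
        self.history = ["created"]  # every status the CR has passed through

    def move_to(self, new_status):
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(f"{self.status} -> {new_status} is not allowed")
        self.status = new_status
        self.history.append(new_status)

cr = ChangeRequest("CR-17", "Crash when saving an empty file")
for step in ["approved", "assigned", "implemented", "tested", "closed"]:
    cr.move_to(step)
```

Keeping the history of states is what later allows response times between initiation and acceptance to be measured.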

3.2.8.2 Traceability

Change management includes tools and processes that support the organization and track the changes from the origin to the actual source code [19]. For each CR, it should be possible to see which versions of the modified files were created due to that request. Conversely, it should also be possible to answer the question, “for what reason (which CR) was this version of this file created?” (see Figure 3.6).

Figure 3.6: From a CR, it is easy to see which files have been changed (what version has been created) due to the CR. For a specific version of an item, it is also easy to see why this version was created, due to which CR.

Change management data can be used to provide valuable metrics about the progress of project execution [20]. From this data, it can be seen which changes have been introduced between two releases. It is also possible to check the response time between the initiation of the CR and its implementation and acceptance. Figure 3.7 depicts a part of a release document listing all CRs implemented between two releases. It also depicts one of these requests and the files changed due to this request.

Figure 3.7: As part of the release notes, all CRs implemented between the last and the new release are listed. From these CRs, the actual changes made can be traced as depicted in Figure 3.6.
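Traceability in both directions amounts to maintaining two indexes over the same data. The following sketch records which file versions were created for which CR and derives a list of CRs for a release from it; the class name ChangeLog, its methods, and the CR identifiers are hypothetical.

```python
from collections import defaultdict

class ChangeLog:
    def __init__(self):
        self.by_cr = defaultdict(list)  # CR id -> [(file, version), ...]
        self.by_version = {}            # (file, version) -> CR id
        self.release_of = {}            # CR id -> release that includes it

    def record(self, cr_id, file_name, version):
        """Register that a file version was created because of a CR."""
        self.by_cr[cr_id].append((file_name, version))
        self.by_version[(file_name, version)] = cr_id

    def versions_for(self, cr_id):
        """From a CR to the file versions it caused."""
        return list(self.by_cr.get(cr_id, []))

    def reason_for(self, file_name, version):
        """From a file version back to the CR that caused it."""
        return self.by_version.get((file_name, version))

    def include_in(self, cr_id, release):
        self.release_of[cr_id] = release

    def release_notes(self, release):
        """List the CRs implemented in a release."""
        return sorted(cr for cr, rel in self.release_of.items() if rel == release)

log = ChangeLog()
log.record("CR-17", "editor.c", "1.4")
log.record("CR-17", "file_io.c", "1.9")
log.record("CR-21", "editor.c", "1.5")
log.include_in("CR-17", "2.3")
log.include_in("CR-21", "2.3")
notes = log.release_notes("2.3")
```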

3.2.9 Integration with other tools

The first generations of commercial SCM tools, although providing sophisticated functions, often appeared to be difficult to deploy and integrate in the development process. The newer versions of SCM tools try to minimize this extra effort by integrating with the tools used in the development process. Typically, some of the functions are built into integrated development environments (IDEs). For instance, operations such as check in and check out are automatically performed when a file or an item is being changed. Similarly, when building a software product or a component, all files needed in the build procedure are automatically checked out. A seamless integration of SCM with different development and engineering tools is one of the crucial factors for its attractiveness on the market.

Different SCM tools provide different integration possibilities. Many tools provide a set of line commands that can be invoked from other tools by using the trigger mechanisms that those tools incorporate. This is not a high level of integration, as it depends on the capability, which may be limited, of other tools to invoke the commands. A more flexible and powerful interaction can be achieved by using the application program interface (API) provided by SCM tools. An API defines a set of interfaces to the SCM functions. Development tool vendors or development organizations can build their applications using interfaces provided by SCM tools. In some cases, suppliers of IDEs specify the interface of basic SCM functions. The SCM vendors then implement these functions in accordance with the interface specification, and in this way make possible the integration of their tools with the IDEs. Users of the same IDE can select different SCM tools and use them from the IDE in the same way.
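The arrangement in which an IDE supplier specifies the interface and SCM vendors implement it can be sketched with an abstract base class. Everything named here (ScmProvider, LockingTool, MergingTool, edit_and_save) is hypothetical and stands in for whatever interface a real IDE defines.

```python
from abc import ABC, abstractmethod

class ScmProvider(ABC):
    """Interface for basic SCM functions, as an IDE supplier might specify it."""

    @abstractmethod
    def check_out(self, path): ...

    @abstractmethod
    def check_in(self, path, comment): ...

class LockingTool(ScmProvider):
    """One vendor's implementation, based on locking."""
    def check_out(self, path):
        return f"lock and copy {path}"

    def check_in(self, path, comment):
        return f"store {path} and unlock ({comment})"

class MergingTool(ScmProvider):
    """Another vendor's implementation, based on merging."""
    def check_out(self, path):
        return f"copy {path}"

    def check_in(self, path, comment):
        return f"merge and store {path} ({comment})"

def edit_and_save(provider, path, comment):
    """IDE-side code: identical whichever SCM tool is plugged in."""
    return [provider.check_out(path), provider.check_in(path, comment)]

log_a = edit_and_save(LockingTool(), "main.c", "fix")
log_b = edit_and_save(MergingTool(), "main.c", "fix")
```

The IDE code in edit_and_save never refers to a specific tool, which is what lets users of the same IDE choose different SCM tools.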

The integration of SCM and other tools is far from perfect because of the different treatment of basic structures. SCM recognizes files as entities, and in some cases it is able to recognize their internal structure. This makes possible the provision of functions such as difference or merge between different versions of these entities. For instance, all SCM tools can manage ordinary text files: the differences are based on differences of lines in the files. However, SCM tools do not understand the semantics (i.e., the meaning) of the lines in a file. Development and engineering tools may have completely different structuring. In object-oriented programming, for example, a class is a basic entity. From a version-management point of view, the differences between two versions of a class are of interest. Similarly, in a relational database, tables and records are the basic entities. A comparison between two versions of a database should indicate which tables or which records are different. SCM is unable to answer this, as it does not recognize these structures.
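The gap between line-based and structure-aware comparison can be demonstrated with two versions of a small function that differ only in layout. In the sketch below, a line-based comparison (difflib) reports changes, while a comparison of the parsed syntax trees (the standard ast module) finds the two versions identical. Python source is used purely as a convenient example of a structured format.

```python
import ast
from difflib import unified_diff

old = "def area(w, h):\n    return w * h\n"
new = "def area(w,\n         h):\n    return (w * h)\n"  # reformatted only

# Line-based comparison: what a general SCM tool sees.
line_diff = list(unified_diff(old.splitlines(), new.splitlines(), lineterm=""))
lines_differ = len(line_diff) > 0

# Structure-aware comparison: what a tool that knows the language can see.
same_structure = ast.dump(ast.parse(old)) == ast.dump(ast.parse(new))
```

Here lines_differ is true although same_structure is also true: the line-based view reports a change where a tool aware of the structure would report none.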

Many SCM functions are of a general purpose and might be included in other tools. Many tools build in their own SCM functions. For example, certain word processors or CAD/CAM systems can save and manage different versions of a document. As these tools know the internal structure of such documents, they can provide difference and merge functions on a semantic level. A word processor is able to show that formatting is different in two versions of a document or that two paragraphs have changed their positions. Although in this aspect they are superior to general SCM tools, these tools can include only some basic SCM functions on an entity level. They are not able to define a configuration of a set of documents by specifying a particular version of each document. For this reason, they cannot replace SCM tools.
