Gathering Data


As part of the plan for assessing an application for migration from UNIX to Windows, you must determine the application context, structure, development environment, and usage model.

It is also important to understand the target development environment for the application migration. Examples include migration of both deployment and development environments to Windows and Microsoft Win32; continued development on UNIX with Windows as a cross-platform target; and porting to a Portable Operating System Interface for UNIX (POSIX) environment on Windows, such as Microsoft Interix.

You need to address these considerations, as well as the actual application source code migration, and possibly the migration of associated test programs and/or scripts. The point at this stage is not to conduct an analysis of the different elements involved in the migration, but rather to understand them at a high level and therefore make the preliminary decisions on how the migration should proceed.

Application Context

The application context consists of all the elements of information technology upon which the application depends. The migration must keep existing dependencies or in some way replace or remove them.

The application may have had a long history; it may not have even begun as a UNIX application. Determining information about the application's history (for example, whether it was already ported to various UNIX environments) may provide useful insights into how the application should be migrated to Windows.

UNIX Platform Support

A wide variety of UNIX platforms exist, and each has a different bearing on how migration can proceed.

Standardization efforts have existed throughout the history of UNIX, ever since its first incarnation at Bell Laboratories in the 1970s. Its code base was subsequently split when an early version of UNIX was adopted and enhanced at the University of California, Berkeley, to yield the Berkeley Software Distribution (BSD). Today, UNIX is a trademark administered by the Open Group, and refers to an operating system that conforms to the X/Open specification XPG4.2. This specification defines the name of, interfaces to, and behaviors of all UNIX operating system functions.

XPG4.2 aligns with UNIX95, properly called the Single UNIX Specification (SUS) and also known as SPEC1170, the project name that led to the specification. It is largely a superset of the earlier series of specifications. Approximately 80 percent of the SUS is made up of accredited International Organization for Standardization (ISO), POSIX, and American National Standards Institute (ANSI) C standards.

The current X/Open specification, UNIX98, aligns with SUS V2 (XPG5). This specification added POSIX.1b-1993 real-time and POSIX.1c-1996 threads and some additional thread interfaces, along with the traditional UNIX shared object interfaces (such as dlopen and dlsym). Only a few systems currently conform to this specification.

Many UNIX-based systems are available, such as Sun's Solaris, Hewlett-Packard's HP-UX, FreeBSD, Linux, and the Microsoft Interix subsystem. Determining which standards an application supports gives you a better chance of understanding its portability to either Win32 or Interix. For example, if an application runs on a version of UNIX that supports the same standards as Interix, it is likely that a direct port using Interix would be more cost-effective than rewriting some of the code for the Win32 platform.

One of the first things you need to do in your assessment is determine which UNIX standards the application is based on. You can do so either by determining what the application claims to support (for example, through its original documentation) or by using analysis tools.

Standards organizations often provide certification brands alongside the standards. These brands are used as a rubber stamp by software vendors in much the same way as the Windows Powered or Designed for Microsoft Windows brands. You should therefore be aware of both the standards and the brands associated with them.

The application may support any of the following standards:

  • ANSI C: X3.159-1989, ISO/IEC 9899-1990

  • BSD 4.2-4.4 (Berkeley UNIX)

  • GNU

  • Institute of Electrical and Electronics Engineers (IEEE) 1003.1-1990 part 1, also known as ISO/IEC 9945-1:1990 (POSIX.1)

  • IEEE Std 1003.1b-1993, also known as ISO/IEC 9945-1:1993, POSIX.1b (formerly POSIX.4)

  • IEEE 1003.2-1993 part 2, also known as ISO/IEC 9945-2:1993 (POSIX.2)

  • System V Interface Definition (SVID), SVID2, SVID3

  • System V4 UNIX

  • Version 7, the UNIX from Bell Laboratories

  • XPG2, XPG3

  • XPG4 (superset of POSIX - XPG Base branding)

  • XPG4v2 (UNIX95 branding)

  • XPG5 (UNIX98 branding)

UNIX Standards and Interix

The main purpose of Interix is to allow the direct port of an application to Windows, so Interix often supports a superset of standards or behaviors. Interix mainly follows POSIX.1, POSIX.2, and ISO C. When a piece of code isn't described in those specifications, Interix looks to the XPG4 specification, and then to various UNIX implementations, depending on the interface (for example, Solaris, BSDI, or Red Hat Linux).

The original core standards (POSIX.1-1990, ISO C (1990), and POSIX.2/2a-1992) continue to be important. For example, there have historically been two ways to determine the pseudo-terminal name to use, one BSD-based and the other SVID-based; Interix supports both, and all of the signal interfaces are supported by the POSIX.1 reliable signal model. Interix also supports many interfaces that are not part of the SUS or a POSIX standard but are nonetheless in constant use, for example, Open Network Computing (ONC) remote procedure call (RPC) and X11 Release 5 (R5).
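For illustration, the following minimal C sketch (not from the guide; the feature-test macro and device path are typical of SVID-derived systems) shows the SVID-style approach to obtaining a pseudo-terminal name through the /dev/ptmx clone device. The older BSD-style approach instead searches for a free /dev/pty?? entry.

 #define _XOPEN_SOURCE 600   /* expose grantpt/unlockpt/ptsname */
 #include <stdio.h>
 #include <stdlib.h>
 #include <fcntl.h>

 int main(void)
 {
     /* SVID style: open the pseudo-terminal clone device. */
     int master = open("/dev/ptmx", O_RDWR);
     if (master < 0) {
         perror("open /dev/ptmx");
         return 1;
     }
     grantpt(master);    /* set ownership and permissions on the slave */
     unlockpt(master);   /* release the internal lock on the slave */
     printf("slave pseudo-terminal: %s\n", ptsname(master));
     return 0;
 }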

If the application supports XPG4, there is a high probability that the application can be ported to Interix. It is more important, however, to determine the following:

  • Does the application have significant dependence on UNIX application programming interfaces (APIs)?

  • Does Interix support the great majority of the application's required APIs?

Infrastructure Dependencies

The infrastructure upon which the application depends includes client and server hardware, network elements, and such peripheral devices as external tape drives and printers.

The migration of the application can be to a completely new infrastructure configured to support Windows. If this is the case, there may be migration issues in the application software. For example, if you are migrating from one 32-bit computer architecture to another, you may face byte ordering issues. Some architectures number the bytes in a binary word from left to right, which is referred to as big-endian. Other architectures number the bytes in a binary word from right to left, which is referred to as little-endian. A notable computer architecture that uses big-endian byte ordering is Sun's SPARC. Intel architecture uses little-endian byte ordering, as does the Compaq Alpha processor.

Using the big-endian and little-endian methods, the number 0x12345678 would be stored as shown in Table 4.1.

Table 4.1: Big-Endian and Little-Endian Byte Ordering Example

  Method          Byte 0   Byte 1   Byte 2   Byte 3
  Big-endian      12       34       56       78
  Little-endian   78       56       34       12

The following code snippet illustrates how the use of the big-endian and little-endian methods can affect the compatibility of applications.

 #include <unistd.h>
 #include <sys/stat.h>
 #include <fcntl.h>
 #include <stdio.h>     /* needed for printf */
 #include <stdlib.h>    /* needed for exit */

 int main()
 {
     int buf;
     int in;
     int nread;

     in = open("file.in", O_RDONLY);
     nread = read(in, (int *) &buf, sizeof(buf));
     printf("First Integer in file.in = %x\n", buf);
     exit(0);
 }

In the preceding code, if the first integer word stored in the file.in file on a big-endian computer were the hexadecimal number 0x12345678, the resulting output on that computer would be as follows:

 % ./test
 First Integer in file.in = 12345678
 %

If the file.in file were read by the same program running on a little-endian computer, the resulting output would be as follows:

 % ./test
 First Integer in file.in = 78563412
 %

Because of the difference in output, you would need to rewrite the program so that it can read integers from a file based on the endian method that the computer uses.
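One portable remedy (a minimal sketch, not from the guide; swap32 is a hypothetical helper) is to fix the byte order of the file format and swap on hosts whose native order differs:

 #include <stdio.h>

 /* Reverse the byte order of a 32-bit value. */
 unsigned int swap32(unsigned int x)
 {
     return ((x >> 24) & 0x000000ffU) |
            ((x >>  8) & 0x0000ff00U) |
            ((x <<  8) & 0x00ff0000U) |
            ((x << 24) & 0xff000000U);
 }

 int main(void)
 {
     /* Detect the host byte order at run time. */
     unsigned int probe = 0x12345678;
     int host_is_little = (*(unsigned char *) &probe == 0x78);

     /* Value as read from a big-endian file.in on a little-endian host. */
     unsigned int buf = 0x78563412;
     if (host_is_little)
         buf = swap32(buf);   /* normalize to the file's big-endian order */

     printf("First Integer in file.in = %x\n", buf);
     return 0;
 }

The standard htonl and ntohl functions perform the same normalization against the 32-bit network (big-endian) byte order.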

The other computer hardware dependencies to check for are low-level hardware API calls or calls to specific devices, such as those shown in Table 4.2.

Table 4.2: Hardware API Calls

  Hardware Dependency          Check for these calls or devices
  Memory allocation            alloca()
  Cache management             cacheflush() (MIPS processor-specific)
  Port input                   inb(), inw(), inl(), insb(), insw(), insl()
  Port output                  outb(), outw(), outl(), outsb(), outsw(), outsl()
  Paused input/output (I/O)    outb_p(), outw_p(), outl_p(), inb_p(), inw_p(), inl_p()
  Input device management      joystick, keyboard, mouse
  Display management           video graphics adapter (VGA)

The presence of such API calls in the application code requires that you rewrite the code elements to minimize the hardware dependencies (through the inclusion of alternate, low-level routines) or that you replace the API calls. You must then decide whether the code is to be portable between the two platforms, whether alternate code bases are to exist, or whether the code for the original platform is to be made obsolete.

Applications can have dependencies on peripheral devices as well as computer hardware. Peripherals include not only printers and storage devices, but also firewalls (for example, to set up a virtual private network tunnel to another application) and domain-specific communications devices (for example, to connect to teller computers or telemetry equipment).

For the migration to be successful, you must know what devices are in use by the original application. These devices may have their own dependencies; therefore, it is appropriate to create an inventory of all of the devices concerned, and include the following information for each device:

  • Device name, type, and purpose

  • Software versions that the device runs

  • Mechanisms that the application uses to access the device

  • Dependencies on other devices and software packages

  • Whether the migrated application will use the device

The information that such an inventory provides will help you identify potential issues; for example, whether Windows can support each device or whether you should seek an alternative.

Third-Party Libraries

Most applications use externally sourced libraries of code or application software extensively, thus enabling common functionality to be reused rather than redeveloped. In particular, many package suppliers (including database suppliers) provide code libraries to enable API-level access to their own products.

Use of a library implies access to functionality that the target platform must support. There are a number of ways to access a third-party library:

  • A version of the library may exist for the target platform.

  • The library can be accessed from the original platform by means of RPC (this may be the case for databases).

  • Code may need to be developed to replace or render unnecessary the library access.

  • Code can be rewritten to support alternative services provided by the target platform.

You can determine whether the UNIX application is using third-party libraries by using the UNIX grep command on the Makefile for the application:

 % grep -e "-l" -e "/lib" Makefile

Using the grep command in this way yields output that may look like the following example:

 /apps/oracle/lib/libclient.a
 /apps/oracle/lib/libcommon.a
 /apps/oracle/lib/libgeneric.a
 /apps/oracle/lib/libsqlnet.a
 /apps/oracle/lib/libc3v6.a
 /apps/oracle/lib/libcore3.a
 /usr/lib/X11R6
 /usr/lib/Motif1.2_R6
 /xrt/lib/libxrtfield.a
 -lXm -lXt -lX11
Note  

The output in the preceding example was generated from a real application's Makefile. It has been edited to remove application-specific references.

Some applications make use of multiple Makefiles, in which case the UNIX find command can be used in conjunction with grep to traverse the source hierarchy.

For example:

 % find . -name "[Mm]akefile" | xargs grep -e "-l" -e "lib.*[.].*[aso]"

Analyzing the preceding output leads to the conclusion that the application is using the following third-party libraries:

  • Oracle Call Interface (OCI) libraries

  • X11 Release 6 (R6)

  • Motif 1.2 for X11R6

  • X11 toolkit library

  • X11 miscellaneous utilities library

  • Xrt widget library for Motif

If you have access to the binary libraries or executable files for the application, you can also run the ldd utility (or a similar utility) to obtain an inventory of the application's shared library dependencies (that is, libraries with a .so extension instead of the .a extension for static library archives).

For example, the dependencies of the test program shown in the earlier code snippet and output samples that illustrated the big-endian and little-endian methods, running on a Linux 7.1 system, yield the following output:

 $ ldd test
     libc.so.6 => /lib/i686/libc.so.6 (0x40021000)
     /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)

The preceding output shows that the simple test program depends on the standard C shared library major version 6 (libc.so.6).

Some libraries can be cross-platform libraries, which provide a platform-independent API to functionality such as database access, graphical user interface (GUI), or networking. Such libraries encapsulate the different APIs of different operating systems, providing an application with a common set of APIs for all operating systems. Example cross-platform libraries include:

  • Rogue Wave's SourcePro C++

  • Trolltech's Qt product

  • INT's Carnac and GeoToolkit GUI toolkits

As with hardware dependencies, it is useful to produce an inventory of library dependencies as you find them. The inventory can document the following:

  • The name, type, and context of the library

  • The purpose that the dependency serves

  • The impact that the dependency has on the migration

  • A recommendation for how to treat the dependency

Interoperability with Other Applications

In addition to libraries that support the operation of software, there are other applications and software packages that either are required by the application or depend on the application themselves. These applications and software packages can include database management systems, messaging and groupware applications, content management and Web server software, accounting software, resource management software, and sales automation software.

Migrating one application can have a number of impacts on such external software, including:

  • How data will be shared at a low level (through files and the passing of information) and at the user interface level (through the Clipboard and the drag-and-drop feature).

  • How the applications will intercommunicate.

  • How users will access the applications; that is, through a single interface or portal, or through multiple interfaces.

  • How security (authentication and access controls) will be managed between applications.

A first step toward resolving these issues is to gain an understanding of what other programs exist and the dependencies that exist between them. Again, the starting point is an inventory, to document each package and how it can affect the migration.

External software can have its own dependencies on libraries or hardware, which may further influence the migration. For example, a dependent software package already running on a Windows operating system may require an older version of Windows than the one intended for the migration. This can result in a decision to upgrade the external software package to a more recent version so that the two applications can reside on the same server; alternatively, two servers may be run in parallel, each running a different version of Windows. Each option offers trade-offs that are best explored through a cost-benefit analysis, which is discussed later in this chapter.

Application Structure

The next step toward the goal of migrating an application from UNIX to Windows is to gain a high-level view of the application to determine the main building blocks of the application, and therefore the application's scope (that is, the sum of the building blocks). This high-level view also helps establish the boundaries of the application.

These building blocks will enable the migration itself to be scoped, providing definitions of what parts of the application need to be migrated. After the objectives for the migration have been decided, each building block can be analyzed in more detail to determine exactly how it should be migrated.

An appropriate way to view the building blocks of an application is in terms of its layers, as described in the sections that follow.

Application Types

Most commonly used applications are either workstation-based or server-based. Workstation-based applications run at the UNIX workstation (desktop) computer and access data that resides on network file shares or database servers. Workstation-based applications have the following architectural characteristics:

  • They can be single-process (monolithic) applications or multiple-process applications.

  • They use character-based user interfaces or GUI-based (for example, X Windows or OpenGL) user interfaces.

  • They access a file server (through the network file system [NFS]) or a database server for data resources.

  • They access a compute server for compute-intensive services (for example, finite element models for structural analysis).

The second application type, server-based applications, runs on a server and is accessed from a UNIX workstation (desktop) client through telnet and remote shells (character-based applications), an X Windows server (GUI-based applications), interprocess communication (IPC) (client-server applications), and other means. Server-based applications can be described as one of the following UNIX software architectures:

  • Compute servers that use a messaging mechanism, such as RPC or sockets, for IPC between the server and the client computers, both to make method calls and to obtain result sets from those calls. Compute servers also typically use the Message Passing Interface (MPI) for load balancing. (A minimal socket-based exchange is sketched after this list.)

  • Database servers that provide interfaces (such as Open Database Connectivity [ODBC], OLE DB, and OCI) to clients for access to database tables, views, stored procedures, and triggers.

  • Web servers that contain JavaServer Pages (JSP) and Common Gateway Interface (CGI) access programs for generating dynamic Hypertext Markup Language (HTML) content from servers such as database servers and file servers.
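To make the compute-server pattern concrete, here is a minimal client-side C sketch (illustrative only; the address, port, and "SOLVE" request line are hypothetical, not a real protocol). Note that htons converts the port number to network byte order, the big-endian convention discussed earlier.

 #include <stdio.h>
 #include <string.h>
 #include <unistd.h>
 #include <arpa/inet.h>
 #include <netinet/in.h>
 #include <sys/socket.h>

 int main(void)
 {
     /* Hypothetical compute-server address and port. */
     struct sockaddr_in srv;
     int s = socket(AF_INET, SOCK_STREAM, 0);

     memset(&srv, 0, sizeof(srv));
     srv.sin_family = AF_INET;
     srv.sin_port = htons(9000);                /* assumed service port */
     inet_pton(AF_INET, "127.0.0.1", &srv.sin_addr);

     if (connect(s, (struct sockaddr *) &srv, sizeof(srv)) < 0) {
         perror("connect");
         return 1;
     }

     /* Send a request and read back the result set, as a compute client would. */
     const char *request = "SOLVE model=beam42\n";
     write(s, request, strlen(request));

     char reply[256];
     ssize_t n = read(s, reply, sizeof(reply) - 1);
     if (n > 0) {
         reply[n] = '\0';
         printf("server replied: %s", reply);
     }
     close(s);
     return 0;
 }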

Later sections in this chapter discuss application migration methodology for UNIX workstation and server application architectures, together with the migration of those architectures to a Windows workstation and/or server environment.

User Interfaces

The type of user interface has a clear bearing on the application migration. The following sections discuss the three most common types of user interfaces for a UNIX application: character based, graphical, and browser based.

Character-Based Interfaces

The traditional UNIX user interface incorporates a character-oriented, scrolling terminal. UNIX was created before GUIs were available. A user types commands in the UNIX user interface, and UNIX responds with ASCII character output. UNIX was developed as a multiple-user, timesharing operating system, which assumed that users would sit at so-called dumb terminals and type commands on the keyboard.

Applications often use both cursor movement and the graphics capabilities found in modern terminals or emulators. To support this, UNIX has a database of terminal capabilities known as termcap. Some newer implementations of this database use binary data and are known as terminfo. These libraries allow application writers to query the database for specific cursor movement commands, so applications can operate with a variety of hardware without embedding specific terminal commands into the application.

Another application development package specifically designed to alleviate the problem of terminal dependence is the curses library, originally written at the University of California, Berkeley. It is a set of functions for manipulating terminal input and output (mostly output). These functions perform such actions as clearing the screen, moving the cursor to a specific row and column, and writing a character or string to the screen. There are also input functions to retrieve user input in various modes, such as reading the input one character at a time or as a character string. Curses and similar libraries enable companies to create highly interactive, character-based applications, such as text editors.
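A minimal curses sketch (assuming a system with the curses development files; link with -lcurses or -lncurses) shows the style of terminal-independent screen handling described above:

 #include <curses.h>

 int main(void)
 {
     initscr();                              /* enter curses mode; look up terminal capabilities */
     clear();                                /* clear the screen */
     mvaddstr(5, 10, "Hello from curses");   /* move to row 5, column 10, and write a string */
     refresh();                              /* flush the pending output to the terminal */
     getch();                                /* wait for a single keypress */
     endwin();                               /* restore the terminal to its normal state */
     return 0;
 }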

Character-based interfaces can be ported directly into an Interix environment, or they can be rewritten to support Windows user interface standards. You should choose the latter option when user interface standardization is a main reason for the migration.

Graphical Interfaces

Graphical interfaces in UNIX are typically based on the X Windows standard. X Windows (typically called X) is a combination of several elements: the X Protocol, X Display Server, X Clients, low-level APIs called Xlib (libX11.a), and higher-level libraries. The X Windows standard was developed at MIT to create a platform-independent, network-based, graphical user environment. X Windows separates the server part that draws the graphical interface (the X Display server) from the client, the application program that uses X Windows. The server and client can run on separate computers, so the application can run on a powerful compute server while the X Display server runs at a workstation, listening for network connections at a specific port and acting on the commands sent by the X Clients.

Because X Windows is a set of toolkits and libraries, it has no single look and feel the way Windows or Mac OS does. Motif is the windowing system, library, and user interface style built on the X Windows system. Motif handles windows and a set of user interface controls known as widgets. Widgets cover the whole range of user interface elements, including buttons, scroll bars, and menus.

Even X Windows with Motif is sometimes not enough to meet an application's requirement for a user interface rich in widget content. Additional widget libraries have therefore been developed. One example is the xrt widget library, which makes use of the Motif libraries. Other libraries that you may encounter include Trolltech's Qt product and INT's Carnac and GeoToolkit GUI toolkits.

Graphics-intensive applications may support additional user interface standards, such as OpenGL. OpenGL has become a widely used and supported two-dimensional and three-dimensional graphics API. OpenGL fosters innovation and speeds application development by incorporating a broad set of rendering, texture mapping, special effects, and other powerful visualization functions. Developers can take advantage of the capabilities of OpenGL across all popular desktop and workstation platforms, ensuring wide application deployment.

OpenGL runs on every major operating system, including Mac OS, OS/2, UNIX, Microsoft Windows 95, Microsoft Windows 98, Windows 2000, Microsoft Windows NT, Linux, OPENSTEP, and BeOS; it also works with every major windowing system, including Win32, Mac OS, Presentation Manager, and X Windows. OpenGL includes a full complement of graphics functions. The OpenGL standard has language bindings for C, C++, Fortran, Ada, and Java, so applications that use OpenGL functions are typically portable across a wide array of platforms.

Because Interix supports X Windows and an array of platforms support OpenGL, you may be able to move graphical UNIX applications to Interix with little or no modification. However, older applications may use an older version of the windowing system, or may use older widget libraries that have no direct mapping in Interix. Also, as with character-based interfaces, you may prefer the option of rewriting the interface to provide a standard look and feel across Windows-based applications.

Browser-Based Interfaces

The advent of the Web has resulted in a number of applications relying on Web browsers for their user interfaces. Such interfaces can usually be ported with a minimum of effort because of the platform independence of nearly all the standards used by Web browsers and Web servers.

Because Web-based interfaces require little or no migration effort, this guide does not cover them.

Application Code

The application code provides a first step toward analyzing the application structure. Most UNIX code is written in either C or C++. However, Fortran is also a language to consider. Fortran code is common in compute-intensive scientific and engineering applications. A Fortran application can be a stand-alone application or a routine called from other languages, or it may call routines in other languages. Fortran is commonly used in loosely coupled distributed grid computing. Distributed grid computing typically requires integration with dynamic scheduling and high-performance message passing, such as MPI.

A Fortran application requires additional planning. The Fortran considerations for using either the Interix POSIX subsystem or UNIX emulator are the same as for C and C++ migrations. Note, however, that the GNU Fortran 77 compiler, f77, is currently the only Fortran compiler available for Interix.

Low-Level Services

Any UNIX application that primarily uses C and C++ run-time calls may compile with modification against the Win32 API. If, however, the application makes system calls for low-level services such as process creation, shared memory, semaphores, signals, and message queues, or makes calls to a third-party library not available on Win32, these calls will not compile or link with Win32. You should consider a rewrite when only a few of these calls are made; however, a port to the Interix subsystem is preferable for applications that make large numbers of UNIX system calls, especially when those calls are made throughout the code and are not limited to easily identifiable modules.
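For example, this minimal sketch (illustrative only) depends on fork and POSIX signals, services with no direct Win32 counterpart; code like this ports directly to Interix but requires redesign for Win32:

 #include <stdio.h>
 #include <stdlib.h>
 #include <signal.h>
 #include <unistd.h>
 #include <sys/wait.h>

 static void on_usr1(int sig)
 {
     (void) sig;   /* reliable-signal handler; nothing to do here */
 }

 int main(void)
 {
     signal(SIGUSR1, on_usr1);

     pid_t child = fork();            /* create a child by duplicating this process */
     if (child == 0) {
         kill(getppid(), SIGUSR1);    /* child signals the parent */
         _exit(0);
     }
     waitpid(child, NULL, 0);         /* parent reaps the child */
     printf("fork/signal round trip complete\n");
     return 0;
 }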

At this stage, it is sufficient to understand the presence and use of low-level services in the application, such that you can give those services a thorough analysis at the evaluation stage. Use the information in Chapter 8, Preparing for Migration, and a text editor or text search tool (such as grep) to perform an initial analysis of the code for UNIX API dependency. You can then identify areas of the code that are likely to contain UNIX-specific services, for example, in device-specific or process management code.

Installation, Configuration, and Execution Scripts

UNIX applications usually have a complement of scripts that are used in the configuration, build, installation, setup, and deployment environments to support the application. These scripts are written in a variety of languages; common choices include Perl and the shells (Bourne, Korn, C, and Bash). The scripts take advantage of a number of UNIX utility features, along with features of the application itself.

The functions performed by UNIX scripts and utilities include configuration, setting up the user's desktop environment, scheduling (cron), monitoring (for example, Simple Network Management Protocol [SNMP]), parsing or searching (grep, sed, and awk), and administration (for example, distribution of updated application binaries). Migrating an application from UNIX to Windows requires a combination of scripting languages to create similar script syntax and utilities to perform these functions.

To choose the correct combination of scripting languages, address the following questions:

  • Are UNIX shell scripts used for configuration of the application environment? If so, further determine the following:

    • Which shell scripts are used for configuration (C, Korn, Bourne, Bash)?

    • Are any core UNIX utilities used for tasks such as installation and setup? (Core utilities are those found in the /bin directory.)

    • Which file systems do the shell scripts access, and do those file systems use or expect a UNIX-style, single file system root? For more information about UNIX file systems, see Chapter 8, Preparing for Migration.

    • Is any current use made of cross-platform scripting, such as Perl, NuTCRACKER, or Interix?

  • How is an application started, and what parameters are needed? Must the superuser start the application?

  • Is a daemon used? In UNIX, you can start an application in the background and log off, and the application continues to run. This can't be done with Windows 2000 unless the application is a service.

  • How is the application closed? It is common in UNIX to have an application that receives a message to close; is the application then responsible for closing all the other applications?

  • Does this application interface with any other applications, and if so, which ones?

  • Which shells does this application require or rely on?

  • Do the migrated applications require standard UNIX utilities (such as cd or ls) at run time?

Migrated applications often require specific UNIX-style components in their deployment environments. Because of this requirement, the UNIX emulated environment (that is, NuTCRACKER or Cygwin) or UNIX native environment (that is, Interix) must provide a full set of UNIX utilities and shell environments, such as Korn and C. Also because of this requirement, the scripts must be migrated (rewritten or ported) to the Windows 2000 environment.

Development Environment

The development environment not only provides valuable input to the migration strategy, it also must be migrated itself. You need to understand the development environment for the application so that you can provide a basis for the application's continued development, support, and maintenance. The sections that follow discuss the tools that the development environment may include.

Modeling Tools

Modeling tools include such facilities as graphical tools in support of a given methodology, such as Structured Systems Analysis and Design Methodology (SSADM), Rational Unified Process (RUP), or Unified Modeling Language (UML). Although such tools are normally independent of the applications, facilities such as code generators and code synchronizers enable models to be kept up to date with the code.

The migration of modeling tools forms an important part of your application migration from UNIX to Windows. You therefore must include the migration of these tools in your planning process by determining the answers to the following questions:

  • Is a specific methodology being adhered to?

  • Are graphical modeling tools being used?

  • Is there an ongoing requirement to keep models up to date with the code?

Many modeling tools are available across different platforms, including Windows. Therefore, the migration of these tools should not significantly affect the migration of the application. Because of this, and because there is such a wide range of tools on the market, this guide does not cover the migration of modeling tools in detail.

Build Tools

As part of your application data-gathering efforts, you must know which build tools and configuration scripts are used in the application s build environment.

The most common build tool for UNIX applications is the Make utility, which works with the Makefile configuration file to automatically determine which pieces of a large program need to be recompiled. The Make utility then issues the commands (for example, gcc commands) to recompile those pieces. Every time a source file changes, entering make at a shell prompt causes all the necessary recompilations to be performed. The Make utility uses the last-modification times of the files to decide which of the files must be updated. For each of those files, the Make utility issues the commands recorded in the Makefile.

There are two main implementations of Make: BSD's Make and GNU's Make, which is referred to as gmake. Make can be used with any programming language that has a compiler that runs with a shell command. In fact, Make is not limited to programs. It can be used to describe any task where some files must be updated automatically from others whenever the others change.

To use Make, you must create (or generate, for example, through a Configure script, as described later in this section) the Makefile, which describes the relationships among files in the program and states the commands for updating each file. For a program, the executable file is typically updated from object files, which are in turn made by compiling the source files.

The Make utility carries out commands in the Makefile to update one or more target names, where the name is typically a program. If no -f option is present, Make looks for the files GNUmakefile (gmake only), makefile, and Makefile, in that order. The first name checked, GNUmakefile, is not normally used; it is used only if a Makefile is specific to gmake and will not be understood by other versions of Make. The Make utility updates a target if the target depends on prerequisite files that have been modified since the target was last modified, or if the target does not exist.
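A minimal Makefile sketch (hypothetical file names, not from the guide) shows these target/prerequisite relationships; note that each command line must begin with a tab character:

 # Relink 'app' only when an object file is newer than the executable.
 CC     = gcc
 CFLAGS = -O2

 app: main.o util.o
 	$(CC) $(CFLAGS) -o app main.o util.o

 # Each object file depends on its source file and a shared header.
 main.o: main.c util.h
 	$(CC) $(CFLAGS) -c main.c

 util.o: util.c util.h
 	$(CC) $(CFLAGS) -c util.c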

Imake is a C preprocessor interface to the Make utility. It is used to generate Makefiles from a template, a set of C preprocessor (cpp) macro functions, and a per-directory input file called an Imakefile. This interface allows computer dependencies (such as compiler options, alternate command names, and special make rules) to be kept separate from the descriptions of the various items to be built.

The xmkmf command is the usual way to create a Makefile from an Imakefile that is shipped with the application software. When invoked with no arguments in a directory that contains an Imakefile, the Imake program is run with arguments appropriate for the system (configured into xmkmf when X Windows was first installed) and generates a Makefile.

When invoked with the -a option, xmkmf builds the Makefile in the current directory and then automatically executes make Makefiles (in case there are subdirectories), make includes, and make depend. This is the usual way to configure software that is outside the X Windows build tree.

You can use programs such as autoconf to generate the configuration scripts automatically, or you can use the Make utility's CFLAGS variable to convey the configuration information. Build and timestamp information is captured in a Makefile for input to the UNIX Make program.

Many application software packages come with a Configure script of some kind. The Configure script attempts to deduce features of the operating system and set a series of compile-time macros, which turn on and off code appropriate to the operating system.

Configure is a script toolkit that attempts to determine what kind of system it is running on and then correctly build a Makefile, which in turn is used by make to build the application. The Configure script uses a number of tests to check for the presence or absence of a specified feature; many of those tests involve small programs that are run through the C preprocessor or compiled. The feature tests are generally based on the output of the C preprocessor or compiler driver rather than on execution of the test program. Configure works well for automating and running the same build process repeatedly; however, you should allow time for such builds to be configured. (For more information, see Chapter 7, Creating the Development Environment.)
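The throwaway test programs are typically of the following form (a hypothetical sketch in the style of autoconf's function checks; the real scripts generate these automatically). If the program compiles and links, Configure defines a macro such as HAVE_STRLCPY; if not, the application falls back to other code.

 /* conftest.c: compiled and linked, but never run, to test for strlcpy. */
 #include <string.h>

 int main(void)
 {
     char buf[8];
     /* Success here means the C library provides strlcpy. */
     strlcpy(buf, "test", sizeof(buf));
     return 0;
 }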

Note  

This chapter describes only the most common Configure scripts, those generated through the GNU program autoconf 2.53. Every package has different needs, and this section does not cover all of the Configure scripts that can be used on the many packages available.

Compilers

UNIX compilers process input files through multiple stages: preprocessing, compilation, assembly, and linking. Suffixes for source file names identify the source language, but the name used for each compiler governs the default assumptions. For example, gcc assumes that preprocessed (.i) files are written in C and therefore assumes C style linking; g++ assumes that preprocessed (.i) files are written in C++ and therefore assumes C++ style linking.

A large number of options can be provided to control the processing. For example, options can switch debugging on and off, refer to external libraries, and specify computer-dependent behavior.

A compiler can be used to gather some information about the application's ANSI C compliance. For example, as part of the Quick Port migration described later in this chapter, the build Makefile can be modified to include gcc language options that control the dialect of C that the compiler accepts. The -ansi option supports all ANSI C programs and turns off certain features of GNU C that are incompatible with ANSI C. The -pedantic option issues all the warnings demanded by ANSI C and rejects all programs that use forbidden extensions. Valid ANSI C programs should compile properly with or without the -pedantic option (though a rare few will require -ansi). However, without the -pedantic option, certain GNU extensions and traditional C features are supported in addition to the ANSI features.
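As a quick illustration (hypothetical file name), the following program uses two common GNU/C99 extensions to C89; gcc check.c accepts it silently, while gcc -ansi -pedantic check.c issues diagnostics for both marked lines:

 #include <stdio.h>

 int main(void)
 {
     // C++-style comment: not part of ANSI C (C89)
     long long big = 1234567890123LL;  /* 'long long' is likewise an extension to C89 */
     printf("big = %lld\n", big);
     return 0;
 }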

Integrated Development Environments

Integrated development environments (IDEs) typically provide all the development tools needed for programming, including compilers, linkers, and project/configuration files that generate complete applications, create new classes, and integrate those classes into the current project. IDEs also include file management for sources, headers, documentation, and other material to be included in the project. Many IDEs provide WYSIWYG (what you see is what you get) creation of user interfaces through built-in dialog box and resource editors. Debugging of the application, and the inclusion of any other program needed for development by adding it to a Tools menu, are also typical capabilities.

Microsoft Visual Studio is an IDE that includes all the functions and capabilities just described, including a complete set of development tools for building reusable applications in Microsoft Visual Basic, Microsoft Visual C++, Microsoft Visual J++, and Microsoft Visual FoxPro.

The Visual Studio development system also includes Microsoft Visual InterDev, which provides easy integration with the Internet and a full Web page authoring environment; Microsoft Visual SourceSafe, which provides source control; the Microsoft MSDN Library, which provides development information, API references, code samples, and other information; and Installer, which helps developers create highly reliable, self-repairing applications.

Visual Studio and the individual tools and languages that it contains are the foundation for building Windows-based components and applications, creating scripts, developing Web sites and applications, and managing source code.

Visual Studio allows the developer to:

  • Write less code by providing programming wizards, drag-and-drop editing, and reuse of program components from any of the Visual Studio languages.

  • Write code more quickly by minimizing errors with syntax and programming assistance within the editor.

  • Integrate dynamic HTML, script, and components into Web solutions.

  • Manage Web sites from testing to production by means of integrated site management tools.

  • Create and debug Active Server Pages (ASP).

  • Use design-time controls to visually assemble data-driven Web applications.

Visual Studio also includes the Windows 2000 Developer s Readiness Kit, which contains developer training and technical resources.

Testing, Debugging, and Code Optimization Tools

After the build process has created the application's executable files, the next step should be a test process. The test process can be partially or wholly automated through tools or scripts. In most cases, you need to duplicate the test process, along with its associated tools or scripts, on Windows.

Test Process

When creating a test process for the application, consider the following:

  • Does the application have documented test procedures?

  • Does the application have its own test programs or test scripts?

  • If the application has test programs or scripts, how are they implemented? For example, are they implemented through C programs or scripts, Perl, or Korn shell scripts?

  • Is the application instrumented with compiler preprocessor symbols?

It is a good idea to test the current UNIX versions of the application so that you can validate the process and verify that any testing programs and scripts are up to date. This may help you to identify a latent bug before you find it on Windows.

Tools and/or Scripts

If the application comes with its own test programs or scripts, consider migrating them along with the application. The test programs and scripts can and should go through the same portability analysis as the application programs.

Debuggers

Determine what has been used for debugging the application on the UNIX platforms. For example, the GNU debugger, gdb, is available on almost every platform.

Packaging and Archiving Tools

It is important to understand the different archiving, packaging, and compression facilities that are typically used for UNIX applications, because it is likely that the code will be delivered in one of these forms and will then have to be imported into the Windows environment.

The following are the archiving and packaging facilities typically used for UNIX applications:

  • tar . The tape archive program, which uses a variety of formats and has nothing to do with tape. It is still one of the most popular formats.

  • cpio . Copies files into or out of a cpio or tar archive. It was intended to replace tar. The cpio archives can also be unpacked by pax.

  • pax . The POSIX.2 standard archiver. It reads, writes , and lists the members of an archive file (including tar-format archives), and also copies directory hierarchies.

  • rpm . The Red Hat Package Manager.

  • ar . Creates and maintains groups of files combined into an archive. It is not typically used for source archives, but is almost always used for object file archives (static object libraries).

The following are the compression formats typically used for UNIX applications:

  • compress . Creates a compressed file by using adaptive Lempel-Ziv coding to reduce the size of files (typically 70 percent smaller than the original file).

  • zip / gzip . Algorithms combine a version of Lempel-Ziv coding (LZ77) with another version of Huffman coding in what is often called string compression.

  • pack . Compresses files by using Huffman coding.

  • uncompress , zcat . Extracts compressed files.

  • gunzip . Decompresses files created through compress, zip, gzip, or pack. The detection of the input format is automatic.

  • unpack , pcat . Restores files compressed by pack.

Table 4.3 summarizes common suffixes for archived and compressed file names.

Table 4.3: Archived/Compressed File Suffixes

  Suffix   Format         Description
  .a       ar             Created by and extracted with ar
  .cpio    cpio           Created by and extracted with cpio
  .gz      gzip           Created by gzip and extracted with gunzip
  .tar     tar            Created by and extracted with tar
  .Z       compress       Compressed by compress; uncompressed with uncompress, zcat, or gunzip
  .z       pack or gzip   Compressed by pack and extracted with pcat, or compressed by gzip and extracted with gunzip
  .zip     zip            Compressed by zip and extracted with unzip, or compressed by pkzip and extracted with pkunzip

Source Code Management Tools

Source code management tools group a number of facilities, such as source code control, build management, and code archiving. A number of source control systems are used in UNIX and cross-platform environments, including Revision Control System (RCS), Source Code Control System (SCCS), Concurrent Version System (CVS), and Program Version Control System (PVCS) Dimensions.

When reviewing the application to be migrated, consider the following:

  • Is the code currently stored in a code management system?

  • Does the code management system operate in both the UNIX and Windows environments?

The native Windows source control system is Visual SourceSafe, which supports the Source Code Control Interface (SCCI) API.

Application Usage

It is important to determine how and when the application is used, who uses it, and how often it is used so that you can mitigate or deal with usage-related issues as part of the migration. The best way to determine usage is by speaking with the users and operators; you can use a combination of interviews and workshops for this. It is also helpful to gather additional materials, such as user documentation and training guides.

User-Facing Functionality

When analyzing application features and facilities, you should determine which are used most frequently (and are thus critical to the migration) and which are used less frequently or are not used at all.

Administrator Functionality and Operational Management

Your analysis of the application's administrative or operational facilities and usage should include an examination of data management, backup and restore, security management, application configuration, startup and shutdown, failure operation, and contingency planning and disaster recovery.

Service Level Definitions

You can define application service levels in terms of availability, scalability, performance, downtime constraints, planned outages, and similar factors. To complete your analysis, you should determine the application's scalability and performance requirements.



