Evaluating the Application


Now that data gathering is complete, you can begin to thoroughly analyze the application. You should examine each building block of the application to determine any issues that may arise in porting the code to the new platform.

The application blocks include the following:

  • User interface

  • File and device I/O

  • File and process security

  • Processes and threads

  • Interprocess communication

  • Signals

The results of this examination will provide you with insight into any migration issues that may exist, and therefore the most appropriate approach to migrating the application. If you find a substantial number of issues, you should consider a rewrite of the application. More likely, however, you will need to choose between porting the application to the Interix environment or making it work under Win32.

In the analysis, you should look for use of UNIX-specific code. UNIX applications can contain millions of lines of custom code. The effort to rewrite an application generally increases with the amount of code, and porting tools become correspondingly more viable. The major issues are normally with code that the application uses to communicate with the UNIX operating system through system calls. Solaris, HP-UX, Advanced Interactive Executive (AIX), Linux, FreeBSD, and other UNIX brands all have some unique architectural features, APIs, commands, and utilities.

UNIX-specific code uses either UNIX standard conventions (for example, the file hierarchy) or function calls that are specific to the source UNIX environment. You should log each occurrence of a UNIX-specific code element, because it will influence the decision on how to migrate the application.

In addition, you should consider whether the application code has been written in a hardware-independent manner. Examine the word size of the basic data types (for example, 64-bit versus 32-bit pointers), byte ordering (big-endian versus little-endian), and data alignment in structures. To facilitate portability, all hardware dependencies must either be isolated and conditionally compiled and linked for the target environment build process, or be rewritten to use hardware-independent constructs. UNIX-based applications that are designed around modular and portable coding methodologies have taken these issues into consideration.
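
As an illustration of the kind of isolation described above, the following minimal sketch wraps a byte-order dependency behind a single helper that is conditionally compiled. The BIG_ENDIAN_HOST macro and the helper name are illustrative, not part of any particular build system; a real project would usually key off compiler- or platform-defined macros, or simply call htonl().

    /* byteorder.c - a minimal sketch of isolating a byte-order dependency.
     * BIG_ENDIAN_HOST and the helper name are illustrative only. */
    #include <stdint.h>

    uint32_t to_wire32(uint32_t host)
    {
    #ifdef BIG_ENDIAN_HOST
        return host;                          /* already in network (big-endian) order */
    #else
        return ((host & 0x000000FFu) << 24) | /* swap the four bytes */
               ((host & 0x0000FF00u) << 8)  |
               ((host & 0x00FF0000u) >> 8)  |
               ((host & 0xFF000000u) >> 24);
    #endif
    }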

You should determine whether the application contains custom device drivers. For example, custom device drivers are very common in process control applications. These device drivers are not portable and must generally be rewritten for the Windows 2000 platform.

User Interface

In your evaluation of the application, you should review the user interface to determine how it is built (that is, what libraries it uses) and what standards (if any) it uses.

X Windows/Motif

You can determine whether the UNIX application is using X Windows, Motif, or xrt libraries by looking at the make program's Makefile and the output of the application's build. For example, you can use grep and ldd, as described earlier in this chapter.

X Windows libraries include:

  • X11 toolkit library (libXt.a)

  • X11 Input extension library (libXi.a)

  • Athena widget library (libXaw.a)

  • X11 extensions library (libXext.a)

  • X Windows Display Manager (XDM) control protocol library (libXdmcp.a)

  • Xauthority routines library (libXau.a)

  • Miscellaneous utilities library (libXmu.a)

When a UNIX application makes calls to these libraries, it is linked to one or more of the following: X11, Xau, Xaw, Xi, Xmu, Xt, and Xtst. These libraries contain several hundred API calls and do not map easily to the Win32 user interface API. To successfully port a user interface that uses this API, you must ensure that these libraries are available on the Windows platform (for example, by using Interix's X11R5 library).
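
As a point of reference when scanning source code, the following minimal Xlib sketch (not taken from any particular application) shows the kinds of calls, such as XOpenDisplay(), XCreateSimpleWindow(), and XNextEvent(), that identify an X Windows client. The window geometry is purely illustrative.

    /* xhello.c - minimal Xlib sketch; compile with: cc xhello.c -lX11 */
    #include <X11/Xlib.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        Display *dpy = XOpenDisplay(NULL);        /* connect to the X server */
        if (dpy == NULL) {
            fprintf(stderr, "cannot open display\n");
            return EXIT_FAILURE;
        }
        int scr = DefaultScreen(dpy);
        Window win = XCreateSimpleWindow(dpy, RootWindow(dpy, scr),
                                         10, 10, 200, 100, 1,
                                         BlackPixel(dpy, scr),
                                         WhitePixel(dpy, scr));
        XSelectInput(dpy, win, ExposureMask | KeyPressMask);
        XMapWindow(dpy, win);                     /* make the window visible */

        XEvent ev;
        for (;;) {                                /* simple event loop */
            XNextEvent(dpy, &ev);
            if (ev.type == KeyPress)
                break;                            /* exit on any key press */
        }
        XCloseDisplay(dpy);
        return EXIT_SUCCESS;
    }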

When a UNIX application makes calls to the Motif API, it is linked to either or both of the following libraries: Xm and Mrm. These libraries also contain several hundred API calls that do not map easily to the Win32 GUI API. To port a user interface that makes use of this API, you must ensure that the Motif libraries are available on the Windows platform. Note that the Motif libraries are built on top of the X11R5 or R6 libraries and therefore require those as well. Additionally, the Motif Window Manager, mwm, is required to perform window management functions.

OpenGL

OpenGL is an API that allows an application to manipulate three-dimensional graphics on the screen, and it is available on both UNIX and Windows. On UNIX, OpenGL is often mixed with X Windows and Motif code to display buttons, menus, and dialog boxes. When this is the case, the X Windows and Motif code guides the migration choice between an application port or a rewrite.

Character-Mode Interfaces

A character-mode user interface application writes to the console one line at a time (for example, by using the C printf() library call), and data input is requested through the use of prompts. This code can easily be migrated to Win32, usually by just recompiling the code.

When a UNIX application makes calls to the curses API, it is linked to the curses or ncurses library. These libraries contain several hundred API calls that do not map easily to the Win32 user interface API. To successfully port a user interface that makes use of these APIs, you must ensure that the libraries are available on the Win32 platform. The curses library relies on the terminfo database, which is available on UNIX but not on standard Windows 2000. Use of the Interix curses or ncurses library makes a port of this user interface relatively easy.
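
For comparison, a minimal curses program looks like the following sketch; the handful of calls it makes (initscr(), mvprintw(), refresh(), getch(), endwin()) are what you would expect to find when grepping a character-mode application that uses this API.

    /* cursdemo.c - minimal curses sketch; compile with: cc cursdemo.c -lcurses */
    #include <curses.h>

    int main(void)
    {
        initscr();                     /* initialize the terminal in curses mode */
        cbreak();                      /* pass keystrokes through without Enter */
        noecho();                      /* do not echo typed characters */
        mvprintw(0, 0, "curses user interface - press any key to exit");
        refresh();                     /* flush output to the physical screen */
        getch();                       /* wait for a keystroke */
        endwin();                      /* restore normal terminal settings */
        return 0;
    }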

File and Device I/O

The UNIX approach to file access differs from that of Windows. UNIX has a single file hierarchy based on the root file system (indicated by a slash mark), whereas Windows uses a separate drive letter for each file system (for example, A and B for floppy disks, and C through Z for hard disks). UNIX determines the location of the root file system only at boot time; the other file systems are added to the directory tree by mounting them (for example, with the command mount /dev/fd0 /mnt/floppy), typically by means of entries in the table /etc/fstab (or possibly /etc/vfstab or /etc/mnttab).

At the lowest level of the UNIX file system, files are referred to by numbers, called inodes. Inodes are indices into the inode table, a part of the file system reserved for describing files (somewhat similar to the role of the file allocation table [FAT] in the file system included with the Microsoft MS-DOS operating system). The directory system enables a file to be referred to by a name. The relationship between a file name and an inode is called a link.

There are two types of links: hard links and symbolic links. A hard link (sometimes referred to as a traditional link) links a file name to an inode, and also enables a single file to have multiple names (that is, links). A symbolic link is a file whose contents are the name of another file. In a hard link, each of the file names has the same relationship to the inode; in a symbolic link, the symbolic link name refers to the true name and directory location of the file. If you delete one of several files (including the original) that are cross-referenced by hard links, the other file names will continue to work, but if you delete a file that is referenced by a symbolic link, then the symbolic link will point to nothing.

The Network File System (NFS) is a method of sharing file systems across networks. NFS has some similarities to, but is quite different from, the server message block (SMB) file system used on MS-DOS and Windows. With NFS, a UNIX computer can mount file systems connected to a different computer on the network (for example, mount hostname:/exporteddir1 /mnt). From the perspective of the local computer, /mnt is now just another portion of the single file hierarchy.

Devices are also treated as files from a UNIX application perspective.

The following are some questions that you need to ask to obtain information about the application from the perspective of file and device I/O:

  • Is there a reliance on absolute path names?

  • Are hard links and/or symbolic links used? Example calls to look for include readlink(), which reads the contents of a symbolic link, and symlink(), which creates a symbolic link to a file.

  • Are there any NFS file system dependencies?

  • Are file and device function calls used and required, especially the use of calls that are neither ANSI C/C++ nor POSIX compliant? The call to look for is chsize(), which changes the end of file on an open file.

  • Is non-blocking file I/O (that is, asynchronous I/O) used and required?

  • Are there any file-locking and/or record-locking requirements?

  • Are memory-mapped files being used? Example calls to look for are mmap(), which maps a file into memory, and munmap(), which removes mappings for files in memory (a minimal usage sketch follows this list).
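
The memory-mapping question in the last item refers to the following usage pattern, shown here as a minimal sketch; the file name data.bin is illustrative.

    /* mapfile.c - minimal sketch of mapping a file read-only with mmap(). */
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("data.bin", O_RDONLY);      /* illustrative file name */
        struct stat st;
        if (fd < 0 || fstat(fd, &st) < 0) {
            perror("open/fstat");
            return 1;
        }
        char *p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            return 1;
        }
        printf("first byte: %d\n", p[0]);         /* read the file through p */
        munmap(p, st.st_size);                    /* remove the mapping */
        close(fd);
        return 0;
    }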

Interprocess Communication

As discussed in Chapter 2, UNIX introduced a philosophy of computing with features such as pipes, which provide the ability to link the output of one program to the input of another. Pipes are just one means of transferring data between processes. Various UNIX system implementations offer other forms of interprocess communication, as explained in the following subsections.

Process Pipes

Process pipes are found in all versions of UNIX, and are also supported by Interix. They transfer data in one direction only. In general, the output of one process is piped (attached) to the input of another. Process pipes require ancestry between the processes (for example, a parent/child relationship).

Function calls to look for in the application are:

  • pipe(). Creates a pipe.

  • popen(), pclose(). Processes I/O by using pipes.
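
A minimal sketch of this pattern, a parent and child communicating through pipe() after a fork(), looks like the following:

    /* pipedemo.c - minimal sketch of a one-way process pipe between parent
     * and child (the ancestry requirement described above). */
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main(void)
    {
        int fds[2];
        char buf[32];

        if (pipe(fds) < 0) {                       /* fds[0] = read, fds[1] = write */
            perror("pipe");
            return 1;
        }
        if (fork() == 0) {                         /* child: writes into the pipe */
            close(fds[0]);
            write(fds[1], "hello", 6);             /* 6 bytes includes the '\0' */
            _exit(0);
        }
        close(fds[1]);                             /* parent: reads from the pipe */
        read(fds[0], buf, sizeof(buf));
        printf("parent read: %s\n", buf);
        wait(NULL);
        return 0;
    }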

Named Pipes

Named pipes, also called FIFO (first in, first out) pipes, are process pipes with file names that allow unrelated (no ancestry) processes to communicate with each other. Named pipes transfer data in one direction only. For example, one process opens the FIFO for reading, whereas another process opens the FIFO for writing. In effect, a named pipe can be used just like a file.

Function calls to look for in the application are:

  • mkfifo(). Makes a FIFO special file (a named pipe).

  • mknod(). Creates a regular file, special file, or directory (historical call for creating a named pipe).
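
A minimal sketch of creating and writing a named pipe with mkfifo() follows; the path /tmp/demo.fifo is illustrative, and an unrelated process could open the same path for reading.

    /* fifodemo.c - minimal sketch of creating and writing a named pipe. */
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        const char *path = "/tmp/demo.fifo";       /* illustrative path */

        if (mkfifo(path, 0666) < 0)                /* create the FIFO special file */
            perror("mkfifo");                      /* it may already exist */

        int fd = open(path, O_WRONLY);             /* blocks until a reader opens */
        if (fd < 0) {
            perror("open");
            return 1;
        }
        write(fd, "hello via FIFO\n", 15);
        close(fd);
        return 0;
    }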

System V IPC

System V IPC provides three facilities for interprocess communication: message queues, shared memory, and semaphores. Processes can add messages to or remove messages from a queue. In addition, they can access shared memory and create semaphores, and read or set semaphore values.

Message Queues

Message queues are like named pipes, but with two differences: named pipes transmit a byte stream and are one directional, whereas message queues transmit records, and these messages can have different priorities. Therefore, with message queues the receiver must determine the sequence in which the records are retrieved. (POSIX also has message queues, but the APIs are not exactly the same as in the System V implementation.)

Function calls to look for in the application are:

  • msgctl(). Controls message queue operations.

  • msgget(). Gets a message queue identifier.

  • msgrcv(). Reads a message from a message queue.

  • msgsnd(). Sends a message to a message queue.
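
The following minimal sketch shows these calls used together; the key value and message text are illustrative.

    /* msgdemo.c - minimal sketch of the System V message queue calls. */
    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/msg.h>
    #include <stdio.h>
    #include <string.h>

    struct msgbuf_demo {
        long mtype;                 /* message type (selector), must be > 0 */
        char mtext[64];
    };

    int main(void)
    {
        int qid = msgget((key_t)1234, IPC_CREAT | 0666);   /* create/get the queue */
        struct msgbuf_demo out = { 1, "hello queue" };
        struct msgbuf_demo in;

        msgsnd(qid, &out, strlen(out.mtext) + 1, 0);       /* add a message */
        msgrcv(qid, &in, sizeof(in.mtext), 0, 0);          /* read the next message */
        printf("received: %s\n", in.mtext);

        msgctl(qid, IPC_RMID, NULL);                       /* remove the queue */
        return 0;
    }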

Shared Memory

Shared memory allows two unrelated processes to access the same logical memory. It is a special range of addresses that is created for one process and is added to the address space of that process. Other processes attach to the same shared memory segment, and the segment also becomes part of their address space. A message is sent by writing data into a buffer that is part of this shared memory segment of both processes.

Function calls to look for in the application are:

  • shmat(). Attaches a shared memory segment.

  • shmctl(). Controls shared memory operations.

  • shmdt(). Detaches a shared memory segment.

  • shmget(). Allocates a shared memory segment.
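
A minimal sketch of the shared memory calls follows; the key and segment size are illustrative, and a second process would call shmget() with the same key and shmat() to see the same bytes.

    /* shmdemo.c - minimal sketch of the System V shared memory calls. */
    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        int shmid = shmget((key_t)5678, 4096, IPC_CREAT | 0666);  /* allocate segment */
        char *mem = shmat(shmid, NULL, 0);                        /* attach to our space */

        if (mem == (char *)-1) {
            perror("shmat");
            return 1;
        }
        strcpy(mem, "shared hello");            /* visible to any attached process */
        printf("wrote: %s\n", mem);

        shmdt(mem);                             /* detach the segment */
        shmctl(shmid, IPC_RMID, NULL);          /* mark it for removal */
        return 0;
    }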

Semaphores

When two processes share a memory segment, one process cannot determine when the other is doing something. For example, two processes could be modifying the same data at the same time. To deal with such situations, Dijkstra introduced the concept of the semaphore. A semaphore is a special variable that takes only non-negative integer values and upon which only two operations are allowed: wait and signal.

A small non-negative counter is maintained in the semaphore. When a process performs a wait operation, the counter is decremented if it is greater than zero and the process is allowed to proceed; if the counter is already zero, the process is blocked and its execution is suspended until another process performs a signal operation.

Function calls to look for in the application are:

  • semctl(). Performs a control operation on a semaphore.

  • semget(). Gets or creates a set of semaphores.

  • semop(). Operates on a set of semaphores.
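
The following minimal sketch uses a single System V semaphore as a simple mutex around a critical section; the key value is illustrative, and note that some systems declare union semun in <sys/sem.h> while others require the program to define it.

    /* semdemo.c - minimal sketch of one System V semaphore used as a mutex. */
    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/sem.h>
    #include <stdio.h>

    union semun {                              /* remove if <sys/sem.h> defines it */
        int val;
        struct semid_ds *buf;
        unsigned short *array;
    };

    int main(void)
    {
        int semid = semget((key_t)4321, 1, IPC_CREAT | 0666);  /* one semaphore */
        union semun arg;
        struct sembuf op;

        arg.val = 1;
        semctl(semid, 0, SETVAL, arg);         /* initialize the counter to 1 */

        op.sem_num = 0;
        op.sem_flg = 0;
        op.sem_op = -1;                        /* "wait": decrement, block if zero */
        semop(semid, &op, 1);

        printf("inside the critical section\n");

        op.sem_op = 1;                         /* "signal": increment, wake a waiter */
        semop(semid, &op, 1);

        semctl(semid, 0, IPC_RMID);            /* remove the semaphore set */
        return 0;
    }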

Sockets

Beginning with BSD 4.2, sockets were introduced as part of the Transmission Control Protocol/Internet Protocol (TCP/IP) networking implementation. The sockets interface is an extension of the pipes concept. Sockets are therefore used in much the same way as pipes, but they can be used across a network of computers over the TCP/IP transport protocol. In BSD systems, the other IPC communications facilities are based on sockets.

Function calls to look for in the application are:

  • accept(). Accepts a connection on a socket.

  • bind(). Binds a name to a socket.

  • bindresvport(). Binds a socket to a privileged IP port.

  • connect(). Initiates a connection on a socket.

  • getsockname(). Gets a socket name.

  • getsockopt(), setsockopt(). Gets and sets options on sockets.

  • listen(). Listens for connections on a socket.

  • recv(), recvfrom(). Receives a message from a socket.

  • recvmsg(). Receives a message from a socket.

  • send(), sendto(). Sends a message to a socket.

  • sendmsg(). Sends a message to a socket.

  • socket(). Creates an endpoint for communication.

  • socketpair(). Creates a pair of connected sockets.
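
A minimal sketch of a TCP server that exercises several of these calls follows; the port number is illustrative and most error handling is omitted.

    /* sockdemo.c - minimal sketch of a TCP echo server using the calls above. */
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <string.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int srv = socket(AF_INET, SOCK_STREAM, 0);     /* create the endpoint */
        struct sockaddr_in addr;
        char buf[128];

        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(5000);                   /* illustrative port */

        bind(srv, (struct sockaddr *)&addr, sizeof(addr));  /* bind a name */
        listen(srv, 5);                                /* listen for connections */

        int cli = accept(srv, NULL, NULL);             /* accept one connection */
        ssize_t n = recv(cli, buf, sizeof(buf), 0);    /* receive a message */
        if (n > 0)
            send(cli, buf, n, 0);                      /* echo it back */

        close(cli);
        close(srv);
        return 0;
    }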

Streams

Beginning with System V4, streams were introduced as a generalized I/O concept. Streams can be used for both local and remote communications, just like sockets. A stream is a full-duplex data path within the kernel between a user process and drivers. The primary components are a stream head, a driver, and zero or more modules between the stream head and driver. A stream is analogous to a pipe, except that data flow and processing are bidirectional.

The stream head is the end of the stream that provides the interface between the stream and a user process. The principal functions of the stream head are processing stream-related system calls and passing data between a user process and the stream.

A module contains processing routines for input and output data. It exists in the middle of a stream, between the stream's head and a driver. A module is the streams counterpart to commands in a shell pipeline, except that a module contains a pair of functions that allow independent bidirectional data flow and processing.

Function calls to look for in the application are:

  • getmsg(), getpmsg(). Retrieves the next message in a stream.

  • putmsg(), putpmsg(). Sends a message in a stream.

Header files to look for in source files are:

  • stream.h

  • stropts.h
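
The following minimal sketch shows putmsg() and getmsg() used on an already-open stream, assuming a System V system that provides <stropts.h>; how the descriptor fd was opened is not shown.

    /* strdemo.c - minimal sketch of sending and receiving a STREAMS message
     * on a stream already open on descriptor fd (origin not shown). */
    #include <stropts.h>

    void stream_echo(int fd)
    {
        struct strbuf data;
        char buf[128];
        int flags = 0;

        data.buf = "hello stream";
        data.len = 13;                         /* bytes to send, including '\0' */
        putmsg(fd, NULL, &data, 0);            /* send a data-only message */

        data.buf = buf;
        data.maxlen = sizeof(buf);
        getmsg(fd, NULL, &data, &flags);       /* retrieve the next message */
    }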

Stream Pipes and Named Stream Pipes

Stream pipes are similar to process pipes, but can transfer data in both directions. They can also be given names, in which case they are known as named stream pipes. Stream pipes are typically used in System V for passing file descriptors between processes.

Header files to look for in source files are:

  • stream.h

  • stropts.h

Note  

Stream pipes and named stream pipes are unrelated to streams.

Processes and Threads

The Single UNIX Specification (both UNIX95/XPG4v2 and UNIX98/XPG5) defines a process as an address space with one or more threads executing within that address space, and the required system resources for those threads.

UNIX is designed to be a multiprocessing, multiuser system. At any point in time, many applications and processes are running. UNIX makes it easy to create processes, and many of the features of the operating system and shells result in the common practice of running many programs at once. A UNIX-based application can start a new program and process, replace its own process image, and duplicate its process image. When UNIX duplicates its process image, the new process becomes a child of the creating process. This process hierarchy is often important, and there are system calls for manipulating child processes.

UNIX process-handling functions do not map directly to the Windows environment; therefore, you must identify such function calls. The following is a partial list of these calls (a minimal usage sketch follows the list):

  • system(). Passes a command to the shell, which starts a new program, thereby creating a new process.

  • execl(), execle(), execlp(), execv(), execve(), execvp(). Replaces the current process image with a new process image by executing a file.

  • fork(). Creates a new process. The new process (child process) is a copy of the calling process (parent process), with some exceptions (for example, the child process has a unique process ID).

  • vfork(). Creates a new process, just as fork() does, but doesn't fully copy the address space of the parent process.

  • popen(). Opens a process by creating a pipe, using fork() to create another process, and invoking the shell.
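
The sketch referred to above shows the fork()/exec()/wait() pattern that most UNIX process creation reduces to; the command being run is illustrative.

    /* spawn.c - minimal sketch of the fork()/exec()/wait() pattern. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        pid_t pid = fork();                    /* duplicate the calling process */

        if (pid < 0) {
            perror("fork");
            return EXIT_FAILURE;
        }
        if (pid == 0) {                        /* child: replace its image */
            execlp("ls", "ls", "-l", (char *)NULL);
            perror("execlp");                  /* reached only if exec fails */
            _exit(127);
        }
        int status;                            /* parent: wait for the child */
        waitpid(pid, &status, 0);
        printf("child %ld exited with status %d\n", (long)pid, WEXITSTATUS(status));
        return EXIT_SUCCESS;
    }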

UNIX also makes it convenient to create a group of cooperating processes, especially when the processes access terminals. Such groups are known as process groups. Function calls to look for include the following:

  • getpgrp(). Returns the process group ID of the current process.

  • setpgid(). Adds a process to a process group.

  • setsid(). Creates a new process group and sets the calling process as group leader.

  • tcgetpgrp(). Returns the ID of the terminal's foreground process group.

  • tcsetpgrp(). Sets the foreground process group ID.

  • killpg(). Sends a signal to a process group. (For more information about signals, see Signals later in this chapter.)

When a program is creating processes, it also needs a variety of function calls to manage the processes. The following are some examples of function calls:

  • getpid(). Returns the process ID of the calling process.

  • getppid(). Returns the process ID of the parent of the calling process.

  • getpriority(), setpriority(). Gets or sets a process's nice value (nice refers to setting a process priority to a low value, so that it does not take priority over other processes).

  • getrlimit(), setrlimit(). Retrieves or sets resource limits.

  • getrusage(). Gets information about the use of resources.

  • sleep(). Suspends process execution for an interval of seconds.

  • wait(), waitpid(). Waits for process termination.

A thread is a sequence of control within a process. All processes have at least one thread of execution. When the process creates a new thread, the thread gets its own stack for the maintenance of all its function parameters and local variables required for thread execution. However, the thread shares global variables, file descriptors, and other process characteristics with the other threads in the process.

The idea of threads has been in existence for some time for various UNIX systems, but until the IEEE POSIX committee created the POSIX.1c-1996 thread extensions, each UNIX vendor's implementation was unique. Threads are now much more standardized, and are available on most UNIX platforms. The POSIX threads are known as pthreads; some of the function calls are as follows:

  • pthread_create(). Creates a new thread.

  • pthread_exit(). Terminates the calling thread.

  • pthread_join(). Waits for termination of another thread.

  • pthread_detach(). Puts a running thread in the detached state.

  • pthread_attr_init(). Initializes the thread attribute object attr and fills it with default values for the attributes.
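
A minimal pthreads sketch using pthread_create(), pthread_join(), and pthread_exit() follows; it is illustrative only.

    /* threaddemo.c - minimal pthreads sketch; compile with: cc threaddemo.c -lpthread */
    #include <pthread.h>
    #include <stdio.h>

    static void *worker(void *arg)
    {
        /* each thread has its own stack, but shares globals and descriptors */
        printf("worker thread %d running\n", *(int *)arg);
        pthread_exit(NULL);                    /* terminate the calling thread */
    }

    int main(void)
    {
        pthread_t tid;
        int id = 1;

        if (pthread_create(&tid, NULL, worker, &id) != 0) {  /* create the thread */
            fprintf(stderr, "pthread_create failed\n");
            return 1;
        }
        pthread_join(tid, NULL);               /* wait for it to terminate */
        return 0;
    }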

The following are some questions that you need to ask to obtain information about the application from the perspective of its process and thread structure:

  • Does it make extensive use of process creation and management?

  • Does it make use of process groups?

  • Does it need to run under the context of a different user or group at times during its execution?

  • Does it make extensive use of multiple threads (that is, is it multithreaded)?

Process Management

Functions in this category provide support for the scheduling and priority management of processes.

Note  

These functions are not supported by Interix.

Process Functions

Function          Description

getexecname       Returns the path of the executable file
p_online          Returns or changes processor operational status
priocntl          Displays or sets scheduling parameters of the specified process(es)
priocntlset       Provides generalized process scheduler control
processor_info    Determines the type and status of a processor
pset_assign       Manages sets of processors
pset_create       Manages sets of processors
pset_destroy      Manages sets of processors
pset_info         Gets information about a processor set
setpriority       Sets process scheduling priority

File and Process Security

The security models of UNIX-based and Windows 2000-based applications are different. Most UNIX and Windows-based applications have not been written to support a common security model, such as Kerberos.

UNIX systems also have a concept of setuid (set-user-identifier-on-execution) and setgid (set-group-identifier-on-execution) bits in the permissions bits for a file. By using the chmod utility, the setuid bit and/or setgid bit can be set and reset on files. All UNIX systems maintain at least two user/group IDs, the effective user/group ID and the real user/group ID. Some systems also support a saved set user/group ID.

These IDs are not only used for user access to files. Processes are also allocated certain permissions. UNIX systems manipulate user/group IDs in the following ways:

  • If the setuid or setgid bit is not set, UNIX runs the program as the specific user or group that executed the program; therefore, both the effective user/group ID and the real user/group ID are set to the user or group that executed the program.

  • If the setuid or setgid bit is set, UNIX runs the program under the user/group ID of the file's owner, not the user/group ID of the user who is executing the file. The effective user/group ID is set to the owner of the program (file), and the real user/group ID is set to the user or group that executed the program (file). If there is a saved set user/group ID, it is also set to the owner of the program (file).

There are also a number of function calls that allow a program to manipulate the effective, real, and saved set user/group IDs:

  • getgid(). Returns the real group ID of the calling process.

  • getegid(). Returns the effective group ID of the calling process.

  • setregid(). Sets real and effective group IDs.

  • setreuid(). Sets real and effective user IDs.

  • setuid(), seteuid(), setgid(), setegid(). Sets real and effective user/group IDs.

  • setuser(). Changes effective and real user/group IDs of a process.

Sometimes a trusted application may require access to files and/or services to which the user should not have unlimited access. For example, the UNIX application ps requires access to /dev/proc, which only the superuser (root) can access. UNIX solves this problem by allowing the application to run as a specific user or group. That is, if you execute an application that has the setuid bit set in the file permissions, this bit instructs the kernel to run the application with the identity and privileges of the owner of the file. Similarly, setting the setgid bit causes the application to run with the identity of the group to which the file belongs.
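
The following minimal sketch illustrates how such a program can inspect its real and effective user IDs and drop the elevated identity with seteuid() once the privileged work is done; it is a sketch only, not a complete privilege-management scheme.

    /* privdrop.c - minimal sketch of inspecting and dropping a setuid identity. */
    #include <stdio.h>
    #include <sys/types.h>
    #include <unistd.h>

    int main(void)
    {
        uid_t real = getuid();                 /* the user who ran the program */
        uid_t eff  = geteuid();                /* the owner, if the setuid bit is set */

        printf("real uid %ld, effective uid %ld\n", (long)real, (long)eff);

        /* ... privileged work would happen here ... */

        if (seteuid(real) < 0) {               /* give up the elevated identity */
            perror("seteuid");
            return 1;
        }
        return 0;
    }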

Signals

A signal is an event generated by the shell and by terminal device handlers in the UNIX operating system, in response to a condition. Signals tell a process that an exceptional condition has occurred or that something needs attention or action. For example, signals happen when the computer hardware detects such conditions as floating point overflows and memory segment violations. Signals are also used for interprocess communication. Common IPC signals are used to end a process or to notify a parent process that a child process has stopped executing.

It is important to understand the UNIX application's signal implementation to determine what issues you will encounter during the migration to Windows. The following list provides a brief overview of the signal models of each of the four UNIX systems, and therefore, what to look for in the application:

  • UNIX Seventh Edition had unreliable signals (that is, there was no guarantee of the signal's success) and provided only one signal function: signal().

  • BSD 4.2 introduced reliable signals (that is, signal status could be checked and acted on), supported by the following functions:

    • killpg(). Sends a signal to a process group.

    • sigsetmask(). Replaces the set of blocked signals with a new set specified in a mask.

    • sigblock(). Adds (as a logical OR) the signals specified in a mask to the set of signals currently being blocked from delivery.

    • siggetmask(). Returns the current set of masked signals.

    • sigvec(). Sets the disposition of a signal.

    • sigaltstack(). Gets or sets alternate signal stack context.

  • System V introduced another implementation of reliable signals, supported by the following functions:

    • sigset(). Modifies the handling of the specified signal.

    • sighold(). Blocks delivery of the specified signal by setting the corresponding bit in the process signal mask.

    • sigrelse(). Allows delivery of the specified signal by resetting the corresponding bit in the process signal mask.

    • sigignore(). Sets the disposition of the specified signal to SIG_IGN (ignore signal).

    • sigpause(). Allows delivery of the specified signal by resetting the corresponding bit in the process signal mask and then suspends the process until any signal is delivered.

    • sigaltstack(). Gets or sets alternate signal stack context (also XPG4 conformant).

  • POSIX.1 introduced a third implementation of reliable signals, supported by the following functions:

    • sigaction(). Changes the action taken by a process on receipt of a specific signal, and also specifies a mask of signals to be blocked during the processing of a signal.

    • sigpending(). Gets a pending signal mask.

    • sigprocmask(). Manipulates the current process signal mask.

    • sigsuspend(). Temporarily sets the process signal mask and then waits for a signal.

    • sigemptyset(). Initializes the given signal set to empty, with all signals excluded from the set.

    • sigfillset(). Initializes the set to full, including all signals.

    • sigaddset(). Adds the specified signal to the set.

    • sigdelset(). Deletes the specified signal from the set.

    • sigismember(). Tests whether the specified signal is a member of the set.

Note  

Signals on a POSIX.1 system are neither BSD nor SVID. POSIX defined a new signal mechanism based on the sigaction() API.

In addition, there is an ANSI C signal implementation that uses the signal() function. This API is built on top of the POSIX.1 sigaction() model. Furthermore, the ANSI raise() function sends a signal to the current process.

The POSIX.1 committee introduced its new signal semantics because of problems with traditional signal implementations found on BSD and System V systems. When the System V3 signal() function catches a signal, the action associated with the signal is reset to the default. In BSD 4.3, the action is not reset. In the ANSI C standard, the signal() function either resets the default or does an implementation-defined blocking of the signal. The POSIX sigaction() call does not reset the default if the handler returns normally.

There may be an opportunity during the application's migration to Windows to convert the signal code to use the POSIX.1 signal calls.
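
As a minimal sketch of such a conversion, the following replaces a traditional signal() registration with sigaction(); the handler and signal choice are illustrative.

    /* sigdemo.c - minimal sketch of replacing signal() with POSIX.1 sigaction(). */
    #include <signal.h>
    #include <stdio.h>
    #include <unistd.h>

    static volatile sig_atomic_t got_signal = 0;

    static void handler(int signo)
    {
        got_signal = signo;                    /* disposition is not reset here */
    }

    int main(void)
    {
        struct sigaction sa;

        sa.sa_handler = handler;
        sigemptyset(&sa.sa_mask);              /* block no extra signals in handler */
        sa.sa_flags = 0;
        sigaction(SIGINT, &sa, NULL);          /* instead of: signal(SIGINT, handler) */

        pause();                               /* wait for a signal (e.g., Ctrl+C) */
        printf("caught signal %d\n", (int)got_signal);
        return 0;
    }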



