Migrating Scripts


This section describes how to port UNIX scripts to the Interix and Windows environments. The process involves the following steps, each described in more detail below:

  • Evaluating the script migration tasks

  • Planning for platform differences

  • Considering source and target environments

Scripts fall into two basic categories:

  • Shell scripts, such as Korn shell and C shell scripts

  • Scripting language scripts, such as Perl, Tcl, and Python

Shell scripts and scripting language scripts tend to be more portable than programs written in compiled languages, such as C and C++. A scripting language such as Perl handles most platform specifics itself. However, the original developer might have used platform-specific features because they were easier or faster, or simply might not have considered cross-platform compatibility.

The choice of porting approach depends on the source script type and on the target environment: Windows only, Windows plus Interix, or a Web server running CGI scripts.

With an Interix installation, a large number of the usual UNIX commands are available. Because Interix provides both the Korn and Tenex C shells, many UNIX shell scripts run under Interix without conversion. For more information, see Porting Shell Scripts at http://www.microsoft.com/windows2000/docs/portingshellscripts.doc.

In a Windows-only environment, one solution is to write all common scripts in Perl, because several versions of Perl are available for Windows. If software must be maintained on both UNIX and Windows-based systems, writing all new scripts in Perl, and even converting some existing shell scripts to Perl, is a good strategy.

Evaluating the Script Migration Tasks

Before script migration begins, consider all the required tasks. To identify them, answer the following questions:

  • Does the script rely on the syntax of the shell?

  • Does the script rely substantially on external programs?

  • Does the script use extensions that rely on third-party libraries?

  • Does the script use or rely on nonportable concepts for essential functionality?

  • Can a quick port be done now, with a rewrite later?

  • Does the developer understand enough of the original code to quickly locate the issues, and then make the changes necessary to port to a new platform?

Answering these questions helps you evaluate and define the script migration tasks. Redesigning and rewriting portions of the application might be easier than porting them, because a rewrite can take advantage of native features more efficiently.

Planning for Fundamental Platform Differences

When porting scripts, the code needs to address some inevitable fundamental differences between the platforms. The following areas, which are described in more detail later in this section, are often sources of script migration issues:

  • File system interaction and I/O

  • Environment variables

  • Shell and console handling

  • Interprocess communication

  • Process manipulation

  • Device and network programming

  • User interfaces

  • Localization and internationalization

File System Interaction

UNIX and Windows-based systems interact differently with the file system. The UNIX and Interix path separator is a forward slash (/); Windows uses the backslash (\). The root of UNIX and Interix files is represented by the forward slash (/), but Windows uses locally mounted drives ([A-Z]:\) and network-accessible drives that use the Universal Naming Convention (\\ServerName\SharePoint\Dir\).

The first things you should correct in any code to be migrated are hard-coded file paths. These paths are commonly used to find initialization or configuration files (for example, to set up environment variables or application paths). One common mistake in initial porting work is to write a Windows path in native form inside a string. The problem is that the backslash (\) is also the common escape character. As a result, in a language such as Perl, the double-quoted path "C:\dir\text.txt" is interpreted as C:dir ext.txt, where the gap is a single tab character produced by the \t sequence.
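The escape problem is easy to reproduce. The following minimal Python sketch (Python being one of the scripting languages discussed in this section; the path C:\temp\new.txt is a hypothetical example) shows a Windows path corrupted by escape sequences, and two safer alternatives: a raw string literal and the standard pathlib module, which renders the correct separator for each platform.

```python
import os
from pathlib import PurePosixPath, PureWindowsPath

# In an ordinary string literal, "\t" and "\n" are escape sequences,
# so this "path" silently contains a tab and a newline.
broken = "C:\temp\new.txt"

# A raw string literal keeps every backslash as written.
raw = r"C:\temp\new.txt"

# Building paths from components avoids hard-coded separators entirely.
windows_path = PureWindowsPath("C:/", "temp", "new.txt")  # rendered with backslashes
posix_path = PurePosixPath("/", "tmp", "new.txt")          # rendered with forward slashes
native_path = os.path.join("temp", "new.txt")              # uses the running platform's os.sep

if __name__ == "__main__":
    print(len(broken), len(raw))  # the broken string is two characters shorter
    print(windows_path)
    print(posix_path)
    print(native_path)
```

PureWindowsPath also accepts forward slashes, so PureWindowsPath("C:/temp/new.txt") compares equal to the raw-string version, reflecting the fact that Windows generally accepts / as a separator.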

In most cases, Windows can also handle the forward slash (/) as a path separator. However, when building cross-platform paths, a scripting language's interpreter or compiler can still misinterpret path separators or path-handling methods, even when they are used correctly.

Unlike UNIX file systems, Windows Win32 file systems are not case-sensitive. They may preserve the case of file names, but the same directory cannot contain two files whose names differ only in case (for example, file.txt and FILE.txt). Windows also does not allow users to create a file with the same name as the directory in which it is created.
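When a file tree moves from UNIX to Windows, names that differ only in case will collide. A minimal Python sketch for finding such names before the copy follows; the function name case_collisions is hypothetical, and str.casefold only approximates the case mapping that Windows file systems apply, so treat results for non-ASCII names with care.

```python
from collections import defaultdict


def case_collisions(names):
    """Group file names that a case-insensitive file system would treat as one name."""
    groups = defaultdict(list)
    for name in names:
        # casefold() approximates Windows case-insensitive comparison.
        groups[name.casefold()].append(name)
    # Keep only the groups that actually collide, each sorted for stable output.
    return [sorted(group) for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    print(case_collisions(["file.txt", "FILE.txt", "notes.txt"]))
```

To check a real directory on the UNIX side, pass the result of os.listdir(path) for each directory in the tree.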

When hard-coding paths in a script, keep in mind that the names of certain Windows directories depend on the language of the Windows installation. For example, the directory named C:\Program Files\ in the English version of Windows is named C:\Programme\ in the German version.

The exact names of paths and other information that may be critical in porting your code are often found in the Windows registry. For example, the correct path for the Program Files directory is stored in the ProgramFilesDir value of the HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion key.

The registry is a central database of information about your Windows system. Scripts often need to consult it when no platform-independent method is available. Use the regedit command to browse the Windows registry and get a feel for its basic structure. Some of the information stored in the registry is also available through language APIs, which are safer to use.
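As an illustration, a script can read the localized Program Files path from the ProgramFilesDir registry value instead of hard-coding it. The sketch below uses Python's standard winreg module, which exists only on Windows. The fallback to the ProgramFiles environment variable is an assumption (current Windows versions set it, but check older systems), and the function name program_files_dir is hypothetical.

```python
import os

try:
    import winreg  # part of the standard library on Windows only
except ImportError:
    winreg = None


def program_files_dir():
    """Return the Program Files path for this installation, or None if unknown."""
    if winreg is not None:
        try:
            with winreg.OpenKey(
                winreg.HKEY_LOCAL_MACHINE,
                r"SOFTWARE\Microsoft\Windows\CurrentVersion",
            ) as key:
                value, _value_type = winreg.QueryValueEx(key, "ProgramFilesDir")
                return value
        except OSError:
            pass  # key or value missing; fall through to the environment
    # Assumption: the ProgramFiles environment variable holds the same path.
    return os.environ.get("ProgramFiles")


if __name__ == "__main__":
    print(program_files_dir())
```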

Environment Variables

Both Windows and UNIX use environment variables. Although Windows maintains an environment array, its contents differ from those on UNIX. The Windows environment array is not case-sensitive, so the environment variables PATH, path, and PaTh all refer to the same item. The PATH variable serves a similar purpose on both platforms (for example, shells search the directories specified in the PATH environment variable for executables and scripts), but Windows uses a semicolon (;) as a separator, whereas UNIX and Interix use a colon (:). Fortunately, language runtimes usually provide features that handle these differences in the use of the PATH variable.
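A minimal Python sketch of the PATH difference follows: os.pathsep is ";" on Windows and ":" on UNIX and Interix, and shutil.which performs the executable search itself. The helper name path_entries is hypothetical.

```python
import os
import shutil


def path_entries(path_value=None, sep=os.pathsep):
    """Split a PATH-style string on the platform separator (';' on Windows, ':' on UNIX)."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    # Empty entries are dropped here. UNIX shells treat an empty entry as
    # the current directory, so check whether a script relies on that.
    return [entry for entry in path_value.split(sep) if entry]


if __name__ == "__main__":
    for directory in path_entries():
        print(directory)
    # shutil.which searches PATH (and, on Windows, tries the PATHEXT extensions).
    print(shutil.which("python3") or shutil.which("python"))
```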

Commonly used UNIX environment variables are HOME, PATH, USER, and TEMP. Windows also has the PATH and TEMP variables, and sometimes has the others. To determine which environment variables are used in a Windows installation, use the abstractions that your language provides or look in the Windows registry. Alternatively, use the following procedure.

To see the full contents of the environment

  1. Right-click My Computer, and then click Properties.

  2. In the System Properties dialog box, click the Advanced tab.

  3. Under Environment Variables, click Environment Variables.

  4. In the Environment Variables dialog box, view and modify the environment.

Note that Windows has separate User and System environments. Administrator rights are required to modify the System environment.
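The dialog box is useful for inspection, but a ported script usually needs the same information programmatically. The following Python sketch lists the environment sorted by name and looks up a variable case-insensitively, mirroring the Windows behavior described earlier; the helper names are hypothetical. On Windows, os.environ is already case-insensitive, so the lookup helper matters mainly for code that must behave the same way on UNIX.

```python
import os


def getenv_ci(name, environ=None, default=None):
    """Look up an environment variable, ignoring case as Windows does."""
    environ = os.environ if environ is None else environ
    if name in environ:
        return environ[name]  # an exact match wins
    wanted = name.upper()
    for key, value in environ.items():
        if key.upper() == wanted:
            return value  # first case-insensitive match
    return default


def dump_environment(environ=None):
    """Return sorted NAME=value lines, similar to the output of `set` or `env`."""
    environ = os.environ if environ is None else environ
    return [f"{key}={value}" for key, value in sorted(environ.items())]


if __name__ == "__main__":
    print("\n".join(dump_environment()))
    print(getenv_ci("path"))
```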

Scripts commonly require a temporary data file, which on UNIX is usually hard-coded to reside in /tmp. Instead, on both Windows and UNIX, use the TEMP environment variable to locate an acceptable directory for temporary files. Some scripts also rely on environment variables beginning with LC_, which hold the locale information for the system.
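Rather than reading TEMP directly, a script can let the language choose the directory. In Python, for example, the tempfile module consults the TMPDIR, TEMP, and TMP environment variables in that order before falling back to a platform default such as C:\TEMP or /tmp. The helper name write_scratch is hypothetical.

```python
import os
import tempfile


def write_scratch(text):
    """Write text to a new temporary file and return its path; the caller deletes it."""
    # mkstemp creates the file securely in tempfile.gettempdir() and returns an OS handle.
    fd, path = tempfile.mkstemp(prefix="port-", suffix=".txt")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(text)
    return path


if __name__ == "__main__":
    scratch = write_scratch("hello")
    print(scratch)
    os.remove(scratch)
```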

Do not expect text files to be identical at the binary level across platforms. For example, Windows ends each line with CRLF (carriage return/linefeed, characters \015\012), whereas UNIX uses LF only. Interix supports both and provides the flip command-line utility to convert between the two formats. Script environments provide methods for handling this difference transparently. Another nuance is that ^Z (character \032) represents the end-of-file character on Windows: a UNIX script with this character embedded in its code might ignore it, whereas Windows might stop reading the file at that point. Interix behavior varies by utility or program; for example, when converting from Windows to POSIX format, the flip utility removes the last control-Z in a file but leaves embedded control-Z characters unchanged.
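The conversion that flip performs can be approximated in a few lines. This Python sketch works on raw bytes; the function names are hypothetical, and dropping only a single trailing Ctrl-Z is a simplified model of the flip behavior described above. When a script instead opens files in Python's text mode, the interpreter translates line endings on read and write automatically.

```python
CTRL_Z = b"\x1a"


def to_posix(data: bytes) -> bytes:
    """Convert CRLF line endings to LF, dropping a single trailing Ctrl-Z."""
    if data.endswith(CTRL_Z):
        data = data[:-1]
    return data.replace(b"\r\n", b"\n")


def to_windows(data: bytes) -> bytes:
    """Convert LF line endings to CRLF without doubling existing CRLF pairs."""
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


if __name__ == "__main__":
    import sys
    # Hypothetical filter usage: python convert.py < dos.txt > unix.txt
    sys.stdout.buffer.write(to_posix(sys.stdin.buffer.read()))
```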

Shell and Console

Every UNIX desktop provides a shell. Interix includes both the Korn and C shells, and Windows provides a command shell as well. Windows 2000 stores the path to the command shell in the COMSPEC variable of the environment array. Developers interact with the command shell during testing, but during the normal operation of a script the shell or console can interfere or behave in unexpected ways. Some languages can run scripts without any attachment to a console or terminal. Refer to the documentation for your language to learn how to make the console behave as required.
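For the rare cases where a script must locate the native command shell, a minimal Python sketch follows. It reads COMSPEC on Windows and SHELL elsewhere; the fallback values cmd.exe and /bin/sh, and the function name command_interpreter, are assumptions for illustration.

```python
import os


def command_interpreter(environ=None, is_windows=None):
    """Return the path of the user's command interpreter."""
    environ = os.environ if environ is None else environ
    if is_windows is None:
        is_windows = os.name == "nt"
    if is_windows:
        # os.environ is case-insensitive on Windows, so "ComSpec" is also found.
        return environ.get("COMSPEC", "cmd.exe")
    return environ.get("SHELL", "/bin/sh")


if __name__ == "__main__":
    print(command_interpreter())
```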

Some scripts call the shell to reuse existing commands, such as cat, ls, sendmail, date, and grep. Relying on the shell is not recommended: each call adds the overhead of starting an external process, and the approach is highly nonportable. To avoid portability issues, use the methods that the language provides instead.

For example, the following Tcl command, which runs the external date program, might not be portable:

 set date [exec date "+%D %H:%M"] 

The following equivalent, which uses the built-in Tcl clock command, is portable:

 set date [clock format [clock seconds] -format "%m/%d/%y %H:%M"]
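The same idea applies in other scripting languages. The following Python sketch replaces the external date, ls, and grep commands with standard library calls; the helper names are hypothetical, and the grep stand-in covers only the simple case of searching one file for a regular expression.

```python
import os
import re
import time


def timestamp():
    """Return the current time in the same layout as the Tcl example (%m/%d/%y %H:%M)."""
    return time.strftime("%m/%d/%y %H:%M")


def list_dir(path="."):
    """Stand-in for `ls`: return the names in a directory, sorted."""
    return sorted(os.listdir(path))


def grep(pattern, path, encoding="utf-8"):
    """Stand-in for `grep pattern file`: return matching lines without line terminators."""
    regex = re.compile(pattern)
    with open(path, encoding=encoding) as handle:  # text mode normalizes CRLF to LF
        return [line.rstrip("\n") for line in handle if regex.search(line)]


if __name__ == "__main__":
    print(timestamp())
    print(list_dir())
```

A cat replacement reduces to reading the file with open() in the same way.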

Also note that when a command is invoked through the shell, the shell automatically performs wildcard expansion (usually referred to as globbing) on its arguments. If the script relies on globbing, use the language's file-globbing methods to expand file names. If calling the shell is unavoidable, note that the Windows command shell has different native commands and quoting rules.
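A minimal Python sketch of both points follows. glob.glob expands wildcards inside the script, so the result does not depend on whether a shell did the expansion, and passing an argument list to subprocess.run without shell=True avoids the shell entirely, so neither /bin/sh nor cmd.exe quoting rules apply (on Windows, subprocess itself builds the quoted command line). The helper names are hypothetical, and leaving an unmatched pattern unchanged imitates the default behavior of UNIX shells.

```python
import glob
import subprocess
import sys


def expand(patterns):
    """Expand wildcard patterns the way a UNIX shell would, inside the script."""
    names = []
    for pattern in patterns:
        matches = sorted(glob.glob(pattern))
        # Like sh, keep a pattern that matches nothing as a literal argument.
        names.extend(matches if matches else [pattern])
    return names


def run_without_shell(args):
    """Run a program directly (no shell) and return its standard output as text."""
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    print(expand(["*.py"]))
    # The argument containing a space reaches the child intact, with no shell quoting.
    print(run_without_shell([sys.executable, "-c", "import sys; print(sys.argv[1])", "two words"]))
```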

Process and Thread Execution

The script might need to deal with process manipulation, especially if external system calls are unavoidable. In a language that supports process manipulation, the features are usually portable to Windows 2000. However, it is still necessary to evaluate all uses of process manipulation to ensure that the application code is manipulating the correct Windows processes.

It is common in UNIX to manage processes by passing signals, especially for daemon processes and system administration tasks. Where the language supports signal handling, its portability is similar to that of process manipulation: some uses of signals are portable from UNIX to Windows 2000, but not all signals are relevant. Windows uses an event-passing model instead, and a UNIX daemon process ported to Windows needs to respond to these events. When porting a UNIX daemon to Windows, it is necessary to create a Windows service that provides essentially the same functionality.
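In Python, for example, signal.signal on Windows accepts only SIGABRT, SIGFPE, SIGILL, SIGINT, SIGSEGV, SIGTERM, and SIGBREAK, while SIGHUP exists only on UNIX-like systems. The following minimal sketch installs a shutdown handler for whichever of these signals the platform defines. The function name is hypothetical, and the sketch does not replace the Windows service that a ported daemon needs; writing a service in Python typically requires an add-on such as the pywin32 package, not shown here.

```python
import signal
import time


def install_shutdown_handlers(handler):
    """Install handler for each shutdown-related signal this platform defines."""
    installed = []
    # SIGBREAK exists only on Windows; SIGHUP exists only on UNIX-like systems.
    for name in ("SIGINT", "SIGTERM", "SIGBREAK", "SIGHUP"):
        signum = getattr(signal, name, None)
        if signum is None:
            continue
        signal.signal(signum, handler)
        installed.append(name)
    return installed


if __name__ == "__main__":
    def on_shutdown(signum, frame):
        print("shutting down on signal", signum)
        raise SystemExit(0)

    print("handling:", install_shutdown_handlers(on_shutdown))
    while True:  # stand-in for the daemon's main loop
        time.sleep(1)
```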

Note that fork can behave differently on Windows depending on the language. If a Web application uses fork, it is highly recommended that you look at alternative techniques for achieving the same result on Windows. The best solution is often to switch to threads.
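As a sketch of the thread-based alternative, the following Python example hands each request to a pool of worker threads instead of forking a child process per request. The request handler is a placeholder and the function names are hypothetical. Native Windows builds of Python do not provide os.fork at all, which the FORK_AVAILABLE flag reports.

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Native Windows builds of Python have no os.fork.
FORK_AVAILABLE = hasattr(os, "fork")


def handle_request(request):
    """Placeholder for the per-request work a forked child used to do."""
    return request * request


def serve(requests, workers=4):
    """Process requests on a pool of threads; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(handle_request, requests))


if __name__ == "__main__":
    print("fork available:", FORK_AVAILABLE)
    print(serve([1, 2, 3, 4]))
```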

Device and Network Programming

Many applications built today use a client/server model or must follow network or interprocess communication requirements, such as HTTP, TCP/IP, and UDP. Scripting languages provide varying levels of abstraction over the standard system mechanisms for communicating with files and sockets. Because some of these abstractions are more portable than others, examine socket handling carefully when porting code. Avoid interprocess communication mechanisms other than sockets and pipes, because they are normally nonportable. A well-known remote procedure call (RPC) mechanism that works well across platforms and fits well into Web server applications is Simple Object Access Protocol (SOAP), which most scripting languages already support.
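Socket code written against a language's standard socket API usually ports unchanged. This minimal Python sketch runs a one-shot echo server on the loopback interface and talks to it from the same process. The port is chosen by the operating system, socket.create_server requires Python 3.8 or later, and the function names are hypothetical.

```python
import socket
import threading


def _read_all(sock):
    """Read from sock until the peer closes or shuts down its sending side."""
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)


def _echo_once(server):
    """Accept one connection, read everything, and send it back."""
    conn, _address = server.accept()
    with conn:
        conn.sendall(_read_all(conn))


def round_trip(message):
    """Send message to a loopback echo server and return the reply."""
    with socket.create_server(("127.0.0.1", 0)) as server:
        port = server.getsockname()[1]  # port 0 lets the OS pick a free port
        worker = threading.Thread(target=_echo_once, args=(server,))
        worker.start()
        with socket.create_connection(("127.0.0.1", port)) as client:
            client.sendall(message)
            client.shutdown(socket.SHUT_WR)  # signal the end of the request
            reply = _read_all(client)
        worker.join()
    return reply


if __name__ == "__main__":
    print(round_trip(b"hello"))
```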

An application that communicates with the serial port or other system device can use the same protocol for interacting with the device, but often must address the device differently. For example, a serial device on UNIX can be addressed as the special file /dev/ttya. On Windows, it is addressed as COM1:.
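Because the protocol code can stay the same, a port often only needs to choose the device name per platform. The following Python sketch is a hypothetical mapping: the Linux-style /dev/ttyS names are an assumption (other UNIX systems use names such as /dev/ttya), and on Windows the \\.\ prefix is needed to open COM10 and higher.

```python
import os


def serial_port_name(index, is_windows=None):
    """Return a device name for the zero-based serial port number `index`."""
    if is_windows is None:
        is_windows = os.name == "nt"
    if is_windows:
        number = index + 1
        # COM1 through COM9 can be opened by name; COM10 and above need the \\.\ prefix.
        return f"COM{number}" if number < 10 else rf"\\.\COM{number}"
    # Assumption: Linux-style naming; other UNIX systems use names such as /dev/ttya.
    return f"/dev/ttyS{index}"


if __name__ == "__main__":
    print(serial_port_name(0))
```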

User Interfaces

Many scripting languages have access to one or more graphical user interface (GUI) toolkits. If the script's language has a GUI toolkit, determine how portable that toolkit is across platforms. Interix, the UNIX portability layer, provides a port of the curses terminal user interface library.

Tk is a GUI toolkit common to Tcl, Perl, and Python. It is fully cross-platform compatible between UNIX and Windows. Some of the finer points of cursor and font handling can vary between these systems because of underlying operating system differences.

Considering the Target Environments

This section describes the major scripting environments found in the target environments. Each differs in some ways from the corresponding scripting environment on UNIX; these differences are described here so that they can be addressed during the migration.

Porting UNIX Shell Scripts to Interix

When porting a shell script from an open-system implementation of UNIX (such as System V Release 4 or BSD) to Interix, there are only two significant differences. First, by default Interix stores binaries in one of three directories: /bin, /usr/contrib and /usr/local/bin. For example, Perl is installed in one of those directories. Second, even though Interix has a standard UNIX file hierarchy (a single-rooted file system with the forward slash (/) as the base of the installation, regardless of the Windows drive or directory), absolute paths can differ. Absolute paths do not normally need to be converted, because adding symbolic links handles most situations. For example, /usr/ucb can be linked to /usr/contrib/bin, and /usr/local/bin can be linked to /usr/contrib/bin.
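On an Interix system, such links are normally made with ln -s. Where a Python interpreter is available, the same step can be scripted with os.symlink, as in this minimal sketch; the function name ensure_link is hypothetical, and the sketch deliberately leaves any existing file or link untouched.

```python
import os


def ensure_link(link_path, target):
    """Create a symbolic link link_path -> target unless link_path already exists."""
    if os.path.lexists(link_path):
        return False  # leave an existing file, directory, or link alone
    parent = os.path.dirname(link_path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    os.symlink(target, link_path)
    return True


if __name__ == "__main__":
    # Mirrors the example in the text; requires permission to write in /usr.
    print(ensure_link("/usr/ucb", "/usr/contrib/bin"))
```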

Additional considerations are as follows:

  • First, port scripts that set up either local or environment variables.

  • The Interix C shell initialization process executes two files before the .cshrc and .login files in the user's home directory: /etc/csh.cshrc and /etc/csh.login.

  • Be aware of the current limits of Interix shell parameters so that appropriate action can be taken. These parameters and their current limits are:

    • Maximum length of $path ($PATH) variable = ARG_MAX (normally not a problem)

    • Maximum (shell) command length = ARG_MAX (normally not a problem)

    • Maximum (shell) environment size = ARG_MAX

    • Maximum length of command arguments, that is, length of arguments for exec() in bytes, including environ data (ARG_MAX) = 1048576

    • Maximum length of file path (PATH_MAX) = 512

    • Maximum length of file name (NAME_MAX) = 255 (normally not a problem)

  • Modify any scripts that rely on information from /etc/passwd or /etc/group (for example, a script that uses grep to find a user name) so that they obtain user information by other means. Alternatives include:

    • Calls to Interix getpwent(), setpwent(), getgrent(), and setgrent() APIs

    • Win32 ADSI scripts

    • Win32 net user commands
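Both the shell limits and the user lookups can be checked from a script. The following Python sketch queries the limits through os.sysconf and os.pathconf, and enumerates user names with the standard pwd module, which uses the getpwent() family rather than parsing /etc/passwd with grep. These functions exist only on POSIX-style platforms, so the sketch returns empty results elsewhere. Whether a given Interix installation includes a Python interpreter is an assumption, and the function names are hypothetical.

```python
import os


def posix_limits(path="/"):
    """Return ARG_MAX, PATH_MAX, and NAME_MAX where the platform reports them."""
    limits = {}
    if hasattr(os, "sysconf"):
        try:
            limits["ARG_MAX"] = os.sysconf("SC_ARG_MAX")
        except (ValueError, OSError):
            pass
    if hasattr(os, "pathconf"):
        for name in ("PC_PATH_MAX", "PC_NAME_MAX"):
            try:
                limits[name[3:]] = os.pathconf(path, name)  # "PC_PATH_MAX" -> "PATH_MAX"
            except (ValueError, OSError):
                pass
    return limits


def user_names():
    """List user names through the password database API instead of grepping /etc/passwd."""
    try:
        import pwd  # POSIX only
    except ImportError:
        return []
    return sorted(entry.pw_name for entry in pwd.getpwall())


if __name__ == "__main__":
    print(posix_limits())
    print(user_names())
```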

CGI Script Migration

The Common Gateway Interface (CGI) protocol is the standard interface that Web servers use to execute programs and scripts that handle dynamic content. Any language can be used for CGI if it can read from STDIN and write to STDOUT, and chances are that many existing scripts are CGI-based. In recent years, many Web server plug-ins have been written for scripting languages to work around the performance limitations of CGI, although using these plug-ins sometimes requires minor changes to the CGI script itself. Apache has direct language plug-ins for Perl (mod_perl), PHP (mod_php), and Tcl (mod_tcl). Through the Internet Server API (ISAPI), Microsoft Internet Information Server (IIS) has a direct language plug-in for Perl called PerlEx.

Normally, CGI portability is not an issue because CGI is a standardized interface available under all major Web servers.
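A CGI program needs only the process environment, standard input, and standard output, which is why it ports well. This minimal Python sketch reads the QUERY_STRING environment variable and writes a plain-text response; the name parameter and the greeting are hypothetical. It uses urllib.parse rather than Python's old cgi module, which was removed in Python 3.13.

```python
import os
import sys
from urllib.parse import parse_qs


def build_response(environ):
    """Build a complete CGI response (headers, blank line, body) from the request environment."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    name = params.get("name", ["world"])[0]  # hypothetical "name" parameter
    headers = "Content-Type: text/plain\n"
    return headers + "\n" + f"Hello, {name}!\n"


if __name__ == "__main__":
    sys.stdout.write(build_response(os.environ))
```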




UNIX Application Migration Guide
Unix Application Migration Guide (Patterns & Practices)
ISBN: 0735618380
EAN: 2147483647
Year: 2003
Pages: 134
