Chapter 17. Introduction to Software Development | UNIX Users Handbook (2nd Edition)

CONTENTS

Introduction
Understanding Computer Programs
Compiled vs. Interpreted Languages

Introduction

UNIX is a mature and robust operating environment, designed for performance, reliability, and scalability. Initially developed by researchers, UNIX has become the development platform of choice for many applications, including business-critical applications.

As seen in previous chapters of this book, the UNIX system includes hundreds of utilities, also referred to as commands. When these commands are combined in the form of scripts, they create programs to solve problems.

Script languages, however, do have several shortcomings. They are flexible and easy to use, but ill-suited to manipulating the computer's memory and I/O devices directly. Script languages are also interpreted languages: the commands typed into a script are read and evaluated only when the script is executed. This makes scripts inefficient, because the commands must be reinterpreted each time the script is executed.

Shell scripts are sufficient in some cases, but more complex problem solving and functionality call for a programming language.

The following programming chapters will introduce you to programming basics and three popular languages: C, C++, and Java. The content and structure are targeted at beginner programmers, working through basic concepts. The upcoming programming sections are only an introduction. They don't provide enough information to be used as a reference for a specific programming language. You should supplement each of these sections with a book dedicated to the language(s) you'll be using.

Understanding Computer Programs

Computer programs are instructions that we give the computer to perform a task or solve a problem. Human-readable instructions are called source code.

Unfortunately, computers only understand machine language. Machine language, a collection of binary numbers, is unintelligible to us. Early developers had to program in terms of binary numbers that corresponded directly to specific machine instructions and locations in the computer's memory. A machine language fragment such as:

00000010101111001010  00000010111111001000  00000011001110101000

is not as clear as source code found in modern programming languages, which may look similar to:

k = i + j;

Programming languages therefore needed to evolve beyond machine language for ease of use and readability.

Next came assembly languages. Instead of using sequences of binary numbers, an assembly language allows the programmer to use symbolic names to perform various operations and to refer to specific memory locations. A sequence of binary numbers within a machine language program may tell the computer to store a number. The symbolic equivalent may be store x.

Machine and assembly languages are referred to as low-level languages. Low-level languages are time-consuming and difficult to use. Simple programming operations, such as adding two numbers together, require multiple low-level operations. Using the example above, k = i + j, assembly language requires the following operations: load i, add j, and store k. The programmer must write an explicit statement for each step.

There is still a one-to-one correspondence between each assembly language statement and a specific machine instruction. The machine instruction sets vary for different computer systems. Therefore, a programmer must learn the machine instruction set for each type of computer system used.

Low-level languages create programs that are not portable. This means the program will not run on a different computer system without being rewritten, due to differences in the machine instruction sets. Because assembly language programs are written in terms of these instruction sets, they are machine-dependent.

In contrast, languages such as C, C++, and Java are considered higher-level languages. These languages are more human-readable. Multiple steps in lower-level languages are implied in single statements of higher-level languages. Programming time, effort, and difficulty are reduced. Even UNIX, originally written in assembly language, was rewritten in C.

The syntax of the higher-level languages became standardized across varying computer systems. Programmers no longer needed to develop a program for a particular computer system's instruction set; they could write machine-independent source code. Thus, programming in higher-level languages allowed for portability.

But the computer system could not understand the source code. The source code needed to be translated into machine language. To solve this problem, a special computer program was developed, called a compiler.

Compiled vs. Interpreted Languages

Source code must be translated into machine language for a computer to understand it. A compiled language requires a compiler to convert source code into machine language. An interpreted language requires an interpreter to convert source code into machine language.

A major difference between compiled and interpreted languages is when this translation occurs.

An interpreted language, such as Java or a UNIX scripting language, is translated into machine language at runtime. Whenever the program is executed, the translation is performed again. Interpreted languages therefore tend to be slower and less efficient than compiled languages, and there is very little time for an interpreter to optimize the resulting machine language for execution. This is shown in Figure 17-1.

Figure 17-1. Interpreted Execution Flow


On the other hand, a compiled language, such as C or C++, allows the translation into machine language to occur before execution time.

The programmer creates the source code, runs the compiler, and executes the program, using the resulting executable. Not only is the compiler given more time to find optimizations, but also the translation into machine language happens only once, at compile time. Compiling does not occur again unless changes are made to the source code, as shown in Figure 17-2.

Figure 17-2. Compiled Execution Flow


Another distinction between compiled and interpreted languages is whether the executable code they produce is portable.

An executable is the file that runs the program. If the file name is entered at the command line, the program will run. For example, in C and C++, the executable is the machine code generated by the compiler.

A compiler translates source code into a target language. The target language is machine language for C and C++. The target language for Java is bytecode.

Important Note: Java is both a compiled and interpreted language. Source code is compiled into bytecode, and the bytecode is then interpreted when the program is executed. This discussion is referring to the interpreted aspect of Java programs.

A compiled language needs compiler software on the computer system performing the compile. The compiler is written specifically for that type of computer system. The executable generated by the compiler can only be moved to other computer systems using the same machine instruction set.

For example, all HP 9000 computer systems running HP-UX can run the same C or C++ executable without forcing a recompile on the individual computers. The same executable will not run on IBM, Sun, or DEC computer systems.

If a C or C++ program is to execute on a different computer system, a compiler for that computer system must be obtained and the source code must be recompiled.

Executables for C and C++ are not portable.

In contrast, the target language for Java, bytecode, is portable to different computer systems. Any computer system compiling with a Java compiler will generate a standard bytecode. The bytecode is usable on any computer system, and thus is portable.

The difference is that a Java-compliant virtual machine must be on the computer system executing the Java bytecode. A Java virtual machine, or JVM, accepts bytecode at execution time and interprets it into machine language. The JVM is performing the translation into machine language; therefore, virtual machines are written for specific computer systems.

For example, bytecode generated on an HP 9000 HP-UX computer system can be executed on a Sun Solaris computer system without performing a recompile. The Sun computer system must have its own JVM. The JVM used on the HP computer system cannot be used on the Sun computer system.

Thus, the bytecode for Java is portable; the JVM is not.
