4.4 Program Distribution and Execution

A software program embodies the actions required in the processing, storage, and communication of information content. It consists of instructions, authored by a programmer and executed by a computer, that specify the detailed actions to take in response to each possible circumstance and input. These actions are embodied in an algorithm, which prescribes the actions and the sequence in which they must be taken.

Software in isolation is useless; it must be executed, which requires a processor. A program captures an algorithm in a form that the processor can understand and act upon. A processor has a fixed set of available instructions; a program specifies a sequence of these instructions. There are a number of different processors with distinctive instruction sets, including several that are widely used.

A challenge in the software business is distribution: How does software get from where it is produced to where it is executed, that is, onto a processor available to a user? Software is represented digitally and can thus be replicated cheaply, stored on the same media as information, and transferred over a network. This leaves some questions unanswered, however. In what form is the software distributed? In what manner are the replicas conveyed from supplier to user? These questions have profound implications for the business models of software suppliers.

This section deals with the technical aspects of this problem. Later, in chapters 5 and 6, some related business issues are discussed. There is also much more involved in getting an application up and running for the benefit of users than just distributing the software (see chapter 5).

4.4.1 Application and Infrastructure

After software is distributed, it typically doesn't execute in isolation. Rather, its execution requires a context incorporating a substantial amount of infrastructure software, making use of capabilities provided by that infrastructure through its APIs. The prevalence and market share of a given infrastructure largely define the available market for software making use of that infrastructure; this creates obstacles to the commercialization of infrastructural innovations (see chapter 7).

As defined in section 2.2.6, infrastructure software provides capabilities that support a variety of applications, while application software is focused on needs specific to end-users. The application developer's attention focuses on user needs, whereas the infrastructure developer is more concerned with providing useful capabilities to a variety of application developers and operators. Infrastructure software comes closer to being purely technical. Although applications are becoming more specialized (see section 3.1), even specialized applications have much in common, and this creates an opportunity for expanding infrastructure (see chapter 7).

As applications become more specialized, market size decreases, but the value to each user may be higher. Can higher revenue from each user compensate for the smaller user base? To make it more likely that the answer will be yes, an important trend is reducing the cost of application development. One method is software reuse, using an existing solution rather than creating something from scratch (see chapter 7). Others are software components (see chapter 7), modules designed for multiple uses; software tools; and user programming (see section 4.2.7). Another is expanding the capabilities of the infrastructure by observing what is reimplemented in various applications and capturing it in a general and configurable way.

Example Although some applications don't restrict access, many applications require access control, the ability to specify who has permission to use them, means of identifying users trying to gain access, and means of denying access to users not having permission. It makes sense to capture this access control in the infrastructure. An alternative is reusable software components implementing these functions that can be incorporated as is into many applications.
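
To make this concrete, the following is a minimal sketch in Java of what a reusable access control component might look like. All names here (AccessControl, grant, isPermitted) are hypothetical, invented for illustration rather than drawn from any actual product.

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical reusable access control component: applications delegate
// the bookkeeping of who may do what instead of reimplementing it.
public class AccessControl {
    // Maps each user to the set of permissions granted to that user.
    private final Map<String, Set<String>> grants = new HashMap<>();

    // Grant a named permission (e.g., "open-document") to a user.
    public void grant(String user, String permission) {
        grants.computeIfAbsent(user, u -> new HashSet<>()).add(permission);
    }

    // Deny access unless the permission has been explicitly granted.
    public boolean isPermitted(String user, String permission) {
        return grants.getOrDefault(user, Set.of()).contains(permission);
    }

    public static void main(String[] args) {
        AccessControl acl = new AccessControl();
        acl.grant("alice", "open-document");
        System.out.println(acl.isPermitted("alice", "open-document")); // true
        System.out.println(acl.isPermitted("bob", "open-document"));   // false
    }
}

Every application that incorporates such a component checks permissions the same way, instead of reimplementing the bookkeeping.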

The software category where this infrastructure expansion is occurring is often called middleware. Middleware is something added to the existing infrastructure, just below the application, to expand its capabilities.

Example Message-oriented middleware (MOM) provides a set of capabilities surrounding the generation, storage, and retrieval of messages (packages of information, such as documents, packaged up for communication). Work flow applications, where documents flow through an organization and are acted upon, can make use of MOM capabilities.
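
As an illustration, the store-and-forward core of MOM can be sketched in a few lines of Java. This is a hypothetical, minimal sketch, not the API of any actual MOM product; real middleware adds persistence, transactions, and delivery guarantees.

import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch of store-and-forward messaging: senders deposit
// messages, and receivers retrieve them later, possibly from elsewhere.
public class MessageQueue {
    private final Queue<String> store = new ArrayDeque<>();

    // A sender deposits a message; the middleware stores it.
    public synchronized void send(String message) {
        store.add(message);
    }

    // A receiver retrieves the message whenever it is ready to act on it.
    public synchronized String receive() {
        return store.poll(); // null if no message is waiting
    }

    public static void main(String[] args) {
        MessageQueue queue = new MessageQueue();
        queue.send("purchase order 42");
        System.out.println(queue.receive()); // prints "purchase order 42"
    }
}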

Infrastructure software faces many industry and economic challenges, among them overcoming issues of network effects (see chapters 6 and 9).

4.4.2 Platform and Environment

There are many software execution models, which lead directly to different forms in which software can be distributed, as well as distinct business models.

Consider a specific software program (it could be an application, or part of the infrastructure, or a component), which we call here the software distribution. As described in the last subsection, the distribution likely relies on complementary software, and other software may rely on it. One instance is where the distribution is an application that depends on an infrastructure.

Example An application program always requires an operating system (OS) (e.g., Linux, Mac OS, Microsoft Windows, Sun Solaris). The OS is infrastructure software providing many functions.

Another instance is where the software distribution is an application that depends on other applications.

Example Each of the programs in an office suite (word processor, spreadsheet, and presentation) needs the ability to create and edit drawings. Rather than implementing separate drawing editors, these individual programs will likely share a common drawing editor.

A platform is the aggregate of all hardware and software that is assumed available and static from the perspective of the software distribution. Sometimes there is other optional software, neither part of the platform nor under control of the platform or the distribution. The aggregation of platform and this other software is the environment for the distribution. These concepts are illustrated in figure 4.4.

Figure 4.4: Architecture of a platform and environment for a software distribution.

Example The Web browser has become part of a platform for many applications that make use of its presentation capabilities. An application can add an executable program to a browser, for example, a program written in JavaScript that is embedded in a Web page. The Netscape browser can be extended by adding plug-ins, software modules that extend the browser's capabilities. Where an application program depends on a capability to function at all, that capability becomes part of its platform requirements. A capability that is not essential to the functioning of the software distribution, but can be used if present, is part of the environment. For example, a plug-in may enhance the look of the user interface without being necessary for the proper functioning of the distribution.
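
The distinction can be made concrete in code. The following Java sketch probes at execution time for an optional capability; the class name com.example.FancyRenderer is hypothetical, standing in for any optional software in the environment.

// A sketch of the platform/environment distinction: the program requires
// nothing beyond the platform, but exploits optional software if present.
public class EnvironmentProbe {
    public static void main(String[] args) {
        try {
            // Optional capability: use it if the environment provides it.
            Class<?> renderer = Class.forName("com.example.FancyRenderer");
            System.out.println("Optional capability present: " + renderer.getName());
        } catch (ClassNotFoundException e) {
            // Not part of the platform, so the program must still function.
            System.out.println("Falling back to basic rendering");
        }
    }
}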

Other software may come to rely on the software distribution's being available and static, in which case the distribution becomes a part of that program's platform. Thus, the platform is defined relative to a particular software distribution and will typically be different (but overlapping) for different distributions.

Usually, although not always, the platform is infrastructure software and the distribution is an application.

Example An office suite is an application that may use a particular operating system as its platform. Middleware is infrastructure software that uses an operating system platform; this illustrates that infrastructure software can also be a distribution.

4.4.3 Portability

The market reality is that potential customers use platforms from different software suppliers, and thus a software distribution's platform may include software not readily available to every potential customer. There are several approaches a supplier might take. It can address only one platform, thereby limiting market share or forcing customers to adopt a different platform and imposing added switching costs (see chapter 9 for a discussion of lock-in). The supplier can produce a variation on the distribution for more than one platform, increasing development and distribution costs. Or the distribution can be made portable, allowing a single software code to run on different platforms.

The first (but not only) obstacle to portability is distinct microprocessor instruction sets. Regardless of portability considerations, it is desirable that programming not be too closely tied to a particular processor instruction set. Because of the primitive nature of individual processor instructions, programs directly tied to an instruction set are difficult to write, read, and understand. Portable execution—the ability of a program to execute without modification on different microprocessors—is the first step to portability.

Portable execution falls far short of portability because a distribution makes use of many capabilities provided by its environment.

Example An executing distribution may need to communicate across the Internet, store and retrieve files from storage, and display interactive features on its screen. To be portable, the distribution would have to perform these functions in equivalent ways on different platforms and without modification.

A given software distribution is portable with respect to a specific set of platforms if full functionality and behavior are preserved when executing on those platforms. Besides portable execution, portability requires that each platform provide the distribution with an environment on that platform that appears to be identical from the distribution's perspective. The word appear is important here; the environments on different microprocessors need not actually be identical (and in fact cannot be identical if they incorporate different platforms). What is required is that the differences be encapsulated and hidden from the distribution, at the API.
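
Java's standard file API offers a familiar illustration: the same calls behave equivalently whether the underlying platform separates path components with a slash (UNIX) or a backslash (Windows), the difference being hidden behind the API. A minimal sketch:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// The same source runs unmodified on different platforms; the
// platform-specific path separator never appears in the code.
public class PortableFileAccess {
    public static void main(String[] args) throws IOException {
        Path file = Path.of("data", "notes.txt"); // no separator hard-coded
        Files.createDirectories(file.getParent());
        Files.writeString(file, "hello");
        System.out.println(Files.readString(file));
    }
}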

Example If a software distribution is to run on both an Apple Macintosh and a Windows PC, the environments will necessarily incorporate different microprocessors (a PowerPC and an Intel x86-compatible processor, respectively) and different operating systems, since the Mac OS doesn't run on a PC and Windows doesn't run on a Macintosh.[10] If portability across the two platforms is achieved, the distribution will have a larger potential market, including both PC and Mac users.

A way to achieve portability was already illustrated in figure 4.4. First, the distribution software is developed using an abstract execution model, divorced from the instruction set of a particular processor. Second, a virtual machine is added to each platform that realizes the abstract execution model by building on the native instruction set of the platform. This achieves execution portability, inevitably at some expense in performance. Third, an environment is created that appears to the distribution to be the same on each platform. This environment provides means to access operating system services, access the network and display, and so on, that appear identical across all platforms. That environment provides standard APIs, with equivalent functionality and behavior behind those APIs.

Example Java from Sun Microsystems defines an environment for portable execution that is available for a number of platforms (Apple OS, Microsoft Windows, Linux, and others). Java provides a virtual machine (called the Java virtual machine) that executes a program represented by so-called bytecode, as well as uniform ways to access operating system services, the network, and display. For example, a Java virtual machine is embedded in Web browsers so that Web servers can download programs represented as Java bytecode for execution in the browser environment, without regard to the platform and in a uniform environment provided by the browser. Similarly, Microsoft .NET introduces the Common Language Runtime (CLR), a virtual machine that supports multiple programming languages and portable execution. Although different in technical detail, it is comparable to a Java virtual machine. Portability is achieved by the portable subset of the .NET Framework. The underlying specifications for the runtime, for the new programming language C# (pronounced "C sharp"), and for several of the frameworks have been standardized by an industry standards group, the European Computer Manufacturers Association in Geneva (ECMA 2001a; 2001b). As with Java, there are variants on CLR for other platforms and for small devices.

Portability is valuable to both users and software suppliers. For suppliers, it allows a single software distribution to be developed for two or more platforms, reducing development, maintenance, and distribution costs. For users, it decouples decisions about platform from decisions about application software and allows them to replace one without the other. The most compelling advantages arise in a networked environment, as discussed later.

Having said this, universal portability (all software distributions executing identically on all platforms) is probably not attainable and arguably not even desirable. Universal portability can be accomplished in one of two ways. First, all platforms offer essentially the same capabilities and thus are undifferentiated and commoditized. At that point, there is little incentive or opportunity to invest in new capabilities that differentiate one platform from another. Alternatively, all distributions take advantage only of capabilities common to all platforms, what might be called "lowest common denominator" capabilities. Here again, the differentiating benefits that one platform might offer are neutralized, since the other platforms do not offer those capabilities. In either case, innovation in the platform is largely thwarted. Thus, portability is most reasonably ascribed to a single software distribution regarding its execution on a specific set of platforms.

4.4.4 Compilation and Interpretation

The virtual machine illustrates that software programs can have different representations that preserve the same functionality, just as information can have different representations that preserve its content or meaning (see section 2.1). In the case of programs, equivalent functionality can be preserved even while executing on microprocessors with different instruction sets, but only if the necessary software infrastructure is in place to translate from the assumed instruction set (the virtual machine) to the physical microprocessor instruction set. This idea put into practice is an essential underpinning to software distribution with a diversity of platforms in the hands of users.

The program format manipulated directly by the software developer is called source code. Source code is designed to be descriptive in the application context and to be a natural representation for human programmers. It is inappropriate for direct execution on a processor: the microprocessor and the human have very different capabilities and needs. The processor executes object code, expressed directly in the processor instruction set. Source code is for people to write and understand; object code is for machines to execute and is generally not very accessible to people.[11]

Example Source code to add up the first n integers would look something like:

sum = 0;
for (i = 1; i <= n; i = i + 1)
    sum = sum + i;

When translated to object code, this small program consists of a sequence of instructions having specific meaning to the microprocessor in terms of its primitive operations. The processor typically has a set of registers to store data values and a set of primitive instructions operating on the values currently stored in those registers. The object code performs primitive operations like resetting a register value, adding the values in two registers, jumping back to the beginning of the loop, and checking whether the index i (stored in one register) has yet reached n (stored in another register). While a person could figure out what this object code accomplishes, it would be tedious.
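
The flavor of the object code can be suggested in a hedged sketch. The Java below mimics the register-level steps, with hypothetical instruction names in the comments; real instruction sets differ in detail.

// Register-level view of summing the first n integers. Each statement
// corresponds roughly to one primitive instruction.
public class SumLoop {
    public static void main(String[] args) {
        int n = 10;
        int r0 = 0;             // CLEAR r0            ; r0 holds sum
        int r1 = 1;             // LOAD  r1, 1         ; r1 holds index i
        while (r1 <= n) {       // CMP r1, n / JGT end ; exit when i exceeds n
            r0 = r0 + r1;       // ADD r0, r1
            r1 = r1 + 1;        // INC r1
        }                       // JMP back to the comparison
        System.out.println(r0); // prints 55
    }
}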

There are many specialized source code languages in use. For general purposes, the holdovers from the early days of computing are COBOL and FORTRAN, and more modern, widely used languages include C, C++, and Java.[12] There are strong indirect network effects in languages. While software written in different languages can be composable, source code is an important means of documentation and communication among programmers, so more widely used languages prove more valuable. In addition, development organizations tend to choose a language familiar to a large number of prospective employees.

The form of object code that is directly executed on the microprocessor is called native code. Native code is expressed in terms of instructions implemented directly by the microprocessor. A software tool can automatically translate the source code to object code, including native code. However, it is not necessary to translate directly from source to native code. Instead, a series of transformations can be used to achieve that goal—these transformations can even be staged to happen at different times and places (Lee and Leone 1996). This adds a needed degree of flexibility in business and operational models for software, especially in the Internet age, when software is often distributed over a network.

Specifically, three approaches to distributing software are prominent today (see figure 4.5). The two single-stage approaches dominated before the Internet:

  • Native code distribution. Once development is complete, a single automatic translation from source to native code creates a native code representation of the program that can be distributed to customers.[13] This translation is called compilation and is performed by a compiler.

  • Source code distribution. Source code is distributed to the customer, and the translation to native code occurs on the fly, during the execution of the program. This translation is called interpretation and is performed by an interpreter. The interpreter must be included in the platform; often it is added to the native platform to create an environment for interpreted programs. Special source code languages are designed[14] to be particularly appropriate for interpretation, including the widely used JavaScript and Visual Basic Script.

Figure 4.5: Three common approaches to distributing software: native code, source code, and intermediate object code.

A compiler is analogous to a human German-to-English translator who waits until the end of a speech and then translates it in its entirety. A software interpreter is analogous to a human interpreter who translates the speech as it is spoken.

Example ECMAScript (ECMA 1999), better known by the implementation names JavaScript and JScript, is an interpreted language in which small programs can be written and their source code embedded in Web pages. When the Web browser encounters a JavaScript program, it interprets and executes it, providing an environment and APIs for the program to do various things. JavaScript is used to enhance the richness of functionality of Web pages, extending the native capabilities of HTML, provided the programmer does not mind making the source code available.

Each of these first two approaches has strengths and weaknesses; the additional stage of translation shown in figure 4.5, intermediate object code, addresses their shortcomings. The advantages and disadvantages of single-stage compilation and interpretation are listed in table 4.2. The choice of one method or the other has considerable business implications.

Table 4.2: Considerations in Choosing Single-Stage Compilation vs. Interpretation

Compilation

Advantages: Compilation to native code avoids the distribution of source code, contributing to maintaining trade secrets and enhancing encapsulation. The execution-time interpreter overhead is avoided.

Disadvantages: Execution portability is lost, because a different native code version must be generated for each platform. Different environments on the different platforms necessitate different variants of the source code, not simply a recompilation for each platform.

Interpretation

Advantages: Execution portability can be achieved if each platform in a set includes an interpreter for the source language and provides an equivalent environment for a set of software distributions.

Disadvantages: Source code is distributed to the user, compromising trade secrets and the benefits of encapsulation. Interpretation during execution adds processing overhead. Portable execution depends on a compatible environment provided on the host platform, reducing market size.

A multistage transformation with more than one object code representation—the native code and also one or more intermediate object codes—can overcome these disadvantages. Each intermediate object code is not native code—it requires at least one stage of transformation to native code—but it is low-level object code, helping to preserve trade secrets and encapsulation, and reducing the runtime overhead in the final stage of interpretation.

In this case, compilation and interpretation can be combined in a two-stage process, gaining the advantages of both. One stage of compilation to an intermediate object code is followed by a stage of interpretation by a virtual machine (see figures 4.4 and 4.5). The intermediate object code is the program representation replicated and distributed to the customer. Portability to all platforms hosting the intermediate object code is preserved, while retaining the advantages of distributing object rather than source code. An added advantage is that the source code language need not be designed with interpretation in mind.

Example Java can be compiled into intermediate object code called bytecode. Identical bytecode can be distributed to and executed on different platforms, as long as each includes a Java virtual machine (bytecode interpreter). Java can also be compiled directly into native object code for a given platform.
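
The principle behind the virtual machine can be shown with a toy example. The following Java sketch interprets a tiny, hypothetical intermediate object code; real formats such as Java bytecode or CLR intermediate language are far richer, but the interpreter loop plays exactly this role of a processor realized in software.

// A minimal virtual machine for a hypothetical intermediate object code.
// The same "object code" array runs on any platform hosting this interpreter.
public class TinyVm {
    // Hypothetical instruction set.
    static final int PUSH = 0, ADD = 1, PRINT = 2, HALT = 3;

    public static void run(int[] code) {
        int[] stack = new int[16];
        int sp = 0; // stack pointer
        int pc = 0; // program counter
        while (true) {
            switch (code[pc++]) {
                case PUSH:  stack[sp++] = code[pc++]; break;
                case ADD:   stack[sp - 2] += stack[sp - 1]; sp--; break;
                case PRINT: System.out.println(stack[sp - 1]); break;
                case HALT:  return;
            }
        }
    }

    public static void main(String[] args) {
        // This array is what would be replicated and distributed.
        run(new int[] { PUSH, 2, PUSH, 3, ADD, PRINT, HALT }); // prints 5
    }
}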

For a software distribution executed multiple times on the same processor, the repetitive overhead of interpretation can be avoided by just-in-time (JIT) compilation, in which a compiler is invoked within the interpreter to compile some of the intermediate object code to native code. Compilation includes optimization, tuning the output object code to minimize execution time. Because it can observe the actual execution, JIT compilation's online optimization can even improve efficiency.[15]

Example Current implementations of Java illustrate this (Suganuma et al. 2000; Sun Microsystems 1999a). All current Java virtual machine implementations use JIT compilation, often including online optimization, to achieve good performance.
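
The core decision a JIT compiler makes can be simulated in a short sketch. The threshold and names below are hypothetical, and a precompiled method stands in for the native code a real JIT would generate for the host processor.

import java.util.function.IntUnaryOperator;

// Simplified simulation of just-in-time compilation: interpret a fragment
// until it proves "hot," then dispatch to a compiled version instead.
public class JitSketch {
    static final int HOT_THRESHOLD = 1000; // hypothetical tuning parameter
    static int invocations = 0;
    static IntUnaryOperator body = JitSketch::interpreted;

    // Slow path: decode-and-execute on every call, counting invocations.
    static int interpreted(int x) {
        if (++invocations >= HOT_THRESHOLD) {
            body = JitSketch::compiled; // switch over: the fragment is hot
        }
        return x + 1;
    }

    // Fast path, standing in for generated native code.
    static int compiled(int x) {
        return x + 1;
    }

    public static void main(String[] args) {
        int result = 0;
        for (int i = 0; i < 10_000; i++) {
            result = body.applyAsInt(result); // dispatch to current version
        }
        System.out.println(result); // prints 10000
    }
}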

There are many variations on this theme. Interpretation can be avoided entirely without losing portability by always applying install-time or JIT compilation (as is the case with the CLR virtual machine of the Microsoft .NET Framework). In a narrower definition of portability, interpretation and JIT compilation can allow any software designed for one platform to be run on another specific platform, enhancing the latter with a new suite of applications.

Example Digital Equipment Corporation (absorbed into Compaq, then into Hewlett-Packard) included interpretation and compilation in its Alpha platform that allow Microsoft Windows applications to execute.[16]

4.4.5 Trust in Execution

An important security issue to users is the implicit trust that a user places in an executing program (Devanbu, Fong, and Stubblebine 1998). An untrustworthy program could damage stored data, violate privacy, or do other invasive or damaging things (see section 3.2.8 and chapter 5). This is a consideration in the choice of an intermediate object code format and the design of the interpreter.

Two different models are currently in use. Users or operators acquiring software from what they consider a reputable software supplier may trust the code based on that fact alone. Software sold in shrink-wrap form in retail stores can use physical techniques like unique holograms to help establish trust. However, today much software is distributed over a network, where approaches based on physical security are not feasible. Fortunately, security technologies provide an even more trustworthy approach, the digital signature (see chapter 6), which verifies that code originated from a specific supplier and has not been tampered with since it left the supplier. This does not affect the choice of intermediate object code or program functionality.

Example Signed Java applets (distributed as bytecode), ActiveX controls signed with Microsoft's Authenticode technology, and Microsoft's .NET Framework assemblies all use digital signatures. The .NET assemblies are unusual in that two signatures are used. The standard signature protects against tampering. An optional additional signature establishes the authenticated originator of the assembly.
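
The mechanics can be sketched with Java's standard security API. This is a hedged sketch: in a real distribution chain, the supplier's public key would arrive in a certificate issued by a trusted authority rather than being generated locally as here.

import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.Signature;

// Sign code on the supplier side, then verify it on the user side.
public class SignedCode {
    public static void main(String[] args) throws Exception {
        byte[] code = "the distributed object code".getBytes();

        // Supplier: sign the code with the private key.
        KeyPair keys = KeyPairGenerator.getInstance("RSA").generateKeyPair();
        Signature signer = Signature.getInstance("SHA256withRSA");
        signer.initSign(keys.getPrivate());
        signer.update(code);
        byte[] sig = signer.sign();

        // User: verify that the code has not been tampered with.
        Signature verifier = Signature.getInstance("SHA256withRSA");
        verifier.initVerify(keys.getPublic());
        verifier.update(code);
        System.out.println("verified: " + verifier.verify(sig)); // true
    }
}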

The second approach is policy-based, as many security systems are. A set of policies as to what the intermediate object code is allowed and not allowed to do is established, and the virtual machine is responsible for enforcing these policies.[17] There may be user-set configuration options, allowing the user some control over the trade-off between functionality, usability, and security.

Example A strict policy might preclude the executing program from reading and writing files in storage, or from communicating over a network. This might be appropriate for some programs but not for others. For example, if the user knows a program needs to legitimately write to a file on disk, she can relax that policy.
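
A minimal sketch of such policy enforcement follows; the names are hypothetical. A real virtual machine interposes an equivalent check on every sensitive operation requested by the untrusted code it executes.

import java.util.Set;

// A policy is a set of permitted operations; the virtual machine consults
// it before carrying out any sensitive action on behalf of a program.
public class ExecutionPolicy {
    private final Set<String> allowed;

    public ExecutionPolicy(Set<String> allowed) {
        this.allowed = allowed;
    }

    // Called by the virtual machine before performing the operation.
    public void check(String operation) {
        if (!allowed.contains(operation)) {
            throw new SecurityException("policy forbids: " + operation);
        }
    }

    public static void main(String[] args) {
        // User configuration: file reads allowed, network access denied.
        ExecutionPolicy policy = new ExecutionPolicy(Set.of("read-file"));
        policy.check("read-file");   // permitted, returns normally
        policy.check("open-socket"); // throws SecurityException
    }
}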

Security always requires qualifiers. Both the techniques described require the user to place implicit trust in the platform and environment to enforce security policies or properly check digital signatures. A piece of hardware, software, or hardware/software combination that is relied upon to enforce security policies is called a trusted system.[18]

Example The software environment that checks signatures or enforces execution policies is also acquired from a software supplier. The user is thus placing implicit trust in that supplier. Even assuming the supplier is trustworthy (as is normally the case), security can still be compromised by tampering with that software after it leaves the supplier. That vendor can add a digital signature, but how can that signature be checked in a trustworthy manner?

4.4.6 Operating System

A universal software infrastructure is the operating system. It provides core infrastructure capabilities that virtually all applications rely upon. Among its functions are to provide an abstract execution environment serving to isolate the program from unnecessary hardware details (e.g., the particulars of how data is stored on disk), hide the reality that multiple programs are executing concurrently on the same computer (called multitasking), allocate shared resources (e.g., memory and processor cycles) to programs, and provide useful services (e.g., network communication). The operating system is thus an essential part of any platform, along with the hardware. Two platforms can differ because they have different operating systems, different hardware, or both.

Example The Intel x86-compatible PC has two commonly used operating systems, Microsoft Windows and Linux, and thus supports two platforms. Linux also runs on other hardware and hence provides similar platforms on different hardware foundations. However, these Linux platforms are not identical; for example, they present software distributions with different processor instruction sets.

Some users appreciate the ability to mix and match solutions from different equipment and software vendors, and this contributes to competition.

4.4.7 Development Tools

Development tools improve programmers' productivity and enhance their ability to manage complexity. These software tools automate tasks that would otherwise be time-consuming and perform a number of other functions, like keeping track of and merging changes. Sophisticated toolkits are necessary for the management and long-term success of large projects involving hundreds or thousands of programmers. Today, most development organizations employ an integrated development environment (IDE) that combines many tools in an integrated package.

Example Traditionally, the two most important tools of a software developer were source code editors and compilers. In IDEs the toolkit has grown to include functional and performance debuggers, collectors of statistics, defect trackers, and so on. However, facing the substantial complexity of many current software systems, build systems have become one of the most important sets of tools.[19] They coordinate the largely independent efforts of different software teams, allowing them to merge their efforts into a single software distribution while providing audit trails and automating retrenchment as necessary.

In terms of the categories of software (see section 2.2.6), these tools can be considered either applications (serving a developer organization) or infrastructure (aiding the development of many applications).

[10]There are infrastructure extensions to the Mac OS that allow it to run Windows programs, such as Connectix's Virtual PC for Mac or Lismore's Blue Label PowerEmulator. Assume this is not present, so that the portability issue is more interesting. Alternatively, assume this is not sufficient, as it presents suboptimal integration of user interface behavior, look, and feel.

[11]People can and do write object code. This used to be much more common, before Moore's law reduced the critical need for performance. However, object code (or a closely related representation called assembly language) is still written in performance-critical situations, for example, in the kernel of an operating system or in signal processing.

[12]C has long held primacy for system programming tasks (like operating systems). C++ is an extension of C using object-oriented programming, a methodology that supports modularity by decomposing a program into interacting modules called objects. Java was developed more recently, primarily to support mobile code. New languages are arising all the time. For example, C# is a new language (based on C++) designed for the Microsoft .NET initiative.

[13]In some cases, source code as well as object code may be distributed (see section 4.2.4).

[14]Interpretation introduces runtime overhead that reduces performance, whereas the one-time compilation before distribution is not of concern. Languages that are normally interpreted include built-in operations that perform complex tasks in a single step, allowing an interpreter to map each such operation to an efficient implementation. Languages designed to be compiled avoid such complex built-in operations and instead assume that they can be programmed using only primitive built-in operations and operations already programmed.

[15]By monitoring the performance, the online optimizer can dynamically optimize critical parts of the program. Based on usage profiling, an online optimizer can recompile critical parts of the software using optimization techniques that would be prohibitively expensive in terms of time and memory requirements when applied to all the software. Since such a process can draw on actually observed system behavior at use time, interpreters combined with online optimizing compilation technology can exceed the performance achieved by traditional (ahead-of-time) compilation.

[16]There is nothing special about intermediate object code: one machine's native code can be another machine's intermediate object code. For instance, Digital developed a Pentium virtual machine called FX!32 (White Book 2002) that ran on Alpha processors. FX!32 used a combination of interpretation, just-in-time compilation, and profile-based online optimization to achieve impressive performance. At the time, several Windows applications, compiled to Pentium object code, ran faster on top of FX!32 on top of Alpha than on their native Pentium platforms.

[17]A generalization of this checking approach is now attracting attention: proof-carrying code. The idea is to add enough auxiliary information to an object code so that a receiving platform can check that the code meets certain requirements. Such checking is, by construction, much cheaper than constructing the original proof: the auxiliary information guides the checker in finding a proof. If the checker finds a proof, then the validity of the proof rests only on the correctness of the checker itself, not on the trustworthiness of either the supplied code or the supplied auxiliary information.

[18]Of course, what we would really like is a trustworthy system. However, within the realm of cost-effective commercial systems, security systems are never foolproof. Thus, it is better to admit that a system may be trusted out of necessity but is never completely trustworthy. This distinction becomes especially important in rights management (see chapter 8).

[19]A build system takes care of maintaining a graph of configurations (of varying release status), including all information required to build the actual deliverables as needed. Industrial-strength build systems tend to apply extensive consistency checks, including automated runs of test suites, on every check-in of new code.



