A.1. A Brief History of Software Engineering

The first modern computer was an electromechanical, typewriter-sized device developed in Germany in the early 1920s for enciphering messages. The device was later sold to the German Commerce Ministry, and in the 1930s it was adopted by the German military for enciphering all communication. Today we know it as the Enigma. Enigma used mechanical rotors to change the route of the electrical current flowing from a pressed key to a light board, lighting up a different letter (the enciphered letter). Enigma was not a general-purpose computer: it could only encipher and decipher (today we call it encryption and decryption). If the operator wanted to change the encryption algorithm, he had to change the mechanical structure of the machine by changing the rotors, their order, their initial positions, and the wired plugs that connected the keyboard to the light board. The "program" was therefore coupled in the extreme to the problem it was designed to solve (encryption), and to the mechanical design of the computer.

The late 1940s and the 1950s saw the introduction of the first general-purpose electronic computers for defense purposes. These machines could run code that addressed any problem, not just a single predetermined task. The downside was that the code executed on those computers was written in a machine-specific "language", with the program coupled to the hardware itself. Code developed for one machine could not run on another. Initially this was not a cause for concern, since there were only a handful of computers in the world anyway. As machines became more prolific, the emergence of assembly language in the early 1960s decoupled the code from specific machines and enabled it to run on multiple machines. However, the code was now coupled to the machine architecture: code written for an 8-bit machine could not run on a 16-bit machine, let alone cope with differences in registers, available memory, and memory layout. As a result, the cost of owning and maintaining a program began to escalate. This coincided more or less with the widespread adoption of computers in the civilian and government sectors, where more limited resources and budgets necessitated a better solution.

In the 1960s, higher-level languages such as COBOL and FORTRAN introduced the notion of a compiler: the developer wrote in an abstraction of machine programming (the language), and the compiler translated it into actual assembly code. Compilers decoupled the code, for the first time, from the hardware and its architecture. The problem with those first-generation languages was that they resulted in nonstructured programming, in which the code was internally coupled to its own structure through the use of jump (go-to) statements. Minute changes to the code's structure had devastating effects in multiple places in the program.

The 1970s saw the emergence of structured programming via languages such as C and Pascal, which decoupled the code from its internal layout and structure through the use of functions and structures. The 1970s were also the first time that developers and researchers started to examine software as an engineered entity. To drive down the cost of ownership, companies had to start thinking about reuse: what would make a piece of code reusable in other contexts? With languages like C, the basic unit of reuse is the function. The problem with function-based reuse is that the function is coupled to the data it manipulates, and if the data is global, a change made to benefit one function in one reuse context damages another function used somewhere else.
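As an illustration of that coupling, here is a minimal C# sketch (the names and the pricing scenario are hypothetical, standing in for the C-style globals described above): two routines share a piece of global state, and a change made to suit one caller silently corrupts the other.

// A hypothetical sketch of function-based reuse coupled to global data.
public static class GlobalData
{
    // Global state shared by every routine that touches prices.
    public static double[] Prices = { 10.0, 20.0, 30.0 };
}

public static class PricingFunctions
{
    // Reused in a reporting context: assumes Prices holds net amounts.
    public static double Total()
    {
        double sum = 0;
        foreach (double price in GlobalData.Prices)
            sum += price;
        return sum;
    }

    // "Improved" for an invoicing context: applies tax to the shared data in place.
    // The change benefits invoicing, but every later call to Total() now returns
    // gross rather than net amounts, breaking the reporting context.
    public static void ApplyTax(double rate)
    {
        for (int i = 0; i < GlobalData.Prices.Length; i++)
            GlobalData.Prices[i] *= 1 + rate;
    }
}

Each function works in isolation; it is the shared global that couples their reuse contexts.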

A.1.1. Object-Orientation

The solution to these problems in the 1980s was object-orientation, with languages such as Smalltalk and, later, C++. With object-orientation, the functions and the data they manipulate are packaged together in an object. The functions (now called methods) encapsulate the logic, and the object encapsulates the data. Object-orientation enables domain modeling in the form of a class hierarchy. The mechanism of reuse is class-based, enabling both direct reuse and specialization via inheritance.

But object-orientation is not without its own acute problems. First, the generated application (or code artifact) is a single monolithic application. Languages like C++ have nothing to say about the binary representation of the generated code, so developers had to deploy huge code bases every time, even for minute changes. This had a detrimental effect on the development process, quality, time to market, and cost. While the basic unit of reuse was the class, it was a class in source format. Consequently, the application was coupled to the language used: you could not have a Smalltalk client consuming a C++ class or deriving from it. Moreover, inheritance turned out to be a poor mechanism for reuse, often causing more harm than good, because the developer of the derived class needs to be intimately aware of the implementation of the base class, which introduces vertical coupling across the class hierarchy.

Object-orientation was also oblivious to real-life challenges such as deployment and versioning. Serialization and persistence posed yet another set of problems: most applications did not start by plucking objects out of thin air; they had persistent state that needed to be hydrated into objects, and yet there was no way of enforcing compatibility between the persisted state and the potentially newer object code. And if the objects were distributed across multiple processes or machines, there was no way of using raw C++ for the invocation, since C++ required a direct memory reference and did not support distribution. Developers had to write host processes and use some remote-call technology (such as TCP sockets) to remote the calls, but such invocations looked nothing like native C++ calls and offered none of their benefits.
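The vertical coupling introduced by implementation inheritance can be seen in a short, hypothetical C# sketch (the collection names are invented for illustration): the derived class below is correct only as long as the base class keeps routing AddRange through Add, an implementation detail the base class author never promised to preserve.

using System.Collections.Generic;

// A hypothetical sketch of the fragile base class problem.
public class OrderCollection
{
    private readonly List<string> items = new List<string>();

    public virtual void Add(string item)
    {
        items.Add(item);
    }

    // Implementation detail: AddRange happens to funnel every item through Add().
    public virtual void AddRange(IEnumerable<string> newItems)
    {
        foreach (string item in newItems)
            Add(item);
    }
}

public class CountingOrderCollection : OrderCollection
{
    public int Count { get; private set; }

    // This override counts correctly only because AddRange calls Add.
    public override void Add(string item)
    {
        Count++;
        base.Add(item);
    }

    // If a later version of the base class inserts items directly in AddRange,
    // Count silently becomes wrong without any change to this derived class.
}

The derived class depends on how the base class is implemented, not merely on what it promises, and any base class revision can break it.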

A.1.2. Component-Orientation

The solution for the problems of object-orientation evolved over time, involving technologies such as static libraries (.lib) and dynamic libraries (.dll), and culminating in 1994 with the first component-oriented technology, called COM (Component Object Model). Component-orientation provides interchangeable, interoperable binary components. Instead of sharing source files, the client and the server agree on a binary type system (such as IDL) and on a way of representing the metadata inside the opaque binary components. The components are discovered and loaded at runtime, enabling scenarios such as dropping a control on a form and having that control loaded automatically at runtime on the client's machine. The client programs only against an abstraction of the service: a contract called the interface. As long as the interface is immutable, the service is free to evolve at will. A proxy can implement the same interface and thus enable seamless remote calls by encapsulating the low-level mechanics of the remote call. The availability of a common binary type system enables cross-language interoperability, so a Visual Basic client can consume a C++ COM component. The basic unit of reuse is the interface, not the component, and polymorphic implementations are interchangeable. Versioning is controlled by assigning a unique identifier to every interface, COM object, and type library.

While COM was a fundamental breakthrough in modern software engineering, most developers found it unpalatable. COM was unnecessarily ugly because it was bolted on top of an operating system that was unaware of it, and the languages used for writing COM components (such as C++ and Visual Basic) were at best object-oriented but not component-oriented, which greatly complicated the programming model and required frameworks such as ATL to bridge the two worlds. Recognizing these issues, Microsoft released .NET 1.0 in 2002. .NET is (in the abstract) nothing more than cleaned-up COM, C++, and Windows, all working seamlessly together under a single, new component-oriented runtime. .NET supports all the advantages of COM, and it mandates and standardizes many ingredients, such as type metadata sharing, serialization, and versioning. While .NET is at least an order of magnitude easier to work with than COM, both COM and .NET suffer from a similar set of problems, detailed in the list below.
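First, though, the core idea that carries through from COM to .NET, programming the client against an interface contract while a proxy implements that same contract for remote calls, can be illustrated with a minimal C# sketch (the calculator names are hypothetical, not a specific COM or .NET API).

using System;

// A hypothetical sketch of interface-based, location-transparent programming.
public interface ICalculator
{
    double Add(double a, double b);
}

// One interchangeable implementation: runs inside the client's process.
public class LocalCalculator : ICalculator
{
    public double Add(double a, double b)
    {
        return a + b;
    }
}

// Another interchangeable implementation: a proxy that would hide the mechanics
// of a remote call (marshaling, transport, and so on) behind the same interface.
public class CalculatorProxy : ICalculator
{
    public double Add(double a, double b)
    {
        // Serialize the request, send it to the remote host, and await the reply.
        // The plumbing is omitted here; only the contract matters to the caller.
        throw new NotImplementedException("Remoting plumbing goes here.");
    }
}

public static class Client
{
    // The client code is identical no matter which implementation it is handed.
    public static double Use(ICalculator calculator)
    {
        return calculator.Add(2, 3);
    }
}

Swapping LocalCalculator for CalculatorProxy requires no change to the client, which is precisely what makes the interface, rather than the component, the unit of reuse.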


Technology and platform

The application and the code are coupled to the technology and the platform. Both COM and .NET are only available on Windows. Both COM and .NET expect the client and the service to be either COM or .NET, and cannot interoperate natively with other technologies, be they on Windows or not. While bridging technologies such as web services make interoperability possible, they force the developers to let go of almost all of the benefits of working with the native framework and introduce their own complexities.


Concurrency management

When a vendor ships a component, it cannot assume that the component will not be accessed by multiple threads concurrently by its clients. In fact, the only safe assumption the vendor can make is that the component will be accessed by multiple threads. As a result, the components must be thread-safe and must be equipped with a synchronization lock. But if an application developer builds an application by aggregating multiple components from multiple vendors, the introduction of multiple locks renders the application deadlock-prone (see the sketch following this list). Avoiding the deadlock couples the application and the components.


Transactions

If the application wishes to have the components participate in a single transaction, it is up to the application hosting them to coordinate the transaction and flow it from one component to the next, which is a serious programming feat. It also introduces coupling between the application and the components regarding the nature of the transaction coordination.


Communication protocols

If components are deployed across process or machine boundaries, they are coupled to the details of the remote calls, the transport protocol used, and its implication on the programming model (such as reliability and security).


Communication patterns

The components could be invoked synchronously or asynchronously, and they could be connected or disconnected. A given component may support only some of these invocation modes, and the application must be aware of its exact preferences.


Versioning

Applications may be written against one version of a component and yet encounter another in production. Dealing robustly with versioning issues couples the application to the components it uses.


Security

Components may need to authenticate and authorize their callers, yet how would a component know which security authority to use, or which user is a member of which role? Not only that, but a component may want to ensure that the communication with its clients is secure. That, of course, imposes certain restrictions on the clients and in turn couples them to the security needs of the component.
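To make the concurrency-management problem above concrete, here is a minimal, hypothetical C# sketch of two independently developed, individually thread-safe components; the cross-component call stands in for whatever event or callback wiring the application sets up between them.

using System.Threading;

// A hypothetical sketch of a lock-ordering deadlock between two vendors' components.
// Each component is thread-safe on its own; the deadlock emerges only once the
// application aggregates them and calls them from different threads.
public class VendorAComponent
{
    private readonly object gate = new object();
    public VendorBComponent Partner { get; set; }

    public void DoWork()
    {
        lock (gate)               // acquires A's lock first...
        {
            Thread.Sleep(10);     // widen the race window for the example
            Partner.Helper();     // ...then needs B's lock
        }
    }

    public void Helper()
    {
        lock (gate) { /* ... */ }
    }
}

public class VendorBComponent
{
    private readonly object gate = new object();
    public VendorAComponent Partner { get; set; }

    public void DoWork()
    {
        lock (gate)               // acquires B's lock first...
        {
            Thread.Sleep(10);
            Partner.Helper();     // ...then needs A's lock: both threads wait forever
        }
    }

    public void Helper()
    {
        lock (gate) { /* ... */ }
    }
}

If the application calls DoWork() on the two components from two different threads, each thread ends up holding its own component's lock while waiting for the other's, and neither can proceed. Avoiding this requires the application to understand and order locks it does not own, which is exactly the coupling described above.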

Both COM and .NET tried to address some (but not all) of these challenges using technologies such as COM+ and Enterprise Services, respectively (similarly, Java had J2EE), but in reality such applications were inundated with plumbing. In a decent-sized application, the bulk of the effort, the development, and the debugging time is spent on addressing such plumbing issues, as opposed to focusing on business logic and features. To make things even worse, since the end customer (or the development manager) rarely cares about plumbing (as opposed to features), developers are seldom given adequate time to develop robust plumbing. Instead, most handcrafted plumbing solutions are proprietary (which hinders reuse, migration, and hiring) and are of low quality, because most developers are not security or synchronization experts and because they were not given the time and resources to develop the plumbing properly.



