Abstraction, Classification, and Generalization | MDA Distilled. Principles of Model-Driven Architecture

No matter the discipline, a modeler ignores some information that is not of interest and groups important information together based on common properties, even though the entities under study are, of course, different from one another. This is where abstraction and classification come into play.

"Abstracts" is the ideologically sound term for just ignoring something (it sounds like a whole lot more work), while "classifies" captures the notion of grouping based on common properties. Both take place in many contexts, but one crucial area is the formalization of knowledge of some universe of discourse in a model.

Suppose, for example, we're faced with a pet store whose owner needs some software to help in the business. Figure 3-1 shows the relationship between the animals and the concepts of abstraction and classification.

Figure 3-1. Abstraction, classification, and generalization

Broadly speaking, abstraction involves moving from the left column to the right one, while classification involves moving from the upper row to the lower one. Allow us to explain.

The upper left-hand corner shows a universe of discourse that contains various real, abstract, or hypothetical things, in this case, a cat Munchkin, a dog Fido, and an unnamed slug (it's not a pet, so why would it have a name?). We classify these creatures according to their common properties: All dogs slobber to a certain degree, and all cats are more or less standoffish or cuddly.

These groupings, Dogs, Cats, and Distracting Animals, are shown in the lower left-hand corner.

Note that classifying things does not change the amount of detail; rather, it is more a grouping or set-building operation that finds commonalities and arranges common instances into the same class.
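As a rough illustration of classification as a set-building operation, the sketch below groups a few creature records without touching their detail. The record fields (`species`, `fur_color`, `weight_kg`) and the `classify` helper are assumptions made for this example; only the group names come from the text.

```python
from collections import defaultdict

# Hypothetical raw observations from the universe of discourse.
# All field names and values are assumptions made for illustration.
creatures = [
    {"name": "Munchkin", "species": "cat", "fur_color": "grey", "weight_kg": 4.0},
    {"name": "Fido", "species": "dog", "fur_color": "brown", "weight_kg": 20.0},
    {"name": None, "species": "slug", "fur_color": None, "weight_kg": 0.01},
]

# Groups named in the text; anything else counts as a distracting animal.
GROUP_BY_SPECIES = {"cat": "Cats", "dog": "Dogs"}


def classify(things):
    """Arrange things into groups by a common property, leaving each thing unchanged."""
    groups = defaultdict(list)
    for thing in things:
        group = GROUP_BY_SPECIES.get(thing["species"], "Distracting Animals")
        groups[group].append(thing)
    return dict(groups)


classes = classify(creatures)
```

Every record lands in exactly one group and keeps all of its original fields, which is the sense in which classification does not change the amount of detail.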

The second important dimension is that of abstraction. Some properties of a pet are not relevant, such as the precise color of its fur or its sleeping habits. Some of the creatures we find aren't themselves relevant: they exist, but their primary property of interest is that we don't care much. We abstract away from each entity's original properties, leaving only those of interest. Note that abstraction doesn't mean that the real-world pet's peculiarities that its owner loves have gone away; rather, abstraction collars only the properties of interest.
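One way to picture abstraction is as a projection that copies only the properties of interest and leaves the original untouched. This is a minimal sketch, assuming the properties of interest are name and weight; the `abstract` helper, the field names, and the sample values are hypothetical.

```python
# Assumed properties of interest for the pet store (an assumption for this sketch).
PROPERTIES_OF_INTEREST = ("name", "weight_kg")


def abstract(thing, keep=PROPERTIES_OF_INTEREST):
    """Return a reduced copy of `thing` holding only the properties listed in `keep`."""
    return {key: thing[key] for key in keep}


# Hypothetical description of the real-world Fido, peculiarities included.
real_fido = {
    "name": "Fido",
    "weight_kg": 20.0,
    "fur_color": "brown",
    "sleeping_habits": "snores",
}

model_fido = abstract(real_fido)
```

The real-world record still carries its fur color and sleeping habits; only the modeled copy has dropped them.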

Common features of both dogs and cats (in this case, name and weight) have also been generalized into a separate class, Pet. This grouping of common features of all pets is called generalization. It is "double-strength" classification, in that a single beast, say Fido, has been classified both as a Dog and as a Pet.
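Generalization maps naturally onto inheritance. The sketch below is one possible rendering, not the book's own code: `Pet` carries the shared name and weight, while the `slobber_level` and `cuddly` attributes are hypothetical stand-ins for the dog and cat properties mentioned earlier, and the kilogram unit is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Pet:
    """Features common to all pets, generalized out of Dog and Cat."""
    name: str
    weight_kg: float  # unit is an assumption for this sketch


@dataclass
class Dog(Pet):
    slobber_level: int = 0  # hypothetical: all dogs slobber to a certain degree


@dataclass
class Cat(Pet):
    cuddly: bool = False  # hypothetical: standoffish (False) or cuddly (True)


fido = Dog(name="Fido", weight_kg=20.0, slobber_level=3)
munchkin = Cat(name="Munchkin", weight_kg=4.0, cuddly=True)

# "Double-strength" classification: Fido is both a Dog and a Pet.
fido_is_dog_and_pet = isinstance(fido, Dog) and isinstance(fido, Pet)
```

Because `Dog` and `Cat` inherit from `Pet`, code written against `Pet` (for example, anything that only reads `name` and `weight_kg`) works for both kinds of animal.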

A modeler typically combines the three steps of classifying, abstracting, and generalizing, and thus accomplishes them simultaneously. That is, he or she looks at the problem domain, namely the real Fido, Fifi, Munchkin, and Squishy (even a slug has a name to his mates), and abstracts away unwanted information at the same time that he or she classifies the problem-domain things into types. We just know that both Rex and Fido are dogs, and that cats and dogs are pets. In our heads, we classify all of these pets that the store has (or will ever have) in stock, each of which has its own name and weight and so forth, into a few entities: the Pet class, with subclasses for Dog, Cat, and so forth. At the same time, we start to think about the properties of these classes, which we previously abstracted away during generalization, and attribute these properties to the right groups. So, we've abstracted, classified, and generalized the set of animals in what seemed to be a single step.

The right-hand column of Figure 3-1 represents the modeling realm. The class diagram at the bottom captures the types and their properties of interest, as well as the generalization relationships that we identified during the classification step. The upper right-hand quadrant represents the objects in our system, each of which is an instance of a particular class in the modeling realm.

The instances we've created are abstractions of the original problem-domain things, but only with the properties of interest. Hence, the object referred to by pointer F1D0 is the abstracted computer version of the real-world Fido. (Cloning aside, we can't magically instantiate real-world cats and dogs.)