Data Flow Diagrams

Requirements Analysis: From Business Views to Architecture
By David C. Hay
Table of Contents
Chapter 4.  Column Two: Activities


The data flow diagram is a specialized kind of dependency diagram. Data flow diagrams are the workhorses of requirements analysis and have been around for longer than almost any other technique. They are specifically concerned with data dependency and show this in terms of the flow of information from one activity to another. Their usefulness is limited by the facts that the nature of the flows in an enterprise changes frequently (so diagrams tend to go out of date quickly) and that many modern enterprises aren't as likely to use data in sequence, instead processing them asynchronously from some central data store. Still, it is worthwhile to understand data flow diagrams and to be able to recognize when they can be useful.

Data flow diagrams were described originally by Tom DeMarco in 1978 and a year later by Trish Sarson and Chris Gane. Their different notations do essentially the same thing. Mr. DeMarco's notation seems particularly suitable when you do not have a CASE tool and must produce the diagrams by hand. Ms. Sarson's and Mr. Gane's are a little more rigorous in identifying each symbol.

Because the activities of a data flow diagram are inherently sequential, they describe processes rather than functions. Still, it may be appropriate at some level to break down the function hierarchy sequentially, so that the resulting activities can be linked together in a data flow diagram.

The context of a data flow diagram is defined by its external entities. An external entity is a party that supplies or consumes information. Note that within the diagram, information is neither created nor destroyed. All the information referred to there originates from one or more external entities and eventually must wind up in one or more external entities. As shown in Figure 4.7, Mr. DeMarco uses a simple box for this, while Ms. Sarson and Mr. Gane use a shadowed box. Ms. Sarson and Mr. Gane also add a one- or two-letter designator to identify the entity.

Figure 4.7. External Entities.


The central symbol in a data flow diagram is a process. In Ms. Sarson's and Mr. Gane's notation, this is represented by a round-cornered rectangle, as shown on the left side of Figure 4.8. It is labeled by a piece of text and a number. Optionally, it may also show the party that performs it. In Mr. DeMarco's notation it is represented by a simple circle with a textual label and a number. Typically a DeMarco diagram does not show who is doing the process. An example of this is shown on the right side of Figure 4.8.

Figure 4.8. Processes.


The data flow in a data flow diagram is shown by a solid arrow. In Ms. Sarson's and Mr. Gane's version the line tends to be orthogonal (left and right or up and down), with elbow bends. In Mr. DeMarco's notation it is typically curved. Figure 4.9 shows this, with Ms. Sarson's and Mr. Gane's version on the left and Mr. DeMarco's on the right.

Figure 4.9. Flows.


There are actually two kinds of data flows. Information is simply data passed from one process to another or between a process and a data store; a message is information that something has happened, calling for a response from the receiving process. It is an event. Messages are not stored in a data store.

This distinction is not represented in the diagram, but it is important to grasp if you want to understand the nature of the flows. Other modeling techniques make this explicit; the IDEF0 technique, described below, does so.
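The information/message distinction can also be made explicit in a supporting model. A minimal sketch follows; the class and field names here are hypothetical, not part of either DFD notation:

```python
from dataclasses import dataclass


@dataclass
class Flow:
    """A data flow on a DFD, tagged with the distinction the diagram omits."""
    name: str
    kind: str  # "information" or "message" (hypothetical labels)

    def storable(self) -> bool:
        # Information may pass through a data store; a message is an event
        # calling for a response and is never held in a data store.
        return self.kind == "information"


order_details = Flow("order details", "information")
shipment_arrived = Flow("shipment arrived", "message")
```

Here `order_details.storable()` is true, while `shipment_arrived.storable()` is false, capturing the rule that messages are not stored.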

Note that a data flow diagram is concerned only with data flows, not material flows. Sometimes this can be inconvenient, since the flow of material may be what the department is about. Still, the presence of material may itself be information. A good example is receiving shipped material, but it is the information that the shipment arrived, not the material itself, that is of interest. "Existence of material received" (or simply, "receipt of material") would be the data flow. Ms. Sarson and Mr. Gane do recognize that material flow should sometimes be kept track of, and they introduce a special kind of arrow for this purpose, but it is rarely used.

It is often the case that data will reside somewhere temporarily before they are used. An example is a purchase order awaiting receipt of purchased material. This is represented on a Sarson and Gane data flow diagram as an open-sided rectangle (called a data store), or, on a DeMarco diagram, as lines above and below (or sometimes just above) text (called a file). These are shown in Figure 4.10, again with the Sarson and Gane version on the left and the DeMarco version on the right.

Figure 4.10. Data Stores.


Figure 4.11 shows a complete diagram in the Sarson and Gane notation. From here on, that notation will be used in the example, since it has a slightly more rigorous system of symbol identification.

Figure 4.11. A Complete Sarson and Gane Data Flow Diagram.


Note, by the way, that the data shown in either flows or data stores represent an observer's view of the data. These are, in effect, views of the entities and attributes represented in the entity/relationship diagram. A view can be defined just as an SQL view would be defined, in terms of the entities and attributes it contains. It will be described by a common name, such as "payment", "order details", or some such term. The data-flow and entity models can then be linked together via the strict definition of each of these views.

For example, Figure 4.12 shows a data model and the views of it that could be represented as data stores or flows. In the model, each PURCHASE must be by one and only one PARTY and from one and only one PARTY. Each PURCHASE, in turn, may be composed of one or more LINE ITEMS, each of which must be for one and only one BOOK.

Figure 4.12. A Data Model and Data Stores.


The derived attribute "(value)" in LINE ITEM is computed by multiplying "price" in BOOK by "quantity ordered" in LINE ITEM. This in turn is summed across all LINE ITEMS that are part of a PURCHASE to calculate "(total value)" as an attribute of PURCHASE. This in turn is summarized across all PURCHASES that are from a (vendor) PARTY to calculate "(total purchases)" from that PARTY.

Similarly, "(total value)" for a BOOK is calculated as the total of "value" in LINE ITEM across all LINE ITEMS that are for the BOOK.
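This derivation chain can be sketched directly. A minimal example, using hypothetical dictionary stand-ins for the BOOK and LINE ITEM entities (the names follow the attributes in Figure 4.12, but the data values are invented for illustration):

```python
# Hypothetical instances of the BOOK and LINE ITEM entities.
books = {"B1": {"price": 20.0}, "B2": {"price": 10.0}}
line_items = [
    {"purchase": "P1", "book": "B1", "quantity_ordered": 2},
    {"purchase": "P1", "book": "B2", "quantity_ordered": 1},
]


def value(li):
    # "(value)" in LINE ITEM = "price" in BOOK * "quantity ordered".
    return books[li["book"]]["price"] * li["quantity_ordered"]


def total_value(purchase_id):
    # "(total value)" of a PURCHASE = sum of "(value)" over its LINE ITEMs.
    return sum(value(li) for li in line_items if li["purchase"] == purchase_id)


def book_total(book_id):
    # "(total value)" for a BOOK = sum of "(value)" over LINE ITEMs for it.
    return sum(value(li) for li in line_items if li["book"] == book_id)
```

With these sample figures, `value` of the first line item is 40.0 (2 × 20.0), and `total_value("P1")` is 50.0.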

One data-store view, then, could be "Pending Purchase Orders", from the diagrams described here. This consists of all four entities. Note that for the data flow diagram the only orders of interest were those that had a "Status" of "Pending". In fact, the view could as easily be simply "Purchase Orders". Another might be "Vendor List", showing for each vendor (PARTY that is the seller in a PURCHASE) the total purchases for the year. This is created from the PARTY entity, but with the attribute "Total purchases" that is computed as the total of "(Total value)" from all the PURCHASES the PARTY is a vendor in. Another view is "Book List", showing for each book its total purchases. This is easily captured from the BOOK and LINE ITEM entities.
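Such views can be defined concretely in SQL, as the text suggests. A sketch using SQLite follows; the table and column names are hypothetical renderings of the Figure 4.12 entities (the book defines these views only conceptually):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE party     (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE book      (id INTEGER PRIMARY KEY, title TEXT, price REAL);
CREATE TABLE purchase  (id INTEGER PRIMARY KEY, status TEXT,
                        buyer_id  INTEGER REFERENCES party(id),
                        vendor_id INTEGER REFERENCES party(id));
CREATE TABLE line_item (purchase_id INTEGER REFERENCES purchase(id),
                        book_id     INTEGER REFERENCES book(id),
                        quantity_ordered INTEGER);

-- "Pending Purchase Orders": all four entities, restricted to 'Pending'.
CREATE VIEW pending_purchase_orders AS
SELECT pu.id AS purchase_id, v.name AS vendor, b.title,
       li.quantity_ordered, li.quantity_ordered * b.price AS value
FROM purchase pu
JOIN party v      ON v.id = pu.vendor_id
JOIN line_item li ON li.purchase_id = pu.id
JOIN book b       ON b.id = li.book_id
WHERE pu.status = 'Pending';

-- "Vendor List": each vendor with its computed "Total purchases".
CREATE VIEW vendor_list AS
SELECT v.name AS vendor, SUM(li.quantity_ordered * b.price) AS total_purchases
FROM purchase pu
JOIN party v      ON v.id = pu.vendor_id
JOIN line_item li ON li.purchase_id = pu.id
JOIN book b       ON b.id = li.book_id
GROUP BY v.id;

-- Invented sample data, purely for illustration.
INSERT INTO party VALUES (1, 'Dinosaur Public Library'), (2, 'Baker and Taylor');
INSERT INTO book VALUES (1, 'Moby-Dick', 20.0), (2, 'Walden', 10.0);
INSERT INTO purchase VALUES (1, 'Pending', 1, 2), (2, 'Received', 1, 2);
INSERT INTO line_item VALUES (1, 1, 2), (1, 2, 1), (2, 2, 3);
""")

pending = db.execute(
    "SELECT title, value FROM pending_purchase_orders ORDER BY title").fetchall()
totals = db.execute(
    "SELECT vendor, total_purchases FROM vendor_list").fetchall()
```

The point of the exercise is the linkage the text describes: each data store or flow on the DFD is pinned down as a strict view over the entity model, so the two models cannot drift apart.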

"Exploding" Processes

Note that neither the function hierarchy nor the data flow diagram shows the internal logic of a process. This must be done as separate documentation or models, as described below. (See page 187.)

As with the function hierarchy, processes in a data flow diagram can be exploded into component processes. That is, one diagram may describe the fact that a process shown on a higher-level diagram is composed of six, seven, or eight smaller processes, and then another may in turn take the first of these processes and show its component six, seven or eight processes. Figure 4.13 illustrates this.

Figure 4.13. Explosion of Processes.


Certain rules must be followed when exploding a process:

  • All of the flow lines going into or out of a process at the higher level must be represented at the lower level.

  • Additional flows into or out of the process may be added at the lower level, but these must be marked with an "X" at the point where each new flow crosses the process boundary.

  • Data stores may be added within the lower-level process. In Ms. Sarson's and Mr. Gane's numbering scheme, a sequence number is added to the process number. For example, in Process 3, the first data store is numbered 3/1.

  • Similarly, in both techniques, lower-level processes are given a decimal number based on the parent process number (3.1, 3.2, etc.).

    (Sometimes it is more convenient to simply give each process a mnemonic label, like "acbk" for "Acquire books". This is particularly true while the data flow diagram is being developed and many structural changes are being made. Once the diagram has stabilized, the processes may be relabeled with hierarchical numbers.)

  • A process on the higher level that communicates with the process being exploded may be shown as a process outside the exploded process boundary on the lower level, or it may be converted to an external entity, which is the agent that performs the process.
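The first two rules above (every higher-level flow reappears below; new lower-level flows must be marked with an "X") lend themselves to a mechanical balancing check. A minimal sketch, with hypothetical flow names; a child flow mapped to True stands for one flagged with an "X":

```python
def check_balanced(parent_flows, child_flows):
    """Check a process explosion against the balancing rules.

    parent_flows: set of flow names into/out of the higher-level process.
    child_flows:  dict mapping each flow name on the lower-level diagram
                  to True if it is marked with an "X" as a new flow.
    Returns (missing, unmarked_new): parent flows absent from the child
    diagram, and new child flows that lack the required "X" mark.
    """
    missing = parent_flows - set(child_flows)
    unmarked_new = {f for f, marked in child_flows.items()
                    if f not in parent_flows and not marked}
    return missing, unmarked_new


# A balanced explosion: both parent flows reappear; the one new flow is marked.
ok = check_balanced({"order", "invoice"},
                    {"order": False, "invoice": False, "status query": True})

# An unbalanced one: "order" was dropped and "receipt" was added unmarked.
bad = check_balanced({"order"}, {"receipt": False})
```

A tool (or a diligent reviewer) would require both returned sets to be empty before accepting the lower-level diagram.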

Note that manipulating diagrams and their exploded children is difficult, especially while adjustments are being made to the models. For this reason, you are well-advised to do a function hierarchy first, to get the levels sorted out, before trying to connect the functions/processes to the data flows.

Context Diagram

Creation of a data flow diagram usually starts by creating a context diagram. This is a diagram containing only the highest-level process and the external entities with which it communicates. The process description should not be as flowery as a mission statement. Try to make it succinct. For example, the Dinosaur Public Library's mission statement (p. 149), "Provide materials and services for the information, learning, and popular reading needs for all citizens and organizations of Dinosaur County", has been reduced in Figure 4.14 to "Provide materials and services for all in Dinosaur County". This, then, is surrounded by the external entities "Patron", "Citizen", "Book Supplier", and the "Dinosaur County" government.

Figure 4.14. Context Diagram.


The context diagram is useful in determining the environment of the analysis and the players that will be important. It is also the basis for determining the events that will affect the enterprise and the overall body of data that will be needed.

Physical Data Flow Diagrams

There are two kinds of data flow diagrams. A "physical data flow diagram" describes processes in terms of the specific physical mechanisms used to carry them out. This "as is" diagram corresponds to the Row Two perspective ("business owners' view") of Column Two (activities) in the Architecture Framework. This shows the operation of the business in terms known to the people who do it. It is a very good way to capture what goes on and is fairly easy to do.

In an interview, you ask each person, "What reports and forms do you get?", "What do you do with them?", "What reports and forms do you produce?", and "Where do you send these?" A physical process then might be "Fill out purchase order", and data flows from this process might include "yellow copy of purchase order" to "vendor", "blue copy of purchase order" to "accounting", and so forth.

In the physical model, it is appropriate (indeed necessary) to identify who performs each activity. This can be the name either of an individual or of a department (or computer system). Indeed, sometimes the process itself may be no more specifically identified than "Sally's job". Here you have a link between Column Two's activities and Column Four's people and organizations.

This modeling exercise can have amusing results. You complete an interview and realize that you have not heard mention of a report that another worker told you was sent to this person. When you ask about that report, you will get one of the following responses:

  • "Oh, that's right. I forgot . When I get that I..."

  • "Oh, yes, I get that. But it goes straight to the dustbin. I don't do anything with it."

  • "What report?"

The assignment clearly is to make as complete and thorough a record as possible of all the physical data flows in the organization.

Logical ("Essential") Data Flow Diagrams

Tom DeMarco described the distinction between physical and logical data flow diagrams. All he said, however, was that you should "logicalize" the model by replacing physical items with their logical equivalents. "The underlying objectives of the current operation are divorced from the methods of carrying out those objectives" [DeMarco, 1978, p. 27].

Ms. Sarson and Mr. Gane believe that you should start with a logical model in the first place.

Neither of them was as clear as one could wish about exactly what constitutes a logical model and how to tell whether you have one, let alone how to get from the physical model to the logical one. [2] Yes, physical mechanisms are easy enough to recognize. But how do you know whether you have really captured the correct atomic-level functions at the bottom row?

[2] One is reminded of the cartoon of some years ago, where a blackboard is shown covered by two groups of exceedingly complex equations. In the space between the two groups is the legend "Miracle occurs here".

Enter Stephen M. McMenamin and John Palmer. As mentioned earlier, in their 1984 book Essential Systems Analysis they describe a specific procedure for arriving at a set of "essential activities": the most basic, fully functional description of what an enterprise does.

As with Mr. DeMarco, Messrs. McMenamin and Palmer believe that the place to start is to remove all references to mechanisms from processes, data flows, and data stores. An essential data flow diagram must first be technologically neutral: true no matter what technology might be employed to carry out the work. This assumes instantaneous response time, no matter what is being done, and infinite storage space. Processing time and storage space are the kinds of constraints that cause current physical systems to have their sometimes peculiar characteristics, so we will assume those constraints away for the purpose of identifying the essential processes.

As defined earlier in the chapter, an essential activity is either a fundamental activity that performs a task that is part of the system's stated purpose, or a custodial activity that establishes and maintains the system's essential memory by acquiring and storing the information needed by the fundamental activities.

Where previous authors simply advocated going from a physical data flow diagram to a "logical" one, Messrs. McMenamin and Palmer described very specific steps for identifying fundamental activities:

  1. Explode the data flow diagram down to the lowest (atomic) level of detail.

    Richard Barker and Cliff Longman call this level that of an elementary business function (one that "when triggered must either be completed successfully, or, if for some reason it cannot be completed successfully, must 'undo' any effects that it had up to the point of failure" [Barker and Longman, 1992, p. 40]).

    This corresponds to an action in UML. Messrs. Rumbaugh, Jacobson, and Booch define an action as "an executable atomic computation that results in a change in the state of the model or the return of a value" [Rumbaugh et al. 1999, p. 122].

    When a physical data flow diagram is broken down to its smallest component activities (however this is defined), Messrs. McMenamin and Palmer call these activity fragments, some of which are physical mechanisms, some of which are essential.

    Figure 4.15 shows an example where an information flow begins with the arrival of a shipment and ends with flows to data stores ("pending purchase order" and "card catalogue") and flows to external entities ("Baker and Taylor" and the "Dinosaur County" government). Each of the boxes shown represents an elementary business function or essential activity fragment.

    Figure 4.15. Activity Fragments.


  2. Identify and remove nonessential fragments (mechanisms). These include:

    • References to physical media: Replace "Gold copy of the purchase order" with "purchase order information". Similarly, replace a "Daily fax" with separate arrows for each flow that is being faxed. Also, remove references to the tools used to perform the activities. For example, replace "Generate CRISP Report" with "Report sales data".

    • Internal validity checks: Remove processes that unnecessarily check internal data. In an essential system, it is assumed that data passed from one component to another are correct. There is no need to add a process to check that. The energy spent coming up with that process could be spent making sure that the first component works correctly. (This does not remove the need for processes that check the validity of data coming from the outside.)

    • Artificial sequence: Make processes operate in parallel as much as possible. Processes may currently be executed in a sequence that is not required by the function.

    • Redundant activities and data references: Eliminate multiple processes that update the same data; eliminate multiple data stores that are the same data in different states.

    • Transportation: Eliminate all processes that say "copy", "send", and so forth.

    • Buffers: Eliminate data stores with no essential function. There are data stores whose sole function is to wait for the next process to catch up. That is, process A prepares data for process B, and if process B isn't working fast enough and the data must wait, they are placed in a "buffer" data store. This data store has no essential function and should be eliminated.

    Figure 4.16 shows the following being renamed or eliminated:

    • External entity "Baker and Taylor" becomes both "Book Supplier" and "Classification Service", reflecting the two roles the company now plays. This company has supplied books and cataloguing services in the past, but there is no reason to assume it will continue to do either or both of these things.

    • The sequence implied by the connection of 3.4.1 and 3.4.2 is not required. The "Titles, quantity" data flow is removed. A new flow now goes directly from "Book Supplier" to activity 3.4.2. Checking the shipped quantities may be done before or after recording receipt of the shipment. Note that this will require a new communication back to process 3.4.1 ("Return of book") if the book is rejected.

    • The activities "Prepare book cards" (activity 3.5.2), and "Forward invoice to Dinosaur County Accounting" (activity 3.4.3) are removed. These activities describe communication only.

    • The data flows "shipping documentation" and "classification publications " are renamed. These are the physical manifestations of information. The information involved is "receipt of books" and "classification guidelines".

    • The data store, "Card Catalogue" is renamed "Book Archive" (or it could also be "Book Index"). This is a physical repository for storing information about books. Along with that is the renaming of "book cards" to "Title and classification".

      This is a very good example, by the way, of how the technology has traditionally affected our view of processes. Those born before 1980 will remember that the card catalogue has always been a fundamental part of a library. It was a dramatic change to see this replaced by a computer terminal. But in fact the card catalogue and the database are but two different technologies for implementing the same data store.

    Figure 4.16. Remove Physical References.


  3. Identify all the real-world events that impinge on the operation. These are things happening in the outside world that cause the enterprise to react. They can be identified from the context data flow diagram. They might include such things as a customer inquiry, a customer order, the receipt of purchased goods, a person applying for a job, and so forth. (We are not concerned here with internal events, such as the completion of a process.)

  4. Identify the activity fragments that are carried out in response to each external event.

  5. Define a parent activity that encompasses these activity fragments that are the response to that event. This is a "fundamental essential activity".

  6. Examine the data stores used by the essential activities. Are there activities that update those stores? If not, define additional "custodial" activities. For example, "Manage patron changes" might be a custodial activity.

  7. Eliminate any activities that are not fundamental or custodial.

    After Step 2, above, the activity fragments left are:

    • Record arrival of shipment.

    • Check shipment against purchase order.

    • Classify book.

The only relevant event is "Receipt of a book", a message from the Book Supplier to "Record arrival of shipment". As it happens, all three of our fragments are performed in response to that event, so we define a parent activity, "Receive each book". This, then, is the "essential activity" shown in Figure 4.17.
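Steps 3 through 5 amount to grouping activity fragments by the external event that triggers them. A minimal sketch of that grouping, using the fragment and event names from the library example:

```python
from collections import defaultdict

# Each surviving activity fragment, paired with the external event that
# triggers it (names taken from the library example in the text).
fragments = [
    ("Record arrival of shipment", "Receipt of a book"),
    ("Check shipment against purchase order", "Receipt of a book"),
    ("Classify book", "Receipt of a book"),
]

# Event partitioning: each group of fragments responding to one event
# becomes a single fundamental essential activity.
essential_activities = defaultdict(list)
for fragment, event in fragments:
    essential_activities[event].append(fragment)
```

Because all three fragments respond to the single event "Receipt of a book", the grouping yields one essential activity, corresponding to "Receive each book" in Figure 4.17.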

Figure 4.17. An Essential Activity.


Notice, by the way, that "Catalogue book" has wound up as a component of "Receive each book". This was not what we originally expected when we laid out the function hierarchy and the first draft of the data flow diagram in Figure 4.13. The process of determining essential activities has revealed something to us.

Now this would not be true if it were not the case that, as is currently understood, a book is catalogued as soon as it is received. If a book were not catalogued when received, then that activity would be outside the essential activity "Receive each book", just as we originally portrayed it.

It may be the case that a particular fragment is used by more than one essential activity. In this case, it is appropriate to break out the fragment as its own essential activity.

Note that an essential activity is the lowest level of detail at which a process (or function) can be described. Indeed it may be that these lower-level activities are the atomic ones, in terms of Barker's definition of an elementary business function or the UML's definition of an action. Those more atomic process fragments, however, that make up an essential activity are by definition specific to the company and, at least to some extent, are bound to a technology. For example, the company could employ new bar-coding technology that would, at the same time the shipment is moved into inventory, perform the necessary checks, recording, and notification. Below the essential level, the activities are chosen by the enterprise as a mechanism for carrying out the essential business function. As a consequence, they could be changed in a new system, so they are not of concern in an analysis diagram.

Above this level, you should have described functions that the company must perform, regardless of the technology used.

Once the essential data flow diagram has been developed, it is possible to examine it to see what parts should be automated and how that automation should work.

ISBN: 0132762005
Year: 2001