6.1. THE BASICS OF INFORMATION GRAPHICS

"Information graphics" simply means data presented visually, with the goal of imparting knowledge to the user. I'm including tables and tree views in that description because they are inherently visual, even though they're constructed primarily from text instead of lines and polygons. Other familiar static information graphics include maps, flowcharts, bar plots, and diagrams of real-world objects.

But we're dealing with computers, not paper. You can make almost any good static design better with interactivity. Interactive tools let the user hide and show information as she needs it, and they put the user in the "driver's seat" as she chooses how to view and explore it.

Even the mere act of manipulating and rearranging the data in an interactive graphic has value: the user becomes a participant in the discovery process, not just a passive observer. This can be invaluable. The user may not produce the world's best-designed plot or table, but the process of manipulating that plot or table puts her face-to-face with aspects of data that she may never have noticed on paper.

Ultimately, the user's goal in using information graphics is to learn something. But the designer needs to understand what the user needs to learn. The user might look for something very specific, like a particular street on a map, in which case she needs to be able to find it, say by searching directly or by filtering out extraneous information. She needs to get a "big picture" only to the extent necessary to reach that specific data point. The abilities to search, filter, and zero in on details are critical.

On the other hand, she might try to learn something less concrete. She might look at a map to grasp the layout of a city, rather than to find a specific address. Or she may be a scientist visualizing a biochemical process, trying to understand how it works. Now overviews are important; she needs to see how the parts interconnect into the whole. She may want to zoom in, zoom back out again, look at the details occasionally, and compare one view of the data to another.

Good interactive information graphics offer users answers to these questions:

How is this data organized?
What's related to what?
How can I explore this data?
Can I rearrange this data to see it differently?
Show me only what I need to know.
What are the specific data values?

In these sections, remember that the term "information graphics" is a very big umbrella. It covers plots, graphs, maps, tables, trees, timelines, and diagrams of all sorts. The data can be huge and multilayered, or small and focused. Many of these techniques apply surprisingly well to graphic types that you wouldn't expect.

Before describing the patterns themselves, let's set the stage by discussing some of the questions posed above.

6.1.1. ORGANIZATIONAL MODELS: HOW IS THIS DATA ORGANIZED?

The first thing a user sees in any information visualization is the shape you've chosen for the data. Ideally, the data itself has an inherent structure that suggests this shape to you. Which of these models fits your data best?

Table 6-1.
Model	Diagram	Common graphics
Linear		List or single-variable plot
Tabular		Spreadsheet, multi-column list, Sortable Table, Multi-Y Plot, or other multi-variable plots
Hierarchical		Tree, Cascading Lists, Tree Table, Treemap, or directed graph
Network (or organic)		Directed graph or flowchart
Geographic (or spatial)		Map or schematic
Other		Plots of various sorts, such as parallel coordinate plots, or Treemaps

Try these models against the data you want to show. If two or more might fit, consider which ones play up which aspects of your data. If your data could be both geographic and tabular, for instance, showing it as only a table may obscure its geographic nature; a viewer may miss interesting features or relationships in the data if you do not show it as a map too.

6.1.2. PREATTENTIVE VARIABLES: WHAT'S RELATED TO WHAT?

The organizational model you choose tells the user a lot about the shape of the data. Part of this message operates at a subconscious level; people recognize trees, tables, and maps, and they immediately jump to conclusions about the underlying data before they even start to think consciously about it. But it's not just the shape that does this. The look of the individual data elements also works at a subconscious level in the user's mind: things that look alike must be associated with one another.

If you've read Chapter 4, that should sound familiar; you already know about the Gestalt principles. (If you jumped ahead in the book, this might be a good time to go back and read the introduction to Chapter 4.) Most of those principles, especially similarity and continuity, will come into play here, too. I'll tell you a little more about how they seem to work.

Certain visual features operate "preattentively:" they convey information before the viewer pays conscious attention. Look at Figure 6-1 and find the blue objects.

Figure 6-1. Find the blue objects

I'm guessing that you can do that pretty quickly. Now look at Figure 6-2 and do the same.

Figure 6-2. Again

You did that pretty quickly too, right? In fact, it doesn't matter how many red objects there are; the amount of time it takes you to find the blue ones is constant! You might think it should be linear with the total number of objects (order-N time, in algorithmic terms), but it's not. Color operates at a primitive cognitive level. Your visual system does the hard work for you, and it seems to work in a "massively parallel" fashion.

On the other hand, visually monotonous text forces you to read the values and think about them. Figure 6-3 shows exactly the same problem with numbers instead of colors. How fast can you find the numbers that are greater than 1?

Figure 6-3. Find the values greater than one

When dealing with text like this, your "search time" really is linear with the number of items. What if we still used text, but made the target numbers physically larger than the others, as in Figure 6-4?

Figure 6-4. Again

Now we're back to constant time again. Size is, in fact, another preattentive variable. The fact that the larger numbers protrude into their right margins also helps you find them; alignment is yet another preattentive variable.

Figure 6-5 shows many known preattentive variables.

Figure 6-5. Eight preattentive variables

This concept has profound implications for text-based information graphics, like the table of numbers in Figure 6-3. If you want some data points to stand out from the others, you have to make them look different by varying their color, size, or some other preattentive variable. More generally, you can use these variables to differentiate classes or dimensions of data on any kind of information graphic. This is sometimes called "encoding."

When you have to plot a multidimensional data set, you can use several different visual variables to encode all those dimensions in a single static display. Consider the scatter plot shown in Figure 6-6. Position is used along the X and Y axes; color hue encodes a third variable. The shape of the scatter markers could encode yet a fourth variable, but in this case, shape is redundant with color hue. The redundant encoding helps a user visually separate the three data groups.

Figure 6-6. Encoding three variables in a scatter plot
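
As a rough illustration of this kind of encoding, here is a minimal Python sketch that maps each data record to a marker description. The record layout, the group labels, and the color and shape palettes are all hypothetical, chosen only for this example; they are not taken from the figure. Position carries two variables, hue carries the group, and shape repeats the group redundantly.

```python
from dataclasses import dataclass

# Hypothetical palettes: each group gets both a hue and a shape,
# so the two encodings are redundant.
GROUP_COLORS = {"A": "red", "B": "green", "C": "blue"}
GROUP_SHAPES = {"A": "circle", "B": "square", "C": "triangle"}


@dataclass(frozen=True)
class Marker:
    x: float      # position encodes the first variable
    y: float      # position encodes the second variable
    color: str    # hue encodes the group (third variable)
    shape: str    # shape repeats the group, reinforcing the hue


def encode(records):
    """Turn (x, y, group) records into marker descriptions."""
    return [
        Marker(x, y, GROUP_COLORS[group], GROUP_SHAPES[group])
        for x, y, group in records
    ]


markers = encode([(1.0, 2.5, "A"), (3.2, 0.8, "C")])
print(markers[1])
```

To encode a fourth variable instead of reinforcing the third, the shape palette could be keyed on a different field than the color palette.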

Encoding via preattentive factors relates to a general graphic design concept called "layering." When you look at well-designed graphics of any sort, you perceive different classes of information on the page. Preattentive factors like color cause some of them to "pop out" of the page, and similarity causes you to see them as connected to one another, as if each were on a transparent layer over the base graphic. It's an extremely effective way of segmenting data: each layer is simpler than the whole graphic, and the viewer can study each in turn, but relationships among the whole are preserved and emphasized.

6.1.3. NAVIGATION AND BROWSING: HOW CAN I EXPLORE THIS DATA?

A user's first investigation of an interactive data graphic may be browsing: just looking around to see what's there. He also may navigate through it to find some specific thing he's seeking. Filtering and searching can serve that purpose too, but navigation through the "virtual space" of a dataset often is better. Spatial Memory (Chapter 1) kicks in, and the user can see points of interest in context with the rest of the data.

A famous mantra in the information visualization field is: "focus plus context." A good visualization should permit a user to focus on a point of interest, while simultaneously showing enough material around that point of interest to give the user a sense of where it is in the big picture.

Here are some common techniques for navigation and browsing:

Scroll and pan

If the data display won't fit onscreen at once, you could put it in a scrolled window, giving the user easy and familiar access to the offscreen portions. Scrollbars are familiar to almost everyone, and are easy to use. However, some displays are too big, or their size is indeterminate (thus making scrollbars inaccurate), or they have data beyond the visible window that you need to retrieve or recalculate (thus making scrollbars too slow to respond). Instead of using scrollbars in those cases, try setting up buttons that the user has to click to retrieve the next screenful of data; think about how MapQuest or MapBlast works. Other applications do panning instead, in which the cursor "grabs" the information graphic and drags it until the point of interest is found, as in Google Maps.

These techniques are appropriate for different situations, but the basic idea is the same: to interactively move the visible part of the graphic. Sometimes Overview Plus Detail can help the user stay oriented. A small view of the whole graphic can be shown with an indicator rectangle displaying the visible "viewport;" the user might pan by dragging that rectangle, in addition to scrollbars or whatever else is used.

Zoom

Zooming changes the scale of the viewed section, whereas scrolling changes the location. When you present a data-dense map or graph, consider offering the user the ability to zoom in on points of interest. It means you don't have to pack every single data detail into the full view; if you have lots of labels, or very tiny features (especially on maps), that may be impossible anyway. As the user zooms in, those features can emerge when they have enough space.

Most zooms are implemented with a mouse click or button press, and the whole viewing area changes scale at once. But that's not the only way to zoom. Some applications create nonlinear distortions of the information graphic as the user moves the mouse pointer over the graphic: whatever is under the pointer zooms, but the stuff far away from the pointer doesn't change scale. See the Local Zooming pattern for more information.

Open and close points of interest

Tree views typically let users open and close parent items at will, so they can inspect the contents of those items. Some hierarchically structured diagrams and graphs also give users the chance to open and close parts of the diagram "in place," without having to open a new window or go to a new screen. With these devices, the user can explore containment or parent/child relationships easily, without leaving that window. The Cascading Lists pattern describes another effective way to explore a hierarchy; it works entirely through single-click opening and closing of items.

Drill down into points of interest

Some information graphics just present a "top level" of information. A user might click or double-click on a map to see information about the city she just clicked on, or she might click on key points in a diagram to see subdiagrams. This "drilling down" might reuse the same window, use a separate panel on the same window, or bring up a new one (see Chapter 2 for a discussion of window mechanics). This technique resembles opening and closing points of interest, except that the viewing occurs separately from the graphic, and is not integrated into it.

If you also provide a search facility for an interactive information graphic, consider linking the search results to whichever of the preceding techniques you use. In other words, when a user searches for the city of Sydney on a map, show the map zooming and/or panning to that point. Then the search user gets some of the benefits of context and spatial memory.
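
To make the difference between panning and zooming concrete, and to show how a search result might drive both, here is a minimal Python sketch of a viewport. The class, its field names, and the sample coordinates are assumptions made for this example, not any particular mapping library's API. Panning changes only the location, zooming changes only the scale, and centering on a search hit is just a pan to a known point.

```python
from dataclasses import dataclass


@dataclass
class Viewport:
    center_x: float   # data coordinate shown at the middle of the window
    center_y: float
    scale: float      # screen pixels per data unit
    width: int        # window size in pixels
    height: int

    def pan(self, dx_pixels, dy_pixels):
        """Drag the graphic by a pixel offset; only the location changes."""
        self.center_x -= dx_pixels / self.scale
        self.center_y -= dy_pixels / self.scale

    def zoom(self, factor):
        """Change the scale about the window center; the location stays put."""
        self.scale *= factor

    def center_on(self, x, y):
        """Move the view so a search result sits mid-window."""
        self.center_x, self.center_y = x, y

    def to_screen(self, x, y):
        """Convert a data point to pixel coordinates in the window."""
        return (
            (x - self.center_x) * self.scale + self.width / 2,
            (y - self.center_y) * self.scale + self.height / 2,
        )


view = Viewport(center_x=0.0, center_y=0.0, scale=1.0, width=800, height=600)
result = (120.0, 45.0)       # hypothetical location of a search hit
view.center_on(*result)      # pan to the hit...
view.zoom(4.0)               # ...and zoom in on it
print(view.to_screen(*result))  # (400.0, 300.0): the hit lands mid-window
```

A real implementation would probably animate the change rather than jump, so the user can follow the movement and keep a sense of context.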

6.1.4. SORTING AND REARRANGEMENT: CAN I REARRANGE THIS DATA TO SEE IT DIFFERENTLY?

Sometimes just rearranging an information graphic can reveal unexpected relationships. Look at the following graphic, taken from the National Cancer Institute's online mortality charts (Figure 6-7). It shows the number of deaths from lung cancer in the state of Texas. The major metropolitan regions in Texas are arranged alphabetically. That's not an unreasonable default order if you look up specific cities, but as presented, the data doesn't prompt many interesting questions. It's not clear why Abilene, Alice, Amarillo, and Austin all seem to have similar numbers, for instance; it may just be chance.

Figure 6-7. From http://cancer.gov/atlasplus/, sorted alphabetically

But this web application lets you reorder the data into numerically descending order, as in Figure 6-8. Suddenly the graph becomes much more interesting. Galveston is ranked first. Why is that, when its neighbor, Houston, is further down the scale? What's special about Galveston? (Okay, you needed to know something about Texas geography to ask these questions, but you get my point.) Likewise, why the difference between neighbors Dallas and Fort Worth? And apparently, the Mexico-bordering southern cities of El Paso, Brownsville, and Laredo have less lung cancer than the rest of Texas; why might that be?

Figure 6-8. The same chart, sorted numerically

People who can interact with data graphics this way have more opportunities to learn from the graphic. Sorting and rearranging puts different data points next to each other, thus letting users make different kinds of comparisons; it's far easier to compare neighbors than widely scattered points. And users tend to zero in on the extreme ends of scales, as I did in this example.
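
The reordering itself is simple to sketch. The Python below uses made-up placeholder rows (the region names and values are invented, not the mortality figures from the charts) and a hypothetical `reorder` function that returns the same rows alphabetically or in numerically descending order.

```python
# Made-up placeholder rows: (region name, value). Not real mortality data.
rows = [
    ("Region C", 41.0),
    ("Region A", 58.5),
    ("Region D", 47.2),
    ("Region B", 33.9),
]


def reorder(rows, by="name"):
    """Return the rows in a new order without changing the originals."""
    if by == "name":
        return sorted(rows, key=lambda row: row[0])
    if by == "value":
        # Descending, so the extremes end up at the top and bottom.
        return sorted(rows, key=lambda row: row[1], reverse=True)
    raise ValueError(f"unknown sort order: {by}")


# Draw a crude text bar chart in descending order.
for name, value in reorder(rows, by="value"):
    print(f"{name:10s} {'#' * round(value / 5)} {value}")
```

Because the function returns a new list, the interface can keep the default order around and switch between the two views as the user asks.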

How else can you apply this concept? The pattern Sortable Table talks about one obvious way: when you have a many-columned table, users might want to sort the rows according to their choice of column. This pattern is pretty common.

(Many table implementations also permit rearrangement of the columns themselves, by dragging.) Trees can have their children reordered. Diagrams and connected graphs might allow spatial repositioning of their elements, while retaining their connectivity. Use your imagination!

Consider these methods of sorting and rearranging:

Alphabetically
Numerically
By date or time
By physical location
By category or tag
Popularity: heavily versus lightly used
User-designed arrangement
Completely random (you never know what you might see)

For a subtle example, look at Figure 6-9. Bar charts that show multiple data values on each bar ("stacked" bar charts) might also be amenable to rearranging: the bar segments nearest the baseline are the easiest to evaluate and compare, so you might want to let users determine which variable is next to the baseline.

The light blue variable in this example might be the same height from bar to bar. Does it vary, and how? Which light blue bars are the tallest? You really can't tell until you move that data series to the baseline; that transformation lines up the bases of all the light blue rectangles. Now a visual comparison is easy: light blue bars 6 and 12 are the tallest, and the variation seems loosely correlated to the overall bar heights.

Figure 6-9. Rearrangement of a stacked bar chart
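
Here is one way the rearrangement might be computed, as a minimal Python sketch. The function name, the series names, and the values are all made up for illustration. Each segment's bottom is the running total of the series stacked beneath it, so whichever series comes first in the order has every segment starting at zero.

```python
def stack_segments(series, order):
    """Compute (bottom, top) for every bar segment, stacking in `order`.

    `series` maps a series name to one value per bar. The first name in
    `order` sits on the baseline, so all of its segments start at zero.
    """
    bar_count = len(series[order[0]])
    running_top = [0.0] * bar_count
    layout = {}
    for name in order:
        spans = []
        for i, value in enumerate(series[name]):
            spans.append((running_top[i], running_top[i] + value))
            running_top[i] += value
        layout[name] = spans
    return layout


# Made-up values for three bars.
series = {"dark": [3.0, 1.0, 2.0], "light": [1.0, 2.0, 1.5]}

before = stack_segments(series, ["dark", "light"])
after = stack_segments(series, ["light", "dark"])
print(before["light"])  # bottoms differ from bar to bar
print(after["light"])   # every bottom is 0.0
```

Letting the user pick the baseline series then amounts to moving one name to the front of `order` and redrawing.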

6.1.5. SEARCHING AND FILTERING: SHOW ME ONLY WHAT I NEED TO KNOW.

Sometimes you don't want to see an entire dataset at once. You might start with the whole thing, and then narrow it down to what you need; that's filtering. Or, you might build up a subset of the data via searching or querying. Most users won't even distinguish between filtering and querying (though there's a big difference from, say, a database's point of view). Whatever term you use, the user's intent is the same: to zero in on whatever part of the data is of interest, and get rid of the rest.

The simplest filtering and querying techniques offer users a choice of which aspects of the data to view. Checkboxes and other one-click controls turn parts of the interactive graphic on and off. A table might show some columns and not others, per the user's choice; a map might show only the points of interest (e.g., restaurants) that the user selects. The Dynamic Queries pattern, which can offer rich interaction, is a logical extension of simple filter controls like these.

Sometimes simply highlighting a subset of the data, rather than hiding or removing the rest, is sufficient. That way a user can see that subset in context with the rest of the data. Interactively, you can highlight with simple controls, as described earlier. The Data Brushing pattern describes a variation of data highlighting; data brushing highlights the same data in several data graphics at once.

Look at Figure 6-10. This interactive ski-trail map can show four categories of trails, coded by symbol, plus other features like ski lifts and base lodges. When everything is "turned on" at once, it's so crowded that it's hard to read anything! But users can click on the trail symbols, as shown, to turn the data "layers" on and off. The first screenshot shows no highlighted trails; the second switches on the trails rated black-diamond with a single click.

Figure 6-10. From http://www.sundayriver.com/trailmap.htm
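
A one-click layer toggle like this can be sketched in a few lines of Python. The feature names and layer keys below are invented for illustration and do not come from the trail map itself. The set of visible layers is the only state; each click flips one layer, and the drawing code keeps only the features whose layer is switched on.

```python
# Hypothetical features; names and layer keys are invented for illustration.
features = [
    {"name": "Trail 1", "layer": "green_circle"},
    {"name": "Trail 2", "layer": "black_diamond"},
    {"name": "Lift A", "layer": "lift"},
    {"name": "Trail 3", "layer": "black_diamond"},
]


def toggle(visible, layer):
    """Return a new set of visible layers with `layer` switched on or off."""
    return visible ^ {layer}


def features_to_draw(features, visible):
    """Keep only the features whose layer is currently switched on."""
    return [f for f in features if f["layer"] in visible]


visible = frozenset()                        # start with every layer off
visible = toggle(visible, "black_diamond")   # one click switches a layer on
print([f["name"] for f in features_to_draw(features, visible)])
```

To highlight a subset rather than hide the rest, the same membership test could choose between a bright and a dimmed style instead of dropping features.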

Searching mechanisms vary heavily from one type of graphic to another. A table or tree should permit textual searches, of course; a map should offer searches on addresses and other physical locations; numeric charts and plots might let users search for specific data values or ranges of values. What are your users interested in searching on?

When the search is done, and results obtained, you might set up the interface to show the results in context, on the graphic; you could scroll the table or map so that the searched-for item is in the middle of the viewport, for instance. Seeing the results in context with the rest of the data helps the user understand the results better. The Jump to Item pattern is a common way to search and scroll in one step.
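
As a sketch of searching and scrolling in one step, the Python below finds the first list item that starts with what the user typed and computes a scroll offset that puts it in the middle of the viewport. The function name, the fixed row height, and the sample list are assumptions made for this example; the pattern itself does not prescribe them.

```python
def jump_to_item(items, typed, row_height, viewport_height):
    """Find the first item starting with `typed` and center it in the view.

    Returns (index, scroll_top) or None if nothing matches. `scroll_top`
    is clamped so the list never scrolls past either end.
    """
    typed = typed.lower()
    max_scroll = max(0, len(items) * row_height - viewport_height)
    for index, item in enumerate(items):
        if item.lower().startswith(typed):
            row_middle = index * row_height + row_height / 2
            centered = row_middle - viewport_height / 2
            return index, min(max(0, centered), max_scroll)
    return None


# Hypothetical list: 100 rows of 20 pixels in a 200-pixel-tall viewport.
items = [f"Item {n:03d}" for n in range(100)]
print(jump_to_item(items, "item 050", row_height=20, viewport_height=200))
```

The clamping matters near the ends of the list: the first and last items cannot be centered, so the view stops at the edge instead.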

The best filtering and querying interfaces are:

Highly interactive: They respond as quickly as possible to the user's searching and filtering. Implementing this admittedly isn't easy for web applications and other interfaces that need to get data from across a network.
Iterative: They let a user refine the search, query, or filter until she gets the desired results. They also might combine these operations: a user might do a search, get a screenful of results, and then filter those results down to what she wants.
Contextual: They show results in context with surrounding data to make it easier for a user to understand where they are in a data space. This is also true for other kinds of searches; the best web search engines show a keyword embedded in a sentence, or an image embedded in its web page.

6.1.6. THE ACTUAL DATA: WHAT ARE THE SPECIFIC DATA VALUES?

Several common techniques help a viewer get specific values out of an information graphic. Know your audience: if they're only interested in getting a qualitative sense of the data, then there's no need for you to spend large amounts of time or pixels labeling every little thing. But some actual numbers or text usually are necessary.

Since these techniques all involve text, don't forget the graphic design principles that make text look good: readable fonts, appropriate font size (not too big, not too small), proper visual separation between unrelated text items, alignment of related items, no heavy-bordered boxes, and no unnecessary obscuring of data.

Labels: Many information graphics put labels directly on the graphic, such as town names on a map. Labels also can identify the values of symbols on a scatter plot, bars on a bar graph, and other things that might normally force the user to depend on axes or legends. Labels are easier to use. They communicate data values precisely and unambiguously (when placed correctly), and they're located in or beside the data point of interest, so there's no going back and forth between the data point and a legend. The downside is that they clutter up a graphic when overused, so be careful.
Legends: When you use color, texture, linestyle, symbol, or size on an information graphic to represent values (or categories or value ranges), the legend shows the user what represents what. You should place the legend on the same page as the graphic itself so the user's eyes don't need to travel far between the data and the legend.
Axes, rulers, scales, and timelines: Whenever position represents data, as it does on plots and maps (but not on most diagrams), then these techniques tell the user what values those positions represent. They are reference lines or curves on which reference values are marked. The user has to draw an imaginary line from the point of interest to the axis, and maybe interpolate to find the right number. This situation is more of a burden on the user than direct labeling. But labeling clutters things when the data is dense, and many users don't need to derive precise values from graphics; they just want a more general sense of the values involved. For those situations, axes are appropriate.
Datatips: This chapter describes the Datatips pattern. Datatips, which are tooltips that show data values when the user hovers over a point of interest, have the physical proximity advantages of labels without the clutter. They work only in interactive graphics, though.
Data brushing: A technique called "data brushing" lets users select a subset of the data in the information graphic and see how that data fits into other contexts. You usually use it with two or more information graphics; for instance, selecting some outliers in a scatter plot highlights those same data points in a table showing the same data. For more information, see the Data Brushing pattern in this chapter.
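
The last two techniques in this list are interactive, so a small sketch may help. The Python below is only an illustration under stated assumptions: the data points are made up, the pointer position is assumed to be already converted to data coordinates, and the function names are hypothetical. `datatip` returns text for the nearest point under the pointer, and `brush` collects the ids inside a selection rectangle so that a second view (here, a table) can highlight the same points.

```python
import math

# Made-up data points shared by a scatter plot and a table.
points = [
    {"id": 1, "x": 10.0, "y": 12.0},
    {"id": 2, "x": 11.0, "y": 40.0},
    {"id": 3, "x": 55.0, "y": 42.0},
]


def datatip(points, pointer_x, pointer_y, radius=3.0):
    """Return tooltip text for the nearest point within `radius`, else None."""
    nearest = min(
        points,
        key=lambda p: math.hypot(p["x"] - pointer_x, p["y"] - pointer_y),
        default=None,
    )
    if nearest is None:
        return None
    distance = math.hypot(nearest["x"] - pointer_x, nearest["y"] - pointer_y)
    if distance > radius:
        return None
    return f"x = {nearest['x']}, y = {nearest['y']}"


def brush(points, x_min, x_max, y_min, y_max):
    """Return the ids of the points inside a selection rectangle."""
    return {
        p["id"]
        for p in points
        if x_min <= p["x"] <= x_max and y_min <= p["y"] <= y_max
    }


def table_rows(points, selected_ids):
    """Build table rows, flagging the ones that were brushed in the plot."""
    return [(p["id"], p["x"], p["y"], p["id"] in selected_ids) for p in points]


print(datatip(points, 10.5, 12.5))
selected = brush(points, 0, 60, 35, 50)
print(table_rows(points, selected))
```

Keeping the selection as a set of ids, rather than as screen positions, is what lets any number of views highlight the same data.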