The Range of Possible Terms and Meanings Is Vast


There are so many terms in a traditional business system that few people will ever understand even a single domain in its entirety. A subtle problem this introduces is that designers often don't realize what they don't know. They interview users, learn some terms, and design their databases, but often they have only reviewed the tip of an iceberg. To acquire some insight into the size and complexity of that iceberg, we're going to review business language and vocabularies in detail.

Language

Language can be defined as a set of symbols and rules used to convey information. We tend to associate the broadest language groups with nations, either current or historical; so, for example, we have the English language and the Spanish language. We believe that two people who speak the same language will be able to communicate with each other. Human languages are often called natural languages to distinguish them from computer languages. To understand natural language, we'll take a look at vocabulary (the universe of words and how those words are typically used).

Vocabulary

Vocabulary is the set of symbols (words) to which a given language has ascribed shared meaning. There is little overlap in vocabulary between any two of the major natural languages. For example, there are very few words in traditional Chinese that mean the same thing in English.

Of course, not everyone in a language group has the same vocabulary. First there is the issue of subsetting. Virtually all people in a language group understand and use some small subset of the total vocabulary.

We're on slippery ground here, because there is (surprisingly) little agreement on what constitutes a "word" and therefore on how large a vocabulary is. I have used several sources to create Figure 4.1. An unabridged dictionary typically has 1 million or more entries. The Oxford Unabridged has 1.5 million entries. Abridged dictionaries typically have about 100,000 entries. Estimates vary, but the median vocabulary is thought to be around 60,000 to 70,000 words for a college graduate, and far less than half that for a student entering high school.

Figure 4.1: Relative size of vocabularies.

The first thing to note is that somehow society works despite the fact that most of us understand between 2% and 4% of the vocabulary we supposedly share. What is not shown here is an extreme skew toward the most commonly used words, which is where we do most of our communicating. I'll use the same scales as we explore specialized vocabularies in business, but for now, file this away.
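The 2% to 4% range can be checked roughly against the estimates quoted above. The sketch below is only a back-of-the-envelope calculation. The function name is made up for illustration, and the 30,000-word figure for a student entering high school is an assumed upper bound (half of the college graduate's 60,000), since the text says only "far less than half."

```python
def share_of_vocabulary(known_words: int, total_entries: int) -> float:
    """Return the percentage of the total vocabulary that one person knows."""
    return 100.0 * known_words / total_entries


UNABRIDGED_ENTRIES = 1_500_000  # Oxford Unabridged figure quoted above
COLLEGE_GRADUATE = 60_000       # low end of the quoted 60,000 to 70,000 range
HIGH_SCHOOL_ENTRANT = 30_000    # assumption: upper bound, half the graduate figure

graduate_share = share_of_vocabulary(COLLEGE_GRADUATE, UNABRIDGED_ENTRIES)
entrant_share = share_of_vocabulary(HIGH_SCHOOL_ENTRANT, UNABRIDGED_ENTRIES)

print(f"College graduate: about {graduate_share:.1f}% of the entries")
print(f"High school entrant: at most {entrant_share:.1f}% of the entries")
```

Different dictionary sizes or vocabulary estimates will shift these percentages, which is part of why the ground is slippery.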

Dialect

A dialect is a regional variation of a language, distinguished often by differences in pronunciation and vocabulary. For instance, we might speak of a Cockney British dialect. As we will see, businesses (more specifically their industries) create their own unique dialects.

Idiolect

No wonder there are a million words in the English language. We have words for things that at first blush don't appear to need a word, such as "idiolect." An idiolect is your own personal dialect. We are amused by Humpty Dumpty, because at least he was up front about his own personal language (see sidebar). Your idiolect is the set of words and patterns you use in your speech and writing. I assume that this is useful when trying, for example, to figure out who really wrote Shakespeare's plays, but we will find some other interesting uses for it when we look at personally developed systems.

"When I use a word, it means just what I choose it to mean—neither more nor less."

—Humpty Dumpty in Through the Looking Glass

Business Vocabularies, Professional Jargon, and Job Security

Each industry has its own vocabulary and its own dialect. Each business then typically creates its own vocabulary, which is added to that of its industry. Professionals in their own domain often create specific terms as a shorthand for unique procedures and to keep trade knowledge secret and thereby enhance their job security. In this section we'll look at a few of the ways in which businesses create their own vocabularies, to get an idea of the challenge we have in front of us if we hope to resolve the frequent confusion and errors that unrecognized semantic differences create for the unwary analyst. The major ways these semantic differences have developed include the following:

  • Overloading of common words

  • Created words

  • Nonword identifiers

  • Compounds

Overloading of Common Words (Homonyms)

The area with the most potential for misunderstanding is the way different industries, or different parts of the same industry, use common words differently. In the case of made-up words, or nonwords, there is little chance of misunderstanding. There is nonunderstanding, to be sure, but you generally know that you don't know what is being communicated. With common words, though, there is frequent opportunity for believing that you understood what was being communicated when in fact you did not.

The lexicographer calls these homonyms. We'll go over a few examples to make this clearer.

Capacity, to most of us, is how much something will hold; it is a measure of volume. For educators, though, it is the ability to retain knowledge. In the legal profession it is either your role (in your corporate capacity) or whether you are of the age of majority. For electrical engineers it is the phenomenon whereby an electric charge is stored.

A router is a device that carves rounded edges on molding for woodworkers, a person who lays out a circuit board in the electronics industry, an electronic device that dispatches packets in the data communications industry, and a person who stocks vending machines in the snack food industry.

A credit is a rebate to a customer in the retail industry, a negative balance in the banking business, an acknowledgment in the entertainment industry, a free game in the arcade business, revenue to an accountant, an arrangement for deferred payment in any industry that allows payment over time, an assessment of your "loan-worthiness" in the mortgage industry, and a source of honor and distinction in plain English.

A lot is a batch in the pharmaceutical industry, undeveloped land in the real estate business, a place of business for the movie industry, a quantity of stocks in the stockbroking business (odd lots), a method of choosing for sports teams (drawing lots), and a large number for most of us.

I've performed several random samplings of industry lexicons, as well as industry-specific documents, and found that overloaded words make up about 10% of the total. That is not really a lot, except for two things: in most cases they are some of the most commonly used words, and they are the most likely to lead to confusion because there is no direct clue that you don't know these words.
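One way to picture an overloaded word in a system is as a term whose meaning can be looked up only once an industry context is supplied. The sketch below is a minimal illustration, not a recommended design. The glossary structure, the context labels, and the lookup function are all hypothetical, and the meanings are taken from the router and lot examples above.

```python
from typing import Dict, Optional

# Hypothetical glossary: each term maps an industry context to a meaning.
GLOSSARY: Dict[str, Dict[str, str]] = {
    "router": {
        "woodworking": "device that carves rounded edges on molding",
        "electronics": "person who lays out a circuit board",
        "data communications": "device that dispatches packets",
        "snack food": "person who stocks vending machines",
    },
    "lot": {
        "pharmaceutical": "batch",
        "real estate": "undeveloped land",
        "movie": "place of business",
    },
}


def lookup(term: str, context: Optional[str] = None) -> str:
    """Resolve a term, refusing to guess when the term is overloaded."""
    senses = GLOSSARY.get(term.lower())
    if senses is None:
        raise KeyError(f"unknown term: {term!r}")
    if context is None:
        if len(senses) == 1:
            # Only one known sense, so no context is needed.
            return next(iter(senses.values()))
        raise ValueError(
            f"{term!r} is overloaded; supply one of these contexts: {sorted(senses)}"
        )
    if context not in senses:
        raise KeyError(f"no meaning recorded for {term!r} in context {context!r}")
    return senses[context]


print(lookup("router", "data communications"))
```

The design choice worth noticing is that a bare lookup of an overloaded term raises an error instead of returning the first meaning it finds. That turns a silent misunderstanding into a visible nonunderstanding, which is the safer of the two failures.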

Created Words

There are a wide variety of created words. Some are brand names, some borrow from foreign languages, many are acronyms, some are technical terms, some are shortened from chemical names, and so on.

Interestingly, many technical terms borrow from dead languages (Greek and Latin are the most popular), which I think is partially to avoid the overloading of common terms. "Microscope" is literally ancient Greek for "small see." We could call the device with two sets of lenses that allow us to enlarge the image of things less than a millimeter across a "small see," but not only does that sound somewhat childish, it opens up a great deal of confusion by overloading these common words.

Figure 4.2 is an example I pulled randomly from the middle of a defense industry contract. I boxed words that are not words in the English language and underlined words that are words (or more often phrases) in the English language, but whose use in this context is not the most common English definition. Most of the boxed words (acronyms in this case) are defined elsewhere in the document, but this still means that they must be learned and added to your own idiolect before you can understand the contract.

Figure 4.2: Excerpt from a contract.

I'm also going to include in this category Latin words or phrases, as well as any non-English phrase that isn't in the common lexicon. Black's Law Dictionary has 25,000 entries. I took a random sample of 10 pages and found that one third of the entries were Latin.

Nonword Identifiers

Companies create identifiers for their products and services. Sometimes these identifiers are nonmeaningful numbers, but often the authors create identifiers that have some mnemonic intention and that often become "words" of their own within the company. Laws and statutes are also identified by "nonword words" (Figure 4.3).

Figure 4.3: Excerpt from a legal opinion.

The nonword words in Figure 4.3 are boxed.

A more prosaic example would be "Miller Brewing Company ordered another 6 tons of C545" as a sensible statement in a company that has a product called "C545" that might be purchased in ton increments and has enough of a relationship with a company (or person) they call "Miller" to be on a first-name basis.

"We'll process the return as soon as we get a 1040" is a phrase that most of us understand, but only because of the pervasiveness of the IRS.

Compounds

One of the trickiest areas is compounds, in which two or more words are combined to create a new meaning.[11] Similar to overloading of common words, these can create the illusion of understanding where there is none.

For example, knowing what the word "world" means and what the word "series" means doesn't necessarily mean we know what the World Series is. We might even think it was a sequence of planets.

"Brown shoe" is knowable if you understand the two words that make up the phrase. On the other hand, "white shoe" might mean a shoe that is colored white, or a prestigious investment banking firm.

The problem, both for humans and for systems attempting to interpret language, is determining when a phrase is really a word—or, more correctly, determining when several words together stand for one semantic thought.
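For a system, one simple (and far from complete) approach is to keep a lexicon of known compounds and group words greedily, trying the longest candidate phrase first. The sketch below assumes such a lexicon exists. The set of compounds and the function name are hypothetical, chosen from the examples in this section.

```python
from typing import List

# Hypothetical lexicon of known compounds, stored as tuples of lowercase words.
COMPOUNDS = {
    ("world", "series"),
    ("white", "shoe"),
    ("claim", "check"),
    ("fill", "or", "kill"),
    ("hot", "dog", "bun"),
}
MAX_COMPOUND_LENGTH = max(len(compound) for compound in COMPOUNDS)


def group_compounds(text: str) -> List[str]:
    """Split text into semantic units, treating known compounds as single units."""
    words = text.lower().split()
    units: List[str] = []
    i = 0
    while i < len(words):
        longest = min(MAX_COMPOUND_LENGTH, len(words) - i)
        # Try the longest candidate phrase first, then shrink toward two words.
        for n in range(longest, 1, -1):
            candidate = tuple(words[i:i + n])
            if candidate in COMPOUNDS:
                units.append(" ".join(candidate))
                i += n
                break
        else:
            # No compound starts here, so the word stands alone.
            units.append(words[i])
            i += 1
    return units


print(group_compounds("Enter a fill or kill order"))
```

A lexicon match only says that a phrase can be a single unit. It cannot tell whether "white shoe" in a given sentence is footwear or an investment bank, so context is still needed.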

Although the examples in Table 4.1 may seem contrived, I did a randomized sample in three technical dictionaries and found that over half of all the entries were compound words. On the other hand, only about 20% of my abridged dictionary consists of compound words, and nearly half of those are people and place names.

Table 4.1: Some sample compound words.

Phrase            Apparent Meaning               Actual Meaning
Claim check       Review of a claim              A piece of paper that lets you retrieve your property
Conduit theory    Plumbing principles            Justification for why mutual funds are not taxed on their profit
Passive exercise  Oxymoron                       A physical therapist moves your muscles for you
Fill or kill      Sounds military                Stock order that must be traded immediately or it is withdrawn
Hot dog bun       Rear quarter of a warm canine  Bread roll for sausage

This has several implications for business semantics. Most important, there is no such thing as one word/one meaning. There is so little overlap between the vocabularies of different industries and businesses that the key issue is establishing and maintaining context for all our communications.

An Estimate of the Amount of Overlap between Industry and Business

It is truly mind boggling how many specialized terms there are. Let's start with a few statistics and see what we can determine.

  • Retail (grocery and other): A grocery store has 50,000 stock-keeping units (SKUs). (SKUs indicate the level of detail at which inventory is kept. For example, Coca-Cola in a 16-oz bottle is a different SKU than Coca-Cola in a 26-oz bottle.) Most products come in two sizes, so there are really 25,000 products. Most items have two brands that provide virtually identical products, bringing the total down to 12,500. To be conservative, assume there are 5,000 distinct products in a grocery store. Clothing stores typically have 5,000 to 10,000 SKUs, but given that many of the SKUs are for size differences, there are probably fewer than 2,000 names to learn. In either environment, what this means is that new hires will have to expand their vocabulary by 10% to 20% just to go to work.

  • Any industrial manufacturer: Most manufacturers of goods produce thousands of different products, which they of course name (usually with numbers, but the best-selling ones get "real" names). Most manufacturers also have an explosion of part numbers and names to learn about their suppliers' parts, and for all intents and purposes their customers become "words" in their vocabulary ("Is the Budweiser order ready yet?" "Will Crampton's be late?" etc.). In addition, each subindustry has a vocabulary of several thousand business terms that must be relearned because they have been overloaded.

  • Legal: As mentioned previously, Black's Law Dictionary has 25,000 entries. Some of these are English words with different meanings, but Black's Law Dictionary does not repeat English words if the legal definition is the same as the common definition. (Incidentally, for hundreds of years dictionaries included only "difficult" words; it was assumed that the reader knew the definition of common words.) My guess, from dealing with lawyers, is that most lawyers know a majority of the terms in such a book. Let's give them 15,000 and realize that they spent a great deal of their time at law school adding legal terms to their vocabularies.

  • Medical industry: I believe the medical field is the winner in this sweepstakes. There are 50,000 drug names and about as many "procedure codes" (shorthand for what a physician does to a patient). SNOMED (a lexicon of pathology and disease, which has been expanded to cover procedures) covers 137,000 clinical terms. The granddaddy of them all (that I'm aware of) is the Unified Medical Language System (UMLS), weighing in at 776,940 concepts, with a total of 2.1 million concept names (UMLS indexes many foreign languages). I'm continually amazed at what a relatively high proportion of these terms many clinicians know. Let's give them 50,000 and conclude that half of their vocabulary consists of medical terms.
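The grocery arithmetic in the first bullet can be written out as a short calculation. This is only a sketch of the reasoning, with a made-up function name; the divisors are the "two sizes" and "two brands" assumptions stated in the text, not measured values.

```python
def distinct_names(skus: int, sizes_per_product: int, brands_per_item: int) -> int:
    """Collapse SKUs that differ only by size or brand into distinct product names."""
    products = skus // sizes_per_product   # same product in different sizes
    return products // brands_per_item     # near-identical products from rival brands


# Figures from the grocery example: 50,000 SKUs, two sizes, two brands.
grocery_names = distinct_names(skus=50_000, sizes_per_product=2, brands_per_item=2)
print(grocery_names)
```

The text then rounds this result down further, to 5,000, before estimating how much a new hire's vocabulary has to grow.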

How Do These Vocabularies Relate?

Unlike the earlier examples we considered, which were primarily subset/superset relationships between the vocabularies, these are distinct enough that they are not subsets of each other. This creates all sorts of problems for ontologists and language interpreters. We'll discuss this situation later, but for now we'll chalk it up as part of doing business: Each industry has its own dialect, and each business has its own idiolect.

The dark circles in Figure 4.4 represent the specialized parts of each domain's respective vocabularies. They clearly share a lot of the high school freshman's vocabulary, which they don't include in their specialized dictionaries.

Figure 4.4: Two industries' vocabularies, barely overlapping.
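The relationship in Figure 4.4 can be mimicked with ordinary set operations. The word lists below are tiny, made-up samples chosen only to show the shape of the relationship; they are not drawn from any real lexicon.

```python
# Toy vocabularies; the word lists are illustrative assumptions only.
COMMON = {"order", "customer", "price", "date"}
LEGAL = COMMON | {"estoppel", "tort", "habeas corpus"}
MEDICAL = COMMON | {"edema", "biopsy", "stenosis"}

# The specialized part of each vocabulary is what remains once common words are removed.
legal_specialized = LEGAL - COMMON
medical_specialized = MEDICAL - COMMON

# Neither specialized vocabulary is a subset of the other, and they share nothing.
shared_specialized = legal_specialized & medical_specialized
is_subset_either_way = (
    legal_specialized <= medical_specialized
    or medical_specialized <= legal_specialized
)

print(sorted(LEGAL & MEDICAL))    # what the two fields have in common
print(sorted(shared_specialized))  # specialized terms they share
print(is_subset_either_way)
```

In this toy case everything the two fields share comes from the common vocabulary, which is the part their specialized dictionaries leave out.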

[11]For a more complete treatment of this subject, see Barbara Rosario, "Classification of the Semantic Relations in Noun Compounds." Available at http://www.sims.berkeley.edu/~rosario/projects/NC_ling181.pdf.




Semantics in Business Systems: The Savvy Manager's Guide
ISBN: 1558609172
Year: 2005
Pages: 184
Authors: Dave McComb
