Statistics


The word statistics came into English by way of Latin and German and ultimately derives from the Latin status, meaning standing or state. In the minds of most people, the meaning of statistics has much in common with these related words: roughly, a description of how things are. It is, of course, true that part of the theory of statistics concerns effective ways of summarizing and communicating masses of information that describe a situation. This part of the overall theory and set of methods is usually known as descriptive statistics.
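
To make the idea concrete, here is a minimal sketch of descriptive statistics in Python: a handful of invented measurements reduced to a few summary numbers. All data values are hypothetical.

```python
# Descriptive statistics: summarizing a mass of raw data with a few numbers.
# The measurements below are invented for illustration.
import statistics

measurements = [9.8, 10.1, 10.0, 9.9, 10.3, 9.7, 10.2, 10.0, 9.9, 10.1]

print("n      :", len(measurements))
print("mean   :", statistics.mean(measurements))
print("median :", statistics.median(measurements))
print("stdev  :", statistics.stdev(measurements))         # sample standard deviation
print("range  :", max(measurements) - min(measurements))
```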

Although descriptive statistics form an important basis for dealing with data, a major part of the theory of statistics is concerned with another question: How does one go beyond a given set of data, and make general statements about the large body of potential observations, of which the data collected represents only a sample? This is the theory of inferential statistics, with which six sigma methodology is mainly concerned.
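
As a sketch of what inference adds, the fragment below takes a small invented sample and produces a hedged statement about the population mean in the form of a confidence interval. The critical value 2.262 is Student's t for 9 degrees of freedom at the 95% level.

```python
# Inferential statistics: from a sample of 10 invented values, make a
# probabilistic statement about the population they were drawn from.
import math
import statistics

sample = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 10.0, 9.7, 10.1, 9.9]

n = len(sample)
mean = statistics.mean(sample)
sem = statistics.stdev(sample) / math.sqrt(n)   # standard error of the mean
t_crit = 2.262                                  # t(0.975, df = 9)

low, high = mean - t_crit * sem, mean + t_crit * sem
print(f"95% confidence interval for the population mean: ({low:.3f}, {high:.3f})")
```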

Applications of inferential statistics occur in virtually all fields of research endeavor—the physical sciences, the biological sciences, the social sciences, engineering, market and consumer research, quality control in industry, and so on, almost without end. Although the actual methods differ somewhat in the different fields, the applications all rest on the same general theory of statistics. By examining what the fields have in common in their applications of statistics, we can gain a picture of the basic problem studied in mathematical statistics. The major applications of statistics in any field all rest on the possibility of repeated observations or experiments made under essentially the same conditions. That is, either the researcher actually can observe the same process repeated many times, as in industrial quality control, or there is the conceptual possibility of repeated observation, as in a scientific experiment that might, in principle, be repeated under identical conditions. However, in any circumstance where repeated observations are made, even though every precaution is taken to make conditions exactly the same, the results of observations will vary, or tend to be different, from trial to trial. The experimenter or researcher has control over some, but not all, of the factors that make outcomes of observations differ from each other.

When observations are made under the same conditions in one or more respects, but they give outcomes differing in other ways, then there is some uncertainty connected with the observation of any given object or phenomenon. Even though some things are known to be true about an object in advance of the observation, the experimenter cannot predict with complete certainty what its other characteristics will be. Given enough repeated observations of the same object, or kind of object, a good bet may be formulated about what the other characteristics are likely to be, but one cannot be completely sure of the status of any given object.

This fact leads us to the central problem of inferential statistics: a theory about uncertainty, the tendency of outcomes to vary when repeated observations are made under identical conditions. Granted that certain conditions are fulfilled, theoretical statistics permits deductions about the likelihood of the various possible outcomes of observation. The essential concepts in statistics derive from the theory of probability, and the deductions made within the theory of statistics are, by and large, statements about the probability of particular kinds of outcomes, given that the initial mathematical conditions are met.
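
A small example of such a deduction, assuming only that the conditions of the model hold (independent tosses of a fair coin): the probability of every possible number of heads in ten tosses follows by calculation alone, before any coin is ever tossed.

```python
# Deduction within probability theory: given the premises (independent
# trials, constant success probability p), the probability of each outcome
# is a logical consequence. Here: number of heads in 10 fair coin tosses.
from math import comb

n, p = 10, 0.5
for k in range(n + 1):
    prob = comb(n, k) * p**k * (1 - p)**(n - k)   # binomial probability
    print(f"P({k} heads) = {prob:.4f}")
```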

Mathematical statistics is a formal mathematical system. Any mathematical system consists of these basic parts:

  • A collection of undefined things or elements, considered only as abstract entities.

  • A set of undefined operations or possible relations among the abstract elements.

  • A set of postulates and definitions, each asserting that some specific relation holds among the various elements, the various operations, or both.

In any mathematical system, the application of logic to combinations of postulates and definitions leads to new statements, or theorems, about the undefined elements of the system. Given that the original postulates and definitions are true, the new statements must be true. Mathematical systems are purely abstract, essentially undefined, deductive structures. In other words, they are not really about anything in particular. They are systems of statements about things having the formal properties given by the postulates. No one may know what the original mathematician really had in mind with regard to these abstract elements. Indeed, they may represent absolutely nothing that exists in real-world experience, and the sole concern may be what one can derive about the other necessary relations among abstract elements, given particular sets of postulates. To clarify: statistics makes sense only when derived from particular sets of postulates, and not all postulates are drawn from real-world experience. It is perfectly true, of course, that many mathematical systems originated from attempts to describe real objects or phenomena and their interrelationships: historically, the abstract systems of geometry, school algebra and calculus grew out of problems where something very practical and concrete was in the back of the mathematician's mind. As formal systems, however, they deal with completely abstract entities.

When a mathematical system is interpreted in terms of real objects or events, then the system is said to be a mathematical model for those objects or events. Somewhat more precisely, the undefined terms in the mathematical system are identified with particular, relevant properties of objects or events. Thus, in applications of arithmetic, the number symbols are identified with magnitudes or amounts of some particular property that objects possess, such as weight, extent, or numerousness. The system of arithmetic need not apply to other characteristics of the same objects, for example, to their colors. Once this identification is made between the mathematical system and the relevant properties of objects, then anything that is a logical consequence in the system is a true statement about objects in the model, provided, of course, that the formal characteristics of the system actually parallel the real characteristics of objects in terms of the particular properties considered. In short, to be useful as a mathematical model, a mathematical system must have a formal structure that fits at least one aspect of a real situation. This characteristic is what makes a model so valuable for predicting the behavior and results of a real situation.

Probability theory and statistics are at once mathematical systems and mathematical models. Probability theory deals with elements called events, which are completely abstract. Furthermore, these abstract elements are paired with numbers called probabilities. The theory itself is the system of logical relations among these essentially undefined things. The experimenter uses this abstract system as a mathematical model: the experiment produces a real outcome, which is called an event, and the model of probability theory provides a value, which is interpreted as the relative frequency of occurrence for that outcome. If the requirements of the model are met, this is a true, and perhaps useful, result. If the experiment does not really fit the requirements of probability theory as a system, then the statement made about the actual result need not be true. This point must not be overstressed, however. We will find that a statistical method can often yield practical, useful results even when its requirements are not fully satisfied. Much of the art in applying statistical methods lies in understanding when and how this is true.
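
The interpretation of probabilities as long-run relative frequencies can be illustrated by simulation. In the sketch below, a hypothetical die-rolling experiment, the observed relative frequency of a six drifts toward the theoretical value 1/6 as the number of trials grows.

```python
# Probability as a model: the abstract event "a six is thrown" is paired
# with the number 1/6, interpreted as a long-run relative frequency.
import random

random.seed(42)                          # fixed seed for a reproducible illustration
for trials in (100, 10_000, 1_000_000):
    sixes = sum(1 for _ in range(trials) if random.randint(1, 6) == 6)
    print(f"{trials:>9} rolls: relative frequency of a six = {sixes / trials:.4f}")

print(f"theoretical probability = {1 / 6:.4f}")
```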

Mathematical systems, such as probability theory and the theory of statistics, are by their very nature deductive. Formal assertions are postulated as true, and then, by logical argument, true conclusions are reached. All well-developed theories have this formal, logico-deductive character.

On the other hand, the problem of the empirical scientist is essentially different from that of the logician or mathematician. Scientists search for general relations among events; these general relations are those that can be expected to hold whenever the appropriate set of circumstances exists. The very name empirical science asserts that these laws shall be discovered and verified by the actual observation of what happens in the real world of experience. However, no mortal scientist ever observes all the phenomena about which a generalization must be made. Scientific conclusions about what would happen for all of a certain class of phenomena always come from observations of only a very few particular cases of that phenomenon.

The reader acquainted with logic will recognize that this is a problem of induction. The rules of logical deduction are rules for arriving at true consequences from true premises. Scientific theories are, for the most part, systems of deductions from basic principles held to be true. If the basic principles are true, then the deductions must be true. However, how does one go about arriving at and checking the truth of the initial propositions? The answer is, for an empirical science, observation and inductive generalization—going from what is true of some observations to a statement that this is true for all possible observations made under the same conditions. Any empirical science begins with observation and generalization.

Furthermore, even after deductive theories exist in science, experimentation is used to check on the truth of these theories. Observations that contradict deductions made within the theory are prima facie evidence against the truth of the theory itself. Yet, how does the experimenter or scientist know that the results are not an accident, the product of some chance variation in procedure or conditions over which there is no control? Would the result be the same in the long run if the experiment could be repeated many times?

It takes only a little imagination to see that this process of going from the specific to the general is a very risky one. Each observation the experimenter or scientist makes is different in some way from the next. Innumerable influences are at work altering—sometimes minutely, sometimes radically—the similarities and differences the experimenter or scientist observes among events. Controlled experimentation in any science is an attempt to minimize at least part of the accidental variation or error in observation. Precise techniques of measurement are aids to scientists in sharpening their own rather dull powers of observation and comparison among events. So-called exact sciences, such as physics and chemistry, have thus been able to remove a substantial amount of the unwanted variation among observations from time to time, place to place and observer to observer, and hence are often able to make general statements about physical phenomena with great assurance from the observation of quite limited numbers of events. Observations in these sciences can often be made in such a way that the generality of conclusions is not a major point at issue. Here, there is relatively little reliance on probability and statistics. However, as even these scientists delve into the molecular, atomic and subatomic domain, negligible differences turn into enormous, unpredictable occurrences and statistical theories become an important adjunct to their work.

In the biological, behavioral, and social sciences, however, the situation is radically different. In these sciences the variations between observations are not subject to the precise experimental controls that are possible in the physical sciences. Refined measurement techniques have not reached the stage of development that they have attained in physics and chemistry.

Consequently, the drawing of general conclusions is a much more dangerous business in these fields, where the sources of variability among living things are extremely difficult to identify, measure and control. Yet the aim of the social or biological scientist is precisely the same as that of the physical scientist: to arrive at general statements about the phenomena under study. Faced with only a limited number of observations or with an experiment that can be conducted only once, the scientist can reach general conclusions only in the form of a "bet" about what the true, long-run situation actually is like. Given only sample evidence, the experimenter or scientist is always unsure of the accuracy of any assertion made about the true state of affairs. The theory of statistics provides ways to assess this uncertainty and to calculate the probability of being wrong in deciding a particular way. Provided that the experimenter can make some assumptions about what is true, the deductive theory of statistics tells us how likely particular results should be. Armed with this information, the experimenter is in a better position to decide what to say about the true situation. Regardless of what one decides from evidence, it could be wrong; but deductive statistical theory can at least determine the probabilities of error in a particular decision.
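
As a sketch of how such an error probability is deduced, consider an invented decision rule: declare a coin biased if ten tosses produce eight or more heads. If the coin is in fact fair, the probability of wrongly declaring it biased follows directly from the binomial model.

```python
# The probability of being wrong under a given decision rule (rule and
# numbers invented for illustration): reject "the coin is fair" when
# 10 tosses yield 8 or more heads.
from math import comb

n, p = 10, 0.5      # 10 tosses of a coin that is actually fair
p_wrong = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(8, n + 1))
print(f"P(wrongly calling a fair coin biased) = {p_wrong:.4f}")   # about 0.055
```

Deductive theory supplies this number; whether a roughly five percent risk of error is acceptable is a question the experimenter must answer.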

In recent years, a branch of mathematics has been developed around this problem of decision-making under uncertain conditions. This is sometimes called statistical decision theory. One of the main problems treated in decision theory is the choice of a decision rule, or deciding how to decide from evidence. Decision theory evaluates rules for deciding from evidence in the light of what the decision-maker wants to accomplish. While it is true that mathematics can tell us wise ways to decide how to decide under some circumstances, mathematics can never tell the experimenter how a decision must be reached in any particular situation. The theory of statistics supplies one very important piece of information to the experimenter: the probability of sample results given certain conditions. Decision theory can supply another: optimal ways of using this and other information to accomplish certain ends. Nevertheless, neither theory tells the experimenter exactly how to decide—how to make the inductive leap from observation to what is true in general. This is the experimenter's problem, and the answer must be sought outside of deductive mathematics, and in the light of what the experimenter is trying to do.
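
A toy illustration of the decision-theoretic viewpoint, with all losses and probabilities invented: two rules for handling a production lot are compared by their expected loss. The mathematics ranks the rules, but it says nothing about whether the loss figures themselves are the right ones.

```python
# Statistical decision theory in miniature: evaluate decision rules by
# expected loss. Every number below is hypothetical.
p_bad_lot = 0.10                 # assumed long-run fraction of bad lots

loss = {                         # hypothetical dollar losses
    ("ship", "good"): 0,         # shipping a good lot costs nothing extra
    ("ship", "bad"): 5000,       # shipping a bad lot triggers returns and rework
    ("inspect", "good"): 200,    # inspection cost when the lot was fine anyway
    ("inspect", "bad"): 700,     # inspection cost plus repair of the bad lot
}

for action in ("ship", "inspect"):
    expected = (loss[(action, "bad")] * p_bad_lot
                + loss[(action, "good")] * (1 - p_bad_lot))
    print(f"expected loss of always-{action}: ${expected:,.0f}")
```

Under these particular numbers the always-inspect rule has the smaller expected loss; change the loss figures and the ranking can reverse, which is precisely why the choice of what to optimize lies outside the mathematics.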

Furthermore, a true revolution has occurred in the past two decades, deeply affecting the application and the teaching of statistical methodology. This has been brought about by the new generations of computers, which are faster, more flexible and cheaper to use than anyone would have dreamed only a few years back. Large-scale statistical analysis is now done by computer in almost all research settings.

There is hardly any area in which the impact of statistics has been felt more strongly than in research and business activities. Indeed, it would be hard to overestimate the contributions statistical methods have made to the effective planning and control of all types of business activities. In the past 25 to 30 years the application of statistical methods has brought about drastic changes in all the major areas of business management: general management, research and development, finance, production, sales, advertising, etc. Of course, not all problems in these areas are of a statistical nature, but the list of those that can be treated either partly or entirely by statistical methods is very long. To illustrate, let us mention but a few that might face a large manufacturer.

In the general management area, for example, where long-range planning is of great concern, population trends must be forecast and their effects on consumer markets must be analyzed. In research and engineering, costs must be estimated for various projects, and manpower, skill, equipment and time requirements must be anticipated. In the area of finance, the profit potential of capital investments must be determined, overall financial requirements must be projected and capital markets must be studied so that sound long-range financing and investment plans can be developed. Although we cannot illustrate in this book how specific statistical tools are actually used in these areas of application, let us point out that they are all excellent candidates for six sigma application.

In production, problems of a statistical nature arise in connection with plant layout and structure, size and location, inventory, production scheduling and control, maintenance, traffic and materials handling, quality assurance, etc. Enormous strides have been made in recent years in the application of statistics to sampling inspection and quality assurance and control. In the area of sales, many problems require statistical solutions. For instance, sales must be forecast for both present and new products for existing as well as new markets, channels of distribution must be determined and requirements for sales forces must be estimated. Building a successful advertising campaign is also a troublesome task; budgets must be determined, allocations must be made to various media and the effectiveness of the campaign must be measured (or predicted) by means of survey samples of public response and other statistical techniques.
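
One classical tool from this area is acceptance sampling, sketched below under invented plan parameters: inspect n items from a lot and accept the lot if at most c are defective. The calculation uses the binomial model as an approximation for a large lot.

```python
# Acceptance sampling (single sampling plan, parameters invented):
# inspect n = 50 items, accept the lot if at most c = 2 are defective.
from math import comb

n, c = 50, 2
for p in (0.01, 0.02, 0.05, 0.10):       # true fraction defective in the lot
    p_accept = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c + 1))
    print(f"fraction defective {p:.2f}: P(accept lot) = {p_accept:.3f}")
```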

So far we have been speaking of problems of a statistical nature that might typically be encountered by a large manufacturer. However, similar problems are faced, say, by a large railroad trying to make the best use of its thousands of freight cars, by a large rancher trying to decide how to feed his cattle so that nutritional needs will be met at the lowest possible cost or by an investment company trying to decide which stocks and bonds to include in its portfolio.

It is not at all necessary to refer to large organizations to find business applications of statistics. For smaller businesses, problems usually differ more in degree than in kind from those of their large competitors. Neither the largest supermarket nor the smallest neighborhood grocery store, for example, has unlimited capital or shelf space, and neither can afford to tie these two assets up in the wrong goods. The problem of utilizing capital and shelf space most effectively is as real for the small store as for the large, and it is extremely shortsighted to think that modern management tools (including modern statistical techniques) are of value only to big business. In fact, they could hardly be needed more anywhere else than in small business, where each year thousands of operating units fail and many of the thousands of new units entering the field are destined to fail because of inadequate capital, overextended credit, overloading with the wrong stock, and generally speaking, no knowledge of the market or the competition.

The intention of this book is not to introduce the reader to specific tools of statistical analysis. It is, however, to acquaint and sensitize the reader to the concepts and opportunities that statistical methods may provide to solve real business problems. Furthermore, it is the intention of this section to make sure that the reader understands that the formal notions of statistics as a way of making rational decisions ought to be part of any thoughtful person's equipment. After all, business managers are not the only ones who must make decisions involving uncertainties and risks; everyone has to make decisions of this sort professionally or as part of everyday life. It is true that many of the choices we have to make entail only matters of taste and personal preference, in which case there is, of course, no question of a decision being right or wrong. On the other hand, many choices we have to make between alternatives can be wrong in the sense that there is the possibility of an actual loss or penalty of some sort involved—possibly only a minor annoyance, perhaps something as serious as loss of life, or anything in between these two extremes. The methods of modern statistics deal with problems of this kind and they do so not only in business, industry, and in the world of everyday life, but also in such fields as medicine, physics, chemistry, agriculture, economics, psychology, government and education, to name a few.

There are many kinds of statistical analyses and presentations that are beyond the scope of this book. However, although specialized techniques exist for handling particular kinds of problems, the underlying principles and ideas are identical, regardless of the field of application. The next section reviews some of the simplest and most common tools that the practitioner of the six sigma methodology may want to use.



