10.4 Estimates for Profiles

Now we know what our measurement objectives are. We wish to know the operational, functional, and module profiles. Unfortunately, these data are known only to Nature, who is not anxious to share her knowledge of these profiles with us. We can never know the real distribution of these three things. We can, however, develop reasonable estimates for the profiles.

The focus will now shift to the problem of understanding the nature of the distribution of the probabilities for the various profiles. We have thus far come to recognize these profiles in terms of their multinomial nature. The multinomial distribution is useful for representing the outcome of an experiment involving a set of mutually exclusive events. Let S_1, S_2, ..., S_M be M mutually exclusive sets of events. Each of these events would correspond to a program executing a particular module in the total set of program modules. Further, let

Pr(S_i) = w_i

w_T = 1 - w_1 - w_2 - ··· - w_M

where T = M + 1. In this case, w_i is the probability that the outcome of a random experiment is an element of the set S_i. If this experiment is conducted over a period of n trials, then the random variable X_i will represent the frequency of S_i outcomes. Here, the value n represents the number of transitions from one program module to the next. Note that:

X_T = n - X_1 - X_2 - ··· - X_M

This particular distribution will be useful in the modeling of a program with a set of M modules. During a set of n program steps, each of the modules can be executed. These, of course, are mutually exclusive events. If module i is executing, then module j cannot be executing.

The multinomial distribution function with parameters n and w = (w_1, w_2, ..., w_T) is given by:

f(x_1, x_2, ..., x_T | n, w) = [n! / (x_1! x_2! ··· x_T!)] w_1^x_1 w_2^x_2 ··· w_T^x_T

where x_i represents the frequency of execution of the ith program module.
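As a concrete check on this distribution function, the following Python sketch evaluates the multinomial probability for a hypothetical three-module system; the counts and probabilities are invented for illustration and do not come from the text.

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """P(X_1 = x_1, ..., X_T = x_T) for a multinomial with n = sum(counts)."""
    n = sum(counts)
    coeff = factorial(n)
    for x in counts:
        coeff //= factorial(x)  # multinomial coefficient n!/(x_1!...x_T!)
    return coeff * prod(p ** x for p, x in zip(probs, counts))

# Hypothetical system: three modules, n = 10 observed transitions.
p = multinomial_pmf([5, 3, 2], [0.5, 0.3, 0.2])  # ≈ 0.085
```

The same function evaluates the profile probability for any number of modules, since T is just the length of the count vector.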

The expected values for the x_i are given by:

E(x_i) = n w_i
the variances by:

Var(x_i) = n w_i(1 - w_i)

and the covariance by:

Cov(x_i, x_j) = -n w_i w_j,  i ≠ j
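These moment formulas can be verified by simulation. The Python sketch below uses invented module probabilities, draws repeated execution profiles, and compares the empirical mean and variance of one module's frequency against n·w_i and n·w_i(1 - w_i).

```python
import random
from statistics import mean, variance

random.seed(42)

w = [0.6, 0.3, 0.1]   # hypothetical module execution probabilities
n = 1000              # module transitions (epochs) per program run
runs = 1000           # independent runs used to estimate the moments

# For each run, count how often module 0 executes in n transitions.
x0 = []
for _ in range(runs):
    outcomes = random.choices(range(3), weights=w, k=n)
    x0.append(outcomes.count(0))

emp_mean = mean(x0)   # theory: n * w_0 = 600
emp_var = variance(x0)  # theory: n * w_0 * (1 - w_0) = 240
```

With 1000 runs the empirical values land within sampling error of the theoretical moments.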

We would like to come to understand, for example, the multinomial distribution of a program's execution profile while it is executing a particular functionality. The problem here is that every time a program is run, we will observe that there is some variation in the profile from one execution sample to the next. It will be difficult to estimate the parameters w = (w_1, w_2, ..., w_T) for the multinomial distribution of the execution profile. Rather than estimating these parameters statically, it would be far more useful to us to get estimates of these parameters dynamically as the program is actually in operation. In essence, we would like to watch the system execute, collect data, and stop when we know that we have enough information about the profiles to satisfy our needs.

Unfortunately, for the multinomial distribution, the probabilities are the parameters of the distribution. We cannot know or measure these. We can, however, observe the frequency with which each module has executed. Thus, it is in our interest to choose a probability distribution whose parameters are related to the things that we can measure.

To aid in the understanding of the nature of the true underlying multinomial distribution, let us observe that the family of Dirichlet distributions is a conjugate family for observations that have a multinomial distribution (see Reference 3). The probability density function for a Dirichlet distribution, D(α, α_T), with a parametric vector α = (α_1, α_2, ..., α_M), where (α_i > 0; i = 1, 2, ..., M), is:

f(w | α, α_T) = [Γ(α_1 + ··· + α_M + α_T) / (Γ(α_1) ··· Γ(α_M) Γ(α_T))] w_1^(α_1 - 1) ··· w_M^(α_M - 1) (1 - w_1 - ··· - w_M)^(α_T - 1)

where (w_i > 0; i = 1, 2, ..., M) and w_1 + w_2 + ··· + w_M < 1. The expected values of the w_i are given by:

(1) E(w_i) = α_i / α_0

where α_0 = α_1 + α_2 + ··· + α_M + α_T. In this context, α_0 represents the total number of epochs observed. The variance of the w_i is given by:

(2) Var(w_i) = α_i(α_0 - α_i) / [α_0^2(α_0 + 1)]

and the covariance by:

Cov(w_i, w_j) = -α_i α_j / [α_0^2(α_0 + 1)],  i ≠ j
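Because these moments depend only on the parametric vector, they can be computed directly from observed execution counts. A minimal Python sketch follows; the counts are invented for illustration.

```python
def dirichlet_moments(alpha):
    """Mean, variance, and one pairwise covariance of w under D(alpha)."""
    a0 = sum(alpha)  # total observed epochs
    mean = [a / a0 for a in alpha]                                  # E(w_i) = a_i / a_0
    var = [a * (a0 - a) / (a0 ** 2 * (a0 + 1)) for a in alpha]      # Var(w_i)
    cov01 = -alpha[0] * alpha[1] / (a0 ** 2 * (a0 + 1))             # Cov(w_0, w_1)
    return mean, var, cov01

# Hypothetical counts: 100 epochs spread over three modules.
mean, var, cov01 = dirichlet_moments([60, 30, 10])
```

Note that every covariance is negative: if one module's share of the epochs rises, some other module's share must fall.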
Obtaining confidence intervals for our estimates of the parameters of the Dirichlet distribution is not a very tractable problem. To simplify the process of setting these confidence limits, let us observe that if w = (w_1, w_2, ..., w_M) is a random vector having the M-variate Dirichlet distribution, D(α, α_T), then the sum z = w_1 + w_2 + ··· + w_M has the beta distribution:

f(z | γ, α_T) = [Γ(γ + α_T) / (Γ(γ)Γ(α_T))] z^(γ - 1) (1 - z)^(α_T - 1)

or, alternately, in terms of w_T = 1 - z:

f(w_T | α_T, γ) = [Γ(γ + α_T) / (Γ(α_T)Γ(γ))] w_T^(α_T - 1) (1 - w_T)^(γ - 1)

where γ = α_1 + α_2 + ··· + α_M.

Thus, we can obtain 100(1 - α) percent confidence limits for:

μ_T - a ≤ μ_T ≤ μ_T + b

from

(3) F_β(μ_T - a | γ, α_T) = α/2

and

(4) F_β(μ_T + b | γ, α_T) = 1 - α/2

Where this computation is inconvenient, let us observe that the cumulative beta function, F_β, can also be obtained from existing tables of the cumulative binomial distribution, F_b, by making use of the knowledge from Raiffa [4], [5] that:

(5) F_b(α_T | 1 - (μ_T - a), γ + α_T) = F_β(μ_T - a | γ, α_T)

and

F_b(α_T | 1 - (μ_T + b), γ + α_T) = F_β(μ_T + b | γ, α_T)
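Where neither tables nor a beta quantile routine is at hand, the limits can also be approximated by sampling the beta distribution directly. The Python sketch below uses only the standard library; the values of γ and α_T are hypothetical counts, not data from the text.

```python
import random

random.seed(0)

# Hypothetical beta parameters: gamma = α_1 + ... + α_M, and α_T.
gamma, alpha_T = 90, 10

# Monte Carlo stand-in for the beta quantiles F_β^{-1}(0.025) and
# F_β^{-1}(0.975); tables or scipy.stats.beta.ppf would give the same.
draws = sorted(random.betavariate(gamma, alpha_T) for _ in range(100_000))
lo = draws[int(0.025 * len(draws))]
hi = draws[int(0.975 * len(draws))]
# (lo, hi) bracket z = w_1 + ... + w_M with roughly 95% probability.
```

With γ = 90 and α_T = 10, the mean of z is 0.9 and the interval is a narrow band around it; as more epochs are observed, γ + α_T grows and the band tightens.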

The value of using the Dirichlet conjugate family for modeling purposes is twofold. First, it permits us to estimate the probabilities of the module transitions directly from the observed module frequencies. Second, we are able to obtain revised estimates for these probabilities as the observation process progresses.

Let us now suppose that we wish to obtain better estimates of the parameters for our software system, the execution profile of which has a multinomial distribution with parameters n and W = (w_1, w_2, ..., w_M), where n is the total number of observed module transitions and the values of the w_i are unknown. Let us assume that the prior distribution of W is a Dirichlet distribution with a parametric vector α = (α_1, α_2, ..., α_M), where (α_i > 0; i = 1, 2, ..., M). In this case, α_i is the observed frequency of execution of module i over the epochs observed so far. If we were to let the system run for some additional number of epochs, then we would get better estimates of the parameters at the cost of observing these new epochs. The posterior distribution of W for the additional observations X = (x_1, x_2, ..., x_M) is then a Dirichlet distribution with parametric vector α* = (α_1 + x_1, α_2 + x_2, ..., α_M + x_M) (see also Reference 5).
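The posterior update is just a componentwise addition of the new observation counts. A short Python sketch with invented prior and new counts, which also shows that the added epochs shrink the variance of the estimate:

```python
def posterior(alpha, x):
    """Dirichlet posterior parameters: alpha* = alpha + x, componentwise."""
    return [a + xi for a, xi in zip(alpha, x)]

def var_wi(alpha, i):
    """Var(w_i) = a_i(a_0 - a_i) / [a_0^2 (a_0 + 1)]."""
    a0 = sum(alpha)
    return alpha[i] * (a0 - alpha[i]) / (a0 ** 2 * (a0 + 1))

prior = [60, 30, 10]                          # hypothetical counts, 100 epochs
alpha_star = posterior(prior, [55, 35, 10])   # 100 further observed epochs

# More epochs tighten the estimate: the posterior variance is smaller.
shrunk = var_wi(alpha_star, 0) < var_wi(prior, 0)
```

This is the practical payoff of conjugacy: no refitting is needed, only the running counts.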

As an example, suppose that we now wish to measure the activity of a large software system. At each epoch (operational, functional, or module) we will increment the frequency count for the observed event. As the system makes sequential transitions from one event to another, the posterior distribution of W at each transition will be a Dirichlet distribution. Further, for i = 1, 2, ..., T, the ith component of the augmented parametric vector α will be increased by one unit each time a new event is expressed.
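This sequential updating scheme can be sketched as a simple counter. The module names and event stream below are illustrative only; starting each count at one acts as a uniform prior so that every estimate is defined before the first observation.

```python
class ProfileEstimator:
    """Sequentially update a Dirichlet-based execution profile estimate."""

    def __init__(self, modules):
        # Unit starting counts: a uniform prior over the modules.
        self.counts = {m: 1 for m in modules}

    def observe(self, module):
        self.counts[module] += 1  # one more epoch for this event

    def profile(self):
        # Current point estimate: E(w_i) = count_i / total.
        a0 = sum(self.counts.values())
        return {m: c / a0 for m, c in self.counts.items()}

est = ProfileEstimator(["parse", "analyze", "report"])
for event in ["parse", "parse", "analyze", "parse", "report"]:
    est.observe(event)
p = est.profile()
```

The estimator can be read out at any transition, which is exactly the "watch, collect, and stop when satisfied" measurement process described above.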

[4] Raiffa, H. and Schlaifer, R., Applied Statistical Decision Theory, Studies in Managerial Economics, Harvard University, Boston, 1961.

[5] DeGroot, M.H., Optimal Statistical Decisions, McGraw-Hill, New York, 1970.



Software Engineering Measurement
ISBN: 0849315034
Year: 2003
Pages: 139