Statistical Methods of the New Science


Various forms of multivariate regression analysis are deployed in the core diagnostic tools discussed in this book, including Internal Labor Market (ILM) Analysis and Business Impact Modeling. In this section we provide some illustrations of the actual statistical models used.

Internal Labor Market Analysis

ILM Analysis is used to measure and model the causes and consequences of an organization’s workforce dynamics—attraction, development, and retention—as well as the rewards that motivate them. Statistical modeling is a critical tool for identifying the causes and consequences of internal labor market dynamics. Because in our view the internal labor market functions as a system, with the various workforce dynamics continuously interacting to create an organization’s workforce, the models that characterize those dynamics are largely symmetrical, relying on a consistent set of explanatory or predictor variables. As was noted in Chapter 5, those variables include individual employee characteristics, organizational practices, and external market influences. Of course, each organization will have a different set of specific variables that reflect its unique business context. The following is an example of one such set that was used for an ILM Analysis of one of the companies mentioned in this book:

External Influences
- Unemployment rates
- Location
- Market share
- Competition (location, size, type, and number)
- Labor pool (demographics, income levels, education, occupations)

Organizational Practices
- Size of facility (location, growth)
- Dispersion of incentive payouts
- Supervision (structure, spans of control, stability)
- Turnover rate (location, team)
- Employee heterogeneity
- Risk (variable pay, employment variability)
- Workload
- Performance (level and volatility)
- Line of business

Employee Attributes
- Age
- Gender
- Education (level, specialization)
- Ethnic background
- Prior experience
- Recruiting source
- Termination reason
- Job family/occupation
- Salaried/hourly
- Union status
- Tenure (in position, with company)
- Pay level (current, prior)
- Incentive earnings
- Relative pay in grade
- Level (job grade, level)
- Promotion history
- Transfer history (business unit, job family, country)
- Location (work, home)
- Performance rating

Figure B-1: Case Example of ILM Variable List

The basic ILM analysis involves statistical estimation of models of the drivers of internal movements—promotion and lateral job changes—as well as the drivers of retention. It also involves statistical modeling of the drivers of compensation, including pay, total compensation, and annual pay growth.

Both linear and nonlinear regression techniques are used to estimate those models statistically. When the outcomes of interest (i.e., the dependent variables) are “categorical” or “dichotomous” in nature, as are events such as promotion, lateral movement, and turnover, we most often use “discrete choice” models, particularly in their binomial form. In these circumstances the outcome or event either occurs or it does not. That is how it appears in employee data. Hence, the dependent variable is restricted to those two alternatives, denoted 1 if the event occurs and 0 if it does not.

For binomial models, the objective of modeling is to determine the influence of specific explanatory variables on the probability that an event or outcome will occur. What is being estimated, then, is the probability that the event will occur given the values of the specified explanatory variables. Since the overall probability cannot be less than 0 or greater than 1, the mathematical expression or functional form relating those variables must restrict the outcome to the range between 0 and 1. A variety of probability distributions can be used for this purpose, the two most common being the cumulative normal distribution and the logistic distribution. We usually rely on the logistic distribution and employ logistic regression for estimation purposes.[11] Typically, the models are structured to estimate the probability of the particular event in a specific year based on the values of the explanatory and control variables at the end of the prior year.

A logit model generally is specified as follows:

P = e^U / (1 + e^U)

where P denotes the probability that the event occurs and U is the vector product of the parameter estimates, β, and the values of the corresponding explanatory and control variables, X. U usually is thought of as representing the level of “utility” that underlies an individual’s decision, such as the decision to turn over or not.
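The role of the logistic distribution in keeping estimated probabilities between 0 and 1 can be sketched in a few lines of Python (our illustration, not code from the book): whatever value the linear index U takes, the logistic transform returns a valid probability.

```python
import math

def logistic(u):
    """Logistic transform: maps an unbounded linear index U to a
    probability P = e^U / (1 + e^U), strictly inside (0, 1)."""
    return 1.0 / (1.0 + math.exp(-u))

# However extreme U becomes, the implied probability stays in (0, 1).
for u in (-10.0, -1.0, 0.0, 1.0, 10.0):
    assert 0.0 < logistic(u) < 1.0

print(logistic(0.0))  # → 0.5 (a zero index implies even odds)
```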

Variants of a logit model take the form of different specifications of U, the linear component in the equation above. An example of a turnover model estimated through logistic regression specifies the following linear structure (for U):

U = β0 + β1*local unemployment rate + β2*years of service + β3*full-time status + β4*degree_bach + β5*degree_grad + β6*supervisor’s span of control + β7*training taken + β8*prior year pay growth + β9*(promotion = 1/0) + . . . + βN*business unit A

Therefore the model to estimate is:

P(turnover) = e^U / (1 + e^U)

with U specified as above.

In this equation, the event of interest is on the left-hand side and the variables assumed to influence those events—the independent variables—are on the right-hand side. Statistical estimation produces estimates of these betas that reveal the significance and relative strength of each factor’s influence on the probability of turnover. They can be converted readily into so-called elasticities that measure the expected impact of a change in a particular characteristic, such as an employee’s performance rating or gender, on the probability that that employee will leave the organization in a particular year. Quite apart from its relevance to a retention strategy, just think how valuable such information could be to an organization that is evaluating its performance management system or its efforts in regard to diversity. Illustrations of such elasticities were presented in Figures 2-4 and 5-4.
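The elasticity idea can be sketched with made-up numbers (these are not estimates from the book): in a logit model, the marginal effect of a one-unit change in a predictor on the event probability is β·P·(1 − P), so the same coefficient implies a larger change where the baseline probability is near 0.5 than where it is near 0 or 1.

```python
import math

def prob(u):
    """Logit probability P = e^U / (1 + e^U)."""
    return 1.0 / (1.0 + math.exp(-u))

def marginal_effect(beta, u):
    """Approximate change in the event probability for a one-unit
    change in a predictor with coefficient beta, evaluated at U = u:
    dP/dx = beta * P * (1 - P)."""
    p = prob(u)
    return beta * p * (1.0 - p)

# Hypothetical numbers: a baseline index of -2.0 (turnover probability
# of roughly 12 percent) and a pay-growth coefficient of -0.8.
effect = marginal_effect(-0.8, -2.0)
print(round(effect, 4))  # → -0.084
```

Under these invented values, a one-unit rise in pay growth lowers the predicted turnover probability by about 8.4 percentage points.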

For the models used to estimate drivers of continuous variables such as pay and pay growth we use ordinary and/or generalized least-squares regression. As was noted above, there is a vast literature in both economics and organizational psychology on the determinants and consequences of pay. The specific models we have developed as part of ILM analysis draw heavily on the established research but are more comprehensive in their coverage, especially in regard to organizational factors and individual characteristics that influence compensation.
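A minimal one-predictor least-squares sketch in Python may help fix ideas (the data are invented solely for the illustration; the actual pay models described above use many predictors):

```python
def ols_simple(x, y):
    """Ordinary least squares for a one-predictor model
    y = b0 + b1 * x. Returns the intercept and slope (b0, b1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    b1 = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
          / sum((xi - mean_x) ** 2 for xi in x))
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Hypothetical data: years of service vs. annual pay (in $000s),
# constructed to follow pay = 50 + 2 * service exactly.
service = [1, 2, 3, 4, 5]
pay = [52, 54, 56, 58, 60]
b0, b1 = ols_simple(service, pay)
print(b0, b1)  # → 50.0 2.0
```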

Business Impact Modeling

Similar multivariate regression techniques are used for Business Impact Modeling. Once again a critical issue is to determine the functional form that will be utilized to characterize the relationships being studied. We noted in Chapter 6 that a family of methods is employed. Here we focus only on one—the production function—to illustrate what the actual statistical model might look like.

As was noted before, the production function is a mathematical expression of the relationship between inputs and outputs in the production process. The practical value of the production function is enhanced by augmenting the traditional specification of labor input on the right-hand side of the equation to reflect the broader array of workforce attributes and management practices that affect the productivity of human capital in an organization. In most instances it is possible to specify a form of production function that directly links critical business outcomes such as productivity and profitability to key characteristics of the workforce and the way human capital is managed. Here is an illustration:

Revenue per employee = β0 + β1*assets + β2*average years of service + β3*percent full-time status + β4*urban location + β5*office size + β6*average supervisor span of control + β7*average training expenditure per employee + β8*percent bonus participation + . . .

In this adapted equation the business outcome on the left-hand side is revenue per employee, something that is tracked routinely. The purpose of the equation is to identify which among many possible human capital–related factors consistently drive revenue per employee after accounting for other influences, such as location and capitalization of the business units or facilities examined. Some of those factors seem intuitively plausible. For example, more experienced employees (measured as a count of years of service) and full-time employees (measured as 1 = full-time status and 0 = part-time) might be expected to produce greater revenue per employee in many organizations. Other factors are included as statistical controls in the model to allow for correct estimates of the key human capital parameters in the equation. For example, whether a sales location is in an urban area may have a substantial effect on revenue per employee. What if it turns out that most full-time employees work in urban areas and part-time employees work in suburban or rural areas and full-time employees on average produce more revenue per sale? Is the higher revenue per employee a result of the employee’s full-time status—and all that implies about what hours the employee is on the job or his or her interest in the work—or is it a result of the fact that full-time employees happen to work mostly in urban offices? The action implications differ greatly depending on how this question is answered. Statistical models answer questions such as this one, helping to identify which of the many possible influences are indeed the most important and thus are priorities for action.
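The full-time/urban confounding question can be made concrete with a small Python sketch (all numbers invented for the illustration, and full-time share deliberately correlated with urban location): fitting the model with and without the urban control shows how an omitted confounder inflates the full-time coefficient.

```python
def solve(a, b):
    """Solve the small linear system a x = b by Gauss-Jordan
    elimination (adequate for these normal equations)."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(X, y):
    """Multivariate OLS via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical unit-level data: [intercept, pct_full_time, urban],
# with revenue per employee built as 100 + 10*pct_full_time + 10*urban.
X = [[1, 0.2, 0], [1, 0.4, 0], [1, 0.6, 1], [1, 0.8, 1], [1, 1.0, 1]]
y = [102, 104, 116, 118, 120]

with_control = ols(X, y)
without_control = ols([[row[0], row[1]] for row in X], y)
print([round(b, 4) for b in with_control])     # → [100.0, 10.0, 10.0]
print([round(b, 4) for b in without_control])  # → [97.0, 25.0]
```

Omitting the urban control loads the location effect onto the full-time coefficient (25 instead of 10), which is exactly the ambiguity the statistical controls are there to resolve.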

As in ILM Analysis, the betas (βs) for each element on the right-hand side of the equation are estimated through the modeling process. The overall equation, then, is a statement about which factors contribute to revenue per employee and how important each factor is in comparison to the others. Unimportant factors can be removed, and the remaining factors can be prioritized by their relative importance. These are key facts in setting human capital strategies.

In addition, the modeling can be used to test for productive interactions, or “complementarities,” among different factors. For example, one might suppose, as research suggests, that training investments in employees have a bigger impact on labor productivity in environments where recruitment and selection criteria are more stringent or where the education level of the workforce is higher. Evidence of the latter relationship will be revealed if the estimated effect of the variable “average training expenditure per employee” rises when that variable is statistically interacted with a measure of educational attainment (e.g., average years of education, percentage of the workforce with advanced degrees). Rigorous tests of interactions among these variables reveal whether and to what extent complementarities of these kinds materialize in the organization. You might recall from Chapter 3 that this kind of interaction was uncovered in the relationship between the utilization of part-timers and overtime. To everyone’s surprise, overtime actually improved productivity at Healthco. However, that was the case because managers were using overtime to deploy their more productive full-timers more intensively. Certainly, understanding that relationship had important implications for the way overtime use should be regulated in that organization.
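A complementarity test amounts to adding a product term to the model. A toy Python illustration with invented coefficients shows how an interaction makes the payoff to training depend on workforce education:

```python
# Hypothetical coefficients from a model that includes an interaction:
# productivity = ... + b_train*training + b_edu*education
#                    + b_inter*(training * education) + ...
b_train = 0.5   # direct effect of training spend
b_inter = 0.25  # interaction of training with education

def training_effect(avg_years_education):
    """Marginal effect of training on productivity. With an
    interaction term, the derivative with respect to training is
    b_train + b_inter * education, so it varies with education."""
    return b_train + b_inter * avg_years_education

print(training_effect(12))  # → 3.5
print(training_effect(16))  # → 4.5
```

Under these invented values, training pays off more where workforce education is higher, which is precisely the pattern a positive interaction coefficient would reveal.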

When is Business Impact Modeling of this kind feasible statistically? Two conditions must be met at a minimum. First, a sufficient number of measures of performance over time are required in order to fill in the left-hand side of the model. This means that one needs to find like units within an organization whose performance can be compared systematically. In a manufacturing organization those units might be factories. In a retail company they could be stores. For a financial services organization branches are the likely unit of observation. Within a single business unit departments and cost centers are likely candidates for comparison. All that matters is that there are common output measures that can be tracked and enough observations over time to make statistical analysis viable. Obviously, the measures must be tracked at a level that corresponds to a meaningful workforce unit as well.

In the example above, hundreds, if not thousands, of sales transactions would become the left-hand-side data. Our example addresses revenue per employee, but remember that any performance measure—profit, errors, customer attrition, speed, efficiency, and so on—can be subjected to this type of analysis. Data to fill in the right-hand side of the model, the human capital side, typically come from HR information systems, payroll records, and related sources. Often they are variables that are derived as outputs of an ILM Analysis.

The second requirement is that there be enough variation in the internal operations and/or external environment of the facilities for it to be possible to discriminate between alternative drivers of performance. Business Impact Modeling operates by simultaneously comparing performance across facilities in an organization and within any specific facility over time. If all the units within the organization were identical or remained completely unchanged in the way they were managed over time, there would be nothing to compare. Performance differences would be random and therefore uninformative. Fortunately, one seldom finds such uniformity in the operations of organizations. Even if organizations are run “by the book,” the reality on the ground almost always differs across units. There are differences in business and labor market environments, differences in workforce demographics and ILM dynamics, and differences in operations. Those differences can be treated as a form of natural experiment that sheds light on the drivers of business performance, including those related to human capital.
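The dual comparison, across units and within each unit over time, can be sketched as follows (Python, with invented store revenue figures): demeaning each unit's series isolates the within-unit variation, while the unit means carry the across-unit differences.

```python
# Hypothetical panel: revenue per employee for two stores over
# three years.
panel = {
    "store_a": [100, 104, 108],
    "store_b": [200, 198, 202],
}

def unit_means(panel):
    """Across-unit comparison: each unit's average over time."""
    return {unit: sum(s) / len(s) for unit, s in panel.items()}

def within_deviations(panel):
    """Within-unit comparison: deviations of each observation from
    its own unit's mean, i.e., the over-time variation that remains
    after fixed unit-level differences are removed."""
    means = unit_means(panel)
    return {unit: [v - means[unit] for v in s]
            for unit, s in panel.items()}

print(unit_means(panel))         # → {'store_a': 104.0, 'store_b': 200.0}
print(within_deviations(panel))  # → {'store_a': [-4.0, 0.0, 4.0],
                                 #    'store_b': [0.0, -2.0, 2.0]}
```

If every store were identical and unchanging, both the unit means and the within-unit deviations would carry no information; it is the variation in each that makes the modeling feasible.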

By providing some simple illustrations of the models underlying ILM Analysis and Business Impact Modeling, we do not mean to suggest that there is a mechanical, formulaic approach to human capital management. First of all, there is nothing mechanical about the statistical analyses. In an actual assignment, many statistical issues have to be addressed carefully, including choosing the most appropriate model specification, determining the appropriate units of analysis, and deciding on the evaluation and treatment of the data (for instance, when perceptual data are introduced, one may need to perform a factor analysis to create valid summary measures of highly correlated responses that otherwise could cause multicollinearity problems). Testing the robustness of results to changes in model specification and to estimation using different employee populations usually is important for building confidence in diagnostic results.

Most important, one must always keep in mind that the results of such analysis are only one source of information, albeit an important one. To determine their relevance to actual management practice, they should be filtered through the experience of those most familiar with the organization and combined with the results of more qualitative appraisals. As we noted in Chapter 2, good decisions require the right facts, and the right facts seldom are unearthed through the use of a single method.

[11] A good reference source for the estimation of discrete choice models is William H. Greene, Econometric Analysis, 2nd ed. (New York: Macmillan, 1993).




Play to Your Strengths: Managing Your Internal Labor Markets for Lasting Competitive Advantage (2003)
