Getting Started | SAS/STAT 9.1 User's Guide, Volume 3

This section demonstrates how you can use the INBREED procedure to calculate the inbreeding or covariance coefficients for a pedigree, how you can control the analysis mode if the population consists of nonoverlapping generations, and how you can obtain averages within sex categories.

For you to use PROC INBREED effectively, your input data set must have a definite format. The following sections first introduce this format for a fictitious population and then demonstrate how you can analyze this population using the INBREED procedure.

The Format of the Input Data Set

The SAS data set used as input to the INBREED procedure must contain an observation for each individual. Each observation must include one variable identifying the individual and two variables identifying the individual's parents. Optionally, an observation can contain a known covariance coefficient and a character variable defining the gender of the individual.

For example, consider the following data:

  data Population;
     input Individual $ Parent1 $ Parent2 $
           Covariance Sex $ Generation;
     datalines;
  MARK   GEORGE LISA    .    M 1
  KELLY  SCOTT  LISA    .    F 1
  MIKE   GEORGE AMY     .    M 1
  .      MARK   KELLY  0.50  . 1
  DAVID  MARK   KELLY   .    M 2
  MERLE  MIKE   JANE    .    F 2
  JIM    MARK   KELLY  0.50  M 2
  MARK   MIKE   KELLY   .    M 2
  ;
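Because the discussion that follows refers to observations by number, it can help to list the data set before analyzing it. The following is a minimal sketch, assuming the Population data set above has been created; PROC PRINT displays an Obs column by default, which corresponds to the observation numbers cited in the text.

```sas
/* List the pedigree with observation numbers (the Obs column), */
/* so that references such as "observation 5" can be checked.   */
proc print data=Population;
run;
```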

It is important to order the pedigree observations so that individuals are defined before they are used as parents of other individuals. The family relationships between individuals cannot be ascertained correctly unless you observe this ordering. Also, older individuals must precede younger ones. For example, 'MARK' appears as the first parent of 'DAVID' at observation 5; therefore, his observation needs to be defined prior to observation 5. Indeed, this is the case (see observation 1). Also, 'DAVID' is older than 'JIM', whose observation appears after the observation for 'DAVID', as is appropriate.

In populations with distinct, nonoverlapping generations, the older generation (parents) must precede the younger generation. For example, the individuals defined in Generation=1 appear as parents of individuals defined in Generation=2.

PROC INBREED produces warning messages when a parent cannot be found. For example, 'JANE' appears as the second parent of the individual 'MERLE' even though there are no previous observations defining her own parents. If the population is treated as an overlapping population, that is, if the generation grouping is ignored, then the procedure inserts an observation for 'JANE' with missing parents just before the sixth observation, which defines 'MERLE', as follows:

  JANE   .     .     .  F 2
  MERLE  MIKE  JANE  .  F 2

However, if generation grouping is taken into consideration, then 'JANE' is defined as the last observation in Generation=1, as follows:

  MIKE  GEORGE  AMY  .  M 1
  JANE  .       .    .  F 1

In this latter case, however, the observation for 'JANE' is inserted after the computations are reported for the first generation. Therefore, she does not appear in the covariance/inbreeding matrix, even though her observation is used in computations for the second generation (see the example on page 1970).

If the data for an individual are duplicated, only the first occurrence of the data is used by the procedure, and a warning message is displayed to note the duplication. For example, individual 'MARK' is defined twice, at observations 1 and 8. If generation grouping is ignored, then this is an error and observation 8 is skipped. However, if the population is processed with respect to two distinct generations, then 'MARK' refers to two different individuals, one in Generation=1 and the other in Generation=2.

If a covariance is to be assigned between two individuals, then those individuals must be defined prior to the assignment observation. For example, a covariance of 0.50 can be assigned between 'MARK' and 'KELLY' since they are previously defined. Note that assignment statements must have different formats depending on whether the population is processed with respect to generations (see the section "DATA= Data Set" on page 1976 for further information). For example, while observation 4 is valid for nonoverlapping generations, it is invalid for a processing mode that ignores generation grouping. In this latter case, observation 7 indicates a valid assignment, and observation 4 is skipped.

The latest covariance specification between any given two individuals overrides the previous one between the same individuals.

Performing the Analysis

To compute the covariance coefficients for the overlapping generation mode, use the following statements:

  proc inbreed data=Population covar matrix init=0.25;
  run;

Here, the DATA= option names the SAS data set to be analyzed, and the COVAR and MATRIX options tell the procedure to output the covariance coefficients matrix. If you omit the COVAR option, the inbreeding coefficients are output instead of the covariance coefficients.
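As a brief illustration of the last point, the following sketch is the same step with the COVAR option removed, so that the matrix of inbreeding coefficients is requested instead. It assumes the Population data set defined earlier; its output is not reproduced here.

```sas
/* Same analysis as above, but without COVAR:               */
/* the MATRIX option now requests inbreeding coefficients.  */
proc inbreed data=Population matrix init=0.25;
run;
```

The remainder of this section continues with the COVAR form of the statement shown earlier.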

Note that the PROC INBREED statement also contains the INIT= option. This option gives an initial covariance between any individual and unknown individuals. For example, the covariance between any individual and 'JANE' would be 0.25, since 'JANE' is unknown, except when 'JANE' appears as a parent (see Figure 35.1).

                           The INBREED Procedure

                          Covariance Coefficients

  Individual  Parent1   Parent2     GEORGE      LISA      MARK     SCOTT     KELLY

  GEORGE                            1.1250    0.2500    0.6875    0.2500    0.2500
  LISA                              0.2500    1.1250    0.6875    0.2500    0.6875
  MARK        GEORGE    LISA        0.6875    0.6875    1.1250    0.2500    0.5000
  SCOTT                             0.2500    0.2500    0.2500    1.1250    0.6875
  KELLY       SCOTT     LISA        0.2500    0.6875    0.5000    0.6875    1.1250
  AMY                               0.2500    0.2500    0.2500    0.2500    0.2500
  MIKE        GEORGE    AMY         0.6875    0.2500    0.4688    0.2500    0.2500
  DAVID       MARK      KELLY       0.4688    0.6875    0.8125    0.4688    0.8125
  JANE                              0.2500    0.2500    0.2500    0.2500    0.2500
  MERLE       MIKE      JANE        0.4688    0.2500    0.3594    0.2500    0.2500
  JIM         MARK      KELLY       0.4688    0.6875    0.8125    0.4688    0.8125

                          Covariance Coefficients

  Individual  Parent1   Parent2        AMY      MIKE     DAVID      JANE     MERLE

  GEORGE                            0.2500    0.6875    0.4688    0.2500    0.4688
  LISA                              0.2500    0.2500    0.6875    0.2500    0.2500
  MARK        GEORGE    LISA        0.2500    0.4688    0.8125    0.2500    0.3594
  SCOTT                             0.2500    0.2500    0.4688    0.2500    0.2500
  KELLY       SCOTT     LISA        0.2500    0.2500    0.8125    0.2500    0.2500
  AMY                               1.1250    0.6875    0.2500    0.2500    0.4688
  MIKE        GEORGE    AMY         0.6875    1.1250    0.3594    0.2500    0.6875
  DAVID       MARK      KELLY       0.2500    0.3594    1.2500    0.2500    0.3047
  JANE                              0.2500    0.2500    0.2500    1.1250    0.6875
  MERLE       MIKE      JANE        0.4688    0.6875    0.3047    0.6875    1.1250
  JIM         MARK      KELLY       0.2500    0.3594    0.8125    0.2500    0.3047

                          Covariance Coefficients

  Individual  Parent1   Parent2        JIM

  GEORGE                            0.4688
  LISA                              0.6875
  MARK        GEORGE    LISA        0.8125
  SCOTT                             0.4688
  KELLY       SCOTT     LISA        0.8125
  AMY                               0.2500
  MIKE        GEORGE    AMY         0.3594
  DAVID       MARK      KELLY       0.8125
  JANE                              0.2500
  MERLE       MIKE      JANE        0.3047
  JIM         MARK      KELLY       1.2500

  Number of Individuals    11

Figure 35.1: Analysis for an Overlapping Population
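To see how the INIT= value propagates into Figure 35.1, a few entries can be checked by hand. The sketch below is illustrative only. It assumes the usual recursive rules for covariance coefficients, which this section does not state: an individual's covariance with itself is 1 plus half the covariance between its parents, and its covariance with an earlier individual is the average of its parents' covariances with that individual. The variable names are hypothetical.

```sas
/* Hand check of selected entries in Figure 35.1 (illustrative only). */
data _null_;
   init = 0.25;                        /* value given by INIT=          */

   /* GEORGE and LISA have unknown parents, so the covariance between  */
   /* their parents, and between the two of them, is taken to be INIT. */
   george_george = 1 + 0.5*init;
   george_lisa   = init;

   /* MARK is the offspring of GEORGE and LISA. */
   mark_george = 0.5*(george_george + george_lisa);
   mark_mark   = 1 + 0.5*george_lisa;

   /* Observation 7 assigns 0.50 between MARK and KELLY, the parents   */
   /* of DAVID and JIM; that value is used for DAVID's diagonal entry. */
   mark_kelly  = 0.50;
   david_david = 1 + 0.5*mark_kelly;

   put george_george= 6.4;
   put mark_george=   6.4;
   put mark_mark=     6.4;
   put david_david=   6.4;
run;
```

Under these assumptions the four values work out to 1.1250, 0.6875, 1.1250, and 1.2500, which agree with the corresponding entries in Figure 35.1.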

In the previous example, PROC INBREED treats the population as a single generation. However, you may want to process the population with respect to distinct, nonoverlapping generations. To accomplish this, you need to identify the generation variable in a CLASS statement, as shown in the following statements:

  proc inbreed data=Population covar matrix init=0.25;
     class Generation;
  run;

Note that, in this case, the covariance matrix is displayed separately for each generation (see Figure 35.2).

                           The INBREED Procedure

                              Generation = 1

                          Covariance Coefficients

  Individual    Parent1     Parent2         MARK       KELLY        MIKE

  MARK          GEORGE      LISA          1.1250      0.5000      0.4688
  KELLY         SCOTT       LISA          0.5000      1.1250      0.2500
  MIKE          GEORGE      AMY           0.4688      0.2500      1.1250

  Number of Individuals    3

                           The INBREED Procedure

                              Generation = 2

                          Covariance Coefficients

  Individual   Parent1    Parent2       DAVID      MERLE        JIM       MARK

  DAVID        MARK       KELLY        1.2500     0.3047     0.8125     0.5859
  MERLE        MIKE       JANE         0.3047     1.1250     0.3047     0.4688
  JIM          MARK       KELLY        0.8125     0.3047     1.2500     0.5859
  MARK         MIKE       KELLY        0.5859     0.4688     0.5859     1.1250

  Number of Individuals    4

Figure 35.2: Analysis for a Nonoverlapping Population

You may also want to see covariance coefficient averages within sex categories. This is accomplished by indicating the variable defining the gender of individuals in a GENDER statement and by adding the AVERAGE option to the PROC INBREED statement. For example, the following statements produce the covariance coefficient averages shown in Figure 35.3.

  proc inbreed data=Population covar average init=0.25;
     class Generation;
     gender Sex;
  run;

                           The INBREED Procedure

                              Generation = 1

          Averages of Covariance Coefficient Matrix in Generation 1

                        On Diagonal      Below Diagonal

  Male X Male                1.1250              0.4688
  Male X Female               .                  0.3750
  Female X Female            1.1250              0.0000
  Over Sex                   1.1250              0.4063

  Number of Males          2
  Number of Females        1
  Number of Individuals    3

                           The INBREED Procedure

                              Generation = 2

          Averages of Covariance Coefficient Matrix in Generation 2

                        On Diagonal      Below Diagonal

  Male X Male                1.2083              0.6615
  Male X Female               .                  0.3594
  Female X Female            1.1250              0.0000
  Over Sex                   1.1875              0.5104

  Number of Males          3
  Number of Females        1
  Number of Individuals    4

Figure 35.3: Averages within Sex Categories for a Nonoverlapping Generation
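The averages in Figure 35.3 can be related back to the Generation 1 matrix in Figure 35.2. The following sketch recomputes the below-diagonal averages for Generation 1 from the displayed (rounded) entries. It is a hand check only, not how the procedure computes them, and the variable names are hypothetical.

```sas
/* Recompute the Generation 1 below-diagonal averages from Figure 35.2. */
data _null_;
   /* Pairwise covariances as displayed (MARK and MIKE are male,       */
   /* KELLY is female).                                                 */
   mark_kelly = 0.5000;
   mark_mike  = 0.4688;
   kelly_mike = 0.2500;

   male_x_male   = mark_mike;                           /* one male pair  */
   male_x_female = mean(mark_kelly, kelly_mike);        /* two mixed pairs */
   over_sex      = mean(mark_kelly, mark_mike, kelly_mike);

   put male_x_male=   6.4;
   put male_x_female= 6.4;
   put over_sex=      6.4;
run;
```

To four decimal places these work out to 0.4688, 0.3750, and 0.4063, matching the Below Diagonal column for Generation 1. With only one female in the generation there is no female pair to average, which is consistent with the 0.0000 shown for Female X Female.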